Matthew Weier O’Phinney uploaded the slides from a presentation on using version control (specifically, Subversion) aimed at new developers. If you don’t already use some form of version control (svn, git, bzr, or, god forbid, cvs), you should.
As a sysadmin who’s been getting into clustered virtualized hardware, I’m unbelievably jealous of Wolfram Alpha’s custom Dell hardware (note: YouTube video with horrible music; I suggest hitting mute). It’s a 2U, quad-board, dual-socket, quad-core system, so you fit four servers into 2U of space. Each board is essentially one of our 1950 or SC1435 virtualization hosts, but packed into a shared 2U chassis with an InfiniBand backplane.
I wish Dell would make toys like this publicly available. They’ve already done the design and engineering work, and I think there’s a huge market for this type of unit given Dell’s complete and …
Redis is an interesting database project that reminds me a bit of a low-cost-of-entry Hadoop/CouchDB/SimpleDB. “MySQL is to Oracle as Redis is to CouchDB.”
It’s a simple key/value database that keeps everything in RAM but writes to disk occasionally, sort of the way MySQL works but without the whole overhead of SQL. So, kind of like the bastard stepchild of Tokyo Cabinet and memcached.
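To make the "everything in RAM, written to disk occasionally" idea concrete, here is a toy sketch in Python. It is not how Redis is implemented; the class name SnapshotStore, the JSON file format and the snapshot_every threshold are all made up for illustration.

```python
import json
import os
import tempfile


class SnapshotStore:
    """Toy key/value store: all data lives in a dict, dumped to disk now and then."""

    def __init__(self, path, snapshot_every=100):
        self.path = path                      # hypothetical snapshot file
        self.snapshot_every = snapshot_every  # writes between snapshots (assumed policy)
        self.data = {}
        self.writes_since_snapshot = 0
        # On startup, reload whatever the last snapshot captured.
        if os.path.exists(path):
            with open(path) as handle:
                self.data = json.load(handle)

    def get(self, key, default=None):
        # Reads never touch the disk.
        return self.data.get(key, default)

    def set(self, key, value):
        self.data[key] = value
        self.writes_since_snapshot += 1
        if self.writes_since_snapshot >= self.snapshot_every:
            self.snapshot()

    def snapshot(self):
        # Write to a temp file, then rename, so a crash mid-write
        # cannot leave a half-written snapshot behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            json.dump(self.data, handle)
        os.replace(tmp_path, self.path)
        self.writes_since_snapshot = 0
```

The trade-off shows up right away: anything written after the last snapshot is gone if the process dies, and the whole dataset has to fit in memory.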
Frankly, I’m not sure I buy the justification for needing to keep everything in RAM. It makes it easy in the short term, but in the long term as your dataset …
If you’re running a cluster environment with shared resources, you need to have STONITH or some sort of fencing running.
A cluster is a complicated beast. It’s a community of machines that makes decisions. The decisions can be simple and only affect the cluster (i.e. which service runs where and which client is responding to requests), or they can be complicated and communicate with outside devices or humans.
Fencing means that the cluster has the ability to kick a node out of the cluster if it starts doing things that the cluster doesn’t like, possibly because of a hardware failure, …
You can always use LSB scripts (/etc/init.d) with Pacemaker, but it’s better to make them OCF-compliant…
Seeing this error message in /var/log/xen/xend.log?
ERROR (SrvDaemon:347) Exception starting xend (no element found: line 1, column 0)
You’ve got corrupt xend state data. Go under /var/lib/xend/ and remove any XML files in any of the directories there. Don’t delete the directories or sockets themselves.
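If you’d rather see what would be deleted before deleting it, the cleanup can be sketched in Python like this. The function names are made up for illustration, and it defaults to a dry run; it only ever touches regular files ending in .xml, so directories and sockets are left alone.

```python
import os


def find_xend_xml(root="/var/lib/xend"):
    """Return regular *.xml files below root, skipping sockets and symlinks."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # isfile() is False for sockets, so those are never collected.
            if name.endswith(".xml") and os.path.isfile(path) and not os.path.islink(path):
                matches.append(path)
    return sorted(matches)


def clean_xend_state(root="/var/lib/xend", dry_run=True):
    """Remove the XML files (or just report them when dry_run is True)."""
    handled = []
    for path in find_xend_xml(root):
        if not dry_run:
            os.remove(path)
        handled.append(path)
    return handled
```

Call clean_xend_state() first to get the list, check it, and then call clean_xend_state(dry_run=False) as root once you’re happy with it.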
I had this happen after a nice STONITH-induced reboot loop. Hint: When you’re setting up new devices in a cluster with STONITH, you might consider using “stonith-action=poweroff” so that when you create or encounter an error in your cluster configuration you don’t cause your machines to power cycle endlessly.
I had a weirdness happen as I was feeling my way through this node configuration with crm (pacemaker) 1.0.3.
It turns out that as I configured my resources and created locations and constraints, the crm created a bunch of lrm_resource (local resource manager) objects in the XML CIB. You can’t see these from the crm shell, but you can see them if you dump the XML out using cibadmin --query > cluster.xml.
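Once you have that XML dump, a few lines of Python will list the lrm_resource entries per node. This is a rough sketch: it assumes the entries sit somewhere under node_state elements and that the node name is in a uname attribute, which is what I’d expect from a CIB of this vintage but is worth checking against your own dump. The function name is made up.

```python
import xml.etree.ElementTree as ET


def list_lrm_resources(xml_text):
    """Return (node name, resource id) pairs for every lrm_resource in a CIB dump."""
    root = ET.fromstring(xml_text)
    found = []
    for node in root.iter("node_state"):
        # Assumption: the node name is in "uname"; fall back to "id" if not.
        node_name = node.get("uname") or node.get("id") or "?"
        for res in node.iter("lrm_resource"):
            found.append((node_name, res.get("id")))
    return found
```

Feed it the contents of cluster.xml, for example list_lrm_resources(open("cluster.xml").read()), and compare the result with what the crm shell shows you.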
I was getting some strange errors. For example, I had the location constraint established for a resource named app-03-stonith such that its chance of running on app-03 was …
With the new email features in the most recent crm_mon daemon, it shouldn’t be too difficult to get a service set up so that Nagios will alert us when a stonith event happens, and maybe even some details about why.
It’ll take much longer for me to decide on what WAV file should play in my office when that event gets triggered. I’m really torn between an extremely loud “Boom, HEADSHOT!,” or the first few lines of “I shot the sheriff…”, or “Hey, Man, Nice Shot”, and maybe even “Karma Police” …
This thread, and all the other short …
With Dell kit of 1950/860 and newer, I’m using the built-in IPMI-over-LAN in the BIOS for stonith instead of messing with DRAC5 or more complicated means. It’s easy to configure on its own IP and it just plain works.
First, a security note for people who have their machines on a “public” network: You’ll want to disable or set a password for the ‘null’ default user for IPMI. This may be done for you in recent versions of Dell firmware, but it isn’t on some older stuff. Also, IPMI v. 1.5 doesn’t do encryption by default; you’ll want to …
Random notes … It’s nice to have manuals, but maybe these will help someone.
…