DRBD v2 posted Wed, 08 May 2013 15:12:17 UTC

Previously I had written a fairly lengthy post on creating a cheap SAN using DRBD, iSCSI, and corosync/pacemaker. It was actually the second time we had done this setup at work: the first time around we built iSCSI LUNs from logical volumes on top of a single DRBD resource, whereas in my last post each iSCSI LUN was its own DRBD resource sitting on top of local logical volumes on each node of the cluster. Having run with that for a while, and added around forty LUNs, I will say that it is rather slow at migrating from the primary to the secondary node, and it only gets slower as we continue to add new DRBD resources.

Since we're in the process of setting up a new DRBD cluster, we've decided to go back to the original design of iSCSI LUNs backed by logical volumes on top of one large, single DRBD resource. I'll also mention that we had some real nightmares using the latest and greatest Pacemaker 1.1.8 in Red Hat Enterprise Linux 6.4, so we're also pegging our cluster tools at the previous versions of everything that shipped in 6.3. Maybe the 6.4 stuff would have worked if we were running the cluster in the more traditional Red Hat way (using CMAN).

So now our sl.repo file specifies the 6.3 release:


[scientific-linux]
name=Scientific Linux - $releasever
baseurl=http://ftp.scientificlinux.org/linux/scientific/6.3/$basearch/os/
enabled=1
gpgcheck=0
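
If you also carry the Scientific Linux security updates repository, it's worth pinning that at 6.3 too so a stray yum update doesn't drag the cluster stack forward to 6.4. A sketch, assuming the standard SL mirror layout (double-check the path against your mirror):


[scientific-linux-security]
name=Scientific Linux Security - $releasever
baseurl=http://ftp.scientificlinux.org/linux/scientific/6.3/$basearch/updates/security/
enabled=1
gpgcheck=0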

And we've also added a newer version of crmsh, which must be installed forcibly from the RPM itself since it overwrites some of the files in the RHEL 6.3 pacemaker packages:


rpm --replacefiles -Uvh http://download.opensuse.org/repositories/network:/ha-clustering/RedHat_RHEL-6/x86_64/crmsh-1.2.5-55.3.x86_64.rpm

We did this specifically to allow the use of rsc_template in our cluster, which cleans everything up and makes the configuration hilariously simple.

We've also cleaned up the corosync configuration a bit by removing /etc/corosync/service.d/pcmk and folding its contents into the main configuration, as well as making use of the key we generated with corosync-keygen by enabling secauth:


amf {
  mode: disabled
}
 
logging {
  fileline: off
  to_stderr: no
  to_logfile: yes
  to_syslog: no
  logfile: /var/log/cluster/corosync.log
  debug: off
  timestamp: on
  logger_subsys {
    subsys: AMF
    debug: off
    tags: enter|leave|trace1|trace2|trace3|trace4|trace6
  }
}
 
totem {
  version: 2
  token: 10000
  token_retransmits_before_loss_const: 10
  vsftype: none
  secauth: on
  threads: 0
  rrp_mode: active
 
 
  interface {
    ringnumber: 0
    bindnetaddr: 172.16.165.0
    broadcast: yes
    mcastport: 5405
  }
  interface {
    ringnumber: 1
    bindnetaddr: 10.0.0.0
    broadcast: yes
    mcastport: 5405
  }
}

service {
  ver: 1
  name: pacemaker
}

aisexec {
  user: root
  group: root
}
 
corosync {
  user: root
  group: root
}
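
For completeness, the authkey only needs to be generated once and then copied to the other node with the right ownership and permissions. A quick sketch, assuming root SSH between the nodes is acceptable in your environment:


# run on one node; corosync-keygen reads /dev/random, so it may take a moment
corosync-keygen
scp -p /etc/corosync/authkey pepper:/etc/corosync/authkey
ssh pepper 'chown root:root /etc/corosync/authkey && chmod 400 /etc/corosync/authkey'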

Other than that, there's only one DRBD resource now. And once it's configured, you shouldn't ever really need to touch DRBD at all; lvcreate only needs to happen once per new LUN, and only on the primary storage node, since the volume group lives on top of the shared DRBD device. We've also learned that corosync-cfgtool -s may not always be the best way to check membership, so you can also check corosync-objctl | grep member.
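
For reference, that one resource ends up looking something like the sketch below. The backing disk, device, and replication addresses here are hypothetical (I'm assuming replication runs over the same back-end network as the second corosync ring), so adjust to taste:


resource r0 {
  device    /dev/drbd0;
  disk      /dev/sdb1;       # hypothetical backing partition on each node
  meta-disk internal;

  on salt {
    address 10.0.0.1:7788;   # dedicated replication link
  }
  on pepper {
    address 10.0.0.2:7788;
  }
}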

We also ran across a DRBD-related bug in 6.4 which seems to affect this mixed 6.3/6.4 environment as well. We're still using kmod-drbd84 from ELRepo, which is currently at version 8.4.2. Apparently the shipping version of 8.4.3 fixes the bug that causes /usr/lib/drbd/crm-fence-peer.sh to break things horribly under 6.4, and the newer script also seems to work better even with Pacemaker 1.1.7 under 6.3. I recommend grabbing the tarball for 8.4.3 and overwriting the copy of the script that ships with 8.4.2. I'm sure as soon as 8.4.3 is packaged and available on ELRepo, this won't be necessary.
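
Something along these lines should do the trick on both nodes; the download URL and the scripts/ path inside the tarball are from memory, so verify them against the LINBIT site before relying on this:


cd /tmp
wget http://oss.linbit.com/drbd/8.4/drbd-8.4.3.tar.gz
tar xzf drbd-8.4.3.tar.gz
cp drbd-8.4.3/scripts/crm-fence-peer.sh /usr/lib/drbd/crm-fence-peer.sh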

You might want to set up a cronjob to run this DRBD verification script once a month or so:


#!/bin/sh

# Kick off an online verify of every configured DRBD resource, waiting for
# each one to finish before starting the next.  The sed strips the leading
# "r" from each resource name (r0 becomes 0), which assumes our convention
# of naming resources after their minor numbers, since that's what
# drbdsetup wants here.
for i in $(drbdsetup show all | grep ^resource | awk '{print $2}' | sed -e 's/^r//'); do
	drbdsetup verify $i
	drbdsetup wait-sync $i
done

echo "DRBD device verification completed"

And maybe run this cluster backup script nightly just so you always have a reference point if something significant changes in your cluster:

#!/bin/bash

#define some variables
PATH=/bin:/sbin:/usr/bin:/usr/sbin
hour=$(date +"%H%M")
today=$(date +"%Y%m%d")
basedir="/srv/backups/cluster"
daily=$basedir/daily/$today
monthly=$basedir/monthly
lock="/tmp/$(basename $0)"

if test -f $lock; then
	echo "exiting; lockfile $lock exists; please check for existing backup process"
	exit 1
else
	touch $lock
fi

if ! test -d $daily ; then
	mkdir -p $daily
fi

if ! test -d $monthly ; then
	mkdir -p $monthly
fi


# dump and compress both CRM and CIB
crm_dumpfile="crm-$today-$hour.txt.xz"
if ! crm configure show | xz -c > $daily/$crm_dumpfile; then
	echo "something went wrong while dumping CRM on $(hostname -s)"
else
	echo "successfully dumped CRM on $(hostname -s)"
fi

cib_dumpfile="cib-$today-$hour.xml.xz"
if ! cibadmin -Q | xz -c > $daily/$cib_dumpfile; then
	echo "something went wrong while dumping CIB on $(hostname -s)"
else
	echo "successfully dumped CIB on $(hostname -s)"
fi

# keep a monthly copy
if test "x$(date +"%d")" == "x01" ; then
	monthly=$monthly/$today
	mkdir -p $monthly
	cp $daily/$crm_dumpfile $monthly
	cp $daily/$cib_dumpfile $monthly
fi

# remove daily backups after 2 weeks; -mindepth 1 keeps find from ever
# matching (and removing) the daily directory itself
for dir in $(find "$basedir/daily/" -mindepth 1 -type d -mtime +14 | sort); do
	if test -d "$dir"; then
		echo "removing $dir"
		rm -rf "$dir"
	else
		echo "$dir not found"
	fi
done

# remove monthly backups after 6 months; same -mindepth guard as above
for dir in $(find "$basedir/monthly/" -mindepth 1 -type d -mtime +180 | sort); do
	if test -d "$dir"; then
		echo "removing $dir"
		rm -rf "$dir"
	else
		echo "$dir not found"
	fi
done

rm -f $lock
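
And a matching nightly entry for the backup script (again, the install path is just an example; run it on both nodes if you want redundant copies):


# /etc/cron.d/cluster-backup -- nightly CRM/CIB dump at 01:15
15 1 * * * root /usr/local/sbin/cluster-backup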

And finally, we have the actual cluster configuration itself, more or less straight out of production:


node salt
node pepper
rsc_template lun ocf:heartbeat:iSCSILogicalUnit \
	params target_iqn="iqn.2013-04.net.bitgnome:vh-storage01" additional_parameters="mode_page=8:0:18:0x10:0:0xff:0xff:0:0:0xff:0xff:0xff:0xff:0x80:0x14:0:0:0:0:0:0" \
	op start interval="0" timeout="10" \
	op stop interval="0" timeout="10" \
	op monitor interval="10" timeout="10"
primitive fence-salt stonith:fence_ipmilan \
	params ipaddr="172.16.74.164" passwd="abcd1234" login="laitsadmin" verbose="true" pcmk_host_list="salt" \
	op start interval="0" timeout="20" \
	op stop interval="0" timeout="20"
primitive fence-pepper stonith:fence_ipmilan \
	params ipaddr="172.16.74.165" passwd="abcd1234" login="laitsadmin" verbose="true" pcmk_host_list="pepper" \
	op start interval="0" timeout="20" \
	op stop interval="0" timeout="20"
primitive ip ocf:heartbeat:IPaddr2 \
	params ip="172.16.165.24" cidr_netmask="25" \
	op start interval="0" timeout="20" \
	op stop interval="0" timeout="20" \
	op monitor interval="10" timeout="20"
primitive lun1 @lun \
	params lun="1" path="/dev/vg0/vm-ldap1"
primitive lun2 @lun \
	params lun="2" path="/dev/vg0/vm-test1"
primitive lun3 @lun \
	params lun="3" path="/dev/vg0/vm-mail11"
primitive lun4 @lun \
	params lun="4" path="/dev/vg0/vm-mail2"
primitive lun5 @lun \
	params lun="5" path="/dev/vg0/vm-www1"
primitive lun6 @lun \
	params lun="6" path="/dev/vg0/vm-ldap-slave1"
primitive lun7 @lun \
	params lun="7" path="/dev/vg0/vm-ldap-slave2"
primitive lun8 @lun \
	params lun="8" path="/dev/vg0/vm-ldap-slave3"
primitive lun9 @lun \
	params lun="9" path="/dev/vg0/vm-www2"
primitive lvm_vg0 ocf:heartbeat:LVM \
	params volgrpname="vg0" \
	op start interval="0" timeout="30" \
	op stop interval="0" timeout="30" \
	op monitor interval="10" timeout="30" depth="0"
primitive r0 ocf:linbit:drbd \
	params drbd_resource="r0" \
	op start interval="0" timeout="240" \
	op promote interval="0" timeout="90" \
	op demote interval="0" timeout="90" \
	op notify interval="0" timeout="90" \
	op stop interval="0" timeout="100" \
	op monitor interval="20" role="Slave" timeout="20" \
	op monitor interval="10" role="Master" timeout="20"
primitive tgt ocf:heartbeat:iSCSITarget \
	params iqn="iqn.2013-04.net.bitgnome:vh-storage01" tid="1" allowed_initiators="172.16.165.18 172.16.165.19 172.16.165.20 172.16.165.21" \
	op start interval="0" timeout="10" \
	op stop interval="0" timeout="10" \
	op monitor interval="10" timeout="10"
ms ms-r0 r0 \
	meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location salt-fencing fence-salt -inf: salt
location pepper-fencing fence-pepper -inf: pepper
colocation drbd-with-tgt inf: ms-r0:Master tgt:Started
colocation ip-with-lun inf: ip lun
colocation lun-with-lvm inf: lun lvm_vg0
colocation lvm-with-drbd inf: lvm_vg0 ms-r0:Master
order drbd-before-lvm inf: ms-r0:promote lvm_vg0:start
order lun-before-ip inf: lun ip
order lvm-before-lun inf: lvm_vg0 lun
order tgt-before-drbd inf: tgt ms-r0
property $id="cib-bootstrap-options" \
	dc-version="1.1.7-6.el6-abcd1234" \
	cluster-infrastructure="openais" \
	expected-quorum-votes="2" \
	no-quorum-policy="ignore" \
	stonith-enabled="true" \
	last-lrm-refresh="1368030674" \
	stonith-action="reboot"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

The great part about this configuration is that the constraints are all tied to the rsc_template, so you don't need to specify new constraints each time you add a new LUN. And because we're using a template, the actual LUN primitives are as short as possible while still uniquely identifying each unit. It's quite lovely, really.
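
So adding a tenth LUN, for example, comes down to one lvcreate on whichever node is currently primary and one new primitive that inherits everything else from the template (the size and volume name here are made up):


lvcreate -L 20G -n vm-newguest1 vg0
crm configure primitive lun10 @lun params lun="10" path="/dev/vg0/vm-newguest1"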