DRBD v2
Posted Wed, 08 May 2013 15:12:17 UTC
Previously I had written a fairly lengthy post on creating a cheap SAN using DRBD, iSCSI, and corosync/pacemaker. It was actually the second time we had done this setup at work: originally we had built iSCSI LUNs from logical volumes on top of a single DRBD resource, whereas in my last post each iSCSI LUN was its own DRBD resource on top of local logical volumes on each node of the cluster. Having run with that design for a while, and added around forty LUNs, I will say that failing over from the primary to the secondary node is rather slow, and it only gets slower as we continue to add new DRBD resources.
Since we’re in the process of setting up a new DRBD cluster, we’ve decided to go back to the original design: iSCSI LUNs backed by logical volumes on top of one large, single DRBD resource. I’ll also mention that we had some real nightmares with the latest and greatest Pacemaker 1.1.8 in Red Hat Enterprise Linux 6.4, so we’re pegging our cluster tools at the previous versions of everything that shipped in 6.3. Maybe the 6.4 stuff would have worked if we were running the cluster in the more traditional Red Hat way (using CMAN).
So now our sl.repo file specifies the 6.3 release:
[scientific-linux]
name=Scientific Linux - $releasever
baseurl=http://ftp.scientificlinux.org/linux/scientific/6.3/$basearch/os/
enabled=1
gpgcheck=0
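If you have other repositories enabled that could still drag in 6.4 cluster packages, one extra safety net (not something we strictly needed, just a suggestion) is the yum versionlock plugin; roughly, and assuming the usual RHEL 6 package names:
# lock the installed cluster stack at its current (6.3) versions
yum install -y yum-plugin-versionlock
yum versionlock pacemaker pacemaker-libs pacemaker-cluster-libs corosync corosynclib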
And we’ve also added a newer version of crmsh which must be installed forcibly from the RPM itself as it overwrites some of the files in the RHEL 6.3 pacemaker packages:
rpm --replacefiles -Uvh http://download.opensuse.org/repositories/network:/ha-clustering/RedHat_RHEL-6/x86_64/crmsh-1.2.5-55.3.x86_64.rpm
We did this specifically so we could use rsc_template in our cluster, which cleans everything up and makes the configuration hilariously simple.
We’ve also cleaned up the corosync configuration a bit by removing /etc/corosync/service.d/pcmk and adding that to the main configuration, as well as making use of the key we generated using corosync-keygen by enabling secauth:
amf {
    mode: disabled
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    to_syslog: no
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
        tags: enter|leave|trace1|trace2|trace3|trace4|trace6
    }
}

totem {
    version: 2
    token: 10000
    token_retransmits_before_loss_const: 10
    vsftype: none
    secauth: on
    threads: 0
    rrp_mode: active

    interface {
        ringnumber: 0
        bindnetaddr: 172.16.165.0
        broadcast: yes
        mcastport: 5405
    }

    interface {
        ringnumber: 1
        bindnetaddr: 10.0.0.0
        broadcast: yes
        mcastport: 5405
    }
}

service {
    ver: 1
    name: pacemaker
}

aisexec {
    user: root
    group: root
}

corosync {
    user: root
    group: root
}
Other than that, there’s only one DRBD resource now. And once it’s configured, you shouldn’t ever really need to touch DRBD at all; lvcreate for a new LUN happens only once, and only on the primary storage node. We’ve also learned that corosync-cfgtool -s may not always be the best way to check membership, so you can also check corosync-objctl | grep member.
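For reference, the single resource definition ends up being pretty boring. The sketch below is illustrative rather than our actual production file (backing disk, port, and replication addresses are placeholders; the 10.0.0.x addresses assume replication rides the second ring):
# /etc/drbd.d/r0.res -- rough sketch, adjust disk/addresses to taste
resource r0 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    meta-disk internal;
    on salt {
        address 10.0.0.1:7788;
    }
    on pepper {
        address 10.0.0.2:7788;
    }
}
After the initial sync, a pvcreate /dev/drbd0 and vgcreate vg0 /dev/drbd0 on the primary give you the volume group that every LUN's logical volume gets carved out of.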
We also ran across a DRBD-related bug in 6.4 which seems to affect this mixed 6.3/6.4 environment as well. We’re still using kmod-drbd84 from ELRepo, which is currently at version 8.4.2. Apparently the shipping version of 8.4.3 fixes the bug that causes /usr/lib/drbd/crm-fence-peer.sh to break things horribly under 6.4, and the fixed script also seems to behave better even with Pacemaker 1.1.7 under 6.3. I recommend grabbing the 8.4.3 tarball and overwriting the copy of the script that ships with 8.4.2. I’m sure this won’t be necessary once 8.4.3 is packaged and available on ELRepo.
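Something along these lines does the trick; I'm going from memory on the tarball location and the path inside the source tree, so double-check both before overwriting anything:
# back up the 8.4.2 fencing script, then drop in the 8.4.3 copy
cd /tmp
wget http://oss.linbit.com/drbd/8.4/drbd-8.4.3.tar.gz
tar xzf drbd-8.4.3.tar.gz
cp -p /usr/lib/drbd/crm-fence-peer.sh /usr/lib/drbd/crm-fence-peer.sh.8.4.2
cp drbd-8.4.3/scripts/crm-fence-peer.sh /usr/lib/drbd/crm-fence-peer.sh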
You might want to set up a cronjob to run this DRBD verification script once a month or so:
#!/bin/sh
# start an online verify of every DRBD resource (named r0, r1, ...) and
# wait for each pass to finish before moving on to the next
for i in $(drbdsetup show all | grep ^resource | awk '{print $2}' | sed -e 's/^r//'); do
    drbdsetup verify $i
    drbdsetup wait-sync $i
done
echo "DRBD device verification completed"
And maybe run this cluster backup script nightly just so you always have a reference point if something significant changes in your cluster:
#!/usr/bin/env bash
# define some variables
PATH=/bin:/sbin:/usr/bin:/usr/sbin
hour=$(date +"%H%M")
today=$(date +"%Y%m%d")
basedir="/srv/backups/cluster"
daily=$basedir/daily/$today
monthly=$basedir/monthly
lock="/tmp/$(basename $0)"

if test -f $lock; then
    echo "exiting; lockfile $lock exists; please check for existing backup process"
    exit 1
else
    touch $lock
fi

if ! test -d $daily ; then
    mkdir -p $daily
fi
if ! test -d $monthly ; then
    mkdir -p $monthly
fi

# dump and compress both CRM and CIB
crm_dumpfile="crm-$today-$hour.txt.xz"
if ! crm configure show | xz -c > $daily/$crm_dumpfile; then
    echo "something went wrong while dumping CRM on $(hostname -s)"
else
    echo "successfully dumped CRM on $(hostname -s)"
fi

cib_dumpfile="cib-$today-$hour.xml.xz"
if ! cibadmin -Q | xz -c > $daily/$cib_dumpfile; then
    echo "something went wrong while dumping CIB on $(hostname -s)"
else
    echo "successfully dumped CIB on $(hostname -s)"
fi

# keep a monthly copy
if test "x$(date +"%d")" == "x01" ; then
    monthly=$monthly/$today
    mkdir -p $monthly
    cp $daily/$crm_dumpfile $monthly
    cp $daily/$cib_dumpfile $monthly
fi

# remove daily backups after 2 weeks
for dir in $(find "$basedir/daily/" -type d -mtime +14 | sort); do
    if test -d "$dir"; then
        echo "removing $dir"
        rm -rf "$dir"
    else
        echo "$dir not found"
    fi
done

# remove monthly backups after 6 months
for dir in $(find "$basedir/monthly/" -type d -mtime +180 | sort); do
    if test -d "$dir"; then
        echo "removing $dir"
        rm -rf "$dir"
    else
        echo "$dir not found"
    fi
done

rm -f $lock
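The dumps also make it easy to diff against (or roll back to) a known-good configuration later. Roughly, with example filenames matching the naming scheme above (double-check the load syntax against your crmsh version):
# compare the running configuration against a saved dump
xzcat /srv/backups/cluster/daily/20130508/crm-20130508-0300.txt.xz > /tmp/crm-backup.txt
crm configure show | diff /tmp/crm-backup.txt -

# or push the saved configuration back into the cluster wholesale
crm configure load replace /tmp/crm-backup.txt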
And finally, we have the actual cluster configuration itself, more or less straight out of production:
node salt
node pepper
rsc_template lun ocf:heartbeat:iSCSILogicalUnit \
params target_iqn="iqn.2013-04.net.bitgnome:vh-storage01" additional_parameters="mode_page=8:0:18:0x10:0:0xff:0xff:0:0:0xff:0xff:0xff:0xff:0x80:0x14:0:0:0:0:0:0" \
op start interval="0" timeout="10" \
op stop interval="0" timeout="10" \
op monitor interval="10" timeout="10"
primitive fence-salt stonith:fence_ipmilan \
params ipaddr="172.16.74.164" passwd="abcd1234" login="laitsadmin" verbose="true" pcmk_host_list="salt" \
op start interval="0" timeout="20" \
op stop interval="0" timeout="20"
primitive fence-pepper stonith:fence_ipmilan \
params ipaddr="172.16.74.165" passwd="abcd1234" login="laitsadmin" verbose="true" pcmk_host_list="pepper" \
op start interval="0" timeout="20" \
op stop interval="0" timeout="20"
primitive ip ocf:heartbeat:IPaddr2 \
params ip="172.16.165.24" cidr_netmask="25" \
op start interval="0" timeout="20" \
op stop interval="0" timeout="20" \
op monitor interval="10" timeout="20"
primitive lun1 @lun \
params lun="1" path="/dev/vg0/vm-ldap1"
primitive lun2 @lun \
params lun="2" path="/dev/vg0/vm-test1"
primitive lun3 @lun \
params lun="3" path="/dev/vg0/vm-mail11"
primitive lun4 @lun \
params lun="4" path="/dev/vg0/vm-mail2"
primitive lun5 @lun \
params lun="5" path="/dev/vg0/vm-www1"
primitive lun6 @lun \
params lun="6" path="/dev/vg0/vm-ldap-slave1"
primitive lun7 @lun \
params lun="7" path="/dev/vg0/vm-ldap-slave2"
primitive lun8 @lun \
params lun="8" path="/dev/vg0/vm-ldap-slave3"
primitive lun9 @lun \
params lun="9" path="/dev/vg0/vm-www2"
primitive lvm_vg0 ocf:heartbeat:LVM \
params volgrpname="vg0" \
op start interval="0" timeout="30" \
op stop interval="0" timeout="30" \
op monitor interval="10" timeout="30" depth="0"
primitive r0 ocf:linbit:drbd \
params drbd_resource="r0" \
op start interval="0" timeout="240" \
op promote interval="0" timeout="90" \
op demote interval="0" timeout="90" \
op notify interval="0" timeout="90" \
op stop interval="0" timeout="100" \
op monitor interval="20" role="Slave" timeout="20" \
op monitor interval="10" role="Master" timeout="20"
primitive tgt ocf:heartbeat:iSCSITarget \
params iqn="iqn.2013-04.net.bitgnome:vh-storage01" tid="1" allowed_initiators="172.16.165.18 172.16.165.19 172.16.165.20 172.16.165.21" \
op start interval="0" timeout="10" \
op stop interval="0" timeout="10" \
op monitor interval="10" timeout="10"
ms ms-r0 r0 \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location salt-fencing fence-salt -inf: salt
location pepper-fencing fence-pepper -inf: pepper
colocation drbd-with-tgt inf: ms-r0:Master tgt:Started
colocation ip-with-lun inf: ip lun
colocation lun-with-lvm inf: lun lvm_vg0
colocation lvm-with-drbd inf: lvm_vg0 ms-r0:Master
order drbd-before-lvm inf: ms-r0:promote lvm_vg0:start
order lun-before-ip inf: lun ip
order lvm-before-lun inf: lvm_vg0 lun
order tgt-before-drbd inf: tgt ms-r0
property $id="cib-bootstrap-options" \
dc-version="1.1.7-6.el6-abcd1234" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
no-quorum-policy="ignore" \
stonith-enabled="true" \
last-lrm-refresh="1368030674" \
stonith-action="reboot"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
The great part about this configuration is that the constraints are all tied to the rsc_template, so you don’t need to specify new constraints each time you add a new LUN. And because we’re using a template, the actual LUN primitives are as short as possible while still uniquely identifying each unit. It’s quite lovely, really.
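To give a concrete idea of the day-to-day workflow, adding another LUN boils down to one lvcreate on the primary node and one short primitive; the volume name, size, and LUN number here are made up for the example:
# carve out a new logical volume on the primary storage node
lvcreate -L 20G -n vm-newguest vg0

# then add the LUN; the template supplies the target, operations, and constraints
crm configure primitive lun10 @lun params lun="10" path="/dev/vg0/vm-newguest"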