[Gluster-users] Testing failover and recovery

Wed Dec 4 11:17:26 UTC 2013

Hello,

I find GlusterFS an interesting project. I don't have much experience with it
yet (although I have worked on similar use cases with DRBD+NFS setups), so I
set up a test case to try out failover and recovery.

For this, I have two GlusterFS servers (each is a VM) and one client (also a
VM).
I'm using GlusterFS 3.4, btw.

The servers manage a Gluster volume created with:

gluster volume create testvol replica 2 transport tcp gs1:/export/vda1/brick gs2:/export/vda1/brick
gluster volume start testvol
gluster volume set testvol network.ping-timeout 5

The client then mounts the volume with:

mount -t glusterfs gs1:/testvol /import/testvol

Everything seems to work well in normal use cases: I can read from and write
to the volume, take servers down and bring them up again, etc.

As a fault scenario, I'm testing fault injection as follows:

1. Continuously write timestamps to a file on the volume from the client.
This is automated in a small test script:
$ cat scripts/test-gfs-client.sh
#!/bin/sh
# Append the current epoch time to a file on the Gluster mount once per
# second, then print the last line and the md5sum of the whole file.

gfs=/import/testvol

while true; do
    date +%s >> "$gfs/timestamp.txt"
    ts=$(tail -n 1 "$gfs/timestamp.txt")
    md5sum=$(md5sum "$gfs/timestamp.txt" | cut -f1 -d" ")
    echo "Timestamp = $ts, md5sum = $md5sum"
    sleep 1
done

As you can see, the client is a fairly simple user of the GlusterFS volume:
a low data rate and a single writer, for example.

2. Disable Ethernet on one of the server VMs (ifconfig eth0 down) to
simulate a broken network.

3. After a short while, bring the failed server back (ifconfig eth0 up).

Steps 2 and 3 are also automated in a test script:

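$ cat scripts/fault-injection.sh
#!/bin/sh

# Fault injection script tailored for two GlusterFS nodes named gs1 and gs2.
# Creating /tmp/nofault pauses the injection.

# hostname(1) and '=' keep this portable to a non-bash /bin/sh such as dash
# ($HOSTNAME and '==' are bash features).
if [ "$(hostname)" = "gs1" ]; then
    peer="gs2"
else
    peer="gs1"
fi

# Take the interface down for 10 seconds to simulate a broken network.
inject_eth_fault() {
    echo "network down..."
    ifconfig eth0 down
    sleep 10
    ifconfig eth0 up
    echo "... and network up again."
}

# Restart the Gluster management daemon after the fault.
recover() {
    echo "recovering from fault..."
    service glusterd restart
}

while true; do
    sleep 60
    if [ ! -f /tmp/nofault ]; then
        # Only inject a new fault while the peer is reachable.
        if ping -c 1 "$peer" > /dev/null 2>&1; then
            inject_eth_fault
            recover
        fi
    fi
done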
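The timestamp file written by the client script also records how long writes
stalled during each fault (for example the roughly 8-second hang in A below).
A minimal sketch, assuming the file holds one epoch timestamp per line as
written by test-gfs-client.sh; max_gap and list_gaps are hypothetical helper
names, not part of GlusterFS or of the scripts above:

```shell
#!/bin/sh
# Hypothetical helper: print the largest difference, in seconds, between
# consecutive timestamps in a file written by test-gfs-client.sh.
max_gap() {
    awk 'NR > 1 { gap = $1 - prev; if (gap > max) max = gap }
         { prev = $1 }
         END { print max + 0 }' "$1"
}

# Hypothetical helper: list every gap larger than a threshold (seconds),
# together with the last timestamp written before the stall.
list_gaps() {
    awk -v limit="$2" 'NR > 1 && $1 - prev > limit {
                           print prev ": " ($1 - prev) " s"
                       }
                       { prev = $1 }' "$1"
}
```

For example, max_gap /import/testvol/timestamp.txt or
list_gaps /import/testvol/timestamp.txt 2. Since the client loop already
sleeps about 1 second per iteration, a reported gap of N seconds corresponds
to roughly N - 1 seconds of blocked writes.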

I then see that:

A. The first time goes well: one server leaves the cluster and the client
hangs for about 8 seconds before it can write to the volume again.

B. When the failed server comes back, I can verify from both servers that
they see each other: "gluster peer status" shows that each believes the
other is in the connected state.

C. When the failed server comes back, it does not automatically resume
taking an active part in the volume, such as syncing it (the timestamp file
in its local brick storage isn't updated).

D. If I restart the glusterd service (service glusterd restart), the failed
node seems to return to its previous state. Not always, though: the chance
is higher if there is a long time between fault injections (long = 60
seconds or so, with a forced faulty state of 10 seconds).
With a period of a few minutes, the cluster served the client correctly for
at least 8 hours.
With a shorter period, it easily fails within 10-15 minutes.

E. Sooner or later, I end up in a state where both servers seem to be up and
see their peer (gluster peer status), but neither of them serves the volume
to the client.
I've tried to "heal" the volume in different ways, but it doesn't help.
Sometimes one server's copy of the timestamp file is simply ahead of the
other (the simpler case), but sometimes both copies have data appended at
the end that the other one lacks.
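The two situations in E can be told apart mechanically, because the client
only ever appends lines: if one copy is a line prefix of the other, that copy
is simply behind; otherwise both received writes the other lacks. A minimal
sketch, assuming both copies of timestamp.txt have been fetched to one
machine; compare_copies is a hypothetical helper, not a GlusterFS tool:

```shell
#!/bin/sh
# Hypothetical helper: classify how two copies of an append-only file
# relate. Prints "identical", "first ahead", "second ahead" or "diverged".
compare_copies() {
    a=$1
    b=$2
    # Arithmetic expansion strips any padding some wc implementations add.
    la=$(( $(wc -l < "$a") ))
    lb=$(( $(wc -l < "$b") ))
    if cmp -s "$a" "$b"; then
        echo "identical"
    elif [ "$la" -le "$lb" ] && head -n "$la" "$b" | cmp -s - "$a"; then
        # The first copy is a prefix of the second one.
        echo "second ahead"
    elif [ "$lb" -le "$la" ] && head -n "$lb" "$a" | cmp -s - "$b"; then
        # The second copy is a prefix of the first one.
        echo "first ahead"
    else
        # Neither is a prefix: both copies got lines the other lacks.
        echo "diverged"
    fi
}
```

For example, on gs1 after copying gs2's brick file over (paths assumed from
the volume definition above):
scp gs2:/export/vda1/brick/timestamp.txt /tmp/gs2-timestamp.txt
compare_copies /export/vda1/brick/timestamp.txt /tmp/gs2-timestamp.txt
This only looks at file contents, not at the replication metadata GlusterFS
keeps in extended attributes on the bricks, so it can describe the state but
not repair it.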

Now to my questions:

* Is it a design decision by the GlusterFS team that one shouldn't rely
solely on the GlusterFS daemons being able to recover from a faulty state,
i.e. that cluster manager services (Heartbeat, for example) need to be
involved? That would make observation C understandable, and one could then
use Heartbeat or similar packages to start/stop services.

* What would then be the recommended procedure for recovering a faulty
GlusterFS node, so that D and E do not happen?

* What is the expected failover time (depending on configuration, of course,
but say with a given ping timeout), and what is the expected recovery time
(with a similar dependency on configuration)?

* What does the GlusterFS team test, and how, to make sure that failover,
recovery/healing, etc. work?

Opinions on whether the test case itself is flawed are, of course, also very
welcome.

Best regards,
Per