[Gluster-users] Testing failover and recovery

Mon Dec 9 13:12:22 UTC 2013

Hello,

Interesting: there seem to be several of us with recovery issues,
but few or no replies... ;-)

I did some more testing over the weekend. Same initial workload (two
glusterfs servers, one client that continuously updates a file with
timestamps) and then two easy test cases:

1. One of the glusterfs servers is constantly rebooting (just an init script
that sleeps for 60 seconds before issuing "reboot").

2. Similar to 1, but instead of rebooting itself, each server reboots the
other glusterfs server, so that a server comes up, waits for a bit and
then reboots the other server.
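
As a rough sketch of what the init script in test case 1 does: the helper name reboot_after_delay and its arguments are made up for this illustration, and the real script is just a sleep followed by "reboot". For test case 2 the command would be whatever reboots the other server remotely, which is not shown here.

```shell
#!/bin/sh
# Illustrative sketch only: reboot_after_delay is a hypothetical helper name.
# Wait for the given number of seconds, then run the given command.
reboot_after_delay() {
    delay="$1"
    shift
    sleep "$delay"
    "$@"
}

# Test case 1 would then call, from an init script run at boot:
#   reboot_after_delay 60 reboot
```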

This has run nicely during the whole weekend. The client is running
all the time without issues, and the glusterfs server that comes back
(either always the same one or either of the servers, depending on the
test case above) actively gets back into sync and updates its copy of
the file.
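
One way to check the sync is to copy timestamp.txt from each brick (gs1:/export/vda1/brick and gs2:/export/vda1/brick in the setup quoted below) to one machine and compare the two copies. A minimal sketch, assuming the two copies are already available as local files. The function name compare_copies and its result strings are made up for this example, and "head -c" is assumed to be available (it is on GNU and BSD systems).

```shell
#!/bin/sh
# Illustrative sketch: compare_copies is a hypothetical helper, not a gluster tool.
# Prints one of: in-sync, first-ahead, second-ahead, diverged.
compare_copies() {
    a="$1"
    b="$2"
    # Byte sizes; the arithmetic expansion strips any padding from wc output
    sa=$(( $(wc -c < "$a") ))
    sb=$(( $(wc -c < "$b") ))
    if [ "$sa" -le "$sb" ] && head -c "$sa" "$b" | cmp -s - "$a"; then
        # The first copy is a prefix of the second (or identical to it)
        if [ "$sa" -eq "$sb" ]; then echo "in-sync"; else echo "second-ahead"; fi
    elif [ "$sb" -lt "$sa" ] && head -c "$sb" "$a" | cmp -s - "$b"; then
        # The second copy is a strict prefix of the first
        echo "first-ahead"
    else
        # Each copy has data that the other lacks
        echo "diverged"
    fi
}
```

A "diverged" result corresponds to the case where both timestamp files have data at the end that the other one lacks.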

So it seems to me that we need to look deeper into the recovery case (of
course, but it is good to know that the nice and easy use cases work as
well). I'm surprised that recovery from a failover (restoring the
redundancy) isn't getting more attention here. Are we (and others who have
difficulties in this area) running an unusual use case?

BR,
Per

On Wed, Dec 4, 2013 at 12:17 PM, Per Hallsmark <per at hallsmark.se> wrote:

> Hello,
>
> I've found GlusterFS to be an interesting project. I don't have much
> experience with it (although I do with similar use cases using DRBD+NFS
> setups), so I set up a test case to try out failover and recovery.
>
> For this I have a setup with two glusterfs servers (each is a VM) and one
> client (also a VM).
> I'm using GlusterFS 3.4 btw.
>
> The servers manage a gluster volume created as:
>
> gluster volume create testvol replica 2 transport tcp gs1:/export/vda1/brick \
>     gs2:/export/vda1/brick
> gluster volume start testvol
> gluster volume set testvol network.ping-timeout 5
>
> Then the client mounts this volume as:
> mount -t glusterfs gs1:/testvol /import/testvol
>
> Everything seems to work well in normal use cases: I can read from and
> write to the volume, take servers down and up again, etc.
>
> As a fault scenario, I'm testing a fault injection like this:
>
> 1. Continuously writing timestamps to a file on the volume from the client.
> It is automated in a small test script like:
> :~/glusterfs-test$ cat scripts/test-gfs-client.sh
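> #!/bin/sh
>
> # Mount point of the gluster volume on the client
> gfs=/import/testvol
>
> while true; do
>     # Append the current epoch time, then read it back and checksum the file
>     date +%s >> "$gfs/timestamp.txt"
>     ts=$(tail -n 1 "$gfs/timestamp.txt")
>     md5sum=$(md5sum "$gfs/timestamp.txt" | cut -f1 -d" ")
>     echo "Timestamp = $ts, md5sum = $md5sum"
>     sleep 1
> done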
> :~/glusterfs-test$
>
> As can be seen, the client is quite a simple user of the glusterfs volume:
> low data rate and a single user, for example.
>
>
> 2. Disabling ethernet in one of the server VMs (ifconfig eth0 down) to
> simulate a broken network.
>
> 3. After a short while, the failed server is brought back up (ifconfig
> eth0 up).
>
> Steps 2 and 3 are also automated in a test script like:
>
> :~/glusterfs-test$ cat scripts/fault-injection.sh
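> #!/bin/sh
>
> # Fault injection script tailored for two glusterfs nodes named gs1 and gs2
>
> # Work out which node is the peer. $(hostname) and "=" are used instead of
> # $HOSTNAME and "==" so that this also works when /bin/sh is not bash.
> if [ "$(hostname)" = "gs1" ]; then
>     peer="gs2"
> else
>     peer="gs1"
> fi
>
> # Take the network interface down for 10 seconds
> inject_eth_fault() {
>     echo "network down..."
>     ifconfig eth0 down
>     sleep 10
>     ifconfig eth0 up
>     echo "... and network up again."
> }
>
> # Restart glusterd after the network is back
> recover() {
>     echo "recovering from fault..."
>     service glusterd restart
> }
>
> # Every 60 seconds, inject a fault unless /tmp/nofault exists
> # or the peer is unreachable
> while true; do
>     sleep 60
>     if [ ! -f /tmp/nofault ]; then
>         if ping -c 1 "$peer"; then
>             inject_eth_fault
>             recover
>         fi
>     fi
> done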
> :~/glusterfs-test$
>
>
> I then see that:
>
> A. This goes well the first time: one server leaves the cluster and the
> client hangs for about 8 seconds before being able to write to the volume
> again.
>
> B. When the failed server comes back, I can check from both servers that
> they see each other; "gluster peer status" shows that each believes the
> other is in the connected state.
>
> C. When the failed server comes back, it does not automatically start
> participating again in syncing the volume etc. (the timestamp file in its
> local storage isn't updated).
>
> D. If I restart the glusterd service (service glusterd restart), the
> failed node seems to get back to the state it was in before. Not always,
> though... The chance is higher if there is a long time between fault
> injections (long = 60 sec or so, with a forced faulty state of 10 sec).
> With a period of a few minutes, I could have the cluster servicing the
> client OK for at least 8 hours.
> Shortening the period, I'm easily down to around 10-15 minutes.
>
> E. Sooner or later I enter a state where the two servers seem to be up,
> each seeing its peer (gluster peer status) and such, but neither is
> serving the volume to the client.
> I've tried to "heal" the volume in different ways but it doesn't help.
> Sometimes one server's copy of the timestamp file is simply ahead of the
> other's, which is the simpler case, but sometimes both copies have data
> appended at the end that the other doesn't have.
>
> To the questions:
>
> * From a design point of view, is it the glusterfs team's position that one
> shouldn't rely solely on the glusterfs daemons being able to recover from a
> faulty state, and that cluster manager services (like heartbeat, for
> example) need to be part of the setup? That would make experience C
> understandable, and one could then use heartbeat or similar packages to
> start/stop services.
>
> * What would then be the recommended procedure to recover from a faulty
> glusterfs node (so that experiences D and E do not happen)?
>
> * What is the expected failover timing (of course depending on config, but
> say with a given ping timeout etc.)?
>   And the expected recovery timing (with a similar dependency on config)?
>
> * What does the glusterfs team test, and how, to make sure that failover,
> recovery/healing functionality etc. work?
>
> Any opinion on whether the test case is a bad one is of course also very
> welcome.
>
> Best regards,
> Per
>
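
Regarding observation A in the quoted mail: since the client appends one epoch timestamp per loop iteration, the length of a client hang can be read back from timestamp.txt itself. A minimal sketch, assuming one timestamp per line. The helper name report_gaps and its default threshold of 2 seconds are made up for this example.

```shell
#!/bin/sh
# Illustrative sketch: report_gaps is a hypothetical helper name.
# Prints every jump between consecutive timestamps larger than the threshold.
report_gaps() {
    file="$1"
    threshold="${2:-2}"
    awk -v t="$threshold" '
        NR > 1 && $1 - prev > t {
            printf "gap of %d s after %d\n", $1 - prev, prev
        }
        { prev = $1 }
    ' "$file"
}

# Example: report_gaps /import/testvol/timestamp.txt 2
```

Running this against the file after a fault injection run would list each period during which the client could not write, which gives a rough figure for the failover timing.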