[Gluster-users] Why does this setup not survive a node crash?

Whit Blauvelt whit.gluster at transpect.com
Sun Apr 17 15:42:20 UTC 2011


> >> Secondly, I changed the volume parameter "network.ping-timeout" from
> >> its default of 43 to 10 seconds, in order to get faster recovery from a
> >> downed storage node:
> >>
> >>        gluster volume set pfs-rw1 network.ping-timeout 10
> >>
> >> This configuration now survives the loss of either node of the two
> >> storage server mirrors. There is a noticeable delay before commands on
> >> the mount point complete the first time a command is issued after one
> >> of the nodes has gone down - but then they return at the same speed as
> >> when all nodes were present.
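
For reference, the tuning quoted above boils down to the two commands
below. The volume name pfs-rw1 is taken from the quote; the changed
option should then show up under "Options Reconfigured" in the volume
info output:

    # drop the ping timeout from its default down to 10 seconds
    gluster volume set pfs-rw1 network.ping-timeout 10
    # confirm the option took effect on the volume
    gluster volume info pfs-rw1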

Does anyone know what default behavior to expect when two storage server
mirrors are served to NFS clients, rather than to GlusterFS clients as
above? In my case IP failover is implemented with ucarp. But I haven't yet
tested what happens if one mirror is simply unplugged. (Yeah, I should
have, but I needed to put this into production in a hurry as other storage
was failing. Test systems come next.)
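
For context, the ucarp side looks roughly like the following on each
storage server. The interface, addresses, VHID and script paths below are
placeholders rather than my exact configuration:

    # advertise a floating virtual IP that the NFS clients mount from;
    # -s is this host's real address, -a is the shared address
    ucarp -i eth0 -s 192.168.1.11 -v 10 -p somepassword \
          -a 192.168.1.10 \
          --upscript=/etc/ucarp/vip-up.sh \
          --downscript=/etc/ucarp/vip-down.sh -B
    # vip-up.sh / vip-down.sh simply add or remove the floating address,
    # e.g. "ip addr add 192.168.1.10/24 dev eth0" and the matching del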

When one mirror goes down and the client systems are left talking to only
half of the normally mirrored storage, how will the storage behave? When a
file is written by an NFS client, will gluster treat the write as a success
from the client's point of view once it reaches the surviving half of the
mirror, or will there be a delay while gluster keeps trying to write to the
second mirror? If there is a delay, is there some setting like
network.ping-timeout that affects its length and should be tuned?
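
When I do get a test setup going, this should be easy enough to measure:
unplug one mirror, then time a synchronous write through the NFS mount and
see whether the delay tracks network.ping-timeout. Something along these
lines (mount point and file name are placeholders):

    # time a small fsync'd write over NFS while one storage node is down
    time dd if=/dev/zero of=/mnt/nfs/pingtest bs=1M count=10 conv=fsync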

Looking here:
http://europe.gluster.org/community/documentation/index.php/Gluster_3.1:_Setting_Volume_Options
I don't see anything that's obviously related to this behavior.

Thanks,
Whit


