[Gluster-users] [Gluster-devel] Testing replication and HA

David Bierce david.bierce at appcore.com
Tue Feb 11 14:43:41 UTC 2014


Isn’t that working as intended?  I’ve had the FUSE client fail over just fine in testing, and in production when a memory error caused a kernel panic.

That timeout is tunable, but when a brick in the cluster goes down, writes by the client are suspended until the timeout is reached.  In our environment we have hundreds of VM images running live, so we’ve had to set the timeout down to 2 seconds to avoid the clients’ file systems remounting read-only, or excessive errors in applications inside the VMs, which get grumpy when a write blocks for more than a few hundred milliseconds.

The timeout can be set per volume: http://gluster.org/community/documentation/index.php/Gluster_3.2:_Setting_Volume_Options

network.ping-timeout
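
For example, to drop it to the 2 seconds we use (just a sketch; I'm borrowing the 'puppet' volume name from James's test, so substitute your own volume and a value that fits your workload):

# gluster volume set puppet network.ping-timeout 2

Keep in mind the reconnection cost described below before going too low.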

When the timeout is reached for the failed brick, the client does have to recreate handles for all the files in the volume, which is apparently quite an expensive operation.  In our environment, with only hundreds of files, this has been livable, but if you have 100k files I’d imagine it is quite a wait to get the client’s view of the volume back to a usable state.


On Feb 11, 2014, at 2:49 AM, Sharuzzaman Ahmat Raslan <sharuzzaman at gmail.com> wrote:

> Hi all,
> 
> Is the 42s timeout tunable?
> 
> Should the default be made lower, e.g. 3 seconds?
> 
> Thanks.
> 
> 
> 
> 
> On Tue, Feb 11, 2014 at 3:37 PM, Kaushal M <kshlmster at gmail.com> wrote:
> The 42-second hang is most likely the ping timeout of the client translator.
> 
> What most likely happened was that the brick on annex3 was being used
> for the read when you pulled its plug. When you pulled the plug, the
> connection between the client and annex3 wasn't gracefully terminated,
> so the client translator still saw the connection as alive. Because
> of this the next fop was also sent to annex3, but it timed out since
> annex3 was dead. After the timeout, the connection was marked as
> dead, and the associated client xlator was marked as down. Since AFR
> now knew annex3 was dead, it sent the next fop to annex4, which was
> still alive.
> 
> These kinds of unclean connection terminations are currently handled
> only by the request/ping timeouts. You could set the ping timeout
> value lower to reduce the detection time.
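> 
> If you want to confirm what is in effect, 'gluster volume set help'
> should list network.ping-timeout along with its description and the
> 42 second default, and 'gluster volume info <volname>' shows the
> option once it has been reconfigured. For example (output format may
> differ slightly across versions):
> 
> # gluster volume set help | grep -A 4 ping-timeout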
> 
> ~kaushal
> 
> On Tue, Feb 11, 2014 at 11:57 AM, Krishnan Parthasarathi
> <kparthas at redhat.com> wrote:
> > James,
> >
> > Could you provide the logs of the mount process, where you see the hang for 42s?
> > My initial guess, seeing 42s, is that the client translator's ping timeout
> > is in play.
> >
> > I would encourage you to report a bug and attach relevant logs.
> > If the observed issue turns out to be an acceptable/explicable behavioural
> > quirk of glusterfs, then we could close the bug :-)
> >
> > cheers,
> > Krish
> > ----- Original Message -----
> >> It's been a while since I did some gluster replication testing, so I
> >> spun up a quick cluster *cough, plug* using puppet-gluster+vagrant (of
> >> course) and here are my results.
> >>
> >> * Setup is a 2x2 distributed-replicated cluster
> >> * Hosts are named: annex{1..4}
> >> * Volume name is 'puppet'
> >> * Client vm's mount (fuse) the volume.
> >>
> >> * On the client:
> >>
> >> # cd /mnt/gluster/puppet/
> >> # dd if=/dev/urandom of=random.51200 count=51200
> >> # sha1sum random.51200
> >> # rsync -v --bwlimit=10 --progress random.51200 root at localhost:/tmp
> >>
> >> * This gives me about an hour to mess with the bricks...
> >> * By looking on the hosts directly, I see that the random.51200 file is
> >> on annex3 and annex4...
> >>
> >> * On annex3:
> >> # poweroff
> >> [host shuts down...]
> >>
> >> * On client1:
> >> # time ls
> >> random.51200
> >>
> >> real    0m42.705s
> >> user    0m0.001s
> >> sys     0m0.002s
> >>
> >> [hangs for about 42 seconds, and then returns successfully...]
> >>
> >> * I then powerup annex3, and then pull the plug on annex4. The same sort
> >> of thing happens... It hangs for 42 seconds, but then everything works
> >> as normal. This is of course the cluster timeout value and the answer to
> >> life, the universe, and everything.
> >>
> >> Question: Why doesn't glusterfs automatically flip over to using the
> >> other available host right away? If you agree, I'll report this as a
> >> bug. If there's a way to do this, let me know.
> >>
> >> Apart from the delay, glad that this is of course still HA ;)
> >>
> >> Cheers,
> >> James
> >> @purpleidea (twitter/irc)
> >> https://ttboj.wordpress.com/
> >>
> >>
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> Gluster-devel at nongnu.org
> >> https://lists.nongnu.org/mailman/listinfo/gluster-devel
> >>
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
> 
> 
> 
> -- 
> Sharuzzaman Ahmat Raslan
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users



