[Gluster-users] 2.0.6

Stephan von Krawczynski skraw at ithnet.com
Sun Aug 23 14:23:41 UTC 2009


On Sun, 23 Aug 2009 05:21:58 -0700
Anand Avati <avati at gluster.com> wrote:

> > Let's assume your theory is right. Then I obviously managed to create a
> > scenario where the bail-out decisions for servers are clearly bad. In fact
> > they are so bad that the whole service breaks down. That is of course a no-go
> > for an application whose sole (or primary) purpose is to keep your file service
> > up, no matter which servers in the backend crash or vanish. As long as there is
> > a theoretical way of performing the needed file service, it should be up and
> > running. Even if your theory were right, glusterfs still does not handle
> > the situation as well as it could (read: as a user would expect).
> 
> OK, first of all, this is now a very different issue we are trying to
> address. Correct me if I'm wrong, but the problem definition now is
> 'when glusterfs is presented with a backend filesystem which hangs FS
> calls, the replicate module does not provide FS service' (and no
> longer, as you previously described it, 'glusterfs has not been able to
> run bonnie even for an hour on all 2.0.x releases because of lack of
> attention towards stability and concentration on featurism').

My simple opinion on this is: I am able to multitask. I forked off a related
question because we just stumbled over it. The original question did not die;
it just awaits proof that I am able to hang a local fs with glusterfsd.

> Please
> do understand that this is not at all, as you described, a (regular) crash
> of the filesystem that can be reliably reproduced within an hour with the
> dev team not caring to fix it. The problem does not deserve such an attack.

As said, this is only a related question, based solely on your side of the
story.
 
> The reason why this issue persists is that there is no reliable way to
> even detect this hang programmatically. The right way to "deal" with it
> would be to translate the "disk hang" into a "subvolume down", but that
> is hard, because -- Has the server stopped responding? No, ping-pong
> replies are coming just fine. Has the backend disk started returning IO
> errors? No, the FS calls just hang exactly like a deadlock. Detecting
> hardware failures can be done with reasonable reliability. Detecting
> buggy software lockups and such deadlocks is a very hard (theoretical)
> problem.

Maybe it gets a lot easier if you delegate the answer to the admin.
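
Roughly what I have in mind, as a sketch and explicitly not glusterfs code:
the admin supplies the timeout, a probe thread issues a plain stat() against
the backend export, and if it does not return in time the subvolume is treated
as down and the hung call is simply abandoned. The path, timeout and function
names below are made up for illustration:

    /*
     * Sketch only: probe a backend export with an admin-chosen timeout.
     * Build with: gcc -Wall -pthread probe.c -o probe
     */
    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <time.h>

    struct probe {
        const char      *path;
        int              done;
        pthread_mutex_t  mutex;
        pthread_cond_t   cond;
    };

    static void *do_stat(void *arg)
    {
        struct probe *p = arg;
        struct stat   st;

        stat(p->path, &st);   /* may block forever if the backend fs hangs */

        pthread_mutex_lock(&p->mutex);
        p->done = 1;
        pthread_cond_signal(&p->cond);
        pthread_mutex_unlock(&p->mutex);
        return NULL;
    }

    /* 0 = backend answered in time, -1 = no answer, treat subvolume as down */
    static int probe_backend(const char *path, int timeout_sec)
    {
        struct probe    *p = calloc(1, sizeof (*p));
        pthread_t        thread;
        struct timespec  deadline;
        int              rc = 0;

        p->path = path;
        pthread_mutex_init(&p->mutex, NULL);
        pthread_cond_init(&p->cond, NULL);

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += timeout_sec;

        pthread_create(&thread, NULL, do_stat, p);

        pthread_mutex_lock(&p->mutex);
        while (!p->done && rc != ETIMEDOUT)
            rc = pthread_cond_timedwait(&p->cond, &p->mutex, &deadline);
        pthread_mutex_unlock(&p->mutex);

        if (!p->done) {
            /* stat() is stuck in the kernel and cannot be cancelled safely;
               abandon the thread and leak p on purpose */
            pthread_detach(thread);
            return -1;
        }

        pthread_join(thread, NULL);
        free(p);
        return 0;
    }

    int main(int argc, char **argv)
    {
        const char *path    = argc > 1 ? argv[1] : "/data/export"; /* made up */
        int         timeout = argc > 2 ? atoi(argv[2]) : 120;      /* admin's pick */

        if (probe_backend(path, timeout) < 0)
            printf("%s: no reply within %d seconds, would mark subvolume down\n",
                   path, timeout);
        else
            printf("%s: responded, subvolume stays up\n", path);
        return 0;
    }

The ugly part -- abandoning a thread that is stuck in the kernel -- is exactly
the price of such a policy, but at least the admin gets to pick the number
instead of living with a hard-wired 1800 seconds.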
 
> The simplest way around it is having timeouts at a higher layer. And it
> is for a reason that the current call timeouts are 1800 seconds - we
> have seen in our QA lab that a truncate() call on a multi-terabyte file
> on ext3 takes more than 20 minutes to complete, and during that
> period all other calls happening on that filesystem also freeze.

That could be taken up as a separate issue with the extX maintainers, but they
probably won't listen...

> Programmatically this situation is no different from the hang you face.
> The 1800-second timeout currently used is based on experimental
> calculations and is not arbitrary. If you can come up with a better way
> of reliably detecting that the backend FS has hung itself (even
> considering the delay situations which I explained above), we are
> willing to use that technique provided it is reasonable enough (do
> consider situations where the backend fs could be an NFS mount which
> might be temporarily blocked for several minutes while its server
> reboots, etc.).

Whatever the exact situation looks like, at least the admin should have an
idea of what is impossible in his setup. If he has (like me) only local disks
of a few TB average size and no ext3, he probably wants to make sure that
glusterfs bails out servers that do not respond within some configurable (but
low) number of minutes.
If you have servers spread all over the planet, or file operations that push
the limits, you would very likely increase the timeouts because you know your
own bad situation but still want to make something out of it.
Is there some config option for tuning these scenarios?
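If there is, I would expect it to sit in the protocol/client section of the
client volume spec, roughly like the following. The option names frame-timeout
and ping-timeout are what I believe the 2.0.x client translator accepts, and
the values are only examples for my local-disk case -- please correct me if I
got the names wrong:

    volume remote1
      type protocol/client
      option transport-type tcp
      option remote-host server1      # example host name
      option remote-subvolume brick
      option ping-timeout 42          # seconds before an unresponsive server is declared dead
      option frame-timeout 600        # seconds a single call may take before it is failed
    end-volume

With something like that tunable per setup, my local-disk case could fail fast
while the planet-wide case keeps its generous timeouts.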
 
> Avati

-- 
Regards,
Stephan



