[Gluster-users] Replication not working on server hang

David Saez Padros david at ols.es
Sun Aug 30 08:26:39 UTC 2009


> What we have here (kernel lockups and glusterfs on the same machine)
> might not be a co-incidence.

but it could be

> There might well be a correlation -- but
> by nature of the problem it is not right to treat this as a
> cause-effect relation with glusterfs being the cause.

i think it's also not right to simply discard glusterfs being the cause

> It is just
> not right to blame _any_ userspace application for any kind of kernel
> lockups or hangs. 

well, i'm not saying that this is glusterfs' fault, what i'm saying
is that it is very likely that glusterfs is at least triggering this

> So any hang or lockup in the kernel can only be
> caused by a bug in itself, which could possibly be triggered by a
> specific user application.

maybe, but don't you feel that this needs to be investigated in order
to know what is really happening ?

> What we will be fixing is failing over to other machines when the
> backend FS hangs. The reason why this was not a priority (so far
> atleast) is because a kernel is a trusted piece of software in the
> system, and when you are having a kernel which has a bug in the fs,
> you should just upgrade to a newer kernel.

yes, but right now there is no evidence that this is a kernel bug.
From a user's point of view, if this did not happen when using nfs and
does happen when using glusterfs, the most obvious solution is to switch
back to nfs (like you, we usually trust kernel stability over
application stability) and not do any kernel upgrade unless there is
evidence that this is a kernel bug (a kernel upgrade could mean
having to upgrade many other pieces of software that were working ok
and that would need to be tested again).

> What we promise to fix is a way to (as best
> as possible) somehow translate a backend FS hang into a "subvolume
> down" status and consider that subvolume to be down. After that, you
> will _still_ continue to face kernel hangs and lockups and just
> glusterfs will stop hanging. Your machines would still remain locked
> up.

that's great !
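
just to illustrate the idea of turning a backend FS hang into a
"subvolume down" status, here is a minimal sketch. it is only an
illustration under my own assumptions, not how glusterfs does or will
do it: the function name backend_responds, the timeout value and the
use of a plain stat() as the probe are all hypothetical.

```python
import os
import threading


def backend_responds(path, timeout=5.0, probe=os.stat):
    """Return True if probe(path) finishes within `timeout` seconds.

    Hypothetical helper: a caller could treat False as "subvolume down".
    """
    done = threading.Event()
    result = {"ok": False}

    def worker():
        try:
            probe(path)          # may block forever if the backend FS hangs
            result["ok"] = True
        except OSError:
            result["ok"] = False  # backend answered, but with an error
        finally:
            done.set()

    # Daemon thread: a probe stuck in the kernel cannot be cancelled,
    # so it is simply abandoned and must not block interpreter exit.
    threading.Thread(target=worker, daemon=True).start()

    if not done.wait(timeout):
        return False  # no answer in time: treat the backend as hung
    return result["ok"]
```

note that this only keeps the caller from hanging: the stuck probe
thread (and the machine with the locked-up kernel) stays stuck, which
matches what you describe above.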

Thanx & best regards ...

    David Saez Padros                http://www.ols.es
    On-Line Services 2000 S.L.       telf    +34 902 50 29 75

