[Gluster-users] Replication not working on server hang

Anand Avati anand.avati at gmail.com
Sun Aug 30 00:51:22 UTC 2009


Jeff,
  We are working on bug 144, and we think one of the changes we plan to
bring in 2.0.7 will fix this problem. The discussion in this thread is
about situations where the backend filesystem on the server (the machine
hosting the storage/posix volume) hangs, as verified by kernel console
logs, and that in turn hangs the mountpoint. Your symptoms, the client
side hanging, are similar, and we acknowledge that yours is definitely a
glusterfs bug. This thread, however, is about the case where the
server-side kernel misbehaves and causes glusterfs to hang. The two
underlying problems are quite different.

Avati

On Sat, Aug 29, 2009 at 10:40 AM, Jeff Evans <jeffe at tricab.com> wrote:
> Hi All,
>
> I'm afraid that I have some more fuel to add to the glusterfs hanging
> "fire".
>
> Way back when I was experimenting with 1.3, I began experiencing hangs.
>
> Then we added the read-ahead xlator to the server, and the hangs
> miraculously stopped.
> That may well be a coincidence, I don't know, but we never hung while
> read-ahead was loaded.
>
> Then came version 2.0 and we hit a bug:
>
> http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=144
>
> So, we had to take out read-ahead and now we have the hangs again! Doh
> & Double Doh!
>
> This has forced me to take glusterfs out of production, and management
> is now questioning my decision to utilize it at all (a subscription
> won't be purchased anytime soon).
>
> Some points to note:
>
> I'm using ext3, the rest of my set-up is detailed in the above
> bugzilla report.
>
> My hangs have often been triggered by a grep -R on the GlusterFS
> mount (yes, just reading!).
>
> None of my hangs has ever produced a single log entry.
>
> When hung, the affected server cannot be logged into; I just get the
> first line, 'Last login:...'
>
> This, and the behaviour of other services I run, seems to indicate
> that existing processes that already have FDs open on files NOT on
> glusterfs can continue to run, but no new FDs can be opened at all,
> system-wide.
>
> To date there has been a lot of talk about the underlying FS being an
> issue in these cases.
>
> I seriously doubt it, and certainly not in the case of ext3.
>
> I agree that the server process shouldn't be able to hang a stable
> system, but what about the client?
>
> Could this be the result of a GlusterFS/FUSE/kernel interaction?
>
> Whatever the cause, it is one very large show stopper that we MUST
> rectify.
>
> We may well be dealing with several parallel issues.
> Finding common factors in our glusterfs instances should help us
> narrow down the search.
>
> Regards, Jeff.
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>


