[Gluster-users] The continuing story ...

Zenaan Harkness zen at freedbms.net
Tue Sep 8 14:01:17 UTC 2009

On Tue, Sep 08, 2009 at 05:37:09AM -0700, Anand Avati wrote:
> >> > I doubt that this can be a real solution. My guess is that glusterfsd runs
> >> > into some race condition where it locks itself up completely.
> >> > It is not funny to debug something the like on a production setup. Best would
> >> > be to have debugging output sent from the servers' glusterfsd directly to a
> >> > client to save the logs. I would not count on syslog in this case, if it
> >> > survives one could use a serial console for syslog output though.
> I'm going to iterate through this yet again at the risk of frustrating
> you. glusterfsd (on the server side) is yet another process running
> only system calls. If glusterfsd has a race condition and locks itself
> up, then it locks _only its own process_ up. What you are having is a
> frozen system. There is no way glusterfsd can lock up your system
> through just VFS system calls, even if it wanted to, intentionally. It
> is a pure user space process and has no power to lock up the system.
> The worst glusterfsd can do to your system is deadlock its own process
> resulting in a glusterfs fuse mountpoint hang, or segfault and result
> in a core dump.

It appears OP has no core-dump.

It appears OP has no gluster logs.

It appears OP cannot log in or ssh in to observe results, but instead must
cold boot.

Debugging opportunities are getting slim.

Are there kernel instrumentation utils that OP can use to determine
one or more of:

   -  file descriptors running out
   -  thread deadlock condition occurring
   -  some other kernel level subsystem failure
      -  e.g. networking, fs, scheduler/memory
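
As one possible starting point for the checks listed above, here is a minimal
sketch, assuming a Linux server with /proc mounted and a second machine that
can record UDP datagrams (so the data survives a freeze that kills local
logging and ssh). It reports system-wide file handle usage from
/proc/sys/fs/file-nr, the 1-minute load average, and the number of processes
in uninterruptible sleep (state "D"), which tends to climb when something is
stuck in the kernel. The function names, the address 192.0.2.10, port 5140 and
the 5 second interval are all placeholders, not anything from gluster.

```python
import os
import socket
import time


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: allocated handles, unused handles, maximum."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def parse_state(stat_text):
    """Return the one-letter process state from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so take whatever follows the last ')'.
    return stat_text.rsplit(")", 1)[1].split()[0]


def count_blocked(proc_root="/proc"):
    """Count processes currently in uninterruptible sleep (state 'D')."""
    blocked = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                if parse_state(handle.read()) == "D":
                    blocked += 1
        except OSError:
            continue  # the process exited while we were scanning
    return blocked


def snapshot():
    """Build one status line from the current contents of /proc."""
    with open("/proc/sys/fs/file-nr") as handle:
        allocated, _unused, maximum = parse_file_nr(handle.read())
    with open("/proc/loadavg") as handle:
        load1 = handle.read().split()[0]
    return "fds=%d/%d load1=%s blocked=%d" % (
        allocated, maximum, load1, count_blocked())


def run(host="192.0.2.10", port=5140, interval=5.0):
    """Send a snapshot to the (placeholder) remote host until interrupted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        line = "%s %s" % (time.strftime("%H:%M:%S"), snapshot())
        sock.sendto(line.encode("ascii"), (host, port))
        time.sleep(interval)
```

Calling run() on the server, with anything that records UDP datagrams
listening on the other machine, leaves a trail of the last readings before a
freeze. It only samples counters from user space, so it cannot show which
kernel subsystem failed; it can only hint whether file handles or blocked
tasks were climbing beforehand.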


I have been watching closely. I am a potential gluster user, monitoring
this situation - thanks to all parties for ongoing analysis and
patience in this case. Gluster appears to be a new technology, with
excellent potential.


