Stephan von Krawczynski
skraw at ithnet.com
Sat Aug 22 14:20:41 UTC 2009
On Sat, 22 Aug 2009 05:42:45 -0500 (CDT)
Anand Avati <avati at gluster.com> wrote:
> > It is perfectly clear to us that glusterfs(d) is the reason for the box
> > becoming unstable and producing a hang even on a local fs (you cannot df
> > on the exported partition, for example).
> > We will therefore continue with debugging as told before.
> glusterfsd is just another application as far as the backend export
> filesystem is concerned. If your backend export fs is hung and refuses to
> respond to df, I would refuse to accept that glusterfsd is guilty. If your
> backend filesystem ended in that state, it is a bug in the backend fs.
> glusterfsd is just another application which issues system calls and does
> not do anything funky at all. If an application issuing system calls is
> causing the export fs to stop responding to df, it is not the fault of the
> application. If you can get dmesg output at the time of such a hang, that
> might have some hard evidence.
Ok, please stay serious. As described in my original email from the 19th,
effectively _all_ four physical boxes have non-moving (I refuse to say
"hanging" here) gluster processes. The mount points on the clients hang (which
made the bonnie runs stop), the primary server looks pretty much ok but
obviously serves nothing to the clients, and the secondary has a hanging local
fs for whatever reason.
Now can you please elaborate how you come to the conclusion that this complete
service lockup derives from one hanging fs on one secondary server of a
replicate setup (which you declare to be the cause, and I the effect, of
locked-up gluster processes)?