[Gluster-users] 2.0.6

David Saez Padros david at ols.es
Sat Aug 22 16:50:56 UTC 2009


Hi

I have experienced similar problems (df hanging, transport endpoint
disconnected, server locked ...) for no apparent reason when copying
a lot of files to the glusterfs file system. I have no idea if this is the
real cause, but since I changed something in the configuration the problems
have stopped (for now). I had two servers exporting two unified bricks that
were replicated on the clients (two different replicated bricks). On each
client I had two vol files, one for each of the gluster bricks, but both
files used the same names for the bricks (the vol files were identical
except for the brick values of option remote-subvolume). Once I changed the
names so that no brick in one vol file had the same name as a brick in the
other vol file, the problems disappeared. I am not sure whether this was the
cause, but so far the problem has not appeared again.
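
In case somebody wants to check their own clients for the same situation,
here is a rough sketch (not something I ran at the time) that lists the
volume names declared in both of two client vol files. It only assumes the
usual "volume <name> ... end-volume" layout. The host names, volume names,
remote-subvolume values and the client_volfile helper are invented for
illustration, and whether shared names really cause the hangs is still
unconfirmed.

```python
import re

# A vol file declares each translator as "volume <name> ... end-volume".
# This pattern matches only the opening "volume <name>" line.
VOLUME_LINE = re.compile(r"^[ \t]*volume[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def volume_names(volfile_text):
    """Return the set of volume names declared in one vol file."""
    return set(VOLUME_LINE.findall(volfile_text))


def shared_names(first_text, second_text):
    """Return the sorted volume names that both vol files declare."""
    return sorted(volume_names(first_text) & volume_names(second_text))


def client_volfile(prefix, subvolume):
    """Build a hypothetical client vol file (all names are made up)."""
    return f"""
volume {prefix}remote1
  type protocol/client
  option transport-type tcp
  option remote-host server1
  option remote-subvolume {subvolume}
end-volume

volume {prefix}remote2
  type protocol/client
  option transport-type tcp
  option remote-host server2
  option remote-subvolume {subvolume}
end-volume

volume {prefix}replicate
  type cluster/replicate
  subvolumes {prefix}remote1 {prefix}remote2
end-volume
"""


# Before: both files use the same names and differ only in remote-subvolume.
VOLFILE_A = client_volfile("", "brick-a")
VOLFILE_B = client_volfile("", "brick-b")
# After: every volume in the second file gets a distinct name.
VOLFILE_B_RENAMED = client_volfile("b-", "brick-b")

if __name__ == "__main__":
    print("shared before rename:", shared_names(VOLFILE_A, VOLFILE_B))
    print("shared after rename: ", shared_names(VOLFILE_A, VOLFILE_B_RENAMED))
```

For real files, read each one with open(path).read() and pass the two
strings to shared_names; an empty list means the names no longer overlap.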


> On Sat, 22 Aug 2009 05:42:45 -0500 (CDT)
> Anand Avati <avati at gluster.com> wrote:
> 
>>> It is perfectly clear to us that glusterfs(d) is the reason for the box
>>> becoming unstable and producing a hang even on a local fs (you cannot
>>> df on the exported partition for example).
>>> We will therefore continue with debugging as told before.
>> glusterfsd is just another application as far as the backend export filesystem is concerned. If your backend export fs is hung and refuses to respond to df, I would refuse to accept that glusterfsd is guilty. If your backend filesystem ended in that state, it is a bug in the backend fs. glusterfsd is just another application which issues system calls and does not do anything funky at all. If an application issuing system calls is causing the export fs to stop responding to df, it is not the fault of the application. If you can get dmesg output at the time of such a hang, that might have some hard evidence.
>>
>> Avati
> 
> Ok, please stay serious. As described in my original email from the 19th,
> effectively _all_ four physical boxes have non-moving (I refuse to use
> "hanging" here) gluster processes. The mount points on the clients hang
> (which made bonnies stop), the primary server looks pretty much ok but
> obviously serves nothing to the clients, and the secondary has a hanging
> local fs for whatever reason.
> Now can you please elaborate on how you came to the conclusion that this
> complete service lock-up derives from one hanging fs on one secondary server
> of a replicate setup (which you declare to be the cause, and I the effect,
> of locked-up gluster processes)?
> 

-- 
Regards, and see you soon ...

----------------------------------------------------------------
    David Saez Padros                http://www.ols.es
    On-Line Services 2000 S.L.       telf    +34 902 50 29 75
----------------------------------------------------------------




