[Gluster-users] The continuing story ...

Fri Sep 18 14:31:06 UTC 2009

On 09/18/2009 04:14 AM, Anand Avati wrote:
>>
>> Is this during runtime or during mount time? For me, it can happen at mount
>> time when I reboot all servers at the same time. Once I kill -9 the 'mount'
>> process, umount the mount point, and then remount, the freeze clears. Still,
>> having to do this in the first place is bothersome.
>>      
> The "hang" you see at mount time when all servers are down is actually
> the delay for a tcp connect() to succeed. During the initialization
> connect() is given a full opportunity before transport states are set,
> so that operations on the mountpoint immediately following a mount do
> not fail unnecessarily. So your mount "hang" is supposed to return
> after 3mins (the default tcp connect time). If the server machine is
> up or if the machines are in the same network (where a "Host not
> reachable" error is received much earlier) then the hang should not be
> observed. Can you confirm that your hang is actually the delay matched
> in this description? We are figuring out the best way to cut this
> delay.
>
>    
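
One way to check whether a hang matches the connect() delay described
above is to time a single TCP connect to each server with an explicit
timeout. This is only a minimal sketch: probe_connect is a hypothetical
helper name, and the host and port passed to it are placeholders for
whatever the volume file actually lists.

```python
import socket
import time


def probe_connect(host, port, timeout=10.0):
    """Try one TCP connect and return (outcome, seconds_elapsed).

    Hypothetical helper for illustration; not part of glusterfs.
    """
    start = time.monotonic()
    try:
        # The timeout applies to each connect() attempt made here.
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except socket.timeout:
        # No answer at all within the timeout (e.g. host is down).
        outcome = "timeout"
    except OSError as exc:
        # Refused, unreachable, name lookup failure, and so on.
        outcome = "error: {}".format(exc.strerror or exc)
    return outcome, time.monotonic() - start
```

If the probe fails quickly for every server while the mount still
hangs for hours, the hang is probably not just the connect() delay.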

For me, it does not clear after 3 minutes or 3 hours. I restarted the
machines at midnight, and the first time I tried again was around 1pm
the next day (13 hours later). The symptom is easy to recognize:
/bin/mount remains in the process tree. I can't attach strace -p to the
/bin/mount process since it is frozen. The glusterfsd process is not
frozen, and the glusterfs process seems to be waiting for /bin/mount to
complete. The only way to unfreeze the mount seems to be to kill -9
/bin/mount (a regular kill does not work), at which point the mount
point goes into the disconnected state, and it can be recovered with
unmount / remount. I tried to track down the problem before, but became
confused, because glusterfs seems to do its own FUSE mount management
rather than using the standard (for Linux, anyway?) FUSE user space
libraries. If my memory is correct, the process seems to be: I run
mount, which runs /sbin/mount.glusterfs, which runs glusterfs, which
runs /bin/mount with the full options?
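
Since strace cannot attach, one thing that can still be read is the
scheduler state of the stuck process from /proc/<pid>/stat, where the
field after the parenthesized command name is a one-letter state (for
example S for sleeping, D for uninterruptible sleep). A minimal sketch,
with hypothetical helper names, assuming a Linux /proc:

```python
def parse_stat_state(stat_text):
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' rather than on whitespace.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid):
    """Read and parse /proc/<pid>/stat for a live process (Linux only)."""
    with open("/proc/{}/stat".format(pid)) as f:
        return parse_stat_state(f.read())
```

Calling process_state() on the pids of /bin/mount and glusterfs would
show whether either is sitting in state D, which would be consistent
with (but does not prove) a process blocked inside the kernel.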

This is where I discovered the other issue, where 'mount
/gluster/mountpoint' can return before the mount point is completely set
up. That introduces a race where a user can access the mount point and
see an error or an empty directory before seeing the actual contents. I
don't know if these are related or separate issues.
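
A possible stopgap for the race is to poll /proc/mounts until the mount
point is listed before letting anything use it. This is a sketch only:
is_mounted and wait_for_mount are hypothetical names, and I have not
verified at which moment the glusterfs entry appears in /proc/mounts
relative to the mount being usable, so it may narrow the race rather
than close it.

```python
import os
import time


def is_mounted(mountpoint, mounts_text):
    """Return True if mountpoint is the second field of any mounts line."""
    target = os.path.normpath(mountpoint)
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mountpoint fstype options dump pass
        if len(fields) >= 2 and os.path.normpath(fields[1]) == target:
            return True
    return False


def wait_for_mount(mountpoint, timeout=30.0, interval=0.5,
                   mounts_file="/proc/mounts"):
    """Poll mounts_file until mountpoint is listed or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        with open(mounts_file) as f:
            if is_mounted(mountpoint, f.read()):
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Note that paths containing spaces are escaped in /proc/mounts (a space
is written as \040), which this sketch does not handle.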

Cheers,
mark

-- 
Mark Mielke <mark at mielke.cc>