[Gluster-users] The continuing story ...
mark at mark.mielke.cc
Fri Sep 18 15:29:53 UTC 2009
On 09/18/2009 10:31 AM, Mark Mielke wrote:
> For me, it does not clear after 3 mins or 3 hours. I restarted the
> machines at midnight, and the first time I tried again was around 1pm
> the next day (13 hours). I easily recognize the symptoms as the
> /bin/mount remains in the process tree. I can't get a strace -p on the
> /bin/mount process since it is frozen. The glusterfsd process is not
> frozen - the glusterfs process seems to be waiting on /bin/mount to
> complete. The only way to unfreeze the mount seems to be to kill -9
> /bin/mount (regular kill does not work), at which point the mount point
> goes into the disconnected state, and it is recovered using unmount /
> remount. I tried to track down the problem before, but became
> confused, because glusterfs seems to do its own FUSE mount management
> rather than using the standard (for Linux anyways?) FUSE user space
> libraries. If my memory is correct - it seems like the process is: I
> run mount, the mount runs /sbin/mount.glusterfs, which runs glusterfs,
> which runs /bin/mount with the full options?
Oh - to further clarify - exactly the same symptoms (/bin/mount frozen
for 13 hours, requiring kill -9 to clear the condition) happened on all
three machines, so it wasn't a one-off. If I reboot the machines one by
one, there are some 10 second pauses (expected) but everything is ok.
The /bin/mount only locks up if the reboots are within seconds of each
other and the mount occurs when no servers are up. I don't think it is
a TCP/IP wait: all of the machines should be on the network by this
point in the boot process, so shouldn't the connection attempt fail
immediately with Connection Refused?
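As a stopgap for the hang described above, the mount command could be run
under a deadline. The following is only a minimal sketch in Python, not
anything glusterfs provides: run_with_deadline, timeout_s and grace_s are
hypothetical names. It assumes the stuck /bin/mount stays in the process
group of the command that was started; if glusterfs has already detached
into its own session, the frozen /bin/mount would still have to be found
and killed by hand.

```python
import os
import signal
import subprocess


def run_with_deadline(cmd, timeout_s, grace_s=2.0):
    """Run cmd; return its exit code, or None if it had to be killed.

    The command is started in its own session so that the whole process
    group (the command plus any children that stay in the group) can be
    signalled together.
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        pass

    # Try a regular kill (SIGTERM) first, then fall back to kill -9.
    os.killpg(proc.pid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
    return None
```

For example, run_with_deadline(["mount", "/gluster/mountpoint"], 60) would
give the mount a minute before killing it. After a forced kill the mount
point is still in the disconnected state, so the unmount / remount step
is needed as before.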
> This is where I discovered the other issue where the 'mount
> /gluster/mountpoint' can return before the mount point is completely
> set up, introducing a race where a user can access the mount point and
> see an error or an empty directory before seeing the actual contents.
> I don't know if these are related or separate issues.
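One way to work around this race from a boot script is to poll until the
mount point really is a mount point before letting anything use it. This
is a minimal sketch in Python under stated assumptions: wait_for_mount
and its parameters are hypothetical names, and it assumes that
os.path.ismount() returning True is a good enough signal that the
glusterfs mount is usable. A stricter check (for example, looking for a
known file) can be passed in as is_ready.

```python
import os
import time


def wait_for_mount(path, timeout_s=30.0, poll_s=0.2, is_ready=os.path.ismount):
    """Poll until is_ready(path) is true or timeout_s elapses.

    Returns True if the path became ready in time, False otherwise.
    os.path.ismount() returns False both for a plain empty directory
    and when the path cannot be stat'ed, which covers the two symptoms
    seen during the race.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A script would call wait_for_mount("/gluster/mountpoint") right after the
mount command returns and only continue if it comes back True.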
Mark Mielke <mark at mielke.cc>