GlusterFS-2.1.0-git mount.glusterfs bug? (was: Re: [Gluster-devel] Add me too for lockup of system for very simple GlusterFS config...)

Mark Mielke mark at mark.mielke.cc
Sun Sep 6 22:12:22 UTC 2009


Ok - I think this turns out to be a GlusterFS 2.1.0-git specific bug, 
but I've included all of the details:

My first use of GlusterFS is using GlusterFS 2.1.0-git on Fedora 11 / 
x86_64 with ext4 partitions. / is an ext4 partition. 
/export/gluster-test is a different ext4 partition. I do use NFS + 
AutoFS, and NFS does export other partitions under /export. This is to 
be a very simple client/server.

For the server, I have this:

# cat /export/gluster-test-server.vol
volume brick
  type storage/posix
  option directory /export/gluster-test/
end-volume

volume server
  type protocol/server
  option transport-type tcp
  subvolumes brick
  option auth.addr.brick.allow 47.134.128.*
end-volume

For the client, I have this:

# cat /export/gluster-test-client.vol
volume brick1
  type protocol/client
  option transport-type tcp
  option remote-host 47.134.128.21
#option remote-port 7000
  option remote-subvolume brick
end-volume

The server/client IP is 47.134.128.21. There are no firewalls active on 
this machine at this time.

To launch the server, I used: (GlusterFS 2.1.0-git installed into 
/opt/glusterfs)

# /opt/glusterfs/sbin/glusterfsd --volfile=/export/gluster-test-server.vol

To launch the client and mount, I used:

# mkdir /tmp/t
# mount -t glusterfs /export/gluster-test-client.vol /tmp/t
... output says FUSE initialized ...
# cd /tmp/t

From this point, I *appeared* to be able to modify /tmp/t. However, it 
turns out that the mount did not actually complete, and I was just 
changing /tmp/t under /tmp, not under GlusterFS. I believe this matches 
the documented usage under gluster.org:

bash# mount -t glusterfs /etc/glusterfs/glusterfs.vol /mnt/glusterfs


I determined that I was able to sudo / su / login / run commands from 
"/bin", however when I did "ls /" or "ls /export", everything would 
freeze and "/sbin/shutdown -r now" would not complete. "cd /export" 
would also freeze. I suspect that "ls /" does stat("/export") and this 
is why it freezes. During this investigation period, I noticed the ps 
output was strange:

root      2312     1  0 16:10 ?        00:00:00 
/opt/glusterfs/sbin/glusterfsd --volfile=gluster-test-server.vol
root      2370     1  0 16:11 ?        00:00:00 
/opt/glusterfs/sbin/glusterfs --log-level=NORMAL 
--volfile=/export/gluster-test-client.vol /export
root      2385  2370  0 16:11 ?        00:00:00 /bin/mount -i -f -t 
fuse.glusterfs -o allow_other,default_permissions,max_read=131072 
/export/gluster-test-client.vol /export
root      2577  2467  0 16:13 tty4     00:00:00 grep gluster

Why is it trying to mount on /export?

I ran this test multiple times - each time my "mount -t glusterfs" was 
on /tmp/t - I never used /export. Each time, it had the same results - 
/sbin/mount.glusterfs was somehow translating it to /export. I 
decided to trace some of the processes, and found that I could 
"strace -p" glusterfsd and glusterfs, but "strace -p" of 2385 (the 
/bin/mount process) would freeze. Control-C was frozen for all of these, 
including the "strace -p", however, if I did "kill -9" (regular kill did 
not work) of the /bin/mount process, then the "strace -p" would come 
back. Finally, after I killed /bin/mount *three* times (it came back 
twice?) and killed glusterfs, the system went back to normal with no 
freezes. During this, I also did a df on /tmp/t which showed that /tmp/t 
was /, but df in general (which presumably was trying to query /export) 
would freeze.

To confirm this thinking, I started the glusterfs mount directly:

# /opt/glusterfs/bin/glusterfs --volfile=/export/gluster-test-client.vol /tmp/t

And it worked perfectly - no freeze, and /tmp/t was a proper glusterfs 
mount. Changes to /tmp/t were reflected in /export/gluster-test.
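
A quick way to tell a real mount on /tmp/t from a plain directory is to 
look the path up in /proc/mounts. The following is a minimal sketch, not 
part of GlusterFS; the helper name is_mounted and its optional second 
argument (an alternate mounts file, handy for testing) are made up for 
illustration. It only reads the mount table, so it should not need to 
touch a mount point that is hung.

```shell
#!/bin/sh
# Hypothetical helper, not part of GlusterFS.
# Usage: is_mounted MOUNT_POINT [MOUNTS_FILE]
# Succeeds if MOUNT_POINT appears as the second field (the mount
# point column) of the mounts file, which defaults to /proc/mounts.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

if is_mounted /tmp/t; then
    echo "/tmp/t is a mount point"
else
    echo "/tmp/t is just a directory (or does not exist)"
fi
```

The same check against /export would show whether the stray mount from 
the failing case is still registered.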

I also determined that the complete system freeze and failure to 
"/sbin/shutdown -r now" was due to NFS failing to shut down properly 
while the system was in the "frozen" state. If I restarted the whole 
scenario, but ensured that both "nfs" and "autofs" were NOT running, 
then although accesses to /export would freeze, I was able to restart 
the system using "/sbin/shutdown -r now" or Ctrl-Alt-Del from the 
console. So, the real freeze was that any access to /export would become 
stuck in the kernel like an NFS hard mount. I did not wait around to see 
if it would time out after 30 minutes as I was running these tests in 
quick succession and my family was waiting for me outside in the car. :-)

Thinking about the above - I think /sbin/mount.glusterfs must have a 
problem for it to use /export even though I passed in /tmp - but, this 
is not the only problem. There is also some sort of other failure that 
causes system lockup instead of clean failure. One scenario I can think 
of is that it is trying to mount /export backed by a directory under it, 
/export/gluster-test, and this might be leading to some sort of loop? I 
think /export was being put in a half-mounted state, where it was being 
controlled by FUSE/GlusterFS, but GlusterFS was not able to serve any 
requests on it?

Going back to /sbin/mount.glusterfs, here is a more exact test showing 
this problem:

[root at wcarh033]/# mount -t glusterfs /export/gluster-test-client.vol /tmp
[root at wcarh033]/# ps -ef | grep gluster
root      3221     1  0 17:54 ?        00:00:00 
/opt/glusterfs/sbin/glusterfs --log-level=NORMAL 
--volfile=/export/gluster-test-client.vol /export
root      3232  3221  0 17:54 ?        00:00:00 /bin/mount -i -f -t 
fuse.glusterfs -o allow_other,default_permissions,max_read=131072 
/export/gluster-test-client.vol /export
root      3238  3151  0 17:54 pts/0    00:00:00 grep gluster

If I try to recover from this, I can recover from the freeze, but not 
from the whole situation:

[root at wcarh033]/# kill -9 3221
[root at wcarh033]/# ps -ef | grep gluster
root      3232     1  0 17:54 ?        00:00:00 /bin/mount -i -f -t 
fuse.glusterfs -o allow_other,default_permissions,max_read=131072 
/export/gluster-test-client.vol /export
root      3243  3151  0 17:56 pts/0    00:00:00 grep gluster
[root at wcarh033]/# kill -9 3232
[root at wcarh033]/# ps -ef | grep gluster
root      3245  3151  0 17:56 pts/0    00:00:00 grep gluster
[root at wcarh033]/# ls /export
ls: cannot access /export: Transport endpoint is not connected

I rebooted the machine to clean up from that, at least for now.

Where is /export coming from? It's on the command line - I wonder if the 
command line parsing is broken?

In /sbin/mount.glusterfs, I see these lines which do not appear in 
GlusterFS 2.0.6:

     mount_provided=$(echo "$@" | cut -f2 -d'/');

     [ -n "$mount_provided" ] && {
         mount_point="/$mount_provided";
     }

     [ -z "$mount_point" ] && {
         usage;
         exit 0;
     }


Before, it used to say:

     mount_point="$2";

If I switch the code back to what it used to be, my original test works 
fine. No freeze. Whoohoo!
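
The new lines seem to explain where /export comes from: "$@" expands to 
the whole argument list, and cut -f2 -d'/' returns whatever sits between 
the first and second slash of that string, which is the first path 
component of the volfile rather than the mount point. A minimal sketch of 
both variants follows; the function names new_mount_point and 
old_mount_point are hypothetical wrappers around the expressions quoted 
above, not names from the script itself.

```shell
#!/bin/sh
# Hypothetical wrappers around the two expressions quoted from
# /sbin/mount.glusterfs; only the function names are invented.

# 2.1.0-git behaviour: second '/'-separated field of ALL arguments.
new_mount_point() {
    mount_provided=$(echo "$@" | cut -f2 -d'/')
    echo "/$mount_provided"
}

# Earlier behaviour: the second positional argument.
old_mount_point() {
    echo "$2"
}

new_mount_point /export/gluster-test-client.vol /tmp/t   # prints /export
old_mount_point /export/gluster-test-client.vol /tmp/t   # prints /tmp/t
new_mount_point /export/gluster-test-client.vol /tmp     # prints /export
```

With the volfile stored under /export, every mount point collapses to 
/export, which matches the ps output shown earlier.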

Please fix in GIT. Thanks.

Cheers,
mark

-- 
Mark Mielke <mark at mielke.cc>
