GlusterFS-2.1.0-git mount.glusterfs bug? (was: Re: [Gluster-devel] Add me too for lockup of system for very simple GlusterFS config...)

Anand Avati avati at gluster.com
Tue Sep 8 10:37:38 UTC 2009


Please consider the master branch unusable for a few more weeks. You
can check out the 'release-2.0' branch to stay up to date with the
latest of the stable branch.

Avati

On Sun, Sep 6, 2009 at 3:12 PM, Mark Mielke <mark at mark.mielke.cc> wrote:
> Ok - I think this turns out to be a GlusterFS 2.1.0-git specific bug, but
> I've included all of the details:
>
> My first use of GlusterFS is with GlusterFS 2.1.0-git on Fedora 11 / x86_64
> with ext4 partitions. / is an ext4 partition. /export/gluster-test is a
> different ext4 partition. I do use NFS + AutoFS, and NFS does export other
> partitions under /export. This is meant to be a very simple client/server
> setup.
>
> For the server, I have this:
>
> # cat /export/gluster-test-server.vol
> volume brick
>  type storage/posix
>  option directory /export/gluster-test/
> end-volume
>
> volume server
>  type protocol/server
>  option transport-type tcp
>  subvolumes brick
>  option auth.addr.brick.allow 47.134.128.*
> end-volume
>
> For the client, I have this:
>
> # cat /export/gluster-test-client.vol
> volume brick1
>  type protocol/client
>  option transport-type tcp
>  option remote-host 47.134.128.21
> #option remote-port 7000
>  option remote-subvolume brick
> end-volume
>
> The server/client IP is 47.134.128.21. There are no firewalls active on this
> machine at this time.
>
> To launch the server, I used the following (GlusterFS 2.1.0-git is
> installed in /opt/glusterfs):
>
> # /opt/glusterfs/sbin/glusterfsd --volfile=/export/gluster-test-server.vol
>
> To launch the client and mount, I used:
>
> # mkdir /tmp/t
> # mount -t glusterfs /export/gluster-test-client.vol /tmp/t
> ... output says FUSE initialized ...
> # cd /tmp/t
>
> From this point, I *appeared* to be able to modify /tmp/t. However, it turns
> out that the mount did not actually complete, and I was just changing the
> plain /tmp/t directory under /tmp, not a GlusterFS volume. I believe my
> command matches the documented usage on gluster.org:
>
> bash# mount -t glusterfs /etc/glusterfs/glusterfs.vol /mnt/glusterfs
>
> I determined that I was able to sudo / su / login / run commands from
> "/bin"; however, when I ran "ls /" or "ls /export", everything would freeze,
> and "/sbin/shutdown -r now" would not complete. "cd /export" would also
> freeze. I suspect that "ls /" does stat("/export") and this is why it
> freezes. During this investigation, I noticed that the ps output was
> strange:
>
> root      2312     1  0 16:10 ?        00:00:00
> /opt/glusterfs/sbin/glusterfsd --volfile=gluster-test-server.vol
> root      2370     1  0 16:11 ?        00:00:00
> /opt/glusterfs/sbin/glusterfs --log-level=NORMAL
> --volfile=/export/gluster-test-client.vol /export
> root      2385  2370  0 16:11 ?        00:00:00 /bin/mount -i -f -t
> fuse.glusterfs -o allow_other,default_permissions,max_read=131072
> /export/gluster-test-client.vol /export
> root      2577  2467  0 16:13 tty4     00:00:00 grep gluster
>
> Why is it trying to mount on /export?
>
> I ran this test multiple times - each time my "mount -t glusterfs" was on
> /tmp/t - I never used /export. Each time, it had the same results -
> /sbin/mount.glusterfs was somehow translating it to /export. I decided to
> trace some of the processes, and found that I could "strace -p" glusterfsd
> and glusterfs, but "strace -p" of 2385 would freeze. Control-C had no effect
> on any of these, including the "strace -p"; however, if I did "kill -9"
> (a regular kill did not work) of the /bin/mount process, then the
> "strace -p" would come back. Finally, after I killed /bin/mount *three*
> times (it came back twice?) and killed glusterfs, the system went back to
> normal with no freezes. During this, I also ran df on /tmp/t, which showed
> that /tmp/t was on /, but df in general (which presumably was trying to
> query /export) would freeze.
>
> To confirm this thinking, I started the glusterfs mount directly:
>
> # /opt/glusterfs/bin/glusterfs --volfile=/export/gluster-test-client.vol
> /tmp/t
>
> And it worked perfectly - no freeze, and /tmp/t was a proper glusterfs
> mount. Changes to /tmp/t were reflected in /export/gluster-test.
>
> I also determined that the complete system freeze and the failure of
> "/sbin/shutdown -r now" were due to NFS failing to shut down properly
> while the system was in the "frozen" state. If I restarted the whole
> scenario, but ensured that both "nfs" and "autofs" were NOT running, then
> although accesses to /export would freeze, I was able to restart the system
> using "/sbin/shutdown -r now" or Ctrl-Alt-Del from the console. So, the real
> freeze was that any access to /export would become stuck in the kernel like
> an NFS hard mount. I did not wait around to see if it would time out after
> 30 minutes as I was running these tests in quick succession and my family
> was waiting for me outside in the car. :-)
>
> Thinking about the above - I think /sbin/mount.glusterfs must have a problem
> for it to use /export even though I passed in /tmp - but this is not the
> only problem. There is also some other failure that causes a system
> lockup instead of a clean failure. One scenario I can think of is that it
> is trying to mount /export while the brick it serves, /export/gluster-test,
> lives under /export, and this might be leading to some sort of loop? I
> think /export was being put in a half-mounted state, where it was
> controlled by FUSE/GlusterFS, but GlusterFS was not able to serve any
> requests on it?
>
> Going back to /sbin/mount.glusterfs, here is a more exact test showing this
> problem:
>
> [root at wcarh033]/# mount -t glusterfs /export/gluster-test-client.vol /tmp
> [root at wcarh033]/# ps -ef | grep gluster
> root      3221     1  0 17:54 ?        00:00:00
> /opt/glusterfs/sbin/glusterfs --log-level=NORMAL
> --volfile=/export/gluster-test-client.vol /export
> root      3232  3221  0 17:54 ?        00:00:00 /bin/mount -i -f -t
> fuse.glusterfs -o allow_other,default_permissions,max_read=131072
> /export/gluster-test-client.vol /export
> root      3238  3151  0 17:54 pts/0    00:00:00 grep gluster
>
> If I try to recover from this, I can recover from the freeze, but not from
> the whole situation:
>
> [root at wcarh033]/# kill -9 3221
> [root at wcarh033]/# ps -ef | grep gluster
> root      3232     1  0 17:54 ?        00:00:00 /bin/mount -i -f -t
> fuse.glusterfs -o allow_other,default_permissions,max_read=131072
> /export/gluster-test-client.vol /export
> root      3243  3151  0 17:56 pts/0    00:00:00 grep gluster
> [root at wcarh033]/# kill -9 3232
> [root at wcarh033]/# ps -ef | grep gluster
> root      3245  3151  0 17:56 pts/0    00:00:00 grep gluster
> [root at wcarh033]/# ls /export
> ls: cannot access /export: Transport endpoint is not connected
>
> I rebooted the machine to clean that up, at least for now.
>
> Where is /export coming from? It's on the command line - I wonder if the
> command line parsing is broken?
>
> In /sbin/mount.glusterfs, I see these lines which do not appear in GlusterFS
> 2.0.6:
>
>     mount_provided=$(echo "$@" | cut -f2 -d'/');
>
>     [ -n "$mount_provided" ] && {
>         mount_point="/$mount_provided";
>     }
>
>     [ -z "$mount_point" ] && {
>         usage;
>         exit 0;
>     }
>
>
> Before, it used to say:
>
>     mount_point="$2";
>
> If I switch the code back to what it used to be, my original test works
> fine. No freeze. Whoohoo!
>
> Please fix in GIT. Thanks.
>
> Cheers,
> mark
>
> --
> Mark Mielke <mark at mielke.cc>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
>
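For reference, the effect of the new mount-point parsing can be reproduced in isolation. This is a minimal, hypothetical sketch: the function names buggy_mount_point and fixed_mount_point are illustrative only and do not exist in mount.glusterfs, and it assumes mount(8) runs the helper as "mount.glusterfs SPEC DIR [-o OPTIONS]", so the volfile is $1 and the mount point is $2. Because echo joins all arguments into one string and cut -f2 -d'/' returns the text between the first and second slash, the first path component of the volfile is picked up instead of the mount point.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mount.glusterfs) that isolate the
# mount-point parsing. Assumes mount(8) runs the helper as:
#   mount.glusterfs SPEC DIR [-o OPTIONS]
# so $1 is the volfile and $2 is the mount point.

# 2.1.0-git logic quoted above: join every argument into one string, then
# keep the text between the first and second '/'. That is the first path
# component of the volfile, not the mount point.
buggy_mount_point() {
    mount_provided=$(echo "$@" | cut -f2 -d'/')
    [ -n "$mount_provided" ] && echo "/$mount_provided"
}

# 2.0.6 logic quoted above: the mount point is simply the second argument.
fixed_mount_point() {
    echo "$2"
}

buggy_mount_point /export/gluster-test-client.vol /tmp/t        # prints /export
fixed_mount_point /export/gluster-test-client.vol /tmp/t        # prints /tmp/t
buggy_mount_point /etc/glusterfs/glusterfs.vol /mnt/glusterfs   # prints /etc
```

With the documented gluster.org example, the same logic would yield /etc rather than /mnt/glusterfs, so the problem does not depend on keeping the volfile under /export.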




