[Gluster-users] Problems with gluster and autofs
Phil Packer
philp at layer3.co.uk
Tue Feb 9 10:44:49 UTC 2010
Hi,
I have a working glusterfs setup running on Centos 5.3 with
glusterfs-2.0.4 (compiled from the source RPM)
fuse-2.7.4-1
dkms-fuse-2.7.4-1.rf
autofs-5.0.1-0.rc2.102
kernel 2.6.18-128.1.10.el5
and this all works just fine: autofs mounts the file system as you would expect, and this has been in production for some time.
However, if I try to upgrade any of the components, it breaks: the autofs mount hangs rather than completing.
Mounting the file system by hand with an explicit mount command always works correctly.
I've tried several versions of glusterfs later than the above including the latest 3.0.2-1 with exactly the same result.
Additionally, keeping that version of gluster and updating any of the other components also seems to break it, although I've not been able to test all the combinations; certainly the following set doesn't work either:
glusterfs-3.0.2-1
dkms-fuse-2.7.4-1.nodist.rf
fuse-2.7.4-8.el5
autofs-5.0.1-0.rc2.131.el5_4.1
2.6.18-164.11.1.el5
I wonder if someone on the list can help me, as I've seen nothing in bugzilla relating to this.
Relevant information follows (for my test rig only).
Server volfile is:
---snip---
[l3admin at oy-centos-5_3-buildserver glusterfs]$ cat /etc/glusterfs/glusterfsd.vol
## Export the "locks" volume with the contents of the /export/shared directory
volume posix
  type storage/posix
  option directory /export/shared/
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes locks
  option auth.addr.locks.allow *
end-volume
---snip---
Client volfile:
---snip---
[l3admin at oy-centos-5_3-buildserver glusterfs]$ cat /etc/glusterfs/glusterfs.vol
volume oy-centos-5_3-buildserver
  type protocol/client
  option transport-type tcp/client
  option remote-host 127.0.0.1
  option remote-subvolume locks
end-volume
---snip---
/etc/auto.master has the following:
---snip---
/mnt/auto /etc/auto.d/auto.gluster --timeout=60 --ghost
---snip---
and auto.gluster has
---snip---
# Mount the glustered file system
shared -fstype=glusterfs :/etc/glusterfs/glusterfs.vol
---snip---
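For what it's worth, my understanding (an assumption on my part, not checked against the automount source) is that the map entry above should expand to essentially the same mount invocation as the manual mount that works, which is what makes the difference in behaviour so odd. A sketch of the expansion:

```shell
# Sketch: build the mount command the map entry above should expand to.
# "shared" is the map key; ":/etc/glusterfs/glusterfs.vol" is the location
# (the leading colon marks a local, non-NFS source).
key="shared"
fstype="glusterfs"
volfile="/etc/glusterfs/glusterfs.vol"
mountpoint="/mnt/auto/${key}"
cmd="mount -t ${fstype} ${volfile} ${mountpoint}"
echo "${cmd}"
```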
Mounting the gluster file system directly works fine:
---snip---
sudo mount -t glusterfs /etc/glusterfs/glusterfs.vol /mnt/auto/shared/
[l3admin at oy-centos-5_3-buildserver ~]$ df /mnt/auto/shared
Filesystem 1K-blocks Used Available Use% Mounted on
glusterfs#/etc/glusterfs/glusterfs.vol
2031360 543744 1382656 29% /mnt/auto/shared
---snip---
Starting autofs and attempting to access the mounted directory, e.g.
ls /mnt/auto/shared/
causes glusterfs to hang, leaving a process list like this:
[l3admin at oy-centos-5_3-buildserver glusterfs]$ pstree -p | grep glu
|-automount(22819)-+-mount(22830)---mount.glusterfs(22831)---glusterfs(22880)---glusterfs(22881)
|-glusterfs(22882)---{glusterfs}(22883)
|-glusterfsd(22356)---{glusterfsd}(22357)
Running gdb against 22882 during the hang shows:
(gdb) bt
#0 0x00c81402 in __kernel_vsyscall ()
#1 0x00ee7473 in __xstat64 at GLIBC_2.1 () from /lib/libc.so.6
#2 0x00df66ec in stat64 () from /usr/lib/glusterfs/3.0.0/xlator/mount/fuse.so
#3 0x00df419c in init (this_xl=0x88db1f8) at fuse-bridge.c:3368
#4 0x00b2293d in xlator_init (xl=0x88db1f8) at xlator.c:940
#5 0x00b22583 in xlator_init_rec (xl=0x88db1f8) at xlator.c:833
#6 0x00b226e6 in xlator_tree_init (xl=0x88db1f8) at xlator.c:871
#7 0x0804b299 in _xlator_graph_init ()
#8 0x0804b433 in glusterfs_graph_init ()
#9 0x0804d40c in main ()
(gdb) directory /home/l3admin/rpmbuild/BUILD/glusterfs-3.0.0/glusterfsd/src
Source directories searched: /home/l3admin/rpmbuild/BUILD/glusterfs-3.0.0/glusterfsd/src:$cdir:$cwd
(gdb) list *0x00df419c
0xdf419c is in init (fuse-bridge.c:3368).
3363 gf_log ("fuse", GF_LOG_ERROR,
3364 "Mandatory option 'mountpoint' is not specified.");
3365 goto cleanup_exit;
3366 }
3367
3368 if (stat (value_string, &stbuf) != 0) {
3369 if (errno == ENOENT) {
3370 gf_log (this_xl->name, GF_LOG_ERROR,
3371 "%s %s does not exist",
3372 ZR_MOUNTPOINT_OPT, value_string);
(gdb)
(gdb) select-frame 3
(gdb) print value_string
$1 = 0x88da198 "/mnt/auto/shared"
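So the blocked stat() is on the automount point itself. My guess (unverified) is a classic automounter deadlock: glusterfs stats /mnt/auto/shared while autofs is still waiting for the mount to complete, so the stat blocks on the very automount that spawned it. One way to check from a second shell during the hang is a time-bounded stat; the helper below is my own sketch (check_path is my name for it, and GNU coreutils timeout may not be on a stock CentOS 5 box):

```shell
# Sketch: stat a path with a 5-second bound. Returns 0 if the stat completed,
# non-zero if it failed or timed out (a timeout during the hang would fit the
# automount-deadlock theory).
check_path() {
    timeout 5 stat "$1" > /dev/null 2>&1
}
check_path /mnt/auto/shared && echo "stat completed" || echo "stat hung or failed"
```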
By the time I'd got here, the spawned process 22883 had died (is there a watchdog of some sort?), so I repeated the exercise and ran gdb on the watchdog process (which I think was pid 22881), getting this:
(gdb) bt
#0 0x0013d402 in __kernel_vsyscall ()
#1 0x001cf996 in nanosleep () from /lib/libc.so.6
#2 0x0020915c in usleep () from /lib/libc.so.6
#3 0x00d7fa2d in gf_timer_proc (ctx=0x8560008) at timer.c:177
#4 0x0068573b in start_thread () from /lib/libpthread.so.0
#5 0x0020fcfe in clone () from /lib/libc.so.6
So I presume this is waiting for some communication from the process that spawned it, indicating that the mount is complete?
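If anyone wants to dig further, running the client in the foreground with debug logging while autofs triggers the hang should show how far initialisation gets before the stat blocks. The flags here are as I understand the 3.0-era client; worth double-checking against glusterfs --help:

```shell
# Sketch: foreground (-N / --no-daemon) debug-level invocation of the client,
# using the same volfile and mountpoint as the autofs map above.
volfile="/etc/glusterfs/glusterfs.vol"
mountpoint="/mnt/auto/shared"
debug_cmd="glusterfs -f ${volfile} -N -L DEBUG ${mountpoint}"
echo "${debug_cmd}"
```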
Regards to all
Phil
--
Director, Layer3 Systems Ltd
Layer3 Systems Limited is registered in England. Company no 3130393
43 Pendle Road, Streatham, London, SW16 6RT
tel: 020 8769 4484