[Gluster-users] "Too many levels of symbolic links" with glusterfs automounting
harry mangalam
hjmangalam at gmail.com
Wed Jun 20 04:23:05 UTC 2012
One client log file is here:
http://goo.gl/FyYfy
On the server side, on bs1 & bs4, there is a huge, current nfs.log file
(odd since I neither wanted nor configured an nfs export). It is filled
entirely with these lines:
tail -5 nfs.log
[2012-06-19 21:11:54.402567] E [rdma.c:4458:tcp_connect_finish]
0-gl-client-1: tcp connect to failed (Connection refused)
[2012-06-19 21:11:54.406023] E [rdma.c:4458:tcp_connect_finish]
0-gl-client-2: tcp connect to failed (Connection refused)
[2012-06-19 21:11:54.409486] E [rdma.c:4458:tcp_connect_finish]
0-gl-client-3: tcp connect to failed (Connection refused)
[2012-06-19 21:11:54.412822] E [rdma.c:4458:tcp_connect_finish]
0-gl-client-6: tcp connect to 10.2.7.11:24008 failed (Connection
refused)
[2012-06-19 21:11:54.416231] E [rdma.c:4458:tcp_connect_finish]
0-gl-client-7: tcp connect to 10.2.7.11:24008 failed (Connection
refused)
On servers bs2 and bs3 there is a current, huge log consisting of this
single line, repeated every 3 s:
[2012-06-19 21:14:00.907387] I [socket.c:1798:socket_event_handler]
0-transport: disconnecting now
I was reminded as I was copying it that the client and servers are
slightly different: the client runs "3.3.0qa42-1" while the servers run
"3.3.0-1". Is this enough version skew to cause a problem? There are no
other problems that I'm aware of, but if even a slight version skew is
problematic, I'll be careful to keep them exactly aligned. I think this
was done because the final release binary did not support the glibc we
were using on the compute nodes, while 3.3.0qa42-1 did. Perhaps too
sloppy...?
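For what it's worth, the skew is visible just from the version strings:
the client is a qa build of the same 3.3.0 base the servers run. A
minimal Python sketch of that comparison (the parsing rules here are my
own assumption about gluster's version scheme, not anything gluster
ships):

```python
# Hypothetical sketch: flag version skew between a glusterfs client
# and its servers, given version strings like "3.3.0qa42-1".

def base_version(v: str) -> str:
    """Return the upstream release portion, e.g. '3.3.0' from '3.3.0qa42-1'."""
    # Split off the package revision ("-1"), then any qa/rc/beta suffix.
    release = v.split("-")[0]
    for marker in ("qa", "rc", "beta"):
        if marker in release:
            release = release.split(marker)[0]
    return release

def skewed(client: str, server: str) -> bool:
    # Treat any difference in the full strings as skew worth noting:
    # a qa build vs. the final release differs even with the same base.
    return client != server

print(base_version("3.3.0qa42-1"))        # -> 3.3.0
print(skewed("3.3.0qa42-1", "3.3.0-1"))   # -> True
```

So the base release matches, but the packages themselves differ, which
is exactly the situation the list usually recommends avoiding.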
gluster volume info
Volume Name: gl
Type: Distribute
Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
Status: Started
Number of Bricks: 8
Transport-type: tcp,rdma
Bricks:
Brick1: bs2:/raid1
Brick2: bs2:/raid2
Brick3: bs3:/raid1
Brick4: bs3:/raid2
Brick5: bs4:/raid1
Brick6: bs4:/raid2
Brick7: bs1:/raid1
Brick8: bs1:/raid2
Options Reconfigured:
performance.io-cache: on
performance.quick-read: on
performance.io-thread-count: 64
auth.allow: 10.2.*.*,10.1.*.*
gluster volume status
Status of volume: gl
Gluster process                          Port    Online   Pid
------------------------------------------------------------------------------
Brick bs2:/raid1                         24009   Y        2908
Brick bs2:/raid2                         24011   Y        2914
Brick bs3:/raid1                         24009   Y        2860
Brick bs3:/raid2                         24011   Y        2866
Brick bs4:/raid1                         24009   Y        2992
Brick bs4:/raid2                         24011   Y        2998
Brick bs1:/raid1                         24013   Y        10122
Brick bs1:/raid2                         24015   Y        10154
NFS Server on localhost                  38467   Y        9475
NFS Server on 10.2.7.11                  38467   Y        10160
NFS Server on bs2                        38467   N        N/A
NFS Server on bs3                        38467   N        N/A
Hmm sure enough, bs1 and bs4 (localhost in the above info) appear to be
running NFS servers, while bs2 & bs3 are not...?
OK - after some googling: the gluster NFS service can be shut off with
gluster volume set gl nfs.disable on
and now the status looks like this:
gluster volume status
Status of volume: gl
Gluster process                          Port    Online   Pid
------------------------------------------------------------------------------
Brick bs2:/raid1                         24009   Y        2908
Brick bs2:/raid2                         24011   Y        2914
Brick bs3:/raid1                         24009   Y        2860
Brick bs3:/raid2                         24011   Y        2866
Brick bs4:/raid1                         24009   Y        2992
Brick bs4:/raid2                         24011   Y        2998
Brick bs1:/raid1                         24013   Y        10122
Brick bs1:/raid2                         24015   Y        10154
hjm
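
PS: the ELOOP in the subject line is easy to reproduce in isolation
with a self-referencing symlink; whether the automounter actually left
such a loop behind at /share/gl is only a guess on my part. A minimal
sketch (the paths are made up):

```python
import errno
import os
import tempfile

# Minimal reproduction of ELOOP ("Too many levels of symbolic links"):
# a symlink that points at itself can never be resolved.
d = tempfile.mkdtemp()
loop = os.path.join(d, "gl")
os.symlink(loop, loop)          # .../gl -> .../gl

try:
    os.stat(loop)               # follows the link until the kernel gives up
except OSError as e:
    assert e.errno == errno.ELOOP
    # On Linux this is the exact message df reported:
    print(os.strerror(e.errno))
```

If the automounter expires the glusterfs mount while the /gl symlink
still points into it, a lookup chasing its own tail would produce the
same errno, which might explain why it only shows up after days.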
On Tue, 2012-06-19 at 13:05 -0700, Anand Avati wrote:
> Can you post the complete logs? Is the 'Too many levels of symbolic
> links' (or ELOOP) logs seen in the client log or brick logs?
>
>
> Avati
>
> On Tue, Jun 19, 2012 at 11:22 AM, harry mangalam
> <hjmangalam at gmail.com> wrote:
> (Apologies if this already posted, but I recently had to
> change smtp servers
> which scrambled some list permissions, and I haven't seen it
> post)
>
> I set up a 3.3 gluster volume for another sysadmin and he has
> added it
> to his cluster via automount. It seems to work initially but
> after some
> time (days) he is now regularly seeing this warning:
> "Too many levels of symbolic links"
> when he tries to traverse the mounted filesystems.
>
> $ df: `/share/gl': Too many levels of symbolic links
>
> It's supposed to be mounted on /share/gl with a symlink to /gl
> ie: /gl -> /share/gl
>
> I've been using gluster with static mounts on a cluster and
> have never
> seen this behavior; google does not seem to record anyone else
> seeing
> this with gluster. However, I note that the "Howto Automount
> GlusterFS"
> page at
> http://www.gluster.org/community/documentation/index.php/Howto_Automount_GlusterFS
> has been deleted. Is automounting no longer supported?
>
> His auto.master file is as follows (sorry for the wrapping):
>
>         w1         -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.2:/&
>         w2         -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.3:/&
>         mathbio    -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.2:/&
>         tw         -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.4:/&
>         shwstore   -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  shwraid.biomol.uci.edu:/&
>         djtstore   -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid.biomol.uci.edu:/&
>         djtstore2  -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid2.biomol.uci.edu:/djtraid2:/&
>         djtstore3  -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid3.biomol.uci.edu:/djtraid3:/&
>         kevin      -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.2.255.230:/&
>         samlab     -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.2.255.237:/&
>         new-data   -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  nas-1-1.ib:/&
>         gl         -fstype=glusterfs  bs1:/&
>
>
>         He has never seen this behavior with the other automounted
>         filesystems. The system logs from the affected nodes contain
>         no gluster strings that appear relevant, but
>         /var/log/glusterfs/share-gl.log ends with this series of odd
>         lines:
>
>         [2012-06-18 08:57:38.964243] I [client-handshake.c:453:client_set_lk_version_cbk] 0-gl-client-6: Server lk version = 1
>         [2012-06-18 08:57:38.964507] I [fuse-bridge.c:3376:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.16
>         [2012-06-18 09:16:48.692701] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote operation failed: Stale NFS file handle. Path: /tdlong/RILseq/makebam.commands (90193380-d107-4b6c-b02f-ab53a0f65148)
>         [2012-06-18 09:16:48.693030] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote operation failed: Stale NFS file handle. Path: /tdlong/RILseq/makebam.commands (90193380-d107-4b6c-b02f-ab53a0f65148)
>         [2012-06-18 09:16:48.693165] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote operation failed: Stale NFS file handle. Path: /tdlong/RILseq/makebam.commands (90193380-d107-4b6c-b02f-ab53a0f65148)
>         [2012-06-18 09:16:48.693394] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote operation failed: Stale NFS file handle. Path: /tdlong/RILseq/makebam.commands (90193380-d107-4b6c-b02f-ab53a0f65148)
>         [2012-06-18 10:56:32.756551] I [fuse-bridge.c:4037:fuse_thread_proc] 0-fuse: unmounting /share/gl
>         [2012-06-18 10:56:32.757148] W [glusterfsd.c:816:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3829ed44bd] (-->/lib64/libpthread.so.0 [0x382aa0673d] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0x17c) [0x40524c]))) 0-: received signum (15), shutting down
>
> Any hints as to why this is happening?
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
>