[Bugs] [Bug 1357257] New: observing " Too many levels of symbolic links" after adding bricks and then issuing a replace brick

bugzilla at redhat.com bugzilla at redhat.com
Sun Jul 17 09:57:18 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1357257

            Bug ID: 1357257
           Summary: observing " Too many levels of symbolic links" after
                    adding bricks and then issuing a replace brick
           Product: GlusterFS
           Version: 3.8.1
         Component: nfs
          Keywords: Triaged
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: ndevos at redhat.com
                CC: bugs at gluster.org
        Depends On: 1356068



+++ This bug was initially created as a clone of Bug #1356068 +++

Description of problem:
======================
As part of verifying bz#1278284, I was trying an add-brick on an EC volume
followed by a replace-brick. On doing that I observed "Too many levels of
symbolic links" on the RHEL 6.7 client.

When we do an add-brick on a pure EC volume, then create a new dir, then
create files under the new dir, and then do a replace-brick of any of the
bricks (old set or new set), we see the above-mentioned error on the client.

I observed this only on one of the clients (RHEL 6.7).

I was able to reproduce it consistently, and did not observe it when a
replace-brick was issued on a pure EC volume without the add-brick step.

Version-Release number of selected component (if applicable):
all?

Steps to Reproduce:
==================
1. Create a pure EC volume (or even a distributed one), say 1x(4+2).
2. Mount the volume on two clients (in my case one was RHEL 7.2 and the other
RHEL 6.7), both using different NFS servers.
3. Create some files and dirs from both mounts.
4. Add bricks, in this case another (4+2), i.e. 6 bricks.
5. Create a new dir, and in that dir create another dir for each client; pump
in I/O from both clients into their respective dirs (use the dd command).
6. While the I/O is in progress, replace any of the bricks; you can see that
on the RHEL 6.7 client you get "too many levels of symbolic links".



--- Additional comment from Ashish Pandey on 2016-05-20 05:48:02 EDT ---


I could recreate this issue with a plain EC volume AND a plain distributed
volume.

I placed different gf_msg calls in posix_create and posix_open.
I can see that the moment the replace-brick happens, the connections to *ALL*
the bricks restart. So at that point in time NO brick was actually connected
to the client.

I mounted the volume using nfs and started creating files in a loop, file-1
to file-100.
The moment the replace-brick happens, the file which should be created at
that point in the loop will NOT be created. It is for this file that we get
the "Too many levels of symbolic links" error message.
In the brick logs we can see that file-75 was created, then "disconnect
happens on all the bricks". There is no log message for file-76, and then
file-77 is created, and so on....

These are the logs from a brick which was NOT replaced.

[2016-05-20 07:29:27.942286] E [MSGID: 113069] [posix.c:3113:posix_writev]
0-plain-posix: ************writev on 15  [Invalid argument]
[2016-05-20 07:29:28.019197] E [MSGID: 113069] [posix.c:2604:posix_create]
0-plain-posix: ************Created  /brick/gluster/p2/TEST/118/file1-74 fd = 15
 [No such file or directory]
[2016-05-20 07:29:28.205203] E [MSGID: 113069] [posix.c:3113:posix_writev]
0-plain-posix: ************writev on 15  [Invalid argument]
[2016-05-20 07:29:28.685664] I [MSGID: 115036] [server.c:552:server_rpc_notify]
0-plain-server: disconnecting connection from
rhs3-16301-2016/05/20-07:27:01:823285-plain-client-1-0-0
[2016-05-20 07:29:28.685827] I [MSGID: 101055] [client_t.c:420:gf_client_unref]
0-plain-server: Shutting down connection
rhs3-16301-2016/05/20-07:27:01:823285-plain-client-1-0-0
[2016-05-20 07:29:29.693590] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt:
Volume file changed
[2016-05-20 07:29:29.748094] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt:
Volume file changed
[2016-05-20 07:29:30.592393] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-05-20 07:29:30.592638] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-05-20 07:29:30.663883] I [login.c:81:gf_auth] 0-auth/login: allowed user
names: d02e404c-4feb-48fb-990d-619f6685d477
[2016-05-20 07:29:30.663934] I [MSGID: 115029]
[server-handshake.c:612:server_setvolume] 0-plain-server: accepted client from
rhs3-16396-2016/05/20-07:29:29:744958-plain-client-1-0-0 (version: 3.7.5)
[2016-05-20 07:29:31.856194] I [dict.c:473:dict_get]
(-->/usr/local/lib/libglusterfs.so.0(default_getxattr_cbk+0xc2)
[0x7f80d7f31362]
-->/usr/local/lib/glusterfs/3.7.5/xlator/features/marker.so(marker_getxattr_cbk+0xde)
[0x7f80cc4adebe] -->/usr/local/lib/libglusterfs.so.0(dict_get+0x63)
[0x7f80d7f1ad43] ) 0-dict: !this || key=() [Invalid argument]
[2016-05-20 07:29:32.693587] E [MSGID: 113069] [posix.c:2604:posix_create]
0-plain-posix: ************Created  /brick/gluster/p2/TEST/118/file1-81 fd = 15
 [No such file or directory]
[2016-05-20 07:29:32.868530] E [MSGID: 113069] [posix.c:3113:posix_writev] 0-

  --------------------------------------------
512+0 records out
524288 bytes (524 kB) copied, 0.601281 s, 872 kB/s
dd: closing output file `TEST/118/file1-75': Stale file handle
dd: opening `TEST/118/file1-76': Too many levels of symbolic links
512+0 records in
512+0 records out
----------------------------------------------


I also tried it with a PLAIN distributed volume and got the same error
message on the mount point and in the brick logs.
In fact, the above logs are from a plain distributed volume.


I think this is happening because the glusterd_svcs_stop function is getting
called by the glusterd_op_replace_brick function.

glusterd_svcs_stop will stop the nfs service along with shd, quotad and
others.

This causes all the connections to the bricks to be disconnected; they will
be connected again, but for this duration the bricks will be unreachable.

Do we need to stop and start nfs for replace-brick?
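
[Editorial note] To visualize the suspected sequence, here is a minimal C
sketch. Only the glusterd_svcs_stop/glusterd_svcs_manager names come from
this thread; do_replace_brick(), the prototypes and the argument types are
purely illustrative, not the actual glusterd code.

/* Illustrative sketch only; not the actual glusterd source. */
int glusterd_svcs_stop (void *volinfo);    /* stops nfs, shd, quotad, ... */
int glusterd_svcs_manager (void *volinfo); /* restarts them, new volfiles */
int do_replace_brick (void *volinfo);      /* hypothetical brick swap */

static int
replace_brick_sequence_sketch (void *volinfo)
{
        int ret;

        /* Every service, including gluster-NFS, is stopped first, so all
         * client connections to the bricks drop at this point. */
        ret = glusterd_svcs_stop (volinfo);
        if (ret)
                return ret;

        ret = do_replace_brick (volinfo);

        /* Services come back only here, with regenerated volfiles; the
         * bricks are unreachable for the whole window in between. */
        if (!ret)
                ret = glusterd_svcs_manager (volinfo);
        return ret;
}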

--- Additional comment from Atin Mukherjee on 2016-05-24 05:43:45 EDT ---

(In reply to Ashish Pandey from comment #9)
> I could recreate this issue with a plain EC volume AND a plain distributed
> volume.
> [...full comment and brick logs quoted above snipped...]
> I think this is happening because the glusterd_svcs_stop function is
> getting called by the glusterd_op_replace_brick function.
> 
> glusterd_svcs_stop will stop the nfs service along with shd, quotad and
> others.
We'd need to restart all the daemons in that case; that's why you see that
glusterd issues a glusterd_svcs_stop () before performing the replace-brick
and then invokes glusterd_svcs_manager () post the replace-brick operation,
such that all the clients get notified about the topology change.
> 
> This causes all the connections to the bricks to be disconnected; they
> will be connected again, but for this duration the bricks will be
> unreachable.
> 
> Do we need to stop and start nfs for replace-brick?

--- Additional comment from Ashish Pandey on 2016-05-24 05:59:35 EDT ---

This issue is happening with a plain distribute volume too.
Somehow, nfs is propagating the wrong errno, ELOOP, instead of ENOTCONN,
which gives this error message.

--- Additional comment from Soumya Koduri on 2016-06-13 07:16:30 EDT ---

The gluster-NFS server doesn't seem to be sending the ELOOP error at any
point; it is not a standard NFSv3 error. Most probably this error is
generated on the NFS-client. One of the reasons could be the NFS-client
detecting a loop during readdir, from bug 770250, i.e., when the directory
cookie (dir_cookie) of a child directory is the same as any of the parent
directories' cookies in the client page cache.

I am guessing that when the gluster-NFS server is restarted, the cookie
generated and sent by the server after the restart could have collided with a
value it might have sent before. A pkt trace collected from the client should
help us here.
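
[Editorial note] To make the loop-detection idea concrete, here is a small,
self-contained C sketch of such a heuristic. This is an assumption about the
behaviour, not the Linux NFS-client's actual code; the duplicate cookie value
simply reuses the one from the dmesg sample below.

#include <stdint.h>
#include <stdio.h>

/* A readdir loop is suspected when a child directory's cookie duplicates
 * a cookie already recorded for one of its ancestors. */
static int
cookie_would_loop (const uint64_t *ancestor_cookies, int depth,
                   uint64_t child_cookie)
{
        for (int i = 0; i < depth; i++)
                if (ancestor_cookies[i] == child_cookie)
                        return 1; /* duplicate cookie: possible loop */
        return 0;
}

int
main (void)
{
        /* hypothetical cookies for "/" and "/dir1" */
        uint64_t parents[] = { 1457221137058955403ULL, 42ULL };
        /* a child whose cookie collides with the one for "/" */
        uint64_t child = 1457221137058955403ULL;

        if (cookie_would_loop (parents, 2, child))
                printf ("readdir loop suspected: duplicate cookie\n");
        return 0;
}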

Also, please check whether the below log message is present in the kernel
logs on the client machine.

[root@skoduri n]# dmesg | grep readdir
[21615.224218] NFS: directory /n contains a readdir loop.Please contact your
server vendor.  The file: . has duplicate cookie 1457221137058955403
[root@skoduri n]#

--- Additional comment from Ashish Pandey on 2016-06-15 05:32:05 EDT ---

Soumya,

This issue is happening with a plain distributed volume also; no need for an
afr or ec volume.
Just create a distributed volume, mount it using nfs and do I/O. Replace a
brick as mentioned above and you can hit the issue.

I could have provided all the info you asked for, but I think if you recreate
it, you can check all the things from the nfs-ganesha point of view in real
time.

--- Additional comment from Kaleb KEITHLEY on 2016-06-15 09:50:24 EDT ---

Please provide a test case

--- Additional comment from Soumya Koduri on 2016-07-12 06:17:20 EDT ---

After doing some experiments, here is what seems to have happened: the client
throws this error when the server doesn't respond to its lookup request for a
file. I ran the following test to confirm that -

Created a 1x2 volume -

Status of volume: vol1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.122.201:/vol1                 49152     0          Y       4249 
Brick 192.168.122.201:/vol1_r               49153     0          Y       4254 
Self-heal Daemon on localhost               N/A       N/A        N       N/A  



[root@skoduri mnt]# for i in `seq 1 100`; do mkdir dir.$i ; for j in `seq 1
100`; do dd if=/dev/urandom of=dir.$i/testfile.$i.$j bs=1k count=$j & done
;wait ; done


Now put a breakpoint in the nfs process, in the lookup code path
"nfs_lookup()":

(gdb) b nfs_lookup
Breakpoint 1 at 0x7f603dae81a0: file nfs-generics.c, line 82.
(gdb) c
Continuing.
[Switching to Thread 0x7f603e48e700 (LWP 7737)]

Breakpoint 1, nfs_lookup (nfsx=0x7f6038019610, xl=0x7f603800ebc0,
nfu=nfu@entry=0x7f603e48bae0, 
    pathloc=pathloc@entry=0x7f60340b9d10, cbk=cbk@entry=0x7f603daf4030
<nfs3svc_lookup_cbk>, 
    local=local@entry=0x7f60340b9860) at nfs-generics.c:82
warning: Source file is more recent than executable.
82            if ((!nfsx) || (!xl) || (!pathloc) || (!nfu))
(gdb) p *pathloc
$1 = {path = 0x7f60381d1d60 "/dir.4/testfile.4.10", name = 0x7f60381d1d67
"testfile.4.10", inode = 0x7f603515c9f4, 
  parent = 0x7f603515bfac, gfid = '\000' <repeats 15 times>, pargfid =
"[Z-\317$/LV\262\374\005\214\360Y\246g"}
(gdb)   

Now restart the nfs process -

[root@dhcp35-197 glusterfs]# gluster v start vol1 force
volume start: vol1: success
[root@dhcp35-197 glusterfs]# 


I could see the ELOOP error thrown -

[skoduri@skoduri ~]$ sudo tail -f /var/log/messages | grep nfs | grep loop

Jul 12 15:32:30 skoduri kernel: [149340.627189] VFS: Lookup of 'testfile.4.10'
in nfs 0:57 would have caused loop


[root@skoduri skoduri]# tshark -r /tmp/nfs-ELOOP.pcap nfs.name ==
"testfile.4.10"
Running as user "root" and group "root". This could be dangerous.
11807  21.644349 192.168.122.1 -> 192.168.122.201 NFS 232 V3 LOOKUP Call, DH:
0xac64f17e/testfile.4.10
12032  41.336656 192.168.122.1 -> 192.168.122.201 NFS 14548 V3 WRITE Call, FH:
0x98e32213 Offset: 0 Len: 18432 FILE_SYNC
[root@skoduri skoduri]# 

Note that, for some weird reason, the second WRITE call seems to be an
amalgamation of two different packets (the first being the WRITE on FH
0x98e32213, followed by the lookup of 0xac64f17e/testfile.4.10).

As can be seen, no reply is sent to the lookup on
0xac64f17e/testfile.4.10.

Will attach the pkt trace and rpcdebug log messages.


--- Additional comment from Soumya Koduri on 2016-07-13 05:09:09 EDT ---

Inputs from Niels - 

>>>
The affected file is testfile.4.10. In Wireshark we can find the LOOKUP
procedure with a filter like 'nfs.name == "testfile.4.10"'. This results
in the LOOKUP Call, twice...

- frame 11807 is a plain LOOKUP
- frame 12032 is a WRITE Call, but it also carries the same LOOKUP again

The reply on the LOOKUP, found with "rpc.xid == 0xf4ef4e62" is in frame
12049.

The contents of the LOOKUP Reply in the obj_attributes have the details
from a directory, whereas testfile.4.10 really should be a file, or
probably NFS3ERR_NOENT. The directory has the same 'fileid' (used as
inode by the NFS-client) as the parent directory of testfile.4.10. This
obviously is an incorrect reply.

The fileid/inode from the LOOKUP Reply matches the one in the messages
file too (12897189535768225383).

This looks like a bug in Gluster/NFS to me... Hooray?

If we can capture a tcpdump on the NFS-server, we can see the GlusterFS
protocol too. This then makes it possible to dissect the filehandle in
volume-id + GFID, which we then can match with the replies that the
bricks send.
<<<
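
[Editorial note] The filehandle dissection Niels mentions is possible because
the handle embeds these fields. The C sketch below is inferred from the gdb
dump of cs->resolvefh later in this mail; the exact field sizes are
assumptions, not the glusterfs header definition.

#include <stdint.h>

/* Hedged sketch of the Gluster/NFS filehandle layout. */
struct nfs3_fh_sketch {
        char    ident[4];      /* handle identifier, ":OGL" in the gdb dump */
        uint8_t exportid[16];  /* UUID of the exported volume (volume-id) */
        uint8_t gfid[16];      /* GFID of the file or directory */
        uint8_t mountid[16];   /* identifies the mount */
        char    padding[12];   /* pads the handle to a fixed size */
};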

--- Additional comment from Soumya Koduri on 2016-07-13 05:37:14 EDT ---

As suggested by Niels above, I have taken a pkt trace on the server side
(copied to the same machine mentioned above:
"/root/bug1328451/nfs-ELOOP-server.pcap").

568 152.036090 192.168.122.1 -> 192.168.122.201 NFS 228 V3 LOOKUP Call, DH:
0x84b243c9/testfile.1
569 152.036116 192.168.122.201 -> 192.168.122.1 TCP 68 2049→666 [ACK] Seq=1
Ack=161 Win=30080 Len=0 TSval=4495843 TSecr=10310504
570 152.036717 192.168.122.201 -> 192.168.122.201 GlusterFS 340 V330 LOOKUP
Call, Filename: (nameless, by GFID)
571 152.037099 192.168.122.201 -> 192.168.122.201 GlusterFS 360 V330 LOOKUP
Reply (Call In 570)
572 152.037126 192.168.122.201 -> 192.168.122.201 TCP 68 49145→49155 [ACK]
Seq=2045 Ack=1549 Win=50176 Len=0 TSval=4495845 TSecr=4495845
573 152.037331 192.168.122.201 -> 192.168.122.201 GlusterFS 296 V330 LOOKUP
Call, Filename: (nameless, by GFID)
574 152.037643 192.168.122.201 -> 192.168.122.201 GlusterFS 360 V330 LOOKUP
Reply (Call In 573)
575 152.037845 192.168.122.201 -> 192.168.122.1 NFS 332 V3 LOOKUP Reply (Call
In 568), FH: 0x84b243c9


From the pkt trace, it looks like when the nfs server received the LOOKUP of
0x84b243c9/testfile.1, it performed a glusterfs nameless LOOKUP of the root
gfid followed by the dir1 gfid. But there is no subsequent LOOKUP done on the
testfile itself.
That's the reason we see dir1's attributes in the NFS LOOKUP reply.

From gdb(/code), 

Post nameless lookup of dir1 gfid, 

(gdb) bt
#0  nfs3_lookup_resume (carg=0x7fffe07b906c) at nfs3.c:1438
#1  0x00007fffea5358f4 in nfs3_fh_resolve_entry_hard
(cs=cs@entry=0x7fffe07b906c) at nfs3-helpers.c:3789
#2  0x00007fffea535b19 in nfs3_fh_resolve_inode_lookup_cbk (frame=<optimized
out>, cookie=<optimized out>, 
    this=0x7fffe4019610, op_ret=<optimized out>, op_errno=<optimized out>,
inode=<optimized out>, 
    buf=0x7fffe82a6104, xattr=0x7ffff55581e4, postparent=0x7fffe82a6334) at
nfs3-helpers.c:3714
#3  0x00007fffea50d611 in nfs_fop_lookup_cbk (frame=0x7ffff5db8a5c,
cookie=0x7fffe4018650, this=<optimized out>, 
    op_ret=<optimized out>, op_errno=<optimized out>, inode=<optimized out>,
buf=0x7fffe82a6104, 
    xattr=0x7ffff55581e4, postparent=0x7fffe82a6334) at nfs-fops.c:430
#4  0x00007ffff7e1a897 in io_stats_lookup_cbk (frame=0x7ffff5db8b30,
cookie=<optimized out>, this=<optimized out>, 
    op_ret=0, op_errno=117, inode=0x7fffe8f97128, buf=0x7fffe82a6104,
xdata=0x7ffff55581e4, 
    postparent=0x7fffe82a6334) at io-stats.c:2116
#5  0x00007fffea59a710 in dht_discover_complete
(this=this@entry=0x7fffe4016a30, 
    discover_frame=discover_frame@entry=0x7ffff5db87e0) at dht-common.c:383
#6  0x00007fffea59b38b in dht_discover_cbk (frame=0x7ffff5db87e0,
cookie=<optimized out>, this=0x7fffe4016a30, 
    op_ret=<optimized out>, op_errno=22, inode=0x7fffe8f97128,
stbuf=0x7fffdaefda70, xattr=0x7ffff55581e4, 
    postparent=0x7fffdaefdae0) at dht-common.c:512
#7  0x00007fffea6832dc in client3_3_lookup_cbk (req=<optimized out>,
iov=<optimized out>, count=<optimized out>, 
    myframe=0x7ffff5db8490) at client-rpc-fops.c:2937
#8  0x00007ffff7efde90 in rpc_clnt_handle_reply
(clnt=clnt@entry=0x7fffe405b0f0, pollin=pollin@entry=0x7fffd4003aa0)
    at rpc-clnt.c:759
#9  0x00007ffff7efe16c in rpc_clnt_notify (trans=<optimized out>,
mydata=0x7fffe405b120, event=<optimized out>, 
    data=0x7fffd4003aa0) at rpc-clnt.c:920
#10 0x00007ffff7efa543 in rpc_transport_notify (this=this@entry=0x7fffe406ae30, 
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED,
data=data@entry=0x7fffd4003aa0) at rpc-transport.c:541
#11 0x00007ffff7e431a3 in socket_event_poll_in (this=this@entry=0x7fffe406ae30)
at socket.c:2238
#12 0x00007ffff7e45b68 in socket_event_handler (fd=<optimized out>, idx=9,
data=0x7fffe406ae30, poll_in=1, 
    poll_out=0, poll_err=0) at socket.c:2351
#13 0x00007ffff7f8388a in event_dispatch_epoll_handler (event=0x7fffdaefdf10,
event_pool=0x44ceb0)
    at event-epoll.c:571
#14 event_dispatch_epoll_worker (data=0x7fffe4041bf0) at event-epoll.c:674
#15 0x00007ffff72a260a in start_thread (arg=0x7fffdaefe700) at
pthread_create.c:334
---Type <return> to continue, or q <return> to quit--- 
#16 0x00007ffff6b8fbbd in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) n
1441            nfs_user_t                      nfu = {0, };
(gdb) 
1438    {
(gdb) 
1441            nfs_user_t                      nfu = {0, };
(gdb) 
1438    {
(gdb) 
1445            if (!carg) {
(gdb) 
1441            nfs_user_t                      nfu = {0, };
(gdb) 
1443            struct nfs3_fh                  newfh = {{0},};
(gdb) 
1441            nfs_user_t                      nfu = {0, };
(gdb) 
1443            struct nfs3_fh                  newfh = {{0},};
(gdb) 
1441            nfs_user_t                      nfu = {0, };
(gdb) 
1443            struct nfs3_fh                  newfh = {{0},};
(gdb) 
1445            if (!carg) {
(gdb) 
1452            nfs3_check_fh_auth_status (cs, stat, _gf_false, nfs3err);
(gdb) 
1453            nfs3_check_fh_resolve_status (cs, stat, nfs3err);
(gdb) p cs->resolventry
$3 = 0x7fffe4131a70 "testfile.1"
(gdb) p cs->resolvefh
$4 = {ident = ":OGL", exportid =
"\205\376E\272\361+HI\241\256\060t\005\024\066\242", 
  gfid = "\374K\332D\t\310Ho\263-\263\241%R\260\277", mountid =
"\265B\202\327", '\000' <repeats 11 times>, 
  padding = '\000' <repeats 11 times>}
(gdb) n
1454            cs->parent = cs->resolvefh;
(gdb) 
1456        if (cs->hardresolved) {
(gdb) 
1458            nfs3_fh_build_child_fh (&cs->parent, &cs->stbuf, &newfh);
(gdb) 
1440            int                             ret = -EFAULT;
(gdb) p cs->hardresolved
$5 = 1
(gdb) n
1457            stat = NFS3_OK;
(gdb) 
1458            nfs3_fh_build_child_fh (&cs->parent, &cs->stbuf, &newfh);
(gdb) 
1459            goto nfs3err;
(gdb) 
1470                    nfs3_log_common_res (rpcsvc_request_xid (cs->req),
(gdb) 
1473                    nfs3_lookup_reply (cs->req, stat, &newfh, &cs->stbuf,
(gdb) 
1475                    nfs3_call_state_wipe (cs);
(gdb) 



In 

int
nfs3_fh_resolve_entry_hard (nfs3_call_state_t *cs)
{
        int             ret = -EFAULT;
        nfs_user_t      nfu = {0, };

        if (!cs)
                return ret;

        nfs_loc_wipe (&cs->resolvedloc);
        nfs_user_root_create (&nfu);
        gf_msg_trace (GF_NFS3, 0, "FH hard resolution: gfid: %s "
                      ", entry: %s", uuid_utoa (cs->resolvefh.gfid),
                      cs->resolventry);

        ret = nfs_entry_loc_fill (cs->nfsx, cs->vol->itable,
                                  cs->resolvefh.gfid,
                                  cs->resolventry, &cs->resolvedloc,
                                  NFS_RESOLVE_CREATE);

>>>>> Since the entry is not present in the inode table, ret is '-2'.

        if (ret == -2) {
                gf_msg_trace (GF_NFS3, 0, "Entry needs lookup: %s",
                              cs->resolvedloc.path);
                /* If the NFS op is lookup, let the resume callback
                 * handle the sending of the lookup fop. Similarly,
                 * if the NFS op is create, let the create call
                 * go ahead in the resume callback so that an EEXIST gets
                 * handled at posix without an extra fop at this point.
                 */
                if (nfs3_lookup_op (cs) ||
                    (nfs3_create_op (cs) && !nfs3_create_exclusive_op (cs))) {
                        cs->lookuptype = GF_NFS3_FRESH;
                        cs->resolve_ret = 0;
                        nfs3_call_resume (cs);

>>>>> Since it is a LOOKUP fop, we call the resume_fn, nfs3_lookup_resume(),
which responds with the stat of the pargfid ('dir1' in this case). 

                } else {
                        cs->hardresolved = 1;
                        nfs_lookup (cs->nfsx, cs->vol, &nfu, &cs->resolvedloc,
                                    nfs3_fh_resolve_entry_lookup_cbk, cs);
                }
                ret = 0;


The fix seems to be to either fall through to the above else case for the
LOOKUP fop, or to check cs->resolventry and resolve it in
"nfs3_lookup_resume" as well.
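
[Editorial note] For concreteness, here is a hedged sketch of what such a
change could look like in the branch quoted above. The exact hunk is an
assumption based on the commit message at the end of this mail, which resets
the hard-resolution flag so that nfs3_lookup_resume() sends an explicit
LOOKUP for the entry instead of replying with the parent's stat (the commit
message spells the flag 'cs->resolvedhard'; the gdb session above prints it
as 'cs->hardresolved').

                if (nfs3_lookup_op (cs) ||
                    (nfs3_create_op (cs) && !nfs3_create_exclusive_op (cs))) {
                        cs->lookuptype = GF_NFS3_FRESH;
                        cs->resolve_ret = 0;
                        cs->hardresolved = 0; /* assumed reset, per commit msg */
                        nfs3_call_resume (cs);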

--- Additional comment from Soumya Koduri on 2016-07-13 13:05:53 CEST ---

This seems to be a bug in the nameless lookup resolution in the Gluster/NFS
server code-path (it typically happens post restart). The NFS xlator seems to
be bailing out after the LOOKUP of the parent directory FH (without resolving
the child file entry), resulting in it sending the stat of the parent dir.

The NFS-client, on receiving attributes for an entry that are the same as its
parent's, threw the error "Too many levels of symbolic links" to the
application.

--- Additional comment from Vijay Bellur on 2016-07-17 09:58:54 CEST ---

COMMIT: http://review.gluster.org/14911 committed in master by Niels de Vos
(ndevos at redhat.com) 
------
commit 3c485cb896837c8e362fd0b094325002ce806ac4
Author: Soumya Koduri <skoduri at redhat.com>
Date:   Wed Jul 13 16:24:31 2016 +0530

    nfs: Reset cs->resolvedhard while resolving an entry

    If an entry is not found in the inode table, nfs xlator should be
    resolving it by sending an explicit lookup to the brick process.
    But currently its broken in case of NFS3_LOOKUP fop where in the server
    bails out early resulting in sending pargfid attributes to the client.
    To fix the same reset 'cs->resolvedhard' so that an explicit lookup
    is done for the entry in the resume_fn "nfs3_lookup_resume()".

    Change-Id: I999f8bca7ad008526c174d13f69886dc809d9552
    Signed-off-by: Soumya Koduri <skoduri at redhat.com>
    BUG: 1356068
    Reviewed-on: http://review.gluster.org/14911
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Niels de Vos <ndevos at redhat.com>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1356068
[Bug 1356068] observing " Too many levels of symbolic links" after adding
bricks and then issuing a replace brick