[Bugs] [Bug 1610256] New: [Ganesha] While performing lookups from two of the clients, "ls" command got failed with "Invalid argument"

bugzilla at redhat.com
Tue Jul 31 10:12:09 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1610256

            Bug ID: 1610256
           Summary: [Ganesha] While performing lookups from two of the
                    clients, "ls" command got failed with "Invalid
                    argument"
           Product: GlusterFS
           Version: mainline
         Component: libgfapi
          Severity: high
          Assignee: jthottan at redhat.com
          Reporter: jthottan at redhat.com
        QA Contact: bugs at gluster.org
                CC: amukherj at redhat.com, bugs at gluster.org,
                    dang at redhat.com, ffilz at redhat.com,
                    grajoria at redhat.com, jthottan at redhat.com,
                    msaini at redhat.com, nbalacha at redhat.com,
                    rcyriac at redhat.com, rhinduja at redhat.com,
                    rhs-bugs at redhat.com, sankarshan at redhat.com,
                    spalai at redhat.com, storage-qa-internal at redhat.com
            Blocks: 1569657



+++ This bug was initially created as a clone of Bug #1569657 +++

Description of problem:

A single volume was mounted via 4 different VIPs on 4 clients (NFSv3/NFSv4). While
linux untar, dbench, and iozone were running from 2 clients and parallel lookups
from the other 2 clients, the lookups failed on both of those clients.

The clients on which lookups failed ran the following sequences
   Client 1:
  -> while true;do find . -mindepth 1 -type f;done
  -> while true;do ls -lRt;done

   Client 2:
  -> find in a loop

Doing "ls" on the same mount point-

[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]#
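
For reference, "ls" prints that message when readdir(3) returns NULL with errno
set; a minimal illustration of the loop involved (plain libc, not ganesha code):

#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Minimal illustration: "ls" fails this way when readdir() returns
 * NULL with errno set, here surfaced as EINVAL by the NFS client. */
int main(void)
{
    DIR *d = opendir(".");
    struct dirent *e;

    if (d == NULL) {
        perror("opendir");
        return 1;
    }
    errno = 0;                      /* distinguish error from end-of-dir */
    while ((e = readdir(d)) != NULL)
        puts(e->d_name);
    int err = errno;                /* readdir sets errno on failure */
    closedir(d);
    if (err != 0)
        fprintf(stderr, "reading directory .: %s\n", strerror(err));
    return err != 0;
}

So the server-side readdir failure reaches the client as EINVAL, which ls
reports verbatim as "Invalid argument".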


Able to create files and dirs on same mount-

[root@dhcp47-33 mani-mount]# touch mani
[root@dhcp47-33 mani-mount]# touch mani1
[root@dhcp47-33 mani-mount]# touch mani2
[root@dhcp47-33 mani-mount]# touch mani3
[root@dhcp47-33 mani-mount]# mkdir ms1
[root@dhcp47-33 mani-mount]# mkdir ms2
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument


Another client on which lookups failed-

[root@dhcp46-20 mani-mount]# ^C
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument
[root@dhcp46-20 mani-mount]# ls
ls: reading directory .: Invalid argument



Unmounted and remounted the same volume on the same client with the same VIP.
The issue still exists.

Mounted the same volume on another client with the same VIP. Again, "ls" was
unable to list the contents.

Did "ls" from one of the client from which iozone was ongoing,able to get data-

mani-mount]# ls
dir1  f2           linux-4.9.5.tar.xz  mani1  mani3  ms2      test
f1    linux-4.9.5  mani                mani2  ms1    run6396  test1



Version-Release number of selected component (if applicable):

# rpm -qa | grep ganesha
nfs-ganesha-gluster-2.5.5-4.el7rhgs.x86_64
glusterfs-ganesha-3.12.2-7.el7rhgs.x86_64
nfs-ganesha-2.5.5-4.el7rhgs.x86_64


How reproducible:
1/1

Steps to Reproduce:
1. Create a 4-node ganesha cluster
2. Create a 2 x (2 + 1) arbiter volume
3. Export the volume via ganesha
4. Mount the volume on 4 clients with 4 different VIPs.
   2 clients with vers=3 and 2 clients with vers=4.0
5. Perform the following data set-
  -> Client 1 (v3): Run dbench first. Post completion, run iozone
  -> Client 2 (v4): lookups (find) and ls -lRt in a loop
  -> Client 3 (v3): lookups (find)
  -> Client 4 (v4): linux untars

Actual results:
Lookups failed from both of the clients performing lookups. No impact on
ongoing I/O.


Expected results:
Lookups should not fail.


Additional info:

Not able to find any error logs in ganesha-gfapi.log that would explain the
lookup failures. On all 4 server nodes, ganesha is up and running.


# showmount -e
Export list for dhcp37-120.lab.eng.blr.redhat.com:
/Ganesha-lock (everyone)
/mani-test1   (everyone)

------------------------------

[root@dhcp47-33 mani-mount]# ls
ls: reading directory .: Invalid argument
--- Additional comment from Jiffin on 2018-04-23 03:01:59 EDT ---

Reason for the error:
After performing the readdir call, ganesha's mdcache (not gluster's md-cache)
performs a getattr call on each entry of the dirent list to refresh its cache.
When the getattr call reaches fsal_gluster, it first performs glfs_h_stat. For
the directory "ms2" in the root (gfid: 59d7dc9b-e2ae-4bca-8b97-14539fe1aa7a),
one of the layers in the client stack returned EINVAL (I was not able to find
any packets related to this gfid). Only this server in the ganesha cluster has
the issue. I have since lost the setup in that state and could not determine
which layer returned EINVAL.

At the back end:

# getfattr -d -m "." -e hex ms2
# file: ms2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x59d7dc9be2ae4bca8b9714539fe1aa7a
trusted.glusterfs.dht=0x0000000000000000000000007ffffffe
trusted.glusterfs.dht.mds=0x00000000
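
For illustration, the hash range assigned to this brick can be read out of the
trusted.glusterfs.dht value. A minimal sketch, assuming the usual on-disk
encoding of four big-endian 32-bit words with the range start and stop in the
last two (the first two words are layout metadata):

#include <stdint.h>
#include <stdio.h>

/* Decode the 16-byte trusted.glusterfs.dht value shown above.
 * Assumption: four big-endian 32-bit words; the last two are the
 * hash-range start and stop assigned to this brick. */
int main(void)
{
    const uint8_t v[16] = {
        0x00, 0x00, 0x00, 0x00,  /* word 0 (metadata) */
        0x00, 0x00, 0x00, 0x00,  /* word 1 (metadata) */
        0x00, 0x00, 0x00, 0x00,  /* start */
        0x7f, 0xff, 0xff, 0xfe,  /* stop  */
    };
    uint32_t start = (uint32_t)v[8]  << 24 | v[9]  << 16 | v[10] << 8 | v[11];
    uint32_t stop  = (uint32_t)v[12] << 24 | v[13] << 16 | v[14] << 8 | v[15];

    /* 0x00000000..0x7ffffffe covers only the lower half of the 32-bit
     * hash space; if no other subvolume covers the rest,
     * dht_layout_normalize reports Holes=1, as in the logs below. */
    printf("start=0x%08x stop=0x%08x\n", start, stop);
    return 0;
}

The range ends at 0x7ffffffe, i.e. this brick covers only the lower half of the
hash space; the other subvolume must cover the remainder, and if its layout is
missing or unreachable we get exactly the Holes=1 anomalies logged below.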

I found the following messages in ganesha-gfapi.log:

[2018-04-19 16:50:33.923678] E [MSGID: 101046]
[dht-common.c:1857:dht_revalidate_cbk] 1-mani-test1-dht: dict is null
[2018-04-19 16:51:37.606081] I [MSGID: 109063]
[dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in
(null) (gfid = a0056ea8-18ac-431f-a2a0-b06a5355998f). Holes=1 overlaps=0
[2018-04-19 16:53:07.867398] I [MSGID: 109063]
[dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in
(null) (gfid = 859569c9-d4fa-49e0-b15a-5102b85f3c51). Holes=1 overlaps=0
[2018-04-19 16:55:28.636417] I [MSGID: 109063]
[dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in
(null) (gfid = 74f8d078-af5a-4cb2-9241-a1a080c47e7d). Holes=1 overlaps=0
[2018-04-19 16:59:05.204906] I [MSGID: 109063]
[dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in
(null) (gfid = 59e0bb2d-7ffa-444d-b071-69963db29047). Holes=1 overlaps=0
[2018-04-19 17:07:47.896369] I [MSGID: 109063]
[dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in
(null) (gfid = 29887570-967c-4603-a4bf-a55601b0d0f3). Holes=1 overlaps=0
[2018-04-19 17:10:27.273871] I [MSGID: 109063]
[dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in
(null) (gfid = 877b79e6-a47c-479d-b7b5-5879a4c21fca). Holes=1 overlaps=0
[2018-04-19 17:10:41.758168] I [MSGID: 109063]
[dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in
(null) (gfid = 59d7dc9b-e2ae-4bca-8b97-14539fe1aa7a). Holes=1 overlaps=0


@Manisha:

Since the setup was no longer in the same state, the priority of this bug
depends on how reproducible the issue is.

@Dang:

Is it okay to skip the entry whose "getattrs" failed from the directory list
and continue with the rest of the entries, instead of failing the entire
readdir operation?

@Nithya:
Have you encountered any similar issue with dht?

--- Additional comment from Manisha Saini on 2018-04-23 03:16:32 EDT ---

(In reply to Jiffin from comment #3)
> [snip: analysis quoted in full above]
>
> Since the setup was no longer in the same state, the priority of this bug
> depends on how reproducible the issue is.
Jiffin, when I shared the setup, it was in the same state as far as I could
tell. I don't know how the ganesha service crashed; that also needs to be
looked into.
Also, since we are unable to list any files from the mount point after
performing "ls", to me this stands as a blocker.

I will try to reproduce the issue, but considering the lack of QE bandwidth I
will try to update the BZ by 26th April EOD.

Keeping needinfo intact


--- Additional comment from Susant Kumar Palai on 2018-04-23 05:48:00 EDT ---

From the gfapi log:

[2018-04-19 16:18:17.188419] W [MSGID: 108001] [afr-common.c:5171:afr_notify]
0-mani-test1-replicate-0: Client-quorum is not met
[2018-04-19 16:18:17.188877] I [MSGID: 114018]
[client.c:2285:client_rpc_notify] 0-mani-test1-client-3: disconnected from
mani-test1-client-3. Client process will keep trying to connect to glusterd
until brick's port is available
[2018-04-19 16:18:17.188955] I [MSGID: 114018]
[client.c:2285:client_rpc_notify] 0-mani-test1-client-4: disconnected from
mani-test1-client-4. Client process will keep trying to connect to glusterd
until brick's port is available
[2018-04-19 16:18:17.188976] W [MSGID: 108001] [afr-common.c:5171:afr_notify]
0-mani-test1-replicate-1: Client-quorum is not met
[2018-04-19 16:18:17.188805] I [MSGID: 114018]
[client.c:2285:client_rpc_notify] 0-mani-test1-client-2: disconnected from
mani-test1-client-2. Client process will keep trying to connect to glusterd
until brick's port is available
[2018-04-19 16:18:17.189312] I [MSGID: 114018]
[client.c:2285:client_rpc_notify] 0-mani-test1-client-5: disconnected from
mani-test1-client-5. Client process will keep trying to connect to glusterd
until brick's port is available
[2018-04-19 16:18:17.189342] E [MSGID: 108006]
[afr-common.c:4944:__afr_handle_child_down_event] 0-mani-test1-replicate-1: All
subvolumes are down. Going offline until atleast one of them comes back up.
[2018-04-19 16:18:17.190301] E [MSGID: 108006]
[afr-common.c:4944:__afr_handle_child_down_event] 0-mani-test1-replicate-0: All
subvolumes are down. Going offline until atleast one of them comes back up.
The message "I [MSGID: 104043] [glfs-mgmt.c:628:glfs_mgmt_getspec_cbk] 0-gfapi:
No change in volfile, continuing" repeated 2 times between [2018-04-19
16:17:52.270376] and [2018-04-19 16:18:15.862041]
[2018-04-19 16:35:23.081937] I [MSGID: 109063]
[dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in
(null) (gfid = 7cdfc3f6-4337-4a50-a58f-0305e65cb0c0). Holes=1 overlaps=0
[2018-04-19 16:38:39.796313] I [MSGID: 109063]
[dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in
(null) (gfid = f4619ab7-cfc9-481c-91ad-a03fa096ecdc). Holes=1 overlaps=0
[2018-04-19 16:38:42.005686] I [MSGID: 109063]
[dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in
(null) (gfid = bbb57d4e-b895-4437-82bb-04c9b45a991b). Holes=1 overlaps=0
[2018-04-19 16:39:22.441298] I [MSGID: 109063]
[dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in
(null) (gfid = f6ac57d8-835d-4ad1-8eb2-2d970b14b312). Holes=1 overlaps=0
[2018-04-19 16:39:29.100457] I [MSGID: 109063]
[dht-layout.c:713:dht_layout_normalize] 1-mani-test1-dht: Found anomalies in
(null) (gfid = 29ee038c-9e56-4ff3-965a-e619d3c0eec3). Holes=1 overlaps=0

It seems the layout needed a heal and both servers went down. This will lead to
lookup failure on the root itself.

Having the setup would have helped confirm the layout issue and, further, any
client-server connection issue.

In my opinion, either the bricks were killed or there was a network partition,
and hence the problem.
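
To make the "Holes=1" messages concrete, here is a minimal sketch of the kind
of hole check dht_layout_normalize performs (an illustration only, not the
actual gluster source): after sorting the per-subvolume hash ranges, any gap in
coverage of the 32-bit hash space counts as a hole, and a directory whose
layout has holes that cannot be healed (e.g. because the covering bricks are
down) fails lookup.

#include <stdint.h>
#include <stddef.h>

/* Simplified per-subvolume hash range, as in a DHT layout. */
struct range {
    uint32_t start;
    uint32_t stop;
};

/* Count holes in a layout sorted by start (illustrative sketch; the
 * real check lives in dht-layout.c:dht_layout_normalize). */
static int count_holes(const struct range *r, size_t n)
{
    int holes = 0;

    if (n == 0)
        return 1;               /* nothing covers the hash space */
    if (r[0].start != 0)
        holes++;                /* gap before the first range */
    for (size_t i = 1; i < n; i++) {
        if (r[i].start > r[i - 1].stop + 1)
            holes++;            /* gap between consecutive ranges */
    }
    if (r[n - 1].stop != UINT32_MAX)
        holes++;                /* gap after the last range */
    return holes;
}

With only the 0x00000000-0x7ffffffe range visible (the backend xattr above) and
the replica pair covering the rest disconnected, such a check reports one hole,
matching the logged anomalies.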

--- Additional comment from Daniel Gryniewicz on 2018-04-23 09:49:13 EDT ---

MDCACHE doesn't do a getattrs. The attributes of the object referenced by the
dirent are passed back to MDCACHE in the callback by the FSAL. FSAL_GLUSTER
uses glfs_xreaddirplus_r() to get both the file handle and its attributes,
which are then passed back to MDCACHE. So no separate getattrs() should be
called.

That said, MDCACHE needs the attributes when it creates the object, so we can't
just skip the dirent.
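
For context, a minimal sketch of the readdirplus path described above
(simplified for illustration; FSAL_GLUSTER's real loop, error handling, and the
MDCACHE callback plumbing are omitted -- glfs_xreaddirplus_r() and its accessor
calls are the public gfapi API):

#include <glusterfs/api/glfs.h>
#include <dirent.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch: iterate a directory the way FSAL_GLUSTER does, fetching the
 * handle and attributes per entry in one call, so no separate getattr
 * is needed.  glfd is an open directory (from glfs_opendir). */
static int walk_dir(struct glfs_fd *glfd)
{
    struct dirent de, *res = NULL;
    struct glfs_xreaddirp_stat *xstat = NULL;
    uint32_t flags = GFAPI_XREADDIRP_STAT | GFAPI_XREADDIRP_HANDLE;
    int ret;

    while ((ret = glfs_xreaddirplus_r(glfd, flags, &xstat, &de, &res)) >= 0) {
        if (res == NULL)        /* end of directory */
            break;
        struct stat *st = glfs_xreaddirplus_get_stat(xstat);
        if (st)
            printf("%s ino=%ju\n", de.d_name, (uintmax_t)st->st_ino);
        glfs_free(xstat);
        xstat = NULL;
    }
    if (ret < 0) {
        /* A failure here is what ultimately surfaces to the client
         * as "ls: reading directory .: Invalid argument". */
        fprintf(stderr, "readdirplus failed: errno=%d\n", errno);
        return -1;
    }
    return 0;
}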

--- Additional comment from Manisha Saini on 2018-04-24 06:00:05 EDT ---

(In reply to Manisha Saini from comment #4)
> [snip: comments #3 and #4 quoted in full above]

--- Additional comment from Jiffin on 2018-06-06 02:59:38 EDT ---

I tried to recreate a similar scenario on the latest ganesha build (plus the
fix for 1580107) and was not able to reproduce the issue with bonnie + linux
untar (for 2 hrs, tried twice) on a 2 x (2 + 1) volume. Can you please retry
the following with the new build?
Please enable debug logging for ganesha and gfapi, and collect a packet capture
on the server where "ls -ltr" is performed.

Enable the debug log for gfapi -- set diagnostics.client-log-level to DEBUG
(e.g. gluster volume set <volname> diagnostics.client-log-level DEBUG)

Enable debug for ganesha on the readdir and cache-inode components by adding
the following to ganesha.conf:

LOG {
        ## Default log level for all components
        #Default_Log_Level = WARN;

        ## Configure per-component log levels.
        Components {
                CACHE_INODE = FULL_DEBUG;
                CACHE_INODE_LRU = FULL_DEBUG;
                NFS_READDIR = FULL_DEBUG;
        }
}

Please restart nfs-ganesha after that.

--- Additional comment from Manisha Saini on 2018-06-21 14:54:00 EDT ---

(In reply to Jiffin from comment #10)
> [snip: debug-logging instructions quoted in full above]



There are no logs generated on the Ganesha server nodes to which the
lookup-performing clients are mapped. The other nodes, which serve the dbench
and untar clients, have the logs in place.

Setup details are the same as in comment #17.

The client on which lookups return "Invalid argument":


dhcp47-170.lab.eng.blr.redhat.com - root/redhat

[root@dhcp47-170 readdir_test]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-170 readdir_test]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-170 readdir_test]# ls
ls: reading directory .: Invalid argument
[root@dhcp47-170 readdir_test]# ls
ls: reading directory .: Invalid argument


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1569657
[Bug 1569657] [Ganesha] While performing lookups from two of the clients,
"ls" command got failed with "Invalid argument"
-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are on the CC list for the bug.

