[Bugs] [Bug 1333645] New: NFS+attach tier:IOs hang while attach tier is issued

bugzilla at redhat.com bugzilla at redhat.com
Fri May 6 05:28:25 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1333645

            Bug ID: 1333645
           Summary: NFS+attach tier:IOs hang while attach tier is issued
           Product: GlusterFS
           Version: 3.7.11
         Component: tiering
          Keywords: ZStream
          Severity: urgent
          Priority: urgent
          Assignee: bugs at gluster.org
          Reporter: rkavunga at redhat.com
        QA Contact: bugs at gluster.org
                CC: bugs at gluster.org, byarlaga at redhat.com,
                    nchilaka at redhat.com, rcyriac at redhat.com,
                    rgowdapp at redhat.com, rkavunga at redhat.com,
                    skoduri at redhat.com, smohan at redhat.com
        Depends On: 1306194, 1311002
            Blocks: 1305205, 1306930



+++ This bug was initially created as a clone of Bug #1311002 +++

+++ This bug was initially created as a clone of Bug #1306194 +++

On a 16-node setup with an EC volume, I started IOs from 3 different clients.
While the IOs were in progress I attached a tier to the volume, and the IOs hung.

I tried this twice, and both times the IOs hung.

In 3.7.5-17 there used to be a temporary pause (about 5 min) when attach tier
was issued, but in this build (3.7.5-19) the IOs have been hung for more than 2
hours.




volinfo before and after attach tier:
gluster v create npcvol disperse 12 disperse-data 8
10.70.37.202:/bricks/brick1/npcvol 10.70.37.195:/bricks/brick1/npcvol
10.70.35.133:/bricks/brick1/npcvol 10.70.35.239:/bricks/brick1/npcvol
10.70.35.225:/bricks/brick1/npcvol 10.70.35.11:/bricks/brick1/npcvol
10.70.35.10:/bricks/brick1/npcvol 10.70.35.231:/bricks/brick1/npcvol
10.70.35.176:/bricks/brick1/npcvol 10.70.35.232:/bricks/brick1/npcvol
10.70.35.173:/bricks/brick1/npcvol 10.70.35.163:/bricks/brick1/npcvol
10.70.37.101:/bricks/brick1/npcvol 10.70.37.69:/bricks/brick1/npcvol
10.70.37.60:/bricks/brick1/npcvol 10.70.37.120:/bricks/brick1/npcvol
10.70.37.202:/bricks/brick2/npcvol 10.70.37.195:/bricks/brick2/npcvol
10.70.35.133:/bricks/brick2/npcvol 10.70.35.239:/bricks/brick2/npcvol
10.70.35.225:/bricks/brick2/npcvol 10.70.35.11:/bricks/brick2/npcvol
10.70.35.10:/bricks/brick2/npcvol 10.70.35.231:/bricks/brick2/npcvol

gluster volume tier npcvol attach rep 2 10.70.35.176:/bricks/brick7/npcvol_hot
10.70.35.232:/bricks/brick7/npcvol_hot 10.70.35.173:/bricks/brick7/npcvol_hot
10.70.35.163:/bricks/brick7/npcvol_hot 10.70.37.101:/bricks/brick7/npcvol_hot
10.70.37.69:/bricks/brick7/npcvol_hot 10.70.37.60:/bricks/brick7/npcvol_hot
10.70.37.120:/bricks/brick7/npcvol_hot 10.70.37.195:/bricks/brick7/npcvol_hot
10.70.37.202:/bricks/brick7/npcvol_hot 10.70.35.133:/bricks/brick7/npcvol_hot
10.70.35.239:/bricks/brick7/npcvol_hot



[root at dhcp37-202 ~]# gluster v status npcvol
Status of volume: npcvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.202:/bricks/brick1/npcvol    49161     0          Y       628  
Brick 10.70.37.195:/bricks/brick1/npcvol    49161     0          Y       30704
Brick 10.70.35.133:/bricks/brick1/npcvol    49158     0          Y       24148
Brick 10.70.35.239:/bricks/brick1/npcvol    49158     0          Y       24128
Brick 10.70.35.225:/bricks/brick1/npcvol    49157     0          Y       24467
Brick 10.70.35.11:/bricks/brick1/npcvol     49157     0          Y       24272
Brick 10.70.35.10:/bricks/brick1/npcvol     49160     0          Y       24369
Brick 10.70.35.231:/bricks/brick1/npcvol    49160     0          Y       32189
Brick 10.70.35.176:/bricks/brick1/npcvol    49161     0          Y       1392 
Brick 10.70.35.232:/bricks/brick1/npcvol    49161     0          Y       26630
Brick 10.70.35.173:/bricks/brick1/npcvol    49161     0          Y       28493
Brick 10.70.35.163:/bricks/brick1/npcvol    49161     0          Y       28592
Brick 10.70.37.101:/bricks/brick1/npcvol    49161     0          Y       28410
Brick 10.70.37.69:/bricks/brick1/npcvol     49161     0          Y       357  
Brick 10.70.37.60:/bricks/brick1/npcvol     49161     0          Y       31071
Brick 10.70.37.120:/bricks/brick1/npcvol    49176     0          Y       1311 
Brick 10.70.37.202:/bricks/brick2/npcvol    49162     0          Y       651  
Brick 10.70.37.195:/bricks/brick2/npcvol    49162     0          Y       30723
Brick 10.70.35.133:/bricks/brick2/npcvol    49159     0          Y       24167
Brick 10.70.35.239:/bricks/brick2/npcvol    49159     0          Y       24148
Brick 10.70.35.225:/bricks/brick2/npcvol    49158     0          Y       24486
Brick 10.70.35.11:/bricks/brick2/npcvol     49158     0          Y       24291
Brick 10.70.35.10:/bricks/brick2/npcvol     49161     0          Y       24388
Brick 10.70.35.231:/bricks/brick2/npcvol    49161     0          Y       32208
Snapshot Daemon on localhost                49163     0          Y       810  
NFS Server on localhost                     2049      0          Y       818  
Self-heal Daemon on localhost               N/A       N/A        Y       686  
Quota Daemon on localhost                   N/A       N/A        Y       859  
Snapshot Daemon on 10.70.37.101             49162     0          Y       28538
NFS Server on 10.70.37.101                  2049      0          Y       28546
Self-heal Daemon on 10.70.37.101            N/A       N/A        Y       28439
Quota Daemon on 10.70.37.101                N/A       N/A        Y       28576
Snapshot Daemon on 10.70.37.195             49163     0          Y       30851
NFS Server on 10.70.37.195                  2049      0          Y       30859
Self-heal Daemon on 10.70.37.195            N/A       N/A        Y       30751
Quota Daemon on 10.70.37.195                N/A       N/A        Y       30889
Snapshot Daemon on 10.70.37.120             49177     0          Y       1438 
NFS Server on 10.70.37.120                  2049      0          Y       1446 
Self-heal Daemon on 10.70.37.120            N/A       N/A        Y       1339 
Quota Daemon on 10.70.37.120                N/A       N/A        Y       1477 
Snapshot Daemon on 10.70.37.69              49162     0          Y       492  
NFS Server on 10.70.37.69                   2049      0          Y       500  
Self-heal Daemon on 10.70.37.69             N/A       N/A        Y       385  
Quota Daemon on 10.70.37.69                 N/A       N/A        Y       542  
Snapshot Daemon on 10.70.37.60              49162     0          Y       31197
NFS Server on 10.70.37.60                   2049      0          Y       31205
Self-heal Daemon on 10.70.37.60             N/A       N/A        Y       31099
Quota Daemon on 10.70.37.60                 N/A       N/A        Y       31235
Snapshot Daemon on 10.70.35.239             49160     0          Y       24287
NFS Server on 10.70.35.239                  2049      0          Y       24295
Self-heal Daemon on 10.70.35.239            N/A       N/A        Y       24176
Quota Daemon on 10.70.35.239                N/A       N/A        Y       24325
Snapshot Daemon on 10.70.35.231             49162     0          Y       32340
NFS Server on 10.70.35.231                  2049      0          Y       32348
Self-heal Daemon on 10.70.35.231            N/A       N/A        Y       32236
Quota Daemon on 10.70.35.231                N/A       N/A        Y       32389
Snapshot Daemon on 10.70.35.176             49162     0          Y       1535 
NFS Server on 10.70.35.176                  2049      0          Y       1545 
Self-heal Daemon on 10.70.35.176            N/A       N/A        Y       1420 
Quota Daemon on 10.70.35.176                N/A       N/A        Y       1589 
Snapshot Daemon on dhcp35-225.lab.eng.blr.r
edhat.com                                   49159     0          Y       24623
NFS Server on dhcp35-225.lab.eng.blr.redhat
.com                                        2049      0          Y       24631
Self-heal Daemon on dhcp35-225.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       24514
Quota Daemon on dhcp35-225.lab.eng.blr.redh
at.com                                      N/A       N/A        Y       24661
Snapshot Daemon on 10.70.35.232             49162     0          Y       26759
NFS Server on 10.70.35.232                  2049      0          Y       26767
Self-heal Daemon on 10.70.35.232            N/A       N/A        Y       26658
Quota Daemon on 10.70.35.232                N/A       N/A        Y       26805
Snapshot Daemon on 10.70.35.163             49162     0          Y       28721
NFS Server on 10.70.35.163                  2049      0          Y       28729
Self-heal Daemon on 10.70.35.163            N/A       N/A        Y       28620
Quota Daemon on 10.70.35.163                N/A       N/A        Y       28760
Snapshot Daemon on 10.70.35.11              49159     0          Y       24427
NFS Server on 10.70.35.11                   2049      0          Y       24435
Self-heal Daemon on 10.70.35.11             N/A       N/A        Y       24319
Quota Daemon on 10.70.35.11                 N/A       N/A        Y       24465
Snapshot Daemon on 10.70.35.10              49162     0          Y       24521
NFS Server on 10.70.35.10                   2049      0          Y       24529
Self-heal Daemon on 10.70.35.10             N/A       N/A        Y       24416
Quota Daemon on 10.70.35.10                 N/A       N/A        Y       24560
Snapshot Daemon on 10.70.35.133             49160     0          Y       24314
NFS Server on 10.70.35.133                  2049      0          Y       24322
Self-heal Daemon on 10.70.35.133            N/A       N/A        Y       24203
Quota Daemon on 10.70.35.133                N/A       N/A        Y       24352
Snapshot Daemon on 10.70.35.173             49162     0          Y       28625
NFS Server on 10.70.35.173                  2049      0          Y       28633
Self-heal Daemon on 10.70.35.173            N/A       N/A        Y       28521
Quota Daemon on 10.70.35.173                N/A       N/A        Y       28671

Task Status of Volume npcvol
------------------------------------------------------------------------------
There are no active volume tasks



#####after attach tier
[root at dhcp37-202 ~]# gluster v status npcvol
Status of volume: npcvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.35.239:/bricks/brick7/npcvol_ho
t                                           49161     0          Y       25252
Brick 10.70.35.133:/bricks/brick7/npcvol_ho
t                                           49161     0          Y       25276
Brick 10.70.37.202:/bricks/brick7/npcvol_ho
t                                           49164     0          Y       2028 
Brick 10.70.37.195:/bricks/brick7/npcvol_ho
t                                           49164     0          Y       31793
Brick 10.70.37.120:/bricks/brick7/npcvol_ho
t                                           49178     0          Y       2504 
Brick 10.70.37.60:/bricks/brick7/npcvol_hot 49163     0          Y       32188
Brick 10.70.37.69:/bricks/brick7/npcvol_hot 49163     0          Y       1548 
Brick 10.70.37.101:/bricks/brick7/npcvol_ho
t                                           49163     0          Y       29535
Brick 10.70.35.163:/bricks/brick7/npcvol_ho
t                                           49163     0          Y       29799
Brick 10.70.35.173:/bricks/brick7/npcvol_ho
t                                           49163     0          Y       29669
Brick 10.70.35.232:/bricks/brick7/npcvol_ho
t                                           49163     0          Y       27813
Brick 10.70.35.176:/bricks/brick7/npcvol_ho
t                                           49163     0          Y       2607 
Cold Bricks:
Brick 10.70.37.202:/bricks/brick1/npcvol    49161     0          Y       628  
Brick 10.70.37.195:/bricks/brick1/npcvol    49161     0          Y       30704
Brick 10.70.35.133:/bricks/brick1/npcvol    49158     0          Y       24148
Brick 10.70.35.239:/bricks/brick1/npcvol    49158     0          Y       24128
Brick 10.70.35.225:/bricks/brick1/npcvol    49157     0          Y       24467
Brick 10.70.35.11:/bricks/brick1/npcvol     49157     0          Y       24272
Brick 10.70.35.10:/bricks/brick1/npcvol     49160     0          Y       24369
Brick 10.70.35.231:/bricks/brick1/npcvol    49160     0          Y       32189
Brick 10.70.35.176:/bricks/brick1/npcvol    49161     0          Y       1392 
Brick 10.70.35.232:/bricks/brick1/npcvol    49161     0          Y       26630
Brick 10.70.35.173:/bricks/brick1/npcvol    49161     0          Y       28493
Brick 10.70.35.163:/bricks/brick1/npcvol    49161     0          Y       28592
Brick 10.70.37.101:/bricks/brick1/npcvol    49161     0          Y       28410
Brick 10.70.37.69:/bricks/brick1/npcvol     49161     0          Y       357  
Brick 10.70.37.60:/bricks/brick1/npcvol     49161     0          Y       31071
Brick 10.70.37.120:/bricks/brick1/npcvol    49176     0          Y       1311 
Brick 10.70.37.202:/bricks/brick2/npcvol    49162     0          Y       651  
Brick 10.70.37.195:/bricks/brick2/npcvol    49162     0          Y       30723
Brick 10.70.35.133:/bricks/brick2/npcvol    49159     0          Y       24167
Brick 10.70.35.239:/bricks/brick2/npcvol    49159     0          Y       24148
Brick 10.70.35.225:/bricks/brick2/npcvol    49158     0          Y       24486
Brick 10.70.35.11:/bricks/brick2/npcvol     49158     0          Y       24291
Brick 10.70.35.10:/bricks/brick2/npcvol     49161     0          Y       24388
Brick 10.70.35.231:/bricks/brick2/npcvol    49161     0          Y       32208
Snapshot Daemon on localhost                49163     0          Y       810  
NFS Server on localhost                     2049      0          Y       2048 
Self-heal Daemon on localhost               N/A       N/A        Y       2056 
Quota Daemon on localhost                   N/A       N/A        Y       2064 
Snapshot Daemon on 10.70.37.60              49162     0          Y       31197
NFS Server on 10.70.37.60                   2049      0          Y       32208
Self-heal Daemon on 10.70.37.60             N/A       N/A        Y       32216
Quota Daemon on 10.70.37.60                 N/A       N/A        Y       32224
Snapshot Daemon on 10.70.37.195             49163     0          Y       30851
NFS Server on 10.70.37.195                  2049      0          Y       31813
Self-heal Daemon on 10.70.37.195            N/A       N/A        Y       31821
Quota Daemon on 10.70.37.195                N/A       N/A        Y       31829
Snapshot Daemon on 10.70.37.120             49177     0          Y       1438 
NFS Server on 10.70.37.120                  2049      0          Y       2524 
Self-heal Daemon on 10.70.37.120            N/A       N/A        Y       2532 
Quota Daemon on 10.70.37.120                N/A       N/A        Y       2540 
Snapshot Daemon on 10.70.37.101             49162     0          Y       28538
NFS Server on 10.70.37.101                  2049      0          Y       29555
Self-heal Daemon on 10.70.37.101            N/A       N/A        Y       29563
Quota Daemon on 10.70.37.101                N/A       N/A        Y       29571
Snapshot Daemon on 10.70.37.69              49162     0          Y       492  
NFS Server on 10.70.37.69                   2049      0          Y       1574 
Self-heal Daemon on 10.70.37.69             N/A       N/A        Y       1582 
Quota Daemon on 10.70.37.69                 N/A       N/A        Y       1590 
Snapshot Daemon on 10.70.35.173             49162     0          Y       28625
NFS Server on 10.70.35.173                  2049      0          Y       29690
Self-heal Daemon on 10.70.35.173            N/A       N/A        Y       29698
Quota Daemon on 10.70.35.173                N/A       N/A        Y       29713
Snapshot Daemon on 10.70.35.231             49162     0          Y       32340
NFS Server on 10.70.35.231                  2049      0          Y       1022 
Self-heal Daemon on 10.70.35.231            N/A       N/A        Y       1033 
Quota Daemon on 10.70.35.231                N/A       N/A        Y       1043 
Snapshot Daemon on 10.70.35.176             49162     0          Y       1535 
NFS Server on 10.70.35.176                  2049      0          Y       2627 
Self-heal Daemon on 10.70.35.176            N/A       N/A        Y       2635 
Quota Daemon on 10.70.35.176                N/A       N/A        Y       2659 
Snapshot Daemon on 10.70.35.239             49160     0          Y       24287
NFS Server on 10.70.35.239                  2049      0          Y       25272
Self-heal Daemon on 10.70.35.239            N/A       N/A        Y       25280
Quota Daemon on 10.70.35.239                N/A       N/A        Y       25288
Snapshot Daemon on dhcp35-225.lab.eng.blr.r
edhat.com                                   49159     0          Y       24623
NFS Server on dhcp35-225.lab.eng.blr.redhat
.com                                        2049      0          Y       25622
Self-heal Daemon on dhcp35-225.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       25630
Quota Daemon on dhcp35-225.lab.eng.blr.redh
at.com                                      N/A       N/A        Y       25638
Snapshot Daemon on 10.70.35.11              49159     0          Y       24427
NFS Server on 10.70.35.11                   2049      0          Y       25455
Self-heal Daemon on 10.70.35.11             N/A       N/A        Y       25463
Quota Daemon on 10.70.35.11                 N/A       N/A        Y       25471
Snapshot Daemon on 10.70.35.133             49160     0          Y       24314
NFS Server on 10.70.35.133                  2049      0          Y       25296
Self-heal Daemon on 10.70.35.133            N/A       N/A        Y       25304
Quota Daemon on 10.70.35.133                N/A       N/A        Y       25312
Snapshot Daemon on 10.70.35.10              49162     0          Y       24521
NFS Server on 10.70.35.10                   2049      0          Y       25578
Self-heal Daemon on 10.70.35.10             N/A       N/A        Y       25586
Quota Daemon on 10.70.35.10                 N/A       N/A        Y       25594
Snapshot Daemon on 10.70.35.232             49162     0          Y       26759
NFS Server on 10.70.35.232                  2049      0          Y       27833
Self-heal Daemon on 10.70.35.232            N/A       N/A        Y       27841
Quota Daemon on 10.70.35.232                N/A       N/A        Y       27866
Snapshot Daemon on 10.70.35.163             49162     0          Y       28721
NFS Server on 10.70.35.163                  2049      0          Y       29819
Self-heal Daemon on 10.70.35.163            N/A       N/A        Y       29827
Quota Daemon on 10.70.35.163                N/A       N/A        Y       29852

Task Status of Volume npcvol
------------------------------------------------------------------------------
Task                 : Tier migration      
ID                   : 524ad8fe-a743-47df-a4e9-edd2db05c60b
Status               : in progress         






The following IOs were triggered before attach tier and were still running
while the attach was issued:
1) Client 1: created a 300 MB file and started copying it to new files
for i in {2..50};do cp hlfile.1 hlfile.$i;done

2) Client 2: created a 50 MB file and ran a continuous copy loop on it
for i in {2..1000};do cp rename.1 rename.$i;done

3) Client 3: Linux kernel untar, plus copying a 3 GB file in a loop to create
new files
for i in {1..10};do cp File.mkv cheema$i.mkv;done

4) Client 4: created 10000 zero-byte files and then triggered removal of 5000
of them, so that the removal runs during attach tier
[root at rhs-client30 zerobyte]# rm -rf zb{5000..10000}

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-02-10
04:45:45 EST ---

This bug is automatically being proposed for the current z-stream release of
Red Hat Gluster Storage 3 by setting the release flag 'rhgs‑3.1.z' to '?'. 

If this bug should be proposed for a different release, please manually change
the proposed release flag.

--- Additional comment from nchilaka on 2016-02-10 05:18:43 EST ---

sosreports of both clients and servers available at 
[nchilaka at rhsqe-repo nchilaka]$ chmod -R 0777 bug.1306194
[nchilaka at rhsqe-repo nchilaka]$ pwd
/home/repo/sosreports/nchilaka

--- Additional comment from Mohammed Rafi KC on 2016-02-10 11:13:31 EST ---

There is a blocking lock held on one of the bricks which is not being released.
All of the other clients are waiting on this lock. We could not identify the
owner of the lock because, by the time we looked, the ping timer had expired
and the lock had been released.

After that the I/Os resumed. We need to find out which client acquired the lock
and why it was not releasing it.

--- Additional comment from Soumya Koduri on 2016-02-11 09:00:05 EST ---

When we tried to reproduce the issue, we saw "Stale File Handle" errors after
attach-tier. While doing RCA using gdb, we found that ESTALE is returned via
svc_client (which is enabled by USS). So we disabled USS and re-ran the test.
Now we see the mount points hang.


On the server side, the volume got unexported -

[skoduri at skoduri ~]$ showmount -e 10.70.35.225
Export list for 10.70.35.225:
[skoduri at skoduri ~]$ 


Tracing back from the logs and the code, 

[2016-02-11 13:26:02.540565] E [MSGID: 112070] [nfs3.c:896:nfs3_getattr]
0-nfs-nfsv3: Volume is disabled: finalvol
[2016-02-11 13:28:02.600425] E [MSGID: 112070] [nfs3.c:896:nfs3_getattr]
0-nfs-nfsv3: Volume is disabled: finalvol
[2016-02-11 13:28:02.600546] E [rpcsvc.c:565:rpcsvc_check_and_reply_error]
0-rpcsvc: rpc actor failed to complete successfully


This message is logged when the volume is not in the nfs->initedxl[] list. This
list is updated as part of nfs_startup_subvolume(), which is invoked on
notification of "GF_EVENT_CHILD_UP". So we suspect that the nfs xlator has not
received this event, which resulted in this volume being left in an unexported
state. Attaching the nfs log for further debugging.
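The export check described above can be sketched as follows. This is a hedged, illustrative model (class and method names are mine, not the actual nfs xlator code): a volume is served only if it is present in the inited-subvolume list, which is populated on GF_EVENT_CHILD_UP, so a missed event leaves the volume "disabled".

```python
# Illustrative sketch, not GlusterFS source: models why a missed
# GF_EVENT_CHILD_UP leaves a volume unexported on the NFS server.

class NfsServer:
    def __init__(self):
        self.initedxl = set()   # subvolumes that completed startup

    def notify_child_up(self, volume):
        # nfs_startup_subvolume() runs on CHILD_UP and marks the
        # volume as exported
        self.initedxl.add(volume)

    def getattr(self, volume):
        if volume not in self.initedxl:
            return "Volume is disabled: %s" % volume
        return "OK"

nfs = NfsServer()
# The CHILD_UP event for 'finalvol' is never delivered (the suspected
# bug), so every request fails with the error seen in the log above.
print(nfs.getattr("finalvol"))   # Volume is disabled: finalvol
nfs.notify_child_up("finalvol")
print(nfs.getattr("finalvol"))   # OK
```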

--- Additional comment from Mohammed Rafi KC on 2016-02-11 09:55:09 EST ---

During the nfs graph initialization, we do a lookup on the root. It looks like
this lookup is blocked on a lock held by another nfs process. We need to figure
out why the nfs server that acquired the lock failed to unlock it.

--- Additional comment from Raghavendra G on 2016-02-12 01:43:36 EST ---

Rafi reported that stale locks or unlock failures are seen even while the first
lookup on root is happening. Here is the most likely RCA. I am assuming
"tier-dht" has two dht subvols, "hot-dht" and "cold-dht". Also, the stale lock
is found on one of the bricks corresponding to hot-dht.

1. Lookup on / on tier-dht.
2. Lookup is wound to hashed subvol - cold-dht and is successful.
3. tier-dht figures out / is a directory and does a lookup on both hot-dht and
cold-dht.
4. on hot-dht, some subvols - say c1, c2 - are down. But lookup is still
successful as some other subvols (say c3, c4) are up.
5. lookup on / is successful on cold-dht.
6. tier-dht decides it needs to heal layout of "/".

From here I am skipping events on cold-dht as they are irrelevant for this RCA.

7. tier-dht winds inodelk on hot-dht. hot-dht winds it to the first subvol in
the layout list (say c1 in this case). Note that subvols with 0 ranges are
stored at the beginning of the list. All the subvols where lookup failed (say
because of ENOTCONN) end up with 0 ranges. The relative order of subvols with 0
ranges is undefined and depends on whose lookup failed first.
8. c1 comes up
9. hot-dht acquires lock on c1.
10. tier-dht tries to refresh its layout of /. Winds lookup on hot and cold
dhts again.
11. hot-dht sees that the layout's generation number is lagging behind the
current generation number (as c1 came up after the lookup on / completed). It
issues a fresh lookup and reconstructs the layout for /. Since c2 is still
down, it is pushed to the beginning of the layout's subvol list.
12. tier-dht is done with healing. It issues unlock on hot-dht.
13. hot-dht winds unlock call to first subvol in layout of /, which is c2.
14. unlock fails with ENOTCONN and a stale lock is left on c1.
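The stale-lock sequence above can be simulated with a short sketch. This is a hedged model, not GlusterFS code (Subvol, inodelk, and the ordering helper are illustrative names): the lock is acquired on the first subvol of the old layout (c1), the layout is rebuilt with the still-down c2 at the front, and the unlock is then wound to c2 and fails, leaving the lock on c1.

```python
# Illustrative simulation of steps 7-14: unlock is wound to the first
# subvol of the *refreshed* layout, not the one actually locked.

class Subvol:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.locked = False

def zero_ranges_first(subvols):
    # Subvols whose lookup failed get 0 ranges and sort to the front;
    # the relative order among them is undefined.
    return sorted(subvols, key=lambda s: s.up)

def inodelk(layout):
    target = layout[0]            # lock goes to first subvol in layout
    if not target.up:
        raise ConnectionError("ENOTCONN")
    target.locked = True
    return target

def unlock_first(layout):
    target = layout[0]            # unlock also goes to first subvol
    if not target.up:
        raise ConnectionError("ENOTCONN")
    target.locked = False

c1, c2 = Subvol("c1", up=False), Subvol("c2", up=False)
c3, c4 = Subvol("c3"), Subvol("c4")

# Steps 7-9: c1 comes up; the lock lands on c1, first in the old layout.
layout = [c1, c2, c3, c4]
c1.up = True
locked_on = inodelk(layout)

# Steps 10-12: layout is refreshed; c2 (still down) is now first.
layout = zero_ranges_first([c1, c2, c3, c4])

# Steps 13-14: unlock is wound to c2 and fails with ENOTCONN.
try:
    unlock_first(layout)
except ConnectionError:
    pass

print(locked_on.name, locked_on.locked)   # c1 True -> stale lock on c1
```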

--- Additional comment from Raghavendra G on 2016-02-12 01:46:00 EST ---

steps 7 and 8 can be swapped for more clarity and RCA is still valid

--- Additional comment from Mohammed Rafi KC on 2016-02-12 05:04:36 EST ---

Based on comment 6, this could be an intrusive fix that requires testing for
pure dht and tier as well. A way to recover from this hang without interrupting
application continuity is to restart the nfs server, which can be done with
"volume start force". This restarts only the nfs server, provided no other
process requires a restart.
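The recovery described above looks like the following on this setup (volume name and address taken from the transcript earlier in this report; this is an operations fragment, not a tested script):

```shell
# 'start force' on an already-started volume restarts only the
# ancillary daemons (including the NFS server), not the bricks
gluster volume start npcvol force

# Verify the NFS server came back (new PID on the "NFS Server" rows)
# and that the volume is exported again
gluster volume status npcvol
showmount -e 10.70.35.225
```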

--- Additional comment from Laura Bailey on 2016-02-14 21:45:57 EST ---

Rafi, based on https://bugzilla.redhat.com/show_bug.cgi?id=1303045#c3, I
shouldn't document this as a known issue, right?

--- Additional comment from Mohammed Rafi KC on 2016-02-15 01:15:28 EST ---

Yes, I have included this as part of bug 1303045.

--- Additional comment from Laura Bailey on 2016-02-15 20:07:44 EST ---

Thanks Rafi, removing this from the tracker bug.

--- Additional comment from nchilaka on 2016-02-17 01:17:25 EST ---

Workaround testing: I tested the workaround by restarting the volume using
force. The IOs resumed, which means the workaround works, but there is a small
problem which has been discussed, for which bz#1309186 ("file creates fail with
'failed to open <filename>: Too many levels of symbolic links' for file
create/write when restarting NFS using vol start force") has been raised.

--- Additional comment from Vijay Bellur on 2016-02-23 02:36:21 EST ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#1) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-02-27 13:42:49 EST ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#2) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-04 00:56:10 EST ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#3) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-04 11:45:18 EST ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#4) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-08 16:49:15 EST ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#5) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-09 01:56:50 EST ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#6) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-15 02:15:00 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#7) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-16 08:13:52 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#8) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-03 07:01:09 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#9) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-04 08:42:37 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#10) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-05 06:54:34 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#11) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-05 09:34:58 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#12) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-05 13:33:55 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#13) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Vijay Bellur on 2016-05-06 01:16:30 EDT ---

REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#14) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1305205
[Bug 1305205] NFS mount hangs on a tiered volume
https://bugzilla.redhat.com/show_bug.cgi?id=1306194
[Bug 1306194] NFS+attach tier:IOs hang while attach tier is issued
https://bugzilla.redhat.com/show_bug.cgi?id=1306930
[Bug 1306930] Writes on files are hung from nfs mount on performing
attach-tier
https://bugzilla.redhat.com/show_bug.cgi?id=1311002
[Bug 1311002] NFS+attach tier:IOs hang while attach tier is issued
-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are on the CC list for the bug.
You are the assignee for the bug.