[Bugs] [Bug 1347524] New: NFS+attach tier:IOs hang while attach tier is issued
bugzilla at redhat.com
Fri Jun 17 07:25:35 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1347524
Bug ID: 1347524
Summary: NFS+attach tier:IOs hang while attach tier is issued
Product: GlusterFS
Version: 3.8.0
Component: tiering
Keywords: ZStream
Severity: urgent
Priority: urgent
Assignee: bugs at gluster.org
Reporter: rkavunga at redhat.com
QA Contact: bugs at gluster.org
CC: bugs at gluster.org, byarlaga at redhat.com,
nchilaka at redhat.com, rcyriac at redhat.com,
rgowdapp at redhat.com, rkavunga at redhat.com,
skoduri at redhat.com, smohan at redhat.com
Depends On: 1306194, 1311002, 1333645
Blocks: 1305205, 1306930
+++ This bug was initially created as a clone of Bug #1333645 +++
+++ This bug was initially created as a clone of Bug #1311002 +++
+++ This bug was initially created as a clone of Bug #1306194 +++
On a 16-node setup with an EC volume, I started I/Os from 3 different clients.
While the I/Os were in progress I attached a tier to the volume, and the I/Os hung.
I tried this twice and both times the I/Os hung.
In 3.7.5-17 there used to be a temporary pause (about 5 minutes) when attach tier
was issued, but in this build (3.7.5-19) the I/Os have been hung for more than 2 hours.
volinfo before and after attach tier:
gluster v create npcvol disperse 12 disperse-data 8
10.70.37.202:/bricks/brick1/npcvol 10.70.37.195:/bricks/brick1/npcvol
10.70.35.133:/bricks/brick1/npcvol 10.70.35.239:/bricks/brick1/npcvol
10.70.35.225:/bricks/brick1/npcvol 10.70.35.11:/bricks/brick1/npcvol
10.70.35.10:/bricks/brick1/npcvol 10.70.35.231:/bricks/brick1/npcvol
10.70.35.176:/bricks/brick1/npcvol 10.70.35.232:/bricks/brick1/npcvol
10.70.35.173:/bricks/brick1/npcvol 10.70.35.163:/bricks/brick1/npcvol
10.70.37.101:/bricks/brick1/npcvol 10.70.37.69:/bricks/brick1/npcvol
10.70.37.60:/bricks/brick1/npcvol 10.70.37.120:/bricks/brick1/npcvol
10.70.37.202:/bricks/brick2/npcvol 10.70.37.195:/bricks/brick2/npcvol
10.70.35.133:/bricks/brick2/npcvol 10.70.35.239:/bricks/brick2/npcvol
10.70.35.225:/bricks/brick2/npcvol 10.70.35.11:/bricks/brick2/npcvol
10.70.35.10:/bricks/brick2/npcvol 10.70.35.231:/bricks/brick2/npcvol
gluster volume tier npcvol attach rep 2 10.70.35.176:/bricks/brick7/npcvol_hot
10.70.35.232:/bricks/brick7/npcvol_hot 10.70.35.173:/bricks/brick7/npcvol_hot
10.70.35.163:/bricks/brick7/npcvol_hot 10.70.37.101:/bricks/brick7/npcvol_hot
10.70.37.69:/bricks/brick7/npcvol_hot 10.70.37.60:/bricks/brick7/npcvol_hot
10.70.37.120:/bricks/brick7/npcvol_hot 10.70.37.195:/bricks/brick7/npcvol_hot
10.70.37.202:/bricks/brick7/npcvol_hot 10.70.35.133:/bricks/brick7/npcvol_hot
10.70.35.239:/bricks/brick7/npcvol_hot
[root at dhcp37-202 ~]# gluster v status npcvol
Status of volume: npcvol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.37.202:/bricks/brick1/npcvol 49161 0 Y 628
Brick 10.70.37.195:/bricks/brick1/npcvol 49161 0 Y 30704
Brick 10.70.35.133:/bricks/brick1/npcvol 49158 0 Y 24148
Brick 10.70.35.239:/bricks/brick1/npcvol 49158 0 Y 24128
Brick 10.70.35.225:/bricks/brick1/npcvol 49157 0 Y 24467
Brick 10.70.35.11:/bricks/brick1/npcvol 49157 0 Y 24272
Brick 10.70.35.10:/bricks/brick1/npcvol 49160 0 Y 24369
Brick 10.70.35.231:/bricks/brick1/npcvol 49160 0 Y 32189
Brick 10.70.35.176:/bricks/brick1/npcvol 49161 0 Y 1392
Brick 10.70.35.232:/bricks/brick1/npcvol 49161 0 Y 26630
Brick 10.70.35.173:/bricks/brick1/npcvol 49161 0 Y 28493
Brick 10.70.35.163:/bricks/brick1/npcvol 49161 0 Y 28592
Brick 10.70.37.101:/bricks/brick1/npcvol 49161 0 Y 28410
Brick 10.70.37.69:/bricks/brick1/npcvol 49161 0 Y 357
Brick 10.70.37.60:/bricks/brick1/npcvol 49161 0 Y 31071
Brick 10.70.37.120:/bricks/brick1/npcvol 49176 0 Y 1311
Brick 10.70.37.202:/bricks/brick2/npcvol 49162 0 Y 651
Brick 10.70.37.195:/bricks/brick2/npcvol 49162 0 Y 30723
Brick 10.70.35.133:/bricks/brick2/npcvol 49159 0 Y 24167
Brick 10.70.35.239:/bricks/brick2/npcvol 49159 0 Y 24148
Brick 10.70.35.225:/bricks/brick2/npcvol 49158 0 Y 24486
Brick 10.70.35.11:/bricks/brick2/npcvol 49158 0 Y 24291
Brick 10.70.35.10:/bricks/brick2/npcvol 49161 0 Y 24388
Brick 10.70.35.231:/bricks/brick2/npcvol 49161 0 Y 32208
Snapshot Daemon on localhost 49163 0 Y 810
NFS Server on localhost 2049 0 Y 818
Self-heal Daemon on localhost N/A N/A Y 686
Quota Daemon on localhost N/A N/A Y 859
Snapshot Daemon on 10.70.37.101 49162 0 Y 28538
NFS Server on 10.70.37.101 2049 0 Y 28546
Self-heal Daemon on 10.70.37.101 N/A N/A Y 28439
Quota Daemon on 10.70.37.101 N/A N/A Y 28576
Snapshot Daemon on 10.70.37.195 49163 0 Y 30851
NFS Server on 10.70.37.195 2049 0 Y 30859
Self-heal Daemon on 10.70.37.195 N/A N/A Y 30751
Quota Daemon on 10.70.37.195 N/A N/A Y 30889
Snapshot Daemon on 10.70.37.120 49177 0 Y 1438
NFS Server on 10.70.37.120 2049 0 Y 1446
Self-heal Daemon on 10.70.37.120 N/A N/A Y 1339
Quota Daemon on 10.70.37.120 N/A N/A Y 1477
Snapshot Daemon on 10.70.37.69 49162 0 Y 492
NFS Server on 10.70.37.69 2049 0 Y 500
Self-heal Daemon on 10.70.37.69 N/A N/A Y 385
Quota Daemon on 10.70.37.69 N/A N/A Y 542
Snapshot Daemon on 10.70.37.60 49162 0 Y 31197
NFS Server on 10.70.37.60 2049 0 Y 31205
Self-heal Daemon on 10.70.37.60 N/A N/A Y 31099
Quota Daemon on 10.70.37.60 N/A N/A Y 31235
Snapshot Daemon on 10.70.35.239 49160 0 Y 24287
NFS Server on 10.70.35.239 2049 0 Y 24295
Self-heal Daemon on 10.70.35.239 N/A N/A Y 24176
Quota Daemon on 10.70.35.239 N/A N/A Y 24325
Snapshot Daemon on 10.70.35.231 49162 0 Y 32340
NFS Server on 10.70.35.231 2049 0 Y 32348
Self-heal Daemon on 10.70.35.231 N/A N/A Y 32236
Quota Daemon on 10.70.35.231 N/A N/A Y 32389
Snapshot Daemon on 10.70.35.176 49162 0 Y 1535
NFS Server on 10.70.35.176 2049 0 Y 1545
Self-heal Daemon on 10.70.35.176 N/A N/A Y 1420
Quota Daemon on 10.70.35.176 N/A N/A Y 1589
Snapshot Daemon on dhcp35-225.lab.eng.blr.redhat.com 49159 0 Y 24623
NFS Server on dhcp35-225.lab.eng.blr.redhat.com 2049 0 Y 24631
Self-heal Daemon on dhcp35-225.lab.eng.blr.redhat.com N/A N/A Y 24514
Quota Daemon on dhcp35-225.lab.eng.blr.redhat.com N/A N/A Y 24661
Snapshot Daemon on 10.70.35.232 49162 0 Y 26759
NFS Server on 10.70.35.232 2049 0 Y 26767
Self-heal Daemon on 10.70.35.232 N/A N/A Y 26658
Quota Daemon on 10.70.35.232 N/A N/A Y 26805
Snapshot Daemon on 10.70.35.163 49162 0 Y 28721
NFS Server on 10.70.35.163 2049 0 Y 28729
Self-heal Daemon on 10.70.35.163 N/A N/A Y 28620
Quota Daemon on 10.70.35.163 N/A N/A Y 28760
Snapshot Daemon on 10.70.35.11 49159 0 Y 24427
NFS Server on 10.70.35.11 2049 0 Y 24435
Self-heal Daemon on 10.70.35.11 N/A N/A Y 24319
Quota Daemon on 10.70.35.11 N/A N/A Y 24465
Snapshot Daemon on 10.70.35.10 49162 0 Y 24521
NFS Server on 10.70.35.10 2049 0 Y 24529
Self-heal Daemon on 10.70.35.10 N/A N/A Y 24416
Quota Daemon on 10.70.35.10 N/A N/A Y 24560
Snapshot Daemon on 10.70.35.133 49160 0 Y 24314
NFS Server on 10.70.35.133 2049 0 Y 24322
Self-heal Daemon on 10.70.35.133 N/A N/A Y 24203
Quota Daemon on 10.70.35.133 N/A N/A Y 24352
Snapshot Daemon on 10.70.35.173 49162 0 Y 28625
NFS Server on 10.70.35.173 2049 0 Y 28633
Self-heal Daemon on 10.70.35.173 N/A N/A Y 28521
Quota Daemon on 10.70.35.173 N/A N/A Y 28671
Task Status of Volume npcvol
------------------------------------------------------------------------------
There are no active volume tasks
##### After attach tier
[root at dhcp37-202 ~]# gluster v status npcvol
Status of volume: npcvol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.35.239:/bricks/brick7/npcvol_hot 49161 0 Y 25252
Brick 10.70.35.133:/bricks/brick7/npcvol_hot 49161 0 Y 25276
Brick 10.70.37.202:/bricks/brick7/npcvol_hot 49164 0 Y 2028
Brick 10.70.37.195:/bricks/brick7/npcvol_hot 49164 0 Y 31793
Brick 10.70.37.120:/bricks/brick7/npcvol_hot 49178 0 Y 2504
Brick 10.70.37.60:/bricks/brick7/npcvol_hot 49163 0 Y 32188
Brick 10.70.37.69:/bricks/brick7/npcvol_hot 49163 0 Y 1548
Brick 10.70.37.101:/bricks/brick7/npcvol_hot 49163 0 Y 29535
Brick 10.70.35.163:/bricks/brick7/npcvol_hot 49163 0 Y 29799
Brick 10.70.35.173:/bricks/brick7/npcvol_hot 49163 0 Y 29669
Brick 10.70.35.232:/bricks/brick7/npcvol_hot 49163 0 Y 27813
Brick 10.70.35.176:/bricks/brick7/npcvol_hot 49163 0 Y 2607
Cold Bricks:
Brick 10.70.37.202:/bricks/brick1/npcvol 49161 0 Y 628
Brick 10.70.37.195:/bricks/brick1/npcvol 49161 0 Y 30704
Brick 10.70.35.133:/bricks/brick1/npcvol 49158 0 Y 24148
Brick 10.70.35.239:/bricks/brick1/npcvol 49158 0 Y 24128
Brick 10.70.35.225:/bricks/brick1/npcvol 49157 0 Y 24467
Brick 10.70.35.11:/bricks/brick1/npcvol 49157 0 Y 24272
Brick 10.70.35.10:/bricks/brick1/npcvol 49160 0 Y 24369
Brick 10.70.35.231:/bricks/brick1/npcvol 49160 0 Y 32189
Brick 10.70.35.176:/bricks/brick1/npcvol 49161 0 Y 1392
Brick 10.70.35.232:/bricks/brick1/npcvol 49161 0 Y 26630
Brick 10.70.35.173:/bricks/brick1/npcvol 49161 0 Y 28493
Brick 10.70.35.163:/bricks/brick1/npcvol 49161 0 Y 28592
Brick 10.70.37.101:/bricks/brick1/npcvol 49161 0 Y 28410
Brick 10.70.37.69:/bricks/brick1/npcvol 49161 0 Y 357
Brick 10.70.37.60:/bricks/brick1/npcvol 49161 0 Y 31071
Brick 10.70.37.120:/bricks/brick1/npcvol 49176 0 Y 1311
Brick 10.70.37.202:/bricks/brick2/npcvol 49162 0 Y 651
Brick 10.70.37.195:/bricks/brick2/npcvol 49162 0 Y 30723
Brick 10.70.35.133:/bricks/brick2/npcvol 49159 0 Y 24167
Brick 10.70.35.239:/bricks/brick2/npcvol 49159 0 Y 24148
Brick 10.70.35.225:/bricks/brick2/npcvol 49158 0 Y 24486
Brick 10.70.35.11:/bricks/brick2/npcvol 49158 0 Y 24291
Brick 10.70.35.10:/bricks/brick2/npcvol 49161 0 Y 24388
Brick 10.70.35.231:/bricks/brick2/npcvol 49161 0 Y 32208
Snapshot Daemon on localhost 49163 0 Y 810
NFS Server on localhost 2049 0 Y 2048
Self-heal Daemon on localhost N/A N/A Y 2056
Quota Daemon on localhost N/A N/A Y 2064
Snapshot Daemon on 10.70.37.60 49162 0 Y 31197
NFS Server on 10.70.37.60 2049 0 Y 32208
Self-heal Daemon on 10.70.37.60 N/A N/A Y 32216
Quota Daemon on 10.70.37.60 N/A N/A Y 32224
Snapshot Daemon on 10.70.37.195 49163 0 Y 30851
NFS Server on 10.70.37.195 2049 0 Y 31813
Self-heal Daemon on 10.70.37.195 N/A N/A Y 31821
Quota Daemon on 10.70.37.195 N/A N/A Y 31829
Snapshot Daemon on 10.70.37.120 49177 0 Y 1438
NFS Server on 10.70.37.120 2049 0 Y 2524
Self-heal Daemon on 10.70.37.120 N/A N/A Y 2532
Quota Daemon on 10.70.37.120 N/A N/A Y 2540
Snapshot Daemon on 10.70.37.101 49162 0 Y 28538
NFS Server on 10.70.37.101 2049 0 Y 29555
Self-heal Daemon on 10.70.37.101 N/A N/A Y 29563
Quota Daemon on 10.70.37.101 N/A N/A Y 29571
Snapshot Daemon on 10.70.37.69 49162 0 Y 492
NFS Server on 10.70.37.69 2049 0 Y 1574
Self-heal Daemon on 10.70.37.69 N/A N/A Y 1582
Quota Daemon on 10.70.37.69 N/A N/A Y 1590
Snapshot Daemon on 10.70.35.173 49162 0 Y 28625
NFS Server on 10.70.35.173 2049 0 Y 29690
Self-heal Daemon on 10.70.35.173 N/A N/A Y 29698
Quota Daemon on 10.70.35.173 N/A N/A Y 29713
Snapshot Daemon on 10.70.35.231 49162 0 Y 32340
NFS Server on 10.70.35.231 2049 0 Y 1022
Self-heal Daemon on 10.70.35.231 N/A N/A Y 1033
Quota Daemon on 10.70.35.231 N/A N/A Y 1043
Snapshot Daemon on 10.70.35.176 49162 0 Y 1535
NFS Server on 10.70.35.176 2049 0 Y 2627
Self-heal Daemon on 10.70.35.176 N/A N/A Y 2635
Quota Daemon on 10.70.35.176 N/A N/A Y 2659
Snapshot Daemon on 10.70.35.239 49160 0 Y 24287
NFS Server on 10.70.35.239 2049 0 Y 25272
Self-heal Daemon on 10.70.35.239 N/A N/A Y 25280
Quota Daemon on 10.70.35.239 N/A N/A Y 25288
Snapshot Daemon on dhcp35-225.lab.eng.blr.redhat.com 49159 0 Y 24623
NFS Server on dhcp35-225.lab.eng.blr.redhat.com 2049 0 Y 25622
Self-heal Daemon on dhcp35-225.lab.eng.blr.redhat.com N/A N/A Y 25630
Quota Daemon on dhcp35-225.lab.eng.blr.redhat.com N/A N/A Y 25638
Snapshot Daemon on 10.70.35.11 49159 0 Y 24427
NFS Server on 10.70.35.11 2049 0 Y 25455
Self-heal Daemon on 10.70.35.11 N/A N/A Y 25463
Quota Daemon on 10.70.35.11 N/A N/A Y 25471
Snapshot Daemon on 10.70.35.133 49160 0 Y 24314
NFS Server on 10.70.35.133 2049 0 Y 25296
Self-heal Daemon on 10.70.35.133 N/A N/A Y 25304
Quota Daemon on 10.70.35.133 N/A N/A Y 25312
Snapshot Daemon on 10.70.35.10 49162 0 Y 24521
NFS Server on 10.70.35.10 2049 0 Y 25578
Self-heal Daemon on 10.70.35.10 N/A N/A Y 25586
Quota Daemon on 10.70.35.10 N/A N/A Y 25594
Snapshot Daemon on 10.70.35.232 49162 0 Y 26759
NFS Server on 10.70.35.232 2049 0 Y 27833
Self-heal Daemon on 10.70.35.232 N/A N/A Y 27841
Quota Daemon on 10.70.35.232 N/A N/A Y 27866
Snapshot Daemon on 10.70.35.163 49162 0 Y 28721
NFS Server on 10.70.35.163 2049 0 Y 29819
Self-heal Daemon on 10.70.35.163 N/A N/A Y 29827
Quota Daemon on 10.70.35.163 N/A N/A Y 29852
Task Status of Volume npcvol
------------------------------------------------------------------------------
Task : Tier migration
ID : 524ad8fe-a743-47df-a4e9-edd2db05c60b
Status : in progress
The following I/Os were triggered before attach and were in progress while attach was issued:
1) Client 1: created a 300 MB file and started copying it to new files
for i in {2..50};do cp hlfile.1 hlfile.$i;done
2) Client 2: created a 50 MB file and initiated continuous renames of the file
for i in {2..1000};do cp rename.1 rename.$i;done
3) Client 3: linux untar
4) copying a 3 GB file to create new files in a loop
for i in {1..10};do cp File.mkv cheema$i.mkv;done
5) Client 4: created 10000 zero-byte files and then triggered removal of 5000 of
them so that it would still be running while attach tier was issued
[root at rhs-client30 zerobyte]# rm -rf zb{5000..10000}
--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-02-10 04:45:45 EST ---
This bug is automatically being proposed for the current z-stream release of
Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.
If this bug should be proposed for a different release, please manually change
the proposed release flag.
--- Additional comment from nchilaka on 2016-02-10 05:18:43 EST ---
sosreports of both clients and servers available at
[nchilaka at rhsqe-repo nchilaka]$ chmod -R 0777 bug.1306194
[nchilaka at rhsqe-repo nchilaka]$ pwd
/home/repo/sosreports/nchilaka
--- Additional comment from Mohammed Rafi KC on 2016-02-10 11:13:31 EST ---
There is a blocking lock held on one of the bricks which is not released. All
of the other clients are waiting on this lock. We couldn't identify the owner
of the lock, because by the time we looked the ping timer had expired and the
lock had been released. After that the I/Os resumed. We need to find out which
client acquired the lock and why it did not release it.
--- Additional comment from Soumya Koduri on 2016-02-11 09:00:05 EST ---
When we tried to reproduce the issue, we saw "Stale File Handle" errors after
attach-tier. When we did RCA using gdb, we found that ESTALE is returned via
svc_client (which is enabled by USS). So we disabled USS and re-tried the test.
Now we see the mount points hang.
On the server side, the volume got unexported -
[skoduri at skoduri ~]$ showmount -e 10.70.35.225
Export list for 10.70.35.225:
[skoduri at skoduri ~]$
Tracing back from the logs and the code:
[2016-02-11 13:26:02.540565] E [MSGID: 112070] [nfs3.c:896:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: finalvol
[2016-02-11 13:28:02.600425] E [MSGID: 112070] [nfs3.c:896:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: finalvol
[2016-02-11 13:28:02.600546] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
This message is logged when the volume is not in the nfs->initedxl[] list. That
list is updated as part of nfs_startup_subvolume(), which is invoked on notify
of GF_EVENT_CHILD_UP. So we suspect the nfs xlator has not received this event,
which leaves the volume in an unexported state.
Attaching the nfs log for further debugging.
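[Editor's note] A minimal C sketch of the suspected mechanism described above. This is an illustration under stated assumptions: the struct layout and the names nfs_state, nfs_subvolume_started and handle_getattr are invented stand-ins, not the actual GlusterFS nfs xlator code. The point it shows is that a handler serves only subvolumes recorded by the startup path, so a subvolume whose CHILD_UP event never arrived is reported as disabled, matching the log lines above.

/* Editor's illustration only; assumed types and names, not real xlator code. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct xlator { const char *name; };          /* stand-in for a subvolume     */

struct nfs_state {
    struct xlator **initedxl;                 /* subvols recorded at startup  */
    size_t          xl_count;                 /* (filled on CHILD_UP notify)  */
};

static bool nfs_subvolume_started(struct nfs_state *nfs, struct xlator *xl)
{
    for (size_t i = 0; i < nfs->xl_count; i++)
        if (nfs->initedxl[i] == xl)
            return true;
    return false;
}

static int handle_getattr(struct nfs_state *nfs, struct xlator *vol)
{
    if (!nfs_subvolume_started(nfs, vol)) {
        fprintf(stderr, "Volume is disabled: %s\n", vol->name);
        return -1;                            /* request fails, volume looks unexported */
    }
    return 0;                                 /* normal request handling       */
}

int main(void)
{
    struct xlator finalvol = { "finalvol" };
    struct nfs_state nfs = { NULL, 0 };       /* CHILD_UP never recorded anything */
    return handle_getattr(&nfs, &finalvol) == 0 ? 0 : 1;
}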
--- Additional comment from Mohammed Rafi KC on 2016-02-11 09:55:09 EST ---
During the nfs graph initialization, we do a lookup on the root. It looks like
this lookup is blocked on a lock held by another nfs process. We need to figure
out why the nfs server that acquired the lock failed to unlock it.
--- Additional comment from Raghavendra G on 2016-02-12 01:43:36 EST ---
Rafi reported that stale lock or unlock failures are seen even when the first
lookup on root is happening. Here is the most likely RCA. I am assuming
"tier-dht" has two dht subvols, "hot-dht" and "cold-dht". Also, the stale lock
is found on one of the bricks belonging to hot-dht.
1. Lookup on / on tier-dht.
2. Lookup is wound to the hashed subvol - cold-dht - and is successful.
3. tier-dht figures out / is a directory and does a lookup on both hot-dht and
cold-dht.
4. On hot-dht, some subvols - say c1, c2 - are down. But the lookup is still
successful because some other subvols (say c3, c4) are up.
5. Lookup on / is successful on cold-dht.
6. tier-dht decides it needs to heal the layout of "/".
From here I am skipping events on cold-dht as they are irrelevant for this RCA.
7. tier-dht winds inodelk on hot-dht. hot-dht winds it to the first subvol in
the layout list (say c1 in this case). Note that subvols with 0 ranges are
stored at the beginning of the list. All the subvols where lookup failed (say
because of ENOTCONN) end up with 0 ranges. The relative order of subvols with 0
ranges is undefined and depends on whose lookup failed first.
8. c1 comes up.
9. hot-dht acquires the lock on c1.
10. tier-dht tries to refresh its layout of /. It winds lookup on hot and cold
dhts again.
11. hot-dht sees that the layout's generation number is lagging behind the
current generation number (as c1 came up after the lookup on / completed). It
issues a fresh lookup and reconstructs the layout for /. Since c2 is still
down, it is pushed to the beginning of the subvol list of the layout.
12. tier-dht is done with healing. It issues unlock on hot-dht.
13. hot-dht winds the unlock call to the first subvol in the layout of /,
which is now c2.
14. The unlock fails with ENOTCONN and a stale lock is left on c1.
--- Additional comment from Raghavendra G on 2016-02-12 01:46:00 EST ---
Steps 7 and 8 can be swapped for more clarity; the RCA is still valid.
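[Editor's note] A minimal, self-contained C sketch of the sequence above. This is an editor's illustration: subvol_t, lock_on, unlock_on and the layout array are invented for the example and are not the dht xlator code. The first unlock recomputes its target from the refreshed layout and reproduces the stale lock; the second unlocks the subvolume remembered at lock time, which is the idea behind the fix merged later in this report.

/* Editor's illustration of the race in the RCA above, not the actual dht code. */
#include <stdio.h>

typedef struct { const char *name; int up; int locked; } subvol_t;

static void lock_on(subvol_t *s)
{
    s->locked = 1;
    printf("lock   -> %s\n", s->name);
}

static void unlock_on(subvol_t *s)
{
    if (!s->up) {                        /* step 14: unlock fails, ENOTCONN   */
        printf("unlock -> %s failed: not connected\n", s->name);
        return;
    }
    s->locked = 0;
    printf("unlock -> %s\n", s->name);
}

int main(void)
{
    subvol_t c1 = { "c1", 0, 0 }, c2 = { "c2", 0, 0 };
    subvol_t *layout[2] = { &c1, &c2 };  /* 0-range (down) subvols sort first */

    c1.up = 1;                           /* step 8: c1 comes back up          */
    subvol_t *locked = layout[0];        /* steps 7/9: lock lands on c1       */
    lock_on(locked);

    layout[0] = &c2; layout[1] = &c1;    /* step 11: layout refresh, c2 first */

    unlock_on(layout[0]);                /* buggy path (steps 13-14): target is c2 */
    printf("stale lock left on c1? %s\n", c1.locked ? "yes" : "no");

    unlock_on(locked);                   /* the fix: unlock the subvol remembered
                                            at lock time, so c1 is released   */
    return 0;
}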
--- Additional comment from Mohammed Rafi KC on 2016-02-12 05:04:36 EST ---
Based on comment 6, the fix could be intrusive and would require testing for
pure dht as well as tier. To recover from this hang without interrupting
application continuity, restart the nfs server, which can be done with a forced
volume start (gluster volume start <volname> force). This restarts only the nfs
server, provided no other process requires a restart.
--- Additional comment from Laura Bailey on 2016-02-14 21:45:57 EST ---
Rafi, based on https://bugzilla.redhat.com/show_bug.cgi?id=1303045#c3, I
shouldn't document this as a known issue, right?
--- Additional comment from Mohammed Rafi KC on 2016-02-15 01:15:28 EST ---
Yes, I have included this as part of bug 1303045.
--- Additional comment from Laura Bailey on 2016-02-15 20:07:44 EST ---
Thanks Rafi, removing this from the tracker bug.
--- Additional comment from nchilaka on 2016-02-17 01:17:25 EST ---
Workaround testing: I tested the workaround by restarting the volume using
force. The I/Os resumed, which means the workaround is fine, but there is a
small problem which has already been discussed, for which bz#1309186 - file
creates fail with "failed to open '<filename>': Too many levels of symbolic
links" for file create/write when restarting NFS using vol start force - has
been raised.
--- Additional comment from Vijay Bellur on 2016-02-23 02:36:21 EST ---
REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#1) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2016-02-27 13:42:49 EST ---
REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#2) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2016-03-04 00:56:10 EST ---
REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#3) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2016-03-04 11:45:18 EST ---
REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#4) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2016-03-08 16:49:15 EST ---
REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#5) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2016-03-09 01:56:50 EST ---
REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#6) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2016-03-15 02:15:00 EDT ---
REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#7) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2016-03-16 08:13:52 EDT ---
REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#8) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2016-05-03 07:01:09 EDT ---
REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#9) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2016-05-04 08:42:37 EDT ---
REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#10) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2016-05-05 06:54:34 EDT ---
REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#11) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2016-05-05 09:34:58 EDT ---
REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#12) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2016-05-05 13:33:55 EDT ---
REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#13) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2016-05-06 01:16:30 EDT ---
REVIEW: http://review.gluster.org/13492 (dht:remember locked subvol and send
unlock to the same) posted (#14) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2016-05-06 05:36:36 EDT ---
REVIEW: http://review.gluster.org/14236 (dht:remember locked subvol and send
unlock to the same) posted (#1) for review on release-3.7 by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Vijay Bellur on 2016-05-06 08:26:29 EDT ---
COMMIT: http://review.gluster.org/14236 committed in release-3.7 by Raghavendra
G (rgowdapp at redhat.com)
------
commit fd8921b9eb03af69815bb2d7cff07b63048c2d5a
Author: Mohammed Rafi KC <rkavunga at redhat.com>
Date: Tue May 3 14:43:20 2016 +0530
dht:remember locked subvol and send unlock to the same
During locking we send the lock request to the cached subvol,
and normally we unlock on the cached subvol.
But with a parallel fresh lookup on a directory, there
is a race window where the cached subvol can change
and the unlock can go to a different subvol from
the one on which we took the lock.
This results in a stale lock held on one of the
subvols.
So we store the details of the subvol on which we took the lock
and unlock on the same subvol.
back port of>
>Change-Id: I47df99491671b10624eb37d1d17e40bacf0b15eb
>BUG: 1311002
>Signed-off-by: Mohammed Rafi KC <rkavunga at redhat.com>
>Reviewed-on: http://review.gluster.org/13492
>Reviewed-by: N Balachandran <nbalacha at redhat.com>
>Smoke: Gluster Build System <jenkins at build.gluster.com>
>NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
>Reviewed-by: Raghavendra G <rgowdapp at redhat.com>
>CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
Change-Id: Ia847e7115d2296ae9811b14a956f3b6bf39bd86d
BUG: 1333645
Signed-off-by: Mohammed Rafi KC <rkavunga at redhat.com>
Reviewed-on: http://review.gluster.org/14236
Smoke: Gluster Build System <jenkins at build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp at redhat.com>
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1305205
[Bug 1305205] NFS mount hangs on a tiered volume
https://bugzilla.redhat.com/show_bug.cgi?id=1306194
[Bug 1306194] NFS+attach tier:IOs hang while attach tier is issued
https://bugzilla.redhat.com/show_bug.cgi?id=1306930
[Bug 1306930] Writes on files are hung from nfs mount on performing
attach-tier
https://bugzilla.redhat.com/show_bug.cgi?id=1311002
[Bug 1311002] NFS+attach tier:IOs hang while attach tier is issued
https://bugzilla.redhat.com/show_bug.cgi?id=1333645
[Bug 1333645] NFS+attach tier:IOs hang while attach tier is issued