[Bugs] [Bug 1214289] I/O failure on attaching tier

bugzilla at redhat.com bugzilla at redhat.com
Tue Jun 2 15:56:53 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1214289



--- Comment #10 from nchilaka <nchilaka at redhat.com> ---
Seeing the following issue on the latest downstream build.
Steps to reproduce:
1) Create a dist-rep volume:
   gluster v create tiervol2 replica 2 10.70.46.233:/rhs/brick1/tiervol2
   10.70.46.236:/rhs/brick1/tiervol2 10.70.46.240:/rhs/brick1/tiervol2
   10.70.46.243:/rhs/brick1/tiervol2
2) Start the volume and issue commands such as info and status.
3) Mount the volume using NFS.
4) Trigger some I/O on the volume.
5) While the I/O is in progress, attach a tier (steps 3-5 are sketched below).

The tier gets attached successfully, but from that point on the in-progress
I/O fails to write.
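
For reference, a minimal sketch of steps 3-5; the mount point /mnt/tiervol2
and the untar workload are illustrative, and only the attach-tier command is
taken verbatim from the session further down:

   # mount the volume over gluster-NFS (NFSv3); server and mount point are illustrative
   mount -t nfs -o vers=3 10.70.46.233:/tiervol2 /mnt/tiervol2

   # trigger I/O, e.g. untar a kernel tarball already copied onto the mount
   cd /mnt/tiervol2 && tar -xf linux-4.0.4.tar.xz &

   # while the untar is still running, attach the hot tier
   gluster v attach-tier tiervol2 10.70.46.236:/rhs/brick2/tiervol2 10.70.46.240:/rhs/brick2/tiervol2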

Some observations worth noting:
1) This happens only when the volume is mounted over NFS; a glusterfs (FUSE)
mount works fine (Anoop, please comment if you see the issue on a glusterfs
mount as well).
2) It appears to be a problem in the interaction between tiering and NFS, as
all the NFS ports are down after I run the above scenario (a quick check is
sketched below).
3) The issue is hit only when I/O was in progress while attaching the tier
(although this will be the most common case at a customer site).
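
A quick way to confirm observation 2 on any of the nodes; this assumes the
standard gluster-NFS log location /var/log/glusterfs/nfs.log, adjust as
needed:

   # show only the NFS server rows from the volume status
   gluster v status tiervol2 | grep "NFS Server"

   # the export should disappear once the NFS server goes down
   showmount -e 10.70.46.233

   # check the gluster-NFS log on each node for crash/shutdown messages
   less /var/log/glusterfs/nfs.log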


[root at rhsqa14-vm1 ~]# gluster v status tiervol2
Status of volume: tiervol2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.233:/rhs/brick1/tiervol2     49153     0          Y       1973 
Brick 10.70.46.236:/rhs/brick1/tiervol2     49154     0          Y       24453
Brick 10.70.46.240:/rhs/brick1/tiervol2     49154     0          Y       32272
Brick 10.70.46.243:/rhs/brick1/tiervol2     49153     0          Y       31759
NFS Server on localhost                     2049      0          Y       1992 
Self-heal Daemon on localhost               N/A       N/A        Y       2017 
NFS Server on 10.70.46.243                  2049      0          Y       31778
Self-heal Daemon on 10.70.46.243            N/A       N/A        Y       31790
NFS Server on 10.70.46.236                  2049      0          Y       24472
Self-heal Daemon on 10.70.46.236            N/A       N/A        Y       24482
NFS Server on 10.70.46.240                  2049      0          Y       32292
Self-heal Daemon on 10.70.46.240            N/A       N/A        Y       32312

Task Status of Volume tiervol2
------------------------------------------------------------------------------
There are no active volume tasks

[root at rhsqa14-vm1 ~]# gluster v info tiervol2

Volume Name: tiervol2
Type: Distributed-Replicate
Volume ID: a98f39c2-03ed-4ec7-909f-573b89a2a3e8
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.46.233:/rhs/brick1/tiervol2
Brick2: 10.70.46.236:/rhs/brick1/tiervol2
Brick3: 10.70.46.240:/rhs/brick1/tiervol2
Brick4: 10.70.46.243:/rhs/brick1/tiervol2
Options Reconfigured:
performance.readdir-ahead: on
[root at rhsqa14-vm1 ~]# #################Now I have mounted the regular dist-rep
vol tiervol2 and copied
https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.0.4.tar.xz onto it##########
You have new mail in /var/spool/mail/root
[root at rhsqa14-vm1 ~]# ls /rhs/brick1/tiervol2
linux-4.0.4.tar.xz
[root at rhsqa14-vm1 ~]#  #################Next I will attach a tier while
untarring the image, and will check the status of the vol; it will show NFS
down###########
[root at rhsqa14-vm1 ~]# ls /rhs/brick1/tiervol2 ;gluster v attach-tier tiervol2
10.70.46.236:/rhs/brick2/tiervol2 10.70.46.240:/rhs/brick2/tiervol2
linux-4.0.4  linux-4.0.4.tar.xz
Attach tier is recommended only for testing purposes in this release. Do you
want to continue? (y/n) y
volume attach-tier: success
volume rebalance: tiervol2: success: Rebalance on tiervol2 has been started
successfully. Use rebalance status command to check status of the rebalance
process.
ID: 1e59a5cc-2ff0-48ce-a34e-0521cbe65d73

You have mail in /var/spool/mail/root
[root at rhsqa14-vm1 ~]# ls /rhs/brick1/tiervol2
linux-4.0.4  linux-4.0.4.tar.xz
[root at rhsqa14-vm1 ~]# gluster v info tiervol2

Volume Name: tiervol2
Type: Tier
Volume ID: a98f39c2-03ed-4ec7-909f-573b89a2a3e8
Status: Started
Number of Bricks: 6
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 2
Brick1: 10.70.46.240:/rhs/brick2/tiervol2
Brick2: 10.70.46.236:/rhs/brick2/tiervol2
Cold Bricks:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick3: 10.70.46.233:/rhs/brick1/tiervol2
Brick4: 10.70.46.236:/rhs/brick1/tiervol2
Brick5: 10.70.46.240:/rhs/brick1/tiervol2
Brick6: 10.70.46.243:/rhs/brick1/tiervol2
Options Reconfigured:
performance.readdir-ahead: on
[root at rhsqa14-vm1 ~]# gluster v status tiervol2
Status of volume: tiervol2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.46.240:/rhs/brick2/tiervol2     49155     0          Y       32411
Brick 10.70.46.236:/rhs/brick2/tiervol2     49155     0          Y       24590
Brick 10.70.46.233:/rhs/brick1/tiervol2     49153     0          Y       1973 
Brick 10.70.46.236:/rhs/brick1/tiervol2     49154     0          Y       24453
Brick 10.70.46.240:/rhs/brick1/tiervol2     49154     0          Y       32272
Brick 10.70.46.243:/rhs/brick1/tiervol2     49153     0          Y       31759
NFS Server on localhost                     N/A       N/A        N       N/A  
NFS Server on 10.70.46.236                  N/A       N/A        N       N/A  
NFS Server on 10.70.46.243                  N/A       N/A        N       N/A  
NFS Server on 10.70.46.240                  N/A       N/A        N       N/A  

Task Status of Volume tiervol2
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 1e59a5cc-2ff0-48ce-a34e-0521cbe65d73
Status               : in progress         




sosreport logs attached.
