[Bugs] [Bug 1189027] New: Gluster volume crash after rebuild partition table on XFS disk

Wed Feb 4 09:30:12 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1189027

            Bug ID: 1189027
           Summary: Gluster volume crash after rebuild partition table on
                    XFS disk
           Product: GlusterFS
           Version: pre-release
         Component: glusterd
          Severity: medium
          Assignee: bugs at gluster.org
          Reporter: e.ivanov at ptl.ru
                CC: bugs at gluster.org, gluster-bugs at redhat.com

Description of problem:
The magic block on my XFS disk failed, so I ran xfs_repair. After xfs_repair
I rebuilt the partition table. The data on the disk was preserved. Next I
restarted the service and checked the status of the gluster volume:

[root@node1 ~]# gluster volume status BlockStorage1-3
Locking failed on data3. Please check log file for details.
Locking failed on data0. Please check log file for details.

Version-Release number of selected component (if applicable):
[root@node1 ~]# rpm -qa | grep gluster
glusterfs-3.6.2-1.el7.x86_64
glusterfs-api-3.6.2-1.el7.x86_64
glusterfs-fuse-3.6.2-1.el7.x86_64
glusterfs-server-3.6.2-1.el7.x86_64
glusterfs-libs-3.6.2-1.el7.x86_64
glusterfs-cli-3.6.2-1.el7.x86_64

[root@node1 ~]# gluster volume info BlockStorage1-3

Volume Name: BlockStorage1-3
Type: Replicate
Volume ID: fd146b57-6a49-497b-8aa0-b324dd50e79a
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: data3.os.ptl.ru:/data/glusterfs/disk1/BlockStorage1-3
Brick2: data1.os.ptl.ru:/data/glusterfs/disk1/BlockStorage1-3
Brick3: data0.os.ptl.ru:/data/glusterfs/disk1/BlockStorage1-3
Options Reconfigured:
auth.allow: 10.0.2.*

[root@node1 ~]# gluster peer status
Number of Peers: 2

Hostname: data3
Uuid: b1b583d0-f884-47ad-9376-28a625e39d15
State: Peer in Cluster (Connected)

Hostname: data0
Uuid: ad5425db-0b88-48a7-90b7-a585609ce95d
State: Peer in Cluster (Connected)

After 20-30 seconds I tried again and the peers had disconnected:
[root@node1 ~]# gluster peer status
Number of Peers: 2

Hostname: data3
Uuid: b1b583d0-f884-47ad-9376-28a625e39d15
State: Peer in Cluster (Disconnected)

Hostname: data0
Uuid: ad5425db-0b88-48a7-90b7-a585609ce95d
State: Peer in Cluster (Disconnected)

After another 20-30 seconds I tried again and the peers had reconnected:

[root@node1 ~]# gluster peer status
Number of Peers: 2

Hostname: data3.os.ptl.ru
Uuid: b1b583d0-f884-47ad-9376-28a625e39d15
State: Peer in Cluster (Connected)

Hostname: data0
Uuid: ad5425db-0b88-48a7-90b7-a585609ce95d
State: Peer in Cluster (Connected)
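
The flapping shown above can be tracked by comparing two snapshots of the
`gluster peer status` output. The following is only a minimal sketch: the
function names parse_peer_status and changed_states are hypothetical helpers,
not part of GlusterFS, and the parser assumes the plain "Key: value" layout
seen in this report. Peers are matched by UUID because the hostname of data3
is printed differently between snapshots.

```python
import re

def parse_peer_status(text):
    """Map peer UUID -> (hostname, state) from `gluster peer status` text."""
    peers = {}
    # Peers are printed as Hostname/Uuid/State blocks separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        # The "Number of Peers" block has no Uuid and is skipped here.
        if "Uuid" in fields:
            peers[fields["Uuid"]] = (fields.get("Hostname"), fields.get("State"))
    return peers

def changed_states(before, after):
    """Return (uuid, hostname, old_state, new_state) for peers whose state changed."""
    changes = []
    for uuid in sorted(before.keys() & after.keys()):
        old_state, (host, new_state) = before[uuid][1], after[uuid]
        if old_state != new_state:
            changes.append((uuid, host, old_state, new_state))
    return changes

# The first two snapshots from this report.
BEFORE = """Number of Peers: 2

Hostname: data3
Uuid: b1b583d0-f884-47ad-9376-28a625e39d15
State: Peer in Cluster (Connected)

Hostname: data0
Uuid: ad5425db-0b88-48a7-90b7-a585609ce95d
State: Peer in Cluster (Connected)
"""

AFTER = """Number of Peers: 2

Hostname: data3
Uuid: b1b583d0-f884-47ad-9376-28a625e39d15
State: Peer in Cluster (Disconnected)

Hostname: data0
Uuid: ad5425db-0b88-48a7-90b7-a585609ce95d
State: Peer in Cluster (Disconnected)
"""

for uuid, host, old, new in changed_states(parse_peer_status(BEFORE),
                                           parse_peer_status(AFTER)):
    print(f"{host} ({uuid}): {old} -> {new}")
```

Run against the first two snapshots, this reports both data0 and data3 going
from Connected to Disconnected.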

How reproducible:
1/1

Steps to Reproduce:
1. xfs_repair /dev/sdb1
2. fdisk /dev/sdb ( delete partition, create new partition )
3. check gluster volume status and logs

Actual results:

The peers are disconnected and then reconnected again.

Expected results:

Peer should not be disconnected.

Additional info:

Logs:
[2015-02-04 09:28:24.748664] I
[glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30600
[2015-02-04 09:28:41.655675] I
[glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30600
[2015-02-04 09:28:14.187049] I [MSGID: 106004]
[glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer
ad5425db-0b88-48a7-90b7-a585609ce95d, in Peer in Cluster state, has
disconnected from glusterd.
[2015-02-04 09:29:10.194630] C
[rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-management: server
10.0.2.101:24007 has not responded in the last 30 seconds, disconnecting.
[2015-02-04 09:29:10.195189] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f2cbad514c6] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2cbab2401e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2cbab2412e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7f2cbab25a92] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f2cbab26248] ))))) 0-management:
forced unwinding frame type(Peer mgmt) op(--(2)) called at 2015-02-04
09:28:10.277497 (xid=0x9c)
[2015-02-04 09:29:10.195369] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f2cbad514c6] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2cbab2401e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2cbab2412e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7f2cbab25a92] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f2cbab26248] ))))) 0-management:
forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-02-04
09:28:40.192112 (xid=0x9d)
[2015-02-04 09:29:10.195400] W [rpc-clnt-ping.c:154:rpc_clnt_ping_cbk]
0-management: socket disconnected
[2015-02-04 09:29:10.195426] I [MSGID: 106004]
[glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer
2414717a-7615-4b0a-9940-e5e82592482c, in Peer in Cluster state, has
disconnected from glusterd.
[2015-02-04 09:29:10.195638] W [glusterd-locks.c:647:glusterd_mgmt_v3_unlock]
(--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f2cbad514c6] (-->
/usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x3f1)[0x7f2cabd89521]
(-->
/usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x1a2)[0x7f2cabd01442]
(-->
/usr/lib64/glusterfs/3.6.2/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)[0x7f2cabcfa01c]
(--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x90)[0x7f2cbab26290] )))))
0-management: Lock for vol BlockStorage1-3 not held
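
To relate the forced unwinds above to the 30-second ping timeout, the age of
each unwound frame can be computed from the log. This is a rough sketch, not
an official GlusterFS tool: the helper forced_unwind_ages is hypothetical, and
it assumes that the "called at" field is the time the request was sent. The
sample text is the two saved_frames_unwind records copied from the log above,
including the line wrapping added by the mail client.

```python
import re
from datetime import datetime

# Start of a log record: "[YYYY-MM-DD HH:MM:SS.ffffff] <level letter>"
HEADER = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+([A-Z])\b", re.M)
# The date and time may be split across lines by mail wrapping, hence \s+.
CALLED_AT = re.compile(
    r"called at\s+(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d+)")

def parse_ts(stamp):
    """Parse a log timestamp after collapsing any wrapped whitespace."""
    return datetime.strptime(" ".join(stamp.split()), "%Y-%m-%d %H:%M:%S.%f")

def forced_unwind_ages(log_text):
    """Return (level, seconds between 'called at' and the record time)."""
    ages = []
    heads = list(HEADER.finditer(log_text))
    for i, head in enumerate(heads):
        end = heads[i + 1].start() if i + 1 < len(heads) else len(log_text)
        call = CALLED_AT.search(log_text[head.start():end])
        if call:
            delta = parse_ts(head.group(1)) - parse_ts(call.group(1))
            ages.append((head.group(2), delta.total_seconds()))
    return ages

SAMPLE = """[2015-02-04 09:29:10.195189] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f2cbad514c6] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2cbab2401e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2cbab2412e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7f2cbab25a92] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f2cbab26248] ))))) 0-management:
forced unwinding frame type(Peer mgmt) op(--(2)) called at 2015-02-04
09:28:10.277497 (xid=0x9c)
[2015-02-04 09:29:10.195369] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f2cbad514c6] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2cbab2401e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2cbab2412e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7f2cbab25a92] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f2cbab26248] ))))) 0-management:
forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-02-04
09:28:40.192112 (xid=0x9d)
"""

for level, seconds in forced_unwind_ages(SAMPLE):
    print(level, round(seconds, 3))
```

For these two records the Peer mgmt frame had been pending for about 59.9
seconds and the GF-DUMP frame for about 30.0 seconds, which is consistent with
the "has not responded in the last 30 seconds" message from rpc-clnt-ping.c.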
