[Bugs] [Bug 1652430] glusterd fails to start when glusterd is restarted in a loop every 45 seconds while volume creation is in progress

bugzilla at redhat.com
Thu Nov 22 04:30:39 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1652430



--- Comment #1 from Atin Mukherjee <amukherj at redhat.com> ---
Description of problem:
On an OCS setup (independent mode) with 3 gluster nodes and brick-multiplexing
enabled, started PVC creation for 100 volumes. While volume creation was in
progress, ran the below script on gluster node N1:
"for i in `seq 1 100`; do systemctl restart glusterd; sleep 45; done"
Within 15 to 20 iterations of the script, glusterd failed to start on node N1.
I suspect this is because of a volume mismatch between the gluster nodes.


Version-Release number of selected component (if applicable):
mainline

How reproducible:
1/1

Steps to Reproduce:
1. On an OCS setup (independent mode, CRS), enabled brick-multiplexing
2. Started creating 1GiB volume claims using "for i in `seq 1 100`; do sh
pvc-create.sh claim$i 1 ; done"
3. Volume creation succeeded for 50 volumes
4. Restarted glusterd on node N1 continuously, with a 45 second gap between
restarts
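The restart loop from step 4 can be sketched as a small reusable helper. The function name and the RUN dry-run hook are my additions, not part of the report; on a real test node N1 you would set RUN to empty so systemctl actually runs:

```shell
#!/bin/sh
# restart_glusterd_loop N INTERVAL: restart glusterd N times, sleeping
# INTERVAL seconds between restarts. ${RUN} is a command prefix hook:
# RUN=echo gives a safe dry run, RUN= performs the real restarts.
restart_glusterd_loop() {
    n="$1"; interval="$2"; i=1
    while [ "$i" -le "$n" ]; do
        ${RUN:-} systemctl restart glusterd   # bounce the management daemon
        ${RUN:-} sleep "$interval"            # wait before the next restart
        i=$((i + 1))
    done
}

# Dry-run demonstration of the loop used in the report (100 iterations, 45s):
RUN=echo
restart_glusterd_loop 2 45
```

The report's one-liner is equivalent to `RUN= restart_glusterd_loop 100 45` run as root on N1 while the PVC-creation loop is still in progress on another node.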

Actual results:
On node N1 glusterd is unable to start, and a mismatch of 1 volume is seen:
the outputs of heketi-cli volume list and gluster volume list differ.

Expected results:
glusterd should restart successfully, and the volume list should be the same
between gluster volume list and heketi-cli volume list.

Additional info:
Peer status from Node N2 and N3
[root at dhcp35-156 vols]# gluster peer status
Number of Peers: 2

Hostname: dhcp35-110.lab.eng.blr.redhat.com
Uuid: cb5c8529-0b49-4741-b12f-4798e55c769e
State: Peer in Cluster (Disconnected)

Hostname: 10.70.35.222
Uuid: 19bb9f88-c5ab-4651-982b-be5e078d1f4a
State: Peer in Cluster (Connected)

[root at dhcp35-222 ~]# gluster peer status
Number of Peers: 2

Hostname: dhcp35-110.lab.eng.blr.redhat.com
Uuid: cb5c8529-0b49-4741-b12f-4798e55c769e
State: Peer in Cluster (Disconnected)

Hostname: 10.70.35.156
Uuid: bf626278-3af5-455d-ad0b-31b231062353
State: Peer in Cluster (Connected)



##########################################################
[root at dhcp35-110 ~]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor
preset: disabled)
   Active: failed (Result: exit-code) since Wed 2018-11-21 15:19:23 IST; 59min
ago
  Process: 13611 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
--log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 10818 (code=exited, status=15)
    Tasks: 833
   CGroup: /system.slice/glusterd.service
           ├─10556 /usr/sbin/glusterfsd -s 10.70.35.110 --volfile-id
vol_glusterfs_claim60_456c496e-ed71-11e8-9e77-525400c3b207.10.70.35.110.var-lib-heketi-mounts-vg_cf23...
           ├─11056 /usr/sbin/glusterfs -s localhost --volfile-id
gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l
/var/log/glusterfs/glustershd.log -S ...
           ├─15160 /usr/sbin/glusterfsd -s 10.70.35.110 --volfile-id
vol_glusterfs_claim01_21256873-ed5c-11e8-9e77-525400c3b207.10.70.35.110.var-lib-heketi-mounts-vg_d45d...
           └─19913 /usr/sbin/glusterfsd -s 10.70.35.110 --volfile-id
heketidbstorage.10.70.35.110.var-lib-heketi-mounts-vg_d45dd79574b274788519f64901344d68-brick_207c3753...

Nov 21 15:19:15 dhcp35-110.lab.eng.blr.redhat.com systemd[1]: Starting
GlusterFS, a clustered file-system server...
Nov 21 15:19:23 dhcp35-110.lab.eng.blr.redhat.com systemd[1]: glusterd.service:
control process exited, code=exited status=1
Nov 21 15:19:23 dhcp35-110.lab.eng.blr.redhat.com systemd[1]: Failed to start
GlusterFS, a clustered file-system server.
Nov 21 15:19:23 dhcp35-110.lab.eng.blr.redhat.com systemd[1]: Unit
glusterd.service entered failed state.
Nov 21 15:19:23 dhcp35-110.lab.eng.blr.redhat.com systemd[1]: glusterd.service
failed.
##########################################################

glusterd log
###########################################################
[2018-11-21 09:49:15.431202] I [MSGID: 100030] [glusterfsd.c:2504:main]
0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.2 (args:
/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2018-11-21 09:49:15.440327] I [MSGID: 106478] [glusterd.c:1451:init]
0-management: Maximum allowed open file descriptors set to 65536
[2018-11-21 09:49:15.440415] I [MSGID: 106479] [glusterd.c:1509:init]
0-management: Using /var/lib/glusterd as working directory
[2018-11-21 09:49:15.440437] I [MSGID: 106479] [glusterd.c:1514:init]
0-management: Using /var/run/gluster as pid file working directory
[2018-11-21 09:49:15.450782] W [MSGID: 103071]
[rdma.c:4597:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel
creation failed [No such device]
[2018-11-21 09:49:15.450833] W [MSGID: 103055] [rdma.c:4906:init]
0-rdma.management: Failed to initialize IB Device
[2018-11-21 09:49:15.450848] W [rpc-transport.c:350:rpc_transport_load]
0-rpc-transport: 'rdma' initialization failed
[2018-11-21 09:49:15.450976] W [rpcsvc.c:1661:rpcsvc_create_listener]
0-rpc-service: cannot create listener, initing the transport failed
[2018-11-21 09:49:15.451004] E [MSGID: 106243] [glusterd.c:1792:init]
0-management: creation of 1 listeners failed, continuing with succeeded
transport
[2018-11-21 09:49:19.453464] I [MSGID: 106513]
[glusterd-store.c:2249:glusterd_restore_op_version] 0-glusterd: retrieved
op-version: 31304
[2018-11-21 09:49:19.454348] I [MSGID: 106544]
[glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID:
cb5c8529-0b49-4741-b12f-4798e55c769e
[2018-11-21 09:49:23.668857] E [MSGID: 106206]
[glusterd-store.c:2890:glusterd_store_update_volinfo] 0-management: Failed to
get next store iter
[2018-11-21 09:49:23.668912] E [MSGID: 106207]
[glusterd-store.c:3177:glusterd_store_retrieve_volume] 0-management: Failed to
update volinfo for vol_glusterfs_claim36_575133c6-ed71-11e8-9e77-525400c3b207
volume
[2018-11-21 09:49:23.668949] E [MSGID: 106201]
[glusterd-store.c:3394:glusterd_store_retrieve_volumes] 0-management: Unable to
restore volume: vol_glusterfs_claim36_575133c6-ed71-11e8-9e77-525400c3b207
[2018-11-21 09:49:23.669006] E [MSGID: 101019] [xlator.c:505:xlator_init]
0-management: Initialization of volume 'management' failed, review your volfile
again
[2018-11-21 09:49:23.669028] E [MSGID: 101066]
[graph.c:367:glusterfs_graph_init] 0-management: initializing translator failed
[2018-11-21 09:49:23.669049] E [MSGID: 101176]
[graph.c:738:glusterfs_graph_activate] 0-graph: init failed
[2018-11-21 09:49:23.670077] W [glusterfsd.c:1367:cleanup_and_exit]
(-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x562eea369fcd]
-->/usr/sbin/glusterd(glusterfs_process_volfp+0x163) [0x562eea369e73]
-->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x562eea3693eb] ) 0-: received
signum (-1), shutting down
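The fatal errors above come from glusterd's store-restore path: on startup it re-reads each volume's on-disk metadata under /var/lib/glusterd/vols (the working directory the log reports), and a restart that lands mid-write can leave a volume's info file empty or truncated, which is consistent with "Failed to get next store iter" for vol_glusterfs_claim36. A minimal diagnostic sketch for spotting such volume stores; the helper name is mine, and checking only for an empty or missing info file is an assumption (a file can also be partially written yet non-empty):

```shell
#!/bin/sh
# find_bad_volinfo DIR: print volume directories under DIR whose 'info'
# file is missing or zero-length -- the state glusterd fails to restore.
# DIR defaults to the live store path /var/lib/glusterd/vols; point it at
# a copy of /var/lib/glusterd/vols when inspecting a node offline.
find_bad_volinfo() {
    dir="${1:-/var/lib/glusterd/vols}"
    for vol in "$dir"/*/; do
        [ -d "$vol" ] || continue          # skip if the glob matched nothing
        if [ ! -s "$vol/info" ]; then      # -s: exists and is non-empty
            echo "suspect volume store: $vol"
        fi
    done
}
```

Running this on N1 against /var/lib/glusterd/vols should flag the claim36 volume if its info file was indeed truncated by the restart.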
