[Gluster-users] Problem when rebooting geo-replication slave
Hans Höök
hans.hook at altrusoft.se
Tue Jan 21 14:20:15 UTC 2014
Hi list,
I have a problem when a geo-replicating slave has to be rebooted.
After the reboot the slave is out of sync and the gluster daemon fails to
start at all.
I have a workaround that seems to work, but I suspect I am doing something
wrong or missing something.
I am currently using gluster 3.4.0 with the following setup.
Two replicating masters: fe and ni
One geo-replicating slave with periodic snapshots in zfs: nitinol
From master fe I have successfully set up geo-replication with:
gluster volume geo-replication gvarchive nitinol:/zfspool/gluster/gvarchive start
All is fine... not really...
When the slave nitinol is rebooted, it ends up in a broken state.
service glusterfs-server start # fails - the daemon does not start, with
the following log entries:
[2014-01-21 13:14:53.352007] I [glusterfsd.c:1910:main]
0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.4.2
(/usr/sbin/glusterd -p /var/run/glusterd.pid)
[2014-01-21 13:14:53.354316] I [glusterd.c:961:init] 0-management: Using
/var/lib/glusterd as working directory
[2014-01-21 13:14:53.356431] I [socket.c:3480:socket_init]
0-socket.management: SSL support is NOT enabled
[2014-01-21 13:14:53.356490] I [socket.c:3495:socket_init]
0-socket.management: using system polling thread
[2014-01-21 13:14:53.357999] W [rdma.c:4197:__gf_rdma_ctx_create]
0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device)
[2014-01-21 13:14:53.358055] E [rdma.c:4485:init] 0-rdma.management:
Failed to initialize IB Device
[2014-01-21 13:14:53.358136] E [rpc-transport.c:320:rpc_transport_load]
0-rpc-transport: 'rdma' initialization failed
[2014-01-21 13:14:53.358185] W [rpcsvc.c:1389:rpcsvc_transport_create]
0-rpc-service: cannot create listener, initing the transport failed
[2014-01-21 13:14:55.083839] I
[glusterd-store.c:1339:glusterd_restore_op_version] 0-glusterd:
retrieved op-version: 2
[2014-01-21 13:14:55.092907] E
[glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key:
brick-0
[2014-01-21 13:14:55.093002] E
[glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key:
brick-1
.....
[2014-01-21 13:14:55.741895] E
[glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key:
brick-0
[2014-01-21 13:14:55.741989] E
[glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key:
brick-1
[2014-01-21 13:14:55.792063] I
[glusterd-handler.c:2818:glusterd_friend_add] 0-management: connect
returned 0
[2014-01-21 13:14:55.792258] I [rpc-clnt.c:962:rpc_clnt_connection_init]
0-management: setting frame-timeout to 600
[2014-01-21 13:14:55.792416] I [socket.c:3480:socket_init] 0-management:
SSL support is NOT enabled
[2014-01-21 13:14:55.792443] I [socket.c:3495:socket_init] 0-management:
using system polling thread
[2014-01-21 13:14:55.796485] E
[glusterd-store.c:2487:glusterd_resolve_all_bricks] 0-glusterd: resolve
brick failed in restore
[2014-01-21 13:14:55.796546] E [xlator.c:390:xlator_init] 0-management:
Initialization of volume 'management' failed, review your volfile again
[2014-01-21 13:14:55.796574] E [graph.c:292:glusterfs_graph_init]
0-management: initializing translator failed
[2014-01-21 13:14:55.796596] E [graph.c:479:glusterfs_graph_activate]
0-graph: init failed
[2014-01-21 13:14:55.797136] W [glusterfsd.c:1002:cleanup_and_exit]
(-->/usr/sbin/glusterd(main+0x3cd) [0x7f737c1fb85d]
(-->/usr/sbin/glusterd(glusterfs_volumes_init+0xc0) [0x7f737c1fe650]
(-->/usr/sbin/glusterd(glusterfs_process_volfp+0x103)
[0x7f737c1fe553]))) 0-: received signum (0), shutting down
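For anyone reading a log like this one, most of the entries are informational. A minimal sketch of a filter that keeps only the error-level ("E") entries is shown here. It assumes that in the log file itself each entry sits on a single line of the form "[date time] LEVEL [source] ..." (the excerpt in this mail is wrapped by the mail client), and the function name gluster_errors is made up for illustration.

```shell
#!/bin/sh
# Print only error-level ("E") entries from a glusterd log.
# Assumption: one entry per line, starting with "[YYYY-MM-DD HH:MM:SS.micro] E ".
# Reads the file(s) given as arguments, or standard input when none are given.
gluster_errors() {
    grep -E '^\[[0-9-]+ [0-9:.]+\] E ' "$@"
}
```

It could be called as gluster_errors /var/log/glusterfs/etc-glusterfs-glusterd.vol.log, where the path is an assumption about the default glusterd log location and may differ per distribution.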
I have been able to recover from this with the following procedure:
# on slave:
rm -rf /var/lib/glusterd/vols
# on master:
gluster volume geo-replication gvarchive nitinol:/zfspool/gluster/gvarchive stop
gluster peer detach nitinol
# on slave:
service glusterfs-server start
# on master:
gluster peer probe nitinol
gluster volume geo-replication gvarchive nitinol:/zfspool/gluster/gvarchive start
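The steps above can be collected into one script run from the master. This is only a sketch of the workaround as listed, not a recommended fix. The helper names run and recover_slave, the DRY_RUN switch, and the use of root ssh from the master to the slave are assumptions added for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the workaround as one script, run on the master (fe).
# Hypothetical pieces: run(), recover_slave(), DRY_RUN, and root ssh access
# from the master to the slave. The commands are the ones from the mail.
MASTER_VOL=gvarchive
SLAVE_HOST=nitinol
SLAVE_URL="$SLAVE_HOST:/zfspool/gluster/gvarchive"
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_slave() {
    # on slave: drop the stored volume definitions
    run ssh "root@$SLAVE_HOST" rm -rf /var/lib/glusterd/vols
    # on master: stop geo-replication and detach the slave
    run gluster volume geo-replication "$MASTER_VOL" "$SLAVE_URL" stop
    run gluster peer detach "$SLAVE_HOST"
    # on slave: start the daemon again
    run ssh "root@$SLAVE_HOST" service glusterfs-server start
    # on master: re-add the slave and restart geo-replication
    run gluster peer probe "$SLAVE_HOST"
    run gluster volume geo-replication "$MASTER_VOL" "$SLAVE_URL" start
}
```

Calling recover_slave with DRY_RUN left at 1 lists the six commands in order, so they can be checked before anything is removed on the slave.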
This does not seem correct.
Why do the volumes get out of sync?
Regards
Hans Höök