[Gluster-users] Replacing a node in a 4x2 distributed/replicated setup
Thomas Bätzler
t.baetzler at bringe.com
Tue Nov 3 14:07:30 UTC 2015
Hi,
Atin Mukherjee wrote:
> This could very well be related to op-version. Could you look at the
> faulty node's glusterd log and see the error log entries, that would
> give us the exact reason of failure.
op-version is 1 across all the nodes.
I've made some progress: by persistently wiping /var/lib/glusterd except
for glusterd.info and restarting glusterd on the new node, I've
progressed to a state where all nodes agree that my replacement node is
part of the cluster:
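For the record, the per-node wipe was along these lines. This is a sketch, demonstrated on a throwaway directory rather than the live state dir (on the real node the directory is /var/lib/glusterd and glusterd must be stopped first):

```shell
# Demonstrate the selective wipe on a scratch copy of the state directory.
STATEDIR=$(mktemp -d)
mkdir -p "$STATEDIR/peers" "$STATEDIR/vols/archive"
printf 'UUID=040e61dd-fd02-4957-8833-cf5708b837f0\n' > "$STATEDIR/glusterd.info"
# Remove everything except glusterd.info, which preserves the node's UUID
# and op-version so the restarted glusterd re-syncs the rest from its peers:
find "$STATEDIR" -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
ls "$STATEDIR"
```

Only glusterd.info survives the find, which is what lets the node rejoin under the same identity.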
root at glucfshead2:~# for i in `seq 2 9`; do echo "glucfshead$i:"; ssh
glucfshead$i "gluster peer status" | grep -A2 glucfshead9 ; done
glucfshead2:
Hostname: glucfshead9
Uuid: 040e61dd-fd02-4957-8833-cf5708b837f0
State: Peer in Cluster (Connected)
glucfshead3:
Hostname: glucfshead9
Uuid: 040e61dd-fd02-4957-8833-cf5708b837f0
State: Peer in Cluster (Connected)
glucfshead4:
Hostname: glucfshead9
Uuid: 040e61dd-fd02-4957-8833-cf5708b837f0
State: Peer in Cluster (Connected)
glucfshead5:
Hostname: glucfshead9
Uuid: 040e61dd-fd02-4957-8833-cf5708b837f0
State: Peer in Cluster (Connected)
glucfshead6:
Hostname: glucfshead9
Uuid: 040e61dd-fd02-4957-8833-cf5708b837f0
State: Peer in Cluster (Connected)
glucfshead7:
Hostname: glucfshead9
Uuid: 040e61dd-fd02-4957-8833-cf5708b837f0
State: Peer in Cluster (Connected)
glucfshead8:
Hostname: glucfshead9
Uuid: 040e61dd-fd02-4957-8833-cf5708b837f0
State: Peer in Cluster (Connected)
The new node sees all of the other nodes:
root at glucfshead9:~# gluster peer status
Number of Peers: 7
Hostname: glucfshead4.bo.rz.pixum.net
Uuid: 8547dadd-96bf-45fe-b49d-bab8f995c928
State: Peer in Cluster (Connected)
Hostname: glucfshead2
Uuid: 73596f88-13ae-47d7-ba05-da7c347f6141
State: Peer in Cluster (Connected)
Hostname: glucfshead3
Uuid: a17ae95d-4598-4cd7-9ae7-808af10fedb5
State: Peer in Cluster (Connected)
Hostname: glucfshead5.bo.rz.pixum.net
Uuid: 249da8ea-fda6-47ff-98e0-dbff99dcb3f2
State: Peer in Cluster (Connected)
Hostname: glucfshead6
Uuid: a0229511-978c-4904-87ae-7e1b32ac2c72
State: Peer in Cluster (Connected)
Hostname: glucfshead7
Uuid: 548ec75a-0131-4c92-aaa9-7c6ee7b47a63
State: Peer in Cluster (Connected)
Hostname: glucfshead8
Uuid: 5e54cbc1-482c-460b-ac38-00c4b71c50b9
State: Peer in Cluster (Connected)
The old nodes all agree that the to-be-replaced node is offline:
root at glucfshead2:~# for i in `seq 2 9`; do echo "glucfshead$i:"; ssh
glucfshead$i "gluster peer status" | grep -B2 Rej ; done
glucfshead2:
Hostname: glucfshead1
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
State: Peer Rejected (Disconnected)
glucfshead3:
Hostname: glucfshead1
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
State: Peer Rejected (Disconnected)
glucfshead4:
Hostname: glucfshead1
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
State: Peer Rejected (Disconnected)
glucfshead5:
Hostname: glucfshead1
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
State: Peer Rejected (Disconnected)
glucfshead6:
Hostname: glucfshead1
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
State: Peer Rejected (Disconnected)
glucfshead7:
Hostname: glucfshead1
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
State: Peer Rejected (Disconnected)
glucfshead8:
Hostname: glucfshead1
Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
State: Peer Rejected (Disconnected)
glucfshead9:
If I try to replace the downed brick with my new brick, the command
reports success:
root at glucfshead2:~# gluster volume replace-brick archive
glucfshead1:/data/glusterfs/archive/brick1
glucfshead9:/data/glusterfs/archive/brick1/brick commit force
volume replace-brick: success: replace-brick commit successful
However, on checking afterwards, the broken brick is still listed in the
volume:
root at glucfshead2:~# gluster volume info
Volume Name: archive
Type: Distributed-Replicate
Volume ID: d888b302-2a35-4559-9bb0-4e182f49f9c6
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: glucfshead1:/data/glusterfs/archive/brick1
Brick2: glucfshead5:/data/glusterfs/archive/brick1
Brick3: glucfshead2:/data/glusterfs/archive/brick1
Brick4: glucfshead6:/data/glusterfs/archive/brick1
Brick5: glucfshead3:/data/glusterfs/archive/brick1
Brick6: glucfshead7:/data/glusterfs/archive/brick1
Brick7: glucfshead4:/data/glusterfs/archive/brick1
Brick8: glucfshead8:/data/glusterfs/archive/brick1
Options Reconfigured:
cluster.data-self-heal: off
cluster.entry-self-heal: off
cluster.metadata-self-heal: off
features.lock-heal: on
cluster.readdir-optimize: on
auth.allow: 172.16.15.*
performance.flush-behind: off
performance.io-thread-count: 16
features.quota: off
performance.quick-read: on
performance.stat-prefetch: off
performance.io-cache: on
performance.cache-refresh-timeout: 1
nfs.disable: on
performance.cache-max-file-size: 200kb
performance.cache-size: 2GB
performance.write-behind-window-size: 4MB
performance.read-ahead: off
storage.linux-aio: off
diagnostics.brick-sys-log-level: INFO
server.statedump-path: /var/tmp
cluster.self-heal-daemon: off
The glusterd logs on all of the old nodes complain loudly that they
can't connect to glucfshead1:
[2015-11-03 13:54:59.422135] I [MSGID: 106004]
[glusterd-handler.c:4398:__glusterd_peer_rpc_notify] 0-management: Peer
09ed9a29-c923-4dc5-957a-e0d3e8032daf, in Peer Rejected state, has
disconnected from glusterd.
[2015-11-03 13:56:24.996215] I
[glusterd-replace-brick.c:99:__glusterd_handle_replace_brick]
0-management: Received replace brick req
[2015-11-03 13:56:24.996283] I
[glusterd-replace-brick.c:154:__glusterd_handle_replace_brick]
0-management: Received replace brick commit-force request
[2015-11-03 13:56:25.016345] E
[glusterd-rpc-ops.c:1087:__glusterd_stage_op_cbk] 0-management: Received
stage RJT from uuid: 040e61dd-fd02-4957-8833-cf5708b837f0
The new server only logs "Stage failed".
[2015-11-03 13:56:25.015942] E
[glusterd-op-sm.c:4585:glusterd_op_ac_stage_op] 0-management: Stage
failed on operation 'Volume Replace brick', Status : -1
I tried to detach glucfshead1 since it's no longer online, but I only
get a message saying I can't do that because the server is still part of
a volume. Any further ideas that I could try?
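For reference, this is roughly the detach I attempted, plus the force variant I have been hesitant to try (I'm not sure it is safe while the volume definition still lists glucfshead1's bricks):

```shell
# Plain detach is refused while glucfshead1 still appears in a volume:
gluster peer detach glucfshead1
# A force variant exists, but I have not risked it yet:
# gluster peer detach glucfshead1 force
```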
TIA,
Thomas