[Gluster-users] Replacing a node in a 4x2 distributed/replicated setup

Atin Mukherjee atin.mukherjee83 at gmail.com
Fri Oct 30 16:56:36 UTC 2015


This could very well be related to op-version. Could you look at the faulty
node's glusterd log and check the error entries there? That would give us
the exact reason for the failure.
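
For example, on both a healthy node and the rejected one (assuming the
default paths used by the 3.x packages - adjust if yours differ):

# compare the cluster operating version on each node
grep operating-version /var/lib/glusterd/glusterd.info

# glusterd's own log; the reason for rejecting the peer is normally printed here
less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

If the operating-version values differ between the peers, that mismatch would
line up with the op-version theory.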

-Atin
Sent from one plus one
On Oct 30, 2015 5:35 PM, "Thomas Bätzler" <t.baetzler at bringe.com> wrote:

> Hi,
>
> can somebody help me with fixing our 8 node gluster please?
>
> Setup is as follows:
>
> root at glucfshead2:~# gluster volume info
>
> Volume Name: archive
> Type: Distributed-Replicate
> Volume ID: d888b302-2a35-4559-9bb0-4e182f49f9c6
> Status: Started
> Number of Bricks: 4 x 2 = 8
> Transport-type: tcp
> Bricks:
> Brick1: glucfshead1:/data/glusterfs/archive/brick1
> Brick2: glucfshead5:/data/glusterfs/archive/brick1
> Brick3: glucfshead2:/data/glusterfs/archive/brick1
> Brick4: glucfshead6:/data/glusterfs/archive/brick1
> Brick5: glucfshead3:/data/glusterfs/archive/brick1
> Brick6: glucfshead7:/data/glusterfs/archive/brick1
> Brick7: glucfshead4:/data/glusterfs/archive/brick1
> Brick8: glucfshead8:/data/glusterfs/archive/brick1
> Options Reconfigured:
> cluster.data-self-heal: off
> cluster.entry-self-heal: off
> cluster.metadata-self-heal: off
> features.lock-heal: on
> cluster.readdir-optimize: on
> performance.flush-behind: off
> performance.io-thread-count: 16
> features.quota: off
> performance.quick-read: on
> performance.stat-prefetch: off
> performance.io-cache: on
> performance.cache-refresh-timeout: 1
> nfs.disable: on
> performance.cache-max-file-size: 200kb
> performance.cache-size: 2GB
> performance.write-behind-window-size: 4MB
> performance.read-ahead: off
> storage.linux-aio: off
> diagnostics.brick-sys-log-level: WARNING
> cluster.self-heal-daemon: off
>
> Volume Name: archive2
> Type: Distributed-Replicate
> Volume ID: 0fe86e42-e67f-46d8-8ed0-d0e34f539d69
> Status: Started
> Number of Bricks: 4 x 2 = 8
> Transport-type: tcp
> Bricks:
> Brick1: glucfshead1:/data/glusterfs/archive2/brick1
> Brick2: glucfshead5:/data/glusterfs/archive2/brick1
> Brick3: glucfshead2:/data/glusterfs/archive2/brick1
> Brick4: glucfshead6:/data/glusterfs/archive2/brick1
> Brick5: glucfshead3:/data/glusterfs/archive2/brick1
> Brick6: glucfshead7:/data/glusterfs/archive2/brick1
> Brick7: glucfshead4:/data/glusterfs/archive2/brick1
> Brick8: glucfshead8:/data/glusterfs/archive2/brick1
> Options Reconfigured:
> cluster.metadata-self-heal: off
> cluster.entry-self-heal: off
> cluster.data-self-heal: off
> diagnostics.count-fop-hits: on
> diagnostics.latency-measurement: on
> features.lock-heal: on
> diagnostics.brick-sys-log-level: WARNING
> storage.linux-aio: off
> performance.read-ahead: off
> performance.write-behind-window-size: 4MB
> performance.cache-size: 2GB
> performance.cache-max-file-size: 200kb
> nfs.disable: on
> performance.cache-refresh-timeout: 1
> performance.io-cache: on
> performance.stat-prefetch: off
> performance.quick-read: on
> features.quota: off
> performance.io-thread-count: 16
> performance.flush-behind: off
> auth.allow: 172.16.15.*
> cluster.readdir-optimize: on
> cluster.self-heal-daemon: off
>
> Some time ago, node glucfshead1 broke down. After some fiddling it was
> decided not to deal with it right away, because the cluster was in
> production and a rebuild on 3.4 would basically have rendered it
> unusable.
>
> Recently we decided the situation had to be addressed, and we hired some
> experts to help. We reinstalled the broken node, gave it a new name/IP,
> and upgraded all systems to 3.6.4.
>
> The plan was to probe the "new" node into the cluster and then do a
> replace-brick onto it. However, that did not go as expected.
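>
> For reference, the replace was meant to look roughly like this (illustrative
> commands only, using the brick paths from the volume layout above):
>
> # pull the replacement host into the trusted pool
> gluster peer probe glucfshead9
>
> # then, per volume, swap the dead brick for one on the new host
> gluster volume replace-brick archive \
>     glucfshead1:/data/glusterfs/archive/brick1 \
>     glucfshead9:/data/glusterfs/archive/brick1 commit force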
>
> The node that we removed is now listed as "Peer Rejected":
>
> root at glucfshead2:~# gluster peer status
> Number of Peers: 7
>
> Hostname: glucfshead1
> Uuid: 09ed9a29-c923-4dc5-957a-e0d3e8032daf
> State: Peer Rejected (Disconnected)
>
> Hostname: glucfshead3
> Uuid: a17ae95d-4598-4cd7-9ae7-808af10fedb5
> State: Peer in Cluster (Connected)
>
> Hostname: glucfshead4
> Uuid: 8547dadd-96bf-45fe-b49d-bab8f995c928
> State: Peer in Cluster (Connected)
>
> Hostname: glucfshead5
> Uuid: 249da8ea-fda6-47ff-98e0-dbff99dcb3f2
> State: Peer in Cluster (Connected)
>
> Hostname: glucfshead6
> Uuid: a0229511-978c-4904-87ae-7e1b32ac2c72
> State: Peer in Cluster (Connected)
>
> Hostname: glucfshead7
> Uuid: 548ec75a-0131-4c92-aaa9-7c6ee7b47a63
> State: Peer in Cluster (Connected)
>
> Hostname: glucfshead8
> Uuid: 5e54cbc1-482c-460b-ac38-00c4b71c50b9
> State: Peer in Cluster (Connected)
>
> If I probe the replacement node (glucfshead9), it only ever shows up on
> one of my running nodes, and there it is in state "Peer Rejected
> (Connected)".
>
> How can we fix this - preferably without losing data?
>
> TIA,
> Thomas
>
> ---
> This e-mail has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users