[Gluster-users] Fwd: Replica brick not working

Miloš Čučulović - MDPI cuculovic at mdpi.com
Wed Dec 14 08:04:16 UTC 2016


Atin,

I was able to move forward a bit. Initially, I had this:

sudo gluster peer status
Number of Peers: 1

Hostname: storage2
Uuid: 32bef70a-9e31-403e-b9f3-ec9e1bd162ad
State: Peer Rejected (Connected)

Then, on storage2, I removed everything from /var/lib/glusterd except the info file.
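
For reference, the "rejected peer" recovery sequence I understand is the
usual one (service name assumed for a Debian/Ubuntu install, adjust if
needed):

  sudo service glusterfs-server stop
  cd /var/lib/glusterd
  sudo find . -mindepth 1 ! -name 'glusterd.info' -delete   # keep only the info/UUID file
  sudo service glusterfs-server start
  # then re-probe from the other node:
  sudo gluster peer probe storage2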

Now I am getting a different peer state:

sudo gluster peer status
Number of Peers: 1

Hostname: storage2
Uuid: 32bef70a-9e31-403e-b9f3-ec9e1bd162ad
State: Sent and Received peer request (Connected)

But the add-brick is still not working. I checked the hosts file and 
everything seems OK; ping is also working well.

Something else I need to know: when adding a new replicated brick, do I 
need to sync all the files first, or does the new brick server need to be 
empty? Also, do I first need to create the same volume on the new server, 
or will adding it to the volume of server1 do that automatically?
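
For reference, the sequence I understand should work to grow the volume to
replica 2 (assuming the new brick directory starts out empty and self-heal
then populates it -- please correct me if that assumption is wrong):

  sudo gluster peer probe storage2
  sudo gluster volume add-brick storage replica 2 storage2:/data/data-cluster
  sudo gluster volume heal storage full    # trigger a full heal towards the new brick
  sudo gluster volume heal storage info    # monitor the sync progress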


- Kindest regards,

Milos Cuculovic
IT Manager

---
MDPI AG
Postfach, CH-4020 Basel, Switzerland
Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
Tel. +41 61 683 77 35
Fax +41 61 302 89 18
Email: cuculovic at mdpi.com
Skype: milos.cuculovic.mdpi

On 14.12.2016 05:13, Atin Mukherjee wrote:
> Milos,
>
> I just managed to take a look at a similar issue and my analysis is at
> [1]. I remember you mentioning some incorrect /etc/hosts entries which
> led to this same problem in an earlier case; do you mind rechecking
> that?
>
> [1]
> http://www.gluster.org/pipermail/gluster-users/2016-December/029443.html
>
> On Wed, Dec 14, 2016 at 2:57 AM, Miloš Čučulović - MDPI
> <cuculovic at mdpi.com> wrote:
>
>     Hi All,
>
>     Moving forward with my issue, sorry for the late reply!
>
>     I had some issues with the storage2 server (original volume), then
>     decided to use 3.9.0, so I have the latest version.
>
>     For that, I manually synced all the files to the storage server. I
>     installed gluster 3.9.0 there, started it, created a new volume called
>     storage, and all seems to work OK.
>
>     Now I need to create my replicated volume (add a new brick on the
>     storage2 server). Almost all the files are there. So, I was running
>     this on the storage server:
>
>     * sudo gluster peer probe storage2
>     * sudo gluster volume add-brick storage replica 2 storage2:/data/data-cluster force
>
>     But there I am receiving "volume add-brick: failed: Host storage2 is
>     not in 'Peer in Cluster' state"
>
>     Any idea?
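>
>     For what it's worth, the checks I know of for this state (standard
>     commands and paths, nothing exotic):
>
>       sudo gluster peer status              # on both storage and storage2
>       sudo ls -l /var/lib/glusterd/peers/   # one file per known peer
>       getent hosts storage2                 # confirm the name resolves to the right IP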
>
>     - Kindest regards,
>
>     Milos Cuculovic
>     IT Manager
>
>     ---
>     MDPI AG
>     Postfach, CH-4020 Basel, Switzerland
>     Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>     Tel. +41 61 683 77 35
>     Fax +41 61 302 89 18
>     Email: cuculovic at mdpi.com
>     Skype: milos.cuculovic.mdpi
>
>     On 08.12.2016 17:52, Ravishankar N wrote:
>
>         On 12/08/2016 09:44 PM, Miloš Čučulović - MDPI wrote:
>
>             I was able to fix the sync by rsync-ing all the directories,
>             and then the heal started. The next problem :) : as soon as
>             there are files on the new brick, the gluster mount also serves
>             reads from it, but the new brick is not ready yet because the
>             sync is not done, so clients end up with missing files. I
>             temporarily removed the new brick; now I am running a manual
>             rsync and will add the brick again, hoping this will work.
>
>             What mechanism manages this? I guess there is something built
>             in to make a replica brick available only once the data is
>             completely synced.
>
>         This mechanism was introduced in 3.7.9 or 3.7.10
>         (http://review.gluster.org/#/c/13806/). Before that version, you
>         needed to manually set some xattrs on the bricks so that healing
>         could happen in parallel while the client would still serve reads
>         from the original brick. I can't find the link to the doc that
>         describes the steps for setting these xattrs. :-(
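>
>         For illustration (assuming the volume is named "storage"), the AFR
>         pending xattrs in question look something like this on a brick
>         root; non-zero values mean heals are pending towards that client:
>
>             getfattr -d -m . -e hex /data/data-cluster
>             # trusted.afr.storage-client-0=0x000000000000000000000000
>             # trusted.afr.storage-client-1=0x000000000000000000000000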
>
>         Calling it a day,
>         Ravi
>
>
>             - Kindest regards,
>
>             Milos Cuculovic
>             IT Manager
>
>             ---
>             MDPI AG
>             Postfach, CH-4020 Basel, Switzerland
>             Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>             Tel. +41 61 683 77 35
>             Fax +41 61 302 89 18
>             Email: cuculovic at mdpi.com
>             Skype: milos.cuculovic.mdpi
>
>             On 08.12.2016 16:17, Ravishankar N wrote:
>
>                 On 12/08/2016 06:53 PM, Atin Mukherjee wrote:
>
>
>
>                     On Thu, Dec 8, 2016 at 6:44 PM, Miloš Čučulović - MDPI
>                     <cuculovic at mdpi.com> wrote:
>
>                         Ah, damn! I found the issue. On the storage server,
>                         the storage2 IP address was wrong; I had transposed
>                         two digits in the /etc/hosts file, sorry for that :(
>
>                         I was able to add the brick now and I started the
>                         heal, but still no data transfer is visible.
>
>                 1. Are the files getting created on the new brick though?
>                 2. Can you provide the output of `getfattr -d -m . -e hex
>                    /data/data-cluster` on both bricks?
>                 3. Is it possible to attach gdb to the self-heal daemon on
>                    the original (old) brick and get a backtrace (rough
>                    example below)?
>                        `gdb -p <pid of the self-heal daemon on the original brick>`
>                        thread apply all bt   --> share this output
>                        quit gdb
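>
>                 Roughly, that gdb session would be (the pgrep pattern for
>                 finding the self-heal daemon's pid is just one assumed way
>                 of getting it):
>
>                     gdb -p $(pgrep -f glustershd)
>                     (gdb) thread apply all bt
>                     (gdb) quit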
>
>
>                 -Ravi
>
>
>                     @Ravi/Pranith - can you help here?
>
>
>
>                         By doing gluster volume status, I have
>
>                         Status of volume: storage
>                         Gluster process                       TCP Port  RDMA Port  Online  Pid
>                         ------------------------------------------------------------------------------
>                         Brick storage2:/data/data-cluster     49152     0          Y       23101
>                         Brick storage:/data/data-cluster      49152     0          Y       30773
>                         Self-heal Daemon on localhost         N/A       N/A        Y       30050
>                         Self-heal Daemon on storage           N/A       N/A        Y       30792
>
>
>                         Any idea?
>
>                         On storage I have:
>                         Number of Peers: 1
>
>                         Hostname: 195.65.194.217
>                         Uuid: 7c988af2-9f76-4843-8e6f-d94866d57bb0
>                         State: Peer in Cluster (Connected)
>
>
>                         - Kindest regards,
>
>                         Milos Cuculovic
>                         IT Manager
>
>                         ---
>                         MDPI AG
>                         Postfach, CH-4020 Basel, Switzerland
>                         Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>                         Tel. +41 61 683 77 35
>                         Fax +41 61 302 89 18
>                         Email: cuculovic at mdpi.com
>                         Skype: milos.cuculovic.mdpi
>
>                         On 08.12.2016 13:55, Atin Mukherjee wrote:
>
>                             Can you resend the attachment as a zip? I am
>                             unable to extract the content. We shouldn't have
>                             a 0-byte info file. What does the gluster peer
>                             status output say?
>
>                             On Thu, Dec 8, 2016 at 4:51 PM, Miloš Čučulović - MDPI
>                             <cuculovic at mdpi.com> wrote:
>
>                                 I hope you received my last email Atin,
>                     thank you!
>
>                                 - Kindest regards,
>
>                                 Milos Cuculovic
>                                 IT Manager
>
>                                 ---
>                                 MDPI AG
>                                 Postfach, CH-4020 Basel, Switzerland
>                                 Office: St. Alban-Anlage 66, 4052 Basel,
>                     Switzerland
>                                 Tel. +41 61 683 77 35
>                                 Fax +41 61 302 89 18
>                                 Email: cuculovic at mdpi.com
>                                 Skype: milos.cuculovic.mdpi
>
>                                 On 08.12.2016 10:28, Atin Mukherjee wrote:
>
>
>                                     ---------- Forwarded message ----------
>                                     From: *Atin Mukherjee* <amukherj at redhat.com>
>                                     Date: Thu, Dec 8, 2016 at 11:56 AM
>                                     Subject: Re: [Gluster-users] Replica brick not working
>                                     To: Ravishankar N <ravishankar at redhat.com>
>                                     Cc: Miloš Čučulović - MDPI <cuculovic at mdpi.com>,
>                                         Pranith Kumar Karampuri <pkarampu at redhat.com>,
>                                         gluster-users <gluster-users at gluster.org>
>
>
>
>
>                                     On Thu, Dec 8, 2016 at 11:11 AM, Ravishankar N
>                                     <ravishankar at redhat.com> wrote:
>
>                                         On 12/08/2016 10:43 AM, Atin Mukherjee wrote:
>
>                                             From the log snippet:
>
>                                             [2016-12-07 09:15:35.677645] I [MSGID: 106482] [glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management: Received add brick req
>                                             [2016-12-07 09:15:35.677708] I [MSGID: 106062] [glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management: replica-count is 2
>                                             [2016-12-07 09:15:35.677735] E [MSGID: 106291] [glusterd-brick-ops.c:614:__glusterd_handle_add_brick] 0-management:
>
>                                             The last log entry indicates that we hit this code path in gd_addbr_validate_replica_count ():
>
>                                                 if (replica_count == volinfo->replica_count) {
>                                                         if (!(total_bricks % volinfo->dist_leaf_count)) {
>                                                                 ret = 1;
>                                                                 goto out;
>                                                         }
>                                                 }
>
>                                         It seems unlikely that this snippet was hit, because we
>                                         print the E [MSGID: 106291] in the above message only if
>                                         ret == -1. gd_addbr_validate_replica_count() returns -1
>                                         without populating err_str only when volinfo->type doesn't
>                                         match any of the known volume types, so perhaps
>                                         volinfo->type is corrupted?
>
>
>                                     You are right, I missed that ret is set to 1 here in the
>                                     above snippet.
>
>                                     @Milos - Can you please provide us with the volume info file
>                                     from /var/lib/glusterd/vols/<volname>/ from all three nodes
>                                     so we can continue the analysis?
>
>
>
>                                         -Ravi
>
>                                             @Pranith, Ravi - Milos was trying to convert a dist
>                                             (1 x 1) volume to a replicate (1 x 2) using add-brick
>                                             and hit this issue where add-brick failed. The cluster
>                                             is operating with 3.7.6. Could you help with which
>                                             scenarios can hit this code path? One straightforward
>                                             issue I see here is the missing err_str in this path.
>
>
>
>
>
>
>                                     --
>
>                                     ~ Atin (atinm)
>
>
>
>
>                             --
>
>                             ~ Atin (atinm)
>
>
>
>
>                     --
>
>                     ~ Atin (atinm)
>
>
>
>
>
>
>
>
> --
>
> ~ Atin (atinm)

