[Gluster-users] Fwd: Replica brick not working
Atin Mukherjee
amukherj at redhat.com
Wed Dec 14 12:50:55 UTC 2016
On Wed, Dec 14, 2016 at 1:34 PM, Miloš Čučulović - MDPI <cuculovic at mdpi.com>
wrote:
> Atin,
>
> I was able to move forward a bit. Initially, I had this:
>
> sudo gluster peer status
> Number of Peers: 1
>
> Hostname: storage2
> Uuid: 32bef70a-9e31-403e-b9f3-ec9e1bd162ad
> State: Peer Rejected (Connected)
>
> Then, on storage2, I removed everything from /var/lib/glusterd except the
> info file.
>
> Now I am getting another error message:
>
> sudo gluster peer status
> Number of Peers: 1
>
> Hostname: storage2
> Uuid: 32bef70a-9e31-403e-b9f3-ec9e1bd162ad
> State: Sent and Received peer request (Connected)
>
Please edit the /var/lib/glusterd/peers/32bef70a-9e31-403e-b9f3-ec9e1bd162ad
file on storage1, set the state to 3, and restart the glusterd instance.
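
For example, something along these lines on storage1 (a rough sketch; adjust
the service name and editing approach to your environment):

    sudo sed -i 's/^state=.*/state=3/' \
        /var/lib/glusterd/peers/32bef70a-9e31-403e-b9f3-ec9e1bd162ad
    sudo systemctl restart glusterd   # the unit may be called glusterfs-server on Debian/Ubuntu
    sudo gluster peer status          # storage2 should now show "Peer in Cluster (Connected)"
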
> But the add-brick is still not working. I checked the hosts file and all
> seems OK; ping is also working well.
>
> The thing I also need to know: when adding a new replicated brick, do I
> need to first sync all files, or does the new brick server need to be empty?
> Also, do I first need to create the same volume on the new server, or will
> adding it to the volume of server1 do it automatically?
>
>
> - Kindest regards,
>
> Milos Cuculovic
> IT Manager
>
> ---
> MDPI AG
> Postfach, CH-4020 Basel, Switzerland
> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
> Tel. +41 61 683 77 35
> Fax +41 61 302 89 18
> Email: cuculovic at mdpi.com
> Skype: milos.cuculovic.mdpi
>
> On 14.12.2016 05:13, Atin Mukherjee wrote:
>
>> Milos,
>>
>> I just managed to take a look into a similar issue and my analysis is at
>> [1]. I remember you mentioning some incorrect /etc/hosts entries which
>> led to this same problem in an earlier case; do you mind rechecking?
>>
>> [1]
>> http://www.gluster.org/pipermail/gluster-users/2016-December/029443.html
>>
>> On Wed, Dec 14, 2016 at 2:57 AM, Miloš Čučulović - MDPI
>> <cuculovic at mdpi.com> wrote:
>>
>> Hi All,
>>
>> Moving forward with my issue, sorry for the late reply!
>>
>> I had some issues with the storage2 server (original volume), then
>> decided to use 3.9.0, so I have the latest version.
>>
>> For that, I manually synced all the files to the storage server. I
>> installed gluster 3.9.0 there, started it, created a new volume called
>> storage, and all seems to work OK.
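>>
>> For example, a quick sanity check of the new volume before adding the
>> replica brick (standard gluster CLI, nothing specific to this setup):
>>
>>     sudo gluster volume info storage
>>     sudo gluster volume status storage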
>>
>> Now, I need to create my replicated volume (add a new brick on the
>> storage2 server). Almost all the files are there. So, on the storage
>> server I ran:
>>
>> * sudo gluster peer probe storage2
>> * sudo gluster volume add-brick storage replica 2
>> storage2:/data/data-cluster force
>>
>> But I am receiving "volume add-brick: failed: Host storage2 is not in
>> 'Peer in Cluster' state".
>>
>> Any idea?
>>
>> - Kindest regards,
>>
>> Milos Cuculovic
>> IT Manager
>>
>> ---
>> MDPI AG
>> Postfach, CH-4020 Basel, Switzerland
>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>> Tel. +41 61 683 77 35
>> Fax +41 61 302 89 18
>> Email: cuculovic at mdpi.com
>> Skype: milos.cuculovic.mdpi
>>
>> On 08.12.2016 17:52, Ravishankar N wrote:
>>
>> On 12/08/2016 09:44 PM, Miloš Čučulović - MDPI wrote:
>>
>> I was able to fix the sync by rsync-ing all the directories, and then
>> the heal started. The next problem :) : as soon as there are files on
>> the new brick, the gluster mount also serves this brick to clients,
>> and since the new brick is not ready yet (the sync is not done), this
>> results in missing files on the client side. I temporarily removed the
>> new brick; now I am running a manual rsync and will add the brick
>> again, hoping this will work.
>>
>> What mechanism manages this? I guess there is something built in to
>> make a replica brick available only once the data is completely
>> synced.
>>
>> This mechanism was introduced in 3.7.9 or 3.7.10
>> (http://review.gluster.org/#/c/13806/). Before that version, you
>> needed to manually set some xattrs on the bricks so that healing could
>> happen in parallel while the client would still serve reads from the
>> original brick. I can't find the link to the doc which describes these
>> steps for setting the xattrs. :-(
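>>
>> For reference, once the brick is added, the heal can be triggered and
>> monitored with the standard CLI; a minimal sketch, with <volname> as a
>> placeholder:
>>
>>     gluster volume heal <volname> full   # kick off a full self-heal
>>     gluster volume heal <volname> info   # list entries still pending heal, per brick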
>>
>> Calling it a day,
>> Ravi
>>
>>
>> - Kindest regards,
>>
>> Milos Cuculovic
>> IT Manager
>>
>> ---
>> MDPI AG
>> Postfach, CH-4020 Basel, Switzerland
>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>> Tel. +41 61 683 77 35
>> Fax +41 61 302 89 18
>> Email: cuculovic at mdpi.com
>> Skype: milos.cuculovic.mdpi
>>
>> On 08.12.2016 16:17, Ravishankar N wrote:
>>
>> On 12/08/2016 06:53 PM, Atin Mukherjee wrote:
>>
>>
>>
>> On Thu, Dec 8, 2016 at 6:44 PM, Miloš Čučulović - MDPI
>> <cuculovic at mdpi.com> wrote:
>>
>> Ah, damn! I found the issue. On the storage server, the storage2
>> IP address was wrong; I had inverted two digits in the /etc/hosts
>> file, sorry for that :(
>>
>> I was able to add the brick now and I started the heal, but still
>> no data transfer is visible.
>>
>> 1. Are the files getting created on the new brick though?
>> 2. Can you provide the output of `getfattr -d -m . -e hex
>>    /data/data-cluster` on both bricks?
>> 3. Is it possible to attach gdb to the self-heal daemon on the
>>    original (old) brick and get a backtrace?
>>    `gdb -p <pid of self-heal daemon on the original brick>`
>>    thread apply all bt  --> share this output
>>    quit gdb.
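>>
>> (Equivalently, as a non-interactive one-liner using standard gdb options:
>>  gdb -p <pid> -batch -ex "thread apply all bt" > shd-backtrace.txt)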
>>
>>
>> -Ravi
>>
>>
>> @Ravi/Pranith - can you help here?
>>
>>
>>
>> By doing gluster volume status, I have
>>
>> Status of volume: storage
>> Gluster process                    TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick storage2:/data/data-cluster  49152     0          Y       23101
>> Brick storage:/data/data-cluster   49152     0          Y       30773
>> Self-heal Daemon on localhost      N/A       N/A        Y       30050
>> Self-heal Daemon on storage        N/A       N/A        Y       30792
>>
>>
>> Any idea?
>>
>> On storage I have:
>> Number of Peers: 1
>>
>> Hostname: 195.65.194.217
>> Uuid: 7c988af2-9f76-4843-8e6f-d94866d57bb0
>> State: Peer in Cluster (Connected)
>>
>>
>> - Kindest regards,
>>
>> Milos Cuculovic
>> IT Manager
>>
>> ---
>> MDPI AG
>> Postfach, CH-4020 Basel, Switzerland
>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>> Tel. +41 61 683 77 35
>> Fax +41 61 302 89 18
>> Email: cuculovic at mdpi.com
>> Skype: milos.cuculovic.mdpi
>>
>> On 08.12.2016 13:55, Atin Mukherjee wrote:
>>
>> Can you resend the attachment as zip? I am unable to extract the
>> content. We shouldn't have a zero-byte info file. What does gluster
>> peer status output say?
>>
>> On Thu, Dec 8, 2016 at 4:51 PM, Miloš Čučulović - MDPI
>> <cuculovic at mdpi.com> wrote:
>>
>> I hope you received my last email Atin,
>> thank you!
>>
>> - Kindest regards,
>>
>> Milos Cuculovic
>> IT Manager
>>
>> ---
>> MDPI AG
>> Postfach, CH-4020 Basel, Switzerland
>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>> Tel. +41 61 683 77 35
>> Fax +41 61 302 89 18
>> Email: cuculovic at mdpi.com
>> Skype: milos.cuculovic.mdpi
>>
>> On 08.12.2016 10:28, Atin Mukherjee wrote:
>>
>>
>> ---------- Forwarded message ----------
>> From: *Atin Mukherjee* <amukherj at redhat.com>
>> Date: Thu, Dec 8, 2016 at 11:56 AM
>> Subject: Re: [Gluster-users] Replica brick not working
>> To: Ravishankar N <ravishankar at redhat.com>
>> Cc: Miloš Čučulović - MDPI <cuculovic at mdpi.com>,
>> Pranith Kumar Karampuri <pkarampu at redhat.com>,
>> gluster-users <gluster-users at gluster.org>
>>
>>
>>
>>
>> On Thu, Dec 8, 2016 at 11:11 AM, Ravishankar N
>> <ravishankar at redhat.com> wrote:
>>
>> On 12/08/2016 10:43 AM, Atin Mukherjee wrote:
>>
>> From the log snippet:
>>
>> [2016-12-07 09:15:35.677645] I [MSGID: 106482]
>> [glusterd-brick-ops.c:442:__glusterd_handle_add_brick]
>> 0-management: Received add brick req
>> [2016-12-07 09:15:35.677708] I [MSGID: 106062]
>> [glusterd-brick-ops.c:494:__glusterd_handle_add_brick]
>> 0-management: replica-count is 2
>> [2016-12-07 09:15:35.677735] E [MSGID: 106291]
>> [glusterd-brick-ops.c:614:__glusterd_handle_add_brick]
>> 0-management:
>>
>> The last log entry indicates that we hit the code path in
>> gd_addbr_validate_replica_count ():
>>
>>     if (replica_count == volinfo->replica_count) {
>>         if (!(total_bricks % volinfo->dist_leaf_count)) {
>>             ret = 1;
>>             goto out;
>>         }
>>     }
>>
>>
>> It seems unlikely that this snippet was hit, because we print the
>> E [MSGID: 106291] message above only if ret == -1.
>>
>> gd_addbr_validate_replica_count() returns -1 without populating
>> err_str only when volinfo->type doesn't match any of the known volume
>> types, so perhaps volinfo->type is corrupted?
>>
>>
>> You are right, I missed that ret is set to 1 in the above snippet.
>>
>> @Milos - Can you please provide us the volume info file from
>> /var/lib/glusterd/vols/<volname>/ from all three nodes so we can
>> continue the analysis?
>>
>>
>>
>> -Ravi
>>
>> @Pranith, Ravi - Milos was trying to convert a dist (1 x 1) volume to
>> a replicate (1 x 2) volume using add-brick and hit this issue where
>> add-brick failed. The cluster is running 3.7.6. Could you help with
>> what scenario this code path can be hit in? One straightforward issue
>> I see here is the missing err_str in this path.
>>
>>
>>
>>
>>
>>
>> --
>>
>> ~ Atin (atinm)
>
--
~ Atin (atinm)