[Gluster-users] Fwd: Replica brick not working

Atin Mukherjee amukherj at redhat.com
Wed Dec 14 04:13:23 UTC 2016


Milos,

I just managed to take a look at a similar issue and my analysis is at
[1]. I remember you mentioning some incorrect /etc/hosts entries which
led to this same problem in an earlier case; would you mind rechecking
that?

[1] http://www.gluster.org/pipermail/gluster-users/2016-December/029443.html
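
A quick check on both nodes would be something along these lines (hostnames
taken from your earlier mails, adjust as needed):

    getent hosts storage storage2   # confirm both names resolve to the right IPs
    gluster peer status             # each peer should show 'Peer in Cluster (Connected)'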

On Wed, Dec 14, 2016 at 2:57 AM, Miloš Čučulović - MDPI <cuculovic at mdpi.com>
wrote:

> Hi All,
>
> Moving forward with my issue, sorry for the late reply!
>
> I had some issues with the storage2 server (original volume), then decided
> to use 3.9.0, so I have the latest version.
>
> For that, I manually synced all the files to the storage server. I
> installed Gluster 3.9.0 there, started it, created a new volume called
> storage, and all seems to work OK.
>
> Now, I need to create my replicated volume (add a new brick on the storage2
> server). Almost all the files are there. So, I ran the following on the
> storage server:
>
> * sudo gluster peer probe storage2
> * sudo gluster volume add-brick storage replica 2
> storage2:/data/data-cluster force
>
> But I am receiving "volume add-brick: failed: Host storage2 is not
> in 'Peer in Cluster' state"
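>
> For completeness, here is the rough sequence I plan to retry once the peer
> state looks right (assuming these are the right checks, hostnames as above):
>
>     sudo gluster peer status
>     # storage2 should show 'State: Peer in Cluster (Connected)' before retrying
>     sudo gluster volume add-brick storage replica 2 \
>         storage2:/data/data-cluster force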
>
> Any idea?
>
> - Kindest regards,
>
> Milos Cuculovic
> IT Manager
>
> ---
> MDPI AG
> Postfach, CH-4020 Basel, Switzerland
> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
> Tel. +41 61 683 77 35
> Fax +41 61 302 89 18
> Email: cuculovic at mdpi.com
> Skype: milos.cuculovic.mdpi
>
> On 08.12.2016 17:52, Ravishankar N wrote:
>
>> On 12/08/2016 09:44 PM, Miloš Čučulović - MDPI wrote:
>>
>>> I was able to fix the sync by rsync-ing all the directories, then the
>>> heal started. The next problem :): as soon as there are files on the
>>> new brick, the gluster mount also serves reads from it, and the new
>>> brick is not ready yet, as the sync is not yet done, so clients see
>>> missing files. I temporarily removed the new brick; now I am running a
>>> manual rsync and will add the brick again, hoping this will work.
>>>
>>> What mechanism manages this? I would guess there is something built in
>>> to make a replica brick available only once the data is completely
>>> synced.
>>>
>> This mechanism was introduced in 3.7.9 or 3.7.10
>> (http://review.gluster.org/#/c/13806/). Before that version, you needed
>> to manually set some xattrs on the bricks so that healing could happen
>> in parallel while clients would still read from the original brick. I
>> can't find the link to the doc which describes the steps for setting
>> these xattrs. :-(
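>>
>> On 3.7.9 and later (including the 3.9.0 you are moving to), a rough way to
>> trigger the heal and watch its progress after the add-brick, instead of
>> removing the new brick, would be something like:
>>
>>     gluster volume heal storage full    # queue a full heal towards the new brick
>>     gluster volume heal storage info    # list entries still pending heal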
>>
>> Calling it a day,
>> Ravi
>>
>>>
>>> - Kindest regards,
>>>
>>> Milos Cuculovic
>>> IT Manager
>>>
>>> ---
>>> MDPI AG
>>> Postfach, CH-4020 Basel, Switzerland
>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>> Tel. +41 61 683 77 35
>>> Fax +41 61 302 89 18
>>> Email: cuculovic at mdpi.com
>>> Skype: milos.cuculovic.mdpi
>>>
>>> On 08.12.2016 16:17, Ravishankar N wrote:
>>>
>>>> On 12/08/2016 06:53 PM, Atin Mukherjee wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Dec 8, 2016 at 6:44 PM, Miloš Čučulović - MDPI
>>>>> <cuculovic at mdpi.com> wrote:
>>>>>
>>>>>     Ah, damn! I found the issue. On the storage server, the storage2
>>>>>     IP address was wrong; I had swapped two digits in the /etc/hosts
>>>>>     file, sorry for that :(
>>>>>
>>>>>     I was able to add the brick now and started the heal, but still no
>>>>>     data transfer is visible.
>>>>>
>>>> 1. Are the files getting created on the new brick though?
>>>> 2. Can you provide the output of `getfattr -d -m . -e hex
>>>> /data/data-cluster` on both bricks?
>>>> 3. Is it possible to attach gdb to the self-heal daemon on the original
>>>> (old) brick and get a backtrace?
>>>>     `gdb -p <pid of self-heal daemon on the original brick>`
>>>>     `thread apply all bt`  --> share this output
>>>>     then quit gdb.
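>>>>
>>>> If it is easier, a non-interactive capture along these lines should give
>>>> the same backtrace (assuming the shd's command line contains "glustershd"):
>>>>
>>>>     pgrep -af glustershd                  # find the self-heal daemon pid
>>>>     gdb -p <pid> -batch -ex "thread apply all bt" > shd-backtrace.txt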
>>>>
>>>>
>>>> -Ravi
>>>>
>>>>>
>>>>> @Ravi/Pranith - can you help here?
>>>>>
>>>>>
>>>>>
>>>>>     By doing gluster volume status, I have
>>>>>
>>>>>     Status of volume: storage
>>>>>     Gluster process                       TCP Port  RDMA Port  Online  Pid
>>>>>     ------------------------------------------------------------------------
>>>>>     Brick storage2:/data/data-cluster     49152     0          Y       23101
>>>>>     Brick storage:/data/data-cluster      49152     0          Y       30773
>>>>>     Self-heal Daemon on localhost         N/A       N/A        Y       30050
>>>>>     Self-heal Daemon on storage           N/A       N/A        Y       30792
>>>>>
>>>>>
>>>>>     Any idea?
>>>>>
>>>>>     On storage I have:
>>>>>     Number of Peers: 1
>>>>>
>>>>>     Hostname: 195.65.194.217
>>>>>     Uuid: 7c988af2-9f76-4843-8e6f-d94866d57bb0
>>>>>     State: Peer in Cluster (Connected)
>>>>>
>>>>>
>>>>>     - Kindest regards,
>>>>>
>>>>>     Milos Cuculovic
>>>>>     IT Manager
>>>>>
>>>>>     ---
>>>>>     MDPI AG
>>>>>     Postfach, CH-4020 Basel, Switzerland
>>>>>     Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>>>>     Tel. +41 61 683 77 35
>>>>>     Fax +41 61 302 89 18
>>>>>     Email: cuculovic at mdpi.com
>>>>>     Skype: milos.cuculovic.mdpi
>>>>>
>>>>>     On 08.12.2016 13:55, Atin Mukherjee wrote:
>>>>>
>>>>>         Can you resend the attachment as a zip? I am unable to extract
>>>>>         the content. We shouldn't have a 0-byte info file. What does
>>>>>         gluster peer status output say?
>>>>>
>>>>>         On Thu, Dec 8, 2016 at 4:51 PM, Miloš Čučulović - MDPI
>>>>>         <cuculovic at mdpi.com> wrote:
>>>>>
>>>>>             I hope you received my last email Atin, thank you!
>>>>>
>>>>>             - Kindest regards,
>>>>>
>>>>>             Milos Cuculovic
>>>>>             IT Manager
>>>>>
>>>>>             ---
>>>>>             MDPI AG
>>>>>             Postfach, CH-4020 Basel, Switzerland
>>>>>             Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>>>>             Tel. +41 61 683 77 35
>>>>>             Fax +41 61 302 89 18
>>>>>             Email: cuculovic at mdpi.com
>>>>>             Skype: milos.cuculovic.mdpi
>>>>>
>>>>>             On 08.12.2016 10:28, Atin Mukherjee wrote:
>>>>>
>>>>>
>>>>>                 ---------- Forwarded message ----------
>>>>>                 From: Atin Mukherjee <amukherj at redhat.com>
>>>>>                 Date: Thu, Dec 8, 2016 at 11:56 AM
>>>>>                 Subject: Re: [Gluster-users] Replica brick not working
>>>>>                 To: Ravishankar N <ravishankar at redhat.com>
>>>>>                 Cc: Miloš Čučulović - MDPI <cuculovic at mdpi.com>,
>>>>>                 Pranith Kumar Karampuri <pkarampu at redhat.com>,
>>>>>                 gluster-users <gluster-users at gluster.org>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                 On Thu, Dec 8, 2016 at 11:11 AM, Ravishankar N
>>>>>                 <ravishankar at redhat.com> wrote:
>>>>>
>>>>>                     On 12/08/2016 10:43 AM, Atin Mukherjee wrote:
>>>>>
>>>>>                         From the log snippet:
>>>>>
>>>>>                         [2016-12-07 09:15:35.677645] I [MSGID: 106482] [glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management: Received add brick req
>>>>>                         [2016-12-07 09:15:35.677708] I [MSGID: 106062] [glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management: replica-count is 2
>>>>>                         [2016-12-07 09:15:35.677735] E [MSGID: 106291] [glusterd-brick-ops.c:614:__glusterd_handle_add_brick] 0-management:
>>>>>
>>>>>                         The last log entry indicates that we hit the
>>>>>                         code path in gd_addbr_validate_replica_count ():
>>>>>
>>>>>                             if (replica_count == volinfo->replica_count) {
>>>>>                                     if (!(total_bricks % volinfo->dist_leaf_count)) {
>>>>>                                             ret = 1;
>>>>>                                             goto out;
>>>>>                                     }
>>>>>                             }
>>>>>
>>>>>
>>>>>                     It seems unlikely that this snippet was hit, because
>>>>>                     we print the E [MSGID: 106291] in the above message
>>>>>                     only if ret == -1. gd_addbr_validate_replica_count()
>>>>>                     returns -1 without populating err_str only when
>>>>>                     volinfo->type doesn't match any of the known volume
>>>>>                     types, so perhaps volinfo->type is corrupted?
>>>>>
>>>>>
>>>>>                 You are right, I missed that ret is set to 1 here in the
>>>>>                 above snippet.
>>>>>
>>>>>                 @Milos - Can you please provide us the volume info file
>>>>>                 from /var/lib/glusterd/vols/<volname>/ from all three
>>>>>                 nodes so we can continue the analysis?
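>>>>>
>>>>>                 Something like this on each node should be enough to
>>>>>                 grab it (volume name assumed to be "storage"):
>>>>>
>>>>>                     ls -l /var/lib/glusterd/vols/storage/
>>>>>                     cat /var/lib/glusterd/vols/storage/info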
>>>>>
>>>>>
>>>>>
>>>>>                     -Ravi
>>>>>
>>>>>                         @Pranith, Ravi - Milos was trying to convert a
>>>>>                         dist (1 x 1) volume to a replicate (1 x 2) using
>>>>>                         add-brick and hit this issue where add-brick
>>>>>                         failed. The cluster is operating with 3.7.6.
>>>>>                         Could you help identify in what scenario this
>>>>>                         code path can be hit? One straightforward issue
>>>>>                         I see here is the missing err_str in this path.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                 --
>>>>>
>>>>>                 ~ Atin (atinm)
>>>>>
>>>>>
>>>>>
>>>>>                 --
>>>>>
>>>>>                 ~ Atin (atinm)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>         --
>>>>>
>>>>>         ~ Atin (atinm)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> ~ Atin (atinm)
>>>>>
>>>>
>>>>
>>>>
>>
>>


-- 

~ Atin (atinm)