[Gluster-users] Fwd: Replica brick not working

Pranith Kumar Karampuri pkarampu at redhat.com
Thu Dec 8 17:47:28 UTC 2016


On Thu, Dec 8, 2016 at 10:22 PM, Ravishankar N <ravishankar at redhat.com>
wrote:

> On 12/08/2016 09:44 PM, Miloš Čučulović - MDPI wrote:
>
>> I was able to fix the sync by rsync-ing all the directories, and then the
>> heal started. The next problem :): as soon as there are files on the new
>> brick, the gluster mount starts serving reads from it as well, but the
>> new brick is not ready yet (the sync is not done), so clients see missing
>> files. I have temporarily removed the new brick and am running a manual
>> rsync; I will then add the brick again and hope that works.
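>>
>> In commands, this interim workaround is roughly the following (a sketch;
>> "storage" is the volume name from the status output below, and the new
>> brick is a placeholder since I won't name the host here):
>>
>>     # take the half-synced new brick back out of the volume
>>     gluster volume remove-brick storage replica 1 <new-host>:<new-brick> force
>>
>>     # copy the data over manually from the original brick
>>     rsync -a <orig-host>:<orig-brick>/ <new-brick>/
>>
>>     # once the copy is done, add the brick back
>>     gluster volume add-brick storage replica 2 <new-host>:<new-brick>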
>>
>> What mechanism manages this? I assume there is something built in to make
>> a replica brick available only once the data is completely synced.
>>
> This mechanism was introduced in 3.7.9 or 3.7.10 (
> http://review.gluster.org/#/c/13806/). Before that version, you needed to
> manually set some xattrs on the bricks so that healing could happen in
> parallel while the client would still serve reads from the original
> brick. I can't find the link to the doc which describes these steps for
> setting the xattrs. :-(
>

https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick
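
The gist of the manual steps in that doc, from memory, is roughly the
following (treat this as a sketch and follow the doc above for the exact
sequence; /mnt/r2 stands for a client mount of the volume and the dummy
names are arbitrary):

    # Create and delete a dummy directory through the mount, then set and
    # remove a dummy xattr on the mount root. This marks pending changelog
    # (trusted.afr.*) xattrs on the good brick, so self-heal knows the
    # replaced brick needs a full sync.
    mkdir /mnt/r2/name-of-nonexistent-dir
    rmdir /mnt/r2/name-of-nonexistent-dir
    setfattr -n trusted.non-existent-key -v abc /mnt/r2
    setfattr -x trusted.non-existent-key /mnt/r2

    # Verify on the good brick that the pending xattr for the replaced
    # brick is now non-zero.
    getfattr -d -m . -e hex /data/data-cluster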


> Calling it a day,
> Ravi
>
>
>> - Kindest regards,
>>
>> Milos Cuculovic
>> IT Manager
>>
>> ---
>> MDPI AG
>> Postfach, CH-4020 Basel, Switzerland
>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>> Tel. +41 61 683 77 35
>> Fax +41 61 302 89 18
>> Email: cuculovic at mdpi.com
>> Skype: milos.cuculovic.mdpi
>>
>> On 08.12.2016 16:17, Ravishankar N wrote:
>>
>>> On 12/08/2016 06:53 PM, Atin Mukherjee wrote:
>>>
>>>>
>>>>
>>>> On Thu, Dec 8, 2016 at 6:44 PM, Miloš Čučulović - MDPI
>>>> <cuculovic at mdpi.com> wrote:
>>>>
>>>>     Ah, damn! I found the issue. On the storage server, the storage2
>>>>     IP address was wrong; I had swapped two digits in the /etc/hosts
>>>>     file. Sorry for that :(
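>>>>     (It is worth double-checking with e.g. `getent hosts storage2` on
>>>>     each node that the names now resolve to the right addresses.)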
>>>>
>>>>     I was able to add the brick now and started the heal, but there is
>>>>     still no data transfer visible.
>>>>
>>> 1. Are the files getting created on the new brick, though?
>>> 2. Can you provide the output of `getfattr -d -m . -e hex
>>> /data/data-cluster` on both bricks?
>>> 3. Is it possible to attach gdb to the self-heal daemon on the original
>>> (old) brick and get a backtrace?
>>>     `gdb -p <pid of self-heal daemon on the original brick>`
>>>     `thread apply all bt`  --> share this output
>>>     quit gdb.
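>>>
>>> Concretely, something like this (a sketch; take the self-heal daemon
>>> pid from the `gluster volume status` output quoted below, and note
>>> that gdb needs glusterfs debug symbols installed to give readable
>>> frames):
>>>
>>>     # pending-changelog xattrs, to be run on both bricks
>>>     getfattr -d -m . -e hex /data/data-cluster
>>>
>>>     # is anything queued for heal?
>>>     gluster volume heal storage info
>>>
>>>     # backtrace of the self-heal daemon
>>>     gdb -p <shd-pid>
>>>     (gdb) thread apply all bt
>>>     (gdb) quit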
>>>
>>>
>>> -Ravi
>>>
>>>>
>>>> @Ravi/Pranith - can you help here?
>>>>
>>>>
>>>>
>>>>     By doing gluster volume status, I have:
>>>>
>>>>     Status of volume: storage
>>>>     Gluster process                       TCP Port  RDMA Port  Online  Pid
>>>>     -------------------------------------------------------------------------
>>>>     Brick storage2:/data/data-cluster     49152     0          Y       23101
>>>>     Brick storage:/data/data-cluster      49152     0          Y       30773
>>>>     Self-heal Daemon on localhost         N/A       N/A        Y       30050
>>>>     Self-heal Daemon on storage           N/A       N/A        Y       30792
>>>>
>>>>
>>>>     Any idea?
>>>>
>>>>     On storage I have:
>>>>     Number of Peers: 1
>>>>
>>>>     Hostname: 195.65.194.217
>>>>     Uuid: 7c988af2-9f76-4843-8e6f-d94866d57bb0
>>>>     State: Peer in Cluster (Connected)
>>>>
>>>>
>>>>     - Kindest regards,
>>>>
>>>>     Milos Cuculovic
>>>>     IT Manager
>>>>
>>>>     ---
>>>>     MDPI AG
>>>>     Postfach, CH-4020 Basel, Switzerland
>>>>     Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>>>     Tel. +41 61 683 77 35
>>>>     Fax +41 61 302 89 18
>>>>     Email: cuculovic at mdpi.com
>>>>     Skype: milos.cuculovic.mdpi
>>>>
>>>>     On 08.12.2016 13:55, Atin Mukherjee wrote:
>>>>
>>>>         Can you resend the attachment as a zip? I am unable to extract
>>>>         the content. We shouldn't have an empty info file. What does
>>>>         the gluster peer status output say?
>>>>
>>>>         On Thu, Dec 8, 2016 at 4:51 PM, Miloš Čučulović - MDPI
>>>>         <cuculovic at mdpi.com> wrote:
>>>>
>>>>             I hope you received my last email Atin, thank you!
>>>>
>>>>             - Kindest regards,
>>>>
>>>>             Milos Cuculovic
>>>>             IT Manager
>>>>
>>>>             ---
>>>>             MDPI AG
>>>>             Postfach, CH-4020 Basel, Switzerland
>>>>             Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>>>             Tel. +41 61 683 77 35
>>>>             Fax +41 61 302 89 18
>>>>             Email: cuculovic at mdpi.com
>>>>             Skype: milos.cuculovic.mdpi
>>>>
>>>>             On 08.12.2016 10:28, Atin Mukherjee wrote:
>>>>
>>>>
>>>>                 ---------- Forwarded message ----------
>>>>                 From: Atin Mukherjee <amukherj at redhat.com>
>>>>                 Date: Thu, Dec 8, 2016 at 11:56 AM
>>>>                 Subject: Re: [Gluster-users] Replica brick not working
>>>>                 To: Ravishankar N <ravishankar at redhat.com>
>>>>                 Cc: Miloš Čučulović - MDPI <cuculovic at mdpi.com>,
>>>>                 Pranith Kumar Karampuri <pkarampu at redhat.com>,
>>>>                 gluster-users <gluster-users at gluster.org>
>>>>
>>>>
>>>>                 On Thu, Dec 8, 2016 at 11:11 AM, Ravishankar N
>>>>                 <ravishankar at redhat.com> wrote:
>>>>
>>>>                     On 12/08/2016 10:43 AM, Atin Mukherjee wrote:
>>>>
>>>>                         From the log snippet:
>>>>
>>>>                         [2016-12-07 09:15:35.677645] I [MSGID: 106482]
>>>>                         [glusterd-brick-ops.c:442:__glusterd_handle_add_brick]
>>>>                         0-management: Received add brick req
>>>>                         [2016-12-07 09:15:35.677708] I [MSGID: 106062]
>>>>                         [glusterd-brick-ops.c:494:__glusterd_handle_add_brick]
>>>>                         0-management: replica-count is 2
>>>>                         [2016-12-07 09:15:35.677735] E [MSGID: 106291]
>>>>                         [glusterd-brick-ops.c:614:__glusterd_handle_add_brick]
>>>>                         0-management:
>>>>
>>>>                         The last log entry indicates that we hit this
>>>>                         code path in gd_addbr_validate_replica_count ():
>>>>
>>>>                             if (replica_count == volinfo->replica_count) {
>>>>                                     if (!(total_bricks % volinfo->dist_leaf_count)) {
>>>>                                             ret = 1;
>>>>                                             goto out;
>>>>                                     }
>>>>                             }
>>>>
>>>>
>>>>                     It seems unlikely that this snippet was hit,
>>>>                     because we print the E [MSGID: 106291] message
>>>>                     above only if ret == -1.
>>>>                     gd_addbr_validate_replica_count() returns -1
>>>>                     without populating err_str only when
>>>>                     volinfo->type doesn't match any of the known
>>>>                     volume types, so perhaps volinfo->type is
>>>>                     corrupted?
>>>>
>>>>
>>>>                 You are right, I missed that ret is set to 1 here in
>>>>                 the above snippet.
>>>>
>>>>                 @Milos - Can you please provide us the volume info
>>>>                 file from /var/lib/glusterd/vols/<volname>/ from all
>>>>                 three nodes so we can continue the analysis?
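>>>>
>>>>                 (That is the plain-text file named "info" in that
>>>>                 directory; with the volume name from the status
>>>>                 output above, /var/lib/glusterd/vols/storage/info.)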
>>>>
>>>>
>>>>
>>>>                     -Ravi
>>>>
>>>>                         @Pranith, Ravi - Milos was trying to convert
>>>>                         a dist (1 X 1) volume to a replicate (1 X 2)
>>>>                         volume using add-brick and hit this issue
>>>>                         where add-brick failed. The cluster is
>>>>                         operating with 3.7.6. Could you help figure
>>>>                         out in what scenario this code path can be
>>>>                         hit? One straightforward issue I see here is
>>>>                         the missing err_str in this path.
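>>>>
>>>>                         (For reference, the operation that hits this
>>>>                         path is the usual dist-to-replicate
>>>>                         conversion, i.e. something like
>>>>
>>>>                             gluster volume add-brick storage replica 2 \
>>>>                                 <new-host>:<new-brick-path>
>>>>
>>>>                         with a placeholder for the new brick.)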
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>                 --
>>>>
>>>>                 ~ Atin (atinm)
>>>>
>>>>
>>>
>>>
>>>
>


-- 
Pranith