[Gluster-users] Fwd: Replica brick not working
Pranith Kumar Karampuri
pkarampu at redhat.com
Thu Dec 8 17:58:10 UTC 2016
On Thu, Dec 8, 2016 at 11:25 PM, Pranith Kumar Karampuri <
pkarampu at redhat.com> wrote:
>
>
> On Thu, Dec 8, 2016 at 11:17 PM, Pranith Kumar Karampuri <
> pkarampu at redhat.com> wrote:
>
>>
>>
>> On Thu, Dec 8, 2016 at 10:22 PM, Ravishankar N <ravishankar at redhat.com>
>> wrote:
>>
>>> On 12/08/2016 09:44 PM, Miloš Čučulović - MDPI wrote:
>>>
>>>> I was able to fix the sync by rsync-ing all the directories, and then the
>>>> heal started. The next problem :): as soon as there are files on the new
>>>> brick, the gluster mount starts serving from this brick too, even though
>>>> it is not ready yet (the sync is not done), so clients end up seeing
>>>> missing files. I temporarily removed the new brick and am now running a
>>>> manual rsync; I will add the brick again afterwards and hope this works.
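>>>>
>>>> (For reference, the temporary removal and later re-add presumably map to
>>>> commands of this form, with the volume name, host and brick path as
>>>> placeholders:
>>>>
>>>> gluster volume remove-brick <volname> replica 1 <new-host>:<brick-path> force
>>>> gluster volume add-brick <volname> replica 2 <new-host>:<brick-path>
>>>>
>>>> i.e. dropping back to a single-brick volume first, then re-adding the
>>>> brick as a replica once the manual rsync is done.)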
>>>>
>>>> What mechanism manages this? I guess there is something built in to make
>>>> a replica brick available only once its data is completely synced.
>>>>
>>> This mechanism was introduced in 3.7.9 or 3.7.10 (
>>> http://review.gluster.org/#/c/13806/). Before that version, you needed to
>>> manually set some xattrs on the bricks so that healing could happen in
>>> parallel while the client would still be served reads from the original
>>> brick. I can't find the link to the doc which describes the steps for
>>> setting those xattrs. :-(
>>>
>>
>> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick
>>
>
> Oh, is this about adding bricks? Just do the following:
> 1) Bring the new brick down by killing its brick process.
> 2) On the root of the mount directory (let's call it /mnt), do:
>
> mkdir /mnt/<name-of-nonexistent-dir>
> rmdir /mnt/<name-of-nonexistent-dir>
> setfattr -n trusted.non-existent-key -v abc /mnt
> setfattr -x trusted.non-existent-key /mnt
>
> 3) Start the volume using: "gluster volume start <volname> force"
>
> This will trigger the heal, which will make sure everything is synced to the
> new brick, and the application will only ever see the correct data.
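>
> To watch the heal progress while it runs, something like the following
> should work (the volume name is a placeholder):
>
> gluster volume heal <volname> info
> gluster volume heal <volname> statistics heal-count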
>
> Since you did an explicit rsync, there is no guarantee that things will work
> as expected. We will be adding the steps above to the documentation.
>
Please note that you need to follow these steps exactly. If you do the
mkdir/rmdir/setfattr steps after bringing down the good brick instead of the
new one, a reverse heal will happen and the data will be removed.
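
For clarity, "killing it" in step 1 just means stopping the brick process of
the *new* brick; a rough sketch (the pid and volume name are placeholders):

gluster volume status <volname>   # note the Pid column of the new brick only
kill <pid-of-new-brick>           # leave the original (good) brick running

Then "gluster volume start <volname> force" in step 3 brings it back up.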
>
>>
>>
>>> Calling it a day,
>>> Ravi
>>>
>>>
>>>> - Kindest regards,
>>>>
>>>> Milos Cuculovic
>>>> IT Manager
>>>>
>>>> ---
>>>> MDPI AG
>>>> Postfach, CH-4020 Basel, Switzerland
>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>>> Tel. +41 61 683 77 35
>>>> Fax +41 61 302 89 18
>>>> Email: cuculovic at mdpi.com
>>>> Skype: milos.cuculovic.mdpi
>>>>
>>>> On 08.12.2016 16:17, Ravishankar N wrote:
>>>>
>>>>> On 12/08/2016 06:53 PM, Atin Mukherjee wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 8, 2016 at 6:44 PM, Miloš Čučulović - MDPI
>>>>>> <cuculovic at mdpi.com> wrote:
>>>>>>
>>>>>> Ah, damn! I found the issue. On the storage server, the storage2
>>>>>> IP address was wrong; I had swapped two digits in the /etc/hosts
>>>>>> file, sorry for that :(
>>>>>>
>>>>>> I was able to add the brick now and started the heal, but there is
>>>>>> still no data transfer visible.
>>>>>>
>>>>> 1. Are the files getting created on the new brick though?
>>>>> 2. Can you provide the output of `getfattr -d -m . -e hex
>>>>> /data/data-cluster` on both bricks?
>>>>> 3. Is it possible to attach gdb to the self-heal daemon on the original
>>>>> (old) brick and get a backtrace?
>>>>> `gdb -p <pid of self-heal daemon on the original brick>`
>>>>> thread apply all bt  --> share this output
>>>>> quit gdb.
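>>>>>
>>>>> If it is easier, the same backtrace can be captured non-interactively
>>>>> with something along these lines (the pid placeholder is the self-heal
>>>>> daemon's pid from gluster volume status):
>>>>>
>>>>> gdb -p <pid-of-self-heal-daemon> -batch -ex "thread apply all bt" > shd-backtrace.txt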
>>>>>
>>>>>
>>>>> -Ravi
>>>>>
>>>>>>
>>>>>> @Ravi/Pranith - can you help here?
>>>>>>
>>>>>>
>>>>>>
>>>>>> By doing gluster volume status, I have
>>>>>>
>>>>>> Status of volume: storage
>>>>>> Gluster process                       TCP Port  RDMA Port  Online  Pid
>>>>>> ------------------------------------------------------------------------
>>>>>> Brick storage2:/data/data-cluster     49152     0          Y       23101
>>>>>> Brick storage:/data/data-cluster      49152     0          Y       30773
>>>>>> Self-heal Daemon on localhost         N/A       N/A        Y       30050
>>>>>> Self-heal Daemon on storage           N/A       N/A        Y       30792
>>>>>>
>>>>>>
>>>>>> Any idea?
>>>>>>
>>>>>> On storage I have:
>>>>>> Number of Peers: 1
>>>>>>
>>>>>> Hostname: 195.65.194.217
>>>>>> Uuid: 7c988af2-9f76-4843-8e6f-d94866d57bb0
>>>>>> State: Peer in Cluster (Connected)
>>>>>>
>>>>>>
>>>>>> - Kindest regards,
>>>>>>
>>>>>> Milos Cuculovic
>>>>>> IT Manager
>>>>>>
>>>>>> ---
>>>>>> MDPI AG
>>>>>> Postfach, CH-4020 Basel, Switzerland
>>>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>>>>> Tel. +41 61 683 77 35
>>>>>> Fax +41 61 302 89 18
>>>>>> Email: cuculovic at mdpi.com
>>>>>> Skype: milos.cuculovic.mdpi
>>>>>>
>>>>>> On 08.12.2016 13:55, Atin Mukherjee wrote:
>>>>>>
>>>>>> Can you resend the attachment as a zip? I am unable to extract the
>>>>>> content. We shouldn't have a zero-byte info file. What does the
>>>>>> gluster peer status output say?
>>>>>>
>>>>>> On Thu, Dec 8, 2016 at 4:51 PM, Miloš Čučulović - MDPI
>>>>>> <cuculovic at mdpi.com> wrote:
>>>>>>
>>>>>> I hope you received my last email Atin, thank you!
>>>>>>
>>>>>> - Kindest regards,
>>>>>>
>>>>>> Milos Cuculovic
>>>>>> IT Manager
>>>>>>
>>>>>> ---
>>>>>> MDPI AG
>>>>>> Postfach, CH-4020 Basel, Switzerland
>>>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>>>>> Tel. +41 61 683 77 35
>>>>>> Fax +41 61 302 89 18
>>>>>> Email: cuculovic at mdpi.com
>>>>>> Skype: milos.cuculovic.mdpi
>>>>>>
>>>>>> On 08.12.2016 10:28, Atin Mukherjee wrote:
>>>>>>
>>>>>>
>>>>>> ---------- Forwarded message ----------
>>>>>> From: *Atin Mukherjee* <amukherj at redhat.com>
>>>>>> Date: Thu, Dec 8, 2016 at 11:56 AM
>>>>>> Subject: Re: [Gluster-users] Replica brick not working
>>>>>> To: Ravishankar N <ravishankar at redhat.com>
>>>>>> Cc: Miloš Čučulović - MDPI <cuculovic at mdpi.com>, Pranith Kumar Karampuri
>>>>>> <pkarampu at redhat.com>, gluster-users <gluster-users at gluster.org>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 8, 2016 at 11:11 AM, Ravishankar N
>>>>>> <ravishankar at redhat.com> wrote:
>>>>>>
>>>>>> On 12/08/2016 10:43 AM, Atin Mukherjee wrote:
>>>>>>
>>>>>> >From the log snippet:
>>>>>>
>>>>>> [2016-12-07 09:15:35.677645] I [MSGID: 106482]
>>>>>>
>>>>>> [glusterd-brick-ops.c:442:__glusterd_handle_add_brick]
>>>>>> 0-management: Received add brick req
>>>>>> [2016-12-07 09:15:35.677708] I [MSGID: 106062]
>>>>>>
>>>>>> [glusterd-brick-ops.c:494:__glusterd_handle_add_brick]
>>>>>> 0-management: replica-count is 2
>>>>>> [2016-12-07 09:15:35.677735] E [MSGID: 106291]
>>>>>>
>>>>>> [glusterd-brick-ops.c:614:__glusterd_handle_add_brick]
>>>>>> 0-management:
>>>>>>
>>>>>> The last log entry indicates that we hit the code path in
>>>>>> gd_addbr_validate_replica_count ():
>>>>>>
>>>>>> if (replica_count == volinfo->replica_count) {
>>>>>>     if (!(total_bricks % volinfo->dist_leaf_count)) {
>>>>>>         ret = 1;
>>>>>>         goto out;
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> It seems unlikely that this snippet was hit, because we print the E
>>>>>> [MSGID: 106291] message above only if ret == -1.
>>>>>> gd_addbr_validate_replica_count() returns -1 without populating
>>>>>> err_str only when volinfo->type doesn't match any of the known volume
>>>>>> types, so perhaps volinfo->type is corrupted?
>>>>>>
>>>>>>
>>>>>> You are right, I missed that ret is set to 1 here in
>>>>>> the above
>>>>>> snippet.
>>>>>>
>>>>>> @Milos - Can you please provide us the volume info file from
>>>>>> /var/lib/glusterd/vols/<volname>/ from all three nodes so we can
>>>>>> continue the analysis?
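>>>>>>
>>>>>> One way to collect those files, run on each node (the volume name is a
>>>>>> placeholder):
>>>>>>
>>>>>> tar czf glusterd-vols-$(hostname).tar.gz /var/lib/glusterd/vols/<volname>/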
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Ravi
>>>>>>
>>>>>> @Pranith, Ravi - Milos was trying to convert a dist (1 x 1) volume to
>>>>>> a replicate (1 x 2) volume using add-brick and hit this issue where
>>>>>> add-brick failed. The cluster is running 3.7.6. Could you help on
>>>>>> which scenario this code path can be hit in? One straightforward
>>>>>> issue I see here is the missing err_str in this path.
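>>>>>>
>>>>>> For reference, such a conversion is normally attempted with a command
>>>>>> of this form (volume name, host and brick path are placeholders):
>>>>>>
>>>>>> gluster volume add-brick <volname> replica 2 <new-host>:<brick-path>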
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> ~ Atin (atinm)
>>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>
>>
>> --
>> Pranith
>>
>
>
>
> --
> Pranith
>
--
Pranith