[Gluster-users] Fwd: Replica brick not working
Atin Mukherjee
amukherj at redhat.com
Wed Dec 14 04:13:23 UTC 2016
Milos,
I just managed to take a look into a similar issue, and my analysis is at
[1]. I remember you mentioning some incorrect /etc/hosts entries that led to
this same problem in an earlier case; would you mind rechecking that?
[1] http://www.gluster.org/pipermail/gluster-users/2016-December/029443.html
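In case it helps, a quick way to double-check (hostnames below are the ones
from your setup, adjust as needed) would be to run on both nodes:

* getent hosts storage storage2   (verify the /etc/hosts resolution)
* gluster peer status             (the other node should show 'Peer in Cluster (Connected)')
* gluster pool list               (UUID, hostname and state of all known peers)

If the two nodes resolve each other to different addresses, glusterd keeps
the peer out of the 'Peer in Cluster' state and add-brick will keep failing
the way you described.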
On Wed, Dec 14, 2016 at 2:57 AM, Miloš Čučulović - MDPI <cuculovic at mdpi.com>
wrote:
> Hi All,
>
> Moving forward with my issue, sorry for the late reply!
>
> I had some issues with the storage2 server (original volume), then decided
> to use 3.9.0, so I now have the latest version.
>
> For that, I manually synced all the files to the storage server. I
> installed gluster 3.9.0 there, started it, created a new volume called
> storage, and everything seems to work OK.
>
> Now I need to create my replicated volume (add a new brick on the storage2
> server). Almost all the files are already there. So, on the storage server
> I was running:
>
> * sudo gluster peer probe storage2
> * sudo gluster volume add-brick storage replica 2
> storage2:/data/data-cluster force
>
> But there I am receiving "volume add-brick: failed: Host storage2 is not
> in 'Peer in Cluster' state"
>
> Any idea?
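>
> For reference, this is roughly the full sequence I expect to work once the
> peer state is fixed (the two heal commands at the end are only my assumption
> on how to kick off and monitor the sync towards the new brick):
>
> * sudo gluster peer probe storage2
> * sudo gluster peer status   (wait until storage2 shows 'Peer in Cluster (Connected)')
> * sudo gluster volume add-brick storage replica 2 storage2:/data/data-cluster force
> * sudo gluster volume heal storage full
> * sudo gluster volume heal storage info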
>
> - Kindest regards,
>
> Milos Cuculovic
> IT Manager
>
> ---
> MDPI AG
> Postfach, CH-4020 Basel, Switzerland
> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
> Tel. +41 61 683 77 35
> Fax +41 61 302 89 18
> Email: cuculovic at mdpi.com
> Skype: milos.cuculovic.mdpi
>
> On 08.12.2016 17:52, Ravishankar N wrote:
>
>> On 12/08/2016 09:44 PM, Miloš Čučulović - MDPI wrote:
>>
>>> I was able to fix the sync by rsync-ing all the directories, and then the
>>> heal started. The next problem :) is that as soon as there are files on the
>>> new brick, the gluster mount starts serving that brick to clients as well,
>>> but the new brick is not ready yet since the sync is not finished, so
>>> clients see missing files. I temporarily removed the new brick; now I am
>>> running a manual rsync and will add the brick again, hoping this will work.
>>>
>>> What mechanism manages this? I guess there is something built in to make a
>>> replica brick available only once the data is completely synced.
>>>
>> This mechanism was introduced in 3.7.9 or 3.7.10
>> (http://review.gluster.org/#/c/13806/). Before that version, you needed to
>> manually set some xattrs on the bricks so that healing could happen in
>> parallel while the client would still serve reads from the original brick.
>> I can't find the link to the doc which describes these steps for setting
>> the xattrs. :-(
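>>
>> The rough idea, if I remember correctly (so please double-check before
>> running it on real data), was to mark pending heals on the original brick
>> for the newly added replica, something along these lines:
>>
>>   # on the original (source) brick of the volume named 'storage', where the
>>   # new brick is AFR client index 1; the value marks a pending entry heal
>>   # (xattr name and value here are from memory, not from the doc)
>>   setfattr -n trusted.afr.storage-client-1 -v 0x000000000000000000000001 /data/data-cluster
>>
>> so that the self-heal daemon copies everything over to the new brick while
>> clients keep being served from the original one.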
>>
>> Calling it a day,
>> Ravi
>>
>>>
>>> - Kindest regards,
>>>
>>> Milos Cuculovic
>>> IT Manager
>>>
>>> ---
>>> MDPI AG
>>> Postfach, CH-4020 Basel, Switzerland
>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>> Tel. +41 61 683 77 35
>>> Fax +41 61 302 89 18
>>> Email: cuculovic at mdpi.com
>>> Skype: milos.cuculovic.mdpi
>>>
>>> On 08.12.2016 16:17, Ravishankar N wrote:
>>>
>>>> On 12/08/2016 06:53 PM, Atin Mukherjee wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Dec 8, 2016 at 6:44 PM, Miloš Čučulović - MDPI
>>>>> <cuculovic at mdpi.com> wrote:
>>>>>
>>>>> Ah, damn! I found the issue. On the storage server, the storage2
>>>>> IP address was wrong; I had transposed two digits in the /etc/hosts
>>>>> file, sorry for that :(
>>>>>
>>>>> I was able to add the brick now and started the heal, but there is
>>>>> still no data transfer visible.
>>>>>
>>>> 1. Are the files getting created on the new brick though?
>>>> 2. Can you provide the output of `getfattr -d -m . -e hex
>>>> /data/data-cluster` on both bricks?
>>>> 3. Is it possible to attach gdb to the self-heal daemon on the original
>>>> (old) brick and get a backtrace?
>>>>     `gdb -p <pid of self-heal daemon on the original brick>`
>>>>     thread apply all bt   --> share this output
>>>>     quit gdb.
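>>>>
>>>> If it's easier, a non-interactive variant along these lines should also
>>>> capture it (assuming the self-heal daemon process has "glustershd" in its
>>>> command line):
>>>>
>>>>     gdb -p $(pgrep -f glustershd) -batch -ex "thread apply all bt" > shd-backtrace.txt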
>>>>
>>>>
>>>> -Ravi
>>>>
>>>>>
>>>>> @Ravi/Pranith - can you help here?
>>>>>
>>>>>
>>>>>
>>>>> By doing gluster volume status, I have
>>>>>
>>>>> Status of volume: storage
>>>>> Gluster process                        TCP Port  RDMA Port  Online  Pid
>>>>> ------------------------------------------------------------------------
>>>>> Brick storage2:/data/data-cluster      49152     0          Y       23101
>>>>> Brick storage:/data/data-cluster       49152     0          Y       30773
>>>>> Self-heal Daemon on localhost          N/A       N/A        Y       30050
>>>>> Self-heal Daemon on storage            N/A       N/A        Y       30792
>>>>>
>>>>>
>>>>> Any idea?
>>>>>
>>>>> On storage I have:
>>>>> Number of Peers: 1
>>>>>
>>>>> Hostname: 195.65.194.217
>>>>> Uuid: 7c988af2-9f76-4843-8e6f-d94866d57bb0
>>>>> State: Peer in Cluster (Connected)
>>>>>
>>>>>
>>>>> - Kindest regards,
>>>>>
>>>>> Milos Cuculovic
>>>>> IT Manager
>>>>>
>>>>> ---
>>>>> MDPI AG
>>>>> Postfach, CH-4020 Basel, Switzerland
>>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>>>> Tel. +41 61 683 77 35
>>>>> Fax +41 61 302 89 18
>>>>> Email: cuculovic at mdpi.com
>>>>> Skype: milos.cuculovic.mdpi
>>>>>
>>>>> On 08.12.2016 13:55, Atin Mukherjee wrote:
>>>>>
>>>>> Can you resend the attachment as a zip? I am unable to extract the
>>>>> content. We shouldn't have a zero-byte info file. What does the
>>>>> gluster peer status output say?
>>>>>
>>>>> On Thu, Dec 8, 2016 at 4:51 PM, Miloš Čučulović - MDPI
>>>>> <cuculovic at mdpi.com> wrote:
>>>>>
>>>>> I hope you received my last email Atin, thank you!
>>>>>
>>>>> - Kindest regards,
>>>>>
>>>>> Milos Cuculovic
>>>>> IT Manager
>>>>>
>>>>> ---
>>>>> MDPI AG
>>>>> Postfach, CH-4020 Basel, Switzerland
>>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>>>> Tel. +41 61 683 77 35
>>>>> Fax +41 61 302 89 18
>>>>> Email: cuculovic at mdpi.com
>>>>> Skype: milos.cuculovic.mdpi
>>>>>
>>>>> On 08.12.2016 10:28, Atin Mukherjee wrote:
>>>>>
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Atin Mukherjee <amukherj at redhat.com>
>>>>> Date: Thu, Dec 8, 2016 at 11:56 AM
>>>>> Subject: Re: [Gluster-users] Replica brick not working
>>>>> To: Ravishankar N <ravishankar at redhat.com>
>>>>> Cc: Miloš Čučulović - MDPI <cuculovic at mdpi.com>, Pranith Kumar
>>>>> Karampuri <pkarampu at redhat.com>, gluster-users
>>>>> <gluster-users at gluster.org>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Dec 8, 2016 at 11:11 AM, Ravishankar N
>>>>> <ravishankar at redhat.com> wrote:
>>>>>
>>>>> On 12/08/2016 10:43 AM, Atin Mukherjee wrote:
>>>>>
>>>>> From the log snippet:
>>>>>
>>>>> [2016-12-07 09:15:35.677645] I [MSGID: 106482]
>>>>> [glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management:
>>>>> Received add brick req
>>>>> [2016-12-07 09:15:35.677708] I [MSGID: 106062]
>>>>> [glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management:
>>>>> replica-count is 2
>>>>> [2016-12-07 09:15:35.677735] E [MSGID: 106291]
>>>>> [glusterd-brick-ops.c:614:__glusterd_handle_add_brick] 0-management:
>>>>>
>>>>> The last log entry indicates that we hit the code path in
>>>>> gd_addbr_validate_replica_count ():
>>>>>
>>>>>     if (replica_count == volinfo->replica_count) {
>>>>>             if (!(total_bricks % volinfo->dist_leaf_count)) {
>>>>>                     ret = 1;
>>>>>                     goto out;
>>>>>             }
>>>>>     }
>>>>>
>>>>>
>>>>> It seems unlikely that this snippet was hit, because we print the
>>>>> E [MSGID: 106291] message above only if ret == -1.
>>>>> gd_addbr_validate_replica_count() returns -1 without populating
>>>>> err_str only when volinfo->type doesn't match any of the known
>>>>> volume types, so perhaps volinfo->type is corrupted?
>>>>>
>>>>>
>>>>> You are right, I missed that ret is set to 1 here in the above snippet.
>>>>>
>>>>> @Milos - Can you please provide us the volume info file from
>>>>> /var/lib/glusterd/vols/<volname>/ from all three nodes so we can
>>>>> continue the analysis?
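>>>>>
>>>>> For example, something like this on each node (assuming the volume is
>>>>> still named 'storage' and glusterd uses its default /var/lib/glusterd
>>>>> working directory):
>>>>>
>>>>>     cat /var/lib/glusterd/vols/storage/info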
>>>>>
>>>>>
>>>>>
>>>>> -Ravi
>>>>>
>>>>> @Pranith, Ravi - Milos was trying to convert a distribute (1 x 1)
>>>>> volume to a replicate (1 x 2) volume using add-brick and hit this
>>>>> issue where add-brick failed. The cluster is operating with 3.7.6.
>>>>> Could you help figure out in what scenario this code path can be hit?
>>>>> One straightforward issue I see here is the missing err_str in this path.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> ~ Atin (atinm)
>>>>>
>>>>
>>>>
>>>>
>>
>>
--
~ Atin (atinm)