[Gluster-users] [RESOLVED] issues recovering machine in gluster

Wed Jun 15 10:27:10 UTC 2016

On 15 June 2016 at 08:55, Arif Ali <mail at arif-ali.co.uk> wrote:

>
> On 15 June 2016 at 08:09, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>>
>>
>> On 06/15/2016 12:14 PM, Arif Ali wrote:
>> >
>> > On 15 June 2016 at 06:48, Atin Mukherjee <amukherj at redhat.com> wrote:
>> >
>> >
>> >
>> >     On 06/15/2016 11:06 AM, Gandalf Corvotempesta wrote:
>> >     > On 15 Jun 2016 07:09, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>> >     >> To get rid of this situation, you'd need to stop all the running
>> >     >> glusterd instances, go into the /var/lib/glusterd/peers folder on
>> >     >> all the nodes, and manually correct the UUID file names and their
>> >     >> content if required.
>> >     >
>> >     > If I understood properly, the only way to fix this is by bringing
>> >     > the whole cluster down? "you'd need to stop all the running
>> >     > glusterd instances"
>> >     >
>> >     > I hope you are referring to all instances on the failed node...
>> >
>> >     No. Since the configuration is synced across all the nodes, any
>> >     incorrect data gets replicated throughout. So in this case, to be
>> >     on the safe side and to validate correctness, the glusterd
>> >     instances on *all* the nodes should be brought down. That said,
>> >     this doesn't impact I/O, as the management path is separate from
>> >     the I/O path.
>> >
>> >
>> > As a sanity check, one of the things I did last night was to reboot the
>> > whole gluster system, since I had downtime arranged. I thought this was
>> > something that would be asked, as I had seen similar requests on the
>> > mailing list previously.
>> >
>> > Unfortunately, though, it didn't fix the problem.
>>
>> A reboot alone is not going to solve the problem. You'd need to correct
>> the configuration as I explained earlier in this thread. If that doesn't
>> help, please send me the contents of /var/lib/glusterd/peers/ and the
>> /var/lib/glusterd/glusterd.info file from all the nodes where glusterd
>> instances are running. I'll take a look, correct them, and send them
>> back to you.
>>
>
> Thanks Atin,
>
> Apologies, I missed your mail as I was travelling.
>
> I have checked the files you mentioned, and they look correct to me,
> but I have attached them as a sanity check. Maybe you can spot
> something that I have missed.
>

I have been discussing the issue with Atin on IRC, and we have resolved the
problem. Thanks, Atin; your help was much appreciated.

For the benefit of this list: the node had a UUID file in
/var/lib/glusterd/peers matching its own UUID, i.e. it listed itself as a
peer. This entry was not required. Once I removed the file named after the
UUID of the node where glusterd was running, the node was able to function
correctly.
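For anyone hitting the same problem, here is a minimal, read-only sketch in Python (not part of Gluster itself) that looks for the condition described above: a file in /var/lib/glusterd/peers named after the node's own UUID. It assumes glusterd.info records the local UUID on a `UUID=<uuid>` line and that peer files are named by peer UUID; the function names are hypothetical. It only reports what it finds; any actual removal should follow the procedure Atin described, with glusterd stopped on all nodes.

```python
"""Hypothetical read-only check for a self-referencing glusterd peer entry.

Assumptions (not a Gluster API): the local UUID is on a "UUID=<uuid>" line
in <glusterd_dir>/glusterd.info, and <glusterd_dir>/peers/ holds one file
per *other* peer, named by that peer's UUID. Nothing is modified.
"""
from pathlib import Path


def read_local_uuid(info_path):
    """Return the UUID recorded in glusterd.info, or None if absent."""
    for line in Path(info_path).read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "UUID":
            return value.strip()
    return None


def find_self_peer_entry(glusterd_dir="/var/lib/glusterd"):
    """Return the peers/ file named after the local UUID, or None.

    Such a file is the misconfiguration found in this thread: a node
    should not list itself as a peer.
    """
    base = Path(glusterd_dir)
    local_uuid = read_local_uuid(base / "glusterd.info")
    if local_uuid is None:
        raise ValueError("no UUID= line found in glusterd.info")
    candidate = base / "peers" / local_uuid
    return candidate if candidate.exists() else None


if __name__ == "__main__":
    entry = find_self_peer_entry()
    if entry is None:
        print("OK: no peers/ entry matches this node's own UUID")
    else:
        print(f"WARNING: {entry} matches this node's own UUID")
```

Running it on each node (as a user able to read /var/lib/glusterd) gives a quick yes/no per node before anything is edited by hand.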