[Gluster-users] issues recovering machine in gluster

Atin Mukherjee amukherj at redhat.com
Wed Jun 15 05:09:26 UTC 2016


So the issue looks like an incorrect UUID got populated in the peer
configuration, which led to this inconsistency, and here is the log entry
that proves it. I have a feeling that the replacement steps were not
performed correctly, or that you missed copying the old UUID of the
failed node over to the new one.

[2016-06-13 18:25:09.738363] E [MSGID: 106170]
[glusterd-handshake.c:1051:gd_validate_mgmt_hndsk_req] 0-management:
Request from peer 10.28.9.12:65299 has an entry in peerinfo, but uuid
does not match
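
For reference, the UUID carry-over on the re-provisioned node usually
boils down to something like the sketch below (this assumes glusterd is
managed by systemd; <old-uuid-of-failed-node> is only a placeholder for
the original UUID of the failed node, as recorded on any surviving peer):

# on the freshly installed node, with glusterd stopped
systemctl stop glusterd

# glusterd.info carries a newly generated UUID at this point
grep UUID /var/lib/glusterd/glusterd.info

# overwrite it with the old UUID of the failed node (placeholder value)
sed -i 's/^UUID=.*/UUID=<old-uuid-of-failed-node>/' /var/lib/glusterd/glusterd.info

systemctl start glusterd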

To recover from this situation, you'd need to stop all the running
glusterd instances, go into the /var/lib/glusterd/peers directory on
all the nodes, and manually correct the UUID file names and their
contents where required.
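
A rough sketch of that clean-up, to be run on each node (again assuming
a systemd-managed glusterd; the file names in the mv/rm lines below are
only placeholders):

systemctl stop glusterd        # stop glusterd on every node first

cd /var/lib/glusterd/peers
ls -l          # one file per peer, named after that peer's UUID
cat ./*        # each file must contain uuid=<that same UUID>

# rename, edit or delete entries so that only the correct peer UUIDs
# remain; keep each file name and its uuid= line in sync (placeholders)
mv <wrong-uuid> <correct-uuid>
rm <stale-uuid>

systemctl start glusterd       # only after every node has been fixed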

Just to give you an idea of how the peer configurations are structured
and stored, here is an example:

On a 3 node cluster (say N1, N2, N3)
N1's UUID - dc07f77f-09f3-46f4-8d92-f2d7f6e627af

(obtained via 'cat /var/lib/glusterd/glusterd.info | grep UUID' on N1)

N2's UUID -  02d157bd-a738-4914-991e-60953409f1b1
N3's UUID -  932186a6-4b29-4216-8da1-2fe193c928c1

N1's peer configuration
=======================
root@ebbc696b4dc4:/home/glusterfs# cd /var/lib/glusterd/peers/
root@ebbc696b4dc4:/var/lib/glusterd/peers# ls -lrt
total 8
-rw------- 1 root root 71 Jun 15 05:01 02d157bd-a738-4914-991e-60953409f1b1   -----> N2's UUID
-rw------- 1 root root 71 Jun 15 05:02 932186a6-4b29-4216-8da1-2fe193c928c1   -----> N3's UUID


Content of the other peers (N2, N3) on N1's disk
================================================
root@ebbc696b4dc4:/var/lib/glusterd/peers# cat 02d157bd-a738-4914-991e-60953409f1b1
uuid=02d157bd-a738-4914-991e-60953409f1b1
state=3
hostname1=172.17.0.3

root@ebbc696b4dc4:/var/lib/glusterd/peers# cat 932186a6-4b29-4216-8da1-2fe193c928c1
uuid=932186a6-4b29-4216-8da1-2fe193c928c1
state=3
hostname1=172.17.0.4

Similarly, you will find the details of N1 and N2 on N3, and of N1 and N3 on N2.
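
If it helps, this can be compared across all the nodes in one go with a
small loop (a sketch only; it assumes passwordless SSH as root and uses
n1, n2, n3 as placeholder hostnames):

for h in n1 n2 n3; do
    echo "== $h =="
    # own UUID, followed by this node's view of its peers
    ssh root@$h 'grep UUID /var/lib/glusterd/glusterd.info; grep -H uuid= /var/lib/glusterd/peers/*'
done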

You'd need to validate this theory on all the nodes, correct the
contents, and remove any unwanted UUID entries. After that, restarting
all the glusterd instances should solve the problem.
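
Once the nodes are back up, the standard CLI should confirm things are
healthy again, e.g.:

systemctl start glusterd     # on every node

gluster peer status          # each peer should show 'Peer in Cluster (Connected)'
gluster volume status        # should now return instead of timing out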

HTH,
Atin


On 06/13/2016 08:16 PM, Atin Mukherjee wrote:
> Please send us the glusterd log file along with cmd_history.log from all
> six nodes. The logs you mentioned in the thread are not relevant for
> debugging this issue. Which gluster version are you using?
> 
> ~Atin
> 
> On 06/13/2016 06:49 PM, Arif Ali wrote:
>> Hi all,
>>
>> Hopefully, someone can help
>>
>> We have a 6-node gluster setup; we got the gluster system up and
>> running successfully and had no issues with the initial install.
>>
>> For other reasons, we had to re-provision the nodes, and therefore had
>> to go through some recovery steps to get the node back into the
>> system. The documentation I used was [1].
>>
>> The key thing is that everything in the documentation worked without a
>> problem. The gluster replication works, and we can easily monitor that
>> through the heal commands.
>>
>> Unfortunately, we are not able to run "gluster volume status", which
>> hangs for a moment, and in the end we get "Error : Request timed out ".
>> Most of the log files are clean, except for
>> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log. See below for some of
>> the contents
>>
>> [2016-06-13 12:57:01.054458] W [socket.c:870:__socket_keepalive]
>> 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 45, Invalid
>> argument
>> [2016-06-13 12:57:01.054492] E [socket.c:2966:socket_connect]
>> 0-management: Failed to set keep-alive: Invalid argument
>> [2016-06-13 12:57:01.059023] W [socket.c:870:__socket_keepalive]
>> 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 45, Invalid
>> argument
>> [2016-06-13 12:57:01.059042] E [socket.c:2966:socket_connect]
>> 0-management: Failed to set keep-alive: Invalid argument
>>
>> Any assistance on this would be much appreciated.
>>
>> [1] https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/sect-Replacing_Hosts.html#Replacing_a_Host_Machine_with_the_Same_Hostname
>>
>> --
>> Arif Ali
>>
>> IRC: arif-ali at freenode
>> LinkedIn: http://uk.linkedin.com/in/arifali
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>

