[Gluster-users] Error "Failed to find host nfs1.lightspeed.ca" when adding a new node to the cluster.

Ernie Dunbar maillist at lightspeed.ca
Wed Apr 6 22:34:45 UTC 2016


On 2016-04-06 11:42, Ernie Dunbar wrote:
> I've already successfully created a Gluster cluster, but when I try to
> add a new node, gluster on the new node claims it can't find the
> hostname of the first node in the cluster.
> 
> I've added the hostname nfs1.lightspeed.ca to /etc/hosts like this:
> 
> root@nfs3:/home/ernied# cat /etc/hosts
> 127.0.0.1	localhost
> 192.168.1.31    nfs1.lightspeed.ca      nfs1
> 192.168.1.32    nfs2.lightspeed.ca      nfs2
> 127.0.1.1	nfs3.lightspeed.ca	nfs3
> 
> 
> # The following lines are desirable for IPv6 capable hosts
> ::1     localhost ip6-localhost ip6-loopback
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> 
> I can ping the hostname:
> 
> root@nfs3:/home/ernied# ping -c 3 nfs1
> PING nfs1.lightspeed.ca (192.168.1.31) 56(84) bytes of data.
> 64 bytes from nfs1.lightspeed.ca (192.168.1.31): icmp_seq=1 ttl=64 time=0.148 ms
> 64 bytes from nfs1.lightspeed.ca (192.168.1.31): icmp_seq=2 ttl=64 time=0.126 ms
> 64 bytes from nfs1.lightspeed.ca (192.168.1.31): icmp_seq=3 ttl=64 time=0.133 ms
> 
> --- nfs1.lightspeed.ca ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 1998ms
> rtt min/avg/max/mdev = 0.126/0.135/0.148/0.016 ms
> 
> I can get gluster to probe the hostname:
> 
> root@nfs3:/home/ernied# gluster peer probe nfs1
> peer probe: success. Host nfs1 port 24007 already in peer list
> 
> But if I try to create the brick on the new node, it says that the
> host can't be found? Um...
> 
> root@nfs3:/home/ernied# gluster volume create gv2 replica 3
> nfs1.lightspeed.ca:/brick1/gv2/ nfs2.lightspeed.ca:/brick1/gv2/
> nfs3.lightspeed.ca:/brick1/gv2
> volume create: gv2: failed: Failed to find host nfs1.lightspeed.ca
> 
> Our logs from /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:
> 
> [2016-04-06 18:19:18.107459] E [MSGID: 106452]
> [glusterd-utils.c:5825:glusterd_new_brick_validate] 0-management:
> Failed to find host nfs1.lightspeed.ca
> [2016-04-06 18:19:18.107496] E [MSGID: 106536]
> [glusterd-volume-ops.c:1364:glusterd_op_stage_create_volume]
> 0-management: Failed to find host nfs1.lightspeed.ca
> [2016-04-06 18:19:18.107516] E [MSGID: 106301]
> [glusterd-syncop.c:1281:gd_stage_op_phase] 0-management: Staging of
> operation 'Volume Create' failed on localhost : Failed to find host
> nfs1.lightspeed.ca
> [2016-04-06 18:19:18.231864] E [MSGID: 106170]
> [glusterd-handshake.c:1051:gd_validate_mgmt_hndsk_req] 0-management:
> Request from peer 192.168.1.31:65530 has an entry in peerinfo, but
> uuid does not match
> [2016-04-06 18:19:18.231919] E [MSGID: 106170]
> [glusterd-handshake.c:1060:gd_validate_mgmt_hndsk_req] 0-management:
> Rejecting management handshake request from unknown peer
> 192.168.1.31:65530
> 
> That error about the entry in peerinfo doesn't match anything in
> Google besides the source code for Gluster. My guess is that my
> earlier unsuccessful attempts to add this node before v3.7.10 have
> created a conflict that needs to be cleared.
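
For what it's worth, the "uuid does not match" line suggests that nfs3 still holds an old UUID for 192.168.1.31. Below is a minimal sketch of how one might check that, assuming glusterd keeps one file per known peer under /var/lib/glusterd/peers with uuid= and hostname1= lines (I believe that is the usual layout, but treat it as an assumption). The function name find_peer_uuid, the all-zero UUID and the temporary directory are made up for illustration, so the script runs without touching real glusterd state:

```shell
#!/bin/sh
# find_peer_uuid DIR HOST
# Print the uuid stored for HOST in a peers directory, or return 1 if none.
# Assumption: each file in DIR holds key=value lines such as
#   uuid=<uuid>, state=<n>, hostname1=<name>
find_peer_uuid() {
    peers_dir=$1
    want_host=$2
    for peer_file in "$peers_dir"/*; do
        [ -f "$peer_file" ] || continue
        file_uuid=
        matched=no
        while IFS='=' read -r key value; do
            case $key in
                uuid) file_uuid=$value ;;
                hostname*)
                    if [ "$value" = "$want_host" ]; then matched=yes; fi ;;
            esac
        done < "$peer_file"
        if [ "$matched" = yes ]; then
            printf '%s\n' "$file_uuid"
            return 0
        fi
    done
    return 1
}

# Demo on a throwaway directory standing in for /var/lib/glusterd/peers.
demo=$(mktemp -d)
stale=00000000-0000-0000-0000-000000000000   # hypothetical stale entry
printf 'uuid=%s\nstate=3\nhostname1=nfs1\n' "$stale" > "$demo/$stale"

stored=$(find_peer_uuid "$demo" nfs1)
actual=1207917a-23bc-4bae-8238-cd691b7082c7  # UUID from glusterd.info on nfs1
if [ "$stored" != "$actual" ]; then
    echo "stale entry for nfs1: stored $stored, actual $actual"
fi
rm -rf "$demo"
```

On a real node the first argument would be /var/lib/glusterd/peers, and the value to compare against is the UUID= line in /var/lib/glusterd/glusterd.info on the peer itself.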


More interesting is what happens when I try to add the third server's
brick to the volume from the first gluster server:

root@nfs1:/home/ernied# gluster volume add-brick gv2 replica 3 nfs3:/brick1/gv2
volume add-brick: failed: One or more nodes do not support the required op-version. Cluster op-version must atleast be 30600.

Yet, when I view the operating version in /var/lib/glusterd/glusterd.info:

root@nfs1:/home/ernied# cat /var/lib/glusterd/glusterd.info
UUID=1207917a-23bc-4bae-8238-cd691b7082c7
operating-version=30501

root@nfs2:/home/ernied# cat /var/lib/glusterd/glusterd.info
UUID=e394fcec-41da-482a-9b30-089f717c5c06
operating-version=30501

root@nfs3:/home/ernied# cat /var/lib/glusterd/glusterd.info
UUID=ae191e96-9cd6-4e2b-acae-18f2cc45e6ed
operating-version=30501

I see that the operating version is the same on all nodes!
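
That said, the versions agreeing with each other may not be what the message is checking: it asks for a cluster op-version of at least 30600, and 30501 is below that on every node. Here is a minimal sketch of that comparison, assuming glusterd.info has the key=value layout shown above. The helper names read_op_version and meets_op_version and the temporary file are hypothetical, used so the script runs without reading /var/lib/glusterd:

```shell
#!/bin/sh
# read_op_version FILE: print the operating-version value from a
# glusterd.info-style key=value file.
read_op_version() {
    sed -n 's/^operating-version=//p' "$1"
}

# meets_op_version FILE REQUIRED: succeed only if the file's
# operating-version is present and numerically >= REQUIRED.
meets_op_version() {
    current=$(read_op_version "$1")
    [ -n "$current" ] && [ "$current" -ge "$2" ]
}

# Demo on a throwaway copy of the values reported by nfs1.
info=$(mktemp)
printf 'UUID=1207917a-23bc-4bae-8238-cd691b7082c7\noperating-version=30501\n' > "$info"

if meets_op_version "$info" 30600; then
    echo "op-version is sufficient"
else
    echo "op-version $(read_op_version "$info") is below the required 30600"
fi
rm -f "$info"
```

If this reading is right, raising the cluster op-version with "gluster volume set all cluster.op-version 30600" (run once, from any node, after every node is on a release that supports it) would be the thing to try, though that is untested here.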

