[Bugs] [Bug 1245036] New: glusterd fails to peer probe if one of the nodes is behind NAT.

bugzilla at redhat.com bugzilla at redhat.com
Tue Jul 21 06:17:58 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1245036

            Bug ID: 1245036
           Summary: glusterd fails to peer probe if one of the nodes is
                    behind NAT.
           Product: GlusterFS
           Version: 3.7.1
         Component: glusterd
          Assignee: bugs at gluster.org
          Reporter: hchiramm at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Description of problem:

Currently glusterd fails to establish a successful 'peer probe' if one of the
nodes participating in the probe is behind NAT. For example, containers
running on multiple hosts fail to peer probe each other to form a trusted pool.

The test setup uses Atomic Hosts with 'flannel' for overlay networking.



Test Setup:

Container-1 IP: 10.50.72.2  (running on Worker-1, which is atomic host 1)
Container-2 IP: 10.50.97.2  (running on Worker-2, which is atomic host 2)


PING from Container-1 to Container-2 works.
SSH  from Container-1 to Container-2 works.


The 'gluster pool list' output on each container shows:

 Container-1:
--------------------------------------------------------------------------------------
-bash-4.3# ip a s eth0
5: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:32:61:02 brd ff:ff:ff:ff:ff:ff
    inet 10.50.97.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe32:6102/64 scope link
       valid_lft forever preferred_lft forever

-bash-4.3# gluster pool list
UUID                                    Hostname        State
3c6bf65d-6a58-46ad-90d4-4e2d9b4dc80e    10.50.72.2      Connected
175daada-0ca4-4e18-b72b-460c9da19f96    localhost       Connected


As seen above, Container-1 reports both gluster nodes as Connected and the
peer probe as successful. However, on Container-2 the remote node is in
"Disconnected" state.



 Container-2:
--------------------------------------------------------------------------------------

-bash-4.3# ip a s eth0
5: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:32:48:02 brd ff:ff:ff:ff:ff:ff
    inet 10.50.72.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe32:4802/64 scope link
       valid_lft forever preferred_lft forever

-bash-4.3# gluster pool list
UUID                                    Hostname        State
175daada-0ca4-4e18-b72b-460c9da19f96    10.50.97.0      Disconnected
3c6bf65d-6a58-46ad-90d4-4e2d9b4dc80e    localhost       Connected
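
Note that Container-2 has recorded the peer under 10.50.97.0 (the flannel
gateway IP called out in the netstat output below) rather than under
Container-1's eth0 address. If it helps with triage, the address glusterd has
persisted for a peer can be checked directly from its peer store; a minimal
sketch, assuming the default working directory /var/lib/glusterd:

-bash-4.3# cat /var/lib/glusterd/peers/*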



The netstat output below shows the "flannel" gateway IP as the source IP of
the reverse connection, which causes glusterd to fail.




-bash-4.3# netstat -ntp
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
PID/Program name   
tcp        0      0 10.50.72.2:45442        202.255.47.226:80       TIME_WAIT  
-                  
tcp        1      1 10.50.72.2:41806        140.138.144.170:80      LAST_ACK   
-                  
tcp        0      0 10.50.72.2:22           10.50.72.1:55834        ESTABLISHED
146/sshd: root at pts/
tcp        0      0 10.50.72.2:58350        192.26.91.193:80        TIME_WAIT  
-    

tcp        0      0 10.50.72.2:24007        10.50.97.0:1022         ESTABLISHED
35/glusterd           ---> flannel GW IP

tcp        1      1 10.50.72.2:49727        123.255.202.74:80       LAST_ACK   
-                  

tcp        0      0 10.50.72.2:22           10.50.97.0:51955        ESTABLISHED
330/sshd: root at pts/  --> flannel GW IP

tcp        1      1 10.50.72.2:49723        123.255.202.74:80       LAST_ACK   
-                  
tcp        1      1 10.50.72.2:49734        123.255.202.74:80       LAST_ACK   
-                  
tcp        0      0 10.50.72.2:44396        103.22.220.133:80       TIME_WAIT  
-                  
tcp        0      0 10.50.72.2:37028        212.138.64.22:80        TIME_WAIT  
-                  
tcp        0      1 10.50.72.2:58308        137.189.4.14:80         LAST_ACK   
-    
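
If the source-address rewriting happens in the overlay/NAT layer, it should be
visible on the worker hosts rather than inside the containers. A diagnostic
sketch (to be run on the Atomic Hosts; whether a MASQUERADE rule is present
and whether conntrack is installed depends on the flannel/docker configuration
there, so this is only a suggestion, not output from this setup):

# iptables -t nat -S POSTROUTING | grep -i masquerade
# conntrack -L | grep 24007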

As additional information, telnet from the containers to port 24007 works in
both directions.

-bash-4.3# ip a s eth0
5: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:32:61:02 brd ff:ff:ff:ff:ff:ff
    inet 10.50.97.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe32:6102/64 scope link 
       valid_lft forever preferred_lft forever


 -bash-4.3# telnet 10.50.72.2 24007
Trying 10.50.72.2...
Connected to 10.50.72.2.
Escape character is '^]'.
^]
telnet> Connection closed.



-bash-4.3# ip a s eth0
5: eth0: <BROADCAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:32:48:02 brd ff:ff:ff:ff:ff:ff
    inet 10.50.72.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe32:4802/64 scope link 
       valid_lft forever preferred_lft forever
-bash-4.3# telnet 10.50.97.2 24007
Trying 10.50.97.2...
Connected to 10.50.97.2.
Escape character is '^]'.
^]
telnet> Connection closed.




Version-Release number of selected component (if applicable):

GlusterFS 3.7.2

How reproducible:

Always

Steps to Reproduce:


Same as above.
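
A condensed sketch of those steps (the probe direction is an assumption;
<peer-flannel-IP> is a placeholder for the other container's overlay address):

1. Start glusterd in two containers on different Atomic Hosts connected via
   the flannel overlay network.
2. From one container:
   -bash-4.3# gluster peer probe <peer-flannel-IP>
3. On both containers:
   -bash-4.3# gluster pool list
   The probing side shows the peer as Connected; the other side shows it as
   Disconnected, with the flannel gateway IP recorded instead of the
   container's eth0 address.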
