[Gluster-users] glusterfs client waiting on SYN_SENT to connect...
Liam Slusser
lslusser at gmail.com
Tue Dec 14 20:58:40 UTC 2010
Just wanted to update you all. It turns out the problem is my Juniper
firewall, sort of. I created a service in our Juniper that
describes "Gluster" and allowed its TCP sessions to never time out.
The problem comes when a server crashes and the TCP connection isn't
"cleaned up". It looks like the gluster client always starts using
the same outbound (source) TCP port, and in our firewall that
source/dest port combination is already in use (it never times out,
right?), so the firewall won't allow it to be created again and the
connection is blocked.
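
The behaviour described above can be sketched with a toy Python model.
This is only an illustration under stated assumptions: the SessionTable
class, its allow_syn method and the timeout parameter are hypothetical
names, and nothing here reflects how the Juniper actually implements
its session table. It simply assumes that sessions are keyed by source
and destination address and port, and that a new SYN is dropped while
a matching entry still exists.

```python
class SessionTable:
    """Toy stateful-firewall session table keyed by the TCP 4-tuple.

    Hypothetical model for illustration only, not Juniper's behaviour.
    """

    def __init__(self, timeout=None):
        # timeout=None models a service whose sessions never time out.
        self.timeout = timeout
        self.sessions = {}  # (src, sport, dst, dport) -> creation time

    def _expire(self, now):
        # Drop entries older than the timeout; keep everything if None.
        if self.timeout is None:
            return
        self.sessions = {
            key: created
            for key, created in self.sessions.items()
            if now - created < self.timeout
        }

    def allow_syn(self, src, sport, dst, dport, now):
        """Return True if a new connection with this 4-tuple is allowed."""
        self._expire(now)
        key = (src, sport, dst, dport)
        if key in self.sessions:
            # A stale entry for the same 4-tuple still exists, so the
            # new SYN is treated as a conflict and dropped.
            return False
        self.sessions[key] = now
        return True
```

In this model, with timeout=None, a second allow_syn call for
10.10.10.101:996 -> 10.20.10.102:6996 is refused no matter how much
later it arrives, while the same call from a different source port is
accepted. With a finite timeout the stale entry eventually expires and
the original source port works again.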
So right now, if I do a netstat -pan:
tcp 0 1 10.10.10.101:996 10.20.10.102:6996 SYN_SENT 23491/glusterfs
tcp 0 1 10.10.10.101:997 10.20.10.102:6996 SYN_SENT 23491/glusterfs
tcp 0 1 10.10.10.101:1000 10.20.10.102:6996 SYN_SENT 23491/glusterfs
tcp 0 0 10.10.10.101:1001 10.20.10.102:6996 ESTABLISHED 23491/glusterfs
tcp 0 0 10.10.10.101:999 10.20.10.101:6996 ESTABLISHED 23491/glusterfs
tcp 0 1 10.10.10.101:998 10.20.10.101:6996 SYN_SENT 23491/glusterfs
tcp 0 1 10.10.10.101:1003 10.20.10.101:6996 SYN_SENT 23491/glusterfs
tcp 0 1 10.10.10.101:1002 10.20.10.101:6996 SYN_SENT 23491/glusterfs
Now if I kill the gluster process and restart it, notice that the
source ports don't change:
tcp 0 1 10.10.10.101:996 10.20.10.102:6996 SYN_SENT 23687/glusterfs
tcp 0 1 10.10.10.101:997 10.20.10.102:6996 SYN_SENT 23687/glusterfs
tcp 0 1 10.10.10.101:1000 10.20.10.102:6996 SYN_SENT 23687/glusterfs
tcp 0 0 10.10.10.101:1001 10.20.10.102:6996 ESTABLISHED 23687/glusterfs
tcp 0 0 10.10.10.101:999 10.20.10.101:6996 ESTABLISHED 23687/glusterfs
tcp 0 1 10.10.10.101:998 10.20.10.101:6996 SYN_SENT 23687/glusterfs
tcp 0 1 10.10.10.101:1003 10.20.10.101:6996 SYN_SENT 23687/glusterfs
tcp 0 1 10.10.10.101:1002 10.20.10.101:6996 SYN_SENT 23687/glusterfs
Now if I kill and restart it a few times, I can get lucky and get a
different source port, but you can see I'm still missing a few
bricks:
tcp 0 0 10.10.10.101:994 10.20.10.102:6996 ESTABLISHED 23745/glusterfs
tcp 0 0 10.10.10.101:995 10.20.10.102:6996 ESTABLISHED 23745/glusterfs
tcp 0 0 10.10.10.101:998 10.20.10.102:6996 ESTABLISHED 23745/glusterfs
tcp 0 1 10.10.10.101:1000 10.20.10.102:6996 SYN_SENT 23745/glusterfs
tcp 0 0 10.10.10.101:997 10.20.10.101:6996 ESTABLISHED 23745/glusterfs
tcp 0 0 10.10.10.101:996 10.20.10.101:6996 ESTABLISHED 23745/glusterfs
tcp 0 1 10.10.10.101:1003 10.20.10.101:6996 SYN_SENT 23745/glusterfs
tcp 0 1 10.10.10.101:1002 10.20.10.101:6996 SYN_SENT 23745/glusterfs
Telnet, on the other hand, always works, because it picks a random
source port each time:
$ telnet 10.20.10.102 6996
Trying 10.20.10.102...
Connected to glusterserver (10.20.10.102).
Escape character is '^]'.
$ netstat -pan|grep telne
tcp 0 0 10.10.10.101:58757 10.20.10.102:6996 ESTABLISHED 23622/telnet
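
The difference between the two cases can be sketched in Python. This
is a minimal illustration, not gluster's actual code: connect_from is
a hypothetical helper, and the idea that the gluster client explicitly
binds a low source port before connecting is an assumption based only
on the netstat output above, where every glusterfs source port is
below 1024. A client that does not bind, like telnet here, gets an
ephemeral port chosen by the kernel.

```python
import socket


def connect_from(dest, source_port=None):
    """Open a TCP connection to dest (host, port).

    With source_port=None the kernel picks an ephemeral source port,
    which is what telnet appears to do. With a fixed source_port the
    socket is bound first, so every run reuses the same 4-tuple
    (assumed here to resemble what the gluster client does).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if source_port is not None:
        # Allow rebinding the same local port across restarts.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # Ports below 1024 normally require root privileges.
        sock.bind(("", source_port))
    sock.connect(dest)
    return sock
```

For example, connect_from(("10.20.10.102", 6996)) would behave like
the telnet session above, while connect_from(("10.20.10.102", 6996),
source_port=996) would always present the same source port to the
firewall. Calling getsockname() on the returned socket shows which
local port was actually used.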
Why doesn't gluster use a more random source port? I'm going to
have to dig through the Juniper docs to see if I can manually close an
active session (let's hope), which should fix my immediate problem, but
it doesn't really fix the long-term problem.
Thoughts?
thanks,
liam
On Fri, Dec 3, 2010 at 6:51 PM, Liam Slusser <lslusser at gmail.com> wrote:
> Ah, the two different IPs are because I was changing my IPs for this mailing
> list and I guess I missed that one. :) Will try adding a static route.
> Also going to snoop traffic and see if the gluster client is actually
> getting to the server or being blocked by the firewall. I'll let you all know
> what I find.
>
> Thanks for the ideas.
>
> Liam
>
> On Dec 3, 2010 6:32 PM, <mki-glusterfs at mozone.net> wrote:
>> On Fri, Dec 03, 2010 at 04:25:18PM -0800, Liam Slusser wrote:
>>> [root@client~]# netstat -pan|grep glus
>>> tcp 0 1 10.8.10.107:1000 10.8.11.102:6996 SYN_SENT 3385/glusterfs
>>>
>>> from the gluster client log:
>>>
>>> However, the port is obviously open...
>>>
>>> [root@client~]# telnet 10.8.11.102 6996
>>> Trying 10.2.56.102...
>>> Connected to glusterserverb (10.8.11.102).
>>> Escape character is '^]'.
>>> ^]
>>> telnet> close
>>> Connection closed.
>>
>> Looking further... why is your telnet trying 10.2.56.102 when you
>> clearly specified 10.8.11.102? Also, what happens if you do a
>> specific route for the 10.8.11.0/24 block thru the appropriate gw
>> without relying on the default gw to route for you? In this way
>> you don't end up in a situation where the client is mistakenly
>> trying to go over the wrong interface. The telnet may be switching
>> to an alternate interface to see if it gets thru?
>>
>> Mohan
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>