[Bugs] [Bug 1187347] New: RPC ping does not retransmit

bugzilla at redhat.com bugzilla at redhat.com
Thu Jan 29 19:57:08 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1187347

            Bug ID: 1187347
           Summary: RPC ping does not retransmit
           Product: GlusterFS
           Version: 3.6.1
         Component: fuse
          Assignee: bugs at gluster.org
          Reporter: skippy at skippy.net
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Created attachment 985748
  --> https://bugzilla.redhat.com/attachment.cgi?id=985748&action=edit
tcpdumps of gluster clients, servers, and firewall

Description of problem:
The Gluster FUSE client seems to only send one RPC ping packet, and marks the
remote server down if that single packet is not received.


Version-Release number of selected component (if applicable): 3.6.1


How reproducible:
We have been unable to reliably reproduce this.



Steps to Reproduce:

We have a replica 2 Gluster configuration, with two physical Gluster servers
hosting bricks.  All clients are VMware virtual machines, and all clients use
FUSE to make glusterfs mounts.

The servers are in a different subnet than the clients.  There is a SonicWall
firewall between the subnets.

Randomly through the day we'll have Gluster clients claim a ping timeout from a
brick server. In every case, the client reports a ping timeout first to one
server and then almost immediately to the other server.  The client will
re-establish a connection to both servers within a few seconds (often within
the same second as the disconnect is reported).

Clients do not fail together. That is, client1 will report a disconnect while
client2 and client3 are happily using the Gluster volumes.


Actual results:

After much tcpdump and Wireshark, it appears to us as though the clients send
an RPC ping packet to the server.  This packet is getting lost somewhere, such
that the servers never ACK it.  The client TCP stack appears to re-transmit
the packet, and we see that the servers do ACK these retransmitted packets.

The ACK for the retransmitted packet seems not to be accepted by the client,
causing the client to drop the connection to the server.


Expected results:


We would expect the client to be a little more resilient.  A single packet
retransmission should not tear down the entire Gluster universe.  No other
application in our network produces anything remotely similar.
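
To make the expectation concrete, here is a minimal sketch of the kind of
tolerance we have in mind: the peer is only declared down after several
consecutive unanswered pings, and any reply resets the count.  This is not
GlusterFS code; the class name PingMonitor and the max_missed parameter are
hypothetical and exist only for illustration.

```python
class PingMonitor:
    """Toy model (not GlusterFS code): tolerate a few missed pings."""

    def __init__(self, max_missed=3):
        # max_missed is a hypothetical tunable: how many consecutive
        # unanswered pings are allowed before the peer is declared down.
        self.max_missed = max_missed
        self.missed = 0

    def on_reply(self):
        # Any ping reply proves the peer is alive, so reset the counter.
        self.missed = 0

    def on_timeout(self):
        # Called when one ping goes unanswered.
        # Returns True once the peer should be considered down.
        self.missed += 1
        return self.missed >= self.max_missed
```

With max_missed=1 the model behaves the way the client appears to behave in
our captures: a single unanswered ping is enough to drop the connection.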

Additional info:

Attached are several tcpdumps, from the Gluster clients, servers, and our
firewall.

gluster error at Tue Jan 13 07:06:43 EST 2015 (UTC Tue Jan 13 12:06:43 UTC 2015)

GLUSTER CLIENT: 192.168.135.61, GLUSTER SERVER: 192.168.30.115
SRC PORT: 1014, DEST PORT: 49162

1. See t11.pcap7, packet number 85187.
 - This is the initiation of a Gluster Dump RPC call on the gluster client side.

2. See t11.pcap7, packet number 85196.
 - This is a retransmission of the Gluster Dump RPC call in the previous packet.

3. Now, see dump firewall.cap
 - Missing: the initiation of the Gluster Dump RPC call (from packet 85187 above)
 - However, the retransmission is in packet number 39789
4. Finally, see dump gluster-t2.pcap3
 - Again missing: the initiation of the Gluster Dump RPC call
 - And this time the retransmission is also missing on the server side.
 - We're assuming this is because the firewall dropped it, not knowing
   it belonged to an active TCP conversation.
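
The comparison in steps 1 to 4 can be sketched as a small script.  It assumes
each capture has already been reduced to a list of (sequence number, payload
length) pairs for the one TCP stream of interest; the function names and the
sample values below are hypothetical and are not taken from the attached
captures.

```python
from collections import Counter


def find_retransmissions(segments):
    """Return the (seq, length) pairs seen more than once in one capture."""
    counts = Counter(segments)
    return {key for key, n in counts.items() if n > 1}


def missing_downstream(upstream, downstream):
    """Map each segment to how many copies were seen upstream but not downstream."""
    up = Counter(upstream)
    down = Counter(downstream)
    return {key: up[key] - down[key] for key in up if up[key] > down[key]}


# Hypothetical data mirroring the pattern described above:
client = [(1000, 84), (1000, 84)]  # original segment plus its retransmission
firewall = [(1000, 84)]            # only one copy reaches the firewall
server = []                        # neither copy reaches the server

print(find_retransmissions(client))          # segments sent more than once
print(missing_downstream(client, firewall))  # copies lost before the firewall
print(missing_downstream(client, server))    # copies lost before the server
```

Running the same comparison hop by hop (client to firewall, firewall to
server) shows at which point each copy of the segment disappears.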


Further down in the t11.pcap6 capture you can see the client give up and
send TCP resets for the failed RPC initiations.  There are several RPC
calls missing between the client and the firewall in these captures.  The
details above show one specific example, but notice that we have failed
initiations to both Gluster servers in these captures.


