[Bugs] [Bug 1144672] New: file locks are not released in frequently disconnects after apply BUG #1129787 patch

Sat Sep 20 08:01:43 UTC 2014

https://bugzilla.redhat.com/show_bug.cgi?id=1144672

            Bug ID: 1144672
           Summary: file locks are not released in frequently disconnects
                    after apply BUG #1129787 patch
           Product: GlusterFS
           Version: 3.4.5
         Component: rpc
          Assignee: gluster-bugs at redhat.com
          Reporter: jaden1q84 at gmail.com
                CC: bugs at gluster.org

Description of problem:

First of all, this issue occurs after applying http://review.gluster.org/8065 and
setting network.tcp-timeout to 30s.

The setup is a replicated Gluster volume with 2 server nodes. On the client side,
the volume is mounted with mount.glusterfs. A test program running on one of the
nodes opens a file in the volume and flocks it (only once), then reads and
writes the file frequently. On one of the nodes, we simulate a 15s network
disconnect followed by a reconnect. Note that 15s is less than the
network.tcp-timeout of 30s. After repeating this disconnect/reconnect cycle for
some time and then exiting the test program, the FD on the server side is not
closed. If the test program is restarted, its flock call fails with a "Resource
temporarily unavailable" error.
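The locking pattern of the test program can be reproduced on a local file. This is a minimal sketch, assuming a Linux host and Python's standard `fcntl` module; the helper names `open_and_lock`, `random_reads` and `try_lock` are hypothetical and are not taken from GlusterFS or from the original test program. flock(2) locks belong to the open file description, so a fresh `open()` of the same file cannot take the lock with `LOCK_NB` while another descriptor still holds it, and the call fails with `EWOULDBLOCK` (equal to `EAGAIN` on Linux, "Resource temporarily unavailable"). That is the error the restarted program reports when the old server-side FD is never closed.

```python
import errno
import fcntl
import os
import random


def open_and_lock(path: str, size: int = 4096) -> int:
    """Create the file if needed, fill it, and take an exclusive flock once."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    # Make sure there is data for the random reads.
    if os.fstat(fd).st_size < size:
        os.pwrite(fd, b"x" * size, 0)
    # Blocking exclusive lock, taken only once as in the report.
    fcntl.flock(fd, fcntl.LOCK_EX)
    return fd


def random_reads(fd: int, count: int, chunk: int = 64) -> int:
    """Do `count` reads at random offsets (the real test program loops forever)."""
    size = os.fstat(fd).st_size
    done = 0
    for _ in range(count):
        offset = random.randrange(0, max(size - chunk, 1))
        os.pread(fd, chunk, offset)
        done += 1
    return done


def try_lock(path: str) -> bool:
    """Return True if a fresh open of `path` can take the lock without blocking."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return False  # "Resource temporarily unavailable"
        raise
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

Locally, closing the holding descriptor (or exiting the process) makes `try_lock` succeed again; in the reported scenario the equivalent release never happens on the brick because the server keeps the old FD open.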

Network failure timeline:
---(15s connected)---|---(15s disconnected)---|---(15s connected)---|...repeat...
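The timeline can be written out as an explicit schedule. This is a minimal sketch with a hypothetical helper `outage_schedule`; it only encodes the numbers given in the report (15s connected, 15s disconnected, network.tcp-timeout of 30s) and rejects any plan in which an outage is not shorter than the timeout, which is the condition the description points out.

```python
def outage_schedule(cycles: int, up: int = 15, down: int = 15,
                    tcp_timeout: int = 30) -> list[tuple[int, int, str]]:
    """Return (start_s, end_s, state) segments for the connect/disconnect cycle."""
    # The report relies on every outage being shorter than network.tcp-timeout.
    if down >= tcp_timeout:
        raise ValueError("each outage must be shorter than network.tcp-timeout")
    segments = []
    t = 0
    for _ in range(cycles):
        segments.append((t, t + up, "connected"))
        t += up
        segments.append((t, t + down, "disconnected"))
        t += down
    return segments
```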

Version-Release number of selected component (if applicable):

How reproducible:
By simulation, this issue is reproducible.

Steps to Reproduce:
1. Set up a replicated volume with 2 server nodes (A and B). On node A, access
the volume through the FUSE client.
Run a simple test program that creates a file and flocks it, then does random
reads in an infinite loop, without ever unlocking the file or exiting.

2. On node B, add 2 iptables rules to block the connection between the FUSE
client on A and glusterfsd on B, e.g.:
iptables -A INPUT -p tcp -s 200.200.200.20 --dport 49154 -j REJECT
iptables -A OUTPUT -p tcp -d 200.200.200.20 --sport 49154 -j DROP

Note: 200.200.200.20 is the IP address of A, and 49154 is the listening port of
glusterfsd on B.

These 2 rules make the socket on node A close first (via REJECT), while on node
B only the OUTPUT packets are dropped, which keeps the socket on node B alive.

3. After 30s, delete the 2 iptables rules with:
iptables -D INPUT -p tcp -s 200.200.200.20 --dport 49154 -j REJECT
iptables -D OUTPUT -p tcp -d 200.200.200.20 --sport 49154 -j DROP

4. Repeat steps 2-3 several times. Exit the test program, then restart it: it
can no longer flock the file.
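Steps 2-4 can be scripted for node B. This minimal sketch only builds the command lines and does not execute them (running iptables needs root on B); the helper names `block_rules` and `cycle_commands` are hypothetical. The two rule specs are the ones from step 2, added with `-A` and removed with `-D`. The outage and reconnect durations are explicit parameters because the report uses both 15s (description and timeline) and 30s (step 3).

```python
import shlex


def block_rules(client_ip: str, brick_port: int) -> list[list[str]]:
    """Rule specs from step 2, without the -A/-D action."""
    port = str(brick_port)
    return [
        # Reject A's packets to the brick, so the socket on A closes first.
        ["INPUT", "-p", "tcp", "-s", client_ip, "--dport", port, "-j", "REJECT"],
        # Silently drop B's replies, so the socket on B stays open.
        ["OUTPUT", "-p", "tcp", "-d", client_ip, "--sport", port, "-j", "DROP"],
    ]


def cycle_commands(client_ip: str, brick_port: int, repeats: int,
                   down_s: int, up_s: int) -> list[list[str]]:
    """Argv lists for steps 2-4 on node B: block, wait, unblock, wait, repeat."""
    plan: list[list[str]] = []
    rules = block_rules(client_ip, brick_port)
    for _ in range(repeats):
        plan += [["iptables", "-A", *spec] for spec in rules]  # step 2: add rules
        plan.append(["sleep", str(down_s)])                    # outage
        plan += [["iptables", "-D", *spec] for spec in rules]  # step 3: delete rules
        plan.append(["sleep", str(up_s)])                      # connected again
    return plan


if __name__ == "__main__":
    for argv in cycle_commands("200.200.200.20", 49154, repeats=3,
                               down_s=30, up_s=15):
        print(shlex.join(argv))
```

Each printed line is a plain shell command for node B; they could also be run one by one with `subprocess.run(argv, check=True)` as root.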

Actual results:
File flocks not released.

Expected results:
File flocks released.

Additional info:
Here is a preview patch that fixes the issue; I will submit it to Gerrit later.

http://supercolony.gluster.org/pipermail/gluster-devel/2014-September/042315.html

The main modification is adding an ID to distinguish different TCP connections
between the same client/server pair, to avoid problems when the two ends of a
connection's socket are not closed at the same time.
