[Bugs] [Bug 1144672] file locks are not released on frequent disconnects after applying the BUG #1129787 patch

bugzilla at redhat.com bugzilla at redhat.com
Mon Sep 22 06:20:16 UTC 2014


https://bugzilla.redhat.com/show_bug.cgi?id=1144672



--- Comment #2 from Jaden Liang <jaden1q84 at gmail.com> ---
A small mistake in step 3 of the reproduction steps: use -D to delete the iptables rules.

3. After 30s, delete the 2 iptables rules with
iptables -D INPUT -p tcp -s 200.200.200.20 --dport 49154 -j REJECT
iptables -D OUTPUT -p tcp -d 200.200.200.20 --sport 49154 -j DROP

Note that port 49154 is the glusterfsd listening port; it may differ between
servers. It can be found with 'ps auxf | grep glusterfsd' to locate the brick
glusterfsd process that serves the test file.
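
For convenience, the brick port can also be looked up as follows (the volume
name 'testvol' is just an example):

# 'gluster volume status' lists the TCP port of every brick process
gluster volume status testvol
# or check which ports the running glusterfsd processes are listening on
ss -ltnp | grep glusterfsd

A shell sketch of the whole reproduction is also included after the quoted
report below.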

(In reply to Jaden Liang from comment #0)
> Description of problem:
> 
> First of all, this issue happens after applying http://review.gluster.org/8065
> and setting network.tcp-timeout to 30s.
> 
> In a replicated gluster volume with 2 server nodes, the client side uses
> mount.glusterfs to access the volume. A test program running on one of the
> nodes opens a file in the volume and flocks it (only once), then reads and
> writes the file frequently. On one of the nodes, we simulate a network
> disconnect for 15s and then reconnect. Note that 15s is less than the
> network.tcp-timeout of 30s. Keep disconnecting and reconnecting like this
> for some time and then exit the test program; the FD on the server side is
> not closed. If the test program is restarted, it fails in flock with a
> Resource Temporarily Unavailable error.
> 
> Network failure timeline:
> ---(15s connected)---|---(15s disconnected)---|---(15s connected)---|...repeat...
> 
> Version-Release number of selected component (if applicable):
> 
> 
> How reproducible:
> This issue is reproducible by simulation.
> 
> Steps to Reproduce:
> 1. Set up a replicated volume with 2 server nodes (A and B). On node A, use
> the fuse client to access the volume.
> Run a simple test program: create a file and flock it, then do some random
> reading in an endless loop, without ever unlocking or exiting.
> 
> 2. On node B, add 2 iptables rules to block the connection between the fuse
> client on A and glusterfsd on B, e.g.
> iptables -A INPUT -p tcp -s 200.200.200.20 --dport 49154 -j REJECT
> iptables -A OUTPUT -p tcp -d 200.200.200.20 --sport 49154 -j DROP
> 
> Note: 200.200.200.20 is the IP of A, and 49154 is the listening port of
> glusterfsd on B.
> 
> These 2 rules make the socket on node A close first (because of the REJECT),
> while only dropping OUTPUT packets on B (the DROP), which keeps the socket on
> node B alive.
> 
> 3. After 30s, delete 2 iptables commands with 
> iptables -A INPUT -p tcp -s 200.200.200.20 --dport 49154 -j REJECT
> iptables -A OUTPUT -p tcp -d 200.200.200.20 --sport 49154 -j DROP
> 
> 4. Repeat steps 2-3 several times. Exit the test program, then restart it;
> it cannot flock the file again.
> 
> Actual results:
> File locks are not released.
> 
> Expected results:
> File locks are released.
> 
> Additional info:
> Here is a preview patch to fix the issue; it will be submitted to Gerrit
> later.
> 
> http://supercolony.gluster.org/pipermail/gluster-devel/2014-September/042315.html
> 
> The major modification is adding an id for each distinct tcp connection
> between a client/server pair, to handle the case where the old connection's
> socket is not closed at the same time on both sides.
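
For reference, here is a minimal shell sketch of the reproduction described in
the quoted steps above. The mount path /mnt/testvol, the test file name, and
the IP/port values are assumptions based on the report, and flock(1) is used
only as a stand-in for the custom test program:

# --- On node A (volume fuse-mounted at /mnt/testvol, an assumed path) ---
# Create the test file, then hold one exclusive flock while reading for about
# 2 minutes, roughly what the test program does.
dd if=/dev/urandom of=/mnt/testvol/testfile bs=1M count=1
flock -x /mnt/testvol/testfile -c \
    'i=0; while [ $i -lt 120 ]; do
         dd if=/mnt/testvol/testfile of=/dev/null bs=4k count=1 2>/dev/null
         sleep 1; i=$((i+1))
     done' &

# --- On node B, while the reader above is running ---
# Simulate the 15s-disconnected / 15s-connected pattern a few times.
# 200.200.200.20 is node A's IP, 49154 is the brick's glusterfsd port (see above).
for n in 1 2 3; do
    iptables -A INPUT  -p tcp -s 200.200.200.20 --dport 49154 -j REJECT
    iptables -A OUTPUT -p tcp -d 200.200.200.20 --sport 49154 -j DROP
    sleep 15
    iptables -D INPUT  -p tcp -s 200.200.200.20 --dport 49154 -j REJECT
    iptables -D OUTPUT -p tcp -d 200.200.200.20 --sport 49154 -j DROP
    sleep 15
done

# --- Back on node A, after the reader has exited ---
# Try to take the lock again without blocking. With the bug present, flock -n
# cannot get the lock (the "Resource Temporarily Unavailable" error from the
# report) because the brick never released it.
wait
flock -xn /mnt/testvol/testfile -c 'echo lock acquired'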
