[Bugs] [Bug 1301805] New: Contending exclusive NFS file locks from two hosts breaks locking when blocked host gives up early.

bugzilla at redhat.com bugzilla at redhat.com
Tue Jan 26 01:44:37 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1301805

            Bug ID: 1301805
           Summary: Contending exclusive NFS file locks from two hosts
                    breaks locking when blocked host gives up early.
           Product: GlusterFS
           Version: 3.6.5
         Component: nfs
          Severity: medium
          Assignee: bugs at gluster.org
          Reporter: jbyers at stonefly.com
                CC: bugs at gluster.org



Contending exclusive NFS file locks from two hosts breaks
locking when blocked host gives up early.

Using the Linux 'flock' utility to test GlusterFS NFS file
locking by contending for an exclusive lock on the same file
from two different hosts results in broken file locking for
that file when the client waiting for the lock gives up on a
timeout or its process is killed. No further lock attempt on
that file will succeed from any host without a "gluster volume
stop/start", a glusterd restart, or deletion of the lock file.

This problem is known to occur on glusterfs 3.6.5, and is also
said to occur on glusterfs 3.4.2.

The same test plan works correctly against the native Linux
kernel NFS server; the contended file lock does not become
broken.

No workaround has been found that avoids this problem.
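
For reference, the two-host contention described in the tests
below can be driven from a single script. The following is a
minimal sketch, assuming the same NFS mount point
(/mnt/locktest) on both clients and ssh access from the first
host to the second; host names are the ones used in this
report:

#!/bin/bash
# Reproduction sketch: hold the lock on this host for 10 seconds,
# then contend from the second host with a 3-second timeout.
LOCKFILE=/mnt/locktest/locktest-1

flock -x "$LOCKFILE" -c 'echo "holder: locked"; sleep 10; echo "holder: unlocked"' &
sleep 2
ssh Linux-121 "flock -w 3 -x $LOCKFILE -c 'echo contender: locked'; echo contender: sts=\$?"
wait

# With the bug present, every later lock attempt on the file is
# expected to fail (sts=1) from every host until the volume is
# restarted, glusterd is restarted, or the file is deleted.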

1) Locking from each host one at a time works fine:

[Linux-105 ~]# date; flock -x /mnt/locktest/locktest-1 -c 'date "+locked @ %T";
sleep 10;  date "+unlocked @ %T"'; date "+sts=$? @ %T"
Mon Jan 25 16:59:17 PST 2016
locked @ 16:59:18
unlocked @ 16:59:28
sts=0 @ 16:59:28

[Linux-121]# date; flock -x /mnt/locktest/locktest-1 -c 'date "+locked @ %T";
sleep 10;  date "+unlocked @ %T"'; date "+sts=$? @ %T"
Mon Jan 25 16:59:55 PST 2016
locked @ 16:59:56
unlocked @ 17:00:06
sts=0 @ 17:00:06

2) Locking from both hosts at the same time works fine as long
as the second host is willing to wait indefinitely and is not
interrupted:

[Linux-105]# date; flock -x /mnt/locktest/locktest-1 -c 'date "+locked @ %T";
sleep 10;  date "+unlocked @ %T"'; date "+sts=$? @ %T"
Mon Jan 25 17:03:52 PST 2016
locked @ 17:03:52
unlocked @ 17:04:02
sts=0 @ 17:04:02

[Linux-121]# date; flock -x /mnt/locktest/locktest-1 -c 'date "+locked @ %T";
sleep 10;  date "+unlocked @ %T"'; date "+sts=$? @ %T"
Mon Jan 25 17:03:54 PST 2016
locked @ 17:04:02
unlocked @ 17:04:12
sts=0 @ 17:04:12

3) Locking from both hosts at the same time, where the second
host is not willing to wait and times out, works for the first
host but leaves the lock file in a state where it cannot be
locked by anyone ever again without a gluster volume
stop/start, a glusterd restart, or deleting the lock file:

[Linux-105]# date; flock -w 3 -x /mnt/locktest/locktest-1 -c 'date "+locked @
%T"; sleep 10;  date "+unlocked @ %T"'; date "+sts=$? @ %T"
Mon Jan 25 17:07:59 PST 2016
locked @ 17:07:59
unlocked @ 17:08:09
sts=0 @ 17:08:09

[Linux-121]# date; flock -w 3 -x /mnt/locktest/locktest-1 -c 'date "+locked @
%T"; sleep 10;  date "+unlocked @ %T"'; date "+sts=$? @ %T"
Mon Jan 25 17:08:01 PST 2016
sts=1 @ 17:08:04

[Linux-105]# date; flock -w 3 -x /mnt/locktest/locktest-1 -c 'date "+locked @
%T"; sleep 10;  date "+unlocked @ %T"'; date "+sts=$? @ %T"
Mon Jan 25 17:08:17 PST 2016
sts=1 @ 17:08:20

[Linux-121]# date; flock -w 3 -x /mnt/locktest/locktest-1 -c 'date "+locked @
%T"; sleep 10;  date "+unlocked @ %T"'; date "+sts=$? @ %T"
Mon Jan 25 17:08:27 PST 2016
sts=1 @ 17:08:30

On the GlusterFS server, the brick process is then left with
two file handles stuck open on the lock file, and one lock;
these normally disappear when the lock is released:

[Gluster-186]# date; lsof /exports/nas-segment-0001/locktest/locktest-1
Mon Jan 25 17:10:58 PST 2016
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
glusterfs 31421 root   14w   REG   8,48        0  120
/exports/nas-segment-0001/locktest/locktest-1
glusterfs 31421 root   15w   REG   8,48        0  120
/exports/nas-segment-0001/locktest/locktest-1

[Gluster-186]# fgrep 31421 /proc/locks
2: POSIX  ADVISORY  WRITE 31421 08:04:2329801 0 EOF

[Gluster-186]# ps -elf |grep 31421
/usr/sbin/glusterfsd -s 10.10.60.186 --volfile-id
locktest.10.10.60.186.exports-nas-segment-0001-locktest 

[Linux-105]# rm /mnt/locktest/locktest-1
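
The other recovery actions mentioned above would be run on the
GlusterFS server; a sketch, using the volume name from this
report (the glusterd init-script name may vary by distribution):

# Stop and start the volume (answer the confirmation prompt for 'stop'):
gluster volume stop locktest
gluster volume start locktest

# or restart the management daemon:
# service glusterd restart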

4) Repeating the same test from two hosts, but without the
"-w 3" timeout option and instead killing the second host's
flock command with ^C, fails in the same way.

5) Repeating the same test with two shells on the same host
does *not* exhibit the problem:

[Linux-105]# date; flock -w 3 -x /mnt/locktest/locktest-1 -c 'date "+locked @
%T"; sleep 10;  date "+unlocked @ %T"'; date "+sts=$? @ %T"
Mon Jan 25 17:14:21 PST 2016
locked @ 17:14:21
unlocked @ 17:14:31
sts=0 @ 17:14:31

[Linux-105]# date; flock -w 3 -x /mnt/locktest/locktest-1 -c 'date "+locked @
%T"; sleep 10;  date "+unlocked @ %T"'; date "+sts=$? @ %T"
Mon Jan 25 17:14:23 PST 2016
sts=1 @ 17:14:26

[Linux-105]# date; flock -w 3 -x /mnt/locktest/locktest-1 -c 'date "+locked @
%T"; sleep 10;  date "+unlocked @ %T"'; date "+sts=$? @ %T"
Mon Jan 25 17:14:39 PST 2016
locked @ 17:14:39
unlocked @ 17:14:49
sts=0 @ 17:14:49

[Linux-105]# date; flock -w 3 -x /mnt/locktest/locktest-1 -c 'date "+locked @
%T"; sleep 10;  date "+unlocked @ %T"'; date "+sts=$? @ %T"
Mon Jan 25 17:14:53 PST 2016
locked @ 17:14:53
unlocked @ 17:15:03
sts=0 @ 17:15:03
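
The same-host check can also be scripted as a loop; a minimal
sketch, assuming the mount point used above:

for i in 1 2 3; do
    # Hold the lock for 10 seconds in the background.
    flock -x /mnt/locktest/locktest-1 -c 'sleep 10' &
    sleep 1
    # Contend with a 3-second timeout; expect sts=1 (timed out).
    flock -w 3 -x /mnt/locktest/locktest-1 -c 'true'
    echo "contender attempt $i: sts=$?"
    wait
    # After the holder releases, the lock is taken again without trouble.
    flock -w 3 -x /mnt/locktest/locktest-1 -c 'true'
    echo "post-release attempt $i: sts=$?"
done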

6) Additional information:

[Gluster-186]# glusterd -V
glusterfs 3.6.5 built on Sep  2 2015 12:35:56

[Gluster-186]# gluster volume info locktest
Volume Name: locktest
Type: Distribute
Volume ID: f56cb000-47b8-49db-b885-a8ab50333dd2
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.10.60.186:/exports/nas-segment-0001/locktest
Options Reconfigured:
nfs.rpc-auth-allow: *
server.allow-insecure: on
performance.quick-read: off
performance.stat-prefetch: off
nfs.disable: off
nfs.addr-namelookup: off
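
For reference, the reconfigured options above correspond to
standard "gluster volume set" commands of roughly this form:

gluster volume set locktest nfs.rpc-auth-allow '*'
gluster volume set locktest server.allow-insecure on
gluster volume set locktest performance.quick-read off
gluster volume set locktest performance.stat-prefetch off
gluster volume set locktest nfs.disable off
gluster volume set locktest nfs.addr-namelookup off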

[Gluster-186]# lsmod|egrep 'nfs|lock'

[Gluster-186]# rpcinfo -p
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100005    3   tcp  38465  mountd
    100005    1   tcp  38466  mountd
    100003    3   tcp   2049  nfs
    100021    4   tcp  38468  nlockmgr
    100021    1   udp    703  nlockmgr
    100227    3   tcp   2049  nfs_acl
    100021    1   tcp    705  nlockmgr

[Gluster-186]# netstat -nape |egrep ':38465|:38466|:2049|:38468|:703|:705'
tcp        0      0 0.0.0.0:705                 0.0.0.0:*                  
LISTEN      0          2240964    31474/glusterfs
tcp        0      0 0.0.0.0:2049                0.0.0.0:*                  
LISTEN      0          2240881    31474/glusterfs
tcp        0      0 0.0.0.0:38465               0.0.0.0:*                  
LISTEN      0          2240870    31474/glusterfs
tcp        0      0 0.0.0.0:38466               0.0.0.0:*                  
LISTEN      0          2240873    31474/glusterfs
tcp        0      0 0.0.0.0:38468               0.0.0.0:*                  
LISTEN      0          2240886    31474/glusterfs
udp        0      0 0.0.0.0:703                 0.0.0.0:*                      
        0          2240959    31474/glusterfs

[Linux-105]# date; mount|grep nfs
Mon Jan 25 16:53:14 PST 2016
10.10.60.186:/locktest on /mnt/locktest type nfs
(rw,vers=3,tcp,addr=10.10.60.186)

[Linux-121]# date; mount|grep nfs
Mon Jan 25 16:54:01 PST 2016
10.10.60.186:/locktest on /mnt/locktest type nfs
(rw,vers=3,tcp,addr=10.10.60.186)
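
An equivalent client mount command, matching the options shown
above, would be roughly:

mount -t nfs -o vers=3,tcp 10.10.60.186:/locktest /mnt/locktest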
