[Gluster-devel] AFR+locks bug?
SZÉKELYI Szabolcs
cc at avaxio.hu
Mon Mar 17 19:19:51 UTC 2008
Hi Anand,
Anand Avati wrote:
> Szabolcs,
> do you still face this issue?
Sorry for the very long delay in replying.
I've seen that the posix-locks translator received heavy development
and underwent a major redesign a couple of months ago, so I tried the
latest TLA patchset, but no luck.
However, the situation is somewhat different now. On the master locker
(server), I get:
> lusta1:~# locktests -n 10 -f /mnt/glusterfs/locktest-alone.vol -c 3
> Init
> process initalization
> ....................
> --------------------------------------
>
> TEST : TRY TO WRITE ON A READ LOCK:==========
> TEST : TRY TO WRITE ON A WRITE LOCK:==========
> TEST : TRY TO READ ON A READ LOCK:==========
> TEST : TRY TO READ ON A WRITE LOCK:==========
> TEST : TRY TO SET A READ LOCK ON A READ LOCK:===xxxxx=x
And the mount point locks up on all machines.
FUSE was not upgraded this time; it's still 2.7.0-glfs7.
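The test it hangs on ("TRY TO SET A READ LOCK ON A READ LOCK") boils
down to two processes taking compatible read locks on the same file.
For clarity, here is a minimal sketch of that fcntl(2) pattern (my own
reconstruction, not locktests' actual code; the path is just the one
from the run above):

/* Two processes take a shared (read) lock on the same file.
 * Read locks are compatible, so both F_SETLKW calls should
 * return immediately; on the setup above the second one hangs. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/glusterfs/locktest-alone.vol";
    struct flock fl = {
        .l_type   = F_RDLCK,   /* shared (read) lock */
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,         /* lock the whole file */
    };

    int fd = open(path, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }
    if (fcntl(fd, F_SETLKW, &fl) < 0) { perror("first F_RDLCK"); return 1; }

    if (fork() == 0) {
        /* Child: POSIX locks are per-process, so this is a second,
         * independent read lock on the already read-locked region. */
        int fd2 = open(path, O_RDWR);
        if (fd2 < 0 || fcntl(fd2, F_SETLKW, &fl) < 0)
            perror("second F_RDLCK");
        _exit(0);              /* never reached if F_SETLKW hangs */
    }
    wait(NULL);                /* parent blocks with the child */
    close(fd);
    return 0;
}

On a correct setup both F_SETLKW calls return right away, since read
locks don't conflict; here the second one never returns.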
Thanks,
--
Szabolcs
> 2008/1/19, Székelyi Szabolcs <cc at avaxio.hu>:
>
> Anand Avati wrote:
> > Szabolcs,
> > I suspect it might be an issue with 2.7.2-glfs8. We are seeing
> > similar issues with the 2.7.2 fuse. Please let us know if 2.7.0
> > works well for you.
>
> Well, with fuse-2.7.0-glfs7, the same happens.
>
> --
> Szabolcs
>
> > 2008/1/17, Székelyi Szabolcs <cc at avaxio.hu>:
> >
> > AFR with posix-locks behaves really strangely nowadays... GlusterFS
> > is a fresh TLA checkout (patch-636), and FUSE is the brand-new
> > 2.7.2-glfs8.
> >
> > I have 4 servers with a 4-way AFR on each and features/posix-locks
> > loaded just above the storage/posix bricks. On each AFR, one replica
> > is the local storage; the remaining 3 are on the other 3 servers.
> >
> > The 4 AFR bricks are mounted on each server from 'localhost'.
> >
> > The machines are freshly booted. Basic FS functions (ls, copy, cat)
> > work fine.
> >
> > Now I run a distributed locking test using [1]. On the "master"
> > locker I get:
> >
> > > # /tmp/locktests -n 10 -c 3 -f /mnt/glusterfs/testfile
> > > Init
> > > process initalization
> > > ....................
> > > --------------------------------------
> > >
> > > TEST : TRY TO WRITE ON A READ LOCK:==========
> > > TEST : TRY TO WRITE ON A WRITE LOCK:==========
> > > TEST : TRY TO READ ON A READ LOCK:==========
> > > TEST : TRY TO READ ON A WRITE LOCK:==========
> > > TEST : TRY TO SET A READ LOCK ON A READ LOCK:
> >
> > After about 5 minutes, another
> >
> > > RDONLY: fcntl: Transport endpoint is not connected
> >
> > appears, the locking processes exit on all slave servers, and the
> > master blocks.
> >
> > The mount point locks up. Even an `ls` from a different terminal
> > seems to block forever.
> >
> > You can find my server config below. Client configs are simple:
> > just a protocol/client brick from localhost. I can provide server
> > debug logs if you need them.
> >
> > Any idea?
> >
> > Thanks,
> > --
> > Szabolcs
> >
> >
> > [1] http://nfsv4.bullopensource.org/tools/tests_tools/locktests-net.tar.gz
> >
> >
> > My server config (from a single node, lu1):
> >
> > volume data-posix
> >   type storage/posix
> >   option directory /srv/glusterfs
> > end-volume
> >
> > volume data1
> >   type features/posix-locks
> >   subvolumes data-posix
> > end-volume
> >
> > volume data2
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host lu2
> >   option remote-subvolume data2
> > end-volume
> >
> > volume data3
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host lu3
> >   option remote-subvolume data3
> > end-volume
> >
> > volume data4
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host lu4
> >   option remote-subvolume data4
> > end-volume
> >
> > volume data-afr
> >   type cluster/afr
> >   subvolumes data1 data2 data3 data4
> > end-volume
> >
> > volume server
> >   type protocol/server
> >   subvolumes data1 data-afr
> >   option transport-type tcp/server
> >   option auth.ip.data1.allow 10.0.0.*
> >   option auth.ip.data-afr.allow 127.0.0.1,10.0.0.*
> > end-volume
>
>
> --
> If I traveled to the end of the rainbow
> As Dame Fortune did intend,
> Murphy would be there to tell me
> The pot's at the other end.
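
P.S. For reference, the client config mentioned in the quoted report
boils down to something like this (a sketch; the volume name is my
choice, and the remote-subvolume follows from the server config above,
which exports data-afr to 127.0.0.1):

volume client
  type protocol/client
  option transport-type tcp/client
  option remote-host localhost
  option remote-subvolume data-afr
end-volume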