[Gluster-users] client coherence problem with locks and truncate

Anand Avati avati at gluster.com
Sat Sep 5 10:45:16 UTC 2009


Can you try your tests after mounting with the --attribute-timeout=0
command-line parameter?
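
For example, with a client volfile at /etc/glusterfs/client.vol mounted
on /mnt/glusterfs (both paths here are placeholders; adjust to your
setup):

  glusterfs --attribute-timeout=0 -f /etc/glusterfs/client.vol /mnt/glusterfs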

Avati

On Fri, Sep 4, 2009 at 12:01 PM, Robert L. Millner<rmillner at webappvm.com> wrote:
> Hi,
>
> We're observing a coherence issue with GlusterFS 2.0.6.  One client
> opens a file, locks, truncates and writes.  Another client waiting on a
> read lock may see a zero-length file after the read lock is granted.
>
> If both nodes read/write in a loop, this tends to happen within a few
> hundred tries.  The same code runs for 10000 loops without a problem if
> both programs run on the same node, whether on GlusterFS or on a local
> ext3 file system.
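> 
> For reference, each pass reduces to roughly the following (a condensed
> C sketch reconstructed from the straces below; error handling and the
> loop driver are omitted):
> 
> #include <fcntl.h>
> #include <sys/select.h>
> #include <sys/stat.h>
> #include <unistd.h>
> 
> static char buf[900];    /* payload; the test writes a "0123..." pattern */
> 
> /* Writer pass (node1): write-lock the whole file, truncate, rewrite. */
> static void writer_pass(void)
> {
>     int fd = open("testfile", O_RDWR | O_CREAT, 0644);
>     struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
>                         .l_start = 0, .l_len = 0 }; /* len 0 = whole file */
>     fcntl(fd, F_SETLKW, &fl);           /* blocks until the lock is granted */
>     struct timeval tv = { 0, 0 };       /* tunable delay, discussed below */
>     select(0, NULL, NULL, NULL, &tv);
>     lseek(fd, 0, SEEK_SET);
>     ftruncate(fd, 0);
>     write(fd, buf, sizeof buf);
>     close(fd);                          /* close releases the lock */
> }
> 
> /* Reader pass (node2): read-lock, stat, read. */
> static void reader_pass(void)
> {
>     char out[4096];
>     struct stat st;
>     int fd = open("testfile", O_RDONLY | O_CREAT, 0644);
>     struct flock fl = { .l_type = F_RDLCK, .l_whence = SEEK_SET,
>                         .l_start = 0, .l_len = 0 };
>     fcntl(fd, F_SETLKW, &fl);           /* waits for the WRLCK to clear */
>     fstat(fd, &st);                     /* st_size sometimes comes back 0 */
>     lseek(fd, 0, SEEK_SET);
>     read(fd, out, sizeof out);          /* and read() then returns 0 bytes */
>     close(fd);
> }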
>
> Node1 does the following (strace):
>
> 2206  1252031615.509555 open("testfile", O_RDWR|O_CREAT|O_LARGEFILE, 0644) = 3
> 2206  1252031615.514886 fcntl64(3, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}, 0xbfcaee78) = 0
> 2206  1252031615.517742 select(0, NULL, NULL, NULL, {0, 0}) = 0 (Timeout)
> 2206  1252031615.517788 _llseek(3, 0, [0], SEEK_SET) = 0
> 2206  1252031615.517829 ftruncate64(3, 0) = 0
> 2206  1252031615.520632 write(3, "01234567890123456789012345678901"..., 900) = 900
> 2206  1252031615.599782 close(3)        = 0
>
> 2206  1252031615.604731 open("testfile", O_RDONLY|O_CREAT|O_LARGEFILE, 0644) = 3
> 2206  1252031615.615158 fcntl64(3, F_SETLKW64, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}, 0xbfcaee78) = 0
> 2206  1252031615.624680 fstat64(3, {st_dev=makedev(0, 13), st_ino=182932, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=900, st_atime=2009/09/03-19:33:35, st_mtime=2009/09/03-19:33:35, st_ctime=2009/09/03-19:33:35}) = 0
> 2206  1252031615.624787 _llseek(3, 0, [0], SEEK_SET) = 0
> 2206  1252031615.624851 read(3, "01234567890123456789012345678901"..., 4096) = 900
> 2206  1252031615.625126 close(3)        = 0
>
>
> Node2 does the following (strace):
>
> 2126  1252031615.504350 open("testfile", O_RDONLY|O_CREAT|O_LARGEFILE, 0644) = 3
> 2126  1252031615.509004 fcntl64(3, F_SETLKW64, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}, 0xbfc05dc8) = 0
> 2126  1252031615.587697 fstat64(3, {st_dev=makedev(0, 13), st_ino=182932, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=0, st_atime=2009/09/03-19:33:35, st_mtime=2009/09/03-19:33:35, st_ctime=2009/09/03-19:33:35}) = 0
> 2126  1252031615.588027 _llseek(3, 0, [0], SEEK_SET) = 0
> 2126  1252031615.588089 read(3, "", 4096) = 0
> 2126  1252031615.588228 close(3)        = 0
>
>
>
> Both node clocks are NTP-disciplined.  As these are virtual machines,
> there's a higher dispersion, but I believe you can round to the nearest
> 0.1s when correlating the timestamps.
>
> Node2 waits for the write lock to clear before getting its read lock.
> Node1 then reads the file back itself, and its fstat agrees with
> node2's on every field except st_size (and st_blocks).  Node2's read,
> however, returns no data.
>
> This is on 32-bit CentOS 5 with a 2.6.27 kernel and fuse 2.7.4, on
> VMware.  Also observed on Amazon EC2 with their 2.6.21 fc8xen kernel.
>
> I can make the problem unrepeatable in 10000 tries by changing the
> select on Node1 to time out after 0.1 seconds.  The problem still
> repeats in under 5000 tries if the select timeout is 0.01 seconds.
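> 
> That delay is the select() visible in the node1 trace between F_SETLKW
> and ftruncate; the two timeout values above correspond to:
> 
> struct timeval tv = { 0, 100000 };      /* 0.1 s: no longer reproducible */
> /* struct timeval tv = { 0, 10000 }; */ /* 0.01 s: still reproduces */
> select(0, NULL, NULL, NULL, &tv);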
>
> This happens whether or not glusterfs is run with
> --disable-direct-io-mode.
>
> The volume is mirrored between four servers.  Below is the server
> configuration.  The export directory is on ext3.
>
> volume posix
>  type storage/posix
>  option directory /var/data/export
> end-volume
>
> volume locks
>  type features/locks
>  option mandatory-locks on
>  subvolumes posix
> end-volume
>
> volume brick
>  type performance/io-threads
>  option thread-count 8
>  subvolumes locks
> end-volume
>
> volume server
>  type protocol/server
>  option transport-type tcp
>  option auth.addr.brick.allow *
>  subvolumes brick
> end-volume
>
>
> And the client configuration:
>
> volume remote1
>  type protocol/client
>  option transport-type tcp
>  option remote-host 10.10.10.145
>  option remote-subvolume brick
> end-volume
>
> volume remote2
>  type protocol/client
>  option transport-type tcp
>  option remote-host 10.10.10.130
>  option remote-subvolume brick
> end-volume
>
> volume remote3
>  type protocol/client
>  option transport-type tcp
>  option remote-host 10.10.10.221
>  option remote-subvolume brick
> end-volume
>
> volume remote4
>  type protocol/client
>  option transport-type tcp
>  option remote-host 10.10.10.104
>  option remote-subvolume brick
> end-volume
>
> volume replicated
>  type cluster/replicate
>  subvolumes remote1 remote2 remote3 remote4
> end-volume
>
> volume writebehind
>    type performance/write-behind
>    subvolumes replicated
> end-volume
>
> volume cache
>    type performance/io-cache
>    subvolumes writebehind
> end-volume
>
>
>
> The problem persists with the configurations above, and also if any or
> all of the following tweaks are made (the reduced client volfile for
> tweaks 1 and 3 is sketched after the list):
>
> 1. Remove the replicated volume and just use remote1.
> 2. Get rid of threads on the server.
> 3. Get rid of io-cache and writebehind on the clients.
> 4. Use mandatory locking on the test file.
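> 
> With tweaks 1 and 3 applied, the client volfile reduces to the single
> protocol/client volume:
> 
> volume remote1
>  type protocol/client
>  option transport-type tcp
>  option remote-host 10.10.10.145
>  option remote-subvolume brick
> end-volume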
>
>
> Please let me know if any more information is needed to debug this
> further, or if there is any guidance on how to avoid it.
>
> Thank you!
>
>    Cheers,
>    Rob
>
>


