[Gluster-devel] Parallel readdir from NFS clients causes incorrect data
Anand Avati
anand.avati at gmail.com
Wed Apr 3 23:35:32 UTC 2013
Hmm, I would be tempted to suggest that you were bitten by the gluster/ext4
readdir d_off incompatibility issue (which was recently fixed in
http://review.gluster.org/4711/). But you say it works fine when you do the
ls one at a time, sequentially.
After reading your email I just realized that, because glusterfs uses the
same anonymous fd for readdir queries from multiple clients/applications, we
have a race in the posix translator: two threads can push/pull the same
backend directory cursor in a chaotic way, resulting in duplicate or lost
entries. This might be the issue you are seeing - just guessing at this point.
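
To illustrate the kind of race I mean (this is only a sketch of the failure
mode, not the actual posix translator code): think of two threads sharing a
single DIR * for the same backend directory, each saving and restoring its
own position with telldir()/seekdir() around every readdir(). With nothing
serializing the seek+read sequence, one thread's seek clobbers the other's,
and entries come back duplicated or missing:

/* racy_readdir.c - two threads share one directory cursor, each trying
 * to keep its own offset with telldir()/seekdir().  The deliberately
 * unsynchronized, interleaved seeks move the shared cursor out from
 * under each other, producing duplicate and missing entries -- the same
 * pattern as two readdir requests landing on the same anonymous fd.
 * Build: cc -pthread racy_readdir.c */

#include <dirent.h>
#include <pthread.h>
#include <stdio.h>

static DIR *shared_dir;                  /* one shared backend cursor */

static void *reader(void *name)
{
    struct dirent *entry;
    long off = -1;                       /* this thread's saved offset */

    for (;;) {
        if (off != -1)
            seekdir(shared_dir, off);    /* restore "my" position...    */
        entry = readdir(shared_dir);     /* ...but the other thread may
                                            have moved the cursor here  */
        if (entry == NULL)
            break;
        off = telldir(shared_dir);       /* remember "my" new position  */
        printf("[%s] %s\n", (const char *)name, entry->d_name);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    shared_dir = opendir(".");           /* any directory with many entries */
    if (shared_dir == NULL)
        return 1;

    pthread_create(&t1, NULL, reader, "thread-A");
    pthread_create(&t2, NULL, reader, "thread-B");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    closedir(shared_dir);
    return 0;
}

(The actual translator code is more involved, of course - this just shows why
a shared cursor plus concurrent readdirs gives exactly the duplicate/missing
pattern you describe.)
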
Would you be willing to try out a source code patch on top of git HEAD,
rebuild your glusterfs, and verify whether it fixes the issue? I would really
appreciate it!
Thanks,
Avati
On Wed, Apr 3, 2013 at 2:37 PM, Michael Brown <michael at netdirect.ca> wrote:
> I'm seeing a problem on my fairly fresh RHEL gluster install. Smells to
> me like a parallelism problem on the server.
>
> If I mount a gluster volume via NFS (using glusterd's internal NFS server,
> not nfs-kernel-server) and read a directory from multiple clients *in
> parallel*, I get inconsistent results across the clients. Some files are
> missing from the directory listing, and some may be present twice!
>
> Exactly which files (or directories!) are missing/duplicated varies each
> time. But I can very consistently reproduce the behaviour.
>
> You can see a screenshot here: http://imgur.com/JU8AFrt
>
> The reproduction steps are:
> * clusterssh to each NFS client
> * unmount /gv0 (to clear cache)
> * mount /gv0 [1]
> * ls -al /gv0/common/apache-jmeter-2.9/bin (which is where I first
> noticed this)
>
> Here's the rub: if, instead of doing the 'ls' in parallel, I do it in
> series, it works just fine (consistent, correct results everywhere). But
> hitting the gluster server from multiple clients *at the same time* causes
> problems.
>
> I can still stat() and open() the files that are missing from the directory
> listing; they just don't show up in the enumeration.
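>
> (As a quick standalone check - just a sketch, with the directory and file
> name below taken as examples from the case above - readdir() of the
> directory misses the name while stat() of the full path still succeeds:)
>
> /* check_missing.c - enumerate a directory via readdir() and also
>  * stat() a name we know exists in it.  On an affected NFS client the
>  * name can be absent from the listing while the stat() still works.
>  * Build: cc check_missing.c */
>
> #include <dirent.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/stat.h>
>
> int main(void)
> {
>     const char *dir  = "/gv0/common/apache-jmeter-2.9/bin";
>     const char *name = "jmeter.sh";             /* example file name */
>     char path[4096];
>     struct dirent *entry;
>     struct stat st;
>     int listed = 0;
>
>     DIR *d = opendir(dir);
>     if (d == NULL)
>         return 1;
>     while ((entry = readdir(d)) != NULL)
>         if (strcmp(entry->d_name, name) == 0)
>             listed = 1;                         /* name showed up */
>     closedir(d);
>
>     snprintf(path, sizeof(path), "%s/%s", dir, name);
>     printf("readdir listed it: %s   stat(): %s\n",
>            listed ? "yes" : "NO",
>            stat(path, &st) == 0 ? "ok" : "failed");
>     return 0;
> }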
>
> Mounting gv0 as a gluster client filesystem works just fine.
>
> Details of my setup:
> 2 × gluster servers: 2×E5-2670, 128GB RAM, RHEL 6.4 64-bit,
> glusterfs-server-3.3.1-1.el6.x86_64 (from EPEL)
> 4 × NFS clients: 2×E5-2660, 128GB RAM, RHEL 5.7 64-bit,
> glusterfs-3.3.1-11.el5 (from kkeithley's repo, only used for testing)
> gv0 volume information is below
> bricks are 400GB SSDs with ext4[2]
> common network is 10GbE, replication between servers happens over direct
> 10GbE link.
>
> I will be testing on xfs/btrfs/zfs eventually, but for now I'm on ext4.
>
> Also attached is my chatlog from asking about this in #gluster
>
> [1]: fstab line is: fearless1:/gv0 /gv0 nfs
> defaults,sync,tcp,wsize=8192,rsize=8192 0 0
> [2]: yes, I've turned off dir_index to avoid That Bug. I've run the d_off
> test; the results are here: http://pastebin.com/zQt5gZnZ
>
> ----
> gluster> volume info gv0
>
> Volume Name: gv0
> Type: Distributed-Replicate
> Volume ID: 20117b48-7f88-4f16-9490-a0349afacf71
> Status: Started
> Number of Bricks: 8 x 2 = 16
> Transport-type: tcp
> Bricks:
> Brick1: fearless1:/export/bricks/500117310007a6d8/glusterdata
> Brick2: fearless2:/export/bricks/500117310007a674/glusterdata
> Brick3: fearless1:/export/bricks/500117310007a714/glusterdata
> Brick4: fearless2:/export/bricks/500117310007a684/glusterdata
> Brick5: fearless1:/export/bricks/500117310007a7dc/glusterdata
> Brick6: fearless2:/export/bricks/500117310007a694/glusterdata
> Brick7: fearless1:/export/bricks/500117310007a7e4/glusterdata
> Brick8: fearless2:/export/bricks/500117310007a720/glusterdata
> Brick9: fearless1:/export/bricks/500117310007a7ec/glusterdata
> Brick10: fearless2:/export/bricks/500117310007a74c/glusterdata
> Brick11: fearless1:/export/bricks/500117310007a838/glusterdata
> Brick12: fearless2:/export/bricks/500117310007a814/glusterdata
> Brick13: fearless1:/export/bricks/500117310007a850/glusterdata
> Brick14: fearless2:/export/bricks/500117310007a84c/glusterdata
> Brick15: fearless1:/export/bricks/500117310007a858/glusterdata
> Brick16: fearless2:/export/bricks/500117310007a8f8/glusterdata
> Options Reconfigured:
> diagnostics.count-fop-hits: on
> diagnostics.latency-measurement: on
> nfs.disable: off
> ----
>
> --
> Michael Brown | `One of the main causes of the fall of
> Systems Consultant | the Roman Empire was that, lacking zero,
> Net Direct Inc. | they had no way to indicate successful
> ☎: +1 519 883 1172 x5106 | termination of their C programs.' - Firth
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>
>