[Bugs] [Bug 1369448] New: readdir false-failure with non-Linux

bugzilla at redhat.com bugzilla at redhat.com
Tue Aug 23 12:58:08 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1369448

            Bug ID: 1369448
           Summary: readdir false-failure with non-Linux
           Product: GlusterFS
           Version: 3.7.14
         Component: posix
          Keywords: Triaged
          Severity: medium
          Priority: medium
          Assignee: bugs at gluster.org
          Reporter: hgowtham at redhat.com
                CC: bugs at gluster.org, pkarampu at redhat.com,
                    root at linkage.white-void.net, stdin at niklaas.eu
        Depends On: 1297203
            Blocks: 1369447



+++ This bug was initially created as a clone of Bug #1297203 +++

Description of problem:
With GlusterFS on FreeBSD and VMware ESXi as the NFS client, I encountered the
following error, and accessing storage takes a long time.

[2016-01-10 20:20:41.485157] E [posix.c:4902:posix_fill_readdir] 0-gv0-posix:
seekdir(0x5) failed on dir=0x806813940: Invalid argument (offset reused from
another DIR * structure?)
[2016-01-10 20:20:41.485283] I [server-rpc-fops.c:1882:server_readdirp_cbk]
0-gv0-server: 36451: READDIRP -2 (d3608328-6167-42fe-8ec8-e6cde384e1ab) ==>
(Invalid argument)

How reproducible:
Always.

Steps to Reproduce:
1. Build on FreeBSD 10.1-amd64 environment.
2. Create a single-brick volume on UFS or ZFS pool.
3. Connect from VMware ESXi 5.1 to GlusterFS, as NFS storage.

Actual results:
Got the errors and accessing storage from ESXi takes too long.

Expected results:
No errors logged.

Additional info:
Reading the GlusterFS posix storage code, I found two problems.
First, __posix_fd_ctx_get (in xlators/storage/posix/src/posix-helpers.c) does
not set pfd->dir_eof. Second, posix_fill_readdir does not check whether
pfd->dir_eof is set.

dir_eof was added for the NetBSD port, with bug 1129939 and review
http://review.gluster.org/8926.
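
To make the second point concrete, here is a minimal sketch of the kind of
check I mean. It is not the actual posix xlator code: struct dir_ctx,
dir_ctx_init, dir_ctx_read and dir_ctx_seek are hypothetical names used only
for illustration. The idea is to remember the offset telldir reports at end of
directory, and to treat a seek to that offset as valid even if telldir reports
a different position afterwards.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-fd context; the real posix xlator keeps more state. */
struct dir_ctx {
    DIR  *dir;
    long  dir_eof;   /* offset seen at end of directory, -1 while unknown */
};

/* Start with an unknown EOF offset. */
void dir_ctx_init(struct dir_ctx *ctx, DIR *dir)
{
    assert(ctx != NULL && dir != NULL);
    ctx->dir = dir;
    ctx->dir_eof = -1;
}

/* Read one entry; when the end is reached without error, record the offset. */
struct dirent *dir_ctx_read(struct dir_ctx *ctx)
{
    errno = 0;
    struct dirent *entry = readdir(ctx->dir);
    if (entry == NULL && errno == 0)
        ctx->dir_eof = telldir(ctx->dir);
    return entry;
}

/* Seek to off. Returns 0 on success, -1 with errno = EINVAL when the
 * position after seekdir does not match and off is not the recorded EOF. */
int dir_ctx_seek(struct dir_ctx *ctx, long off)
{
    if (off == 0) {
        rewinddir(ctx->dir);
        return 0;
    }
    seekdir(ctx->dir, off);
    if (telldir(ctx->dir) != off && off != ctx->dir_eof) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```

With this shape, a client that asks to continue from the end-of-directory
offset gets an empty result instead of an "Invalid argument" error.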

--- Additional comment from Pranith Kumar K on 2016-06-15 03:20:25 EDT ---

hi,
     Is this bug still giving you a problem? I do not have any FreeBSD
machines, but if you could help, I will be happy to work with you to fix this
issue correctly for you. Sorry, I was busy with other commitments, so I
couldn't spend time on this issue.

Pranith

--- Additional comment from 2510 on 2016-06-15 08:58:30 EDT ---

Hello,

Yes, it is still a 'problem' for me.


Recently I found a more important problem: glusterfs reuses a value
('cookie') returned by telldir on another DIR (opendir'ed for the same
directory) when accessed through gluster's NFS.
This problem also blocks me from using glusterfs.
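
As far as I understand POSIX, a value returned by telldir is only meaningful
for the DIR stream it came from, so passing it to seekdir on a second stream
is not portable. One way to resume on a fresh stream without any cookie is to
skip entries by count, as sketched below. This is only an illustration, not
the patch I link to and not how glusterfs works; open_at_index and count_from
are hypothetical names, and the approach assumes the entry order is the same
on every opendir and that the directory does not change in between.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/* Open path and skip the first `skip` entries by reading them.
 * Returns the positioned stream, or NULL if opendir fails.
 * No telldir cookie from any other stream is involved. */
DIR *open_at_index(const char *path, long skip)
{
    assert(path != NULL);
    DIR *d = opendir(path);
    if (d == NULL)
        return NULL;
    while (skip-- > 0 && readdir(d) != NULL)
        ;   /* discard entries that were already returned earlier */
    return d;
}

/* Count the entries that remain after skipping `skip` of them.
 * Returns -1 if the directory cannot be opened. */
long count_from(const char *path, long skip)
{
    DIR *d = open_at_index(path, skip);
    long n = 0;
    if (d == NULL)
        return -1;
    while (readdir(d) != NULL)
        n++;
    closedir(d);
    return n;
}
```

Skipping by count costs a full rescan for every continuation, so it is slower
than seekdir, but it does not depend on how the filesystem encodes offsets.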

I'm working on these bugs on GitHub. Here is a patch that fixes the problem
above, but it is only for FreeBSD+UFS, and it does not fix the root problem.

https://github.com/2510/glusterfs-freebsd/blob/develop/patches/glusterfs-3.6.8.patch3

--- Additional comment from Pranith Kumar K on 2016-06-15 10:16:16 EDT ---

Would it be possible for you to work with me to find the root cause and a fix
that works for you as well? Do you think you can share some test machines so
that we can work together on those machines to fix this? I mainly develop on
Linux. Do you think it is possible to recreate this bug on a VM with FreeBSD?

Pranith

--- Additional comment from 2510 on 2016-06-15 10:31:38 EDT ---

Yes, it is reproducible on a VM, but we need a test NFS client that sends
readdirs.
Please wait for a while. I'll prepare the machine.

--- Additional comment from Pranith Kumar K on 2016-06-15 10:36:09 EDT ---

Hey,
    I work in India TimeZone. It is 8PM here and I have some personal work now.
Do you mind if we catch up on this tomorrow? Please let me know your timezone
as well so that we can find a time that works for both of us.

Pranith

--- Additional comment from 2510 on 2016-06-15 10:59:30 EDT ---

Hello.

I am in the Asia/Tokyo (Japan) timezone (UTC+9), and I am working on this
problem personally (not for a corporation or business).

And, okay, preparing a dev VM will take some more time.
Can you send me SSH public key so I can set up an account for you?

Thanks,
2510

--- Additional comment from 2510 on 2016-06-16 08:45:58 EDT ---

A dev VM is ready.

--- Additional comment from Niklaas Baudet von Gersdorff on 2016-07-18 03:06:07 EDT ---

I run into the same and other problems when using gluster on FreeBSD. At some
point accessing the gluster becomes very slow and my systems slow down
tremendously. Have a look at this log: http://sprunge.us/MMhM

In contrast to the issue mentioned above, I disabled NFS because it didn't
work on FreeBSD either. I already filed a bug report on FreeBSD's bugzilla:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209752

--- Additional comment from Pranith Kumar K on 2016-07-18 06:31:52 EDT ---

(In reply to 2510 from comment #7)
> A dev VM is ready.

I can come online, sorry I missed this email. And because of the update on
comment-8 I came to know of this again. Please let me know.

--- Additional comment from 2510 on 2016-07-24 05:04:10 EDT ---

(In reply to Pranith Kumar K from comment #9)
> I can come online, sorry I missed this email. And because of the update on
> comment-8 I came to know of this again. Please let me know.

Can you tell me your SSH public key?
I'll create your account.

--- Additional comment from Pranith Kumar K on 2016-07-25 21:57:26 EDT ---

ssh-rsa
AAAAB3NzaC1yc2EAAAADAQABAAABAQDkduuGBq++zm/JKYVUcfM6YOqzYp2Dj0ag3OvlkFTXyNZ1QVOoEWuH9MAeF/MlHd14nLvFKSdpI+qr+faY+Wtyt/Za09YnizyMBuEo9hIw307EwynOdfAO8N/PKLAvtsNQ7Xk3UHUfHrvVuJr5qZFs1sWNau67/DBxd3bUO/FUl3FZoZqWg3/qsG8ZTCVEPc4N0qY9xiDFxgDh81lmK8t24S8d9RfMrKtpPbSe75HW1CxqM6AGLpQtDscIydGqmRYYcYSn9box4T3erbVxNpcpSlk6K1akMJhbuNoEbDfD7n4t8X/BLj/h3gJIUTlrXnpPj+hluiHDmeBlhu7a7ctd
pk at dhcp35-190.lab.eng.blr.redhat.com

--- Additional comment from 2510 on 2016-07-27 09:21:50 EDT ---

Okay, I created your account and VM is ready for dev.

------------------------------------------------------------------------

* ssh pk at 153.126.154.119 with your key.

* Here's my work. You may copy to your home directory.

/home/2510/glusterfs-3.6.8-orig
  - Original glusterfs-3.6.8, without any patches.
/home/2510/glusterfs-3.6.8-patched
  - with some patches, from /home/2510/glusterfs-freebsd/patches/*

* You can use sudo (for mount/umount, install/start/stop glusterfs)
* To mount or umount NFS directory:

$ sudo mount -t nfs 10.0.0.1:/testvol /mnt/testvol
$ sudo umount /mnt/testvol

* To start/stop glusterfs service:

$ sudo service glusterfsd start
$ sudo killall glusterd glusterfsd

* This VM is built for this ticket.
  I do not use for other purposes.

* Device for a brick is /dev/vtbd0p4, and mounted on /mnt/brick.

------------------------------------------------------------------------

Currently, I suspect that glusterfs NFS reuses a cookie returned by telldir
across different DIR streams.
(This means the subject I gave this ticket, the false failure, is wrong.)

This can be tested by following:

1) Install glusterfs and run.
2) Create a volume (with a brick), then mount gluster volume via nfs.

# mkdir /mnt/testvol
# mount -t nfs 10.0.0.1:/testvol /mnt/testvol

3) Create many files on it.

# sudo sh -c 'for i in `seq 0 1 200`; do touch /mnt/testvol/$i; done'

4) List directory entries.

# ls /mnt/testvol

With original glusterfs, only 51 entries are listed. (entries numbered 48 to
200 are missing)
With patched glusterfs, all entries are listed.
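
Steps 3 and 4 can also be checked from a small program instead of reading the
ls output by eye. The sketch below is only a test aid; create_numbered and
count_missing are hypothetical helper names, and the directory passed in is
assumed to be wherever the volume is mounted.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Create empty files named "0".."last" under path (like step 3).
 * Returns 0 on success, -1 on the first failure. */
int create_numbered(const char *path, long last)
{
    char name[4096];
    for (long i = 0; i <= last; i++) {
        int len = snprintf(name, sizeof name, "%s/%ld", path, i);
        if (len < 0 || (size_t)len >= sizeof name)
            return -1;
        FILE *f = fopen(name, "a");   /* "a" creates without truncating */
        if (f == NULL)
            return -1;
        fclose(f);
    }
    return 0;
}

/* Count how many of the names "0".."last" readdir does NOT return
 * for path (like step 4). Returns -1 if the directory cannot be
 * opened or memory cannot be allocated. */
long count_missing(const char *path, long last)
{
    assert(path != NULL && last >= 0);
    unsigned char *seen = calloc((size_t)last + 1, 1);
    if (seen == NULL)
        return -1;
    DIR *d = opendir(path);
    if (d == NULL) {
        free(seen);
        return -1;
    }
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        char *end;
        long n = strtol(entry->d_name, &end, 10);
        /* Only names that parse fully as a number in range count. */
        if (end != entry->d_name && *end == '\0' && n >= 0 && n <= last)
            seen[n] = 1;
    }
    closedir(d);
    long missing = 0;
    for (long i = 0; i <= last; i++)
        if (!seen[i])
            missing++;
    free(seen);
    return missing;
}
```

After create_numbered("/mnt/testvol", 200), a result of 0 from
count_missing("/mnt/testvol", 200) would mean every entry was returned; any
other value is the number of entries the listing lost.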

--- Additional comment from Pranith Kumar K on 2016-07-29 14:18:58 EDT ---

(In reply to 2510 from comment #12)
> Okay, I created your account and VM is ready for dev.

Thanks for this VM. I may have to show this problem to some of our NFS devs
too. Please keep the VM until this bug is completely fixed. I will let you know
once we are done with finding the root cause.


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1297203
[Bug 1297203] readdir false-failure with non-Linux
https://bugzilla.redhat.com/show_bug.cgi?id=1369447
[Bug 1369447] readdir false-failure with non-Linux