[Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory

John Brunelle john_brunelle at harvard.edu
Fri Jun 14 17:04:48 UTC 2013

Thanks, Jeff!  I ran readdir.c on all 23 bricks on the gluster NFS
server to which my test clients are connected (one client that's
working and one that's not; I ran it on those two clients as well).
The results are attached.
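
In case anyone wants to reproduce the check without the attachment: my
understanding of what readdir.c does (this is my own sketch, not Jeff's
actual code) is simply to walk one brick directory with readdir() and
print each entry's d_off followed by its name, which would make the
first column in readdir.out.* the d_off value:

    /* sketch of a readdir.c-style per-brick diagnostic (my
     * reconstruction, not the original): print d_off and name */
    #define _FILE_OFFSET_BITS 64  /* 64-bit d_off even on 32-bit builds */
    #include <dirent.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <brick-directory>\n", argv[0]);
            return 1;
        }
        DIR *dir = opendir(argv[1]);
        if (!dir) {
            perror("opendir");
            return 1;
        }
        struct dirent *ent;
        while ((ent = readdir(dir)) != NULL)
            printf("%lld %s\n", (long long)ent->d_off, ent->d_name);
        closedir(dir);
        return 0;
    }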

The values it prints are all well within 32 bits, *except* for one
that is, suspiciously, exactly the max 32-bit signed int (2147483647,
i.e. 0x7FFFFFFF):

$ cat readdir.out.* | awk '{print $1}' | sort | uniq | tail

That outlier is the same subdirectory on all 23 bricks.  Could this be
the issue?
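
For context on why that particular number jumped out: it sits exactly
at the signed 32-bit boundary, so any offset even one past it no longer
fits in a signed 32-bit cookie.  A throwaway illustration (nothing to
do with readdir.c itself; the first value below is made up):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* 1734437532 is a made-up "typical" d_off for comparison;
         * 2147483647 is the outlier seen on all 23 bricks;
         * 2147483648 is one past the signed 32-bit limit */
        uint64_t offs[] = { 1734437532ULL, 2147483647ULL, 2147483648ULL };
        for (int i = 0; i < 3; i++)
            printf("%" PRIu64 " fits in a signed 32-bit cookie? %s\n",
                   offs[i], offs[i] <= (uint64_t)INT32_MAX ? "yes" : "no");
        return 0;
    }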



On Fri, Jun 14, 2013 at 11:05 AM, John Brunelle
<john_brunelle at harvard.edu> wrote:
> Thanks for the reply, Vijay.  I set that parameter to "On", but it hasn't
> helped; in fact things seem a bit worse.  After making the change on
> the volume and dropping caches on some test clients, some clients now
> see no subdirectories at all.  In my earlier tests, clients went back to
> seeing all the subdirectories after dropping caches, and entries only
> started disappearing again after a while (the count had never dropped
> to zero before).
> Any other ideas?
> Thanks,
> John
> On Fri, Jun 14, 2013 at 10:35 AM, Vijay Bellur <vbellur at redhat.com> wrote:
>> On 06/13/2013 03:38 PM, John Brunelle wrote:
>>> Hello,
>>> We're having an issue with our distributed gluster filesystem:
>>> * gluster 3.3.1 servers and clients
>>> * distributed volume -- 69 bricks (4.6T each) split evenly across 3 nodes
>>> * xfs backend
>>> * nfs clients
>>> * nfs.enable-ino32: On
>>> * servers: CentOS 6.3, 2.6.32-279.14.1.el6.centos.plus.x86_64
>>> * clients: CentOS 5.7, 2.6.18-274.12.1.el5
>>> We have a directory containing 3,343 subdirectories.  On some clients,
>>> ls lists only a subset of the subdirectories (a different number on
>>> different clients).  On others, ls gets stuck in a getdents loop and
>>> consumes more and more memory until it hits ENOMEM.  On yet others, it
>>> works fine.  Having the bad clients remount or drop caches makes the
>>> problem temporarily go away, but eventually it comes back.  The issue
>>> sounds a lot like bug #838784, but we are using xfs on the backend,
>>> and this seems like more of a client issue.
>> Turning on "cluster.readdir-optimize" can help readdir when a directory
>> contains a large number of sub-directories and the volume has many
>> bricks. Do you observe any change with this option enabled?
>> -Vijay
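
P.S. Re: the getdents loop from my original report above, for anyone
trying to reproduce it: this is roughly the loop that ls (via glibc's
readdir) ends up in.  It's my own toy program modeled on the
getdents(2) man page, not gluster or coreutils code.  The kernel
resumes each batch from the d_off of the last entry returned, so if
that offset ever aliases an earlier position in the directory, the loop
keeps re-reading the same entries; anything accumulating the listing in
memory then grows until it hits ENOMEM, which matches what we see:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* layout used by the getdents64 syscall (per the getdents(2) man page) */
    struct linux_dirent64 {
        unsigned long long d_ino;
        long long          d_off;    /* cookie the next batch resumes from */
        unsigned short     d_reclen;
        unsigned char      d_type;
        char               d_name[];
    };

    int main(int argc, char **argv)
    {
        char buf[32768];
        long total = 0;
        int fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        for (;;) {
            long n = syscall(SYS_getdents64, fd, buf, sizeof(buf));
            if (n < 0) {
                perror("getdents64");
                return 1;
            }
            if (n == 0)       /* a healthy directory eventually returns 0 */
                break;
            for (long off = 0; off < n; ) {
                struct linux_dirent64 *d =
                    (struct linux_dirent64 *)(buf + off);
                printf("%lld\t%s\n", d->d_off, d->d_name);
                off += d->d_reclen;
                total++;
            }
        }
        fprintf(stderr, "%ld entries\n", total);
        close(fd);
        return 0;
    }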
-------------- next part --------------
A non-text attachment was scrubbed...
Name: readdir_output.tar.bz2
Type: application/x-bzip2
Size: 327378 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20130614/f91c9ec0/attachment.bz2>
