[Gluster-devel] Performance problem with XFS
Xavier Hernandez
xhernandez at datalab.es
Tue Mar 26 18:01:16 UTC 2013
Hi,
Since one of the possible improvements seemed to be reducing the
number of directories inside .glusterfs, I've made a modification to
storage/posix so that, instead of creating 2 levels of 256 directories
each, it creates 4 levels of 16 directories each.
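
To illustrate the layout change, this is roughly how a gfid would map
to a path under each scheme (a sketch, assuming one hex character per
level in the modified layout; the gfid below is hypothetical):

    gfid="abcdef12-3456-7890-abcd-ef1234567890"  # hypothetical gfid

    # Default layout: 2 levels of 256 directories (2 hex chars each)
    echo ".glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid}"
    # -> .glusterfs/ab/cd/abcdef12-3456-7890-abcd-ef1234567890

    # Modified layout: 4 levels of 16 directories (1 hex char each)
    echo ".glusterfs/${gfid:0:1}/${gfid:1:1}/${gfid:2:1}/${gfid:3:1}/${gfid}"
    # -> .glusterfs/a/b/c/d/abcdef12-3456-7890-abcd-ef1234567890
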
With this change, the first and second ls take 0.9 seconds; the third
takes 9. I don't know what causes such slowness on the third ls, but
the second ls has improved a lot.
Does anyone have any advice? Is there any way to improve this? Some
tweak of the kernel/XFS/Gluster?
Thanks,
Xavi
On 26/03/13 11:02, Xavier Hernandez wrote:
> Hi,
>
> I've reproduced a problem I've seen when listing directories that
> have not been accessed for a long time (some hours). The Gluster
> version is 3.3.1.
>
> I've run the tests on different hardware and the behavior is quite
> similar.
>
> The problem can be clearly seen by doing the following (a script
> sketch of these steps follows the list):
>
> 1. Format bricks with XFS, inode size 512, and mount them
> 2. Create a gluster volume (I've tried several combinations, see later)
> 3. Start and mount it
> 4. Create a directory <vol>/dirs and fill it with 300 subdirectories
> 5. Unmount the volume, stop it and flush kernel caches of all servers
> (sync ; echo 3 > /proc/sys/vm/drop_caches)
> 6. Start the volume, mount it, and execute "time ls -l <vol>/dirs | wc -l"
> 7. Create 80,000 directories at <vol>/ (notice that these directories
> are not created inside <vol>/dirs)
> 8. Unmount the volume, stop it and flush kernel caches of all servers
> (sync ; echo 3 > /proc/sys/vm/drop_caches)
> 9. Start the volume, mount it, and execute "time ls -l <vol>/dirs | wc -l"
> 10. Delete directory <vol>/dirs and recreate it, again with 300
> subdirectories
> 11. Unmount the volume, stop it and flush kernel caches of all servers
> (sync ; echo 3 > /proc/sys/vm/drop_caches)
> 12. Start the volume, mount it, and execute "time ls -l <vol>/dirs | wc -l"
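>
> As a script, these steps look roughly like this (a minimal sketch for
> a single server; the device, paths, and the volume name "test" are
> assumptions, and on multiple servers the cache flush must be run on
> every one):
>
>     #!/bin/bash
>     mkdir -p /bricks/b1 /mnt/test
>     mkfs.xfs -f -i size=512 /dev/sdb1                   # step 1
>     mount /dev/sdb1 /bricks/b1
>     gluster volume create test $(hostname):/bricks/b1   # step 2
>     gluster volume start test                           # step 3
>     mount -t glusterfs $(hostname):/test /mnt/test
>
>     mkdir /mnt/test/dirs                                # step 4
>     for i in $(seq 1 300); do mkdir /mnt/test/dirs/d$i; done
>
>     flush() {                                  # steps 5, 8 and 11
>         umount /mnt/test
>         gluster --mode=script volume stop test
>         sync; echo 3 > /proc/sys/vm/drop_caches
>         gluster volume start test
>         mount -t glusterfs $(hostname):/test /mnt/test
>     }
>
>     flush; time ls -l /mnt/test/dirs | wc -l            # first ls
>
>     for i in $(seq 1 80000); do mkdir /mnt/test/d$i; done   # step 7
>     flush; time ls -l /mnt/test/dirs | wc -l            # second ls
>
>     rm -rf /mnt/test/dirs                               # step 10
>     mkdir /mnt/test/dirs
>     for i in $(seq 1 300); do mkdir /mnt/test/dirs/d$i; done
>     flush; time ls -l /mnt/test/dirs | wc -l            # third ls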
>
> With this test, I get the following times:
>
> first ls: 1 second
> second ls: 3.5 seconds
> third ls: 10 seconds
>
> I don't understand the second ls, because the <vol>/dirs directory
> still has the same 300 subdirectories. But the third one is worse.
>
> I've tried with different kinds of volumes (distributed-replicated,
> distributed, and even a single brick), and the behavior is the same
> (though the times are smaller when fewer bricks are involved).
>
> After reaching this situation, I've tried to get the original ls
> times back by deleting directories; however, the times do not seem to
> improve. Only after doing some "dirty" tests and removing empty gfid
> directories from <vol>/.glusterfs on all bricks do I get better times,
> though still not as good as the first ls (3-4 seconds better than the
> third ls).
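>
> By "removing empty gfid directories" I mean something like this, run
> on every brick (a sketch; the brick path is an assumption, and it
> only deletes directories that are already empty):
>
>     find /bricks/b1/.glusterfs -mindepth 2 -maxdepth 2 -type d -empty -delete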
>
> This is always reproducible if the volume is stopped and the caches
> are emptied before each ls. With more files and/or directories, it
> can take 20 seconds or more to list a directory with 100-200
> subdirectories.
>
> Without stopping anything, a second ls responds in about 0.2 seconds.
>
> I've also tested this with ext4 and BTRFS (I know it is not
> supported, but I tested it anyway). These are the results:
>
> ext4 first ls: 0.5 seconds
> ext4 second ls: 0.8 seconds
> ext4 third ls: 7 seconds
>
> btrfs first ls: 0.5 seconds
> btrfs second ls: 0.5 seconds
> btrfs third ls: 0.5 seconds
>
> It seems clear that it depends on the file system, but if I access
> the bricks directly, every ls takes at most 0.1 seconds to complete.
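>
> By accessing the bricks directly I mean timing the same listing on
> the brick's backend file system on each server, e.g. (the brick path
> is an assumption):
>
>     time ls -l /bricks/b1/dirs | wc -l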
>
> Repairing and defragmenting the bricks does not help.
>
> strace'ing the glusterfs process of each brick, I see that, for each
> directory, a lot of entries from <vol>/.glusterfs are lstat'ed and a
> lot of lgetxattr calls are made. For 300 directories I've counted
> more than 4500 lstat's and more than 5300 lgetxattr's, many of them
> repeated. I've also noticed that some lstat's take from 10 to 60 ms
> to complete (with XFS).
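>
> For reference, this is roughly how I gathered those counts (a sketch;
> the brick PID is a placeholder):
>
>     # summary table of syscall counts and latencies for a brick process
>     strace -f -c -p <brick-pid>
>
>     # or capture only the interesting calls and count them
>     strace -f -e trace=lstat,lgetxattr -o /tmp/brick.trace -p <brick-pid>
>     grep -c 'lstat(' /tmp/brick.trace
>     grep -c 'lgetxattr(' /tmp/brick.trace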
>
> Is there any way to minimize these effects? Am I doing something wrong?
>
> Thanks in advance for your help,
>
> Xavi
>