[Gluster-devel] Performance problem with XFS
Xavier Hernandez
xhernandez at datalab.es
Wed Mar 27 11:30:49 UTC 2013
I can't combine -Tt with -c: with -c, strace only shows the final
report, not the time consumed in each call. Also, the -c flag reports
system CPU time for the ls process, not wall-clock time. The values
obtained with -c look quite normal.
Using -Tt, strace reports the wall-clock time of each call. I've
summarized the results in a table attached to this email.
I've also included a detailed list of the system calls made by ls,
sorted by time.
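Since -c and -Tt can't be combined, the per-call wall-clock times in a -Tt trace can be aggregated after the fact. A minimal sketch: the trace contents below are made-up sample data; in practice the file would come from something like `strace -Tt -o ls-trace.txt ls -l <vol>/dirs`.

```shell
# Made-up sample of 'strace -Tt' output, for illustration only; each
# line ends with the wall-clock duration of the call in <seconds>.
cat > ls-trace.txt <<'EOF'
11:30:49.100000 lstat("/mnt/vol/dirs/d1", {st_mode=S_IFDIR|0755, ...}) = 0 <0.010000>
11:30:49.120000 lstat("/mnt/vol/dirs/d2", {st_mode=S_IFDIR|0755, ...}) = 0 <0.060000>
11:30:49.200000 lgetxattr("/mnt/vol/dirs/d1", "trusted.gfid", 0x0, 0) = -1 <0.000500>
EOF

# Aggregate wall-clock time and call count per syscall from the <...> field.
awk '{
    if (match($0, /<[0-9.]+>$/)) {
        dur  = substr($0, RSTART + 1, RLENGTH - 2)  # seconds, brackets stripped
        name = $2; sub(/\(.*/, "", name)            # field 2 is "syscall(args..."
        t[name] += dur; n[name]++
    }
}
END { for (s in t) printf "%-10s %5d calls %10.6f s\n", s, n[s], t[s] }' \
    ls-trace.txt | sort -k4 -rn
```

This gives a per-syscall wall-clock summary comparable across the three runs, which the -c report (CPU time only) cannot provide.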
Xavi
On 26/03/13 19:49, Anand Avati wrote:
> Can you run ls as 'strace -Ttc ls' in each of the three runs to
> compare the output of first and third run to see where most of the
> time is getting spent?
>
> Avati
>
> On Tue, Mar 26, 2013 at 11:01 AM, Xavier Hernandez
> <xhernandez at datalab.es <mailto:xhernandez at datalab.es>> wrote:
>
> Hi,
>
> since one of the improvements seemed to be the reduction of the
> number of directories inside .glusterfs I've made a modification
> to storage/posix so that instead of creating 2 levels of 256
> directories each, I create 4 levels of 16 directories.
>
> With this change, the first and second ls take 0.9 seconds; the
> third one takes 9.
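For reference, the stock layout hashes each gfid file under two directory levels of 256 entries each (the first two hex bytes of the gfid), while the modification described above spreads the same four hex digits over four levels of 16. A sketch of the resulting paths (bash string slicing; the gfid value is just an example, and the 4x16 scheme is the local modification, not stock gluster):

```shell
# Example gfid; real ones are assigned by gluster per file/directory.
gfid="6ee7253d-7d29-4b31-b0a8-5dd9a4aa8f3a"

# Stock storage/posix: 2 levels x 256 dirs (one hex byte per level).
stock=".glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid}"

# Modified layout: 4 levels x 16 dirs (one hex nibble per level).
modified=".glusterfs/${gfid:0:1}/${gfid:1:1}/${gfid:2:1}/${gfid:3:1}/${gfid}"

echo "$stock"     # .glusterfs/6e/e7/6ee7253d-...
echo "$modified"  # .glusterfs/6/e/e/7/6ee7253d-...
```

Both schemes fan the gfid links out over 65536 leaf directories; the difference is only in how many directory inodes sit on each lookup path.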
>
> I don't know what causes such slowness in the third ls; the
> second ls, however, has improved a lot.
>
> Does anyone have any advice?
>
> Is there any way to improve this? Some tweak of the
> kernel, XFS, or Gluster?
>
> Thanks,
>
> Xavi
>
> On 26/03/13 11:02, Xavier Hernandez wrote:
>
> Hi,
>
> I've reproduced a problem I've seen when listing directories
> that have not been accessed for a long time (some hours). The
> Gluster version is 3.3.1.
>
> I've run the tests on different hardware and the behavior
> is quite similar.
>
> The problem can be clearly seen doing this:
>
> 1. Format bricks with XFS, inode size 512, and mount them
> 2. Create a gluster volume (I've tried several combinations,
> see later)
> 3. Start and mount it
> 4. Create a directory <vol>/dirs and fill it with 300
> subdirectories
> 5. Unmount the volume, stop it and flush kernel caches of all
> servers (sync ; echo 3 > /proc/sys/vm/drop_caches)
> 6. Start the volume, mount it, and execute "time ls -l
> <vol>/dirs | wc -l"
> 7. Create 80,000 directories at <vol>/ (note that these
> directories are not created inside <vol>/dirs)
> 8. Unmount the volume, stop it and flush kernel caches of all
> servers (sync ; echo 3 > /proc/sys/vm/drop_caches)
> 9. Start the volume, mount it, and execute "time ls -l
> <vol>/dirs | wc -l"
> 10. Delete the directory <vol>/dirs and recreate it, again
> with 300 subdirectories
> 11. Unmount the volume, stop it and flush kernel caches of all
> servers (sync ; echo 3 > /proc/sys/vm/drop_caches)
> 12. Start the volume, mount it, and execute "time ls -l
> <vol>/dirs | wc -l"
>
> With this test, I get the following times:
>
> first ls: 1 second
> second ls: 3.5 seconds
> third ls: 10 seconds
>
> I don't understand the second ls, because the <vol>/dirs
> directory still has the same 300 subdirectories. And the
> third one is even worse.
>
> I've tried with different kinds of volumes
> (distributed-replicated, distributed, and even a single
> brick), and the behavior is the same (though the times are
> smaller when fewer bricks are involved).
>
> After reaching this situation, I've tried to recover the
> original ls times by deleting directories; however, the times
> do not seem to improve. Only after doing some "dirty" tests
> and removing empty gfid directories from <vol>/.glusterfs on
> all bricks do I get better times, though still not as good as
> the first ls (3-4 seconds better than the third ls).
>
> This is always reproducible if the volume is stopped and the
> caches are emptied before each ls. With more files and/or
> directories, it can take up to 20 or more seconds to list a
> directory with 100-200 subdirectories.
>
> Without stopping anything, a second ls responds in about 0.2
> seconds.
>
> I've also tested this with ext4 and BTRFS (I know it is not
> supported, but tested anyway). These are the results:
>
> ext4 first ls: 0.5 seconds
> ext4 second ls: 0.8 seconds
> ext4 third ls: 7 seconds
>
> btrfs first ls: 0.5 seconds
> btrfs second ls: 0.5 seconds
> btrfs third ls: 0.5 seconds
>
> It seems clear that it depends on the file system, but if I
> access the bricks directly, every ls takes at most 0.1
> seconds to complete.
>
> Repairing and defragmenting the bricks does not help.
>
> strace'ing the glusterfs process of the bricks, I see that
> for each directory a lot of entries from <vol>/.glusterfs are
> lstat'ed and a lot of lgetxattr calls are made. For 300
> directories I've counted more than 4,500 lstat's and more
> than 5,300 lgetxattr's, many of them repeated. I've also
> noticed that some lstat's take from 10 to 60 ms to complete
> (with XFS).
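The repeated calls can be counted directly from a trace of the brick process. A sketch using made-up sample lines; in practice the trace would come from something like `strace -T -p <brick-pid> -o brick-trace.txt` while the ls runs:

```shell
# Made-up sample of brick-side trace lines, for illustration only.
cat > brick-trace.txt <<'EOF'
lstat("/brick/.glusterfs/6e/e7/6ee7253d", {st_mode=S_IFDIR|0755, ...}) = 0 <0.012000>
lstat("/brick/.glusterfs/6e/e7/6ee7253d", {st_mode=S_IFDIR|0755, ...}) = 0 <0.000020>
lgetxattr("/brick/dirs/d1", "trusted.gfid", 0x0, 16) = 16 <0.000015>
EOF

# Total calls per syscall:
cut -d'(' -f1 brick-trace.txt | sort | uniq -c | sort -rn

# Paths that are lstat'ed more than once (the repeated calls):
grep '^lstat(' brick-trace.txt | cut -d'"' -f2 | sort | uniq -cd
```

The `uniq -cd` output shows exactly which gfid paths are being hit repeatedly, which helps decide whether the cost is in XFS inode lookups or in redundant work above the file system.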
>
> Is there any way to minimize these effects? Am I doing
> something wrong?
>
> Thanks in advance for your help,
>
> Xavi
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org <mailto:Gluster-devel at nongnu.org>
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strace-ls.txt.gz
Type: application/x-gzip
Size: 13331 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-devel/attachments/20130327/fb5ad441/attachment-0001.gz>