[Gluster-devel] Performance problem with XFS
Xavier Hernandez
xhernandez at datalab.es
Wed Mar 27 11:30:49 UTC 2013
I can't combine -Tt with -c: with -c, strace only shows the final
report, not the time consumed in each call. Also, the -c flag reports
system CPU time for the ls process, not wall-clock time. The values
obtained with -c look quite normal.
Using -Tt, strace reports the wall-clock time of each call. I've
summarized the results in a table attached to this email.
I've also included a detailed list of the system calls made by ls,
sorted by time.
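Since -c and -Tt can't be combined, the per-call wall-clock times in a -Tt trace can be aggregated after the fact. A minimal sketch: the trace contents below are made-up sample data; in practice the file would come from something like `strace -Tt -o ls-trace.txt ls -l <vol>/dirs`.

```shell
# Made-up sample of 'strace -Tt' output, for illustration only; each
# line ends with the wall-clock duration of the call in <seconds>.
cat > ls-trace.txt <<'EOF'
11:30:49.100000 lstat("/mnt/vol/dirs/d1", {st_mode=S_IFDIR|0755, ...}) = 0 <0.010000>
11:30:49.120000 lstat("/mnt/vol/dirs/d2", {st_mode=S_IFDIR|0755, ...}) = 0 <0.060000>
11:30:49.200000 lgetxattr("/mnt/vol/dirs/d1", "trusted.gfid", 0x0, 0) = -1 <0.000500>
EOF

# Aggregate wall-clock time and call count per syscall from the <...> field.
awk '{
    if (match($0, /<[0-9.]+>$/)) {
        dur  = substr($0, RSTART + 1, RLENGTH - 2)  # seconds, brackets stripped
        name = $2; sub(/\(.*/, "", name)            # field 2 is "syscall(args..."
        t[name] += dur; n[name]++
    }
}
END { for (s in t) printf "%-10s %5d calls %10.6f s\n", s, n[s], t[s] }' \
    ls-trace.txt | sort -k4 -rn
```

This gives a per-syscall wall-clock summary comparable across the three runs, which the -c report (CPU time only) cannot provide.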
Xavi
On 26/03/13 19:49, Anand Avati wrote:
> Can you run ls as 'strace -Ttc ls' in each of the three runs to
> compare the output of first and third run to see where most of the
> time is getting spent?
>
> Avati
>
> On Tue, Mar 26, 2013 at 11:01 AM, Xavier Hernandez
> <xhernandez at datalab.es <mailto:xhernandez at datalab.es>> wrote:
>
> Hi,
>
> since one of the improvements seemed to be the reduction of the
> number of directories inside .glusterfs I've made a modification
> to storage/posix so that instead of creating 2 levels of 256
> directories each, I create 4 levels of 16 directories.
>
> With this change, the first and second ls take 0.9 seconds; the
> third one takes 9.
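For reference, the stock layout hashes each gfid file under two directory levels of 256 entries each (the first two hex bytes of the gfid), while the modification described above spreads the same four hex digits over four levels of 16. A sketch of the resulting paths (bash string slicing; the gfid value is just an example, and the 4x16 scheme is the local modification, not stock gluster):

```shell
# Example gfid; real ones are assigned by gluster per file/directory.
gfid="6ee7253d-7d29-4b31-b0a8-5dd9a4aa8f3a"

# Stock storage/posix: 2 levels x 256 dirs (one hex byte per level).
stock=".glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid}"

# Modified layout: 4 levels x 16 dirs (one hex nibble per level).
modified=".glusterfs/${gfid:0:1}/${gfid:1:1}/${gfid:2:1}/${gfid:3:1}/${gfid}"

echo "$stock"     # .glusterfs/6e/e7/6ee7253d-...
echo "$modified"  # .glusterfs/6/e/e/7/6ee7253d-...
```

Both schemes fan the gfid links out over 65536 leaf directories; the difference is only in how many directory inodes sit on each lookup path.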
>
> I don't know what causes such slowness in the third ls; the
> second ls, however, has improved a lot.
>
> Does anyone have any advice?
>
> Is there any way to improve this? Some tweak of the
> kernel, XFS, or Gluster?
>
> Thanks,
>
> Xavi
>
> On 26/03/13 11:02, Xavier Hernandez wrote:
>
> Hi,
>
> I've reproduced a problem I've seen when listing directories
> that have not been accessed for a long time (some hours). The
> Gluster version is 3.3.1.
>
> I've run the tests on different hardware and the behavior
> is quite similar.
>
> The problem can be clearly seen doing this:
>
> 1. Format bricks with XFS, inode size 512, and mount them
> 2. Create a gluster volume (I've tried several combinations,
> see later)
> 3. Start and mount it
> 4. Create a directory <vol>/dirs and fill it with 300
> subdirectories
> 5. Unmount the volume, stop it and flush kernel caches of all
> servers (sync ; echo 3 > /proc/sys/vm/drop_caches)
> 6. Start the volume, mount it, and execute "time ls -l
> <vol>/dirs | wc -l"
> 7. Create 80,000 directories at <vol>/ (note that these
> directories are not created inside <vol>/dirs)
> 8. Unmount the volume, stop it and flush kernel caches of all
> servers (sync ; echo 3 > /proc/sys/vm/drop_caches)
> 9. Start the volume, mount it, and execute "time ls -l
> <vol>/dirs | wc -l"
> 10. Delete the directory <vol>/dirs and recreate it, again
> with 300 subdirectories
> 11. Unmount the volume, stop it and flush kernel caches of all
> servers (sync ; echo 3 > /proc/sys/vm/drop_caches)
> 12. Start the volume, mount it, and execute "time ls -l
> <vol>/dirs | wc -l"
>
> With this test, I get the following times:
>
> first ls: 1 second
> second ls: 3.5 seconds
> third ls: 10 seconds
>
> I don't understand the second ls, because the <vol>/dirs
> directory still has the same 300 subdirectories. And the
> third one is even worse.
>
> I've tried with different kinds of volumes
> (distributed-replicated, distributed, and even a single
> brick), and the behavior is the same (though the times are
> smaller when fewer bricks are involved).
>
> After reaching this situation, I've tried to recover the
> original ls times by deleting directories; however, the times
> do not seem to improve. Only after doing some "dirty" tests
> and removing empty gfid directories from <vol>/.glusterfs on
> all bricks do I get better times, though still not as good as
> the first ls (3-4 seconds better than the third ls).
>
> This is always reproducible if the volume is stopped and the
> caches are emptied before each ls. With more files and/or
> directories, it can take up to 20 or more seconds to list a
> directory with 100-200 subdirectories.
>
> Without stopping anything, a second ls responds in about 0.2
> seconds.
>
> I've also tested this with ext4 and BTRFS (I know it is not
> supported, but tested anyway). These are the results:
>
> ext4 first ls: 0.5 seconds
> ext4 second ls: 0.8 seconds
> ext4 third ls: 7 seconds
>
> btrfs first ls: 0.5 seconds
> btrfs second ls: 0.5 seconds
> btrfs third ls: 0.5 seconds
>
> It seems clear that it depends on the file system, but if I
> access the bricks directly, every ls takes at most 0.1
> seconds to complete.
>
> Repairing and defragmenting the bricks does not help.
>
> strace'ing the glusterfs process of the bricks, I see that
> for each directory a lot of entries from <vol>/.glusterfs are
> lstat'ed and a lot of lgetxattr calls are made. For 300
> directories I've counted more than 4,500 lstat's and more
> than 5,300 lgetxattr's, many of them repeated. I've also
> noticed that some lstat's take from 10 to 60 ms to complete
> (with XFS).
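The repeated calls can be counted directly from a trace of the brick process. A sketch using made-up sample lines; in practice the trace would come from something like `strace -T -p <brick-pid> -o brick-trace.txt` while the ls runs:

```shell
# Made-up sample of brick-side trace lines, for illustration only.
cat > brick-trace.txt <<'EOF'
lstat("/brick/.glusterfs/6e/e7/6ee7253d", {st_mode=S_IFDIR|0755, ...}) = 0 <0.012000>
lstat("/brick/.glusterfs/6e/e7/6ee7253d", {st_mode=S_IFDIR|0755, ...}) = 0 <0.000020>
lgetxattr("/brick/dirs/d1", "trusted.gfid", 0x0, 16) = 16 <0.000015>
EOF

# Total calls per syscall:
cut -d'(' -f1 brick-trace.txt | sort | uniq -c | sort -rn

# Paths that are lstat'ed more than once (the repeated calls):
grep '^lstat(' brick-trace.txt | cut -d'"' -f2 | sort | uniq -cd
```

The `uniq -cd` output shows exactly which gfid paths are being hit repeatedly, which helps decide whether the cost is in XFS inode lookups or in redundant work above the file system.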
>
> Is there any way to minimize these effects? Am I doing
> something wrong?
>
> Thanks in advance for your help,
>
> Xavi
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org <mailto:Gluster-devel at nongnu.org>
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strace-ls.txt.gz
Type: application/x-gzip
Size: 13331 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-devel/attachments/20130327/fb5ad441/attachment-0001.gz>