[Gluster-devel] Performance problem with XFS

Anand Avati anand.avati at gmail.com
Tue Mar 26 18:49:04 UTC 2013


Can you run ls as 'strace -Ttc ls' in each of the three runs and compare the
output of the first and third runs to see where most of the time is being
spent?
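
For example, something along these lines, using the same <vol>/dirs path from
your test:

  strace -Ttc ls -l <vol>/dirs | wc -l

-c gives a per-syscall summary of counts and total time, and -T reports the
time spent in each individual call.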

Avati

On Tue, Mar 26, 2013 at 11:01 AM, Xavier Hernandez <xhernandez at datalab.es> wrote:

> Hi,
>
> since one possible improvement seemed to be reducing the number of
> directories inside .glusterfs, I've modified storage/posix so that, instead
> of creating 2 levels of 256 directories each, it creates 4 levels of 16
> directories.
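>
> To illustrate the kind of change (the paths below are only approximate), a
> gfid such as 0a1b2c3d-... moves from something like
>
>     .glusterfs/0a/1b/0a1b2c3d-...      (2 levels of 256 directories each)
>
> to something like
>
>     .glusterfs/0/a/1/b/0a1b2c3d-...    (4 levels of 16 directories each)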
>
> With this change, the first and second ls take 0.9 seconds; the third takes
> 9 seconds.
>
> I don't know what causes such slowness on the third ls; the second ls,
> however, has improved a lot.
>
> Does anyone have any advice?
>
> Is there any way to improve this? Some tweak of the kernel, XFS or Gluster?
>
> Thanks,
>
> Xavi
>
> On 26/03/13 11:02, Xavier Hernandez wrote:
>
>  Hi,
>>
>> I've reproduced a problem I've seen when listing directories that have not
>> been accessed for a long time (some hours). The Gluster version is 3.3.1.
>>
>> I've run the tests on different hardware and the behavior is quite
>> similar.
>>
>> The problem can be clearly seen by doing this (a rough script for the
>> stop/flush/start/ls cycle is sketched after the steps):
>>
>> 1. Format bricks with XFS, inode size 512, and mount them
>> 2. Create a gluster volume (I've tried several combinations, see later)
>> 3. Start and mount it
>> 4. Create a directory <vol>/dirs and fill it with 300 subdirectories
>> 5. Unmount the volume, stop it and flush kernel caches of all servers
>> (sync ; echo 3 > /proc/sys/vm/drop_caches)
>> 6. Start the volume, mount it, and execute "time ls -l <vol>/dirs | wc -l"
>> 7. Create 80,000 directories at <vol>/ (note that these directories are
>> not created inside <vol>/dirs)
>> 8. Unmount the volume, stop it and flush kernel caches of all servers
>> (sync ; echo 3 > /proc/sys/vm/drop_caches)
>> 9. Start the volume, mount it, and execute "time ls -l <vol>/dirs | wc -l"
>> 10. Delete directory <vol>/dirs and recreate it, again with 300
>> subdirectories
>> 11. Unmount the volume, stop it and flush kernel caches of all servers
>> (sync ; echo 3 > /proc/sys/vm/drop_caches)
>> 12. Start the volume, mount it, and execute "time ls -l <vol>/dirs | wc -l"
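>>
>> (A rough script for the stop/flush/start/ls cycle of steps 5-6, 8-9 and
>> 11-12; the volume name "testvol", the mount point /mnt/testvol and the
>> server name "server1" are placeholders:
>>
>>   umount /mnt/testvol
>>   gluster volume stop testvol        # confirm when prompted
>>   # on every server:
>>   sync ; echo 3 > /proc/sys/vm/drop_caches
>>   gluster volume start testvol
>>   mount -t glusterfs server1:/testvol /mnt/testvol
>>   time ls -l /mnt/testvol/dirs | wc -l
>> )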
>>
>> With this test, I get the following times:
>>
>> first ls: 1 second
>> second ls: 3.5 seconds
>> third ls: 10 seconds
>>
>> I don't understand the second ls, because the <vol>/dirs directory still
>> has the same 300 subdirectories. But the third one is even worse.
>>
>> I've tried with different kinds of volumes (distributed-replicated,
>> distributed, and even a single brick), and the behavior is the same (though
>> the times are smaller when fewer bricks are involved).
>>
>> After reaching this situation, I've tried to get back to the original ls
>> times by deleting directories, but the times do not seem to improve. Only
>> after doing some "dirty" tests and removing empty gfid directories from
>> <vol>/.glusterfs on all bricks do I get better times, though still not as
>> good as the first ls (3-4 seconds better than the third ls).
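>>
>> (The empty gfid directories can be located on each brick with something
>> like "find <brick root>/.glusterfs -type d -empty", where <brick root> is
>> the brick's path on the server.)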
>>
>> This is always reproducible if the volume is stopped and the caches are
>> emptied before each ls. With more files and/or directories, it can take 20
>> seconds or more to list a directory with 100-200 subdirectories.
>>
>> Without stopping anything, a second ls responds in about 0.2 seconds.
>>
>> I've also tested this with ext4 and BTRFS (I know it is not supported, but
>> I tested it anyway). These are the results:
>>
>> ext4 first ls: 0.5 seconds
>> ext4 second ls: 0.8 seconds
>> ext4 third ls: 7 seconds
>>
>> btrfs first ls: 0.5 seconds
>> btrfs second ls: 0.5 seconds
>> btrfs third ls: 0.5 seconds
>>
>> It seems clear that it depends on the file system, but if I access the
>> bricks directly, every ls takes at most 0.1 seconds to complete.
>>
>> Repairing and defragmenting the bricks does not help.
>>
>> strace'ing the brick glusterfs processes, I see that, for each directory, a
>> lot of entries from <vol>/.glusterfs are lstat'ed and a lot of lgetxattr
>> calls are made. For 300 directories I've counted more than 4500 lstat's and
>> more than 5300 lgetxattr's, many of them repeated. I've also noticed that
>> some lstat's take from 10 to 60 ms to complete (with XFS).
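>>
>> (A trace like this can be captured with something along the lines of
>>
>>   strace -f -tt -T -e trace=lstat,lgetxattr -p <pid of the brick process>
>>
>> where -T reports the time spent in each call and -tt adds timestamps.)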
>>
>> Is there any way to minimize these effects? Am I doing something wrong?
>>
>> Thanks in advance for your help,
>>
>> Xavi
>>
>
>