[Gluster-devel] Performance problem with XFS

Xavier Hernandez xhernandez at datalab.es
Tue Mar 26 18:01:16 UTC 2013


Hi,

Since one of the possible improvements seemed to be reducing the number 
of directories inside .glusterfs, I've modified storage/posix so that, 
instead of creating 2 levels of 256 directories each, it creates 4 
levels of 16 directories.
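
For illustration, the difference in layout for a gfid starting with 
0a1b would be roughly this (the exact split shown below is only an 
example of the idea, stock path first, modified path second):

    # stock layout: 2 levels of 256 entries (first two bytes of the gfid)
    <brick>/.glusterfs/0a/1b/0a1b2c3d-...

    # modified layout: 4 levels of 16 entries (first four hex nibbles)
    <brick>/.glusterfs/0/a/1/b/0a1b2c3d-...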

With this change, the first and second ls take 0.9 seconds; the third 
one takes 9 seconds.

I don't know what causes such slowness on the third ls; the second ls, 
however, has improved a lot.

Does anyone have any advice?

Is there any way to improve this? Some tweak of the kernel, XFS or Gluster?

Thanks,

Xavi

On 26/03/13 11:02, Xavier Hernandez wrote:
> Hi,
>
> I've reproduced a problem I've seen when listing directories that 
> have not been accessed for a long time (several hours). The Gluster 
> version is 3.3.1.
>
> I've run the tests on different hardware and the behavior is quite 
> similar.
>
> The problem can be clearly seen by doing the following (a script 
> putting these steps together is sketched after the list):
>
> 1. Format bricks with XFS, inode size 512, and mount them
> 2. Create a gluster volume (I've tried several combinations, see later)
> 3. Start and mount it
> 4. Create a directory <vol>/dirs and fill it with 300 subdirectories
> 5. Unmount the volume, stop it and flush kernel caches of all servers 
> (sync ; echo 3 > /proc/sys/vm/drop_caches)
> 6. Start the volume, mount it, and execute "time ls -l <vol>/dirs | wc 
> -l"
> 7. Create 80,000 directories at the root of <vol> (note that these 
> directories are not created inside <vol>/dirs)
> 8. Unmount the volume, stop it and flush kernel caches of all servers 
> (sync ; echo 3 > /proc/sys/vm/drop_caches)
> 9. Start the volume, mount it, and execute "time ls -l <vol>/dirs | wc 
> -l"
> 10. Delete the directory <vol>/dirs and recreate it, again with 300 
> subdirectories
> 11. Unmount the volume, stop it and flush kernel caches of all servers 
> (sync ; echo 3 > /proc/sys/vm/drop_caches)
> 12. Start the volume, mount it, and execute "time ls -l <vol>/dirs | 
> wc -l"
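>
> A rough sketch putting the steps above together (the device, brick 
> path, server and volume names are just examples; I use a two-server 
> "test" volume here):
>
>     # on each server: format and mount the brick (example device/path)
>     mkfs.xfs -i size=512 /dev/sdb1
>     mount /dev/sdb1 /bricks/b1
>
>     # on one server: create and start the volume, then mount it on a client
>     gluster volume create test replica 2 server1:/bricks/b1 server2:/bricks/b1
>     gluster volume start test
>     mount -t glusterfs server1:/test /mnt/test
>
>     # cold-cache listing of <vol>/dirs, run from the client
>     cold_ls() {
>         umount /mnt/test
>         gluster volume stop test   # confirm the prompt (or use --mode=script)
>         for h in server1 server2; do
>             ssh $h 'sync; echo 3 > /proc/sys/vm/drop_caches'
>         done
>         gluster volume start test
>         mount -t glusterfs server1:/test /mnt/test
>         time ls -l /mnt/test/dirs | wc -l
>     }
>
>     mkdir -p /mnt/test/dirs/dir{1..300}
>     cold_ls                                           # first ls
>     for i in $(seq 1 80000); do mkdir /mnt/test/d$i; done
>     cold_ls                                           # second ls
>     rm -rf /mnt/test/dirs
>     mkdir -p /mnt/test/dirs/dir{1..300}
>     cold_ls                                           # third ls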
>
> With this test, I get the following times:
>
> first ls: 1 second
> second ls: 3.5 seconds
> third ls: 10 seconds
>
> I don't understand the second ls, because the <vol>/dirs directory 
> still has the same 300 subdirectories. And the third one is even worse.
>
> I've tried with different kinds of volumes (distributed-replicated, 
> distributed, and even a single brick), and the behavior is the same 
> (though the times are smaller when fewer bricks are involved).
>
> After reaching this situation, I've tried to get back the original ls 
> times by deleting directories; however, the times do not seem to 
> improve. Only after doing some "dirty" tests and removing empty gfid 
> directories from <vol>/.glusterfs on all bricks do I get better times, 
> though still not as good as the first ls (3-4 seconds better than the 
> third ls).
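>
> The empty gfid directories can be removed with something along these 
> lines on each brick (the brick path is just an example; it only 
> touches second-level directories under .glusterfs that are already 
> empty):
>
>     find /bricks/b1/.glusterfs -mindepth 2 -maxdepth 2 -type d -empty -delete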
>
> This is always reproducible if the volume is stopped and the caches 
> are emptied before each ls. With more files and/or directories, it can 
> take 20 seconds or more to list a directory with 100-200 
> subdirectories.
>
> Without stopping anything, a second ls responds in about 0.2 seconds.
>
> I've also tested this with ext4 and BTRFS (I know they are not 
> supported, but I tested them anyway). These are the results:
>
> ext4 first ls: 0.5 seconds
> ext4 second ls: 0.8 seconds
> ext4 third ls: 7 seconds
>
> btrfs first ls: 0.5 seconds
> btrfs second ls: 0.5 seconds
> btrfs third ls: 0.5 seconds
>
> It seems clear that it depends on the file system, but if I access 
> the bricks directly, every ls takes at most 0.1 seconds to complete.
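>
> The direct measurement on a brick is just something like this, with 
> caches dropped first (the brick path is an example):
>
>     sync; echo 3 > /proc/sys/vm/drop_caches
>     time ls -l /bricks/b1/dirs | wc -l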
>
> Repairing and defragmenting the bricks does not help.
>
> strace'ing the glusterfs process of the bricks, I see that for each 
> directory a lot of entries from <vol>/.glusterfs are lstat'ed and a 
> lot of lgetxattr calls are made. For 300 directories I've counted more 
> than 4500 lstat calls and more than 5300 lgetxattr calls, many of them 
> repeated. I've also noticed that some lstat calls take from 10 to 60 
> ms to complete (with XFS).
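>
> The call counts can be obtained with something along these lines on a 
> brick process (the PID lookup is just an example; attach before 
> running the ls and stop strace afterwards):
>
>     # -T prints the time spent in each call (to see the slow lstat's)
>     strace -f -T -e trace=lstat,lgetxattr -o /tmp/brick.strace \
>         -p $(pgrep -f glusterfsd | head -1)
>     grep -c 'lstat('     /tmp/brick.strace
>     grep -c 'lgetxattr(' /tmp/brick.strace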
>
> Is there any way to minimize these effects? Am I doing something wrong?
>
> Thanks in advance for your help,
>
> Xavi
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> https://lists.nongnu.org/mailman/listinfo/gluster-devel




