[Gluster-users] [Gluster-devel] Feedback on DHT option "cluster.readdir-optimize"

Thu Nov 10 15:27:25 UTC 2016

On Thu, Nov 10, 2016 at 3:17 AM, Nithya Balachandran
<nbalacha at redhat.com> wrote:
>
>
> On 8 November 2016 at 20:21, Kyle Johnson <kjohnson at gnulnx.net> wrote:
>>
>> Hey there,
>>
>> We have a number of processes which daily walk our entire directory tree
>> and perform operations on the found files.
>>
>> Pre-gluster, these processes were able to complete within 24 hours of
>> starting.  After outgrowing that single server and moving to a gluster setup
>> (two bricks, two servers, distribute, 10gig uplink), the processes became
>> unusable.
>>
>> After turning this option on, we were back to normal run times, with the
>> processes completing within 24 hours.
>>
>> Our data is heavily nested in a large number of subfolders under /media/ftp.
>
>
> Thanks for getting back to us - this is very good information. Can you
> provide a few more details?
>
> How deep is your directory tree and roughly how many directories do you have
> at each level?
> Are all your files in the lowest level dirs or do they exist on several
> levels?
> Would you be willing to provide the gluster volume info output for this
> volume?
>>

I have seen a performance improvement with this option when the first
level below the root consisted of several thousand directories
without any files. IIRC, I was testing this in a 16 x 2 setup.

Regards,
Vijay
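
For anyone who wants to collect the tree-shape numbers asked about in this
thread (how deep the tree is, roughly how many directories sit at each level,
and at which levels the files live), here is a minimal sketch in Python. It is
not from the original posters: the function name tree_shape is made up for
illustration, and /media/ftp is used as the default only because it is the
path mentioned above. It walks the tree once, so on a large volume it will
take about as long as any other full crawl.

```python
import os
import sys
from collections import defaultdict


def tree_shape(root):
    """Return {depth: (n_dirs, n_files)} for the tree under root.

    Depth 0 is root itself. n_files counts the files that sit directly
    inside the directories at that depth. Symlinked directories are not
    followed (os.walk default).
    """
    dirs_at = defaultdict(int)
    files_at = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        dirs_at[depth] += 1
        files_at[depth] += len(filenames)
    return {d: (dirs_at[d], files_at[d]) for d in sorted(dirs_at)}


if __name__ == "__main__":
    # Hypothetical default: the mount path mentioned in the thread.
    top = sys.argv[1] if len(sys.argv) > 1 else "/media/ftp"
    for depth, (n_dirs, n_files) in tree_shape(top).items():
        print(f"depth {depth}: {n_dirs} dirs, {n_files} files")
```

A first level with thousands of directories and few or no files at that level
would match the case described in the reply above.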