[Gluster-devel] performance improvements

Vincent Régnard vregnard at tbs-internet.com
Wed Oct 24 14:12:58 UTC 2007


Daniel van Ham Colchete wrote:
> On 10/23/07, Vincent Régnard <vregnard at tbs-internet.com> wrote:
>> Hi all,
>>
>> The AFR synchronisation using the "find -mtime -1 -type f -exec head
>> -c1" trick takes approximately 30 minutes for a 20GB filesystem with
>> 300,000 files, which seems too long to be acceptable for us. I'd like
>> to tune some parameters to increase performance.
>>
>> Vincent.
>>
> Vincent,
> 
> So, in 1800 seconds you lookup(), open(), read() and close() 300,000
> times? That's 6 ms per file, and that's really good. Are you sure your
> interconnect has a 5 ms round-trip time? I would bet it is less.

Hi Daniel, and thank you for your clear answer.

Actually, according to the logs I have, only about a hundred files are 
modified every day, so open(), read() and close() only occur for those 
100 files, not the whole filesystem. Half an hour seems like a long time 
for that. I now do this file synchronisation on a per-directory basis 
(only on directories where I know changes might occur) to reduce the 
runtime, and it actually runs much faster; the find command really seems 
to spend a huge amount of time walking up and down the whole directory 
tree.
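
For reference, the per-directory variant looks roughly like this (the 
mount point and path are only examples, assuming the volume is mounted 
at /mnt/glusterfs):

  # read the first byte of each recently modified file through the
  # mount point, which makes AFR check and heal the replicas
  find /mnt/glusterfs/var/mail -mtime -1 -type f \
      -exec head -c1 '{}' \; > /dev/null
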
I tried to do the same operation on all the files (not restricting it to 
recently modified ones), which means opening etc. all 300,000 files, but 
so far I have never managed to get it to finish (I have been trying for 
2 weeks now)! Either glusterfs crashes first, or I have to stop a client 
or server for another reason (software upgrade). I once had it run for 
more than 12 hours, but it did not complete.

> 
> IMO, usually you shouldn't measure GlusterFS performance with things
> happening serially. GlusterFS is really good when things are happening in
> parallel. I prefer to measure a network filesystem's performance not by
> how much time it takes to do one operation, but by how many operations it
> can do at the same time. If you had 300 threads trying to read all those
> files, it would be a lot faster. Usually that's the way real utilization
> happens: if you have a webserver and a mail server using your storage,
> you will have lots of web requests and e-mail sessions reading and
> writing at the same time.
> 
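
Out of curiosity, I may try parallelising the trigger along those lines. 
An untested sketch (it assumes GNU xargs with the -P option; the mount 
path is again only an example):

  # keep 16 single-byte reads in flight instead of one at a time
  find /mnt/glusterfs -type f -print0 \
      | xargs -0 -n1 -P16 head -c1 > /dev/null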

Regarding read/write access, I monitor with 10 clients in parallel (I 
can see the activity of 10 gluster threads). That seems OK to me: I get 
between 2 and 5 MB/s on a 100Mb network. But there is certainly no 
readdir() in that case, at least not in directories with many files. My 
real problem is listing the files in a directory, mainly for mail 
purposes (smtp+imap server). I have not performed any tests yet, but 
transferring my maildirs, with about 10,000 files each, onto glusterfs 
really frightens me.
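
Before migrating, I will probably just time a bare listing of a copy of 
one maildir over the mount, something like this (the path is only an 
example):

  # -f disables sorting and implies -a, so this mostly measures the
  # cost of readdir() itself
  time ls -f /mnt/glusterfs/Maildir/cur > /dev/null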

I also made some tests restricting the configuration to clients and 
servers in the same datacenter (round-trip time about 0.1 ms), but the 
results seem to be roughly the same regarding read/write performance.

-- 
Vincent Régnard
vregnard at tbs-internet.com
TBS-internet.com
027 630 5902




