[Gluster-devel] performance improvements

Wed Oct 24 17:48:49 UTC 2007

Vincent Régnard wrote:
> Daniel van Ham Colchete wrote:
>> On 10/23/07, Vincent Régnard <vregnard at tbs-internet.com> wrote:
>>> Hi all,
>>>
>>> The afr synchronisation using the "find -mtime -1 -type f -exec head -c1"
>>> trick takes approximately 30 minutes for a 20GB filesystem with 300,000
>>> files, which seems too long to be acceptable for us. I'd like to tune
>>> some parameters to increase performance.
>>>
>>> Vincent.
>>>
>> Vincent,
>>
>> so, in 1800 seconds you lookup(), open(), read() and close() 300,000
>> times? That's 6ms for each file, and that's really good. Are you sure your
>> interconnect has a 5ms round-trip time? I would bet it is less.
>
> Hi Daniel, and thank you for your clear answer.
>
> Actually, according to the logs I have, only about a hundred files are
> modified every day, so open(), read() and close() only occur for these
> 100 files, not the whole filesystem. So half an hour seems a long
> time for that? I now do this file synchronisation on a per-directory
> basis (only directories where I know changes might occur) to reduce
> the runtime, and it actually runs much faster. The find command really
> seems to spend a huge amount of time going down and up the whole
> directory tree.
> I tried to do the same operation for all the files (not restricting it
> to recently modified ones), which means opening, etc., all 300,000
> files, but until now I have never managed to get it to finish (I have been
> trying for 2 weeks now)! Either glusterfs crashes first or I have to stop
> the client or server for another reason (software upgrade). I had it run
> once for more than 12 hours, but it did not complete.

Actually, there's a stat call on every file.  Otherwise find can't tell 
how old the file is.  That's what's taking time.  The open, read and 
close time for those 100 files should be negligible, no matter the size 
of the files.  The find operation is for forcing self-heal, which is 
only required if the mounts are out of sync.  The two back-end servers 
should already be in sync as long as they were both up throughout the 
day while the client wrote to those 100 or so files, so when you read a 
single byte from each file you are just reading a single byte (not 
copying the file from one server to the other as when self heal is 
performed), which shouldn't take much time at all.
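
As a rough illustration of why the traversal dominates, here is a minimal Python sketch of what the find trick does. It is not GlusterFS code. The helper name touch_recent and the 24-hour cutoff are assumptions chosen to approximate "-mtime -1". Every file costs one stat call, while only the recently modified regular files are opened and read.

```python
import os
import stat
import time


def touch_recent(root, max_age_seconds=24 * 3600):
    """Walk root, stat every file, and read one byte from recent regular files.

    Returns (stat_calls, files_read). Hypothetical helper, for illustration only.
    """
    cutoff = time.time() - max_age_seconds
    stat_calls = 0
    files_read = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # one explicit stat per file, as find needs for -mtime
            except OSError:
                continue  # file vanished or is inaccessible; skip it
            stat_calls += 1
            if not stat.S_ISREG(info.st_mode):
                continue  # mirror -type f
            if info.st_mtime < cutoff:
                continue  # too old: no open/read is issued
            try:
                with open(path, "rb") as handle:
                    handle.read(1)  # equivalent of head -c1
            except OSError:
                continue
            files_read += 1
    return stat_calls, files_read
```

With 300,000 files of which about 100 changed, such a run would report roughly 300,000 stat calls against about 100 reads, which is where the time goes.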

>>
>> IMO, usually you shouldn't measure GlusterFS performance with things
>> happening serially. GlusterFS is really good when things are happening in
>> parallel. I prefer to measure a network filesystem's performance not by how
>> much time it takes to do one operation, but by how many operations it can do
>> at the same time. If you had 300 threads trying to read all those files
>> it would be a lot faster. Usually that's the way real utilization happens:
>> if you have a webserver and a mail server using your storage, you will have
>> lots of web requests and e-mail sessions reading and writing at the same
>> time.
>>
>
> Regarding read/write access, I monitor with 10 clients in parallel (I
> can see the activity of 10 gluster threads). This seems OK to me; I
> get between 2 and 5 MB/s on a 100Mb network. But there is certainly
> no readdir() in that case, at least not in directories with many
> files. My real problem is listing files in a directory, mainly for mail
> purposes (smtp+imap server). I have not performed any tests yet, but
> transferring my maildirs, with about 10,000 files each, to glusterfs
> really frightens me.
>
> I also made some tests restricting the configuration to client and 
> servers in the same datacenter (round-trip about .1ms), but the result 
> seems to be roughly the same regarding read/write performance.
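
To make the serial-versus-parallel point from earlier in the thread concrete, here is a sketch under stated assumptions. The helper names and the default of 10 workers are hypothetical (10 simply mirrors the 10 clients mentioned above) and are not GlusterFS settings. It reads the first byte of many files from a thread pool, so that per-file round trips can overlap instead of adding up.

```python
from concurrent.futures import ThreadPoolExecutor


def read_first_byte(path):
    """Open path and read a single byte; return the number of bytes read (0 or 1)."""
    with open(path, "rb") as handle:
        return len(handle.read(1))


def read_all_parallel(paths, workers=10):
    """Read one byte from each path using `workers` threads.

    Returns the total number of bytes read. Hypothetical helper, for illustration only.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(read_first_byte, paths))
```

Timing this against a plain loop over the same paths on the mounted filesystem would show whether latency or bandwidth is the limiting factor.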

I just moved to glusterfs for the filesystem for one of my clients, and 
they coded the app it's serving to have lots of files in a directory 
(but the directory is never read directly by the app, the subdirectory 
names are in a DB).  30k+ subdirectories in that one directory, and the 
first ls on it takes about 10 seconds.  Subsequent listings take about 
.3 seconds or less.  There's obviously some caching going on. 
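
The difference between the first and later listings can be measured with a small sketch like the following. The function name time_listings is hypothetical, and the sketch only times os.listdir from the client side; it makes no claim about where the caching happens.

```python
import os
import time


def time_listings(directory, repeats=2):
    """List directory `repeats` times; return (entry_count, [seconds per listing])."""
    timings = []
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        count = len(os.listdir(directory))
        timings.append(time.perf_counter() - start)
    return count, timings
```

Run on a freshly mounted volume, the first element of the returned timings would correspond to the cold listing and the rest to the warm ones.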

-- 

-Kevan Benson
-A-1 Networks