[Gluster-users] Gluster's read performance

Fri Sep 21 06:52:16 UTC 2012

"Small files" is sort of a misconception. Initial file ops carry a small amount of overhead. On a lookup, the filename is hashed, the dht subvolume is selected, and the request is sent to that subvolume. If that subvolume is a replica set, the request is sent to each replica in the set (usually 2), and all the replicas have to respond. If one or more have pending flags or there's an attribute mismatch, either some self-heal action has to take place or a split-brain is determined. If the file doesn't exist on that subvolume, the same lookup must be sent to all the other subvolumes. If the file is found on one of them, a link file is made on the expected dht subvolume pointing to the place we found the file, which makes finding it faster the next time. Once the file is found and is determined to be clean, the file system can move on to the next file operation.
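To make the lookup path above concrete, here is a rough Python sketch. It is a toy model, not GlusterFS code: the `Subvolume` class, the CRC32-modulo hash and the query counter are all assumptions made for illustration, and replication and self-heal are left out entirely.

```python
import zlib


class Subvolume:
    """Hypothetical in-memory stand-in for one dht subvolume."""

    def __init__(self, name):
        self.name = name
        self.files = set()   # names actually stored here
        self.links = {}      # link files: name -> subvolume holding the data


def hashed_index(filename, count):
    # Illustrative hash only; GlusterFS uses its own hash and layout ranges.
    return zlib.crc32(filename.encode()) % count


def lookup(filename, subvols):
    """Return (subvolume holding the file or None, subvolumes queried)."""
    expected = subvols[hashed_index(filename, len(subvols))]
    queried = 1
    if filename in expected.files:
        return expected, queried
    if filename in expected.links:
        # Follow the link file left by an earlier lookup.
        return expected.links[filename], queried + 1
    # Miss on the hashed subvolume: every other subvolume has to be asked.
    found = None
    for sv in subvols:
        if sv is expected:
            continue
        queried += 1
        if filename in sv.files:
            found = sv
    if found is not None:
        # Leave a link file so the next lookup is cheaper.
        expected.links[filename] = found
    return found, queried
```

In this model a file that sits on the "wrong" subvolume costs one query per subvolume the first time and two afterwards, while a file that doesn't exist anywhere costs one query per subvolume every time.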

PHP applications, specifically, normally have a lot of small files that are opened for every page query, so that per-page overhead adds up. PHP also queries a lot of files that just don't exist. A single page might query 200 files that just aren't there: they're in a different portion of the search path, or they belong to a plugin that's not used, etc.

NFS mitigates that effect by using FScache in the kernel. It stores directories and stats, preventing the call to the actual filesystem. This also means, of course, that the image that was just uploaded through a different server isn't going to exist on this one until the cache times out. With a caching client in a multi-client system, stale data has to be expected.

Jeff Darcy created a test translator that caches negative lookups, which he said also mitigated the PHP problem pretty nicely.
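The general idea of a negative-lookup cache can be sketched in a few lines of Python. This is not Jeff's translator; the class name, the TTL parameter and the `backend_lookup` callable are all hypothetical, and the sketch only shows the technique of remembering "not found" answers for a short time.

```python
import time


class NegativeLookupCache:
    """Remember which paths were not found, for ttl_seconds each."""

    def __init__(self, backend_lookup, ttl_seconds=1.0, clock=time.monotonic):
        self._lookup = backend_lookup   # callable: path -> result or None
        self._ttl = ttl_seconds
        self._clock = clock
        self._missing = {}              # path -> time the entry expires

    def lookup(self, path):
        now = self._clock()
        expiry = self._missing.get(path)
        if expiry is not None:
            if now < expiry:
                return None             # cached miss: skip the backend
            del self._missing[path]     # entry expired, ask again
        result = self._lookup(path)
        if result is None:
            self._missing[path] = now + self._ttl
        return result

    def invalidate(self, path):
        # Call when a file is created so it becomes visible right away.
        self._missing.pop(path, None)
```

The trade-off is the same as with any cache: a file created through another client stays invisible here until the entry expires or is invalidated.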

If you have control over your app, things like absolute pathing for PHP or leaving file descriptors open can also avoid overhead. Reducing the number of times you open a file, or the number of files you open, can help as well.
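Why absolute paths help can be shown with a simplified model of a search path. The function below is a sketch, not PHP's actual resolver: `resolve_include`, the directory names and the injectable `exists` check are assumptions for illustration. Every existence check in it stands for a lookup the filesystem has to answer.

```python
import os
import posixpath


def resolve_include(name, include_path, exists=os.path.exists):
    """Return (resolved path or None, number of existence checks made)."""
    if posixpath.isabs(name):
        # Absolute path: exactly one lookup, no searching.
        return (name if exists(name) else None), 1
    checks = 0
    for directory in include_path:
        checks += 1
        candidate = posixpath.join(directory, name)
        if exists(candidate):
            return candidate, checks
    return None, checks


# Hypothetical layout: the file lives in the last search-path entry.
present = {"/app/lib/util.php"}
search_path = ["/usr/share/php", "/app/plugins", "/app/lib"]

relative = resolve_include("util.php", search_path, exists=present.__contains__)
absolute = resolve_include("/app/lib/util.php", search_path,
                           exists=present.__contains__)
```

Here the relative name needs three checks, two of them for files that don't exist, while the absolute name needs one.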

So "small files" really refers to the share of total file op time that's spent on overhead versus actual data retrieval.
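As a back-of-the-envelope illustration of that ratio, the snippet below computes the share of time spent on overhead for a given file size. The 2 ms overhead and 100 MB/s throughput are made-up numbers, not measurements of any Gluster setup.

```python
def overhead_fraction(overhead_s, size_bytes, throughput_bytes_per_s):
    """Share of total time spent on fixed per-file overhead."""
    transfer_s = size_bytes / throughput_bytes_per_s
    return overhead_s / (overhead_s + transfer_s)


# Assumed figures, for illustration only.
OVERHEAD_S = 0.002          # fixed cost per file operation
THROUGHPUT = 100_000_000    # bytes per second

small = overhead_fraction(OVERHEAD_S, 4_096, THROUGHPUT)       # 4 KB file
large = overhead_fraction(OVERHEAD_S, 5_000_000, THROUGHPUT)   # 5 MB file
```

With these assumed figures, overhead is roughly 98% of the time for the 4 KB file and roughly 4% for the 5 MB file, which is why the same fixed cost matters so much more for small files.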

Chandan Kumar <chandank.kumar at gmail.com> wrote:

>Hello All,
>
>I am new to gluster and evaluating it for my production environment. After
>reading some blogs and googling I learned that NFS mount at clients gives
>better read performance for small files and the glusterfs/FUSE mount gives
>better for large write operations.
>
>Now my questions are
>
>1) What do we mean by small files? 1KB/1MB/1GB?
>2) If I am using NFS mount at the client I am most likely losing the high
>availability feature of gluster, unlike fuse mount where if primary goes
>down I don't need to worry about availability.
>
>Basically my production environment will mostly have read operations of
>files ranging from 400KB to 5MB and they will be concurrently read by
>different threads.
>
>Thanks,
>Chandan