[Gluster-users] Gluster speed sooo slow

Fernando Frediani (Qube) fernando.frediani at qubenet.net
Mon Aug 13 11:16:03 UTC 2012


Talking to someone who works at a large ISP, I heard that they were trying to use GlusterFS for Maildir and had a hell of a time because of the many small files, with customers complaining all the time.
Latency is acceptable on a networked filesystem, but the results people are reporting go beyond any latency problem. They are due to the way Gluster is structured, which has already been confirmed by some people on this list, so changes to the code are indeed needed. Even on a Gigabit network the round trip isn't that much really (not more than a quarter of a ms), so it shouldn't be a big thing.
Yes, FUSE might also contribute to the lower performance, but the performance problems are still in the architecture of the filesystem.
One thing that is new to Gluster and that in my opinion could help performance is Distributed-Striped volumes, but that still doesn't work for all environments.
So as it stands, for Multimedia or Archive files it is fine; for other usages I wouldn't bet my chips and would rather test thoroughly first.

-----Original Message-----
From: Brian Candler [mailto:B.Candler at pobox.com] 
Sent: 13 August 2012 11:00
To: Fernando Frediani (Qube)
Cc: 'Ivan Dimitrov'; 'gluster-users at gluster.org'
Subject: Re: [Gluster-users] Gluster speed sooo slow

On Mon, Aug 13, 2012 at 09:40:49AM +0000, Fernando Frediani (Qube) wrote:
>    I think Gluster as it stands now and current level of development is
>    more for Multimedia and Archival files, not for small files nor for
>    running Virtual Machines. It requires still a fair amount of
>    development which hopefully RedHat will put in place.

I know a large ISP is using gluster successfully for Maildir storage - or at least was a couple of years ago when I last spoke to them about it - which means very large numbers of small files.

I think you need to be clear on the difference between throughput and latency.

Any networked filesystem is going to have latency, and gluster maybe suffers more than most because of the FUSE layer at the client.  This will show as poor throughput if a single client is sequentially reading or writing lots of small files, because it has to wait a round trip for each request.

However, if you have multiple clients accessing at the same time, you can still have high total throughput.  This is because the "wasted" time between requests from one client is used to service other clients.
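To put rough numbers on that, here is a minimal back-of-envelope sketch in Python. It is only a model, not a measurement: the 0.25 ms round trip is the figure mentioned elsewhere in this thread, while the five round trips per file and the backend ceiling of 5000 files/s are made-up values for illustration.

```python
def sequential_files_per_sec(round_trip_s, round_trips_per_file=1):
    """Upper bound on files/sec for one client that waits for each reply."""
    return 1.0 / (round_trip_s * round_trips_per_file)


def aggregate_files_per_sec(clients, round_trip_s, round_trips_per_file,
                            backend_limit):
    """Total files/sec for independent clients, capped by what the disks
    and network can sustain (backend_limit, an assumed figure)."""
    per_client = sequential_files_per_sec(round_trip_s, round_trips_per_file)
    return min(clients * per_client, backend_limit)


# Assumed values, for illustration only.
RTT = 0.00025           # 0.25 ms round trip
TRIPS_PER_FILE = 5      # hypothetical: lookup, create, write, fsync, close
BACKEND_LIMIT = 5000.0  # hypothetical ceiling in files/sec

one_client = sequential_files_per_sec(RTT, TRIPS_PER_FILE)
ten_clients = aggregate_files_per_sec(10, RTT, TRIPS_PER_FILE, BACKEND_LIMIT)
print(f"1 client: {one_client:.0f} files/s, 10 clients: {ten_clients:.0f} files/s")
```

With these assumptions a single sequential client tops out at roughly 800 files/s whatever the bandwidth, while ten clients together run into the backend ceiling instead of the round-trip time.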

If gluster were to do aggressive client-side caching then it might be able to make responses appear faster to a single client, but this would be at the risk of data loss (e.g.  responding that a file has been committed to disk, when in fact it hasn't).  But this would make no difference to total throughput with multiple clients, which depends on the available bandwidth into the disk drives and across the network.

So it all depends on your overall usage pattern. Only make your judgement based on a single-threaded benchmark if that's what your usage pattern is really going to be like: i.e.  are you really going to have a single user accessing the filesystem, with an application that reads or writes one file after the other rather than multiple files concurrently?
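If you want to check this against your own workload, something like the following Python sketch compares writing small files one at a time with writing them from several threads. It is a rough illustration, not a proper benchmark: the function names, the 4 KB file size and the worker count are all my own choices, and several threads on one client are not the same thing as several separate clients.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def write_small_file(directory, index, size=4096):
    """Write one small file and fsync it so the write really reaches storage."""
    path = os.path.join(directory, f"msg-{index:06d}")
    with open(path, "wb") as f:
        f.write(b"x" * size)
        f.flush()
        os.fsync(f.fileno())
    return path


def files_per_sec(directory, count, workers):
    """Write `count` small files using `workers` threads; return files/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda i: write_small_file(directory, i), range(count)))
    elapsed = time.perf_counter() - start
    return count / elapsed


def compare(directory, count=200, workers=8):
    """Run the same workload with 1 thread and with `workers` threads."""
    results = {}
    for label, n in (("sequential", 1), ("concurrent", workers)):
        # Separate subdirectory per run so the two runs do not collide.
        sub = tempfile.mkdtemp(prefix=label + "-", dir=directory)
        results[label] = files_per_sec(sub, count, n)
    return results
```

Call compare() with a directory on the mounted volume (whatever your mount point is) and look at the two figures. If the concurrent rate is much higher than the sequential one, per-request latency rather than bandwidth is what limits the single-threaded case.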

Regards,

Brian.


