[Gluster-devel] Re; Load balancing ...

Gareth Bult gareth at encryptec.net
Fri Apr 25 14:59:58 UTC 2008


Ok,

Well here's the thing. I've tried to apply Gluster in 8 different "real world" scenarios, and each time I've failed, either because of bugs or because "this simply isn't what GlusterFS is designed for".

Now, I approached each case fairly seriously and tried different combinations of read and write caches and different numbers of io-threads, trying each combination on both client and server to see what the differences / pros / cons were.
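
For reference, a typical client-side stack I was testing looked roughly like this (volume names and option values are illustrative, not my exact config):

  # client.vol - illustrative translator stack, not my production file
  volume remote1
    type protocol/client
    option transport-type tcp/client
    option remote-host server1          # first AFR mirror
    option remote-subvolume brick
  end-volume

  volume remote2
    type protocol/client
    option transport-type tcp/client
    option remote-host server2          # second AFR mirror
    option remote-subvolume brick
  end-volume

  volume afr
    type cluster/afr
    subvolumes remote1 remote2
  end-volume

  volume iot
    type performance/io-threads
    option thread-count 8               # varied between test runs
    subvolumes afr
  end-volume

  volume wb
    type performance/write-behind       # write cache
    option aggregate-size 1MB
    subvolumes iot
  end-volume

  volume ioc
    type performance/io-cache           # read cache
    option cache-size 64MB
    subvolumes wb
  end-volume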

Jumbo frames are a non-starter for me, as my production environment is Xen-based, which doesn't yet support jumbo frames.
(Not that I think it's remotely likely this would solve the problem.)

Suggesting that I'm either not tuning it properly or should be using an alternative filesystem is, I'm afraid, a bit of a cop-out. There are real problems here, and saying "yes, but Gluster is only designed to work in specific instances" is frankly a bit daft. If that were the case, then instead of a heavy sales pitch on the website along the lines of "Gluster is wonderful and does everything", it should say "Gluster will do x, y and z only".

Now, Zope is a long-standing web-based application server that I've been using for nearly 10 years, so telling me it's "excessive" really doesn't fly. Trying to back up a Gluster AFR volume with rsync runs into similar problems when you have lots of small files - it takes far longer than it should.

Moving to the other end of the scale, AFR can't cope with large files either .. handling of sparse files doesn't work properly, and self-heal has no concept of repairing part of a file .. so sticking a 20GB file on a GlusterFS volume is just asking for trouble, as every time you restart a Gluster server (or every time one crashes) the full-file resync will crucify your network.

Now, a few points:

a. With regards to metadata, given two volumes mirrored via AFR, please can you 
   explain to me why it's ok to do a data read operation against one node only, but not a metadata read 
   operation .. and what would break if you read metadata from only one volume?

b. Looking back through the list, Gluster's non-caching mechanism for acquiring filesystem information 
   seems to be at the root of many of its performance issues. Is there no mileage in trying to address 
   this issue?

c. If I stop one of my two servers, AFR suddenly speeds up "a lot"!
   Would it be so bad if there were an additional option, "subvolume-read-meta"?
   This would probably involve only a handful of additional lines of code, if that .. ? (See the rough sketch below.)
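
To be concrete about point c, I'm imagining something like this in the AFR definition (the option is purely hypothetical - the name is just a suggestion and nothing like it exists today):

  volume afr
    type cluster/afr
    subvolumes remote1 remote2
    # hypothetical: answer lookup()/stat() metadata requests from this
    # subvolume alone, instead of querying every subvolume on each lookup
    option subvolume-read-meta remote1
  end-volume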

Gareth.


----- Original Message -----
From: gordan at bobich.net
To: gluster-devel at nongnu.org
Sent: Friday, April 25, 2008 2:51:16 PM GMT +00:00 GMT Britain, Ireland, Portugal
Subject: Re: [Gluster-devel] Re; Load balancing ...

On Fri, 25 Apr 2008, Gareth Bult wrote:

>> lookup() will always have to be done on all the subvols
>> So the delay can not be avoided.
>
> Urm, each volume should be a complete copy .. so why would it need to 
> reference all subvolumes? It's not like it's trying to lock a file or 
> anything ... ??

From what I understand, a lookup() is necessary because the metadata _MUST_ 
in all cases be in sync. The files don't necessarily need to be synced, 
but the metadata does. And a directory with a lot of files will have a 
fair amount of metadata.

If you need to relax this constraint for performance reasons, you may want 
to look into something more like Coda - assuming you can live with 
limitations like 1MB of metadata per directory and permissions based on a 
paradigm fundamentally different to the POSIX owner/group system (due to its 
intended design as a global file system rather than a cluster file 
system).

> One very valid use case for Gluster is replication of read-mostly 
> volumes over slow links.

If your use case can be constrained to read-only, you may want to look 
into AFS.

> This makes Gluster unusable over slow links, which seems like 
> a fairly critical limitation.
> (indeed both AFR and Unify are unusable over ADSL .. this is probably 
> something worth mentioning a little more clearly in the documentation, 
> I've run into a number of people who've wasted quite a bit of time 
> thinking they were at fault ..)

Maybe I'm wrong, but this simply isn't what GlusterFS is designed for. 
There is no way to have POSIX semantics and locking without consistent 
metadata across all nodes. If you are happy to sacrifice POSIX-ness for 
performance and you want a less tightly coupled system, Coda may well do 
what you want.

> In addition, I was previously trying to use a gluster filesystem as a 
> platform to share a "Products" folder for a Zope instance. I couldn't 
> understand why zope was taking 7 minutes to start rather than 20 seconds 
> .. further investigation revealed it was purely down to filesystem 
> information access re: loading lots of small scripts.

That sounds excessive. How many small scripts was it looking up? Did this 
slowdown occur only the first time or every time it was started, even 
after the data was in sync? Was there any parallelism to the loading, or 
was it done in series? Have you applied any tuning parameters? Have you 
tried increasing the metadata expiry timeout?
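
For instance, something along these lines on the client side - option names are from memory, so check the io-cache documentation and glusterfs --help for the exact spellings in your version:

  volume ioc
    type performance/io-cache
    option cache-size 64MB      # read cache, illustrative value
    option cache-timeout 2      # seconds before cached data is revalidated
                                # (may be called force-revalidate-timeout in
                                # older releases)
    subvolumes afr              # whatever sits below it in your stack
  end-volume

  # The FUSE-level attribute/entry timeouts can also be raised at mount time
  # if your glusterfs build exposes them (e.g. --attribute-timeout /
  # --entry-timeout); the defaults are typically around 1 second.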

> This was on an AFR volume over a 1Gb network link .. THAT is how much 
> difference it makes, 20 seconds -> 7 minutes .. surely this level of 
> impact on filesystem performance deserves a little more priority?

You haven't mentioned what optimizations were applied (threads? caches? 
metadata timeouts?).

> Overall I think the modular design of gluster is fantastic and it fills 
> a critical need for users .. it's just a real shame that it comes up 
> short in a few key areas.  :-(

Didn't somebody mention here on the list recently that they got their app's 
start-up time down from over a minute to 1-2 seconds by changing their 
configuration (jumbo frames, trunking, and a separate replication vs. user 
VLAN)?

Gordan


_______________________________________________
Gluster-devel mailing list
Gluster-devel at nongnu.org
http://lists.nongnu.org/mailman/listinfo/gluster-devel




