[Gluster-devel] Re; Load balancing ...

gordan at bobich.net
Fri Apr 25 13:51:16 UTC 2008


On Fri, 25 Apr 2008, Gareth Bult wrote:

>> lookup() will always have to be done on all the subvols
>> So the delay can not be avoided.
>
> Urm, each volume should be a complete copy .. so why would it need to 
> reference all subvolumes? It's not like it's trying to lock a file or 
> anything ... ??

From what I understand, a lookup() is necessary because the metadata _MUST_ 
in all cases be in sync. The files don't necessarily need to be synced, 
but the metadata does. And a directory with a lot of files will have a 
fair amount of metadata.
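For anyone not familiar with how AFR is wired up: a client-side replication setup from the current releases looks roughly like the volfile sketch below (host, brick and volume names are made up). Since each protocol/client subvolume holds a complete copy, lookup() has to go to all of them so AFR can compare metadata and trigger self-heal where they disagree:

```
# Hypothetical client volfile -- names and hosts are placeholders.
volume remote1
  type protocol/client
  option transport-type tcp/client
  option remote-host server1        # first replica
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp/client
  option remote-host server2        # second replica
  option remote-subvolume brick
end-volume

volume afr
  type cluster/afr
  subvolumes remote1 remote2        # lookup() fans out to both
end-volume
```

On a slow link, every lookup() pays the round-trip to the slowest subvolume, which is why latency (rather than bandwidth) dominates with lots of small files.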

If you need to relax this constraint for performance reasons, you may want 
to look into something more like Coda - assuming you can live with 
limitations like 1MB of metadata per directory and permissions based on a 
paradigm fundamentally different from the POSIX owner/group system (due to 
its intended design as a global file system rather than a cluster file 
system).

> One very valid use case for Gluster is for replication for read-mostly 
> volumes over slow links.

If your use case can be constrained to read-only, you may want to look 
into AFS.

> This limitation makes gluster unusable over slow links, which seems like 
> a fairly critical limitation.
> (indeed both AFR and Unify are unusable over ADSL .. this is probably 
> something worth mentioning a little more clearly in the documentation, 
> I've run into a number of people who've wasted quite a bit of time 
> thinking they were at fault ..)

Maybe I'm wrong, but this simply isn't what GlusterFS is designed for. 
There is no way to have POSIX semantics and locking without consistent 
metadata across all nodes. If you are happy to sacrifice POSIX-ness for 
performance and you want a less tightly coupled system, Coda may well do 
what you want.

> In addition, I was previously trying to use a gluster filesystem as a 
> platform to share a "Products" folder for a zope instance. I couldn't 
> understand why zope was taking 7 minutes to start rather than 20 seconds 
> .. further investigation revealed it was purely down to filesystem 
> information access re: loading lots of small scripts.

That sounds excessive. How many small scripts was it looking up? Did this 
slowdown occur only the first time or every time it was started, even 
after the data was in sync? Was there any parallelism to the loading, or 
was it done in series? Have you applied any tuning parameters? Have you 
tried increasing the metadata expiry timeout?
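To be concrete about the sort of tuning I mean: stacking a few performance translators on top of the AFR volume on the client side can make a large difference for lots-of-small-files workloads. Something roughly like the fragment below (option names and values are from memory - double-check them against the docs for your release, and "afr" here is assumed to be your replicated volume):

```
# Hypothetical additions to a client volfile; values are guesses to tune.
volume readahead
  type performance/read-ahead
  option page-size 128kB          # amount read ahead per open fd
  subvolumes afr
end-volume

volume iocache
  type performance/io-cache
  option cache-size 64MB          # total data cached on the client
  option page-size 128kB
  subvolumes readahead
end-volume

volume iothreads
  type performance/io-threads
  option thread-count 8           # parallelise requests instead of serialising
  subvolumes iothreads-placeholder-fixme  # should be: iocache
end-volume
```

(Correction to the sketch above: the io-threads subvolume line should read "subvolumes iocache".) The io-threads translator in particular helps when an application loads many files in series, which sounds like the Zope case.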

> This was on an AFR volume over a 1Gb network link .. THAT is how much 
> difference it makes, 20 seconds -> 7 minutes .. surely this level of 
> impact on filesystem performance deserves a little more priority ?

You haven't mentioned what optimizations were applied (threads? caches? 
metadata timeouts?).

> Overall I think the modular design of gluster is fantastic and it fills 
> a critical need for users .. it's just a real shame that it comes up 
> short in a few key areas.  :-(

Didn't somebody mention here on the list recently that they got their app's 
performance from taking > 1 minute down to 1-2 seconds by changing their 
configuration (jumbo frames, trunking, and a separate VLAN for replication 
vs. user traffic)?

Gordan




