[Gluster-devel] Re; Load balancing ...

Gordan Bobic gordan at bobich.net
Fri Apr 25 20:28:22 UTC 2008


Gareth Bult wrote:

>> The fact that the file may have been deleted or modified when you try to 
>> open it. A file's content is a feature of the file. Whether the file is 
>> there and/or up to date is a feature of the metadata of the file and its 
>> parent directory. If you start loosening this, you might as well 
>> disconnect the nodes and run them in a deliberate split-brain case and 
>> resync periodically, with all the conflict and data loss that entails.
> 
> Whoa, hold up .. I'm specifically looking at metadata information, not trying to open the file. Even when you come to open the file, you may not be worried about the file being 1ms out of sync with the writer ... for example if I'm doing a:
> find /gluster -exec grep -Hi "some string" {} \;
> 
> I'd far rather it complete in 10 seconds and risk the 1ms than wait a couple of minutes.

Sounds like you'd gladly sacrifice correctness of the "answer" for 
speed. I don't think that's a trade-off most people would accept from a 
POSIX FS.

> Let's say I have my root filesystem running on Gluster (yes I know it's not bootable atm, but when they fix fuse mmap it should be) then I will have lots and lots of files that I want to scan / open / run that I will hardly ever change .. even better, let's say I want to keep a hot-swap mirror of my filesystem, technically gluster can easily do this .. so I use one machine as RW and use another to keep a copy .. then if my machine blows up, I can just switch to the second machine with no data loss. Well, I can't because with meta-data working the way it does, it's simply too slow.

Funny you should say that - I'm working on an OSR module for GlusterFS.
mmap writing support isn't implemented yet, but AFAIK, only mmap read is 
required for shared libraries to work. So it _should be_ usable as a 
root fs. I guess I'll find out for sure soon enough. :-)

> The documentation says when configuring Gluster "think like a programmer". Yet when I want to configure Gluster for a particular purpose, knowing full well the risks and that I can live with them, all of a sudden it's breaking the rules (!)

Can you cite an example? My apologies if I missed it earlier in the thread.

> [..]
> 
> I've tried pretty much every combination .. my current test setup is two data servers and one client running client AFR. I must admit however that I've not tried -a or -e on the client, but then the issue relates to the fact that it's querying both servers, not how long the client caches the information for ...

Have you tried with server-side AFR?
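
By server-side AFR I mean something roughly like the volume specs below. 
This is only a sketch from memory, untested, and the hostnames, paths and 
volume names (server-a, server-b, /data/export, etc.) are made up - 
server-b would carry the mirror-image spec with remote-host pointing back 
at server-a:

# server-a.vol - local brick, AFR with the peer, both exported
volume posix
  type storage/posix
  option directory /data/export
end-volume

volume peer
  type protocol/client
  option transport-type tcp/client
  option remote-host server-b
  option remote-subvolume posix
end-volume

volume afr
  type cluster/afr
  subvolumes posix peer
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option auth.ip.posix.allow *
  option auth.ip.afr.allow *
  subvolumes posix afr
end-volume

# client.vol - the client mounts a single server, no AFR of its own
volume mount
  type protocol/client
  option transport-type tcp/client
  option remote-host server-a
  option remote-subvolume afr
end-volume

That way each client lookup goes to one server, and the replication 
chatter stays on the server-to-server link.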

> I run DRBD to replicate my filesystem data .. currently on about 8 machines and about 20 DRBD volumes. This is all live and high volume, and after trying other approaches recently, I simply can't find anything that will do the job reliably, so using it is a bit of a no-brainer.

Indeed, DRBD is very good. I have several clusters running DRBD+GFS, but 
sometimes the tight coupling of RHCS/GFS is inappropriate (e.g. when 
disconnected operation may be required with an intermittent backup/mirror).

> That said I do not use any fencing or clustering, I simply do a manual switch in the event of a problem. I've been running this for many years and have never had an issue re: split brain or exploding servers - hence it's a usable solution; despite warnings and theories, it works where nothing else appears to.

Manual failover? How... 1970s...
For most of my systems, uptime is too important to go without automatic 
failover. And if you're going to have automatic failover, you might as 
well have load sharing with primary-primary operation to boot.
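
For what it's worth, the DRBD side of primary/primary is mostly just the 
net section of drbd.conf. A rough sketch on DRBD 8.x (resource name, 
devices and addresses are made up, and you'd still want proper fencing 
and a cluster FS such as GFS on top before letting both sides write):

resource r0 {
  protocol C;                          # synchronous - a must for dual-primary
  net {
    allow-two-primaries;               # permit both nodes to be primary at once
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;          # never guess when both diverged as primary
  }
  on node-a {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on node-b {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}

Then "drbdadm primary r0" on both nodes and GFS (or another cluster FS) 
on /dev/drbd0 - a plain ext3 mounted read-write on both sides will 
corrupt itself in short order.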

> The gluster / fuse base looks to be great ... what I'm probably going to do, once I get a little time, is try to build an alternative to AFR for use by us mugs who are prepared to risk the wrath of the great God Posixlock ... ;-)

Go for it. I can see that it might be a useful plug-in, especially in 
the case where clients can be read-only but need local copies for 
performance reasons. Essentially it would be more of a real-time file 
replication solution than a file system. But if that's what you are 
trying to do, then you might as well use NFS with a huge FS-Cache.
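
Something along these lines, assuming a kernel and nfs-utils with 
FS-Cache support and cachefilesd installed (server name and paths are 
made up; the cache size and location live in /etc/cachefilesd.conf):

# on each client: start the local disk cache, then mount with the fsc option
/etc/init.d/cachefilesd start
mount -t nfs -o ro,fsc nfs-server:/export/shared /mnt/shared

Reads get served from the local cache once it's warm, which covers the 
"read-mostly clients with local copies" case above.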

Gordan




