[Gluster-users] Separating reads and writes for performance benefit

Marcus Bointon marcus at synchromedia.co.uk
Thu Mar 22 19:23:51 UTC 2012


On 22 Mar 2012, at 15:14, Haris Zukanovic wrote:

> The intention is to use Gluster for replication/failover purposes.
> 1. To see uploaded files through the PHP application on all PHP servers, so that I can do seamless failover with my HTTP load balancer
> 2. To update the application code on all servers at the same time
> 3. To have a safe replica in case one brick fails
> 
> Do you think Gluster FS is suitable for all of these?

The first thing people usually try for this is bidirectional rsync triggered from a one-minute cron job. That works OK until you realise you can't handle deletes, because you can't distinguish a new file on one node from a file deleted on the other. So you look at csync2 and find that it doesn't work either! Then you run into gluster, which solves the consistency problem, but has poor small-file read performance.
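
For the record, the naive version usually looks something like this (the hostnames and paths are made up for illustration): a pair of cron jobs pulling in each direction, which is exactly where the delete ambiguity bites.

    # Hypothetical two-node setup: each web server cross-syncs the shared
    # docroot with its peer once a minute from cron.
    * * * * * rsync -a /var/www/shared/ web2:/var/www/shared/
    * * * * * rsync -a web2:/var/www/shared/ /var/www/shared/
    # Without --delete, removals never propagate; with --delete, a file that
    # was just uploaded on web2 but not yet pulled locally looks like a
    # deletion and gets wiped out again. Neither option is safe.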

I've been thinking that a mixture of gluster and rsync could give better performance, with some trade-offs. Use gluster to hold the fully-synced file system, but don't read from it directly: maintain a local copy of its contents on each node via rsync with --delete. You do end up storing everything twice, but it fixes both gluster's speed problem and rsync's delete problem.

The downside is that it's not synchronous, so there's a delay between a write on one node and that file appearing on the others. You could work around that by writing to all locations from your app; a typical web app doesn't do many writes, so that may be acceptable. If you're using IP or cookie-based stickiness in your balancer, the client will be reading from the node they just wrote to, and everyone else will see it shortly afterwards. Overall it's a bit like using MySQL with replicated slaves.
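
As a rough sketch of the rsync half (the mount point and paths here are just examples, not a recommendation): mount the gluster volume somewhere the web servers never read from, and pull it into a plain local directory on each node from cron.

    # Hypothetical layout on each web server:
    #   /mnt/gluster/files  - the replicated gluster mount (all writes go here)
    #   /var/www/files      - plain local copy the PHP app reads from
    # Pull the gluster view into the local read copy once a minute. --delete is
    # safe here because the gluster mount is the single source of truth, so a
    # missing file really is a deletion rather than a not-yet-synced upload.
    * * * * * rsync -a --delete /mnt/gluster/files/ /var/www/files/

The app would then write uploads into /mnt/gluster/files (and, if you want the uploading client to see them immediately, also straight into its own /var/www/files), which is the "writing to all locations" part above.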

Just a thought...

Marcus
-- 
Marcus Bointon
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK info at hand CRM solutions
marcus at synchromedia.co.uk | http://www.synchromedia.co.uk/
