[Gluster-users] Fitting gluster to my domestic use case

Pedro Côrte-Real pedro at pedrocr.net
Tue Feb 17 13:41:30 UTC 2015


Hi everyone,

I've been trying several data sync solutions (git-annex, syncthing)
without much success for my use case. I've been considering glusterfs
and am still not sure it will work out. Here's my situation:

- Three machines: laptop, server1 and server2, all with enough storage
to hold all my files
- server1 and server2 are each in their own gigabit LAN and only have
connectivity between themselves through residential level internet (so
combined they see something like 1Mb/s one way and 5Mb/s the other)
- laptop is almost always either in the same LAN as server1 or server2
usually connected through Wifi
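
To put those link speeds in perspective, here is a rough sketch of how
long it takes to move data between the two servers. It assumes "Mb/s"
means megabits per second and decimal gigabytes, and it ignores
protocol overhead; the function name is just for illustration.

```python
def transfer_hours(size_gb: float, link_mbit_per_s: float) -> float:
    """Hours needed to move size_gb gigabytes over a link_mbit_per_s link."""
    bits = size_gb * 8 * 1000**3                  # decimal gigabytes to bits
    seconds = bits / (link_mbit_per_s * 1000**2)  # megabits/s to bits/s
    return seconds / 3600


for rate in (1, 5):
    print(f"{rate} Mb/s: {transfer_hours(1, rate):.1f} h per GB")
```

So anything that has to wait on the link between server1 and server2
is measured in hours per gigabyte, not seconds.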

My requirements are:

1) Local work on the laptop should be as fast as local disk (no
blocking on network I/O) and work seamlessly offline
2) All three machines should be able to sync among each other (so that
if laptop has synced with server1 then server2 can get the changes
from server1 when laptop is off)
3) All content goes to all machines (so I have geographically
distributed copies)
4) (Ideally) all three machines can receive writes locally and sync
them to other machines (conflicts may need to be handled)
5) (Ideally) adding and removing machines is seamless (so I can add
more machines to the cluster for redundancy or bring up a new laptop
just by configuring everything and letting it sync)
6) (Ideally) snapshots are taken at regular intervals as a means of backup
7) (Ideally) some machines can be configured to not have the full set
of files so I can have say 20TB of files in the cluster in total and
see them all in my laptop even though it only has a 500GB local cache
of those files

From what I gather from the documentation and some experiments the
situation with glusterfs is the following:

- With normal replication (laptop, server1 and server2 form a cluster)
I get 2, 3, 4 and 6
- With geo-replication between laptop and each server I could get 1, 3 and 6
- With geo-replication between laptop and a cluster of server1 and
server2 I'd get 1, 2, 3 and 6 but possibly poor performance as the
cluster is running over the internet
- With a cluster of server1 and server2 accessed directly by the
laptop I'd get 2, 3, and 6 and in the future when better caching is
implemented I'd get 7
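
The comparison above can be tabulated with a small sketch that lists
which requirements each setup leaves unmet. The requirement sets are
copied from the list above (requirement 7 is left out of the last
setup since it depends on future caching work); the setup labels are
made up for illustration and are not gluster terminology.

```python
REQUIREMENTS = set(range(1, 8))  # requirements 1..7 from the list

# Requirements each setup is believed to satisfy today.
OPTIONS = {
    "replica: laptop+server1+server2": {2, 3, 4, 6},
    "geo-rep: laptop -> each server": {1, 3, 6},
    "geo-rep: laptop -> server1+server2 cluster": {1, 2, 3, 6},
    "cluster server1+server2, laptop as client": {2, 3, 6},
}


def missing(met):
    """Requirements not covered by a setup, in ascending order."""
    return sorted(REQUIREMENTS - met)


for name, met in OPTIONS.items():
    print(f"{name}: missing {missing(met)}")

# Requirements that no setup covers at all.
never_met = sorted(REQUIREMENTS - set().union(*OPTIONS.values()))
print("met by no option:", never_met)
```

Laid out this way, no single setup gives both 1 and 4, and 5 and 7
are not covered by any of them.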

None of these solutions seem ideal and maybe glusterfs just doesn't
work for my use case and I need to find something else or change my
use case.

Is there anything else I could do that would work better than what
I've figured out already?

Thanks,

Pedro

