[Gluster-devel] Spurious disconnections / connectivity loss

Gordan Bobic gordan at bobich.net
Mon Feb 1 13:45:51 UTC 2010


Daniel Maher wrote:
> Gordan Bobic wrote:
> 
>> That's hardly unexpected. If you are using client-side replicate, I'd 
>> expect to see the bandwidth requirements multiply with the number of 
>> replicas. For all clustered configurations (not limited to glfs) I use 
>> a separate LAN for cluster communication to ensure best possible 
>> throughput/latencies, and specifically in case of glfs, I do server 
>> side replicate so that the replicate traffic gets offloaded to that 
>> private cluster LAN, so the bandwidth requirements to the clients can 
>> be kept down to sane levels.
> 
> If you're willing to describe your setup further, i'd love to hear about 
> it.  I'm currently using client-side replication, and for the reasons of 
> scalability you described, i'd like to move to a server-side replication 
> setup.
> 
> In particular, i'm interested in how you handle the connectivity between 
> the clients and the servers vis-à-vis load balancing (if any) and 
> availability.  For example, in your configuration, how does a given 
> client « know » which server to speak to, and what happens if that 
> server becomes inaccessible ?
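
For reference, the server-side replicate layout I described can be 
sketched as a volfile along these lines (a minimal sketch in the 
2.x-era volfile syntax; hostnames, paths and the volume names are 
illustrative, and server2 would carry the mirror-image config):

```
# server1 volfile -- names and addresses are illustrative
volume posix
  type storage/posix
  option directory /data/export
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

# connection to the peer's brick over the private cluster LAN
volume server2-brick
  type protocol/client
  option transport-type tcp
  option remote-host server2.cluster.lan
  option remote-subvolume locks
end-volume

# replication happens here, on the server, so the mirror traffic
# stays on the cluster LAN rather than multiplying client bandwidth
volume replicate
  type cluster/replicate
  subvolumes locks server2-brick
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.replicate.allow *
  subvolumes replicate
end-volume
```

Clients then mount the exported "replicate" volume from either server 
and only ever see one copy of the traffic.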

Since client connections persist, you could use something as naive as 
DNS round-robin load balancing - it'll do a good enough job in most 
cases if you have lots of clients. Fail-over is trickier. NFS over UDP 
(with unfsd) copes relatively gracefully with just failing the IP over 
to one of the surviving servers (use something like RedHat Cluster to 
handle the IP resource fail-over). Unfortunately, the glfs protocol 
itself doesn't handle disconnects gracefully - you just end up with a 
"transport endpoint not connected" error and have to umount+remount to 
get the volume back, which is messy and most definitely not 
transparent. The obvious disadvantage of unfsd is performance (and 
it's pretty dire, no two ways about it), although as I mentioned in a 
thread here a while back, the glfs protocol for client connections 
doesn't seem to yield noticeable benefits over unfsd, due to its own 
fuse overheads.
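
Concretely, the round-robin part is just multiple A records for one 
name (a sketch; names and addresses are made up):

```
; BIND zone fragment -- clients mount "glusterfs" and get either server
glusterfs    IN  A  192.168.10.11
glusterfs    IN  A  192.168.10.12
```

and the unfsd fail-over is a floating IP carried as a RedHat Cluster 
(rgmanager) service, something like this cluster.conf fragment (again 
illustrative, not a drop-in config):

```xml
<!-- service carrying the virtual IP the NFS clients mount from -->
<rm>
  <service autostart="1" name="nfs-vip">
    <ip address="192.168.10.100" monitor_link="1"/>
  </service>
</rm>
```

When a node dies, rgmanager brings the IP up on a survivor and the UDP 
NFS clients carry on without a remount.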

Performance translators help a lot, but unfortunately, last time I 
tested, they destabilized things too much and I had to remove them.

> I realise that this may not necessarily be appropriate discussion for 
> the devel mailing list, but iirc, you're not on the user list, hence the 
> reply here.

Yeah, I should probably sign up to the users list at some point. Most of 
my posts are about possible bug reports, which wouldn't be particularly 
useful on the users list, so I never bothered signing up to it.

Gordan
