[Gluster-devel] AFR-over-Unify (instead of the other way around) ?

Wed May 30 14:05:46 UTC 2007

Hello,

First: Thanks!
After spending hours looking for a distributed fault-tolerant filesystem
that would allow me to set up LVS+HA without a mandatory (and dreaded)
"shared block-device", I eventually stumbled on GlusterFS, which
sounds like the most promising filesystem with regard to: performance,
fault-tolerance, no metadata-server requirement, ease of installation
(cf. no kernel patch/module), ease of configuration, and ease of
recovery (cf. easily accessible underlying filesystem)... Just great!!!

Simulating a 12-node GlusterFS setup on a single server (cf. Julien
Perez's example), I have successfully unified (cluster/unify) 4 replicated
(cluster/afr) groups of 3 bricks each (cluster/unify OVER cluster/afr);
fault-tolerance is working as expected (and I will warmly welcome the
AFR auto-healing feature from version 1.4 :-D )
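
The Unify-over-AFR layout described above can be pictured with a small toy
model in Python. This is only a sketch, not GlusterFS code: the brick names
and both helper functions are hypothetical, and it assumes that a file lives
on exactly one AFR group and stays reachable as long as at least one brick
of that group is up.

```python
# Toy model of cluster/unify OVER cluster/afr: 4 AFR groups of 3 bricks each.
# All names are hypothetical; this is not GlusterFS code.

GROUPS = [[f"brick{g}{r}" for r in "abc"] for g in range(1, 5)]


def afr_group_up(group, down):
    """Assumption: an AFR group is usable while at least one replica is up."""
    return any(brick not in down for brick in group)


def file_available(group_index, down):
    """A file stored on one AFR group is reachable iff that group is usable."""
    return afr_group_up(GROUPS[group_index], down)


if __name__ == "__main__":
    down = {"brick1a", "brick1b"}
    print(file_available(0, down))  # one replica of group 1 is still up
    down.add("brick1c")
    print(file_available(0, down))  # all replicas of group 1 are down
    print(file_available(1, down))  # files on the other groups are unaffected
```

In this model, losing two of the three bricks in a group leaves its files
reachable; only losing all three makes that group's files unavailable.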

Now, my question/problem/suggestion:

Is it possible to work the aforementioned scenario the other way around,
in other words: cluster/afr OVER cluster/unify?
My tests - using the branch 2.4 TLA check-out - *seem* to show that it
is not:
- if a node is down, the RR-scheduler does not skip to the next
available node in the unified subvolume
- after bringing all redundant nodes down (for one given file) and back
up again, the system does not behave consistently: bringing just *one*
node (the "right" one) down again makes the file/partition unavailable
again (though the two other nodes are up)
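
To make the first point concrete, here is a minimal Python sketch of the
behaviour one might expect from a "failing-over" round-robin scheduler. It
is a hypothetical illustration, not the actual GlusterFS RR-scheduler: the
class name, the `schedule` method and the `is_up` callback are all
assumptions made for this example.

```python
# Sketch of a round-robin scheduler that skips unavailable nodes.
# Hypothetical illustration; not the actual GlusterFS RR-scheduler.


class FailoverRoundRobin:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.next_index = 0

    def schedule(self, is_up):
        """Return the next node for which is_up(node) is true, or None."""
        for offset in range(len(self.nodes)):
            index = (self.next_index + offset) % len(self.nodes)
            node = self.nodes[index]
            if is_up(node):
                # Resume the rotation right after the node just chosen.
                self.next_index = (index + 1) % len(self.nodes)
                return node
        return None  # every node in the unified subvolume is down


if __name__ == "__main__":
    down = {"n2"}
    sched = FailoverRoundRobin(["n1", "n2", "n3"])
    print([sched.schedule(lambda n: n not in down) for _ in range(4)])
```

With "n2" down, the rotation alternates between "n1" and "n3" instead of
failing whenever it reaches the unavailable node.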

Wouldn't AFR-over-Unify be useful?
The benefits would be:
 - a single node could be added to one of the AFR-ed "unified stripes",
thus avoiding the need to group AFR-ed nodes with one another as in
the Unify-over-AFR case
 - provided enough nodes AND a "failing-over" scheduler, AFR-healing
might not be needed, since AFR could "always" find a node to write to in
each "unified stripe" (since all nodes would need to be down within a
"unified stripe" for it to completely fail)

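The second benefit can be illustrated with another toy model in Python.
Again, this is only a sketch under stated assumptions, not GlusterFS code:
the node names and helper functions are hypothetical, and it assumes a
scheduler that fails over to any node that is still up within a "unified
stripe". It only covers writing new files; it says nothing about existing
files whose node in a given stripe is down.

```python
# Toy model of cluster/afr OVER cluster/unify: 3 "unified stripes" of 4 nodes.
# Hypothetical names; assumes the scheduler fails over to any node that is up.

STRIPES = [[f"s{s}n{n}" for n in range(1, 5)] for s in range(1, 4)]


def stripe_writable(stripe, down):
    """With a failing-over scheduler, a stripe accepts a write if any node is up."""
    return any(node not in down for node in stripe)


def replicas_written(down):
    """Number of stripes (i.e. AFR copies) a new file could be written to."""
    return sum(stripe_writable(stripe, down) for stripe in STRIPES)


if __name__ == "__main__":
    down = {"s1n1", "s1n2", "s1n3"}
    print(replicas_written(down))  # stripe 1 still has one node up
    down.add("s1n4")
    print(replicas_written(down))  # stripe 1 is now completely down
```

In this model a new file only loses a replica once every node of a stripe
is down, which is the condition stated in the second point above.
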
Now, maybe you'll tell me this scenario is inherently flawed :-/ (and I
just made a fool of myself :-D )

PS: I thought about nesting a unified GlusterFS within another
AFR-ed GlusterFS... but I didn't try it, for I don't think it would be
very elegant (or performant?)... unless you tell me that's the way to
go :-)

Thank you for your feedback,

-- 

Cédric Dufour @ IDIAP Research Institute