[Gluster-devel] AFR-over-Unify (instead of the other way around) ?

Thu May 31 16:16:43 UTC 2007

Hello again,

Thank you very much for your feedback.

@ Krishna: your clarifications about AFR's behavior are very useful;
wouldn't they belong on your translators documentation (wiki) page? I
actually struggled to understand why the AFR Cluster Example was
configured that way (using both a 'brickN' and a 'brickN-afr'); it is
now clear.

@ Avati: that's what I was thinking of (and expecting); that'd be great!
Now, if you are working on that, would it be possible, from a
write-operations point of view, to have AFR still treat a unified child
as binary: either unavailable (all nodes down) or available (at least
one node up)? This would require the unifying scheduler to "fail over"
from a failing node to the next potentially active node (which could be
some extension of the 'rr.limits.min-free-disk' option, as I understand
it; e.g. 'rr.fail-over true').
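A toy Python model of what I mean (purely illustrative, not GlusterFS
code: the Node/Unify/Afr classes are made up, and the fail_over flag
stands in for the hypothetical 'rr.fail-over' option, which does not
exist today):

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A single storage brick (toy model)."""
    name: str
    up: bool = True
    files: dict = field(default_factory=dict)


class Unify:
    """Toy cluster/unify with a round-robin scheduler.

    fail_over=True models the hypothetical 'rr.fail-over true' option:
    skip nodes that are down instead of failing the create.
    """

    def __init__(self, nodes, fail_over=False):
        self.nodes = nodes
        self.fail_over = fail_over
        self.next = 0

    def available(self):
        # Binary view for AFR: usable as long as at least one node is up.
        return any(n.up for n in self.nodes)

    def create(self, path, data):
        # Try each node at most once, starting at the round-robin cursor.
        for _ in range(len(self.nodes)):
            node = self.nodes[self.next]
            self.next = (self.next + 1) % len(self.nodes)
            if node.up:
                node.files[path] = data
                return node.name
            if not self.fail_over:
                # Without fail-over, landing on a down node fails the write.
                raise OSError(f"{node.name} is down")
        raise OSError("all nodes are down")


class Afr:
    """Toy cluster/afr replicating each create to every available child."""

    def __init__(self, children):
        self.children = children

    def create(self, path, data):
        written = [c.create(path, data) for c in self.children if c.available()]
        if not written:
            raise OSError("no child available")
        return written


a = [Node("a1", up=False), Node("a2"), Node("a3")]
b = [Node("b1"), Node("b2"), Node("b3")]
afr = Afr([Unify(a, fail_over=True), Unify(b, fail_over=True)])
print(afr.create("/f", b"x"))  # ['a2', 'b1']
```

With a1 down, the write lands on a2 and b1 instead of failing; with
fail_over=False, the same write raises as soon as the round-robin lands
on a1, which is roughly what I saw in my tests.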

But here I am talking, and I'm not the one doing the coding, am I?! :-D

Best regards

Cédric Dufour @ IDIAP Research Institute

Anand Avati wrote:
> Currently AFR treats its children's failure as binary: 0 or 1. If one
> fails completely, it shifts over to the other. This behaviour is being
> fixed and the next release should work smoothly for you. Thanks for
> your input!
>
> avati
>
> 2007/5/31, Krishna Srinivas <krishna at zresearch.com>:
>> Hi Cedric,
>>
>> This setup (afr over unify) is possible, but afr and unify were not
>> designed to work that way. In fact, we had thought about this kind of
>> setup.
>>
>> AFR's job is to replicate the files among its children. AFR expects
>> its first child to have all the files (the replicate *:n option creates
>> at least one copy of each file, which would be on the first child). All
>> read-only operations (like readdir(), read(), getattr(), etc.) happen on
>> AFR's first child. This being the case, if AFR's children are unify1
>> and unify2 and one of unify1's nodes goes down, the user will not be
>> able to access the files that were on the node that went down. If the
>> whole of unify1 goes down, AFR will fail over and do read operations
>> from unify2.
>>
>> Regards,
>> Krishna
>>
>> On 5/30/07, "Cédric Dufour @ IDIAP Research Institute"
>> <cedric.dufour at idiap.ch> wrote:
>> > Hello,
>> >
>> > First: Thanks!
>> > After spending hours looking for a distributed fault-tolerant
>> > filesystem that would allow me to set up LVS+HA without a mandatory
>> > (and dreaded) "shared block-device", I eventually stumbled on
>> > GlusterFS, which sounds like the most promising filesystem with
>> > regard to performance, fault-tolerance, no metadata-server
>> > requirement, ease of installation (cf. no kernel patch/module), ease
>> > of configuration, and ease of recovery (cf. easily accessible
>> > underlying filesystem)... Just great!!!
>> >
>> > Simulating a 12-node GlusterFS on the same server (cf. Julien Perez's
>> > example), I have successfully unified (cluster/unify) 4 replicated
>> > (cluster/afr) groups of 3 bricks each (cluster/unify OVER
>> > cluster/afr); fault-tolerance is working as expected (and I will
>> > warmly welcome the AFR auto-healing feature from version 1.4 :-D )
>> >
>> > Now, my question/problem/suggestion:
>> >
>> > Is it possible to set up the aforementioned scenario the other way
>> > around, in other words: cluster/afr OVER cluster/unify?
>> > My tests, using the branch 2.4 TLA checkout, *seem* to show
>> > otherwise:
>> > - if a node is down, the RR scheduler does not skip to the next
>> >   available node in the unified subvolume
>> > - after bringing all redundant nodes (for a given file) down and back
>> >   up again, the system does not behave consistently: bringing just
>> >   (the "right") *one* node down again makes the file/partition
>> >   unavailable again (though the two other nodes are up)
>> >
>> > Wouldn't AFR-over-Unify be useful?
>> > The benefits would be:
>> >  - a single node could be added to one of the AFR-ed "unified
>> >    stripes", thus removing the need to group AFR-ed nodes with one
>> >    another as in the Unify-over-AFR case
>> >  - provided enough nodes AND a "failing-over" scheduler, AFR healing
>> >    might not be needed, since AFR could "always" find a node to write
>> >    to in each "unified stripe" (all nodes within a "unified stripe"
>> >    would have to be down for it to fail completely)
>> >
>> > Now, maybe you'll tell me this scenario is inherently flawed :-/ (and
>> > I just made a fool of myself :-D )
>> >
>> > PS: I thought about nesting a unified GlusterFS within another AFR-ed
>> > GlusterFS... but I didn't do it, because I don't think it would be
>> > very elegant (or performant?)... unless you tell me that's the way to
>> > go :-)
>> >
>> > Thank you for your feedback,
>> >
>> > --
>> >
>> > Cédric Dufour @ IDIAP Research Institute
>> >
>> >
>> > _______________________________________________
>> > Gluster-devel mailing list
>> > Gluster-devel at nongnu.org
>> > http://lists.nongnu.org/mailman/listinfo/gluster-devel
>> >