[Gluster-devel] afr logic

Thu Oct 18 17:40:42 UTC 2007

Thanks for the explanations, Krishna. So, according to your mail:

"The version attribute is incremented during the close() call."

If afr is on the client side, then the version attribute of a file that was
open on the crashed server node has not been incremented, and during repair
the file is copied from the latest correctly closed version => logically, the
repaired replica will be in a consistent state, because a node crash usually
corrupts only files that were not closed before the crash (the same goes for
directories). That is what I wanted to hear, thanks.

In the case of server-side afr, the version attribute is not incremented on
all replicas if the crashed node is the master, which is much worse, because
after the crash the replicas can be left with different data but the same
version attribute.
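
To make the two cases concrete, here is a toy Python model of version-based
healing. It is only a sketch of my understanding, not GlusterFS code: the
Replica class, close_file() and heal() are hypothetical names, and the
"version" field merely stands in for the trusted.afr.version attribute.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    version: int  # stands in for trusted.afr.version (assumption)
    data: bytes


def close_file(reachable):
    """Bump the version on every replica still reachable at close()."""
    for r in reachable:
        r.version += 1


def heal(replicas):
    """Overwrite every replica that is behind the highest version.

    Returns the names of the replicas that were overwritten.
    """
    source = max(replicas, key=lambda r: r.version)
    healed = []
    for r in replicas:
        if r.version < source.version:
            r.data, r.version = source.data, source.version
            healed.append(r.name)
    return healed


def demo():
    # Case 1: node "b" crashes mid-write, so close() only reaches "a".
    a = Replica("a", 1, b"new")
    b = Replica("b", 1, b"ne")  # partial write on the crashed node
    close_file([a])
    healed_1 = heal([a, b])
    consistent_1 = a.data == b.data

    # Case 2: the data diverged but no version was bumped anywhere.
    c = Replica("c", 1, b"new")
    d = Replica("d", 1, b"ne")
    healed_2 = heal([c, d])
    diverged_2 = c.data != d.data
    return healed_1, consistent_1, healed_2, diverged_2
```

In the first case the stale replica is detected and overwritten; in the
second case heal() sees equal versions and does nothing, so the replicas stay
different.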

Stripe gives me about a 20% performance increase on my artificial test, and
that is valuable to me, so I'll reconfigure RAID 0+1 into RAID 10 to avoid
full file duplication after a crash.
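
A rough estimate of the difference, as a small Python sketch. It assumes the
stripe translator spreads a file evenly over its bricks and that exactly one
brick fails; heal_bytes() and the layout names are my own, not anything from
glusterfs.

```python
def heal_bytes(file_size, stripe_width, layout):
    """Approximate bytes re-copied for one file after a single brick failure.

    Assumes an even split of the file across stripe_width bricks.
    """
    if layout == "afr-over-stripe":   # RAID 0+1: the whole file is replaced
        return file_size
    if layout == "stripe-over-afr":   # RAID 10: only the failed brick's part
        return -(-file_size // stripe_width)  # ceiling division
    raise ValueError(f"unknown layout: {layout}")


def compare(file_size, stripe_width):
    """Return (RAID 0+1 cost, RAID 10 cost) in bytes."""
    return (
        heal_bytes(file_size, stripe_width, "afr-over-stripe"),
        heal_bytes(file_size, stripe_width, "stripe-over-afr"),
    )
```

For a 1 GB file striped over 4 bricks, compare(1_000_000_000, 4) gives
1_000_000_000 bytes for RAID 0+1 against 250_000_000 bytes for RAID 10.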

regards, Alexey.

On 10/18/07, Krishna Srinivas <krishna at zresearch.com> wrote:
>
> On 10/18/07, Kevan Benson <kbenson at a-1networks.com> wrote:
> > Alexey Filin wrote:
> > > ops, my English...
> > >
> > > the question is: in your terms I have RAID 0+1 (RAID 10 is a trivial
> > > case), i.e. afr over stripe. If one brick is repaired, does its
> > > complement have to be repaired too? I have some doubts about how the
> > > stripe algorithm slices files: if it depends on parameters that are not
> > > equal across replicas (e.g. load average), then the complement has to
> > > be copied too, even if the stripe configuration for both replicas is
> > > the same. Even if that is so, I want to hear it explicitly from the
> > > glusterfs team, with a promise not to change the policy!
> > >
> > > Regards, Alexey.
> >
> > http://gluster.org/docs/index.php/GlusterFS_FAQ#Why_is_striping_bad.3F
> > http://gluster.org/docs/index.php/GlusterFS_Translators_v1.3#Stripe_Translator
> >
> > Like the AFR translator, the stripe translator lets you specify what
> > behavior applies to particular files, so you can specify stripe size per
> > file type/name.
> >
> > I haven't done anything with striping, and according to the FAQ it's
> > not recommended.  Since I haven't done anything with it myself, I can't
> > say whether their concerns about the overhead are dwarfed by the
> > bandwidth limitations of the medium when using Ethernet.
> >
> > As for the repairing question, if you stripe AFRs, any brick that had a
> > problem would have a problem with a *portion* (stripe) of a file, but
> > that portion is AFR'd somewhere else and will be repaired (I assume).
> > If you AFR stripes, that means that if there's a problem in a brick,
> > it's a portion (stripe) of a file, and there's no copy to repair from.
> > There's a copy of the full file, but that doesn't mean the AFR knows one
> > of the AFR's files has a problem; its trusted.afr.version attribute is
> > probably the same as the good file's.  The good file is there, but it may
> > not be what's accessed by default, as the AFR probably has no method of
> > determining it's not a valid copy if the trusted.afr.version attributes
> > match.
>
> Correct. If AFR has 2 children of type cluster/stripe and one of the
> bricks goes down, the entire stripe it is part of goes down. The other
> stripe of the AFR will continue to work in the normal way. When the downed
> stripe comes back up and open() happens, the entire file will be replaced
> from the latest version.
> Think about having stripe on afr: now healing will happen on the partial
> files in case any brick goes down and comes back up.
>
> >
> > You know, I find myself doing quite a lot of assuming today.  Hopefully
> > someone on the gluster team will review all this and let me know where
> > I'm correct, clarify where I'm ambiguous, and most importantly, let me
> > know when I'm talking out my ass.
> >
>
> No, you are doing fine, thanks for your help on the mailing list :-)
>
> Thanks
> Krishna
>
> > --
> >
> > -Kevan Benson
> > -A-1 Networks
> >
>