[Gluster-devel] afr and self-heal issues

Fri Aug 3 20:21:02 UTC 2007

Hi Nathan,

On 8/3/07, Nathan Allen Stratton <nathan at robotics.net> wrote:
>
> My setup is 3 servers each with 3 volumes:
>
> vs0 ns brick-a mirror-c
> vs1 ns brick-b mirror-a
> vs2 ns brick-c mirror-b
>
> I afr the three ns bricks into block-ns-afr with replicate *:3, and afr
> each brick-(a-c) with its mirror-(a-c) into block-(a-c)-afr with
> replicate *:2.
>
> I then unify block-(a-c)-afr into share-unify with option namespace
> block-ns-afr.
>
> If a server goes down things lock up and then crash (known issue that the
> gluster guys are working on).

What tla version are you using? I think this problem has been fixed.

> If I leave one server (vs0) off and restart
> the crashed servers, I can write to share-unify. I would expect files
> going to block-b-afr to land in vs1 brick-b and vs2 mirror-b, and that is
> exactly what happens. Unify is using the rr scheduler, so as expected files
> are also sent to block-c-afr. Server vs2 brick-c gets the block-c-afr
> files, but the odd part is that so does vs1 mirror-a.
>
> Why would that happen? block-c-afr is made up of vs2 brick-c and vs0
> (server that is down) mirror-c.
>

block-a-afr, block-b-afr and block-c-afr are the subvolumes of unify.
When all the servers are up, all three subvolumes are up and files are
scheduled across all of them. When you bring down vs0 (the first
server), all three subvolumes still stay up. block-a-afr stays up
because the second server, vs1, which holds mirror-a, is still running.
New files therefore continue to be scheduled on block-a-afr, which is
why you see files in mirror-a on vs1. They are not files that were
scheduled on block-b-afr or block-c-afr.
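
The same reasoning can be sketched in a few lines of Python. This is only
an illustrative model, not GlusterFS code. It assumes the layout implied by
the rest of the thread (vs0 holds brick-a and mirror-c, vs1 holds brick-b
and mirror-a, vs2 holds brick-c and mirror-b), and the names SERVERS,
AFR_VOLUMES, live_children and schedulable are made up for the example.

```python
# Assumed brick placement: server -> bricks it exports (ns bricks omitted).
SERVERS = {
    "vs0": {"brick-a", "mirror-c"},
    "vs1": {"brick-b", "mirror-a"},
    "vs2": {"brick-c", "mirror-b"},
}

# Each AFR volume pairs a brick with its mirror on another server.
AFR_VOLUMES = {
    "block-a-afr": ("brick-a", "mirror-a"),
    "block-b-afr": ("brick-b", "mirror-b"),
    "block-c-afr": ("brick-c", "mirror-c"),
}


def live_children(down):
    """Map each AFR volume to the (server, brick) pairs that are still reachable."""
    result = {}
    for afr, children in AFR_VOLUMES.items():
        alive = []
        for child in children:
            for server, bricks in sorted(SERVERS.items()):
                if child in bricks and server not in down:
                    alive.append((server, child))
        result[afr] = alive
    return result


def schedulable(down):
    """AFR volumes that stay up: those with at least one live child."""
    return [afr for afr, alive in live_children(down).items() if alive]


if __name__ == "__main__":
    # With vs0 down, every AFR volume still has one live child.
    for afr, alive in live_children({"vs0"}).items():
        print(afr, "->", alive)
```

With vs0 down, block-a-afr is left with only mirror-a on vs1 and
block-c-afr with only brick-c on vs2, so both remain valid targets for the
scheduler.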

> Also, when I bring back up vs0, I would expect ns to be brought back up to
> date with the others since it was part of the afr *:3, but it is not. I
> also would expect that files part of block-c-afr that are in vs2 brick-c
> would also be copied to vs0 mirror-c, but that also does not happen.

For self-heal to happen, an open() call has to be issued on the file.
Just run:

    find . -exec od -N 1 {} > /dev/null \;

That should self-heal the files.
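
If a script is more convenient than the one-liner, the same idea (open
every file and read one byte) might look like the sketch below. The
function name touch_all is made up for the example, and unlike the find
command it only visits regular files.

```python
import os


def touch_all(mount_point):
    """Open every regular file under mount_point and read one byte.

    Returns the number of files that were opened successfully.
    """
    opened = 0
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip FIFOs, sockets and dangling symlinks
            try:
                with open(path, "rb") as handle:
                    handle.read(1)  # mirrors `od -N 1`; the open() is the point
                opened += 1
            except OSError as err:
                print(f"skipped {path}: {err}")
    return opened
```

Call it with the path where the volume is mounted on the client, for
example touch_all("/mnt/glusterfs") (that path is only a placeholder).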

>
> Also, I was playing around with stripe, does it work in the latest code?
> If I edit my configs and comment out my unify and replace it with stripe I
> only get what looks like unify, but without the namespace requirements.
> I.e. no matter what I put for block-size, my files are still their normal
> 300 or so megs. Is the issue that I am using it server side rather than
> client side?
>
> Any ideas?
>
> Full configs are at:
>
> http://share.robotics.net/client.vol
> http://share.robotics.net/vs0_server.vol
> http://share.robotics.net/vs1_server.vol
> http://share.robotics.net/vs2_server.vol
>
>
>
> ><>
> Nathan Stratton                         CTO, Voila IP Communications
> nathan at robotics.net                  nathan at voilaip.com
> http://www.robotics.net                 http://www.voilaip.com
>