[Gluster-devel] self-heal behavior

DeeDee Park deedee6905 at hotmail.com
Mon Jul 9 20:36:23 UTC 2007


Just some more info for you. I'm glad to see there has been thought put into 
some sort of "backgrounding" of the AFR. I saw massive activity across all 
bricks when I started using AFR (with 750GB of user data, 270K files). I was 
doing an rsync to update the glusterfs data from the master file server. So 
anytime anyone wants to increase the number of replicas and then go through 
all the files (e.g. using an rsync), there will be a lot of activity going 
on.
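
For reference, the update was along these lines (the paths here are 
illustrative, not my actual layout):

  rsync -a --progress /master/data/ /mnt/glusterfs/data/

Every file the rsync touched became a candidate for replication, which is 
why the activity hit all the bricks at once.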

I don't like the performance hit, and it is not "critical" that all the 
replicas happen right away -- in fact, it is extremely low priority. My 
highest priority is low latency to my master files. It seems that the best 
time to make the replica is not on the open(), but maybe after some talk to 
a "scheduler" xlator that is also watching bandwidth availability.

I want to bring bandwidth into the picture, because making replicas across a 
WAN connection makes sense for disaster recovery. In the case of bricks out 
of sync with email files (where the replication traffic could chew up all 
available WAN bandwidth), maybe the AFR, plus a separate "garbage cleanup" 
or file-syncing pass, could be handled by another process working in the 
background. I can foresee that with large datasets and network outages, 
bricks will fall out of sync and have to resync, and that the resyncing is 
never "finished". Taking a large performance hit, or putting a heavy demand 
on WAN bandwidth, right when a network outage is fixed would be horrible.
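
Here is a minimal sketch of the kind of throttling I mean -- a token bucket 
in C, with all names invented and no relation to actual GlusterFS code:

  #include <stddef.h>
  #include <time.h>
  #include <unistd.h>

  struct throttle {
      double rate;            /* configured WAN budget, bytes/sec */
      double tokens;          /* bytes we may send right now */
      struct timespec last;   /* last time tokens were refilled */
  };

  /* Block until nbytes may be sent without exceeding the budget. */
  static void throttle_wait(struct throttle *t, size_t nbytes)
  {
      for (;;) {
          struct timespec now;
          clock_gettime(CLOCK_MONOTONIC, &now);
          double dt = (now.tv_sec - t->last.tv_sec)
                    + (now.tv_nsec - t->last.tv_nsec) / 1e9;
          t->last = now;
          t->tokens += dt * t->rate;
          if (t->tokens > t->rate)       /* cap bursts at one second's worth */
              t->tokens = t->rate;
          if (t->tokens >= (double)nbytes) {
              t->tokens -= (double)nbytes;
              return;                    /* budget available; go ahead */
          }
          usleep(10000);                 /* wait for tokens to accumulate */
      }
  }

Each chunk the background syncer ships over the WAN would first call 
throttle_wait(), so a post-outage resync trickles along at the configured 
rate instead of flooding the link.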


>From: "Anand Avati" <avati at zresearch.com>
>To: "Gerry Reno" <greno at verizon.net>
>CC: gluster-devel <gluster-devel at nongnu.org>
>Subject: Re: [Gluster-devel] self-heal behavior
>Date: Wed, 4 Jul 2007 19:33:14 +0530
>
>Gerry,
>your question is appropriate, but the answer to 'when to resync' is not
>very simple. When a brick which was brought down is brought up later, it
>may be a completely new (empty) brick. In that case, starting to sync every
>file would most likely be the wrong decision (we should rather sync the
>files which the user needs than some unused files). Even if we chose to
>sync files without the user accessing them, it would be very sluggish,
>since the sync would be interfering with other operations.
>
>The current approach is to sync a file on the next open() of it. This is
>usually a good balance: if we sync a file during open(), even a file of a
>GB takes only 10-15 secs, and for normal files (on the order of a few MBs)
>it is almost not noticeable. But if this were to happen for all files at
>once, whether the user accessed them or not, there would be a lot of
>traffic and everything would be very sluggish.
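>
>A rough sketch of the idea in C (the helper names and the xattr name are
>just for illustration, not the actual AFR implementation):
>
>  #include <stdlib.h>
>  #include <sys/types.h>
>  #include <sys/xattr.h>   /* getxattr() */
>
>  /* Illustrative xattr name; AFR's real bookkeeping differs. */
>  #define VERSION_XATTR "trusted.afr.version"
>
>  static long replica_version(const char *path)
>  {
>      char buf[32] = {0};
>      ssize_t n = getxattr(path, VERSION_XATTR, buf, sizeof(buf) - 1);
>      return (n > 0) ? atol(buf) : 0;
>  }
>
>  /* Placeholder for the actual data transfer between bricks. */
>  static void copy_file(const char *from, const char *to);
>
>  /* On open(), heal the stale copy before handing back the fd. */
>  static void heal_on_open(const char *path_a, const char *path_b)
>  {
>      long va = replica_version(path_a);
>      long vb = replica_version(path_b);
>      if (va > vb)
>          copy_file(path_a, path_b);
>      else if (vb > va)
>          copy_file(path_b, path_a);
>  }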
>
>This approach of syncing on open() is what other filesystems which support
>redundancy do as well.
>
>Detecting 'idle time', beginning the sync-up, and pausing it when the user
>resumes activity is a very tricky job, but that is definitely what we aim
>at finally. It is not enough for AFR to detect that the client is free,
>because the servers may be busy serving files to another client, and that
>may not be the most appropriate time to sync. Future versions of AFR will
>have more options to tune 'when' to sync. Currently it is only at open().
>We plan to add an option to make it sync on lookup() (which happens on ls).
>Later versions would have pro-active syncing (detecting that both servers
>and clients are idle, etc.).
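>
>Purely as an illustration (these option names are hypothetical, not a
>committed interface), such tuning might look like:
>
>  volume afr0
>    type cluster/afr
>    option self-heal-trigger lookup   # hypothetical: open|lookup|proactive
>    subvolumes brick1 brick2
>  end-volume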
>
>thanks,
>avati
>
>2007/7/4, Gerry Reno <greno at verizon.net>:
>>
>>   I've been doing some testing of self-heal.  Basically taking down one
>>brick, copying some files to one of the client mounts, then bringing the
>>downed brick back up.  What I see is that when I bring the downed brick
>>back up, no activity occurs.  It's only when I start doing something in
>>one of the client mounts that anything happens to rebuild the out-of-sync
>>brick.  My concern is this: suppose I have four applications on different
>>client nodes (separate machines) using the same data set (mounted on
>>GlusterFS).  The brick on one of these nodes is out-of-sync, and it is not
>>until some user tries to use the application that the brick starts to
>>resync.  This results in sluggish performance for the user, since all the
>>data has to be brought over the network from other bricks while the local
>>brick is out-of-sync.  Now there may have been ten minutes of idle time
>>before this user tried to access the data, but glusterfs did not make any
>>use of this time to rebuild the out-of-sync brick; rather, it waited until
>>a user tried to access the data.  To me, it appears that glusterfs should
>>be making use of such opportunities, which would diminish the overall
>>impact of the out-of-sync condition on users.
>>
>>Regards,
>>Gerry
>>
>>
>>
>
>
>
>--
>Anand V. Avati
>_______________________________________________
>Gluster-devel mailing list
>Gluster-devel at nongnu.org
>http://lists.nongnu.org/mailman/listinfo/gluster-devel






