[Gluster-devel] Query on healing process

Fri Feb 26 02:59:47 UTC 2016

Hi Ravi,

Thanks for the response.

We are using GlusterFS 3.7.8.

Here is the use case:

We have a log file that records events for every board of a node, and
these files are kept in sync using GlusterFS. The volume is in replica 2
mode. This means that when one brick in a replicated volume goes offline,
the glusterd daemons on the other nodes keep track of all the files that
are not replicated to the offline brick. When the offline brick becomes
available again, the cluster initiates a healing process and replicates the
updated files to that brick. In our case, however, the log file of one
board is not in sync and its format is corrupted, i.e. the copies on the
two bricks differ.

Even so, the output of "# gluster volume heal c_glusterfs info" shows that
there are no pending heals.

Also, the log file being updated has a fixed size, and new entries wrap
around, overwriting the oldest entries.
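The mail does not describe the log file's on-disk format. The following is a minimal, hypothetical sketch of a fixed-size, wrap-around log, assuming fixed-width records and an 8-byte header that stores the next slot to write. It is only meant to show the write pattern: once the file is created its size never changes, and every new entry is an in-place overwrite of an existing region.

```python
import os
import struct

# Hypothetical on-disk layout (not described in the original mail):
# an 8-byte little-endian header holding the index of the next slot to write,
# followed by `slots` fixed-width records of `record_size` bytes each.
HEADER = struct.Struct("<Q")


class WrapAroundLog:
    def __init__(self, path, slots, record_size):
        self.path = path
        self.slots = slots
        self.record_size = record_size
        if not os.path.exists(path):
            # Pre-allocate the whole file once; its size never changes afterwards.
            with open(path, "wb") as f:
                f.write(HEADER.pack(0))
                f.write(b"\0" * (slots * record_size))

    def append(self, message: bytes) -> None:
        # Truncate or pad the message to exactly one record.
        record = message[: self.record_size].ljust(self.record_size, b"\0")
        with open(self.path, "r+b") as f:
            (next_slot,) = HEADER.unpack(f.read(HEADER.size))
            # Overwrite the oldest slot in place (wrap-around).
            f.seek(HEADER.size + next_slot * self.record_size)
            f.write(record)
            # Advance the write pointer stored in the header.
            f.seek(0)
            f.write(HEADER.pack((next_slot + 1) % self.slots))

    def records(self) -> list:
        # Return the non-empty records, oldest first.
        with open(self.path, "rb") as f:
            (next_slot,) = HEADER.unpack(f.read(HEADER.size))
            data = f.read(self.slots * self.record_size)
        size = self.record_size
        chunks = [data[i * size:(i + 1) * size] for i in range(self.slots)]
        ordered = chunks[next_slot:] + chunks[:next_slot]
        return [c.rstrip(b"\0") for c in ordered if c.strip(b"\0")]
```

Because every write lands at an offset that already exists, two replicas of such a file can have identical sizes and still differ in content, which is consistent with the symptom reported in this thread.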

As a result, we have seen that after a few restarts the contents of the
same file on the two bricks are different, yet volume heal info shows zero
entries.
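One way to confirm this divergence independently of heal info is to compare the two copies directly. The following is a minimal sketch, assuming read-only access to each brick's backend directory on the respective nodes. It checks the size first and then a SHA-256 digest. The brick paths in the usage comment are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    # Stream the file in chunks so large logs are not loaded into memory at once.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def replicas_match(path_a, path_b):
    # Cheap size check first; for a fixed-size log this alone will not catch
    # a mismatch, so fall through to a full content digest.
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False
    return file_digest(a) == file_digest(b)


# Hypothetical usage (read-only inspection of the brick backends):
# replicas_match("/bricks/brick1/logs/board1.log",
#                "/mnt/copy-from-node2/logs/board1.log")
```

The size check is a shortcut only. For a fixed-size wrap-around file, both copies always have the same size, so the digest comparison is what actually detects the difference.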

Solution:

However, when we added a delay of more than 5 minutes before healing,
everything worked fine.

Regards,
Abhishek

On Fri, Feb 26, 2016 at 6:35 AM, Ravishankar N <ravishankar at redhat.com>
wrote:

> On 02/25/2016 06:01 PM, ABHISHEK PALIWAL wrote:
>
> Hi,
>
> Here, I have a query regarding the time taken by the healing process.
> In our current two-node setup, when we reboot one node, the self-healing
> process starts within 5 minutes on the board, which results in
> corruption of some files' data.
>
>
> Heal should start immediately after the brick process comes up. What
> version of gluster are you using? What do you mean by corruption of data?
> Also, how did you observe that the heal started after 5 minutes?
> -Ravi
>
>
> To resolve it, I searched on Google and found the following link:
> https://support.rackspace.com/how-to/glusterfs-troubleshooting/
>
> It mentions that the healing process can take up to 10 minutes to
> start.
>
> Here is the statement from the link:
>
> "Healing replicated volumes
>
> When any brick in a replicated volume goes offline, the glusterd daemons
> on the remaining nodes keep track of all the files that are not replicated
> to the offline brick. When the offline brick becomes available again, the
> cluster initiates a healing process, replicating the updated files to that
> brick. *The start of this process can take up to 10 minutes, based on
> observation.*"
>
> After waiting more than 5 minutes, the file corruption problem was
> resolved.
>
> So my question is: is there any way to reduce the time taken by the
> healing process to start?
>
>
> Regards,
> Abhishek Paliwal
>

-- 

Regards
Abhishek Paliwal