[Gluster-devel] Query on healing process

Fri Feb 26 04:27:38 UTC 2016

Hello,

On 02/26/2016 08:29 AM, ABHISHEK PALIWAL wrote:
> Hi Ravi,
>
> Thanks for the response.
>
> We are using glusterfs-3.7.8.
>
> Here is the use case:
>
> We have a log file that records the events for every board of a
> node, and these files are kept in sync using glusterfs. The system
> runs in replica 2 mode, which means that when one brick in a
> replicated volume goes offline, the glusterd daemons on the other
> nodes keep track of all the files that are not replicated to the
> offline brick. When the offline brick becomes available again, the
> cluster initiates a healing process, replicating the updated files to
> that brick. But in our case, we see that the log file of one board is
> not in sync and its format is corrupted, meaning the copies differ.

Just to understand you correctly: you have mounted the 2-node replica-2 
volume on both these nodes and are writing to a logging file from the 
mounts, right?

>
> Even the output of #gluster volume heal c_glusterfs info shows that 
> there are no pending heals.
>
> Also, the logging file being updated is of fixed size, and new 
> entries wrap around, overwriting the old entries.
>
> In this setup we have seen that after a few restarts, the contents of 
> the same file on the two bricks are different, but volume heal info 
> shows zero entries.
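
One way to confirm that the two copies really differ is to checksum the
file directly on each brick and compare the digests. The following is
only a minimal sketch: the helper names file_sha256 and copies_match are
made up for illustration, and the brick backend path has to be filled in
for your setup. Run file_sha256 on each node against that node's brick
copy and compare the two results; copies_match only applies if both
copies happen to be reachable from one host.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks so that large log files do not have to
    fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(path_a, path_b):
    """Return True if both files have identical contents (by digest)."""
    return file_sha256(path_a) == file_sha256(path_b)
```

If the digests differ while heal info reports zero entries, that is
worth attaching to the report together with the brick logs.
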
>
> Solution:
>
> But when we added a delay of more than 5 minutes before the healing, 
> everything worked fine.
>
> Regards,
> Abhishek
>
> On Fri, Feb 26, 2016 at 6:35 AM, Ravishankar N
> <ravishankar at redhat.com> wrote:
>
>     On 02/25/2016 06:01 PM, ABHISHEK PALIWAL wrote:
>>     Hi,
>>
>>     I have a query regarding the time taken by the healing
>>     process.
>>     In our current two-node setup, when we reboot one node, the
>>     self-healing process starts within 5 minutes on the board,
>>     which results in corruption of some files' data.
>
>     Heal should start immediately after the brick process comes up.
>     What version of gluster are you using? What do you mean by
>     corruption of data? Also, how did you observe that the heal
>     started after 5 minutes?
>     -Ravi
>>
>>     To resolve it, I searched on Google and found the following
>>     link:
>>     https://support.rackspace.com/how-to/glusterfs-troubleshooting/
>>
>>     It mentions that the healing process can take up to 10 minutes
>>     to start.
>>
>>     Here is the statement from the link:
>>
>>     "Healing replicated volumes
>>
>>     When any brick in a replicated volume goes offline, the glusterd
>>     daemons on the remaining nodes keep track of all the files that
>>     are not replicated to the offline brick. When the offline brick
>>     becomes available again, the cluster initiates a healing process,
>>     replicating the updated files to that brick. *The start of this
>>     process can take up to 10 minutes, based on observation.*"
>>
>>     After allowing more than 5 minutes, the file corruption problem
>>     was resolved.
>>
>>     So my question is: is there any way to reduce the time the
>>     healing process takes to start?
>>
>>
>>     Regards,
>>     Abhishek Paliwal
>
> -- 
>
> Regards
> Abhishek Paliwal
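
On the question of making the heal start sooner: a heal can be requested
explicitly with "gluster volume heal <VOLNAME>" instead of waiting for
the self-heal daemon, and "gluster volume heal <VOLNAME> info" can then
be polled until nothing is pending. Below is a rough sketch of that
idea, not a tested tool. The function names, the timeout values and the
assumption that heal info prints one "Number of entries: N" line per
brick are mine; please check them against the output of your version.
Note that it can only help when heal info actually lists the file, which
is not the case in the zero-entries situation described above.

```python
import subprocess
import time


def run_gluster(args):
    """Run the gluster CLI with `args` and return its standard output."""
    result = subprocess.run(
        ["gluster", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def pending_heal_entries(heal_info_text):
    """Sum the per-brick "Number of entries: N" counts in heal info output.

    Assumption: each brick section contains one such line. Non-numeric
    values (for example when a brick is not reachable) are skipped.
    """
    total = 0
    for line in heal_info_text.splitlines():
        line = line.strip()
        if line.startswith("Number of entries:"):
            value = line.split(":", 1)[1].strip()
            if value.isdigit():
                total += int(value)
    return total


def wait_for_heal(volume, run=run_gluster, timeout_s=600, poll_s=10,
                  sleep=time.sleep):
    """Request a heal, then poll heal info until no entries are pending.

    Returns True once heal info reports zero entries, or False if
    entries are still pending after roughly `timeout_s` seconds.
    """
    run(["volume", "heal", volume])  # ask for a heal right away
    waited = 0
    while True:
        pending = pending_heal_entries(run(["volume", "heal", volume, "info"]))
        if pending == 0:
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
```

For example, wait_for_heal("c_glusterfs") could be called from the
board's startup script before the logging application is started. The
run and sleep parameters exist only so the logic can be exercised
without a live cluster.
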
