[Gluster-users] [Gluster-devel] Query on healing process

Thu Mar 3 05:44:55 UTC 2016

Hi Ravi,

As we discussed earlier, I investigated this issue and found that healing
is not triggered because the "gluster volume heal c_glusterfs info
split-brain" command does not list any entries, even though the file is in
split-brain.

So I manually deleted the gfid entry of that file from the .glusterfs
directory and followed the instructions in the following link to heal the
file:

https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md

and this works fine for me.
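
For anyone repeating this step, here is a minimal Python sketch of how the
location of a file's gfid entry under .glusterfs can be derived from the
trusted.gfid value printed by getfattr. It assumes the usual brick backend
layout (.glusterfs/<first two hex digits>/<next two>/<full gfid>). The
function name, the /brick path and the gfid value are placeholders for
illustration, not values from my setup.

```python
import posixpath
import uuid


def gfid_backend_path(brick_root: str, gfid_hex: str) -> str:
    """Return the assumed .glusterfs path for a gfid on one brick.

    gfid_hex is the trusted.gfid value as printed by `getfattr -e hex`,
    with or without the leading "0x".
    """
    if gfid_hex.lower().startswith("0x"):
        gfid_hex = gfid_hex[2:]
    # uuid.UUID validates the 32 hex digits and formats them with dashes.
    gfid = str(uuid.UUID(hex=gfid_hex))
    # Assumed layout: <brick>/.glusterfs/<gfid[0:2]>/<gfid[2:4]>/<gfid>
    return posixpath.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


if __name__ == "__main__":
    # Placeholder brick path and gfid, for illustration only.
    print(gfid_backend_path("/brick", "0x307a5c9efddd4e7c96e94fd4bcdcbd1b"))
```

The sketch only computes the path; please check it against the actual brick
before deleting anything.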

But my question is why the split-brain command does not show any file in
its output.
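
In case it helps when comparing the getfattr output from the two bricks,
here is a rough Python sketch for decoding the trusted.afr.* values. It
assumes each value is 12 bytes holding three big-endian 32-bit pending
counters in the order data, metadata, entry, as described in the
split-brain document linked above. The function names and the sample values
in the usage note are hypothetical.

```python
import struct


def decode_afr_xattr(hex_value: str) -> dict:
    """Decode one trusted.afr.* value as printed by `getfattr -e hex`."""
    if hex_value.lower().startswith("0x"):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Assumed layout: three unsigned 32-bit big-endian counters.
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def bricks_blame_each_other(brick0_about_brick1: str,
                            brick1_about_brick0: str,
                            kind: str = "data") -> bool:
    """True if both bricks hold pending counts against each other.

    brick0_about_brick1 is the trusted.afr.<volume>-client-1 value read on
    the first brick; brick1_about_brick0 is the trusted.afr.<volume>-client-0
    value read on the second brick.
    """
    first = decode_afr_xattr(brick0_about_brick1)[kind]
    second = decode_afr_xattr(brick1_about_brick0)[kind]
    return first > 0 and second > 0
```

For example, decode_afr_xattr("0x000000020000000100000000") gives 2 pending
data operations, 1 pending metadata operation and 0 pending entry
operations. If both bricks show non-zero counters against each other for
the same file, that matches the usual description of split-brain.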

I am attaching all the logs I collected from the node, along with the
output of the commands from both boards.

The tar file contains two directories:

000300 - logs for the board which runs continuously
002500 - logs for the board which was rebooted

I am waiting for your reply; please help me with this issue.

Thanks in advance.

Regards,
Abhishek

On Fri, Feb 26, 2016 at 1:21 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com>
wrote:

> On Fri, Feb 26, 2016 at 10:28 AM, Ravishankar N <ravishankar at redhat.com>
> wrote:
>
>> On 02/26/2016 10:10 AM, ABHISHEK PALIWAL wrote:
>>
>> Yes correct
>>
>>
>> Okay, so when you say the files are not in sync until some time, are you
>> getting stale data when accessing from the mount?
>> I'm not able to figure out why heal info shows zero when the files are
>> not in sync, despite all IO happening from the mounts. Could you provide
>> the output of getfattr -d -m . -e hex /brick/file-name from both bricks
>> when you hit this issue?
>>
>> I'll provide the logs once I get them. Here, delay means we power on the
>> second board after 10 minutes.
>>
>>
>> On Feb 26, 2016 9:57 AM, "Ravishankar N" <ravishankar at redhat.com> wrote:
>>
>>> Hello,
>>>
>>> On 02/26/2016 08:29 AM, ABHISHEK PALIWAL wrote:
>>>
>>> Hi Ravi,
>>>
>>> Thanks for the response.
>>>
>>> We are using glusterfs-3.7.8
>>>
>>> Here is the use case:
>>>
>>> We have a log file which saves the event logs for every board of a
>>> node, and these files are kept in sync using glusterfs. The system is in
>>> replica 2 mode, which means that when one brick in a replicated volume
>>> goes offline, the glusterd daemons on the other nodes keep track of all
>>> the files that are not replicated to the offline brick. When the offline
>>> brick becomes available again, the cluster initiates a healing process,
>>> replicating the updated files to that brick. But in our case, we see
>>> that the log file of one board is not in sync and its format is
>>> corrupted, which means the files are not in sync.
>>>
>>>
>>> Just to understand you correctly, you have mounted the 2 node replica-2
>>> volume on both these nodes and writing to a logging file from the mounts
>>> right?
>>>
>>>
>>> Even the output of #gluster volume heal c_glusterfs info shows that
>>> there are no pending heals.
>>>
>>> Also, the log file being updated is of fixed size, and new entries
>>> wrap around, overwriting the old entries.
>>>
>>> This way we have seen that after a few restarts, the contents of the
>>> same file on the two bricks are different, but volume heal info shows
>>> zero entries.
>>>
>>> Solution:
>>>
>>> But when we added a delay of > 5 min before the healing, everything
>>> worked fine.
>>>
>>> Regards,
>>> Abhishek
>>>
>>> On Fri, Feb 26, 2016 at 6:35 AM, Ravishankar N <ravishankar at redhat.com>
>>> wrote:
>>>
>>>> On 02/25/2016 06:01 PM, ABHISHEK PALIWAL wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have a query regarding the time taken by the healing process.
>>>> In our current two-node setup, when we reboot one node, the
>>>> self-healing process starts in less than 5 minutes on the board, which
>>>> results in corruption of some files' data.
>>>>
>>>>
>>>> Heal should start immediately after the brick process comes up. What
>>>> version of gluster are you using? What do you mean by corruption of data?
>>>> Also, how did you observe that the heal started after 5 minutes?
>>>> -Ravi
>>>>
>>>>
>>>> To resolve it, I searched on Google and found the following link:
>>>> https://support.rackspace.com/how-to/glusterfs-troubleshooting/
>>>>
>>>> It mentions that the healing process can take up to 10 minutes to
>>>> start.
>>>>
>>>> Here is the statement from the link:
>>>>
>>>> "Healing replicated volumes
>>>>
>>>> When any brick in a replicated volume goes offline, the glusterd
>>>> daemons on the remaining nodes keep track of all the files that are not
>>>> replicated to the offline brick. When the offline brick becomes available
>>>> again, the cluster initiates a healing process, replicating the updated
>>>> files to that brick. *The start of this process can take up to 10
>>>> minutes, based on observation.*"
>>>>
>>>> After allowing more than 5 minutes, the file corruption problem was
>>>> resolved.
>>>>
>>>> So my question is: is there any way to reduce the time the healing
>>>> process takes to start?
>>>>
>>>>
>>>> Regards,
>>>> Abhishek Paliwal
>>>>

-- 

Regards
Abhishek Paliwal
-------------- next part --------------
A non-text attachment was scrubbed...
Name: HU37300_rep.tar.gz
Type: application/x-gzip
Size: 250104 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160303/055d9e7c/attachment-0001.gz>