[Gluster-users] Question about "Possibly undergoing heal" on a file being reported.

Fri May 6 05:30:23 UTC 2016

Be sure that the directory printed by "gluster --print-statedumpdir" exists first.
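The note above can be combined with the statedump steps Ravi gives in the quoted message below. The following is a minimal bash sketch, not an official tool: `trigger_statedumps` is a hypothetical helper name. The commented usage line assumes it runs as root on a brick host where the brick processes are named `glusterfsd`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Gluster): make sure the statedump
# directory exists, then send SIGUSR1 to each given process so that it
# writes a statedump into that directory.
trigger_statedumps() {
    local dumpdir="$1" pid
    shift
    mkdir -p -- "$dumpdir" || return 1
    for pid in "$@"; do
        kill -USR1 "$pid" || return 1
    done
}

# Example on a brick host (assumes bricks run as "glusterfsd"):
#   trigger_statedumps "$(gluster --print-statedumpdir)" $(pgrep -x glusterfsd)
#   ls -lt "$(gluster --print-statedumpdir)"
```

To dump a FUSE mount instead, pass the PID of the client process that serves that mount.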

On May 5, 2016 10:04:41 PM PDT, Ravishankar N <ravishankar at redhat.com> wrote:
>Thanks for the response. The heal info output shows 'Possibly undergoing
>heal' only when the self-heal daemon is performing a heal, not when there
>is I/O from the mount. Could you provide statedumps of the 2 bricks (and
>of the mount too, if you know from which mount this VM image is being
>accessed)?
>
>The command is `kill -USR1 <pid>`, where <pid> is the process ID of the
>brick or FUSE mount process. The statedump will be saved in the directory
>printed by `gluster --print-statedumpdir`.
>I wanted to check whether any stale locks are being held on the bricks.
>
>Thanks,
>Ravi
>
>On 05/06/2016 01:22 AM, Richard Klein (RSI) wrote:
>> I agree there is activity, but it is very low I/O, such as updating
>> log files. It shouldn't be enough I/O to keep the file permanently in
>> the "Possibly undergoing heal" state for days. But just to make sure,
>> I powered off the VM; there is now no activity at all, and
>> "trusted.afr.dirty" is still changing. I will leave the VM in a powered
>> off state until tomorrow. I agree with you that it shouldn't, but that
>> is my dilemma.
>>
>> Thanks for the insight,
>>
>> Richard Klein
>> RSI
>>
>>> -----Original Message-----
>>> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Joe Julian
>>> Sent: Thursday, May 05, 2016 1:44 PM
>>> To: gluster-users at gluster.org
>>> Subject: Re: [Gluster-users] Question about "Possibly undergoing heal" on a file being reported.
>>>
>>> FYI, that's not "no activity". The file is clearly changing. The
>>> dirty state flipping back and forth between 1 and 0 is a byproduct of
>>> writes occurring: the clients set the flag, do the write, then clear
>>> the flag.
>>> My guess is that's why it's only "possibly" undergoing self-heal. The
>>> write may have still been pending at the moment of the check.
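A single reading can land between "set the flag" and "clear the flag", so one way to tell a transient flag from a stuck one is to sample it repeatedly on each brick. The following is a minimal bash sketch; `read_dirty` and `sample_dirty` are hypothetical helper names, and it assumes root access on a brick host with `getfattr` installed (trusted.* attributes are not visible to unprivileged users).

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Gluster): sample trusted.afr.dirty on
# a brick's copy of a file. Run as root on the brick host.

# Print the hex value of trusted.afr.dirty for one file (empty if unset).
read_dirty() {
    getfattr --absolute-names -n trusted.afr.dirty -e hex "$1" 2>/dev/null \
        | sed -n 's/^trusted\.afr\.dirty=//p'
}

# Print COUNT timestamped samples, INTERVAL seconds apart.
sample_dirty() {
    local file="$1" count="${2:-10}" interval="${3:-6}" i
    for ((i = 0; i < count; i++)); do
        printf '%s %s\n' "$(date +%T)" "$(read_dirty "$file")"
        sleep "$interval"
    done
}

# Example (brick path as used later in this thread):
#   sample_dirty /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687 20 5
```

If the value keeps changing while nothing writes to the file, that points away from ordinary client writes as the cause.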
>>>
>>> On 05/05/2016 10:22 AM, Richard Klein (RSI) wrote:
>>>> There are 2 hosts involved and we have a replica count of 2. The
>>>> hosts are called n1c1cl1 and n1c2cl1. Below is the info you
>>>> requested. The file name in gluster is
>>>> "/97f52c71-80bd-4c2b-8e47-3c8c77712687".
>>>> -- From the n1c1cl1 brick --
>>>>
>>>> [root@n1c1cl1 ~]# ll -h /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
>>>> -rwxr--r--. 2 root root 3.7G May  5 12:10 /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
>>>>
>>>> [root@n1c1cl1 ~]# getfattr -d -m . -e hex /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
>>>> trusted.afr.dirty=0xe68000000000000000000000
>>>> trusted.bit-rot.version=0x020000000000000057196a8d000e1606
>>>> trusted.gfid=0xb1a49bd1ea01479f9a8277992461e85f
>>>>
>>>> -- From the n1c2cl1 brick --
>>>>
>>>> [root@n1c2cl1 ~]# ll -h /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
>>>> -rwxr--r--. 2 root root 3.7G May  5 12:16 /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
>>>>
>>>> [root@n1c2cl1 ~]# getfattr -d -m . -e hex /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
>>>> trusted.afr.dirty=0xd38000000000000000000000
>>>> trusted.bit-rot.version=0x020000000000000057196a8d000e20ae
>>>> trusted.gfid=0xb1a49bd1ea01479f9a8277992461e85f
>>>>
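To read the dirty values above, the sketch below splits a changelog value as printed by `getfattr -e hex` into its three 32-bit fields. It assumes the usual AFR changelog layout (data, metadata and entry counters, 4 bytes each, in that order); the function names are hypothetical. Under that assumption, only the first (data) field is non-zero in both values above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: interpret an AFR changelog xattr such as
# trusted.afr.dirty, as printed by `getfattr -e hex`.
# Assumed layout: 12 bytes = data, metadata, entry fields (4 bytes each).

# Print the three 32-bit fields as hex; fail on malformed input.
decode_afr_changelog() {
    local hex="${1#0x}"
    if [[ ${#hex} -ne 24 || ! $hex =~ ^[0-9a-fA-F]+$ ]]; then
        echo "expected 12 bytes (24 hex digits), got: $1" >&2
        return 1
    fi
    printf 'data=0x%s metadata=0x%s entry=0x%s\n' \
        "${hex:0:8}" "${hex:8:8}" "${hex:16:8}"
}

# Succeed (exit status 0) if any field is non-zero.
afr_changelog_nonzero() {
    local hex="${1#0x}"
    [[ $hex =~ [1-9a-fA-F] ]]
}

# Example:
#   decode_afr_changelog 0xe68000000000000000000000
#   # data=0xe6800000 metadata=0x00000000 entry=0x00000000
```

The fields are printed as raw hex rather than decimal so the sketch makes no assumption about whether they are signed.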
>>>> --
>>>>
>>>> The "trusted.afr.dirty" value is changing about 2 or 3 times a minute
>>>> on both files. Let me know if you need further info, and thanks.
>>>> Richard Klein
>>>> RSI
>>>>
>>>>
>>>>
>>>> From: Ravishankar N [mailto:ravishankar at redhat.com]
>>>> Sent: Wednesday, May 04, 2016 8:52 PM
>>>> To: Richard Klein (RSI); gluster-users at gluster.org
>>>> Subject: Re: [Gluster-users] Question about "Possibly undergoing heal" on a file being reported.
>>>>
>>>>> On 05/05/2016 01:50 AM, Richard Klein (RSI) wrote:
>>>>> First time e-mailer to the group, greetings all. We are using
>>>>> Gluster 3.7.6 in CloudStack on CentOS 7 with KVM. Gluster is our
>>>>> primary storage. All is going well, but we have a test VM QCOW2
>>>>> volume that gets stuck in the "Possibly undergoing heal" state. By
>>>>> stuck I mean it stays in that state for over 24 hours. This is a test
>>>>> VM with no activity on it, and we have also removed the swap file on
>>>>> the guest, thinking that it might be causing high I/O. All the tools
>>>>> show that the VM is basically idle, with low I/O. The only way I can
>>>>> clear it up is to power the VM off, move the QCOW2 volume off the
>>>>> Gluster mount and then back (basically removing and recreating it),
>>>>> then power the VM back on. Once I do this, all is well again, but
>>>>> then it happened again on the same volume/file.
>>>>> One additional note: I have even powered off the VM completely, and
>>>>> the QCOW2 file still stays in this state.
>>>> When this happens, can you share the output of the extended
>>>> attributes of the file in question from all the bricks of the replica
>>>> in which the file resides?
>>>> `getfattr -d -m . -e hex /path/to/bricks/file-name`
>>>>
>>>> Also what is the size of this VM image file?
>>>>
>>>> Thanks,
>>>> Ravi
>>>>
>>>>
>>>>
>>>>> Is there a way to stop/abort or force the heal to finish? Any help
>>>>> with a direction would be appreciated.
>>>>> Thanks,
>>>>>
>>>>> Richard Klein
>>>>> RSI
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-users

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.