[Gluster-users] Question about "Possibly undergoing heal" on a file being reported.

Tue May 10 15:39:23 UTC 2016

Well, after about 3 days in the healing state, that file finally went back to normal.  All is good now, but I am still uncertain whether this behavior is normal.  Thanks for the help; I thought you'd like to know.

Richard Klein
RSI

> -----Original Message-----
> From: gluster-users-bounces at gluster.org [mailto:gluster-users-
> bounces at gluster.org] On Behalf Of Richard Klein (RSI)
> Sent: Friday, May 06, 2016 1:51 PM
> To: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Question about "Possibly undergoing heal" on a file
> being reported.
> 
> I have performed the state dump as you requested on both replica pair hosts.
> The hosts are called n1c1cl1 and n1c2cl1.  The file is on /data/brick0/gv0cl1
> on both hosts.  The attached zip file contains the state dump results of brick0
> for both hosts.  I identified the process ID by finding the glusterfsd process that
> had /data/brick0/gv0cl1 in the command line.
> 
> Let me know if you need further information.
> 
> P.S. The "Possibly undergoing heal" is still showing and the "trusted.afr.dirty" is
> still changing even though the VM has been off since yesterday.
> 
> Richard Klein
> RSI
> 
> > -----Original Message-----
> > From: Ravishankar N [mailto:ravishankar at redhat.com]
> > Sent: Friday, May 06, 2016 12:05 AM
> > To: Richard Klein (RSI); gluster-users at gluster.org
> > Subject: Re: [Gluster-users] Question about "Possibly undergoing heal"
> > on a file being reported.
> >
> > Thanks for the response. The heal info command outputs 'Possibly
> > undergoing heal' only when the self-heal daemon is performing a heal,
> > not when there is I/O from the mount. Could you provide the state dump
> > of the 2 bricks (and of the mount too, if you know from which mount this
> > VM image is being accessed)?
> >
> > The command is `kill -USR1 <pid>`, where pid is the process ID of the
> > brick or FUSE mount. The statedump will be saved in the directory
> > printed by `gluster --print-statedumpdir`. I wanted to check whether any
> > stale locks are being held on the bricks.
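
The state dump step described above can be sketched as a small shell helper. This is only a sketch under stated assumptions: the function names `brick_pids` and `dump_brick` are made up for illustration, and it assumes the brick process shows the brick path on its `glusterfsd` command line, as reported elsewhere in this thread. It needs to run as root on each brick host.

```shell
#!/bin/sh
# Print the PIDs of glusterfsd processes serving a given brick.
# Reads "PID COMMAND ARGS..." lines on stdin, e.g. from `ps -eo pid=,args=`.
# Matching the executable name in field 2 keeps this awk process itself
# (whose arguments also contain the brick path) out of the result.
# The brick path is matched as a plain substring of the whole line.
brick_pids() {
  awk -v brick="$1" '$2 ~ /(^|\/)glusterfsd$/ && index($0, brick) { print $1 }'
}

# Hypothetical wrapper: send SIGUSR1 to every matching brick process,
# which asks it to write a statedump.
dump_brick() {
  for pid in $(ps -eo pid=,args= | brick_pids "$1"); do
    kill -USR1 "$pid"
  done
}
```

Calling `dump_brick /data/brick0/gv0cl1` on each host and then listing the directory printed by `gluster --print-statedumpdir` should show the new dump files.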
> >
> > Thanks,
> > Ravi
> >
> > On 05/06/2016 01:22 AM, Richard Klein (RSI) wrote:
> > > I agree there is activity, but it's very low I/O, like updating log
> > > files.  It shouldn't be high enough I/O to keep it permanently in the
> > > "Possibly undergoing healing" state for days.  But just to make sure, I
> > > powered off the VM and there is no activity now at all, and the
> > > "trusted.afr.dirty" is still changing.  I will leave the VM in a
> > > powered-off state until tomorrow.  I agree with you that it shouldn't,
> > > but that is my dilemma.
> > >
> > > Thanks for the insight,
> > >
> > > Richard Klein
> > > RSI
> > >
> > >> -----Original Message-----
> > >> From: gluster-users-bounces at gluster.org [mailto:gluster-users-
> > >> bounces at gluster.org] On Behalf Of Joe Julian
> > >> Sent: Thursday, May 05, 2016 1:44 PM
> > >> To: gluster-users at gluster.org
> > >> Subject: Re: [Gluster-users] Question about "Possibly undergoing
> > >> heal" on a file being reported.
> > >>
> > >> FYI, that's not "no activity". The file is clearly changing. The
> > >> dirty state flipping back and forth between 1 and 0 is a byproduct
> > >> of writes occurring. The clients set the flag, do the write, then
> > >> clear the flag.
> > >> My guess is that's why it's only "possibly" undergoing self-heal.
> > >> The write may have still been pending at the moment of the check.
> > >>
> > >> On 05/05/2016 10:22 AM, Richard Klein (RSI) wrote:
> > >>> There are 2 hosts involved and we have a replica value of 2.  The
> > >>> hosts are called n1c1cl1 and n1c2cl1.  Below is the info you
> > >>> requested.  The file name in gluster is
> > >>> "/97f52c71-80bd-4c2b-8e47-3c8c77712687".
> > >>> -- From the n1c1cl1 brick --
> > >>>
> > >>> [root at n1c1cl1 ~]# ll -h
> > >>> /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> > >>> -rwxr--r--. 2 root root 3.7G May  5 12:10
> > >>> /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> > >>>
> > >>> [root at n1c1cl1 ~]# getfattr -d -m . -e hex
> > >>> /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> > >>> getfattr: Removing leading '/' from absolute path names
> > >>> # file: data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> > >>> security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
> > >>> trusted.afr.dirty=0xe68000000000000000000000
> > >>> trusted.bit-rot.version=0x020000000000000057196a8d000e1606
> > >>> trusted.gfid=0xb1a49bd1ea01479f9a8277992461e85f
> > >>>
> > >>> -- From the n1c2cl1 brick --
> > >>>
> > >>> [root at n1c2cl1 ~]# ll -h
> > >>> /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> > >>> -rwxr--r--. 2 root root 3.7G May  5 12:16
> > >>> /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> > >>>
> > >>> [root at n1c2cl1 ~]# getfattr -d -m . -e hex
> > >>> /data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> > >>> getfattr: Removing leading '/' from absolute path names
> > >>> # file: data/brick0/gv0cl1/97f52c71-80bd-4c2b-8e47-3c8c77712687
> > >>> security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
> > >>> trusted.afr.dirty=0xd38000000000000000000000
> > >>> trusted.bit-rot.version=0x020000000000000057196a8d000e20ae
> > >>> trusted.gfid=0xb1a49bd1ea01479f9a8277992461e85f
> > >>>
> > >>> --
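
To read the `trusted.afr.dirty` values shown above, a small decoder may help. This is a minimal sketch, assuming the usual AFR changelog layout of three big-endian 32-bit counters (data, metadata, entry) packed into the 12-byte value; the function name `decode_afr` is made up for illustration and is not part of any Gluster tool.

```shell
#!/bin/sh
# Decode an AFR changelog xattr value such as trusted.afr.dirty.
# ASSUMPTION: the value is 12 bytes holding three big-endian 32-bit
# counters, in the order data, metadata, entry.
decode_afr() (
  hex=${1#0x}   # drop the leading 0x printed by `getfattr -e hex`
  if [ "${#hex}" -ne 24 ]; then
    echo "expected 24 hex digits, got ${#hex}" >&2
    exit 1
  fi
  # Split the 24 hex digits into three 8-digit fields.
  data=$(printf '%s' "$hex" | cut -c1-8)
  meta=$(printf '%s' "$hex" | cut -c9-16)
  entry=$(printf '%s' "$hex" | cut -c17-24)
  echo "data=$((0x$data)) metadata=$((0x$meta)) entry=$((0x$entry))"
)
```

For example, `decode_afr 0xe68000000000000000000000` splits the n1c1cl1 value into its three fields; only the first (data) field is non-zero there.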
> > >>>
> > >>> The "trusted.afr.dirty" is changing about 2 or 3 times a minute on
> > >>> both files.  Let me know if you need further info and thanks.
> > >>> Richard Klein
> > >>> RSI
> > >>>
> > >>>
> > >>>
> > >>> From: Ravishankar N [mailto:ravishankar at redhat.com]
> > >>> Sent: Wednesday, May 04, 2016 8:52 PM
> > >>> To: Richard Klein (RSI); gluster-users at gluster.org
> > >>> Subject: Re: [Gluster-users] Question about "Possibly undergoing
> > >>> heal" on a file being reported.
> > >>>
> > >>>> On 05/05/2016 01:50 AM, Richard Klein (RSI) wrote:
> > >>>> First time e-mailer to the group, greetings all.  We are using
> > >>>> Gluster 3.7.6 in Cloudstack on CentOS7 with KVM.  Gluster is our
> > >>>> primary storage.  All is going well but we have a test VM QCOW2
> > >>>> volume that gets stuck in the "Possibly undergoing healing" state.
> > >>>> By stuck I mean it stays in that state for over 24 hrs.  This is a
> > >>>> test VM with no activity on it and we have removed the swap file on
> > >>>> the guest as well, thinking that may be causing high I/O.  All the
> > >>>> tools show that the VM is basically idle with low I/O.  The only way
> > >>>> I can clear it up is to power the VM off, move the QCOW2 volume from
> > >>>> the Gluster mount then back (basically remove and recreate it), then
> > >>>> power the VM back on.  Once I do this process all is well again, but
> > >>>> then it happened again on the same volume/file.
> > >>>>
> > >>>> One additional note, I have even powered off the VM completely and
> > >>>> the QCOW2 file still stays in this state.
> > >>> When this happens, can you share the output of the extended
> > >>> attributes of the file in question from all the bricks of the replica
> > >>> in which the file resides?
> > >>> `getfattr -d -m . -e hex /path/to/bricks/file-name`
> > >>>
> > >>> Also what is the size of this VM image file?
> > >>>
> > >>> Thanks,
> > >>> Ravi
> > >>>
> > >>>
> > >>>
> > >>>> Is there a way to stop/abort or force the heal to finish?  Any
> > >>>> help with a direction would be appreciated.
> > >>>> Thanks,
> > >>>>
> > >>>> Richard Klein
> > >>>> RSI
> > >>>
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> Gluster-users mailing list
> > >>> Gluster-users at gluster.org
> > >>> http://www.gluster.org/mailman/listinfo/gluster-users
> > >>>
> >