[Gluster-users] Self Heal Confusion

Fri Dec 28 17:18:48 UTC 2018

I assume the options were off by default but I'll turn them back on.  
I'm working on getting the information.
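
A minimal sketch of turning those five options back on for the projects
volume, assuming the usual "gluster volume set <volname> <option> <value>"
form. The function name and the DRY_RUN switch are made up for illustration;
by default it only prints the commands so they can be reviewed first.
readdir-ahead is set before parallel-readdir because, as far as I understand
the docs, the latter relies on it.

```shell
#!/bin/sh
# Hypothetical helper (not part of gluster): print or run the commands that
# re-enable the performance options shown as "off" in the volume info.
# DRY_RUN=1 (the default here) only echoes the commands.
enable_perf_options() {
    vol="$1"
    # readdir-ahead is listed before parallel-readdir on purpose.
    for opt in performance.quick-read \
               performance.readdir-ahead \
               performance.parallel-readdir \
               performance.write-behind \
               performance.read-ahead
    do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "gluster volume set $vol $opt on"
        else
            gluster volume set "$vol" "$opt" on
        fi
    done
}
```

"enable_perf_options projects" prints the five commands;
"DRY_RUN=0 enable_perf_options projects" would actually run them.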

On 12/28/18 1:00 AM, Ashish Pandey wrote:
>
> Hi Brett,
>
> First, the answers to all your questions:
>
> 1.  If a self-heal daemon is listed on a host (all of mine show one with
> a volume status command) can I assume it's enabled and running?
>
> For your volume, projects, the self-heal daemon is up and running.
>
> 2.  I assume the volume that has all the self-heals pending has some
> serious issues even though I can access the files and directories on
> it.  If self-heal is running shouldn't the numbers be decreasing?
>
> It should heal the entries, and the number of entries reported by the
> "gluster v heal <volname> info" command should be decreasing.
>
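
One way to check whether the number is really moving: a small sketch that
adds up the "heal pending" lines from the summary output, in the format
shown further down in this thread. The function name pending_total is
hypothetical, and the parsing assumes the summary text keeps that exact
wording.

```shell
#!/bin/sh
# Hypothetical helper: sum the "heal pending" counts from the text that
# "gluster volume heal <volname> info summary" prints (read on stdin).
pending_total() {
    awk -F': *' '/Number of entries in heal pending:/ { sum += $2 }
                 END { print sum + 0 }'
}
```

Running "gluster volume heal projects info summary | pending_total" twice,
a few minutes apart, should give a smaller second number if healing is
making progress.
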
> It appears to me self-heal is not working properly so how do I get it to
> start working or should I delete the volume and start over?
>
> As you can access all the files from the mount point, I think the volume
> and the files are in a good state as of now.
> I don't think you should consider deleting your volume before trying
> to fix it.
> If there is no fix, or the fix is taking too long, you can go ahead with
> that option.
>
> -----------------------
> Why are all these options off?
>
> performance.quick-read: off
> performance.parallel-readdir: off
> performance.readdir-ahead: off
> performance.write-behind: off
> performance.read-ahead: off
>
> This should not matter to your issue, but I think you should enable
> all of the above unless you have a reason not to.
> --------------------
>
> I would like you to perform the following steps and provide some more
> information -
>
> 1 - Try to restart self-heal and see if that works.
> "gluster v start <volname> force" will kill and restart the self-heal
> processes.
>
> 2 - If step 1 is not fruitful, get the list of entries that need to be
> healed and pick one of them. I mean we should focus on one entry to
> find out why it is not getting healed, instead of all the 5900 entries.
> Let's call it entry1.
>
> 3 - Now access entry1 from the mount point, read and write to it, and
> see if this entry has been healed. Check heal info. Accessing a file
> from the mount point triggers a client-side heal, which could also heal
> the file.
>
> 4 - Check the logs in /var/log/glusterfs; the mount logs and glustershd
> logs should be checked and provided.
>
> 5 - Get the extended attributes of entry1 from all the bricks.
>
> If the path of entry1 on the mount point is /a/b/c/entry1, then you
> have to run the following command on all the nodes -
>
> getfattr -m. -d -e hex <path of the brick on the node>/a/b/c/entry1
>
> Please provide the output of the above command too.
>
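
For step 5, a small sketch that builds the getfattr command line for one
entry on one brick, so the same path mapping is used on every node. The
helper name brick_xattr_cmd is hypothetical; the brick root in the usage
note is the one from the volume info in this thread, and entry1 is the
placeholder name from the steps above.

```shell
#!/bin/sh
# Hypothetical helper: print the getfattr command for one entry on one brick.
# $1 = brick root on the node
# $2 = path of the entry as seen from the mount point (e.g. /a/b/c/entry1)
brick_xattr_cmd() {
    brick="${1%/}"      # drop one trailing slash from the brick root
    entry="/${2#/}"     # make sure the entry path starts with one slash
    # The path is printed inside double quotes so spaces survive copy/paste.
    printf 'getfattr -m. -d -e hex "%s%s"\n' "$brick" "$entry"
}
```

"brick_xattr_cmd /srv/gfs01/Projects /a/b/c/entry1" prints the command to
run, as root, on each of gfssrv1 through gfssrv6. The trusted.afr.* values
in the output are the ones to compare between bricks.
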
> ---
> Ashish
>
> ------------------------------------------------------------------------
> *From: *"Brett Holcomb" <biholcomb at l1049h.com>
> *To: *gluster-users at gluster.org
> *Sent: *Friday, December 28, 2018 3:49:50 AM
> *Subject: *Re: [Gluster-users] Self Heal Confusion
>
> Resend as I did not reply to the list earlier.  TBird responded to the 
> poster and not the list.
>
> On 12/27/18 11:46 AM, Brett Holcomb wrote:
>
>     Thank you. I appreciate the help.  Here is the information.  Let me
>     know if you need anything else.  I'm fairly new to gluster.
>
>     Gluster version is 5.2
>
>     1. gluster v info
>
>     Volume Name: projects
>     Type: Distributed-Replicate
>     Volume ID: 5aac71aa-feaa-44e9-a4f9-cb4dd6e0fdc3
>     Status: Started
>     Snapshot Count: 0
>     Number of Bricks: 2 x 3 = 6
>     Transport-type: tcp
>     Bricks:
>     Brick1: gfssrv1:/srv/gfs01/Projects
>     Brick2: gfssrv2:/srv/gfs01/Projects
>     Brick3: gfssrv3:/srv/gfs01/Projects
>     Brick4: gfssrv4:/srv/gfs01/Projects
>     Brick5: gfssrv5:/srv/gfs01/Projects
>     Brick6: gfssrv6:/srv/gfs01/Projects
>     Options Reconfigured:
>     cluster.self-heal-daemon: enable
>     performance.quick-read: off
>     performance.parallel-readdir: off
>     performance.readdir-ahead: off
>     performance.write-behind: off
>     performance.read-ahead: off
>     performance.client-io-threads: off
>     nfs.disable: on
>     transport.address-family: inet
>     server.allow-insecure: on
>     storage.build-pgfid: on
>     changelog.changelog: on
>     changelog.capture-del-path: on
>
>     2.  gluster v status
>
>     Status of volume: projects
>     Gluster process                             TCP Port  RDMA Port  Online  Pid
>     ------------------------------------------------------------------------------
>     Brick gfssrv1:/srv/gfs01/Projects           49154     0          Y       7213
>     Brick gfssrv2:/srv/gfs01/Projects           49154     0          Y       6932
>     Brick gfssrv3:/srv/gfs01/Projects           49154     0          Y       6920
>     Brick gfssrv4:/srv/gfs01/Projects           49154     0          Y       6732
>     Brick gfssrv5:/srv/gfs01/Projects           49154     0          Y       6950
>     Brick gfssrv6:/srv/gfs01/Projects           49154     0          Y       6879
>     Self-heal Daemon on localhost               N/A       N/A        Y       11484
>     Self-heal Daemon on gfssrv2                 N/A       N/A        Y       10366
>     Self-heal Daemon on gfssrv4                 N/A       N/A        Y       9872
>     Self-heal Daemon on srv-1-gfs3.corp.l1049h.net  N/A   N/A        Y       9892
>     Self-heal Daemon on gfssrv6                 N/A       N/A        Y       10372
>     Self-heal Daemon on gfssrv5                 N/A       N/A        Y       10761
>
>     Task Status of Volume projects
>     ------------------------------------------------------------------------------
>     There are no active volume tasks
>
>     3. I've given the summary since the actual list for two volumes is
>     around 5900 entries.
>
>     Brick gfssrv1:/srv/gfs01/Projects
>     Status: Connected
>     Total Number of entries: 85
>     Number of entries in heal pending: 85
>     Number of entries in split-brain: 0
>     Number of entries possibly healing: 0
>
>     Brick gfssrv2:/srv/gfs01/Projects
>     Status: Connected
>     Total Number of entries: 0
>     Number of entries in heal pending: 0
>     Number of entries in split-brain: 0
>     Number of entries possibly healing: 0
>
>     Brick gfssrv3:/srv/gfs01/Projects
>     Status: Connected
>     Total Number of entries: 0
>     Number of entries in heal pending: 0
>     Number of entries in split-brain: 0
>     Number of entries possibly healing: 0
>
>     Brick gfssrv4:/srv/gfs01/Projects
>     Status: Connected
>     Total Number of entries: 0
>     Number of entries in heal pending: 0
>     Number of entries in split-brain: 0
>     Number of entries possibly healing: 0
>
>     Brick gfssrv5:/srv/gfs01/Projects
>     Status: Connected
>     Total Number of entries: 58854
>     Number of entries in heal pending: 58854
>     Number of entries in split-brain: 0
>     Number of entries possibly healing: 0
>
>     Brick gfssrv6:/srv/gfs01/Projects
>     Status: Connected
>     Total Number of entries: 58854
>     Number of entries in heal pending: 58854
>     Number of entries in split-brain: 0
>     Number of entries possibly healing: 0
>
>     On 12/27/18 3:09 AM, Ashish Pandey wrote:
>
>         Hi Brett,
>
>         Could you please tell us more about the setup?
>
>         1 - Gluster v info
>         2 - gluster v status
>         3 - gluster v heal <volname> info
>
>         This is the very basic information needed to start debugging
>         or suggesting a workaround.
>         It should always be included when asking such questions on
>         the mailing list so that people can reply sooner.
>
>
>         Note: Please hide any IP address, hostname, or other information
>         you don't want the world to see.
>
>         ---
>         Ashish
>
>         ------------------------------------------------------------------------
>         *From: *"Brett Holcomb" <biholcomb at l1049h.com>
>         *To: *gluster-users at gluster.org
>         *Sent: *Thursday, December 27, 2018 12:19:15 AM
>         *Subject: *Re: [Gluster-users] Self Heal Confusion
>
>         Still no change in the heals pending.  I found this reference,
>         https://archive.fosdem.org/2017/schedule/event/glusterselinux/attachments/slides/1876/export/events/attachments/glusterselinux/slides/1876/fosdem.pdf,
>         which mentions the default SELinux context for a brick and
>         that internal operations such as self-heal and rebalance should
>         be ignored, but they do not elaborate on what "ignore" means:
>         is it just not doing self-heal, or something else?
>
>         I did set SELinux to permissive and nothing changed.  I'll try
>         setting the bricks to the context mentioned in this pdf and
>         see what happens.
>
>
>         On 12/20/18 8:26 PM, John Strunk wrote:
>
>             Assuming your bricks are up... yes, the heal count should
>             be decreasing.
>
>             There is/was a bug wherein self-heal would stop healing
>             but would still be running. I don't know whether your
>             version is affected, but the remedy is to just restart the
>             self-heal daemon.
>             Force start one of the volumes that has heals pending. The
>             bricks are already running, but it will cause shd to
>             restart and, assuming this is the problem, healing should
>             begin...
>
>             $ gluster vol start my-pending-heal-vol force
>
>             Others could better comment on the status of the bug.
>
>             -John
>
>
>             On Thu, Dec 20, 2018 at 5:45 PM Brett Holcomb
>             <biholcomb at l1049h.com> wrote:
>
>                 I have one volume that has 85 pending entries in healing
>                 and two more volumes with 58,854 entries in healing
>                 pending. These numbers are from the volume heal info
>                 summary command.  They have stayed constant for two days
>                 now.  I've read the Gluster docs and many more.  The
>                 Gluster docs just give some commands and non-Gluster docs
>                 basically repeat that.  Given that it appears no
>                 self-healing is going on for my volume, I am confused as
>                 to why.
>
>                 1.  If a self-heal daemon is listed on a host (all of
>                 mine show one with a volume status command) can I assume
>                 it's enabled and running?
>
>                 2.  I assume the volume that has all the self-heals
>                 pending has some serious issues even though I can access
>                 the files and directories on it.  If self-heal is running
>                 shouldn't the numbers be decreasing?
>
>                 It appears to me self-heal is not working properly so how
>                 do I get it to start working or should I delete the
>                 volume and start over?
>
>                 I'm running gluster 5.2 on CentOS 7 latest and updated.
>
>                 Thank you.
>
>
>                 _______________________________________________
>                 Gluster-users mailing list
>                 Gluster-users at gluster.org
>                 <mailto:Gluster-users at gluster.org>
>                 https://lists.gluster.org/mailman/listinfo/gluster-users
>