[Gluster-users] Self Heal Confusion

Sat Dec 29 01:29:58 UTC 2018

I've been trying to find the file name from the GFID using references such as 
https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/, the 
script I referenced, and other methods, but with no luck.  The GFID in the 
command below does not exist in the directory 
/srv/gfs01/Projects/.glusterfs/63/5a.  Other files named by a GFID do 
exist.

It appears the files do not exist.  In addition, the file that is in the 
63/5a directory shows a link to a file that does not exist.
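On a brick, every GFID has a backend entry at <brick>/.glusterfs/<aa>/<bb>/<gfid>, where <aa> and <bb> are the first two and next two hex digits of the GFID. It is a hard link for regular files and a symlink for directories, which is the layout the gfid-to-path page above relies on. For the GFID quoted below, the entry would be /srv/gfs01/Projects/.glusterfs/6e/5a/6e5ab8ae-65f4-4594-9313-3483bf031adc. A minimal bash sketch, assuming it runs as root on a brick server with GNU find; gfid_backend_path and gfid_to_path are hypothetical helper names, not Gluster tools:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for mapping a GFID to a path on one brick.

# Print <brick>/.glusterfs/<aa>/<bb>/<gfid>, where <aa> and <bb> are the
# first and second pairs of hex digits of the GFID.
gfid_backend_path() {
  local brick=$1 gfid=$2
  printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "${gfid:0:2}" "${gfid:2:2}" "$gfid"
}

# Resolve a GFID to the user-visible path(s) on this brick.
gfid_to_path() {
  local brick=$1 gfid=$2 entry
  entry=$(gfid_backend_path "$brick" "$gfid")
  if [[ -L $entry ]]; then
    # Directory: the backend entry is a symlink; print its (relative) target.
    readlink -- "$entry"
  elif [[ -e $entry ]]; then
    # Regular file: the backend entry is a hard link, so look for other
    # names sharing its inode, skipping the .glusterfs tree itself.
    find "$brick" -path "$brick/.glusterfs" -prune -o -samefile "$entry" -print
  else
    echo "no backend entry at $entry" >&2
    return 1
  fi
}

# Example:
#   gfid_to_path /srv/gfs01/Projects 6e5ab8ae-65f4-4594-9313-3483bf031adc
```

If the backend entry exists but gfid_to_path prints nothing, the file has no other name on this brick (stat on the entry would show a link count of 1). For a directory, readlink typically prints a target relative to the .glusterfs/<aa>/<bb> directory, of the form ../../<cc>/<dd>/<parent-gfid>/<dirname>.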

On 12/28/18 6:23 PM, Brett Holcomb wrote:
>
> I've done step 1 with no results yet, so I'm trying step 2, but I can't 
> find the file via its GFID name.  I saved the "gluster volume heal 
> projects info" output to a text file and grabbed the first entry in it 
> for Brick gfssrv1:/srv/gfs01/Projects, which is listed as
>
> <gfid:the long gfid>
>
> I then tried to use the method described here, 
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/ch, 
> to find the file.  However, when I do the mount, there is no .gfid 
> directory anywhere.
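The .gfid directory that method relies on is virtual: it is only provided on a FUSE mount made with the aux-gfid-mount option, and it is not expected to show up in ls output, so it has to be addressed by full path (<mount>/.gfid/<gfid>). A minimal sketch following the Gluster gfid-to-path documentation, assuming root on a client; the mount point /mnt/projects-gfid and the helper names are hypothetical, and DRY_RUN=1 prints the commands instead of running them:

```shell
#!/usr/bin/env bash
# Print commands instead of executing them when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi
}

# Mount volume $1 (host:/volname) at $2 with the virtual .gfid/ directory
# enabled, then ask Gluster which brick path(s) hold GFID $3.
gfid_lookup() {
  local vol=$1 mnt=$2 gfid=$3
  run mkdir -p "$mnt"
  run mount -t glusterfs -o aux-gfid-mount "$vol" "$mnt"
  # .gfid is not listed in the mount's root; access it by full path.
  run getfattr -n trusted.glusterfs.pathinfo -e text "$mnt/.gfid/$gfid"
}

# Example:
#   gfid_lookup gfssrv1:/projects /mnt/projects-gfid 6e5ab8ae-65f4-4594-9313-3483bf031adc
```

The auxiliary mount is separate from the normal client mount and can be removed with umount once the lookup is done.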
>
> I then used the Gluster GFID resolver from here, 
> https://gist.github.com/semiosis/4392640, and it gives me the following 
> output, which shows no file linked to the GFID.
>
> [root@srv-1-gfs1 ~]# ./gfid-resolver.sh /srv/gfs01/Projects 6e5ab8ae-65f4-4594-9313-3483bf031adc
> 6e5ab8ae-65f4-4594-9313-3483bf031adc    ==      File:
> Done.
>
> So at this point either I'm doing something wrong (most likely) or the 
> files do not exist. I've tried this on several files.
>
> On 12/28/18 1:00 AM, Ashish Pandey wrote:
>>
>> Hi Brett,
>>
>> First, here are answers to all your questions:
>>
>> 1.  If a self-heal daemon is listed on a host (all of mine show one with
>> a volume status command), can I assume it's enabled and running?
>>
>> For your volume "projects", the self-heal daemon is up and running.
>>
>> 2.  I assume the volume that has all the self-heals pending has some
>> serious issues, even though I can access the files and directories on
>> it.  If self-heal is running, shouldn't the numbers be decreasing?
>>
>> It should heal the entries, and the number of entries reported by the 
>> "gluster v heal volname info" command should be decreasing.
>>
>> It appears to me self-heal is not working properly, so how do I get it to
>> start working, or should I delete the volume and start over?
>>
>> As you can access all the files from the mount point, I think the volume 
>> and the files are in a good state as of now.
>> I don't think you should consider deleting your volume before trying 
>> to fix it.
>> If there is no fix, or the fix is taking too long, you can go ahead with 
>> that option.
>>
>> -----------------------
>> Why are all these options off?
>>
>> performance.quick-read: off
>> performance.parallel-readdir: off
>> performance.readdir-ahead: off
>> performance.write-behind: off
>> performance.read-ahead: off
>>
>> Although this should not matter for your issue, I think you should 
>> enable all of the above unless you have a reason not to.
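Re-enabling the five options listed above takes one "gluster volume set" per option. A minimal sketch, assuming it runs on any server in the trusted pool; enable_perf_options is a hypothetical helper, and DRY_RUN=1 only prints the commands:

```shell
#!/usr/bin/env bash
# Print commands instead of executing them when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi
}

# Turn the performance translators listed above back on for volume $1,
# stopping at the first option gluster refuses.
enable_perf_options() {
  local vol=$1 opt
  for opt in quick-read read-ahead readdir-ahead parallel-readdir write-behind; do
    run gluster volume set "$vol" "performance.$opt" on || return 1
  done
}

# Example: enable_perf_options projects
```

Afterwards, "gluster volume get projects all" lists the effective value of every option, so the change can be confirmed there.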
>> --------------------
>>
>> I would like you to perform the following steps and provide some more 
>> information:
>>
>> 1 - Try restarting self-heal and see if that works.
>> "gluster v start volume force" will kill and restart the self-heal 
>> processes.
>>
>> 2 - If step 1 does not help, get the list of entries that need to be 
>> healed and pick one of them. We should focus on one entry to find out 
>> why it is not getting healed, instead of all the 5900 entries. Let's 
>> call it entry1.
>>
>> 3 - Now access entry1 from the mount point, read it and write to it, 
>> and check heal info to see whether the entry has been healed. Accessing 
>> a file from the mount point triggers a client-side heal, which could 
>> also heal the file.
>>
>> 4 - Check the logs in /var/log/glusterfs; the mount logs and the 
>> glustershd logs in particular should be checked and provided.
>>
>> 5 - Get the extended attributes of entry1 from all the bricks.
>>
>> If the path of entry1 on the mount point is /a/b/c/entry1, then you 
>> have to run the following command on all the nodes:
>>
>> getfattr -m. -d -e hex <path of the brick on the node>/a/b/c/entry1
>>
>> Please provide the output of the above command too.
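Steps 1, 2 and 5 above can be scripted roughly as follows. This is a sketch, not a Gluster tool: the helper names are hypothetical, it assumes passwordless root ssh from one server to gfssrv1 through gfssrv6, and DRY_RUN=1 prints the commands instead of running them. The decoder relies on the AFR changelog layout of trusted.afr.<volname>-client-<N>: three 4-byte big-endian counters for pending data, metadata and entry operations.

```shell
#!/usr/bin/env bash
# Print commands instead of executing them when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi
}

# Brick directory and servers as listed in "gluster v info" for this volume.
BRICK_DIR=/srv/gfs01/Projects
HOSTS=(gfssrv1 gfssrv2 gfssrv3 gfssrv4 gfssrv5 gfssrv6)

# Step 1: force-start the volume (running bricks stay up; the self-heal
# daemons are restarted), then ask them to crawl the pending-heal index.
restart_shd_and_heal() {
  local vol=$1
  run gluster volume start "$vol" force
  run gluster volume heal "$vol"
}

# Step 2: read "gluster volume heal <vol> info" output on stdin and print
# the first pending entry of each brick as "<brick><TAB><entry>".
first_heal_entries() {
  awk '/^Brick /                      { brick = $2; taken = 0; next }
       /^(Status|Number of entries):/ { next }
       NF && brick != "" && !taken    { print brick "\t" $0; taken = 1 }'
}

# Step 5: dump the extended attributes of one path (relative to the volume
# root) from every brick. In a 2 x 3 volume a regular file lives on only
# one replica set, so three of the six bricks should report it missing.
collect_xattrs() {
  local relpath=$1 host
  for host in "${HOSTS[@]}"; do
    echo "== $host"
    run ssh "$host" "getfattr -m. -d -e hex $(printf '%q' "$BRICK_DIR/$relpath")"
  done
}

# Decode a trusted.afr.<volname>-client-<N> value (12 bytes) into its
# pending data / metadata / entry counters.
decode_afr_pending() {
  local hex=${1#0x}
  if [[ ! $hex =~ ^[0-9a-fA-F]{24}$ ]]; then
    echo "expected 24 hex digits, got: $1" >&2
    return 1
  fi
  printf 'data=%d metadata=%d entry=%d\n' \
    "$((16#${hex:0:8}))" "$((16#${hex:8:8}))" "$((16#${hex:16:8}))"
}

# Example session:
#   restart_shd_and_heal projects
#   gluster volume heal projects info | first_heal_entries
#   collect_xattrs a/b/c/entry1
#   decode_afr_pending 0x000000020000000000000000
```

A non-zero counter in trusted.afr.projects-client-<N> on one brick means that brick records operations still pending on brick N; as far as I know, N counts from 0 in the brick order shown by "gluster v info".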
>>
>> ---
>> Ashish
>>
>> ------------------------------------------------------------------------
>> From: "Brett Holcomb" <biholcomb at l1049h.com>
>> To: gluster-users at gluster.org
>> Sent: Friday, December 28, 2018 3:49:50 AM
>> Subject: Re: [Gluster-users] Self Heal Confusion
>>
>> Resending, as I did not reply to the list earlier.  Thunderbird replied 
>> to the poster and not the list.
>>
>> On 12/27/18 11:46 AM, Brett Holcomb wrote:
>>
>>     Thank you, I appreciate the help.  Here is the information.  Let
>>     me know if you need anything else.  I'm fairly new to Gluster.
>>
>>     Gluster version is 5.2
>>
>>     1. gluster v info
>>
>>     Volume Name: projects
>>     Type: Distributed-Replicate
>>     Volume ID: 5aac71aa-feaa-44e9-a4f9-cb4dd6e0fdc3
>>     Status: Started
>>     Snapshot Count: 0
>>     Number of Bricks: 2 x 3 = 6
>>     Transport-type: tcp
>>     Bricks:
>>     Brick1: gfssrv1:/srv/gfs01/Projects
>>     Brick2: gfssrv2:/srv/gfs01/Projects
>>     Brick3: gfssrv3:/srv/gfs01/Projects
>>     Brick4: gfssrv4:/srv/gfs01/Projects
>>     Brick5: gfssrv5:/srv/gfs01/Projects
>>     Brick6: gfssrv6:/srv/gfs01/Projects
>>     Options Reconfigured:
>>     cluster.self-heal-daemon: enable
>>     performance.quick-read: off
>>     performance.parallel-readdir: off
>>     performance.readdir-ahead: off
>>     performance.write-behind: off
>>     performance.read-ahead: off
>>     performance.client-io-threads: off
>>     nfs.disable: on
>>     transport.address-family: inet
>>     server.allow-insecure: on
>>     storage.build-pgfid: on
>>     changelog.changelog: on
>>     changelog.capture-del-path: on
>>
>>     2.  gluster v status
>>
>>     Status of volume: projects
>>     Gluster process                             TCP Port  RDMA Port  Online  Pid
>>     ------------------------------------------------------------------------------
>>     Brick gfssrv1:/srv/gfs01/Projects           49154     0          Y       7213
>>     Brick gfssrv2:/srv/gfs01/Projects           49154     0          Y       6932
>>     Brick gfssrv3:/srv/gfs01/Projects           49154     0          Y       6920
>>     Brick gfssrv4:/srv/gfs01/Projects           49154     0          Y       6732
>>     Brick gfssrv5:/srv/gfs01/Projects           49154     0          Y       6950
>>     Brick gfssrv6:/srv/gfs01/Projects           49154     0          Y       6879
>>     Self-heal Daemon on localhost               N/A       N/A        Y       11484
>>     Self-heal Daemon on gfssrv2                 N/A       N/A        Y       10366
>>     Self-heal Daemon on gfssrv4                 N/A       N/A        Y       9872
>>     Self-heal Daemon on srv-1-gfs3.corp.l1049h.
>>     net                                         N/A       N/A        Y       9892
>>     Self-heal Daemon on gfssrv6                 N/A       N/A        Y       10372
>>     Self-heal Daemon on gfssrv5                 N/A       N/A        Y       10761
>>
>>     Task Status of Volume projects
>>     ------------------------------------------------------------------------------
>>     There are no active volume tasks
>>
>>     3. I've given the summary since the actual list for two volumes
>>     is around 5900 entries.
>>
>>     Brick gfssrv1:/srv/gfs01/Projects
>>     Status: Connected
>>     Total Number of entries: 85
>>     Number of entries in heal pending: 85
>>     Number of entries in split-brain: 0
>>     Number of entries possibly healing: 0
>>
>>     Brick gfssrv2:/srv/gfs01/Projects
>>     Status: Connected
>>     Total Number of entries: 0
>>     Number of entries in heal pending: 0
>>     Number of entries in split-brain: 0
>>     Number of entries possibly healing: 0
>>
>>     Brick gfssrv3:/srv/gfs01/Projects
>>     Status: Connected
>>     Total Number of entries: 0
>>     Number of entries in heal pending: 0
>>     Number of entries in split-brain: 0
>>     Number of entries possibly healing: 0
>>
>>     Brick gfssrv4:/srv/gfs01/Projects
>>     Status: Connected
>>     Total Number of entries: 0
>>     Number of entries in heal pending: 0
>>     Number of entries in split-brain: 0
>>     Number of entries possibly healing: 0
>>
>>     Brick gfssrv5:/srv/gfs01/Projects
>>     Status: Connected
>>     Total Number of entries: 58854
>>     Number of entries in heal pending: 58854
>>     Number of entries in split-brain: 0
>>     Number of entries possibly healing: 0
>>
>>     Brick gfssrv6:/srv/gfs01/Projects
>>     Status: Connected
>>     Total Number of entries: 58854
>>     Number of entries in heal pending: 58854
>>     Number of entries in split-brain: 0
>>     Number of entries possibly healing: 0
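Brett reports these counters staying constant for days, so it can help to reduce the summary to one line per brick plus a total and log that periodically. In a distributed-replicate volume, consecutive bricks in "gluster v info" order form the replica sets, so with replica 3 here Brick1-3 (gfssrv1-3) are one set and Brick4-6 (gfssrv4-6) the other; the sketch prints that set number too. Assumptions: heal_summary and watch_heal are hypothetical helpers, the replica count is passed in, and bricks appear in the summary in volume-info order.

```shell
#!/usr/bin/env bash
# Reduce "gluster volume heal <vol> info summary" output (stdin) to
# "<replica set>\t<brick>\t<pending>" lines plus a final "total\t<n>" line.
# $1 is the replica count (3 for this volume).
heal_summary() {
  awk -v replica="$1" -F': ' '
    /^Brick / { n++; brick = substr($0, 7) }
    /^Number of entries in heal pending:/ {
      set = int((n - 1) / replica) + 1
      print set "\t" brick "\t" $2
      total += $2
    }
    END { print "total\t" total + 0 }'
}

# Print a timestamped total every $3 seconds (default 600) until interrupted.
watch_heal() {
  local vol=$1 replica=$2 interval=${3:-600}
  while true; do
    printf '%s\t' "$(date '+%F %T')"
    gluster volume heal "$vol" info summary | heal_summary "$replica" | tail -n 1
    sleep "$interval"
  done
}

# Example: watch_heal projects 3 >> /var/tmp/heal-projects.log
```

If the logged total never moves after the self-heal daemons have been restarted, that is worth stating explicitly when posting logs to the list.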
>>
>>     On 12/27/18 3:09 AM, Ashish Pandey wrote:
>>
>>         Hi Brett,
>>
>>         Could you please tell us more about the setup?
>>
>>         1 - gluster v info
>>         2 - gluster v status
>>         3 - gluster v heal <volname> info
>>
>>         This is the basic information needed to start debugging
>>         or to suggest a workaround.
>>         It should always be included when asking such questions on
>>         the mailing list so that people can reply sooner.
>>
>>
>>         Note: Please hide IP addresses/hostnames or any other
>>         information you don't want the world to see.
>>
>>         ---
>>         Ashish
>>
>>         ------------------------------------------------------------------------
>>         From: "Brett Holcomb" <biholcomb at l1049h.com>
>>         To: gluster-users at gluster.org
>>         Sent: Thursday, December 27, 2018 12:19:15 AM
>>         Subject: Re: [Gluster-users] Self Heal Confusion
>>
>>         Still no change in the pending heals.  I found this
>>         reference,
>>         https://archive.fosdem.org/2017/schedule/event/glusterselinux/attachments/slides/1876/export/events/attachments/glusterselinux/slides/1876/fosdem.pdf,
>>         which mentions the default SELinux context for a brick and
>>         says that internal operations such as self-heal and rebalance
>>         should be ignored, but it does not elaborate on what "ignored"
>>         means: is it just not doing self-heal, or something else?
>>
>>         I did set SELinux to permissive and nothing changed.  I'll
>>         try setting the bricks to the context mentioned in this PDF
>>         and see what happens.
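Checking and, if needed, applying a brick file context can be sketched as below. As far as I know, glusterd_brick_t is the type Red Hat Gluster Storage documentation uses for brick directories; it is an assumption that it matches the context named in the FOSDEM slides, so confirm against the PDF and the installed policy first. The brick root /srv/gfs01 comes from this thread; show_brick_label and label_bricks are hypothetical helpers, and DRY_RUN=1 prints the commands.

```shell
#!/usr/bin/env bash
# Print commands instead of executing them when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi
}

# Show the current SELinux mode and the label on a brick directory.
show_brick_label() {
  run getenforce
  run ls -Zd "$1"
}

# Persistently map everything under a brick root to glusterd_brick_t
# (assumed type name; check it against the slides and installed policy),
# then relabel the existing files to match.
label_bricks() {
  local root=$1
  run semanage fcontext -a -t glusterd_brick_t "$root(/.*)?"
  run restorecon -Rv "$root"
}

# Example, on each brick server:
#   show_brick_label /srv/gfs01/Projects
#   label_bricks /srv/gfs01
```

If a rule for that path already exists, semanage fcontext -a rejects it; -m modifies the existing rule instead.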
>>
>>
>>         On 12/20/18 8:26 PM, John Strunk wrote:
>>
>>             Assuming your bricks are up... yes, the heal count should
>>             be decreasing.
>>
>>             There is/was a bug wherein self-heal would stop healing
>>             but would still be running. I don't know whether your
>>             version is affected, but the remedy is to just restart
>>             the self-heal daemon.
>>             Force start one of the volumes that has heals pending.
>>             The bricks are already running, but it will cause shd (the
>>             self-heal daemon) to restart and, assuming this is the
>>             problem, healing should begin.
>>
>>             $ gluster vol start my-pending-heal-vol force
>>
>>             Others could better comment on the status of the bug.
>>
>>             -John
>>
>>
>>             On Thu, Dec 20, 2018 at 5:45 PM Brett Holcomb
>>             <biholcomb at l1049h.com> wrote:
>>
>>                 I have one volume that has 85 entries pending heal and
>>                 two more volumes with 58,854 entries pending heal.
>>                 These numbers are from the volume heal info summary
>>                 command, and they have stayed constant for two days
>>                 now.  I've read the Gluster docs and many more.  The
>>                 Gluster docs just give some commands, and non-Gluster
>>                 docs basically repeat that.  It appears no self-healing
>>                 is going on for my volume, and I am confused as to why.
>>
>>                 1.  If a self-heal daemon is listed on a host (all of
>>                 mine show one with a volume status command), can I
>>                 assume it's enabled and running?
>>
>>                 2.  I assume the volume that has all the self-heals
>>                 pending has some serious issues, even though I can
>>                 access the files and directories on it.  If self-heal
>>                 is running, shouldn't the numbers be decreasing?
>>
>>                 It appears to me self-heal is not working properly, so
>>                 how do I get it to start working, or should I delete
>>                 the volume and start over?
>>
>>                 I'm running Gluster 5.2 on CentOS 7, latest and fully
>>                 updated.
>>
>>                 Thank you.
>>
>>
>>                 _______________________________________________
>>                 Gluster-users mailing list
>>                 Gluster-users at gluster.org
>>                 <mailto:Gluster-users at gluster.org>
>>                 https://lists.gluster.org/mailman/listinfo/gluster-users
>>