[Gluster-users] Self Heal Confusion
Brett Holcomb
biholcomb at l1049h.com
Sat Dec 29 01:29:58 UTC 2018
I've been trying to find the file name for the GFID using references such as
https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/, the
script I referenced, and other methods, but with no luck. The GFID in the
command below does not exist in the directory
/srv/gfs01/Projects/.glusterfs/63/5a, although other files named by a GFID
do exist there.
It appears the files do not exist. In addition, the file that is in the
63/5a directory shows a link to a file that does not exist.
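
In case it helps, this is a sketch of the kind of cross-check I mean (the
<gfid> below is only a placeholder; the brick path is from my setup):

# for a regular file, <brick>/.glusterfs/xx/yy/<gfid> should be a hard link
# to the real file, so any other path sharing its inode can be found with:
find /srv/gfs01/Projects -samefile /srv/gfs01/Projects/.glusterfs/63/5a/<gfid>

# a link count of 1 means only the .glusterfs entry is left; for a directory
# GFID the entry is a symlink instead, which readlink resolves:
stat -c '%h %n' /srv/gfs01/Projects/.glusterfs/63/5a/<gfid>
readlink /srv/gfs01/Projects/.glusterfs/63/5a/<gfid>
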
On 12/28/18 6:23 PM, Brett Holcomb wrote:
>
> I've done step 1 with no results yet, so I'm trying step 2, but I can't
> find the file via the GFID name. The gluster volume heal projects
> info output is in a text file, so I grabbed the first entry from the
> file for Brick gfssrv1:/srv/gfs01/Projects, which is listed as
>
> <gfid:the long gfid>
>
> I then tried to use the method here,
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/ch,
> to find the file. However, when I do the mount, there is no .gfid
> directory anywhere.
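>
> For what it's worth, this is the sort of thing I was attempting (a sketch
> only; my understanding from the doc is that .gfid is a virtual directory,
> so it never shows up in ls output and has to be referenced by full path):
>
> mount -t glusterfs -o aux-gfid-mount gfssrv1:/projects /mnt/gfid
> # reference the GFID entry directly; it will not appear in a listing
> stat /mnt/gfid/.gfid/6e5ab8ae-65f4-4594-9313-3483bf031adc
> getfattr -n trusted.glusterfs.pathinfo -e text \
>     /mnt/gfid/.gfid/6e5ab8ae-65f4-4594-9313-3483bf031adc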
>
> I then used the Gluster GFID resolver from here,
> https://gist.github.com/semiosis/4392640, and it gives me the following
> output, which shows no file linked to it.
>
> [root@srv-1-gfs1 ~]# ./gfid-resolver.sh /srv/gfs01/Projects
> 6e5ab8ae-65f4-4594-9313-3483bf031adc
> 6e5ab8ae-65f4-4594-9313-3483bf031adc == File:
> Done.
>
> So at this point either I'm doing something wrong (most likely) or the
> files do not exist. I've tried this on several files.
>
>
>
> On 12/28/18 1:00 AM, Ashish Pandey wrote:
>>
>> Hi Brett,
>>
>> First, the answers to all your questions -
>>
>> 1. If a self-heal daemon is listed on a host (all of mine show one with
>> a volume status command), can I assume it's enabled and running?
>>
>> For your volume "projects", the self-heal daemon is up and running.
>>
>> 2. I assume the volume that has all the self-heals pending has some
>> serious issues even though I can access the files and directories on
>> it. If self-heal is running, shouldn't the numbers be decreasing?
>>
>> It should heal the entries, and the number of entries reported by the
>> "gluster v heal <volname> info" command should be decreasing.
>>
>> It appears to me self-heal is not working properly, so how do I get it to
>> start working, or should I delete the volume and start over?
>>
>> As you can access all the files from the mount point, I think the volume
>> and the files are in a good state as of now.
>> I don't think you should consider deleting your volume before trying
>> to fix it.
>> If there is no fix, or the fix is taking too long, you can go ahead with
>> that option.
>>
>> -----------------------
>> Why are all these options off?
>>
>> performance.quick-read: off
>> performance.parallel-readdir: off
>> performance.readdir-ahead: off
>> performance.write-behind: off
>> performance.read-ahead: off
>>
>> Although this should not matter for your issue, I think you should
>> enable all of the above unless you have a reason not to.
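>>
>> For example (a sketch only; re-enable just the ones you did not turn
>> off on purpose):
>>
>> gluster volume set projects performance.quick-read on
>> gluster volume set projects performance.write-behind on
>> gluster volume set projects performance.read-ahead on
>> gluster volume set projects performance.readdir-ahead on
>> gluster volume set projects performance.parallel-readdir on
>>
>> As far as I know, parallel-readdir only takes effect when readdir-ahead
>> is also enabled.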
>> --------------------
>>
>> I would like you to perform the following steps and provide some more
>> information -
>>
>> 1 - Try to restart self-heal and see if that works.
>> "gluster v start <volname> force" will kill and restart the self-heal
>> processes.
>>
>> 2 - If step 1 is not fruitful, get the list of entries that need to be
>> healed and pick one of them to heal. I mean we should focus on one
>> entry to find out why it is not getting healed, instead of on all the
>> 5900 entries. Let's call it entry1.
>>
>> 3 - Now access entry1 from the mount point, read from and write to it,
>> and see if the entry has been healed. Check the heal info. Accessing a
>> file from the mount point triggers a client-side heal,
>> which could also heal the file.
>>
>> 4 - Check the logs in /var/log/glusterfs; the mount logs and glustershd
>> logs should be checked and provided.
>>
>> 5 - Get the extended attributes of entry1 from all the bricks.
>>
>> If the path of entry1 on the mount point is /a/b/c/entry1, then you
>> have to run the following command on all the nodes -
>>
>> getfattr -m. -d -e hex <path of the brick on the node>/a/b/c/entry1
>>
>> Please provide the output of the above command too. A rough worked
>> sketch of steps 1, 3 and 5 with your volume's names follows below.
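>>
>> For example (/mnt/projects and /a/b/c/entry1 are placeholders only):
>>
>> # step 1: kill and restart the self-heal processes
>> gluster volume start projects force
>>
>> # step 3: access entry1 through a client mount, then re-check heal info
>> stat /mnt/projects/a/b/c/entry1
>> gluster volume heal projects info summary
>>
>> # step 5: on every node, dump the extended attributes of the brick copy
>> getfattr -m. -d -e hex /srv/gfs01/Projects/a/b/c/entry1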
>>
>> ---
>> Ashish
>>
>> ------------------------------------------------------------------------
>> *From: *"Brett Holcomb" <biholcomb at l1049h.com>
>> *To: *gluster-users at gluster.org
>> *Sent: *Friday, December 28, 2018 3:49:50 AM
>> *Subject: *Re: [Gluster-users] Self Heal Confusion
>>
>> Resending, as I did not reply to the list earlier; Thunderbird replied
>> to the poster and not to the list.
>>
>> On 12/27/18 11:46 AM, Brett Holcomb wrote:
>>
>> Thank you, I appreciate the help. Here is the information. Let
>> me know if you need anything else. I'm fairly new to Gluster.
>>
>> Gluster version is 5.2
>>
>> 1. gluster v info
>>
>> Volume Name: projects
>> Type: Distributed-Replicate
>> Volume ID: 5aac71aa-feaa-44e9-a4f9-cb4dd6e0fdc3
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 2 x 3 = 6
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfssrv1:/srv/gfs01/Projects
>> Brick2: gfssrv2:/srv/gfs01/Projects
>> Brick3: gfssrv3:/srv/gfs01/Projects
>> Brick4: gfssrv4:/srv/gfs01/Projects
>> Brick5: gfssrv5:/srv/gfs01/Projects
>> Brick6: gfssrv6:/srv/gfs01/Projects
>> Options Reconfigured:
>> cluster.self-heal-daemon: enable
>> performance.quick-read: off
>> performance.parallel-readdir: off
>> performance.readdir-ahead: off
>> performance.write-behind: off
>> performance.read-ahead: off
>> performance.client-io-threads: off
>> nfs.disable: on
>> transport.address-family: inet
>> server.allow-insecure: on
>> storage.build-pgfid: on
>> changelog.changelog: on
>> changelog.capture-del-path: on
>>
>> 2. gluster v status
>>
>> Status of volume: projects
>> Gluster process                                 TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick gfssrv1:/srv/gfs01/Projects               49154     0          Y       7213
>> Brick gfssrv2:/srv/gfs01/Projects               49154     0          Y       6932
>> Brick gfssrv3:/srv/gfs01/Projects               49154     0          Y       6920
>> Brick gfssrv4:/srv/gfs01/Projects               49154     0          Y       6732
>> Brick gfssrv5:/srv/gfs01/Projects               49154     0          Y       6950
>> Brick gfssrv6:/srv/gfs01/Projects               49154     0          Y       6879
>> Self-heal Daemon on localhost                   N/A       N/A        Y       11484
>> Self-heal Daemon on gfssrv2                     N/A       N/A        Y       10366
>> Self-heal Daemon on gfssrv4                     N/A       N/A        Y       9872
>> Self-heal Daemon on srv-1-gfs3.corp.l1049h.net  N/A       N/A        Y       9892
>> Self-heal Daemon on gfssrv6                     N/A       N/A        Y       10372
>> Self-heal Daemon on gfssrv5                     N/A       N/A        Y       10761
>>
>> Task Status of Volume projects
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> 3. I've given the summary since the actual list for two volumes
>> is around 5900 entries.
>>
>> Brick gfssrv1:/srv/gfs01/Projects
>> Status: Connected
>> Total Number of entries: 85
>> Number of entries in heal pending: 85
>> Number of entries in split-brain: 0
>> Number of entries possibly healing: 0
>>
>> Brick gfssrv2:/srv/gfs01/Projects
>> Status: Connected
>> Total Number of entries: 0
>> Number of entries in heal pending: 0
>> Number of entries in split-brain: 0
>> Number of entries possibly healing: 0
>>
>> Brick gfssrv3:/srv/gfs01/Projects
>> Status: Connected
>> Total Number of entries: 0
>> Number of entries in heal pending: 0
>> Number of entries in split-brain: 0
>> Number of entries possibly healing: 0
>>
>> Brick gfssrv4:/srv/gfs01/Projects
>> Status: Connected
>> Total Number of entries: 0
>> Number of entries in heal pending: 0
>> Number of entries in split-brain: 0
>> Number of entries possibly healing: 0
>>
>> Brick gfssrv5:/srv/gfs01/Projects
>> Status: Connected
>> Total Number of entries: 58854
>> Number of entries in heal pending: 58854
>> Number of entries in split-brain: 0
>> Number of entries possibly healing: 0
>>
>> Brick gfssrv6:/srv/gfs01/Projects
>> Status: Connected
>> Total Number of entries: 58854
>> Number of entries in heal pending: 58854
>> Number of entries in split-brain: 0
>> Number of entries possibly healing: 0
>>
>> On 12/27/18 3:09 AM, Ashish Pandey wrote:
>>
>> Hi Brett,
>>
>> Could you please tell us more about the setup?
>>
>> 1 - Gluster v info
>> 2 - gluster v status
>> 3 - gluster v heal <volname> info
>>
>> This is the very basic information needed to start debugging
>> or suggesting any workaround.
>> It should always be included when asking such questions on the
>> mailing list so that people can reply sooner.
>>
>>
>> Note: Please hide IP addresses/hostnames or any other
>> information you don't want the world to see.
>>
>> ---
>> Ashish
>>
>> ------------------------------------------------------------------------
>> *From: *"Brett Holcomb" <biholcomb at l1049h.com>
>> *To: *gluster-users at gluster.org
>> *Sent: *Thursday, December 27, 2018 12:19:15 AM
>> *Subject: *Re: [Gluster-users] Self Heal Confusion
>>
>> Still no change in the heals pending. I found this
>> reference,
>> https://archive.fosdem.org/2017/schedule/event/glusterselinux/attachments/slides/1876/export/events/attachments/glusterselinux/slides/1876/fosdem.pdf,
>> which mentions the default SELinux context for a brick and
>> says that internal operations such as self-heal and rebalance
>> should be ignored, but it does not elaborate on what "ignored"
>> means: is it just not doing self-heal, or something else?
>>
>> I did set SELinux to permissive, and nothing changed. I'll
>> try setting the bricks to the context mentioned in this PDF
>> and see what happens.
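>>
>> Roughly what I plan to try (a sketch; I'm assuming glusterd_brick_t is
>> the brick context the slides refer to, so please correct me if not):
>>
>> # label the brick directories persistently, then apply the label
>> semanage fcontext -a -t glusterd_brick_t "/srv/gfs01/Projects(/.*)?"
>> restorecon -Rv /srv/gfs01/Projects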
>>
>>
>> On 12/20/18 8:26 PM, John Strunk wrote:
>>
>> Assuming your bricks are up... yes, the heal count should
>> be decreasing.
>>
>> There is/was a bug wherein self-heal would stop healing
>> but would still be running. I don't know whether your
>> version is affected, but the remedy is to just restart
>> the self-heal daemon.
>> Force start one of the volumes that has heals pending.
>> The bricks are already running, but it will cause shd to
>> restart and, assuming this is the problem, healing should
>> begin...
>>
>> $ gluster vol start my-pending-heal-vol force
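>>
>> (To confirm shd really restarted, compare the Self-heal Daemon PIDs in
>> the volume status output before and after the force start, e.g.:
>>
>> $ gluster vol status my-pending-heal-vol | grep -i self-heal
>>
>> New PIDs mean the daemons were restarted.)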
>>
>> Others could better comment on the status of the bug.
>>
>> -John
>>
>>
>> On Thu, Dec 20, 2018 at 5:45 PM Brett Holcomb
>> <biholcomb at l1049h.com <mailto:biholcomb at l1049h.com>> wrote:
>>
>> I have one volume that has 85 pending entries in
>> healing and two more
>> volumes with 58,854 entries in healing pending.
>> These numbers are from
>> the volume heal info summary command. They have
>> stayed constant for two
>> days now. I've read the Gluster docs and many others.
>> The Gluster docs
>> just give some commands, and non-Gluster docs
>> basically repeat them.
>> Given that it appears no self-healing is going on for
>> my volume, I am
>> confused as to why.
>>
>> 1. If a self-heal daemon is listed on a host (all of
>> mine show one with
>> a volume status command), can I assume it's enabled
>> and running?
>>
>> 2. I assume the volume that has all the self-heals
>> pending has some
>> serious issues even though I can access the files and
>> directories on
>> it. If self-heal is running, shouldn't the numbers be
>> decreasing?
>>
>> It appears to me self-heal is not working properly, so
>> how do I get it to
>> start working, or should I delete the volume and start
>> over?
>>
>> I'm running Gluster 5.2 on CentOS 7, latest and updated.
>>
>> Thank you.
>>
>>
>>
>>
>>
>>
>>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users