[Gluster-users] Self Heal Confusion
Brett Holcomb
biholcomb at l1049h.com
Fri Dec 28 23:23:46 UTC 2018
I've done step 1 with no results yet, so I'm trying step 2, but I can't find
the file via its GFID. The gluster volume heal projects info
output is in a text file, so I grabbed the first entry from the file for
Brick gfssrv1:/srv/gfs01/Projects, which is listed as
<gfid:the long gfid>
I then tried to use the method described here,
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/ch,
to find the file. However, when I do the mount there is no .gfid
directory visible anywhere.
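
For reference, here is roughly what the guide seems to call for (the mount
point below is just a placeholder; my understanding is that the .gfid
directory is virtual, so it would not show up in a listing and has to be
addressed by explicit path):

# mount the volume with the aux-gfid-mount option
mount -t glusterfs -o aux-gfid-mount gfssrv1:/projects /mnt/gfid-test
# stat the virtual .gfid path for the GFID in question
stat /mnt/gfid-test/.gfid/6e5ab8ae-65f4-4594-9313-3483bf031adc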
I then used the Gluster GFID resolver from here,
https://gist.github.com/semiosis/4392640, and it gives me this output,
which shows no file linked to the GFID.
[root at srv-1-gfs1 ~]# ./gfid-resolver.sh /srv/gfs01/Projects
6e5ab8ae-65f4-4594-9313-3483bf031adc
6e5ab8ae-65f4-4594-9313-3483bf031adc == File:
Done.
So at this point either I'm doing something wrong (most likely) or the
files do not exist. I've tried this on several files.
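
For what it's worth, my understanding is that the resolver script essentially
does the following on the brick (assuming the standard .glusterfs layout,
where a regular file's GFID entry is a hard link to the real file):

# the GFID entry lives under .glusterfs/<first 2 hex chars>/<next 2 chars>/<gfid>
ls -l /srv/gfs01/Projects/.glusterfs/6e/5a/6e5ab8ae-65f4-4594-9313-3483bf031adc
# for a regular file this is a hard link, so look for other names on the brick
find /srv/gfs01/Projects -samefile \
    /srv/gfs01/Projects/.glusterfs/6e/5a/6e5ab8ae-65f4-4594-9313-3483bf031adc

If the link count on that .glusterfs entry is 1, there is no named file left
on the brick, which would match the empty "File:" line above.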
On 12/28/18 1:00 AM, Ashish Pandey wrote:
>
> Hi Brett,
>
> First the answers of all your questions -
>
> 1. If a self-heal daemon is listed on a host (all of mine show one with
> a volume status command), can I assume it's enabled and running?
>
> For your volume, projects, the self-heal daemon is up and running.
>
> 2. I assume the volume that has all the self-heals pending has some
> serious issues even though I can access the files and directories on
> it. If self-heal is running shouldn't the numbers be decreasing?
>
> It should heal the entries, and the number of entries reported by the
> "gluster v heal volname info" command should be decreasing.
>
> It appears to me self-heal is not working properly, so how do I get it to
> start working, or should I delete the volume and start over?
>
> Since you can access all the files from the mount point, I think the volume
> and the files are in a good state as of now.
> I don't think you should consider deleting your volume before trying
> to fix it.
> If there is no fix, or the fix is taking too long, you can go ahead with
> that option.
>
> -----------------------
> Why are all these options off?
>
> performance.quick-read: off
> performance.parallel-readdir: off
> performance.readdir-ahead: off
> performance.write-behind: off
> performance.read-ahead: off
>
> Although this should not matter for your issue, I think you should
> enable all of the above unless you have a reason not to.
> --------------------
>
> I would like you to perform the following steps and provide some more
> information (a rough sketch of the corresponding commands follows after
> step 5) -
>
> 1 - Try to restart self-heal and see if that works.
> "gluster v start <volname> force" will kill and restart the self-heal
> processes.
>
> 2 - If step 1 is not fruitful, get the list of entries that need to be
> healed and pick one entry to heal. I mean we should focus on
> one entry to find out why it is
> not getting healed instead of on all the 5900 entries. Let's call it entry1.
>
> 3 - Now access entry1 from the mount point, read from and write to it, and
> see if this entry has been healed. Check the heal info. Accessing a file
> from the mount point triggers client-side heal,
> which could also heal the file.
>
> 4 - Check the logs in /var/log/glusterfs; the mount logs and glustershd
> logs should be checked and provided.
>
> 5 - Get the extended attributes of entry1 from all the bricks.
>
> If the path of entry1 on the mount point is /a/b/c/entry1, then you
> have to run the following command on all the nodes -
>
> getfattr -m. -d -e hex <path of the brick on the node>/a/b/c/entry1
>
> Please provide the output of the above command too.
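>
> To make the steps concrete, here is a rough sketch of the commands for your
> projects volume. The mount point /mnt/projects and the path /a/b/c/entry1
> are placeholders, and the log file names assume a default install - adjust
> them to your setup:
>
> # step 1: kill and restart the self-heal daemons (bricks stay online)
> gluster volume start projects force
> # optionally trigger an index heal explicitly
> gluster volume heal projects
>
> # step 2/3: pick one pending entry and access it from the mount point
> gluster volume heal projects info | head -20
> stat /mnt/projects/a/b/c/entry1
>
> # step 4: logs to check (the mount log is named after the mount point)
> less /var/log/glusterfs/glustershd.log
> less /var/log/glusterfs/mnt-projects.log
>
> # step 5: extended attributes of entry1, run against the brick path on every node
> getfattr -m. -d -e hex /srv/gfs01/Projects/a/b/c/entry1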
>
> ---
> Ashish
>
> ------------------------------------------------------------------------
> *From: *"Brett Holcomb" <biholcomb at l1049h.com>
> *To: *gluster-users at gluster.org
> *Sent: *Friday, December 28, 2018 3:49:50 AM
> *Subject: *Re: [Gluster-users] Self Heal Confusion
>
> Resend as I did not reply to the list earlier. TBird responded to the
> poster and not the list.
>
> On 12/27/18 11:46 AM, Brett Holcomb wrote:
>
> Thank you, I appreciate the help. Here is the information. Let me
> know if you need anything else. I'm fairly new to Gluster.
>
> Gluster version is 5.2
>
> 1. gluster v info
>
> Volume Name: projects
> Type: Distributed-Replicate
> Volume ID: 5aac71aa-feaa-44e9-a4f9-cb4dd6e0fdc3
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x 3 = 6
> Transport-type: tcp
> Bricks:
> Brick1: gfssrv1:/srv/gfs01/Projects
> Brick2: gfssrv2:/srv/gfs01/Projects
> Brick3: gfssrv3:/srv/gfs01/Projects
> Brick4: gfssrv4:/srv/gfs01/Projects
> Brick5: gfssrv5:/srv/gfs01/Projects
> Brick6: gfssrv6:/srv/gfs01/Projects
> Options Reconfigured:
> cluster.self-heal-daemon: enable
> performance.quick-read: off
> performance.parallel-readdir: off
> performance.readdir-ahead: off
> performance.write-behind: off
> performance.read-ahead: off
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> server.allow-insecure: on
> storage.build-pgfid: on
> changelog.changelog: on
> changelog.capture-del-path: on
>
> 2. gluster v status
>
> Status of volume: projects
> Gluster process                                 TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick gfssrv1:/srv/gfs01/Projects               49154     0          Y       7213
> Brick gfssrv2:/srv/gfs01/Projects               49154     0          Y       6932
> Brick gfssrv3:/srv/gfs01/Projects               49154     0          Y       6920
> Brick gfssrv4:/srv/gfs01/Projects               49154     0          Y       6732
> Brick gfssrv5:/srv/gfs01/Projects               49154     0          Y       6950
> Brick gfssrv6:/srv/gfs01/Projects               49154     0          Y       6879
> Self-heal Daemon on localhost                   N/A       N/A        Y       11484
> Self-heal Daemon on gfssrv2                     N/A       N/A        Y       10366
> Self-heal Daemon on gfssrv4                     N/A       N/A        Y       9872
> Self-heal Daemon on srv-1-gfs3.corp.l1049h.net  N/A       N/A        Y       9892
> Self-heal Daemon on gfssrv6                     N/A       N/A        Y       10372
> Self-heal Daemon on gfssrv5                     N/A       N/A        Y       10761
>
> Task Status of Volume projects
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> 3. I've given the summary since the actual list for two volumes is
> around 5900 entries.
>
> Brick gfssrv1:/srv/gfs01/Projects
> Status: Connected
> Total Number of entries: 85
> Number of entries in heal pending: 85
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick gfssrv2:/srv/gfs01/Projects
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick gfssrv3:/srv/gfs01/Projects
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick gfssrv4:/srv/gfs01/Projects
> Status: Connected
> Total Number of entries: 0
> Number of entries in heal pending: 0
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick gfssrv5:/srv/gfs01/Projects
> Status: Connected
> Total Number of entries: 58854
> Number of entries in heal pending: 58854
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> Brick gfssrv6:/srv/gfs01/Projects
> Status: Connected
> Total Number of entries: 58854
> Number of entries in heal pending: 58854
> Number of entries in split-brain: 0
> Number of entries possibly healing: 0
>
> On 12/27/18 3:09 AM, Ashish Pandey wrote:
>
> Hi Brett,
>
> Could you please tell us more about the setup?
>
> 1 - Gluster v info
> 2 - gluster v status
> 3 - gluster v heal <volname> info
>
> This is the very basic information needed to start debugging
> or suggesting any workaround.
> It should always be included when asking such questions on the
> mailing list so that people can reply sooner.
>
>
> Note: Please hide IP addresses/hostnames or any other information
> you don't want the world to see.
>
> ---
> Ashish
>
> ------------------------------------------------------------------------
> *From: *"Brett Holcomb" <biholcomb at l1049h.com>
> *To: *gluster-users at gluster.org
> *Sent: *Thursday, December 27, 2018 12:19:15 AM
> *Subject: *Re: [Gluster-users] Self Heal Confusion
>
> Still no change in the heals pending. I found this reference,
> https://archive.fosdem.org/2017/schedule/event/glusterselinux/attachments/slides/1876/export/events/attachments/glusterselinux/slides/1876/fosdem.pdf,
> which mentions the default SELinux context for a brick and says
> that internal operations such as self-heal and rebalance should
> be ignored, but it does not elaborate on what "ignored" means -
> is it just not doing self-heal, or something else?
>
> I did set SELinux to permissive and nothing changed. I'll try
> setting the bricks to the context mentioned in this pdf and
> see what happens.
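>
> If I'm reading the slides right, the change would be something along these
> lines (I'm assuming glusterd_brick_t is the intended type and that this is
> the right path pattern for my bricks - I'll verify before running it):
>
> # label the brick directories with the Gluster brick SELinux context
> semanage fcontext -a -t glusterd_brick_t "/srv/gfs01/Projects(/.*)?"
> restorecon -Rv /srv/gfs01/Projects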
>
>
> On 12/20/18 8:26 PM, John Strunk wrote:
>
> Assuming your bricks are up... yes, the heal count should
> be decreasing.
>
> There is/was a bug wherein self-heal would stop healing
> but would still be running. I don't know whether your
> version is affected, but the remedy is to just restart the
> self-heal daemon.
> Force start one of the volumes that has heals pending. The
> bricks are already running, but it will cause shd to
> restart and, assuming this is the problem, healing should
> begin...
>
> $ gluster vol start my-pending-heal-vol force
>
> Others could better comment on the status of the bug.
>
> -John
>
>
> On Thu, Dec 20, 2018 at 5:45 PM Brett Holcomb
> <biholcomb at l1049h.com <mailto:biholcomb at l1049h.com>> wrote:
>
> I have one volume that has 85 entries pending healing and two more
> volumes with 58,854 entries pending healing. These numbers are from
> the volume heal info summary command. They have stayed constant for two
> days now. I've read the Gluster docs and many more. The Gluster docs
> just give some commands, and non-Gluster docs basically repeat that.
> Given that it appears no self-healing is going on for my volume, I am
> confused as to why.
>
> 1. If a self-heal daemon is listed on a host (all of
> mine show one with
> a volume status command), can I assume it's enabled and
> running?
>
> 2. I assume the volume that has all the self-heals
> pending has some
> serious issues even though I can access the files and
> directories on
> it. If self-heal is running shouldn't the numbers be
> decreasing?
>
> It appears to me self-heal is not working properly, so
> how do I get it to
> start working, or should I delete the volume and start
> over?
>
> I'm running gluster 5.2 on Centos 7 latest and updated.
>
> Thank you.
>
>