[Gluster-users] gluster 5.11 issue: Endless performing entry selfheal / Completed entry selfheal on the same pending heal files after server reboot + very high load

Artem Russakovskii archon810 at gmail.com
Wed May 20 06:09:12 UTC 2020


>
> @Artem, what is the average size of the files for your apaches?
>

The average size is probably 15-20MB, but some files are as large as 100MB+.
They are a combination of Android APK files and image files.

Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | @ArtemR <http://twitter.com/ArtemR>


On Fri, May 15, 2020 at 11:20 PM Strahil Nikolov <hunter86_bg at yahoo.com>
wrote:

> On May 16, 2020 1:14:17 AM GMT+03:00, Artem Russakovskii <
> archon810 at gmail.com> wrote:
> >Hi Hari,
> >
> >Thanks for the analysis.
> >
> >   1. I understand why the rebooted node would have 0 heal state files
> >   whereas the other nodes would show a non-zero count. The problem with
> >   5.11 was that there was a bug that prevented the heal from completing,
> >   which as I mentioned was fixed in 5.13.
> >
> >   2. If the number of files to heal is known, why are operations like
> >   md5sum needed to perform the heal at all? Why doesn't the auto heal
> >   agent just go through the list one file at a time and perform the
> >   heal, then mark the files as healed? From what I've seen, even
> >   requesting a manual heal didn't do anything until those files were
> >   touched in some way (like md5sum).
> >
> >   3. cscope? I am unable to find what you're talking about -
> >   http://cscope.sourceforge.net/ seems to be a code search tool?
> >
> >   4. What's the best way to analyze and present the data about what's
> >   raising the load to 100+ on all nodes after reboots? If there's some
> >   monitoring tool I could run, reproduce the issue, then save the
> >   output and send it here, that'd be great.
> >
> >   5. Based on what I've seen when I straced apache processes, they
> >   would all hang for a long time when they ran across some of the
> >   gluster data. Specifically, one of the directories (with only several
> >   files, nothing special), which may be queried in some way by a lot of
> >   page loads (for context, we have 2 busy WordPress sites), would come
> >   up a lot in straces and hang. I even tried moving this directory out
> >   of gluster and adding a symlink, but I'm not sure that helped. I
> >   wonder if multiple conditions cause frequent read access to a certain
> >   resource in gluster to spike out of control and go haywire.
> >   Here, the gluster root directory is SITE/public/wp-content/uploads,
> >   and the dirs I saw the most are symlinked as follows:
> >   lrwxrwxrwx   1 archon810 users       73 Apr 26 15:47
> >   wp-security-audit-log -> SITE/public/wp-content/wp-security-audit-log/
> >   (belongs to https://wordpress.org/plugins/wp-security-audit-log/, but
> >   the Pro version)
> >
> >   A sample strace log of requests to this dir:
> >   [2020-05-15 15:10:15 PDT] stat("SITE/public/wp-content/uploads/wp-security-audit-log", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> >   [2020-05-15 15:10:15 PDT] access("SITE/public/wp-content/uploads/wp-security-audit-log", R_OK) = 0
> >   [2020-05-15 15:10:15 PDT] access("SITE/public/wp-content/uploads/wp-security-audit-log/custom-alerts.php", F_OK) = -1 ENOENT (No such file or directory)
> >   [2020-05-15 15:10:15 PDT] stat("SITE/public/wp-content/uploads/wp-security-audit-log/custom-sensors/", 0x7ffeff14c4d0) = -1 ENOENT (No such file or directory)
> >   [2020-05-15 15:10:18 PDT] stat("SITE/public/wp-content/uploads/wp-security-audit-log", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> >   [2020-05-15 15:10:18 PDT] access("SITE/public/wp-content/uploads/wp-security-audit-log", R_OK) = 0
> >   [2020-05-15 15:10:18 PDT] access("SITE/public/wp-content/uploads/wp-security-audit-log/custom-alerts.php", F_OK) = -1 ENOENT (No such file or directory)
> >   [2020-05-15 15:10:18 PDT] stat("SITE/public/wp-content/uploads/wp-security-audit-log/custom-sensors/", 0x7ffeff14c4d0) = -1 ENOENT (No such file or directory)
> >   [2020-05-15 15:10:19 PDT] stat("SITE/public/wp-content/uploads/wp-security-audit-log", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> >   [2020-05-15 15:10:19 PDT] access("SITE/public/wp-content/uploads/wp-security-audit-log", R_OK) = 0
> >   [2020-05-15 15:10:19 PDT] access("SITE/public/wp-content/uploads/wp-security-audit-log/custom-alerts.php", F_OK) = -1 ENOENT (No such file or directory)
> >   [2020-05-15 15:10:19 PDT] stat("SITE/public/wp-content/uploads/wp-security-audit-log/custom-sensors/", 0x7ffeff14c4d0) = -1 ENOENT (No such file or directory)
> >   [2020-05-15 15:10:21 PDT] stat("SITE/public/wp-content/uploads/wp-security-audit-log", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> >   [2020-05-15 15:10:21 PDT] access("SITE/public/wp-content/uploads/wp-security-audit-log", R_OK) = 0
> >   [2020-05-15 15:10:21 PDT] access("SITE/public/wp-content/uploads/wp-security-audit-log/custom-alerts.php", F_OK) = -1 ENOENT (No such file or directory)
> >   [2020-05-15 15:10:21 PDT] stat("SITE/public/wp-content/uploads/wp-security-audit-log/custom-sensors/", 0x7ffeff14c4d0) = -1 ENOENT (No such file or directory)
> >   [2020-05-15 15:10:23 PDT] stat("SITE/public/wp-content/uploads/wp-security-audit-log", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> >   [2020-05-15 15:10:23 PDT] access("SITE/public/wp-content/uploads/wp-security-audit-log", R_OK) = 0
> >   [2020-05-15 15:10:23 PDT] access("SITE/public/wp-content/uploads/wp-security-audit-log/custom-alerts.php", F_OK) = -1 ENOENT (No such file or directory)
> >   [2020-05-15 15:10:23 PDT] stat("SITE/public/wp-content/uploads/wp-security-audit-log/custom-sensors/", 0x7ffeff14c4d0) = -1 ENOENT (No such file or directory)
> >   [2020-05-15 15:10:25 PDT] stat("SITE/public/wp-content/uploads/wp-security-audit-log", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> >   [2020-05-15 15:10:25 PDT] access("SITE/public/wp-content/uploads/wp-security-audit-log", R_OK) = 0
> >   [2020-05-15 15:10:25 PDT] access("SITE/public/wp-content/uploads/wp-security-audit-log/custom-alerts.php", F_OK) = -1 ENOENT (No such file or directory)
> >   [2020-05-15 15:10:25 PDT] stat("SITE/public/wp-content/uploads/wp-security-audit-log/custom-sensors/", 0x7ffeff14c4d0) = -1 ENOENT (No such file or directory)
> >   [2020-05-15 15:10:27 PDT] stat("SITE/public/wp-content/uploads/wp-security-audit-log", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
> >   [2020-05-15 15:10:27 PDT] access("SITE/public/wp-content/uploads/wp-security-audit-log", R_OK) = 0
> >   [2020-05-15 15:10:27 PDT] access("SITE/public/wp-content/uploads/wp-security-audit-log/custom-alerts.php", F_OK) = -1 ENOENT (No such file or directory)
> >   [2020-05-15 15:10:27 PDT] stat("SITE/public/wp-content/uploads/wp-security-audit-log/custom-sensors/", 0x7ffeff14c4d0) = -1 ENOENT (No such file or directory)
> >
> >The load spikes and everything hanging is seriously stressing me out
> >because at any point it can cause an outage for us.
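> >
> >For reference, a minimal sketch for capturing what drives the load and
> >where apache hangs (the volume name is the one from this thread; the
> >mount point, output paths and apache PID are placeholders):
> >
> >   # Let gluster report per-brick FOP latencies while the spike is
> >   # reproduced:
> >   gluster volume profile SNIP_data1 start
> >   # ... reproduce the reboot / load spike ...
> >   gluster volume profile SNIP_data1 info > /tmp/gluster-profile.txt
> >   top -b -n 1 > /tmp/top-snapshot.txt
> >
> >   # Time the same stat()/access() calls apache makes, against the FUSE
> >   # mount (path hypothetical):
> >   for i in $(seq 1 20); do
> >       /usr/bin/time -f '%e s' stat \
> >           /mnt/SNIP_data1/wp-security-audit-log > /dev/null
> >   done
> >
> >   # Or attach to a hung apache worker; -T prints time spent per syscall:
> >   # strace -f -T -e trace=stat,access,openat -p <apache_pid>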
> >
> >Sincerely,
> >Artem
> >
> >--
> >Founder, Android Police <http://www.androidpolice.com>, APK Mirror
> ><http://www.apkmirror.com/>, Illogical Robot LLC
> >beerpla.net | @ArtemR <http://twitter.com/ArtemR>
> >
> >
> >On Thu, May 7, 2020 at 11:32 PM Hari Gowtham <hgowtham at redhat.com>
> >wrote:
> >
> >> The heal info is working fine. Here is an explanation of what's
> >> happening: when a node goes down, changes can no longer be applied to
> >> it. The other nodes, which stayed up, receive the changes and keep
> >> track of which files were changed (note: these changes have not been
> >> reflected on the node that was down). Once the down node comes back
> >> up, it doesn't know what happened while it was down, but the other
> >> nodes know that a few changes didn't make it to the rebooted node. So
> >> the rebooted node is blamed by the other nodes, and this is what heal
> >> info shows. Since the rebooted node has no change that landed on it
> >> alone, it reports 0 files to be healed, while the other nodes, which
> >> hold the data, list the files that need to heal. This is the expected
> >> behaviour.
> >> So as per the rebooted node, heal info is working fine.
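> >>
> >> For example, the blame can be seen directly on a node that stayed up
> >> by reading the AFR changelog xattrs on the brick (the brick path is
> >> the one from the volume info; the file path is illustrative):
> >>
> >>    getfattr -d -m 'trusted.afr.' -e hex \
> >>        /mnt/SNIP_block1/SNIP_data1/path/to/blamed/file
> >>
> >> A non-zero trusted.afr.SNIP_data1-client-N value means the brick with
> >> index N (here, the rebooted one) is still blamed and needs the heal,
> >> which matches what heal info reports.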
> >>
> >> About healing the file itself:
> >> Doing an operation on a file triggers a client-side heal, as per the
> >> design; that's the reason these files are getting corrected after the
> >> md5sum (I hope this is done from the client side, not on the backend
> >> itself). So this is expected.
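> >>
> >> As a workaround, that client-side trigger can be scripted instead of
> >> running md5sum by hand; a rough sketch (volume name from this thread,
> >> client mount point assumed, gfid-only entries skipped):
> >>
> >>    VOL=SNIP_data1
> >>    MNT=/mnt/SNIP_data1            # hypothetical FUSE mount point
> >>    gluster volume heal "$VOL" info | grep '^/' | sort -u |
> >>    while read -r entry; do
> >>        # a lookup/read through the client is enough to kick the heal
> >>        stat "$MNT$entry" > /dev/null 2>&1
> >>    done
> >>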
> >> About the heals not happening for a long time, there can be some
> >> issue there.
> >> @Karthik Subrahmanya <ksubrahm at redhat.com> is the better person to
> >> help you with this.
> >>
> >> About the CPU usage going higher:
> >> We need info about what is consuming more CPU. Glusterd needs to do a
> >> bit of handshaking and reconnecting after a reboot, and a little data
> >> is transferred during this; a larger number of nodes can contribute to
> >> the hike. Similarly, if a heal is happening, it can increase the usage.
> >> So we need to know what is consuming the CPU to tell whether it's
> >> expected or not. If the hike is expected, you can try using cscope to
> >> restrict the CPU usage of that particular process.
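> >>
> >> Assuming "cscope" here means cgroups / CPU limiting rather than the
> >> cscope code-search tool, a minimal cgroup-v1 sketch to cap the
> >> self-heal daemon at roughly one core would look like this (paths are
> >> the standard cgroup-v1 cpu controller; the quota value is
> >> illustrative):
> >>
> >>    mkdir -p /sys/fs/cgroup/cpu/glustershd
> >>    echo 100000 > /sys/fs/cgroup/cpu/glustershd/cpu.cfs_period_us
> >>    echo 100000 > /sys/fs/cgroup/cpu/glustershd/cpu.cfs_quota_us  # ~1 CPU
> >>    # move the self-heal daemon into the group
> >>    pgrep -f glustershd | while read -r pid; do
> >>        echo "$pid" > /sys/fs/cgroup/cpu/glustershd/cgroup.procs
> >>    done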
> >>
> >> On Tue, Apr 28, 2020 at 3:02 AM Artem Russakovskii
> ><archon810 at gmail.com>
> >> wrote:
> >>
> >>> Good news, after upgrading to 5.13 and running this scenario again,
> >>> the self heal actually succeeded without my intervention following a
> >>> server reboot.
> >>>
> >>> The load was still high during this process, but at least the endless
> >>> heal issue is resolved.
> >>>
> >>> I'd still love to hear from the team on managing heal load spikes.
> >>>
> >>> Sincerely,
> >>> Artem
> >>>
> >>> --
> >>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
> >>> <http://www.apkmirror.com/>, Illogical Robot LLC
> >>> beerpla.net | @ArtemR <http://twitter.com/ArtemR>
> >>>
> >>>
> >>> On Sun, Apr 26, 2020 at 3:13 PM Artem Russakovskii
> ><archon810 at gmail.com>
> >>> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I've been observing this problem for a long time now and it's time
> >>>> to finally figure out what's going on.
> >>>>
> >>>> We're running gluster 5.11 and have a 10TB 1 x 4 = 4 replicate
> >>>> volume. I'll include its slightly redacted config below.
> >>>>
> >>>> When I reboot one of the servers and it goes offline for a bit, when
> >>>> it comes back, heal info tells me there are some files and dirs that
> >>>> are "heal pending". 0 "split-brain" and "possibly healing" - only
> >>>> "heal pending" are >0.
> >>>>
> >>>>    1. For some reason, the server that was rebooted shows "heal
> >>>>    pending" 0. All other servers show "heal pending" with some
> >>>>    number, say 65.
> >>>>    2. We have cluster.self-heal-daemon enabled.
> >>>>    3. The logs are full of "performing entry selfheal" and
> >>>>    "completed entry selfheal" messages that continue to print
> >>>>    endlessly.
> >>>>    4. This "heal pending" number never goes down by itself, but it
> >>>>    does if I run some operation on it, like md5sum.
> >>>>    5. When the server goes down for reboot, and especially when it
> >>>>    comes back, the load on ALL servers shoots up through the roof
> >>>>    (load of 100+) and ends up bringing everything down, including
> >>>>    apache and nginx. My theory is that self-heal kicks in so hard
> >>>>    that it kills IO on these attached Linode block devices. However,
> >>>>    after some time - say 10 minutes - the load subsides, but the
> >>>>    "heal pending" remains and the gluster logs continue to output
> >>>>    "performing entry selfheal" and "completed entry selfheal"
> >>>>    messages. This load spike has become a huge issue for us because
> >>>>    it brings down the whole site for entire minutes.
> >>>>    6. At this point in my investigation, I noticed that the selfheal
> >>>>    messages actually repeat for the same gfids over and over:
> >>>>    [2020-04-26 21:32:29.877987] I [MSGID: 108026] [afr-self-heal-entry.c:897:afr_selfheal_entry_do] 0-SNIP_data1-replicate-0: performing entry selfheal on 96d282cf-402f-455c-9add-5f03c088a1bc
> >>>>    [2020-04-26 21:32:29.901246] I [MSGID: 108026] [afr-self-heal-common.c:1729:afr_log_selfheal] 0-SNIP_data1-replicate-0: Completed entry selfheal on 96d282cf-402f-455c-9add-5f03c088a1bc. sources= sinks=0 1 2
> >>>>    [2020-04-26 21:32:32.171959] I [MSGID: 108026] [afr-self-heal-entry.c:897:afr_selfheal_entry_do] 0-SNIP_data1-replicate-0: performing entry selfheal on 96d282cf-402f-455c-9add-5f03c088a1bc
> >>>>    [2020-04-26 21:32:32.225828] I [MSGID: 108026] [afr-self-heal-common.c:1729:afr_log_selfheal] 0-SNIP_data1-replicate-0: Completed entry selfheal on 96d282cf-402f-455c-9add-5f03c088a1bc. sources= sinks=0 1 2
> >>>>    [2020-04-26 21:32:33.346990] I [MSGID: 108026] [afr-self-heal-entry.c:897:afr_selfheal_entry_do] 0-SNIP_data1-replicate-0: performing entry selfheal on 96d282cf-402f-455c-9add-5f03c088a1bc
> >>>>    [2020-04-26 21:32:33.374413] I [MSGID: 108026] [afr-self-heal-common.c:1729:afr_log_selfheal] 0-SNIP_data1-replicate-0: Completed entry selfheal on 96d282cf-402f-455c-9add-5f03c088a1bc. sources= sinks=0 1 2
> >>>>    7. I used gfid-resolver.sh from https://gist.github.com/4392640.git
> >>>>    to resolve this gfid to the real location, and yup - it was one
> >>>>    of the files (a dir, actually) listed as "heal pending" in heal
> >>>>    info. As soon as I ran md5sum on the file inside (which was also
> >>>>    listed in "heal pending"), the log messages stopped repeating for
> >>>>    this entry and it disappeared from the "heal pending" heal info.
> >>>>    These were the final log lines:
> >>>>    [2020-04-26 21:32:35.642662] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-SNIP_data1-replicate-0: performing metadata selfheal on 96d282cf-402f-455c-9add-5f03c088a1bc
> >>>>    [2020-04-26 21:32:35.658714] I [MSGID: 108026] [afr-self-heal-common.c:1729:afr_log_selfheal] 0-SNIP_data1-replicate-0: Completed metadata selfheal on 96d282cf-402f-455c-9add-5f03c088a1bc. sources=0 [1] 2  sinks=3
> >>>>    [2020-04-26 21:32:35.686509] I [MSGID: 108026] [afr-self-heal-entry.c:897:afr_selfheal_entry_do] 0-SNIP_data1-replicate-0: performing entry selfheal on 96d282cf-402f-455c-9add-5f03c088a1bc
> >>>>    [2020-04-26 21:32:35.720387] I [MSGID: 108026] [afr-self-heal-common.c:1729:afr_log_selfheal] 0-SNIP_data1-replicate-0: Completed entry selfheal on 96d282cf-402f-455c-9add-5f03c088a1bc. sources=0 [1] 2  sinks=3
> >>>>
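> >>>> For reference, what gfid-resolver.sh does can also be reproduced by
> >>>> hand on any brick (the brick path is from the volume info below; the
> >>>> gfid is the one from the log above):
> >>>>
> >>>>    BRICK=/mnt/SNIP_block1/SNIP_data1
> >>>>    GFID=96d282cf-402f-455c-9add-5f03c088a1bc
> >>>>    G="$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
> >>>>    if [ -d "$G" ]; then
> >>>>        # directories are stored as symlinks under .glusterfs
> >>>>        readlink -f "$G"
> >>>>    else
> >>>>        # regular files are hardlinks: find the real path by inode
> >>>>        find "$BRICK" -path "$BRICK/.glusterfs" -prune -o \
> >>>>            -samefile "$G" -print
> >>>>    fi
> >>>>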
> >>>> I have to repeat this song and dance every time I reboot servers
> >>>> and run md5sum on each "heal pending" file, or else the messages
> >>>> will continue, presumably indefinitely. In the meantime, the files
> >>>> seem to be fine when accessed.
> >>>>
> >>>> What I don't understand is:
> >>>>
> >>>>    1. Why doesn't gluster just heal them properly instead of
> >>>>    getting stuck? Or maybe this was fixed in v6 or v7, which I
> >>>>    haven't upgraded to due to waiting for another unrelated issue to
> >>>>    be fixed?
> >>>>    2. Why does heal info show 0 "heal pending" files on the server
> >>>>    that was rebooted, but all other servers show the same number of
> >>>>    "heal pending" entries >0?
> >>>>    3. Why are there these insane load spikes upon going down and
> >>>>    especially coming back online? Is it related to the issue here?
> >>>>    I'm pretty sure that it didn't happen in previous versions of
> >>>>    gluster, when this issue didn't manifest - I could easily bring
> >>>>    down one of the servers without it creating havoc when it comes
> >>>>    back online.
> >>>>
> >>>> Here's the volume info:
> >>>>
> >>>> Volume Name: SNIP_data1
> >>>> Type: Replicate
> >>>> Volume ID: 11ecee7e-d4f8-497a-9994-ceb144d6841e
> >>>> Status: Started
> >>>> Snapshot Count: 0
> >>>> Number of Bricks: 1 x 4 = 4
> >>>> Transport-type: tcp
> >>>> Bricks:
> >>>> Brick1: SNIP:/mnt/SNIP_block1/SNIP_data1
> >>>> Brick2: SNIP:/mnt/SNIP_block1/SNIP_data1
> >>>> Brick3: SNIP:/mnt/SNIP_block1/SNIP_data1
> >>>> Brick4: SNIP:/mnt/SNIP_block1/SNIP_data1
> >>>> Options Reconfigured:
> >>>> performance.client-io-threads: on
> >>>> nfs.disable: on
> >>>> transport.address-family: inet
> >>>> cluster.self-heal-daemon: enable
> >>>> performance.cache-size: 1GB
> >>>> cluster.lookup-optimize: on
> >>>> performance.read-ahead: off
> >>>> client.event-threads: 4
> >>>> server.event-threads: 4
> >>>> performance.io-thread-count: 32
> >>>> cluster.readdir-optimize: on
> >>>> features.cache-invalidation: on
> >>>> features.cache-invalidation-timeout: 600
> >>>> performance.stat-prefetch: on
> >>>> performance.cache-invalidation: on
> >>>> performance.md-cache-timeout: 600
> >>>> network.inode-lru-limit: 500000
> >>>> performance.parallel-readdir: on
> >>>> performance.readdir-ahead: on
> >>>> performance.rda-cache-limit: 256MB
> >>>> network.remote-dio: enable
> >>>> network.ping-timeout: 5
> >>>> cluster.quorum-type: fixed
> >>>> cluster.quorum-count: 1
> >>>> cluster.granular-entry-heal: enable
> >>>> cluster.data-self-heal-algorithm: full
> >>>>
> >>>>
> >>>> Appreciate any insight. Thank you.
> >>>>
> >>>> Sincerely,
> >>>> Artem
> >>>>
> >>>> --
> >>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
> >>>> <http://www.apkmirror.com/>, Illogical Robot LLC
> >>>> beerpla.net | @ArtemR <http://twitter.com/ArtemR>
> >>>>
> >>
> >>
> >> --
> >> Regards,
> >> Hari Gowtham.
> >>
>
> Hi Hari,
>
> I can confirm that I have observed what Artem has described on v7.X.
> When one of my Gluster nodes was down for some time (more data to heal)
> and was powered up, the node was barely responsive over ssh (a separate
> network from the gluster one) and the system was seriously loaded until
> the heal was over.
> I'm using the defaults for the healing options.
> Does the healing process require some checksumming on the blamed entries?
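>
> For what it's worth, whether checksumming happens depends on
> cluster.data-self-heal-algorithm: "diff" checksums blocks on source and
> sink and copies only the mismatching ones, while "full" - which this
> volume uses per the config above - copies the whole file without
> checksumming. The current value can be checked with:
>
>    gluster volume get SNIP_data1 cluster.data-self-heal-algorithm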
>
>
> @Artem, Gluster has 2 kinds of healing:
> A) FUSE clients can trigger the heal of a file which is not the same on
> all bricks. This is why md5sum causes a heal. Usually even a simple
> 'stat' will trigger it, but I have noticed that Gluster with sharding
> might require reading the file at an offset that matches the shard for
> this type of heal to work.
>
> B) There is a heal daemon that runs every 15 min (or thereabouts) which
> crawls over the blamed entries and triggers healing.
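>
> Given the load issue in this thread, that daemon can also be throttled
> so the post-reboot crawl hits the bricks less hard. A rough sketch using
> gluster's shd tunables (volume name from the thread; values are
> illustrative and the defaults depend on the version):
>
>    gluster volume heal SNIP_data1 statistics heal-count   # what's left
>    gluster volume set SNIP_data1 cluster.shd-max-threads 1
>    gluster volume set SNIP_data1 cluster.self-heal-window-size 1
>    gluster volume heal SNIP_data1    # kick an index crawl when load allows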
>
> Also, as far as I know, each file that is being healed is also locked
> for the duration of the heal. That is the reason why oVirt uses
> sharding: instead of the whole disk being locked, only a small piece (a
> shard) is locked until it is healed.
>
> @Artem, what is the average size of the files for your apaches?
>
> Best Regards,
> Strahil Nikolov
>
>