[Gluster-users] 3.8.3 Shards Healing Glacier Slow

David Gossage dgossage at carouselchecks.com
Wed Aug 31 09:04:03 UTC 2016


On Wed, Aug 31, 2016 at 3:50 AM, Krutika Dhananjay <kdhananj at redhat.com>
wrote:

> No, sorry, it's working fine. I may have missed some step earlier, which is
> why I saw that problem. /.shard is also healing fine now.
>
> Let me know if it works for you.
>
> -Krutika
>
> On Wed, Aug 31, 2016 at 12:49 PM, Krutika Dhananjay <kdhananj at redhat.com>
> wrote:
>
>> OK I just hit the other issue too, where .shard doesn't get healed. :)
>>
>> Investigating as to why that is the case. Give me some time.
>>
>> -Krutika
>>
>> On Wed, Aug 31, 2016 at 12:39 PM, Krutika Dhananjay <kdhananj at redhat.com>
>> wrote:
>>
>>> Just figured out that the steps Anuradha provided won't work if granular
>>> entry heal is on.
>>> So when you bring down a brick and create fake2 under / of the volume,
>>> the granular entry heal feature causes the self-heal daemon to remember
>>> only the fact that 'fake2' needs to be recreated on the offline brick
>>> (because changelogs are granular).
>>>
>>> In this case, we would be required to indicate to self-heal-daemon that
>>> the entire directory tree from '/' needs to be repaired on the brick that
>>> contains no data.
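>>>
>>> (To confirm which case applies before choosing a procedure -- a quick check,
>>> assuming the 'volume get' CLI available on 3.8:)
>>>
>>> [root at server-1 ~]# gluster volume get rep cluster.granular-entry-heal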
>>>
>>> To fix this, I did the following (for users who use granular entry
>>> self-healing):
>>>
>>> 1. Kill the last brick process in the replica (/bricks/3)
>>>
>>> 2. [root at server-3 ~]# rm -rf /bricks/3
>>>
>>> 3. [root at server-3 ~]# mkdir /bricks/3
>>>
>>> 4. Create a new dir on the mount point:
>>>     [root at client-1 ~]# mkdir /mnt/fake
>>>
>>> 5. Set some fake xattr on the root of the volume (not on the 'fake'
>>> directory itself).
>>>     [root at client-1 ~]# setfattr -n "user.some-name" -v "some-value" /mnt
>>>
>>> 6. Make sure there's no I/O happening on your volume.
>>>
>>
I'll test this on dev today.  But in my case, in production, this means I'll
need to shut down every VM after work for this heal?  Will the fact that I have
6k files already listed as needing heal affect anything?


>
>>> 7. Check the pending xattrs on the brick directories of the two good
>>> copies (on bricks 1 and 2); you should see the same value for the
>>> trusted.afr.<VOLNAME>-client-2 key on both bricks.
>>> (note that the client-<num> xattr key will have the same last digit as
>>> the index of the brick that is down, when counting from 0. So if the first
>>> brick is the one that is down, it would read trusted.afr.*-client-0; if the
>>> second brick is the one that is empty and down, it would read
>>> trusted.afr.*-client-1 and so on).
>>>
>>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1
>>> # file: 1
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.rep-client-2=0x000000000000000100000001
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>
>>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2
>>> # file: 2
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.rep-client-2=0x000000000000000100000001
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>
>>> 8. Flip the 8th digit in the trusted.afr.<VOLNAME>-client-2 value to a 1.
>>> (The 24 hex digits are three 32-bit counters -- data, metadata and entry
>>> pending counts -- so this marks a pending data heal towards client-2.)
>>>
>>> [root at server-1 ~]# setfattr -n trusted.afr.rep-client-2 -v
>>> 0x000000010000000100000001 /bricks/1
>>> [root at server-2 ~]# setfattr -n trusted.afr.rep-client-2 -v
>>> 0x000000010000000100000001 /bricks/2
>>>
>>> 9. Get the xattrs again and check that they are set properly now:
>>>
>>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1
>>> # file: 1
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.rep-client-2=0x000000010000000100000001
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>
>>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2
>>> # file: 2
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.rep-client-2=0x000000010000000100000001
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>
>>> 10. Force-start the volume.
>>>
>>> [root at server-1 ~]# gluster volume start rep force
>>> volume start: rep: success
>>>
>>> 11. Monitor the heal-info command to ensure the number of entries keeps
>>> growing.
>>>
>>> 12. Keep monitoring as in step 11, and eventually the number of entries
>>> needing heal should come down to 0.
>>> Also the checksums of the files on the previously empty brick should now
>>> match with the copies on the other two bricks.
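>>>
>>> (A rough sketch of steps 11-12 -- 'heal-count' should be available on 3.8,
>>> and 'somefile' below is just a placeholder for whatever file you pick to
>>> spot-check once the count reaches 0:)
>>>
>>> [root at server-1 ~]# watch -n 60 'gluster volume heal rep info | grep "Number of entries"'
>>> [root at server-1 ~]# gluster volume heal rep statistics heal-count
>>> [root at server-1 ~]# md5sum /bricks/1/somefile    # compare with /bricks/2 and /bricks/3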
>>>
>>> Could you check if the above steps work for you, in your test
>>> environment?
>>>
>>> You caught a nice bug in the manual steps to follow when granular
>>> entry-heal is enabled and an empty brick needs heal. Thanks for reporting
>>> it. :) We will fix the documentation appropriately.
>>>
>>> -Krutika
>>>
>>>
>>> On Wed, Aug 31, 2016 at 11:29 AM, Krutika Dhananjay <kdhananj at redhat.com
>>> > wrote:
>>>
>>>> Tried this.
>>>>
>>>> With me, only 'fake2' gets healed after I bring the 'empty' brick back
>>>> up, and it stops there unless I do a 'heal-full'.
>>>>
>>>> Is that what you're seeing as well?
>>>>
>>>> -Krutika
>>>>
>>>> On Wed, Aug 31, 2016 at 4:43 AM, David Gossage <
>>>> dgossage at carouselchecks.com> wrote:
>>>>
>>>>> Same issue.  Brought up glusterd on the problem node; heal count still
>>>>> stuck at 6330.
>>>>>
>>>>> Ran gluster v heal GLUSTER1 full
>>>>>
>>>>> glustershd on the problem node shows a sweep starting and finishing in
>>>>> seconds.  The other 2 nodes show no activity in the log.  They should start
>>>>> a sweep too, shouldn't they?
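>>>>>
>>>>> (To check, something like this on each node, assuming the default log
>>>>> location:)
>>>>>
>>>>> grep afr_shd_full_healer /var/log/glusterfs/glustershd.log | tail -n 4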
>>>>>
>>>>> Tried starting from scratch
>>>>>
>>>>> kill -15 brickpid
>>>>> rm -Rf /brick
>>>>> mkdir -p /brick
>>>>> mkdir /gsmount/fake2
>>>>> setfattr -n "user.some-name" -v "some-value" /gsmount/fake2
>>>>>
>>>>> Heals visible dirs instantly, then stops.
>>>>>
>>>>> gluster v heal GLUSTER1 full
>>>>>
>>>>> See the sweep start on the problem node and end almost instantly.  No files
>>>>> added to the heal list, no files healed, no more logging.
>>>>>
>>>>> [2016-08-30 23:11:31.544331] I [MSGID: 108026]
>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>> starting full sweep on subvol GLUSTER1-client-1
>>>>> [2016-08-30 23:11:33.776235] I [MSGID: 108026]
>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>> finished full sweep on subvol GLUSTER1-client-1
>>>>>
>>>>> Same results no matter which node you run the command on.  Still stuck
>>>>> with 6330 files showing as needing heal out of 19k.  Logs still show that
>>>>> no heals are occurring.
>>>>>
>>>>> Is there a way to forcibly reset any prior heal data?  Could it be
>>>>> stuck on some past failed heal start?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *David Gossage*
>>>>> *Carousel Checks Inc. | System Administrator*
>>>>> *Office* 708.613.2284
>>>>>
>>>>> On Tue, Aug 30, 2016 at 10:03 AM, David Gossage <
>>>>> dgossage at carouselchecks.com> wrote:
>>>>>
>>>>>> On Tue, Aug 30, 2016 at 10:02 AM, David Gossage <
>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>
>>>>>>> updated test server to 3.8.3
>>>>>>>
>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>> Options Reconfigured:
>>>>>>> cluster.granular-entry-heal: on
>>>>>>> performance.readdir-ahead: on
>>>>>>> performance.read-ahead: off
>>>>>>> nfs.disable: on
>>>>>>> nfs.addr-namelookup: off
>>>>>>> nfs.enable-ino32: off
>>>>>>> cluster.background-self-heal-count: 16
>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>> performance.quick-read: off
>>>>>>> performance.io-cache: off
>>>>>>> performance.stat-prefetch: off
>>>>>>> cluster.eager-lock: enable
>>>>>>> network.remote-dio: on
>>>>>>> cluster.quorum-type: auto
>>>>>>> cluster.server-quorum-type: server
>>>>>>> storage.owner-gid: 36
>>>>>>> storage.owner-uid: 36
>>>>>>> server.allow-insecure: on
>>>>>>> features.shard: on
>>>>>>> features.shard-block-size: 64MB
>>>>>>> performance.strict-o-direct: off
>>>>>>> cluster.locking-scheme: granular
>>>>>>>
>>>>>>> kill -15 brickpid
>>>>>>> rm -Rf /gluster2/brick3
>>>>>>> mkdir -p /gluster2/brick3/1
>>>>>>> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10
>>>>>>> \:_glustershard/fake2
>>>>>>> setfattr -n "user.some-name" -v "some-value"
>>>>>>> /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>>>> gluster v start glustershard force
>>>>>>>
>>>>>>> At this point the brick process starts and all visible files, including
>>>>>>> the new dir, are created on the brick.
>>>>>>> A handful of shards are still in the heal statistics, but no .shard
>>>>>>> directory is created and there is no increase in shard count.
>>>>>>>
>>>>>>> gluster v heal glustershard
>>>>>>>
>>>>>>> At this point there is still no increase in count, no dir made, and no
>>>>>>> additional healing activity generated in the logs.  Waited a few minutes
>>>>>>> tailing logs to check if anything kicked in.
>>>>>>>
>>>>>>> gluster v heal glustershard full
>>>>>>>
>>>>>>> Shards get added to the list and the heal commences.  Logs show a full
>>>>>>> sweep starting on all 3 nodes, though this time it only shows as finishing
>>>>>>> on one, which looks to be the one that had its brick deleted.
>>>>>>>
>>>>>>> [2016-08-30 14:45:33.098589] I [MSGID: 108026]
>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>> glustershard-client-0
>>>>>>> [2016-08-30 14:45:33.099492] I [MSGID: 108026]
>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>> glustershard-client-1
>>>>>>> [2016-08-30 14:45:33.100093] I [MSGID: 108026]
>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>> glustershard-client-2
>>>>>>> [2016-08-30 14:52:29.760213] I [MSGID: 108026]
>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol
>>>>>>> glustershard-client-2
>>>>>>>
>>>>>>
>>>>>> Just realized it's still healing, so that may be why the sweeps on the 2
>>>>>> other bricks haven't reported as finished.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> My hope is that later tonight a full heal will work on production.
>>>>>>> Is it possible the self-heal daemon can get stale or stop listening but
>>>>>>> still show as active?  Would stopping and starting the self-heal daemon from
>>>>>>> the gluster CLI before doing these heals be helpful?
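>>>>>>>
>>>>>>> (If you want to try that, a possible sketch -- assuming the per-volume
>>>>>>> cluster.self-heal-daemon option is the right toggle here:)
>>>>>>>
>>>>>>> gluster volume set GLUSTER1 cluster.self-heal-daemon off
>>>>>>> gluster volume set GLUSTER1 cluster.self-heal-daemon on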
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 30, 2016 at 9:29 AM, David Gossage <
>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>
>>>>>>>> On Tue, Aug 30, 2016 at 8:52 AM, David Gossage <
>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <
>>>>>>>>> kdhananj at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <
>>>>>>>>>> kdhananj at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage <
>>>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <
>>>>>>>>>>>> kdhananj at redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Could you also share the glustershd logs?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I'll get them when I get to work, sure.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I tried the same steps that you mentioned multiple times, but
>>>>>>>>>>>>> heal is running to completion without any issues.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It must be said that 'heal full' traverses the files and
>>>>>>>>>>>>> directories in a depth-first order and does the heals in the same order.
>>>>>>>>>>>>> But if it gets interrupted in the middle (say because self-heal-daemon was
>>>>>>>>>>>>> either intentionally or unintentionally brought offline and then brought
>>>>>>>>>>>>> back up), self-heal will only pick up the entries that are so far marked as
>>>>>>>>>>>>> new entries that need heal, which it will find in the indices/xattrop
>>>>>>>>>>>>> directory. What this means is that those files and directories that were not
>>>>>>>>>>>>> visited during the crawl will remain untouched and unhealed in this second
>>>>>>>>>>>>> iteration of heal, unless you execute a 'heal-full' again.
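>>>>>>>>>>>>>
>>>>>>>>>>>>> (One illustrative way to see whether a full-heal crawl is still in
>>>>>>>>>>>>> progress on a volume:)
>>>>>>>>>>>>>
>>>>>>>>>>>>> gluster volume heal GLUSTER1 statistics | grep -i crawl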
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> So should it start healing shards as it crawls, or not until
>>>>>>>>>>>> after it crawls the entire .shard directory?  At the pace it was going, that
>>>>>>>>>>>> could be a week with one node appearing in the cluster but having no shard
>>>>>>>>>>>> files if anything tries to access a file on that node.  From my experience
>>>>>>>>>>>> the other day, telling it to heal full again did nothing regardless of which
>>>>>>>>>>>> node was used.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Crawl is started from '/' of the volume. Whenever self-heal
>>>>>>>>>> detects during the crawl that a file or directory is present in some
>>>>>>>>>> brick(s) and absent in others, it creates the file on the bricks where it
>>>>>>>>>> is absent and marks the fact that the file or directory might need
>>>>>>>>>> data/entry and metadata heal too (this also means that an index is created
>>>>>>>>>> under .glusterfs/indices/xattrop of the src bricks). And the data/entry and
>>>>>>>>>> metadata heal are picked up and done in the background with the help of
>>>>>>>>>> these indices.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Looking at my 3rd node as an example, I find nearly the exact same
>>>>>>>>> number of files in the xattrop dir as reported by heal count at the time I
>>>>>>>>> brought down node 2 to try and alleviate the read I/O errors, which I was
>>>>>>>>> guessing occurred from attempts to use the node with no shards for reads.
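>>>>>>>>>
>>>>>>>>> (A rough way to make that comparison -- the brick path is this setup's, and
>>>>>>>>> the base 'xattrop-*' link file is excluded since it is not itself a pending
>>>>>>>>> entry:)
>>>>>>>>>
>>>>>>>>> ls /gluster1/BRICK1/1/.glusterfs/indices/xattrop | grep -v '^xattrop-' | wc -l
>>>>>>>>> gluster volume heal GLUSTER1 statistics heal-count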
>>>>>>>>>
>>>>>>>>> Also attached are the glustershd logs from the 3 nodes, along with
>>>>>>>>> the test node I tried yesterday with the same results.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Looking at my own logs, I notice that a full sweep was only ever
>>>>>>>> recorded in glustershd.log on the 2nd node with the missing directory.  I
>>>>>>>> believe I should have found a sweep begun on every node, correct?
>>>>>>>>
>>>>>>>> On my test dev, when it did work, I do see that
>>>>>>>>
>>>>>>>> [2016-08-30 13:56:25.223333] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>>> glustershard-client-0
>>>>>>>> [2016-08-30 13:56:25.223522] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>>> glustershard-client-1
>>>>>>>> [2016-08-30 13:56:25.224616] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>>> glustershard-client-2
>>>>>>>> [2016-08-30 14:18:48.333740] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol
>>>>>>>> glustershard-client-2
>>>>>>>> [2016-08-30 14:18:48.356008] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol
>>>>>>>> glustershard-client-1
>>>>>>>> [2016-08-30 14:18:49.637811] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol
>>>>>>>> glustershard-client-0
>>>>>>>>
>>>>>>>> Whereas, looking at the past few days on the 3 prod nodes, I only
>>>>>>>> found that on my 2nd node:
>>>>>>>> [2016-08-27 01:26:42.638772] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> starting full sweep on subvol GLUSTER1-client-1
>>>>>>>> [2016-08-27 11:37:01.732366] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> finished full sweep on subvol GLUSTER1-client-1
>>>>>>>> [2016-08-27 12:58:34.597228] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> starting full sweep on subvol GLUSTER1-client-1
>>>>>>>> [2016-08-27 12:59:28.041173] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> finished full sweep on subvol GLUSTER1-client-1
>>>>>>>> [2016-08-27 20:03:42.560188] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> starting full sweep on subvol GLUSTER1-client-1
>>>>>>>> [2016-08-27 20:03:44.278274] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> finished full sweep on subvol GLUSTER1-client-1
>>>>>>>> [2016-08-27 21:00:42.603315] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> starting full sweep on subvol GLUSTER1-client-1
>>>>>>>> [2016-08-27 21:00:46.148674] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> My suspicion is that this is what happened on your setup.
>>>>>>>>>>>>> Could you confirm if that was the case?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The brick was brought online with a force start, then a full heal was
>>>>>>>>>>>> launched.  Hours later, after it became evident that it was not adding new
>>>>>>>>>>>> files to heal, I did try restarting the self-heal daemon and relaunching full
>>>>>>>>>>>> heal again.  But this was after the heal had basically already failed to
>>>>>>>>>>>> work as intended.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> OK. How did you figure it was not adding any new files? I need
>>>>>>>>>>> to know what places you were monitoring to come to this conclusion.
>>>>>>>>>>>
>>>>>>>>>>> -Krutika
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> As for those logs, I did manage to do something that caused
>>>>>>>>>>>>> these warning messages you shared earlier to appear in my client and server
>>>>>>>>>>>>> logs.
>>>>>>>>>>>>> Although these logs are annoying and a bit scary too, they
>>>>>>>>>>>>> didn't do any harm to the data in my volume. Why they appear just after a
>>>>>>>>>>>>> brick is replaced and under no other circumstances is something I'm still
>>>>>>>>>>>>> investigating.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But for the future, it would be good to follow the steps Anuradha
>>>>>>>>>>>>> gave, as that would allow self-heal to at least detect that it has some
>>>>>>>>>>>>> repairing to do whenever it is restarted, whether intentionally or otherwise.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I followed those steps as described on my test box and ended up
>>>>>>>>>>>> with the exact same outcome of adding shards at an agonizingly slow pace and
>>>>>>>>>>>> no creation of the .shard directory or heals on the shard directory.
>>>>>>>>>>>> Directories visible from the mount healed quickly.  This was with one VM, so
>>>>>>>>>>>> it has only 800 shards as well.  After hours at work it had added a total of
>>>>>>>>>>>> 33 shards to be healed.  I sent those logs yesterday as well, though not the
>>>>>>>>>>>> glustershd.
>>>>>>>>>>>>
>>>>>>>>>>>> Does the replace-brick command copy files in the same manner?  For
>>>>>>>>>>>> these purposes I am contemplating just skipping the heal route.
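>>>>>>>>>>>>
>>>>>>>>>>>> (For reference, the replace-brick form I believe 3.8 supports -- the brick
>>>>>>>>>>>> paths here are only placeholders:)
>>>>>>>>>>>>
>>>>>>>>>>>> gluster volume replace-brick GLUSTER1 ccgl2.gl.local:/old/brick \
>>>>>>>>>>>>     ccgl2.gl.local:/new/brick commit force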
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> -Krutika
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage <
>>>>>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Attached brick and client logs from the test machine where the same
>>>>>>>>>>>>>> behavior occurred; not sure if anything new is there.  It's still on 3.8.2.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>>>>> Bricks:
>>>>>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>>>> cluster.locking-scheme: granular
>>>>>>>>>>>>>> performance.strict-o-direct: off
>>>>>>>>>>>>>> features.shard-block-size: 64MB
>>>>>>>>>>>>>> features.shard: on
>>>>>>>>>>>>>> server.allow-insecure: on
>>>>>>>>>>>>>> storage.owner-uid: 36
>>>>>>>>>>>>>> storage.owner-gid: 36
>>>>>>>>>>>>>> cluster.server-quorum-type: server
>>>>>>>>>>>>>> cluster.quorum-type: auto
>>>>>>>>>>>>>> network.remote-dio: on
>>>>>>>>>>>>>> cluster.eager-lock: enable
>>>>>>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>>>>>>> performance.io-cache: off
>>>>>>>>>>>>>> performance.quick-read: off
>>>>>>>>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>>>>>>>>> cluster.background-self-heal-count: 16
>>>>>>>>>>>>>> nfs.enable-ino32: off
>>>>>>>>>>>>>> nfs.addr-namelookup: off
>>>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>>>> performance.read-ahead: off
>>>>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>>>>> cluster.granular-entry-heal: on
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage <
>>>>>>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <
>>>>>>>>>>>>>>> atalur at redhat.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>>>>>>>>>>> > To: "Anuradha Talur" <atalur at redhat.com>
>>>>>>>>>>>>>>>> > Cc: "gluster-users at gluster.org List" <
>>>>>>>>>>>>>>>> Gluster-users at gluster.org>, "Krutika Dhananjay" <
>>>>>>>>>>>>>>>> kdhananj at redhat.com>
>>>>>>>>>>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM
>>>>>>>>>>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier
>>>>>>>>>>>>>>>> Slow
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <
>>>>>>>>>>>>>>>> atalur at redhat.com> wrote:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > > Response inline.
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > ----- Original Message -----
>>>>>>>>>>>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>>>>>>>>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>>>>>>>>>>> > > > Cc: "gluster-users at gluster.org List" <
>>>>>>>>>>>>>>>> Gluster-users at gluster.org>
>>>>>>>>>>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM
>>>>>>>>>>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing
>>>>>>>>>>>>>>>> Glacier Slow
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > Could you attach both client and brick logs?
>>>>>>>>>>>>>>>> Meanwhile I will try these
>>>>>>>>>>>>>>>> > > steps
>>>>>>>>>>>>>>>> > > > out on my machines and see if it is easily
>>>>>>>>>>>>>>>> recreatable.
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > -Krutika
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage <
>>>>>>>>>>>>>>>> > > dgossage at carouselchecks.com
>>>>>>>>>>>>>>>> > > > > wrote:
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > Centos 7 Gluster 3.8.3
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>>> > > > Options Reconfigured:
>>>>>>>>>>>>>>>> > > > cluster.data-self-heal-algorithm: full
>>>>>>>>>>>>>>>> > > > cluster.self-heal-daemon: on
>>>>>>>>>>>>>>>> > > > cluster.locking-scheme: granular
>>>>>>>>>>>>>>>> > > > features.shard-block-size: 64MB
>>>>>>>>>>>>>>>> > > > features.shard: on
>>>>>>>>>>>>>>>> > > > performance.readdir-ahead: on
>>>>>>>>>>>>>>>> > > > storage.owner-uid: 36
>>>>>>>>>>>>>>>> > > > storage.owner-gid: 36
>>>>>>>>>>>>>>>> > > > performance.quick-read: off
>>>>>>>>>>>>>>>> > > > performance.read-ahead: off
>>>>>>>>>>>>>>>> > > > performance.io-cache: off
>>>>>>>>>>>>>>>> > > > performance.stat-prefetch: on
>>>>>>>>>>>>>>>> > > > cluster.eager-lock: enable
>>>>>>>>>>>>>>>> > > > network.remote-dio: enable
>>>>>>>>>>>>>>>> > > > cluster.quorum-type: auto
>>>>>>>>>>>>>>>> > > > cluster.server-quorum-type: server
>>>>>>>>>>>>>>>> > > > server.allow-insecure: on
>>>>>>>>>>>>>>>> > > > cluster.self-heal-window-size: 1024
>>>>>>>>>>>>>>>> > > > cluster.background-self-heal-count: 16
>>>>>>>>>>>>>>>> > > > performance.strict-write-ordering: off
>>>>>>>>>>>>>>>> > > > nfs.disable: on
>>>>>>>>>>>>>>>> > > > nfs.addr-namelookup: off
>>>>>>>>>>>>>>>> > > > nfs.enable-ino32: off
>>>>>>>>>>>>>>>> > > > cluster.granular-entry-heal: on
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no
>>>>>>>>>>>>>>>> issues.
>>>>>>>>>>>>>>>> > > > Following steps detailed in previous recommendations
>>>>>>>>>>>>>>>> began process of
>>>>>>>>>>>>>>>> > > > replacing and healing bricks one node at a time.
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > 1) kill pid of brick
>>>>>>>>>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10
>>>>>>>>>>>>>>>> > > > 3) recreate directory of brick
>>>>>>>>>>>>>>>> > > > 4) gluster volume start <> force
>>>>>>>>>>>>>>>> > > > 5) gluster volume heal <> full
>>>>>>>>>>>>>>>> > > Hi,
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > I'd suggest that full heal is not used. There are a few
>>>>>>>>>>>>>>>> bugs in full heal.
>>>>>>>>>>>>>>>> > > Better safe than sorry ;)
>>>>>>>>>>>>>>>> > > Instead I'd suggest the following steps:
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > Currently I brought the node down by systemctl stop
>>>>>>>>>>>>>>>> glusterd as I was
>>>>>>>>>>>>>>>> > getting sporadic I/O issues and a few VMs paused, so I'm
>>>>>>>>>>>>>>>> hoping that will help.
>>>>>>>>>>>>>>>> > I may wait to do this till around 4PM when most work is
>>>>>>>>>>>>>>>> done in case it
>>>>>>>>>>>>>>>> > shoots load up.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > > 1) kill pid of brick
>>>>>>>>>>>>>>>> > > 2) do the reconfiguring of the brick that you need
>>>>>>>>>>>>>>>> > > 3) recreate brick dir
>>>>>>>>>>>>>>>> > > 4) while the brick is still down, from the mount point:
>>>>>>>>>>>>>>>> > >    a) create a dummy non existent dir under / of mount.
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > So if node 2 is the down brick, do I pick a node, for example 3,
>>>>>>>>>>>>>>>> and make a test dir
>>>>>>>>>>>>>>>> > under its brick directory that doesn't exist on 2, or
>>>>>>>>>>>>>>>> should I be doing this
>>>>>>>>>>>>>>>> > over a gluster mount?
>>>>>>>>>>>>>>>> You should be doing this over gluster mount.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > >    b) set a non existent extended attribute on / of
>>>>>>>>>>>>>>>> mount.
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Could you give me an example of an attribute to set?
>>>>>>>>>>>>>>>>  I've read a tad on
>>>>>>>>>>>>>>>> > this, and looked up attributes but haven't set any yet
>>>>>>>>>>>>>>>> myself.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value"
>>>>>>>>>>>>>>>> <path-to-mount>
>>>>>>>>>>>>>>>> > Doing these steps will ensure that heal happens only from
>>>>>>>>>>>>>>>> updated brick to
>>>>>>>>>>>>>>>> > > down brick.
>>>>>>>>>>>>>>>> > > 5) gluster v start <> force
>>>>>>>>>>>>>>>> > > 6) gluster v heal <>
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Will it matter if somewhere in gluster the full heal
>>>>>>>>>>>>>>>> command was run the other
>>>>>>>>>>>>>>>> > day?  Not sure if it eventually stops or times out.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> full heal will stop once the crawl is done. So if you want
>>>>>>>>>>>>>>>> to trigger heal again,
>>>>>>>>>>>>>>>> run gluster v heal <>. Actually even brick up or volume
>>>>>>>>>>>>>>>> start force should
>>>>>>>>>>>>>>>> trigger the heal.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Did this on the test bed today.  It's one server with 3 bricks on
>>>>>>>>>>>>>>> the same machine, so take that for what it's worth.  Also, it still runs
>>>>>>>>>>>>>>> 3.8.2.  Maybe I'll update and re-run the test.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> killed brick
>>>>>>>>>>>>>>> deleted brick dir
>>>>>>>>>>>>>>> recreated brick dir
>>>>>>>>>>>>>>> created fake dir on gluster mount
>>>>>>>>>>>>>>> set suggested fake attribute on it
>>>>>>>>>>>>>>> ran volume start <> force
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looked at the files it said needed healing and it was just the 8
>>>>>>>>>>>>>>> shards that were modified during the few minutes I ran through the steps.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Gave it a few minutes and it stayed the same.
>>>>>>>>>>>>>>> Ran gluster volume heal <>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It healed all the directories and files you can see over the
>>>>>>>>>>>>>>> mount, including fakedir.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Same issue for shards though.  It adds more shards to heal
>>>>>>>>>>>>>>> at a glacial pace.  Slight jump in speed if I stat every file and dir in the
>>>>>>>>>>>>>>> running VM, but not all shards.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It started with 8 shards to heal and is now only at 33 out
>>>>>>>>>>>>>>> of 800, and it probably won't finish adding for a few days at the rate it's
>>>>>>>>>>>>>>> going.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > > 1st node worked as expected took 12 hours to heal 1TB
>>>>>>>>>>>>>>>> data. Load was
>>>>>>>>>>>>>>>> > > little
>>>>>>>>>>>>>>>> > > > heavy but nothing shocking.
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > About an hour after node 1 finished I began same
>>>>>>>>>>>>>>>> process on node2. Heal
>>>>>>>>>>>>>>>> > > > process kicked in as before and the files in
>>>>>>>>>>>>>>>> directories visible from
>>>>>>>>>>>>>>>> > > mount
>>>>>>>>>>>>>>>> > > > and .glusterfs healed in short time. Then it began
>>>>>>>>>>>>>>>> crawl of .shard adding
>>>>>>>>>>>>>>>> > > > those files to heal count at which point the entire
>>>>>>>>>>>>>>>> process ground to a
>>>>>>>>>>>>>>>> > > halt
>>>>>>>>>>>>>>>> > > > basically. After 48 hours out of 19k shards it has
>>>>>>>>>>>>>>>> added 5900 to heal
>>>>>>>>>>>>>>>> > > list.
>>>>>>>>>>>>>>>> > > > Load on all 3 machines is negligible. It was suggested
>>>>>>>>>>>>>>>> to change this
>>>>>>>>>>>>>>>> > > value
>>>>>>>>>>>>>>>> > > > to full cluster.data-self-heal-algorithm and restart
>>>>>>>>>>>>>>>> volume which I
>>>>>>>>>>>>>>>> > > did. No
>>>>>>>>>>>>>>>> > > > effect. Tried relaunching heal, no effect, despite
>>>>>>>>>>>>>>>> any node picked. I
>>>>>>>>>>>>>>>> > > > started each VM and performed a stat of all files
>>>>>>>>>>>>>>>> from within it, or a
>>>>>>>>>>>>>>>> > > full
>>>>>>>>>>>>>>>> > > > virus scan and that seemed to cause short small
>>>>>>>>>>>>>>>> spikes in shards added,
>>>>>>>>>>>>>>>> > > but
>>>>>>>>>>>>>>>> > > > not by much. Logs are showing no real messages
>>>>>>>>>>>>>>>> indicating anything is
>>>>>>>>>>>>>>>> > > going
>>>>>>>>>>>>>>>> > > > on. I get hits to brick log on occasion of null
>>>>>>>>>>>>>>>> lookups making me think
>>>>>>>>>>>>>>>> > > it's
>>>>>>>>>>>>>>>> > > > not really crawling shards directory but waiting for
>>>>>>>>>>>>>>>> a shard lookup to
>>>>>>>>>>>>>>>> > > add
>>>>>>>>>>>>>>>> > > > it. I'll get following in brick log but not constant
>>>>>>>>>>>>>>>> and sometime
>>>>>>>>>>>>>>>> > > multiple
>>>>>>>>>>>>>>>> > > > for same shard.
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009]
>>>>>>>>>>>>>>>> > > > [server-resolve.c:569:server_resolve]
>>>>>>>>>>>>>>>> 0-GLUSTER1-server: no resolution
>>>>>>>>>>>>>>>> > > type
>>>>>>>>>>>>>>>> > > > for (null) (LOOKUP)
>>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050]
>>>>>>>>>>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk]
>>>>>>>>>>>>>>>> 0-GLUSTER1-server: 12591783:
>>>>>>>>>>>>>>>> > > > LOOKUP (null) (00000000-0000-0000-00
>>>>>>>>>>>>>>>> > > > 00-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221)
>>>>>>>>>>>>>>>> ==> (Invalid
>>>>>>>>>>>>>>>> > > > argument) [Invalid argument]
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > This one repeated about 30 times in row then nothing
>>>>>>>>>>>>>>>> for 10 minutes then
>>>>>>>>>>>>>>>> > > one
>>>>>>>>>>>>>>>> > > > hit for one different shard by itself.
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > How can I determine if Heal is actually running? How
>>>>>>>>>>>>>>>> can I kill it or
>>>>>>>>>>>>>>>> > > force
>>>>>>>>>>>>>>>> > > > restart? Does the node I start it from determine which
>>>>>>>>>>>>>>>> directory gets
>>>>>>>>>>>>>>>> > > crawled to
>>>>>>>>>>>>>>>> > > > determine heals?
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > David Gossage
>>>>>>>>>>>>>>>> > > > Carousel Checks Inc. | System Administrator
>>>>>>>>>>>>>>>> > > > Office 708.613.2284
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > _______________________________________________
>>>>>>>>>>>>>>>> > > > Gluster-users mailing list
>>>>>>>>>>>>>>>> > > > Gluster-users at gluster.org
>>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > _______________________________________________
>>>>>>>>>>>>>>>> > > > Gluster-users mailing list
>>>>>>>>>>>>>>>> > > > Gluster-users at gluster.org
>>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > --
>>>>>>>>>>>>>>>> > > Thanks,
>>>>>>>>>>>>>>>> > > Anuradha.
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Anuradha.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>