[Gluster-users] 3.8.3 Shards Healing Glacier Slow
David Gossage
dgossage at carouselchecks.com
Tue Sep 6 16:56:47 UTC 2016
On Tue, Sep 6, 2016 at 11:41 AM, Krutika Dhananjay <kdhananj at redhat.com>
wrote:
>
>
> On Tue, Sep 6, 2016 at 7:27 PM, David Gossage <dgossage at carouselchecks.com
> > wrote:
>
>> Going to top post with the solution Krutika Dhananjay came up with. His
>> steps were much less volatile, could be done with the volume still in
>> active use, and were much less prone to accidental destruction.
>>
>> My use case was a desire to wipe a brick and recreate it with the same
>> directory structure so as to change the underlying RAID setup of the disks
>> backing the brick. The problem was that getting the shards to heal was
>> failing 99% of the time.
>>
>>
> Hi,
>
> Thank you for posting this before I could get around to it. Also thanks to
> Pranith for suggesting the additional precautionary 'trusted.afr.dirty'
> step (step 4 below) and reviewing the steps once.
>
> IIUC the newly-introduced reset-brick command serves as an alternative to
> all this lengthy process listed below.
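> (My understanding is that its usage would be along the lines of 'gluster
> volume reset-brick <VOLNAME> <HOST>:<BRICK> start', do the brick
> maintenance, then 'gluster volume reset-brick <VOLNAME> <HOST>:<BRICK>
> <HOST>:<BRICK> commit force' -- please correct me if I'm wrong.)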
>
> @Pranith,
> Is the above statement correct? If so, do we know which releases will have
> the reset-brick command/feature?
>
>
>
>> These are steps he provided that have been working well.
>>
>
>
> Err.. she. :)
>
ack so sorry
>
> -Krutika
>
>
>>
>> 1) kill brick pid on server that you want to replace
>> kill -15 <brickpid>
>>
>> 2) do brick maintenance which in my case was:
>> zpool destroy <ZFSPOOL>
>> zpool create (options) yada yada disks
>>
>> 3) make sure original path to brick exists
>> mkdir /path/to/brick
>>
>> 4) set extended attribute on new brick path (not over gluster mount)
>> setfattr -n trusted.afr.dirty -v 0x000000000000000000000001 /path/to/brick
>>
>> 5) create a mount point and mount the volume via the gluster client
>> mkdir /mnt-brick-test
>> glusterfs --volfile-id=<VOLNAME> --volfile-server=<valid host or ip of an
>> active gluster server> --client-pid=-6 /mnt-brick-test
>>
>> 6) set an extended attribute on the gluster network mount. VOLNAME is the
>> gluster volume; KILLEDBRICK# is the index of the brick needing heal. The
>> indices start from 0, and gluster v info should display the bricks in order
>> setfattr -n trusted.replace-brick -v VOLNAME-client-KILLEDBRICK#
>> /mnt-brick-test
>>
>> 7) gluster heal info should now show the / root of the gluster volume in its output
>> gluster v heal VOLNAME info
>>
>> 8) force start volume to bring up killed brick
>> gluster v start VOLNAME force
>>
>> 9) optionally watch heal progress and drink beer while you wait and hope
>> nothing blows up
>> watch -n 10 gluster v heal VOLNAME statistics heal-count
>>
>> 10) unmount gluster network mount from server
>> umount /mnt-brick-test
>>
>> 11) Praise the developers for their efforts
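>>
>> For reference, the same sequence collapsed into one rough script (a sketch
>> only -- the volume name, host, brick path, PID and brick index below are
>> placeholders taken from this thread and must be adjusted for your setup):
>>
>> #!/bin/bash
>> VOLNAME=glustershard        # gluster volume name (placeholder)
>> HOST=192.168.71.10          # any active gluster server (placeholder)
>> BRICK=/path/to/brick        # path of the brick being rebuilt (placeholder)
>> IDX=0                       # index of the killed brick, counting from 0 (see step 6)
>> BRICKPID=12345              # PID shown by 'gluster v status' for that brick (placeholder)
>> MNT=/mnt-brick-test
>>
>> kill -15 "$BRICKPID"                                   # 1) stop only that brick process
>> # 2) ...do the disk/raid maintenance here...
>> mkdir -p "$BRICK"                                      # 3) recreate the original brick path
>> setfattr -n trusted.afr.dirty -v 0x000000000000000000000001 "$BRICK"    # 4)
>> mkdir -p "$MNT"                                        # 5) client mount of the volume
>> glusterfs --volfile-id="$VOLNAME" --volfile-server="$HOST" --client-pid=-6 "$MNT"
>> setfattr -n trusted.replace-brick -v "${VOLNAME}-client-${IDX}" "$MNT"   # 6)
>> gluster v heal "$VOLNAME" info                         # 7) '/' should now be listed
>> gluster v start "$VOLNAME" force                       # 8) bring the killed brick back
>> watch -n 10 gluster v heal "$VOLNAME" statistics heal-count              # 9)
>> umount "$MNT"                                          # 10) once the heal is done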
>>
>> *David Gossage*
>> *Carousel Checks Inc. | System Administrator*
>> *Office* 708.613.2284
>>
>> On Thu, Sep 1, 2016 at 2:29 PM, David Gossage <
>> dgossage at carouselchecks.com> wrote:
>>
>>> On Thu, Sep 1, 2016 at 12:09 AM, Krutika Dhananjay <kdhananj at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Aug 31, 2016 at 8:13 PM, David Gossage <
>>>> dgossage at carouselchecks.com> wrote:
>>>>
>>>>> Just as a test I did not shut down the one VM on the cluster, since
>>>>> finding a window before the weekend when I can shut down all VMs and fit
>>>>> in a full heal is unlikely, so I wanted to see what occurs.
>>>>>
>>>>>
>>>>> kill -15 brick pid
>>>>> rm -Rf /gluster2/brick1/1
>>>>> mkdir /gluster2/brick1/1
>>>>> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake3
>>>>> setfattr -n "user.some-name" -v "some-value"
>>>>> /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard
>>>>>
>>>>> getfattr -d -m . -e hex /gluster2/brick2/1
>>>>> # file: gluster2/brick2/1
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>> 23a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000001
>>>>> trusted.afr.glustershard-client-0=0x000000000000000200000000
>>>>>
>>>>
>>>> This is unusual. The last digit ought to have been 1 on account of
>>>> "fake3" being created while the first brick is offline.
>>>>
>>>> This discussion is becoming unnecessarily lengthy. Mind if we discuss
>>>> this and sort it out on IRC today? At least the communication will be
>>>> continuous and in real time. I'm kdhananjay on #gluster (Freenode). Ping me
>>>> when you're online.
>>>>
>>>> -Krutika
>>>>
>>>
>>> Thanks for the assistance this morning. Looks like I lost my connection in
>>> IRC and didn't realize it, so sorry if you came back looking for me. Let me
>>> know when the steps you worked out have been reviewed and found safe for
>>> production use, and I'll give them a try.
>>>
>>>
>>>
>>>>
>>>>
>>>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> getfattr -d -m . -e hex /gluster2/brick3/1
>>>>> # file: gluster2/brick3/1
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>> 23a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000001
>>>>> trusted.afr.glustershard-client-0=0x000000000000000200000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> setfattr -n trusted.afr.glustershard-client-0 -v
>>>>> 0x000000010000000200000000 /gluster2/brick2/1
>>>>> setfattr -n trusted.afr.glustershard-client-0 -v
>>>>> 0x000000010000000200000000 /gluster2/brick3/1
>>>>>
>>>>> getfattr -d -m . -e hex /gluster2/brick3/1/
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: gluster2/brick3/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>> 23a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.glustershard-client-0=0x000000010000000200000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> getfattr -d -m . -e hex /gluster2/brick2/1/
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: gluster2/brick2/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>> 23a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.glustershard-client-0=0x000000010000000200000000
>>>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> gluster v start glustershard force
>>>>>
>>>>> gluster heal counts climbed up and down a little as it healed everything
>>>>> visible in the gluster mount and the corresponding .glusterfs entries,
>>>>> then stalled with around 15 shards and the fake3 directory still in the list
>>>>>
>>>>> getfattr -d -m . -e hex /gluster2/brick2/1/
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: gluster2/brick2/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>> 23a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> getfattr -d -m . -e hex /gluster2/brick3/1/
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: gluster2/brick3/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>> 23a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> getfattr -d -m . -e hex /gluster2/brick1/1/
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: gluster2/brick1/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>> 23a756e6c6162656c65645f743a733000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> heal count stayed the same for a while, then I ran
>>>>>
>>>>> gluster v heal glustershard full
>>>>>
>>>>> heals jump up to 700 as shards actually get read in as needing heal.
>>>>> glustershd shows 3 sweeps started, one per brick
>>>>>
>>>>> It heals the shards and things look ok; heal <> info shows 0 files, but
>>>>> statistics heal-count shows 1 left for bricks 2 and 3. Perhaps because I
>>>>> didn't stop the running VM?
>>>>>
>>>>> # file: gluster2/brick1/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>> 23a756e6c6162656c65645f743a733000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> # file: gluster2/brick2/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>> 23a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> # file: gluster2/brick3/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>> 23a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> Metadata split-brain? heal <> info split-brain shows no files or
>>>>> entries. If I had thought ahead I would have checked the values returned
>>>>> by getfattr beforehand, although I do know heal-count was returning 0 at the
>>>>> time
>>>>>
>>>>>
>>>>> Assuming I need to shut down the VMs and put the volume in maintenance
>>>>> from oVirt to prevent any IO: does that need to last for the whole heal, or
>>>>> can I re-activate at some point to bring the VMs back up?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *David Gossage*
>>>>> *Carousel Checks Inc. | System Administrator*
>>>>> *Office* 708.613.2284
>>>>>
>>>>> On Wed, Aug 31, 2016 at 3:50 AM, Krutika Dhananjay <
>>>>> kdhananj at redhat.com> wrote:
>>>>>
>>>>>> No, sorry, it's working fine. I may have missed some step, because of
>>>>>> which I saw that problem. /.shard is also healing fine now.
>>>>>>
>>>>>> Let me know if it works for you.
>>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>> On Wed, Aug 31, 2016 at 12:49 PM, Krutika Dhananjay <
>>>>>> kdhananj at redhat.com> wrote:
>>>>>>
>>>>>>> OK I just hit the other issue too, where .shard doesn't get healed.
>>>>>>> :)
>>>>>>>
>>>>>>> Investigating as to why that is the case. Give me some time.
>>>>>>>
>>>>>>> -Krutika
>>>>>>>
>>>>>>> On Wed, Aug 31, 2016 at 12:39 PM, Krutika Dhananjay <
>>>>>>> kdhananj at redhat.com> wrote:
>>>>>>>
>>>>>>>> Just figured out that the steps Anuradha provided won't work if granular
>>>>>>>> entry heal is on.
>>>>>>>> When you bring down a brick and create fake2 under / of the
>>>>>>>> volume, the granular entry heal feature causes
>>>>>>>> the self-heal daemon to remember only the fact that 'fake2' needs to be
>>>>>>>> recreated on the offline brick (because changelogs are granular).
>>>>>>>>
>>>>>>>> In this case, we would be required to indicate to self-heal-daemon
>>>>>>>> that the entire directory tree from '/' needs to be repaired on the brick
>>>>>>>> that contains no data.
>>>>>>>>
>>>>>>>> To fix this, I did the following (for users who use granular entry
>>>>>>>> self-healing):
>>>>>>>>
>>>>>>>> 1. Kill the last brick process in the replica (/bricks/3)
>>>>>>>>
>>>>>>>> 2. [root at server-3 ~]# rm -rf /bricks/3
>>>>>>>>
>>>>>>>> 3. [root at server-3 ~]# mkdir /bricks/3
>>>>>>>>
>>>>>>>> 4. Create a new dir on the mount point:
>>>>>>>> [root at client-1 ~]# mkdir /mnt/fake
>>>>>>>>
>>>>>>>> 5. Set some fake xattr on the root of the volume, and not the
>>>>>>>> 'fake' directory itself.
>>>>>>>> [root at client-1 ~]# setfattr -n "user.some-name" -v
>>>>>>>> "some-value" /mnt
>>>>>>>>
>>>>>>>> 6. Make sure there's no io happening on your volume.
>>>>>>>>
>>>>>>>> 7. Check the pending xattrs on the brick directories of the two
>>>>>>>> good copies (on bricks 1 and 2); you should see the same value for the
>>>>>>>> trusted.afr.rep-client-2 key (shown below) on both bricks.
>>>>>>>> (Note that the client-<num> xattr key will have the same last digit
>>>>>>>> as the index of the brick that is down, when counting from 0. So if the
>>>>>>>> first brick is the one that is down, it would read trusted.afr.*-client-0;
>>>>>>>> if the second brick is the one that is empty and down, it would read
>>>>>>>> trusted.afr.*-client-1, and so on.)
>>>>>>>>
>>>>>>>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1
>>>>>>>> # file: 1
>>>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>>>>> 23a6574635f72756e74696d655f743a733000
>>>>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>>>>> trusted.afr.rep-client-2=0x000000000000000100000001
>>>>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>>>
>>>>>>>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2
>>>>>>>> # file: 2
>>>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>>>>> 23a6574635f72756e74696d655f743a733000
>>>>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>>>>> trusted.afr.rep-client-2=0x000000000000000100000001
>>>>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>>>
>>>>>>>> 8. Flip the 8th digit in the trusted.afr.<VOLNAME>-client-2 value to a 1.
>>>>>>>> (The 24 hex digits are three 4-byte counters -- pending data, metadata
>>>>>>>> and entry operations, in that order -- so this marks pending data heal
>>>>>>>> in addition to the metadata and entry heals already recorded.)
>>>>>>>>
>>>>>>>> [root at server-1 ~]# setfattr -n trusted.afr.rep-client-2 -v
>>>>>>>> 0x000000010000000100000001 /bricks/1
>>>>>>>> [root at server-2 ~]# setfattr -n trusted.afr.rep-client-2 -v
>>>>>>>> 0x000000010000000100000001 /bricks/2
>>>>>>>>
>>>>>>>> 9. Get the xattrs again and check the xattrs are set properly now
>>>>>>>>
>>>>>>>> [root at server-1 ~]# getfattr -d -m . -e hex /bricks/1
>>>>>>>> # file: 1
>>>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>>>>> 23a6574635f72756e74696d655f743a733000
>>>>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>>>>> trusted.afr.rep-client-2=0x000000010000000100000001
>>>>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>>>
>>>>>>>> [root at server-2 ~]# getfattr -d -m . -e hex /bricks/2
>>>>>>>> # file: 2
>>>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>>>>>> 23a6574635f72756e74696d655f743a733000
>>>>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>>>>> trusted.afr.rep-client-2=0x000000010000000100000001
>>>>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>>>
>>>>>>>> 10. Force-start the volume.
>>>>>>>>
>>>>>>>> [root at server-1 ~]# gluster volume start rep force
>>>>>>>> volume start: rep: success
>>>>>>>>
>>>>>>>> 11. Monitor heal-info command to ensure the number of entries keeps
>>>>>>>> growing.
>>>>>>>>
>>>>>>>> 12. Keep monitoring as in step 11, and eventually the number of
>>>>>>>> entries needing heal must come down to 0.
>>>>>>>> Also the checksums of the files on the previously empty brick
>>>>>>>> should now match with the copies on the other two bricks.
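>>>>>>>>
>>>>>>>> For the checksum comparison, something as simple as the following on
>>>>>>>> each server should do (just a sketch; substitute any file path that
>>>>>>>> actually exists under the bricks):
>>>>>>>>
>>>>>>>> [root at server-1 ~]# md5sum /bricks/1/path/to/some-file
>>>>>>>> [root at server-3 ~]# md5sum /bricks/3/path/to/some-file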
>>>>>>>>
>>>>>>>> Could you check if the above steps work for you, in your test
>>>>>>>> environment?
>>>>>>>>
>>>>>>>> You caught a nice bug in the manual steps to follow when granular
>>>>>>>> entry-heal is enabled and an empty brick needs heal. Thanks for reporting
>>>>>>>> it. :) We will fix the documentation appropriately.
>>>>>>>>
>>>>>>>> -Krutika
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Aug 31, 2016 at 11:29 AM, Krutika Dhananjay <
>>>>>>>> kdhananj at redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Tried this.
>>>>>>>>>
>>>>>>>>> With me, only 'fake2' gets healed after I bring the 'empty' brick
>>>>>>>>> back up, and it stops there unless I do a 'heal-full'.
>>>>>>>>>
>>>>>>>>> Is that what you're seeing as well?
>>>>>>>>>
>>>>>>>>> -Krutika
>>>>>>>>>
>>>>>>>>> On Wed, Aug 31, 2016 at 4:43 AM, David Gossage <
>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>
>>>>>>>>>> Same issue. Brought up glusterd on the problem node; heal count still
>>>>>>>>>> stuck at 6330.
>>>>>>>>>>
>>>>>>>>>> Ran gluster v heal GLUSTER1 full
>>>>>>>>>>
>>>>>>>>>> glustershd on the problem node shows a sweep starting and finishing
>>>>>>>>>> in seconds. The other 2 nodes show no activity in their logs. They should
>>>>>>>>>> start a sweep too, shouldn't they?
>>>>>>>>>>
>>>>>>>>>> Tried starting from scratch
>>>>>>>>>>
>>>>>>>>>> kill -15 brickpid
>>>>>>>>>> rm -Rf /brick
>>>>>>>>>> mkdir -p /brick
>>>>>>>>>> mkdir /gsmount/fake2
>>>>>>>>>> setfattr -n "user.some-name" -v "some-value" /gsmount/fake2
>>>>>>>>>>
>>>>>>>>>> Heals visible dirs instantly then stops.
>>>>>>>>>>
>>>>>>>>>> gluster v heal GLUSTER1 full
>>>>>>>>>>
>>>>>>>>>> see the sweep start on the problem node and end almost instantly. No
>>>>>>>>>> files added to the heal list, no files healed, no more logging
>>>>>>>>>>
>>>>>>>>>> [2016-08-30 23:11:31.544331] I [MSGID: 108026]
>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>>>> [2016-08-30 23:11:33.776235] I [MSGID: 108026]
>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>
>>>>>>>>>> Same results no matter which node you run the command on. Still
>>>>>>>>>> stuck with 6330 files showing as needing heal out of 19k. Logs still show
>>>>>>>>>> no heals are occurring.
>>>>>>>>>>
>>>>>>>>>> Is there a way to forcibly reset any prior heal data? Could it
>>>>>>>>>> be stuck on some past failed heal start?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *David Gossage*
>>>>>>>>>> *Carousel Checks Inc. | System Administrator*
>>>>>>>>>> *Office* 708.613.2284
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 30, 2016 at 10:03 AM, David Gossage <
>>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 30, 2016 at 10:02 AM, David Gossage <
>>>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> updated test server to 3.8.3
>>>>>>>>>>>>
>>>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>> cluster.granular-entry-heal: on
>>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>>> performance.read-ahead: off
>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>> nfs.addr-namelookup: off
>>>>>>>>>>>> nfs.enable-ino32: off
>>>>>>>>>>>> cluster.background-self-heal-count: 16
>>>>>>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>>>>>>> performance.quick-read: off
>>>>>>>>>>>> performance.io-cache: off
>>>>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>>>>> cluster.eager-lock: enable
>>>>>>>>>>>> network.remote-dio: on
>>>>>>>>>>>> cluster.quorum-type: auto
>>>>>>>>>>>> cluster.server-quorum-type: server
>>>>>>>>>>>> storage.owner-gid: 36
>>>>>>>>>>>> storage.owner-uid: 36
>>>>>>>>>>>> server.allow-insecure: on
>>>>>>>>>>>> features.shard: on
>>>>>>>>>>>> features.shard-block-size: 64MB
>>>>>>>>>>>> performance.strict-o-direct: off
>>>>>>>>>>>> cluster.locking-scheme: granular
>>>>>>>>>>>>
>>>>>>>>>>>> kill -15 brickpid
>>>>>>>>>>>> rm -Rf /gluster2/brick3
>>>>>>>>>>>> mkdir -p /gluster2/brick3/1
>>>>>>>>>>>> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>>>>>>>>> setfattr -n "user.some-name" -v "some-value"
>>>>>>>>>>>> /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>>>>>>>>> gluster v start glustershard force
>>>>>>>>>>>>
>>>>>>>>>>>> at this point the brick process starts and all visible files
>>>>>>>>>>>> including the new dir are made on the brick
>>>>>>>>>>>> a handful of shards are still in the heal statistics, but no .shard
>>>>>>>>>>>> directory is created and there is no increase in the shard count
>>>>>>>>>>>>
>>>>>>>>>>>> gluster v heal glustershard
>>>>>>>>>>>>
>>>>>>>>>>>> At this point still no increase in the count, no dir made, and no
>>>>>>>>>>>> additional healing activity generated in the logs. Waited a few minutes
>>>>>>>>>>>> tailing logs to check if anything kicked in.
>>>>>>>>>>>>
>>>>>>>>>>>> gluster v heal glustershard full
>>>>>>>>>>>>
>>>>>>>>>>>> gluster shards get added to the list and the heal commences. Logs show a
>>>>>>>>>>>> full sweep starting on all 3 nodes, though this time it only shows as
>>>>>>>>>>>> finishing on one, which looks to be the one that had its brick deleted.
>>>>>>>>>>>>
>>>>>>>>>>>> [2016-08-30 14:45:33.098589] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>>>>>>> glustershard-client-0
>>>>>>>>>>>> [2016-08-30 14:45:33.099492] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>>>>>>> glustershard-client-1
>>>>>>>>>>>> [2016-08-30 14:45:33.100093] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>>>>>>> glustershard-client-2
>>>>>>>>>>>> [2016-08-30 14:52:29.760213] I [MSGID: 108026]
>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol
>>>>>>>>>>>> glustershard-client-2
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Just realized it's still healing, so that may be why the sweeps on the 2
>>>>>>>>>>> other bricks haven't reported as finished.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> My hope is that later tonight a full heal will work on
>>>>>>>>>>>> production. Is it possible the self-heal daemon can get stale or stop
>>>>>>>>>>>> listening but still show as active? Would stopping and starting the
>>>>>>>>>>>> self-heal daemon from the gluster CLI before doing these heals be helpful?
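>>>>>>>>>>>>
>>>>>>>>>>>> (I was thinking of something along the lines of gluster v set
>>>>>>>>>>>> <VOLNAME> cluster.self-heal-daemon off followed by turning it back
>>>>>>>>>>>> on, or just another gluster v start <VOLNAME> force to respawn it --
>>>>>>>>>>>> not sure which, if either, is the right way.)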
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Aug 30, 2016 at 9:29 AM, David Gossage <
>>>>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 8:52 AM, David Gossage <
>>>>>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <
>>>>>>>>>>>>>> kdhananj at redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <
>>>>>>>>>>>>>>> kdhananj at redhat.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage <
>>>>>>>>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <
>>>>>>>>>>>>>>>>> kdhananj at redhat.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Could you also share the glustershd logs?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'll get them when I get to work sure
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I tried the same steps that you mentioned multiple times,
>>>>>>>>>>>>>>>>>> but heal is running to completion without any issues.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It must be said that 'heal full' traverses the files and
>>>>>>>>>>>>>>>>>> directories in a depth-first order and does heals also in the same order.
>>>>>>>>>>>>>>>>>> But if it gets interrupted in the middle (say because self-heal-daemon was
>>>>>>>>>>>>>>>>>> either intentionally or unintentionally brought offline and then brought
>>>>>>>>>>>>>>>>>> back up), self-heal will only pick up the entries that are so far marked as
>>>>>>>>>>>>>>>>>> new-entries that need heal which it will find in indices/xattrop directory.
>>>>>>>>>>>>>>>>>> What this means is that those files and directories that were not visited
>>>>>>>>>>>>>>>>>> during the crawl, will remain untouched and unhealed in this second
>>>>>>>>>>>>>>>>>> iteration of heal, unless you execute a 'heal-full' again.
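>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> (A rough way to see what is still pending is to count the index
>>>>>>>>>>>>>>>>>> entries on a source brick, for example:
>>>>>>>>>>>>>>>>>> ls /gluster2/brick2/1/.glusterfs/indices/xattrop | wc -l
>>>>>>>>>>>>>>>>>> -- that number should more or less track the heal-count output.)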
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So should it start healing shards as it crawls, or not
>>>>>>>>>>>>>>>>> until after it crawls the entire .shard directory? At the pace it was
>>>>>>>>>>>>>>>>> going that could be a week, with one node appearing in the cluster but with
>>>>>>>>>>>>>>>>> no shard files if anything tries to access a file on that node. From my
>>>>>>>>>>>>>>>>> experience the other day, telling it to heal full again did nothing
>>>>>>>>>>>>>>>>> regardless of the node used.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Crawl is started from '/' of the volume. Whenever self-heal
>>>>>>>>>>>>>>> detects during the crawl that a file or directory is present in some
>>>>>>>>>>>>>>> brick(s) and absent in others, it creates the file on the bricks where it
>>>>>>>>>>>>>>> is absent and marks the fact that the file or directory might need
>>>>>>>>>>>>>>> data/entry and metadata heal too (this also means that an index is created
>>>>>>>>>>>>>>> under .glusterfs/indices/xattrop of the src bricks). And the data/entry and
>>>>>>>>>>>>>>> metadata heal are picked up and done in the background with the
>>>>>>>>>>>>>>> help of these indices.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Looking at my 3rd node as an example, I find nearly the exact same
>>>>>>>>>>>>>> number of files in the xattrop dir as reported by heal count at the time I
>>>>>>>>>>>>>> brought down node2 to try and alleviate the read IO errors that seemed to
>>>>>>>>>>>>>> occur from what I was guessing were attempts to use the node with no shards
>>>>>>>>>>>>>> for reads.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also attached are the glustershd logs from the 3 nodes, along
>>>>>>>>>>>>>> with the test node I tried yesterday with the same results.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looking at my own logs I notice that a full sweep was only
>>>>>>>>>>>>> ever recorded in glustershd.log on 2nd node with missing directory. I
>>>>>>>>>>>>> believe I should have found a sweep begun on every node correct?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On my test dev when it did work I do see that
>>>>>>>>>>>>>
>>>>>>>>>>>>> [2016-08-30 13:56:25.223333] I [MSGID: 108026]
>>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>>>>>>>> glustershard-client-0
>>>>>>>>>>>>> [2016-08-30 13:56:25.223522] I [MSGID: 108026]
>>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>>>>>>>> glustershard-client-1
>>>>>>>>>>>>> [2016-08-30 13:56:25.224616] I [MSGID: 108026]
>>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol
>>>>>>>>>>>>> glustershard-client-2
>>>>>>>>>>>>> [2016-08-30 14:18:48.333740] I [MSGID: 108026]
>>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol
>>>>>>>>>>>>> glustershard-client-2
>>>>>>>>>>>>> [2016-08-30 14:18:48.356008] I [MSGID: 108026]
>>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol
>>>>>>>>>>>>> glustershard-client-1
>>>>>>>>>>>>> [2016-08-30 14:18:49.637811] I [MSGID: 108026]
>>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol
>>>>>>>>>>>>> glustershard-client-0
>>>>>>>>>>>>>
>>>>>>>>>>>>> While looking at the past few days on the 3 prod nodes, I only
>>>>>>>>>>>>> found that on my 2nd node
>>>>>>>>>>>>> [2016-08-27 01:26:42.638772] I [MSGID: 108026]
>>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>>> [2016-08-27 11:37:01.732366] I [MSGID: 108026]
>>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>>> [2016-08-27 12:58:34.597228] I [MSGID: 108026]
>>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>>> [2016-08-27 12:59:28.041173] I [MSGID: 108026]
>>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>>> [2016-08-27 20:03:42.560188] I [MSGID: 108026]
>>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>>> [2016-08-27 20:03:44.278274] I [MSGID: 108026]
>>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>>> [2016-08-27 21:00:42.603315] I [MSGID: 108026]
>>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer]
>>>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>>> [2016-08-27 21:00:46.148674] I [MSGID: 108026]
>>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer]
>>>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My suspicion is that this is what happened on your setup.
>>>>>>>>>>>>>>>>>> Could you confirm if that was the case?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Brick was brought online with force start then a full heal
>>>>>>>>>>>>>>>>> launched. Hours later after it became evident that it was not adding new
>>>>>>>>>>>>>>>>> files to heal I did try restarting self-heal daemon and relaunching full
>>>>>>>>>>>>>>>>> heal again. But this was after the heal had basically already failed to
>>>>>>>>>>>>>>>>> work as intended.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> OK. How did you figure it was not adding any new files? I
>>>>>>>>>>>>>>>> need to know what places you were monitoring to come to this conclusion.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Krutika
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As for those logs, I did manage to do something that
>>>>>>>>>>>>>>>>>> caused these warning messages you shared earlier to appear in my client and
>>>>>>>>>>>>>>>>>> server logs.
>>>>>>>>>>>>>>>>>> Although these logs are annoying and a bit scary too,
>>>>>>>>>>>>>>>>>> they didn't do any harm to the data in my volume. Why they appear just
>>>>>>>>>>>>>>>>>> after a brick is replaced and under no other circumstances is something I'm
>>>>>>>>>>>>>>>>>> still investigating.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> But for future, it would be good to follow the steps
>>>>>>>>>>>>>>>>>> Anuradha gave as that would allow self-heal to at least detect that it has
>>>>>>>>>>>>>>>>>> some repairing to do whenever it is restarted whether intentionally or
>>>>>>>>>>>>>>>>>> otherwise.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I followed those steps as described on my test box and
>>>>>>>>>>>>>>>>> ended up with the exact same outcome of adding shards at an agonizingly
>>>>>>>>>>>>>>>>> slow pace and no creation of the .shard directory or heals on the shard
>>>>>>>>>>>>>>>>> directory. Directories visible from the mount healed quickly. This was with
>>>>>>>>>>>>>>>>> one VM so it has only 800 shards as well. After hours at work it had added a
>>>>>>>>>>>>>>>>> total of 33 shards to be healed. I sent those logs yesterday as well, though
>>>>>>>>>>>>>>>>> not the glustershd.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Does replace-brick command copy files in same manner? For
>>>>>>>>>>>>>>>>> these purposes I am contemplating just skipping the heal route.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Krutika
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage <
>>>>>>>>>>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> attached brick and client logs from test machine where
>>>>>>>>>>>>>>>>>>> same behavior occurred not sure if anything new is there. its still on
>>>>>>>>>>>>>>>>>>> 3.8.2
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>>>>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>>>>>>>>>> Bricks:
>>>>>>>>>>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>>>>>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>>>>>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>>>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>>>>>>>>> cluster.locking-scheme: granular
>>>>>>>>>>>>>>>>>>> performance.strict-o-direct: off
>>>>>>>>>>>>>>>>>>> features.shard-block-size: 64MB
>>>>>>>>>>>>>>>>>>> features.shard: on
>>>>>>>>>>>>>>>>>>> server.allow-insecure: on
>>>>>>>>>>>>>>>>>>> storage.owner-uid: 36
>>>>>>>>>>>>>>>>>>> storage.owner-gid: 36
>>>>>>>>>>>>>>>>>>> cluster.server-quorum-type: server
>>>>>>>>>>>>>>>>>>> cluster.quorum-type: auto
>>>>>>>>>>>>>>>>>>> network.remote-dio: on
>>>>>>>>>>>>>>>>>>> cluster.eager-lock: enable
>>>>>>>>>>>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>>>>>>>>>>>> performance.io-cache: off
>>>>>>>>>>>>>>>>>>> performance.quick-read: off
>>>>>>>>>>>>>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>>>>>>>>>>>>>> cluster.background-self-heal-count: 16
>>>>>>>>>>>>>>>>>>> nfs.enable-ino32: off
>>>>>>>>>>>>>>>>>>> nfs.addr-namelookup: off
>>>>>>>>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>>>>>>>>> performance.read-ahead: off
>>>>>>>>>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>>>>>>>>>> cluster.granular-entry-heal: on
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage <
>>>>>>>>>>>>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <
>>>>>>>>>>>>>>>>>>>> atalur at redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>>>>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>>>>>>>>>>>>>>>> > To: "Anuradha Talur" <atalur at redhat.com>
>>>>>>>>>>>>>>>>>>>>> > Cc: "gluster-users at gluster.org List" <
>>>>>>>>>>>>>>>>>>>>> Gluster-users at gluster.org>, "Krutika Dhananjay" <
>>>>>>>>>>>>>>>>>>>>> kdhananj at redhat.com>
>>>>>>>>>>>>>>>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM
>>>>>>>>>>>>>>>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing
>>>>>>>>>>>>>>>>>>>>> Glacier Slow
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <
>>>>>>>>>>>>>>>>>>>>> atalur at redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > > Response inline.
>>>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>>>> > > ----- Original Message -----
>>>>>>>>>>>>>>>>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>>>>>>>>>>>>>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > > > Cc: "gluster-users at gluster.org List" <
>>>>>>>>>>>>>>>>>>>>> Gluster-users at gluster.org>
>>>>>>>>>>>>>>>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM
>>>>>>>>>>>>>>>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards
>>>>>>>>>>>>>>>>>>>>> Healing Glacier Slow
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > > Could you attach both client and brick logs?
>>>>>>>>>>>>>>>>>>>>> Meanwhile I will try these
>>>>>>>>>>>>>>>>>>>>> > > steps
>>>>>>>>>>>>>>>>>>>>> > > > out on my machines and see if it is easily
>>>>>>>>>>>>>>>>>>>>> recreatable.
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > > -Krutika
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage <
>>>>>>>>>>>>>>>>>>>>> > > dgossage at carouselchecks.com
>>>>>>>>>>>>>>>>>>>>> > > > > wrote:
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > > Centos 7 Gluster 3.8.3
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>>>>>>>> > > > Options Reconfigured:
>>>>>>>>>>>>>>>>>>>>> > > > cluster.data-self-heal-algorithm: full
>>>>>>>>>>>>>>>>>>>>> > > > cluster.self-heal-daemon: on
>>>>>>>>>>>>>>>>>>>>> > > > cluster.locking-scheme: granular
>>>>>>>>>>>>>>>>>>>>> > > > features.shard-block-size: 64MB
>>>>>>>>>>>>>>>>>>>>> > > > features.shard: on
>>>>>>>>>>>>>>>>>>>>> > > > performance.readdir-ahead: on
>>>>>>>>>>>>>>>>>>>>> > > > storage.owner-uid: 36
>>>>>>>>>>>>>>>>>>>>> > > > storage.owner-gid: 36
>>>>>>>>>>>>>>>>>>>>> > > > performance.quick-read: off
>>>>>>>>>>>>>>>>>>>>> > > > performance.read-ahead: off
>>>>>>>>>>>>>>>>>>>>> > > > performance.io-cache: off
>>>>>>>>>>>>>>>>>>>>> > > > performance.stat-prefetch: on
>>>>>>>>>>>>>>>>>>>>> > > > cluster.eager-lock: enable
>>>>>>>>>>>>>>>>>>>>> > > > network.remote-dio: enable
>>>>>>>>>>>>>>>>>>>>> > > > cluster.quorum-type: auto
>>>>>>>>>>>>>>>>>>>>> > > > cluster.server-quorum-type: server
>>>>>>>>>>>>>>>>>>>>> > > > server.allow-insecure: on
>>>>>>>>>>>>>>>>>>>>> > > > cluster.self-heal-window-size: 1024
>>>>>>>>>>>>>>>>>>>>> > > > cluster.background-self-heal-count: 16
>>>>>>>>>>>>>>>>>>>>> > > > performance.strict-write-ordering: off
>>>>>>>>>>>>>>>>>>>>> > > > nfs.disable: on
>>>>>>>>>>>>>>>>>>>>> > > > nfs.addr-namelookup: off
>>>>>>>>>>>>>>>>>>>>> > > > nfs.enable-ino32: off
>>>>>>>>>>>>>>>>>>>>> > > > cluster.granular-entry-heal: on
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no
>>>>>>>>>>>>>>>>>>>>> issues.
>>>>>>>>>>>>>>>>>>>>> > > > Following steps detailed in previous
>>>>>>>>>>>>>>>>>>>>> recommendations began process of
>>>>>>>>>>>>>>>>>>>>> > > > replacing and healing bricks one node at a time.
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > > 1) kill pid of brick
>>>>>>>>>>>>>>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10
>>>>>>>>>>>>>>>>>>>>> > > > 3) recreate directory of brick
>>>>>>>>>>>>>>>>>>>>> > > > 4) gluster volume start <> force
>>>>>>>>>>>>>>>>>>>>> > > > 5) gluster volume heal <> full
>>>>>>>>>>>>>>>>>>>>> > > Hi,
>>>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>>>> > > I'd suggest that full heal is not used. There are
>>>>>>>>>>>>>>>>>>>>> a few bugs in full heal.
>>>>>>>>>>>>>>>>>>>>> > > Better safe than sorry ;)
>>>>>>>>>>>>>>>>>>>>> > > Instead I'd suggest the following steps:
>>>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>>>> > > Currently I brought the node down by systemctl
>>>>>>>>>>>>>>>>>>>>> stop glusterd as I was
>>>>>>>>>>>>>>>>>>>>> > getting sporadic io issues and a few VM's paused so
>>>>>>>>>>>>>>>>>>>>> hoping that will help.
>>>>>>>>>>>>>>>>>>>>> > I may wait to do this till around 4PM when most work
>>>>>>>>>>>>>>>>>>>>> is done in case it
>>>>>>>>>>>>>>>>>>>>> > shoots load up.
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > > 1) kill pid of brick
>>>>>>>>>>>>>>>>>>>>> > > 2) to configuring of brick that you need
>>>>>>>>>>>>>>>>>>>>> > > 3) recreate brick dir
>>>>>>>>>>>>>>>>>>>>> > > 4) while the brick is still down, from the mount
>>>>>>>>>>>>>>>>>>>>> point:
>>>>>>>>>>>>>>>>>>>>> > > a) create a dummy non existent dir under / of
>>>>>>>>>>>>>>>>>>>>> mount.
>>>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > so if node 2 is the down brick, pick a node, for example 3,
>>>>>>>>>>>>>>>>>>>>> and make a test dir
>>>>>>>>>>>>>>>>>>>>> > under its brick directory that doesn't exist on 2, or
>>>>>>>>>>>>>>>>>>>>> should I be doing this
>>>>>>>>>>>>>>>>>>>>> > over a gluster mount?
>>>>>>>>>>>>>>>>>>>>> You should be doing this over gluster mount.
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > > b) set a non existent extended attribute on /
>>>>>>>>>>>>>>>>>>>>> of mount.
>>>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > Could you give me an example of an attribute to
>>>>>>>>>>>>>>>>>>>>> set? I've read a tad on
>>>>>>>>>>>>>>>>>>>>> > this, and looked up attributes but haven't set any
>>>>>>>>>>>>>>>>>>>>> yet myself.
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value"
>>>>>>>>>>>>>>>>>>>>> <path-to-mount>
>>>>>>>>>>>>>>>>>>>>> > Doing these steps will ensure that heal happens only
>>>>>>>>>>>>>>>>>>>>> from updated brick to
>>>>>>>>>>>>>>>>>>>>> > > down brick.
>>>>>>>>>>>>>>>>>>>>> > > 5) gluster v start <> force
>>>>>>>>>>>>>>>>>>>>> > > 6) gluster v heal <>
>>>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> > Will it matter if somewhere in gluster the full heal
>>>>>>>>>>>>>>>>>>>>> command was run other
>>>>>>>>>>>>>>>>>>>>> > day? Not sure if it eventually stops or times out.
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>> full heal will stop once the crawl is done. So if you
>>>>>>>>>>>>>>>>>>>>> want to trigger heal again,
>>>>>>>>>>>>>>>>>>>>> run gluster v heal <>. Actually even brick up or
>>>>>>>>>>>>>>>>>>>>> volume start force should
>>>>>>>>>>>>>>>>>>>>> trigger the heal.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Did this on the test bed today. It's one server with 3
>>>>>>>>>>>>>>>>>>>> bricks on the same machine, so take that for what it's worth. Also it still
>>>>>>>>>>>>>>>>>>>> runs 3.8.2. Maybe I'll update and re-run the test.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> killed brick
>>>>>>>>>>>>>>>>>>>> deleted brick dir
>>>>>>>>>>>>>>>>>>>> recreated brick dir
>>>>>>>>>>>>>>>>>>>> created fake dir on gluster mount
>>>>>>>>>>>>>>>>>>>> set suggested fake attribute on it
>>>>>>>>>>>>>>>>>>>> ran volume start <> force
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> looked at the files it said needed healing and it was just
>>>>>>>>>>>>>>>>>>>> 8 shards that were modified during the few minutes I ran through the steps
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> gave it a few minutes and it stayed the same
>>>>>>>>>>>>>>>>>>>> ran gluster volume heal <>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> it healed all the directories and files you can see
>>>>>>>>>>>>>>>>>>>> over mount including fakedir.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> same issue for shards though. It adds more shards to
>>>>>>>>>>>>>>>>>>>> heal at a glacial pace. Slight jump in speed if I stat every file and dir in
>>>>>>>>>>>>>>>>>>>> the running VM, but not all shards.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It started with 8 shards to heal and is now only at 33
>>>>>>>>>>>>>>>>>>>> out of 800, and probably won't finish adding for a few days at the rate it's going.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>>>> > > > 1st node worked as expected took 12 hours to
>>>>>>>>>>>>>>>>>>>>> heal 1TB data. Load was
>>>>>>>>>>>>>>>>>>>>> > > little
>>>>>>>>>>>>>>>>>>>>> > > > heavy but nothing shocking.
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > > About an hour after node 1 finished I began same
>>>>>>>>>>>>>>>>>>>>> process on node2. Heal
>>>>>>>>>>>>>>>>>>>>> > > > proces kicked in as before and the files in
>>>>>>>>>>>>>>>>>>>>> directories visible from
>>>>>>>>>>>>>>>>>>>>> > > mount
>>>>>>>>>>>>>>>>>>>>> > > > and .glusterfs healed in short time. Then it
>>>>>>>>>>>>>>>>>>>>> began crawl of .shard adding
>>>>>>>>>>>>>>>>>>>>> > > > those files to heal count at which point the
>>>>>>>>>>>>>>>>>>>>> entire proces ground to a
>>>>>>>>>>>>>>>>>>>>> > > halt
>>>>>>>>>>>>>>>>>>>>> > > > basically. After 48 hours out of 19k shards it
>>>>>>>>>>>>>>>>>>>>> has added 5900 to heal
>>>>>>>>>>>>>>>>>>>>> > > list.
>>>>>>>>>>>>>>>>>>>>> > > > Load on all 3 machnes is negligible. It was
>>>>>>>>>>>>>>>>>>>>> suggested to change this
>>>>>>>>>>>>>>>>>>>>> > > value
>>>>>>>>>>>>>>>>>>>>> > > > to full cluster.data-self-heal-algorithm and
>>>>>>>>>>>>>>>>>>>>> restart volume which I
>>>>>>>>>>>>>>>>>>>>> > > did. No
>>>>>>>>>>>>>>>>>>>>> > > > efffect. Tried relaunching heal no effect,
>>>>>>>>>>>>>>>>>>>>> despite any node picked. I
>>>>>>>>>>>>>>>>>>>>> > > > started each VM and performed a stat of all
>>>>>>>>>>>>>>>>>>>>> files from within it, or a
>>>>>>>>>>>>>>>>>>>>> > > full
>>>>>>>>>>>>>>>>>>>>> > > > virus scan and that seemed to cause short small
>>>>>>>>>>>>>>>>>>>>> spikes in shards added,
>>>>>>>>>>>>>>>>>>>>> > > but
>>>>>>>>>>>>>>>>>>>>> > > > not by much. Logs are showing no real messages
>>>>>>>>>>>>>>>>>>>>> indicating anything is
>>>>>>>>>>>>>>>>>>>>> > > going
>>>>>>>>>>>>>>>>>>>>> > > > on. I get hits to brick log on occasion of null
>>>>>>>>>>>>>>>>>>>>> lookups making me think
>>>>>>>>>>>>>>>>>>>>> > > its
>>>>>>>>>>>>>>>>>>>>> > > > not really crawling shards directory but waiting
>>>>>>>>>>>>>>>>>>>>> for a shard lookup to
>>>>>>>>>>>>>>>>>>>>> > > add
>>>>>>>>>>>>>>>>>>>>> > > > it. I'll get following in brick log but not
>>>>>>>>>>>>>>>>>>>>> constant and sometime
>>>>>>>>>>>>>>>>>>>>> > > multiple
>>>>>>>>>>>>>>>>>>>>> > > > for same shard.
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009]
>>>>>>>>>>>>>>>>>>>>> > > > [server-resolve.c:569:server_resolve]
>>>>>>>>>>>>>>>>>>>>> 0-GLUSTER1-server: no resolution
>>>>>>>>>>>>>>>>>>>>> > > type
>>>>>>>>>>>>>>>>>>>>> > > > for (null) (LOOKUP)
>>>>>>>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050]
>>>>>>>>>>>>>>>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk]
>>>>>>>>>>>>>>>>>>>>> 0-GLUSTER1-server: 12591783:
>>>>>>>>>>>>>>>>>>>>> > > > LOOKUP (null) (00000000-0000-0000-00
>>>>>>>>>>>>>>>>>>>>> > > > 00-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221)
>>>>>>>>>>>>>>>>>>>>> ==> (Invalid
>>>>>>>>>>>>>>>>>>>>> > > > argument) [Invalid argument]
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > > This one repeated about 30 times in row then
>>>>>>>>>>>>>>>>>>>>> nothing for 10 minutes then
>>>>>>>>>>>>>>>>>>>>> > > one
>>>>>>>>>>>>>>>>>>>>> > > > hit for one different shard by itself.
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > > How can I determine if Heal is actually running?
>>>>>>>>>>>>>>>>>>>>> How can I kill it or
>>>>>>>>>>>>>>>>>>>>> > > force
>>>>>>>>>>>>>>>>>>>>> > > > restart? Does node I start it from determine
>>>>>>>>>>>>>>>>>>>>> which directory gets
>>>>>>>>>>>>>>>>>>>>> > > crawled to
>>>>>>>>>>>>>>>>>>>>> > > > determine heals?
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > > David Gossage
>>>>>>>>>>>>>>>>>>>>> > > > Carousel Checks Inc. | System Administrator
>>>>>>>>>>>>>>>>>>>>> > > > Office 708.613.2284
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > > _______________________________________________
>>>>>>>>>>>>>>>>>>>>> > > > Gluster-users mailing list
>>>>>>>>>>>>>>>>>>>>> > > > Gluster-users at gluster.org
>>>>>>>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman
>>>>>>>>>>>>>>>>>>>>> /listinfo/gluster-users
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>>>>>>> > > > _______________________________________________
>>>>>>>>>>>>>>>>>>>>> > > > Gluster-users mailing list
>>>>>>>>>>>>>>>>>>>>> > > > Gluster-users at gluster.org
>>>>>>>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman
>>>>>>>>>>>>>>>>>>>>> /listinfo/gluster-users
>>>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>>>> > > --
>>>>>>>>>>>>>>>>>>>>> > > Thanks,
>>>>>>>>>>>>>>>>>>>>> > > Anuradha.
>>>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Anuradha.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>