[Gluster-users] 3.8.3 Shards Healing Glacier Slow

David Gossage dgossage at carouselchecks.com
Tue Aug 30 13:52:58 UTC 2016


On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <kdhananj at redhat.com>
wrote:

>
>
> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <kdhananj at redhat.com>
> wrote:
>
>>
>>
>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage <
>> dgossage at carouselchecks.com> wrote:
>>
>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <kdhananj at redhat.com>
>>> wrote:
>>>
>>>> Could you also share the glustershd logs?
>>>>
>>>
>>> I'll get them when I get to work, sure.
>>>
>>
>>>
>>>>
>>>> I tried the same steps that you mentioned multiple times, but heal is
>>>> running to completion without any issues.
>>>>
>>>> It must be said that 'heal full' traverses the files and directories in
>>>> depth-first order and performs the heals in that same order. But if it
>>>> gets interrupted in the middle (say because the self-heal daemon was
>>>> intentionally or unintentionally brought offline and then brought back
>>>> up), self-heal will only pick up the entries that have so far been
>>>> marked as needing heal, which it finds in the indices/xattrop directory.
>>>> What this means is that the files and directories that were not visited
>>>> during the crawl will remain untouched and unhealed in this second
>>>> iteration of heal, unless you execute a 'heal full' again.
>>>>
>>>
>>> So should it start healing shards as it crawls, or not until after it has
>>> crawled the entire .shard directory?  At the pace it was going that could
>>> be a week, with one node appearing in the cluster but having no shard
>>> files if anything tries to access a file on that node.  From my experience
>>> the other day, telling it to heal full again did nothing regardless of
>>> which node it was run from.
>>>
>>
> Crawl is started from '/' of the volume. Whenever self-heal detects during
> the crawl that a file or directory is present in some brick(s) and absent
> in others, it creates the file on the bricks where it is absent and marks
> the fact that the file or directory might need data/entry and metadata heal
> too (this also means that an index is created under
> .glusterfs/indices/xattrop of the src bricks). And the data/entry and
> metadata heal are picked up and done in the background with the help of
> these indices.
>

Looking at my 3rd node as an example, I find nearly the exact same number of
files in the xattrop dir as was reported by the heal count at the time I
brought down node2 to try to alleviate the read IO errors, which I was
guessing came from attempts to use the node with no shards for reads.
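
In case it is useful to reproduce that comparison, something along these
lines would do it -- the commands are a sketch, with the brick path and
volume name taken from the volume info in this thread, so adjust them to
your layout:

    # count pending-heal index entries on the brick, ignoring the base
    # xattrop-* link file
    find /gluster1/BRICK1/1/.glusterfs/indices/xattrop -type f \
        ! -name 'xattrop-*' | wc -l

    # what the self-heal daemon itself reports
    gluster volume heal GLUSTER1 statistics heal-count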

Also attached are the glustershd logs from the 3 nodes, along with the log
from the test node I tried yesterday, with the same results.
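
(For anyone looking for these later: the glustershd log normally lives at
/var/log/glusterfs/glustershd.log on each node, assuming the stock log
location, so collecting them is no more than something like

    gzip -c /var/log/glusterfs/glustershd.log > glustershd-node2.gz

run on each node in turn.)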

>
>
>>>
>>>> My suspicion is that this is what happened on your setup. Could you
>>>> confirm if that was the case?
>>>>
>>>
>>> The brick was brought online with a force start, then a full heal was
>>> launched.  Hours later, after it became evident that it was not adding
>>> new files to heal, I did try restarting the self-heal daemon and
>>> relaunching the full heal again.  But this was after the heal had
>>> basically already failed to work as intended.
>>>
>>
>> OK. How did you figure it was not adding any new files? I need to know
>> what places you were monitoring to come to this conclusion.
>>
>> -Krutika
>>
>>
>>>
>>>
>>>> As for those logs, I did manage to do something that caused these
>>>> warning messages you shared earlier to appear in my client and server logs.
>>>> Although these logs are annoying and a bit scary too, they didn't do
>>>> any harm to the data in my volume. Why they appear just after a brick is
>>>> replaced and under no other circumstances is something I'm still
>>>> investigating.
>>>>
>>>> But for the future, it would be good to follow the steps Anuradha gave as
>>>> that would allow self-heal to at least detect that it has some repairing to
>>>> do whenever it is restarted whether intentionally or otherwise.
>>>>
>>>
>>> I followed those steps as described on my test box and ended up with the
>>> exact same outcome: shards being added at an agonizingly slow pace, and no
>>> creation of the .shard directory or heals on the shard directory.
>>> Directories visible from the mount healed quickly.  This was with one VM,
>>> so it has only 800 shards as well.  After hours at work it had added a
>>> total of 33 shards to be healed.  I sent those logs yesterday as well,
>>> though not the glustershd logs.
>>>
>>> Does the replace-brick command copy files in the same manner?  For these
>>> purposes I am contemplating just skipping the heal route.
>>>
>>>
>>>> -Krutika
>>>>
>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage <
>>>> dgossage at carouselchecks.com> wrote:
>>>>
>>>>> Attached are the brick and client logs from the test machine where the
>>>>> same behavior occurred; not sure if anything new is there.  It's still
>>>>> on 3.8.2.
>>>>>
>>>>> Number of Bricks: 1 x 3 = 3
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>> Options Reconfigured:
>>>>> cluster.locking-scheme: granular
>>>>> performance.strict-o-direct: off
>>>>> features.shard-block-size: 64MB
>>>>> features.shard: on
>>>>> server.allow-insecure: on
>>>>> storage.owner-uid: 36
>>>>> storage.owner-gid: 36
>>>>> cluster.server-quorum-type: server
>>>>> cluster.quorum-type: auto
>>>>> network.remote-dio: on
>>>>> cluster.eager-lock: enable
>>>>> performance.stat-prefetch: off
>>>>> performance.io-cache: off
>>>>> performance.quick-read: off
>>>>> cluster.self-heal-window-size: 1024
>>>>> cluster.background-self-heal-count: 16
>>>>> nfs.enable-ino32: off
>>>>> nfs.addr-namelookup: off
>>>>> nfs.disable: on
>>>>> performance.read-ahead: off
>>>>> performance.readdir-ahead: on
>>>>> cluster.granular-entry-heal: on
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage <
>>>>> dgossage at carouselchecks.com> wrote:
>>>>>
>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <atalur at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>> > To: "Anuradha Talur" <atalur at redhat.com>
>>>>>>> > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>,
>>>>>>> "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM
>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>> >
>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <atalur at redhat.com>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > > Response inline.
>>>>>>> > >
>>>>>>> > > ----- Original Message -----
>>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>> > > > Cc: "gluster-users at gluster.org List" <
>>>>>>> Gluster-users at gluster.org>
>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM
>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>> > > >
>>>>>>> > > > Could you attach both client and brick logs? Meanwhile I will
>>>>>>> try these
>>>>>>> > > steps
>>>>>>> > > > out on my machines and see if it is easily recreatable.
>>>>>>> > > >
>>>>>>> > > > -Krutika
>>>>>>> > > >
>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage <
>>>>>>> > > dgossage at carouselchecks.com
>>>>>>> > > > > wrote:
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > > Centos 7 Gluster 3.8.3
>>>>>>> > > >
>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>>>>>>> > > > Options Reconfigured:
>>>>>>> > > > cluster.data-self-heal-algorithm: full
>>>>>>> > > > cluster.self-heal-daemon: on
>>>>>>> > > > cluster.locking-scheme: granular
>>>>>>> > > > features.shard-block-size: 64MB
>>>>>>> > > > features.shard: on
>>>>>>> > > > performance.readdir-ahead: on
>>>>>>> > > > storage.owner-uid: 36
>>>>>>> > > > storage.owner-gid: 36
>>>>>>> > > > performance.quick-read: off
>>>>>>> > > > performance.read-ahead: off
>>>>>>> > > > performance.io-cache: off
>>>>>>> > > > performance.stat-prefetch: on
>>>>>>> > > > cluster.eager-lock: enable
>>>>>>> > > > network.remote-dio: enable
>>>>>>> > > > cluster.quorum-type: auto
>>>>>>> > > > cluster.server-quorum-type: server
>>>>>>> > > > server.allow-insecure: on
>>>>>>> > > > cluster.self-heal-window-size: 1024
>>>>>>> > > > cluster.background-self-heal-count: 16
>>>>>>> > > > performance.strict-write-ordering: off
>>>>>>> > > > nfs.disable: on
>>>>>>> > > > nfs.addr-namelookup: off
>>>>>>> > > > nfs.enable-ino32: off
>>>>>>> > > > cluster.granular-entry-heal: on
>>>>>>> > > >
>>>>>>> > > > On Friday I did a rolling upgrade from 3.8.3->3.8.3 with no
>>>>>>> > > > issues.  Following the steps detailed in previous
>>>>>>> > > > recommendations, I began the process of replacing and healing
>>>>>>> > > > bricks one node at a time.
>>>>>>> > > >
>>>>>>> > > > 1) kill pid of brick
>>>>>>> > > > 2) reconfigure brick from raid6 to raid10
>>>>>>> > > > 3) recreate directory of brick
>>>>>>> > > > 4) gluster volume start <> force
>>>>>>> > > > 5) gluster volume heal <> full
>>>>>>> > > Hi,
>>>>>>> > >
>>>>>>> > > I'd suggest that full heal is not used. There are a few bugs in
>>>>>>> > > full heal. Better safe than sorry ;)
>>>>>>> > > Instead I'd suggest the following steps:
>>>>>>> > >
>>>>>>> > Currently I brought the node down with systemctl stop glusterd, as
>>>>>>> > I was getting sporadic IO issues and a few VMs paused, so I'm hoping
>>>>>>> > that will help.  I may wait to do this till around 4PM when most
>>>>>>> > work is done, in case it shoots the load up.
>>>>>>> >
>>>>>>> >
>>>>>>> > > 1) kill pid of brick
>>>>>>> > > 2) do the reconfiguration of the brick that you need
>>>>>>> > > 3) recreate brick dir
>>>>>>> > > 4) while the brick is still down, from the mount point:
>>>>>>> > >    a) create a dummy non-existent dir under / of mount.
>>>>>>> > >
>>>>>>> >
>>>>>>> > So if node 2 has the down brick, do I pick another node, for
>>>>>>> > example node 3, and make a test dir under its brick directory that
>>>>>>> > doesn't exist on 2, or should I be doing this over a gluster mount?
>>>>>>> You should be doing this over the gluster mount.
>>>>>>> >
>>>>>>> > >    b) set a non-existent extended attribute on / of mount.
>>>>>>> > >
>>>>>>> >
>>>>>>> > Could you give me an example of an attribute to set?  I've read a
>>>>>>> > tad on this, and looked up attributes, but haven't set any yet
>>>>>>> > myself.
>>>>>>> >
>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" <path-to-mount>
>>>>>>> > > Doing these steps will ensure that heal happens only from the
>>>>>>> > > updated brick to the down brick.
>>>>>>> > > 5) gluster v start <> force
>>>>>>> > > 6) gluster v heal <>
>>>>>>> > >
>>>>>>> >
>>>>>>> > Will it matter that the full heal command was run somewhere in
>>>>>>> > gluster the other day?  Not sure if it eventually stops or times
>>>>>>> > out.
>>>>>>> >
>>>>>>> Full heal will stop once the crawl is done. So if you want to trigger
>>>>>>> heal again, run gluster v heal <>. Actually, even bringing the brick
>>>>>>> up or volume start force should trigger the heal.
>>>>>>>
>>>>>>
>>>>>> Did this on the test bed today.  It's one server with 3 bricks on the
>>>>>> same machine, so take that for what it's worth.  Also, it still runs
>>>>>> 3.8.2.  Maybe I'll update and re-run the test.
>>>>>>
>>>>>> killed brick
>>>>>> deleted brick dir
>>>>>> recreated brick dir
>>>>>> created fake dir on gluster mount
>>>>>> set suggested fake attribute on it
>>>>>> ran volume start <> force
>>>>>>
>>>>>> Looked at the files it said needed healing and it was just the 8
>>>>>> shards that were modified during the few minutes I ran through the
>>>>>> steps.
>>>>>>
>>>>>> Gave it a few minutes and it stayed the same.
>>>>>> Ran gluster volume heal <>.
>>>>>>
>>>>>> It healed all the directories and files you can see over the mount,
>>>>>> including the fake dir.
>>>>>>
>>>>>> Same issue for the shards though.  It adds more shards to heal at a
>>>>>> glacial pace.  There's a slight jump in speed if I stat every file and
>>>>>> dir in the running VM, but still not all shards.
>>>>>>
>>>>>> It started with 8 shards to heal and is now only at 33 out of 800, and
>>>>>> it probably won't finish adding them for a few days at the rate it's
>>>>>> going.
>>>>>>
>>>>>>
>>>>>>
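
For clarity, the sequence above corresponds to commands roughly like these
(volume name, mount point, and brick path are placeholders rather than the
actual test-box values):

    # PID taken from 'gluster volume status <testvol>'
    kill <pid-of-downed-brick>

    # wipe and recreate the brick directory (one of the /gluster2/brickN/1
    # paths listed above, whichever brick was taken down)
    rm -rf <brick-dir>
    mkdir -p <brick-dir>

    # from a fuse mount of the volume
    mkdir /mnt/<testvol>/fakedir
    setfattr -n "user.some-name" -v "some-value" /mnt/<testvol>

    gluster volume start <testvol> force
    gluster volume heal <testvol>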
>>>>>>> > >
>>>>>>> > > > The 1st node worked as expected; it took 12 hours to heal 1TB
>>>>>>> > > > of data.  Load was a little heavy but nothing shocking.
>>>>>>> > > >
>>>>>>> > > > About an hour after node 1 finished, I began the same process
>>>>>>> > > > on node 2. The heal process kicked in as before and the files
>>>>>>> > > > in the directories visible from the mount and in .glusterfs
>>>>>>> > > > healed in a short time. Then it began the crawl of .shard,
>>>>>>> > > > adding those files to the heal count, at which point the entire
>>>>>>> > > > process basically ground to a halt. After 48 hours, out of 19k
>>>>>>> > > > shards it has added 5900 to the heal list. Load on all 3
>>>>>>> > > > machines is negligible. It was suggested to change
>>>>>>> > > > cluster.data-self-heal-algorithm to full and restart the
>>>>>>> > > > volume, which I did. No effect. Tried relaunching the heal, no
>>>>>>> > > > effect, regardless of which node was picked. I started each VM
>>>>>>> > > > and performed a stat of all files from within it, or a full
>>>>>>> > > > virus scan, and that seemed to cause short small spikes in
>>>>>>> > > > shards added, but not by much. The logs are showing no real
>>>>>>> > > > messages indicating anything is going on. I get occasional hits
>>>>>>> > > > in the brick log for null lookups, making me think it's not
>>>>>>> > > > really crawling the .shard directory but waiting for a shard
>>>>>>> > > > lookup in order to add it. I'll get the following in the brick
>>>>>>> > > > log, but not constantly, and sometimes multiple times for the
>>>>>>> > > > same shard.
>>>>>>> > > >
>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009]
>>>>>>> > > > [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no
>>>>>>> > > > resolution type for (null) (LOOKUP)
>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050]
>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server:
>>>>>>> > > > 12591783: LOOKUP (null)
>>>>>>> > > > (00000000-0000-0000-0000-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221)
>>>>>>> > > > ==> (Invalid argument) [Invalid argument]
>>>>>>> > > >
>>>>>>> > > > This one repeated about 30 times in a row, then nothing for 10
>>>>>>> > > > minutes, then one hit for a different shard by itself.
>>>>>>> > > >
>>>>>>> > > > How can I determine if the heal is actually running?  How can I
>>>>>>> > > > kill it or force a restart?  Does the node I start it from
>>>>>>> > > > determine which directory gets crawled to determine the heals?
>>>>>>> > > >
>>>>>>> > > > David Gossage
>>>>>>> > > > Carousel Checks Inc. | System Administrator
>>>>>>> > > > Office 708.613.2284
>>>>>>> > > >
>>>>>>> > > > _______________________________________________
>>>>>>> > > > Gluster-users mailing list
>>>>>>> > > > Gluster-users at gluster.org
>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > > _______________________________________________
>>>>>>> > > > Gluster-users mailing list
>>>>>>> > > > Gluster-users at gluster.org
>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>> > >
>>>>>>> > > --
>>>>>>> > > Thanks,
>>>>>>> > > Anuradha.
>>>>>>> > >
>>>>>>> >
>>>>>>>
>>>>>>> --
>>>>>>> Thanks,
>>>>>>> Anuradha.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
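
For reference, the standard ways I know of to check whether a heal is
actually in progress (plain gluster CLI, volume name as above) are roughly:

    # self-heal daemon should show as online on every node
    gluster volume status GLUSTER1

    # per-brick crawl start/end times and entry counts
    gluster volume heal GLUSTER1 statistics

    # entries currently pending heal
    gluster volume heal GLUSTER1 info

plus watching the glustershd logs attached below.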
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glustershd-node1
Type: application/octet-stream
Size: 322716 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/8ba8b529/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glustershd-node2.gz
Type: application/x-gzip
Size: 645489 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/8ba8b529/attachment-0002.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glustershd-node3.gz
Type: application/x-gzip
Size: 296635 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/8ba8b529/attachment-0003.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glustershd-testnode
Type: application/octet-stream
Size: 20910 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160830/8ba8b529/attachment-0003.obj>

