[Gluster-users] 3.8.3 Shards Healing Glacier Slow

Krutika Dhananjay kdhananj at redhat.com
Tue Aug 30 12:50:02 UTC 2016


On Tue, Aug 30, 2016 at 6:07 PM, David Gossage <dgossage at carouselchecks.com>
wrote:

> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <kdhananj at redhat.com>
> wrote:
>
>> Could you also share the glustershd logs?
>>
>
> Sure, I'll get them when I get to work.
>
>
>>
>> I tried the same steps that you mentioned multiple times, but heal runs
>> to completion without any issues each time.
>>
>> It must be said that 'heal full' traverses files and directories in
>> depth-first order and performs heals in that same order. But if it is
>> interrupted midway (say, because the self-heal daemon was brought
>> offline, intentionally or otherwise, and then brought back up), self-heal
>> will only pick up the entries already marked as new entries needing heal,
>> which it finds in the indices/xattrop directory. This means that files
>> and directories that were not visited during the crawl remain untouched
>> and unhealed in the second iteration of heal, unless you execute
>> 'heal full' again.
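>>
>> (For reference, a quick way to see what entries are currently queued is
>> to look at that index directly on a brick -- a sketch only, with the
>> brick path assumed from your volume info; each pending entry appears as
>> a gfid-named file:)
>>
>>   ls /gluster1/BRICK1/1/.glusterfs/indices/xattrop | wc -l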
>>
>
> So should it start healing shards as it crawls, or not until after it has
> crawled the entire .shard directory?  At the pace it was going that could
> take a week, with one node present in the cluster but holding no shard
> files if anything tries to access a file on that node.  From my experience
> the other day, running heal full again did nothing, regardless of which
> node it was launched from.
>
>
>> My suspicion is that this is what happened on your setup. Could you
>> confirm if that was the case?
>>
>
> The brick was brought online with a force start, then a full heal was
> launched.  Hours later, after it became evident that no new files were
> being added to the heal queue, I did try restarting the self-heal daemon
> and relaunching the full heal.  But that was after the heal had basically
> already failed to work as intended.
>

OK. How did you determine that it was not adding any new files? I need to
know which places you were monitoring to reach that conclusion.
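
For reference, a sketch of the usual places to watch (the volume name here
is taken from your brick logs):

  gluster volume heal GLUSTER1 info                   # entries pending heal
  gluster volume heal GLUSTER1 statistics heal-count  # per-brick counts

If you were watching something else (the glustershd log, for instance),
please mention it.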

-Krutika


>
>
>> As for those logs, I did manage to do something that caused the warning
>> messages you shared earlier to appear in my client and server logs.
>> Although these logs are annoying and a bit scary, they did no harm to the
>> data in my volume. Why they appear just after a brick is replaced, and
>> under no other circumstances, is something I'm still investigating.
>>
>> But for the future, it would be good to follow the steps Anuradha gave,
>> as they would allow self-heal to at least detect that it has some
>> repairing to do whenever it is restarted, intentionally or otherwise.
>>
>
> I followed those steps as described on my test box and ended up with the
> exact same outcome: shards added at an agonizingly slow pace, and no
> creation of the .shard directory or heals on it.  Directories visible from
> the mount healed quickly.  This was with one VM, so it only has 800 shards
> as well.  After hours at work it had added a total of 33 shards to be
> healed.  I sent those logs yesterday as well, though not the glustershd
> ones.
>
> Does the replace-brick command copy files in the same manner?  For these
> purposes I am contemplating skipping the heal route entirely.
>
>
>> -Krutika
>>
>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage <
>> dgossage at carouselchecks.com> wrote:
>>
>>> Attached are the brick and client logs from the test machine where the
>>> same behavior occurred; not sure if anything new is there.  It's still
>>> on 3.8.2.
>>>
>>> Number of Bricks: 1 x 3 = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>> Options Reconfigured:
>>> cluster.locking-scheme: granular
>>> performance.strict-o-direct: off
>>> features.shard-block-size: 64MB
>>> features.shard: on
>>> server.allow-insecure: on
>>> storage.owner-uid: 36
>>> storage.owner-gid: 36
>>> cluster.server-quorum-type: server
>>> cluster.quorum-type: auto
>>> network.remote-dio: on
>>> cluster.eager-lock: enable
>>> performance.stat-prefetch: off
>>> performance.io-cache: off
>>> performance.quick-read: off
>>> cluster.self-heal-window-size: 1024
>>> cluster.background-self-heal-count: 16
>>> nfs.enable-ino32: off
>>> nfs.addr-namelookup: off
>>> nfs.disable: on
>>> performance.read-ahead: off
>>> performance.readdir-ahead: on
>>> cluster.granular-entry-heal: on
>>>
>>>
>>>
>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage <
>>> dgossage at carouselchecks.com> wrote:
>>>
>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <atalur at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>> > From: "David Gossage" <dgossage at carouselchecks.com>
>>>>> > To: "Anuradha Talur" <atalur at redhat.com>
>>>>> > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>,
>>>>> "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM
>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>> >
>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <atalur at redhat.com>
>>>>> > wrote:
>>>>> >
>>>>> > > Response inline.
>>>>> > >
>>>>> > > ----- Original Message -----
>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com>
>>>>> > > > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>
>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM
>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>> > > >
>>>>> > > > Could you attach both client and brick logs? Meanwhile I will try
>>>>> > > > these steps out on my machines and see if it is easily
>>>>> > > > recreatable.
>>>>> > > >
>>>>> > > > -Krutika
>>>>> > > >
>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage <
>>>>> > > > dgossage at carouselchecks.com> wrote:
>>>>> > > >
>>>>> > > >
>>>>> > > >
>>>>> > > > Centos 7 Gluster 3.8.3
>>>>> > > >
>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>>>>> > > > Options Reconfigured:
>>>>> > > > cluster.data-self-heal-algorithm: full
>>>>> > > > cluster.self-heal-daemon: on
>>>>> > > > cluster.locking-scheme: granular
>>>>> > > > features.shard-block-size: 64MB
>>>>> > > > features.shard: on
>>>>> > > > performance.readdir-ahead: on
>>>>> > > > storage.owner-uid: 36
>>>>> > > > storage.owner-gid: 36
>>>>> > > > performance.quick-read: off
>>>>> > > > performance.read-ahead: off
>>>>> > > > performance.io-cache: off
>>>>> > > > performance.stat-prefetch: on
>>>>> > > > cluster.eager-lock: enable
>>>>> > > > network.remote-dio: enable
>>>>> > > > cluster.quorum-type: auto
>>>>> > > > cluster.server-quorum-type: server
>>>>> > > > server.allow-insecure: on
>>>>> > > > cluster.self-heal-window-size: 1024
>>>>> > > > cluster.background-self-heal-count: 16
>>>>> > > > performance.strict-write-ordering: off
>>>>> > > > nfs.disable: on
>>>>> > > > nfs.addr-namelookup: off
>>>>> > > > nfs.enable-ino32: off
>>>>> > > > cluster.granular-entry-heal: on
>>>>> > > >
>>>>> > > > Friday I did a rolling upgrade to 3.8.3 with no issues.
>>>>> > > > Following the steps detailed in previous recommendations, I began
>>>>> > > > the process of replacing and healing bricks one node at a time:
>>>>> > > >
>>>>> > > > 1) kill pid of brick
>>>>> > > > 2) reconfigure brick from raid6 to raid10
>>>>> > > > 3) recreate directory of brick
>>>>> > > > 4) gluster volume start <> force
>>>>> > > > 5) gluster volume heal <> full
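>>>>> > > >
>>>>> > > > (Concretely, steps 1, 4 and 5 might look like the following
>>>>> > > > sketch; the volume name is taken from the brick logs and the pid
>>>>> > > > is illustrative:)
>>>>> > > >
>>>>> > > >   gluster volume status GLUSTER1        # note the brick's pid
>>>>> > > >   kill <PID>                            # step 1
>>>>> > > >   # steps 2 and 3 happen outside gluster (rebuild RAID, recreate dir)
>>>>> > > >   gluster volume start GLUSTER1 force   # step 4
>>>>> > > >   gluster volume heal GLUSTER1 full     # step 5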
>>>>> > > Hi,
>>>>> > >
>>>>> > > I'd suggest not using full heal; there are a few bugs in it.
>>>>> > > Better safe than sorry ;)
>>>>> > > Instead I'd suggest the following steps:
>>>>> > >
>>>>> > For now I brought the node down with systemctl stop glusterd, as I
>>>>> > was getting sporadic I/O issues and a few VMs paused, so I'm hoping
>>>>> > that will help.  I may wait to do this till around 4 PM, when most
>>>>> > work is done, in case it shoots the load up.
>>>>> >
>>>>> >
>>>>> > > 1) kill pid of brick
>>>>> > > 2) do whatever reconfiguring of the brick you need
>>>>> > > 3) recreate brick dir
>>>>> > > 4) while the brick is still down, from the mount point:
>>>>> > >    a) create a dummy, previously non-existent dir under / of the mount.
>>>>> > >
>>>>> >
>>>>> > So if node 2 has the down brick, do I pick another node, for example
>>>>> > node 3, and make a test dir under its brick directory that doesn't
>>>>> > exist on 2, or should I be doing this over a gluster mount?
>>>>> You should be doing this over the gluster mount.
>>>>> >
>>>>> > >    b) set a previously non-existent extended attribute on / of the mount.
>>>>> > >
>>>>> >
>>>>> > Could you give me an example of an attribute to set?  I've read a
>>>>> > tad on this and looked up attributes, but haven't set any myself yet.
>>>>> >
>>>>> Sure. setfattr -n "user.some-name" -v "some-value" <path-to-mount>
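>>>>> (To verify it took, or to remove it afterwards -- same placeholder
>>>>> path as above:)
>>>>>
>>>>>   getfattr -n user.some-name <path-to-mount>   # confirm it is set
>>>>>   setfattr -x user.some-name <path-to-mount>   # remove it later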
>>>>> > > Doing these steps will ensure that heal happens only from the
>>>>> > > updated bricks to the down brick.
>>>>> > > 5) gluster v start <> force
>>>>> > > 6) gluster v heal <>
>>>>> > >
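>>>>> > > (Steps 4-6 together as a sketch; <VOL>, the dummy names and the
>>>>> > > mount path are placeholders:)
>>>>> > >
>>>>> > >   mkdir /mnt/<VOL>/dummy-dir                   # 4a, via the mount
>>>>> > >   setfattr -n "user.dummy" -v "1" /mnt/<VOL>   # 4b
>>>>> > >   gluster volume start <VOL> force             # 5
>>>>> > >   gluster volume heal <VOL>                    # 6
>>>>> > >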
>>>>> >
>>>>> > Will it matter that the full heal command was run somewhere in
>>>>> > gluster the other day?  Not sure if it eventually stops or times out.
>>>>> >
>>>>> Full heal will stop once the crawl is done. So if you want to trigger
>>>>> heal again, run gluster v heal <>. Actually, even bringing the brick
>>>>> up or a volume start force should trigger the heal.
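>>>>>
>>>>> (If you want to check whether a crawl is still in progress, one option
>>>>> is the heal statistics output, which includes crawl start and end
>>>>> times per brick:)
>>>>>
>>>>>   gluster volume heal <VOL> statistics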
>>>>>
>>>>
>>>> Did this on the test bed today.  It's one server with 3 bricks on the
>>>> same machine, so take that for what it's worth.  Also, it still runs
>>>> 3.8.2; maybe I'll update and re-run the test.
>>>>
>>>> killed brick
>>>> deleted brick dir
>>>> recreated brick dir
>>>> created fake dir on gluster mount
>>>> set suggested fake attribute on it
>>>> ran volume start <> force
>>>>
>>>> looked at the files it said needed healing, and it was just 8 shards
>>>> that were modified during the few minutes I ran through the steps
>>>>
>>>> gave it a few minutes and it stayed the same
>>>> ran gluster volume heal <>
>>>>
>>>> it healed all the directories and files you can see over the mount,
>>>> including the fake dir.
>>>>
>>>> same issue for shards though.  It adds more shards to heal at a glacial
>>>> pace.  There's a slight jump in speed if I stat every file and dir in
>>>> the running VM, but it still doesn't pick up all shards.
>>>>
>>>> It started with 8 shards to heal and is now only at 33 out of 800; at
>>>> the rate it's going it probably won't finish adding them for a few days.
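>>>>
>>>> (For reference, the stat trick from the mount side -- the mount path is
>>>> a placeholder; statting the files visible on the mount may trigger
>>>> lookups, and in turn heals, on the shards behind them:)
>>>>
>>>>   find /mnt/<VOL> -type f -exec stat {} + > /dev/null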
>>>>
>>>>
>>>>
>>>>> > >
>>>>> > > > The 1st node worked as expected: it took 12 hours to heal 1 TB
>>>>> > > > of data. Load was a little heavy but nothing shocking.
>>>>> > > >
>>>>> > > > About an hour after node 1 finished, I began the same process on
>>>>> > > > node 2. The heal process kicked in as before, and the files in
>>>>> > > > directories visible from the mount, as well as .glusterfs, healed
>>>>> > > > in short order. Then it began its crawl of .shard, adding those
>>>>> > > > files to the heal count, at which point the entire process
>>>>> > > > basically ground to a halt. After 48 hours it has added 5900 of
>>>>> > > > 19k shards to the heal list. Load on all 3 machines is
>>>>> > > > negligible. It was suggested to change
>>>>> > > > cluster.data-self-heal-algorithm to full and restart the volume,
>>>>> > > > which I did. No effect. Tried relaunching the heal: no effect,
>>>>> > > > regardless of which node I picked. I started each VM and
>>>>> > > > performed a stat of all files from within it, or a full virus
>>>>> > > > scan, and that seemed to cause short, small spikes in shards
>>>>> > > > added, but not by much. The logs show no real messages indicating
>>>>> > > > anything is going on. I occasionally get hits in the brick log
>>>>> > > > for null lookups, making me think it's not really crawling the
>>>>> > > > shards directory but waiting for a shard lookup to add it. I'll
>>>>> > > > get the following in the brick log, but not constantly, and
>>>>> > > > sometimes multiple times for the same shard.
>>>>> > > >
>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009]
>>>>> > > > [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no
>>>>> > > > resolution type for (null) (LOOKUP)
>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050]
>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server:
>>>>> > > > 12591783: LOOKUP (null)
>>>>> > > > (00000000-0000-0000-0000-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221)
>>>>> > > > ==> (Invalid argument) [Invalid argument]
>>>>> > > >
>>>>> > > > This one repeated about 30 times in a row, then nothing for 10
>>>>> > > > minutes, then a single hit for a different shard by itself.
>>>>> > > >
>>>>> > > > How can I determine whether the heal is actually running? How
>>>>> > > > can I kill it or force a restart? Does the node I start it from
>>>>> > > > determine which directory gets crawled to determine heals?
>>>>> > > >
>>>>> > > > David Gossage
>>>>> > > > Carousel Checks Inc. | System Administrator
>>>>> > > > Office 708.613.2284
>>>>> > > >
>>>>> > > > _______________________________________________
>>>>> > > > Gluster-users mailing list
>>>>> > > > Gluster-users at gluster.org
>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>> > > >
>>>>> > > >
>>>>> > > > _______________________________________________
>>>>> > > > Gluster-users mailing list
>>>>> > > > Gluster-users at gluster.org
>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>> > >
>>>>> > > --
>>>>> > > Thanks,
>>>>> > > Anuradha.
>>>>> > >
>>>>> >
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Anuradha.
>>>>>
>>>>
>>>>
>>>
>>
>