[Gluster-users] 3.8.3 Shards Healing Glacier Slow

David Gossage dgossage at carouselchecks.com
Tue Aug 30 14:03:17 UTC 2016


On Tue, Aug 30, 2016 at 8:52 AM, David Gossage <dgossage at carouselchecks.com>
wrote:

> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <kdhananj at redhat.com>
> wrote:
>
>>
>>
>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <kdhananj at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage <
>>> dgossage at carouselchecks.com> wrote:
>>>
>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <kdhananj at redhat.com
>>>> > wrote:
>>>>
>>>>> Could you also share the glustershd logs?
>>>>>
>>>>
>>>> Sure, I'll get them when I get to work.
>>>>
>>>
>>>>
>>>>>
>>>>> I tried the same steps that you mentioned multiple times, but heal is
>>>>> running to completion without any issues.
>>>>>
>>>>> It must be said that 'heal full' traverses the files and directories
>>>>> in a depth-first order and performs heals in that same order. But if it
>>>>> gets interrupted in the middle (say because self-heal-daemon was either
>>>>> intentionally or unintentionally brought offline and then brought back up),
>>>>> self-heal will only pick up the entries that have so far been marked as
>>>>> new entries needing heal, which it finds in the indices/xattrop directory.
>>>>> What this means is that the files and directories that were not visited
>>>>> during the crawl will remain untouched and unhealed in this second
>>>>> iteration of heal, unless you execute a 'heal full' again.
>>>>>
>>>>
>>>> So should it start healing shards as it crawls, or not until after it
>>>> crawls the entire .shard directory?  At the pace it was going, that could
>>>> take a week, with one node present in the cluster but holding no shard
>>>> files if anything tries to access a file on that node.  From my experience
>>>> the other day, telling it to heal full again did nothing, regardless of
>>>> which node I issued it from.
>>>>
>>>
>> Crawl is started from '/' of the volume. Whenever self-heal detects
>> during the crawl that a file or directory is present in some brick(s) and
>> absent in others, it creates the file on the bricks where it is absent and
>> marks the fact that the file or directory might need data/entry and
>> metadata heal too (this also means that an index is created under
>> .glusterfs/indices/xattrop of the src bricks). And the data/entry and
>> metadata heal are picked up and done in the background with the help of
>> these indices.
>>
>
> Looking at my 3rd node as an example, I find nearly the exact same number
> of files in the xattrop dir as reported by the heal count at the time I
> brought down node2 to try to alleviate the read I/O errors, which I was
> guessing came from attempts to use the node with no shards for reads.
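>
> (The comparison I mean, with the brick path and volume name from my setup
> filled in, is roughly:
>
>   ls /gluster1/BRICK1/1/.glusterfs/indices/xattrop | wc -l
>   gluster volume heal GLUSTER1 info | grep 'Number of entries'
>
> I believe the xattrop dir also holds a base 'xattrop-<uuid>' file that the
> gfid-named entries are hard links of, so the counts can be off by one.)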
>
> Also attached are the glustershd logs from the 3 nodes, along with those
> from the test node I tried yesterday with the same results.
>

Is it possible you just need to spam the heal full command?  Or wait a
certain amount of time for it to time out?

The test server I worked on yesterday, which stopped at listing 33 shards
and then healed none of them, still had 33 shards in the list this morning.
I issued another heal full and it jumped up and found the missing shards.

On the one hand, it's reassuring that if I just spam the command enough it
will eventually heal.  On the other hand, it's disconcerting that I have to
spam the command that many times before the heal starts.

I can't test whether the same behavior would occur on a live node, as I
expect that if it did kick in the heals I'd have another 12 hours or so of
high load during the copy.  But I can test whether it happens after the
last shift.  Though I lost track of how many times I tried restarting heal
full over Saturday and Sunday, when it looked to be doing nothing according
to all the documented heal-tracking commands.
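
For reference, by "heal tracking commands" I mean roughly the following
(volume name taken from our production cluster; run from any node):

  gluster volume heal GLUSTER1 info
  gluster volume heal GLUSTER1 statistics
  gluster volume heal GLUSTER1 statistics heal-count
  gluster volume heal GLUSTER1 full     # re-issue the full crawl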


>>
>>>>
>>>>> My suspicion is that this is what happened on your setup. Could you
>>>>> confirm if that was the case?
>>>>>
>>>>
>>>> The brick was brought online with a force start, then a full heal was
>>>> launched.  Hours later, after it became evident that it was not adding
>>>> new files to heal, I did try restarting the self-heal daemon and
>>>> relaunching the full heal.  But this was after the heal had basically
>>>> already failed to work as intended.
>>>>
>>>
>>> OK. How did you figure it was not adding any new files? I need to know
>>> what places you were monitoring to come to this conclusion.
>>>
>>> -Krutika
>>>
>>>
>>>>
>>>>
>>>>> As for those logs, I did manage to do something that caused the
>>>>> warning messages you shared earlier to appear in my client and server logs.
>>>>> Although these logs are annoying and a bit scary too, they didn't do
>>>>> any harm to the data in my volume. Why they appear just after a brick is
>>>>> replaced and under no other circumstances is something I'm still
>>>>> investigating.
>>>>>
>>>>> But for the future, it would be good to follow the steps Anuradha gave,
>>>>> as that would allow self-heal to at least detect that it has some
>>>>> repairing to do whenever it is restarted, whether intentionally or
>>>>> otherwise.
>>>>>
>>>>
>>>> I followed those steps as described on my test box and ended up with the
>>>> exact same outcome: shards being added at an agonizingly slow pace, and
>>>> no creation of the .shard directory or heals on the shard directory.
>>>> Directories visible from the mount healed quickly.  This was with one VM,
>>>> so it only has about 800 shards.  After hours at work it had added a
>>>> total of 33 shards to be healed.  I sent those logs yesterday as well,
>>>> though not the glustershd logs.
>>>>
>>>> Does the replace-brick command copy files in the same manner?  For these
>>>> purposes I am contemplating just skipping the heal route.
>>>>
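>>>> (The command I have in mind is something along these lines; the new
>>>> brick path here is just a placeholder, not my actual layout:
>>>>
>>>>   gluster volume replace-brick GLUSTER1 \
>>>>       ccgl2.gl.local:/gluster1/BRICK1/1 \
>>>>       ccgl2.gl.local:/gluster1/NEWBRICK/1 commit force
>>>>
>>>> My understanding is that this also relies on self-heal to populate the
>>>> new brick, so it may not behave any differently.)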
>>>>
>>>>> -Krutika
>>>>>
>>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage <
>>>>> dgossage at carouselchecks.com> wrote:
>>>>>
>>>>>> Attached are the brick and client logs from the test machine where the
>>>>>> same behavior occurred; not sure if anything new is there.  It's still
>>>>>> on 3.8.2.
>>>>>>
>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>> Options Reconfigured:
>>>>>> cluster.locking-scheme: granular
>>>>>> performance.strict-o-direct: off
>>>>>> features.shard-block-size: 64MB
>>>>>> features.shard: on
>>>>>> server.allow-insecure: on
>>>>>> storage.owner-uid: 36
>>>>>> storage.owner-gid: 36
>>>>>> cluster.server-quorum-type: server
>>>>>> cluster.quorum-type: auto
>>>>>> network.remote-dio: on
>>>>>> cluster.eager-lock: enable
>>>>>> performance.stat-prefetch: off
>>>>>> performance.io-cache: off
>>>>>> performance.quick-read: off
>>>>>> cluster.self-heal-window-size: 1024
>>>>>> cluster.background-self-heal-count: 16
>>>>>> nfs.enable-ino32: off
>>>>>> nfs.addr-namelookup: off
>>>>>> nfs.disable: on
>>>>>> performance.read-ahead: off
>>>>>> performance.readdir-ahead: on
>>>>>> cluster.granular-entry-heal: on
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage <
>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>
>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <atalur at redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>>> > To: "Anuradha Talur" <atalur at redhat.com>
>>>>>>>> > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>,
>>>>>>>> "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM
>>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>>> >
>>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <
>>>>>>>> atalur at redhat.com> wrote:
>>>>>>>> >
>>>>>>>> > > Response inline.
>>>>>>>> > >
>>>>>>>> > > ----- Original Message -----
>>>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>>> > > > Cc: "gluster-users at gluster.org List" <
>>>>>>>> Gluster-users at gluster.org>
>>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM
>>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>>> > > >
>>>>>>>> > > > Could you attach both client and brick logs? Meanwhile I will
>>>>>>>> try these
>>>>>>>> > > steps
>>>>>>>> > > > out on my machines and see if it is easily recreatable.
>>>>>>>> > > >
>>>>>>>> > > > -Krutika
>>>>>>>> > > >
>>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage <
>>>>>>>> > > dgossage at carouselchecks.com
>>>>>>>> > > > > wrote:
>>>>>>>> > > >
>>>>>>>> > > >
>>>>>>>> > > >
>>>>>>>> > > > Centos 7 Gluster 3.8.3
>>>>>>>> > > >
>>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>>>>>>>> > > > Options Reconfigured:
>>>>>>>> > > > cluster.data-self-heal-algorithm: full
>>>>>>>> > > > cluster.self-heal-daemon: on
>>>>>>>> > > > cluster.locking-scheme: granular
>>>>>>>> > > > features.shard-block-size: 64MB
>>>>>>>> > > > features.shard: on
>>>>>>>> > > > performance.readdir-ahead: on
>>>>>>>> > > > storage.owner-uid: 36
>>>>>>>> > > > storage.owner-gid: 36
>>>>>>>> > > > performance.quick-read: off
>>>>>>>> > > > performance.read-ahead: off
>>>>>>>> > > > performance.io-cache: off
>>>>>>>> > > > performance.stat-prefetch: on
>>>>>>>> > > > cluster.eager-lock: enable
>>>>>>>> > > > network.remote-dio: enable
>>>>>>>> > > > cluster.quorum-type: auto
>>>>>>>> > > > cluster.server-quorum-type: server
>>>>>>>> > > > server.allow-insecure: on
>>>>>>>> > > > cluster.self-heal-window-size: 1024
>>>>>>>> > > > cluster.background-self-heal-count: 16
>>>>>>>> > > > performance.strict-write-ordering: off
>>>>>>>> > > > nfs.disable: on
>>>>>>>> > > > nfs.addr-namelookup: off
>>>>>>>> > > > nfs.enable-ino32: off
>>>>>>>> > > > cluster.granular-entry-heal: on
>>>>>>>> > > >
>>>>>>>> > > > Friday did rolling upgrade to 3.8.3, no issues.
>>>>>>>> > > > Following the steps detailed in previous recommendations, I began
>>>>>>>> > > > the process of replacing and healing bricks one node at a time.
>>>>>>>> > > >
>>>>>>>> > > > 1) kill pid of brick
>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10
>>>>>>>> > > > 3) recreate directory of brick
>>>>>>>> > > > 4) gluster volume start <> force
>>>>>>>> > > > 5) gluster volume heal <> full
>>>>>>>> > > Hi,
>>>>>>>> > >
>>>>>>>> > > I'd suggest that full heal is not used. There are a few bugs in
>>>>>>>> full heal.
>>>>>>>> > > Better safe than sorry ;)
>>>>>>>> > > Instead I'd suggest the following steps:
>>>>>>>> > >
>>>>>>>> > For now I brought the node down with systemctl stop glusterd, as I
>>>>>>>> > was getting sporadic I/O issues and a few VMs paused, so I'm hoping
>>>>>>>> > that will help.  I may wait to do this till around 4 PM, when most
>>>>>>>> > work is done, in case it shoots the load up.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > > 1) kill pid of brick
>>>>>>>> > > 2) do whatever reconfiguration of the brick you need
>>>>>>>> > > 3) recreate brick dir
>>>>>>>> > > 4) while the brick is still down, from the mount point:
>>>>>>>> > >    a) create a dummy non-existent dir under / of the mount.
>>>>>>>> > >
>>>>>>>> >
>>>>>>>> > So if node 2 is the down brick, do I pick a node, for example node
>>>>>>>> > 3, and make a test dir under its brick directory that doesn't exist
>>>>>>>> > on node 2, or should I be doing this over a gluster mount?
>>>>>>>> You should be doing this over gluster mount.
>>>>>>>> >
>>>>>>>> > >    b) set a non-existent extended attribute on / of the mount.
>>>>>>>> > >
>>>>>>>> >
>>>>>>>> > Could you give me an example of an attribute to set?  I've read a
>>>>>>>> > tad on this and looked up attributes, but haven't set any yet myself.
>>>>>>>> >
>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" <path-to-mount>
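>>>>>>>> (To confirm it was set, something like 'getfattr -n user.some-name
>>>>>>>> <path-to-mount>' should show it; the attribute name is just an example.)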
>>>>>>>> > > Doing these steps will ensure that heal happens only from the
>>>>>>>> > > updated brick to the down brick.
>>>>>>>> > > 5) gluster v start <> force
>>>>>>>> > > 6) gluster v heal <>
>>>>>>>> > >
>>>>>>>> >
>>>>>>>> > Will it matter that the full heal command was run somewhere in
>>>>>>>> > gluster the other day?  Not sure if it eventually stops or times out.
>>>>>>>> >
>>>>>>>> Full heal will stop once the crawl is done. So if you want to trigger
>>>>>>>> heal again, run 'gluster v heal <>'. Actually, even bringing the brick
>>>>>>>> up or running 'volume start force' should trigger the heal.
>>>>>>>>
>>>>>>>
>>>>>>> Did this on the test bed today.  It's one server with 3 bricks on the
>>>>>>> same machine, so take that for what it's worth.  Also, it still runs
>>>>>>> 3.8.2.  Maybe I'll update and re-run the test.
>>>>>>>
>>>>>>> killed brick
>>>>>>> deleted brick dir
>>>>>>> recreated brick dir
>>>>>>> created fake dir on gluster mount
>>>>>>> set suggested fake attribute on it
>>>>>>> ran volume start <> force
>>>>>>>
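>>>>>>> In rough command form (brick path from the test-volume info above; the
>>>>>>> volume name, mount point, PID and xattr name are placeholders):
>>>>>>>
>>>>>>>   gluster volume status <testvol>    # find the brick PID
>>>>>>>   kill <brick-pid>
>>>>>>>   rm -rf /gluster2/brick2/1 && mkdir -p /gluster2/brick2/1
>>>>>>>   mkdir /mnt/testvol/fakedir
>>>>>>>   setfattr -n user.fake-heal -v test /mnt/testvol/fakedir
>>>>>>>   gluster volume start <testvol> force
>>>>>>>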
>>>>>>> Looked at the files it said needed healing, and it was just the 8
>>>>>>> shards that were modified during the few minutes I was running through
>>>>>>> the steps.
>>>>>>>
>>>>>>> Gave it a few minutes and it stayed the same, so I ran
>>>>>>> gluster volume heal <>.
>>>>>>>
>>>>>>> It healed all the directories and files you can see over the mount,
>>>>>>> including fakedir.
>>>>>>>
>>>>>>> Same issue for shards though: it adds more shards to heal at a glacial
>>>>>>> pace.  There's a slight jump in speed if I stat every file and dir in
>>>>>>> the running VM, but it still doesn't pick up all the shards.
>>>>>>>
>>>>>>> It started with 8 shards to heal and is now only at 33 out of 800, and
>>>>>>> probably won't finish adding them for a few days at the rate it's going.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> > >
>>>>>>>> > > > 1st node worked as expected; it took 12 hours to heal 1TB of
>>>>>>>> > > > data.  Load was a little heavy but nothing shocking.
>>>>>>>> > > >
>>>>>>>> > > > About an hour after node 1 finished, I began the same process on
>>>>>>>> > > > node2.  The heal process kicked in as before, and the files in
>>>>>>>> > > > directories visible from the mount and in .glusterfs healed in a
>>>>>>>> > > > short time.  Then it began the crawl of .shard, adding those
>>>>>>>> > > > files to the heal count, at which point the entire process
>>>>>>>> > > > basically ground to a halt.  After 48 hours, out of 19k shards it
>>>>>>>> > > > has added 5900 to the heal list.  Load on all 3 machines is
>>>>>>>> > > > negligible.  It was suggested to change
>>>>>>>> > > > cluster.data-self-heal-algorithm to full and restart the volume,
>>>>>>>> > > > which I did.  No effect.  Tried relaunching the heal; no effect,
>>>>>>>> > > > regardless of which node I picked.  I started each VM and
>>>>>>>> > > > performed a stat of all files from within it, or a full virus
>>>>>>>> > > > scan, and that seemed to cause short small spikes in shards
>>>>>>>> > > > added, but not by much.  Logs are showing no real messages
>>>>>>>> > > > indicating anything is going on.  I get occasional hits in the
>>>>>>>> > > > brick log for null lookups, making me think it's not really
>>>>>>>> > > > crawling the shards directory but waiting for a shard lookup to
>>>>>>>> > > > add it.  I'll get the following in the brick log, but not
>>>>>>>> > > > constantly, and sometimes multiple times for the same shard.
>>>>>>>> > > >
>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009]
>>>>>>>> > > > [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no
>>>>>>>> > > > resolution type for (null) (LOOKUP)
>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050]
>>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server:
>>>>>>>> > > > 12591783: LOOKUP (null)
>>>>>>>> > > > (00000000-0000-0000-0000-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221)
>>>>>>>> > > > ==> (Invalid argument) [Invalid argument]
>>>>>>>> > > >
>>>>>>>> > > > This one repeated about 30 times in a row, then nothing for 10
>>>>>>>> > > > minutes, then one hit for a different shard by itself.
>>>>>>>> > > >
>>>>>>>> > > > How can I determine if the heal is actually running?  How can I
>>>>>>>> > > > kill it or force a restart?  Does the node I start it from
>>>>>>>> > > > determine which directory gets crawled to determine the heals?
>>>>>>>> > > >
>>>>>>>> > > > David Gossage
>>>>>>>> > > > Carousel Checks Inc. | System Administrator
>>>>>>>> > > > Office 708.613.2284
>>>>>>>> > > >
>>>>>>>> > >
>>>>>>>> > > --
>>>>>>>> > > Thanks,
>>>>>>>> > > Anuradha.
>>>>>>>> > >
>>>>>>>> >
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks,
>>>>>>>> Anuradha.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

