[Gluster-users] 3.8.3 Shards Healing Glacier Slow

David Gossage dgossage at carouselchecks.com
Mon Aug 29 20:52:22 UTC 2016


Attached are the brick and client logs from the test machine where the same
behavior occurred; not sure if anything new is in there. It's still on 3.8.2.

Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.71.10:/gluster2/brick1/1
Brick2: 192.168.71.11:/gluster2/brick2/1
Brick3: 192.168.71.12:/gluster2/brick3/1
Options Reconfigured:
cluster.locking-scheme: granular
performance.strict-o-direct: off
features.shard-block-size: 64MB
features.shard: on
server.allow-insecure: on
storage.owner-uid: 36
storage.owner-gid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: on
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.quick-read: off
cluster.self-heal-window-size: 1024
cluster.background-self-heal-count: 16
nfs.enable-ino32: off
nfs.addr-namelookup: off
nfs.disable: on
performance.read-ahead: off
performance.readdir-ahead: on
cluster.granular-entry-heal: on



On Mon, Aug 29, 2016 at 2:20 PM, David Gossage <dgossage at carouselchecks.com>
wrote:

> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <atalur at redhat.com> wrote:
>
>>
>>
>> ----- Original Message -----
>> > From: "David Gossage" <dgossage at carouselchecks.com>
>> > To: "Anuradha Talur" <atalur at redhat.com>
>> > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>,
>> > "Krutika Dhananjay" <kdhananj at redhat.com>
>> > Sent: Monday, August 29, 2016 5:12:42 PM
>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>> >
>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <atalur at redhat.com>
>> > wrote:
>> >
>> > > Response inline.
>> > >
>> > > ----- Original Message -----
>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com>
>> > > > To: "David Gossage" <dgossage at carouselchecks.com>
>> > > > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>
>> > > > Sent: Monday, August 29, 2016 3:55:04 PM
>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>> > > >
>> > > > Could you attach both client and brick logs? Meanwhile I will try
>> > > > these steps out on my machines and see if it is easily recreatable.
>> > > >
>> > > > -Krutika
>> > > >
>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage
>> > > > <dgossage at carouselchecks.com> wrote:
>> > > >
>> > > >
>> > > >
>> > > > Centos 7 Gluster 3.8.3
>> > > >
>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>> > > > Options Reconfigured:
>> > > > cluster.data-self-heal-algorithm: full
>> > > > cluster.self-heal-daemon: on
>> > > > cluster.locking-scheme: granular
>> > > > features.shard-block-size: 64MB
>> > > > features.shard: on
>> > > > performance.readdir-ahead: on
>> > > > storage.owner-uid: 36
>> > > > storage.owner-gid: 36
>> > > > performance.quick-read: off
>> > > > performance.read-ahead: off
>> > > > performance.io-cache: off
>> > > > performance.stat-prefetch: on
>> > > > cluster.eager-lock: enable
>> > > > network.remote-dio: enable
>> > > > cluster.quorum-type: auto
>> > > > cluster.server-quorum-type: server
>> > > > server.allow-insecure: on
>> > > > cluster.self-heal-window-size: 1024
>> > > > cluster.background-self-heal-count: 16
>> > > > performance.strict-write-ordering: off
>> > > > nfs.disable: on
>> > > > nfs.addr-namelookup: off
>> > > > nfs.enable-ino32: off
>> > > > cluster.granular-entry-heal: on
>> > > >
>> > > > Friday I did a rolling upgrade from 3.8.2->3.8.3 with no issues.
>> > > > Following the steps detailed in previous recommendations, I began
>> > > > the process of replacing and healing bricks one node at a time.
>> > > >
>> > > > 1) kill pid of brick
>> > > > 2) reconfigure brick from raid6 to raid10
>> > > > 3) recreate directory of brick
>> > > > 4) gluster volume start <> force
>> > > > 5) gluster volume heal <> full
>> > > Hi,
>> > >
>> > > I'd suggest not using full heal; there are a few bugs in full heal.
>> > > Better safe than sorry ;)
>> > > Instead I'd suggest the following steps:
>> > >
>> > For now I've brought the node down with systemctl stop glusterd, as I
>> > was getting sporadic IO issues and a few VMs paused, so I'm hoping that
>> > will help. I may wait to do this till around 4 PM when most work is
>> > done, in case it shoots the load up.
>> >
>> >
>> > > 1) kill pid of brick
>> > > 2) do whatever reconfiguring of the brick you need
>> > > 3) recreate brick dir
>> > > 4) while the brick is still down, from the mount point:
>> > >    a) create a dummy non-existent dir under / of the mount.
>> > >
>> >
>> > So if node 2 is the down brick, do I pick another node, for example
>> > node 3, and make a test dir under its brick directory that doesn't
>> > exist on 2, or should I be doing this over a gluster mount?
>> You should be doing this over the gluster mount.
>> >
>> > >    b) set a non-existent extended attribute on / of the mount.
>> > >
>> >
>> > Could you give me an example of an attribute to set? I've read a tad
>> > on this, and looked up attributes, but haven't set any yet myself.
>> >
>> Sure. setfattr -n "user.some-name" -v "some-value" <path-to-mount>
>> > > Doing these steps will ensure that heal happens only from updated
>> > > brick to down brick.
>> > > 5) gluster v start <> force
>> > > 6) gluster v heal <>
>> > >
>> >
>> > Will it matter if somewhere in gluster the full heal command was run
>> > the other day? Not sure if it eventually stops or times out.
>> >
>> Full heal will stop once the crawl is done. So if you want to trigger
>> heal again, run gluster v heal <>. Actually, even bringing the brick up
>> or volume start force should trigger the heal.
>>
>
> Did this on the test bed today. It's one server with 3 bricks on the same
> machine, so take that for what it's worth. Also, it still runs 3.8.2;
> maybe I'll update and re-run the test. The steps I ran (rough commands
> after the list):
>
> killed brick
> deleted brick dir
> recreated brick dir
> created fake dir on gluster mount
> set suggested fake attribute on it
> ran volume start <> force
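>
> Roughly, as commands; treat this as a sketch rather than an exact
> transcript, since the volume name, brick path, brick PID, and mount point
> below are all placeholders (<>) rather than the real names from my box:
>
>   gluster volume status <>          # note the PID of the brick to replace
>   kill <brick-pid>                  # kill only that brick process
>   rm -rf <brick-path>               # wipe the old brick dir
>   mkdir -p <brick-path>             # recreate it empty
>   mkdir <mount-point>/fakedir       # dummy dir created over the mount
>   setfattr -n "user.some-name" -v "some-value" <mount-point>
>   gluster volume start <> force     # bring the replaced brick back up
>   gluster volume heal <>            # trigger the (index) heal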
>
> Looked at the files it said needed healing, and it was just the 8 shards
> that were modified during the few minutes I was running through the steps.
>
> Gave it a few minutes and it stayed the same, so I ran
> gluster volume heal <>
>
> It healed all the directories and files you can see over the mount,
> including fakedir.
>
> Same issue for the shards though: it adds more shards to heal at a glacial
> pace. Slight jump in speed if I stat every file and dir in the running VM,
> but not all shards.
>
> It started with 8 shards to heal and is now only at 33 out of 800, and it
> probably won't finish adding them for a few days at the rate it's going.
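>
> For anyone following along, a minimal way to watch those pending-heal
> counts (volume name is a placeholder again):
>
>   gluster volume heal <> info                   # entries pending heal, per brick
>   gluster volume heal <> statistics heal-count  # just the per-brick counts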
>
>
>
>> > >
>> > > > 1st node worked as expected, took 12 hours to heal 1TB of data.
>> > > > Load was a little heavy but nothing shocking.
>> > > >
>> > > > About an hour after node 1 finished I began the same process on
>> > > > node 2. The heal process kicked in as before, and the files in
>> > > > directories visible from the mount and in .glusterfs healed in a
>> > > > short time. Then it began the crawl of .shard, adding those files
>> > > > to the heal count, at which point the entire process basically
>> > > > ground to a halt. After 48 hours, out of 19k shards it has added
>> > > > 5900 to the heal list. Load on all 3 machines is negligible. It
>> > > > was suggested to change cluster.data-self-heal-algorithm to full
>> > > > and restart the volume, which I did. No effect. Tried relaunching
>> > > > the heal, no effect, regardless of which node I picked. I started
>> > > > each VM and performed a stat of all files from within it, or a
>> > > > full virus scan, and that seemed to cause short small spikes in
>> > > > shards added, but not by much. Logs are showing no real messages
>> > > > indicating anything is going on. I get hits in the brick log on
>> > > > occasion for null lookups, making me think it's not really
>> > > > crawling the shards directory but waiting for a shard lookup to
>> > > > add it. I'll get the following in the brick log, but not
>> > > > constantly, and sometimes multiple times for the same shard.
>> > > >
>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009]
>> > > > [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no
>> > > > resolution type for (null) (LOOKUP)
>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050]
>> > > > [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server:
>> > > > 12591783: LOOKUP (null)
>> > > > (00000000-0000-0000-0000-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221)
>> > > > ==> (Invalid argument) [Invalid argument]
>> > > >
>> > > > This one repeated about 30 times in a row, then nothing for 10
>> > > > minutes, then one hit for a different shard by itself.
>> > > >
>> > > > How can I determine if the heal is actually running? How can I
>> > > > kill it or force a restart? Does the node I start it from
>> > > > determine which directory gets crawled to determine the heals?
>> > > >
>> > > > David Gossage
>> > > > Carousel Checks Inc. | System Administrator
>> > > > Office 708.613.2284
>> > > >
>> > >
>> > > --
>> > > Thanks,
>> > > Anuradha.
>> > >
>> >
>>
>> --
>> Thanks,
>> Anuradha.
>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: client.log
Type: text/x-log
Size: 75086 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160829/7442a66c/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gluster2-brick1-1.log
Type: text/x-log
Size: 17981 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160829/7442a66c/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gluster2-brick2-1.log
Type: text/x-log
Size: 8152 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160829/7442a66c/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gluster2-brick3-1.log
Type: text/x-log
Size: 8369 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20160829/7442a66c/attachment-0003.bin>

