[Gluster-users] 3.8.3 Shards Healing Glacier Slow

David Gossage dgossage at carouselchecks.com
Tue Aug 30 15:03:40 UTC 2016


On Tue, Aug 30, 2016 at 10:02 AM, David Gossage <dgossage at carouselchecks.com
> wrote:

> updated test server to 3.8.3
>
> Brick1: 192.168.71.10:/gluster2/brick1/1
> Brick2: 192.168.71.11:/gluster2/brick2/1
> Brick3: 192.168.71.12:/gluster2/brick3/1
> Options Reconfigured:
> cluster.granular-entry-heal: on
> performance.readdir-ahead: on
> performance.read-ahead: off
> nfs.disable: on
> nfs.addr-namelookup: off
> nfs.enable-ino32: off
> cluster.background-self-heal-count: 16
> cluster.self-heal-window-size: 1024
> performance.quick-read: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: on
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> storage.owner-gid: 36
> storage.owner-uid: 36
> server.allow-insecure: on
> features.shard: on
> features.shard-block-size: 64MB
> performance.strict-o-direct: off
> cluster.locking-scheme: granular
>
> kill -15 brickpid
> rm -Rf /gluster2/brick3
> mkdir -p /gluster2/brick3/1
> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
> setfattr -n "user.some-name" -v "some-value" /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
> gluster v start glustershard force
>
> At this point the brick process starts and all visible files, including the
> new dir, are created on the brick. A handful of shards still show up in heal
> statistics, but no .shard directory is created and there is no increase in
> the shard count.
>
> gluster v heal glustershard
>
> At this point there is still no increase in the count, no directory created,
> and no additional healing activity in the logs. I waited a few minutes
> tailing the logs to check if anything kicked in.
>
> gluster v heal glustershard full
>
> Shards are added to the list and the heal commences. Logs show a full sweep
> starting on all 3 nodes, though this time it only shows as finished on one,
> which looks to be the node that had its brick deleted.
>
> [2016-08-30 14:45:33.098589] I [MSGID: 108026]
> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
> starting full sweep on subvol glustershard-client-0
> [2016-08-30 14:45:33.099492] I [MSGID: 108026]
> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
> starting full sweep on subvol glustershard-client-1
> [2016-08-30 14:45:33.100093] I [MSGID: 108026]
> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
> starting full sweep on subvol glustershard-client-2
> [2016-08-30 14:52:29.760213] I [MSGID: 108026]
> [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
> finished full sweep on subvol glustershard-client-2
>

Just realized it's still healing, so that may be why the sweep on the 2 other
bricks hasn't reported as finished.
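
In the meantime I'm gauging progress with roughly the following, treating the
heal-count output as an approximation of how many shards have been queued so
far:

  gluster volume heal glustershard statistics heal-count
  gluster volume heal glustershard info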

>
>
> My hope is that later tonight a full heal will work on production. Is it
> possible for the self-heal daemon to go stale or stop listening while still
> showing as active? Would stopping and starting the self-heal daemon from the
> gluster CLI before running these heals be helpful?
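> (I'm assuming that would just be toggling the option already in the volume
> config and then checking the daemon status, i.e. something like
>
>   gluster volume set glustershard cluster.self-heal-daemon off
>   gluster volume set glustershard cluster.self-heal-daemon on
>   gluster volume status glustershard shd
>
> but correct me if there's a better way.)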
>
>
> On Tue, Aug 30, 2016 at 9:29 AM, David Gossage <
> dgossage at carouselchecks.com> wrote:
>
>> On Tue, Aug 30, 2016 at 8:52 AM, David Gossage <
>> dgossage at carouselchecks.com> wrote:
>>
>>> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <kdhananj at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <kdhananj at redhat.com
>>>> > wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage <
>>>>> dgossage at carouselchecks.com> wrote:
>>>>>
>>>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <
>>>>>> kdhananj at redhat.com> wrote:
>>>>>>
>>>>>>> Could you also share the glustershd logs?
>>>>>>>
>>>>>>
>>>>>> I'll get them when I get to work, sure.
>>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> I tried the same steps that you mentioned multiple times, but heal
>>>>>>> runs to completion without any issues.
>>>>>>>
>>>>>>> It must be said that 'heal full' traverses the files and directories
>>>>>>> in depth-first order and performs heals in the same order. But if it
>>>>>>> gets interrupted in the middle (say because the self-heal daemon was
>>>>>>> either intentionally or unintentionally brought offline and then
>>>>>>> brought back up), self-heal will only pick up the entries that are so
>>>>>>> far marked as new entries needing heal, which it finds in the
>>>>>>> indices/xattrop directory. What this means is that the files and
>>>>>>> directories that were not visited during the crawl will remain
>>>>>>> untouched and unhealed in this second iteration of heal, unless you
>>>>>>> execute a 'heal full' again.
>>>>>>>
>>>>>>
>>>>>> So should it start healing shards as it crawls, or not until after it
>>>>>> has crawled the entire .shard directory? At the pace it was going that
>>>>>> could be a week, with one node appearing in the cluster but having no
>>>>>> shard files if anything tries to access a file on that node. From my
>>>>>> experience the other day, telling it to heal full again did nothing,
>>>>>> regardless of which node I ran it from.
>>>>>>
>>>>>
>>>> Crawl is started from '/' of the volume. Whenever self-heal detects
>>>> during the crawl that a file or directory is present in some brick(s) and
>>>> absent in others, it creates the file on the bricks where it is absent and
>>>> marks the fact that the file or directory might need data/entry and
>>>> metadata heal too (this also means that an index is created under
>>>> .glusterfs/indices/xattrop of the src bricks). And the data/entry and
>>>> metadata heal are picked up and done in the background with the help of
>>>> these indices.
>>>>
>>>
>>> Looking at my 3rd node as an example, I find almost exactly the same
>>> number of files in the xattrop dir as the heal count reported at the time
>>> I brought down node 2 to try to alleviate the read IO errors, which I was
>>> guessing came from attempts to read from the node with no shards.
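>>> (What I was counting, for reference, was roughly
>>>
>>>   ls /gluster1/BRICK1/1/.glusterfs/indices/xattrop | wc -l
>>>
>>> on the 3rd node's brick, on the assumption that the entries under that
>>> index dir are a fair proxy for the pending-heal count.)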
>>>
>>> Also attached are the glustershd logs from the 3 nodes, along with the one
>>> from the test node I tried yesterday with the same results.
>>>
>>
>> Looking at my own logs, I notice that a full sweep was only ever recorded
>> in glustershd.log on the 2nd node, the one with the missing directory. I
>> believe I should have found a sweep starting on every node, correct?
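>> (I was checking each node with roughly
>>
>>   grep 'full sweep' /var/log/glusterfs/glustershd.log
>>
>> assuming that's the right log location on all three.)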
>>
>> On my test dev, when it did work, I do see the following:
>>
>> [2016-08-30 13:56:25.223333] I [MSGID: 108026]
>> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
>> starting full sweep on subvol glustershard-client-0
>> [2016-08-30 13:56:25.223522] I [MSGID: 108026]
>> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
>> starting full sweep on subvol glustershard-client-1
>> [2016-08-30 13:56:25.224616] I [MSGID: 108026]
>> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
>> starting full sweep on subvol glustershard-client-2
>> [2016-08-30 14:18:48.333740] I [MSGID: 108026]
>> [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
>> finished full sweep on subvol glustershard-client-2
>> [2016-08-30 14:18:48.356008] I [MSGID: 108026]
>> [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
>> finished full sweep on subvol glustershard-client-1
>> [2016-08-30 14:18:49.637811] I [MSGID: 108026]
>> [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
>> finished full sweep on subvol glustershard-client-0
>>
>> Whereas, looking at the past few days on the 3 prod nodes, I only found the
>> following, and only on my 2nd node:
>> [2016-08-27 01:26:42.638772] I [MSGID: 108026]
>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> starting full sweep on subvol GLUSTER1-client-1
>> [2016-08-27 11:37:01.732366] I [MSGID: 108026]
>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> finished full sweep on subvol GLUSTER1-client-1
>> [2016-08-27 12:58:34.597228] I [MSGID: 108026]
>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> starting full sweep on subvol GLUSTER1-client-1
>> [2016-08-27 12:59:28.041173] I [MSGID: 108026]
>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> finished full sweep on subvol GLUSTER1-client-1
>> [2016-08-27 20:03:42.560188] I [MSGID: 108026]
>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> starting full sweep on subvol GLUSTER1-client-1
>> [2016-08-27 20:03:44.278274] I [MSGID: 108026]
>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> finished full sweep on subvol GLUSTER1-client-1
>> [2016-08-27 21:00:42.603315] I [MSGID: 108026]
>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> starting full sweep on subvol GLUSTER1-client-1
>> [2016-08-27 21:00:46.148674] I [MSGID: 108026]
>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>> finished full sweep on subvol GLUSTER1-client-1
>>
>>
>>
>>
>>
>>>
>>>>
>>>>>>
>>>>>>> My suspicion is that this is what happened on your setup. Could you
>>>>>>> confirm if that was the case?
>>>>>>>
>>>>>>
>>>>>> The brick was brought online with a force start and then a full heal was
>>>>>> launched. Hours later, after it became evident that it was not adding new
>>>>>> files to heal, I did try restarting the self-heal daemon and relaunching
>>>>>> the full heal again. But this was after the heal had basically already
>>>>>> failed to work as intended.
>>>>>>
>>>>>
>>>>> OK. How did you figure it was not adding any new files? I need to know
>>>>> what places you were monitoring to come to this conclusion.
>>>>>
>>>>> -Krutika
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> As for those logs, I did manage to do something that caused the warning
>>>>>>> messages you shared earlier to appear in my client and server logs.
>>>>>>> Although these logs are annoying and a bit scary too, they didn't do
>>>>>>> any harm to the data in my volume. Why they appear just after a brick is
>>>>>>> replaced and under no other circumstances is something I'm still
>>>>>>> investigating.
>>>>>>>
>>>>>>> But for the future, it would be good to follow the steps Anuradha gave,
>>>>>>> as that would allow self-heal to at least detect that it has some
>>>>>>> repairing to do whenever it is restarted, whether intentionally or
>>>>>>> otherwise.
>>>>>>>
>>>>>>
>>>>>> I followed those steps as described on my test box and ended up with the
>>>>>> exact same outcome: shards added at an agonizingly slow pace, and no
>>>>>> creation of the .shard directory or heals on the shard directory.
>>>>>> Directories visible from the mount healed quickly. This was with one VM,
>>>>>> so it has only 800 shards as well. After hours at work it had added a
>>>>>> total of 33 shards to be healed. I sent those logs yesterday as well,
>>>>>> though not the glustershd logs.
>>>>>>
>>>>>> Does the replace-brick command copy files in the same manner? For these
>>>>>> purposes I am contemplating just skipping the heal route.
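>>>>>> (I'm thinking of something along the lines of
>>>>>>
>>>>>>   gluster volume replace-brick glustershard \
>>>>>>     192.168.71.12:/gluster2/brick3/1 192.168.71.12:/gluster2/brick3new/1 \
>>>>>>     commit force
>>>>>>
>>>>>> where brick3new is just a made-up path for the example. If that ends up
>>>>>> going through the same heal path anyway, there may be no point.)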
>>>>>>
>>>>>>
>>>>>>> -Krutika
>>>>>>>
>>>>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage <
>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>
>>>>>>>> Attached are the brick and client logs from the test machine where the
>>>>>>>> same behavior occurred; not sure if anything new is there. It's still
>>>>>>>> on 3.8.2.
>>>>>>>>
>>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>>> Transport-type: tcp
>>>>>>>> Bricks:
>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>>> Options Reconfigured:
>>>>>>>> cluster.locking-scheme: granular
>>>>>>>> performance.strict-o-direct: off
>>>>>>>> features.shard-block-size: 64MB
>>>>>>>> features.shard: on
>>>>>>>> server.allow-insecure: on
>>>>>>>> storage.owner-uid: 36
>>>>>>>> storage.owner-gid: 36
>>>>>>>> cluster.server-quorum-type: server
>>>>>>>> cluster.quorum-type: auto
>>>>>>>> network.remote-dio: on
>>>>>>>> cluster.eager-lock: enable
>>>>>>>> performance.stat-prefetch: off
>>>>>>>> performance.io-cache: off
>>>>>>>> performance.quick-read: off
>>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>>> cluster.background-self-heal-count: 16
>>>>>>>> nfs.enable-ino32: off
>>>>>>>> nfs.addr-namelookup: off
>>>>>>>> nfs.disable: on
>>>>>>>> performance.read-ahead: off
>>>>>>>> performance.readdir-ahead: on
>>>>>>>> cluster.granular-entry-heal: on
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage <
>>>>>>>> dgossage at carouselchecks.com> wrote:
>>>>>>>>
>>>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <atalur at redhat.com
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>> > From: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>>>>> > To: "Anuradha Talur" <atalur at redhat.com>
>>>>>>>>>> > Cc: "gluster-users at gluster.org List" <Gluster-users at gluster.org>,
>>>>>>>>>> "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM
>>>>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>>>>> >
>>>>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <
>>>>>>>>>> atalur at redhat.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> > > Response inline.
>>>>>>>>>> > >
>>>>>>>>>> > > ----- Original Message -----
>>>>>>>>>> > > > From: "Krutika Dhananjay" <kdhananj at redhat.com>
>>>>>>>>>> > > > To: "David Gossage" <dgossage at carouselchecks.com>
>>>>>>>>>> > > > Cc: "gluster-users at gluster.org List" <
>>>>>>>>>> Gluster-users at gluster.org>
>>>>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM
>>>>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier
>>>>>>>>>> Slow
>>>>>>>>>> > > >
>>>>>>>>>> > > > Could you attach both client and brick logs? Meanwhile I
>>>>>>>>>> will try these
>>>>>>>>>> > > steps
>>>>>>>>>> > > > out on my machines and see if it is easily recreatable.
>>>>>>>>>> > > >
>>>>>>>>>> > > > -Krutika
>>>>>>>>>> > > >
>>>>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage <
>>>>>>>>>> > > dgossage at carouselchecks.com
>>>>>>>>>> > > > > wrote:
>>>>>>>>>> > > >
>>>>>>>>>> > > >
>>>>>>>>>> > > >
>>>>>>>>>> > > > Centos 7 Gluster 3.8.3
>>>>>>>>>> > > >
>>>>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>>>>>>>>>> > > > Options Reconfigured:
>>>>>>>>>> > > > cluster.data-self-heal-algorithm: full
>>>>>>>>>> > > > cluster.self-heal-daemon: on
>>>>>>>>>> > > > cluster.locking-scheme: granular
>>>>>>>>>> > > > features.shard-block-size: 64MB
>>>>>>>>>> > > > features.shard: on
>>>>>>>>>> > > > performance.readdir-ahead: on
>>>>>>>>>> > > > storage.owner-uid: 36
>>>>>>>>>> > > > storage.owner-gid: 36
>>>>>>>>>> > > > performance.quick-read: off
>>>>>>>>>> > > > performance.read-ahead: off
>>>>>>>>>> > > > performance.io-cache: off
>>>>>>>>>> > > > performance.stat-prefetch: on
>>>>>>>>>> > > > cluster.eager-lock: enable
>>>>>>>>>> > > > network.remote-dio: enable
>>>>>>>>>> > > > cluster.quorum-type: auto
>>>>>>>>>> > > > cluster.server-quorum-type: server
>>>>>>>>>> > > > server.allow-insecure: on
>>>>>>>>>> > > > cluster.self-heal-window-size: 1024
>>>>>>>>>> > > > cluster.background-self-heal-count: 16
>>>>>>>>>> > > > performance.strict-write-ordering: off
>>>>>>>>>> > > > nfs.disable: on
>>>>>>>>>> > > > nfs.addr-namelookup: off
>>>>>>>>>> > > > nfs.enable-ino32: off
>>>>>>>>>> > > > cluster.granular-entry-heal: on
>>>>>>>>>> > > >
>>>>>>>>>> > > > Friday I did a rolling upgrade to 3.8.3 with no issues.
>>>>>>>>>> > > > Following the steps detailed in previous recommendations, I began
>>>>>>>>>> > > > the process of replacing and healing bricks one node at a time.
>>>>>>>>>> > > >
>>>>>>>>>> > > > 1) kill pid of brick
>>>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10
>>>>>>>>>> > > > 3) recreate directory of brick
>>>>>>>>>> > > > 4) gluster volume start <> force
>>>>>>>>>> > > > 5) gluster volume heal <> full
>>>>>>>>>> > > Hi,
>>>>>>>>>> > >
>>>>>>>>>> > > I'd suggest that full heal is not used. There are a few bugs
>>>>>>>>>> in full heal.
>>>>>>>>>> > > Better safe than sorry ;)
>>>>>>>>>> > > Instead I'd suggest the following steps:
>>>>>>>>>> > >
>>>>>>>>>> > For now I brought the node down with systemctl stop glusterd, as I
>>>>>>>>>> > was getting sporadic IO issues and a few VMs paused, so hoping that
>>>>>>>>>> > will help. I may wait to do this till around 4PM when most work is
>>>>>>>>>> > done, in case it shoots the load up.
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > > 1) kill pid of brick
>>>>>>>>>> > > 2) do whatever reconfiguring of the brick that you need
>>>>>>>>>> > > 3) recreate brick dir
>>>>>>>>>> > > 4) while the brick is still down, from the mount point:
>>>>>>>>>> > >    a) create a dummy non-existent dir under / of the mount.
>>>>>>>>>> > >
>>>>>>>>>> >
>>>>>>>>>> > So if node 2 has the down brick, do I pick another node, for example
>>>>>>>>>> > 3, and make a test dir under its brick directory that doesn't exist
>>>>>>>>>> > on 2? Or should I be doing this over a gluster mount?
>>>>>>>>>> You should be doing this over gluster mount.
>>>>>>>>>> >
>>>>>>>>>> > >    b) set a non existent extended attribute on / of mount.
>>>>>>>>>> > >
>>>>>>>>>> >
>>>>>>>>>> > Could you give me an example of an attribute to set?   I've
>>>>>>>>>> read a tad on
>>>>>>>>>> > this, and looked up attributes but haven't set any yet myself.
>>>>>>>>>> >
>>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" <path-to-mount>
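>>>>>>>>>> (To confirm it was set, you could check with
>>>>>>>>>> getfattr -n user.some-name <path-to-mount>.)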
>>>>>>>>>> > > Doing these steps will ensure that heal happens only from the
>>>>>>>>>> > > updated brick to the down brick.
>>>>>>>>>> > > 5) gluster v start <> force
>>>>>>>>>> > > 6) gluster v heal <>
>>>>>>>>>> > >
>>>>>>>>>> >
>>>>>>>>>> > Will it matter that the full heal command was run somewhere in
>>>>>>>>>> > gluster the other day? Not sure if it eventually stops or times out.
>>>>>>>>>> >
>>>>>>>>>> Full heal will stop once the crawl is done. So if you want to trigger
>>>>>>>>>> heal again, run gluster v heal <>. Actually, even bringing the brick up
>>>>>>>>>> or a volume start force should trigger the heal.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Did this on the test bed today. It's one server with 3 bricks on the
>>>>>>>>> same machine, so take that for what it's worth. Also, it still runs
>>>>>>>>> 3.8.2. Maybe I'll update and re-run the test.
>>>>>>>>>
>>>>>>>>> killed brick
>>>>>>>>> deleted brick dir
>>>>>>>>> recreated brick dir
>>>>>>>>> created fake dir on gluster mount
>>>>>>>>> set suggested fake attribute on it
>>>>>>>>> ran volume start <> force
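>>>>>>>>>
>>>>>>>>> Concretely (using brick3 just as the example, and the volume name and
>>>>>>>>> paths from this test box) that was roughly:
>>>>>>>>>
>>>>>>>>>   kill -15 <brickpid>
>>>>>>>>>   rm -rf /gluster2/brick3 && mkdir -p /gluster2/brick3/1
>>>>>>>>>   mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fakedir
>>>>>>>>>   setfattr -n "user.some-name" -v "some-value" \
>>>>>>>>>     /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fakedir
>>>>>>>>>   gluster v start glustershard force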
>>>>>>>>>
>>>>>>>>> Looked at the files it said needed healing and it was just the 8 shards
>>>>>>>>> that were modified during the few minutes I ran through the steps.
>>>>>>>>>
>>>>>>>>> Gave it a few minutes and it stayed the same, so I ran
>>>>>>>>> gluster volume heal <>
>>>>>>>>>
>>>>>>>>> It healed all the directories and files you can see over the mount,
>>>>>>>>> including fakedir.
>>>>>>>>>
>>>>>>>>> Same issue for shards though: it adds more shards to heal at a glacial
>>>>>>>>> pace. There's a slight jump in speed if I stat every file and dir in the
>>>>>>>>> running VM, but not all shards get added.
>>>>>>>>>
>>>>>>>>> It started with 8 shards to heal and is now only at 33 out of 800, and
>>>>>>>>> probably won't finish adding for a few days at the rate it's going.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> > >
>>>>>>>>>> > > > The 1st node worked as expected and took 12 hours to heal 1TB of
>>>>>>>>>> > > > data. Load was a little heavy but nothing shocking.
>>>>>>>>>> > > >
>>>>>>>>>> > > > About an hour after node 1 finished I began the same process on
>>>>>>>>>> > > > node 2. The heal process kicked in as before, and the files in
>>>>>>>>>> > > > directories visible from the mount and in .glusterfs healed in a
>>>>>>>>>> > > > short time. Then it began the crawl of .shard, adding those files
>>>>>>>>>> > > > to the heal count, at which point the entire process basically
>>>>>>>>>> > > > ground to a halt. After 48 hours it has added 5900 of the 19k
>>>>>>>>>> > > > shards to the heal list. Load on all 3 machines is negligible. It
>>>>>>>>>> > > > was suggested to change cluster.data-self-heal-algorithm to full
>>>>>>>>>> > > > and restart the volume, which I did. No effect. Tried relaunching
>>>>>>>>>> > > > the heal, no effect, regardless of which node I picked. I started
>>>>>>>>>> > > > each VM and performed a stat of all files from within it, or a
>>>>>>>>>> > > > full virus scan, and that seemed to cause short small spikes in
>>>>>>>>>> > > > shards added, but not by much. Logs are showing no real messages
>>>>>>>>>> > > > indicating anything is going on. I get occasional hits in the
>>>>>>>>>> > > > brick log for null lookups, making me think it's not really
>>>>>>>>>> > > > crawling the shards directory but waiting for a shard lookup to
>>>>>>>>>> > > > add it. I'll get the following in the brick log, but not
>>>>>>>>>> > > > constantly, and sometimes multiple times for the same shard.
>>>>>>>>>> > > >
>>>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009]
>>>>>>>>>> > > > [server-resolve.c:569:server_resolve] 0-GLUSTER1-server:
>>>>>>>>>> > > > no resolution type for (null) (LOOKUP)
>>>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050]
>>>>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server:
>>>>>>>>>> > > > 12591783: LOOKUP (null)
>>>>>>>>>> > > > (00000000-0000-0000-0000-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221)
>>>>>>>>>> > > > ==> (Invalid argument) [Invalid argument]
>>>>>>>>>> > > >
>>>>>>>>>> > > > This one repeated about 30 times in a row, then nothing for 10
>>>>>>>>>> > > > minutes, then one hit for a different shard by itself.
>>>>>>>>>> > > >
>>>>>>>>>> > > > How can I determine if the heal is actually running? How can I
>>>>>>>>>> > > > kill it or force a restart? Does the node I start it from
>>>>>>>>>> > > > determine which directory gets crawled to determine heals?
>>>>>>>>>> > > >
>>>>>>>>>> > > > David Gossage
>>>>>>>>>> > > > Carousel Checks Inc. | System Administrator
>>>>>>>>>> > > > Office 708.613.2284
>>>>>>>>>> > > >
>>>>>>>>>> > > > _______________________________________________
>>>>>>>>>> > > > Gluster-users mailing list
>>>>>>>>>> > > > Gluster-users at gluster.org
>>>>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>> > > >
>>>>>>>>>> > > >
>>>>>>>>>> > > > _______________________________________________
>>>>>>>>>> > > > Gluster-users mailing list
>>>>>>>>>> > > > Gluster-users at gluster.org
>>>>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>> > >
>>>>>>>>>> > > --
>>>>>>>>>> > > Thanks,
>>>>>>>>>> > > Anuradha.
>>>>>>>>>> > >
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Thanks,
>>>>>>>>>> Anuradha.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>