[Gluster-users] canceling full heal 3.8

Sat Aug 27 14:58:42 UTC 2016

On Fri, Aug 26, 2016 at 8:40 PM, David Gossage <dgossage at carouselchecks.com>
wrote:

> I was in the process of redoing the underlying disk layout for a brick and
> triggered a full heal.  Then I realized I had skipped the step of applying
> zfs set xattr=sa, which is rather important when running ZFS on Linux.
>
> Rather than wait however many hours until my TB of data heals, is there a
> command in 3.8 to cancel a heal begun by gluster volume heal GLUSTER1
> full?  If not, it won't be the end of the world, just a waste of time to
> wait and then have to redo it after writing out a TB of data.
>
>
Does the heal process crawl from any particular node when invoked?  I have
3 nodes.  I ran the command from node 3; node 2 is the one with files needing
to be healed; node 1 holds the brick I healed yesterday but forgot to set
xattr=sa on, which usually hurts performance on zfsonlinux.  I did set it
about 30 minutes into the heal, figuring better some than none until I could
redo it again.

12 hours later the 1 TB of data was healed, so I figured I'd move on to node
2, then 3.  Assuming a 12-hour window for each node, I could then redo node 1
with the correct settings before Monday.  When node 1 healed, it first found
all the visible files from the mount point and .glusterfs; then the numbers
jumped back up after those were done and it started finding shards.  That
happened fairly quickly.  The second time around, with node 2, it is slowing
to a standstill while finding all the shards to heal.  I'm wondering if it's
doing the crawl from node 1, and the poor settings that existed for the first
30 minutes of file heals are slowing it down.  If so, I would hope that once
the files that were created/healed while the settings weren't correct are
found and it moves past them, the rest should go faster.
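
To put numbers on "fast" versus "standstill", one option is to sample the pending-heal count twice and work out a rate. The sketch below is only illustrative: it assumes the text comes from gluster volume heal GLUSTER1 statistics heal-count and that each brick reports a "Number of entries: N" line. The sample text, host names and brick paths are made up, and total_pending and eta_hours are hypothetical helper names.

```python
import re

def total_pending(heal_count_output: str) -> int:
    """Sum the per-brick 'Number of entries: N' lines (assumed output format)."""
    counts = re.findall(r"^Number of entries:\s*(\d+)\s*$",
                        heal_count_output, re.MULTILINE)
    return sum(int(n) for n in counts)

def eta_hours(count_then: int, count_now: int, elapsed_hours: float):
    """Estimate hours remaining from two samples taken elapsed_hours apart.

    Returns None when the count did not drop, e.g. when the crawl has just
    discovered more shards and the number jumped back up.
    """
    healed = count_then - count_now
    if healed <= 0 or elapsed_hours <= 0:
        return None
    return count_now / (healed / elapsed_hours)

# Hypothetical sample, not taken from a real cluster.
SAMPLE = """\
Brick node1:/bricks/example/brick1
Number of entries: 0

Brick node2:/bricks/example/brick1
Number of entries: 900

Brick node3:/bricks/example/brick1
Number of entries: 0
"""

pending = total_pending(SAMPLE)
print(pending, eta_hours(1200, pending, 0.5))
```

An estimate like this only means something once the count has stopped climbing; while shards are still being discovered it returns None rather than a misleading number.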

The only errors in any of the logs are in the brick logs:

[2016-08-27 14:25:10.022786] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: 3251237: LOOKUP (null) (00000000-0000-0000-0000-000000000000/4c7d44fc-a0c1-413b-8dc4-2abbbe1d4d4f.423) ==> (Invalid argument) [Invalid argument]
[2016-08-27 14:36:59.234073] W [MSGID: 115009] [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no resolution type for (null) (LOOKUP)
[2016-08-27 14:36:59.234128] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: 3288322: LOOKUP (null) (00000000-0000-0000-0000-000000000000/4c7d44fc-a0c1-413b-8dc4-2abbbe1d4d4f.328) ==> (Invalid argument) [Invalid argument]

And I would hope that these are just related to the heal process, or that when
a shard is hit and it's found not to exist here, it errors out as expected.
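
One way to check whether these errors are confined to shards of a few files is to pull the name out of each failed LOOKUP and group by base GFID. The following is a rough sketch, assuming the trailing .423 and .328 are shard block numbers appended to the base file's GFID; the regular expression is fitted to the entries quoted above and failed_shard_lookups is a hypothetical helper name.

```python
import re
from collections import Counter

# Fitted to the error entries quoted above: timestamp, level E, MSGID,
# then "LOOKUP (null) (<parent-gfid>/<base-gfid>.<block>)".
LOOKUP_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+E\s+"
    r"\[MSGID: (?P<msgid>\d+)\].*?LOOKUP\s+\S+\s+"
    r"\((?P<parent>[0-9a-f-]{36})/(?P<base>[0-9a-f-]{36})\.(?P<block>\d+)\)"
)

def failed_shard_lookups(log_text: str):
    """Return (timestamp, base_gfid, block_number) for each failed shard LOOKUP."""
    # Collapse whitespace so entries wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    return [(m["ts"], m["base"], int(m["block"]))
            for m in LOOKUP_RE.finditer(flat)]

SAMPLE_LOG = (
    "[2016-08-27 14:25:10.022786] E [MSGID: 115050] "
    "[server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: 3251237: "
    "LOOKUP (null) (00000000-0000-0000-0000-000000000000/"
    "4c7d44fc-a0c1-413b-8dc4-2abbbe1d4d4f.423) "
    "==> (Invalid argument) [Invalid argument]\n"
    "[2016-08-27 14:36:59.234073] W [MSGID: 115009] "
    "[server-resolve.c:569:server_resolve] 0-GLUSTER1-server: "
    "no resolution type for (null) (LOOKUP)\n"
    "[2016-08-27 14:36:59.234128] E [MSGID: 115050] "
    "[server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: 3288322: "
    "LOOKUP (null) (00000000-0000-0000-0000-000000000000/"
    "4c7d44fc-a0c1-413b-8dc4-2abbbe1d4d4f.328) "
    "==> (Invalid argument) [Invalid argument]\n"
)

hits = failed_shard_lookups(SAMPLE_LOG)
per_file = Counter(base for _, base, _ in hits)
for base, count in per_file.items():
    print(base, count, sorted(block for _, b, block in hits if b == base))
```

Both errors quoted here resolve to the same base GFID, which would be consistent with lookups for shards of a single file, though two entries are too few to draw a conclusion from.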

> *David Gossage*
> *Carousel Checks Inc. | System Administrator*
> *Office* 708.613.2284
>