[Gluster-users] recovery from reboot time?

Wed Mar 20 05:39:25 UTC 2019

Two things happen after a reboot.

1. glusterd (the management layer) does a sanity check of its volumes, sees
whether anything changed while it was down, and tries to correct its state.
  - This is fine as long as the number of volumes or the number of nodes is
small (small here means < 100).

2. If it is a replicate or disperse volume, the self-heal daemon checks
whether any self-heals are pending.
  - This does an 'index' crawl to check which files actually changed while
one of the bricks/nodes was down.
  - If this list is big, it can sometimes take some time.
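
To get a feel for how big that pending list is, the output of
'gluster volume heal <VOLNAME> info' can be totalled per brick. The following
is only a rough sketch, not an official tool: the function names and the
sample text are made up for illustration, and it assumes the usual
"Brick ..." / "Number of entries: N" layout of the heal-info output.

```python
import re
import subprocess

# Matches the per-brick summary line printed by `gluster volume heal <VOL> info`.
# A non-numeric count (e.g. for a brick that is not connected) is simply skipped.
ENTRY_RE = re.compile(r"^Number of entries:\s*(\d+)\s*$")


def count_pending_heals(heal_info_output):
    """Return {brick: pending_count} parsed from heal-info text."""
    counts = {}
    brick = None
    for line in heal_info_output.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        else:
            match = ENTRY_RE.match(line)
            if match and brick is not None:
                counts[brick] = int(match.group(1))
    return counts


def fetch_heal_info(volume):
    """Run the gluster CLI on a server node and return its stdout."""
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Illustrative sample only; the hostnames and paths are hypothetical.
SAMPLE = """\
Brick node1:/data/brick1
/dir/file-a
/dir/file-b
Status: Connected
Number of entries: 2

Brick node2:/data/brick1
Status: Connected
Number of entries: 0
"""

pending = count_pending_heals(SAMPLE)
print(pending, "total:", sum(pending.values()))
```

On a real node you would pass fetch_heal_info("<VOLNAME>") to
count_pending_heals() instead of SAMPLE. If the total keeps dropping between
runs, the heal is making progress.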

But 'days/weeks/months' is not expected or observed behavior. Are there any
messages in the log files? If not, can you run 'strace -f' on the pid that is
consuming the most CPU? (A 1-minute strace sample is good enough.)
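
One way to take such a 1-minute sample is to wrap strace in the coreutils
'timeout' command so it detaches by itself. This is just a sketch: the helper
name, the example pid 12345 and the output path are placeholders, not
anything Gluster-specific.

```python
import shlex


def build_strace_sample_cmd(pid, seconds=60, out_path="/tmp/gluster-strace.txt"):
    """Build an argv list that traces `pid` and its threads for `seconds`."""
    if pid <= 0 or seconds <= 0:
        raise ValueError("pid and seconds must be positive")
    return [
        "timeout", str(seconds),  # stop strace after the sample window
        "strace",
        "-f",                     # follow threads and child processes
        "-p", str(pid),           # attach to the already-running process
        "-o", out_path,           # write the trace to a file, not stderr
    ]


# 12345 is a placeholder; use the pid of the busy process (e.g. from top).
cmd = build_strace_sample_cmd(12345)
print(shlex.join(cmd))
```

Run the printed command as root on the rebooted server. 'timeout' exiting
with status 124 is expected here; it only means the time limit was reached.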

-Amar

On Wed, Mar 20, 2019 at 2:05 AM Alvin Starr <alvin at netvel.net> wrote:

> We have a simple replicated volume with 1 brick of 17TB on each node.
>
> There are something like 35M files and directories on the volume.
>
> One of the servers rebooted and is now "doing something".
>
> It kind of looks like it's doing some kind of sanity check with the node
> that did not reboot, but it's hard to say, and it looks like it may run for
> hours/days/months....
>
> Will Gluster take a long time to resync lots of little files?
>
>
> --
> Alvin Starr                   ||   land:  (905)513-7688
> Netvel Inc.                   ||   Cell:  (416)806-0133
> alvin at netvel.net              ||

-- 
Amar Tumballi (amarts)