[Gluster-users] recovery from reboot time?
Amar Tumballi Suryanarayan
atumball at redhat.com
Wed Mar 20 05:39:25 UTC 2019
Two things happen after a reboot.
1. glusterd (the management layer) does a sanity check of its volumes,
sees whether anything changed while it was down, and tries to correct
its state.
- This is fine as long as the number of volumes or the number of nodes
is small (small here means < 100).
2. If it is a replicate or disperse volume, the self-heal daemon checks
whether any self-heals are pending.
- This does an 'index' crawl to check which files actually changed while
one of the bricks/nodes was down.
- If this list is big, it can sometimes take some time.
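
To get a rough idea of how much the self-heal daemon still has to do, one
option is to ask gluster for the pending heal count. The following is only a
minimal sketch: the helper names (heal_count_command, heal_info_command,
run_command) and the volume name "myvol" are made up for illustration, and it
assumes the gluster CLI is installed and run with sufficient privileges on one
of the nodes.

```python
import subprocess


def heal_count_command(volume):
    """Build argv for `gluster volume heal <VOLNAME> statistics heal-count`."""
    if not volume or volume.startswith("-"):
        raise ValueError("volume must be a non-empty volume name")
    return ["gluster", "volume", "heal", volume, "statistics", "heal-count"]


def heal_info_command(volume):
    """Build argv for `gluster volume heal <VOLNAME> info` (lists the entries)."""
    if not volume or volume.startswith("-"):
        raise ValueError("volume must be a non-empty volume name")
    return ["gluster", "volume", "heal", volume, "info"]


def run_command(argv):
    """Run a command and return its stdout as text.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, print(run_command(heal_count_command("myvol"))) run a few minutes
apart should show whether the pending count is going down. With many pending
entries the 'info' form can itself be slow, so the count is probably the
better first check.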
But 'days/weeks/months' is not expected or observed behavior. Are there any
logs in the log file? If not, can you run 'strace -f' on the pid that is
consuming the most CPU? (A 1-minute strace sample is good enough.)
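
One possible way to take such a one-minute sample is sketched below. This is
an illustration under assumptions, not a prescribed procedure: the helper
names, the output file name strace-sample.txt, and the use of the coreutils
'timeout' command to stop strace after 60 seconds are all choices made here.
It assumes strace and timeout are installed and that you have permission to
trace the process (typically root).

```python
import subprocess


def strace_sample_command(pid, seconds=60, output_path="strace-sample.txt"):
    """Build argv that attaches strace to `pid` and stops it after `seconds`.

    -f follows child processes and threads, -p attaches to a running pid,
    -o writes the trace to a file instead of stderr.
    """
    pid = int(pid)
    seconds = int(seconds)
    if pid <= 0 or seconds <= 0:
        raise ValueError("pid and seconds must be positive integers")
    return ["timeout", str(seconds),
            "strace", "-f", "-p", str(pid), "-o", output_path]


def take_strace_sample(pid, seconds=60, output_path="strace-sample.txt"):
    """Run the sample and return the exit status of `timeout`.

    `timeout` exits with 124 when it had to stop the command, which is the
    normal outcome here, so a non-zero status is not treated as an error.
    """
    argv = strace_sample_command(pid, seconds, output_path)
    return subprocess.run(argv).returncode
```

For example, take_strace_sample(12345) would trace a (hypothetical) pid 12345
for a minute and leave the result in strace-sample.txt, which can then be
attached to a reply on the list.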
-Amar
On Wed, Mar 20, 2019 at 2:05 AM Alvin Starr <alvin at netvel.net> wrote:
> We have a simple replicated volume with 1 brick on each node of 17TB.
>
> There is something like 35M files and directories on the volume.
>
> One of the servers rebooted and is now "doing something".
>
> It kind of looks like it's doing some kind of sanity check with the node
> that did not reboot, but it's hard to say, and it looks like it may run for
> hours/days/months....
>
> Will Gluster take a long time to resync with lots of little files?
>
>
> --
> Alvin Starr || land: (905)513-7688
> Netvel Inc. || Cell: (416)806-0133
> alvin at netvel.net ||
--
Amar Tumballi (amarts)