[Gluster-users] Self heal issues

Fri Aug 7 07:17:59 UTC 2015

On 08/07/2015 12:11 PM, Prasun Gera wrote:
> No, no noticeable difference. Still very high, possibly higher than 
> before.

I suspected the CPU usage might come from the diff self-heal algorithm,
which computes checksums (a CPU-intensive task), but that doesn't seem
to be the case. Could you run a volume profile to see which FOPs are
happening on the bricks, and share the results?
1. gluster volume profile <volname> start
2. gluster volume profile <volname> info
3. Wait 10-15 seconds.
4. gluster volume profile <volname> info
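The four steps above can be wrapped in a small shell function so the output is easy to capture and share. This is a minimal sketch: the function names `run` and `profile_volume`, and the `DRY_RUN` switch (which prints each command instead of executing it), are hypothetical helpers and not part of the gluster CLI. Only the `gluster volume profile ... start` and `... info` commands come from the steps above.

```shell
#!/bin/sh
# Sketch of the profiling steps: start, baseline info, wait, second info.
# run / profile_volume / DRY_RUN are hypothetical helpers, not gluster features.

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# profile_volume VOLNAME [SECONDS]
profile_volume() {
    volname="$1"
    interval="${2:-15}"   # seconds to let FOPs accumulate between snapshots
    if [ -z "$volname" ]; then
        echo "usage: profile_volume VOLNAME [SECONDS]" >&2
        return 1
    fi
    run gluster volume profile "$volname" start
    # Baseline snapshot taken right after profiling is enabled.
    run gluster volume profile "$volname" info
    run sleep "$interval"
    # Second snapshot: per-brick FOP call counts and latencies.
    run gluster volume profile "$volname" info
}
```

For example, `profile_volume myvol > profile.txt 2>&1` (with `myvol` replaced by the real volume name) collects both snapshots in one file. Profiling adds some overhead, so `gluster volume profile <volname> stop` can be run once the output has been collected.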

> The system has slowed to a crawl. It's difficult to even ssh in or run
> any commands in the terminal. Do you make anything of the logs? The
> brick log is just a giant alternating stream of those two lines I
> mentioned earlier.

>
> On Thu, Aug 6, 2015 at 10:10 PM, Ravishankar N <ravishankar at redhat.com> wrote:
>
>     On 08/07/2015 01:33 AM, Prasun Gera wrote:
>
>         I replaced the brick in a node in my 3x2 dist+repl volume (RHS
>         3). I'm seeing that the heal process, which should essentially
>         be a dump from the working replica to the newly added one, is
>         taking exceptionally long. It has moved ~100 GB over a day on a
>         1 Gigabit network. The CPU usage on both nodes of the
>         replica has been pretty high.
>
>
>     Does setting `cluster.data-self-heal-algorithm` to full make a
>     difference in the cpu usage?
>
>
>         I also think that Nagios is making it worse. The heal is slow
>         enough as it is, and Nagios keeps triggering heal info, which
>         I think never completes. I also see my logs filling up. These
>         are some of the log contents, which I got by running tail on them:
>
>
>
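The option asked about in the quoted message, `cluster.data-self-heal-algorithm`, is set per volume with `gluster volume set`. It accepts `full`, which copies the whole file from the source brick, and `diff`, which compares block checksums and copies only mismatching blocks (the CPU-heavy part mentioned above). The sketch below is a minimal wrapper under that assumption; `set_heal_algorithm`, `run`, and `DRY_RUN` are hypothetical helper names, not gluster features, and `myvol` is a placeholder volume name.

```shell
#!/bin/sh
# Hypothetical helper to switch a volume's data self-heal algorithm.
# Accepted values are assumed to be "full" and "diff".
# DRY_RUN=1 prints the gluster command instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_heal_algorithm VOLNAME [full|diff]   (defaults to "full")
set_heal_algorithm() {
    volname="$1"
    algo="${2:-full}"
    if [ -z "$volname" ]; then
        echo "usage: set_heal_algorithm VOLNAME [full|diff]" >&2
        return 1
    fi
    # Reject anything other than the two documented algorithms.
    case "$algo" in
        full|diff) ;;
        *)
            echo "algorithm must be 'full' or 'diff', got '$algo'" >&2
            return 1
            ;;
    esac
    run gluster volume set "$volname" cluster.data-self-heal-algorithm "$algo"
}
```

For example, `set_heal_algorithm myvol full` switches the volume to whole-file copies, and `set_heal_algorithm myvol diff` switches it back to checksum-based healing.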
