[Bugs] [Bug 1403984] Node node high CPU - healing entries increasing

bugzilla at redhat.com bugzilla at redhat.com
Mon Dec 26 23:28:27 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1403984



--- Comment #4 from denmat <tu2Bgone at gmail.com> ---
This bug is most likely a duplicate of
https://bugzilla.redhat.com/show_bug.cgi?id=1402621 (also reported by me),
which has some more statedumps and a client fuse log.

We have had to rip out our GlusterFS installation as it was too unstable and
the instances I stopped have been cleaned up, so I can't get any further logs.

I captured a screenshot while it was misbehaving and have attached it, in
case it provides any further insight. It includes dstat output showing the
increased disk reads/writes and network traffic at the time.

1. The CPU was being hogged by glusterfsd.
2. The number of heal entries on those bricks was as high as 1700. When we
stopped the hosts accessing the gluster mounts the count would go down, but
it quickly started climbing again as soon as the hosts were brought back
online, and CPU usage climbed rapidly with it (we are talking about at most
12 client nodes). The node most recently added as a replacement always
showed a lower entry count for its brick.
3. We did find a directory that was causing 'mount point does not exist'
errors. Removing it from the gluster nodes seemed to help. We also ran a
rebalance (fix-layout and migrate data), which also appeared to help once
everything calmed down.
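For reference, the rebalance described in point 3 can be run with the
standard gluster CLI; the volume name below is a placeholder, not our
actual volume:

```shell
# Recompute the directory layout only (no data moved)
gluster volume rebalance <VOLNAME> fix-layout start

# Full rebalance: fix layout and migrate data to the right bricks
gluster volume rebalance <VOLNAME> start

# Watch progress until it reports "completed"
gluster volume rebalance <VOLNAME> status
```

We ran fix-layout first and the data migration afterwards.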

The file 9b886490-6a45-433f-aa49-a398c4ef385d
('/ftpdata/<removed>/bulk_import/photos/Theory classroom in high school.jpg')
that was reported with the SETATTR error on one of the low-CPU nodes
(10.90.3.14) was just one of many files with similar errors. The files were
zero bytes and had the same timestamps as on the other nodes. At that time
their permissions were not yet set correctly, since that happens after they
receive their data (I assume): they were owned by root, and after they
received data they had the correct permissions. I could remove one of these
zero-byte files from the gluster node's filesystem and it would be recreated
(zero bytes).

Before we removed the cluster it was behaving normally; heal info showed a
normal number of entries (up to 20).
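The entry counts mentioned above came from the standard heal-info commands;
again, the volume name is a placeholder:

```shell
# List the entries pending heal, grouped per brick
gluster volume heal <VOLNAME> info

# Just the per-brick counts, which is what we were watching climb
gluster volume heal <VOLNAME> statistics heal-count
```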


