[Gluster-users] Very odd performance issue

Fri May 5 01:10:50 UTC 2017

----- Original Message -----
> From: "David Miller" <dmiller at metheus.org>
> To: gluster-users at gluster.org
> Sent: Thursday, May 4, 2017 2:48:38 PM
> Subject: [Gluster-users] Very odd performance issue
> 
> Background: 4 identical gluster servers with 15 TB each in 2x2 setup.
> CentOS Linux release 7.3.1611 (Core)
> glusterfs-server-3.9.1-1.el7.x86_64
> client systems are using:
> glusterfs-client 3.5.2-2+deb8u3
> 
> The cluster has ~12 TB in use with 21 million files. Lots of jpgs. About 12
> clients are mounting gluster volumes.
> 
> Network load is light: iftop shows each server doing 10-15 Mbit/s of reads and
> about half that in writes.
> 
> What I’m seeing that concerns me is that one box, gluster4, has roughly twice
> the CPU utilization and twice or more the load average of the other three
> servers. gluster4 has a 24 hour average of about 30% CPU utilization,
> something that seems to me to be way out of line for a couple MB/sec of
> traffic.
> 
> Running volume top, the odd thing I see is that gluster1-3 all give me
> latency summaries like this:
> Brick: gluster1.publicinteractive.com:/gluster/drupal_prod
> ---------------------------------------------------------------------
> %-latency  Avg-latency  Min-Latency  Max-Latency    No. of calls  Fop
> ---------  -----------  -----------  -------------  ------------  -------
> 9.96       675.07 us    15.00 us     1067793.00 us  205060        INODELK
> 15.85      3414.20 us   16.00 us     773621.00 us   64494         READ
> 51.35      2235.96 us   12.00 us     1093609.00 us  319120        LOOKUP
> 
> … but my problem server has far more INODELK latency:
> 
> 12.01      4712.03 us   17.00 us     1773590.00 us  47214         READ
> 27.50      2390.27 us   14.00 us     1877571.00 us  213121        INODELK
> 28.70      1643.65 us   12.00 us     1837696.00 us  323407        LOOKUP
> 
> The servers are intended to be identical, and are indeed identical hardware.
> 
> Suggestions on where to look or which FM to RT very welcome indeed.

IIRC INODELK is for internal locking / synchronization:

"GlusterFS has locks translator which provides the following internal locking operations called inodelk, entrylk which are used by afr to achieve synchronization of operations on files or directories that conflict with each other."
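One way to see how lopsided this is: multiply the average latency by the number of calls to get the total time each brick spent on a fop. A quick Python sketch using only the three rows from your volume top output (the numbers are copied from your mail; the other fops are left out):

```python
# Figures copied from the "volume top" output in the original mail:
# (Avg-latency in microseconds, No. of calls). Only the three fops shown there.
GLUSTER1 = {
    "INODELK": (675.07, 205060),
    "READ": (3414.20, 64494),
    "LOOKUP": (2235.96, 319120),
}
GLUSTER4 = {
    "INODELK": (2390.27, 213121),
    "READ": (4712.03, 47214),
    "LOOKUP": (1643.65, 323407),
}


def total_seconds(stats):
    """Cumulative time per fop in seconds: average latency (us) x call count."""
    return {fop: avg_us * calls / 1e6 for fop, (avg_us, calls) in stats.items()}


if __name__ == "__main__":
    t1 = total_seconds(GLUSTER1)
    t4 = total_seconds(GLUSTER4)
    for fop in GLUSTER1:
        print(f"{fop:8s} gluster1={t1[fop]:7.1f}s gluster4={t4[fop]:7.1f}s "
              f"ratio={t4[fop] / t1[fop]:.2f}")
```

By that arithmetic gluster4 spent about 509 s in INODELK against about 138 s on gluster1, for a similar number of calls (213121 vs 205060), roughly 3.7 times as much, while total READ time is about the same on both. If that holds up, the extra load looks like locks being waited on longer rather than more lock traffic, though that is an inference from three rows, not a diagnosis.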

I found a bug report about a leak that may be related:

https://bugzilla.redhat.com/show_bug.cgi?id=1405886

It was fixed in the 3.8 line. It may be worth upgrading GlusterFS on your clients to rule out any issues that were fixed between 3.5 (your client version) and 3.9 (your server version).
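To make the version gap concrete, here is a small Python sketch that pulls major.minor out of the two package strings from your mail and flags anything older than 3.8. Treating every 3.8 release as containing the fix is an assumption for illustration; the bug report has the exact release.

```python
import re

# Package strings as quoted in the original mail.
SERVER_PKG = "glusterfs-server-3.9.1-1.el7.x86_64"
CLIENT_PKG = "glusterfs-client 3.5.2-2+deb8u3"

# Assumption: treat anything from 3.8 onward as having the fix
# ("fixed in the 3.8 line"); the exact point release may differ.
FIXED_IN = (3, 8)


def major_minor(pkg):
    """Extract (major, minor) from the first x.y.z version in a package string."""
    match = re.search(r"(\d+)\.(\d+)\.\d+", pkg)
    if match is None:
        raise ValueError(f"no version found in {pkg!r}")
    return int(match.group(1)), int(match.group(2))


def predates_fix(pkg):
    """True if the package's major.minor is older than FIXED_IN."""
    return major_minor(pkg) < FIXED_IN


if __name__ == "__main__":
    for pkg in (SERVER_PKG, CLIENT_PKG):
        print(pkg, "-> older than 3.8" if predates_fix(pkg) else "-> 3.8 or newer")
```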

Also, have a look at the brick and client logs. You could try searching them for "INODELK". Are your clients accessing a lot of the same files at the same time? And on the server where you are seeing the higher load, check the self-heal daemon logs to see whether any healing is going on.
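If you want to script the log search, something like this Python sketch counts lines mentioning INODELK in every *.log file under a directory. It assumes the default log directory /var/log/glusterfs (brick logs under bricks/, the self-heal daemon log as glustershd.log); the "selfheal" search string for heal activity is a guess at the message text, so adjust both to match what you actually see.

```python
import glob
import os

# Assumed default log location; adjust if your install logs elsewhere.
LOG_DIR = "/var/log/glusterfs"


def count_matches(lines, pattern):
    """Count lines containing pattern (plain, case-sensitive substring match)."""
    return sum(1 for line in lines if pattern in line)


def scan_logs(log_dir, pattern):
    """Return {path: matching line count} for every *.log file under log_dir."""
    results = {}
    for path in glob.glob(os.path.join(log_dir, "**", "*.log"), recursive=True):
        try:
            with open(path, errors="replace") as handle:
                hits = count_matches(handle, pattern)
        except OSError:
            continue  # unreadable file (e.g. permissions); skip it
        if hits:
            results[path] = hits
    return results


if __name__ == "__main__":
    # INODELK mentions in brick logs (servers) or mount logs (clients).
    for path, hits in sorted(scan_logs(LOG_DIR, "INODELK").items()):
        print(f"{hits:8d}  {path}")
    # Self-heal daemon log; "selfheal" is an assumed message substring.
    shd = os.path.join(LOG_DIR, "glustershd.log")
    if os.path.exists(shd):
        with open(shd, errors="replace") as handle:
            print("selfheal lines in glustershd.log:", count_matches(handle, "selfheal"))
```

Run it as root on each server, and on the clients for their mount logs. For a direct view of pending heals, "gluster volume heal VOLNAME info" lists entries that still need healing (VOLNAME being your volume's name).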

Sorry I don't have anything concrete. Like I said, it may be worth upgrading the clients and going through your logs to see if you can glean anything from them.

-b

> 
> Thanks,
> 
> David
> 
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users