Sorry for the delay. Somehow Gmail decided to put almost all mail from this list into spam.
Anyway, yes, I checked the processes. The gluster processes are in 'R' state, the others in 'S' state. You can find the 'top -H' output in the first message.
We're running glusterfs 6.8 on CentOS 7.8, Linux kernel 4.19.
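
In case the exact check matters: the following is just a rough sketch of how to list which threads are currently runnable ('R') or in uninterruptible sleep ('D') system-wide; nothing gluster-specific, only ps plus an awk filter on the state column.

  # show state, process id, thread id and thread name for every R or D thread
  ps -eLo state,pid,tid,comm | awk '$1=="R" || $1=="D"'

The 'top -H' output quoted below tells the same story: the only runnable threads on st2a are the glfs_iotwr* I/O workers of glusterfsd.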

Thanks.

On Tue, 23 Jun 2020 at 21:49, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

What is the OS and its version?
I have seen similar behaviour (different workload) on RHEL 7.6 (and below).

Have you checked what processes are in 'R' or 'D' state on st2a?

Best Regards,
Strahil Nikolov

On 23 June 2020 at 19:31:12 GMT+03:00, Pavel Znamensky <kompastver@gmail.com> wrote:
>Hi all,
>There's something strange with one of our clusters running glusterfs version 6.8: it's quite slow and one node is overloaded.
>This is a distributed cluster with four servers with the same specs/OS/versions:
>
>Volume Name: st2
>Type: Distributed-Replicate
>Volume ID: 4755753b-37c4-403b-b1c8-93099bfc4c45
>Status: Started
>Snapshot Count: 0
>Number of Bricks: 2 x 2 = 4
>Transport-type: tcp
>Bricks:
>Brick1: st2a:/vol3/st2
>Brick2: st2b:/vol3/st2
>Brick3: st2c:/vol3/st2
>Brick4: st2d:/vol3/st2
>Options Reconfigured:
>cluster.rebal-throttle: aggressive
>nfs.disable: on
>performance.readdir-ahead: off
>transport.address-family: inet6
>performance.quick-read: off
>performance.cache-size: 1GB
>performance.io-cache: on
>performance.io-thread-count: 16
>cluster.data-self-heal-algorithm: full
>network.ping-timeout: 20
>server.event-threads: 2
>client.event-threads: 2
>cluster.readdir-optimize: on
>performance.read-ahead: off
>performance.parallel-readdir: on
>cluster.self-heal-daemon: enable
>storage.health-check-timeout: 20
>
>op.version for this cluster remains 50400
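
Side note on the op-version: if anyone wants to double-check that value, it can be read with

  gluster volume get all cluster.op-version

50400 is still a 5.x-level op-version, so it was never bumped after the upgrade from 5.x to 6.8 mentioned further down.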
>
>st2a is a replica of st2b, and st2c is a replica of st2d.
>All 50 of our clients mount this volume using FUSE, and in contrast with our other cluster, this one is terribly slow.
>The interesting thing here is that HDD and network utilization are very low on the one hand, while one server is quite overloaded on the other.
>Also, there are no files that need to be healed according to `gluster volume heal st2 info`.
>Load average across servers:
>st2a: load average: 28.73, 26.39, 27.44
>st2b: load average: 0.24, 0.46, 0.76
>st2c: load average: 0.13, 0.20, 0.27
>st2d: load average: 2.93, 2.11, 1.50
>
>If we stop glusterfs on the st2a server, the cluster works as fast as we expect.
>Previously the cluster ran version 5.x and there were no such problems.
>
>Interestingly, almost all of the CPU usage on st2a is generated by "system" load.
>The most CPU-intensive process is glusterfsd.
>`top -H` for the glusterfsd process shows this:
>
>PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>13894 root 20 0 2172892 96488 9056 R 74.0 0.1 122:09.14 glfs_iotwr00a
>13888 root 20 0 2172892 96488 9056 R 73.7 0.1 121:38.26 glfs_iotwr004
>13891 root 20 0 2172892 96488 9056 R 73.7 0.1 121:53.83 glfs_iotwr007
>13920 root 20 0 2172892 96488 9056 R 73.0 0.1 122:11.27 glfs_iotwr00f
>13897 root 20 0 2172892 96488 9056 R 68.3 0.1 121:09.82 glfs_iotwr00d
>13896 root 20 0 2172892 96488 9056 R 68.0 0.1 122:03.99 glfs_iotwr00c
>13868 root 20 0 2172892 96488 9056 R 67.7 0.1 122:42.55 glfs_iotwr000
>13889 root 20 0 2172892 96488 9056 R 67.3 0.1 122:17.02 glfs_iotwr005
>13887 root 20 0 2172892 96488 9056 R 67.0 0.1 122:29.88 glfs_iotwr003
>13885 root 20 0 2172892 96488 9056 R 65.0 0.1 122:04.85 glfs_iotwr001
>13892 root 20 0 2172892 96488 9056 R 55.0 0.1 121:15.23 glfs_iotwr008
>13890 root 20 0 2172892 96488 9056 R 54.7 0.1 121:27.88 glfs_iotwr006
>13895 root 20 0 2172892 96488 9056 R 54.0 0.1 121:28.35 glfs_iotwr00b
>13893 root 20 0 2172892 96488 9056 R 53.0 0.1 122:23.12 glfs_iotwr009
>13898 root 20 0 2172892 96488 9056 R 52.0 0.1 122:30.67 glfs_iotwr00e
>13886 root 20 0 2172892 96488 9056 R 41.3 0.1 121:26.97 glfs_iotwr002
>13878 root 20 0 2172892 96488 9056 S 1.0 0.1 1:20.34 glfs_rpcrqhnd
>13840 root 20 0 2172892 96488 9056 S 0.7 0.1 0:51.54 glfs_epoll000
>13841 root 20 0 2172892 96488 9056 S 0.7 0.1 0:51.14 glfs_epoll001
>13877 root 20 0 2172892 96488 9056 S 0.3 0.1 1:20.02 glfs_rpcrqhnd
>13833 root 20 0 2172892 96488 9056 S 0.0 0.1 0:00.00 glusterfsd
>13834 root 20 0 2172892 96488 9056 S 0.0 0.1 0:00.14 glfs_timer
>13835 root 20 0 2172892 96488 9056 S 0.0 0.1 0:00.00 glfs_sigwait
>13836 root 20 0 2172892 96488 9056 S 0.0 0.1 0:00.16 glfs_memsweep
>13837 root 20 0 2172892 96488 9056 S 0.0 0.1 0:00.05 glfs_sproc0
>
>Also, I didn't find any relevant messages in the log files.
>Honestly, I don't know what to do. Does anyone know how to debug or fix this behaviour?
>
>Best regards,
>Pavel