Sorry for the delay. Somehow Gmail decided to put almost all mail from this list into spam.
Anyway, yes, I checked the processes. The gluster processes are in 'R' state, the others in 'S' state. You can find the 'top -H' output in the first message.
We're running glusterfs 6.8 on CentOS 7.8, Linux kernel 4.19.
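
In case the exact check matters: the following is just a rough sketch of how to list which threads are currently runnable ('R') or in uninterruptible sleep ('D') system-wide; nothing gluster-specific, only ps plus an awk filter on the state column.

  # show state, process id, thread id and thread name for every R or D thread
  ps -eLo state,pid,tid,comm | awk '$1=="R" || $1=="D"'

The 'top -H' output quoted below tells the same story: the only runnable threads on st2a are the glfs_iotwr* I/O workers of glusterfsd.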

Thanks.

On Tue, 23 Jun 2020 at 21:49, Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

What is the OS and its version?
I have seen similar behaviour (different workload) on RHEL 7.6 (and below).

Have you checked what processes are in 'R' or 'D' state on st2a?

Best Regards,
Strahil Nikolov

On 23 June 2020 at 19:31:12 GMT+03:00, Pavel Znamensky <kompastver@gmail.com> wrote:
>Hi all,
>There's something strange with one of our clusters running glusterfs version 6.8: it's quite slow and one node is overloaded.
>This is a distributed cluster with four servers with the same specs/OS/versions:
>
>Volume Name: st2
>Type: Distributed-Replicate
>Volume ID: 4755753b-37c4-403b-b1c8-93099bfc4c45
>Status: Started
>Snapshot Count: 0
>Number of Bricks: 2 x 2 = 4
>Transport-type: tcp
>Bricks:
>Brick1: st2a:/vol3/st2
>Brick2: st2b:/vol3/st2
>Brick3: st2c:/vol3/st2
>Brick4: st2d:/vol3/st2
>Options Reconfigured:
>cluster.rebal-throttle: aggressive
>nfs.disable: on
>performance.readdir-ahead: off
>transport.address-family: inet6
>performance.quick-read: off
>performance.cache-size: 1GB
>performance.io-cache: on
>performance.io-thread-count: 16
>cluster.data-self-heal-algorithm: full
>network.ping-timeout: 20
>server.event-threads: 2
>client.event-threads: 2
>cluster.readdir-optimize: on
>performance.read-ahead: off
>performance.parallel-readdir: on
>cluster.self-heal-daemon: enable
>storage.health-check-timeout: 20
>
>op.version for this cluster remains 50400
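
Side note on the op-version: if anyone wants to double-check that value, it can be read with

  gluster volume get all cluster.op-version

50400 is still a 5.x-level op-version, so it was never bumped after the upgrade from 5.x to 6.8 mentioned further down.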
>
>st2a is a replica of st2b, and st2c is a replica of st2d.
>All 50 of our clients mount this volume using FUSE, and in contrast with our other cluster, this one is terribly slow.
>The interesting thing here is that HDD and network utilization are very low on the one hand, while one server is quite overloaded on the other.
>Also, there are no files that need to be healed according to `gluster volume heal st2 info`.
>Load average across servers:
>st2a: load average: 28.73, 26.39, 27.44
>st2b: load average: 0.24, 0.46, 0.76
>st2c: load average: 0.13, 0.20, 0.27
>st2d: load average: 2.93, 2.11, 1.50
>
>If we stop glusterfs on the st2a server, the cluster works as fast as we expect.
>Previously the cluster ran version 5.x and there were no such problems.
>
>Interestingly, almost all of the CPU usage on st2a is generated by "system" load.
>The most CPU-intensive process is glusterfsd.
>`top -H` for the glusterfsd process shows this:
>
>PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>13894 root 20 0 2172892 96488 9056 R 74.0 0.1 122:09.14 glfs_iotwr00a
>13888 root 20 0 2172892 96488 9056 R 73.7 0.1 121:38.26 glfs_iotwr004
>13891 root 20 0 2172892 96488 9056 R 73.7 0.1 121:53.83 glfs_iotwr007
>13920 root 20 0 2172892 96488 9056 R 73.0 0.1 122:11.27 glfs_iotwr00f
>13897 root 20 0 2172892 96488 9056 R 68.3 0.1 121:09.82 glfs_iotwr00d
>13896 root 20 0 2172892 96488 9056 R 68.0 0.1 122:03.99 glfs_iotwr00c
>13868 root 20 0 2172892 96488 9056 R 67.7 0.1 122:42.55 glfs_iotwr000
>13889 root 20 0 2172892 96488 9056 R 67.3 0.1 122:17.02 glfs_iotwr005
>13887 root 20 0 2172892 96488 9056 R 67.0 0.1 122:29.88 glfs_iotwr003
>13885 root 20 0 2172892 96488 9056 R 65.0 0.1 122:04.85 glfs_iotwr001
>13892 root 20 0 2172892 96488 9056 R 55.0 0.1 121:15.23 glfs_iotwr008
>13890 root 20 0 2172892 96488 9056 R 54.7 0.1 121:27.88 glfs_iotwr006
>13895 root 20 0 2172892 96488 9056 R 54.0 0.1 121:28.35 glfs_iotwr00b
>13893 root 20 0 2172892 96488 9056 R 53.0 0.1 122:23.12 glfs_iotwr009
>13898 root 20 0 2172892 96488 9056 R 52.0 0.1 122:30.67 glfs_iotwr00e
>13886 root 20 0 2172892 96488 9056 R 41.3 0.1 121:26.97 glfs_iotwr002
>13878 root 20 0 2172892 96488 9056 S 1.0 0.1 1:20.34 glfs_rpcrqhnd
>13840 root 20 0 2172892 96488 9056 S 0.7 0.1 0:51.54 glfs_epoll000
>13841 root 20 0 2172892 96488 9056 S 0.7 0.1 0:51.14 glfs_epoll001
>13877 root 20 0 2172892 96488 9056 S 0.3 0.1 1:20.02 glfs_rpcrqhnd
>13833 root 20 0 2172892 96488 9056 S 0.0 0.1 0:00.00 glusterfsd
>13834 root 20 0 2172892 96488 9056 S 0.0 0.1 0:00.14 glfs_timer
>13835 root 20 0 2172892 96488 9056 S 0.0 0.1 0:00.00 glfs_sigwait
>13836 root 20 0 2172892 96488 9056 S 0.0 0.1 0:00.16 glfs_memsweep
>13837 root 20 0 2172892 96488 9056 S 0.0 0.1 0:00.05 glfs_sproc0
>
>Also, I didn't find any relevant messages in the log files.
>Honestly, I don't know what to do. Does anyone know how to debug or fix this behaviour?
>
>Best regards,
>Pavel