[Gluster-users] One of cluster work super slow (v6.8)
Pavel Znamensky
kompastver at gmail.com
Thu Jul 16 08:19:45 UTC 2020
Sorry for the delay. Somehow Gmail decided to put almost all email from
this list into spam.
Anyway, yes, I checked the processes. The Gluster processes are in the 'R'
state, the others are in the 'S' state.
You can find 'top -H' output in the first message.
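
For reference, the per-thread state check can be done with something like
this (just a sketch; assumes procps-ng ps, adjust columns as needed):

    # print state, pid, tid and thread name for every thread,
    # keeping only running (R) and uninterruptible (D) ones
    ps -eLo state,pid,tid,comm | awk '$1 ~ /^(R|D)/'
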
We're running glusterfs 6.8 on CentOS 7.8. Linux kernel 4.19.
Thanks.
Tue, 23 Jun 2020 at 21:49, Strahil Nikolov <hunter86_bg at yahoo.com>:
> What is the OS and its version?
> I have seen similar behaviour (different workload) on RHEL 7.6 (and
> below).
>
> Have you checked what processes are in 'R' or 'D' state on st2a ?
>
> Best Regards,
> Strahil Nikolov
>
> On 23 June 2020 19:31:12 GMT+03:00, Pavel Znamensky <
> kompastver at gmail.com> wrote:
> >Hi all,
> >There's something strange with one of our clusters running glusterfs
> >version 6.8: it's quite slow and one node is overloaded.
> >This is a distributed cluster with four servers with the same
> >specs/OS/versions:
> >
> >Volume Name: st2
> >Type: Distributed-Replicate
> >Volume ID: 4755753b-37c4-403b-b1c8-93099bfc4c45
> >Status: Started
> >Snapshot Count: 0
> >Number of Bricks: 2 x 2 = 4
> >Transport-type: tcp
> >Bricks:
> >Brick1: st2a:/vol3/st2
> >Brick2: st2b:/vol3/st2
> >Brick3: st2c:/vol3/st2
> >Brick4: st2d:/vol3/st2
> >Options Reconfigured:
> >cluster.rebal-throttle: aggressive
> >nfs.disable: on
> >performance.readdir-ahead: off
> >transport.address-family: inet6
> >performance.quick-read: off
> >performance.cache-size: 1GB
> >performance.io-cache: on
> >performance.io-thread-count: 16
> >cluster.data-self-heal-algorithm: full
> >network.ping-timeout: 20
> >server.event-threads: 2
> >client.event-threads: 2
> >cluster.readdir-optimize: on
> >performance.read-ahead: off
> >performance.parallel-readdir: on
> >cluster.self-heal-daemon: enable
> >storage.health-check-timeout: 20
> >
> >op.version for this cluster remains 50400
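> >
> >For reference, those values can be re-checked with the standard gluster
> >CLI, e.g. (a quick sketch):
> >
> >    # effective volume options and cluster op-version
> >    gluster volume get st2 all
> >    gluster volume get all cluster.op-version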
> >
> >st2a is a replica of st2b, and st2c is a replica of st2d.
> >All our 50 clients mount this volume using FUSE, and in contrast with our
> >other cluster, this one works terribly slowly.
> >The interesting thing here is that, on the one hand, HDD and network
> >utilization are very low, while on the other hand the server is quite
> >overloaded.
> >Also, there are no files that need healing according to `gluster
> >volume heal st2 info`.
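> >
> >That check is just the standard heal CLI; for a condensed view there is
> >also a summary form (a sketch; the summary form should be available in 6.x):
> >
> >    gluster volume heal st2 info
> >    gluster volume heal st2 info summary
> >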
> >Load average across servers:
> >st2a:
> >load average: 28.73, 26.39, 27.44
> >st2b:
> >load average: 0.24, 0.46, 0.76
> >st2c:
> >load average: 0.13, 0.20, 0.27
> >st2d:
> >load average: 2.93, 2.11, 1.50
> >
> >If we stop glusterfs on the st2a server, the cluster works as fast as we
> >expect.
> >Previously the cluster ran version 5.x and there were no such
> >problems.
> >
> >Interestingly, almost all of the CPU usage on st2a shows up as "system"
> >load.
> >The most CPU-intensive process is glusterfsd.
> >`top -H` for the glusterfsd process shows this:
> >
> >  PID USER     PR  NI    VIRT   RES  SHR S %CPU %MEM     TIME+ COMMAND
> >13894 root     20   0 2172892 96488 9056 R 74.0  0.1 122:09.14 glfs_iotwr00a
> >13888 root     20   0 2172892 96488 9056 R 73.7  0.1 121:38.26 glfs_iotwr004
> >13891 root     20   0 2172892 96488 9056 R 73.7  0.1 121:53.83 glfs_iotwr007
> >13920 root     20   0 2172892 96488 9056 R 73.0  0.1 122:11.27 glfs_iotwr00f
> >13897 root     20   0 2172892 96488 9056 R 68.3  0.1 121:09.82 glfs_iotwr00d
> >13896 root     20   0 2172892 96488 9056 R 68.0  0.1 122:03.99 glfs_iotwr00c
> >13868 root     20   0 2172892 96488 9056 R 67.7  0.1 122:42.55 glfs_iotwr000
> >13889 root     20   0 2172892 96488 9056 R 67.3  0.1 122:17.02 glfs_iotwr005
> >13887 root     20   0 2172892 96488 9056 R 67.0  0.1 122:29.88 glfs_iotwr003
> >13885 root     20   0 2172892 96488 9056 R 65.0  0.1 122:04.85 glfs_iotwr001
> >13892 root     20   0 2172892 96488 9056 R 55.0  0.1 121:15.23 glfs_iotwr008
> >13890 root     20   0 2172892 96488 9056 R 54.7  0.1 121:27.88 glfs_iotwr006
> >13895 root     20   0 2172892 96488 9056 R 54.0  0.1 121:28.35 glfs_iotwr00b
> >13893 root     20   0 2172892 96488 9056 R 53.0  0.1 122:23.12 glfs_iotwr009
> >13898 root     20   0 2172892 96488 9056 R 52.0  0.1 122:30.67 glfs_iotwr00e
> >13886 root     20   0 2172892 96488 9056 R 41.3  0.1 121:26.97 glfs_iotwr002
> >13878 root     20   0 2172892 96488 9056 S  1.0  0.1   1:20.34 glfs_rpcrqhnd
> >13840 root     20   0 2172892 96488 9056 S  0.7  0.1   0:51.54 glfs_epoll000
> >13841 root     20   0 2172892 96488 9056 S  0.7  0.1   0:51.14 glfs_epoll001
> >13877 root     20   0 2172892 96488 9056 S  0.3  0.1   1:20.02 glfs_rpcrqhnd
> >13833 root     20   0 2172892 96488 9056 S  0.0  0.1   0:00.00 glusterfsd
> >13834 root     20   0 2172892 96488 9056 S  0.0  0.1   0:00.14 glfs_timer
> >13835 root     20   0 2172892 96488 9056 S  0.0  0.1   0:00.00 glfs_sigwait
> >13836 root     20   0 2172892 96488 9056 S  0.0  0.1   0:00.16 glfs_memsweep
> >13837 root     20   0 2172892 96488 9056 S  0.0  0.1   0:00.05 glfs_sproc0
> >
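> >Since all the busy threads are io-threads, one way to see where that
> >"system" time goes might be perf and/or the built-in profiler (just a
> >sketch, assuming perf is installed and briefly enabling profiling is
> >acceptable; 13833 is the main glusterfsd thread in the listing above):
> >
> >    # sample where the brick process burns CPU
> >    perf top -p 13833
> >
> >    # per-brick FOP latency statistics from gluster itself
> >    gluster volume profile st2 start
> >    gluster volume profile st2 info
> >    gluster volume profile st2 stop
> >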
> >Also, I didn't find any relevant messages in the log files.
> >Honestly, I don't know what to do. Does anyone know how to debug or fix
> >this behaviour?
> >
> >Best regards,
> >Pavel
>