[Gluster-users] How to find out what GlusterFS is doing
Yaniv Kaul
ykaul at redhat.com
Thu Nov 5 14:28:24 UTC 2020
On Thu, Nov 5, 2020 at 4:18 PM mabi <mabi at protonmail.ch> wrote:
> Below is the output of running "top -bHd d" on one of the nodes; maybe
> that can help to show what that glusterfsd process is doing?
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 4375 root 20 0 2856784 120492 8360 D 61.1 0.4 117:09.29 glfs_iotwr001
>
They are waiting for I/O, just like the rest of the threads in the D state.
You may have a slow storage subsystem. How many cores do you have, btw?
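To double-check, something along these lines would show per-device latency
and utilization (iostat is part of the sysstat package; the 5-second
interval is just an example), and nproc prints the core count:

# extended per-device statistics: watch the await and %util columns
iostat -dx 5
# number of online cores
nproc

Also note that the glfs_iotwr* threads are the io-threads worker pool; its
size is capped by the performance.io-thread-count volume option (16 by
default), which matches the iotwr000-00f threads of the first brick above.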
Y.
> 4385 root 20 0 2856784 120492 8360 R 61.1 0.4 117:12.92 glfs_iotwr003
> 4387 root 20 0 2856784 120492 8360 R 61.1 0.4 117:32.19 glfs_iotwr005
> 4388 root 20 0 2856784 120492 8360 R 61.1 0.4 117:28.87 glfs_iotwr006
> 4391 root 20 0 2856784 120492 8360 D 61.1 0.4 117:20.71 glfs_iotwr008
> 4395 root 20 0 2856784 120492 8360 D 61.1 0.4 117:17.22 glfs_iotwr009
> 4405 root 20 0 2856784 120492 8360 R 61.1 0.4 117:19.52 glfs_iotwr00d
> 4406 root 20 0 2856784 120492 8360 R 61.1 0.4 117:29.51 glfs_iotwr00e
> 4366 root 20 0 2856784 120492 8360 D 55.6 0.4 117:27.58 glfs_iotwr000
> 4386 root 20 0 2856784 120492 8360 D 55.6 0.4 117:22.77 glfs_iotwr004
> 4390 root 20 0 2856784 120492 8360 D 55.6 0.4 117:26.49 glfs_iotwr007
> 4396 root 20 0 2856784 120492 8360 R 55.6 0.4 117:23.68 glfs_iotwr00a
> 4376 root 20 0 2856784 120492 8360 D 50.0 0.4 117:36.17 glfs_iotwr002
> 4397 root 20 0 2856784 120492 8360 D 50.0 0.4 117:11.09 glfs_iotwr00b
> 4403 root 20 0 2856784 120492 8360 R 50.0 0.4 117:26.34 glfs_iotwr00c
> 4408 root 20 0 2856784 120492 8360 D 50.0 0.4 117:27.47 glfs_iotwr00f
> 9814 root 20 0 2043684 75208 8424 D 22.2 0.2 50:15.20 glfs_iotwr003
> 28131 root 20 0 2043684 75208 8424 R 22.2 0.2 50:07.46 glfs_iotwr004
> 2208 root 20 0 2043684 75208 8424 R 22.2 0.2 49:32.70 glfs_iotwr008
> 2372 root 20 0 2043684 75208 8424 R 22.2 0.2 49:52.60 glfs_iotwr009
> 2375 root 20 0 2043684 75208 8424 D 22.2 0.2 49:54.08 glfs_iotwr00c
> 767 root 39 19 0 0 0 R 16.7 0.0 67:50.83 dbuf_evict
> 4132 onadmin 20 0 45292 4184 3176 R 16.7 0.0 0:00.04 top
> 28484 root 20 0 2043684 75208 8424 R 11.1 0.2 49:41.34 glfs_iotwr005
> 2376 root 20 0 2043684 75208 8424 R 11.1 0.2 49:49.49 glfs_iotwr00d
> 2719 root 20 0 2043684 75208 8424 R 11.1 0.2 49:58.61 glfs_iotwr00e
> 4384 root 20 0 2856784 120492 8360 S 5.6 0.4 4:01.27 glfs_rpcrqhnd
> 3842 root 20 0 2043684 75208 8424 S 5.6 0.2 0:30.12 glfs_epoll001
> 1 root 20 0 57696 7340 5248 S 0.0 0.0 0:03.59 systemd
> 2 root 20 0 0 0 0 S 0.0 0.0 0:09.57 kthreadd
> 3 root 20 0 0 0 0 S 0.0 0.0 0:00.16 ksoftirqd/0
> 5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
> 7 root 20 0 0 0 0 S 0.0 0.0 0:07.36 rcu_sched
> 8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
> 9 root rt 0 0 0 0 S 0.0 0.0 0:00.03 migration/0
> 10 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 lru-add-drain
> 11 root rt 0 0 0 0 S 0.0 0.0 0:00.01 watchdog/0
> 12 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/0
> 13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/1
>
> Any clues anyone?
>
> The load is really high now, around 20, on both nodes...
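>
> In the meantime I will probably try Gluster's built-in profiling to see
> which file operations the bricks are busy with, something like this (with
> <VOLNAME> being the suspect volume):
>
> gluster volume profile <VOLNAME> start
> # ...let it collect statistics for a while, then:
> gluster volume profile <VOLNAME> info
> gluster volume profile <VOLNAME> stop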
>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, November 5, 2020 11:50 AM, mabi <mabi at protonmail.ch> wrote:
>
> > Hello,
> >
> > I have a 3-node GlusterFS 7.8 replica (including an arbiter) with 3
> > volumes, and the two data nodes (not the arbiter) are under high load
> > because the glusterfsd brick process is taking all CPU resources (12 cores).
> >
> > Checking these two servers with the iostat command shows that the disks
> > are not very busy and are mostly doing write activity. There is not much
> > activity on the FUSE clients either, so I was wondering how to find out why
> > GlusterFS is currently generating such a high load on these two servers
> > (the arbiter does not show any high load). No files are currently healing
> > either. The busy volume is the only one which has the quota enabled, if
> > that might be a hint. So does anyone know how to see why GlusterFS is so
> > busy on a specific volume?
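> >
> > The only other idea I have so far is "gluster volume top", which I believe
> > can list the files with the most open/read/write calls, e.g. (with
> > <VOLNAME> being the busy volume):
> >
> > gluster volume top <VOLNAME> read
> > gluster volume top <VOLNAME> write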
> >
> > Here is a sample "vmstat 60" output from one of the nodes:
> >
> > onadmin at gfs1b:~$ vmstat 60
> > procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
> > r b swpd free buff cache si so bi bo in cs us sy id wa st
> > 9 2 0 22296776 32004 260284 0 0 33 301 153 39 2 60 36 2 0
> > 13 0 0 22244540 32048 260456 0 0 343 2798 10898 367652 2 80 16 1 0
> > 18 0 0 22215740 32056 260672 0 0 308 2524 9892 334537 2 83 14 1 0
> > 18 0 0 22179348 32084 260828 0 0 169 2038 8703 250351 1 88 10 0 0
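> >
> > What strikes me is the cs (context switches) column above at over 300000
> > per second. To see which threads generate them I would probably try
> > pidstat (from the sysstat package) against the brick process, something
> > like:
> >
> > pidstat -wt -p <glusterfsd-pid> 5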
> >
> > I already tried rebooting, but that did not help, and there is nothing
> > special in the log files either.
> >
> > Best regards,
> > Mabi
>
>