[Bugs] [Bug 1695099] New: The number of glusterfs processes keeps increasing, using all available resources
bugzilla at redhat.com
Tue Apr 2 13:21:23 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1695099
Bug ID: 1695099
Summary: The number of glusterfs processes keeps increasing,
using all available resources
Product: GlusterFS
Version: 5
Hardware: x86_64
OS: Linux
Status: NEW
Component: glusterd
Severity: high
Assignee: bugs at gluster.org
Reporter: christian.ihle at drift.oslo.kommune.no
CC: bugs at gluster.org
Target Milestone: ---
Classification: Community
Description of problem:
During normal operation, CPU and memory usage gradually climb to 100%, consumed by a
large number of glusterfs processes. The result is slowness and resource starvation.
The issue started with GlusterFS 5.2 and did not improve with 5.5. We did not see this
issue with 3.12.
Version-Release number of selected component (if applicable):
GlusterFS 5.2 and 5.5
Heketi 8.0.0
CentOS 7.6
How reproducible:
Users of the cluster hit this issue fairly often by creating and deleting volumes in
quick succession from Kubernetes (which uses Heketi to control GlusterFS). Some days
we hit 100% resource usage several times.
Steps to Reproduce:
1. Create volume
2. Delete volume
3. Repeat quickly (see the scripted loop below)
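For reference, a minimal scripted version of these steps (a sketch only; the volume
names, hostnames and brick paths below are placeholders, and in practice the volumes
are created and deleted through Kubernetes/Heketi rather than with the gluster CLI):

  # Hypothetical repro loop against a 3-node replica pool (node1..node3).
  for i in $(seq 1 20); do
      gluster --mode=script volume create repro-$i replica 3 \
          node1:/bricks/repro/brick$i node2:/bricks/repro/brick$i \
          node3:/bricks/repro/brick$i force
      gluster --mode=script volume start repro-$i
      gluster --mode=script volume stop repro-$i
      gluster --mode=script volume delete repro-$i
  done
  # After the loop there should still be only one self-heal daemon per node:
  pgrep -fc 'process-name glustershd'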
Actual results:
CPU and memory usage increase, and the number of glusterfs processes keeps growing. I
have to log in to each node in the cluster and kill old processes to make the nodes
responsive again; otherwise they eventually freeze from resource starvation.
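The manual cleanup is roughly the following (a sketch only; it assumes the pid file
written by glusterd still points at the self-heal daemon it currently manages):

  # Keep the glustershd instance glusterd knows about, kill the leftover ones.
  keep=$(cat /var/run/gluster/glustershd/glustershd.pid)
  for pid in $(pgrep -f 'process-name glustershd'); do
      [ "$pid" != "$keep" ] && kill "$pid"
  done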
Expected results:
CPU and memory usage should spike only briefly and not keep increasing, and there
should be only one glusterfs (glustershd) process per node.
Additional info:
I found some issues that look similar:
* https://github.com/gluster/glusterfs/issues/625
* https://github.com/heketi/heketi/issues/1439
Log output from a period when resource usage increased
(/var/log/glusterfs/glusterd.log):
[2019-04-01 12:07:23.377715] W [MSGID: 101095]
[xlator.c:180:xlator_volopt_dynload] 0-xlator:
/usr/lib64/glusterfs/5.5/xlator/nfs/server.so: cannot open shared object file:
Ingen slik fil eller filkatalog [Norwegian: "No such file or directory"]
[2019-04-01 12:07:23.684561] I [run.c:242:runner_log]
(-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe6f9a) [0x7fd04cc46f9a]
-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe6a65) [0x7fd04cc46a65]
-->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fd058156955] ) 0-management:
Ran script: /var/lib/glusterd/hooks/1/create/post/S10selinux-label-brick.sh
--volname=vol_45653f46dbc8953f876a009b4ea8dd26
[2019-04-01 12:07:26.931683] I [rpc-clnt.c:1000:rpc_clnt_connection_init]
0-snapd: setting frame-timeout to 600
[2019-04-01 12:07:26.932340] I [rpc-clnt.c:1000:rpc_clnt_connection_init]
0-gfproxyd: setting frame-timeout to 600
[2019-04-01 12:07:26.932667] I [MSGID: 106131]
[glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: nfs already stopped
[2019-04-01 12:07:26.932707] I [MSGID: 106568]
[glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: nfs service is
stopped
[2019-04-01 12:07:26.932731] I [MSGID: 106599]
[glusterd-nfs-svc.c:81:glusterd_nfssvc_manager] 0-management: nfs/server.so
xlator is not installed
[2019-04-01 12:07:26.963055] I [MSGID: 106568]
[glusterd-proc-mgmt.c:92:glusterd_proc_stop] 0-management: Stopping glustershd
daemon running in pid: 16020
[2019-04-01 12:07:27.963708] I [MSGID: 106568]
[glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: glustershd service is
stopped
[2019-04-01 12:07:27.963951] I [MSGID: 106567]
[glusterd-svc-mgmt.c:220:glusterd_svc_start] 0-management: Starting glustershd
service
[2019-04-01 12:07:28.985311] I [MSGID: 106131]
[glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: bitd already stopped
[2019-04-01 12:07:28.985478] I [MSGID: 106568]
[glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: bitd service is
stopped
[2019-04-01 12:07:28.989024] I [MSGID: 106131]
[glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: scrub already
stopped
[2019-04-01 12:07:28.989098] I [MSGID: 106568]
[glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: scrub service is
stopped
[2019-04-01 12:07:29.299841] I [run.c:242:runner_log]
(-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe6f9a) [0x7fd04cc46f9a]
-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe6a65) [0x7fd04cc46a65]
-->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fd058156955] ) 0-management:
Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
--volname=vol_45653f46dbc8953f876a009b4ea8dd26 --first=no --version=1
--volume-op=start --gd-workdir=/var/lib/glusterd
[2019-04-01 12:07:29.338437] E [run.c:242:runner_log]
(-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe6f9a) [0x7fd04cc46f9a]
-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe69c3) [0x7fd04cc469c3]
-->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fd058156955] ) 0-management:
Failed to execute script:
/var/lib/glusterd/hooks/1/start/post/S30samba-start.sh
--volname=vol_45653f46dbc8953f876a009b4ea8dd26 --first=no --version=1
--volume-op=start --gd-workdir=/var/lib/glusterd
[2019-04-01 12:07:52.658922] I [run.c:242:runner_log]
(-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x3b2dd) [0x7fd04cb9b2dd]
-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe6a65) [0x7fd04cc46a65]
-->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fd058156955] ) 0-management:
Ran script: /var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh
--volname=vol_c5112b1e28a7bbc96640a8572009c6f0 --last=no
[2019-04-01 12:07:52.679220] E [run.c:242:runner_log]
(-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x3b2dd) [0x7fd04cb9b2dd]
-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe69c3) [0x7fd04cc469c3]
-->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fd058156955] ) 0-management:
Failed to execute script: /var/lib/glusterd/hooks/1/stop/pre/S30samba-stop.sh
--volname=vol_c5112b1e28a7bbc96640a8572009c6f0 --last=no
[2019-04-01 12:07:52.681081] I [MSGID: 106542]
[glusterd-utils.c:8440:glusterd_brick_signal] 0-glusterd: sending signal 15 to
brick with pid 27595
[2019-04-01 12:07:53.732699] I [MSGID: 106599]
[glusterd-nfs-svc.c:81:glusterd_nfssvc_manager] 0-management: nfs/server.so
xlator is not installed
[2019-04-01 12:07:53.791560] I [MSGID: 106568]
[glusterd-proc-mgmt.c:92:glusterd_proc_stop] 0-management: Stopping glustershd
daemon running in pid: 18583
[2019-04-01 12:07:53.791857] I [MSGID: 106143]
[glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick
/var/lib/heketi/mounts/vg_799fbf11286fbf497605bbe58c3e9dfa/brick_08bfe132dad6099ab387555298466ca3/brick
on port 49162
[2019-04-01 12:07:53.822032] I [MSGID: 106006]
[glusterd-svc-mgmt.c:356:glusterd_svc_common_rpc_notify] 0-management:
glustershd has disconnected from glusterd.
[2019-04-01 12:07:54.792497] I [MSGID: 106568]
[glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: glustershd service is
stopped
[2019-04-01 12:07:54.792736] I [MSGID: 106567]
[glusterd-svc-mgmt.c:220:glusterd_svc_start] 0-management: Starting glustershd
service
[2019-04-01 12:07:55.812655] I [MSGID: 106131]
[glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: bitd already stopped
[2019-04-01 12:07:55.812837] I [MSGID: 106568]
[glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: bitd service is
stopped
[2019-04-01 12:07:55.816580] I [MSGID: 106131]
[glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: scrub already
stopped
[2019-04-01 12:07:55.816672] I [MSGID: 106568]
[glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: scrub service is
stopped
[2019-04-01 12:07:59.829927] I [run.c:242:runner_log]
(-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x3b2dd) [0x7fd04cb9b2dd]
-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe6a65) [0x7fd04cc46a65]
-->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fd058156955] ) 0-management:
Ran script: /var/lib/glusterd/hooks/1/delete/pre/S10selinux-del-fcontext.sh
--volname=vol_c5112b1e28a7bbc96640a8572009c6f0
[2019-04-01 12:07:59.951300] I [MSGID: 106495]
[glusterd-handler.c:3118:__glusterd_handle_getwd] 0-glusterd: Received getwd
req
[2019-04-01 12:07:59.967584] I [run.c:242:runner_log]
(-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe6f9a) [0x7fd04cc46f9a]
-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe6a65) [0x7fd04cc46a65]
-->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fd058156955] ) 0-management:
Ran script: /var/lib/glusterd/hooks/1/delete/post/S57glusterfind-delete-post
--volname=vol_c5112b1e28a7bbc96640a8572009c6f0
[2019-04-01 12:07:53.732626] I [MSGID: 106131]
[glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: nfs already stopped
[2019-04-01 12:07:53.732677] I [MSGID: 106568]
[glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: nfs service is
stopped
Examples of errors about deleted volumes from /var/log/glusterfs/glustershd.log
- we get gigabytes of these every day:
[2019-04-02 09:57:08.997572] E [MSGID: 108006]
[afr-common.c:5314:__afr_handle_child_down_event]
10-vol_3be8a34875cc37098593d4bc8740477b-replicate-0: All subvolumes are down.
Going offline until at least one of them comes back up.
[2019-04-02 09:57:09.033441] E [MSGID: 108006]
[afr-common.c:5314:__afr_handle_child_down_event]
26-vol_2399e6ef0347ac569a0b1211f1fd109d-replicate-0: All subvolumes are down.
Going offline until at least one of them comes back up.
[2019-04-02 09:57:09.036003] E [MSGID: 108006]
[afr-common.c:5314:__afr_handle_child_down_event]
40-vol_fafddd8a937a550fbefc6c54830ce44f-replicate-0: All subvolumes are down.
Going offline until at least one of them comes back up.
[2019-04-02 09:57:09.077109] E [MSGID: 108006]
[afr-common.c:5314:__afr_handle_child_down_event]
2-vol_bca47201841f5b50d341eb2bedf5cd46-replicate-0: All subvolumes are down.
Going offline until at least one of them comes back up.
[2019-04-02 09:57:09.103495] E [MSGID: 108006]
[afr-common.c:5314:__afr_handle_child_down_event]
24-vol_fafddd8a937a550fbefc6c54830ce44f-replicate-0: All subvolumes are down.
Going offline until at least one of them comes back up.
[2019-04-02 09:57:09.455818] E [MSGID: 108006]
[afr-common.c:5314:__afr_handle_child_down_event]
30-vol_fafddd8a937a550fbefc6c54830ce44f-replicate-0: All subvolumes are down.
Going offline until at least one of them comes back up.
[2019-04-02 09:57:09.511070] E [MSGID: 108006]
[afr-common.c:5314:__afr_handle_child_down_event]
14-vol_cf3700764dfdce40d60b89fde7e1a643-replicate-0: All subvolumes are down.
Going offline until at least one of them comes back up.
[2019-04-02 09:57:09.490714] E [MSGID: 108006]
[afr-common.c:5314:__afr_handle_child_down_event]
0-vol_c5112b1e28a7bbc96640a8572009c6f0-replicate-0: All subvolumes are down.
Going offline until at least one of them comes back up.
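A quick way to see how fast this log grows and which (already deleted) volumes the
stale self-heal daemons keep complaining about (assuming the message format shown
above):

  grep -c 'All subvolumes are down' /var/log/glusterfs/glustershd.log
  grep -o 'vol_[0-9a-f]*-replicate-0: All subvolumes are down' \
      /var/log/glusterfs/glustershd.log | sort | uniq -c | sort -rn | head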
Example of concurrent glusterfs processes on a node:
root 4559 16.8 6.5 14882048 1060288 ? Ssl Apr01 206:00
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 6507 14.7 6.1 14250324 998280 ? Ssl Apr01 178:33
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 6743 0.0 1.2 4780344 201708 ? Ssl Apr01 0:35
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 7660 17.0 6.3 14859244 1027432 ? Ssl Apr01 206:32
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 7789 0.1 1.5 5390364 250200 ? Ssl Apr01 1:08
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 9259 16.4 6.3 14841432 1029512 ? Ssl Apr01 198:12
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 12394 14.0 5.6 13549044 918424 ? Ssl Apr01 167:46
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 14980 9.2 4.7 11657716 778876 ? Ssl Apr01 110:10
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 16032 8.2 4.4 11040436 716020 ? Ssl Apr01 97:39
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 23961 6.3 3.7 9807736 610408 ? Ssl Apr01 62:03
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 25560 2.8 3.0 8474704 503488 ? Ssl Apr01 27:33
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 26293 3.2 1.2 4812208 200896 ? Ssl 09:26 0:35
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 28205 1.3 1.8 5992016 300012 ? Ssl Apr01 13:31
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 29186 1.4 2.1 6669800 352440 ? Ssl Apr01 13:59
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 30485 0.9 0.6 3527080 101552 ? Ssl 09:35 0:05
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 31171 1.0 0.6 3562360 104908 ? Ssl 09:35 0:05
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
root 32086 0.6 0.3 2925412 54852 ? Ssl 09:35 0:03
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/b1de56a0bbcc8779.socket --xlator-option
*replicate*.node-uuid=24419492-0d80-4a2a-9420-1dd92515eaf1 --process-name
glustershd
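To show the age and memory of each duplicate daemon (the same information as the ps
listing above, just narrowed down):

  ps -C glusterfs -o pid,etime,rss,args | grep 'process-name glustershd'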
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.