[Bugs] [Bug 1711400] New: Dispersed volumes leave open file descriptors on nodes
bugzilla at redhat.com
Fri May 17 17:05:38 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1711400
Bug ID: 1711400
Summary: Dispersed volumes leave open file descriptors on nodes
Product: GlusterFS
Version: 4.1
Hardware: x86_64
OS: Linux
Status: NEW
Component: disperse
Severity: medium
Assignee: bugs at gluster.org
Reporter: maclemming+redhat at gmail.com
CC: bugs at gluster.org
Target Milestone: ---
Classification: Community
We have a 5-node Gluster cluster running 4.1.7. The Gluster service runs
as a container on each host, with the host's `/srv/brick` directory mounted
into the container.
We have several volumes set up as dispersed volumes. The Gluster client is a
Kubernetes cluster, and one of the apps running there is a devpi server. After
the devpi server had been running for a couple of days, we noticed that the
Gluster servers were unresponsive. Trying to ssh to any of the nodes gave an
error about too many open files. We eventually had to reboot all of the
servers to recover them.
The next day we checked again and saw that the glusterfs process responsible
for the devpi volume had 3 million files open (as seen with the command
`sudo lsof -a -p <pid> | wc -l`). Stopping the container did not free the
file descriptors; only stopping and starting the volume released them.
However, as soon as devpi starts serving files again, the open FD count
starts climbing again.
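(When the count is in the millions, listing `/proc/<pid>/fd` is much faster than a full lsof run; the numbers differ slightly because lsof also lists mapped files and sockets. A rough equivalent, with <pid> being the same brick process PID:)
sudo ls /proc/<pid>/fd | wc -l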
I was able to narrow the problem down to file writes. Here are the
reproduction steps:
Create a Gluster dispersed volume:
gluster volume create fd-test disperse 5 redundancy 2 \
    srv1:/path srv2:/path srv3:/path srv4:/path srv5:/path
gluster volume quota fd-test enable
gluster volume quota fd-test limit-usage / 1GB
Mount the volume on a client host (a sample mount command follows the script),
and run this simple script inside the mounted volume:
#!/bin/bash
while true
do
    echo "something\n" > file.txt
    sleep 1
done
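(The mount itself would look roughly like this on a client; the mount point /mnt/fd-test is just an example:)
mount -t glusterfs srv1:/fd-test /mnt/fd-test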
From any one of the Gluster nodes, find the PID of the Gluster process for the
volume, and run the following command repeatedly (here, every 5 seconds) to see
the number of FDs:
admin at gfs-01:~$ sudo lsof -a -p 11606 | wc -l
26
admin at gfs-01:~$ sudo lsof -a -p 11606 | wc -l
30
admin at gfs-01:~$ sudo lsof -a -p 11606 | wc -l
35
admin at gfs-01:~$ sudo lsof -a -p 11606 | wc -l
40
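The brick PID itself can be read from `gluster volume status fd-test`, which lists the PID of each brick process. To sample the count automatically instead of re-running lsof by hand, a loop along these lines works (same PID as above):
while true; do sudo lsof -a -p 11606 | wc -l; sleep 5; done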
If you take out the sleep, the FD count jumps by thousands every second.
If you view the actual FDs without piping to `wc`, almost all of them refer
to the same file:
glusterfs 11606 root 1935w REG 8,17 18944 53215266 /srv/brick/fd-test/.glusterfs/2e/4c/2e4c7104-02c4-4ac9-b611-7290938a6e3f
glusterfs 11606 root 1936w REG 8,17 18944 53215266 /srv/brick/fd-test/.glusterfs/2e/4c/2e4c7104-02c4-4ac9-b611-7290938a6e3f
glusterfs 11606 root 1937w REG 8,17 18944 53215266 /srv/brick/fd-test/.glusterfs/2e/4c/2e4c7104-02c4-4ac9-b611-7290938a6e3f
glusterfs 11606 root 1938w REG 8,17 18944 53215266 /srv/brick/fd-test/.glusterfs/2e/4c/2e4c7104-02c4-4ac9-b611-7290938a6e3f
glusterfs 11606 root 1939w REG 8,17 18944 53215266 /srv/brick/fd-test/.glusterfs/2e/4c/2e4c7104-02c4-4ac9-b611-7290938a6e3f
glusterfs 11606 root 1940w REG 8,17 18944 53215266 /srv/brick/fd-test/.glusterfs/2e/4c/2e4c7104-02c4-4ac9-b611-7290938a6e3f
glusterfs 11606 root 1941w REG 8,17 18944 53215266 /srv/brick/fd-test/.glusterfs/2e/4c/2e4c7104-02c4-4ac9-b611-7290938a6e3f
glusterfs 11606 root 1942w REG 8,17 18944 53215266 /srv/brick/fd-test/.glusterfs/2e/4c/2e4c7104-02c4-4ac9-b611-7290938a6e3f
glusterfs 11606 root 1943w REG 8,17 18944 53215266 /srv/brick/fd-test/.glusterfs/2e/4c/2e4c7104-02c4-4ac9-b611-7290938a6e3f
glusterfs 11606 root 1944w REG 8,17 18944 53215266 /srv/brick/fd-test/.glusterfs/2e/4c/2e4c7104-02c4-4ac9-b611-7290938a6e3f
glusterfs 11606 root 1945w REG 8,17 18944 53215266 /srv/brick/fd-test/.glusterfs/2e/4c/2e4c7104-02c4-4ac9-b611-7290938a6e3f
glusterfs 11606 root 1946w REG 8,17 18944 53215266 /srv/brick/fd-test/.glusterfs/2e/4c/2e4c7104-02c4-4ac9-b611-7290938a6e3f
glusterfs 11606 root 1947w REG 8,17 18944 53215266 /srv/brick/fd-test/.glusterfs/2e/4c/2e4c7104-02c4-4ac9-b611-7290938a6e3f
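(To confirm that nearly all of the FDs point at the same backing file, the lsof output can be grouped by the NAME column; a rough one-liner, same PID as above:)
sudo lsof -a -p 11606 | awk '{print $NF}' | sort | uniq -c | sort -rn | head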
The container itself does not show any open FDs; they are only visible on the
Gluster host. We also created a replicated volume and moved the devpi data to
it, and it worked fine without leaving open FDs (a constant 90 FDs open), so
the problem appears to be specific to dispersed volumes.
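(For reference, the replicated volume used for that comparison was created along these lines; the volume name and replica count here are only illustrative:)
gluster volume create fd-test-repl replica 3 srv1:/path srv2:/path srv3:/path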