[Bugs] [Bug 1360689] New: One client can effectively hang entire gluster array

bugzilla at redhat.com bugzilla at redhat.com
Wed Jul 27 10:51:54 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1360689

            Bug ID: 1360689
           Summary: One client can effectively hang entire gluster array
           Product: GlusterFS
           Version: mainline
         Component: core
          Assignee: bugs at gluster.org
          Reporter: rgowdapp at redhat.com
                CC: bugs at gluster.org



Description of problem:
Filing this bug on behalf of "Patrick Glomski" <patrick.glomski at corvidtec.com>

The entire discussion can be found in this mail thread:
http://www.gluster.org/pipermail/gluster-devel/2016-July/050012.html

TL;DR: One gluster client can essentially cause a denial of service /
availability loss for the entire gluster array. There's no way to stop it and
almost no way to find the bad client. Probably all versions (at least 3.6 and
3.7) are affected.

We have two large replicate gluster arrays (3.6.6 and 3.7.11) that are used in
a high-performance computing environment. Two file-access patterns cause
severe issues with glusterfs: some of our scientific codes write hundreds of
files (~400-500) simultaneously (one or more files per processor core, so lots
of small or large writes), and others read thousands of files (2000-3000)
simultaneously to grab metadata from each file (lots of small reads).
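
For illustration, a rough (untested) sketch of these two access patterns
against a gluster mount might look like the following; the mount point, worker
count, and file sizes are hypothetical placeholders, not the real job
parameters:

#!/usr/bin/env python3
# Rough, untested sketch of the two access patterns described above, run
# against a gluster mount.  MOUNT, NUM_WORKERS, and the file sizes are
# hypothetical placeholders; the real jobs write multi-gigabyte files and
# read 2000-3000 existing files.
import multiprocessing
import os

MOUNT = "/mnt/gluster/scratch"   # hypothetical gluster mount point
NUM_WORKERS = 480                # ~400-500 simultaneous writers, as described
CHUNK = b"x" * (1 << 20)         # 1 MiB write chunks
CHUNKS_PER_FILE = 64             # 64 MiB per file here; real files are larger

def writer(rank):
    # Pattern 1: every worker writes its own output file at the same time.
    with open(os.path.join(MOUNT, "out_%04d.dat" % rank), "wb") as f:
        for _ in range(CHUNKS_PER_FILE):
            f.write(CHUNK)

def header_reader(path):
    # Pattern 2: open a file and read only a small "metadata" chunk from it.
    with open(path, "rb") as f:
        return f.read(4096)

if __name__ == "__main__":
    paths = [os.path.join(MOUNT, "out_%04d.dat" % i) for i in range(NUM_WORKERS)]
    with multiprocessing.Pool(NUM_WORKERS) as pool:
        pool.map(writer, range(NUM_WORKERS))   # lots of simultaneous writes
        pool.map(header_reader, paths)         # lots of simultaneous small reads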

In either of these situations, one glusterfsd process on whichever peer the
client is currently talking to will skyrocket to *nproc* CPU usage (800%,
1600%) and the storage cluster becomes essentially useless; all other clients
will eventually try to read or write data to the overloaded peer and, when
that happens, their connections will hang. Heals between peers hang because
the load on the peer is around 1.5x the number of cores or more. This occurs
on both gluster 3.6 and 3.7, is very repeatable, and happens much too
frequently.

Even worse, there seems to be no definitive way to diagnose which client is
causing the issues. Getting 'volume status <> clients' doesn't help because it
only reports the total number of bytes read/written by each client: (a) the
metadata in question is tiny compared to the multi-gigabyte output files being
dealt with, and (b) the byte count is cumulative and the compute nodes are
always up with the filesystems mounted, so the transfer counts are
astronomical. The best solution I've come up with is to blackhole-route
traffic from clients one at a time (effectively pushing the traffic over to
the other peer), wait a few minutes for the backlogged traffic to dissipate
(if it's going to), see if the load on glusterfsd drops, and repeat until I
find the client causing the issue. I would *love* any ideas on a better way to
find rogue clients.
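
For reference, a rough (untested) sketch of that elimination procedure, run as
root on the overloaded peer; the client list, settle time, and CPU threshold
below are hypothetical placeholders rather than values from this report:

#!/usr/bin/env python3
# Rough, untested sketch of the elimination procedure above: blackhole-route
# one suspect client at a time on the overloaded peer, wait for the backlog to
# drain, and check whether glusterfsd's CPU usage drops.  Run as root.
import subprocess
import time

CLIENTS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # suspect client IPs (placeholder)
SETTLE_SECONDS = 300      # "wait a few minutes" for backlogged traffic
CPU_OK_THRESHOLD = 200.0  # total glusterfsd %CPU considered "load has dropped"

def glusterfsd_cpu():
    # Total %CPU across all glusterfsd processes on this peer, via ps.
    out = subprocess.run(["ps", "-C", "glusterfsd", "-o", "%cpu="],
                         capture_output=True, text=True).stdout
    return sum(float(v) for v in out.split())

for client in CLIENTS:
    # Drop this client's traffic so it fails over to the other peer.
    subprocess.run(["ip", "route", "add", "blackhole", client], check=True)
    time.sleep(SETTLE_SECONDS)
    cpu = glusterfsd_cpu()
    # Restore normal routing before testing the next candidate.
    subprocess.run(["ip", "route", "del", "blackhole", client], check=True)
    print("%s blocked -> glusterfsd at %.0f%% CPU" % (client, cpu))
    if cpu < CPU_OK_THRESHOLD:
        print("Load dropped; %s is likely the rogue client." % client)
        break
else:
    print("No single client stood out.")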

More importantly, though, there needs to be some mechanism enforced to stop
one user from having the capability to render the entire filesystem
unavailable for all other users. In the worst case, I would even prefer a
gluster volume option that simply disconnects clients making more than some
threshold of file open requests. That would be WAY preferable to a complete
availability loss reminiscent of a DDoS attack...

Version-Release number of selected component (if applicable):
GlusterFS 3.6.6 and 3.7.11 (probably all versions are affected)

How reproducible:
Very reproducible; happens much too frequently under the workloads described
above.

Steps to Reproduce:
1. Mount a replicate volume on a large number of compute nodes.
2. From one client, write hundreds of files (~400-500) simultaneously, or read
   thousands of files (2000-3000) simultaneously to gather their metadata.
3. Watch glusterfsd on the peer serving that client climb to *nproc* CPU usage
   while operations from other clients hang.

Actual results:
One glusterfsd process is pegged at *nproc* CPU usage, heals hang, and every
other client that touches the overloaded peer hangs as well.

Expected results:
A single client's workload should not be able to render the entire volume
unavailable to all other clients.

Additional info:
