[Bugs] [Bug 1381970] GlusterFS Daemon stops working after a longer runtime and higher file workload due to design flaws ?

bugzilla at redhat.com bugzilla at redhat.com
Fri Dec 2 18:54:46 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1381970



--- Comment #4 from Giuseppe Ragusa <giuseppe.ragusa at hotmail.com> ---
Further notes:

After extensive log reading and data cross-checks, we found no corruption of
user data; moreover, users did not experience any visible errors on the NFS
automounted areas (mounted "hard"), only some delays, while a web re-shared
area returned some transient 404s.

All NFS clients (both physical and virtual, hosted on the hyperconverged
infrastructure) are on a dedicated jumbo-frame (MTU 9000) VLAN, while the
Gluster cluster has been formed and relegated (by means of DNS node names) to a
separate dedicated jumbo-frame (MTU 9000) VLAN with no access from virtual
machines (no Linux bridge over the Gluster node LACP bonds).
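
As a quick reference, a minimal sketch of how the jumbo-frame and bridge layout
can be checked on a Gluster node (bond0 and bond0.5 are hypothetical device
names, not the actual ones in use):

# verify MTU 9000 on the LACP bond and on the Gluster VLAN sub-interface
ip -d link show bond0 | grep -o 'mtu [0-9]*'
ip -d link show bond0.5 | grep -o 'mtu [0-9]*'
# confirm that no Linux bridge is stacked on the Gluster VLAN
brctl show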

We disabled all offloads on the Gluster node NICs involved in NFS traffic
(since on the Gluster nodes the access to the NFS VLAN is by means of a
LACP-bonded-then-bridged link).
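
A hedged sketch of that step (em1 and em2 are hypothetical names for the bond
slaves carrying NFS traffic; the exact set of offloads supported varies per
NIC):

# disable the usual offloads on the physical NICs behind the NFS bond/bridge
for nic in em1 em2; do
    ethtool -K "$nic" tso off gso off gro off lro off
done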

We tried increasing server.outstanding-rpc-limit to 256 (often suggested on the
Gluster mailing list, albeit mainly for performance reasons).

We tried lowering nfs.outstanding-rpc-limit to 8 (cf. BZ #1008301).
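
For reference, both limits are applied per volume with the gluster CLI, e.g. on
the home volume described further below:

gluster volume set home server.outstanding-rpc-limit 256
gluster volume set home nfs.outstanding-rpc-limit 8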

We fixed an sm-notify problem: since the proper hostnames of the NFS clients
map to public LAN IPs, sm-notify failed to reach them after each Gluster NFS
restart because NFS traffic on the public interface is denied by iptables
rules. We fixed this by forcing the clients' public names to resolve to their
NFS private IPs in /etc/hosts on each Gluster node.
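
A minimal sketch of that /etc/hosts override on each Gluster node (client
hostnames and the final octets are hypothetical placeholders; 172.25.15.0/24 is
the NFS VLAN):

# force the NFS clients' public names onto their NFS-VLAN addresses,
# so sm-notify reaches them through the permitted interface
172.25.15.101   client1.example.com client1
172.25.15.102   client2.example.com client2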

We found that oVirt issues a constant stream of Gluster volume queries by means
of VDSM (more than one per second); these should be informative only, although
the load they generate could matter.

We also found a couple of still-pending but probably irrelevant VDSM errors
(one related to the split-network setup, which should be solved in more recent
VDSM versions; the other a "device" error with a Python backtrace while
invoking glusterVolumeStatus, which does not fail if invoked by hand with
VdsClient).

Nothing of the above solved the crash problem.

As a sample, this is the configuration of one of the involved volumes:

Volume Name: home
Type: Distributed-Replicate
Volume ID: 31636f1a-79d7-4065-b345-c14a727330ac
Status: Started
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: read.gluster.private:/srv/glusterfs/disk0/home_brick
Brick2: hall.gluster.private:/srv/glusterfs/disk0/home_brick
Brick3: shockley.gluster.private:/srv/glusterfs/disk0/home_arbiter_brick (arbiter)
Brick4: read.gluster.private:/srv/glusterfs/disk1/home_brick
Brick5: hall.gluster.private:/srv/glusterfs/disk1/home_brick
Brick6: shockley.gluster.private:/srv/glusterfs/disk1/home_arbiter_brick (arbiter)
Brick7: read.gluster.private:/srv/glusterfs/disk2/home_brick
Brick8: hall.gluster.private:/srv/glusterfs/disk2/home_brick
Brick9: shockley.gluster.private:/srv/glusterfs/disk2/home_arbiter_brick (arbiter)
Options Reconfigured:
nfs.acl: off
nfs.addr-namelookup: off
network.ping-timeout: 10
cluster.server-quorum-type: server
cluster.quorum-type: auto
storage.batch-fsync-delay-usec: 0
nfs.enable-ino32: off
cluster.lookup-optimize: on
performance.write-behind-window-size: 32MB
performance.write-behind: on
performance.io-thread-count: 8
performance.cache-refresh-timeout: 4
performance.stat-prefetch: off
client.bind-insecure: on
server.allow-insecure: on
storage.owner-gid: 0
storage.owner-uid: 0
nfs.auth-cache-ttl-sec: 600
nfs.auth-refresh-interval-sec: 600
nfs.exports-auth-enable: off
nfs.disable: off
user.cifs: disable
performance.readdir-ahead: on

I will attach xz-compressed tar archives of the /var/log/gluster dir on each
node.

Brief summary of the IPs involved:

shockley (arbiter node):
NFS: 172.25.15.21
GlusterFS: 172.25.5.21

read (node1):
NFS: 172.25.15.22
GlusterFS: 172.25.5.22

hall (node2):
NFS: 172.25.15.23
GlusterFS: 172.25.5.23

The CTDB cluster assigns (only between node1 and node2) the additional NFS IPs:
172.25.15.202
172.25.15.203
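
These floating IPs correspond to entries in /etc/ctdb/public_addresses on node1
and node2, roughly as follows (the interface name and netmask are assumptions):

172.25.15.202/24 bond0.15
172.25.15.203/24 bond0.15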

Other relevant configuration notes:

Gluster/CTDB CPU usage is constrained by means of a systemd slice (with
CPUShares=1024).
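
A minimal sketch of that setup, assuming a slice unit named gluster.slice and a
drop-in for the glusterd service (unit and file names are illustrative, not
necessarily the ones in use):

# /etc/systemd/system/gluster.slice
[Unit]
Description=CPU-limited slice for Gluster and CTDB

[Slice]
CPUAccounting=yes
CPUShares=1024

# /etc/systemd/system/glusterd.service.d/slice.conf (drop-in)
[Service]
Slice=gluster.slice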
