[Gluster-users] ganesha.nfsd process dies when copying files

Karli Sjöberg karli at inparadise.se
Fri Aug 10 12:08:57 UTC 2018


Hey all!

I am playing around on my computer with setting up a virtual
mini-cluster of five VMs:

1x router
1x client
3x Gluster/NFS-Ganesha servers

The router is pfSense, the client is Xubuntu 18.04 and the servers are
CentOS 7.5.

I set up the cluster using 'gdeploy' with configuration snippets taken
from the oVirt/Cockpit HCI setup and another snippet for setting up the
NFS-Ganesha part of it. The deployment succeeds apart from some minor
details I had to debug, but I'm fairly sure I haven't missed anything
obvious.
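
For completeness, the merged config is applied in the usual way (the
file name here is just what I called it locally):

hv01# gdeploy -c hci-ganesha.conf

The full configuration is attached below.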

All of the VMs are registered in pfSense's DNS, as well as the VIPs
for the NFS-Ganesha nodes, which works great; the client has no
issues resolving any of the names.

hv01.localdomain	192.168.1.101
hv02.localdomain	192.168.1.102
hv03.localdomain	192.168.1.103
hv01v.localdomain	192.168.1.110
hv02v.localdomain	192.168.1.111
hv03v.localdomain	192.168.1.112
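
Name resolution can be spot-checked from the client with something
like this (no issues seen for either the node names or the VIPs):

client$ getent hosts hv01.localdomain
client$ getent hosts hv01v.localdomain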

The cluster status is HEALTHY according to
'/usr/libexec/ganesha/ganesha-ha.sh' before I start my tests:

client# mount -t nfs -o vers=4.1 hv01v.localdomain:/data /mnt
client# dd if=/dev/urandom of=/var/tmp/test.bin bs=1M count=1024
client# while true; do rsync /var/tmp/test.bin /mnt/; rm -f /mnt/test.bin; done
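
For reference, the status check mentioned above is run on one of the
servers roughly like this (quoting the invocation from memory, so the
exact flag and shared-storage path may differ slightly):

hv01# /usr/libexec/ganesha/ganesha-ha.sh --status /run/gluster/shared_storage/nfs-ganesha
hv01# pcs status    # the underlying pacemaker view of the same thing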

Then after a while, the 'nfs-ganesha' service unexpectedly dies and
doesn't restart by itself. After a short interruption the copy loop
resumes against 'hv02', and then history repeats itself until the
'nfs-ganesha' services on all of the nodes are dead.

With normal logging enabled, the dying node says nothing before it
goes down (sudden heart attack syndrome), so no clues there, and the
remaining nodes only say that they have taken over...
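
Next on my list is to check whether the crash leaves any trace outside
the ganesha log, along these lines (I haven't verified yet whether
abrt or systemd-coredump actually catches anything on these nodes):

hv01# journalctl -u nfs-ganesha -b
hv01# coredumpctl list ganesha.nfsd    # only if systemd-coredump is enabled
hv01# ls -l /var/spool/abrt/           # abrt crash directories, if abrtd is running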

Right now I'm running with FULL_DEBUG, which makes testing very
difficult since throughput is down to a crawl. Nothing strange about
that; it just takes a lot more time to provoke the crash.
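
In case it matters, FULL_DEBUG is enabled with roughly this block in
/etc/ganesha/ganesha.conf (reproduced from memory, so treat the exact
layout as approximate):

LOG {
        Default_Log_Level = FULL_DEBUG;
}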

Please don't hesitate to ask for more information in case there's
something else you'd like me to share!

I'm hoping someone recognizes this behaviour and knows what I'm doing
wrong :)

glusterfs-client-xlators-3.10.12-1.el7.x86_64
glusterfs-api-3.10.12-1.el7.x86_64
nfs-ganesha-2.4.5-1.el7.x86_64
centos-release-gluster310-1.0-1.el7.centos.noarch
glusterfs-3.10.12-1.el7.x86_64
glusterfs-cli-3.10.12-1.el7.x86_64
nfs-ganesha-gluster-2.4.5-1.el7.x86_64
glusterfs-server-3.10.12-1.el7.x86_64
glusterfs-libs-3.10.12-1.el7.x86_64
glusterfs-fuse-3.10.12-1.el7.x86_64
glusterfs-ganesha-3.10.12-1.el7.x86_64

Thanks in advance!

/K
-------------- next part --------------
#gdeploy configuration generated by cockpit-gluster plugin
[hosts]
hv01.localdomain
hv02.localdomain
hv03.localdomain

[yum]
action=install
repolist=
gpgcheck=no
update=no
packages=glusterfs-server,glusterfs-api,glusterfs-ganesha,nfs-ganesha,nfs-ganesha-gluster,policycoreutils-python,device-mapper-multipath,corosync,pacemaker,pcs

[script1:hv01.localdomain]
action=execute
ignore_script_errors=no
file=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d vdb -h hv01.localdomain,hv02.localdomain,hv03.localdomain

[script1:hv02.localdomain]
action=execute
ignore_script_errors=no
file=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d vdb -h hv01.localdomain,hv02.localdomain,hv03.localdomain

[script1:hv03.localdomain]
action=execute
ignore_script_errors=no
file=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d vdb -h hv01.localdomain,hv02.localdomain,hv03.localdomain

[disktype]
jbod

[diskcount]
12

[stripesize]
256

[service1]
action=enable
service=chronyd

[service2]
action=restart
service=chronyd

[script3]
action=execute
file=/usr/share/gdeploy/scripts/blacklist_all_disks.sh
ignore_script_errors=no

[pv1:hv01.localdomain]
action=create
devices=vdb
ignore_pv_errors=no

[pv1:hv02.localdomain]
action=create
devices=vdb
ignore_pv_errors=no

[pv1:hv03.localdomain]
action=create
devices=vdb
ignore_pv_errors=no

[vg1:hv01.localdomain]
action=create
vgname=gluster_vg_vdb
pvname=vdb
ignore_vg_errors=no

[vg1:hv02.localdomain]
action=create
vgname=gluster_vg_vdb
pvname=vdb
ignore_vg_errors=no

[vg1:hv03.localdomain]
action=create
vgname=gluster_vg_vdb
pvname=vdb
ignore_vg_errors=no

[lv1:hv01.localdomain]
action=create
poolname=gluster_thinpool_vdb
ignore_lv_errors=no
vgname=gluster_vg_vdb
lvtype=thinpool
size=450GB
poolmetadatasize=3GB

[lv2:hv02.localdomain]
action=create
poolname=gluster_thinpool_vdb
ignore_lv_errors=no
vgname=gluster_vg_vdb
lvtype=thinpool
size=450GB
poolmetadatasize=3GB

[lv3:hv03.localdomain]
action=create
poolname=gluster_thinpool_vdb
ignore_lv_errors=no
vgname=gluster_vg_vdb
lvtype=thinpool
size=45GB
poolmetadatasize=1GB

[lv4:hv01.localdomain]
action=create
lvname=gluster_lv_data
ignore_lv_errors=no
vgname=gluster_vg_vdb
mount=/gluster_bricks/data
lvtype=thinlv
poolname=gluster_thinpool_vdb
virtualsize=450GB

[lv5:hv02.localdomain]
action=create
lvname=gluster_lv_data
ignore_lv_errors=no
vgname=gluster_vg_vdb
mount=/gluster_bricks/data
lvtype=thinlv
poolname=gluster_thinpool_vdb
virtualsize=450GB

[lv6:hv03.localdomain]
action=create
lvname=gluster_lv_data
ignore_lv_errors=no
vgname=gluster_vg_vdb
mount=/gluster_bricks/data
lvtype=thinlv
poolname=gluster_thinpool_vdb
virtualsize=45GB

[selinux]
yes

[service3]
action=restart
service=glusterd
slice_setup=yes

[firewalld]
action=add
ports=111/tcp,2049/tcp,54321/tcp,5900/tcp,5900-6923/tcp,5666/tcp,16514/tcp,662/tcp,662/udp,892/tcp,892/udp,2020/tcp,2020/udp,875/tcp,875/udp
services=glusterfs,nfs,rpc-bind,high-availability,mountd

[script2]
action=execute
file=/usr/share/gdeploy/scripts/disable-gluster-hooks.sh

[volume1]
action=create
volname=data
transport=tcp
replica=yes
replica_count=3
key=group,storage.owner-uid,storage.owner-gid,network.ping-timeout,performance.strict-o-direct,network.remote-dio,cluster.granular-entry-heal
value=virt,0,0,30,on,off,enable
brick_dirs=hv01.localdomain:/gluster_bricks/data/data,hv02.localdomain:/gluster_bricks/data/data,hv03.localdomain:/gluster_bricks/data/data
ignore_volume_errors=no
arbiter_count=1

[nfs-ganesha]
action=create-cluster
ha-name=ganesha-ha-360
cluster-nodes=hv01.localdomain,hv02.localdomain,hv03.localdomain
vip=192.168.1.110,192.168.1.111,192.168.1.112
volname=data

