[Gluster-users] Very poor heal behaviour in 3.7.9

Lindsay Mathieson lindsay.mathieson at gmail.com
Fri Mar 25 07:34:21 UTC 2016


Have resumed testing with 3.7.9 - this time I have proper hardware behind it:

- 3 nodes
- each node with 4 WD Reds in ZFS raid 10
- SSD for slog and cache.

Using a sharded VM setup (4MB shards), performance has been excellent - better than Ceph on the same hardware. I have some interesting notes on that which I will detail later.

However, unlike with 3.7.7, heal performance has been abysmal - deal-breaking, in fact. Maybe it's my setup?

Have been testing healing by killing the glusterfsd and glusterd processes on another node and letting a VM run. Everything is fine at this point: despite a node being down, reads and writes continue normally.
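
Roughly what I do to simulate the node failure (from memory, so this is approximate):

  # on the node being "failed"
  killall glusterfsd     # kills the brick process
  killall glusterd       # kills the management daemon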

However, heal info shows what appears to be an excessive number of shards marked as needing healing. A simple reboot of a Windows VM results in 360 4MB shards - 1.5GB of data. A compile resulted in 7GB of shards being touched. Could there be some write amplification at work?
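
Those counts come from watching the pending-heal queue while the node is down - something along the lines of:

  gluster volume heal datastore2 info                    # lists pending entries (mostly .shard files)
  gluster volume heal datastore2 statistics heal-count   # per-brick count of entries needing heal
  # 360 shards x 4MB per shard is roughly 1.5GB of data to resync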

However, once I restart the glusterd process, which starts glusterfsd, performance becomes atrocious. Disk IO nearly stops and any running VMs hang or slow down a *lot* until the heal is complete. The "heal info" command appears to hang as well, not completing at all. A build process that was taking 4 minutes took over an hour.
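
For completeness, bringing the node back is nothing fancy - just start glusterd again and wait. Something like:

  systemctl start glusterfs-server    # Debian service name for glusterd; it spawns glusterfsd for the bricks
  watch -n 60 'gluster volume heal datastore2 statistics heal-count'    # to watch progress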

Once the heal finishes, I/O returns to normal.


Here's a fragment of the glfsheal log:

[2016-03-25 07:12:51.041590] I [MSGID: 114057] 
[client-handshake.c:1437:select_server_supported_programs] 
0-datastore2-client-2: Using Program GlusterFS 3.3, Num (1298437), 
Version (330)
[2016-03-25 07:12:51.041637] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 
0-datastore2-client-1: changing port to 49153 (from 0)
[2016-03-25 07:12:51.041808] I [MSGID: 114046] 
[client-handshake.c:1213:client_setvolume_cbk] 0-datastore2-client-2: 
Connected to datastore2-client-2, attached to remote volume 
'/tank/vmdata/datastore2'.
[2016-03-25 07:12:51.041826] I [MSGID: 114047] 
[client-handshake.c:1224:client_setvolume_cbk] 0-datastore2-client-2: 
Server and Client lk-version numbers are not same, reopening the fds
[2016-03-25 07:12:51.041901] I [MSGID: 108005] 
[afr-common.c:4010:afr_notify] 0-datastore2-replicate-0: Subvolume 
'datastore2-client-2' came back up; going online.
[2016-03-25 07:12:51.041929] I [MSGID: 114057] 
[client-handshake.c:1437:select_server_supported_programs] 
0-datastore2-client-0: *Using Program GlusterFS 3.3, Num (1298437), 
Version (330)*
[2016-03-25 07:12:51.041955] I [MSGID: 114035] 
[client-handshake.c:193:client_set_lk_version_cbk] 
0-datastore2-client-2: Server lk version = 1
[2016-03-25 07:12:51.042319] I [MSGID: 114046] 
[client-handshake.c:1213:client_setvolume_cbk] 0-datastore2-client-0: 
Connected to datastore2-client-0, attached to remote volume 
'/tank/vmdata/datastore2'.
[2016-03-25 07:12:51.042333] I [MSGID: 114047] 
[client-handshake.c:1224:client_setvolume_cbk] 0-datastore2-client-0: 
Server and Client lk-version numbers are not same, reopening the fds
[2016-03-25 07:12:51.042455] I [MSGID: 114057] 
[client-handshake.c:1437:select_server_supported_programs] 
0-datastore2-client-1: Using Program GlusterFS 3.3, Num (1298437), 
Version (330)
[2016-03-25 07:12:51.042520] I [MSGID: 114035] 
[client-handshake.c:193:client_set_lk_version_cbk] 
0-datastore2-client-0: Server lk version = 1
[2016-03-25 07:12:51.042846] I [MSGID: 114046] 
[client-handshake.c:1213:client_setvolume_cbk] 0-datastore2-client-1: 
Connected to datastore2-client-1, attached to remote volume 
'/tank/vmdata/datastore2'.
[2016-03-25 07:12:51.042867] I [MSGID: 114047] 
[client-handshake.c:1224:client_setvolume_cbk] 0-datastore2-client-1: 
Server and Client lk-version numbers are not same, reopening the fds
[2016-03-25 07:12:51.058131] I [MSGID: 114035] 
[client-handshake.c:193:client_set_lk_version_cbk] 
0-datastore2-client-1: Server lk version = 1
[2016-03-25 07:12:51.059075] I [MSGID: 108031] 
[afr-common.c:1913:afr_local_discovery_cbk] 0-datastore2-replicate-0: 
selecting local read_child datastore2-client-2
[2016-03-25 07:12:51.059619] I [MSGID: 104041] 
[glfs-resolve.c:869:__glfs_active_subvol] 0-datastore2: switched to 
graph 766e612d-3739-3437-352d-323031362d30 (0)


I have no idea why client version 3.3 is being used! Everything should be 3.7.9.


Environment:

- Proxmox (Debian Jessie 8.2)
- KVM VMs using gfapi, running on the same nodes as the gluster bricks
- bricks are hosted on 3 ZFS pools (one per node), with the properties below (zfs commands sketched after this list):
     * compression=lz4
     * xattr=sa
     * sync=standard
     * acltype=posixacl
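
The pool properties above are just set per-pool with zfs - for reference (pool name 'tank', as in the brick paths below):

  zfs set compression=lz4 tank
  zfs set xattr=sa tank
  zfs set sync=standard tank
  zfs set acltype=posixacl tank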

Volume info:
Volume Name: datastore2
Type: Replicate
Volume ID: 7d93a1c6-ac39-4d94-b136-e8379643bddd
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vnb.proxmox.softlog:/tank/vmdata/datastore2
Brick2: vng.proxmox.softlog:/tank/vmdata/datastore2
Brick3: vna.proxmox.softlog:/tank/vmdata/datastore2
Options Reconfigured:
performance.readdir-ahead: on
nfs.addr-namelookup: off
nfs.enable-ino32: off
features.shard: on
cluster.quorum-type: auto
cluster.server-quorum-type: server
nfs.disable: on
performance.write-behind: off
performance.strict-write-ordering: on
performance.stat-prefetch: off
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
cluster.eager-lock: enable
network.remote-dio: enable



I can do any testing required, bring back logs, etc. I can't build gluster myself, though.


thanks,


-- 
Lindsay Mathieson

