[Gluster-users] Very poor heal behaviour in 3.7.9

Krutika Dhananjay kdhananj at redhat.com
Fri Mar 25 12:33:47 UTC 2016


Hi,

There is one bug that was uncovered recently wherein the same file could
get healed twice before being marked as no longer needing heal.
Pranith sent a patch @ http://review.gluster.org/#/c/13766/ to fix this,
although IIUC this bug existed in versions < 3.7.9 as well.
Because of this bug, files that need heal may also appear in heal-info
output for longer than they ought to.
Did you see this issue in versions < 3.7.9 as well?
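
In the meantime, one way to tell lingering entries apart from genuinely
pending heals is to watch the heal-info counts over a few minutes - for
example something along these lines (a rough sketch, using the volume
name from your mail):

    # compare the per-brick "Number of entries" counts over time
    while true; do
        date
        gluster volume heal datastore2 info | grep 'Number of entries'
        sleep 60
    done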

-Krutika


On Fri, Mar 25, 2016 at 1:04 PM, Lindsay Mathieson <
lindsay.mathieson at gmail.com> wrote:

> Have resumed testing with 3.7.9 - this time I have proper hardware behind
> it:
>
> - 3 nodes
> - each node with 4 WD Reds in ZFS raid 10
> - SSD for slog and cache.
>
> Using a sharded VM setup (4MB shards), performance has been excellent -
> better than Ceph on the same hardware. I have some interesting notes on
> that which I will detail later.
>
> However, unlike with 3.7.7, heal performance has been abysmal - deal
> breaking, in fact. Maybe it's my setup?
>
> I have been testing healing by killing the glusterfsd and glusterd
> processes on another node and letting a VM run. Everything is fine at this
> point: despite a node being down, reads and writes continue normally.
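>
> (Taking the node down amounted to something like this on the affected
> node - a rough sketch, not the exact commands:)
>
>     # simulate a node failure: kill the brick process and the
>     # management daemon on one node, leave the other two running
>     pkill glusterfsd
>     pkill glusterd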
>
> However, heal info shows what appears to be an excessive number of shards
> being marked as needing heal. A simple reboot of a Windows VM results in
> 360 4MB shards - roughly 1.5GB of data. A compile resulted in 7GB of shards
> being touched. Could there be some write amplification at work?
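>
> (As a rough sanity check: 360 shards x 4MB = ~1440MB, i.e. the ~1.5GB
> above. A count like that can be pulled out of heal info with something
> along these lines - illustrative only, the exact entry format may vary:)
>
>     # count shard entries currently flagged as needing heal
>     gluster volume heal datastore2 info | grep -c '\.shard'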
>
> However, once I restart the glusterd process (which in turn starts
> glusterfsd), performance becomes atrocious. Disk IO nearly stops and any
> running VMs hang or slow down a *lot* until the heal is complete. The
> "heal info" command appears to hang as well, not completing at all. A
> build process that normally takes 4 minutes took over an hour.
>
> Once the heal finishes, I/O returns to normal.
>
>
> Here's a fragment of the glfsheal log:
>
> [2016-03-25 07:12:51.041590] I [MSGID: 114057]
> [client-handshake.c:1437:select_server_supported_programs]
> 0-datastore2-client-2: Using Program GlusterFS 3.3, Num (1298437), Version
> (330)
> [2016-03-25 07:12:51.041637] I [rpc-clnt.c:1847:rpc_clnt_reconfig]
> 0-datastore2-client-1: changing port to 49153 (from 0)
> [2016-03-25 07:12:51.041808] I [MSGID: 114046]
> [client-handshake.c:1213:client_setvolume_cbk] 0-datastore2-client-2:
> Connected to datastore2-client-2, attached to remote volume
> '/tank/vmdata/datastore2'.
> [2016-03-25 07:12:51.041826] I [MSGID: 114047]
> [client-handshake.c:1224:client_setvolume_cbk] 0-datastore2-client-2:
> Server and Client lk-version numbers are not same, reopening the fds
> [2016-03-25 07:12:51.041901] I [MSGID: 108005]
> [afr-common.c:4010:afr_notify] 0-datastore2-replicate-0: Subvolume
> 'datastore2-client-2' came back up; going online.
> [2016-03-25 07:12:51.041929] I [MSGID: 114057]
> [client-handshake.c:1437:select_server_supported_programs]
> 0-datastore2-client-0: *Using Program GlusterFS 3.3, Num (1298437),
> Version (330)*
> [2016-03-25 07:12:51.041955] I [MSGID: 114035]
> [client-handshake.c:193:client_set_lk_version_cbk] 0-datastore2-client-2:
> Server lk version = 1
> [2016-03-25 07:12:51.042319] I [MSGID: 114046]
> [client-handshake.c:1213:client_setvolume_cbk] 0-datastore2-client-0:
> Connected to datastore2-client-0, attached to remote volume
> '/tank/vmdata/datastore2'.
> [2016-03-25 07:12:51.042333] I [MSGID: 114047]
> [client-handshake.c:1224:client_setvolume_cbk] 0-datastore2-client-0:
> Server and Client lk-version numbers are not same, reopening the fds
> [2016-03-25 07:12:51.042455] I [MSGID: 114057]
> [client-handshake.c:1437:select_server_supported_programs]
> 0-datastore2-client-1: Using Program GlusterFS 3.3, Num (1298437), Version
> (330)
> [2016-03-25 07:12:51.042520] I [MSGID: 114035]
> [client-handshake.c:193:client_set_lk_version_cbk] 0-datastore2-client-0:
> Server lk version = 1
> [2016-03-25 07:12:51.042846] I [MSGID: 114046]
> [client-handshake.c:1213:client_setvolume_cbk] 0-datastore2-client-1:
> Connected to datastore2-client-1, attached to remote volume
> '/tank/vmdata/datastore2'.
> [2016-03-25 07:12:51.042867] I [MSGID: 114047]
> [client-handshake.c:1224:client_setvolume_cbk] 0-datastore2-client-1:
> Server and Client lk-version numbers are not same, reopening the fds
> [2016-03-25 07:12:51.058131] I [MSGID: 114035]
> [client-handshake.c:193:client_set_lk_version_cbk] 0-datastore2-client-1:
> Server lk version = 1
> [2016-03-25 07:12:51.059075] I [MSGID: 108031]
> [afr-common.c:1913:afr_local_discovery_cbk] 0-datastore2-replicate-0:
> selecting local read_child datastore2-client-2
> [2016-03-25 07:12:51.059619] I [MSGID: 104041]
> [glfs-resolve.c:869:__glfs_active_subvol] 0-datastore2: switched to graph
> 766e612d-3739-3437-352d-323031362d30 (0)
>
>
> I have no idea why client version 3.3 is being used! Everything should
> be 3.7.9.
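>
> (For what it's worth, the installed versions can be double-checked on
> each node with something like the following - just a sketch:)
>
>     # confirm the installed gluster release on every node
>     glusterfs --version
>     gluster --version
>     # and check which gluster processes are actually running
>     ps ax | grep -E 'gluster(d|fsd)' | grep -v grep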
>
>
> Environment:
>
> - Proxmox (Debian Jessie, 8.2)
> - KVM VMs using gfapi, running on the same nodes as the gluster bricks
> - bricks are hosted on 3 ZFS pools (one per node) with the following
>   dataset properties:
>     * compression=lz4
>     * xattr=sa
>     * sync=standard
>     * acltype=posixacl
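>
> (The dataset properties above were applied with the usual zfs commands,
> roughly as below - assuming the brick dataset is tank/vmdata:)
>
>     # set and verify the ZFS properties listed above
>     zfs set compression=lz4 tank/vmdata
>     zfs set xattr=sa tank/vmdata
>     zfs set sync=standard tank/vmdata
>     zfs set acltype=posixacl tank/vmdata
>     zfs get compression,xattr,sync,acltype tank/vmdata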
>
> Volume info:
> Volume Name: datastore2
> Type: Replicate
> Volume ID: 7d93a1c6-ac39-4d94-b136-e8379643bddd
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: vnb.proxmox.softlog:/tank/vmdata/datastore2
> Brick2: vng.proxmox.softlog:/tank/vmdata/datastore2
> Brick3: vna.proxmox.softlog:/tank/vmdata/datastore2
> Options Reconfigured:
> performance.readdir-ahead: on
> nfs.addr-namelookup: off
> nfs.enable-ino32: off
> features.shard: on
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> nfs.disable: on
> performance.write-behind: off
> performance.strict-write-ordering: on
> performance.stat-prefetch: off
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> cluster.eager-lock: enable
> network.remote-dio: enable
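>
> (The output above is what "gluster volume info datastore2" reports; the
> non-default options were applied with "gluster volume set", e.g.:)
>
>     # examples of how the options above were applied (sketch)
>     gluster volume set datastore2 features.shard on
>     gluster volume set datastore2 performance.write-behind off
>     gluster volume set datastore2 cluster.eager-lock enable
>     # the 4MB shard size is controlled by features.shard-block-size, e.g.
>     # gluster volume set datastore2 features.shard-block-size 4MB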
>
>
>
> I can do any testing required, bring back logs, etc. I can't build gluster
> myself, though.
>
>
> thanks,
>
>
> --
> Lindsay Mathieson
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>

