[Gluster-users] Self-Heal Daemon not starting after upgrade 6.10 to 7.8

Olaf Buitelaar olaf.buitelaar at gmail.com
Mon Nov 2 15:05:54 UTC 2020


Dear Gluster users,

I'm trying to upgrade from gluster 6.10 to 7.8. I've currently tried this
on 2 hosts, but on both the Self-Heal Daemon refuses to start.
It could be because not all nodes are updated yet, but I'm a bit hesitant
to continue without the Self-Heal Daemon running.
I'm not using quotas, and I'm not seeing the peer-reject messages that
other users reported on the mailing list.
In fact gluster peer status and gluster pool list display all nodes as
connected.
Also gluster v heal <vol> info shows all nodes as Status: Connected,
however some report pending heals, which don't really seem to progress.
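
For completeness, these are the exact commands behind the checks above
(heal info summary gives a more compact per-brick overview):

gluster peer status
gluster pool list
gluster volume heal <vol> info
gluster volume heal <vol> info summary
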
Only in gluster v status <vol> do the 2 upgraded nodes report the
Self-heal Daemon as not running:

Self-heal Daemon on localhost               N/A       N/A        N       N/A
Self-heal Daemon on 10.32.9.5               N/A       N/A        Y       24022
Self-heal Daemon on 10.201.0.4              N/A       N/A        Y       26704
Self-heal Daemon on 10.201.0.3              N/A       N/A        N       N/A
Self-heal Daemon on 10.32.9.4               N/A       N/A        Y       46294
Self-heal Daemon on 10.32.9.3               N/A       N/A        Y       22194
Self-heal Daemon on 10.201.0.9              N/A       N/A        Y       14902
Self-heal Daemon on 10.201.0.6              N/A       N/A        Y       5358
Self-heal Daemon on 10.201.0.5              N/A       N/A        Y       28073
Self-heal Daemon on 10.201.0.7              N/A       N/A        Y       15385
Self-heal Daemon on 10.201.0.1              N/A       N/A        Y       8917
Self-heal Daemon on 10.201.0.12             N/A       N/A        Y       56796
Self-heal Daemon on 10.201.0.8              N/A       N/A        Y       7990
Self-heal Daemon on 10.201.0.11             N/A       N/A        Y       68223
Self-heal Daemon on 10.201.0.10             N/A       N/A        Y       20828

After the upgrade I see the file /var/lib/glusterd/vols/<vol>/<vol>-shd.vol
being created, which doesn't exist on the 6.10 nodes.
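
To see whether an shd process exists at all on a node, independent of
what gluster v status reports, something like this can be used (the
self-heal daemon runs as a glusterfs process with glustershd in its
volfile-id):

pgrep -af glustershd
ps aux | grep '[g]lustershd'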

In the logs I see these relevant messages:
log: glusterd.log
0-management: Regenerating volfiles due to a max op-version mismatch or
glusterd.upgrade file not being present, op_version retrieved:60000, max
op_version: 70200
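
For reference, the 60000/70200 values from that message can be
cross-checked with:

gluster volume get all cluster.op-version
gluster volume get all cluster.max-op-version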

[2020-10-31 21:48:42.256193] W [MSGID: 106204]
[glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown
key: tier-enabled
[2020-10-31 21:48:42.256232] W [MSGID: 106204]
[glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown
key: brick-0
[2020-10-31 21:48:42.256240] W [MSGID: 106204]
[glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown
key: brick-1
[2020-10-31 21:48:42.256246] W [MSGID: 106204]
[glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown
key: brick-2
[2020-10-31 21:48:42.256251] W [MSGID: 106204]
[glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown
key: brick-3
[2020-10-31 21:48:42.256256] W [MSGID: 106204]
[glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown
key: brick-4
[2020-10-31 21:48:42.256261] W [MSGID: 106204]
[glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown
key: brick-5
[2020-10-31 21:48:42.256266] W [MSGID: 106204]
[glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown
key: brick-6
[2020-10-31 21:48:42.256271] W [MSGID: 106204]
[glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown
key: brick-7
[2020-10-31 21:48:42.256276] W [MSGID: 106204]
[glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown
key: brick-8

[2020-10-31 21:51:36.049009] W [MSGID: 106617]
[glusterd-svc-helper.c:948:glusterd_attach_svc] 0-glusterd: attach failed
for glustershd(volume=backups)
[2020-10-31 21:51:36.049055] E [MSGID: 106048]
[glusterd-shd-svc.c:482:glusterd_shdsvc_start] 0-glusterd: Failed to attach
shd svc(volume=backups) to pid=9262
[2020-10-31 21:51:36.049138] E [MSGID: 106615]
[glusterd-shd-svc.c:638:glusterd_shdsvc_restart] 0-management: Couldn't
start shd for vol: backups on restart
[2020-10-31 21:51:36.183133] I [MSGID: 106618]
[glusterd-svc-helper.c:901:glusterd_attach_svc] 0-glusterd: adding svc
glustershd (volume=backups) to existing process with pid 9262
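
Since glusterd_shdsvc_restart appears in the trace, restarting glusterd
on an upgraded node seems to be enough to reproduce the failed attach
(assuming a systemd-based setup):

systemctl restart glusterd
tail -f /var/log/glusterfs/glusterd.log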

log: glustershd.log

[2020-10-31 21:49:55.976120] I [MSGID: 100041]
[glusterfsd-mgmt.c:1111:glusterfs_handle_svc_attach] 0-glusterfs: received
attach request for volfile-id=shd/backups
[2020-10-31 21:49:55.976136] W [MSGID: 100042]
[glusterfsd-mgmt.c:1137:glusterfs_handle_svc_attach] 0-glusterfs: got
attach for shd/backups but no active graph [Invalid argument]

So I suspect something in the logic for the self-heal daemon has changed,
since it now has the new *.vol configuration for the shd. The question is:
is this just a transitional state until all nodes are upgraded, and thus
safe to continue the update? Or is this something that should be fixed
first, and if so, any clues how?
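
In case it's relevant: the cluster op-version is still at 60000, and per
the upgrade guide it should only be bumped once all nodes are upgraded, so
the eventual command would be:

gluster volume set all cluster.op-version 70200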

Thanks,
Olaf