[Gluster-users] gluster 3.7.3 - volume heal info hangs - unknown heal status

Andreas Mather andreas at allaboutapps.at
Thu Sep 24 07:54:12 UTC 2015


Hi!

Our provider had network maintenance last night, so 2 of our 4 servers got
disconnected and reconnected. Since we knew this was coming, we shifted all
workload off the affected servers. This morning, most of the cluster seems
fine, but for one volume, no heal info can be retrieved, so we basically
don't know the healing state of the volume. The volume is a replica 2
volume between vhost4-int/brick1 and vhost3-int/brick2.

The volume is accessible, but since I don't get any heal info, I don't know
if it is properly replicated. Any help resolving this situation is highly
appreciated.

This command hangs forever:
[root@vhost4 ~]# gluster volume heal vol4 info
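
Since the command never returns, it can help to run it under a time limit so a script or shell session is not blocked. The following is only a minimal sketch in Python, not part of gluster; the helper name run_with_timeout and the 30-second limit in the usage note are assumptions made for illustration.

```python
import subprocess

def run_with_timeout(cmd, seconds):
    """Run cmd (hypothetical helper); return its stdout as text,
    or None if it did not finish within `seconds`."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return None
    return done.stdout
```

For example, run_with_timeout(["gluster", "volume", "heal", "vol4", "info"], 30) would return None here instead of hanging. This only bounds the wait; it does not explain why the command hangs.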

glfsheal-vol4.log:
[2015-09-24 07:47:59.284723] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 1
[2015-09-24 07:47:59.293735] I [MSGID: 101190]
[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
with index 2
[2015-09-24 07:47:59.294061] I [MSGID: 104045] [glfs-master.c:95:notify]
0-gfapi: New graph 76686f73-7434-2e61-6c6c-61626f757461 (0) coming up
[2015-09-24 07:47:59.294081] I [MSGID: 114020] [client.c:2118:notify]
0-vol4-client-1: parent translators are ready, attempting connect on
transport
[2015-09-24 07:47:59.309470] I [MSGID: 114020] [client.c:2118:notify]
0-vol4-client-2: parent translators are ready, attempting connect on
transport
[2015-09-24 07:47:59.310525] I [rpc-clnt.c:1819:rpc_clnt_reconfig]
0-vol4-client-1: changing port to 49155 (from 0)
[2015-09-24 07:47:59.315958] I [MSGID: 114057]
[client-handshake.c:1437:select_server_supported_programs] 0-vol4-client-1:
Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-09-24 07:47:59.316481] I [MSGID: 114046]
[client-handshake.c:1213:client_setvolume_cbk] 0-vol4-client-1: Connected
to vol4-client-1, attached to remote volume '/storage/brick2/brick2'.
[2015-09-24 07:47:59.316495] I [MSGID: 114047]
[client-handshake.c:1224:client_setvolume_cbk] 0-vol4-client-1: Server and
Client lk-version numbers are not same, reopening the fds
[2015-09-24 07:47:59.316538] I [MSGID: 108005]
[afr-common.c:3960:afr_notify] 0-vol4-replicate-0: Subvolume
'vol4-client-1' came back up; going online.
[2015-09-24 07:47:59.317150] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk] 0-vol4-client-1: Server
lk version = 1
[2015-09-24 07:47:59.320898] I [rpc-clnt.c:1819:rpc_clnt_reconfig]
0-vol4-client-2: changing port to 49154 (from 0)
[2015-09-24 07:47:59.325633] I [MSGID: 114057]
[client-handshake.c:1437:select_server_supported_programs] 0-vol4-client-2:
Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-09-24 07:47:59.325780] I [MSGID: 114046]
[client-handshake.c:1213:client_setvolume_cbk] 0-vol4-client-2: Connected
to vol4-client-2, attached to remote volume '/storage/brick1/brick1'.
[2015-09-24 07:47:59.325791] I [MSGID: 114047]
[client-handshake.c:1224:client_setvolume_cbk] 0-vol4-client-2: Server and
Client lk-version numbers are not same, reopening the fds
[2015-09-24 07:47:59.333346] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk] 0-vol4-client-2: Server
lk version = 1
[2015-09-24 07:47:59.334545] I [MSGID: 108031]
[afr-common.c:1745:afr_local_discovery_cbk] 0-vol4-replicate-0: selecting
local read_child vol4-client-2
[2015-09-24 07:47:59.335833] I [MSGID: 104041]
[glfs-resolve.c:862:__glfs_active_subvol] 0-vol4: switched to graph
76686f73-7434-2e61-6c6c-61626f757461 (0)

Questions about this output:
-) Why does it report "Using Program GlusterFS 3.3, Num (1298437), Version
(330)"? We run 3.7.3.
-) gluster logs timestamps in UTC, without taking the server timezone into
account. Is there a way to change this?
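
Independent of any gluster setting, the UTC stamps can be converted when reading the logs. The following is a rough sketch in Python that assumes every stamped line starts with "[YYYY-MM-DD HH:MM:SS.ffffff]" as in the excerpts above; the function name localize_line is made up for illustration and is not a gluster tool.

```python
import re
from datetime import datetime, timezone

# Leading "[YYYY-MM-DD HH:MM:SS.ffffff]" stamp, as seen in the log excerpts.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")

def localize_line(line, tz=None):
    """Rewrite a leading UTC stamp into `tz` (system local zone if tz is None)."""
    match = STAMP.match(line)
    if match is None:
        return line  # wrapped continuation lines carry no stamp; pass through
    utc = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
    local = utc.replace(tzinfo=timezone.utc).astimezone(tz)
    # Append the UTC offset so the converted stamp stays unambiguous.
    stamp = local.strftime("%Y-%m-%d %H:%M:%S.%f%z")
    return "[" + stamp + "]" + line[match.end():]
```

Applying localize_line to each line of a log file leaves the file itself untouched and only changes what is displayed.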

etc-glusterfs-glusterd.vol.log:
no log entries after the volume heal info command

storage-brick1-brick1.log:
[2015-09-24 07:47:59.325720] I [login.c:81:gf_auth] 0-auth/login: allowed
user names: 67ef1559-d3a1-403a-b8e7-fb145c3acf4e
[2015-09-24 07:47:59.325743] I [MSGID: 115029]
[server-handshake.c:610:server_setvolume] 0-vol4-server: accepted client
from
vhost4.allaboutapps.at-14900-2015/09/24-07:47:59:282313-vol4-client-2-0-0
(version: 3.7.3)

storage-brick2-brick2.log:
no log entries after the volume heal info command


Thanks,

- Andreas