[Gluster-users] rpc/glusterd-locks error

Vineet Khandpur khandpur at ualberta.ca
Mon Feb 26 14:41:34 UTC 2018


Good morning.

We have a 6 node cluster. 3 nodes are participating in a replica 3 volume.
Naming convention:
xx01 - 3 nodes participating in ovirt_vol
xx02 - 3 nodes NOT participating in ovirt_vol

Last week, we restarted glusterd on each node in the cluster (one at a
time) to apply an update.
The three xx01 nodes all show the following in glusterd.log:

[2018-02-26 14:31:47.330670] E [socket.c:2020:__socket_read_frag] 0-rpc:
wrong MSG-TYPE (29386) received from 172.26.30.9:24007
[2018-02-26 14:31:47.330879] W
[glusterd-locks.c:843:glusterd_mgmt_v3_unlock]
(-->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0x2322a)
[0x7f46020e922a]
-->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0x2d198)
[0x7f46020f3198]
-->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0xe4755)
[0x7f46021aa755] ) 0-management: Lock for vol ovirtprod_vol not held
[2018-02-26 14:31:47.331066] E [rpc-clnt.c:350:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f460d64dedb] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f460d412e6e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f460d412f8e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f460d414710] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f460d415200] )))))
0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called
at 2018-02-26 14:31:47.330496 (xid=0x72e0)
[2018-02-26 14:31:47.333993] E [socket.c:2020:__socket_read_frag] 0-rpc:
wrong MSG-TYPE (84253) received from 172.26.30.8:24007
[2018-02-26 14:31:47.334148] W
[glusterd-locks.c:843:glusterd_mgmt_v3_unlock]
(-->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0x2322a)
[0x7f46020e922a]
-->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0x2d198)
[0x7f46020f3198]
-->/usr/lib64/glusterfs/3.12.5/xlator/mgmt/glusterd.so(+0xe4755)
[0x7f46021aa755] ) 0-management: Lock for vol ovirtprod_vol not held
[2018-02-26 14:31:47.334317] E [rpc-clnt.c:350:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f460d64dedb] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f460d412e6e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f460d412f8e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f460d414710] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f460d415200] )))))
0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called
at 2018-02-26 14:31:47.333824 (xid=0x1494b)
[2018-02-26 14:31:48.511390] E [socket.c:2632:socket_poller]
0-socket.management: poll error on socket

Additionally, each of the three xx01 nodes shows peer connectivity to
only two of the three hosts (itself and one other), and each one is
missing a different peer: xx01 sees itself and yy01, yy01 sees itself
and zz01, and zz01 sees itself and xx01.

However, the xx02 hosts (same cluster, not participating in the volume)
report the volume info as fine, with all three xx01 hosts participating
in the volume.

In our dev environment, we had to stop the volume and restart glusterd
on all hosts to clear this. In production, however, that would mean a
system-wide outage and downtime, which needs to be avoided.
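For reference, the rolling restart we performed was along these lines (a
sketch, assuming systemd-managed glusterd; the hostnames xx01/yy01/zz01
stand in for our real node names). Restarting glusterd alone does not
stop the brick processes, so client I/O continued during each restart:

```shell
#!/bin/sh
# Rolling restart of the glusterd management daemon, one node at a time.
# NOTE: hostnames and the 30s settle time are illustrative, not exact.
for host in xx01 yy01 zz01; do
    ssh "$host" 'systemctl restart glusterd'
    # Check that this node sees its peers again before moving on
    ssh "$host" 'gluster peer status'
    sleep 30
done
```

After each restart we checked `gluster peer status` on the restarted
node before proceeding, which is how we noticed the asymmetric
connectivity described above.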

Any suggestions? Thanks.

vk
--------------------------------
Vineet Khandpur
UNIX System Administrator
Information Technology Services
University of Alberta Libraries
+1-780-492-4718