<div dir="ltr"><div>We've tried to remove "sg" from the cluster so we can re-install the GlusterFS node on it, but the following command run on "br" also gives a timeout error:</div><div><br></div><div>gluster volume remove-brick gvol0 replica 1 sg:/nodirectwritedata/gluster/gvol0 force</div><div><br></div><div>How can we tell "br" to just remove "sg" without trying to contact it?</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 24 Feb 2023 at 10:31, David Cunningham <<a href="mailto:dcunningham@voisonics.com">dcunningham@voisonics.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hello,</div><div><br></div><div>We have a cluster with two nodes, "sg" and "br", which were running GlusterFS 9.1, installed via the Ubuntu package manager. We updated the Ubuntu packages on "sg" to version 9.6, and now have big problems. The "br" node is still on version 9.1.</div><div><br></div><div>Running "gluster volume status" on either host gives "Error : Request timed out". On "sg" not all processes are running, compared to "br", as below. Restarting the services on "sg" doesn't help. Can anyone advise how we should proceed? This is a production system.</div><div></div><div><br></div><div>root@sg:~# ps -ef | grep gluster<br>root 15196 1 0 22:37 ? 00:00:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO<br>root 15426 1 0 22:39 ? 00:00:00 /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid<br>root 15457 15426 0 22:39 ? 00:00:00 /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid<br>root 19341 13695 0 23:24 pts/1 00:00:00 grep --color=auto gluster</div><div><br></div><div>root@br:~# ps -ef | grep gluster<br>root 2052 1 0 2022 ? 00:00:00 /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid<br>root 2062 1 3 2022 ? 
10-11:57:16 /usr/sbin/glusterfs --fuse-mountopts=noatime --process-name fuse --volfile-server=br --volfile-server=sg --volfile-id=/gvol0 --fuse-mountopts=noatime /mnt/glusterfs<br>root 2379 2052 0 2022 ? 00:00:00 /usr/bin/python3 /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid<br>root 5884 1 5 2022 ? 18-16:08:53 /usr/sbin/glusterfsd -s br --volfile-id gvol0.br.nodirectwritedata-gluster-gvol0 -p /var/run/gluster/vols/gvol0/br-nodirectwritedata-gluster-gvol0.pid -S /var/run/gluster/61df1d4e1c65300e.socket --brick-name /nodirectwritedata/gluster/gvol0 -l /var/log/glusterfs/bricks/nodirectwritedata-gluster-gvol0.log --xlator-option *-posix.glusterd-uuid=11e528b0-8c69-4b5d-82ed-c41dd25536d6 --process-name brick --brick-port 49152 --xlator-option gvol0-server.listen-port=49152<br>root 10463 18747 0 23:24 pts/1 00:00:00 grep --color=auto gluster<br>root 27744 1 0 2022 ? 03:55:10 /usr/sbin/glusterfsd -s br --volfile-id gvol0.br.nodirectwritedata-gluster-gvol0 -p /var/run/gluster/vols/gvol0/br-nodirectwritedata-gluster-gvol0.pid -S /var/run/gluster/61df1d4e1c65300e.socket --brick-name /nodirectwritedata/gluster/gvol0 -l /var/log/glusterfs/bricks/nodirectwritedata-gluster-gvol0.log --xlator-option *-posix.glusterd-uuid=11e528b0-8c69-4b5d-82ed-c41dd25536d6 --process-name brick --brick-port 49153 --xlator-option gvol0-server.listen-port=49153<br>root 48227 1 0 Feb17 ? 00:00:26 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO<br></div><div><br></div><div>On "sg" in glusterd.log we're seeing:<div><br></div><div>[2023-02-23
20:26:57.619318 +0000] E [rpc-clnt.c:181:call_bail] 0-management:
bailing out frame type(glusterd mgmt v3), op(--(6)), xid = 0x11, unique =
27, sent = 2023-02-23 20:16:50.596447 +0000, timeout = 600 for <a href="http://10.20.20.11:24007" target="_blank">10.20.20.11:24007</a><br>[2023-02-23 20:26:57.619425 +0000] E [MSGID: 106115] [glusterd-mgmt.c:122:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on br. Please check log file for details. <br>[2023-02-23 20:26:57.619545 +0000] E [MSGID: 106151] [glusterd-syncop.c:1655:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s) <br>[2023-02-23 20:26:57.619693 +0000] W [glusterd-locks.c:817:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/9.6/xlator/mgmt/glusterd.so(+0xe19b9) [0x7fadf47fa9b9] -->/usr/lib/x86_64-linux-gnu/glusterfs/9.6/xlator/mgmt/glusterd.so(+0xe0e20) [0x7fadf47f9e20] -->/usr/lib/x86_64-linux-gnu/glusterfs/9.6/xlator/mgmt/glusterd.so(+0xe7904) [0x7fadf4800904] ) 0-management: Lock owner mismatch. Lock for vol gvol0 held by 11e528b0-8c69-4b5d-82ed-c41dd25536d6 <br>[2023-02-23 20:26:57.619780 +0000] E [MSGID: 106117] [glusterd-syncop.c:1679:gd_unlock_op_phase] 0-management: Unable to release lock for gvol0 <br>[2023-02-23 20:26:57.619939 +0000] I [socket.c:3811:socket_submit_outgoing_msg] 0-socket.management: not connected (priv->connected = -1)<br>[2023-02-23 20:26:57.619969 +0000] E [rpcsvc.c:1567:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x3, Program: GlusterD
svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)<br>[2023-02-23 20:26:57.619995 +0000] E [MSGID: 106430] [glusterd-utils.c:678:glusterd_submit_reply] 0-glusterd: Reply submission failed <br></div><div><br></div><div>And in the brick log:</div><div><br></div><div>[2023-02-23 20:22:56.717721 +0000] I [addr.c:54:compare_addr_and_update] 0-/nodirectwritedata/gluster/gvol0: allowed = "*", received addr = "10.20.20.11"<br>[2023-02-23 20:22:56.717817 +0000] I [login.c:110:gf_auth] 0-auth/login: allowed user names: a26c7de4-1236-4e0a-944a-cb82de7f7f0e<br>[2023-02-23 20:22:56.717840 +0000] I [MSGID: 115029] [server-handshake.c:561:server_setvolume] 0-gvol0-server: accepted client from CTX_ID:46b23c19-5114-4a20-9306-9ea6faf02d51-GRAPH_ID:0-PID:35568-HOST:br.m5voip.com-PC_NAME:gvol0-client-0-RECON_NO:-0 (version: 9.1) with subvol /nodirectwritedata/gluster/gvol0 <br>[2023-02-23 20:22:56.741545 +0000] W [socket.c:766:__socket_rwv] 0-tcp.gvol0-server: readv on <a href="http://10.20.20.11:49144" target="_blank">10.20.20.11:49144</a> failed (No data available)<br>[2023-02-23 20:22:56.741599 +0000] I [MSGID: 115036] [server.c:500:server_rpc_notify] 0-gvol0-server: disconnecting connection [{client-uid=CTX_ID:46b23c19-5114-4a20-9306-9ea6faf02d51-GRAPH_ID:0-PID:35568-HOST:br.m5voip.com-PC_NAME:gvol0-client-0-RECON_NO:-0}] <br>[2023-02-23 20:22:56.741866 +0000] I [MSGID: 101055] [client_t.c:397:gf_client_unref] 0-gvol0-server: Shutting down connection CTX_ID:46b23c19-5114-4a20-9306-9ea6faf02d51-GRAPH_ID:0-PID:35568-HOST:br.m5voip.com-PC_NAME:gvol0-client-0-RECON_NO:-0 <br></div><div><br></div></div><div>Thanks for your help,<br></div><div><br>-- <br><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>David Cunningham, Voisonics Limited<br><a href="http://voisonics.com/" target="_blank">http://voisonics.com/</a><br>USA: +1 213 221 1092<br>New Zealand: +64 (0)28 2558 
3782</div></div></div></div></div></div></div></div></div></div></div></div></div>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>David Cunningham, Voisonics Limited<br><a href="http://voisonics.com/" target="_blank">http://voisonics.com/</a><br>USA: +1 213 221 1092<br>New Zealand: +64 (0)28 2558 3782</div></div></div></div></div></div></div></div></div></div></div>
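P.S. For completeness, here is the sequence we were planning to try, all run on "br" only. This is an untested sketch, and it assumes the stock GlusterFS CLI behaves as documented, in particular that the `force` option of `gluster peer detach` allows dropping a peer glusterd cannot reach:

```shell
# Untested sketch -- run on "br" only; assumes standard GlusterFS CLI behaviour.

# 1. Retry the brick removal (this is the step currently timing out):
gluster volume remove-brick gvol0 replica 1 sg:/nodirectwritedata/gluster/gvol0 force

# 2. Drop "sg" from the trusted storage pool; "force" is documented as
#    allowing detach of a peer that is unreachable:
gluster peer detach sg force

# 3. Confirm "sg" is no longer listed:
gluster peer status
gluster volume info gvol0
```

If step 1 keeps timing out because of the mixed 9.1/9.6 versions, we are unsure whether step 2 alone is safe while the brick is still part of the replica set, so any confirmation would be appreciated.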