[Gluster-users] Big problems after update to 9.6

David Cunningham dcunningham at voisonics.com
Thu Feb 23 21:36:51 UTC 2023


We've tried to remove "sg" from the cluster so that we can re-install
GlusterFS on that node, but the following command, run on "br", also gives a
timeout error:

gluster volume remove-brick gvol0 replica 1 sg:/nodirectwritedata/gluster/gvol0 force

How can we tell "br" to just remove "sg" without trying to contact it?
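
One option we're considering, if it's safe on a production system, is
forcing a peer detach so that "br" stops trying to contact "sg" at all
(untested on our side, and we understand gluster may refuse this while
"sg" still hosts a brick of gvol0):

gluster peer detach sg force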


On Fri, 24 Feb 2023 at 10:31, David Cunningham <dcunningham at voisonics.com>
wrote:

> Hello,
>
> We have a cluster with two nodes, "sg" and "br", which were running
> GlusterFS 9.1, installed via the Ubuntu package manager. We updated the
> Ubuntu packages on "sg" to version 9.6, and now have big problems. The "br"
> node is still on version 9.1.
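> 
> (In case it's relevant: with the two nodes on different releases, we
> assume the cluster op-version still has to agree on both. We haven't
> confirmed this relates to the failure, but it can be checked with the
> standard CLI:
> 
> gluster volume get all cluster.op-version
> 
> and compared against the installed versions.)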
>
> Running "gluster volume status" on either host gives "Error : Request
> timed out". On "sg" not all processes are running, compared to "br", as
> below. Restarting the services on "sg" doesn't help. Can anyone advise how
> we should proceed? This is a production system.
>
> root at sg:~# ps -ef | grep gluster
> root     15196     1  0 22:37 ?        00:00:00 /usr/sbin/glusterd -p
> /var/run/glusterd.pid --log-level INFO
> root     15426     1  0 22:39 ?        00:00:00 /usr/bin/python3
> /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
> root     15457 15426  0 22:39 ?        00:00:00 /usr/bin/python3
> /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
> root     19341 13695  0 23:24 pts/1    00:00:00 grep --color=auto gluster
>
> root at br:~# ps -ef | grep gluster
> root      2052     1  0  2022 ?        00:00:00 /usr/bin/python3
> /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
> root      2062     1  3  2022 ?        10-11:57:16 /usr/sbin/glusterfs
> --fuse-mountopts=noatime --process-name fuse --volfile-server=br
> --volfile-server=sg --volfile-id=/gvol0 --fuse-mountopts=noatime
> /mnt/glusterfs
> root      2379  2052  0  2022 ?        00:00:00 /usr/bin/python3
> /usr/sbin/glustereventsd --pid-file /var/run/glustereventsd.pid
> root      5884     1  5  2022 ?        18-16:08:53 /usr/sbin/glusterfsd -s
> br --volfile-id gvol0.br.nodirectwritedata-gluster-gvol0 -p
> /var/run/gluster/vols/gvol0/br-nodirectwritedata-gluster-gvol0.pid -S
> /var/run/gluster/61df1d4e1c65300e.socket --brick-name
> /nodirectwritedata/gluster/gvol0 -l
> /var/log/glusterfs/bricks/nodirectwritedata-gluster-gvol0.log
> --xlator-option *-posix.glusterd-uuid=11e528b0-8c69-4b5d-82ed-c41dd25536d6
> --process-name brick --brick-port 49152 --xlator-option
> gvol0-server.listen-port=49152
> root     10463 18747  0 23:24 pts/1    00:00:00 grep --color=auto gluster
> root     27744     1  0  2022 ?        03:55:10 /usr/sbin/glusterfsd -s br
> --volfile-id gvol0.br.nodirectwritedata-gluster-gvol0 -p
> /var/run/gluster/vols/gvol0/br-nodirectwritedata-gluster-gvol0.pid -S
> /var/run/gluster/61df1d4e1c65300e.socket --brick-name
> /nodirectwritedata/gluster/gvol0 -l
> /var/log/glusterfs/bricks/nodirectwritedata-gluster-gvol0.log
> --xlator-option *-posix.glusterd-uuid=11e528b0-8c69-4b5d-82ed-c41dd25536d6
> --process-name brick --brick-port 49153 --xlator-option
> gvol0-server.listen-port=49153
> root     48227     1  0 Feb17 ?        00:00:26 /usr/sbin/glusterd -p
> /var/run/glusterd.pid --log-level INFO
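> 
> The brick process (glusterfsd) and the fuse client are clearly absent on
> "sg". If it would help, we could try respawning the missing brick with a
> forced start, assuming that's safe while the management layer is in this
> state:
> 
> gluster volume start gvol0 force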
>
> On "sg" in glusterd.log we're seeing:
>
> [2023-02-23 20:26:57.619318 +0000] E [rpc-clnt.c:181:call_bail]
> 0-management: bailing out frame type(glusterd mgmt v3), op(--(6)), xid =
> 0x11, unique = 27, sent = 2023-02-23 20:16:50.596447 +0000, timeout = 600
> for 10.20.20.11:24007
> [2023-02-23 20:26:57.619425 +0000] E [MSGID: 106115]
> [glusterd-mgmt.c:122:gd_mgmt_v3_collate_errors] 0-management: Unlocking
> failed on br. Please check log file for details.
> [2023-02-23 20:26:57.619545 +0000] E [MSGID: 106151]
> [glusterd-syncop.c:1655:gd_unlock_op_phase] 0-management: Failed to unlock
> on some peer(s)
> [2023-02-23 20:26:57.619693 +0000] W
> [glusterd-locks.c:817:glusterd_mgmt_v3_unlock]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/9.6/xlator/mgmt/glusterd.so(+0xe19b9)
> [0x7fadf47fa9b9]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/9.6/xlator/mgmt/glusterd.so(+0xe0e20)
> [0x7fadf47f9e20]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/9.6/xlator/mgmt/glusterd.so(+0xe7904)
> [0x7fadf4800904] ) 0-management: Lock owner mismatch. Lock for vol gvol0
> held by 11e528b0-8c69-4b5d-82ed-c41dd25536d6
> [2023-02-23 20:26:57.619780 +0000] E [MSGID: 106117]
> [glusterd-syncop.c:1679:gd_unlock_op_phase] 0-management: Unable to release
> lock for gvol0
> [2023-02-23 20:26:57.619939 +0000] I
> [socket.c:3811:socket_submit_outgoing_msg] 0-socket.management: not
> connected (priv->connected = -1)
> [2023-02-23 20:26:57.619969 +0000] E [rpcsvc.c:1567:rpcsvc_submit_generic]
> 0-rpc-service: failed to submit message (XID: 0x3, Program: GlusterD svc
> cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
> [2023-02-23 20:26:57.619995 +0000] E [MSGID: 106430]
> [glusterd-utils.c:678:glusterd_submit_reply] 0-glusterd: Reply submission
> failed
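> 
> The "Lock owner mismatch" entry suggests a stale mgmt_v3 lock on gvol0,
> held by 11e528b0-8c69-4b5d-82ed-c41dd25536d6 - which, per the process
> listing above, is the glusterd UUID of "br". Our understanding (not yet
> confirmed) is that restarting only the management daemon on the node
> holding the lock clears it without interrupting brick I/O:
> 
> systemctl restart glusterd
> 
> Restarting services on "sg" alone didn't help, as noted above.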
>
> And in the brick log:
>
> [2023-02-23 20:22:56.717721 +0000] I [addr.c:54:compare_addr_and_update]
> 0-/nodirectwritedata/gluster/gvol0: allowed = "*", received addr =
> "10.20.20.11"
> [2023-02-23 20:22:56.717817 +0000] I [login.c:110:gf_auth] 0-auth/login:
> allowed user names: a26c7de4-1236-4e0a-944a-cb82de7f7f0e
> [2023-02-23 20:22:56.717840 +0000] I [MSGID: 115029]
> [server-handshake.c:561:server_setvolume] 0-gvol0-server: accepted client
> from
> CTX_ID:46b23c19-5114-4a20-9306-9ea6faf02d51-GRAPH_ID:0-PID:35568-HOST:br.m5voip.com-PC_NAME:gvol0-client-0-RECON_NO:-0
> (version: 9.1) with subvol /nodirectwritedata/gluster/gvol0
> [2023-02-23 20:22:56.741545 +0000] W [socket.c:766:__socket_rwv]
> 0-tcp.gvol0-server: readv on 10.20.20.11:49144 failed (No data available)
> [2023-02-23 20:22:56.741599 +0000] I [MSGID: 115036]
> [server.c:500:server_rpc_notify] 0-gvol0-server: disconnecting connection
> [{client-uid=CTX_ID:46b23c19-5114-4a20-9306-9ea6faf02d51-GRAPH_ID:0-PID:35568-HOST:br.m5voip.com-PC_NAME:gvol0-client-0-RECON_NO:-0}]
>
> [2023-02-23 20:22:56.741866 +0000] I [MSGID: 101055]
> [client_t.c:397:gf_client_unref] 0-gvol0-server: Shutting down connection
> CTX_ID:46b23c19-5114-4a20-9306-9ea6faf02d51-GRAPH_ID:0-PID:35568-HOST:br.m5voip.com-PC_NAME:gvol0-client-0-RECON_NO:-0
>
>
> Thanks for your help,
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782
>


-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782