[Bugs] [Bug 1813029] New: volume brick fails to come online because other process is using port 49152

bugzilla at redhat.com bugzilla at redhat.com
Thu Mar 12 18:34:00 UTC 2020


https://bugzilla.redhat.com/show_bug.cgi?id=1813029

            Bug ID: 1813029
           Summary: volume brick fails to come online because other
                    process is using port 49152
           Product: GlusterFS
           Version: 6
          Hardware: x86_64
                OS: Linux
            Status: NEW
         Component: glusterd
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: amgad.saleh at nokia.com
                CC: bugs at gluster.org
  Target Milestone: ---
    Classification: Community



Description of problem:
Glusterd starts volume bricks at boot beginning with port 49152. If that port is
already in use by any other process, even transiently (mistral in our case),
glusterd does not move on to the next port number, and we end up with the volume
brick offline.

Status of volume: cacerts
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.10.12:/gluster/cacerts/brick    N/A       N/A        N       N/A  

If we do a volume stop/start (without restarting glusterd), the brick comes back
online on the next available port (49156); the exact commands are sketched after
the status output below.

Status of volume: cacerts
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.10.12:/gluster/cacerts/brick    49156     0          Y       17660
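
For reference, the manual recovery described above amounts to the following
(volume name cacerts taken from this report; just a sketch of what we ran, not
an official procedure):

-----------------
# full stop/start of the volume (gluster prompts for confirmation)
gluster volume stop cacerts
gluster volume start cacerts

# alternative: restart only the offline brick processes without taking the
# whole volume down
gluster volume start cacerts force
-----------------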
So there appears to be a race condition. This happens during a reboot of the VM:
the mistral process had established a connection using local port 49152 on the
server, so the port was occupied (see the mistral connection to amqp in the lsof
output below).

-----------------

-bash-4.2# lsof -ni -p 4927 | grep 49152
lsof: WARNING: can't stat() fuse.glusterfs file system /cacerts
Output information may be incomplete.
glusterfs 682 root 10u IPv4 156905 0t0 TCP 10.10.10.12:49089->10.10.10.13:49152
(ESTABLISHED)
nginx 807 nginx 78u IPv4 3471078 0t0 TCP 10.10.10.9:amqp->10.10.10.12:49152
(ESTABLISHED)
mistral-s 4927 mistral 12u IPv4 3476971 0t0 TCP
10.10.10.12:49152->10.10.10.9:amqp (ESTABLISHED)
glusterfs 8852 root 13u IPv4 76064 0t0 TCP 10.10.10.12:49141->10.10.10.13:49152
(ESTABLISHED)
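
The same check can be made with ss, which avoids the fuse.glusterfs stat()
warning lsof prints above and also shows the owning process (run as root):

-bash-4.2# ss -tanp '( sport = :49152 )'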

---------------
glusterfsd then cannot bind the brick to the port and fails with the following messages:
----------------------
/var/log/glusterfs/bricks/gluster-cacerts-brick.log
[2020-03-11 11:29:48.311164] W [socket.c:721:__socket_rwv] 0-glusterfs: readv
on 10.10.10.12:24007 failed (No data available)
[2020-03-11 11:29:48.311208] I [glusterfsd-mgmt.c:2443:mgmt_rpc_notify]
0-glusterfsd-mgmt: disconnected from remote-host: 10.10.10.12
[2020-03-11 11:29:48.311220] I [glusterfsd-mgmt.c:2463:mgmt_rpc_notify]
0-glusterfsd-mgmt: Exhausted all volfile servers
[2020-03-11 11:30:08.541800] I [MSGID: 115036] [server.c:499:server_rpc_notify]
0-cacerts-server: disconnecting connection from
CTX_ID:577576fc-fc99-4874-95d9-1acd3ad25153-GRAPH_ID:0-PID:20906-HOST:cbnd-2-server-PC_NAME:cacerts-client-1-RECON_NO:-0
[2020-03-11 11:30:08.542166] I [MSGID: 101055] [client_t.c:436:gf_client_unref]
0-cacerts-server: Shutting down connection
CTX_ID:577576fc-fc99-4874-95d9-1acd3ad25153-GRAPH_ID:0-PID:20906-HOST:cbnd-2-server-PC_NAME:cacerts-client-1-RECON_NO:-0
[2020-03-11 11:35:09.957365] I [MSGID: 100030] [glusterfsd.c:2847:main]
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 6.5 (args:
/usr/sbin/glusterfsd -s 10.10.10.12 --volfile-id
cacerts.10.10.10.12.gluster-cacerts-brick -p
/var/run/gluster/vols/cacerts/10.10.10.12-gluster-cacerts-brick.pid -S
/var/run/gluster/6378652cef803b2a.socket --brick-name /gluster/cacerts/brick -l
/var/log/glusterfs/bricks/gluster-cacerts-brick.log --xlator-option
*-posix.glusterd-uuid=a9a3e24e-599a-4c45-9fbb-fe9914193d33 --process-name brick
--brick-port 49152 --xlator-option cacerts-server.listen-port=49152)
[2020-03-11 11:35:09.957872] I [glusterfsd.c:2556:daemonize] 0-glusterfs: Pid
of current running process is 6574
[2020-03-11 11:35:09.963535] I [socket.c:904:__socket_server_bind]
0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 9
[2020-03-11 11:35:09.967195] I [MSGID: 101190]
[event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 0
[2020-03-11 11:35:09.967320] I [MSGID: 101190]
[event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 1
[2020-03-11 11:37:59.244311] I [rpcsvc.c:2698:rpcsvc_set_outstanding_rpc_limit]
0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2020-03-11 11:37:59.244561] E [socket.c:923:__socket_server_bind]
0-tcp.cacerts-server: binding to failed: Address already in use
[2020-03-11 11:37:59.244578] E [socket.c:925:__socket_server_bind]
0-tcp.cacerts-server: Port is already in use
[2020-03-11 11:38:00.244719] E [socket.c:923:__socket_server_bind]
0-tcp.cacerts-server: binding to failed: Address already in use
[2020-03-11 11:38:00.244771] E [socket.c:925:__socket_server_bind]
0-tcp.cacerts-server: Port is already in use
[2020-03-11 11:38:01.244877] E [socket.c:923:__socket_server_bind]
0-tcp.cacerts-server: binding to failed: Address already in use
[2020-03-11 11:38:01.244919] E [socket.c:925:__socket_server_bind]
0-tcp.cacerts-server: Port is already in use
[2020-03-11 11:38:02.245025] E [socket.c:923:__socket_server_bind]
0-tcp.cacerts-server: binding to failed: Address already in use
[2020-03-11 11:38:02.245066] E [socket.c:925:__socket_server_bind]
0-tcp.cacerts-server: Port is already in use
[2020-03-11 11:38:03.245174] E [socket.c:923:__socket_server_bind]
0-tcp.cacerts-server: binding to failed: Address already in use
[2020-03-11 11:38:03.245216] E [socket.c:925:__socket_server_bind]
0-tcp.cacerts-server: Port is already in use
[2020-03-11 11:38:04.245348] E [socket.c:923:__socket_server_bind]
0-tcp.cacerts-server: binding to failed: Address already in use
[2020-03-11 11:38:04.245399] E [socket.c:925:__socket_server_bind]
0-tcp.cacerts-server: Port is already in use
[2020-03-11 11:38:05.245519] E [socket.c:923:__socket_server_bind]
0-tcp.cacerts-server: binding to failed: Address already in use
[2020-03-11 11:38:05.245568] E [socket.c:925:__socket_server_bind]
0-tcp.cacerts-server: Port is already in use
[2020-03-11 11:38:06.245681] E [socket.c:923:__socket_server_bind]
0-tcp.cacerts-server: binding to failed: Address already in use
[2020-03-11 11:38:06.245741] E [socket.c:925:__socket_server_bind]
0-tcp.cacerts-server: Port is already in use
[2020-03-11 11:38:07.245848] E [socket.c:923:__socket_server_bind]
0-tcp.cacerts-server: binding to failed: Address already in use
[2020-03-11 11:38:07.245902] E [socket.c:925:__socket_server_bind]
0-tcp.cacerts-server: Port is already in use
[2020-03-11 11:38:08.246040] E [socket.c:923:__socket_server_bind]
0-tcp.cacerts-server: binding to failed: Address already in use
[2020-03-11 11:38:08.246093] E [socket.c:925:__socket_server_bind]
0-tcp.cacerts-server: Port is already in use
[2020-03-11 11:38:09.246190] E [socket.c:3718:socket_listen]
0-tcp.cacerts-server: __socket_server_bind failed;closing socket 12
[2020-03-11 11:38:09.246242] W [rpcsvc.c:1991:rpcsvc_create_listener]
0-rpc-service: listening on transport failed
[2020-03-11 11:38:09.246298] W [MSGID: 115045] [server.c:1186:server_init]
0-cacerts-server: creation of listener failed
[2020-03-11 11:38:09.247097] E [MSGID: 101019] [xlator.c:629:xlator_init]
0-cacerts-server: Initialization of volume 'cacerts-server' failed, review your
volfile again
[2020-03-11 11:38:09.247124] E [MSGID: 101066]
[graph.c:362:glusterfs_graph_init] 0-cacerts-server: initializing translator
failed
[2020-03-11 11:38:09.247154] E [MSGID: 101176]
[graph.c:725:glusterfs_graph_activate] 0-graph: init failed
[2020-03-11 11:38:09.248308] W [glusterfsd.c:1570:cleanup_and_exit]
(->/usr/sbin/glusterfsd(mgmt_getspec_cbk+0x873) [0x557305bb0383]
-->/usr/sbin/glusterfsd(glusterfs_process_volfp+0x239) [0x557305ba95f9]
-->/usr/sbin/glusterfsd(cleanup_and_exit+0x54) [0x557305ba8a84] ) 0: received
signum (-1), shutting down

-------------------------------------------------------------------------
When I rebooted that VM (where glusterd was started), it came back with the
volume brick on port 49152

Status of volume: cacerts
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.10.10.12:/gluster/cacerts/brick    49152     0          Y       4838

*** So there's a race condition.
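
Until glusterd handles this, a possible mitigation on our side (our own
workaround, not an official GlusterFS recommendation) is to reserve the brick
port range from the kernel's ephemeral port allocator, so that transient
clients such as mistral are never handed 49152 as a local source port:

-----------------
# keep the kernel from handing out the GlusterFS brick ports as ephemeral
# source ports; 49152 is the default base-port, the upper bound chosen here
# is arbitrary and depends on how many bricks the node hosts
sysctl -w net.ipv4.ip_local_reserved_ports=49152-49251

# persist across reboots (the file name is our own choice)
echo 'net.ipv4.ip_local_reserved_ports = 49152-49251' > /etc/sysctl.d/90-gluster-ports.conf
-----------------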

How it should work:
glusterd should move the volume brick to the next available port number when the
port it is trying to use is already taken by another process, as in the sketch
below.
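
Roughly the following probe loop (a shell sketch of the expected behaviour, not
the actual glusterd code):

-----------------
# probe upward from the base brick port and take the first port that no
# local TCP socket -- listening or established -- is using
port=49152
while ss -tan "( sport = :$port )" | tail -n +2 | grep -q . ; do
    port=$((port + 1))
done
echo "first usable brick port: $port"
-----------------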

How reproducible:
This does not happen every time, only occasionally (it depends on another process
grabbing the brick port first).

Steps to Reproduce:
1. Set up a 3-VM cluster with 4 replica-3 volumes
2. Add files to the volumes
3. Reboot a VM and check gluster volume status

Actual results:
The brick on the rebooted node stays offline (TCP Port N/A, Online N) because
another process holds port 49152, and glusterd keeps retrying the same port until
the volume is stopped and started again.

Expected results:
glusterd assigns the brick the next free port and the brick comes online on its
own after the reboot.

Additional info:
