[Bugs] [Bug 1543711] New: glustershd/glusterd is not using right port when connecting to glusterfsd process

bugzilla at redhat.com bugzilla at redhat.com
Fri Feb 9 04:08:37 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1543711

            Bug ID: 1543711
           Summary: glustershd/glusterd is not using right port when
                    connecting to glusterfsd process
           Product: GlusterFS
           Version: 4.0
         Component: glusterd
          Keywords: Triaged
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: amukherj at redhat.com
                CC: bugs at gluster.org, zz.sh.cynthia at gmail.com
        Depends On: 1537362
            Blocks: 1537346



+++ This bug was initially created as a clone of Bug #1537362 +++

+++ This bug was initially created as a clone of Bug #1537346 +++

Description of problem:
Sometimes, after rebooting one SN node, the output of the command “gluster v heal mstate info” shows:
 [root at testsn-1:/var/log/glusterfs/bricks]
# gluster v heal mstate info
Brick testsn-0.local:/mnt/bricks/mstate/brick
/testas-0/var/lib/ntp/drift 
/testas-2/var/lib/ntp/drift 
/.install-done 
/testas-0/var/lib/ntp 
/testmn-1/var/lib/ntp 
/testas-2/var/lib/ntp 
/testmn-0/var/lib/ntp 
/testmn-1/var/lib/ntp/drift 
/testas-1/var/lib/ntp 
/testas-1/var/lib/ntp/drift 
/testmn-0/var/lib/ntp/drift 
Status: Connected
Number of entries: 11

Brick testsn-1.local:/mnt/bricks/mstate/brick
Status: Transport endpoint is not connected
Number of entries: -

glustershd cannot connect to the local brick process! When I check the glustershd
process, I find it always fails when trying to connect to the glusterfsd process
on port 49155.
[2018-01-18 10:42:29.891811] I [rpc-clnt.c:1986:rpc_clnt_reconfig]
0-mstate-client-1: changing port to 49155 (from 0)
[2018-01-18 10:42:29.892120] E [socket.c:2369:socket_connect_finish]
0-mstate-client-1: connection to 192.168.1.3:49155 failed (Connection refused);
disconnecting 

However, the local mstate glusterfsd process is actually listening on port 49153!

Version-Release number of selected component (if applicable):
glusterfs 3.12.3

How reproducible:
Reboot an SN node.

Steps to Reproduce:
1. Reboot an SN node.

Actual results:
glustershd cannot connect to one local glusterfsd brick process.
This can be seen in the following ps and netstat output:
[root at testsn-1:/var/log/glusterfs/bricks]
# ps -ef | grep glustershd
root      1295     1  0 Jan18 ?        00:00:18 /usr/sbin/glusterfs -s
testsn-1.local --volfile-id gluster/glustershd -p
/var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log
-S /var/run/gluster/178dba826edae38df4ba67f25beeb1e6.socket --xlator-option
*replicate*.node-uuid=9ccea6b1-4d81-4020-a4ba-ee6821268ba8
root     19900 27911  0 04:10 pts/1    00:00:00 grep glustershd
[root at testsn-1:/var/log/glusterfs/bricks]
# netstat -p | grep 1295
tcp        0      0 testsn-1.local:49098    testsn-0.local:49154    ESTABLISHED
1295/glusterfs
tcp        0      0 testsn-1.local:49099    testsn-0.local:49152    ESTABLISHED
1295/glusterfs
tcp        0      0 testsn-1.local:49140    testsn-1.local:24007    ESTABLISHED
1295/glusterfs
tcp        0      0 testsn-1.local:49097    testsn-0.local:49153    ESTABLISHED
1295/glusterfs
tcp        0      0 testsn-1.local:49096    testsn-0.local:49155    ESTABLISHED
1295/glusterfs
tcp        0      0 testsn-1.local:49120    testsn-1.local:49156    ESTABLISHED
1295/glusterfs
tcp        0      0 testsn-1.local:49121    testsn-1.local:49152    ESTABLISHED
1295/glusterfs
tcp        0      0 testsn-1.local:49126    testsn-1.local:49154    ESTABLISHED
1295/glusterfs
unix  3      [ ]         STREAM     CONNECTED      36264 1295/glusterfs     
/var/run/gluster/178dba826edae38df4ba67f25beeb1e6.socket
unix  2      [ ]         DGRAM                     36258 1295/glusterfs      

Expected results:
glustershd should be able to connect to the local brick process.

Additional info:
# gluster v status mstate
Status of volume: mstate
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick testsn-0.local:/mnt/bricks/mstate/bri
ck                                          49154     0          Y       1113 
Brick testsn-1.local:/mnt/bricks/mstate/bri
ck                                          49155     0          Y       1117 
Self-heal Daemon on localhost               N/A       N/A        Y       1295 
Self-heal Daemon on testsn-2.local          N/A       N/A        Y       1813 
Self-heal Daemon on testsn-0.local          N/A       N/A        Y       1135   

Task Status of Volume mstate
------------------------------------------------------------------------------
There are no active volume tasks

It is quite strange that the mstate brick listen port is shown as 49155 by
“gluster v status mstate” but as 49153 in the ps output!
[root at testsn-1:/var/log/glusterfs/bricks]
# ps -ef | grep -i glusterfsd | grep mstate
root      1117     1  0 Jan18 ?        00:00:05 /usr/sbin/glusterfsd -s
testsn-1.local --volfile-id mstate.testsn-1.local.mnt-bricks-mstate-brick -p
/var/run/gluster/vols/mstate/testsn-1.local-mnt-bricks-mstate-brick.pid -S
/var/run/gluster/b520b934b415e6a68776cc4852901a77.socket --brick-name
/mnt/bricks/mstate/brick -l
/var/log/glusterfs/bricks/mnt-bricks-mstate-brick.log --xlator-option
*-posix.glusterd-uuid=9ccea6b1-4d81-4020-a4ba-ee6821268ba8 --brick-port 49153
--xlator-option mstate-server.listen-port=49153 --xlator-option
transport.socket.bind-address=testsn-1.local

--- Additional comment from Worker Ant on 2018-01-22 21:29:06 EST ---

REVIEW: https://review.gluster.org/19263 (glusterd: process pmap sign in only
when port is marked as free) posted (#3) for review on master by Atin Mukherjee

--- Additional comment from Worker Ant on 2018-01-22 21:45:23 EST ---

REVIEW: https://review.gluster.org/19263 (glusterd: process pmap sign in only
when port is marked as free) posted (#4) for review on master by Atin Mukherjee

--- Additional comment from Worker Ant on 2018-01-25 03:01:51 EST ---

COMMIT: https://review.gluster.org/19263 committed in master by "Atin
Mukherjee" <amukherj at redhat.com> with a commit message- glusterd: process pmap
sign in only when port is marked as free

Because of a race in the volume start code path, caused by friend handshaking
with quorum-enabled volumes, we can end up in a situation where glusterd starts
a brick, gets a disconnect, and then immediately tries to start the same brick
instance again based on another friend update request. Then, even if the very
first brick process never comes up, a sign-in event is still sent at the end
and we end up with two duplicate portmap entries for the same brick. Since
brick start marks the previous port as free, it is better to treat a sign-in
request as a no-op if the corresponding port type is marked as free.

Change-Id: I995c348c7b6988956d24b06bf3f09ab64280fc32
BUG: 1537362
Signed-off-by: Atin Mukherjee <amukherj at redhat.com>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1537346
[Bug 1537346] glustershd/glusterd is not using right port when connecting
to glusterfsd process
https://bugzilla.redhat.com/show_bug.cgi?id=1537362
[Bug 1537362] glustershd/glusterd is not using right port when connecting
to glusterfsd process