[Bugs] [Bug 1353426] New: glusterd: glusterd provides stale port information when a volume is recreated with same brick path

bugzilla at redhat.com bugzilla at redhat.com
Thu Jul 7 04:20:08 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1353426

            Bug ID: 1353426
           Summary: glusterd: glusterd provides stale port information
                    when a volume is recreated with same brick path
           Product: GlusterFS
           Version: 3.8.0
         Component: glusterd
          Keywords: Triaged
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: amukherj at redhat.com
                CC: amukherj at redhat.com, bugs at gluster.org,
                    kramdoss at redhat.com, rhinduja at redhat.com,
                    rhs-bugs at redhat.com, sanandpa at redhat.com,
                    sankarshan at redhat.com
        Depends On: 1333749, 1334270



+++ This bug was initially created as a clone of Bug #1334270 +++

+++ This bug was initially created as a clone of Bug #1333749 +++

Description of problem:
-------------------------

Had a 2*(4+2) volume with roughly a lakh (100,000) files of size 1K created
from an NFS client. Ran 'ls -l | wc -l' and, at the same time, started creating
1G files from another mountpoint. While both of the above commands were in
progress, attached a 2*2 volume as a hot tier. The tier-attach command executed
successfully, but the following errors were seen in the logs:

[2016-05-06 09:29:58.797805] E [socket.c:2279:socket_connect_finish]
0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:30:00.848981] E [MSGID: 109037] [tier.c:1237:tier_process_brick]
0-tier: Failed to get journal_mode of sql db
/bricks/brick1/ozone/.glusterfs/ozone.db
[2016-05-06 09:30:00.849018] E [MSGID: 109087]
[tier.c:1341:tier_build_migration_qfile] 0-ozone-tier-dht: Brick
/bricks/brick1/ozone/.glusterfs/ozone.db query failed
[2016-05-06 09:30:01.018505] E [MSGID: 109037]
[tier.c:1394:tier_migrate_files_using_qfile] 0-tier: Failed to open
/var/run/gluster/ozone-tier-dht/promote-ozone-1 to the query file [No such file
or directory]
[2016-05-06 09:30:02.807200] E [socket.c:2279:socket_connect_finish]
0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)

Multiple connection refused errors were seen in the other nodes: 

[2016-05-06 09:29:19.434783] E [socket.c:2279:socket_connect_finish]
0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:29:23.438925] E [socket.c:2279:socket_connect_finish]
0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:29:27.446776] E [socket.c:2279:socket_connect_finish]
0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:29:31.452944] E [socket.c:2279:socket_connect_finish]
0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:29:35.460888] E [socket.c:2279:socket_connect_finish]
0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)
[2016-05-06 09:29:39.464874] E [socket.c:2279:socket_connect_finish]
0-ozone-client-4: connection to 10.70.35.210:49153 failed (Connection refused)


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.7.9-3.el7rhgs.x86_64


How reproducible: Hit it once
--------------------

Sosreports will be copied to
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/

--- Additional comment from Sweta Anandpara on 2016-05-06 06:11:14 EDT ---

[qe at rhsqe-repo 1333749]$ 
[qe at rhsqe-repo 1333749]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
[qe at rhsqe-repo 1333749]$ 
[qe at rhsqe-repo 1333749]$ 
[qe at rhsqe-repo 1333749]$ pwd
/home/repo/sosreports/1333749
[qe at rhsqe-repo 1333749]$ 
[qe at rhsqe-repo 1333749]$ 
[qe at rhsqe-repo 1333749]$ ls -l 
total 97940
-rwxr-xr-x. 1 qe qe 17523472 May  6 15:37
sosreport-dhcp35-210.lab.eng.blr.redhat.com-20160506092326.tar.xz
-rwxr-xr-x. 1 qe qe 25732732 May  6 15:37
sosreport-sysreg-prod-20160506092321.tar.xz
-rwxr-xr-x. 1 qe qe 29254684 May  6 15:37
sosreport-sysreg-prod-20160506092323.tar.xz
-rwxr-xr-x. 1 qe qe 27769984 May  6 15:37
sosreport-sysreg-prod-20160506092325.tar.xz
[qe at rhsqe-repo 1333749]$

--- Additional comment from Joseph Elwin Fernandes on 2016-05-06 09:47:18 EDT ---

Looks like a network failure.

Depending on the version and journal mode of sqlite3, the tier migration
process does the query either locally (RHEL 7) or via CTR (RHEL 6, which is not
recommended). To get the version and journal mode, tier does an IPC FOP to the
brick. In this case that call fails, either because of a network failure (I am
not sure which path the client translator selects, TCP or a unix domain socket,
since it only sends the FOP to the local bricks on the node; looking at the log
it seems to be a network call, as IPs and port numbers are used) or because the
brick is down.

Will look into the brick log of this node and see whether the bricks are up or
not.
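For reference, the piece of information the tier process is after here is just
the journal mode of the brick-local database. Below is a purely illustrative,
standalone sketch of reading it with the sqlite3 C API; this is not the actual
tier/CTR code path (which goes through an IPC FOP to the brick), although the
.db path is the one from the log above.

#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    /* Path taken from the error message in the description. */
    const char *dbpath = "/bricks/brick1/ozone/.glusterfs/ozone.db";
    sqlite3 *db = NULL;
    sqlite3_stmt *stmt = NULL;
    int ret = 1;

    /* Open read-only; a failure here (or anywhere below) is the kind of
     * condition that surfaces as "Failed to get journal_mode of sql db". */
    if (sqlite3_open_v2(dbpath, &db, SQLITE_OPEN_READONLY, NULL) != SQLITE_OK) {
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        goto out;
    }

    if (sqlite3_prepare_v2(db, "PRAGMA journal_mode;", -1, &stmt, NULL)
        != SQLITE_OK) {
        fprintf(stderr, "prepare failed: %s\n", sqlite3_errmsg(db));
        goto out;
    }

    /* PRAGMA journal_mode returns a single row with the current mode. */
    if (sqlite3_step(stmt) == SQLITE_ROW)
        printf("journal_mode = %s\n",
               (const char *)sqlite3_column_text(stmt, 0));

    ret = 0;
out:
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return ret;
}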

--- Additional comment from Joseph Elwin Fernandes on 2016-05-06 09:49:57 EDT ---

What is the vol info? I mean, what are the names of the bricks?

--- Additional comment from Joseph Elwin Fernandes on 2016-05-06 09:51:42 EDT ---

Is this the vol info of the setup?

type=5
count=16
status=1
sub_count=2
stripe_count=1
replica_count=2
disperse_count=6
redundancy_count=2
version=3
transport-type=0
volume-id=9227798a-cdd5-4ff6-ab5e-046a8434cc5e
username=15f61858-8340-4da1-aa0f-9df80581d1f0
password=3516b404-dbd0-4858-b29a-1470ad22c120
op-version=30700
client-op-version=30700
quota-version=0
cold_count=12
cold_replica_count=1
cold_disperse_count=6
cold_redundancy_count=2
hot_count=4
hot_replica_count=2
hot_type=2
cold_type=4
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
cluster.tier-mode=cache
features.ctr-enabled=on
performance.readdir-ahead=on
brick-0=10.70.35.13:-bricks-brick3-ozone_tier
brick-1=10.70.35.137:-bricks-brick3-ozone_tier
brick-2=10.70.35.85:-bricks-brick3-ozone_tier
brick-3=10.70.35.210:-bricks-brick3-ozone_tier
brick-4=10.70.35.210:-bricks-brick0-ozone
brick-5=10.70.35.85:-bricks-brick0-ozone
brick-6=10.70.35.137:-bricks-brick0-ozone
brick-7=10.70.35.13:-bricks-brick0-ozone
brick-8=10.70.35.210:-bricks-brick1-ozone
brick-9=10.70.35.85:-bricks-brick1-ozone
brick-10=10.70.35.137:-bricks-brick1-ozone
brick-11=10.70.35.13:-bricks-brick1-ozone
brick-12=10.70.35.210:-bricks-brick2-ozone
brick-13=10.70.35.85:-bricks-brick2-ozone
brick-14=10.70.35.137:-bricks-brick2-ozone
brick-15=10.70.35.13:-bricks-brick2-ozone

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-05-06 17:48:24 EDT ---

This bug is automatically being proposed for the current z-stream release of
Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change
the proposed release flag.

--- Additional comment from Sweta Anandpara on 2016-05-08 23:37:30 EDT ---

Yes it is. Had a 2*(4+2) volume (ozone) as the cold tier and a 2*2 volume
(ozone_tier) as the hot tier.

--- Additional comment from Sweta Anandpara on 2016-05-08 23:45:36 EDT ---

I have the setup if it needs to be looked at.

The hypervisor was impacted in the rack-replacement/lab-shutdown that took
place last weekend. I have got my setup back online now; it can be accessed at
10.70.35.210. Will share the password over email.

[root at dhcp35-210 ~]# 
[root at dhcp35-210 ~]# gluster v info

Volume Name: ozone
Type: Tier
Volume ID: 9227798a-cdd5-4ff6-ab5e-046a8434cc5e
Status: Started
Number of Bricks: 16
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.35.13:/bricks/brick3/ozone_tier
Brick2: 10.70.35.137:/bricks/brick3/ozone_tier
Brick3: 10.70.35.85:/bricks/brick3/ozone_tier
Brick4: 10.70.35.210:/bricks/brick3/ozone_tier
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick5: 10.70.35.210:/bricks/brick0/ozone
Brick6: 10.70.35.85:/bricks/brick0/ozone
Brick7: 10.70.35.137:/bricks/brick0/ozone
Brick8: 10.70.35.13:/bricks/brick0/ozone
Brick9: 10.70.35.210:/bricks/brick1/ozone
Brick10: 10.70.35.85:/bricks/brick1/ozone
Brick11: 10.70.35.137:/bricks/brick1/ozone
Brick12: 10.70.35.13:/bricks/brick1/ozone
Brick13: 10.70.35.210:/bricks/brick2/ozone
Brick14: 10.70.35.85:/bricks/brick2/ozone
Brick15: 10.70.35.137:/bricks/brick2/ozone
Brick16: 10.70.35.13:/bricks/brick2/ozone
Options Reconfigured:
performance.readdir-ahead: on
features.ctr-enabled: on
cluster.tier-mode: cache
[root at dhcp35-210 ~]#

--- Additional comment from Atin Mukherjee on 2016-05-09 01:57:00 EDT ---

I have an initial RCA for why the client was trying to connect to the stale
port.

A brick process initiates a SIGNOUT event from cleanup_and_exit(), which is
called only on a graceful shutdown. If a brick process is brought down with
kill -9 semantics, glusterd doesn't receive this event, which means the stale
port details remain in its data structure. Since the port search logic starts
from base_port and goes up to last_alloc, glusterd will hand out the older
stale details instead of the new ones in this case, resulting in this failure.

I am thinking of modifying the port map search logic to go from last_alloc
down to base_port, so that we always pick up the fresh entries and eliminate
the possibility of clashing with older entries.

The question here is: why are we using kill -9 instead of kill -15 to test the
brick-down scenario?
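
To illustrate the search-direction point, below is a minimal, self-contained
sketch; the struct and function names are made up for illustration and are not
the actual glusterd pmap symbols. After a kill -9 the old registration for the
brick path is never cleaned up, so a walk upward from base_port returns the
stale port (49152), while a walk downward from last_alloc returns the fresh
one (49153).

#include <stdio.h>
#include <string.h>

#define BASE_PORT 49152
#define NUM_PORTS 16

struct port_entry {
    int  used;                 /* stays set for stale entries (no SIGNOUT) */
    char brickname[128];       /* brick path registered on this port       */
};

static struct port_entry ports[NUM_PORTS];   /* index 0 <=> BASE_PORT   */
static int last_alloc = BASE_PORT - 1;       /* highest port handed out */

/* Register a brick on the next free port (very simplified allocator). */
static int register_brick(const char *brick)
{
    last_alloc++;
    struct port_entry *e = &ports[last_alloc - BASE_PORT];
    e->used = 1;
    snprintf(e->brickname, sizeof(e->brickname), "%s", brick);
    return last_alloc;
}

/* Current behaviour: first match walking up from base_port wins, so the
 * stale entry left behind by a kill -9'd brick is returned. */
static int search_up(const char *brick)
{
    for (int p = BASE_PORT; p <= last_alloc; p++)
        if (ports[p - BASE_PORT].used &&
            strcmp(ports[p - BASE_PORT].brickname, brick) == 0)
            return p;
    return -1;
}

/* Proposed behaviour: walk down from last_alloc so that the newest
 * registration for the same brick path is found first. */
static int search_down(const char *brick)
{
    for (int p = last_alloc; p >= BASE_PORT; p--)
        if (ports[p - BASE_PORT].used &&
            strcmp(ports[p - BASE_PORT].brickname, brick) == 0)
            return p;
    return -1;
}

int main(void)
{
    const char *brick = "/bricks/brick1/ozone";

    register_brick(brick);  /* volume started: brick gets 49152              */
    /* brick killed with kill -9: no SIGNOUT, entry for 49152 stays around   */
    register_brick(brick);  /* volume recreated and started: brick on 49153  */

    printf("search from base_port : %d (stale)\n", search_up(brick));
    printf("search from last_alloc: %d (fresh)\n", search_down(brick));
    return 0;
}

With the downward search the client is pointed at 49153, which is where the
live brick actually listens.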

--- Additional comment from Atin Mukherjee on 2016-05-09 04:50:39 EDT ---

And I confirmed with Sweta that the same volume had been stopped, deleted and
recreated with the same brick path, and that kill -9 was used to bring down the
brick process.

--- Additional comment from Vijay Bellur on 2016-05-09 06:23:00 EDT ---

REVIEW: http://review.gluster.org/14268 (glusterd: search port from last_alloc
to base_port) posted (#1) for review on master by Atin Mukherjee
(amukherj at redhat.com)

--- Additional comment from Vijay Bellur on 2016-07-05 07:43:24 EDT ---

COMMIT: http://review.gluster.org/14268 committed in master by Jeff Darcy
(jdarcy at redhat.com) 
------
commit 967a77ed4db0e1c0bcc23f132e312b659ce961ef
Author: Atin Mukherjee <amukherj at redhat.com>
Date:   Mon May 9 12:14:37 2016 +0530

    glusterd: search port from last_alloc to base_port

    If a brick process is killed ungracefully then GlusterD wouldn't receive a
    PMAP_SIGNOUT event and hence the stale port details wouldn't be removed.

    Now consider the following case:
    1. Create a volume with 1 brick
    2. Start the volume (say brick port allocated is 49152)
    3. Kill the brick process by 'kill -9'
    4. Stop & delete the volume
    5. Recreate the volume and start it. (Now the brick port gets 49153)
    6. Mount the volume

    Now in step 6 the mount will fail, as GlusterD will give back the stale
    port number because the query starts searching from base_port.

    Solution:

    To avoid this, searching for the port from last_alloc and coming down to
    base_port should solve the issue.

    Change-Id: I9afafd722a7fda0caac4cc892605f4e7c0e48e73
    BUG: 1334270
    Signed-off-by: Atin Mukherjee <amukherj at redhat.com>
    Reviewed-on: http://review.gluster.org/14268
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Samikshan Bairagya <samikshan at gmail.com>
    Reviewed-by: Jeff Darcy <jdarcy at redhat.com>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1333749
[Bug 1333749] glusterd: glusterd provides stale port information when a
volume is recreated with same brick path
https://bugzilla.redhat.com/show_bug.cgi?id=1334270
[Bug 1334270] glusterd: glusterd provides stale port information when a
volume is recreated with same brick path
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

