[Bugs] [Bug 1311572] New: tests : remove brick command execution displays success even after, one of the bricks down.

bugzilla at redhat.com
Wed Feb 24 13:38:34 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1311572

            Bug ID: 1311572
           Summary: tests : remove brick command execution displays
                    success even after, one of the bricks down.
           Product: GlusterFS
           Version: 3.7.8
         Component: distribute
          Severity: low
          Assignee: bugs at gluster.org
          Reporter: rtalur at redhat.com
                CC: amukherj at redhat.com, bugs at gluster.org,
                    rhs-bugs at redhat.com, sabansal at redhat.com,
                    sankarshan at redhat.com, sasundar at redhat.com,
                    storage-qa-internal at redhat.com, trao at redhat.com
        Depends On: 1225716



+++ This bug was initially created as a clone of Bug #1225716 +++

+++ This bug was initially created as a clone of Bug #1201205 +++

Description of problem:

The remove-brick command reports success even when one of the bricks is down.
However, gluster v status <vol> shows the remove-brick process as failed, and
the rebalance log messages confirm the failure.

Build found:

[root@rhsauto032 ~]# rpm -qa | grep gluster
gluster-nagios-common-0.1.4-1.el6rhs.noarch
glusterfs-3.6.0.50-1.el6rhs.x86_64
glusterfs-server-3.6.0.50-1.el6rhs.x86_64
gluster-nagios-addons-0.1.14-1.el6rhs.x86_64
samba-glusterfs-3.6.509-169.4.el6rhs.x86_64
glusterfs-libs-3.6.0.50-1.el6rhs.x86_64
glusterfs-api-3.6.0.50-1.el6rhs.x86_64
glusterfs-cli-3.6.0.50-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.50-1.el6rhs.x86_64
vdsm-gluster-4.14.7.3-1.el6rhs.noarch
glusterfs-fuse-3.6.0.50-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.50-1.el6rhs.x86_64
[root@rhsauto032 ~]#


[root@rhsauto032 ~]# glusterfs --version
glusterfs 3.6.0.50 built on Mar  6 2015 11:04:46
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[root@rhsauto032 ~]#

Reproducible: consistently.

Steps:

1. Create a distribute volume with N bricks (N > 3).

2. Bring down one of the bricks.

3. Initiate remove-brick.

Expected result:
The remove-brick operation should not start while a brick is down.
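
For reference, a minimal reproduction sketch (the hostname, brick paths, and
the PID placeholder below are illustrative, not taken from the run that
follows):

    # Create and start a plain distribute volume; paths are hypothetical.
    gluster volume create dist server1:/bricks/d1 server1:/bricks/d2 \
            server1:/bricks/d3 server1:/bricks/d4
    gluster volume start dist

    # Kill one brick's glusterfsd process
    # (PID taken from 'gluster volume status dist').
    kill -9 <brick-pid>

    # remove-brick start still reports success even though a brick is down;
    # the task later shows up as failed in 'gluster volume status dist'.
    gluster volume remove-brick dist server1:/bricks/d1 start
    gluster volume status dist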


Output from the test:

[root@rhsauto032 ~]# gluster v info dist

Volume Name: dist
Type: Distribute
Volume ID: 6725427c-e363-4695-a4ac-65ec65ab0997
Status: Started
Snap Volume: no
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1
Brick2: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d2
Brick3: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d3
Brick4: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d4
Brick5: rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d0_2
Options Reconfigured:
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256
[root@rhsauto032 ~]# gluster v status dist
Status of volume: dist
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d1                                 49214     0          Y       14069
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d2                                 49215     0          Y       14078
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d3                                 49216     0          Y       14084
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d4                                 49217     0          Y       14093
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d0_2                               49219     0          Y       14102
NFS Server on localhost                     2049      0          Y       31543
NFS Server on rhsauto040.lab.eng.blr.redhat
.com                                        2049      0          Y       16743
NFS Server on rhsauto034.lab.eng.blr.redhat
.com                                        2049      0          Y       25003

Task Status of Volume dist
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhsauto032 ~]# kill -9 14069
[root@rhsauto032 ~]# gluster v status dist
Status of volume: dist
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d1                                 N/A       N/A        N       14069
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d2                                 49215     0          Y       14078
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d3                                 49216     0          Y       14084
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d4                                 49217     0          Y       14093
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d0_2                               49219     0          Y       14102
NFS Server on localhost                     2049      0          Y       31543
NFS Server on rhsauto040.lab.eng.blr.redhat
.com                                        2049      0          Y       16743
NFS Server on rhsauto034.lab.eng.blr.redhat
.com                                        2049      0          Y       25003

Task Status of Volume dist
------------------------------------------------------------------------------
There are no active volume tasks

[root@rhsauto032 ~]# gluster v remove-brick dist rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1 start
volume remove-brick start: success
ID: a3e82c82-c2ba-4c02-b09d-c3414246c0d4


[root@rhsauto032 ~]# gluster v status dist
Status of volume: dist
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d1                                 N/A       N/A        N       14069
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d2                                 49215     0          Y       14078
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d3                                 49216     0          Y       14084
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d4                                 49217     0          Y       14093
Brick rhsauto032.lab.eng.blr.redhat.com:/rh
s/brick1/d0_2                               49219     0          Y       14102
NFS Server on localhost                     2049      0          Y       31543
NFS Server on rhsauto040.lab.eng.blr.redhat
.com                                        2049      0          Y       20221
NFS Server on rhsauto034.lab.eng.blr.redhat
.com                                        2049      0          Y       28515

Task Status of Volume dist
------------------------------------------------------------------------------
Task                 : Remove brick        
ID                   : a3e82c82-c2ba-4c02-b09d-c3414246c0d4
Removed bricks:     
rhsauto032.lab.eng.blr.redhat.com:/rhs/brick1/d1
Status               : failed              

[root@rhsauto032 ~]#


Log messages:

[2015-03-11 02:00:28.209263] I [event-epoll.c:629:event_dispatch_epoll_worker]
0-epoll: Started thread with index 1
[2015-03-11 02:00:33.232184] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht:
adding option 'node-uuid' for volume 'dist-dht' with value
'86741341-4584-4a10-ac2a-32cf9230c967'
[2015-03-11 02:00:33.232210] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht:
adding option 'rebalance-cmd' for volume 'dist-dht' with value '5'
[2015-03-11 02:00:33.232223] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht:
adding option 'readdir-optimize' for volume 'dist-dht' with value 'on'
[2015-03-11 02:00:33.232235] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht:
adding option 'assert-no-child-down' for volume 'dist-dht' with value 'yes'
[2015-03-11 02:00:33.232246] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht:
adding option 'lookup-unhashed' for volume 'dist-dht' with value 'yes'
[2015-03-11 02:00:33.232258] I [graph.c:269:gf_add_cmdline_options] 0-dist-dht:
adding option 'use-readdirp' for volume 'dist-dht' with value 'yes'
[2015-03-11 02:00:33.233257] I
[dht-shared.c:272:dht_parse_decommissioned_bricks] 0-dist-dht: decommissioning
subvolume dist-client-1
[2015-03-11 02:00:33.233380] I [dht-shared.c:337:dht_init_regex] 0-dist-dht:
using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-03-11 02:00:33.236568] I [event-epoll.c:629:event_dispatch_epoll_worker]
0-epoll: Started thread with index 2
[2015-03-11 02:00:33.239365] I [client.c:2350:notify] 0-dist-client-1: parent
translators are ready, attempting connect on transport
[2015-03-11 02:00:33.244180] I [client.c:2350:notify] 0-dist-client-2: parent
translators are ready, attempting connect on transport
[2015-03-11 02:00:33.244920] I [rpc-clnt.c:1759:rpc_clnt_reconfig]
0-dist-client-1: changing port to 49214 (from 0)
[2015-03-11 02:00:33.251076] I [client.c:2350:notify] 0-dist-client-3: parent
translators are ready, attempting connect on transport
[2015-03-11 02:00:33.254459] E [socket.c:2213:socket_connect_finish]
0-dist-client-1: connection to 10.70.37.7:49214 failed (Connection refused)
[2015-03-11 02:00:33.254506] W [dht-common.c:6044:dht_notify] 0-dist-dht:
Received CHILD_DOWN. Exiting
[2015-03-11 02:00:33.254767] I [rpc-clnt.c:1759:rpc_clnt_reconfig]
0-dist-client-2: changing port to 49215 (from 0)
[2015-03-11 02:00:33.258997] I [client.c:2350:notify] 0-dist-client-4: parent
translators are ready, attempting connect on transport
[2015-03-11 02:00:33.262513] I
[client-handshake.c:1412:select_server_supported_programs] 0-dist-client-2:
Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-03-11 02:00:33.262886] I [client-handshake.c:1200:client_setvolume_cbk]
0-dist-client-2: Connected to dist-client-2, attached to remote volume
'/rhs/brick1/d2'.
[2015-03-11 02:00:33.262911] I [client-handshake.c:1210:client_setvolume_cbk]
0-dist-client-2: Server and Client lk-version numbers are not same, reopening
the fds
[2015-03-11 02:00:33.263263] I
[client-handshake.c:187:client_set_lk_version_cbk] 0-dist-client-2: Server lk
version = 1
[2015-03-11 02:00:33.263661] I [rpc-clnt.c:1759:rpc_clnt_reconfig]
0-dist-client-3: changing port to 49216 (from 0)

--- Additional comment from Anand Avati on 2015-05-28 10:52:08 IST ---

REVIEW: http://review.gluster.org/10954 (dht: check if all bricks are started
before performing remove-brick) posted (#1) for review on master by Sakshi
Bansal (sabansal at redhat.com)

--- Additional comment from Anand Avati on 2015-06-08 18:33:50 IST ---

REVIEW: http://review.gluster.org/10954 (dht : check if all bricks are started
before performing remove-brick) posted (#2) for review on master by Sakshi
Bansal (sabansal at redhat.com)

--- Additional comment from Anand Avati on 2015-08-29 08:45:00 IST ---

REVIEW: http://review.gluster.org/10954 (glusterd : check if all bricks are
started before performing remove-brick) posted (#3) for review on master by
Sakshi Bansal (sabansal at redhat.com)

--- Additional comment from Anand Avati on 2015-09-01 11:10:42 IST ---

REVIEW: http://review.gluster.org/10954 (glusterd: check if all bricks are
started before performing remove-brick) posted (#4) for review on master by
Sakshi Bansal (sabansal at redhat.com)

--- Additional comment from Vijay Bellur on 2015-09-03 15:18:01 IST ---

REVIEW: http://review.gluster.org/10954 (glusterd : check if all bricks are
started before performing remove-brick) posted (#5) for review on master by
Sakshi Bansal (sabansal at redhat.com)

--- Additional comment from Vijay Bellur on 2016-01-07 13:21:14 IST ---

REVIEW: http://review.gluster.org/13191 (glusterd: remove-brick commit getting
executed before migration has completed) posted (#1) for review on master by
Sakshi Bansal

--- Additional comment from Vijay Bellur on 2016-01-07 15:56:12 IST ---

REVIEW: http://review.gluster.org/13191 (tests: remove-brick commit getting
executed before migration has completed) posted (#2) for review on master by
Sakshi Bansal

--- Additional comment from Vijay Bellur on 2016-01-12 17:04:29 IST ---

REVIEW: http://review.gluster.org/13191 (tests: remove-brick commit getting
executed before migration has completed) posted (#3) for review on master by
Atin Mukherjee (amukherj at redhat.com)

--- Additional comment from Vijay Bellur on 2016-01-28 11:51:00 IST ---

REVIEW: http://review.gluster.org/13191 (tests: remove-brick commit getting
executed before migration has completed) posted (#4) for review on master by
Sakshi Bansal

--- Additional comment from Vijay Bellur on 2016-02-02 16:10:31 IST ---

REVIEW: http://review.gluster.org/13191 (tests: remove-brick commit getting
executed before migration has completed) posted (#5) for review on master by
Sakshi Bansal

--- Additional comment from Vijay Bellur on 2016-02-24 18:45:52 IST ---

COMMIT: http://review.gluster.org/13191 committed in master by Raghavendra
Talur (rtalur at redhat.com) 
------
commit 6209e227f86025ff9591d78e69c4758b62271a04
Author: Sakshi Bansal <sabansal at redhat.com>
Date:   Thu Jan 7 13:09:58 2016 +0530

    tests: remove-brick commit getting executed before migration has completed

    Remove brick commit will fail when it is executed while rebalance is in
    progress. Hence added a rebalance timeout check before remove-brick commit
    to ensure that rebalance has completed.

    Change-Id: Ic12f97cbba417ce8cddb35ae973f2bc9bde0fc80
    BUG: 1225716
    Signed-off-by: Sakshi Bansal <sabansal at redhat.com>
    Reviewed-on: http://review.gluster.org/13191
    Reviewed-by: Gaurav Kumar Garg <ggarg at redhat.com>
    Smoke: Gluster Build System <jenkins at build.gluster.com>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    Reviewed-by: Raghavendra Talur <rtalur at redhat.com>
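
For reference, a minimal sketch of the rebalance-timeout check the commit
message describes, in the style of the project's .t regression tests (the
EXPECT_WITHIN helper, $REBALANCE_TIMEOUT, and
remove_brick_status_completed_field come from the test framework; the volume
and brick variables here are illustrative, not the patch's actual diff):

    # Wait (up to the framework's rebalance timeout) for the remove-brick
    # migration to reach "completed" before committing, instead of
    # committing while migration may still be in progress.
    EXPECT_WITHIN $REBALANCE_TIMEOUT "completed" \
            remove_brick_status_completed_field "$V0" "$H0:$B0/${V0}1"
    TEST $CLI volume remove-brick $V0 $H0:$B0/${V0}1 commit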


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1225716
[Bug 1225716] tests : remove brick command execution displays success even
after, one of the bricks down.
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

