[Bugs] [Bug 1264520] volume rebalance start is successful but status returns failed status

bugzilla at redhat.com bugzilla at redhat.com
Fri Aug 19 10:55:08 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1264520

Nithya Balachandran <nbalacha at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(shelsucker at hotmail.com)



--- Comment #12 from Nithya Balachandran <nbalacha at redhat.com> ---
(In reply to Leildin from comment #11)
> (In reply to Nithya Balachandran from comment #10)
> > My apologies for the extremely delayed response.
> > 
> > I went through the code: the glusterd process generates the volfiles
> > based on the information stored in /var/lib/glusterd/, and it looks like
> > something might be wrong there.
> > 
> > glusterd uses the information in the /var/lib/glusterd/<volname>/bricks
> > directory to generate the client portion of the client volfiles (this
> > covers any fuse client, the rebalance process, etc.).
> > 
> > For example, I have a volume called loop with 3 bricks.
> > 
> > Volume Name: loop
> > Type: Distribute
> > Volume ID: 68b941df-b656-4950-bcfa-bdd940b774a7
> > Status: Started
> > Number of Bricks: 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: 192.168.122.9:/bricks/brick2/b2
> > Brick2: 192.168.122.9:/bricks/brick1/b2
> > Brick3: 192.168.122.8:/bricks/brick2/b2
> > Options Reconfigured:
> > transport.address-family: inet
> > performance.readdir-ahead: on
> > nfs.disable: on
> > diagnostics.client-log-level: INFO
> > 
> > 
> > If I check the brick info stored in /var/lib/glusterd/vols/loop/bricks, I
> > see 
> > 
> > -rw------- 1 root root 179 Aug 17 13:39 192.168.122.8:-bricks-brick2-b2
> > -rw------- 1 root root 175 Aug 17 13:39 192.168.122.9:-bricks-brick1-b2
> > -rw------- 1 root root 175 Aug 17 13:39 192.168.122.9:-bricks-brick2-b2
> > 
> > 
> > These files contain the information which is used to generate the volfiles.
> > 
> > 
> > [root at nb-rhs3-srv1 bricks]# cat 192.168.122.9:-bricks-brick2-b2
> > hostname=192.168.122.9
> > path=/bricks/brick2/b2
> > real_path=/bricks/brick2/b2
> > listen-port=0
> > rdma.listen-port=0
> > decommissioned=0
> > brick-id=loop-client-0   <--- client 0
> > mount_dir=/b2
> > snap-status=0
> > 
> > 
> > [root at nb-rhs3-srv1 bricks]# cat 192.168.122.9:-bricks-brick1-b2
> > hostname=192.168.122.9
> > path=/bricks/brick1/b2
> > real_path=/bricks/brick1/b2
> > listen-port=0
> > rdma.listen-port=0
> > decommissioned=0
> > brick-id=loop-client-1    <--- client 1
> > mount_dir=/b2
> > snap-status=0
> > 
> > [root at nb-rhs3-srv1 bricks]# cat 192.168.122.8:-bricks-brick2-b2
> > hostname=192.168.122.8
> > path=/bricks/brick2/b2
> > real_path=/bricks/brick2/b2
> > listen-port=49152
> > rdma.listen-port=0
> > decommissioned=0
> > brick-id=loop-client-2   <--- client 2
> > mount_dir=/b2
> > snap-status=0
> > 
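> > To list all the brick-ids at a glance, something like the following
> > should work (the volume name "loop" here is just my example; substitute
> > your own):
> > 
> > [root at nb-rhs3-srv1 bricks]# grep '^brick-id=' /var/lib/glusterd/vols/loop/bricks/*
> > 
> > Each file should report a different client index; if two files show the
> > same brick-id, the generated volfiles will be wrong.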
> > 
> > It sounds like the files in /var/lib/glusterd/vols/data/bricks for the
> > original 6 bricks have, for some reason, ended up with the same brick-id.
> > 
> > We do not know why this could have happened. If you have any steps to
> > reproduce the issue, please let us know.
> > 
> > Can you please send across the contents of /var/lib/glusterd/vols/data on
> > the server so we can confirm this theory?
> > 
> > If this is the case, the problem will show up every time the volfiles are
> > generated (if you were to change an option or add/remove bricks, for
> > example). You will need to edit the files and correct the brick-ids so
> > they follow the same order as listed in the gluster volume info (see the
> > sketch after the mapping below).
> > 
> > 
> > Brick1: gls-safran1:/gluster/bricks/brick1/data   <-- data-client-0
> > Brick2: gls-safran1:/gluster/bricks/brick2/data   <-- data-client-1
> > Brick3: gls-safran1:/gluster/bricks/brick3/data   <-- data-client-2
> > Brick4: gls-safran1:/gluster/bricks/brick4/data   <-- data-client-3
> > Brick5: gls-safran1:/gluster/bricks/brick5/data   <-- data-client-4
> > Brick6: gls-safran1:/gluster/bricks/brick6/data   <-- data-client-5
> > Brick7: gls-safran1:/gluster/bricks/brick7/data   <-- data-client-6
> > Brick8: gls-safran1:/gluster/bricks/brick8/data   <-- data-client-7
> > 
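> > As a rough sketch (assuming the volume name "data" and the standard
> > /var/lib/glusterd layout), the current assignment can be listed for
> > comparison with:
> > 
> > for f in /var/lib/glusterd/vols/data/bricks/*; do
> >     echo "$f: $(grep '^brick-id=' "$f")"
> > done
> > 
> > Each brick-id printed should match the data-client-N value shown next to
> > the corresponding brick above.
> > 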
> > Please let me know if you have any questions.
> 
> Hi,
> 
> I have since moved on to gluster 3.7.14 on all of my servers.
> I can confirm that, while I had the bug, any rebalance or option change
> would corrupt the vol files.
> I had to go back into them and correct them, then upgrade, before the bug
> stopped appearing.
> Do you still want the /var/lib/glusterd/vols/data files?
> They are correct and no longer get corrupted.



Hi,

If you corrected the files in /var/lib/glusterd/vols/data/bricks, then yes,
this should not happen anymore and we don't need the files. 
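
For reference, if you ever want to sanity-check the regenerated client
volfiles after a future option change, grepping for the remote-subvolume
options should list each brick path once per client volfile (the exact
volfile names vary, so a glob is used here):

  # grep 'remote-subvolume' /var/lib/glusterd/vols/data/*.vol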


In that case, can we close the BZ?
