[Bugs] [Bug 1264520] volume rebalance start is successful but status returns failed

bugzilla at redhat.com bugzilla at redhat.com
Fri Aug 19 09:23:09 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1264520

Leildin <shelsucker at hotmail.com> changed:

           What    |Removed                            |Added
----------------------------------------------------------------------------
              Flags|needinfo?(shelsucker at hotmail.com)|



--- Comment #11 from Leildin <shelsucker at hotmail.com> ---
(In reply to Nithya Balachandran from comment #10)
> My apologies for the extremely delayed response.
> 
> I went through the code; the glusterd process generates the volfiles based
> on the info stored in /var/lib/glusterd/, and it looks like something might
> be wrong there.
> 
> glusterd uses the information in the /var/lib/glusterd/vols/<volname>/bricks
> directory to generate the client portion of the client volfiles (this
> includes any fuse client, the rebalance process, etc.).
> 
> For example, I have a volume called loop with 3 bricks.
> 
> Volume Name: loop
> Type: Distribute
> Volume ID: 68b941df-b656-4950-bcfa-bdd940b774a7
> Status: Started
> Number of Bricks: 3
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.122.9:/bricks/brick2/b2
> Brick2: 192.168.122.9:/bricks/brick1/b2
> Brick3: 192.168.122.8:/bricks/brick2/b2
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on
> diagnostics.client-log-level: INFO
> 
> 
> If I check the brick info stored in /var/lib/glusterd/vols/loop/bricks, I
> see 
> 
> -rw------- 1 root root 179 Aug 17 13:39 192.168.122.8:-bricks-brick2-b2
> -rw------- 1 root root 175 Aug 17 13:39 192.168.122.9:-bricks-brick1-b2
> -rw------- 1 root root 175 Aug 17 13:39 192.168.122.9:-bricks-brick2-b2
> 
> 
> These files contain the information which is used to generate the volfiles.
> 
> 
> [root at nb-rhs3-srv1 bricks]# cat 192.168.122.9:-bricks-brick2-b2
> hostname=192.168.122.9
> path=/bricks/brick2/b2
> real_path=/bricks/brick2/b2
> listen-port=0
> rdma.listen-port=0
> decommissioned=0
> brick-id=loop-client-0   <--- client 0
> mount_dir=/b2
> snap-status=0
> 
> 
> [root at nb-rhs3-srv1 bricks]# cat 192.168.122.9:-bricks-brick1-b2
> hostname=192.168.122.9
> path=/bricks/brick1/b2
> real_path=/bricks/brick1/b2
> listen-port=0
> rdma.listen-port=0
> decommissioned=0
> brick-id=loop-client-1    <--- client 1
> mount_dir=/b2
> snap-status=0
> 
> [root at nb-rhs3-srv1 bricks]# cat 192.168.122.8:-bricks-brick2-b2
> hostname=192.168.122.8
> path=/bricks/brick2/b2
> real_path=/bricks/brick2/b2
> listen-port=49152
> rdma.listen-port=0
> decommissioned=0
> brick-id=loop-client-2   <--- client 2
> mount_dir=/b2
> snap-status=0
> 
> 
> It sounds like the files in /var/lib/glusterd/vols/data/bricks for the
> original 6 bricks have, for some reason, ended up with the same brick-id.
> 
> We do not know why this could have happened. If you have any steps to
> reproduce the issue, please let us know.
> 
> Can you please send across the contents of /var/lib/glusterd/vols/data on the
> server so we can confirm this theory?
> 
> If this is the case, this problem will show up every time the volfiles are
> regenerated (for example, if you were to change an option or add/remove
> bricks). You will need to edit the files and correct the brick-ids so they
> follow the same order as the bricks listed in gluster volume info:
> 
> 
> Brick1: gls-safran1:/gluster/bricks/brick1/data   <-- data-client-0
> Brick2: gls-safran1:/gluster/bricks/brick2/data   <-- data-client-1
> Brick3: gls-safran1:/gluster/bricks/brick3/data   <-- data-client-2
> Brick4: gls-safran1:/gluster/bricks/brick4/data   <-- data-client-3
> Brick5: gls-safran1:/gluster/bricks/brick5/data   <-- data-client-4
> Brick6: gls-safran1:/gluster/bricks/brick6/data   <-- data-client-5
> Brick7: gls-safran1:/gluster/bricks/brick7/data   <-- data-client-6
> Brick8: gls-safran1:/gluster/bricks/brick8/data   <-- data-client-7
> 
> Please let me know if you have any questions.

Hi,

I have since moved on to gluster 3.7.14 on all of my servers.
I can confirm that while I had the bug, any rebalance or option change would
corrupt the vol files.
I had to go back into the brick files, correct them by hand, and then upgrade
to get rid of the bug.
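The fix itself was essentially rewriting the brick-id line in each file under
/var/lib/glusterd/vols/data/bricks so that it matches the brick's position in
gluster volume info, roughly along these lines (the file names below only
illustrate the hostname:-path naming shown above; the real ones may differ):

# Set each brick file's brick-id to match its position in `gluster volume info data`.
sed -i 's/^brick-id=.*/brick-id=data-client-0/' \
    /var/lib/glusterd/vols/data/bricks/gls-safran1:-gluster-bricks-brick1-data
sed -i 's/^brick-id=.*/brick-id=data-client-1/' \
    /var/lib/glusterd/vols/data/bricks/gls-safran1:-gluster-bricks-brick2-data
# ...and so on for the remaining bricks, ending with data-client-7.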
Do you still want the /var/lib/glusterd/vols/data files?
They are correct now and no longer get corrupted.
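In case it is still useful, this is a quick way to spot duplicate brick-ids (a
rough sketch; it assumes the volume is still named data):

# Print the brick-id stored in each brick file; each id should appear exactly
# once and match the brick's position in `gluster volume info data`.
grep -H '^brick-id=' /var/lib/glusterd/vols/data/bricks/*
gluster volume info data | grep '^Brick[0-9]'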
