[Bugs] [Bug 1227262] New: rebalance failing on one of the nodes in a 2X2 Distributed-replicate volume with 2 bricks per node

bugzilla at redhat.com bugzilla at redhat.com
Tue Jun 2 09:22:49 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1227262

            Bug ID: 1227262
           Summary: rebalance failing on one of the nodes in a 2X2
                    Distributed-replicate volume with 2 bricks per node
           Product: Red Hat Gluster Storage
           Version: 3.1
         Component: gluster-dht
          Severity: high
          Assignee: rhs-bugs at redhat.com
          Reporter: sasundar at redhat.com
        QA Contact: sasundar at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com,
                    nbalacha at redhat.com, srangana at redhat.com
        Depends On: 1221656



+++ This bug was initially created as a clone of Bug #1221656 +++

Description of problem:
-----------------------
I was using 2 RHEL 6.6 machines with the glusterfs-3.7.0beta2 builds installed.
Each node has 3 bricks. After creating a cluster of these 2 nodes by peer
probing, I created a distributed-replicate volume of 2X2 bricks.

Adding another pair of bricks to this volume and rebalancing resulted in the
rebalance failing on one of the nodes.


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.7.0beta2 build

How reproducible:
------------------
Always

Steps to Reproduce:
-------------------
1. Create a 2 node cluster with 3 bricks per node
2. Create a distributed-replicate volume of 2X2 
3. Start the volume
4. Mount the volume (fuse, nfs)
5. Create a few files on the mount
6. Add a pair of bricks to the volume
7. Perform rebalance (a CLI sketch of these steps follows below)
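
For reference, a minimal CLI sketch of the steps above. The hostnames
(node1/node2), brick paths and mount point are placeholders; the volume name
"vmstore" is taken from the status output further below, and the actual setup
may differ.

# from node1: form the 2-node cluster
gluster peer probe node2

# create and start a 2X2 distributed-replicate volume (4 bricks, replica 2)
gluster volume create vmstore replica 2 \
    node1:/bricks/brick1 node2:/bricks/brick1 \
    node1:/bricks/brick2 node2:/bricks/brick2
gluster volume start vmstore

# mount the volume and create a few files
mount -t glusterfs node1:/vmstore /mnt/vmstore
for i in $(seq 1 10); do dd if=/dev/zero of=/mnt/vmstore/file$i bs=1M count=1; done

# add another replica pair and rebalance
gluster volume add-brick vmstore node1:/bricks/brick3 node2:/bricks/brick3
gluster volume rebalance vmstore start
gluster volume rebalance vmstore status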

Actual results:
---------------
Rebalance failed on the second node

Expected results:
-----------------
Rebalance should complete successfully

--- Additional comment from SATHEESARAN on 2015-05-14 09:57:44 EDT ---

[root@~]# gluster volume rebalance vmstore start
volume rebalance: vmstore: success: Rebalance on vmstore has been started
successfully. Use rebalance status command to check status of the rebalance
process.
ID: 9372b71c-e6f4-44fb-a2e4-9707443f3457

[root@ ~]# gluster volume rebalance vmstore status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             3             0             1            completed               1.00
                             10.70.37.58                0        0Bytes             0             3             0               failed               0.00
volume rebalance: vmstore: success:
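
The log excerpt below shows the fix-layout errors from the rebalance log.
Assuming the default log location (/var/log/glusterfs/<volname>-rebalance.log
on each node), the error lines can be pulled with something like:

grep ' E ' /var/log/glusterfs/vmstore-rebalance.log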

<snip_rebalance_logs>
[2015-05-14 19:17:41.419890] I [dht-rebalance.c:2112:gf_defrag_process_dir] 0-vmstore-dht: migrate data called on /
[2015-05-14 19:17:41.424661] I [dht-common.c:3539:dht_setxattr] 0-vmstore-dht: fixing the layout of /.trashcan
[2015-05-14 19:17:41.424688] I [dht-selfheal.c:1494:dht_fix_layout_of_directory] 0-vmstore-dht: subvolume 0 (vmstore-replicate-0): 101834 chunks
[2015-05-14 19:17:41.424699] I [dht-selfheal.c:1494:dht_fix_layout_of_directory] 0-vmstore-dht: subvolume 1 (vmstore-replicate-1): 101834 chunks
[2015-05-14 19:17:41.424708] I [dht-selfheal.c:1494:dht_fix_layout_of_directory] 0-vmstore-dht: subvolume 2 (vmstore-replicate-2): 101834 chunks
[2015-05-14 19:17:41.434411] I [dht-rebalance.c:2112:gf_defrag_process_dir] 0-vmstore-dht: migrate data called on /.trashcan
[2015-05-14 19:17:41.446254] I [dht-common.c:3539:dht_setxattr] 0-vmstore-dht: fixing the layout of /.trashcan/internal_op
[2015-05-14 19:17:41.446279] I [dht-selfheal.c:1494:dht_fix_layout_of_directory] 0-vmstore-dht: subvolume 0 (vmstore-replicate-0): 101834 chunks
[2015-05-14 19:17:41.446290] I [dht-selfheal.c:1494:dht_fix_layout_of_directory] 0-vmstore-dht: subvolume 1 (vmstore-replicate-1): 101834 chunks
[2015-05-14 19:17:41.446298] I [dht-selfheal.c:1494:dht_fix_layout_of_directory] 0-vmstore-dht: subvolume 2 (vmstore-replicate-2): 101834 chunks
[2015-05-14 19:17:41.453365] I [dht-rebalance.c:2112:gf_defrag_process_dir] 0-vmstore-dht: migrate data called on /.trashcan/internal_op
[2015-05-14 19:17:41.458214] I [dht-common.c:3539:dht_setxattr] 0-vmstore-dht: fixing the layout of /.trashcan/internal_op
[2015-05-14 19:17:41.458542] E [dht-rebalance.c:2368:gf_defrag_settle_hash] 0-vmstore-dht: fix layout on /.trashcan/internal_op failed
[2015-05-14 19:17:41.458824] E [MSGID: 109016] [dht-rebalance.c:2528:gf_defrag_fix_layout] 0-vmstore-dht: Fix layout failed for /.trashcan
</snip_rebalance_logs>

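For context, the DHT layout that "fix layout" rewrites is stored in the
trusted.glusterfs.dht extended attribute on each brick-side directory. A way
to inspect it directly on a brick (the brick path below is a placeholder) is:

getfattr -n trusted.glusterfs.dht -e hex /bricks/brick1/.trashcan/internal_op
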
--- Additional comment from SATHEESARAN on 2015-05-14 09:59:17 EDT ---

Following is the mail conversation from Nithya on gluster-devel regarding this
issue:

<snip>
The rebalance failure is due to the interaction of the lookup-unhashed changes
and rebalance local crawl changes.
</snip>

--- Additional comment from Anand Avati on 2015-05-15 01:21:34 EDT ---

REVIEW: http://review.gluster.org/10788 (dht/rebalance : Fixed rebalance
failure) posted (#1) for review on release-3.7 by N Balachandran
(nbalacha at redhat.com)

--- Additional comment from Anand Avati on 2015-05-28 13:50:16 EDT ---

REVIEW: http://review.gluster.org/10788 (dht/rebalance : Fixed rebalance
failure) posted (#3) for review on release-3.7 by Shyamsundar Ranganathan
(srangana at redhat.com)

--- Additional comment from Shyamsundar on 2015-05-28 13:50:55 EDT ---



--- Additional comment from Niels de Vos on 2015-06-02 04:20:19 EDT ---

The required changes to fix this bug have not made it into glusterfs-3.7.1.
This bug is now getting tracked for glusterfs-3.7.2.


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1221656
[Bug 1221656] rebalance failing on one of the node