[Bugs] [Bug 1224144] New: rebalance failed after attaching the tier to the volume.

Fri May 22 09:11:45 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1224144

            Bug ID: 1224144
           Summary: rebalance failed after attaching the tier to the
                    volume.
           Product: Red Hat Gluster Storage
           Version: 3.1
         Component: glusterfs
     Sub Component: tiering
          Keywords: TestBlocker
          Severity: urgent
          Assignee: rhs-bugs at redhat.com
          Reporter: trao at redhat.com
        QA Contact: nchilaka at redhat.com
                CC: bugs at gluster.org, ndevos at redhat.com
        Depends On: 1221534, 1222092
            Blocks: 1221957 (glusterfs-tiering-supportability)

+++ This bug was initially created as a clone of Bug #1221534 +++

Description of problem:

After attaching the tier to the volume, rebalance status shows failed and
rebalance does not happen between the cold and hot tiers.

Version-Release number of selected component (if applicable):
[root at rhsqa14-vm3 ~]# glusterfs --version
glusterfs 3.7.0beta2 built on May 11 2015 01:27:45
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[root at rhsqa14-vm3 ~]# rpm -qa | grep gluster
glusterfs-libs-3.7.0beta2-0.0.el6.x86_64
glusterfs-fuse-3.7.0beta2-0.0.el6.x86_64
glusterfs-rdma-3.7.0beta2-0.0.el6.x86_64
glusterfs-3.7.0beta2-0.0.el6.x86_64
glusterfs-api-3.7.0beta2-0.0.el6.x86_64
glusterfs-cli-3.7.0beta2-0.0.el6.x86_64
glusterfs-geo-replication-3.7.0beta2-0.0.el6.x86_64
glusterfs-extra-xlators-3.7.0beta2-0.0.el6.x86_64
glusterfs-client-xlators-3.7.0beta2-0.0.el6.x86_64
glusterfs-server-3.7.0beta2-0.0.el6.x86_64
[root at rhsqa14-vm3 ~]#

How reproducible:
easily

Steps to Reproduce:
1. Create a dist-rep volume.
2. FUSE-mount it and add some directories.
3. Attach a tier to the volume.
4. Check the rebalance status: it shows failed.
5. On the mount point, the directories are missing.
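
The steps above can be sketched as a small dry-run script. This is only an
illustration, not the reporter's exact procedure: the brick paths and host
addresses are taken from the "Additional info" transcript (which shows a plain
Distribute volume), while the mount point /mnt/test, the DRY_RUN switch and the
run/reproduce helpers are assumptions introduced here.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps. By default it only prints the
# commands; set DRY_RUN=0 to execute them on a real test cluster.
# VOL, H1, H2 and MNT are illustrative defaults, not values mandated by the bug.
VOL=${VOL:-test}
H1=${H1:-10.70.46.233}
H2=${H2:-10.70.46.236}
MNT=${MNT:-/mnt/test}

run() {
    # Print the command in dry-run mode, otherwise execute it as given.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # Steps 1-2: create and start the volume, mount it, add a directory.
    run mkdir -p "$MNT"
    run gluster volume create "$VOL" "$H1:/rhs/brick4/t1" "$H2:/rhs/brick4/t1"
    run gluster volume start "$VOL"
    run mount -t glusterfs "$H1:/$VOL" "$MNT"
    run mkdir "$MNT/triveni"
    # Step 3: attach the hot tier. When executed for real, the CLI asks for
    # a y/n confirmation and then starts the rebalance on its own.
    run gluster volume attach-tier "$VOL" replica 2 \
        "$H1:/rhs/brick3/t2" "$H2:/rhs/brick3/t2" \
        "$H1:/rhs/brick5/t2" "$H2:/rhs/brick5/t2" force
    # Steps 4-5: inspect the rebalance status and the mount point.
    run gluster volume rebalance "$VOL" status
    run ls -la "$MNT"
}

reproduce
```

Run as is, the script prints the eight commands prefixed with "+" so they can
be reviewed before anything touches a cluster.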

Actual results:

Rebalance fails on both nodes, and the directory created before the attach is
no longer listed on the mount point.

Additional info:

[root at rhsqa14-vm1 linux-4.0]# gluster  v info test

Volume Name: test
Type: Distribute
Volume ID: d5b37bbc-af62-48cd-8942-43b5a395da65
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.70.46.233:/rhs/brick4/t1
Brick2: 10.70.46.236:/rhs/brick4/t1
Options Reconfigured:
performance.readdir-ahead: on 
[root at rhsqa14-vm1 linux-4.0]# 
[root at rhsqa14-vm1 linux-4.0]# gluster v attach-tier test replica 2 10.70.46.233:/rhs/brick3/t2 10.70.46.236:/rhs/brick3/t2 10.70.46.233:/rhs/brick5/t2 10.70.46.236:/rhs/brick5/t2 force
Attach tier is recommended only for testing purposes in this release. Do you want to continue? (y/n) y
volume attach-tier: success
volume rebalance: test: success: Rebalance on test has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: a89419aa-c45b-42ef-8998-1a23cde16efe

[root at rhsqa14-vm1 linux-4.0]# 

[root at rhsqa14-vm1 linux-4.0]# gluster v rebalance test status                   
        Node  Rebalanced-files    size  scanned  failures  skipped  status  run time in secs
------------  ----------------  ------  -------  --------  -------  ------  ----------------
   localhost                 0  0Bytes        0         0        0  failed              0.00
10.70.46.236                 0  0Bytes        0         0        0  failed              0.00
volume rebalance: test: success:
[root at rhsqa14-vm1 linux-4.0]# 
[root at rhsqa14-vm1 linux-4.0]# gluster v info test

Volume Name: test
Type: Tier
Volume ID: d5b37bbc-af62-48cd-8942-43b5a395da65
Status: Started
Number of Bricks: 6
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4   
Brick1: 10.70.46.236:/rhs/brick5/t2
Brick2: 10.70.46.233:/rhs/brick5/t2
Brick3: 10.70.46.236:/rhs/brick3/t2
Brick4: 10.70.46.233:/rhs/brick3/t2
Cold Bricks:
Cold Tier Type : Distribute   
Number of Bricks: 2
Brick5: 10.70.46.233:/rhs/brick4/t1
Brick6: 10.70.46.236:/rhs/brick4/t1
Options Reconfigured:
performance.readdir-ahead: on 
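
Note that the status command above ends with "volume rebalance: test: success:"
even though every node reports failed, so a script has to look at the status
column rather than the final line. A minimal sketch of such a check follows.
The failed_nodes helper is hypothetical, and it assumes the column layout shown
in this report (status is the second-to-last field of each node row).

```shell
#!/bin/sh
# failed_nodes: read "gluster volume rebalance <vol> status" output on stdin
# and print the name of every node whose status column reads "failed".
# Assumes node rows have at least 8 whitespace-separated fields, with the
# status in the second-to-last one, as in the table in this report.
failed_nodes() {
    awk 'NF >= 8 && $(NF-1) == "failed" { print $1 }'
}

# Example use against a live cluster (not executed here):
#   gluster volume rebalance test status | failed_nodes
```

An empty result means no node row matched; it does not by itself prove the
rebalance completed.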

Before attaching the tier, on the mount point:
[root at rhsqa14-vm5 disk1]# ls -la
total 4
drwxr-xr-x.  5 root root  133 May 14  2015 .
dr-xr-xr-x. 30 root root 4096 May 14 05:47 ..
-rw-r--r--.  1 root root    0 May 14 05:47 t1
-rw-r--r--.  1 root root    0 May 14 05:47 t2
-rw-r--r--.  1 root root    0 May 14 05:47 t4
drwxr-xr-x.  3 root root   48 May 14  2015 .trashcan
drwxr-xr-x.  2 root root   12 May 14  2015 triveni
[root at rhsqa14-vm5 disk1]#

After attaching the tier, on the mount point (the directory triveni is missing):

[root at rhsqa14-vm5 disk1]# ls -la
total 4
drwxr-xr-x.  4 root root  211 May 14  2015 .
dr-xr-xr-x. 30 root root 4096 May 14 05:47 ..
-rw-r--r--.  1 root root    0 May 14 05:47 t1
-rw-r--r--.  1 root root    0 May 14 05:47 t2
-rw-r--r--.  1 root root    0 May 14 05:47 t4
drwxr-xr-x.  3 root root   96 May 14  2015 .trashcan
[root at rhsqa14-vm5 disk1]# 

Rebalance log messages:

[2015-05-14 09:52:37.116716] I [graph.c:269:gf_add_cmdline_options]
0-test-hot-replicate-1: adding option 'data-self-heal' for volume
'test-hot-replicate-1' with value 'off'
[2015-05-14 09:52:37.116736] I [graph.c:269:gf_add_cmdline_options]
0-test-hot-replicate-0: adding option 'readdir-failover' for volume
'test-hot-replicate-0' with value 'off'
[2015-05-14 09:52:37.116747] I [graph.c:269:gf_add_cmdline_options]
0-test-hot-replicate-0: adding option 'entry-self-heal' for volume
'test-hot-replicate-0' with value 'off'
[2015-05-14 09:52:37.116757] I [graph.c:269:gf_add_cmdline_options]
0-test-hot-replicate-0: adding option 'metadata-self-heal' for volume
'test-hot-replicate-0' with value 'off'
[2015-05-14 09:52:37.116767] I [graph.c:269:gf_add_cmdline_options]
0-test-hot-replicate-0: adding option 'data-self-heal' for volume
'test-hot-replicate-0' with value 'off'
[2015-05-14 09:52:37.116778] I [graph.c:269:gf_add_cmdline_options]
0-test-cold-dht: adding option 'commit-hash' for volume 'test-cold-dht' with
value '2862842620'
[2015-05-14 09:52:37.116788] I [graph.c:269:gf_add_cmdline_options]
0-test-cold-dht: adding option 'node-uuid' for volume 'test-cold-dht' with
value '87acbf29-e821-48bf-9aa8-bbda9321e609'
[2015-05-14 09:52:37.116798] I [graph.c:269:gf_add_cmdline_options]
0-test-cold-dht: adding option 'rebalance-cmd' for volume 'test-cold-dht' with
value '6'
[2015-05-14 09:52:37.116809] I [graph.c:269:gf_add_cmdline_options]
0-test-cold-dht: adding option 'readdir-optimize' for volume 'test-cold-dht'
with value 'on'
[2015-05-14 09:52:37.116820] I [graph.c:269:gf_add_cmdline_options]
0-test-cold-dht: adding option 'assert-no-child-down' for volume
'test-cold-dht' with value 'yes'
[2015-05-14 09:52:37.116835] I [graph.c:269:gf_add_cmdline_options]
0-test-cold-dht: adding option 'lookup-unhashed' for volume 'test-cold-dht'
with value 'yes'
[2015-05-14 09:52:37.116846] I [graph.c:269:gf_add_cmdline_options]
0-test-cold-dht: adding option 'use-readdirp' for volume 'test-cold-dht' with
value 'yes'
[2015-05-14 09:52:37.118410] I [dht-shared.c:598:dht_init] 0-tier-dht: dht_init
using commit hash 2862842620
[2015-05-14 09:52:37.120852] E [MSGID: 109037]
[tier.c:1007:tier_load_externals] 0-tier-dht: Error loading libgfdb.so
/usr/lib64/libgfdb.so: cannot open shared object file: No such file or
directory

[2015-05-14 09:52:37.120892] E [MSGID: 109037] [tier.c:1070:tier_init]
0-tier-dht: Could not load externals. Aborting
[2015-05-14 09:52:37.120903] E [xlator.c:426:xlator_init] 0-tier-dht:
Initialization of volume 'tier-dht' failed, review your volfile again
[2015-05-14 09:52:37.120913] E [graph.c:322:glusterfs_graph_init] 0-tier-dht:
initializing translator failed
[2015-05-14 09:52:37.120921] E [graph.c:661:glusterfs_graph_activate] 0-graph:
init failed
[2015-05-14 09:52:37.124323] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-:
received signum (0), shutting down
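
The first error in this log is the failure to open /usr/lib64/libgfdb.so; the
later "init failed" messages follow from it. A quick way to confirm this on an
affected node is to test for the file directly. The sketch below is only a
diagnostic aid: the check_shared_object helper and the GFDB_LIB variable are
hypothetical names, and the listing of similarly named files is an assumption
about what might be useful to see, not a statement about how the package lays
out its libraries.

```shell
#!/bin/sh
# check_shared_object: report whether a shared object is readable at the
# given path. If it is not, list files whose names start with "<path>."
# (for example versioned variants), which may help explain the failure.
check_shared_object() {
    lib=$1
    if [ -r "$lib" ]; then
        echo "present: $lib"
        return 0
    fi
    echo "missing: $lib"
    for f in "$lib".*; do
        [ -e "$f" ] && echo "  found nearby: $f"
    done
    return 1
}

# Path taken from the log message above; override with GFDB_LIB if needed.
check_shared_object "${GFDB_LIB:-/usr/lib64/libgfdb.so}" || true
```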

--- Additional comment from Triveni Rao on 2015-05-15 05:37:19 EDT ---

This is also reproducible on the new downstream build:

[root at rhsqa14-vm1 ~]# rpm -qa | grep gluster
glusterfs-3.7.0-2.el6rhs.x86_64
glusterfs-cli-3.7.0-2.el6rhs.x86_64
glusterfs-libs-3.7.0-2.el6rhs.x86_64
glusterfs-client-xlators-3.7.0-2.el6rhs.x86_64
glusterfs-api-3.7.0-2.el6rhs.x86_64
glusterfs-server-3.7.0-2.el6rhs.x86_64
glusterfs-fuse-3.7.0-2.el6rhs.x86_64
[root at rhsqa14-vm1 ~]# 

[root at rhsqa14-vm1 ~]# glusterfs --version
glusterfs 3.7.0 built on May 15 2015 01:31:10
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
[root at rhsqa14-vm1 ~]# 

[root at rhsqa14-vm1 ~]# gluster v attach-tier vol1 replica 2 10.70.46.233:/rhs/brick3/m1 10.70.46.236:/rhs/brick3/m1 10.70.46.233:/rhs/brick5/m1 10.70.46.236:/rhs/brick5/m1
Attach tier is recommended only for testing purposes in this release. Do you want to continue? (y/n) y
volume attach-tier: success
volume rebalance: vol1: success: Rebalance on vol1 has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: b954b5e0-c4fa-4619-92ca-3c5e657269aa

[root at rhsqa14-vm1 ~]# gluster v rebalance vol1  status
        Node  Rebalanced-files    size  scanned  failures  skipped  status  run time in secs
------------  ----------------  ------  -------  --------  -------  ------  ----------------
   localhost                 0  0Bytes        0         0        0  failed              0.00
10.70.46.236                 0  0Bytes        0         0        0  failed              0.00
volume rebalance: vol1: success:
[root at rhsqa14-vm1 ~]# 

[root at rhsqa14-vm1 ~]# less /var/log/glusterfs/vol1-rebalance.log 
[2015-05-15 08:55:31.691592] I [MSGID: 100030] [glusterfsd.c:2294:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args:
/usr/sbin/glusterfs -s localhost --volfile-id rebalance/vol1 --xlator-option
*dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option
*dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off
--xlator-option *replicate*.metadata-self-heal=off --xlator-option
*replicate*.entry-self-heal=off --xlator-option
*replicate*.readdir-failover=off --xlator-option *dht.readdir-optimize=on
--xlator-option *tier-dht.xattr-name=trusted.tier-gfid --xlator-option
*dht.rebalance-cmd=6 --xlator-option
*dht.node-uuid=87acbf29-e821-48bf-9aa8-bbda9321e609 --xlator-option
*dht.commit-hash=2863506458 --socket-file
/var/run/gluster/gluster-rebalance-37d0a9c0-21c1-46cf-ba95-419f9fbfbab0.sock
--pid-file
/var/lib/glusterd/vols/vol1/rebalance/87acbf29-e821-48bf-9aa8-bbda9321e609.pid
-l /var/log/glusterfs/vol1-rebalance.log)
[2015-05-15 08:55:31.730515] I [event-epoll.c:629:event_dispatch_epoll_worker]
0-epoll: Started thread with index 1
[2015-05-15 08:55:36.690982] I [graph.c:269:gf_add_cmdline_options] 0-tier-dht:
adding option 'commit-hash' for volume 'tier-dht' with value '2863506458'
[2015-05-15 08:55:36.691003] I [graph.c:269:gf_add_cmdline_options] 0-tier-dht:
adding option 'node-uuid' for volume 'tier-dht' with value
'87acbf29-e821-48bf-9aa8-bbda9321e609'
[2015-05-15 08:55:36.691014] I [graph.c:269:gf_add_cmdline_options] 0-tier-dht:
adding option 'rebalance-cmd' for volume 'tier-dht' with value '6'
[2015-05-15 08:55:36.691025] I [graph.c:269:gf_add_cmdline_options] 0-tier-dht:
adding option 'xattr-name' for volume 'tier-dht' with value 'trusted.tier-gfid'
[2015-05-15 08:55:36.691035] I [graph.c:269:gf_add_cmdline_options] 0-tier-dht:
adding option 'readdir-optimize' for volume 'tier-dht' with value 'on'
[2015-05-15 08:55:36.691046] I [graph.c:269:gf_add_cmdline_options] 0-tier-dht:
adding option 'assert-no-child-down' for volume 'tier-dht' with value 'yes'
[2015-05-15 08:55:36.691056] I [graph.c:269:gf_add_cmdline_options] 0-tier-dht:
adding option 'lookup-unhashed' for volume 'tier-dht' with value 'yes'
[2015-05-15 08:55:36.691069] I [graph.c:269:gf_add_cmdline_options] 0-tier-dht:
adding option 'use-readdirp' for volume 'tier-dht' with value 'yes'
[2015-05-15 08:55:36.691080] I [graph.c:269:gf_add_cmdline_options]
0-vol1-hot-dht: adding option 'commit-hash' for volume 'vol1-hot-dht' with
value '2863506458'
[2015-05-15 08:55:36.692412] I [graph.c:269:gf_add_cmdline_options]
0-vol1-hot-dht: adding option 'node-uuid' for volume 'vol1-hot-dht' with value
'87acbf29-e821-48bf-9aa8-bbda9321e609'
[2015-05-15 08:55:36.692425] I [graph.c:269:gf_add_cmdline_options]
0-vol1-hot-dht: adding option 'rebalance-cmd' for volume 'vol1-hot-dht' with
value '6'
[2015-05-15 08:55:36.692437] I [graph.c:269:gf_add_cmdline_options]
0-vol1-hot-dht: adding option 'readdir-optimize' for volume 'vol1-hot-dht' with
value 'on'
[2015-05-15 08:55:36.692449] I [graph.c:269:gf_add_cmdline_options]
0-vol1-hot-dht: adding option 'assert-no-child-down' for volume 'vol1-hot-dht'
with value 'yes'
[2015-05-15 08:55:36.692459] I [graph.c:269:gf_add_cmdline_options]
0-vol1-hot-dht: adding option 'lookup-unhashed' for volume 'vol1-hot-dht' with
val...skipping...
ith value 'yes'
[2015-05-15 08:57:58.911456] I [graph.c:269:gf_add_cmdline_options]
0-vol1-cold-dht: adding option 'lookup-unhashed' for volume 'vol1-cold-dht'
with value 'yes'
[2015-05-15 08:57:58.911466] I [graph.c:269:gf_add_cmdline_options]
0-vol1-cold-dht: adding option 'use-readdirp' for volume 'vol1-cold-dht' with
value 'yes'
[2015-05-15 08:57:58.911477] I [graph.c:269:gf_add_cmdline_options]
0-vol1-cold-replicate-1: adding option 'readdir-failover' for volume
'vol1-cold-replicate-1' with value 'off'
[2015-05-15 08:57:58.911486] I [graph.c:269:gf_add_cmdline_options]
0-vol1-cold-replicate-1: adding option 'entry-self-heal' for volume
'vol1-cold-replicate-1' with value 'off'
[2015-05-15 08:57:58.911496] I [graph.c:269:gf_add_cmdline_options]
0-vol1-cold-replicate-1: adding option 'metadata-self-heal' for volume
'vol1-cold-replicate-1' with value 'off'
[2015-05-15 08:57:58.911505] I [graph.c:269:gf_add_cmdline_options]
0-vol1-cold-replicate-1: adding option 'data-self-heal' for volume
'vol1-cold-replicate-1' with value 'off'
[2015-05-15 08:57:58.911517] I [graph.c:269:gf_add_cmdline_options]
0-vol1-cold-replicate-0: adding option 'readdir-failover' for volume
'vol1-cold-replicate-0' with value 'off'
[2015-05-15 08:57:58.911527] I [graph.c:269:gf_add_cmdline_options]
0-vol1-cold-replicate-0: adding option 'entry-self-heal' for volume
'vol1-cold-replicate-0' with value 'off'
[2015-05-15 08:57:58.911536] I [graph.c:269:gf_add_cmdline_options]
0-vol1-cold-replicate-0: adding option 'metadata-self-heal' for volume
'vol1-cold-replicate-0' with value 'off'
[2015-05-15 08:57:58.911545] I [graph.c:269:gf_add_cmdline_options]
0-vol1-cold-replicate-0: adding option 'data-self-heal' for volume
'vol1-cold-replicate-0' with value 'off'
[2015-05-15 08:57:58.912392] I [dht-shared.c:598:dht_init] 0-tier-dht: dht_init
using commit hash 2863507596
[2015-05-15 08:57:58.912452] I
[dht-shared.c:299:dht_parse_decommissioned_bricks] 0-tier-dht: decommissioning
subvolume vol1-hot-dht
[2015-05-15 08:57:58.913862] E [MSGID: 109037]
[tier.c:1007:tier_load_externals] 0-tier-dht: Error loading libgfdb.so
/usr/lib64/libgfdb.so: cannot open shared object file: No such file or
directory

[2015-05-15 08:57:58.913899] E [MSGID: 109037] [tier.c:1070:tier_init]
0-tier-dht: Could not load externals. Aborting
[2015-05-15 08:57:58.913946] E [xlator.c:426:xlator_init] 0-tier-dht:
Initialization of volume 'tier-dht' failed, review your volfile again
[2015-05-15 08:57:58.913959] E [graph.c:322:glusterfs_graph_init] 0-tier-dht:
initializing translator failed
[2015-05-15 08:57:58.913967] E [graph.c:661:glusterfs_graph_activate] 0-graph:
init failed
[2015-05-15 08:57:58.914373] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-:
received signum (0), shutting down

--- Additional comment from Anand Avati on 2015-05-16 04:33:24 EDT ---

REVIEW: http://review.gluster.org/10799 (cluster/tier: load libgfdb.so properly
in all cases) posted (#1) for review on release-3.7 by Niels de Vos
(ndevos at redhat.com)

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1221534
[Bug 1221534] rebalance failed after attaching the tier to the volume.
https://bugzilla.redhat.com/show_bug.cgi?id=1221957
[Bug 1221957] Fully support data-tiering in 3.7.x, remove out of
'experimental' status
https://bugzilla.redhat.com/show_bug.cgi?id=1222092
[Bug 1222092] rebalance failed after attaching the tier to the volume.