[Bugs] [Bug 1258144] New: Data Tiering: Tier daemon crashed when detach tier start was issued while IOs were happening

Sat Aug 29 15:40:56 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1258144

            Bug ID: 1258144
           Summary: Data Tiering: Tier daemon crashed when detach tier
                    start was issued while IOs were happening
           Product: GlusterFS
           Version: 3.7.3
         Component: tiering
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: nchilaka at redhat.com
        QA Contact: bugs at gluster.org
                CC: bugs at gluster.org

Description of problem:
=========================
I created a replicated hot tier over a distributed-replicate volume, mounted
the volume over NFS, and turned on CTR.
I generated a fair amount of I/O by untarring a Linux kernel tarball.
The files were demoted after some time, as expected.
Next, I renamed the existing untarred directory and started a second untar.
While this was in progress, I issued a detach-tier start.
I made the following observations:
1) The tier daemon crashed.
2) As expected after the crash, both the rebalance tier status and the
rebalance status show as failed:
gluster v rebal g1 status
       Node  Rebalanced-files    size  scanned  failures  skipped  status  run time in secs
-----------  ----------------  ------  -------  --------  -------  ------  ----------------
  localhost                 0  0Bytes        0         0        0  failed              0.00
10.70.46.36                 0  0Bytes        0         0        0  failed              0.00

3) *IMPORTANT* I/O nevertheless continued, and new files were written only to
the hot tier (this could eventually fill the hot tier).
4) After some time, "gluster v status <vname>" failed as below:
[root at nag-manual-node1 ~]# gluster v status g1
Commit failed on localhost. Please check the log file for more details.
5) The AFR daemons were also missing from the ps -ef output.
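When re-running this scenario, the failed-status check from observation 2 can
be automated. The following is a minimal Python sketch, assuming the column
layout shown in the rebalance status output above (Node, Rebalanced-files,
size, scanned, failures, skipped, status, run time in secs).
`parse_rebalance_status` and `failed_nodes` are hypothetical helpers and are
not part of GlusterFS. Other releases may format this table differently, in
which case the parsing would need adjusting.

```python
from typing import Dict, List


# Hypothetical helper (not part of GlusterFS): parses the text printed by
# `gluster volume rebalance <vol> status`, assuming the layout in this report.
def parse_rebalance_status(text: str) -> List[Dict[str, str]]:
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows are a node name followed by a numeric file count; this
        # skips the command echo, the header and the dashed separator line.
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        rows.append({
            "node": fields[0],
            "rebalanced_files": fields[1],
            "size": fields[2],
            "scanned": fields[3],
            "failures": fields[4],
            "skipped": fields[5],
            # The status may contain spaces (e.g. "in progress").
            "status": " ".join(fields[6:-1]),
            "run_time_secs": fields[-1],
        })
    return rows


def failed_nodes(text: str) -> List[str]:
    """Return the nodes whose rebalance/tier status is 'failed'."""
    return [r["node"] for r in parse_rebalance_status(text)
            if r["status"] == "failed"]


# Sample taken from the status output in this report.
SAMPLE = """\
gluster v rebal g1 status
       Node  Rebalanced-files    size  scanned  failures  skipped  status  run time in secs
-----------  ----------------  ------  -------  --------  -------  ------  ----------------
  localhost                 0  0Bytes        0         0        0  failed              0.00
10.70.46.36                 0  0Bytes        0         0        0  failed              0.00
"""

if __name__ == "__main__":
    print(failed_nodes(SAMPLE))  # ['localhost', '10.70.46.36']
```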

Version-Release number of selected component (if applicable):
=================================================================

[root at nag-manual-node1 ~]# rpm -qa|grep gluster
glusterfs-libs-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-fuse-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-server-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-api-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-cli-3.7.3-0.82.git6c4096f.el6.x86_64
python-gluster-3.7.3-0.82.git6c4096f.el6.noarch
glusterfs-client-xlators-3.7.3-0.82.git6c4096f.el6.x86_64
[root at nag-manual-node1 ~]# gluster --version
glusterfs 3.7.3 built on Aug 27 2015 01:23:05
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General
Public License.

Steps to Reproduce:
===================
1. Created a 2x2 volume and started it.
2. Attached a 1x2 replica hot tier and mounted the volume over NFS.
3. Untarred a Linux kernel tarball.
4. After some time, files were demoted (expected).
5. Renamed the old directory and untarred the Linux kernel again.
6. While the untar was in progress, issued a detach-tier start.
7. This crashed the tier daemon. The replica daemons may have crashed as well,
since ps -ef no longer showed them; I am not sure, though, because the files
still being untarred were available on both bricks of the hot pair.
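The steps above can be captured as a dry-run script that prints the command
sequence and does not execute it. This is a minimal Python sketch under stated
assumptions. The report names only the volume (g1), the two nodes, and the
kernel tree (linux-4.1.6, taken from the log). The brick paths, the mount
point, and the tarball path are HYPOTHETICAL placeholders. The CLI syntax
assumed here is the 3.7.x `attach-tier`/`detach-tier` form, to the best of my
knowledge. Verify it with `gluster volume help` on the build under test.

```python
import shlex
from typing import List, Tuple

# HYPOTHETICAL values: only the volume name and node names come from the
# report; brick paths, mount point and tarball path are placeholders.
VOL = "g1"
NODES = ("nag-manual-node1", "10.70.46.36")
MNT = "/mnt/g1"
TARBALL = "/root/linux-4.1.6.tar.xz"

Step = Tuple[List[str], bool]  # (argv, run in background?)


def repro_commands(vol: str = VOL, nodes=NODES, mnt: str = MNT,
                   tarball: str = TARBALL) -> List[Step]:
    n1, n2 = nodes
    # 2x2 distributed-replicate volume: each replica pair spans both nodes.
    cold = [f"{n1}:/bricks/cold1", f"{n2}:/bricks/cold1",
            f"{n1}:/bricks/cold2", f"{n2}:/bricks/cold2"]
    # 1x2 replicated hot tier.
    hot = [f"{n1}:/bricks/hot1", f"{n2}:/bricks/hot1"]
    tree = f"{mnt}/linux-4.1.6"
    return [
        (["gluster", "volume", "create", vol, "replica", "2", *cold], False),
        (["gluster", "volume", "start", vol], False),
        # Assumed 3.7.x syntax; later releases may differ.
        (["gluster", "volume", "attach-tier", vol, "replica", "2", *hot], False),
        (["gluster", "volume", "set", vol, "features.ctr-enabled", "on"], False),
        # Gluster's built-in NFS server serves NFSv3.
        (["mount", "-t", "nfs", "-o", "vers=3", f"{n1}:/{vol}", mnt], False),
        (["tar", "-xf", tarball, "-C", mnt], False),
        # (wait until the files have been demoted before continuing)
        (["mv", tree, f"{tree}.old"], False),
        # The second untar must still be running when detach-tier starts.
        (["tar", "-xf", tarball, "-C", mnt], True),
        (["gluster", "volume", "detach-tier", vol, "start"], False),
        (["gluster", "volume", "rebalance", vol, "status"], False),
        (["gluster", "volume", "status", vol], False),
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv, background in repro_commands():
        print(shlex.join(argv) + (" &" if background else ""))
```

Printing the commands, rather than running them through `subprocess`,
keeps the sketch safe to run anywhere. The trailing `&` marks the only step
that has to overlap with the next one.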

CRASH
======
[2015-08-29 16:02:01.669901] E [MSGID: 109037] [tier.c:898:tier_start] 0-g1-tier-dht: Demotion failed!
[2015-08-29 16:02:00.311020] I [MSGID: 109038] [tier.c:350:tier_migrate_using_query_file] 0-g1-tier-dht: Tier 0 src_subvol g1-hot-dht file .gitignore
[2015-08-29 16:02:00.312280] I [MSGID: 109038] [tier.c:109:tier_check_same_node] 0-g1-tier-dht: /linux-4.1.6/.gitignore does not belong to this node
[2015-08-29 16:04:00.698176] I [MSGID: 109038] [tier.c:574:tier_build_migration_qfile] 0-g1-tier-dht: Failed to remove /var/run/gluster/demotequeryfile-20559
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-08-29 16:04:00
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.3
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3560c25936]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x32f)[0x3560c4549f]
/lib64/libc.so.6[0x340e8326a0]
/lib64/libc.so.6[0x340e93372f]
/usr/lib64/libgfdb.so.0(gf_sql_query_function+0xdf)[0x7fa812066dcf]
/usr/lib64/libgfdb.so.0(gf_sqlite3_find_unchanged_for_time+0xd5)[0x7fa81206bb05]
/usr/lib64/libgfdb.so.0(find_unchanged_for_time+0x4f)[0x7fa812065f1f]
/usr/lib64/glusterfs/3.7.3/xlator/cluster/tier.so(+0x5410d)[0x7fa81266f10d]
/usr/lib64/libglusterfs.so.0(dict_foreach_match+0x74)[0x3560c1d2d4]
/usr/lib64/libglusterfs.so.0(dict_foreach+0x18)[0x3560c1d388]
/usr/lib64/glusterfs/3.7.3/xlator/cluster/tier.so(+0x55ea7)[0x7fa812670ea7]
/lib64/libpthread.so.0[0x340ec07a51]
/lib64/libc.so.6(clone+0x6d)[0x340e8e89ad]
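When triaging a trace like this, it helps to skip the frames that belong to
the crash handler itself. The following is a minimal Python sketch, assuming
the `path(symbol+offset)[address]` frame layout printed above.
`parse_frames` and `first_named_caller` are hypothetical helpers. My reading
of this trace is hedged. The two unsymbolized libc frames right after
`gf_print_trace` are most likely the signal-return trampoline and the libc
routine that actually faulted. If so, the first named frame outside the
handler and libc is `gf_sql_query_function` in libgfdb, which was reached via
`find_unchanged_for_time` from tier.so.

```python
import re
from typing import List, NamedTuple, Optional

# Matches one backtrace line, e.g.
#   /usr/lib64/libgfdb.so.0(gf_sql_query_function+0xdf)[0x7fa812066dcf]
#   /lib64/libc.so.6[0x340e8326a0]
FRAME_RE = re.compile(
    r"^(?P<lib>[^(\[\s]+)"                  # shared object path
    r"(?:\((?P<sym>[^+)]*)"                 # optional symbol (may be empty)
    r"(?:\+(?P<off>0x[0-9a-fA-F]+))?\))?"   # optional +offset
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"        # absolute address
)


class Frame(NamedTuple):
    lib: str
    sym: Optional[str]
    off: Optional[int]
    addr: int


def parse_frames(lines) -> List[Frame]:
    """Parse backtrace lines; lines that do not match are ignored."""
    frames = []
    for line in lines:
        m = FRAME_RE.match(line.strip())
        if not m:
            continue
        off = m.group("off")
        frames.append(Frame(
            lib=m.group("lib"),
            sym=m.group("sym") or None,  # "(+0x...)" has no symbol -> None
            off=int(off, 16) if off else None,
            addr=int(m.group("addr"), 16),
        ))
    return frames


# Frames that belong to the crash handler, not to the crashing code.
HANDLER_SYMS = {"_gf_msg_backtrace_nomem", "gf_print_trace"}


def first_named_caller(frames: List[Frame]) -> Optional[Frame]:
    """First symbolized frame that is neither handler code nor libc."""
    for f in frames:
        if f.sym in HANDLER_SYMS or "libc.so" in f.lib:
            continue
        if f.sym:
            return f
    return None


BACKTRACE = """\
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3560c25936]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x32f)[0x3560c4549f]
/lib64/libc.so.6[0x340e8326a0]
/lib64/libc.so.6[0x340e93372f]
/usr/lib64/libgfdb.so.0(gf_sql_query_function+0xdf)[0x7fa812066dcf]
/usr/lib64/libgfdb.so.0(gf_sqlite3_find_unchanged_for_time+0xd5)[0x7fa81206bb05]
/usr/lib64/libgfdb.so.0(find_unchanged_for_time+0x4f)[0x7fa812065f1f]
/usr/lib64/glusterfs/3.7.3/xlator/cluster/tier.so(+0x5410d)[0x7fa81266f10d]
"""

if __name__ == "__main__":
    frames = parse_frames(BACKTRACE.splitlines())
    print(first_named_caller(frames).sym)  # gf_sql_query_function
```

The offsets such as `+0x5410d` in tier.so are what you would feed to
`addr2line` or gdb against the matching debuginfo package to recover
the unnamed frames.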
