[Bugs] [Bug 1258144] New: Data Tiering: Tier daemon crashed when detach tier start was issued while IOs were happening

bugzilla at redhat.com bugzilla at redhat.com
Sat Aug 29 15:40:56 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1258144

            Bug ID: 1258144
           Summary: Data Tiering: Tier daemon crashed when detach tier
                    start was issued while IOs were happening
           Product: GlusterFS
           Version: 3.7.3
         Component: tiering
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: nchilaka at redhat.com
        QA Contact: bugs at gluster.org
                CC: bugs at gluster.org



Description of problem:
=========================
I attached a replicated hot tier to a distributed-replicate volume, mounted the
volume over NFS, and turned on CTR.
I ran a good amount of IO by untarring a Linux kernel tarball.
The files were demoted after some time, as expected.
I then renamed the existing untarred directory and started the untar again.
While this was going on, I issued a detach tier start.
I noted the following observations:
1) The tier daemon crashed.
2) As a result, the rebalance tier status and rebalance status show as failed,
as below:
gluster v rebal g1  status
       Node  Rebalanced-files    size  scanned  failures  skipped  status  run time in secs
  ---------  ----------------  ------  -------  --------  -------  ------  ----------------
  localhost                 0  0Bytes        0         0        0  failed              0.00
10.70.46.36                 0  0Bytes        0         0        0  failed              0.00



3) *IMPORTANT* IOs were still running, and new files were landing on the hot
tier only (this could eventually fill the hot tier).
4) After some time, when I issued "gluster v status <vname>", it failed as below:
[root at nag-manual-node1 ~]# gluster v status g1
Commit failed on localhost. Please check the log file for more details.
5) The AFR daemons were also not showing up in ps -ef.


Version-Release number of selected component (if applicable):
=================================================================

[root at nag-manual-node1 ~]# rpm -qa|grep gluster
glusterfs-libs-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-fuse-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-server-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-api-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-cli-3.7.3-0.82.git6c4096f.el6.x86_64
glpython-gluster-3.7.3-0.82.git6c4096f.el6.noarch
glusterfs-client-xlators-3.7.3-0.82.git6c4096f.el6.x86_64
[root at nag-manual-node1 ~]# gluster --version
glusterfs 3.7.3 built on Aug 27 2015 01:23:05
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General
Public License.




Steps to Reproduce:
===================
1. Created a 2x2 volume and started it.
2. Attached a 1x2 replica hot tier and mounted the volume over NFS.
3. Performed a Linux kernel untar.
4. Demotions happened after some time (as expected).
5. Renamed the old directory and started another Linux kernel untar.
6. While it was in progress, issued a detach-tier start.
7. This caused the tier daemon to crash (and possibly a replica crash as well,
but I am not sure: ps -ef didn't show the daemons, yet the files that were still
being untarred were available on both bricks of the hot pair).
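
For anyone trying to reproduce this, the steps above can be sketched as a
command plan. This is only a sketch and nothing is executed: the host names,
brick paths, mount point and tarball name are hypothetical placeholders, and
the gluster command forms are written from memory of the 3.7 CLI, so they
should be checked against "gluster volume help" on the test build. The CTR
option name (features.ctr-enabled) is likewise an assumption about how CTR was
turned on.

```python
import shlex


def repro_plan(vol, cold_bricks, hot_bricks, server, mnt, tarball):
    """Return (note, argv) pairs mirroring steps 1-6; nothing is run here."""
    tree = f"{mnt}/linux-4.1.6"  # directory name as seen in the crash log
    gluster = ["gluster", "volume"]
    return [
        ("1. create a 2x2 volume",
         gluster + ["create", vol, "replica", "2", *cold_bricks]),
        ("1. start it", gluster + ["start", vol]),
        ("2. attach a 1x2 replica hot tier",
         gluster + ["attach-tier", vol, "replica", "2", *hot_bricks]),
        # Assumed option name for "turned on ctr" in the description.
        ("2. turn on CTR",
         gluster + ["set", vol, "features.ctr-enabled", "on"]),
        ("2. mount over NFS",
         ["mount", "-t", "nfs", "-o", "vers=3", f"{server}:/{vol}", mnt]),
        ("3. first untar", ["tar", "-xf", tarball, "-C", mnt]),
        # Step 4 is a wait: let the first tree get demoted before going on.
        ("5. rename the old tree", ["mv", tree, tree + ".old"]),
        ("5. second untar (keep it running in the background)",
         ["tar", "-xf", tarball, "-C", mnt]),
        ("6. detach while the untar is still running",
         gluster + ["detach-tier", vol, "start"]),
    ]


if __name__ == "__main__":
    plan = repro_plan(
        "g1",
        ["host1:/bricks/c1", "host2:/bricks/c1",
         "host1:/bricks/c2", "host2:/bricks/c2"],   # hypothetical cold bricks
        ["host1:/bricks/hot", "host2:/bricks/hot"],  # hypothetical hot pair
        "host1", "/mnt/g1", "linux-4.1.6.tar.xz",
    )
    for note, argv in plan:
        print(f"# {note}")
        print(shlex.join(argv))
```

The last two entries have to overlap in time, so the second untar would need to
be started in the background (or from another shell) before the detach-tier
command is issued.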



CRASH
======
[2015-08-29 16:02:01.669901] E [MSGID: 109037] [tier.c:898:tier_start]
0-g1-tier-dht: Demotion failed!
[2015-08-29 16:02:00.311020] I [MSGID: 109038]
[tier.c:350:tier_migrate_using_query_file] 0-g1-tier-dht: Tier 0 src_subvol
g1-hot-dht file .gitignore
[2015-08-29 16:02:00.312280] I [MSGID: 109038]
[tier.c:109:tier_check_same_node] 0-g1-tier-dht: /linux-4.1.6/.gitignore does
not belong to this node
[2015-08-29 16:04:00.698176] I [MSGID: 109038]
[tier.c:574:tier_build_migration_qfile] 0-g1-tier-dht: Failed to remove
/var/run/gluster/demotequeryfile-20559
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-08-29 16:04:00
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.3
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3560c25936]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x32f)[0x3560c4549f]
/lib64/libc.so.6[0x340e8326a0]
/lib64/libc.so.6[0x340e93372f]
/usr/lib64/libgfdb.so.0(gf_sql_query_function+0xdf)[0x7fa812066dcf]
/usr/lib64/libgfdb.so.0(gf_sqlite3_find_unchanged_for_time+0xd5)[0x7fa81206bb05]
/usr/lib64/libgfdb.so.0(find_unchanged_for_time+0x4f)[0x7fa812065f1f]
/usr/lib64/glusterfs/3.7.3/xlator/cluster/tier.so(+0x5410d)[0x7fa81266f10d]
/usr/lib64/libglusterfs.so.0(dict_foreach_match+0x74)[0x3560c1d2d4]
/usr/lib64/libglusterfs.so.0(dict_foreach+0x18)[0x3560c1d388]
/usr/lib64/glusterfs/3.7.3/xlator/cluster/tier.so(+0x55ea7)[0x7fa812670ea7]
/lib64/libpthread.so.0[0x340ec07a51]
/lib64/libc.so.6(clone+0x6d)[0x340e8e89ad]
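
The frames above appear to follow the layout printed by glibc's
backtrace_symbols, i.e. module(symbol+0xoffset)[0xaddress], with the symbol or
the whole parenthesised part missing when it could not be resolved. Below is a
minimal sketch for splitting such lines and picking the innermost frame that
has a symbol name and is not one of the trace-printing helpers. The names
Frame, parse_frame, parse_backtrace, first_named_frame and TRACE_HELPERS are
made up for this example; they are not part of GlusterFS.

```python
import re
from typing import List, NamedTuple, Optional

# module(symbol+0xoff)[0xaddr], module(+0xoff)[0xaddr] or module[0xaddr]
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)

# Frames that only print the trace; hypothetical skip list for this example.
TRACE_HELPERS = {"_gf_msg_backtrace_nomem", "gf_print_trace"}


class Frame(NamedTuple):
    module: str
    symbol: Optional[str]   # None when the symbol was not resolved
    offset: Optional[int]   # None when no (symbol+offset) part is present
    address: int


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one backtrace line; return None for lines that are not frames."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    offset = m.group("offset")
    return Frame(
        module=m.group("module"),
        symbol=m.group("symbol") or None,
        offset=int(offset, 16) if offset else None,
        address=int(m.group("address"), 16),
    )


def parse_backtrace(text: str) -> List[Frame]:
    """Keep only the lines that look like frames, innermost first."""
    frames = (parse_frame(line) for line in text.splitlines())
    return [f for f in frames if f is not None]


def first_named_frame(frames: List[Frame]) -> Optional[Frame]:
    """Innermost frame with a symbol that is not a trace-printing helper."""
    for f in frames:
        if f.symbol and f.symbol not in TRACE_HELPERS:
            return f
    return None


SAMPLE = """\
signal received: 11
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x32f)[0x3560c4549f]
/lib64/libc.so.6[0x340e93372f]
/usr/lib64/libgfdb.so.0(gf_sql_query_function+0xdf)[0x7fa812066dcf]
/usr/lib64/glusterfs/3.7.3/xlator/cluster/tier.so(+0x5410d)[0x7fa81266f10d]
"""

if __name__ == "__main__":
    top = first_named_frame(parse_backtrace(SAMPLE))
    print(top.module, top.symbol, hex(top.offset))
```

Run against the trace in this report, the first named frame outside the helpers
is gf_sql_query_function in libgfdb.so.0, reached from the tier xlator. With
matching debuginfo installed, unnamed offsets such as tier.so +0x5410d can
usually be resolved with addr2line -e <module> <offset>.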


