[Bugs] [Bug 1313315] New: [HC] glusterfs mount crashed

bugzilla at redhat.com bugzilla at redhat.com
Tue Mar 1 11:33:22 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1313315

            Bug ID: 1313315
           Summary: [HC] glusterfs mount crashed
           Product: GlusterFS
           Version: 3.7.9
         Component: sharding
          Keywords: Triaged
          Assignee: kdhananj at redhat.com
          Reporter: kdhananj at redhat.com
        QA Contact: bugs at gluster.org
                CC: annair at redhat.com, bugs at gluster.org,
                    kdhananj at redhat.com, knarra at redhat.com
        Depends On: 1313290, 1313293



+++ This bug was initially created as a clone of Bug #1313293 +++

+++ This bug was initially created as a clone of Bug #1313290 +++

Description of problem:
In an HC setup with three nodes, when the first host loses its network
connectivity and the node later comes back up, the glusterfs mount for the
engine domain is no longer present and the mount process has crashed.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install HC.
2. Once the engine VM is up and running, add all the other hosts to the
engine.
3. Make sure that none of the interfaces on the system where the engine is
currently running has an IP address.
4. Log in to that machine and run dhclient while the system has no IP
attached.
5. Once the system is back up, log in to it and try to connect to the VM.

Actual results:
The user is unable to connect to the VM, the engine volume is no longer
mounted, and there is a core dump on the system.

Expected results:
The user should be able to connect to the VM and should not see any crashes.

--- Additional comment from RamaKasturi on 2016-03-01 05:39:16 EST ---

Backtrace from the system:
==========================
(gdb) bt
#0  shard_fsync_cbk (frame=frame@entry=0x7fe701c2facc, cookie=0x7fe701c10ba0, this=0x7fe6f800d1c0, op_ret=op_ret@entry=-1, op_errno=op_errno@entry=107, prebuf=prebuf@entry=0x0, postbuf=postbuf@entry=0x0, xdata=xdata@entry=0x0) at shard.c:3884
#1  0x00007fe6fc4e915f in dht_fsync_cbk (frame=0x7fe701c10ba0, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=107, prebuf=0x0, postbuf=0x0, xdata=0x0) at dht-inode-read.c:861
#2  0x00007fe6fc74dbe1 in afr_fsync (frame=0x7fe701c33540, this=<optimized out>, fd=0x7fe6f80b9dcc, datasync=1, xdata=0x0) at afr-common.c:2969
#3  0x00007fe6fc4ebb19 in dht_fsync (frame=0x7fe701c10ba0, this=<optimized out>, fd=0x7fe6f80b9dcc, datasync=1, xdata=0x0) at dht-inode-read.c:930
#4  0x00007fe6fc2814d5 in shard_fsync (frame=0x7fe701c2facc, this=0x7fe6f800d1c0, fd=0x7fe6f80b9dcc, datasync=1, xdata=0x0) at shard.c:3894
#5  0x00007fe6fc070935 in wb_fsync_helper (frame=0x7fe701c3d074, this=0x7fe6f800e630, fd=0x7fe6f80b9dcc, datasync=1, xdata=0x0) at write-behind.c:1760
#6  0x00007fe70412b17d in call_resume (stub=0x7fe7016daf94) at call-stub.c:2576
#7  0x00007fe6fc073f29 in wb_do_winds (wb_inode=wb_inode@entry=0x7fe6e8041d20, tasks=tasks@entry=0x7fe6f5dcf990) at write-behind.c:1460
#8  0x00007fe6fc074037 in wb_process_queue (wb_inode=wb_inode@entry=0x7fe6e8041d20) at write-behind.c:1495
#9  0x00007fe6fc074c28 in wb_fsync (frame=0x7fe701c3d074, this=0x7fe6f800e630, fd=0x7fe6f80b9dcc, datasync=1, xdata=0x0) at write-behind.c:1785
#10 0x00007fe7040ff4cd in default_fsync (frame=0x7fe701c3d074, this=0x7fe6f800fa00, fd=0x7fe6f80b9dcc, flags=1, xdata=0x0) at defaults.c:1818
#11 0x00007fe70410b8d5 in default_fsync_resume (frame=0x7fe701c30aec, this=0x7fe6f8010dd0, fd=0x7fe6f80b9dcc, flags=1, xdata=0x0) at defaults.c:1377
#12 0x00007fe70412b17d in call_resume (stub=0x7fe701722a6c) at call-stub.c:2576
#13 0x00007fe6f7bf5648 in open_and_resume (this=this@entry=0x7fe6f8010dd0, fd=fd@entry=0x7fe6f80b9dcc, stub=0x7fe701722a6c) at open-behind.c:242
#14 0x00007fe6f7bf5a62 in ob_fsync (frame=0x7fe701c30aec, this=0x7fe6f8010dd0, fd=0x7fe6f80b9dcc, flag=<optimized out>, xdata=<optimized out>) at open-behind.c:499
#15 0x00007fe6f79dad20 in io_stats_fsync (frame=0x7fe701c3b238, this=0x7fe6f8012180, fd=0x7fe6f80b9dcc, flags=1, xdata=0x0) at io-stats.c:2207
#16 0x00007fe7040ff4cd in default_fsync (frame=0x7fe701c3b238, this=0x7fe6f8013660, fd=0x7fe6f80b9dcc, flags=1, xdata=0x0) at defaults.c:1818
#17 0x00007fe6f77c538b in meta_fsync (frame=0x7fe701c3b238, this=0x7fe6f8013660, fd=0x7fe6f80b9dcc, flags=1, xdata=0x0) at meta.c:176
#18 0x00007fe7012b1697 in fuse_fsync_resume (state=0x7fe6e8046590) at fuse-bridge.c:2489
#19 0x00007fe7012a8ec5 in fuse_resolve_done (state=<optimized out>) at fuse-resolve.c:665
#20 fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:692
#21 0x00007fe7012a8c08 in fuse_resolve (state=0x7fe6e8046590) at fuse-resolve.c:656
#22 0x00007fe7012a8f0e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:688
#23 0x00007fe7012a8373 in fuse_resolve_continue (state=state@entry=0x7fe6e8046590) at fuse-resolve.c:708
#24 0x00007fe7012a8ba8 in fuse_resolve_fd (state=0x7fe6e8046590) at fuse-resolve.c:568
#25 fuse_resolve (state=0x7fe6e8046590) at fuse-resolve.c:645
#26 0x00007fe7012a8eee in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:681
#27 0x00007fe7012a8f30 in fuse_resolve_and_resume (state=0x7fe6e8046590, fn=0x7fe7012b14a0 <fuse_fsync_resume>) at fuse-resolve.c:720
#28 0x00007fe7012bbcde in fuse_thread_proc (data=0x7fe705bceac0) at fuse-bridge.c:4944
#29 0x00007fe702f63dc5 in start_thread (arg=0x7fe6f5dd0700) at pthread_create.c:308
#30 0x00007fe7028aa28d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
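
The key detail is in frame #0: shard_fsync_cbk() is entered with op_ret=-1
and op_errno=107 (ENOTCONN), and with prebuf and postbuf both NULL, which is
what the disconnect produces. A minimal sketch of the crash shape, using
hypothetical stand-in types rather than the actual shard.c code, assuming the
callback copies the post-op iatt without checking op_ret first:

/* Illustrative sketch only (not the real shard_fsync_cbk): on disconnect
 * the fsync call unwinds with op_ret = -1, op_errno = 107 (ENOTCONN) and
 * NULL prebuf/postbuf; a callback that copies the post-op iatt without
 * checking op_ret first dereferences NULL and the mount dumps core. */
#include <string.h>

struct iatt_sketch  { unsigned long long ia_size; };  /* stand-in for struct iatt     */
struct local_sketch { struct iatt_sketch postbuf; };  /* stand-in for the fop's local */

static int
fsync_cbk_sketch (struct local_sketch *local, int op_ret, int op_errno,
                  struct iatt_sketch *prebuf, struct iatt_sketch *postbuf)
{
        (void) op_errno; (void) prebuf;
        /* BUG: postbuf is NULL whenever op_ret < 0 */
        memcpy (&local->postbuf, postbuf, sizeof (*postbuf));
        return op_ret;
}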

--- Additional comment from Vijay Bellur on 2016-03-01 06:32:08 EST ---

REVIEW: http://review.gluster.org/13562 (features/shard: Fix NULL-dereference
when fsync fails) posted (#1) for review on master by Krutika Dhananjay
(kdhananj at redhat.com)
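
Judging from the patch title ("Fix NULL-dereference when fsync fails"), the
fix presumably takes the error path before touching prebuf/postbuf. A minimal
sketch of that shape, again with hypothetical stand-in names rather than the
real shard_local_t fields:

#include <string.h>

struct iatt_sketch  { unsigned long long ia_size; };
struct local_sketch {
        int                op_ret;
        int                op_errno;
        struct iatt_sketch postbuf;
};

static int
fsync_cbk_sketch_fixed (struct local_sketch *local, int op_ret, int op_errno,
                        struct iatt_sketch *prebuf, struct iatt_sketch *postbuf)
{
        (void) prebuf;

        if (op_ret < 0) {
                /* record the failure and unwind without touching the NULL iatts */
                local->op_ret   = op_ret;
                local->op_errno = op_errno;
                goto unwind;
        }

        memcpy (&local->postbuf, postbuf, sizeof (*postbuf));

unwind:
        return 0;
}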


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1313290
[Bug 1313290] [HC] glusterfs mount crashed
https://bugzilla.redhat.com/show_bug.cgi?id=1313293
[Bug 1313293] [HC] glusterfs mount crashed
-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are on the CC list for the bug.

