[Bugs] [Bug 1412119] New: ganesha service crashed on all nodes of ganesha cluster on disperse volume when doing lookup while copying files remotely using scp

Wed Jan 11 10:01:31 UTC 2017

https://bugzilla.redhat.com/show_bug.cgi?id=1412119

            Bug ID: 1412119
           Summary: ganesha service crashed on all nodes of ganesha
                    cluster on disperse volume when doing lookup while
                    copying files remotely using scp
           Product: GlusterFS
           Version: 3.8
         Component: distribute
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: nbalacha at redhat.com
                CC: amukherj at redhat.com, aspandey at redhat.com,
                    bugs at gluster.org, jthottan at redhat.com,
                    nbalacha at redhat.com, nchilaka at redhat.com,
                    pkarampu at redhat.com, rhinduja at redhat.com,
                    rhs-bugs at redhat.com, skoduri at redhat.com,
                    storage-qa-internal at redhat.com, tdesala at redhat.com
        Depends On: 1410777
            Blocks: 1410406



+++ This bug was initially created as a clone of Bug #1410777 +++

+++ This bug was initially created as a clone of Bug #1410406 +++

Description of problem:
==========================
The ganesha service crashed on all the nodes of ganesha, making the volume 
unavailable
I mounted the volume on two different clients with below mapping
GlusterNode1:GaneshaNode1:client1 mounted on this
GlusterNode2:GaneshaNode2:client1 mounted on this
GlusterNode3:GaneshaNode3:client1 mounted on this
GlusterNode4:
GlusterNode5:
GlusterNode6:GaneshaNode4:client2 mounted on this

I was doing an scp of a media file(size of 1.1GB) from my laptop to the client1
 in a loop of 100 as below
 scp -r gluster.mp4 root at client45:/mnt/disperse/newdir/dir45/file.$i;done

I did a lookup from client2 which displays o/p immediately

THen i did a ls on the client 1 mount point (to check the scp status ie how
many files have been copied)

At this point the ls hung(i am not able to abort  the command on terminal using
ctrl+c)and the scp also didn't proceed
I saw in ganesha-gfapi logs as input/output error

ganesha on all nodes crashed



Note: I had already hit it once (yesterday)(post a healing while scp and ls was
going on), when i did similar thing, but that time all 1,2,3 VIPs failedover to
4th VIP with 1,2,3 crashing(but couldn't get the core as abrt service was not
enabled)




Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /lib64/libntirpc.so.1.4
Try: yum --enablerepo='*debug*' install
/usr/lib/debug/.build-id/43/bbc3c15781ba5ecae854444efe636ba5348c11.debug
Core was generated by `/usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
/etc/ganesha/ganesha.conf -N N'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f631166e72e in dht_fsync_cbk (frame=0x7f631ae3884c, 
    cookie=<optimized out>, this=0x7f6338019250, op_ret=-1, op_errno=5, 
    prebuf=0x0, postbuf=0x0, xdata=0x0) at dht-inode-read.c:843
843            if (IS_DHT_MIGRATION_PHASE1 (postbuf)) {
Missing separate debuginfos, use: debuginfo-install
dbus-libs-1.6.12-17.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64
gssproxy-0.4.1-13.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
krb5-libs-1.14.1-27.el7_3.x86_64 libacl-2.2.51-12.el7.x86_64
libattr-2.4.46-12.el7.x86_64 libblkid-2.23.2-33.el7.x86_64
libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64
libgcc-4.8.5-11.el7.x86_64 libnfsidmap-0.25-15.el7.x86_64
libselinux-2.5-6.el7.x86_64 libuuid-2.23.2-33.el7.x86_64
openssl-libs-1.0.1e-60.el7.x86_64 pcre-8.32-15.el7_2.1.x86_64
sssd-client-1.14.0-43.el7_3.4.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007f631166e72e in dht_fsync_cbk (frame=0x7f631ae3884c, 
    cookie=<optimized out>, this=0x7f6338019250, op_ret=-1, op_errno=5, 
    prebuf=0x0, postbuf=0x0, xdata=0x0) at dht-inode-read.c:843
#1  0x00007f639c167286 in default_fsync_cbk (frame=0x7f631ae26674, 
    cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=5, 
    prebuf=0x0, postbuf=0x0, xdata=0x0) at defaults.c:1508
#2  0x00007f63118a5d8a in ec_manager_fsync (fop=0x7f630f35f660, 
    state=<optimized out>) at ec-generic.c:355
#3  0x00007f631189f79b in __ec_manager (fop=0x7f630f35f660, error=5)
    at ec-common.c:2287
#4  0x00007f631189f978 in ec_resume (fop=0x7f630f35f660, error=0)
    at ec-common.c:289
#5  0x00007f631189fa9f in ec_complete (fop=0x7f630f35f660) at ec-common.c:362
#6  0x00007f63118a37ef in ec_fsync_cbk (frame=<optimized out>, cookie=0x2, 
    this=0x7f6338017f90, op_ret=0, op_errno=0, prebuf=0x7f6312be2000, 
    postbuf=0x7f6312be2070, xdata=0x0) at ec-generic.c:279
#7  0x00007f6311b13f64 in client3_3_fsync_cbk (req=<optimized out>, 
    iov=<optimized out>, count=<optimized out>, myframe=0x7f631ae0246c)
    at client-rpc-fops.c:976
#8  0x00007f631fdf36e0 in rpc_clnt_handle_reply (
    clnt=clnt at entry=0x7f63381abd90, pollin=pollin at entry=0x7f63383152c0)
    at rpc-clnt.c:791
#9  0x00007f631fdf39cf in rpc_clnt_notify (trans=<optimized out>, 
    mydata=0x7f63381abde8, event=<optimized out>, data=0x7f63383152c0)
    at rpc-clnt.c:971
#10 0x00007f631fdef893 in rpc_transport_notify (
    this=this at entry=0x7f63381bbaf0, 
    event=event at entry=RPC_TRANSPORT_MSG_RECEIVED, 
    data=data at entry=0x7f63383152c0) at rpc-transport.c:538
#11 0x00007f63121d12d4 in socket_event_poll_in (
    this=this at entry=0x7f63381bbaf0) at socket.c:2267
#12 0x00007f63121d3785 in socket_event_handler (fd=<optimized out>, idx=12, 
    data=0x7f63381bbaf0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2397
#13 0x00007f639c13f650 in event_dispatch_epoll_handler (event=0x7f6312be2540, 
    event_pool=0x7f6398086b00) at event-epoll.c:571
#14 event_dispatch_epoll_worker (data=0x7f633c000dc0) at event-epoll.c:674
#15 0x00007f64287a5dc5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f6427e7473d in clone () from /lib64/libc.so.6
(gdb) f 0
#0  0x00007f631166e72e in dht_fsync_cbk (frame=0x7f631ae3884c, 
    cookie=<optimized out>, this=0x7f6338019250, op_ret=-1, op_errno=5, 
    prebuf=0x0, postbuf=0x0, xdata=0x0) at dht-inode-read.c:843
843            if (IS_DHT_MIGRATION_PHASE1 (postbuf)) {



Version-Release number of selected component (if applicable):
==============================

[root at dhcp35-37 ccpp-2017-01-05-17:12:21-27585]# rpm -qa|grep gluster
[root at dhcp35-37 ccpp-2017-01-05-17:12:21-27585]# rpm -qa|grep gluster
glusterfs-geo-replication-3.8.4-10.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-3.el7rhgs.x86_64
glusterfs-3.8.4-10.el7rhgs.x86_64
python-gluster-3.8.4-10.el7rhgs.noarch
glusterfs-cli-3.8.4-10.el7rhgs.x86_64
glusterfs-rdma-3.8.4-10.el7rhgs.x86_64
glusterfs-api-3.8.4-10.el7rhgs.x86_64
glusterfs-server-3.8.4-10.el7rhgs.x86_64
glusterfs-api-devel-3.8.4-10.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-10.el7rhgs.x86_64
glusterfs-libs-3.8.4-10.el7rhgs.x86_64
glusterfs-devel-3.8.4-10.el7rhgs.x86_64
glusterfs-fuse-3.8.4-10.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-10.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-10.el7rhgs.x86_64
glusterfs-events-3.8.4-10.el7rhgs.x86_64
[root at dhcp35-37 ccpp-2017-01-05-17:12:21-27585]# 
[root at dhcp35-37 ccpp-2017-01-05-17:12:21-27585]# 
[root at dhcp35-37 ccpp-2017-01-05-17:12:21-27585]# rpm -qa|grep pace
pacemaker-cli-1.1.15-11.el7.x86_64
pacemaker-libs-1.1.15-11.el7.x86_64
pacemaker-1.1.15-11.el7.x86_64
pacemaker-cluster-libs-1.1.15-11.el7.x86_64
userspace-rcu-0.7.9-2.el7rhgs.x86_64
[root at dhcp35-37 ccpp-2017-01-05-17:12:21-27585]# rpm -qa|grep pcs
pcs-0.9.152-10.el7.x86_64
[root at dhcp35-37 ccpp-2017-01-05-17:12:21-27585]# rpm -qa|grep coro
corosync-2.4.0-4.el7.x86_64
corosynclib-2.4.0-4.el7.x86_64
[root at dhcp35-37 ccpp-2017-01-05-17:12:21-27585]# rpm -qa|grep ganesha
nfs-ganesha-gluster-2.4.1-3.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.4.1-3.el7rhgs.x86_64
nfs-ganesha-2.4.1-3.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-10.el7rhgs.x86_64



cores and log locations will be shared soon


--- Additional comment from Pranith Kumar K on 2017-01-05 08:34:38 EST ---

After:

pk at dhcp35-190 - ~/workspace/gerrit-repo (ec-link)
19:01:46 :) ⚡ git show a9ccd0c8
commit a9ccd0c8ea6989c72073028b296f73a6fcb6b896
Author: Mohammed Rafi KC <rkavunga at redhat.com>
Date:   Fri Apr 22 12:07:31 2016 +0530

    tier/dht: check for rebalance completion for EIO error

    When an ongoing rebalance completion check task been
    triggered by dht, there is a possibility of a race
    between afr setting subvol as non-readable and dht updates
    the cached subvol. In this window a write can fail with EIO.

    Change-Id: I42638e6d4104c0dbe893d1bc73e1366188458c5d
    BUG: 1329503
    Signed-off-by: Mohammed Rafi KC <rkavunga at redhat.com>
    Reviewed-on: http://review.gluster.org/14049
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    Smoke: Gluster Build System <jenkins at build.gluster.com>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: N Balachandran <nbalacha at redhat.com>
    Reviewed-by: Jeff Darcy <jdarcy at redhat.com>

diff --git a/xlators/cluster/dht/src/dht-common.h
b/xlators/cluster/dht/src/dht-common.h
index ad86893..1dfc5cd 100644
--- a/xlators/cluster/dht/src/dht-common.h
+++ b/xlators/cluster/dht/src/dht-common.h
@@ -613,7 +613,9 @@ typedef struct dht_fd_ctx {
                 }                                               \
         } while (0)

-#define dht_inode_missing(op_errno) (op_errno == ENOENT || op_errno == ESTALE)
+#define dht_inode_missing(op_errno) (op_errno == ENOENT || op_errno == ESTALE
\
+                                     || op_errno == EIO) \
+/*Bad fix. Please revert the commit after fixing the bug 1329505*/

In this case EIO came with postbuf/prebuf to be NULL so it is NULL dereference
in dht. Changing component.

--- Additional comment from Jiffin on 2017-01-05 08:52:48 EST ---

Initial RCA :
There were EIO error send from 10.70.35.37:rhs-brick3-disperse and
10.70.35.135:rhs-brick3-disperse for the fsync operation
                op_ret = sys_fsync (_fd);
                if (op_ret == -1) {
                        op_errno = errno;
                        gf_msg (this->name, GF_LOG_ERROR, errno,
                                P_MSG_FSYNC_FAILED, "fsync on fd=%p "
                                "failed", fd); 
                        goto out; 
                }
        }

        op_ret = posix_fdstat (this, _fd, &postop);
        if (op_ret == -1) {
logs from server :
[2017-01-05 11:42:13.791238] E [MSGID: 113044] [posix.c:3664:posix_fsync]
0-disperse-posix: fsync on fd=0x7fac3fb638a0 failed [Input/output error]
[2017-01-05 11:42:13.791498] E [MSGID: 115066]
[server-rpc-fops.c:1311:server_fsync_cbk] 0-disperse-server: 25709: FSYNC 0
(77c421dd-edf5-463e-9768-3de44c35ca9d) ==> (Input/output error) [Input/output
error]

In above scenario posix pass "postop" as 0x0.
At the client, currently dht_fsync_cbk() ignoring EIO error to fix the
bz1329505

code sni
       if (op_ret == -1 && !dht_inode_missing(op_errno)) {
                gf_msg_debug (this->name, op_errno,
                              "subvolume %s returned -1",
                              prev->this->name);
                goto out;
        }
dht_inode_missing
--------------------
#define dht_inode_missing(op_errno) (op_errno == ENOENT || op_errno == ESTALE \
                                     || op_errno == EIO) \
/*Bad fix. Please revert the commit after fixing the bug 1329505*/

This case client received postbuf as 0x0 and when dht_fsync_cbk() access the
postbuf it crashes.

Requesting ec team and dht team to have look in the bug

--- Additional comment from Nithya Balachandran on 2017-01-06 06:32:55 EST ---

This is a bug in the dht_fsync_cbk handling. I will send a patch today.

--- Additional comment from Worker Ant on 2017-01-06 10:46:29 EST ---

REVIEW: http://review.gluster.org/16350 (cluster/dht: Incorrect migration
checks in fsync) posted (#1) for review on master by N Balachandran
(nbalacha at redhat.com)

--- Additional comment from Worker Ant on 2017-01-08 23:33:23 EST ---

COMMIT: http://review.gluster.org/16350 committed in master by Raghavendra G
(rgowdapp at redhat.com) 
------
commit 83117c71482c9f87a0b6812094cbb22497eb3faa
Author: N Balachandran <nbalacha at redhat.com>
Date:   Fri Jan 6 21:06:23 2017 +0530

    cluster/dht: Incorrect migration checks in fsync

    Fixed the order of the migration phase checks
    in dht_fsync_cbk. Phase1 should never be hit if op_ret
    is non zero.
    Signed-off-by: N Balachandran <nbalacha at redhat.com>

    Change-Id: I9222692e04868bffa93498059440f0aa553c83ec
    BUG: 1410777
    Reviewed-on: http://review.gluster.org/16350
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana at redhat.com>
    Tested-by: Raghavendra G <rgowdapp at redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Raghavendra G <rgowdapp at redhat.com>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1410406
[Bug 1410406] ganesha service crashed on all nodes of ganesha cluster on
disperse volume when doing lookup while copying files remotely using scp
https://bugzilla.redhat.com/show_bug.cgi?id=1410777
[Bug 1410777] ganesha service crashed on all nodes of ganesha cluster on
disperse volume when doing lookup while copying files remotely using scp
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.