[Bugs] [Bug 1627610] glusterd crash in regression build

bugzilla at redhat.com bugzilla at redhat.com
Tue Sep 11 03:06:41 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1627610

Sanju <srakonde at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|bugs at gluster.org            |srakonde at redhat.com



--- Comment #1 from Sanju <srakonde at redhat.com> ---
Root Cause:
From Thread 7:
#10 0x00007f50dd2801f9 in glusterd_store_volinfo (volinfo=0x902290,
ac=GLUSTERD_VOLINFO_VER_AC_NONE)
at
/home/jenkins/root/workspace/regression-test-burn-in/xlators/mgmt/glusterd/src/glusterd-store.c:1806

From Thread 1:
#10 0x00007f50dd2801f9 in glusterd_store_volinfo (volinfo=0x902290,
ac=GLUSTERD_VOLINFO_VER_AC_NONE)
    at
/home/jenkins/root/workspace/regression-test-burn-in/xlators/mgmt/glusterd/src/glusterd-store.c:1806

From the above snippets of the "t a a bt" (thread apply all bt) output, we
can see that Thread 7 and Thread 1 are pointing to the same volinfo
structure (0x902290).

Source code for glusterd_store_volinfo_write:
int32_t
glusterd_store_volinfo_write (int fd, glusterd_volinfo_t *volinfo)
{
        int32_t                         ret = -1;
        gf_store_handle_t              *shandle = NULL;
        GF_ASSERT (fd > 0);
        GF_ASSERT (volinfo);
        GF_ASSERT (volinfo->shandle);

        shandle = volinfo->shandle;
        ret = glusterd_volume_exclude_options_write (fd, volinfo);
        if (ret)
                goto out;

        shandle->fd = fd;       /* publish this thread's fd in the shared handle */
        dict_foreach (volinfo->dict, _storeopts, shandle);

        dict_foreach (volinfo->gsync_slaves, _storeslaves, shandle);
        shandle->fd = 0;        /* reset to 0 -- racy if another thread is mid-write */
out:
        gf_msg_debug (THIS->name, 0, "Returning %d", ret);
        return ret;
}

At Thread 1,
#8  0x00007f50dd27e211 in glusterd_store_volinfo_write (fd=8, volinfo=0x902290)
    at
/home/jenkins/root/workspace/regression-test-burn-in/xlators/mgmt/glusterd/src/glusterd-store.c:1157

glusterd_store_volinfo_write calls _storeopts (via dict_foreach), which in
turn calls gf_store_save_value. _storeopts also has an assertion check for
fd > 0. In glusterd_store_volinfo_write, the fd value is 8.
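
For reference, here is a sketch of the _storeopts callback shape,
reconstructed from the description above (not the exact source; it assumes
the usual dict_foreach callback signature from libglusterfs and the
glusterd-store.c context for the types):

/* Sketch only: reconstructed from the analysis above. The shared
 * shandle->fd is read here and handed to gf_store_save_value after the
 * fd > 0 assertion. */
static int
_storeopts (dict_t *dict, char *key, data_t *value, void *data)
{
        gf_store_handle_t       *shandle = data;

        GF_ASSERT (shandle->fd > 0);    /* the fd > 0 check mentioned above */
        return gf_store_save_value (shandle->fd, key, value->data);
}

Yet in Thread 1, frame #4 shows gf_store_save_value receiving fd=0: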

#4  0x00007f50e882b341 in gf_store_save_value (fd=0, key=0x91bff0
"performance.client-io-threads", 
    value=0x8bcc40 "off")
    at
/home/jenkins/root/workspace/regression-test-burn-in/libglusterfs/src/store.c:344
From the above, we can see that the fd value is 0.

At Thread 7, 
#8  0x00007f50dd27edbf in glusterd_store_brickinfos (volinfo=0x902290,
vol_fd=16)
    at
/home/jenkins/root/workspace/regression-test-burn-in/xlators/mgmt/glusterd/src/glusterd-store.c:1373
#9  0x00007f50dd27fa35 in glusterd_store_perform_volume_store
(volinfo=0x902290)
    at
/home/jenkins/root/workspace/regression-test-burn-in/xlators/mgmt/glusterd/src/glusterd-store.c:1613
#10 0x00007f50dd2801f9 in glusterd_store_volinfo (volinfo=0x902290,
ac=GLUSTERD_VOLINFO_VER_AC_NONE)
    at
/home/jenkins/root/workspace/regression-test-burn-in/xlators/mgmt/glusterd/src/glusterd-store.c:1806
#11 0x00007f50dd258a76 in glusterd_restart_bricks (opaque=0x0)
    at
/home/jenkins/root/workspace/regression-test-burn-in/xlators/mgmt/glusterd/src/glusterd-utils.c:6422
#12 0x00007f50e883111e in synctask_wrap ()
    at
/home/jenkins/root/workspace/regression-test-burn-in/libglusterfs/src/syncop.c:375
#13 0x00007f50e6e42030 in ?? () from ./lib64/libc.so.6
#14 0x0000000000000000 in ?? ()

In this stack, we can see glusterd_store_perform_volume_store calling
glusterd_store_brickinfos. Before calling glusterd_store_brickinfos,
glusterd_store_perform_volume_store calls glusterd_store_volinfo_write,
which resets shandle->fd to 0 on its way out (see the source above).

So, Thread 7 updated the fd value to 0, whereas Thread 1 is expecting
fd > 0. This happens because we have a separate synctask for
glusterd_restart_bricks; glusterd_restart_bricks is visible in the Thread 7
backtrace (frame #11).
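
The race pattern can be reproduced in isolation. Below is a minimal
standalone sketch (plain pthreads, not gluster code) in which two threads
run the same store routine on one shared handle, mirroring Thread 1 and the
glusterd_restart_bricks synctask both entering glusterd_store_volinfo for
the same volinfo:

#include <assert.h>
#include <pthread.h>
#include <stdio.h>

struct handle { int fd; };
static struct handle shandle;            /* shared, like volinfo->shandle */

static void save_value (int fd)          /* stands in for gf_store_save_value */
{
        assert (fd > 0);                 /* fires when the other thread reset fd */
}

static void *store (void *arg)           /* stands in for glusterd_store_volinfo_write */
{
        int fd = (int)(long)arg;
        for (int i = 0; i < 100000; i++) {
                shandle.fd = fd;         /* shandle->fd = fd; */
                save_value (shandle.fd); /* dict_foreach (..., _storeopts, shandle); */
                shandle.fd = 0;          /* shandle->fd = 0;  */
        }
        return NULL;
}

int main (void)
{
        pthread_t t1, t7;
        pthread_create (&t1, NULL, store, (void *)8L);   /* Thread 1, fd=8  */
        pthread_create (&t7, NULL, store, (void *)16L);  /* Thread 7, fd=16 */
        pthread_join (t1, NULL);
        pthread_join (t7, NULL);
        printf ("finished without tripping the assert this run\n");
        return 0;
}

Compiled with -pthread, the assert is expected to fire within a few
iterations, matching the crash signature above.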

The solution for this could be acquiring a lock before writing in the
critical section. Need to explore the solution further.
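
As one illustration, the write section of glusterd_store_volinfo_write
could be serialized with a per-volinfo lock. A minimal sketch, assuming a
hypothetical pthread_mutex_t store_lock field added to glusterd_volinfo_t
(an illustration only, not an actual patch):

        pthread_mutex_lock (&volinfo->store_lock);  /* hypothetical field */
        {
                shandle->fd = fd;
                dict_foreach (volinfo->dict, _storeopts, shandle);

                dict_foreach (volinfo->gsync_slaves, _storeslaves, shandle);
                shandle->fd = 0;
        }
        pthread_mutex_unlock (&volinfo->store_lock);

This only closes the window inside glusterd_store_volinfo_write; a complete
fix would also have to cover every other writer of volinfo->shandle->fd,
which is the part that still needs to be explored.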

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

