[Gluster-devel] Stale state->fd->inode and race condition with fd_destroy()

Anand Avati anand.avati at gmail.com
Fri Jul 8 13:03:53 UTC 2011


We haven't come across this issue so far. Can you post the complete
backtrace from your debugger?

Avati

On Sun, Jul 3, 2011 at 1:21 PM, Emmanuel Dreyfus <manu at netbsd.org> wrote:

> Hi
>
> I get a reproducible crash of glusterfsd, running 3.2.1 code. It happens
> when I run multiple tar -xzf on a client: after a while, a glusterfsd
> on a brick crashes:
>
> Program terminated with signal 11, Segmentation fault.
> #0  0xba0d652e in server_rchecksum_cbk (frame=0xbad007d0,
>    cookie=0xbaf00300, this=0xba810000, op_ret=-1, op_errno=9,
>    weak_checksum=0, strong_checksum=0xb91ffc74 "") at
>    server3_1-fops.c:1305
>
> Here is the offending code
>
>        if (op_ret == -1)
>                gf_log (this->name, GF_LOG_INFO,
>                        "%"PRId64": RCHECKSUM %"PRId64" (%"PRId64") "
>                        "==> %"PRId32" (%s)",
>                        frame->root->unique, state->resolve.fd_no,
>                        state->fd ? state->fd->inode->ino : 0, op_ret,
>                        strerror (op_errno));
>
> The problem is state->fd->inode value:
>
> (gdb) print *((server_state_t *)frame->root->state)->fd
> $7 = {pid = 2610, flags = 2, refcount = 2, inode_list =
>        {next = 0xb9801088, prev = 0xb9801088}, inode = 0xaaaaaaaa,
>        lock = {pts_magic = 3735879687, pts_spin = 0 '\0', pts_flags =
>        0}, _ctx = 0xbb96b080, xl_count = 8}
>
> inode = 0xaaaaaaaa is set in fd_destroy() to denote a stale object (It
> is less fun than using 0xdeadbeef :-)
>
> That suggests a race condition where a thread uses an fd that another
> thread destroyed. Of course, the value could be checked at the beginning
> of server_rchecksum_cbk(), but I suspect the problem is more widespread
> than this. There are many other places in server3_1-fops.c where
> state->fd->inode->ino is used.
>
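
For illustration only, the usual way to keep one thread from destroying an
object that another thread still uses is to hold a reference for every
in-flight operation, so that the destroy path runs only when the last
holder lets go. Below is a minimal, generic sketch of that idea using C11
atomics. Everything named demo_* or DEMO_* is hypothetical and is not the
GlusterFS fd_t or its fd_ref()/fd_unref() implementation; the sentinel
value is simply the one seen in the gdb output, and the "destroy" step
only poisons the object so that the result can be inspected.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for fd_t; not the GlusterFS definition. */
typedef struct demo_fd {
        atomic_int refcount;   /* holders, including in-flight requests */
        void      *inode;      /* poisoned when the object is destroyed */
        int        destroyed;  /* demo-only flag for inspecting the result */
} demo_fd_t;

/* Assumed sentinel, taken from the gdb output above. */
#define DEMO_STALE ((void *)(uintptr_t)0xaaaaaaaaU)

static void demo_fd_init(demo_fd_t *fd, void *inode)
{
        atomic_init(&fd->refcount, 1);   /* the opener's reference */
        fd->inode = inode;
        fd->destroyed = 0;
}

/* Take a reference before handing the fd to an asynchronous operation. */
static demo_fd_t *demo_fd_ref(demo_fd_t *fd)
{
        atomic_fetch_add(&fd->refcount, 1);
        return fd;
}

/* Drop a reference; only the last holder runs the destroy step. */
static void demo_fd_unref(demo_fd_t *fd)
{
        /* atomic_fetch_sub returns the value held before the decrement. */
        if (atomic_fetch_sub(&fd->refcount, 1) == 1) {
                fd->inode = DEMO_STALE;
                fd->destroyed = 1;   /* a real implementation would free here */
        }
}
```

With this pattern, a callback that obtained its pointer through
demo_fd_ref() can read fd->inode safely until it calls demo_fd_unref()
itself. Whether a missing reference is what happens here is not
established by the report above.
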
> And should the value be checked at the beginning of
> server_rchecksum_cbk() and its friends, or in any gf_log() call, like
> this:
>        if (op_ret == -1)
>                gf_log (this->name, GF_LOG_INFO,
>                        "%"PRId64": RCHECKSUM %"PRId64" (%"PRId64") "
>                        "==> %"PRId32" (%s)",
>                        frame->root->unique, state->resolve.fd_no,
>                        (state->fd &&
>                         state->fd->inode != (inode_t *)0xaaaaaaaa) ?
>                        state->fd->inode->ino : 0, op_ret,
>                        strerror (op_errno));
>
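
If the check ends up in the logging calls, one option is to put the
sentinel comparison into a single helper instead of repeating it at every
call site. A minimal sketch follows, assuming stand-in types: sketch_fd_t,
sketch_inode_t, sketch_fd_ino(), sketch_format_rchecksum() and
SKETCH_STALE_INODE are hypothetical names, not GlusterFS identifiers, and
snprintf() stands in for gf_log(). Note that this only hides the symptom
in the log path: reading a field of an fd that has already been freed is
still undefined behaviour, so the race itself remains.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for inode_t and fd_t. */
typedef struct sketch_inode { uint64_t ino; } sketch_inode_t;
typedef struct sketch_fd { sketch_inode_t *inode; } sketch_fd_t;

/* Assumed sentinel, taken from the gdb output above. */
#define SKETCH_STALE_INODE ((sketch_inode_t *)(uintptr_t)0xaaaaaaaaU)

/* Return the inode number, or 0 if the fd is missing or looks stale. */
static uint64_t sketch_fd_ino(const sketch_fd_t *fd)
{
        if (fd == NULL || fd->inode == NULL ||
            fd->inode == SKETCH_STALE_INODE)
                return 0;
        return fd->inode->ino;
}

/* Build a message shaped like the RCHECKSUM log line into buf. */
static int sketch_format_rchecksum(char *buf, size_t len, int64_t unique,
                                   int64_t fd_no, const sketch_fd_t *fd,
                                   int32_t op_ret, const char *err)
{
        return snprintf(buf, len,
                        "%" PRId64 ": RCHECKSUM %" PRId64 " (%" PRIu64 ") "
                        "==> %" PRId32 " (%s)",
                        unique, fd_no, sketch_fd_ino(fd), op_ret, err);
}
```

Each log call would then pass sketch_fd_ino(state->fd) rather than
dereferencing state->fd->inode directly.
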
> FWIW this is a 2x2 replicated and distributed setup.
>
> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> manu at netbsd.org