[Bugs] [Bug 1422788] [Replicate] "RPC call decoding failed" leading to IO hang & mount inaccessible

bugzilla at redhat.com bugzilla at redhat.com
Fri Apr 7 12:05:13 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1422788



--- Comment #4 from Worker Ant <bugzilla-bot at gluster.org> ---
COMMIT: https://review.gluster.org/16638 committed in release-3.8 by Niels de Vos (ndevos at redhat.com)
------
commit 982de32c7f559ab57f66a9ee92f884b772bae1e4
Author: Poornima G <pgurusid at redhat.com>
Date:   Tue Feb 14 12:45:36 2017 +0530

    rpcsvc: Add rpchdr and proghdr to iobref before submitting to transport

    Backport of https://review.gluster.org/16613

    Issue:
    When fio is run on multiple clients (each client writes to its own files)
    and a client meanwhile does a readdirp, the client that did the readdirp
    starts to receive upcalls. In this scenario the client disconnects with
    an rpc decode failed error.

    RCA:
    Upcall calls rpcsvc_request_submit to submit the request to socket:
    rpcsvc_request_submit currently:
    rpcsvc_request_submit () {
       iobuf = iobuf_new
       iov = iobuf->ptr
       fill iobuf to contain xdrised upcall content - proghdr
       rpcsvc_callback_submit (..iov..)
       ...
       if (iobuf)
           iobuf_unref (iobuf)
    }

    rpcsvc_callback_submit (... iov...) {
       ...
       iobuf = iobuf_new
       iov1 = iobuf->ptr
       fill iobuf to contain xdrised rpc header - rpchdr
       msg.rpchdr = iov1
       msg.proghdr = iov
       ...
       rpc_transport_submit_request (msg)
       ...
       if (iobuf)
           iobuf_unref (iobuf)
    }

    rpcsvc_callback_submit assumes that once rpc_transport_submit_request()
    returns, the msg has been written to the socket and thus the buffers
    (rpchdr, proghdr) can be freed, which is not the case. Especially under
    high workload, rpc_transport_submit_request() may not be able to write to
    the socket immediately; it then adds the msg to its own queue and returns
    success. Thus we have a use after free for rpchdr and proghdr. The client
    gets a garbage rpchdr and proghdr and fails to decode the rpc, resulting
    in a disconnect.

    To prevent this, we need to add the rpchdr and proghdr to an iobref and
    send it in msg:
       iobref_add (iobref, iobufs)
       msg.iobref = iobref;
    The socket layer takes a ref on msg.iobref if it cannot write to the
    socket and is adding the msg to the queue. Thus we do not have a use
    after free.
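
The ownership rule described above can be illustrated with a small standalone sketch. This is not GlusterFS code: every name here (toy_buf, toy_ref, toy_msg, toy_transport_submit, toy_transport_flush, toy_callback_submit, live_bufs) is hypothetical and only mimics the roles of iobuf, iobref, the rpc msg and the socket queue. It assumes a transport that always queues instead of writing, and shows that the header buffers stay alive after the caller drops its own refs because the queued message holds a ref on the buffer set.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins only; NOT the real iobuf/iobref API. */
static int live_bufs = 0; /* number of toy buffers not yet freed */

struct toy_buf { int refs; char data[16]; };
struct toy_ref { int refs; int count; struct toy_buf *bufs[2]; };
struct toy_msg { const char *rpchdr; const char *proghdr; struct toy_ref *ref; };

static struct toy_buf *toy_buf_new(const char *s) {
    struct toy_buf *b = calloc(1, sizeof(*b));
    if (!b) abort();
    b->refs = 1;
    strncpy(b->data, s, sizeof(b->data) - 1); /* calloc keeps it NUL-terminated */
    live_bufs++;
    return b;
}

static void toy_buf_unref(struct toy_buf *b) {
    if (--b->refs == 0) { live_bufs--; free(b); }
}

static struct toy_ref *toy_ref_new(void) {
    struct toy_ref *r = calloc(1, sizeof(*r));
    if (!r) abort();
    r->refs = 1;
    return r;
}

/* Adding a buffer to the set takes an extra ref on that buffer. */
static void toy_ref_add(struct toy_ref *r, struct toy_buf *b) {
    assert(r->count < 2);
    b->refs++;
    r->bufs[r->count++] = b;
}

/* Dropping the last ref on the set releases every buffer it holds. */
static void toy_ref_unref(struct toy_ref *r) {
    if (--r->refs == 0) {
        for (int i = 0; i < r->count; i++) toy_buf_unref(r->bufs[i]);
        free(r);
    }
}

static struct toy_msg queued; /* single-slot stand-in for the transport queue */

/* Assumed busy transport: cannot write now, so it queues the msg,
 * takes its own ref on msg->ref, and still reports success. */
static int toy_transport_submit(const struct toy_msg *msg) {
    queued = *msg;
    if (queued.ref) queued.ref->refs++;
    return 0;
}

/* Later write of the queued msg; the transport then drops its ref. */
static void toy_transport_flush(char *out, size_t len) {
    snprintf(out, len, "%s|%s", queued.rpchdr, queued.proghdr);
    if (queued.ref) toy_ref_unref(queued.ref);
}

/* Caller side, following the fix: put both headers in the buffer set
 * before submitting, then drop the caller's own refs unconditionally. */
static void toy_callback_submit(void) {
    struct toy_buf *rpchdr = toy_buf_new("rpchdr");
    struct toy_buf *proghdr = toy_buf_new("proghdr");
    struct toy_ref *ref = toy_ref_new();
    toy_ref_add(ref, rpchdr);
    toy_ref_add(ref, proghdr);
    struct toy_msg msg = { rpchdr->data, proghdr->data, ref };
    toy_transport_submit(&msg);
    toy_ref_unref(ref);
    toy_buf_unref(rpchdr);
    toy_buf_unref(proghdr);
}
```

In this sketch, calling toy_callback_submit() leaves live_bufs at 2 even though the caller has dropped all of its refs, and a later toy_transport_flush() still reads intact headers before releasing them. If the msg carried no buffer set (ref left NULL), the caller's unrefs would free both buffers while the queue still pointed at them, which is the use after free described in the RCA.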

    Thank you for discussing, debugging and fixing this along with me:
    Prashanth Pai <ppai at redhat.com>
    Raghavendra G <rgowdapp at redhat.com>
    Rajesh Joseph <rjoseph at redhat.com>
    Kotresh HR <khiremat at redhat.com>
    Mohammed Rafi KC <rkavunga at redhat.com>
    Soumya Koduri <skoduri at redhat.com>

    > Reviewed-on: https://review.gluster.org/16613
    > Reviewed-by: Prashanth Pai <ppai at redhat.com>
    > Smoke: Gluster Build System <jenkins at build.gluster.org>
    > Reviewed-by: soumya k <skoduri at redhat.com>
    > NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    > CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    > Reviewed-by: Raghavendra G <rgowdapp at redhat.com>

    Change-Id: Ifa6bf6f4879141f42b46830a37c1574b21b37275
    BUG: 1422788
    Signed-off-by: Poornima G <pgurusid at redhat.com>
    Reviewed-on: https://review.gluster.org/16638
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Prashanth Pai <ppai at redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    Reviewed-by: Raghavendra G <rgowdapp at redhat.com>
