[Bugs] [Bug 1316178] changelog/rpc: Memory leak- rpc_clnt_t object is never freed

bugzilla at redhat.com bugzilla at redhat.com
Fri Jul 22 15:12:55 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1316178



--- Comment #11 from Vijay Bellur <vbellur at redhat.com> ---
COMMIT: http://review.gluster.org/13658 committed in master by Jeff Darcy (jdarcy at redhat.com)
------
commit 637ce9e2e27e9f598a4a6c5a04cd339efaa62076
Author: Kotresh HR <khiremat at redhat.com>
Date:   Mon Mar 7 11:45:07 2016 +0530

    changelog/rpc: Fix rpc_clnt_t mem leaks

    PROBLEM:
       1. Freeing an rpc_clnt object might lead to crashes. Until
          now there was no need to free the rpc-clnt object,
          because all existing use cases need to reconnect
          on disconnect. Hence the timer code did not take a
          ref on the rpc-clnt object.

          Glusterd had some use cases that crashed because of the
          ping-timer, and only the code paths that involve the
          ping-timer were fixed.

          Now that changelog has a use case where the rpc-clnt
          needs to be freed, the timer code must be fixed to take
          refs.

       2. In changelog, because of issue 1, only mydata was being
          freed, which is incorrect. There are races where the
          rpc-clnt object accesses the freed mydata, which
          leads to crashes.

          Since the changelog xlator resides on the brick side and is
          a long-lived process, if multiple libgfchangelog consumers
          register with changelog and disconnect/reconnect multiple
          times, one 'rpc-clnt' object is leaked
          for every connect/disconnect.

    SOLUTION:
       1. Handle ref/unref of 'rpc_clnt' structure in timer
          functions properly.
       2. In changelog, unref 'rpc_clnt' in RPC_CLNT_DISCONNECT
          after disabling timers and free mydata on RPC_CLNT_DESTROY.
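
    The split between the two events in the solution above can be sketched
    as a toy model. This is not the GlusterFS code: toy_rpc_t, toy_event_t,
    toy_rpc_unref and toy_changelog_notify are hypothetical names invented
    for illustration, and the assumption is simply that the last unref
    delivers a destroy notification to the owner of mydata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical events, loosely modelled on RPC_CLNT_DISCONNECT/DESTROY. */
typedef enum { TOY_DISCONNECT, TOY_DESTROY } toy_event_t;

typedef struct toy_rpc toy_rpc_t;
typedef void (*toy_notify_fn)(toy_rpc_t *rpc, void *mydata, toy_event_t ev);

/* Hypothetical stand-in for an rpc client object. */
struct toy_rpc {
    int refcount;
    bool timers_enabled;
    bool destroyed;      /* set instead of freeing, so callers can inspect it */
    void *mydata;        /* owned by the notify callback's user */
    toy_notify_fn notify;
};

/* Dropping the last ref delivers TOY_DESTROY before the object goes away. */
static void toy_rpc_unref(toy_rpc_t *rpc)
{
    if (--rpc->refcount > 0)
        return;
    rpc->notify(rpc, rpc->mydata, TOY_DESTROY);
    rpc->mydata = NULL;
    rpc->destroyed = true;
}

/* Notify handler: DISCONNECT only drops a ref; mydata is freed on DESTROY. */
static void toy_changelog_notify(toy_rpc_t *rpc, void *mydata, toy_event_t ev)
{
    switch (ev) {
    case TOY_DISCONNECT:
        rpc->timers_enabled = false;  /* disable timers first */
        toy_rpc_unref(rpc);           /* then drop the owner's ref */
        break;
    case TOY_DESTROY:
        free(mydata);                 /* nothing can reach mydata any more */
        break;
    }
}
```

    In this model, a disconnect that arrives while another holder (for
    example a pending timer) still has a ref leaves mydata intact; it is
    released only when that last ref is dropped.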

    RPC SETUP IN CHANGELOG:
       1. The changelog xlator starts an rpc server, say 'changelog_rpc_server'.
       2. libgfchangelog starts an rpc server, say 'libgfchangelog_rpc_server'.
       3. libgfchangelog starts an rpc client and connects to
          'changelog_rpc_server'.
       4. In return, changelog_rpc_server starts an rpc client and connects
          back to 'libgfchangelog_rpc_server'.

    REF/UNREF HANDLING IN TIMER FUNCTIONS:
    Let's say the rpc clnt refcount = 1
       1. Take a ref before registering the callback with the timer queue
               >>>> rpc_clnt_ref (ref count becomes 2)
       2. Register a callback with the timer, say 'callback1'
       3. If registration fails:
               >>>> rpc_clnt_unref (ref count = 1)
       4. On timer expiration, 'callback1' gets called. Unref the rpc clnt
          at the end of 'callback1'. This corresponds to the ref taken in
          step 1
               >>>> rpc_clnt_unref (ref count = 1)
       5. The cycle from step 1 to step 4 continues until a timer cancel
          event happens
       6. Timer cancel of, say, 'callback1'
               If timer cancel fails:
                     Do nothing; step 4 will already have done the unref
               If timer cancel succeeds:
                     >>>> rpc_clnt_unref (ref count = 1)
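
    The ref counting in the steps above can be traced with a small
    single-threaded model. It is only a sketch: toy_clnt_t and the
    toy_timer_* functions are hypothetical and do not correspond to the
    GlusterFS timer API, and a boolean flag stands in for the timer queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for rpc_clnt_t; only the ref count matters here. */
typedef struct toy_clnt {
    int refcount;
    bool timer_armed;  /* true while 'callback1' is queued */
    bool destroyed;    /* set when the last ref is dropped */
} toy_clnt_t;

static toy_clnt_t *toy_clnt_ref(toy_clnt_t *c)
{
    c->refcount++;
    return c;
}

static void toy_clnt_unref(toy_clnt_t *c)
{
    if (--c->refcount == 0)
        c->destroyed = true;  /* real code would free the object here */
}

/* Steps 1-3: take the ref first, give it back if registration fails. */
static bool toy_timer_register(toy_clnt_t *c, bool simulate_failure)
{
    toy_clnt_ref(c);
    if (simulate_failure) {
        toy_clnt_unref(c);
        return false;
    }
    c->timer_armed = true;
    return true;
}

/* Step 4: the callback drops the ref taken in step 1 as its last action. */
static void toy_timer_fire(toy_clnt_t *c)
{
    if (!c->timer_armed)
        return;
    c->timer_armed = false;
    /* ... timer work that uses c would go here ... */
    toy_clnt_unref(c);
}

/* Step 6: only a successful cancel drops the ref; otherwise the callback did. */
static bool toy_timer_cancel(toy_clnt_t *c)
{
    if (!c->timer_armed)
        return false;  /* cancel failed: callback already ran and unref'd */
    c->timer_armed = false;
    toy_clnt_unref(c);
    return true;
}
```

    Starting from a ref count of 1, every path (registration failure,
    expiry, successful cancel, failed cancel) returns the count to 1, so
    the object stays alive exactly as long as its owner's ref does.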

    Change-Id: I91389bc511b8b1a17824941970ee8d2c29a74a09
    BUG: 1316178
    Signed-off-by: Kotresh HR <khiremat at redhat.com>
    Reviewed-on: http://review.gluster.org/13658
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Raghavendra G <rgowdapp at redhat.com>
