[Bugs] [Bug 1238047] New: Crash in Quota enforcer

bugzilla at redhat.com
Wed Jul 1 05:26:01 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1238047

            Bug ID: 1238047
           Summary: Crash in Quota enforcer
           Product: GlusterFS
           Version: mainline
         Component: quota
          Assignee: bugs at gluster.org
          Reporter: rgowdapp at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Description of problem:
The following crash was observed while running the regression test:
./tests/bugs/quota/bug-1235182.t

(gdb) p local->validate_loc.inode
$27 = (inode_t *) 0x7f1e4acd406c
(gdb) bt
#0  0x00000039694028b8 in uuid_is_null () from /lib64/libuuid.so.1
#1  0x00007f1e618110ef in gf_uuid_is_null (uuid=0x8 <Address 0x8 out of
bounds>) at ../../../../../libglusterfs/src/compat-uuid.h:45
#2  quota_enforcer_lookup_cbk (req=<optimized out>, iov=<optimized out>,
count=<optimized out>, myframe=0x7f1e6cd75244)
    at ../../../../../xlators/features/quota/src/quota-enforcer-client.c:169
#3  0x00007f1e6dcc6db4 in rpc_clnt_handle_reply
(clnt=clnt@entry=0x7f1e5c0f2180, pollin=0x7f1e5c0c35e0) at
../../../../rpc/rpc-lib/src/rpc-clnt.c:759
#4  0x00007f1e6dcc7620 in rpc_clnt_notify (trans=<optimized out>,
mydata=0x7f1e5c0f21b0, event=<optimized out>, data=<optimized out>)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:887
#5  0x00007f1e6dcc3633 in rpc_transport_notify (this=this@entry=0x7f1e5c0f5240,
event=event@entry=RPC_TRANSPORT_MSG_RECEIVED,
    data=data@entry=0x7f1e5c0c35e0) at
../../../../rpc/rpc-lib/src/rpc-transport.c:538
#6  0x00007f1e63f6ad45 in socket_event_poll_in (this=this@entry=0x7f1e5c0f5240)
at ../../../../../rpc/rpc-transport/socket/src/socket.c:2285
#7  0x00007f1e63f6dbbc in socket_event_handler (fd=<optimized out>,
idx=<optimized out>, data=0x7f1e5c0f5240, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:2398
#8  0x00007f1e6df5a90a in event_dispatch_epoll_handler (event=0x7f1e63d37e20,
event_pool=0x1801c90) at ../../../libglusterfs/src/event-epoll.c:570
#9  event_dispatch_epoll_worker (data=0x1849540) at
../../../libglusterfs/src/event-epoll.c:673
#10 0x000000395ec07d14 in start_thread () from /lib64/libpthread.so.0
#11 0x000000395e8f199d in clone () from /lib64/libc.so.6
(gdb) p inode
$28 = (inode_t *) 0x0
(gdb) p local->validate_loc.inode
$29 = (inode_t *) 0x7f1e4acd406c
(gdb) f 2
#2  quota_enforcer_lookup_cbk (req=<optimized out>, iov=<optimized out>,
count=<optimized out>, myframe=0x7f1e6cd75244)
    at ../../../../../xlators/features/quota/src/quota-enforcer-client.c:169
169                if ((!gf_uuid_is_null (inode->gfid))
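Frame #2 shows the faulting dereference: inode->gfid is read while inode is
NULL, so gf_uuid_is_null() receives the bogus address 0x8 (the offset of gfid
within inode_t in this build). A purely defensive guard at this site would look
roughly like the sketch below. This is illustrative only - the cleanup label
and log message are assumptions - and it is not the root-cause fix, which is
discussed further down:

<illustrative_null_guard>

        /* Sketch only: masks the symptom at the crash site. The real fix is
         * to remove the race described below. "out" is an assumed cleanup
         * label. */
        if (inode == NULL) {
                gf_msg (this->name, GF_LOG_WARNING, EINVAL,
                        Q_MSG_ENFORCEMENT_FAILED,
                        "lookup completed without a resolvable inode");
                goto out;
        }

        if (!gf_uuid_is_null (inode->gfid)) {
                /* ... existing gfid handling at line 169 ... */
        }

</illustrative_null_guard>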

As can be seen above, inode is NULL. However, local->validate_loc.inode is
_NOT_ NULL. This indicates that most likely there is a race (somebody overwrote
local->validate_loc.inode). Looking through the code, I found the following
implementation of quota_check_limit.

<quota_check_limit>

                ret = quota_check_object_limit (frame, ctx, priv, _inode, this,
                                                &op_errno, just_validated,
                                                local, &skip_check);
                if (skip_check == _gf_true)
                        goto done;

                if (ret) {
                        if (op_errno != EDQUOT)
                                gf_msg (this->name, GF_LOG_ERROR, 0,
                                        Q_MSG_ENFORCEMENT_FAILED, "Failed to "
                                        "check quota object limit");
                        goto err;
                }

                ret = quota_check_size_limit (frame, ctx, priv, _inode, this,
                                              &op_errno, just_validated, delta,
                                              local, &skip_check);
                if (skip_check == _gf_true)
                        goto done;

</quota_check_limit>

There are a couple of issues with the above implementation:
1. We continue to enforce on ancestors even while we are validating the state
of one of the ancestors. Earlier (when only quota_check_size_limit was
implemented) this was not the case: once validation was invoked, enforcement
was _stalled_ till we heard back the result of the validation. The current code
just invokes validation and continues with enforcement. Note that local is
shared by all the activities - enforcement, validation and (probably)
build_ancestry - so only one of enforcement, validation or build_ancestry can
safely be active at any time.

2. This is a slight variant of problem 1. Assume that quota_check_size_limit
invokes validation. This doesn't stop quota_check_object_limit from invoking
validation again (since the first validation is not complete _yet_). So
quota_check_object_limit invokes another instance of validation, and both
instances of validation share the same local, causing races.

The fix, as mentioned in 1, is to ensure that only one of (and only one
instance of) enforcement, validation and build_ancestry is in progress at any
time. Alternatively, if we want them to run in parallel, each of them should
have its own frame/local to avoid corruption. See the sketch below.
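
To illustrate the first option, a minimal serialization sketch is given below.
It assumes quota_local_t carries a lock (local->lock) and adds a hypothetical
validate_in_progress flag; neither the flag nor the exact placement exists in
the current code.

<illustrative_serialization>

        /* Hypothetical field added to quota_local_t:
         *     gf_boolean_t validate_in_progress;
         * Take it under local->lock before issuing a validation and clear it
         * in the validation callback, so only one of validation, enforcement
         * or build_ancestry ever uses this local at a time. */

        gf_boolean_t start_validation = _gf_false;

        LOCK (&local->lock);
        {
                if (!local->validate_in_progress) {
                        local->validate_in_progress = _gf_true;
                        start_validation = _gf_true;
                }
        }
        UNLOCK (&local->lock);

        if (!start_validation)
                /* a validation is already in flight; do not touch
                 * local->validate_loc, resume enforcement from its callback */
                goto done;

        /* safe to (re)build local->validate_loc and issue the enforcer
         * lookup here; no concurrent user of this local exists */

</illustrative_serialization>

If the paths are instead meant to run in parallel, each validation would need
its own frame and local (e.g. created via copy_frame()) so that validate_loc is
never shared between concurrent operations.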

PS: A similar problem exists while enforcing limits on hardlinks.

Version-Release number of selected component (if applicable):
master


How reproducible:
Intermittent

Steps to Reproduce:
1. Running ./tests/bugs/quota/bug-1235182.t in a loop hits the bug once in a
while.


Actual results:


Expected results:


Additional info:
