[Bugs] [Bug 1238049] New: Crash in Quota enforcer

bugzilla at redhat.com bugzilla at redhat.com
Wed Jul 1 05:28:26 UTC 2015


            Bug ID: 1238049
           Summary: Crash in Quota enforcer
           Product: Red Hat Gluster Storage
           Version: 3.1
         Component: gluster-quota
          Assignee: vagarwal at redhat.com
          Reporter: rgowdapp at redhat.com
        QA Contact: storage-qa-internal at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com
        Depends On: 1238047, 1238048

+++ This bug was initially created as a clone of Bug #1238048 +++

+++ This bug was initially created as a clone of Bug #1238047 +++

Description of problem:
The following crash was observed while running a regression test:

(gdb) p local->validate_loc.inode
$27 = (inode_t *) 0x7f1e4acd406c
(gdb) bt
#0  0x00000039694028b8 in uuid_is_null () from /lib64/libuuid.so.1
#1  0x00007f1e618110ef in gf_uuid_is_null (uuid=0x8 <Address 0x8 out of
bounds>) at ../../../../../libglusterfs/src/compat-uuid.h:45
#2  quota_enforcer_lookup_cbk (req=<optimized out>, iov=<optimized out>,
count=<optimized out>, myframe=0x7f1e6cd75244)
    at ../../../../../xlators/features/quota/src/quota-enforcer-client.c:169
#3  0x00007f1e6dcc6db4 in rpc_clnt_handle_reply
(clnt=clnt at entry=0x7f1e5c0f2180, pollin=0x7f1e5c0c35e0) at
#4  0x00007f1e6dcc7620 in rpc_clnt_notify (trans=<optimized out>,
mydata=0x7f1e5c0f21b0, event=<optimized out>, data=<optimized out>)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:887
#5  0x00007f1e6dcc3633 in rpc_transport_notify (this=this at entry=0x7f1e5c0f5240,
event=event at entry=RPC_TRANSPORT_MSG_RECEIVED,
    data=data at entry=0x7f1e5c0c35e0) at
#6  0x00007f1e63f6ad45 in socket_event_poll_in (this=this at entry=0x7f1e5c0f5240)
at ../../../../../rpc/rpc-transport/socket/src/socket.c:2285
#7  0x00007f1e63f6dbbc in socket_event_handler (fd=<optimized out>,
idx=<optimized out>, data=0x7f1e5c0f5240, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:2398
#8  0x00007f1e6df5a90a in event_dispatch_epoll_handler (event=0x7f1e63d37e20,
event_pool=0x1801c90) at ../../../libglusterfs/src/event-epoll.c:570
#9  event_dispatch_epoll_worker (data=0x1849540) at
#10 0x000000395ec07d14 in start_thread () from /lib64/libpthread.so.0
#11 0x000000395e8f199d in clone () from /lib64/libc.so.6
(gdb) p inode
$28 = (inode_t *) 0x0
(gdb) p local->validate_loc.inode
$29 = (inode_t *) 0x7f1e4acd406c
(gdb) f 2
#2  quota_enforcer_lookup_cbk (req=<optimized out>, iov=<optimized out>,
count=<optimized out>, myframe=0x7f1e6cd75244)
    at ../../../../../xlators/features/quota/src/quota-enforcer-client.c:169
169                if ((!gf_uuid_is_null (inode->gfid))

As can be seen above, inode is NULL. However, local->validate_loc.inode is
_NOT_ NULL. This indicates that most likely there is a race (somebody overwrote
local->validate_loc.inode). Looking through the code, I found the following
implementation in quota_check_limit:


                ret = quota_check_object_limit (frame, ctx, priv, _inode, this,
                                                &op_errno, just_validated,
                                                local, &skip_check);
                if (skip_check == _gf_true)
                        goto done;

                if (ret) {
                        if (op_errno != EDQUOT)
                                gf_msg (this->name, GF_LOG_ERROR, 0,
                                        Q_MSG_ENFORCEMENT_FAILED, "Failed to "
                                        "check quota object limit");
                        goto err;
                }

                ret = quota_check_size_limit (frame, ctx, priv, _inode, this,
                                              &op_errno, just_validated, delta,
                                              local, &skip_check);
                if (skip_check == _gf_true)
                        goto done;


There are a couple of issues with the above implementation:
1. We continue to enforce on ancestors even while we are validating the state
of one of the ancestors. Earlier (when only quota_check_size_limit was
implemented) this was not the case: once validation was invoked, enforcement
was _stalled_ until we heard back the result of the validation. The current
code just invokes validation and continues with enforcement. Note that local is
shared across all the activities - enforcement, validation and (probably)
build_ancestry - so only one of enforcement, validation or build_ancestry can
be active at any time.

2. This is a slight variant of problem 1. Assume that quota_check_size_limit
invokes validation. This doesn't stop quota_check_inode_limit from invoking
validation again (since the first validation is not complete _yet_). So
quota_check_inode_limit invokes another instance of validation, and both
instances of validation share the same local, causing races.

The fix, as mentioned in 1, is that only one of enforcement, validation and
build_ancestry (and only one instance of it) can be in progress at any time.
Otherwise, if we want them to run in parallel, each of them should have its own
local/frame to avoid corruption.

PS: Similar problem exists while enforcing limits on hardlinks.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. running ./tests/bugs/quota/bug-1235182.t in a loop hits the bug once in a


Actual results:

Expected results:

Additional info:

Referenced Bugs:

[Bug 1238047] Crash in Quota enforcer
[Bug 1238048] Crash in Quota enforcer