[Bugs] [Bug 1322520] ./tests/basic/tier/tier-file-create.t dumping core fairly often on build machines in Linux

bugzilla at redhat.com bugzilla at redhat.com
Sat Apr 9 18:51:27 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1322520



--- Comment #3 from Vijay Bellur <vbellur at redhat.com> ---
COMMIT: http://review.gluster.org/13859 committed in release-3.7 by Pranith
Kumar Karampuri (pkarampu at redhat.com) 
------
commit 0830a6b05fbc7db89e984bb12c40c8eb7dbe119f
Author: Pranith Kumar K <pkarampu at redhat.com>
Date:   Tue Mar 8 23:05:08 2016 +0530

    cluster/ec: Do not ref dictionary in lookup

    Problem:
    1) dict_for_each loops over the elements without any locks, so the members of
       the dictionary can be ref/unrefed while dict_for_each is executed by another
       thread, leading to crashes.

    In particular, consider distributed ec as the cold tier and distributed
    replicate as the hot tier. Tier sends a lookup which fails on ec (by this time
    the dict already contains ec xattrs). Tier then hits the lookup_everywhere code
    path, which triggers a hashed lookup in each distribute. That lookup also
    fails, which leads to the cold and hot dht's lookup_everywhere running in two
    parallel epoll threads. When ec sets trusted.ec.version/dirty/size as keys in
    the dictionary, the older values stored against the same keys are erased. If,
    while this erasing is going on, the thread doing the lookup on afr's subvolume
    accesses these keys, either in dict_copy_with_ref or in the client xlator
    while serializing, the result is a crash or a hang, depending on whether the
    spin/mutex lock is taken on invalid memory.

    2) EC deletes GF_CONTENT_KEY from the dictionary; this may lead to extra reads
       in the case of lookup-everywhere for tiered volumes.

    Fix:
    Do dict_copy_with_ref() for the lookup dictionary.
    This avoids the 1st problem rather than actually fixing it.
    The 2nd problem is fixed.

     >Change-Id: I5427aa14c48cb7572977d4de9a28c5ffff2b4b95
     >BUG: 1315560
     >Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
     >Reviewed-on: http://review.gluster.org/13680
     >Smoke: Gluster Build System <jenkins at build.gluster.com>
     >NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
     >CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
     >Reviewed-by: Xavier Hernandez <xhernandez at datalab.es>
     >(cherry picked from commit 64cba025b13aad7fb3020a04930cfa22fbfcb859)

    Change-Id: I2828a0d9e730bc4b0ea6cee037365131767ae43e
    BUG: 1322520
    Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
    Reviewed-on: http://review.gluster.org/13859
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Ravishankar N <ravishankar at redhat.com>
    Reviewed-by: Krutika Dhananjay <kdhananj at redhat.com>
    Smoke: Gluster Build System <jenkins at build.gluster.com>
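The fix described in the commit above (ec doing its lookup on a dict_copy_with_ref() copy) can be illustrated with a minimal sketch. Everything here is hypothetical: toy_dict_t, toy_data_t, toy_dict_copy_with_ref(), ec_lookup_sketch() and TOY_CONTENT_KEY (a stand-in for GF_CONTENT_KEY) are invented for illustration and are NOT GlusterFS's dict_t API. The sketch only shows isolation: ec overwrites a trusted.ec.* key and deletes the content key on its private copy, so the caller's dictionary keeps its entries and values. It is single-threaded with non-atomic refcounts, and the copy itself still iterates the shared source, which is consistent with the commit's note that problem 1 is avoided rather than fixed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for GF_CONTENT_KEY; the real macro value is not reproduced. */
#define TOY_CONTENT_KEY "toy.content"
#define TOY_MAX 8

/* Hypothetical refcounted value; NOT GlusterFS's data_t. */
typedef struct {
    int refcount;
    char text[32];
} toy_data_t;

/* Hypothetical fixed-size dictionary; NOT GlusterFS's dict_t. Keys must be literals. */
typedef struct {
    const char *keys[TOY_MAX];
    toy_data_t *vals[TOY_MAX];
    int count;
} toy_dict_t;

static toy_data_t *toy_data_new(const char *text) {
    toy_data_t *d = calloc(1, sizeof(*d));
    if (!d) abort();
    d->refcount = 1;
    snprintf(d->text, sizeof(d->text), "%s", text);
    return d;
}

static toy_data_t *toy_data_ref(toy_data_t *d) { d->refcount++; return d; }

/* Dropping the last reference frees the value: the "erase" a concurrent
 * reader of a shared dictionary could race with. */
static void toy_data_unref(toy_data_t *d) {
    if (--d->refcount == 0) free(d);
}

static int toy_dict_find(const toy_dict_t *dict, const char *key) {
    for (int i = 0; i < dict->count; i++)
        if (strcmp(dict->keys[i], key) == 0) return i;
    return -1;
}

/* Consumes one reference to val; overwriting a key unrefs the old value. */
static void toy_dict_set(toy_dict_t *dict, const char *key, toy_data_t *val) {
    int i = toy_dict_find(dict, key);
    if (i >= 0) {
        toy_data_unref(dict->vals[i]);
        dict->vals[i] = val;
    } else if (dict->count < TOY_MAX) {
        dict->keys[dict->count] = key;
        dict->vals[dict->count] = val;
        dict->count++;
    } else {
        toy_data_unref(val); /* dictionary full: drop the reference we were given */
    }
}

/* Removes key (if present) by moving the last entry into its slot. */
static void toy_dict_del(toy_dict_t *dict, const char *key) {
    int i = toy_dict_find(dict, key);
    if (i < 0) return;
    toy_data_unref(dict->vals[i]);
    dict->count--;
    dict->keys[i] = dict->keys[dict->count];
    dict->vals[i] = dict->vals[dict->count];
}

/* New entry table; values are shared but each gains one reference. */
static void toy_dict_copy_with_ref(const toy_dict_t *src, toy_dict_t *dst) {
    for (int i = 0; i < src->count; i++)
        toy_dict_set(dst, src->keys[i], toy_data_ref(src->vals[i]));
}

static void toy_dict_destroy(toy_dict_t *dict) {
    for (int i = 0; i < dict->count; i++) toy_data_unref(dict->vals[i]);
    dict->count = 0;
}

/* Hypothetical ec-style lookup: every edit lands on a private copy. */
static toy_dict_t *ec_lookup_sketch(const toy_dict_t *xdata) {
    toy_dict_t *local = calloc(1, sizeof(*local));
    if (!local) abort();
    toy_dict_copy_with_ref(xdata, local);
    toy_dict_set(local, "trusted.ec.version", toy_data_new("ec-request"));
    toy_dict_del(local, TOY_CONTENT_KEY);
    return local;
}
```

Calling ec_lookup_sketch() on a caller dictionary that holds TOY_CONTENT_KEY and trusted.ec.version returns a copy without the content key and with ec's own version value, while the caller's dictionary still has both keys and its original values. The caller frees the copy with toy_dict_destroy() followed by free().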
