[Bugs] [Bug 1318428] New: ./tests/basic/tier/tier-file-create.t dumping core fairly often on build machines in Linux

bugzilla at redhat.com bugzilla at redhat.com
Wed Mar 16 20:45:42 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1318428

            Bug ID: 1318428
           Summary: ./tests/basic/tier/tier-file-create.t dumping core
                    fairly often on build machines in Linux
           Product: Red Hat Gluster Storage
           Version: 3.1
         Component: glusterfs
     Sub Component: tiering
          Keywords: Triaged
          Severity: high
          Assignee: rhs-bugs at redhat.com
          Reporter: dblack at redhat.com
        QA Contact: nchilaka at redhat.com
                CC: bugs at gluster.org, dlambrig at redhat.com,
                    josferna at redhat.com, kdhananj at redhat.com,
                    nbalacha at redhat.com, pkarampu at redhat.com
        Depends On: 1315560



+++ This bug was initially created as a clone of Bug #1315560 +++

Description of problem:

http://www.gluster.org/pipermail/gluster-devel/2016-March/048568.html

https://build.gluster.org/job/rackspace-regression-2GB-triggered/18872/consoleFull
https://build.gluster.org/job/rackspace-regression-2GB-triggered/18793/console


I have set the author to the author of the script to begin with.


--- Additional comment from Vijay Bellur on 2016-03-07 23:52:23 EST ---

REVIEW: http://review.gluster.org/13632 (tests: Move tier-file-create.t to bad
tests) posted (#1) for review on master by Krutika Dhananjay
(kdhananj at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-08 01:45:07 EST ---

REVIEW: http://review.gluster.org/13632 (tests: Move tier-file-create.t to bad
tests) posted (#2) for review on master by Krutika Dhananjay
(kdhananj at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-08 06:29:20 EST ---

REVIEW: http://review.gluster.org/13632 (tests: Move tier-file-create.t to bad
tests) posted (#3) for review on master by Krutika Dhananjay
(kdhananj at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-08 15:00:44 EST ---

COMMIT: http://review.gluster.org/13632 committed in master by Jeff Darcy
(jdarcy at redhat.com) 
------
commit 66d62edd08be5701407e4adcb153a676702ff8b8
Author: Krutika Dhananjay <kdhananj at redhat.com>
Date:   Tue Mar 8 10:21:14 2016 +0530

    tests: Move tier-file-create.t to bad tests

    Change-Id: Iaddb244699b0e2647a67a75f257e4c47e0e69e0d
    BUG: 1315560
    Signed-off-by: Krutika Dhananjay <kdhananj at redhat.com>
    Reviewed-on: http://review.gluster.org/13632
    Smoke: Gluster Build System <jenkins at build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Dan Lambright <dlambrig at redhat.com>
    Reviewed-by: Jeff Darcy <jdarcy at redhat.com>

--- Additional comment from Vijay Bellur on 2016-03-11 05:48:50 EST ---

REVIEW: http://review.gluster.org/13680 (cluster/ec: Do not ref dictionary in
lookup) posted (#1) for review on master by Pranith Kumar Karampuri
(pkarampu at redhat.com)

--- Additional comment from Vijay Bellur on 2016-03-14 07:40:03 EDT ---

COMMIT: http://review.gluster.org/13680 committed in master by Xavier Hernandez
(xhernandez at datalab.es) 
------
commit 64cba025b13aad7fb3020a04930cfa22fbfcb859
Author: Pranith Kumar K <pkarampu at redhat.com>
Date:   Tue Mar 8 23:05:08 2016 +0530

    cluster/ec: Do not ref dictionary in lookup

    Problem:
    1) dict_for_each loops over the elements without taking any locks, so
       members of the dictionary can be ref'd/unref'd by another thread
       while dict_for_each is executing, leading to crashes.

    This happens with distributed ec + distributed replicate as the cold
    and hot tiers. Tier sends a lookup which fails on ec. (By this time the
    dict already contains the ec xattrs.) After this, the lookup_everywhere
    code path is hit in tier, which triggers each distribute's hash lookup;
    these fail, which leads to the cold and hot dht's lookup_everywhere
    running in two parallel epoll threads. When ec tries to set
    trusted.ec.version/dirty/size as keys in the dictionary, the older
    values against the same keys get erased. If, while this erasing is
    going on, the thread doing the lookup on afr's subvolume accesses these
    keys, either in dict_copy_with_ref or in the client xlator trying to
    serialize, this can lead to either a crash or a hang, depending on
    whether the spin/mutex lock is called on invalid memory.

    2) EC deletes GF_CONTENT_KEY from the dictionary; this may lead to
       extra reads in the case of lookup-everywhere for tiered volumes.

    Fix:
    Do dict_copy_with_ref() for the lookup dictionary.
    This avoids the 1st problem; it does not actually fix it.
    The 2nd problem will be fixed.

    Change-Id: I5427aa14c48cb7572977d4de9a28c5ffff2b4b95
    BUG: 1315560
    Signed-off-by: Pranith Kumar K <pkarampu at redhat.com>
    Reviewed-on: http://review.gluster.org/13680
    Smoke: Gluster Build System <jenkins at build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.com>
    Reviewed-by: Xavier Hernandez <xhernandez at datalab.es>
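
The workaround described in the commit message above (work on a private copy of the lookup dictionary instead of mutating the shared one) can be sketched as follows. This is a minimal illustration, not GlusterFS code: toy_dict_t and every toy_* function are hypothetical names invented for this example, and the toy copy is a deep copy for simplicity, whereas the real dict_copy_with_ref() may share value references rather than duplicate them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy dictionary; NOT the GlusterFS dict_t API. */
#define TOY_MAX_PAIRS 8

typedef struct {
    char *key;
    char *value;
} toy_pair_t;

typedef struct {
    toy_pair_t pairs[TOY_MAX_PAIRS];
    int count;
} toy_dict_t;

/* Duplicate a string on the heap. */
char *toy_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    assert(p != NULL);
    memcpy(p, s, n);
    return p;
}

/* Set key to value. Replacing an existing key frees the old value, which
 * is the step that is unsafe if another thread is still reading it. */
int toy_dict_set(toy_dict_t *d, const char *key, const char *value)
{
    for (int i = 0; i < d->count; i++) {
        if (strcmp(d->pairs[i].key, key) == 0) {
            free(d->pairs[i].value);
            d->pairs[i].value = toy_strdup(value);
            return 0;
        }
    }
    if (d->count == TOY_MAX_PAIRS)
        return -1;
    d->pairs[d->count].key = toy_strdup(key);
    d->pairs[d->count].value = toy_strdup(value);
    d->count++;
    return 0;
}

/* Return the value stored under key, or NULL if the key is absent. */
const char *toy_dict_get(const toy_dict_t *d, const char *key)
{
    for (int i = 0; i < d->count; i++)
        if (strcmp(d->pairs[i].key, key) == 0)
            return d->pairs[i].value;
    return NULL;
}

/* Deep copy: the caller gets private storage it can mutate freely. */
toy_dict_t *toy_dict_copy(const toy_dict_t *src)
{
    toy_dict_t *dst = calloc(1, sizeof(*dst));
    assert(dst != NULL);
    for (int i = 0; i < src->count; i++)
        toy_dict_set(dst, src->pairs[i].key, src->pairs[i].value);
    return dst;
}

/* Release a dictionary and everything it owns. */
void toy_dict_free(toy_dict_t *d)
{
    for (int i = 0; i < d->count; i++) {
        free(d->pairs[i].key);
        free(d->pairs[i].value);
    }
    free(d);
}

/* Returns 1 if overwriting a key in the private copy leaves the shared
 * dictionary's value for that key unchanged, 0 otherwise. */
int toy_lookup_leaves_shared_untouched(void)
{
    toy_dict_t *shared = calloc(1, sizeof(*shared));
    assert(shared != NULL);
    toy_dict_set(shared, "trusted.ec.version", "1");

    /* The lookup path mutates only its own copy. */
    toy_dict_t *priv = toy_dict_copy(shared);
    toy_dict_set(priv, "trusted.ec.version", "2");

    int ok = strcmp(toy_dict_get(shared, "trusted.ec.version"), "1") == 0 &&
             strcmp(toy_dict_get(priv, "trusted.ec.version"), "2") == 0;

    toy_dict_free(priv);
    toy_dict_free(shared);
    return ok;
}
```

Calling toy_lookup_leaves_shared_untouched() from a main() exercises the pattern in a single thread. As the commit message notes, copying sidesteps the race rather than fixing the unlocked iteration itself: nothing here adds locking to the traversal.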


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1315560
[Bug 1315560] ./tests/basic/tier/tier-file-create.t dumping core fairly
often on build machines in Linux