[Bugs] [Bug 1440635] Application VMs with their disk images on sharded-replica 3 volume are unable to boot after performing rebalance

bugzilla at redhat.com
Mon Apr 10 15:45:54 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1440635



--- Comment #5 from Worker Ant <bugzilla-bot at gluster.org> ---
COMMIT: https://review.gluster.org/17019 committed in release-3.8 by
jiffin tony Thottan (jthottan at redhat.com)
------
commit d71ec72b981d110199c3376f39f91b704241975c
Author: Krutika Dhananjay <kdhananj at redhat.com>
Date:   Thu Apr 6 18:10:41 2017 +0530

    features/shard: Fix vm corruption upon fix-layout

    Backport of: https://review.gluster.org/17010

    shard's writev implementation, as part of identifying
    presence of participant shards that aren't in memory,
    first sends an MKNOD on these shards, and upon EEXIST error,
    looks up the shards before proceeding with the writes.

    The VM corruption was caused when the following happened:
    1. DHT had n subvolumes initially.
    2. Upon add-brick + fix-layout, the layout of .shard changed
       although the existing shards under it were yet to be migrated
       to their new hashed subvolumes.
    3. During this time, there were writes on the VM falling in regions
       of the file whose corresponding shards already existed under
       .shard.
    4. Sharding xl sent MKNOD on these shards, now creating them in their
       new hashed subvolumes even though shard blocks with valid data
       already existed for this region.
    5. All subsequent writes were wound on these newly created copies.

    The net outcome is that neither copy of the shard had the correct
    data. This caused the affected VMs to be unbootable.
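
    The sequence above can be replayed with a toy model. This is only a
    sketch: ToyDHT, its modulo placement and the 8-byte shard size are
    hypothetical stand-ins and not GlusterFS code. It assumes, as step 4
    implies, that MKNOD consults only the current hashed subvolume.

```python
import errno


class ToyDHT:
    """Hypothetical stand-in for DHT: shard i is placed on subvolume i % n."""

    def __init__(self, n):
        self.subvols = [{} for _ in range(n)]  # shard index -> bytearray

    def hashed(self, shard):
        # Placement depends on the current number of subvolumes (the layout).
        return shard % len(self.subvols)

    def add_brick_fix_layout(self):
        # Step 2: the layout changes, existing shards are not migrated yet.
        self.subvols.append({})

    def mknod(self, shard):
        # Assumption: MKNOD only checks the current hashed subvolume.
        target = self.subvols[self.hashed(shard)]
        if shard in target:
            return -errno.EEXIST
        target[shard] = bytearray(8)  # new, zero-filled shard
        return 0

    def write(self, shard, offset, data):
        # Pre-fix order: MKNOD first, then write to the hashed copy.
        self.mknod(shard)
        blob = self.subvols[self.hashed(shard)][shard]
        blob[offset:offset + len(data)] = data

    def copies(self, shard):
        # Every copy of the shard, in subvolume order.
        return [bytes(sv[shard]) for sv in self.subvols if shard in sv]


dht = ToyDHT(2)                # step 1: two subvolumes
dht.write(2, 0, b"AAAAAAAA")   # shard 2 is created on subvolume 0
dht.add_brick_fix_layout()     # step 2: shard 2 now hashes to subvolume 2
dht.write(2, 4, b"BBBB")       # steps 3-5: MKNOD succeeds, write hits the new copy
print(dht.copies(2))
```

    In this model the intended content is b"AAAABBBB", but the old copy
    never sees the second write and the new copy is missing the first.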

    FIX:
    For want of better alternatives in DHT, the fix changes shard fops to do
    a LOOKUP before the MKNOD and, upon an EEXIST error from the MKNOD,
    perform another LOOKUP.
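
    The new ordering can be sketched as follows. This is an illustration
    under stated assumptions, not the actual shard translator code:
    resolve_shard, lookup, mknod and the in-memory subvols list are
    hypothetical names, and the sketch assumes a LOOKUP can find a shard
    that still sits on a subvolume other than its new hashed one.

```python
def resolve_shard(lookup, mknod, shard):
    """LOOKUP first; MKNOD only if the shard is missing; LOOKUP again on EEXIST."""
    where = lookup(shard)
    if where is not None:
        return where              # existing shard found, do not create a copy
    try:
        return mknod(shard)       # shard really is missing, create it
    except FileExistsError:
        return lookup(shard)      # someone else created it first, look it up


# Three subvolumes after fix-layout; shard 2 still sits on its old subvolume 0.
subvols = [{2: bytearray(b"AAAAAAAA")}, {}, {}]


def lookup(shard):
    # Assumption: a lookup can locate the shard on any subvolume.
    return next((i for i, sv in enumerate(subvols) if shard in sv), None)


def mknod(shard):
    # Creates only on the hashed subvolume (toy placement: shard % n).
    i = shard % len(subvols)
    if shard in subvols[i]:
        raise FileExistsError(shard)
    subvols[i][shard] = bytearray(8)
    return i


where = resolve_shard(lookup, mknod, 2)
subvols[where][2][4:8] = b"BBBB"   # the write lands on the existing copy
print(where, bytes(subvols[where][2]))
```

    With the LOOKUP in front, the write in this toy setup reaches the
    existing copy and no second copy appears on the new hashed subvolume.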

    Change-Id: I1a5d3515b42e2e5583c407d1b4aff44d7ce472eb
    BUG: 1440635
    RCA'd-by: Raghavendra Gowdappa <rgowdapp at redhat.com>
    Reported-by: Mahdi Adnan <mahdi.adnan at outlook.com>
    Signed-off-by: Krutika Dhananjay <kdhananj at redhat.com>
    Reviewed-on: https://review.gluster.org/17019
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: jiffin tony Thottan <jthottan at redhat.com>
