[Bugs] [Bug 1403192] New: Files remain unhealed forever if shd is disabled and re-enabled while healing is in progress.

bugzilla at redhat.com bugzilla at redhat.com
Fri Dec 9 11:52:43 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1403192

            Bug ID: 1403192
           Summary: Files remain unhealed forever if shd is disabled and
                    re-enabled while healing is in progress.
           Product: GlusterFS
           Version: 3.8
         Component: replicate
          Assignee: bugs at gluster.org
          Reporter: ravishankar at redhat.com
                CC: bugs at gluster.org
        Depends On: 1402841
            Blocks: 1403120, 1403187



+++ This bug was initially created as a clone of Bug #1402841 +++

Description of problem:

1. Create a 1x2 replica vol using a 2 node cluster.
2. Fuse mount the vol and create 2000 files
3. Bring one brick down, write to those files, leading to 2000 pending data
heals.
4. Bring back the brick and launch index heal
5. The shd log on the source brick prints completed heals for the processed
files.
6. Before the heal completes, run
`gluster vol set volname self-heal-daemon off`
7. The heal stops as expected.
8. Re-enable the shd: `gluster vol set volname self-heal-daemon on`
9. Observe the shd log: no files are getting healed.
10. Launching index heal manually also has no effect.

The only workaround is to restart shd with a `volume start force`.

--- Additional comment from Worker Ant on 2016-12-08 07:55:33 EST ---

REVIEW: http://review.gluster.org/16073 (syncop: fix conditional wait bug in
parallel dir scan) posted (#1) for review on master by Ravishankar N
(ravishankar at redhat.com)

--- Additional comment from Worker Ant on 2016-12-09 00:27:13 EST ---

REVIEW: http://review.gluster.org/16073 (syncop: fix conditional wait bug in
parallel dir scan) posted (#2) for review on master by Ravishankar N
(ravishankar at redhat.com)

--- Additional comment from Worker Ant on 2016-12-09 05:24:25 EST ---

COMMIT: http://review.gluster.org/16073 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com) 
------
commit 2d012c4558046afd6adb3992ff88f937c5f835e4
Author: Ravishankar N <ravishankar at redhat.com>
Date:   Fri Dec 9 09:50:43 2016 +0530

    syncop: fix conditional wait bug in parallel dir scan

    Problem:
    The issue as seen by the user is detailed in the BZ. What happens is
    that if the no. of items in the wait queue == max-qlen,
    syncop_mt_dir_scan() does a pthread_cond_wait until the launched
    synctask workers dequeue entries from the queue. But if the worker
    fails for some reason, the queue is never emptied, and further
    invocations of syncop_mt_dir_scan() are blocked forever.

    Fix: Made some changes to _dir_scan_job_fn

    - If a worker encounters an error while processing an entry, notify the
      readdir loop in syncop_mt_dir_scan() of the error but continue to process
      other entries in the queue, decrementing the qlen as and when we dequeue
      elements, and ending only when the queue is empty.

    - If the readdir loop in syncop_mt_dir_scan() gets an error from the
      worker, stop the readdir+queueing of further entries.

    Change-Id: I39ce073e01a68c7ff18a0e9227389245a6f75b88
    BUG: 1402841
    Signed-off-by: Ravishankar N <ravishankar at redhat.com>
    Reviewed-on: http://review.gluster.org/16073
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu at redhat.com>
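
The wait-queue behaviour described in the commit message above can be sketched with plain pthreads. This is only an illustration of the pattern, not the GlusterFS code: it uses one long-lived pthread worker instead of synctasks, integers instead of directory entries, and the names `scan_dir`, `worker`, `struct scan`, `MAX_QLEN` and `bad_entry` are all hypothetical. It shows the two points of the fix: the worker keeps dequeuing (and decrementing qlen) after an error, and the reader loop stops queueing once an error has been reported.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_QLEN 4 /* hypothetical stand-in for max-qlen */

struct scan {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int queue[MAX_QLEN]; /* ring buffer of pending entries */
    int head, qlen;
    bool done;     /* reader has stopped queueing */
    int err;       /* first error reported by the worker */
    int processed; /* entries handled successfully */
    int bad_entry; /* entry that fails, -1 for none (illustration only) */
};

static void *worker(void *arg)
{
    struct scan *s = arg;

    pthread_mutex_lock(&s->lock);
    for (;;) {
        while (s->qlen == 0 && !s->done)
            pthread_cond_wait(&s->cond, &s->lock);
        if (s->qlen == 0)
            break; /* reader is done and the queue is drained */

        int entry = s->queue[s->head];
        s->head = (s->head + 1) % MAX_QLEN;
        s->qlen--; /* decrement on every dequeue, even after an error */
        pthread_cond_broadcast(&s->cond); /* room in the queue: wake reader */
        pthread_mutex_unlock(&s->lock);

        int ret = (entry == s->bad_entry) ? -1 : 0; /* stand-in for real work */

        pthread_mutex_lock(&s->lock);
        if (ret == 0)
            s->processed++;
        else if (s->err == 0)
            s->err = ret; /* report the error, but keep draining the queue */
    }
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Queue nentries integers for the worker; returns 0 or the worker's error. */
int scan_dir(int nentries, int bad_entry, int *processed)
{
    struct scan s = { .bad_entry = bad_entry };
    pthread_t tid;

    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cond, NULL);
    if (pthread_create(&tid, NULL, worker, &s) != 0)
        return -1;

    pthread_mutex_lock(&s.lock);
    for (int i = 0; i < nentries; i++) {
        /* Wait for room, but never wait once an error has been reported. */
        while (s.qlen == MAX_QLEN && s.err == 0)
            pthread_cond_wait(&s.cond, &s.lock);
        if (s.err != 0)
            break; /* worker failed: stop reading and queueing */
        assert(s.qlen < MAX_QLEN);
        s.queue[(s.head + s.qlen) % MAX_QLEN] = i;
        s.qlen++;
        pthread_cond_broadcast(&s.cond); /* work available: wake worker */
    }
    s.done = true;
    pthread_cond_broadcast(&s.cond);
    pthread_mutex_unlock(&s.lock);

    pthread_join(tid, NULL);
    *processed = s.processed;
    pthread_mutex_destroy(&s.lock);
    pthread_cond_destroy(&s.cond);
    return s.err;
}
```

Calling `scan_dir(100, 5, &n)` in this sketch returns the error instead of hanging, because the reader is always woken by the next dequeue and checks the error flag before waiting again. If the worker instead exited on the first error without draining the queue, a reader waiting on a full queue would never be woken, which matches the hang described in the Problem section.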


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1402841
[Bug 1402841] Files remain unhealed forever if shd is disabled and
re-enabled while healing is in progress.
https://bugzilla.redhat.com/show_bug.cgi?id=1403120
[Bug 1403120] Files remain unhealed forever if shd is disabled and
re-enabled while healing is in progress.
https://bugzilla.redhat.com/show_bug.cgi?id=1403187
[Bug 1403187] Files remain unhealed forever if shd is disabled and
re-enabled while healing is in progress.
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

