[Bugs] [Bug 1207085] New: ec heal improvements

Mon Mar 30 08:17:42 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1207085

            Bug ID: 1207085
           Summary: ec heal improvements
           Product: GlusterFS
           Version: mainline
         Component: disperse
          Assignee: bugs at gluster.org
          Reporter: pkarampu at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com

Description of problem:
1) ec_manager_heal heals xattrs before it rebuilds data. If it sets
EC_XATTR_VERSION to the latest value on the bad copies and the data rebuild
then fails, the bad copies carry the latest version even though their data is
stale, so there is no way to detect that some copies are still not rebuilt.

We need to make healing re-entrant, so data rebuilding has to happen first.
Xattr healing should happen only after the data rebuild completes successfully.
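
A minimal sketch of the ordering proposed above, using an in-memory model
rather than the real ec xlator code. The struct, the function names and the
'fail' switch are all hypothetical; 'version' only stands in for the
EC_XATTR_VERSION xattr. The point it illustrates is that a copy whose rebuild
failed keeps its old version, so a later heal pass still finds it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory model of one brick's copy of a file. */
struct copy {
    uint64_t version;   /* stands in for the EC_XATTR_VERSION xattr */
    bool     data_good; /* true once the fragment has been rebuilt */
};

/* Hypothetical data rebuild; 'fail' simulates an error during rebuild. */
static int rebuild_data(struct copy *c, bool fail)
{
    if (fail)
        return -1;
    c->data_good = true;
    return 0;
}

/* Heal every stale copy: data first, version last.
 * Returns 0 if all stale copies were healed, -1 otherwise. */
static int heal_copies(struct copy *copies, size_t n, uint64_t latest,
                       bool fail_rebuild)
{
    assert(copies != NULL);
    int ret = 0;
    for (size_t i = 0; i < n; i++) {
        if (copies[i].version == latest)
            continue;               /* already up to date */
        if (rebuild_data(&copies[i], fail_rebuild) != 0) {
            ret = -1;               /* version is left stale on purpose */
            continue;
        }
        copies[i].version = latest; /* only after a successful rebuild */
    }
    return ret;
}

/* Count the copies a later heal pass would still pick up. */
static size_t count_stale(const struct copy *copies, size_t n, uint64_t latest)
{
    size_t stale = 0;
    for (size_t i = 0; i < n; i++)
        if (copies[i].version != latest)
            stale++;
    return stale;
}
```

Calling heal_copies() again after a failed pass is safe in this model, because
the stale version is the only state a retry depends on.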

2) For the solution suggested in '1)' we also need to get the locking right,
i.e. we do not want multiple self-heals to happen in parallel.

3) EC can already do 'name' healing, i.e. given a filename it heals it
correctly, either deleting or creating it depending on the 'version' of the
parent directory. The remaining code needed for full directory healing is to
readdir on all the subvolumes of ec one by one and initiate this name-heal for
each entry. This is going to be a very big and time-consuming operation for an
ec xlator with a lot of subvolumes, e.g. 6 (4+2), 11 (8+3) or 12 (8+4). For
this release, this is the best we can do.
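
The directory-heal loop described in '3)' can be sketched as below. This is a
naive model with hypothetical types: each subvolume's readdir result is just an
array of names, and the name-heal is a callback. It also shows why the
operation is expensive: a name present on all subvolumes is name-healed once
per subvolume, so a directory with E entries on a 4+2 volume triggers roughly
6 * E name-heals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of readdir on one subvolume. */
struct subvol_listing {
    const char *const *names;
    size_t             count;
};

/* Hypothetical callback standing in for ec's existing name-heal. */
typedef void (*name_heal_fn)(const char *name, void *ctx);

/* Walk the subvolumes one by one and name-heal every entry seen.
 * Returns the number of name-heals started. */
static size_t directory_heal(const struct subvol_listing *subvols, size_t n,
                             name_heal_fn heal, void *ctx)
{
    assert(heal != NULL);
    size_t started = 0;
    for (size_t s = 0; s < n; s++) {
        for (size_t e = 0; e < subvols[s].count; e++) {
            heal(subvols[s].names[e], ctx);
            started++;
        }
    }
    return started;
}

/* Example callback: only counts how often it was invoked. */
static void count_heal(const char *name, void *ctx)
{
    (void)name;
    (*(size_t *)ctx)++;
}
```

Skipping names that were already healed while walking an earlier subvolume
would cut the repeated work, but that is not part of what is described here.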

I am planning to break the existing ec_manager_heal into 4 different heals,
i.e. 1) name heal 2) data heal 3) metadata heal 4) directory heal, to achieve
the things outlined above.

