[Bugs] [Bug 1648205] New: Thin-arbiter: Have the state of volume in memory and use it for shd

bugzilla at redhat.com bugzilla at redhat.com
Fri Nov 9 06:38:53 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1648205

            Bug ID: 1648205
           Summary: Thin-arbiter: Have the state of volume in memory and
                    use it for shd
           Product: GlusterFS
           Version: 5
         Component: replicate
          Keywords: Reopened, Triaged
          Severity: high
          Priority: high
          Assignee: bugs at gluster.org
          Reporter: aspandey at redhat.com
                CC: aspandey at redhat.com, bugs at gluster.org,
                    ksubrahm at redhat.com, pasik at iki.fi,
                    ravishankar at redhat.com
        Depends On: 1579788



+++ This bug was initially created as a clone of Bug #1579788 +++

Description of problem:
In the current thin-arbiter implementation we do not keep the state of the
volume in memory. This forces us to send a request to the thin-arbiter node in
every failure scenario, which slows down the transaction.

Keep the state of which brick is good and which is bad in all the clients, so
that we need not contact the thin-arbiter brick on every failure. Contact the
thin-arbiter only when we don't have the state in memory, i.e. if it is the
first failure on the client or if it is a failure after SHD heals and resets
the in-memory copy.
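
To illustrate the idea, here is a minimal standalone sketch (illustrative C
only; the enum, struct and helpers below are made up for this mail and are not
the actual AFR code): cache the bad-brick state in client memory and fall back
to a thin-arbiter query only when that state is unknown.

/* Hypothetical sketch: cache which brick is bad in client memory so the
 * thin-arbiter is only queried when the cached state is unknown, i.e. on
 * the first failure or after SHD heals and resets the state. */
#include <stdio.h>

typedef enum {
    TA_STATE_UNKNOWN = 0,   /* no failure seen yet, or SHD reset the state */
    TA_STATE_BRICK0_BAD,
    TA_STATE_BRICK1_BAD,
} ta_mem_state_t;

/* Stand-in for a query to the thin-arbiter brick over the network. */
static ta_mem_state_t
query_thin_arbiter (void)
{
    printf ("querying thin-arbiter brick...\n");
    return TA_STATE_BRICK1_BAD;   /* pretend the TA says brick 1 is bad */
}

/* Called on a write-txn failure: use the cached state when we have one,
 * otherwise fall back to the (slow) TA query and cache the answer. */
static ta_mem_state_t
handle_write_failure (ta_mem_state_t *cached)
{
    if (*cached == TA_STATE_UNKNOWN)
        *cached = query_thin_arbiter ();
    else
        printf ("using in-memory state, no TA round-trip\n");
    return *cached;
}

int
main (void)
{
    ta_mem_state_t state = TA_STATE_UNKNOWN;

    handle_write_failure (&state);   /* first failure: hits the TA */
    handle_write_failure (&state);   /* later failures: served from memory */
    return 0;
}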

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Worker Ant on 2018-05-28 09:06:38 EDT ---

REVIEW: https://review.gluster.org/20095 (afr: thin-arbiter 2 domain locking
and in-memory state) posted (#1) for review on master by Ravishankar N

--- Additional comment from Worker Ant on 2018-05-30 08:09:30 EDT ---

REVIEW: https://review.gluster.org/20103 (cluster/afr: Use 2 domain locking in
SHD for thin-arbiter) posted (#1) for review on master by Karthik U S

--- Additional comment from Yaniv Kaul on 2018-07-11 10:26:23 EDT ---

1. This looks more like a feature than a bug fix? The keyword doesn't reflect
this.
2. Severity/Priority?
3. Do we have numbers before/after?

--- Additional comment from Ravishankar N on 2018-07-11 21:00:35 EDT ---

Initial bare-bones MVP-0 patches to get
https://github.com/gluster/glusterfs/issues/352 working were sent against the
github issue itself. Since then, the issue has been closed and we are creating
bugs against which to send fixes and the other MVP milestones (see the document
referenced in the github issue). The feature is 'on the way' to becoming
demo-worthy.

--- Additional comment from Worker Ant on 2018-08-16 08:07:57 EDT ---

REVIEW: https://review.gluster.org/20748 (afr: common thin-arbiter functions)
posted (#1) for review on master by Ravishankar N

--- Additional comment from Worker Ant on 2018-08-23 02:38:03 EDT ---

COMMIT: https://review.gluster.org/20748 committed in master by "Ravishankar N"
<ravishankar at redhat.com> with a commit message- afr: common thin-arbiter
functions

...that can be used by client and self-heal daemon, namely:

afr_ta_post_op_lock()
afr_ta_post_op_unlock()

Note: These are not yet consumed. They will be used in the write txn
changes patch which will introduce 2 domain locking.

updates: bz#1579788
Change-Id: I636d50f8fde00736665060e8f9ee4510d5f38795
Signed-off-by: Ravishankar N <ravishankar at redhat.com>
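
As a rough illustration of what such shared helpers buy us, here is a
standalone sketch (illustrative C only, not the real AFR API; a pthread mutex
stands in for the network lock taken on the thin-arbiter brick): both the
client post-op path and SHD funnel through one lock/unlock pair.

/* Hypothetical sketch: a common lock/unlock pair reused by the client
 * write path and by SHD. The real functions lock a domain on the TA brick
 * over the network; a local mutex stands in for that here. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t ta_post_op_dom = PTHREAD_MUTEX_INITIALIZER;

static int
ta_post_op_lock (void)
{
    return pthread_mutex_lock (&ta_post_op_dom);
}

static int
ta_post_op_unlock (void)
{
    return pthread_mutex_unlock (&ta_post_op_dom);
}

static void
client_post_op (void)
{
    ta_post_op_lock ();
    printf ("client: xattrop on TA marking the bad brick\n");
    ta_post_op_unlock ();
}

static void
shd_reset_ta (void)
{
    ta_post_op_lock ();
    printf ("shd: clear TA xattrs after a successful heal\n");
    ta_post_op_unlock ();
}

int
main (void)
{
    client_post_op ();
    shd_reset_ta ();
    return 0;
}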

--- Additional comment from Worker Ant on 2018-08-25 10:34:03 EDT ---

REVIEW: https://review.gluster.org/20994 (afr: thin-arbiter read txn changes)
posted (#1) for review on master by Ravishankar N

--- Additional comment from Worker Ant on 2018-08-31 05:57:52 EDT ---

REVIEW: https://review.gluster.org/21054 (afr: thin-arbiter read txn changes)
posted (#1) for review on master by Ravishankar N

--- Additional comment from Worker Ant on 2018-09-05 04:28:54 EDT ---

COMMIT: https://review.gluster.org/20994 committed in master by "Ravishankar N"
<ravishankar at redhat.com> with a commit message- afr: thin-arbiter read txn
changes

If both data bricks are up, read subvol will be based on read_subvols.

If only one data brick is up:
- First query the data-brick that is up. If it blames the other brick,
allow the reads.

- If it doesn't, query the TA to obtain the source of truth.

TODO: See if in-memory state can be maintained for read txns (BZ 1624358).

updates: bz#1579788
Change-Id: I61eec35592af3a1aaf9f90846d9a358b2e4b2fcc
Signed-off-by: Ravishankar N <ravishankar at redhat.com>
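
A compact standalone sketch of that read-path decision (illustrative C; the
brick_t struct and helpers are invented for this mail, not AFR's types):

/* Hypothetical sketch of the read-transaction decision described above. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool up;
    bool blames_other;   /* pending xattrs on this brick blame its peer */
} brick_t;

/* Stand-in for asking the thin-arbiter which brick holds good data. */
static int
ta_good_brick (void)
{
    printf ("querying TA for the source of truth\n");
    return 0;
}

/* Returns the brick index reads should be served from, or -1 on EIO. */
static int
choose_read_subvol (brick_t b[2])
{
    if (b[0].up && b[1].up)
        return 0;            /* both up: pick per read_subvols (simplified) */

    int up = b[0].up ? 0 : b[1].up ? 1 : -1;
    if (up < 0)
        return -1;           /* no data brick up */

    if (b[up].blames_other)
        return up;           /* the up brick blames its peer: it is good */

    return (ta_good_brick () == up) ? up : -1;
}

int
main (void)
{
    brick_t b[2] = { { .up = true, .blames_other = false },
                     { .up = false } };

    printf ("read subvol: %d\n", choose_read_subvol (b));
    return 0;
}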

--- Additional comment from Worker Ant on 2018-09-07 08:21:23 EDT ---

REVIEW: https://review.gluster.org/21120 (afr: thin-arbiter 2 domain locking
and in-memory state) posted (#1) for review on master by Ravishankar N

--- Additional comment from Worker Ant on 2018-09-20 05:19:01 EDT ---

COMMIT: https://review.gluster.org/20103 committed in master by "Ravishankar N"
<ravishankar at redhat.com> with a commit message- cluster/afr: Use 2 domain
locking in SHD for thin-arbiter

With this change, when SHD starts the index crawl it requests
all the clients to release the AFR_TA_DOM_NOTIFY lock, so that
the clients know their in-memory state is no longer valid and
any new operation needs to query the thin-arbiter if required.

When SHD completes healing all the files without any failure, it
again takes the AFR_TA_DOM_NOTIFY lock and gets the xattrs on the
TA to see whether any new failures happened in the meantime.
If new failures are marked on the TA, SHD starts the crawl
immediately to heal those failures as well. If there are no new
failures, SHD takes the AFR_TA_DOM_MODIFY lock and unsets the
xattrs on the TA, so that both the data bricks are considered
good thereafter.

Change-Id: I037b89a0823648f314580ba0716d877bd5ddb1f1
fixes: bz#1579788
Signed-off-by: karthik-us <ksubrahm at redhat.com>
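
The control flow above can be summarised with a standalone sketch
(illustrative C only; the lock/xattr helpers are stand-ins for network
operations on the TA brick, not the real locks/xattrop API):

/* Hypothetical sketch of the SHD heal cycle described in the commit. */
#include <stdbool.h>
#include <stdio.h>

static void take_notify_lock (void)    { printf ("take AFR_TA_DOM_NOTIFY lock\n"); }
static void ask_clients_release (void) { printf ("ask clients to drop AFR_TA_DOM_NOTIFY\n"); }
static void take_modify_lock (void)    { printf ("take AFR_TA_DOM_MODIFY lock\n"); }
static void release_locks (void)       { printf ("release TA locks\n"); }
static bool crawl_and_heal (void)      { printf ("index crawl + heal\n"); return true; }
static bool ta_has_new_failures (void) { return false; }  /* pretend TA xattrs are clean */
static void clear_ta_xattrs (void)     { printf ("unset TA xattrs: both bricks good\n"); }

static void
shd_heal_cycle (void)
{
    for (;;) {
        /* Clients drop their in-memory state and re-query the TA if needed. */
        ask_clients_release ();

        if (!crawl_and_heal ())
            return;                  /* heal failed; retry on the next cycle */

        take_notify_lock ();
        if (ta_has_new_failures ()) {
            /* New failures were marked on the TA while healing: crawl again. */
            release_locks ();
            continue;
        }
        take_modify_lock ();
        clear_ta_xattrs ();
        release_locks ();
        return;
    }
}

int
main (void)
{
    shd_heal_cycle ();
    return 0;
}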

--- Additional comment from Shyamsundar on 2018-10-23 11:09:12 EDT ---

This bug is getting closed because a release has been made available that
should address the reported issue. If the problem is still not fixed with
glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1]; packages for
several distributions should become available in the near future. Keep an eye
on the Gluster Users mailing list [2] and the update infrastructure for your
distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/

--- Additional comment from Ravishankar N on 2018-10-24 21:26:17 EDT ---

https://review.gluster.org/#/c/glusterfs/+/20095/ is yet to be merged. Moving
back to POST.

--- Additional comment from Worker Ant on 2018-10-25 08:26:55 EDT ---

COMMIT: https://review.gluster.org/20095 committed in master by "Ravishankar N"
<ravishankar at redhat.com> with a commit message- afr: thin-arbiter 2 domain
locking and in-memory state

2 domain locking + xattrop for write-txn failures:
--------------------------------------------------
- A post-op wound on TA takes AFR_TA_DOM_NOTIFY range lock and
AFR_TA_DOM_MODIFY full lock, does xattrop on TA and releases
AFR_TA_DOM_MODIFY lock and stores in-memory which brick is bad.

- All further write txn failures are handled based on this in-memory
value without querying the TA.

- When shd heals the files, it does so by requesting full lock on
AFR_TA_DOM_NOTIFY domain. Client uses this as a cue (via upcall),
releases AFR_TA_DOM_NOTIFY range lock and invalidates its in-memory
notion of which brick is bad. The next write txn failure is wound on TA
to again update the in-memory state.

- Any write txns that are still incomplete when the AFR_TA_DOM_NOTIFY upcall
release request is received are completed before the lock is released.

- Any write txns received after the release request are maintained in a ta_waitq.

- After the release is complete, the ta_waitq elements are spliced to a
separate queue which is then processed one by one.

- For fops that come in parallel when the in-memory bad brick is still
unknown, only one is wound to the TA on the wire. The others are maintained
in a ta_onwireq, which is processed after we get the response from the
TA.

Change-Id: I32c7b61a61776663601ab0040e2f0767eca1fd64
updates: bz#1579788
Signed-off-by: Ravishankar N <ravishankar at redhat.com>
Signed-off-by: Ashish Pandey <aspandey at redhat.com>
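
For the ta_onwireq part in particular, here is a single-threaded standalone
sketch (illustrative C; the names and types are made up for this mail): while
the bad-brick state is unknown and one query is already on the wire, other
failed fops are queued and drained when the TA reply arrives.

/* Hypothetical sketch of the "only one fop on the wire to the TA" idea. */
#include <stdio.h>

#define MAX_Q 16

typedef struct {
    int fop_ids[MAX_Q];
    int count;
    int on_wire;        /* a TA query is already in flight */
    int bad_brick;      /* -1 == unknown */
} ta_client_t;

static void
finish_fop (int fop_id, int bad_brick)
{
    printf ("fop %d completed using bad_brick=%d\n", fop_id, bad_brick);
}

static void
write_txn_failure (ta_client_t *c, int fop_id)
{
    if (c->bad_brick != -1) {             /* in-memory state is known */
        finish_fop (fop_id, c->bad_brick);
        return;
    }
    if (c->on_wire) {                     /* someone already asked the TA */
        c->fop_ids[c->count++] = fop_id;  /* park in the ta_onwireq */
        return;
    }
    c->on_wire = 1;
    printf ("fop %d wound to TA\n", fop_id);
}

/* Called when the TA reply arrives: cache the state, drain the queue. */
static void
ta_reply (ta_client_t *c, int bad_brick, int fop_id)
{
    c->bad_brick = bad_brick;
    c->on_wire = 0;
    finish_fop (fop_id, bad_brick);
    for (int i = 0; i < c->count; i++)
        finish_fop (c->fop_ids[i], bad_brick);
    c->count = 0;
}

int
main (void)
{
    ta_client_t c = { .bad_brick = -1 };

    write_txn_failure (&c, 1);    /* goes on the wire */
    write_txn_failure (&c, 2);    /* queued */
    write_txn_failure (&c, 3);    /* queued */
    ta_reply (&c, 1, 1);          /* TA says brick 1 is bad; queue drains */
    write_txn_failure (&c, 4);    /* served from in-memory state */
    return 0;
}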


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1579788
[Bug 1579788] Thin-arbiter: Have the state of volume in memory