[Gluster-devel] Review request: Data corruption in write ordering of rebalance and application writes

Sun Nov 6 05:23:26 UTC 2016

Hi all,

Requesting review of [1].

Bug: Lack of atomicity between read-src and write-dst of the rebalance process [2]

Description & proposed solution:
Currently, the rebalance process does the following:
1. read (src)
2. write (dst)
To make sure that src and dst are identical, we need to make steps 1 and 2 atomic. Otherwise, when parallel application writes hit the same region during rebalance, writes on dst can go out of order (relative to src), leaving dst different from src, which is effectively data corruption [2]. To make the two steps atomic, we need to:
* lock (src) the region of the file being read, before step 1
* unlock (src) the region of the file being read, after step 2
and make sure that this lock blocks new writes from the application (until an unlock is issued). Combining this with the approach in which application writes are applied serially, first to src and then to dst, gives us the solution.
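
To illustrate the ordering described above, here is a minimal Python sketch. It is only a toy model and not the code in [1]: the class Migration, the constant CHUNK and the single threading.Lock standing in for the region lock on src are all assumptions made for illustration. In particular, the sketch locks the whole file rather than a byte range, and it models "the lock blocks application writes" by having the writer take the same lock.

```python
import threading

CHUNK = 4  # hypothetical migration chunk size in bytes (assumption)


class Migration:
    """Toy model of one file being migrated from src to dst."""

    def __init__(self, initial: bytes):
        self.src = bytearray(initial)
        self.dst = bytearray(len(initial))
        # Stand-in for the lock on the src region being migrated.
        # Simplification: one lock for the whole file, not a byte range.
        self.region_lock = threading.Lock()

    def rebalance(self):
        for off in range(0, len(self.src), CHUNK):
            with self.region_lock:                      # lock (src) before step 1
                buf = bytes(self.src[off:off + CHUNK])  # 1. read (src)
                self.dst[off:off + len(buf)] = buf      # 2. write (dst)
            # lock released here: unlock (src) after step 2

    def app_write(self, off: int, payload: bytes):
        end = off + len(payload)  # caller keeps writes within the file size
        with self.region_lock:    # blocked while rebalance holds the lock
            self.src[off:end] = payload  # application write goes to src first
            self.dst[off:end] = payload  # and then to dst


def demo() -> Migration:
    m = Migration(b"a" * 64)
    writers = [
        threading.Thread(target=m.app_write, args=(i * 8, b"W" * 8))
        for i in range(8)
    ]
    mover = threading.Thread(target=m.rebalance)
    threads = [mover, *writers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return m


if __name__ == "__main__":
    m = demo()
    print("src == dst:", m.src == m.dst)
```

Without the lock, the rebalance thread could read old data from src, an application write could then update both src and dst, and the rebalance thread could afterwards overwrite dst with the stale buffer. Holding the lock across both steps rules out that interleaving in this model.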

[1] http://review.gluster.org/#/c/15698/
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1376757

Thanks & Regards,
Karthik