[Bugs] [Bug 1376757] Data corruption in write ordering of rebalance and application writes

Fri Sep 23 10:50:39 UTC 2016

https://bugzilla.redhat.com/show_bug.cgi?id=1376757

--- Comment #1 from Raghavendra G <rgowdapp at redhat.com> ---
Currently the rebalance process does:

1. read (src)
2. write (dst)

To make sure that src and dst are identical, we need to make the combined
transaction of 1 and 2 atomic. Otherwise, with parallel writes happening to the
same region during rebalance, writes on dst can go out of order (relative to
src) and dst can end up different from src, which is data corruption.

Consider the following sequence of events happening on an overlapping/same
region of a file:

1. The rebalance process reads a region (let's say one that was written by an
earlier write w1).
2. The application does a write (w2) to the same region. w2 completes on src
and dst.
3. The rebalance process proceeds to write to dst the data it read in 1. So, w1
is sent to dst.

After the above steps, the order of w1 and w2 on src is (w1, w2) but on dst it
is (w2, w1). Hence w2 is lost on dst, resulting in corruption (as w2 was
reported as successful to the application).
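
The three steps above can be replayed deterministically in a small sketch.
This is only an illustration, not GlusterFS code: src and dst are hypothetical
in-memory buffers standing in for the file region on the source and
destination, and the function name replay_unlocked_race is made up for this
example.

```python
def replay_unlocked_race():
    # Hypothetical stand-ins for the same file region on src and dst.
    src = bytearray(b"w1w1")   # region previously written by w1
    dst = bytearray(4)         # region not yet migrated

    # Step 1: rebalance reads the region from src (it sees w1).
    buf = bytes(src)

    # Step 2: application write w2 completes on src and on dst.
    src[:] = b"w2w2"
    dst[:] = b"w2w2"

    # Step 3: rebalance writes the stale data from step 1 to dst.
    dst[:] = buf

    return bytes(src), bytes(dst)


if __name__ == "__main__":
    src, dst = replay_unlocked_race()
    print("src:", src, "dst:", dst, "identical:", src == dst)
```

src ends with the data of w2 while dst ends with the data of w1, which is the
(w1, w2) versus (w2, w1) ordering described above.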

To make it atomic, we need to:
* lock (src) the region of the file being read before 1
* unlock (src) the region of the file being read after 2

and make sure that this lock blocks new writes from the application (until an
unlock is issued). Combine this with the approach that application writes are
serially written to src first and then to dst, and we have a solution.
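
A minimal sketch of this locking scheme, under stated assumptions: a
threading.Lock stands in for the region lock on src, MigratingRegion and its
methods are hypothetical names, and the after_read hook exists only to force
an application write to arrive between the read and the write of rebalance.
For simplicity the sketch holds the lock across the whole application write;
the real locking granularity is not specified here.

```python
import threading


class MigratingRegion:
    """Hypothetical model of one file region during rebalance."""

    def __init__(self, data):
        self.src = bytearray(data)
        self.dst = bytearray(len(data))
        self.lock = threading.Lock()  # stands in for the region lock on src

    def app_write(self, data):
        # Application writes block while rebalance holds the lock, then
        # go to src first and to dst second.
        with self.lock:
            self.src[:] = data
            self.dst[:] = data

    def rebalance_copy(self, after_read=None):
        # lock (src) before the read, unlock (src) after the write to dst.
        with self.lock:
            buf = bytes(self.src)
            if after_read is not None:
                after_read()  # test hook: runs while the lock is held
            self.dst[:] = buf


def demo():
    region = MigratingRegion(b"w1w1")
    writer = threading.Thread(target=region.app_write, args=(b"w2w2",))
    blocked = []

    def start_writer():
        # Start w2 in the middle of the rebalance transaction and check
        # that it is still waiting on the lock shortly afterwards.
        writer.start()
        writer.join(timeout=0.2)
        blocked.append(writer.is_alive())

    region.rebalance_copy(after_read=start_writer)
    writer.join()
    return bytes(region.src), bytes(region.dst), blocked[0]


if __name__ == "__main__":
    print(demo())
```

In this sketch w2 cannot slip in between the read and the write of rebalance,
so it is applied after w1 on both src and dst and the two copies stay
identical.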
