[Bugs] [Bug 1379178] New: split brain on file recreate during "downed" brick.

bugzilla at redhat.com
Sun Sep 25 19:40:55 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1379178

            Bug ID: 1379178
           Summary: split brain on file recreate during "downed" brick.
           Product: GlusterFS
           Version: 3.7.14
         Component: selfheal
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: jaco at uls.co.za
                CC: bugs at gluster.org



Description of problem:

If a brick goes down and a file is removed and recreated while that brick is
offline, the two bricks end up with different GFIDs for the same path,
resulting in a split brain due to the mismatching GFID values.

In essence, let's say we have a file called some_file with a GFID of 112233.
The link count is 1 as seen on the mount (2 on the brick: some_file itself plus
the gfid hard link under .glusterfs).
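
For reference, the GFID can be checked directly on the bricks (not through the
mount) via the trusted.gfid extended attribute; the brick path below is just an
example:

    # on a brick, not on the FUSE mount
    getfattr -n trusted.gfid -e hex /data/brick-a/some_file
    # trusted.gfid=0x...  (the second link is the hard link under
    # /data/brick-a/.glusterfs/<aa>/<bb>/<full-gfid>)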

We have two bricks, A and B (replicated).

Brick B goes down.

On brick A some_file gets removed, and recreated (in the case I've seen with
courier-imap it's actually a rename of a different file into some_file).

When some_file is removed, its GFID is discarded too.  When the file is
recreated, a new GFID is allocated, say aabbcc.

Brick B comes back up.

At this point some_file exists on both Brick A and Brick B, but with differing
GFID values.  Since the directory contents were modified during B's outage, the
directory is marked for healing.  Due to the differing GFID values this heal
fails.
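
To confirm it is a GFID mismatch rather than a data/metadata split brain,
something along these lines can be used (volume name and brick paths are
examples, and the exact heal-info output varies by version):

    gluster volume heal VOLNAME info
    gluster volume heal VOLNAME info split-brain
    # comparing the GFIDs on the two bricks shows the mismatch directly
    getfattr -n trusted.gfid -e hex /data/brick-a/some_file
    getfattr -n trusted.gfid -e hex /data/brick-b/some_file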

Version-Release number of selected component (if applicable): 3.7.14

How reproducible:

Extremely.

Steps to Reproduce:
1.  Create a file when both bricks are up.
2.  Take one brick down.
3.  rm the file.
4.  Recreate the file.
5.  Bring the downed brick back up (see the shell sketch after this list).
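
Roughly, as a shell sketch (assuming a 1x2 replica volume VOLNAME with bricks
/data/brick-a and /data/brick-b, mounted at /mnt/gv; all names are examples):

    # 1. create a file while both bricks are up
    echo one > /mnt/gv/some_file
    # 2. take brick B down, e.g. by killing its brick process
    #    (PID visible in 'gluster volume status VOLNAME')
    kill <pid-of-brick-B-glusterfsd>
    # 3+4. remove and recreate the file through the mount
    rm /mnt/gv/some_file
    echo two > /mnt/gv/some_file
    # 5. bring the downed brick back up
    gluster volume start VOLNAME force
    # some_file now carries different trusted.gfid values on the two bricks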

Actual results:

Accessing the file through the mount gives I/O errors, and the heal of the
containing directory fails.

Expected results:

The downed brick should track the remove and recreate, so that when it returns
the recreated file (with its new GFID) heals cleanly from the brick that stayed
up.

Additional info:

http://jkroon.blogs.uls.co.za/filesystems/glusterfs-and-courier-imap - I've
tried to perform a thorough write-up and analysis - including possible
solutions.

FYI - I definitely have an outage coming next weekend (Friday evening we've got
another scheduled power outage on the other power rail this time).  I'm pretty
sure this won't be solved by then, but at this stage I'm probably better off
sticking to a single-server NFS solution and taking the beatings as they come.

If there is a way to tell brick A, when it returns on Saturday morning, to
completely discard its contents and heal in full from B (i.e. a full resync),
that is something I'll need to consider.
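
Roughly what I have in mind is the usual wipe-and-reheal approach, something
like the following (volume/brick names are examples, and I haven't verified
this against 3.7.14, so treat it as a sketch only):

    # on node A, with brick A's glusterfsd process stopped:
    rm -rf /data/brick-a/.glusterfs /data/brick-a/*
    # only the contents are removed; the brick root directory itself,
    # with its volume-id xattr, stays in place
    gluster volume start VOLNAME force
    gluster volume heal VOLNAME full
    gluster volume heal VOLNAME info     # watch progress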
