[Bugs] [Bug 1336098] heal info command takes tens of minutes when in split-brain situation.

Mon Jun 27 12:19:14 UTC 2016

https://bugzilla.redhat.com/show_bug.cgi?id=1336098

René Pavlík <skyrat at email.cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(skyrat at email.cz)  |

--- Comment #2 from René Pavlík <skyrat at email.cz> ---
Hi, Pranith,

not exactly the steps, but I can give you a description of our setup and the
triggers of that split-brain. The main point of this bug report is to have a
reliable tool that always returns something for detecting a split-brain, for
example from an external monitoring system that invokes the command every
minute or so. I have seen the reported behavior every time our setup had
connection issues and a gfid split-brain occurred. How long the command takes
depends on the extent of the damage.
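
As a rough illustration of the monitoring use case described above, here is a
minimal Python sketch of a time-bounded check. The function name
run_heal_info, the 50-second budget and the status strings are assumptions
made for this example and are not part of gluster. The output is returned
unparsed because its format can differ between versions.

```python
import subprocess


def run_heal_info(volume, timeout_s=50.0, runner=subprocess.run):
    """Run 'gluster volume heal <volume> info split-brain' with a time limit.

    Returns a (status, text) tuple. The status values 'completed',
    'timeout' and 'error' are names chosen for this sketch only.
    """
    cmd = ["gluster", "volume", "heal", volume, "info", "split-brain"]
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # when the command does not finish within timeout_s seconds.
        proc = runner(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    except OSError as exc:
        # For example, the gluster binary is not installed on this host.
        return ("error", str(exc))
    if proc.returncode != 0:
        return ("error", proc.stderr)
    return ("completed", proc.stdout)
```

A monitoring check could treat "timeout" as its own alert state, so that the
check always reports something within its interval even when the command
itself does not return in time.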

Our setup:
- 3 replicated nodes with client quorum
- 15 servers having the cluster mounted locally, rsyncing their data to the
glusterfs, to their own directories (sharing the same, common parent dir)
- the files are only ever appended to, or new files are added; nothing is
deleted
- when there is a connection issue, the gfid split-brain occurs: on each brick,
the latest data file has a different size and gfid, or it is missing entirely
on some bricks.
- the total number of files in real split-brain is about 50
- sometimes the containing directory also has this issue
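
The per-brick symptom in the list above (different size and gfid, or the file
missing on some bricks) can be sketched as a small comparison. This is only an
illustration: the function classify_file and its input layout are
hypothetical, and collecting the values from each brick (for example from the
trusted.gfid extended attribute) is not shown.

```python
def classify_file(observations):
    """Compare one file's state across bricks.

    observations maps a brick name to a (gfid, size) tuple, or to None
    when the file is missing on that brick. This layout is an
    assumption made for this sketch.
    """
    present = {b: o for b, o in observations.items() if o is not None}
    missing = sorted(b for b, o in observations.items() if o is None)
    gfids = {gfid for gfid, _size in present.values()}
    sizes = {size for _gfid, size in present.values()}
    return {
        "gfid_mismatch": len(gfids) > 1,   # bricks disagree on the gfid
        "size_mismatch": len(sizes) > 1,   # bricks disagree on the size
        "missing_on": missing,             # bricks without the file
    }


# Hypothetical example: two bricks disagree, the third lacks the file.
state = classify_file({
    "node1:/brick": ("aaaa", 1024),
    "node2:/brick": ("bbbb", 2048),
    "node3:/brick": None,
})
```

A size mismatch alone may just mean a heal is still pending; in this sketch it
is the differing gfid that corresponds to the gfid split-brain described above.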

In such a situation we would like to detect the split-brain, but the reported
issue occurs.

I'm sorry that I cannot give you an exact scenario in which you would directly
see the issue. I hope this helps.

If you need additional info, please ask.

Thanks.

Rene
