[Gluster-devel] Automated split-brain resolution

Fri Aug 8 18:48:44 UTC 2014

>
>
> While we could extend the existing heal command, we also need to provide a
> policy flag. Entering "y/n" for 1000 files does not make the process any
> easier.
>

What I meant was not a solution, just a suggestion; of course that
needs improvement too. Look at e2fsck output when it fixes corruption
issues, for example.

> I don't follow this part completely. If `info split-brain` gives you the
> gfid instead of file path, you could just go to the .glusterfs/<gfid
> hardlink> and do a setfattr there.
>

It isn't just about setfattr; one needs to validate which file the
gfid points to for it to make any sense. Are you saying that you know
the contents of a file just by looking at its canonical gfid form?
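
To illustrate why a bare gfid is not enough, here is a minimal Python
sketch of the validation step. It assumes the usual brick layout
`.glusterfs/<first two hex chars>/<next two>/<gfid>` and that the entry
is a hardlink to a regular file; the function names are hypothetical
and not part of any Gluster tool.

```python
import os
import uuid
from typing import List


def gfid_backend_path(brick_root: str, gfid: str) -> str:
    """Return the .glusterfs entry for a gfid on one brick.

    Assumption: layout is .glusterfs/<hex 0-1>/<hex 2-3>/<gfid>.
    """
    canonical = str(uuid.UUID(gfid))  # validates and normalises the gfid
    return os.path.join(
        brick_root, ".glusterfs", canonical[:2], canonical[2:4], canonical
    )


def named_paths_for(brick_root: str, gfid: str) -> List[str]:
    """List user-visible paths on the brick that share the gfid's inode."""
    target = os.stat(gfid_backend_path(brick_root, gfid))
    matches = []
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Do not descend into the internal .glusterfs tree.
        dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                matches.append(path)
    return matches
```

The full walk of the brick is the expensive part: the path only comes
out after every file on the brick has been visited.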

> command for each entry in the file. Also makes it easy to integrate with a
> GUI: Click 'get files in sb' and you have a scroll-down list of files with
> policies against each file. Select a file, tick the policy and click
> 'resolve-sb' and done!
>

I agree with the policy style, but the inherent problem is never fixed:
you are still asking someone to write scripts around "info split-brain".

Here is the breakdown of how it happens today:

- grep /var/log/glusterfs/glustershd.log | awk (get gfids)
- Run the script "(gfid-to-file.sh)" to see which files are really in
  split brain - Thanks Joe Julian!
  Do this on all servers and grab the output.

  On a large enough cluster, for example a 250TB volume with 60 million
  files, this takes 4hrs, assuming that we didn't get more split brain
  in between.
- Next, gather 'getfattr/setfattr' output.
- Figure out which files are to be deleted, then delete them.

This whole cycle is a 2-3 day activity on bigger clusters.
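
As a rough illustration of the first step, here is a minimal Python
sketch that pulls gfid-shaped strings out of log lines. It assumes
gfids appear in the log in canonical 8-4-4-4-12 hex form; the real log
format is not shown here, and the helper name is hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: gfids are written in canonical UUID form in the log.
GFID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"
)


def unique_gfids(lines: Iterable[str]) -> List[str]:
    """Return gfid-shaped strings in first-seen order, without duplicates."""
    seen = set()
    ordered = []
    for line in lines:
        for match in GFID_RE.findall(line):
            gfid = match.lower()
            if gfid not in seen:
                seen.add(gfid)
                ordered.append(gfid)
    return ordered
```

Passing it the open file object for /var/log/glusterfs/glustershd.log
would replace the grep/awk step. Note that it matches every UUID-shaped
string, so lines unrelated to split brain would need filtering first.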

With your approach, after having a policy:

- grep /var/log/glusterfs/glustershd.log | awk (get gfids)
- Run the script "(gfid-to-file.sh)" to see which files are really in
  split brain - Thanks Joe Julian!
  Do this on all servers and grab the output.

  On a large enough cluster, for example a 250TB volume with 60 million
  files, this takes 4hrs, assuming that we didn't get more split brain
  in between.
- Figure out which files are to be deleted and provide a policy based
  on source-brick or bigger-file. (In fact, this seems like just a
  replacement for `rm -rf`.)

Now, what is ideal:

- Figure out which file is to be deleted based on a policy (name your
  policy)
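
A minimal Python sketch of what "name your policy" could look like if
the system chose the source itself. The `Copy` record, the function
names and the refusal to guess when sizes tie are all assumptions made
for illustration; only the policy names bigger-file and source-brick
come from the proposal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Copy:
    """One replica's copy of a file (illustrative fields only)."""
    brick: str
    size: int


def bigger_file(copies: List[Copy]) -> Copy:
    """Pick the largest copy; refuse to guess when the top sizes tie."""
    if not copies:
        raise ValueError("no copies to choose from")
    ranked = sorted(copies, key=lambda c: c.size, reverse=True)
    if len(ranked) > 1 and ranked[0].size == ranked[1].size:
        raise ValueError("copies are the same size; bigger-file cannot decide")
    return ranked[0]


def source_brick(name: str) -> Callable[[List[Copy]], Copy]:
    """Build a policy that always trusts the copy on the named brick."""
    def choose(copies: List[Copy]) -> Copy:
        for copy in copies:
            if copy.brick == name:
                return copy
        raise ValueError(f"no copy on brick {name}")
    return choose


def resolve(files: Dict[str, List[Copy]],
            policy: Callable[[List[Copy]], Copy]) -> Dict[str, str]:
    """Map each file to the brick whose copy should be the heal source."""
    return {path: policy(copies).brick for path, copies in files.items()}
```

With this shape the operator supplies one policy for the whole volume
and never has to collect gfids or answer per-file prompts.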

A 250TB cluster is simply a POC cluster in the case of GlusterFS, not
production, so think of scales orders of magnitude higher when there
is a problem.

Questions that arise here:

- Why does one write a script at all, when we ought to be responsible
  for this information and even for providing valid suggestions?
- If you are saying that 'info split-brain' should print gfids, what
  purpose does it serve anyway? I would even get rid of 'info
  split-brain' - why would anyone need to see which files are in split
  brain when all we are printing is the 'gfid'?
- The trust is on us when users copy their data into GlusterFS, and we
  are solely responsible for it. If we cannot make valid decisions
  about the files which we are supposed to manage, how do you expect a
  normal user to make better decisions than us?

Here is an example we came across: there was a suggestion I made to
Pranithk, based on Avati's idea, that even a file in metadata split
brain can be made readable, which is not the case today. This came out
of the fact that there are some important details which we, as a
system, know in full and which are not available to the user.

Since this has been a perpetual misery for years, I would like to
see this fixed in a more convincing manner.

Excuse me for being blunt about it!

> So we now have the command:
> # gluster volume heal <VOLNAME> [full | info [split-brain] | split-brain
> {bigger-file  |  source-brick <brick_name>} [<file>] ]
>
> The relevant new extension being:
> gluster volume heal  <VOLNAME> split-brain {bigger-file  | source-brick
> <brick_name>} [<file>]
>

This looks good.
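
For anyone scripting against the proposed syntax, here is a minimal
Python sketch that builds the argument list for one invocation. It
follows the quoted syntax only and runs nothing; the function and
parameter names are hypothetical.

```python
from typing import List, Optional


def heal_command(volname: str, policy: str,
                 brick: Optional[str] = None,
                 file: Optional[str] = None) -> List[str]:
    """Build argv for the proposed split-brain heal syntax (not executed)."""
    argv = ["gluster", "volume", "heal", volname, "split-brain"]
    if policy == "bigger-file":
        argv.append("bigger-file")
    elif policy == "source-brick":
        if not brick:
            raise ValueError("source-brick needs a brick name")
        argv += ["source-brick", brick]
    else:
        raise ValueError(f"unknown policy: {policy}")
    if file is not None:
        # <file> is optional in the proposed syntax.
        argv.append(file)
    return argv
```

Leaving out the file argument mirrors the optional `[<file>]` in the
proposal.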

-- 
Religious confuse piety with mere ritual, the virtuous confuse
regulation with outcomes