[Gluster-users] complete f......p thanks to glusterfs...applause, you crashed weeks of work

Jeff Darcy jdarcy at redhat.com
Tue Sep 2 14:13:09 UTC 2014


> ssl keys have to be 2048-bit fixed size

No, they don't.

> all keys have to be everywhere (all versions... which noob programmed
> that??)

That noob would be me.

It's not necessary to have the same key on all servers, but using
different ones would be even more complex and confusing for users.
Instead, the servers authenticate to one another using a single
identity.  According to SSL 101, anyone authenticating as an identity
needs the key for that identity, because it's really the key - not the
publicly readable cert - that guarantees authenticity.

If you want to set up a separate key+cert for each server, each one
having a "CA" file for the others, you certainly can and it works.
However, you'll still have to deal with distributing those new certs.
That's inherent to how SSL works.  Instead of forcing a particular PKI
or cert-distribution scheme on users, the GlusterFS SSL implementation
is specifically intended to let users make those choices.
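
For example, a minimal self-signed setup along these lines might look
like the following.  The paths are the defaults GlusterFS checks; the
CN and the 2048-bit size are just placeholders, not requirements:

  # Generate a private key (2048 bits shown, but other sizes work too)
  openssl genrsa -out /etc/ssl/glusterfs.key 2048

  # Create a self-signed cert for the shared identity (CN is illustrative)
  openssl req -new -x509 -key /etc/ssl/glusterfs.key \
      -subj "/CN=gluster" -out /etc/ssl/glusterfs.pem

  # The CA file is just a concatenation of whatever certs you trust
  cp /etc/ssl/glusterfs.pem /etc/ssl/glusterfs.ca

Repeat on each node (or distribute per-node certs into each node's
glusterfs.ca) depending on which identity model you prefer.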

> only control connection is encrypted

That's not true.  There are *separate* options to control encryption
for the data path, and in fact that code's much older.  Why separate?
Because the data-path usage of SSL is based on a different identity
model - probably more what you expected, with a separate identity per
client instead of a shared one between servers.
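
For instance, data-path encryption is enabled per volume with something
like this (the volume name and CN list are just examples):

  gluster volume set myvol server.ssl on
  gluster volume set myvol client.ssl on

  # optionally restrict which client cert CNs may connect
  gluster volume set myvol auth.ssl-allow 'client1,client2'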

> At a certain point it also used tons of disk space due to not deleting
> files in the ".glusterfs" directory (but still being connected and
> up serving volumes)

For a long time, the only internal conditions that might have caused
the .glusterfs links not to be cleaned up were about 1000x less common
than similar problems which arise when users try to manipulate files
directly on the bricks.  Perhaps if you could describe what you were
doing on the bricks, we could help identify what was going on and
suggest safer ways of achieving the same goals.
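
If you want to check for orphans yourself, one rough heuristic is the
following (the brick path is illustrative, and GlusterFS's own
housekeeping files under indices/ will show up as noise):

  # GFID entries for regular files are extra hard links to the real
  # files, so a link count of 1 usually indicates an orphaned entry
  find /bricks/brick1/.glusterfs -type f -links 1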

> IT WAS A LONG AND PAINFUL SYNCING PROCESS until i thought i was happy
> ;)

Syncing what?  I'm guessing a bit here, but it sounds like you were
trying to do the equivalent of a replace-brick (or perhaps rebalance) by
hand.  As you've clearly discovered, such attempts are fraught with
peril.  Again, with some more constructive engagement perhaps we can
help guide you toward safer solutions.
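
For the record, the supported equivalents look something like this
(server and brick names are placeholders):

  # migrate from an old brick to a new one
  gluster volume replace-brick myvol \
      server1:/bricks/old server2:/bricks/new commit force

  # or redistribute data after adding bricks
  gluster volume rebalance myvol start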

> Due to an online resizing of lvm/XFS under glusterfs (i watch the logs
> nearly all the time) i discovered "mismatching disk layouts", realizing
> also that
>
> server1 was up and happy when you mount from it, but server2 spewed
> input/output errors on several directories (for now just in that
> volume),

The "mismatching layout" messages are usually the result of extended
attributes that are missing from one brick's copy of a directory.  It's
possible that the XFS resize code is racy, in the sense that extended
attributes become unavailable at some stage even though the directory
itself is still accessible.  I suggest that you follow up on that bug
with the XFS developers, who are sure to be much more polite and
responsive than we are.
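
If you want to check, compare the layout xattrs for the same directory
on each brick (the brick path here is illustrative):

  # dump all trusted.* xattrs, including trusted.glusterfs.dht
  getfattr -m . -d -e hex /bricks/brick1/some/dir

A brick where trusted.glusterfs.dht is missing, or differs from its
peers, is the usual source of those messages.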

> i tried to rename one directory, it created a recursive loop inside
> XFS (e.g. BIGGEST FILE-SYSTEM FAIL: TWO INODES linking to one dir,
> ideally containing another). i got at least the XFS loop solved.

Another one for the XFS developers.

> Then the next-to-last resort option came up... deleted the volumes,
> cleaned all xattrs on that ~2T ... and recreated the volumes, since
> shd seems to work somehow since 3.4

You mention that you cleared all xattrs.  Did you also clear out
.glusterfs?  In general, using anything but a completely empty directory
tree as a brick can be a bit problematic.
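
For reference, returning a brick directory to a truly clean state
usually means something like this (the path is illustrative):

  # remove the markers that make glusterd reject a reused brick
  setfattr -x trusted.glusterfs.volume-id /bricks/brick1
  setfattr -x trusted.gfid /bricks/brick1

  # remove the internal GFID link tree
  rm -rf /bricks/brick1/.glusterfs

plus clearing any leftover trusted.* xattrs on the files underneath.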

> Maybe anyone has a suggestion , except "create a new clean volume and
> move all your TB's" .

More suggestions might have been available if you had sought them
earlier.  At this point, none of us can tell what state your volume is
in, and there are many indications that it's probably a state none of us
have ever seen or anticipated.  As you've found, attempting random
fixes in such a situation often makes things worse.  It would be
irresponsible for us to suggest that you go down even more unknown and
untried paths.  Our first priority should be to get things back to a
known and stable state.  Unfortunately, the only such state at this
point would seem to be a clean volume.

