[Gluster-users] complete f......p thanks to glusterfs...applause, you crashed weeks of work

Tue Sep 2 03:49:46 UTC 2014

until a few days ago my opinion about glusterfs was " working but stable", now i would just call it a versatile data and time blackhole.

Though i don't even feel like the devs read the gluster-users list, i suggest you pull the plug on the project and just do it like truecrypt ( big disclaimer: this software is insecure, use another product, NO PRODUCTION USE).

It started with the usual issues: not syncing (3.2), shd fails (3.2-3.3), peer doesn't reconnect (3.3), ssl keys have to be a fixed 2048-bit size and all keys have to be everywhere (all versions.... which noob programmed that ??), only the control connection is encrypted, etc. etc. i kept calm, resynced, recreated, already gave up.. VERY OFTEN..

At a certain point it also used tons of diskspace because files in the ".glusterfs" directory were not being deleted (while it was still connected, up and serving volumes).
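For anyone hitting the same diskspace leak, here is a minimal sketch for finding candidates. It rests on an assumption that is not stated in this post: that for regular files the entries under ".glusterfs" on a brick are hard links to the real file, so an entry with a link count of 1 has lost its counterpart. The name find_orphans is made up for this example, and glusterfs also keeps internal files of its own in that directory, so treat the output as a list to inspect, not a list to delete.

```python
import os
import stat
import tempfile


def find_orphans(glusterfs_dir):
    """Yield (path, size) for regular files whose hard-link count is 1.

    Assumption: a healthy entry is hard-linked to the user-visible file,
    so its link count is at least 2.
    """
    for root, _dirs, files in os.walk(glusterfs_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable, skip it
            if stat.S_ISREG(st.st_mode) and st.st_nlink == 1:
                yield path, st.st_size


if __name__ == "__main__":
    import sys

    # Usage: python find_orphans.py /path/to/brick/.glusterfs
    total = 0
    for path, size in find_orphans(sys.argv[1]):
        total += size
        print(size, path)
    print("candidate bytes:", total)
```

Run it against the ".glusterfs" directory of one brick at a time, and compare the reported total with the space you think is missing.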

IT WAS A LONG AND PAINFUL SYNCING PROCESS until i thought i was happy ;)

But now the master-fail happened:
(and i already know you can't pop out a simple solution, but yeah come, write your mess.. i'll describe it for you)

While resizing lvm/XFS online underneath glusterfs (i watch the logs nearly all the time) i discovered "mismatching disk layouts" messages, and also realized that server1 was up and happy when you mount from it, but server2 spewed input/output errors on several directories (for now just in that volume).

i tried to rename one directory, and it created a recursive loop inside XFS (i.e. BIGGEST FILE-SYSTEM FAIL: TWO INODES linking to one dir, ideally containing another).
i got at least the XFS loop solved.

Then the second-to-last resort came up.. i deleted the volumes, cleaned all xattrs on that ~2T ... and recreated the volumes, since shd seems to work somehow since 3.4.
guess what happened ?? i/o errors on server2, on and on. before, i could mount on server1 from server2 without i/o errors.. not now..

Really, i would like to love this project, but right now i'm in the mood for a killswitch (for the whole project). the aim is good, the way glusterfs tries to achieve it is just poor.. tons of senseless logs, really, even your worst *insertBigCorp* DB server will spit out fewer logs. glusterfs in the default setting is just eating your diskspace with logs, there is no option to rate-limit, every time you start a volume it logs the volume config... sometimes i feel like git would be the way to go, not only for the logs (git-annex ;) ).

now i realized through "ls -R 1>/dev/null" that this happened on ALL volumes in the cluster, a known problem: "can't stat folders".
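The "ls -R 1>/dev/null" check throws away the listing and leaves only the errors on the terminal. A rough equivalent that collects the failing paths into a list, so they can be compared between a mount from server1 and a mount from server2, could look like the sketch below. The function name find_unreadable is made up for this example, and it only reports what the mounted filesystem returns; it knows nothing about glusterfs internals.

```python
import os
import tempfile


def find_unreadable(top):
    """Return a sorted list of (path, error text) for entries that fail.

    A directory that cannot be listed is reported through os.walk's
    onerror hook; an entry that is listed but cannot be stat'd is
    reported by the explicit lstat call.
    """
    failures = []

    def on_error(err):
        # os.walk passes the OSError raised while listing a directory
        failures.append((err.filename, err.strerror))

    for root, dirs, files in os.walk(top, onerror=on_error):
        for name in dirs + files:
            path = os.path.join(root, name)
            try:
                os.lstat(path)
            except OSError as err:
                failures.append((path, err.strerror))
    return sorted(failures)


if __name__ == "__main__":
    import sys

    # Usage: python find_unreadable.py /mnt/volume
    for path, reason in find_unreadable(sys.argv[1]):
        print(reason, path, sep="\t")
```

Running it once per mount point and diffing the two outputs shows which directories fail only when mounted from one server.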

Maybe someone has a suggestion, other than "create a new clean volume and move all your TBs".

Regards