Sun Dec 17 13:40:52 UTC 2023

On 14/12/2023 16:08, Joe Julian wrote:

> With ceph, if the placement database is corrupted, all your data is lost 
> (happened to my employer, once, losing 5PB of customer data).

From what I've been told (by experts), it's really hard to make that
happen, even more so if proper redundancy of the MON and MDS daemons is
implemented on quality HW.

> With Gluster, it's just files on disks, easily recovered.

I've already had to do that twice in a year, and a third time is coming:
the "definitive migration".
The first time there were too many small files; the second time it seemed
that 192GB of RAM is not enough to handle 30 bricks per server. Now that I
have reduced to just 6 bricks per server (by creating RAIDs) and created a
brand new volume in August, I already find lots of FUSE-inaccessible files
that don't heal. That should be impossible, since I'm using "replica 3
arbiter 1" over IPoIB with the three servers talking directly via the
switch. But it keeps happening. I really trusted Gluster's promises, but
what I (and, worse, the users) currently see is 60-70% availability.

Neither Gluster nor Ceph is a "backup solution", so if the data is not
easily replaceable it's better to keep a copy elsewhere, preferably offline.

