[Gluster-devel] File integrity and consistency in geo-replication

Natale Vinto ebballon at gmail.com
Wed Dec 19 17:57:20 UTC 2012

Hi guys,
I am a student from Italy writing my thesis on GlusterFS, with a
focus on geo-replication, so I would like to better understand its
mechanisms and any model or theory behind it. The case study for the
thesis is the long-term preservation of data from the forthcoming
digitization of two big libraries (the National Libraries of Florence
and Rome): a very large amount of data to be kept securely (perhaps
about 800TB of WARC files of about 200MB each), distributed over 6
nodes across 3 providers, as shown in more detail here [1].

While performance is not very important, file integrity and
consistency at all times are essential priorities, so I would like to
know how geo-replication can ensure them and whether any model or
theory can help, even in the worst case of the Master being
unreachable or something else going wrong. What would be the best
approach to this problem, and is there any theory that guarantees it?

As you can see from here [2], the Library has considered using Gluster
but ran into difficulties and many failures with local geo-replication
(gluster 3.3). My role would be to study this part to better
understand how the issues arise, combining the study with the
practical operation of the system.

I saw the Server Quorum feature planned for the next version. I was
wondering whether it is the one from the Duvvuri theory and whether it
could be useful in this case by killing inconsistent bricks.
Also, what about using Hadoop with the Gluster connector?

I think this work will require a massive amount of study and testing
(for me at least!), but it would be very nice to do this research and
try to meet an international cultural need thanks to a big
open-source project, "in perpetuum" :)

[1] http://bit.ly/Zj6T1T

