[Gluster-users] Unify / Distribute / Stripe -- Some feedback

Wed Apr 7 03:40:59 UTC 2010

Hi,

First of all, excuse my imperfect English. Please take the following as 
a user story from my personal experience. The results and problems shown 
here may or may not apply to your particular environment and 
requirements, but if you are thinking of using glusterfs or are already 
having problems, this might be useful.

As some of you may have read in my previous messages, I am facing a 
situation where the Distribute / Stripe translators run into "no free 
space" errors even when the overall cluster shows GBs of free space 
left. So, after some good advice, I am falling back to the Unify translator.

As I had a previous glusterfs setup (as Distribute) and had to move 
everything off it (and I don't have an intermediate mount point where 
I could gather all the data), I was forced to mount both the old and new 
glusters on the same machine and then move the data from the old one to 
the new one.

My first try was creating the new gluster on glusterfs 3.0.3 using 
the Stripe translator. But then I found that I would also fall into the 
"no free space" situation, so I had to look for another solution, which 
ended in rolling back to 2.0.4 and using the Unify translator with the 
ALU scheduler.

Moving data became extremely (and painfully) slow: reading from one 
networked gluster and writing to another! With 3.0.3 Stripe I was 
getting a usable transfer speed, but when I switched back to 
2.0.4 Unify I got an overall transfer speed of 1.2~2.0 Mb/s. With almost 
600 GB of data, that would take forever.

So what I did was stop the old gluster (Distribute), log in on 
the storage nodes, and rsync all the content over ssh into the mount 
point of the new gluster. This improved the transfer speed 
significantly, reaching a nice speed of almost 20 mbps.
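
To put those speeds in perspective, here is a small back-of-the-envelope calculation in Python. It is only a sketch under assumptions: the units in the figures above are ambiguous, so it assumes the rates are megabytes per second and that 1 GB = 1000 MB. The function name and the labels are made up for the example.

```python
def transfer_hours(size_gb, rate_mb_per_s):
    """Hours needed to move size_gb gigabytes at rate_mb_per_s megabytes/second.

    Assumes 1 GB = 1000 MB; the rate units are an assumption about the post.
    """
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb * 1000 / rate_mb_per_s / 3600


if __name__ == "__main__":
    cases = [
        ("gluster to gluster, slow end", 1.2),
        ("gluster to gluster, fast end", 2.0),
        ("rsync over ssh from the nodes", 20.0),
    ]
    for label, rate in cases:
        print(f"{label}: {transfer_hours(600, rate):.1f} hours")
```

Under those assumptions, 600 GB takes roughly three and a half to six days at 1.2~2.0 MB/s, against a bit over eight hours at 20 MB/s.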

Attached to this email is a graph (generated with Graphite) showing how 
the new gluster filled up over time. The green line 
shows the size of the old gluster (original data) while the blue one 
shows the filling of the new one. There you can see the 1st slope, 
which was filling the 3.0.3 Stripe, and the 2nd one, which belongs to the 
2.0.4 Unify.

Mark 1 shows when we hit the false "disk full" situation. Before that 
you see the speed of copying directly from one gluster mount point to 
the other.
Mark 2 shows the incredibly slow slope of copying directly from 
gluster to gluster when using 2.0 and Unify as the target. Note the huge 
difference compared to the 3.0 direct copy. Both copies were performed with a 
single "cp -r" on the system mounting both glusters.
Mark 3 shows the slope when I started to copy *simultaneously* 
using "cp" and "rsync" from the storage nodes. Still, it is noticeably slower 
than the 3.0 results.
Mark 4 shows the speed with the original gluster stopped and data being 
copied using only rsync from the storage nodes over ssh. In this case 
you can see much better performance than with 3.0.

Another important piece of feedback about Unify: I misunderstood the 
storage schema and at first I dedicated 2 full storage nodes to 
replicating the namespace, thus losing 40 GB of overall storage 
capacity. Then Krzysztof suggested using that storage space and moving 
the namespace volume to a separately defined brick on the same nodes, 
so that 2 machines hold both storage and namespace data. That raised 
the question of whether I had to re-create all the namespace data 
somehow or start the copy from scratch (again), but I simply tried 
moving the files locally on the nodes to another folder (using "mv", 
with the glusterfsd daemon stopped) and it worked fine!

This way I recovered the whole capacity and functionality of the 
storage cluster.

For us, this storage cluster is basically backup space, mostly written 
to and very rarely read from. Also, we did not want to get into a more 
complicated clustering scheme, and I personally wanted to avoid Lustre 
or GFS (our next alternatives) because both require installing kernel 
modules and using LVM for storage, and I find it more useful for us to 
always be able to access the data locally on the cluster nodes in case 
the service goes down. Simplifying the general structure and keeping 
everything in user space was worth it for us.

However, it is quite disappointing to find out that both of the 
approaches currently offered by glusterfs (Distribute and Stripe) do not 
work properly for a production environment since, as stated before, I 
will eventually be unable to use the whole cluster disk space. Unify on 
2.0 seems to be slower in transfer speed, but at least it does work. I 
can't understand why the only fully working solution has been deprecated 
and can't be used with the latest version, making glusterfs 3.0 just a 
theoretical solution.

Another very annoying point in the whole process was the almost complete 
absence of online documentation. The official wiki is vague and 
incomplete, and the only suggestion I found was "use the volume 
generator script", but I can hardly find out how the translators work 
internally. It is very sad to find that the only well documented 
translator on the whole wiki is the deprecated Unify, dating back to the 
1.4 versions.

Thanks to all the people on the list who helped me find the 
problems and solutions, and the answers to the key questions that I 
couldn't find in the official "documentation".

I hope this is somehow useful for people who come later and, as always, 
suggestions, comments and corrections are more than welcome.