[Gluster-users] distribute replicated volume and tons of questions

Wed Feb 22 18:27:51 UTC 2017

On 02/21/17 09:33, Gandalf Corvotempesta wrote:
> Some questions:
>
> 1) can I start with a simple replicated volume and then move to a
> distributed-replicated one by adding more bricks? I would like to start
> with 3 disks and then add 3 more disks next month.
> It seems stupid, but this allows me to buy disks from different production batches.

Yes. Once you add the new replica set as another DHT subvolume, you'll
need to run a rebalance so the hash table can utilize the new
subvolume(s).
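
To illustrate why the rebalance matters, here is a toy sketch, not
Gluster's actual code. It assumes the 32-bit hash space is split into
equal contiguous ranges, one per subvolume, and it uses CRC32 as a
stand-in for the real DHT hash. The subvolume names, the file names and
the helpers layout() and locate() are all made up for the example.

```python
import zlib

HASH_SPACE = 2**32  # toy 32-bit hash space


def layout(subvols):
    """Split the hash space into equal contiguous ranges, one per subvolume."""
    step = HASH_SPACE // len(subvols)
    ranges = []
    for i, name in enumerate(subvols):
        start = i * step
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == len(subvols) - 1 else (i + 1) * step - 1
        ranges.append((start, end, name))
    return ranges


def locate(filename, ranges):
    """Return the subvolume whose range contains the hash of the file name."""
    h = zlib.crc32(filename.encode()) & 0xFFFFFFFF  # stand-in hash, an assumption
    for start, end, name in ranges:
        if start <= h <= end:
            return name
    raise AssertionError("hash space not fully covered")


files = [f"file{i}.img" for i in range(1000)]

before = layout(["replica-set-0"])                  # the initial 3 disks
after = layout(["replica-set-0", "replica-set-1"])  # after adding 3 more

# Files whose name now hashes to a subvolume other than where they were written.
moved = [f for f in files if locate(f, before) != locate(f, after)]
print(f"{len(moved)} of {len(files)} files hash to a different subvolume")
```

In this model every file written before the expansion lives on the
first replica set, and a share of them now hash to the second one.
Until those files are moved, they are not where the hash says they
should be.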

>
> 2) let's assume (to keep it simple) a 1GB file with sharding enabled
> and a 100MB shard size.
> In a replicated volume with just 1 replicated brick, all shards (and
> thus the file) are placed on that brick (replicated to 3 servers).
> What happens with 2 bricks? Will Gluster place shards 1 to 5 on brick1
> and 6 to 10 on brick2, or does "distribution" only happen for the whole
> file (for example, all shards for file1 are placed on brick1, and all
> shards for file2 are placed on brick2)?

My understanding is that the shards will be distributed using the same 
distributed hash table algorithm as any other file. (See 
https://joejulian.name/blog/dht-misses-are-expensive/ )
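
A rough sketch of what that would mean for the 1GB example, under
assumptions that are mine rather than taken from the Gluster source:
the first block stays in the base file, every later block is a separate
file named "<gfid>.<index>", each name is hashed on its own, and CRC32
stands in for the real DHT hash. The GFID, the file name and the
subvolume names are made up.

```python
from collections import Counter
import zlib

SUBVOLS = ["replica-set-0", "replica-set-1"]  # hypothetical subvolume names


def block_names(base_name, gfid, file_size, shard_size):
    """Names that get hashed: the base file plus one name per extra shard."""
    count = (file_size + shard_size - 1) // shard_size  # integer ceiling
    # Assumption: block 0 lives in the base file, later blocks are "<gfid>.<n>".
    return [base_name] + [f"{gfid}.{i}" for i in range(1, count)]


def pick_subvol(name, subvols):
    """Map a name onto one of the subvolumes by equal slices of the hash space."""
    h = zlib.crc32(name.encode()) & 0xFFFFFFFF  # stand-in hash, an assumption
    return subvols[h * len(subvols) // 2**32]


names = block_names(
    "vm.img",                                # made-up file name
    "0a1b2c3d-0000-4000-8000-123456789abc",  # made-up GFID
    1000 * 10**6,                            # 1GB file
    100 * 10**6,                             # 100MB shard size
)
placement = {name: pick_subvol(name, SUBVOLS) for name in names}
per_subvol = Counter(placement.values())

for name, subvol in placement.items():
    print(f"{name} -> {subvol}")
print(dict(per_subvol))
```

Because each name is hashed independently, the split is not a neat
"1 to 5 here, 6 to 10 there". It is whatever the hash produces, and
with only ten blocks it can be noticeably uneven.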

> 3) Based on question 2, when accessing a distributed file, will gluster
> read from all disks, increasing the available bandwidth and
> throughput?

That depends on where your bandwidth bottlenecks are.

>
> 4) Still keeping it simple, very simple: let's assume a VM with a 10GB
> disk image placed on a distributed replicated volume. This VM hosts a
> simple webserver with a simple, but huge, website.
> Users accessing the website will access different sections of the
> underlying disk image.
> Are these accesses distributed across the 2 bricks, doubling the read
> performance (and write performance, as I can write on 2 disks at once)?

If your web servers are hitting the disk for every page load, you're 
doing it wrong. As for your performance question, you are on the right 
train of thought.

> 5) by using ZFS, should I use a redundant ZIL? What happens in case
> of ZIL failure? Usually, some data is lost, but Gluster is replicated
> in a synchronous way, thus losing a ZIL on a single server should not
> be an issue, right? Is gluster able to recover from this
> automatically?

I can't answer ZFS questions. I, personally, don't feel it's worth all 
the hype it's getting and I don't use it.
