[Gluster-users] GlusterFS on ZFS

Thu Apr 18 03:21:26 UTC 2019

Hi Cody,

Keep in mind that if you like the thin LVM approach, you can still use VDO (on Red Hat-based systems) and get deduplication and compression that way.
VDO will most probably require some tuning to get writes fast enough, but reads can be way faster.

Best Regards,
Strahil Nikolov

On Apr 17, 2019 18:34, Pascal Suter <pascal.suter at dalco.ch> wrote:
>
> Hi Cody
>
> I'm still new to Gluster myself, so take my input with the necessary
> skepticism:
>
> If you care about performance (and it looks like you do), use ZFS mirror
> pairs and not raidz volumes. In my experience (outside of Gluster),
> raidz pools perform significantly worse than a hardware RAID 5 or 6. If
> you combine a mirror on ZFS with a 3x replication on Gluster, you need
> 6x the amount of raw disk space to get your desired redundancy. You
> could do with 3x the amount of disk space if you left out the ZFS mirror
> and accepted rebuilding a lost disk over the network, or you could
> end up somewhere between 3x and 6x if you used hardware RAID 6 instead of
> ZFS on the bricks. When using hardware RAID 6, make sure you align your
> LVM volumes properly; it makes a huge difference in performance. Okay,
> deduplication might give you some of that space back, but benchmark ZFS
> deduplication first before deciding on it. In theory it could
> add to your write performance, but I'm not sure if that's going to
> happen in reality.
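
The 3x / 6x figures above can be checked with a little arithmetic. The following is only a minimal sketch: the function name and the 8-disk RAID 6 layout are made up for illustration, and it ignores filesystem metadata overhead as well as any savings from deduplication or compression.

```python
from fractions import Fraction


def raw_space_multiplier(replicas: int, data_disks: int, total_disks: int) -> Fraction:
    """Raw disk needed per unit of usable data (hypothetical helper).

    replicas    -- Gluster replica count
    data_disks  -- disks' worth of usable capacity in one brick's local array
    total_disks -- physical disks in that array
    """
    if not 0 < data_disks <= total_disks:
        raise ValueError("need 0 < data_disks <= total_disks")
    # Local redundancy overhead multiplied by the number of Gluster replicas.
    return replicas * Fraction(total_disks, data_disks)


layouts = {
    # A two-way mirror stores one disk's worth of data on two disks.
    "ZFS mirror pair + replica 3": raw_space_multiplier(3, 1, 2),
    # No local redundancy: a lost disk is rebuilt over the network.
    "single disk + replica 3": raw_space_multiplier(3, 1, 1),
    # Assumed example: RAID 6 over 8 disks keeps 6 disks' worth of data.
    "8-disk RAID 6 + replica 3": raw_space_multiplier(3, 6, 8),
}

for name, factor in layouts.items():
    print(f"{name}: {float(factor):g}x raw space")
```

With RAID 6 the factor is 3 * n / (n - 2) for n disks per array, so it starts at 6x for the 4-disk minimum and approaches 3x as the arrays get wider.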
>
> Snapshotting might be tricky. AFAIK Gluster natively supports
> snapshotting with thin-provisioned LVM volumes only. This lets you
> create snapshots with the "gluster" CLI tool. Gluster will then handle
> consistency across all your bricks so that each snapshot (as a whole,
> across all bricks) is consistent in itself. This includes some
> challenges around handling open file sessions etc. I'm not familiar with
> what Gluster actually does, but from reading the documentation and some
> discussions about snapshots it seems that there is more to it than simply
> automating a couple of lvcreate statements. So I would expect some
> challenges when doing it yourself on ZFS rather than letting Gluster
> handle it. Restoring a single file from a snapshot also seems a lot
> easier if you go with the LVM thin setup: you can then mount a snapshot
> (of your entire Gluster volume, not just of a brick) and simply copy the
> file, while with ZFS it seems you need to find out which bricks your
> file resided on, then copy the necessary raw data to your live bricks,
> which is something I would not feel comfortable doing; it is a lot
> more work and prone to error.
>
> Also, if things go wrong (for example when dealing with the snapshots),
> there are probably not many people around who can help you.
>
> Again, I am no expert; that's just what I'd be concerned about with the
> little knowledge I have at the moment :)
>
> cheers
>
> Pascal
>
> On 17.04.19 00:09, Cody Hill wrote:
> > Hey folks.
> >
> > I’m looking to deploy GlusterFS to host some VMs. I’ve done a lot of reading and would like to implement Deduplication and Compression in this setup. My thought would be to run ZFS to handle the Compression and Deduplication.
> >
> > ZFS would give me the following benefits:
> > 1. If a single disk fails, rebuilds happen locally instead of over the network
> > 2. ZIL & L2ARC should add a slight performance increase
> > 3. Deduplication and Compression are inline and have pretty good performance with modern hardware (Intel Skylake)
> > 4. Automated Snapshotting
> >
> > I can then layer GlusterFS on top to handle distribution to allow 3x Replicas of my storage.
> > My question is… Why aren’t more people doing this? Is this a horrible idea for some reason that I’m missing? I’d be very interested to hear your thoughts.
> >
> > Additional thoughts:
> > I’d like to use Ganesha pNFS to connect to this storage. (Any issues here?)
> > I think I’d need Keepalived across these 3 nodes to provide a floating IP to put in fstab (Is this correct?)
> > I’m also thinking about creating a “Gluster Tier” of 512GB of Intel Optane DIMM to really smooth out write latencies… Any issues here?
> >
> > Thank you,
> > Cody Hill