[Gluster-users] Bricks as BTRFS

Ric Wheeler rwheeler at redhat.com
Fri Sep 26 21:57:32 UTC 2014


On 09/26/2014 03:40 PM, James wrote:
> On Fri, Sep 26, 2014 at 3:15 PM, Ric Wheeler <rwheeler at redhat.com> wrote:
>> On 09/26/2014 01:58 PM, James wrote:
>>> On Thu, Sep 25, 2014 at 2:53 AM, Venky Shankar <vshankar at redhat.com> wrote:
>>>> Hey folks,
>>>>
>>>> Wanted to check if anyone out here uses BTRFS (and is willing to share
>>>> their experiences[1]) as the backend filesystem for GlusterFS. We're
>>>> planning to explore some of its features and put it to use for
>>>> GlusterFS. This was discussed briefly during the weekly meeting on
>>>> #gluster-meeting[2].
>>>>
>>>> To start with, we plan to explore data/metadata checksumming (+
>>>> scrubbing) and subvolumes to "offload" the work to BTRFS. The mentioned
>>>> features would help us with BitRot detection[3] and OpenStack Manila use
>>>> cases, respectively (though there are various other nifty things one
>>>> would want to do with them).
>>>>
>>>> Thanks in advance!
>>>
>>> Hey,
>>>
>>> I couldn't make the meeting, but I am interested in BTRFS. I added
>>> this in puppet-gluster a bunch of months ago as a feature branch.
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1094860
>>>
>>> I just pushed it to git master.
>>>
>>>
>>> https://github.com/purpleidea/puppet-gluster/commit/6c962083d8b100dcaeb6f11dbe61e6071f3d13f0
>>>
>>> The reason I want btrfs support is that I want GlusterFS to eventually
>>> be able to support reflinks across Gluster volumes. There is a strong
>>> use case for this feature.
>>>
>>> Let me know if this helps!
>>> Cheers,
>>> James
>>>
>> Reflinks in btrfs (or ocfs2) need to be between files in the same Linux
>> kernel instance of btrfs. Effectively, we have two inodes backed by the
>> same physical blocks.
>>
>> So, in general, btrfs reflinks won't be useful across volumes.
>>
>> Regards,
>>
>> Ric
>
> Agreed... Which is why this isn't a trivial thing for GlusterFS to do,
> but we've discussed certain mechanisms to emulate this behaviour
> across a Gluster volume. For example:
>
> * If the reflink causes the file to be on the same brick, just reflink.
> * If the reflink causes the file to be on a different brick, then
> reflink to self, and put a pointer to that original brick.
> * If we want to reflink across volumes, then it's tricky, because FUSE
> would have to pass this information through and down to the
> filesystem.
>
> The winning use case for this feature is that someone could
> backup/restore petabytes of data "virtually instantly". This is
> possible with single volume things, but I'd like to scale this to a
> distributed-replicated data store.

It's not clear to me why you would do anything other than "try reflink" and
let it either succeed or fail.

Effectively, from a kernel point of view, reflink is just a copy offload method.
The default should be to return to the invoking application: the call fails if
reflink is not supported, and the application then goes back to a full copy.
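
As a rough illustration of "try reflink, else full copy", here is a minimal
Python sketch. It assumes Linux and uses the FICLONE ioctl from <linux/fs.h>;
the request number is the x86/ARM value and is hardcoded here rather than
relying on the fcntl module to export it. The helper name copy_with_reflink is
made up for this example, and none of this is GlusterFS code.

```python
import fcntl
import shutil

# FICLONE from <linux/fs.h>: _IOW(0x94, 9, int).
# Assumption: x86/ARM ioctl encoding; some architectures encode it differently.
FICLONE = 0x40049409


def copy_with_reflink(src, dst):
    """Try to reflink src to dst; fall back to a full copy.

    Returns "reflink" if the clone succeeded, "copy" otherwise.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to make dst share src's blocks.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            # Typically EOPNOTSUPP, EXDEV, EINVAL or ENOTTY when the
            # filesystem cannot clone. Any real I/O or permission problem
            # will show up again in the full copy below.
            pass
    # Plain data copy at normal file operation speed.
    shutil.copyfile(src, dst)
    return "copy"
```

This is roughly the behaviour of `cp --reflink=auto`: the caller gets its copy
either way and only the cost differs.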

Using reflink for backup is a really bad idea, since you will not actually have
made a second copy: if the disk fails (even partially!), you might lose data
because there is no second copy of the blocks. Where reflink is not supported,
you will still need to do a full file copy, which means normal file operation
speed for the backup and restore.

Ric