[Gluster-devel] bit rot support for glusterfs

Thu Jan 9 10:42:49 UTC 2014

On 2 January 2014 19:30, Jeffrey Darcy <jdarcy at redhat.com> wrote:
>> I will be starting to work on bit rot detection for glusterfs.
>
> That's magnificent, and not only because it's a valuable feature.  It's great to see you still exhibiting such initiative.  :)

Leaving Gluster development was never my intention. Yes, due to my
other commitments, my development/response time might be slower. But
in the end, being able to develop in the community does give me
freedom of choice :)

>
>> 1. Depend on change-log to recompute checksum. This eliminates
>> periodic crawl of brick/volume to update the checksum.
>
> Absolutely.  This was always my biggest objection to Doug's design.  Crawling simply doesn't scale well.  Even for local replication we're moving to a more log-based approach.  The design you cite also mentions AFR-specific artifacts like outcast, which need to be avoided both as a matter of general practice and because those artifacts might no longer be relevant a year from now.
>

Agreed. Bit rot detection will only take into account checksums based
on a single brick/child.
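
To make the log-driven, per-brick idea concrete, here is a minimal sketch in Python (not GlusterFS code). It assumes a hypothetical changelog that is simply a list of brick-relative paths that were modified; the function names, the in-memory checksum table, and the choice of SHA-256 are all illustrative assumptions, not anything the change-log xlator provides.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def apply_changelog(brick_root, changed_paths, checksums):
    """Recompute checksums only for files named in the changelog.

    changed_paths: brick-relative paths (hypothetical changelog format).
    checksums: dict of brick-relative path -> hex digest, updated in place.
    """
    for rel in changed_paths:
        full = os.path.join(brick_root, rel)
        if os.path.isfile(full):
            checksums[rel] = file_checksum(full)
        else:
            # The file was removed, so drop its stored checksum.
            checksums.pop(rel, None)
    return checksums


def find_rot(brick_root, checksums):
    """Return paths whose on-disk data no longer matches the stored checksum."""
    rotten = []
    for rel, digest in checksums.items():
        full = os.path.join(brick_root, rel)
        if not os.path.isfile(full) or file_checksum(full) != digest:
            rotten.append(rel)
    return sorted(rotten)
```

Only `find_rot` touches every file, and that is the scrub pass; keeping the checksums current needs no crawl, since `apply_changelog` visits only the logged paths. Everything is relative to one brick root, so no replication-specific state is involved.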

>> 2. Policy to determine when the checksum is to be recomputed. If a file is
>> undergoing active I/O, then compute the checksum only after a delay
>
> This might play into what we are (or at least I am) planning with respect to tiering, HSM, data classification, or whatever you want to call it.  In fact, it might be something that you don't need to worry about at all.  If a volume/pool is divided into a "live" part geared toward high performance and an "archival" part geared toward longevity and storage efficiency - slower drives, erasure codes and/or bit rot detection instead of AFR/NSR - with transparent migration between them, then you might just be able to say that anything placed under your purview will have bit rot detection applied.  To put it another way, the policy would be applied elsewhere in the system.
>

Makes sense to allow the policy to be decided elsewhere, as in most use
cases bit rot detection would be enabled only for archival stores. So the
last fd close on a file should lead to computation of a checksum on the
bricks. Does the change-log xlator have the capability to identify the
last fd close on a file?
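
The bookkeeping behind "checksum on last fd close" can be sketched as follows, again in Python and purely for illustration. The class and method names are hypothetical, and whether the change-log xlator can supply open/close events like these is exactly the open question above.

```python
from collections import Counter


class LastCloseTracker:
    """Count open fds per file and fire a callback on the last close."""

    def __init__(self, on_last_close):
        self._open_fds = Counter()           # path -> number of open fds
        self._on_last_close = on_last_close  # e.g. schedule a checksum

    def opened(self, path):
        self._open_fds[path] += 1

    def closed(self, path):
        if self._open_fds[path] <= 0:
            raise ValueError("close without matching open: %s" % path)
        self._open_fds[path] -= 1
        if self._open_fds[path] == 0:
            # No fd left on this file, so it is safe to checksum it now.
            del self._open_fds[path]
            self._on_last_close(path)
```

For example, with `tracker = LastCloseTracker(queue.append)`, two `opened("f")` calls followed by two `closed("f")` calls append `"f"` to `queue` only on the second close.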

>> 3. Ability to turn off/on bit rot detection in volumes.
>>
>> 4. If bit rot is turned on for a volume, a crawl would be necessary in
>> this case to compute checksum.