[Gluster-devel] Roadmap for afr, ec
fanghuang.data at yahoo.com
Wed Sep 16 10:12:19 UTC 2015
For the EC encoding/decoding algorithm, could we design a plug-in mechanism so that users can choose their own
algorithm or use a third-party library, as Ceph does? I am also curious why the IDA algorithm was originally
chosen instead of the commonly used Reed-Solomon algorithm.
> On Monday, 14 September 2015, 16:30, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
> hi,
> Here is a list of common improvements for both ec and afr planned over
> the next few months:
> 1) Granular entry self-heals.
> Both afr and ec at the moment do a lot of readdirs and lookups to
> figure out the differences between the directories to perform heals.
> Kritika, Ravi, Anuradha and I are discussing how to prevent this.
> The base algo is to store only the names that need heal in
> .glusterfs/indices/entry-changes/<parent-dir-gfid>/ as links to a base
> file in .glusterfs/indices/entry-changes of the bricks. So only the
> names that need to be healed will go through name heals.
> We want to complete this for 3.8 definitely.
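A rough sketch of the index idea in item 1, only to make the layout concrete. This is not Gluster code: the function names and the base file name "base" are hypothetical, and a plain local directory stands in for a brick.

```python
import os
import tempfile

INDEX_SUBDIR = os.path.join(".glusterfs", "indices", "entry-changes")
BASE_FILE = "base"  # hypothetical name for the base file the links point to


def mark_entry_for_heal(brick_root, parent_gfid, name):
    """Record that `name` under the directory `parent_gfid` needs heal."""
    index_root = os.path.join(brick_root, INDEX_SUBDIR)
    parent_dir = os.path.join(index_root, parent_gfid)
    os.makedirs(parent_dir, exist_ok=True)
    base = os.path.join(index_root, BASE_FILE)
    if not os.path.exists(base):
        open(base, "w").close()
    link_path = os.path.join(parent_dir, name)
    if not os.path.exists(link_path):
        os.link(base, link_path)  # hard link: no extra inode per name


def names_needing_heal(brick_root, parent_gfid):
    """Return only the names recorded for this parent, no full readdir diff."""
    parent_dir = os.path.join(brick_root, INDEX_SUBDIR, parent_gfid)
    if not os.path.isdir(parent_dir):
        return []
    return sorted(os.listdir(parent_dir))


def clear_entry(brick_root, parent_gfid, name):
    """Drop the index entry once the name has been healed."""
    os.unlink(os.path.join(brick_root, INDEX_SUBDIR, parent_gfid, name))


def demo():
    brick = tempfile.mkdtemp()  # stand-in for a brick directory
    mark_entry_for_heal(brick, "parent-gfid-1", "a.txt")
    mark_entry_for_heal(brick, "parent-gfid-1", "b.txt")
    clear_entry(brick, "parent-gfid-1", "a.txt")
    return names_needing_heal(brick, "parent-gfid-1")
```

The point of the layout is that a heal only has to list one small index directory per parent instead of comparing whole directories across bricks.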
> 2) Granular data self-heals.
> At the moment, even if a single byte changes in the file, afr and ec
> read the entire file to fix the problems. We are thinking of preventing
> this by remembering where the changes happened on the file in extended
> attributes. There will be a new extended attribute on the file which
> represents a bitmap of the changes, where each bit represents a range
> that needs healing. This extended attribute will have a maximum size it
> can represent; the extra chunks will be represented like shards in
> .glusterfs/indices/data-changes/<gfid-<block-num>>. The extended
> attribute on this block will store the ranges that need heals.
> For example: if the maximum extended attribute value size is 4KB and
> each bit represents 128KB (i.e. the first bit represents changes done
> from offset 0-128KB, the 2nd bit 128KB+1-256KB etc.), a single extended
> attribute can store changes happening to a file up to 4GB (we are
> thinking of dynamically increasing the size represented by each bit from
> say 4k to 128k, but this is still in design). Changes happening from
> offset 4GB+1 to 8GB will be stored in the extended attribute
> of .glusterfs/indices/data-changes/<gfid-of-file-1>. Changes happening
> from offset 8GB+1 to 12GB will be stored in the extended attribute of
> .glusterfs/indices/data-changes/<gfid-of-file-2>, etc. (Please note that
> these files are empty; they will just contain extended attributes.)
> We want to complete this for 3.8 (stretch goal)
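The arithmetic in the example above can be checked with a small sketch. It assumes the 4KB / 128KB figures from the mail, uses zero-based half-open byte ranges rather than the "+1" style above, and keeps the bitmaps in a dict instead of real extended attributes. All names are hypothetical.

```python
XATTR_BYTES = 4 * 1024            # maximum xattr value size in the example
BYTES_PER_BIT = 128 * 1024        # file range covered by one bit
BITS_PER_XATTR = XATTR_BYTES * 8  # 32768 bits
SPAN_PER_XATTR = BITS_PER_XATTR * BYTES_PER_BIT  # 4GB per xattr


def locate(offset):
    """Map a byte offset to (chunk, bit).

    Chunk 0 is the xattr on the file itself; chunk n >= 1 is the xattr on
    the n-th empty index file under data-changes.
    """
    chunk, within = divmod(offset, SPAN_PER_XATTR)
    return chunk, within // BYTES_PER_BIT


def mark_write(bitmaps, offset, length):
    """Set the bits covering a write; `bitmaps` maps chunk -> bytearray."""
    if length <= 0:
        return
    first = offset // BYTES_PER_BIT
    last = (offset + length - 1) // BYTES_PER_BIT
    for global_bit in range(first, last + 1):
        chunk, bit = divmod(global_bit, BITS_PER_XATTR)
        bitmap = bitmaps.setdefault(chunk, bytearray(XATTR_BYTES))
        bitmap[bit // 8] |= 1 << (bit % 8)


def dirty_ranges(bitmaps):
    """Return the (start, end) byte ranges that need heal, end exclusive."""
    ranges = []
    for chunk in sorted(bitmaps):
        bitmap = bitmaps[chunk]
        for bit in range(BITS_PER_XATTR):
            if bitmap[bit // 8] & (1 << (bit % 8)):
                start = chunk * SPAN_PER_XATTR + bit * BYTES_PER_BIT
                ranges.append((start, start + BYTES_PER_BIT))
    return ranges
```

With these constants a single xattr covers exactly 4GB, so a write that straddles the 4GB boundary sets the last bit of chunk 0 and the first bit of chunk 1, and a heal would read only those two 128KB ranges.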
> 3) Performance & throttling improvements for self-heal:
> We are also looking into the multi-threaded self-heal daemon patch
> by Richard for inclusion in 3.8. We are waiting for the discussions by
> Raghavendra G on QoS to be over before coming to any decisions on
> After we have compound fops:
> The goal here is to come up with compound fops and prevent unnecessary
> round trips:
> 4) Transaction latency improvements:
> On afr:
> In the unoptimized version of transaction we have: 1) Lock, 2)
> Pre-op 3) op 4) Post-op 5) unlock
> We will have: 1) Lock, 2) Pre-op + op 3) post-op + unlock
> This reduces round trips from 5 to 3 in the un-optimized version
> of afr-transaction.
> On EC:
> In the unoptimized version (worst case of unaligned write) of
> transaction we have: 1) Lock, 2) get version, size xattrs 3) reads of
> pre, post unaligned chunks 4) op 5) update version, size 6) unlock
> We will have: 1) Lock + get version, size xattrs + reads of pre, post
> unaligned chunks, 2) op 3) update version, size + unlock
> This reduces round trips from 6 to 3 in the un-optimized version
> of ec-transaction.
> 5) Entry self-heal per name latency improvements:
> Before: 1) Lock, 2) lookup to determine if the file needs to be
> deleted/created 3) create/delete 4) Unlock
> After: 1) Lock + lookup 2) delete/create + unlock
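The groupings in items 4 and 5 can be modelled with a toy counter, just to confirm the round-trip numbers. The Brick class and the fop names are hypothetical and only mirror the wording above; nothing here is a Gluster API.

```python
class Brick:
    """Toy brick that counts network round trips."""

    def __init__(self):
        self.round_trips = 0
        self.executed = []

    def call(self, *fops):
        """Send one or more fops in a single request (a compound fop)."""
        self.round_trips += 1
        self.executed.extend(fops)


def run(groups):
    """Send each group as one request; return (round trips, fops in order)."""
    brick = Brick()
    for group in groups:
        brick.call(*group)
    return brick.round_trips, brick.executed


AFR_BEFORE = [["lock"], ["pre-op"], ["op"], ["post-op"], ["unlock"]]
AFR_AFTER = [["lock"], ["pre-op", "op"], ["post-op", "unlock"]]

EC_BEFORE = [["lock"], ["get version/size xattrs"], ["read unaligned chunks"],
             ["op"], ["update version/size"], ["unlock"]]
EC_AFTER = [["lock", "get version/size xattrs", "read unaligned chunks"],
            ["op"], ["update version/size", "unlock"]]

HEAL_BEFORE = [["lock"], ["lookup"], ["create/delete"], ["unlock"]]
HEAL_AFTER = [["lock", "lookup"], ["create/delete", "unlock"]]
```

In each pair the brick sees the same fops in the same order; only the number of requests changes (5 to 3 for afr, 6 to 3 for ec, 4 to 2 for per-name entry heal).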
> Roadmap that applies only to EC, for 3.8:
> - Use SSE2/AVX/NEON extensions when available to speed up Galois Field
> - Use a systematic matrix to improve encoding performance (it will also
> improve decoding performance when all bricks are healthy)
> - Implement a new algorithm able to detect and repair chunks of data on
> the fly.
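To illustrate why a systematic matrix helps when all bricks are healthy, here is a toy 2+1 XOR code. It is deliberately not the IDA or Galois Field code used by ec, and the function names are hypothetical; it only shows the property that systematic fragments contain the data unchanged, so a healthy read needs no arithmetic.

```python
def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes):
    """Return [d0, d1, parity]; d0 and d1 are the data itself (systematic)."""
    if len(data) % 2:
        raise ValueError("toy code expects an even-length buffer")
    half = len(data) // 2
    d0, d1 = data[:half], data[half:]
    return [d0, d1, _xor(d0, d1)]


def decode(fragments):
    """Rebuild the data from three fragments, at most one of which is None."""
    d0, d1, parity = fragments
    if d0 is not None and d1 is not None:
        return d0 + d1  # healthy path: plain concatenation, no decoding
    if parity is None or (d0 is None and d1 is None):
        raise ValueError("too many fragments missing")
    if d0 is None:
        d0 = _xor(d1, parity)  # degraded path: recompute the lost half
    else:
        d1 = _xor(d0, parity)
    return d0 + d1
```

With a non-systematic matrix every read has to go through the decode arithmetic, even with no brick down; here that cost is only paid on the degraded path.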
> Roadmap that applies only to AFR:
> 1) Once granular entry/data heals and throttling are in, we can look at
> generalizing Richard's lazy replication patch to be used for
> near-synchronous replication between data centers and possibly just the
> bricks. I haven't looked into the patch myself.
> We will be sending out more mails as soon as the design is complete for
> each of these items. We are eagerly waiting for Xavi to come back to get
> his comments as well on how EC will be impacted by the common changes.
> Feedback on this plan is very welcome!