[Gluster-users] Shard Volume testing (3.7.5)

Lindsay Mathieson lindsay.mathieson at gmail.com
Tue Oct 27 11:47:09 UTC 2015


On 26 October 2015 at 14:54, Krutika Dhananjay <kdhananj at redhat.com> wrote:

>
> Hi Lindsay,
>
> Thank you for trying out sharding and for your feedback. :) Please find my
> comments inline.
>

Hi Krutika, thanks for the feedback.


> With a block size as low as 4MB, these individual shards appear to the
> replicate module as a large number of small(er) files, effectively turning
> it into some form of a small-file workload.
> There is an enhancement being worked on in AFR by Pranith which attempts
> to improve write performance and will be especially useful when used with
> sharding. That should make this problem go away.
>


Cool. Also, for my purposes (VM image hosting), block sizes of 512MB are
just as good and improve things considerably.
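
For reference, the knobs involved are roughly these (the volume name is mine,
and the values are just what I've been testing with):

    gluster volume set datastore features.shard on
    gluster volume set datastore features.shard-block-size 512MB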



>
> One Bug:
> After heals completed I shut down the VMs and ran an md5sum on the VM
> image (via glusterfs) on each node. They all matched except for one time
> on gn3. Once I unmounted/remounted the datastore on gn3, the md5sum matched.
>
>
> This could possibly be the effect of a caching bug reported at
> https://bugzilla.redhat.com/show_bug.cgi?id=1272986. The fix is out for
> review and I'm confident that it will make it into 3.7.6.
>

Cool, I can replicate it fairly reliably at the moment.
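
For anyone else trying to reproduce it, the check is roughly: after the heals
complete, run an md5sum of the same image on each node's glusterfs mount and
compare (the mount point and image name below are placeholders):

    md5sum /mnt/datastore/images/vm-100-disk-1.qcow2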

Would it occur when using qemu/gfapi direct?
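
By that I mean the disk attached over libgfapi rather than the FUSE mount,
along these lines (host, volume and image names are placeholders):

    qemu-system-x86_64 \
        -drive file=gluster://gn1/datastore/images/vm-100-disk-1.qcow2,format=qcow2,if=virtio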

>
>
>
> One Oddity:
> gluster volume heal datastore info *always* shows a split brain on the
> directory, but it always heals without intervention. Dunno if this is
> normal or not.
>
>
> Which directory would this be?
>

Oddly, it was the .shard directory.


> Do you have the glustershd logs?
>

Sorry no, and I haven't managed to replicate it again. Will keep trying.
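
When it does recur, the useful things to capture are presumably
/var/log/glusterfs/glustershd.log from each node plus the output of
(volume name is mine):

    gluster volume heal datastore info
    gluster volume heal datastore info split-brain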


> Here is some documentation on sharding:
> https://gluster.readthedocs.org/en/release-3.7.0/Features/shard/. Let me
> know if you have more questions, and I will be happy to answer them.
> The problems we foresaw with too many 4MB shards are that
> i. entry self-heal under /.shard could result in a complete crawl of the
> /.shard directory during heal, and
> ii. a disk replacement could involve a large number of files needing to be
> created and healed to the sink brick,
> both of which would result in slower "entry" heal and rather high resource
> consumption by the self-heal daemon.
>

Thanks, most interesting reading.
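
To put the shard-count concern into rough numbers for a single 100GB image
(example size only):

    echo $((100 * 1024 / 4))     # 4MB shards   -> 25600 entries under /.shard
    echo $((100 * 1024 / 512))   # 512MB shards -> 200 entries under /.shard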


> Fortunately, with the introduction of more granular changelogs in the
> replicate module to identify exactly which files under a given directory
> need to be healed to the sink brick, these problems should go away.
> In fact, this enhancement is being worked on as we speak and is targeted
> to be out by 3.8. Here is some doc:
> http://review.gluster.org/#/c/12257/1/in_progress/afr-self-heal-improvements.md
> (read section "Granular entry self-heals").
>

That looks very interesting. In fact, from my point of view it replaces the
need for sharding altogether, since my reason for sharding is the speed of
heals.

>
> Yes. So Paul Cuzner and Satheesaran, who have been testing sharding here,
> have reported better write performance with 512M shards. I'd be interested
> to know what you feel about performance with relatively larger shards
> (think 512M).
>

Seq read speeds basically tripled, and seq writes improved to the limit of
the network connection.
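
For anyone wanting to run a comparable sequential test against the mount,
something along these lines gives a rough picture (path and sizes are
placeholders):

    # sequential write, bypassing the page cache
    dd if=/dev/zero of=/mnt/datastore/seqtest.bin bs=1M count=4096 oflag=direct
    # sequential read of the same file back
    dd if=/mnt/datastore/seqtest.bin of=/dev/null bs=1M iflag=direct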

Cheers,



-- 
Lindsay