<html><body><div style="font-family: garamond,new york,times,serif; font-size: 12pt; color: #000000"><div><br></div><div><br></div><hr id="zwchr"><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Lindsay Mathieson" &lt;lindsay.mathieson@gmail.com&gt;<br><b>To: </b>"gluster-users" &lt;gluster-users@gluster.org&gt;<br><b>Sent: </b>Sunday, October 25, 2015 11:59:17 AM<br><b>Subject: </b>[Gluster-users] Shard Volume testing (3.7.5)<br><div><br></div><div dir="ltr"><div><div><div><div><div><div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On 18 October 2015 at 00:17, Vijay Bellur <span dir="ltr">&lt;<a href="mailto:vbellur@redhat.com" target="_blank" data-mce-href="mailto:vbellur@redhat.com">vbellur@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" data-mce-style="margin: 0px 0px 0px 0.8ex; border-left: 1px solid #cccccc; padding-left: 1ex;"><div id=":1pp" class="" style="overflow:hidden" data-mce-style="overflow: hidden;">Krutika has been working on several performance improvements for sharding and the results have been encouraging for virtual machine workloads.<br><br> Testing feedback would be very welcome!<br></div></blockquote></div></div></div></div></div></div></div></div></div></div></div></blockquote><div><br></div><div>Hi Lindsay,<br></div><div><br></div><div>Thank you for trying out sharding and for your feedback. 
:) Please find my comments inline.<br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div><div><div><div><div><div><div><div><div class="gmail_extra">I've managed to set up a replica 3 3.7.5 shard test volume, hosted on virtualised Debian 8.2 servers, so performance is a bit crap :)<br><div><br></div></div><div class="gmail_extra">3 Nodes, gn1, hn2 &amp; gn3<br></div><div class="gmail_extra">Each node has:<br></div><div class="gmail_extra">- 1GB RAM<br></div><div class="gmail_extra">- 1Gb Ethernet<br></div><div class="gmail_extra">- 512 GB disk hosted on a ZFS external USB drive :)<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">- Datastore is shared out via NFS to the main cluster for running a VM<br></div><div class="gmail_extra">- I have the datastore mounted using glusterfs inside each test node so I can examine the data directly.<br></div><div class="gmail_extra"><br></div><div class="gmail_extra"><br><div><br></div></div><div class="gmail_extra">I've got two VMs running off it, one of them a 65GB (25GB sparse) Windows 7 VM. I've been running benchmarks and testing node failures by killing the cluster processes and killing actual nodes.<br><div><br></div></div><div class="gmail_extra">- Heal speed is immensely faster, a matter of minutes rather than hours.<br></div>- Read performance is quite good</div></div></div></div></div></div></div></div></div></blockquote><div><br></div><div>Good to hear. 
:)<br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div><div><div><div><div><div><div><div><br></div>- Write performance is atrocious, but given the limited resources, not unexpected. </div></div></div></div></div></div></div></div></blockquote><div><br></div><div>With a block size as small as 4MB, the replicate module sees these individual shards as a large number of small(er) files, which effectively turns this into a small-file workload.<br></div><div>Pranith is working on an enhancement in AFR that aims to improve write performance and should be especially useful with sharding. That should make this problem go away.<br></div><div><br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div><div><div><div><div><div><div><br>- I'll be upgrading my main cluster to Jessie soon and will be able to test with real hardware and bonded connections, plus using gfapi directly. Then I'll be able to do real benchmarks.<br><div><br></div></div><div>One Bug:<br></div>After the heals completed, I shut down the VMs and ran md5sum on the VM image (via glusterfs) on each node. They all matched except once, on gn3. 
Once I unmounted and remounted the datastore on gn3, the md5sum matched.</div></div></div></div></div></div></div></blockquote><div><br></div><div>This is possibly the effect of a caching bug reported at <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1272986">https://bugzilla.redhat.com/show_bug.cgi?id=1272986</a>. The fix is out for review, and I'm confident it will make it into 3.7.6.<br></div><div><br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div><div><div><div><div><div><br><div><br></div></div>One Oddity:<br></div>gluster volume heal datastore info *always* shows a split brain on the directory, but it always heals without intervention. Dunno if this is normal or not.</div></div></div></div></div></blockquote><div><br></div><div>Which directory would this be? 
Do you have the glustershd logs?<br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div><div><div><div><br><div><br></div></div><div>Questions:<br></div>- I'd be interested to know how the shards are organised and accessed. It looks like 1000s of 4MB files in the .shard directory, and I'm concerned access times will go down the toilet once many large VM images are stored on the volume.</div></div></div></div></blockquote><div><br></div><div>Here is some documentation on sharding: <a href="https://gluster.readthedocs.org/en/release-3.7.0/Features/shard/">https://gluster.readthedocs.org/en/release-3.7.0/Features/shard/</a>. Let me know if you have more questions, and I will be happy to answer them.<br></div><div>The problems we foresaw with too many 4MB shards are that</div><div>i. entry self-heal under /.shard could result in a complete crawl of the /.shard directory during heal, or</div><div>ii. a disk replacement could require a large number of files to be created and healed to the sink brick,<br></div><div>both of which would result in slower "entry" heals and rather high resource consumption by the self-heal daemon.<br></div><div>Fortunately, with the introduction of more granular changelogs in the replicate module that identify exactly which files under a given directory need to be healed to the sink brick, these problems should go away.<br></div><div>In fact, this enhancement is being worked on as we speak and is targeted for 3.8. 
Here is a document describing it: <a href="http://review.gluster.org/#/c/12257/1/in_progress/afr-self-heal-improvements.md">http://review.gluster.org/#/c/12257/1/in_progress/afr-self-heal-improvements.md</a> (see the section "Granular entry self-heals").<br></div><div><br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div><div><div><br><div><br></div></div>- Is it worth experimenting with different shard sizes?</div></div></div></blockquote><div><br></div><div>Sure! You can use 'gluster volume set &lt;VOL&gt; features.shard-block-size &lt;size&gt;' to reconfigure the shard size. The new size applies only to files/images/vdisks created _after_ the block size is reconfigured.<br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div><div><br><div><br></div></div>- Anything you'd like me to test?</div></div></blockquote><div><br></div><div>Yes. Paul Cuzner and Satheesaran, who have been testing sharding here, have reported better write performance with 512M shards. 
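<div>To put rough numbers on the shard-size question, here is a back-of-the-envelope sketch in Python. It is not part of GlusterFS: the helper names and the GFID are hypothetical, and it assumes the shard translator's on-brick layout, where block 0 stays in the original file and each later block N is stored under /.shard as GFID.N. It also treats the image as fully allocated, whereas a sparse image only gets the blocks that have actually been written.</div>
<pre>
```python
"""Back-of-the-envelope shard arithmetic (hypothetical helpers, not a GlusterFS API).

Assumes the shard translator layout: block 0 is kept in the original file,
and each later block N is stored as /.shard/GFID.N on the bricks.
"""

MIB = 1024 * 1024
GIB = 1024 * MIB


def shard_count(image_bytes: int, block_bytes: int) -> int:
    """Number of blocks a fully allocated image spans, base file included."""
    # Ceiling division using integers only: -(-a // b)
    return -(-image_bytes // block_bytes)


def shard_location(gfid: str, offset: int, block_bytes: int) -> str:
    """Return where the byte at `offset` is stored for a file with this GFID."""
    block = offset // block_bytes
    if block == 0:
        return "(base file)"
    return f"/.shard/{gfid}.{block}"


if __name__ == "__main__":
    image = 65 * GIB  # the 65GB Windows 7 image, taken as 65 GiB, fully allocated
    gfid = "hypothetical-gfid"  # made-up placeholder, not a real GFID
    for size_mib in (4, 512):
        block = size_mib * MIB
        total = shard_count(image, block)
        print(f"{size_mib:>3} MiB shards: {total} blocks, "
              f"{total - 1} files under /.shard")
        print("  byte at 10 GiB ->", shard_location(gfid, 10 * GIB, block))
```
</pre>
<div>For a fully allocated 65GB image (taken as 65 GiB), this gives 16640 blocks, i.e. 16639 files under /.shard, at 4MB, versus 130 blocks (129 files) at 512MB, so larger shards should also leave far fewer entries for an entry self-heal to crawl.</div>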
I'd be interested to hear how performance feels to you with relatively larger shards (say, 512M).<br></div><div><br></div><div>-Krutika<br></div><blockquote style="border-left:2px solid #1010FF;margin-left:5px;padding-left:5px;color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="border-left: 2px solid #1010FF; margin-left: 5px; padding-left: 5px; color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div><br><div><br></div></div>Thanks,<br clear="all"><div><div><div><div><div><div><div><div><div><br clear="all"><br>-- <br><div class="gmail_signature">Lindsay</div></div></div></div></div></div></div></div></div></div></div><br>_______________________________________________<br>Gluster-users mailing list<br>Gluster-users@gluster.org<br>http://www.gluster.org/mailman/listinfo/gluster-users</blockquote><div><br></div></div></body></html>