[Gluster-Maintainers] [Gluster-devel] Gluster Test Thursday - Release 3.9

Thu Nov 3 11:12:23 UTC 2016

On Thu, Nov 3, 2016 at 9:55 AM, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:

>
>
> On Wed, Nov 2, 2016 at 7:00 PM, Krutika Dhananjay <kdhananj at redhat.com>
> wrote:
>
>> Just finished testing VM storage use-case.
>>
>> *Volume configuration used:*
>>
>> [root@srv-1 ~]# gluster volume info
>>
>> Volume Name: rep
>> Type: Replicate
>> Volume ID: 2c603783-c1da-49b7-8100-0238c777b731
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: srv-1:/bricks/rep1
>> Brick2: srv-2:/bricks/rep2
>> Brick3: srv-3:/bricks/rep4
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> cluster.eager-lock: enable
>> network.remote-dio: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> features.shard: on
>> cluster.granular-entry-heal: on
>> cluster.locking-scheme: granular
>> network.ping-timeout: 30
>> server.allow-insecure: on
>> storage.owner-uid: 107
>> storage.owner-gid: 107
>> cluster.data-self-heal-algorithm: full
>>
>> Used FUSE to mount the volume locally on each of the 3 nodes (no external
>> clients).
>> shard-block-size - 4MB.
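
For anyone reproducing this setup: the reconfigured options above map onto `gluster volume set <VOLNAME> <KEY> <VALUE>` calls. Below is a minimal Python sketch that only builds and prints the command lines for a subset of them; it does not run anything. The helper name `volume_set_commands` and the choice of subset are illustrative assumptions, not part of the original test.

```python
VOLUME = "rep"  # volume name from the `gluster volume info` output above

# A subset of the reconfigured options listed above (key -> value).
# Which options to include here is an arbitrary choice for illustration.
OPTIONS = {
    "features.shard": "on",
    "cluster.locking-scheme": "granular",
    "cluster.data-self-heal-algorithm": "full",
    "cluster.quorum-type": "auto",
    "network.ping-timeout": "30",
}


def volume_set_commands(volume, options):
    """Return one `gluster volume set` command line per option."""
    return [
        " ".join(["gluster", "volume", "set", volume, key, value])
        for key, value in options.items()
    ]


if __name__ == "__main__":
    # Print the commands for review; nothing is executed.
    for cmd in volume_set_commands(VOLUME, OPTIONS):
        print(cmd)
```

Printing rather than executing keeps the sketch safe to run on a machine without Gluster installed.
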
>>
>> *TESTS AND RESULTS:*
>>
>> *What works:*
>>
>> * Created 3 vm images, one per hypervisor. Installed Fedora 24 on all of
>> them. Used virt-manager for ease of setting up the environment.
>> Installation went fine. All green.
>>
>> * Rebooted the vms. Worked fine.
>>
>> * Killed brick-1. Ran dd on the three vms to create a 'src' file.
>> Captured their md5sum value. Verified that
>> the gfid indices and name indices are created under
>> .glusterfs/indices/xattrop and .glusterfs/indices/entry-changes
>> respectively as they should. Brought the brick back up. Waited until heal
>> completed. Captured md5sum again. They matched.
>>
>> * Killed brick-2. Copied 'src' file from the step above into new file
>> using dd. Captured md5sum on the newly created file.
>> Checksum matched. Waited for heal to finish. Captured md5sum again.
>> Everything matched.
>>
>> * Repeated the test above with brick-3 being killed and brought back up
>> after a while. Worked fine.
>>
>> At the end I also captured md5sums from the backend of the shards on the
>> three replicas. They all were found to be
>> in sync. So far so good.
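
The before/after checksum comparison used in the steps above can be sketched roughly as follows. This is an illustration only, not the procedure actually used in the test (which ran md5sum by hand); the helper names `md5sum` and `unchanged_after_heal` are made up for this example.

```python
import hashlib


def md5sum(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unchanged_after_heal(path, checksum_before):
    """True if the current checksum equals the one captured before the heal."""
    return md5sum(path) == checksum_before
```

The idea is to capture `md5sum(path)` while the brick is down, wait for the heal to complete, and then call `unchanged_after_heal(path, saved_value)`.
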
>>
>> *What did NOT work:*
>>
>> * Started dd again on all 3 vms to copy the existing files to new files.
>> While dd was running, I ran replace-brick to replace the third brick with a
>> new brick on the same node with a different path. This caused dd on all
>> three vms to simultaneously fail with "Input/Output error". I tried to read
>> the files, and even that failed. Rebooted the vms. By this time, /.shard was
>> in split-brain as per heal-info, and the vms seem to have suffered
>> corruption and are in an irrecoverable state.
>>
>> I checked the logs. The pattern is very similar to the one in the
>> add-brick bug Lindsay reported here:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1387878
>> It seems something goes wrong each time there is a graph switch.
>>
>> @Aravinda and Pranith:
>>
>> I will need some time to debug this, if the 3.9 release can wait until it
>> is RC'd and fixed.
>> Otherwise we will need to caution users that replace-brick, add-brick, etc.
>> (or any form of graph switch, for that matter) *might* cause vm corruption
>> in 3.9.0, irrespective of whether they are using FUSE or gfapi.
>>
>> Let me know what your decision is.
>>
>
> Since this bug is not a regression let us document this as a known issue.
> Let us do our best to get the fix in next release.
>
> I am almost done with testing afr and ec.
>
> For afr, I did not see leaks etc. in the tests I did.
> But I am seeing a performance drop in the crawl-related tests.
>
> This is with 3.9.0rc2
> running directory_crawl_create ... done (252.91 secs)
> running directory_crawl ... done (104.83 secs)
> running directory_recrawl ... done (71.20 secs)
> running metadata_modify ... done (324.83 secs)
> running directory_crawl_delete ... done (124.22 secs)
>

I guess this was a one-off: I ran it three more times on both 3.8.5 and
3.9.0rc2 and the numbers looked similar. Will try EC once again.

>
> This is with 3.8.5
> running directory_crawl_create ... done (176.48 secs)
> running directory_crawl ... done (9.99 secs)
> running directory_recrawl ... done (7.15 secs)
> running metadata_modify ... done (198.36 secs)
> running directory_crawl_delete ... done (89.32 secs)
>
> I am not seeing good performance with ec in 3.9.0rc2 when compared to
> 3.8.5 either.
>
> With v3.9.0rc2:
> running emptyfiles_create ... done (1278.63 secs)
> running emptyfiles_delete ... done (254.60 secs)
> running smallfiles_create ... done (1663.04 secs)
>
> With v3.8.5:
> emptyfiles_create       756.11
> emptyfiles_delete       349.97
> smallfiles_create       903.47
>
> Functionality is fine in both; only the performance differs. Since these
> are regressions, I will spend some time on them to find out what the reason
> could be.
>
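
To put the EC timings quoted above side by side, here is a minimal Python sketch that computes the 3.9.0rc2 / 3.8.5 ratio per test. The numbers are copied from the runs above; the dictionary and function names are just for illustration, and these are single runs, so treat the ratios as a rough indication only.

```python
# EC timings in seconds, copied from the runs quoted above.
V3_9_0RC2 = {
    "emptyfiles_create": 1278.63,
    "emptyfiles_delete": 254.60,
    "smallfiles_create": 1663.04,
}
V3_8_5 = {
    "emptyfiles_create": 756.11,
    "emptyfiles_delete": 349.97,
    "smallfiles_create": 903.47,
}


def slowdown(new, old):
    """Return new/old per test; values above 1.0 mean `new` took longer."""
    return {name: new[name] / old[name] for name in new if name in old}


if __name__ == "__main__":
    for name, ratio in slowdown(V3_9_0RC2, V3_8_5).items():
        print(f"{name:20s} {ratio:.2f}x")
```

With these inputs, the two create tests come out slower on 3.9.0rc2 (ratio above 1.0), while emptyfiles_delete comes out faster (ratio below 1.0).
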
>
>> -Krutika
>>
>>
>> On Wed, Oct 26, 2016 at 8:04 PM, Aravinda <avishwan at redhat.com> wrote:
>>
>>> Gluster 3.9.0rc2 tarball is available here
>>> http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.9.0rc2.tar.gz
>>>
>>> regards
>>> Aravinda
>>>
>>>
>>> On Tuesday 25 October 2016 04:12 PM, Aravinda wrote:
>>>
>>>> Hi,
>>>>
>>>> Since the automated test framework for Gluster is still in progress, we
>>>> need help from maintainers and developers to test the features and bug
>>>> fixes in order to release Gluster 3.9.
>>>>
>>>> In the last maintainers' meeting, Shyam shared an idea about having a
>>>> Test day to accelerate the testing and release.
>>>>
>>>> Please participate in testing your component(s) on Oct 27, 2016. We
>>>> will prepare the rc2 build by tomorrow and share the details before Test
>>>> day.
>>>>
>>>> RC1 Link: http://www.gluster.org/pipermail/maintainers/2016-September/001442.html
>>>> Release Checklist: https://public.pad.fsfe.org/p/gluster-component-release-checklist
>>>>
>>>>
>>>> Thanks and Regards
>>>> Aravinda and Pranith
>>>>
>>>>
>>>
>>
>>
>
>
> --
> Pranith
>

-- 
Pranith