[Gluster-devel] Gluster Test Thursday - Release 3.9

Pranith Kumar Karampuri pkarampu at redhat.com
Thu Nov 3 04:25:01 UTC 2016


On Wed, Nov 2, 2016 at 7:00 PM, Krutika Dhananjay <kdhananj at redhat.com>
wrote:

> Just finished testing VM storage use-case.
>
> *Volume configuration used:*
>
> [root at srv-1 ~]# gluster volume info
>
> Volume Name: rep
> Type: Replicate
> Volume ID: 2c603783-c1da-49b7-8100-0238c777b731
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: srv-1:/bricks/rep1
> Brick2: srv-2:/bricks/rep2
> Brick3: srv-3:/bricks/rep4
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> features.shard: on
> cluster.granular-entry-heal: on
> cluster.locking-scheme: granular
> network.ping-timeout: 30
> server.allow-insecure: on
> storage.owner-uid: 107
> storage.owner-gid: 107
> cluster.data-self-heal-algorithm: full
>
> Used FUSE to mount the volume locally on each of the 3 nodes (no external
> clients).
> shard-block-size - 4MB.
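>
> (For reference, a rough sketch of how a volume like the one above can be
> created, configured and FUSE-mounted -- hostnames, brick paths and the
> mount point are illustrative, and only a few of the options are shown:)
>
> # create a 1x3 replicate volume across the three nodes
> gluster volume create rep replica 3 \
>     srv-1:/bricks/rep1 srv-2:/bricks/rep2 srv-3:/bricks/rep4
> # apply the virt-store style options listed above, for example:
> gluster volume set rep features.shard on
> gluster volume set rep features.shard-block-size 4MB
> gluster volume set rep cluster.data-self-heal-algorithm full
> gluster volume start rep
> # FUSE-mount it locally on each node
> mount -t glusterfs localhost:/rep /mnt/rep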
>
> *TESTS AND RESULTS:*
>
> *What works:*
>
> * Created 3 VM images, one per hypervisor, and installed Fedora 24 on all of
> them.
>   Used virt-manager for ease of setting up the environment. Installation
> went fine. All green.
>
> * Rebooted the VMs. Worked fine.
>
> * Killed brick-1. Ran dd on the three VMs to create a 'src' file on each and
> captured their md5sum values. Verified that the gfid indices and name indices
> were created under .glusterfs/indices/xattrop and
> .glusterfs/indices/entry-changes respectively, as they should be. Brought the
> brick back up. Waited until heal completed. Captured the md5sums again. They
> matched. (A rough sketch of these checks follows after this list.)
>
> * Killed brick-2. Copied the 'src' file from the step above into a new file
> using dd and captured the md5sum of the newly created file. The checksum
> matched. Waited for heal to finish. Captured the md5sum again.
> Everything matched.
>
> * Repeated the test above with brick-3 being killed and brought back up
> after a while. Worked fine.
>
> At the end I also captured md5sums of the shards from the backend on the
> three replicas. They were all found to be in sync. So far so good.
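>
> (For anyone reproducing the above, the checks were roughly along these
> lines -- the file path inside the VM and the brick used for the index
> checks are illustrative:)
>
> # inside a VM: create the 'src' file and record its checksum
> dd if=/dev/urandom of=/home/test/src bs=1M count=1024
> md5sum /home/test/src
> # on the node whose brick was killed: restart the brick process
> gluster volume start rep force
> # watch heal progress until no entries are pending
> gluster volume heal rep info
> # on the brick: check that the pending-heal indices were created
> ls /bricks/rep1/.glusterfs/indices/xattrop
> ls /bricks/rep1/.glusterfs/indices/entry-changes
> # on each brick: compare shard checksums across the three replicas
> md5sum /bricks/rep1/.shard/*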
>
> *What did NOT work:*
>
> * Started dd again on all 3 VMs to copy the existing files to new files.
> While dd was running, I ran replace-brick to replace the third brick with a
> new brick on the same node at a different path. This caused dd on all
> three VMs to fail simultaneously with "Input/Output error". I tried to read
> from the files; even that failed. Rebooted the VMs. By this time, /.shard was
> in split-brain as per heal-info, and the VMs seem to have suffered corruption
> and are in an irrecoverable state. (The replace-brick step is sketched just
> below.)
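>
> (The replace-brick step was roughly of this form -- the path of the new
> brick is illustrative:)
>
> # replace the third brick with a new, empty brick on the same node
> gluster volume replace-brick rep srv-3:/bricks/rep4 \
>     srv-3:/bricks/rep4_new commit force
> # afterwards, check for pending heals and split-brain entries
> gluster volume heal rep info
> gluster volume heal rep info split-brain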
>
> I checked the logs. The pattern is very similar to the one in the add-brick
> bug Lindsay reported here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1387878. It seems like something
> is going wrong each time there is a graph switch.
>
> @Aravinda and Pranith:
>
> I will need some time to debug this, if the 3.9 release can wait until it is
> RC'd and fixed.
> Otherwise we will need to caution users that replace-brick, add-brick, etc.
> (or any form of graph switch, for that matter) *might* cause VM corruption in
> 3.9.0, irrespective of whether they are using FUSE or gfapi.
>
> Let me know what your decision is.
>

Since this bug is not a regression, let us document it as a known issue and
let us do our best to get the fix into the next release.

I am almost done with testing afr and ec.

For afr, I did not see leaks etc. in the tests I ran, but I am seeing a
performance drop in the crawl-related tests.

This is with 3.9.0rc2:
running directory_crawl_create ... done (252.91 secs)
running directory_crawl ... done (104.83 secs)
running directory_recrawl ... done (71.20 secs)
running metadata_modify ... done (324.83 secs)
running directory_crawl_delete ... done (124.22 secs)

This is with 3.8.5:
running directory_crawl_create ... done (176.48 secs)
running directory_crawl ... done (9.99 secs)
running directory_recrawl ... done (7.15 secs)
running metadata_modify ... done (198.36 secs)
running directory_crawl_delete ... done (89.32 secs)

I am not seeing good performance with ec in 3.9.0rc2 when compared to 3.8.5
either.

With v3.9.0rc2:
running emptyfiles_create ... done (1278.63 secs)
running emptyfiles_delete ... done (254.60 secs)
running smallfiles_create ... done (1663.04 secs)

With v3.8.5 (times in secs):
emptyfiles_create       756.11
emptyfiles_delete       349.97
smallfiles_create       903.47

Functionality is fine in both; it is only the performance that differs. Since
these are regressions, I will spend some time on them to find the reason.


> -Krutika
>
>
> On Wed, Oct 26, 2016 at 8:04 PM, Aravinda <avishwan at redhat.com> wrote:
>
>> Gluster 3.9.0rc2 tarball is available here:
>> http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.9.0rc2.tar.gz
>>
>> regards
>> Aravinda
>>
>>
>> On Tuesday 25 October 2016 04:12 PM, Aravinda wrote:
>>
>>> Hi,
>>>
>>> Since the automated test framework for Gluster is still in progress, we
>>> need help from maintainers and developers to test the features and bug
>>> fixes for the Gluster 3.9 release.
>>>
>>> In the last maintainers' meeting, Shyam shared an idea about having a Test
>>> day to accelerate the testing and release.
>>>
>>> Please participate in testing your component(s) on Oct 27, 2016. We will
>>> prepare the rc2 build by tomorrow and share the details before Test day.
>>>
>>> RC1 Link:
>>> http://www.gluster.org/pipermail/maintainers/2016-September/001442.html
>>> Release Checklist:
>>> https://public.pad.fsfe.org/p/gluster-component-release-checklist
>>>
>>>
>>> Thanks and Regards
>>> Aravinda and Pranith
>>>
>>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>
>
>


-- 
Pranith