[Gluster-Maintainers] [Gluster-devel] Gluster Test Thursday - Release 3.9
Pranith Kumar Karampuri
pkarampu at redhat.com
Thu Nov 3 04:25:01 UTC 2016
On Wed, Nov 2, 2016 at 7:00 PM, Krutika Dhananjay <kdhananj at redhat.com>
> Just finished testing VM storage use-case.
> *Volume configuration used:*
> [root at srv-1 ~]# gluster volume info
> Volume Name: rep
> Type: Replicate
> Volume ID: 2c603783-c1da-49b7-8100-0238c777b731
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Brick1: srv-1:/bricks/rep1
> Brick2: srv-2:/bricks/rep2
> Brick3: srv-3:/bricks/rep4
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> features.shard: on
> cluster.granular-entry-heal: on
> cluster.locking-scheme: granular
> network.ping-timeout: 30
> server.allow-insecure: on
> storage.owner-uid: 107
> storage.owner-gid: 107
> cluster.data-self-heal-algorithm: full
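For reference, a volume with the configuration above could be created roughly as follows. This is a sketch reconstructed from the `volume info` output, not a definitive recipe; only a few of the reconfigured options are shown.

```shell
# Sketch: recreate the 1x3 replica volume from the 'volume info' above
# and apply some of the key virt-store options listed there.
gluster volume create rep replica 3 \
    srv-1:/bricks/rep1 srv-2:/bricks/rep2 srv-3:/bricks/rep4
gluster volume set rep features.shard on
gluster volume set rep features.shard-block-size 4MB
gluster volume set rep cluster.quorum-type auto
gluster volume set rep network.remote-dio enable
gluster volume start rep
```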
> Used FUSE to mount the volume locally on each of the 3 nodes (no external
> clients). shard-block-size: 4MB.
> *TESTS AND RESULTS:*
> *What works:*
> * Created 3 vm images, one per hypervisor. Installed Fedora 24 on all of
> them. Used virt-manager for ease of setting up the environment. Installation
> went fine. All green.
> * Rebooted the vms. Worked fine.
> * Killed brick-1. Ran dd on the three vms to create a 'src' file. Captured
> their md5sum value. Verified that
> the gfid indices and name indices are created under
> .glusterfs/indices/xattrop and .glusterfs/indices/entry-changes
> respectively as they should. Brought the brick back up. Waited until heal
> completed. Captured md5sum again. They matched.
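The checksum verification in the steps above can be sketched in plain shell. The /tmp paths here are illustrative stand-ins for the VM disk files; the real test ran dd inside the VMs.

```shell
# Minimal sketch of the consistency check described above: create a 'src'
# file with dd, copy it, and confirm the two md5sums match.
dd if=/dev/urandom of=/tmp/src bs=1M count=4 2>/dev/null
dd if=/tmp/src of=/tmp/dst bs=1M 2>/dev/null
SRC=$(md5sum /tmp/src | awk '{print $1}')
DST=$(md5sum /tmp/dst | awk '{print $1}')
[ "$SRC" = "$DST" ] && echo "checksums match"
```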
> * Killed brick-2. Copied 'src' file from the step above into new file
> using dd. Captured md5sum on the newly created file.
> Checksum matched. Waited for heal to finish. Captured md5sum again.
> Everything matched.
> * Repeated the test above with brick-3 being killed and brought back up
> after a while. Worked fine.
> At the end I also captured md5sums from the backend of the shards on the
> three replicas. They all were found to be
> in sync. So far so good.
> *What did NOT work:*
> * Started dd again on all 3 vms to copy the existing files to new files.
> While dd was running, I ran replace-brick to replace the third brick with a
> new brick on the same node with a different path. This caused dd on all
> three vms to simultaneously fail with "Input/Output error". I tried to read
> off the files, even that failed. Rebooted the vms. By this time, /.shard is
> split-brain as per heal-info. And the vms seem to have suffered corruption
> and are in an irrecoverable state.
> I checked the logs. The pattern is very much similar to the one in the
> add-brick bug Lindsay reported here - https://bugzilla.redhat.com/
> show_bug.cgi?id=1387878. Seems like something is going wrong each time
> there is a graph switch.
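For anyone trying to reproduce or confirm this state, something like the following could be used to check heal-info and scan the client logs for the pattern mentioned. This is a hedged sketch; "rep" is the volume name from the config earlier, and the log path is the usual default, which may differ on your setup.

```shell
# Sketch: confirm the split-brain state and look for the I/O error pattern.
gluster volume heal rep info split-brain      # list entries in split-brain
grep -iE "split-brain|Input/output error" \
    /var/log/glusterfs/*.log | tail           # matching client-log messages
```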
> @Aravinda and Pranith:
> I will need some time to debug this, so it would help if the 3.9 release
> can wait until the bug is root-caused and fixed.
> Otherwise we will need to caution users that replace-brick, add-brick, etc.
> (or any form of graph switch, for that matter) *might* cause vm corruption
> in 3.9.0, irrespective of whether they are using FUSE or gfapi.
> Let me know what your decision is.
Since this bug is not a regression, let us document it as a known issue and
do our best to get the fix into the next release.
I am almost done with testing afr and ec.
For afr, I did not see leaks etc. in the tests I ran, but I am seeing a
performance drop in the crawl-related tests.
This is with 3.9.0rc2:
running directory_crawl_create ... done (252.91 secs)
running directory_crawl ... done (104.83 secs)
running directory_recrawl ... done (71.20 secs)
running metadata_modify ... done (324.83 secs)
running directory_crawl_delete ... done (124.22 secs)
This is with 3.8.5
running directory_crawl_create ... done (176.48 secs)
running directory_crawl ... done (9.99 secs)
running directory_recrawl ... done (7.15 secs)
running metadata_modify ... done (198.36 secs)
running directory_crawl_delete ... done (89.32 secs)
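Working out the ratios from the two sets of afr numbers above makes the regression easier to see; the crawl tests are roughly an order of magnitude slower, the rest under 2x. A quick awk sketch over the quoted figures:

```shell
# Slowdown ratios, 3.9.0rc2 vs 3.8.5, from the afr numbers quoted above.
awk 'BEGIN {
  printf "directory_crawl_create: %.1fx\n", 252.91/176.48
  printf "directory_crawl:        %.1fx\n", 104.83/9.99
  printf "directory_recrawl:      %.1fx\n", 71.20/7.15
  printf "metadata_modify:        %.1fx\n", 324.83/198.36
  printf "directory_crawl_delete: %.1fx\n", 124.22/89.32
}'
```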
I am also not seeing good performance with ec in 3.9.0rc2 when compared to
3.8.5:
running emptyfiles_create ... done (1278.63 secs)
running emptyfiles_delete ... done (254.60 secs)
running smallfiles_create ... done (1663.04 secs)
Functionality is fine in both; it is only the performance that has dropped.
Since these are regressions, I will spend some time on these to find what
could be the cause.
> On Wed, Oct 26, 2016 at 8:04 PM, Aravinda <avishwan at redhat.com> wrote:
>> Gluster 3.9.0rc2 tarball is available here
>> On Tuesday 25 October 2016 04:12 PM, Aravinda wrote:
>>> Since Automated test framework for Gluster is in progress, we need help
>>> from Maintainers and developers to test the features and bug fixes to
>>> release Gluster 3.9.
>>> In last maintainers meeting Shyam shared an idea about having a Test day
>>> to accelerate the testing and release.
>>> Please participate in testing your component(s) on Oct 27, 2016. We will
>>> prepare the rc2 build by tomorrow and share the details before Test day.
>>> RC1 Link: http://www.gluster.org/pipermail/maintainers/2016-September/
>>> Release Checklist: https://public.pad.fsfe.org/p/
>>> Thanks and Regards
>>> Aravinda and Pranith