[Gluster-devel] Gluster Test Thursday - Release 3.9
kdhananj at redhat.com
Wed Nov 2 13:30:02 UTC 2016
Just finished testing the VM storage use-case.
*Volume configuration used:*
[root@srv-1 ~]# gluster volume info
Volume Name: rep
Volume ID: 2c603783-c1da-49b7-8100-0238c777b731
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Used FUSE to mount the volume locally on each of the 3 nodes (no external
clients). Sharding enabled with a shard-block-size of 4MB.
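For reference, creating a volume with this configuration would look roughly
like the following; the hostnames and brick paths are stand-ins, not the
ones from the test setup:

gluster volume create rep replica 3 \
    srv-1:/bricks/rep srv-2:/bricks/rep srv-3:/bricks/rep
gluster volume set rep features.shard on
gluster volume set rep features.shard-block-size 4MB
gluster volume start rep

# FUSE-mount locally on each node
mount -t glusterfs localhost:/rep /mnt/rep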
*TESTS AND RESULTS:*
* Created 3 vm images, one per hypervisor. Installed Fedora 24 on all of
them. Used virt-manager for ease of setting up the environment. Installation
went fine. All green. (See the sketch below for how an image can be created
on the FUSE mount.)
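For illustration, creating one such image on the FUSE mount could be done
with qemu-img; the image name and size here are made up for the example:

qemu-img create -f qcow2 /mnt/rep/vm1.qcow2 20G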
* Rebooted the vms. Worked fine.
* Killed brick-1. Ran dd on the three vms to create a 'src' file. Captured
their md5sum values. Verified that
the gfid indices and name indices were created under
.glusterfs/indices/xattrop and .glusterfs/indices/entry-changes
respectively, as they should be. Brought the brick back up. Waited until heal
completed. Captured md5sums again. They matched. (A sketch of this cycle
follows the list of steps below.)
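A rough sketch of that kill/heal cycle, assuming the example brick layout
from above (paths and dd sizes are assumptions, not the exact values used):

# Find the PID of brick-1 and kill it
gluster volume status rep
kill -9 <brick-1-pid>

# Inside each VM: create the source file and record its checksum
dd if=/dev/urandom of=src bs=1M count=1024
md5sum src

# On a surviving brick: pending-heal indices should show up
ls /bricks/rep/.glusterfs/indices/xattrop/
ls /bricks/rep/.glusterfs/indices/entry-changes/

# Restart the killed brick and wait for heal to drain
gluster volume start rep force
gluster volume heal rep info    # repeat until zero entries remain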
* Killed brick-2. Copied the 'src' file from the step above into a new file
using dd. Captured md5sum on the newly created file. The checksum matched.
Brought the brick back up and waited for heal to finish. Captured md5sum
again; it still matched.
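Inside each VM, the copy-and-verify step might look like this (the file
names are assumed):

dd if=src of=dst bs=1M
md5sum src dst    # the two checksums should be identical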
* Repeated the test above with brick-3 being killed and brought back up
after a while. Worked fine.
At the end I also captured md5sums of the shards from the bricks on the
three replicas. They were all found to be in sync. So far so good.
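That backend comparison can be sketched as follows, again assuming the
example brick path; run this on each node and diff the three outputs:

md5sum /bricks/rep/.shard/* | sort -k2 > /tmp/shards-$(hostname).md5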
*What did NOT work:*
* Started dd again on all 3 vms to copy the existing files to new files.
While dd was running, I ran replace-brick to replace the third brick with a
new brick on the same node at a different path. This caused dd on all
three vms to fail simultaneously with "Input/output error". I then tried to
read the files back; even that failed. Rebooted the vms. By this time,
/.shard was in split-brain as per heal-info, and the vms seem to have
suffered corruption and are in an irrecoverable state.
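The replace-brick step that triggered this would have been of the following
form (the old and new brick paths are stand-ins):

gluster volume replace-brick rep \
    srv-3:/bricks/rep srv-3:/bricks/rep-new commit force

# Afterwards, heal-info reports the split-brain:
gluster volume heal rep info split-brain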
I checked the logs. The pattern is very similar to the one in the
add-brick bug Lindsay reported here -
https://bugzilla.redhat.com/show_bug.cgi?id=1387878. It seems like something
is going wrong every time there is a graph switch.
@Aravinda and Pranith:
I will need some time to debug this, if the 3.9 release can wait until the
bug is root-caused and fixed.
Otherwise we will need to caution users that replace-brick, add-brick, etc.
(or any form of graph switch, for that matter) *might* cause vm corruption,
irrespective of whether they are using FUSE or gfapi.
Let me know what your decision is.
On Wed, Oct 26, 2016 at 8:04 PM, Aravinda <avishwan at redhat.com> wrote:
> Gluster 3.9.0rc2 tarball is available here
> On Tuesday 25 October 2016 04:12 PM, Aravinda wrote:
>> Since the automated test framework for Gluster is still in progress, we need
>> help from maintainers and developers to test the features and bug fixes for
>> the Gluster 3.9 release.
>> In the last maintainers' meeting, Shyam shared an idea about having a Test
>> day to accelerate the testing and release.
>> Please participate in testing your component(s) on Oct 27, 2016. We will
>> prepare the rc2 build by tomorrow and share the details before Test day.
>> RC1 Link: http://www.gluster.org/pipermail/maintainers/2016-September/
>> Release Checklist: https://public.pad.fsfe.org/p/
>> Thanks and Regards
>> Aravinda and Pranith