[Gluster-Maintainers] [Gluster-devel] Gluster Test Thursday - Release 3.9

FNU Raghavendra Manjunath rabhat at redhat.com
Fri Nov 4 16:02:25 UTC 2016


Tested bitrot-related aspects. Created data, enabled bitrot and created
more data. The files were signed by the bitrot daemon. Simulated
corruption by editing a file directly in the brick backend.
Triggered scrubbing (on demand) and found that the corrupted files were
marked bad by the scrubber.
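
For reference, a rough sketch of the sequence used (the volume name and
brick/file paths below are placeholders, not the exact ones from my setup):

# enable bitrot on an existing, started volume; this starts the
# bit-rot signer and scrubber daemons
gluster volume bitrot testvol enable

# once signing has happened, the signature xattr should be visible
# directly on the brick
getfattr -d -m . -e hex /bricks/brick1/file1

# simulate corruption by modifying the file in the brick backend,
# then trigger an on-demand scrub
echo garbage >> /bricks/brick1/file1
gluster volume bitrot testvol scrub ondemand

# the scrubber should flag the corrupted file; check with
gluster volume bitrot testvol scrub status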

Also ran general tests such as compiling the gluster code base on the
mount point and running dbench. The tests passed cleanly.
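
Roughly along these lines (the mount point is a placeholder):

# build the gluster code base on a FUSE mount of the volume
cd /mnt/testvol
git clone https://github.com/gluster/glusterfs.git
cd glusterfs && ./autogen.sh && ./configure && make -j4

# dbench with 10 clients, run against the same mount
dbench -D /mnt/testvol 10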

Still running some more tests. Will keep you updated.

Regards,
Raghavendra


On Fri, Nov 4, 2016 at 12:43 AM, Pranith Kumar Karampuri <
pkarampu at redhat.com> wrote:

>
>
> On Thu, Nov 3, 2016 at 4:42 PM, Pranith Kumar Karampuri <
> pkarampu at redhat.com> wrote:
>
>>
>>
>> On Thu, Nov 3, 2016 at 9:55 AM, Pranith Kumar Karampuri <
>> pkarampu at redhat.com> wrote:
>>
>>>
>>>
>>> On Wed, Nov 2, 2016 at 7:00 PM, Krutika Dhananjay <kdhananj at redhat.com>
>>> wrote:
>>>
>>>> Just finished testing VM storage use-case.
>>>>
>>>> *Volume configuration used:*
>>>>
>>>> [root at srv-1 ~]# gluster volume info
>>>>
>>>> Volume Name: rep
>>>> Type: Replicate
>>>> Volume ID: 2c603783-c1da-49b7-8100-0238c777b731
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 1 x 3 = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: srv-1:/bricks/rep1
>>>> Brick2: srv-2:/bricks/rep2
>>>> Brick3: srv-3:/bricks/rep4
>>>> Options Reconfigured:
>>>> nfs.disable: on
>>>> performance.readdir-ahead: on
>>>> transport.address-family: inet
>>>> performance.quick-read: off
>>>> performance.read-ahead: off
>>>> performance.io-cache: off
>>>> performance.stat-prefetch: off
>>>> cluster.eager-lock: enable
>>>> network.remote-dio: enable
>>>> cluster.quorum-type: auto
>>>> cluster.server-quorum-type: server
>>>> features.shard: on
>>>> cluster.granular-entry-heal: on
>>>> cluster.locking-scheme: granular
>>>> network.ping-timeout: 30
>>>> server.allow-insecure: on
>>>> storage.owner-uid: 107
>>>> storage.owner-gid: 107
>>>> cluster.data-self-heal-algorithm: full
>>>>
>>>> Used FUSE to mount the volume locally on each of the 3 nodes (no
>>>> external clients).
>>>> shard-block-size - 4MB.
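>>>>
>>>> For completeness, the shard size and mounts were set up roughly like
>>>> this (the mount path is a placeholder):
>>>>
>>>> gluster volume set rep features.shard-block-size 4MB
>>>> # repeated locally on each of the three nodes
>>>> mount -t glusterfs srv-1:/rep /mnt/rep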
>>>>
>>>> *TESTS AND RESULTS:*
>>>>
>>>> *What works:*
>>>>
>>>> * Created 3 vm images, one per hypervisor. Installed Fedora 24 on all
>>>> of them. Used virt-manager for ease of setting up the environment.
>>>> Installation went fine. All green.
>>>>
>>>> * Rebooted the vms. Worked fine.
>>>>
>>>> * Killed brick-1. Ran dd on the three vms to create a 'src' file.
>>>> Captured their md5sum values. Verified that
>>>> the gfid indices and name indices were created under
>>>> .glusterfs/indices/xattrop and .glusterfs/indices/entry-changes
>>>> respectively, as they should be. Brought the brick back up. Waited until
>>>> heal completed. Captured md5sums again. They matched (a command-level
>>>> sketch of this kill-brick/heal cycle follows after this list).
>>>>
>>>> * Killed brick-2. Copied the 'src' file from the step above into a new
>>>> file using dd. Captured the md5sum of the newly created file; the
>>>> checksums matched. Waited for heal to finish. Captured md5sums again.
>>>> Everything matched.
>>>>
>>>> * Repeated the test above with brick-3 being killed and brought back up
>>>> after a while. Worked fine.
>>>>
>>>> At the end I also captured md5sums of the shards from the backend on
>>>> the three replicas. They were all found to be in sync. So far so good.
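>>>>
>>>> The command-level sketch of the kill-brick/heal cycle referred to above,
>>>> with placeholder PIDs, paths and sizes (not the exact ones I used):
>>>>
>>>> # find and kill the brick process for the brick under test
>>>> gluster volume status rep        # note the PID of srv-1:/bricks/rep1
>>>> kill <brick-pid>
>>>>
>>>> # inside each vm, create the test file and record its checksum
>>>> dd if=/dev/urandom of=src bs=1M count=512 oflag=direct
>>>> md5sum src
>>>>
>>>> # on the surviving bricks, confirm the pending-heal indices exist
>>>> ls /bricks/rep2/.glusterfs/indices/xattrop
>>>> ls /bricks/rep2/.glusterfs/indices/entry-changes
>>>>
>>>> # restart the killed brick and wait for heal to drain
>>>> gluster volume start rep force
>>>> gluster volume heal rep info     # repeat until no entries are listed
>>>>
>>>> # checksums should match the values captured earlier
>>>> md5sum src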
>>>>
>>>> *What did NOT work:*
>>>>
>>>> * Started dd again on all 3 vms to copy the existing files to new
>>>> files. While dd was running, I ran replace-brick to replace the third brick
>>>> with a new brick on the same node with a different path (see the sketch
>>>> below). This caused dd on all three vms to fail simultaneously with
>>>> "Input/output error". I tried to read the files; even that failed.
>>>> Rebooted the vms. By this time, /.shard was in split-brain as per
>>>> heal info. The vms seem to have suffered corruption and are in an
>>>> irrecoverable state.
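>>>>
>>>> The replace-brick was of this form (the new brick path is illustrative),
>>>> and the split-brain on /.shard showed up in heal info:
>>>>
>>>> gluster volume replace-brick rep srv-3:/bricks/rep4 srv-3:/bricks/rep5 \
>>>>   commit force
>>>> gluster volume heal rep info split-brain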
>>>>
>>>> I checked the logs. The pattern is very similar to the one in the
>>>> add-brick bug Lindsay reported here -
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1387878. It seems like
>>>> something is going wrong each time there is a graph switch.
>>>>
>>>> @Aravinda and Pranith:
>>>>
>>>> I will need some time to debug this, if the 3.9 release can wait until
>>>> the bug is root-caused and fixed.
>>>> Otherwise we will need to caution users not to do replace-brick,
>>>> add-brick etc. (or any form of graph switch for that matter), since these
>>>> *might* cause vm corruption in 3.9.0, irrespective of whether they are
>>>> using FUSE or gfapi.
>>>>
>>>> Let me know what your decision is.
>>>>
>>>
>>> Since this bug is not a regression, let us document it as a known
>>> issue and do our best to get the fix into the next release.
>>>
>>> I am almost done with testing afr and ec.
>>>
>>> For afr, there were no leaks etc. in the tests I did,
>>> but I am seeing a performance drop for crawling-related tests.
>>>
>>> This is with 3.9.0rc2
>>> running directory_crawl_create ... done (252.91 secs)
>>> running directory_crawl ... done (104.83 secs)
>>> running directory_recrawl ... done (71.20 secs)
>>> running metadata_modify ... done (324.83 secs)
>>> running directory_crawl_delete ... done (124.22 secs)
>>>
>>
>> I guess this was a one-off: I ran it again three times for both 3.8.5 and
>> 3.9.0rc2 and the numbers looked similar. Will try EC once again.
>>
>>
>>>
>>> This is with 3.8.5
>>> running directory_crawl_create ... done (176.48 secs)
>>> running directory_crawl ... done (9.99 secs)
>>> running directory_recrawl ... done (7.15 secs)
>>> running metadata_modify ... done (198.36 secs)
>>> running directory_crawl_delete ... done (89.32 secs)
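>>>
>>> In other words, create/modify/delete are roughly 1.3x-1.6x slower on
>>> 3.9.0rc2, while directory_crawl and directory_recrawl are roughly 10x
>>> slower (104.83 vs 9.99 secs and 71.20 vs 7.15 secs).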
>>>
>>> I am not seeing good performance with ec in 3.9.0rc2 when compared to
>>> 3.8.5 either.
>>>
>>> With v3.9.0rc2:
>>> running emptyfiles_create ... done (1278.63 secs)
>>> running emptyfiles_delete ... done (254.60 secs)
>>> running smallfiles_create ... done (1663.04 secs)
>>>
>>>
>>
>>> With v3.8.5 (times in secs):
>>> emptyfiles_create       756.11
>>> emptyfiles_delete       349.97
>>> smallfiles_create       903.47
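>>>
>>> That is roughly 1.7x slower for emptyfiles_create (1278.63 vs 756.11 secs)
>>> and 1.8x slower for smallfiles_create (1663.04 vs 903.47 secs), while
>>> emptyfiles_delete is actually somewhat faster on 3.9.0rc2.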
>>>
>>> Functionality is fine in both; only the performance differs. Since these
>>> are regressions, I will spend some time on them to find the reason.
>>>
>>
> I think the number of xattr calls increased significantly in 3.9.x. I had
> guessed only the setxattr calls would increase, but I was wrong. Will need
> to investigate more.
>
> root@dhcp35-190 - ~/ec-strace
> 10:10:22 :) ⚡ grep xattr 3.9-syscalls.txt
>  197657 fgetxattr
>    8417 fsetxattr
> 8520762 lgetxattr
>   26202 llistxattr
> 1011455 lsetxattr
>
>
> root@dhcp35-190 - ~/ec-strace-3.8.5
> 10:10:10 :) ⚡ grep xattr 3.8.5-syscalls.txt
>  512140 fgetxattr
>    3226 fsetxattr
> 7206715 lgetxattr
>       8 llistxattr
>  605425 lsetxattr
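>
> (For context: the counts above are from strace output; one plausible way
> to reproduce that kind of per-syscall summary, with a placeholder pid, is:)
>
> # capture syscalls of the process under test into a single file
> strace -f -o syscalls.txt -p <glusterfs-pid>
>
> # count xattr-related syscalls by name
> grep -oE '[a-z_]*xattr' syscalls.txt | sort | uniq -c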
>
>
>
>>
>>>
>>>> -Krutika
>>>>
>>>>
>>>> On Wed, Oct 26, 2016 at 8:04 PM, Aravinda <avishwan at redhat.com> wrote:
>>>>
>>>>> Gluster 3.9.0rc2 tarball is available here:
>>>>> http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.9.0rc2.tar.gz
>>>>>
>>>>> regards
>>>>> Aravinda
>>>>>
>>>>>
>>>>> On Tuesday 25 October 2016 04:12 PM, Aravinda wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Since the automated test framework for Gluster is still in progress, we
>>>>>> need help from maintainers and developers to test the features and bug
>>>>>> fixes in order to release Gluster 3.9.
>>>>>>
>>>>>> In the last maintainers meeting, Shyam shared an idea about having a
>>>>>> Test Day to accelerate the testing and release.
>>>>>>
>>>>>> Please participate in testing your component(s) on Oct 27, 2016. We
>>>>>> will prepare the rc2 build by tomorrow and share the details before the
>>>>>> Test Day.
>>>>>>
>>>>>> RC1 Link:
>>>>>> http://www.gluster.org/pipermail/maintainers/2016-September/001442.html
>>>>>> Release Checklist:
>>>>>> https://public.pad.fsfe.org/p/gluster-component-release-checklist
>>>>>>
>>>>>>
>>>>>> Thanks and Regards
>>>>>> Aravinda and Pranith
>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> Gluster-devel mailing list
>>>>> Gluster-devel at gluster.org
>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Pranith
>>>
>>
>>
>>
>> --
>> Pranith
>>
>
>
>
> --
> Pranith
>
> _______________________________________________
> maintainers mailing list
> maintainers at gluster.org
> http://www.gluster.org/mailman/listinfo/maintainers
>
>