[Gluster-users] Gluster 3.8.10 rebalance VMs corruption

Krutika Dhananjay kdhananj at redhat.com
Sun Mar 19 11:01:49 UTC 2017


While I'm still going through the logs, just wanted to point out a couple
of things:

1. It is recommended that you use 3-way replication (replica count 3) for
the VM store use case.
2. network.ping-timeout at 5 seconds is way too low. Please change it to 30.
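
For reference, a minimal sketch of both changes (volume name taken from your
output below; the brick paths are placeholders, and moving to replica 3 needs
one new brick per existing replica pair, i.e. 22 in this case):

gluster volume set vmware2 network.ping-timeout 30
gluster volume add-brick vmware2 replica 3 <host>:/<new-brick> ...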

Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?

I will get back to you with anything else I find, or with more questions if
I have any.

-Krutika

On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
wrote:

> Thanks mate,
>
> Kindly, check the attachment.
>
>
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> ------------------------------
> *From:* Krutika Dhananjay <kdhananj at redhat.com>
> *Sent:* Sunday, March 19, 2017 10:00:22 AM
>
> *To:* Mahdi Adnan
> *Cc:* gluster-users at gluster.org
> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>
> In that case could you share the ganesha-gfapi logs?
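> (Those typically land in /var/log/ganesha-gfapi.log, though the exact
> location can vary by distribution and Ganesha configuration.)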
>
> -Krutika
>
> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
> wrote:
>
>> I have two volumes; one is mounted using libgfapi for the oVirt mount, and
>> the other is exported via NFS-Ganesha for VMware, which is the one I'm
>> testing now.
>>
>>
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> ------------------------------
>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>> *Sent:* Sunday, March 19, 2017 8:02:19 AM
>>
>> *To:* Mahdi Adnan
>> *Cc:* gluster-users at gluster.org
>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>
>>
>>
>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>> wrote:
>>
>>> Kindly check the attached new log file. I don't know if it's helpful or
>>> not, but I couldn't find the log with the name you just described.
>>>
>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it NFS?
>>
>> -Krutika
>>
>>>
>>>
>>> --
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> ------------------------------
>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>> *Sent:* Saturday, March 18, 2017 6:10:40 PM
>>>
>>> *To:* Mahdi Adnan
>>> *Cc:* gluster-users at gluster.org
>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>
>>> mnt-disk11-vmware2.log seems like a brick log. Could you attach the fuse
>>> mount logs? They should be right under the /var/log/glusterfs/ directory,
>>> named after the mount point path with the slashes replaced by hyphens.
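>>>
>>> For example, a client mount at a hypothetical /mnt/vmstore would log to
>>> /var/log/glusterfs/mnt-vmstore.log; brick logs, by contrast, live under
>>> /var/log/glusterfs/bricks/.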
>>>
>>> -Krutika
>>>
>>> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>> wrote:
>>>
>>>> Hello Krutika,
>>>>
>>>>
>>>> Kindly, check the attached logs.
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> ------------------------------
>>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>>> *Sent:* Saturday, March 18, 2017 3:29:03 PM
>>>> *To:* Mahdi Adnan
>>>> *Cc:* gluster-users at gluster.org
>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>
>>>> Hi Mahdi,
>>>>
>>>> Could you attach mount, brick and rebalance logs?
>>>>
>>>> -Krutika
>>>>
>>>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have upgraded to Gluster 3.8.10 today and ran the add-brick
>>>>> procedure on a volume containing a few VMs.
>>>>> After the rebalance completed, I rebooted the VMs; some of them ran
>>>>> just fine, and others just crashed.
>>>>> Windows boots to recovery mode, and Linux throws XFS errors and does
>>>>> not boot.
>>>>> I ran the test again and it happened just as the first time, but I
>>>>> noticed that only VMs doing disk I/O are affected by this bug.
>>>>> The VMs that were powered off started fine, and even the md5 of their
>>>>> disk files did not change after the rebalance.
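>>>>>
>>>>> (A sketch of that comparison, with a hypothetical disk image path on
>>>>> the client mount, run before and after the rebalance:
>>>>> md5sum /mnt/vmware2/<vm-disk-image>)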
>>>>>
>>>>> Can anyone else confirm this?
>>>>>
>>>>>
>>>>> Volume info:
>>>>>
>>>>> Volume Name: vmware2
>>>>> Type: Distributed-Replicate
>>>>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>>>>> Status: Started
>>>>> Snapshot Count: 0
>>>>> Number of Bricks: 22 x 2 = 44
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: gluster01:/mnt/disk1/vmware2
>>>>> Brick2: gluster03:/mnt/disk1/vmware2
>>>>> Brick3: gluster02:/mnt/disk1/vmware2
>>>>> Brick4: gluster04:/mnt/disk1/vmware2
>>>>> Brick5: gluster01:/mnt/disk2/vmware2
>>>>> Brick6: gluster03:/mnt/disk2/vmware2
>>>>> Brick7: gluster02:/mnt/disk2/vmware2
>>>>> Brick8: gluster04:/mnt/disk2/vmware2
>>>>> Brick9: gluster01:/mnt/disk3/vmware2
>>>>> Brick10: gluster03:/mnt/disk3/vmware2
>>>>> Brick11: gluster02:/mnt/disk3/vmware2
>>>>> Brick12: gluster04:/mnt/disk3/vmware2
>>>>> Brick13: gluster01:/mnt/disk4/vmware2
>>>>> Brick14: gluster03:/mnt/disk4/vmware2
>>>>> Brick15: gluster02:/mnt/disk4/vmware2
>>>>> Brick16: gluster04:/mnt/disk4/vmware2
>>>>> Brick17: gluster01:/mnt/disk5/vmware2
>>>>> Brick18: gluster03:/mnt/disk5/vmware2
>>>>> Brick19: gluster02:/mnt/disk5/vmware2
>>>>> Brick20: gluster04:/mnt/disk5/vmware2
>>>>> Brick21: gluster01:/mnt/disk6/vmware2
>>>>> Brick22: gluster03:/mnt/disk6/vmware2
>>>>> Brick23: gluster02:/mnt/disk6/vmware2
>>>>> Brick24: gluster04:/mnt/disk6/vmware2
>>>>> Brick25: gluster01:/mnt/disk7/vmware2
>>>>> Brick26: gluster03:/mnt/disk7/vmware2
>>>>> Brick27: gluster02:/mnt/disk7/vmware2
>>>>> Brick28: gluster04:/mnt/disk7/vmware2
>>>>> Brick29: gluster01:/mnt/disk8/vmware2
>>>>> Brick30: gluster03:/mnt/disk8/vmware2
>>>>> Brick31: gluster02:/mnt/disk8/vmware2
>>>>> Brick32: gluster04:/mnt/disk8/vmware2
>>>>> Brick33: gluster01:/mnt/disk9/vmware2
>>>>> Brick34: gluster03:/mnt/disk9/vmware2
>>>>> Brick35: gluster02:/mnt/disk9/vmware2
>>>>> Brick36: gluster04:/mnt/disk9/vmware2
>>>>> Brick37: gluster01:/mnt/disk10/vmware2
>>>>> Brick38: gluster03:/mnt/disk10/vmware2
>>>>> Brick39: gluster02:/mnt/disk10/vmware2
>>>>> Brick40: gluster04:/mnt/disk10/vmware2
>>>>> Brick41: gluster01:/mnt/disk11/vmware2
>>>>> Brick42: gluster03:/mnt/disk11/vmware2
>>>>> Brick43: gluster02:/mnt/disk11/vmware2
>>>>> Brick44: gluster04:/mnt/disk11/vmware2
>>>>> Options Reconfigured:
>>>>> cluster.server-quorum-type: server
>>>>> nfs.disable: on
>>>>> performance.readdir-ahead: on
>>>>> transport.address-family: inet
>>>>> performance.quick-read: off
>>>>> performance.read-ahead: off
>>>>> performance.io-cache: off
>>>>> performance.stat-prefetch: off
>>>>> cluster.eager-lock: enable
>>>>> network.remote-dio: enable
>>>>> features.shard: on
>>>>> cluster.data-self-heal-algorithm: full
>>>>> features.cache-invalidation: on
>>>>> ganesha.enable: on
>>>>> features.shard-block-size: 256MB
>>>>> client.event-threads: 2
>>>>> server.event-threads: 2
>>>>> cluster.favorite-child-policy: size
>>>>> storage.build-pgfid: off
>>>>> network.ping-timeout: 5
>>>>> cluster.enable-shared-storage: enable
>>>>> nfs-ganesha: enable
>>>>> cluster.server-quorum-ratio: 51%
>>>>>
>>>>>
>>>>> Adding bricks:
>>>>> gluster volume add-brick vmware2 replica 2
>>>>> gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2
>>>>> gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
>>>>>
>>>>>
>>>>> starting fix layout:
>>>>> gluster volume rebalance vmware2 fix-layout start
>>>>>
>>>>> Starting rebalance:
>>>>> gluster volume rebalance vmware2 start
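>>>>>
>>>>> Progress can be watched with the standard status command:
>>>>> gluster volume rebalance vmware2 status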
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Respectfully
>>>>> *Mahdi A. Mahdi*
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>
>>>>
>>>
>>
>