[Gluster-users] Gluster 3.8.10 rebalance VMs corruption

Mon Mar 20 15:07:36 UTC 2017

I looked at the logs.

From the point where the client switches to the new graph (the one created
by the add-brick command you shared, where bricks 41 through 44 are added;
line 3011 onwards in nfs-gfapi.log), I see the following kinds of errors:

1. Lookups on a bunch of files failed with ENOENT on both replicas, which
protocol/client converts to ESTALE. I am guessing these entries got
migrated to other subvolumes, leading to 'No such file or directory' errors.
DHT and thereafter shard get the same error code and log the following:

[2017-03-17 14:04:26.353444] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht:
<gfid:a68ce411-e381-46a3-93cd-d2af6a7c3532>: failed to lookup the file
on vmware2-dht [Stale file handle]

[2017-03-17 14:04:26.353528] E [MSGID: 133014]
[shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed:
a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]

which is fine.

2. The other kind is AFR logging a possible split-brain, which I suppose
is harmless too:
[2017-03-17 14:23:36.968883] W [MSGID: 108008]
[afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable
subvolume -1 found with event generation 2 for gfid
74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)
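
In case it helps with triage, here is a rough sketch (my own helper, not
part of Gluster) for tallying log records by severity, MSGID and translator,
so the two kinds of messages above can be counted over the whole file. The
record layout in the regex is only what I infer from the excerpts above, and
it assumes each record sits on a single line in the log file (the excerpts
are wrapped by the mail client). The names LOG_RE and tally are made up for
illustration.

```python
import re
from collections import Counter

# Assumed layout of one unwrapped log record, inferred from the excerpts above:
# [timestamp] LEVEL [MSGID: n] [file:line:function] translator: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?P<xlator>\S+):\s+(?P<msg>.*)$"
)


def tally(lines):
    """Count records per (level, MSGID, translator); skip lines that do not match."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:
            counts[(m["level"], m["msgid"], m["xlator"])] += 1
    return counts


if __name__ == "__main__":
    # Two records from this mail, joined back onto single lines.
    sample = [
        "[2017-03-17 14:04:26.353528] E [MSGID: 133014] "
        "[shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed: "
        "a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]",
        "[2017-03-17 14:23:36.968883] W [MSGID: 108008] "
        "[afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable "
        "subvolume -1 found with event generation 2 for gfid "
        "74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)",
    ]
    for key, n in tally(sample).most_common():
        print(n, *key)
```

To run it over the real file, pass an open file handle to tally() instead of
the sample list, e.g. tally(open("nfs-gfapi.log")).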

Since you are saying the bug is hit only on VMs that are undergoing IO
while rebalance is running (as opposed to those that remained powered off),
rebalance + IO could be causing some issues.

CC'ing DHT devs

Raghavendra/Nithya/Susant,

Could you take a look?

-Krutika

On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
wrote:

> Thank you for your email mate.
>
>
> Yes, I'm aware of this, but to save costs I chose replica 2; this cluster
> is all flash.
>
> In version 3.7.x I had issues with the ping timeout: if one host went down
> for a few seconds, the whole cluster hung and became unavailable. To avoid
> this I adjusted the ping timeout to 5 seconds.
>
> As for choosing Ganesha over gfapi, VMware does not support Gluster (FUSE
> or gfapi), so I'm stuck with NFS for this volume.
>
> The other volume is mounted using gfapi in an oVirt cluster.
>
>
>
>
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> ------------------------------
> *From:* Krutika Dhananjay <kdhananj at redhat.com>
> *Sent:* Sunday, March 19, 2017 2:01:49 PM
>
> *To:* Mahdi Adnan
> *Cc:* gluster-users at gluster.org
> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>
> While I'm still going through the logs, just wanted to point out a couple
> of things:
>
> 1. It is recommended that you use 3-way replication (replica count 3) for
> VM store use case
> 2. network.ping-timeout at 5 seconds is way too low. Please change it to
> 30.
>
> Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?
>
> Will get back with anything else I might find or more questions if I have
> any.
>
> -Krutika
>
> On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
> wrote:
>
>> Thanks mate,
>>
>> Kindly, check the attachment.
>>
>>
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> ------------------------------
>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>> *Sent:* Sunday, March 19, 2017 10:00:22 AM
>>
>> *To:* Mahdi Adnan
>> *Cc:* gluster-users at gluster.org
>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>
>> In that case could you share the ganesha-gfapi logs?
>>
>> -Krutika
>>
>> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>> wrote:
>>
>>> I have two volumes: one is mounted using libgfapi for oVirt, and the
>>> other is exported via NFS-Ganesha for VMware, which is the one I'm
>>> testing now.
>>>
>>>
>>>
>>> --
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> ------------------------------
>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>> *Sent:* Sunday, March 19, 2017 8:02:19 AM
>>>
>>> *To:* Mahdi Adnan
>>> *Cc:* gluster-users at gluster.org
>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>
>>>
>>>
>>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>> wrote:
>>>
>>>> Kindly check the attached new log file. I don't know if it's helpful or
>>>> not, but I couldn't find a log with the name you just described.
>>>>
>>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it
>>> NFS?
>>>
>>> -Krutika
>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> ------------------------------
>>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>>> *Sent:* Saturday, March 18, 2017 6:10:40 PM
>>>>
>>>> *To:* Mahdi Adnan
>>>> *Cc:* gluster-users at gluster.org
>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>
>>>> mnt-disk11-vmware2.log seems like a brick log. Could you attach the
>>>> fuse mount logs? It should be right under /var/log/glusterfs/ directory
>>>> named after the mount point name, only hyphenated.
>>>>
>>>> -Krutika
>>>>
>>>> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>>>> wrote:
>>>>
>>>>> Hello Krutika,
>>>>>
>>>>>
>>>>> Kindly, check the attached logs.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Respectfully
>>>>> *Mahdi A. Mahdi*
>>>>>
>>>>> ------------------------------
>>>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>>>> *Sent:* Saturday, March 18, 2017 3:29:03 PM
>>>>> *To:* Mahdi Adnan
>>>>> *Cc:* gluster-users at gluster.org
>>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>>
>>>>> Hi Mahdi,
>>>>>
>>>>> Could you attach mount, brick and rebalance logs?
>>>>>
>>>>> -Krutika
>>>>>
>>>>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan <mahdi.adnan at outlook.com
>>>>> > wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I upgraded to Gluster 3.8.10 today and ran the add-brick
>>>>>> procedure on a volume that contains a few VMs.
>>>>>> After the rebalance completed, I rebooted the VMs; some of
>>>>>> them ran just fine, and others just crashed.
>>>>>> Windows boots to recovery mode, and Linux throws xfs errors and does
>>>>>> not boot.
>>>>>> I ran the test again and it happened just like the first time, but I
>>>>>> noticed that only VMs doing disk IO are affected by this bug.
>>>>>> The VMs that were powered off started fine, and even the md5 of the
>>>>>> disk file did not change after the rebalance.
>>>>>>
>>>>>> Can anyone else confirm this?
>>>>>>
>>>>>>
>>>>>> Volume info:
>>>>>>
>>>>>> Volume Name: vmware2
>>>>>> Type: Distributed-Replicate
>>>>>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 22 x 2 = 44
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: gluster01:/mnt/disk1/vmware2
>>>>>> Brick2: gluster03:/mnt/disk1/vmware2
>>>>>> Brick3: gluster02:/mnt/disk1/vmware2
>>>>>> Brick4: gluster04:/mnt/disk1/vmware2
>>>>>> Brick5: gluster01:/mnt/disk2/vmware2
>>>>>> Brick6: gluster03:/mnt/disk2/vmware2
>>>>>> Brick7: gluster02:/mnt/disk2/vmware2
>>>>>> Brick8: gluster04:/mnt/disk2/vmware2
>>>>>> Brick9: gluster01:/mnt/disk3/vmware2
>>>>>> Brick10: gluster03:/mnt/disk3/vmware2
>>>>>> Brick11: gluster02:/mnt/disk3/vmware2
>>>>>> Brick12: gluster04:/mnt/disk3/vmware2
>>>>>> Brick13: gluster01:/mnt/disk4/vmware2
>>>>>> Brick14: gluster03:/mnt/disk4/vmware2
>>>>>> Brick15: gluster02:/mnt/disk4/vmware2
>>>>>> Brick16: gluster04:/mnt/disk4/vmware2
>>>>>> Brick17: gluster01:/mnt/disk5/vmware2
>>>>>> Brick18: gluster03:/mnt/disk5/vmware2
>>>>>> Brick19: gluster02:/mnt/disk5/vmware2
>>>>>> Brick20: gluster04:/mnt/disk5/vmware2
>>>>>> Brick21: gluster01:/mnt/disk6/vmware2
>>>>>> Brick22: gluster03:/mnt/disk6/vmware2
>>>>>> Brick23: gluster02:/mnt/disk6/vmware2
>>>>>> Brick24: gluster04:/mnt/disk6/vmware2
>>>>>> Brick25: gluster01:/mnt/disk7/vmware2
>>>>>> Brick26: gluster03:/mnt/disk7/vmware2
>>>>>> Brick27: gluster02:/mnt/disk7/vmware2
>>>>>> Brick28: gluster04:/mnt/disk7/vmware2
>>>>>> Brick29: gluster01:/mnt/disk8/vmware2
>>>>>> Brick30: gluster03:/mnt/disk8/vmware2
>>>>>> Brick31: gluster02:/mnt/disk8/vmware2
>>>>>> Brick32: gluster04:/mnt/disk8/vmware2
>>>>>> Brick33: gluster01:/mnt/disk9/vmware2
>>>>>> Brick34: gluster03:/mnt/disk9/vmware2
>>>>>> Brick35: gluster02:/mnt/disk9/vmware2
>>>>>> Brick36: gluster04:/mnt/disk9/vmware2
>>>>>> Brick37: gluster01:/mnt/disk10/vmware2
>>>>>> Brick38: gluster03:/mnt/disk10/vmware2
>>>>>> Brick39: gluster02:/mnt/disk10/vmware2
>>>>>> Brick40: gluster04:/mnt/disk10/vmware2
>>>>>> Brick41: gluster01:/mnt/disk11/vmware2
>>>>>> Brick42: gluster03:/mnt/disk11/vmware2
>>>>>> Brick43: gluster02:/mnt/disk11/vmware2
>>>>>> Brick44: gluster04:/mnt/disk11/vmware2
>>>>>> Options Reconfigured:
>>>>>> cluster.server-quorum-type: server
>>>>>> nfs.disable: on
>>>>>> performance.readdir-ahead: on
>>>>>> transport.address-family: inet
>>>>>> performance.quick-read: off
>>>>>> performance.read-ahead: off
>>>>>> performance.io-cache: off
>>>>>> performance.stat-prefetch: off
>>>>>> cluster.eager-lock: enable
>>>>>> network.remote-dio: enable
>>>>>> features.shard: on
>>>>>> cluster.data-self-heal-algorithm: full
>>>>>> features.cache-invalidation: on
>>>>>> ganesha.enable: on
>>>>>> features.shard-block-size: 256MB
>>>>>> client.event-threads: 2
>>>>>> server.event-threads: 2
>>>>>> cluster.favorite-child-policy: size
>>>>>> storage.build-pgfid: off
>>>>>> network.ping-timeout: 5
>>>>>> cluster.enable-shared-storage: enable
>>>>>> nfs-ganesha: enable
>>>>>> cluster.server-quorum-ratio: 51%
>>>>>>
>>>>>>
>>>>>> Adding bricks:
>>>>>> gluster volume add-brick vmware2 replica 2
>>>>>> gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2
>>>>>> gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
>>>>>>
>>>>>>
>>>>>> Starting fix-layout:
>>>>>> gluster volume rebalance vmware2 fix-layout start
>>>>>>
>>>>>> Starting rebalance:
>>>>>> gluster volume rebalance vmware2 start
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Respectfully
>>>>>> *Mahdi A. Mahdi*
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>