[Gluster-users] Rebalance + VM corruption - current status and request for feedback

Tue Jun 6 06:17:40 UTC 2017

Hi Mahdi,

Did you get a chance to verify this fix again?
If this fix works for you, is it OK if we move this bug to the CLOSED state and
revert the rebalance-cli warning patch?

-Krutika

On Mon, May 29, 2017 at 6:51 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
wrote:

> Hello,
>
>
> Yes, I forgot to upgrade the client as well.
>
> I did the upgrade and created a new volume with the same options as before,
> with one VM running and generating heavy I/O. I started the rebalance with
> force, and after it completed I rebooted the VM; it started normally
> without issues.
>
> I repeated the process and did another rebalance while the VM was running,
> and everything went fine.
>
> However, the client logs are throwing lots of warning messages:
>
>
> [2017-05-29 13:14:59.416382] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 2-gfs_vol2-client-2: remote operation failed. Path:
> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
> [2017-05-29 13:14:59.416427] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 2-gfs_vol2-client-3: remote operation failed. Path:
> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
> [2017-05-29 13:14:59.808251] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 2-gfs_vol2-client-2: remote operation failed. Path:
> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
> [2017-05-29 13:14:59.808287] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 2-gfs_vol2-client-3: remote operation failed. Path:
> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>
>
>
> Although the process went smoothly, I will run another extensive test
> tomorrow just to be sure.
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> ------------------------------
> *From:* Krutika Dhananjay <kdhananj at redhat.com>
> *Sent:* Monday, May 29, 2017 9:20:29 AM
>
> *To:* Mahdi Adnan
> *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin
> Lemonnier
> *Subject:* Re: Rebalance + VM corruption - current status and request for
> feedback
>
> Hi,
>
> I took a look at your logs.
> It looks very much like an issue caused by a mismatch between the glusterfs
> client and server packages.
> Your client (mount) still seems to be running 3.7.20, as confirmed by
> the following log messages:
>
> [2017-05-26 08:58:23.647458] I [MSGID: 100030] [glusterfsd.c:2338:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
> /rhev/data-center/mnt/glusterSD/s1:_testvol)
> [2017-05-26 08:58:40.901204] I [MSGID: 100030] [glusterfsd.c:2338:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
> /rhev/data-center/mnt/glusterSD/s1:_testvol)
> [2017-05-26 08:58:48.923452] I [MSGID: 100030] [glusterfsd.c:2338:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
> /rhev/data-center/mnt/glusterSD/s1:_testvol)
>
> whereas the servers have correctly been upgraded to 3.10.2, as seen in the
> rebalance log:
>
> [2017-05-26 09:36:36.075940] I [MSGID: 100030] [glusterfsd.c:2475:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.2
> (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/testvol
> --xlator-option *dht.use-readdirp=yes
> --xlator-option *dht.lookup-unhashed=yes
> --xlator-option *dht.assert-no-child-down=yes
> --xlator-option *replicate*.data-self-heal=off
> --xlator-option *replicate*.metadata-self-heal=off
> --xlator-option *replicate*.entry-self-heal=off
> --xlator-option *dht.readdir-optimize=on
> --xlator-option *dht.rebalance-cmd=5
> --xlator-option *dht.node-uuid=7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b
> --xlator-option *dht.commit-hash=3376396580
> --socket-file /var/run/gluster/gluster-rebalance-801faefa-a583-46b4-8eef-e0ec160da9ea.sock
> --pid-file /var/lib/glusterd/vols/testvol/rebalance/7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b.pid
> -l /var/log/glusterfs/testvol-rebalance.log)
>
>
> Could you upgrade all packages to 3.10.2 and try again?
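The direct way to rule out this kind of client/server mismatch before a rebalance is to run `glusterfs --version` on every client and server (or `rpm -qa 'glusterfs*'` on CentOS). When only logs are at hand, the sketch below automates the comparison made by eye in this reply: it takes the version from the last "Started running ... version X" startup line of a client mount log and of a server rebalance log and reports whether they agree. It is a hypothetical helper, not part of GlusterFS; the function names `last_started_version` and `check_version_match` are made up for illustration, and it assumes both logs have been copied to one machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with GlusterFS): compare the glusterfs
# version that a client mount log and a server rebalance log report in their
# startup lines (MSGID 100030, "Started running ... version X").

# Print the version from the last "Started running ... version X" line of a log.
last_started_version() {
    grep -o 'Started running /usr/sbin/glusterfs version [0-9][0-9.]*' "$1" \
        | tail -n 1 \
        | awk '{print $NF}'
}

# Compare a client log ($1) with a server log ($2).
# Returns 0 on a match, 1 on a mismatch, 2 if either log has no startup line.
check_version_match() {
    local client_ver server_ver
    client_ver=$(last_started_version "$1")
    server_ver=$(last_started_version "$2")
    if [ -z "$client_ver" ] || [ -z "$server_ver" ]; then
        echo "UNKNOWN: no startup version line found"
        return 2
    fi
    if [ "$client_ver" = "$server_ver" ]; then
        echo "OK: client and server both run $client_ver"
        return 0
    fi
    echo "MISMATCH: client runs $client_ver, server runs $server_ver"
    return 1
}
```

For the two logs quoted in this reply, `check_version_match mount.log /var/log/glusterfs/testvol-rebalance.log` (with `mount.log` standing for a copy of the client mount log) would print `MISMATCH: client runs 3.7.20, server runs 3.10.2` and return 1.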
>
> -Krutika
>
>
> On Fri, May 26, 2017 at 4:46 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
> wrote:
>
>> Hi,
>>
>>
>> Attached are the logs for both the rebalance and the mount.
>>
>>
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> ------------------------------
>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>> *Sent:* Friday, May 26, 2017 1:12:28 PM
>> *To:* Mahdi Adnan
>> *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin
>> Lemonnier
>> *Subject:* Re: Rebalance + VM corruption - current status and request
>> for feedback
>>
>> Could you provide the rebalance and mount logs?
>>
>> -Krutika
>>
>> On Fri, May 26, 2017 at 3:17 PM, Mahdi Adnan <mahdi.adnan at outlook.com>
>> wrote:
>>
>>> Good morning,
>>>
>>>
>>> So I have tested the new Gluster 3.10.2, and after starting the rebalance,
>>> two VMs were paused due to a storage error and the third one was not responding.
>>>
>>> After the rebalance completed, I started the VMs, but they did not boot
>>> and threw an XFS wrong inode error on the screen.
>>>
>>>
>>> My setup:
>>>
>>> 4 nodes running CentOS 7.3 with Gluster 3.10.2
>>>
>>> 4 bricks in a distributed-replicate volume with the group set to virt.
>>>
>>> I added the volume to oVirt and created three VMs, then ran a loop to
>>> create a 5GB file inside the VMs.
>>>
>>> Added 4 new bricks to the existing nodes.
>>>
>>> Started the rebalance with force, to bypass the warning message.
>>>
>>> VMs started to fail after rebalancing.
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> ------------------------------
>>> *From:* Krutika Dhananjay <kdhananj at redhat.com>
>>> *Sent:* Wednesday, May 17, 2017 6:59:20 AM
>>> *To:* gluster-user
>>> *Cc:* Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier; Mahdi
>>> Adnan
>>> *Subject:* Rebalance + VM corruption - current status and request for
>>> feedback
>>>
>>> Hi,
>>>
>>> In the past couple of weeks, we've sent the following fixes concerning
>>> VM corruption during rebalance:
>>> https://review.gluster.org/#/q/status:merged+project:glusterfs+branch:master+topic:bug-1440051
>>>
>>> These fixes are all part of the latest 3.10.2 release.
>>>
>>> Satheesaran at Red Hat has also verified that they work; he is no longer
>>> seeing corruption issues.
>>>
>>> I'd like to hear feedback from the users themselves on these fixes (on
>>> your test environments to begin with) before even changing the status of
>>> the bug to CLOSED.
>>>
>>> Although 3.10.2 has a patch that prevents rebalance sub-commands from
>>> being executed on sharded volumes, you can override the check by using the
>>> 'force' option.
>>>
>>> For example,
>>>
>>> # gluster volume rebalance myvol start force
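The tests later in this thread follow the same pattern: start the forced rebalance, wait for it to finish, then reboot the VM and see whether it boots. A minimal sketch of the waiting step follows. It assumes that `gluster volume rebalance VOLNAME status` lists each node with a status of "in progress" while files are still migrating and "completed" or "failed" once that node is done; check this against the output of your own gluster version. `wait_for_rebalance` is a hypothetical helper, not a gluster sub-command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until a rebalance started with
#   gluster volume rebalance VOLNAME start force
# has finished on every node.
# ASSUMPTION: the status table shows "in progress" for nodes still migrating
# and "completed" or "failed" for nodes that are done.

wait_for_rebalance() {
    local vol="$1" interval="${2:-30}" out
    while true; do
        # Ask glusterd for the per-node rebalance status table.
        if ! out=$(gluster volume rebalance "$vol" status 2>&1); then
            echo "could not query rebalance status for $vol" >&2
            return 2
        fi
        # At least one node is still migrating: wait, then poll again.
        if printf '%s\n' "$out" | grep -q 'in progress'; then
            sleep "$interval"
            continue
        fi
        # Nothing is running any more: show the final table.
        printf '%s\n' "$out"
        # Treat any node reporting "failed" as an unsuccessful rebalance.
        if printf '%s\n' "$out" | grep -qw 'failed'; then
            return 1
        fi
        return 0
    done
}
```

Typical use would be `gluster volume rebalance myvol start force`, then `wait_for_rebalance myvol 30`, and only if that returns 0, rebooting the test VMs to check that they come up cleanly. Polling keeps the helper independent of any rebalance notification mechanism, at the cost of up to one `interval` of extra delay.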
>>>
>>> Very much looking forward to hearing from you all.
>>>
>>> Thanks,
>>> Krutika
>>>
>>
>>
>