[Gluster-users] Rebalance + VM corruption - current status and request for feedback

Shyam srangana at redhat.com
Mon Jun 5 13:13:30 UTC 2017


Just to be clear, the release notes still carry the warning about this,
and the requirement to use force when doing a rebalance is still in place.

Now that we have received feedback that this works, both will be removed
in subsequent minor releases for the various release streams, as appropriate.

Thanks,
Shyam

On 06/05/2017 07:36 AM, Gandalf Corvotempesta wrote:
> Great, thanks!
>
> On 5 Jun 2017 at 6:49 AM, "Krutika Dhananjay" <kdhananj at redhat.com> wrote:
>
>     The fixes are already available in 3.10.2, 3.8.12 and 3.11.0
>
>     -Krutika
>
>     On Sun, Jun 4, 2017 at 5:30 PM, Gandalf Corvotempesta
>     <gandalf.corvotempesta at gmail.com> wrote:
>
>         Great news.
>         Is this planned to be published in next release?
>
>         On 29 May 2017 at 3:27 PM, "Krutika Dhananjay" <kdhananj at redhat.com> wrote:
>
>             Thanks for that update. Very happy to hear it ran fine
>             without any issues. :)
>
>             Yeah so you can ignore those 'No such file or directory'
>             errors. They represent a transient state where DHT in the
>             client process is yet to figure out the new location of the
>             file.
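>
>             If you want to double-check that those warnings are indeed
>             harmless, a simple spot check is to stat the affected image
>             from the mount once the rebalance has finished and confirm
>             the lookup now succeeds; the mount path, file name and log
>             file name below are only placeholders for your own:
>
>             # stat /rhev/data-center/mnt/glusterSD/<server>:_<volume>/<path-to-image>
>             # grep -c "MSGID: 114031" /var/log/glusterfs/<mount-log-file>.log
>
>             The grep count should stop growing once DHT has learned the
>             new location of the file.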
>
>             -Krutika
>
>
>             On Mon, May 29, 2017 at 6:51 PM, Mahdi Adnan
>             <mahdi.adnan at outlook.com> wrote:
>
>                 Hello,
>
>
>                 Yes, I forgot to upgrade the client as well.
>
>                 I did the upgrade and created a new volume with the same
>                 options as before, with one VM running and doing lots of
>                 I/O. I started the rebalance with force, and after the
>                 process completed I rebooted the VM; it started normally
>                 without issues.
>
>                 I repeated the process and did another rebalance while
>                 the VM was running, and everything went fine.
>
>                 However, the client logs are throwing lots of warning
>                 messages:
>
>
>                 [2017-05-29 13:14:59.416382] W [MSGID: 114031]
>                 [client-rpc-fops.c:2928:client3_3_lookup_cbk]
>                 2-gfs_vol2-client-2: remote operation failed. Path:
>                 /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>                 (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or
>                 directory]
>                 [2017-05-29 13:14:59.416427] W [MSGID: 114031]
>                 [client-rpc-fops.c:2928:client3_3_lookup_cbk]
>                 2-gfs_vol2-client-3: remote operation failed. Path:
>                 /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>                 (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or
>                 directory]
>                 [2017-05-29 13:14:59.808251] W [MSGID: 114031]
>                 [client-rpc-fops.c:2928:client3_3_lookup_cbk]
>                 2-gfs_vol2-client-2: remote operation failed. Path:
>                 /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>                 (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or
>                 directory]
>                 [2017-05-29 13:14:59.808287] W [MSGID: 114031]
>                 [client-rpc-fops.c:2928:client3_3_lookup_cbk]
>                 2-gfs_vol2-client-3: remote operation failed. Path:
>                 /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>                 (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or
>                 directory]
>
>
>
>                 Although the process went smoothly, I will run another
>                 extensive test tomorrow just to be sure.
>
>                 --
>
>                 Respectfully,
>                 Mahdi A. Mahdi
>
>                 ------------------------------------------------------------------------
>                 *From:* Krutika Dhananjay <kdhananj at redhat.com>
>                 *Sent:* Monday, May 29, 2017 9:20:29 AM
>
>                 *To:* Mahdi Adnan
>                 *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay
>                 Mathieson; Kevin Lemonnier
>                 *Subject:* Re: Rebalance + VM corruption - current
>                 status and request for feedback
>
>                 Hi,
>
>                 I took a look at your logs.
>                 It very much looks like an issue caused by a version
>                 mismatch between the glusterfs client and server packages.
>                 Your client (mount) still seems to be running 3.7.20, as
>                 confirmed by the following log messages:
>
>                 [2017-05-26 08:58:23.647458] I [MSGID: 100030]
>                 [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started
>                 running /usr/sbin/glusterfs version 3.7.20 (args:
>                 /usr/sbin/glusterfs --volfile-server=s1
>                 --volfile-server=s2 --volfile-server=s3
>                 --volfile-server=s4 --volfile-id=/testvol
>                 /rhev/data-center/mnt/glusterSD/s1:_testvol)
>                 [2017-05-26 08:58:40.901204] I [MSGID: 100030]
>                 [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started
>                 running /usr/sbin/glusterfs version 3.7.20 (args:
>                 /usr/sbin/glusterfs --volfile-server=s1
>                 --volfile-server=s2 --volfile-server=s3
>                 --volfile-server=s4 --volfile-id=/testvol
>                 /rhev/data-center/mnt/glusterSD/s1:_testvol)
>                 [2017-05-26 08:58:48.923452] I [MSGID: 100030]
>                 [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started
>                 running /usr/sbin/glusterfs version 3.7.20 (args:
>                 /usr/sbin/glusterfs --volfile-server=s1
>                 --volfile-server=s2 --volfile-server=s3
>                 --volfile-server=s4 --volfile-id=/testvol
>                 /rhev/data-center/mnt/glusterSD/s1:_testvol)
>
>                 whereas the servers have rightly been upgraded to
>                 3.10.2, as seen in the rebalance log:
>
>                 [2017-05-26 09:36:36.075940] I [MSGID: 100030]
>                 [glusterfsd.c:2475:main] 0-/usr/sbin/glusterfs: Started
>                 running /usr/sbin/glusterfs version 3.10.2 (args:
>                 /usr/sbin/glusterfs -s localhost --volfile-id
>                 rebalance/testvol --xlator-option *dht.use-readdirp=yes
>                 --xlator-option *dht.lookup-unhashed=yes --xlator-option
>                 *dht.assert-no-child-down=yes --xlator-option
>                 *replicate*.data-self-heal=off --xlator-option
>                 *replicate*.metadata-self-heal=off --xlator-option
>                 *replicate*.entry-self-heal=off --xlator-option
>                 *dht.readdir-optimize=on --xlator-option
>                 *dht.rebalance-cmd=5 --xlator-option
>                 *dht.node-uuid=7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b
>                 --xlator-option *dht.commit-hash=3376396580 --socket-file
>                 /var/run/gluster/gluster-rebalance-801faefa-a583-46b4-8eef-e0ec160da9ea.sock
>                 --pid-file
>                 /var/lib/glusterd/vols/testvol/rebalance/7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b.pid
>                 -l /var/log/glusterfs/testvol-rebalance.log)
>
>
>                 Could you upgrade all packages to 3.10.2 and try again?
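>
>                 A quick way to confirm that the client and the servers
>                 agree on the version after the upgrade (the rpm query
>                 assumes the CentOS/RPM packaging used here) would be
>                 something like:
>
>                 # glusterfs --version           # on the client / hypervisor
>                 # gluster --version             # on each server node
>                 # rpm -qa | grep -i glusterfs   # list the installed gluster packages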
>
>                 -Krutika
>
>
>                 On Fri, May 26, 2017 at 4:46 PM, Mahdi Adnan
>                 <mahdi.adnan at outlook.com> wrote:
>
>                     Hi,
>
>
>                     Attached are the logs for both the rebalance and the
>                     mount.
>
>
>
>                     --
>
>                     Respectfully,
>                     Mahdi A. Mahdi
>
>                     ------------------------------------------------------------------------
>                     *From:* Krutika Dhananjay <kdhananj at redhat.com>
>                     *Sent:* Friday, May 26, 2017 1:12:28 PM
>                     *To:* Mahdi Adnan
>                     *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay
>                     Mathieson; Kevin Lemonnier
>                     *Subject:* Re: Rebalance + VM corruption - current
>                     status and request for feedback
>
>                     Could you provide the rebalance and mount logs?
>
>                     -Krutika
>
>                     On Fri, May 26, 2017 at 3:17 PM, Mahdi Adnan
>                     <mahdi.adnan at outlook.com> wrote:
>
>                         Good morning,
>
>
>                         So I have tested the new Gluster 3.10.2, and
>                         after starting the rebalance two VMs were paused
>                         due to a storage error and a third one stopped
>                         responding.
>
>                         After the rebalance completed I started the VMs,
>                         but they did not boot and threw an XFS wrong
>                         inode error on the screen.
>
>
>                         My setup:
>
>                         4 nodes running CentOS 7.3 with Gluster 3.10.2.
>
>                         4 bricks in a distributed-replicate volume with
>                         the group set to virt.
>
>                         I added the volume to oVirt and created three
>                         VMs, then ran a loop creating 5GB files inside
>                         the VMs.
>
>                         Added 4 new bricks to the existing nodes.
>
>                         Started the rebalance with force to bypass the
>                         warning message.
>
>                         VMs started to fail after rebalancing.
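>
>                         For reference, the expand-and-rebalance steps
>                         above roughly correspond to the following
>                         commands; the brick paths are placeholders, not
>                         the ones actually used:
>
>                         # gluster volume set testvol group virt
>                         # gluster volume add-brick testvol s1:/bricks/b2 s2:/bricks/b2 s3:/bricks/b2 s4:/bricks/b2
>                         # gluster volume rebalance testvol start force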
>
>
>
>
>                         --
>
>                         Respectfully,
>                         Mahdi A. Mahdi
>
>                         ------------------------------------------------------------------------
>                         *From:* Krutika Dhananjay <kdhananj at redhat.com>
>                         *Sent:* Wednesday, May 17, 2017 6:59:20 AM
>                         *To:* gluster-user
>                         *Cc:* Gandalf Corvotempesta; Lindsay Mathieson;
>                         Kevin Lemonnier; Mahdi Adnan
>                         *Subject:* Rebalance + VM corruption - current
>                         status and request for feedback
>
>                         Hi,
>
>                         In the past couple of weeks, we've sent the
>                         following fixes concerning VM corruption upon
>                         doing rebalance -
>                         https://review.gluster.org/#/q/status:merged+project:glusterfs+branch:master+topic:bug-1440051
>
>                         These fixes are very much part of the latest
>                         3.10.2 release.
>
>                         Satheesaran at Red Hat has also verified that
>                         the fixes work, and he is not seeing corruption
>                         issues anymore.
>
>                         I'd like to hear feedback from the users
>                         themselves on these fixes (on your test
>                         environments to begin with) before even changing
>                         the status of the bug to CLOSED.
>
>                         Although 3.10.2 has a patch that prevents
>                         rebalance sub-commands from being executed on
>                         sharded volumes, you can override the check by
>                         using the 'force' option.
>
>                         For example,
>
>                         # gluster volume rebalance myvol start force
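>
>                         Progress can then be tracked with the usual
>                         status sub-command:
>
>                         # gluster volume rebalance myvol status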
>
>                         Very much looking forward to hearing from you all.
>
>                         Thanks,
>                         Krutika
>
>
>
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>

