[Gluster-users] [ovirt-users] Re: Announcing Gluster release 5.5

Atin Mukherjee amukherj at redhat.com
Fri Mar 29 07:34:16 UTC 2019

On Fri, Mar 29, 2019 at 12:47 PM Krutika Dhananjay <kdhananj at redhat.com> wrote:

> Questions/comments inline ...
> On Thu, Mar 28, 2019 at 10:18 PM <olaf.buitelaar at gmail.com> wrote:
>> Dear All,
>> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While
>> previous upgrades (from 4.1 to 4.2, etc.) went rather smoothly, this one was a
>> different experience. I first did a test upgrade on a 3-node setup,
>> which went fine. I then moved on to the 9-node production platform,
>> unaware of the backward-compatibility issues between gluster 3.12.15 and
>> 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start.
>> Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata
>> was missing or couldn't be accessed. To restore this file, I took a good
>> copy from the underlying bricks, removed the file (and the corresponding
>> gfids) from the bricks where it was 0 bytes and marked with the sticky bit,
>> removed the file from the mount point, and copied the good copy back onto
>> the mount point. After manually mounting the engine domain, manually
>> creating the corresponding symbolic links in /rhev/data-center and
>> /var/run/vdsm/storage, and fixing the ownership back to vdsm.kvm (it was
>> root.root), I was able to start the HA engine again. Since the engine was
>> up again but things seemed rather unstable, I decided to continue the
>> upgrade on the other nodes: suspecting an incompatibility between gluster
>> versions, I thought it would be best to have them all on the same version
>> fairly soon. However, things went from bad to worse: the engine stopped
>> again, and all VMs stopped working as well. So I restored a backup of the
>> engine, taken from version 4.2.8 just before the upgrade, on a machine
>> outside the setup. With this engine I was at least able to start some VMs
>> again and finalize the upgrade. Once upgraded, things didn't stabilize,
>> and I also lost 2 VMs during the process due to image corruption. After
>> figuring out that gluster 5.3 had quite a few issues, I was lucky to see
>> that gluster 5.5 was about to be released; the moment the RPMs were
>> available, I installed them. This helped a lot in terms of stability,
>> for which I'm very grateful! However, the performance is unfortunately
>> terrible: it's about 15% of what it was when running gluster
>> 3.12.15. It's strange, since a simple dd shows OK performance, but our
>> actual workload doesn't. I would have expected the performance to be better,
>> given all the improvements made since gluster 3.12. Does anybody share
>> the same experience?
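The recovery step above involved finding brick copies of the file that were 0 bytes and marked with the sticky bit. A minimal read-only sketch for listing such files on one brick, assuming you pass the brick's root directory and that the brick's internal .glusterfs directory should be skipped. It deletes nothing and does not locate the corresponding gfid entries; the function name is hypothetical.

```python
import os
import stat
from typing import Iterator


def find_zero_byte_sticky_files(brick_root: str) -> Iterator[str]:
    """Yield regular files under brick_root that are 0 bytes with the sticky bit set.

    Read-only: nothing is modified or removed.
    """
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Skip the brick's internal gfid hard-link tree at the top level.
        if dirpath == brick_root and ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # do not follow symlinks
            if (stat.S_ISREG(st.st_mode)
                    and st.st_size == 0
                    and st.st_mode & stat.S_ISVTX):
                yield path
```

Run it against each brick that holds the affected file (for example a hypothetical /gluster/engine/brick) and compare the results with the copies that still have data before removing anything by hand.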
>> I really hope gluster 6 will soon be tested with oVirt and released, and
>> that things start to perform and stabilize again, like the good old days. Of
>> course, if I can do anything, I'm happy to help.
>> Here is a short list of the issues we have had since the migration:
>> Gluster 5.5:
>> -       Poor performance for our workload (mostly write dependent)
> For this, could you share the volume-profile output specifically for the
> affected volume(s)? Here's what you need to do -
> 1. # gluster volume profile $VOLNAME stop
> 2. # gluster volume profile $VOLNAME start
> 3. Run the test inside the vm wherein you see bad performance
> 4. # gluster volume profile $VOLNAME info   # save the output of this command into a file
> 5. # gluster volume profile $VOLNAME stop
> 6. Attach the output file obtained in step 4.
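The profiling steps above can be scripted so that profiling is always stopped again, even if the test fails. A minimal sketch, assuming the gluster CLI is on PATH and that you supply your own `run_workload` callable (hypothetical) that triggers the slow test inside the VM; it only wraps the exact `gluster volume profile` commands from the reply.

```python
import subprocess
from typing import Callable, List


def profile_cmd(volname: str, action: str) -> List[str]:
    """Build `gluster volume profile <VOLNAME> <action>` as an argv list."""
    if action not in ("start", "stop", "info"):
        raise ValueError(f"unsupported profile action: {action}")
    return ["gluster", "volume", "profile", volname, action]


def capture_profile(volname: str, run_workload: Callable[[], None], out_path: str) -> None:
    """Steps 1-5: reset, start, run the test, save `info` output, stop."""
    # Step 1: stop any earlier session; ignore the error if none was running.
    subprocess.run(profile_cmd(volname, "stop"), check=False)
    # Step 2: start a fresh profiling session.
    subprocess.run(profile_cmd(volname, "start"), check=True)
    try:
        # Step 3: reproduce the slow workload.
        run_workload()
        # Step 4: save the profile info output to a file (attach it, step 6).
        info = subprocess.run(profile_cmd(volname, "info"), check=True,
                              capture_output=True, text=True)
        with open(out_path, "w") as fh:
            fh.write(info.stdout)
    finally:
        # Step 5: always stop profiling so it is not left enabled.
        subprocess.run(profile_cmd(volname, "stop"), check=False)
```

Stopping first (step 1) makes sure the counters in the saved output cover only the test run rather than whatever was collected before.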
>> -       VMs randomly pause on unknown storage errors, which are "stale file
>> handle" errors. Corresponding log:
>> Lookup on shard 797 failed. Base file gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle]
> Could you share the complete gluster client log file (it would be a
> filename matching the pattern rhev-data-center-mnt-glusterSD-*), and
> also the output of `gluster volume info $VOLNAME`?
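When going through the client log, it can help to pull the shard number and base gfid out of every "Lookup on shard ... failed" line. A minimal sketch that matches only the message format quoted above; the mapping to a `.shard/<base-gfid>.<shard-number>` name reflects how the shard translator names its pieces, stated here as an assumption to verify on your bricks. Function and class names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Matches the client log message quoted above; other lines are ignored.
SHARD_LOOKUP_RE = re.compile(
    r"Lookup on shard (?P<shard>\d+) failed\. "
    r"Base file gfid = (?P<gfid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r" \[(?P<error>[^\]]+)\]"
)


class ShardLookupFailure(NamedTuple):
    shard: int
    base_gfid: str
    error: str

    @property
    def shard_name(self) -> str:
        # Assumption: shards live under /.shard as <base-gfid>.<shard-number>.
        return f".shard/{self.base_gfid}.{self.shard}"


def parse_shard_failure(line: str) -> Optional[ShardLookupFailure]:
    """Return the parsed failure, or None if the line is not a shard lookup error."""
    m = SHARD_LOOKUP_RE.search(line)
    if m is None:
        return None
    return ShardLookupFailure(int(m["shard"]), m["gfid"], m["error"])
```

Feeding each line of the rhev-data-center-mnt-glusterSD-* log through `parse_shard_failure` gives a list of shard files to compare across the bricks of the volume.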
>> -       Some files are listed twice in a directory (probably related to the
>> stale file issue?)
>> Example:
>> ls -la /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/
>> total 3081
>> drwxr-x---.  2 vdsm kvm    4096 Mar 18 11:34 .
>> drwxr-xr-x. 13 vdsm kvm    4096 Mar 19 09:42 ..
>> -rw-rw----.  1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
>> -rw-rw----.  1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
>> -rw-rw----.  1 vdsm kvm 1048576 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease
>> -rw-r--r--.  1 vdsm kvm     290 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
>> -rw-r--r--.  1 vdsm kvm     290 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
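To check how widespread the double listing is, one can count how often each name comes back from a directory read on the FUSE mount. A minimal sketch with hypothetical function names; it reports names returned more than once, as in the listing above, and changes nothing.

```python
import os
from collections import Counter
from typing import Dict, Iterable


def duplicate_entries(names: Iterable[str]) -> Dict[str, int]:
    """Return each name that appears more than once, with its count."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


def duplicate_entries_in_dir(path: str) -> Dict[str, int]:
    # os.listdir returns the entries readdir reports through the mount,
    # without de-duplicating them.
    return duplicate_entries(os.listdir(path))
```

Running `duplicate_entries_in_dir` over each image directory under /rhev/data-center gives a list of affected directories to attach for the DHT and readdir-ahead maintainers.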
> Adding DHT and readdir-ahead maintainers regarding entries getting listed
> twice.
> @Nithya Balachandran <nbalacha at redhat.com> ^^
> @Gowdappa, Raghavendra <rgowdapp at redhat.com> ^^
> @Poornima Gurusiddaiah <pgurusid at redhat.com> ^^
>> -       Brick processes sometimes start multiple times; sometimes I have 5
>> brick processes for a single volume. Killing all glusterfsd processes for
>> the volume on the machine and running gluster v start <vol> force usually
>> starts just one, and from then on things look all right.
> Did you mean 5 brick processes for a single brick directory?
> +Mohit Agrawal <moagrawa at redhat.com> ^^
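To answer whether the extra processes serve the same brick directory, one can group running glusterfsd processes by brick. A minimal Linux-only sketch, assuming that each glusterfsd command line carries a `--brick-name <path>` argument (verify with `ps -ef | grep glusterfsd` on your nodes). With brick multiplexing, one glusterfsd can serve several bricks, so this grouping is only a first check. All function names are hypothetical.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def brick_name(argv: List[str]) -> str:
    """Return the value of --brick-name from an argv list, or '' if absent."""
    for i, arg in enumerate(argv):
        if arg == "--brick-name" and i + 1 < len(argv):
            return argv[i + 1]
        if arg.startswith("--brick-name="):
            return arg.split("=", 1)[1]
    return ""


def duplicate_bricks(procs: Iterable[Tuple[int, List[str]]]) -> Dict[str, List[int]]:
    """Group glusterfsd PIDs by brick; keep bricks served by more than one PID."""
    by_brick: Dict[str, List[int]] = defaultdict(list)
    for pid, argv in procs:
        if argv and os.path.basename(argv[0]) == "glusterfsd":
            by_brick[brick_name(argv)].append(pid)
    return {brick: pids for brick, pids in by_brick.items() if len(pids) > 1}


def read_proc_cmdlines() -> List[Tuple[int, List[str]]]:
    """Linux-only: (pid, argv) for every readable process in /proc."""
    result = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited or is not readable
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        result.append((int(entry), argv))
    return result
```

`duplicate_bricks(read_proc_cmdlines())` returns an empty dict when every brick has a single process.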

Mohit - Could this be because the following commit is missing from the
release-5 branch? It might be worth backporting this fix.

commit 66986594a9023c49e61b32769b7e6b260b600626
Author: Mohit Agrawal <moagrawal at redhat.com>
Date:   Fri Mar 1 13:41:24 2019 +0530

    glusterfsd: Multiple shd processes are spawned on brick_mux environment

    Problem: Multiple shd processes are spawned while starting volumes
             in a loop on a brick_mux environment. glusterd spawns a process
             based on a pidfile, and the shd daemon takes some time to
             update its pid in the pidfile; because of that, glusterd is
             not able to get the shd pid.

    Solution: Commit cd249f4cb783f8d79e79468c455732669e835a4f changed
              the code to update the pidfile in the parent for any gluster
              daemon after getting the status of the forking child in the
              parent. To resolve this, correct the condition: update the
              pidfile in the parent only for glusterd, and for the rest of
              the daemons the pidfile is updated in

    Change-Id: Ifd14797fa949562594a285ec82d58384ad717e81
    fixes: bz#1684404
    Signed-off-by: Mohit Agrawal <moagrawal at redhat.com>
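The Problem paragraph describes a check-then-spawn race: glusterd decides whether to spawn shd by looking for a pid in the pidfile, but the daemon records its pid only after a delay, so volume starts in quick succession each find no pid and spawn again. The toy, deterministic simulation below only illustrates that race under stated assumptions (one volume start per tick, a fixed pidfile delay); it is not glusterd code, and all names in it are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyManager:
    """Spawns a daemon only if the pidfile holds no pid (check-then-spawn).

    pidfile_delay: ticks the spawned daemon needs before its pid appears in
    the pidfile (0 means the pid is recorded immediately).
    """
    pidfile_delay: int
    pidfile: Optional[int] = None
    spawned: List[int] = field(default_factory=list)
    pending: List[Tuple[int, int]] = field(default_factory=list)  # (due_tick, pid)
    next_pid: int = 1000

    def tick(self, now: int) -> None:
        # Daemons whose startup delay has elapsed write their pid now.
        for due, pid in list(self.pending):
            if due <= now:
                self.pidfile = pid
                self.pending.remove((due, pid))

    def start_volume(self, now: int) -> None:
        self.tick(now)
        if self.pidfile is None:  # check ...
            pid = self.next_pid   # ... then spawn: racy while the pid is pending
            self.next_pid += 1
            self.spawned.append(pid)
            if self.pidfile_delay == 0:
                self.pidfile = pid
            else:
                self.pending.append((now + self.pidfile_delay, pid))


def simulate(pidfile_delay: int, volumes: int) -> int:
    """Start `volumes` volumes one tick apart; return how many daemons spawned."""
    mgr = ToyManager(pidfile_delay)
    for t in range(volumes):
        mgr.start_volume(t)
    return len(mgr.spawned)
```

In this model, `simulate(3, 5)` spawns 3 daemons while `simulate(0, 5)` spawns 1. That recording the pid before the next check avoids duplicates is a property of the toy model only; the actual fix is the one described in the commit.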

> -Krutika
>> oVirt:
>> -       The ownership of all VM images is changed to root.root after the VM
>> is shut down. This is probably related to
>> https://bugzilla.redhat.com/show_bug.cgi?id=1666795, but it is not limited
>> to the HA engine. I'm still in compatibility mode 4.2 for the cluster and
>> for the VMs, but I have upgraded to oVirt 4.3.2.
>> -       The network provider is set to OVN, which is fine, actually cool;
>> only "ovs-vswitchd" is a CPU hog and utilizes 100% CPU.
>> -       It seems that on all nodes vdsm tries to get the stats for the HA
>> engine, which fills the logs with (not sure if this is new):
>> [api.virt] FINISH getStats return={'status': {'message': "Virtual machine
>> does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}", 'code':
>> 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9 (api:54)
>> -       It seems the os_brick package is missing: "[root] managedvolume not
>> supported: Managed Volume Not Supported. Missing package os-brick.: ('Cannot
>> import os_brick',) (caps:149)" fills vdsm.log. I also saw another message
>> about this, so I suspect it will be resolved shortly.
>> -       The machine I used to run the backup HA engine won't get removed
>> from the hosted-engine --vm-status output, not even after running
>> hosted-engine --clean-metadata --host-id=10 --force-clean or hosted-engine
>> --clean-metadata --force-clean from the machine itself.
>> I think that's about it.
>> Don't get me wrong, I don't want to rant; I just wanted to share my
>> experience and see where things can be made better.
>> Best, Olaf
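For the root.root ownership issue in the list above, a quick way to find affected images is to walk the storage domain and report entries not owned by vdsm:kvm. A minimal read-only sketch, assuming vdsm and kvm map to uid/gid 36 as on standard oVirt hosts (check with `id vdsm` before relying on it); the function name is hypothetical and nothing is chowned.

```python
import os
from typing import Iterator, Tuple

# Assumption: vdsm is uid 36 and kvm is gid 36 on these hosts.
VDSM_UID = 36
KVM_GID = 36


def wrong_owner(root: str, uid: int = VDSM_UID,
                gid: int = KVM_GID) -> Iterator[Tuple[str, int, int]]:
    """Yield (path, st_uid, st_gid) for entries under root not owned by uid:gid."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # report symlinks themselves, not their targets
            if (st.st_uid, st.st_gid) != (uid, gid):
                yield path, st.st_uid, st.st_gid
```

Running it against an image directory under /rhev/data-center after a VM shutdown shows whether the ownership flip happened, without changing anything.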
>> _______________________________________________
>> Users mailing list -- users at ovirt.org
>> To unsubscribe send an email to users-leave at ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3CO35Q7VZMWNHS4LPUJNO7S47MGLSKS5/
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users