[Gluster-users] [ovirt-users] Re: Announcing Gluster release 5.5

Mohit Agrawal moagrawa at redhat.com
Wed Apr 3 02:55:59 UTC 2019


Hi Olaf,

As per the currently attached "multi-glusterfsd-vol3.txt |
multi-glusterfsd-vol4.txt", multiple processes are running for the
"ovirt-core" and "ovirt-engine" bricks, but there are no logs specific to
these bricks in bricklogs.zip; bricklogs.zip has a dump of the ovirt-kube
logs only.

Kindly share the brick logs specific to the "ovirt-core" and "ovirt-engine"
bricks, and also share the glusterd logs.
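
For reference, a typical way to collect these (assuming the default log
locations, which also match the -l paths in the brick command lines from
your ps output) would be something like:

# gather the ovirt-core / ovirt-engine brick logs plus the glusterd log
tar czf gluster-logs-$(hostname).tar.gz \
  /var/log/glusterfs/glusterd.log \
  /var/log/glusterfs/bricks/*ovirt-core*.log \
  /var/log/glusterfs/bricks/*ovirt-engine*.log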

Regards
Mohit Agrawal

On Tue, Apr 2, 2019 at 9:18 PM Olaf Buitelaar <olaf.buitelaar at gmail.com>
wrote:

> Dear Krutika,
>
> 1.
> I've changed the volume settings; write performance seems to have increased
> somewhat, however the profile doesn't really support that, since latencies
> increased. Read performance, on the other hand, has diminished, which does
> seem to be supported by the profile runs (attached).
> Also the IO does seem to behave more consistently than before.
> I don't really understand the idea behind these settings, maybe you can
> explain why these suggestions are good?
> They seem to avoid as much local caching and access as possible and push
> everything to the gluster processes, while I would expect local access and
> local caches to be a good thing, since they would lead to less network and
> disk access.
> I tried to investigate these settings a bit more, and this is what I
> understood of them:
> - network.remote-dio: when on, it seems to ignore the O_DIRECT flag in the
> client, thus causing the files to be cached and buffered in the page cache
> on the client. I would expect this to be a good thing, especially if the
> server process would access the same page cache?
> At least that is what I grasp from this commit:
> https://review.gluster.org/#/c/glusterfs/+/4206/2/xlators/protocol/client/src/client.c line
> 867
> Also found this commit:
> https://github.com/gluster/glusterfs/commit/06c4ba589102bf92c58cd9fba5c60064bc7a504e#diff-938709e499b4383c3ed33c3979b9080c suggesting
> remote-dio actually improves performance, though I'm not sure whether it's a
> write or read benchmark.
> When a file is opened with O_DIRECT it will also disable the write-behind
> functionality.
>
> - performance.strict-o-direct: when on, the AFR will not ignore the
> O_DIRECT flag and will invoke fop_writev_stub with the wb_writev_helper,
> which seems to stack the operation, though I have no idea why that is. But
> generally I suppose not ignoring the O_DIRECT flag in the AFR is a good
> thing when a process requests O_DIRECT. So this makes sense to me.
>
> - cluster.choose-local: when off, it doesn't prefer the local node, but
> always chooses a brick. Since it's a 9 node cluster with 3 subvolumes, only
> 1/3 could end up local and the other 2/3 would be pushed to external nodes
> anyway. Or am I making a totally wrong assumption here? (A quick way to
> verify the current values of these settings is sketched right after this
> list.)
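>
> To verify the current values after changing them, I use something like this
> (<VOL> being a placeholder for the volume name):
>
> gluster volume get <VOL> network.remote-dio
> gluster volume get <VOL> performance.strict-o-direct
> gluster volume get <VOL> cluster.choose-local
> gluster volume get <VOL> performance.write-behind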
>
> It seems this config is moving toward the gluster-block config side of
> things, which does make sense.
> Since we're running quite a few mysql instances, which open their files with
> O_DIRECT I believe, it would mean the only layer of cache is within mysql
> itself. Which you could argue is a good thing. But I would expect a little
> write-behind buffering, and maybe some data cached within gluster, to
> alleviate things a bit on gluster's side. But I wouldn't know if that's the
> correct mindset, so I might be totally off here.
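>
> (This is just how I checked that assumption on one of the vm's; an
> innodb_flush_method of O_DIRECT or O_DIRECT_NO_FSYNC means mysql opens its
> data files with O_DIRECT and bypasses the guest page cache:)
>
> mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_flush_method'"
>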
> Also, I would expect these gluster v set <VOL> commands to be online
> operations, but somehow the bricks went down after applying these changes.
> What appears to have happened is that after the update the brick process
> was restarted, but due to the multiple-brick-process start issue, multiple
> processes were started and the brick didn't come online again.
> However, I'll try to reproduce this, since I would like to test with
> cluster.choose-local on and see how performance compares, and hopefully
> collect some useful info when it occurs.
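>
> (While reproducing I plan to watch for the duplicate brick processes with
> something along these lines, matching on the --brick-name argument:)
>
> # more than 1 hit per brick path means the multi-process issue occurred
> pgrep -af glusterfsd | grep -o -e '--brick-name [^ ]*' | sort | uniq -c
> gluster volume status <VOL>   # should list exactly one PID per brick
>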
> Question: are network.remote-dio and performance.strict-o-direct mutually
> exclusive settings, or can they both be on?
>
> 2. I've attached all brick logs; the only relevant thing I found was:
> [2019-03-28 20:20:07.170452] I [MSGID: 113030]
> [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix:
> open-fd-key-status: 0 for
> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
> [2019-03-28 20:20:07.170491] I [MSGID: 113031]
> [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr
> status: 0 for
> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
> [2019-03-28 20:20:07.248480] I [MSGID: 113030]
> [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix:
> open-fd-key-status: 0 for
> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
> [2019-03-28 20:20:07.248491] I [MSGID: 113031]
> [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr
> status: 0 for
> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
>
> Thanks Olaf
>
> ps. Sorry, I needed to resend since it exceeded the file size limit.
>
> On Mon, Apr 1, 2019 at 07:56, Krutika Dhananjay <kdhananj at redhat.com>
> wrote:
>
>> Adding back gluster-users
>> Comments inline ...
>>
>> On Fri, Mar 29, 2019 at 8:11 PM Olaf Buitelaar <olaf.buitelaar at gmail.com>
>> wrote:
>>
>>> Dear Krutika,
>>>
>>>
>>>
>>> 1. I've made 2 profile runs of around 10 minutes (see files
>>> profile_data.txt and profile_data2.txt). Looking at them, most time seems
>>> to be spent in the fops fsync and readdirp.
>>>
>>> Unfortunately I don't have the profile info for the 3.12.15 version, so
>>> it's a bit hard to compare.
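>>>
>>> (For completeness, the profile data was gathered roughly like this, <VOL>
>>> being the volume name:)
>>>
>>> gluster volume profile <VOL> start
>>> # ... let the normal workload run for ~10 minutes ...
>>> gluster volume profile <VOL> info > profile_data.txt
>>> gluster volume profile <VOL> stop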
>>>
>>> One additional thing I do notice: on one machine (10.32.9.5) the iowait
>>> time increased a lot, from an average below 1% to around 12% after the
>>> upgrade.
>>>
>>> So the first suspicion would be that lightning strikes twice and I now
>>> also have a bad disk, but that doesn't appear to be the case, since all
>>> SMART statuses report ok.
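>>>
>>> (Roughly how I checked the disk; /dev/sdX stands for the brick device:)
>>>
>>> smartctl -a /dev/sdX | grep -i -e overall-health -e reallocated -e pending
>>> iostat -xm 5 3   # watch %util and await for the brick device under load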
>>>
>>> Also, dd shows the performance I would more or less expect:
>>>
>>> dd if=/dev/zero of=/data/test_file  bs=100M count=1  oflag=dsync
>>>
>>> 1+0 records in
>>>
>>> 1+0 records out
>>>
>>> 104857600 bytes (105 MB) copied, 0.686088 s, 153 MB/s
>>>
>>> dd if=/dev/zero of=/data/test_file  bs=1G count=1  oflag=dsync
>>>
>>> 1+0 records in
>>>
>>> 1+0 records out
>>>
>>> 1073741824 bytes (1.1 GB) copied, 7.61138 s, 141 MB/s
>>>
>>> dd if=/dev/urandom of=/data/test_file  bs=1024 count=1000000
>>>
>>> 1000000+0 records in
>>>
>>> 1000000+0 records out
>>>
>>> 1024000000 bytes (1.0 GB) copied, 6.35051 s, 161 MB/s
>>>
>>> dd if=/dev/zero of=/data/test_file  bs=1024 count=1000000
>>>
>>> 1000000+0 records in
>>>
>>> 1000000+0 records out
>>>
>>> 1024000000 bytes (1.0 GB) copied, 1.6899 s, 606 MB/s
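>>>
>>> (As an extra data point I could also run the same test with direct I/O,
>>> which should be closer to what qemu does with O_DIRECT, since dsync still
>>> goes through the page cache and only syncs after each write:)
>>>
>>> dd if=/dev/zero of=/data/test_file bs=100M count=1 oflag=direct
>>> dd if=/dev/zero of=/data/test_file bs=1M count=1024 oflag=direct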
>>>
>>> When I disable this brick (service glusterd stop; pkill glusterfsd),
>>> performance in gluster is better, but not on par with what it was. Also
>>> the cpu usage on the "neighbor" nodes which host the other bricks in the
>>> same subvolume increases quite a lot in this case, which I wouldn't
>>> expect, since they shouldn't handle much more work except flagging shards
>>> to heal. Iowait also goes to idle once gluster is stopped, so it's for
>>> sure gluster which is waiting for io.
>>>
>>>
>>>
>>
>> So I see that FSYNC %-latency is on the higher side. And I also noticed
>> you don't have direct-io options enabled on the volume.
>> Could you set the following options on the volume -
>> # gluster volume set <VOLNAME> network.remote-dio off
>> # gluster volume set <VOLNAME> performance.strict-o-direct on
>> and also disable choose-local
>> # gluster volume set <VOLNAME> cluster.choose-local off
>>
>> let me know if this helps.
>>
>>> 2. I've attached the mnt log and volume info, but I couldn't find
>>> anything relevant in those logs. I think this is because we run the VM's
>>> with libgfapi:
>>>
>>> [root@ovirt-host-01 ~]# engine-config -g LibgfApiSupported
>>>
>>> LibgfApiSupported: true version: 4.2
>>>
>>> LibgfApiSupported: true version: 4.1
>>>
>>> LibgfApiSupported: true version: 4.3
>>>
>>> And I can confirm the qemu process is invoked with the gluster://
>>> address for the images.
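>>>
>>> (Checked with something like:)
>>>
>>> ps -eo args | grep [q]emu-kvm | grep -o 'gluster://[^ ,]*' | sort -u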
>>>
>>> The message is logged in the /var/lib/libvirt/qemu/<machine> file,
>>> which I've also included. For a sample case, see around 2019-03-28 20:20:07.
>>>
>>> Which has the error: E [MSGID: 133010]
>>> [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on
>>> shard 109886 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c
>>> [Stale file handle]
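>>>
>>> (In case it helps, a way to inspect that shard directly on the underlying
>>> bricks, on each brick of the replica set, using the brick path and gfid
>>> from the brick logs, would be:)
>>>
>>> ls -l /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
>>> getfattr -d -m . -e hex /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886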
>>>
>>
>> Could you also attach the brick logs for this volume?
>>
>>
>>>
>>> 3. Yes, I see multiple instances for the same brick directory, like:
>>>
>>> /usr/sbin/glusterfsd -s 10.32.9.6 --volfile-id
>>> ovirt-core.10.32.9.6.data-gfs-bricks-brick1-ovirt-core -p
>>> /var/run/gluster/vols/ovirt-core/10.32.9.6-data-gfs-bricks-brick1-ovirt-core.pid
>>> -S /var/run/gluster/452591c9165945d9.socket --brick-name
>>> /data/gfs/bricks/brick1/ovirt-core -l
>>> /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log
>>> --xlator-option *-posix.glusterd-uuid=fb513da6-f3bd-4571-b8a2-db5efaf60cc1
>>> --process-name brick --brick-port 49154 --xlator-option
>>> ovirt-core-server.listen-port=49154
>>>
>>>
>>>
>>> I’ve made an export of the output of ps from the time I observed these
>>> multiple processes.
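>>>
>>> (For reference, something like this captures the same information:)
>>>
>>> ps -e -ww -o pid,lstart,args | grep [g]lusterfsd > ps-glusterfsd-$(date +%F).txt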
>>>
>>> In addition to the brick_mux bug noted by Atin, I might also have another
>>> possible cause: as ovirt moves nodes from non-operational or maintenance
>>> state to active/activating, it also seems to restart gluster. However, I
>>> don't have direct proof for this theory.
>>>
>>>
>>>
>>
>> +Atin Mukherjee <amukherj at redhat.com> ^^
>> +Mohit Agrawal <moagrawa at redhat.com>  ^^
>>
>> -Krutika
>>
>> Thanks Olaf
>>>
>>> On Fri, Mar 29, 2019 at 10:03, Sandro Bonazzola <sbonazzo at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Mar 28, 2019 at 17:48, <olaf.buitelaar at gmail.com> wrote:
>>>>
>>>>> Dear All,
>>>>>
>>>>> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While
>>>>> previous upgrades from 4.1 to 4.2 etc. went rather smoothly, this one was
>>>>> a different experience. After first trying a test upgrade on a 3 node
>>>>> setup, which went fine, I headed to upgrade the 9 node production
>>>>> platform, unaware of the backward compatibility issues between gluster
>>>>> 3.12.15 -> 5.3. After upgrading 2 nodes, the HA engine stopped and
>>>>> wouldn't start. Vdsm wasn't able to mount the engine storage domain,
>>>>> since /dom_md/metadata was missing or couldn't be accessed.
>>>>>
>>>>> I restored this file by getting a good copy from the underlying bricks,
>>>>> removing the file (and the corresponding gfid's) from the underlying
>>>>> bricks where it was 0 bytes and marked with the sticky bit, removing the
>>>>> file from the mount point, and copying the good copy back onto the mount
>>>>> point. After manually mounting the engine domain, manually creating the
>>>>> corresponding symbolic links in /rhev/data-center and
>>>>> /var/run/vdsm/storage, and fixing the ownership back to vdsm.kvm (which
>>>>> was root.root), I was able to start the HA engine again.
>>>>>
>>>>> Since the engine was up again but things seemed rather unstable, I
>>>>> decided to continue the upgrade on the other nodes. Suspecting an
>>>>> incompatibility in gluster versions, I thought it would be best to have
>>>>> them all on the same version rather soonish. However things went from bad
>>>>> to worse: the engine stopped again, and all vm's stopped working as well.
>>>>> So on a machine outside the setup I restored a backup of the engine taken
>>>>> from version 4.2.8 just before the upgrade. With this engine I was at
>>>>> least able to start some vm's again and finalize the upgrade. Once the
>>>>> upgrade was done, things didn't stabilize and I also lost 2 vm's during
>>>>> the process due to image corruption.
>>>>>
>>>>> After figuring out gluster 5.3 had quite some issues, I was lucky to see
>>>>> gluster 5.5 was about to be released; the moment the RPM's were available
>>>>> I installed those. This helped a lot in terms of stability, for which I'm
>>>>> very grateful! However the performance is unfortunately terrible; it's
>>>>> about 15% of what the performance was running gluster 3.12.15. It's
>>>>> strange, since a simple dd shows ok performance but our actual workload
>>>>> doesn't, while I would expect the performance to be better due to all the
>>>>> improvements made since gluster version 3.12. Does anybody share the same
>>>>> experience?
>>>>>
>>>>> I really hope gluster 6 will soon be tested with ovirt and released, and
>>>>> things start to perform and stabilize again... like the good old days. Of
>>>>> course if I can do anything, I'm happy to help.
>>>>>
>>>>
>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1693998 to track
>>>> the rebase on Gluster 6.
>>>>
>>>>
>>>>
>>>>>
>>>>> I think this is the short list of issues we have after the migration:
>>>>> Gluster 5.5:
>>>>> -       Poor performance for our workload (mostly write dependent)
>>>>> -       VM's randomly pause on unknown storage errors, which are
>>>>> "stale files". Corresponding log: Lookup on shard 797 failed. Base file
>>>>> gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle]
>>>>> -       Some files are listed twice in a directory (probably related to
>>>>> the stale file issue?)
>>>>> Example:
>>>>> ls -la
>>>>> /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/
>>>>> total 3081
>>>>> drwxr-x---.  2 vdsm kvm    4096 Mar 18 11:34 .
>>>>> drwxr-xr-x. 13 vdsm kvm    4096 Mar 19 09:42 ..
>>>>> -rw-rw----.  1 vdsm kvm 1048576 Mar 28 12:55
>>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c
>>>>> -rw-rw----.  1 vdsm kvm 1048576 Mar 28 12:55
>>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c
>>>>> -rw-rw----.  1 vdsm kvm 1048576 Jan 27  2018
>>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease
>>>>> -rw-r--r--.  1 vdsm kvm     290 Jan 27  2018
>>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
>>>>> -rw-r--r--.  1 vdsm kvm     290 Jan 27  2018
>>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
>>>>>
>>>>> - Brick processes sometimes start multiple times. Sometimes I have 5
>>>>> brick processes for a single volume. Killing all glusterfsd's for the
>>>>> volume on the machine and running gluster v start <vol> force usually
>>>>> starts just one afterwards, and from then on things look all right.
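>>>>>
>>>>> (As a sketch, the recovery on the affected node looks like this, <vol>
>>>>> being the affected volume:)
>>>>>
>>>>> pkill -f 'glusterfsd.*<vol>'    # stop all brick processes for the volume
>>>>> gluster v start <vol> force     # respawns a single brick process
>>>>> gluster v status <vol>          # verify one PID per brick afterwards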
>>>>>
>>>>>
>>>> May I kindly ask you to open bugs on Gluster for the above issues at
>>>> https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS ?
>>>> Sahina?
>>>>
>>>>
>>>>> Ovirt 4.3.2.1-1.el7
>>>>> -       All vm image ownerships are changed to root.root after the vm
>>>>> is shut down, probably related to
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1666795 but not only
>>>>> scoped to the HA engine. I'm still in compatibility mode 4.2 for the
>>>>> cluster and for the vm's, but upgraded to ovirt 4.3.2.
>>>>>
>>>>
>>>> Ryan?
>>>>
>>>>
>>>>> -       The network provider is set to ovn, which is fine... actually
>>>>> cool, only "ovs-vswitchd" is a CPU hog and utilizes 100%.
>>>>>
>>>>
>>>> Miguel? Dominik?
>>>>
>>>>
>>>>> -       It seems on all nodes vdsm tries to get the stats for the
>>>>> HA engine, which is filling the logs with (not sure if this is new):
>>>>> [api.virt] FINISH getStats return={'status': {'message': "Virtual
>>>>> machine does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}",
>>>>> 'code': 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9
>>>>> (api:54)
>>>>>
>>>>
>>>> Simone?
>>>>
>>>>
>>>>> -       It seems the package os_brick is missing; the message "[root]
>>>>> managedvolume not supported: Managed Volume Not Supported. Missing
>>>>> package os-brick.: ('Cannot import os_brick',) (caps:149)" fills the
>>>>> vdsm.log. But for this I also saw another message, so I suspect this
>>>>> will already be resolved shortly.
>>>>> -       The machine I used to run the backup HA engine doesn't want to
>>>>> be removed from the hosted-engine --vm-status output, not even after
>>>>> running hosted-engine --clean-metadata --host-id=10 --force-clean or
>>>>> hosted-engine --clean-metadata --force-clean from the machine itself.
>>>>>
>>>>
>>>> Simone?
>>>>
>>>>
>>>>>
>>>>> Think that's about it.
>>>>>
>>>>> Don't get me wrong, I don't want to rant, I just wanted to share my
>>>>> experience and see where things can be made better.
>>>>>
>>>>
>>>> If not already done, can you please open bugs for the above issues at
>>>> https://bugzilla.redhat.com/enter_bug.cgi?classification=oVirt ?
>>>>
>>>>
>>>>>
>>>>>
>>>>> Best Olaf
>>>>> _______________________________________________
>>>>> Users mailing list -- users at ovirt.org
>>>>> To unsubscribe send an email to users-leave at ovirt.org
>>>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>>>> oVirt Code of Conduct:
>>>>> https://www.ovirt.org/community/about/community-guidelines/
>>>>> List Archives:
>>>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3CO35Q7VZMWNHS4LPUJNO7S47MGLSKS5/
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> SANDRO BONAZZOLA
>>>>
>>>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>>>>
>>>> Red Hat EMEA <https://www.redhat.com/>
>>>>
>>>> sbonazzo at redhat.com
>>>> <https://red.ht/sig>
>>>>
>>> _______________________________________________
>>> Users mailing list -- users at ovirt.org
>>> To unsubscribe send an email to users-leave at ovirt.org
>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>> oVirt Code of Conduct:
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/HAGTA64LF7LLE6YMHQ6DLT26MD2GZ2PK/
>>>
>>