From sankarshan.mukhopadhyay at gmail.com Mon Apr 1 00:23:29 2019 From: sankarshan.mukhopadhyay at gmail.com (Sankarshan Mukhopadhyay) Date: Mon, 1 Apr 2019 05:53:29 +0530 Subject: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6 In-Reply-To: References: Message-ID: Quite a considerable amount of detail here. Thank you! On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham wrote: > > Hello Gluster users, > > As you all aware that glusterfs-6 is out, we would like to inform you > that, we have spent a significant amount of time in testing > glusterfs-6 in upgrade scenarios. We have done upgrade testing to > glusterfs-6 from various releases like 3.12, 4.1 and 5.3. > > As glusterfs-6 has got in a lot of changes, we wanted to test those portions. > There were xlators (and respective options to enable/disable them) > added and deprecated in glusterfs-6 from various versions [1]. > > We had to check the following upgrade scenarios for all such options > Identified in [1]: > 1) option never enabled and upgraded > 2) option enabled and then upgraded > 3) option enabled and then disabled and then upgraded > > We weren't manually able to check all the combinations for all the options. > So the options involving enabling and disabling xlators were prioritized. > The below are the result of the ones tested. > > Never enabled and upgraded: > checked from 3.12, 4.1, 5.3 to 6 the upgrade works. > > Enabled and upgraded: > Tested for tier which is deprecated, It is not a recommended upgrade. > As expected the volume won't be consumable and will have a few more > issues as well. > Tested with 3.12, 4.1 and 5.3 to 6 upgrade. > > Enabled, disabled before upgrade. > Tested for tier with 3.12 and the upgrade went fine. > > There is one common issue to note in every upgrade. The node being > upgraded is going into disconnected state. You have to flush the iptables > and the restart glusterd on all nodes to fix this. > Is this something that is written in the upgrade notes? I do not seem to recall, if not, I'll send a PR > The testing for enabling new options is still pending. The new options > won't cause as much issues as the deprecated ones so this was put at > the end of the priority list. It would be nice to get contributions > for this. > Did the range of tests lead to any new issues? > For the disable testing, tier was used as it covers most of the xlator > that was removed. And all of these tests were done on a replica 3 volume. > I'm not sure if the Glusto team is reading this, but it would be pertinent to understand if the approach you have taken can be converted into a form of automated testing pre-release. > Note: This is only for upgrade testing of the newly added and removed > xlators. Does not involve the normal tests for the xlator. > > If you have any questions, please feel free to reach us. > > [1] https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing > > Regards, > Hari and Sanju. From hgowtham at redhat.com Mon Apr 1 04:58:21 2019 From: hgowtham at redhat.com (Hari Gowtham) Date: Mon, 1 Apr 2019 10:28:21 +0530 Subject: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6 In-Reply-To: References: Message-ID: Comments inline. On Mon, Apr 1, 2019 at 5:55 AM Sankarshan Mukhopadhyay wrote: > > Quite a considerable amount of detail here. Thank you! 
> > On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham wrote: > > > > Hello Gluster users, > > > > As you all aware that glusterfs-6 is out, we would like to inform you > > that, we have spent a significant amount of time in testing > > glusterfs-6 in upgrade scenarios. We have done upgrade testing to > > glusterfs-6 from various releases like 3.12, 4.1 and 5.3. > > > > As glusterfs-6 has got in a lot of changes, we wanted to test those portions. > > There were xlators (and respective options to enable/disable them) > > added and deprecated in glusterfs-6 from various versions [1]. > > > > We had to check the following upgrade scenarios for all such options > > Identified in [1]: > > 1) option never enabled and upgraded > > 2) option enabled and then upgraded > > 3) option enabled and then disabled and then upgraded > > > > We weren't manually able to check all the combinations for all the options. > > So the options involving enabling and disabling xlators were prioritized. > > The below are the result of the ones tested. > > > > Never enabled and upgraded: > > checked from 3.12, 4.1, 5.3 to 6 the upgrade works. > > > > Enabled and upgraded: > > Tested for tier which is deprecated, It is not a recommended upgrade. > > As expected the volume won't be consumable and will have a few more > > issues as well. > > Tested with 3.12, 4.1 and 5.3 to 6 upgrade. > > > > Enabled, disabled before upgrade. > > Tested for tier with 3.12 and the upgrade went fine. > > > > There is one common issue to note in every upgrade. The node being > > upgraded is going into disconnected state. You have to flush the iptables > > and the restart glusterd on all nodes to fix this. > > > > Is this something that is written in the upgrade notes? I do not seem > to recall, if not, I'll send a PR No this wasn't mentioned in the release notes. PRs are welcome. > > > The testing for enabling new options is still pending. The new options > > won't cause as much issues as the deprecated ones so this was put at > > the end of the priority list. It would be nice to get contributions > > for this. > > > > Did the range of tests lead to any new issues? Yes. In the first round of testing we found an issue and had to postpone the release of 6 until the fix was made available. https://bugzilla.redhat.com/show_bug.cgi?id=1684029 And then we tested it again after this patch was made available. and came across this: https://bugzilla.redhat.com/show_bug.cgi?id=1694010 Have mentioned this in the second mail as to how to over this situation for now until the fix is available. > > > For the disable testing, tier was used as it covers most of the xlator > > that was removed. And all of these tests were done on a replica 3 volume. > > > > I'm not sure if the Glusto team is reading this, but it would be > pertinent to understand if the approach you have taken can be > converted into a form of automated testing pre-release. I don't have an answer for this, have CCed Vijay. He might have an idea. > > > Note: This is only for upgrade testing of the newly added and removed > > xlators. Does not involve the normal tests for the xlator. > > > > If you have any questions, please feel free to reach us. > > > > [1] https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing > > > > Regards, > > Hari and Sanju. > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Regards, Hari Gowtham. 
From hgowtham at redhat.com Mon Apr 1 05:15:37 2019 From: hgowtham at redhat.com (Hari Gowtham) Date: Mon, 1 Apr 2019 10:45:37 +0530 Subject: [Gluster-users] upgrade best practices In-Reply-To: <9c792d30-0e79-98f7-6b76-9d168c947078@redhat.com> References: <629338fe8720f63420d43fa72cc7b080ba213a4c.camel@gmail.com> <9c792d30-0e79-98f7-6b76-9d168c947078@redhat.com> Message-ID: Hi, As mentioned above you need not stop the whole cluster and then upgrade and restart the gluster processes. We did do the basic rolling upgrade test with replica volume. And things turned out fine. There was this minor issue: https://bugzilla.redhat.com/show_bug.cgi?id=1694010 To overcome this, you will have to check if your upgraded node is getting disconnect. if it does, then you will have to 1) stop glusterd service on all the nodes (only glusterd) 2) flush the iptables (iptables -F) 3) start glusterd If you are fine with stopping your service and upgrading all nodes at the same time, You can go ahead with that as well. On Sun, Mar 31, 2019 at 11:02 PM Soumya Koduri wrote: > > > > On 3/29/19 10:39 PM, Poornima Gurusiddaiah wrote: > > > > > > On Fri, Mar 29, 2019, 10:03 PM Jim Kinney > > wrote: > > > > Currently running 3.12 on Centos 7.6. Doing cleanups on split-brain > > and out of sync, need heal files. > > > > We need to migrate the three replica servers to gluster v. 5 or 6. > > Also will need to upgrade about 80 clients as well. Given that a > > complete removal of gluster will not touch the 200+TB of data on 12 > > volumes, we are looking at doing that process, Stop all clients, > > stop all glusterd services, remove all of it, install new version, > > setup new volumes from old bricks, install new clients, mount > > everything. > > > > We would like to get some better performance from nfs-ganesha mounts > > but that doesn't look like an option (not done any parameter tweaks > > in testing yet). At a bare minimum, we would like to minimize the > > total downtime of all systems. > > Could you please be more specific here? As in are you looking for better > performance during upgrade process or in general? Compared to 3.12, > there are lot of perf improvements done in both glusterfs and esp., > nfs-ganesha (latest stable - V2.7.x) stack. If you could provide more > information about your workloads (for eg., large-file,small-files, > metadata-intensive) , we can make some recommendations wrt to configuration. > > Thanks, > Soumya > > > > > Does this process make more sense than a version upgrade process to > > 4.1, then 5, then 6? What "gotcha's" do I need to be ready for? I > > have until late May to prep and test on old, slow hardware with a > > small amount of files and volumes. > > > > > > You can directly upgrade from 3.12 to 6.x. I would suggest that rather > > than deleting and creating Gluster volume. +Hari and +Sanju for further > > guidelines on upgrade, as they recently did upgrade tests. +Soumya to > > add to the nfs-ganesha aspect. > > > > Regards, > > Poornima > > > > -- > > > > James P. Kinney III > > > > Every time you stop a school, you will have to build a jail. What you > > gain at one end you lose at the other. It's like feeding a dog on his > > own tail. It won't fatten the dog. > > - Speech 11/23/1900 Mark Twain > > > > http://heretothereideas.blogspot.com/ > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > -- Regards, Hari Gowtham. 
From kdhananj at redhat.com Mon Apr 1 05:56:11 2019 From: kdhananj at redhat.com (Krutika Dhananjay) Date: Mon, 1 Apr 2019 11:26:11 +0530 Subject: [Gluster-users] [ovirt-users] Re: Announcing Gluster release 5.5 In-Reply-To: References: <20190328164716.27693.35887@mail.ovirt.org> Message-ID: Adding back gluster-users Comments inline ... On Fri, Mar 29, 2019 at 8:11 PM Olaf Buitelaar wrote: > Dear Krutika, > > > > 1. I?ve made 2 profile runs of around 10 minutes (see files > profile_data.txt and profile_data2.txt). Looking at it, most time seems be > spent at the fop?s fsync and readdirp. > > Unfortunate I don?t have the profile info for the 3.12.15 version so it?s > a bit hard to compare. > > One additional thing I do notice on 1 machine (10.32.9.5) the iowait time > increased a lot, from an average below the 1% it?s now around the 12% after > the upgrade. > > So first suspicion with be lighting strikes twice, and I?ve also just now > a bad disk, but that doesn?t appear to be the case, since all smart status > report ok. > > Also dd shows performance I would more or less expect; > > dd if=/dev/zero of=/data/test_file bs=100M count=1 oflag=dsync > > 1+0 records in > > 1+0 records out > > 104857600 bytes (105 MB) copied, 0.686088 s, 153 MB/s > > dd if=/dev/zero of=/data/test_file bs=1G count=1 oflag=dsync > > 1+0 records in > > 1+0 records out > > 1073741824 bytes (1.1 GB) copied, 7.61138 s, 141 MB/s > > if=/dev/urandom of=/data/test_file bs=1024 count=1000000 > > 1000000+0 records in > > 1000000+0 records out > > 1024000000 bytes (1.0 GB) copied, 6.35051 s, 161 MB/s > > dd if=/dev/zero of=/data/test_file bs=1024 count=1000000 > > 1000000+0 records in > > 1000000+0 records out > > 1024000000 bytes (1.0 GB) copied, 1.6899 s, 606 MB/s > > When I disable this brick (service glusterd stop; pkill glusterfsd) > performance in gluster is better, but not on par with what it was. Also the > cpu usages on the ?neighbor? nodes which hosts the other bricks in the same > subvolume increases quite a lot in this case, which I wouldn?t expect > actually since they shouldn't handle much more work, except flagging shards > to heal. Iowait also goes to idle once gluster is stopped, so it?s for > sure gluster which waits for io. > > > So I see that FSYNC %-latency is on the higher side. And I also noticed you don't have direct-io options enabled on the volume. Could you set the following options on the volume - # gluster volume set network.remote-dio off # gluster volume set performance.strict-o-direct on and also disable choose-local # gluster volume set cluster.choose-local off let me know if this helps. 2. I?ve attached the mnt log and volume info, but I couldn?t find anything > relevant in in those logs. I think this is because we run the VM?s with > libgfapi; > > [root at ovirt-host-01 ~]# engine-config -g LibgfApiSupported > > LibgfApiSupported: true version: 4.2 > > LibgfApiSupported: true version: 4.1 > > LibgfApiSupported: true version: 4.3 > > And I can confirm the qemu process is invoked with the gluster:// address > for the images. > > The message is logged in the /var/lib/libvert/qemu/ file, which > I?ve also included. For a sample case see around; 2019-03-28 20:20:07 > > Which has the error; E [MSGID: 133010] > [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on > shard 109886 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c > [Stale file handle] > Could you also attach the brick logs for this volume? > > 3. 
yes I see multiple instances for the same brick directory, like; > > /usr/sbin/glusterfsd -s 10.32.9.6 --volfile-id > ovirt-core.10.32.9.6.data-gfs-bricks-brick1-ovirt-core -p > /var/run/gluster/vols/ovirt-core/10.32.9.6-data-gfs-bricks-brick1-ovirt-core.pid > -S /var/run/gluster/452591c9165945d9.socket --brick-name > /data/gfs/bricks/brick1/ovirt-core -l > /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log > --xlator-option *-posix.glusterd-uuid=fb513da6-f3bd-4571-b8a2-db5efaf60cc1 > --process-name brick --brick-port 49154 --xlator-option > ovirt-core-server.listen-port=49154 > > > > I?ve made an export of the output of ps from the time I observed these > multiple processes. > > In addition the brick_mux bug as noted by Atin. I might also have another > possible cause, as ovirt moves nodes from none-operational state or > maintenance state to active/activating, it also seems to restart gluster, > however I don?t have direct proof for this theory. > > > +Atin Mukherjee ^^ +Mohit Agrawal ^^ -Krutika Thanks Olaf > > Op vr 29 mrt. 2019 om 10:03 schreef Sandro Bonazzola >: > >> >> >> Il giorno gio 28 mar 2019 alle ore 17:48 ha >> scritto: >> >>> Dear All, >>> >>> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While >>> previous upgrades from 4.1 to 4.2 etc. went rather smooth, this one was a >>> different experience. After first trying a test upgrade on a 3 node setup, >>> which went fine. i headed to upgrade the 9 node production platform, >>> unaware of the backward compatibility issues between gluster 3.12.15 -> >>> 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start. >>> Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata >>> was missing or couldn't be accessed. Restoring this file by getting a good >>> copy of the underlying bricks, removing the file from the underlying bricks >>> where the file was 0 bytes and mark with the stickybit, and the >>> corresponding gfid's. Removing the file from the mount point, and copying >>> back the file on the mount point. Manually mounting the engine domain, and >>> manually creating the corresponding symbolic links in /rhev/data-center and >>> /var/run/vdsm/storage and fixing the ownership back to vdsm.kvm (which was >>> root.root), i was able to start the HA engine again. Since the engine was >>> up again, and things seemed rather unstable i decided to continue the >>> upgrade on the other nodes suspecting an incompatibility in gluster >>> versions, i thought would be best to have them all on the same version >>> rather soonish. However things went from bad to worse, the engine stopped >>> again, and all vm?s stopped working as well. So on a machine outside the >>> setup and restored a backup of the engine taken from version 4.2.8 just >>> before the upgrade. With this engine I was at least able to start some vm?s >>> again, and finalize the upgrade. Once the upgraded, things didn?t stabilize >>> and also lose 2 vm?s during the process due to image corruption. After >>> figuring out gluster 5.3 had quite some issues I was as lucky to see >>> gluster 5.5 was about to be released, on the moment the RPM?s were >>> available I?ve installed those. This helped a lot in terms of stability, >>> for which I?m very grateful! However the performance is unfortunate >>> terrible, it?s about 15% of what the performance was running gluster >>> 3.12.15. It?s strange since a simple dd shows ok performance, but our >>> actual workload doesn?t. 
While I would expect the performance to be better, >>> due to all improvements made since gluster version 3.12. Does anybody share >>> the same experience? >>> I really hope gluster 6 will soon be tested with ovirt and released, and >>> things start to perform and stabilize again..like the good old days. Of >>> course when I can do anything, I?m happy to help. >>> >> >> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1693998 to track the >> rebase on Gluster 6. >> >> >> >>> >>> I think the following short list of issues we have after the migration; >>> Gluster 5.5; >>> - Poor performance for our workload (mostly write dependent) >>> - VM?s randomly pause on unknown storage errors, which are ?stale >>> file?s?. corresponding log; Lookup on shard 797 failed. Base file gfid = >>> 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle] >>> - Some files are listed twice in a directory (probably related the >>> stale file issue?) >>> Example; >>> ls -la >>> /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/ >>> total 3081 >>> drwxr-x---. 2 vdsm kvm 4096 Mar 18 11:34 . >>> drwxr-xr-x. 13 vdsm kvm 4096 Mar 19 09:42 .. >>> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 >>> 1a7cf259-6b29-421d-9688-b25dfaafb13c >>> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 >>> 1a7cf259-6b29-421d-9688-b25dfaafb13c >>> -rw-rw----. 1 vdsm kvm 1048576 Jan 27 2018 >>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease >>> -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 >>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta >>> -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 >>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta >>> >>> - brick processes sometimes starts multiple times. Sometimes I?ve 5 >>> brick processes for a single volume. Killing all glusterfsd?s for the >>> volume on the machine and running gluster v start force usually just >>> starts one after the event, from then on things look all right. >>> >>> >> May I kindly ask to open bugs on Gluster for above issues at >> https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS ? >> Sahina? >> >> >>> Ovirt 4.3.2.1-1.el7 >>> - All vms images ownership are changed to root.root after the vm >>> is shutdown, probably related to; >>> https://bugzilla.redhat.com/show_bug.cgi?id=1666795 but not only scoped >>> to the HA engine. I?m still in compatibility mode 4.2 for the cluster and >>> for the vm?s, but upgraded to version ovirt 4.3.2 >>> >> >> Ryan? >> >> >>> - The network provider is set to ovn, which is fine..actually >>> cool, only the ?ovs-vswitchd? is a CPU hog, and utilizes 100% >>> >> >> Miguel? Dominik? >> >> >>> - It seems on all nodes vdsm tries to get the the stats for the HA >>> engine, which is filling the logs with (not sure if this is new); >>> [api.virt] FINISH getStats return={'status': {'message': "Virtual >>> machine does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}", >>> 'code': 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9 >>> (api:54) >>> >> >> Simone? >> >> >>> - It seems the package os_brick [root] managedvolume not >>> supported: Managed Volume Not Supported. 
Missing package os-brick.: >>> ('Cannot import os_brick',) (caps:149) which fills the vdsm.log, but for >>> this I also saw another message, so I suspect this will already be resolved >>> shortly >>> - The machine I used to run the backup HA engine, doesn?t want to >>> get removed from the hosted-engine ?vm-status, not even after running; >>> hosted-engine --clean-metadata --host-id=10 --force-clean or hosted-engine >>> --clean-metadata --force-clean from the machine itself. >>> >> >> Simone? >> >> >>> >>> Think that's about it. >>> >>> Don?t get me wrong, I don?t want to rant, I just wanted to share my >>> experience and see where things can made better. >>> >> >> If not already done, can you please open bugs for above issues at >> https://bugzilla.redhat.com/enter_bug.cgi?classification=oVirt ? >> >> >>> >>> >>> Best Olaf >>> _______________________________________________ >>> Users mailing list -- users at ovirt.org >>> To unsubscribe send an email to users-leave at ovirt.org >>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >>> oVirt Code of Conduct: >>> https://www.ovirt.org/community/about/community-guidelines/ >>> List Archives: >>> https://lists.ovirt.org/archives/list/users at ovirt.org/message/3CO35Q7VZMWNHS4LPUJNO7S47MGLSKS5/ >>> >> >> >> -- >> >> SANDRO BONAZZOLA >> >> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV >> >> Red Hat EMEA >> >> sbonazzo at redhat.com >> >> > _______________________________________________ > Users mailing list -- users at ovirt.org > To unsubscribe send an email to users-leave at ovirt.org > Privacy Statement: https://www.ovirt.org/site/privacy-policy/ > oVirt Code of Conduct: > https://www.ovirt.org/community/about/community-guidelines/ > List Archives: > https://lists.ovirt.org/archives/list/users at ovirt.org/message/HAGTA64LF7LLE6YMHQ6DLT26MD2GZ2PK/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From revirii at googlemail.com Mon Apr 1 07:41:51 2019 From: revirii at googlemail.com (Hu Bert) Date: Mon, 1 Apr 2019 09:41:51 +0200 Subject: [Gluster-users] Lots of connections on clients - appropriate values for various thread parameters In-Reply-To: References: Message-ID: Good morning, it seems like setting performance.quick-read to off (context: increased network traffic https://bugzilla.redhat.com/show_bug.cgi?id=1673058) solved the main problem. See those 2 munin graphs, especially network and iowait on March 24th and 31st (high traffic days); param was set to off on March 26th. network: https://abload.de/img/client-internal-netwoh3kh7.png cpu: https://abload.de/img/client-cpu-iowaitatkfc.png I'll keep watching this, but hopefully the problems have disappeared. Awaiting glusterfs v5.6 with the bugfix; then, after re-enabling quick-read, i'll check again. Regards, Hubert Am Fr., 29. M?rz 2019 um 07:47 Uhr schrieb Hu Bert : > > Hi Raghavendra, > > i'll try to gather the information you need, hopefully this weekend. > > One thing i've done this week: deactivate performance.quick-read > (https://bugzilla.redhat.com/show_bug.cgi?id=1673058), which > (according to munin) ended in a massive drop in network traffic and a > slightly lower iowait. Maybe that has helped already. We'll see. > > performance.nl-cache is deactivated due to unreadable > files/directories; we have a highly concurrent workload. 
There are > some nginx backend webservers that check if a requested file exists in > the glusterfs filesystem; i counted the log entries, this can be up to > 5 million entries a day; about 2/3 of the files are found in the > filesystem, they get delivered to the frontend; if not: the nginx's > send the request via round robin to 3 backend tomcats, and they have > to check whether a directory exists or not (and then create it and the > requested files). So it happens that tomcatA creates a directory and a > file in it, and within (milli)seconds tomcatB+C create additional > files in this dir. > > Deactivating nl-cache helped to solve this issue, after having > conversation with Nithya and Ravishankar. Just wanted to explain that. > > > Thx so far, > Hubert > > Am Fr., 29. M?rz 2019 um 06:29 Uhr schrieb Raghavendra Gowdappa > : > > > > +Gluster-users > > > > Sorry about the delay. There is nothing suspicious about per thread CPU utilization of glusterfs process. However looking at the volume profile attached I see huge number of lookups. I think if we cutdown the number of lookups probably we'll see improvements in performance. I need following information: > > > > * dump of fuse traffic under heavy load (use --dump-fuse option while mounting) > > * client volume profile for the duration of heavy load - https://docs.gluster.org/en/latest/Administrator%20Guide/Performance%20Testing/ > > * corresponding brick volume profile > > > > Basically I need to find out > > * whether these lookups are on existing files or non-existent files > > * whether they are on directories or files > > * why/whether md-cache or kernel attribute cache or nl-cache will help to cut down lookups. > > > > regards, > > Raghavendra > > > > On Mon, Mar 25, 2019 at 12:13 PM Hu Bert wrote: > >> > >> Hi Raghavendra, > >> > >> sorry, this took a while. The last weeks the weather was bad -> less > >> traffic, but this weekend there was a massive peak. I made 3 profiles > >> with top, but at first look there's nothing special here. > >> > >> I also made a gluster profile (on one of the servers) at a later > >> moment. Maybe that helps. I also added some munin graphics from 2 of > >> the clients and 1 graphic of server network, just to show how massive > >> the problem is. > >> > >> Just wondering if the high io wait is related to the high network > >> traffic bug (https://bugzilla.redhat.com/show_bug.cgi?id=1673058); if > >> so, i could deactivate performance.quick-read and check if there is > >> less iowait. If that helps: wonderful - and yearningly awaiting > >> updated packages (e.g. v5.6). If not: maybe we have to switch from our > >> normal 10TB hdds (raid10) to SSDs if the problem is based on slow > >> hardware in the use case of small files (images). > >> > >> > >> Thx, > >> Hubert > >> > >> Am Mo., 4. M?rz 2019 um 16:59 Uhr schrieb Raghavendra Gowdappa > >> : > >> > > >> > Were you seeing high Io-wait when you captured the top output? I guess not as you mentioned the load increases during weekend. Please note that this data has to be captured when you are experiencing problems. > >> > > >> > On Mon, Mar 4, 2019 at 8:02 PM Hu Bert wrote: > >> >> > >> >> Hi, > >> >> sending the link directly to you and not the list, you can distribute > >> >> if necessary. the command ran for about half a minute. Is that enough? > >> >> More? Less? > >> >> > >> >> https://download.outdooractive.com/top.output.tar.gz > >> >> > >> >> Am Mo., 4. 
M?rz 2019 um 15:21 Uhr schrieb Raghavendra Gowdappa > >> >> : > >> >> > > >> >> > > >> >> > > >> >> > On Mon, Mar 4, 2019 at 7:47 PM Raghavendra Gowdappa wrote: > >> >> >> > >> >> >> > >> >> >> > >> >> >> On Mon, Mar 4, 2019 at 4:26 PM Hu Bert wrote: > >> >> >>> > >> >> >>> Hi Raghavendra, > >> >> >>> > >> >> >>> at the moment iowait and cpu consumption is quite low, the main > >> >> >>> problems appear during the weekend (high traffic, especially on > >> >> >>> sunday), so either we have to wait until next sunday or use a time > >> >> >>> machine ;-) > >> >> >>> > >> >> >>> I made a screenshot of top (https://abload.de/img/top-hvvjt2.jpg) and > >> >> >>> a text output (https://pastebin.com/TkTWnqxt), maybe that helps. Seems > >> >> >>> like processes like glfs_fuseproc (>204h) and glfs_epoll (64h for each > >> >> >>> process) consume a lot of CPU (uptime 24 days). Is that already > >> >> >>> helpful? > >> >> >> > >> >> >> > >> >> >> Not much. The TIME field just says the amount of time the thread has been executing. Since its a long standing mount, we can expect such large values. But, the value itself doesn't indicate whether the thread itself was overloaded at any (some) interval(s). > >> >> >> > >> >> >> Can you please collect output of following command and send back the collected data? > >> >> >> > >> >> >> # top -bHd 3 > top.output > >> >> > > >> >> > > >> >> > Please collect this on problematic mounts and bricks. > >> >> > > >> >> >> > >> >> >>> > >> >> >>> > >> >> >>> Hubert > >> >> >>> > >> >> >>> Am Mo., 4. M?rz 2019 um 11:31 Uhr schrieb Raghavendra Gowdappa > >> >> >>> : > >> >> >>> > > >> >> >>> > what is the per thread CPU usage like on these clients? With highly concurrent workloads we've seen single thread that reads requests from /dev/fuse (fuse reader thread) becoming bottleneck. Would like to know what is the cpu usage of this thread looks like (you can use top -H). > >> >> >>> > > >> >> >>> > On Mon, Mar 4, 2019 at 3:39 PM Hu Bert wrote: > >> >> >>> >> > >> >> >>> >> Good morning, > >> >> >>> >> > >> >> >>> >> we use gluster v5.3 (replicate with 3 servers, 2 volumes, raid10 as > >> >> >>> >> brick) with at the moment 10 clients; 3 of them do heavy I/O > >> >> >>> >> operations (apache tomcats, read+write of (small) images). These 3 > >> >> >>> >> clients have a quite high I/O wait (stats from yesterday) as can be > >> >> >>> >> seen here: > >> >> >>> >> > >> >> >>> >> client: https://abload.de/img/client1-cpu-dayulkza.png > >> >> >>> >> server: https://abload.de/img/server1-cpu-dayayjdq.png > >> >> >>> >> > >> >> >>> >> The iowait in the graphics differ a lot. I checked netstat for the > >> >> >>> >> different clients; the other clients have 8 open connections: > >> >> >>> >> https://pastebin.com/bSN5fXwc > >> >> >>> >> > >> >> >>> >> 4 for each server and each volume. The 3 clients with the heavy I/O > >> >> >>> >> have (at the moment) according to netstat 170, 139 and 153 > >> >> >>> >> connections. An example for one client can be found here: > >> >> >>> >> https://pastebin.com/2zfWXASZ > >> >> >>> >> > >> >> >>> >> gluster volume info: https://pastebin.com/13LXPhmd > >> >> >>> >> gluster volume status: https://pastebin.com/cYFnWjUJ > >> >> >>> >> > >> >> >>> >> I just was wondering if the iowait is based on the clients and their > >> >> >>> >> workflow: requesting a lot of files (up to hundreds per second), > >> >> >>> >> opening a lot of connections and the servers aren't able to answer > >> >> >>> >> properly. Maybe something can be tuned here? 
> >> >> >>> >> > >> >> >>> >> Especially the server|client.event-threads (both set to 4) and > >> >> >>> >> performance.(high|normal|low|least)-prio-threads (all at default value > >> >> >>> >> 16) and performance.io-thread-count (32) options, maybe these aren't > >> >> >>> >> properly configured for up to 170 client connections. > >> >> >>> >> > >> >> >>> >> Both servers and clients have a Xeon CPU (6 cores, 12 threads), a 10 > >> >> >>> >> GBit connection and 128G (servers) respectively 256G (clients) RAM. > >> >> >>> >> Enough power :-) > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> Thx for reading && best regards, > >> >> >>> >> > >> >> >>> >> Hubert > >> >> >>> >> _______________________________________________ > >> >> >>> >> Gluster-users mailing list > >> >> >>> >> Gluster-users at gluster.org > >> >> >>> >> https://lists.gluster.org/mailman/listinfo/gluster-users From jim.kinney at gmail.com Mon Apr 1 16:15:10 2019 From: jim.kinney at gmail.com (Jim Kinney) Date: Mon, 01 Apr 2019 12:15:10 -0400 Subject: [Gluster-users] upgrade best practices In-Reply-To: <9c792d30-0e79-98f7-6b76-9d168c947078@redhat.com> References: <629338fe8720f63420d43fa72cc7b080ba213a4c.camel@gmail.com> <9c792d30-0e79-98f7-6b76-9d168c947078@redhat.com> Message-ID: On Sun, 2019-03-31 at 23:01 +0530, Soumya Koduri wrote: > On 3/29/19 10:39 PM, Poornima Gurusiddaiah wrote: > > On Fri, Mar 29, 2019, 10:03 PM Jim Kinney > > wrote: > > Currently running 3.12 on Centos 7.6. Doing cleanups on split- > > brain and out of sync, need heal files. > > We need to migrate the three replica servers to gluster v. 5 or > > 6. Also will need to upgrade about 80 clients as well. Given > > that a complete removal of gluster will not touch the 200+TB of > > data on 12 volumes, we are looking at doing that process, Stop > > all clients, stop all glusterd services, remove all of it, > > install new version, setup new volumes from old bricks, install > > new clients, mount everything. > > We would like to get some better performance from nfs-ganesha > > mounts but that doesn't look like an option (not done any > > parameter tweaks in testing yet). At a bare minimum, we would > > like to minimize the total downtime of all systems. > > Could you please be more specific here? As in are you looking for > better performance during upgrade process or in general? Compared to > 3.12, there are lot of perf improvements done in both glusterfs and > esp., nfs-ganesha (latest stable - V2.7.x) stack. If you could > provide more information about your workloads (for eg., large- > file,small-files, metadata-intensive) , we can make some > recommendations wrt to configuration. Sure. More details: We are (soon to be) running a three-node replica only gluster service (2 nodes now, third is racked and ready for sync and being added to gluster cluster). Each node has 2 external drive arrays plus one internal. Each node has 40G IB plus 40G IP connections (plans to upgrade to 100G). We currently have 9 volumes and each is 7TB up to 50TB of space. Each volume is a mix of thousands of large (>1GB) and tens of thousands of small (~100KB) plus thousands inbetween. Currently we have a 13-node computational cluster with varying GPU abilities that mounts all of these volumes using gluster-fuse. Writes are slow and reads are also as if from a single server. I have data from a test setup (not anywhere near the capacity of the production system - just for testing commands and recoveries) that indicates raw NFS is much faster but no gluster, gluster-fuse is much slower. 
We have mmap issues with python and fuse-mounted locations. Converting to NFS solves this. We have tinkered with kernel settings to handle oom-killer so it will no longer drop glusterfs when an errant job eat all the ram (set oom_score_adj - -1000 for all glusterfs pids). We would like to transition (smoothly!!) to gluster 5 or 6 with nfs- ganesha 2.7 and see some performance improvements. We will be using corosync and pacemaker for NFS failover. It would be fantastic be able to saturate a 10G IPoIB (or 40G IB !) connection to each compute node in the current computational cluster. Right now we absolutely can't get much write speed ( copy a 6.2GB file from host to gluster storage took 1m 21s. cp from disk to /dev/null is 7s). cp from gluster to /dev/null is 1.0m (same 6.2GB file). That's a 10Gbps IPoIB connection at only 800Mbps. We would like to do things like enable SSL encryption of all data flows (we deal with PHI data in a HIPAA-regulated setting) but are concerned about performance. We are running dual Intel Xeon E5-2630L (12 physical cores each @ 2.4GHz) and 128GB RAM in each server node. We have 170 users. About 20 are active at any time. The current setting on /home (others are similar if not identical, maybe nfs-disable is true for others): gluster volume get home allOption Value ------ -- --- cluster.lookup- unhashed on cluste r.lookup- optimize off cluste r.min-free- disk 10% cluster. min-free- inodes 5% cluster. rebalance- stats off cluster.s ubvols-per- directory (null) cluster.rea ddir- optimize off cluster .rsync-hash- regex (null) cluster.ex tra-hash- regex (null) cluster.dh t-xattr- name trusted.glusterfs.dht cluster.r andomize-hash-range-by- gfid off cluster.rebal- throttle normal clust er.lock- migration off clus ter.local-volume- name (null) cluster.weig hted- rebalance on cluster. 
switch- pattern (null) cluste r.entry-change- log on cluster.read -subvolume (null) clu ster.read-subvolume-index - 1 cluster.read-hash- mode 1 cluster.b ackground-self-heal- count 8 cluster.metadata- self- heal on cluster.data- self- heal on cluster.e ntry-self- heal on cluster.se lf-heal- daemon enable cluster.h eal- timeout 600 clus ter.self-heal-window- size 1 cluster.data- change- log on cluster.met adata-change- log on cluster.data- self-heal- algorithm (null) cluster.eager- lock on dispe rse.eager- lock on cluste r.quorum- type none cluste r.quorum- count (null) cluste r.choose- local true cluste r.self-heal-readdir- size 1KB cluster.post-op- delay- secs 1 cluster.ensur e- durability on cluste r.consistent- metadata no cluster.he al-wait-queue- length 128 cluster.favorit e-child- policy none cluster.stripe -block- size 128KB cluster.stri pe- coalesce true diagno stics.latency- measurement off diagnostics .dump-fd- stats off diagnostics .count-fop- hits off diagnostics.b rick-log- level INFO diagnostics.c lient-log- level INFO diagnostics.br ick-sys-log- level CRITICAL diagnostics.clien t-sys-log- level CRITICAL diagnostics.brick- logger (null) diagnosti cs.client- logger (null) diagnostic s.brick-log- format (null) diagnostics.c lient-log- format (null) diagnostics.br ick-log-buf- size 5 diagnostics.clien t-log-buf- size 5 diagnostics.brick- log-flush- timeout 120 diagnostics.client- log-flush- timeout 120 diagnostics.stats- dump- interval 0 diagnostics.fo p-sample- interval 0 diagnostics.st ats-dump- format json diagnostics.fo p-sample-buf- size 65535 diagnostics.stats- dnscache-ttl- sec 86400 performance.cache-max- file- size 0 performance.cache- min-file- size 0 performance.cache- refresh- timeout 1 performance.cache -priority performa nce.cache- size 32MB performan ce.io-thread- count 16 performance.h igh-prio- threads 16 performance.n ormal-prio- threads 16 performance.low -prio- threads 16 performance. least-prio- threads 1 performance.en able-least- priority on performance.cach e- size 128MB performan ce.flush- behind on performan ce.nfs.flush- behind on performance.w rite-behind-window- size 1MB performance.resync- failed-syncs-after- fsyncoff performance.nfs.write- behind-window- size1MB performance.strict-o- direct off performance. nfs.strict-o- direct off performance.stri ct-write- ordering off performance.nfs. strict-write- ordering off performance.lazy- open yes performa nce.read-after- open no performance.re ad-ahead-page- count 4 performance.md- cache- timeout 1 performance. cache-swift- metadata true performance.cac he-samba- metadata false performance.cac he-capability- xattrs true performance.cache- ima- xattrs true features.encr yption off encr yption.master- key (null) encryptio n.data-key- size 256 encryption. block- size 4096 network. frame- timeout 1800 netwo rk.ping- timeout 42 netw ork.tcp-window- size (null) features.l ock- heal off featu res.grace- timeout 10 networ k.remote- dio disable client .event- threads 2 clie nt.tcp-user- timeout 0 client. keepalive- time 20 client.k eepalive- interval 2 client.k eepalive- count 9 network. 
tcp-window- size (null) network.in ode-lru- limit 16384 auth.allo w * auth.reject (null) transport.keepalive 1 server.allow- insecure (null) serv er.root- squash off ser ver.anonuid 65534 server.anongid 65534 server.statedump- path /var/run/gluster server.o utstanding-rpc- limit 64 features.lock- heal off featu res.grace- timeout 10 server .ssl (null) auth.ssl- allow * server.manage- gids off serve r.dynamic- auth on client .send- gids on ser ver.gid- timeout 300 se rver.own- thread (null) se rver.event- threads 1 serv er.tcp-user- timeout 0 server. keepalive- time 20 server.k eepalive- interval 2 server.k eepalive- count 9 transpor t.listen- backlog 10 ssl.own- cert (null) ssl.private- key (null) ssl .ca- list (null) ssl.crl- path (null) ssl.certificate- depth (null) ssl.cip her- list (null) ss l.dh- param (null) ssl.ec- curve (null) performance.write- behind on performan ce.read- ahead on performa nce.readdir- ahead off performance .io- cache on perfor mance.quick- read on performan ce.open- behind on performa nce.nl- cache off perfor mance.stat- prefetch on performa nce.client-io- threads off performance.n fs.write- behind on performance.n fs.read- ahead off performance. nfs.io- cache off performanc e.nfs.quick- read off performance.n fs.stat- prefetch off performance. nfs.io- threads off performanc e.force- readdirp true performan ce.cache- invalidation false features. uss off features.snapshot- directory .snaps features. show-snapshot- directory off network.compre ssion off netwo rk.compression.window-size - 15 network.compression.mem- level 8 network.compres sion.min- size 0 network.compres sion.compression-level - 1 network.compression.debug false features.limit- usage (null) featur es.default-soft- limit 80% features.soft -timeout 60 feat ures.hard- timeout 5 featu res.alert- time 86400 featur es.quota-deem- statfs off geo- replication.indexing off geo- replication.indexing off geo-replication.ignore-pid- check off geo- replication.ignore-pid- check off features.quota off features. inode- quota off featur es.bitrot disable debug.trace off debug.log- history no d ebug.log- file no d ebug.exclude- ops (null) debug .include- ops (null) debug .error- gen off deb ug.error- failure (null) deb ug.error- number (null) deb ug.random- failure off debu g.error- fops (null) nfs .enable- ino32 no nf s.mem- factor 15 nfs.export- dirs on nf s.export- volumes on nf s.addr- namelookup off nfs.dynamic- volumes off nfs .register-with- portmap on nfs.outst anding-rpc- limit 16 nfs.port 2049 nf s.rpc-auth- unix on nfs. rpc-auth- null on nfs. rpc-auth- allow all nfs. rpc-auth- reject none nfs. ports- insecure off n fs.trusted- sync off nfs .trusted- write off nfs .volume-access read- write nfs.export- dir nf s.disable off nfs.nlm on nfs.acl on nfs.mount- udp off n fs.mount- rmtab /var/lib/glusterd/nfs/rmtab n fs.rpc- statd /sbin/rpc.statd nfs.server-aux- gids off nfs.dr c off nfs.drc- size 0x20000 nfs.read-size (1 * 1048576ULL) nfs.write- size (1 * 1048576ULL) nfs.readdir- size (1 * 1048576ULL) nfs.rdirplus on nfs.exports-auth- enable (null) nfs.auth -refresh-interval- sec (null) nfs.auth-cache- ttl- sec (null) features.r ead- only off featu res.worm off features.worm-file- level off features.d efault-retention- period 120 features.retention -mode relax features. 
auto-commit- period 180 storage.linu x- aio off stora ge.batch-fsync-mode reverse- fsync storage.batch-fsync-delay- usec 0 storage.owner- uid - 1 storage.owner- gid - 1 storage.node-uuid- pathinfo off storage.h ealth-check- interval 30 storage.buil d- pgfid on stora ge.gfid2path on storage.gfid2path- separator : storage.b d- aio off cl uster.server-quorum- type off cluster.serve r-quorum- ratio 0 changelog.cha ngelog off chan gelog.changelog- dir (null) changelog.e ncoding ascii ch angelog.rollover- time 15 changelog. fsync- interval 5 changel og.changelog-barrier- timeout 120 changelog.capture- del- path off features.barr ier disable feat ures.barrier- timeout 120 features .trash off features.trash- dir .trashcan featur es.trash-eliminate- path (null) features.trash- max- filesize 5MB features.t rash-internal- op off cluster.enable- shared- storage disable cluster.write -freq- threshold 0 cluster.re ad-freq- threshold 0 cluster.t ier- pause off clus ter.tier-promote- frequency 120 cluster.tier -demote- frequency 3600 cluster.wat ermark- hi 90 cluster.w atermark- low 75 cluster.t ier- mode cache clus ter.tier-max-promote-file- size 0 cluster.tier-max- mb 4000 cluster. tier-max- files 10000 cluster. tier-query- limit 100 cluster.ti er- compact on clus ter.tier-hot-compact- frequency 604800 cluster.tier- cold-compact- frequency 604800 features.ctr- enabled off feat ures.record- counters off feature s.ctr-record-metadata- heat off features.ctr_link_co nsistency off features.ct r_lookupheal_link_timeout 300 fe atures.ctr_lookupheal_inode_timeout 300 features.ctr-sql-db- cachesize 12500 features.ct r-sql-db-wal- autocheckpoint 25000 features.selinu x on locks. trace off locks.mandatory- locking off cluster .disperse-self-heal- daemon enable cluster.quorum- reads no client .bind- insecure (null) fea tures.shard off features.shard-block- size 64MB features.scr ub- throttle lazy featur es.scrub- freq biweekly featur es.scrub false features.expiry- time 120 feature s.cache- invalidation off featur es.cache-invalidation- timeout 60 features.leases off features.l ease-lock-recall- timeout 60 disperse.backgroun d- heals 8 disperse.he al-wait- qlength 128 cluster.he al- timeout 600 dht. force- readdirp on d isperse.read-policy gfid- hash cluster.shd-max- threads 1 cluster .shd-wait- qlength 1024 cluster. locking- scheme full cluster .granular-entry- heal no features.locks -revocation- secs 0 features.locks- revocation-clear- all false features.locks- revocation-max- blocked 0 features.locks- monkey- unlocking false disperse.shd- max- threads 1 disperse .shd-wait- qlength 1024 disperse. cpu- extensions auto disp erse.self-heal-window- size 1 cluster.use- compound- fops off performance. parallel- readdir off performance. rda-request- size 131072 performance.rda -low- wmark 4096 performance .rda-high- wmark 128KB performance. rda-cache- limit 10MB performance.n l-cache-positive- entry false performance.nl-cache- limit 10MB performance. nl-cache- timeout 60 cluster.bric k- multiplex off clust er.max-bricks-per- process 0 disperse.optim istic-change- log on cluster.halo- enabled False clus ter.halo-shd-max- latency 99999 cluster.halo -nfsd-max- latency 5 cluster.halo- max- latency 5 cluster. halo-max-replicas > Thanks,Soumya > > Does this process make more sense than a version upgrade > > process to 4.1, then 5, then 6? What "gotcha's" do I need to be > > ready for? I have until late May to prep and test on old, slow > > hardware with a small amount of files and volumes. 
> > > > You can directly upgrade from 3.12 to 6.x. I would suggest that > > rather than deleting and creating Gluster volume. +Hari and +Sanju > > for further guidelines on upgrade, as they recently did upgrade > > tests. +Soumya to add to the nfs-ganesha aspect. > > Regards,Poornima > > -- > > James P. Kinney III > > Every time you stop a school, you will have to build a jail. > > What you gain at one end you lose at the other. It's like > > feeding a dog on his own tail. It won't fatten the dog. - > > Speech 11/23/1900 Mark Twain > > http://heretothereideas.blogspot.com/ > > > > _______________________________________________ Gluster- > > users mailing list Gluster-users at gluster.org > users at gluster.org> > > https://lists.gluster.org/mailman/listinfo/gluster-users > > -- James P. Kinney III Every time you stop a school, you will have to build a jail. What you gain at one end you lose at the other. It's like feeding a dog on his own tail. It won't fatten the dog. - Speech 11/23/1900 Mark Twain http://heretothereideas.blogspot.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tomfite at gmail.com Mon Apr 1 18:27:17 2019 From: tomfite at gmail.com (Tom Fite) Date: Mon, 1 Apr 2019 14:27:17 -0400 Subject: [Gluster-users] Rsync in place of heal after brick failure Message-ID: Hi all, I have a very large (65 TB) brick in a replica 2 volume that needs to be re-copied from scratch. A heal will take a very long time with performance degradation on the volume so I investigated using rsync to do the brunt of the work. The command: rsync -av -H -X --numeric-ids --progress server1:/data/brick1/gv0 /data/brick1/ Running with -H assures that the hard links in .glusterfs are preserved, and -X preserves all of gluster's extended attributes. I've tested this on my test environment as follows: 1. Stop glusterd and kill procs 2. Move brick volume to backup dir 3. Run rsync 4. Start glusterd 5. Observe gluster status All appears to be working correctly. Gluster status reports all bricks online, all data is accessible in the volume, and I don't see any errors in the logs. Anybody else have experience trying this? Thanks -Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From skoduri at redhat.com Mon Apr 1 18:37:18 2019 From: skoduri at redhat.com (Soumya Koduri) Date: Tue, 2 Apr 2019 00:07:18 +0530 Subject: [Gluster-users] upgrade best practices In-Reply-To: References: <629338fe8720f63420d43fa72cc7b080ba213a4c.camel@gmail.com> <9c792d30-0e79-98f7-6b76-9d168c947078@redhat.com> Message-ID: <3ca551bb-cfbf-d079-e3a2-6e3226a07618@redhat.com> Thanks for the details. Response inline - On 4/1/19 9:45 PM, Jim Kinney wrote: > On Sun, 2019-03-31 at 23:01 +0530, Soumya Koduri wrote: >> >> On 3/29/19 10:39 PM, Poornima Gurusiddaiah wrote: >>> >>> >>> On Fri, Mar 29, 2019, 10:03 PM Jim Kinney < >>> jim.kinney at gmail.com >>> >>> >>> >> jim.kinney at gmail.com >>> >>> >> wrote: >>> >>> Currently running 3.12 on Centos 7.6. Doing cleanups on split-brain >>> and out of sync, need heal files. >>> >>> We need to migrate the three replica servers to gluster v. 5 or 6. >>> Also will need to upgrade about 80 clients as well. Given that a >>> complete removal of gluster will not touch the 200+TB of data on 12 >>> volumes, we are looking at doing that process, Stop all clients, >>> stop all glusterd services, remove all of it, install new version, >>> setup new volumes from old bricks, install new clients, mount >>> everything. 
>>> >>> We would like to get some better performance from nfs-ganesha mounts >>> but that doesn't look like an option (not done any parameter tweaks >>> in testing yet). At a bare minimum, we would like to minimize the >>> total downtime of all systems. >> >> Could you please be more specific here? As in are you looking for better >> performance during upgrade process or in general? Compared to 3.12, >> there are lot of perf improvements done in both glusterfs and esp., >> nfs-ganesha (latest stable - V2.7.x) stack. If you could provide more >> information about your workloads (for eg., large-file,small-files, >> metadata-intensive) , we can make some recommendations wrt to configuration. > > Sure. More details: > > We are (soon to be) running a three-node replica only gluster service (2 > nodes now, third is racked and ready for sync and being added to gluster > cluster). Each node has 2 external drive arrays plus one internal. Each > node has 40G IB plus 40G IP connections (plans to upgrade to 100G). We > currently have 9 volumes and each is 7TB up to 50TB of space. Each > volume is a mix of thousands of large (>1GB) and tens of thousands of > small (~100KB) plus thousands inbetween. > > Currently we have a 13-node computational cluster with varying GPU > abilities that mounts all of these volumes using gluster-fuse. Writes > are slow and reads are also as if from a single server. I have data from > a test setup (not anywhere near the capacity of the production system - > just for testing commands and recoveries) that indicates raw NFS is much > faster but no gluster, gluster-fuse is much slower. We have mmap issues > with python and fuse-mounted locations. Converting to NFS solves this. > We have tinkered with kernel settings to handle oom-killer so it will no > longer drop glusterfs when an errant job eat all the ram (set > oom_score_adj - -1000 for all glusterfs pids). Have you tried tuning any perf parameters? From the volume options you have shared below, I see that there is scope to improve performance (for eg., by enabling md-cache parameters and parallel-readdir, metadata related operations latency can be improved). Request Poornima, Xavi or Du to comment on recommended values for better I/O throughput for your workload. > > We would like to transition (smoothly!!) to gluster 5 or 6 with > nfs-ganesha 2.7 and see some performance improvements. We will be using > corosync and pacemaker for NFS failover. It would be fantastic be able > to saturate a 10G IPoIB (or 40G IB !) connection to each compute node in > the current computational cluster. Right now we absolutely can't get > much write speed ( copy a 6.2GB file from host to gluster storage took > 1m 21s. cp from disk to /dev/null is 7s). cp from gluster to /dev/null > is 1.0m (same 6.2GB file). That's a 10Gbps IPoIB connection at only 800Mbps. Few things to note here - * The volume option "nfs.disable" command refers to GluscterNFS service which is being deprecated and not enabled by default in the latest gluster versions available (like in gluster 5 & 6). We recommend NFS-Ganesha and hence this option needs to be turned on (to disable GlusterNFS) * Starting from Gluster 3.11 , HA configuration bits for NFS-Ganesha have been removed from gluster codebase. So you would need to either manually configure any HA service on top of NFS-Ganesha servers or use storhaug [1] to configure the same. 
* Coming to technical aspects, by switching to 'NFS', you could benefit from heavy caching done by NFS client and few other optimizations it does. Even NFS-Ganesha server does metadata caching and resides on the same nodes as the glusterfs servers. Apart from these, NFS-Ganesha acts like any other glusterfs client (but by making use of libgfapi and not fuse mount). It would be interesting to check if and how much improvement you get with 'NFS' when compared to fuse protocol for your workload. Please let us know when you have the test environment ready. Will make recommendations wrt to few settings for NFS-Ganesha server and client. Thanks, Soumya [1] https://github.com/linux-ha-storage/storhaug > > We would like to do things like enable SSL encryption of all data flows > (we deal with PHI data in a HIPAA-regulated setting) but are concerned > about performance. We are running dual Intel Xeon ?E5-2630L?(12 physical > cores each @ 2.4GHz) and 128GB RAM in each server node. We have 170 > users. About 20 are active at any time. > > The current setting on /home (others are similar if not identical, maybe > nfs-disable is true for others): > > gluster volume get home all > Option??????????????????????????????????Value > ------??????????????????????????????????----- > cluster.lookup-unhashed?????????????????on > cluster.lookup-optimize?????????????????off > cluster.min-free-disk???????????????????10% > cluster.min-free-inodes?????????????????5% > cluster.rebalance-stats?????????????????off > cluster.subvols-per-directory???????????(null) > cluster.readdir-optimize????????????????off > cluster.rsync-hash-regex????????????????(null) > cluster.extra-hash-regex????????????????(null) > cluster.dht-xattr-name??????????????????trusted.glusterfs.dht > cluster.randomize-hash-range-by-gfid????off > cluster.rebal-throttle??????????????????normal > cluster.lock-migration??????????????????off > cluster.local-volume-name???????????????(null) > cluster.weighted-rebalance??????????????on > cluster.switch-pattern??????????????????(null) > cluster.entry-change-log????????????????on > cluster.read-subvolume??????????????????(null) > cluster.read-subvolume-index????????????-1 > cluster.read-hash-mode??????????????????1 > cluster.background-self-heal-count??????8 > cluster.metadata-self-heal??????????????on > cluster.data-self-heal??????????????????on > cluster.entry-self-heal?????????????????on > cluster.self-heal-daemon????????????????enable > cluster.heal-timeout????????????????????600 > cluster.self-heal-window-size???????????1 > cluster.data-change-log?????????????????on > cluster.metadata-change-log?????????????on > cluster.data-self-heal-algorithm????????(null) > cluster.eager-lock??????????????????????on > disperse.eager-lock?????????????????????on > cluster.quorum-type?????????????????????none > cluster.quorum-count????????????????????(null) > cluster.choose-local????????????????????true > cluster.self-heal-readdir-size??????????1KB > cluster.post-op-delay-secs??????????????1 > cluster.ensure-durability???????????????on > cluster.consistent-metadata?????????????no > cluster.heal-wait-queue-length??????????128 > cluster.favorite-child-policy???????????none > cluster.stripe-block-size???????????????128KB > cluster.stripe-coalesce?????????????????true > diagnostics.latency-measurement?????????off > diagnostics.dump-fd-stats???????????????off > diagnostics.count-fop-hits??????????????off > diagnostics.brick-log-level?????????????INFO > diagnostics.client-log-level????????????INFO > 
diagnostics.brick-sys-log-level  CRITICAL
> diagnostics.client-sys-log-level  CRITICAL
> diagnostics.brick-logger  (null)
> diagnostics.client-logger  (null)
> diagnostics.brick-log-format  (null)
> diagnostics.client-log-format  (null)
> diagnostics.brick-log-buf-size  5
> diagnostics.client-log-buf-size  5
> diagnostics.brick-log-flush-timeout  120
> diagnostics.client-log-flush-timeout  120
> diagnostics.stats-dump-interval  0
> diagnostics.fop-sample-interval  0
> diagnostics.stats-dump-format  json
> diagnostics.fop-sample-buf-size  65535
> diagnostics.stats-dnscache-ttl-sec  86400
> performance.cache-max-file-size  0
> performance.cache-min-file-size  0
> performance.cache-refresh-timeout  1
> performance.cache-priority
> performance.cache-size  32MB
> performance.io-thread-count  16
> performance.high-prio-threads  16
> performance.normal-prio-threads  16
> performance.low-prio-threads  16
> performance.least-prio-threads  1
> performance.enable-least-priority  on
> performance.cache-size  128MB
> performance.flush-behind  on
> performance.nfs.flush-behind  on
> performance.write-behind-window-size  1MB
> performance.resync-failed-syncs-after-fsync  off
> performance.nfs.write-behind-window-size  1MB
> performance.strict-o-direct  off
> performance.nfs.strict-o-direct  off
> performance.strict-write-ordering  off
> performance.nfs.strict-write-ordering  off
> performance.lazy-open  yes
> performance.read-after-open  no
> performance.read-ahead-page-count  4
> performance.md-cache-timeout  1
> performance.cache-swift-metadata  true
> performance.cache-samba-metadata  false
> performance.cache-capability-xattrs  true
> performance.cache-ima-xattrs  true
> features.encryption  off
> encryption.master-key  (null)
> encryption.data-key-size  256
> encryption.block-size  4096
> network.frame-timeout  1800
> network.ping-timeout  42
> network.tcp-window-size  (null)
> features.lock-heal  off
> features.grace-timeout  10
> network.remote-dio  disable
> client.event-threads  2
> client.tcp-user-timeout  0
> client.keepalive-time  20
> client.keepalive-interval  2
> client.keepalive-count  9
> network.tcp-window-size  (null)
> network.inode-lru-limit  16384
> auth.allow  *
> auth.reject  (null)
> transport.keepalive  1
> server.allow-insecure  (null)
> server.root-squash  off
> server.anonuid  65534
> server.anongid  65534
> server.statedump-path  /var/run/gluster
> server.outstanding-rpc-limit  64
> features.lock-heal  off
> features.grace-timeout  10
> server.ssl  (null)
> auth.ssl-allow  *
> server.manage-gids  off
> server.dynamic-auth  on
> client.send-gids  on
> server.gid-timeout  300
> server.own-thread  (null)
> server.event-threads  1
> server.tcp-user-timeout  0
> server.keepalive-time  20
> server.keepalive-interval  2
> server.keepalive-count  9
> transport.listen-backlog  10
> ssl.own-cert  (null)
> ssl.private-key  (null)
> ssl.ca-list  (null)
> ssl.crl-path  (null)
> ssl.certificate-depth  (null)
> ssl.cipher-list  (null)
> ssl.dh-param  (null)
> ssl.ec-curve  (null)
> performance.write-behind  on
> performance.read-ahead  on
> performance.readdir-ahead  off
> performance.io-cache  on
> performance.quick-read  on
> performance.open-behind  on
> performance.nl-cache  off
> performance.stat-prefetch  on
> performance.client-io-threads  off
> performance.nfs.write-behind  on
> performance.nfs.read-ahead  off
> performance.nfs.io-cache  off
> performance.nfs.quick-read  off
> performance.nfs.stat-prefetch  off
> performance.nfs.io-threads  off
> performance.force-readdirp  true
> performance.cache-invalidation  false
> features.uss  off
> features.snapshot-directory  .snaps
> features.show-snapshot-directory  off
> network.compression  off
> network.compression.window-size  -15
> network.compression.mem-level  8
> network.compression.min-size  0
> network.compression.compression-level  -1
> network.compression.debug  false
> features.limit-usage  (null)
> features.default-soft-limit  80%
> features.soft-timeout  60
> features.hard-timeout  5
> features.alert-time  86400
> features.quota-deem-statfs  off
> geo-replication.indexing  off
> geo-replication.indexing  off
> geo-replication.ignore-pid-check  off
> geo-replication.ignore-pid-check  off
> features.quota  off
> features.inode-quota  off
> features.bitrot  disable
> debug.trace  off
> debug.log-history  no
> debug.log-file  no
> debug.exclude-ops  (null)
> debug.include-ops  (null)
> debug.error-gen  off
> debug.error-failure  (null)
> debug.error-number  (null)
> debug.random-failure  off
> debug.error-fops  (null)
> nfs.enable-ino32  no
> nfs.mem-factor  15
> nfs.export-dirs  on
> nfs.export-volumes  on
> nfs.addr-namelookup  off
> nfs.dynamic-volumes  off
> nfs.register-with-portmap  on
> nfs.outstanding-rpc-limit  16
> nfs.port  2049
> nfs.rpc-auth-unix  on
> nfs.rpc-auth-null  on
> nfs.rpc-auth-allow  all
> nfs.rpc-auth-reject  none
> nfs.ports-insecure  off
> nfs.trusted-sync  off
> nfs.trusted-write  off
> nfs.volume-access  read-write
> nfs.export-dir
> nfs.disable  off
> nfs.nlm  on
> nfs.acl  on
> nfs.mount-udp  off
> nfs.mount-rmtab  /var/lib/glusterd/nfs/rmtab
> nfs.rpc-statd  /sbin/rpc.statd
> nfs.server-aux-gids  off
> nfs.drc  off
> nfs.drc-size  0x20000
> nfs.read-size  (1 * 1048576ULL)
> nfs.write-size  (1 * 1048576ULL)
> nfs.readdir-size  (1 * 1048576ULL)
> nfs.rdirplus  on
> nfs.exports-auth-enable  (null)
> nfs.auth-refresh-interval-sec  (null)
> nfs.auth-cache-ttl-sec  (null)
> features.read-only  off
> features.worm  off
> features.worm-file-level  off
> features.default-retention-period  120
> features.retention-mode  relax
> features.auto-commit-period  180
> storage.linux-aio  off
> storage.batch-fsync-mode  reverse-fsync
> storage.batch-fsync-delay-usec  0
> storage.owner-uid  -1
> storage.owner-gid  -1
> storage.node-uuid-pathinfo  off
> storage.health-check-interval  30
> storage.build-pgfid  on
> storage.gfid2path  on
> storage.gfid2path-separator  :
> storage.bd-aio  off
> cluster.server-quorum-type  off
> cluster.server-quorum-ratio  0
> changelog.changelog  off
> changelog.changelog-dir  (null)
> changelog.encoding  ascii
> changelog.rollover-time  15
> changelog.fsync-interval  5
> changelog.changelog-barrier-timeout  120
> changelog.capture-del-path  off
> features.barrier  disable
> features.barrier-timeout  120
> features.trash  off
> features.trash-dir  .trashcan
> features.trash-eliminate-path  (null)
> features.trash-max-filesize  5MB
> features.trash-internal-op  off
> cluster.enable-shared-storage  disable
> cluster.write-freq-threshold  0
> cluster.read-freq-threshold  0
> cluster.tier-pause  off
> cluster.tier-promote-frequency  120
> cluster.tier-demote-frequency  3600
> cluster.watermark-hi  90
> cluster.watermark-low  75
> cluster.tier-mode  cache
> cluster.tier-max-promote-file-size  0
> cluster.tier-max-mb  4000
> cluster.tier-max-files  10000
> cluster.tier-query-limit  100
> cluster.tier-compact  on
> cluster.tier-hot-compact-frequency  604800
> cluster.tier-cold-compact-frequency  604800
> features.ctr-enabled  off
> features.record-counters  off
> features.ctr-record-metadata-heat  off
> features.ctr_link_consistency  off
> features.ctr_lookupheal_link_timeout  300
> features.ctr_lookupheal_inode_timeout  300
> features.ctr-sql-db-cachesize  12500
> features.ctr-sql-db-wal-autocheckpoint  25000
> features.selinux  on
> locks.trace  off
> locks.mandatory-locking  off
> cluster.disperse-self-heal-daemon  enable
> cluster.quorum-reads  no
> client.bind-insecure  (null)
> features.shard  off
> features.shard-block-size  64MB
> features.scrub-throttle  lazy
> features.scrub-freq  biweekly
> features.scrub  false
> features.expiry-time  120
> features.cache-invalidation  off
> features.cache-invalidation-timeout  60
> features.leases  off
> features.lease-lock-recall-timeout  60
> disperse.background-heals  8
> disperse.heal-wait-qlength  128
> cluster.heal-timeout  600
> dht.force-readdirp  on
> disperse.read-policy  gfid-hash
> cluster.shd-max-threads  1
> cluster.shd-wait-qlength  1024
> cluster.locking-scheme  full
> cluster.granular-entry-heal  no
> features.locks-revocation-secs  0
> features.locks-revocation-clear-all  false
> features.locks-revocation-max-blocked  0
> features.locks-monkey-unlocking  false
> disperse.shd-max-threads  1
> disperse.shd-wait-qlength  1024
> disperse.cpu-extensions  auto
> disperse.self-heal-window-size  1
> cluster.use-compound-fops  off
> performance.parallel-readdir  off
> performance.rda-request-size  131072
> performance.rda-low-wmark  4096
> performance.rda-high-wmark  128KB
> performance.rda-cache-limit  10MB
> performance.nl-cache-positive-entry  false
> performance.nl-cache-limit  10MB
> performance.nl-cache-timeout  60
> cluster.brick-multiplex  off
> cluster.max-bricks-per-process  0
> disperse.optimistic-change-log  on
> cluster.halo-enabled  False
> cluster.halo-shd-max-latency  99999
> cluster.halo-nfsd-max-latency  5
> cluster.halo-max-latency  5
> cluster.halo-max-replicas
>>
>> Thanks,
>> Soumya
>>
>>>
>>> Does this process make more sense than a version upgrade process to
>>> 4.1, then 5, then 6? What "gotcha's" do I need to be ready for? I
>>> have until late May to prep and test on old, slow hardware with a
>>> small amount of files and volumes.
>>>
>>>
>>> You can directly upgrade from 3.12 to 6.x. I would suggest that rather
>>> than deleting and creating Gluster volume. +Hari and +Sanju for further
>>> guidelines on upgrade, as they recently did upgrade tests. +Soumya to
>>> add to the nfs-ganesha aspect.
>>>
>>> Regards,
>>> Poornima
>>>
>>> --
>>>
>>> James P. Kinney III
>>>
>>> Every time you stop a school, you will have to build a jail. What you
>>> gain at one end you lose at the other. It's like feeding a dog on his
>>> own tail. It won't fatten the dog.
>>> - Speech 11/23/1900 Mark Twain >>> >>> >>> http://heretothereideas.blogspot.com/ >>> >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> >>> Gluster-users at gluster.org >>> >>> >> Gluster-users at gluster.org >>> >>> > >>> >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> > -- > > James P. Kinney III > > Every time you stop a school, you will have to build a jail. What you > gain at one end you lose at the other. It's like feeding a dog on his > own tail. It won't fatten the dog. > - Speech 11/23/1900 Mark Twain > > http://heretothereideas.blogspot.com/ > From jim.kinney at gmail.com Mon Apr 1 20:23:22 2019 From: jim.kinney at gmail.com (Jim Kinney) Date: Mon, 01 Apr 2019 16:23:22 -0400 Subject: [Gluster-users] Rsync in place of heal after brick failure In-Reply-To: References: Message-ID: <2a200cc0272ad0a89763f1ff5646e1772eae205e.camel@gmail.com> Nice! I didn't use -H -X and the system had to do some clean up. I'll add this in my next migration progress as I move 120TB to new hard drives. On Mon, 2019-04-01 at 14:27 -0400, Tom Fite wrote: > Hi all, > I have a very large (65 TB) brick in a replica 2 volume that needs to > be re-copied from scratch. A heal will take a very long time with > performance degradation on the volume so I investigated using rsync > to do the brunt of the work. > > The command: > > rsync -av -H -X --numeric-ids --progress server1:/data/brick1/gv0 > /data/brick1/ > > Running with -H assures that the hard links in .glusterfs are > preserved, and -X preserves all of gluster's extended attributes. > > I've tested this on my test environment as follows: > > 1. Stop glusterd and kill procs > 2. Move brick volume to backup dir > 3. Run rsync > 4. Start glusterd > 5. Observe gluster status > > All appears to be working correctly. Gluster status reports all > bricks online, all data is accessible in the volume, and I don't see > any errors in the logs. > > Anybody else have experience trying this? > > Thanks > -Tom > > _______________________________________________Gluster-users mailing > listGluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- James P. Kinney III Every time you stop a school, you will have to build a jail. What you gain at one end you lose at the other. It's like feeding a dog on his own tail. It won't fatten the dog. - Speech 11/23/1900 Mark Twain http://heretothereideas.blogspot.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgurusid at redhat.com Tue Apr 2 01:55:53 2019 From: pgurusid at redhat.com (Poornima Gurusiddaiah) Date: Tue, 2 Apr 2019 07:25:53 +0530 Subject: [Gluster-users] Rsync in place of heal after brick failure In-Reply-To: References: Message-ID: You could also try xfsdump and xfsrestore if you brick filesystem is xfs and the destination disk can be attached locally? This will be much faster. Regards, Poornima On Tue, Apr 2, 2019, 12:05 AM Tom Fite wrote: > Hi all, > > I have a very large (65 TB) brick in a replica 2 volume that needs to be > re-copied from scratch. A heal will take a very long time with performance > degradation on the volume so I investigated using rsync to do the brunt of > the work. > > The command: > > rsync -av -H -X --numeric-ids --progress server1:/data/brick1/gv0 > /data/brick1/ > > Running with -H assures that the hard links in .glusterfs are preserved, > and -X preserves all of gluster's extended attributes. > > I've tested this on my test environment as follows: > > 1. 
Stop glusterd and kill procs
> 2. Move brick volume to backup dir
> 3. Run rsync
> 4. Start glusterd
> 5. Observe gluster status
>
> All appears to be working correctly. Gluster status reports all bricks
> online, all data is accessible in the volume, and I don't see any errors in
> the logs.
>
> Anybody else have experience trying this?
>
> Thanks
> -Tom
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From francois.duport at hotmail.fr Mon Apr 1 07:50:41 2019 From: francois.duport at hotmail.fr (François Duport) Date: Mon, 1 Apr 2019 07:50:41 +0000 Subject: [Gluster-users] Cross-compiling GlusterFS Message-ID: 
Hi,
I am trying to cross-compile GlusterFS because I don't want my embedded client to do it, and my client is reset each time. So I want the compiled application to be in my ROM image.
That said, the cross-compilation itself appears to have succeeded, but when I do a 'make Destdir='pwd'/out install', the out directory only contains a 'lib' and an 'include' folder. I can't find the associated bin folder.
Can you help me with that?
Thanks
Best regards
François
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From ravishankar at redhat.com Tue Apr 2 08:50:42 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Tue, 2 Apr 2019 14:20:42 +0530 Subject: [Gluster-users] Cross-compiling GlusterFS In-Reply-To: References: Message-ID: <8c934a87-2653-20e8-0252-fcbd7e37b0f4@redhat.com> 
On 01/04/19 1:20 PM, François Duport wrote:
> Hi,
>
> I am trying to cross-compile GlusterFS because I don't want my embedded
> client to do it, and my client is reset each time. So I want the compiled
> application to be in my ROM image.
>
> That said, the cross-compilation itself appears to have succeeded, but
> when I do a 'make Destdir='pwd'/out install', the out directory only contains
> a 'lib' and an 'include' folder. I can't find the associated bin folder.
I did not attempt a cross compile but `make install DESTDIR=/tmp/DELETE/` did put everything including the binaries inside /tmp/DELETE on a local install. Perhaps you could search the verbose output during the install for names of binaries to see if and where they are getting installed. For example, scrolling through the output and searching for glfsheal, I see
/usr/bin/mkdir -p '/tmp/DELETE//usr/local/sbin'
/bin/sh ../../libtool --mode=install /usr/bin/install -c glfsheal '/tmp/DELETE//usr/local/sbin'
libtool: install: /usr/bin/install -c .libs/glfsheal /tmp/DELETE//usr/local/sbin/glfsheal
Hope that helps.
Ravi
>
> Can you help me with that?
> Thanks
>
> Best regards
> François
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From atin.mukherjee83 at gmail.com Tue Apr 2 09:53:54 2019 From: atin.mukherjee83 at gmail.com (Atin Mukherjee) Date: Tue, 2 Apr 2019 15:23:54 +0530 Subject: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6 In-Reply-To: References: Message-ID: 
On Mon, 1 Apr 2019 at 10:28, Hari Gowtham wrote: > Comments inline. > > On Mon, Apr 1, 2019 at 5:55 AM Sankarshan Mukhopadhyay > wrote: > > > > Quite a considerable amount of detail here. Thank you!
> > > > On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham > wrote: > > > > > > Hello Gluster users, > > > > > > As you all aware that glusterfs-6 is out, we would like to inform you > > > that, we have spent a significant amount of time in testing > > > glusterfs-6 in upgrade scenarios. We have done upgrade testing to > > > glusterfs-6 from various releases like 3.12, 4.1 and 5.3. > > > > > > As glusterfs-6 has got in a lot of changes, we wanted to test those > portions. > > > There were xlators (and respective options to enable/disable them) > > > added and deprecated in glusterfs-6 from various versions [1]. > > > > > > We had to check the following upgrade scenarios for all such options > > > Identified in [1]: > > > 1) option never enabled and upgraded > > > 2) option enabled and then upgraded > > > 3) option enabled and then disabled and then upgraded > > > > > > We weren't manually able to check all the combinations for all the > options. > > > So the options involving enabling and disabling xlators were > prioritized. > > > The below are the result of the ones tested. > > > > > > Never enabled and upgraded: > > > checked from 3.12, 4.1, 5.3 to 6 the upgrade works. > > > > > > Enabled and upgraded: > > > Tested for tier which is deprecated, It is not a recommended upgrade. > > > As expected the volume won't be consumable and will have a few more > > > issues as well. > > > Tested with 3.12, 4.1 and 5.3 to 6 upgrade. > > > > > > Enabled, disabled before upgrade. > > > Tested for tier with 3.12 and the upgrade went fine. > > > > > > There is one common issue to note in every upgrade. The node being > > > upgraded is going into disconnected state. You have to flush the > iptables > > > and the restart glusterd on all nodes to fix this. > > > > > > > Is this something that is written in the upgrade notes? I do not seem > > to recall, if not, I'll send a PR > > No this wasn't mentioned in the release notes. PRs are welcome. > > > > > > The testing for enabling new options is still pending. The new options > > > won't cause as much issues as the deprecated ones so this was put at > > > the end of the priority list. It would be nice to get contributions > > > for this. > > > > > > > Did the range of tests lead to any new issues? > > Yes. In the first round of testing we found an issue and had to postpone > the > release of 6 until the fix was made available. > https://bugzilla.redhat.com/show_bug.cgi?id=1684029 > > And then we tested it again after this patch was made available. > and came across this: > https://bugzilla.redhat.com/show_bug.cgi?id=1694010 This isn?t a bug as we found that upgrade worked seamelessly in two different setup. So we have no issues in the upgrade path to glusterfs-6 release. > > Have mentioned this in the second mail as to how to over this situation > for now until the fix is available. > > > > > > For the disable testing, tier was used as it covers most of the xlator > > > that was removed. And all of these tests were done on a replica 3 > volume. > > > > > > > I'm not sure if the Glusto team is reading this, but it would be > > pertinent to understand if the approach you have taken can be > > converted into a form of automated testing pre-release. > > I don't have an answer for this, have CCed Vijay. > He might have an idea. > > > > > > Note: This is only for upgrade testing of the newly added and removed > > > xlators. Does not involve the normal tests for the xlator. > > > > > > If you have any questions, please feel free to reach us. 
> > > > > > [1] > https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing > > > > > > Regards, > > > Hari and Sanju. > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Regards, > Hari Gowtham. > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -- --Atin -------------- next part -------------- An HTML attachment was scrubbed... URL: From nux at li.nux.ro Tue Apr 2 15:37:17 2019 From: nux at li.nux.ro (Nux!) Date: Tue, 2 Apr 2019 16:37:17 +0100 (BST) Subject: [Gluster-users] Prioritise local bricks for IO? In-Reply-To: References: <29221907.583.1553599314586.JavaMail.zimbra@li.nux.ro> Message-ID: <383369409.4472.1554219437440.JavaMail.zimbra@li.nux.ro> Ok, cool, thanks. So.. no go. Any other ideas on how to accomplish task then? -- Sent from the Delta quadrant using Borg technology! Nux! www.nux.ro ----- Original Message ----- > From: "Nithya Balachandran" > To: "Poornima Gurusiddaiah" > Cc: "Nux!" , "gluster-users" , "Gluster Devel" > Sent: Thursday, 28 March, 2019 09:38:16 > Subject: Re: [Gluster-users] Prioritise local bricks for IO? > On Wed, 27 Mar 2019 at 20:27, Poornima Gurusiddaiah > wrote: > >> This feature is not under active development as it was not used widely. >> AFAIK its not supported feature. >> +Nithya +Raghavendra for further clarifications. >> > > This is not actively supported - there has been no work done on this > feature for a long time. > > Regards, > Nithya > >> >> Regards, >> Poornima >> >> On Wed, Mar 27, 2019 at 12:33 PM Lucian wrote: >> >>> Oh, that's just what the doctor ordered! >>> Hope it works, thanks >>> >>> On 27 March 2019 03:15:57 GMT, Vlad Kopylov wrote: >>>> >>>> I don't remember if it still in works >>>> NUFA >>>> >>>> https://github.com/gluster/glusterfs-specs/blob/master/done/Features/nufa.md >>>> >>>> v >>>> >>>> On Tue, Mar 26, 2019 at 7:27 AM Nux! wrote: >>>> >>>>> Hello, >>>>> >>>>> I'm trying to set up a distributed backup storage (no replicas), but >>>>> I'd like to prioritise the local bricks for any IO done on the volume. >>>>> This will be a backup stor, so in other words, I'd like the files to be >>>>> written locally if there is space, so as to save the NICs for other traffic. >>>>> >>>>> Anyone knows how this might be achievable, if at all? >>>>> >>>>> -- >>>>> Sent from the Delta quadrant using Borg technology! >>>>> >>>>> Nux! >>>>> www.nux.ro >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>> >>> -- >>> Sent from my Android device with K-9 Mail. Please excuse my brevity. >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users From ykaul at redhat.com Tue Apr 2 18:16:02 2019 From: ykaul at redhat.com (Yaniv Kaul) Date: Tue, 2 Apr 2019 21:16:02 +0300 Subject: [Gluster-users] [Gluster-devel] Prioritise local bricks for IO? 
In-Reply-To: <383369409.4472.1554219437440.JavaMail.zimbra@li.nux.ro> References: <29221907.583.1553599314586.JavaMail.zimbra@li.nux.ro> <383369409.4472.1554219437440.JavaMail.zimbra@li.nux.ro> Message-ID: On Tue, Apr 2, 2019 at 6:37 PM Nux! wrote: > Ok, cool, thanks. So.. no go. > > Any other ideas on how to accomplish task then? > While not a solution, I believe https://review.gluster.org/#/c/glusterfs/+/21333/ - read selection based on latency, is an interesting path towards this. (Of course, you'd need later also add write...) Y. > -- > Sent from the Delta quadrant using Borg technology! > > Nux! > www.nux.ro > > ----- Original Message ----- > > From: "Nithya Balachandran" > > To: "Poornima Gurusiddaiah" > > Cc: "Nux!" , "gluster-users" , > "Gluster Devel" > > Sent: Thursday, 28 March, 2019 09:38:16 > > Subject: Re: [Gluster-users] Prioritise local bricks for IO? > > > On Wed, 27 Mar 2019 at 20:27, Poornima Gurusiddaiah > > > wrote: > > > >> This feature is not under active development as it was not used widely. > >> AFAIK its not supported feature. > >> +Nithya +Raghavendra for further clarifications. > >> > > > > This is not actively supported - there has been no work done on this > > feature for a long time. > > > > Regards, > > Nithya > > > >> > >> Regards, > >> Poornima > >> > >> On Wed, Mar 27, 2019 at 12:33 PM Lucian wrote: > >> > >>> Oh, that's just what the doctor ordered! > >>> Hope it works, thanks > >>> > >>> On 27 March 2019 03:15:57 GMT, Vlad Kopylov > wrote: > >>>> > >>>> I don't remember if it still in works > >>>> NUFA > >>>> > >>>> > https://github.com/gluster/glusterfs-specs/blob/master/done/Features/nufa.md > >>>> > >>>> v > >>>> > >>>> On Tue, Mar 26, 2019 at 7:27 AM Nux! wrote: > >>>> > >>>>> Hello, > >>>>> > >>>>> I'm trying to set up a distributed backup storage (no replicas), but > >>>>> I'd like to prioritise the local bricks for any IO done on the > volume. > >>>>> This will be a backup stor, so in other words, I'd like the files to > be > >>>>> written locally if there is space, so as to save the NICs for other > traffic. > >>>>> > >>>>> Anyone knows how this might be achievable, if at all? > >>>>> > >>>>> -- > >>>>> Sent from the Delta quadrant using Borg technology! > >>>>> > >>>>> Nux! > >>>>> www.nux.ro > >>>>> _______________________________________________ > >>>>> Gluster-users mailing list > >>>>> Gluster-users at gluster.org > >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users > >>>>> > >>>> > >>> -- > >>> Sent from my Android device with K-9 Mail. Please excuse my brevity. > >>> _______________________________________________ > >>> Gluster-users mailing list > >>> Gluster-users at gluster.org > >>> https://lists.gluster.org/mailman/listinfo/gluster-users > >> > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > > > https://lists.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From hunter86_bg at yahoo.com Wed Apr 3 00:26:09 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Wed, 3 Apr 2019 00:26:09 +0000 (UTC) Subject: [Gluster-users] Gluster 5.5 slower than 3.12.15 References: <408118771.15560336.1554251169489.ref@mail.yahoo.com> Message-ID: <408118771.15560336.1554251169489@mail.yahoo.com> 
Hi Community,
I have the feeling that with gluster v5.5 I have poorer performance than it used to be on 3.12.15. Did you observe something like that?
I have a 3 node Hyperconverged Cluster (ovirt + glusterfs with replica 3 arbiter 1 volumes) with NFS Ganesha, and the issues came up after I upgraded to v5. First it was the notorious 5.3 experience, and now with 5.5 my sanlock is having problems and higher latency than it used to. I have switched from NFS-Ganesha to pure FUSE, but the latency problems do not go away.
Of course, this is partially due to the consumer hardware, but as the hardware has not changed I was hoping that the performance would remain as is.
So, do you expect 5.5 to perform worse than 3.12?
Some info:
Volume Name: engine
Type: Replicate
Volume ID: 30ca1cc2-f2f7-4749-9e2e-cee9d7099ded
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1:/gluster_bricks/engine/engine
Brick2: ovirt2:/gluster_bricks/engine/engine
Brick3: ovirt3:/gluster_bricks/engine/engine (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable
cluster.enable-shared-storage: enable
Network: 1 gbit/s
Filesystem: XFS
Best Regards,
Strahil Nikolov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From moagrawa at redhat.com Wed Apr 3 02:55:59 2019 From: moagrawa at redhat.com (Mohit Agrawal) Date: Wed, 3 Apr 2019 08:25:59 +0530 Subject: [Gluster-users] [ovirt-users] Re: Announcing Gluster release 5.5 In-Reply-To: References: <20190328164716.27693.35887@mail.ovirt.org> Message-ID: 
Hi Olaf,
As per the currently attached "multi-glusterfsd-vol3.txt | multi-glusterfsd-vol4.txt", multiple processes are running for the "ovirt-core ovirt-engine" brick names, but there are no logs available in bricklogs.zip specific to these bricks; bricklogs.zip has a dump of the ovirt-kube logs only.
Kindly share the brick logs specific to the bricks "ovirt-core ovirt-engine", and share the glusterd logs as well.
Regards
Mohit Agrawal
On Tue, Apr 2, 2019 at 9:18 PM Olaf Buitelaar wrote:
> Dear Krutika,
>
> 1.
> I've changed the volume settings, write performance seems to increased
> somewhat, however the profile doesn't really support that since latencies
> increased. However read performance has diminished, which does seem to be
> supported by the profile runs (attached).
> Also the IO does seem to behave more consistent than before.
> I don't really understand the idea behind them, maybe you can explain why
> these suggestions are good?
> These settings seems to avoid as much local caching and access as possible
> and push everything to the gluster processes.
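> (As a quick check, the effective values of these options can be read back per
> volume, e.g. for the ovirt-kube volume mentioned earlier; the volume name here
> is only an example:
> # gluster volume get ovirt-kube all | egrep 'remote-dio|strict-o-direct|choose-local' )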
While i would expect local > access and local caches are a good thing, since it would lead to having > less network access or disk access. > I tried to investigate these settings a bit more, and this is what i > understood of them; > - network.remote-dio; when on it seems to ignore the O_DIRECT flag in the > client, thus causing the files to be cached and buffered in the page cache > on the client, i would expect this to be a good thing especially if the > server process would access the same page cache? > At least that is what grasp from this commit; > https://review.gluster.org/#/c/glusterfs/+/4206/2/xlators/protocol/client/src/client.c line > 867 > Also found this commit; > https://github.com/gluster/glusterfs/commit/06c4ba589102bf92c58cd9fba5c60064bc7a504e#diff-938709e499b4383c3ed33c3979b9080c suggesting > remote-dio actually improves performance, not sure it's a write or read > benchmark > When a file is opened with O_DIRECT it will also disable the write-behind > functionality > > - performance.strict-o-direct: when on, the AFR, will not ignore the > O_DIRECT flag. and will invoke: fop_writev_stub with the wb_writev_helper, > which seems to stack the operation, no idea why that is. But generally i > suppose not ignoring the O_DIRECT flag in the AFR is a good thing, when a > processes requests to have O_DIRECT. So this makes sense to me. > > - cluster.choose-local: when off, it doesn't prefer the local node, but > would always choose a brick. Since it's a 9 node cluster, with 3 > subvolumes, only a 1/3 could end-up local, and the other 2/3 should be > pushed to external nodes anyway. Or am I making the total wrong assumption > here? > > It seems to this config is moving to the gluster-block config side of > things, which does make sense. > Since we're running quite some mysql instances, which opens the files with > O_DIRECt i believe, it would mean the only layer of cache is within mysql > it self. Which you could argue is a good thing. But i would expect a little > of write-behind buffer, and maybe some of the data cached within gluster > would alleviate things a bit on gluster's side. But i wouldn't know if > that's the correct mind set, and so might be totally off here. > Also i would expect these gluster v set command to be online > operations, but somehow the bricks went down, after applying these changes. > What appears to have happened is that after the update the brick process > was restarted, but due to multiple brick process start issue, multiple > processes were started, and the brick didn't came online again. > However i'll try to reproduce this, since i would like to test with > cluster.choose-local: on, and see how performance compares. And hopefully > when it occurs collect some useful info. > Question; are network.remote-dio and performance.strict-o-direct mutually > exclusive settings, or can they both be on? > > 2. 
I've attached all brick logs, the only thing relevant i found was; > [2019-03-28 20:20:07.170452] I [MSGID: 113030] > [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: > open-fd-key-status: 0 for > /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 > [2019-03-28 20:20:07.170491] I [MSGID: 113031] > [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr > status: 0 for > /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 > [2019-03-28 20:20:07.248480] I [MSGID: 113030] > [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: > open-fd-key-status: 0 for > /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 > [2019-03-28 20:20:07.248491] I [MSGID: 113031] > [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr > status: 0 for > /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 > > Thanks Olaf > > ps. sorry needed to resend since it exceed the file limit > > Op ma 1 apr. 2019 om 07:56 schreef Krutika Dhananjay >: > >> Adding back gluster-users >> Comments inline ... >> >> On Fri, Mar 29, 2019 at 8:11 PM Olaf Buitelaar >> wrote: >> >>> Dear Krutika, >>> >>> >>> >>> 1. I?ve made 2 profile runs of around 10 minutes (see files >>> profile_data.txt and profile_data2.txt). Looking at it, most time seems be >>> spent at the fop?s fsync and readdirp. >>> >>> Unfortunate I don?t have the profile info for the 3.12.15 version so >>> it?s a bit hard to compare. >>> >>> One additional thing I do notice on 1 machine (10.32.9.5) the iowait >>> time increased a lot, from an average below the 1% it?s now around the 12% >>> after the upgrade. >>> >>> So first suspicion with be lighting strikes twice, and I?ve also just >>> now a bad disk, but that doesn?t appear to be the case, since all smart >>> status report ok. >>> >>> Also dd shows performance I would more or less expect; >>> >>> dd if=/dev/zero of=/data/test_file bs=100M count=1 oflag=dsync >>> >>> 1+0 records in >>> >>> 1+0 records out >>> >>> 104857600 bytes (105 MB) copied, 0.686088 s, 153 MB/s >>> >>> dd if=/dev/zero of=/data/test_file bs=1G count=1 oflag=dsync >>> >>> 1+0 records in >>> >>> 1+0 records out >>> >>> 1073741824 bytes (1.1 GB) copied, 7.61138 s, 141 MB/s >>> >>> if=/dev/urandom of=/data/test_file bs=1024 count=1000000 >>> >>> 1000000+0 records in >>> >>> 1000000+0 records out >>> >>> 1024000000 bytes (1.0 GB) copied, 6.35051 s, 161 MB/s >>> >>> dd if=/dev/zero of=/data/test_file bs=1024 count=1000000 >>> >>> 1000000+0 records in >>> >>> 1000000+0 records out >>> >>> 1024000000 bytes (1.0 GB) copied, 1.6899 s, 606 MB/s >>> >>> When I disable this brick (service glusterd stop; pkill glusterfsd) >>> performance in gluster is better, but not on par with what it was. Also the >>> cpu usages on the ?neighbor? nodes which hosts the other bricks in the same >>> subvolume increases quite a lot in this case, which I wouldn?t expect >>> actually since they shouldn't handle much more work, except flagging shards >>> to heal. Iowait also goes to idle once gluster is stopped, so it?s for >>> sure gluster which waits for io. >>> >>> >>> >> >> So I see that FSYNC %-latency is on the higher side. And I also noticed >> you don't have direct-io options enabled on the volume. 
>> Could you set the following options on the volume - >> # gluster volume set network.remote-dio off >> # gluster volume set performance.strict-o-direct on >> and also disable choose-local >> # gluster volume set cluster.choose-local off >> >> let me know if this helps. >> >> 2. I?ve attached the mnt log and volume info, but I couldn?t find >>> anything relevant in in those logs. I think this is because we run the VM?s >>> with libgfapi; >>> >>> [root at ovirt-host-01 ~]# engine-config -g LibgfApiSupported >>> >>> LibgfApiSupported: true version: 4.2 >>> >>> LibgfApiSupported: true version: 4.1 >>> >>> LibgfApiSupported: true version: 4.3 >>> >>> And I can confirm the qemu process is invoked with the gluster:// >>> address for the images. >>> >>> The message is logged in the /var/lib/libvert/qemu/ file, >>> which I?ve also included. For a sample case see around; 2019-03-28 20:20:07 >>> >>> Which has the error; E [MSGID: 133010] >>> [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on >>> shard 109886 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c >>> [Stale file handle] >>> >> >> Could you also attach the brick logs for this volume? >> >> >>> >>> 3. yes I see multiple instances for the same brick directory, like; >>> >>> /usr/sbin/glusterfsd -s 10.32.9.6 --volfile-id >>> ovirt-core.10.32.9.6.data-gfs-bricks-brick1-ovirt-core -p >>> /var/run/gluster/vols/ovirt-core/10.32.9.6-data-gfs-bricks-brick1-ovirt-core.pid >>> -S /var/run/gluster/452591c9165945d9.socket --brick-name >>> /data/gfs/bricks/brick1/ovirt-core -l >>> /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log >>> --xlator-option *-posix.glusterd-uuid=fb513da6-f3bd-4571-b8a2-db5efaf60cc1 >>> --process-name brick --brick-port 49154 --xlator-option >>> ovirt-core-server.listen-port=49154 >>> >>> >>> >>> I?ve made an export of the output of ps from the time I observed these >>> multiple processes. >>> >>> In addition the brick_mux bug as noted by Atin. I might also have >>> another possible cause, as ovirt moves nodes from none-operational state or >>> maintenance state to active/activating, it also seems to restart gluster, >>> however I don?t have direct proof for this theory. >>> >>> >>> >> >> +Atin Mukherjee ^^ >> +Mohit Agrawal ^^ >> >> -Krutika >> >> Thanks Olaf >>> >>> Op vr 29 mrt. 2019 om 10:03 schreef Sandro Bonazzola < >>> sbonazzo at redhat.com>: >>> >>>> >>>> >>>> Il giorno gio 28 mar 2019 alle ore 17:48 ha >>>> scritto: >>>> >>>>> Dear All, >>>>> >>>>> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While >>>>> previous upgrades from 4.1 to 4.2 etc. went rather smooth, this one was a >>>>> different experience. After first trying a test upgrade on a 3 node setup, >>>>> which went fine. i headed to upgrade the 9 node production platform, >>>>> unaware of the backward compatibility issues between gluster 3.12.15 -> >>>>> 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start. >>>>> Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata >>>>> was missing or couldn't be accessed. Restoring this file by getting a good >>>>> copy of the underlying bricks, removing the file from the underlying bricks >>>>> where the file was 0 bytes and mark with the stickybit, and the >>>>> corresponding gfid's. Removing the file from the mount point, and copying >>>>> back the file on the mount point. 
Manually mounting the engine domain, and >>>>> manually creating the corresponding symbolic links in /rhev/data-center and >>>>> /var/run/vdsm/storage and fixing the ownership back to vdsm.kvm (which was >>>>> root.root), i was able to start the HA engine again. Since the engine was >>>>> up again, and things seemed rather unstable i decided to continue the >>>>> upgrade on the other nodes suspecting an incompatibility in gluster >>>>> versions, i thought would be best to have them all on the same version >>>>> rather soonish. However things went from bad to worse, the engine stopped >>>>> again, and all vm?s stopped working as well. So on a machine outside the >>>>> setup and restored a backup of the engine taken from version 4.2.8 just >>>>> before the upgrade. With this engine I was at least able to start some vm?s >>>>> again, and finalize the upgrade. Once the upgraded, things didn?t stabilize >>>>> and also lose 2 vm?s during the process due to image corruption. After >>>>> figuring out gluster 5.3 had quite some issues I was as lucky to see >>>>> gluster 5.5 was about to be released, on the moment the RPM?s were >>>>> available I?ve installed those. This helped a lot in terms of stability, >>>>> for which I?m very grateful! However the performance is unfortunate >>>>> terrible, it?s about 15% of what the performance was running gluster >>>>> 3.12.15. It?s strange since a simple dd shows ok performance, but our >>>>> actual workload doesn?t. While I would expect the performance to be better, >>>>> due to all improvements made since gluster version 3.12. Does anybody share >>>>> the same experience? >>>>> I really hope gluster 6 will soon be tested with ovirt and released, >>>>> and things start to perform and stabilize again..like the good old days. Of >>>>> course when I can do anything, I?m happy to help. >>>>> >>>> >>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1693998 to track >>>> the rebase on Gluster 6. >>>> >>>> >>>> >>>>> >>>>> I think the following short list of issues we have after the migration; >>>>> Gluster 5.5; >>>>> - Poor performance for our workload (mostly write dependent) >>>>> - VM?s randomly pause on unknown storage errors, which are >>>>> ?stale file?s?. corresponding log; Lookup on shard 797 failed. Base file >>>>> gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle] >>>>> - Some files are listed twice in a directory (probably related >>>>> the stale file issue?) >>>>> Example; >>>>> ls -la >>>>> /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/ >>>>> total 3081 >>>>> drwxr-x---. 2 vdsm kvm 4096 Mar 18 11:34 . >>>>> drwxr-xr-x. 13 vdsm kvm 4096 Mar 19 09:42 .. >>>>> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 >>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c >>>>> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 >>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c >>>>> -rw-rw----. 1 vdsm kvm 1048576 Jan 27 2018 >>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease >>>>> -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 >>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta >>>>> -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 >>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta >>>>> >>>>> - brick processes sometimes starts multiple times. Sometimes I?ve 5 >>>>> brick processes for a single volume. Killing all glusterfsd?s for the >>>>> volume on the machine and running gluster v start force usually just >>>>> starts one after the event, from then on things look all right. 
>>>>> >>>>> >>>> May I kindly ask to open bugs on Gluster for above issues at >>>> https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS ? >>>> Sahina? >>>> >>>> >>>>> Ovirt 4.3.2.1-1.el7 >>>>> - All vms images ownership are changed to root.root after the vm >>>>> is shutdown, probably related to; >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1666795 but not only >>>>> scoped to the HA engine. I?m still in compatibility mode 4.2 for the >>>>> cluster and for the vm?s, but upgraded to version ovirt 4.3.2 >>>>> >>>> >>>> Ryan? >>>> >>>> >>>>> - The network provider is set to ovn, which is fine..actually >>>>> cool, only the ?ovs-vswitchd? is a CPU hog, and utilizes 100% >>>>> >>>> >>>> Miguel? Dominik? >>>> >>>> >>>>> - It seems on all nodes vdsm tries to get the the stats for the >>>>> HA engine, which is filling the logs with (not sure if this is new); >>>>> [api.virt] FINISH getStats return={'status': {'message': "Virtual >>>>> machine does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}", >>>>> 'code': 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9 >>>>> (api:54) >>>>> >>>> >>>> Simone? >>>> >>>> >>>>> - It seems the package os_brick [root] managedvolume not >>>>> supported: Managed Volume Not Supported. Missing package os-brick.: >>>>> ('Cannot import os_brick',) (caps:149) which fills the vdsm.log, but for >>>>> this I also saw another message, so I suspect this will already be resolved >>>>> shortly >>>>> - The machine I used to run the backup HA engine, doesn?t want >>>>> to get removed from the hosted-engine ?vm-status, not even after running; >>>>> hosted-engine --clean-metadata --host-id=10 --force-clean or hosted-engine >>>>> --clean-metadata --force-clean from the machine itself. >>>>> >>>> >>>> Simone? >>>> >>>> >>>>> >>>>> Think that's about it. >>>>> >>>>> Don?t get me wrong, I don?t want to rant, I just wanted to share my >>>>> experience and see where things can made better. >>>>> >>>> >>>> If not already done, can you please open bugs for above issues at >>>> https://bugzilla.redhat.com/enter_bug.cgi?classification=oVirt ? >>>> >>>> >>>>> >>>>> >>>>> Best Olaf >>>>> _______________________________________________ >>>>> Users mailing list -- users at ovirt.org >>>>> To unsubscribe send an email to users-leave at ovirt.org >>>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >>>>> oVirt Code of Conduct: >>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>> List Archives: >>>>> https://lists.ovirt.org/archives/list/users at ovirt.org/message/3CO35Q7VZMWNHS4LPUJNO7S47MGLSKS5/ >>>>> >>>> >>>> >>>> -- >>>> >>>> SANDRO BONAZZOLA >>>> >>>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV >>>> >>>> Red Hat EMEA >>>> >>>> sbonazzo at redhat.com >>>> >>>> >>> _______________________________________________ >>> Users mailing list -- users at ovirt.org >>> To unsubscribe send an email to users-leave at ovirt.org >>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >>> oVirt Code of Conduct: >>> https://www.ovirt.org/community/about/community-guidelines/ >>> List Archives: >>> https://lists.ovirt.org/archives/list/users at ovirt.org/message/HAGTA64LF7LLE6YMHQ6DLT26MD2GZ2PK/ >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From nl at fischer-ka.de Wed Apr 3 06:48:55 2019 From: nl at fischer-ka.de (Ingo Fischer) Date: Wed, 3 Apr 2019 08:48:55 +0200 Subject: [Gluster-users] Is "replica 4 arbiter 1" allowed to tweak client-quorum? 
Message-ID: <22ed4f3d-d8f1-71ec-1422-241413f93a08@fischer-ka.de> Hi All, I had a replica 2 cluster to host my VM images from my Proxmox cluster. I got a bit around split brain scenarios by using "nufa" to make sure the files are located on the host where the machine also runs normally. So in fact one replica could fail and I still had the VM working. But then I thought about doing better and decided to add a node to increase replica and I decided against arbiter approach. During this I also decided to go away from nufa to make it a more normal approach. But in fact by adding the third replica and removing nufa I'm not really better on availability - only split-brain-chance. I'm still at the point that only one node is allowed to fail because else the now active client quorum is no longer met and FS goes read only (which in fact is not really better then failing completely as it was before). So I thought about adding arbiter bricks as "kind of 4th replica (but without space needs) ... but then I read in docs that only "replica 3 arbiter 1" is allowed as combination. Is this still true? If docs are true: Why arbiter is not allowed for higher replica counts? It would allow to improve on client quorum in my understanding. Thank you for your opinion and/or facts :-) Ingo -- Ingo Fischer Technical Director of Platform Gameforge 4D GmbH Albert-Nestler-Stra?e 8 76131 Karlsruhe Germany Tel. +49 721 354 808-2269 ingo.fischer at gameforge.com http://www.gameforge.com Amtsgericht Mannheim, Handelsregisternummer 718029 USt-IdNr.: DE814330106 Gesch?ftsf?hrer Alexander R?sner, Jeffrey Brown From ravishankar at redhat.com Wed Apr 3 07:38:27 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 3 Apr 2019 13:08:27 +0530 Subject: [Gluster-users] Is "replica 4 arbiter 1" allowed to tweak client-quorum? In-Reply-To: <22ed4f3d-d8f1-71ec-1422-241413f93a08@fischer-ka.de> References: <22ed4f3d-d8f1-71ec-1422-241413f93a08@fischer-ka.de> Message-ID: <21e01090-8ddc-5d3f-d58c-f673dad5a78a@redhat.com> On 03/04/19 12:18 PM, Ingo Fischer wrote: > Hi All, > > I had a replica 2 cluster to host my VM images from my Proxmox cluster. > I got a bit around split brain scenarios by using "nufa" to make sure > the files are located on the host where the machine also runs normally. > So in fact one replica could fail and I still had the VM working. > > But then I thought about doing better and decided to add a node to > increase replica and I decided against arbiter approach. During this I > also decided to go away from nufa to make it a more normal approach. > > But in fact by adding the third replica and removing nufa I'm not really > better on availability - only split-brain-chance. I'm still at the point > that only one node is allowed to fail because else the now active client > quorum is no longer met and FS goes read only (which in fact is not > really better then failing completely as it was before). > > So I thought about adding arbiter bricks as "kind of 4th replica (but > without space needs) ... but then I read in docs that only "replica 3 > arbiter 1" is allowed as combination. Is this still true? Yes, this is still true. Slightly off-topic, the 'replica 3 arbiter 1' was supposed to mean there are 3 bricks out of which 1 is an arbiter. This supposedly caused some confusion where people thought there were 4 bricks involved. The CLI syntax was changed in the newer releases to 'replica 2 arbiter 1` to mean there are 2 data bricks and 1 arbiter brick. 
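For example, the same 2 + 1 configuration could be created with either form of the command (the volume name, host names and brick paths below are only placeholders):

# gluster volume create myvol replica 3 arbiter 1 host1:/bricks/data1 host2:/bricks/data1 host3:/bricks/arbiter1
# gluster volume create myvol replica 2 arbiter 1 host1:/bricks/data1 host2:/bricks/data1 host3:/bricks/arbiter1
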
For backward compatibility, the older syntax still works though. The documentation needs to be updated. :-) > If docs are true: Why arbiter is not allowed for higher replica counts? The main motivation for the arbiter feature was to solve a specific case: people who wanted to avoid split-brains associated with replica 2 but did not want to add another full blown data brick to make it replica 3 for cost reasons. > It would allow to improve on client quorum in my understanding. Agreed but the current implementation is only for a 2+1 configuration. Perhaps it is something we could work on in the future to make it generic like you say. > > Thank you for your opinion and/or facts :-) I don't think NUFA is being worked on/tested actively. If you can afford a 3rd data brick, making it replica 3 is definitely better than a 2+1 arbiter since there is more availability by virtue of the 3rd brick also storing data. Both of them prevent split-brains and are used successfully by OVirt/ VM storage/ hyperconvergance use cases. Even without NUFA, for reads, AFR anyway serves it from the local copy (writes still need to go to all bricks). Regards, Ravi > > Ingo > From atumball at redhat.com Wed Apr 3 08:35:07 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 3 Apr 2019 14:05:07 +0530 Subject: [Gluster-users] Gluster 5.5 slower than 3.12.15 In-Reply-To: <408118771.15560336.1554251169489@mail.yahoo.com> References: <408118771.15560336.1554251169489.ref@mail.yahoo.com> <408118771.15560336.1554251169489@mail.yahoo.com> Message-ID: Strahil, With some basic testing, we are noticing the similar behavior too. One of the issue we identified was increased n/w usage in 5.x series (being addressed by https://review.gluster.org/#/c/glusterfs/+/22404/), and there are few other features which write extended attributes which caused some delay. We are in the process of publishing some numbers with release-3.12.x, release-5 and release-6 comparison soon. With some numbers we are already seeing release-6 currently is giving really good performance in many configurations, specially for 1x3 replicate volume type. While we continue to identify and fix issues in 5.x series, one of the request is to validate release-6.x (6.0 or 6.1 which would happen on April 10th), so you can see the difference in your workload. Regards, Amar On Wed, Apr 3, 2019 at 5:57 AM Strahil Nikolov wrote: > Hi Community, > > I have the feeling that with gluster v5.5 I have poorer performance than > it used to be on 3.12.15. Did you observe something like that? > > I have a 3 node Hyperconverged Cluster (ovirt + glusterfs with replica 3 > arbiter1 volumes) with NFS Ganesha and since I have upgraded to v5 - the > issues came up. > First it was 5.3 notorious experience and now with 5.5 - my sanlock is > having problems and higher latency than it used to be. I have switched from > NFS-Ganesha to pure FUSE , but the latency problems do not go away. > > Of course , this is partially due to the consumer hardware, but as the > hardware has not changed I was hoping that the performance will remain as > is. > > So, do you expect 5.5 to perform less than 3.12 ? 
> > Some info: > Volume Name: engine > Type: Replicate > Volume ID: 30ca1cc2-f2f7-4749-9e2e-cee9d7099ded > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x (2 + 1) = 3 > Transport-type: tcp > Bricks: > Brick1: ovirt1:/gluster_bricks/engine/engine > Brick2: ovirt2:/gluster_bricks/engine/engine > Brick3: ovirt3:/gluster_bricks/engine/engine (arbiter) > Options Reconfigured: > performance.client-io-threads: off > nfs.disable: on > transport.address-family: inet > performance.quick-read: off > performance.read-ahead: off > performance.io-cache: off > performance.low-prio-threads: 32 > network.remote-dio: off > cluster.eager-lock: enable > cluster.quorum-type: auto > cluster.server-quorum-type: server > cluster.data-self-heal-algorithm: full > cluster.locking-scheme: granular > cluster.shd-max-threads: 8 > cluster.shd-wait-qlength: 10000 > features.shard: on > user.cifs: off > storage.owner-uid: 36 > storage.owner-gid: 36 > network.ping-timeout: 30 > performance.strict-o-direct: on > cluster.granular-entry-heal: enable > cluster.enable-shared-storage: enable > > Network: 1 gbit/s > > Filesystem:XFS > > Best Regards, > Strahil Nikolov > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.buitelaar at gmail.com Wed Apr 3 10:28:04 2019 From: olaf.buitelaar at gmail.com (Olaf Buitelaar) Date: Wed, 3 Apr 2019 12:28:04 +0200 Subject: [Gluster-users] [ovirt-users] Re: Announcing Gluster release 5.5 In-Reply-To: References: <20190328164716.27693.35887@mail.ovirt.org> Message-ID: Dear Mohit, Sorry i thought Krutika was referring to the ovirt-kube brick logs. due the large size (18MB compressed), i've placed the files here; https://edgecastcdn.net/0004FA/files/bricklogs.tar.bz2 Also i see i've attached the wrong files, i intended to attach profile_data4.txt | profile_data3.txt Sorry for the confusion. Thanks Olaf Op wo 3 apr. 2019 om 04:56 schreef Mohit Agrawal : > Hi Olaf, > > As per current attached "multi-glusterfsd-vol3.txt | > multi-glusterfsd-vol4.txt" it is showing multiple processes are running > for "ovirt-core ovirt-engine" brick names but there are no logs > available in bricklogs.zip specific to this bricks, bricklogs.zip > has a dump of ovirt-kube logs only > > Kindly share brick logs specific to the bricks "ovirt-core > ovirt-engine" and share glusterd logs also. > > Regards > Mohit Agrawal > > On Tue, Apr 2, 2019 at 9:18 PM Olaf Buitelaar > wrote: > >> Dear Krutika, >> >> 1. >> I've changed the volume settings, write performance seems to increased >> somewhat, however the profile doesn't really support that since latencies >> increased. However read performance has diminished, which does seem to be >> supported by the profile runs (attached). >> Also the IO does seem to behave more consistent than before. >> I don't really understand the idea behind them, maybe you can explain why >> these suggestions are good? >> These settings seems to avoid as much local caching and access as >> possible and push everything to the gluster processes. While i would expect >> local access and local caches are a good thing, since it would lead to >> having less network access or disk access. 
>> I tried to investigate these settings a bit more, and this is what I
>> understood of them;
>> - network.remote-dio: when on, it seems to ignore the O_DIRECT flag in
>> the client, thus causing the files to be cached and buffered in the page
>> cache on the client. I would expect this to be a good thing, especially
>> if the server process would access the same page cache?
>> At least that is what I grasp from this commit;
>> https://review.gluster.org/#/c/glusterfs/+/4206/2/xlators/protocol/client/src/client.c line
>> 867
>> Also found this commit;
>> https://github.com/gluster/glusterfs/commit/06c4ba589102bf92c58cd9fba5c60064bc7a504e#diff-938709e499b4383c3ed33c3979b9080c
>> suggesting remote-dio actually improves performance; not sure whether
>> it's a write or read benchmark.
>> When a file is opened with O_DIRECT it will also disable the write-behind
>> functionality.
>>
>> - performance.strict-o-direct: when on, the AFR will not ignore the
>> O_DIRECT flag, and will invoke fop_writev_stub with the wb_writev_helper,
>> which seems to stack the operation; no idea why that is. But generally I
>> suppose not ignoring the O_DIRECT flag in the AFR is a good thing when a
>> process requests O_DIRECT. So this makes sense to me.
>>
>> - cluster.choose-local: when off, it doesn't prefer the local node, but
>> would always choose a brick. Since it's a 9-node cluster with 3
>> subvolumes, only 1/3 could end up local, and the other 2/3 should be
>> pushed to external nodes anyway. Or am I making a totally wrong
>> assumption here?
>>
>> It seems to me this config is moving towards the gluster-block config
>> side of things, which does make sense.
>> Since we're running quite some mysql instances, which open their files
>> with O_DIRECT I believe, it would mean the only layer of cache is within
>> mysql itself. Which you could argue is a good thing. But I would expect
>> that a little write-behind buffering, and maybe some of the data cached
>> within gluster, would alleviate things a bit on gluster's side. But I
>> wouldn't know if that's the correct mindset, and so might be totally off
>> here.
>> Also I would expect these gluster v set commands to be online operations,
>> but somehow the bricks went down after applying these changes. What
>> appears to have happened is that after the update the brick process was
>> restarted, but due to the multiple-brick-process start issue, multiple
>> processes were started, and the brick didn't come online again.
>> However I'll try to reproduce this, since I would like to test with
>> cluster.choose-local: on, and see how performance compares. And
>> hopefully, when it occurs, collect some useful info.
>> Question; are network.remote-dio and performance.strict-o-direct mutually
>> exclusive settings, or can they both be on?
>>
>> 2. I've attached all brick logs; the only thing relevant I found was;
>> [2019-03-28 20:20:07.170452] I [MSGID: 113030]
>> [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix:
>> open-fd-key-status: 0 for
>> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
>> [2019-03-28 20:20:07.170491] I [MSGID: 113031]
>> [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr
>> status: 0 for
>> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
>> [2019-03-28 20:20:07.248480] I [MSGID: 113030]
>> [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix:
>> open-fd-key-status: 0 for
>> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
>> [2019-03-28 20:20:07.248491] I [MSGID: 113031]
>> [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr
>> status: 0 for
>> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886
>>
>> Thanks Olaf
>>
>> ps. sorry, I needed to resend since it exceeded the file limit
>>
>> On Mon, Apr 1, 2019 at 07:56, Krutika Dhananjay wrote:
>>
>>> Adding back gluster-users
>>> Comments inline ...
>>>
>>> On Fri, Mar 29, 2019 at 8:11 PM Olaf Buitelaar
>>> wrote:
>>>
>>>> Dear Krutika,
>>>>
>>>> 1. I've made 2 profile runs of around 10 minutes (see files
>>>> profile_data.txt and profile_data2.txt). Looking at it, most time seems
>>>> to be spent on the fops fsync and readdirp.
>>>>
>>>> Unfortunately I don't have the profile info for the 3.12.15 version, so
>>>> it's a bit hard to compare.
>>>>
>>>> One additional thing I do notice: on 1 machine (10.32.9.5) the iowait
>>>> time increased a lot, from an average below 1% it's now around 12%
>>>> after the upgrade.
>>>>
>>>> So the first suspicion would be that lightning strikes twice and I also
>>>> just now have a bad disk, but that doesn't appear to be the case, since
>>>> all SMART statuses report ok.
>>>>
>>>> Also dd shows performance I would more or less expect;
>>>>
>>>> dd if=/dev/zero of=/data/test_file bs=100M count=1 oflag=dsync
>>>> 1+0 records in
>>>> 1+0 records out
>>>> 104857600 bytes (105 MB) copied, 0.686088 s, 153 MB/s
>>>>
>>>> dd if=/dev/zero of=/data/test_file bs=1G count=1 oflag=dsync
>>>> 1+0 records in
>>>> 1+0 records out
>>>> 1073741824 bytes (1.1 GB) copied, 7.61138 s, 141 MB/s
>>>>
>>>> dd if=/dev/urandom of=/data/test_file bs=1024 count=1000000
>>>> 1000000+0 records in
>>>> 1000000+0 records out
>>>> 1024000000 bytes (1.0 GB) copied, 6.35051 s, 161 MB/s
>>>>
>>>> dd if=/dev/zero of=/data/test_file bs=1024 count=1000000
>>>> 1000000+0 records in
>>>> 1000000+0 records out
>>>> 1024000000 bytes (1.0 GB) copied, 1.6899 s, 606 MB/s
>>>>
>>>> When I disable this brick (service glusterd stop; pkill glusterfsd),
>>>> performance in gluster is better, but not on par with what it was. Also
>>>> the cpu usage on the 'neighbor' nodes, which host the other bricks in
>>>> the same subvolume, increases quite a lot in this case, which I wouldn't
>>>> expect actually, since they shouldn't handle much more work, except
>>>> flagging shards to heal. Iowait also goes to idle once gluster is
>>>> stopped, so it's for sure gluster which waits for io.
>>>>
>>>
>>> So I see that FSYNC %-latency is on the higher side. And I also noticed
>>> you don't have direct-io options enabled on the volume.
>>> Could you set the following options on the volume -
>>> # gluster volume set  network.remote-dio off
>>> # gluster volume set  performance.strict-o-direct on
>>> and also disable choose-local
>>> # gluster volume set  cluster.choose-local off
>>>
>>> let me know if this helps.
>>>
>>>> 2. I've attached the mnt log and volume info, but I couldn't find
>>>> anything relevant in those logs. I think this is because we run the
>>>> VMs with libgfapi;
>>>>
>>>> [root at ovirt-host-01 ~]# engine-config -g LibgfApiSupported
>>>> LibgfApiSupported: true version: 4.2
>>>> LibgfApiSupported: true version: 4.1
>>>> LibgfApiSupported: true version: 4.3
>>>>
>>>> And I can confirm the qemu process is invoked with the gluster://
>>>> address for the images.
>>>>
>>>> The message is logged in the /var/lib/libvirt/qemu/ file,
>>>> which I've also included. For a sample case see around 2019-03-28
>>>> 20:20:07, which has the error; E [MSGID: 133010]
>>>> [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on
>>>> shard 109886 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c
>>>> [Stale file handle]
>>>>
>>> Could you also attach the brick logs for this volume?
>>>
>>>> 3. Yes, I see multiple instances for the same brick directory, like;
>>>>
>>>> /usr/sbin/glusterfsd -s 10.32.9.6 --volfile-id
>>>> ovirt-core.10.32.9.6.data-gfs-bricks-brick1-ovirt-core -p
>>>> /var/run/gluster/vols/ovirt-core/10.32.9.6-data-gfs-bricks-brick1-ovirt-core.pid
>>>> -S /var/run/gluster/452591c9165945d9.socket --brick-name
>>>> /data/gfs/bricks/brick1/ovirt-core -l
>>>> /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log
>>>> --xlator-option *-posix.glusterd-uuid=fb513da6-f3bd-4571-b8a2-db5efaf60cc1
>>>> --process-name brick --brick-port 49154 --xlator-option
>>>> ovirt-core-server.listen-port=49154
>>>>
>>>> I've made an export of the output of ps from the time I observed these
>>>> multiple processes.
>>>>
>>>> In addition to the brick_mux bug as noted by Atin, I might also have
>>>> another possible cause: as ovirt moves nodes from non-operational state
>>>> or maintenance state to active/activating, it also seems to restart
>>>> gluster. However, I don't have direct proof for this theory.
>>>>
>>> +Atin Mukherjee ^^
>>> +Mohit Agrawal ^^
>>>
>>> -Krutika
>>>
>>>> Thanks Olaf
>>>>
>>>> On Fri, Mar 29, 2019 at 10:03, Sandro Bonazzola <
>>>> sbonazzo at redhat.com> wrote:
>>>>>
>>>>> On Thu, Mar 28, 2019 at 17:48, wrote:
>>>>>
>>>>>> Dear All,
>>>>>>
>>>>>> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While
>>>>>> previous upgrades from 4.1 to 4.2 etc. went rather smoothly, this one
>>>>>> was a different experience. After first trying a test upgrade on a
>>>>>> 3-node setup, which went fine, I headed to upgrade the 9-node
>>>>>> production platform, unaware of the backward-compatibility issues
>>>>>> between gluster 3.12.15 -> 5.3. After upgrading 2 nodes, the HA engine
>>>>>> stopped and wouldn't start. Vdsm wasn't able to mount the engine
>>>>>> storage domain, since /dom_md/metadata was missing or couldn't be
>>>>>> accessed. By restoring this file from a good copy on the underlying
>>>>>> bricks, removing the file from the underlying bricks where it was 0
>>>>>> bytes and marked with the sticky bit (and the corresponding gfid's),
>>>>>> removing the file from the mount point and copying the file back onto
>>>>>> the mount point,
>>>>>> manually mounting the engine domain, manually creating the
>>>>>> corresponding symbolic links in /rhev/data-center and
>>>>>> /var/run/vdsm/storage, and fixing the ownership back to vdsm.kvm
>>>>>> (which was root.root), I was able to start the HA engine again. Since
>>>>>> the engine was up again but things seemed rather unstable, I decided
>>>>>> to continue the upgrade on the other nodes; suspecting an
>>>>>> incompatibility in gluster versions, I thought it would be best to
>>>>>> have them all on the same version rather soonish. However, things went
>>>>>> from bad to worse: the engine stopped again, and all VMs stopped
>>>>>> working as well. So on a machine outside the setup I restored a backup
>>>>>> of the engine taken from version 4.2.8 just before the upgrade. With
>>>>>> this engine I was at least able to start some VMs again, and finalize
>>>>>> the upgrade. Once upgraded, things didn't stabilize, and I also lost 2
>>>>>> VMs during the process due to image corruption. After figuring out
>>>>>> that gluster 5.3 had quite some issues, I was lucky to see gluster 5.5
>>>>>> was about to be released; the moment the RPMs were available I
>>>>>> installed those. This helped a lot in terms of stability, for which
>>>>>> I'm very grateful! However the performance is unfortunately terrible;
>>>>>> it's about 15% of what the performance was running gluster 3.12.15.
>>>>>> It's strange, since a simple dd shows ok performance, but our actual
>>>>>> workload doesn't, while I would expect the performance to be better
>>>>>> due to all the improvements made since gluster version 3.12. Does
>>>>>> anybody share the same experience?
>>>>>> I really hope gluster 6 will soon be tested with ovirt and released,
>>>>>> and things start to perform and stabilize again, like in the good old
>>>>>> days. Of course, if I can do anything, I'm happy to help.
>>>>>
>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1693998 to track
>>>>> the rebase on Gluster 6.
>>>>>
>>>>>> I think this is the short list of issues we have after the migration;
>>>>>> Gluster 5.5;
>>>>>> - Poor performance for our workload (mostly write dependent)
>>>>>> - VMs randomly pause on unknown storage errors, which are
>>>>>> 'stale files'. Corresponding log; Lookup on shard 797 failed. Base file
>>>>>> gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle]
>>>>>> - Some files are listed twice in a directory (probably related to
>>>>>> the stale file issue?)
>>>>>> Example;
>>>>>> ls -la
>>>>>> /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/
>>>>>> total 3081
>>>>>> drwxr-x---.  2 vdsm kvm    4096 Mar 18 11:34 .
>>>>>> drwxr-xr-x. 13 vdsm kvm    4096 Mar 19 09:42 ..
>>>>>> -rw-rw----.  1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
>>>>>> -rw-rw----.  1 vdsm kvm 1048576 Mar 28 12:55 1a7cf259-6b29-421d-9688-b25dfaafb13c
>>>>>> -rw-rw----.  1 vdsm kvm 1048576 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease
>>>>>> -rw-r--r--.  1 vdsm kvm     290 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
>>>>>> -rw-r--r--.  1 vdsm kvm     290 Jan 27  2018 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
>>>>>>
>>>>>> - Brick processes sometimes start multiple times. Sometimes I have 5
>>>>>> brick processes for a single volume.
>>>>>> Killing all glusterfsd's for the volume on the machine and running
>>>>>> gluster v start force usually just starts one, and from then on things
>>>>>> look all right.
>>>>>>
>>>>> May I kindly ask you to open bugs on Gluster for the above issues at
>>>>> https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS ?
>>>>> Sahina?
>>>>>
>>>>>> Ovirt 4.3.2.1-1.el7
>>>>>> - All VM images' ownership is changed to root.root after the VM is
>>>>>> shut down, probably related to
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1666795, but not only
>>>>>> scoped to the HA engine. I'm still in compatibility mode 4.2 for the
>>>>>> cluster and for the VMs, but upgraded to ovirt version 4.3.2.
>>>>>
>>>>> Ryan?
>>>>>
>>>>>> - The network provider is set to OVN, which is fine, actually cool,
>>>>>> only the 'ovs-vswitchd' is a CPU hog and utilizes 100%.
>>>>>
>>>>> Miguel? Dominik?
>>>>>
>>>>>> - It seems on all nodes vdsm tries to get the stats for the HA
>>>>>> engine, which is filling the logs with (not sure if this is new);
>>>>>> [api.virt] FINISH getStats return={'status': {'message': "Virtual
>>>>>> machine does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}",
>>>>>> 'code': 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9
>>>>>> (api:54)
>>>>>
>>>>> Simone?
>>>>>
>>>>>> - It seems the package os_brick is missing; "[root] managedvolume not
>>>>>> supported: Managed Volume Not Supported. Missing package os-brick.:
>>>>>> ('Cannot import os_brick',) (caps:149)" fills the vdsm.log, but for
>>>>>> this I also saw another message, so I suspect this will already be
>>>>>> resolved shortly.
>>>>>> - The machine I used to run the backup HA engine doesn't want to get
>>>>>> removed from the hosted-engine --vm-status, not even after running
>>>>>> hosted-engine --clean-metadata --host-id=10 --force-clean or
>>>>>> hosted-engine --clean-metadata --force-clean from the machine itself.
>>>>>
>>>>> Simone?
>>>>>
>>>>>> Think that's about it.
>>>>>>
>>>>>> Don't get me wrong, I don't want to rant, I just wanted to share my
>>>>>> experience and see where things can be made better.
>>>>>
>>>>> If not already done, can you please open bugs for the above issues at
>>>>> https://bugzilla.redhat.com/enter_bug.cgi?classification=oVirt ?
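On the multiple-brick-process issue above, a rough way to spot such
duplicates is to group the running glusterfsd processes by their
--brick-name argument, for example:

# ps -C glusterfsd -o args= | grep -o -- '--brick-name [^ ]*' | sort | uniq -c

A count higher than 1 for any brick path means that brick is being served
by duplicate processes.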
>>>>> >>>>> >>>>>> >>>>>> >>>>>> Best Olaf >>>>>> _______________________________________________ >>>>>> Users mailing list -- users at ovirt.org >>>>>> To unsubscribe send an email to users-leave at ovirt.org >>>>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >>>>>> oVirt Code of Conduct: >>>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>>> List Archives: >>>>>> https://lists.ovirt.org/archives/list/users at ovirt.org/message/3CO35Q7VZMWNHS4LPUJNO7S47MGLSKS5/ >>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> SANDRO BONAZZOLA >>>>> >>>>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV >>>>> >>>>> Red Hat EMEA >>>>> >>>>> sbonazzo at redhat.com >>>>> >>>>> >>>> _______________________________________________ >>>> Users mailing list -- users at ovirt.org >>>> To unsubscribe send an email to users-leave at ovirt.org >>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >>>> oVirt Code of Conduct: >>>> https://www.ovirt.org/community/about/community-guidelines/ >>>> List Archives: >>>> https://lists.ovirt.org/archives/list/users at ovirt.org/message/HAGTA64LF7LLE6YMHQ6DLT26MD2GZ2PK/ >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- Brick: 10.32.9.9:/data0/gfs/bricks/brick1/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 11 200 9 No. of Writes: 2 31538 326701 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 22 319528 527228 No. of Writes: 53880 1409021 1140345 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 27747 3229 120201 No. of Writes: 479690 114939 144204 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 209766 7725 43 No. of Writes: 105320 165416 8915 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 6728 RELEASE 0.00 0.00 us 0.00 us 0.00 us 42179 RELEASEDIR 0.01 44.17 us 1.07 us 1288.76 us 2914 OPENDIR 0.02 697.13 us 42.15 us 5689.21 us 322 OPEN 0.02 411.08 us 8.60 us 5405.25 us 606 GETXATTR 0.02 1209.66 us 147.78 us 3219.56 us 234 READDIRP 0.03 38.80 us 19.08 us 7544.91 us 7757 STATFS 0.04 826.28 us 13.79 us 3583.18 us 616 READDIR 0.07 61.83 us 15.94 us 131142.59 us 13989 FSTAT 2.03 137.78 us 48.36 us 235353.97 us 172712 FXATTROP 2.16 983.89 us 10.19 us 660025.30 us 25674 LOOKUP 2.90 406.99 us 36.68 us 756289.17 us 83397 FSYNC 4.63 67941.30 us 13.93 us 1840271.15 us 798 INODELK 7.81 576.74 us 75.16 us 422586.52 us 158680 WRITE 40.09 2713.33 us 11.70 us 1850709.72 us 173111 FINODELK 40.16 3587.78 us 72.64 us 729965.74 us 131143 READ Duration: 58768 seconds Data Read: 45226370705 bytes Data Written: 133611506006 bytes Interval 9 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 394 387 86 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 141 1093 13 No. of Writes: 5905 10055 2308 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 15 515 2595 No. of Writes: 763 1465 1637 Block Size: 262144b+ 524288b+ No. of Reads: 2 0 No. of Writes: 2759 73 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 70 RELEASE 0.00 0.00 us 0.00 us 0.00 us 503 RELEASEDIR 0.00 172.94 us 46.56 us 620.23 us 70 OPEN 0.01 153.42 us 11.38 us 855.47 us 111 GETXATTR 0.01 49.63 us 1.23 us 1288.76 us 503 OPENDIR 0.01 434.10 us 27.72 us 2015.25 us 88 READDIR 0.02 1208.37 us 152.54 us 2434.77 us 46 READDIRP 0.02 43.13 us 20.02 us 2030.66 us 1361 STATFS 0.04 45.66 us 18.57 us 284.28 us 2431 FSTAT 1.20 154.41 us 75.97 us 84525.06 us 23005 FXATTROP 2.86 1865.08 us 14.26 us 212498.60 us 4518 LOOKUP 3.78 1006.27 us 38.86 us 756289.17 us 11072 FSYNC 4.27 60261.87 us 17.32 us 1437527.90 us 209 INODELK 8.19 935.38 us 76.82 us 422586.52 us 25832 WRITE 20.67 13949.32 us 89.67 us 707765.19 us 4374 READ 58.93 7494.13 us 12.88 us 1607033.18 us 23206 FINODELK Duration: 740 seconds Data Read: 385507328 bytes Data Written: 1776420864 bytes Brick: 10.32.9.5:/data/gfs/bricks/brick1/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 2 458 87 No. of Writes: 3 4507 33740 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 54 34013 110867 No. of Writes: 6056 341153 234627 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 7430 587 28255 No. of Writes: 70451 12767 34177 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 2417 6164 15 No. of Writes: 40925 27615 4342 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 49432 RELEASE 0.00 0.00 us 0.00 us 0.00 us 40899 RELEASEDIR 0.00 158.97 us 158.97 us 158.97 us 1 MKNOD 0.00 393.70 us 9.69 us 2344.17 us 8 ENTRYLK 0.00 129.60 us 10.54 us 296.61 us 112 READDIR 0.00 1299.61 us 6.43 us 155911.98 us 125 GETXATTR 0.00 3928.24 us 139.26 us 240788.91 us 236 READDIRP 0.03 3784.28 us 15.61 us 469284.63 us 1686 FSTAT 0.04 2368.24 us 28.06 us 242169.67 us 3623 OPEN 0.05 2811.93 us 8.13 us 1250845.84 us 3381 FLUSH 0.06 4385.28 us 0.80 us 527903.92 us 2653 OPENDIR 0.09 2315.69 us 11.48 us 816339.95 us 7750 STATFS 0.18 55337.88 us 8.34 us 1543417.83 us 648 INODELK 0.37 1462.23 us 6.84 us 1127299.99 us 49902 FINODELK 0.57 3924.78 us 11.60 us 968588.21 us 28256 LOOKUP 1.91 7500.40 us 53.88 us 2738720.92 us 49870 FXATTROP 2.21 30153.49 us 63.31 us 3473303.89 us 14319 READ 14.57 110289.45 us 122.19 us 3055911.44 us 25864 FSYNC 79.91 262383.20 us 98.78 us 4500846.60 us 59632 WRITE Duration: 60363 seconds Data Read: 6417030998 bytes Data Written: 27570997546 bytes Interval 9 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 59 2334 441 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 13 331 7 No. of Writes: 4519 1752 790 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 0 145 0 No. of Writes: 84 399 151 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. of Writes: 214 31 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 615 RELEASE 0.00 0.00 us 0.00 us 0.00 us 467 RELEASEDIR 0.00 45.52 us 8.46 us 78.57 us 25 GETXATTR 0.00 144.98 us 13.50 us 296.61 us 16 READDIR 0.01 7894.70 us 180.57 us 240788.91 us 46 READDIRP 0.03 1404.57 us 1.00 us 94678.96 us 467 OPENDIR 0.06 4985.90 us 17.46 us 403210.36 us 294 FSTAT 0.06 2453.17 us 35.05 us 242169.67 us 615 OPEN 0.10 3976.83 us 9.70 us 1250845.84 us 591 FLUSH 0.10 33579.24 us 10.59 us 937670.52 us 73 INODELK 0.12 2132.57 us 14.22 us 816339.95 us 1361 STATFS 0.29 617.29 us 8.19 us 164742.40 us 11477 FINODELK 0.69 3379.94 us 17.79 us 622513.08 us 5053 LOOKUP 0.84 42003.14 us 160.61 us 1495939.39 us 496 READ 1.66 3575.85 us 68.64 us 1688509.25 us 11476 FXATTROP 22.52 95429.52 us 126.22 us 3055911.44 us 5823 FSYNC 73.52 168379.58 us 110.01 us 4058537.96 us 10773 WRITE Duration: 740 seconds Data Read: 12386304 bytes Data Written: 217700864 bytes Brick: 10.32.9.6:/data/gfs/bricks/bricka/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 789986 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 49432 RELEASE 0.00 0.00 us 0.00 us 0.00 us 40938 RELEASEDIR 0.00 21.03 us 16.15 us 34.88 us 8 ENTRYLK 0.00 270.94 us 270.94 us 270.94 us 1 MKNOD 0.01 261.74 us 11.55 us 9174.66 us 116 GETXATTR 0.01 297.48 us 13.73 us 2466.13 us 112 READDIR 0.07 64.73 us 15.29 us 4946.30 us 3382 FLUSH 0.07 82.72 us 1.51 us 4642.85 us 2661 OPENDIR 0.22 193.05 us 39.92 us 64374.98 us 3624 OPEN 0.25 1255.82 us 14.35 us 63381.45 us 648 INODELK 0.89 57.44 us 10.33 us 8940.33 us 50009 FINODELK 1.44 77.62 us 15.84 us 31914.28 us 59679 WRITE 2.59 294.62 us 15.52 us 115626.36 us 28267 LOOKUP 3.49 224.71 us 77.62 us 98174.30 us 49948 FXATTROP 90.95 11273.35 us 78.67 us 453079.55 us 25908 FSYNC Duration: 60366 seconds Data Read: 0 bytes Data Written: 789986 bytes Interval 9 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 10774 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 615 RELEASE 0.00 0.00 us 0.00 us 0.00 us 467 RELEASEDIR 0.00 85.11 us 12.25 us 434.77 us 14 GETXATTR 0.00 42.29 us 17.37 us 500.84 us 73 INODELK 0.00 205.10 us 15.46 us 509.24 us 16 READDIR 0.04 57.10 us 15.29 us 1829.07 us 591 FLUSH 0.05 79.87 us 1.78 us 1854.43 us 467 OPENDIR 0.11 144.84 us 45.31 us 17419.60 us 615 OPEN 0.79 55.64 us 13.17 us 8940.33 us 11478 FINODELK 0.93 69.84 us 16.64 us 6779.39 us 10774 WRITE 1.79 286.71 us 16.91 us 24721.64 us 5053 LOOKUP 3.09 218.15 us 81.40 us 54774.50 us 11476 FXATTROP 93.19 12944.68 us 111.22 us 453079.55 us 5825 FSYNC Duration: 740 seconds Data Read: 0 bytes Data Written: 10774 bytes Brick: 10.32.9.4:/data/gfs/bricks/brick1/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 52412 6 0 No. of Writes: 3 4504 33731 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 0 66342 53000 No. of Writes: 6056 340041 234374 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 2686 356 12946 No. of Writes: 70264 12678 34177 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 602 3108 3 No. of Writes: 20547 27615 4342 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 49333 RELEASE 0.00 0.00 us 0.00 us 0.00 us 40379 RELEASEDIR 0.00 22.96 us 15.97 us 51.06 us 8 ENTRYLK 0.00 260.50 us 260.50 us 260.50 us 1 MKNOD 0.00 77.42 us 16.00 us 2701.10 us 648 INODELK 0.00 144.50 us 7.61 us 1702.83 us 428 GETXATTR 0.01 285.11 us 12.41 us 3140.45 us 406 READDIR 0.01 51.02 us 13.08 us 46431.20 us 3384 FLUSH 0.01 65.58 us 0.94 us 17715.27 us 2808 OPENDIR 0.01 80.40 us 11.70 us 19019.90 us 2445 STAT 0.03 118.76 us 40.23 us 33323.48 us 3626 OPEN 0.03 57.81 us 15.15 us 27740.94 us 7757 STATFS 0.04 197.89 us 119.38 us 17249.34 us 2481 READDIRP 0.36 99.03 us 11.12 us 301165.99 us 49989 FINODELK 1.08 526.62 us 13.29 us 263413.97 us 28422 LOOKUP 1.23 341.69 us 71.48 us 563688.45 us 49950 FXATTROP 7.47 3998.57 us 82.10 us 469183.97 us 25947 READ 35.02 18777.53 us 92.85 us 483169.32 us 25908 FSYNC 54.69 12727.00 us 149.97 us 759284.50 us 59684 WRITE Duration: 58519 seconds Data Read: 3261956842 bytes Data Written: 24886890282 bytes Interval 9 Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 665 0 0 No. of Writes: 0 59 2334 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 0 7 103 No. of Writes: 441 4519 1752 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 2 3 17 No. of Writes: 790 84 399 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 0 0 0 No. of Writes: 151 214 31 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 615 RELEASE 0.00 0.00 us 0.00 us 0.00 us 488 RELEASEDIR 0.00 39.06 us 18.17 us 225.32 us 73 INODELK 0.00 76.36 us 9.68 us 297.33 us 46 GETXATTR 0.01 222.65 us 21.02 us 593.11 us 58 READDIR 0.01 45.45 us 12.62 us 289.83 us 417 STAT 0.01 36.42 us 13.83 us 918.21 us 591 FLUSH 0.01 53.38 us 0.99 us 474.56 us 488 OPENDIR 0.02 87.38 us 40.23 us 5527.50 us 615 OPEN 0.03 49.03 us 18.83 us 3866.60 us 1361 STATFS 0.04 189.02 us 122.54 us 990.42 us 435 READDIRP 0.33 63.39 us 13.25 us 128981.30 us 11477 FINODELK 0.74 321.28 us 13.43 us 37963.21 us 5074 LOOKUP 0.98 186.47 us 80.20 us 43834.36 us 11476 FXATTROP 2.30 6321.31 us 154.25 us 110020.47 us 797 READ 39.43 8011.27 us 168.50 us 404368.45 us 10774 WRITE 56.08 21071.37 us 152.03 us 325318.37 us 5826 FSYNC Duration: 740 seconds Data Read: 2502580 bytes Data Written: 217700864 bytes Brick: 10.32.9.8:/data/gfs/bricks/bricka/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 2836841 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 3501 RELEASE 0.00 0.00 us 0.00 us 0.00 us 39501 RELEASEDIR 0.00 23.26 us 15.42 us 47.19 us 110 FLUSH 0.01 85.78 us 57.99 us 148.82 us 124 REMOVEXATTR 0.01 90.83 us 67.34 us 158.33 us 124 SETATTR 0.01 67.66 us 10.50 us 368.03 us 194 GETXATTR 0.02 84.14 us 41.98 us 499.27 us 250 OPEN 0.04 36.99 us 12.62 us 398.57 us 944 INODELK 0.06 280.66 us 14.10 us 1296.89 us 197 READDIR 0.14 49.60 us 1.25 us 911.95 us 2704 OPENDIR 8.05 27.51 us 11.45 us 86619.65 us 270887 FINODELK 8.60 50.28 us 14.57 us 117405.17 us 158241 WRITE 22.34 810.95 us 15.51 us 136924.46 us 25499 LOOKUP 26.84 184.15 us 32.55 us 187376.40 us 134874 FSYNC 33.87 115.65 us 48.10 us 68557.92 us 271003 FXATTROP Duration: 59079 seconds Data Read: 0 bytes Data Written: 2836841 bytes Interval 9 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 25110 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 84 RELEASE 0.00 0.00 us 0.00 us 0.00 us 482 RELEASEDIR 0.00 22.68 us 15.95 us 30.51 us 22 FLUSH 0.02 81.94 us 66.17 us 121.51 us 24 REMOVEXATTR 0.02 92.75 us 73.22 us 158.33 us 24 SETATTR 0.02 50.76 us 10.50 us 198.71 us 52 GETXATTR 0.06 88.22 us 47.47 us 347.92 us 84 OPEN 0.07 200.12 us 17.43 us 366.16 us 46 READDIR 0.10 43.88 us 12.88 us 398.57 us 298 INODELK 0.17 46.60 us 1.30 us 95.78 us 482 OPENDIR 6.89 35.71 us 14.98 us 8325.56 us 25110 WRITE 8.62 243.97 us 17.27 us 13438.88 us 4599 LOOKUP 9.62 26.97 us 12.23 us 10471.02 us 46438 FINODELK 32.58 183.27 us 33.33 us 182520.02 us 23144 FSYNC 41.83 117.30 us 57.85 us 1991.12 us 46424 FXATTROP Duration: 740 seconds Data Read: 0 bytes Data Written: 25110 bytes Brick: 10.32.9.8:/data0/gfs/bricks/brick1/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 8 1097 109 No. of Writes: 0 2901 273197 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 115 238693 440909 No. of Writes: 36872 1361504 875644 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 37900 3346 141710 No. of Writes: 293109 93776 162079 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 3889 7281 33 No. of Writes: 161749 236364 7941 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 1 FORGET 0.00 0.00 us 0.00 us 0.00 us 3720 RELEASE 0.00 0.00 us 0.00 us 0.00 us 39522 RELEASEDIR 0.00 46.19 us 10.83 us 328.71 us 167 GETXATTR 0.00 107.36 us 42.28 us 762.18 us 195 OPEN 0.01 203.17 us 12.21 us 864.86 us 197 READDIR 0.03 43.02 us 1.32 us 452.74 us 2704 OPENDIR 0.06 2113.84 us 1920.14 us 2569.20 us 124 READDIRP 0.06 36.11 us 17.79 us 347.13 us 7757 STATFS 0.09 35.61 us 16.14 us 340.33 us 11844 FSTAT 0.73 27.99 us 11.02 us 73986.88 us 118371 FINODELK 1.77 136.85 us 37.39 us 121066.77 us 58862 FSYNC 1.88 346.99 us 15.01 us 77684.23 us 24658 LOOKUP 3.34 128.87 us 55.07 us 45501.15 us 118386 FXATTROP 5.55 52717.08 us 16.10 us 2004661.60 us 480 INODELK 9.40 234.45 us 75.18 us 172924.48 us 182886 WRITE 77.09 3911.50 us 74.71 us 427304.61 us 89909 READ Duration: 59079 seconds Data Read: 18550783716 bytes Data Written: 169056832000 bytes Interval 9 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 28 1201 202 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 88 1012 13 No. of Writes: 11370 7637 1887 Block Size: 32768b+ 65536b+ 131072b+ No. 
of Reads: 13 723 0 No. of Writes: 518 690 1562 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. of Writes: 2221 50 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 21 RELEASE 0.00 0.00 us 0.00 us 0.00 us 473 RELEASEDIR 0.00 63.17 us 11.55 us 328.71 us 24 GETXATTR 0.00 85.37 us 47.47 us 218.02 us 21 OPEN 0.01 191.44 us 20.33 us 506.91 us 28 READDIR 0.05 42.27 us 1.32 us 103.43 us 473 OPENDIR 0.12 35.99 us 17.79 us 312.67 us 1361 STATFS 0.13 2137.12 us 1993.74 us 2312.10 us 24 READDIRP 0.19 36.56 us 16.68 us 182.58 us 2058 FSTAT 1.46 28.28 us 11.78 us 4920.38 us 20656 FINODELK 3.09 283.99 us 16.10 us 77684.23 us 4368 LOOKUP 3.43 134.79 us 38.66 us 46317.56 us 10211 FSYNC 6.69 129.92 us 55.07 us 1519.44 us 20670 FXATTROP 15.38 225.45 us 75.18 us 166890.53 us 27366 WRITE 18.50 114198.06 us 16.67 us 2004661.60 us 65 INODELK 50.94 11055.19 us 133.17 us 355082.08 us 1849 READ Duration: 740 seconds Data Read: 57180160 bytes Data Written: 1466518016 bytes Brick: 10.32.9.7:/data/gfs/bricks/brick1/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 8 146 10 No. of Writes: 0 5640 191078 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 12 275838 263894 No. of Writes: 29947 1275560 712585 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 17182 2303 69829 No. of Writes: 286032 45424 94648 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 2287 4034 20 No. of Writes: 88659 100478 6790 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 3501 RELEASE 0.00 0.00 us 0.00 us 0.00 us 40057 RELEASEDIR 0.00 20.98 us 14.68 us 31.79 us 110 FLUSH 0.00 82.04 us 58.17 us 510.84 us 124 REMOVEXATTR 0.00 86.21 us 62.72 us 156.44 us 124 SETATTR 0.00 268.84 us 6.67 us 2674.39 us 259 GETXATTR 0.00 321.60 us 41.89 us 3004.61 us 250 OPEN 0.01 475.40 us 14.32 us 1787.62 us 281 READDIR 0.01 1249.13 us 26.42 us 3832.88 us 234 READDIRP 0.05 178.77 us 16.13 us 351764.07 us 7757 STATFS 0.09 822.57 us 1.17 us 1068559.38 us 2746 OPENDIR 0.12 292.98 us 25.31 us 1365160.22 us 10719 FSTAT 0.90 25010.58 us 13.33 us 887933.44 us 941 INODELK 1.38 133.53 us 11.10 us 3938189.55 us 270885 FINODELK 1.68 162.21 us 45.86 us 2503412.21 us 271003 FXATTROP 2.03 394.43 us 31.20 us 756176.42 us 134874 FSYNC 12.66 2092.21 us 72.21 us 4245933.36 us 158241 WRITE 14.29 14633.42 us 10.75 us 4031333.55 us 25543 LOOKUP 66.78 18236.84 us 69.72 us 6429153.50 us 95797 READ Duration: 59031 seconds Data Read: 10396155504 bytes Data Written: 84404067328 bytes Interval 9 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 64 279 45 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 172 529 10 No. of Writes: 14494 5264 1377 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 5 351 0 No. of Writes: 316 548 1064 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. of Writes: 1620 39 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 84 RELEASE 0.00 0.00 us 0.00 us 0.00 us 494 RELEASEDIR 0.00 22.20 us 17.97 us 31.79 us 22 FLUSH 0.00 81.91 us 68.21 us 101.65 us 24 REMOVEXATTR 0.00 89.38 us 72.10 us 128.30 us 24 SETATTR 0.01 43.26 us 1.29 us 153.44 us 494 OPENDIR 0.01 427.33 us 15.79 us 1319.22 us 70 READDIR 0.02 33.36 us 19.57 us 258.67 us 1361 STATFS 0.02 561.73 us 9.55 us 2674.39 us 86 GETXATTR 0.02 1283.75 us 28.25 us 3720.17 us 46 READDIRP 0.03 790.46 us 41.89 us 3004.61 us 84 OPEN 0.03 42.33 us 25.31 us 308.87 us 1862 FSTAT 0.53 27.26 us 11.92 us 22788.88 us 46436 FINODELK 0.61 316.49 us 10.75 us 100131.38 us 4611 LOOKUP 2.25 231.96 us 37.46 us 540421.26 us 23144 FSYNC 2.97 152.85 us 51.13 us 541193.18 us 46424 FXATTROP 4.89 39428.07 us 14.19 us 881646.99 us 296 INODELK 18.93 1799.80 us 72.85 us 3118991.86 us 25110 WRITE 69.67 155845.21 us 130.17 us 4885484.01 us 1067 READ Duration: 740 seconds Data Read: 28553216 bytes Data Written: 1032566784 bytes Brick: 10.32.9.21:/data/gfs/bricks/bricka/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 3513729 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 1 FORGET 0.00 0.00 us 0.00 us 0.00 us 4089 RELEASE 0.00 0.00 us 0.00 us 0.00 us 44272 RELEASEDIR 0.18 42.97 us 1.09 us 1138.82 us 2893 OPENDIR 0.42 611.92 us 15.33 us 7388.53 us 479 INODELK 0.56 679.34 us 15.23 us 2483.07 us 574 READDIR 0.61 2170.81 us 48.25 us 13138.61 us 195 OPEN 0.82 1066.87 us 8.75 us 13801.35 us 535 GETXATTR 4.89 28.84 us 10.98 us 51214.69 us 118373 FINODELK 9.18 35.04 us 14.64 us 81798.39 us 182886 WRITE 18.04 506.78 us 12.31 us 165781.70 us 24847 LOOKUP 22.07 130.09 us 53.80 us 40959.22 us 118386 FXATTROP 43.23 512.45 us 38.18 us 285202.84 us 58862 FSYNC Duration: 60363 seconds Data Read: 0 bytes Data Written: 3513729 bytes Interval 9 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 27366 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 21 RELEASE 0.00 0.00 us 0.00 us 0.00 us 500 RELEASEDIR 0.02 82.18 us 48.25 us 161.40 us 21 OPEN 0.02 41.30 us 16.13 us 156.19 us 65 INODELK 0.08 136.77 us 13.02 us 909.03 us 66 GETXATTR 0.20 43.34 us 1.27 us 277.72 us 500 OPENDIR 0.33 428.51 us 15.42 us 1215.07 us 82 READDIR 5.70 29.84 us 11.88 us 2949.68 us 20656 FINODELK 9.15 36.14 us 14.64 us 4606.43 us 27366 WRITE 11.51 283.22 us 12.31 us 53047.02 us 4395 LOOKUP 26.02 136.12 us 53.80 us 40959.22 us 20670 FXATTROP 46.96 497.27 us 40.09 us 274185.32 us 10211 FSYNC Duration: 740 seconds Data Read: 0 bytes Data Written: 27366 bytes Brick: 10.32.9.21:/data0/gfs/bricks/brick1/ovirt-data ----------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 11 66 2 No. of Writes: 2 31826 326701 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 4 116933 302613 No. of Writes: 53880 1410548 1140566 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 11546 1749 76356 No. of Writes: 479855 114971 144225 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 1906 4093 24 No. of Writes: 105312 165416 8915 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 6820 RELEASE 0.00 0.00 us 0.00 us 0.00 us 44299 RELEASEDIR 0.03 42.29 us 13.86 us 9356.79 us 2425 STAT 0.04 46.94 us 1.04 us 772.13 us 2893 OPENDIR 0.04 314.86 us 10.29 us 2391.56 us 479 GETXATTR 0.05 207.42 us 14.29 us 5815.42 us 799 INODELK 0.05 51.40 us 28.18 us 540.78 us 3666 FSTAT 0.09 39.82 us 18.62 us 9889.16 us 7757 STATFS 0.12 1358.64 us 43.37 us 233429.90 us 322 OPEN 0.14 851.72 us 15.78 us 4414.06 us 574 READDIR 0.16 224.28 us 143.72 us 3249.69 us 2482 READDIRP 1.46 30.22 us 10.80 us 110711.59 us 173110 FINODELK 4.32 601.98 us 15.19 us 91847.23 us 25659 LOOKUP 6.30 130.39 us 49.28 us 232146.00 us 172711 FXATTROP 8.84 378.85 us 35.58 us 430356.59 us 83395 FSYNC 23.35 525.84 us 73.80 us 494782.91 us 158694 WRITE 55.01 4360.84 us 78.00 us 503162.29 us 45075 READ Duration: 60363 seconds Data Read: 10404548654 bytes Data Written: 133624068438 bytes Interval 9 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 394 387 86 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 31 563 12 No. of Writes: 5905 10055 2308 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 7 368 1 No. of Writes: 763 1465 1637 Block Size: 262144b+ 524288b+ No. of Reads: 1 0 No. of Writes: 2759 73 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 70 RELEASE 0.00 0.00 us 0.00 us 0.00 us 500 RELEASEDIR 0.02 42.46 us 14.29 us 589.73 us 210 INODELK 0.03 34.11 us 20.98 us 177.92 us 413 STAT 0.04 298.66 us 10.97 us 1498.98 us 63 GETXATTR 0.06 48.48 us 1.17 us 120.46 us 500 OPENDIR 0.08 52.55 us 31.44 us 108.17 us 637 FSTAT 0.12 37.70 us 18.62 us 406.46 us 1361 STATFS 0.17 871.79 us 16.50 us 2028.77 us 82 READDIR 0.23 227.43 us 145.28 us 3249.69 us 435 READDIRP 0.56 3427.01 us 43.37 us 233429.90 us 70 OPEN 1.62 30.00 us 11.97 us 3260.32 us 23206 FINODELK 8.55 159.50 us 72.57 us 232146.00 us 23005 FXATTROP 8.94 849.78 us 16.81 us 91847.23 us 4515 LOOKUP 24.77 959.75 us 38.69 us 430356.59 us 11072 FSYNC 24.89 10859.72 us 153.99 us 250511.54 us 983 READ 29.91 496.71 us 73.80 us 453768.34 us 25832 WRITE Duration: 740 seconds Data Read: 30142464 bytes Data Written: 1776420864 bytes Brick: 10.32.9.20:/data/gfs/bricks/bricka/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 3979583 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 6630 RELEASE 0.00 0.00 us 0.00 us 0.00 us 36900 RELEASEDIR 0.01 57.48 us 12.65 us 329.66 us 84 GETXATTR 0.02 193.66 us 14.90 us 878.82 us 70 READDIR 0.05 105.94 us 41.17 us 683.04 us 322 OPEN 0.06 54.19 us 15.72 us 5135.95 us 800 INODELK 0.19 54.37 us 1.60 us 1035.48 us 2641 OPENDIR 7.68 32.94 us 11.38 us 68417.92 us 173114 FINODELK 9.52 44.57 us 14.70 us 55440.51 us 158694 WRITE 24.39 712.95 us 16.53 us 280142.79 us 25407 LOOKUP 27.40 243.98 us 34.94 us 251521.50 us 83395 FSYNC 30.68 131.93 us 50.81 us 55731.00 us 172711 FXATTROP Duration: 57920 seconds Data Read: 0 bytes Data Written: 3979583 bytes Interval 9 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 25832 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 70 RELEASE 0.00 0.00 us 0.00 us 0.00 us 464 RELEASEDIR 0.01 62.34 us 15.20 us 116.82 us 17 GETXATTR 0.02 199.99 us 16.79 us 778.14 us 10 READDIR 0.10 114.77 us 46.50 us 683.04 us 70 OPEN 0.22 86.24 us 16.71 us 5135.95 us 211 INODELK 0.32 56.51 us 1.98 us 1035.48 us 464 OPENDIR 8.88 31.82 us 12.28 us 7988.05 us 23206 FINODELK 11.60 37.32 us 15.06 us 2981.61 us 25832 WRITE 12.08 224.23 us 20.07 us 39256.75 us 4479 LOOKUP 28.45 213.58 us 40.22 us 94343.80 us 11072 FSYNC 38.31 138.39 us 69.94 us 3069.85 us 23005 FXATTROP Duration: 740 seconds Data Read: 0 bytes Data Written: 25832 bytes Brick: 10.32.9.20:/data0/gfs/bricks/brick1/ovirt-data ----------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 4 109 21 No. of Writes: 0 2901 273197 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 42 112256 166124 No. of Writes: 36872 1361504 875644 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 12144 1370 69995 No. of Writes: 293109 93776 162079 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 1466 2942 9 No. of Writes: 161749 236364 7941 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 3097 RELEASE 0.00 0.00 us 0.00 us 0.00 us 36900 RELEASEDIR 0.00 53.86 us 12.74 us 546.30 us 60 GETXATTR 0.00 211.97 us 16.62 us 988.45 us 70 READDIR 0.00 101.31 us 39.98 us 497.56 us 195 OPEN 0.00 46.92 us 14.96 us 3469.09 us 481 INODELK 0.01 52.86 us 1.65 us 1573.96 us 2641 OPENDIR 0.03 216.49 us 152.23 us 2562.55 us 2482 READDIRP 0.03 73.86 us 18.77 us 125905.96 us 7757 STATFS 0.06 111.91 us 16.04 us 655589.61 us 10152 FSTAT 0.07 542.53 us 12.89 us 523421.21 us 2425 STAT 1.10 803.43 us 18.50 us 1534952.31 us 24595 LOOKUP 1.16 177.03 us 11.27 us 1749236.34 us 118375 FINODELK 1.44 218.66 us 58.80 us 1784231.76 us 118390 FXATTROP 13.72 4194.48 us 39.91 us 2743546.94 us 58865 FSYNC 36.03 14004.46 us 79.14 us 2966713.52 us 46303 READ 46.33 4558.98 us 77.68 us 2638579.30 us 182887 WRITE Duration: 57920 seconds Data Read: 8237195368 bytes Data Written: 169056832000 bytes Interval 9 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 28 1201 202 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 131 256 14 No. of Writes: 11370 7637 1887 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 2 277 4 No. of Writes: 518 690 1562 Block Size: 262144b+ 524288b+ No. of Reads: 2 0 No. of Writes: 2221 50 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 21 RELEASE 0.00 0.00 us 0.00 us 0.00 us 464 RELEASEDIR 0.00 47.44 us 13.63 us 73.33 us 12 GETXATTR 0.00 212.66 us 19.17 us 803.07 us 10 READDIR 0.00 118.97 us 59.03 us 352.73 us 21 OPEN 0.00 45.59 us 20.23 us 248.98 us 65 INODELK 0.01 56.79 us 1.86 us 1008.29 us 464 OPENDIR 0.03 48.42 us 20.11 us 1484.22 us 1764 FSTAT 0.04 214.84 us 152.23 us 558.38 us 435 READDIRP 0.06 113.06 us 19.98 us 96371.03 us 1361 STATFS 0.07 470.41 us 15.68 us 177048.08 us 413 STAT 0.61 369.50 us 21.75 us 98568.83 us 4359 LOOKUP 0.84 107.63 us 11.27 us 422960.68 us 20656 FINODELK 1.50 191.62 us 58.80 us 669097.82 us 20670 FXATTROP 15.43 59246.36 us 108.59 us 2819788.24 us 686 READ 15.75 4062.95 us 40.23 us 1993844.42 us 10211 FSYNC 65.65 6319.81 us 80.06 us 2441596.43 us 27366 WRITE Duration: 740 seconds Data Read: 22843392 bytes Data Written: 1466518016 bytes Brick: 10.32.9.3:/data/gfs/bricks/brick3/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 4 131 8 No. of Writes: 0 5640 191078 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 16 81419 161095 No. of Writes: 29947 1275560 712585 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 17291 1342 103864 No. of Writes: 286032 45424 94648 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 1022 1870 19 No. of Writes: 88659 100478 6790 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 3665 RELEASE 0.00 0.00 us 0.00 us 0.00 us 40654 RELEASEDIR 0.00 77.86 us 14.39 us 255.33 us 58 GETXATTR 0.00 55.93 us 18.71 us 185.55 us 110 FLUSH 0.00 148.39 us 76.71 us 306.67 us 124 REMOVEXATTR 0.00 156.41 us 94.69 us 237.19 us 124 SETATTR 0.00 282.94 us 26.64 us 1991.99 us 72 READDIR 0.01 249.61 us 62.86 us 1608.99 us 250 OPEN 0.02 108.18 us 23.05 us 1656.32 us 942 INODELK 0.03 73.13 us 23.73 us 337.63 us 2425 STAT 0.04 93.83 us 1.99 us 17101.96 us 2641 OPENDIR 0.10 78.77 us 24.79 us 1132.37 us 7757 STATFS 0.14 108.18 us 41.73 us 4078.40 us 7332 FSTAT 0.18 262.35 us 91.17 us 5148.11 us 3890 READDIRP 2.78 59.76 us 13.90 us 60015.05 us 270884 FINODELK 3.23 739.71 us 25.36 us 119501.01 us 25436 LOOKUP 7.72 333.10 us 45.00 us 283828.60 us 134874 FSYNC 9.10 195.46 us 67.03 us 157955.41 us 271003 FXATTROP 19.48 716.64 us 94.11 us 340140.18 us 158241 WRITE 57.15 8361.71 us 110.31 us 596087.45 us 39783 READ Duration: 60363 seconds Data Read: 9818510392 bytes Data Written: 84404067328 bytes Interval 9 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 64 279 45 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 83 367 8 No. of Writes: 14494 5264 1377 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 4 586 0 No. of Writes: 316 548 1064 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. of Writes: 1620 39 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 84 RELEASE 0.00 0.00 us 0.00 us 0.00 us 464 RELEASEDIR 0.00 84.14 us 26.99 us 150.82 us 9 GETXATTR 0.00 52.66 us 18.71 us 108.64 us 22 FLUSH 0.00 187.96 us 35.03 us 565.44 us 11 READDIR 0.01 139.45 us 77.67 us 239.13 us 24 REMOVEXATTR 0.01 152.96 us 94.69 us 236.75 us 24 SETATTR 0.05 67.18 us 24.77 us 302.92 us 413 STAT 0.06 395.71 us 67.77 us 1608.99 us 84 OPEN 0.06 82.30 us 2.19 us 210.85 us 464 OPENDIR 0.09 186.01 us 24.60 us 1656.32 us 297 INODELK 0.18 78.19 us 27.95 us 1050.83 us 1361 STATFS 0.25 117.05 us 45.04 us 4078.40 us 1274 FSTAT 0.30 264.20 us 102.02 us 5148.11 us 682 READDIRP 2.07 271.71 us 35.32 us 22720.97 us 4581 LOOKUP 4.81 62.33 us 16.26 us 6962.45 us 46436 FINODELK 12.97 337.03 us 54.20 us 221094.08 us 23144 FSYNC 15.31 198.37 us 91.04 us 5197.06 us 46424 FXATTROP 24.57 588.49 us 97.75 us 228091.00 us 25110 WRITE 39.26 22524.25 us 112.00 us 551619.18 us 1048 READ Duration: 740 seconds Data Read: 42180608 bytes Data Written: 1032566784 bytes -------------- next part -------------- Brick: 10.32.9.9:/data0/gfs/bricks/brick1/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 11 197 9 No. of Writes: 2 5298 50158 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 22 152814 155742 No. of Writes: 10281 141128 229969 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 19592 2856 34807 No. of Writes: 46540 15874 16618 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 41083 7619 43 No. of Writes: 12325 19939 1278 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 1163 RELEASE 0.00 0.00 us 0.00 us 0.00 us 8997 RELEASEDIR 0.01 42.50 us 1.07 us 200.99 us 2015 OPENDIR 0.03 1205.44 us 147.78 us 3219.56 us 160 READDIRP 0.03 38.21 us 19.08 us 7544.91 us 5346 STATFS 0.03 947.57 us 42.15 us 5689.21 us 221 OPEN 0.03 536.56 us 11.61 us 5405.25 us 416 GETXATTR 0.06 42.40 us 15.94 us 3408.97 us 9626 FSTAT 0.06 992.07 us 13.79 us 3583.18 us 440 READDIR 2.07 784.10 us 10.19 us 100292.79 us 17781 LOOKUP 2.58 128.10 us 48.36 us 73127.27 us 135547 FXATTROP 2.75 282.45 us 36.68 us 403763.55 us 65559 FSYNC 5.30 72558.78 us 13.93 us 1840271.15 us 491 INODELK 7.97 450.94 us 75.16 us 368078.53 us 118790 WRITE 24.45 1212.26 us 11.70 us 1850709.72 us 135580 FINODELK 54.60 3082.33 us 72.64 us 387813.52 us 119063 READ Duration: 10748 seconds Data Read: 13585666193 bytes Data Written: 16779903830 bytes Interval 6 Stats: Block Size: 512b+ 1024b+ 4096b+ No. of Reads: 0 0 1151 No. of Writes: 13 2 405 Block Size: 8192b+ 16384b+ 32768b+ No. of Reads: 580 10 0 No. of Writes: 357 115 16 Block Size: 65536b+ 131072b+ 262144b+ No. of Reads: 314 96 0 No. of Writes: 25 19 62 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 9 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.00 42.88 us 11.61 us 65.94 us 6 GETXATTR 0.01 95.97 us 52.28 us 174.48 us 9 OPEN 0.01 43.61 us 1.83 us 68.31 us 25 OPENDIR 0.01 30.47 us 22.02 us 42.78 us 53 STATFS 0.01 291.00 us 148.50 us 772.43 us 6 READDIR 0.01 2255.97 us 2255.97 us 2255.97 us 1 READDIRP 0.03 42.10 us 18.38 us 114.84 us 98 FSTAT 0.10 30.02 us 14.10 us 662.25 us 520 FINODELK 0.18 165.79 us 20.56 us 2137.59 us 173 LOOKUP 0.21 137.80 us 53.21 us 366.07 us 243 FSYNC 0.43 133.43 us 81.66 us 384.37 us 520 FXATTROP 4.53 713.51 us 75.16 us 101590.43 us 1014 WRITE 28.81 102208.45 us 16.11 us 931987.84 us 45 INODELK 65.66 4873.26 us 79.54 us 293295.82 us 2151 READ Duration: 26 seconds Data Read: 42790912 bytes Data Written: 40480256 bytes Brick: 10.32.9.6:/data/gfs/bricks/bricka/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 140764 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 8678 RELEASE 0.00 0.00 us 0.00 us 0.00 us 10636 RELEASEDIR 0.00 21.03 us 16.15 us 34.88 us 8 ENTRYLK 0.00 270.94 us 270.94 us 270.94 us 1 MKNOD 0.01 323.61 us 13.73 us 2466.13 us 80 READDIR 0.02 326.98 us 11.55 us 9174.66 us 83 GETXATTR 0.09 65.78 us 16.75 us 4946.30 us 2332 FLUSH 0.09 85.68 us 1.51 us 4642.85 us 1834 OPENDIR 0.32 227.62 us 41.15 us 64374.98 us 2476 OPEN 0.45 2256.77 us 15.05 us 63381.45 us 347 INODELK 0.97 56.53 us 10.33 us 5526.47 us 29676 FINODELK 1.78 74.73 us 15.84 us 5393.43 us 41433 WRITE 3.57 320.06 us 15.52 us 115626.36 us 19340 LOOKUP 3.89 227.62 us 77.62 us 68580.08 us 29634 FXATTROP 88.81 9879.97 us 149.50 us 367307.05 us 15606 FSYNC Duration: 12346 seconds Data Read: 0 bytes Data Written: 140764 bytes Interval 6 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 174 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 25 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.04 73.34 us 20.62 us 134.34 us 5 GETXATTR 0.24 82.36 us 46.64 us 190.61 us 25 OPEN 0.31 437.89 us 272.99 us 1086.34 us 6 READDIR 0.43 184.47 us 19.85 us 2374.95 us 20 FLUSH 0.47 235.68 us 19.14 us 1331.71 us 17 INODELK 0.58 197.88 us 2.13 us 1981.61 us 25 OPENDIR 1.50 115.21 us 18.93 us 1851.08 us 112 FINODELK 2.46 194.41 us 96.77 us 1656.53 us 109 FXATTROP 5.75 284.33 us 23.78 us 1087.73 us 174 WRITE 6.46 295.65 us 21.14 us 19073.84 us 188 LOOKUP 81.76 12553.12 us 218.24 us 73123.96 us 56 FSYNC Duration: 26 seconds Data Read: 0 bytes Data Written: 174 bytes Brick: 10.32.9.4:/data/gfs/bricks/brick1/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 9195 6 0 No. of Writes: 3 700 2043 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 0 32431 16795 No. of Writes: 623 67451 42803 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 1749 317 6006 No. of Writes: 12604 1347 5850 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 565 3076 3 No. of Writes: 2324 2469 893 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 8579 RELEASE 0.00 0.00 us 0.00 us 0.00 us 8397 RELEASEDIR 0.00 22.96 us 15.97 us 51.06 us 8 ENTRYLK 0.00 260.50 us 260.50 us 260.50 us 1 MKNOD 0.00 93.03 us 16.42 us 2701.10 us 347 INODELK 0.00 128.90 us 7.61 us 1129.71 us 284 GETXATTR 0.01 300.85 us 12.41 us 3140.45 us 290 READDIR 0.01 57.69 us 13.08 us 46431.20 us 2334 FLUSH 0.01 70.20 us 0.94 us 17715.27 us 1939 OPENDIR 0.02 89.64 us 11.70 us 19019.90 us 1691 STAT 0.03 132.06 us 42.23 us 33323.48 us 2478 OPEN 0.03 62.90 us 16.20 us 27740.94 us 5346 STATFS 0.03 202.21 us 119.54 us 17249.34 us 1709 READDIRP 0.36 120.14 us 11.12 us 301165.99 us 29660 FINODELK 1.20 614.44 us 13.29 us 263413.97 us 19453 LOOKUP 1.27 427.97 us 71.48 us 563688.45 us 29634 FXATTROP 9.41 3840.22 us 82.10 us 469183.97 us 24493 READ 27.25 17452.64 us 92.85 us 483169.32 us 15606 FSYNC 60.36 14557.60 us 149.97 us 759284.50 us 41437 WRITE Duration: 10499 seconds Data Read: 2314290390 bytes Data Written: 3195953450 bytes Interval 6 Stats: Block Size: 256b+ 512b+ 4096b+ No. of Reads: 23 0 571 No. of Writes: 0 3 13 Block Size: 8192b+ 16384b+ 32768b+ No. of Reads: 52 0 0 No. of Writes: 128 13 1 Block Size: 65536b+ 131072b+ 262144b+ No. of Reads: 46 0 0 No. of Writes: 7 7 2 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 25 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.01 68.94 us 10.11 us 221.91 us 6 GETXATTR 0.01 30.00 us 15.66 us 54.39 us 20 FLUSH 0.02 44.60 us 20.24 us 95.68 us 15 STAT 0.02 41.06 us 19.29 us 103.84 us 17 INODELK 0.02 162.01 us 110.86 us 228.45 us 6 READDIR 0.03 45.60 us 1.45 us 89.10 us 25 OPENDIR 0.04 75.73 us 44.53 us 190.60 us 25 OPEN 0.04 36.62 us 16.52 us 66.38 us 53 STATFS 0.09 185.21 us 136.70 us 275.71 us 21 READDIRP 0.11 44.70 us 19.04 us 91.96 us 111 FINODELK 0.37 148.06 us 86.69 us 324.91 us 109 FXATTROP 1.53 355.20 us 16.63 us 41956.49 us 188 LOOKUP 25.17 19663.37 us 213.60 us 89492.03 us 56 FSYNC 28.20 7089.90 us 177.90 us 39757.75 us 174 WRITE 44.33 2802.26 us 104.46 us 49721.81 us 692 READ Duration: 26 seconds Data Read: 5790220 bytes Data Written: 4044288 bytes Brick: 10.32.9.5:/data/gfs/bricks/brick1/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 2 458 87 No. of Writes: 3 703 2052 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 54 15030 33939 No. of Writes: 623 68563 43056 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 3940 540 7475 No. of Writes: 12791 1436 5850 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 2395 6141 15 No. of Writes: 22702 2469 893 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 8678 RELEASE 0.00 0.00 us 0.00 us 0.00 us 10597 RELEASEDIR 0.00 158.97 us 158.97 us 158.97 us 1 MKNOD 0.00 393.70 us 9.69 us 2344.17 us 8 ENTRYLK 0.00 126.39 us 10.54 us 281.56 us 80 READDIR 0.00 1956.79 us 6.43 us 155911.98 us 82 GETXATTR 0.00 3315.24 us 139.26 us 170764.63 us 160 READDIRP 0.03 4020.12 us 15.61 us 469284.63 us 1164 FSTAT 0.04 2583.09 us 28.06 us 204742.02 us 2476 OPEN 0.04 2825.94 us 8.33 us 590785.10 us 2332 FLUSH 0.07 5673.25 us 0.80 us 527903.92 us 1832 OPENDIR 0.09 2595.01 us 11.48 us 382537.47 us 5342 STATFS 0.14 59496.60 us 8.34 us 1543417.83 us 347 INODELK 0.40 2031.08 us 6.84 us 1127299.99 us 29607 FINODELK 0.57 4484.44 us 11.60 us 968588.21 us 19334 LOOKUP 2.00 10241.03 us 53.88 us 1880631.52 us 29585 FXATTROP 2.57 29123.91 us 63.31 us 3473303.89 us 13399 READ 11.80 115055.07 us 122.19 us 2279735.88 us 15581 FSYNC 82.25 301692.99 us 98.78 us 4500846.60 us 41400 WRITE Duration: 12343 seconds Data Read: 4270911318 bytes Data Written: 5880060714 bytes Interval 6 Stats: Block Size: 512b+ 4096b+ 8192b+ No. of Reads: 0 29 109 No. of Writes: 3 13 128 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 15 0 47 No. of Writes: 13 1 7 Block Size: 131072b+ 262144b+ No. of Reads: 0 0 No. of Writes: 7 2 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 25 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.00 40.48 us 6.94 us 92.02 us 6 GETXATTR 0.00 28.47 us 10.19 us 50.39 us 20 FLUSH 0.00 53.78 us 31.59 us 76.06 us 12 FSTAT 0.00 129.00 us 88.83 us 161.83 us 6 READDIR 0.00 47.94 us 0.89 us 88.84 us 25 OPENDIR 0.00 1499.83 us 1499.83 us 1499.83 us 1 READDIRP 0.04 1333.68 us 30.78 us 31125.60 us 25 OPEN 0.05 836.76 us 12.08 us 23439.58 us 53 STATFS 0.11 872.63 us 8.21 us 56495.95 us 111 FINODELK 0.88 6920.74 us 68.16 us 625281.67 us 109 FXATTROP 1.02 4614.95 us 13.66 us 348629.63 us 188 LOOKUP 1.65 83057.19 us 12.45 us 658978.30 us 17 INODELK 6.49 27709.93 us 98.14 us 471332.27 us 200 READ 8.08 123206.43 us 267.50 us 979136.22 us 56 FSYNC 81.66 396159.13 us 232.40 us 1353202.04 us 176 WRITE Duration: 26 seconds Data Read: 4341760 bytes Data Written: 4044288 bytes Brick: 10.32.9.7:/data/gfs/bricks/brick1/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 8 146 10 No. of Writes: 0 822 17574 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 12 147605 81020 No. of Writes: 3335 177490 110247 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 12618 1795 18321 No. of Writes: 30013 5366 10235 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 2188 4008 20 No. of Writes: 8375 8875 585 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 608 RELEASE 0.00 0.00 us 0.00 us 0.00 us 8795 RELEASEDIR 0.00 20.59 us 14.68 us 25.91 us 75 FLUSH 0.00 77.89 us 58.31 us 94.39 us 85 REMOVEXATTR 0.00 85.37 us 64.65 us 156.44 us 85 SETATTR 0.00 86.81 us 42.56 us 643.87 us 146 OPEN 0.00 127.62 us 6.67 us 943.71 us 165 GETXATTR 0.00 507.62 us 14.32 us 1787.62 us 201 READDIR 0.01 1235.24 us 26.42 us 3832.88 us 160 READDIRP 0.06 244.53 us 16.13 us 351764.07 us 5346 STATFS 0.10 1171.75 us 1.17 us 1068559.38 us 1895 OPENDIR 0.13 406.39 us 25.42 us 1365160.22 us 7375 FSTAT 0.53 20755.68 us 13.33 us 887933.44 us 564 INODELK 1.46 170.00 us 45.86 us 2503412.21 us 190741 FXATTROP 1.53 178.24 us 11.10 us 3938189.55 us 190604 FINODELK 1.82 425.12 us 32.12 us 663207.46 us 94843 FSYNC 11.43 2187.65 us 72.21 us 4245933.36 us 116005 WRITE 16.73 21132.03 us 14.12 us 4031333.55 us 17576 LOOKUP 66.20 15737.71 us 69.72 us 6429153.50 us 93418 READ Duration: 11011 seconds Data Read: 4863622768 bytes Data Written: 8447544320 bytes Interval 6 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 3 109 5 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 6829 259 5 No. of Writes: 228 230 13 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 0 175 0 No. of Writes: 4 2 12 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. of Writes: 12 2 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 7 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.00 74.17 us 74.17 us 74.17 us 1 REMOVEXATTR 0.00 89.45 us 89.45 us 89.45 us 1 SETATTR 0.00 33.25 us 16.33 us 64.78 us 4 GETXATTR 0.00 70.61 us 52.42 us 153.99 us 7 OPEN 0.00 37.55 us 1.95 us 64.59 us 25 OPENDIR 0.00 30.14 us 18.44 us 44.04 us 53 STATFS 0.00 275.53 us 155.69 us 765.06 us 6 READDIR 0.00 2352.74 us 2352.74 us 2352.74 us 1 READDIRP 0.01 43.65 us 29.55 us 73.65 us 76 FSTAT 0.05 149.19 us 25.96 us 227.32 us 171 LOOKUP 0.06 25.33 us 11.55 us 59.71 us 1236 FINODELK 0.15 130.28 us 50.48 us 244.99 us 609 FSYNC 0.26 113.48 us 78.80 us 565.69 us 1237 FXATTROP 0.27 237.31 us 80.85 us 2140.21 us 620 WRITE 10.35 142923.57 us 15.02 us 887933.44 us 39 INODELK 88.84 6582.39 us 75.54 us 3820683.07 us 7268 READ Duration: 26 seconds Data Read: 41644032 bytes Data Written: 11333120 bytes Brick: 10.32.9.21:/data/gfs/bricks/bricka/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 583084 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 1 FORGET 0.00 0.00 us 0.00 us 0.00 us 1506 RELEASE 0.00 0.00 us 0.00 us 0.00 us 11330 RELEASEDIR 0.16 41.96 us 1.09 us 174.18 us 2000 OPENDIR 0.53 882.55 us 15.33 us 7388.53 us 313 INODELK 0.61 785.26 us 15.23 us 2483.07 us 410 READDIR 0.78 2944.84 us 50.45 us 13138.61 us 139 OPEN 1.04 1439.59 us 8.75 us 13801.35 us 379 GETXATTR 4.69 28.29 us 10.98 us 51214.69 us 87008 FINODELK 9.29 34.56 us 14.65 us 81798.39 us 141069 WRITE 19.62 601.79 us 13.34 us 82349.56 us 17113 LOOKUP 21.24 128.13 us 74.02 us 5590.59 us 87026 FXATTROP 42.05 509.22 us 38.18 us 285202.84 us 43355 FSYNC Duration: 12343 seconds Data Read: 0 bytes Data Written: 583084 bytes Interval 6 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 820 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 14 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.26 54.89 us 17.77 us 76.95 us 9 GETXATTR 0.54 73.00 us 50.45 us 123.40 us 14 OPEN 0.55 41.66 us 1.43 us 58.76 us 25 OPENDIR 0.88 278.14 us 157.59 us 394.86 us 6 READDIR 0.97 40.13 us 17.39 us 342.49 us 46 INODELK 9.81 34.88 us 16.20 us 3058.74 us 534 FINODELK 14.96 34.64 us 16.51 us 177.38 us 820 WRITE 17.10 168.21 us 24.51 us 323.55 us 193 LOOKUP 17.52 127.46 us 46.97 us 299.68 us 261 FSYNC 37.41 133.00 us 80.38 us 341.49 us 534 FXATTROP Duration: 26 seconds Data Read: 0 bytes Data Written: 820 bytes Brick: 10.32.9.21:/data0/gfs/bricks/brick1/ovirt-data ----------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 11 66 2 No. of Writes: 2 5586 50158 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 4 52513 93336 No. of Writes: 10281 142655 230190 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 7241 1565 19639 No. of Writes: 46705 15906 16639 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 1874 4071 24 No. of Writes: 12317 19939 1278 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 1255 RELEASE 0.00 0.00 us 0.00 us 0.00 us 11357 RELEASEDIR 0.03 45.76 us 13.86 us 9356.79 us 1679 STAT 0.03 46.01 us 1.04 us 772.13 us 2000 OPENDIR 0.04 326.97 us 10.29 us 2391.56 us 353 GETXATTR 0.05 51.07 us 29.86 us 540.78 us 2522 FSTAT 0.05 313.16 us 16.18 us 5815.42 us 491 INODELK 0.07 883.53 us 45.88 us 6148.85 us 221 OPEN 0.08 41.10 us 18.98 us 9889.16 us 5346 STATFS 0.12 856.40 us 15.78 us 4414.06 us 410 READDIR 0.14 225.48 us 143.72 us 926.43 us 1710 READDIRP 1.41 29.62 us 10.80 us 110711.59 us 135572 FINODELK 3.66 587.34 us 15.19 us 61302.68 us 17766 LOOKUP 5.95 125.02 us 49.28 us 17686.93 us 135539 FXATTROP 6.44 279.71 us 35.58 us 407061.84 us 65554 FSYNC 19.89 476.93 us 75.41 us 440395.70 us 118787 WRITE 62.04 4088.32 us 78.00 us 503162.29 us 43217 READ Duration: 12343 seconds Data Read: 4610277422 bytes Data Written: 16792466262 bytes Interval 6 Stats: Block Size: 512b+ 1024b+ 4096b+ No. of Reads: 0 0 70 No. of Writes: 13 2 405 Block Size: 8192b+ 16384b+ 32768b+ No. of Reads: 342 5 0 No. of Writes: 357 115 16 Block Size: 65536b+ 131072b+ 262144b+ No. of Reads: 168 0 0 No. of Writes: 25 19 62 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 9 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.01 35.71 us 24.38 us 65.77 us 15 STAT 0.02 44.49 us 1.68 us 67.87 us 25 OPENDIR 0.02 126.43 us 52.25 us 398.25 us 9 OPEN 0.02 51.99 us 32.93 us 109.37 us 26 FSTAT 0.03 256.51 us 145.04 us 360.84 us 6 READDIR 0.03 33.20 us 20.54 us 47.63 us 53 STATFS 0.03 185.61 us 12.72 us 1214.94 us 10 GETXATTR 0.07 94.06 us 17.46 us 2439.30 us 45 INODELK 0.08 213.94 us 177.04 us 383.69 us 21 READDIRP 0.26 29.48 us 15.33 us 71.02 us 521 FINODELK 0.48 159.48 us 21.99 us 327.94 us 173 LOOKUP 1.22 136.16 us 84.01 us 461.36 us 521 FXATTROP 4.60 1098.76 us 46.07 us 205950.17 us 243 FSYNC 8.03 459.44 us 85.85 us 21485.20 us 1014 WRITE 85.10 8442.50 us 119.17 us 309111.15 us 585 READ Duration: 26 seconds Data Read: 14180352 bytes Data Written: 40480256 bytes Brick: 10.32.9.20:/data0/gfs/bricks/brick1/ovirt-data ----------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 4 109 21 No. of Writes: 0 460 54715 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 42 45163 54896 No. of Writes: 8883 212986 162092 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 7020 1175 20888 No. of Writes: 48288 16340 24443 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 1408 2923 9 No. of Writes: 19159 26333 792 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 514 RELEASE 0.00 0.00 us 0.00 us 0.00 us 6838 RELEASEDIR 0.00 59.50 us 14.59 us 546.30 us 41 GETXATTR 0.00 220.10 us 16.62 us 988.45 us 50 READDIR 0.00 104.05 us 39.98 us 497.56 us 139 OPEN 0.00 48.49 us 15.80 us 3469.09 us 315 INODELK 0.01 50.94 us 1.65 us 1573.96 us 1820 OPENDIR 0.03 215.71 us 153.61 us 2562.55 us 1710 READDIRP 0.03 69.86 us 18.77 us 125905.96 us 5346 STATFS 0.07 141.42 us 16.04 us 655589.61 us 6984 FSTAT 0.08 659.41 us 12.89 us 523421.21 us 1679 STAT 1.24 1008.30 us 19.19 us 1534952.31 us 16933 LOOKUP 1.30 204.65 us 11.71 us 1749236.34 us 87007 FINODELK 1.48 233.96 us 73.33 us 1784231.76 us 87026 FXATTROP 12.57 3983.58 us 39.91 us 2743546.94 us 43355 FSYNC 41.38 12734.21 us 79.14 us 2966713.52 us 44635 READ 41.80 4069.99 us 77.68 us 2638579.30 us 141069 WRITE Duration: 9900 seconds Data Read: 3717300328 bytes Data Written: 21265203712 bytes Interval 6 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 3 39 2 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 381 294 20 No. of Writes: 83 467 47 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 0 264 0 No. of Writes: 32 35 41 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. of Writes: 65 6 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 14 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.00 55.58 us 15.93 us 93.98 us 8 GETXATTR 0.00 37.21 us 25.04 us 59.45 us 15 STAT 0.01 69.47 us 46.35 us 141.23 us 14 OPEN 0.01 26.67 us 18.22 us 49.45 us 46 INODELK 0.01 49.56 us 2.58 us 143.98 us 25 OPENDIR 0.01 34.60 us 18.77 us 126.90 us 53 STATFS 0.01 355.56 us 149.60 us 988.45 us 6 READDIR 0.02 41.77 us 21.14 us 73.50 us 72 FSTAT 0.02 215.13 us 169.65 us 274.27 us 21 READDIRP 0.08 29.04 us 13.69 us 66.67 us 534 FINODELK 0.17 160.73 us 23.54 us 321.76 us 193 LOOKUP 0.39 135.86 us 77.40 us 1627.16 us 534 FXATTROP 4.78 1072.78 us 86.42 us 197615.21 us 820 WRITE 19.40 13686.01 us 47.75 us 1796049.15 us 261 FSYNC 75.09 14420.80 us 94.10 us 2221701.71 us 959 READ Duration: 26 seconds Data Read: 21602304 bytes Data Written: 49301504 bytes Brick: 10.32.9.20:/data/gfs/bricks/bricka/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 549022 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 1065 RELEASE 0.00 0.00 us 0.00 us 0.00 us 6838 RELEASEDIR 0.01 53.35 us 12.82 us 228.91 us 57 GETXATTR 0.02 194.83 us 14.90 us 878.82 us 50 READDIR 0.04 43.27 us 15.72 us 322.38 us 491 INODELK 0.04 97.28 us 41.17 us 432.99 us 221 OPEN 0.16 53.03 us 1.74 us 501.22 us 1820 OPENDIR 7.40 33.07 us 11.38 us 68417.92 us 135575 FINODELK 9.15 46.66 us 14.70 us 55440.51 us 118787 WRITE 26.84 925.03 us 16.53 us 280142.79 us 17586 LOOKUP 27.29 252.34 us 34.94 us 251521.50 us 65555 FSYNC 29.08 130.03 us 50.81 us 55731.00 us 135539 FXATTROP Duration: 9900 seconds Data Read: 0 bytes Data Written: 549022 bytes Interval 6 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 1014 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 9 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.11 43.98 us 15.40 us 84.40 us 5 GETXATTR 0.44 95.58 us 51.84 us 164.84 us 9 OPEN 0.62 48.41 us 2.33 us 70.35 us 25 OPENDIR 0.72 31.10 us 17.28 us 117.76 us 45 INODELK 0.98 318.59 us 154.88 us 878.82 us 6 READDIR 9.45 35.24 us 14.72 us 2246.13 us 521 FINODELK 14.32 160.73 us 22.70 us 316.55 us 173 LOOKUP 17.04 135.61 us 56.40 us 1874.03 us 244 FSYNC 19.01 36.40 us 16.78 us 2947.68 us 1014 WRITE 37.30 139.06 us 85.48 us 1078.96 us 521 FXATTROP Duration: 26 seconds Data Read: 0 bytes Data Written: 1014 bytes Brick: 10.32.9.8:/data/gfs/bricks/bricka/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 372917 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 608 RELEASE 0.00 0.00 us 0.00 us 0.00 us 8719 RELEASEDIR 0.00 23.65 us 15.42 us 47.19 us 75 FLUSH 0.01 86.28 us 57.99 us 148.82 us 85 REMOVEXATTR 0.01 90.74 us 67.34 us 130.94 us 85 SETATTR 0.01 73.83 us 11.11 us 236.42 us 133 GETXATTR 0.02 83.84 us 41.98 us 499.27 us 146 OPEN 0.03 33.97 us 12.62 us 315.20 us 564 INODELK 0.06 316.19 us 16.57 us 1296.89 us 141 READDIR 0.13 49.96 us 1.25 us 911.95 us 1865 OPENDIR 7.61 27.61 us 11.45 us 86619.65 us 190604 FINODELK 9.35 55.71 us 14.57 us 117405.17 us 116005 WRITE 25.18 183.50 us 32.55 us 187376.40 us 94843 FSYNC 25.95 1022.20 us 15.51 us 136924.46 us 17546 LOOKUP 31.63 114.64 us 48.10 us 68557.92 us 190742 FXATTROP Duration: 11059 seconds Data Read: 0 bytes Data Written: 372917 bytes Interval 6 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 620 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 7 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.02 96.14 us 96.14 us 96.14 us 1 SETATTR 0.02 102.92 us 102.92 us 102.92 us 1 REMOVEXATTR 0.13 84.08 us 51.75 us 221.40 us 7 OPEN 0.13 61.82 us 16.08 us 159.35 us 10 GETXATTR 0.25 46.62 us 1.88 us 77.92 us 25 OPENDIR 0.27 32.45 us 17.59 us 152.92 us 39 INODELK 0.34 261.02 us 153.06 us 363.57 us 6 READDIR 4.63 34.65 us 16.37 us 80.81 us 620 WRITE 5.77 156.66 us 22.46 us 304.56 us 171 LOOKUP 7.34 27.58 us 14.15 us 781.04 us 1236 FINODELK 31.07 116.50 us 78.86 us 261.81 us 1238 FXATTROP 50.02 381.21 us 39.71 us 149969.67 us 609 FSYNC Duration: 26 seconds Data Read: 0 bytes Data Written: 620 bytes Brick: 10.32.9.8:/data0/gfs/bricks/brick1/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 8 1096 109 No. of Writes: 0 460 54715 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 115 111912 132663 No. of Writes: 8883 212986 162092 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 27012 2879 35570 No. of Writes: 48288 16340 24443 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 3743 7207 32 No. of Writes: 19159 26333 792 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 1 FORGET 0.00 0.00 us 0.00 us 0.00 us 1137 RELEASE 0.00 0.00 us 0.00 us 0.00 us 8740 RELEASEDIR 0.00 43.88 us 10.83 us 250.84 us 124 GETXATTR 0.00 120.62 us 44.65 us 762.18 us 139 OPEN 0.01 206.40 us 12.21 us 864.86 us 141 READDIR 0.02 43.05 us 1.44 us 452.74 us 1865 OPENDIR 0.05 2104.16 us 1920.14 us 2569.20 us 85 READDIRP 0.05 36.26 us 19.25 us 347.13 us 5346 STATFS 0.08 35.13 us 16.14 us 340.33 us 8148 FSTAT 0.63 27.73 us 11.02 us 73986.88 us 87007 FINODELK 1.53 134.48 us 38.31 us 90956.10 us 43355 FSYNC 1.73 388.39 us 15.69 us 62037.95 us 16978 LOOKUP 2.63 31993.13 us 16.10 us 888210.80 us 314 INODELK 2.93 128.32 us 73.56 us 45501.15 us 87026 FXATTROP 8.70 235.34 us 76.95 us 172924.48 us 141069 WRITE 81.65 3612.62 us 74.71 us 427304.61 us 86246 READ Duration: 11059 seconds Data Read: 8285158628 bytes Data Written: 21265203712 bytes Interval 6 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 3 39 2 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 4976 418 10 No. of Writes: 83 467 47 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 0 264 0 No. of Writes: 32 35 41 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. 
of Writes: 65 6 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 14 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.00 51.92 us 14.43 us 92.40 us 9 GETXATTR 0.01 40.96 us 1.87 us 66.77 us 25 OPENDIR 0.01 75.90 us 55.90 us 118.42 us 14 OPEN 0.01 236.50 us 152.98 us 361.08 us 6 READDIR 0.01 32.60 us 22.90 us 46.88 us 53 STATFS 0.01 2244.32 us 2244.32 us 2244.32 us 1 READDIRP 0.02 38.95 us 19.17 us 88.64 us 84 FSTAT 0.10 28.63 us 16.14 us 161.63 us 534 FINODELK 0.20 160.39 us 23.30 us 348.09 us 193 LOOKUP 0.38 226.64 us 49.98 us 20602.07 us 261 FSYNC 0.47 137.09 us 84.61 us 5477.48 us 534 FXATTROP 0.97 181.82 us 86.92 us 637.73 us 820 WRITE 28.68 96252.17 us 18.55 us 888210.80 us 46 INODELK 69.12 1882.74 us 79.08 us 157169.15 us 5668 READ Duration: 26 seconds Data Read: 41271296 bytes Data Written: 49301504 bytes Brick: 10.32.9.3:/data/gfs/bricks/brick3/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 4 131 8 No. of Writes: 0 822 17574 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 16 35286 49422 No. of Writes: 3335 177490 110247 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 12220 1028 23348 No. of Writes: 30013 5366 10235 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 974 1858 19 No. of Writes: 8375 8875 585 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 772 RELEASE 0.00 0.00 us 0.00 us 0.00 us 10592 RELEASEDIR 0.00 76.27 us 14.39 us 255.33 us 38 GETXATTR 0.00 56.50 us 19.21 us 185.55 us 75 FLUSH 0.00 150.43 us 76.71 us 292.39 us 85 REMOVEXATTR 0.00 157.91 us 97.76 us 237.19 us 85 SETATTR 0.00 308.86 us 26.64 us 1991.99 us 51 READDIR 0.01 183.93 us 62.86 us 1362.63 us 146 OPEN 0.01 73.43 us 23.05 us 777.89 us 564 INODELK 0.03 75.13 us 23.73 us 337.63 us 1679 STAT 0.03 89.35 us 1.99 us 1982.01 us 1820 OPENDIR 0.09 79.36 us 24.79 us 805.05 us 5346 STATFS 0.11 106.63 us 41.73 us 1740.16 us 5044 FSTAT 0.15 262.43 us 91.17 us 4453.28 us 2680 READDIRP 2.38 58.64 us 13.90 us 26031.44 us 190605 FINODELK 3.35 898.17 us 25.36 us 119501.01 us 17501 LOOKUP 6.60 326.75 us 45.00 us 283828.60 us 94843 FSYNC 7.89 194.27 us 67.03 us 157955.41 us 190743 FXATTROP 17.21 696.94 us 97.07 us 340140.18 us 116005 WRITE 62.13 7751.59 us 110.31 us 596087.45 us 37647 READ Duration: 12343 seconds Data Read: 3321340984 bytes Data Written: 8447544320 bytes Interval 6 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 3 109 5 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 187 225 13 No. of Writes: 228 230 13 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 0 185 0 No. of Writes: 4 2 12 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. of Writes: 12 2 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 7 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.00 173.56 us 173.56 us 173.56 us 1 REMOVEXATTR 0.00 190.79 us 190.79 us 190.79 us 1 SETATTR 0.01 80.71 us 20.92 us 143.06 us 5 GETXATTR 0.01 163.10 us 113.71 us 277.38 us 7 OPEN 0.01 76.13 us 26.98 us 125.17 us 15 STAT 0.02 77.17 us 1.99 us 133.67 us 25 OPENDIR 0.03 61.12 us 30.85 us 227.60 us 39 INODELK 0.05 656.71 us 271.17 us 1991.99 us 6 READDIR 0.05 75.47 us 32.20 us 361.18 us 53 STATFS 0.07 110.87 us 59.06 us 160.47 us 52 FSTAT 0.11 249.69 us 108.39 us 447.56 us 34 READDIRP 0.50 231.83 us 29.43 us 677.06 us 171 LOOKUP 0.91 58.52 us 20.49 us 999.15 us 1238 FINODELK 2.45 319.84 us 67.30 us 4499.21 us 609 FSYNC 2.79 356.93 us 114.39 us 4840.74 us 620 WRITE 3.02 193.78 us 110.23 us 2600.12 us 1239 FXATTROP 89.95 11709.07 us 151.63 us 198748.33 us 610 READ Duration: 26 seconds Data Read: 14946304 bytes Data Written: 11333120 bytes

From pascal.suter at dalco.ch Wed Apr 3 10:28:40 2019
From: pascal.suter at dalco.ch (Pascal Suter)
Date: Wed, 3 Apr 2019 12:28:40 +0200
Subject: [Gluster-users] performance - what can I expect
Message-ID: 

Hi all

I am currently testing gluster on a single server. I have three bricks, each a hardware RAID6 volume with thin-provisioned LVM that was aligned to the RAID and then formatted with xfs. I've created a distributed volume so that entire files get distributed across my three bricks.

First I ran an iozone benchmark across each brick, testing the read and write performance of a single large file per brick.

I then mounted my gluster volume locally and ran another iozone run with the same parameters, writing a single file. The file went to brick 1, which, when used directly, would write at 2.3GB/s and read at 1.5GB/s. Through gluster, however, I got only 800MB/s read and 750MB/s write throughput.

Another run with two processes each writing a file, where one file went to the first brick and the other to the second brick (which by itself, when accessed directly, wrote at 2.8GB/s and read at 2.7GB/s), resulted in 1.2GB/s of aggregated write and also aggregated read throughput.

Is this the performance I can expect out of glusterfs, or is it worth tuning in order to get closer to the actual brick filesystem performance?

Here are the iozone commands I use for writing and reading.. note that I am using directIO in order to make sure I don't get fooled by cache :)

./iozone -i 0 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0 -s $filesize -r $recordsize > iozone-brick${b}-write.txt

./iozone -i 1 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0 -s $filesize -r $recordsize > iozone-brick${b}-read.txt

cheers

Pascal

From jthottan at redhat.com Wed Apr 3 11:16:08 2019
From: jthottan at redhat.com (Jiffin Tony Thottan)
Date: Wed, 3 Apr 2019 16:46:08 +0530
Subject: [Gluster-users] Gluster GEO replication fault after write over nfs-ganesha
In-Reply-To: <1a5fb44e-fc3b-4edb-28ee-baa4ed077251@redhat.com>
References: <1a5fb44e-fc3b-4edb-28ee-baa4ed077251@redhat.com>
Message-ID: <050e80fd-2904-f69c-dd91-8bf0dfb96c3f@redhat.com>

CCing sunn as well.
On 28/03/19 4:05 PM, Soumya Koduri wrote:
>
>
> On 3/27/19 7:39 PM, Alexey Talikov wrote:
>> I have two clusters with dispersed volumes (2+1) with GEO replication.
>> It works fine as long as I use glusterfs-fuse, but as soon as even one file is
>> written over nfs-ganesha, replication goes to Faulty and recovers after I
>> remove this file (sometimes after stop/start).
>> I think nfs-ganesha writes the file in some way that produces a problem
>> with replication.
>>
>
> I am not much familiar with geo-rep and not sure what/why exactly
> failed here. Request Kotresh (cc'ed) to take a look and provide his
> insights on the issue.
>
> Thanks,
> Soumya
>
>> |OSError: [Errno 61] No data available:
>> '.gfid/9c9514ce-a310-4a1c-a87b-a800a32a99f8' |
>>
>> but if I check over glusterfs mounted with aux-gfid-mount
>>
>> |getfattr -n trusted.glusterfs.pathinfo -e text
>> /mnt/TEST/.gfid/9c9514ce-a310-4a1c-a87b-a800a32a99f8
>> getfattr: Removing leading '/' from absolute path names
>> # file: mnt/TEST/.gfid/9c9514ce-a310-4a1c-a87b-a800a32a99f8
>> trusted.glusterfs.pathinfo="( ( ))" |
>>
>> File exists
>> Details available here
>> https://github.com/nfs-ganesha/nfs-ganesha/issues/408
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

From kbh-admin at mpa-ifw.tu-darmstadt.de Wed Apr 3 14:20:32 2019
From: kbh-admin at mpa-ifw.tu-darmstadt.de (kbh-admin)
Date: Wed, 3 Apr 2019 16:20:32 +0200
Subject: [Gluster-users] Gluster and LVM
Message-ID: 

Hello Gluster-Community,

we are considering building several gluster servers and have a question
regarding LVM and glusterfs.

Scenario 1: Snapshots

Of course, taking snapshots is a good capability and we want to use
LVM for that.

Scenario 2: Increase gluster volume

We want to increase the gluster volume by adding HDDs and/or by adding
Dell PowerVaults later. We got the recommendation to set up a new
gluster volume for the PowerVaults and not use LVM in that case (lvresize ...).

What would you suggest, and how do you manage both LVM and glusterfs
together?

Thanks in advance.

Felix

From dm at belkam.com Wed Apr 3 15:26:38 2019
From: dm at belkam.com (Dmitry Melekhov)
Date: Wed, 3 Apr 2019 19:26:38 +0400
Subject: [Gluster-users] Gluster and LVM
In-Reply-To: 
References: 
Message-ID: 

03.04.2019 18:20, kbh-admin wrote:
> Hello Gluster-Community,
>
>
> we are considering building several gluster servers and have a question
> regarding LVM and glusterfs.
>
>
> Scenario 1: Snapshots
>
> Of course, taking snapshots is a good capability and we want to use
> LVM for that.
>
>
> Scenario 2: Increase gluster volume
>
> We want to increase the gluster volume by adding HDDs and/or by adding
> Dell PowerVaults later. We got the recommendation to set up a new
> gluster volume for the PowerVaults and not use LVM in that case (lvresize ...).
>
>
> What would you suggest, and how do you manage both LVM and glusterfs
> together?

If you already have storage, why do you need gluster? Just use it :-)

>
> Thanks in advance.
> > > Felix > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From pkalever at redhat.com Wed Apr 3 15:41:23 2019 From: pkalever at redhat.com (Prasanna Kalever) Date: Wed, 3 Apr 2019 21:11:23 +0530 Subject: [Gluster-users] Help: gluster-block In-Reply-To: References: Message-ID: On Tue, Apr 2, 2019 at 1:34 AM Karim Roumani wrote: > Actually we have a question. > > We did two tests as follows. > > Test 1 - iSCSI target on the glusterFS server > Test 2 - iSCSI target on a separate server with gluster client > > Test 2 performed a read speed of <1GB/second while Test 1 about > 300MB/second > > Any reason you see to why this may be the case? > For Test 1 case, 1. ops b/w * iscsi initiator <-> iscsi target and * tcmu-runner <-> gluster server are all using the same NIC resource. 2. Also, it might be possible that, the node might be facing high resource usage like cpu is high and/or memory is low, as everything is on the same node. You can check also check gluster profile info, to corner down some of these. Thanks! -- Prasanna > ? > > On Mon, Apr 1, 2019 at 1:00 PM Karim Roumani > wrote: > >> Thank you Prasanna for your quick response very much appreaciated we will >> review and get back to you. >> ? >> >> On Mon, Mar 25, 2019 at 9:00 AM Prasanna Kalever >> wrote: >> >>> [ adding +gluster-users for archive purpose ] >>> >>> On Sat, Mar 23, 2019 at 1:51 AM Jeffrey Chin >>> wrote: >>> > >>> > Hello Mr. Kalever, >>> >>> Hello Jeffrey, >>> >>> > >>> > I am currently working on a project to utilize GlusterFS for VMWare >>> VMs. In our research, we found that utilizing block devices with GlusterFS >>> would be the best approach for our use case (correct me if I am wrong). I >>> saw the gluster utility that you are a contributor for called gluster-block >>> (https://github.com/gluster/gluster-block), and I had a question about >>> the configuration. From what I understand, gluster-block only works on the >>> servers that are serving the gluster volume. Would it be possible to run >>> the gluster-block utility on a client machine that has a gluster volume >>> mounted to it? >>> >>> Yes, that is right! At the moment gluster-block is coupled with >>> glusterd for simplicity. >>> But we have made some changes here [1] to provide a way to specify >>> server address (volfile-server) which is outside the gluster-blockd >>> node, please take a look. >>> >>> Although it is not complete solution, but it should at-least help for >>> some usecases. Feel free to raise an issue [2] with the details about >>> your usecase and etc or submit a PR by your self :-) >>> We never picked it, as we never have a usecase needing separation of >>> gluster-blockd and glusterd. >>> >>> > >>> > I also have another question: how do I make the iSCSI targets persist >>> if all of the gluster nodes were rebooted? It seems like once all of the >>> nodes reboot, I am unable to reconnect to the iSCSI targets created by the >>> gluster-block utility. >>> >>> do you mean rebooting iscsi initiator ? or gluster-block/gluster >>> target/server nodes ? >>> >>> 1. for initiator to automatically connect to block devices post >>> reboot, we need to make below changes in /etc/iscsi/iscsid.conf: >>> node.startup = automatic >>> >>> 2. 
if you mean, just in case if all the gluster nodes goes down, on >>> the initiator all the available HA path's will be down, but we still >>> want the IO to be queued on the initiator, until one of the path >>> (gluster node) is availabe: >>> >>> for this in gluster-block sepcific section of multipath.conf you need >>> to replace 'no_path_retry 120' as 'no_path_retry queue' >>> Note: refer README for current multipath.conf setting recommendations. >>> >>> [1] https://github.com/gluster/gluster-block/pull/161 >>> [2] https://github.com/gluster/gluster-block/issues/new >>> >>> BRs, >>> -- >>> Prasanna >>> >> >> >> -- >> >> Thank you, >> >> *Karim Roumani* >> Director of Technology Solutions >> >> TekReach Solutions / Albatross Cloud >> 714-916-5677 >> Karim.Roumani at tekreach.com >> Albatross.cloud - One Stop Cloud Solutions >> Portalfronthosting.com - Complete >> SharePoint Solutions >> > > > -- > > Thank you, > > *Karim Roumani* > Director of Technology Solutions > > TekReach Solutions / Albatross Cloud > 714-916-5677 > Karim.Roumani at tekreach.com > Albatross.cloud - One Stop Cloud Solutions > Portalfronthosting.com - Complete > SharePoint Solutions > -------------- next part -------------- An HTML attachment was scrubbed... URL: From moagrawa at redhat.com Wed Apr 3 15:56:15 2019 From: moagrawa at redhat.com (Mohit Agrawal) Date: Wed, 3 Apr 2019 21:26:15 +0530 Subject: [Gluster-users] [ovirt-users] Re: Announcing Gluster release 5.5 In-Reply-To: References: <20190328164716.27693.35887@mail.ovirt.org> Message-ID: Hi, Thanks Olaf for sharing the relevant logs. @Atin, You are right patch https://review.gluster.org/#/c/glusterfs/+/22344/ will resolve the issue running multiple brick instance for same brick. As we can see in below logs glusterd is trying to start the same brick instance twice at the same time [2019-04-01 10:23:21.752401] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine [2019-04-01 10:23:30.348091] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine [2019-04-01 10:24:13.353396] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine [2019-04-01 10:24:24.253764] I [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh brick process for brick /data/gfs/bricks/brick1/ovirt-engine We are seeing below message between starting of two instances The message "E [MSGID: 101012] [common-utils.c:4075:gf_is_service_running] 0-: Unable to read pidfile: /var/run/gluster/vols/ovirt-engine/10.32.9.5-data-gfs-bricks-brick1-ovirt-engine.pid" repeated 2 times between [2019-04-01 10:23:21.748492] and [2019-04-01 10:23:21.752432] I will backport the same. Thanks, Mohit Agrawal On Wed, Apr 3, 2019 at 3:58 PM Olaf Buitelaar wrote: > Dear Mohit, > > Sorry i thought Krutika was referring to the ovirt-kube brick logs. due > the large size (18MB compressed), i've placed the files here; > https://edgecastcdn.net/0004FA/files/bricklogs.tar.bz2 > Also i see i've attached the wrong files, i intended to > attach profile_data4.txt | profile_data3.txt > Sorry for the confusion. > > Thanks Olaf > > Op wo 3 apr. 
2019 om 04:56 schreef Mohit Agrawal : > >> Hi Olaf, >> >> As per current attached "multi-glusterfsd-vol3.txt | >> multi-glusterfsd-vol4.txt" it is showing multiple processes are running >> for "ovirt-core ovirt-engine" brick names but there are no logs >> available in bricklogs.zip specific to this bricks, bricklogs.zip >> has a dump of ovirt-kube logs only >> >> Kindly share brick logs specific to the bricks "ovirt-core >> ovirt-engine" and share glusterd logs also. >> >> Regards >> Mohit Agrawal >> >> On Tue, Apr 2, 2019 at 9:18 PM Olaf Buitelaar >> wrote: >> >>> Dear Krutika, >>> >>> 1. >>> I've changed the volume settings, write performance seems to increased >>> somewhat, however the profile doesn't really support that since latencies >>> increased. However read performance has diminished, which does seem to be >>> supported by the profile runs (attached). >>> Also the IO does seem to behave more consistent than before. >>> I don't really understand the idea behind them, maybe you can explain >>> why these suggestions are good? >>> These settings seems to avoid as much local caching and access as >>> possible and push everything to the gluster processes. While i would expect >>> local access and local caches are a good thing, since it would lead to >>> having less network access or disk access. >>> I tried to investigate these settings a bit more, and this is what i >>> understood of them; >>> - network.remote-dio; when on it seems to ignore the O_DIRECT flag in >>> the client, thus causing the files to be cached and buffered in the page >>> cache on the client, i would expect this to be a good thing especially if >>> the server process would access the same page cache? >>> At least that is what grasp from this commit; >>> https://review.gluster.org/#/c/glusterfs/+/4206/2/xlators/protocol/client/src/client.c line >>> 867 >>> Also found this commit; >>> https://github.com/gluster/glusterfs/commit/06c4ba589102bf92c58cd9fba5c60064bc7a504e#diff-938709e499b4383c3ed33c3979b9080c suggesting >>> remote-dio actually improves performance, not sure it's a write or read >>> benchmark >>> When a file is opened with O_DIRECT it will also disable the >>> write-behind functionality >>> >>> - performance.strict-o-direct: when on, the AFR, will not ignore the >>> O_DIRECT flag. and will invoke: fop_writev_stub with the wb_writev_helper, >>> which seems to stack the operation, no idea why that is. But generally i >>> suppose not ignoring the O_DIRECT flag in the AFR is a good thing, when a >>> processes requests to have O_DIRECT. So this makes sense to me. >>> >>> - cluster.choose-local: when off, it doesn't prefer the local node, but >>> would always choose a brick. Since it's a 9 node cluster, with 3 >>> subvolumes, only a 1/3 could end-up local, and the other 2/3 should be >>> pushed to external nodes anyway. Or am I making the total wrong assumption >>> here? >>> >>> It seems to this config is moving to the gluster-block config side of >>> things, which does make sense. >>> Since we're running quite some mysql instances, which opens the files >>> with O_DIRECt i believe, it would mean the only layer of cache is within >>> mysql it self. Which you could argue is a good thing. But i would expect a >>> little of write-behind buffer, and maybe some of the data cached within >>> gluster would alleviate things a bit on gluster's side. But i wouldn't know >>> if that's the correct mind set, and so might be totally off here. 
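
(A side note on applying the three options discussed above: gluster volume set and gluster volume get always take the volume name, which the set commands quoted elsewhere in this thread omit. A minimal sketch, assuming a volume named ovirt-kube as used in this thread:

  gluster volume set ovirt-kube network.remote-dio off
  gluster volume set ovirt-kube performance.strict-o-direct on
  gluster volume set ovirt-kube cluster.choose-local off

  # read back the values the volume is actually using
  gluster volume get ovirt-kube network.remote-dio
  gluster volume get ovirt-kube performance.strict-o-direct
  gluster volume get ovirt-kube cluster.choose-local

The get form is a convenient way to confirm that a change really took effect, independent of how performance behaves afterwards.)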
>>> Also i would expect these gluster v set command to be online >>> operations, but somehow the bricks went down, after applying these changes. >>> What appears to have happened is that after the update the brick process >>> was restarted, but due to multiple brick process start issue, multiple >>> processes were started, and the brick didn't came online again. >>> However i'll try to reproduce this, since i would like to test with >>> cluster.choose-local: on, and see how performance compares. And hopefully >>> when it occurs collect some useful info. >>> Question; are network.remote-dio and performance.strict-o-direct >>> mutually exclusive settings, or can they both be on? >>> >>> 2. I've attached all brick logs, the only thing relevant i found was; >>> [2019-03-28 20:20:07.170452] I [MSGID: 113030] >>> [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: >>> open-fd-key-status: 0 for >>> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 >>> [2019-03-28 20:20:07.170491] I [MSGID: 113031] >>> [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr >>> status: 0 for >>> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 >>> [2019-03-28 20:20:07.248480] I [MSGID: 113030] >>> [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: >>> open-fd-key-status: 0 for >>> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 >>> [2019-03-28 20:20:07.248491] I [MSGID: 113031] >>> [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr >>> status: 0 for >>> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 >>> >>> Thanks Olaf >>> >>> ps. sorry needed to resend since it exceed the file limit >>> >>> Op ma 1 apr. 2019 om 07:56 schreef Krutika Dhananjay < >>> kdhananj at redhat.com>: >>> >>>> Adding back gluster-users >>>> Comments inline ... >>>> >>>> On Fri, Mar 29, 2019 at 8:11 PM Olaf Buitelaar < >>>> olaf.buitelaar at gmail.com> wrote: >>>> >>>>> Dear Krutika, >>>>> >>>>> >>>>> >>>>> 1. I?ve made 2 profile runs of around 10 minutes (see files >>>>> profile_data.txt and profile_data2.txt). Looking at it, most time seems be >>>>> spent at the fop?s fsync and readdirp. >>>>> >>>>> Unfortunate I don?t have the profile info for the 3.12.15 version so >>>>> it?s a bit hard to compare. >>>>> >>>>> One additional thing I do notice on 1 machine (10.32.9.5) the iowait >>>>> time increased a lot, from an average below the 1% it?s now around the 12% >>>>> after the upgrade. >>>>> >>>>> So first suspicion with be lighting strikes twice, and I?ve also just >>>>> now a bad disk, but that doesn?t appear to be the case, since all smart >>>>> status report ok. 
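
(For readers following along: the profile runs mentioned above, and the long per-brick dumps earlier in this digest, come from gluster's volume profiling facility. A rough sketch of how such a run is captured, assuming a volume named ovirt-kube:

  gluster volume profile ovirt-kube start
  # leave the normal workload running for a representative window, e.g. ~10 minutes
  gluster volume profile ovirt-kube info > profile_data.txt
  gluster volume profile ovirt-kube stop

The %-latency column of the per-fop table in that output is what points at FSYNC and READDIRP dominating here.)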
>>>>> >>>>> Also dd shows performance I would more or less expect; >>>>> >>>>> dd if=/dev/zero of=/data/test_file bs=100M count=1 oflag=dsync >>>>> >>>>> 1+0 records in >>>>> >>>>> 1+0 records out >>>>> >>>>> 104857600 bytes (105 MB) copied, 0.686088 s, 153 MB/s >>>>> >>>>> dd if=/dev/zero of=/data/test_file bs=1G count=1 oflag=dsync >>>>> >>>>> 1+0 records in >>>>> >>>>> 1+0 records out >>>>> >>>>> 1073741824 bytes (1.1 GB) copied, 7.61138 s, 141 MB/s >>>>> >>>>> if=/dev/urandom of=/data/test_file bs=1024 count=1000000 >>>>> >>>>> 1000000+0 records in >>>>> >>>>> 1000000+0 records out >>>>> >>>>> 1024000000 bytes (1.0 GB) copied, 6.35051 s, 161 MB/s >>>>> >>>>> dd if=/dev/zero of=/data/test_file bs=1024 count=1000000 >>>>> >>>>> 1000000+0 records in >>>>> >>>>> 1000000+0 records out >>>>> >>>>> 1024000000 bytes (1.0 GB) copied, 1.6899 s, 606 MB/s >>>>> >>>>> When I disable this brick (service glusterd stop; pkill glusterfsd) >>>>> performance in gluster is better, but not on par with what it was. Also the >>>>> cpu usages on the ?neighbor? nodes which hosts the other bricks in the same >>>>> subvolume increases quite a lot in this case, which I wouldn?t expect >>>>> actually since they shouldn't handle much more work, except flagging shards >>>>> to heal. Iowait also goes to idle once gluster is stopped, so it?s for >>>>> sure gluster which waits for io. >>>>> >>>>> >>>>> >>>> >>>> So I see that FSYNC %-latency is on the higher side. And I also noticed >>>> you don't have direct-io options enabled on the volume. >>>> Could you set the following options on the volume - >>>> # gluster volume set network.remote-dio off >>>> # gluster volume set performance.strict-o-direct on >>>> and also disable choose-local >>>> # gluster volume set cluster.choose-local off >>>> >>>> let me know if this helps. >>>> >>>> 2. I?ve attached the mnt log and volume info, but I couldn?t find >>>>> anything relevant in in those logs. I think this is because we run the VM?s >>>>> with libgfapi; >>>>> >>>>> [root at ovirt-host-01 ~]# engine-config -g LibgfApiSupported >>>>> >>>>> LibgfApiSupported: true version: 4.2 >>>>> >>>>> LibgfApiSupported: true version: 4.1 >>>>> >>>>> LibgfApiSupported: true version: 4.3 >>>>> >>>>> And I can confirm the qemu process is invoked with the gluster:// >>>>> address for the images. >>>>> >>>>> The message is logged in the /var/lib/libvert/qemu/ file, >>>>> which I?ve also included. For a sample case see around; 2019-03-28 20:20:07 >>>>> >>>>> Which has the error; E [MSGID: 133010] >>>>> [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on >>>>> shard 109886 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c >>>>> [Stale file handle] >>>>> >>>> >>>> Could you also attach the brick logs for this volume? >>>> >>>> >>>>> >>>>> 3. 
yes I see multiple instances for the same brick directory, like; >>>>> >>>>> /usr/sbin/glusterfsd -s 10.32.9.6 --volfile-id >>>>> ovirt-core.10.32.9.6.data-gfs-bricks-brick1-ovirt-core -p >>>>> /var/run/gluster/vols/ovirt-core/10.32.9.6-data-gfs-bricks-brick1-ovirt-core.pid >>>>> -S /var/run/gluster/452591c9165945d9.socket --brick-name >>>>> /data/gfs/bricks/brick1/ovirt-core -l >>>>> /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log >>>>> --xlator-option *-posix.glusterd-uuid=fb513da6-f3bd-4571-b8a2-db5efaf60cc1 >>>>> --process-name brick --brick-port 49154 --xlator-option >>>>> ovirt-core-server.listen-port=49154 >>>>> >>>>> >>>>> >>>>> I?ve made an export of the output of ps from the time I observed these >>>>> multiple processes. >>>>> >>>>> In addition the brick_mux bug as noted by Atin. I might also have >>>>> another possible cause, as ovirt moves nodes from none-operational state or >>>>> maintenance state to active/activating, it also seems to restart gluster, >>>>> however I don?t have direct proof for this theory. >>>>> >>>>> >>>>> >>>> >>>> +Atin Mukherjee ^^ >>>> +Mohit Agrawal ^^ >>>> >>>> -Krutika >>>> >>>> Thanks Olaf >>>>> >>>>> Op vr 29 mrt. 2019 om 10:03 schreef Sandro Bonazzola < >>>>> sbonazzo at redhat.com>: >>>>> >>>>>> >>>>>> >>>>>> Il giorno gio 28 mar 2019 alle ore 17:48 >>>>>> ha scritto: >>>>>> >>>>>>> Dear All, >>>>>>> >>>>>>> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While >>>>>>> previous upgrades from 4.1 to 4.2 etc. went rather smooth, this one was a >>>>>>> different experience. After first trying a test upgrade on a 3 node setup, >>>>>>> which went fine. i headed to upgrade the 9 node production platform, >>>>>>> unaware of the backward compatibility issues between gluster 3.12.15 -> >>>>>>> 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start. >>>>>>> Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata >>>>>>> was missing or couldn't be accessed. Restoring this file by getting a good >>>>>>> copy of the underlying bricks, removing the file from the underlying bricks >>>>>>> where the file was 0 bytes and mark with the stickybit, and the >>>>>>> corresponding gfid's. Removing the file from the mount point, and copying >>>>>>> back the file on the mount point. Manually mounting the engine domain, and >>>>>>> manually creating the corresponding symbolic links in /rhev/data-center and >>>>>>> /var/run/vdsm/storage and fixing the ownership back to vdsm.kvm (which was >>>>>>> root.root), i was able to start the HA engine again. Since the engine was >>>>>>> up again, and things seemed rather unstable i decided to continue the >>>>>>> upgrade on the other nodes suspecting an incompatibility in gluster >>>>>>> versions, i thought would be best to have them all on the same version >>>>>>> rather soonish. However things went from bad to worse, the engine stopped >>>>>>> again, and all vm?s stopped working as well. So on a machine outside the >>>>>>> setup and restored a backup of the engine taken from version 4.2.8 just >>>>>>> before the upgrade. With this engine I was at least able to start some vm?s >>>>>>> again, and finalize the upgrade. Once the upgraded, things didn?t stabilize >>>>>>> and also lose 2 vm?s during the process due to image corruption. After >>>>>>> figuring out gluster 5.3 had quite some issues I was as lucky to see >>>>>>> gluster 5.5 was about to be released, on the moment the RPM?s were >>>>>>> available I?ve installed those. 
This helped a lot in terms of stability, >>>>>>> for which I?m very grateful! However the performance is unfortunate >>>>>>> terrible, it?s about 15% of what the performance was running gluster >>>>>>> 3.12.15. It?s strange since a simple dd shows ok performance, but our >>>>>>> actual workload doesn?t. While I would expect the performance to be better, >>>>>>> due to all improvements made since gluster version 3.12. Does anybody share >>>>>>> the same experience? >>>>>>> I really hope gluster 6 will soon be tested with ovirt and released, >>>>>>> and things start to perform and stabilize again..like the good old days. Of >>>>>>> course when I can do anything, I?m happy to help. >>>>>>> >>>>>> >>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1693998 to track >>>>>> the rebase on Gluster 6. >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> I think the following short list of issues we have after the >>>>>>> migration; >>>>>>> Gluster 5.5; >>>>>>> - Poor performance for our workload (mostly write dependent) >>>>>>> - VM?s randomly pause on unknown storage errors, which are >>>>>>> ?stale file?s?. corresponding log; Lookup on shard 797 failed. Base file >>>>>>> gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle] >>>>>>> - Some files are listed twice in a directory (probably related >>>>>>> the stale file issue?) >>>>>>> Example; >>>>>>> ls -la >>>>>>> /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/ >>>>>>> total 3081 >>>>>>> drwxr-x---. 2 vdsm kvm 4096 Mar 18 11:34 . >>>>>>> drwxr-xr-x. 13 vdsm kvm 4096 Mar 19 09:42 .. >>>>>>> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 >>>>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c >>>>>>> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 >>>>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c >>>>>>> -rw-rw----. 1 vdsm kvm 1048576 Jan 27 2018 >>>>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease >>>>>>> -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 >>>>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta >>>>>>> -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 >>>>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta >>>>>>> >>>>>>> - brick processes sometimes starts multiple times. Sometimes I?ve 5 >>>>>>> brick processes for a single volume. Killing all glusterfsd?s for the >>>>>>> volume on the machine and running gluster v start force usually just >>>>>>> starts one after the event, from then on things look all right. >>>>>>> >>>>>>> >>>>>> May I kindly ask to open bugs on Gluster for above issues at >>>>>> https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS ? >>>>>> Sahina? >>>>>> >>>>>> >>>>>>> Ovirt 4.3.2.1-1.el7 >>>>>>> - All vms images ownership are changed to root.root after the >>>>>>> vm is shutdown, probably related to; >>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1666795 but not only >>>>>>> scoped to the HA engine. I?m still in compatibility mode 4.2 for the >>>>>>> cluster and for the vm?s, but upgraded to version ovirt 4.3.2 >>>>>>> >>>>>> >>>>>> Ryan? >>>>>> >>>>>> >>>>>>> - The network provider is set to ovn, which is fine..actually >>>>>>> cool, only the ?ovs-vswitchd? is a CPU hog, and utilizes 100% >>>>>>> >>>>>> >>>>>> Miguel? Dominik? 
>>>>>> >>>>>> >>>>>>> - It seems on all nodes vdsm tries to get the the stats for >>>>>>> the HA engine, which is filling the logs with (not sure if this is new); >>>>>>> [api.virt] FINISH getStats return={'status': {'message': "Virtual >>>>>>> machine does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}", >>>>>>> 'code': 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9 >>>>>>> (api:54) >>>>>>> >>>>>> >>>>>> Simone? >>>>>> >>>>>> >>>>>>> - It seems the package os_brick [root] managedvolume not >>>>>>> supported: Managed Volume Not Supported. Missing package os-brick.: >>>>>>> ('Cannot import os_brick',) (caps:149) which fills the vdsm.log, but for >>>>>>> this I also saw another message, so I suspect this will already be resolved >>>>>>> shortly >>>>>>> - The machine I used to run the backup HA engine, doesn?t want >>>>>>> to get removed from the hosted-engine ?vm-status, not even after running; >>>>>>> hosted-engine --clean-metadata --host-id=10 --force-clean or hosted-engine >>>>>>> --clean-metadata --force-clean from the machine itself. >>>>>>> >>>>>> >>>>>> Simone? >>>>>> >>>>>> >>>>>>> >>>>>>> Think that's about it. >>>>>>> >>>>>>> Don?t get me wrong, I don?t want to rant, I just wanted to share my >>>>>>> experience and see where things can made better. >>>>>>> >>>>>> >>>>>> If not already done, can you please open bugs for above issues at >>>>>> https://bugzilla.redhat.com/enter_bug.cgi?classification=oVirt ? >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> Best Olaf >>>>>>> _______________________________________________ >>>>>>> Users mailing list -- users at ovirt.org >>>>>>> To unsubscribe send an email to users-leave at ovirt.org >>>>>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >>>>>>> oVirt Code of Conduct: >>>>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>>>> List Archives: >>>>>>> https://lists.ovirt.org/archives/list/users at ovirt.org/message/3CO35Q7VZMWNHS4LPUJNO7S47MGLSKS5/ >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> SANDRO BONAZZOLA >>>>>> >>>>>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV >>>>>> >>>>>> Red Hat EMEA >>>>>> >>>>>> sbonazzo at redhat.com >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> Users mailing list -- users at ovirt.org >>>>> To unsubscribe send an email to users-leave at ovirt.org >>>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >>>>> oVirt Code of Conduct: >>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>> List Archives: >>>>> https://lists.ovirt.org/archives/list/users at ovirt.org/message/HAGTA64LF7LLE6YMHQ6DLT26MD2GZ2PK/ >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.buitelaar at gmail.com Wed Apr 3 17:05:49 2019 From: olaf.buitelaar at gmail.com (Olaf Buitelaar) Date: Wed, 3 Apr 2019 19:05:49 +0200 Subject: [Gluster-users] [ovirt-users] Re: Announcing Gluster release 5.5 In-Reply-To: References: <20190328164716.27693.35887@mail.ovirt.org> Message-ID: Dear Mohit, Thanks for backporting this issue. Hopefully we can address the others as well, if i can do anything let me know. On my side i've tested with: gluster volume reset cluster.choose-local, but haven't noticed really a change in performance. On the good side, the brick processes didn't crash with updating this config. I'll experiment with the other changes as well, and see how the combinations affect performance. 
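
One way to keep that experiment comparable across option combinations is to script it. A rough sketch only, where the volume name (ovirt-kube), the mount point and the dd workload are placeholders:

  for val in on off; do
      gluster volume set ovirt-kube cluster.choose-local $val
      gluster volume profile ovirt-kube start
      # fixed, direct-I/O workload so the runs are comparable
      dd if=/dev/zero of=/mnt/ovirt-kube/ddtest bs=1M count=1024 oflag=direct conv=fsync
      gluster volume profile ovirt-kube info > profile-choose-local-$val.txt
      gluster volume profile ovirt-kube stop
      rm -f /mnt/ovirt-kube/ddtest
  done

Comparing the resulting profile files (FSYNC and WRITE latencies in particular) gives a more objective signal than overall feel, especially when several options are being toggled at once.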
I also saw this commit; https://review.gluster.org/#/c/glusterfs/+/21333/ which looks very useful, will this be an recommended option for VM/block workloads? Thanks Olaf Op wo 3 apr. 2019 om 17:56 schreef Mohit Agrawal : > > Hi, > > Thanks Olaf for sharing the relevant logs. > > @Atin, > You are right patch https://review.gluster.org/#/c/glusterfs/+/22344/ > will resolve the issue running multiple brick instance for same brick. > > As we can see in below logs glusterd is trying to start the same brick > instance twice at the same time > > [2019-04-01 10:23:21.752401] I > [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh > brick process for brick /data/gfs/bricks/brick1/ovirt-engine > [2019-04-01 10:23:30.348091] I > [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh > brick process for brick /data/gfs/bricks/brick1/ovirt-engine > [2019-04-01 10:24:13.353396] I > [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh > brick process for brick /data/gfs/bricks/brick1/ovirt-engine > [2019-04-01 10:24:24.253764] I > [glusterd-utils.c:6301:glusterd_brick_start] 0-management: starting a fresh > brick process for brick /data/gfs/bricks/brick1/ovirt-engine > > We are seeing below message between starting of two instances > The message "E [MSGID: 101012] [common-utils.c:4075:gf_is_service_running] > 0-: Unable to read pidfile: > /var/run/gluster/vols/ovirt-engine/10.32.9.5-data-gfs-bricks-brick1-ovirt-engine.pid" > repeated 2 times between [2019-04-01 10:23:21.748492] and [2019-04-01 > 10:23:21.752432] > > I will backport the same. > Thanks, > Mohit Agrawal > > On Wed, Apr 3, 2019 at 3:58 PM Olaf Buitelaar > wrote: > >> Dear Mohit, >> >> Sorry i thought Krutika was referring to the ovirt-kube brick logs. due >> the large size (18MB compressed), i've placed the files here; >> https://edgecastcdn.net/0004FA/files/bricklogs.tar.bz2 >> Also i see i've attached the wrong files, i intended to >> attach profile_data4.txt | profile_data3.txt >> Sorry for the confusion. >> >> Thanks Olaf >> >> Op wo 3 apr. 2019 om 04:56 schreef Mohit Agrawal : >> >>> Hi Olaf, >>> >>> As per current attached "multi-glusterfsd-vol3.txt | >>> multi-glusterfsd-vol4.txt" it is showing multiple processes are running >>> for "ovirt-core ovirt-engine" brick names but there are no logs >>> available in bricklogs.zip specific to this bricks, bricklogs.zip >>> has a dump of ovirt-kube logs only >>> >>> Kindly share brick logs specific to the bricks "ovirt-core >>> ovirt-engine" and share glusterd logs also. >>> >>> Regards >>> Mohit Agrawal >>> >>> On Tue, Apr 2, 2019 at 9:18 PM Olaf Buitelaar >>> wrote: >>> >>>> Dear Krutika, >>>> >>>> 1. >>>> I've changed the volume settings, write performance seems to increased >>>> somewhat, however the profile doesn't really support that since latencies >>>> increased. However read performance has diminished, which does seem to be >>>> supported by the profile runs (attached). >>>> Also the IO does seem to behave more consistent than before. >>>> I don't really understand the idea behind them, maybe you can explain >>>> why these suggestions are good? >>>> These settings seems to avoid as much local caching and access as >>>> possible and push everything to the gluster processes. While i would expect >>>> local access and local caches are a good thing, since it would lead to >>>> having less network access or disk access. 
>>>> I tried to investigate these settings a bit more, and this is what i >>>> understood of them; >>>> - network.remote-dio; when on it seems to ignore the O_DIRECT flag in >>>> the client, thus causing the files to be cached and buffered in the page >>>> cache on the client, i would expect this to be a good thing especially if >>>> the server process would access the same page cache? >>>> At least that is what grasp from this commit; >>>> https://review.gluster.org/#/c/glusterfs/+/4206/2/xlators/protocol/client/src/client.c line >>>> 867 >>>> Also found this commit; >>>> https://github.com/gluster/glusterfs/commit/06c4ba589102bf92c58cd9fba5c60064bc7a504e#diff-938709e499b4383c3ed33c3979b9080c suggesting >>>> remote-dio actually improves performance, not sure it's a write or read >>>> benchmark >>>> When a file is opened with O_DIRECT it will also disable the >>>> write-behind functionality >>>> >>>> - performance.strict-o-direct: when on, the AFR, will not ignore the >>>> O_DIRECT flag. and will invoke: fop_writev_stub with the wb_writev_helper, >>>> which seems to stack the operation, no idea why that is. But generally i >>>> suppose not ignoring the O_DIRECT flag in the AFR is a good thing, when a >>>> processes requests to have O_DIRECT. So this makes sense to me. >>>> >>>> - cluster.choose-local: when off, it doesn't prefer the local node, but >>>> would always choose a brick. Since it's a 9 node cluster, with 3 >>>> subvolumes, only a 1/3 could end-up local, and the other 2/3 should be >>>> pushed to external nodes anyway. Or am I making the total wrong assumption >>>> here? >>>> >>>> It seems to this config is moving to the gluster-block config side of >>>> things, which does make sense. >>>> Since we're running quite some mysql instances, which opens the files >>>> with O_DIRECt i believe, it would mean the only layer of cache is within >>>> mysql it self. Which you could argue is a good thing. But i would expect a >>>> little of write-behind buffer, and maybe some of the data cached within >>>> gluster would alleviate things a bit on gluster's side. But i wouldn't know >>>> if that's the correct mind set, and so might be totally off here. >>>> Also i would expect these gluster v set command to be online >>>> operations, but somehow the bricks went down, after applying these changes. >>>> What appears to have happened is that after the update the brick process >>>> was restarted, but due to multiple brick process start issue, multiple >>>> processes were started, and the brick didn't came online again. >>>> However i'll try to reproduce this, since i would like to test with >>>> cluster.choose-local: on, and see how performance compares. And hopefully >>>> when it occurs collect some useful info. >>>> Question; are network.remote-dio and performance.strict-o-direct >>>> mutually exclusive settings, or can they both be on? >>>> >>>> 2. 
I've attached all brick logs, the only thing relevant i found was; >>>> [2019-03-28 20:20:07.170452] I [MSGID: 113030] >>>> [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: >>>> open-fd-key-status: 0 for >>>> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 >>>> [2019-03-28 20:20:07.170491] I [MSGID: 113031] >>>> [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr >>>> status: 0 for >>>> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 >>>> [2019-03-28 20:20:07.248480] I [MSGID: 113030] >>>> [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: >>>> open-fd-key-status: 0 for >>>> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 >>>> [2019-03-28 20:20:07.248491] I [MSGID: 113031] >>>> [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr >>>> status: 0 for >>>> /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 >>>> >>>> Thanks Olaf >>>> >>>> ps. sorry needed to resend since it exceed the file limit >>>> >>>> Op ma 1 apr. 2019 om 07:56 schreef Krutika Dhananjay < >>>> kdhananj at redhat.com>: >>>> >>>>> Adding back gluster-users >>>>> Comments inline ... >>>>> >>>>> On Fri, Mar 29, 2019 at 8:11 PM Olaf Buitelaar < >>>>> olaf.buitelaar at gmail.com> wrote: >>>>> >>>>>> Dear Krutika, >>>>>> >>>>>> >>>>>> >>>>>> 1. I?ve made 2 profile runs of around 10 minutes (see files >>>>>> profile_data.txt and profile_data2.txt). Looking at it, most time seems be >>>>>> spent at the fop?s fsync and readdirp. >>>>>> >>>>>> Unfortunate I don?t have the profile info for the 3.12.15 version so >>>>>> it?s a bit hard to compare. >>>>>> >>>>>> One additional thing I do notice on 1 machine (10.32.9.5) the iowait >>>>>> time increased a lot, from an average below the 1% it?s now around the 12% >>>>>> after the upgrade. >>>>>> >>>>>> So first suspicion with be lighting strikes twice, and I?ve also just >>>>>> now a bad disk, but that doesn?t appear to be the case, since all smart >>>>>> status report ok. >>>>>> >>>>>> Also dd shows performance I would more or less expect; >>>>>> >>>>>> dd if=/dev/zero of=/data/test_file bs=100M count=1 oflag=dsync >>>>>> >>>>>> 1+0 records in >>>>>> >>>>>> 1+0 records out >>>>>> >>>>>> 104857600 bytes (105 MB) copied, 0.686088 s, 153 MB/s >>>>>> >>>>>> dd if=/dev/zero of=/data/test_file bs=1G count=1 oflag=dsync >>>>>> >>>>>> 1+0 records in >>>>>> >>>>>> 1+0 records out >>>>>> >>>>>> 1073741824 bytes (1.1 GB) copied, 7.61138 s, 141 MB/s >>>>>> >>>>>> if=/dev/urandom of=/data/test_file bs=1024 count=1000000 >>>>>> >>>>>> 1000000+0 records in >>>>>> >>>>>> 1000000+0 records out >>>>>> >>>>>> 1024000000 bytes (1.0 GB) copied, 6.35051 s, 161 MB/s >>>>>> >>>>>> dd if=/dev/zero of=/data/test_file bs=1024 count=1000000 >>>>>> >>>>>> 1000000+0 records in >>>>>> >>>>>> 1000000+0 records out >>>>>> >>>>>> 1024000000 bytes (1.0 GB) copied, 1.6899 s, 606 MB/s >>>>>> >>>>>> When I disable this brick (service glusterd stop; pkill glusterfsd) >>>>>> performance in gluster is better, but not on par with what it was. Also the >>>>>> cpu usages on the ?neighbor? nodes which hosts the other bricks in the same >>>>>> subvolume increases quite a lot in this case, which I wouldn?t expect >>>>>> actually since they shouldn't handle much more work, except flagging shards >>>>>> to heal. Iowait also goes to idle once gluster is stopped, so it?s for >>>>>> sure gluster which waits for io. 
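>>>>>> For completeness, such a 10 minute profile run can be captured
>>>>>> roughly like this (a sketch only, with ovirt-kube again standing in
>>>>>> for the real volume name):
>>>>>>
>>>>>> gluster volume profile ovirt-kube start
>>>>>> # let the normal workload run for ~10 minutes while stats accumulate
>>>>>> sleep 600
>>>>>> # dump the cumulative per-fop latencies to a file and stop profiling
>>>>>> gluster volume profile ovirt-kube info > profile_data.txt
>>>>>> gluster volume profile ovirt-kube stop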
>>>>>> >>>>>> >>>>>> >>>>> >>>>> So I see that FSYNC %-latency is on the higher side. And I also >>>>> noticed you don't have direct-io options enabled on the volume. >>>>> Could you set the following options on the volume - >>>>> # gluster volume set network.remote-dio off >>>>> # gluster volume set performance.strict-o-direct on >>>>> and also disable choose-local >>>>> # gluster volume set cluster.choose-local off >>>>> >>>>> let me know if this helps. >>>>> >>>>> 2. I?ve attached the mnt log and volume info, but I couldn?t find >>>>>> anything relevant in in those logs. I think this is because we run the VM?s >>>>>> with libgfapi; >>>>>> >>>>>> [root at ovirt-host-01 ~]# engine-config -g LibgfApiSupported >>>>>> >>>>>> LibgfApiSupported: true version: 4.2 >>>>>> >>>>>> LibgfApiSupported: true version: 4.1 >>>>>> >>>>>> LibgfApiSupported: true version: 4.3 >>>>>> >>>>>> And I can confirm the qemu process is invoked with the gluster:// >>>>>> address for the images. >>>>>> >>>>>> The message is logged in the /var/lib/libvert/qemu/ file, >>>>>> which I?ve also included. For a sample case see around; 2019-03-28 20:20:07 >>>>>> >>>>>> Which has the error; E [MSGID: 133010] >>>>>> [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on >>>>>> shard 109886 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c >>>>>> [Stale file handle] >>>>>> >>>>> >>>>> Could you also attach the brick logs for this volume? >>>>> >>>>> >>>>>> >>>>>> 3. yes I see multiple instances for the same brick directory, like; >>>>>> >>>>>> /usr/sbin/glusterfsd -s 10.32.9.6 --volfile-id >>>>>> ovirt-core.10.32.9.6.data-gfs-bricks-brick1-ovirt-core -p >>>>>> /var/run/gluster/vols/ovirt-core/10.32.9.6-data-gfs-bricks-brick1-ovirt-core.pid >>>>>> -S /var/run/gluster/452591c9165945d9.socket --brick-name >>>>>> /data/gfs/bricks/brick1/ovirt-core -l >>>>>> /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log >>>>>> --xlator-option *-posix.glusterd-uuid=fb513da6-f3bd-4571-b8a2-db5efaf60cc1 >>>>>> --process-name brick --brick-port 49154 --xlator-option >>>>>> ovirt-core-server.listen-port=49154 >>>>>> >>>>>> >>>>>> >>>>>> I?ve made an export of the output of ps from the time I observed >>>>>> these multiple processes. >>>>>> >>>>>> In addition the brick_mux bug as noted by Atin. I might also have >>>>>> another possible cause, as ovirt moves nodes from none-operational state or >>>>>> maintenance state to active/activating, it also seems to restart gluster, >>>>>> however I don?t have direct proof for this theory. >>>>>> >>>>>> >>>>>> >>>>> >>>>> +Atin Mukherjee ^^ >>>>> +Mohit Agrawal ^^ >>>>> >>>>> -Krutika >>>>> >>>>> Thanks Olaf >>>>>> >>>>>> Op vr 29 mrt. 2019 om 10:03 schreef Sandro Bonazzola < >>>>>> sbonazzo at redhat.com>: >>>>>> >>>>>>> >>>>>>> >>>>>>> Il giorno gio 28 mar 2019 alle ore 17:48 >>>>>>> ha scritto: >>>>>>> >>>>>>>> Dear All, >>>>>>>> >>>>>>>> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. >>>>>>>> While previous upgrades from 4.1 to 4.2 etc. went rather smooth, this one >>>>>>>> was a different experience. After first trying a test upgrade on a 3 node >>>>>>>> setup, which went fine. i headed to upgrade the 9 node production platform, >>>>>>>> unaware of the backward compatibility issues between gluster 3.12.15 -> >>>>>>>> 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start. >>>>>>>> Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata >>>>>>>> was missing or couldn't be accessed. 
Restoring this file by getting a good >>>>>>>> copy of the underlying bricks, removing the file from the underlying bricks >>>>>>>> where the file was 0 bytes and mark with the stickybit, and the >>>>>>>> corresponding gfid's. Removing the file from the mount point, and copying >>>>>>>> back the file on the mount point. Manually mounting the engine domain, and >>>>>>>> manually creating the corresponding symbolic links in /rhev/data-center and >>>>>>>> /var/run/vdsm/storage and fixing the ownership back to vdsm.kvm (which was >>>>>>>> root.root), i was able to start the HA engine again. Since the engine was >>>>>>>> up again, and things seemed rather unstable i decided to continue the >>>>>>>> upgrade on the other nodes suspecting an incompatibility in gluster >>>>>>>> versions, i thought would be best to have them all on the same version >>>>>>>> rather soonish. However things went from bad to worse, the engine stopped >>>>>>>> again, and all vm?s stopped working as well. So on a machine outside the >>>>>>>> setup and restored a backup of the engine taken from version 4.2.8 just >>>>>>>> before the upgrade. With this engine I was at least able to start some vm?s >>>>>>>> again, and finalize the upgrade. Once the upgraded, things didn?t stabilize >>>>>>>> and also lose 2 vm?s during the process due to image corruption. After >>>>>>>> figuring out gluster 5.3 had quite some issues I was as lucky to see >>>>>>>> gluster 5.5 was about to be released, on the moment the RPM?s were >>>>>>>> available I?ve installed those. This helped a lot in terms of stability, >>>>>>>> for which I?m very grateful! However the performance is unfortunate >>>>>>>> terrible, it?s about 15% of what the performance was running gluster >>>>>>>> 3.12.15. It?s strange since a simple dd shows ok performance, but our >>>>>>>> actual workload doesn?t. While I would expect the performance to be better, >>>>>>>> due to all improvements made since gluster version 3.12. Does anybody share >>>>>>>> the same experience? >>>>>>>> I really hope gluster 6 will soon be tested with ovirt and >>>>>>>> released, and things start to perform and stabilize again..like the good >>>>>>>> old days. Of course when I can do anything, I?m happy to help. >>>>>>>> >>>>>>> >>>>>>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1693998 to track >>>>>>> the rebase on Gluster 6. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> I think the following short list of issues we have after the >>>>>>>> migration; >>>>>>>> Gluster 5.5; >>>>>>>> - Poor performance for our workload (mostly write dependent) >>>>>>>> - VM?s randomly pause on unknown storage errors, which are >>>>>>>> ?stale file?s?. corresponding log; Lookup on shard 797 failed. Base file >>>>>>>> gfid = 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle] >>>>>>>> - Some files are listed twice in a directory (probably >>>>>>>> related the stale file issue?) >>>>>>>> Example; >>>>>>>> ls -la >>>>>>>> /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/ >>>>>>>> total 3081 >>>>>>>> drwxr-x---. 2 vdsm kvm 4096 Mar 18 11:34 . >>>>>>>> drwxr-xr-x. 13 vdsm kvm 4096 Mar 19 09:42 .. >>>>>>>> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 >>>>>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c >>>>>>>> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 >>>>>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c >>>>>>>> -rw-rw----. 1 vdsm kvm 1048576 Jan 27 2018 >>>>>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease >>>>>>>> -rw-r--r--. 
1 vdsm kvm 290 Jan 27 2018 >>>>>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta >>>>>>>> -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 >>>>>>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta >>>>>>>> >>>>>>>> - brick processes sometimes starts multiple times. Sometimes I?ve 5 >>>>>>>> brick processes for a single volume. Killing all glusterfsd?s for the >>>>>>>> volume on the machine and running gluster v start force usually just >>>>>>>> starts one after the event, from then on things look all right. >>>>>>>> >>>>>>>> >>>>>>> May I kindly ask to open bugs on Gluster for above issues at >>>>>>> https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS ? >>>>>>> Sahina? >>>>>>> >>>>>>> >>>>>>>> Ovirt 4.3.2.1-1.el7 >>>>>>>> - All vms images ownership are changed to root.root after the >>>>>>>> vm is shutdown, probably related to; >>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1666795 but not only >>>>>>>> scoped to the HA engine. I?m still in compatibility mode 4.2 for the >>>>>>>> cluster and for the vm?s, but upgraded to version ovirt 4.3.2 >>>>>>>> >>>>>>> >>>>>>> Ryan? >>>>>>> >>>>>>> >>>>>>>> - The network provider is set to ovn, which is fine..actually >>>>>>>> cool, only the ?ovs-vswitchd? is a CPU hog, and utilizes 100% >>>>>>>> >>>>>>> >>>>>>> Miguel? Dominik? >>>>>>> >>>>>>> >>>>>>>> - It seems on all nodes vdsm tries to get the the stats for >>>>>>>> the HA engine, which is filling the logs with (not sure if this is new); >>>>>>>> [api.virt] FINISH getStats return={'status': {'message': "Virtual >>>>>>>> machine does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}", >>>>>>>> 'code': 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9 >>>>>>>> (api:54) >>>>>>>> >>>>>>> >>>>>>> Simone? >>>>>>> >>>>>>> >>>>>>>> - It seems the package os_brick [root] managedvolume not >>>>>>>> supported: Managed Volume Not Supported. Missing package os-brick.: >>>>>>>> ('Cannot import os_brick',) (caps:149) which fills the vdsm.log, but for >>>>>>>> this I also saw another message, so I suspect this will already be resolved >>>>>>>> shortly >>>>>>>> - The machine I used to run the backup HA engine, doesn?t >>>>>>>> want to get removed from the hosted-engine ?vm-status, not even after >>>>>>>> running; hosted-engine --clean-metadata --host-id=10 --force-clean or >>>>>>>> hosted-engine --clean-metadata --force-clean from the machine itself. >>>>>>>> >>>>>>> >>>>>>> Simone? >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Think that's about it. >>>>>>>> >>>>>>>> Don?t get me wrong, I don?t want to rant, I just wanted to share my >>>>>>>> experience and see where things can made better. >>>>>>>> >>>>>>> >>>>>>> If not already done, can you please open bugs for above issues at >>>>>>> https://bugzilla.redhat.com/enter_bug.cgi?classification=oVirt ? 
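>>>>>>> As a side note on the multiple brick process issue above: a rough
>>>>>>> manual clean-up along the lines already described (kill the duplicate
>>>>>>> glusterfsd's, then force-start the volume) could look like this, with
>>>>>>> ovirt-core only as an example volume name:
>>>>>>>
>>>>>>> # list glusterfsd processes serving bricks of this volume
>>>>>>> pgrep -af 'glusterfsd.*ovirt-core'
>>>>>>> # if the same brick shows up more than once, kill them all on that node
>>>>>>> pkill -f 'glusterfsd.*ovirt-core'
>>>>>>> # then let glusterd bring up a single clean brick process again
>>>>>>> gluster volume start ovirt-core force
>>>>>>> gluster volume status ovirt-core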
>>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Best Olaf >>>>>>>> _______________________________________________ >>>>>>>> Users mailing list -- users at ovirt.org >>>>>>>> To unsubscribe send an email to users-leave at ovirt.org >>>>>>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >>>>>>>> oVirt Code of Conduct: >>>>>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>>>>> List Archives: >>>>>>>> https://lists.ovirt.org/archives/list/users at ovirt.org/message/3CO35Q7VZMWNHS4LPUJNO7S47MGLSKS5/ >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> SANDRO BONAZZOLA >>>>>>> >>>>>>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV >>>>>>> >>>>>>> Red Hat EMEA >>>>>>> >>>>>>> sbonazzo at redhat.com >>>>>>> >>>>>>> >>>>>> _______________________________________________ >>>>>> Users mailing list -- users at ovirt.org >>>>>> To unsubscribe send an email to users-leave at ovirt.org >>>>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >>>>>> oVirt Code of Conduct: >>>>>> https://www.ovirt.org/community/about/community-guidelines/ >>>>>> List Archives: >>>>>> https://lists.ovirt.org/archives/list/users at ovirt.org/message/HAGTA64LF7LLE6YMHQ6DLT26MD2GZ2PK/ >>>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From budic at onholyground.com Wed Apr 3 20:37:48 2019 From: budic at onholyground.com (Darrell Budic) Date: Wed, 3 Apr 2019 15:37:48 -0500 Subject: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6 In-Reply-To: References: Message-ID: <13CA0DC5-C248-40B8-B2D4-E6664812303A@onholyground.com> Hari- I was upgrading my test cluster from 5.5 to 6 and I hit this bug (https://bugzilla.redhat.com/show_bug.cgi?id=1694010 ) or something similar. In my case, the workaround did not work, and I was left with a gluster that had gone into no-quorum mode and stopped all the bricks. Wasn?t much in the logs either, but I noticed my /etc/glusterfs/glusterd.vol files were not the same as the newer versions, so I updated them, restarted glusterd, and suddenly the updated node showed as peer-in-cluster again. Once I updated other notes the same way, things started working again. Maybe a place to look? My old config (all nodes): volume management type mgmt/glusterd option working-directory /var/lib/glusterd option transport-type socket option transport.socket.keepalive-time 10 option transport.socket.keepalive-interval 2 option transport.socket.read-fail-log off option ping-timeout 10 option event-threads 1 option rpc-auth-allow-insecure on # option transport.address-family inet6 # option base-port 49152 end-volume changed to: volume management type mgmt/glusterd option working-directory /var/lib/glusterd option transport-type socket,rdma option transport.socket.keepalive-time 10 option transport.socket.keepalive-interval 2 option transport.socket.read-fail-log off option transport.socket.listen-port 24007 option transport.rdma.listen-port 24008 option ping-timeout 0 option event-threads 1 option rpc-auth-allow-insecure on # option lock-timer 180 # option transport.address-family inet6 # option base-port 49152 option max-port 60999 end-volume the only thing I found in the glusterd logs that looks relevant was (repeated for both of the other nodes in this cluster), so no clue why it happened: [2019-04-03 20:19:16.802638] I [MSGID: 106004] [glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer (<0ecbf953-681b-448f-9746-d1c1fe7a0978>), in state , has disconnected from glusterd. 
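For reference, the per-node sequence after editing /etc/glusterfs/glusterd.vol boils down to something like the following (just a sketch, adjust as needed):

systemctl restart glusterd
# every peer should report "State: Peer in Cluster (Connected)"
gluster peer status
# bricks should come back online once quorum is regained
gluster volume status
# glusterd should be listening on 24007
ss -tlnp | grep glusterd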
> On Apr 2, 2019, at 4:53 AM, Atin Mukherjee wrote: > > > > On Mon, 1 Apr 2019 at 10:28, Hari Gowtham > wrote: > Comments inline. > > On Mon, Apr 1, 2019 at 5:55 AM Sankarshan Mukhopadhyay > > wrote: > > > > Quite a considerable amount of detail here. Thank you! > > > > On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham > wrote: > > > > > > Hello Gluster users, > > > > > > As you all aware that glusterfs-6 is out, we would like to inform you > > > that, we have spent a significant amount of time in testing > > > glusterfs-6 in upgrade scenarios. We have done upgrade testing to > > > glusterfs-6 from various releases like 3.12, 4.1 and 5.3. > > > > > > As glusterfs-6 has got in a lot of changes, we wanted to test those portions. > > > There were xlators (and respective options to enable/disable them) > > > added and deprecated in glusterfs-6 from various versions [1]. > > > > > > We had to check the following upgrade scenarios for all such options > > > Identified in [1]: > > > 1) option never enabled and upgraded > > > 2) option enabled and then upgraded > > > 3) option enabled and then disabled and then upgraded > > > > > > We weren't manually able to check all the combinations for all the options. > > > So the options involving enabling and disabling xlators were prioritized. > > > The below are the result of the ones tested. > > > > > > Never enabled and upgraded: > > > checked from 3.12, 4.1, 5.3 to 6 the upgrade works. > > > > > > Enabled and upgraded: > > > Tested for tier which is deprecated, It is not a recommended upgrade. > > > As expected the volume won't be consumable and will have a few more > > > issues as well. > > > Tested with 3.12, 4.1 and 5.3 to 6 upgrade. > > > > > > Enabled, disabled before upgrade. > > > Tested for tier with 3.12 and the upgrade went fine. > > > > > > There is one common issue to note in every upgrade. The node being > > > upgraded is going into disconnected state. You have to flush the iptables > > > and the restart glusterd on all nodes to fix this. > > > > > > > Is this something that is written in the upgrade notes? I do not seem > > to recall, if not, I'll send a PR > > No this wasn't mentioned in the release notes. PRs are welcome. > > > > > > The testing for enabling new options is still pending. The new options > > > won't cause as much issues as the deprecated ones so this was put at > > > the end of the priority list. It would be nice to get contributions > > > for this. > > > > > > > Did the range of tests lead to any new issues? > > Yes. In the first round of testing we found an issue and had to postpone the > release of 6 until the fix was made available. > https://bugzilla.redhat.com/show_bug.cgi?id=1684029 > > And then we tested it again after this patch was made available. > and came across this: > https://bugzilla.redhat.com/show_bug.cgi?id=1694010 > > This isn?t a bug as we found that upgrade worked seamelessly in two different setup. So we have no issues in the upgrade path to glusterfs-6 release. > > > > Have mentioned this in the second mail as to how to over this situation > for now until the fix is available. > > > > > > For the disable testing, tier was used as it covers most of the xlator > > > that was removed. And all of these tests were done on a replica 3 volume. > > > > > > > I'm not sure if the Glusto team is reading this, but it would be > > pertinent to understand if the approach you have taken can be > > converted into a form of automated testing pre-release. > > I don't have an answer for this, have CCed Vijay. 
> He might have an idea. > > > > > > Note: This is only for upgrade testing of the newly added and removed > > > xlators. Does not involve the normal tests for the xlator. > > > > > > If you have any questions, please feel free to reach us. > > > > > > [1] https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing > > > > > > Regards, > > > Hari and Sanju. > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Regards, > Hari Gowtham. > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -- > --Atin > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From meira at cesup.ufrgs.br Wed Apr 3 22:06:47 2019 From: meira at cesup.ufrgs.br (Lindolfo Meira) Date: Wed, 3 Apr 2019 19:06:47 -0300 (-03) Subject: [Gluster-users] Enabling quotas on gluster Message-ID: Hi folks. Does anyone know how significant is the performance penalty for enabling directory level quotas on a gluster fs, compared to the case with no quotas at all? Lindolfo Meira, MSc Diretor Geral, Centro Nacional de Supercomputa??o Universidade Federal do Rio Grande do Sul +55 (51) 3308-3139 From hunter86_bg at yahoo.com Thu Apr 4 07:11:09 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Thu, 04 Apr 2019 10:11:09 +0300 Subject: [Gluster-users] Gluster 5.5 slower than 3.12.15 Message-ID: Hi Amar, I would like to test Cluster v6 , but as I'm quite new to oVirt - I'm not sure if oVirt <-> Gluster will communicate properly Did anyone test rollback from v6 to v5.5 ? If rollback is possible - I would be happy to give it a try. Best Regards, Strahil NikolovOn Apr 3, 2019 11:35, Amar Tumballi Suryanarayan wrote: > > Strahil, > > With some basic testing, we are noticing the similar behavior too. > > One of the issue we identified was increased n/w usage in 5.x series (being addressed by?https://review.gluster.org/#/c/glusterfs/+/22404/), and there are few other features which write extended attributes which caused some delay. > > We are in the process of publishing some numbers with release-3.12.x, release-5 and release-6 comparison soon. With some numbers we are already seeing release-6 currently is giving really good performance in many configurations, specially for 1x3 replicate volume type. > > While we continue to identify and fix issues in 5.x series, one of the request is to validate release-6.x (6.0 or 6.1 which would happen on April 10th), so you can see the difference in your workload. > > Regards, > Amar > > > > On Wed, Apr 3, 2019 at 5:57 AM Strahil Nikolov wrote: >> >> Hi Community, >> >> I have the feeling that with gluster v5.5 I have poorer performance than it used to be on 3.12.15. Did you observe something like that? >> >> I have a 3 node Hyperconverged Cluster (ovirt + glusterfs with replica 3 arbiter1 volumes) with NFS Ganesha and since I have upgraded to v5 - the issues came up. >> First it was 5.3 notorious experience and now with 5.5 - my sanlock is having problems and higher latency than it used to be. I have switched from NFS-Ganesha to pure FUSE , but the latency problems do not go away. 
>> >> Of course , this is partially due to the consumer hardware, but as the hardware has not changed I was hoping that the performance will remain as is. >> >> So, do you expect 5.5 to perform less than 3.12 ? >> >> Some info: >> Volume Name: engine >> Type: Replicate >> Volume ID: 30ca1cc2-f2f7-4749-9e2e-cee9d7099ded >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 x (2 + 1) = 3 >> Transport-type: tcp >> Bricks: >> Brick1: ovirt1:/gluster_bricks/engine/engine >> Brick2: ovirt2:/gluster_bricks/engine/engine >> Brick3: ovirt3:/gluster_bricks/engine/engine (arbiter) >> Options Reconfigured: >> performance.client-io-threads: off >> nfs.disable: on >> transport.address-family: inet >> performance.quick-read: off >> performance.read-ahead: off >> performance.io-cache: off >> performance.low-prio-threads: 32 >> network.remote-dio: off >> cluster.eager-lock: enable >> cluster.quorum-type: auto >> cluster.server-quorum-type: server >> cluster.data-self-heal-algorithm: full >> cluster.locking-scheme: granular >> cluster.shd-max-threads: 8 >> cluster.shd-wait-qlength: 10000 >> features.shard: on >> user.cifs: off >> storage.owner-uid: 36 >> storage.owner-gid: 36 >> network.ping-timeout: 30 >> performance.strict-o-direct: on >> cluster.granular-entry-heal: enable >> cluster.enable-shared-storage: enable >> >> Network: 1 gbit/s >> >> Filesystem:XFS >> >> Best Regards, >> Strahil Nikolov >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From srakonde at redhat.com Thu Apr 4 07:54:53 2019 From: srakonde at redhat.com (Sanju Rakonde) Date: Thu, 4 Apr 2019 13:24:53 +0530 Subject: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6 In-Reply-To: <13CA0DC5-C248-40B8-B2D4-E6664812303A@onholyground.com> References: <13CA0DC5-C248-40B8-B2D4-E6664812303A@onholyground.com> Message-ID: We don't hit https://bugzilla.redhat.com/show_bug.cgi?id=1694010 while upgrading to glusterfs-6. We tested it in different setups and understood that this issue is seen because of some issue in setup. regarding the issue you have faced, can you please let us know which documentation you have followed for the upgrade? During our testing, we didn't hit any such issue. we would like to understand what went wrong. On Thu, Apr 4, 2019 at 2:08 AM Darrell Budic wrote: > Hari- > > I was upgrading my test cluster from 5.5 to 6 and I hit this bug ( > https://bugzilla.redhat.com/show_bug.cgi?id=1694010) or something > similar. In my case, the workaround did not work, and I was left with a > gluster that had gone into no-quorum mode and stopped all the bricks. > Wasn?t much in the logs either, but I noticed my > /etc/glusterfs/glusterd.vol files were not the same as the newer versions, > so I updated them, restarted glusterd, and suddenly the updated node showed > as peer-in-cluster again. Once I updated other notes the same way, things > started working again. Maybe a place to look? 
> > My old config (all nodes): > volume management > type mgmt/glusterd > option working-directory /var/lib/glusterd > option transport-type socket > option transport.socket.keepalive-time 10 > option transport.socket.keepalive-interval 2 > option transport.socket.read-fail-log off > option ping-timeout 10 > option event-threads 1 > option rpc-auth-allow-insecure on > # option transport.address-family inet6 > # option base-port 49152 > end-volume > > changed to: > volume management > type mgmt/glusterd > option working-directory /var/lib/glusterd > option transport-type socket,rdma > option transport.socket.keepalive-time 10 > option transport.socket.keepalive-interval 2 > option transport.socket.read-fail-log off > option transport.socket.listen-port 24007 > option transport.rdma.listen-port 24008 > option ping-timeout 0 > option event-threads 1 > option rpc-auth-allow-insecure on > # option lock-timer 180 > # option transport.address-family inet6 > # option base-port 49152 > option max-port 60999 > end-volume > > the only thing I found in the glusterd logs that looks relevant was > (repeated for both of the other nodes in this cluster), so no clue why it > happened: > [2019-04-03 20:19:16.802638] I [MSGID: 106004] > [glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer > (<0ecbf953-681b-448f-9746-d1c1fe7a0978>), in state Cluster>, has disconnected from glusterd. > > > On Apr 2, 2019, at 4:53 AM, Atin Mukherjee > wrote: > > > > On Mon, 1 Apr 2019 at 10:28, Hari Gowtham wrote: > >> Comments inline. >> >> On Mon, Apr 1, 2019 at 5:55 AM Sankarshan Mukhopadhyay >> wrote: >> > >> > Quite a considerable amount of detail here. Thank you! >> > >> > On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham >> wrote: >> > > >> > > Hello Gluster users, >> > > >> > > As you all aware that glusterfs-6 is out, we would like to inform you >> > > that, we have spent a significant amount of time in testing >> > > glusterfs-6 in upgrade scenarios. We have done upgrade testing to >> > > glusterfs-6 from various releases like 3.12, 4.1 and 5.3. >> > > >> > > As glusterfs-6 has got in a lot of changes, we wanted to test those >> portions. >> > > There were xlators (and respective options to enable/disable them) >> > > added and deprecated in glusterfs-6 from various versions [1]. >> > > >> > > We had to check the following upgrade scenarios for all such options >> > > Identified in [1]: >> > > 1) option never enabled and upgraded >> > > 2) option enabled and then upgraded >> > > 3) option enabled and then disabled and then upgraded >> > > >> > > We weren't manually able to check all the combinations for all the >> options. >> > > So the options involving enabling and disabling xlators were >> prioritized. >> > > The below are the result of the ones tested. >> > > >> > > Never enabled and upgraded: >> > > checked from 3.12, 4.1, 5.3 to 6 the upgrade works. >> > > >> > > Enabled and upgraded: >> > > Tested for tier which is deprecated, It is not a recommended upgrade. >> > > As expected the volume won't be consumable and will have a few more >> > > issues as well. >> > > Tested with 3.12, 4.1 and 5.3 to 6 upgrade. >> > > >> > > Enabled, disabled before upgrade. >> > > Tested for tier with 3.12 and the upgrade went fine. >> > > >> > > There is one common issue to note in every upgrade. The node being >> > > upgraded is going into disconnected state. You have to flush the >> iptables >> > > and the restart glusterd on all nodes to fix this. 
>> > > >> > >> > Is this something that is written in the upgrade notes? I do not seem >> > to recall, if not, I'll send a PR >> >> No this wasn't mentioned in the release notes. PRs are welcome. >> >> > >> > > The testing for enabling new options is still pending. The new options >> > > won't cause as much issues as the deprecated ones so this was put at >> > > the end of the priority list. It would be nice to get contributions >> > > for this. >> > > >> > >> > Did the range of tests lead to any new issues? >> >> Yes. In the first round of testing we found an issue and had to postpone >> the >> release of 6 until the fix was made available. >> https://bugzilla.redhat.com/show_bug.cgi?id=1684029 >> >> And then we tested it again after this patch was made available. >> and came across this: >> https://bugzilla.redhat.com/show_bug.cgi?id=1694010 > > > This isn?t a bug as we found that upgrade worked seamelessly in two > different setup. So we have no issues in the upgrade path to glusterfs-6 > release. > > >> >> Have mentioned this in the second mail as to how to over this situation >> for now until the fix is available. >> >> > >> > > For the disable testing, tier was used as it covers most of the xlator >> > > that was removed. And all of these tests were done on a replica 3 >> volume. >> > > >> > >> > I'm not sure if the Glusto team is reading this, but it would be >> > pertinent to understand if the approach you have taken can be >> > converted into a form of automated testing pre-release. >> >> I don't have an answer for this, have CCed Vijay. >> He might have an idea. >> >> > >> > > Note: This is only for upgrade testing of the newly added and removed >> > > xlators. Does not involve the normal tests for the xlator. >> > > >> > > If you have any questions, please feel free to reach us. >> > > >> > > [1] >> https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing >> > > >> > > Regards, >> > > Hari and Sanju. >> > _______________________________________________ >> > Gluster-users mailing list >> > Gluster-users at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> Regards, >> Hari Gowtham. >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> > -- > --Atin > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Thanks, Sanju -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgowtham at redhat.com Thu Apr 4 09:41:19 2019 From: hgowtham at redhat.com (Hari Gowtham) Date: Thu, 4 Apr 2019 15:11:19 +0530 Subject: [Gluster-users] Enabling quotas on gluster In-Reply-To: References: Message-ID: Hi, The performance hit that quota causes depended on a number of factors like: 1) the number of files, 2) the depth of the directories in the FS 3) the breadth of the directories in the FS 4) the number of bricks. These are the main contributions to the performance hit. If the volume is of lesser size then quota should work fine. Let us know more about your use case to help you better. Note: gluster quota is not being actively worked on. 
On Thu, Apr 4, 2019 at 3:45 AM Lindolfo Meira wrote: > > Hi folks. > > Does anyone know how significant is the performance penalty for enabling > directory level quotas on a gluster fs, compared to the case with no > quotas at all? > > > Lindolfo Meira, MSc > Diretor Geral, Centro Nacional de Supercomputa??o > Universidade Federal do Rio Grande do Sul > +55 (51) 3308-3139_______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Regards, Hari Gowtham. From pascal.suter at dalco.ch Thu Apr 4 10:03:19 2019 From: pascal.suter at dalco.ch (Pascal Suter) Date: Thu, 4 Apr 2019 12:03:19 +0200 Subject: [Gluster-users] performance - what can I expect In-Reply-To: References: Message-ID: <381efa03-78b3-e244-9f52-054b357b5d57@dalco.ch> I just noticed i left the most important parameters out :) here's the write command with filesize and recordsize in it as well :) ./iozone -i 0 -t 1 -F /mnt/gluster/storage/thread1 -+n -c -C -e -I -w -+S 0 -s 200G -r 16384k also i ran the benchmark without direct_io which resulted in an even worse performance. i also tried to mount the gluster volume via nfs-ganesha which further reduced throughput down to about 450MB/s if i run the iozone benchmark with 3 threads writing to all three bricks directly (from the xfs filesystem) i get throughputs of around 6GB/s .. if I run the same benchmark through gluster mounted locally using the fuse client and with enough threads so that each brick gets at least one file written to it, i end up seing throughputs around 1.5GB/s .. that's a 4x decrease in performance. at it actually is the same if i run the benchmark with less threads and files only get written to two out of three bricks. cpu load on the server is around 25% by the way, nicely distributed across all available cores. i can't believe that gluster should really be so slow and everybody is just happily using it. any hints on what i'm doing wrong are very welcome. i'm using gluster 6.0 by the way. regards Pascal On 03.04.19 12:28, Pascal Suter wrote: > Hi all > > I am currently testing gluster on a single server. I have three > bricks, each a hardware RAID6 volume with thin provisioned LVM that > was aligned to the RAID and then formatted with xfs. > > i've created a distributed volume so that entire files get distributed > across my three bricks. > > first I ran a iozone benchmark across each brick testing the read and > write perofrmance of a single large file per brick > > i then mounted my gluster volume locally and ran another iozone run > with the same parameters writing a single file. the file went to brick > 1 which, when used driectly, would write with 2.3GB/s and read with > 1.5GB/s. however, through gluster i got only 800MB/s read and 750MB/s > write throughput > > another run with two processes each writing a file, where one file > went to the first brick and the other file to the second brick (which > by itself when directly accessed wrote at 2.8GB/s and read at 2.7GB/s) > resulted in 1.2GB/s of aggregated write and also aggregated read > throughput. > > Is this a normal performance i can expect out of a glusterfs or is it > worth tuning in order to really get closer to the actual brick > filesystem performance? > > here are the iozone commands i use for writing and reading.. 
note that > i am using directIO in order to make sure i don't get fooled by cache :) > > ./iozone -i 0 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0 > -s $filesize -r $recordsize > iozone-brick${b}-write.txt > > ./iozone -i 1 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0 > -s $filesize -r $recordsize > iozone-brick${b}-read.txt > > cheers > > Pascal > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From bandabasotti at gmail.com Thu Apr 4 10:15:01 2019 From: bandabasotti at gmail.com (banda bassotti) Date: Thu, 4 Apr 2019 12:15:01 +0200 Subject: [Gluster-users] thin arbiter setup Message-ID: Hi all, is there a detailed guide on how to configure a two-node cluster with a thin arbiter? I tried to follow the guide: https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/#setting-up-thin-arbiter-volume but it doesn't work. I'm using debian stretch and gluster 6 repository. thnx a lot. banda. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aspandey at redhat.com Thu Apr 4 11:37:50 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Thu, 4 Apr 2019 07:37:50 -0400 (EDT) Subject: [Gluster-users] thin arbiter setup In-Reply-To: References: Message-ID: <1634249416.10967979.1554377870313.JavaMail.zimbra@redhat.com> Hi, Currently, thin-arbiter can be setup using GD2. glustercli command is provided by GD2 only. Have you installed and started GD2 first? Could you please mention in which step you faced issue? --- Ashish ----- Original Message ----- From: "banda bassotti" To: gluster-users at gluster.org Sent: Thursday, April 4, 2019 3:45:01 PM Subject: [Gluster-users] thin arbiter setup Hi all, is there a detailed guide on how to configure a two-node cluster with a thin arbiter? I tried to follow the guide: https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/#setting-up-thin-arbiter-volume but it doesn't work. I'm using debian stretch and gluster 6 repository. thnx a lot. banda. _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Thu Apr 4 13:13:01 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Thu, 4 Apr 2019 13:13:01 +0000 (UTC) Subject: [Gluster-users] thin arbiter setup In-Reply-To: References: Message-ID: <1610020491.16495722.1554383581498@mail.yahoo.com> Hi Banda, As far as I know (mentioned here in the mail list) , you need to use GlusterD2 and not the standard one . Best Regards,Strahil Nikolov ? ?????????, 4 ????? 2019 ?., 13:19:51 ?. ???????+3, banda bassotti ??????: Hi all, is there a detailed guide on how to configure a two-node cluster with a thin arbiter? I tried to follow the guide:? https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/#setting-up-thin-arbiter-volume?? but it doesn't work.? I'm using debian stretch and gluster 6 repository. thnx a lot. banda._______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hunter86_bg at yahoo.com Thu Apr 4 13:26:56 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Thu, 4 Apr 2019 13:26:56 +0000 (UTC) Subject: [Gluster-users] thin arbiter setup In-Reply-To: <1634249416.10967979.1554377870313.JavaMail.zimbra@redhat.com> References: <1634249416.10967979.1554377870313.JavaMail.zimbra@redhat.com> Message-ID: <276872486.16505307.1554384416533@mail.yahoo.com> I have proposed a change in the Docs about thin arbiters, as it is quite deceptive. Best Regards,Strahil Nikolov ? ?????????, 4 ????? 2019 ?., 14:38:16 ?. ???????+3, Ashish Pandey ??????: Hi, Currently, thin-arbiter can be setup using GD2. glustercli command is provided by GD2 only. Have you installed and started GD2 first? Could you please mention in which step you faced issue? --- Ashish From: "banda bassotti" To: gluster-users at gluster.org Sent: Thursday, April 4, 2019 3:45:01 PM Subject: [Gluster-users] thin arbiter setup Hi all, is there a detailed guide on how to configure a two-node cluster with a thin arbiter? I tried to follow the guide:? https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/#setting-up-thin-arbiter-volume?? but it doesn't work.? I'm using debian stretch and gluster 6 repository. thnx a lot. banda. _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From budic at onholyground.com Thu Apr 4 15:56:33 2019 From: budic at onholyground.com (Darrell Budic) Date: Thu, 4 Apr 2019 10:56:33 -0500 Subject: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6 In-Reply-To: References: <13CA0DC5-C248-40B8-B2D4-E6664812303A@onholyground.com> Message-ID: <493E624F-FC40-4242-BF9D-BAD0385B2DA5@onholyground.com> I didn?t follow any specific documents, just a generic rolling upgrade one node at a time. Once the first node didn?t reconnect, I tried to follow the workaround in the bug during the upgrade. Basic procedure was: - take 3 nodes that were initially installed with 3.12.x (forget which, but low number) and had been upgraded directly to 5.5 from 3.12.15 - op-version was 50400 - on node A: - yum install centos-release-gluster6 - yum upgrade (was some ovirt cockpit components, gluster, and a lib or two this time), hit yes - discover glusterd was dead - systemctl restart glusterd - no peer connections, try iptables -F; systemctl restart glusterd, no change - following the workaround in the bug, try iptables -F & restart glusterd on other 2 nodes, no effect - nodes B & C were still connected to each other and all bricks were fine at this point - try upgrading other 2 nodes and restarting gluster, no effect (iptables still empty) - lost quota here, so all bricks went offline - read logs, not finding much, but looked at glusterd.vol and compared to new versions - updated glusterd.vol on A and restarted glusterd - A doesn?t show any connected peers, but both other nodes show A as connected - update glusterd.vol on B & C, restart glusterd - all nodes show connected and volumes are active and healing The only odd thing in my process was that node A did not have any active bricks on it at the time of the upgrade. 
It doesn?t seem like this mattered since B & C showed the same symptoms between themselves while being upgraded, but I don?t know. The only log entry that referenced anything about peer connections is included below already. Looks like it was related to my glusterd settings, since that?s what fixed it for me. Unfortunately, I don?t have the bandwidth or the systems to test different versions of that specifically, but maybe you guys can on some test resources? Otherwise, I?ve got another cluster (my production one!) that?s midway through the upgrade from 3.12.15 -> 5.5. I paused when I started getting multiple brick processes on the two nodes that had gone to 5.5 already. I think I?m going to jump the last node right to 6 to try and avoid that mess, and it has the same glusterd.vol settings. I?ll try and capture it?s logs during the upgrade and see if there?s any new info, or if it has the same issues as this group did. -Darrell > On Apr 4, 2019, at 2:54 AM, Sanju Rakonde wrote: > > We don't hit https://bugzilla.redhat.com/show_bug.cgi?id=1694010 while upgrading to glusterfs-6. We tested it in different setups and understood that this issue is seen because of some issue in setup. > > regarding the issue you have faced, can you please let us know which documentation you have followed for the upgrade? During our testing, we didn't hit any such issue. we would like to understand what went wrong. > > On Thu, Apr 4, 2019 at 2:08 AM Darrell Budic > wrote: > Hari- > > I was upgrading my test cluster from 5.5 to 6 and I hit this bug (https://bugzilla.redhat.com/show_bug.cgi?id=1694010 ) or something similar. In my case, the workaround did not work, and I was left with a gluster that had gone into no-quorum mode and stopped all the bricks. Wasn?t much in the logs either, but I noticed my /etc/glusterfs/glusterd.vol files were not the same as the newer versions, so I updated them, restarted glusterd, and suddenly the updated node showed as peer-in-cluster again. Once I updated other notes the same way, things started working again. Maybe a place to look? > > My old config (all nodes): > volume management > type mgmt/glusterd > option working-directory /var/lib/glusterd > option transport-type socket > option transport.socket.keepalive-time 10 > option transport.socket.keepalive-interval 2 > option transport.socket.read-fail-log off > option ping-timeout 10 > option event-threads 1 > option rpc-auth-allow-insecure on > # option transport.address-family inet6 > # option base-port 49152 > end-volume > > changed to: > volume management > type mgmt/glusterd > option working-directory /var/lib/glusterd > option transport-type socket,rdma > option transport.socket.keepalive-time 10 > option transport.socket.keepalive-interval 2 > option transport.socket.read-fail-log off > option transport.socket.listen-port 24007 > option transport.rdma.listen-port 24008 > option ping-timeout 0 > option event-threads 1 > option rpc-auth-allow-insecure on > # option lock-timer 180 > # option transport.address-family inet6 > # option base-port 49152 > option max-port 60999 > end-volume > > the only thing I found in the glusterd logs that looks relevant was (repeated for both of the other nodes in this cluster), so no clue why it happened: > [2019-04-03 20:19:16.802638] I [MSGID: 106004] [glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer (<0ecbf953-681b-448f-9746-d1c1fe7a0978>), in state , has disconnected from glusterd. 
> > >> On Apr 2, 2019, at 4:53 AM, Atin Mukherjee > wrote: >> >> >> >> On Mon, 1 Apr 2019 at 10:28, Hari Gowtham > wrote: >> Comments inline. >> >> On Mon, Apr 1, 2019 at 5:55 AM Sankarshan Mukhopadhyay >> > wrote: >> > >> > Quite a considerable amount of detail here. Thank you! >> > >> > On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham > wrote: >> > > >> > > Hello Gluster users, >> > > >> > > As you all aware that glusterfs-6 is out, we would like to inform you >> > > that, we have spent a significant amount of time in testing >> > > glusterfs-6 in upgrade scenarios. We have done upgrade testing to >> > > glusterfs-6 from various releases like 3.12, 4.1 and 5.3. >> > > >> > > As glusterfs-6 has got in a lot of changes, we wanted to test those portions. >> > > There were xlators (and respective options to enable/disable them) >> > > added and deprecated in glusterfs-6 from various versions [1]. >> > > >> > > We had to check the following upgrade scenarios for all such options >> > > Identified in [1]: >> > > 1) option never enabled and upgraded >> > > 2) option enabled and then upgraded >> > > 3) option enabled and then disabled and then upgraded >> > > >> > > We weren't manually able to check all the combinations for all the options. >> > > So the options involving enabling and disabling xlators were prioritized. >> > > The below are the result of the ones tested. >> > > >> > > Never enabled and upgraded: >> > > checked from 3.12, 4.1, 5.3 to 6 the upgrade works. >> > > >> > > Enabled and upgraded: >> > > Tested for tier which is deprecated, It is not a recommended upgrade. >> > > As expected the volume won't be consumable and will have a few more >> > > issues as well. >> > > Tested with 3.12, 4.1 and 5.3 to 6 upgrade. >> > > >> > > Enabled, disabled before upgrade. >> > > Tested for tier with 3.12 and the upgrade went fine. >> > > >> > > There is one common issue to note in every upgrade. The node being >> > > upgraded is going into disconnected state. You have to flush the iptables >> > > and the restart glusterd on all nodes to fix this. >> > > >> > >> > Is this something that is written in the upgrade notes? I do not seem >> > to recall, if not, I'll send a PR >> >> No this wasn't mentioned in the release notes. PRs are welcome. >> >> > >> > > The testing for enabling new options is still pending. The new options >> > > won't cause as much issues as the deprecated ones so this was put at >> > > the end of the priority list. It would be nice to get contributions >> > > for this. >> > > >> > >> > Did the range of tests lead to any new issues? >> >> Yes. In the first round of testing we found an issue and had to postpone the >> release of 6 until the fix was made available. >> https://bugzilla.redhat.com/show_bug.cgi?id=1684029 >> >> And then we tested it again after this patch was made available. >> and came across this: >> https://bugzilla.redhat.com/show_bug.cgi?id=1694010 >> >> This isn?t a bug as we found that upgrade worked seamelessly in two different setup. So we have no issues in the upgrade path to glusterfs-6 release. >> >> >> >> Have mentioned this in the second mail as to how to over this situation >> for now until the fix is available. >> >> > >> > > For the disable testing, tier was used as it covers most of the xlator >> > > that was removed. And all of these tests were done on a replica 3 volume. 
>> > > >> > >> > I'm not sure if the Glusto team is reading this, but it would be >> > pertinent to understand if the approach you have taken can be >> > converted into a form of automated testing pre-release. >> >> I don't have an answer for this, have CCed Vijay. >> He might have an idea. >> >> > >> > > Note: This is only for upgrade testing of the newly added and removed >> > > xlators. Does not involve the normal tests for the xlator. >> > > >> > > If you have any questions, please feel free to reach us. >> > > >> > > [1] https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing >> > > >> > > Regards, >> > > Hari and Sanju. >> > _______________________________________________ >> > Gluster-users mailing list >> > Gluster-users at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> Regards, >> Hari Gowtham. >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> -- >> --Atin >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > -- > Thanks, > Sanju -------------- next part -------------- An HTML attachment was scrubbed... URL: From amukherj at redhat.com Thu Apr 4 16:25:05 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Thu, 4 Apr 2019 21:55:05 +0530 Subject: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6 In-Reply-To: <493E624F-FC40-4242-BF9D-BAD0385B2DA5@onholyground.com> References: <13CA0DC5-C248-40B8-B2D4-E6664812303A@onholyground.com> <493E624F-FC40-4242-BF9D-BAD0385B2DA5@onholyground.com> Message-ID: Darell, I fully understand that you can't reproduce it and you don't have bandwidth to test it again, but would you be able to send us the glusterd log from all the nodes when this happened. We would like to go through the logs and get back. I would particularly like to see if something has gone wrong with transport.socket.listen-port option. But with out the log files we can't find out anything. Hope you understand it. On Thu, Apr 4, 2019 at 9:27 PM Darrell Budic wrote: > I didn?t follow any specific documents, just a generic rolling upgrade one > node at a time. Once the first node didn?t reconnect, I tried to follow the > workaround in the bug during the upgrade. 
Basic procedure was: > > - take 3 nodes that were initially installed with 3.12.x (forget which, > but low number) and had been upgraded directly to 5.5 from 3.12.15 > - op-version was 50400 > - on node A: > - yum install centos-release-gluster6 > - yum upgrade (was some ovirt cockpit components, gluster, and a lib or > two this time), hit yes > - discover glusterd was dead > - systemctl restart glusterd > - no peer connections, try iptables -F; systemctl restart glusterd, no > change > - following the workaround in the bug, try iptables -F & restart glusterd > on other 2 nodes, no effect > - nodes B & C were still connected to each other and all bricks were > fine at this point > - try upgrading other 2 nodes and restarting gluster, no effect (iptables > still empty) > - lost quota here, so all bricks went offline > - read logs, not finding much, but looked at glusterd.vol and compared to > new versions > - updated glusterd.vol on A and restarted glusterd > - A doesn?t show any connected peers, but both other nodes show A as > connected > - update glusterd.vol on B & C, restart glusterd > - all nodes show connected and volumes are active and healing > > The only odd thing in my process was that node A did not have any active > bricks on it at the time of the upgrade. It doesn?t seem like this mattered > since B & C showed the same symptoms between themselves while being > upgraded, but I don?t know. The only log entry that referenced anything > about peer connections is included below already. > > Looks like it was related to my glusterd settings, since that?s what fixed > it for me. Unfortunately, I don?t have the bandwidth or the systems to test > different versions of that specifically, but maybe you guys can on some > test resources? Otherwise, I?ve got another cluster (my production one!) > that?s midway through the upgrade from 3.12.15 -> 5.5. I paused when I > started getting multiple brick processes on the two nodes that had gone to > 5.5 already. I think I?m going to jump the last node right to 6 to try and > avoid that mess, and it has the same glusterd.vol settings. I?ll try and > capture it?s logs during the upgrade and see if there?s any new info, or if > it has the same issues as this group did. > > -Darrell > > On Apr 4, 2019, at 2:54 AM, Sanju Rakonde wrote: > > We don't hit https://bugzilla.redhat.com/show_bug.cgi?id=1694010 while > upgrading to glusterfs-6. We tested it in different setups and understood > that this issue is seen because of some issue in setup. > > regarding the issue you have faced, can you please let us know which > documentation you have followed for the upgrade? During our testing, we > didn't hit any such issue. we would like to understand what went wrong. > > On Thu, Apr 4, 2019 at 2:08 AM Darrell Budic > wrote: > >> Hari- >> >> I was upgrading my test cluster from 5.5 to 6 and I hit this bug ( >> https://bugzilla.redhat.com/show_bug.cgi?id=1694010) or something >> similar. In my case, the workaround did not work, and I was left with a >> gluster that had gone into no-quorum mode and stopped all the bricks. >> Wasn?t much in the logs either, but I noticed my >> /etc/glusterfs/glusterd.vol files were not the same as the newer versions, >> so I updated them, restarted glusterd, and suddenly the updated node showed >> as peer-in-cluster again. Once I updated other notes the same way, things >> started working again. Maybe a place to look? 
>> >> My old config (all nodes): >> volume management >> type mgmt/glusterd >> option working-directory /var/lib/glusterd >> option transport-type socket >> option transport.socket.keepalive-time 10 >> option transport.socket.keepalive-interval 2 >> option transport.socket.read-fail-log off >> option ping-timeout 10 >> option event-threads 1 >> option rpc-auth-allow-insecure on >> # option transport.address-family inet6 >> # option base-port 49152 >> end-volume >> >> changed to: >> volume management >> type mgmt/glusterd >> option working-directory /var/lib/glusterd >> option transport-type socket,rdma >> option transport.socket.keepalive-time 10 >> option transport.socket.keepalive-interval 2 >> option transport.socket.read-fail-log off >> option transport.socket.listen-port 24007 >> option transport.rdma.listen-port 24008 >> option ping-timeout 0 >> option event-threads 1 >> option rpc-auth-allow-insecure on >> # option lock-timer 180 >> # option transport.address-family inet6 >> # option base-port 49152 >> option max-port 60999 >> end-volume >> >> the only thing I found in the glusterd logs that looks relevant was >> (repeated for both of the other nodes in this cluster), so no clue why it >> happened: >> [2019-04-03 20:19:16.802638] I [MSGID: 106004] >> [glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer >> (<0ecbf953-681b-448f-9746-d1c1fe7a0978>), in state > Cluster>, has disconnected from glusterd. >> >> >> On Apr 2, 2019, at 4:53 AM, Atin Mukherjee >> wrote: >> >> >> >> On Mon, 1 Apr 2019 at 10:28, Hari Gowtham wrote: >> >>> Comments inline. >>> >>> On Mon, Apr 1, 2019 at 5:55 AM Sankarshan Mukhopadhyay >>> wrote: >>> > >>> > Quite a considerable amount of detail here. Thank you! >>> > >>> > On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham >>> wrote: >>> > > >>> > > Hello Gluster users, >>> > > >>> > > As you all aware that glusterfs-6 is out, we would like to inform you >>> > > that, we have spent a significant amount of time in testing >>> > > glusterfs-6 in upgrade scenarios. We have done upgrade testing to >>> > > glusterfs-6 from various releases like 3.12, 4.1 and 5.3. >>> > > >>> > > As glusterfs-6 has got in a lot of changes, we wanted to test those >>> portions. >>> > > There were xlators (and respective options to enable/disable them) >>> > > added and deprecated in glusterfs-6 from various versions [1]. >>> > > >>> > > We had to check the following upgrade scenarios for all such options >>> > > Identified in [1]: >>> > > 1) option never enabled and upgraded >>> > > 2) option enabled and then upgraded >>> > > 3) option enabled and then disabled and then upgraded >>> > > >>> > > We weren't manually able to check all the combinations for all the >>> options. >>> > > So the options involving enabling and disabling xlators were >>> prioritized. >>> > > The below are the result of the ones tested. >>> > > >>> > > Never enabled and upgraded: >>> > > checked from 3.12, 4.1, 5.3 to 6 the upgrade works. >>> > > >>> > > Enabled and upgraded: >>> > > Tested for tier which is deprecated, It is not a recommended upgrade. >>> > > As expected the volume won't be consumable and will have a few more >>> > > issues as well. >>> > > Tested with 3.12, 4.1 and 5.3 to 6 upgrade. >>> > > >>> > > Enabled, disabled before upgrade. >>> > > Tested for tier with 3.12 and the upgrade went fine. >>> > > >>> > > There is one common issue to note in every upgrade. The node being >>> > > upgraded is going into disconnected state. 
You have to flush the >>> iptables >>> > > and the restart glusterd on all nodes to fix this. >>> > > >>> > >>> > Is this something that is written in the upgrade notes? I do not seem >>> > to recall, if not, I'll send a PR >>> >>> No this wasn't mentioned in the release notes. PRs are welcome. >>> >>> > >>> > > The testing for enabling new options is still pending. The new >>> options >>> > > won't cause as much issues as the deprecated ones so this was put at >>> > > the end of the priority list. It would be nice to get contributions >>> > > for this. >>> > > >>> > >>> > Did the range of tests lead to any new issues? >>> >>> Yes. In the first round of testing we found an issue and had to postpone >>> the >>> release of 6 until the fix was made available. >>> https://bugzilla.redhat.com/show_bug.cgi?id=1684029 >>> >>> And then we tested it again after this patch was made available. >>> and came across this: >>> https://bugzilla.redhat.com/show_bug.cgi?id=1694010 >> >> >> This isn?t a bug as we found that upgrade worked seamelessly in two >> different setup. So we have no issues in the upgrade path to glusterfs-6 >> release. >> >> >>> >>> Have mentioned this in the second mail as to how to over this situation >>> for now until the fix is available. >>> >>> > >>> > > For the disable testing, tier was used as it covers most of the >>> xlator >>> > > that was removed. And all of these tests were done on a replica 3 >>> volume. >>> > > >>> > >>> > I'm not sure if the Glusto team is reading this, but it would be >>> > pertinent to understand if the approach you have taken can be >>> > converted into a form of automated testing pre-release. >>> >>> I don't have an answer for this, have CCed Vijay. >>> He might have an idea. >>> >>> > >>> > > Note: This is only for upgrade testing of the newly added and removed >>> > > xlators. Does not involve the normal tests for the xlator. >>> > > >>> > > If you have any questions, please feel free to reach us. >>> > > >>> > > [1] >>> https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing >>> > > >>> > > Regards, >>> > > Hari and Sanju. >>> > _______________________________________________ >>> > Gluster-users mailing list >>> > Gluster-users at gluster.org >>> > https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >>> -- >>> Regards, >>> Hari Gowtham. >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >> -- >> --Atin >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Thanks, > Sanju > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From budic at onholyground.com Thu Apr 4 16:40:38 2019 From: budic at onholyground.com (Darrell Budic) Date: Thu, 4 Apr 2019 11:40:38 -0500 Subject: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6 In-Reply-To: References: <13CA0DC5-C248-40B8-B2D4-E6664812303A@onholyground.com> <493E624F-FC40-4242-BF9D-BAD0385B2DA5@onholyground.com> Message-ID: <0F2C3FE9-E190-47D4-A043-319F086447E9@onholyground.com> Just the glusterd.log from each node, right? > On Apr 4, 2019, at 11:25 AM, Atin Mukherjee wrote: > > Darell, > > I fully understand that you can't reproduce it and you don't have bandwidth to test it again, but would you be able to send us the glusterd log from all the nodes when this happened. We would like to go through the logs and get back. I would particularly like to see if something has gone wrong with transport.socket.listen-port option. But with out the log files we can't find out anything. Hope you understand it. > > On Thu, Apr 4, 2019 at 9:27 PM Darrell Budic > wrote: > I didn?t follow any specific documents, just a generic rolling upgrade one node at a time. Once the first node didn?t reconnect, I tried to follow the workaround in the bug during the upgrade. Basic procedure was: > > - take 3 nodes that were initially installed with 3.12.x (forget which, but low number) and had been upgraded directly to 5.5 from 3.12.15 > - op-version was 50400 > - on node A: > - yum install centos-release-gluster6 > - yum upgrade (was some ovirt cockpit components, gluster, and a lib or two this time), hit yes > - discover glusterd was dead > - systemctl restart glusterd > - no peer connections, try iptables -F; systemctl restart glusterd, no change > - following the workaround in the bug, try iptables -F & restart glusterd on other 2 nodes, no effect > - nodes B & C were still connected to each other and all bricks were fine at this point > - try upgrading other 2 nodes and restarting gluster, no effect (iptables still empty) > - lost quota here, so all bricks went offline > - read logs, not finding much, but looked at glusterd.vol and compared to new versions > - updated glusterd.vol on A and restarted glusterd > - A doesn?t show any connected peers, but both other nodes show A as connected > - update glusterd.vol on B & C, restart glusterd > - all nodes show connected and volumes are active and healing > > The only odd thing in my process was that node A did not have any active bricks on it at the time of the upgrade. It doesn?t seem like this mattered since B & C showed the same symptoms between themselves while being upgraded, but I don?t know. The only log entry that referenced anything about peer connections is included below already. > > Looks like it was related to my glusterd settings, since that?s what fixed it for me. Unfortunately, I don?t have the bandwidth or the systems to test different versions of that specifically, but maybe you guys can on some test resources? Otherwise, I?ve got another cluster (my production one!) that?s midway through the upgrade from 3.12.15 -> 5.5. I paused when I started getting multiple brick processes on the two nodes that had gone to 5.5 already. I think I?m going to jump the last node right to 6 to try and avoid that mess, and it has the same glusterd.vol settings. I?ll try and capture it?s logs during the upgrade and see if there?s any new info, or if it has the same issues as this group did. 
> > -Darrell > >> On Apr 4, 2019, at 2:54 AM, Sanju Rakonde > wrote: >> >> We don't hit https://bugzilla.redhat.com/show_bug.cgi?id=1694010 while upgrading to glusterfs-6. We tested it in different setups and understood that this issue is seen because of some issue in setup. >> >> regarding the issue you have faced, can you please let us know which documentation you have followed for the upgrade? During our testing, we didn't hit any such issue. we would like to understand what went wrong. >> >> On Thu, Apr 4, 2019 at 2:08 AM Darrell Budic > wrote: >> Hari- >> >> I was upgrading my test cluster from 5.5 to 6 and I hit this bug (https://bugzilla.redhat.com/show_bug.cgi?id=1694010 ) or something similar. In my case, the workaround did not work, and I was left with a gluster that had gone into no-quorum mode and stopped all the bricks. Wasn?t much in the logs either, but I noticed my /etc/glusterfs/glusterd.vol files were not the same as the newer versions, so I updated them, restarted glusterd, and suddenly the updated node showed as peer-in-cluster again. Once I updated other notes the same way, things started working again. Maybe a place to look? >> >> My old config (all nodes): >> volume management >> type mgmt/glusterd >> option working-directory /var/lib/glusterd >> option transport-type socket >> option transport.socket.keepalive-time 10 >> option transport.socket.keepalive-interval 2 >> option transport.socket.read-fail-log off >> option ping-timeout 10 >> option event-threads 1 >> option rpc-auth-allow-insecure on >> # option transport.address-family inet6 >> # option base-port 49152 >> end-volume >> >> changed to: >> volume management >> type mgmt/glusterd >> option working-directory /var/lib/glusterd >> option transport-type socket,rdma >> option transport.socket.keepalive-time 10 >> option transport.socket.keepalive-interval 2 >> option transport.socket.read-fail-log off >> option transport.socket.listen-port 24007 >> option transport.rdma.listen-port 24008 >> option ping-timeout 0 >> option event-threads 1 >> option rpc-auth-allow-insecure on >> # option lock-timer 180 >> # option transport.address-family inet6 >> # option base-port 49152 >> option max-port 60999 >> end-volume >> >> the only thing I found in the glusterd logs that looks relevant was (repeated for both of the other nodes in this cluster), so no clue why it happened: >> [2019-04-03 20:19:16.802638] I [MSGID: 106004] [glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer (<0ecbf953-681b-448f-9746-d1c1fe7a0978>), in state , has disconnected from glusterd. >> >> >>> On Apr 2, 2019, at 4:53 AM, Atin Mukherjee > wrote: >>> >>> >>> >>> On Mon, 1 Apr 2019 at 10:28, Hari Gowtham > wrote: >>> Comments inline. >>> >>> On Mon, Apr 1, 2019 at 5:55 AM Sankarshan Mukhopadhyay >>> > wrote: >>> > >>> > Quite a considerable amount of detail here. Thank you! >>> > >>> > On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham > wrote: >>> > > >>> > > Hello Gluster users, >>> > > >>> > > As you all aware that glusterfs-6 is out, we would like to inform you >>> > > that, we have spent a significant amount of time in testing >>> > > glusterfs-6 in upgrade scenarios. We have done upgrade testing to >>> > > glusterfs-6 from various releases like 3.12, 4.1 and 5.3. >>> > > >>> > > As glusterfs-6 has got in a lot of changes, we wanted to test those portions. >>> > > There were xlators (and respective options to enable/disable them) >>> > > added and deprecated in glusterfs-6 from various versions [1]. 
>>> > > >>> > > We had to check the following upgrade scenarios for all such options >>> > > Identified in [1]: >>> > > 1) option never enabled and upgraded >>> > > 2) option enabled and then upgraded >>> > > 3) option enabled and then disabled and then upgraded >>> > > >>> > > We weren't manually able to check all the combinations for all the options. >>> > > So the options involving enabling and disabling xlators were prioritized. >>> > > The below are the result of the ones tested. >>> > > >>> > > Never enabled and upgraded: >>> > > checked from 3.12, 4.1, 5.3 to 6 the upgrade works. >>> > > >>> > > Enabled and upgraded: >>> > > Tested for tier which is deprecated, It is not a recommended upgrade. >>> > > As expected the volume won't be consumable and will have a few more >>> > > issues as well. >>> > > Tested with 3.12, 4.1 and 5.3 to 6 upgrade. >>> > > >>> > > Enabled, disabled before upgrade. >>> > > Tested for tier with 3.12 and the upgrade went fine. >>> > > >>> > > There is one common issue to note in every upgrade. The node being >>> > > upgraded is going into disconnected state. You have to flush the iptables >>> > > and the restart glusterd on all nodes to fix this. >>> > > >>> > >>> > Is this something that is written in the upgrade notes? I do not seem >>> > to recall, if not, I'll send a PR >>> >>> No this wasn't mentioned in the release notes. PRs are welcome. >>> >>> > >>> > > The testing for enabling new options is still pending. The new options >>> > > won't cause as much issues as the deprecated ones so this was put at >>> > > the end of the priority list. It would be nice to get contributions >>> > > for this. >>> > > >>> > >>> > Did the range of tests lead to any new issues? >>> >>> Yes. In the first round of testing we found an issue and had to postpone the >>> release of 6 until the fix was made available. >>> https://bugzilla.redhat.com/show_bug.cgi?id=1684029 >>> >>> And then we tested it again after this patch was made available. >>> and came across this: >>> https://bugzilla.redhat.com/show_bug.cgi?id=1694010 >>> >>> This isn?t a bug as we found that upgrade worked seamelessly in two different setup. So we have no issues in the upgrade path to glusterfs-6 release. >>> >>> >>> >>> Have mentioned this in the second mail as to how to over this situation >>> for now until the fix is available. >>> >>> > >>> > > For the disable testing, tier was used as it covers most of the xlator >>> > > that was removed. And all of these tests were done on a replica 3 volume. >>> > > >>> > >>> > I'm not sure if the Glusto team is reading this, but it would be >>> > pertinent to understand if the approach you have taken can be >>> > converted into a form of automated testing pre-release. >>> >>> I don't have an answer for this, have CCed Vijay. >>> He might have an idea. >>> >>> > >>> > > Note: This is only for upgrade testing of the newly added and removed >>> > > xlators. Does not involve the normal tests for the xlator. >>> > > >>> > > If you have any questions, please feel free to reach us. >>> > > >>> > > [1] https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing >>> > > >>> > > Regards, >>> > > Hari and Sanju. >>> > _______________________________________________ >>> > Gluster-users mailing list >>> > Gluster-users at gluster.org >>> > https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >>> -- >>> Regards, >>> Hari Gowtham. 
>>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> -- >>> --Atin >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> -- >> Thanks, >> Sanju > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From joel_patterson at verizon.net Thu Apr 4 16:48:44 2019 From: joel_patterson at verizon.net (Joel Patterson) Date: Thu, 4 Apr 2019 12:48:44 -0400 Subject: [Gluster-users] backupvolfile-server (servers) not working for new mounts? Message-ID: <6e5457c0-ed98-d41a-fe58-ce77705f892d@verizon.net> I have a gluster 4.1 system with three servers running Docker/Kubernetes.??? The pods mount filesystems using gluster. 10.13.112.31 is the primary server [A] and all mounts specify it with two other servers [10.13.113.116 [B] and 10.13.114.16 [C]] specified in backup-volfile-servers. I'm testing what happens when a server goes down. If I bring down [B] or [C], no problem, everything restages and works. But if I bring down [A], any *existing* mount continues to work, but any new mounts fail.? I'm seeing messages about all subvolumes being down in the pod. But I've mounted this exact same volume on the same system (before I bring down the server) and I can access all the data fine. Why the failure for new mounts???? I'm on AWS and all servers are in different availability zones, but I don't see how that would be an issue. I tried using just backupvolfile-server and that didn't work either. --- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus From jorge.crespo at avature.net Thu Apr 4 17:17:08 2019 From: jorge.crespo at avature.net (Jorge Crespo) Date: Thu, 4 Apr 2019 19:17:08 +0200 Subject: [Gluster-users] Gluster with KVM - VM migration Message-ID: Hi everyone, First message in this list, hope I can help out as much as I can. I was wondering if someone could point out any solution already working or this would be a matter of scripting. We are using Gluster for a kind of strange infrastructure , where we have let's say 2 NODES , 2 bricks each , 2 volumes total. And both servers are mounting as clients both volumes. We exclusively use these volumes to run VM's. And the reason of the infrastructure is to be able to LiveMigrate VM's from one node to the other. VM's defined and running in NODE 1, MV files in /gluster_gv1 , this GlusterFS is also mounted in NODE2 ,but NODE2 doesn't make any real use of it. VM's defined and running in NODE 2, MV files in /gluster_gv2 , as before, this GlusterFS is also mounted in NODE1, but it doesn't make any real use of it. So the question comes now: - Let's say we come to an scenario where NODE 1 comes down. I have the VM's files copied to NODE2, I define them in NODE2 and start them , no problem with that. - Now the NODE 1 comes back UP , I guess the safest solution should be to have the VM's without Autostart so things don't go messy. 
But let's imagine I want my system to know which VM's are started in NODE 2, and start the ones that haven't been started in NODE 2. Is there any "official" way to achieve this? Basically achieve something like vCenter where the cluster keeps track of where the VM's are running at any given time, and also being able to start them in a different node if their node goes down. If there is no "official" answer, I'd like to hear your opinions. Cheers! -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4012 bytes Desc: Firma criptogr??fica S/MIME URL: From amukherj at redhat.com Thu Apr 4 17:43:43 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Thu, 4 Apr 2019 23:13:43 +0530 Subject: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6 In-Reply-To: <0F2C3FE9-E190-47D4-A043-319F086447E9@onholyground.com> References: <13CA0DC5-C248-40B8-B2D4-E6664812303A@onholyground.com> <493E624F-FC40-4242-BF9D-BAD0385B2DA5@onholyground.com> <0F2C3FE9-E190-47D4-A043-319F086447E9@onholyground.com> Message-ID: On Thu, 4 Apr 2019 at 22:10, Darrell Budic wrote: > Just the glusterd.log from each node, right? > Yes. > > On Apr 4, 2019, at 11:25 AM, Atin Mukherjee wrote: > > Darell, > > I fully understand that you can't reproduce it and you don't have > bandwidth to test it again, but would you be able to send us the glusterd > log from all the nodes when this happened. We would like to go through the > logs and get back. I would particularly like to see if something has gone > wrong with transport.socket.listen-port option. But with out the log files > we can't find out anything. Hope you understand it. > > On Thu, Apr 4, 2019 at 9:27 PM Darrell Budic > wrote: > >> I didn?t follow any specific documents, just a generic rolling upgrade >> one node at a time. Once the first node didn?t reconnect, I tried to follow >> the workaround in the bug during the upgrade. Basic procedure was: >> >> - take 3 nodes that were initially installed with 3.12.x (forget which, >> but low number) and had been upgraded directly to 5.5 from 3.12.15 >> - op-version was 50400 >> - on node A: >> - yum install centos-release-gluster6 >> - yum upgrade (was some ovirt cockpit components, gluster, and a lib or >> two this time), hit yes >> - discover glusterd was dead >> - systemctl restart glusterd >> - no peer connections, try iptables -F; systemctl restart glusterd, no >> change >> - following the workaround in the bug, try iptables -F & restart glusterd >> on other 2 nodes, no effect >> - nodes B & C were still connected to each other and all bricks were >> fine at this point >> - try upgrading other 2 nodes and restarting gluster, no effect (iptables >> still empty) >> - lost quota here, so all bricks went offline >> - read logs, not finding much, but looked at glusterd.vol and compared to >> new versions >> - updated glusterd.vol on A and restarted glusterd >> - A doesn?t show any connected peers, but both other nodes show A as >> connected >> - update glusterd.vol on B & C, restart glusterd >> - all nodes show connected and volumes are active and healing >> >> The only odd thing in my process was that node A did not have any active >> bricks on it at the time of the upgrade. It doesn?t seem like this mattered >> since B & C showed the same symptoms between themselves while being >> upgraded, but I don?t know. The only log entry that referenced anything >> about peer connections is included below already. 
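A rough consolidation of the per-node sequence and the workaround discussed in this thread, as a sketch only; the exact package set, the timing, and whether the iptables flush is needed at all will differ per setup:

# run on one node at a time; wait for heals to finish before the next node
yum install centos-release-gluster6
yum upgrade
systemctl restart glusterd

# verify the node rejoined the pool
gluster peer status
gluster volume status

# workaround mentioned in the thread if the node stays disconnected:
# flush iptables and restart glusterd on all nodes
iptables -F
systemctl restart glusterd

# as discussed above, also compare /etc/glusterfs/glusterd.vol against
# the default shipped with the new packages
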
>> >> Looks like it was related to my glusterd settings, since that?s what >> fixed it for me. Unfortunately, I don?t have the bandwidth or the systems >> to test different versions of that specifically, but maybe you guys can on >> some test resources? Otherwise, I?ve got another cluster (my production >> one!) that?s midway through the upgrade from 3.12.15 -> 5.5. I paused when >> I started getting multiple brick processes on the two nodes that had gone >> to 5.5 already. I think I?m going to jump the last node right to 6 to try >> and avoid that mess, and it has the same glusterd.vol settings. I?ll try >> and capture it?s logs during the upgrade and see if there?s any new info, >> or if it has the same issues as this group did. >> >> -Darrell >> >> On Apr 4, 2019, at 2:54 AM, Sanju Rakonde wrote: >> >> We don't hit https://bugzilla.redhat.com/show_bug.cgi?id=1694010 while >> upgrading to glusterfs-6. We tested it in different setups and understood >> that this issue is seen because of some issue in setup. >> >> regarding the issue you have faced, can you please let us know which >> documentation you have followed for the upgrade? During our testing, we >> didn't hit any such issue. we would like to understand what went wrong. >> >> On Thu, Apr 4, 2019 at 2:08 AM Darrell Budic >> wrote: >> >>> Hari- >>> >>> I was upgrading my test cluster from 5.5 to 6 and I hit this bug ( >>> https://bugzilla.redhat.com/show_bug.cgi?id=1694010) or something >>> similar. In my case, the workaround did not work, and I was left with a >>> gluster that had gone into no-quorum mode and stopped all the bricks. >>> Wasn?t much in the logs either, but I noticed my >>> /etc/glusterfs/glusterd.vol files were not the same as the newer versions, >>> so I updated them, restarted glusterd, and suddenly the updated node showed >>> as peer-in-cluster again. Once I updated other notes the same way, things >>> started working again. Maybe a place to look? >>> >>> My old config (all nodes): >>> volume management >>> type mgmt/glusterd >>> option working-directory /var/lib/glusterd >>> option transport-type socket >>> option transport.socket.keepalive-time 10 >>> option transport.socket.keepalive-interval 2 >>> option transport.socket.read-fail-log off >>> option ping-timeout 10 >>> option event-threads 1 >>> option rpc-auth-allow-insecure on >>> # option transport.address-family inet6 >>> # option base-port 49152 >>> end-volume >>> >>> changed to: >>> volume management >>> type mgmt/glusterd >>> option working-directory /var/lib/glusterd >>> option transport-type socket,rdma >>> option transport.socket.keepalive-time 10 >>> option transport.socket.keepalive-interval 2 >>> option transport.socket.read-fail-log off >>> option transport.socket.listen-port 24007 >>> option transport.rdma.listen-port 24008 >>> option ping-timeout 0 >>> option event-threads 1 >>> option rpc-auth-allow-insecure on >>> # option lock-timer 180 >>> # option transport.address-family inet6 >>> # option base-port 49152 >>> option max-port 60999 >>> end-volume >>> >>> the only thing I found in the glusterd logs that looks relevant was >>> (repeated for both of the other nodes in this cluster), so no clue why it >>> happened: >>> [2019-04-03 20:19:16.802638] I [MSGID: 106004] >>> [glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer >>> (<0ecbf953-681b-448f-9746-d1c1fe7a0978>), in state >> Cluster>, has disconnected from glusterd. 
>>> >>> >>> On Apr 2, 2019, at 4:53 AM, Atin Mukherjee >>> wrote: >>> >>> >>> >>> On Mon, 1 Apr 2019 at 10:28, Hari Gowtham wrote: >>> >>>> Comments inline. >>>> >>>> On Mon, Apr 1, 2019 at 5:55 AM Sankarshan Mukhopadhyay >>>> wrote: >>>> > >>>> > Quite a considerable amount of detail here. Thank you! >>>> > >>>> > On Fri, Mar 29, 2019 at 11:42 AM Hari Gowtham >>>> wrote: >>>> > > >>>> > > Hello Gluster users, >>>> > > >>>> > > As you all aware that glusterfs-6 is out, we would like to inform >>>> you >>>> > > that, we have spent a significant amount of time in testing >>>> > > glusterfs-6 in upgrade scenarios. We have done upgrade testing to >>>> > > glusterfs-6 from various releases like 3.12, 4.1 and 5.3. >>>> > > >>>> > > As glusterfs-6 has got in a lot of changes, we wanted to test those >>>> portions. >>>> > > There were xlators (and respective options to enable/disable them) >>>> > > added and deprecated in glusterfs-6 from various versions [1]. >>>> > > >>>> > > We had to check the following upgrade scenarios for all such options >>>> > > Identified in [1]: >>>> > > 1) option never enabled and upgraded >>>> > > 2) option enabled and then upgraded >>>> > > 3) option enabled and then disabled and then upgraded >>>> > > >>>> > > We weren't manually able to check all the combinations for all the >>>> options. >>>> > > So the options involving enabling and disabling xlators were >>>> prioritized. >>>> > > The below are the result of the ones tested. >>>> > > >>>> > > Never enabled and upgraded: >>>> > > checked from 3.12, 4.1, 5.3 to 6 the upgrade works. >>>> > > >>>> > > Enabled and upgraded: >>>> > > Tested for tier which is deprecated, It is not a recommended >>>> upgrade. >>>> > > As expected the volume won't be consumable and will have a few more >>>> > > issues as well. >>>> > > Tested with 3.12, 4.1 and 5.3 to 6 upgrade. >>>> > > >>>> > > Enabled, disabled before upgrade. >>>> > > Tested for tier with 3.12 and the upgrade went fine. >>>> > > >>>> > > There is one common issue to note in every upgrade. The node being >>>> > > upgraded is going into disconnected state. You have to flush the >>>> iptables >>>> > > and the restart glusterd on all nodes to fix this. >>>> > > >>>> > >>>> > Is this something that is written in the upgrade notes? I do not seem >>>> > to recall, if not, I'll send a PR >>>> >>>> No this wasn't mentioned in the release notes. PRs are welcome. >>>> >>>> > >>>> > > The testing for enabling new options is still pending. The new >>>> options >>>> > > won't cause as much issues as the deprecated ones so this was put at >>>> > > the end of the priority list. It would be nice to get contributions >>>> > > for this. >>>> > > >>>> > >>>> > Did the range of tests lead to any new issues? >>>> >>>> Yes. In the first round of testing we found an issue and had to >>>> postpone the >>>> release of 6 until the fix was made available. >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1684029 >>>> >>>> And then we tested it again after this patch was made available. >>>> and came across this: >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1694010 >>> >>> >>> This isn?t a bug as we found that upgrade worked seamelessly in two >>> different setup. So we have no issues in the upgrade path to glusterfs-6 >>> release. >>> >>> >>>> >>>> Have mentioned this in the second mail as to how to over this situation >>>> for now until the fix is available. >>>> >>>> > >>>> > > For the disable testing, tier was used as it covers most of the >>>> xlator >>>> > > that was removed. 
And all of these tests were done on a replica 3 >>>> volume. >>>> > > >>>> > >>>> > I'm not sure if the Glusto team is reading this, but it would be >>>> > pertinent to understand if the approach you have taken can be >>>> > converted into a form of automated testing pre-release. >>>> >>>> I don't have an answer for this, have CCed Vijay. >>>> He might have an idea. >>>> >>>> > >>>> > > Note: This is only for upgrade testing of the newly added and >>>> removed >>>> > > xlators. Does not involve the normal tests for the xlator. >>>> > > >>>> > > If you have any questions, please feel free to reach us. >>>> > > >>>> > > [1] >>>> https://docs.google.com/spreadsheets/d/1nh7T5AXaV6kc5KgILOy2pEqjzC3t_R47f1XUXSVFetI/edit?usp=sharing >>>> > > >>>> > > Regards, >>>> > > Hari and Sanju. >>>> > _______________________________________________ >>>> > Gluster-users mailing list >>>> > Gluster-users at gluster.org >>>> > https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>>> >>>> -- >>>> Regards, >>>> Hari Gowtham. >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>> -- >>> --Atin >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> Thanks, >> Sanju >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > -- - Atin (atinm) -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Fri Apr 5 03:44:58 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Fri, 05 Apr 2019 06:44:58 +0300 Subject: [Gluster-users] [ovirt-users] Re: Controller recomandation - LSI2008/9265 Message-ID: Adding Gluster users' mail list.On Apr 5, 2019 06:02, Leo David wrote: > > Hi Everyone, > Any thoughts on this ? > > > On Wed, Apr 3, 2019, 17:02 Leo David wrote: >> >> Hi Everyone, >> For a hyperconverged setup started with 3 nodes and going up in time up to 12 nodes, I have to choose between LSI2008 ( jbod ) and LSI9265 (raid). >> Perc h710 ( raid )?might be an option too, but on a different chassis. >> There will not be many disk installed on each node, so the replication will be replica 3 replicated-distribute volumes across the nodes as: >> node1/disk1? node2/disk1? node3/disk1 >> node1/disk2? node2/disk2? node3/disk2 >> and so on... >> As i will add nodes to the cluster ,? I?intend expand the volumes using the same rule. >> What?would it?be a better way,? to used jbod cards ( no cache ) or raid card and create raid0 arrays ( one for each disk ) and therefore have a bit of raid cache ( 512Mb ) ? >> Is raid caching a benefit to have it underneath ovirt/gluster as long as I go for "Jbod"? installation anyway ? >> Thank you very much ! >> -- >> Best regards, Leo David -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mauryam at gmail.com Fri Apr 5 06:30:00 2019 From: mauryam at gmail.com (Maurya M) Date: Fri, 5 Apr 2019 12:00:00 +0530 Subject: [Gluster-users] Geo-replication benchmarking & DR - failover simulation Message-ID: Hi All, As i have now a geo-replication established between my 3 sites , wanted to test the flow throughput , is there any tools , documentation to measure the data flow b/w the various slave volumes setup. Also i understand the replication is unidirectional ( master to slave) - so incase of DR , are there tested methods to achieve the failover and reverse the replication and once the primary if restored - go back on the initial setup. Appreciate your thoughts & suggestions. Will look forward to the comments here. Thanks all, Maurya -------------- next part -------------- An HTML attachment was scrubbed... URL: From benedikt.kaless at forumZFD.de Fri Apr 5 07:28:52 2019 From: benedikt.kaless at forumZFD.de (=?UTF-8?Q?Benedikt_Kale=c3=9f?=) Date: Fri, 5 Apr 2019 09:28:52 +0200 Subject: [Gluster-users] Samba performance Message-ID: Hi everyone, I'm running gluster version 5.5-1 and I'm dealing with a slow performance together with samba and start to analyze the issue. I want to set the options for samba. But when I try ??? gluster volume set ? group samba I get the following error message: ??? Unable to open file '/var/lib/glusterd/groups/samba'. Do you have any hints for me? Thank you in advance Benedikt -- ?forumZFD Entschieden f?r Frieden|Committed to Peace Benedikt Kale? Leiter Team IT|Head team IT Forum Ziviler Friedensdienst e.V.|Forum Civil Peace Service Am K?lner Brett 8 | 50825 K?ln | Germany Tel 0221 91273233 | Fax 0221 91273299 | http://www.forumZFD.de Vorstand nach ? 26 BGB, einzelvertretungsberechtigt|Executive Board: Oliver Knabe (Vorsitz|Chair), Sonja Wiekenberg-Mlalandle, Alexander Mauz VR 17651 Amtsgericht K?ln Spenden|Donations: IBAN DE37 3702 0500 0008 2401 01 BIC BFSWDE33XXX From hunter86_bg at yahoo.com Fri Apr 5 08:48:13 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Fri, 5 Apr 2019 08:48:13 +0000 (UTC) Subject: [Gluster-users] [ovirt-users] Re: Hosted-Engine constantly dies In-Reply-To: <2065983264.15082423.1554204966673@mail.yahoo.com> References: <756008012.14131704.1554068838744.ref@mail.yahoo.com> <756008012.14131704.1554068838744@mail.yahoo.com> <1494883209.14382966.1554121892714@mail.yahoo.com> <608254695.14449614.1554126155785@mail.yahoo.com> <2065983264.15082423.1554204966673@mail.yahoo.com> Message-ID: <806868289.140719.1554454093999@mail.yahoo.com> Hi Simone, a short mail chain in gluster-users Amar confirmed my suspicion that Gluster v5.5 is performing a little bit slower than 3.12.15 .In result the sanlock reservations take too much time. I have updated my setup and uncached (used lvm caching in writeback mode) my data bricks and used the SSD for the engine volume.Now the engine is running quite well and no more issues were observed. Can you share any thoughts about oVirt being updated to Gluster v6.x ? I know that there are any hooks between vdsm and gluster and I'm not sure how vdsm will react on the new version. Best Regards,Strahil Nikolov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hunter86_bg at yahoo.com Fri Apr 5 08:52:01 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Fri, 5 Apr 2019 08:52:01 +0000 (UTC) Subject: [Gluster-users] Geo-replication benchmarking & DR - failover simulation In-Reply-To: References: Message-ID: <2140013531.17058815.1554454321827@mail.yahoo.com> If you want to measure the traffic - I guess gtop can help you with that , but I have never checked it on geo-rep. At least this is what I'm checking when a full replication is needed (for example storage layout change on the brick). Best Regards,Strahil Nikolov ? ?????, 5 ????? 2019 ?., 9:30:22 ?. ???????+3, Maurya M ??????: Hi All,?As i have now a geo-replication established between my 3 sites , wanted to test the flow throughput , is there any tools , documentation to measure the data flow b/w the various slave volumes setup. Also i understand the replication is unidirectional ( master to slave) - so incase of DR , are there tested methods to achieve the failover and reverse the replication and once the primary if restored - go back on the initial setup. Appreciate your thoughts & suggestions. Will look forward to the comments here. Thanks all,Maurya_______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From anoopcs at cryptolab.net Fri Apr 5 09:03:14 2019 From: anoopcs at cryptolab.net (Anoop C S) Date: Fri, 05 Apr 2019 14:33:14 +0530 Subject: [Gluster-users] Samba performance In-Reply-To: References: Message-ID: <011cbf9c8a3795a504af36192c7432f35e4f6ca9.camel@cryptolab.net> On Fri, 2019-04-05 at 09:28 +0200, Benedikt Kale? wrote: > Hi everyone, > > I'm running gluster version 5.5-1 and I'm dealing with a slow > performance together with samba and start to analyze the issue. > > I want to set the options for samba. But when I try > > gluster volume set group samba > > I get the following error message: > > Unable to open file '/var/lib/glusterd/groups/samba'. > > Do you have any hints for me? This particular group command is only available from v6.0 (as of now). https://bugzilla.redhat.com/show_bug.cgi?id=1656771 > Thank you in advance > > Benedikt > From spisla80 at gmail.com Fri Apr 5 11:57:57 2019 From: spisla80 at gmail.com (David Spisla) Date: Fri, 5 Apr 2019 13:57:57 +0200 Subject: [Gluster-users] Samba performance In-Reply-To: <011cbf9c8a3795a504af36192c7432f35e4f6ca9.camel@cryptolab.net> References: <011cbf9c8a3795a504af36192c7432f35e4f6ca9.camel@cryptolab.net> Message-ID: Hello Anoop, it is a known issue that Gluster+Samba has a poor performance especially for small files. May this manual can give you more inspiration how to setup your Gluster and Samba environment. Regards David Spisla Am Fr., 5. Apr. 2019 um 11:28 Uhr schrieb Anoop C S : > On Fri, 2019-04-05 at 09:28 +0200, Benedikt Kale? wrote: > > Hi everyone, > > > > I'm running gluster version 5.5-1 and I'm dealing with a slow > > performance together with samba and start to analyze the issue. > > > > I want to set the options for samba. But when I try > > > > gluster volume set group samba > > > > I get the following error message: > > > > Unable to open file '/var/lib/glusterd/groups/samba'. > > > > Do you have any hints for me? > > This particular group command is only available from v6.0 (as of now). 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1656771 > > > Thank you in advance > > > > Benedikt > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Fri Apr 5 11:58:50 2019 From: spisla80 at gmail.com (David Spisla) Date: Fri, 5 Apr 2019 13:58:50 +0200 Subject: [Gluster-users] Samba performance In-Reply-To: References: <011cbf9c8a3795a504af36192c7432f35e4f6ca9.camel@cryptolab.net> Message-ID: I forgot the link. Here it is: https://github.com/gluster/glusterdocs/blob/9f2c979b50517b7cb7bf9fbcfccb033bc7bf5082/docs/Administrator%20Guide/Accessing%20Gluster%20from%20Windows.md Am Fr., 5. Apr. 2019 um 13:57 Uhr schrieb David Spisla : > Hello Anoop, > > it is a known issue that Gluster+Samba has a poor performance especially > for small files. > May this manual can give you more inspiration how to setup your Gluster > and Samba environment. > > Regards > David Spisla > > Am Fr., 5. Apr. 2019 um 11:28 Uhr schrieb Anoop C S >: > >> On Fri, 2019-04-05 at 09:28 +0200, Benedikt Kale? wrote: >> > Hi everyone, >> > >> > I'm running gluster version 5.5-1 and I'm dealing with a slow >> > performance together with samba and start to analyze the issue. >> > >> > I want to set the options for samba. But when I try >> > >> > gluster volume set group samba >> > >> > I get the following error message: >> > >> > Unable to open file '/var/lib/glusterd/groups/samba'. >> > >> > Do you have any hints for me? >> >> This particular group command is only available from v6.0 (as of now). >> >> https://bugzilla.redhat.com/show_bug.cgi?id=1656771 >> >> > Thank you in advance >> > >> > Benedikt >> > >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Fri Apr 5 12:17:52 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Fri, 5 Apr 2019 12:17:52 +0000 (UTC) Subject: [Gluster-users] [ovirt-users] Re: Hosted-Engine constantly dies In-Reply-To: References: <756008012.14131704.1554068838744.ref@mail.yahoo.com> <756008012.14131704.1554068838744@mail.yahoo.com> <1494883209.14382966.1554121892714@mail.yahoo.com> <608254695.14449614.1554126155785@mail.yahoo.com> <2065983264.15082423.1554204966673@mail.yahoo.com> <806868289.140719.1554454093999@mail.yahoo.com> Message-ID: <1792865462.17113753.1554466672917@mail.yahoo.com> >This definitively helps, but for my experience the network speed is really determinant here.Can you describe your network >configuration? >A 10 Gbps net is definitively fine here. >A few bonded 1 Gbps nics could work. >A single 1 Gbps nic could be an issue. I have a gigabit interface on my workstations and sadly I have no option for upgrade without switching the hardware. I have observed my network traffic for days with iftop and gtop and I have never reached my Gbit interface's maximum bandwidth, not even the half of it. Even when reseting my bricks (gluster volume reset-brick) and running a full heal - I do not observe more than 50GiB/s utilization. I am not sure if FUSE is using network for accessing the local brick - but I? hope that it is not true. Checking disk performance - everything is in the expected ranges. 
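To put numbers on the link utilisation during a heal, a minimal sketch assuming the sysstat and iftop packages are installed; "eth0" below is only a placeholder for the storage interface:

# sample per-interface throughput every 5 seconds for one minute
sar -n DEV 5 12 | grep -E 'IFACE|eth0'

# or watch it live, in bytes per second rather than bits
iftop -i eth0 -B
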
I suspect that the Gluster v5 enhancements are increasing both network and IOPS requirements and my setup was not dealing with it properly. >It's definitively planned, see:?https://bugzilla.redhat.com/1693998>I'm not really sure about its time plan. I will try to get involved and provide feedback both to oVirt and Gluster dev teams. Best Regards,Strahil Nikolov -------------- next part -------------- An HTML attachment was scrubbed... URL: From srangana at redhat.com Fri Apr 5 13:36:49 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Fri, 5 Apr 2019 09:36:49 -0400 Subject: [Gluster-users] Announcing Gluster release 4.1.8 Message-ID: The Gluster community is pleased to announce the release of Gluster 4.1.8 (packages available at [1]). Release notes for the release can be found at [2]. Major changes, features and limitations addressed in this release: None Thanks, Gluster community [1] Packages for 4.1.8: https://download.gluster.org/pub/gluster/glusterfs/4.1/4.1.8/ [2] Release notes for 4.1.8: https://docs.gluster.org/en/latest/release-notes/4.1.8/ From pascal.suter at dalco.ch Fri Apr 5 14:04:07 2019 From: pascal.suter at dalco.ch (Pascal Suter) Date: Fri, 5 Apr 2019 16:04:07 +0200 Subject: [Gluster-users] Samba performance In-Reply-To: References: <011cbf9c8a3795a504af36192c7432f35e4f6ca9.camel@cryptolab.net> Message-ID: <605fe6be-609a-3e3b-3c1c-f75a88beb026@dalco.ch> also i think you should be able to get better performance if you use vfs_glusterfs [1]. From what I understand it's similar to samba what nfs-ganesha does for nfs, it uses the libgfapi directly rather than sharing a fuse-mount [1] https://www.samba.org/samba/docs/current/man-html/vfs_glusterfs.8.html on my test machine i did this: (centos 7) first we need to install |samba-vfs-glusterfs| and of course the samba server which comes as a dependency: yum install samba-vfs-glusterfs gluster volume set vol1 server.allow-insecure on now stop and start the gluster volume, this will auto-create a (non-working, how funny is that) configuration in |/etc/samba/smb.conf| gluster volume stop vol1 gluster volume start vol1 now edit |/etc/samba/smb.conf| and add |kernel share modes = no| and also |guest ok=yes| to the newly created |[gluster-vol1]| section so that it finally looks about like this: [gluster-vol1] comment = For samba share of volume vol1 vfs objects = glusterfs glusterfs:volume = vol1 glusterfs:logfile = /var/log/samba/glusterfs-vol1.%M.log glusterfs:loglevel = 7 path = / read only = no guest ok = yes kernel share modes = no now restart samba systemctl restart smb and now you should be able to mount the samba share mount.cifs -o guest //gluster1/gluster-vol1 /mnt/cifs On 05.04.19 13:58, David Spisla wrote: > I forgot the link. Here it is: > https://github.com/gluster/glusterdocs/blob/9f2c979b50517b7cb7bf9fbcfccb033bc7bf5082/docs/Administrator%20Guide/Accessing%20Gluster%20from%20Windows.md > > Am Fr., 5. Apr. 2019 um 13:57?Uhr schrieb David Spisla > >: > > Hello Anoop, > > it is a known issue that Gluster+Samba has a poor performance > especially for small files. > May this manual can give you more inspiration how to setup your > Gluster and Samba environment. > > Regards > David Spisla > > Am Fr., 5. Apr. 2019 um 11:28?Uhr schrieb Anoop C S > >: > > On Fri, 2019-04-05 at 09:28 +0200, Benedikt Kale? wrote: > > Hi everyone, > > > > I'm running gluster version 5.5-1 and I'm dealing with a slow > > performance together with samba and start to analyze the issue. > > > > I want to set the options for samba. 
But when I try > > > >? ? ?gluster volume set ? group samba > > > > I get the following error message: > > > >? ? ?Unable to open file '/var/lib/glusterd/groups/samba'. > > > > Do you have any hints for me? > > This particular group command is only available from v6.0 (as > of now). > > https://bugzilla.redhat.com/show_bug.cgi?id=1656771 > > > Thank you in advance > > > > Benedikt > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Fri Apr 5 17:22:45 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Fri, 05 Apr 2019 20:22:45 +0300 Subject: [Gluster-users] [ovirt-users] Re: Hosted-Engine constantly dies Message-ID: Hi Simone, > According to gluster administration guide: > https://docs.gluster.org/en/latest/Administrator%20Guide/Network%20Configurations%20Techniques/ > ? > in the "when to bond" section we can read: > network throughput limit of client/server \<\< storage throughput limit > > 1 GbE (almost always) > 10-Gbps links or faster -- for writes, replication doubles the load on the network and replicas are usually on different peers to which the client can transmit in parallel. > > So if you are using oVirt hyper-converged in replica 3 you have to transmit everything two times over the storage network to sync it with other peers. > > I'm not really in that details, but if?https://bugzilla.redhat.com/1673058 is really like it's described, we even have an 5x overhead with current gluster 5.x. > > This means that with a 1000 Mbps nic we cannot expect more than: > 1000 Mbps / 2 (other replicas) / 5 (overhead in Gluster 5.x ???) / 8 (bit per bytes) = 12.5 MByte per seconds and this is definitively enough to have sanlock failing especially because we don't have just the sanlock load as you can imagine. > > I'd strongly advice to move to 10 Gigabit Ethernet (nowadays with a few hundred dollars you can buy a 4/5 ports 10GBASE-T copper switch plus 3 nics and the cables just for the gluster network) or to bond a few 1 Gigabit Ethernet?links. I didn't know that. So , with 1 Gbit network everyone should use replica 3 arbiter 1 volumes to minimize replication traffic. Best Regards, Strahil Nikolov -------------- next part -------------- An HTML attachment was scrubbed... URL: From phlogistonjohn at asynchrono.us Fri Apr 5 19:30:29 2019 From: phlogistonjohn at asynchrono.us (John Mulligan) Date: Fri, 05 Apr 2019 15:30:29 -0400 Subject: [Gluster-users] Heketi v9.0.0 available for download Message-ID: <2297373.DkY5oybNEL@abydos> Heketi v9.0.0 is now available [1]. This is the new stable version of Heketi. Major additions in this release: * Limit volumes per Gluster cluster * Prevent server from starting if db has unknown dbattributes * Support a default admin mode option * Add an option to enable strict zone checking on volume creation * Add automatic pending operation clean-up functionality * Configurable device formatting parameters * Add consistency check feature and state examiner debugging tools * The faulty and non-functional "db delete-pending-entries" command has been removed This release contains numerous stability and bug fixes. 
A more detailed changelog is available at the release page [1]. -- John M. on behalf of the Heketi team [1] https://github.com/heketi/heketi/releases/tag/v9.0.0 From hunter86_bg at yahoo.com Sun Apr 7 13:48:23 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Sun, 07 Apr 2019 16:48:23 +0300 Subject: [Gluster-users] Cluster 5.5 brick constantly dies Message-ID: Hi, After a hardware maintenance (GPU removed) I have powered my oVirt node running gluster 5.5 and noticed that one volume has no running brick locally. After forcefully starting the volume, the brick is up but almost instantly I got the following on my CentOS 7 terminal. ================================ [root at ovirt2 ~]# gluster volume heal isos full Broadcast message from systemd-journald at ovirt2.localdomain (Sun 2019-04-07 16:41:30 EEST): gluster_bricks-isos-isos[6884]: [2019-04-07 13:41:30.148365] M [MSGID: 113075] [posix-helpers.c:1957:posix_health_check_thread_proc] 0-isos-posix: health-check failed, going down Broadcast message from systemd-journald at ovirt2.localdomain (Sun 2019-04-07 16:41:30 EEST): gluster_bricks-isos-isos[6884]: [2019-04-07 13:41:30.148934] M [MSGID: 113075] [posix-helpers.c:1975:posix_health_check_thread_proc] 0-isos-posix: still alive! -> SIGTERM Message from syslogd at ovirt2 at Apr 7 16:41:30 ... gluster_bricks-isos-isos[6884]:[2019-04-07 13:41:30.148365] M [MSGID: 113075] [posix-helpers.c:1957:posix_health_check_thread_proc] 0-isos-posix: health-check failed, going down Message from syslogd at ovirt2 at Apr 7 16:41:30 ... gluster_bricks-isos-isos[6884]:[2019-04-07 13:41:30.148934] M [MSGID: 113075] [posix-helpers.c:1975:posix_health_check_thread_proc] 0-isos-posix: still alive! -> SIGTERM ================================ Restarting glusterd.service didn't help. How should I debug it ? Best Regards, Strahil Nikolov -------------- next part -------------- An HTML attachment was scrubbed... URL: From budic at onholyground.com Sun Apr 7 14:06:51 2019 From: budic at onholyground.com (Darrell Budic) Date: Sun, 7 Apr 2019 09:06:51 -0500 Subject: [Gluster-users] Cluster 5.5 brick constantly dies In-Reply-To: References: Message-ID: You?ve probably got multiple glusterfsd brick processes running. It?s possible to track them down and kill them from a shell, do a gluster vol status to see which one got registered last with glusterd, then ps -ax | grep glusterd | grep "< volume name>" and kill any extra one that are not the PID reported from vol status. And upgrade to gluster6, I?m not all the way through that process, but so far it seems to resolve that problem for me. > On Apr 7, 2019, at 8:48 AM, Strahil wrote: > > Hi, > > After a hardware maintenance (GPU removed) I have powered my oVirt node running gluster 5.5 and noticed that one volume has no running brick locally. > > After forcefully starting the volume, the brick is up but almost instantly I got the following on my CentOS 7 terminal. 
> ================================ > > [root at ovirt2 ~]# gluster volume heal isos full > Broadcast message from systemd-journald at ovirt2.localdomain (Sun 2019-04-07 16:41:30 EEST): > > gluster_bricks-isos-isos[6884]: [2019-04-07 13:41:30.148365] M [MSGID: 113075] [posix-helpers.c:1957:posix_health_check_thread_proc] 0-isos-posix: health-check failed, going down > > Broadcast message from systemd-journald at ovirt2.localdomain (Sun 2019-04-07 16:41:30 EEST): > > gluster_bricks-isos-isos[6884]: [2019-04-07 13:41:30.148934] M [MSGID: 113075] [posix-helpers.c:1975:posix_health_check_thread_proc] 0-isos-posix: still alive! -> SIGTERM > > Message from syslogd at ovirt2 at Apr 7 16:41:30 ... > gluster_bricks-isos-isos[6884]:[2019-04-07 13:41:30.148365] M [MSGID: 113075] [posix-helpers.c:1957:posix_health_check_thread_proc] 0-isos-posix: health-check failed, going down > > Message from syslogd at ovirt2 at Apr 7 16:41:30 ... > gluster_bricks-isos-isos[6884]:[2019-04-07 13:41:30.148934] M [MSGID: 113075] [posix-helpers.c:1975:posix_health_check_thread_proc] 0-isos-posix: still alive! -> SIGTERM > > ================================ > > Restarting glusterd.service didn't help. > How should I debug it ? > > Best Regards, > Strahil Nikolov > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Sun Apr 7 16:33:20 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Sun, 07 Apr 2019 19:33:20 +0300 Subject: [Gluster-users] Cluster 5.5 brick constantly dies Message-ID: Thanks Darrell, I just let it idle for 15 min and on the next start force didn't have the issue. I guess the process died and on the force start there were no more processes for this brick. I will have your words in mind next time. I'm planing to move to Gluster v6 , but as far as I know oVirt has some kind of integration with gluster and I'm not sure what will happen. For now 5.5 is way more stable than 5.3 and I will keep with the officially provided by oVirt. Best Regards, Strahil Nikolov On Apr 7, 2019 17:06, Darrell Budic wrote: > > You?ve probably got multiple glusterfsd brick processes running. It?s possible to track them down and kill them from a shell, do a gluster vol status to see which one got registered last with glusterd, then ps -ax | grep glusterd | grep "< volume name>" and kill any extra one that are not the PID reported from vol status.? > > And upgrade to gluster6, I?m not all the way through that process, but so far it seems to resolve that problem for me. > >> On Apr 7, 2019, at 8:48 AM, Strahil wrote: >> >> Hi, >> >> After a hardware maintenance (GPU removed)? I have powered my oVirt node running gluster 5.5 and noticed that one volume has no running brick locally. >> >> After forcefully starting the volume, the brick is up but almost instantly I got the following on my CentOS 7 terminal. 
>> ================================ >> >> [root at ovirt2 ~]# gluster volume heal isos full >> Broadcast message from systemd-journald at ovirt2.localdomain (Sun 2019-04-07 16:41:30 EEST): >> >> gluster_bricks-isos-isos[6884]: [2019-04-07 13:41:30.148365] M [MSGID: 113075] [posix-helpers.c:1957:posix_health_check_thread_proc] 0-isos-posix: health-check failed, going down >> >> Broadcast message from systemd-journald at ovirt2.localdomain (Sun 2019-04-07 16:41:30 EEST): >> >> gluster_bricks-isos-isos[6884]: [2019-04-07 13:41:30.148934] M [MSGID: 113075] [posix-helpers.c:1975:posix_health_check_thread_proc] 0-isos-posix: still alive! -> SIGTERM >> >> Message from syslogd at ovirt2 at Apr? 7 16:41:30 ... >> gluster_bricks-isos-isos[6884]:[2019-04-07 13:41:30.148365] M [MSGID: 113075] [posix-helpers.c:1957:posix_health_check_thread_proc] 0-isos-posix: health-check failed, going down >> >> Message from syslogd at ovirt2 at Apr? 7 16:41:30 ... >> gluster_bricks-isos-isos[6884]:[2019-04-07 13:41:30.148934] M [MSGID: 113075] [posix-helpers.c:1975:posix_health_check_thread_proc] 0-isos-posix: still alive! -> SIGTERM >> >> ================================ >> >> Restarting glusterd.service didn't help. >> How should I debug it ? >> >> Best Regards, >> Strahil Nikolov >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sabose at redhat.com Mon Apr 8 05:04:17 2019 From: sabose at redhat.com (Sahina Bose) Date: Mon, 8 Apr 2019 10:34:17 +0530 Subject: [Gluster-users] Gluster with KVM - VM migration In-Reply-To: References: Message-ID: On Thu, Apr 4, 2019 at 10:53 PM Jorge Crespo wrote: > > > Hi everyone, > > First message in this list, hope I can help out as much as I can. > > I was wondering if someone could point out any solution already working > or this would be a matter of scripting. > > We are using Gluster for a kind of strange infrastructure , where we > have let's say 2 NODES , 2 bricks each , 2 volumes total. And both > servers are mounting as clients both volumes. > > We exclusively use these volumes to run VM's. And the reason of the > infrastructure is to be able to LiveMigrate VM's from one node to the other. > > VM's defined and running in NODE 1, MV files in /gluster_gv1 , this > GlusterFS is also mounted in NODE2 ,but NODE2 doesn't make any real use > of it. > > VM's defined and running in NODE 2, MV files in /gluster_gv2 , as > before, this GlusterFS is also mounted in NODE1, but it doesn't make any > real use of it. > > So the question comes now: > > - Let's say we come to an scenario where NODE 1 comes down. I have the > VM's files copied to NODE2, I define them in NODE2 and start them , no > problem with that. > > - Now the NODE 1 comes back UP , I guess the safest solution should be > to have the VM's without Autostart so things don't go messy. But let's > imagine I want my system to know which VM's are started in NODE 2, and > start the ones that haven't been started in NODE 2. > > Is there any "official" way to achieve this? Basically achieve something > like vCenter where the cluster keeps track of where the VM's are running > at any given time, and also being able to start them in a different node > if their node goes down. > > If there is no "official" answer, I'd like to hear your opinions. 
Have you considered oVirt + Gluster hyperconverged solution (https://www.ovirt.org/documentation/gluster-hyperconverged/Gluster_Hyperconverged_Guide.html)? It addresses the HA problem that you are after. Let us know. > Cheers! > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From felix.koelzow at gmx.de Mon Apr 8 07:37:50 2019 From: felix.koelzow at gmx.de (=?UTF-8?Q?Felix_K=c3=b6lzow?=) Date: Mon, 8 Apr 2019 09:37:50 +0200 Subject: [Gluster-users] Gluster and LVM In-Reply-To: <9f4c439f-7893-4169-8d91-ebc41d22277d@netvel.net> References: <9f4c439f-7893-4169-8d91-ebc41d22277d@netvel.net> Message-ID: Thank you very much for your response. I fully agree that using LVM has great advantages. Maybe there is a misunderstanding, but I really got the recommendation to not use (normal) LVM in combination with gluster to increase the volume. *Maybe someone in the community has some good or bad experience* *using LVM and gluster in combination.* So please let me know :) > One of the arguments for things like Gluster and Ceph is that you can > many storage nodes that operate in parallel so that the ideal is a > very large number of small drive arrays over a small number of very > large drive arrays. I also agree we that. In our case, we actually plan to get Redhat Gluster Storage Support and an increase of storage nodes would mean an increase of support costs while the same amount of storage volume is available. So we are looking for a reasonable compromise. Felix On 03.04.19 17:12, Alvin Starr wrote: > As a general rule I always suggest using LVM. > I have had LVM save my career a few times. > I believe that if you wish to use Gluster snapshots then the > underlying system needs to be a thinly provisioned LVM volume. > > Adding storage space to an LVM is easy and all modern file-systems > support online growing so it is easy to grow a file-system. > > If you have directory trees that are very deep and wide then you may > want to put a bit of thought into how you configure your Gluster > installation. > We have a volume with about 50M files and something like an xfs dump > or rsync of the underlying filesystem will take close to a day but > copying the data over Gluster takes weeks. > This is a problem with all clustered file systems because there is > extra locking and co-ordination required for file operations. > > Also you need to realize that the performance of something like the > powervault is limited to the speed of the connection to your server. > So that a single SAS link is limited to 6Gb(for example) and so is > your disk array but most internal raid controllers will support the > number of ports * 6Gb. > This means that a computer with 12 drives in the front will access > disk faster than a system with a 12 drive disk array attached by a few > SAS links. > > One of the arguments for things like Gluster and Ceph is that you can > many storage nodes that operate in parallel so that the ideal is a > very large number of small drive arrays over a small number of very > large drive arrays. > > > On 4/3/19 10:20 AM, kbh-admin wrote: >> Hello Gluster-Community, >> >> >> we consider to build several Gluster-servers and have a question >> regarding? lvm and Glusterfs. >> >> >> Scenario 1: Snapshots >> >> Of course, taking snapshots is a good capability and we want to use >> lvm for that. 
>> >> >> Scenaraio 2: Increase Gluster volume >> >> We want to increase the Gluster volume by adding hdd's and/or by adding >> >> dell powervaults later. We got the recommendation to set up a new >> Gluster volume >> >> for the powervaults and don't use lvm in that case (lvresize ....) . >> >> >> What would you suggest and how do you manage both lvm and Glusterfs >> together? >> >> >> Thanks in advance. >> >> >> Felix >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at Gluster.org >> https://lists.Gluster.org/mailman/listinfo/Gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bb at kernelpanic.ru Mon Apr 8 10:49:02 2019 From: bb at kernelpanic.ru (Boris Zhmurov) Date: Mon, 8 Apr 2019 11:49:02 +0100 Subject: [Gluster-users] Gluster and LVM In-Reply-To: References: Message-ID: Hi Felix, On 03/04/2019 15:20, kbh-admin wrote: > Hello Gluster-Community, > > > we consider to build several gluster-servers and have a question > regarding? lvm and glusterfs. > > > Scenario 1: Snapshots > > Of course, taking snapshots is a good capability and we want to use > lvm for that. Please keep in mind, not just LVM, but LVM's "thin volumes" > Scenaraio 2: Increase gluster volume > > We want to increase the gluster volume by adding hdd's and/or by adding > > dell powervaults later. We got the recommendation to set up a new > gluster volume > > for the powervaults and don't use lvm in that case (lvresize ....) . > > > What would you suggest and how do you manage both lvm and glusterfs > together? > > > Thanks in advance. > > If you use LVM, it is quite simple just to increase the volume. When you add new HDDs, create new logical volumes on them, make the file system, and add it as another brick to your gluster volume (gluster add-brick volumename replica N ip.add.re.ss:/new-brick ... etc...). -- Kind regards, Boris Zhmurov mailto: bb at kernelpanic.ru From tomfite at gmail.com Mon Apr 8 13:01:08 2019 From: tomfite at gmail.com (Tom Fite) Date: Mon, 8 Apr 2019 09:01:08 -0400 Subject: [Gluster-users] Rsync in place of heal after brick failure In-Reply-To: References: Message-ID: Thanks for the idea, Poornima. Testing shows that xfsdump and xfsrestore is much faster than rsync since it handles small files much better. I don't have extra space to store the dumps but I was able to figure out how to pipe the xfsdump and restore via ssh. For anyone else that's interested: On source machine, run: xfsdump -J - /dev/mapper/[vg]-[brick] | ssh root@[destination fqdn] xfsrestore -J - [/path/to/brick] -Tom On Mon, Apr 1, 2019 at 9:56 PM Poornima Gurusiddaiah wrote: > You could also try xfsdump and xfsrestore if you brick filesystem is xfs > and the destination disk can be attached locally? This will be much faster. > > Regards, > Poornima > > On Tue, Apr 2, 2019, 12:05 AM Tom Fite wrote: > >> Hi all, >> >> I have a very large (65 TB) brick in a replica 2 volume that needs to be >> re-copied from scratch. A heal will take a very long time with performance >> degradation on the volume so I investigated using rsync to do the brunt of >> the work. >> >> The command: >> >> rsync -av -H -X --numeric-ids --progress server1:/data/brick1/gv0 >> /data/brick1/ >> >> Running with -H assures that the hard links in .glusterfs are preserved, >> and -X preserves all of gluster's extended attributes. >> >> I've tested this on my test environment as follows: >> >> 1. Stop glusterd and kill procs >> 2. Move brick volume to backup dir >> 3. 
Run rsync >> 4. Start glusterd >> 5. Observe gluster status >> >> All appears to be working correctly. Gluster status reports all bricks >> online, all data is accessible in the volume, and I don't see any errors in >> the logs. >> >> Anybody else have experience trying this? >> >> Thanks >> -Tom >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jorge.crespo at avature.net Mon Apr 8 13:08:46 2019 From: jorge.crespo at avature.net (Jorge Crespo) Date: Mon, 8 Apr 2019 15:08:46 +0200 Subject: [Gluster-users] Gluster with KVM - VM migration In-Reply-To: References: Message-ID: Thanks a lot for the suggestion. I will check it out and let you know. Cheers! El 8/4/19 a las 7:04, Sahina Bose escribi?: > On Thu, Apr 4, 2019 at 10:53 PM Jorge Crespo wrote: >> >> Hi everyone, >> >> First message in this list, hope I can help out as much as I can. >> >> I was wondering if someone could point out any solution already working >> or this would be a matter of scripting. >> >> We are using Gluster for a kind of strange infrastructure , where we >> have let's say 2 NODES , 2 bricks each , 2 volumes total. And both >> servers are mounting as clients both volumes. >> >> We exclusively use these volumes to run VM's. And the reason of the >> infrastructure is to be able to LiveMigrate VM's from one node to the other. >> >> VM's defined and running in NODE 1, MV files in /gluster_gv1 , this >> GlusterFS is also mounted in NODE2 ,but NODE2 doesn't make any real use >> of it. >> >> VM's defined and running in NODE 2, MV files in /gluster_gv2 , as >> before, this GlusterFS is also mounted in NODE1, but it doesn't make any >> real use of it. >> >> So the question comes now: >> >> - Let's say we come to an scenario where NODE 1 comes down. I have the >> VM's files copied to NODE2, I define them in NODE2 and start them , no >> problem with that. >> >> - Now the NODE 1 comes back UP , I guess the safest solution should be >> to have the VM's without Autostart so things don't go messy. But let's >> imagine I want my system to know which VM's are started in NODE 2, and >> start the ones that haven't been started in NODE 2. >> >> Is there any "official" way to achieve this? Basically achieve something >> like vCenter where the cluster keeps track of where the VM's are running >> at any given time, and also being able to start them in a different node >> if their node goes down. >> >> If there is no "official" answer, I'd like to hear your opinions. > Have you considered oVirt + Gluster hyperconverged solution > (https://www.ovirt.org/documentation/gluster-hyperconverged/Gluster_Hyperconverged_Guide.html)? > It addresses the HA problem that you are after. Let us know. > >> Cheers! >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 4012 bytes Desc: Firma criptogr??fica S/MIME URL: From rightkicktech at gmail.com Mon Apr 8 18:15:56 2019 From: rightkicktech at gmail.com (Alex K) Date: Mon, 8 Apr 2019 21:15:56 +0300 Subject: [Gluster-users] Gluster and LVM In-Reply-To: References: <9f4c439f-7893-4169-8d91-ebc41d22277d@netvel.net> Message-ID: I use gluster on top of lvm for several years without any issues. On Mon, Apr 8, 2019, 10:43 Felix K?lzow wrote: > Thank you very much for your response. > > I fully agree that using LVM has great advantages. Maybe there is a > misunderstanding, > > but I really got the recommendation to not use (normal) LVM in combination > with gluster to > > increase the volume. *Maybe someone in the community has some good or bad > experience* > > *using LVM and gluster in combination.* So please let me know :) > > > One of the arguments for things like Gluster and Ceph is that you can many > storage nodes that operate in parallel so that the ideal is a very large > number of small drive arrays over a small number of very large drive > arrays. > > I also agree we that. In our case, we actually plan to get Redhat Gluster > Storage Support and an increase of > > storage nodes would mean an increase of support costs while the same > amount of storage volume is available. > > So we are looking for a reasonable compromise. > > Felix > On 03.04.19 17:12, Alvin Starr wrote: > > As a general rule I always suggest using LVM. > I have had LVM save my career a few times. > I believe that if you wish to use Gluster snapshots then the underlying > system needs to be a thinly provisioned LVM volume. > > Adding storage space to an LVM is easy and all modern file-systems support > online growing so it is easy to grow a file-system. > > If you have directory trees that are very deep and wide then you may want > to put a bit of thought into how you configure your Gluster installation. > We have a volume with about 50M files and something like an xfs dump or > rsync of the underlying filesystem will take close to a day but copying the > data over Gluster takes weeks. > This is a problem with all clustered file systems because there is extra > locking and co-ordination required for file operations. > > Also you need to realize that the performance of something like the > powervault is limited to the speed of the connection to your server. > So that a single SAS link is limited to 6Gb(for example) and so is your > disk array but most internal raid controllers will support the number of > ports * 6Gb. > This means that a computer with 12 drives in the front will access disk > faster than a system with a 12 drive disk array attached by a few SAS > links. > > One of the arguments for things like Gluster and Ceph is that you can many > storage nodes that operate in parallel so that the ideal is a very large > number of small drive arrays over a small number of very large drive > arrays. > > > On 4/3/19 10:20 AM, kbh-admin wrote: > > Hello Gluster-Community, > > > we consider to build several Gluster-servers and have a question > regarding lvm and Glusterfs. > > > Scenario 1: Snapshots > > Of course, taking snapshots is a good capability and we want to use lvm > for that. > > > Scenaraio 2: Increase Gluster volume > > We want to increase the Gluster volume by adding hdd's and/or by adding > > dell powervaults later. We got the recommendation to set up a new Gluster > volume > > for the powervaults and don't use lvm in that case (lvresize ....) . 
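For the question above about growing an existing volume versus adding new storage as separate bricks, a minimal sketch; device, LV, path and volume names are all placeholders:

# Option 1: grow the LV under an existing brick and the XFS on top of it, online
vgextend gluster_vg /dev/sdX
lvextend -L +10T /dev/gluster_vg/brick1_lv
xfs_growfs /bricks/brick1

# Option 2: leave the existing bricks alone, add new ones and rebalance
# (replicated volumes need bricks added in multiples of the replica count)
gluster volume add-brick myvol server5:/bricks/brick2/data
gluster volume rebalance myvol start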
> > > What would you suggest and how do you manage both lvm and Glusterfs > together? > > > Thanks in advance. > > > Felix > > _______________________________________________ > Gluster-users mailing list > Gluster-users at Gluster.org > https://lists.Gluster.org/mailman/listinfo/Gluster-users > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Mon Apr 8 18:47:06 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Mon, 08 Apr 2019 21:47:06 +0300 Subject: [Gluster-users] Gluster and LVM Message-ID: Correct me if I'm wrong but thin LVM is needed for creation of snapshots. I am a new gluster user , but I don't see any LVM issues so far. Best Regards, Strahil NikolovOn Apr 8, 2019 21:15, Alex K wrote: > > I use gluster on top of lvm for several years without any issues.? > > On Mon, Apr 8, 2019, 10:43 Felix K?lzow wrote: >> >> Thank you very much for your response. >> >> I fully agree that using LVM has great advantages. Maybe there is a misunderstanding, >> >> but I really got the recommendation to not use (normal) LVM in combination with gluster to >> >> increase the volume. Maybe someone in the community has some good or bad experience >> >> using LVM and gluster in combination. So please let me know :) >> >> >>> One of the arguments for things like Gluster and Ceph is that you can many storage nodes that operate in parallel so that the ideal is a very large number of small drive arrays over a small number of very large drive arrays. >> >> I also agree we that. In our case, we actually plan to get Redhat Gluster Storage Support and an increase of >> >> storage nodes would mean an increase of support costs while the same amount of storage volume is available. >> >> So we are looking for a reasonable compromise. >> >> Felix >> >> On 03.04.19 17:12, Alvin Starr wrote: >>> >>> As a general rule I always suggest using LVM. >>> I have had LVM save my career a few times. >>> I believe that if you wish to use Gluster snapshots then the underlying system needs to be a thinly provisioned LVM volume. >>> >>> Adding storage space to an LVM is easy and all modern file-systems support online growing so it is easy to grow a file-system. >>> >>> If you have directory trees that are very deep and wide then you may want to put a bit of thought into how you configure your Gluster installation. >>> We have a volume with about 50M files and something like an xfs dump or rsync of the underlying filesystem will take close to a day but copying the data over Gluster takes weeks. >>> This is a problem with all clustered file systems because there is extra locking and co-ordination required for file operations. >>> >>> Also you need to realize that the performance of something like the powervault is limited to the speed of the connection to your server. >>> So that a single SAS link is limited to 6Gb(for example) and so is your disk array but most internal raid controllers will support the number of ports * 6Gb. >>> This means that a computer with 12 drives in the front will access disk faster than a system with a 12 drive disk array attached by a few SAS links. >>> >>> One of the arguments for things like Gluster and Ceph is that you can many storage nodes that operate in parallel so that the ideal is a very large number of small drive arrays over a small number of very large drive arrays. 
>>> >>> >>> On 4/3/19 10:20 AM, kbh-admin wrote: >>>> >>>> Hello Gluster-Community, >>>> >>>> >>>> we consider to build several Gluster-servers and have a question regarding? lvm and Glusterfs. >>>> >>>> >>>> Scenario 1: Snapshots >>>> >>>> Of course, taking snapshots is a good capability and we want to use lvm for that. >>>> >>>> >>>> Scenaraio 2: Increase Gluster volume >>>> >>>> We want to increase the Gluster volume by adding hdd's and/or by adding >>>> >>>> dell powervaults later. We got the recommendation to set up a new Gluster volume >>>> >>>> for the powervaults and don't use lvm in that case (lvresize ....) . >>>> >>>> >>>> What would you suggest and how do you manage both lvm and Glusterfs together? >>>> >>>> >>>> Thanks in advance. >>>> >>>> >>>> Felix >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list -------------- next part -------------- An HTML attachment was scrubbed... URL: From avishwan at redhat.com Tue Apr 9 04:10:59 2019 From: avishwan at redhat.com (Aravinda) Date: Tue, 09 Apr 2019 09:40:59 +0530 Subject: [Gluster-users] Rsync in place of heal after brick failure In-Reply-To: References: Message-ID: On Mon, 2019-04-08 at 09:01 -0400, Tom Fite wrote: > Thanks for the idea, Poornima. Testing shows that xfsdump and > xfsrestore is much faster than rsync since it handles small files > much better. I don't have extra space to store the dumps but I was > able to figure out how to pipe the xfsdump and restore via ssh. For > anyone else that's interested: > > On source machine, run: > > xfsdump -J - /dev/mapper/[vg]-[brick] | ssh root@[destination fqdn] > xfsrestore -J - [/path/to/brick] Nice. Thanks for sharing > > -Tom > > On Mon, Apr 1, 2019 at 9:56 PM Poornima Gurusiddaiah < > pgurusid at redhat.com> wrote: > > You could also try xfsdump and xfsrestore if you brick filesystem > > is xfs and the destination disk can be attached locally? This will > > be much faster. > > > > Regards, > > Poornima > > > > On Tue, Apr 2, 2019, 12:05 AM Tom Fite wrote: > > > Hi all, > > > > > > I have a very large (65 TB) brick in a replica 2 volume that > > > needs to be re-copied from scratch. A heal will take a very long > > > time with performance degradation on the volume so I investigated > > > using rsync to do the brunt of the work. > > > > > > The command: > > > > > > rsync -av -H -X --numeric-ids --progress server1:/data/brick1/gv0 > > > /data/brick1/ > > > > > > Running with -H assures that the hard links in .glusterfs are > > > preserved, and -X preserves all of gluster's extended attributes. > > > > > > I've tested this on my test environment as follows: > > > > > > 1. Stop glusterd and kill procs > > > 2. Move brick volume to backup dir > > > 3. Run rsync > > > 4. Start glusterd > > > 5. Observe gluster status > > > > > > All appears to be working correctly. Gluster status reports all > > > bricks online, all data is accessible in the volume, and I don't > > > see any errors in the logs. > > > > > > Anybody else have experience trying this? 
> > > > > > Thanks > > > -Tom > > > _______________________________________________ > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- regards Aravinda From pgurusid at redhat.com Tue Apr 9 04:23:02 2019 From: pgurusid at redhat.com (Poornima Gurusiddaiah) Date: Tue, 9 Apr 2019 09:53:02 +0530 Subject: [Gluster-users] Rsync in place of heal after brick failure In-Reply-To: References: Message-ID: On Mon, Apr 8, 2019, 6:31 PM Tom Fite wrote: > Thanks for the idea, Poornima. Testing shows that xfsdump and xfsrestore > is much faster than rsync since it handles small files much better. I don't > have extra space to store the dumps but I was able to figure out how to > pipe the xfsdump and restore via ssh. For anyone else that's interested: > > On source machine, run: > > xfsdump -J - /dev/mapper/[vg]-[brick] | ssh root@[destination fqdn] > xfsrestore -J - [/path/to/brick] > That's great. Is it possible for you to write a short summary on this in your blog or in the Gluster/blogs [1]? The summary would be very helpful for other users as well. If you could also include details on the approaches you explored and the time each would take for the 65 TB data. Thanks in advance. We will also see how we could incorporate this in replace brick/offline migration. [1] https://gluster.github.io/devblog/write-for-gluster Thanks, Poornima > -Tom > > On Mon, Apr 1, 2019 at 9:56 PM Poornima Gurusiddaiah > wrote: > >> You could also try xfsdump and xfsrestore if you brick filesystem is xfs >> and the destination disk can be attached locally? This will be much faster. >> >> Regards, >> Poornima >> >> On Tue, Apr 2, 2019, 12:05 AM Tom Fite wrote: >> >>> Hi all, >>> >>> I have a very large (65 TB) brick in a replica 2 volume that needs to be >>> re-copied from scratch. A heal will take a very long time with performance >>> degradation on the volume so I investigated using rsync to do the brunt of >>> the work. >>> >>> The command: >>> >>> rsync -av -H -X --numeric-ids --progress server1:/data/brick1/gv0 >>> /data/brick1/ >>> >>> Running with -H assures that the hard links in .glusterfs are preserved, >>> and -X preserves all of gluster's extended attributes. >>> >>> I've tested this on my test environment as follows: >>> >>> 1. Stop glusterd and kill procs >>> 2. Move brick volume to backup dir >>> 3. Run rsync >>> 4. Start glusterd >>> 5. Observe gluster status >>> >>> All appears to be working correctly. Gluster status reports all bricks >>> online, all data is accessible in the volume, and I don't see any errors in >>> the logs. >>> >>> Anybody else have experience trying this? >>> >>> Thanks >>> -Tom >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aspandey at redhat.com Tue Apr 9 05:23:24 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Tue, 9 Apr 2019 01:23:24 -0400 (EDT) Subject: [Gluster-users] Rsync in place of heal after brick failure In-Reply-To: References: Message-ID: <815457965.12666287.1554787404195.JavaMail.zimbra@redhat.com> ----- Original Message ----- From: "Poornima Gurusiddaiah" To: "Tom Fite" Cc: "Gluster-users" Sent: Tuesday, April 9, 2019 9:53:02 AM Subject: Re: [Gluster-users] Rsync in place of heal after brick failure On Mon, Apr 8, 2019, 6:31 PM Tom Fite < tomfite at gmail.com > wrote: Thanks for the idea, Poornima. Testing shows that xfsdump and xfsrestore is much faster than rsync since it handles small files much better. I don't have extra space to store the dumps but I was able to figure out how to pipe the xfsdump and restore via ssh. For anyone else that's interested: On source machine, run: xfsdump -J - /dev/mapper/[vg]-[brick] | ssh root@[destination fqdn] xfsrestore -J - [/path/to/brick] That's great. Is it possible for you to write a short summary on this in your blog or in the Gluster/blogs [1]? The summary would be very helpful for other users as well. If you could also include details on the approaches you explored and the time each would take for the 65 TB data. Thanks in advance. We will also see how we could incorporate this in replace brick/offline migration. [1] https://gluster.github.io/devblog/write-for-gluster Thanks, Poornima
-Tom On Mon, Apr 1, 2019 at 9:56 PM Poornima Gurusiddaiah < pgurusid at redhat.com > wrote:
You could also try xfsdump and xfsrestore if your brick filesystem is xfs and the destination disk can be attached locally? This will be much faster. Regards, Poornima On Tue, Apr 2, 2019, 12:05 AM Tom Fite < tomfite at gmail.com > wrote:
Hi all, I have a very large (65 TB) brick in a replica 2 volume that needs to be re-copied from scratch. A heal will take a very long time with performance degradation on the volume so I investigated using rsync to do the brunt of the work. The command: rsync -av -H -X --numeric-ids --progress server1:/data/brick1/gv0 /data/brick1/ Running with -H assures that the hard links in .glusterfs are preserved, and -X preserves all of gluster's extended attributes. I've tested this on my test environment as follows: 1. Stop glusterd and kill procs 2. Move brick volume to backup dir 3. Run rsync 4. Start glusterd 5. Observe gluster status
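A rough command-level version of the five steps above; the paths come from the mail, the volume name gv0 is assumed from the brick path, and this is a sketch rather than a tested procedure:

systemctl stop glusterd                  # 1. stop the management daemon
pkill glusterfsd; pkill glusterfs        #    ...and any leftover brick/client processes
mv /data/brick1/gv0 /data/brick1/gv0.bak # 2. move the old brick out of the way
rsync -av -H -X --numeric-ids --progress server1:/data/brick1/gv0 /data/brick1/   # 3. copy from the healthy replica
systemctl start glusterd                 # 4. bring gluster back up
gluster volume status                    # 5. check that all bricks are online
gluster volume heal gv0 info             #    and that nothing is pending heal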
Just want to add one step to quickly test this. You can kill the other brick (the one you did not touch) and then try to access your volume. This ensures that all file operations land on the new brick, so you can see whether everything is accessible.
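A sketch of that extra check, assuming the same volume (gv0, inferred from the brick path) and a FUSE mount on /mnt/gv0 (placeholder):

gluster volume status gv0        # find the PID of the brick that was NOT rebuilt
kill <pid-of-untouched-brick>    # stop it temporarily so all I/O hits the rebuilt brick
ls -lR /mnt/gv0 | head           # spot-check that data is still readable through the mount
md5sum /mnt/gv0/some/known/file
gluster volume start gv0 force   # bring the stopped brick back
gluster volume heal gv0 info     # and let self-heal settle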
All appears to be working correctly. Gluster status reports all bricks online, all data is accessible in the volume, and I don't see any errors in the logs. Anybody else have experience trying this? Thanks -Tom _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
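Since the point of -H and -X above is to keep the .glusterfs hard links and the gluster extended attributes intact, a quick spot check on the copied brick could look like this (the file path is a placeholder):

getfattr -d -m . -e hex /data/brick1/gv0/path/to/file   # gluster xattrs (trusted.gfid etc.) should match the source brick
stat -c '%h %n' /data/brick1/gv0/path/to/file           # a link count >= 2 means the .glusterfs hard link survived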
_______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From rightkicktech at gmail.com Tue Apr 9 05:26:18 2019 From: rightkicktech at gmail.com (Alex K) Date: Tue, 9 Apr 2019 08:26:18 +0300 Subject: [Gluster-users] Gluster and LVM In-Reply-To: References: Message-ID: On Mon, Apr 8, 2019, 21:47 Strahil wrote: > Correct me if I'm wrong but thin LVM is needed for creation of snapshots. > Yes, you need thin provisioned logical volumes for gluster snapshots. Actually, gluster snapshots are lvm snapshots under the hood. > I am a new gluster user , but I don't see any LVM issues so far. > Neither me > Best Regards, > Strahil Nikolov > On Apr 8, 2019 21:15, Alex K wrote: > > I use gluster on top of lvm for several years without any issues. > > On Mon, Apr 8, 2019, 10:43 Felix K?lzow wrote: > > Thank you very much for your response. > > I fully agree that using LVM has great advantages. Maybe there is a > misunderstanding, > > but I really got the recommendation to not use (normal) LVM in combination > with gluster to > > increase the volume. *Maybe someone in the community has some good or bad > experience* > > *using LVM and gluster in combination.* So please let me know :) > > > One of the arguments for things like Gluster and Ceph is that you can many > storage nodes that operate in parallel so that the ideal is a very large > number of small drive arrays over a small number of very large drive > arrays. > > I also agree we that. In our case, we actually plan to get Redhat Gluster > Storage Support and an increase of > > storage nodes would mean an increase of support costs while the same > amount of storage volume is available. > > So we are looking for a reasonable compromise. > > Felix > On 03.04.19 17:12, Alvin Starr wrote: > > As a general rule I always suggest using LVM. > I have had LVM save my career a few times. > I believe that if you wish to use Gluster snapshots then the underlying > system needs to be a thinly provisioned LVM volume. > > Adding storage space to an LVM is easy and all modern file-systems support > online growing so it is easy to grow a file-system. > > If you have directory trees that are very deep and wide then you may want > to put a bit of thought into how you configure your Gluster installation. > We have a volume with about 50M files and something like an xfs dump or > rsync of the underlying filesystem will take close to a day but copying the > data over Gluster takes weeks. > This is a problem with all clustered file systems because there is extra > locking and co-ordination required for file operations. > > Also you need to realize that the performance of something like the > powervault is limited to the speed of the connection to your server. > So that a single SAS link is limited to 6Gb(for example) and so is your > disk array but most internal raid controllers will support the number of > ports * 6Gb. > This means that a computer with 12 drives in the front will access disk > faster than a system with a 12 drive disk array attached by a few SAS > links. > > One of the arguments for things like Gluster and Ceph is that you can many > storage nodes that operate in parallel so that the ideal is a very large > number of small drive arrays over a small number of very large drive > arrays. 
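As noted above, gluster snapshots are LVM thin snapshots underneath, so a brick that should be snapshottable is normally created on a thin LV. A minimal sketch with made-up device, names and sizes:

pvcreate /dev/sdX
vgcreate gluster_vg /dev/sdX
lvcreate -L 1T -T gluster_vg/brick_pool                 # thin pool
lvcreate -V 1T -T gluster_vg/brick_pool -n brick1_lv    # thin LV for the brick
mkfs.xfs -i size=512 /dev/gluster_vg/brick1_lv
mkdir -p /bricks/brick1
mount /dev/gluster_vg/brick1_lv /bricks/brick1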
> > > On 4/3/19 10:20 AM, kbh-admin wrote: > > Hello Gluster-Community, > > > we consider to build several Gluster-servers and have a question > regarding lvm and Glusterfs. > > > Scenario 1: Snapshots > > Of course, taking snapshots is a good capability and we want to use lvm > for that. > > > Scenaraio 2: Increase Gluster volume > > We want to increase the Gluster volume by adding hdd's and/or by adding > > dell powervaults later. We got the recommendation to set up a new Gluster > volume > > for the powervaults and don't use lvm in that case (lvresize ....) . > > > What would you suggest and how do you manage both lvm and Glusterfs > together? > > > Thanks in advance. > > > Felix > > _______________________________________________ > Gluster-users mailing list > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Tue Apr 9 15:34:53 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Tue, 09 Apr 2019 18:34:53 +0300 Subject: [Gluster-users] Rsync in place of heal after brick failure Message-ID: Correct me if I'm wrong but I have been left with the impression that cluster heal is multi-process , multi-connection event and would benefit from a bonding like balance-alb. I don't have much experience with xfsdump, but it looks like a single process that uses single connection and thus only LACP can be beneficial. Am I wrong? Best Regards, Strahil NikolovOn Apr 9, 2019 07:10, Aravinda wrote: > > On Mon, 2019-04-08 at 09:01 -0400, Tom Fite wrote: > > Thanks for the idea, Poornima. Testing shows that xfsdump and > > xfsrestore is much faster than rsync since it handles small files > > much better. I don't have extra space to store the dumps but I was > > able to figure out how to pipe the xfsdump and restore via ssh. For > > anyone else that's interested: > > > > On source machine, run: > > > > xfsdump -J - /dev/mapper/[vg]-[brick] | ssh root@[destination fqdn] > > xfsrestore -J - [/path/to/brick] > > Nice. Thanks for sharing > > > > > -Tom > > > > On Mon, Apr 1, 2019 at 9:56 PM Poornima Gurusiddaiah < > > pgurusid at redhat.com> wrote: > > > You could also try xfsdump and xfsrestore if you brick filesystem > > > is xfs and the destination disk can be attached locally? This will > > > be much faster. > > > > > > Regards, > > > Poornima > > > > > > On Tue, Apr 2, 2019, 12:05 AM Tom Fite wrote: > > > > Hi all, > > > > > > > > I have a very large (65 TB) brick in a replica 2 volume that > > > > needs to be re-copied from scratch. A heal will take a very long > > > > time with performance degradation on the volume so I investigated > > > > using rsync to do the brunt of the work. > > > > > > > > The command: > > > > > > > > rsync -av -H -X --numeric-ids --progress server1:/data/brick1/gv0 > > > > /data/brick1/ > > > > > > > > Running with -H assures that the hard links in .glusterfs are > > > > preserved, and -X preserves all of gluster's extended attributes. > > > > > > > > I've tested this on my test environment as follows: > > > > > > > > 1. Stop glusterd and kill procs > > > > 2. Move brick volume to backup dir > > > > 3. Run rsync > > > > 4. Start glusterd > > > > 5. Observe gluster status > > > > > > > > All appears to be working correctly. Gluster status reports all > > > > bricks online, all data is accessible in the volume, and I don't > > > > see any errors in the logs. > > > > > > > > Anybody else have experience trying this? 
> > > > > > > > Thanks > > > > -Tom > > > > _______________________________________________ > > > > Gluster-users mailing list > > > > Gluster-users at gluster.org > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > -- > regards > Aravinda > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From alvin at netvel.net Tue Apr 9 17:32:41 2019 From: alvin at netvel.net (Alvin Starr) Date: Tue, 9 Apr 2019 13:32:41 -0400 Subject: [Gluster-users] Rsync in place of heal after brick failure In-Reply-To: References: Message-ID: The performance needs to be compared between the two in a real environment. For example I have a system where xfsdump takes something like 4 hours for a complete dump to /dev/null but a "find . -type f > /dev/null" takes well over a day. So it seems that xfsdump is very disk read efficient. Another thing to take into consideration is the latency. If the hosts are on the same lan then life is good but if the systems are milliseconds or more away from each other then you start getting side effects from BDP(bandwidth delay product) and this can quickly take a multi-gigabit link and turn it into a multi-megabit link. BBCP supports piping data into and out of the program allowing for better use of the available bandwidth. So that may be another way to get better performance out of multiple links or links with latency issues. On 4/9/19 11:34 AM, Strahil wrote: > Correct me if I'm wrong but I have been left with the impression that cluster heal is multi-process , multi-connection event and would benefit from a bonding like balance-alb. > > I don't have much experience with xfsdump, but it looks like a single process that uses single connection and thus only LACP can be beneficial. > > Am I wrong? > > Best Regards, > Strahil NikolovOn Apr 9, 2019 07:10, Aravinda wrote: >> On Mon, 2019-04-08 at 09:01 -0400, Tom Fite wrote: >>> Thanks for the idea, Poornima. Testing shows that xfsdump and >>> xfsrestore is much faster than rsync since it handles small files >>> much better. I don't have extra space to store the dumps but I was >>> able to figure out how to pipe the xfsdump and restore via ssh. For >>> anyone else that's interested: >>> >>> On source machine, run: >>> >>> xfsdump -J - /dev/mapper/[vg]-[brick] | ssh root@[destination fqdn] >>> xfsrestore -J - [/path/to/brick] >> Nice. Thanks for sharing >> >>> -Tom >>> >>> On Mon, Apr 1, 2019 at 9:56 PM Poornima Gurusiddaiah < >>> pgurusid at redhat.com> wrote: >>>> You could also try xfsdump and xfsrestore if you brick filesystem >>>> is xfs and the destination disk can be attached locally? This will >>>> be much faster. >>>> >>>> Regards, >>>> Poornima >>>> >>>> On Tue, Apr 2, 2019, 12:05 AM Tom Fite wrote: >>>>> Hi all, >>>>> >>>>> I have a very large (65 TB) brick in a replica 2 volume that >>>>> needs to be re-copied from scratch. A heal will take a very long >>>>> time with performance degradation on the volume so I investigated >>>>> using rsync to do the brunt of the work. 
>>>>>
>>>>> The command:
>>>>>
>>>>> rsync -av -H -X --numeric-ids --progress server1:/data/brick1/gv0 /data/brick1/
>>>>>
>>>>> Running with -H assures that the hard links in .glusterfs are preserved, and -X preserves all of gluster's extended attributes.
>>>>>
>>>>> I've tested this on my test environment as follows:
>>>>>
>>>>> 1. Stop glusterd and kill procs
>>>>> 2. Move brick volume to backup dir
>>>>> 3. Run rsync
>>>>> 4. Start glusterd
>>>>> 5. Observe gluster status
>>>>>
>>>>> All appears to be working correctly. Gluster status reports all bricks online, all data is accessible in the volume, and I don't see any errors in the logs.
>>>>>
>>>>> Anybody else have experience trying this?
>>>>>
>>>>> Thanks
>>>>> -Tom
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>> --
>> regards
>> Aravinda
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

--
Alvin Starr           ||   land: (905)513-7688
Netvel Inc.           ||   Cell: (416)806-0133
alvin at netvel.net      ||

From hunter86_bg at yahoo.com Tue Apr 9 21:06:39 2019
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Tue, 9 Apr 2019 21:06:39 +0000 (UTC)
Subject: [Gluster-users] Gluster snapshot fails
References: <1800297079.797563.1554843999336.ref@mail.yahoo.com>
Message-ID: <1800297079.797563.1554843999336@mail.yahoo.com>

Hello Community,

I have a problem running a snapshot of a replica 3 arbiter 1 volume.

Error:
[root at ovirt2 ~]# gluster snapshot create before-423 engine description "Before upgrade of engine from 4.2.2 to 4.2.3"
snapshot create: failed: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of engine are thinly provisioned LV.
Snapshot command failed

Volume info:

Volume Name: engine
Type: Replicate
Volume ID: 30ca1cc2-f2f7-4749-9e2e-cee9d7099ded
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1:/gluster_bricks/engine/engine
Brick2: ovirt2:/gluster_bricks/engine/engine
Brick3: ovirt3:/gluster_bricks/engine/engine (arbiter)
Options Reconfigured:
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
network.ping-timeout: 30
storage.owner-gid: 36
storage.owner-uid: 36
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.enable-shared-storage: enable

All bricks are on thin LVM with plenty of space. The only thing that could be causing it is that ovirt1 & ovirt2 are on /dev/gluster_vg_ssd/gluster_lv_engine, while the arbiter is on /dev/gluster_vg_sda3/gluster_lv_engine. Is that the issue ?
Should I rename my brick's VG ? If so, why is there no mention of this in the documentation ?

Best Regards,
Strahil Nikolov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
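One way to double-check the thin-LV requirement behind the error above is to look at the LVs on each node; the LV name comes from the message, and the brick mount point is assumed from the brick path:

lvs -o lv_name,vg_name,pool_lv,lv_attr | grep gluster_lv_engine   # a thin LV shows an attr starting with 'V' and names its pool
df -h /gluster_bricks/engine                                      # cross-check which device the brick is really mounted from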
From snowmailer at gmail.com Wed Apr 10 09:42:32 2019
From: snowmailer at gmail.com (Martin Toth)
Date: Wed, 10 Apr 2019 11:42:32 +0200
Subject: [Gluster-users] Replica 3 - how to replace failed node (peer)
Message-ID: <0917AF4A-76EC-4A9E-820F-E0ADA2DA899A@gmail.com>

Hi all,

I am running replica 3 gluster with 3 bricks. One of my servers failed - all disks are showing errors and raid is in fault state.

Type: Replicate
Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1.san:/tank/gluster/gv0imagestore/brick1
Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 is down
Brick3: node3.san:/tank/gluster/gv0imagestore/brick1

So one of my bricks is totally failed (node2). It went down and all data are lost (failed raid on node2). Now I am running only two bricks on 2 servers out from 3.
This is a really critical problem for us, we can lose all data. I want to add new disks to node2, create a new raid array on them and try to replace the failed brick on this node.

What is the procedure of replacing Brick2 on node2, can someone advise? I can't find anything relevant in the documentation.

Thanks in advance,
Martin

References: <0917AF4A-76EC-4A9E-820F-E0ADA2DA899A@gmail.com>
Message-ID: 

Hello Martin,

look here: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/pdf/administration_guide/Red_Hat_Gluster_Storage-3.4-Administration_Guide-en-US.pdf on page 324. There is a manual on how to replace a brick in case of a hardware failure.

Regards
David Spisla

On Wed, 10 Apr 2019 at 11:42, Martin Toth wrote:

> Hi all,
>
> I am running replica 3 gluster with 3 bricks. One of my servers failed - all disks are showing errors and raid is in fault state.
>
> Type: Replicate
> Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: node1.san:/tank/gluster/gv0imagestore/brick1
> Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 is down
> Brick3: node3.san:/tank/gluster/gv0imagestore/brick1
>
> So one of my bricks is totally failed (node2). It went down and all data are lost (failed raid on node2). Now I am running only two bricks on 2 servers out from 3.
> This is really critical problem for us, we can lost all data. I want to add new disks to node2, create new raid array on them and try to replace failed brick on this node.
>
> What is the procedure of replacing Brick2 on node2, can someone advice? I can't find anything relevant in documentation.
>
> Thanks in advance,
> Martin
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pascal.suter at dalco.ch Wed Apr 10 10:14:38 2019
From: pascal.suter at dalco.ch (Pascal Suter)
Date: Wed, 10 Apr 2019 12:14:38 +0200
Subject: [Gluster-users] performance - what can I expect
In-Reply-To: <381efa03-78b3-e244-9f52-054b357b5d57@dalco.ch>
References: <381efa03-78b3-e244-9f52-054b357b5d57@dalco.ch>
Message-ID: <8f150899-321b-f184-978c-9b7b01e6fb39@dalco.ch>

i continued my testing with 5 clients, all attached over 100Gbit/s omni-path via IP over IB. when i run the same iozone benchmark across all 5 clients where gluster is mounted using the glusterfs client, i get an aggregated write throughput of only about 400MB/s and an aggregated read throughput of 1.5GB/s. Each node was writing a single 200GB file in 16MB chunks and the files were distributed across all three bricks on the server. the connection was established over Omnipath for sure, as there is no other link between the nodes and the server.

i have no clue what i'm doing wrong here. i can't believe that this is a normal performance people would expect to see from gluster. i guess nobody would be using it if it was this slow.
again, when written dreictly to the xfs filesystem on the bricks, i get over 6GB/s read and write throughput using the same benchmark. any advise is appreciated cheers Pascal On 04.04.19 12:03, Pascal Suter wrote: > I just noticed i left the most important parameters out :) > > here's the write command with filesize and recordsize in it as well :) > > ./iozone -i 0 -t 1 -F /mnt/gluster/storage/thread1 -+n -c -C -e -I -w > -+S 0 -s 200G -r 16384k > > also i ran the benchmark without direct_io which resulted in an even > worse performance. > > i also tried to mount the gluster volume via nfs-ganesha which further > reduced throughput down to about 450MB/s > > if i run the iozone benchmark with 3 threads writing to all three > bricks directly (from the xfs filesystem) i get throughputs of around > 6GB/s .. if I run the same benchmark through gluster mounted locally > using the fuse client and with enough threads so that each brick gets > at least one file written to it, i end up seing throughputs around > 1.5GB/s .. that's a 4x decrease in performance. at it actually is the > same if i run the benchmark with less threads and files only get > written to two out of three bricks. > > cpu load on the server is around 25% by the way, nicely distributed > across all available cores. > > i can't believe that gluster should really be so slow and everybody is > just happily using it. any hints on what i'm doing wrong are very > welcome. > > i'm using gluster 6.0 by the way. > > regards > > Pascal > > On 03.04.19 12:28, Pascal Suter wrote: >> Hi all >> >> I am currently testing gluster on a single server. I have three >> bricks, each a hardware RAID6 volume with thin provisioned LVM that >> was aligned to the RAID and then formatted with xfs. >> >> i've created a distributed volume so that entire files get >> distributed across my three bricks. >> >> first I ran a iozone benchmark across each brick testing the read and >> write perofrmance of a single large file per brick >> >> i then mounted my gluster volume locally and ran another iozone run >> with the same parameters writing a single file. the file went to >> brick 1 which, when used driectly, would write with 2.3GB/s and read >> with 1.5GB/s. however, through gluster i got only 800MB/s read and >> 750MB/s write throughput >> >> another run with two processes each writing a file, where one file >> went to the first brick and the other file to the second brick (which >> by itself when directly accessed wrote at 2.8GB/s and read at >> 2.7GB/s) resulted in 1.2GB/s of aggregated write and also aggregated >> read throughput. >> >> Is this a normal performance i can expect out of a glusterfs or is it >> worth tuning in order to really get closer to the actual brick >> filesystem performance? >> >> here are the iozone commands i use for writing and reading.. 
note >> that i am using directIO in order to make sure i don't get fooled by >> cache :) >> >> ./iozone -i 0 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0 >> -s $filesize -r $recordsize > iozone-brick${b}-write.txt >> >> ./iozone -i 1 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0 >> -s $filesize -r $recordsize > iozone-brick${b}-read.txt >> >> cheers >> >> Pascal >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From rkavunga at redhat.com Wed Apr 10 10:16:25 2019 From: rkavunga at redhat.com (RAFI KC) Date: Wed, 10 Apr 2019 15:46:25 +0530 Subject: [Gluster-users] [Gluster-devel] Replica 3 - how to replace failed node (peer) In-Reply-To: References: <0917AF4A-76EC-4A9E-820F-E0ADA2DA899A@gmail.com> Message-ID: <3292fe0e-f164-43c0-f922-fa2176158749@redhat.com> reset brick is another way of replacing a brick. this usually helpful, when you want to replace the brick with same name. You can find the documentation here https://docs.gluster.org/en/latest/release-notes/3.9.0/#introducing-reset-brick-command. In your case, I think you can use replace brick. So you can initiate a reset-brick start, then you have to replace your failed disk and create new brick with same name . Once you have healthy disk and brick, you can commit the reset-brick. Let's know if you have any question, Rafi KC On 4/10/19 3:39 PM, David Spisla wrote: > Hello Martin, > > look here: > https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/pdf/administration_guide/Red_Hat_Gluster_Storage-3.4-Administration_Guide-en-US.pdf > on page 324. There is a manual how to replace a brick in case of a > hardware failure > > Regards > David Spisla > > Am Mi., 10. Apr. 2019 um 11:42?Uhr schrieb Martin Toth > >: > > Hi all, > > I am running replica 3 gluster with 3 bricks. One of my servers > failed - all disks are showing errors and raid is in fault state. > > Type: Replicate > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a > Status: Started > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 is down > Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 > > So one of my bricks is totally failed (node2). It went down and > all data are lost (failed raid on node2). Now I am running only > two bricks on 2 servers out from 3. > This is really critical problem for us, we can lost all data. I > want to add new disks to node2, create new raid array on them and > try to replace failed brick on this node. > > What is the procedure of replacing Brick2 on node2, can someone > advice? I can?t find anything relevant in documentation. > > Thanks in advance, > Martin > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ksubrahm at redhat.com Wed Apr 10 10:20:40 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Wed, 10 Apr 2019 15:50:40 +0530 Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) In-Reply-To: <0917AF4A-76EC-4A9E-820F-E0ADA2DA899A@gmail.com> References: <0917AF4A-76EC-4A9E-820F-E0ADA2DA899A@gmail.com> Message-ID: Hi Martin, After you add the new disks and creating raid array, you can run the following command to replace the old brick with new one: - If you are going to use a different name to the new brick you can run gluster volume replace-brick commit force - If you are planning to use the same name for the new brick as well then you can use gluster volume reset-brick commit force Here old-brick & new-brick's hostname & path should be same. After replacing the brick, make sure the brick comes online using volume status. Heal should automatically start, you can check the heal status to see all the files gets replicated to the newly added brick. If it does not start automatically, you can manually start that by running gluster volume heal . HTH, Karthik On Wed, Apr 10, 2019 at 3:13 PM Martin Toth wrote: > Hi all, > > I am running replica 3 gluster with 3 bricks. One of my servers failed - > all disks are showing errors and raid is in fault state. > > Type: Replicate > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a > Status: Started > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 > > So one of my bricks is totally failed (node2). It went down and all data > are lost (failed raid on node2). Now I am running only two bricks on 2 > servers out from 3. > This is really critical problem for us, we can lost all data. I want to > add new disks to node2, create new raid array on them and try to replace > failed brick on this node. > > What is the procedure of replacing Brick2 on node2, can someone advice? I > can?t find anything relevant in documentation. > > Thanks in advance, > Martin > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From snowmailer at gmail.com Wed Apr 10 10:34:37 2019 From: snowmailer at gmail.com (Martin Toth) Date: Wed, 10 Apr 2019 12:34:37 +0200 Subject: [Gluster-users] [External] Replica 3 - how to replace failed node (peer) In-Reply-To: References: <0917AF4A-76EC-4A9E-820F-E0ADA2DA899A@gmail.com> Message-ID: <804C3826-0173-431C-A286-085E7E582212@gmail.com> I?ve read this documentation but step 4 is really unclear to me. I don?t understand related mkdir/rmdir/setfattr and so on. Step 4: Using the gluster volume fuse mount (In this example: /mnt/r2) set up metadata so that data will be synced to new brick (In this case it is from Server1:/home/gfs/r2_1 to Server1:/home/gfs/r2_5) Why should I change trusted.non-existent-key on this volume? It is even more confusing because other mentioned howtos does not contain this step at all. BR, Martin > On 10 Apr 2019, at 11:54, Davide Obbi wrote: > > https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick > > On Wed, Apr 10, 2019 at 11:42 AM Martin Toth > wrote: > Hi all, > > I am running replica 3 gluster with 3 bricks. 
One of my servers failed - all disks are showing errors and raid is in fault state. > > Type: Replicate > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a > Status: Started > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 > > So one of my bricks is totally failed (node2). It went down and all data are lost (failed raid on node2). Now I am running only two bricks on 2 servers out from 3. > This is really critical problem for us, we can lost all data. I want to add new disks to node2, create new raid array on them and try to replace failed brick on this node. > > What is the procedure of replacing Brick2 on node2, can someone advice? I can?t find anything relevant in documentation. > > Thanks in advance, > Martin > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > -- > Davide Obbi > Senior System Administrator > > Booking.com B.V. > Vijzelstraat 66-80 Amsterdam 1017HL Netherlands > Direct +31207031558 > > Empowering people to experience the world since 1996 > 43 languages, 214+ offices worldwide, 141,000+ global destinations, 29 million reported listings > Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG) -------------- next part -------------- An HTML attachment was scrubbed... URL: From snowmailer at gmail.com Wed Apr 10 10:38:13 2019 From: snowmailer at gmail.com (Martin Toth) Date: Wed, 10 Apr 2019 12:38:13 +0200 Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) In-Reply-To: References: <0917AF4A-76EC-4A9E-820F-E0ADA2DA899A@gmail.com> Message-ID: <1634978A-E849-48DB-A160-B1AC3DB56D38@gmail.com> Thanks, this looks ok to me, I will reset brick because I don't have any data anymore on failed node so I can use same path / brick name. Is reseting brick dangerous command? Should I be worried about some possible failure that will impact remaining two nodes? I am running really old 3.7.6 but stable version. Thanks, BR! Martin > On 10 Apr 2019, at 12:20, Karthik Subrahmanya wrote: > > Hi Martin, > > After you add the new disks and creating raid array, you can run the following command to replace the old brick with new one: > > - If you are going to use a different name to the new brick you can run > gluster volume replace-brick commit force > > - If you are planning to use the same name for the new brick as well then you can use > gluster volume reset-brick commit force > Here old-brick & new-brick's hostname & path should be same. > > After replacing the brick, make sure the brick comes online using volume status. > Heal should automatically start, you can check the heal status to see all the files gets replicated to the newly added brick. If it does not start automatically, you can manually start that by running gluster volume heal . > > HTH, > Karthik > > On Wed, Apr 10, 2019 at 3:13 PM Martin Toth > wrote: > Hi all, > > I am running replica 3 gluster with 3 bricks. One of my servers failed - all disks are showing errors and raid is in fault state. 
> > Type: Replicate > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a > Status: Started > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 > > So one of my bricks is totally failed (node2). It went down and all data are lost (failed raid on node2). Now I am running only two bricks on 2 servers out from 3. > This is really critical problem for us, we can lost all data. I want to add new disks to node2, create new raid array on them and try to replace failed brick on this node. > > What is the procedure of replacing Brick2 on node2, can someone advice? I can?t find anything relevant in documentation. > > Thanks in advance, > Martin > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksubrahm at redhat.com Wed Apr 10 12:26:36 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Wed, 10 Apr 2019 17:56:36 +0530 Subject: [Gluster-users] [External] Replica 3 - how to replace failed node (peer) In-Reply-To: <804C3826-0173-431C-A286-085E7E582212@gmail.com> References: <0917AF4A-76EC-4A9E-820F-E0ADA2DA899A@gmail.com> <804C3826-0173-431C-A286-085E7E582212@gmail.com> Message-ID: Hi Martin, The reset-brick command is introduced in 3.9.0 and not present in 3.7.6. You can try using the same replace-brick command with the force option even if you want to use the same name for the brick being replaced. 3.7.6 is EOLed long back and glusterfs-6 is the latest version with lots of improvements, bug fixes and new features. The release schedule can be found at [1]. Upgrading to one of the maintained branch is highly recommended. On Wed, Apr 10, 2019 at 4:14 PM Martin Toth wrote: > I?ve read this documentation but step 4 is really unclear to me. I don?t > understand related mkdir/rmdir/setfattr and so on. > > Step 4: > > *Using the gluster volume fuse mount (In this example: /mnt/r2) set up > metadata so that data will be synced to new brick (In this case it is > from Server1:/home/gfs/r2_1 to Server1:/home/gfs/r2_5)* > > Why should I change trusted.non-existent-key on this volume? > It is even more confusing because other mentioned howtos does not contain > this step at all. > Those steps were needed in the older releases to set some metadata on the good bricks so that heal should not happen from the replaced brick to good bricks, which can lead to data loss. Since you are on 3.7.6, we have automated all these steps for you in that branch. You just need to run the replace-brick command, which will take care of all those things. [1] https://www.gluster.org/release-schedule/ Regards, Karthik > > BR, > Martin > > On 10 Apr 2019, at 11:54, Davide Obbi wrote: > > > https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick > > On Wed, Apr 10, 2019 at 11:42 AM Martin Toth wrote: > >> Hi all, >> >> I am running replica 3 gluster with 3 bricks. One of my servers failed - >> all disks are showing errors and raid is in fault state. 
>> >> Type: Replicate >> Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a >> Status: Started >> Number of Bricks: 1 x 3 = 3 >> Transport-type: tcp >> Bricks: >> Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 >> Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 > Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 >> >> So one of my bricks is totally failed (node2). It went down and all data >> are lost (failed raid on node2). Now I am running only two bricks on 2 >> servers out from 3. >> This is really critical problem for us, we can lost all data. I want to >> add new disks to node2, create new raid array on them and try to replace >> failed brick on this node. >> >> What is the procedure of replacing Brick2 on node2, can someone advice? I >> can?t find anything relevant in documentation. >> >> Thanks in advance, >> Martin >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Davide Obbi > Senior System Administrator > > Booking.com B.V. > Vijzelstraat 66-80 Amsterdam 1017HL Netherlands > Direct +31207031558 > [image: Booking.com] > Empowering people to experience the world since 1996 > 43 languages, 214+ offices worldwide, 141,000+ global destinations, 29 > million reported listings > Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG) > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkavunga at redhat.com Wed Apr 10 13:05:01 2019 From: rkavunga at redhat.com (Rafi Kavungal Chundattu Parambil) Date: Wed, 10 Apr 2019 09:05:01 -0400 (EDT) Subject: [Gluster-users] Gluster snapshot fails In-Reply-To: <1800297079.797563.1554843999336@mail.yahoo.com> References: <1800297079.797563.1554843999336.ref@mail.yahoo.com> <1800297079.797563.1554843999336@mail.yahoo.com> Message-ID: <1066182693.15102103.1554901501174.JavaMail.zimbra@redhat.com> Hi Strahil, The name of device is not at all a problem here. Can you please check the log of glusterd, and see if there is any useful information about the failure. Also please provide the output of `lvscan` and `lvs --noheadings -o pool_lv` from all nodes Regards Rafi KC ----- Original Message ----- From: "Strahil Nikolov" To: gluster-users at gluster.org Sent: Wednesday, April 10, 2019 2:36:39 AM Subject: [Gluster-users] Gluster snapshot fails Hello Community, I have a problem running a snapshot of a replica 3 arbiter 1 volume. Error: [root at ovirt2 ~]# gluster snapshot create before-423 engine description "Before upgrade of engine from 4.2.2 to 4.2.3" snapshot create: failed: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of engine are thinly provisioned LV. 
Snapshot command failed Volume info: Volume Name: engine Type: Replicate Volume ID: 30ca1cc2-f2f7-4749-9e2e-cee9d7099ded Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: ovirt1:/gluster_bricks/engine/engine Brick2: ovirt2:/gluster_bricks/engine/engine Brick3: ovirt3:/gluster_bricks/engine/engine (arbiter) Options Reconfigured: cluster.granular-entry-heal: enable performance.strict-o-direct: on network.ping-timeout: 30 storage.owner-gid: 36 storage.owner-uid: 36 user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet nfs.disable: on performance.client-io-threads: off cluster.enable-shared-storage: enable All bricks are on thin lvm with plenty of space, the only thing that could be causing it is that ovirt1 & ovirt2 are on /dev/gluster_vg_ssd/gluster_lv_engine , while arbiter is on /dev/gluster_vg_sda3/gluster_lv_engine. Is that the issue ? Should I rename my brick's VG ? If so, why there is no mentioning in the documentation ? Best Regards, Strahil Nikolov _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users From karim.roumani at tekreach.com Mon Apr 1 20:00:20 2019 From: karim.roumani at tekreach.com (Karim Roumani) Date: Mon, 1 Apr 2019 13:00:20 -0700 Subject: [Gluster-users] Help: gluster-block In-Reply-To: References: Message-ID: Thank you Prasanna for your quick response very much appreaciated we will review and get back to you. ? On Mon, Mar 25, 2019 at 9:00 AM Prasanna Kalever wrote: > [ adding +gluster-users for archive purpose ] > > On Sat, Mar 23, 2019 at 1:51 AM Jeffrey Chin > wrote: > > > > Hello Mr. Kalever, > > Hello Jeffrey, > > > > > I am currently working on a project to utilize GlusterFS for VMWare VMs. > In our research, we found that utilizing block devices with GlusterFS would > be the best approach for our use case (correct me if I am wrong). I saw the > gluster utility that you are a contributor for called gluster-block ( > https://github.com/gluster/gluster-block), and I had a question about the > configuration. From what I understand, gluster-block only works on the > servers that are serving the gluster volume. Would it be possible to run > the gluster-block utility on a client machine that has a gluster volume > mounted to it? > > Yes, that is right! At the moment gluster-block is coupled with > glusterd for simplicity. > But we have made some changes here [1] to provide a way to specify > server address (volfile-server) which is outside the gluster-blockd > node, please take a look. > > Although it is not complete solution, but it should at-least help for > some usecases. Feel free to raise an issue [2] with the details about > your usecase and etc or submit a PR by your self :-) > We never picked it, as we never have a usecase needing separation of > gluster-blockd and glusterd. > > > > > I also have another question: how do I make the iSCSI targets persist if > all of the gluster nodes were rebooted? It seems like once all of the nodes > reboot, I am unable to reconnect to the iSCSI targets created by the > gluster-block utility. 
> > do you mean rebooting iscsi initiator ? or gluster-block/gluster > target/server nodes ? > > 1. for initiator to automatically connect to block devices post > reboot, we need to make below changes in /etc/iscsi/iscsid.conf: > node.startup = automatic > > 2. if you mean, just in case if all the gluster nodes goes down, on > the initiator all the available HA path's will be down, but we still > want the IO to be queued on the initiator, until one of the path > (gluster node) is availabe: > > for this in gluster-block sepcific section of multipath.conf you need > to replace 'no_path_retry 120' as 'no_path_retry queue' > Note: refer README for current multipath.conf setting recommendations. > > [1] https://github.com/gluster/gluster-block/pull/161 > [2] https://github.com/gluster/gluster-block/issues/new > > BRs, > -- > Prasanna > -- Thank you, *Karim Roumani* Director of Technology Solutions TekReach Solutions / Albatross Cloud 714-916-5677 Karim.Roumani at tekreach.com Albatross.cloud - One Stop Cloud Solutions Portalfronthosting.com - Complete SharePoint Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From karim.roumani at tekreach.com Mon Apr 1 20:03:54 2019 From: karim.roumani at tekreach.com (Karim Roumani) Date: Mon, 1 Apr 2019 13:03:54 -0700 Subject: [Gluster-users] Help: gluster-block In-Reply-To: References: Message-ID: Actually we have a question. We did two tests as follows. Test 1 - iSCSI target on the glusterFS server Test 2 - iSCSI target on a separate server with gluster client Test 2 performed a read speed of <1GB/second while Test 1 about 300MB/second Any reason you see to why this may be the case? ? On Mon, Apr 1, 2019 at 1:00 PM Karim Roumani wrote: > Thank you Prasanna for your quick response very much appreaciated we will > review and get back to you. > ? > > On Mon, Mar 25, 2019 at 9:00 AM Prasanna Kalever > wrote: > >> [ adding +gluster-users for archive purpose ] >> >> On Sat, Mar 23, 2019 at 1:51 AM Jeffrey Chin >> wrote: >> > >> > Hello Mr. Kalever, >> >> Hello Jeffrey, >> >> > >> > I am currently working on a project to utilize GlusterFS for VMWare >> VMs. In our research, we found that utilizing block devices with GlusterFS >> would be the best approach for our use case (correct me if I am wrong). I >> saw the gluster utility that you are a contributor for called gluster-block >> (https://github.com/gluster/gluster-block), and I had a question about >> the configuration. From what I understand, gluster-block only works on the >> servers that are serving the gluster volume. Would it be possible to run >> the gluster-block utility on a client machine that has a gluster volume >> mounted to it? >> >> Yes, that is right! At the moment gluster-block is coupled with >> glusterd for simplicity. >> But we have made some changes here [1] to provide a way to specify >> server address (volfile-server) which is outside the gluster-blockd >> node, please take a look. >> >> Although it is not complete solution, but it should at-least help for >> some usecases. Feel free to raise an issue [2] with the details about >> your usecase and etc or submit a PR by your self :-) >> We never picked it, as we never have a usecase needing separation of >> gluster-blockd and glusterd. >> >> > >> > I also have another question: how do I make the iSCSI targets persist >> if all of the gluster nodes were rebooted? It seems like once all of the >> nodes reboot, I am unable to reconnect to the iSCSI targets created by the >> gluster-block utility. 
>> >> do you mean rebooting iscsi initiator ? or gluster-block/gluster >> target/server nodes ? >> >> 1. for initiator to automatically connect to block devices post >> reboot, we need to make below changes in /etc/iscsi/iscsid.conf: >> node.startup = automatic >> >> 2. if you mean, just in case if all the gluster nodes goes down, on >> the initiator all the available HA path's will be down, but we still >> want the IO to be queued on the initiator, until one of the path >> (gluster node) is availabe: >> >> for this in gluster-block sepcific section of multipath.conf you need >> to replace 'no_path_retry 120' as 'no_path_retry queue' >> Note: refer README for current multipath.conf setting recommendations. >> >> [1] https://github.com/gluster/gluster-block/pull/161 >> [2] https://github.com/gluster/gluster-block/issues/new >> >> BRs, >> -- >> Prasanna >> > > > -- > > Thank you, > > *Karim Roumani* > Director of Technology Solutions > > TekReach Solutions / Albatross Cloud > 714-916-5677 > Karim.Roumani at tekreach.com > Albatross.cloud - One Stop Cloud Solutions > Portalfronthosting.com - Complete > SharePoint Solutions > -- Thank you, *Karim Roumani* Director of Technology Solutions TekReach Solutions / Albatross Cloud 714-916-5677 Karim.Roumani at tekreach.com Albatross.cloud - One Stop Cloud Solutions Portalfronthosting.com - Complete SharePoint Solutions -------------- next part -------------- An HTML attachment was scrubbed... URL: From ingo at fischer-ka.de Tue Apr 2 20:23:34 2019 From: ingo at fischer-ka.de (Ingo Fischer) Date: Tue, 2 Apr 2019 22:23:34 +0200 Subject: [Gluster-users] Is "replica 4 arbiter 1" allowed to tweak client-quorum? Message-ID: <39133cb5-f3fe-4fd6-2dba-45058cdb5f1a@fischer-ka.de> Hi All, I had a replica 2 cluster to host my VM images from my Proxmox cluster. I got a bit around split brain scenarios by using "nufa" to make sure the files are located on the host where the machine also runs normally. So in fact one replica could fail and I still had the VM working. But then I thought about doing better and decided to add a node to increase replica and I decided against arbiter approach. During this I also decided to go away from nufa to make it a more normal approach. But in fact by adding the third replica and removing nufa I'm not really better on availability - only split-brain-chance. I'm still at the point that only one node is allowed to fail because else the now active client quorum is no longer met and FS goes read only (which in fact is not really better then failing completely as it was before). So I thought about adding arbiter bricks as "kind of 4th replica (but without space needs) ... but then I read in docs that only "replica 3 arbiter 1" is allowed as combination. Is this still true? If docs are true: Why arbiter is not allowed for higher replica counts? It would allow to improve on client quorum in my understanding. Thank you for your opinion and/or facts :-) Ingo From olaf.buitelaar at gmail.com Tue Apr 2 15:48:07 2019 From: olaf.buitelaar at gmail.com (Olaf Buitelaar) Date: Tue, 2 Apr 2019 17:48:07 +0200 Subject: [Gluster-users] [ovirt-users] Re: Announcing Gluster release 5.5 In-Reply-To: References: <20190328164716.27693.35887@mail.ovirt.org> Message-ID: Dear Krutika, 1. I've changed the volume settings, write performance seems to increased somewhat, however the profile doesn't really support that since latencies increased. However read performance has diminished, which does seem to be supported by the profile runs (attached). 
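For completeness, a minimal sketch of how the applied values can be double-checked per volume after such a change; VOLNAME is a placeholder, and the options are the ones discussed below:

gluster volume get VOLNAME network.remote-dio
gluster volume get VOLNAME performance.strict-o-direct
gluster volume get VOLNAME cluster.choose-local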
Also the IO does seem to behave more consistent than before. I don't really understand the idea behind them, maybe you can explain why these suggestions are good? These settings seems to avoid as much local caching and access as possible and push everything to the gluster processes. While i would expect local access and local caches are a good thing, since it would lead to having less network access or disk access. I tried to investigate these settings a bit more, and this is what i understood of them; - network.remote-dio; when on it seems to ignore the O_DIRECT flag in the client, thus causing the files to be cached and buffered in the page cache on the client, i would expect this to be a good thing especially if the server process would access the same page cache? At least that is what grasp from this commit; https://review.gluster.org/#/c/glusterfs/+/4206/2/xlators/protocol/client/src/client.c line 867 Also found this commit; https://github.com/gluster/glusterfs/commit/06c4ba589102bf92c58cd9fba5c60064bc7a504e#diff-938709e499b4383c3ed33c3979b9080c suggesting remote-dio actually improves performance, not sure it's a write or read benchmark When a file is opened with O_DIRECT it will also disable the write-behind functionality - performance.strict-o-direct: when on, the AFR, will not ignore the O_DIRECT flag. and will invoke: fop_writev_stub with the wb_writev_helper, which seems to stack the operation, no idea why that is. But generally i suppose not ignoring the O_DIRECT flag in the AFR is a good thing, when a processes requests to have O_DIRECT. So this makes sense to me. - cluster.choose-local: when off, it doesn't prefer the local node, but would always choose a brick. Since it's a 9 node cluster, with 3 subvolumes, only a 1/3 could end-up local, and the other 2/3 should be pushed to external nodes anyway. Or am I making the total wrong assumption here? It seems to this config is moving to the gluster-block config side of things, which does make sense. Since we're running quite some mysql instances, which opens the files with O_DIRECt i believe, it would mean the only layer of cache is within mysql it self. Which you could argue is a good thing. But i would expect a little of write-behind buffer, and maybe some of the data cached within gluster would alleviate things a bit on gluster's side. But i wouldn't know if that's the correct mind set, and so might be totally off here. Also i would expect these gluster v set command to be online operations, but somehow the bricks went down, after applying these changes. What appears to have happened is that after the update the brick process was restarted, but due to multiple brick process start issue, multiple processes were started, and the brick didn't came online again. However i'll try to reproduce this, since i would like to test with cluster.choose-local: on, and see how performance compares. And hopefully when it occurs collect some useful info. Question; are network.remote-dio and performance.strict-o-direct mutually exclusive settings, or can they both be on? 2. 
I've attached all brick logs, the only thing relevant i found was; [2019-03-28 20:20:07.170452] I [MSGID: 113030] [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: open-fd-key-status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 [2019-03-28 20:20:07.170491] I [MSGID: 113031] [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 [2019-03-28 20:20:07.248480] I [MSGID: 113030] [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: open-fd-key-status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 [2019-03-28 20:20:07.248491] I [MSGID: 113031] [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.109886 Thanks Olaf ps. sorry needed to resend since it exceed the file limit Op ma 1 apr. 2019 om 07:56 schreef Krutika Dhananjay : > Adding back gluster-users > Comments inline ... > > On Fri, Mar 29, 2019 at 8:11 PM Olaf Buitelaar > wrote: > >> Dear Krutika, >> >> >> >> 1. I?ve made 2 profile runs of around 10 minutes (see files >> profile_data.txt and profile_data2.txt). Looking at it, most time seems be >> spent at the fop?s fsync and readdirp. >> >> Unfortunate I don?t have the profile info for the 3.12.15 version so it?s >> a bit hard to compare. >> >> One additional thing I do notice on 1 machine (10.32.9.5) the iowait time >> increased a lot, from an average below the 1% it?s now around the 12% after >> the upgrade. >> >> So first suspicion with be lighting strikes twice, and I?ve also just now >> a bad disk, but that doesn?t appear to be the case, since all smart status >> report ok. >> >> Also dd shows performance I would more or less expect; >> >> dd if=/dev/zero of=/data/test_file bs=100M count=1 oflag=dsync >> >> 1+0 records in >> >> 1+0 records out >> >> 104857600 bytes (105 MB) copied, 0.686088 s, 153 MB/s >> >> dd if=/dev/zero of=/data/test_file bs=1G count=1 oflag=dsync >> >> 1+0 records in >> >> 1+0 records out >> >> 1073741824 bytes (1.1 GB) copied, 7.61138 s, 141 MB/s >> >> if=/dev/urandom of=/data/test_file bs=1024 count=1000000 >> >> 1000000+0 records in >> >> 1000000+0 records out >> >> 1024000000 bytes (1.0 GB) copied, 6.35051 s, 161 MB/s >> >> dd if=/dev/zero of=/data/test_file bs=1024 count=1000000 >> >> 1000000+0 records in >> >> 1000000+0 records out >> >> 1024000000 bytes (1.0 GB) copied, 1.6899 s, 606 MB/s >> >> When I disable this brick (service glusterd stop; pkill glusterfsd) >> performance in gluster is better, but not on par with what it was. Also the >> cpu usages on the ?neighbor? nodes which hosts the other bricks in the same >> subvolume increases quite a lot in this case, which I wouldn?t expect >> actually since they shouldn't handle much more work, except flagging shards >> to heal. Iowait also goes to idle once gluster is stopped, so it?s for >> sure gluster which waits for io. >> >> >> > > So I see that FSYNC %-latency is on the higher side. And I also noticed > you don't have direct-io options enabled on the volume. > Could you set the following options on the volume - > # gluster volume set network.remote-dio off > # gluster volume set performance.strict-o-direct on > and also disable choose-local > # gluster volume set cluster.choose-local off > > let me know if this helps. > > 2. 
I?ve attached the mnt log and volume info, but I couldn?t find anything >> relevant in in those logs. I think this is because we run the VM?s with >> libgfapi; >> >> [root at ovirt-host-01 ~]# engine-config -g LibgfApiSupported >> >> LibgfApiSupported: true version: 4.2 >> >> LibgfApiSupported: true version: 4.1 >> >> LibgfApiSupported: true version: 4.3 >> >> And I can confirm the qemu process is invoked with the gluster:// address >> for the images. >> >> The message is logged in the /var/lib/libvert/qemu/ file, which >> I?ve also included. For a sample case see around; 2019-03-28 20:20:07 >> >> Which has the error; E [MSGID: 133010] >> [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on >> shard 109886 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c >> [Stale file handle] >> > > Could you also attach the brick logs for this volume? > > >> >> 3. yes I see multiple instances for the same brick directory, like; >> >> /usr/sbin/glusterfsd -s 10.32.9.6 --volfile-id >> ovirt-core.10.32.9.6.data-gfs-bricks-brick1-ovirt-core -p >> /var/run/gluster/vols/ovirt-core/10.32.9.6-data-gfs-bricks-brick1-ovirt-core.pid >> -S /var/run/gluster/452591c9165945d9.socket --brick-name >> /data/gfs/bricks/brick1/ovirt-core -l >> /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log >> --xlator-option *-posix.glusterd-uuid=fb513da6-f3bd-4571-b8a2-db5efaf60cc1 >> --process-name brick --brick-port 49154 --xlator-option >> ovirt-core-server.listen-port=49154 >> >> >> >> I?ve made an export of the output of ps from the time I observed these >> multiple processes. >> >> In addition the brick_mux bug as noted by Atin. I might also have another >> possible cause, as ovirt moves nodes from none-operational state or >> maintenance state to active/activating, it also seems to restart gluster, >> however I don?t have direct proof for this theory. >> >> >> > > +Atin Mukherjee ^^ > +Mohit Agrawal ^^ > > -Krutika > > Thanks Olaf >> >> Op vr 29 mrt. 2019 om 10:03 schreef Sandro Bonazzola > >: >> >>> >>> >>> Il giorno gio 28 mar 2019 alle ore 17:48 ha >>> scritto: >>> >>>> Dear All, >>>> >>>> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While >>>> previous upgrades from 4.1 to 4.2 etc. went rather smooth, this one was a >>>> different experience. After first trying a test upgrade on a 3 node setup, >>>> which went fine. i headed to upgrade the 9 node production platform, >>>> unaware of the backward compatibility issues between gluster 3.12.15 -> >>>> 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start. >>>> Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata >>>> was missing or couldn't be accessed. Restoring this file by getting a good >>>> copy of the underlying bricks, removing the file from the underlying bricks >>>> where the file was 0 bytes and mark with the stickybit, and the >>>> corresponding gfid's. Removing the file from the mount point, and copying >>>> back the file on the mount point. Manually mounting the engine domain, and >>>> manually creating the corresponding symbolic links in /rhev/data-center and >>>> /var/run/vdsm/storage and fixing the ownership back to vdsm.kvm (which was >>>> root.root), i was able to start the HA engine again. Since the engine was >>>> up again, and things seemed rather unstable i decided to continue the >>>> upgrade on the other nodes suspecting an incompatibility in gluster >>>> versions, i thought would be best to have them all on the same version >>>> rather soonish. 
However things went from bad to worse, the engine stopped >>>> again, and all vm?s stopped working as well. So on a machine outside the >>>> setup and restored a backup of the engine taken from version 4.2.8 just >>>> before the upgrade. With this engine I was at least able to start some vm?s >>>> again, and finalize the upgrade. Once the upgraded, things didn?t stabilize >>>> and also lose 2 vm?s during the process due to image corruption. After >>>> figuring out gluster 5.3 had quite some issues I was as lucky to see >>>> gluster 5.5 was about to be released, on the moment the RPM?s were >>>> available I?ve installed those. This helped a lot in terms of stability, >>>> for which I?m very grateful! However the performance is unfortunate >>>> terrible, it?s about 15% of what the performance was running gluster >>>> 3.12.15. It?s strange since a simple dd shows ok performance, but our >>>> actual workload doesn?t. While I would expect the performance to be better, >>>> due to all improvements made since gluster version 3.12. Does anybody share >>>> the same experience? >>>> I really hope gluster 6 will soon be tested with ovirt and released, >>>> and things start to perform and stabilize again..like the good old days. Of >>>> course when I can do anything, I?m happy to help. >>>> >>> >>> Opened https://bugzilla.redhat.com/show_bug.cgi?id=1693998 to track the >>> rebase on Gluster 6. >>> >>> >>> >>>> >>>> I think the following short list of issues we have after the migration; >>>> Gluster 5.5; >>>> - Poor performance for our workload (mostly write dependent) >>>> - VM?s randomly pause on unknown storage errors, which are ?stale >>>> file?s?. corresponding log; Lookup on shard 797 failed. Base file gfid = >>>> 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle] >>>> - Some files are listed twice in a directory (probably related >>>> the stale file issue?) >>>> Example; >>>> ls -la >>>> /rhev/data-center/59cd53a9-0003-02d7-00eb-0000000001e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/ >>>> total 3081 >>>> drwxr-x---. 2 vdsm kvm 4096 Mar 18 11:34 . >>>> drwxr-xr-x. 13 vdsm kvm 4096 Mar 19 09:42 .. >>>> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 >>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c >>>> -rw-rw----. 1 vdsm kvm 1048576 Mar 28 12:55 >>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c >>>> -rw-rw----. 1 vdsm kvm 1048576 Jan 27 2018 >>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease >>>> -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 >>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta >>>> -rw-r--r--. 1 vdsm kvm 290 Jan 27 2018 >>>> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta >>>> >>>> - brick processes sometimes starts multiple times. Sometimes I?ve 5 >>>> brick processes for a single volume. Killing all glusterfsd?s for the >>>> volume on the machine and running gluster v start force usually just >>>> starts one after the event, from then on things look all right. >>>> >>>> >>> May I kindly ask to open bugs on Gluster for above issues at >>> https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS ? >>> Sahina? >>> >>> >>>> Ovirt 4.3.2.1-1.el7 >>>> - All vms images ownership are changed to root.root after the vm >>>> is shutdown, probably related to; >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1666795 but not only >>>> scoped to the HA engine. I?m still in compatibility mode 4.2 for the >>>> cluster and for the vm?s, but upgraded to version ovirt 4.3.2 >>>> >>> >>> Ryan? 
>>> >>> >>>> - The network provider is set to ovn, which is fine..actually >>>> cool, only the ?ovs-vswitchd? is a CPU hog, and utilizes 100% >>>> >>> >>> Miguel? Dominik? >>> >>> >>>> - It seems on all nodes vdsm tries to get the the stats for the >>>> HA engine, which is filling the logs with (not sure if this is new); >>>> [api.virt] FINISH getStats return={'status': {'message': "Virtual >>>> machine does not exist: {'vmId': u'20d69acd-edfd-4aeb-a2ae-49e9c121b7e9'}", >>>> 'code': 1}} from=::1,59290, vmId=20d69acd-edfd-4aeb-a2ae-49e9c121b7e9 >>>> (api:54) >>>> >>> >>> Simone? >>> >>> >>>> - It seems the package os_brick [root] managedvolume not >>>> supported: Managed Volume Not Supported. Missing package os-brick.: >>>> ('Cannot import os_brick',) (caps:149) which fills the vdsm.log, but for >>>> this I also saw another message, so I suspect this will already be resolved >>>> shortly >>>> - The machine I used to run the backup HA engine, doesn?t want to >>>> get removed from the hosted-engine ?vm-status, not even after running; >>>> hosted-engine --clean-metadata --host-id=10 --force-clean or hosted-engine >>>> --clean-metadata --force-clean from the machine itself. >>>> >>> >>> Simone? >>> >>> >>>> >>>> Think that's about it. >>>> >>>> Don?t get me wrong, I don?t want to rant, I just wanted to share my >>>> experience and see where things can made better. >>>> >>> >>> If not already done, can you please open bugs for above issues at >>> https://bugzilla.redhat.com/enter_bug.cgi?classification=oVirt ? >>> >>> >>>> >>>> >>>> Best Olaf >>>> _______________________________________________ >>>> Users mailing list -- users at ovirt.org >>>> To unsubscribe send an email to users-leave at ovirt.org >>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >>>> oVirt Code of Conduct: >>>> https://www.ovirt.org/community/about/community-guidelines/ >>>> List Archives: >>>> https://lists.ovirt.org/archives/list/users at ovirt.org/message/3CO35Q7VZMWNHS4LPUJNO7S47MGLSKS5/ >>>> >>> >>> >>> -- >>> >>> SANDRO BONAZZOLA >>> >>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV >>> >>> Red Hat EMEA >>> >>> sbonazzo at redhat.com >>> >>> >> _______________________________________________ >> Users mailing list -- users at ovirt.org >> To unsubscribe send an email to users-leave at ovirt.org >> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ >> oVirt Code of Conduct: >> https://www.ovirt.org/community/about/community-guidelines/ >> List Archives: >> https://lists.ovirt.org/archives/list/users at ovirt.org/message/HAGTA64LF7LLE6YMHQ6DLT26MD2GZ2PK/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- Brick: 10.32.9.9:/data0/gfs/bricks/brick1/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 11 200 9 No. of Writes: 2 31538 326701 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 22 319528 527228 No. of Writes: 53880 1409021 1140345 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 27747 3229 120201 No. of Writes: 479690 114939 144204 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 209766 7725 43 No. of Writes: 105320 165416 8915 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 6728 RELEASE 0.00 0.00 us 0.00 us 0.00 us 42179 RELEASEDIR 0.01 44.17 us 1.07 us 1288.76 us 2914 OPENDIR 0.02 697.13 us 42.15 us 5689.21 us 322 OPEN 0.02 411.08 us 8.60 us 5405.25 us 606 GETXATTR 0.02 1209.66 us 147.78 us 3219.56 us 234 READDIRP 0.03 38.80 us 19.08 us 7544.91 us 7757 STATFS 0.04 826.28 us 13.79 us 3583.18 us 616 READDIR 0.07 61.83 us 15.94 us 131142.59 us 13989 FSTAT 2.03 137.78 us 48.36 us 235353.97 us 172712 FXATTROP 2.16 983.89 us 10.19 us 660025.30 us 25674 LOOKUP 2.90 406.99 us 36.68 us 756289.17 us 83397 FSYNC 4.63 67941.30 us 13.93 us 1840271.15 us 798 INODELK 7.81 576.74 us 75.16 us 422586.52 us 158680 WRITE 40.09 2713.33 us 11.70 us 1850709.72 us 173111 FINODELK 40.16 3587.78 us 72.64 us 729965.74 us 131143 READ Duration: 58768 seconds Data Read: 45226370705 bytes Data Written: 133611506006 bytes Interval 9 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 394 387 86 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 141 1093 13 No. of Writes: 5905 10055 2308 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 15 515 2595 No. of Writes: 763 1465 1637 Block Size: 262144b+ 524288b+ No. of Reads: 2 0 No. of Writes: 2759 73 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 70 RELEASE 0.00 0.00 us 0.00 us 0.00 us 503 RELEASEDIR 0.00 172.94 us 46.56 us 620.23 us 70 OPEN 0.01 153.42 us 11.38 us 855.47 us 111 GETXATTR 0.01 49.63 us 1.23 us 1288.76 us 503 OPENDIR 0.01 434.10 us 27.72 us 2015.25 us 88 READDIR 0.02 1208.37 us 152.54 us 2434.77 us 46 READDIRP 0.02 43.13 us 20.02 us 2030.66 us 1361 STATFS 0.04 45.66 us 18.57 us 284.28 us 2431 FSTAT 1.20 154.41 us 75.97 us 84525.06 us 23005 FXATTROP 2.86 1865.08 us 14.26 us 212498.60 us 4518 LOOKUP 3.78 1006.27 us 38.86 us 756289.17 us 11072 FSYNC 4.27 60261.87 us 17.32 us 1437527.90 us 209 INODELK 8.19 935.38 us 76.82 us 422586.52 us 25832 WRITE 20.67 13949.32 us 89.67 us 707765.19 us 4374 READ 58.93 7494.13 us 12.88 us 1607033.18 us 23206 FINODELK Duration: 740 seconds Data Read: 385507328 bytes Data Written: 1776420864 bytes Brick: 10.32.9.5:/data/gfs/bricks/brick1/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 2 458 87 No. of Writes: 3 4507 33740 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 54 34013 110867 No. of Writes: 6056 341153 234627 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 7430 587 28255 No. of Writes: 70451 12767 34177 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 2417 6164 15 No. of Writes: 40925 27615 4342 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 49432 RELEASE 0.00 0.00 us 0.00 us 0.00 us 40899 RELEASEDIR 0.00 158.97 us 158.97 us 158.97 us 1 MKNOD 0.00 393.70 us 9.69 us 2344.17 us 8 ENTRYLK 0.00 129.60 us 10.54 us 296.61 us 112 READDIR 0.00 1299.61 us 6.43 us 155911.98 us 125 GETXATTR 0.00 3928.24 us 139.26 us 240788.91 us 236 READDIRP 0.03 3784.28 us 15.61 us 469284.63 us 1686 FSTAT 0.04 2368.24 us 28.06 us 242169.67 us 3623 OPEN 0.05 2811.93 us 8.13 us 1250845.84 us 3381 FLUSH 0.06 4385.28 us 0.80 us 527903.92 us 2653 OPENDIR 0.09 2315.69 us 11.48 us 816339.95 us 7750 STATFS 0.18 55337.88 us 8.34 us 1543417.83 us 648 INODELK 0.37 1462.23 us 6.84 us 1127299.99 us 49902 FINODELK 0.57 3924.78 us 11.60 us 968588.21 us 28256 LOOKUP 1.91 7500.40 us 53.88 us 2738720.92 us 49870 FXATTROP 2.21 30153.49 us 63.31 us 3473303.89 us 14319 READ 14.57 110289.45 us 122.19 us 3055911.44 us 25864 FSYNC 79.91 262383.20 us 98.78 us 4500846.60 us 59632 WRITE Duration: 60363 seconds Data Read: 6417030998 bytes Data Written: 27570997546 bytes Interval 9 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 59 2334 441 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 13 331 7 No. of Writes: 4519 1752 790 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 0 145 0 No. of Writes: 84 399 151 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. of Writes: 214 31 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 615 RELEASE 0.00 0.00 us 0.00 us 0.00 us 467 RELEASEDIR 0.00 45.52 us 8.46 us 78.57 us 25 GETXATTR 0.00 144.98 us 13.50 us 296.61 us 16 READDIR 0.01 7894.70 us 180.57 us 240788.91 us 46 READDIRP 0.03 1404.57 us 1.00 us 94678.96 us 467 OPENDIR 0.06 4985.90 us 17.46 us 403210.36 us 294 FSTAT 0.06 2453.17 us 35.05 us 242169.67 us 615 OPEN 0.10 3976.83 us 9.70 us 1250845.84 us 591 FLUSH 0.10 33579.24 us 10.59 us 937670.52 us 73 INODELK 0.12 2132.57 us 14.22 us 816339.95 us 1361 STATFS 0.29 617.29 us 8.19 us 164742.40 us 11477 FINODELK 0.69 3379.94 us 17.79 us 622513.08 us 5053 LOOKUP 0.84 42003.14 us 160.61 us 1495939.39 us 496 READ 1.66 3575.85 us 68.64 us 1688509.25 us 11476 FXATTROP 22.52 95429.52 us 126.22 us 3055911.44 us 5823 FSYNC 73.52 168379.58 us 110.01 us 4058537.96 us 10773 WRITE Duration: 740 seconds Data Read: 12386304 bytes Data Written: 217700864 bytes Brick: 10.32.9.6:/data/gfs/bricks/bricka/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 789986 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 49432 RELEASE 0.00 0.00 us 0.00 us 0.00 us 40938 RELEASEDIR 0.00 21.03 us 16.15 us 34.88 us 8 ENTRYLK 0.00 270.94 us 270.94 us 270.94 us 1 MKNOD 0.01 261.74 us 11.55 us 9174.66 us 116 GETXATTR 0.01 297.48 us 13.73 us 2466.13 us 112 READDIR 0.07 64.73 us 15.29 us 4946.30 us 3382 FLUSH 0.07 82.72 us 1.51 us 4642.85 us 2661 OPENDIR 0.22 193.05 us 39.92 us 64374.98 us 3624 OPEN 0.25 1255.82 us 14.35 us 63381.45 us 648 INODELK 0.89 57.44 us 10.33 us 8940.33 us 50009 FINODELK 1.44 77.62 us 15.84 us 31914.28 us 59679 WRITE 2.59 294.62 us 15.52 us 115626.36 us 28267 LOOKUP 3.49 224.71 us 77.62 us 98174.30 us 49948 FXATTROP 90.95 11273.35 us 78.67 us 453079.55 us 25908 FSYNC Duration: 60366 seconds Data Read: 0 bytes Data Written: 789986 bytes Interval 9 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 10774 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 615 RELEASE 0.00 0.00 us 0.00 us 0.00 us 467 RELEASEDIR 0.00 85.11 us 12.25 us 434.77 us 14 GETXATTR 0.00 42.29 us 17.37 us 500.84 us 73 INODELK 0.00 205.10 us 15.46 us 509.24 us 16 READDIR 0.04 57.10 us 15.29 us 1829.07 us 591 FLUSH 0.05 79.87 us 1.78 us 1854.43 us 467 OPENDIR 0.11 144.84 us 45.31 us 17419.60 us 615 OPEN 0.79 55.64 us 13.17 us 8940.33 us 11478 FINODELK 0.93 69.84 us 16.64 us 6779.39 us 10774 WRITE 1.79 286.71 us 16.91 us 24721.64 us 5053 LOOKUP 3.09 218.15 us 81.40 us 54774.50 us 11476 FXATTROP 93.19 12944.68 us 111.22 us 453079.55 us 5825 FSYNC Duration: 740 seconds Data Read: 0 bytes Data Written: 10774 bytes Brick: 10.32.9.4:/data/gfs/bricks/brick1/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 52412 6 0 No. of Writes: 3 4504 33731 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 0 66342 53000 No. of Writes: 6056 340041 234374 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 2686 356 12946 No. of Writes: 70264 12678 34177 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 602 3108 3 No. of Writes: 20547 27615 4342 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 49333 RELEASE 0.00 0.00 us 0.00 us 0.00 us 40379 RELEASEDIR 0.00 22.96 us 15.97 us 51.06 us 8 ENTRYLK 0.00 260.50 us 260.50 us 260.50 us 1 MKNOD 0.00 77.42 us 16.00 us 2701.10 us 648 INODELK 0.00 144.50 us 7.61 us 1702.83 us 428 GETXATTR 0.01 285.11 us 12.41 us 3140.45 us 406 READDIR 0.01 51.02 us 13.08 us 46431.20 us 3384 FLUSH 0.01 65.58 us 0.94 us 17715.27 us 2808 OPENDIR 0.01 80.40 us 11.70 us 19019.90 us 2445 STAT 0.03 118.76 us 40.23 us 33323.48 us 3626 OPEN 0.03 57.81 us 15.15 us 27740.94 us 7757 STATFS 0.04 197.89 us 119.38 us 17249.34 us 2481 READDIRP 0.36 99.03 us 11.12 us 301165.99 us 49989 FINODELK 1.08 526.62 us 13.29 us 263413.97 us 28422 LOOKUP 1.23 341.69 us 71.48 us 563688.45 us 49950 FXATTROP 7.47 3998.57 us 82.10 us 469183.97 us 25947 READ 35.02 18777.53 us 92.85 us 483169.32 us 25908 FSYNC 54.69 12727.00 us 149.97 us 759284.50 us 59684 WRITE Duration: 58519 seconds Data Read: 3261956842 bytes Data Written: 24886890282 bytes Interval 9 Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 665 0 0 No. of Writes: 0 59 2334 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 0 7 103 No. 
of Writes: 441 4519 1752 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 2 3 17 No. of Writes: 790 84 399 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 0 0 0 No. of Writes: 151 214 31 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 615 RELEASE 0.00 0.00 us 0.00 us 0.00 us 488 RELEASEDIR 0.00 39.06 us 18.17 us 225.32 us 73 INODELK 0.00 76.36 us 9.68 us 297.33 us 46 GETXATTR 0.01 222.65 us 21.02 us 593.11 us 58 READDIR 0.01 45.45 us 12.62 us 289.83 us 417 STAT 0.01 36.42 us 13.83 us 918.21 us 591 FLUSH 0.01 53.38 us 0.99 us 474.56 us 488 OPENDIR 0.02 87.38 us 40.23 us 5527.50 us 615 OPEN 0.03 49.03 us 18.83 us 3866.60 us 1361 STATFS 0.04 189.02 us 122.54 us 990.42 us 435 READDIRP 0.33 63.39 us 13.25 us 128981.30 us 11477 FINODELK 0.74 321.28 us 13.43 us 37963.21 us 5074 LOOKUP 0.98 186.47 us 80.20 us 43834.36 us 11476 FXATTROP 2.30 6321.31 us 154.25 us 110020.47 us 797 READ 39.43 8011.27 us 168.50 us 404368.45 us 10774 WRITE 56.08 21071.37 us 152.03 us 325318.37 us 5826 FSYNC Duration: 740 seconds Data Read: 2502580 bytes Data Written: 217700864 bytes Brick: 10.32.9.8:/data/gfs/bricks/bricka/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 2836841 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 3501 RELEASE 0.00 0.00 us 0.00 us 0.00 us 39501 RELEASEDIR 0.00 23.26 us 15.42 us 47.19 us 110 FLUSH 0.01 85.78 us 57.99 us 148.82 us 124 REMOVEXATTR 0.01 90.83 us 67.34 us 158.33 us 124 SETATTR 0.01 67.66 us 10.50 us 368.03 us 194 GETXATTR 0.02 84.14 us 41.98 us 499.27 us 250 OPEN 0.04 36.99 us 12.62 us 398.57 us 944 INODELK 0.06 280.66 us 14.10 us 1296.89 us 197 READDIR 0.14 49.60 us 1.25 us 911.95 us 2704 OPENDIR 8.05 27.51 us 11.45 us 86619.65 us 270887 FINODELK 8.60 50.28 us 14.57 us 117405.17 us 158241 WRITE 22.34 810.95 us 15.51 us 136924.46 us 25499 LOOKUP 26.84 184.15 us 32.55 us 187376.40 us 134874 FSYNC 33.87 115.65 us 48.10 us 68557.92 us 271003 FXATTROP Duration: 59079 seconds Data Read: 0 bytes Data Written: 2836841 bytes Interval 9 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 25110 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 84 RELEASE 0.00 0.00 us 0.00 us 0.00 us 482 RELEASEDIR 0.00 22.68 us 15.95 us 30.51 us 22 FLUSH 0.02 81.94 us 66.17 us 121.51 us 24 REMOVEXATTR 0.02 92.75 us 73.22 us 158.33 us 24 SETATTR 0.02 50.76 us 10.50 us 198.71 us 52 GETXATTR 0.06 88.22 us 47.47 us 347.92 us 84 OPEN 0.07 200.12 us 17.43 us 366.16 us 46 READDIR 0.10 43.88 us 12.88 us 398.57 us 298 INODELK 0.17 46.60 us 1.30 us 95.78 us 482 OPENDIR 6.89 35.71 us 14.98 us 8325.56 us 25110 WRITE 8.62 243.97 us 17.27 us 13438.88 us 4599 LOOKUP 9.62 26.97 us 12.23 us 10471.02 us 46438 FINODELK 32.58 183.27 us 33.33 us 182520.02 us 23144 FSYNC 41.83 117.30 us 57.85 us 1991.12 us 46424 FXATTROP Duration: 740 seconds Data Read: 0 bytes Data Written: 25110 bytes Brick: 10.32.9.8:/data0/gfs/bricks/brick1/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 8 1097 109 No. of Writes: 0 2901 273197 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 115 238693 440909 No. 
of Writes: 36872 1361504 875644 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 37900 3346 141710 No. of Writes: 293109 93776 162079 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 3889 7281 33 No. of Writes: 161749 236364 7941 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 1 FORGET 0.00 0.00 us 0.00 us 0.00 us 3720 RELEASE 0.00 0.00 us 0.00 us 0.00 us 39522 RELEASEDIR 0.00 46.19 us 10.83 us 328.71 us 167 GETXATTR 0.00 107.36 us 42.28 us 762.18 us 195 OPEN 0.01 203.17 us 12.21 us 864.86 us 197 READDIR 0.03 43.02 us 1.32 us 452.74 us 2704 OPENDIR 0.06 2113.84 us 1920.14 us 2569.20 us 124 READDIRP 0.06 36.11 us 17.79 us 347.13 us 7757 STATFS 0.09 35.61 us 16.14 us 340.33 us 11844 FSTAT 0.73 27.99 us 11.02 us 73986.88 us 118371 FINODELK 1.77 136.85 us 37.39 us 121066.77 us 58862 FSYNC 1.88 346.99 us 15.01 us 77684.23 us 24658 LOOKUP 3.34 128.87 us 55.07 us 45501.15 us 118386 FXATTROP 5.55 52717.08 us 16.10 us 2004661.60 us 480 INODELK 9.40 234.45 us 75.18 us 172924.48 us 182886 WRITE 77.09 3911.50 us 74.71 us 427304.61 us 89909 READ Duration: 59079 seconds Data Read: 18550783716 bytes Data Written: 169056832000 bytes Interval 9 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 28 1201 202 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 88 1012 13 No. of Writes: 11370 7637 1887 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 13 723 0 No. of Writes: 518 690 1562 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. of Writes: 2221 50 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 21 RELEASE 0.00 0.00 us 0.00 us 0.00 us 473 RELEASEDIR 0.00 63.17 us 11.55 us 328.71 us 24 GETXATTR 0.00 85.37 us 47.47 us 218.02 us 21 OPEN 0.01 191.44 us 20.33 us 506.91 us 28 READDIR 0.05 42.27 us 1.32 us 103.43 us 473 OPENDIR 0.12 35.99 us 17.79 us 312.67 us 1361 STATFS 0.13 2137.12 us 1993.74 us 2312.10 us 24 READDIRP 0.19 36.56 us 16.68 us 182.58 us 2058 FSTAT 1.46 28.28 us 11.78 us 4920.38 us 20656 FINODELK 3.09 283.99 us 16.10 us 77684.23 us 4368 LOOKUP 3.43 134.79 us 38.66 us 46317.56 us 10211 FSYNC 6.69 129.92 us 55.07 us 1519.44 us 20670 FXATTROP 15.38 225.45 us 75.18 us 166890.53 us 27366 WRITE 18.50 114198.06 us 16.67 us 2004661.60 us 65 INODELK 50.94 11055.19 us 133.17 us 355082.08 us 1849 READ Duration: 740 seconds Data Read: 57180160 bytes Data Written: 1466518016 bytes Brick: 10.32.9.7:/data/gfs/bricks/brick1/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 8 146 10 No. of Writes: 0 5640 191078 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 12 275838 263894 No. of Writes: 29947 1275560 712585 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 17182 2303 69829 No. of Writes: 286032 45424 94648 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 2287 4034 20 No. of Writes: 88659 100478 6790 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 3501 RELEASE 0.00 0.00 us 0.00 us 0.00 us 40057 RELEASEDIR 0.00 20.98 us 14.68 us 31.79 us 110 FLUSH 0.00 82.04 us 58.17 us 510.84 us 124 REMOVEXATTR 0.00 86.21 us 62.72 us 156.44 us 124 SETATTR 0.00 268.84 us 6.67 us 2674.39 us 259 GETXATTR 0.00 321.60 us 41.89 us 3004.61 us 250 OPEN 0.01 475.40 us 14.32 us 1787.62 us 281 READDIR 0.01 1249.13 us 26.42 us 3832.88 us 234 READDIRP 0.05 178.77 us 16.13 us 351764.07 us 7757 STATFS 0.09 822.57 us 1.17 us 1068559.38 us 2746 OPENDIR 0.12 292.98 us 25.31 us 1365160.22 us 10719 FSTAT 0.90 25010.58 us 13.33 us 887933.44 us 941 INODELK 1.38 133.53 us 11.10 us 3938189.55 us 270885 FINODELK 1.68 162.21 us 45.86 us 2503412.21 us 271003 FXATTROP 2.03 394.43 us 31.20 us 756176.42 us 134874 FSYNC 12.66 2092.21 us 72.21 us 4245933.36 us 158241 WRITE 14.29 14633.42 us 10.75 us 4031333.55 us 25543 LOOKUP 66.78 18236.84 us 69.72 us 6429153.50 us 95797 READ Duration: 59031 seconds Data Read: 10396155504 bytes Data Written: 84404067328 bytes Interval 9 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 64 279 45 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 172 529 10 No. of Writes: 14494 5264 1377 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 5 351 0 No. of Writes: 316 548 1064 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. of Writes: 1620 39 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 84 RELEASE 0.00 0.00 us 0.00 us 0.00 us 494 RELEASEDIR 0.00 22.20 us 17.97 us 31.79 us 22 FLUSH 0.00 81.91 us 68.21 us 101.65 us 24 REMOVEXATTR 0.00 89.38 us 72.10 us 128.30 us 24 SETATTR 0.01 43.26 us 1.29 us 153.44 us 494 OPENDIR 0.01 427.33 us 15.79 us 1319.22 us 70 READDIR 0.02 33.36 us 19.57 us 258.67 us 1361 STATFS 0.02 561.73 us 9.55 us 2674.39 us 86 GETXATTR 0.02 1283.75 us 28.25 us 3720.17 us 46 READDIRP 0.03 790.46 us 41.89 us 3004.61 us 84 OPEN 0.03 42.33 us 25.31 us 308.87 us 1862 FSTAT 0.53 27.26 us 11.92 us 22788.88 us 46436 FINODELK 0.61 316.49 us 10.75 us 100131.38 us 4611 LOOKUP 2.25 231.96 us 37.46 us 540421.26 us 23144 FSYNC 2.97 152.85 us 51.13 us 541193.18 us 46424 FXATTROP 4.89 39428.07 us 14.19 us 881646.99 us 296 INODELK 18.93 1799.80 us 72.85 us 3118991.86 us 25110 WRITE 69.67 155845.21 us 130.17 us 4885484.01 us 1067 READ Duration: 740 seconds Data Read: 28553216 bytes Data Written: 1032566784 bytes Brick: 10.32.9.21:/data/gfs/bricks/bricka/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 3513729 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 1 FORGET 0.00 0.00 us 0.00 us 0.00 us 4089 RELEASE 0.00 0.00 us 0.00 us 0.00 us 44272 RELEASEDIR 0.18 42.97 us 1.09 us 1138.82 us 2893 OPENDIR 0.42 611.92 us 15.33 us 7388.53 us 479 INODELK 0.56 679.34 us 15.23 us 2483.07 us 574 READDIR 0.61 2170.81 us 48.25 us 13138.61 us 195 OPEN 0.82 1066.87 us 8.75 us 13801.35 us 535 GETXATTR 4.89 28.84 us 10.98 us 51214.69 us 118373 FINODELK 9.18 35.04 us 14.64 us 81798.39 us 182886 WRITE 18.04 506.78 us 12.31 us 165781.70 us 24847 LOOKUP 22.07 130.09 us 53.80 us 40959.22 us 118386 FXATTROP 43.23 512.45 us 38.18 us 285202.84 us 58862 FSYNC Duration: 60363 seconds Data Read: 0 bytes Data Written: 3513729 bytes Interval 9 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 27366 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 21 RELEASE 0.00 0.00 us 0.00 us 0.00 us 500 RELEASEDIR 0.02 82.18 us 48.25 us 161.40 us 21 OPEN 0.02 41.30 us 16.13 us 156.19 us 65 INODELK 0.08 136.77 us 13.02 us 909.03 us 66 GETXATTR 0.20 43.34 us 1.27 us 277.72 us 500 OPENDIR 0.33 428.51 us 15.42 us 1215.07 us 82 READDIR 5.70 29.84 us 11.88 us 2949.68 us 20656 FINODELK 9.15 36.14 us 14.64 us 4606.43 us 27366 WRITE 11.51 283.22 us 12.31 us 53047.02 us 4395 LOOKUP 26.02 136.12 us 53.80 us 40959.22 us 20670 FXATTROP 46.96 497.27 us 40.09 us 274185.32 us 10211 FSYNC Duration: 740 seconds Data Read: 0 bytes Data Written: 27366 bytes Brick: 10.32.9.21:/data0/gfs/bricks/brick1/ovirt-data ----------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 11 66 2 No. of Writes: 2 31826 326701 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 4 116933 302613 No. of Writes: 53880 1410548 1140566 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 11546 1749 76356 No. of Writes: 479855 114971 144225 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 1906 4093 24 No. of Writes: 105312 165416 8915 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 6820 RELEASE 0.00 0.00 us 0.00 us 0.00 us 44299 RELEASEDIR 0.03 42.29 us 13.86 us 9356.79 us 2425 STAT 0.04 46.94 us 1.04 us 772.13 us 2893 OPENDIR 0.04 314.86 us 10.29 us 2391.56 us 479 GETXATTR 0.05 207.42 us 14.29 us 5815.42 us 799 INODELK 0.05 51.40 us 28.18 us 540.78 us 3666 FSTAT 0.09 39.82 us 18.62 us 9889.16 us 7757 STATFS 0.12 1358.64 us 43.37 us 233429.90 us 322 OPEN 0.14 851.72 us 15.78 us 4414.06 us 574 READDIR 0.16 224.28 us 143.72 us 3249.69 us 2482 READDIRP 1.46 30.22 us 10.80 us 110711.59 us 173110 FINODELK 4.32 601.98 us 15.19 us 91847.23 us 25659 LOOKUP 6.30 130.39 us 49.28 us 232146.00 us 172711 FXATTROP 8.84 378.85 us 35.58 us 430356.59 us 83395 FSYNC 23.35 525.84 us 73.80 us 494782.91 us 158694 WRITE 55.01 4360.84 us 78.00 us 503162.29 us 45075 READ Duration: 60363 seconds Data Read: 10404548654 bytes Data Written: 133624068438 bytes Interval 9 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 394 387 86 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 31 563 12 No. of Writes: 5905 10055 2308 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 7 368 1 No. of Writes: 763 1465 1637 Block Size: 262144b+ 524288b+ No. of Reads: 1 0 No. of Writes: 2759 73 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 70 RELEASE 0.00 0.00 us 0.00 us 0.00 us 500 RELEASEDIR 0.02 42.46 us 14.29 us 589.73 us 210 INODELK 0.03 34.11 us 20.98 us 177.92 us 413 STAT 0.04 298.66 us 10.97 us 1498.98 us 63 GETXATTR 0.06 48.48 us 1.17 us 120.46 us 500 OPENDIR 0.08 52.55 us 31.44 us 108.17 us 637 FSTAT 0.12 37.70 us 18.62 us 406.46 us 1361 STATFS 0.17 871.79 us 16.50 us 2028.77 us 82 READDIR 0.23 227.43 us 145.28 us 3249.69 us 435 READDIRP 0.56 3427.01 us 43.37 us 233429.90 us 70 OPEN 1.62 30.00 us 11.97 us 3260.32 us 23206 FINODELK 8.55 159.50 us 72.57 us 232146.00 us 23005 FXATTROP 8.94 849.78 us 16.81 us 91847.23 us 4515 LOOKUP 24.77 959.75 us 38.69 us 430356.59 us 11072 FSYNC 24.89 10859.72 us 153.99 us 250511.54 us 983 READ 29.91 496.71 us 73.80 us 453768.34 us 25832 WRITE Duration: 740 seconds Data Read: 30142464 bytes Data Written: 1776420864 bytes Brick: 10.32.9.20:/data/gfs/bricks/bricka/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 3979583 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 6630 RELEASE 0.00 0.00 us 0.00 us 0.00 us 36900 RELEASEDIR 0.01 57.48 us 12.65 us 329.66 us 84 GETXATTR 0.02 193.66 us 14.90 us 878.82 us 70 READDIR 0.05 105.94 us 41.17 us 683.04 us 322 OPEN 0.06 54.19 us 15.72 us 5135.95 us 800 INODELK 0.19 54.37 us 1.60 us 1035.48 us 2641 OPENDIR 7.68 32.94 us 11.38 us 68417.92 us 173114 FINODELK 9.52 44.57 us 14.70 us 55440.51 us 158694 WRITE 24.39 712.95 us 16.53 us 280142.79 us 25407 LOOKUP 27.40 243.98 us 34.94 us 251521.50 us 83395 FSYNC 30.68 131.93 us 50.81 us 55731.00 us 172711 FXATTROP Duration: 57920 seconds Data Read: 0 bytes Data Written: 3979583 bytes Interval 9 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 25832 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 70 RELEASE 0.00 0.00 us 0.00 us 0.00 us 464 RELEASEDIR 0.01 62.34 us 15.20 us 116.82 us 17 GETXATTR 0.02 199.99 us 16.79 us 778.14 us 10 READDIR 0.10 114.77 us 46.50 us 683.04 us 70 OPEN 0.22 86.24 us 16.71 us 5135.95 us 211 INODELK 0.32 56.51 us 1.98 us 1035.48 us 464 OPENDIR 8.88 31.82 us 12.28 us 7988.05 us 23206 FINODELK 11.60 37.32 us 15.06 us 2981.61 us 25832 WRITE 12.08 224.23 us 20.07 us 39256.75 us 4479 LOOKUP 28.45 213.58 us 40.22 us 94343.80 us 11072 FSYNC 38.31 138.39 us 69.94 us 3069.85 us 23005 FXATTROP Duration: 740 seconds Data Read: 0 bytes Data Written: 25832 bytes Brick: 10.32.9.20:/data0/gfs/bricks/brick1/ovirt-data ----------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 4 109 21 No. of Writes: 0 2901 273197 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 42 112256 166124 No. of Writes: 36872 1361504 875644 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 12144 1370 69995 No. of Writes: 293109 93776 162079 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 1466 2942 9 No. of Writes: 161749 236364 7941 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 3097 RELEASE 0.00 0.00 us 0.00 us 0.00 us 36900 RELEASEDIR 0.00 53.86 us 12.74 us 546.30 us 60 GETXATTR 0.00 211.97 us 16.62 us 988.45 us 70 READDIR 0.00 101.31 us 39.98 us 497.56 us 195 OPEN 0.00 46.92 us 14.96 us 3469.09 us 481 INODELK 0.01 52.86 us 1.65 us 1573.96 us 2641 OPENDIR 0.03 216.49 us 152.23 us 2562.55 us 2482 READDIRP 0.03 73.86 us 18.77 us 125905.96 us 7757 STATFS 0.06 111.91 us 16.04 us 655589.61 us 10152 FSTAT 0.07 542.53 us 12.89 us 523421.21 us 2425 STAT 1.10 803.43 us 18.50 us 1534952.31 us 24595 LOOKUP 1.16 177.03 us 11.27 us 1749236.34 us 118375 FINODELK 1.44 218.66 us 58.80 us 1784231.76 us 118390 FXATTROP 13.72 4194.48 us 39.91 us 2743546.94 us 58865 FSYNC 36.03 14004.46 us 79.14 us 2966713.52 us 46303 READ 46.33 4558.98 us 77.68 us 2638579.30 us 182887 WRITE Duration: 57920 seconds Data Read: 8237195368 bytes Data Written: 169056832000 bytes Interval 9 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 28 1201 202 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 131 256 14 No. of Writes: 11370 7637 1887 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 2 277 4 No. of Writes: 518 690 1562 Block Size: 262144b+ 524288b+ No. of Reads: 2 0 No. of Writes: 2221 50 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 21 RELEASE 0.00 0.00 us 0.00 us 0.00 us 464 RELEASEDIR 0.00 47.44 us 13.63 us 73.33 us 12 GETXATTR 0.00 212.66 us 19.17 us 803.07 us 10 READDIR 0.00 118.97 us 59.03 us 352.73 us 21 OPEN 0.00 45.59 us 20.23 us 248.98 us 65 INODELK 0.01 56.79 us 1.86 us 1008.29 us 464 OPENDIR 0.03 48.42 us 20.11 us 1484.22 us 1764 FSTAT 0.04 214.84 us 152.23 us 558.38 us 435 READDIRP 0.06 113.06 us 19.98 us 96371.03 us 1361 STATFS 0.07 470.41 us 15.68 us 177048.08 us 413 STAT 0.61 369.50 us 21.75 us 98568.83 us 4359 LOOKUP 0.84 107.63 us 11.27 us 422960.68 us 20656 FINODELK 1.50 191.62 us 58.80 us 669097.82 us 20670 FXATTROP 15.43 59246.36 us 108.59 us 2819788.24 us 686 READ 15.75 4062.95 us 40.23 us 1993844.42 us 10211 FSYNC 65.65 6319.81 us 80.06 us 2441596.43 us 27366 WRITE Duration: 740 seconds Data Read: 22843392 bytes Data Written: 1466518016 bytes Brick: 10.32.9.3:/data/gfs/bricks/brick3/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 4 131 8 No. of Writes: 0 5640 191078 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 16 81419 161095 No. of Writes: 29947 1275560 712585 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 17291 1342 103864 No. of Writes: 286032 45424 94648 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 1022 1870 19 No. of Writes: 88659 100478 6790 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 3665 RELEASE 0.00 0.00 us 0.00 us 0.00 us 40654 RELEASEDIR 0.00 77.86 us 14.39 us 255.33 us 58 GETXATTR 0.00 55.93 us 18.71 us 185.55 us 110 FLUSH 0.00 148.39 us 76.71 us 306.67 us 124 REMOVEXATTR 0.00 156.41 us 94.69 us 237.19 us 124 SETATTR 0.00 282.94 us 26.64 us 1991.99 us 72 READDIR 0.01 249.61 us 62.86 us 1608.99 us 250 OPEN 0.02 108.18 us 23.05 us 1656.32 us 942 INODELK 0.03 73.13 us 23.73 us 337.63 us 2425 STAT 0.04 93.83 us 1.99 us 17101.96 us 2641 OPENDIR 0.10 78.77 us 24.79 us 1132.37 us 7757 STATFS 0.14 108.18 us 41.73 us 4078.40 us 7332 FSTAT 0.18 262.35 us 91.17 us 5148.11 us 3890 READDIRP 2.78 59.76 us 13.90 us 60015.05 us 270884 FINODELK 3.23 739.71 us 25.36 us 119501.01 us 25436 LOOKUP 7.72 333.10 us 45.00 us 283828.60 us 134874 FSYNC 9.10 195.46 us 67.03 us 157955.41 us 271003 FXATTROP 19.48 716.64 us 94.11 us 340140.18 us 158241 WRITE 57.15 8361.71 us 110.31 us 596087.45 us 39783 READ Duration: 60363 seconds Data Read: 9818510392 bytes Data Written: 84404067328 bytes Interval 9 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 64 279 45 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 83 367 8 No. of Writes: 14494 5264 1377 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 4 586 0 No. of Writes: 316 548 1064 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. of Writes: 1620 39 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 84 RELEASE 0.00 0.00 us 0.00 us 0.00 us 464 RELEASEDIR 0.00 84.14 us 26.99 us 150.82 us 9 GETXATTR 0.00 52.66 us 18.71 us 108.64 us 22 FLUSH 0.00 187.96 us 35.03 us 565.44 us 11 READDIR 0.01 139.45 us 77.67 us 239.13 us 24 REMOVEXATTR 0.01 152.96 us 94.69 us 236.75 us 24 SETATTR 0.05 67.18 us 24.77 us 302.92 us 413 STAT 0.06 395.71 us 67.77 us 1608.99 us 84 OPEN 0.06 82.30 us 2.19 us 210.85 us 464 OPENDIR 0.09 186.01 us 24.60 us 1656.32 us 297 INODELK 0.18 78.19 us 27.95 us 1050.83 us 1361 STATFS 0.25 117.05 us 45.04 us 4078.40 us 1274 FSTAT 0.30 264.20 us 102.02 us 5148.11 us 682 READDIRP 2.07 271.71 us 35.32 us 22720.97 us 4581 LOOKUP 4.81 62.33 us 16.26 us 6962.45 us 46436 FINODELK 12.97 337.03 us 54.20 us 221094.08 us 23144 FSYNC 15.31 198.37 us 91.04 us 5197.06 us 46424 FXATTROP 24.57 588.49 us 97.75 us 228091.00 us 25110 WRITE 39.26 22524.25 us 112.00 us 551619.18 us 1048 READ Duration: 740 seconds Data Read: 42180608 bytes Data Written: 1032566784 bytes -------------- next part -------------- A non-text attachment was scrubbed... Name: bricklogs.7z Type: application/octet-stream Size: 4259488 bytes Desc: not available URL: -------------- next part -------------- Brick: 10.32.9.9:/data0/gfs/bricks/brick1/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 11 197 9 No. of Writes: 2 5298 50158 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 22 152814 155742 No. of Writes: 10281 141128 229969 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 19592 2856 34807 No. of Writes: 46540 15874 16618 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 41083 7619 43 No. of Writes: 12325 19939 1278 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 1163 RELEASE 0.00 0.00 us 0.00 us 0.00 us 8997 RELEASEDIR 0.01 42.50 us 1.07 us 200.99 us 2015 OPENDIR 0.03 1205.44 us 147.78 us 3219.56 us 160 READDIRP 0.03 38.21 us 19.08 us 7544.91 us 5346 STATFS 0.03 947.57 us 42.15 us 5689.21 us 221 OPEN 0.03 536.56 us 11.61 us 5405.25 us 416 GETXATTR 0.06 42.40 us 15.94 us 3408.97 us 9626 FSTAT 0.06 992.07 us 13.79 us 3583.18 us 440 READDIR 2.07 784.10 us 10.19 us 100292.79 us 17781 LOOKUP 2.58 128.10 us 48.36 us 73127.27 us 135547 FXATTROP 2.75 282.45 us 36.68 us 403763.55 us 65559 FSYNC 5.30 72558.78 us 13.93 us 1840271.15 us 491 INODELK 7.97 450.94 us 75.16 us 368078.53 us 118790 WRITE 24.45 1212.26 us 11.70 us 1850709.72 us 135580 FINODELK 54.60 3082.33 us 72.64 us 387813.52 us 119063 READ Duration: 10748 seconds Data Read: 13585666193 bytes Data Written: 16779903830 bytes Interval 6 Stats: Block Size: 512b+ 1024b+ 4096b+ No. of Reads: 0 0 1151 No. of Writes: 13 2 405 Block Size: 8192b+ 16384b+ 32768b+ No. of Reads: 580 10 0 No. of Writes: 357 115 16 Block Size: 65536b+ 131072b+ 262144b+ No. of Reads: 314 96 0 No. of Writes: 25 19 62 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 9 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.00 42.88 us 11.61 us 65.94 us 6 GETXATTR 0.01 95.97 us 52.28 us 174.48 us 9 OPEN 0.01 43.61 us 1.83 us 68.31 us 25 OPENDIR 0.01 30.47 us 22.02 us 42.78 us 53 STATFS 0.01 291.00 us 148.50 us 772.43 us 6 READDIR 0.01 2255.97 us 2255.97 us 2255.97 us 1 READDIRP 0.03 42.10 us 18.38 us 114.84 us 98 FSTAT 0.10 30.02 us 14.10 us 662.25 us 520 FINODELK 0.18 165.79 us 20.56 us 2137.59 us 173 LOOKUP 0.21 137.80 us 53.21 us 366.07 us 243 FSYNC 0.43 133.43 us 81.66 us 384.37 us 520 FXATTROP 4.53 713.51 us 75.16 us 101590.43 us 1014 WRITE 28.81 102208.45 us 16.11 us 931987.84 us 45 INODELK 65.66 4873.26 us 79.54 us 293295.82 us 2151 READ Duration: 26 seconds Data Read: 42790912 bytes Data Written: 40480256 bytes Brick: 10.32.9.6:/data/gfs/bricks/bricka/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 140764 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 8678 RELEASE 0.00 0.00 us 0.00 us 0.00 us 10636 RELEASEDIR 0.00 21.03 us 16.15 us 34.88 us 8 ENTRYLK 0.00 270.94 us 270.94 us 270.94 us 1 MKNOD 0.01 323.61 us 13.73 us 2466.13 us 80 READDIR 0.02 326.98 us 11.55 us 9174.66 us 83 GETXATTR 0.09 65.78 us 16.75 us 4946.30 us 2332 FLUSH 0.09 85.68 us 1.51 us 4642.85 us 1834 OPENDIR 0.32 227.62 us 41.15 us 64374.98 us 2476 OPEN 0.45 2256.77 us 15.05 us 63381.45 us 347 INODELK 0.97 56.53 us 10.33 us 5526.47 us 29676 FINODELK 1.78 74.73 us 15.84 us 5393.43 us 41433 WRITE 3.57 320.06 us 15.52 us 115626.36 us 19340 LOOKUP 3.89 227.62 us 77.62 us 68580.08 us 29634 FXATTROP 88.81 9879.97 us 149.50 us 367307.05 us 15606 FSYNC Duration: 12346 seconds Data Read: 0 bytes Data Written: 140764 bytes Interval 6 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 174 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 25 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.04 73.34 us 20.62 us 134.34 us 5 GETXATTR 0.24 82.36 us 46.64 us 190.61 us 25 OPEN 0.31 437.89 us 272.99 us 1086.34 us 6 READDIR 0.43 184.47 us 19.85 us 2374.95 us 20 FLUSH 0.47 235.68 us 19.14 us 1331.71 us 17 INODELK 0.58 197.88 us 2.13 us 1981.61 us 25 OPENDIR 1.50 115.21 us 18.93 us 1851.08 us 112 FINODELK 2.46 194.41 us 96.77 us 1656.53 us 109 FXATTROP 5.75 284.33 us 23.78 us 1087.73 us 174 WRITE 6.46 295.65 us 21.14 us 19073.84 us 188 LOOKUP 81.76 12553.12 us 218.24 us 73123.96 us 56 FSYNC Duration: 26 seconds Data Read: 0 bytes Data Written: 174 bytes Brick: 10.32.9.4:/data/gfs/bricks/brick1/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 9195 6 0 No. of Writes: 3 700 2043 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 0 32431 16795 No. of Writes: 623 67451 42803 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 1749 317 6006 No. of Writes: 12604 1347 5850 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 565 3076 3 No. of Writes: 2324 2469 893 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 8579 RELEASE 0.00 0.00 us 0.00 us 0.00 us 8397 RELEASEDIR 0.00 22.96 us 15.97 us 51.06 us 8 ENTRYLK 0.00 260.50 us 260.50 us 260.50 us 1 MKNOD 0.00 93.03 us 16.42 us 2701.10 us 347 INODELK 0.00 128.90 us 7.61 us 1129.71 us 284 GETXATTR 0.01 300.85 us 12.41 us 3140.45 us 290 READDIR 0.01 57.69 us 13.08 us 46431.20 us 2334 FLUSH 0.01 70.20 us 0.94 us 17715.27 us 1939 OPENDIR 0.02 89.64 us 11.70 us 19019.90 us 1691 STAT 0.03 132.06 us 42.23 us 33323.48 us 2478 OPEN 0.03 62.90 us 16.20 us 27740.94 us 5346 STATFS 0.03 202.21 us 119.54 us 17249.34 us 1709 READDIRP 0.36 120.14 us 11.12 us 301165.99 us 29660 FINODELK 1.20 614.44 us 13.29 us 263413.97 us 19453 LOOKUP 1.27 427.97 us 71.48 us 563688.45 us 29634 FXATTROP 9.41 3840.22 us 82.10 us 469183.97 us 24493 READ 27.25 17452.64 us 92.85 us 483169.32 us 15606 FSYNC 60.36 14557.60 us 149.97 us 759284.50 us 41437 WRITE Duration: 10499 seconds Data Read: 2314290390 bytes Data Written: 3195953450 bytes Interval 6 Stats: Block Size: 256b+ 512b+ 4096b+ No. of Reads: 23 0 571 No. of Writes: 0 3 13 Block Size: 8192b+ 16384b+ 32768b+ No. of Reads: 52 0 0 No. of Writes: 128 13 1 Block Size: 65536b+ 131072b+ 262144b+ No. of Reads: 46 0 0 No. of Writes: 7 7 2 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 25 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.01 68.94 us 10.11 us 221.91 us 6 GETXATTR 0.01 30.00 us 15.66 us 54.39 us 20 FLUSH 0.02 44.60 us 20.24 us 95.68 us 15 STAT 0.02 41.06 us 19.29 us 103.84 us 17 INODELK 0.02 162.01 us 110.86 us 228.45 us 6 READDIR 0.03 45.60 us 1.45 us 89.10 us 25 OPENDIR 0.04 75.73 us 44.53 us 190.60 us 25 OPEN 0.04 36.62 us 16.52 us 66.38 us 53 STATFS 0.09 185.21 us 136.70 us 275.71 us 21 READDIRP 0.11 44.70 us 19.04 us 91.96 us 111 FINODELK 0.37 148.06 us 86.69 us 324.91 us 109 FXATTROP 1.53 355.20 us 16.63 us 41956.49 us 188 LOOKUP 25.17 19663.37 us 213.60 us 89492.03 us 56 FSYNC 28.20 7089.90 us 177.90 us 39757.75 us 174 WRITE 44.33 2802.26 us 104.46 us 49721.81 us 692 READ Duration: 26 seconds Data Read: 5790220 bytes Data Written: 4044288 bytes Brick: 10.32.9.5:/data/gfs/bricks/brick1/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 2 458 87 No. of Writes: 3 703 2052 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 54 15030 33939 No. of Writes: 623 68563 43056 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 3940 540 7475 No. of Writes: 12791 1436 5850 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 2395 6141 15 No. of Writes: 22702 2469 893 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 8678 RELEASE 0.00 0.00 us 0.00 us 0.00 us 10597 RELEASEDIR 0.00 158.97 us 158.97 us 158.97 us 1 MKNOD 0.00 393.70 us 9.69 us 2344.17 us 8 ENTRYLK 0.00 126.39 us 10.54 us 281.56 us 80 READDIR 0.00 1956.79 us 6.43 us 155911.98 us 82 GETXATTR 0.00 3315.24 us 139.26 us 170764.63 us 160 READDIRP 0.03 4020.12 us 15.61 us 469284.63 us 1164 FSTAT 0.04 2583.09 us 28.06 us 204742.02 us 2476 OPEN 0.04 2825.94 us 8.33 us 590785.10 us 2332 FLUSH 0.07 5673.25 us 0.80 us 527903.92 us 1832 OPENDIR 0.09 2595.01 us 11.48 us 382537.47 us 5342 STATFS 0.14 59496.60 us 8.34 us 1543417.83 us 347 INODELK 0.40 2031.08 us 6.84 us 1127299.99 us 29607 FINODELK 0.57 4484.44 us 11.60 us 968588.21 us 19334 LOOKUP 2.00 10241.03 us 53.88 us 1880631.52 us 29585 FXATTROP 2.57 29123.91 us 63.31 us 3473303.89 us 13399 READ 11.80 115055.07 us 122.19 us 2279735.88 us 15581 FSYNC 82.25 301692.99 us 98.78 us 4500846.60 us 41400 WRITE Duration: 12343 seconds Data Read: 4270911318 bytes Data Written: 5880060714 bytes Interval 6 Stats: Block Size: 512b+ 4096b+ 8192b+ No. of Reads: 0 29 109 No. of Writes: 3 13 128 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 15 0 47 No. of Writes: 13 1 7 Block Size: 131072b+ 262144b+ No. of Reads: 0 0 No. of Writes: 7 2 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 25 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.00 40.48 us 6.94 us 92.02 us 6 GETXATTR 0.00 28.47 us 10.19 us 50.39 us 20 FLUSH 0.00 53.78 us 31.59 us 76.06 us 12 FSTAT 0.00 129.00 us 88.83 us 161.83 us 6 READDIR 0.00 47.94 us 0.89 us 88.84 us 25 OPENDIR 0.00 1499.83 us 1499.83 us 1499.83 us 1 READDIRP 0.04 1333.68 us 30.78 us 31125.60 us 25 OPEN 0.05 836.76 us 12.08 us 23439.58 us 53 STATFS 0.11 872.63 us 8.21 us 56495.95 us 111 FINODELK 0.88 6920.74 us 68.16 us 625281.67 us 109 FXATTROP 1.02 4614.95 us 13.66 us 348629.63 us 188 LOOKUP 1.65 83057.19 us 12.45 us 658978.30 us 17 INODELK 6.49 27709.93 us 98.14 us 471332.27 us 200 READ 8.08 123206.43 us 267.50 us 979136.22 us 56 FSYNC 81.66 396159.13 us 232.40 us 1353202.04 us 176 WRITE Duration: 26 seconds Data Read: 4341760 bytes Data Written: 4044288 bytes Brick: 10.32.9.7:/data/gfs/bricks/brick1/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 8 146 10 No. of Writes: 0 822 17574 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 12 147605 81020 No. of Writes: 3335 177490 110247 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 12618 1795 18321 No. of Writes: 30013 5366 10235 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 2188 4008 20 No. of Writes: 8375 8875 585 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 608 RELEASE 0.00 0.00 us 0.00 us 0.00 us 8795 RELEASEDIR 0.00 20.59 us 14.68 us 25.91 us 75 FLUSH 0.00 77.89 us 58.31 us 94.39 us 85 REMOVEXATTR 0.00 85.37 us 64.65 us 156.44 us 85 SETATTR 0.00 86.81 us 42.56 us 643.87 us 146 OPEN 0.00 127.62 us 6.67 us 943.71 us 165 GETXATTR 0.00 507.62 us 14.32 us 1787.62 us 201 READDIR 0.01 1235.24 us 26.42 us 3832.88 us 160 READDIRP 0.06 244.53 us 16.13 us 351764.07 us 5346 STATFS 0.10 1171.75 us 1.17 us 1068559.38 us 1895 OPENDIR 0.13 406.39 us 25.42 us 1365160.22 us 7375 FSTAT 0.53 20755.68 us 13.33 us 887933.44 us 564 INODELK 1.46 170.00 us 45.86 us 2503412.21 us 190741 FXATTROP 1.53 178.24 us 11.10 us 3938189.55 us 190604 FINODELK 1.82 425.12 us 32.12 us 663207.46 us 94843 FSYNC 11.43 2187.65 us 72.21 us 4245933.36 us 116005 WRITE 16.73 21132.03 us 14.12 us 4031333.55 us 17576 LOOKUP 66.20 15737.71 us 69.72 us 6429153.50 us 93418 READ Duration: 11011 seconds Data Read: 4863622768 bytes Data Written: 8447544320 bytes Interval 6 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 3 109 5 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 6829 259 5 No. of Writes: 228 230 13 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 0 175 0 No. of Writes: 4 2 12 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. of Writes: 12 2 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 7 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.00 74.17 us 74.17 us 74.17 us 1 REMOVEXATTR 0.00 89.45 us 89.45 us 89.45 us 1 SETATTR 0.00 33.25 us 16.33 us 64.78 us 4 GETXATTR 0.00 70.61 us 52.42 us 153.99 us 7 OPEN 0.00 37.55 us 1.95 us 64.59 us 25 OPENDIR 0.00 30.14 us 18.44 us 44.04 us 53 STATFS 0.00 275.53 us 155.69 us 765.06 us 6 READDIR 0.00 2352.74 us 2352.74 us 2352.74 us 1 READDIRP 0.01 43.65 us 29.55 us 73.65 us 76 FSTAT 0.05 149.19 us 25.96 us 227.32 us 171 LOOKUP 0.06 25.33 us 11.55 us 59.71 us 1236 FINODELK 0.15 130.28 us 50.48 us 244.99 us 609 FSYNC 0.26 113.48 us 78.80 us 565.69 us 1237 FXATTROP 0.27 237.31 us 80.85 us 2140.21 us 620 WRITE 10.35 142923.57 us 15.02 us 887933.44 us 39 INODELK 88.84 6582.39 us 75.54 us 3820683.07 us 7268 READ Duration: 26 seconds Data Read: 41644032 bytes Data Written: 11333120 bytes Brick: 10.32.9.21:/data/gfs/bricks/bricka/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 583084 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 1 FORGET 0.00 0.00 us 0.00 us 0.00 us 1506 RELEASE 0.00 0.00 us 0.00 us 0.00 us 11330 RELEASEDIR 0.16 41.96 us 1.09 us 174.18 us 2000 OPENDIR 0.53 882.55 us 15.33 us 7388.53 us 313 INODELK 0.61 785.26 us 15.23 us 2483.07 us 410 READDIR 0.78 2944.84 us 50.45 us 13138.61 us 139 OPEN 1.04 1439.59 us 8.75 us 13801.35 us 379 GETXATTR 4.69 28.29 us 10.98 us 51214.69 us 87008 FINODELK 9.29 34.56 us 14.65 us 81798.39 us 141069 WRITE 19.62 601.79 us 13.34 us 82349.56 us 17113 LOOKUP 21.24 128.13 us 74.02 us 5590.59 us 87026 FXATTROP 42.05 509.22 us 38.18 us 285202.84 us 43355 FSYNC Duration: 12343 seconds Data Read: 0 bytes Data Written: 583084 bytes Interval 6 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 820 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 14 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.26 54.89 us 17.77 us 76.95 us 9 GETXATTR 0.54 73.00 us 50.45 us 123.40 us 14 OPEN 0.55 41.66 us 1.43 us 58.76 us 25 OPENDIR 0.88 278.14 us 157.59 us 394.86 us 6 READDIR 0.97 40.13 us 17.39 us 342.49 us 46 INODELK 9.81 34.88 us 16.20 us 3058.74 us 534 FINODELK 14.96 34.64 us 16.51 us 177.38 us 820 WRITE 17.10 168.21 us 24.51 us 323.55 us 193 LOOKUP 17.52 127.46 us 46.97 us 299.68 us 261 FSYNC 37.41 133.00 us 80.38 us 341.49 us 534 FXATTROP Duration: 26 seconds Data Read: 0 bytes Data Written: 820 bytes Brick: 10.32.9.21:/data0/gfs/bricks/brick1/ovirt-data ----------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 11 66 2 No. of Writes: 2 5586 50158 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 4 52513 93336 No. of Writes: 10281 142655 230190 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 7241 1565 19639 No. of Writes: 46705 15906 16639 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 1874 4071 24 No. of Writes: 12317 19939 1278 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 1255 RELEASE 0.00 0.00 us 0.00 us 0.00 us 11357 RELEASEDIR 0.03 45.76 us 13.86 us 9356.79 us 1679 STAT 0.03 46.01 us 1.04 us 772.13 us 2000 OPENDIR 0.04 326.97 us 10.29 us 2391.56 us 353 GETXATTR 0.05 51.07 us 29.86 us 540.78 us 2522 FSTAT 0.05 313.16 us 16.18 us 5815.42 us 491 INODELK 0.07 883.53 us 45.88 us 6148.85 us 221 OPEN 0.08 41.10 us 18.98 us 9889.16 us 5346 STATFS 0.12 856.40 us 15.78 us 4414.06 us 410 READDIR 0.14 225.48 us 143.72 us 926.43 us 1710 READDIRP 1.41 29.62 us 10.80 us 110711.59 us 135572 FINODELK 3.66 587.34 us 15.19 us 61302.68 us 17766 LOOKUP 5.95 125.02 us 49.28 us 17686.93 us 135539 FXATTROP 6.44 279.71 us 35.58 us 407061.84 us 65554 FSYNC 19.89 476.93 us 75.41 us 440395.70 us 118787 WRITE 62.04 4088.32 us 78.00 us 503162.29 us 43217 READ Duration: 12343 seconds Data Read: 4610277422 bytes Data Written: 16792466262 bytes Interval 6 Stats: Block Size: 512b+ 1024b+ 4096b+ No. of Reads: 0 0 70 No. of Writes: 13 2 405 Block Size: 8192b+ 16384b+ 32768b+ No. of Reads: 342 5 0 No. of Writes: 357 115 16 Block Size: 65536b+ 131072b+ 262144b+ No. of Reads: 168 0 0 No. of Writes: 25 19 62 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 9 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.01 35.71 us 24.38 us 65.77 us 15 STAT 0.02 44.49 us 1.68 us 67.87 us 25 OPENDIR 0.02 126.43 us 52.25 us 398.25 us 9 OPEN 0.02 51.99 us 32.93 us 109.37 us 26 FSTAT 0.03 256.51 us 145.04 us 360.84 us 6 READDIR 0.03 33.20 us 20.54 us 47.63 us 53 STATFS 0.03 185.61 us 12.72 us 1214.94 us 10 GETXATTR 0.07 94.06 us 17.46 us 2439.30 us 45 INODELK 0.08 213.94 us 177.04 us 383.69 us 21 READDIRP 0.26 29.48 us 15.33 us 71.02 us 521 FINODELK 0.48 159.48 us 21.99 us 327.94 us 173 LOOKUP 1.22 136.16 us 84.01 us 461.36 us 521 FXATTROP 4.60 1098.76 us 46.07 us 205950.17 us 243 FSYNC 8.03 459.44 us 85.85 us 21485.20 us 1014 WRITE 85.10 8442.50 us 119.17 us 309111.15 us 585 READ Duration: 26 seconds Data Read: 14180352 bytes Data Written: 40480256 bytes Brick: 10.32.9.20:/data0/gfs/bricks/brick1/ovirt-data ----------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 4 109 21 No. of Writes: 0 460 54715 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 42 45163 54896 No. of Writes: 8883 212986 162092 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 7020 1175 20888 No. of Writes: 48288 16340 24443 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 1408 2923 9 No. of Writes: 19159 26333 792 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 514 RELEASE 0.00 0.00 us 0.00 us 0.00 us 6838 RELEASEDIR 0.00 59.50 us 14.59 us 546.30 us 41 GETXATTR 0.00 220.10 us 16.62 us 988.45 us 50 READDIR 0.00 104.05 us 39.98 us 497.56 us 139 OPEN 0.00 48.49 us 15.80 us 3469.09 us 315 INODELK 0.01 50.94 us 1.65 us 1573.96 us 1820 OPENDIR 0.03 215.71 us 153.61 us 2562.55 us 1710 READDIRP 0.03 69.86 us 18.77 us 125905.96 us 5346 STATFS 0.07 141.42 us 16.04 us 655589.61 us 6984 FSTAT 0.08 659.41 us 12.89 us 523421.21 us 1679 STAT 1.24 1008.30 us 19.19 us 1534952.31 us 16933 LOOKUP 1.30 204.65 us 11.71 us 1749236.34 us 87007 FINODELK 1.48 233.96 us 73.33 us 1784231.76 us 87026 FXATTROP 12.57 3983.58 us 39.91 us 2743546.94 us 43355 FSYNC 41.38 12734.21 us 79.14 us 2966713.52 us 44635 READ 41.80 4069.99 us 77.68 us 2638579.30 us 141069 WRITE Duration: 9900 seconds Data Read: 3717300328 bytes Data Written: 21265203712 bytes Interval 6 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 3 39 2 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 381 294 20 No. of Writes: 83 467 47 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 0 264 0 No. of Writes: 32 35 41 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. of Writes: 65 6 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 14 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.00 55.58 us 15.93 us 93.98 us 8 GETXATTR 0.00 37.21 us 25.04 us 59.45 us 15 STAT 0.01 69.47 us 46.35 us 141.23 us 14 OPEN 0.01 26.67 us 18.22 us 49.45 us 46 INODELK 0.01 49.56 us 2.58 us 143.98 us 25 OPENDIR 0.01 34.60 us 18.77 us 126.90 us 53 STATFS 0.01 355.56 us 149.60 us 988.45 us 6 READDIR 0.02 41.77 us 21.14 us 73.50 us 72 FSTAT 0.02 215.13 us 169.65 us 274.27 us 21 READDIRP 0.08 29.04 us 13.69 us 66.67 us 534 FINODELK 0.17 160.73 us 23.54 us 321.76 us 193 LOOKUP 0.39 135.86 us 77.40 us 1627.16 us 534 FXATTROP 4.78 1072.78 us 86.42 us 197615.21 us 820 WRITE 19.40 13686.01 us 47.75 us 1796049.15 us 261 FSYNC 75.09 14420.80 us 94.10 us 2221701.71 us 959 READ Duration: 26 seconds Data Read: 21602304 bytes Data Written: 49301504 bytes Brick: 10.32.9.20:/data/gfs/bricks/bricka/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 549022 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 2 FORGET 0.00 0.00 us 0.00 us 0.00 us 1065 RELEASE 0.00 0.00 us 0.00 us 0.00 us 6838 RELEASEDIR 0.01 53.35 us 12.82 us 228.91 us 57 GETXATTR 0.02 194.83 us 14.90 us 878.82 us 50 READDIR 0.04 43.27 us 15.72 us 322.38 us 491 INODELK 0.04 97.28 us 41.17 us 432.99 us 221 OPEN 0.16 53.03 us 1.74 us 501.22 us 1820 OPENDIR 7.40 33.07 us 11.38 us 68417.92 us 135575 FINODELK 9.15 46.66 us 14.70 us 55440.51 us 118787 WRITE 26.84 925.03 us 16.53 us 280142.79 us 17586 LOOKUP 27.29 252.34 us 34.94 us 251521.50 us 65555 FSYNC 29.08 130.03 us 50.81 us 55731.00 us 135539 FXATTROP Duration: 9900 seconds Data Read: 0 bytes Data Written: 549022 bytes Interval 6 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 1014 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 9 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.11 43.98 us 15.40 us 84.40 us 5 GETXATTR 0.44 95.58 us 51.84 us 164.84 us 9 OPEN 0.62 48.41 us 2.33 us 70.35 us 25 OPENDIR 0.72 31.10 us 17.28 us 117.76 us 45 INODELK 0.98 318.59 us 154.88 us 878.82 us 6 READDIR 9.45 35.24 us 14.72 us 2246.13 us 521 FINODELK 14.32 160.73 us 22.70 us 316.55 us 173 LOOKUP 17.04 135.61 us 56.40 us 1874.03 us 244 FSYNC 19.01 36.40 us 16.78 us 2947.68 us 1014 WRITE 37.30 139.06 us 85.48 us 1078.96 us 521 FXATTROP Duration: 26 seconds Data Read: 0 bytes Data Written: 1014 bytes Brick: 10.32.9.8:/data/gfs/bricks/bricka/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 372917 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 608 RELEASE 0.00 0.00 us 0.00 us 0.00 us 8719 RELEASEDIR 0.00 23.65 us 15.42 us 47.19 us 75 FLUSH 0.01 86.28 us 57.99 us 148.82 us 85 REMOVEXATTR 0.01 90.74 us 67.34 us 130.94 us 85 SETATTR 0.01 73.83 us 11.11 us 236.42 us 133 GETXATTR 0.02 83.84 us 41.98 us 499.27 us 146 OPEN 0.03 33.97 us 12.62 us 315.20 us 564 INODELK 0.06 316.19 us 16.57 us 1296.89 us 141 READDIR 0.13 49.96 us 1.25 us 911.95 us 1865 OPENDIR 7.61 27.61 us 11.45 us 86619.65 us 190604 FINODELK 9.35 55.71 us 14.57 us 117405.17 us 116005 WRITE 25.18 183.50 us 32.55 us 187376.40 us 94843 FSYNC 25.95 1022.20 us 15.51 us 136924.46 us 17546 LOOKUP 31.63 114.64 us 48.10 us 68557.92 us 190742 FXATTROP Duration: 11059 seconds Data Read: 0 bytes Data Written: 372917 bytes Interval 6 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 620 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 7 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.02 96.14 us 96.14 us 96.14 us 1 SETATTR 0.02 102.92 us 102.92 us 102.92 us 1 REMOVEXATTR 0.13 84.08 us 51.75 us 221.40 us 7 OPEN 0.13 61.82 us 16.08 us 159.35 us 10 GETXATTR 0.25 46.62 us 1.88 us 77.92 us 25 OPENDIR 0.27 32.45 us 17.59 us 152.92 us 39 INODELK 0.34 261.02 us 153.06 us 363.57 us 6 READDIR 4.63 34.65 us 16.37 us 80.81 us 620 WRITE 5.77 156.66 us 22.46 us 304.56 us 171 LOOKUP 7.34 27.58 us 14.15 us 781.04 us 1236 FINODELK 31.07 116.50 us 78.86 us 261.81 us 1238 FXATTROP 50.02 381.21 us 39.71 us 149969.67 us 609 FSYNC Duration: 26 seconds Data Read: 0 bytes Data Written: 620 bytes Brick: 10.32.9.8:/data0/gfs/bricks/brick1/ovirt-data ---------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 8 1096 109 No. of Writes: 0 460 54715 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 115 111912 132663 No. of Writes: 8883 212986 162092 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 27012 2879 35570 No. of Writes: 48288 16340 24443 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 3743 7207 32 No. of Writes: 19159 26333 792 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 1 FORGET 0.00 0.00 us 0.00 us 0.00 us 1137 RELEASE 0.00 0.00 us 0.00 us 0.00 us 8740 RELEASEDIR 0.00 43.88 us 10.83 us 250.84 us 124 GETXATTR 0.00 120.62 us 44.65 us 762.18 us 139 OPEN 0.01 206.40 us 12.21 us 864.86 us 141 READDIR 0.02 43.05 us 1.44 us 452.74 us 1865 OPENDIR 0.05 2104.16 us 1920.14 us 2569.20 us 85 READDIRP 0.05 36.26 us 19.25 us 347.13 us 5346 STATFS 0.08 35.13 us 16.14 us 340.33 us 8148 FSTAT 0.63 27.73 us 11.02 us 73986.88 us 87007 FINODELK 1.53 134.48 us 38.31 us 90956.10 us 43355 FSYNC 1.73 388.39 us 15.69 us 62037.95 us 16978 LOOKUP 2.63 31993.13 us 16.10 us 888210.80 us 314 INODELK 2.93 128.32 us 73.56 us 45501.15 us 87026 FXATTROP 8.70 235.34 us 76.95 us 172924.48 us 141069 WRITE 81.65 3612.62 us 74.71 us 427304.61 us 86246 READ Duration: 11059 seconds Data Read: 8285158628 bytes Data Written: 21265203712 bytes Interval 6 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 3 39 2 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 4976 418 10 No. of Writes: 83 467 47 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 0 264 0 No. of Writes: 32 35 41 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. of Writes: 65 6 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 14 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.00 51.92 us 14.43 us 92.40 us 9 GETXATTR 0.01 40.96 us 1.87 us 66.77 us 25 OPENDIR 0.01 75.90 us 55.90 us 118.42 us 14 OPEN 0.01 236.50 us 152.98 us 361.08 us 6 READDIR 0.01 32.60 us 22.90 us 46.88 us 53 STATFS 0.01 2244.32 us 2244.32 us 2244.32 us 1 READDIRP 0.02 38.95 us 19.17 us 88.64 us 84 FSTAT 0.10 28.63 us 16.14 us 161.63 us 534 FINODELK 0.20 160.39 us 23.30 us 348.09 us 193 LOOKUP 0.38 226.64 us 49.98 us 20602.07 us 261 FSYNC 0.47 137.09 us 84.61 us 5477.48 us 534 FXATTROP 0.97 181.82 us 86.92 us 637.73 us 820 WRITE 28.68 96252.17 us 18.55 us 888210.80 us 46 INODELK 69.12 1882.74 us 79.08 us 157169.15 us 5668 READ Duration: 26 seconds Data Read: 41271296 bytes Data Written: 49301504 bytes Brick: 10.32.9.3:/data/gfs/bricks/brick3/ovirt-data --------------------------------------------------- Cumulative Stats: Block Size: 256b+ 512b+ 1024b+ No. of Reads: 4 131 8 No. of Writes: 0 822 17574 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 16 35286 49422 No. of Writes: 3335 177490 110247 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 12220 1028 23348 No. of Writes: 30013 5366 10235 Block Size: 131072b+ 262144b+ 524288b+ No. of Reads: 974 1858 19 No. of Writes: 8375 8875 585 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 772 RELEASE 0.00 0.00 us 0.00 us 0.00 us 10592 RELEASEDIR 0.00 76.27 us 14.39 us 255.33 us 38 GETXATTR 0.00 56.50 us 19.21 us 185.55 us 75 FLUSH 0.00 150.43 us 76.71 us 292.39 us 85 REMOVEXATTR 0.00 157.91 us 97.76 us 237.19 us 85 SETATTR 0.00 308.86 us 26.64 us 1991.99 us 51 READDIR 0.01 183.93 us 62.86 us 1362.63 us 146 OPEN 0.01 73.43 us 23.05 us 777.89 us 564 INODELK 0.03 75.13 us 23.73 us 337.63 us 1679 STAT 0.03 89.35 us 1.99 us 1982.01 us 1820 OPENDIR 0.09 79.36 us 24.79 us 805.05 us 5346 STATFS 0.11 106.63 us 41.73 us 1740.16 us 5044 FSTAT 0.15 262.43 us 91.17 us 4453.28 us 2680 READDIRP 2.38 58.64 us 13.90 us 26031.44 us 190605 FINODELK 3.35 898.17 us 25.36 us 119501.01 us 17501 LOOKUP 6.60 326.75 us 45.00 us 283828.60 us 94843 FSYNC 7.89 194.27 us 67.03 us 157955.41 us 190743 FXATTROP 17.21 696.94 us 97.07 us 340140.18 us 116005 WRITE 62.13 7751.59 us 110.31 us 596087.45 us 37647 READ Duration: 12343 seconds Data Read: 3321340984 bytes Data Written: 8447544320 bytes Interval 6 Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 3 109 5 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 187 225 13 No. of Writes: 228 230 13 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 0 185 0 No. of Writes: 4 2 12 Block Size: 262144b+ 524288b+ No. of Reads: 0 0 No. of Writes: 12 2 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 7 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25 RELEASEDIR 0.00 173.56 us 173.56 us 173.56 us 1 REMOVEXATTR 0.00 190.79 us 190.79 us 190.79 us 1 SETATTR 0.01 80.71 us 20.92 us 143.06 us 5 GETXATTR 0.01 163.10 us 113.71 us 277.38 us 7 OPEN 0.01 76.13 us 26.98 us 125.17 us 15 STAT 0.02 77.17 us 1.99 us 133.67 us 25 OPENDIR 0.03 61.12 us 30.85 us 227.60 us 39 INODELK 0.05 656.71 us 271.17 us 1991.99 us 6 READDIR 0.05 75.47 us 32.20 us 361.18 us 53 STATFS 0.07 110.87 us 59.06 us 160.47 us 52 FSTAT 0.11 249.69 us 108.39 us 447.56 us 34 READDIRP 0.50 231.83 us 29.43 us 677.06 us 171 LOOKUP 0.91 58.52 us 20.49 us 999.15 us 1238 FINODELK 2.45 319.84 us 67.30 us 4499.21 us 609 FSYNC 2.79 356.93 us 114.39 us 4840.74 us 620 WRITE 3.02 193.78 us 110.23 us 2600.12 us 1239 FXATTROP 89.95 11709.07 us 151.63 us 198748.33 us 610 READ Duration: 26 seconds Data Read: 14946304 bytes Data Written: 11333120 bytes From gomathinayagam08 at gmail.com Mon Apr 1 04:20:38 2019 From: gomathinayagam08 at gmail.com (Gomathi Nayagam) Date: Mon, 1 Apr 2019 09:50:38 +0530 Subject: [Gluster-users] Reg: Gluster Message-ID: Hi User, We are testing geo-replication of gluster it is taking nearly 8 mins to transfer 16 GB size of data between the DCs while when transferred the same data over plain rsync it took only 2 mins. Can we know if we are missing something? Thanks & Regards, Gomathi Nayagam.D -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeff.forbes at mail.nacon.com Tue Apr 2 20:25:47 2019 From: jeff.forbes at mail.nacon.com (Jeff Forbes) Date: Tue, 2 Apr 2019 20:25:47 +0000 Subject: [Gluster-users] Write Speed unusually slow when both bricks are online Message-ID: <9C5C26E24872DB46B612A2AFF44DE6C31C467228@FAFNIR-DOMAIN.Nacon.com> I have two CentOS-6 servers running version 3.12.14 of gluster-server. Each server as one brick and they are configured to replicate between the two bricks. 
I also have two CentOS-6 servers running version 3.12.2-18 of glusterfs. These servers use a separate VLAN. Each server has two bonded 1 Gbps NICs to communicate the gluster traffic. File transfer speeds between these servers using rsync approaches 100 MBps. The client servers mount the gluster volume using this fstab entry: 192.168.40.30:gv0 /store glusterfs defaults, attribute-timeout=600,entry-timeout=600,negative-timeout=600,fopen-keep-cache,use-readdirp=no,log-level=WARNING 1 2 Reading data from the servers to the clients is similar to the rsync speed. The problem is that writing from the clients to the mounted gluster volume is less than 8 MB/s and fluctuates from less than 500 kB/s to 8 MB/s, as measured by the pv command. Using rsync, the speed fluctuates between 2 and 5 MBps. When the bonded nics on one of the gluster servers is shut down, the write speed to the remaining online brick is now similar to the read speed I can only assume that there is something wrong in my configuration, since a greater than 10-fold decrease in write speed when the bricks are replicating makes for an unusable system. Does anyone have any ideas what the problem may be? Server volume configuration: > sudo gluster volume info Volume Name: gv0 Type: Replicate Volume ID: d96bbb99-f264-4655-95ff-f9f05ca9ff55 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: 192.168.40.20:/export/scsi/brick Brick2: 192.168.40.30:/export/scsi/brick Options Reconfigured: performance.cache-size: 1GB performance.readdir-ahead: on features.cache-invalidation: on features.cache-invalidation-timeout: 600 performance.stat-prefetch: on performance.cache-samba-metadata: on performance.cache-invalidation: on performance.md-cache-timeout: 600 network.inode-lru-limit: 250000 performance.cache-refresh-timeout: 60 performance.read-ahead: disable performance.parallel-readdir: on performance.write-behind-window-size: 4MB performance.io-thread-count: 64 performance.client-io-threads: on performance.quick-read: on performance.flush-behind: on performance.write-behind: on nfs.disable: on client.event-threads: 3 server.event-threads: 3 server.allow-insecure: on From manschwetus at cs-software-gmbh.de Tue Apr 9 12:08:28 2019 From: manschwetus at cs-software-gmbh.de (Florian Manschwetus) Date: Tue, 9 Apr 2019 12:08:28 +0000 Subject: [Gluster-users] SEGFAULT in FUSE layer Message-ID: Hi All, I'd like to bring this bug report, I just opened, to your attention. https://bugzilla.redhat.com/show_bug.cgi?id=1697971 -- Mit freundlichen Gr??en / With kind regards Florian Manschwetus CS Software Concepts and Solutions GmbH Gesch?ftsf?hrer / Managing director: Dr. Werner Alexi Amtsgericht Wiesbaden HRB 10004 (Commercial registry) Schiersteiner Stra?e 31 D-65187 Wiesbaden Germany -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Wed Apr 10 17:00:20 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Wed, 10 Apr 2019 20:00:20 +0300 Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) Message-ID: I have used reset-brick - but I have just changed the brick layout. You may give it a try, but I guess you need your new brick to have same amount of space (or more). Maybe someone more experienced should share a more sound solution. Best Regards, Strahil NikolovOn Apr 10, 2019 12:42, Martin Toth wrote: > > Hi all, > > I am running replica 3 gluster with 3 bricks. 
One of my servers failed - all disks are showing errors and raid is in fault state. > > Type: Replicate > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a > Status: Started > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 > > So one of my bricks is totally failed (node2). It went down and all data are lost (failed raid on node2). Now I am running only two bricks on 2 servers out from 3. > This is really critical problem for us, we can lost all data. I want to add new disks to node2, create new raid array on them and try to replace failed brick on this node. > > What is the procedure of replacing Brick2 on node2, can someone advice? I can?t find anything relevant in documentation. > > Thanks in advance, > Martin > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From rgowdapp at redhat.com Thu Apr 11 01:15:18 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Thu, 11 Apr 2019 06:45:18 +0530 Subject: [Gluster-users] Write Speed unusually slow when both bricks are online In-Reply-To: <9C5C26E24872DB46B612A2AFF44DE6C31C467228@FAFNIR-DOMAIN.Nacon.com> References: <9C5C26E24872DB46B612A2AFF44DE6C31C467228@FAFNIR-DOMAIN.Nacon.com> Message-ID: I would need following data: * client and brick volume profile - https://glusterdocs.readthedocs.io/en/latest/Administrator%20Guide/Performance%20Testing/ * cmdline of exact test you were running regards, On Wed, Apr 10, 2019 at 9:02 PM Jeff Forbes wrote: > I have two CentOS-6 servers running version 3.12.14 of gluster-server. > Each server as one brick and they are configured to replicate between > the two bricks. > > I also have two CentOS-6 servers running version 3.12.2-18 of > glusterfs. > > These servers use a separate VLAN. Each server has two bonded 1 Gbps > NICs to communicate the gluster traffic. File transfer speeds between > these servers using rsync approaches 100 MBps. > > The client servers mount the gluster volume using this fstab entry: > 192.168.40.30:gv0 /store glusterfs defaults, > attribute-timeout=600,entry-timeout=600,negative-timeout=600,fopen-keep-cache,use-readdirp=no,log-level=WARNING > 1 2 > > Reading data from the servers to the clients is similar to the rsync > speed. The problem is that writing from the clients to the mounted > gluster volume is less than 8 MB/s and fluctuates from less than 500 > kB/s to 8 MB/s, as measured by the pv command. Using rsync, the speed > fluctuates between 2 and 5 MBps. > > When the bonded nics on one of the gluster servers is shut down, the > write speed to the remaining online brick is now similar to the read > speed > > I can only assume that there is something wrong in my configuration, > since a greater than 10-fold decrease in write speed when the bricks > are replicating makes for an unusable system. > > > Does anyone have any ideas what the problem may be? 
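For reference, the brick-side profile data requested above is usually gathered with the gluster CLI along these lines (a sketch only; the volume name gv0 and the /store mount point are taken from the configuration quoted in this thread, and the output file names are just placeholders):

gluster volume profile gv0 start
# run the workload being investigated, e.g. a plain streaming write
# through the FUSE mount on the client:
dd if=/dev/zero of=/store/profile-test.bin bs=1M count=1024 conv=fsync
gluster volume profile gv0 info > /tmp/gv0-profile.txt
gluster volume profile gv0 stop

The client-side (FUSE) profile mentioned in the linked performance-testing guide is collected separately on the mount point; the exact steps are described in that document.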
> > > Server volume configuration: > > sudo gluster volume info > > Volume Name: gv0 > Type: Replicate > Volume ID: d96bbb99-f264-4655-95ff-f9f05ca9ff55 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 2 = 2 > Transport-type: tcp > Bricks: > Brick1: 192.168.40.20:/export/scsi/brick > Brick2: 192.168.40.30:/export/scsi/brick > Options Reconfigured: > performance.cache-size: 1GB > performance.readdir-ahead: on > features.cache-invalidation: on > features.cache-invalidation-timeout: 600 > performance.stat-prefetch: on > performance.cache-samba-metadata: on > performance.cache-invalidation: on > performance.md-cache-timeout: 600 > network.inode-lru-limit: 250000 > performance.cache-refresh-timeout: 60 > performance.read-ahead: disable > performance.parallel-readdir: on > performance.write-behind-window-size: 4MB > performance.io-thread-count: 64 > performance.client-io-threads: on > performance.quick-read: on > performance.flush-behind: on > performance.write-behind: on > nfs.disable: on > client.event-threads: 3 > server.event-threads: 3 > server.allow-insecure: on > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksubrahm at redhat.com Thu Apr 11 04:21:32 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Thu, 11 Apr 2019 09:51:32 +0530 Subject: [Gluster-users] Is "replica 4 arbiter 1" allowed to tweak client-quorum? In-Reply-To: <39133cb5-f3fe-4fd6-2dba-45058cdb5f1a@fischer-ka.de> References: <39133cb5-f3fe-4fd6-2dba-45058cdb5f1a@fischer-ka.de> Message-ID: Hi, I guess you missed Ravishankar's reply [1] for this query, on your previous thread. [1] https://lists.gluster.org/pipermail/gluster-users/2019-April/036247.html Regards, Karthik On Wed, Apr 10, 2019 at 8:59 PM Ingo Fischer wrote: > Hi All, > > I had a replica 2 cluster to host my VM images from my Proxmox cluster. > I got a bit around split brain scenarios by using "nufa" to make sure > the files are located on the host where the machine also runs normally. > So in fact one replica could fail and I still had the VM working. > > But then I thought about doing better and decided to add a node to > increase replica and I decided against arbiter approach. During this I > also decided to go away from nufa to make it a more normal approach. > > But in fact by adding the third replica and removing nufa I'm not really > better on availability - only split-brain-chance. I'm still at the point > that only one node is allowed to fail because else the now active client > quorum is no longer met and FS goes read only (which in fact is not > really better then failing completely as it was before). > > So I thought about adding arbiter bricks as "kind of 4th replica (but > without space needs) ... but then I read in docs that only "replica 3 > arbiter 1" is allowed as combination. Is this still true? > If docs are true: Why arbiter is not allowed for higher replica counts? > It would allow to improve on client quorum in my understanding. > > Thank you for your opinion and/or facts :-) > > Ingo > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ksubrahm at redhat.com Thu Apr 11 04:34:23 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Thu, 11 Apr 2019 10:04:23 +0530 Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) In-Reply-To: References: Message-ID: Hi Strahil, Thank you for sharing your experience with reset-brick option. Since he is using the gluster version 3.7.6, we do not have the reset-brick [1] option implemented there. It is introduced in 3.9.0. He has to go with replace-brick with the force option if he wants to use the same path & name for the new brick. Yes, it is recommended to have the new brick to be of the same size as that of the other bricks. [1] https://docs.gluster.org/en/latest/release-notes/3.9.0/#introducing-reset-brick-command Regards, Karthik On Wed, Apr 10, 2019 at 10:31 PM Strahil wrote: > I have used reset-brick - but I have just changed the brick layout. > You may give it a try, but I guess you need your new brick to have same > amount of space (or more). > > Maybe someone more experienced should share a more sound solution. > > Best Regards, > Strahil NikolovOn Apr 10, 2019 12:42, Martin Toth > wrote: > > > > Hi all, > > > > I am running replica 3 gluster with 3 bricks. One of my servers failed - > all disks are showing errors and raid is in fault state. > > > > Type: Replicate > > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a > > Status: Started > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 > > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 down > > Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 > > > > So one of my bricks is totally failed (node2). It went down and all data > are lost (failed raid on node2). Now I am running only two bricks on 2 > servers out from 3. > > This is really critical problem for us, we can lost all data. I want to > add new disks to node2, create new raid array on them and try to replace > failed brick on this node. > > > > What is the procedure of replacing Brick2 on node2, can someone advice? > I can?t find anything relevant in documentation. > > > > Thanks in advance, > > Martin > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Thu Apr 11 04:44:57 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Thu, 11 Apr 2019 07:44:57 +0300 Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) Message-ID: Hi Karthnik, I used only once the brick replace function when I wanted to change my Arbiter (v3.12.15 in oVirt 4.2.7) and it was a complete disaster. Most probably I should have stopped the source arbiter before doing that, but the docs didn't mention it. Thus I always use reset-brick, as it never let me down. Best Regards, Strahil NikolovOn Apr 11, 2019 07:34, Karthik Subrahmanya wrote: > > Hi Strahil, > > Thank you for sharing your experience with reset-brick option. > Since he is using the gluster version 3.7.6, we do not have the reset-brick [1] option implemented there. It is introduced in 3.9.0. He has to go with replace-brick with the force option if he wants to use the same path & name for the new brick.? 
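For reference, a sketch of the two command forms being compared above; the volume name is not given in this thread, so <VOLNAME> is a placeholder, and brick1_new is a hypothetical path for the rebuilt raid array:

# replace-brick (available on 3.7.x): move the volume to a new brick path on node2
gluster volume replace-brick <VOLNAME> node2.san:/tank/gluster/gv0imagestore/brick1 node2.san:/tank/gluster/gv0imagestore/brick1_new commit force

# reset-brick (3.9.0 and later): reuse the same hostname and brick path
gluster volume reset-brick <VOLNAME> node2.san:/tank/gluster/gv0imagestore/brick1 start
gluster volume reset-brick <VOLNAME> node2.san:/tank/gluster/gv0imagestore/brick1 node2.san:/tank/gluster/gv0imagestore/brick1 commit force

Either way the self-heal daemon then has to rebuild the brick contents, so check "gluster volume heal <VOLNAME> info" before taking any other node offline.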
> Yes, it is recommended to have the new brick to be of the same size as that of the other bricks. > > [1]?https://docs.gluster.org/en/latest/release-notes/3.9.0/#introducing-reset-brick-command > > Regards, > Karthik > > On Wed, Apr 10, 2019 at 10:31 PM Strahil wrote: >> >> I have used reset-brick - but I have just changed the brick layout. >> You may give it a try, but I guess you need your new brick to have same amount of space (or more). >> >> Maybe someone more experienced should share a more sound solution. >> >> Best Regards, >> Strahil NikolovOn Apr 10, 2019 12:42, Martin Toth wrote: >> > >> > Hi all, >> > >> > I am running replica 3 gluster with 3 bricks. One of my servers failed - all disks are showing errors and raid is in fault state. >> > >> > Type: Replicate >> > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a >> > Status: Started >> > Number of Bricks: 1 x 3 = 3 >> > Transport-type: tcp >> > Bricks: >> > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 >> > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 > > Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 >> > >> > So one of my bricks is totally failed (node2). It went down and all data are lost (failed raid on node2). Now I am running only two bricks on 2 servers out from 3. >> > This is really critical problem for us, we can lost all data. I want to add new disks to node2, create new raid array on them and try to replace failed brick on this node. >> > >> > What is the procedure of replacing Brick2 on node2, can someone advice? I can?t find anything relevant in documentation. >> > >> > Thanks in advance, >> > Martin >> > _______________________________________________ >> > Gluster-users mailing list >> > Gluster-users at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksubrahm at redhat.com Thu Apr 11 04:53:37 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Thu, 11 Apr 2019 10:23:37 +0530 Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) In-Reply-To: References: Message-ID: Hi Strahil, Can you give us some more insights on - the volume configuration you were using? - why you wanted to replace your brick? - which brick(s) you tried replacing? - what problem(s) did you face? Regards, Karthik On Thu, Apr 11, 2019 at 10:14 AM Strahil wrote: > Hi Karthnik, > I used only once the brick replace function when I wanted to change my > Arbiter (v3.12.15 in oVirt 4.2.7) and it was a complete disaster. > Most probably I should have stopped the source arbiter before doing that, > but the docs didn't mention it. > > Thus I always use reset-brick, as it never let me down. > > Best Regards, > Strahil Nikolov > On Apr 11, 2019 07:34, Karthik Subrahmanya wrote: > > Hi Strahil, > > Thank you for sharing your experience with reset-brick option. > Since he is using the gluster version 3.7.6, we do not have the > reset-brick [1] option implemented there. It is introduced in 3.9.0. He has > to go with replace-brick with the force option if he wants to use the same > path & name for the new brick. > Yes, it is recommended to have the new brick to be of the same size as > that of the other bricks. 
> > [1] > https://docs.gluster.org/en/latest/release-notes/3.9.0/#introducing-reset-brick-command > > Regards, > Karthik > > On Wed, Apr 10, 2019 at 10:31 PM Strahil wrote: > > I have used reset-brick - but I have just changed the brick layout. > You may give it a try, but I guess you need your new brick to have same > amount of space (or more). > > Maybe someone more experienced should share a more sound solution. > > Best Regards, > Strahil NikolovOn Apr 10, 2019 12:42, Martin Toth > wrote: > > > > Hi all, > > > > I am running replica 3 gluster with 3 bricks. One of my servers failed - > all disks are showing errors and raid is in fault state. > > > > Type: Replicate > > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a > > Status: Started > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 > > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 down > > Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 > > > > So one of my bricks is totally failed (node2). It went down and all data > are lost (failed raid on node2). Now I am running only two bricks on 2 > servers out from 3. > > This is really critical problem for us, we can lost all data. I want to > add new disks to node2, create new raid array on them and try to replace > failed brick on this node. > > > > What is the procedure of replacing Brick2 on node2, can someone advice? > I can?t find anything relevant in documentation. > > > > Thanks in advance, > > Martin > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksubrahm at redhat.com Thu Apr 11 04:55:37 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Thu, 11 Apr 2019 10:25:37 +0530 Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) In-Reply-To: References: Message-ID: On Thu, Apr 11, 2019 at 10:23 AM Karthik Subrahmanya wrote: > Hi Strahil, > > Can you give us some more insights on > - the volume configuration you were using? > - why you wanted to replace your brick? > - which brick(s) you tried replacing? > - if you remember the commands/steps that you followed, please give that as well. > - what problem(s) did you face? > > Regards, > Karthik > > On Thu, Apr 11, 2019 at 10:14 AM Strahil wrote: > >> Hi Karthnik, >> I used only once the brick replace function when I wanted to change my >> Arbiter (v3.12.15 in oVirt 4.2.7) and it was a complete disaster. >> Most probably I should have stopped the source arbiter before doing that, >> but the docs didn't mention it. >> >> Thus I always use reset-brick, as it never let me down. >> >> Best Regards, >> Strahil Nikolov >> On Apr 11, 2019 07:34, Karthik Subrahmanya wrote: >> >> Hi Strahil, >> >> Thank you for sharing your experience with reset-brick option. >> Since he is using the gluster version 3.7.6, we do not have the >> reset-brick [1] option implemented there. It is introduced in 3.9.0. He has >> to go with replace-brick with the force option if he wants to use the same >> path & name for the new brick. >> Yes, it is recommended to have the new brick to be of the same size as >> that of the other bricks. 
>> >> [1] >> https://docs.gluster.org/en/latest/release-notes/3.9.0/#introducing-reset-brick-command >> >> Regards, >> Karthik >> >> On Wed, Apr 10, 2019 at 10:31 PM Strahil wrote: >> >> I have used reset-brick - but I have just changed the brick layout. >> You may give it a try, but I guess you need your new brick to have same >> amount of space (or more). >> >> Maybe someone more experienced should share a more sound solution. >> >> Best Regards, >> Strahil NikolovOn Apr 10, 2019 12:42, Martin Toth >> wrote: >> > >> > Hi all, >> > >> > I am running replica 3 gluster with 3 bricks. One of my servers failed >> - all disks are showing errors and raid is in fault state. >> > >> > Type: Replicate >> > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a >> > Status: Started >> > Number of Bricks: 1 x 3 = 3 >> > Transport-type: tcp >> > Bricks: >> > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 >> > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 > down >> > Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 >> > >> > So one of my bricks is totally failed (node2). It went down and all >> data are lost (failed raid on node2). Now I am running only two bricks on 2 >> servers out from 3. >> > This is really critical problem for us, we can lost all data. I want to >> add new disks to node2, create new raid array on them and try to replace >> failed brick on this node. >> > >> > What is the procedure of replacing Brick2 on node2, can someone advice? >> I can?t find anything relevant in documentation. >> > >> > Thanks in advance, >> > Martin >> > _______________________________________________ >> > Gluster-users mailing list >> > Gluster-users at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Thu Apr 11 07:10:26 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Thu, 11 Apr 2019 12:40:26 +0530 Subject: [Gluster-users] SEGFAULT in FUSE layer In-Reply-To: References: Message-ID: Thanks for the report Florian. We will look into this. On Wed, Apr 10, 2019 at 9:03 PM Florian Manschwetus < manschwetus at cs-software-gmbh.de> wrote: > Hi All, > > I?d like to bring this bug report, I just opened, to your attention. > > https://bugzilla.redhat.com/show_bug.cgi?id=1697971 > > > > > > > > -- > > Mit freundlichen Gr??en / With kind regards > > Florian Manschwetus > > > > CS Software Concepts and Solutions GmbH > > Gesch?ftsf?hrer / Managing director: Dr. Werner Alexi > > Amtsgericht Wiesbaden HRB 10004 (Commercial registry) > > Schiersteiner Stra?e 31 > > D-65187 Wiesbaden > > Germany > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From snowmailer at gmail.com Thu Apr 11 07:12:58 2019 From: snowmailer at gmail.com (Martin Toth) Date: Thu, 11 Apr 2019 09:12:58 +0200 Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) In-Reply-To: <1634978A-E849-48DB-A160-B1AC3DB56D38@gmail.com> References: <0917AF4A-76EC-4A9E-820F-E0ADA2DA899A@gmail.com> <1634978A-E849-48DB-A160-B1AC3DB56D38@gmail.com> Message-ID: <69E7C95F-8A81-46CB-8BD8-F66B582144EC@gmail.com> Hi Karthik, more over, I would like to ask if there are some recommended settings/parameters for SHD in order to achieve good or fair I/O while volume will be healed when I will replace Brick (this should trigger healing process). I had some problems in past when healing was triggered, VM disks became unresponsive because healing took most of I/O. My volume containing only big files with VM disks. Thanks for suggestions. BR, Martin > On 10 Apr 2019, at 12:38, Martin Toth wrote: > > Thanks, this looks ok to me, I will reset brick because I don't have any data anymore on failed node so I can use same path / brick name. > > Is reseting brick dangerous command? Should I be worried about some possible failure that will impact remaining two nodes? I am running really old 3.7.6 but stable version. > > Thanks, > BR! > > Martin > > >> On 10 Apr 2019, at 12:20, Karthik Subrahmanya > wrote: >> >> Hi Martin, >> >> After you add the new disks and creating raid array, you can run the following command to replace the old brick with new one: >> >> - If you are going to use a different name to the new brick you can run >> gluster volume replace-brick commit force >> >> - If you are planning to use the same name for the new brick as well then you can use >> gluster volume reset-brick commit force >> Here old-brick & new-brick's hostname & path should be same. >> >> After replacing the brick, make sure the brick comes online using volume status. >> Heal should automatically start, you can check the heal status to see all the files gets replicated to the newly added brick. If it does not start automatically, you can manually start that by running gluster volume heal . >> >> HTH, >> Karthik >> >> On Wed, Apr 10, 2019 at 3:13 PM Martin Toth > wrote: >> Hi all, >> >> I am running replica 3 gluster with 3 bricks. One of my servers failed - all disks are showing errors and raid is in fault state. >> >> Type: Replicate >> Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a >> Status: Started >> Number of Bricks: 1 x 3 = 3 >> Transport-type: tcp >> Bricks: >> Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 >> Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 > Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 >> >> So one of my bricks is totally failed (node2). It went down and all data are lost (failed raid on node2). Now I am running only two bricks on 2 servers out from 3. >> This is really critical problem for us, we can lost all data. I want to add new disks to node2, create new raid array on them and try to replace failed brick on this node. >> >> What is the procedure of replacing Brick2 on node2, can someone advice? I can?t find anything relevant in documentation. >> >> Thanks in advance, >> Martin >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hunter86_bg at yahoo.com Thu Apr 11 08:00:31 2019
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Thu, 11 Apr 2019 08:00:31 +0000 (UTC)
Subject: [Gluster-users] Gluster snapshot fails
In-Reply-To: <1066182693.15102103.1554901501174.JavaMail.zimbra@redhat.com>
References: <1800297079.797563.1554843999336.ref@mail.yahoo.com> <1800297079.797563.1554843999336@mail.yahoo.com> <1066182693.15102103.1554901501174.JavaMail.zimbra@redhat.com>
Message-ID: <715831506.1767324.1554969631562@mail.yahoo.com>

Hi Rafi,

thanks for your update. I have tested again with another gluster volume.

[root at ovirt1 glusterfs]# gluster volume info isos

Volume Name: isos
Type: Replicate
Volume ID: 9b92b5bd-79f5-427b-bd8d-af28b038ed2a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1:/gluster_bricks/isos/isos
Brick2: ovirt2:/gluster_bricks/isos/isos
Brick3: ovirt3.localdomain:/gluster_bricks/isos/isos (arbiter)
Options Reconfigured:
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
network.ping-timeout: 30
storage.owner-gid: 36
storage.owner-uid: 36
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.enable-shared-storage: enable

Command run:
logrotate -f glusterfs ; logrotate -f glusterfs-georep; gluster snapshot create isos-snap-2019-04-11 isos description TEST

Logs:
[root at ovirt1 glusterfs]# cat cli.log
[2019-04-11 07:51:02.367453] I [cli.c:769:main] 0-cli: Started running gluster with version 5.5
[2019-04-11 07:51:02.486863] I [MSGID: 101190] [event-epoll.c:621:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-04-11 07:51:02.556813] E [cli-rpc-ops.c:11293:gf_cli_snapshot] 0-cli: cli_to_glusterd for snapshot failed
[2019-04-11 07:51:02.556880] I [input.c:31:cli_batch] 0-: Exiting with: -1

[root at ovirt1 glusterfs]# cat glusterd.log
[2019-04-11 07:51:02.553357] E [MSGID: 106024] [glusterd-snapshot.c:2547:glusterd_snapshot_create_prevalidate] 0-management: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of isos are thinly provisioned LV.
[2019-04-11 07:51:02.553365] W [MSGID: 106029] [glusterd-snapshot.c:8613:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed
[2019-04-11 07:51:02.553703] W [MSGID: 106121] [glusterd-mgmt.c:147:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed
[2019-04-11 07:51:02.553719] E [MSGID: 106121] [glusterd-mgmt.c:1015:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Snapshot on local node

My LVs hosting the bricks are:

[root at ovirt1 ~]# lvs gluster_vg_md0
  LV              VG             Attr       LSize   Pool             Origin  Data%  Meta%
  gluster_lv_data gluster_vg_md0 Vwi-aot--- 500.00g my_vdo_thinpool          35.97
  gluster_lv_isos gluster_vg_md0 Vwi-aot---  50.00g my_vdo_thinpool          52.11
  my_vdo_thinpool gluster_vg_md0 twi-aot---   9.86t                          2.04   11.45

[root at ovirt1 ~]# ssh ovirt2 "lvs gluster_vg_md0"
  LV              VG             Attr       LSize   Pool             Origin  Data%  Meta%
  gluster_lv_data gluster_vg_md0 Vwi-aot--- 500.00g my_vdo_thinpool          35.98
  gluster_lv_isos gluster_vg_md0 Vwi-aot---  50.00g my_vdo_thinpool          25.94
  my_vdo_thinpool gluster_vg_md0 twi-aot---  <9.77t                          1.93   11.39

[root at ovirt1 ~]# ssh ovirt3 "lvs gluster_vg_sda3"
  LV                    VG              Attr       LSize  Pool                   Origin  Data%  Meta%
  gluster_lv_data       gluster_vg_sda3 Vwi-aotz-- 15.00g gluster_thinpool_sda3          0.17
  gluster_lv_engine     gluster_vg_sda3 Vwi-aotz-- 15.00g gluster_thinpool_sda3          0.16
  gluster_lv_isos       gluster_vg_sda3 Vwi-aotz-- 15.00g gluster_thinpool_sda3          0.12
  gluster_thinpool_sda3 gluster_vg_sda3 twi-aotz-- 41.00g                                0.16   1.58

As you can see, all bricks are thin LVs and space is not the issue.
Can someone hint me how to enable debug logging, so the gluster logs can show the reason for that pre-check failure?

Best Regards,
Strahil Nikolov

On Wednesday, April 10, 2019, 9:05:15 AM GMT-4, Rafi Kavungal Chundattu Parambil wrote:

Hi Strahil,

The name of the device is not at all a problem here. Can you please check the log of glusterd, and see if there is any useful information about the failure. Also please provide the output of `lvscan` and `lvs --noheadings -o pool_lv` from all nodes.

Regards
Rafi KC

----- Original Message -----
From: "Strahil Nikolov"
To: gluster-users at gluster.org
Sent: Wednesday, April 10, 2019 2:36:39 AM
Subject: [Gluster-users] Gluster snapshot fails

Hello Community,

I have a problem running a snapshot of a replica 3 arbiter 1 volume.

Error:
[root at ovirt2 ~]# gluster snapshot create before-423 engine description "Before upgrade of engine from 4.2.2 to 4.2.3"
snapshot create: failed: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of engine are thinly provisioned LV.
Snapshot command failed

Volume info:

Volume Name: engine
Type: Replicate
Volume ID: 30ca1cc2-f2f7-4749-9e2e-cee9d7099ded
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1:/gluster_bricks/engine/engine
Brick2: ovirt2:/gluster_bricks/engine/engine
Brick3: ovirt3:/gluster_bricks/engine/engine (arbiter)
Options Reconfigured:
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
network.ping-timeout: 30
storage.owner-gid: 36
storage.owner-uid: 36
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.enable-shared-storage: enable

All bricks are on thin LVM with plenty of space. The only thing that could be causing it is that ovirt1 & ovirt2 are on /dev/gluster_vg_ssd/gluster_lv_engine, while the arbiter is on /dev/gluster_vg_sda3/gluster_lv_engine. Is that the issue? Should I rename my brick's VG? If so, why is there no mention of this in the documentation?
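A quick way to see what the prevalidation check is looking at, sketched with the engine brick from the volume info above (the device path is the one mentioned in this mail and may differ per node):

# which device actually backs the brick mount, and is it a thin LV?
df --output=source /gluster_bricks/engine/engine
lvs -o lv_name,vg_name,pool_lv,lv_attr /dev/gluster_vg_ssd/gluster_lv_engine
# a leading 'V' in lv_attr plus a non-empty pool_lv means the brick sits on a thin LV;
# for more verbose prevalidation logging, glusterd can be restarted with --log-level DEBUG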
Best Regards, Strahil Nikolov _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Thu Apr 11 08:10:22 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Thu, 11 Apr 2019 08:10:22 +0000 (UTC) Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) In-Reply-To: References: Message-ID: <2030773662.1746728.1554970222668@mail.yahoo.com> Hi Karthik, - the volume configuration you were using?I used oVirt 4.2.6 Gluster Wizard, so I guess - we need to involve the oVirt devs here. - why you wanted to replace your brick?I have deployed the arbiter on another location as I thought I can deploy the Thin Arbiter (still waiting the docs to be updated), but once I realized that GlusterD doesn't support Thin Arbiter, I had to build another machine for a local arbiter - thus a replacement was needed.- which brick(s) you tried replacing?I was replacing the old arbiter with a new one- what problem(s) did you face?All oVirt VMs got paused due to I/O errors. At the end, I have rebuild the whole setup and I never tried to replace the brick this way (used only reset-brick which didn't cause any issues). As I mentioned that was on v3.12, which is not the default for oVirt 4.3.x - so my guess is that it is OK now (current is v5.5). Just sharing my experience. Best Regards,Strahil Nikolov ? ?????????, 11 ????? 2019 ?., 0:53:52 ?. ???????-4, Karthik Subrahmanya ??????: Hi Strahil, Can you give us some more insights on- the volume configuration you were using?- why you wanted to replace your brick?- which brick(s) you tried replacing?- what problem(s) did you face? Regards,Karthik On Thu, Apr 11, 2019 at 10:14 AM Strahil wrote: Hi Karthnik, I used only once the brick replace function when I wanted to change my Arbiter (v3.12.15 in oVirt 4.2.7)? and it was a complete disaster. Most probably I should have stopped the source arbiter before doing that, but the docs didn't mention it. Thus I always use reset-brick, as it never let me down. Best Regards, Strahil Nikolov On Apr 11, 2019 07:34, Karthik Subrahmanya wrote: Hi Strahil, Thank you for sharing your experience with reset-brick option.Since he is using the gluster version 3.7.6, we do not have the reset-brick [1] option implemented there. It is introduced in 3.9.0. He has to go with replace-brick with the force option if he wants to use the same path & name for the new brick.?Yes, it is recommended to have the new brick to be of the same size as that of the other bricks. [1]?https://docs.gluster.org/en/latest/release-notes/3.9.0/#introducing-reset-brick-command Regards,Karthik On Wed, Apr 10, 2019 at 10:31 PM Strahil wrote: I have used reset-brick - but I have just changed the brick layout. You may give it a try, but I guess you need your new brick to have same amount of space (or more). Maybe someone more experienced should share a more sound solution. Best Regards, Strahil NikolovOn Apr 10, 2019 12:42, Martin Toth wrote: > > Hi all, > > I am running replica 3 gluster with 3 bricks. One of my servers failed - all disks are showing errors and raid is in fault state. 
> > Type: Replicate > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a > Status: Started > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 > > So one of my bricks is totally failed (node2). It went down and all data are lost (failed raid on node2). Now I am running only two bricks on 2 servers out from 3. > This is really critical problem for us, we can lost all data. I want to add new disks to node2, create new raid array on them and try to replace failed brick on this node. > > What is the procedure of replacing Brick2 on node2, can someone advice? I can?t find anything relevant in documentation. > > Thanks in advance, > Martin > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Thu Apr 11 08:11:19 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Thu, 11 Apr 2019 08:11:19 +0000 (UTC) Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) In-Reply-To: References: Message-ID: <245562760.1708418.1554970279439@mail.yahoo.com> The command should be copy & paste from the docs. Can't check it out - as the systems were wiped. Best Regards,Strahil Nikolov -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Thu Apr 11 08:54:14 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Thu, 11 Apr 2019 14:24:14 +0530 Subject: [Gluster-users] Proposal: Changes in Gluster Community meetings In-Reply-To: References: <62104B6F-99CF-4C22-80FC-9C177F73E897@onholyground.com> Message-ID: Hi All, Below is the final details of our community meeting, and I will be sending invites to mailing list following this email. You can add Gluster Community Calendar so you can get notifications on the meetings. We are starting the meetings from next week. For the first meeting, we need 1 volunteer from users to discuss the use case / what went well, and what went bad, etc. preferrably in APAC region. NA/EMEA region, next week. Draft Content: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g ---- Gluster Community Meeting Previous Meeting minutes: - http://github.com/gluster/community Date/Time: Check the community calendar Bridge - APAC friendly hours - Bridge: https://bluejeans.com/836554017 - NA/EMEA - Bridge: https://bluejeans.com/486278655 ------------------------------ Attendance - Name, Company Host - Who will host next meeting? - Host will need to send out the agenda 24hr - 12hrs in advance to mailing list, and also make sure to send the meeting minutes. - Host will need to reach out to one user at least who can talk about their usecase, their experience, and their needs. - Host needs to send meeting minutes as PR to http://github.com/gluster/community User stories - Discuss 1 usecase from a user. - How was the architecture derived, what volume type used, options, etc? - What were the major issues faced ? How to improve them? - What worked good? - How can we all collaborate well, so it is win-win for the community and the user? How can we Community - Any release updates? 
- Blocker issues across the project? - Metrics - Number of new bugs since previous meeting. How many are not triaged? - Number of emails, anything unanswered? Conferences / Meetups - Any conference in next 1 month where gluster-developers are going? gluster-users are going? So we can meet and discuss. Developer focus - Any design specs to discuss? - Metrics of the week? - Coverity - Clang-Scan - Number of patches from new developers. - Did we increase test coverage? - [Atin] Also talk about most frequent test failures in the CI and carve out an AI to get them fixed. RoundTable - ---- Regards, Amar On Mon, Mar 25, 2019 at 8:53 PM Amar Tumballi Suryanarayan < atumball at redhat.com> wrote: > Thanks for the feedback Darrell, > > The new proposal is to have one in North America 'morning' time. (10AM > PST), And another in ASIA day time, which is evening 7pm/6pm in Australia, > 9pm Newzealand, 5pm Tokyo, 4pm Beijing. > > For example, if we choose Every other Tuesday for meeting, and 1st of the > month is Tuesday, we would have North America time for 1st, and on 15th it > would be ASIA/Pacific time. > > Hopefully, this way, we can cover all the timezones, and meeting minutes > would be committed to github repo, so that way, it will be easier for > everyone to be aware of what is happening. > > Regards, > Amar > > On Mon, Mar 25, 2019 at 8:40 PM Darrell Budic > wrote: > >> As a user, I?d like to visit more of these, but the time slot is my 3AM. >> Any possibility for a rolling schedule (move meeting +6 hours each week >> with rolling attendance from maintainers?) or an occasional regional >> meeting 12 hours opposed to the one you?re proposing? >> >> -Darrell >> >> On Mar 25, 2019, at 4:25 AM, Amar Tumballi Suryanarayan < >> atumball at redhat.com> wrote: >> >> All, >> >> We currently have 3 meetings which are public: >> >> 1. Maintainer's Meeting >> >> - Runs once in 2 weeks (on Mondays), and current attendance is around 3-5 >> on an avg, and not much is discussed. >> - Without majority attendance, we can't take any decisions too. >> >> 2. Community meeting >> >> - Supposed to happen on #gluster-meeting, every 2 weeks, and is the only >> meeting which is for 'Community/Users'. Others are for developers as of >> now. >> Sadly attendance is getting closer to 0 in recent times. >> >> 3. GCS meeting >> >> - We started it as an effort inside Red Hat gluster team, and opened it >> up for community from Jan 2019, but the attendance was always from RHT >> members, and haven't seen any traction from wider group. >> >> So, I have a proposal to call out for cancelling all these meeting, and >> keeping just 1 weekly 'Community' meeting, where even topics related to >> maintainers and GCS and other projects can be discussed. >> >> I have a template of a draft template @ >> https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g >> >> Please feel free to suggest improvements, both in agenda and in timings. >> So, we can have more participation from members of community, which allows >> more user - developer interactions, and hence quality of project. >> >> Waiting for feedbacks, >> >> Regards, >> Amar >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> > > -- > Amar Tumballi (amarts) > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From amarts at gmail.com Thu Apr 11 08:56:48 2019 From: amarts at gmail.com (amarts at gmail.com) Date: Thu, 11 Apr 2019 08:56:48 +0000 Subject: [Gluster-users] Invitation: Gluster Community Meeting (APAC friendly hours) @ Tue Apr 16, 2019 11:30am - 12:30pm (IST) (gluster-users@gluster.org) Message-ID: <000000000000cc604305863d5d35@google.com> You have been invited to the following event. Title: Gluster Community Meeting (APAC friendly hours) Bridge: https://bluejeans.com/836554017 Meeting minutes: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both Previous Meeting notes: http://github.com/gluster/community When: Tue Apr 16, 2019 11:30am ? 12:30pm India Standard Time - Kolkata Where: https://bluejeans.com/836554017 Calendar: gluster-users at gluster.org Who: * amarts at gmail.com - creator * gluster-users at gluster.org * maintainers at gluster.org * gluster-devel at gluster.org Event details: https://www.google.com/calendar/event?action=VIEW&eid=MjU2dWllNDQyM2tqaGs0ZjhidGl2YmdtM2YgZ2x1c3Rlci11c2Vyc0BnbHVzdGVyLm9yZw&tok=NTIjdmViajVibDBrbnNiOWQwY205ZWg5cGJsaTRAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbTZlODU1NTU1Mzk4NjllOTQ4NzUxODAxYTQ4M2E4Y2ExMDRhODg3YjY&ctz=Asia%2FKolkata&hl=en&es=0 Invitation from Google Calendar: https://www.google.com/calendar/ You are receiving this courtesy email at the account gluster-users at gluster.org because you are an attendee of this event. To stop receiving future updates for this event, decline this event. Alternatively you can sign up for a Google account at https://www.google.com/calendar/ and control your notification settings for your entire calendar. Forwarding this invitation could allow any recipient to send a response to the organizer and be added to the guest list, or invite others regardless of their own invitation status, or to modify your RSVP. Learn more at https://support.google.com/calendar/answer/37135#forwarding -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 1822 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: invite.ics Type: application/ics Size: 1862 bytes Desc: not available URL: From amarts at gmail.com Thu Apr 11 08:57:51 2019 From: amarts at gmail.com (amarts at gmail.com) Date: Thu, 11 Apr 2019 08:57:51 +0000 Subject: [Gluster-users] Invitation: Gluster Community Meeting (NA/EMEA friendly hours) @ Tue Apr 23, 2019 10:30pm - 11:30pm (IST) (gluster-users@gluster.org) Message-ID: <000000000000910a0605863d61a0@google.com> You have been invited to the following event. Title: Gluster Community Meeting (NA/EMEA friendly hours) Bridge: https://bluejeans.com/486278655 Meeting minutes: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both Previous Meeting notes: http://github.com/gluster/community When: Tue Apr 23, 2019 10:30pm ? 
11:30pm India Standard Time - Kolkata Where: https://bluejeans.com/486278655 Calendar: gluster-users at gluster.org Who: * amarts at gmail.com - creator * gluster-users at gluster.org * maintainers at gluster.org * gluster-devel at gluster.org Event details: https://www.google.com/calendar/event?action=VIEW&eid=N3Y1NWZkZTkxNWQzc3QxcHR2OHJnNm4zNzYgZ2x1c3Rlci11c2Vyc0BnbHVzdGVyLm9yZw&tok=NTIjdmViajVibDBrbnNiOWQwY205ZWg5cGJsaTRAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbTc2MWYzNWEwZmFiMjk5YzFlYmM3NzkyNjNhOWY5MzExYTM4NGYwMWQ&ctz=Asia%2FKolkata&hl=en&es=0 Invitation from Google Calendar: https://www.google.com/calendar/ You are receiving this courtesy email at the account gluster-users at gluster.org because you are an attendee of this event. To stop receiving future updates for this event, decline this event. Alternatively you can sign up for a Google account at https://www.google.com/calendar/ and control your notification settings for your entire calendar. Forwarding this invitation could allow any recipient to send a response to the organizer and be added to the guest list, or invite others regardless of their own invitation status, or to modify your RSVP. Learn more at https://support.google.com/calendar/answer/37135#forwarding -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 1827 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: invite.ics Type: application/ics Size: 1867 bytes Desc: not available URL: From ksubrahm at redhat.com Thu Apr 11 10:31:51 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Thu, 11 Apr 2019 16:01:51 +0530 Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) In-Reply-To: <69E7C95F-8A81-46CB-8BD8-F66B582144EC@gmail.com> References: <0917AF4A-76EC-4A9E-820F-E0ADA2DA899A@gmail.com> <1634978A-E849-48DB-A160-B1AC3DB56D38@gmail.com> <69E7C95F-8A81-46CB-8BD8-F66B582144EC@gmail.com> Message-ID: On Thu, Apr 11, 2019 at 12:43 PM Martin Toth wrote: > Hi Karthik, > > more over, I would like to ask if there are some recommended > settings/parameters for SHD in order to achieve good or fair I/O while > volume will be healed when I will replace Brick (this should trigger > healing process). > If I understand you concern correctly, you need to get fair I/O performance for clients while healing takes place as part of the replace brick operation. For this you can turn off the "data-self-heal" and "metadata-self-heal" options until the heal completes on the new brick. Turning off client side healing doesn't compromise data integrity and consistency. During the read request from client, pending xattr is evaluated for replica copies and read is only served from correct copy. During writes, IO will continue on both the replicas, SHD will take care of healing files. After replacing the brick, we strongly recommend you to consider upgrading your gluster to one of the maintained versions. We have many stability related fixes there, which can handle some critical issues and corner cases which you could hit during these kind of scenarios. Regards, Karthik > I had some problems in past when healing was triggered, VM disks became > unresponsive because healing took most of I/O. My volume containing only > big files with VM disks. > > Thanks for suggestions. 
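Spelled out with a <volname> placeholder (the volume name is not shown in this thread), the options referred to above are set and later reverted like this:

# disable client-side healing while the replaced brick is being rebuilt
gluster volume set <volname> cluster.data-self-heal off
gluster volume set <volname> cluster.metadata-self-heal off
# once "gluster volume heal <volname> info" reports no pending entries, turn them back on
gluster volume set <volname> cluster.data-self-heal on
gluster volume set <volname> cluster.metadata-self-heal on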
> BR, > Martin > > On 10 Apr 2019, at 12:38, Martin Toth wrote: > > Thanks, this looks ok to me, I will reset brick because I don't have any > data anymore on failed node so I can use same path / brick name. > > Is reseting brick dangerous command? Should I be worried about some > possible failure that will impact remaining two nodes? I am running really > old 3.7.6 but stable version. > > Thanks, > BR! > > Martin > > > On 10 Apr 2019, at 12:20, Karthik Subrahmanya wrote: > > Hi Martin, > > After you add the new disks and creating raid array, you can run the > following command to replace the old brick with new one: > > - If you are going to use a different name to the new brick you can run > gluster volume replace-brick commit force > > - If you are planning to use the same name for the new brick as well then > you can use > gluster volume reset-brick commit force > Here old-brick & new-brick's hostname & path should be same. > > After replacing the brick, make sure the brick comes online using volume > status. > Heal should automatically start, you can check the heal status to see all > the files gets replicated to the newly added brick. If it does not start > automatically, you can manually start that by running gluster volume heal > . > > HTH, > Karthik > > On Wed, Apr 10, 2019 at 3:13 PM Martin Toth wrote: > >> Hi all, >> >> I am running replica 3 gluster with 3 bricks. One of my servers failed - >> all disks are showing errors and raid is in fault state. >> >> Type: Replicate >> Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a >> Status: Started >> Number of Bricks: 1 x 3 = 3 >> Transport-type: tcp >> Bricks: >> Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 >> Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 > Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 >> >> So one of my bricks is totally failed (node2). It went down and all data >> are lost (failed raid on node2). Now I am running only two bricks on 2 >> servers out from 3. >> This is really critical problem for us, we can lost all data. I want to >> add new disks to node2, create new raid array on them and try to replace >> failed brick on this node. >> >> What is the procedure of replacing Brick2 on node2, can someone advice? I >> can?t find anything relevant in documentation. >> >> Thanks in advance, >> Martin >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksubrahm at redhat.com Thu Apr 11 10:45:52 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Thu, 11 Apr 2019 16:15:52 +0530 Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) In-Reply-To: <2030773662.1746728.1554970222668@mail.yahoo.com> References: <2030773662.1746728.1554970222668@mail.yahoo.com> Message-ID: On Thu, Apr 11, 2019 at 1:40 PM Strahil Nikolov wrote: > Hi Karthik, > > - the volume configuration you were using? > I used oVirt 4.2.6 Gluster Wizard, so I guess - we need to involve the > oVirt devs here. > - why you wanted to replace your brick? > I have deployed the arbiter on another location as I thought I can deploy > the Thin Arbiter (still waiting the docs to be updated), but once I > realized that GlusterD doesn't support Thin Arbiter, I had to build another > machine for a local arbiter - thus a replacement was needed. > We are working on supporting Thin-arbiter with GlusterD. 
Once done, we will update on the users list so that you can play with it and let us know your experience. > - which brick(s) you tried replacing? > I was replacing the old arbiter with a new one > - what problem(s) did you face? > All oVirt VMs got paused due to I/O errors. > There could be many reasons for this. Without knowing the exact state of the system at that time, I am afraid to make any comment on this. > > At the end, I have rebuild the whole setup and I never tried to replace > the brick this way (used only reset-brick which didn't cause any issues). > > As I mentioned that was on v3.12, which is not the default for oVirt > 4.3.x - so my guess is that it is OK now (current is v5.5). > I don't remember anyone complaining about this recently. This should work in the latest releases. > > Just sharing my experience. > Highly appreciated. Regards, Karthik > > Best Regards, > Strahil Nikolov > > ? ?????????, 11 ????? 2019 ?., 0:53:52 ?. ???????-4, Karthik Subrahmanya < > ksubrahm at redhat.com> ??????: > > > Hi Strahil, > > Can you give us some more insights on > - the volume configuration you were using? > - why you wanted to replace your brick? > - which brick(s) you tried replacing? > - what problem(s) did you face? > > Regards, > Karthik > > On Thu, Apr 11, 2019 at 10:14 AM Strahil wrote: > > Hi Karthnik, > I used only once the brick replace function when I wanted to change my > Arbiter (v3.12.15 in oVirt 4.2.7) and it was a complete disaster. > Most probably I should have stopped the source arbiter before doing that, > but the docs didn't mention it. > > Thus I always use reset-brick, as it never let me down. > > Best Regards, > Strahil Nikolov > On Apr 11, 2019 07:34, Karthik Subrahmanya wrote: > > Hi Strahil, > > Thank you for sharing your experience with reset-brick option. > Since he is using the gluster version 3.7.6, we do not have the > reset-brick [1] option implemented there. It is introduced in 3.9.0. He has > to go with replace-brick with the force option if he wants to use the same > path & name for the new brick. > Yes, it is recommended to have the new brick to be of the same size as > that of the other bricks. > > [1] > https://docs.gluster.org/en/latest/release-notes/3.9.0/#introducing-reset-brick-command > > Regards, > Karthik > > On Wed, Apr 10, 2019 at 10:31 PM Strahil wrote: > > I have used reset-brick - but I have just changed the brick layout. > You may give it a try, but I guess you need your new brick to have same > amount of space (or more). > > Maybe someone more experienced should share a more sound solution. > > Best Regards, > Strahil NikolovOn Apr 10, 2019 12:42, Martin Toth > wrote: > > > > Hi all, > > > > I am running replica 3 gluster with 3 bricks. One of my servers failed - > all disks are showing errors and raid is in fault state. > > > > Type: Replicate > > Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a > > Status: Started > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 > > Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 down > > Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 > > > > So one of my bricks is totally failed (node2). It went down and all data > are lost (failed raid on node2). Now I am running only two bricks on 2 > servers out from 3. > > This is really critical problem for us, we can lost all data. I want to > add new disks to node2, create new raid array on them and try to replace > failed brick on this node. 
> > > > What is the procedure of replacing Brick2 on node2, can someone advice? > I can?t find anything relevant in documentation. > > > > Thanks in advance, > > Martin > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From snowmailer at gmail.com Thu Apr 11 13:08:02 2019 From: snowmailer at gmail.com (Martin Toth) Date: Thu, 11 Apr 2019 15:08:02 +0200 Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) In-Reply-To: References: <0917AF4A-76EC-4A9E-820F-E0ADA2DA899A@gmail.com> <1634978A-E849-48DB-A160-B1AC3DB56D38@gmail.com> <69E7C95F-8A81-46CB-8BD8-F66B582144EC@gmail.com> Message-ID: <00009213-6BF3-4A7F-AFA7-AC076B04496C@gmail.com> Hi Karthik, > On Thu, Apr 11, 2019 at 12:43 PM Martin Toth > wrote: > Hi Karthik, > > more over, I would like to ask if there are some recommended settings/parameters for SHD in order to achieve good or fair I/O while volume will be healed when I will replace Brick (this should trigger healing process). > If I understand you concern correctly, you need to get fair I/O performance for clients while healing takes place as part of the replace brick operation. For this you can turn off the "data-self-heal" and "metadata-self-heal" options until the heal completes on the new brick. This is exactly what I mean. I am running VM disks on remaining 2 (out of 3 - one failed as mentioned) nodes and I need to ensure there will be fair I/O performance available on these two nodes while replace brick operation will heal volume. I will not run any VMs on node where replace brick operation will be running. So if I understand correctly, when I will set : # gluster volume set cluster.data-self-heal off # gluster volume set cluster.metadata-self-heal off this will tell Gluster clients (libgfapi and FUSE mount) not to read from node ?where replace brick operation? is in place but from remaing two healthy nodes. Is this correct ? Thanks for clarification. > Turning off client side healing doesn't compromise data integrity and consistency. During the read request from client, pending xattr is evaluated for replica copies and read is only served from correct copy. During writes, IO will continue on both the replicas, SHD will take care of healing files. > After replacing the brick, we strongly recommend you to consider upgrading your gluster to one of the maintained versions. We have many stability related fixes there, which can handle some critical issues and corner cases which you could hit during these kind of scenarios. This will be first priority in infrastructure after fixing this cluster back to fully functional replica3. I will upgrade to 3.12.x and then to version 5 or 6. BR, Martin > Regards, > Karthik > I had some problems in past when healing was triggered, VM disks became unresponsive because healing took most of I/O. My volume containing only big files with VM disks. > > Thanks for suggestions. > BR, > Martin > >> On 10 Apr 2019, at 12:38, Martin Toth > wrote: >> >> Thanks, this looks ok to me, I will reset brick because I don't have any data anymore on failed node so I can use same path / brick name. >> >> Is reseting brick dangerous command? 
Should I be worried about some possible failure that will impact remaining two nodes? I am running really old 3.7.6 but stable version. >> >> Thanks, >> BR! >> >> Martin >> >> >>> On 10 Apr 2019, at 12:20, Karthik Subrahmanya > wrote: >>> >>> Hi Martin, >>> >>> After you add the new disks and creating raid array, you can run the following command to replace the old brick with new one: >>> >>> - If you are going to use a different name to the new brick you can run >>> gluster volume replace-brick commit force >>> >>> - If you are planning to use the same name for the new brick as well then you can use >>> gluster volume reset-brick commit force >>> Here old-brick & new-brick's hostname & path should be same. >>> >>> After replacing the brick, make sure the brick comes online using volume status. >>> Heal should automatically start, you can check the heal status to see all the files gets replicated to the newly added brick. If it does not start automatically, you can manually start that by running gluster volume heal . >>> >>> HTH, >>> Karthik >>> >>> On Wed, Apr 10, 2019 at 3:13 PM Martin Toth > wrote: >>> Hi all, >>> >>> I am running replica 3 gluster with 3 bricks. One of my servers failed - all disks are showing errors and raid is in fault state. >>> >>> Type: Replicate >>> Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a >>> Status: Started >>> Number of Bricks: 1 x 3 = 3 >>> Transport-type: tcp >>> Bricks: >>> Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 >>> Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 >> Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 >>> >>> So one of my bricks is totally failed (node2). It went down and all data are lost (failed raid on node2). Now I am running only two bricks on 2 servers out from 3. >>> This is really critical problem for us, we can lost all data. I want to add new disks to node2, create new raid array on them and try to replace failed brick on this node. >>> >>> What is the procedure of replacing Brick2 on node2, can someone advice? I can?t find anything relevant in documentation. >>> >>> Thanks in advance, >>> Martin >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksubrahm at redhat.com Thu Apr 11 13:40:30 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Thu, 11 Apr 2019 19:10:30 +0530 Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) In-Reply-To: <00009213-6BF3-4A7F-AFA7-AC076B04496C@gmail.com> References: <0917AF4A-76EC-4A9E-820F-E0ADA2DA899A@gmail.com> <1634978A-E849-48DB-A160-B1AC3DB56D38@gmail.com> <69E7C95F-8A81-46CB-8BD8-F66B582144EC@gmail.com> <00009213-6BF3-4A7F-AFA7-AC076B04496C@gmail.com> Message-ID: On Thu, Apr 11, 2019 at 6:38 PM Martin Toth wrote: > Hi Karthik, > > On Thu, Apr 11, 2019 at 12:43 PM Martin Toth wrote: > >> Hi Karthik, >> >> more over, I would like to ask if there are some recommended >> settings/parameters for SHD in order to achieve good or fair I/O while >> volume will be healed when I will replace Brick (this should trigger >> healing process). >> > If I understand you concern correctly, you need to get fair I/O > performance for clients while healing takes place as part of the replace > brick operation. For this you can turn off the "data-self-heal" and > "metadata-self-heal" options until the heal completes on the new brick. 
> > > This is exactly what I mean. I am running VM disks on remaining 2 (out of > 3 - one failed as mentioned) nodes and I need to ensure there will be fair > I/O performance available on these two nodes while replace brick operation > will heal volume. > I will not run any VMs on node where replace brick operation will be > running. So if I understand correctly, when I will set : > > # gluster volume set cluster.data-self-heal off > # gluster volume set cluster.metadata-self-heal off > > this will tell Gluster clients (libgfapi and FUSE mount) not to read from > node ?where replace brick operation? is in place but from remaing two > healthy nodes. Is this correct ? Thanks for clarification. > The reads will be served from one of the good bricks since the file will either be not present on the replaced brick at the time of read or it will be present but marked for heal if it is not already healed. If already healed by SHD, then it could be served from the new brick as well, but there won't be any problem in reading from there in that scenario. By setting these two options whenever a read comes from client it will not try to heal the file for data/metadata. Otherwise it would try to heal (if not already healed by SHD) when the read comes on this, hence slowing down the client. > > Turning off client side healing doesn't compromise data integrity and > consistency. During the read request from client, pending xattr is > evaluated for replica copies and read is only served from correct copy. > During writes, IO will continue on both the replicas, SHD will take care of > healing files. > After replacing the brick, we strongly recommend you to consider upgrading > your gluster to one of the maintained versions. We have many stability > related fixes there, which can handle some critical issues and corner cases > which you could hit during these kind of scenarios. > > > This will be first priority in infrastructure after fixing this cluster > back to fully functional replica3. I will upgrade to 3.12.x and then to > version 5 or 6. > Sounds good. If you are planning to have the same name for the new brick and if you get the error like "Brick may be containing or be contained by an existing brick" even after using the force option, try using a different name. That should work. Regards, Karthik > > BR, > Martin > > Regards, > Karthik > >> I had some problems in past when healing was triggered, VM disks became >> unresponsive because healing took most of I/O. My volume containing only >> big files with VM disks. >> >> Thanks for suggestions. >> BR, >> Martin >> >> On 10 Apr 2019, at 12:38, Martin Toth wrote: >> >> Thanks, this looks ok to me, I will reset brick because I don't have any >> data anymore on failed node so I can use same path / brick name. >> >> Is reseting brick dangerous command? Should I be worried about some >> possible failure that will impact remaining two nodes? I am running really >> old 3.7.6 but stable version. >> >> Thanks, >> BR! 
>> >> Martin >> >> >> On 10 Apr 2019, at 12:20, Karthik Subrahmanya >> wrote: >> >> Hi Martin, >> >> After you add the new disks and creating raid array, you can run the >> following command to replace the old brick with new one: >> >> - If you are going to use a different name to the new brick you can run >> gluster volume replace-brick commit >> force >> >> - If you are planning to use the same name for the new brick as well then >> you can use >> gluster volume reset-brick commit force >> Here old-brick & new-brick's hostname & path should be same. >> >> After replacing the brick, make sure the brick comes online using volume >> status. >> Heal should automatically start, you can check the heal status to see all >> the files gets replicated to the newly added brick. If it does not start >> automatically, you can manually start that by running gluster volume heal >> . >> >> HTH, >> Karthik >> >> On Wed, Apr 10, 2019 at 3:13 PM Martin Toth wrote: >> >>> Hi all, >>> >>> I am running replica 3 gluster with 3 bricks. One of my servers failed - >>> all disks are showing errors and raid is in fault state. >>> >>> Type: Replicate >>> Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a >>> Status: Started >>> Number of Bricks: 1 x 3 = 3 >>> Transport-type: tcp >>> Bricks: >>> Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 >>> Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 >> down >>> Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 >>> >>> So one of my bricks is totally failed (node2). It went down and all data >>> are lost (failed raid on node2). Now I am running only two bricks on 2 >>> servers out from 3. >>> This is really critical problem for us, we can lost all data. I want to >>> add new disks to node2, create new raid array on them and try to replace >>> failed brick on this node. >>> >>> What is the procedure of replacing Brick2 on node2, can someone advice? >>> I can?t find anything relevant in documentation. >>> >>> Thanks in advance, >>> Martin >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From benedikt.kaless at forumZFD.de Fri Apr 12 07:39:03 2019 From: benedikt.kaless at forumZFD.de (=?UTF-8?Q?Benedikt_Kale=c3=9f?=) Date: Fri, 12 Apr 2019 09:39:03 +0200 Subject: [Gluster-users] gluster mountbroker failed after upgrade to gluster 6 Message-ID: |Hi,| |I updated to gluster to gluster 6 and now the geo-replication remains in status "Faulty". | |If I run a "gluster-mountbroker status" I get: | |Traceback (most recent call last): ? File "/usr/sbin/gluster-mountbroker", line 396, in ??? runcli() ? File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 225, in runcli ??? cls.run(args) ? File "/usr/sbin/gluster-mountbroker", line 275, in run ??? out = execute_in_peers("node-status") ? File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", line 127, in execute_in_peers ??? raise GlusterCmdException((rc, out, err, " ".join(cmd))) gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Unable to end. Error : Success\n', 'gluster system:: execute mountbroker.py node-status') | |What can I do: set up the georeplication again?| |Best regards| |Benedikt | | | -- ?forumZFD Entschieden f?r Frieden|Committed to Peace Benedikt Kale? 
Leiter Team IT|Head team IT Forum Ziviler Friedensdienst e.V.|Forum Civil Peace Service Am K?lner Brett 8 | 50825 K?ln | Germany Tel 0221 91273233 | Fax 0221 91273299 | http://www.forumZFD.de Vorstand nach ? 26 BGB, einzelvertretungsberechtigt|Executive Board: Oliver Knabe (Vorsitz|Chair), Sonja Wiekenberg-Mlalandle, Alexander Mauz VR 17651 Amtsgericht K?ln Spenden|Donations: IBAN DE37 3702 0500 0008 2401 01 BIC BFSWDE33XXX From hunter86_bg at yahoo.com Fri Apr 12 08:12:31 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Fri, 12 Apr 2019 08:12:31 +0000 (UTC) Subject: [Gluster-users] Gluster snapshot fails References: <1941056132.2442645.1555056751876.ref@mail.yahoo.com> Message-ID: <1941056132.2442645.1555056751876@mail.yahoo.com> Hello All, I have tried to enable debug and see the reason for the issue. Here is the relevant glusterd.log: [2019-04-12 07:56:54.526508] E [MSGID: 106077] [glusterd-snapshot.c:1882:glusterd_is_thinp_brick] 0-management: Failed to get pool name for device systemd-1 [2019-04-12 07:56:54.527509] E [MSGID: 106121] [glusterd-snapshot.c:2523:glusterd_snapshot_create_prevalidate] 0-management: Failed to pre validate [2019-04-12 07:56:54.527525] E [MSGID: 106024] [glusterd-snapshot.c:2547:glusterd_snapshot_create_prevalidate] 0-management: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of isos are thinly provisioned LV. [2019-04-12 07:56:54.527539] W [MSGID: 106029] [glusterd-snapshot.c:8613:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed [2019-04-12 07:56:54.527552] W [MSGID: 106121] [glusterd-mgmt.c:147:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed [2019-04-12 07:56:54.527568] E [MSGID: 106121] [glusterd-mgmt.c:1015:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Snapshot on local node [2019-04-12 07:56:54.527583] E [MSGID: 106121] [glusterd-mgmt.c:2377:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed here is the output of lvscan & lvs: [root at ovirt1 ~]# lvscan ? ACTIVE??????????? '/dev/gluster_vg_md0/my_vdo_thinpool' [9.86 TiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_data' [500.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_isos' [50.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_ssd/my_ssd_thinpool' [168.59 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_ssd/gluster_lv_engine' [40.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt1/swap' [6.70 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt1/home' [1.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt1/root' [60.00 GiB] inherit [root at ovirt1 ~]# lvs --noheadings -o pool_lv ? my_vdo_thinpool ? my_vdo_thinpool ? my_ssd_thinpool [root at ovirt1 ~]# ssh ovirt2 "lvscan;lvs --noheadings -o pool_lv" ? ACTIVE??????????? '/dev/gluster_vg_md0/my_vdo_thinpool' [<9.77 TiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_data' [500.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_isos' [50.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_ssd/my_ssd_thinpool' [<161.40 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_ssd/gluster_lv_engine' [40.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt2/root' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt2/home' [1.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt2/swap' [16.00 GiB] inherit ? my_vdo_thinpool ? my_vdo_thinpool ? my_ssd_thinpool [root at ovirt1 ~]# ssh ovirt3 "lvscan;lvs --noheadings -o pool_lv" ? 
ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_thinpool_sda3' [41.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_lv_data' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_lv_isos' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_lv_engine' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt3/root' [20.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt3/home' [1.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt3/swap' [8.00 GiB] inherit ? gluster_thinpool_sda3 ? gluster_thinpool_sda3 ? gluster_thinpool_sda3 I am mounting my bricks via systemd , as I have issues with bricks being started before VDO. [root at ovirt1 ~]# findmnt /gluster_bricks/isos TARGET?????????????? SOURCE???????????????????????????????????? FSTYPE OPTIONS /gluster_bricks/isos systemd-1????????????????????????????????? autofs rw,relatime,fd=31,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21843 /gluster_bricks/isos /dev/mapper/gluster_vg_md0-gluster_lv_isos xfs??? rw,noatime,nodiratime,seclabel,attr2,inode64,noquota [root at ovirt1 ~]# ssh ovirt2 "findmnt /gluster_bricks/isos " TARGET?????????????? SOURCE???????????????????????????????????? FSTYPE OPTIONS /gluster_bricks/isos systemd-1????????????????????????????????? autofs rw,relatime,fd=26,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=14279 /gluster_bricks/isos /dev/mapper/gluster_vg_md0-gluster_lv_isos xfs??? rw,noatime,nodiratime,seclabel,attr2,inode64,noquota [root at ovirt1 ~]# ssh ovirt3 "findmnt /gluster_bricks/isos " TARGET?????????????? SOURCE????????????????????????????????????? FSTYPE OPTIONS /gluster_bricks/isos systemd-1?????????????????????????????????? autofs rw,relatime,fd=35,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=17770 /gluster_bricks/isos /dev/mapper/gluster_vg_sda3-gluster_lv_isos xfs??? rw,noatime,nodiratime,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=1024,noquota [root at ovirt1 ~]# grep "gluster_bricks" /proc/mounts systemd-1 /gluster_bricks/data autofs rw,relatime,fd=22,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21513 0 0 systemd-1 /gluster_bricks/engine autofs rw,relatime,fd=25,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21735 0 0 systemd-1 /gluster_bricks/isos autofs rw,relatime,fd=31,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21843 0 0 /dev/mapper/gluster_vg_ssd-gluster_lv_engine /gluster_bricks/engine xfs rw,seclabel,noatime,nodiratime,attr2,inode64,sunit=256,swidth=256,noquota 0 0 /dev/mapper/gluster_vg_md0-gluster_lv_isos /gluster_bricks/isos xfs rw,seclabel,noatime,nodiratime,attr2,inode64,noquota 0 0 /dev/mapper/gluster_vg_md0-gluster_lv_data /gluster_bricks/data xfs rw,seclabel,noatime,nodiratime,attr2,inode64,noquota 0 0 Obviously , gluster is catching "systemd-1" as a device and tries to check if it's a thin LV.Where should I open a bug for that ? P.S.: Adding oVirt User list. Best Regards,Strahil Nikolov ? ?????????, 11 ????? 2019 ?., 4:00:31 ?. ???????-4, Strahil Nikolov ??????: Hi Rafi, thanks for your update. 
I have tested again with another gluster volume.[root at ovirt1 glusterfs]# gluster volume info isos Volume Name: isos Type: Replicate Volume ID: 9b92b5bd-79f5-427b-bd8d-af28b038ed2a Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: ovirt1:/gluster_bricks/isos/isos Brick2: ovirt2:/gluster_bricks/isos/isos Brick3: ovirt3.localdomain:/gluster_bricks/isos/isos (arbiter) Options Reconfigured: cluster.granular-entry-heal: enable performance.strict-o-direct: on network.ping-timeout: 30 storage.owner-gid: 36 storage.owner-uid: 36 user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet nfs.disable: on performance.client-io-threads: off cluster.enable-shared-storage: enable Command run: logrotate -f glusterfs ; logrotate -f glusterfs-georep;? gluster snapshot create isos-snap-2019-04-11 isos? description TEST Logs:[root at ovirt1 glusterfs]# cat cli.log [2019-04-11 07:51:02.367453] I [cli.c:769:main] 0-cli: Started running gluster with version 5.5 [2019-04-11 07:51:02.486863] I [MSGID: 101190] [event-epoll.c:621:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-04-11 07:51:02.556813] E [cli-rpc-ops.c:11293:gf_cli_snapshot] 0-cli: cli_to_glusterd for snapshot failed [2019-04-11 07:51:02.556880] I [input.c:31:cli_batch] 0-: Exiting with: -1 [root at ovirt1 glusterfs]# cat glusterd.log [2019-04-11 07:51:02.553357] E [MSGID: 106024] [glusterd-snapshot.c:2547:glusterd_snapshot_create_prevalidate] 0-management: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of isos are thinly provisioned LV. [2019-04-11 07:51:02.553365] W [MSGID: 106029] [glusterd-snapshot.c:8613:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed [2019-04-11 07:51:02.553703] W [MSGID: 106121] [glusterd-mgmt.c:147:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed [2019-04-11 07:51:02.553719] E [MSGID: 106121] [glusterd-mgmt.c:1015:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Snapshot on local node My LVs hosting the bricks are:[root at ovirt1 ~]# lvs gluster_vg_md0 ? LV????????????? VG???????????? Attr?????? LSize?? Pool??????????? Origin Data%? Meta%? Move Log Cpy%Sync Convert ? gluster_lv_data gluster_vg_md0 Vwi-aot--- 500.00g my_vdo_thinpool??????? 35.97 ? gluster_lv_isos gluster_vg_md0 Vwi-aot---? 50.00g my_vdo_thinpool??????? 52.11 ? my_vdo_thinpool gluster_vg_md0 twi-aot---?? 9.86t??????????????????????? 2.04?? 11.45 [root at ovirt1 ~]# ssh ovirt2 "lvs gluster_vg_md0" ? LV????????????? VG???????????? Attr?????? LSize?? Pool??????????? Origin Data%? Meta%? Move Log Cpy%Sync Convert ? gluster_lv_data gluster_vg_md0 Vwi-aot--- 500.00g my_vdo_thinpool??????? 35.98 ? gluster_lv_isos gluster_vg_md0 Vwi-aot---? 50.00g my_vdo_thinpool??????? 25.94 ? my_vdo_thinpool gluster_vg_md0 twi-aot---? <9.77t??????????????????????? 1.93?? 11.39 [root at ovirt1 ~]# ssh ovirt3 "lvs gluster_vg_sda3" ? LV??????????????????? VG????????????? Attr?????? LSize? Pool????????????????? Origin Data%? Meta%? Move Log Cpy%Sync Convert ? gluster_lv_data?????? 
gluster_vg_sda3 Vwi-aotz-- 15.00g gluster_thinpool_sda3??????? 0.17 ? gluster_lv_engine???? gluster_vg_sda3 Vwi-aotz-- 15.00g gluster_thinpool_sda3??????? 0.16 ? gluster_lv_isos?????? gluster_vg_sda3 Vwi-aotz-- 15.00g gluster_thinpool_sda3??????? 0.12 ? gluster_thinpool_sda3 gluster_vg_sda3 twi-aotz-- 41.00g????????????????????????????? 0.16?? 1.58 As you can see - all bricks are thin LV and space is not the issue. Can someone hint me how to enable debug , so gluster logs can show the reason for that pre-check failure ? Best Regards,Strahil Nikolov ? ?????, 10 ????? 2019 ?., 9:05:15 ?. ???????-4, Rafi Kavungal Chundattu Parambil ??????: Hi Strahil, The name of device is not at all a problem here. Can you please check the log of glusterd, and see if there is any useful information about the failure. Also please provide the output of `lvscan` and `lvs --noheadings -o pool_lv` from all nodes Regards Rafi KC ----- Original Message ----- From: "Strahil Nikolov" To: gluster-users at gluster.org Sent: Wednesday, April 10, 2019 2:36:39 AM Subject: [Gluster-users] Gluster snapshot fails Hello Community, I have a problem running a snapshot of a replica 3 arbiter 1 volume. Error: [root at ovirt2 ~]# gluster snapshot create before-423 engine description "Before upgrade of engine from 4.2.2 to 4.2.3" snapshot create: failed: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of engine are thinly provisioned LV. Snapshot command failed Volume info: Volume Name: engine Type: Replicate Volume ID: 30ca1cc2-f2f7-4749-9e2e-cee9d7099ded Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: ovirt1:/gluster_bricks/engine/engine Brick2: ovirt2:/gluster_bricks/engine/engine Brick3: ovirt3:/gluster_bricks/engine/engine (arbiter) Options Reconfigured: cluster.granular-entry-heal: enable performance.strict-o-direct: on network.ping-timeout: 30 storage.owner-gid: 36 storage.owner-uid: 36 user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet nfs.disable: on performance.client-io-threads: off cluster.enable-shared-storage: enable All bricks are on thin lvm with plenty of space, the only thing that could be causing it is that ovirt1 & ovirt2 are on /dev/gluster_vg_ssd/gluster_lv_engine , while arbiter is on /dev/gluster_vg_sda3/gluster_lv_engine. Is that the issue ? Should I rename my brick's VG ? If so, why there is no mentioning in the documentation ? Best Regards, Strahil Nikolov _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hunter86_bg at yahoo.com Fri Apr 12 08:32:18 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Fri, 12 Apr 2019 08:32:18 +0000 (UTC) Subject: [Gluster-users] Gluster snapshot fails In-Reply-To: <1941056132.2442645.1555056751876@mail.yahoo.com> References: <1941056132.2442645.1555056751876.ref@mail.yahoo.com> <1941056132.2442645.1555056751876@mail.yahoo.com> Message-ID: <96435198.542814.1555057938954@mail.yahoo.com> Hello All, it seems that "systemd-1" is from the automount unit , and not from the systemd unit. [root at ovirt1 system]# systemctl cat gluster_bricks-isos.automount # /etc/systemd/system/gluster_bricks-isos.automount [Unit] Description=automount for gluster brick ISOS [Automount] Where=/gluster_bricks/isos [Install] WantedBy=multi-user.target Best Regards,Strahil Nikolov ? ?????, 12 ????? 2019 ?., 4:12:31 ?. ???????-4, Strahil Nikolov ??????: Hello All, I have tried to enable debug and see the reason for the issue. Here is the relevant glusterd.log: [2019-04-12 07:56:54.526508] E [MSGID: 106077] [glusterd-snapshot.c:1882:glusterd_is_thinp_brick] 0-management: Failed to get pool name for device systemd-1 [2019-04-12 07:56:54.527509] E [MSGID: 106121] [glusterd-snapshot.c:2523:glusterd_snapshot_create_prevalidate] 0-management: Failed to pre validate [2019-04-12 07:56:54.527525] E [MSGID: 106024] [glusterd-snapshot.c:2547:glusterd_snapshot_create_prevalidate] 0-management: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of isos are thinly provisioned LV. [2019-04-12 07:56:54.527539] W [MSGID: 106029] [glusterd-snapshot.c:8613:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed [2019-04-12 07:56:54.527552] W [MSGID: 106121] [glusterd-mgmt.c:147:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed [2019-04-12 07:56:54.527568] E [MSGID: 106121] [glusterd-mgmt.c:1015:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Snapshot on local node [2019-04-12 07:56:54.527583] E [MSGID: 106121] [glusterd-mgmt.c:2377:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed here is the output of lvscan & lvs: [root at ovirt1 ~]# lvscan ? ACTIVE??????????? '/dev/gluster_vg_md0/my_vdo_thinpool' [9.86 TiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_data' [500.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_isos' [50.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_ssd/my_ssd_thinpool' [168.59 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_ssd/gluster_lv_engine' [40.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt1/swap' [6.70 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt1/home' [1.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt1/root' [60.00 GiB] inherit [root at ovirt1 ~]# lvs --noheadings -o pool_lv ? my_vdo_thinpool ? my_vdo_thinpool ? my_ssd_thinpool [root at ovirt1 ~]# ssh ovirt2 "lvscan;lvs --noheadings -o pool_lv" ? ACTIVE??????????? '/dev/gluster_vg_md0/my_vdo_thinpool' [<9.77 TiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_data' [500.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_isos' [50.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_ssd/my_ssd_thinpool' [<161.40 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_ssd/gluster_lv_engine' [40.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt2/root' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt2/home' [1.00 GiB] inherit ? ACTIVE??????????? 
'/dev/centos_ovirt2/swap' [16.00 GiB] inherit ? my_vdo_thinpool ? my_vdo_thinpool ? my_ssd_thinpool [root at ovirt1 ~]# ssh ovirt3 "lvscan;lvs --noheadings -o pool_lv" ? ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_thinpool_sda3' [41.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_lv_data' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_lv_isos' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_lv_engine' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt3/root' [20.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt3/home' [1.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt3/swap' [8.00 GiB] inherit ? gluster_thinpool_sda3 ? gluster_thinpool_sda3 ? gluster_thinpool_sda3 I am mounting my bricks via systemd , as I have issues with bricks being started before VDO. [root at ovirt1 ~]# findmnt /gluster_bricks/isos TARGET?????????????? SOURCE???????????????????????????????????? FSTYPE OPTIONS /gluster_bricks/isos systemd-1????????????????????????????????? autofs rw,relatime,fd=31,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21843 /gluster_bricks/isos /dev/mapper/gluster_vg_md0-gluster_lv_isos xfs??? rw,noatime,nodiratime,seclabel,attr2,inode64,noquota [root at ovirt1 ~]# ssh ovirt2 "findmnt /gluster_bricks/isos " TARGET?????????????? SOURCE???????????????????????????????????? FSTYPE OPTIONS /gluster_bricks/isos systemd-1????????????????????????????????? autofs rw,relatime,fd=26,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=14279 /gluster_bricks/isos /dev/mapper/gluster_vg_md0-gluster_lv_isos xfs??? rw,noatime,nodiratime,seclabel,attr2,inode64,noquota [root at ovirt1 ~]# ssh ovirt3 "findmnt /gluster_bricks/isos " TARGET?????????????? SOURCE????????????????????????????????????? FSTYPE OPTIONS /gluster_bricks/isos systemd-1?????????????????????????????????? autofs rw,relatime,fd=35,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=17770 /gluster_bricks/isos /dev/mapper/gluster_vg_sda3-gluster_lv_isos xfs??? rw,noatime,nodiratime,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=1024,noquota [root at ovirt1 ~]# grep "gluster_bricks" /proc/mounts systemd-1 /gluster_bricks/data autofs rw,relatime,fd=22,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21513 0 0 systemd-1 /gluster_bricks/engine autofs rw,relatime,fd=25,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21735 0 0 systemd-1 /gluster_bricks/isos autofs rw,relatime,fd=31,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21843 0 0 /dev/mapper/gluster_vg_ssd-gluster_lv_engine /gluster_bricks/engine xfs rw,seclabel,noatime,nodiratime,attr2,inode64,sunit=256,swidth=256,noquota 0 0 /dev/mapper/gluster_vg_md0-gluster_lv_isos /gluster_bricks/isos xfs rw,seclabel,noatime,nodiratime,attr2,inode64,noquota 0 0 /dev/mapper/gluster_vg_md0-gluster_lv_data /gluster_bricks/data xfs rw,seclabel,noatime,nodiratime,attr2,inode64,noquota 0 0 Obviously , gluster is catching "systemd-1" as a device and tries to check if it's a thin LV.Where should I open a bug for that ? P.S.: Adding oVirt User list. Best Regards,Strahil Nikolov ? ?????????, 11 ????? 2019 ?., 4:00:31 ?. ???????-4, Strahil Nikolov ??????: Hi Rafi, thanks for your update. 
I have tested again with another gluster volume.[root at ovirt1 glusterfs]# gluster volume info isos Volume Name: isos Type: Replicate Volume ID: 9b92b5bd-79f5-427b-bd8d-af28b038ed2a Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: ovirt1:/gluster_bricks/isos/isos Brick2: ovirt2:/gluster_bricks/isos/isos Brick3: ovirt3.localdomain:/gluster_bricks/isos/isos (arbiter) Options Reconfigured: cluster.granular-entry-heal: enable performance.strict-o-direct: on network.ping-timeout: 30 storage.owner-gid: 36 storage.owner-uid: 36 user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet nfs.disable: on performance.client-io-threads: off cluster.enable-shared-storage: enable Command run: logrotate -f glusterfs ; logrotate -f glusterfs-georep;? gluster snapshot create isos-snap-2019-04-11 isos? description TEST Logs:[root at ovirt1 glusterfs]# cat cli.log [2019-04-11 07:51:02.367453] I [cli.c:769:main] 0-cli: Started running gluster with version 5.5 [2019-04-11 07:51:02.486863] I [MSGID: 101190] [event-epoll.c:621:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-04-11 07:51:02.556813] E [cli-rpc-ops.c:11293:gf_cli_snapshot] 0-cli: cli_to_glusterd for snapshot failed [2019-04-11 07:51:02.556880] I [input.c:31:cli_batch] 0-: Exiting with: -1 [root at ovirt1 glusterfs]# cat glusterd.log [2019-04-11 07:51:02.553357] E [MSGID: 106024] [glusterd-snapshot.c:2547:glusterd_snapshot_create_prevalidate] 0-management: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of isos are thinly provisioned LV. [2019-04-11 07:51:02.553365] W [MSGID: 106029] [glusterd-snapshot.c:8613:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed [2019-04-11 07:51:02.553703] W [MSGID: 106121] [glusterd-mgmt.c:147:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed [2019-04-11 07:51:02.553719] E [MSGID: 106121] [glusterd-mgmt.c:1015:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Snapshot on local node My LVs hosting the bricks are:[root at ovirt1 ~]# lvs gluster_vg_md0 ? LV????????????? VG???????????? Attr?????? LSize?? Pool??????????? Origin Data%? Meta%? Move Log Cpy%Sync Convert ? gluster_lv_data gluster_vg_md0 Vwi-aot--- 500.00g my_vdo_thinpool??????? 35.97 ? gluster_lv_isos gluster_vg_md0 Vwi-aot---? 50.00g my_vdo_thinpool??????? 52.11 ? my_vdo_thinpool gluster_vg_md0 twi-aot---?? 9.86t??????????????????????? 2.04?? 11.45 [root at ovirt1 ~]# ssh ovirt2 "lvs gluster_vg_md0" ? LV????????????? VG???????????? Attr?????? LSize?? Pool??????????? Origin Data%? Meta%? Move Log Cpy%Sync Convert ? gluster_lv_data gluster_vg_md0 Vwi-aot--- 500.00g my_vdo_thinpool??????? 35.98 ? gluster_lv_isos gluster_vg_md0 Vwi-aot---? 50.00g my_vdo_thinpool??????? 25.94 ? my_vdo_thinpool gluster_vg_md0 twi-aot---? <9.77t??????????????????????? 1.93?? 11.39 [root at ovirt1 ~]# ssh ovirt3 "lvs gluster_vg_sda3" ? LV??????????????????? VG????????????? Attr?????? LSize? Pool????????????????? Origin Data%? Meta%? Move Log Cpy%Sync Convert ? gluster_lv_data?????? 
gluster_vg_sda3 Vwi-aotz-- 15.00g gluster_thinpool_sda3??????? 0.17 ? gluster_lv_engine???? gluster_vg_sda3 Vwi-aotz-- 15.00g gluster_thinpool_sda3??????? 0.16 ? gluster_lv_isos?????? gluster_vg_sda3 Vwi-aotz-- 15.00g gluster_thinpool_sda3??????? 0.12 ? gluster_thinpool_sda3 gluster_vg_sda3 twi-aotz-- 41.00g????????????????????????????? 0.16?? 1.58 As you can see - all bricks are thin LV and space is not the issue. Can someone hint me how to enable debug , so gluster logs can show the reason for that pre-check failure ? Best Regards,Strahil Nikolov ? ?????, 10 ????? 2019 ?., 9:05:15 ?. ???????-4, Rafi Kavungal Chundattu Parambil ??????: Hi Strahil, The name of device is not at all a problem here. Can you please check the log of glusterd, and see if there is any useful information about the failure. Also please provide the output of `lvscan` and `lvs --noheadings -o pool_lv` from all nodes Regards Rafi KC ----- Original Message ----- From: "Strahil Nikolov" To: gluster-users at gluster.org Sent: Wednesday, April 10, 2019 2:36:39 AM Subject: [Gluster-users] Gluster snapshot fails Hello Community, I have a problem running a snapshot of a replica 3 arbiter 1 volume. Error: [root at ovirt2 ~]# gluster snapshot create before-423 engine description "Before upgrade of engine from 4.2.2 to 4.2.3" snapshot create: failed: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of engine are thinly provisioned LV. Snapshot command failed Volume info: Volume Name: engine Type: Replicate Volume ID: 30ca1cc2-f2f7-4749-9e2e-cee9d7099ded Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: ovirt1:/gluster_bricks/engine/engine Brick2: ovirt2:/gluster_bricks/engine/engine Brick3: ovirt3:/gluster_bricks/engine/engine (arbiter) Options Reconfigured: cluster.granular-entry-heal: enable performance.strict-o-direct: on network.ping-timeout: 30 storage.owner-gid: 36 storage.owner-uid: 36 user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet nfs.disable: on performance.client-io-threads: off cluster.enable-shared-storage: enable All bricks are on thin lvm with plenty of space, the only thing that could be causing it is that ovirt1 & ovirt2 are on /dev/gluster_vg_ssd/gluster_lv_engine , while arbiter is on /dev/gluster_vg_sda3/gluster_lv_engine. Is that the issue ? Should I rename my brick's VG ? If so, why there is no mentioning in the documentation ? Best Regards, Strahil Nikolov _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hunter86_bg at yahoo.com Fri Apr 12 11:32:41 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Fri, 12 Apr 2019 11:32:41 +0000 (UTC) Subject: [Gluster-users] Gluster snapshot fails In-Reply-To: <96435198.542814.1555057938954@mail.yahoo.com> References: <1941056132.2442645.1555056751876.ref@mail.yahoo.com> <1941056132.2442645.1555056751876@mail.yahoo.com> <96435198.542814.1555057938954@mail.yahoo.com> Message-ID: <1106680361.905173.1555068761758@mail.yahoo.com> Hi All, I have tested gluster snapshot without systemd.automount units and it works as follows: [root at ovirt1 system]# gluster snapshot create isos-snap-2019-04-11 isos? description TEST snapshot create: success: Snap isos-snap-2019-04-11_GMT-2019.04.12-11.18.24 created successfully [root at ovirt1 system]# gluster snapshot list isos-snap-2019-04-11_GMT-2019.04.12-11.18.24 [root at ovirt1 system]# gluster snapshot info isos-snap-2019-04-11_GMT-2019.04.12-11.18.24 Snapshot????????????????? : isos-snap-2019-04-11_GMT-2019.04.12-11.18.24 Snap UUID???????????????? : 70d5716e-4633-43d4-a562-8e29a96b0104 Description?????????????? : TEST Created?????????????????? : 2019-04-12 11:18:24 Snap Volumes: ??????? Snap Volume Name????????? : 584e88eab0374c0582cc544a2bc4b79e ??????? Origin Volume name??????? : isos ??????? Snaps taken for isos????? : 1 ??????? Snaps available for isos? : 255 ??????? Status??????????????????? : Stopped Best Regards,Strahil Nikolov ? ?????, 12 ????? 2019 ?., 4:32:18 ?. ???????-4, Strahil Nikolov ??????: Hello All, it seems that "systemd-1" is from the automount unit , and not from the systemd unit. [root at ovirt1 system]# systemctl cat gluster_bricks-isos.automount # /etc/systemd/system/gluster_bricks-isos.automount [Unit] Description=automount for gluster brick ISOS [Automount] Where=/gluster_bricks/isos [Install] WantedBy=multi-user.target Best Regards,Strahil Nikolov ? ?????, 12 ????? 2019 ?., 4:12:31 ?. ???????-4, Strahil Nikolov ??????: Hello All, I have tried to enable debug and see the reason for the issue. Here is the relevant glusterd.log: [2019-04-12 07:56:54.526508] E [MSGID: 106077] [glusterd-snapshot.c:1882:glusterd_is_thinp_brick] 0-management: Failed to get pool name for device systemd-1 [2019-04-12 07:56:54.527509] E [MSGID: 106121] [glusterd-snapshot.c:2523:glusterd_snapshot_create_prevalidate] 0-management: Failed to pre validate [2019-04-12 07:56:54.527525] E [MSGID: 106024] [glusterd-snapshot.c:2547:glusterd_snapshot_create_prevalidate] 0-management: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of isos are thinly provisioned LV. [2019-04-12 07:56:54.527539] W [MSGID: 106029] [glusterd-snapshot.c:8613:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed [2019-04-12 07:56:54.527552] W [MSGID: 106121] [glusterd-mgmt.c:147:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed [2019-04-12 07:56:54.527568] E [MSGID: 106121] [glusterd-mgmt.c:1015:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Snapshot on local node [2019-04-12 07:56:54.527583] E [MSGID: 106121] [glusterd-mgmt.c:2377:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed here is the output of lvscan & lvs: [root at ovirt1 ~]# lvscan ? ACTIVE??????????? '/dev/gluster_vg_md0/my_vdo_thinpool' [9.86 TiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_data' [500.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_isos' [50.00 GiB] inherit ? ACTIVE??????????? 
'/dev/gluster_vg_ssd/my_ssd_thinpool' [168.59 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_ssd/gluster_lv_engine' [40.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt1/swap' [6.70 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt1/home' [1.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt1/root' [60.00 GiB] inherit [root at ovirt1 ~]# lvs --noheadings -o pool_lv ? my_vdo_thinpool ? my_vdo_thinpool ? my_ssd_thinpool [root at ovirt1 ~]# ssh ovirt2 "lvscan;lvs --noheadings -o pool_lv" ? ACTIVE??????????? '/dev/gluster_vg_md0/my_vdo_thinpool' [<9.77 TiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_data' [500.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_isos' [50.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_ssd/my_ssd_thinpool' [<161.40 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_ssd/gluster_lv_engine' [40.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt2/root' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt2/home' [1.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt2/swap' [16.00 GiB] inherit ? my_vdo_thinpool ? my_vdo_thinpool ? my_ssd_thinpool [root at ovirt1 ~]# ssh ovirt3 "lvscan;lvs --noheadings -o pool_lv" ? ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_thinpool_sda3' [41.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_lv_data' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_lv_isos' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_lv_engine' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt3/root' [20.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt3/home' [1.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt3/swap' [8.00 GiB] inherit ? gluster_thinpool_sda3 ? gluster_thinpool_sda3 ? gluster_thinpool_sda3 I am mounting my bricks via systemd , as I have issues with bricks being started before VDO. [root at ovirt1 ~]# findmnt /gluster_bricks/isos TARGET?????????????? SOURCE???????????????????????????????????? FSTYPE OPTIONS /gluster_bricks/isos systemd-1????????????????????????????????? autofs rw,relatime,fd=31,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21843 /gluster_bricks/isos /dev/mapper/gluster_vg_md0-gluster_lv_isos xfs??? rw,noatime,nodiratime,seclabel,attr2,inode64,noquota [root at ovirt1 ~]# ssh ovirt2 "findmnt /gluster_bricks/isos " TARGET?????????????? SOURCE???????????????????????????????????? FSTYPE OPTIONS /gluster_bricks/isos systemd-1????????????????????????????????? autofs rw,relatime,fd=26,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=14279 /gluster_bricks/isos /dev/mapper/gluster_vg_md0-gluster_lv_isos xfs??? rw,noatime,nodiratime,seclabel,attr2,inode64,noquota [root at ovirt1 ~]# ssh ovirt3 "findmnt /gluster_bricks/isos " TARGET?????????????? SOURCE????????????????????????????????????? FSTYPE OPTIONS /gluster_bricks/isos systemd-1?????????????????????????????????? autofs rw,relatime,fd=35,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=17770 /gluster_bricks/isos /dev/mapper/gluster_vg_sda3-gluster_lv_isos xfs??? 
rw,noatime,nodiratime,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=1024,noquota [root at ovirt1 ~]# grep "gluster_bricks" /proc/mounts systemd-1 /gluster_bricks/data autofs rw,relatime,fd=22,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21513 0 0 systemd-1 /gluster_bricks/engine autofs rw,relatime,fd=25,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21735 0 0 systemd-1 /gluster_bricks/isos autofs rw,relatime,fd=31,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21843 0 0 /dev/mapper/gluster_vg_ssd-gluster_lv_engine /gluster_bricks/engine xfs rw,seclabel,noatime,nodiratime,attr2,inode64,sunit=256,swidth=256,noquota 0 0 /dev/mapper/gluster_vg_md0-gluster_lv_isos /gluster_bricks/isos xfs rw,seclabel,noatime,nodiratime,attr2,inode64,noquota 0 0 /dev/mapper/gluster_vg_md0-gluster_lv_data /gluster_bricks/data xfs rw,seclabel,noatime,nodiratime,attr2,inode64,noquota 0 0 Obviously , gluster is catching "systemd-1" as a device and tries to check if it's a thin LV.Where should I open a bug for that ? P.S.: Adding oVirt User list. Best Regards,Strahil Nikolov ? ?????????, 11 ????? 2019 ?., 4:00:31 ?. ???????-4, Strahil Nikolov ??????: Hi Rafi, thanks for your update. I have tested again with another gluster volume.[root at ovirt1 glusterfs]# gluster volume info isos Volume Name: isos Type: Replicate Volume ID: 9b92b5bd-79f5-427b-bd8d-af28b038ed2a Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: ovirt1:/gluster_bricks/isos/isos Brick2: ovirt2:/gluster_bricks/isos/isos Brick3: ovirt3.localdomain:/gluster_bricks/isos/isos (arbiter) Options Reconfigured: cluster.granular-entry-heal: enable performance.strict-o-direct: on network.ping-timeout: 30 storage.owner-gid: 36 storage.owner-uid: 36 user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet nfs.disable: on performance.client-io-threads: off cluster.enable-shared-storage: enable Command run: logrotate -f glusterfs ; logrotate -f glusterfs-georep;? gluster snapshot create isos-snap-2019-04-11 isos? description TEST Logs:[root at ovirt1 glusterfs]# cat cli.log [2019-04-11 07:51:02.367453] I [cli.c:769:main] 0-cli: Started running gluster with version 5.5 [2019-04-11 07:51:02.486863] I [MSGID: 101190] [event-epoll.c:621:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-04-11 07:51:02.556813] E [cli-rpc-ops.c:11293:gf_cli_snapshot] 0-cli: cli_to_glusterd for snapshot failed [2019-04-11 07:51:02.556880] I [input.c:31:cli_batch] 0-: Exiting with: -1 [root at ovirt1 glusterfs]# cat glusterd.log [2019-04-11 07:51:02.553357] E [MSGID: 106024] [glusterd-snapshot.c:2547:glusterd_snapshot_create_prevalidate] 0-management: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of isos are thinly provisioned LV. 
[2019-04-11 07:51:02.553365] W [MSGID: 106029] [glusterd-snapshot.c:8613:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed [2019-04-11 07:51:02.553703] W [MSGID: 106121] [glusterd-mgmt.c:147:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed [2019-04-11 07:51:02.553719] E [MSGID: 106121] [glusterd-mgmt.c:1015:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Snapshot on local node My LVs hosting the bricks are:[root at ovirt1 ~]# lvs gluster_vg_md0 ? LV????????????? VG???????????? Attr?????? LSize?? Pool??????????? Origin Data%? Meta%? Move Log Cpy%Sync Convert ? gluster_lv_data gluster_vg_md0 Vwi-aot--- 500.00g my_vdo_thinpool??????? 35.97 ? gluster_lv_isos gluster_vg_md0 Vwi-aot---? 50.00g my_vdo_thinpool??????? 52.11 ? my_vdo_thinpool gluster_vg_md0 twi-aot---?? 9.86t??????????????????????? 2.04?? 11.45 [root at ovirt1 ~]# ssh ovirt2 "lvs gluster_vg_md0" ? LV????????????? VG???????????? Attr?????? LSize?? Pool??????????? Origin Data%? Meta%? Move Log Cpy%Sync Convert ? gluster_lv_data gluster_vg_md0 Vwi-aot--- 500.00g my_vdo_thinpool??????? 35.98 ? gluster_lv_isos gluster_vg_md0 Vwi-aot---? 50.00g my_vdo_thinpool??????? 25.94 ? my_vdo_thinpool gluster_vg_md0 twi-aot---? <9.77t??????????????????????? 1.93?? 11.39 [root at ovirt1 ~]# ssh ovirt3 "lvs gluster_vg_sda3" ? LV??????????????????? VG????????????? Attr?????? LSize? Pool????????????????? Origin Data%? Meta%? Move Log Cpy%Sync Convert ? gluster_lv_data?????? gluster_vg_sda3 Vwi-aotz-- 15.00g gluster_thinpool_sda3??????? 0.17 ? gluster_lv_engine???? gluster_vg_sda3 Vwi-aotz-- 15.00g gluster_thinpool_sda3??????? 0.16 ? gluster_lv_isos?????? gluster_vg_sda3 Vwi-aotz-- 15.00g gluster_thinpool_sda3??????? 0.12 ? gluster_thinpool_sda3 gluster_vg_sda3 twi-aotz-- 41.00g????????????????????????????? 0.16?? 1.58 As you can see - all bricks are thin LV and space is not the issue. Can someone hint me how to enable debug , so gluster logs can show the reason for that pre-check failure ? Best Regards,Strahil Nikolov ? ?????, 10 ????? 2019 ?., 9:05:15 ?. ???????-4, Rafi Kavungal Chundattu Parambil ??????: Hi Strahil, The name of device is not at all a problem here. Can you please check the log of glusterd, and see if there is any useful information about the failure. Also please provide the output of `lvscan` and `lvs --noheadings -o pool_lv` from all nodes Regards Rafi KC ----- Original Message ----- From: "Strahil Nikolov" To: gluster-users at gluster.org Sent: Wednesday, April 10, 2019 2:36:39 AM Subject: [Gluster-users] Gluster snapshot fails Hello Community, I have a problem running a snapshot of a replica 3 arbiter 1 volume. Error: [root at ovirt2 ~]# gluster snapshot create before-423 engine description "Before upgrade of engine from 4.2.2 to 4.2.3" snapshot create: failed: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of engine are thinly provisioned LV. 
Snapshot command failed Volume info: Volume Name: engine Type: Replicate Volume ID: 30ca1cc2-f2f7-4749-9e2e-cee9d7099ded Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: ovirt1:/gluster_bricks/engine/engine Brick2: ovirt2:/gluster_bricks/engine/engine Brick3: ovirt3:/gluster_bricks/engine/engine (arbiter) Options Reconfigured: cluster.granular-entry-heal: enable performance.strict-o-direct: on network.ping-timeout: 30 storage.owner-gid: 36 storage.owner-uid: 36 user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet nfs.disable: on performance.client-io-threads: off cluster.enable-shared-storage: enable All bricks are on thin lvm with plenty of space, the only thing that could be causing it is that ovirt1 & ovirt2 are on /dev/gluster_vg_ssd/gluster_lv_engine , while arbiter is on /dev/gluster_vg_sda3/gluster_lv_engine. Is that the issue ? Should I rename my brick's VG ? If so, why there is no mentioning in the documentation ? Best Regards, Strahil Nikolov _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Fri Apr 12 11:44:43 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Fri, 12 Apr 2019 11:44:43 +0000 (UTC) Subject: [Gluster-users] Gluster snapshot fails In-Reply-To: <1106680361.905173.1555068761758@mail.yahoo.com> References: <1941056132.2442645.1555056751876.ref@mail.yahoo.com> <1941056132.2442645.1555056751876@mail.yahoo.com> <96435198.542814.1555057938954@mail.yahoo.com> <1106680361.905173.1555068761758@mail.yahoo.com> Message-ID: <1298569445.1314916.1555069483300@mail.yahoo.com> I hope this is the last update on the issue -> opened a bug https://bugzilla.redhat.com/show_bug.cgi?id=1699309 Best regards,Strahil Nikolov ? ?????, 12 ????? 2019 ?., 7:32:41 ?. ???????-4, Strahil Nikolov ??????: Hi All, I have tested gluster snapshot without systemd.automount units and it works as follows: [root at ovirt1 system]# gluster snapshot create isos-snap-2019-04-11 isos? description TEST snapshot create: success: Snap isos-snap-2019-04-11_GMT-2019.04.12-11.18.24 created successfully [root at ovirt1 system]# gluster snapshot list isos-snap-2019-04-11_GMT-2019.04.12-11.18.24 [root at ovirt1 system]# gluster snapshot info isos-snap-2019-04-11_GMT-2019.04.12-11.18.24 Snapshot????????????????? : isos-snap-2019-04-11_GMT-2019.04.12-11.18.24 Snap UUID???????????????? : 70d5716e-4633-43d4-a562-8e29a96b0104 Description?????????????? : TEST Created?????????????????? : 2019-04-12 11:18:24 Snap Volumes: ??????? Snap Volume Name????????? : 584e88eab0374c0582cc544a2bc4b79e ??????? Origin Volume name??????? : isos ??????? Snaps taken for isos????? : 1 ??????? Snaps available for isos? : 255 ??????? Status??????????????????? : Stopped Best Regards,Strahil Nikolov ? ?????, 12 ????? 2019 ?., 4:32:18 ?. ???????-4, Strahil Nikolov ??????: Hello All, it seems that "systemd-1" is from the automount unit , and not from the systemd unit. 
[root at ovirt1 system]# systemctl cat gluster_bricks-isos.automount # /etc/systemd/system/gluster_bricks-isos.automount [Unit] Description=automount for gluster brick ISOS [Automount] Where=/gluster_bricks/isos [Install] WantedBy=multi-user.target Best Regards,Strahil Nikolov ? ?????, 12 ????? 2019 ?., 4:12:31 ?. ???????-4, Strahil Nikolov ??????: Hello All, I have tried to enable debug and see the reason for the issue. Here is the relevant glusterd.log: [2019-04-12 07:56:54.526508] E [MSGID: 106077] [glusterd-snapshot.c:1882:glusterd_is_thinp_brick] 0-management: Failed to get pool name for device systemd-1 [2019-04-12 07:56:54.527509] E [MSGID: 106121] [glusterd-snapshot.c:2523:glusterd_snapshot_create_prevalidate] 0-management: Failed to pre validate [2019-04-12 07:56:54.527525] E [MSGID: 106024] [glusterd-snapshot.c:2547:glusterd_snapshot_create_prevalidate] 0-management: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of isos are thinly provisioned LV. [2019-04-12 07:56:54.527539] W [MSGID: 106029] [glusterd-snapshot.c:8613:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed [2019-04-12 07:56:54.527552] W [MSGID: 106121] [glusterd-mgmt.c:147:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed [2019-04-12 07:56:54.527568] E [MSGID: 106121] [glusterd-mgmt.c:1015:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Snapshot on local node [2019-04-12 07:56:54.527583] E [MSGID: 106121] [glusterd-mgmt.c:2377:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Pre Validation Failed here is the output of lvscan & lvs: [root at ovirt1 ~]# lvscan ? ACTIVE??????????? '/dev/gluster_vg_md0/my_vdo_thinpool' [9.86 TiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_data' [500.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_isos' [50.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_ssd/my_ssd_thinpool' [168.59 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_ssd/gluster_lv_engine' [40.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt1/swap' [6.70 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt1/home' [1.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt1/root' [60.00 GiB] inherit [root at ovirt1 ~]# lvs --noheadings -o pool_lv ? my_vdo_thinpool ? my_vdo_thinpool ? my_ssd_thinpool [root at ovirt1 ~]# ssh ovirt2 "lvscan;lvs --noheadings -o pool_lv" ? ACTIVE??????????? '/dev/gluster_vg_md0/my_vdo_thinpool' [<9.77 TiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_data' [500.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_md0/gluster_lv_isos' [50.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_ssd/my_ssd_thinpool' [<161.40 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_ssd/gluster_lv_engine' [40.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt2/root' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt2/home' [1.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt2/swap' [16.00 GiB] inherit ? my_vdo_thinpool ? my_vdo_thinpool ? my_ssd_thinpool [root at ovirt1 ~]# ssh ovirt3 "lvscan;lvs --noheadings -o pool_lv" ? ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_thinpool_sda3' [41.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_lv_data' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_lv_isos' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/gluster_vg_sda3/gluster_lv_engine' [15.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt3/root' [20.00 GiB] inherit ? 
ACTIVE??????????? '/dev/centos_ovirt3/home' [1.00 GiB] inherit ? ACTIVE??????????? '/dev/centos_ovirt3/swap' [8.00 GiB] inherit ? gluster_thinpool_sda3 ? gluster_thinpool_sda3 ? gluster_thinpool_sda3 I am mounting my bricks via systemd , as I have issues with bricks being started before VDO. [root at ovirt1 ~]# findmnt /gluster_bricks/isos TARGET?????????????? SOURCE???????????????????????????????????? FSTYPE OPTIONS /gluster_bricks/isos systemd-1????????????????????????????????? autofs rw,relatime,fd=31,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21843 /gluster_bricks/isos /dev/mapper/gluster_vg_md0-gluster_lv_isos xfs??? rw,noatime,nodiratime,seclabel,attr2,inode64,noquota [root at ovirt1 ~]# ssh ovirt2 "findmnt /gluster_bricks/isos " TARGET?????????????? SOURCE???????????????????????????????????? FSTYPE OPTIONS /gluster_bricks/isos systemd-1????????????????????????????????? autofs rw,relatime,fd=26,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=14279 /gluster_bricks/isos /dev/mapper/gluster_vg_md0-gluster_lv_isos xfs??? rw,noatime,nodiratime,seclabel,attr2,inode64,noquota [root at ovirt1 ~]# ssh ovirt3 "findmnt /gluster_bricks/isos " TARGET?????????????? SOURCE????????????????????????????????????? FSTYPE OPTIONS /gluster_bricks/isos systemd-1?????????????????????????????????? autofs rw,relatime,fd=35,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=17770 /gluster_bricks/isos /dev/mapper/gluster_vg_sda3-gluster_lv_isos xfs??? rw,noatime,nodiratime,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=1024,noquota [root at ovirt1 ~]# grep "gluster_bricks" /proc/mounts systemd-1 /gluster_bricks/data autofs rw,relatime,fd=22,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21513 0 0 systemd-1 /gluster_bricks/engine autofs rw,relatime,fd=25,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21735 0 0 systemd-1 /gluster_bricks/isos autofs rw,relatime,fd=31,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=21843 0 0 /dev/mapper/gluster_vg_ssd-gluster_lv_engine /gluster_bricks/engine xfs rw,seclabel,noatime,nodiratime,attr2,inode64,sunit=256,swidth=256,noquota 0 0 /dev/mapper/gluster_vg_md0-gluster_lv_isos /gluster_bricks/isos xfs rw,seclabel,noatime,nodiratime,attr2,inode64,noquota 0 0 /dev/mapper/gluster_vg_md0-gluster_lv_data /gluster_bricks/data xfs rw,seclabel,noatime,nodiratime,attr2,inode64,noquota 0 0 Obviously , gluster is catching "systemd-1" as a device and tries to check if it's a thin LV.Where should I open a bug for that ? P.S.: Adding oVirt User list. Best Regards,Strahil Nikolov ? ?????????, 11 ????? 2019 ?., 4:00:31 ?. ???????-4, Strahil Nikolov ??????: Hi Rafi, thanks for your update. 
I have tested again with another gluster volume.[root at ovirt1 glusterfs]# gluster volume info isos Volume Name: isos Type: Replicate Volume ID: 9b92b5bd-79f5-427b-bd8d-af28b038ed2a Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: ovirt1:/gluster_bricks/isos/isos Brick2: ovirt2:/gluster_bricks/isos/isos Brick3: ovirt3.localdomain:/gluster_bricks/isos/isos (arbiter) Options Reconfigured: cluster.granular-entry-heal: enable performance.strict-o-direct: on network.ping-timeout: 30 storage.owner-gid: 36 storage.owner-uid: 36 user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet nfs.disable: on performance.client-io-threads: off cluster.enable-shared-storage: enable Command run: logrotate -f glusterfs ; logrotate -f glusterfs-georep;? gluster snapshot create isos-snap-2019-04-11 isos? description TEST Logs:[root at ovirt1 glusterfs]# cat cli.log [2019-04-11 07:51:02.367453] I [cli.c:769:main] 0-cli: Started running gluster with version 5.5 [2019-04-11 07:51:02.486863] I [MSGID: 101190] [event-epoll.c:621:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-04-11 07:51:02.556813] E [cli-rpc-ops.c:11293:gf_cli_snapshot] 0-cli: cli_to_glusterd for snapshot failed [2019-04-11 07:51:02.556880] I [input.c:31:cli_batch] 0-: Exiting with: -1 [root at ovirt1 glusterfs]# cat glusterd.log [2019-04-11 07:51:02.553357] E [MSGID: 106024] [glusterd-snapshot.c:2547:glusterd_snapshot_create_prevalidate] 0-management: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of isos are thinly provisioned LV. [2019-04-11 07:51:02.553365] W [MSGID: 106029] [glusterd-snapshot.c:8613:glusterd_snapshot_prevalidate] 0-management: Snapshot create pre-validation failed [2019-04-11 07:51:02.553703] W [MSGID: 106121] [glusterd-mgmt.c:147:gd_mgmt_v3_pre_validate_fn] 0-management: Snapshot Prevalidate Failed [2019-04-11 07:51:02.553719] E [MSGID: 106121] [glusterd-mgmt.c:1015:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Snapshot on local node My LVs hosting the bricks are:[root at ovirt1 ~]# lvs gluster_vg_md0 ? LV????????????? VG???????????? Attr?????? LSize?? Pool??????????? Origin Data%? Meta%? Move Log Cpy%Sync Convert ? gluster_lv_data gluster_vg_md0 Vwi-aot--- 500.00g my_vdo_thinpool??????? 35.97 ? gluster_lv_isos gluster_vg_md0 Vwi-aot---? 50.00g my_vdo_thinpool??????? 52.11 ? my_vdo_thinpool gluster_vg_md0 twi-aot---?? 9.86t??????????????????????? 2.04?? 11.45 [root at ovirt1 ~]# ssh ovirt2 "lvs gluster_vg_md0" ? LV????????????? VG???????????? Attr?????? LSize?? Pool??????????? Origin Data%? Meta%? Move Log Cpy%Sync Convert ? gluster_lv_data gluster_vg_md0 Vwi-aot--- 500.00g my_vdo_thinpool??????? 35.98 ? gluster_lv_isos gluster_vg_md0 Vwi-aot---? 50.00g my_vdo_thinpool??????? 25.94 ? my_vdo_thinpool gluster_vg_md0 twi-aot---? <9.77t??????????????????????? 1.93?? 11.39 [root at ovirt1 ~]# ssh ovirt3 "lvs gluster_vg_sda3" ? LV??????????????????? VG????????????? Attr?????? LSize? Pool????????????????? Origin Data%? Meta%? Move Log Cpy%Sync Convert ? gluster_lv_data?????? 
gluster_vg_sda3 Vwi-aotz-- 15.00g gluster_thinpool_sda3??????? 0.17 ? gluster_lv_engine???? gluster_vg_sda3 Vwi-aotz-- 15.00g gluster_thinpool_sda3??????? 0.16 ? gluster_lv_isos?????? gluster_vg_sda3 Vwi-aotz-- 15.00g gluster_thinpool_sda3??????? 0.12 ? gluster_thinpool_sda3 gluster_vg_sda3 twi-aotz-- 41.00g????????????????????????????? 0.16?? 1.58 As you can see - all bricks are thin LV and space is not the issue. Can someone hint me how to enable debug , so gluster logs can show the reason for that pre-check failure ? Best Regards,Strahil Nikolov ? ?????, 10 ????? 2019 ?., 9:05:15 ?. ???????-4, Rafi Kavungal Chundattu Parambil ??????: Hi Strahil, The name of device is not at all a problem here. Can you please check the log of glusterd, and see if there is any useful information about the failure. Also please provide the output of `lvscan` and `lvs --noheadings -o pool_lv` from all nodes Regards Rafi KC ----- Original Message ----- From: "Strahil Nikolov" To: gluster-users at gluster.org Sent: Wednesday, April 10, 2019 2:36:39 AM Subject: [Gluster-users] Gluster snapshot fails Hello Community, I have a problem running a snapshot of a replica 3 arbiter 1 volume. Error: [root at ovirt2 ~]# gluster snapshot create before-423 engine description "Before upgrade of engine from 4.2.2 to 4.2.3" snapshot create: failed: Snapshot is supported only for thin provisioned LV. Ensure that all bricks of engine are thinly provisioned LV. Snapshot command failed Volume info: Volume Name: engine Type: Replicate Volume ID: 30ca1cc2-f2f7-4749-9e2e-cee9d7099ded Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: ovirt1:/gluster_bricks/engine/engine Brick2: ovirt2:/gluster_bricks/engine/engine Brick3: ovirt3:/gluster_bricks/engine/engine (arbiter) Options Reconfigured: cluster.granular-entry-heal: enable performance.strict-o-direct: on network.ping-timeout: 30 storage.owner-gid: 36 storage.owner-uid: 36 user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet nfs.disable: on performance.client-io-threads: off cluster.enable-shared-storage: enable All bricks are on thin lvm with plenty of space, the only thing that could be causing it is that ovirt1 & ovirt2 are on /dev/gluster_vg_ssd/gluster_lv_engine , while arbiter is on /dev/gluster_vg_sda3/gluster_lv_engine. Is that the issue ? Should I rename my brick's VG ? If so, why there is no mentioning in the documentation ? Best Regards, Strahil Nikolov _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From felix.koelzow at gmx.de Fri Apr 12 15:04:02 2019 From: felix.koelzow at gmx.de (=?UTF-8?Q?Felix_K=c3=b6lzow?=) Date: Fri, 12 Apr 2019 17:04:02 +0200 Subject: [Gluster-users] Replica 3: Client access via FUSE failed if two bricks are down Message-ID: Dear Gluster-Community, I created a test-environment to test a gluster volume with replica 3. Afterwards, I am able to manually mount the gluster volume using FUSE. mount command: mount -t glusterfs? 
-o backup-volfile-servers=gluster01:gluster02 gluster00:/ifwFuse /mnt/glusterfs/ifwFuse Just for a testing purpose, I shutdown *two* (arbitrary) bricks and one brick keeps still online and accessible via ssh. If I poweroff the two machines, I immediately get the following error message: ls: cannot open directory .: Transport endpoint is not connected From my understanding of replica 3, even if two bricks are broken the client should be able to have access to the data. Actually, I don't know how to solve that issue. Any idea is welcome! If you need any log-file as further information, just give me a hint! Thanks in advance. Felix -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Fri Apr 12 15:53:17 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Fri, 12 Apr 2019 21:23:17 +0530 Subject: [Gluster-users] Replica 3: Client access via FUSE failed if two bricks are down In-Reply-To: References: Message-ID: On 12/04/19 8:34 PM, Felix K?lzow wrote: > > Dear Gluster-Community, > > > I created a test-environment to test a gluster volume with replica 3. > > Afterwards, I am able to manually mount the gluster volume using FUSE. > > > mount command: > > mount -t glusterfs? -o backup-volfile-servers=gluster01:gluster02 > gluster00:/ifwFuse /mnt/glusterfs/ifwFuse > > > Just for a testing purpose, I shutdown *two* (arbitrary) bricks and > one brick keeps still online > > and accessible via ssh. If I poweroff the two machines, I immediately > get the following error message: > > ls: cannot open directory .: Transport endpoint is not connected > > > From my understanding of replica 3, even if two bricks are broken the > client should be able to > > have access to the data. > In replica 3, 2 out of 3 bricks must be up for allowing access. IoW, client-quorum must be met. Otherwise you would get ENOTCONN like you observed. > > Actually, I don't know how to solve that issue. Any idea is welcome! > You could disable client-quorum but it is strongly advised not to do so in order to prevent split-brains (https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/). Hope that helps. -Ravi > > If you need any log-file as further information, just give me a hint! > > > Thanks in advance. > > Felix > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From bgoldowsky at cast.org Thu Apr 11 17:51:46 2019 From: bgoldowsky at cast.org (Boris Goldowsky) Date: Thu, 11 Apr 2019 17:51:46 +0000 Subject: [Gluster-users] Volume stuck unable to add a brick Message-ID: <02CC8632-9B90-43F6-89FA-1160CE529667@contoso.com> I?ve got a replicated volume with three bricks (?1x3=3?), the idea is to have a common set of files that are locally available on all the machines (Scientific Linux 7, which is essentially CentOS 7) in a cluster. I tried to add on a fourth machine, so used a command like this: sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force but the result is: volume add-brick: failed: Commit failed on webserver1. Please check log file for details. Commit failed on webserver8. Please check log file for details. Commit failed on webserver11. Please check log file for details. Tried: removing the new brick (this also fails) and trying again. Tried: checking the logs. 
The log files are not enlightening to me ? I don?t know what?s normal and what?s not. Tried: deleting the brick directory from previous attempt, so that it?s not in the way. Tried: restarting gluster services Tried: rebooting Tried: setting up a new volume, replicated to all four machines. This works, so I?m assuming it?s not a networking issue. But still fails with this existing volume that has the critical data in it. Running out of ideas. Any suggestions? Thank you! Boris -------------- next part -------------- An HTML attachment was scrubbed... URL: From atin.mukherjee83 at gmail.com Fri Apr 12 17:09:45 2019 From: atin.mukherjee83 at gmail.com (Atin Mukherjee) Date: Fri, 12 Apr 2019 22:39:45 +0530 Subject: [Gluster-users] Volume stuck unable to add a brick In-Reply-To: <02CC8632-9B90-43F6-89FA-1160CE529667@contoso.com> References: <02CC8632-9B90-43F6-89FA-1160CE529667@contoso.com> Message-ID: On Fri, 12 Apr 2019 at 22:32, Boris Goldowsky wrote: > I?ve got a replicated volume with three bricks (?1x3=3?), the idea is to > have a common set of files that are locally available on all the machines > (Scientific Linux 7, which is essentially CentOS 7) in a cluster. > > > > I tried to add on a fourth machine, so used a command like this: > > > > sudo gluster volume add-brick dockervols replica 4 > webserver8:/data/gluster/dockervols force > > > > but the result is: > > volume add-brick: failed: Commit failed on webserver1. Please check log > file for details. > > Commit failed on webserver8. Please check log file for details. > > Commit failed on webserver11. Please check log file for details. > > > > Tried: removing the new brick (this also fails) and trying again. > > Tried: checking the logs. The log files are not enlightening to me ? I > don?t know what?s normal and what?s not. > >From webserver8 & webserver11 could you attach glusterd log files? Also please share following: - gluster version? (gluster ?version) - Output of ?gluster peer status? - Output of ?gluster v info? from all 4 nodes. Tried: deleting the brick directory from previous attempt, so that it?s not > in the way. > > Tried: restarting gluster services > > Tried: rebooting > > Tried: setting up a new volume, replicated to all four machines. This > works, so I?m assuming it?s not a networking issue. But still fails with > this existing volume that has the critical data in it. > > > > Running out of ideas. Any suggestions? Thank you! > > > > Boris > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- --Atin -------------- next part -------------- An HTML attachment was scrubbed... URL: From sankarshan.mukhopadhyay at gmail.com Sat Apr 13 05:41:35 2019 From: sankarshan.mukhopadhyay at gmail.com (Sankarshan Mukhopadhyay) Date: Sat, 13 Apr 2019 11:11:35 +0530 Subject: [Gluster-users] [request for feedback] starting a small series about troubleshooting GlusterFS Message-ID: We've often seen threads on this list (and others) around troubleshooting a component within GlusterFS leading to a bit of back-and-forth while the minimum data sets are collected. In order to increase the understanding and familiarity, we started off a series with the first one being around a general approach to collecting data for troubleshooting. 
You can listen to the episode at The desired end outcome is that if we talk about something that is not yet documented or, incompletely/inaccurately documented, we'll fix the shortcomings with the documentation. I'm also looking forward to feedback on what can be covered and would be relevant. Please respond here or, on Twitter (@Gluster) From revirii at googlemail.com Mon Apr 15 06:33:22 2019 From: revirii at googlemail.com (Hu Bert) Date: Mon, 15 Apr 2019 08:33:22 +0200 Subject: [Gluster-users] Upgrade 5.5 -> 5.6: network traffic bug fixed? Message-ID: Good Morning, today i updated my replica 3 setup (debian stretch) from version 5.5 to 5.6, as i thought the network traffic bug (#1673058) was fixed and i could re-activate 'performance.quick-read' again. See release notes: https://review.gluster.org/#/c/glusterfs/+/22538/ http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a2347780c2429284f57232f3aabb78547a9795 Upgrade went fine, and then i was watching iowait and network traffic. It seems that the network traffic went up after upgrade and reactivation of performance.quick-read. Here are some graphs: network client1: https://abload.de/img/network-clientfwj1m.png network client2: https://abload.de/img/network-client2trkow.png network server: https://abload.de/img/network-serverv3jjr.png gluster volume info: https://pastebin.com/ZMuJYXRZ Just wondering if the network traffic bug really got fixed or if this is a new problem. I'll wait a couple of minutes and then deactivate performance.quick-read again, just to see if network traffic goes down to normal levels. Best regards, Hubert From revirii at googlemail.com Mon Apr 15 06:54:12 2019 From: revirii at googlemail.com (Hu Bert) Date: Mon, 15 Apr 2019 08:54:12 +0200 Subject: [Gluster-users] Upgrade 5.5 -> 5.6: network traffic bug fixed? In-Reply-To: References: Message-ID: fyi: after setting performance.quick-read to off network traffic dropped to normal levels, client load/iowait back to normal as well. client: https://abload.de/img/network-client-afterihjqi.png server: https://abload.de/img/network-server-afterwdkrl.png Am Mo., 15. Apr. 2019 um 08:33 Uhr schrieb Hu Bert : > > Good Morning, > > today i updated my replica 3 setup (debian stretch) from version 5.5 > to 5.6, as i thought the network traffic bug (#1673058) was fixed and > i could re-activate 'performance.quick-read' again. See release notes: > > https://review.gluster.org/#/c/glusterfs/+/22538/ > http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a2347780c2429284f57232f3aabb78547a9795 > > Upgrade went fine, and then i was watching iowait and network traffic. > It seems that the network traffic went up after upgrade and > reactivation of performance.quick-read. Here are some graphs: > > network client1: https://abload.de/img/network-clientfwj1m.png > network client2: https://abload.de/img/network-client2trkow.png > network server: https://abload.de/img/network-serverv3jjr.png > > gluster volume info: https://pastebin.com/ZMuJYXRZ > > Just wondering if the network traffic bug really got fixed or if this > is a new problem. I'll wait a couple of minutes and then deactivate > performance.quick-read again, just to see if network traffic goes down > to normal levels. > > > Best regards, > Hubert From spisla80 at gmail.com Mon Apr 15 09:09:11 2019 From: spisla80 at gmail.com (David Spisla) Date: Mon, 15 Apr 2019 11:09:11 +0200 Subject: [Gluster-users] XFS, WORM and the Year-2038 Problem Message-ID: Hi folks, I tried out default retention periods e.g. 
to set the Retention date to 2071. When I did the WORMing, everything seems to be OK. From FUSE and also at Brick-Level, the retention was set to 2071 on all nodes.Additionally I enabled the storage.ctime option, so that the timestamps are stored in the mdata xattr, too. But after a while I obeserved, that on Brick-Level the atime (which stores the retention) was switched to 1934: # stat /gluster/brick1/glusterbrick/data/file3.txt File: /gluster/brick1/glusterbrick/data/file3.txt Size: 5 Blocks: 16 IO Block: 4096 regular file Device: 830h/2096d Inode: 115 Links: 2 Access: (0544/-r-xr--r--) Uid: ( 2000/ gluster) Gid: ( 2000/ gluster) Access: 1934-12-13 20:45:51.000000000 +0000 Modify: 2019-04-10 09:50:09.000000000 +0000 Change: 2019-04-10 10:13:39.703623917 +0000 Birth: - >From FUSE I get the correct atime: # stat /gluster/volume1/data/file3.txt File: /gluster/volume1/data/file3.txt Size: 5 Blocks: 1 IO Block: 131072 regular file Device: 2eh/46d Inode: 10812026387234582248 Links: 1 Access: (0544/-r-xr--r--) Uid: ( 2000/ gluster) Gid: ( 2000/ gluster) Access: 2071-01-19 03:14:07.000000000 +0000 Modify: 2019-04-10 09:50:09.000000000 +0000 Change: 2019-04-10 10:13:39.705341476 +0000 Birth: - I find out that XFS supports only 32-Bit timestamp values. So in my expectation it should not be possible to set the atime to 2071. But at first it was 2071 and later it was switched to 1934 due to the YEAR-2038 problem. I am asking myself: 1. Why it is possible to set atime on XFS greater than 2038? 2. And why this atime switched to a time lower 1970 after a while? Regards David Spisla -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Mon Apr 15 09:26:46 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Mon, 15 Apr 2019 14:56:46 +0530 Subject: [Gluster-users] XFS, WORM and the Year-2038 Problem In-Reply-To: References: Message-ID: On Mon, Apr 15, 2019 at 2:40 PM David Spisla wrote: > Hi folks, > I tried out default retention periods e.g. to set the Retention date to > 2071. When I did the WORMing, everything seems to be OK. From FUSE and also > at Brick-Level, the retention was set to 2071 on all nodes.Additionally I > enabled the storage.ctime option, so that the timestamps are stored in the > mdata xattr, too. But after a while I obeserved, that on Brick-Level the > atime (which stores the retention) was switched to 1934: > > # stat /gluster/brick1/glusterbrick/data/file3.txt > File: /gluster/brick1/glusterbrick/data/file3.txt > Size: 5 Blocks: 16 IO Block: 4096 regular file > Device: 830h/2096d Inode: 115 Links: 2 > Access: (0544/-r-xr--r--) Uid: ( 2000/ gluster) Gid: ( 2000/ > gluster) > Access: 1934-12-13 20:45:51.000000000 +0000 > Modify: 2019-04-10 09:50:09.000000000 +0000 > Change: 2019-04-10 10:13:39.703623917 +0000 > Birth: - > > From FUSE I get the correct atime: > # stat /gluster/volume1/data/file3.txt > File: /gluster/volume1/data/file3.txt > Size: 5 Blocks: 1 IO Block: 131072 regular file > Device: 2eh/46d Inode: 10812026387234582248 Links: 1 > Access: (0544/-r-xr--r--) Uid: ( 2000/ gluster) Gid: ( 2000/ > gluster) > Access: 2071-01-19 03:14:07.000000000 +0000 > Modify: 2019-04-10 09:50:09.000000000 +0000 > Change: 2019-04-10 10:13:39.705341476 +0000 > Birth: - > > >From FUSE you get the time of what the clients set, as we now store timestamp as extended attribute, not the 'stat->st_atime'. This is called 'ctime' feature which we introduced in glusterfs-5.0, It helps us to support statx() variables. 
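
If you want to see this on disk, one quick check (just a sketch; the exact xattr name can vary between releases, on recent ones it is usually trusted.glusterfs.mdata) is to dump the extended attributes of the file directly on the brick and compare them with the plain stat output:

# stat /gluster/brick1/glusterbrick/data/file3.txt
# getfattr -d -m . -e hex /gluster/brick1/glusterbrick/data/file3.txt

The timestamps the clients see are served from that xattr, while the Access/Modify/Change lines in stat only show what XFS can hold in its on-disk inode fields.
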
> I find out that XFS supports only 32-Bit timestamp values. So in my > expectation it should not be possible to set the atime to 2071. But at > first it was 2071 and later it was switched to 1934 due to the YEAR-2038 > problem. I am asking myself: > 1. Why it is possible to set atime on XFS greater than 2038? > 2. And why this atime switched to a time lower 1970 after a while? > > Regards > David Spisla > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From greenet at ornl.gov Mon Apr 15 13:09:12 2019 From: greenet at ornl.gov (Greene, Tami McFarlin) Date: Mon, 15 Apr 2019 13:09:12 +0000 Subject: [Gluster-users] Difference between processes: shrinking volume and replacing faulty brick Message-ID: We need to remove a server node from our configuration (distributed volume). There is more than enough space on the remaining bricks to accept the data attached to the failing server; we didn?t know if one process or the other would be significantly faster. We know shrinking the volume (remove-brick) rebalances as it moves the data; so moving 506G resuled in the rebalancing of 1.8T and took considerable time. Reading the documentation, it seems that replacing a brick is simplying introducing an empty brick to accept the displaced data, but it is the exact same process: remove-brick. Is there anyway to migrate the data without rebalancing at the same time and then rebalancing once all data has been moved? I know that is not ideal, but it would allow us to remove the problem server much quicker and resume production while rebalancing. Tami Tami McFarlin Greene Lab Technician RF, Communications, and Intelligent Systems Group Electrical and Electronics System Research Division Oak Ridge National Laboratory Bldg. 3500, Rm. A15 greenet at ornl.gov (865) 643-0401 -------------- next part -------------- An HTML attachment was scrubbed... URL: From atin.mukherjee83 at gmail.com Mon Apr 15 16:13:16 2019 From: atin.mukherjee83 at gmail.com (Atin Mukherjee) Date: Mon, 15 Apr 2019 21:43:16 +0530 Subject: [Gluster-users] Volume stuck unable to add a brick In-Reply-To: <52179D48-7CF0-405E-805F-C5DCDF5B12CB@cast.org> References: <02CC8632-9B90-43F6-89FA-1160CE529667@contoso.com> <52179D48-7CF0-405E-805F-C5DCDF5B12CB@cast.org> Message-ID: +Karthik Subrahmanya Didn't we we fix this problem recently? Failed to set extended attribute indicates that temp mount is failing and we don't have quorum number of bricks up. Boris - What's the gluster version are you using? On Mon, Apr 15, 2019 at 7:35 PM Boris Goldowsky wrote: > Atin, thank you for the reply. 
Here are all of those pieces of > information: > > > > [bgoldowsky at webserver9 ~]$ gluster --version > > glusterfs 3.12.2 > > (same on all nodes) > > > > [bgoldowsky at webserver9 ~]$ sudo gluster peer status > > Number of Peers: 3 > > > > Hostname: webserver11.cast.org > > Uuid: c2b147fd-cab4-4859-9922-db5730f8549d > > State: Peer in Cluster (Connected) > > > > Hostname: webserver1.cast.org > > Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c > > State: Peer in Cluster (Connected) > > Other names: > > 192.168.200.131 > > webserver1 > > > > Hostname: webserver8.cast.org > > Uuid: be2f568b-61c5-4016-9264-083e4e6453a2 > > State: Peer in Cluster (Connected) > > Other names: > > webserver8 > > > > [bgoldowsky at webserver1 ~]$ sudo gluster v info > > Volume Name: dockervols > > Type: Replicate > > Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/dockervols > > Brick2: webserver11:/data/gluster/dockervols > > Brick3: webserver9:/data/gluster/dockervols > > Options Reconfigured: > > nfs.disable: on > > transport.address-family: inet > > auth.allow: 127.0.0.1 > > > > Volume Name: testvol > > Type: Replicate > > Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 4 = 4 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/testvol > > Brick2: webserver9:/data/gluster/testvol > > Brick3: webserver11:/data/gluster/testvol > > Brick4: webserver8:/data/gluster/testvol > > Options Reconfigured: > > transport.address-family: inet > > nfs.disable: on > > > > [bgoldowsky at webserver8 ~]$ sudo gluster v info > > Volume Name: dockervols > > Type: Replicate > > Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/dockervols > > Brick2: webserver11:/data/gluster/dockervols > > Brick3: webserver9:/data/gluster/dockervols > > Options Reconfigured: > > nfs.disable: on > > transport.address-family: inet > > auth.allow: 127.0.0.1 > > > > Volume Name: testvol > > Type: Replicate > > Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 4 = 4 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/testvol > > Brick2: webserver9:/data/gluster/testvol > > Brick3: webserver11:/data/gluster/testvol > > Brick4: webserver8:/data/gluster/testvol > > Options Reconfigured: > > nfs.disable: on > > transport.address-family: inet > > > > [bgoldowsky at webserver9 ~]$ sudo gluster v info > > Volume Name: dockervols > > Type: Replicate > > Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/dockervols > > Brick2: webserver11:/data/gluster/dockervols > > Brick3: webserver9:/data/gluster/dockervols > > Options Reconfigured: > > nfs.disable: on > > transport.address-family: inet > > auth.allow: 127.0.0.1 > > > > Volume Name: testvol > > Type: Replicate > > Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 4 = 4 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/testvol > > Brick2: webserver9:/data/gluster/testvol > > Brick3: webserver11:/data/gluster/testvol > > 
Brick4: webserver8:/data/gluster/testvol > > Options Reconfigured: > > nfs.disable: on > > transport.address-family: inet > > > > [bgoldowsky at webserver11 ~]$ sudo gluster v info > > Volume Name: dockervols > > Type: Replicate > > Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/dockervols > > Brick2: webserver11:/data/gluster/dockervols > > Brick3: webserver9:/data/gluster/dockervols > > Options Reconfigured: > > auth.allow: 127.0.0.1 > > transport.address-family: inet > > nfs.disable: on > > > > Volume Name: testvol > > Type: Replicate > > Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 4 = 4 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/testvol > > Brick2: webserver9:/data/gluster/testvol > > Brick3: webserver11:/data/gluster/testvol > > Brick4: webserver8:/data/gluster/testvol > > Options Reconfigured: > > transport.address-family: inet > > nfs.disable: on > > > > [bgoldowsky at webserver9 ~]$ sudo gluster volume add-brick dockervols > replica 4 webserver8:/data/gluster/dockervols force > > volume add-brick: failed: Commit failed on webserver8.cast.org. Please > check log file for details. > > > > Webserver8 glusterd.log: > > > > [2019-04-15 13:55:42.338197] I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req > > The message "I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req" repeated 2 times between [2019-04-15 13:55:42.338197] > and [2019-04-15 13:55:42.341618] > > [2019-04-15 14:00:20.445011] I [run.c:190:runner_log] > (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) > [0x7fe697764215] > -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) > [0x7fe69780de9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7fe6a2d16ea5] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh > --volname=dockervols --version=1 --volume-op=add-brick > --gd-workdir=/var/lib/glusterd > > [2019-04-15 14:00:20.445148] I [MSGID: 106578] > [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: > replica-count is set 4 > > [2019-04-15 14:00:20.445184] I [MSGID: 106578] > [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: > type is set 0, need to change it > > [2019-04-15 14:00:20.672347] E [MSGID: 106054] > [glusterd-utils.c:13863:glusterd_handle_replicate_brick_ops] 0-management: > Failed to set extended attribute trusted.add-brick : Transport endpoint is > not connected [Transport endpoint is not connected] > > [2019-04-15 14:00:20.693491] E [MSGID: 101042] > [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /tmp/mntmvdFGq > [Transport endpoint is not connected] > > [2019-04-15 14:00:20.693597] E [MSGID: 106074] > [glusterd-brick-ops.c:2590:glusterd_op_add_brick] 0-glusterd: Unable to add > bricks > > [2019-04-15 14:00:20.693637] E [MSGID: 106123] > [glusterd-mgmt.c:312:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit > failed. 
> > [2019-04-15 14:00:20.693667] E [MSGID: 106123] > [glusterd-mgmt-handler.c:616:glusterd_handle_commit_fn] 0-management: > commit failed on operation Add brick > > > > Webserver11 log file: > > > > [2019-04-15 13:56:29.563270] I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req > > The message "I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req" repeated 2 times between [2019-04-15 13:56:29.563270] > and [2019-04-15 13:56:29.566209] > > [2019-04-15 14:00:33.996866] I [run.c:190:runner_log] > (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) > [0x7f36de924215] > -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) > [0x7f36de9cde9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7f36e9ed6ea5] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh > --volname=dockervols --version=1 --volume-op=add-brick > --gd-workdir=/var/lib/glusterd > > [2019-04-15 14:00:33.996979] I [MSGID: 106578] > [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: > replica-count is set 4 > > [2019-04-15 14:00:33.997004] I [MSGID: 106578] > [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: > type is set 0, need to change it > > [2019-04-15 14:00:34.013789] I [MSGID: 106132] > [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: nfs already > stopped > > [2019-04-15 14:00:34.013849] I [MSGID: 106568] > [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is > stopped > > [2019-04-15 14:00:34.017535] I [MSGID: 106568] > [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping > glustershd daemon running in pid: 6087 > > [2019-04-15 14:00:35.018783] I [MSGID: 106568] > [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: glustershd > service is stopped > > [2019-04-15 14:00:35.018952] I [MSGID: 106567] > [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting > glustershd service > > [2019-04-15 14:00:35.028306] I [MSGID: 106132] > [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: bitd already > stopped > > [2019-04-15 14:00:35.028408] I [MSGID: 106568] > [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: bitd service is > stopped > > [2019-04-15 14:00:35.028601] I [MSGID: 106132] > [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: scrub already > stopped > > [2019-04-15 14:00:35.028645] I [MSGID: 106568] > [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: scrub service is > stopped > > > > Thank you for taking a look! > > > > Boris > > > > > > *From: *Atin Mukherjee > *Date: *Friday, April 12, 2019 at 1:10 PM > *To: *Boris Goldowsky > *Cc: *Gluster-users > *Subject: *Re: [Gluster-users] Volume stuck unable to add a brick > > > > > > > > On Fri, 12 Apr 2019 at 22:32, Boris Goldowsky wrote: > > I?ve got a replicated volume with three bricks (?1x3=3?), the idea is to > have a common set of files that are locally available on all the machines > (Scientific Linux 7, which is essentially CentOS 7) in a cluster. > > > > I tried to add on a fourth machine, so used a command like this: > > > > sudo gluster volume add-brick dockervols replica 4 > webserver8:/data/gluster/dockervols force > > > > but the result is: > > volume add-brick: failed: Commit failed on webserver1. Please check log > file for details. > > Commit failed on webserver8. Please check log file for details. 
> > Commit failed on webserver11. Please check log file for details. > > > > Tried: removing the new brick (this also fails) and trying again. > > Tried: checking the logs. The log files are not enlightening to me ? I > don?t know what?s normal and what?s not. > > > > From webserver8 & webserver11 could you attach glusterd log files? > > > > Also please share following: > > - gluster version? (gluster ?version) > > - Output of ?gluster peer status? > > - Output of ?gluster v info? from all 4 nodes. > > > > Tried: deleting the brick directory from previous attempt, so that it?s > not in the way. > > Tried: restarting gluster services > > Tried: rebooting > > Tried: setting up a new volume, replicated to all four machines. This > works, so I?m assuming it?s not a networking issue. But still fails with > this existing volume that has the critical data in it. > > > > Running out of ideas. Any suggestions? Thank you! > > > > Boris > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > -- > > --Atin > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bgoldowsky at cast.org Mon Apr 15 14:05:52 2019 From: bgoldowsky at cast.org (Boris Goldowsky) Date: Mon, 15 Apr 2019 14:05:52 +0000 Subject: [Gluster-users] Volume stuck unable to add a brick In-Reply-To: References: <02CC8632-9B90-43F6-89FA-1160CE529667@contoso.com> Message-ID: <52179D48-7CF0-405E-805F-C5DCDF5B12CB@cast.org> Atin, thank you for the reply. Here are all of those pieces of information: [bgoldowsky at webserver9 ~]$ gluster --version glusterfs 3.12.2 (same on all nodes) [bgoldowsky at webserver9 ~]$ sudo gluster peer status Number of Peers: 3 Hostname: webserver11.cast.org Uuid: c2b147fd-cab4-4859-9922-db5730f8549d State: Peer in Cluster (Connected) Hostname: webserver1.cast.org Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c State: Peer in Cluster (Connected) Other names: 192.168.200.131 webserver1 Hostname: webserver8.cast.org Uuid: be2f568b-61c5-4016-9264-083e4e6453a2 State: Peer in Cluster (Connected) Other names: webserver8 [bgoldowsky at webserver1 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options Reconfigured: nfs.disable: on transport.address-family: inet auth.allow: 127.0.0.1 Volume Name: testvol Type: Replicate Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: transport.address-family: inet nfs.disable: on [bgoldowsky at webserver8 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options Reconfigured: nfs.disable: on transport.address-family: inet auth.allow: 127.0.0.1 Volume Name: testvol Type: Replicate Volume ID: 
4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: nfs.disable: on transport.address-family: inet [bgoldowsky at webserver9 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options Reconfigured: nfs.disable: on transport.address-family: inet auth.allow: 127.0.0.1 Volume Name: testvol Type: Replicate Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: nfs.disable: on transport.address-family: inet [bgoldowsky at webserver11 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options Reconfigured: auth.allow: 127.0.0.1 transport.address-family: inet nfs.disable: on Volume Name: testvol Type: Replicate Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: transport.address-family: inet nfs.disable: on [bgoldowsky at webserver9 ~]$ sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force volume add-brick: failed: Commit failed on webserver8.cast.org. Please check log file for details. 
Webserver8 glusterd.log: [2019-04-15 13:55:42.338197] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-04-15 13:55:42.338197] and [2019-04-15 13:55:42.341618] [2019-04-15 14:00:20.445011] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) [0x7fe697764215] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) [0x7fe69780de9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fe6a2d16ea5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=dockervols --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd [2019-04-15 14:00:20.445148] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 4 [2019-04-15 14:00:20.445184] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it [2019-04-15 14:00:20.672347] E [MSGID: 106054] [glusterd-utils.c:13863:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected] [2019-04-15 14:00:20.693491] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /tmp/mntmvdFGq [Transport endpoint is not connected] [2019-04-15 14:00:20.693597] E [MSGID: 106074] [glusterd-brick-ops.c:2590:glusterd_op_add_brick] 0-glusterd: Unable to add bricks [2019-04-15 14:00:20.693637] E [MSGID: 106123] [glusterd-mgmt.c:312:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed. 
[2019-04-15 14:00:20.693667] E [MSGID: 106123] [glusterd-mgmt-handler.c:616:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick Webserver11 log file: [2019-04-15 13:56:29.563270] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-04-15 13:56:29.563270] and [2019-04-15 13:56:29.566209] [2019-04-15 14:00:33.996866] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) [0x7f36de924215] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) [0x7f36de9cde9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f36e9ed6ea5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=dockervols --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd [2019-04-15 14:00:33.996979] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 4 [2019-04-15 14:00:33.997004] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it [2019-04-15 14:00:34.013789] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: nfs already stopped [2019-04-15 14:00:34.013849] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is stopped [2019-04-15 14:00:34.017535] I [MSGID: 106568] [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping glustershd daemon running in pid: 6087 [2019-04-15 14:00:35.018783] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: glustershd service is stopped [2019-04-15 14:00:35.018952] I [MSGID: 106567] [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting glustershd service [2019-04-15 14:00:35.028306] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: bitd already stopped [2019-04-15 14:00:35.028408] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: bitd service is stopped [2019-04-15 14:00:35.028601] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: scrub already stopped [2019-04-15 14:00:35.028645] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: scrub service is stopped Thank you for taking a look! Boris From: Atin Mukherjee Date: Friday, April 12, 2019 at 1:10 PM To: Boris Goldowsky Cc: Gluster-users Subject: Re: [Gluster-users] Volume stuck unable to add a brick On Fri, 12 Apr 2019 at 22:32, Boris Goldowsky > wrote: I?ve got a replicated volume with three bricks (?1x3=3?), the idea is to have a common set of files that are locally available on all the machines (Scientific Linux 7, which is essentially CentOS 7) in a cluster. I tried to add on a fourth machine, so used a command like this: sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force but the result is: volume add-brick: failed: Commit failed on webserver1. Please check log file for details. Commit failed on webserver8. Please check log file for details. Commit failed on webserver11. Please check log file for details. Tried: removing the new brick (this also fails) and trying again. Tried: checking the logs. The log files are not enlightening to me ? I don?t know what?s normal and what?s not. 
From webserver8 & webserver11 could you attach glusterd log files? Also please share following: - gluster version? (gluster ?version) - Output of ?gluster peer status? - Output of ?gluster v info? from all 4 nodes. Tried: deleting the brick directory from previous attempt, so that it?s not in the way. Tried: restarting gluster services Tried: rebooting Tried: setting up a new volume, replicated to all four machines. This works, so I?m assuming it?s not a networking issue. But still fails with this existing volume that has the critical data in it. Running out of ideas. Any suggestions? Thank you! Boris _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -- --Atin -------------- next part -------------- An HTML attachment was scrubbed... URL: From bgoldowsky at cast.org Mon Apr 15 17:19:09 2019 From: bgoldowsky at cast.org (Boris Goldowsky) Date: Mon, 15 Apr 2019 17:19:09 +0000 Subject: [Gluster-users] Volume stuck unable to add a brick Message-ID: <2d745cdf-c14d-4f15-9482-2b1176c5ecde@email.android.com> 3.12.2 Boris On Apr 15, 2019 12:13 PM, Atin Mukherjee wrote: +Karthik Subrahmanya Didn't we we fix this problem recently? Failed to set extended attribute indicates that temp mount is failing and we don't have quorum number of bricks up. Boris - What's the gluster version are you using? On Mon, Apr 15, 2019 at 7:35 PM Boris Goldowsky > wrote: Atin, thank you for the reply. Here are all of those pieces of information: [bgoldowsky at webserver9 ~]$ gluster --version glusterfs 3.12.2 (same on all nodes) [bgoldowsky at webserver9 ~]$ sudo gluster peer status Number of Peers: 3 Hostname: webserver11.cast.org Uuid: c2b147fd-cab4-4859-9922-db5730f8549d State: Peer in Cluster (Connected) Hostname: webserver1.cast.org Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c State: Peer in Cluster (Connected) Other names: 192.168.200.131 webserver1 Hostname: webserver8.cast.org Uuid: be2f568b-61c5-4016-9264-083e4e6453a2 State: Peer in Cluster (Connected) Other names: webserver8 [bgoldowsky at webserver1 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options Reconfigured: nfs.disable: on transport.address-family: inet auth.allow: 127.0.0.1 Volume Name: testvol Type: Replicate Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: transport.address-family: inet nfs.disable: on [bgoldowsky at webserver8 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options Reconfigured: nfs.disable: on transport.address-family: inet auth.allow: 127.0.0.1 Volume Name: testvol Type: Replicate Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 
4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: nfs.disable: on transport.address-family: inet [bgoldowsky at webserver9 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options Reconfigured: nfs.disable: on transport.address-family: inet auth.allow: 127.0.0.1 Volume Name: testvol Type: Replicate Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: nfs.disable: on transport.address-family: inet [bgoldowsky at webserver11 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options Reconfigured: auth.allow: 127.0.0.1 transport.address-family: inet nfs.disable: on Volume Name: testvol Type: Replicate Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: transport.address-family: inet nfs.disable: on [bgoldowsky at webserver9 ~]$ sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force volume add-brick: failed: Commit failed on webserver8.cast.org. Please check log file for details. 
Webserver8 glusterd.log: [2019-04-15 13:55:42.338197] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-04-15 13:55:42.338197] and [2019-04-15 13:55:42.341618] [2019-04-15 14:00:20.445011] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) [0x7fe697764215] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) [0x7fe69780de9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fe6a2d16ea5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=dockervols --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd [2019-04-15 14:00:20.445148] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 4 [2019-04-15 14:00:20.445184] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it [2019-04-15 14:00:20.672347] E [MSGID: 106054] [glusterd-utils.c:13863:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected] [2019-04-15 14:00:20.693491] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /tmp/mntmvdFGq [Transport endpoint is not connected] [2019-04-15 14:00:20.693597] E [MSGID: 106074] [glusterd-brick-ops.c:2590:glusterd_op_add_brick] 0-glusterd: Unable to add bricks [2019-04-15 14:00:20.693637] E [MSGID: 106123] [glusterd-mgmt.c:312:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed. 
[2019-04-15 14:00:20.693667] E [MSGID: 106123] [glusterd-mgmt-handler.c:616:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick Webserver11 log file: [2019-04-15 13:56:29.563270] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-04-15 13:56:29.563270] and [2019-04-15 13:56:29.566209] [2019-04-15 14:00:33.996866] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) [0x7f36de924215] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) [0x7f36de9cde9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f36e9ed6ea5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=dockervols --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd [2019-04-15 14:00:33.996979] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 4 [2019-04-15 14:00:33.997004] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it [2019-04-15 14:00:34.013789] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: nfs already stopped [2019-04-15 14:00:34.013849] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is stopped [2019-04-15 14:00:34.017535] I [MSGID: 106568] [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping glustershd daemon running in pid: 6087 [2019-04-15 14:00:35.018783] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: glustershd service is stopped [2019-04-15 14:00:35.018952] I [MSGID: 106567] [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting glustershd service [2019-04-15 14:00:35.028306] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: bitd already stopped [2019-04-15 14:00:35.028408] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: bitd service is stopped [2019-04-15 14:00:35.028601] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: scrub already stopped [2019-04-15 14:00:35.028645] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: scrub service is stopped Thank you for taking a look! Boris From: Atin Mukherjee > Date: Friday, April 12, 2019 at 1:10 PM To: Boris Goldowsky > Cc: Gluster-users > Subject: Re: [Gluster-users] Volume stuck unable to add a brick On Fri, 12 Apr 2019 at 22:32, Boris Goldowsky > wrote: I?ve got a replicated volume with three bricks (?1x3=3?), the idea is to have a common set of files that are locally available on all the machines (Scientific Linux 7, which is essentially CentOS 7) in a cluster. I tried to add on a fourth machine, so used a command like this: sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force but the result is: volume add-brick: failed: Commit failed on webserver1. Please check log file for details. Commit failed on webserver8. Please check log file for details. Commit failed on webserver11. Please check log file for details. Tried: removing the new brick (this also fails) and trying again. Tried: checking the logs. The log files are not enlightening to me ? I don?t know what?s normal and what?s not. 
From webserver8 & webserver11 could you attach glusterd log files? Also please share following: - gluster version? (gluster ?version) - Output of ?gluster peer status? - Output of ?gluster v info? from all 4 nodes. Tried: deleting the brick directory from previous attempt, so that it?s not in the way. Tried: restarting gluster services Tried: rebooting Tried: setting up a new volume, replicated to all four machines. This works, so I?m assuming it?s not a networking issue. But still fails with this existing volume that has the critical data in it. Running out of ideas. Any suggestions? Thank you! Boris _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -- --Atin -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgurusid at redhat.com Tue Apr 16 05:57:01 2019 From: pgurusid at redhat.com (Poornima Gurusiddaiah) Date: Tue, 16 Apr 2019 11:27:01 +0530 Subject: [Gluster-users] Reg: Gluster In-Reply-To: References: Message-ID: +Sunny On Wed, Apr 10, 2019, 9:02 PM Gomathi Nayagam wrote: > Hi User, > > We are testing geo-replication of gluster it is taking nearly 8 mins to > transfer 16 GB size of data between the DCs while when transferred the same > data over plain rsync it took only 2 mins. Can we know if we are missing > something? > > > > Thanks & Regards, > Gomathi Nayagam.D > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksubrahm at redhat.com Tue Apr 16 06:51:44 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Tue, 16 Apr 2019 12:21:44 +0530 Subject: [Gluster-users] Volume stuck unable to add a brick In-Reply-To: References: <02CC8632-9B90-43F6-89FA-1160CE529667@contoso.com> <52179D48-7CF0-405E-805F-C5DCDF5B12CB@cast.org> Message-ID: On Mon, Apr 15, 2019 at 9:43 PM Atin Mukherjee wrote: > +Karthik Subrahmanya > > Didn't we we fix this problem recently? Failed to set extended attribute > indicates that temp mount is failing and we don't have quorum number of > bricks up. > We had two fixes which handles two kind of add-brick scenarios. [1] Fails add-brick when increasing the replica count if any of the brick is down to avoid data loss. This can be overridden by using the force option. [2] Allow add-brick to set the extended attributes by the temp mount if the volume is already mounted (has clients). They are in version 3.12.2 so, patch [1] is present there. But since they are using the force option it should not have any problem even if they have any brick down. The error message they are getting is also different, so it is not because of any brick being down I guess. Patch [2] is not present in 3.12.2 and it is not the conversion from plain distribute to replicate volume. So the scenario is different here. It seems like they are hitting some other issue. @Boris, Can you attach the add-brick's temp mount log. The file name should look something like "dockervols-add-brick-mount.log". Can you also provide all the brick logs of that volume during that time. [1] https://review.gluster.org/#/c/glusterfs/+/16330/ [2] https://review.gluster.org/#/c/glusterfs/+/21791/ Regards, Karthik > > Boris - What's the gluster version are you using? > > > > On Mon, Apr 15, 2019 at 7:35 PM Boris Goldowsky > wrote: > >> Atin, thank you for the reply. 
Here are all of those pieces of >> information: >> >> >> >> [bgoldowsky at webserver9 ~]$ gluster --version >> >> glusterfs 3.12.2 >> >> (same on all nodes) >> >> >> >> [bgoldowsky at webserver9 ~]$ sudo gluster peer status >> >> Number of Peers: 3 >> >> >> >> Hostname: webserver11.cast.org >> >> Uuid: c2b147fd-cab4-4859-9922-db5730f8549d >> >> State: Peer in Cluster (Connected) >> >> >> >> Hostname: webserver1.cast.org >> >> Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c >> >> State: Peer in Cluster (Connected) >> >> Other names: >> >> 192.168.200.131 >> >> webserver1 >> >> >> >> Hostname: webserver8.cast.org >> >> Uuid: be2f568b-61c5-4016-9264-083e4e6453a2 >> >> State: Peer in Cluster (Connected) >> >> Other names: >> >> webserver8 >> >> >> >> [bgoldowsky at webserver1 ~]$ sudo gluster v info >> >> Volume Name: dockervols >> >> Type: Replicate >> >> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a >> >> Status: Started >> >> Snapshot Count: 0 >> >> Number of Bricks: 1 x 3 = 3 >> >> Transport-type: tcp >> >> Bricks: >> >> Brick1: webserver1:/data/gluster/dockervols >> >> Brick2: webserver11:/data/gluster/dockervols >> >> Brick3: webserver9:/data/gluster/dockervols >> >> Options Reconfigured: >> >> nfs.disable: on >> >> transport.address-family: inet >> >> auth.allow: 127.0.0.1 >> >> >> >> Volume Name: testvol >> >> Type: Replicate >> >> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 >> >> Status: Started >> >> Snapshot Count: 0 >> >> Number of Bricks: 1 x 4 = 4 >> >> Transport-type: tcp >> >> Bricks: >> >> Brick1: webserver1:/data/gluster/testvol >> >> Brick2: webserver9:/data/gluster/testvol >> >> Brick3: webserver11:/data/gluster/testvol >> >> Brick4: webserver8:/data/gluster/testvol >> >> Options Reconfigured: >> >> transport.address-family: inet >> >> nfs.disable: on >> >> >> >> [bgoldowsky at webserver8 ~]$ sudo gluster v info >> >> Volume Name: dockervols >> >> Type: Replicate >> >> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a >> >> Status: Started >> >> Snapshot Count: 0 >> >> Number of Bricks: 1 x 3 = 3 >> >> Transport-type: tcp >> >> Bricks: >> >> Brick1: webserver1:/data/gluster/dockervols >> >> Brick2: webserver11:/data/gluster/dockervols >> >> Brick3: webserver9:/data/gluster/dockervols >> >> Options Reconfigured: >> >> nfs.disable: on >> >> transport.address-family: inet >> >> auth.allow: 127.0.0.1 >> >> >> >> Volume Name: testvol >> >> Type: Replicate >> >> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 >> >> Status: Started >> >> Snapshot Count: 0 >> >> Number of Bricks: 1 x 4 = 4 >> >> Transport-type: tcp >> >> Bricks: >> >> Brick1: webserver1:/data/gluster/testvol >> >> Brick2: webserver9:/data/gluster/testvol >> >> Brick3: webserver11:/data/gluster/testvol >> >> Brick4: webserver8:/data/gluster/testvol >> >> Options Reconfigured: >> >> nfs.disable: on >> >> transport.address-family: inet >> >> >> >> [bgoldowsky at webserver9 ~]$ sudo gluster v info >> >> Volume Name: dockervols >> >> Type: Replicate >> >> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a >> >> Status: Started >> >> Snapshot Count: 0 >> >> Number of Bricks: 1 x 3 = 3 >> >> Transport-type: tcp >> >> Bricks: >> >> Brick1: webserver1:/data/gluster/dockervols >> >> Brick2: webserver11:/data/gluster/dockervols >> >> Brick3: webserver9:/data/gluster/dockervols >> >> Options Reconfigured: >> >> nfs.disable: on >> >> transport.address-family: inet >> >> auth.allow: 127.0.0.1 >> >> >> >> Volume Name: testvol >> >> Type: Replicate >> >> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 >> >> Status: Started >> >> 
Snapshot Count: 0 >> >> Number of Bricks: 1 x 4 = 4 >> >> Transport-type: tcp >> >> Bricks: >> >> Brick1: webserver1:/data/gluster/testvol >> >> Brick2: webserver9:/data/gluster/testvol >> >> Brick3: webserver11:/data/gluster/testvol >> >> Brick4: webserver8:/data/gluster/testvol >> >> Options Reconfigured: >> >> nfs.disable: on >> >> transport.address-family: inet >> >> >> >> [bgoldowsky at webserver11 ~]$ sudo gluster v info >> >> Volume Name: dockervols >> >> Type: Replicate >> >> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a >> >> Status: Started >> >> Snapshot Count: 0 >> >> Number of Bricks: 1 x 3 = 3 >> >> Transport-type: tcp >> >> Bricks: >> >> Brick1: webserver1:/data/gluster/dockervols >> >> Brick2: webserver11:/data/gluster/dockervols >> >> Brick3: webserver9:/data/gluster/dockervols >> >> Options Reconfigured: >> >> auth.allow: 127.0.0.1 >> >> transport.address-family: inet >> >> nfs.disable: on >> >> >> >> Volume Name: testvol >> >> Type: Replicate >> >> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 >> >> Status: Started >> >> Snapshot Count: 0 >> >> Number of Bricks: 1 x 4 = 4 >> >> Transport-type: tcp >> >> Bricks: >> >> Brick1: webserver1:/data/gluster/testvol >> >> Brick2: webserver9:/data/gluster/testvol >> >> Brick3: webserver11:/data/gluster/testvol >> >> Brick4: webserver8:/data/gluster/testvol >> >> Options Reconfigured: >> >> transport.address-family: inet >> >> nfs.disable: on >> >> >> >> [bgoldowsky at webserver9 ~]$ sudo gluster volume add-brick dockervols >> replica 4 webserver8:/data/gluster/dockervols force >> >> volume add-brick: failed: Commit failed on webserver8.cast.org. Please >> check log file for details. >> >> >> >> Webserver8 glusterd.log: >> >> >> >> [2019-04-15 13:55:42.338197] I [MSGID: 106488] >> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: >> Received get vol req >> >> The message "I [MSGID: 106488] >> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: >> Received get vol req" repeated 2 times between [2019-04-15 13:55:42.338197] >> and [2019-04-15 13:55:42.341618] >> >> [2019-04-15 14:00:20.445011] I [run.c:190:runner_log] >> (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) >> [0x7fe697764215] >> -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) >> [0x7fe69780de9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) >> [0x7fe6a2d16ea5] ) 0-management: Ran script: >> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >> --volname=dockervols --version=1 --volume-op=add-brick >> --gd-workdir=/var/lib/glusterd >> >> [2019-04-15 14:00:20.445148] I [MSGID: 106578] >> [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: >> replica-count is set 4 >> >> [2019-04-15 14:00:20.445184] I [MSGID: 106578] >> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: >> type is set 0, need to change it >> >> [2019-04-15 14:00:20.672347] E [MSGID: 106054] >> [glusterd-utils.c:13863:glusterd_handle_replicate_brick_ops] 0-management: >> Failed to set extended attribute trusted.add-brick : Transport endpoint is >> not connected [Transport endpoint is not connected] >> >> [2019-04-15 14:00:20.693491] E [MSGID: 101042] >> [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /tmp/mntmvdFGq >> [Transport endpoint is not connected] >> >> [2019-04-15 14:00:20.693597] E [MSGID: 106074] >> [glusterd-brick-ops.c:2590:glusterd_op_add_brick] 0-glusterd: Unable to add >> bricks >> >> [2019-04-15 14:00:20.693637] E [MSGID: 106123] >> 
[glusterd-mgmt.c:312:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit >> failed. >> >> [2019-04-15 14:00:20.693667] E [MSGID: 106123] >> [glusterd-mgmt-handler.c:616:glusterd_handle_commit_fn] 0-management: >> commit failed on operation Add brick >> >> >> >> Webserver11 log file: >> >> >> >> [2019-04-15 13:56:29.563270] I [MSGID: 106488] >> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: >> Received get vol req >> >> The message "I [MSGID: 106488] >> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: >> Received get vol req" repeated 2 times between [2019-04-15 13:56:29.563270] >> and [2019-04-15 13:56:29.566209] >> >> [2019-04-15 14:00:33.996866] I [run.c:190:runner_log] >> (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) >> [0x7f36de924215] >> -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) >> [0x7f36de9cde9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) >> [0x7f36e9ed6ea5] ) 0-management: Ran script: >> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >> --volname=dockervols --version=1 --volume-op=add-brick >> --gd-workdir=/var/lib/glusterd >> >> [2019-04-15 14:00:33.996979] I [MSGID: 106578] >> [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: >> replica-count is set 4 >> >> [2019-04-15 14:00:33.997004] I [MSGID: 106578] >> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: >> type is set 0, need to change it >> >> [2019-04-15 14:00:34.013789] I [MSGID: 106132] >> [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: nfs already >> stopped >> >> [2019-04-15 14:00:34.013849] I [MSGID: 106568] >> [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is >> stopped >> >> [2019-04-15 14:00:34.017535] I [MSGID: 106568] >> [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping >> glustershd daemon running in pid: 6087 >> >> [2019-04-15 14:00:35.018783] I [MSGID: 106568] >> [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: glustershd >> service is stopped >> >> [2019-04-15 14:00:35.018952] I [MSGID: 106567] >> [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting >> glustershd service >> >> [2019-04-15 14:00:35.028306] I [MSGID: 106132] >> [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: bitd already >> stopped >> >> [2019-04-15 14:00:35.028408] I [MSGID: 106568] >> [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: bitd service is >> stopped >> >> [2019-04-15 14:00:35.028601] I [MSGID: 106132] >> [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: scrub already >> stopped >> >> [2019-04-15 14:00:35.028645] I [MSGID: 106568] >> [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: scrub service is >> stopped >> >> >> >> Thank you for taking a look! >> >> >> >> Boris >> >> >> >> >> >> *From: *Atin Mukherjee >> *Date: *Friday, April 12, 2019 at 1:10 PM >> *To: *Boris Goldowsky >> *Cc: *Gluster-users >> *Subject: *Re: [Gluster-users] Volume stuck unable to add a brick >> >> >> >> >> >> >> >> On Fri, 12 Apr 2019 at 22:32, Boris Goldowsky >> wrote: >> >> I?ve got a replicated volume with three bricks (?1x3=3?), the idea is to >> have a common set of files that are locally available on all the machines >> (Scientific Linux 7, which is essentially CentOS 7) in a cluster. 
>> >> >> >> I tried to add on a fourth machine, so used a command like this: >> >> >> >> sudo gluster volume add-brick dockervols replica 4 >> webserver8:/data/gluster/dockervols force >> >> >> >> but the result is: >> >> volume add-brick: failed: Commit failed on webserver1. Please check log >> file for details. >> >> Commit failed on webserver8. Please check log file for details. >> >> Commit failed on webserver11. Please check log file for details. >> >> >> >> Tried: removing the new brick (this also fails) and trying again. >> >> Tried: checking the logs. The log files are not enlightening to me ? I >> don?t know what?s normal and what?s not. >> >> >> >> From webserver8 & webserver11 could you attach glusterd log files? >> >> >> >> Also please share following: >> >> - gluster version? (gluster ?version) >> >> - Output of ?gluster peer status? >> >> - Output of ?gluster v info? from all 4 nodes. >> >> >> >> Tried: deleting the brick directory from previous attempt, so that it?s >> not in the way. >> >> Tried: restarting gluster services >> >> Tried: rebooting >> >> Tried: setting up a new volume, replicated to all four machines. This >> works, so I?m assuming it?s not a networking issue. But still fails with >> this existing volume that has the critical data in it. >> >> >> >> Running out of ideas. Any suggestions? Thank you! >> >> >> >> Boris >> >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> -- >> >> --Atin >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgurusid at redhat.com Tue Apr 16 06:54:43 2019 From: pgurusid at redhat.com (Poornima Gurusiddaiah) Date: Tue, 16 Apr 2019 12:24:43 +0530 Subject: [Gluster-users] Difference between processes: shrinking volume and replacing faulty brick In-Reply-To: References: Message-ID: Do you have plain distributed volume without any replication? If so replace brick should copy the data on the faulty brick to the new brick, unless there is some old data which also would need rebalance. Having, add brick followed by remove brick and doing a rebalance is inefficient, i think we should have just the old brick data copied to the new brick, and rebalance the whole volume when necessary. Adding the distribute experts to the thread. If you are ok with downtime, trying xfsdump and restore of the faulty brick and reforming the volume may be faster. Regards, Poornima On Mon, Apr 15, 2019, 6:40 PM Greene, Tami McFarlin wrote: > We need to remove a server node from our configuration (distributed > volume). There is more than enough space on the remaining bricks to > accept the data attached to the failing server; we didn?t know if one > process or the other would be significantly faster. We know shrinking the > volume (remove-brick) rebalances as it moves the data; so moving 506G > resuled in the rebalancing of 1.8T and took considerable time. > > > > Reading the documentation, it seems that replacing a brick is simplying > introducing an empty brick to accept the displaced data, but it is the > exact same process: remove-brick. > > > > Is there anyway to migrate the data without rebalancing at the same time > and then rebalancing once all data has been moved? I know that is not > ideal, but it would allow us to remove the problem server much quicker and > resume production while rebalancing. 
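(For reference, the shrink path weighed above is the standard remove-brick sequence; a rough sketch, with volume name, server and brick path as placeholders:

  gluster volume remove-brick <volname> <server>:<brick-path> start
  gluster volume remove-brick <volname> <server>:<brick-path> status
  gluster volume remove-brick <volname> <server>:<brick-path> commit

"start" migrates the data off the departing brick, "status" reports progress, and "commit" drops the brick once the migration shows completed. That data migration is the rebalancing described above.)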
> > > > Tami > > > > Tami McFarlin Greene > > Lab Technician > > RF, Communications, and Intelligent Systems Group > > Electrical and Electronics System Research Division > > Oak Ridge National Laboratory > > Bldg. 3500, Rm. A15 > > greenet at ornl.gov (865) > 643-0401 > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgurusid at redhat.com Tue Apr 16 06:57:32 2019 From: pgurusid at redhat.com (Poornima Gurusiddaiah) Date: Tue, 16 Apr 2019 12:27:32 +0530 Subject: [Gluster-users] Upgrade 5.5 -> 5.6: network traffic bug fixed? In-Reply-To: References: Message-ID: Thank you for reporting this. I had done testing on my local setup and the issue was resolved even with quick-read enabled. Let me test it again. Regards, Poornima On Mon, Apr 15, 2019 at 12:25 PM Hu Bert wrote: > fyi: after setting performance.quick-read to off network traffic > dropped to normal levels, client load/iowait back to normal as well. > > client: https://abload.de/img/network-client-afterihjqi.png > server: https://abload.de/img/network-server-afterwdkrl.png > > Am Mo., 15. Apr. 2019 um 08:33 Uhr schrieb Hu Bert >: > > > > Good Morning, > > > > today i updated my replica 3 setup (debian stretch) from version 5.5 > > to 5.6, as i thought the network traffic bug (#1673058) was fixed and > > i could re-activate 'performance.quick-read' again. See release notes: > > > > https://review.gluster.org/#/c/glusterfs/+/22538/ > > > http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a2347780c2429284f57232f3aabb78547a9795 > > > > Upgrade went fine, and then i was watching iowait and network traffic. > > It seems that the network traffic went up after upgrade and > > reactivation of performance.quick-read. Here are some graphs: > > > > network client1: https://abload.de/img/network-clientfwj1m.png > > network client2: https://abload.de/img/network-client2trkow.png > > network server: https://abload.de/img/network-serverv3jjr.png > > > > gluster volume info: https://pastebin.com/ZMuJYXRZ > > > > Just wondering if the network traffic bug really got fixed or if this > > is a new problem. I'll wait a couple of minutes and then deactivate > > performance.quick-read again, just to see if network traffic goes down > > to normal levels. > > > > > > Best regards, > > Hubert > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From revirii at googlemail.com Tue Apr 16 07:43:47 2019 From: revirii at googlemail.com (Hu Bert) Date: Tue, 16 Apr 2019 09:43:47 +0200 Subject: [Gluster-users] Upgrade 5.5 -> 5.6: network traffic bug fixed? In-Reply-To: References: Message-ID: In my first test on my testing setup the traffic was on a normal level, so i thought i was "safe". But on my live system the network traffic was a multiple of the traffic one would expect. performance.quick-read was enabled in both, the only difference in the volume options between live and testing are: performance.read-ahead: testing on, live off performance.io-cache: testing on, live off I ran another test on my testing setup, deactivated both and copied 9 GB of data. Now the traffic went up as well, from before ~9-10 MBit/s up to 100 MBit/s with both options off. 
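(For reference, the toggles used in these copy tests are plain volume option changes; something along these lines, with <volname> standing in for the test volume:

  gluster volume set <volname> performance.quick-read on
  gluster volume set <volname> performance.read-ahead off
  gluster volume set <volname> performance.io-cache off
  gluster volume get <volname> all | grep -E 'quick-read|read-ahead|io-cache'

The last line just confirms the effective values before each 9 GB copy run.)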
Does performance.quick-read require one of those options set to 'on'? I'll start another test shortly, and activate on of those 2 options, maybe there's a connection between those 3 options? Best Regards, Hubert Am Di., 16. Apr. 2019 um 08:57 Uhr schrieb Poornima Gurusiddaiah : > > Thank you for reporting this. I had done testing on my local setup and the issue was resolved even with quick-read enabled. Let me test it again. > > Regards, > Poornima > > On Mon, Apr 15, 2019 at 12:25 PM Hu Bert wrote: >> >> fyi: after setting performance.quick-read to off network traffic >> dropped to normal levels, client load/iowait back to normal as well. >> >> client: https://abload.de/img/network-client-afterihjqi.png >> server: https://abload.de/img/network-server-afterwdkrl.png >> >> Am Mo., 15. Apr. 2019 um 08:33 Uhr schrieb Hu Bert : >> > >> > Good Morning, >> > >> > today i updated my replica 3 setup (debian stretch) from version 5.5 >> > to 5.6, as i thought the network traffic bug (#1673058) was fixed and >> > i could re-activate 'performance.quick-read' again. See release notes: >> > >> > https://review.gluster.org/#/c/glusterfs/+/22538/ >> > http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a2347780c2429284f57232f3aabb78547a9795 >> > >> > Upgrade went fine, and then i was watching iowait and network traffic. >> > It seems that the network traffic went up after upgrade and >> > reactivation of performance.quick-read. Here are some graphs: >> > >> > network client1: https://abload.de/img/network-clientfwj1m.png >> > network client2: https://abload.de/img/network-client2trkow.png >> > network server: https://abload.de/img/network-serverv3jjr.png >> > >> > gluster volume info: https://pastebin.com/ZMuJYXRZ >> > >> > Just wondering if the network traffic bug really got fixed or if this >> > is a new problem. I'll wait a couple of minutes and then deactivate >> > performance.quick-read again, just to see if network traffic goes down >> > to normal levels. >> > >> > >> > Best regards, >> > Hubert >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users From bgoldowsky at cast.org Tue Apr 16 11:50:41 2019 From: bgoldowsky at cast.org (Boris Goldowsky) Date: Tue, 16 Apr 2019 11:50:41 +0000 Subject: [Gluster-users] Volume stuck unable to add a brick In-Reply-To: References: <02CC8632-9B90-43F6-89FA-1160CE529667@contoso.com> <52179D48-7CF0-405E-805F-C5DCDF5B12CB@cast.org> Message-ID: OK, log files attached. Boris From: Karthik Subrahmanya Date: Tuesday, April 16, 2019 at 2:52 AM To: Atin Mukherjee , Boris Goldowsky Cc: Gluster-users Subject: Re: [Gluster-users] Volume stuck unable to add a brick On Mon, Apr 15, 2019 at 9:43 PM Atin Mukherjee > wrote: +Karthik Subrahmanya Didn't we we fix this problem recently? Failed to set extended attribute indicates that temp mount is failing and we don't have quorum number of bricks up. We had two fixes which handles two kind of add-brick scenarios. [1] Fails add-brick when increasing the replica count if any of the brick is down to avoid data loss. This can be overridden by using the force option. [2] Allow add-brick to set the extended attributes by the temp mount if the volume is already mounted (has clients). They are in version 3.12.2 so, patch [1] is present there. But since they are using the force option it should not have any problem even if they have any brick down. 
The error message they are getting is also different, so it is not because of any brick being down I guess. Patch [2] is not present in 3.12.2 and it is not the conversion from plain distribute to replicate volume. So the scenario is different here. It seems like they are hitting some other issue. @Boris, Can you attach the add-brick's temp mount log. The file name should look something like "dockervols-add-brick-mount.log". Can you also provide all the brick logs of that volume during that time. [1] https://review.gluster.org/#/c/glusterfs/+/16330/ [2] https://review.gluster.org/#/c/glusterfs/+/21791/ Regards, Karthik Boris - What's the gluster version are you using? On Mon, Apr 15, 2019 at 7:35 PM Boris Goldowsky > wrote: Atin, thank you for the reply. Here are all of those pieces of information: [bgoldowsky at webserver9 ~]$ gluster --version glusterfs 3.12.2 (same on all nodes) [bgoldowsky at webserver9 ~]$ sudo gluster peer status Number of Peers: 3 Hostname: webserver11.cast.org Uuid: c2b147fd-cab4-4859-9922-db5730f8549d State: Peer in Cluster (Connected) Hostname: webserver1.cast.org Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c State: Peer in Cluster (Connected) Other names: 192.168.200.131 webserver1 Hostname: webserver8.cast.org Uuid: be2f568b-61c5-4016-9264-083e4e6453a2 State: Peer in Cluster (Connected) Other names: webserver8 [bgoldowsky at webserver1 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options Reconfigured: nfs.disable: on transport.address-family: inet auth.allow: 127.0.0.1 Volume Name: testvol Type: Replicate Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: transport.address-family: inet nfs.disable: on [bgoldowsky at webserver8 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options Reconfigured: nfs.disable: on transport.address-family: inet auth.allow: 127.0.0.1 Volume Name: testvol Type: Replicate Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: nfs.disable: on transport.address-family: inet [bgoldowsky at webserver9 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options Reconfigured: nfs.disable: on transport.address-family: inet auth.allow: 127.0.0.1 Volume Name: testvol 
Type: Replicate Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: nfs.disable: on transport.address-family: inet [bgoldowsky at webserver11 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options Reconfigured: auth.allow: 127.0.0.1 transport.address-family: inet nfs.disable: on Volume Name: testvol Type: Replicate Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: transport.address-family: inet nfs.disable: on [bgoldowsky at webserver9 ~]$ sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force volume add-brick: failed: Commit failed on webserver8.cast.org. Please check log file for details. Webserver8 glusterd.log: [2019-04-15 13:55:42.338197] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-04-15 13:55:42.338197] and [2019-04-15 13:55:42.341618] [2019-04-15 14:00:20.445011] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) [0x7fe697764215] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) [0x7fe69780de9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fe6a2d16ea5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=dockervols --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd [2019-04-15 14:00:20.445148] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 4 [2019-04-15 14:00:20.445184] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it [2019-04-15 14:00:20.672347] E [MSGID: 106054] [glusterd-utils.c:13863:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected] [2019-04-15 14:00:20.693491] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /tmp/mntmvdFGq [Transport endpoint is not connected] [2019-04-15 14:00:20.693597] E [MSGID: 106074] [glusterd-brick-ops.c:2590:glusterd_op_add_brick] 0-glusterd: Unable to add bricks [2019-04-15 14:00:20.693637] E [MSGID: 106123] [glusterd-mgmt.c:312:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed. 
[2019-04-15 14:00:20.693667] E [MSGID: 106123] [glusterd-mgmt-handler.c:616:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick Webserver11 log file: [2019-04-15 13:56:29.563270] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-04-15 13:56:29.563270] and [2019-04-15 13:56:29.566209] [2019-04-15 14:00:33.996866] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) [0x7f36de924215] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) [0x7f36de9cde9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f36e9ed6ea5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=dockervols --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd [2019-04-15 14:00:33.996979] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 4 [2019-04-15 14:00:33.997004] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it [2019-04-15 14:00:34.013789] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: nfs already stopped [2019-04-15 14:00:34.013849] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is stopped [2019-04-15 14:00:34.017535] I [MSGID: 106568] [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping glustershd daemon running in pid: 6087 [2019-04-15 14:00:35.018783] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: glustershd service is stopped [2019-04-15 14:00:35.018952] I [MSGID: 106567] [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting glustershd service [2019-04-15 14:00:35.028306] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: bitd already stopped [2019-04-15 14:00:35.028408] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: bitd service is stopped [2019-04-15 14:00:35.028601] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: scrub already stopped [2019-04-15 14:00:35.028645] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: scrub service is stopped Thank you for taking a look! Boris From: Atin Mukherjee > Date: Friday, April 12, 2019 at 1:10 PM To: Boris Goldowsky > Cc: Gluster-users > Subject: Re: [Gluster-users] Volume stuck unable to add a brick On Fri, 12 Apr 2019 at 22:32, Boris Goldowsky > wrote: I?ve got a replicated volume with three bricks (?1x3=3?), the idea is to have a common set of files that are locally available on all the machines (Scientific Linux 7, which is essentially CentOS 7) in a cluster. I tried to add on a fourth machine, so used a command like this: sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force but the result is: volume add-brick: failed: Commit failed on webserver1. Please check log file for details. Commit failed on webserver8. Please check log file for details. Commit failed on webserver11. Please check log file for details. Tried: removing the new brick (this also fails) and trying again. Tried: checking the logs. The log files are not enlightening to me ? I don?t know what?s normal and what?s not. 
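(One way to narrow the logs down, assuming a default install where the management daemon writes to /var/log/glusterfs/glusterd.log: error-level entries carry a standalone " E " severity flag, so a filter such as

  sudo grep ' E ' /var/log/glusterfs/glusterd.log | tail -n 20

surfaces just the most recent failures, for example the "Failed to set extended attribute" and "Add-brick commit failed" lines quoted earlier in this thread.)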
From webserver8 & webserver11 could you attach glusterd log files? Also please share following: - gluster version? (gluster ?version) - Output of ?gluster peer status? - Output of ?gluster v info? from all 4 nodes. Tried: deleting the brick directory from previous attempt, so that it?s not in the way. Tried: restarting gluster services Tried: rebooting Tried: setting up a new volume, replicated to all four machines. This works, so I?m assuming it?s not a networking issue. But still fails with this existing volume that has the critical data in it. Running out of ideas. Any suggestions? Thank you! Boris _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -- --Atin -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: data-gluster-dockervols.log-webserver1 Type: application/octet-stream Size: 2853 bytes Desc: data-gluster-dockervols.log-webserver1 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: data-gluster-dockervols.log-webserver11 Type: application/octet-stream Size: 2853 bytes Desc: data-gluster-dockervols.log-webserver11 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: data-gluster-dockervols.log-webserver9 Type: application/octet-stream Size: 3827 bytes Desc: data-gluster-dockervols.log-webserver9 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dockervols-add-brick-mount.log Type: application/octet-stream Size: 7389 bytes Desc: dockervols-add-brick-mount.log URL: From ksubrahm at redhat.com Tue Apr 16 12:19:49 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Tue, 16 Apr 2019 17:49:49 +0530 Subject: [Gluster-users] Volume stuck unable to add a brick In-Reply-To: References: <02CC8632-9B90-43F6-89FA-1160CE529667@contoso.com> <52179D48-7CF0-405E-805F-C5DCDF5B12CB@cast.org> Message-ID: Hi Boris, Thank you for providing the logs. The problem here is because of the "auth.allow: 127.0.0.1" setting on the volume. When you try to add a new brick to the volume internally replication module will try to set some metadata on the existing bricks to mark pending heal on the new brick, by creating a temporary mount. Because of the auth.allow setting that mount gets permission errors as seen in the below logs, leading to add-brick failure. 
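(In other words: auth.allow is limited to 127.0.0.1, while the temporary add-brick mount reaches the bricks over the peers' LAN addresses, which the "received addr" entries in the excerpts below make visible. The effective setting can be confirmed with the volume-get interface, using the volume name from this thread:

  sudo gluster volume get dockervols auth.allow

which should print auth.allow and its current value.)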
>From data-gluster-dockervols.log-webserver9 : [2019-04-15 14:00:34.226838] I [addr.c:55:compare_addr_and_update] 0-/data/gluster/dockervols: allowed = "127.0.0.1", received addr = "192.168.200.147" [2019-04-15 14:00:34.226895] E [MSGID: 115004] [authenticate.c:224:gf_authenticate] 0-auth: no authentication module is interested in accepting remote-client (null) [2019-04-15 14:00:34.227129] E [MSGID: 115001] [server-handshake.c:848:server_setvolume] 0-dockervols-server: Cannot authenticate client from webserver8.cast.org-55674-2019/04/15-14:00:20:495333-dockervols-client-2-0-0 3.12.2 [Permission denied] >From dockervols-add-brick-mount.log : [2019-04-15 14:00:20.672033] W [MSGID: 114043] [client-handshake.c:1109:client_setvolume_cbk] 0-dockervols-client-2: failed to set the volume [Permission denied] [2019-04-15 14:00:20.672102] W [MSGID: 114007] [client-handshake.c:1138:client_setvolume_cbk] 0-dockervols-client-2: failed to get 'process-uuid' from reply dict [Invalid argument] [2019-04-15 14:00:20.672129] E [MSGID: 114044] [client-handshake.c:1144:client_setvolume_cbk] 0-dockervols-client-2: SETVOLUME on remote-host failed: Authentication failed [Permission denied] [2019-04-15 14:00:20.672151] I [MSGID: 114049] [client-handshake.c:1258:client_setvolume_cbk] 0-dockervols-client-2: sending AUTH_FAILED event This is a known issue and we are planning to fix this. For the time being we have a workaround for this. - Before you try adding the brick set the auth.allow option to default i.e., "*" or you can do this by running "gluster v reset auth.allow" - Add the brick - After it succeeds set back the auth.allow option to the previous value. Regards, Karthik On Tue, Apr 16, 2019 at 5:20 PM Boris Goldowsky wrote: > OK, log files attached. > > > > Boris > > > > > > *From: *Karthik Subrahmanya > *Date: *Tuesday, April 16, 2019 at 2:52 AM > *To: *Atin Mukherjee , Boris Goldowsky < > bgoldowsky at cast.org> > *Cc: *Gluster-users > *Subject: *Re: [Gluster-users] Volume stuck unable to add a brick > > > > > > > > On Mon, Apr 15, 2019 at 9:43 PM Atin Mukherjee > wrote: > > +Karthik Subrahmanya > > > > Didn't we we fix this problem recently? Failed to set extended attribute > indicates that temp mount is failing and we don't have quorum number of > bricks up. > > > > We had two fixes which handles two kind of add-brick scenarios. > > [1] Fails add-brick when increasing the replica count if any of the brick > is down to avoid data loss. This can be overridden by using the force > option. > > [2] Allow add-brick to set the extended attributes by the temp mount if > the volume is already mounted (has clients). > > > > They are in version 3.12.2 so, patch [1] is present there. But since they > are using the force option it should not have any problem even if they have > any brick down. The error message they are getting is also different, so it > is not because of any brick being down I guess. > > Patch [2] is not present in 3.12.2 and it is not the conversion from plain > distribute to replicate volume. So the scenario is different here. > > It seems like they are hitting some other issue. > > > > @Boris, > > Can you attach the add-brick's temp mount log. The file name should look > something like "dockervols-add-brick-mount.log". Can you also provide all > the brick logs of that volume during that time. > > > > [1] https://review.gluster.org/#/c/glusterfs/+/16330/ > > [2] https://review.gluster.org/#/c/glusterfs/+/21791/ > > > > Regards, > > Karthik > > > > Boris - What's the gluster version are you using? 
> > > > > > > > On Mon, Apr 15, 2019 at 7:35 PM Boris Goldowsky > wrote: > > Atin, thank you for the reply. Here are all of those pieces of > information: > > > > [bgoldowsky at webserver9 ~]$ gluster --version > > glusterfs 3.12.2 > > (same on all nodes) > > > > [bgoldowsky at webserver9 ~]$ sudo gluster peer status > > Number of Peers: 3 > > > > Hostname: webserver11.cast.org > > Uuid: c2b147fd-cab4-4859-9922-db5730f8549d > > State: Peer in Cluster (Connected) > > > > Hostname: webserver1.cast.org > > Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c > > State: Peer in Cluster (Connected) > > Other names: > > 192.168.200.131 > > webserver1 > > > > Hostname: webserver8.cast.org > > Uuid: be2f568b-61c5-4016-9264-083e4e6453a2 > > State: Peer in Cluster (Connected) > > Other names: > > webserver8 > > > > [bgoldowsky at webserver1 ~]$ sudo gluster v info > > Volume Name: dockervols > > Type: Replicate > > Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/dockervols > > Brick2: webserver11:/data/gluster/dockervols > > Brick3: webserver9:/data/gluster/dockervols > > Options Reconfigured: > > nfs.disable: on > > transport.address-family: inet > > auth.allow: 127.0.0.1 > > > > Volume Name: testvol > > Type: Replicate > > Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 4 = 4 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/testvol > > Brick2: webserver9:/data/gluster/testvol > > Brick3: webserver11:/data/gluster/testvol > > Brick4: webserver8:/data/gluster/testvol > > Options Reconfigured: > > transport.address-family: inet > > nfs.disable: on > > > > [bgoldowsky at webserver8 ~]$ sudo gluster v info > > Volume Name: dockervols > > Type: Replicate > > Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/dockervols > > Brick2: webserver11:/data/gluster/dockervols > > Brick3: webserver9:/data/gluster/dockervols > > Options Reconfigured: > > nfs.disable: on > > transport.address-family: inet > > auth.allow: 127.0.0.1 > > > > Volume Name: testvol > > Type: Replicate > > Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 4 = 4 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/testvol > > Brick2: webserver9:/data/gluster/testvol > > Brick3: webserver11:/data/gluster/testvol > > Brick4: webserver8:/data/gluster/testvol > > Options Reconfigured: > > nfs.disable: on > > transport.address-family: inet > > > > [bgoldowsky at webserver9 ~]$ sudo gluster v info > > Volume Name: dockervols > > Type: Replicate > > Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/dockervols > > Brick2: webserver11:/data/gluster/dockervols > > Brick3: webserver9:/data/gluster/dockervols > > Options Reconfigured: > > nfs.disable: on > > transport.address-family: inet > > auth.allow: 127.0.0.1 > > > > Volume Name: testvol > > Type: Replicate > > Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 4 = 4 > > Transport-type: tcp > > Bricks: > > Brick1: 
webserver1:/data/gluster/testvol > > Brick2: webserver9:/data/gluster/testvol > > Brick3: webserver11:/data/gluster/testvol > > Brick4: webserver8:/data/gluster/testvol > > Options Reconfigured: > > nfs.disable: on > > transport.address-family: inet > > > > [bgoldowsky at webserver11 ~]$ sudo gluster v info > > Volume Name: dockervols > > Type: Replicate > > Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/dockervols > > Brick2: webserver11:/data/gluster/dockervols > > Brick3: webserver9:/data/gluster/dockervols > > Options Reconfigured: > > auth.allow: 127.0.0.1 > > transport.address-family: inet > > nfs.disable: on > > > > Volume Name: testvol > > Type: Replicate > > Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 4 = 4 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/testvol > > Brick2: webserver9:/data/gluster/testvol > > Brick3: webserver11:/data/gluster/testvol > > Brick4: webserver8:/data/gluster/testvol > > Options Reconfigured: > > transport.address-family: inet > > nfs.disable: on > > > > [bgoldowsky at webserver9 ~]$ sudo gluster volume add-brick dockervols > replica 4 webserver8:/data/gluster/dockervols force > > volume add-brick: failed: Commit failed on webserver8.cast.org. Please > check log file for details. > > > > Webserver8 glusterd.log: > > > > [2019-04-15 13:55:42.338197] I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req > > The message "I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req" repeated 2 times between [2019-04-15 13:55:42.338197] > and [2019-04-15 13:55:42.341618] > > [2019-04-15 14:00:20.445011] I [run.c:190:runner_log] > (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) > [0x7fe697764215] > -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) > [0x7fe69780de9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7fe6a2d16ea5] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh > --volname=dockervols --version=1 --volume-op=add-brick > --gd-workdir=/var/lib/glusterd > > [2019-04-15 14:00:20.445148] I [MSGID: 106578] > [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: > replica-count is set 4 > > [2019-04-15 14:00:20.445184] I [MSGID: 106578] > [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: > type is set 0, need to change it > > [2019-04-15 14:00:20.672347] E [MSGID: 106054] > [glusterd-utils.c:13863:glusterd_handle_replicate_brick_ops] 0-management: > Failed to set extended attribute trusted.add-brick : Transport endpoint is > not connected [Transport endpoint is not connected] > > [2019-04-15 14:00:20.693491] E [MSGID: 101042] > [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /tmp/mntmvdFGq > [Transport endpoint is not connected] > > [2019-04-15 14:00:20.693597] E [MSGID: 106074] > [glusterd-brick-ops.c:2590:glusterd_op_add_brick] 0-glusterd: Unable to add > bricks > > [2019-04-15 14:00:20.693637] E [MSGID: 106123] > [glusterd-mgmt.c:312:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit > failed. 
> > [2019-04-15 14:00:20.693667] E [MSGID: 106123] > [glusterd-mgmt-handler.c:616:glusterd_handle_commit_fn] 0-management: > commit failed on operation Add brick > > > > Webserver11 log file: > > > > [2019-04-15 13:56:29.563270] I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req > > The message "I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req" repeated 2 times between [2019-04-15 13:56:29.563270] > and [2019-04-15 13:56:29.566209] > > [2019-04-15 14:00:33.996866] I [run.c:190:runner_log] > (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) > [0x7f36de924215] > -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) > [0x7f36de9cde9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7f36e9ed6ea5] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh > --volname=dockervols --version=1 --volume-op=add-brick > --gd-workdir=/var/lib/glusterd > > [2019-04-15 14:00:33.996979] I [MSGID: 106578] > [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: > replica-count is set 4 > > [2019-04-15 14:00:33.997004] I [MSGID: 106578] > [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: > type is set 0, need to change it > > [2019-04-15 14:00:34.013789] I [MSGID: 106132] > [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: nfs already > stopped > > [2019-04-15 14:00:34.013849] I [MSGID: 106568] > [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is > stopped > > [2019-04-15 14:00:34.017535] I [MSGID: 106568] > [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping > glustershd daemon running in pid: 6087 > > [2019-04-15 14:00:35.018783] I [MSGID: 106568] > [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: glustershd > service is stopped > > [2019-04-15 14:00:35.018952] I [MSGID: 106567] > [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting > glustershd service > > [2019-04-15 14:00:35.028306] I [MSGID: 106132] > [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: bitd already > stopped > > [2019-04-15 14:00:35.028408] I [MSGID: 106568] > [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: bitd service is > stopped > > [2019-04-15 14:00:35.028601] I [MSGID: 106132] > [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: scrub already > stopped > > [2019-04-15 14:00:35.028645] I [MSGID: 106568] > [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: scrub service is > stopped > > > > Thank you for taking a look! > > > > Boris > > > > > > *From: *Atin Mukherjee > *Date: *Friday, April 12, 2019 at 1:10 PM > *To: *Boris Goldowsky > *Cc: *Gluster-users > *Subject: *Re: [Gluster-users] Volume stuck unable to add a brick > > > > > > > > On Fri, 12 Apr 2019 at 22:32, Boris Goldowsky wrote: > > I?ve got a replicated volume with three bricks (?1x3=3?), the idea is to > have a common set of files that are locally available on all the machines > (Scientific Linux 7, which is essentially CentOS 7) in a cluster. > > > > I tried to add on a fourth machine, so used a command like this: > > > > sudo gluster volume add-brick dockervols replica 4 > webserver8:/data/gluster/dockervols force > > > > but the result is: > > volume add-brick: failed: Commit failed on webserver1. Please check log > file for details. > > Commit failed on webserver8. Please check log file for details. 
> > Commit failed on webserver11. Please check log file for details. > > > > Tried: removing the new brick (this also fails) and trying again. > > Tried: checking the logs. The log files are not enlightening to me ? I > don?t know what?s normal and what?s not. > > > > From webserver8 & webserver11 could you attach glusterd log files? > > > > Also please share following: > > - gluster version? (gluster ?version) > > - Output of ?gluster peer status? > > - Output of ?gluster v info? from all 4 nodes. > > > > Tried: deleting the brick directory from previous attempt, so that it?s > not in the way. > > Tried: restarting gluster services > > Tried: rebooting > > Tried: setting up a new volume, replicated to all four machines. This > works, so I?m assuming it?s not a networking issue. But still fails with > this existing volume that has the critical data in it. > > > > Running out of ideas. Any suggestions? Thank you! > > > > Boris > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > -- > > --Atin > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From revirii at googlemail.com Tue Apr 16 12:54:49 2019 From: revirii at googlemail.com (Hu Bert) Date: Tue, 16 Apr 2019 14:54:49 +0200 Subject: [Gluster-users] Upgrade 5.5 -> 5.6: network traffic bug fixed? In-Reply-To: References: Message-ID: Hi Poornima, thx for your efforts. I made a couple of tests and the results are the same, so the options are not related. Anyway, i'm not able to reproduce the problem on my testing system, although the volume options are the same. About 1.5 hours ago i set performance.quick-read to on again and watched: load/iowait went up (not bad at the moment, little traffic), but network traffic went up - from <20 MBit/s up to 160 MBit/s. After deactivating quick-read traffic dropped to < 20 MBit/s again. munin graph: https://abload.de/img/network-client4s0kle.png The 2nd peak is from the last test. Thx, Hubert Am Di., 16. Apr. 2019 um 09:43 Uhr schrieb Hu Bert : > > In my first test on my testing setup the traffic was on a normal > level, so i thought i was "safe". But on my live system the network > traffic was a multiple of the traffic one would expect. > performance.quick-read was enabled in both, the only difference in the > volume options between live and testing are: > > performance.read-ahead: testing on, live off > performance.io-cache: testing on, live off > > I ran another test on my testing setup, deactivated both and copied 9 > GB of data. Now the traffic went up as well, from before ~9-10 MBit/s > up to 100 MBit/s with both options off. Does performance.quick-read > require one of those options set to 'on'? > > I'll start another test shortly, and activate on of those 2 options, > maybe there's a connection between those 3 options? > > > Best Regards, > Hubert > > Am Di., 16. Apr. 2019 um 08:57 Uhr schrieb Poornima Gurusiddaiah > : > > > > Thank you for reporting this. I had done testing on my local setup and the issue was resolved even with quick-read enabled. Let me test it again. > > > > Regards, > > Poornima > > > > On Mon, Apr 15, 2019 at 12:25 PM Hu Bert wrote: > >> > >> fyi: after setting performance.quick-read to off network traffic > >> dropped to normal levels, client load/iowait back to normal as well. 
> >> > >> client: https://abload.de/img/network-client-afterihjqi.png > >> server: https://abload.de/img/network-server-afterwdkrl.png > >> > >> Am Mo., 15. Apr. 2019 um 08:33 Uhr schrieb Hu Bert : > >> > > >> > Good Morning, > >> > > >> > today i updated my replica 3 setup (debian stretch) from version 5.5 > >> > to 5.6, as i thought the network traffic bug (#1673058) was fixed and > >> > i could re-activate 'performance.quick-read' again. See release notes: > >> > > >> > https://review.gluster.org/#/c/glusterfs/+/22538/ > >> > http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a2347780c2429284f57232f3aabb78547a9795 > >> > > >> > Upgrade went fine, and then i was watching iowait and network traffic. > >> > It seems that the network traffic went up after upgrade and > >> > reactivation of performance.quick-read. Here are some graphs: > >> > > >> > network client1: https://abload.de/img/network-clientfwj1m.png > >> > network client2: https://abload.de/img/network-client2trkow.png > >> > network server: https://abload.de/img/network-serverv3jjr.png > >> > > >> > gluster volume info: https://pastebin.com/ZMuJYXRZ > >> > > >> > Just wondering if the network traffic bug really got fixed or if this > >> > is a new problem. I'll wait a couple of minutes and then deactivate > >> > performance.quick-read again, just to see if network traffic goes down > >> > to normal levels. > >> > > >> > > >> > Best regards, > >> > Hubert > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users From avishwan at redhat.com Tue Apr 16 13:09:22 2019 From: avishwan at redhat.com (Aravinda) Date: Tue, 16 Apr 2019 18:39:22 +0530 Subject: [Gluster-users] Reg: Gluster In-Reply-To: References: Message-ID: <2c44fb17b7ce644ede6a5f75a14398a79cfa742d.camel@redhat.com> On Tue, 2019-04-16 at 11:27 +0530, Poornima Gurusiddaiah wrote: > +Sunny > > On Wed, Apr 10, 2019, 9:02 PM Gomathi Nayagam < > gomathinayagam08 at gmail.com> wrote: > > Hi User, > > > > We are testing geo-replication of gluster it is > > taking nearly 8 mins to transfer 16 GB size of data between the DCs > > while when transferred the same data over plain rsync it took only > > 2 mins. Can we know if we are missing something? > > > > > > > > > > Thanks & Regards, > > Gomathi Nayagam.D > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users Geo-replication does many things to keep information about synced data and track the new changes happened in the master Volume. Geo- replication shines better when doing incremental sync that is when new data is created or existing data is modified in Master volume. Are you observing slowness even during incremental sync? (Current time - Last Synced time in status output shows how much Slave Volume is lagging compared to Master Volume) -- regards Aravinda From ksubrahm at redhat.com Tue Apr 16 15:26:00 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Tue, 16 Apr 2019 20:56:00 +0530 Subject: [Gluster-users] Volume stuck unable to add a brick In-Reply-To: References: <02CC8632-9B90-43F6-89FA-1160CE529667@contoso.com> <52179D48-7CF0-405E-805F-C5DCDF5B12CB@cast.org> Message-ID: You're welcome! 
On Tue 16 Apr, 2019, 7:12 PM Boris Goldowsky, wrote: > That worked! Thank you SO much! > > > > Boris > > > > > > *From: *Karthik Subrahmanya > *Date: *Tuesday, April 16, 2019 at 8:20 AM > *To: *Boris Goldowsky > *Cc: *Atin Mukherjee , Gluster-users < > gluster-users at gluster.org> > *Subject: *Re: [Gluster-users] Volume stuck unable to add a brick > > > > Hi Boris, > > > > Thank you for providing the logs. > > The problem here is because of the "auth.allow: 127.0.0.1" setting on the > volume. > > When you try to add a new brick to the volume internally replication > module will try to set some metadata on the existing bricks to mark pending > heal on the new brick, by creating a temporary mount. Because of the > auth.allow setting that mount gets permission errors as seen in the below > logs, leading to add-brick failure. > > > > From data-gluster-dockervols.log-webserver9 : > > [2019-04-15 14:00:34.226838] I [addr.c:55:compare_addr_and_update] > 0-/data/gluster/dockervols: allowed = "127.0.0.1", received addr = > "192.168.200.147" > > [2019-04-15 14:00:34.226895] E [MSGID: 115004] > [authenticate.c:224:gf_authenticate] 0-auth: no authentication module is > interested in accepting remote-client (null) > > [2019-04-15 14:00:34.227129] E [MSGID: 115001] > [server-handshake.c:848:server_setvolume] 0-dockervols-server: Cannot > authenticate client from > webserver8.cast.org-55674-2019/04/15-14:00:20:495333-dockervols-client-2-0-0 > 3.12.2 [Permission denied] > > > > From dockervols-add-brick-mount.log : > > [2019-04-15 14:00:20.672033] W [MSGID: 114043] > [client-handshake.c:1109:client_setvolume_cbk] 0-dockervols-client-2: > failed to set the volume [Permission denied] > > [2019-04-15 14:00:20.672102] W [MSGID: 114007] > [client-handshake.c:1138:client_setvolume_cbk] 0-dockervols-client-2: > failed to get 'process-uuid' from reply dict [Invalid argument] > > [2019-04-15 14:00:20.672129] E [MSGID: 114044] > [client-handshake.c:1144:client_setvolume_cbk] 0-dockervols-client-2: > SETVOLUME on remote-host failed: Authentication failed [Permission denied] > > [2019-04-15 14:00:20.672151] I [MSGID: 114049] > [client-handshake.c:1258:client_setvolume_cbk] 0-dockervols-client-2: > sending AUTH_FAILED event > > > > This is a known issue and we are planning to fix this. For the time being > we have a workaround for this. > > - Before you try adding the brick set the auth.allow option to default > i.e., "*" or you can do this by running "gluster v reset > auth.allow" > > - Add the brick > > - After it succeeds set back the auth.allow option to the previous value. > > > > Regards, > > Karthik > > > > On Tue, Apr 16, 2019 at 5:20 PM Boris Goldowsky > wrote: > > OK, log files attached. > > > > Boris > > > > > > *From: *Karthik Subrahmanya > *Date: *Tuesday, April 16, 2019 at 2:52 AM > *To: *Atin Mukherjee , Boris Goldowsky < > bgoldowsky at cast.org> > *Cc: *Gluster-users > *Subject: *Re: [Gluster-users] Volume stuck unable to add a brick > > > > > > > > On Mon, Apr 15, 2019 at 9:43 PM Atin Mukherjee > wrote: > > +Karthik Subrahmanya > > > > Didn't we we fix this problem recently? Failed to set extended attribute > indicates that temp mount is failing and we don't have quorum number of > bricks up. > > > > We had two fixes which handles two kind of add-brick scenarios. > > [1] Fails add-brick when increasing the replica count if any of the brick > is down to avoid data loss. This can be overridden by using the force > option. 
> > [2] Allow add-brick to set the extended attributes by the temp mount if > the volume is already mounted (has clients). > > > > They are in version 3.12.2 so, patch [1] is present there. But since they > are using the force option it should not have any problem even if they have > any brick down. The error message they are getting is also different, so it > is not because of any brick being down I guess. > > Patch [2] is not present in 3.12.2 and it is not the conversion from plain > distribute to replicate volume. So the scenario is different here. > > It seems like they are hitting some other issue. > > > > @Boris, > > Can you attach the add-brick's temp mount log. The file name should look > something like "dockervols-add-brick-mount.log". Can you also provide all > the brick logs of that volume during that time. > > > > [1] https://review.gluster.org/#/c/glusterfs/+/16330/ > > [2] https://review.gluster.org/#/c/glusterfs/+/21791/ > > > > Regards, > > Karthik > > > > Boris - What's the gluster version are you using? > > > > > > > > On Mon, Apr 15, 2019 at 7:35 PM Boris Goldowsky > wrote: > > Atin, thank you for the reply. Here are all of those pieces of > information: > > > > [bgoldowsky at webserver9 ~]$ gluster --version > > glusterfs 3.12.2 > > (same on all nodes) > > > > [bgoldowsky at webserver9 ~]$ sudo gluster peer status > > Number of Peers: 3 > > > > Hostname: webserver11.cast.org > > Uuid: c2b147fd-cab4-4859-9922-db5730f8549d > > State: Peer in Cluster (Connected) > > > > Hostname: webserver1.cast.org > > Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c > > State: Peer in Cluster (Connected) > > Other names: > > 192.168.200.131 > > webserver1 > > > > Hostname: webserver8.cast.org > > Uuid: be2f568b-61c5-4016-9264-083e4e6453a2 > > State: Peer in Cluster (Connected) > > Other names: > > webserver8 > > > > [bgoldowsky at webserver1 ~]$ sudo gluster v info > > Volume Name: dockervols > > Type: Replicate > > Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/dockervols > > Brick2: webserver11:/data/gluster/dockervols > > Brick3: webserver9:/data/gluster/dockervols > > Options Reconfigured: > > nfs.disable: on > > transport.address-family: inet > > auth.allow: 127.0.0.1 > > > > Volume Name: testvol > > Type: Replicate > > Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 4 = 4 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/testvol > > Brick2: webserver9:/data/gluster/testvol > > Brick3: webserver11:/data/gluster/testvol > > Brick4: webserver8:/data/gluster/testvol > > Options Reconfigured: > > transport.address-family: inet > > nfs.disable: on > > > > [bgoldowsky at webserver8 ~]$ sudo gluster v info > > Volume Name: dockervols > > Type: Replicate > > Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/dockervols > > Brick2: webserver11:/data/gluster/dockervols > > Brick3: webserver9:/data/gluster/dockervols > > Options Reconfigured: > > nfs.disable: on > > transport.address-family: inet > > auth.allow: 127.0.0.1 > > > > Volume Name: testvol > > Type: Replicate > > Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 4 = 4 > > Transport-type: tcp > > 
Bricks: > > Brick1: webserver1:/data/gluster/testvol > > Brick2: webserver9:/data/gluster/testvol > > Brick3: webserver11:/data/gluster/testvol > > Brick4: webserver8:/data/gluster/testvol > > Options Reconfigured: > > nfs.disable: on > > transport.address-family: inet > > > > [bgoldowsky at webserver9 ~]$ sudo gluster v info > > Volume Name: dockervols > > Type: Replicate > > Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/dockervols > > Brick2: webserver11:/data/gluster/dockervols > > Brick3: webserver9:/data/gluster/dockervols > > Options Reconfigured: > > nfs.disable: on > > transport.address-family: inet > > auth.allow: 127.0.0.1 > > > > Volume Name: testvol > > Type: Replicate > > Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 4 = 4 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/testvol > > Brick2: webserver9:/data/gluster/testvol > > Brick3: webserver11:/data/gluster/testvol > > Brick4: webserver8:/data/gluster/testvol > > Options Reconfigured: > > nfs.disable: on > > transport.address-family: inet > > > > [bgoldowsky at webserver11 ~]$ sudo gluster v info > > Volume Name: dockervols > > Type: Replicate > > Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/dockervols > > Brick2: webserver11:/data/gluster/dockervols > > Brick3: webserver9:/data/gluster/dockervols > > Options Reconfigured: > > auth.allow: 127.0.0.1 > > transport.address-family: inet > > nfs.disable: on > > > > Volume Name: testvol > > Type: Replicate > > Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 4 = 4 > > Transport-type: tcp > > Bricks: > > Brick1: webserver1:/data/gluster/testvol > > Brick2: webserver9:/data/gluster/testvol > > Brick3: webserver11:/data/gluster/testvol > > Brick4: webserver8:/data/gluster/testvol > > Options Reconfigured: > > transport.address-family: inet > > nfs.disable: on > > > > [bgoldowsky at webserver9 ~]$ sudo gluster volume add-brick dockervols > replica 4 webserver8:/data/gluster/dockervols force > > volume add-brick: failed: Commit failed on webserver8.cast.org. Please > check log file for details. 
> > > > Webserver8 glusterd.log: > > > > [2019-04-15 13:55:42.338197] I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req > > The message "I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req" repeated 2 times between [2019-04-15 13:55:42.338197] > and [2019-04-15 13:55:42.341618] > > [2019-04-15 14:00:20.445011] I [run.c:190:runner_log] > (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) > [0x7fe697764215] > -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) > [0x7fe69780de9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7fe6a2d16ea5] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh > --volname=dockervols --version=1 --volume-op=add-brick > --gd-workdir=/var/lib/glusterd > > [2019-04-15 14:00:20.445148] I [MSGID: 106578] > [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: > replica-count is set 4 > > [2019-04-15 14:00:20.445184] I [MSGID: 106578] > [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: > type is set 0, need to change it > > [2019-04-15 14:00:20.672347] E [MSGID: 106054] > [glusterd-utils.c:13863:glusterd_handle_replicate_brick_ops] 0-management: > Failed to set extended attribute trusted.add-brick : Transport endpoint is > not connected [Transport endpoint is not connected] > > [2019-04-15 14:00:20.693491] E [MSGID: 101042] > [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /tmp/mntmvdFGq > [Transport endpoint is not connected] > > [2019-04-15 14:00:20.693597] E [MSGID: 106074] > [glusterd-brick-ops.c:2590:glusterd_op_add_brick] 0-glusterd: Unable to add > bricks > > [2019-04-15 14:00:20.693637] E [MSGID: 106123] > [glusterd-mgmt.c:312:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit > failed. 
> > [2019-04-15 14:00:20.693667] E [MSGID: 106123] > [glusterd-mgmt-handler.c:616:glusterd_handle_commit_fn] 0-management: > commit failed on operation Add brick > > > > Webserver11 log file: > > > > [2019-04-15 13:56:29.563270] I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req > > The message "I [MSGID: 106488] > [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req" repeated 2 times between [2019-04-15 13:56:29.563270] > and [2019-04-15 13:56:29.566209] > > [2019-04-15 14:00:33.996866] I [run.c:190:runner_log] > (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) > [0x7f36de924215] > -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) > [0x7f36de9cde9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7f36e9ed6ea5] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh > --volname=dockervols --version=1 --volume-op=add-brick > --gd-workdir=/var/lib/glusterd > > [2019-04-15 14:00:33.996979] I [MSGID: 106578] > [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: > replica-count is set 4 > > [2019-04-15 14:00:33.997004] I [MSGID: 106578] > [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: > type is set 0, need to change it > > [2019-04-15 14:00:34.013789] I [MSGID: 106132] > [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: nfs already > stopped > > [2019-04-15 14:00:34.013849] I [MSGID: 106568] > [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is > stopped > > [2019-04-15 14:00:34.017535] I [MSGID: 106568] > [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping > glustershd daemon running in pid: 6087 > > [2019-04-15 14:00:35.018783] I [MSGID: 106568] > [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: glustershd > service is stopped > > [2019-04-15 14:00:35.018952] I [MSGID: 106567] > [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting > glustershd service > > [2019-04-15 14:00:35.028306] I [MSGID: 106132] > [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: bitd already > stopped > > [2019-04-15 14:00:35.028408] I [MSGID: 106568] > [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: bitd service is > stopped > > [2019-04-15 14:00:35.028601] I [MSGID: 106132] > [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: scrub already > stopped > > [2019-04-15 14:00:35.028645] I [MSGID: 106568] > [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: scrub service is > stopped > > > > Thank you for taking a look! > > > > Boris > > > > > > *From: *Atin Mukherjee > *Date: *Friday, April 12, 2019 at 1:10 PM > *To: *Boris Goldowsky > *Cc: *Gluster-users > *Subject: *Re: [Gluster-users] Volume stuck unable to add a brick > > > > > > > > On Fri, 12 Apr 2019 at 22:32, Boris Goldowsky wrote: > > I?ve got a replicated volume with three bricks (?1x3=3?), the idea is to > have a common set of files that are locally available on all the machines > (Scientific Linux 7, which is essentially CentOS 7) in a cluster. > > > > I tried to add on a fourth machine, so used a command like this: > > > > sudo gluster volume add-brick dockervols replica 4 > webserver8:/data/gluster/dockervols force > > > > but the result is: > > volume add-brick: failed: Commit failed on webserver1. Please check log > file for details. > > Commit failed on webserver8. Please check log file for details. 
> > Commit failed on webserver11. Please check log file for details. > > > > Tried: removing the new brick (this also fails) and trying again. > > Tried: checking the logs. The log files are not enlightening to me ? I > don?t know what?s normal and what?s not. > > > > From webserver8 & webserver11 could you attach glusterd log files? > > > > Also please share following: > > - gluster version? (gluster ?version) > > - Output of ?gluster peer status? > > - Output of ?gluster v info? from all 4 nodes. > > > > Tried: deleting the brick directory from previous attempt, so that it?s > not in the way. > > Tried: restarting gluster services > > Tried: rebooting > > Tried: setting up a new volume, replicated to all four machines. This > works, so I?m assuming it?s not a networking issue. But still fails with > this existing volume that has the critical data in it. > > > > Running out of ideas. Any suggestions? Thank you! > > > > Boris > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > -- > > --Atin > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From snowmailer at gmail.com Tue Apr 16 15:44:24 2019 From: snowmailer at gmail.com (Martin Toth) Date: Tue, 16 Apr 2019 17:44:24 +0200 Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) In-Reply-To: References: <0917AF4A-76EC-4A9E-820F-E0ADA2DA899A@gmail.com> <1634978A-E849-48DB-A160-B1AC3DB56D38@gmail.com> <69E7C95F-8A81-46CB-8BD8-F66B582144EC@gmail.com> <00009213-6BF3-4A7F-AFA7-AC076B04496C@gmail.com> Message-ID: <7B2698DB-1897-4EA4-AA63-FFE8752C50F7@gmail.com> Thanks for clarification, one more question. When I will recover(boot) failed node back and this peer will be available again to remaining two nodes. How do I tell gluster to mark this brick as failed ? I mean, I?ve booted failed node back without networking. Disk partition (ZFS pool on another disks) where brick was before failure is lost. Now I can start gluster event when I don't have ZFS pool where failed brick was before ? This wont be a problem when I will connect this node back to cluster ? (before brick replace/reset command will be issued) Thanks. BR! Martin > On 11 Apr 2019, at 15:40, Karthik Subrahmanya wrote: > > > > On Thu, Apr 11, 2019 at 6:38 PM Martin Toth > wrote: > Hi Karthik, > >> On Thu, Apr 11, 2019 at 12:43 PM Martin Toth > wrote: >> Hi Karthik, >> >> more over, I would like to ask if there are some recommended settings/parameters for SHD in order to achieve good or fair I/O while volume will be healed when I will replace Brick (this should trigger healing process). >> If I understand you concern correctly, you need to get fair I/O performance for clients while healing takes place as part of the replace brick operation. For this you can turn off the "data-self-heal" and "metadata-self-heal" options until the heal completes on the new brick. > > This is exactly what I mean. I am running VM disks on remaining 2 (out of 3 - one failed as mentioned) nodes and I need to ensure there will be fair I/O performance available on these two nodes while replace brick operation will heal volume. > I will not run any VMs on node where replace brick operation will be running. 
So if I understand correctly, when I will set : > > # gluster volume set cluster.data-self-heal off > # gluster volume set cluster.metadata-self-heal off > > this will tell Gluster clients (libgfapi and FUSE mount) not to read from node ?where replace brick operation? is in place but from remaing two healthy nodes. Is this correct ? Thanks for clarification. > The reads will be served from one of the good bricks since the file will either be not present on the replaced brick at the time of read or it will be present but marked for heal if it is not already healed. If already healed by SHD, then it could be served from the new brick as well, but there won't be any problem in reading from there in that scenario. > By setting these two options whenever a read comes from client it will not try to heal the file for data/metadata. Otherwise it would try to heal (if not already healed by SHD) when the read comes on this, hence slowing down the client. > >> Turning off client side healing doesn't compromise data integrity and consistency. During the read request from client, pending xattr is evaluated for replica copies and read is only served from correct copy. During writes, IO will continue on both the replicas, SHD will take care of healing files. >> After replacing the brick, we strongly recommend you to consider upgrading your gluster to one of the maintained versions. We have many stability related fixes there, which can handle some critical issues and corner cases which you could hit during these kind of scenarios. > > This will be first priority in infrastructure after fixing this cluster back to fully functional replica3. I will upgrade to 3.12.x and then to version 5 or 6. > Sounds good. > > If you are planning to have the same name for the new brick and if you get the error like "Brick may be containing or be contained by an existing brick" even after using the force option, try using a different name. That should work. > > Regards, > Karthik > > BR, > Martin > >> Regards, >> Karthik >> I had some problems in past when healing was triggered, VM disks became unresponsive because healing took most of I/O. My volume containing only big files with VM disks. >> >> Thanks for suggestions. >> BR, >> Martin >> >>> On 10 Apr 2019, at 12:38, Martin Toth > wrote: >>> >>> Thanks, this looks ok to me, I will reset brick because I don't have any data anymore on failed node so I can use same path / brick name. >>> >>> Is reseting brick dangerous command? Should I be worried about some possible failure that will impact remaining two nodes? I am running really old 3.7.6 but stable version. >>> >>> Thanks, >>> BR! >>> >>> Martin >>> >>> >>>> On 10 Apr 2019, at 12:20, Karthik Subrahmanya > wrote: >>>> >>>> Hi Martin, >>>> >>>> After you add the new disks and creating raid array, you can run the following command to replace the old brick with new one: >>>> >>>> - If you are going to use a different name to the new brick you can run >>>> gluster volume replace-brick commit force >>>> >>>> - If you are planning to use the same name for the new brick as well then you can use >>>> gluster volume reset-brick commit force >>>> Here old-brick & new-brick's hostname & path should be same. >>>> >>>> After replacing the brick, make sure the brick comes online using volume status. >>>> Heal should automatically start, you can check the heal status to see all the files gets replicated to the newly added brick. If it does not start automatically, you can manually start that by running gluster volume heal . 
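The angle-bracket placeholders in the replace-brick and reset-brick commands quoted above appear to have been lost when the HTML mail was scrubbed. A minimal sketch of the whole procedure as described there, using the brick path quoted later in this thread and a hypothetical volume name; note that reset-brick may not exist on releases as old as the 3.7.6 mentioned in this thread, and the two self-heal options are the ones discussed above:

    VOL=gv0imagestore                                   # hypothetical volume name
    BRICK=node2.san:/tank/gluster/gv0imagestore/brick1  # failed brick from this thread

    # Optional: keep client-side healing from competing with VM I/O during the rebuild
    gluster volume set $VOL cluster.data-self-heal off
    gluster volume set $VOL cluster.metadata-self-heal off

    # Same hostname and path as the failed brick:
    gluster volume reset-brick $VOL $BRICK start
    gluster volume reset-brick $VOL $BRICK $BRICK commit force

    # Or, if the new brick gets a different path:
    # gluster volume replace-brick $VOL $BRICK node2.san:/tank/gluster/new-brick1 commit force

    # Verify the brick is online and monitor the heal
    gluster volume status $VOL
    gluster volume heal $VOL
    gluster volume heal $VOL info

    # Re-enable client-side healing once "heal info" reports zero entries
    gluster volume set $VOL cluster.data-self-heal on
    gluster volume set $VOL cluster.metadata-self-heal on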
>>>> >>>> HTH, >>>> Karthik >>>> >>>> On Wed, Apr 10, 2019 at 3:13 PM Martin Toth > wrote: >>>> Hi all, >>>> >>>> I am running a replica 3 gluster volume with 3 bricks. One of my servers failed - all disks are showing errors and the raid is in a fault state. >>>> >>>> Type: Replicate >>>> Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a >>>> Status: Started >>>> Number of Bricks: 1 x 3 = 3 >>>> Transport-type: tcp >>>> Bricks: >>>> Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 >>>> Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 >>>> Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 >>>> >>>> So one of my bricks (node2) has totally failed. It went down and all its data is lost (failed raid on node2). Now I am running only two bricks on 2 servers out of 3. >>>> This is a really critical problem for us; we could lose all data. I want to add new disks to node2, create a new raid array on them and try to replace the failed brick on this node. >>>> >>>> What is the procedure for replacing Brick2 on node2, can someone advise? I can't find anything relevant in the documentation. >>>> >>>> Thanks in advance, >>>> Martin >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Wed Apr 17 08:02:17 2019 From: spisla80 at gmail.com (David Spisla) Date: Wed, 17 Apr 2019 10:02:17 +0200 Subject: [Gluster-users] Hard Failover with Samba and Glusterfs Message-ID: Dear Gluster Community, I have this setup: a 4-node GlusterFS v5.5 cluster, using Samba/CTDB v4.8 to access the volumes (each node has a VIP). I was testing this failover scenario: 1. Start writing 940 GB of small files (64K-100K) from a Win10 client to node1. 2. During the write process I hard-shut down node1 (where the client is connected via the VIP) by turning off the power. My expectation is that the write process stops and after a while the Win10 client offers me a Retry, so I can continue the write on a different node (which now has the VIP of node1). In the past I observed exactly this, but now the system shows strange behaviour: The Win10 client does nothing and Explorer freezes; in the backend CTDB cannot perform the failover and throws errors.
The glusterd from node2 and node3 logs this messages: > [2019-04-16 14:47:31.828323] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive1 not held > [2019-04-16 14:47:31.828350] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive1 > [2019-04-16 14:47:31.828369] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive2 not held > [2019-04-16 14:47:31.828376] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive2 > [2019-04-16 14:47:31.828412] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol gluster_shared_storage not held > [2019-04-16 14:47:31.828423] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for gluster_shared_storage > > *In my oponion Samba/CTDB can not perform the failover correctly and continue the write process because glusterfs didn't released the lock.* What do you think? It seems to me like a bug because in past time the failover works correctly. Regards David Spisla -------------- next part -------------- An HTML attachment was scrubbed... URL: From bgoldowsky at cast.org Tue Apr 16 13:41:48 2019 From: bgoldowsky at cast.org (Boris Goldowsky) Date: Tue, 16 Apr 2019 13:41:48 +0000 Subject: [Gluster-users] Volume stuck unable to add a brick In-Reply-To: References: <02CC8632-9B90-43F6-89FA-1160CE529667@contoso.com> <52179D48-7CF0-405E-805F-C5DCDF5B12CB@cast.org> Message-ID: That worked! Thank you SO much! Boris From: Karthik Subrahmanya Date: Tuesday, April 16, 2019 at 8:20 AM To: Boris Goldowsky Cc: Atin Mukherjee , Gluster-users Subject: Re: [Gluster-users] Volume stuck unable to add a brick Hi Boris, Thank you for providing the logs. The problem here is because of the "auth.allow: 127.0.0.1" setting on the volume. When you try to add a new brick to the volume internally replication module will try to set some metadata on the existing bricks to mark pending heal on the new brick, by creating a temporary mount. Because of the auth.allow setting that mount gets permission errors as seen in the below logs, leading to add-brick failure. 
From data-gluster-dockervols.log-webserver9 : [2019-04-15 14:00:34.226838] I [addr.c:55:compare_addr_and_update] 0-/data/gluster/dockervols: allowed = "127.0.0.1", received addr = "192.168.200.147" [2019-04-15 14:00:34.226895] E [MSGID: 115004] [authenticate.c:224:gf_authenticate] 0-auth: no authentication module is interested in accepting remote-client (null) [2019-04-15 14:00:34.227129] E [MSGID: 115001] [server-handshake.c:848:server_setvolume] 0-dockervols-server: Cannot authenticate client from webserver8.cast.org-55674-2019/04/15-14:00:20:495333-dockervols-client-2-0-0 3.12.2 [Permission denied] From dockervols-add-brick-mount.log : [2019-04-15 14:00:20.672033] W [MSGID: 114043] [client-handshake.c:1109:client_setvolume_cbk] 0-dockervols-client-2: failed to set the volume [Permission denied] [2019-04-15 14:00:20.672102] W [MSGID: 114007] [client-handshake.c:1138:client_setvolume_cbk] 0-dockervols-client-2: failed to get 'process-uuid' from reply dict [Invalid argument] [2019-04-15 14:00:20.672129] E [MSGID: 114044] [client-handshake.c:1144:client_setvolume_cbk] 0-dockervols-client-2: SETVOLUME on remote-host failed: Authentication failed [Permission denied] [2019-04-15 14:00:20.672151] I [MSGID: 114049] [client-handshake.c:1258:client_setvolume_cbk] 0-dockervols-client-2: sending AUTH_FAILED event This is a known issue and we are planning to fix this. For the time being we have a workaround for this. - Before you try adding the brick set the auth.allow option to default i.e., "*" or you can do this by running "gluster v reset auth.allow" - Add the brick - After it succeeds set back the auth.allow option to the previous value. Regards, Karthik On Tue, Apr 16, 2019 at 5:20 PM Boris Goldowsky > wrote: OK, log files attached. Boris From: Karthik Subrahmanya > Date: Tuesday, April 16, 2019 at 2:52 AM To: Atin Mukherjee >, Boris Goldowsky > Cc: Gluster-users > Subject: Re: [Gluster-users] Volume stuck unable to add a brick On Mon, Apr 15, 2019 at 9:43 PM Atin Mukherjee > wrote: +Karthik Subrahmanya Didn't we we fix this problem recently? Failed to set extended attribute indicates that temp mount is failing and we don't have quorum number of bricks up. We had two fixes which handles two kind of add-brick scenarios. [1] Fails add-brick when increasing the replica count if any of the brick is down to avoid data loss. This can be overridden by using the force option. [2] Allow add-brick to set the extended attributes by the temp mount if the volume is already mounted (has clients). They are in version 3.12.2 so, patch [1] is present there. But since they are using the force option it should not have any problem even if they have any brick down. The error message they are getting is also different, so it is not because of any brick being down I guess. Patch [2] is not present in 3.12.2 and it is not the conversion from plain distribute to replicate volume. So the scenario is different here. It seems like they are hitting some other issue. @Boris, Can you attach the add-brick's temp mount log. The file name should look something like "dockervols-add-brick-mount.log". Can you also provide all the brick logs of that volume during that time. [1] https://review.gluster.org/#/c/glusterfs/+/16330/ [2] https://review.gluster.org/#/c/glusterfs/+/21791/ Regards, Karthik Boris - What's the gluster version are you using? On Mon, Apr 15, 2019 at 7:35 PM Boris Goldowsky > wrote: Atin, thank you for the reply. 
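A minimal sketch of the workaround Karthik describes above, using the volume and brick names from this thread and the auth.allow value (127.0.0.1) shown in the gluster v info output:

    # 1. Temporarily clear the auth.allow restriction (note its current value first)
    sudo gluster volume reset dockervols auth.allow

    # 2. Retry the add-brick
    sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force

    # 3. Restore the restriction once the brick has been added
    sudo gluster volume set dockervols auth.allow 127.0.0.1

    # 4. Optionally confirm the result and watch the pending heals
    sudo gluster volume info dockervols
    sudo gluster volume heal dockervols info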
Here are all of those pieces of information: [bgoldowsky at webserver9 ~]$ gluster --version glusterfs 3.12.2 (same on all nodes) [bgoldowsky at webserver9 ~]$ sudo gluster peer status Number of Peers: 3 Hostname: webserver11.cast.org Uuid: c2b147fd-cab4-4859-9922-db5730f8549d State: Peer in Cluster (Connected) Hostname: webserver1.cast.org Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c State: Peer in Cluster (Connected) Other names: 192.168.200.131 webserver1 Hostname: webserver8.cast.org Uuid: be2f568b-61c5-4016-9264-083e4e6453a2 State: Peer in Cluster (Connected) Other names: webserver8 [bgoldowsky at webserver1 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options Reconfigured: nfs.disable: on transport.address-family: inet auth.allow: 127.0.0.1 Volume Name: testvol Type: Replicate Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: transport.address-family: inet nfs.disable: on [bgoldowsky at webserver8 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options Reconfigured: nfs.disable: on transport.address-family: inet auth.allow: 127.0.0.1 Volume Name: testvol Type: Replicate Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: nfs.disable: on transport.address-family: inet [bgoldowsky at webserver9 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options Reconfigured: nfs.disable: on transport.address-family: inet auth.allow: 127.0.0.1 Volume Name: testvol Type: Replicate Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: nfs.disable: on transport.address-family: inet [bgoldowsky at webserver11 ~]$ sudo gluster v info Volume Name: dockervols Type: Replicate Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/dockervols Brick2: webserver11:/data/gluster/dockervols Brick3: webserver9:/data/gluster/dockervols Options 
Reconfigured: auth.allow: 127.0.0.1 transport.address-family: inet nfs.disable: on Volume Name: testvol Type: Replicate Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 4 = 4 Transport-type: tcp Bricks: Brick1: webserver1:/data/gluster/testvol Brick2: webserver9:/data/gluster/testvol Brick3: webserver11:/data/gluster/testvol Brick4: webserver8:/data/gluster/testvol Options Reconfigured: transport.address-family: inet nfs.disable: on [bgoldowsky at webserver9 ~]$ sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force volume add-brick: failed: Commit failed on webserver8.cast.org. Please check log file for details. Webserver8 glusterd.log: [2019-04-15 13:55:42.338197] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-04-15 13:55:42.338197] and [2019-04-15 13:55:42.341618] [2019-04-15 14:00:20.445011] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) [0x7fe697764215] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) [0x7fe69780de9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fe6a2d16ea5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=dockervols --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd [2019-04-15 14:00:20.445148] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 4 [2019-04-15 14:00:20.445184] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it [2019-04-15 14:00:20.672347] E [MSGID: 106054] [glusterd-utils.c:13863:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected] [2019-04-15 14:00:20.693491] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /tmp/mntmvdFGq [Transport endpoint is not connected] [2019-04-15 14:00:20.693597] E [MSGID: 106074] [glusterd-brick-ops.c:2590:glusterd_op_add_brick] 0-glusterd: Unable to add bricks [2019-04-15 14:00:20.693637] E [MSGID: 106123] [glusterd-mgmt.c:312:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed. 
[2019-04-15 14:00:20.693667] E [MSGID: 106123] [glusterd-mgmt-handler.c:616:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick Webserver11 log file: [2019-04-15 13:56:29.563270] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-04-15 13:56:29.563270] and [2019-04-15 13:56:29.566209] [2019-04-15 14:00:33.996866] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) [0x7f36de924215] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) [0x7f36de9cde9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f36e9ed6ea5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=dockervols --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd [2019-04-15 14:00:33.996979] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 4 [2019-04-15 14:00:33.997004] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it [2019-04-15 14:00:34.013789] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: nfs already stopped [2019-04-15 14:00:34.013849] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is stopped [2019-04-15 14:00:34.017535] I [MSGID: 106568] [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping glustershd daemon running in pid: 6087 [2019-04-15 14:00:35.018783] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: glustershd service is stopped [2019-04-15 14:00:35.018952] I [MSGID: 106567] [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting glustershd service [2019-04-15 14:00:35.028306] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: bitd already stopped [2019-04-15 14:00:35.028408] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: bitd service is stopped [2019-04-15 14:00:35.028601] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: scrub already stopped [2019-04-15 14:00:35.028645] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: scrub service is stopped Thank you for taking a look! Boris From: Atin Mukherjee > Date: Friday, April 12, 2019 at 1:10 PM To: Boris Goldowsky > Cc: Gluster-users > Subject: Re: [Gluster-users] Volume stuck unable to add a brick On Fri, 12 Apr 2019 at 22:32, Boris Goldowsky > wrote: I?ve got a replicated volume with three bricks (?1x3=3?), the idea is to have a common set of files that are locally available on all the machines (Scientific Linux 7, which is essentially CentOS 7) in a cluster. I tried to add on a fourth machine, so used a command like this: sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force but the result is: volume add-brick: failed: Commit failed on webserver1. Please check log file for details. Commit failed on webserver8. Please check log file for details. Commit failed on webserver11. Please check log file for details. Tried: removing the new brick (this also fails) and trying again. Tried: checking the logs. The log files are not enlightening to me ? I don?t know what?s normal and what?s not. 
From webserver8 & webserver11 could you attach glusterd log files? Also please share following: - gluster version? (gluster ?version) - Output of ?gluster peer status? - Output of ?gluster v info? from all 4 nodes. Tried: deleting the brick directory from previous attempt, so that it?s not in the way. Tried: restarting gluster services Tried: rebooting Tried: setting up a new volume, replicated to all four machines. This works, so I?m assuming it?s not a networking issue. But still fails with this existing volume that has the critical data in it. Running out of ideas. Any suggestions? Thank you! Boris _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -- --Atin -------------- next part -------------- An HTML attachment was scrubbed... URL: From cody at platform9.com Tue Apr 16 22:09:28 2019 From: cody at platform9.com (Cody Hill) Date: Tue, 16 Apr 2019 17:09:28 -0500 Subject: [Gluster-users] GlusterFS on ZFS Message-ID: <5F070389-0E92-4277-927E-80B7C65FC5C0@platform9.com> Hey folks. I?m looking to deploy GlusterFS to host some VMs. I?ve done a lot of reading and would like to implement Deduplication and Compression in this setup. My thought would be to run ZFS to handle the Compression and Deduplication. ZFS would give me the following benefits: 1. If a single disk fails rebuilds happen locally instead of over the network 2. Zil & L2Arc should add a slight performance increase 3. Deduplication and Compression are inline and have pretty good performance with modern hardware (Intel Skylake) 4. Automated Snapshotting I can then layer GlusterFS on top to handle distribution to allow 3x Replicas of my storage. My question is? Why aren?t more people doing this? Is this a horrible idea for some reason that I?m missing? I?d be very interested to hear your thoughts. Additional thoughts: I?d like to use Ganesha pNFS to connect to this storage. (Any issues here?) I think I?d need KeepAliveD across these 3x nodes to store in the FSTAB (Is this correct?) I?m also thinking about creating a ?Gluster Tier? of 512GB of Intel Optane DIMM to really smooth out write latencies? Any issues here? Thank you, Cody Hill From pascal.suter at dalco.ch Wed Apr 17 15:34:38 2019 From: pascal.suter at dalco.ch (Pascal Suter) Date: Wed, 17 Apr 2019 17:34:38 +0200 Subject: [Gluster-users] GlusterFS on ZFS In-Reply-To: <5F070389-0E92-4277-927E-80B7C65FC5C0@platform9.com> References: <5F070389-0E92-4277-927E-80B7C65FC5C0@platform9.com> Message-ID: <91e1e030-16f5-0938-f320-9d773f5dc4ee@dalco.ch> Hi Cody i'm still new to Gluster myself, so take my input with the necessary skepticism: if you care about performance (and it looks like you do), use zfs mirror pairs and not raidz volumes. in my experience (outside of gluster), raidz pools perform significantly worse than a hardware raid5 or 6. if you combine a mirror on zfs with a 3x replication on gluster, you need 6x the amount of raw disk space to get your desired redundancy.. you could do with 3x the amount of diskspace, if you left the zfs mirror away and accept the rebuild of a lost disk over the network or you could end up somewhere beween 3x and 6x if you used hardware raid6 instead of zfs on the bricks. When using hardware raid6 make sure you align your lvm volumes properly, it makes a huge difference in performance. Okay, deduplication might give you some of it back, but benchmark the zfs deduplication process first before deciding on it. 
in theory it could add to your write perofrmance, but i'm not sure if that's going to happen in reality. snapshotting might be tricky.. afaik gluster natively supports snapshotting with thin provisioned lvm volumes only. this lets you create snapshots with the "gluster" cli tool. gluster will then handle consistency across all your bricks so that each snapshot (as a whole, across all bricks) is consistent in itself. this includes some challenges about handling open file sessions etc. I'm not familiar with what gluster actually does but by reading the documentation and some discussion about snapshots it seems that there is more to it than simply automate a couple of lvcreate statements. so i would expect some challenges when doing it yourself on zfs rather than letting gluster handle it. Restoring a single file from a snapshot also seems alot easier if you go with the lvm thin setup.. you can then mount a snapshot (of your entire gluster volume, not just of a brick) and simply copy the file.. while with zfs it seems you need to find out which bricks your file resided on, then copy the necessary raw data to your live bricks which is something i would not feel comfortable doing and it is a lot more work and prone to error. also, if things go wrong (for example when dealing with the snapshots), there are probably not so many people around to help you. again, i am no expert, that's just what i'd be concerned about with the little knowledge i have at the moment :) cheers Pascal On 17.04.19 00:09, Cody Hill wrote: > Hey folks. > > I?m looking to deploy GlusterFS to host some VMs. I?ve done a lot of reading and would like to implement Deduplication and Compression in this setup. My thought would be to run ZFS to handle the Compression and Deduplication. > > ZFS would give me the following benefits: > 1. If a single disk fails rebuilds happen locally instead of over the network > 2. Zil & L2Arc should add a slight performance increase > 3. Deduplication and Compression are inline and have pretty good performance with modern hardware (Intel Skylake) > 4. Automated Snapshotting > > I can then layer GlusterFS on top to handle distribution to allow 3x Replicas of my storage. > My question is? Why aren?t more people doing this? Is this a horrible idea for some reason that I?m missing? I?d be very interested to hear your thoughts. > > Additional thoughts: > I?d like to use Ganesha pNFS to connect to this storage. (Any issues here?) > I think I?d need KeepAliveD across these 3x nodes to store in the FSTAB (Is this correct?) > I?m also thinking about creating a ?Gluster Tier? of 512GB of Intel Optane DIMM to really smooth out write latencies? Any issues here? > > Thank you, > Cody Hill > > > > > > > > > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From hunter86_bg at yahoo.com Thu Apr 18 03:21:26 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Thu, 18 Apr 2019 06:21:26 +0300 Subject: [Gluster-users] GlusterFS on ZFS Message-ID: Hi Code, Keep in mind that if you like the thin LVM approach, you can still use VDO (Red Hat-based systems) and get that deduplication/compression. VDO most probably will require some tuning to get the writes fast enough, but the reads can be way faster. 
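Whichever layer ends up providing compression (ZFS, or thin LVM with VDO as just suggested), the mirrored-pairs ZFS layout discussed above would look roughly like the sketch below; pool, device and dataset names are illustrative, and the dataset properties are the ones generally suggested for GlusterFS bricks on ZFS:

    # Illustrative devices; adjust ashift and the mirror pairs to the real hardware
    zpool create -o ashift=12 brickpool \
        mirror /dev/sda /dev/sdb \
        mirror /dev/sdc /dev/sdd
    zpool add brickpool log /dev/nvme0n1p1     # SLOG ("ZIL" device)
    zpool add brickpool cache /dev/nvme0n1p2   # L2ARC

    # Dataset used as the Gluster brick
    zfs create brickpool/brick1
    zfs set compression=lz4  brickpool/brick1
    zfs set xattr=sa         brickpool/brick1   # Gluster leans heavily on extended attributes
    zfs set acltype=posixacl brickpool/brick1
    zfs set atime=off        brickpool/brick1

Deduplication is deliberately left at its default (off) in this sketch, for the RAM and benchmarking reasons raised above.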
Best Regards, Strahil NikolovOn Apr 17, 2019 18:34, Pascal Suter wrote: > > Hi Cody > > i'm still new to Gluster myself, so take my input with the necessary > skepticism: > > if you care about performance (and it looks like you do), use zfs mirror > pairs and not raidz volumes. in my experience (outside of gluster), > raidz pools perform significantly worse than a hardware raid5 or 6. if > you combine a mirror on zfs with a 3x replication on gluster, you need > 6x the amount of raw disk space to get your desired redundancy.. you > could do with 3x the amount of diskspace, if you left the zfs mirror > away and accept the rebuild of a lost disk over the network or you could > end up somewhere beween 3x and 6x if you used hardware raid6 instead of > zfs on the bricks. When using hardware raid6 make sure you align your > lvm volumes properly, it makes a huge difference in performance. Okay, > deduplication might give you some of it back, but benchmark the zfs > deduplication process first before deciding on it. in theory it could > add to your write perofrmance, but i'm not sure if that's going to > happen in reality. > > snapshotting might be tricky.. afaik gluster natively supports > snapshotting with thin provisioned lvm volumes only. this lets you > create snapshots with the "gluster" cli tool. gluster will then handle > consistency across all your bricks so that each snapshot (as a whole, > across all bricks) is consistent in itself. this includes some > challenges about handling open file sessions etc. I'm not familiar with > what gluster actually does but by reading the documentation and some > discussion about snapshots it seems that there is more to it than simply > automate a couple of lvcreate statements. so i would expect some > challenges when doing it yourself on zfs rather than letting gluster > handle it. Restoring a single file from a snapshot also seems alot > easier if you go with the lvm thin setup.. you can then mount a snapshot > (of your entire gluster volume, not just of a brick) and simply copy the > file.. while with zfs it seems you need to find out which bricks your > file resided on, then copy the necessary raw data to your live bricks > which is something i would not feel comfortable doing and it is a lot > more work and prone to error. > > also, if things go wrong (for example when dealing with the snapshots), > there are probably not so many people around to help you. > > again, i am no expert, that's just what i'd be concerned about with the > little knowledge i have at the moment :) > > cheers > > Pascal > > On 17.04.19 00:09, Cody Hill wrote: > > Hey folks. > > > > I?m looking to deploy GlusterFS to host some VMs. I?ve done a lot of reading and would like to implement Deduplication and Compression in this setup. My thought would be to run ZFS to handle the Compression and Deduplication. > > > > ZFS would give me the following benefits: > > 1. If a single disk fails rebuilds happen locally instead of over the network > > 2. Zil & L2Arc should add a slight performance increase > > 3. Deduplication and Compression are inline and have pretty good performance with modern hardware (Intel Skylake) > > 4. Automated Snapshotting > > > > I can then layer GlusterFS on top to handle distribution to allow 3x Replicas of my storage. > > My question is? Why aren?t more people doing this? Is this a horrible idea for some reason that I?m missing? I?d be very interested to hear your thoughts. 
> > > > Additional thoughts: > > I?d like to use Ganesha pNFS to connect to this storage. (Any issues here?) > > I think I?d need KeepAliveD across these 3x nodes to store in the FSTAB (Is this correct?) > > I?m also thinking about creating a ?Gluster Tier? of 512GB of Intel Optane DIMM to really smooth out write latencies? Any issues here? > > > > Thank you, > > Cody Hill > > > > > > > > > > > > > > > > > > > > > > > > _____________________ From hgichon at gmail.com Thu Apr 18 07:20:47 2019 From: hgichon at gmail.com (hgichon) Date: Thu, 18 Apr 2019 16:20:47 +0900 Subject: [Gluster-users] Hard Failover with Samba and Glusterfs In-Reply-To: References: Message-ID: Hi. I have a some question about your testing. 1. What was the glusterfs version you used in past time? 2. How about a volume configuration? 3. Was CTDB vip failed over correctly? If so, Clould you attach /var/log/samba/glusterfs-volname.win10.ip.log ? Best Regards - kpkim 2019? 4? 17? (?) ?? 5:02, David Spisla ?? ??: > Dear Gluster Community, > > I have this setup: 4-Node Glusterfs v5.5 Cluster, using SAMBA/CTDB v4.8 to > access the volumes (each node has a VIP) > > I was testing this failover scenario: > > 1. Start Writing 940 GB with small files (64K-100K)from a Win10 Client to > node1 > 2. During the write process I hardly shutdown node1 (where the client is > connect via VIP) by turn off the power > > My expectation is, that the write process stops and after a while the > Win10 Client offers me a Retry, so I can continue the write on different > node (which has now the VIP of node1). > In past time I did this observation, but now the system shows a strange > bahaviour: > > The Win10 Client do nothing and the Explorer freezes, in the backend CTDB > can not perform the failover and throws errors. 
The glusterd from node2 and > node3 logs this messages: > >> [2019-04-16 14:47:31.828323] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive1 not held >> [2019-04-16 14:47:31.828350] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive1 >> [2019-04-16 14:47:31.828369] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive2 not held >> [2019-04-16 14:47:31.828376] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive2 >> [2019-04-16 14:47:31.828412] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol gluster_shared_storage not held >> [2019-04-16 14:47:31.828423] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for gluster_shared_storage >> >> > *In my oponion Samba/CTDB can not perform the failover correctly and > continue the write process because glusterfs didn't released the lock.* > What do you think? It seems to me like a bug because in past time the > failover works correctly. > > Regards > David Spisla > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From lemonnierk at ulrar.net Thu Apr 18 07:27:23 2019 From: lemonnierk at ulrar.net (lemonnierk at ulrar.net) Date: Thu, 18 Apr 2019 08:27:23 +0100 Subject: [Gluster-users] Settings for VM hosting Message-ID: <20190418072722.GF25080@althea.ulrar.net> Hi, We've been using the same settings, found in an old email here, since v3.7 of gluster for our VM hosting volumes. They've been working fine but since we've just installed a v6 for testing I figured there might be new settings I should be aware of. So for access through the libgfapi (qemu), for VM hard drives, is that still optimal and recommended ? 
Volume Name: glusterfs Type: Replicate Volume ID: b28347ff-2c27-44e0-bc7d-c1c017df7cd1 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: ips1adm.X:/mnt/glusterfs/brick Brick2: ips2adm.X:/mnt/glusterfs/brick Brick3: ips3adm.X:/mnt/glusterfs/brick Options Reconfigured: performance.readdir-ahead: on cluster.quorum-type: auto cluster.server-quorum-type: server network.remote-dio: enable cluster.eager-lock: enable performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.stat-prefetch: off features.shard: on features.shard-block-size: 64MB cluster.data-self-heal-algorithm: full network.ping-timeout: 30 diagnostics.count-fop-hits: on diagnostics.latency-measurement: on transport.address-family: inet nfs.disable: on performance.client-io-threads: off Thanks ! From karli at inparadise.se Thu Apr 18 07:32:30 2019 From: karli at inparadise.se (=?utf-8?B?S2FybGkgU2rDtmJlcmc=?=) Date: Thu, 18 Apr 2019 09:32:30 +0200 (CEST) Subject: [Gluster-users] GlusterFS on ZFS Message-ID: <085deed5-f048-4baa-84f8-1f6ef1436a5b@email.android.com> An HTML attachment was scrubbed... URL: From srangana at redhat.com Thu Apr 18 11:08:16 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Thu, 18 Apr 2019 07:08:16 -0400 Subject: [Gluster-users] Announcing Gluster release 5.6 Message-ID: The Gluster community is pleased to announce the release of Gluster 5.6 (packages available at [1]). Release notes for the release can be found at [2]. Major changes, features and limitations addressed in this release: - Release 5.x had a long standing issue where network bandwidth usage was much higher than in prior releases. This issue has been addressed in this release. Bug 1673058 has more details regarding the issue [3]. Thanks, Gluster community [1] Packages for 5.6: https://download.gluster.org/pub/gluster/glusterfs/5/5.6/ [2] Release notes for 5.6: https://docs.gluster.org/en/latest/release-notes/5.6/ [3] Bandwidth usage bug: https://bugzilla.redhat.com/show_bug.cgi?id=1673058 From snowmailer at gmail.com Thu Apr 18 13:13:25 2019 From: snowmailer at gmail.com (Martin Toth) Date: Thu, 18 Apr 2019 15:13:25 +0200 Subject: [Gluster-users] Settings for VM hosting In-Reply-To: <20190418072722.GF25080@althea.ulrar.net> References: <20190418072722.GF25080@althea.ulrar.net> Message-ID: <3FCC8050-52F9-4469-B714-E92FA440C146@gmail.com> Hi, I am curious about your setup and settings also. I have exactly same setup and use case. - why do you use sharding on replica3? Do you have various size of bricks(disks) pre node? Wonder if someone will share settings for this setup. BR! > On 18 Apr 2019, at 09:27, lemonnierk at ulrar.net wrote: > > Hi, > > We've been using the same settings, found in an old email here, since > v3.7 of gluster for our VM hosting volumes. They've been working fine > but since we've just installed a v6 for testing I figured there might > be new settings I should be aware of. > > So for access through the libgfapi (qemu), for VM hard drives, is that > still optimal and recommended ? 
> > Volume Name: glusterfs > Type: Replicate > Volume ID: b28347ff-2c27-44e0-bc7d-c1c017df7cd1 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: ips1adm.X:/mnt/glusterfs/brick > Brick2: ips2adm.X:/mnt/glusterfs/brick > Brick3: ips3adm.X:/mnt/glusterfs/brick > Options Reconfigured: > performance.readdir-ahead: on > cluster.quorum-type: auto > cluster.server-quorum-type: server > network.remote-dio: enable > cluster.eager-lock: enable > performance.quick-read: off > performance.read-ahead: off > performance.io-cache: off > performance.stat-prefetch: off > features.shard: on > features.shard-block-size: 64MB > cluster.data-self-heal-algorithm: full > network.ping-timeout: 30 > diagnostics.count-fop-hits: on > diagnostics.latency-measurement: on > transport.address-family: inet > nfs.disable: on > performance.client-io-threads: off > > Thanks ! > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From nl at fischer-ka.de Thu Apr 18 13:44:28 2019 From: nl at fischer-ka.de (nl at fischer-ka.de) Date: Thu, 18 Apr 2019 15:44:28 +0200 Subject: [Gluster-users] Settings for VM hosting In-Reply-To: <3FCC8050-52F9-4469-B714-E92FA440C146@gmail.com> References: <20190418072722.GF25080@althea.ulrar.net> <3FCC8050-52F9-4469-B714-E92FA440C146@gmail.com> Message-ID: <5cf68e1a-6c8f-9f18-72be-85e30c6a00b3@fischer-ka.de> Hi, I have setup my storage for my nodes (also replica 3, but distributed replicated volume with some more nodes) just some weeks ago based on the "virt group" as recommended ... and here is mine: cluster.choose-local: off user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: diff cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: enable performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off performance.client-io-threads: off nfs.disable: on transport.address-family: inet cluster.granular-entry-heal: enable I only changed the data-self-heal-algorithm because CPU is not limiting that much on my nodes so I chose that over bandwith (based on my understanding of the docs). I have some more nodes, so sharding will better distribute the data between my nodes Ingo Am 18.04.19 um 15:13 schrieb Martin Toth: > Hi, > > I am curious about your setup and settings also. I have exactly same setup and use case. > > - why do you use sharding on replica3? Do you have various size of bricks(disks) pre node? > > Wonder if someone will share settings for this setup. > > BR! > >> On 18 Apr 2019, at 09:27, lemonnierk at ulrar.net wrote: >> >> Hi, >> >> We've been using the same settings, found in an old email here, since >> v3.7 of gluster for our VM hosting volumes. They've been working fine >> but since we've just installed a v6 for testing I figured there might >> be new settings I should be aware of. >> >> So for access through the libgfapi (qemu), for VM hard drives, is that >> still optimal and recommended ? 
>> >> Volume Name: glusterfs >> Type: Replicate >> Volume ID: b28347ff-2c27-44e0-bc7d-c1c017df7cd1 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 x 3 = 3 >> Transport-type: tcp >> Bricks: >> Brick1: ips1adm.X:/mnt/glusterfs/brick >> Brick2: ips2adm.X:/mnt/glusterfs/brick >> Brick3: ips3adm.X:/mnt/glusterfs/brick >> Options Reconfigured: >> performance.readdir-ahead: on >> cluster.quorum-type: auto >> cluster.server-quorum-type: server >> network.remote-dio: enable >> cluster.eager-lock: enable >> performance.quick-read: off >> performance.read-ahead: off >> performance.io-cache: off >> performance.stat-prefetch: off >> features.shard: on >> features.shard-block-size: 64MB >> cluster.data-self-heal-algorithm: full >> network.ping-timeout: 30 >> diagnostics.count-fop-hits: on >> diagnostics.latency-measurement: on >> transport.address-family: inet >> nfs.disable: on >> performance.client-io-threads: off >> >> Thanks ! >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > From lemonnierk at ulrar.net Thu Apr 18 14:15:51 2019 From: lemonnierk at ulrar.net (lemonnierk at ulrar.net) Date: Thu, 18 Apr 2019 15:15:51 +0100 Subject: [Gluster-users] Settings for VM hosting In-Reply-To: <3FCC8050-52F9-4469-B714-E92FA440C146@gmail.com> References: <20190418072722.GF25080@althea.ulrar.net> <3FCC8050-52F9-4469-B714-E92FA440C146@gmail.com> Message-ID: <20190418141551.GG25080@althea.ulrar.net> On Thu, Apr 18, 2019 at 03:13:25PM +0200, Martin Toth wrote: > Hi, > > I am curious about your setup and settings also. I have exactly same setup and use case. > > - why do you use sharding on replica3? Do you have various size of bricks(disks) pre node? > Back in the 3.7 era there was a bug locking the files during heal. So without sharding the whole disk was locked for ~30 minutes (depending on the disk's size of course), it was briningh the whole service down during heals. We started using shards then because it locks only the shard being healed instead of the whole file, I believe the bug has been fixed since but we've kept it just in case. As a bonus combined to the heal algo full (just re-transmit the shard instead of trying to figure out what's changed) it's much, much faster heal times with very little cpu usage, so really there's no reason not to imho, sharding is great. Might be different if you have big dedicated servers for gluster and nothing else to do with your cpu, I don't know, but for us sharding is a big gain during heals, which unfortunatly is very common on OVH's shaky vRacks :( From budic at onholyground.com Thu Apr 18 16:19:06 2019 From: budic at onholyground.com (Darrell Budic) Date: Thu, 18 Apr 2019 11:19:06 -0500 Subject: [Gluster-users] GlusterFS on ZFS In-Reply-To: <5F070389-0E92-4277-927E-80B7C65FC5C0@platform9.com> References: <5F070389-0E92-4277-927E-80B7C65FC5C0@platform9.com> Message-ID: I use ZFS over VOD because I?m more familiar with it and it suites my use case better. I got similar results from performance tests, with VOD outperforming writes slight and ZFS outperforming reads. That was before I added some ZIL and cache to my ZFS disks, too. I also don?t like that you have to specify estimated sizes with VOD for compression, I prefer the ZFS approach. 
Don?t forget to set the appropriate zfs attributes, the parts of the Gluster doc with those are still valid. Few more comments inline: > On Apr 16, 2019, at 5:09 PM, Cody Hill wrote: > > Hey folks. > > I?m looking to deploy GlusterFS to host some VMs. I?ve done a lot of reading and would like to implement Deduplication and Compression in this setup. My thought would be to run ZFS to handle the Compression and Deduplication. > > ZFS would give me the following benefits: > 1. If a single disk fails rebuilds happen locally instead of over the network I actually run mine in a pure stripe for best performance, if a disk fails and smart warnings didn?t give me enough time to replace it inline first, I?ll rebuild over the network. I have 10G of course, and currently < 10TB of data so I consider it reasonable. I also decided I?d rather present one large brick over many smaller bricks, in some tests others have done, it has shown benefits for gluster healing. > 2. Zil & L2Arc should add a slight performance increase Yes. Get the absolute fasted ZIL you can, but any modern enterprise SSD will still give you some benefits. Over-provision these, you probably need 4-15Gb for the Zil (1G networking vs 10G), and I use 90% of the cache drive to allow the SSD to work it?s best. Cache effectiveness depends on your workload, so monitor and/or test with/without. > 3. Deduplication and Compression are inline and have pretty good performance with modern hardware (Intel Skylake) LZ4 compression is great. As others have said, I?d avoid deduplication altogether. Especially in a gluster environment, why waste the RAM and do the work multiple times? > 4. Automated Snapshotting Be careful doing this ?underneath? the gluster layer, you?re snapshotting only that replica and it?s not guaranteed to be in sync with the others. At best, you?re making a point in time backup of one node, maybe useful for off-system backups with zfs streaming, but I?d consider gluster geo-rep first. And won?t work at all if you are not running a pure replica. > I can then layer GlusterFS on top to handle distribution to allow 3x Replicas of my storage. > My question is? Why aren?t more people doing this? Is this a horrible idea for some reason that I?m missing? I?d be very interested to hear your thoughts. > > Additional thoughts: > I?d like to use Ganesha pNFS to connect to this storage. (Any issues here?) I?d just use glusterfs glfsapi mounts, but if you want to go NFS, sure. Make sure you?re ready to support Ganesha, it doesn?t seem to be as well integrated in the latest gluster releases. Caveat, I don?t use it myself. > I think I?d need KeepAliveD across these 3x nodes to store in the FSTAB (Is this correct?) There are easier ways. I use a simple DNS round robin to a name (that i can put in the host files for the servers/clients to avoid bootstrap issues when the local DNS is a vm ;)), and set the backup-server option so nodes can switch automatically if one fails. Or you can mount localhost: with a converged cluster, again with backup-server options for best results. > I?m also thinking about creating a ?Gluster Tier? of 512GB of Intel Optane DIMM to really smooth out write latencies? Any issues here? Gluster tiering is currently being dropped from support, until/unless it comes back, I?d use the optanes as cache/zil or just make a separate fast pool out of them. 
From hunter86_bg at yahoo.com Thu Apr 18 17:23:24 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Thu, 18 Apr 2019 20:23:24 +0300 Subject: [Gluster-users] Settings for VM hosting Message-ID: Sharding has one benefit for me (oVirt) -> faster heal after maintenance. Otherwise imagine 150 GB VM disk - while you reboot recently patched node , all files on the running replica will be marked for replication. Either it will consume alot of CPU ( to find the neccessary ofsets for heal) or use full heal and replicate the whole file. With sharding - it's quite simple and fast. Best Regards, Strahil NikolovOn Apr 18, 2019 16:13, Martin Toth wrote: > > Hi, > > I am curious about your setup and settings also. I have exactly same setup and use case. > > - why do you use sharding on replica3? Do you have various size of bricks(disks) pre node? > > Wonder if someone will share settings for this setup. > > BR! > > > On 18 Apr 2019, at 09:27, lemonnierk at ulrar.net wrote: > > > > Hi, > > > > We've been using the same settings, found in an old email here, since > > v3.7 of gluster for our VM hosting volumes. They've been working fine > > but since we've just installed a v6 for testing I figured there might > > be new settings I should be aware of. > > > > So for access through the libgfapi (qemu), for VM hard drives, is that > > still optimal and recommended ? > > > > Volume Name: glusterfs > > Type: Replicate > > Volume ID: b28347ff-2c27-44e0-bc7d-c1c017df7cd1 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x 3 = 3 > > Transport-type: tcp > > Bricks: > > Brick1: ips1adm.X:/mnt/glusterfs/brick > > Brick2: ips2adm.X:/mnt/glusterfs/brick > > Brick3: ips3adm.X:/mnt/glusterfs/brick > > Options Reconfigured: > > performance.readdir-ahead: on > > cluster.quorum-type: auto > > cluster.server-quorum-type: server > > network.remote-dio: enable > > cluster.eager-lock: enable > > performance.quick-read: off > > performance.read-ahead: off > > performance.io-cache: off > > performance.stat-prefetch: off > > features.shard: on > > features.shard-block-size: 64MB > > cluster.data-self-heal-algorithm: full > > network.ping-timeout: 30 > > diagnostics.count-fop-hits: on > > diagnostics.latency-measurement: on > > transport.address-family: inet > > nfs.disable: on > > performance.client-io-threads: off > > > > Thanks ! > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From hunter86_bg at yahoo.com Thu Apr 18 17:30:32 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Thu, 18 Apr 2019 20:30:32 +0300 Subject: [Gluster-users] GlusterFS on ZFS Message-ID: Aabout those Optanes -> if you decide to go LVM , you can use them as cache pools for your largest bricks. Cached LVs can be converted to cache pools and create bricks out of it. You have a lot of options... Best Regards, Strahil NikolovOn Apr 18, 2019 19:19, Darrell Budic wrote: > > I use ZFS over VOD because I?m more familiar with it and it suites my use case better. I got similar results from performance tests, with VOD outperforming writes slight and ZFS outperforming reads. That was before I added some ZIL and cache to my ZFS disks, too. I also don?t like that you have to specify estimated sizes with VOD for compression, I prefer the ZFS approach. 
Don?t forget to set the appropriate zfs attributes, the parts of the Gluster doc with those are still valid. > > Few more comments inline: > > > On Apr 16, 2019, at 5:09 PM, Cody Hill wrote: > > > > Hey folks. > > > > I?m looking to deploy GlusterFS to host some VMs. I?ve done a lot of reading and would like to implement Deduplication and Compression in this setup. My thought would be to run ZFS to handle the Compression and Deduplication. > > > > ZFS would give me the following benefits: > > 1. If a single disk fails rebuilds happen locally instead of over the network > > I actually run mine in a pure stripe for best performance, if a disk fails and smart warnings didn?t give me enough time to replace it inline first, I?ll rebuild over the network. I have 10G of course, and currently < 10TB of data so I consider it reasonable. I also decided I?d rather present one large brick over many smaller bricks, in some tests others have done, it has shown benefits for gluster healing. > > > 2. Zil & L2Arc should add a slight performance increase > > Yes. Get the absolute fasted ZIL you can, but any modern enterprise SSD will still give you some benefits. Over-provision these, you probably need 4-15Gb for the Zil (1G networking vs 10G), and I use 90% of the cache drive to allow the SSD to work it?s best. Cache effectiveness depends on your workload, so monitor and/or test with/without. > > > 3. Deduplication and Compression are inline and have pretty good performance with modern hardware (Intel Skylake) > > LZ4 compression is great. As others have said, I?d avoid deduplication altogether. Especially in a gluster environment, why waste the RAM and do the work multiple times? > > > 4. Automated Snapshotting > > Be careful doing this ?underneath? the gluster layer, you?re snapshotting only that replica and it?s not guaranteed to be in sync with the others. At best, you?re making a point in time backup of one node, maybe useful for off-system backups with zfs streaming, but I?d consider gluster geo-rep first. And won?t work at all if you are not running a pure replica. > > > I can then layer GlusterFS on top to handle distribution to allow 3x Replicas of my storage. > > My question is? Why aren?t more people doing this? Is this a horrible idea for some reason that I?m missing? I?d be very interested to hear your thoughts. > > > > Additional thoughts: > > I?d like to use Ganesha pNFS to connect to this storage. (Any issues here?) > > I?d just use glusterfs glfsapi mounts, but if you want to go NFS, sure. Make sure you?re ready to support Ganesha, it doesn?t seem to be as well integrated in the latest gluster releases. Caveat, I don?t use it myself. > > > I think I?d need KeepAliveD across these 3x nodes to store in the FSTAB (Is this correct?) > > There are easier ways. I use a simple DNS round robin to a name (that i can put in the host files for the servers/clients to avoid bootstrap issues when the local DNS is a vm ;)), and set the backup-server option so nodes can switch automatically if one fails. Or you can mount localhost: with a converged cluster, again with backup-server options for best results. > > > I?m also thinking about creating a ?Gluster Tier? of 512GB of Intel Optane DIMM to really smooth out write latencies? Any issues here? > > Gluster tiering is currently being dropped from support, until/unless it comes back, I?d use the optanes as cache/zil or just make a separate fast pool out of them. 
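A rough sketch of the lvmcache route mentioned at the top of this message, assuming a brick LV at vg_bricks/brick1 and the Optane showing up as /dev/pmem0 (illustrative names only):

# pvcreate /dev/pmem0
# vgextend vg_bricks /dev/pmem0
# lvcreate --type cache-pool -L 400G -n optane_cpool vg_bricks /dev/pmem0     # cache pool lives on the fast device
# lvconvert --type cache --cachepool vg_bricks/optane_cpool vg_bricks/brick1  # attach it to the brick LV
# lvconvert --splitcache vg_bricks/brick1                                     # detach again later if needed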
> > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From glusterusers at davepedu.com Fri Apr 19 00:06:19 2019 From: glusterusers at davepedu.com (Dave Pedu) Date: Thu, 18 Apr 2019 17:06:19 -0700 Subject: [Gluster-users] GlusterFS on ZFS In-Reply-To: <5F070389-0E92-4277-927E-80B7C65FC5C0@platform9.com> References: <5F070389-0E92-4277-927E-80B7C65FC5C0@platform9.com> Message-ID: Do check this doc: https://docs.gluster.org/en/latest/Administrator%20Guide/Gluster%20On%20ZFS/#build-install-zfs In particular, the bit regarding xattr=sa. In the past, Gluster would cause extremely poor performance on zfs datasets without this option set. I'm not sure if this is still the case. - Dave On 2019-04-16 15:09, Cody Hill wrote: > Hey folks. > > I?m looking to deploy GlusterFS to host some VMs. I?ve done a lot of > reading and would like to implement Deduplication and Compression in > this setup. My thought would be to run ZFS to handle the Compression > and Deduplication. > > ZFS would give me the following benefits: > 1. If a single disk fails rebuilds happen locally instead of over the > network > 2. Zil & L2Arc should add a slight performance increase > 3. Deduplication and Compression are inline and have pretty good > performance with modern hardware (Intel Skylake) > 4. Automated Snapshotting > > I can then layer GlusterFS on top to handle distribution to allow 3x > Replicas of my storage. > My question is? Why aren?t more people doing this? Is this a horrible > idea for some reason that I?m missing? I?d be very interested to hear > your thoughts. > > Additional thoughts: > I?d like to use Ganesha pNFS to connect to this storage. (Any issues > here?) > I think I?d need KeepAliveD across these 3x nodes to store in the > FSTAB (Is this correct?) > I?m also thinking about creating a ?Gluster Tier? of 512GB of Intel > Optane DIMM to really smooth out write latencies? Any issues here? > > Thank you, > Cody Hill > > > > > > > > > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From kdhananj at redhat.com Fri Apr 19 01:17:49 2019 From: kdhananj at redhat.com (Krutika Dhananjay) Date: Fri, 19 Apr 2019 06:47:49 +0530 Subject: [Gluster-users] Settings for VM hosting In-Reply-To: <20190418072722.GF25080@althea.ulrar.net> References: <20190418072722.GF25080@althea.ulrar.net> Message-ID: Looks good mostly. You can also turn on performance.stat-prefetch, and also set client.event-threads and server.event-threads to 4. And if your bricks are on ssds, then you could also enable performance.client-io-threads. And if your bricks and hypervisors are on same set of machines (hyperconverged), then you can turn off cluster.choose-local and see if it helps read performance. Do let us know what helped and what didn't. -Krutika On Thu, Apr 18, 2019 at 1:05 PM wrote: > Hi, > > We've been using the same settings, found in an old email here, since > v3.7 of gluster for our VM hosting volumes. They've been working fine > but since we've just installed a v6 for testing I figured there might > be new settings I should be aware of. > > So for access through the libgfapi (qemu), for VM hard drives, is that > still optimal and recommended ? 
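Pulling the suggestions above together as commands, a sketch using the volume name from the config below (the last two are the conditional ones):

# gluster volume set glusterfs performance.stat-prefetch on
# gluster volume set glusterfs client.event-threads 4
# gluster volume set glusterfs server.event-threads 4
# gluster volume set glusterfs performance.client-io-threads on   # only if the bricks sit on SSD/NVMe
# gluster volume set glusterfs cluster.choose-local off           # only if hyperconverged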
> > Volume Name: glusterfs > Type: Replicate > Volume ID: b28347ff-2c27-44e0-bc7d-c1c017df7cd1 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: ips1adm.X:/mnt/glusterfs/brick > Brick2: ips2adm.X:/mnt/glusterfs/brick > Brick3: ips3adm.X:/mnt/glusterfs/brick > Options Reconfigured: > performance.readdir-ahead: on > cluster.quorum-type: auto > cluster.server-quorum-type: server > network.remote-dio: enable > cluster.eager-lock: enable > performance.quick-read: off > performance.read-ahead: off > performance.io-cache: off > performance.stat-prefetch: off > features.shard: on > features.shard-block-size: 64MB > cluster.data-self-heal-algorithm: full > network.ping-timeout: 30 > diagnostics.count-fop-hits: on > diagnostics.latency-measurement: on > transport.address-family: inet > nfs.disable: on > performance.client-io-threads: off > > Thanks ! > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From archon810 at gmail.com Fri Apr 19 06:57:59 2019 From: archon810 at gmail.com (Artem Russakovskii) Date: Thu, 18 Apr 2019 23:57:59 -0700 Subject: [Gluster-users] v6.0 release notes fix request Message-ID: Hi, https://docs.gluster.org/en/latest/release-notes/6.0/ currently contains a list of fixed bugs that's run-on and should be fixed with proper line breaks: [image: image.png] Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | +ArtemRussakovskii | @ArtemR -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 270318 bytes Desc: not available URL: From lemonnierk at ulrar.net Fri Apr 19 07:18:16 2019 From: lemonnierk at ulrar.net (lemonnierk at ulrar.net) Date: Fri, 19 Apr 2019 08:18:16 +0100 Subject: [Gluster-users] Settings for VM hosting In-Reply-To: References: <20190418072722.GF25080@althea.ulrar.net> Message-ID: <20190419071816.GH25080@althea.ulrar.net> On Fri, Apr 19, 2019 at 06:47:49AM +0530, Krutika Dhananjay wrote: > Looks good mostly. > You can also turn on performance.stat-prefetch, and also set Ah the corruption bug has been fixed, I missed that. Great ! > client.event-threads and server.event-threads to 4. I didn't realize that would also apply to libgfapi ? Good to know, thanks. > And if your bricks are on ssds, then you could also enable > performance.client-io-threads. I'm surprised by that, the doc says "This feature is not recommended for distributed, replicated or distributed-replicated volumes." Since this volume is just a replica 3, shouldn't this stay off ? The disks are all nvme, which I assume would count as ssd. > And if your bricks and hypervisors are on same set of machines > (hyperconverged), > then you can turn off cluster.choose-local and see if it helps read > performance. Thanks, we'll give those a try ! From xpk at headdesk.me Fri Apr 19 15:03:05 2019 From: xpk at headdesk.me (xpk at headdesk.me) Date: Fri, 19 Apr 2019 15:03:05 +0000 Subject: [Gluster-users] adding thin arbiter Message-ID: Hi guys, On an existing volume, I have a volume with 3 replica. One of them is an arbiter. Is there a way to change the arbiter to a thin-arbiter? 
I tried removing the arbiter brick and add it back, but the add-brick command does't take the --thin-arbiter option. xpk -------------- next part -------------- An HTML attachment was scrubbed... URL: From snowmailer at gmail.com Sat Apr 20 09:29:09 2019 From: snowmailer at gmail.com (Martin Toth) Date: Sat, 20 Apr 2019 11:29:09 +0200 Subject: [Gluster-users] Replica 3 - how to replace failed node (peer) In-Reply-To: <7B2698DB-1897-4EA4-AA63-FFE8752C50F7@gmail.com> References: <0917AF4A-76EC-4A9E-820F-E0ADA2DA899A@gmail.com> <1634978A-E849-48DB-A160-B1AC3DB56D38@gmail.com> <69E7C95F-8A81-46CB-8BD8-F66B582144EC@gmail.com> <00009213-6BF3-4A7F-AFA7-AC076B04496C@gmail.com> <7B2698DB-1897-4EA4-AA63-FFE8752C50F7@gmail.com> Message-ID: <960939F3-15F3-4283-B8AE-5CE11F82523E@gmail.com> Just for other users.. they may find this usefull. I finally started Gluster server process on failed node that lost brick and all went OK. Server is again available as a peer and failed brick is not running, so I can continue with replace brick/ reset brick operation. > On 16 Apr 2019, at 17:44, Martin Toth wrote: > > Thanks for clarification, one more question. > > When I will recover(boot) failed node back and this peer will be available again to remaining two nodes. How do I tell gluster to mark this brick as failed ? > > I mean, I?ve booted failed node back without networking. Disk partition (ZFS pool on another disks) where brick was before failure is lost. > Now I can start gluster event when I don't have ZFS pool where failed brick was before ? > > This wont be a problem when I will connect this node back to cluster ? (before brick replace/reset command will be issued) > > Thanks. BR! > Martin > >> On 11 Apr 2019, at 15:40, Karthik Subrahmanya > wrote: >> >> >> >> On Thu, Apr 11, 2019 at 6:38 PM Martin Toth > wrote: >> Hi Karthik, >> >>> On Thu, Apr 11, 2019 at 12:43 PM Martin Toth > wrote: >>> Hi Karthik, >>> >>> more over, I would like to ask if there are some recommended settings/parameters for SHD in order to achieve good or fair I/O while volume will be healed when I will replace Brick (this should trigger healing process). >>> If I understand you concern correctly, you need to get fair I/O performance for clients while healing takes place as part of the replace brick operation. For this you can turn off the "data-self-heal" and "metadata-self-heal" options until the heal completes on the new brick. >> >> This is exactly what I mean. I am running VM disks on remaining 2 (out of 3 - one failed as mentioned) nodes and I need to ensure there will be fair I/O performance available on these two nodes while replace brick operation will heal volume. >> I will not run any VMs on node where replace brick operation will be running. So if I understand correctly, when I will set : >> >> # gluster volume set cluster.data-self-heal off >> # gluster volume set cluster.metadata-self-heal off >> >> this will tell Gluster clients (libgfapi and FUSE mount) not to read from node ?where replace brick operation? is in place but from remaing two healthy nodes. Is this correct ? Thanks for clarification. >> The reads will be served from one of the good bricks since the file will either be not present on the replaced brick at the time of read or it will be present but marked for heal if it is not already healed. If already healed by SHD, then it could be served from the new brick as well, but there won't be any problem in reading from there in that scenario. 
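The volume set / replace-brick / reset-brick commands quoted in this thread lost their volume and brick arguments when archived; with a hypothetical volume name gv0 and node2's brick path used purely as an example, the full forms read roughly:

# gluster volume set gv0 cluster.data-self-heal off
# gluster volume set gv0 cluster.metadata-self-heal off
# gluster volume replace-brick gv0 node2.san:/tank/gluster/gv0imagestore/brick1 node2.san:/tank2/gluster/gv0imagestore/brick1 commit force   # new brick under a different path
# gluster volume reset-brick gv0 node2.san:/tank/gluster/gv0imagestore/brick1 node2.san:/tank/gluster/gv0imagestore/brick1 commit force      # same host and path reused
# gluster volume heal gv0          # kick off the heal manually if it does not start by itself
# gluster volume heal gv0 info     # watch progress
# gluster volume set gv0 cluster.data-self-heal on       # re-enable client-side heals once healing is done
# gluster volume set gv0 cluster.metadata-self-heal on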
>> By setting these two options whenever a read comes from client it will not try to heal the file for data/metadata. Otherwise it would try to heal (if not already healed by SHD) when the read comes on this, hence slowing down the client. >> >>> Turning off client side healing doesn't compromise data integrity and consistency. During the read request from client, pending xattr is evaluated for replica copies and read is only served from correct copy. During writes, IO will continue on both the replicas, SHD will take care of healing files. >>> After replacing the brick, we strongly recommend you to consider upgrading your gluster to one of the maintained versions. We have many stability related fixes there, which can handle some critical issues and corner cases which you could hit during these kind of scenarios. >> >> This will be first priority in infrastructure after fixing this cluster back to fully functional replica3. I will upgrade to 3.12.x and then to version 5 or 6. >> Sounds good. >> >> If you are planning to have the same name for the new brick and if you get the error like "Brick may be containing or be contained by an existing brick" even after using the force option, try using a different name. That should work. >> >> Regards, >> Karthik >> >> BR, >> Martin >> >>> Regards, >>> Karthik >>> I had some problems in past when healing was triggered, VM disks became unresponsive because healing took most of I/O. My volume containing only big files with VM disks. >>> >>> Thanks for suggestions. >>> BR, >>> Martin >>> >>>> On 10 Apr 2019, at 12:38, Martin Toth > wrote: >>>> >>>> Thanks, this looks ok to me, I will reset brick because I don't have any data anymore on failed node so I can use same path / brick name. >>>> >>>> Is reseting brick dangerous command? Should I be worried about some possible failure that will impact remaining two nodes? I am running really old 3.7.6 but stable version. >>>> >>>> Thanks, >>>> BR! >>>> >>>> Martin >>>> >>>> >>>>> On 10 Apr 2019, at 12:20, Karthik Subrahmanya > wrote: >>>>> >>>>> Hi Martin, >>>>> >>>>> After you add the new disks and creating raid array, you can run the following command to replace the old brick with new one: >>>>> >>>>> - If you are going to use a different name to the new brick you can run >>>>> gluster volume replace-brick commit force >>>>> >>>>> - If you are planning to use the same name for the new brick as well then you can use >>>>> gluster volume reset-brick commit force >>>>> Here old-brick & new-brick's hostname & path should be same. >>>>> >>>>> After replacing the brick, make sure the brick comes online using volume status. >>>>> Heal should automatically start, you can check the heal status to see all the files gets replicated to the newly added brick. If it does not start automatically, you can manually start that by running gluster volume heal . >>>>> >>>>> HTH, >>>>> Karthik >>>>> >>>>> On Wed, Apr 10, 2019 at 3:13 PM Martin Toth > wrote: >>>>> Hi all, >>>>> >>>>> I am running replica 3 gluster with 3 bricks. One of my servers failed - all disks are showing errors and raid is in fault state. >>>>> >>>>> Type: Replicate >>>>> Volume ID: 41d5c283-3a74-4af8-a55d-924447bfa59a >>>>> Status: Started >>>>> Number of Bricks: 1 x 3 = 3 >>>>> Transport-type: tcp >>>>> Bricks: >>>>> Brick1: node1.san:/tank/gluster/gv0imagestore/brick1 >>>>> Brick2: node2.san:/tank/gluster/gv0imagestore/brick1 >>>> Brick3: node3.san:/tank/gluster/gv0imagestore/brick1 >>>>> >>>>> So one of my bricks is totally failed (node2). 
It went down and all data are lost (failed raid on node2). Now I am running only two bricks on 2 servers out from 3. >>>>> This is really critical problem for us, we can lost all data. I want to add new disks to node2, create new raid array on them and try to replace failed brick on this node. >>>>> >>>>> What is the procedure of replacing Brick2 on node2, can someone advice? I can?t find anything relevant in documentation. >>>>> >>>>> Thanks in advance, >>>>> Martin >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Sat Apr 20 10:37:51 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Sat, 20 Apr 2019 13:37:51 +0300 Subject: [Gluster-users] adding thin arbiter Message-ID: The docs still do not clarify it, but thin arbiter is only supported by GlusterD2 (for now). Best Regards, Strahil NikolovOn Apr 19, 2019 18:03, xpk at headdesk.me wrote: > > Hi guys, > > On an existing volume, I have a volume with 3 replica. One of them is an arbiter. Is there a way to change the arbiter to a thin-arbiter? I tried removing the arbiter brick and add it back, but the add-brick command does't take the --thin-arbiter option. > > xpk -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrickmrennie at gmail.com Sat Apr 20 13:22:13 2019 From: patrickmrennie at gmail.com (Patrick Rennie) Date: Sat, 20 Apr 2019 21:22:13 +0800 Subject: [Gluster-users] Extremely slow cluster performance Message-ID: Hello Gluster Users, I am hoping someone can help me with resolving an ongoing issue I've been having, I'm new to mailing lists so forgive me if I have gotten anything wrong. We have noticed our performance deteriorating over the last few weeks, easily measured by trying to do an ls on one of our top-level folders, and timing it, which usually would take 2-5 seconds, and now takes up to 20 minutes, which obviously renders our cluster basically unusable. This has been intermittent in the past but is now almost constant and I am not sure how to work out the exact cause. We have noticed some errors in the brick logs, and have noticed that if we kill the right brick process, performance instantly returns back to normal, this is not always the same brick, but it indicates to me something in the brick processes or background tasks may be causing extreme latency. Due to this ability to fix it by killing the right brick process off, I think it's a specific file, or folder, or operation which may be hanging and causing the increased latency, but I am not sure how to work it out. One last thing to add is that our bricks are getting quite full (~95% full), we are trying to migrate data off to new storage but that is going slowly, not helped by this issue. I am currently trying to run a full heal as there appear to be many files needing healing, and I have all brick processes running so they have an opportunity to heal, but this means performance is very poor. It currently takes over 15-20 minutes to do an ls of one of our top-level folders, which just contains 60-80 other folders, this should take 2-5 seconds. This is all being checked by FUSE mount locally on the storage node itself, but it is the same for other clients and VMs accessing the cluster. 
Initially, it seemed our NFS mounts were not affected and operated at normal speed, but testing over the last day has shown that our NFS clients are also extremely slow, so it doesn't seem specific to FUSE as I first thought it might be. I am not sure how to proceed from here, I am fairly new to gluster having inherited this setup from my predecessor and trying to keep it going. I have included some info below to try and help with diagnosis, please let me know if any further info would be helpful. I would really appreciate any advice on what I could try to work out the cause. Thank you in advance for reading this, and any suggestions you might be able to offer. - Patrick This is an example of the main error I see in our brick logs, there have been others, I can post them when I see them again too: [2019-04-20 04:54:43.055680] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick1/ library: system.posix_acl_default [Operation not supported] [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag) Our setup consists of 2 storage nodes and an arbiter node. I have noticed our nodes are on slightly different versions, I'm not sure if this could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - total capacity is around 560TB. We have bonded 10gbps NICS on each node, and I have tested bandwidth with iperf and found that it's what would be expected from this config. Individual brick performance seems ok, I've tested several bricks using dd and can write a 10GB files at 1.7GB/s. # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 10000+0 records in 10000+0 records out 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s Node 1: # glusterfs --version glusterfs 3.12.15 Node 2: # glusterfs --version glusterfs 3.12.14 Arbiter: # glusterfs --version glusterfs 3.12.14 Here is our gluster volume status: # gluster volume status Status of volume: gvAA01 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 Brick 00-A:/arbiterAA01/gvAA01/bri ck1 49152 0 Y 6931 Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 Brick 00-A:/arbiterAA01/gvAA01/bri ck2 49153 0 Y 6939 Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 Brick 00-A:/arbiterAA01/gvAA01/bri ck3 49154 0 Y 6947 Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 Brick 00-A:/arbiterAA01/gvAA01/bri ck4 49155 0 Y 6956 Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 Brick 00-A:/arbiterAA01/gvAA01/bri ck5 49156 0 Y 6964 Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 Brick 00-A:/arbiterAA01/gvAA01/bri ck6 49157 0 Y 6974 Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 Brick 00-A:/arbiterAA01/gvAA01/bri ck7 49158 0 Y 6984 Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 Brick 00-A:/arbiterAA01/gvAA01/bri ck8 49159 0 Y 6993 Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 Brick 00-A:/arbiterAA01/gvAA01/bri ck9 49160 0 Y 7001 NFS Server on localhost 2049 0 Y 17276 
Self-heal Daemon on localhost N/A N/A Y 25245 NFS Server on 02-B 2049 0 Y 9089 Self-heal Daemon on 02-B N/A N/A Y 17838 NFS Server on 00-a 2049 0 Y 15660 Self-heal Daemon on 00-a N/A N/A Y 16218 Task Status of Volume gvAA01 ------------------------------------------------------------------------------ There are no active volume tasks And gluster volume info: # gluster volume info Volume Name: gvAA01 Type: Distributed-Replicate Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 Status: Started Snapshot Count: 0 Number of Bricks: 9 x (2 + 1) = 27 Transport-type: tcp Bricks: Brick1: 01-B:/brick1/gvAA01/brick Brick2: 02-B:/brick1/gvAA01/brick Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) Brick4: 01-B:/brick2/gvAA01/brick Brick5: 02-B:/brick2/gvAA01/brick Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) Brick7: 01-B:/brick3/gvAA01/brick Brick8: 02-B:/brick3/gvAA01/brick Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) Brick10: 01-B:/brick4/gvAA01/brick Brick11: 02-B:/brick4/gvAA01/brick Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) Brick13: 01-B:/brick5/gvAA01/brick Brick14: 02-B:/brick5/gvAA01/brick Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) Brick16: 01-B:/brick6/gvAA01/brick Brick17: 02-B:/brick6/gvAA01/brick Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) Brick19: 01-B:/brick7/gvAA01/brick Brick20: 02-B:/brick7/gvAA01/brick Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) Brick22: 01-B:/brick8/gvAA01/brick Brick23: 02-B:/brick8/gvAA01/brick Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) Brick25: 01-B:/brick9/gvAA01/brick Brick26: 02-B:/brick9/gvAA01/brick Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) Options Reconfigured: cluster.shd-max-threads: 4 performance.least-prio-threads: 16 cluster.readdir-optimize: on performance.quick-read: off performance.stat-prefetch: off cluster.data-self-heal: on cluster.lookup-unhashed: auto cluster.lookup-optimize: on cluster.favorite-child-policy: mtime server.allow-insecure: on transport.address-family: inet client.bind-insecure: on cluster.entry-self-heal: off cluster.metadata-self-heal: off performance.md-cache-timeout: 600 cluster.self-heal-daemon: enable performance.readdir-ahead: on diagnostics.brick-log-level: INFO nfs.disable: off Thank you for any assistance. - Patrick -------------- next part -------------- An HTML attachment was scrubbed... URL: From budic at onholyground.com Sat Apr 20 14:50:08 2019 From: budic at onholyground.com (Darrell Budic) Date: Sat, 20 Apr 2019 09:50:08 -0500 Subject: [Gluster-users] Extremely slow cluster performance In-Reply-To: References: Message-ID: Patrick, I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You also mention ZFS, and that error you show makes me think you need to check to be sure you have ?xattr=sa? and ?acltype=posixacl? set on your ZFS volumes. You also observed your bricks are crossing the 95% full line, ZFS performance will degrade significantly the closer you get to full. In my experience, this starts somewhere between 10% and 5% free space remaining, so you?re in that realm. How?s your free memory on the servers doing? Do you have your zfs arc cache limited to something less than all the RAM? It shares pretty well, but I?ve encountered situations where other things won?t try and take ram back properly if they think it?s in use, so ZFS never gets the opportunity to give it up. Since your volume is a disperse-replica, you might try tuning disperse.shd-max-threads, default is 1, I?d try it at 2, 4, or even more if the CPUs are beefy enough. 
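Worth noting the volume info above shows Type: Distributed-Replicate with cluster.shd-max-threads already set to 4, so for this volume it is the cluster.* option that applies rather than the disperse one; bumping it further would be something along the lines of:

# gluster volume set gvAA01 cluster.shd-max-threads 8     # currently 4 per the options listed above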
And setting server.event-threads to 4 and client.event-threads to 8 has proven helpful in many cases. After you get upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I don?t know if it matters, but I?d also recommend resetting performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or also setting performance.io -thread-count to 32 if those have beefy CPUs. Beyond those general ideas, more info about your hardware (CPU and RAM) and workload (VMs, direct storage for web servers or enders, etc) may net you some more ideas. Then you?re going to have to do more digging into brick logs looking for errors and/or warnings to see what?s going on. -Darrell > On Apr 20, 2019, at 8:22 AM, Patrick Rennie wrote: > > Hello Gluster Users, > > I am hoping someone can help me with resolving an ongoing issue I've been having, I'm new to mailing lists so forgive me if I have gotten anything wrong. We have noticed our performance deteriorating over the last few weeks, easily measured by trying to do an ls on one of our top-level folders, and timing it, which usually would take 2-5 seconds, and now takes up to 20 minutes, which obviously renders our cluster basically unusable. This has been intermittent in the past but is now almost constant and I am not sure how to work out the exact cause. We have noticed some errors in the brick logs, and have noticed that if we kill the right brick process, performance instantly returns back to normal, this is not always the same brick, but it indicates to me something in the brick processes or background tasks may be causing extreme latency. Due to this ability to fix it by killing the right brick process off, I think it's a specific file, or folder, or operation which may be hanging and causing the increased latency, but I am not sure how to work it out. One last thing to add is that our bricks are getting quite full (~95% full), we are trying to migrate data off to new storage but that is going slowly, not helped by this issue. I am currently trying to run a full heal as there appear to be many files needing healing, and I have all brick processes running so they have an opportunity to heal, but this means performance is very poor. It currently takes over 15-20 minutes to do an ls of one of our top-level folders, which just contains 60-80 other folders, this should take 2-5 seconds. This is all being checked by FUSE mount locally on the storage node itself, but it is the same for other clients and VMs accessing the cluster. Initially, it seemed our NFS mounts were not affected and operated at normal speed, but testing over the last day has shown that our NFS clients are also extremely slow, so it doesn't seem specific to FUSE as I first thought it might be. > > I am not sure how to proceed from here, I am fairly new to gluster having inherited this setup from my predecessor and trying to keep it going. I have included some info below to try and help with diagnosis, please let me know if any further info would be helpful. I would really appreciate any advice on what I could try to work out the cause. Thank you in advance for reading this, and any suggestions you might be able to offer. 
> > - Patrick > > This is an example of the main error I see in our brick logs, there have been others, I can post them when I see them again too: > [2019-04-20 04:54:43.055680] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick1/ library: system.posix_acl_default [Operation not supported] > [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag) > > Our setup consists of 2 storage nodes and an arbiter node. I have noticed our nodes are on slightly different versions, I'm not sure if this could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - total capacity is around 560TB. > We have bonded 10gbps NICS on each node, and I have tested bandwidth with iperf and found that it's what would be expected from this config. > Individual brick performance seems ok, I've tested several bricks using dd and can write a 10GB files at 1.7GB/s. > > # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 > 10000+0 records in > 10000+0 records out > 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s > > Node 1: > # glusterfs --version > glusterfs 3.12.15 > > Node 2: > # glusterfs --version > glusterfs 3.12.14 > > Arbiter: > # glusterfs --version > glusterfs 3.12.14 > > Here is our gluster volume status: > > # gluster volume status > Status of volume: gvAA01 > Gluster process TCP Port RDMA Port Online Pid > ------------------------------------------------------------------------------ > Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 > Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck1 49152 0 Y 6931 > Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 > Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck2 49153 0 Y 6939 > Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 > Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck3 49154 0 Y 6947 > Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 > Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck4 49155 0 Y 6956 > Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 > Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck5 49156 0 Y 6964 > Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 > Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck6 49157 0 Y 6974 > Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 > Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck7 49158 0 Y 6984 > Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 > Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck8 49159 0 Y 6993 > Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 > Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck9 49160 0 Y 7001 > NFS Server on localhost 2049 0 Y 17276 > Self-heal Daemon on localhost N/A N/A Y 25245 > NFS Server on 02-B 2049 0 Y 9089 > Self-heal Daemon on 02-B N/A N/A Y 17838 > NFS Server on 00-a 2049 0 Y 15660 > Self-heal Daemon on 00-a N/A N/A Y 16218 > > Task Status of Volume gvAA01 > ------------------------------------------------------------------------------ > There are no active volume tasks > > And gluster volume info: > > # gluster volume info > > Volume Name: gvAA01 > Type: Distributed-Replicate > Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 > Status: Started > 
Snapshot Count: 0 > Number of Bricks: 9 x (2 + 1) = 27 > Transport-type: tcp > Bricks: > Brick1: 01-B:/brick1/gvAA01/brick > Brick2: 02-B:/brick1/gvAA01/brick > Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) > Brick4: 01-B:/brick2/gvAA01/brick > Brick5: 02-B:/brick2/gvAA01/brick > Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) > Brick7: 01-B:/brick3/gvAA01/brick > Brick8: 02-B:/brick3/gvAA01/brick > Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) > Brick10: 01-B:/brick4/gvAA01/brick > Brick11: 02-B:/brick4/gvAA01/brick > Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) > Brick13: 01-B:/brick5/gvAA01/brick > Brick14: 02-B:/brick5/gvAA01/brick > Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) > Brick16: 01-B:/brick6/gvAA01/brick > Brick17: 02-B:/brick6/gvAA01/brick > Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) > Brick19: 01-B:/brick7/gvAA01/brick > Brick20: 02-B:/brick7/gvAA01/brick > Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) > Brick22: 01-B:/brick8/gvAA01/brick > Brick23: 02-B:/brick8/gvAA01/brick > Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) > Brick25: 01-B:/brick9/gvAA01/brick > Brick26: 02-B:/brick9/gvAA01/brick > Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) > Options Reconfigured: > cluster.shd-max-threads: 4 > performance.least-prio-threads: 16 > cluster.readdir-optimize: on > performance.quick-read: off > performance.stat-prefetch: off > cluster.data-self-heal: on > cluster.lookup-unhashed: auto > cluster.lookup-optimize: on > cluster.favorite-child-policy: mtime > server.allow-insecure: on > transport.address-family: inet > client.bind-insecure: on > cluster.entry-self-heal: off > cluster.metadata-self-heal: off > performance.md-cache-timeout: 600 > cluster.self-heal-daemon: enable > performance.readdir-ahead: on > diagnostics.brick-log-level: INFO > nfs.disable: off > > Thank you for any assistance. > > - Patrick > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrickmrennie at gmail.com Sat Apr 20 15:09:23 2019 From: patrickmrennie at gmail.com (Patrick Rennie) Date: Sat, 20 Apr 2019 23:09:23 +0800 Subject: [Gluster-users] Extremely slow cluster performance In-Reply-To: References: Message-ID: Hi Darrell, Thanks for your reply, this issue seems to be getting worse over the last few days, really has me tearing my hair out. I will do as you have suggested and get started on upgrading from 3.12.14 to 3.12.15. I've checked the zfs properties and all bricks have "xattr=sa" set, but none of them has "acltype=posixacl" set, currently the acltype property shows "off", if I make these changes will it apply retroactively to the existing data? I'm unfamiliar with what this will change so I may need to look into that before I proceed. I understand performance is going to slow down as the bricks get full, I am currently trying to free space and migrate data to some newer storage, I have fresh several hundred TB storage I just setup recently but with these performance issues it's really slow. I also believe there is significant data which has been deleted directly from the bricks in the past, so if I can reclaim this space in a safe manner then I will have at least around 10-15% free space. 
These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so generally they have plenty of resources available, currently only using around 330/512GB of memory. I will look into what your suggested settings will change, and then will probably go ahead with your recommendations, for our specs as stated above, what would you suggest for performance.io-thread-count ? Our workload is nothing too extreme, we have a few VMs which write backup data to this storage nightly for our clients, our VMs don't live on this cluster, but just write to it. I've been going through all of the logs I can, below are some slightly sanitized errors I've come across, but I'm not sure what to make of them. The main error I am seeing is the first one below, across several of my bricks, but possibly only for specific folders on the cluster, I'm not 100% about that yet though. [2019-04-20 05:56:59.512649] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] [2019-04-20 05:59:06.084333] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] [2019-04-20 05:59:43.289030] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] [2019-04-20 05:59:50.582257] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] [2019-04-20 06:01:42.501701] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag) [2019-04-20 13:12:36.131856] E [MSGID: 113002] [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for /xxxxxxxxxxxxxxxxxxxx [Invalid argument] [2019-04-20 13:12:36.131959] E [MSGID: 113002] [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] [2019-04-20 13:12:36.132016] E [MSGID: 115050] [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud Backup_clone1.vbm_62906_tmp), client: 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: gvAA01-posix [No data available] [2019-04-20 13:12:38.093719] E [MSGID: 115050] [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud Backup_clone1.vbm_62906_tmp), client: 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: gvAA01-posix [No data available] [2019-04-20 13:12:38.093660] E [MSGID: 113002] [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for /xxxxxxxxxxxxxxxxxxxx [Invalid argument] [2019-04-20 13:12:38.093696] E [MSGID: 113002] [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx [No data available] [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, by 980fdbbd367f0000 on 0x7fc4f0161440 [2019-04-20 14:25:59.654668] E [MSGID: 115053] 
[server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), client: cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, error-xlator: gvAA01-locks [Invalid argument] [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) [0x7ff4ae6f796a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) [0x7ff4ae2a96e8] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) [0x7ff4ae28528d] ) 0-: Reply submission failed Thank you again for your assistance. It is greatly appreciated. - Patrick On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic wrote: > Patrick, > > I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You > also mention ZFS, and that error you show makes me think you need to check > to be sure you have ?xattr=sa? and ?acltype=posixacl? set on your ZFS > volumes. > > You also observed your bricks are crossing the 95% full line, ZFS > performance will degrade significantly the closer you get to full. In my > experience, this starts somewhere between 10% and 5% free space remaining, > so you?re in that realm. > > How?s your free memory on the servers doing? Do you have your zfs arc > cache limited to something less than all the RAM? It shares pretty well, > but I?ve encountered situations where other things won?t try and take ram > back properly if they think it?s in use, so ZFS never gets the opportunity > to give it up. > > Since your volume is a disperse-replica, you might try tuning > disperse.shd-max-threads, default is 1, I?d try it at 2, 4, or even more if > the CPUs are beefy enough. And setting server.event-threads to 4 and > client.event-threads to 8 has proven helpful in many cases. After you get > upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I > don?t know if it matters, but I?d also recommend resetting > performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or > also setting performance.io-thread-count to 32 if those have beefy CPUs. > > Beyond those general ideas, more info about your hardware (CPU and RAM) > and workload (VMs, direct storage for web servers or enders, etc) may net > you some more ideas. Then you?re going to have to do more digging into > brick logs looking for errors and/or warnings to see what?s going on. > > -Darrell > > > On Apr 20, 2019, at 8:22 AM, Patrick Rennie > wrote: > > Hello Gluster Users, > > I am hoping someone can help me with resolving an ongoing issue I've been > having, I'm new to mailing lists so forgive me if I have gotten anything > wrong. We have noticed our performance deteriorating over the last few > weeks, easily measured by trying to do an ls on one of our top-level > folders, and timing it, which usually would take 2-5 seconds, and now takes > up to 20 minutes, which obviously renders our cluster basically unusable. > This has been intermittent in the past but is now almost constant and I am > not sure how to work out the exact cause. 
We have noticed some errors in > the brick logs, and have noticed that if we kill the right brick process, > performance instantly returns back to normal, this is not always the same > brick, but it indicates to me something in the brick processes or > background tasks may be causing extreme latency. Due to this ability to fix > it by killing the right brick process off, I think it's a specific file, or > folder, or operation which may be hanging and causing the increased > latency, but I am not sure how to work it out. One last thing to add is > that our bricks are getting quite full (~95% full), we are trying to > migrate data off to new storage but that is going slowly, not helped by > this issue. I am currently trying to run a full heal as there appear to be > many files needing healing, and I have all brick processes running so they > have an opportunity to heal, but this means performance is very poor. It > currently takes over 15-20 minutes to do an ls of one of our top-level > folders, which just contains 60-80 other folders, this should take 2-5 > seconds. This is all being checked by FUSE mount locally on the storage > node itself, but it is the same for other clients and VMs accessing the > cluster. Initially, it seemed our NFS mounts were not affected and operated > at normal speed, but testing over the last day has shown that our NFS > clients are also extremely slow, so it doesn't seem specific to FUSE as I > first thought it might be. > > I am not sure how to proceed from here, I am fairly new to gluster having > inherited this setup from my predecessor and trying to keep it going. I > have included some info below to try and help with diagnosis, please let me > know if any further info would be helpful. I would really appreciate any > advice on what I could try to work out the cause. Thank you in advance for > reading this, and any suggestions you might be able to offer. > > - Patrick > > This is an example of the main error I see in our brick logs, there have > been others, I can post them when I see them again too: > [2019-04-20 04:54:43.055680] E [MSGID: 113001] > [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on > /brick1/ library: system.posix_acl_default [Operation not > supported] > [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] > 0-gvAA01-posix: Extended attributes not supported (try remounting brick > with 'user_xattr' flag) > > Our setup consists of 2 storage nodes and an arbiter node. I have noticed > our nodes are on slightly different versions, I'm not sure if this could be > an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - > total capacity is around 560TB. > We have bonded 10gbps NICS on each node, and I have tested bandwidth with > iperf and found that it's what would be expected from this config. > Individual brick performance seems ok, I've tested several bricks using dd > and can write a 10GB files at 1.7GB/s. 
> > # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 > 10000+0 records in > 10000+0 records out > 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s > > Node 1: > # glusterfs --version > glusterfs 3.12.15 > > Node 2: > # glusterfs --version > glusterfs 3.12.14 > > Arbiter: > # glusterfs --version > glusterfs 3.12.14 > > Here is our gluster volume status: > > # gluster volume status > Status of volume: gvAA01 > Gluster process TCP Port RDMA Port Online > Pid > > ------------------------------------------------------------------------------ > Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 > Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck1 49152 0 Y > 6931 > Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 > Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck2 49153 0 Y > 6939 > Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 > Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck3 49154 0 Y > 6947 > Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 > Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck4 49155 0 Y > 6956 > Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 > Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck5 49156 0 Y > 6964 > Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 > Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck6 49157 0 Y > 6974 > Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 > Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck7 49158 0 Y > 6984 > Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 > Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck8 49159 0 Y > 6993 > Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 > Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck9 49160 0 Y > 7001 > NFS Server on localhost 2049 0 Y > 17276 > Self-heal Daemon on localhost N/A N/A Y > 25245 > NFS Server on 02-B 2049 0 Y 9089 > Self-heal Daemon on 02-B N/A N/A Y 17838 > NFS Server on 00-a 2049 0 Y 15660 > Self-heal Daemon on 00-a N/A N/A Y 16218 > > Task Status of Volume gvAA01 > > ------------------------------------------------------------------------------ > There are no active volume tasks > > And gluster volume info: > > # gluster volume info > > Volume Name: gvAA01 > Type: Distributed-Replicate > Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 > Status: Started > Snapshot Count: 0 > Number of Bricks: 9 x (2 + 1) = 27 > Transport-type: tcp > Bricks: > Brick1: 01-B:/brick1/gvAA01/brick > Brick2: 02-B:/brick1/gvAA01/brick > Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) > Brick4: 01-B:/brick2/gvAA01/brick > Brick5: 02-B:/brick2/gvAA01/brick > Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) > Brick7: 01-B:/brick3/gvAA01/brick > Brick8: 02-B:/brick3/gvAA01/brick > Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) > Brick10: 01-B:/brick4/gvAA01/brick > Brick11: 02-B:/brick4/gvAA01/brick > Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) > Brick13: 01-B:/brick5/gvAA01/brick > Brick14: 02-B:/brick5/gvAA01/brick > Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) > Brick16: 01-B:/brick6/gvAA01/brick > Brick17: 02-B:/brick6/gvAA01/brick > Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) > Brick19: 01-B:/brick7/gvAA01/brick > Brick20: 02-B:/brick7/gvAA01/brick > Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) > Brick22: 
01-B:/brick8/gvAA01/brick > Brick23: 02-B:/brick8/gvAA01/brick > Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) > Brick25: 01-B:/brick9/gvAA01/brick > Brick26: 02-B:/brick9/gvAA01/brick > Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) > Options Reconfigured: > cluster.shd-max-threads: 4 > performance.least-prio-threads: 16 > cluster.readdir-optimize: on > performance.quick-read: off > performance.stat-prefetch: off > cluster.data-self-heal: on > cluster.lookup-unhashed: auto > cluster.lookup-optimize: on > cluster.favorite-child-policy: mtime > server.allow-insecure: on > transport.address-family: inet > client.bind-insecure: on > cluster.entry-self-heal: off > cluster.metadata-self-heal: off > performance.md-cache-timeout: 600 > cluster.self-heal-daemon: enable > performance.readdir-ahead: on > diagnostics.brick-log-level: INFO > nfs.disable: off > > Thank you for any assistance. > > - Patrick > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From budic at onholyground.com Sat Apr 20 15:57:37 2019 From: budic at onholyground.com (Darrell Budic) Date: Sat, 20 Apr 2019 10:57:37 -0500 Subject: [Gluster-users] Extremely slow cluster performance In-Reply-To: References: Message-ID: <93FC9B39-2E8C-4579-8C9D-DEF1A28B7384@onholyground.com> See inline: > On Apr 20, 2019, at 10:09 AM, Patrick Rennie wrote: > > Hi Darrell, > > Thanks for your reply, this issue seems to be getting worse over the last few days, really has me tearing my hair out. I will do as you have suggested and get started on upgrading from 3.12.14 to 3.12.15. > I've checked the zfs properties and all bricks have "xattr=sa" set, but none of them has "acltype=posixacl" set, currently the acltype property shows "off", if I make these changes will it apply retroactively to the existing data? I'm unfamiliar with what this will change so I may need to look into that before I proceed. It is safe to apply that now, any new set/get calls will then use it if new posixacls exist, and use older if not. ZFS is good that way. It should clear up your posix_acl and posix errors over time. > I understand performance is going to slow down as the bricks get full, I am currently trying to free space and migrate data to some newer storage, I have fresh several hundred TB storage I just setup recently but with these performance issues it's really slow. I also believe there is significant data which has been deleted directly from the bricks in the past, so if I can reclaim this space in a safe manner then I will have at least around 10-15% free space. Full ZFS volumes will have a much larger impact on performance than you?d think, I?d prioritize this. If you have been taking zfs snapshots, consider deleting them to get the overall volume free space back up. And just to be sure it?s been said, delete from within the mounted volumes, don?t delete directly from the bricks (gluster will just try and heal it later, compounding your issues). Does not apply to deleting other data from the ZFS volume if it?s not part of the brick directory, of course. > These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so generally they have plenty of resources available, currently only using around 330/512GB of memory. 
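If you want to see how much of that is ZFS ARC and put a ceiling on it, roughly (the 256 GiB figure is just an example):

# grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats                            # current ARC size and ceiling, in bytes
# echo $((256 * 1024**3)) > /sys/module/zfs/parameters/zfs_arc_max                # cap it on the fly
# echo "options zfs zfs_arc_max=$((256 * 1024**3))" >> /etc/modprobe.d/zfs.conf   # and make it persistent across reboots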
> > I will look into what your suggested settings will change, and then will probably go ahead with your recommendations, for our specs as stated above, what would you suggest for performance.io-thread-count ? I run single 2630v4s on my servers, which have a smaller storage footprint than yours. I?d go with 32 for performance.io -thread-count. I?d try 4 for the shd thread settings on that gear. Your memory use sounds fine, so no worries there. > Our workload is nothing too extreme, we have a few VMs which write backup data to this storage nightly for our clients, our VMs don't live on this cluster, but just write to it. If they are writing compressible data, you?ll get immediate benefit by setting compression=lz4 on your ZFS volumes. It won?t help any old data, of course, but it will compress new data going forward. This is another one that?s safe to enable on the fly. > I've been going through all of the logs I can, below are some slightly sanitized errors I've come across, but I'm not sure what to make of them. The main error I am seeing is the first one below, across several of my bricks, but possibly only for specific folders on the cluster, I'm not 100% about that yet though. > > [2019-04-20 05:56:59.512649] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] > [2019-04-20 05:59:06.084333] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] > [2019-04-20 05:59:43.289030] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] > [2019-04-20 05:59:50.582257] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] > [2019-04-20 06:01:42.501701] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] > [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag) > > > [2019-04-20 13:12:36.131856] E [MSGID: 113002] [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for /xxxxxxxxxxxxxxxxxxxx [Invalid argument] > [2019-04-20 13:12:36.131959] E [MSGID: 113002] [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] > [2019-04-20 13:12:36.132016] E [MSGID: 115050] [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud Backup_clone1.vbm_62906_tmp), client: 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: gvAA01-posix [No data available] > [2019-04-20 13:12:38.093719] E [MSGID: 115050] [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud Backup_clone1.vbm_62906_tmp), client: 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: gvAA01-posix [No data available] > [2019-04-20 13:12:38.093660] E [MSGID: 113002] [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for /xxxxxxxxxxxxxxxxxxxx [Invalid argument] > [2019-04-20 13:12:38.093696] E [MSGID: 113002] 
[posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx [No data available] > posixacls should clear those up, as mentioned. > > [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, by 980fdbbd367f0000 on 0x7fc4f0161440 > [2019-04-20 14:25:59.654668] E [MSGID: 115053] [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), client: cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, error-xlator: gvAA01-locks [Invalid argument] > > > [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) > [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) [0x7ff4ae6f796a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) [0x7ff4ae2a96e8] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) [0x7ff4ae28528d] ) 0-: Reply submission failed > Fix the posix acls and see if these clear up over time as well, I?m unclear on what the overall effect of running without the posix acls will be to total gluster health. Your biggest problem sounds like you need to free up space on the volumes and get the overall volume health back up to par and see if that doesn?t resolve the symptoms you?re seeing. > > Thank you again for your assistance. It is greatly appreciated. > > - Patrick > > > > On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic > wrote: > Patrick, > > I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You also mention ZFS, and that error you show makes me think you need to check to be sure you have ?xattr=sa? and ?acltype=posixacl? set on your ZFS volumes. > > You also observed your bricks are crossing the 95% full line, ZFS performance will degrade significantly the closer you get to full. In my experience, this starts somewhere between 10% and 5% free space remaining, so you?re in that realm. > > How?s your free memory on the servers doing? Do you have your zfs arc cache limited to something less than all the RAM? It shares pretty well, but I?ve encountered situations where other things won?t try and take ram back properly if they think it?s in use, so ZFS never gets the opportunity to give it up. > > Since your volume is a disperse-replica, you might try tuning disperse.shd-max-threads, default is 1, I?d try it at 2, 4, or even more if the CPUs are beefy enough. And setting server.event-threads to 4 and client.event-threads to 8 has proven helpful in many cases. After you get upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I don?t know if it matters, but I?d also recommend resetting performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or also setting performance.io -thread-count to 32 if those have beefy CPUs. > > Beyond those general ideas, more info about your hardware (CPU and RAM) and workload (VMs, direct storage for web servers or enders, etc) may net you some more ideas. Then you?re going to have to do more digging into brick logs looking for errors and/or warnings to see what?s going on. 
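For reference, those tunings map to volume-set commands roughly as follows (gvAA01 is the volume name from this thread; the values are only the starting points suggested above, not something verified on this cluster). One note: gvAA01 is a distributed-replicate volume with arbiters rather than a disperse volume, so the self-heal knob that applies is cluster.shd-max-threads, which the volume info earlier already shows set to 4.

# gluster volume set gvAA01 server.event-threads 4
# gluster volume set gvAA01 client.event-threads 8
# gluster volume set gvAA01 performance.io-thread-count 32
# gluster volume set gvAA01 performance.least-prio-threads 1
# gluster volume set gvAA01 cluster.shd-max-threads 4
# gluster volume get gvAA01 all | grep -E 'event-threads|io-thread-count|shd-max-threads'

All of these can be applied online; the final get is only there to confirm what the volume is actually running with afterwards.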
> > -Darrell > > >> On Apr 20, 2019, at 8:22 AM, Patrick Rennie > wrote: >> >> Hello Gluster Users, >> >> I am hoping someone can help me with resolving an ongoing issue I've been having, I'm new to mailing lists so forgive me if I have gotten anything wrong. We have noticed our performance deteriorating over the last few weeks, easily measured by trying to do an ls on one of our top-level folders, and timing it, which usually would take 2-5 seconds, and now takes up to 20 minutes, which obviously renders our cluster basically unusable. This has been intermittent in the past but is now almost constant and I am not sure how to work out the exact cause. We have noticed some errors in the brick logs, and have noticed that if we kill the right brick process, performance instantly returns back to normal, this is not always the same brick, but it indicates to me something in the brick processes or background tasks may be causing extreme latency. Due to this ability to fix it by killing the right brick process off, I think it's a specific file, or folder, or operation which may be hanging and causing the increased latency, but I am not sure how to work it out. One last thing to add is that our bricks are getting quite full (~95% full), we are trying to migrate data off to new storage but that is going slowly, not helped by this issue. I am currently trying to run a full heal as there appear to be many files needing healing, and I have all brick processes running so they have an opportunity to heal, but this means performance is very poor. It currently takes over 15-20 minutes to do an ls of one of our top-level folders, which just contains 60-80 other folders, this should take 2-5 seconds. This is all being checked by FUSE mount locally on the storage node itself, but it is the same for other clients and VMs accessing the cluster. Initially, it seemed our NFS mounts were not affected and operated at normal speed, but testing over the last day has shown that our NFS clients are also extremely slow, so it doesn't seem specific to FUSE as I first thought it might be. >> >> I am not sure how to proceed from here, I am fairly new to gluster having inherited this setup from my predecessor and trying to keep it going. I have included some info below to try and help with diagnosis, please let me know if any further info would be helpful. I would really appreciate any advice on what I could try to work out the cause. Thank you in advance for reading this, and any suggestions you might be able to offer. >> >> - Patrick >> >> This is an example of the main error I see in our brick logs, there have been others, I can post them when I see them again too: >> [2019-04-20 04:54:43.055680] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick1/ library: system.posix_acl_default [Operation not supported] >> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag) >> >> Our setup consists of 2 storage nodes and an arbiter node. I have noticed our nodes are on slightly different versions, I'm not sure if this could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - total capacity is around 560TB. >> We have bonded 10gbps NICS on each node, and I have tested bandwidth with iperf and found that it's what would be expected from this config. >> Individual brick performance seems ok, I've tested several bricks using dd and can write a 10GB files at 1.7GB/s. 
>> >> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >> 10000+0 records in >> 10000+0 records out >> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >> >> Node 1: >> # glusterfs --version >> glusterfs 3.12.15 >> >> Node 2: >> # glusterfs --version >> glusterfs 3.12.14 >> >> Arbiter: >> # glusterfs --version >> glusterfs 3.12.14 >> >> Here is our gluster volume status: >> >> # gluster volume status >> Status of volume: gvAA01 >> Gluster process TCP Port RDMA Port Online Pid >> ------------------------------------------------------------------------------ >> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck1 49152 0 Y 6931 >> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck2 49153 0 Y 6939 >> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck3 49154 0 Y 6947 >> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck4 49155 0 Y 6956 >> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck5 49156 0 Y 6964 >> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck6 49157 0 Y 6974 >> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck7 49158 0 Y 6984 >> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck8 49159 0 Y 6993 >> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck9 49160 0 Y 7001 >> NFS Server on localhost 2049 0 Y 17276 >> Self-heal Daemon on localhost N/A N/A Y 25245 >> NFS Server on 02-B 2049 0 Y 9089 >> Self-heal Daemon on 02-B N/A N/A Y 17838 >> NFS Server on 00-a 2049 0 Y 15660 >> Self-heal Daemon on 00-a N/A N/A Y 16218 >> >> Task Status of Volume gvAA01 >> ------------------------------------------------------------------------------ >> There are no active volume tasks >> >> And gluster volume info: >> >> # gluster volume info >> >> Volume Name: gvAA01 >> Type: Distributed-Replicate >> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 9 x (2 + 1) = 27 >> Transport-type: tcp >> Bricks: >> Brick1: 01-B:/brick1/gvAA01/brick >> Brick2: 02-B:/brick1/gvAA01/brick >> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >> Brick4: 01-B:/brick2/gvAA01/brick >> Brick5: 02-B:/brick2/gvAA01/brick >> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >> Brick7: 01-B:/brick3/gvAA01/brick >> Brick8: 02-B:/brick3/gvAA01/brick >> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >> Brick10: 01-B:/brick4/gvAA01/brick >> Brick11: 02-B:/brick4/gvAA01/brick >> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >> Brick13: 01-B:/brick5/gvAA01/brick >> Brick14: 02-B:/brick5/gvAA01/brick >> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >> Brick16: 01-B:/brick6/gvAA01/brick >> Brick17: 02-B:/brick6/gvAA01/brick >> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >> Brick19: 01-B:/brick7/gvAA01/brick >> Brick20: 
02-B:/brick7/gvAA01/brick >> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >> Brick22: 01-B:/brick8/gvAA01/brick >> Brick23: 02-B:/brick8/gvAA01/brick >> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >> Brick25: 01-B:/brick9/gvAA01/brick >> Brick26: 02-B:/brick9/gvAA01/brick >> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >> Options Reconfigured: >> cluster.shd-max-threads: 4 >> performance.least-prio-threads: 16 >> cluster.readdir-optimize: on >> performance.quick-read: off >> performance.stat-prefetch: off >> cluster.data-self-heal: on >> cluster.lookup-unhashed: auto >> cluster.lookup-optimize: on >> cluster.favorite-child-policy: mtime >> server.allow-insecure: on >> transport.address-family: inet >> client.bind-insecure: on >> cluster.entry-self-heal: off >> cluster.metadata-self-heal: off >> performance.md-cache-timeout: 600 >> cluster.self-heal-daemon: enable >> performance.readdir-ahead: on >> diagnostics.brick-log-level: INFO >> nfs.disable: off >> >> Thank you for any assistance. >> >> - Patrick >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrickmrennie at gmail.com Sat Apr 20 16:54:15 2019 From: patrickmrennie at gmail.com (Patrick Rennie) Date: Sun, 21 Apr 2019 00:54:15 +0800 Subject: [Gluster-users] Extremely slow cluster performance In-Reply-To: <93FC9B39-2E8C-4579-8C9D-DEF1A28B7384@onholyground.com> References: <93FC9B39-2E8C-4579-8C9D-DEF1A28B7384@onholyground.com> Message-ID: Hi Darrell, Thanks again for your advice, I've applied the acltype=posixacl on my zpools and I think that has reduced some of the noise from my brick logs. I also bumped up some of the thread counts you suggested but my CPU load skyrocketed, so I dropped it back down to something slightly lower, but still higher than it was before, and will see how that goes for a while. Although low space is a definite issue, if I run an ls anywhere on my bricks directly it's instant, <1 second, and still takes several minutes via gluster, so there is still a problem in my gluster configuration somewhere. We don't have any snapshots, but I am trying to work out if any data on there is safe to delete, or if there is any way I can safely find and delete data which has been removed directly from the bricks in the past. I also have lz4 compression already enabled on each zpool which does help a bit, we get between 1.05 and 1.08x compression on this data. I've tried to go through each client and checked it's cluster mount logs and also my brick logs and looking for errors, so far nothing is jumping out at me, but there are some warnings and errors here and there, I am trying to work out what they mean. It's already 1 am here and unfortunately, I'm still awake working on this issue, but I think that I will have to leave the version upgrades until tomorrow. Thanks again for your advice so far. If anyone has any ideas on where I can look for errors other than brick logs or the cluster mount logs to help resolve this issue, it would be much appreciated. Cheers, - Patrick On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic wrote: > See inline: > > On Apr 20, 2019, at 10:09 AM, Patrick Rennie > wrote: > > Hi Darrell, > > Thanks for your reply, this issue seems to be getting worse over the last > few days, really has me tearing my hair out. 
I will do as you have > suggested and get started on upgrading from 3.12.14 to 3.12.15. > I've checked the zfs properties and all bricks have "xattr=sa" set, but > none of them has "acltype=posixacl" set, currently the acltype property > shows "off", if I make these changes will it apply retroactively to the > existing data? I'm unfamiliar with what this will change so I may need to > look into that before I proceed. > > > It is safe to apply that now, any new set/get calls will then use it if > new posixacls exist, and use older if not. ZFS is good that way. It should > clear up your posix_acl and posix errors over time. > > I understand performance is going to slow down as the bricks get full, I > am currently trying to free space and migrate data to some newer storage, I > have fresh several hundred TB storage I just setup recently but with these > performance issues it's really slow. I also believe there is significant > data which has been deleted directly from the bricks in the past, so if I > can reclaim this space in a safe manner then I will have at least around > 10-15% free space. > > > Full ZFS volumes will have a much larger impact on performance than you?d > think, I?d prioritize this. If you have been taking zfs snapshots, consider > deleting them to get the overall volume free space back up. And just to be > sure it?s been said, delete from within the mounted volumes, don?t delete > directly from the bricks (gluster will just try and heal it later, > compounding your issues). Does not apply to deleting other data from the > ZFS volume if it?s not part of the brick directory, of course. > > These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so > generally they have plenty of resources available, currently only using > around 330/512GB of memory. > > I will look into what your suggested settings will change, and then will > probably go ahead with your recommendations, for our specs as stated above, > what would you suggest for performance.io-thread-count ? > > > I run single 2630v4s on my servers, which have a smaller storage footprint > than yours. I?d go with 32 for performance.io-thread-count. I?d try 4 for > the shd thread settings on that gear. Your memory use sounds fine, so no > worries there. > > Our workload is nothing too extreme, we have a few VMs which write backup > data to this storage nightly for our clients, our VMs don't live on this > cluster, but just write to it. > > > If they are writing compressible data, you?ll get immediate benefit by > setting compression=lz4 on your ZFS volumes. It won?t help any old data, of > course, but it will compress new data going forward. This is another one > that?s safe to enable on the fly. > > I've been going through all of the logs I can, below are some slightly > sanitized errors I've come across, but I'm not sure what to make of them. > The main error I am seeing is the first one below, across several of my > bricks, but possibly only for specific folders on the cluster, I'm not 100% > about that yet though. 
> > [2019-04-20 05:56:59.512649] E [MSGID: 113001] > [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on > /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not > supported] > [2019-04-20 05:59:06.084333] E [MSGID: 113001] > [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on > /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not > supported] > [2019-04-20 05:59:43.289030] E [MSGID: 113001] > [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on > /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not > supported] > [2019-04-20 05:59:50.582257] E [MSGID: 113001] > [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on > /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not > supported] > [2019-04-20 06:01:42.501701] E [MSGID: 113001] > [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on > /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not > supported] > [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] > 0-gvAA01-posix: Extended attributes not supported (try remounting brick > with 'user_xattr' flag) > > > [2019-04-20 13:12:36.131856] E [MSGID: 113002] > [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for > /xxxxxxxxxxxxxxxxxxxx [Invalid argument] > [2019-04-20 13:12:36.131959] E [MSGID: 113002] [posix.c:362:posix_lookup] > 0-gvAA01-posix: buf->ia_gfid is null for > /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] > [2019-04-20 13:12:36.132016] E [MSGID: 115050] > [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP > /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud > Backup_clone1.vbm_62906_tmp), client: > 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: > gvAA01-posix [No data available] > [2019-04-20 13:12:38.093719] E [MSGID: 115050] > [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP > /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud > Backup_clone1.vbm_62906_tmp), client: > 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: > gvAA01-posix [No data available] > [2019-04-20 13:12:38.093660] E [MSGID: 113002] > [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for > /xxxxxxxxxxxxxxxxxxxx [Invalid argument] > [2019-04-20 13:12:38.093696] E [MSGID: 113002] [posix.c:362:posix_lookup] > 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx [No > data available] > > > posixacls should clear those up, as mentioned. 
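A quick way to confirm the ACL change has taken effect on a node, without touching anything inside the brick data directory, is an ACL round-trip on a scratch file at the dataset's mountpoint (assuming the pool backing /brick7 is mounted there; the file name is arbitrary):

# touch /brick7/acltest
# setfacl -m u:nobody:r /brick7/acltest
# getfacl /brick7/acltest
# rm /brick7/acltest

With acltype=off the setfacl fails with "Operation not supported", matching the getxattr errors quoted above; once acltype=posixacl is set it succeeds and getfacl shows the extra entry.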
> > > [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] > 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, > by 980fdbbd367f0000 on 0x7fc4f0161440 > [2019-04-20 14:25:59.654668] E [MSGID: 115053] > [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: > INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), > client: > cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, > error-xlator: gvAA01-locks [Invalid argument] > > > [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) > [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) > [0x7ff4ae6f796a] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) > [0x7ff4ae2a96e8] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) > [0x7ff4ae28528d] ) 0-: Reply submission failed > > > Fix the posix acls and see if these clear up over time as well, I?m > unclear on what the overall effect of running without the posix acls will > be to total gluster health. Your biggest problem sounds like you need to > free up space on the volumes and get the overall volume health back up to > par and see if that doesn?t resolve the symptoms you?re seeing. > > > > Thank you again for your assistance. It is greatly appreciated. > > - Patrick > > > > On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic > wrote: > >> Patrick, >> >> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You >> also mention ZFS, and that error you show makes me think you need to check >> to be sure you have ?xattr=sa? and ?acltype=posixacl? set on your ZFS >> volumes. >> >> You also observed your bricks are crossing the 95% full line, ZFS >> performance will degrade significantly the closer you get to full. In my >> experience, this starts somewhere between 10% and 5% free space remaining, >> so you?re in that realm. >> >> How?s your free memory on the servers doing? Do you have your zfs arc >> cache limited to something less than all the RAM? It shares pretty well, >> but I?ve encountered situations where other things won?t try and take ram >> back properly if they think it?s in use, so ZFS never gets the opportunity >> to give it up. >> >> Since your volume is a disperse-replica, you might try tuning >> disperse.shd-max-threads, default is 1, I?d try it at 2, 4, or even more if >> the CPUs are beefy enough. And setting server.event-threads to 4 and >> client.event-threads to 8 has proven helpful in many cases. After you get >> upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I >> don?t know if it matters, but I?d also recommend resetting >> performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or >> also setting performance.io-thread-count to 32 if those have beefy CPUs. >> >> Beyond those general ideas, more info about your hardware (CPU and RAM) >> and workload (VMs, direct storage for web servers or enders, etc) may net >> you some more ideas. Then you?re going to have to do more digging into >> brick logs looking for errors and/or warnings to see what?s going on. 
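On the log-digging point, the daemon and brick logs all live under /var/log/glusterfs on each node, so a blunt first pass for errors and warnings can look something like this (brick log names follow the brick paths, e.g. brick1-gvAA01-brick.log; glusterd's own log may be glusterd.log or etc-glusterfs-glusterd.vol.log depending on the build):

# grep -E ' [EW] \[' /var/log/glusterfs/bricks/*.log | tail -50
# grep -E ' [EW] \[' /var/log/glusterfs/glustershd.log | tail -50
# grep -E ' [EW] \[' /var/log/glusterfs/glusterd.log | tail -50

The client-side view is in the mount log on each client, named after the mount point (e.g. /var/log/glusterfs/mnt-gvAA01.log), which is where the dht and setattr warnings quoted later in this thread show up.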
>> >> -Darrell >> >> >> On Apr 20, 2019, at 8:22 AM, Patrick Rennie >> wrote: >> >> Hello Gluster Users, >> >> I am hoping someone can help me with resolving an ongoing issue I've been >> having, I'm new to mailing lists so forgive me if I have gotten anything >> wrong. We have noticed our performance deteriorating over the last few >> weeks, easily measured by trying to do an ls on one of our top-level >> folders, and timing it, which usually would take 2-5 seconds, and now takes >> up to 20 minutes, which obviously renders our cluster basically unusable. >> This has been intermittent in the past but is now almost constant and I am >> not sure how to work out the exact cause. We have noticed some errors in >> the brick logs, and have noticed that if we kill the right brick process, >> performance instantly returns back to normal, this is not always the same >> brick, but it indicates to me something in the brick processes or >> background tasks may be causing extreme latency. Due to this ability to fix >> it by killing the right brick process off, I think it's a specific file, or >> folder, or operation which may be hanging and causing the increased >> latency, but I am not sure how to work it out. One last thing to add is >> that our bricks are getting quite full (~95% full), we are trying to >> migrate data off to new storage but that is going slowly, not helped by >> this issue. I am currently trying to run a full heal as there appear to be >> many files needing healing, and I have all brick processes running so they >> have an opportunity to heal, but this means performance is very poor. It >> currently takes over 15-20 minutes to do an ls of one of our top-level >> folders, which just contains 60-80 other folders, this should take 2-5 >> seconds. This is all being checked by FUSE mount locally on the storage >> node itself, but it is the same for other clients and VMs accessing the >> cluster. Initially, it seemed our NFS mounts were not affected and operated >> at normal speed, but testing over the last day has shown that our NFS >> clients are also extremely slow, so it doesn't seem specific to FUSE as I >> first thought it might be. >> >> I am not sure how to proceed from here, I am fairly new to gluster having >> inherited this setup from my predecessor and trying to keep it going. I >> have included some info below to try and help with diagnosis, please let me >> know if any further info would be helpful. I would really appreciate any >> advice on what I could try to work out the cause. Thank you in advance for >> reading this, and any suggestions you might be able to offer. >> >> - Patrick >> >> This is an example of the main error I see in our brick logs, there have >> been others, I can post them when I see them again too: >> [2019-04-20 04:54:43.055680] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick1/ library: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] >> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >> with 'user_xattr' flag) >> >> Our setup consists of 2 storage nodes and an arbiter node. I have noticed >> our nodes are on slightly different versions, I'm not sure if this could be >> an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - >> total capacity is around 560TB. 
>> We have bonded 10gbps NICS on each node, and I have tested bandwidth with >> iperf and found that it's what would be expected from this config. >> Individual brick performance seems ok, I've tested several bricks using >> dd and can write a 10GB files at 1.7GB/s. >> >> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >> 10000+0 records in >> 10000+0 records out >> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >> >> Node 1: >> # glusterfs --version >> glusterfs 3.12.15 >> >> Node 2: >> # glusterfs --version >> glusterfs 3.12.14 >> >> Arbiter: >> # glusterfs --version >> glusterfs 3.12.14 >> >> Here is our gluster volume status: >> >> # gluster volume status >> Status of volume: gvAA01 >> Gluster process TCP Port RDMA Port Online >> Pid >> >> ------------------------------------------------------------------------------ >> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck1 49152 0 Y >> 6931 >> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck2 49153 0 Y >> 6939 >> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck3 49154 0 Y >> 6947 >> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck4 49155 0 Y >> 6956 >> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck5 49156 0 Y >> 6964 >> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck6 49157 0 Y >> 6974 >> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck7 49158 0 Y >> 6984 >> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck8 49159 0 Y >> 6993 >> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck9 49160 0 Y >> 7001 >> NFS Server on localhost 2049 0 Y >> 17276 >> Self-heal Daemon on localhost N/A N/A Y >> 25245 >> NFS Server on 02-B 2049 0 Y 9089 >> Self-heal Daemon on 02-B N/A N/A Y 17838 >> NFS Server on 00-a 2049 0 Y 15660 >> Self-heal Daemon on 00-a N/A N/A Y 16218 >> >> Task Status of Volume gvAA01 >> >> ------------------------------------------------------------------------------ >> There are no active volume tasks >> >> And gluster volume info: >> >> # gluster volume info >> >> Volume Name: gvAA01 >> Type: Distributed-Replicate >> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 9 x (2 + 1) = 27 >> Transport-type: tcp >> Bricks: >> Brick1: 01-B:/brick1/gvAA01/brick >> Brick2: 02-B:/brick1/gvAA01/brick >> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >> Brick4: 01-B:/brick2/gvAA01/brick >> Brick5: 02-B:/brick2/gvAA01/brick >> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >> Brick7: 01-B:/brick3/gvAA01/brick >> Brick8: 02-B:/brick3/gvAA01/brick >> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >> Brick10: 01-B:/brick4/gvAA01/brick >> Brick11: 02-B:/brick4/gvAA01/brick >> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >> Brick13: 
01-B:/brick5/gvAA01/brick >> Brick14: 02-B:/brick5/gvAA01/brick >> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >> Brick16: 01-B:/brick6/gvAA01/brick >> Brick17: 02-B:/brick6/gvAA01/brick >> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >> Brick19: 01-B:/brick7/gvAA01/brick >> Brick20: 02-B:/brick7/gvAA01/brick >> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >> Brick22: 01-B:/brick8/gvAA01/brick >> Brick23: 02-B:/brick8/gvAA01/brick >> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >> Brick25: 01-B:/brick9/gvAA01/brick >> Brick26: 02-B:/brick9/gvAA01/brick >> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >> Options Reconfigured: >> cluster.shd-max-threads: 4 >> performance.least-prio-threads: 16 >> cluster.readdir-optimize: on >> performance.quick-read: off >> performance.stat-prefetch: off >> cluster.data-self-heal: on >> cluster.lookup-unhashed: auto >> cluster.lookup-optimize: on >> cluster.favorite-child-policy: mtime >> server.allow-insecure: on >> transport.address-family: inet >> client.bind-insecure: on >> cluster.entry-self-heal: off >> cluster.metadata-self-heal: off >> performance.md-cache-timeout: 600 >> cluster.self-heal-daemon: enable >> performance.readdir-ahead: on >> diagnostics.brick-log-level: INFO >> nfs.disable: off >> >> Thank you for any assistance. >> >> - Patrick >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From budic at onholyground.com Sat Apr 20 17:22:06 2019 From: budic at onholyground.com (Darrell Budic) Date: Sat, 20 Apr 2019 12:22:06 -0500 Subject: [Gluster-users] Extremely slow cluster performance In-Reply-To: References: <93FC9B39-2E8C-4579-8C9D-DEF1A28B7384@onholyground.com> Message-ID: Patrick, Sounds like progress. Be aware that gluster is expected to max out the CPUs on at least one of your servers while healing. This is normal and won?t adversely affect overall performance (any more than having bricks in need of healing, at any rate) unless you?re overdoing it. shd threads <= 4 should not do that on your hardware. Other tunings may have also increased overall performance, so you may see higher CPU than previously anyway. I?d recommend upping those thread counts and letting it heal as fast as possible, especially if these are dedicated Gluster storage servers (Ie: not also running VMs, etc). You should see ?normal? CPU use one heals are completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 cores). It?s also likely to be different between your servers, in a pure replica, one tends to max and one tends to be a little higher, in a distributed-replica, I?d expect more than one to run harder while healing. Keep the differences between doing an ls on a brick and doing an ls on a gluster mount in mind. When you do a ls on a gluster volume, it isn?t just doing a ls on one brick, it?s effectively doing it on ALL of your bricks, and they all have to return data before the ls succeeds. In a distributed volume, it?s figuring out where on each volume things live and getting the stat() from each to assemble the whole thing. And if things are in need of healing, it will take even longer to decide which version is current and use it (shd triggers a heal anytime it encounters this). Any of these things being slow slows down the overall response. 
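One way to keep an eye on heal progress without walking the full heal-info listing (which, as becomes clear later in the thread, can take hours at this scale) is the per-brick pending-entry count, assuming the statistics sub-command is available in this 3.12 build:

# gluster volume heal gvAA01 statistics heal-count

That prints how many entries each brick still has queued for self-heal, so the numbers can be watched trending down as the daemons catch up, without the cost of enumerating every path.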
At this point, I?d get some sleep too, and let your cluster heal while you do. I?d really want it fully healed before I did any updates anyway, so let it use CPU and get itself sorted out. Expect it to do a round of healing after you upgrade each machine too, this is normal so don?t let the CPU spike surprise you, It?s just catching up from the downtime incurred by the update and/or reboot if you did one. That reminds me, check your gluster cluster.op-version and cluster.max-op-version (gluster vol get all all | grep op-version). If op-version isn?t at the max-op-verison, set it to it so you?re taking advantage of the latest features available to your version. -Darrell > On Apr 20, 2019, at 11:54 AM, Patrick Rennie wrote: > > Hi Darrell, > > Thanks again for your advice, I've applied the acltype=posixacl on my zpools and I think that has reduced some of the noise from my brick logs. > I also bumped up some of the thread counts you suggested but my CPU load skyrocketed, so I dropped it back down to something slightly lower, but still higher than it was before, and will see how that goes for a while. > > Although low space is a definite issue, if I run an ls anywhere on my bricks directly it's instant, <1 second, and still takes several minutes via gluster, so there is still a problem in my gluster configuration somewhere. We don't have any snapshots, but I am trying to work out if any data on there is safe to delete, or if there is any way I can safely find and delete data which has been removed directly from the bricks in the past. I also have lz4 compression already enabled on each zpool which does help a bit, we get between 1.05 and 1.08x compression on this data. > I've tried to go through each client and checked it's cluster mount logs and also my brick logs and looking for errors, so far nothing is jumping out at me, but there are some warnings and errors here and there, I am trying to work out what they mean. > > It's already 1 am here and unfortunately, I'm still awake working on this issue, but I think that I will have to leave the version upgrades until tomorrow. > > Thanks again for your advice so far. If anyone has any ideas on where I can look for errors other than brick logs or the cluster mount logs to help resolve this issue, it would be much appreciated. > > Cheers, > > - Patrick > > On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic > wrote: > See inline: > >> On Apr 20, 2019, at 10:09 AM, Patrick Rennie > wrote: >> >> Hi Darrell, >> >> Thanks for your reply, this issue seems to be getting worse over the last few days, really has me tearing my hair out. I will do as you have suggested and get started on upgrading from 3.12.14 to 3.12.15. >> I've checked the zfs properties and all bricks have "xattr=sa" set, but none of them has "acltype=posixacl" set, currently the acltype property shows "off", if I make these changes will it apply retroactively to the existing data? I'm unfamiliar with what this will change so I may need to look into that before I proceed. > > It is safe to apply that now, any new set/get calls will then use it if new posixacls exist, and use older if not. ZFS is good that way. It should clear up your posix_acl and posix errors over time. > >> I understand performance is going to slow down as the bricks get full, I am currently trying to free space and migrate data to some newer storage, I have fresh several hundred TB storage I just setup recently but with these performance issues it's really slow. 
I also believe there is significant data which has been deleted directly from the bricks in the past, so if I can reclaim this space in a safe manner then I will have at least around 10-15% free space. > > Full ZFS volumes will have a much larger impact on performance than you?d think, I?d prioritize this. If you have been taking zfs snapshots, consider deleting them to get the overall volume free space back up. And just to be sure it?s been said, delete from within the mounted volumes, don?t delete directly from the bricks (gluster will just try and heal it later, compounding your issues). Does not apply to deleting other data from the ZFS volume if it?s not part of the brick directory, of course. > >> These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so generally they have plenty of resources available, currently only using around 330/512GB of memory. >> >> I will look into what your suggested settings will change, and then will probably go ahead with your recommendations, for our specs as stated above, what would you suggest for performance.io -thread-count ? > > I run single 2630v4s on my servers, which have a smaller storage footprint than yours. I?d go with 32 for performance.io -thread-count. I?d try 4 for the shd thread settings on that gear. Your memory use sounds fine, so no worries there. > >> Our workload is nothing too extreme, we have a few VMs which write backup data to this storage nightly for our clients, our VMs don't live on this cluster, but just write to it. > > If they are writing compressible data, you?ll get immediate benefit by setting compression=lz4 on your ZFS volumes. It won?t help any old data, of course, but it will compress new data going forward. This is another one that?s safe to enable on the fly. > >> I've been going through all of the logs I can, below are some slightly sanitized errors I've come across, but I'm not sure what to make of them. The main error I am seeing is the first one below, across several of my bricks, but possibly only for specific folders on the cluster, I'm not 100% about that yet though. 
>> >> [2019-04-20 05:56:59.512649] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] >> [2019-04-20 05:59:06.084333] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] >> [2019-04-20 05:59:43.289030] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] >> [2019-04-20 05:59:50.582257] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] >> [2019-04-20 06:01:42.501701] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] >> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag) >> >> >> [2019-04-20 13:12:36.131856] E [MSGID: 113002] [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >> [2019-04-20 13:12:36.131959] E [MSGID: 113002] [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] >> [2019-04-20 13:12:36.132016] E [MSGID: 115050] [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud Backup_clone1.vbm_62906_tmp), client: 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: gvAA01-posix [No data available] >> [2019-04-20 13:12:38.093719] E [MSGID: 115050] [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud Backup_clone1.vbm_62906_tmp), client: 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: gvAA01-posix [No data available] >> [2019-04-20 13:12:38.093660] E [MSGID: 113002] [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >> [2019-04-20 13:12:38.093696] E [MSGID: 113002] [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx [No data available] >> > > posixacls should clear those up, as mentioned. 
> >> >> [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, by 980fdbbd367f0000 on 0x7fc4f0161440 >> [2019-04-20 14:25:59.654668] E [MSGID: 115053] [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), client: cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, error-xlator: gvAA01-locks [Invalid argument] >> >> >> [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) >> [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) [0x7ff4ae6f796a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) [0x7ff4ae2a96e8] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) [0x7ff4ae28528d] ) 0-: Reply submission failed >> > > Fix the posix acls and see if these clear up over time as well, I?m unclear on what the overall effect of running without the posix acls will be to total gluster health. Your biggest problem sounds like you need to free up space on the volumes and get the overall volume health back up to par and see if that doesn?t resolve the symptoms you?re seeing. > > >> >> Thank you again for your assistance. It is greatly appreciated. >> >> - Patrick >> >> >> >> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic > wrote: >> Patrick, >> >> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You also mention ZFS, and that error you show makes me think you need to check to be sure you have ?xattr=sa? and ?acltype=posixacl? set on your ZFS volumes. >> >> You also observed your bricks are crossing the 95% full line, ZFS performance will degrade significantly the closer you get to full. In my experience, this starts somewhere between 10% and 5% free space remaining, so you?re in that realm. >> >> How?s your free memory on the servers doing? Do you have your zfs arc cache limited to something less than all the RAM? It shares pretty well, but I?ve encountered situations where other things won?t try and take ram back properly if they think it?s in use, so ZFS never gets the opportunity to give it up. >> >> Since your volume is a disperse-replica, you might try tuning disperse.shd-max-threads, default is 1, I?d try it at 2, 4, or even more if the CPUs are beefy enough. And setting server.event-threads to 4 and client.event-threads to 8 has proven helpful in many cases. After you get upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I don?t know if it matters, but I?d also recommend resetting performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or also setting performance.io -thread-count to 32 if those have beefy CPUs. >> >> Beyond those general ideas, more info about your hardware (CPU and RAM) and workload (VMs, direct storage for web servers or enders, etc) may net you some more ideas. Then you?re going to have to do more digging into brick logs looking for errors and/or warnings to see what?s going on. 
>> >> -Darrell >> >> >>> On Apr 20, 2019, at 8:22 AM, Patrick Rennie > wrote: >>> >>> Hello Gluster Users, >>> >>> I am hoping someone can help me with resolving an ongoing issue I've been having, I'm new to mailing lists so forgive me if I have gotten anything wrong. We have noticed our performance deteriorating over the last few weeks, easily measured by trying to do an ls on one of our top-level folders, and timing it, which usually would take 2-5 seconds, and now takes up to 20 minutes, which obviously renders our cluster basically unusable. This has been intermittent in the past but is now almost constant and I am not sure how to work out the exact cause. We have noticed some errors in the brick logs, and have noticed that if we kill the right brick process, performance instantly returns back to normal, this is not always the same brick, but it indicates to me something in the brick processes or background tasks may be causing extreme latency. Due to this ability to fix it by killing the right brick process off, I think it's a specific file, or folder, or operation which may be hanging and causing the increased latency, but I am not sure how to work it out. One last thing to add is that our bricks are getting quite full (~95% full), we are trying to migrate data off to new storage but that is going slowly, not helped by this issue. I am currently trying to run a full heal as there appear to be many files needing healing, and I have all brick processes running so they have an opportunity to heal, but this means performance is very poor. It currently takes over 15-20 minutes to do an ls of one of our top-level folders, which just contains 60-80 other folders, this should take 2-5 seconds. This is all being checked by FUSE mount locally on the storage node itself, but it is the same for other clients and VMs accessing the cluster. Initially, it seemed our NFS mounts were not affected and operated at normal speed, but testing over the last day has shown that our NFS clients are also extremely slow, so it doesn't seem specific to FUSE as I first thought it might be. >>> >>> I am not sure how to proceed from here, I am fairly new to gluster having inherited this setup from my predecessor and trying to keep it going. I have included some info below to try and help with diagnosis, please let me know if any further info would be helpful. I would really appreciate any advice on what I could try to work out the cause. Thank you in advance for reading this, and any suggestions you might be able to offer. >>> >>> - Patrick >>> >>> This is an example of the main error I see in our brick logs, there have been others, I can post them when I see them again too: >>> [2019-04-20 04:54:43.055680] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick1/ library: system.posix_acl_default [Operation not supported] >>> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag) >>> >>> Our setup consists of 2 storage nodes and an arbiter node. I have noticed our nodes are on slightly different versions, I'm not sure if this could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - total capacity is around 560TB. >>> We have bonded 10gbps NICS on each node, and I have tested bandwidth with iperf and found that it's what would be expected from this config. 
>>> Individual brick performance seems ok, I've tested several bricks using dd and can write a 10GB files at 1.7GB/s. >>> >>> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >>> 10000+0 records in >>> 10000+0 records out >>> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >>> >>> Node 1: >>> # glusterfs --version >>> glusterfs 3.12.15 >>> >>> Node 2: >>> # glusterfs --version >>> glusterfs 3.12.14 >>> >>> Arbiter: >>> # glusterfs --version >>> glusterfs 3.12.14 >>> >>> Here is our gluster volume status: >>> >>> # gluster volume status >>> Status of volume: gvAA01 >>> Gluster process TCP Port RDMA Port Online Pid >>> ------------------------------------------------------------------------------ >>> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >>> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck1 49152 0 Y 6931 >>> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >>> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck2 49153 0 Y 6939 >>> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >>> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck3 49154 0 Y 6947 >>> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >>> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck4 49155 0 Y 6956 >>> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >>> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck5 49156 0 Y 6964 >>> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >>> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck6 49157 0 Y 6974 >>> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >>> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck7 49158 0 Y 6984 >>> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >>> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck8 49159 0 Y 6993 >>> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >>> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck9 49160 0 Y 7001 >>> NFS Server on localhost 2049 0 Y 17276 >>> Self-heal Daemon on localhost N/A N/A Y 25245 >>> NFS Server on 02-B 2049 0 Y 9089 >>> Self-heal Daemon on 02-B N/A N/A Y 17838 >>> NFS Server on 00-a 2049 0 Y 15660 >>> Self-heal Daemon on 00-a N/A N/A Y 16218 >>> >>> Task Status of Volume gvAA01 >>> ------------------------------------------------------------------------------ >>> There are no active volume tasks >>> >>> And gluster volume info: >>> >>> # gluster volume info >>> >>> Volume Name: gvAA01 >>> Type: Distributed-Replicate >>> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >>> Status: Started >>> Snapshot Count: 0 >>> Number of Bricks: 9 x (2 + 1) = 27 >>> Transport-type: tcp >>> Bricks: >>> Brick1: 01-B:/brick1/gvAA01/brick >>> Brick2: 02-B:/brick1/gvAA01/brick >>> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >>> Brick4: 01-B:/brick2/gvAA01/brick >>> Brick5: 02-B:/brick2/gvAA01/brick >>> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >>> Brick7: 01-B:/brick3/gvAA01/brick >>> Brick8: 02-B:/brick3/gvAA01/brick >>> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >>> Brick10: 01-B:/brick4/gvAA01/brick >>> Brick11: 02-B:/brick4/gvAA01/brick >>> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >>> Brick13: 01-B:/brick5/gvAA01/brick >>> Brick14: 02-B:/brick5/gvAA01/brick >>> Brick15: 
00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >>> Brick16: 01-B:/brick6/gvAA01/brick >>> Brick17: 02-B:/brick6/gvAA01/brick >>> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >>> Brick19: 01-B:/brick7/gvAA01/brick >>> Brick20: 02-B:/brick7/gvAA01/brick >>> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >>> Brick22: 01-B:/brick8/gvAA01/brick >>> Brick23: 02-B:/brick8/gvAA01/brick >>> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >>> Brick25: 01-B:/brick9/gvAA01/brick >>> Brick26: 02-B:/brick9/gvAA01/brick >>> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >>> Options Reconfigured: >>> cluster.shd-max-threads: 4 >>> performance.least-prio-threads: 16 >>> cluster.readdir-optimize: on >>> performance.quick-read: off >>> performance.stat-prefetch: off >>> cluster.data-self-heal: on >>> cluster.lookup-unhashed: auto >>> cluster.lookup-optimize: on >>> cluster.favorite-child-policy: mtime >>> server.allow-insecure: on >>> transport.address-family: inet >>> client.bind-insecure: on >>> cluster.entry-self-heal: off >>> cluster.metadata-self-heal: off >>> performance.md-cache-timeout: 600 >>> cluster.self-heal-daemon: enable >>> performance.readdir-ahead: on >>> diagnostics.brick-log-level: INFO >>> nfs.disable: off >>> >>> Thank you for any assistance. >>> >>> - Patrick >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rgowdapp at redhat.com Sun Apr 21 02:33:39 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Sun, 21 Apr 2019 08:03:39 +0530 Subject: [Gluster-users] Settings for VM hosting In-Reply-To: <20190419071816.GH25080@althea.ulrar.net> References: <20190418072722.GF25080@althea.ulrar.net> <20190419071816.GH25080@althea.ulrar.net> Message-ID: On Fri, Apr 19, 2019 at 12:48 PM wrote: > On Fri, Apr 19, 2019 at 06:47:49AM +0530, Krutika Dhananjay wrote: > > Looks good mostly. > > You can also turn on performance.stat-prefetch, and also set > > Ah the corruption bug has been fixed, I missed that. Great ! > Do you have details or bug report of the corruption you saw earlier? Just want to understand what's the exact fix that helped you. > > client.event-threads and server.event-threads to 4. > > I didn't realize that would also apply to libgfapi ? > Good to know, thanks. > > > And if your bricks are on ssds, then you could also enable > > performance.client-io-threads. > > I'm surprised by that, the doc says "This feature is not recommended for > distributed, replicated or distributed-replicated volumes." > Since this volume is just a replica 3, shouldn't this stay off ? > The disks are all nvme, which I assume would count as ssd. > > > And if your bricks and hypervisors are on same set of machines > > (hyperconverged), > > then you can turn off cluster.choose-local and see if it helps read > > performance. > > Thanks, we'll give those a try ! > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From patrickmrennie at gmail.com Sun Apr 21 07:50:08 2019
From: patrickmrennie at gmail.com (Patrick Rennie)
Date: Sun, 21 Apr 2019 15:50:08 +0800
Subject: [Gluster-users] Extremely slow cluster performance
In-Reply-To: 
References: <93FC9B39-2E8C-4579-8C9D-DEF1A28B7384@onholyground.com>
Message-ID: 

Hi Darrell,

Thanks again for your advice, I've left it for a while but unfortunately it's still just as slow and causing more problems for our operations now. I will need to try to take some steps to at least bring performance back to normal while continuing to investigate the issue longer term. I can definitely see one node with heavier CPU than the other, almost double, which I am OK with, but I think the heal process is going to take forever; checking "gluster volume heal info" shows thousands and thousands of files which may need healing, and I have no idea how many in total as the command is still running after hours, so I am not sure what has gone so wrong to cause this.

I've checked cluster.op-version and cluster.max-op-version and it looks like I'm on the latest version there.

I have no idea how long the healing is going to take on this cluster, we have around 560TB of data on here, but I don't think I can wait that long to try and restore performance to normal.

Can anyone think of anything else I can try in the meantime to work out what's causing the extreme latency?

I've been going through the cluster client logs of some of our VMs, and on some of our FTP servers I found this in the cluster mount log, but I am not seeing it on any of our other servers, just our FTP servers.

[2019-04-21 07:16:19.925388] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null
[2019-04-21 07:19:43.413834] W [MSGID: 114031] [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote operation failed [No such file or directory]
[2019-04-21 07:19:43.414153] W [MSGID: 114031] [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote operation failed [No such file or directory]
[2019-04-21 07:23:33.154717] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null
[2019-04-21 07:33:24.943913] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null

Any ideas what this could mean? I am basically just grasping at straws here.

I am going to hold off on the version upgrade until I know there are no files which need healing, which could be a while; from some reading I've done there shouldn't be any issues with this as both are on v3.12.x.

I've freed up a small amount of space, but I still need to work on this further.

I've read of a command "find .glusterfs -type f -links -2 -exec rm {} \;" which could be run on each brick and would potentially clean up any files which were deleted straight from the bricks, but not via the client. I have a feeling this could help me free up about 5-10TB per brick from what I've been told about the history of this cluster. Can anyone confirm if this is actually safe to run?

At this stage, I'm open to any suggestions as to how to proceed, thanks again for any advice.

Cheers,

- Patrick

On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic wrote:

> Patrick,
>
> Sounds like progress. Be aware that gluster is expected to max out the
> CPUs on at least one of your servers while healing. This is normal and
> won't adversely affect overall performance (any more than having bricks in
> need of healing, at any rate) unless you're overdoing it. 
shd threads <= 4 > should not do that on your hardware. Other tunings may have also increased > overall performance, so you may see higher CPU than previously anyway. I?d > recommend upping those thread counts and letting it heal as fast as > possible, especially if these are dedicated Gluster storage servers (Ie: > not also running VMs, etc). You should see ?normal? CPU use one heals are > completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 > cores). It?s also likely to be different between your servers, in a pure > replica, one tends to max and one tends to be a little higher, in a > distributed-replica, I?d expect more than one to run harder while healing. > > Keep the differences between doing an ls on a brick and doing an ls on a > gluster mount in mind. When you do a ls on a gluster volume, it isn?t just > doing a ls on one brick, it?s effectively doing it on ALL of your bricks, > and they all have to return data before the ls succeeds. In a distributed > volume, it?s figuring out where on each volume things live and getting the > stat() from each to assemble the whole thing. And if things are in need of > healing, it will take even longer to decide which version is current and > use it (shd triggers a heal anytime it encounters this). Any of these > things being slow slows down the overall response. > > At this point, I?d get some sleep too, and let your cluster heal while you > do. I?d really want it fully healed before I did any updates anyway, so let > it use CPU and get itself sorted out. Expect it to do a round of healing > after you upgrade each machine too, this is normal so don?t let the CPU > spike surprise you, It?s just catching up from the downtime incurred by the > update and/or reboot if you did one. > > That reminds me, check your gluster cluster.op-version and > cluster.max-op-version (gluster vol get all all | grep op-version). If > op-version isn?t at the max-op-verison, set it to it so you?re taking > advantage of the latest features available to your version. > > -Darrell > > On Apr 20, 2019, at 11:54 AM, Patrick Rennie > wrote: > > Hi Darrell, > > Thanks again for your advice, I've applied the acltype=posixacl on my > zpools and I think that has reduced some of the noise from my brick logs. > I also bumped up some of the thread counts you suggested but my CPU load > skyrocketed, so I dropped it back down to something slightly lower, but > still higher than it was before, and will see how that goes for a while. > > Although low space is a definite issue, if I run an ls anywhere on my > bricks directly it's instant, <1 second, and still takes several minutes > via gluster, so there is still a problem in my gluster configuration > somewhere. We don't have any snapshots, but I am trying to work out if any > data on there is safe to delete, or if there is any way I can safely find > and delete data which has been removed directly from the bricks in the > past. I also have lz4 compression already enabled on each zpool which does > help a bit, we get between 1.05 and 1.08x compression on this data. > I've tried to go through each client and checked it's cluster mount logs > and also my brick logs and looking for errors, so far nothing is jumping > out at me, but there are some warnings and errors here and there, I am > trying to work out what they mean. > > It's already 1 am here and unfortunately, I'm still awake working on this > issue, but I think that I will have to leave the version upgrades until > tomorrow. 
> > Thanks again for your advice so far. If anyone has any ideas on where I > can look for errors other than brick logs or the cluster mount logs to help > resolve this issue, it would be much appreciated. > > Cheers, > > - Patrick > > On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic > wrote: > >> See inline: >> >> On Apr 20, 2019, at 10:09 AM, Patrick Rennie >> wrote: >> >> Hi Darrell, >> >> Thanks for your reply, this issue seems to be getting worse over the last >> few days, really has me tearing my hair out. I will do as you have >> suggested and get started on upgrading from 3.12.14 to 3.12.15. >> I've checked the zfs properties and all bricks have "xattr=sa" set, but >> none of them has "acltype=posixacl" set, currently the acltype property >> shows "off", if I make these changes will it apply retroactively to the >> existing data? I'm unfamiliar with what this will change so I may need to >> look into that before I proceed. >> >> >> It is safe to apply that now, any new set/get calls will then use it if >> new posixacls exist, and use older if not. ZFS is good that way. It should >> clear up your posix_acl and posix errors over time. >> >> I understand performance is going to slow down as the bricks get full, I >> am currently trying to free space and migrate data to some newer storage, I >> have fresh several hundred TB storage I just setup recently but with these >> performance issues it's really slow. I also believe there is significant >> data which has been deleted directly from the bricks in the past, so if I >> can reclaim this space in a safe manner then I will have at least around >> 10-15% free space. >> >> >> Full ZFS volumes will have a much larger impact on performance than you?d >> think, I?d prioritize this. If you have been taking zfs snapshots, consider >> deleting them to get the overall volume free space back up. And just to be >> sure it?s been said, delete from within the mounted volumes, don?t delete >> directly from the bricks (gluster will just try and heal it later, >> compounding your issues). Does not apply to deleting other data from the >> ZFS volume if it?s not part of the brick directory, of course. >> >> These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so >> generally they have plenty of resources available, currently only using >> around 330/512GB of memory. >> >> I will look into what your suggested settings will change, and then will >> probably go ahead with your recommendations, for our specs as stated above, >> what would you suggest for performance.io-thread-count ? >> >> >> I run single 2630v4s on my servers, which have a smaller storage >> footprint than yours. I?d go with 32 for performance.io-thread-count. >> I?d try 4 for the shd thread settings on that gear. Your memory use sounds >> fine, so no worries there. >> >> Our workload is nothing too extreme, we have a few VMs which write backup >> data to this storage nightly for our clients, our VMs don't live on this >> cluster, but just write to it. >> >> >> If they are writing compressible data, you?ll get immediate benefit by >> setting compression=lz4 on your ZFS volumes. It won?t help any old data, of >> course, but it will compress new data going forward. This is another one >> that?s safe to enable on the fly. >> >> I've been going through all of the logs I can, below are some slightly >> sanitized errors I've come across, but I'm not sure what to make of them. 
>> The main error I am seeing is the first one below, across several of my >> bricks, but possibly only for specific folders on the cluster, I'm not 100% >> about that yet though. >> >> [2019-04-20 05:56:59.512649] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:59:06.084333] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:59:43.289030] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:59:50.582257] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 06:01:42.501701] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] >> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >> with 'user_xattr' flag) >> >> >> [2019-04-20 13:12:36.131856] E [MSGID: 113002] >> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >> [2019-04-20 13:12:36.131959] E [MSGID: 113002] [posix.c:362:posix_lookup] >> 0-gvAA01-posix: buf->ia_gfid is null for >> /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] >> [2019-04-20 13:12:36.132016] E [MSGID: 115050] >> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP >> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >> Backup_clone1.vbm_62906_tmp), client: >> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >> gvAA01-posix [No data available] >> [2019-04-20 13:12:38.093719] E [MSGID: 115050] >> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP >> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >> Backup_clone1.vbm_62906_tmp), client: >> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >> gvAA01-posix [No data available] >> [2019-04-20 13:12:38.093660] E [MSGID: 113002] >> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >> [2019-04-20 13:12:38.093696] E [MSGID: 113002] [posix.c:362:posix_lookup] >> 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx [No >> data available] >> >> >> posixacls should clear those up, as mentioned. 
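If it helps to inspect one of the affected files directly on a brick while this is going on, the usual approach is to dump its extended attributes there; the path below is only an illustration, substitute a real file under one of the brick directories and run it as root on the storage node:

# getfattr -d -m . -e hex /brick2/gvAA01/brick/path/to/suspect.file

A file gluster has fully tracked will normally carry a trusted.gfid value; the "gfid is null" messages above suggest lookups are hitting entries where that xattr is missing.
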
>> >> >> [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] >> 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, >> by 980fdbbd367f0000 on 0x7fc4f0161440 >> [2019-04-20 14:25:59.654668] E [MSGID: 115053] >> [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: >> INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), >> client: >> cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, >> error-xlator: gvAA01-locks [Invalid argument] >> >> >> [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) >> [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) >> [0x7ff4ae6f796a] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) >> [0x7ff4ae2a96e8] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) >> [0x7ff4ae28528d] ) 0-: Reply submission failed >> >> >> Fix the posix acls and see if these clear up over time as well, I?m >> unclear on what the overall effect of running without the posix acls will >> be to total gluster health. Your biggest problem sounds like you need to >> free up space on the volumes and get the overall volume health back up to >> par and see if that doesn?t resolve the symptoms you?re seeing. >> >> >> >> Thank you again for your assistance. It is greatly appreciated. >> >> - Patrick >> >> >> >> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic >> wrote: >> >>> Patrick, >>> >>> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You >>> also mention ZFS, and that error you show makes me think you need to check >>> to be sure you have ?xattr=sa? and ?acltype=posixacl? set on your ZFS >>> volumes. >>> >>> You also observed your bricks are crossing the 95% full line, ZFS >>> performance will degrade significantly the closer you get to full. In my >>> experience, this starts somewhere between 10% and 5% free space remaining, >>> so you?re in that realm. >>> >>> How?s your free memory on the servers doing? Do you have your zfs arc >>> cache limited to something less than all the RAM? It shares pretty well, >>> but I?ve encountered situations where other things won?t try and take ram >>> back properly if they think it?s in use, so ZFS never gets the opportunity >>> to give it up. >>> >>> Since your volume is a disperse-replica, you might try tuning >>> disperse.shd-max-threads, default is 1, I?d try it at 2, 4, or even more if >>> the CPUs are beefy enough. And setting server.event-threads to 4 and >>> client.event-threads to 8 has proven helpful in many cases. After you get >>> upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I >>> don?t know if it matters, but I?d also recommend resetting >>> performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or >>> also setting performance.io-thread-count to 32 if those have beefy CPUs. >>> >>> Beyond those general ideas, more info about your hardware (CPU and RAM) >>> and workload (VMs, direct storage for web servers or enders, etc) may net >>> you some more ideas. Then you?re going to have to do more digging into >>> brick logs looking for errors and/or warnings to see what?s going on. 
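For reference, the tunables suggested above are ordinary volume options; on this volume (a replicate/arbiter layout, so cluster.shd-max-threads rather than disperse.shd-max-threads) the changes would look roughly like the following, with the values taken from the suggestions above:

# gluster volume set gvAA01 cluster.shd-max-threads 4
# gluster volume set gvAA01 server.event-threads 4
# gluster volume set gvAA01 client.event-threads 8
# gluster volume set gvAA01 performance.io-thread-count 32
# gluster volume set gvAA01 performance.least-prio-threads 1

Per the volume info quoted elsewhere in this thread, cluster.shd-max-threads is already 4 and performance.least-prio-threads is currently 16; the event-thread and io-thread-count options are not set at all yet.
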
>>> >>> -Darrell >>> >>> >>> On Apr 20, 2019, at 8:22 AM, Patrick Rennie >>> wrote: >>> >>> Hello Gluster Users, >>> >>> I am hoping someone can help me with resolving an ongoing issue I've >>> been having, I'm new to mailing lists so forgive me if I have gotten >>> anything wrong. We have noticed our performance deteriorating over the last >>> few weeks, easily measured by trying to do an ls on one of our top-level >>> folders, and timing it, which usually would take 2-5 seconds, and now takes >>> up to 20 minutes, which obviously renders our cluster basically unusable. >>> This has been intermittent in the past but is now almost constant and I am >>> not sure how to work out the exact cause. We have noticed some errors in >>> the brick logs, and have noticed that if we kill the right brick process, >>> performance instantly returns back to normal, this is not always the same >>> brick, but it indicates to me something in the brick processes or >>> background tasks may be causing extreme latency. Due to this ability to fix >>> it by killing the right brick process off, I think it's a specific file, or >>> folder, or operation which may be hanging and causing the increased >>> latency, but I am not sure how to work it out. One last thing to add is >>> that our bricks are getting quite full (~95% full), we are trying to >>> migrate data off to new storage but that is going slowly, not helped by >>> this issue. I am currently trying to run a full heal as there appear to be >>> many files needing healing, and I have all brick processes running so they >>> have an opportunity to heal, but this means performance is very poor. It >>> currently takes over 15-20 minutes to do an ls of one of our top-level >>> folders, which just contains 60-80 other folders, this should take 2-5 >>> seconds. This is all being checked by FUSE mount locally on the storage >>> node itself, but it is the same for other clients and VMs accessing the >>> cluster. Initially, it seemed our NFS mounts were not affected and operated >>> at normal speed, but testing over the last day has shown that our NFS >>> clients are also extremely slow, so it doesn't seem specific to FUSE as I >>> first thought it might be. >>> >>> I am not sure how to proceed from here, I am fairly new to gluster >>> having inherited this setup from my predecessor and trying to keep it >>> going. I have included some info below to try and help with diagnosis, >>> please let me know if any further info would be helpful. I would really >>> appreciate any advice on what I could try to work out the cause. Thank you >>> in advance for reading this, and any suggestions you might be able to >>> offer. >>> >>> - Patrick >>> >>> This is an example of the main error I see in our brick logs, there have >>> been others, I can post them when I see them again too: >>> [2019-04-20 04:54:43.055680] E [MSGID: 113001] >>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>> /brick1/ library: system.posix_acl_default [Operation not >>> supported] >>> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] >>> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >>> with 'user_xattr' flag) >>> >>> Our setup consists of 2 storage nodes and an arbiter node. I have >>> noticed our nodes are on slightly different versions, I'm not sure if this >>> could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 >>> pools - total capacity is around 560TB. 
>>> We have bonded 10gbps NICS on each node, and I have tested bandwidth >>> with iperf and found that it's what would be expected from this config. >>> Individual brick performance seems ok, I've tested several bricks using >>> dd and can write a 10GB files at 1.7GB/s. >>> >>> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >>> 10000+0 records in >>> 10000+0 records out >>> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >>> >>> Node 1: >>> # glusterfs --version >>> glusterfs 3.12.15 >>> >>> Node 2: >>> # glusterfs --version >>> glusterfs 3.12.14 >>> >>> Arbiter: >>> # glusterfs --version >>> glusterfs 3.12.14 >>> >>> Here is our gluster volume status: >>> >>> # gluster volume status >>> Status of volume: gvAA01 >>> Gluster process TCP Port RDMA Port Online >>> Pid >>> >>> ------------------------------------------------------------------------------ >>> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >>> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck1 49152 0 Y >>> 6931 >>> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >>> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck2 49153 0 Y >>> 6939 >>> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >>> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck3 49154 0 Y >>> 6947 >>> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >>> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck4 49155 0 Y >>> 6956 >>> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >>> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck5 49156 0 Y >>> 6964 >>> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >>> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck6 49157 0 Y >>> 6974 >>> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >>> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck7 49158 0 Y >>> 6984 >>> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >>> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck8 49159 0 Y >>> 6993 >>> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >>> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck9 49160 0 Y >>> 7001 >>> NFS Server on localhost 2049 0 Y >>> 17276 >>> Self-heal Daemon on localhost N/A N/A Y >>> 25245 >>> NFS Server on 02-B 2049 0 Y 9089 >>> Self-heal Daemon on 02-B N/A N/A Y 17838 >>> NFS Server on 00-a 2049 0 Y 15660 >>> Self-heal Daemon on 00-a N/A N/A Y 16218 >>> >>> Task Status of Volume gvAA01 >>> >>> ------------------------------------------------------------------------------ >>> There are no active volume tasks >>> >>> And gluster volume info: >>> >>> # gluster volume info >>> >>> Volume Name: gvAA01 >>> Type: Distributed-Replicate >>> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >>> Status: Started >>> Snapshot Count: 0 >>> Number of Bricks: 9 x (2 + 1) = 27 >>> Transport-type: tcp >>> Bricks: >>> Brick1: 01-B:/brick1/gvAA01/brick >>> Brick2: 02-B:/brick1/gvAA01/brick >>> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >>> Brick4: 01-B:/brick2/gvAA01/brick >>> Brick5: 02-B:/brick2/gvAA01/brick >>> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >>> Brick7: 01-B:/brick3/gvAA01/brick >>> Brick8: 02-B:/brick3/gvAA01/brick >>> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >>> Brick10: 
01-B:/brick4/gvAA01/brick >>> Brick11: 02-B:/brick4/gvAA01/brick >>> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >>> Brick13: 01-B:/brick5/gvAA01/brick >>> Brick14: 02-B:/brick5/gvAA01/brick >>> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >>> Brick16: 01-B:/brick6/gvAA01/brick >>> Brick17: 02-B:/brick6/gvAA01/brick >>> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >>> Brick19: 01-B:/brick7/gvAA01/brick >>> Brick20: 02-B:/brick7/gvAA01/brick >>> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >>> Brick22: 01-B:/brick8/gvAA01/brick >>> Brick23: 02-B:/brick8/gvAA01/brick >>> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >>> Brick25: 01-B:/brick9/gvAA01/brick >>> Brick26: 02-B:/brick9/gvAA01/brick >>> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >>> Options Reconfigured: >>> cluster.shd-max-threads: 4 >>> performance.least-prio-threads: 16 >>> cluster.readdir-optimize: on >>> performance.quick-read: off >>> performance.stat-prefetch: off >>> cluster.data-self-heal: on >>> cluster.lookup-unhashed: auto >>> cluster.lookup-optimize: on >>> cluster.favorite-child-policy: mtime >>> server.allow-insecure: on >>> transport.address-family: inet >>> client.bind-insecure: on >>> cluster.entry-self-heal: off >>> cluster.metadata-self-heal: off >>> performance.md-cache-timeout: 600 >>> cluster.self-heal-daemon: enable >>> performance.readdir-ahead: on >>> diagnostics.brick-log-level: INFO >>> nfs.disable: off >>> >>> Thank you for any assistance. >>> >>> - Patrick >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrickmrennie at gmail.com Sun Apr 21 07:55:21 2019 From: patrickmrennie at gmail.com (Patrick Rennie) Date: Sun, 21 Apr 2019 15:55:21 +0800 Subject: [Gluster-users] Extremely slow cluster performance In-Reply-To: References: <93FC9B39-2E8C-4579-8C9D-DEF1A28B7384@onholyground.com> Message-ID: Just another small update, I'm continuing to watch my brick logs and I just saw these errors come up in the recent events too. I am going to continue to post any errors I see in the hope of finding the right one to try and fix.. This is from the logs on brick1, seems to be occurring on both nodes on brick1, although at different times. I'm not sure what this means, can anyone shed any light? I guess I am looking for some kind of specific error which may indicate something is broken or stuck and locking up and causing the extreme latency I'm seeing in the cluster. 
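Before wading through the raw entries below, it can help to summarise the error-level lines across all the brick logs at once and see which bricks and code paths dominate; assuming the default log location, something along these lines:

# grep -c "] E " /var/log/glusterfs/bricks/*.log
(count of error-level entries per brick log)

# grep -h "] E " /var/log/glusterfs/bricks/*.log | awk '{print $4}' | sort | uniq -c | sort -rn | head
(most frequent error sources across all bricks)
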
[2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS 3.3, ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server) [2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a) [0x7f3b3e93158a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45) [0x7f3b3e4c5d45] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed [2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) [2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed [2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) [2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed [2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) [2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed [2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) [2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed [2019-04-21 07:25:55.064939] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) [2019-04-21 07:25:55.064962] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] 
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed [2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) [2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed [2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) [2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed [2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) [2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed Thanks again, -Patrick On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie wrote: > Hi Darrell, > > Thanks again for your advice, I've left it for a while but unfortunately > it's still just as slow and causing more problems for our operations now. I > will need to try and take some steps to at least bring performance back to > normal while continuing to investigate the issue longer term. I can > definitely see one node with heavier CPU than the other, almost double, > which I am OK with, but I think the heal process is going to take forever, > trying to check the "gluster volume heal info" shows thousands and > thousands of files which may need healing, I have no idea how many in total > the command is still running after hours, so I am not sure what has gone so > wrong to cause this. > > I've checked cluster.op-version and cluster.max-op-version and it looks > like I'm on the latest version there. > > I have no idea how long the healing is going to take on this cluster, we > have around 560TB of data on here, but I don't think I can wait that long > to try and restore performance to normal. > > Can anyone think of anything else I can try in the meantime to work out > what's causing the extreme latency? > > I've been going through cluster client the logs of some of our VMs and on > some of our FTP servers I found this in the cluster mount log, but I am not > seeing it on any of our other servers, just our FTP servers. 
> > [2019-04-21 07:16:19.925388] E [MSGID: 101046] > [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null > [2019-04-21 07:19:43.413834] W [MSGID: 114031] > [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote > operation failed [No such file or directory] > [2019-04-21 07:19:43.414153] W [MSGID: 114031] > [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote > operation failed [No such file or directory] > [2019-04-21 07:23:33.154717] E [MSGID: 101046] > [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null > [2019-04-21 07:33:24.943913] E [MSGID: 101046] > [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null > > Any ideas what this could mean? I am basically just grasping at straws > here. > > I am going to hold off on the version upgrade until I know there are no > files which need healing, which could be a while, from some reading I've > done there shouldn't be any issues with this as both are on v3.12.x > > I've free'd up a small amount of space, but I still need to work on this > further. > > I've read of a command "find .glusterfs -type f -links -2 -exec rm {} \;" > which could be run on each brick and it would potentially clean up any > files which were deleted straight from the bricks, but not via the client, > I have a feeling this could help me free up about 5-10TB per brick from > what I've been told about the history of this cluster. Can anyone confirm > if this is actually safe to run? > > At this stage, I'm open to any suggestions as to how to proceed, thanks > again for any advice. > > Cheers, > > - Patrick > > On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic > wrote: > >> Patrick, >> >> Sounds like progress. Be aware that gluster is expected to max out the >> CPUs on at least one of your servers while healing. This is normal and >> won?t adversely affect overall performance (any more than having bricks in >> need of healing, at any rate) unless you?re overdoing it. shd threads <= 4 >> should not do that on your hardware. Other tunings may have also increased >> overall performance, so you may see higher CPU than previously anyway. I?d >> recommend upping those thread counts and letting it heal as fast as >> possible, especially if these are dedicated Gluster storage servers (Ie: >> not also running VMs, etc). You should see ?normal? CPU use one heals are >> completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 >> cores). It?s also likely to be different between your servers, in a pure >> replica, one tends to max and one tends to be a little higher, in a >> distributed-replica, I?d expect more than one to run harder while healing. >> >> Keep the differences between doing an ls on a brick and doing an ls on a >> gluster mount in mind. When you do a ls on a gluster volume, it isn?t just >> doing a ls on one brick, it?s effectively doing it on ALL of your bricks, >> and they all have to return data before the ls succeeds. In a distributed >> volume, it?s figuring out where on each volume things live and getting the >> stat() from each to assemble the whole thing. And if things are in need of >> healing, it will take even longer to decide which version is current and >> use it (shd triggers a heal anytime it encounters this). Any of these >> things being slow slows down the overall response. >> >> At this point, I?d get some sleep too, and let your cluster heal while >> you do. 
I?d really want it fully healed before I did any updates anyway, so >> let it use CPU and get itself sorted out. Expect it to do a round of >> healing after you upgrade each machine too, this is normal so don?t let the >> CPU spike surprise you, It?s just catching up from the downtime incurred by >> the update and/or reboot if you did one. >> >> That reminds me, check your gluster cluster.op-version and >> cluster.max-op-version (gluster vol get all all | grep op-version). If >> op-version isn?t at the max-op-verison, set it to it so you?re taking >> advantage of the latest features available to your version. >> >> -Darrell >> >> On Apr 20, 2019, at 11:54 AM, Patrick Rennie >> wrote: >> >> Hi Darrell, >> >> Thanks again for your advice, I've applied the acltype=posixacl on my >> zpools and I think that has reduced some of the noise from my brick logs. >> I also bumped up some of the thread counts you suggested but my CPU load >> skyrocketed, so I dropped it back down to something slightly lower, but >> still higher than it was before, and will see how that goes for a while. >> >> Although low space is a definite issue, if I run an ls anywhere on my >> bricks directly it's instant, <1 second, and still takes several minutes >> via gluster, so there is still a problem in my gluster configuration >> somewhere. We don't have any snapshots, but I am trying to work out if any >> data on there is safe to delete, or if there is any way I can safely find >> and delete data which has been removed directly from the bricks in the >> past. I also have lz4 compression already enabled on each zpool which does >> help a bit, we get between 1.05 and 1.08x compression on this data. >> I've tried to go through each client and checked it's cluster mount logs >> and also my brick logs and looking for errors, so far nothing is jumping >> out at me, but there are some warnings and errors here and there, I am >> trying to work out what they mean. >> >> It's already 1 am here and unfortunately, I'm still awake working on this >> issue, but I think that I will have to leave the version upgrades until >> tomorrow. >> >> Thanks again for your advice so far. If anyone has any ideas on where I >> can look for errors other than brick logs or the cluster mount logs to help >> resolve this issue, it would be much appreciated. >> >> Cheers, >> >> - Patrick >> >> On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic >> wrote: >> >>> See inline: >>> >>> On Apr 20, 2019, at 10:09 AM, Patrick Rennie >>> wrote: >>> >>> Hi Darrell, >>> >>> Thanks for your reply, this issue seems to be getting worse over the >>> last few days, really has me tearing my hair out. I will do as you have >>> suggested and get started on upgrading from 3.12.14 to 3.12.15. >>> I've checked the zfs properties and all bricks have "xattr=sa" set, but >>> none of them has "acltype=posixacl" set, currently the acltype property >>> shows "off", if I make these changes will it apply retroactively to the >>> existing data? I'm unfamiliar with what this will change so I may need to >>> look into that before I proceed. >>> >>> >>> It is safe to apply that now, any new set/get calls will then use it if >>> new posixacls exist, and use older if not. ZFS is good that way. It should >>> clear up your posix_acl and posix errors over time. 
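For what it's worth, acltype is a normal ZFS dataset property, so applying and then verifying it per brick pool looks something like this (the dataset name below is a placeholder for whatever backs each of /brick1../brick9 and the arbiter path):

# zfs set acltype=posixacl pool/brick1
# zfs get acltype,xattr pool/brick1
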
>>> >>> I understand performance is going to slow down as the bricks get full, I >>> am currently trying to free space and migrate data to some newer storage, I >>> have fresh several hundred TB storage I just setup recently but with these >>> performance issues it's really slow. I also believe there is significant >>> data which has been deleted directly from the bricks in the past, so if I >>> can reclaim this space in a safe manner then I will have at least around >>> 10-15% free space. >>> >>> >>> Full ZFS volumes will have a much larger impact on performance than >>> you?d think, I?d prioritize this. If you have been taking zfs snapshots, >>> consider deleting them to get the overall volume free space back up. And >>> just to be sure it?s been said, delete from within the mounted volumes, >>> don?t delete directly from the bricks (gluster will just try and heal it >>> later, compounding your issues). Does not apply to deleting other data from >>> the ZFS volume if it?s not part of the brick directory, of course. >>> >>> These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so >>> generally they have plenty of resources available, currently only using >>> around 330/512GB of memory. >>> >>> I will look into what your suggested settings will change, and then will >>> probably go ahead with your recommendations, for our specs as stated above, >>> what would you suggest for performance.io-thread-count ? >>> >>> >>> I run single 2630v4s on my servers, which have a smaller storage >>> footprint than yours. I?d go with 32 for performance.io-thread-count. >>> I?d try 4 for the shd thread settings on that gear. Your memory use sounds >>> fine, so no worries there. >>> >>> Our workload is nothing too extreme, we have a few VMs which write >>> backup data to this storage nightly for our clients, our VMs don't live on >>> this cluster, but just write to it. >>> >>> >>> If they are writing compressible data, you?ll get immediate benefit by >>> setting compression=lz4 on your ZFS volumes. It won?t help any old data, of >>> course, but it will compress new data going forward. This is another one >>> that?s safe to enable on the fly. >>> >>> I've been going through all of the logs I can, below are some slightly >>> sanitized errors I've come across, but I'm not sure what to make of them. >>> The main error I am seeing is the first one below, across several of my >>> bricks, but possibly only for specific folders on the cluster, I'm not 100% >>> about that yet though. 
>>> >>> [2019-04-20 05:56:59.512649] E [MSGID: 113001] >>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>> supported] >>> [2019-04-20 05:59:06.084333] E [MSGID: 113001] >>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>> supported] >>> [2019-04-20 05:59:43.289030] E [MSGID: 113001] >>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>> supported] >>> [2019-04-20 05:59:50.582257] E [MSGID: 113001] >>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>> supported] >>> [2019-04-20 06:01:42.501701] E [MSGID: 113001] >>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>> supported] >>> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] >>> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >>> with 'user_xattr' flag) >>> >>> >>> [2019-04-20 13:12:36.131856] E [MSGID: 113002] >>> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >>> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>> [2019-04-20 13:12:36.131959] E [MSGID: 113002] >>> [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for >>> /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] >>> [2019-04-20 13:12:36.132016] E [MSGID: 115050] >>> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP >>> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >>> Backup_clone1.vbm_62906_tmp), client: >>> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >>> gvAA01-posix [No data available] >>> [2019-04-20 13:12:38.093719] E [MSGID: 115050] >>> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP >>> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >>> Backup_clone1.vbm_62906_tmp), client: >>> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >>> gvAA01-posix [No data available] >>> [2019-04-20 13:12:38.093660] E [MSGID: 113002] >>> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >>> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>> [2019-04-20 13:12:38.093696] E [MSGID: 113002] >>> [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for >>> /brick2/xxxxxxxxxxxxxxxxxxxx [No data available] >>> >>> >>> posixacls should clear those up, as mentioned. 
>>> >>> >>> [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] >>> 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, >>> by 980fdbbd367f0000 on 0x7fc4f0161440 >>> [2019-04-20 14:25:59.654668] E [MSGID: 115053] >>> [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: >>> INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), >>> client: >>> cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, >>> error-xlator: gvAA01-locks [Invalid argument] >>> >>> >>> [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] >>> 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS >>> 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) >>> [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] >>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) >>> [0x7ff4ae6f796a] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) >>> [0x7ff4ae2a96e8] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) >>> [0x7ff4ae28528d] ) 0-: Reply submission failed >>> >>> >>> Fix the posix acls and see if these clear up over time as well, I?m >>> unclear on what the overall effect of running without the posix acls will >>> be to total gluster health. Your biggest problem sounds like you need to >>> free up space on the volumes and get the overall volume health back up to >>> par and see if that doesn?t resolve the symptoms you?re seeing. >>> >>> >>> >>> Thank you again for your assistance. It is greatly appreciated. >>> >>> - Patrick >>> >>> >>> >>> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic >>> wrote: >>> >>>> Patrick, >>>> >>>> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You >>>> also mention ZFS, and that error you show makes me think you need to check >>>> to be sure you have ?xattr=sa? and ?acltype=posixacl? set on your ZFS >>>> volumes. >>>> >>>> You also observed your bricks are crossing the 95% full line, ZFS >>>> performance will degrade significantly the closer you get to full. In my >>>> experience, this starts somewhere between 10% and 5% free space remaining, >>>> so you?re in that realm. >>>> >>>> How?s your free memory on the servers doing? Do you have your zfs arc >>>> cache limited to something less than all the RAM? It shares pretty well, >>>> but I?ve encountered situations where other things won?t try and take ram >>>> back properly if they think it?s in use, so ZFS never gets the opportunity >>>> to give it up. >>>> >>>> Since your volume is a disperse-replica, you might try tuning >>>> disperse.shd-max-threads, default is 1, I?d try it at 2, 4, or even more if >>>> the CPUs are beefy enough. And setting server.event-threads to 4 and >>>> client.event-threads to 8 has proven helpful in many cases. After you get >>>> upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I >>>> don?t know if it matters, but I?d also recommend resetting >>>> performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or >>>> also setting performance.io-thread-count to 32 if those have beefy >>>> CPUs. >>>> >>>> Beyond those general ideas, more info about your hardware (CPU and RAM) >>>> and workload (VMs, direct storage for web servers or enders, etc) may net >>>> you some more ideas. Then you?re going to have to do more digging into >>>> brick logs looking for errors and/or warnings to see what?s going on. 
>>>> >>>> -Darrell >>>> >>>> >>>> On Apr 20, 2019, at 8:22 AM, Patrick Rennie >>>> wrote: >>>> >>>> Hello Gluster Users, >>>> >>>> I am hoping someone can help me with resolving an ongoing issue I've >>>> been having, I'm new to mailing lists so forgive me if I have gotten >>>> anything wrong. We have noticed our performance deteriorating over the last >>>> few weeks, easily measured by trying to do an ls on one of our top-level >>>> folders, and timing it, which usually would take 2-5 seconds, and now takes >>>> up to 20 minutes, which obviously renders our cluster basically unusable. >>>> This has been intermittent in the past but is now almost constant and I am >>>> not sure how to work out the exact cause. We have noticed some errors in >>>> the brick logs, and have noticed that if we kill the right brick process, >>>> performance instantly returns back to normal, this is not always the same >>>> brick, but it indicates to me something in the brick processes or >>>> background tasks may be causing extreme latency. Due to this ability to fix >>>> it by killing the right brick process off, I think it's a specific file, or >>>> folder, or operation which may be hanging and causing the increased >>>> latency, but I am not sure how to work it out. One last thing to add is >>>> that our bricks are getting quite full (~95% full), we are trying to >>>> migrate data off to new storage but that is going slowly, not helped by >>>> this issue. I am currently trying to run a full heal as there appear to be >>>> many files needing healing, and I have all brick processes running so they >>>> have an opportunity to heal, but this means performance is very poor. It >>>> currently takes over 15-20 minutes to do an ls of one of our top-level >>>> folders, which just contains 60-80 other folders, this should take 2-5 >>>> seconds. This is all being checked by FUSE mount locally on the storage >>>> node itself, but it is the same for other clients and VMs accessing the >>>> cluster. Initially, it seemed our NFS mounts were not affected and operated >>>> at normal speed, but testing over the last day has shown that our NFS >>>> clients are also extremely slow, so it doesn't seem specific to FUSE as I >>>> first thought it might be. >>>> >>>> I am not sure how to proceed from here, I am fairly new to gluster >>>> having inherited this setup from my predecessor and trying to keep it >>>> going. I have included some info below to try and help with diagnosis, >>>> please let me know if any further info would be helpful. I would really >>>> appreciate any advice on what I could try to work out the cause. Thank you >>>> in advance for reading this, and any suggestions you might be able to >>>> offer. >>>> >>>> - Patrick >>>> >>>> This is an example of the main error I see in our brick logs, there >>>> have been others, I can post them when I see them again too: >>>> [2019-04-20 04:54:43.055680] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick1/ library: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] >>>> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >>>> with 'user_xattr' flag) >>>> >>>> Our setup consists of 2 storage nodes and an arbiter node. I have >>>> noticed our nodes are on slightly different versions, I'm not sure if this >>>> could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 >>>> pools - total capacity is around 560TB. 
>>>> We have bonded 10gbps NICS on each node, and I have tested bandwidth >>>> with iperf and found that it's what would be expected from this config. >>>> Individual brick performance seems ok, I've tested several bricks using >>>> dd and can write a 10GB files at 1.7GB/s. >>>> >>>> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >>>> 10000+0 records in >>>> 10000+0 records out >>>> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >>>> >>>> Node 1: >>>> # glusterfs --version >>>> glusterfs 3.12.15 >>>> >>>> Node 2: >>>> # glusterfs --version >>>> glusterfs 3.12.14 >>>> >>>> Arbiter: >>>> # glusterfs --version >>>> glusterfs 3.12.14 >>>> >>>> Here is our gluster volume status: >>>> >>>> # gluster volume status >>>> Status of volume: gvAA01 >>>> Gluster process TCP Port RDMA Port >>>> Online Pid >>>> >>>> ------------------------------------------------------------------------------ >>>> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >>>> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck1 49152 0 Y >>>> 6931 >>>> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >>>> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck2 49153 0 Y >>>> 6939 >>>> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >>>> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck3 49154 0 Y >>>> 6947 >>>> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >>>> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck4 49155 0 Y >>>> 6956 >>>> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >>>> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck5 49156 0 Y >>>> 6964 >>>> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >>>> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck6 49157 0 Y >>>> 6974 >>>> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >>>> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck7 49158 0 Y >>>> 6984 >>>> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >>>> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck8 49159 0 Y >>>> 6993 >>>> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >>>> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck9 49160 0 Y >>>> 7001 >>>> NFS Server on localhost 2049 0 Y >>>> 17276 >>>> Self-heal Daemon on localhost N/A N/A Y >>>> 25245 >>>> NFS Server on 02-B 2049 0 Y 9089 >>>> Self-heal Daemon on 02-B N/A N/A Y 17838 >>>> NFS Server on 00-a 2049 0 Y 15660 >>>> Self-heal Daemon on 00-a N/A N/A Y 16218 >>>> >>>> Task Status of Volume gvAA01 >>>> >>>> ------------------------------------------------------------------------------ >>>> There are no active volume tasks >>>> >>>> And gluster volume info: >>>> >>>> # gluster volume info >>>> >>>> Volume Name: gvAA01 >>>> Type: Distributed-Replicate >>>> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >>>> Status: Started >>>> Snapshot Count: 0 >>>> Number of Bricks: 9 x (2 + 1) = 27 >>>> Transport-type: tcp >>>> Bricks: >>>> Brick1: 01-B:/brick1/gvAA01/brick >>>> Brick2: 02-B:/brick1/gvAA01/brick >>>> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >>>> Brick4: 01-B:/brick2/gvAA01/brick >>>> Brick5: 02-B:/brick2/gvAA01/brick >>>> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >>>> Brick7: 01-B:/brick3/gvAA01/brick >>>> Brick8: 
02-B:/brick3/gvAA01/brick >>>> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >>>> Brick10: 01-B:/brick4/gvAA01/brick >>>> Brick11: 02-B:/brick4/gvAA01/brick >>>> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >>>> Brick13: 01-B:/brick5/gvAA01/brick >>>> Brick14: 02-B:/brick5/gvAA01/brick >>>> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >>>> Brick16: 01-B:/brick6/gvAA01/brick >>>> Brick17: 02-B:/brick6/gvAA01/brick >>>> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >>>> Brick19: 01-B:/brick7/gvAA01/brick >>>> Brick20: 02-B:/brick7/gvAA01/brick >>>> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >>>> Brick22: 01-B:/brick8/gvAA01/brick >>>> Brick23: 02-B:/brick8/gvAA01/brick >>>> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >>>> Brick25: 01-B:/brick9/gvAA01/brick >>>> Brick26: 02-B:/brick9/gvAA01/brick >>>> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >>>> Options Reconfigured: >>>> cluster.shd-max-threads: 4 >>>> performance.least-prio-threads: 16 >>>> cluster.readdir-optimize: on >>>> performance.quick-read: off >>>> performance.stat-prefetch: off >>>> cluster.data-self-heal: on >>>> cluster.lookup-unhashed: auto >>>> cluster.lookup-optimize: on >>>> cluster.favorite-child-policy: mtime >>>> server.allow-insecure: on >>>> transport.address-family: inet >>>> client.bind-insecure: on >>>> cluster.entry-self-heal: off >>>> cluster.metadata-self-heal: off >>>> performance.md-cache-timeout: 600 >>>> cluster.self-heal-daemon: enable >>>> performance.readdir-ahead: on >>>> diagnostics.brick-log-level: INFO >>>> nfs.disable: off >>>> >>>> Thank you for any assistance. >>>> >>>> - Patrick >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrickmrennie at gmail.com Sun Apr 21 09:41:01 2019 From: patrickmrennie at gmail.com (Patrick Rennie) Date: Sun, 21 Apr 2019 17:41:01 +0800 Subject: [Gluster-users] Extremely slow cluster performance In-Reply-To: References: <93FC9B39-2E8C-4579-8C9D-DEF1A28B7384@onholyground.com> Message-ID: Another small update from me, I have been keeping an eye on the glustershd.log file to see what is going on and I keep seeing the same file names come up in there every 10 minutes, but not a lot of other activity. Logs below. How can I be sure my heal is progressing through the files which actually need to be healed? I thought it would show up in these logs. I also increased the "cluster.shd-max-threads" from 4 to 8 to try and speed things up too. Any ideas here? Thanks, - Patrick On 01-B ------- [2019-04-21 09:12:54.575689] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904 [2019-04-21 09:12:54.733601] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904. 
sources=[0] 2 sinks=1 [2019-04-21 09:13:12.028509] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe [2019-04-21 09:13:12.047470] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 [2019-04-21 09:23:13.044377] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe [2019-04-21 09:23:13.051479] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 [2019-04-21 09:33:07.400369] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. sources=[0] 2 sinks=1 [2019-04-21 09:33:11.825449] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa [2019-04-21 09:33:14.029837] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe [2019-04-21 09:33:14.037436] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 [2019-04-21 09:33:23.913882] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. sources=[0] 2 sinks=1 [2019-04-21 09:33:43.874201] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1 [2019-04-21 09:34:02.273898] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1. sources=[0] 2 sinks=1 [2019-04-21 09:35:12.282045] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. sources=[0] 2 sinks=1 [2019-04-21 09:35:15.146252] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885 [2019-04-21 09:35:15.254538] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. sources=[0] 2 sinks=1 [2019-04-21 09:35:22.900803] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. sources=[0] 2 sinks=1 [2019-04-21 09:35:27.150963] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45 [2019-04-21 09:35:29.186295] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. 
sources=[0] 2 sinks=1 [2019-04-21 09:35:35.967451] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. sources=[0] 2 sinks=1 [2019-04-21 09:35:40.733444] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9 [2019-04-21 09:35:58.707593] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. sources=[0] 2 sinks=1 [2019-04-21 09:36:25.554260] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. sources=[0] 2 sinks=1 [2019-04-21 09:36:26.031422] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d [2019-04-21 09:36:26.083982] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. sources=[0] 2 sinks=1 On 02-B ------- [2019-04-21 09:03:15.815250] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 [2019-04-21 09:03:15.863153] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 [2019-04-21 09:03:15.867432] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f [2019-04-21 09:03:15.875134] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 [2019-04-21 09:03:39.020198] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe [2019-04-21 09:03:39.027345] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 [2019-04-21 09:13:18.524874] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 [2019-04-21 09:13:20.070172] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 [2019-04-21 09:13:20.074977] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f [2019-04-21 09:13:20.080827] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 [2019-04-21 09:13:40.015763] I [MSGID: 108026] 
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe [2019-04-21 09:13:40.021805] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 [2019-04-21 09:23:21.991032] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 [2019-04-21 09:23:22.054565] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 [2019-04-21 09:23:22.059225] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f [2019-04-21 09:23:22.066266] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 [2019-04-21 09:23:41.129962] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe [2019-04-21 09:23:41.135919] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 [2019-04-21 09:33:24.015223] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 [2019-04-21 09:33:24.069686] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 [2019-04-21 09:33:24.074341] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f [2019-04-21 09:33:24.080065] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 [2019-04-21 09:33:42.099515] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe [2019-04-21 09:33:42.107481] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 On Sun, Apr 21, 2019 at 3:55 PM Patrick Rennie wrote: > Just another small update, I'm continuing to watch my brick logs and I > just saw these errors come up in the recent events too. I am going to > continue to post any errors I see in the hope of finding the right one to > try and fix.. > This is from the logs on brick1, seems to be occurring on both nodes on > brick1, although at different times. I'm not sure what this means, can > anyone shed any light? 
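One way to make sense of the entries that keep reappearing in glustershd.log is to resolve the GFIDs it names back to real paths on a brick. Inside every brick there is a hard link (or, for directories, a symlink) at .glusterfs/<first two hex characters>/<next two hex characters>/<full GFID>. A minimal sketch for the GFID 30547ab6-1fbd-422e-9c81-2009f9ff7ebe quoted above, using /brick6/gvAA01/brick purely as an example path; substitute a brick that belongs to the replica set named in the log line:

# ls -ld /brick6/gvAA01/brick/.glusterfs/30/54/30547ab6-1fbd-422e-9c81-2009f9ff7ebe

If that entry is a symlink, the GFID is a directory and readlink shows its parent GFID plus its name:

# readlink /brick6/gvAA01/brick/.glusterfs/30/54/30547ab6-1fbd-422e-9c81-2009f9ff7ebe

If it is a regular file, find -samefile locates the named path sharing the same inode (this walks the whole brick, so it can take a while on bricks this size):

# find /brick6/gvAA01/brick -samefile /brick6/gvAA01/brick/.glusterfs/30/54/30547ab6-1fbd-422e-9c81-2009f9ff7ebe -not -path '*/.glusterfs/*'

Once the GFID resolves to an actual directory or file, its trusted.afr.* xattrs can be compared across the three bricks of that replica set to see why the same entry is being picked up by every heal crawl.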
> I guess I am looking for some kind of specific error which may indicate > something is broken or stuck and locking up and causing the extreme latency > I'm seeing in the cluster. > > [2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a) > [0x7f3b3e93158a] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45) > [0x7f3b3e4c5d45] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064939] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 
07:25:55.064962] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > > Thanks again, > > -Patrick > > On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie > wrote: > >> Hi Darrell, >> >> Thanks again for your advice, I've left it for a while but unfortunately >> it's still just as slow and causing more problems for our operations now. I >> will need to try and take some steps to at least bring performance back to >> normal while continuing to investigate the issue longer term. I can >> definitely see one node with heavier CPU than the other, almost double, >> which I am OK with, but I think the heal process is going to take forever, >> trying to check the "gluster volume heal info" shows thousands and >> thousands of files which may need healing, I have no idea how many in total >> the command is still running after hours, so I am not sure what has gone so >> wrong to cause this. >> >> I've checked cluster.op-version and cluster.max-op-version and it looks >> like I'm on the latest version there. >> >> I have no idea how long the healing is going to take on this cluster, we >> have around 560TB of data on here, but I don't think I can wait that long >> to try and restore performance to normal. >> >> Can anyone think of anything else I can try in the meantime to work out >> what's causing the extreme latency? 
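On the question of how to tell whether the heal is actually progressing when "gluster volume heal gvAA01 info" takes hours to enumerate everything: the per-brick pending counts are far cheaper to query and are usually enough. A minimal sketch, assuming the volume name gvAA01 used throughout this thread:

# gluster volume heal gvAA01 statistics heal-count

# gluster volume heal gvAA01 statistics

# watch -n 600 'gluster volume heal gvAA01 statistics heal-count'

The first command prints the number of entries still pending per brick; sampling it every few minutes and seeing the numbers fall is the simplest confirmation that the self-heal daemon is working through the backlog. The second prints the self-heal daemon's crawl history, including how many entries each crawl healed and how many failed, which also shows whether the same entries are failing over and over.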
>> >> I've been going through cluster client the logs of some of our VMs and on >> some of our FTP servers I found this in the cluster mount log, but I am not >> seeing it on any of our other servers, just our FTP servers. >> >> [2019-04-21 07:16:19.925388] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:19:43.413834] W [MSGID: 114031] >> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote >> operation failed [No such file or directory] >> [2019-04-21 07:19:43.414153] W [MSGID: 114031] >> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote >> operation failed [No such file or directory] >> [2019-04-21 07:23:33.154717] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:33:24.943913] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> >> Any ideas what this could mean? I am basically just grasping at straws >> here. >> >> I am going to hold off on the version upgrade until I know there are no >> files which need healing, which could be a while, from some reading I've >> done there shouldn't be any issues with this as both are on v3.12.x >> >> I've free'd up a small amount of space, but I still need to work on this >> further. >> >> I've read of a command "find .glusterfs -type f -links -2 -exec rm {} \;" >> which could be run on each brick and it would potentially clean up any >> files which were deleted straight from the bricks, but not via the client, >> I have a feeling this could help me free up about 5-10TB per brick from >> what I've been told about the history of this cluster. Can anyone confirm >> if this is actually safe to run? >> >> At this stage, I'm open to any suggestions as to how to proceed, thanks >> again for any advice. >> >> Cheers, >> >> - Patrick >> >> On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic >> wrote: >> >>> Patrick, >>> >>> Sounds like progress. Be aware that gluster is expected to max out the >>> CPUs on at least one of your servers while healing. This is normal and >>> won?t adversely affect overall performance (any more than having bricks in >>> need of healing, at any rate) unless you?re overdoing it. shd threads <= 4 >>> should not do that on your hardware. Other tunings may have also increased >>> overall performance, so you may see higher CPU than previously anyway. I?d >>> recommend upping those thread counts and letting it heal as fast as >>> possible, especially if these are dedicated Gluster storage servers (Ie: >>> not also running VMs, etc). You should see ?normal? CPU use one heals are >>> completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 >>> cores). It?s also likely to be different between your servers, in a pure >>> replica, one tends to max and one tends to be a little higher, in a >>> distributed-replica, I?d expect more than one to run harder while healing. >>> >>> Keep the differences between doing an ls on a brick and doing an ls on a >>> gluster mount in mind. When you do a ls on a gluster volume, it isn?t just >>> doing a ls on one brick, it?s effectively doing it on ALL of your bricks, >>> and they all have to return data before the ls succeeds. In a distributed >>> volume, it?s figuring out where on each volume things live and getting the >>> stat() from each to assemble the whole thing. 
And if things are in need of >>> healing, it will take even longer to decide which version is current and >>> use it (shd triggers a heal anytime it encounters this). Any of these >>> things being slow slows down the overall response. >>> >>> At this point, I?d get some sleep too, and let your cluster heal while >>> you do. I?d really want it fully healed before I did any updates anyway, so >>> let it use CPU and get itself sorted out. Expect it to do a round of >>> healing after you upgrade each machine too, this is normal so don?t let the >>> CPU spike surprise you, It?s just catching up from the downtime incurred by >>> the update and/or reboot if you did one. >>> >>> That reminds me, check your gluster cluster.op-version and >>> cluster.max-op-version (gluster vol get all all | grep op-version). If >>> op-version isn?t at the max-op-verison, set it to it so you?re taking >>> advantage of the latest features available to your version. >>> >>> -Darrell >>> >>> On Apr 20, 2019, at 11:54 AM, Patrick Rennie >>> wrote: >>> >>> Hi Darrell, >>> >>> Thanks again for your advice, I've applied the acltype=posixacl on my >>> zpools and I think that has reduced some of the noise from my brick logs. >>> I also bumped up some of the thread counts you suggested but my CPU load >>> skyrocketed, so I dropped it back down to something slightly lower, but >>> still higher than it was before, and will see how that goes for a while. >>> >>> Although low space is a definite issue, if I run an ls anywhere on my >>> bricks directly it's instant, <1 second, and still takes several minutes >>> via gluster, so there is still a problem in my gluster configuration >>> somewhere. We don't have any snapshots, but I am trying to work out if any >>> data on there is safe to delete, or if there is any way I can safely find >>> and delete data which has been removed directly from the bricks in the >>> past. I also have lz4 compression already enabled on each zpool which does >>> help a bit, we get between 1.05 and 1.08x compression on this data. >>> I've tried to go through each client and checked it's cluster mount logs >>> and also my brick logs and looking for errors, so far nothing is jumping >>> out at me, but there are some warnings and errors here and there, I am >>> trying to work out what they mean. >>> >>> It's already 1 am here and unfortunately, I'm still awake working on >>> this issue, but I think that I will have to leave the version upgrades >>> until tomorrow. >>> >>> Thanks again for your advice so far. If anyone has any ideas on where I >>> can look for errors other than brick logs or the cluster mount logs to help >>> resolve this issue, it would be much appreciated. >>> >>> Cheers, >>> >>> - Patrick >>> >>> On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic >>> wrote: >>> >>>> See inline: >>>> >>>> On Apr 20, 2019, at 10:09 AM, Patrick Rennie >>>> wrote: >>>> >>>> Hi Darrell, >>>> >>>> Thanks for your reply, this issue seems to be getting worse over the >>>> last few days, really has me tearing my hair out. I will do as you have >>>> suggested and get started on upgrading from 3.12.14 to 3.12.15. >>>> I've checked the zfs properties and all bricks have "xattr=sa" set, but >>>> none of them has "acltype=posixacl" set, currently the acltype property >>>> shows "off", if I make these changes will it apply retroactively to the >>>> existing data? I'm unfamiliar with what this will change so I may need to >>>> look into that before I proceed. 
>>>> >>>> >>>> It is safe to apply that now, any new set/get calls will then use it if >>>> new posixacls exist, and use older if not. ZFS is good that way. It should >>>> clear up your posix_acl and posix errors over time. >>>> >>>> I understand performance is going to slow down as the bricks get full, >>>> I am currently trying to free space and migrate data to some newer storage, >>>> I have fresh several hundred TB storage I just setup recently but with >>>> these performance issues it's really slow. I also believe there is >>>> significant data which has been deleted directly from the bricks in the >>>> past, so if I can reclaim this space in a safe manner then I will have at >>>> least around 10-15% free space. >>>> >>>> >>>> Full ZFS volumes will have a much larger impact on performance than >>>> you?d think, I?d prioritize this. If you have been taking zfs snapshots, >>>> consider deleting them to get the overall volume free space back up. And >>>> just to be sure it?s been said, delete from within the mounted volumes, >>>> don?t delete directly from the bricks (gluster will just try and heal it >>>> later, compounding your issues). Does not apply to deleting other data from >>>> the ZFS volume if it?s not part of the brick directory, of course. >>>> >>>> These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so >>>> generally they have plenty of resources available, currently only using >>>> around 330/512GB of memory. >>>> >>>> I will look into what your suggested settings will change, and then >>>> will probably go ahead with your recommendations, for our specs as stated >>>> above, what would you suggest for performance.io-thread-count ? >>>> >>>> >>>> I run single 2630v4s on my servers, which have a smaller storage >>>> footprint than yours. I?d go with 32 for performance.io-thread-count. >>>> I?d try 4 for the shd thread settings on that gear. Your memory use sounds >>>> fine, so no worries there. >>>> >>>> Our workload is nothing too extreme, we have a few VMs which write >>>> backup data to this storage nightly for our clients, our VMs don't live on >>>> this cluster, but just write to it. >>>> >>>> >>>> If they are writing compressible data, you?ll get immediate benefit by >>>> setting compression=lz4 on your ZFS volumes. It won?t help any old data, of >>>> course, but it will compress new data going forward. This is another one >>>> that?s safe to enable on the fly. >>>> >>>> I've been going through all of the logs I can, below are some slightly >>>> sanitized errors I've come across, but I'm not sure what to make of them. >>>> The main error I am seeing is the first one below, across several of my >>>> bricks, but possibly only for specific folders on the cluster, I'm not 100% >>>> about that yet though. 
>>>> >>>> [2019-04-20 05:56:59.512649] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 05:59:06.084333] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 05:59:43.289030] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 05:59:50.582257] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 06:01:42.501701] E [MSGID: 113001] >>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>> supported] >>>> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] >>>> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >>>> with 'user_xattr' flag) >>>> >>>> >>>> [2019-04-20 13:12:36.131856] E [MSGID: 113002] >>>> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >>>> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>>> [2019-04-20 13:12:36.131959] E [MSGID: 113002] >>>> [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for >>>> /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] >>>> [2019-04-20 13:12:36.132016] E [MSGID: 115050] >>>> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP >>>> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >>>> Backup_clone1.vbm_62906_tmp), client: >>>> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >>>> gvAA01-posix [No data available] >>>> [2019-04-20 13:12:38.093719] E [MSGID: 115050] >>>> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP >>>> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >>>> Backup_clone1.vbm_62906_tmp), client: >>>> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >>>> gvAA01-posix [No data available] >>>> [2019-04-20 13:12:38.093660] E [MSGID: 113002] >>>> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >>>> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>>> [2019-04-20 13:12:38.093696] E [MSGID: 113002] >>>> [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for >>>> /brick2/xxxxxxxxxxxxxxxxxxxx [No data available] >>>> >>>> >>>> posixacls should clear those up, as mentioned. 
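For reference, the ZFS property change being discussed here is applied per dataset and takes effect immediately for new xattr and ACL operations; existing data is not rewritten. A minimal sketch, using brick7/gvAA01 as a hypothetical pool/dataset name; substitute the dataset that actually backs each brick and repeat on both data nodes and the arbiter:

# zfs get xattr,acltype brick7/gvAA01

# zfs set xattr=sa brick7/gvAA01
# zfs set acltype=posixacl brick7/gvAA01

With acltype=posixacl in place, the system.posix_acl_default "Operation not supported" errors quoted above should stop appearing for new operations.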
>>>> >>>> >>>> [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] >>>> 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, >>>> by 980fdbbd367f0000 on 0x7fc4f0161440 >>>> [2019-04-20 14:25:59.654668] E [MSGID: 115053] >>>> [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: >>>> INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), >>>> client: >>>> cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, >>>> error-xlator: gvAA01-locks [Invalid argument] >>>> >>>> >>>> [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] >>>> 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS >>>> 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) >>>> [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] >>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) >>>> [0x7ff4ae6f796a] >>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) >>>> [0x7ff4ae2a96e8] >>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) >>>> [0x7ff4ae28528d] ) 0-: Reply submission failed >>>> >>>> >>>> Fix the posix acls and see if these clear up over time as well, I?m >>>> unclear on what the overall effect of running without the posix acls will >>>> be to total gluster health. Your biggest problem sounds like you need to >>>> free up space on the volumes and get the overall volume health back up to >>>> par and see if that doesn?t resolve the symptoms you?re seeing. >>>> >>>> >>>> >>>> Thank you again for your assistance. It is greatly appreciated. >>>> >>>> - Patrick >>>> >>>> >>>> >>>> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic >>>> wrote: >>>> >>>>> Patrick, >>>>> >>>>> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You >>>>> also mention ZFS, and that error you show makes me think you need to check >>>>> to be sure you have ?xattr=sa? and ?acltype=posixacl? set on your ZFS >>>>> volumes. >>>>> >>>>> You also observed your bricks are crossing the 95% full line, ZFS >>>>> performance will degrade significantly the closer you get to full. In my >>>>> experience, this starts somewhere between 10% and 5% free space remaining, >>>>> so you?re in that realm. >>>>> >>>>> How?s your free memory on the servers doing? Do you have your zfs arc >>>>> cache limited to something less than all the RAM? It shares pretty well, >>>>> but I?ve encountered situations where other things won?t try and take ram >>>>> back properly if they think it?s in use, so ZFS never gets the opportunity >>>>> to give it up. >>>>> >>>>> Since your volume is a disperse-replica, you might try tuning >>>>> disperse.shd-max-threads, default is 1, I?d try it at 2, 4, or even more if >>>>> the CPUs are beefy enough. And setting server.event-threads to 4 and >>>>> client.event-threads to 8 has proven helpful in many cases. After you get >>>>> upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I >>>>> don?t know if it matters, but I?d also recommend resetting >>>>> performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or >>>>> also setting performance.io-thread-count to 32 if those have beefy >>>>> CPUs. >>>>> >>>>> Beyond those general ideas, more info about your hardware (CPU and >>>>> RAM) and workload (VMs, direct storage for web servers or enders, etc) may >>>>> net you some more ideas. 
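All of the tunables mentioned above are applied with gluster volume set and take effect without restarting the volume. One detail worth noting for this thread: gvAA01 is a distributed-replicate volume, so the self-heal thread option that applies is cluster.shd-max-threads; disperse.shd-max-threads only affects dispersed (erasure-coded) volumes. A minimal sketch using the values suggested here, to be treated as starting points rather than definitive settings:

# gluster volume set gvAA01 cluster.shd-max-threads 4
# gluster volume set gvAA01 server.event-threads 4
# gluster volume set gvAA01 client.event-threads 8
# gluster volume set gvAA01 performance.io-thread-count 32

# gluster volume get gvAA01 all | grep -E 'shd-max-threads|event-threads|io-thread-count'

Any of them can be put back to its default later with gluster volume reset gvAA01 <option-name>.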
Then you?re going to have to do more digging into >>>>> brick logs looking for errors and/or warnings to see what?s going on. >>>>> >>>>> -Darrell >>>>> >>>>> >>>>> On Apr 20, 2019, at 8:22 AM, Patrick Rennie >>>>> wrote: >>>>> >>>>> Hello Gluster Users, >>>>> >>>>> I am hoping someone can help me with resolving an ongoing issue I've >>>>> been having, I'm new to mailing lists so forgive me if I have gotten >>>>> anything wrong. We have noticed our performance deteriorating over the last >>>>> few weeks, easily measured by trying to do an ls on one of our top-level >>>>> folders, and timing it, which usually would take 2-5 seconds, and now takes >>>>> up to 20 minutes, which obviously renders our cluster basically unusable. >>>>> This has been intermittent in the past but is now almost constant and I am >>>>> not sure how to work out the exact cause. We have noticed some errors in >>>>> the brick logs, and have noticed that if we kill the right brick process, >>>>> performance instantly returns back to normal, this is not always the same >>>>> brick, but it indicates to me something in the brick processes or >>>>> background tasks may be causing extreme latency. Due to this ability to fix >>>>> it by killing the right brick process off, I think it's a specific file, or >>>>> folder, or operation which may be hanging and causing the increased >>>>> latency, but I am not sure how to work it out. One last thing to add is >>>>> that our bricks are getting quite full (~95% full), we are trying to >>>>> migrate data off to new storage but that is going slowly, not helped by >>>>> this issue. I am currently trying to run a full heal as there appear to be >>>>> many files needing healing, and I have all brick processes running so they >>>>> have an opportunity to heal, but this means performance is very poor. It >>>>> currently takes over 15-20 minutes to do an ls of one of our top-level >>>>> folders, which just contains 60-80 other folders, this should take 2-5 >>>>> seconds. This is all being checked by FUSE mount locally on the storage >>>>> node itself, but it is the same for other clients and VMs accessing the >>>>> cluster. Initially, it seemed our NFS mounts were not affected and operated >>>>> at normal speed, but testing over the last day has shown that our NFS >>>>> clients are also extremely slow, so it doesn't seem specific to FUSE as I >>>>> first thought it might be. >>>>> >>>>> I am not sure how to proceed from here, I am fairly new to gluster >>>>> having inherited this setup from my predecessor and trying to keep it >>>>> going. I have included some info below to try and help with diagnosis, >>>>> please let me know if any further info would be helpful. I would really >>>>> appreciate any advice on what I could try to work out the cause. Thank you >>>>> in advance for reading this, and any suggestions you might be able to >>>>> offer. >>>>> >>>>> - Patrick >>>>> >>>>> This is an example of the main error I see in our brick logs, there >>>>> have been others, I can post them when I see them again too: >>>>> [2019-04-20 04:54:43.055680] E [MSGID: 113001] >>>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>>> /brick1/ library: system.posix_acl_default [Operation not >>>>> supported] >>>>> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] >>>>> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >>>>> with 'user_xattr' flag) >>>>> >>>>> Our setup consists of 2 storage nodes and an arbiter node. 
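Since killing the right brick process instantly restores performance, a statedump of the suspect brick is often the quickest way to see what that process is stuck on, because pending inode/entry locks and long-running call stacks are listed in it. A minimal sketch, assuming default settings, where the dump files normally land under /var/run/gluster on whichever server hosts each brick:

# gluster volume statedump gvAA01
# ls -lt /var/run/gluster | head

In the resulting *.dump.* files, look for lock sections where one client holds a granted inodelk or entrylk while many others sit blocked on the same inode, or for call stacks that never complete; either points at the file or directory a brick process is wedged on. The "Matching lock not found for unlock" errors quoted earlier make the lock tables a reasonable first place to look.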
I have >>>>> noticed our nodes are on slightly different versions, I'm not sure if this >>>>> could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 >>>>> pools - total capacity is around 560TB. >>>>> We have bonded 10gbps NICS on each node, and I have tested bandwidth >>>>> with iperf and found that it's what would be expected from this config. >>>>> Individual brick performance seems ok, I've tested several bricks >>>>> using dd and can write a 10GB files at 1.7GB/s. >>>>> >>>>> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >>>>> 10000+0 records in >>>>> 10000+0 records out >>>>> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >>>>> >>>>> Node 1: >>>>> # glusterfs --version >>>>> glusterfs 3.12.15 >>>>> >>>>> Node 2: >>>>> # glusterfs --version >>>>> glusterfs 3.12.14 >>>>> >>>>> Arbiter: >>>>> # glusterfs --version >>>>> glusterfs 3.12.14 >>>>> >>>>> Here is our gluster volume status: >>>>> >>>>> # gluster volume status >>>>> Status of volume: gvAA01 >>>>> Gluster process TCP Port RDMA Port >>>>> Online Pid >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >>>>> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck1 49152 0 Y >>>>> 6931 >>>>> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >>>>> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck2 49153 0 Y >>>>> 6939 >>>>> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >>>>> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck3 49154 0 Y >>>>> 6947 >>>>> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >>>>> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck4 49155 0 Y >>>>> 6956 >>>>> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >>>>> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck5 49156 0 Y >>>>> 6964 >>>>> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >>>>> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck6 49157 0 Y >>>>> 6974 >>>>> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >>>>> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck7 49158 0 Y >>>>> 6984 >>>>> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >>>>> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck8 49159 0 Y >>>>> 6993 >>>>> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >>>>> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck9 49160 0 Y >>>>> 7001 >>>>> NFS Server on localhost 2049 0 Y >>>>> 17276 >>>>> Self-heal Daemon on localhost N/A N/A Y >>>>> 25245 >>>>> NFS Server on 02-B 2049 0 Y 9089 >>>>> Self-heal Daemon on 02-B N/A N/A Y 17838 >>>>> NFS Server on 00-a 2049 0 Y 15660 >>>>> Self-heal Daemon on 00-a N/A N/A Y 16218 >>>>> >>>>> Task Status of Volume gvAA01 >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> There are no active volume tasks >>>>> >>>>> And gluster volume info: >>>>> >>>>> # gluster volume info >>>>> >>>>> Volume Name: gvAA01 >>>>> Type: Distributed-Replicate >>>>> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >>>>> Status: Started >>>>> Snapshot Count: 0 >>>>> Number of Bricks: 9 x (2 + 1) = 27 >>>>> Transport-type: tcp >>>>> Bricks: >>>>> Brick1: 
01-B:/brick1/gvAA01/brick >>>>> Brick2: 02-B:/brick1/gvAA01/brick >>>>> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >>>>> Brick4: 01-B:/brick2/gvAA01/brick >>>>> Brick5: 02-B:/brick2/gvAA01/brick >>>>> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >>>>> Brick7: 01-B:/brick3/gvAA01/brick >>>>> Brick8: 02-B:/brick3/gvAA01/brick >>>>> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >>>>> Brick10: 01-B:/brick4/gvAA01/brick >>>>> Brick11: 02-B:/brick4/gvAA01/brick >>>>> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >>>>> Brick13: 01-B:/brick5/gvAA01/brick >>>>> Brick14: 02-B:/brick5/gvAA01/brick >>>>> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >>>>> Brick16: 01-B:/brick6/gvAA01/brick >>>>> Brick17: 02-B:/brick6/gvAA01/brick >>>>> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >>>>> Brick19: 01-B:/brick7/gvAA01/brick >>>>> Brick20: 02-B:/brick7/gvAA01/brick >>>>> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >>>>> Brick22: 01-B:/brick8/gvAA01/brick >>>>> Brick23: 02-B:/brick8/gvAA01/brick >>>>> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >>>>> Brick25: 01-B:/brick9/gvAA01/brick >>>>> Brick26: 02-B:/brick9/gvAA01/brick >>>>> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >>>>> Options Reconfigured: >>>>> cluster.shd-max-threads: 4 >>>>> performance.least-prio-threads: 16 >>>>> cluster.readdir-optimize: on >>>>> performance.quick-read: off >>>>> performance.stat-prefetch: off >>>>> cluster.data-self-heal: on >>>>> cluster.lookup-unhashed: auto >>>>> cluster.lookup-optimize: on >>>>> cluster.favorite-child-policy: mtime >>>>> server.allow-insecure: on >>>>> transport.address-family: inet >>>>> client.bind-insecure: on >>>>> cluster.entry-self-heal: off >>>>> cluster.metadata-self-heal: off >>>>> performance.md-cache-timeout: 600 >>>>> cluster.self-heal-daemon: enable >>>>> performance.readdir-ahead: on >>>>> diagnostics.brick-log-level: INFO >>>>> nfs.disable: off >>>>> >>>>> Thank you for any assistance. >>>>> >>>>> - Patrick >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> >>>>> >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Sun Apr 21 11:51:33 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Sun, 21 Apr 2019 14:51:33 +0300 Subject: [Gluster-users] Extremely slow cluster performance Message-ID: Hi Patrick, I guess you can collect some data via the 'gluster profile' command. At least it should show any issues from performance point of view. gluster volume profile volume start; Do an 'ls' gluster volume profile volume info gluster volume profile volume stop Also, can you define top 3 errors seen in the logs. If you manage to fix them (with the help of the community) one by one - you might restore your full functionality. By the way, do you have the option to archive the data and thus reduce the ammount stored - which obviously will increase ZFS performance. Best Regards, Strahil Nikolov On Apr 21, 2019 10:50, Patrick Rennie wrote: > > Hi Darrell,? > > Thanks again for your advice, I've left it for a while but unfortunately it's still just as slow and causing more problems for our operations now. I will need to try and take some steps to at least bring performance back to normal while continuing to investigate the issue longer term. 
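Spelling out the profiling sequence suggested above, assuming the volume name gvAA01 and using /mnt/gvAA01 as a stand-in for an actual FUSE mount point: start profiling, reproduce one slow operation, dump the counters, then stop so the extra accounting is not left running:

# gluster volume profile gvAA01 start
# time ls /mnt/gvAA01/some-top-level-folder
# gluster volume profile gvAA01 info
# gluster volume profile gvAA01 stop

The info output lists, per brick, call counts and average/maximum latencies for each file operation; a single brick whose LOOKUP or INODELK latencies sit far above the others is a strong hint about which brick process (or which underlying pool) is stalling the clients.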
I can definitely see one node with heavier CPU than the other, almost double, which I am OK with, but I think the heal process is going to take forever, trying to check the "gluster volume heal info" shows thousands and thousands of files which may need healing, I have no idea how many in total the command is still running after hours, so I am not sure what has gone so wrong to cause this.? > > I've checked cluster.op-version and cluster.max-op-version and it looks like I'm on the latest version there.? > > I have no idea how long the healing is going to take on this cluster, we have around 560TB of data on here, but I don't think I can wait that long to try and restore performance to normal.? > > Can anyone think of anything else I can try in the meantime to work out what's causing the extreme latency?? > > I've been going through cluster client the logs of some of our VMs and on some of our FTP servers I found this in the cluster mount log, but I am not seeing it on any of our other servers, just our FTP servers.? > > [2019-04-21 07:16:19.925388] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null > [2019-04-21 07:19:43.413834] W [MSGID: 114031] [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote operation failed [No such file or directory] > [2019-04-21 07:19:43.414153] W [MSGID: 114031] [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote operation failed [No such file or directory] > [2019-04-21 07:23:33.154717] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null > [2019-04-21 07:33:24.943913] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null > > Any ideas what this could mean? I am basically just grasping at straws here. > > I am going to hold off on the version upgrade until I know there are no files which need healing, which could be a while, from some reading I've done there shouldn't be any issues with this as both are on v3.12.x? > > I've free'd up a small amount of space, but I still need to work on this further.? > > I've read of a command "find .glusterfs -type f -links -2 -exec rm {} \;" which could be run on each brick and it would potentially clean up any files which were deleted straight from the bricks, but not via the client, I have a feeling this could help me free up about 5-10TB per brick from what I've been told about the history of this cluster. Can anyone confirm if this is actually safe to run?? > > At this stage, I'm open to any suggestions as to how to proceed, thanks again for any advice.? > > Cheers,? > > - Patrick > > On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic wrote: >> >> Patrick, >> >> Sounds like progress. Be aware that gluster is expected to max out the CPUs on at least one of your servers while healing. This is normal and won?t adversely affect overall performance (any more than having bricks in need of healing, at any rate) unless you?re overdoing it. shd threads <= 4 should not do that on your hardware. Other tunings may have also increased overall performance, so you may see higher CPU than previously anyway. I?d recommend upping those thread counts and letting it heal as fast as possible, especially if these are dedicated Gluster storage servers (Ie: not also running VMs, etc). You should see ?normal? CPU use one heals are completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 cores). 
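On the earlier question about "find .glusterfs -type f -links -2 -exec rm {} \;": before letting anything delete, it is worth doing a listing-only pass per brick and reviewing the output, since the same pattern can also match housekeeping files GlusterFS keeps under .glusterfs (for example entries under .glusterfs/indices or .glusterfs/changelogs). A minimal, non-destructive sketch, assuming brick paths like those in the volume status; repeat for each brick:

# find /brick1/gvAA01/brick/.glusterfs -type f -links -2 > /root/brick1-orphan-candidates.txt
# wc -l /root/brick1-orphan-candidates.txt

Only after confirming the listed entries really are leftovers of files that were deleted directly from the bricks, and estimating how much space they hold, would the -exec rm variant be reasonable, and even then doing it one brick at a time with healthy backups would be prudent.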
It?s also likely to be different between your servers, in a pure replica, one tends to max and one tends to be a little higher, in a distributed-replica, I?d expect more than one to run harder while healin -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Sun Apr 21 11:56:04 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Sun, 21 Apr 2019 14:56:04 +0300 Subject: [Gluster-users] Extremely slow cluster performance Message-ID: By the way, can you provide the 'volume info' and the mount options on all clients? Maybe , there is an option that uses a lot of resources due to some client's mount options. Best Regards, Strahil NikolovOn Apr 21, 2019 10:55, Patrick Rennie wrote: > > Just another small update, I'm continuing to watch my brick logs and I just saw these errors come up in the recent events too. I am going to continue to post any errors I see in the hope of finding the right one to try and fix..? > This is from the logs on brick1, seems to be occurring on both nodes on brick1, although at different times. I'm not sure what this means, can anyone shed any light?? > I guess I am looking for some kind of specific error which may indicate something is broken or stuck and locking up and causing the extreme latency I'm seeing in the cluster.? > > [2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS 3.3, ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a) [0x7f3b3e93158a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45) [0x7f3b3e4c5d45] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] 
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064939] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064962] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > > Thanks again, > > -Patrick > > On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie wrote: >> >> Hi Darrell,? >> >> Thanks again for your advice, I've left it for a while but unfortunately it's still just as slow and causing more problems for our operations now. 
I will need to try and take some steps to at least bring performance back to normal while continuing to investigate the issue longer term. I can definitely see one node with heavier CPU than the other, almost double, which I am OK with, but I think the heal process is going to take forever, trying to check the "gluster volume heal info" shows thousands and thousands of files which may need healing, I have no idea how many in total the command is still running after hours, so I am not sure what has gone so wrong to cause this.? >> >> I've checked cluster.op-version and cluster.max-op-version and it looks like I'm on the latest version there.? >> >> I have no idea how long the healing is going to take on this cluster, we have around 560TB of data on here, but I don't think I can wait that long to try and restore performance to normal.? >> >> Can anyone think of anything else I can try in the meantime to work out what's causing the extreme latency?? >> >> I've been going through cluster client the logs of some of our VMs and on some of our FTP servers I found this in the cluster mount log, but I am not seeing it on any of our other servers, just our FTP servers.? >> >> [2019-04-21 07:16:19.925388] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:19:43.413834] W [MSGID: 114031] [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote operation failed [No such file or directory] >> [2019-04-21 07:19:43.414153] W [MSGID: 114031] [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote operation failed [No such file or directory] >> [2019-04-21 07:23:33.154717] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:33:24.943913] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> >> Any ideas what this could mean? I am basically just grasping at straws here. >> >> I am going to hold off on the version upgrade until I know there are no files which need healing, which could be a while, from some reading I've done there shouldn't be any issues with this as both are on v3.12.x? >> >> I've free'd up a small amount of space, but I still need to work on this further.? >> >> I've read of a command "find .glusterfs -type f -links -2 -exec rm {} \;" which could be run on each brick and it would potentially clean up any files which were deleted straight from the bricks, but not via the client, I have a feeling this could help me free up about 5-10TB per brick from what I've been told about the history of this cluster. Can anyone confirm if this is actually safe to run?? >> >> At this stage, I'm open to any suggestions as to how to proceed, thanks again for any advice.? >> >> Cheers,? >> >> - Patrick >> >> On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic wrote: >>> >>> Patrick, >>> >>> Sounds like progress. Be aware that gluster is expected to max out the CPUs on at least one of your servers while healing. This is normal and won?t adversely affect overall performance (any more than having bricks in need of healing, at any rate) unless you?re overdoing it. shd threads <= 4 should not do that on your hardware. Other tunings may have also increased overall performance, so you may see higher CPU than previously anyway. I?d recommend upping those thread counts and letting it heal as fast as possible, especially if these are dedicated Gluster storage servers (Ie: not also running VMs, etc). You should see ?normal? CPU use one heals are completed. 
I see ~15-30% overall normally, 95-98% while healing (x my 20 cores). It?s also likely to be different between your servers, in a pure replica, one tends to max and one tends to be a little higher, in a distributed-replica, I?d expect more than one to run harder while healing. >>> >>> Keep the differences between doing an ls on a brick and doing an ls on a gluster mount in mind. When you do a ls on a gluster volume, it isn?t just doing a ls on one brick, it?s effectively doing it on ALL of your bricks, and they all have to return data before the ls succeeds. In a distributed volume, it?s figuring out where on each volume things live and getting the stat() from each to assemble the whole thing. And if things are in need of healing, it will take even longer to decide which version is current and use it (shd triggers a heal anytime it encounters this). Any of these things being slow slows down the overall response.? >>> >>> At this point, I?d get some sleep too, and let your cluster heal while you do. I?d really want it fully healed before I did any updates anyway, so let it use CPU and get itself sorted out. Expect it to do a round of healing after you upgrade each machine too, this is normal so don?t let the CPU spike surprise you, It?s just catching up from the downtime incurred by the update and/or reboot if you did one. >>> >>> That reminds me, check your gluster cluster.op-version and cluster.max-op-version (gluster vol get all all | grep op-version). If op-version isn?t at the max-op-verison, set it to it so you?re taking advantage of the latest features available to your version. >>> >>> ? -Darrell >>> >>>> On Apr 20, 2019, at 11:54 AM, Patrick Rennie wrote: >>>> >>>> Hi Darrell,? >>>> >>>> Thanks again for your advice, I've applied the acltype=posixacl on my zpools and I think that has reduced some of the noise from my brick logs.? >>>> I also bumped up some of the thread counts you suggested but my CPU load skyrocketed, so I dropped it back down to something slightly lower, but still higher than it was before, and will see how that goes for a while.? >>>> >>>> Although low space is a definite issue, if I run an ls anywhere on my bricks directly it's instant, <1 second, and still takes several minutes via gluster, so there is still a problem in my gluster configuration somewhere. We don't have any snapshots, but I am trying to work out if any data on there is safe to delete, or if there is any way I can safely find and delete data which has been removed directly from the bricks in the past. I also have lz4 compression already enabled on each zpool which does help a bit, we get between 1.05 and 1.08x compression on this data.? >>>> I've tried to go through each client and checked it's cluster mount logs and also my brick logs and looking for errors, so far nothing is jumping out at me, but there are some warnings and errors here and there, I am trying to work out what they mean.? >>>> >>>> It's already 1 am here and unfortunately, I'm still awake working on this issue, but I think that I will have to leave the version upgrades until tomorrow.? >>>> >>>> Thanks again for your advice so far. If anyone has any ideas on where I can look for errors other than brick logs or the cluster mount logs to help resolve this issue, it would be much appreciated.? >>>> >>>> Cheers, >>>> >>>> - Patrick >>>> >>>> On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic wrote: >>>>> >>>>> See inline: >>>>> >>>>>> On Apr 20, 2019, at 10:09 AM, Patrick Rennie wrote: >>>>>> >>>>>> Hi Darrell,? 
>>>>>> >>>>>> Thanks for your reply, this issue seems to be getting worse over the last few days, really has me tearing my hair out. I will do as you have suggested and get started on upgrading from 3.12.14 to 3.12.15.? >>>>>> I've checked the zfs properties and all bricks have "xattr=sa" set, but none of them has "acltype=posixacl" set, currently the acltype property shows "off", if I make these changes will it apply retroactively to the existing data? I'm unfamiliar with what this will change so I may need to look into that before I proceed.? >>>>> >>>>> >>>>> It is safe to apply that now, any new set/get calls will then use it if new posixacls exist, and use older if not. ZFS is good that way. It should clear up your posix_acl and posix errors over time. >>>>> >>>>>> I understand performance is going to slow down as the bricks get full, I am currently trying to free space and migrate data to some newer storage, I have fresh several hundred TB storage I just setup recently but with these performance issues it's really slow. I also believe there is significant data which has been deleted directly from the bricks in the past, so if I can reclaim this space in a safe manner then I will have at least around 10-15% free space.? >>>>> >>>>> >>>>> Full ZFS volumes will have a much larger impact on performance than you?d think, I?d prioritize this. If you have been taking zfs snapshots, consider deleting them to get the overall volume free space back up. And just to be sure it?s been said, delete from within the mounted volumes, don?t delete directly from the bricks (gluster will just try and heal it later, compounding your issues). Does not apply to deleting other data from the ZFS volume if it?s not part of the brick directory, of course. >>>>> >>>>>> These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so generally they have plenty of resources available, currently only using around 330/512GB of memory. >>>>>> >>>>>> I will look into what your suggested settings will change, and then will probably go ahead with your recommendations, for our specs as stated above, what would you suggest for performance.io-thread-count ? >>>>> >>>>> >>>>> I run single 2630v4s on my servers, which have a smaller storage footprint than yours. I?d go with 32 for performance.io-thread-count. I?d try 4 for the shd thread settings on that gear. Your memory use sounds fine, so no worries there. >>>>> >>>>>> Our workload is nothing too extreme, we have a few VMs which write backup data to this storage nightly for our clients, our VMs don't live on this cluster, but just write to it.? >>>>> >>>>> >>>>> If they are writing compressible data, you?ll get immediate benefit by setting compression=lz4 on your ZFS volumes. It won?t help any old data, of course, but it will compress new data going forward. This is another one that?s safe to enable on the fly. >>>>> >>>>>> I've been going through all of the logs I can, below are some slightly sanitized errors I've come across, but I'm not sure what to make of them. The main error I am seeing is the first one below, across several of my bricks, but possibly only for specific folders on the cluster, I'm not 100% about that yet though.? >>>>>> >>>>>> [2019-04-20 05:56:59.512649] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default? 
[Operation not supported] >>>>>> [2019-04-20 05:59:06.084333] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default? [Operation not supported] >>>>>> [2019-04-20 05:59:43.289030] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default? [Operation not supported] >>>>>> [2019-04-20 05:59:50.582257] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default? [Operation not supported] >>>>>> [2019-04-20 06:01:42.501701] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default? [Operation not supported] >>>>>> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag) >>>>>> >>>>>> >>>>>> [2019-04-20 13:12:36.131856] E [MSGID: 113002] [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>>>>> [2019-04-20 13:12:36.131959] E [MSGID: 113002] [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] >>>>>> [2019-04-20 13:12:36.132016] E [MSGID: 115050] [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud Backup_clone1.vbm_62906_tmp), client: 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: gvAA01-posix [No data available] >>>>>> [2019-04-20 13:12:38.093719] E [MSGID: 115050] [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud Backup_clone1.vbm_62906_tmp), client: 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: gvAA01-posix [No data available] >>>>>> [2019-04-20 13:12:38.093660] E [MSGID: 113002] [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>>>>> [2019-04-20 13:12:38.093696] E [MSGID: 113002] [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx [No data available] >>>>>> >>>>> >>>>> posixacls should clear those up, as mentioned. >>>>> >>>>>> >>>>>> [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] 0-gvAA01-locks:? 
Matching lock not found for unlock 0-9223372036854775807, by 980fdbbd367f0000 on 0x7fc4f0161440 >>>>>> [2019-04-20 14:25:59.654668] E [MSGID: 115053] [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), client: cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, error-xlator: gvAA01-locks [Invalid argument] >>>>>> >>>>>> >>>>>> [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) >>>>>> [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) [0x7ff4ae6f796a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) [0x7ff4ae2a96e8] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) [0x7ff4ae28528d] ) 0-: Reply submission failed >>>>>> >>>>> >>>>> Fix the posix acls and see if these clear up over time as well, I?m unclear on what the overall effect of running without the posix acls will be to total gluster health. Your biggest problem sounds like you need to free up space on the volumes and get the overall volume health back up to par and see if that doesn?t resolve the symptoms you?re seeing. >>>>> >>>>> >>>>>> >>>>>> Thank you again for your assistance. It is greatly appreciated.? >>>>>> >>>>>> - Patrick >>>>>> >>>>>> >>>>>> >>>>>> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic wrote: >>>>>>> >>>>>>> Patrick, >>>>>>> >>>>>>> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You also mention ZFS, and that error you show makes me think you need to check to be sure you have ?xattr=sa? and ?acltype=posixacl? set on your ZFS volumes. >>>>>>> >>>>>>> You also observed your bricks are crossing the 95% full line, ZFS performance will degrade significantly the closer you get to full. In my experience, this starts somewhere between 10% and 5% free space remaining, so you?re in that realm.? >>>>>>> >>>>>>> How?s your free memory on the servers doing? Do you have your zfs arc cache limited to something less than all the RAM? It shares pretty well, but I?ve encountered situations where other things won?t try and take ram back properly if they think it?s in use, so ZFS never gets the opportunity to give it up. >>>>>>> >>>>>>> Since your volume is a disperse-replica, you might try tuning disperse.shd-max-threads, default is 1, I?d try it at 2, 4, or even more if the CPUs are beefy enough. And setting server.event-threads to 4 and client.event-threads to 8 has proven helpful in many cases. After you get upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I don?t know if it matters, but I?d also recommend resetting performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or also setting performance.io-thread-count to 32 if those have beefy CPUs. >>>>>>> >>>>>>> Beyond those general ideas, more info about your hardware (CPU and RAM) and workload (VMs, direct storage for web servers or enders, etc) may net you some more ideas. Then you?re going to have to do more digging into brick logs looking for errors and/or warnings to see what?s going on. >>>>>>> >>>>>>> ? -Darrell >>>>>>> >>>>>>> >>>>>>>> On Apr 20, 2019, at 8:22 AM, Patrick Rennie wrote: >>>>>>>> >>>>>>>> Hello Gluster Users,? 
>>>>>>>> >>>>>>>> I am hoping someone can help me with resolving an ongoing issue I've been having, I'm new to mailing lists so forgive me if I have gotten anything wrong. We have noticed our performance deteriorating over the last few weeks, easily measured by trying to do an ls on one of our top-level folders, and timing it, which usually would take 2-5 seconds, and now takes up to 20 minutes, which obviously renders our cluster basically unusable. This has been intermittent in the past but is now almost constant and I am not sure how to work out the exact cause. We have noticed some errors in the brick logs, and have noticed that if we kill the right brick process, performance instantly returns back to normal, this is not always the same brick, but it indicates to me something in the brick processes or background tasks may be causing extreme latency. Due to this ability to fix it by killing the right brick process off, I think it's a specific file, or folder, or operation which may be hanging and causing the increased latency, but I am not sure how to work it out. One last thing to add is that our bricks are getting quite full (~95% full), we are trying to migrate data off to new storage but that is going slowly, not helped by this issue. I am currently trying to run a full heal as there appear to be many files needing healing, and I have all brick processes running so they have an opportunity to heal, but this means performance is very poor. It currently takes over 15-20 minutes to do an ls of one of our top-level folders, which just contains 60-80 other folders, this should take 2-5 seconds. This is all being checked by FUSE mount locally on the storage node itself, but it is the same for other clients and VMs accessing the cluster. Initially, it seemed our NFS mounts were not affected and operated at normal speed, but testing over the last day has shown that our NFS clients are also extremely slow, so it doesn't seem specific to FUSE as I first thought it might be.? >>>>>>>> >>>>>>>> I am not sure how to proceed from here, I am fairly new to gluster having inherited this setup from my predecessor and trying to keep it going. I have included some info below to try and help with diagnosis, please let me know if any further info would be helpful. I would really appreciate any advice on what I could try to work out the cause. Thank you in advance for reading this, and any suggestions you might be able to offer.? >>>>>>>> >>>>>>>> - Patrick >>>>>>>> >>>>>>>> This is an example of the main error I see in our brick logs, there have been others, I can post them when I see them again too: >>>>>>>> [2019-04-20 04:54:43.055680] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick1/ library: system.posix_acl_default? [Operation not supported] >>>>>>>> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag) >>>>>>>> >>>>>>>> Our setup consists of 2 storage nodes and an arbiter node. I have noticed our nodes are on slightly different versions, I'm not sure if this could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - total capacity is around 560TB.? >>>>>>>> We have bonded 10gbps NICS on each node, and I have tested bandwidth with iperf and found that it's what would be expected from this config.? >>>>>>>> Individual brick performance seems ok, I've tested several bricks using dd and can write a 10GB files at 1.7GB/s.? 
>>>>>>>> >>>>>>>> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >>>>>>>> 10000+0 records in >>>>>>>> 10000+0 records out >>>>>>>> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >>>>>>>> >>>>>>>> Node 1: >>>>>>>> # glusterfs --version >>>>>>>> glusterfs 3.12.15 >>>>>>>> >>>>>>>> Node 2: >>>>>>>> # glusterfs --version >>>>>>>> glusterfs 3.12.14 >>>>>>>> >>>>>>>> Arbiter: >>>>>>>> # glusterfs --version >>>>>>>> glusterfs 3.12.14 >>>>>>>> >>>>>>>> Here is our gluster volume status: >>>>>>>> >>>>>>>> # gluster volume status >>>>>>>> Status of volume: gvAA01 >>>>>>>> Gluster process? ? ? ? ? ? ? ? ? ? ? ? ? ? ?TCP Port? RDMA Port? Online? Pid >>>>>>>> ------------------------------------------------------------------------------ >>>>>>>> Brick 01-B:/brick1/gvAA01/brick? ? 49152? ? ?0? ? ? ? ? Y? ? ? ?7219 >>>>>>>> Brick 02-B:/brick1/gvAA01/brick? ? 49152? ? ?0? ? ? ? ? Y? ? ? ?21845 >>>>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>>>> ck1? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?49152? ? ?0? ? ? ? ? Y? ? ? ?6931 >>>>>>>> Brick 01-B:/brick2/gvAA01/brick? ? 49153? ? ?0? ? ? ? ? Y? ? ? ?7239 >>>>>>>> Brick 02-B:/brick2/gvAA01/brick? ? 49153? ? ?0? ? ? ? ? Y? ? ? ?9916 >>>>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>>>> ck2? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?49153? ? ?0? ? ? ? ? Y? ? ? ?6939 >>>>>>>> Brick 01-B:/brick3/gvAA01/brick? ? 49154? ? ?0? ? ? ? ? Y? ? ? ?7235 >>>>>>>> Brick 02-B:/brick3/gvAA01/brick? ? 49154? ? ?0? ? ? ? ? Y? ? ? ?21858 >>>>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>>>> ck3? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?49154? ? ?0? ? ? ? ? Y? ? ? ?6947 >>>>>>>> Brick 01-B:/brick4/gvAA01/brick? ? 49155? ? ?0? ? ? ? ? Y? ? ? ?31840 >>>>>>>> Brick 02-B:/brick4/gvAA01/brick? ? 49155? ? ?0? ? ? ? ? Y? ? ? ?9933 >>>>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>>>> ck4? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?49155? ? ?0? ? ? ? ? Y? ? ? ?6956 >>>>>>>> Brick 01-B:/brick5/gvAA01/brick? ? 49156? ? ?0? ? ? ? ? Y? ? ? ?7233 >>>>>>>> Brick 02-B:/brick5/gvAA01/brick? ? 49156? ? ?0? ? ? ? ? Y? ? ? ?9942 >>>>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>>>> ck5? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?49156? ? ?0? ? ? ? ? Y? ? ? ?6964 >>>>>>>> Brick 01-B:/brick6/gvAA01/brick? ? 49157? ? ?0? ? ? ? ? Y? ? ? ?7234 >>>>>>>> Brick 02-B:/brick6/gvAA01/brick? ? 49157? ? ?0? ? ? ? ? Y? ? ? ?9952 >>>>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>>>> ck6? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?49157? ? ?0? ? ? ? ? Y? ? ? ?6974 >>>>>>>> Brick 01-B:/brick7/gvAA01/brick? ? 49158? ? ?0? ? ? ? ? Y? ? ? ?7248 >>>>>>>> Brick 02-B:/brick7/gvAA01/brick? ? 49158? ? ?0? ? ? ? ? Y? ? ? ?9960 >>>>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>>>> ck7? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?49158? ? ?0? ? ? ? ? Y? ? ? ?6984 >>>>>>>> Brick 01-B:/brick8/gvAA01/brick? ? 49159? ? ?0? ? ? ? ? Y? ? ? ?7253 >>>>>>>> Brick 02-B:/brick8/gvAA01/brick? ? 49159? ? ?0? ? ? ? ? Y? ? ? ?9970 >>>>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>>>> ck8? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?49159? ? ?0? ? ? ? ? Y? ? ? ?6993 >>>>>>>> Brick 01-B:/brick9/gvAA01/brick? ? 49160? ? ?0? ? ? ? ? Y? ? ? ?7245 >>>>>>>> Brick 02-B:/brick9/gvAA01/brick? ? 49160? ? ?0? ? ? ? ? Y? ? ? ?9984 >>>>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>>>> ck9? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?49160? ? ?0? ? ? ? ? Y? ? ? ?7001 >>>>>>>> NFS Server on localhost? ? ? ? ? ? ? ? ? ? ?2049? ? ? 0? ? ? ? ? Y? ? ? ?17276 >>>>>>>> Self-heal Daemon on localhost? ? ? ? ? ? ? ?N/A? ? ? ?N/A? ? ? ? Y? ? ? 
?25245 >>>>>>>> NFS Server on 02-B? ? ? ? ? ? ? ? ?2049? ? ? 0? ? ? ? ? Y? ? ? ?9089 >>>>>>>> Self-heal Daemon on 02-B? ? ? ? ? ?N/A? ? ? ?N/A? ? ? ? Y? ? ? ?17838 >>>>>>>> NFS Server on 00-a? ? ? ? ? ? ? ? ?2049? ? ? 0? ? ? ? ? Y? ? ? ?15660 >>>>>>>> Self-heal Daemon on 00-a? ? ? ? ? ?N/A? ? ? ?N/A? ? ? ? Y? ? ? ?16218 >>>>>>>> >>>>>>>> Task Status of Volume gvAA01 >>>>>>>> ------------------------------------------------------------------------------ >>>>>>>> There are no active volume tasks >>>>>>>> >>>>>>>> And gluster volume info:? >>>>>>>> >>>>>>>> # gluster volume info >>>>>>>> >>>>>>>> Volume Name: gvAA01 >>>>>>>> Type: Distributed-Replicate >>>>>>>> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >>>>>>>> Status: Started >>>>>>>> Snapshot Count: 0 >>>>>>>> Number of Bricks: 9 x (2 + 1) = 27 >>>>>>>> Transport-type: tcp >>>>>>>> Bricks: >>>>>>>> Brick1: 01-B:/brick1/gvAA01/brick >>>>>>>> Brick2: 02-B:/brick1/gvAA01/brick >>>>>>>> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >>>>>>>> Brick4: 01-B:/brick2/gvAA01/brick >>>>>>>> Brick5: 02-B:/brick2/gvAA01/brick >>>>>>>> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >>>>>>>> Brick7: 01-B:/brick3/gvAA01/brick >>>>>>>> Brick8: 02-B:/brick3/gvAA01/brick >>>>>>>> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >>>>>>>> Brick10: 01-B:/brick4/gvAA01/brick >>>>>>>> Brick11: 02-B:/brick4/gvAA01/brick >>>>>>>> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >>>>>>>> Brick13: 01-B:/brick5/gvAA01/brick >>>>>>>> Brick14: 02-B:/brick5/gvAA01/brick >>>>>>>> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >>>>>>>> Brick16: 01-B:/brick6/gvAA01/brick >>>>>>>> Brick17: 02-B:/brick6/gvAA01/brick >>>>>>>> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >>>>>>>> Brick19: 01-B:/brick7/gvAA01/brick >>>>>>>> Brick20: 02-B:/brick7/gvAA01/brick >>>>>>>> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >>>>>>>> Brick22: 01-B:/brick8/gvAA01/brick >>>>>>>> Brick23: 02-B:/brick8/gvAA01/brick >>>>>>>> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >>>>>>>> Brick25: 01-B:/brick9/gvAA01/brick >>>>>>>> Brick26: 02-B:/brick9/gvAA01/brick >>>>>>>> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >>>>>>>> Options Reconfigured: >>>>>>>> cluster.shd-max-threads: 4 >>>>>>>> performance.least-prio-threads: 16 >>>>>>>> cluster.readdir-optimize: on >>>>>>>> performance.quick-read: off >>>>>>>> performance.stat-prefetch: off >>>>>>>> cluster.data-self-heal: on >>>>>>>> cluster.lookup-unhashed: auto >>>>>>>> cluster.lookup-optimize: on >>>>>>>> cluster.favorite-child-policy: mtime >>>>>>>> server.allow-insecure: on >>>>>>>> transport.address-family: inet >>>>>>>> client.bind-insecure: on >>>>>>>> cluster.entry-self-heal: off >>>>>>>> cluster.metadata-self-heal: off >>>>>>>> performance.md-cache-timeout: 600 >>>>>>>> cluster.self-heal-daemon: enable >>>>>>>> performance.readdir-ahead: on >>>>>>>> diagnostics.brick-log-level: INFO >>>>>>>> nfs.disable: off >>>>>>>> >>>>>>>> Thank you for any assistance.? >>>>>>>> >>>>>>>> - Patrick >>>>>>>> _______________________________________________ >>>>>>>> Gluster-users mailing list >>>>>>>> Gluster-users at gluster.org >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>>> >>>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From patrickmrennie at gmail.com Sun Apr 21 14:24:34 2019 From: patrickmrennie at gmail.com (Patrick Rennie) Date: Sun, 21 Apr 2019 22:24:34 +0800 Subject: [Gluster-users] Extremely slow cluster performance In-Reply-To: References: Message-ID: Hi Strahil, Thank you for your reply and your suggestions. I'm not sure which logs would be most relevant to be checking to diagnose this issue, we have the brick logs, the cluster mount logs, the shd logs or something else? I have posted a few that I have seen repeated a few times already. I will continue to post anything further that I see. I am working on migrating data to some new storage, so this will slowly free up space, although this is a production cluster and new data is being uploaded every day, sometimes faster than I can migrate it off. I have several other similar clusters and none of them have the same problem, one the others is actually at 98-99% right now (big problem, I know) but still performs perfectly fine compared to this cluster, I am not sure low space is the root cause here. I currently have 13 VMs accessing this cluster, I have checked each one and all of them use one of the two options below to mount the cluster in fstab HOSTNAME:/gvAA01 /mountpoint glusterfs defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable,use-readdirp=no 0 0 HOSTNAME:/gvAA01 /mountpoint glusterfs defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable I also have a few other VMs which use NFS to access the cluster, and these machines appear to be significantly quicker, initially I get a similar delay with NFS but if I cancel the first "ls" and try it again I get < 1 sec lookups, this can take over 10 minutes by FUSE/gluster client, but the same trick of cancelling and trying again doesn't work for FUSE/gluster. Sometimes the NFS queries have no delay at all, so this is a bit strange to me. HOSTNAME:/gvAA01 /mountpoint/ nfs defaults,_netdev,vers=3,async,noatime 0 0 Example: user at VM:~$ time ls /cluster/folder ^C real 9m49.383s user 0m0.001s sys 0m0.010s user at VM:~$ time ls /cluster/folder real 0m0.069s user 0m0.001s sys 0m0.007s --- I have checked the profiling as you suggested, I let it run for around a minute, then cancelled it and saved the profile info. root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 start Starting volume profile on gvAA01 has been successful root at HOSTNAME:/var/log/glusterfs# time ls /cluster/folder ^C real 1m1.660s user 0m0.000s sys 0m0.002s root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 info >> ~/profile.txt root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 stop I will attach the results to this email as it's over 1000 lines. Unfortunately, I'm not sure what I'm looking at but possibly somebody will be able to help me make sense of it and let me know if it highlights any specific issues. Happy to try any further suggestions. Thank you, -Patrick On Sun, Apr 21, 2019 at 7:55 PM Strahil wrote: > By the way, can you provide the 'volume info' and the mount options on all > clients? > Maybe , there is an option that uses a lot of resources due to some > client's mount options. > > Best Regards, > Strahil Nikolov > On Apr 21, 2019 10:55, Patrick Rennie wrote: > > Just another small update, I'm continuing to watch my brick logs and I > just saw these errors come up in the recent events too. I am going to > continue to post any errors I see in the hope of finding the right one to > try and fix.. 
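(In case it helps anyone following along, the same class of errors can be pulled out of every brick log at once with something along these lines - the path is the stock packaged log location, so adjust if yours differs:)

# grep '] E \[' /var/log/glusterfs/bricks/*.log | tail -n 50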
> This is from the logs on brick1, seems to be occurring on both nodes on > brick1, although at different times. I'm not sure what this means, can > anyone shed any light? > I guess I am looking for some kind of specific error which may indicate > something is broken or stuck and locking up and causing the extreme latency > I'm seeing in the cluster. > > [2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a) > [0x7f3b3e93158a] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45) > [0x7f3b3e4c5d45] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064939] E 
[rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064962] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) > [0x7f3b3e9318fa] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) > [0x7f3b3e4c5f35] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) > [0x7f3b3e4b72cd] ) 0-: Reply submission failed > > Thanks again, > > -Patrick > > On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie > wrote: > > Hi Darrell, > > Thanks again for your advice, I've left it for a while but unfortunately > it's still just as slow and causing more problems for our operations now. I > will need to try and take some steps to at least bring performance back to > normal while continuing to investigate the issue longer term. I can > definitely see one node with heavier CPU than the other, almost double, > which I am OK with, but I think the heal process is going to take forever, > trying to check the "gluster volume heal info" shows thousands and > thousands of files which may need healing, I have no idea how many in total > the command is still running after hours, so I am not sure what has gone so > wrong to cause this. > > I've checked cluster.op-version and cluster.max-op-version and it looks > like I'm on the latest version there. 
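(For reference, a minimal sketch of those two checks - the volume name is the one from this thread, and the value in the last command is a placeholder that is only needed if op-version is actually below the maximum:)

# gluster volume heal gvAA01 statistics heal-count
# gluster volume get all all | grep op-version
# gluster volume set all cluster.op-version <max-op-version>

The heal-count form reports a per-brick count of pending entries without trying to list them all, which is much quicker than waiting for the full "heal info" output on a backlog this size.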
> > I have no idea how long the healing is going to take on this cluster, we > have around 560TB of data on here, but I don't think I can wait that long > to try and restore performance to normal. > > Can anyone think of anything else I can try in the meantime to work out > what's causing the extreme latency? > > I've been going through the cluster client logs of some of our VMs and on > some of our FTP servers I found this in the cluster mount log, but I am not > seeing it on any of our other servers, just our FTP servers. > > [2019-04-21 07:16:19.925388] E [MSGID: 101046] > [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null > [2019-04-21 07:19:43.413834] W [MSGID: 114031] > [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote > operation failed [No such file or directory] > [2019-04-21 07:19:43.414153] W [MSGID: 114031] > [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote > operation failed [No such file or directory] > [2019-04-21 07:23:33.154717] E [MSGID: 101046] > [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null > [2019-04-21 07:33:24.943913] E [MSGID: 101046] > [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null > > Any ideas what this could mean? I am basically just grasping at straws > here. > > I am going to hold off on the version upgrade until I know there are no > files which need healing, which could be a while, from some reading I've > done there shouldn't be any issues with this as both are on v3.12.x > > I've freed up a small amount of space, but I still need to work on this > further. > > I've read of a command "find .glusterfs -type f -links -2 -exec rm {} \;" > which could be run on each brick and it would potentially clean up any > files which were deleted straight from the bricks, but not via the client, > I have a feeling this could help me free up about 5-10TB per brick from > what I've been told about the history of this cluster. Can anyone confirm > if this is actually safe to run? > > At this stage, I'm open to any suggestions as to how to proceed, thanks > again for any advice. > > Cheers, > > - Patrick > > On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic > wrote: > > Patrick, > > Sounds like progress. Be aware that gluster is expected to max out the > CPUs on at least one of your servers while healing. This is normal and > won't adversely affect overall performance (any more than having bricks in > need of healing, at any rate) unless you're overdoing it. shd threads <= 4 > should not do that on your hardware. Other tunings may have also increased > overall performance, so you may see higher CPU than previously anyway. I'd > recommend upping those thread counts and letting it heal as fast as > possible, especially if these are dedicated Gluster storage servers (Ie: > not also running VMs, etc). You should see 'normal' CPU use once heals are > completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 > cores). It's also likely to be different between your servers, in a pure > replica, one tends to max and one tends to be a little higher, in a > distributed-replica, I'd expect more than one to run harder while healing. > > Keep the differences between doing an ls on a brick and doing an ls on a > gluster mount in mind. When you do a ls on a gluster volume, it isn't just > doing a ls on one brick, it's effectively doing it on ALL of your bricks, > and they all have to return data before the ls succeeds.
In a distributed > volume, it's figuring out where on each volume things live and getting the > stat() from each to assemble the whole thing. And if things are in need of > healing, it will take even longer to decide which version is current and > use it (shd triggers a heal anytime it encounters this). Any of these > things being slow slows down the overall response. > > At this point, I'd get some sleep too, and let your cluster heal while you > do. I'd really want it fully healed before I did any updates anyway, so let > it use CPU and get itself sorted out. Expect it to do a round of healing > after you upgrade each machine too, this is normal so don't let the CPU > spike surprise you, It's just catching up from the downtime incurred by the > update and/or reboot if you did one. > > That reminds me, check your gluster cluster.op-version and > cluster.max-op-version (gluster vol get all all | grep op-version). If > op-version isn't at the max-op-version, set it to it so you're taking > advantage of the latest features available to your version. > > -Darrell > > On Apr 20, 2019, at 11:54 AM, Patrick Rennie > wrote: > > Hi Darrell, > > Thanks again for your advice, I've applied the acltype=posixacl on my > zpools and I think that has reduced some of the noise from my brick logs. > I also bumped up some of the thread counts you suggested but my CPU load > skyrocketed, so I dropped it back down to something slightly lower, but > still higher than it was before, and will see how that goes for a while. > > Although low space is a definite issue, if I run an ls anywhere on my > bricks directly it's instant, <1 second, and still takes several minutes > via gluster, so there is still a problem in my gluster configuration > somewhere. We don't have any snapshots, but I am trying to work out if any > data on there is safe to delete, or if there is any way I can safely find > and delete data which has been removed directly from the bricks in the > past. I also have lz4 compression already enabled on each zpool which does > help a bit, we get between 1.05 and 1.08x compression on this data. > I've tried to go through each client and checked its cluster mount logs > and also my brick logs and looking for errors, so far nothing is jumping > out at me, but there are some warnings and errors here and there, I am > trying to work out what they mean. > > It's already 1 am here and unfortunately, I'm still awake working on this > issue, but I think that I will have to leave the version upgrades until > tomorrow. > > Thanks again for your advice so far. If anyone has any ideas on where I > can look for errors other than brick logs or the cluster mount logs to help > resolve this issue, it would be much appreciated. > > Cheers, > > - Patrick > > On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic > wrote: > > See inline: > > On Apr 20, 2019, at 10:09 AM, Patrick Rennie > wrote: > > Hi Darrell, > > Thanks for your reply, this issue seems to be getting worse over the last > few days, really has me tearing my hair out. I will do as you have > suggested and get started on upgrading from 3.12.14 to 3.12.15. > I've checked the zfs properties and all bricks have "xattr=sa" set, but > none of them has "acltype=posixacl" set, currently the acltype property > shows "off", if I make these changes will it apply retroactively to the > existing data? I'm unfamiliar with what this will change so I may need to > look into that before I proceed.
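(A minimal sketch of that change, run per brick dataset - the pool/dataset name below is a placeholder for whatever backs each brick; the properties apply to xattrs and ACLs written after the change, existing on-disk data is not rewritten:)

# zfs set xattr=sa pool/brick1
# zfs set acltype=posixacl pool/brick1
# zfs get xattr,acltype pool/brick1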
> > > It is safe to apply that now, any new set/get calls will then use it if > new posixacls exist, and use older if not. ZFS is good that way. It should > clear up your posix_acl and posix errors over time. > > I understand performance is going to slow down as the bricks get full, I > am currently trying to free space and migrate data to some newer storage, I > have fresh several hundred TB storage I just setup recently but with these > performance issues it's really slow. I also believe there is significant > data which has been deleted directly from the bricks in the past, so if I > can reclaim this space in a safe manner then I will have at least around > 10-15% free space. > > > Full ZFS volumes will have a much larger impact on performance than you?d > think, I?d prioritize this. If you have been taking zfs snapshots, consider > deleting them to get the overall volume free space back up. And just to be > sure it?s been said, delete from within the mounted volumes, don?t delete > directly from the bricks (gluster will just try and heal it later, > compounding your issues). Does not apply to deleting other data from the > ZFS volume if it?s not part of the brick directory, of course. > > These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so > generally they have plenty of resources available, currently only using > around 330/512GB of memory. > > I will look into what your suggested settings will change, and then will > probably go ahead with your recommendations, for our specs as stated above, > what would you suggest for performance.io-thread-count ? > > > I run single 2630v4s on my servers, which have a smaller storage footprint > than yours. I?d go with 32 for performance.io-thread-count. I?d try 4 for > the shd thread settings on that gear. Your memory use sounds fine, so no > worries there. > > Our workload is nothing too extreme, we have a few VMs which write backup > data to this storage nightly for our clients, our VMs don't live on this > cluster, but just write to it. > > > If they are writing compressible data, you?ll get immediate benefit by > setting compression=lz4 on your ZFS volumes. It won?t help any old data, of > course, but it will compress new data going forward. This is another one > that?s safe to enable on the fly. > > I've been going through all of the logs I can, below are some slightly > sanitized errors I've come across, but I'm not sure what to make of them. > The main error I am seeing is the first one below, across several of my > bricks, but possibly only for specific folders on the cluster, I'm not 100% > about that yet though. 
> > [2019-04-20 05:56:59.512649] E [MSGID: 113001] > [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on > /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not > supported] > [2019-04-20 05:59:06.084333] E [MSGID: 113001] > [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on > /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not > supported] > [2019-04-20 05:59:43.289030] E [MSGID: 113001] > [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on > /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not > supported] > [2019-04-20 05:59:50.582257] E [MSGID: 113001] > [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on > /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not > supported] > [2019-04-20 06:01:42.501701] E [MSGID: 113001] > [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on > /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not > supported] > [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] > 0-gvAA01-posix: Extended attributes not supported (try remounting brick > with 'user_xattr' flag) > > > [2019-04-20 13:12:36.131856] E [MSGID: 113002] > [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for > /xxxxxxxxxxxxxxxxxxxx [Invalid argument] > [2019-04-20 13:12:36.131959] E [MSGID: 113002] [posix.c:362:posix_lookup] > 0-gvAA01-posix: buf->ia_gfid is null for > /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] > [2019-04-20 13:12:36.132016] E [MSGID: 115050] > [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP > /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud > Backup_clone1.vbm_62906_tmp), client: > 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: > gvAA01-posix [No data available] > [2019-04-20 13:12:38.093719] E [MSGID: 115050] > [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP > /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud > Backup_clone1.vbm_62906_tmp), client: > 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: > gvAA01-posix [No data available] > [2019-04-20 13:12:38.093660] E [MSGID: 113002] > [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for > /xxxxxxxxxxxxxxxxxxxx [Invalid argument] > [2019-04-20 13:12:38.093696] E [MSGID: 113002] [posix.c:362:posix_lookup] > 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx [No > data available] > > > posixacls should clear those up, as mentioned. 
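(If it helps to confirm what a brick is actually storing, the extended attributes on a brick path can be inspected directly - the path below is only an example under one of the bricks listed later in this thread:)

# getfattr -d -m . -e hex /brick7/gvAA01/brick/<some-path>

A file that shows no trusted.gfid here - typically one copied onto the brick directly rather than through a client mount - would line up with the "gfid is null" lookups above.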
> > > [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] > 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, > by 980fdbbd367f0000 on 0x7fc4f0161440 > [2019-04-20 14:25:59.654668] E [MSGID: 115053] > [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: > INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), > client: > cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, > error-xlator: gvAA01-locks [Invalid argument] > > > [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] > 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS > 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) > [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] > (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) > [0x7ff4ae6f796a] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) > [0x7ff4ae2a96e8] > -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) > [0x7ff4ae28528d] ) 0-: Reply submission failed > > > Fix the posix acls and see if these clear up over time as well, I?m > unclear on what the overall effect of running without the posix acls will > be to total gluster health. Your biggest problem sounds like you need to > free up space on the volumes and get the overall volume health back up to > par and see if that doesn?t resolve the symptoms you?re seeing. > > > > Thank you again for your assistance. It is greatly appreciated. > > - Patrick > > > > On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic > wrote: > > Patrick, > > I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You > also mention ZFS, and that error you show makes me think you need to check > to be sure you have ?xattr=sa? and ?acltype=posixacl? set on your ZFS > volumes. > > You also observed your bricks are crossing the 95% full line, ZFS > performance will degrade significantly the closer you get to full. In my > experience, this starts somewhere between 10% and 5% free space remaining, > so you?re in that realm. > > How?s your free memory on the servers doing? Do you have your zfs arc > cache limited to something less than all the RAM? It shares pretty well, > but I?ve encountered situations where other things won?t try and take ram > back properly if they think it?s in use, so ZFS never gets the opportunity > to give it up. > > Since your volume is a disperse-replica, you might try tuning > disperse.shd-max-threads, default is 1, I?d try it at 2, 4, or even more if > the CPUs are beefy enough. And setting server.event-threads to 4 and > client.event-threads to 8 has proven helpful in many cases. After you get > upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I > don?t know if it matters, but I?d also recommend resetting > performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or > also setting performance.io-thread-count to 32 if those have beefy CPUs. > > Beyond those general ideas, more info about your hardware (CPU and RAM) > and workload (VMs, direct storage for web servers or enders, etc) may net > you some more ideas. Then you?re going to have to do more digging into > brick logs looking for errors and/or warnings to see what?s going on. 
> > -Darrell > > > On Apr 20, 2019, at 8:22 AM, Patrick Rennie > wrote: > > Hello Gluster Users, > > I am hoping someone can help me with resolving an ongoing issue I've been > having, I'm new to mailing lists so forgive me if I have gotten anything > wrong. We have noticed our performance deteriorating over the last few > weeks, easily measured by trying to do an ls on one of our top-level > folders, and timing it, which usually would take 2-5 seconds, and now takes > up to 20 minutes, which obviously renders our cluster basically unusable. > This has been intermittent in the past but is now almost constant and I am > not sure how to work out the exact cause. We have noticed some errors in > the brick logs, and have noticed that if we kill the right brick process, > performance instantly returns back to normal, this is not always the same > brick, but it indicates to me something in the brick processes or > background tasks may be causing extreme latency. Due to this ability to fix > it by killing the right brick process off, I think it's a specific file, or > folder, or operation which may be hanging and causing the increased > latency, but I am not sure how to work it out. One last thing to add is > that our bricks are getting quite full (~95% full), we are trying to > migrate data off to new storage but that is going slowly, not helped by > this issue. I am currently trying to run a full heal as there appear to be > many files needing healing, and I have all brick processes running so they > have an opportunity to heal, but this means performance is very poor. It > currently takes over 15-20 minutes to do an ls of one of our top-level > folders, which just contains 60-80 other folders, this should take 2-5 > seconds. This is all being checked by FUSE mount locally on the storage > node itself, but it is the same for other clients and VMs accessing the > cluster. Initially, it seemed our NFS mounts were not affected and operated > at normal speed, but testing over the last day has shown that our NFS > clients are also extremely slow, so it doesn't seem specific to FUSE as I > first thought it might be. > > I am not sure how to proceed from here, I am fairly new to gluster having > inherited this setup from my predecessor and trying to keep it going. I > have included some info below to try and help with diagnosis, please let me > know if any further info would be helpful. I would really appreciate any > advice on what I could try to work out the cause. Thank you in advance for > reading this, and any suggestions you might be able to offer. > > - Patrick > > This is an example of the main error I see in our brick logs, there have > been others, I can post them when I see them again too: > [2019-04-20 04:54:43.055680] E [MSGID: 113001] > [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on > /brick1/ library: system.posix_acl_default [Operation not > supported] > [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] > 0-gvAA01-posix: Extended attributes not supported (try remounting brick > with 'user_xattr' flag) > > Our setup consists of 2 storage nodes and an arbiter node. I have noticed > our nodes are on slightly different versions, I'm not sure if this could be > an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - > total capacity is around 560TB. > We have bonded 10gbps NICS on each node, and I have tested bandwidth with > iperf and found that it's what would be expected from this config. 
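(That check is normally just the stock iperf pair, roughly as below - hostnames are placeholders:)

on one node:    # iperf -s
on the other:   # iperf -c <peer-hostname> -P 4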
> Individual brick performance seems ok, I've tested several bricks using dd > and can write a 10GB files at 1.7GB/s. > > # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 > 10000+0 records in > 10000+0 records out > 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s > > Node 1: > # glusterfs --version > glusterfs 3.12.15 > > Node 2: > # glusterfs --version > glusterfs 3.12.14 > > Arbiter: > # glusterfs --version > glusterfs 3.12.14 > > Here is our gluster volume status: > > # gluster volume status > Status of volume: gvAA01 > Gluster process TCP Port RDMA Port Online > Pid > > ------------------------------------------------------------------------------ > Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 > Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck1 49152 0 Y > 6931 > Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 > Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck2 49153 0 Y > 6939 > Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 > Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck3 49154 0 Y > 6947 > Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 > Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck4 49155 0 Y > 6956 > Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 > Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck5 49156 0 Y > 6964 > Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 > Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck6 49157 0 Y > 6974 > Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 > Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck7 49158 0 Y > 6984 > Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 > Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck8 49159 0 Y > 6993 > Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 > Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck9 49160 0 Y > 7001 > NFS Server on localhost 2049 0 Y > 17276 > Self-heal Daemon on localhost N/A N/A Y > 25245 > NFS Server on 02-B 2049 0 Y 9089 > Self-heal Daemon on 02-B N/A N/A Y 17838 > NFS Server on 00-a 2049 0 Y 15660 > Self-heal Daemon on 00-a N/A N/A Y 16218 > > Task Status of Volume gvAA01 > > ------------------------------------------------------------------------------ > There are no active volume tasks > > And gluster volume info: > > # gluster volume info > > Volume Name: gvAA01 > Type: Distributed-Replicate > Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 > Status: Started > Snapshot Count: 0 > Number of Bricks: 9 x (2 + 1) = 27 > Transport-type: tcp > Bricks: > Brick1: 01-B:/brick1/gvAA01/brick > Brick2: 02-B:/brick1/gvAA01/brick > Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) > Brick4: 01-B:/brick2/gvAA01/brick > Brick5: 02-B:/brick2/gvAA01/brick > Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) > Brick7: 01-B:/brick3/gvAA01/brick > Brick8: 02-B:/brick3/gvAA01/brick > Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) > Brick10: 01-B:/brick4/gvAA01/brick > Brick11: 02-B:/brick4/gvAA01/brick > Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) > Brick13: 01-B:/brick5/gvAA01/brick > Brick14: 02-B:/brick5/gvAA01/brick > Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) > Brick16: 01-B:/brick6/gvAA01/brick > Brick17: 02-B:/brick6/gvAA01/brick > Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) > Brick19: 
01-B:/brick7/gvAA01/brick > Brick20: 02-B:/brick7/gvAA01/brick > Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) > Brick22: 01-B:/brick8/gvAA01/brick > Brick23: 02-B:/brick8/gvAA01/brick > Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) > Brick25: 01-B:/brick9/gvAA01/brick > Brick26: 02-B:/brick9/gvAA01/brick > Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) > Options Reconfigured: > cluster.shd-max-threads: 4 > performance.least-prio-threads: 16 > cluster.readdir-optimize: on > performance.quick-read: off > performance.stat-prefetch: off > cluster.data-self-heal: on > cluster.lookup-unhashed: auto > cluster.lookup-optimize: on > cluster.favorite-child-policy: mtime > server.allow-insecure: on > transport.address-family: inet > client.bind-insecure: on > cluster.entry-self-heal: off > cluster.metadata-self-heal: off > performance.md-cache-timeout: 600 > cluster.self-heal-daemon: enable > performance.readdir-ahead: on > diagnostics.brick-log-level: INFO > nfs.disable: off > > Thank you for any assistance. > > - Patrick > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- Brick: HOSTNAME-02-B:/brick1/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 32b+ 64b+ 128b+ No. of Reads: 1 0 0 No. of Writes: 138 7 45 Block Size: 256b+ 512b+ 1024b+ No. of Reads: 0 0 0 No. of Writes: 1 588 8321 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 0 2 5 No. of Writes: 9294 88957 21544 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 13 22 41 No. of Writes: 253312 24305 207953 Block Size: 131072b+ No. of Reads: 415121 No. of Writes: 632261 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 1656 FORGET 0.00 0.00 us 0.00 us 0.00 us 3098 RELEASE 0.00 0.00 us 0.00 us 0.00 us 30293 RELEASEDIR 0.00 38.00 us 38.00 us 38.00 us 1 FLUSH 0.00 881.00 us 881.00 us 881.00 us 1 UNLINK 0.00 252705.00 us 252705.00 us 252705.00 us 1 LK 0.00 45581.06 us 275.00 us 760038.00 us 17 SETXATTR 0.00 178264.40 us 1180.00 us 567795.00 us 5 MKNOD 0.01 52932.15 us 269.00 us 710452.00 us 34 SETATTR 0.01 56835.03 us 22.00 us 875479.00 us 34 GETXATTR 0.01 33009.92 us 55.00 us 690160.00 us 59 READ 0.01 130547.06 us 199.00 us 746238.00 us 17 REMOVEXATTR 0.01 101512.69 us 45.00 us 898760.00 us 35 READDIR 0.02 176200.21 us 79.00 us 767224.00 us 24 READDIRP 0.03 149224.76 us 1210.00 us 847273.00 us 50 MKDIR 0.04 477780.00 us 33.00 us 894975.00 us 20 FSTAT 0.04 294297.09 us 766.00 us 1055615.00 us 33 XATTROP 0.10 121732.71 us 19.00 us 1134288.00 us 211 ENTRYLK 0.10 61611.54 us 470.00 us 1132038.00 us 439 FSYNC 0.11 308381.30 us 56.00 us 1117021.00 us 97 OPENDIR 0.26 352045.54 us 25.00 us 1117112.00 us 200 STATFS 0.54 59824.41 us 66.00 us 1050503.00 us 2449 WRITE 0.62 42784.77 us 135.00 us 982688.00 us 3920 FXATTROP 1.09 48299.64 us 12.00 us 1475231.00 us 6113 FINODELK 1.99 74036.50 us 337.00 us 1504736.00 us 7270 RCHECKSUM 5.02 91592.07 us 14.00 us 1727644.00 us 14800 INODELK 18.22 391339.18 us 13.00 us 1118801.00 us 12565 STAT 71.77 447238.86 us 65.00 us 1520575.00 us 43295 LOOKUP Duration: 78214 seconds Data Read: 54416941626 bytes Data Written: 113658424203 bytes Interval 1 Stats: Block Size: 4096b+ 16384b+ 32768b+ No. of Reads: 0 0 0 No. 
of Writes: 10 1 37 Block Size: 65536b+ 131072b+ No. of Reads: 0 99 No. of Writes: 364 2494 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 115 RELEASEDIR 0.00 159976.00 us 1747.00 us 318205.00 us 2 MKNOD 0.01 43052.35 us 72.00 us 711323.00 us 17 READDIR 0.01 95923.62 us 306.00 us 760038.00 us 8 SETXATTR 0.01 66875.50 us 283.00 us 703396.00 us 16 SETATTR 0.01 99516.19 us 22.00 us 875479.00 us 16 GETXATTR 0.01 186339.50 us 268.00 us 670244.00 us 10 READDIRP 0.01 38121.54 us 55.00 us 690160.00 us 49 READ 0.02 276285.38 us 199.00 us 746238.00 us 8 REMOVEXATTR 0.03 234487.19 us 766.00 us 992806.00 us 16 XATTROP 0.04 160636.43 us 1337.00 us 847273.00 us 35 MKDIR 0.05 445612.14 us 33.00 us 885814.00 us 14 FSTAT 0.10 318281.76 us 56.00 us 1117021.00 us 42 OPENDIR 0.11 77332.15 us 512.00 us 1132038.00 us 184 FSYNC 0.13 140282.40 us 19.00 us 1134288.00 us 121 ENTRYLK 0.26 371711.56 us 25.00 us 1117112.00 us 94 STATFS 0.50 65120.02 us 77.00 us 1048875.00 us 1037 WRITE 0.52 44151.04 us 145.00 us 982688.00 us 1588 FXATTROP 0.96 53136.48 us 15.00 us 1131465.00 us 2444 FINODELK 2.04 75382.00 us 337.00 us 1135632.00 us 3653 RCHECKSUM 4.89 89311.00 us 14.00 us 1727644.00 us 7403 INODELK 19.46 432093.99 us 13.00 us 1118801.00 us 6093 STAT 70.86 461973.08 us 65.00 us 1520575.00 us 20751 LOOKUP Duration: 246 seconds Data Read: 12976128 bytes Data Written: 374076928 bytes Brick: HOSTNAME-02-B:/brick7/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 32b+ 64b+ 512b+ No. of Reads: 0 0 0 No. of Writes: 3 1 2 Block Size: 1024b+ 2048b+ 4096b+ No. of Reads: 0 0 0 No. of Writes: 5 6 1174 Block Size: 8192b+ 16384b+ 32768b+ No. of Reads: 0 0 1 No. of Writes: 354 415 1133 Block Size: 65536b+ 131072b+ No. of Reads: 0 5403 No. of Writes: 7199 16939 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 59 FORGET 0.00 0.00 us 0.00 us 0.00 us 414 RELEASE 0.00 0.00 us 0.00 us 0.00 us 812 RELEASEDIR 0.00 51.00 us 51.00 us 51.00 us 1 LK 0.00 811.00 us 811.00 us 811.00 us 1 LINK 0.00 1962.67 us 116.00 us 5155.00 us 3 OPEN 0.00 19450.00 us 19450.00 us 19450.00 us 1 RENAME 0.00 425385.00 us 3485.00 us 847285.00 us 2 MKNOD 0.00 514930.00 us 94.00 us 1029766.00 us 2 FTRUNCATE 0.01 355550.20 us 457.00 us 713035.00 us 5 UNLINK 0.01 1192925.00 us 955005.00 us 1430845.00 us 2 XATTROP 0.01 493677.80 us 1865.00 us 1133142.00 us 5 CREATE 0.02 429648.22 us 26.00 us 1005252.00 us 9 FLUSH 0.02 375096.13 us 31.00 us 743624.00 us 15 FSTAT 0.03 553294.91 us 124.00 us 2047492.00 us 11 SETATTR 0.04 241321.34 us 210.00 us 1130863.00 us 35 READDIRP 0.11 308798.10 us 37.00 us 1090701.00 us 80 OPENDIR 0.14 437895.92 us 20.00 us 2390710.00 us 76 ENTRYLK 0.24 274161.42 us 20.00 us 1131003.00 us 206 STATFS 0.26 380771.70 us 111.00 us 2217146.00 us 156 FSYNC 0.62 608724.21 us 98.00 us 2805007.00 us 234 RCHECKSUM 0.68 436962.29 us 65.00 us 2218008.00 us 359 READ 0.88 292235.22 us 16.00 us 1860745.00 us 696 FINODELK 1.16 517693.60 us 20.00 us 3188822.00 us 516 INODELK 3.10 1007821.69 us 185.00 us 8062558.00 us 710 FXATTROP 5.47 459404.39 us 60.00 us 2424925.00 us 2747 WRITE 19.30 343135.28 us 11.00 us 1508447.00 us 12965 STAT 67.89 449890.07 us 56.00 us 3189202.00 us 34791 LOOKUP Duration: 2760 seconds Data Read: 708235477 bytes Data Written: 3108339591 bytes Interval 1 Stats: Block Size: 32b+ 2048b+ 4096b+ No. 
of Reads: 0 0 0 No. of Writes: 1 2 87 Block Size: 8192b+ 16384b+ 32768b+ No. of Reads: 0 0 0 No. of Writes: 33 46 155 Block Size: 65536b+ 131072b+ No. of Reads: 0 384 No. of Writes: 898 1821 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 7 FORGET 0.00 0.00 us 0.00 us 0.00 us 5 RELEASE 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.00 811.00 us 811.00 us 811.00 us 1 LINK 0.00 3485.00 us 3485.00 us 3485.00 us 1 MKNOD 0.00 1962.67 us 116.00 us 5155.00 us 3 OPEN 0.01 364139.50 us 35.00 us 728244.00 us 2 FLUSH 0.01 955005.00 us 955005.00 us 955005.00 us 1 XATTROP 0.01 514930.00 us 94.00 us 1029766.00 us 2 FTRUNCATE 0.01 568496.50 us 3851.00 us 1133142.00 us 2 CREATE 0.02 443739.00 us 457.00 us 713035.00 us 4 UNLINK 0.03 340935.25 us 31.00 us 700342.00 us 8 FSTAT 0.03 701433.25 us 124.00 us 2047492.00 us 4 SETATTR 0.03 319603.10 us 1904.00 us 1130863.00 us 10 READDIRP 0.10 277164.95 us 40.00 us 1090701.00 us 39 OPENDIR 0.18 558491.82 us 25.00 us 2390710.00 us 34 ENTRYLK 0.25 275750.60 us 20.00 us 1131003.00 us 96 STATFS 0.26 362599.55 us 111.00 us 2217146.00 us 77 FSYNC 0.78 761765.55 us 98.00 us 2805007.00 us 110 RCHECKSUM 0.81 519364.02 us 65.00 us 2218008.00 us 168 READ 0.86 292915.32 us 16.00 us 1860745.00 us 313 FINODELK 1.36 618592.78 us 23.00 us 3188822.00 us 235 INODELK 3.03 1037876.55 us 220.00 us 8062558.00 us 313 FXATTROP 6.12 518399.40 us 67.00 us 2424925.00 us 1265 WRITE 21.23 369324.98 us 11.00 us 1496678.00 us 6163 STAT 64.89 406497.92 us 60.00 us 3189202.00 us 17113 LOOKUP Duration: 246 seconds Data Read: 50331648 bytes Data Written: 349692935 bytes Brick: HOSTNAME-02-B:/brick4/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 2b+ 4b+ 8b+ No. of Reads: 1 0 1 No. of Writes: 2 1 7 Block Size: 16b+ 32b+ 64b+ No. of Reads: 4 392 51 No. of Writes: 12 135 57 Block Size: 128b+ 256b+ 512b+ No. of Reads: 443 127 10 No. of Writes: 140 304 1444 Block Size: 1024b+ 2048b+ 4096b+ No. of Reads: 90 261 364 No. of Writes: 219068 121020 345512 Block Size: 8192b+ 16384b+ 32768b+ No. of Reads: 514 856 1478 No. of Writes: 118098 171851 108390 Block Size: 65536b+ 131072b+ 262144b+ No. of Reads: 2306 8288758 0 No. of Writes: 371871 702539 10 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 45329 FORGET 0.00 0.00 us 0.00 us 0.00 us 48418 RELEASE 0.00 0.00 us 0.00 us 0.00 us 311777 RELEASEDIR 0.00 38.00 us 38.00 us 38.00 us 1 LK 0.00 49.50 us 45.00 us 54.00 us 2 ENTRYLK 0.00 191.00 us 191.00 us 191.00 us 1 SETATTR 0.00 114.50 us 82.00 us 212.00 us 10 OPEN 0.00 53.83 us 28.00 us 258.00 us 29 FINODELK 0.00 32180.00 us 32180.00 us 32180.00 us 1 MKNOD 0.05 13870.29 us 44.00 us 565518.00 us 80 OPENDIR 0.11 84408.74 us 164.00 us 483823.00 us 27 READDIRP 0.15 15813.07 us 27.00 us 642324.00 us 200 STATFS 0.21 10819.69 us 30.00 us 2037193.00 us 404 FSTAT 0.34 49.82 us 10.00 us 159830.00 us 141050 INODELK 0.47 155572.08 us 67.00 us 469682.00 us 62 WRITE 1.93 376836.99 us 162.00 us 3309818.00 us 105 FXATTROP 9.35 2726.16 us 388.00 us 708391.00 us 70513 RCHECKSUM 17.39 6686.00 us 36.00 us 689525.00 us 53448 READ 18.05 29067.22 us 14.00 us 1708922.00 us 12766 STAT 51.94 33265.70 us 59.00 us 3308011.00 us 32089 LOOKUP Duration: 87061 seconds Data Read: 1086749393549 bytes Data Written: 147537171706 bytes Interval 1 Stats: Block Size: 8192b+ 32768b+ 65536b+ No. 
of Reads: 0 0 0 No. of Writes: 1 3 11 Block Size: 131072b+ No. of Reads: 85818 No. of Writes: 47 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.00 77.00 us 33.00 us 258.00 us 7 FINODELK 0.00 109.12 us 46.00 us 397.00 us 34 OPENDIR 0.06 120138.50 us 837.00 us 483823.00 us 6 READDIRP 0.10 12370.53 us 31.00 us 642324.00 us 94 STATFS 0.29 54.74 us 10.00 us 159830.00 us 62641 INODELK 0.32 12202.91 us 30.00 us 2037193.00 us 313 FSTAT 0.42 166359.60 us 67.00 us 469682.00 us 30 WRITE 1.76 424134.63 us 162.00 us 3309818.00 us 49 FXATTROP 9.44 3566.54 us 396.00 us 708391.00 us 31320 RCHECKSUM 21.56 41576.42 us 14.00 us 1708922.00 us 6138 STAT 22.70 6488.02 us 36.00 us 689525.00 us 41414 READ 43.34 33595.04 us 81.00 us 3308011.00 us 15268 LOOKUP Duration: 246 seconds Data Read: 11248336896 bytes Data Written: 7489024 bytes Brick: HOSTNAME-02-B:/brick8/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 32b+ 512b+ 1024b+ No. of Reads: 0 1 0 No. of Writes: 2 415 835 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 4 2 4 No. of Writes: 1788 22766 9151 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 12 28 129 No. of Writes: 12333 23668 124056 Block Size: 131072b+ 262144b+ No. of Reads: 1565084 0 No. of Writes: 331920 2 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 99 FORGET 0.00 0.00 us 0.00 us 0.00 us 1590 RELEASE 0.00 0.00 us 0.00 us 0.00 us 32065 RELEASEDIR 0.00 158.00 us 158.00 us 158.00 us 1 SETATTR 0.00 42.75 us 34.00 us 57.00 us 4 ENTRYLK 0.00 95.00 us 91.00 us 99.00 us 2 OPEN 0.00 36.90 us 31.00 us 49.00 us 10 INODELK 0.00 70.67 us 23.00 us 139.00 us 6 GETXATTR 0.03 3341.00 us 3341.00 us 3341.00 us 1 UNLINK 0.03 4044.00 us 4044.00 us 4044.00 us 1 MKNOD 0.09 135.14 us 2.00 us 1219.00 us 83 OPENDIR 0.30 187.68 us 24.00 us 7544.00 us 200 STATFS 0.67 4177.55 us 263.00 us 28261.00 us 20 READDIRP 1.02 21242.50 us 153.00 us 122468.00 us 6 READDIR 14.15 138.01 us 16.00 us 21061.00 us 12765 STAT 83.70 336.20 us 35.00 us 30375.00 us 31007 LOOKUP Duration: 87061 seconds Data Read: 205153204736 bytes Data Written: 59695163000 bytes Interval 1 Stats: %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.00 41.00 us 33.00 us 49.00 us 2 INODELK 0.00 158.00 us 158.00 us 158.00 us 1 SETATTR 0.00 42.75 us 34.00 us 57.00 us 4 ENTRYLK 0.05 3341.00 us 3341.00 us 3341.00 us 1 UNLINK 0.06 116.15 us 43.00 us 427.00 us 34 OPENDIR 0.06 4044.00 us 4044.00 us 4044.00 us 1 MKNOD 0.25 176.27 us 31.00 us 7544.00 us 94 STATFS 0.34 4370.40 us 3093.00 us 5741.00 us 5 READDIRP 14.33 152.05 us 17.00 us 21061.00 us 6138 STAT 84.90 366.58 us 73.00 us 30375.00 us 15084 LOOKUP Duration: 246 seconds Data Read: 0 bytes Data Written: 0 bytes Brick: HOSTNAME-02-B:/brick5/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 675 2015 5669 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 3 0 2 No. of Writes: 86933 18437 13563 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 13 11 2226145 No. of Writes: 28736 150681 348421 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 114 FORGET 0.00 0.00 us 0.00 us 0.00 us 1644 RELEASE 0.00 0.00 us 0.00 us 0.00 us 29197 RELEASEDIR 0.00 93.00 us 93.00 us 93.00 us 1 OPEN 0.00 121.00 us 111.00 us 131.00 us 2 SETATTR 0.00 62.43 us 22.00 us 134.00 us 7 GETXATTR 0.00 25.81 us 17.00 us 74.00 us 70 ENTRYLK 0.00 5628.50 us 2325.00 us 8932.00 us 2 MKNOD 0.01 124665.17 us 231.00 us 624079.00 us 6 READDIR 0.02 101333.57 us 165.00 us 520857.00 us 30 READDIRP 0.14 237618.40 us 1.00 us 763629.00 us 85 OPENDIR 0.24 170075.12 us 22.00 us 762654.00 us 200 STATFS 0.28 1494.00 us 387.00 us 658594.00 us 26497 RCHECKSUM 0.38 67376.52 us 67.00 us 592219.00 us 782 WRITE 0.39 36555.77 us 136.00 us 970840.00 us 1506 FXATTROP 0.41 76461.54 us 1824.00 us 784619.00 us 751 FSYNC 0.44 27318.09 us 14.00 us 779529.00 us 2256 FINODELK 1.39 3660.22 us 13.00 us 727062.00 us 53075 INODELK 18.68 205180.63 us 12.00 us 767356.00 us 12736 STAT 77.62 351003.58 us 30.00 us 971220.00 us 30942 LOOKUP Duration: 78214 seconds Data Read: 291787269120 bytes Data Written: 65192541184 bytes Interval 1 Stats: Block Size: 1024b+ 2048b+ 4096b+ No. of Reads: 0 0 0 No. of Writes: 2 1 5 Block Size: 8192b+ 16384b+ 32768b+ No. of Reads: 0 0 0 No. of Writes: 10 19 31 Block Size: 65536b+ 131072b+ No. of Reads: 0 0 No. of Writes: 206 686 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.02 264573.67 us 103764.00 us 499127.00 us 6 READDIRP 0.09 197846.12 us 43.00 us 651148.00 us 34 OPENDIR 0.23 173971.46 us 27.00 us 663534.00 us 94 STATFS 0.25 1340.23 us 403.00 us 609386.00 us 13504 RCHECKSUM 0.33 30710.89 us 136.00 us 970840.00 us 774 FXATTROP 0.33 59568.81 us 79.00 us 592219.00 us 400 WRITE 0.40 74721.05 us 1894.00 us 645840.00 us 386 FSYNC 0.42 25870.04 us 17.00 us 641253.00 us 1160 FINODELK 1.35 3568.56 us 13.00 us 677567.00 us 27047 INODELK 20.17 235917.45 us 12.00 us 668351.00 us 6109 STAT 76.39 365037.23 us 61.00 us 971220.00 us 14951 LOOKUP Duration: 246 seconds Data Read: 0 bytes Data Written: 116390912 bytes Brick: HOSTNAME-02-B:/brick6/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 64b+ 512b+ 1024b+ No. of Reads: 2 1 0 No. of Writes: 0 249 554 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 0 0 7 No. of Writes: 1147 5601 4339 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 9 14 52 No. of Writes: 8060 16044 85807 Block Size: 131072b+ 262144b+ No. of Reads: 57503 0 No. of Writes: 233443 2 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 177 FORGET 0.00 0.00 us 0.00 us 0.00 us 267 RELEASE 0.00 0.00 us 0.00 us 0.00 us 32347 RELEASEDIR 0.00 77.80 us 18.00 us 140.00 us 5 GETXATTR 0.01 110.21 us 28.00 us 884.00 us 14 ENTRYLK 0.01 2082.00 us 2082.00 us 2082.00 us 1 RENAME 0.10 185.46 us 2.00 us 1256.00 us 84 OPENDIR 0.20 148.97 us 30.00 us 2192.00 us 200 STATFS 0.64 16299.00 us 163.00 us 95266.00 us 6 READDIR 0.83 5265.21 us 315.00 us 28553.00 us 24 READDIRP 15.27 181.48 us 14.00 us 30885.00 us 12765 STAT 82.92 409.24 us 29.00 us 31576.00 us 30731 LOOKUP Duration: 87061 seconds Data Read: 7542971594 bytes Data Written: 41717849088 bytes Interval 1 Stats: %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.01 275.25 us 65.00 us 884.00 us 4 ENTRYLK 0.03 2082.00 us 2082.00 us 2082.00 us 1 RENAME 0.10 217.26 us 63.00 us 1044.00 us 34 OPENDIR 0.17 134.26 us 30.00 us 939.00 us 94 STATFS 0.33 4005.17 us 2273.00 us 6785.00 us 6 READDIRP 14.30 171.33 us 16.00 us 19199.00 us 6138 STAT 85.06 413.91 us 68.00 us 31576.00 us 15114 LOOKUP Duration: 246 seconds Data Read: 0 bytes Data Written: 0 bytes Brick: HOSTNAME-02-B:/brick2/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 8b+ 32b+ 64b+ No. of Reads: 0 0 0 No. of Writes: 1 54 46 Block Size: 128b+ 256b+ 512b+ No. of Reads: 0 0 0 No. of Writes: 180 277 721 Block Size: 1024b+ 2048b+ 4096b+ No. of Reads: 0 0 3 No. of Writes: 20590 17995 62409 Block Size: 8192b+ 16384b+ 32768b+ No. of Reads: 4 26 28 No. of Writes: 68162 107367 71851 Block Size: 65536b+ 131072b+ 262144b+ No. of Reads: 59 542702 0 No. of Writes: 218072 310390 8 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 573 FORGET 0.00 0.00 us 0.00 us 0.00 us 2644 RELEASE 0.00 0.00 us 0.00 us 0.00 us 28568 RELEASEDIR 0.00 33.00 us 33.00 us 33.00 us 1 FLUSH 0.00 67.50 us 56.00 us 79.00 us 2 LK 0.00 167.00 us 167.00 us 167.00 us 1 OPEN 0.01 144.96 us 42.00 us 1139.00 us 80 OPENDIR 0.01 111.08 us 24.00 us 2868.00 us 200 STATFS 0.03 157.12 us 73.00 us 9048.00 us 429 WRITE 0.86 135.98 us 15.00 us 13652.00 us 12765 STAT 1.18 81686.03 us 231.00 us 851745.00 us 29 READDIRP 3.98 45.61 us 14.00 us 11734.00 us 175875 INODELK 8.95 577.68 us 74.00 us 224862.00 us 31188 LOOKUP 84.98 1946.12 us 330.00 us 811977.00 us 87936 RCHECKSUM Duration: 78214 seconds Data Read: 71140696438 bytes Data Written: 71129235470 bytes Interval 1 Stats: Block Size: 131072b+ No. of Reads: 0 No. of Writes: 551 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 1 FORGET 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.01 150.06 us 42.00 us 1139.00 us 34 OPENDIR 0.01 90.35 us 31.00 us 735.00 us 94 STATFS 0.04 202.23 us 81.00 us 9048.00 us 144 WRITE 1.11 133.66 us 16.00 us 9674.00 us 6138 STAT 1.90 234569.83 us 1225.00 us 851745.00 us 6 READDIRP 5.01 46.95 us 14.00 us 11257.00 us 78964 INODELK 11.66 568.83 us 74.00 us 114583.00 us 15164 LOOKUP 80.26 1503.60 us 330.00 us 558433.00 us 39481 RCHECKSUM Duration: 246 seconds Data Read: 0 bytes Data Written: 72220672 bytes Brick: HOSTNAME-02-B:/brick9/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 64b+ 512b+ 1024b+ No. of Reads: 1 2 0 No. of Writes: 0 1488 3133 Block Size: 2048b+ 4096b+ 8192b+ No. of Reads: 2 12 6 No. of Writes: 6241 20337 17112 Block Size: 16384b+ 32768b+ 65536b+ No. of Reads: 41 40 109 No. of Writes: 36536 97167 400752 Block Size: 131072b+ 262144b+ No. of Reads: 1392662 0 No. of Writes: 734025 9 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 245 FORGET 0.00 0.00 us 0.00 us 0.00 us 1040 RELEASE 0.00 0.00 us 0.00 us 0.00 us 32052 RELEASEDIR 0.00 40.50 us 32.00 us 49.00 us 2 FLUSH 0.00 64.50 us 54.00 us 75.00 us 2 ENTRYLK 0.00 45.25 us 14.00 us 92.00 us 4 GETXATTR 0.00 66.33 us 43.00 us 94.00 us 3 LK 0.00 230.00 us 230.00 us 230.00 us 1 OPEN 0.06 14688.00 us 14688.00 us 14688.00 us 1 UNLINK 0.13 358.22 us 2.00 us 6730.00 us 83 OPENDIR 0.16 178.78 us 26.00 us 4082.00 us 200 STATFS 0.57 21934.83 us 164.00 us 122756.00 us 6 READDIR 2.74 24249.73 us 122.00 us 185638.00 us 26 READDIRP 16.00 287.83 us 14.00 us 53088.00 us 12765 STAT 80.33 596.96 us 60.00 us 55814.00 us 30911 LOOKUP Duration: 87061 seconds Data Read: 182552716865 bytes Data Written: 146735579648 bytes Interval 1 Stats: %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.00 64.50 us 54.00 us 75.00 us 2 ENTRYLK 0.14 14688.00 us 14688.00 us 14688.00 us 1 UNLINK 0.16 174.14 us 26.00 us 3573.00 us 94 STATFS 0.17 541.12 us 45.00 us 6730.00 us 34 OPENDIR 2.16 45567.60 us 951.00 us 139415.00 us 5 READDIRP 14.97 257.11 us 14.00 us 30619.00 us 6138 STAT 82.40 576.00 us 70.00 us 49717.00 us 15086 LOOKUP Duration: 246 seconds Data Read: 0 bytes Data Written: 0 bytes Brick: HOSTNAME-02-B:/brick3/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 4b+ 16b+ 32b+ No. of Reads: 0 0 0 No. of Writes: 1 1 13 Block Size: 64b+ 128b+ 256b+ No. of Reads: 0 0 0 No. of Writes: 10 125 37 Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 0 0 1 No. of Writes: 324 326351 4542 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 4 2 9 No. of Writes: 56358 18819 27128 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 17 42 3816011 No. of Writes: 35096 76153 521853 Block Size: 262144b+ No. of Reads: 0 No. of Writes: 12 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 175 FORGET 0.00 0.00 us 0.00 us 0.00 us 2519 RELEASE 0.00 0.00 us 0.00 us 0.00 us 28490 RELEASEDIR 0.00 27.00 us 27.00 us 27.00 us 1 FLUSH 0.00 512.00 us 512.00 us 512.00 us 1 UNLINK 0.00 414686.00 us 414686.00 us 414686.00 us 1 LK 0.00 253232.00 us 39.00 us 506425.00 us 2 ENTRYLK 0.05 497784.80 us 124.00 us 1570288.00 us 30 READDIRP 0.11 7015.25 us 336.00 us 1565069.00 us 4505 RCHECKSUM 0.13 494818.47 us 52.00 us 1557805.00 us 80 OPENDIR 0.15 21618.97 us 19.00 us 1410202.00 us 2051 FSTAT 0.23 68568.80 us 1781.00 us 1573119.00 us 984 FSYNC 0.27 415087.87 us 24.00 us 1423645.00 us 196 STATFS 0.33 10851.24 us 16.00 us 1570142.00 us 9075 INODELK 0.60 85769.69 us 152.00 us 2328777.00 us 2065 FXATTROP 0.63 104161.26 us 53.00 us 1573331.00 us 1808 WRITE 0.71 24037.55 us 37.00 us 1570577.00 us 8788 READ 1.34 123004.33 us 13.00 us 1572980.00 us 3244 FINODELK 20.01 469342.63 us 11.00 us 1570713.00 us 12689 STAT 75.44 675578.60 us 98.00 us 1963137.00 us 33238 LOOKUP Duration: 78214 seconds Data Read: 500177719808 bytes Data Written: 79883172260 bytes Interval 1 Stats: Block Size: 128b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 1 566 14 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 0 0 0 No. of Writes: 623 191 123 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 0 0 10368 No. of Writes: 81 126 445 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.00 512.00 us 512.00 us 512.00 us 1 UNLINK 0.00 253232.00 us 39.00 us 506425.00 us 2 ENTRYLK 0.02 567538.67 us 61455.00 us 1096240.00 us 6 READDIRP 0.10 425877.85 us 52.00 us 1412618.00 us 34 OPENDIR 0.10 6947.28 us 340.00 us 1416193.00 us 2133 RCHECKSUM 0.17 24302.80 us 20.00 us 1393421.00 us 1016 FSTAT 0.20 79981.08 us 1784.00 us 1444418.00 us 364 FSYNC 0.27 432331.46 us 24.00 us 1393263.00 us 90 STATFS 0.34 11557.54 us 16.00 us 1441854.00 us 4303 INODELK 0.51 94985.64 us 162.00 us 2328777.00 us 775 FXATTROP 0.63 128079.62 us 58.00 us 1442703.00 us 712 WRITE 0.73 23597.75 us 37.00 us 1416135.00 us 4456 READ 0.86 109550.04 us 13.00 us 1443792.00 us 1140 FINODELK 19.86 470022.73 us 14.00 us 1412768.00 us 6122 STAT 76.20 702701.46 us 98.00 us 1963137.00 us 15711 LOOKUP Duration: 246 seconds Data Read: 1358954496 bytes Data Written: 85018520 bytes Brick: HOSTNAME-01-B:/brick8/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 1b+ 2b+ 4b+ No. of Reads: 0 148 9 No. of Writes: 2198 856 1414 Block Size: 8b+ 16b+ 32b+ No. of Reads: 17 52 2454 No. of Writes: 2883 23076 27561 Block Size: 64b+ 128b+ 256b+ No. of Reads: 2418 1055 843 No. of Writes: 148757 245049 274675 Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 8942 12954 30505 No. of Writes: 1985063 409642431 64134867 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 112381 168949 285944 No. of Writes: 92567046 63502163 116489833 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 743469 1210030 816372704 No. of Writes: 17318963 70068548 142121633 Block Size: 262144b+ 524288b+ 1048576b+ No. of Reads: 0 0 0 No. of Writes: 1088 11 111 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 1179035 FORGET 0.00 0.00 us 0.00 us 0.00 us 2293524 RELEASE 0.00 0.00 us 0.00 us 0.00 us 26725606 RELEASEDIR 0.00 292.00 us 292.00 us 292.00 us 1 LINK 0.00 36.47 us 24.00 us 45.00 us 15 FLUSH 0.00 428.50 us 358.00 us 499.00 us 2 RENAME 0.00 166.80 us 121.00 us 220.00 us 10 SETATTR 0.00 70.97 us 35.00 us 326.00 us 30 LK 0.00 58.15 us 37.00 us 105.00 us 55 FINODELK 0.00 1624.25 us 770.00 us 3488.00 us 8 MKNOD 0.00 61.19 us 27.00 us 334.00 us 214 ENTRYLK 0.01 242.92 us 154.00 us 560.00 us 78 SETXATTR 0.03 85.31 us 28.00 us 2293.00 us 894 STATFS 0.05 26925.20 us 294.00 us 90304.00 us 5 UNLINK 0.06 3003.38 us 297.00 us 57010.00 us 55 FXATTROP 0.07 2474.06 us 475.00 us 36988.00 us 78 MKDIR 0.10 68.06 us 27.00 us 2641.00 us 4293 FSTAT 0.11 82114.75 us 351.00 us 310420.00 us 4 READDIR 0.21 97.59 us 3.00 us 3330.00 us 6260 OPENDIR 0.39 134.22 us 47.00 us 26252.00 us 8479 OPEN 0.44 128.85 us 12.00 us 25805.00 us 9866 STAT 0.80 151.41 us 75.00 us 69114.00 us 15222 WRITE 0.83 69.77 us 18.00 us 599472.00 us 34079 INODELK 3.07 476.92 us 48.00 us 59897.00 us 18573 GETXATTR 8.90 505.04 us 47.00 us 120915.00 us 50768 READ 10.30 284.59 us 20.00 us 120764.00 us 104241 LOOKUP 74.63 19961.45 us 52.00 us 5262484.00 us 10772 READDIRP Duration: 4279793 seconds Data Read: 107167648980220 bytes Data Written: 31737926706873 bytes Interval 5 Stats: Block Size: 131072b+ No. of Reads: 2903 No. of Writes: 0 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.00 85.00 us 62.00 us 108.00 us 2 ENTRYLK 0.00 185.00 us 185.00 us 185.00 us 1 SETATTR 0.03 1168.50 us 77.00 us 2260.00 us 2 INODELK 0.07 93.63 us 40.00 us 165.00 us 62 FSTAT 0.07 177.49 us 54.00 us 955.00 us 35 OPENDIR 0.14 135.89 us 28.00 us 2293.00 us 91 STATFS 0.49 43269.00 us 43269.00 us 43269.00 us 1 UNLINK 2.25 1365.04 us 164.00 us 22604.00 us 145 READDIRP 5.95 153.65 us 18.00 us 9302.00 us 3397 STAT 36.13 2655.90 us 61.00 us 101625.00 us 1194 READ 54.87 321.63 us 74.00 us 33476.00 us 14974 LOOKUP Duration: 246 seconds Data Read: 380502016 bytes Data Written: 0 bytes Brick: HOSTNAME-01-B:/brick6/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 1b+ 2b+ 4b+ No. of Reads: 0 1 0 No. of Writes: 2161 831 1435 Block Size: 8b+ 16b+ 32b+ No. of Reads: 0 0 2903 No. of Writes: 4166 5588026 44908 Block Size: 64b+ 128b+ 256b+ No. of Reads: 3536 5 80 No. of Writes: 87870 241581 376594 Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 8020 7205 19984 No. of Writes: 7805307 539972646 80512909 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 62273 100742 210754 No. of Writes: 102909970 76529387 166357551 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 551103 735158 761717336 No. of Writes: 15712591 54848367 117383734 Block Size: 262144b+ 524288b+ 1048576b+ No. of Reads: 0 5 360612 No. of Writes: 3717 5306 1855286 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 835383 FORGET 0.00 0.00 us 0.00 us 0.00 us 817859 RELEASE 0.00 0.00 us 0.00 us 0.00 us 28832755 RELEASEDIR 0.00 42.75 us 30.00 us 67.00 us 4 FLUSH 0.00 65.38 us 43.00 us 120.00 us 8 LK 0.00 172.33 us 123.00 us 238.00 us 9 SETATTR 0.00 548.14 us 264.00 us 1315.00 us 7 UNLINK 0.00 1554.89 us 596.00 us 4101.00 us 9 MKNOD 0.00 59.64 us 24.00 us 520.00 us 238 ENTRYLK 0.00 245.50 us 150.00 us 493.00 us 78 SETXATTR 0.00 20433.00 us 20433.00 us 20433.00 us 1 RENAME 0.00 2955.50 us 81.00 us 22895.00 us 8 FSTAT 0.01 7511.17 us 216.00 us 42274.00 us 6 READDIR 0.01 85.94 us 26.00 us 2189.00 us 894 STATFS 0.01 4187.81 us 2124.00 us 20241.00 us 21 FSYNC 0.04 1532.93 us 223.00 us 66019.00 us 202 FXATTROP 0.05 4863.85 us 568.00 us 145709.00 us 78 MKDIR 0.11 131.42 us 61.00 us 3438.00 us 6721 OPEN 0.11 91.46 us 2.00 us 29957.00 us 10315 OPENDIR 0.17 142.09 us 15.00 us 12445.00 us 9865 STAT 0.35 107.33 us 18.00 us 774166.00 us 27066 INODELK 1.21 480.50 us 23.00 us 121342.00 us 21084 GETXATTR 2.71 145604.99 us 38.00 us 1277202.00 us 155 FINODELK 3.14 247.45 us 14.00 us 64731.00 us 105762 LOOKUP 3.36 282.18 us 68.00 us 131903.00 us 99206 READ 15.81 7013.42 us 49.00 us 4783006.00 us 18800 READDIRP 72.91 61747.58 us 84.00 us 612823.00 us 9846 WRITE Duration: 4279793 seconds Data Read: 100321376892903 bytes Data Written: 29652026734449 bytes Interval 5 Stats: %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.01 157.00 us 50.00 us 436.00 us 4 ENTRYLK 0.12 177.06 us 58.00 us 1407.00 us 35 OPENDIR 0.24 138.63 us 39.00 us 1793.00 us 91 STATFS 0.38 20433.00 us 20433.00 us 20433.00 us 1 RENAME 2.46 1564.36 us 149.00 us 18439.00 us 84 READDIRP 8.49 133.50 us 19.00 us 7417.00 us 3397 STAT 88.30 313.58 us 81.00 us 20208.00 us 15040 LOOKUP Duration: 246 seconds Data Read: 0 bytes Data Written: 0 bytes Brick: HOSTNAME-01-B:/brick4/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 2b+ 4b+ 8b+ No. of Reads: 0 0 0 No. of Writes: 3 1 7 Block Size: 16b+ 32b+ 64b+ No. of Reads: 0 3 0 No. of Writes: 14 176 55 Block Size: 128b+ 256b+ 512b+ No. of Reads: 0 0 6 No. of Writes: 551 409 1327 Block Size: 1024b+ 2048b+ 4096b+ No. of Reads: 2 25 20 No. of Writes: 163855 98807 281768 Block Size: 8192b+ 16384b+ 32768b+ No. of Reads: 52 81 142 No. of Writes: 105053 164786 104540 Block Size: 65536b+ 131072b+ 262144b+ No. of Reads: 311 2267877 0 No. of Writes: 353531 7948711 10 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 61063 FORGET 0.00 0.00 us 0.00 us 0.00 us 14884 RELEASE 0.00 0.00 us 0.00 us 0.00 us 95159 RELEASEDIR 0.00 70.00 us 70.00 us 70.00 us 1 LK 0.00 59.50 us 56.00 us 63.00 us 2 ENTRYLK 0.00 174.00 us 174.00 us 174.00 us 1 SETATTR 0.00 142.19 us 70.00 us 229.00 us 26 OPEN 0.00 264.95 us 47.00 us 3405.00 us 81 OPENDIR 0.01 5896.86 us 27.00 us 127468.00 us 22 FINODELK 0.02 199117.00 us 199117.00 us 199117.00 us 1 MKNOD 0.12 5566.34 us 20.00 us 545166.00 us 197 STATFS 0.25 312.26 us 26.00 us 700937.00 us 7444 FSTAT 0.86 63127.77 us 162.00 us 952344.00 us 125 READDIRP 0.91 77367.29 us 212.00 us 1039173.00 us 107 FXATTROP 1.42 209840.71 us 121.00 us 620314.00 us 62 WRITE 1.48 95.93 us 15.00 us 722187.00 us 141085 INODELK 4.15 2543.50 us 57.00 us 721863.00 us 14886 READ 13.69 15652.88 us 16.00 us 721267.00 us 7986 STAT 17.78 2302.89 us 363.00 us 834931.00 us 70497 RCHECKSUM 59.29 16876.75 us 68.00 us 1036627.00 us 32076 LOOKUP Duration: 77227 seconds Data Read: 297294236329 bytes Data Written: 1094314598941 bytes Interval 1 Stats: Block Size: 8192b+ 32768b+ 65536b+ No. of Reads: 0 0 0 No. of Writes: 1 3 11 Block Size: 131072b+ No. of Reads: 17261 No. of Writes: 47 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.00 193.43 us 54.00 us 1791.00 us 35 OPENDIR 0.02 554.58 us 20.00 us 31421.00 us 91 STATFS 0.06 12897.40 us 32.00 us 127468.00 us 10 FINODELK 1.02 312.28 us 26.00 us 700937.00 us 7443 FSTAT 1.47 44044.13 us 162.00 us 917890.00 us 76 READDIRP 2.55 92.91 us 16.00 us 722187.00 us 62715 INODELK 2.99 227428.40 us 121.00 us 534197.00 us 30 WRITE 3.12 139535.71 us 259.00 us 1039173.00 us 51 FXATTROP 8.24 5538.18 us 18.00 us 721267.00 us 3397 STAT 16.69 2558.28 us 57.00 us 721863.00 us 14886 READ 22.42 1632.14 us 363.00 us 718808.00 us 31356 RCHECKSUM 41.42 6177.50 us 76.00 us 1036627.00 us 15304 LOOKUP Duration: 246 seconds Data Read: 2262433792 bytes Data Written: 7489024 bytes Brick: HOSTNAME-01-B:/brick5/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 1b+ 2b+ 4b+ No. of Reads: 0 0 2 No. of Writes: 482 547 864 Block Size: 8b+ 16b+ 32b+ No. of Reads: 12 39 3645 No. 
of Writes: 1717 18223 16043 Block Size: 64b+ 128b+ 256b+ No. of Reads: 2441 542 390 No. of Writes: 607834 219970 204446 Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 6262 5164 13211 No. of Writes: 1064280 203635430 61786149 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 70712 123308 170598 No. of Writes: 80206120 104132098 110568574 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 493854 698297 533801800 No. of Writes: 18319266 76209603 132443069 Block Size: 262144b+ 524288b+ 1048576b+ No. of Reads: 0 0 2 No. of Writes: 837 1 11 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 583048 FORGET 0.00 0.00 us 0.00 us 0.00 us 1791579 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25457262 RELEASEDIR 0.00 322.17 us 205.00 us 808.00 us 6 SETATTR 0.00 2292.58 us 31.00 us 16199.00 us 12 FLUSH 0.00 28625.50 us 10170.00 us 47081.00 us 2 UNLINK 0.00 2983.42 us 51.00 us 37006.00 us 24 LK 0.00 15029.17 us 1096.00 us 44572.00 us 6 MKNOD 0.00 4302.60 us 150.00 us 66649.00 us 78 SETXATTR 0.01 13873.86 us 808.00 us 107723.00 us 78 MKDIR 0.01 755.09 us 121.00 us 211768.00 us 1698 FXATTROP 0.01 4013.43 us 23.00 us 71921.00 us 334 ENTRYLK 0.02 220483.93 us 257.00 us 516216.00 us 15 READDIR 0.03 4518.96 us 31.00 us 76937.00 us 894 STATFS 0.04 7217.03 us 1872.00 us 1382417.00 us 811 FSYNC 0.09 5690.68 us 2.00 us 2641373.00 us 2147 OPENDIR 0.10 1368.02 us 15.00 us 134024.00 us 9865 STAT 0.16 1421.97 us 48.00 us 3165390.00 us 15489 OPEN 0.17 2595.75 us 26.00 us 2281303.00 us 8854 GETXATTR 0.21 2879.52 us 22.00 us 132848.00 us 9957 FSTAT 0.21 11982.66 us 54.00 us 2340814.00 us 2438 READDIRP 1.42 2157.46 us 39.00 us 3980667.00 us 90417 LOOKUP 2.01 1851.28 us 13.00 us 4353557.00 us 149524 INODELK 2.09 6592.01 us 333.00 us 2340514.00 us 43669 RCHECKSUM 3.92 8208.32 us 37.00 us 284665.00 us 65801 READ 13.14 726280.54 us 26.00 us 4778309.00 us 2490 FINODELK 76.36 304426.35 us 47.00 us 3078703.00 us 34522 WRITE Duration: 4279793 seconds Data Read: 70063314642216 bytes Data Written: 30894365505026 bytes Interval 5 Stats: Block Size: 1024b+ 2048b+ 4096b+ No. of Reads: 0 0 0 No. of Writes: 2 1 5 Block Size: 8192b+ 16384b+ 32768b+ No. of Reads: 0 0 0 No. of Writes: 10 19 31 Block Size: 65536b+ 131072b+ No. of Reads: 0 19354 No. of Writes: 206 675 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.00 142.03 us 51.00 us 558.00 us 35 OPENDIR 0.00 207.80 us 37.00 us 4278.00 us 91 STATFS 0.00 79.99 us 31.00 us 141.00 us 290 FSTAT 0.01 1539.89 us 170.00 us 8853.00 us 54 READDIRP 0.02 260.63 us 121.00 us 1374.00 us 776 FXATTROP 0.11 3223.89 us 2100.00 us 5998.00 us 388 FSYNC 0.14 444.37 us 15.00 us 15289.00 us 3397 STAT 0.83 593.88 us 76.00 us 47077.00 us 15531 LOOKUP 2.89 83325.97 us 86.00 us 502479.00 us 387 WRITE 5.95 8661.33 us 47.00 us 267392.00 us 7668 READ 8.17 3374.06 us 18.00 us 3834382.00 us 27019 INODELK 10.53 8711.57 us 411.00 us 496077.00 us 13488 RCHECKSUM 71.35 688576.36 us 37.00 us 4310436.00 us 1156 FINODELK Duration: 246 seconds Data Read: 2536767488 bytes Data Written: 114949120 bytes Brick: HOSTNAME-01-B:/brick1/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 32b+ 128b+ 512b+ No. of Reads: 10 16 0 No. of Writes: 2 0 3 Block Size: 1024b+ 2048b+ 4096b+ No. of Reads: 2 2 4 No. of Writes: 4 1 230 Block Size: 8192b+ 16384b+ 32768b+ No. of Reads: 2 8 10 No. 
of Writes: 62 111 342 Block Size: 65536b+ 131072b+ No. of Reads: 12 39099 No. of Writes: 1999 14756 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 110 FORGET 0.00 0.00 us 0.00 us 0.00 us 1552 RELEASE 0.00 0.00 us 0.00 us 0.00 us 5486 RELEASEDIR 0.00 33.00 us 33.00 us 33.00 us 1 FLUSH 0.00 75.00 us 75.00 us 75.00 us 1 LK 0.00 177.50 us 93.00 us 313.00 us 10 FSTAT 0.00 23790.00 us 23790.00 us 23790.00 us 1 UNLINK 0.00 308.52 us 28.00 us 9424.00 us 197 STATFS 0.01 5786.87 us 124.00 us 196553.00 us 102 READDIRP 0.01 166.50 us 54.00 us 37190.00 us 3921 FXATTROP 0.02 1713.96 us 155.00 us 57875.00 us 440 FSYNC 0.03 13994.76 us 26.00 us 115223.00 us 86 GETXATTR 0.03 9081.56 us 33.00 us 119977.00 us 134 OPENDIR 0.04 58848.03 us 100.00 us 709195.00 us 33 READDIR 0.05 262.36 us 15.00 us 16663.00 us 8244 STAT 0.08 40910.06 us 598.00 us 174924.00 us 88 XATTROP 0.16 20063.18 us 25.00 us 165004.00 us 355 ENTRYLK 0.21 17832.65 us 77.00 us 171068.00 us 545 READ 0.31 7303.16 us 65.00 us 559590.00 us 1961 WRITE 2.79 2933.18 us 67.00 us 208762.00 us 43381 LOOKUP 5.41 33917.36 us 430.00 us 842225.00 us 7262 RCHECKSUM 13.39 41023.35 us 20.00 us 13484708.00 us 14865 INODELK 77.46 603032.78 us 17.00 us 14543086.00 us 5850 FINODELK Duration: 2952 seconds Data Read: 5126633408 bytes Data Written: 2195986825 bytes Interval 1 Stats: Block Size: 4096b+ 16384b+ 32768b+ No. of Reads: 0 0 0 No. of Writes: 10 1 37 Block Size: 65536b+ 131072b+ No. of Reads: 0 678 No. of Writes: 364 1888 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 17 FORGET 0.00 0.00 us 0.00 us 0.00 us 159 RELEASEDIR 0.00 175.83 us 93.00 us 313.00 us 6 FSTAT 0.00 372.52 us 34.00 us 9424.00 us 91 STATFS 0.01 155.47 us 54.00 us 15012.00 us 1586 FXATTROP 0.02 1662.29 us 603.00 us 19925.00 us 185 FSYNC 0.02 5773.04 us 124.00 us 196553.00 us 56 READDIRP 0.03 11697.47 us 56.00 us 119977.00 us 55 OPENDIR 0.04 19956.53 us 41.00 us 115223.00 us 36 GETXATTR 0.04 48883.81 us 100.00 us 119392.00 us 16 READDIR 0.06 344.76 us 15.00 us 16663.00 us 3514 STAT 0.10 39258.90 us 800.00 us 140015.00 us 52 XATTROP 0.17 19739.70 us 28.00 us 165004.00 us 169 ENTRYLK 0.29 19225.55 us 85.00 us 154901.00 us 299 READ 0.38 9418.15 us 74.00 us 180442.00 us 795 WRITE 3.27 3127.60 us 74.00 us 208762.00 us 20788 LOOKUP 6.44 35274.85 us 431.00 us 310417.00 us 3637 RCHECKSUM 13.94 37449.35 us 25.00 us 12546026.00 us 7412 INODELK 75.20 635507.62 us 19.00 us 14543086.00 us 2356 FINODELK Duration: 246 seconds Data Read: 88866816 bytes Data Written: 294647296 bytes Brick: HOSTNAME-01-B:/brick9/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 1b+ 2b+ 4b+ No. of Reads: 0 55 9 No. of Writes: 1491 1000 2102 Block Size: 8b+ 16b+ 32b+ No. of Reads: 64 50 4812 No. of Writes: 4696 7222905 44892 Block Size: 64b+ 128b+ 256b+ No. of Reads: 4319 4022 3248 No. of Writes: 293891 418033 542883 Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 12528 18194 54796 No. of Writes: 8494202 664144673 121431414 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 216483 292636 465186 No. of Writes: 206999000 164921303 247436952 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 1428505 2116990 1026270560 No. of Writes: 33487910 131100929 250121202 Block Size: 262144b+ 524288b+ 1048576b+ No. of Reads: 0 0 0 No. 
of Writes: 3112 1649 486445 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 1466412 FORGET 0.00 0.00 us 0.00 us 0.00 us 1649265 RELEASE 0.00 0.00 us 0.00 us 0.00 us 26523053 RELEASEDIR 0.00 41.00 us 18.00 us 134.00 us 13 FLUSH 0.00 61.28 us 37.00 us 155.00 us 25 LK 0.00 6961.00 us 6961.00 us 6961.00 us 1 RENAME 0.00 1194.50 us 237.00 us 4764.00 us 6 READDIR 0.00 28213.00 us 28213.00 us 28213.00 us 1 UNLINK 0.00 6334.43 us 126.00 us 25589.00 us 7 SETATTR 0.00 12989.29 us 800.00 us 37294.00 us 7 MKNOD 0.00 1259.36 us 128.00 us 32443.00 us 78 SETXATTR 0.01 1803.81 us 25.00 us 44751.00 us 204 ENTRYLK 0.01 7197.82 us 506.00 us 90051.00 us 78 MKDIR 0.04 3359.54 us 21.00 us 216237.00 us 894 STATFS 0.05 25889.78 us 600.00 us 236221.00 us 130 FSYNC 0.06 1592.46 us 2.00 us 381435.00 us 2539 OPENDIR 0.06 463.18 us 60.00 us 539032.00 us 8960 OPEN 0.12 841.19 us 14.00 us 217750.00 us 9865 STAT 0.25 1861.16 us 20.00 us 247927.00 us 9467 FSTAT 0.28 544.06 us 17.00 us 1376223.00 us 36022 INODELK 0.30 13330.61 us 159.00 us 399369.00 us 1583 FXATTROP 0.33 2010.68 us 29.00 us 537519.00 us 11519 GETXATTR 0.66 35582.06 us 18.00 us 983226.00 us 1290 FINODELK 1.93 1499.23 us 47.00 us 1543777.00 us 89791 LOOKUP 4.19 87790.99 us 52.00 us 11504765.00 us 3329 READDIRP 13.62 2616.76 us 35.00 us 534914.00 us 363070 READ 78.09 65133.76 us 49.00 us 1024146.00 us 83658 WRITE Duration: 4279793 seconds Data Read: 134801101778949 bytes Data Written: 58578267669735 bytes Interval 5 Stats: Block Size: 131072b+ No. of Reads: 1129 No. of Writes: 0 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.00 81.00 us 77.00 us 85.00 us 2 ENTRYLK 0.10 197.94 us 61.00 us 1277.00 us 35 OPENDIR 0.17 127.99 us 37.00 us 1322.00 us 91 STATFS 0.40 827.09 us 58.00 us 24347.00 us 33 FSTAT 0.41 28213.00 us 28213.00 us 28213.00 us 1 UNLINK 7.75 156.57 us 17.00 us 16695.00 us 3397 STAT 8.23 3870.01 us 138.00 us 284848.00 us 146 READDIRP 10.90 1623.22 us 50.00 us 23865.00 us 461 READ 72.04 327.04 us 87.00 us 60571.00 us 15125 LOOKUP Duration: 246 seconds Data Read: 147980288 bytes Data Written: 0 bytes Brick: HOSTNAME-01-B:/brick3/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 1b+ 2b+ 4b+ No. of Reads: 0 12 6 No. of Writes: 2567 781 1340 Block Size: 8b+ 16b+ 32b+ No. of Reads: 31 66 4036 No. of Writes: 2948 440590 38913 Block Size: 64b+ 128b+ 256b+ No. of Reads: 2734 1728 1167 No. of Writes: 75799 347793 470159 Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 9774 16256 29003 No. of Writes: 2265947 446846351 86298960 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 118753 170352 279731 No. of Writes: 147940171 119810004 134459511 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 1011892 1204426 639618284 No. of Writes: 29300584 134801413 251115548 Block Size: 262144b+ 524288b+ 1048576b+ No. of Reads: 0 3 368642 No. of Writes: 1808 14 312 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 556431 FORGET 0.00 0.00 us 0.00 us 0.00 us 646004 RELEASE 0.00 0.00 us 0.00 us 0.00 us 24454045 RELEASEDIR 0.00 308.00 us 308.00 us 308.00 us 1 RENAME 0.00 39.80 us 27.00 us 67.00 us 15 FLUSH 0.00 452.50 us 304.00 us 601.00 us 2 READDIR 0.00 62.24 us 43.00 us 88.00 us 29 LK 0.00 167.55 us 133.00 us 195.00 us 11 SETATTR 0.00 230.03 us 138.00 us 610.00 us 78 SETXATTR 0.00 252.74 us 27.00 us 40634.00 us 205 ENTRYLK 0.00 4938.91 us 750.00 us 39272.00 us 11 MKNOD 0.00 24506.75 us 379.00 us 59086.00 us 4 UNLINK 0.00 64.45 us 26.00 us 1244.00 us 1721 FSTAT 0.00 93.36 us 3.00 us 6609.00 us 5953 OPENDIR 0.00 725.79 us 29.00 us 556857.00 us 894 STATFS 0.00 151.83 us 73.00 us 40777.00 us 4636 OPEN 0.00 9566.51 us 531.00 us 85953.00 us 78 MKDIR 0.05 192.95 us 40.00 us 96090.00 us 46282 READ 0.05 2198.09 us 349.00 us 123915.00 us 4724 RCHECKSUM 0.12 2289.39 us 15.00 us 574911.00 us 9870 STAT 0.16 2874.39 us 29.00 us 725497.00 us 10731 GETXATTR 0.22 5318.90 us 112.00 us 613165.00 us 7887 FXATTROP 0.62 630.24 us 49.00 us 436551.00 us 188339 WRITE 1.04 19727.28 us 54.00 us 2411484.00 us 10085 READDIRP 1.71 3781.30 us 21.00 us 613196.00 us 86277 LOOKUP 1.86 120602.70 us 146.00 us 22966912.00 us 2935 FSYNC 4.43 29711.50 us 19.00 us 36688198.00 us 28412 INODELK 89.72 1630926.05 us 19.00 us 39459053.00 us 10488 FINODELK Duration: 4279793 seconds Data Read: 84395109496663 bytes Data Written: 55381384251629 bytes Interval 5 Stats: Block Size: 128b+ 1024b+ 2048b+ No. of Reads: 0 0 0 No. of Writes: 1 566 14 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 0 0 0 No. of Writes: 623 191 123 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 6 2 19 No. of Writes: 81 126 445 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.00 71.00 us 44.00 us 98.00 us 3 ENTRYLK 0.00 117.00 us 102.00 us 132.00 us 2 FSTAT 0.00 160.97 us 54.00 us 989.00 us 35 OPENDIR 0.00 182.71 us 37.00 us 2711.00 us 91 STATFS 0.00 37863.00 us 37863.00 us 37863.00 us 1 UNLINK 0.01 10687.38 us 93.00 us 96090.00 us 24 READ 0.04 906.65 us 112.00 us 140471.00 us 793 FXATTROP 0.04 10671.96 us 130.00 us 156901.00 us 74 READDIRP 0.05 248.06 us 17.00 us 26637.00 us 3400 STAT 0.06 2929.57 us 1830.00 us 42843.00 us 365 FSYNC 0.20 1782.90 us 377.00 us 114465.00 us 2135 RCHECKSUM 0.62 699.57 us 87.00 us 44929.00 us 16450 LOOKUP 0.91 23643.10 us 80.00 us 227392.00 us 716 WRITE 7.92 34236.55 us 19.00 us 21085778.00 us 4296 INODELK 90.15 1542280.19 us 21.00 us 23438747.00 us 1086 FINODELK Duration: 246 seconds Data Read: 2984448 bytes Data Written: 85018520 bytes Brick: HOSTNAME-01-B:/brick2/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 1b+ 2b+ 4b+ No. of Reads: 0 6 1 No. of Writes: 5690 1158 1507 Block Size: 8b+ 16b+ 32b+ No. of Reads: 3 8 4220 No. of Writes: 4257 653093 38884 Block Size: 64b+ 128b+ 256b+ No. of Reads: 2057 143 237 No. of Writes: 138246 284967 301583 Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 9964 14318 24878 No. of Writes: 2799071 412706426 75565460 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 111870 177302 261376 No. of Writes: 190929249 79365618 255744283 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 1009823 1146396 686781597 No. of Writes: 27530847 83970418 194993443 Block Size: 262144b+ 524288b+ 1048576b+ No. of Reads: 0 1 0 No. 
of Writes: 1208 19 69 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 723412 FORGET 0.00 0.00 us 0.00 us 0.00 us 731255 RELEASE 0.00 0.00 us 0.00 us 0.00 us 24362990 RELEASEDIR 0.00 374.71 us 20.00 us 4282.00 us 21 FLUSH 0.00 262.48 us 35.00 us 2950.00 us 40 LK 0.00 2522.33 us 138.00 us 13040.00 us 15 SETATTR 0.00 8113.44 us 298.00 us 28903.00 us 9 RENAME 0.00 400.89 us 71.00 us 85670.00 us 1193 OPEN 0.00 6982.42 us 138.00 us 31124.00 us 78 SETXATTR 0.00 40219.80 us 955.00 us 114403.00 us 15 MKNOD 0.00 807.49 us 25.00 us 37564.00 us 894 STATFS 0.01 586.83 us 27.00 us 52770.00 us 1780 FSTAT 0.01 94259.50 us 1264.00 us 259989.00 us 12 UNLINK 0.01 238.64 us 15.00 us 418394.00 us 5290 ENTRYLK 0.03 20173.97 us 302.00 us 1014766.00 us 195 XATTROP 0.05 90005.90 us 1022.00 us 480876.00 us 78 MKDIR 0.06 908.45 us 15.00 us 46094.00 us 10388 STAT 0.06 2539.71 us 38.00 us 981906.00 us 3719 OPENDIR 0.29 11452.34 us 20.00 us 5427128.00 us 3916 GETXATTR 0.95 1026081.59 us 632.00 us 18320824.00 us 141 FSYNC 1.19 554.78 us 14.00 us 9904525.00 us 327694 INODELK 1.67 3739.81 us 67.00 us 9923314.00 us 68290 LOOKUP 1.76 2587364.42 us 45.00 us 134512764.00 us 104 READDIR 4.39 208181.93 us 58.00 us 39750208.00 us 3224 READDIRP 4.59 4377.20 us 259.00 us 10053127.00 us 160191 RCHECKSUM 7.95 5493.00 us 38.00 us 1437578.00 us 221339 READ 21.05 110957.49 us 151.00 us 15390646.00 us 29002 FXATTROP 27.49 29140.19 us 47.00 us 5380701.00 us 144248 WRITE 28.45 103576.62 us 12.00 us 39311904.00 us 41993 FINODELK Duration: 4279793 seconds Data Read: 90182870915545 bytes Data Written: 43304871600094 bytes Interval 5 Stats: Block Size: 131072b+ No. of Reads: 585 No. of Writes: 0 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.00 390.00 us 390.00 us 390.00 us 1 FSTAT 0.01 363.34 us 54.00 us 4960.00 us 35 OPENDIR 0.01 175.43 us 30.00 us 3792.00 us 91 STATFS 0.22 3670.76 us 65.00 us 173128.00 us 148 READ 0.54 18823.38 us 146.00 us 428923.00 us 72 READDIRP 1.75 55.66 us 16.00 us 63524.00 us 78812 INODELK 2.19 1615.88 us 16.00 us 28474.00 us 3397 STAT 12.31 2043.55 us 96.00 us 127094.00 us 15129 LOOKUP 82.99 5290.98 us 359.00 us 998606.00 us 39404 RCHECKSUM Duration: 246 seconds Data Read: 76677120 bytes Data Written: 0 bytes Brick: HOSTNAME-01-B:/brick7/gvAA01/brick ----------------------------------------- Cumulative Stats: Block Size: 1b+ 2b+ 4b+ No. of Reads: 0 13 4 No. of Writes: 1196 734 1040 Block Size: 8b+ 16b+ 32b+ No. of Reads: 1 36 3299 No. of Writes: 2062 300726 37753 Block Size: 64b+ 128b+ 256b+ No. of Reads: 2023 496 525 No. of Writes: 78098 422006 308797 Block Size: 512b+ 1024b+ 2048b+ No. of Reads: 18849 14540 35638 No. of Writes: 1846989 429384177 84605233 Block Size: 4096b+ 8192b+ 16384b+ No. of Reads: 150787 230288 420816 No. of Writes: 137132537 56105000 83554076 Block Size: 32768b+ 65536b+ 131072b+ No. of Reads: 1193840 1821978 1328519682 No. of Writes: 23571537 108072617 188243753 Block Size: 262144b+ 524288b+ 1048576b+ No. of Reads: 0 0 0 No. of Writes: 1439 45 467 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 584854 FORGET 0.00 0.00 us 0.00 us 0.00 us 847666 RELEASE 0.00 0.00 us 0.00 us 0.00 us 25363464 RELEASEDIR 0.00 64.12 us 40.00 us 118.00 us 8 LK 0.00 55.38 us 28.00 us 195.00 us 13 FLUSH 0.00 584.50 us 296.00 us 873.00 us 2 XATTROP 0.00 191.79 us 117.00 us 302.00 us 14 SETATTR 0.00 861.00 us 700.00 us 921.00 us 6 MKNOD 0.00 15796.00 us 15796.00 us 15796.00 us 1 RENAME 0.00 242.94 us 137.00 us 827.00 us 78 SETXATTR 0.00 80.85 us 29.00 us 2109.00 us 280 ENTRYLK 0.00 40533.00 us 40533.00 us 40533.00 us 1 LINK 0.00 12168.00 us 3663.00 us 29808.00 us 5 CREATE 0.00 85.45 us 27.00 us 4758.00 us 894 STATFS 0.00 1005.23 us 84.00 us 6375.00 us 157 FSYNC 0.00 2083.33 us 570.00 us 29605.00 us 78 MKDIR 0.00 46997.20 us 1143.00 us 78119.00 us 5 UNLINK 0.00 57905.17 us 236.00 us 344520.00 us 6 READDIR 0.01 76.43 us 28.00 us 86185.00 us 8213 FSTAT 0.02 155.31 us 76.00 us 61948.00 us 9793 OPEN 0.02 7686.27 us 146.00 us 89509.00 us 231 RCHECKSUM 0.04 100.69 us 3.00 us 41331.00 us 31020 OPENDIR 0.09 3927.64 us 81.00 us 852406.00 us 1732 FXATTROP 0.13 238.47 us 21.00 us 36324.00 us 39834 GETXATTR 0.20 217.49 us 42.00 us 104309.00 us 68968 READ 0.55 1549.97 us 62.00 us 186554.00 us 25900 WRITE 1.14 8343.75 us 13.00 us 877090.00 us 10010 STAT 2.36 2870.24 us 55.00 us 133764.00 us 60231 READDIRP 3.39 1648.07 us 15.00 us 1386407.00 us 150815 LOOKUP 18.82 34568.94 us 20.00 us 255108468.00 us 39875 INODELK 73.20 2950233.38 us 22.00 us 238932201.00 us 1817 FINODELK Duration: 4279793 seconds Data Read: 174378591384147 bytes Data Written: 42067381218610 bytes Interval 5 Stats: Block Size: 32b+ 2048b+ 4096b+ No. of Reads: 1 0 0 No. of Writes: 0 2 90 Block Size: 8192b+ 16384b+ 32768b+ No. of Reads: 0 0 1 No. of Writes: 34 59 174 Block Size: 65536b+ 131072b+ No. of Reads: 0 102 No. of Writes: 1045 1863 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 8 FORGET 0.00 0.00 us 0.00 us 0.00 us 5 RELEASE 0.00 0.00 us 0.00 us 0.00 us 93 RELEASEDIR 0.00 48.00 us 35.00 us 61.00 us 2 FLUSH 0.00 182.00 us 79.00 us 285.00 us 2 FSTAT 0.00 188.50 us 184.00 us 193.00 us 2 SETATTR 0.00 873.00 us 873.00 us 873.00 us 1 XATTROP 0.00 492.00 us 154.00 us 1073.00 us 3 OPEN 0.00 314.67 us 128.00 us 658.00 us 6 GETXATTR 0.00 115.02 us 29.00 us 1537.00 us 44 ENTRYLK 0.00 168.39 us 45.00 us 861.00 us 38 OPENDIR 0.00 166.43 us 39.00 us 4758.00 us 91 STATFS 0.00 40533.00 us 40533.00 us 40533.00 us 1 LINK 0.00 23496.50 us 17185.00 us 29808.00 us 2 CREATE 0.00 1991.03 us 209.00 us 17197.00 us 35 READ 0.00 1044.09 us 84.00 us 3060.00 us 74 FSYNC 0.00 1268.33 us 159.00 us 8139.00 us 70 READDIRP 0.00 39216.75 us 1143.00 us 60962.00 us 4 UNLINK 0.01 149.32 us 16.00 us 55158.00 us 3479 STAT 0.01 7967.72 us 146.00 us 76151.00 us 106 RCHECKSUM 0.02 2188.39 us 81.00 us 87414.00 us 500 FXATTROP 0.08 296.47 us 80.00 us 20575.00 us 16886 LOOKUP 0.24 10693.63 us 85.00 us 186554.00 us 1365 WRITE 20.68 5275482.42 us 32.00 us 255108468.00 us 239 INODELK 78.95 8146074.61 us 22.00 us 238932201.00 us 591 FINODELK Duration: 246 seconds Data Read: 13408908 bytes Data Written: 373263308 bytes Brick: HOSTNAME-00-A:/arbiterAA01/gvAA01/brick2 ----------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 4595631725 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 5210160 FORGET 0.00 0.00 us 0.00 us 0.00 us 19833857 RELEASE 0.00 0.00 us 0.00 us 0.00 us 121845659 RELEASEDIR 0.00 47.77 us 29.00 us 72.00 us 22 FLUSH 0.00 67.21 us 32.00 us 161.00 us 42 LK 0.00 190.20 us 142.00 us 368.00 us 15 SETATTR 0.00 355.78 us 255.00 us 408.00 us 9 RENAME 0.01 660.58 us 239.00 us 988.00 us 12 UNLINK 0.01 223.95 us 142.00 us 652.00 us 78 SETXATTR 0.02 1351.87 us 991.00 us 1807.00 us 15 MKNOD 0.11 1625.97 us 1013.00 us 3225.00 us 78 MKDIR 0.14 134.56 us 73.00 us 1296.00 us 1194 OPEN 0.18 1095.03 us 265.00 us 17673.00 us 195 XATTROP 0.25 56.99 us 18.00 us 3303.00 us 5290 ENTRYLK 0.57 286.73 us 46.00 us 28784.00 us 2379 OPENDIR 2.47 69.67 us 17.00 us 31954.00 us 42231 FINODELK 3.17 26765.84 us 73.00 us 243526.00 us 141 FSYNC 6.30 73.60 us 22.00 us 18488.00 us 101758 WRITE 9.51 389.77 us 167.00 us 21472.00 us 29019 FXATTROP 17.26 62.64 us 16.00 us 37178.00 us 327699 INODELK 59.99 1046.54 us 81.00 us 83922.00 us 68182 LOOKUP Duration: 19297631 seconds Data Read: 0 bytes Data Written: 4595631725 bytes Interval 5 Stats: %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.11 1109.06 us 64.00 us 28784.00 us 35 OPENDIR 14.97 64.35 us 16.00 us 36247.00 us 78809 INODELK 84.92 1903.39 us 101.00 us 57323.00 us 15117 LOOKUP Duration: 246 seconds Data Read: 0 bytes Data Written: 0 bytes Brick: HOSTNAME-00-A:/arbiterAA01/gvAA01/brick5 ----------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 3725579480 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 4675824 FORGET 0.00 0.00 us 0.00 us 0.00 us 3501480 RELEASE 0.00 0.00 us 0.00 us 0.00 us 121321048 RELEASEDIR 0.00 43.60 us 26.00 us 60.00 us 5 GETXATTR 0.00 183.50 us 136.00 us 254.00 us 6 SETATTR 0.00 94.50 us 34.00 us 360.00 us 12 FLUSH 0.00 637.50 us 378.00 us 897.00 us 2 UNLINK 0.00 152.08 us 54.00 us 1647.00 us 24 LK 0.01 923.40 us 352.00 us 2505.00 us 5 READDIR 0.02 1971.33 us 952.00 us 4079.00 us 6 MKNOD 0.03 278.36 us 158.00 us 1290.00 us 78 SETXATTR 0.03 70.37 us 27.00 us 906.00 us 334 ENTRYLK 0.06 147.48 us 83.00 us 1130.00 us 311 OPEN 0.18 1747.03 us 991.00 us 14702.00 us 78 MKDIR 0.28 178.04 us 33.00 us 9265.00 us 1190 OPENDIR 0.31 92.93 us 27.00 us 28425.00 us 2510 FINODELK 2.11 1980.44 us 117.00 us 116330.00 us 812 FSYNC 2.48 1109.55 us 141.00 us 193978.00 us 1702 FXATTROP 3.54 77.89 us 21.00 us 19627.00 us 34622 WRITE 7.06 60.51 us 21.00 us 43675.00 us 88861 INODELK 83.89 470.13 us 46.00 us 227967.00 us 135916 LOOKUP Duration: 19297631 seconds Data Read: 0 bytes Data Written: 3725579480 bytes Interval 5 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 949 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.09 532.37 us 59.00 us 4354.00 us 35 OPENDIR 0.15 79.14 us 42.00 us 813.00 us 388 WRITE 0.44 75.07 us 27.00 us 1936.00 us 1164 FINODELK 0.94 241.59 us 141.00 us 3508.00 us 776 FXATTROP 1.86 957.12 us 124.00 us 9367.00 us 388 FSYNC 8.59 63.47 us 21.00 us 43675.00 us 27031 INODELK 87.92 1125.41 us 103.00 us 60160.00 us 15599 LOOKUP Duration: 246 seconds Data Read: 0 bytes Data Written: 949 bytes Brick: HOSTNAME-00-A:/arbiterAA01/gvAA01/brick9 ----------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 13824276920 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 11714931 FORGET 0.00 0.00 us 0.00 us 0.00 us 12574659 RELEASE 0.00 0.00 us 0.00 us 0.00 us 121307214 RELEASEDIR 0.00 612.00 us 612.00 us 612.00 us 1 RENAME 0.00 53.69 us 32.00 us 75.00 us 13 FLUSH 0.00 146.43 us 118.00 us 157.00 us 7 SETATTR 0.00 1053.00 us 1053.00 us 1053.00 us 1 UNLINK 0.00 90.76 us 39.00 us 575.00 us 25 LK 0.01 1011.83 us 280.00 us 3498.00 us 6 READDIR 0.01 1154.50 us 873.00 us 1578.00 us 6 MKNOD 0.01 65.37 us 32.00 us 609.00 us 202 ENTRYLK 0.02 212.12 us 124.00 us 561.00 us 78 SETXATTR 0.05 408.86 us 31.00 us 989.00 us 111 GETXATTR 0.07 66.08 us 25.00 us 1177.00 us 1094 FINODELK 0.11 1413.50 us 793.00 us 6939.00 us 78 MKDIR 0.17 145.79 us 3.00 us 16855.00 us 1188 OPENDIR 0.90 566.03 us 178.00 us 101200.00 us 1583 FXATTROP 1.17 132.19 us 70.00 us 15892.00 us 8843 OPEN 2.28 63.93 us 25.00 us 14064.00 us 35560 INODELK 3.30 25327.51 us 77.00 us 275527.00 us 130 FSYNC 6.02 71.86 us 20.00 us 29539.00 us 83532 WRITE 85.86 950.72 us 45.00 us 234956.00 us 89984 LOOKUP Duration: 19297631 seconds Data Read: 0 bytes Data Written: 13824276920 bytes Interval 5 Stats: %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.00 107.50 us 84.00 us 131.00 us 2 ENTRYLK 0.00 1053.00 us 1053.00 us 1053.00 us 1 UNLINK 0.13 946.06 us 58.00 us 16855.00 us 35 OPENDIR 99.87 1700.08 us 89.00 us 67917.00 us 15086 LOOKUP Duration: 246 seconds Data Read: 0 bytes Data Written: 0 bytes Brick: HOSTNAME-00-A:/arbiterAA01/gvAA01/brick1 ----------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 2407865216 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 3885977 FORGET 0.00 0.00 us 0.00 us 0.00 us 1908766 RELEASE 0.00 0.00 us 0.00 us 0.00 us 121336583 RELEASEDIR 0.00 41.33 us 37.00 us 44.00 us 3 FLUSH 0.00 107.40 us 60.00 us 216.00 us 5 LK 0.00 197.00 us 144.00 us 367.00 us 9 SETATTR 0.00 559.25 us 269.00 us 1318.00 us 4 UNLINK 0.01 117.55 us 34.00 us 426.00 us 49 GETXATTR 0.01 729.10 us 111.00 us 2755.00 us 10 FSTAT 0.02 283.04 us 169.00 us 1903.00 us 78 SETXATTR 0.04 71.01 us 24.00 us 3460.00 us 547 ENTRYLK 0.05 5719.78 us 1045.00 us 39386.00 us 9 MKNOD 0.10 1199.26 us 338.00 us 2419.00 us 88 XATTROP 0.10 18965.50 us 244.00 us 111009.00 us 6 READDIR 0.13 1842.74 us 740.00 us 17409.00 us 78 MKDIR 0.22 196.96 us 3.00 us 13973.00 us 1241 OPENDIR 0.37 848.02 us 97.00 us 22427.00 us 478 FSYNC 0.58 142.08 us 80.00 us 9771.00 us 4495 OPEN 0.71 79.24 us 30.00 us 8519.00 us 9794 WRITE 0.72 197.05 us 83.00 us 36661.00 us 4020 FXATTROP 2.42 80.26 us 16.00 us 20655.00 us 33176 INODELK 4.57 791.78 us 19.00 us 854250.00 us 6345 FINODELK 89.97 1130.79 us 47.00 us 716773.00 us 87554 LOOKUP Duration: 19297631 seconds Data Read: 0 bytes Data Written: 2407865216 bytes Interval 5 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 2300 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 159 RELEASEDIR 0.02 993.33 us 161.00 us 2755.00 us 6 FSTAT 0.04 85.40 us 24.00 us 3460.00 us 169 ENTRYLK 0.14 957.84 us 61.00 us 13973.00 us 55 OPENDIR 0.16 76.25 us 38.00 us 1284.00 us 795 WRITE 0.17 1246.94 us 468.00 us 2063.00 us 52 XATTROP 0.36 755.13 us 97.00 us 10173.00 us 185 FSYNC 0.68 165.35 us 92.00 us 6470.00 us 1586 FXATTROP 1.75 90.50 us 16.00 us 7633.00 us 7475 INODELK 6.20 972.86 us 23.00 us 854250.00 us 2464 FINODELK 90.50 1686.49 us 89.00 us 78332.00 us 20759 LOOKUP Duration: 246 seconds Data Read: 0 bytes Data Written: 2300 bytes Brick: HOSTNAME-00-A:/arbiterAA01/gvAA01/brick8 ----------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 5492133233 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 6710878 FORGET 0.00 0.00 us 0.00 us 0.00 us 5709109 RELEASE 0.00 0.00 us 0.00 us 0.00 us 121308841 RELEASEDIR 0.00 381.00 us 381.00 us 381.00 us 1 LINK 0.00 321.50 us 309.00 us 334.00 us 2 RENAME 0.00 50.13 us 31.00 us 92.00 us 15 FLUSH 0.00 68.93 us 43.00 us 136.00 us 30 LK 0.00 207.60 us 137.00 us 379.00 us 10 SETATTR 0.00 456.20 us 247.00 us 837.00 us 5 UNLINK 0.00 67.16 us 39.00 us 233.00 us 56 FINODELK 0.01 1313.12 us 943.00 us 1632.00 us 8 MKNOD 0.01 61.85 us 34.00 us 384.00 us 214 ENTRYLK 0.02 260.12 us 141.00 us 1053.00 us 78 SETXATTR 0.04 688.39 us 327.00 us 2844.00 us 56 FXATTROP 0.11 1360.77 us 862.00 us 3735.00 us 78 MKDIR 0.22 177.44 us 49.00 us 11813.00 us 1185 OPENDIR 0.90 147.72 us 78.00 us 17203.00 us 5732 OPEN 1.30 79.82 us 37.00 us 11963.00 us 15230 WRITE 1.65 67.26 us 23.00 us 13182.00 us 23091 INODELK 95.71 796.62 us 69.00 us 131536.00 us 112767 LOOKUP Duration: 19297631 seconds Data Read: 0 bytes Data Written: 5492133233 bytes Interval 5 Stats: %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.00 106.00 us 84.00 us 128.00 us 2 ENTRYLK 0.00 379.00 us 379.00 us 379.00 us 1 SETATTR 0.00 513.00 us 513.00 us 513.00 us 1 UNLINK 0.00 665.50 us 662.00 us 669.00 us 2 INODELK 0.15 1173.54 us 86.00 us 11813.00 us 35 OPENDIR 99.84 1831.00 us 105.00 us 93579.00 us 15150 LOOKUP Duration: 246 seconds Data Read: 0 bytes Data Written: 0 bytes Brick: HOSTNAME-00-A:/arbiterAA01/gvAA01/brick3 ----------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 4572691669 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 4594149 FORGET 0.00 0.00 us 0.00 us 0.00 us 3305802 RELEASE 0.00 0.00 us 0.00 us 0.00 us 121331753 RELEASEDIR 0.00 351.00 us 351.00 us 351.00 us 1 RENAME 0.00 52.20 us 31.00 us 74.00 us 15 FLUSH 0.00 150.27 us 116.00 us 192.00 us 11 SETATTR 0.00 70.48 us 40.00 us 155.00 us 29 LK 0.00 698.00 us 294.00 us 1204.00 us 4 UNLINK 0.01 101.02 us 45.00 us 170.00 us 123 GETXATTR 0.02 226.96 us 132.00 us 661.00 us 78 SETXATTR 0.03 2449.45 us 981.00 us 14588.00 us 11 MKNOD 0.03 133.80 us 33.00 us 14384.00 us 205 ENTRYLK 0.08 13713.00 us 250.00 us 79575.00 us 6 READDIR 0.11 1403.08 us 856.00 us 3307.00 us 78 MKDIR 0.15 130.34 us 3.00 us 7373.00 us 1188 OPENDIR 0.63 135.12 us 75.00 us 3913.00 us 4698 OPEN 1.58 151.39 us 20.00 us 720477.00 us 10441 FINODELK 2.03 70.30 us 26.00 us 31988.00 us 28857 INODELK 3.61 457.95 us 148.00 us 96986.00 us 7889 FXATTROP 13.93 74.03 us 19.00 us 38536.00 us 188198 WRITE 15.54 5296.92 us 87.00 us 285482.00 us 2936 FSYNC 62.24 723.17 us 19.00 us 1037869.00 us 86100 LOOKUP Duration: 19297631 seconds Data Read: 0 bytes Data Written: 4572691669 bytes Interval 5 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 2170 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.00 105.67 us 65.00 us 164.00 us 3 ENTRYLK 0.01 950.00 us 950.00 us 950.00 us 1 UNLINK 0.07 398.43 us 64.00 us 5324.00 us 35 OPENDIR 0.34 88.92 us 36.00 us 4283.00 us 716 WRITE 0.52 88.23 us 22.00 us 6820.00 us 1112 FINODELK 1.21 289.14 us 156.00 us 6432.00 us 793 FXATTROP 1.53 796.04 us 104.00 us 2668.00 us 365 FSYNC 1.55 68.03 us 29.00 us 1880.00 us 4317 INODELK 94.77 1086.38 us 76.00 us 101220.00 us 16536 LOOKUP Duration: 246 seconds Data Read: 0 bytes Data Written: 2170 bytes Brick: HOSTNAME-00-A:/arbiterAA01/gvAA01/brick6 ----------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 3299040333 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 4224301 FORGET 0.00 0.00 us 0.00 us 0.00 us 2341147 RELEASE 0.00 0.00 us 0.00 us 0.00 us 121317598 RELEASEDIR 0.00 75.25 us 39.00 us 117.00 us 4 FLUSH 0.00 70.62 us 47.00 us 100.00 us 8 LK 0.00 156.11 us 125.00 us 189.00 us 9 SETATTR 0.00 1980.00 us 1980.00 us 1980.00 us 1 RENAME 0.00 364.29 us 311.00 us 434.00 us 7 UNLINK 0.01 134.92 us 31.00 us 662.00 us 53 GETXATTR 0.01 1307.44 us 1005.00 us 1850.00 us 9 MKNOD 0.01 79.36 us 26.00 us 665.00 us 157 FINODELK 0.02 275.58 us 162.00 us 1593.00 us 78 SETXATTR 0.03 108.84 us 33.00 us 4176.00 us 238 ENTRYLK 0.12 1469.38 us 947.00 us 7741.00 us 78 MKDIR 0.16 17107.56 us 88.00 us 150018.00 us 9 READDIR 0.22 170.89 us 3.00 us 12533.00 us 1191 OPENDIR 0.24 1117.26 us 216.00 us 79180.00 us 202 FXATTROP 0.85 38142.10 us 498.00 us 126241.00 us 21 FSYNC 0.89 85.41 us 21.00 us 27516.00 us 9833 WRITE 0.98 138.12 us 74.00 us 3527.00 us 6670 OPEN 1.87 65.50 us 24.00 us 18157.00 us 26861 INODELK 94.56 840.11 us 19.00 us 150836.00 us 105706 LOOKUP Duration: 19297631 seconds Data Read: 0 bytes Data Written: 3299040333 bytes Interval 5 Stats: %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.01 1980.00 us 1980.00 us 1980.00 us 1 RENAME 0.04 2164.25 us 113.00 us 4176.00 us 4 ENTRYLK 0.14 780.77 us 62.00 us 12533.00 us 35 OPENDIR 99.81 1300.71 us 65.00 us 96686.00 us 15135 LOOKUP Duration: 246 seconds Data Read: 0 bytes Data Written: 0 bytes Brick: HOSTNAME-00-A:/arbiterAA01/gvAA01/brick4 ----------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 4188179800 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 4843776 FORGET 0.00 0.00 us 0.00 us 0.00 us 4734700 RELEASE 0.00 0.00 us 0.00 us 0.00 us 122187144 RELEASEDIR 0.00 453.00 us 453.00 us 453.00 us 1 RENAME 0.00 2935.00 us 2935.00 us 2935.00 us 1 MKNOD 0.00 314.83 us 275.00 us 348.00 us 12 LINK 0.00 46.91 us 22.00 us 394.00 us 116 FLUSH 0.00 78.42 us 37.00 us 725.00 us 78 LK 0.00 152.00 us 66.00 us 2303.00 us 41 FTRUNCATE 0.01 120.96 us 27.00 us 211.00 us 80 GETXATTR 0.01 1123.27 us 400.00 us 1955.00 us 11 XATTROP 0.01 264.98 us 89.00 us 4166.00 us 59 SETATTR 0.01 522.77 us 219.00 us 1988.00 us 31 UNLINK 0.01 3116.33 us 335.00 us 15814.00 us 6 READDIR 0.02 194.27 us 87.00 us 1585.00 us 146 OPEN 0.02 404.47 us 146.00 us 10053.00 us 78 SETXATTR 0.03 94.93 us 24.00 us 12262.00 us 407 ENTRYLK 0.04 380.95 us 82.00 us 30028.00 us 169 FSTAT 0.08 1825.35 us 1237.00 us 3241.00 us 62 CREATE 0.11 2054.41 us 1177.00 us 7853.00 us 78 MKDIR 0.11 137.97 us 4.00 us 4662.00 us 1188 OPENDIR 0.93 140.91 us 18.00 us 368752.00 us 9912 FINODELK 2.28 404.58 us 88.00 us 26154.00 us 8431 FXATTROP 20.16 9694.89 us 64.00 us 288694.00 us 3111 FSYNC 20.76 68.66 us 19.00 us 45520.00 us 452355 WRITE 26.67 59.16 us 16.00 us 26108.00 us 674404 INODELK 28.72 687.59 us 46.00 us 85079.00 us 62473 LOOKUP Duration: 19297631 seconds Data Read: 0 bytes Data Written: 4188179800 bytes Interval 5 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 62 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 90 RELEASEDIR 0.01 71.83 us 46.00 us 116.00 us 30 WRITE 0.03 150.81 us 33.00 us 1289.00 us 37 FINODELK 0.08 489.54 us 63.00 us 4603.00 us 35 OPENDIR 0.20 874.41 us 240.00 us 2977.00 us 51 FXATTROP 19.05 66.56 us 17.00 us 14272.00 us 62733 INODELK 80.63 1162.50 us 99.00 us 85079.00 us 15201 LOOKUP Duration: 246 seconds Data Read: 0 bytes Data Written: 62 bytes Brick: HOSTNAME-00-A:/arbiterAA01/gvAA01/brick7 ----------------------------------------------- Cumulative Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 4459447660 %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 4903888 FORGET 0.00 0.00 us 0.00 us 0.00 us 4906100 RELEASE 0.00 0.00 us 0.00 us 0.00 us 121313760 RELEASEDIR 0.00 157.00 us 157.00 us 157.00 us 1 FSTAT 0.00 334.00 us 334.00 us 334.00 us 1 LINK 0.00 492.00 us 492.00 us 492.00 us 1 RENAME 0.00 81.56 us 50.00 us 121.00 us 9 LK 0.00 600.00 us 433.00 us 767.00 us 2 XATTROP 0.00 284.69 us 26.00 us 1664.00 us 13 FLUSH 0.00 343.79 us 162.00 us 1811.00 us 14 SETATTR 0.00 1061.67 us 236.00 us 3527.00 us 6 UNLINK 0.01 1371.00 us 997.00 us 2107.00 us 6 MKNOD 0.01 109.50 us 33.00 us 166.00 us 103 GETXATTR 0.01 3619.60 us 1046.00 us 6783.00 us 5 CREATE 0.02 258.63 us 161.00 us 1183.00 us 78 SETXATTR 0.05 214.24 us 27.00 us 6443.00 us 280 ENTRYLK 0.09 1455.38 us 615.00 us 7156.00 us 78 MKDIR 0.11 24019.00 us 296.00 us 140960.00 us 6 READDIR 0.17 181.83 us 4.00 us 11536.00 us 1191 OPENDIR 0.20 1608.87 us 48.00 us 21829.00 us 157 FSYNC 1.05 138.38 us 73.00 us 1645.00 us 9762 OPEN 1.27 945.52 us 99.00 us 35935.00 us 1730 FXATTROP 2.19 71.12 us 25.00 us 12922.00 us 39768 INODELK 2.31 115.41 us 18.00 us 17045.00 us 25869 WRITE 10.45 6550.60 us 19.00 us 5168504.00 us 2062 FINODELK 82.07 703.56 us 22.00 us 235243.00 us 150709 LOOKUP Duration: 19297631 seconds Data Read: 0 bytes Data Written: 4459447660 bytes Interval 5 Stats: Block Size: 1b+ No. of Reads: 0 No. of Writes: 3267 %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop --------- ----------- ----------- ----------- ------------ ---- 0.00 0.00 us 0.00 us 0.00 us 9 FORGET 0.00 0.00 us 0.00 us 0.00 us 5 RELEASE 0.00 0.00 us 0.00 us 0.00 us 93 RELEASEDIR 0.00 157.00 us 157.00 us 157.00 us 1 FSTAT 0.00 334.00 us 334.00 us 334.00 us 1 LINK 0.00 187.00 us 170.00 us 204.00 us 2 SETATTR 0.00 272.50 us 54.00 us 491.00 us 2 FLUSH 0.00 767.00 us 767.00 us 767.00 us 1 XATTROP 0.01 654.00 us 153.00 us 1645.00 us 3 OPEN 0.03 1135.40 us 236.00 us 3527.00 us 5 UNLINK 0.04 3975.50 us 1713.00 us 6238.00 us 2 CREATE 0.05 305.97 us 56.00 us 2057.00 us 38 OPENDIR 0.12 607.39 us 27.00 us 6443.00 us 44 ENTRYLK 0.59 1786.74 us 48.00 us 21829.00 us 74 FSYNC 0.73 661.93 us 33.00 us 12922.00 us 245 INODELK 3.02 1345.12 us 101.00 us 31614.00 us 500 FXATTROP 3.23 526.99 us 18.00 us 14746.00 us 1365 WRITE 22.47 6998.94 us 19.00 us 2596341.00 us 715 FINODELK 69.70 916.77 us 78.00 us 51010.00 us 16930 LOOKUP Duration: 246 seconds Data Read: 0 bytes Data Written: 3267 bytes From patrickmrennie at gmail.com Sun Apr 21 14:28:52 2019 From: patrickmrennie at gmail.com (Patrick Rennie) Date: Sun, 21 Apr 2019 22:28:52 +0800 Subject: [Gluster-users] Extremely slow cluster performance In-Reply-To: References: Message-ID: I think just worked out why NFS lookups are sometimes slow and sometimes fast as the hostname uses round robin DNS lookups, if I change to a specific host, 01-B, it's always quick, and if I change to the other brick host, 02-B, it's always slow. Maybe that will help to narrow this down? On Sun, Apr 21, 2019 at 10:24 PM Patrick Rennie wrote: > Hi Strahil, > > Thank you for your reply and your suggestions. I'm not sure which logs > would be most relevant to be checking to diagnose this issue, we have the > brick logs, the cluster mount logs, the shd logs or something else? I have > posted a few that I have seen repeated a few times already. I will continue > to post anything further that I see. > I am working on migrating data to some new storage, so this will slowly > free up space, although this is a production cluster and new data is being > uploaded every day, sometimes faster than I can migrate it off. I have > several other similar clusters and none of them have the same problem, one > the others is actually at 98-99% right now (big problem, I know) but still > performs perfectly fine compared to this cluster, I am not sure low space > is the root cause here. > > I currently have 13 VMs accessing this cluster, I have checked each one > and all of them use one of the two options below to mount the cluster in > fstab > > HOSTNAME:/gvAA01 /mountpoint glusterfs > defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable,use-readdirp=no > 0 0 > HOSTNAME:/gvAA01 /mountpoint glusterfs > defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable > > I also have a few other VMs which use NFS to access the cluster, and these > machines appear to be significantly quicker, initially I get a similar > delay with NFS but if I cancel the first "ls" and try it again I get < 1 > sec lookups, this can take over 10 minutes by FUSE/gluster client, but the > same trick of cancelling and trying again doesn't work for FUSE/gluster. > Sometimes the NFS queries have no delay at all, so this is a bit strange to > me. 
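On the round-robin DNS point above: it is easy to confirm whether the shared hostname hands out multiple A records, and an NFS client can be pinned to the faster node while the slow one is investigated. The hostnames below are the sanitized placeholders used in this thread, so treat this as a sketch rather than exact configuration:

  # Does the shared name resolve to more than one address, in rotating order?
  host HOSTNAME
  dig +short HOSTNAME

  # Pin the NFS mount to one node by putting its name directly in the fstab entry
  # (same options as the NFS line quoted just below, only the host changes):
  01-B:/gvAA01  /mountpoint  nfs  defaults,_netdev,vers=3,async,noatime  0 0

Pinning trades away load spreading and only masks the symptom: the FUSE clients still talk to every brick on both nodes, so the slow node itself is still worth chasing.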
> HOSTNAME:/gvAA01 /mountpoint/ nfs > defaults,_netdev,vers=3,async,noatime 0 0 > > Example: > user at VM:~$ time ls /cluster/folder > ^C > > real 9m49.383s > user 0m0.001s > sys 0m0.010s > > user at VM:~$ time ls /cluster/folder > > > real 0m0.069s > user 0m0.001s > sys 0m0.007s > > --- > > I have checked the profiling as you suggested, I let it run for around a > minute, then cancelled it and saved the profile info. > > root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 start > Starting volume profile on gvAA01 has been successful > root at HOSTNAME:/var/log/glusterfs# time ls /cluster/folder > ^C > > real 1m1.660s > user 0m0.000s > sys 0m0.002s > > root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 info >> > ~/profile.txt > root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 stop > > I will attach the results to this email as it's over 1000 lines. > Unfortunately, I'm not sure what I'm looking at but possibly somebody will > be able to help me make sense of it and let me know if it highlights any > specific issues. > > Happy to try any further suggestions. Thank you, > > -Patrick > > On Sun, Apr 21, 2019 at 7:55 PM Strahil wrote: > >> By the way, can you provide the 'volume info' and the mount options on >> all clients? >> Maybe , there is an option that uses a lot of resources due to some >> client's mount options. >> >> Best Regards, >> Strahil Nikolov >> On Apr 21, 2019 10:55, Patrick Rennie wrote: >> >> Just another small update, I'm continuing to watch my brick logs and I >> just saw these errors come up in the recent events too. I am going to >> continue to post any errors I see in the hope of finding the right one to >> try and fix.. >> This is from the logs on brick1, seems to be occurring on both nodes on >> brick1, although at different times. I'm not sure what this means, can >> anyone shed any light? >> I guess I am looking for some kind of specific error which may indicate >> something is broken or stuck and locking up and causing the extreme latency >> I'm seeing in the cluster. 
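Since the full profile output was attached earlier in the thread, a rough way to see which file operations dominate latency per brick is to filter the saved file. This assumes the profile was written to ~/profile.txt as in the transcript above, and the 10% cut-off is arbitrary:

  awk '/^Brick:/ {brick=$2}
       $NF ~ /^[A-Z]+$/ && $1+0 > 10 {printf "%-50s %7s%%  %s\n", brick, $1, $NF}' ~/profile.txt

In the figures quoted above, LOOKUP (and on some bricks INODELK) accounts for the bulk of the latency on almost every brick, which points at metadata lookups and lock contention rather than raw write throughput.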
>> >> [2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a) >> [0x7f3b3e93158a] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45) >> [0x7f3b3e4c5d45] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064939] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064962] E [server.c:195:server_submit_reply] >> 
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> >> Thanks again, >> >> -Patrick >> >> On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie >> wrote: >> >> Hi Darrell, >> >> Thanks again for your advice, I've left it for a while but unfortunately >> it's still just as slow and causing more problems for our operations now. I >> will need to try and take some steps to at least bring performance back to >> normal while continuing to investigate the issue longer term. I can >> definitely see one node with heavier CPU than the other, almost double, >> which I am OK with, but I think the heal process is going to take forever, >> trying to check the "gluster volume heal info" shows thousands and >> thousands of files which may need healing, I have no idea how many in total >> the command is still running after hours, so I am not sure what has gone so >> wrong to cause this. >> >> I've checked cluster.op-version and cluster.max-op-version and it looks >> like I'm on the latest version there. >> >> I have no idea how long the healing is going to take on this cluster, we >> have around 560TB of data on here, but I don't think I can wait that long >> to try and restore performance to normal. >> >> Can anyone think of anything else I can try in the meantime to work out >> what's causing the extreme latency? 
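For reference, the op-version check mentioned above can be done like this; the value passed to set is only an example, and should be whatever max-op-version reports for this cluster:

  gluster volume get all cluster.op-version
  gluster volume get all cluster.max-op-version
  # only if op-version is below max-op-version:
  gluster volume set all cluster.op-version 31202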
>> >> I've been going through cluster client the logs of some of our VMs and on >> some of our FTP servers I found this in the cluster mount log, but I am not >> seeing it on any of our other servers, just our FTP servers. >> >> [2019-04-21 07:16:19.925388] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:19:43.413834] W [MSGID: 114031] >> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote >> operation failed [No such file or directory] >> [2019-04-21 07:19:43.414153] W [MSGID: 114031] >> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote >> operation failed [No such file or directory] >> [2019-04-21 07:23:33.154717] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:33:24.943913] E [MSGID: 101046] >> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> >> Any ideas what this could mean? I am basically just grasping at straws >> here. >> >> I am going to hold off on the version upgrade until I know there are no >> files which need healing, which could be a while, from some reading I've >> done there shouldn't be any issues with this as both are on v3.12.x >> >> I've free'd up a small amount of space, but I still need to work on this >> further. >> >> I've read of a command "find .glusterfs -type f -links -2 -exec rm {} \;" >> which could be run on each brick and it would potentially clean up any >> files which were deleted straight from the bricks, but not via the client, >> I have a feeling this could help me free up about 5-10TB per brick from >> what I've been told about the history of this cluster. Can anyone confirm >> if this is actually safe to run? >> >> At this stage, I'm open to any suggestions as to how to proceed, thanks >> again for any advice. >> >> Cheers, >> >> - Patrick >> >> On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic >> wrote: >> >> Patrick, >> >> Sounds like progress. Be aware that gluster is expected to max out the >> CPUs on at least one of your servers while healing. This is normal and >> won?t adversely affect overall performance (any more than having bricks in >> need of healing, at any rate) unless you?re overdoing it. shd threads <= 4 >> should not do that on your hardware. Other tunings may have also increased >> overall performance, so you may see higher CPU than previously anyway. I?d >> recommend upping those thread counts and letting it heal as fast as >> possible, especially if these are dedicated Gluster storage servers (Ie: >> not also running VMs, etc). You should see ?normal? CPU use one heals are >> completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 >> cores). It?s also likely to be different between your servers, in a pure >> replica, one tends to max and one tends to be a little higher, in a >> distributed-replica, I?d expect more than one to run harder while healing. >> >> Keep the differences between doing an ls on a brick and doing an ls on a >> gluster mount in mind. When you do a ls on a gluster volume, it isn?t just >> doing a ls on one brick, it?s effectively doing it on ALL of your bricks, >> and they all have to return data before the ls succeeds. In a distributed >> volume, it?s figuring out where on each volume things live and getting the >> stat() from each to assemble the whole thing. 
And if things are in need of >> healing, it will take even longer to decide which version is current and >> use it (shd triggers a heal anytime it encounters this). Any of these >> things being slow slows down the overall response. >> >> At this point, I?d get some sleep too, and let your cluster heal while >> you do. I?d really want it fully healed before I did any updates anyway, so >> let it use CPU and get itself sorted out. Expect it to do a round of >> healing after you upgrade each machine too, this is normal so don?t let the >> CPU spike surprise you, It?s just catching up from the downtime incurred by >> the update and/or reboot if you did one. >> >> That reminds me, check your gluster cluster.op-version and >> cluster.max-op-version (gluster vol get all all | grep op-version). If >> op-version isn?t at the max-op-verison, set it to it so you?re taking >> advantage of the latest features available to your version. >> >> -Darrell >> >> On Apr 20, 2019, at 11:54 AM, Patrick Rennie >> wrote: >> >> Hi Darrell, >> >> Thanks again for your advice, I've applied the acltype=posixacl on my >> zpools and I think that has reduced some of the noise from my brick logs. >> I also bumped up some of the thread counts you suggested but my CPU load >> skyrocketed, so I dropped it back down to something slightly lower, but >> still higher than it was before, and will see how that goes for a while. >> >> Although low space is a definite issue, if I run an ls anywhere on my >> bricks directly it's instant, <1 second, and still takes several minutes >> via gluster, so there is still a problem in my gluster configuration >> somewhere. We don't have any snapshots, but I am trying to work out if any >> data on there is safe to delete, or if there is any way I can safely find >> and delete data which has been removed directly from the bricks in the >> past. I also have lz4 compression already enabled on each zpool which does >> help a bit, we get between 1.05 and 1.08x compression on this data. >> I've tried to go through each client and checked it's cluster mount logs >> and also my brick logs and looking for errors, so far nothing is jumping >> out at me, but there are some warnings and errors here and there, I am >> trying to work out what they mean. >> >> It's already 1 am here and unfortunately, I'm still awake working on this >> issue, but I think that I will have to leave the version upgrades until >> tomorrow. >> >> Thanks again for your advice so far. If anyone has any ideas on where I >> can look for errors other than brick logs or the cluster mount logs to help >> resolve this issue, it would be much appreciated. >> >> Cheers, >> >> - Patrick >> >> On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic >> wrote: >> >> See inline: >> >> On Apr 20, 2019, at 10:09 AM, Patrick Rennie >> wrote: >> >> Hi Darrell, >> >> Thanks for your reply, this issue seems to be getting worse over the last >> few days, really has me tearing my hair out. I will do as you have >> suggested and get started on upgrading from 3.12.14 to 3.12.15. >> I've checked the zfs properties and all bricks have "xattr=sa" set, but >> none of them has "acltype=posixacl" set, currently the acltype property >> shows "off", if I make these changes will it apply retroactively to the >> existing data? I'm unfamiliar with what this will change so I may need to >> look into that before I proceed. >> >> >> It is safe to apply that now, any new set/get calls will then use it if >> new posixacls exist, and use older if not. ZFS is good that way. 
It should >> clear up your posix_acl and posix errors over time. >> >> I understand performance is going to slow down as the bricks get full, I >> am currently trying to free space and migrate data to some newer storage, I >> have fresh several hundred TB storage I just setup recently but with these >> performance issues it's really slow. I also believe there is significant >> data which has been deleted directly from the bricks in the past, so if I >> can reclaim this space in a safe manner then I will have at least around >> 10-15% free space. >> >> >> Full ZFS volumes will have a much larger impact on performance than you?d >> think, I?d prioritize this. If you have been taking zfs snapshots, consider >> deleting them to get the overall volume free space back up. And just to be >> sure it?s been said, delete from within the mounted volumes, don?t delete >> directly from the bricks (gluster will just try and heal it later, >> compounding your issues). Does not apply to deleting other data from the >> ZFS volume if it?s not part of the brick directory, of course. >> >> These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so >> generally they have plenty of resources available, currently only using >> around 330/512GB of memory. >> >> I will look into what your suggested settings will change, and then will >> probably go ahead with your recommendations, for our specs as stated above, >> what would you suggest for performance.io-thread-count ? >> >> >> I run single 2630v4s on my servers, which have a smaller storage >> footprint than yours. I?d go with 32 for performance.io-thread-count. >> I?d try 4 for the shd thread settings on that gear. Your memory use sounds >> fine, so no worries there. >> >> Our workload is nothing too extreme, we have a few VMs which write backup >> data to this storage nightly for our clients, our VMs don't live on this >> cluster, but just write to it. >> >> >> If they are writing compressible data, you?ll get immediate benefit by >> setting compression=lz4 on your ZFS volumes. It won?t help any old data, of >> course, but it will compress new data going forward. This is another one >> that?s safe to enable on the fly. >> >> I've been going through all of the logs I can, below are some slightly >> sanitized errors I've come across, but I'm not sure what to make of them. >> The main error I am seeing is the first one below, across several of my >> bricks, but possibly only for specific folders on the cluster, I'm not 100% >> about that yet though. 
>> >> [2019-04-20 05:56:59.512649] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:59:06.084333] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:59:43.289030] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:59:50.582257] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 06:01:42.501701] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] >> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >> with 'user_xattr' flag) >> >> >> [2019-04-20 13:12:36.131856] E [MSGID: 113002] >> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >> [2019-04-20 13:12:36.131959] E [MSGID: 113002] [posix.c:362:posix_lookup] >> 0-gvAA01-posix: buf->ia_gfid is null for >> /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] >> [2019-04-20 13:12:36.132016] E [MSGID: 115050] >> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP >> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >> Backup_clone1.vbm_62906_tmp), client: >> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >> gvAA01-posix [No data available] >> [2019-04-20 13:12:38.093719] E [MSGID: 115050] >> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP >> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >> Backup_clone1.vbm_62906_tmp), client: >> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >> gvAA01-posix [No data available] >> [2019-04-20 13:12:38.093660] E [MSGID: 113002] >> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >> [2019-04-20 13:12:38.093696] E [MSGID: 113002] [posix.c:362:posix_lookup] >> 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx [No >> data available] >> >> >> posixacls should clear those up, as mentioned. 
>> >> >> [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] >> 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, >> by 980fdbbd367f0000 on 0x7fc4f0161440 >> [2019-04-20 14:25:59.654668] E [MSGID: 115053] >> [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: >> INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), >> client: >> cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, >> error-xlator: gvAA01-locks [Invalid argument] >> >> >> [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) >> [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) >> [0x7ff4ae6f796a] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) >> [0x7ff4ae2a96e8] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) >> [0x7ff4ae28528d] ) 0-: Reply submission failed >> >> >> Fix the posix acls and see if these clear up over time as well, I?m >> unclear on what the overall effect of running without the posix acls will >> be to total gluster health. Your biggest problem sounds like you need to >> free up space on the volumes and get the overall volume health back up to >> par and see if that doesn?t resolve the symptoms you?re seeing. >> >> >> >> Thank you again for your assistance. It is greatly appreciated. >> >> - Patrick >> >> >> >> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic >> wrote: >> >> Patrick, >> >> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You >> also mention ZFS, and that error you show makes me think you need to check >> to be sure you have ?xattr=sa? and ?acltype=posixacl? set on your ZFS >> volumes. >> >> You also observed your bricks are crossing the 95% full line, ZFS >> performance will degrade significantly the closer you get to full. In my >> experience, this starts somewhere between 10% and 5% free space remaining, >> so you?re in that realm. >> >> How?s your free memory on the servers doing? Do you have your zfs arc >> cache limited to something less than all the RAM? It shares pretty well, >> but I?ve encountered situations where other things won?t try and take ram >> back properly if they think it?s in use, so ZFS never gets the opportunity >> to give it up. >> >> Since your volume is a disperse-replica, you might try tuning >> disperse.shd-max-threads, default is 1, I?d try it at 2, 4, or even more if >> the CPUs are beefy enough. And setting server.event-threads to 4 and >> client.event-threads to 8 has proven helpful in many cases. After you get >> upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I >> don?t know if it matters, but I?d also recommend resetting >> performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or >> also setting performance.io-thread-count to 32 if those have beefy CPUs. >> >> Beyond those general ideas, more info about your hardware (CPU and RAM) >> and workload (VMs, direct storage for web servers or enders, etc) may net >> you some more ideas. Then you?re going to have to do more digging into >> brick logs looking for errors and/or warnings to see what?s going on. 
>> >> -Darrell >> >> >> On Apr 20, 2019, at 8:22 AM, Patrick Rennie >> wrote: >> >> Hello Gluster Users, >> >> I am hoping someone can help me with resolving an ongoing issue I've been >> having, I'm new to mailing lists so forgive me if I have gotten anything >> wrong. We have noticed our performance deteriorating over the last few >> weeks, easily measured by trying to do an ls on one of our top-level >> folders, and timing it, which usually would take 2-5 seconds, and now takes >> up to 20 minutes, which obviously renders our cluster basically unusable. >> This has been intermittent in the past but is now almost constant and I am >> not sure how to work out the exact cause. We have noticed some errors in >> the brick logs, and have noticed that if we kill the right brick process, >> performance instantly returns back to normal, this is not always the same >> brick, but it indicates to me something in the brick processes or >> background tasks may be causing extreme latency. Due to this ability to fix >> it by killing the right brick process off, I think it's a specific file, or >> folder, or operation which may be hanging and causing the increased >> latency, but I am not sure how to work it out. One last thing to add is >> that our bricks are getting quite full (~95% full), we are trying to >> migrate data off to new storage but that is going slowly, not helped by >> this issue. I am currently trying to run a full heal as there appear to be >> many files needing healing, and I have all brick processes running so they >> have an opportunity to heal, but this means performance is very poor. It >> currently takes over 15-20 minutes to do an ls of one of our top-level >> folders, which just contains 60-80 other folders, this should take 2-5 >> seconds. This is all being checked by FUSE mount locally on the storage >> node itself, but it is the same for other clients and VMs accessing the >> cluster. Initially, it seemed our NFS mounts were not affected and operated >> at normal speed, but testing over the last day has shown that our NFS >> clients are also extremely slow, so it doesn't seem specific to FUSE as I >> first thought it might be. >> >> I am not sure how to proceed from here, I am fairly new to gluster having >> inherited this setup from my predecessor and trying to keep it going. I >> have included some info below to try and help with diagnosis, please let me >> know if any further info would be helpful. I would really appreciate any >> advice on what I could try to work out the cause. Thank you in advance for >> reading this, and any suggestions you might be able to offer. >> >> - Patrick >> >> This is an example of the main error I see in our brick logs, there have >> been others, I can post them when I see them again too: >> [2019-04-20 04:54:43.055680] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick1/ library: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] >> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >> with 'user_xattr' flag) >> >> Our setup consists of 2 storage nodes and an arbiter node. I have noticed >> our nodes are on slightly different versions, I'm not sure if this could be >> an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - >> total capacity is around 560TB. 
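Given how much of this thread hinges on the bricks being ~95% full, the quickest way to see exactly where each pool stands, and whether snapshots or non-brick datasets are holding space, is:

  zpool list
  zfs list -o space -r brick1     # per-dataset and snapshot usage; pool name is an example

As noted elsewhere in the thread, any space reclaimed by deleting data should be deleted through the gluster mount, not directly on the bricks.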
>> We have bonded 10gbps NICS on each node, and I have tested bandwidth with >> iperf and found that it's what would be expected from this config. >> Individual brick performance seems ok, I've tested several bricks using >> dd and can write a 10GB files at 1.7GB/s. >> >> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >> 10000+0 records in >> 10000+0 records out >> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >> >> Node 1: >> # glusterfs --version >> glusterfs 3.12.15 >> >> Node 2: >> # glusterfs --version >> glusterfs 3.12.14 >> >> Arbiter: >> # glusterfs --version >> glusterfs 3.12.14 >> >> Here is our gluster volume status: >> >> # gluster volume status >> Status of volume: gvAA01 >> Gluster process TCP Port RDMA Port Online >> Pid >> >> ------------------------------------------------------------------------------ >> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck1 49152 0 Y >> 6931 >> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck2 49153 0 Y >> 6939 >> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck3 49154 0 Y >> 6947 >> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck4 49155 0 Y >> 6956 >> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck5 49156 0 Y >> 6964 >> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck6 49157 0 Y >> 6974 >> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck7 49158 0 Y >> 6984 >> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck8 49159 0 Y >> 6993 >> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck9 49160 0 Y >> 7001 >> NFS Server on localhost 2049 0 Y >> 17276 >> Self-heal Daemon on localhost N/A N/A Y >> 25245 >> NFS Server on 02-B 2049 0 Y 9089 >> Self-heal Daemon on 02-B N/A N/A Y 17838 >> NFS Server on 00-a 2049 0 Y 15660 >> Self-heal Daemon on 00-a N/A N/A Y 16218 >> >> Task Status of Volume gvAA01 >> >> ------------------------------------------------------------------------------ >> There are no active volume tasks >> >> And gluster volume info: >> >> # gluster volume info >> >> Volume Name: gvAA01 >> Type: Distributed-Replicate >> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 9 x (2 + 1) = 27 >> Transport-type: tcp >> Bricks: >> Brick1: 01-B:/brick1/gvAA01/brick >> Brick2: 02-B:/brick1/gvAA01/brick >> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >> Brick4: 01-B:/brick2/gvAA01/brick >> Brick5: 02-B:/brick2/gvAA01/brick >> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >> Brick7: 01-B:/brick3/gvAA01/brick >> Brick8: 02-B:/brick3/gvAA01/brick >> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >> Brick10: 01-B:/brick4/gvAA01/brick >> Brick11: 02-B:/brick4/gvAA01/brick >> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >> Brick13: 
01-B:/brick5/gvAA01/brick >> Brick14: 02-B:/brick5/gvAA01/brick >> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >> Brick16: 01-B:/brick6/gvAA01/brick >> Brick17: 02-B:/brick6/gvAA01/brick >> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >> Brick19: 01-B:/brick7/gvAA01/brick >> Brick20: 02-B:/brick7/gvAA01/brick >> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >> Brick22: 01-B:/brick8/gvAA01/brick >> Brick23: 02-B:/brick8/gvAA01/brick >> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >> Brick25: 01-B:/brick9/gvAA01/brick >> Brick26: 02-B:/brick9/gvAA01/brick >> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >> Options Reconfigured: >> cluster.shd-max-threads: 4 >> performance.least-prio-threads: 16 >> cluster.readdir-optimize: on >> performance.quick-read: off >> performance.stat-prefetch: off >> cluster.data-self-heal: on >> cluster.lookup-unhashed: auto >> cluster.lookup-optimize: on >> cluster.favorite-child-policy: mtime >> server.allow-insecure: on >> transport.address-family: inet >> client.bind-insecure: on >> cluster.entry-self-heal: off >> cluster.metadata-self-heal: off >> performance.md-cache-timeout: 600 >> cluster.self-heal-daemon: enable >> performance.readdir-ahead: on >> diagnostics.brick-log-level: INFO >> nfs.disable: off >> >> Thank you for any assistance. >> >> - Patrick >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From meira at cesup.ufrgs.br Sun Apr 21 14:34:09 2019 From: meira at cesup.ufrgs.br (Lindolfo Meira) Date: Sun, 21 Apr 2019 11:34:09 -0300 (-03) Subject: [Gluster-users] Enabling quotas on gluster In-Reply-To: References: Message-ID: Thanks Hari. Lindolfo Meira, MSc Diretor Geral, Centro Nacional de Supercomputa??o Universidade Federal do Rio Grande do Sul +55 (51) 3308-3139 On Thu, 4 Apr 2019, Hari Gowtham wrote: > Hi, > > The performance hit that quota causes depended on a number of factors > like: > 1) the number of files, > 2) the depth of the directories in the FS > 3) the breadth of the directories in the FS > 4) the number of bricks. > > These are the main contributions to the performance hit. > If the volume is of lesser size then quota should work fine. > Let us know more about your use case to help you better. > > Note: gluster quota is not being actively worked on. > > On Thu, Apr 4, 2019 at 3:45 AM Lindolfo Meira wrote: > > > > Hi folks. > > > > Does anyone know how significant is the performance penalty for enabling > > directory level quotas on a gluster fs, compared to the case with no > > quotas at all? > > > > > > Lindolfo Meira, MSc > > Diretor Geral, Centro Nacional de Supercomputa??o > > Universidade Federal do Rio Grande do Sul > > +55 (51) 3308-3139_______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Regards, > Hari Gowtham. 
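For completeness on the quota question above: directory-level quotas are enabled per volume and then limited per path, roughly as follows (volume and directory names are examples only):

  gluster volume quota testvol enable
  gluster volume quota testvol limit-usage /projects/alpha 100GB
  gluster volume quota testvol list

As the reply above points out, the overhead grows with the number of bricks and the depth and breadth of the directory tree, and gluster's quota feature was not being actively developed at the time.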
> From patrickmrennie at gmail.com Sun Apr 21 15:03:48 2019 From: patrickmrennie at gmail.com (Patrick Rennie) Date: Sun, 21 Apr 2019 23:03:48 +0800 Subject: [Gluster-users] Extremely slow cluster performance In-Reply-To: References: Message-ID: I just tried to check my "gluster volume heal gvAA01 statistics" and it doesn't seem like a full heal was still in progress, just an index, I have started the full heal again and am trying to monitor it with "gluster volume heal gvAA01 info" which just shows me thousands of gfid file identifiers scrolling past. What is the best way to check the status of a heal and track the files healed and progress to completion? Thank you, - Patrick On Sun, Apr 21, 2019 at 10:28 PM Patrick Rennie wrote: > I think just worked out why NFS lookups are sometimes slow and sometimes > fast as the hostname uses round robin DNS lookups, if I change to a > specific host, 01-B, it's always quick, and if I change to the other brick > host, 02-B, it's always slow. > Maybe that will help to narrow this down? > > On Sun, Apr 21, 2019 at 10:24 PM Patrick Rennie > wrote: > >> Hi Strahil, >> >> Thank you for your reply and your suggestions. I'm not sure which logs >> would be most relevant to be checking to diagnose this issue, we have the >> brick logs, the cluster mount logs, the shd logs or something else? I have >> posted a few that I have seen repeated a few times already. I will continue >> to post anything further that I see. >> I am working on migrating data to some new storage, so this will slowly >> free up space, although this is a production cluster and new data is being >> uploaded every day, sometimes faster than I can migrate it off. I have >> several other similar clusters and none of them have the same problem, one >> the others is actually at 98-99% right now (big problem, I know) but still >> performs perfectly fine compared to this cluster, I am not sure low space >> is the root cause here. >> >> I currently have 13 VMs accessing this cluster, I have checked each one >> and all of them use one of the two options below to mount the cluster in >> fstab >> >> HOSTNAME:/gvAA01 /mountpoint glusterfs >> defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable,use-readdirp=no >> 0 0 >> HOSTNAME:/gvAA01 /mountpoint glusterfs >> defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable >> >> I also have a few other VMs which use NFS to access the cluster, and >> these machines appear to be significantly quicker, initially I get a >> similar delay with NFS but if I cancel the first "ls" and try it again I >> get < 1 sec lookups, this can take over 10 minutes by FUSE/gluster client, >> but the same trick of cancelling and trying again doesn't work for >> FUSE/gluster. Sometimes the NFS queries have no delay at all, so this is a >> bit strange to me. >> HOSTNAME:/gvAA01 /mountpoint/ nfs >> defaults,_netdev,vers=3,async,noatime 0 0 >> >> Example: >> user at VM:~$ time ls /cluster/folder >> ^C >> >> real 9m49.383s >> user 0m0.001s >> sys 0m0.010s >> >> user at VM:~$ time ls /cluster/folder >> >> >> real 0m0.069s >> user 0m0.001s >> sys 0m0.007s >> >> --- >> >> I have checked the profiling as you suggested, I let it run for around a >> minute, then cancelled it and saved the profile info. 
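On the question above about tracking heal progress: counting pending entries is usually more practical than scrolling through gfids. A minimal sketch:

  gluster volume heal gvAA01 statistics heal-count
  # or keep an eye on the per-brick counts over time:
  watch -n 300 'gluster volume heal gvAA01 statistics heal-count'

The heal is effectively done when every brick's count stays at zero; 'gluster volume heal gvAA01 statistics' (as used above) then shows the crawl history.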
>> >> root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 start >> Starting volume profile on gvAA01 has been successful >> root at HOSTNAME:/var/log/glusterfs# time ls /cluster/folder >> ^C >> >> real 1m1.660s >> user 0m0.000s >> sys 0m0.002s >> >> root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 info >> >> ~/profile.txt >> root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 stop >> >> I will attach the results to this email as it's over 1000 lines. >> Unfortunately, I'm not sure what I'm looking at but possibly somebody will >> be able to help me make sense of it and let me know if it highlights any >> specific issues. >> >> Happy to try any further suggestions. Thank you, >> >> -Patrick >> >> On Sun, Apr 21, 2019 at 7:55 PM Strahil wrote: >> >>> By the way, can you provide the 'volume info' and the mount options on >>> all clients? >>> Maybe , there is an option that uses a lot of resources due to some >>> client's mount options. >>> >>> Best Regards, >>> Strahil Nikolov >>> On Apr 21, 2019 10:55, Patrick Rennie wrote: >>> >>> Just another small update, I'm continuing to watch my brick logs and I >>> just saw these errors come up in the recent events too. I am going to >>> continue to post any errors I see in the hope of finding the right one to >>> try and fix.. >>> This is from the logs on brick1, seems to be occurring on both nodes on >>> brick1, although at different times. I'm not sure what this means, can >>> anyone shed any light? >>> I guess I am looking for some kind of specific error which may indicate >>> something is broken or stuck and locking up and causing the extreme latency >>> I'm seeing in the cluster. >>> >>> [2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic] >>> 0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS >>> 3.3, ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server) >>> [2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply] >>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a) >>> [0x7f3b3e93158a] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45) >>> [0x7f3b3e4c5d45] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >>> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >>> [2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic] >>> 0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS >>> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >>> [2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply] >>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >>> [0x7f3b3e9318fa] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >>> [0x7f3b3e4c5f35] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >>> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >>> [2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic] >>> 0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS >>> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >>> [2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply] >>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >>> [0x7f3b3e9318fa] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >>> [0x7f3b3e4c5f35] >>> 
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >>> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >>> [2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic] >>> 0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS >>> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >>> [2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply] >>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >>> [0x7f3b3e9318fa] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >>> [0x7f3b3e4c5f35] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >>> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >>> [2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic] >>> 0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS >>> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >>> [2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply] >>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >>> [0x7f3b3e9318fa] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >>> [0x7f3b3e4c5f35] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >>> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >>> [2019-04-21 07:25:55.064939] E [rpcsvc.c:1364:rpcsvc_submit_generic] >>> 0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS >>> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >>> [2019-04-21 07:25:55.064962] E [server.c:195:server_submit_reply] >>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >>> [0x7f3b3e9318fa] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >>> [0x7f3b3e4c5f35] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >>> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >>> [2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic] >>> 0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS >>> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >>> [2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply] >>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >>> [0x7f3b3e9318fa] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >>> [0x7f3b3e4c5f35] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >>> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >>> [2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic] >>> 0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS >>> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >>> [2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply] >>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >>> [0x7f3b3e9318fa] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >>> [0x7f3b3e4c5f35] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >>> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >>> [2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic] >>> 0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS >>> 3.3, ProgVers: 330, Proc: 30) to 
rpc-transport (tcp.gvAA01-server) >>> [2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply] >>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >>> [0x7f3b3e9318fa] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >>> [0x7f3b3e4c5f35] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >>> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >>> >>> Thanks again, >>> >>> -Patrick >>> >>> On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie >>> wrote: >>> >>> Hi Darrell, >>> >>> Thanks again for your advice, I've left it for a while but unfortunately >>> it's still just as slow and causing more problems for our operations now. I >>> will need to try and take some steps to at least bring performance back to >>> normal while continuing to investigate the issue longer term. I can >>> definitely see one node with heavier CPU than the other, almost double, >>> which I am OK with, but I think the heal process is going to take forever, >>> trying to check the "gluster volume heal info" shows thousands and >>> thousands of files which may need healing, I have no idea how many in total >>> the command is still running after hours, so I am not sure what has gone so >>> wrong to cause this. >>> >>> I've checked cluster.op-version and cluster.max-op-version and it looks >>> like I'm on the latest version there. >>> >>> I have no idea how long the healing is going to take on this cluster, we >>> have around 560TB of data on here, but I don't think I can wait that long >>> to try and restore performance to normal. >>> >>> Can anyone think of anything else I can try in the meantime to work out >>> what's causing the extreme latency? >>> >>> I've been going through cluster client the logs of some of our VMs and >>> on some of our FTP servers I found this in the cluster mount log, but I am >>> not seeing it on any of our other servers, just our FTP servers. >>> >>> [2019-04-21 07:16:19.925388] E [MSGID: 101046] >>> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >>> [2019-04-21 07:19:43.413834] W [MSGID: 114031] >>> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote >>> operation failed [No such file or directory] >>> [2019-04-21 07:19:43.414153] W [MSGID: 114031] >>> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote >>> operation failed [No such file or directory] >>> [2019-04-21 07:23:33.154717] E [MSGID: 101046] >>> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >>> [2019-04-21 07:33:24.943913] E [MSGID: 101046] >>> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >>> >>> Any ideas what this could mean? I am basically just grasping at straws >>> here. >>> >>> I am going to hold off on the version upgrade until I know there are no >>> files which need healing, which could be a while, from some reading I've >>> done there shouldn't be any issues with this as both are on v3.12.x >>> >>> I've free'd up a small amount of space, but I still need to work on this >>> further. >>> >>> I've read of a command "find .glusterfs -type f -links -2 -exec rm {} >>> \;" which could be run on each brick and it would potentially clean up any >>> files which were deleted straight from the bricks, but not via the client, >>> I have a feeling this could help me free up about 5-10TB per brick from >>> what I've been told about the history of this cluster. Can anyone confirm >>> if this is actually safe to run? 
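On the '-links -2' cleanup asked about just above: whether it is safe on this cluster is exactly the open question, so the sketch below only covers the dry-run step, run per brick and only against the .glusterfs directory. Paths are examples; it is best done while no heal is pending on that brick, and nothing should be deleted until the candidate list has been reviewed:

  cd /brick1/gvAA01/brick/.glusterfs
  find . -type f -links -2 -print > /root/brick1-orphan-gfids.txt
  wc -l /root/brick1-orphan-gfids.txt
  # review the list first; only then, if satisfied:
  # xargs -a /root/brick1-orphan-gfids.txt -d '\n' rm --

The idea is that a regular file under .glusterfs normally has a second hard link to the real file elsewhere on the brick; a link count below 2 suggests the real file was removed directly on the brick, which matches the history described earlier in the thread.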
>>> >>> At this stage, I'm open to any suggestions as to how to proceed, thanks >>> again for any advice. >>> >>> Cheers, >>> >>> - Patrick >>> >>> On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic >>> wrote: >>> >>> Patrick, >>> >>> Sounds like progress. Be aware that gluster is expected to max out the >>> CPUs on at least one of your servers while healing. This is normal and >>> won?t adversely affect overall performance (any more than having bricks in >>> need of healing, at any rate) unless you?re overdoing it. shd threads <= 4 >>> should not do that on your hardware. Other tunings may have also increased >>> overall performance, so you may see higher CPU than previously anyway. I?d >>> recommend upping those thread counts and letting it heal as fast as >>> possible, especially if these are dedicated Gluster storage servers (Ie: >>> not also running VMs, etc). You should see ?normal? CPU use one heals are >>> completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 >>> cores). It?s also likely to be different between your servers, in a pure >>> replica, one tends to max and one tends to be a little higher, in a >>> distributed-replica, I?d expect more than one to run harder while healing. >>> >>> Keep the differences between doing an ls on a brick and doing an ls on a >>> gluster mount in mind. When you do a ls on a gluster volume, it isn?t just >>> doing a ls on one brick, it?s effectively doing it on ALL of your bricks, >>> and they all have to return data before the ls succeeds. In a distributed >>> volume, it?s figuring out where on each volume things live and getting the >>> stat() from each to assemble the whole thing. And if things are in need of >>> healing, it will take even longer to decide which version is current and >>> use it (shd triggers a heal anytime it encounters this). Any of these >>> things being slow slows down the overall response. >>> >>> At this point, I?d get some sleep too, and let your cluster heal while >>> you do. I?d really want it fully healed before I did any updates anyway, so >>> let it use CPU and get itself sorted out. Expect it to do a round of >>> healing after you upgrade each machine too, this is normal so don?t let the >>> CPU spike surprise you, It?s just catching up from the downtime incurred by >>> the update and/or reboot if you did one. >>> >>> That reminds me, check your gluster cluster.op-version and >>> cluster.max-op-version (gluster vol get all all | grep op-version). If >>> op-version isn?t at the max-op-verison, set it to it so you?re taking >>> advantage of the latest features available to your version. >>> >>> -Darrell >>> >>> On Apr 20, 2019, at 11:54 AM, Patrick Rennie >>> wrote: >>> >>> Hi Darrell, >>> >>> Thanks again for your advice, I've applied the acltype=posixacl on my >>> zpools and I think that has reduced some of the noise from my brick logs. >>> I also bumped up some of the thread counts you suggested but my CPU load >>> skyrocketed, so I dropped it back down to something slightly lower, but >>> still higher than it was before, and will see how that goes for a while. >>> >>> Although low space is a definite issue, if I run an ls anywhere on my >>> bricks directly it's instant, <1 second, and still takes several minutes >>> via gluster, so there is still a problem in my gluster configuration >>> somewhere. 
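As a quick check that the property change really landed everywhere (it only affects xattr calls made after it is set), the relevant ZFS properties can be listed per dataset on each storage node, with no pool names assumed:

# zfs get -t filesystem acltype,xattr,compression

Any dataset still showing acltype off can be corrected in place with zfs set acltype=posixacl xattr=sa pool/dataset (names there are placeholders).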
We don't have any snapshots, but I am trying to work out if any >>> data on there is safe to delete, or if there is any way I can safely find >>> and delete data which has been removed directly from the bricks in the >>> past. I also have lz4 compression already enabled on each zpool which does >>> help a bit, we get between 1.05 and 1.08x compression on this data. >>> I've tried to go through each client and checked it's cluster mount logs >>> and also my brick logs and looking for errors, so far nothing is jumping >>> out at me, but there are some warnings and errors here and there, I am >>> trying to work out what they mean. >>> >>> It's already 1 am here and unfortunately, I'm still awake working on >>> this issue, but I think that I will have to leave the version upgrades >>> until tomorrow. >>> >>> Thanks again for your advice so far. If anyone has any ideas on where I >>> can look for errors other than brick logs or the cluster mount logs to help >>> resolve this issue, it would be much appreciated. >>> >>> Cheers, >>> >>> - Patrick >>> >>> On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic >>> wrote: >>> >>> See inline: >>> >>> On Apr 20, 2019, at 10:09 AM, Patrick Rennie >>> wrote: >>> >>> Hi Darrell, >>> >>> Thanks for your reply, this issue seems to be getting worse over the >>> last few days, really has me tearing my hair out. I will do as you have >>> suggested and get started on upgrading from 3.12.14 to 3.12.15. >>> I've checked the zfs properties and all bricks have "xattr=sa" set, but >>> none of them has "acltype=posixacl" set, currently the acltype property >>> shows "off", if I make these changes will it apply retroactively to the >>> existing data? I'm unfamiliar with what this will change so I may need to >>> look into that before I proceed. >>> >>> >>> It is safe to apply that now, any new set/get calls will then use it if >>> new posixacls exist, and use older if not. ZFS is good that way. It should >>> clear up your posix_acl and posix errors over time. >>> >>> I understand performance is going to slow down as the bricks get full, I >>> am currently trying to free space and migrate data to some newer storage, I >>> have fresh several hundred TB storage I just setup recently but with these >>> performance issues it's really slow. I also believe there is significant >>> data which has been deleted directly from the bricks in the past, so if I >>> can reclaim this space in a safe manner then I will have at least around >>> 10-15% free space. >>> >>> >>> Full ZFS volumes will have a much larger impact on performance than >>> you?d think, I?d prioritize this. If you have been taking zfs snapshots, >>> consider deleting them to get the overall volume free space back up. And >>> just to be sure it?s been said, delete from within the mounted volumes, >>> don?t delete directly from the bricks (gluster will just try and heal it >>> later, compounding your issues). Does not apply to deleting other data from >>> the ZFS volume if it?s not part of the brick directory, of course. >>> >>> These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so >>> generally they have plenty of resources available, currently only using >>> around 330/512GB of memory. >>> >>> I will look into what your suggested settings will change, and then will >>> probably go ahead with your recommendations, for our specs as stated above, >>> what would you suggest for performance.io-thread-count ? >>> >>> >>> I run single 2630v4s on my servers, which have a smaller storage >>> footprint than yours. 
I?d go with 32 for performance.io-thread-count. >>> I?d try 4 for the shd thread settings on that gear. Your memory use sounds >>> fine, so no worries there. >>> >>> Our workload is nothing too extreme, we have a few VMs which write >>> backup data to this storage nightly for our clients, our VMs don't live on >>> this cluster, but just write to it. >>> >>> >>> If they are writing compressible data, you?ll get immediate benefit by >>> setting compression=lz4 on your ZFS volumes. It won?t help any old data, of >>> course, but it will compress new data going forward. This is another one >>> that?s safe to enable on the fly. >>> >>> I've been going through all of the logs I can, below are some slightly >>> sanitized errors I've come across, but I'm not sure what to make of them. >>> The main error I am seeing is the first one below, across several of my >>> bricks, but possibly only for specific folders on the cluster, I'm not 100% >>> about that yet though. >>> >>> [2019-04-20 05:56:59.512649] E [MSGID: 113001] >>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>> supported] >>> [2019-04-20 05:59:06.084333] E [MSGID: 113001] >>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>> supported] >>> [2019-04-20 05:59:43.289030] E [MSGID: 113001] >>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>> supported] >>> [2019-04-20 05:59:50.582257] E [MSGID: 113001] >>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>> supported] >>> [2019-04-20 06:01:42.501701] E [MSGID: 113001] >>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>> supported] >>> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] >>> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >>> with 'user_xattr' flag) >>> >>> >>> [2019-04-20 13:12:36.131856] E [MSGID: 113002] >>> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >>> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>> [2019-04-20 13:12:36.131959] E [MSGID: 113002] >>> [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for >>> /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] >>> [2019-04-20 13:12:36.132016] E [MSGID: 115050] >>> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP >>> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >>> Backup_clone1.vbm_62906_tmp), client: >>> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >>> gvAA01-posix [No data available] >>> [2019-04-20 13:12:38.093719] E [MSGID: 115050] >>> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP >>> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >>> Backup_clone1.vbm_62906_tmp), client: >>> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >>> gvAA01-posix [No data available] >>> [2019-04-20 13:12:38.093660] E [MSGID: 113002] >>> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >>> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>> [2019-04-20 13:12:38.093696] E [MSGID: 113002] >>> [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null 
for >>> /brick2/xxxxxxxxxxxxxxxxxxxx [No data available] >>> >>> >>> posixacls should clear those up, as mentioned. >>> >>> >>> [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] >>> 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, >>> by 980fdbbd367f0000 on 0x7fc4f0161440 >>> [2019-04-20 14:25:59.654668] E [MSGID: 115053] >>> [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: >>> INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), >>> client: >>> cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, >>> error-xlator: gvAA01-locks [Invalid argument] >>> >>> >>> [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] >>> 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS >>> 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) >>> [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] >>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) >>> [0x7ff4ae6f796a] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) >>> [0x7ff4ae2a96e8] >>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) >>> [0x7ff4ae28528d] ) 0-: Reply submission failed >>> >>> >>> Fix the posix acls and see if these clear up over time as well, I?m >>> unclear on what the overall effect of running without the posix acls will >>> be to total gluster health. Your biggest problem sounds like you need to >>> free up space on the volumes and get the overall volume health back up to >>> par and see if that doesn?t resolve the symptoms you?re seeing. >>> >>> >>> >>> Thank you again for your assistance. It is greatly appreciated. >>> >>> - Patrick >>> >>> >>> >>> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic >>> wrote: >>> >>> Patrick, >>> >>> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You >>> also mention ZFS, and that error you show makes me think you need to check >>> to be sure you have ?xattr=sa? and ?acltype=posixacl? set on your ZFS >>> volumes. >>> >>> You also observed your bricks are crossing the 95% full line, ZFS >>> performance will degrade significantly the closer you get to full. In my >>> experience, this starts somewhere between 10% and 5% free space remaining, >>> so you?re in that realm. >>> >>> How?s your free memory on the servers doing? Do you have your zfs arc >>> cache limited to something less than all the RAM? It shares pretty well, >>> but I?ve encountered situations where other things won?t try and take ram >>> back properly if they think it?s in use, so ZFS never gets the opportunity >>> to give it up. >>> >>> Since your volume is a disperse-replica, you might try tuning >>> disperse.shd-max-threads, default is 1, I?d try it at 2, 4, or even more if >>> the CPUs are beefy enough. And setting server.event-threads to 4 and >>> client.event-threads to 8 has proven helpful in many cases. After you get >>> upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I >>> don?t know if it matters, but I?d also recommend resetting >>> performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or >>> also setting performance.io-thread-count to 32 if those have beefy CPUs. >>> >>> Beyond those general ideas, more info about your hardware (CPU and RAM) >>> and workload (VMs, direct storage for web servers or enders, etc) may net >>> you some more ideas. 
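For reference, all of those are plain volume-set options, so applied one at a time (watching the effect before the next) they would look roughly like this, with the values suggested above and stat-prefetch left until both servers are on 3.12.15:

# gluster volume set gvAA01 server.event-threads 4
# gluster volume set gvAA01 client.event-threads 8
# gluster volume set gvAA01 performance.io-thread-count 32
# gluster volume set gvAA01 performance.least-prio-threads 1
# gluster volume set gvAA01 performance.stat-prefetch on
# gluster volume get gvAA01 all | grep -E 'event-threads|io-thread-count|stat-prefetch'

The final volume get confirms what actually took effect.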
Then you?re going to have to do more digging into >>> brick logs looking for errors and/or warnings to see what?s going on. >>> >>> -Darrell >>> >>> >>> On Apr 20, 2019, at 8:22 AM, Patrick Rennie >>> wrote: >>> >>> Hello Gluster Users, >>> >>> I am hoping someone can help me with resolving an ongoing issue I've >>> been having, I'm new to mailing lists so forgive me if I have gotten >>> anything wrong. We have noticed our performance deteriorating over the last >>> few weeks, easily measured by trying to do an ls on one of our top-level >>> folders, and timing it, which usually would take 2-5 seconds, and now takes >>> up to 20 minutes, which obviously renders our cluster basically unusable. >>> This has been intermittent in the past but is now almost constant and I am >>> not sure how to work out the exact cause. We have noticed some errors in >>> the brick logs, and have noticed that if we kill the right brick process, >>> performance instantly returns back to normal, this is not always the same >>> brick, but it indicates to me something in the brick processes or >>> background tasks may be causing extreme latency. Due to this ability to fix >>> it by killing the right brick process off, I think it's a specific file, or >>> folder, or operation which may be hanging and causing the increased >>> latency, but I am not sure how to work it out. One last thing to add is >>> that our bricks are getting quite full (~95% full), we are trying to >>> migrate data off to new storage but that is going slowly, not helped by >>> this issue. I am currently trying to run a full heal as there appear to be >>> many files needing healing, and I have all brick processes running so they >>> have an opportunity to heal, but this means performance is very poor. It >>> currently takes over 15-20 minutes to do an ls of one of our top-level >>> folders, which just contains 60-80 other folders, this should take 2-5 >>> seconds. This is all being checked by FUSE mount locally on the storage >>> node itself, but it is the same for other clients and VMs accessing the >>> cluster. Initially, it seemed our NFS mounts were not affected and operated >>> at normal speed, but testing over the last day has shown that our NFS >>> clients are also extremely slow, so it doesn't seem specific to FUSE as I >>> first thought it might be. >>> >>> I am not sure how to proceed from here, I am fairly new to gluster >>> having inherited this setup from my predecessor and trying to keep it >>> going. I have included some info below to try and help with diagnosis, >>> please let me know if any further info would be helpful. I would really >>> appreciate any advice on what I could try to work out the cause. Thank you >>> in advance for reading this, and any suggestions you might be able to >>> offer. >>> >>> - Patrick >>> >>> This is an example of the main error I see in our brick logs, there have >>> been others, I can post them when I see them again too: >>> [2019-04-20 04:54:43.055680] E [MSGID: 113001] >>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>> /brick1/ library: system.posix_acl_default [Operation not >>> supported] >>> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] >>> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >>> with 'user_xattr' flag) >>> >>> Our setup consists of 2 storage nodes and an arbiter node. I have >>> noticed our nodes are on slightly different versions, I'm not sure if this >>> could be an issue. 
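One quick way to see whether that version skew matters in practice is to compare the installed binaries with the cluster's operating version on each node:

# glusterfs --version | head -n 1
# gluster volume get all cluster.op-version
# gluster volume get all cluster.max-op-version

If op-version already equals max-op-version on every node, the mixed 3.12.14/3.12.15 servers are at least speaking the same protocol level, though bringing them onto the same point release is still worthwhile.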
We have 9 bricks on each node, made up of ZFS RAIDZ2 >>> pools - total capacity is around 560TB. >>> We have bonded 10gbps NICS on each node, and I have tested bandwidth >>> with iperf and found that it's what would be expected from this config. >>> Individual brick performance seems ok, I've tested several bricks using >>> dd and can write a 10GB files at 1.7GB/s. >>> >>> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >>> 10000+0 records in >>> 10000+0 records out >>> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >>> >>> Node 1: >>> # glusterfs --version >>> glusterfs 3.12.15 >>> >>> Node 2: >>> # glusterfs --version >>> glusterfs 3.12.14 >>> >>> Arbiter: >>> # glusterfs --version >>> glusterfs 3.12.14 >>> >>> Here is our gluster volume status: >>> >>> # gluster volume status >>> Status of volume: gvAA01 >>> Gluster process TCP Port RDMA Port Online >>> Pid >>> >>> ------------------------------------------------------------------------------ >>> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >>> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck1 49152 0 Y >>> 6931 >>> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >>> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck2 49153 0 Y >>> 6939 >>> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >>> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck3 49154 0 Y >>> 6947 >>> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >>> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck4 49155 0 Y >>> 6956 >>> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >>> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck5 49156 0 Y >>> 6964 >>> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >>> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck6 49157 0 Y >>> 6974 >>> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >>> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck7 49158 0 Y >>> 6984 >>> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >>> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck8 49159 0 Y >>> 6993 >>> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >>> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >>> Brick 00-A:/arbiterAA01/gvAA01/bri >>> ck9 49160 0 Y >>> 7001 >>> NFS Server on localhost 2049 0 Y >>> 17276 >>> Self-heal Daemon on localhost N/A N/A Y >>> 25245 >>> NFS Server on 02-B 2049 0 Y 9089 >>> Self-heal Daemon on 02-B N/A N/A Y 17838 >>> NFS Server on 00-a 2049 0 Y 15660 >>> Self-heal Daemon on 00-a N/A N/A Y 16218 >>> >>> Task Status of Volume gvAA01 >>> >>> ------------------------------------------------------------------------------ >>> There are no active volume tasks >>> >>> And gluster volume info: >>> >>> # gluster volume info >>> >>> Volume Name: gvAA01 >>> Type: Distributed-Replicate >>> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >>> Status: Started >>> Snapshot Count: 0 >>> Number of Bricks: 9 x (2 + 1) = 27 >>> Transport-type: tcp >>> Bricks: >>> Brick1: 01-B:/brick1/gvAA01/brick >>> Brick2: 02-B:/brick1/gvAA01/brick >>> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >>> Brick4: 01-B:/brick2/gvAA01/brick >>> Brick5: 02-B:/brick2/gvAA01/brick >>> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >>> Brick7: 01-B:/brick3/gvAA01/brick >>> Brick8: 
02-B:/brick3/gvAA01/brick >>> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >>> Brick10: 01-B:/brick4/gvAA01/brick >>> Brick11: 02-B:/brick4/gvAA01/brick >>> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >>> Brick13: 01-B:/brick5/gvAA01/brick >>> Brick14: 02-B:/brick5/gvAA01/brick >>> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >>> Brick16: 01-B:/brick6/gvAA01/brick >>> Brick17: 02-B:/brick6/gvAA01/brick >>> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >>> Brick19: 01-B:/brick7/gvAA01/brick >>> Brick20: 02-B:/brick7/gvAA01/brick >>> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >>> Brick22: 01-B:/brick8/gvAA01/brick >>> Brick23: 02-B:/brick8/gvAA01/brick >>> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >>> Brick25: 01-B:/brick9/gvAA01/brick >>> Brick26: 02-B:/brick9/gvAA01/brick >>> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >>> Options Reconfigured: >>> cluster.shd-max-threads: 4 >>> performance.least-prio-threads: 16 >>> cluster.readdir-optimize: on >>> performance.quick-read: off >>> performance.stat-prefetch: off >>> cluster.data-self-heal: on >>> cluster.lookup-unhashed: auto >>> cluster.lookup-optimize: on >>> cluster.favorite-child-policy: mtime >>> server.allow-insecure: on >>> transport.address-family: inet >>> client.bind-insecure: on >>> cluster.entry-self-heal: off >>> cluster.metadata-self-heal: off >>> performance.md-cache-timeout: 600 >>> cluster.self-heal-daemon: enable >>> performance.readdir-ahead: on >>> diagnostics.brick-log-level: INFO >>> nfs.disable: off >>> >>> Thank you for any assistance. >>> >>> - Patrick >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Sun Apr 21 15:39:19 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Sun, 21 Apr 2019 18:39:19 +0300 Subject: [Gluster-users] Extremely slow cluster performance Message-ID: <79s1555j5gp2peufvf8r68qu.1555861159775@email.android.com> This looks more like FUSE problem. Are the clients on v3.12.xx ? Can you setup a VM for a test and run FUSE mounts using v5.6 and with v6.x Best Regards, Strahil NikolovOn Apr 21, 2019 17:24, Patrick Rennie wrote: > > Hi Strahil,? > > Thank you for your reply and your suggestions. I'm not sure which logs would be most relevant to be checking to diagnose this issue, we have the brick logs, the cluster mount logs, the shd logs or something else? I have posted a few that I have seen repeated a few times already. I will continue to post anything further that I see.? > I am working on migrating data to some new storage, so this will slowly free up space, although this is a production cluster and new data is being uploaded every day, sometimes faster than I can migrate it off. I have several other similar clusters and none of them have the same problem, one the others is actually at 98-99% right now (big problem, I know) but still performs perfectly fine compared to this cluster, I am not sure low space is the root cause here.? > > I currently have 13 VMs accessing this cluster, I have checked each one and all of them use one of the two options below to mount the cluster in fstab > > HOSTNAME:/gvAA01? ?/mountpoint? ? glusterfs? ? ? ?defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable,use-readdirp=no? ? 0 0 > HOSTNAME:/gvAA01? ?/mountpoint? ? glusterfs? ? ? 
?defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable > > I also have a few other VMs which use NFS to access the cluster, and these machines appear to be significantly quicker, initially I get a similar delay with NFS but if I cancel the first "ls" and try it again I get < 1 sec lookups, this can take over 10 minutes by FUSE/gluster client, but the same trick of cancelling and trying again doesn't work for FUSE/gluster. Sometimes the NFS queries have no delay at all, so this is a bit strange to me.? > HOSTNAME:/gvAA01? ? ? ? /mountpoint/ nfs defaults,_netdev,vers=3,async,noatime 0 0 > > Example: > user at VM:~$ time ls /cluster/folder > ^C > > real? ? 9m49.383s > user? ? 0m0.001s > sys? ? ?0m0.010s > > user at VM:~$ time ls /cluster/folder > > > real? ? 0m0.069s > user? ? 0m0.001s > sys? ? ?0m0.007s > > --- > > I have checked the profiling as you suggested, I let it run for around a minute, then cancelled it and saved the profile info.? > > root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 start > Starting volume profile on gvAA01 has been successful > root at HOSTNAME:/var/log/glusterfs# time ls /cluster/folder > ^C > > real? ? 1m1.660s > user? ? 0m0.000s > sys? ? ?0m0.002s > > root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 info >> ~/profile.txt > root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 stop > > I will attach the results to this email as it's over 1000 lines. Unfortunately, I'm not sure what I'm looking at but possibly somebody will be able to help me make sense of it and let me know if it highlights any specific issues.? > > Happy to try any further suggestions. Thank you, > > -Patrick > > On Sun, Apr 21, 2019 at 7:55 PM Strahil wrote: >> >> By the way, can you provide the 'volume info' and the mount options on all clients? >> Maybe , there is an option that uses a lot of resources due to some client's mount options. >> >> Best Regards, >> Strahil Nikolov >> >> On Apr 21, 2019 10:55, Patrick Rennie wrote: >>> >>> Just another small update, I'm continuing to watch my brick logs and I just saw these errors come up in the recent events too. I am going to continue to post any errors I see in the hope of finding the right one to try and fix..? >>> This is from the logs on brick1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Sun Apr 21 15:50:43 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Sun, 21 Apr 2019 18:50:43 +0300 Subject: [Gluster-users] Extremely slow cluster performance Message-ID: Usually when this happens I run '/find /fuse/mount/point -exec stat {} \;' from a client (using gluster with oVirt). Yet, my scale is multiple times smaller and I don't know how this will affect you (except it will trigger a heal). So the round-robin of the DNS clarifies the mystery .In such case, maybe FUSE client is not the problem.Still it is worth trying a VM with the new gluster version to mount the cluster. From the profile (took a short glance over it from my phone), not all bricks are spending much of their time in LOOKUP. Maybe your data is not evenly distributed? Is that ever possible ? Sadly you can't rebalance untill all those heals are pending.(Maybe I'm wrong) Have you checked the speed of 'ls /my/brick/subdir1/' on each brick ? Sadly, I'm just a gluster user, so take everything with a grain of salt. 
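A small loop makes both of those checks quick to repeat on each storage node; the brick paths below are taken from the volume status earlier in the thread, and 'subdir1' is just a placeholder for a directory that exists on every brick:

# for b in /brick{1..9}/gvAA01/brick; do printf '%s ' "$b"; df -hP "$b" | awk 'NR==2 {print $2, $3, $5}'; done
# for b in /brick{1..9}/gvAA01/brick; do echo "$b"; time ls "$b/subdir1" > /dev/null; done

The first loop shows whether one brick is markedly fuller than its peers, the second whether any single brick is slow to list on its own.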
Best Regards, Strahil NikolovOn Apr 21, 2019 18:03, Patrick Rennie wrote: > > I just tried to check my "gluster volume heal gvAA01 statistics" and it doesn't seem like a full heal was still in progress, just an index, I have started the full heal again and am trying to monitor it with "gluster volume heal gvAA01 info" which just shows me thousands of gfid file identifiers scrolling past.? > What is the best way to check the status of a heal and track the files healed and progress to completion?? > > Thank you, > - Patrick > > On Sun, Apr 21, 2019 at 10:28 PM Patrick Rennie wrote: >> >> I think just worked out why NFS lookups are sometimes slow and sometimes fast as the hostname uses round robin DNS lookups, if I change to a specific host, 01-B, it's always quick, and if I change to the other brick host, 02-B, it's always slow.? >> Maybe that will help to narrow this down?? >> >> On Sun, Apr 21, 2019 at 10:24 PM Patrick Rennie wrote: >>> >>> Hi Strahil,? >>> >>> Thank you for your reply and your suggestions. I'm not sure which logs would be most relevant to be checking to diagnose this issue, we have the brick logs, the cluster mount logs, the shd logs or something else? I have posted a few that I have seen repeated a few times already. I will continue to post anything further that I see.? >>> I am working on migrating data to some new storage, so this will slowly free up space, although this is a production cluster and new data is being uploaded every day, sometimes faster than I can migrate it off. I have several other similar clusters and none of them have the same problem, one the others is actually at 98-99% right now (big problem, I know) but still performs perfectly fine compared to this cluster, I am not sure low space is the root cause here.? >>> >>> I currently have 13 VMs accessing this cluster, I have checked each one and all of them use one of the two options below to mount the cluster in fstab >>> >>> HOSTNAME:/gvAA01? ?/mountpoint? ? glusterfs? ? ? ?defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable,use-readdirp=no? ? 0 0 >>> HOSTNAME:/gvAA01? ?/mountpoint? ? glusterfs? ? ? ?defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable >>> >>> I also have a few other VMs which use NFS to access the cluster, and these machines appear to be significantly quicker, initially I get a similar delay with NFS but if I cancel the first "ls" and try it again I get < 1 sec lookups, this can take over 10 minutes by FUSE/gluster client, but the same trick of cancelling and trying again doesn't work for FUSE/gluster. Sometimes the NFS queries have no delay at all, so this is a bit strange to me.? >>> HOSTNAME:/gvAA01? ? ? ? /mountpoint/ nfs defaults,_netdev,vers=3,async,noatime 0 0 >>> >>> Example: >>> user at VM:~$ time ls /cluster/folder >>> ^C >>> >>> real? ? 9m49.383s >>> user? ? 0m0.001s >>> sys? ? ?0m0.010s >>> >>> user at VM:~$ time ls /cluster/folder >>> >>> >>> real? ? 0m0.069s >>> user? ? 0m0.001s >>> sys? ? ?0m0.007s >>> >>> --- >>> >>> I have checked the profiling as you suggested, I let it run for around a minute, then cancelled it and saved the profile info.? >>> >>> root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 start >>> Starting volume profile on gvAA01 has been successful >>> root at HOSTNAME:/var/log/glusterfs# time ls /cluster/folder >>> ^C >>> >>> real? ? 1m1.660s >>> user? ? 0m0.000s >>> sys? ? 
?0m0.002s >>> >>> root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 info >> ~/profile.txt >>> root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 stop >>> >>> I will attach the results to this email as it's o -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrickmrennie at gmail.com Sun Apr 21 16:24:39 2019 From: patrickmrennie at gmail.com (Patrick Rennie) Date: Mon, 22 Apr 2019 00:24:39 +0800 Subject: [Gluster-users] Extremely slow cluster performance In-Reply-To: <79s1555j5gp2peufvf8r68qu.1555861159775@email.android.com> References: <79s1555j5gp2peufvf8r68qu.1555861159775@email.android.com> Message-ID: Hi Strahil, Thanks again for your help, I checked most of my clients are on 3.13.2 which I think is the default packaged with Ubuntu. I upgraded a test VM to v5.6 and tested again and there is no difference, performance accessing the cluster is the same. Cheers, -Patrick On Sun, Apr 21, 2019 at 11:39 PM Strahil wrote: > This looks more like FUSE problem. > Are the clients on v3.12.xx ? > Can you setup a VM for a test and run FUSE mounts using v5.6 and with v6.x > > Best Regards, > Strahil Nikolov > On Apr 21, 2019 17:24, Patrick Rennie wrote: > > Hi Strahil, > > Thank you for your reply and your suggestions. I'm not sure which logs > would be most relevant to be checking to diagnose this issue, we have the > brick logs, the cluster mount logs, the shd logs or something else? I have > posted a few that I have seen repeated a few times already. I will continue > to post anything further that I see. > I am working on migrating data to some new storage, so this will slowly > free up space, although this is a production cluster and new data is being > uploaded every day, sometimes faster than I can migrate it off. I have > several other similar clusters and none of them have the same problem, one > the others is actually at 98-99% right now (big problem, I know) but still > performs perfectly fine compared to this cluster, I am not sure low space > is the root cause here. > > I currently have 13 VMs accessing this cluster, I have checked each one > and all of them use one of the two options below to mount the cluster in > fstab > > HOSTNAME:/gvAA01 /mountpoint glusterfs > defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable,use-readdirp=no > 0 0 > HOSTNAME:/gvAA01 /mountpoint glusterfs > defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable > > I also have a few other VMs which use NFS to access the cluster, and these > machines appear to be significantly quicker, initially I get a similar > delay with NFS but if I cancel the first "ls" and try it again I get < 1 > sec lookups, this can take over 10 minutes by FUSE/gluster client, but the > same trick of cancelling and trying again doesn't work for FUSE/gluster. > Sometimes the NFS queries have no delay at all, so this is a bit strange to > me. > HOSTNAME:/gvAA01 /mountpoint/ nfs > defaults,_netdev,vers=3,async,noatime 0 0 > > Example: > user at VM:~$ time ls /cluster/folder > ^C > > real 9m49.383s > user 0m0.001s > sys 0m0.010s > > user at VM:~$ time ls /cluster/folder > > > real 0m0.069s > user 0m0.001s > sys 0m0.007s > > --- > > I have checked the profiling as you suggested, I let it run for around a > minute, then cancelled it and saved the profile info. 
> > root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 start > Starting volume profile on gvAA01 has been successful > root at HOSTNAME:/var/log/glusterfs# time ls /cluster/folder > ^C > > real 1m1.660s > user 0m0.000s > sys 0m0.002s > > root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 info >> > ~/profile.txt > root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 stop > > I will attach the results to this email as it's over 1000 lines. > Unfortunately, I'm not sure what I'm looking at but possibly somebody will > be able to help me make sense of it and let me know if it highlights any > specific issues. > > Happy to try any further suggestions. Thank you, > > -Patrick > > On Sun, Apr 21, 2019 at 7:55 PM Strahil wrote: > > By the way, can you provide the 'volume info' and the mount options on all > clients? > Maybe , there is an option that uses a lot of resources due to some > client's mount options. > > Best Regards, > Strahil Nikolov > On Apr 21, 2019 10:55, Patrick Rennie wrote: > > Just another small update, I'm continuing to watch my brick logs and I > just saw these errors come up in the recent events too. I am going to > continue to post any errors I see in the hope of finding the right one to > try and fix.. > This is from the logs on brick1 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrickmrennie at gmail.com Sun Apr 21 16:33:23 2019 From: patrickmrennie at gmail.com (Patrick Rennie) Date: Mon, 22 Apr 2019 00:33:23 +0800 Subject: [Gluster-users] Extremely slow cluster performance In-Reply-To: References: Message-ID: Thanks again, I have tried to run a find over the cluster to try and trigger self-healing, but it's very slow so I don't have it running right now. If I check the same "ls /brick/folder" on all bricks, it takes less than 0.01 sec so I don't think any individual brick is causing the problem, performance on each brick seems to be normal. I think the issue is somewhere in the gluster internal communication as I believe FUSE mounted clients will try to communicate with all bricks. Unfortunately, I am not sure how to confirm this or narrow this down. Really struggling with this one now, it's starting to significantly impact our operations. I'm not sure what else I can try so appreciate any suggestions. Thank you, - Patrick On Sun, Apr 21, 2019 at 11:50 PM Strahil wrote: > Usually when this happens I run '/find /fuse/mount/point -exec stat {} > \;' from a client (using gluster with oVirt). > Yet, my scale is multiple times smaller and I don't know how this will > affect you (except it will trigger a heal). > > So the round-robin of the DNS clarifies the mystery .In such case, maybe > FUSE client is not the problem.Still it is worth trying a VM with the new > gluster version to mount the cluster. > > From the profile (took a short glance over it from my phone), not all > bricks are spending much of their time in LOOKUP. > Maybe your data is not evenly distributed? Is that ever possible ? > Sadly you can't rebalance untill all those heals are pending.(Maybe I'm > wrong) > > Have you checked the speed of 'ls /my/brick/subdir1/' on each brick ? > > Sadly, I'm just a gluster user, so take everything with a grain of salt. 
> > Best Regards, > Strahil Nikolov > On Apr 21, 2019 18:03, Patrick Rennie wrote: > > I just tried to check my "gluster volume heal gvAA01 statistics" and it > doesn't seem like a full heal was still in progress, just an index, I have > started the full heal again and am trying to monitor it with "gluster > volume heal gvAA01 info" which just shows me thousands of gfid file > identifiers scrolling past. > What is the best way to check the status of a heal and track the files > healed and progress to completion? > > Thank you, > - Patrick > > On Sun, Apr 21, 2019 at 10:28 PM Patrick Rennie > wrote: > > I think just worked out why NFS lookups are sometimes slow and sometimes > fast as the hostname uses round robin DNS lookups, if I change to a > specific host, 01-B, it's always quick, and if I change to the other brick > host, 02-B, it's always slow. > Maybe that will help to narrow this down? > > On Sun, Apr 21, 2019 at 10:24 PM Patrick Rennie > wrote: > > Hi Strahil, > > Thank you for your reply and your suggestions. I'm not sure which logs > would be most relevant to be checking to diagnose this issue, we have the > brick logs, the cluster mount logs, the shd logs or something else? I have > posted a few that I have seen repeated a few times already. I will continue > to post anything further that I see. > I am working on migrating data to some new storage, so this will slowly > free up space, although this is a production cluster and new data is being > uploaded every day, sometimes faster than I can migrate it off. I have > several other similar clusters and none of them have the same problem, one > the others is actually at 98-99% right now (big problem, I know) but still > performs perfectly fine compared to this cluster, I am not sure low space > is the root cause here. > > I currently have 13 VMs accessing this cluster, I have checked each one > and all of them use one of the two options below to mount the cluster in > fstab > > HOSTNAME:/gvAA01 /mountpoint glusterfs > defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable,use-readdirp=no > 0 0 > HOSTNAME:/gvAA01 /mountpoint glusterfs > defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable > > I also have a few other VMs which use NFS to access the cluster, and these > machines appear to be significantly quicker, initially I get a similar > delay with NFS but if I cancel the first "ls" and try it again I get < 1 > sec lookups, this can take over 10 minutes by FUSE/gluster client, but the > same trick of cancelling and trying again doesn't work for FUSE/gluster. > Sometimes the NFS queries have no delay at all, so this is a bit strange to > me. > HOSTNAME:/gvAA01 /mountpoint/ nfs > defaults,_netdev,vers=3,async,noatime 0 0 > > Example: > user at VM:~$ time ls /cluster/folder > ^C > > real 9m49.383s > user 0m0.001s > sys 0m0.010s > > user at VM:~$ time ls /cluster/folder > > > real 0m0.069s > user 0m0.001s > sys 0m0.007s > > --- > > I have checked the profiling as you suggested, I let it run for around a > minute, then cancelled it and saved the profile info. 
> > root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 start > Starting volume profile on gvAA01 has been successful > root at HOSTNAME:/var/log/glusterfs# time ls /cluster/folder > ^C > > real 1m1.660s > user 0m0.000s > sys 0m0.002s > > root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 info >> > ~/profile.txt > root at HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 stop > > I will attach the results to this email as it's o > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From budic at onholyground.com Sun Apr 21 16:43:23 2019 From: budic at onholyground.com (Darrell Budic) Date: Sun, 21 Apr 2019 11:43:23 -0500 Subject: [Gluster-users] Extremely slow cluster performance In-Reply-To: References: <93FC9B39-2E8C-4579-8C9D-DEF1A28B7384@onholyground.com> Message-ID: <0A865F28-C4A6-41EF-AE37-70216670B4F0@onholyground.com> Patrick- Specifically re: > Thanks again for your advice, I've left it for a while but unfortunately it's still just as slow and causing more problems for our operations now. I will need to try and take some steps to at least bring performance back to normal while continuing to investigate the issue longer term. I can definitely see one node with heavier CPU than the other, almost double, which I am OK with, but I think the heal process is going to take forever, trying to check the "gluster volume heal info" shows thousands and thousands of files which may need healing, I have no idea how many in total the command is still running after hours, so I am not sure what has gone so wrong to cause this. > ... > I have no idea how long the healing is going to take on this cluster, we have around 560TB of data on here, but I don't think I can wait that long to try and restore performance to normal. You?re in a bind, I know, but it?s just going to take some time recover. You have a lot of data, and even at the best speeds your disks and networks can muster, it?s going to take a while. Until your cluster is fully healed, anything else you try may not have the full effect it would on a fully operational cluster. Your predecessor may have made things worse by not having proper posix attributes on the ZFS file system. You may have made things worse by killing brick processes in your distributed-replicated setup, creating an additional need for healing and possibly compounding the overall performance issues. I?m not trying to blame you or make you feel bad, but I do want to point out that there?s a problem here, and there is unlikely to be a silver bullet that will resolve the issue instantly. You?re going to have to give it time to get back into a ?normal" condition, which seems to be what your setup was configured and tested for in the first place. Those things said, rather than trying to move things from this cluster to different storage, what about having your VMs mount different storage in the first place and move the write load off of this cluster while it recovers? Looking at the profile you posted for Strahil, your bricks are spending a lot of time doing LOOKUPs, and some are slower than others by a significant margin. If you haven?t already, check the zfs pools on those, make sure they don?t have any failed disks that might be slowing them down. Consider if you can speed them up with a ZIL or SLOG if they are spinning disks (although your previous server descriptions sound like you don?t need a SLOG, ZILs may help fi they are HDDs)? 
Just saw your additional comments that one server is faster than the other, it's possible that it's got the actual data and the other one is doing healings every time it gets accessed, or it's just got fuller and slower volumes. It may make sense to try forcing all your VM mounts to the faster server for a while, even if it's the one with higher load (serving will get preference to healing, but don't push the shd-max-threads too high, they can squash performance). Given it's a dispersed volume, make sure you've got disperse.shd-max-threads at 4 or 8, and raise disperse.shd-wait-qlength to 4096 or so. You're getting into things best tested with everything working, but desperate times call for accelerated testing, right? You could experiment with different values of performance.io-thread-count, try 48. But if your CPU load is already near max, you're getting everything you can out of your CPU already, so don't spend too much time on it. Check out https://github.com/gluster/glusterfs/blob/release-3.11/extras/group-nl-cache and try applying these to your gluster volume. Without knowing more about your workload, these may help if you're doing a lot of directory listing and file lookups or tests for the (non)existence of a file from your VMs. If those help, search the mailing list for info on the mount option 'negative_cache=1' and a thread titled '[Gluster-users] Gluster native mount is really slow compared to nfs', it may have some client side mount options that could give you further benefits. Have a look at https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/#tuning-options , cluster.data-self-heal-algorithm full may help things heal faster for you. performance.flush-behind & related may improve write response to the clients, use caution unless you have UPSs & battery backed raids, etc. If you have stats on network traffic on/between your two 'real' node servers, you can use that as a proxy value for healing performance. I looked up the performance.stat-prefetch bug for you, it was fixed back in 3.8, so it should be safe to enable on your 3.12.x system even with servers at .15 & .14. You'll probably have to wait for devs to get anything else out of those logs, but make sure your servers can all see each other (gluster peer status, everything should be 'Peer in Cluster (Connected)' on all servers), and all 3 see all the bricks in the 'gluster vol status'. Maybe check for split brain files on those you keep seeing in the logs? Good luck, have patience, and remember (& remind others) that things are not in their normal state at this moment, and look for things outside of the gluster server cluster to try to help (https://joejulian.name/post/optimizing-web-performance-with-glusterfs/) get through the healing as well. -Darrell > On Apr 21, 2019, at 4:41 AM, Patrick Rennie wrote: > > Another small update from me, I have been keeping an eye on the glustershd.log file to see what is going on and I keep seeing the same file names come up in there every 10 minutes, but not a lot of other activity. Logs below. > How can I be sure my heal is progressing through the files which actually need to be healed? I thought it would show up in these logs. > I also increased the "cluster.shd-max-threads" from 4 to 8 to try and speed things up too. > > Any ideas here? 
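One way to track heal progress without scrolling through gfids is the per-brick pending count, which should trend downwards if the heal is actually getting through the backlog:

# gluster volume heal gvAA01 statistics heal-count
# watch -n 300 'gluster volume heal gvAA01 statistics heal-count'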
> > Thanks, > > - Patrick > > On 01-B > ------- > [2019-04-21 09:12:54.575689] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904 > [2019-04-21 09:12:54.733601] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904. sources=[0] 2 sinks=1 > [2019-04-21 09:13:12.028509] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe > [2019-04-21 09:13:12.047470] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 > > [2019-04-21 09:23:13.044377] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe > [2019-04-21 09:23:13.051479] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 > > [2019-04-21 09:33:07.400369] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. sources=[0] 2 sinks=1 > [2019-04-21 09:33:11.825449] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa > [2019-04-21 09:33:14.029837] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe > [2019-04-21 09:33:14.037436] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 > [2019-04-21 09:33:23.913882] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. sources=[0] 2 sinks=1 > [2019-04-21 09:33:43.874201] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1 > [2019-04-21 09:34:02.273898] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1. sources=[0] 2 sinks=1 > [2019-04-21 09:35:12.282045] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. sources=[0] 2 sinks=1 > [2019-04-21 09:35:15.146252] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885 > [2019-04-21 09:35:15.254538] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. 
sources=[0] 2 sinks=1 > [2019-04-21 09:35:22.900803] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. sources=[0] 2 sinks=1 > [2019-04-21 09:35:27.150963] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45 > [2019-04-21 09:35:29.186295] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. sources=[0] 2 sinks=1 > [2019-04-21 09:35:35.967451] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. sources=[0] 2 sinks=1 > [2019-04-21 09:35:40.733444] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9 > [2019-04-21 09:35:58.707593] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. sources=[0] 2 sinks=1 > [2019-04-21 09:36:25.554260] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. sources=[0] 2 sinks=1 > [2019-04-21 09:36:26.031422] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d > [2019-04-21 09:36:26.083982] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. 
sources=[0] 2 sinks=1 > > On 02-B > ------- > [2019-04-21 09:03:15.815250] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 > [2019-04-21 09:03:15.863153] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:03:15.867432] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f > [2019-04-21 09:03:15.875134] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:03:39.020198] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe > [2019-04-21 09:03:39.027345] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 > > [2019-04-21 09:13:18.524874] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 > [2019-04-21 09:13:20.070172] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:13:20.074977] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f > [2019-04-21 09:13:20.080827] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:13:40.015763] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe > [2019-04-21 09:13:40.021805] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 > > [2019-04-21 09:23:21.991032] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 > [2019-04-21 09:23:22.054565] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:23:22.059225] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f > [2019-04-21 09:23:22.066266] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 
65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:23:41.129962] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe > [2019-04-21 09:23:41.135919] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 > > [2019-04-21 09:33:24.015223] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 > [2019-04-21 09:33:24.069686] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:33:24.074341] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f > [2019-04-21 09:33:24.080065] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:33:42.099515] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe > [2019-04-21 09:33:42.107481] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 > > > On Sun, Apr 21, 2019 at 3:55 PM Patrick Rennie > wrote: > Just another small update, I'm continuing to watch my brick logs and I just saw these errors come up in the recent events too. I am going to continue to post any errors I see in the hope of finding the right one to try and fix.. > This is from the logs on brick1, seems to be occurring on both nodes on brick1, although at different times. I'm not sure what this means, can anyone shed any light? > I guess I am looking for some kind of specific error which may indicate something is broken or stuck and locking up and causing the extreme latency I'm seeing in the cluster. 
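When hunting for a needle like that, a frequency count of error sources in a brick log can help separate one endlessly repeated symptom from a genuinely new error; the file name below follows gluster's usual convention of the brick path with slashes turned into dashes, so adjust it to the actual log file:

# grep ' E \[' /var/log/glusterfs/bricks/brick1-gvAA01-brick.log | awk '{print $4}' | sort | uniq -c | sort -rn | head

The counts point at which source locations are flooding the log, which narrows down where to look next.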
> > [2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS 3.3, ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a) [0x7f3b3e93158a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45) [0x7f3b3e4c5d45] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064939] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.064962] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] 
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > [2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) > [2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed > > Thanks again, > > -Patrick > > On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie > wrote: > Hi Darrell, > > Thanks again for your advice, I've left it for a while but unfortunately it's still just as slow and causing more problems for our operations now. I will need to try and take some steps to at least bring performance back to normal while continuing to investigate the issue longer term. I can definitely see one node with heavier CPU than the other, almost double, which I am OK with, but I think the heal process is going to take forever, trying to check the "gluster volume heal info" shows thousands and thousands of files which may need healing, I have no idea how many in total the command is still running after hours, so I am not sure what has gone so wrong to cause this. > > I've checked cluster.op-version and cluster.max-op-version and it looks like I'm on the latest version there. > > I have no idea how long the healing is going to take on this cluster, we have around 560TB of data on here, but I don't think I can wait that long to try and restore performance to normal. > > Can anyone think of anything else I can try in the meantime to work out what's causing the extreme latency? > > I've been going through cluster client the logs of some of our VMs and on some of our FTP servers I found this in the cluster mount log, but I am not seeing it on any of our other servers, just our FTP servers. 
> > [2019-04-21 07:16:19.925388] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null > [2019-04-21 07:19:43.413834] W [MSGID: 114031] [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote operation failed [No such file or directory] > [2019-04-21 07:19:43.414153] W [MSGID: 114031] [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote operation failed [No such file or directory] > [2019-04-21 07:23:33.154717] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null > [2019-04-21 07:33:24.943913] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null > > Any ideas what this could mean? I am basically just grasping at straws here. > > I am going to hold off on the version upgrade until I know there are no files which need healing, which could be a while, from some reading I've done there shouldn't be any issues with this as both are on v3.12.x > > I've free'd up a small amount of space, but I still need to work on this further. > > I've read of a command "find .glusterfs -type f -links -2 -exec rm {} \;" which could be run on each brick and it would potentially clean up any files which were deleted straight from the bricks, but not via the client, I have a feeling this could help me free up about 5-10TB per brick from what I've been told about the history of this cluster. Can anyone confirm if this is actually safe to run? > > At this stage, I'm open to any suggestions as to how to proceed, thanks again for any advice. > > Cheers, > > - Patrick > > On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic > wrote: > Patrick, > > Sounds like progress. Be aware that gluster is expected to max out the CPUs on at least one of your servers while healing. This is normal and won?t adversely affect overall performance (any more than having bricks in need of healing, at any rate) unless you?re overdoing it. shd threads <= 4 should not do that on your hardware. Other tunings may have also increased overall performance, so you may see higher CPU than previously anyway. I?d recommend upping those thread counts and letting it heal as fast as possible, especially if these are dedicated Gluster storage servers (Ie: not also running VMs, etc). You should see ?normal? CPU use one heals are completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 cores). It?s also likely to be different between your servers, in a pure replica, one tends to max and one tends to be a little higher, in a distributed-replica, I?d expect more than one to run harder while healing. > > Keep the differences between doing an ls on a brick and doing an ls on a gluster mount in mind. When you do a ls on a gluster volume, it isn?t just doing a ls on one brick, it?s effectively doing it on ALL of your bricks, and they all have to return data before the ls succeeds. In a distributed volume, it?s figuring out where on each volume things live and getting the stat() from each to assemble the whole thing. And if things are in need of healing, it will take even longer to decide which version is current and use it (shd triggers a heal anytime it encounters this). Any of these things being slow slows down the overall response. > > At this point, I?d get some sleep too, and let your cluster heal while you do. I?d really want it fully healed before I did any updates anyway, so let it use CPU and get itself sorted out. 
Expect it to do a round of healing after you upgrade each machine too, this is normal so don?t let the CPU spike surprise you, It?s just catching up from the downtime incurred by the update and/or reboot if you did one. > > That reminds me, check your gluster cluster.op-version and cluster.max-op-version (gluster vol get all all | grep op-version). If op-version isn?t at the max-op-verison, set it to it so you?re taking advantage of the latest features available to your version. > > -Darrell > >> On Apr 20, 2019, at 11:54 AM, Patrick Rennie > wrote: >> >> Hi Darrell, >> >> Thanks again for your advice, I've applied the acltype=posixacl on my zpools and I think that has reduced some of the noise from my brick logs. >> I also bumped up some of the thread counts you suggested but my CPU load skyrocketed, so I dropped it back down to something slightly lower, but still higher than it was before, and will see how that goes for a while. >> >> Although low space is a definite issue, if I run an ls anywhere on my bricks directly it's instant, <1 second, and still takes several minutes via gluster, so there is still a problem in my gluster configuration somewhere. We don't have any snapshots, but I am trying to work out if any data on there is safe to delete, or if there is any way I can safely find and delete data which has been removed directly from the bricks in the past. I also have lz4 compression already enabled on each zpool which does help a bit, we get between 1.05 and 1.08x compression on this data. >> I've tried to go through each client and checked it's cluster mount logs and also my brick logs and looking for errors, so far nothing is jumping out at me, but there are some warnings and errors here and there, I am trying to work out what they mean. >> >> It's already 1 am here and unfortunately, I'm still awake working on this issue, but I think that I will have to leave the version upgrades until tomorrow. >> >> Thanks again for your advice so far. If anyone has any ideas on where I can look for errors other than brick logs or the cluster mount logs to help resolve this issue, it would be much appreciated. >> >> Cheers, >> >> - Patrick >> >> On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic > wrote: >> See inline: >> >>> On Apr 20, 2019, at 10:09 AM, Patrick Rennie > wrote: >>> >>> Hi Darrell, >>> >>> Thanks for your reply, this issue seems to be getting worse over the last few days, really has me tearing my hair out. I will do as you have suggested and get started on upgrading from 3.12.14 to 3.12.15. >>> I've checked the zfs properties and all bricks have "xattr=sa" set, but none of them has "acltype=posixacl" set, currently the acltype property shows "off", if I make these changes will it apply retroactively to the existing data? I'm unfamiliar with what this will change so I may need to look into that before I proceed. >> >> It is safe to apply that now, any new set/get calls will then use it if new posixacls exist, and use older if not. ZFS is good that way. It should clear up your posix_acl and posix errors over time. >> >>> I understand performance is going to slow down as the bricks get full, I am currently trying to free space and migrate data to some newer storage, I have fresh several hundred TB storage I just setup recently but with these performance issues it's really slow. I also believe there is significant data which has been deleted directly from the bricks in the past, so if I can reclaim this space in a safe manner then I will have at least around 10-15% free space. 
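On the "find .glusterfs -type f -links -2" cleanup asked about above, a listing-only dry run is a safer first step than deleting straight away. A rough sketch, run per brick (the brick path matches the bricks shown later in this thread, the temp file name is arbitrary, and only the last command removes anything - it should only be run after the list has been reviewed):

# find /brick1/gvAA01/brick/.glusterfs -type f -links -2 > /tmp/brick1_orphans.txt
# wc -l /tmp/brick1_orphans.txt
# xargs -r du -ch < /tmp/brick1_orphans.txt | tail -n 1
# xargs -r rm -- < /tmp/brick1_orphans.txt

The "-links -2" test matches files under .glusterfs with a link count below 2, i.e. gfid hardlinks whose named file was already removed directly from the brick, so it should not touch anything still visible through the client. Paths under .glusterfs are gfid-based and contain no spaces, but on a very long list xargs may split the du run, so treat the size total as a rough estimate.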
>> >> Full ZFS volumes will have a much larger impact on performance than you?d think, I?d prioritize this. If you have been taking zfs snapshots, consider deleting them to get the overall volume free space back up. And just to be sure it?s been said, delete from within the mounted volumes, don?t delete directly from the bricks (gluster will just try and heal it later, compounding your issues). Does not apply to deleting other data from the ZFS volume if it?s not part of the brick directory, of course. >> >>> These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so generally they have plenty of resources available, currently only using around 330/512GB of memory. >>> >>> I will look into what your suggested settings will change, and then will probably go ahead with your recommendations, for our specs as stated above, what would you suggest for performance.io -thread-count ? >> >> I run single 2630v4s on my servers, which have a smaller storage footprint than yours. I?d go with 32 for performance.io -thread-count. I?d try 4 for the shd thread settings on that gear. Your memory use sounds fine, so no worries there. >> >>> Our workload is nothing too extreme, we have a few VMs which write backup data to this storage nightly for our clients, our VMs don't live on this cluster, but just write to it. >> >> If they are writing compressible data, you?ll get immediate benefit by setting compression=lz4 on your ZFS volumes. It won?t help any old data, of course, but it will compress new data going forward. This is another one that?s safe to enable on the fly. >> >>> I've been going through all of the logs I can, below are some slightly sanitized errors I've come across, but I'm not sure what to make of them. The main error I am seeing is the first one below, across several of my bricks, but possibly only for specific folders on the cluster, I'm not 100% about that yet though. 
>>> >>> [2019-04-20 05:56:59.512649] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] >>> [2019-04-20 05:59:06.084333] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] >>> [2019-04-20 05:59:43.289030] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] >>> [2019-04-20 05:59:50.582257] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] >>> [2019-04-20 06:01:42.501701] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] >>> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag) >>> >>> >>> [2019-04-20 13:12:36.131856] E [MSGID: 113002] [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>> [2019-04-20 13:12:36.131959] E [MSGID: 113002] [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] >>> [2019-04-20 13:12:36.132016] E [MSGID: 115050] [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud Backup_clone1.vbm_62906_tmp), client: 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: gvAA01-posix [No data available] >>> [2019-04-20 13:12:38.093719] E [MSGID: 115050] [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud Backup_clone1.vbm_62906_tmp), client: 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: gvAA01-posix [No data available] >>> [2019-04-20 13:12:38.093660] E [MSGID: 113002] [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>> [2019-04-20 13:12:38.093696] E [MSGID: 113002] [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx [No data available] >>> >> >> posixacls should clear those up, as mentioned. 
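For reference, the ZFS properties discussed in this thread can be checked and applied per pool or dataset in one go; the dataset name below is only a placeholder for whatever backs each brick:

# zfs get acltype,xattr,compression tank/brick1
# zfs set acltype=posixacl tank/brick1
# zfs set xattr=sa tank/brick1
# zfs set compression=lz4 tank/brick1

As noted above, these only take effect for new xattr/ACL calls and newly written blocks; existing data is left as-is, so the brick log errors should fade over time rather than disappear immediately.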
>> >>> >>> [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, by 980fdbbd367f0000 on 0x7fc4f0161440 >>> [2019-04-20 14:25:59.654668] E [MSGID: 115053] [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), client: cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, error-xlator: gvAA01-locks [Invalid argument] >>> >>> >>> [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) >>> [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) [0x7ff4ae6f796a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) [0x7ff4ae2a96e8] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) [0x7ff4ae28528d] ) 0-: Reply submission failed >>> >> >> Fix the posix acls and see if these clear up over time as well, I?m unclear on what the overall effect of running without the posix acls will be to total gluster health. Your biggest problem sounds like you need to free up space on the volumes and get the overall volume health back up to par and see if that doesn?t resolve the symptoms you?re seeing. >> >> >>> >>> Thank you again for your assistance. It is greatly appreciated. >>> >>> - Patrick >>> >>> >>> >>> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic > wrote: >>> Patrick, >>> >>> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You also mention ZFS, and that error you show makes me think you need to check to be sure you have ?xattr=sa? and ?acltype=posixacl? set on your ZFS volumes. >>> >>> You also observed your bricks are crossing the 95% full line, ZFS performance will degrade significantly the closer you get to full. In my experience, this starts somewhere between 10% and 5% free space remaining, so you?re in that realm. >>> >>> How?s your free memory on the servers doing? Do you have your zfs arc cache limited to something less than all the RAM? It shares pretty well, but I?ve encountered situations where other things won?t try and take ram back properly if they think it?s in use, so ZFS never gets the opportunity to give it up. >>> >>> Since your volume is a disperse-replica, you might try tuning disperse.shd-max-threads, default is 1, I?d try it at 2, 4, or even more if the CPUs are beefy enough. And setting server.event-threads to 4 and client.event-threads to 8 has proven helpful in many cases. After you get upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I don?t know if it matters, but I?d also recommend resetting performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or also setting performance.io -thread-count to 32 if those have beefy CPUs. >>> >>> Beyond those general ideas, more info about your hardware (CPU and RAM) and workload (VMs, direct storage for web servers or enders, etc) may net you some more ideas. Then you?re going to have to do more digging into brick logs looking for errors and/or warnings to see what?s going on. 
>>> >>> -Darrell >>> >>> >>>> On Apr 20, 2019, at 8:22 AM, Patrick Rennie > wrote: >>>> >>>> Hello Gluster Users, >>>> >>>> I am hoping someone can help me with resolving an ongoing issue I've been having, I'm new to mailing lists so forgive me if I have gotten anything wrong. We have noticed our performance deteriorating over the last few weeks, easily measured by trying to do an ls on one of our top-level folders, and timing it, which usually would take 2-5 seconds, and now takes up to 20 minutes, which obviously renders our cluster basically unusable. This has been intermittent in the past but is now almost constant and I am not sure how to work out the exact cause. We have noticed some errors in the brick logs, and have noticed that if we kill the right brick process, performance instantly returns back to normal, this is not always the same brick, but it indicates to me something in the brick processes or background tasks may be causing extreme latency. Due to this ability to fix it by killing the right brick process off, I think it's a specific file, or folder, or operation which may be hanging and causing the increased latency, but I am not sure how to work it out. One last thing to add is that our bricks are getting quite full (~95% full), we are trying to migrate data off to new storage but that is going slowly, not helped by this issue. I am currently trying to run a full heal as there appear to be many files needing healing, and I have all brick processes running so they have an opportunity to heal, but this means performance is very poor. It currently takes over 15-20 minutes to do an ls of one of our top-level folders, which just contains 60-80 other folders, this should take 2-5 seconds. This is all being checked by FUSE mount locally on the storage node itself, but it is the same for other clients and VMs accessing the cluster. Initially, it seemed our NFS mounts were not affected and operated at normal speed, but testing over the last day has shown that our NFS clients are also extremely slow, so it doesn't seem specific to FUSE as I first thought it might be. >>>> >>>> I am not sure how to proceed from here, I am fairly new to gluster having inherited this setup from my predecessor and trying to keep it going. I have included some info below to try and help with diagnosis, please let me know if any further info would be helpful. I would really appreciate any advice on what I could try to work out the cause. Thank you in advance for reading this, and any suggestions you might be able to offer. >>>> >>>> - Patrick >>>> >>>> This is an example of the main error I see in our brick logs, there have been others, I can post them when I see them again too: >>>> [2019-04-20 04:54:43.055680] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick1/ library: system.posix_acl_default [Operation not supported] >>>> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag) >>>> >>>> Our setup consists of 2 storage nodes and an arbiter node. I have noticed our nodes are on slightly different versions, I'm not sure if this could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - total capacity is around 560TB. >>>> We have bonded 10gbps NICS on each node, and I have tested bandwidth with iperf and found that it's what would be expected from this config. 
>>>> Individual brick performance seems ok, I've tested several bricks using dd and can write a 10GB files at 1.7GB/s. >>>> >>>> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >>>> 10000+0 records in >>>> 10000+0 records out >>>> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >>>> >>>> Node 1: >>>> # glusterfs --version >>>> glusterfs 3.12.15 >>>> >>>> Node 2: >>>> # glusterfs --version >>>> glusterfs 3.12.14 >>>> >>>> Arbiter: >>>> # glusterfs --version >>>> glusterfs 3.12.14 >>>> >>>> Here is our gluster volume status: >>>> >>>> # gluster volume status >>>> Status of volume: gvAA01 >>>> Gluster process TCP Port RDMA Port Online Pid >>>> ------------------------------------------------------------------------------ >>>> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >>>> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck1 49152 0 Y 6931 >>>> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >>>> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck2 49153 0 Y 6939 >>>> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >>>> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck3 49154 0 Y 6947 >>>> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >>>> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck4 49155 0 Y 6956 >>>> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >>>> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck5 49156 0 Y 6964 >>>> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >>>> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck6 49157 0 Y 6974 >>>> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >>>> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck7 49158 0 Y 6984 >>>> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >>>> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck8 49159 0 Y 6993 >>>> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >>>> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>> ck9 49160 0 Y 7001 >>>> NFS Server on localhost 2049 0 Y 17276 >>>> Self-heal Daemon on localhost N/A N/A Y 25245 >>>> NFS Server on 02-B 2049 0 Y 9089 >>>> Self-heal Daemon on 02-B N/A N/A Y 17838 >>>> NFS Server on 00-a 2049 0 Y 15660 >>>> Self-heal Daemon on 00-a N/A N/A Y 16218 >>>> >>>> Task Status of Volume gvAA01 >>>> ------------------------------------------------------------------------------ >>>> There are no active volume tasks >>>> >>>> And gluster volume info: >>>> >>>> # gluster volume info >>>> >>>> Volume Name: gvAA01 >>>> Type: Distributed-Replicate >>>> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >>>> Status: Started >>>> Snapshot Count: 0 >>>> Number of Bricks: 9 x (2 + 1) = 27 >>>> Transport-type: tcp >>>> Bricks: >>>> Brick1: 01-B:/brick1/gvAA01/brick >>>> Brick2: 02-B:/brick1/gvAA01/brick >>>> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >>>> Brick4: 01-B:/brick2/gvAA01/brick >>>> Brick5: 02-B:/brick2/gvAA01/brick >>>> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >>>> Brick7: 01-B:/brick3/gvAA01/brick >>>> Brick8: 02-B:/brick3/gvAA01/brick >>>> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >>>> Brick10: 01-B:/brick4/gvAA01/brick >>>> Brick11: 02-B:/brick4/gvAA01/brick >>>> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >>>> Brick13: 
01-B:/brick5/gvAA01/brick >>>> Brick14: 02-B:/brick5/gvAA01/brick >>>> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >>>> Brick16: 01-B:/brick6/gvAA01/brick >>>> Brick17: 02-B:/brick6/gvAA01/brick >>>> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >>>> Brick19: 01-B:/brick7/gvAA01/brick >>>> Brick20: 02-B:/brick7/gvAA01/brick >>>> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >>>> Brick22: 01-B:/brick8/gvAA01/brick >>>> Brick23: 02-B:/brick8/gvAA01/brick >>>> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >>>> Brick25: 01-B:/brick9/gvAA01/brick >>>> Brick26: 02-B:/brick9/gvAA01/brick >>>> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >>>> Options Reconfigured: >>>> cluster.shd-max-threads: 4 >>>> performance.least-prio-threads: 16 >>>> cluster.readdir-optimize: on >>>> performance.quick-read: off >>>> performance.stat-prefetch: off >>>> cluster.data-self-heal: on >>>> cluster.lookup-unhashed: auto >>>> cluster.lookup-optimize: on >>>> cluster.favorite-child-policy: mtime >>>> server.allow-insecure: on >>>> transport.address-family: inet >>>> client.bind-insecure: on >>>> cluster.entry-self-heal: off >>>> cluster.metadata-self-heal: off >>>> performance.md-cache-timeout: 600 >>>> cluster.self-heal-daemon: enable >>>> performance.readdir-ahead: on >>>> diagnostics.brick-log-level: INFO >>>> nfs.disable: off >>>> >>>> Thank you for any assistance. >>>> >>>> - Patrick >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >> > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Mon Apr 22 04:27:04 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Mon, 22 Apr 2019 07:27:04 +0300 Subject: [Gluster-users] Gluster 5.6 slow read despite fast local brick Message-ID: Hello Community, I have been left with the impression that FUSE mounts will read from both local and remote bricks , is that right? I'm using oVirt as a hyperconverged setup and despite my slow network (currently 1 gbit/s, will be expanded soon), I was expecting that at least the reads from the local brick will be fast, yet I can't reach more than 250 MB/s while the 2 data bricks are NVME with much higher capabilities. Is there something I can do about that ? Maybe change cluster.choose-local, as I don't see it on my other volumes ? What are the risks associated with that? 
Volume Name: data_fast Type: Replicate Volume ID: b78aa52a-4c49-407d-bfd8-fdffb2a3610a Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: ovirt1:/gluster_bricks/data_fast/data_fast Brick2: ovirt2:/gluster_bricks/data_fast/data_fast Brick3: ovirt3:/gluster_bricks/data_fast/data_fast (arbiter) Options Reconfigured: performance.client-io-threads: off nfs.disable: on transport.address-family: inet performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off storage.owner-uid: 36 storage.owner-gid: 36 performance.strict-o-direct: on cluster.granular-entry-heal: enable network.ping-timeout: 30 cluster.enable-shared-storage: enable Best Regards, Strahil Nikolov -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdhananj at redhat.com Mon Apr 22 06:47:49 2019 From: kdhananj at redhat.com (Krutika Dhananjay) Date: Mon, 22 Apr 2019 12:17:49 +0530 Subject: [Gluster-users] Settings for VM hosting In-Reply-To: <20190419071816.GH25080@althea.ulrar.net> References: <20190418072722.GF25080@althea.ulrar.net> <20190419071816.GH25080@althea.ulrar.net> Message-ID: On Fri, Apr 19, 2019 at 12:48 PM wrote: > On Fri, Apr 19, 2019 at 06:47:49AM +0530, Krutika Dhananjay wrote: > > Looks good mostly. > > You can also turn on performance.stat-prefetch, and also set > > Ah the corruption bug has been fixed, I missed that. Great ! > > > client.event-threads and server.event-threads to 4. > > I didn't realize that would also apply to libgfapi ? > Good to know, thanks. > > > And if your bricks are on ssds, then you could also enable > > performance.client-io-threads. > > I'm surprised by that, the doc says "This feature is not recommended for > distributed, replicated or distributed-replicated volumes." > Since this volume is just a replica 3, shouldn't this stay off ? > The disks are all nvme, which I assume would count as ssd. > They're not recommended if you're using slower disks (HDDs for instance) as it can increase the number of fsyncs triggered by replicate module and their slowness can degrade performance. With nvme/ssds this should not be a problem and the net result of enabling client-io-threads there should be an improvement in perf. -Krutika > > And if your bricks and hypervisors are on same set of machines > > (hyperconverged), > > then you can turn off cluster.choose-local and see if it helps read > > performance. > > Thanks, we'll give those a try ! > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksubrahm at redhat.com Mon Apr 22 09:41:22 2019 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Mon, 22 Apr 2019 15:11:22 +0530 Subject: [Gluster-users] adding thin arbiter In-Reply-To: References: Message-ID: Hi, Currently we do not have support for converting an existing volume to a thin-arbiter volume. It is also not supported to replace the thin-arbiter brick with a new one. You can create a fresh thin arbiter volume using GD2 framework and play around that. Feel free to share your experience with thin-arbiter. The GD1 CLIs are being implemented. We will keep things posted on this list as and when they are ready to consume. 
Regards, Karthik On Fri, Apr 19, 2019 at 8:39 PM wrote: > Hi guys, > > On an existing volume, I have a volume with 3 replica. One of them is an > arbiter. Is there a way to change the arbiter to a thin-arbiter? I tried > removing the arbiter brick and add it back, but the add-brick command > does't take the --thin-arbiter option. > > xpk > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From srangana at redhat.com Mon Apr 22 11:38:39 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Mon, 22 Apr 2019 07:38:39 -0400 Subject: [Gluster-users] v6.0 release notes fix request In-Reply-To: References: Message-ID: Thanks for reporting, this is fixed now. On 4/19/19 2:57 AM, Artem Russakovskii wrote: > Hi, > > https://docs.gluster.org/en/latest/release-notes/6.0/?currently contains > a list of fixed bugs that's run-on and should be fixed with proper line > breaks: > image.png > > Sincerely, > Artem > > -- > Founder, Android Police ,?APK Mirror > , Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > |?@ArtemR > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > From srangana at redhat.com Mon Apr 22 13:30:45 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Mon, 22 Apr 2019 09:30:45 -0400 Subject: [Gluster-users] Announcing Gluster release 6.1 Message-ID: The Gluster community is pleased to announce the release of Gluster 6.1 (packages available at [1]). Release notes for the release can be found at [2]. Major changes, features and limitations addressed in this release: None Thanks, Gluster community [1] Packages for 6.1: https://download.gluster.org/pub/gluster/glusterfs/6/6.1/ [2] Release notes for 6.1: https://docs.gluster.org/en/latest/release-notes/6.1/ _______________________________________________ maintainers mailing list maintainers at gluster.org https://lists.gluster.org/mailman/listinfo/maintainers From hunter86_bg at yahoo.com Mon Apr 22 14:18:42 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Mon, 22 Apr 2019 14:18:42 +0000 (UTC) Subject: [Gluster-users] Gluster 5.6 slow read despite fast local brick In-Reply-To: References: Message-ID: <1248551588.2770560.1555942722144@mail.yahoo.com> As I had the option to rebuild the volume - I did it and it still reads quite slower than before 5.6 upgrade. I have set cluster.choose-local to 'on' but still the same performance. 
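As an aside, toggling and verifying the option is a one-liner each, and a direct sequential read from the FUSE mount gives a rough before/after number. The mount path and file below are placeholders, and if the mount refuses O_DIRECT the iflag can be dropped as long as caches are cleared between runs:

# gluster volume set data_fast cluster.choose-local on
# gluster volume get data_fast cluster.choose-local
# dd if=/mnt/data_fast/some_large_file of=/dev/null bs=1M count=4096 iflag=direct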
Volume Name: data_fast Type: Replicate Volume ID: 888a32ea-9b5c-4001-a9c5-8bc7ee0bddce Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: ovirt1:/gluster_bricks/data_fast/data_fast Brick2: ovirt2:/gluster_bricks/data_fast/data_fast Brick3: ovirt3:/gluster_bricks/data_fast/data_fast (arbiter) Options Reconfigured: cluster.choose-local: on network.ping-timeout: 30 cluster.granular-entry-heal: enable performance.strict-o-direct: on storage.owner-gid: 36 storage.owner-uid: 36 user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: off performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet nfs.disable: on performance.client-io-threads: off cluster.enable-shared-storage: enable Any issues expected when downgrading the version ? Best Regards, Strahil Nikolov On Monday, April 22, 2019, 0:26:51 AM GMT-4, Strahil wrote: Hello Community, I have been left with the impression that FUSE mounts will read from both local and remote bricks , is that right? I'm using oVirt as a hyperconverged setup and despite my slow network (currently 1 gbit/s, will be expanded soon), I was expecting that at least the reads from the local brick will be fast, yet I can't reach more than 250 MB/s while the 2 data bricks are NVME with much higher capabilities. Is there something I can do about that ? Maybe change cluster.choose-local, as I don't see it on my other volumes ? What are the risks associated with that? 
Volume Name: data_fast Type: Replicate Volume ID: b78aa52a-4c49-407d-bfd8-fdffb2a3610a Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: ovirt1:/gluster_bricks/data_fast/data_fast Brick2: ovirt2:/gluster_bricks/data_fast/data_fast Brick3: ovirt3:/gluster_bricks/data_fast/data_fast (arbiter) Options Reconfigured: performance.client-io-threads: off nfs.disable: on transport.address-family: inet performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: off cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off storage.owner-uid: 36 storage.owner-gid: 36 performance.strict-o-direct: on cluster.granular-entry-heal: enable network.ping-timeout: 30 cluster.enable-shared-storage: enable Best Regards, Strahil Nikolov -------------- next part -------------- An HTML attachment was scrubbed... URL: From amye at redhat.com Mon Apr 22 14:43:35 2019 From: amye at redhat.com (Amye Scavarda) Date: Mon, 22 Apr 2019 07:43:35 -0700 Subject: [Gluster-users] Community Happy Hour at Red Hat Summit Message-ID: The Ceph and Gluster teams are joining forces to put on a Community Happy Hour in Boston on Tuesday, May 7th as part of Red Hat Summit. More details, including RSVP at: https://cephandglusterhappyhour_rhsummit.eventbrite.com -- amye -- Amye Scavarda | amye at redhat.com | Gluster Community Lead From hunter86_bg at yahoo.com Mon Apr 22 18:00:56 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Mon, 22 Apr 2019 21:00:56 +0300 Subject: [Gluster-users] Gluster 5.6 slow read despite fast local brick Message-ID: I've set 'cluster.choose-local: on' and the sequential read is approx. 550 MB/s, but this is far below the 1.3 GB/s I have observed with gluster v5.5. Should I consider it a bug, or do some options need to be changed ? What about rolling back? I've tried to roll back one of my nodes, but it never came back until I upgraded to 5.6. Maybe a full offline downgrade could work... Best Regards, Strahil Nikolov
On Apr 22, 2019 17:18, Strahil Nikolov wrote: > > As I had the option to rebuild the volume - I did it and it still reads quite slower than before 5.6 upgrade. > > I have set cluster.choose-local to 'on' but still the same performance. 
URL: From rabhat at redhat.com Mon Apr 22 21:20:58 2019 From: rabhat at redhat.com (FNU Raghavendra Manjunath) Date: Mon, 22 Apr 2019 17:20:58 -0400 Subject: [Gluster-users] Proposal: Changes in Gluster Community meetings In-Reply-To: References: <62104B6F-99CF-4C22-80FC-9C177F73E897@onholyground.com> Message-ID: Hi, This is the agenda for tomorrow's community meeting for NA/EMEA timezone. https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both ---- On Thu, Apr 11, 2019 at 4:56 AM Amar Tumballi Suryanarayan < atumball at redhat.com> wrote: > Hi All, > > Below is the final details of our community meeting, and I will be sending > invites to mailing list following this email. You can add Gluster Community > Calendar so you can get notifications on the meetings. > > We are starting the meetings from next week. For the first meeting, we > need 1 volunteer from users to discuss the use case / what went well, and > what went bad, etc. preferrably in APAC region. NA/EMEA region, next week. > > Draft Content: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g > ---- > Gluster Community Meeting > Previous > Meeting minutes: > > - http://github.com/gluster/community > > > Date/Time: > Check the community calendar > > Bridge > > - APAC friendly hours > - Bridge: https://bluejeans.com/836554017 > - NA/EMEA > - Bridge: https://bluejeans.com/486278655 > > ------------------------------ > Attendance > > - Name, Company > > Host > > - Who will host next meeting? > - Host will need to send out the agenda 24hr - 12hrs in advance to > mailing list, and also make sure to send the meeting minutes. > - Host will need to reach out to one user at least who can talk > about their usecase, their experience, and their needs. > - Host needs to send meeting minutes as PR to > http://github.com/gluster/community > > User stories > > - Discuss 1 usecase from a user. > - How was the architecture derived, what volume type used, options, > etc? > - What were the major issues faced ? How to improve them? > - What worked good? > - How can we all collaborate well, so it is win-win for the > community and the user? How can we > > Community > > - > > Any release updates? > - > > Blocker issues across the project? > - > > Metrics > - Number of new bugs since previous meeting. How many are not triaged? > - Number of emails, anything unanswered? > > Conferences > / Meetups > > - Any conference in next 1 month where gluster-developers are going? > gluster-users are going? So we can meet and discuss. > > Developer > focus > > - > > Any design specs to discuss? > - > > Metrics of the week? > - Coverity > - Clang-Scan > - Number of patches from new developers. > - Did we increase test coverage? > - [Atin] Also talk about most frequent test failures in the CI and > carve out an AI to get them fixed. > > RoundTable > > - > > ---- > > Regards, > Amar > > On Mon, Mar 25, 2019 at 8:53 PM Amar Tumballi Suryanarayan < > atumball at redhat.com> wrote: > >> Thanks for the feedback Darrell, >> >> The new proposal is to have one in North America 'morning' time. (10AM >> PST), And another in ASIA day time, which is evening 7pm/6pm in Australia, >> 9pm Newzealand, 5pm Tokyo, 4pm Beijing. >> >> For example, if we choose Every other Tuesday for meeting, and 1st of the >> month is Tuesday, we would have North America time for 1st, and on 15th it >> would be ASIA/Pacific time. >> >> Hopefully, this way, we can cover all the timezones, and meeting minutes >> would be committed to github repo, so that way, it will be easier for >> everyone to be aware of what is happening. 
>> >> Regards, >> Amar >> >> On Mon, Mar 25, 2019 at 8:40 PM Darrell Budic >> wrote: >> >>> As a user, I?d like to visit more of these, but the time slot is my 3AM. >>> Any possibility for a rolling schedule (move meeting +6 hours each week >>> with rolling attendance from maintainers?) or an occasional regional >>> meeting 12 hours opposed to the one you?re proposing? >>> >>> -Darrell >>> >>> On Mar 25, 2019, at 4:25 AM, Amar Tumballi Suryanarayan < >>> atumball at redhat.com> wrote: >>> >>> All, >>> >>> We currently have 3 meetings which are public: >>> >>> 1. Maintainer's Meeting >>> >>> - Runs once in 2 weeks (on Mondays), and current attendance is around >>> 3-5 on an avg, and not much is discussed. >>> - Without majority attendance, we can't take any decisions too. >>> >>> 2. Community meeting >>> >>> - Supposed to happen on #gluster-meeting, every 2 weeks, and is the >>> only meeting which is for 'Community/Users'. Others are for developers >>> as of now. >>> Sadly attendance is getting closer to 0 in recent times. >>> >>> 3. GCS meeting >>> >>> - We started it as an effort inside Red Hat gluster team, and opened it >>> up for community from Jan 2019, but the attendance was always from RHT >>> members, and haven't seen any traction from wider group. >>> >>> So, I have a proposal to call out for cancelling all these meeting, and >>> keeping just 1 weekly 'Community' meeting, where even topics related to >>> maintainers and GCS and other projects can be discussed. >>> >>> I have a template of a draft template @ >>> https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g >>> >>> Please feel free to suggest improvements, both in agenda and in timings. >>> So, we can have more participation from members of community, which allows >>> more user - developer interactions, and hence quality of project. >>> >>> Waiting for feedbacks, >>> >>> Regards, >>> Amar >>> >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >> >> -- >> Amar Tumballi (amarts) >> > > > -- > Amar Tumballi (amarts) > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrickmrennie at gmail.com Tue Apr 23 10:07:42 2019 From: patrickmrennie at gmail.com (Patrick Rennie) Date: Tue, 23 Apr 2019 18:07:42 +0800 Subject: [Gluster-users] Extremely slow cluster performance In-Reply-To: <0A865F28-C4A6-41EF-AE37-70216670B4F0@onholyground.com> References: <93FC9B39-2E8C-4579-8C9D-DEF1A28B7384@onholyground.com> <0A865F28-C4A6-41EF-AE37-70216670B4F0@onholyground.com> Message-ID: Hi Darrel, Thanks again for your advice, I tried to take yesterday off and just not think about it, back at it again today. Still no real progress, however my colleague upgraded our version to 3.13 yesterday, this has broken NFS and caused some other issues for us now. It did add the 'gluster volume heal info summary' so I can use that to try and keep an eye on how many files do seem to need healing, if it's accurate it's possibly less than I though. We are in the progress of moving this data to new storage, but it does take a long time to move so much data around, and more keeps coming in each day. 
We do have 3 cache SSDs for each brick so generally performance on the bricks themselves is quite quick, I can DD a 10GB file at ~1.7-2GB/s directly on a brick so I think the performance of each brick is actually ok. It's a distribute/replicate volume, not dispearsed so I can't change disperse.shd-max-threads. I have checked the basics like all peers connected and no scrubs in progress etc. Will keep working away at this, and will start to read through some of your performance tuning suggestions. Really appreciate your advice. Cheers, -Patrick On Mon, Apr 22, 2019 at 12:43 AM Darrell Budic wrote: > Patrick- > > Specifically re: > > Thanks again for your advice, I've left it for a while but unfortunately >>> it's still just as slow and causing more problems for our operations now. I >>> will need to try and take some steps to at least bring performance back to >>> normal while continuing to investigate the issue longer term. I can >>> definitely see one node with heavier CPU than the other, almost double, >>> which I am OK with, but I think the heal process is going to take forever, >>> trying to check the "gluster volume heal info" shows thousands and >>> thousands of files which may need healing, I have no idea how many in total >>> the command is still running after hours, so I am not sure what has gone so >>> wrong to cause this. >>> ... >>> I have no idea how long the healing is going to take on this cluster, we >>> have around 560TB of data on here, but I don't think I can wait that long >>> to try and restore performance to normal. >>> >> > You?re in a bind, I know, but it?s just going to take some time recover. > You have a lot of data, and even at the best speeds your disks and networks > can muster, it?s going to take a while. Until your cluster is fully healed, > anything else you try may not have the full effect it would on a fully > operational cluster. Your predecessor may have made things worse by not > having proper posix attributes on the ZFS file system. You may have made > things worse by killing brick processes in your distributed-replicated > setup, creating an additional need for healing and possibly compounding the > overall performance issues. I?m not trying to blame you or make you feel > bad, but I do want to point out that there?s a problem here, and there is > unlikely to be a silver bullet that will resolve the issue instantly. > You?re going to have to give it time to get back into a ?normal" condition, > which seems to be what your setup was configured and tested for in the > first place. > > Those things said, rather than trying to move things from this cluster to > different storage, what about having your VMs mount different storage in > the first place and move the write load off of this cluster while it > recovers? > > Looking at the profile you posted for Strahil, your bricks are spending a > lot of time doing LOOKUPs, and some are slower than others by a significant > margin. If you haven?t already, check the zfs pools on those, make sure > they don?t have any failed disks that might be slowing them down. Consider > if you can speed them up with a ZIL or SLOG if they are spinning disks > (although your previous server descriptions sound like you don?t need a > SLOG, ZILs may help fi they are HDDs)? Just saw your additional comments > that one server is faster than than the other, it?s possible that it?s got > the actual data and the other one is doing healings every time it gets > accessed, or it?s just got fuller and slower volumes. 
It may make sense to > try forcing all your VM mounts to the faster server for a while, even if > it?s the one with higher load (serving will get preference to healing, but > don?t push the shd-max-threads too high, they can squash performance. Given > it?s a dispersed volume, make sure you?ve got disperse.shd-max-threads at 4 > or 8, and raise disperse.shd-wait-qlength to 4096 or so. > > You?re getting into things best tested with everything working, but > desperate times call for accelerated testing, right? > > You could experiment with different values of performance.io-thread-cound, > try 48. But if your CPU load is already near max, you?re getting everything > you can out of your CPU already, so don?t spend too much time on it. > > Check out > https://github.com/gluster/glusterfs/blob/release-3.11/extras/group-nl-cache and > try applying these to your gluster volume. Without knowing more about your > workload, these may help if you?re doing a lot of directory listing and > file lookups or tests for the (non)existence of a file from your VMs. If > those help, search the mailing list for info on the mount option > ?negative_cache=1? and a thread titled '[Gluster-users] Gluster native > mount is really slow compared to nfs?, it may have some client side mount > options that could give you further benefits. > > Have a look at > https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/#tuning-options, > cluster.data-sef-heal-algorithm full may help things heal faster for you. > performance.flush-behind & related may improve write response to the > clients, use caution unless you have UPSs & battery backed raids, etc. If > you have stats on network traffic on/between your two ?real? node servers, > you can use that as a proxy value for healing performance. > > I looked up the performance.stat-prefetch bug for you, it was fixed back > in 3.8, so it should be safe to enable on your 3.12.x system even with > servers at .15 & .14. > > You?ll probably have to wait for devs to get anything else out of those logs, > but make sure your servers can all see each other (gluster peer status, > everything should be ?Peer in Cluster (Connected)? on all servers), and > all 3 see all the bricks in the ?gluster vol status?. Maybe check for > split brain files on those you keep seeing in the logs? > > Good luck, have patience, and remember (& remind others) that things are > not in their normal state at this moment, and look for things outside of > the gluster server cluster to try to help ( > https://joejulian.name/post/optimizing-web-performance-with-glusterfs/) > get through the healing as well. > > -Darrell > > On Apr 21, 2019, at 4:41 AM, Patrick Rennie > wrote: > > Another small update from me, I have been keeping an eye on the > glustershd.log file to see what is going on and I keep seeing the same file > names come up in there every 10 minutes, but not a lot of other activity. > Logs below. > How can I be sure my heal is progressing through the files which actually > need to be healed? I thought it would show up in these logs. > I also increased the "cluster.shd-max-threads" from 4 to 8 to try and > speed things up too. > > Any ideas here? 
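One way to answer the question above without waiting for a full 'heal info' crawl is the self-heal statistics output, which reports per-brick pending counts and crawl times; a sketch, again assuming the gvAA01 volume name used in this thread:

# gluster volume heal gvAA01 statistics heal-count
# gluster volume heal gvAA01 statistics

Comparing heal-count snapshots taken a few hours apart gives an approximate healing rate per brick, and also makes it obvious if one brick is not healing at all.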
> > Thanks, > > - Patrick > > On 01-B > ------- > [2019-04-21 09:12:54.575689] I [MSGID: 108026] > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > 0-gvAA01-replicate-6: performing metadata selfheal on > 5354c112-2e58-451d-a6f7-6bfcc1c9d904 > [2019-04-21 09:12:54.733601] I [MSGID: 108026] > [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: > Completed metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904. > sources=[0] 2 sinks=1 > [2019-04-21 09:13:12.028509] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: > performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe > [2019-04-21 09:13:12.047470] W [MSGID: 108015] > [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: > expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp > (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 > > [2019-04-21 09:23:13.044377] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: > performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe > [2019-04-21 09:23:13.051479] W [MSGID: 108015] > [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: > expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp > (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 > > [2019-04-21 09:33:07.400369] I [MSGID: 108026] > [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: > Completed data selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. > sources=[0] 2 sinks=1 > [2019-04-21 09:33:11.825449] I [MSGID: 108026] > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > 0-gvAA01-replicate-6: performing metadata selfheal on > 2fd9899f-192b-49cb-ae9c-df35d3f004fa > [2019-04-21 09:33:14.029837] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: > performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe > [2019-04-21 09:33:14.037436] W [MSGID: 108015] > [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: > expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp > (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 > [2019-04-21 09:33:23.913882] I [MSGID: 108026] > [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: > Completed metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. > sources=[0] 2 sinks=1 > [2019-04-21 09:33:43.874201] I [MSGID: 108026] > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > 0-gvAA01-replicate-6: performing metadata selfheal on > c25b80fd-f7df-4c6d-92bd-db930e89a0b1 > [2019-04-21 09:34:02.273898] I [MSGID: 108026] > [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: > Completed metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1. > sources=[0] 2 sinks=1 > [2019-04-21 09:35:12.282045] I [MSGID: 108026] > [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: > Completed data selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. > sources=[0] 2 sinks=1 > [2019-04-21 09:35:15.146252] I [MSGID: 108026] > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > 0-gvAA01-replicate-6: performing metadata selfheal on > 94027f22-a7d7-4827-be0d-09cf5ddda885 > [2019-04-21 09:35:15.254538] I [MSGID: 108026] > [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: > Completed metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. 
> sources=[0] 2 sinks=1 > [2019-04-21 09:35:22.900803] I [MSGID: 108026] > [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: > Completed data selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. > sources=[0] 2 sinks=1 > [2019-04-21 09:35:27.150963] I [MSGID: 108026] > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > 0-gvAA01-replicate-6: performing metadata selfheal on > 84c93069-cfd8-441b-a6e8-958bed535b45 > [2019-04-21 09:35:29.186295] I [MSGID: 108026] > [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: > Completed metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. > sources=[0] 2 sinks=1 > [2019-04-21 09:35:35.967451] I [MSGID: 108026] > [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: > Completed data selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. > sources=[0] 2 sinks=1 > [2019-04-21 09:35:40.733444] I [MSGID: 108026] > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > 0-gvAA01-replicate-6: performing metadata selfheal on > e747c32e-4353-4173-9024-855c69cdf9b9 > [2019-04-21 09:35:58.707593] I [MSGID: 108026] > [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: > Completed metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. > sources=[0] 2 sinks=1 > [2019-04-21 09:36:25.554260] I [MSGID: 108026] > [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: > Completed data selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. > sources=[0] 2 sinks=1 > [2019-04-21 09:36:26.031422] I [MSGID: 108026] > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > 0-gvAA01-replicate-6: performing metadata selfheal on > 4758d581-9de0-403b-af8b-bfd3d71d020d > [2019-04-21 09:36:26.083982] I [MSGID: 108026] > [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: > Completed metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. 
> sources=[0] 2 sinks=1 > > On 02-B > ------- > [2019-04-21 09:03:15.815250] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: > performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 > [2019-04-21 09:03:15.863153] W [MSGID: 108015] > [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: > expunging file > 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp > (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:03:15.867432] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: > performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f > [2019-04-21 09:03:15.875134] W [MSGID: 108015] > [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: > expunging file > 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp > (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:03:39.020198] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: > performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe > [2019-04-21 09:03:39.027345] W [MSGID: 108015] > [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: > expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp > (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 > > [2019-04-21 09:13:18.524874] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: > performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 > [2019-04-21 09:13:20.070172] W [MSGID: 108015] > [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: > expunging file > 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp > (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:13:20.074977] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: > performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f > [2019-04-21 09:13:20.080827] W [MSGID: 108015] > [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: > expunging file > 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp > (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:13:40.015763] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: > performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe > [2019-04-21 09:13:40.021805] W [MSGID: 108015] > [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: > expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp > (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 > > [2019-04-21 09:23:21.991032] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: > performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 > [2019-04-21 09:23:22.054565] W [MSGID: 108015] > [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: > expunging file > 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp > (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:23:22.059225] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: > performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f > [2019-04-21 09:23:22.066266] W [MSGID: 108015] > 
[afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: > expunging file > 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp > (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:23:41.129962] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: > performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe > [2019-04-21 09:23:41.135919] W [MSGID: 108015] > [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: > expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp > (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 > > [2019-04-21 09:33:24.015223] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: > performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 > [2019-04-21 09:33:24.069686] W [MSGID: 108015] > [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: > expunging file > 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp > (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:33:24.074341] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: > performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f > [2019-04-21 09:33:24.080065] W [MSGID: 108015] > [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: > expunging file > 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp > (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 > [2019-04-21 09:33:42.099515] I [MSGID: 108026] > [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: > performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe > [2019-04-21 09:33:42.107481] W [MSGID: 108015] > [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: > expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp > (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 > > > On Sun, Apr 21, 2019 at 3:55 PM Patrick Rennie > wrote: > >> Just another small update, I'm continuing to watch my brick logs and I >> just saw these errors come up in the recent events too. I am going to >> continue to post any errors I see in the hope of finding the right one to >> try and fix.. >> This is from the logs on brick1, seems to be occurring on both nodes on >> brick1, although at different times. I'm not sure what this means, can >> anyone shed any light? >> I guess I am looking for some kind of specific error which may indicate >> something is broken or stuck and locking up and causing the extreme latency >> I'm seeing in the cluster. 
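A quick way to triage brick logs like the ones pasted below is to count error-level entries by message ID instead of reading them individually; something along these lines, assuming the default log location (the per-brick file name mirrors the brick path, so it may differ on this setup):

# grep " E \[" /var/log/glusterfs/bricks/*.log | grep -oE "MSGID: [0-9]+" | sort | uniq -c | sort -rn | head
# tail -f /var/log/glusterfs/bricks/brick1-gvAA01-brick.log | grep " E \["

The message IDs that repeat most often are usually a better lead than any single entry, since a hung operation tends to fail the same way over and over.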
>> >> [2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a) >> [0x7f3b3e93158a] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45) >> [0x7f3b3e4c5d45] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064939] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064962] E [server.c:195:server_submit_reply] >> 
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic] >> 0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS >> 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply] >> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) >> [0x7f3b3e9318fa] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) >> [0x7f3b3e4c5f35] >> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) >> [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> >> Thanks again, >> >> -Patrick >> >> On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie >> wrote: >> >>> Hi Darrell, >>> >>> Thanks again for your advice, I've left it for a while but unfortunately >>> it's still just as slow and causing more problems for our operations now. I >>> will need to try and take some steps to at least bring performance back to >>> normal while continuing to investigate the issue longer term. I can >>> definitely see one node with heavier CPU than the other, almost double, >>> which I am OK with, but I think the heal process is going to take forever, >>> trying to check the "gluster volume heal info" shows thousands and >>> thousands of files which may need healing, I have no idea how many in total >>> the command is still running after hours, so I am not sure what has gone so >>> wrong to cause this. >>> >>> I've checked cluster.op-version and cluster.max-op-version and it looks >>> like I'm on the latest version there. >>> >>> I have no idea how long the healing is going to take on this cluster, we >>> have around 560TB of data on here, but I don't think I can wait that long >>> to try and restore performance to normal. >>> >>> Can anyone think of anything else I can try in the meantime to work out >>> what's causing the extreme latency? 
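One more data point that can help localize latency like this is the built-in profiler, which reports per-brick, per-FOP call counts and latencies (it sounds like a profile was already captured earlier in this thread; the commands are roughly as follows, and profiling adds some overhead, so it is worth stopping it once a sample is saved):

# gluster volume profile gvAA01 start
# gluster volume profile gvAA01 info > /tmp/gvAA01-profile-$(date +%s).txt
# gluster volume profile gvAA01 stop

A single brick whose average LOOKUP or INODELK latency sits far above its peers is a strong hint about which brick process is worth restarting or digging into.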
>>> >>> I've been going through cluster client the logs of some of our VMs and >>> on some of our FTP servers I found this in the cluster mount log, but I am >>> not seeing it on any of our other servers, just our FTP servers. >>> >>> [2019-04-21 07:16:19.925388] E [MSGID: 101046] >>> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >>> [2019-04-21 07:19:43.413834] W [MSGID: 114031] >>> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote >>> operation failed [No such file or directory] >>> [2019-04-21 07:19:43.414153] W [MSGID: 114031] >>> [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote >>> operation failed [No such file or directory] >>> [2019-04-21 07:23:33.154717] E [MSGID: 101046] >>> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >>> [2019-04-21 07:33:24.943913] E [MSGID: 101046] >>> [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >>> >>> Any ideas what this could mean? I am basically just grasping at straws >>> here. >>> >>> I am going to hold off on the version upgrade until I know there are no >>> files which need healing, which could be a while, from some reading I've >>> done there shouldn't be any issues with this as both are on v3.12.x >>> >>> I've free'd up a small amount of space, but I still need to work on this >>> further. >>> >>> I've read of a command "find .glusterfs -type f -links -2 -exec rm {} >>> \;" which could be run on each brick and it would potentially clean up any >>> files which were deleted straight from the bricks, but not via the client, >>> I have a feeling this could help me free up about 5-10TB per brick from >>> what I've been told about the history of this cluster. Can anyone confirm >>> if this is actually safe to run? >>> >>> At this stage, I'm open to any suggestions as to how to proceed, thanks >>> again for any advice. >>> >>> Cheers, >>> >>> - Patrick >>> >>> On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic >>> wrote: >>> >>>> Patrick, >>>> >>>> Sounds like progress. Be aware that gluster is expected to max out the >>>> CPUs on at least one of your servers while healing. This is normal and >>>> won?t adversely affect overall performance (any more than having bricks in >>>> need of healing, at any rate) unless you?re overdoing it. shd threads <= 4 >>>> should not do that on your hardware. Other tunings may have also increased >>>> overall performance, so you may see higher CPU than previously anyway. I?d >>>> recommend upping those thread counts and letting it heal as fast as >>>> possible, especially if these are dedicated Gluster storage servers (Ie: >>>> not also running VMs, etc). You should see ?normal? CPU use one heals are >>>> completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 >>>> cores). It?s also likely to be different between your servers, in a pure >>>> replica, one tends to max and one tends to be a little higher, in a >>>> distributed-replica, I?d expect more than one to run harder while healing. >>>> >>>> Keep the differences between doing an ls on a brick and doing an ls on >>>> a gluster mount in mind. When you do a ls on a gluster volume, it isn?t >>>> just doing a ls on one brick, it?s effectively doing it on ALL of your >>>> bricks, and they all have to return data before the ls succeeds. In a >>>> distributed volume, it?s figuring out where on each volume things live and >>>> getting the stat() from each to assemble the whole thing. 
And if things are >>>> in need of healing, it will take even longer to decide which version is >>>> current and use it (shd triggers a heal anytime it encounters this). Any of >>>> these things being slow slows down the overall response. >>>> >>>> At this point, I?d get some sleep too, and let your cluster heal while >>>> you do. I?d really want it fully healed before I did any updates anyway, so >>>> let it use CPU and get itself sorted out. Expect it to do a round of >>>> healing after you upgrade each machine too, this is normal so don?t let the >>>> CPU spike surprise you, It?s just catching up from the downtime incurred by >>>> the update and/or reboot if you did one. >>>> >>>> That reminds me, check your gluster cluster.op-version and >>>> cluster.max-op-version (gluster vol get all all | grep op-version). If >>>> op-version isn?t at the max-op-verison, set it to it so you?re taking >>>> advantage of the latest features available to your version. >>>> >>>> -Darrell >>>> >>>> On Apr 20, 2019, at 11:54 AM, Patrick Rennie >>>> wrote: >>>> >>>> Hi Darrell, >>>> >>>> Thanks again for your advice, I've applied the acltype=posixacl on my >>>> zpools and I think that has reduced some of the noise from my brick logs. >>>> I also bumped up some of the thread counts you suggested but my CPU >>>> load skyrocketed, so I dropped it back down to something slightly lower, >>>> but still higher than it was before, and will see how that goes for a >>>> while. >>>> >>>> Although low space is a definite issue, if I run an ls anywhere on my >>>> bricks directly it's instant, <1 second, and still takes several minutes >>>> via gluster, so there is still a problem in my gluster configuration >>>> somewhere. We don't have any snapshots, but I am trying to work out if any >>>> data on there is safe to delete, or if there is any way I can safely find >>>> and delete data which has been removed directly from the bricks in the >>>> past. I also have lz4 compression already enabled on each zpool which does >>>> help a bit, we get between 1.05 and 1.08x compression on this data. >>>> I've tried to go through each client and checked it's cluster mount >>>> logs and also my brick logs and looking for errors, so far nothing is >>>> jumping out at me, but there are some warnings and errors here and there, I >>>> am trying to work out what they mean. >>>> >>>> It's already 1 am here and unfortunately, I'm still awake working on >>>> this issue, but I think that I will have to leave the version upgrades >>>> until tomorrow. >>>> >>>> Thanks again for your advice so far. If anyone has any ideas on where I >>>> can look for errors other than brick logs or the cluster mount logs to help >>>> resolve this issue, it would be much appreciated. >>>> >>>> Cheers, >>>> >>>> - Patrick >>>> >>>> On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic >>>> wrote: >>>> >>>>> See inline: >>>>> >>>>> On Apr 20, 2019, at 10:09 AM, Patrick Rennie >>>>> wrote: >>>>> >>>>> Hi Darrell, >>>>> >>>>> Thanks for your reply, this issue seems to be getting worse over the >>>>> last few days, really has me tearing my hair out. I will do as you have >>>>> suggested and get started on upgrading from 3.12.14 to 3.12.15. >>>>> I've checked the zfs properties and all bricks have "xattr=sa" set, >>>>> but none of them has "acltype=posixacl" set, currently the acltype property >>>>> shows "off", if I make these changes will it apply retroactively to the >>>>> existing data? 
I'm unfamiliar with what this will change so I may need to >>>>> look into that before I proceed. >>>>> >>>>> >>>>> It is safe to apply that now, any new set/get calls will then use it >>>>> if new posixacls exist, and use older if not. ZFS is good that way. It >>>>> should clear up your posix_acl and posix errors over time. >>>>> >>>>> I understand performance is going to slow down as the bricks get full, >>>>> I am currently trying to free space and migrate data to some newer storage, >>>>> I have fresh several hundred TB storage I just setup recently but with >>>>> these performance issues it's really slow. I also believe there is >>>>> significant data which has been deleted directly from the bricks in the >>>>> past, so if I can reclaim this space in a safe manner then I will have at >>>>> least around 10-15% free space. >>>>> >>>>> >>>>> Full ZFS volumes will have a much larger impact on performance than >>>>> you?d think, I?d prioritize this. If you have been taking zfs snapshots, >>>>> consider deleting them to get the overall volume free space back up. And >>>>> just to be sure it?s been said, delete from within the mounted volumes, >>>>> don?t delete directly from the bricks (gluster will just try and heal it >>>>> later, compounding your issues). Does not apply to deleting other data from >>>>> the ZFS volume if it?s not part of the brick directory, of course. >>>>> >>>>> These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so >>>>> generally they have plenty of resources available, currently only using >>>>> around 330/512GB of memory. >>>>> >>>>> I will look into what your suggested settings will change, and then >>>>> will probably go ahead with your recommendations, for our specs as stated >>>>> above, what would you suggest for performance.io-thread-count ? >>>>> >>>>> >>>>> I run single 2630v4s on my servers, which have a smaller storage >>>>> footprint than yours. I?d go with 32 for performance.io-thread-count. >>>>> I?d try 4 for the shd thread settings on that gear. Your memory use sounds >>>>> fine, so no worries there. >>>>> >>>>> Our workload is nothing too extreme, we have a few VMs which write >>>>> backup data to this storage nightly for our clients, our VMs don't live on >>>>> this cluster, but just write to it. >>>>> >>>>> >>>>> If they are writing compressible data, you?ll get immediate benefit by >>>>> setting compression=lz4 on your ZFS volumes. It won?t help any old data, of >>>>> course, but it will compress new data going forward. This is another one >>>>> that?s safe to enable on the fly. >>>>> >>>>> I've been going through all of the logs I can, below are some slightly >>>>> sanitized errors I've come across, but I'm not sure what to make of them. >>>>> The main error I am seeing is the first one below, across several of my >>>>> bricks, but possibly only for specific folders on the cluster, I'm not 100% >>>>> about that yet though. 
>>>>> >>>>> [2019-04-20 05:56:59.512649] E [MSGID: 113001] >>>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>>> supported] >>>>> [2019-04-20 05:59:06.084333] E [MSGID: 113001] >>>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>>> supported] >>>>> [2019-04-20 05:59:43.289030] E [MSGID: 113001] >>>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>>> supported] >>>>> [2019-04-20 05:59:50.582257] E [MSGID: 113001] >>>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>>> supported] >>>>> [2019-04-20 06:01:42.501701] E [MSGID: 113001] >>>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>>> /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not >>>>> supported] >>>>> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] >>>>> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >>>>> with 'user_xattr' flag) >>>>> >>>>> >>>>> [2019-04-20 13:12:36.131856] E [MSGID: 113002] >>>>> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >>>>> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>>>> [2019-04-20 13:12:36.131959] E [MSGID: 113002] >>>>> [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for >>>>> /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] >>>>> [2019-04-20 13:12:36.132016] E [MSGID: 115050] >>>>> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP >>>>> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >>>>> Backup_clone1.vbm_62906_tmp), client: >>>>> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >>>>> gvAA01-posix [No data available] >>>>> [2019-04-20 13:12:38.093719] E [MSGID: 115050] >>>>> [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP >>>>> /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud >>>>> Backup_clone1.vbm_62906_tmp), client: >>>>> 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: >>>>> gvAA01-posix [No data available] >>>>> [2019-04-20 13:12:38.093660] E [MSGID: 113002] >>>>> [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for >>>>> /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>>>> [2019-04-20 13:12:38.093696] E [MSGID: 113002] >>>>> [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for >>>>> /brick2/xxxxxxxxxxxxxxxxxxxx [No data available] >>>>> >>>>> >>>>> posixacls should clear those up, as mentioned. 
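For reference, the ZFS property changes discussed above would look something like this on each brick dataset (pool/dataset names are placeholders; the properties only affect xattr and ACL operations from the time they are set):

# zfs get xattr,acltype,compression pool/brick1
# zfs set xattr=sa pool/brick1
# zfs set acltype=posixacl pool/brick1
# zfs set compression=lz4 pool/brick1

As noted above, existing data is not rewritten, so the posix_acl errors should taper off rather than stop instantly.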
>>>>> >>>>> >>>>> [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] >>>>> 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, >>>>> by 980fdbbd367f0000 on 0x7fc4f0161440 >>>>> [2019-04-20 14:25:59.654668] E [MSGID: 115053] >>>>> [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: >>>>> INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), >>>>> client: >>>>> cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, >>>>> error-xlator: gvAA01-locks [Invalid argument] >>>>> >>>>> >>>>> [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] >>>>> 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS >>>>> 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) >>>>> [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] >>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) >>>>> [0x7ff4ae6f796a] >>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) >>>>> [0x7ff4ae2a96e8] >>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) >>>>> [0x7ff4ae28528d] ) 0-: Reply submission failed >>>>> >>>>> >>>>> Fix the posix acls and see if these clear up over time as well, I?m >>>>> unclear on what the overall effect of running without the posix acls will >>>>> be to total gluster health. Your biggest problem sounds like you need to >>>>> free up space on the volumes and get the overall volume health back up to >>>>> par and see if that doesn?t resolve the symptoms you?re seeing. >>>>> >>>>> >>>>> >>>>> Thank you again for your assistance. It is greatly appreciated. >>>>> >>>>> - Patrick >>>>> >>>>> >>>>> >>>>> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic >>>>> wrote: >>>>> >>>>>> Patrick, >>>>>> >>>>>> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. >>>>>> You also mention ZFS, and that error you show makes me think you need to >>>>>> check to be sure you have ?xattr=sa? and ?acltype=posixacl? set on your ZFS >>>>>> volumes. >>>>>> >>>>>> You also observed your bricks are crossing the 95% full line, ZFS >>>>>> performance will degrade significantly the closer you get to full. In my >>>>>> experience, this starts somewhere between 10% and 5% free space remaining, >>>>>> so you?re in that realm. >>>>>> >>>>>> How?s your free memory on the servers doing? Do you have your zfs arc >>>>>> cache limited to something less than all the RAM? It shares pretty well, >>>>>> but I?ve encountered situations where other things won?t try and take ram >>>>>> back properly if they think it?s in use, so ZFS never gets the opportunity >>>>>> to give it up. >>>>>> >>>>>> Since your volume is a disperse-replica, you might try tuning >>>>>> disperse.shd-max-threads, default is 1, I?d try it at 2, 4, or even more if >>>>>> the CPUs are beefy enough. And setting server.event-threads to 4 and >>>>>> client.event-threads to 8 has proven helpful in many cases. After you get >>>>>> upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I >>>>>> don?t know if it matters, but I?d also recommend resetting >>>>>> performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or >>>>>> also setting performance.io-thread-count to 32 if those have beefy >>>>>> CPUs. 
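As a concrete sketch, the tunings suggested above map to volume-set commands along these lines (values are the suggestions from this thread rather than defaults; for the distributed-replicate volume discussed here the self-heal knobs live under cluster.* rather than disperse.*):

# gluster volume set gvAA01 cluster.shd-max-threads 4
# gluster volume set gvAA01 server.event-threads 4
# gluster volume set gvAA01 client.event-threads 8
# gluster volume set gvAA01 performance.io-thread-count 32
# gluster volume set gvAA01 performance.least-prio-threads 1
# gluster volume set gvAA01 performance.stat-prefetch on

These can be applied to a live volume; it is worth confirming each with 'gluster volume get gvAA01 <option>' afterwards.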
>>>>>> >>>>>> Beyond those general ideas, more info about your hardware (CPU and >>>>>> RAM) and workload (VMs, direct storage for web servers or enders, etc) may >>>>>> net you some more ideas. Then you?re going to have to do more digging into >>>>>> brick logs looking for errors and/or warnings to see what?s going on. >>>>>> >>>>>> -Darrell >>>>>> >>>>>> >>>>>> On Apr 20, 2019, at 8:22 AM, Patrick Rennie >>>>>> wrote: >>>>>> >>>>>> Hello Gluster Users, >>>>>> >>>>>> I am hoping someone can help me with resolving an ongoing issue I've >>>>>> been having, I'm new to mailing lists so forgive me if I have gotten >>>>>> anything wrong. We have noticed our performance deteriorating over the last >>>>>> few weeks, easily measured by trying to do an ls on one of our top-level >>>>>> folders, and timing it, which usually would take 2-5 seconds, and now takes >>>>>> up to 20 minutes, which obviously renders our cluster basically unusable. >>>>>> This has been intermittent in the past but is now almost constant and I am >>>>>> not sure how to work out the exact cause. We have noticed some errors in >>>>>> the brick logs, and have noticed that if we kill the right brick process, >>>>>> performance instantly returns back to normal, this is not always the same >>>>>> brick, but it indicates to me something in the brick processes or >>>>>> background tasks may be causing extreme latency. Due to this ability to fix >>>>>> it by killing the right brick process off, I think it's a specific file, or >>>>>> folder, or operation which may be hanging and causing the increased >>>>>> latency, but I am not sure how to work it out. One last thing to add is >>>>>> that our bricks are getting quite full (~95% full), we are trying to >>>>>> migrate data off to new storage but that is going slowly, not helped by >>>>>> this issue. I am currently trying to run a full heal as there appear to be >>>>>> many files needing healing, and I have all brick processes running so they >>>>>> have an opportunity to heal, but this means performance is very poor. It >>>>>> currently takes over 15-20 minutes to do an ls of one of our top-level >>>>>> folders, which just contains 60-80 other folders, this should take 2-5 >>>>>> seconds. This is all being checked by FUSE mount locally on the storage >>>>>> node itself, but it is the same for other clients and VMs accessing the >>>>>> cluster. Initially, it seemed our NFS mounts were not affected and operated >>>>>> at normal speed, but testing over the last day has shown that our NFS >>>>>> clients are also extremely slow, so it doesn't seem specific to FUSE as I >>>>>> first thought it might be. >>>>>> >>>>>> I am not sure how to proceed from here, I am fairly new to gluster >>>>>> having inherited this setup from my predecessor and trying to keep it >>>>>> going. I have included some info below to try and help with diagnosis, >>>>>> please let me know if any further info would be helpful. I would really >>>>>> appreciate any advice on what I could try to work out the cause. Thank you >>>>>> in advance for reading this, and any suggestions you might be able to >>>>>> offer. 
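The measurement described above can be captured as a side-by-side timing of the same directory through the FUSE mount and directly on one of its bricks (paths here are examples; the real brick paths appear in the volume status further down):

# time ls /mnt/gvAA01/top-level-folder | wc -l
# time ls /brick1/gvAA01/brick/top-level-folder | wc -l

A large gap between the two points at the client-side graph, the network, or self-heal contention rather than at the underlying ZFS bricks themselves.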
>>>>>> >>>>>> - Patrick >>>>>> >>>>>> This is an example of the main error I see in our brick logs, there >>>>>> have been others, I can post them when I see them again too: >>>>>> [2019-04-20 04:54:43.055680] E [MSGID: 113001] >>>>>> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >>>>>> /brick1/ library: system.posix_acl_default [Operation not >>>>>> supported] >>>>>> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] >>>>>> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >>>>>> with 'user_xattr' flag) >>>>>> >>>>>> Our setup consists of 2 storage nodes and an arbiter node. I have >>>>>> noticed our nodes are on slightly different versions, I'm not sure if this >>>>>> could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 >>>>>> pools - total capacity is around 560TB. >>>>>> We have bonded 10gbps NICS on each node, and I have tested bandwidth >>>>>> with iperf and found that it's what would be expected from this config. >>>>>> Individual brick performance seems ok, I've tested several bricks >>>>>> using dd and can write a 10GB files at 1.7GB/s. >>>>>> >>>>>> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >>>>>> 10000+0 records in >>>>>> 10000+0 records out >>>>>> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >>>>>> >>>>>> Node 1: >>>>>> # glusterfs --version >>>>>> glusterfs 3.12.15 >>>>>> >>>>>> Node 2: >>>>>> # glusterfs --version >>>>>> glusterfs 3.12.14 >>>>>> >>>>>> Arbiter: >>>>>> # glusterfs --version >>>>>> glusterfs 3.12.14 >>>>>> >>>>>> Here is our gluster volume status: >>>>>> >>>>>> # gluster volume status >>>>>> Status of volume: gvAA01 >>>>>> Gluster process TCP Port RDMA Port >>>>>> Online Pid >>>>>> >>>>>> ------------------------------------------------------------------------------ >>>>>> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >>>>>> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>> ck1 49152 0 Y >>>>>> 6931 >>>>>> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >>>>>> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>> ck2 49153 0 Y >>>>>> 6939 >>>>>> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >>>>>> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>> ck3 49154 0 Y >>>>>> 6947 >>>>>> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >>>>>> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>> ck4 49155 0 Y >>>>>> 6956 >>>>>> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >>>>>> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>> ck5 49156 0 Y >>>>>> 6964 >>>>>> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >>>>>> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>> ck6 49157 0 Y >>>>>> 6974 >>>>>> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >>>>>> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>> ck7 49158 0 Y >>>>>> 6984 >>>>>> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >>>>>> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>> ck8 49159 0 Y >>>>>> 6993 >>>>>> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >>>>>> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >>>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>>> ck9 49160 0 Y >>>>>> 7001 >>>>>> NFS Server on localhost 2049 0 Y >>>>>> 17276 >>>>>> Self-heal 
Daemon on localhost N/A N/A Y >>>>>> 25245 >>>>>> NFS Server on 02-B 2049 0 Y 9089 >>>>>> Self-heal Daemon on 02-B N/A N/A Y 17838 >>>>>> NFS Server on 00-a 2049 0 Y 15660 >>>>>> Self-heal Daemon on 00-a N/A N/A Y 16218 >>>>>> >>>>>> Task Status of Volume gvAA01 >>>>>> >>>>>> ------------------------------------------------------------------------------ >>>>>> There are no active volume tasks >>>>>> >>>>>> And gluster volume info: >>>>>> >>>>>> # gluster volume info >>>>>> >>>>>> Volume Name: gvAA01 >>>>>> Type: Distributed-Replicate >>>>>> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >>>>>> Status: Started >>>>>> Snapshot Count: 0 >>>>>> Number of Bricks: 9 x (2 + 1) = 27 >>>>>> Transport-type: tcp >>>>>> Bricks: >>>>>> Brick1: 01-B:/brick1/gvAA01/brick >>>>>> Brick2: 02-B:/brick1/gvAA01/brick >>>>>> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >>>>>> Brick4: 01-B:/brick2/gvAA01/brick >>>>>> Brick5: 02-B:/brick2/gvAA01/brick >>>>>> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >>>>>> Brick7: 01-B:/brick3/gvAA01/brick >>>>>> Brick8: 02-B:/brick3/gvAA01/brick >>>>>> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >>>>>> Brick10: 01-B:/brick4/gvAA01/brick >>>>>> Brick11: 02-B:/brick4/gvAA01/brick >>>>>> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >>>>>> Brick13: 01-B:/brick5/gvAA01/brick >>>>>> Brick14: 02-B:/brick5/gvAA01/brick >>>>>> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >>>>>> Brick16: 01-B:/brick6/gvAA01/brick >>>>>> Brick17: 02-B:/brick6/gvAA01/brick >>>>>> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >>>>>> Brick19: 01-B:/brick7/gvAA01/brick >>>>>> Brick20: 02-B:/brick7/gvAA01/brick >>>>>> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >>>>>> Brick22: 01-B:/brick8/gvAA01/brick >>>>>> Brick23: 02-B:/brick8/gvAA01/brick >>>>>> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >>>>>> Brick25: 01-B:/brick9/gvAA01/brick >>>>>> Brick26: 02-B:/brick9/gvAA01/brick >>>>>> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >>>>>> Options Reconfigured: >>>>>> cluster.shd-max-threads: 4 >>>>>> performance.least-prio-threads: 16 >>>>>> cluster.readdir-optimize: on >>>>>> performance.quick-read: off >>>>>> performance.stat-prefetch: off >>>>>> cluster.data-self-heal: on >>>>>> cluster.lookup-unhashed: auto >>>>>> cluster.lookup-optimize: on >>>>>> cluster.favorite-child-policy: mtime >>>>>> server.allow-insecure: on >>>>>> transport.address-family: inet >>>>>> client.bind-insecure: on >>>>>> cluster.entry-self-heal: off >>>>>> cluster.metadata-self-heal: off >>>>>> performance.md-cache-timeout: 600 >>>>>> cluster.self-heal-daemon: enable >>>>>> performance.readdir-ahead: on >>>>>> diagnostics.brick-log-level: INFO >>>>>> nfs.disable: off >>>>>> >>>>>> Thank you for any assistance. >>>>>> >>>>>> - Patrick >>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> Gluster-users at gluster.org >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> >>>>>> >>>>> >>>> _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgurusid at redhat.com Tue Apr 23 13:34:06 2019 From: pgurusid at redhat.com (Poornima Gurusiddaiah) Date: Tue, 23 Apr 2019 19:04:06 +0530 Subject: [Gluster-users] Upgrade 5.5 -> 5.6: network traffic bug fixed? 
In-Reply-To: References: Message-ID: Hi, Thank you for the update, sorry for the delay. I did some more tests, but couldn't see the behaviour of spiked network bandwidth usage when quick-read is on. After upgrading, have you remounted the clients? As in the fix will not be effective until the process is restarted. If you have already restarted the client processes, then there must be something related to workload in the live system that is triggering a bug in quick-read. Would need wireshark capture if possible, to debug further. Regards, Poornima On Tue, Apr 16, 2019 at 6:25 PM Hu Bert wrote: > Hi Poornima, > > thx for your efforts. I made a couple of tests and the results are the > same, so the options are not related. Anyway, i'm not able to > reproduce the problem on my testing system, although the volume > options are the same. > > About 1.5 hours ago i set performance.quick-read to on again and > watched: load/iowait went up (not bad at the moment, little traffic), > but network traffic went up - from <20 MBit/s up to 160 MBit/s. After > deactivating quick-read traffic dropped to < 20 MBit/s again. > > munin graph: https://abload.de/img/network-client4s0kle.png > > The 2nd peak is from the last test. > > > Thx, > Hubert > > Am Di., 16. Apr. 2019 um 09:43 Uhr schrieb Hu Bert >: > > > > In my first test on my testing setup the traffic was on a normal > > level, so i thought i was "safe". But on my live system the network > > traffic was a multiple of the traffic one would expect. > > performance.quick-read was enabled in both, the only difference in the > > volume options between live and testing are: > > > > performance.read-ahead: testing on, live off > > performance.io-cache: testing on, live off > > > > I ran another test on my testing setup, deactivated both and copied 9 > > GB of data. Now the traffic went up as well, from before ~9-10 MBit/s > > up to 100 MBit/s with both options off. Does performance.quick-read > > require one of those options set to 'on'? > > > > I'll start another test shortly, and activate on of those 2 options, > > maybe there's a connection between those 3 options? > > > > > > Best Regards, > > Hubert > > > > Am Di., 16. Apr. 2019 um 08:57 Uhr schrieb Poornima Gurusiddaiah > > : > > > > > > Thank you for reporting this. I had done testing on my local setup and > the issue was resolved even with quick-read enabled. Let me test it again. > > > > > > Regards, > > > Poornima > > > > > > On Mon, Apr 15, 2019 at 12:25 PM Hu Bert > wrote: > > >> > > >> fyi: after setting performance.quick-read to off network traffic > > >> dropped to normal levels, client load/iowait back to normal as well. > > >> > > >> client: https://abload.de/img/network-client-afterihjqi.png > > >> server: https://abload.de/img/network-server-afterwdkrl.png > > >> > > >> Am Mo., 15. Apr. 2019 um 08:33 Uhr schrieb Hu Bert < > revirii at googlemail.com>: > > >> > > > >> > Good Morning, > > >> > > > >> > today i updated my replica 3 setup (debian stretch) from version 5.5 > > >> > to 5.6, as i thought the network traffic bug (#1673058) was fixed > and > > >> > i could re-activate 'performance.quick-read' again. See release > notes: > > >> > > > >> > https://review.gluster.org/#/c/glusterfs/+/22538/ > > >> > > http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a2347780c2429284f57232f3aabb78547a9795 > > >> > > > >> > Upgrade went fine, and then i was watching iowait and network > traffic. 
> > >> > It seems that the network traffic went up after upgrade and > > >> > reactivation of performance.quick-read. Here are some graphs: > > >> > > > >> > network client1: https://abload.de/img/network-clientfwj1m.png > > >> > network client2: https://abload.de/img/network-client2trkow.png > > >> > network server: https://abload.de/img/network-serverv3jjr.png > > >> > > > >> > gluster volume info: https://pastebin.com/ZMuJYXRZ > > >> > > > >> > Just wondering if the network traffic bug really got fixed or if > this > > >> > is a new problem. I'll wait a couple of minutes and then deactivate > > >> > performance.quick-read again, just to see if network traffic goes > down > > >> > to normal levels. > > >> > > > >> > > > >> > Best regards, > > >> > Hubert > > >> _______________________________________________ > > >> Gluster-users mailing list > > >> Gluster-users at gluster.org > > >> https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From budic at onholyground.com Tue Apr 23 15:16:25 2019 From: budic at onholyground.com (Darrell Budic) Date: Tue, 23 Apr 2019 10:16:25 -0500 Subject: [Gluster-users] Proposal: Changes in Gluster Community meetings In-Reply-To: References: <62104B6F-99CF-4C22-80FC-9C177F73E897@onholyground.com> Message-ID: <907BA003-F786-46CF-A31B-38C93CE9BB20@onholyground.com> I was one of the folk who wanted a NA/EMEA scheduled meeting, and I?m going to have to miss it due to some real life issues (clogged sewer I?m going to have to be dealing with at the time). Apologies, I?ll work on making the next one. -Darrell > On Apr 22, 2019, at 4:20 PM, FNU Raghavendra Manjunath wrote: > > > Hi, > > This is the agenda for tomorrow's community meeting for NA/EMEA timezone. > > https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both > ---- > > > > On Thu, Apr 11, 2019 at 4:56 AM Amar Tumballi Suryanarayan > wrote: > Hi All, > > Below is the final details of our community meeting, and I will be sending invites to mailing list following this email. You can add Gluster Community Calendar so you can get notifications on the meetings. > > We are starting the meetings from next week. For the first meeting, we need 1 volunteer from users to discuss the use case / what went well, and what went bad, etc. preferrably in APAC region. NA/EMEA region, next week. > > Draft Content: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g > ---- > Gluster Community Meeting > > Previous Meeting minutes: > > http://github.com/gluster/community > Date/Time: Check the community calendar > Bridge > > APAC friendly hours > Bridge: https://bluejeans.com/836554017 > NA/EMEA > Bridge: https://bluejeans.com/486278655 > Attendance > > Name, Company > Host > > Who will host next meeting? > Host will need to send out the agenda 24hr - 12hrs in advance to mailing list, and also make sure to send the meeting minutes. > Host will need to reach out to one user at least who can talk about their usecase, their experience, and their needs. > Host needs to send meeting minutes as PR to http://github.com/gluster/community > User stories > > Discuss 1 usecase from a user. > How was the architecture derived, what volume type used, options, etc? > What were the major issues faced ? How to improve them? > What worked good? > How can we all collaborate well, so it is win-win for the community and the user? How can we > Community > > Any release updates? > > Blocker issues across the project? > > Metrics > > Number of new bugs since previous meeting. 
How many are not triaged? > Number of emails, anything unanswered? > Conferences / Meetups > > Any conference in next 1 month where gluster-developers are going? gluster-users are going? So we can meet and discuss. > Developer focus > > Any design specs to discuss? > > Metrics of the week? > > Coverity > Clang-Scan > Number of patches from new developers. > Did we increase test coverage? > [Atin] Also talk about most frequent test failures in the CI and carve out an AI to get them fixed. > RoundTable > > > ---- > > Regards, > Amar > > On Mon, Mar 25, 2019 at 8:53 PM Amar Tumballi Suryanarayan > wrote: > Thanks for the feedback Darrell, > > The new proposal is to have one in North America 'morning' time. (10AM PST), And another in ASIA day time, which is evening 7pm/6pm in Australia, 9pm Newzealand, 5pm Tokyo, 4pm Beijing. > > For example, if we choose Every other Tuesday for meeting, and 1st of the month is Tuesday, we would have North America time for 1st, and on 15th it would be ASIA/Pacific time. > > Hopefully, this way, we can cover all the timezones, and meeting minutes would be committed to github repo, so that way, it will be easier for everyone to be aware of what is happening. > > Regards, > Amar > > On Mon, Mar 25, 2019 at 8:40 PM Darrell Budic > wrote: > As a user, I?d like to visit more of these, but the time slot is my 3AM. Any possibility for a rolling schedule (move meeting +6 hours each week with rolling attendance from maintainers?) or an occasional regional meeting 12 hours opposed to the one you?re proposing? > > -Darrell > >> On Mar 25, 2019, at 4:25 AM, Amar Tumballi Suryanarayan > wrote: >> >> All, >> >> We currently have 3 meetings which are public: >> >> 1. Maintainer's Meeting >> >> - Runs once in 2 weeks (on Mondays), and current attendance is around 3-5 on an avg, and not much is discussed. >> - Without majority attendance, we can't take any decisions too. >> >> 2. Community meeting >> >> - Supposed to happen on #gluster-meeting, every 2 weeks, and is the only meeting which is for 'Community/Users'. Others are for developers as of now. >> Sadly attendance is getting closer to 0 in recent times. >> >> 3. GCS meeting >> >> - We started it as an effort inside Red Hat gluster team, and opened it up for community from Jan 2019, but the attendance was always from RHT members, and haven't seen any traction from wider group. >> >> So, I have a proposal to call out for cancelling all these meeting, and keeping just 1 weekly 'Community' meeting, where even topics related to maintainers and GCS and other projects can be discussed. >> >> I have a template of a draft template @ https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g >> >> Please feel free to suggest improvements, both in agenda and in timings. So, we can have more participation from members of community, which allows more user - developer interactions, and hence quality of project. 
>> >> Waiting for feedbacks, >> >> Regards, >> Amar >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > -- > Amar Tumballi (amarts) > > > -- > Amar Tumballi (amarts) > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From cody at platform9.com Fri Apr 19 02:33:16 2019 From: cody at platform9.com (Cody Hill) Date: Thu, 18 Apr 2019 21:33:16 -0500 Subject: [Gluster-users] GlusterFS on ZFS In-Reply-To: <085deed5-f048-4baa-84f8-1f6ef1436a5b@email.android.com> References: <085deed5-f048-4baa-84f8-1f6ef1436a5b@email.android.com> Message-ID: Thanks for the info Karli, I wasn?t aware ZFS Dedup was such a dog. I guess I?ll leave that off. My data get?s 3.5:1 savings on compression alone. I was aware of stripped sets. I will be doing 6x Striped sets across 12x disks. On top of this design I?m going to try and test Intel Optane DIMM (512GB) as a ?Tier? for GlusterFS to try and get further write acceleration. And issues with GlusterFS ?Tier? functionality that anyone is aware of? Thank you, Cody Hill > On Apr 18, 2019, at 2:32 AM, Karli Sj?berg wrote: > > > > Den 17 apr. 2019 16:30 skrev Cody Hill : > Hey folks. > > I?m looking to deploy GlusterFS to host some VMs. I?ve done a lot of reading and would like to implement Deduplication and Compression in this setup. My thought would be to run ZFS to handle the Compression and Deduplication. > > You _really_ don't want ZFS doing dedup for any reason. > > > ZFS would give me the following benefits: > 1. If a single disk fails rebuilds happen locally instead of over the network > 2. Zil & L2Arc should add a slight performance increase > > Adding two really good NVME SSD's as a mirrored SLOG vdev does a huge deal for synchronous write performance, turning every random write into large streams that the spinning drives handle better. > > Don't know how picky Gluster is about synchronicity though, most "performance" tweaking suggests setting stuff to async, which I wouldn't recommend, but it's a huge boost for throughput obviously; not having to wait for stuff to actually get written, but it's dangerous. > > With mirrored NVME SLOG's, you could probably get that throughput without going asynchronous, which saves you from potential data corruption in a sudden power loss. > > L2ARC on the other hand does a bit for read latency, but for a general purpose file server- in practice- not a huge difference, the working set is just too large. Also keep in mind that L2ARC isn't "free". You need more RAM to know where you've cached stuff... > > 3. Deduplication and Compression are inline and have pretty good performance with modern hardware (Intel Skylake) > > ZFS deduplication has terrible performance. Watch your throughput automatically drop from hundreds or thousands of MB/s down to, like 5. It's a feature;) > > 4. Automated Snapshotting > > I can then layer GlusterFS on top to handle distribution to allow 3x Replicas of my storage. > My question is? Why aren?t more people doing this? Is this a horrible idea for some reason that I?m missing? 
> > While it could save a lot of space in some hypothetical instance, the drawbacks can never motivate it. E.g. if you want one node to suddenly die and never recover because of RAM exhaustion, go with ZFS dedup ;) > > I?d be very interested to hear your thoughts. > > Avoid ZFS dedup at all costs. LZ4 compression on the hand is awesome, definitely use that! It's basically a free performance enhancer the also saves space :) > > As another person has said, the best performance layout is RAID10- striped mirrors. I understand you'd want to get as much volume as possible with RAID-Z/RAID(5|6) since gluster also replicates/distributes, but it has a huge impact on IOPS. If performance is the main concern, do striped mirrors with replica 3 in Gluster. My advice is to test thoroughly with different pool layouts to see what gives acceptable performance against your volume requirements. > > /K > > > Additional thoughts: > I?d like to use Ganesha pNFS to connect to this storage. (Any issues here?) > I think I?d need KeepAliveD across these 3x nodes to store in the FSTAB (Is this correct?) > I?m also thinking about creating a ?Gluster Tier? of 512GB of Intel Optane DIMM to really smooth out write latencies? Any issues here? > > Thank you, > Cody Hill > > > > > > > > > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrickmrennie at gmail.com Sat Apr 20 05:28:44 2019 From: patrickmrennie at gmail.com (Patrick Rennie) Date: Sat, 20 Apr 2019 13:28:44 +0800 Subject: [Gluster-users] Extremely slow Gluster performance Message-ID: Hello Gluster Users, I am hoping someone can help me with resolving an ongoing issue I've been having, I'm new to mailing lists so forgive me if I have gotten anything wrong. We have noticed our performance deteriorating over the last few weeks, easily measured by trying to do an ls on one of our top-level folders, and timing it, which usually would take 2-5 seconds, and now takes up to 20 minutes, which obviously renders our cluster basically unusable. This has been intermittent in the past but is now almost constant and I am not sure how to work out the exact cause. We have noticed some errors in the brick logs, and have noticed that if we kill the right brick process, performance instantly returns back to normal, this is not always the same brick, but it indicates to me something in the brick processes or background tasks may be causing extreme latency. Due to this ability to fix it by killing the right brick process off, I think it's a specific file, or folder, or operation which may be hanging and causing the increased latency, but I am not sure how to work it out. One last thing to add is that our bricks are getting quite full (~95% full), we are trying to migrate data off to new storage but that is going slowly, not helped by this issue. I am currently trying to run a full heal as there appear to be many files needing healing, and I have all brick processes running so they have an opportunity to heal, but this means performance is very poor. It currently takes over 15-20 minutes to do an ls of one of our top-level folders, which just contains 60-80 other folders, this should take 2-5 seconds. This is all being checked by FUSE mount locally on the storage node itself, but it is the same for other clients and VMs accessing the cluster. 
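A minimal sketch of how the slowdown described here can be quantified before deciding which brick process to touch; the mount point /mnt/gvAA01 and the folder name are placeholders, and iostat assumes the sysstat package is installed:

# time ls -l /mnt/gvAA01/<top-level-folder> > /dev/null
      (repeat a few times and note the wall-clock time)
# gluster volume status gvAA01
      (the Pid column maps each brick to its glusterfsd process)
# iostat -dxm 5
      (watch per-disk utilisation to spot the brick whose pool is saturated)
# top -p <pid-of-suspect-glusterfsd>
      (CPU and state of that one brick process)

Comparing the timing from the FUSE mount with a plain ls of the same directory taken directly on a brick path helps separate cluster-wide latency from local disk latency.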
Initially it seemed our NFS mounts were not affected and operated at normal speed, but testing over the last day has shown that our NFS clients are also extremely slow, so it doesn't seem specific to FUSE as I first thought it might be. I am not sure how to proceed from here, I am fairly new to gluster having inherited this setup from my predecessor and trying to keep it going. I have included some info below to try and help with diagnosis, please let me know if any further info would be helpful. I would really appreciate any advice on what I could try to work out the cause. Thank you in advance for reading this, and any suggestions you might be able to offer. - Patrick This is an example of the main error I see in our brick logs, there have been others, I can post them when I see them again too: [2019-04-20 04:54:43.055680] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick1/ library: system.posix_acl_default [Operation not supported] [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag) Our setup consists of 2 storage nodes and an arbiter node. I have noticed our nodes are on slightly different versions, I'm not sure if this could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - total capacity is around 560TB. We have bonded 10gbps NICS on each node, and I have tested bandwidth with iperf and found that it's what would be expected from this config. Individual brick performance seems ok, I've tested several bricks using dd and can write a 10GB files at 1.7GB/s. # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 10000+0 records in 10000+0 records out 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s Node 1: # glusterfs --version glusterfs 3.12.15 Node 2: # glusterfs --version glusterfs 3.12.14 Arbiter: # glusterfs --version glusterfs 3.12.14 Here is our gluster volume status: # gluster volume status Status of volume: gvAA01 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 Brick 00-A:/arbiterAA01/gvAA01/bri ck1 49152 0 Y 6931 Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 Brick 00-A:/arbiterAA01/gvAA01/bri ck2 49153 0 Y 6939 Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 Brick 00-A:/arbiterAA01/gvAA01/bri ck3 49154 0 Y 6947 Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 Brick 00-A:/arbiterAA01/gvAA01/bri ck4 49155 0 Y 6956 Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 Brick 00-A:/arbiterAA01/gvAA01/bri ck5 49156 0 Y 6964 Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 Brick 00-A:/arbiterAA01/gvAA01/bri ck6 49157 0 Y 6974 Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 Brick 00-A:/arbiterAA01/gvAA01/bri ck7 49158 0 Y 6984 Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 Brick 00-A:/arbiterAA01/gvAA01/bri ck8 49159 0 Y 6993 Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 Brick 00-A:/arbiterAA01/gvAA01/bri ck9 49160 0 Y 7001 NFS Server on localhost 2049 0 Y 17276 
Self-heal Daemon on localhost N/A N/A Y 25245 NFS Server on 02-B 2049 0 Y 9089 Self-heal Daemon on 02-B N/A N/A Y 17838 NFS Server on 00-a 2049 0 Y 15660 Self-heal Daemon on 00-a N/A N/A Y 16218 Task Status of Volume gvAA01 ------------------------------------------------------------------------------ There are no active volume tasks And gluster volume info: # gluster volume info Volume Name: gvAA01 Type: Distributed-Replicate Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 Status: Started Snapshot Count: 0 Number of Bricks: 9 x (2 + 1) = 27 Transport-type: tcp Bricks: Brick1: 01-B:/brick1/gvAA01/brick Brick2: 02-B:/brick1/gvAA01/brick Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) Brick4: 01-B:/brick2/gvAA01/brick Brick5: 02-B:/brick2/gvAA01/brick Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) Brick7: 01-B:/brick3/gvAA01/brick Brick8: 02-B:/brick3/gvAA01/brick Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) Brick10: 01-B:/brick4/gvAA01/brick Brick11: 02-B:/brick4/gvAA01/brick Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) Brick13: 01-B:/brick5/gvAA01/brick Brick14: 02-B:/brick5/gvAA01/brick Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) Brick16: 01-B:/brick6/gvAA01/brick Brick17: 02-B:/brick6/gvAA01/brick Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) Brick19: 01-B:/brick7/gvAA01/brick Brick20: 02-B:/brick7/gvAA01/brick Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) Brick22: 01-B:/brick8/gvAA01/brick Brick23: 02-B:/brick8/gvAA01/brick Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) Brick25: 01-B:/brick9/gvAA01/brick Brick26: 02-B:/brick9/gvAA01/brick Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) Options Reconfigured: cluster.shd-max-threads: 4 performance.least-prio-threads: 16 cluster.readdir-optimize: on performance.quick-read: off performance.stat-prefetch: off cluster.data-self-heal: on cluster.lookup-unhashed: auto cluster.lookup-optimize: on cluster.favorite-child-policy: mtime server.allow-insecure: on transport.address-family: inet client.bind-insecure: on cluster.entry-self-heal: off cluster.metadata-self-heal: off performance.md-cache-timeout: 600 cluster.self-heal-daemon: enable performance.readdir-ahead: on diagnostics.brick-log-level: INFO nfs.disable: off -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbalacha at redhat.com Wed Apr 24 04:04:12 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Wed, 24 Apr 2019 09:34:12 +0530 Subject: [Gluster-users] Extremely slow Gluster performance In-Reply-To: References: Message-ID: Hi Patrick, Did this start only after the upgrade? How do you determine which brick process to kill? Are there a lot of files to be healed on the volume? Can you provide a tcpdump of the slow listing from a separate test client mount ? 1. Mount the gluster volume on a different mount point than the one being used by your users. 2. Start capturing packets on the system where you have mounted the volume in (1). - tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22 3. List the directory that is slow from the fuse client 4. Stop the capture (after a couple of minutes or after the listing returns, whichever is earlier) 5. Send us the pcap and the listing of the same directory from one of the bricks in order to compare the entries. We may need more information post looking at the tcpdump. 
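Put together, steps 1-5 could look roughly like the following on a test client; the mount point /mnt/glustertest is a placeholder, while the volume and server names are taken from this thread:

# mkdir -p /mnt/glustertest
# mount -t glusterfs 01-B:/gvAA01 /mnt/glustertest
# tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22 &
# time ls -l /mnt/glustertest/<slow-directory> > /dev/null
# kill %1
      (stop the capture once the listing returns or a couple of minutes have passed)

and on one of the storage nodes, a listing of the same directory straight from a brick for comparison:

# ls -l /brick1/gvAA01/brick/<slow-directory> > /var/tmp/brick1-listing.txt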
Regards, Nithya On Tue, 23 Apr 2019 at 23:39, Patrick Rennie wrote: > Hello Gluster Users, > > I am hoping someone can help me with resolving an ongoing issue I've been > having, I'm new to mailing lists so forgive me if I have gotten anything > wrong. We have noticed our performance deteriorating over the last few > weeks, easily measured by trying to do an ls on one of our top-level > folders, and timing it, which usually would take 2-5 seconds, and now takes > up to 20 minutes, which obviously renders our cluster basically unusable. > This has been intermittent in the past but is now almost constant and I am > not sure how to work out the exact cause. We have noticed some errors in > the brick logs, and have noticed that if we kill the right brick process, > performance instantly returns back to normal, this is not always the same > brick, but it indicates to me something in the brick processes or > background tasks may be causing extreme latency. Due to this ability to fix > it by killing the right brick process off, I think it's a specific file, or > folder, or operation which may be hanging and causing the increased > latency, but I am not sure how to work it out. One last thing to add is > that our bricks are getting quite full (~95% full), we are trying to > migrate data off to new storage but that is going slowly, not helped by > this issue. I am currently trying to run a full heal as there appear to be > many files needing healing, and I have all brick processes running so they > have an opportunity to heal, but this means performance is very poor. It > currently takes over 15-20 minutes to do an ls of one of our top-level > folders, which just contains 60-80 other folders, this should take 2-5 > seconds. This is all being checked by FUSE mount locally on the storage > node itself, but it is the same for other clients and VMs accessing the > cluster. Initially it seemed our NFS mounts were not affected and operated > at normal speed, but testing over the last day has shown that our NFS > clients are also extremely slow, so it doesn't seem specific to FUSE as I > first thought it might be. > > I am not sure how to proceed from here, I am fairly new to gluster having > inherited this setup from my predecessor and trying to keep it going. I > have included some info below to try and help with diagnosis, please let me > know if any further info would be helpful. I would really appreciate any > advice on what I could try to work out the cause. Thank you in advance for > reading this, and any suggestions you might be able to offer. > > - Patrick > > This is an example of the main error I see in our brick logs, there have > been others, I can post them when I see them again too: > [2019-04-20 04:54:43.055680] E [MSGID: 113001] > [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on > /brick1/ library: system.posix_acl_default [Operation not > supported] > [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] > 0-gvAA01-posix: Extended attributes not supported (try remounting brick > with 'user_xattr' flag) > > Our setup consists of 2 storage nodes and an arbiter node. I have noticed > our nodes are on slightly different versions, I'm not sure if this could be > an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - > total capacity is around 560TB. > We have bonded 10gbps NICS on each node, and I have tested bandwidth with > iperf and found that it's what would be expected from this config. 
> Individual brick performance seems ok, I've tested several bricks using dd > and can write a 10GB files at 1.7GB/s. > > # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 > 10000+0 records in > 10000+0 records out > 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s > > Node 1: > # glusterfs --version > glusterfs 3.12.15 > > Node 2: > # glusterfs --version > glusterfs 3.12.14 > > Arbiter: > # glusterfs --version > glusterfs 3.12.14 > > Here is our gluster volume status: > > # gluster volume status > Status of volume: gvAA01 > Gluster process TCP Port RDMA Port Online > Pid > > ------------------------------------------------------------------------------ > Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 > Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck1 49152 0 Y > 6931 > Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 > Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck2 49153 0 Y > 6939 > Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 > Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck3 49154 0 Y > 6947 > Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 > Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck4 49155 0 Y > 6956 > Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 > Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck5 49156 0 Y > 6964 > Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 > Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck6 49157 0 Y > 6974 > Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 > Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck7 49158 0 Y > 6984 > Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 > Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck8 49159 0 Y > 6993 > Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 > Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 > Brick 00-A:/arbiterAA01/gvAA01/bri > ck9 49160 0 Y > 7001 > NFS Server on localhost 2049 0 Y > 17276 > Self-heal Daemon on localhost N/A N/A Y > 25245 > NFS Server on 02-B 2049 0 Y 9089 > Self-heal Daemon on 02-B N/A N/A Y 17838 > NFS Server on 00-a 2049 0 Y 15660 > Self-heal Daemon on 00-a N/A N/A Y 16218 > > Task Status of Volume gvAA01 > > ------------------------------------------------------------------------------ > There are no active volume tasks > > And gluster volume info: > > # gluster volume info > > Volume Name: gvAA01 > Type: Distributed-Replicate > Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 > Status: Started > Snapshot Count: 0 > Number of Bricks: 9 x (2 + 1) = 27 > Transport-type: tcp > Bricks: > Brick1: 01-B:/brick1/gvAA01/brick > Brick2: 02-B:/brick1/gvAA01/brick > Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) > Brick4: 01-B:/brick2/gvAA01/brick > Brick5: 02-B:/brick2/gvAA01/brick > Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) > Brick7: 01-B:/brick3/gvAA01/brick > Brick8: 02-B:/brick3/gvAA01/brick > Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) > Brick10: 01-B:/brick4/gvAA01/brick > Brick11: 02-B:/brick4/gvAA01/brick > Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) > Brick13: 01-B:/brick5/gvAA01/brick > Brick14: 02-B:/brick5/gvAA01/brick > Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) > Brick16: 01-B:/brick6/gvAA01/brick > Brick17: 02-B:/brick6/gvAA01/brick > Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) > Brick19: 
01-B:/brick7/gvAA01/brick > Brick20: 02-B:/brick7/gvAA01/brick > Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) > Brick22: 01-B:/brick8/gvAA01/brick > Brick23: 02-B:/brick8/gvAA01/brick > Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) > Brick25: 01-B:/brick9/gvAA01/brick > Brick26: 02-B:/brick9/gvAA01/brick > Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) > Options Reconfigured: > cluster.shd-max-threads: 4 > performance.least-prio-threads: 16 > cluster.readdir-optimize: on > performance.quick-read: off > performance.stat-prefetch: off > cluster.data-self-heal: on > cluster.lookup-unhashed: auto > cluster.lookup-optimize: on > cluster.favorite-child-policy: mtime > server.allow-insecure: on > transport.address-family: inet > client.bind-insecure: on > cluster.entry-self-heal: off > cluster.metadata-self-heal: off > performance.md-cache-timeout: 600 > cluster.self-heal-daemon: enable > performance.readdir-ahead: on > diagnostics.brick-log-level: INFO > nfs.disable: off > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From patrickmrennie at gmail.com Wed Apr 24 07:27:54 2019 From: patrickmrennie at gmail.com (Patrick Rennie) Date: Wed, 24 Apr 2019 15:27:54 +0800 Subject: [Gluster-users] Extremely slow Gluster performance In-Reply-To: References: Message-ID: Hi Nithya, Thanks for your reply. I believe this issue first began a few weeks ago, and has been intermittent and gradually gotten worse. I've seen it take as long as 20 minutes to do a simple ls on our gluster mount via FUSE. My colleague upgraded us from 3.12 to 3.13 about 2 days ago as he thought it might help but it didn't change much. Immediately following the upgrade, the performance was good, but within 10-15 minutes it had gotten much worse again. We have one brick process which has been killed right now, and performance has gone from 5-20 mins to do an "ls" down to about 20-60 seconds, we are hoping that this will allow some other bricks to heal and then once those issues are resolved, we will restart this brick process again. We determined which brick to try and kill by monitoring the load of our disks and finding out which disk was struggling the most. Within seconds of killing the brick process, performance significantly improved, then gradually got a little bit worse again. Our current theory is that there are too many heal operations going on, and our CPU/disks are struggling a bit with all of the load from heals, plus hundreds of clients accessing and trying to write new data. This is probably not the best way to go about it, but we have had to do this to keep things usable for our clients. Our workload consists of 24/7 backup data being written to this cluster. I am also in the process of trying to delete several million files from a directory we no longer need to reduce some small file i/o from the cluster. I believe there are still a lot of files which do need to be healed, I ran a "gluster volume heal info summary" which I can do now in 3.13 and there are a handful of files on some bricks which need healing, and several thousand on one brick, I will include a copy of that too. I have done as you suggested and collected 2 sample tcp dumps for you. If it's OK with you, I will email this to you separately as it may contain sensitive production data. Thank you for your assistance, much appreciated. 
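For tracking whether the heal backlog is actually shrinking while that brick process is down, a rough sketch; heal-count is available on recent 3.x releases, though the exact output format varies between versions:

# gluster volume heal gvAA01 info summary
# gluster volume heal gvAA01 statistics heal-count
# watch -n 300 'gluster volume heal gvAA01 statistics heal-count'

If the per-brick counts trend downwards between samples, the self-heal daemon is making progress; if they keep growing while the backup clients write, healing is falling behind the incoming load.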
Kind Regards, - Patrick On Wed, Apr 24, 2019 at 12:04 PM Nithya Balachandran wrote: > Hi Patrick, > > Did this start only after the upgrade? > How do you determine which brick process to kill? > Are there a lot of files to be healed on the volume? > > Can you provide a tcpdump of the slow listing from a separate test client > mount ? > > 1. Mount the gluster volume on a different mount point than the one > being used by your users. > 2. Start capturing packets on the system where you have mounted the > volume in (1). > - tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22 > 3. List the directory that is slow from the fuse client > 4. Stop the capture (after a couple of minutes or after the listing > returns, whichever is earlier) > 5. Send us the pcap and the listing of the same directory from one of > the bricks in order to compare the entries. > > > We may need more information post looking at the tcpdump. > > Regards, > Nithya > > On Tue, 23 Apr 2019 at 23:39, Patrick Rennie > wrote: > >> Hello Gluster Users, >> >> I am hoping someone can help me with resolving an ongoing issue I've been >> having, I'm new to mailing lists so forgive me if I have gotten anything >> wrong. We have noticed our performance deteriorating over the last few >> weeks, easily measured by trying to do an ls on one of our top-level >> folders, and timing it, which usually would take 2-5 seconds, and now takes >> up to 20 minutes, which obviously renders our cluster basically unusable. >> This has been intermittent in the past but is now almost constant and I am >> not sure how to work out the exact cause. We have noticed some errors in >> the brick logs, and have noticed that if we kill the right brick process, >> performance instantly returns back to normal, this is not always the same >> brick, but it indicates to me something in the brick processes or >> background tasks may be causing extreme latency. Due to this ability to fix >> it by killing the right brick process off, I think it's a specific file, or >> folder, or operation which may be hanging and causing the increased >> latency, but I am not sure how to work it out. One last thing to add is >> that our bricks are getting quite full (~95% full), we are trying to >> migrate data off to new storage but that is going slowly, not helped by >> this issue. I am currently trying to run a full heal as there appear to be >> many files needing healing, and I have all brick processes running so they >> have an opportunity to heal, but this means performance is very poor. It >> currently takes over 15-20 minutes to do an ls of one of our top-level >> folders, which just contains 60-80 other folders, this should take 2-5 >> seconds. This is all being checked by FUSE mount locally on the storage >> node itself, but it is the same for other clients and VMs accessing the >> cluster. Initially it seemed our NFS mounts were not affected and operated >> at normal speed, but testing over the last day has shown that our NFS >> clients are also extremely slow, so it doesn't seem specific to FUSE as I >> first thought it might be. >> >> I am not sure how to proceed from here, I am fairly new to gluster having >> inherited this setup from my predecessor and trying to keep it going. I >> have included some info below to try and help with diagnosis, please let me >> know if any further info would be helpful. I would really appreciate any >> advice on what I could try to work out the cause. 
Thank you in advance for >> reading this, and any suggestions you might be able to offer. >> >> - Patrick >> >> This is an example of the main error I see in our brick logs, there have >> been others, I can post them when I see them again too: >> [2019-04-20 04:54:43.055680] E [MSGID: 113001] >> [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on >> /brick1/ library: system.posix_acl_default [Operation not >> supported] >> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] >> 0-gvAA01-posix: Extended attributes not supported (try remounting brick >> with 'user_xattr' flag) >> >> Our setup consists of 2 storage nodes and an arbiter node. I have noticed >> our nodes are on slightly different versions, I'm not sure if this could be >> an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - >> total capacity is around 560TB. >> We have bonded 10gbps NICS on each node, and I have tested bandwidth with >> iperf and found that it's what would be expected from this config. >> Individual brick performance seems ok, I've tested several bricks using >> dd and can write a 10GB files at 1.7GB/s. >> >> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >> 10000+0 records in >> 10000+0 records out >> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >> >> Node 1: >> # glusterfs --version >> glusterfs 3.12.15 >> >> Node 2: >> # glusterfs --version >> glusterfs 3.12.14 >> >> Arbiter: >> # glusterfs --version >> glusterfs 3.12.14 >> >> Here is our gluster volume status: >> >> # gluster volume status >> Status of volume: gvAA01 >> Gluster process TCP Port RDMA Port Online >> Pid >> >> ------------------------------------------------------------------------------ >> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck1 49152 0 Y >> 6931 >> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck2 49153 0 Y >> 6939 >> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck3 49154 0 Y >> 6947 >> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck4 49155 0 Y >> 6956 >> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck5 49156 0 Y >> 6964 >> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck6 49157 0 Y >> 6974 >> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck7 49158 0 Y >> 6984 >> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck8 49159 0 Y >> 6993 >> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >> Brick 00-A:/arbiterAA01/gvAA01/bri >> ck9 49160 0 Y >> 7001 >> NFS Server on localhost 2049 0 Y >> 17276 >> Self-heal Daemon on localhost N/A N/A Y >> 25245 >> NFS Server on 02-B 2049 0 Y 9089 >> Self-heal Daemon on 02-B N/A N/A Y 17838 >> NFS Server on 00-a 2049 0 Y 15660 >> Self-heal Daemon on 00-a N/A N/A Y 16218 >> >> Task Status of Volume gvAA01 >> >> 
------------------------------------------------------------------------------ >> There are no active volume tasks >> >> And gluster volume info: >> >> # gluster volume info >> >> Volume Name: gvAA01 >> Type: Distributed-Replicate >> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 9 x (2 + 1) = 27 >> Transport-type: tcp >> Bricks: >> Brick1: 01-B:/brick1/gvAA01/brick >> Brick2: 02-B:/brick1/gvAA01/brick >> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >> Brick4: 01-B:/brick2/gvAA01/brick >> Brick5: 02-B:/brick2/gvAA01/brick >> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >> Brick7: 01-B:/brick3/gvAA01/brick >> Brick8: 02-B:/brick3/gvAA01/brick >> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >> Brick10: 01-B:/brick4/gvAA01/brick >> Brick11: 02-B:/brick4/gvAA01/brick >> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >> Brick13: 01-B:/brick5/gvAA01/brick >> Brick14: 02-B:/brick5/gvAA01/brick >> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >> Brick16: 01-B:/brick6/gvAA01/brick >> Brick17: 02-B:/brick6/gvAA01/brick >> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >> Brick19: 01-B:/brick7/gvAA01/brick >> Brick20: 02-B:/brick7/gvAA01/brick >> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >> Brick22: 01-B:/brick8/gvAA01/brick >> Brick23: 02-B:/brick8/gvAA01/brick >> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >> Brick25: 01-B:/brick9/gvAA01/brick >> Brick26: 02-B:/brick9/gvAA01/brick >> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >> Options Reconfigured: >> cluster.shd-max-threads: 4 >> performance.least-prio-threads: 16 >> cluster.readdir-optimize: on >> performance.quick-read: off >> performance.stat-prefetch: off >> cluster.data-self-heal: on >> cluster.lookup-unhashed: auto >> cluster.lookup-optimize: on >> cluster.favorite-child-policy: mtime >> server.allow-insecure: on >> transport.address-family: inet >> client.bind-insecure: on >> cluster.entry-self-heal: off >> cluster.metadata-self-heal: off >> performance.md-cache-timeout: 600 >> cluster.self-heal-daemon: enable >> performance.readdir-ahead: on >> diagnostics.brick-log-level: INFO >> nfs.disable: off >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From budic at onholyground.com Wed Apr 24 10:04:01 2019 From: budic at onholyground.com (Darrell Budic) Date: Wed, 24 Apr 2019 05:04:01 -0500 Subject: [Gluster-users] Extremely slow cluster performance In-Reply-To: References: <93FC9B39-2E8C-4579-8C9D-DEF1A28B7384@onholyground.com> <0A865F28-C4A6-41EF-AE37-70216670B4F0@onholyground.com> Message-ID: Patrick- What did you upgrade to? I?m probably missing something, but there wasn?t really a 3.13 version, and it isn?t listed on https://www.gluster.org/release-schedule/ Sorry about the confusion between dispersed and distribute-replicate, you?re absolutely correct that you need the normal shd max-threads settings there. Any improvements over time? Did you make sure each client or VM host can see all the servers? I?ve had an issue where a client was only talking to one of the servers, so it forced the servers to heal everything all the time, had a big performance impact. Probably don?t apply to an NFS mount, but may to your fuse mounts. Along those lines, any errors on the switches connecting the servers to the clients? 
Could explain why one is slow and the other isn?t so slow if one?s erroring a lot on the net. -Darrell > On Apr 23, 2019, at 5:07 AM, Patrick Rennie wrote: > > Hi Darrel, > > Thanks again for your advice, I tried to take yesterday off and just not think about it, back at it again today. Still no real progress, however my colleague upgraded our version to 3.13 yesterday, this has broken NFS and caused some other issues for us now. It did add the 'gluster volume heal info summary' so I can use that to try and keep an eye on how many files do seem to need healing, if it's accurate it's possibly less than I though. > > We are in the progress of moving this data to new storage, but it does take a long time to move so much data around, and more keeps coming in each day. > > We do have 3 cache SSDs for each brick so generally performance on the bricks themselves is quite quick, I can DD a 10GB file at ~1.7-2GB/s directly on a brick so I think the performance of each brick is actually ok. > > It's a distribute/replicate volume, not dispearsed so I can't change disperse.shd-max-threads. > > I have checked the basics like all peers connected and no scrubs in progress etc. > > Will keep working away at this, and will start to read through some of your performance tuning suggestions. Really appreciate your advice. > > Cheers, > > -Patrick > > > > On Mon, Apr 22, 2019 at 12:43 AM Darrell Budic > wrote: > Patrick- > > Specifically re: >> Thanks again for your advice, I've left it for a while but unfortunately it's still just as slow and causing more problems for our operations now. I will need to try and take some steps to at least bring performance back to normal while continuing to investigate the issue longer term. I can definitely see one node with heavier CPU than the other, almost double, which I am OK with, but I think the heal process is going to take forever, trying to check the "gluster volume heal info" shows thousands and thousands of files which may need healing, I have no idea how many in total the command is still running after hours, so I am not sure what has gone so wrong to cause this. >> ... >> I have no idea how long the healing is going to take on this cluster, we have around 560TB of data on here, but I don't think I can wait that long to try and restore performance to normal. > > You?re in a bind, I know, but it?s just going to take some time recover. You have a lot of data, and even at the best speeds your disks and networks can muster, it?s going to take a while. Until your cluster is fully healed, anything else you try may not have the full effect it would on a fully operational cluster. Your predecessor may have made things worse by not having proper posix attributes on the ZFS file system. You may have made things worse by killing brick processes in your distributed-replicated setup, creating an additional need for healing and possibly compounding the overall performance issues. I?m not trying to blame you or make you feel bad, but I do want to point out that there?s a problem here, and there is unlikely to be a silver bullet that will resolve the issue instantly. You?re going to have to give it time to get back into a ?normal" condition, which seems to be what your setup was configured and tested for in the first place. > > Those things said, rather than trying to move things from this cluster to different storage, what about having your VMs mount different storage in the first place and move the write load off of this cluster while it recovers? 
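On the earlier point about a client that only talks to one of the servers, a quick way to sanity-check connectivity from both sides; bond0 is a placeholder interface name and the port range comes from the volume status posted earlier in the thread:

# gluster peer status
      (every peer should show "Peer in Cluster (Connected)")
# gluster volume status gvAA01 clients
      (each brick should list roughly the same set of client addresses)
# ss -tn | grep ':4915'
      (run on a client: a rough check that it holds TCP connections to the
       brick ports 49152-49160 on both storage nodes, not just one)
# ip -s link show bond0
      (RX/TX error and drop counters on the bonded interface, servers and clients)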
> > Looking at the profile you posted for Strahil, your bricks are spending a lot of time doing LOOKUPs, and some are slower than others by a significant margin. If you haven?t already, check the zfs pools on those, make sure they don?t have any failed disks that might be slowing them down. Consider if you can speed them up with a ZIL or SLOG if they are spinning disks (although your previous server descriptions sound like you don?t need a SLOG, ZILs may help fi they are HDDs)? Just saw your additional comments that one server is faster than than the other, it?s possible that it?s got the actual data and the other one is doing healings every time it gets accessed, or it?s just got fuller and slower volumes. It may make sense to try forcing all your VM mounts to the faster server for a while, even if it?s the one with higher load (serving will get preference to healing, but don?t push the shd-max-threads too high, they can squash performance. Given it?s a dispersed volume, make sure you?ve got disperse.shd-max-threads at 4 or 8, and raise disperse.shd-wait-qlength to 4096 or so. > > You?re getting into things best tested with everything working, but desperate times call for accelerated testing, right? > > You could experiment with different values of performance.io -thread-cound, try 48. But if your CPU load is already near max, you?re getting everything you can out of your CPU already, so don?t spend too much time on it. > > Check out https://github.com/gluster/glusterfs/blob/release-3.11/extras/group-nl-cache and try applying these to your gluster volume. Without knowing more about your workload, these may help if you?re doing a lot of directory listing and file lookups or tests for the (non)existence of a file from your VMs. If those help, search the mailing list for info on the mount option ?negative_cache=1? and a thread titled '[Gluster-users] Gluster native mount is really slow compared to nfs?, it may have some client side mount options that could give you further benefits. > > Have a look at https://docs.gluster.org/en/v3/Administrator%20Guide/Managing%20Volumes/#tuning-options , cluster.data-sef-heal-algorithm full may help things heal faster for you. performance.flush-behind & related may improve write response to the clients, use caution unless you have UPSs & battery backed raids, etc. If you have stats on network traffic on/between your two ?real? node servers, you can use that as a proxy value for healing performance. > > I looked up the performance.stat-prefetch bug for you, it was fixed back in 3.8, so it should be safe to enable on your 3.12.x system even with servers at .15 & .14. > > You?ll probably have to wait for devs to get anything else out of those logs, but make sure your servers can all see each other (gluster peer status, everything should be ?Peer in Cluster (Connected)? on all servers), and all 3 see all the bricks in the ?gluster vol status?. Maybe check for split brain files on those you keep seeing in the logs? > > Good luck, have patience, and remember (& remind others) that things are not in their normal state at this moment, and look for things outside of the gluster server cluster to try to help (https://joejulian.name/post/optimizing-web-performance-with-glusterfs/ ) get through the healing as well. 
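A sketch of how the tunables mentioned above could be applied and rolled back one at a time, so any change that hurts can be undone quickly. The values are the ones suggested in this thread rather than general recommendations, and 'group nl-cache' only works if the nl-cache group file is installed under /var/lib/glusterd/groups on this version:

# gluster volume get gvAA01 performance.io-thread-count
# gluster volume set gvAA01 performance.io-thread-count 48
# gluster volume set gvAA01 cluster.data-self-heal-algorithm full
# gluster volume set gvAA01 cluster.shd-max-threads 8
# gluster volume set gvAA01 group nl-cache
# gluster volume reset gvAA01 performance.io-thread-count
      (reset reverts a single option to its default if a change backfires)

Checking 'gluster volume get gvAA01 all' after each change confirms what is actually in effect, which is useful when options are being set from more than one terminal during an incident.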
> > -Darrell > >> On Apr 21, 2019, at 4:41 AM, Patrick Rennie > wrote: >> >> Another small update from me, I have been keeping an eye on the glustershd.log file to see what is going on and I keep seeing the same file names come up in there every 10 minutes, but not a lot of other activity. Logs below. >> How can I be sure my heal is progressing through the files which actually need to be healed? I thought it would show up in these logs. >> I also increased the "cluster.shd-max-threads" from 4 to 8 to try and speed things up too. >> >> Any ideas here? >> >> Thanks, >> >> - Patrick >> >> On 01-B >> ------- >> [2019-04-21 09:12:54.575689] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904 >> [2019-04-21 09:12:54.733601] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 5354c112-2e58-451d-a6f7-6bfcc1c9d904. sources=[0] 2 sinks=1 >> [2019-04-21 09:13:12.028509] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe >> [2019-04-21 09:13:12.047470] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 >> >> [2019-04-21 09:23:13.044377] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe >> [2019-04-21 09:23:13.051479] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 >> >> [2019-04-21 09:33:07.400369] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. sources=[0] 2 sinks=1 >> [2019-04-21 09:33:11.825449] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa >> [2019-04-21 09:33:14.029837] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe >> [2019-04-21 09:33:14.037436] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 >> [2019-04-21 09:33:23.913882] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 2fd9899f-192b-49cb-ae9c-df35d3f004fa. sources=[0] 2 sinks=1 >> [2019-04-21 09:33:43.874201] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1 >> [2019-04-21 09:34:02.273898] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on c25b80fd-f7df-4c6d-92bd-db930e89a0b1. 
sources=[0] 2 sinks=1 >> [2019-04-21 09:35:12.282045] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. sources=[0] 2 sinks=1 >> [2019-04-21 09:35:15.146252] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885 >> [2019-04-21 09:35:15.254538] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 94027f22-a7d7-4827-be0d-09cf5ddda885. sources=[0] 2 sinks=1 >> [2019-04-21 09:35:22.900803] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. sources=[0] 2 sinks=1 >> [2019-04-21 09:35:27.150963] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45 >> [2019-04-21 09:35:29.186295] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 84c93069-cfd8-441b-a6e8-958bed535b45. sources=[0] 2 sinks=1 >> [2019-04-21 09:35:35.967451] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. sources=[0] 2 sinks=1 >> [2019-04-21 09:35:40.733444] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9 >> [2019-04-21 09:35:58.707593] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on e747c32e-4353-4173-9024-855c69cdf9b9. sources=[0] 2 sinks=1 >> [2019-04-21 09:36:25.554260] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed data selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. sources=[0] 2 sinks=1 >> [2019-04-21 09:36:26.031422] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-gvAA01-replicate-6: performing metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d >> [2019-04-21 09:36:26.083982] I [MSGID: 108026] [afr-self-heal-common.c:1726:afr_log_selfheal] 0-gvAA01-replicate-6: Completed metadata selfheal on 4758d581-9de0-403b-af8b-bfd3d71d020d. 
sources=[0] 2 sinks=1 >> >> On 02-B >> ------- >> [2019-04-21 09:03:15.815250] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 >> [2019-04-21 09:03:15.863153] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 >> [2019-04-21 09:03:15.867432] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f >> [2019-04-21 09:03:15.875134] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 >> [2019-04-21 09:03:39.020198] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe >> [2019-04-21 09:03:39.027345] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 >> >> [2019-04-21 09:13:18.524874] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 >> [2019-04-21 09:13:20.070172] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 >> [2019-04-21 09:13:20.074977] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f >> [2019-04-21 09:13:20.080827] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 >> [2019-04-21 09:13:40.015763] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe >> [2019-04-21 09:13:40.021805] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 >> >> [2019-04-21 09:23:21.991032] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 >> [2019-04-21 09:23:22.054565] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 >> [2019-04-21 09:23:22.059225] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f >> [2019-04-21 09:23:22.066266] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 
65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 >> [2019-04-21 09:23:41.129962] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe >> [2019-04-21 09:23:41.135919] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 >> >> [2019-04-21 09:33:24.015223] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01 >> [2019-04-21 09:33:24.069686] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 8fa9513e-82fd-4a6e-8ac9-e1f1cd8afb01/C_VOL-b003-i174-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 >> [2019-04-21 09:33:24.074341] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-4: performing entry selfheal on 65f6d9fb-a441-4e47-b91a-0936d11a8c8f >> [2019-04-21 09:33:24.080065] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-4: expunging file 65f6d9fb-a441-4e47-b91a-0936d11a8c8f/C_VOL-b001-i14937-cd.md5.tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-14 >> [2019-04-21 09:33:42.099515] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-gvAA01-replicate-5: performing entry selfheal on 30547ab6-1fbd-422e-9c81-2009f9ff7ebe >> [2019-04-21 09:33:42.107481] W [MSGID: 108015] [afr-self-heal-entry.c:56:afr_selfheal_entry_delete] 0-gvAA01-replicate-5: expunging file 30547ab6-1fbd-422e-9c81-2009f9ff7ebe/XXXXXXXX.vbm_346744_tmp (00000000-0000-0000-0000-000000000000) on gvAA01-client-17 >> >> >> On Sun, Apr 21, 2019 at 3:55 PM Patrick Rennie > wrote: >> Just another small update, I'm continuing to watch my brick logs and I just saw these errors come up in the recent events too. I am going to continue to post any errors I see in the hope of finding the right one to try and fix.. >> This is from the logs on brick1, seems to be occurring on both nodes on brick1, although at different times. I'm not sure what this means, can anyone shed any light? >> I guess I am looking for some kind of specific error which may indicate something is broken or stuck and locking up and causing the extreme latency I'm seeing in the cluster. 
>> >> [2019-04-21 07:25:55.064497] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c700c, Program: GlusterFS 3.3, ProgVers: 330, Proc: 29) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064612] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e58a) [0x7f3b3e93158a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17d45) [0x7f3b3e4c5d45] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064675] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c70af, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064705] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064742] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c723c, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064768] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064812] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c72b4, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064837] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064880] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c740b, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064905] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064939] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c7441, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.064962] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] 
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.064996] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c74d5, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.065020] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.065052] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c7551, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.065076] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> [2019-04-21 07:25:55.065110] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7c76d1, Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (tcp.gvAA01-server) >> [2019-04-21 07:25:55.065133] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/debug/io-stats.so(+0x1e8fa) [0x7f3b3e9318fa] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x17f35) [0x7f3b3e4c5f35] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.15/xlator/protocol/server.so(+0x92cd) [0x7f3b3e4b72cd] ) 0-: Reply submission failed >> >> Thanks again, >> >> -Patrick >> >> On Sun, Apr 21, 2019 at 3:50 PM Patrick Rennie > wrote: >> Hi Darrell, >> >> Thanks again for your advice, I've left it for a while but unfortunately it's still just as slow and causing more problems for our operations now. I will need to try and take some steps to at least bring performance back to normal while continuing to investigate the issue longer term. I can definitely see one node with heavier CPU than the other, almost double, which I am OK with, but I think the heal process is going to take forever, trying to check the "gluster volume heal info" shows thousands and thousands of files which may need healing, I have no idea how many in total the command is still running after hours, so I am not sure what has gone so wrong to cause this. >> >> I've checked cluster.op-version and cluster.max-op-version and it looks like I'm on the latest version there. >> >> I have no idea how long the healing is going to take on this cluster, we have around 560TB of data on here, but I don't think I can wait that long to try and restore performance to normal. >> >> Can anyone think of anything else I can try in the meantime to work out what's causing the extreme latency? >> >> I've been going through cluster client the logs of some of our VMs and on some of our FTP servers I found this in the cluster mount log, but I am not seeing it on any of our other servers, just our FTP servers. 
>> >> [2019-04-21 07:16:19.925388] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:19:43.413834] W [MSGID: 114031] [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-19: remote operation failed [No such file or directory] >> [2019-04-21 07:19:43.414153] W [MSGID: 114031] [client-rpc-fops.c:2203:client3_3_setattr_cbk] 0-gvAA01-client-20: remote operation failed [No such file or directory] >> [2019-04-21 07:23:33.154717] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> [2019-04-21 07:33:24.943913] E [MSGID: 101046] [dht-common.c:1904:dht_revalidate_cbk] 0-gvAA01-dht: dict is null >> >> Any ideas what this could mean? I am basically just grasping at straws here. >> >> I am going to hold off on the version upgrade until I know there are no files which need healing, which could be a while, from some reading I've done there shouldn't be any issues with this as both are on v3.12.x >> >> I've free'd up a small amount of space, but I still need to work on this further. >> >> I've read of a command "find .glusterfs -type f -links -2 -exec rm {} \;" which could be run on each brick and it would potentially clean up any files which were deleted straight from the bricks, but not via the client, I have a feeling this could help me free up about 5-10TB per brick from what I've been told about the history of this cluster. Can anyone confirm if this is actually safe to run? >> >> At this stage, I'm open to any suggestions as to how to proceed, thanks again for any advice. >> >> Cheers, >> >> - Patrick >> >> On Sun, Apr 21, 2019 at 1:22 AM Darrell Budic > wrote: >> Patrick, >> >> Sounds like progress. Be aware that gluster is expected to max out the CPUs on at least one of your servers while healing. This is normal and won't adversely affect overall performance (any more than having bricks in need of healing, at any rate) unless you're overdoing it. shd threads <= 4 should not do that on your hardware. Other tunings may have also increased overall performance, so you may see higher CPU than previously anyway. I'd recommend upping those thread counts and letting it heal as fast as possible, especially if these are dedicated Gluster storage servers (Ie: not also running VMs, etc). You should see 'normal' CPU use once heals are completed. I see ~15-30% overall normally, 95-98% while healing (x my 20 cores). It's also likely to be different between your servers, in a pure replica, one tends to max and one tends to be a little higher, in a distributed-replica, I'd expect more than one to run harder while healing. >> >> Keep the differences between doing an ls on a brick and doing an ls on a gluster mount in mind. When you do a ls on a gluster volume, it isn't just doing a ls on one brick, it's effectively doing it on ALL of your bricks, and they all have to return data before the ls succeeds. In a distributed volume, it's figuring out where on each volume things live and getting the stat() from each to assemble the whole thing. And if things are in need of healing, it will take even longer to decide which version is current and use it (shd triggers a heal anytime it encounters this). Any of these things being slow slows down the overall response. >> >> At this point, I'd get some sleep too, and let your cluster heal while you do. I'd really want it fully healed before I did any updates anyway, so let it use CPU and get itself sorted out.
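A couple of practical sketches for the two open questions above, offered as untested examples rather than verified procedures. To watch healing progress without waiting on the very slow full "heal info" listing, the per-brick pending counters are much cheaper to query:

# gluster volume heal gvAA01 statistics heal-count

For the orphaned-file cleanup question, a safer first pass is to run the same find in report-only mode on a single brick and review the list before ever adding the rm (the brick path here is just one example brick):

# cd /brick1/gvAA01/brick
# find .glusterfs -type f -links -2 -print > /tmp/brick1-orphan-candidates.txt
# wc -l /tmp/brick1-orphan-candidates.txt

Only if the listed entries really are leftovers of files removed directly from the brick, and nothing is still pending heal, would re-running it with -exec rm {} \; instead of -print seem reasonable.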
Expect it to do a round of healing after you upgrade each machine too, this is normal so don't let the CPU spike surprise you, It's just catching up from the downtime incurred by the update and/or reboot if you did one. >> >> That reminds me, check your gluster cluster.op-version and cluster.max-op-version (gluster vol get all all | grep op-version). If op-version isn't at the max-op-version, set it to it so you're taking advantage of the latest features available to your version. >> >> -Darrell >> >>> On Apr 20, 2019, at 11:54 AM, Patrick Rennie > wrote: >>> >>> Hi Darrell, >>> >>> Thanks again for your advice, I've applied the acltype=posixacl on my zpools and I think that has reduced some of the noise from my brick logs. >>> I also bumped up some of the thread counts you suggested but my CPU load skyrocketed, so I dropped it back down to something slightly lower, but still higher than it was before, and will see how that goes for a while. >>> >>> Although low space is a definite issue, if I run an ls anywhere on my bricks directly it's instant, <1 second, and still takes several minutes via gluster, so there is still a problem in my gluster configuration somewhere. We don't have any snapshots, but I am trying to work out if any data on there is safe to delete, or if there is any way I can safely find and delete data which has been removed directly from the bricks in the past. I also have lz4 compression already enabled on each zpool which does help a bit, we get between 1.05 and 1.08x compression on this data. >>> I've tried to go through each client and checked its cluster mount logs and also my brick logs looking for errors, so far nothing is jumping out at me, but there are some warnings and errors here and there, I am trying to work out what they mean. >>> >>> It's already 1 am here and unfortunately, I'm still awake working on this issue, but I think that I will have to leave the version upgrades until tomorrow. >>> >>> Thanks again for your advice so far. If anyone has any ideas on where I can look for errors other than brick logs or the cluster mount logs to help resolve this issue, it would be much appreciated. >>> >>> Cheers, >>> >>> - Patrick >>> >>> On Sat, Apr 20, 2019 at 11:57 PM Darrell Budic > wrote: >>> See inline: >>> >>>> On Apr 20, 2019, at 10:09 AM, Patrick Rennie > wrote: >>>> >>>> Hi Darrell, >>>> >>>> Thanks for your reply, this issue seems to be getting worse over the last few days, really has me tearing my hair out. I will do as you have suggested and get started on upgrading from 3.12.14 to 3.12.15. >>>> I've checked the zfs properties and all bricks have "xattr=sa" set, but none of them has "acltype=posixacl" set, currently the acltype property shows "off", if I make these changes will it apply retroactively to the existing data? I'm unfamiliar with what this will change so I may need to look into that before I proceed. >>> >>> It is safe to apply that now, any new set/get calls will then use it if new posixacls exist, and use older if not. ZFS is good that way. It should clear up your posix_acl and posix errors over time. >>> >>>> I understand performance is going to slow down as the bricks get full, I am currently trying to free space and migrate data to some newer storage, I have fresh several hundred TB storage I just setup recently but with these performance issues it's really slow.
I also believe there is significant data which has been deleted directly from the bricks in the past, so if I can reclaim this space in a safe manner then I will have at least around 10-15% free space. >>> >>> Full ZFS volumes will have a much larger impact on performance than you'd think, I'd prioritize this. If you have been taking zfs snapshots, consider deleting them to get the overall volume free space back up. And just to be sure it's been said, delete from within the mounted volumes, don't delete directly from the bricks (gluster will just try and heal it later, compounding your issues). Does not apply to deleting other data from the ZFS volume if it's not part of the brick directory, of course. >>> >>>> These servers have dual 8 core Xeon (E5-2620v4) and 512GB of RAM so generally they have plenty of resources available, currently only using around 330/512GB of memory. >>>> >>>> I will look into what your suggested settings will change, and then will probably go ahead with your recommendations, for our specs as stated above, what would you suggest for performance.io-thread-count? >>> >>> I run single 2630v4s on my servers, which have a smaller storage footprint than yours. I'd go with 32 for performance.io-thread-count. I'd try 4 for the shd thread settings on that gear. Your memory use sounds fine, so no worries there. >>> >>>> Our workload is nothing too extreme, we have a few VMs which write backup data to this storage nightly for our clients, our VMs don't live on this cluster, but just write to it. >>> >>> If they are writing compressible data, you'll get immediate benefit by setting compression=lz4 on your ZFS volumes. It won't help any old data, of course, but it will compress new data going forward. This is another one that's safe to enable on the fly. >>> >>>> I've been going through all of the logs I can, below are some slightly sanitized errors I've come across, but I'm not sure what to make of them. The main error I am seeing is the first one below, across several of my bricks, but possibly only for specific folders on the cluster, I'm not 100% about that yet though.
>>>> >>>> [2019-04-20 05:56:59.512649] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] >>>> [2019-04-20 05:59:06.084333] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] >>>> [2019-04-20 05:59:43.289030] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] >>>> [2019-04-20 05:59:50.582257] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] >>>> [2019-04-20 06:01:42.501701] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick7/xxxxxxxxxxxxxxxxxxxx: system.posix_acl_default [Operation not supported] >>>> [2019-04-20 06:01:51.665354] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag) >>>> >>>> >>>> [2019-04-20 13:12:36.131856] E [MSGID: 113002] [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>>> [2019-04-20 13:12:36.131959] E [MSGID: 113002] [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx_62906_tmp [No data available] >>>> [2019-04-20 13:12:36.132016] E [MSGID: 115050] [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24274759: LOOKUP /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud Backup_clone1.vbm_62906_tmp), client: 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: gvAA01-posix [No data available] >>>> [2019-04-20 13:12:38.093719] E [MSGID: 115050] [server-rpc-fops.c:175:server_lookup_cbk] 0-gvAA01-server: 24276491: LOOKUP /xxxxxxxxxxxxxxxxxxxx (a7c9b4a0-b7ee-4d01-a79e-576013c8ac87/Cloud Backup_clone1.vbm_62906_tmp), client: 00-A-16217-2019/04/08-21:23:03:692424-gvAA01-client-4-0-3, error-xlator: gvAA01-posix [No data available] >>>> [2019-04-20 13:12:38.093660] E [MSGID: 113002] [posix-helpers.c:893:posix_gfid_set] 0-gvAA01-posix: gfid is null for /xxxxxxxxxxxxxxxxxxxx [Invalid argument] >>>> [2019-04-20 13:12:38.093696] E [MSGID: 113002] [posix.c:362:posix_lookup] 0-gvAA01-posix: buf->ia_gfid is null for /brick2/xxxxxxxxxxxxxxxxxxxx [No data available] >>>> >>> >>> posixacls should clear those up, as mentioned. 
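For reference, the property change being discussed is a per-dataset one-liner on ZFS on Linux; a minimal sketch, using one brick pool as an example (the actual pool/dataset names on these servers will differ):

# zfs set acltype=posixacl brick7
# zfs set xattr=sa brick7
# zfs get acltype,xattr brick7

Both properties can be changed on a live dataset; as noted above, existing files only pick up the new ACL handling as they are accessed or written again.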
>>> >>>> [2019-04-20 14:25:59.654576] E [inodelk.c:404:__inode_unlock_lock] 0-gvAA01-locks: Matching lock not found for unlock 0-9223372036854775807, by 980fdbbd367f0000 on 0x7fc4f0161440 >>>> [2019-04-20 14:25:59.654668] E [MSGID: 115053] [server-rpc-fops.c:295:server_inodelk_cbk] 0-gvAA01-server: 6092928: INODELK /xxxxxxxxxxxxxxxxxxxx.cdr$ (25b14631-a179-4274-8243-6e272d4f2ad8), client: cb-per-worker18-53637-2019/04/19-14:25:37:927673-gvAA01-client-1-0-4, error-xlator: gvAA01-locks [Invalid argument] >>>> >>>> >>>> [2019-04-20 13:35:07.495495] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x247c644, Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.gvAA01-server) >>>> [2019-04-20 13:35:07.495619] E [server.c:195:server_submit_reply] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/debug/io-stats.so(+0x1696a) [0x7ff4ae6f796a] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x2d6e8) [0x7ff4ae2a96e8] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/protocol/server.so(+0x928d) [0x7ff4ae28528d] ) 0-: Reply submission failed >>>> >>> >>> Fix the posix acls and see if these clear up over time as well, I'm unclear on what the overall effect of running without the posix acls will be to total gluster health. Your biggest problem sounds like you need to free up space on the volumes and get the overall volume health back up to par and see if that doesn't resolve the symptoms you're seeing. >>> >>> >>>> >>>> Thank you again for your assistance. It is greatly appreciated. >>>> >>>> - Patrick >>>> >>>> >>>> >>>> On Sat, Apr 20, 2019 at 10:50 PM Darrell Budic > wrote: >>>> Patrick, >>>> >>>> I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You also mention ZFS, and that error you show makes me think you need to check to be sure you have "xattr=sa" and "acltype=posixacl" set on your ZFS volumes. >>>> >>>> You also observed your bricks are crossing the 95% full line, ZFS performance will degrade significantly the closer you get to full. In my experience, this starts somewhere between 10% and 5% free space remaining, so you're in that realm. >>>> >>>> How's your free memory on the servers doing? Do you have your zfs arc cache limited to something less than all the RAM? It shares pretty well, but I've encountered situations where other things won't try and take ram back properly if they think it's in use, so ZFS never gets the opportunity to give it up. >>>> >>>> Since your volume is a disperse-replica, you might try tuning disperse.shd-max-threads, default is 1, I'd try it at 2, 4, or even more if the CPUs are beefy enough. And setting server.event-threads to 4 and client.event-threads to 8 has proven helpful in many cases. After you get upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I don't know if it matters, but I'd also recommend resetting performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or also setting performance.io-thread-count to 32 if those have beefy CPUs. >>>> >>>> Beyond those general ideas, more info about your hardware (CPU and RAM) and workload (VMs, direct storage for web servers or enders, etc) may net you some more ideas. Then you're going to have to do more digging into brick logs looking for errors and/or warnings to see what's going on.
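Spelled out as commands, the tuning mentioned above would look roughly like the following. This is an illustrative sketch only: it uses cluster.shd-max-threads, the replicate-volume counterpart of the disperse option named above (this volume is distributed-replicate), and each setting can be undone with 'gluster volume reset gvAA01 <option>':

# gluster volume set gvAA01 cluster.shd-max-threads 4
# gluster volume set gvAA01 server.event-threads 4
# gluster volume set gvAA01 client.event-threads 8
# gluster volume set gvAA01 performance.io-thread-count 32
# gluster volume get gvAA01 all | grep -E 'shd-max-threads|event-threads|io-thread-count'

The values are the ones suggested in this thread, not measured recommendations; raise or lower them based on the CPU load observed while heals are running.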
>>>> >>>> -Darrell >>>> >>>> >>>>> On Apr 20, 2019, at 8:22 AM, Patrick Rennie > wrote: >>>>> >>>>> Hello Gluster Users, >>>>> >>>>> I am hoping someone can help me with resolving an ongoing issue I've been having, I'm new to mailing lists so forgive me if I have gotten anything wrong. We have noticed our performance deteriorating over the last few weeks, easily measured by trying to do an ls on one of our top-level folders, and timing it, which usually would take 2-5 seconds, and now takes up to 20 minutes, which obviously renders our cluster basically unusable. This has been intermittent in the past but is now almost constant and I am not sure how to work out the exact cause. We have noticed some errors in the brick logs, and have noticed that if we kill the right brick process, performance instantly returns back to normal, this is not always the same brick, but it indicates to me something in the brick processes or background tasks may be causing extreme latency. Due to this ability to fix it by killing the right brick process off, I think it's a specific file, or folder, or operation which may be hanging and causing the increased latency, but I am not sure how to work it out. One last thing to add is that our bricks are getting quite full (~95% full), we are trying to migrate data off to new storage but that is going slowly, not helped by this issue. I am currently trying to run a full heal as there appear to be many files needing healing, and I have all brick processes running so they have an opportunity to heal, but this means performance is very poor. It currently takes over 15-20 minutes to do an ls of one of our top-level folders, which just contains 60-80 other folders, this should take 2-5 seconds. This is all being checked by FUSE mount locally on the storage node itself, but it is the same for other clients and VMs accessing the cluster. Initially, it seemed our NFS mounts were not affected and operated at normal speed, but testing over the last day has shown that our NFS clients are also extremely slow, so it doesn't seem specific to FUSE as I first thought it might be. >>>>> >>>>> I am not sure how to proceed from here, I am fairly new to gluster having inherited this setup from my predecessor and trying to keep it going. I have included some info below to try and help with diagnosis, please let me know if any further info would be helpful. I would really appreciate any advice on what I could try to work out the cause. Thank you in advance for reading this, and any suggestions you might be able to offer. >>>>> >>>>> - Patrick >>>>> >>>>> This is an example of the main error I see in our brick logs, there have been others, I can post them when I see them again too: >>>>> [2019-04-20 04:54:43.055680] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick1/ library: system.posix_acl_default [Operation not supported] >>>>> [2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag) >>>>> >>>>> Our setup consists of 2 storage nodes and an arbiter node. I have noticed our nodes are on slightly different versions, I'm not sure if this could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - total capacity is around 560TB. >>>>> We have bonded 10gbps NICS on each node, and I have tested bandwidth with iperf and found that it's what would be expected from this config. 
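For anyone repeating that kind of network check, a typical iperf run (sketched here with the node names from the status output below; flags differ slightly between iperf and iperf3) is a server on one node and a multi-stream client on the other:

# iperf -s                    (on 01-B)
# iperf -c 01-B -P 4 -t 30    (on 02-B)

With bonded 10gbps links the aggregate should come out near line rate, though how the streams spread across the bond depends on the bonding mode in use.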
>>>>> Individual brick performance seems ok, I've tested several bricks using dd and can write a 10GB files at 1.7GB/s. >>>>> >>>>> # dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000 >>>>> 10000+0 records in >>>>> 10000+0 records out >>>>> 10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s >>>>> >>>>> Node 1: >>>>> # glusterfs --version >>>>> glusterfs 3.12.15 >>>>> >>>>> Node 2: >>>>> # glusterfs --version >>>>> glusterfs 3.12.14 >>>>> >>>>> Arbiter: >>>>> # glusterfs --version >>>>> glusterfs 3.12.14 >>>>> >>>>> Here is our gluster volume status: >>>>> >>>>> # gluster volume status >>>>> Status of volume: gvAA01 >>>>> Gluster process TCP Port RDMA Port Online Pid >>>>> ------------------------------------------------------------------------------ >>>>> Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219 >>>>> Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck1 49152 0 Y 6931 >>>>> Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239 >>>>> Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck2 49153 0 Y 6939 >>>>> Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235 >>>>> Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck3 49154 0 Y 6947 >>>>> Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840 >>>>> Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck4 49155 0 Y 6956 >>>>> Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233 >>>>> Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck5 49156 0 Y 6964 >>>>> Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234 >>>>> Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck6 49157 0 Y 6974 >>>>> Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248 >>>>> Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck7 49158 0 Y 6984 >>>>> Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253 >>>>> Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck8 49159 0 Y 6993 >>>>> Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245 >>>>> Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984 >>>>> Brick 00-A:/arbiterAA01/gvAA01/bri >>>>> ck9 49160 0 Y 7001 >>>>> NFS Server on localhost 2049 0 Y 17276 >>>>> Self-heal Daemon on localhost N/A N/A Y 25245 >>>>> NFS Server on 02-B 2049 0 Y 9089 >>>>> Self-heal Daemon on 02-B N/A N/A Y 17838 >>>>> NFS Server on 00-a 2049 0 Y 15660 >>>>> Self-heal Daemon on 00-a N/A N/A Y 16218 >>>>> >>>>> Task Status of Volume gvAA01 >>>>> ------------------------------------------------------------------------------ >>>>> There are no active volume tasks >>>>> >>>>> And gluster volume info: >>>>> >>>>> # gluster volume info >>>>> >>>>> Volume Name: gvAA01 >>>>> Type: Distributed-Replicate >>>>> Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118 >>>>> Status: Started >>>>> Snapshot Count: 0 >>>>> Number of Bricks: 9 x (2 + 1) = 27 >>>>> Transport-type: tcp >>>>> Bricks: >>>>> Brick1: 01-B:/brick1/gvAA01/brick >>>>> Brick2: 02-B:/brick1/gvAA01/brick >>>>> Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter) >>>>> Brick4: 01-B:/brick2/gvAA01/brick >>>>> Brick5: 02-B:/brick2/gvAA01/brick >>>>> Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter) >>>>> Brick7: 01-B:/brick3/gvAA01/brick >>>>> Brick8: 02-B:/brick3/gvAA01/brick >>>>> Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter) >>>>> Brick10: 01-B:/brick4/gvAA01/brick >>>>> Brick11: 
02-B:/brick4/gvAA01/brick >>>>> Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter) >>>>> Brick13: 01-B:/brick5/gvAA01/brick >>>>> Brick14: 02-B:/brick5/gvAA01/brick >>>>> Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter) >>>>> Brick16: 01-B:/brick6/gvAA01/brick >>>>> Brick17: 02-B:/brick6/gvAA01/brick >>>>> Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter) >>>>> Brick19: 01-B:/brick7/gvAA01/brick >>>>> Brick20: 02-B:/brick7/gvAA01/brick >>>>> Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter) >>>>> Brick22: 01-B:/brick8/gvAA01/brick >>>>> Brick23: 02-B:/brick8/gvAA01/brick >>>>> Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter) >>>>> Brick25: 01-B:/brick9/gvAA01/brick >>>>> Brick26: 02-B:/brick9/gvAA01/brick >>>>> Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter) >>>>> Options Reconfigured: >>>>> cluster.shd-max-threads: 4 >>>>> performance.least-prio-threads: 16 >>>>> cluster.readdir-optimize: on >>>>> performance.quick-read: off >>>>> performance.stat-prefetch: off >>>>> cluster.data-self-heal: on >>>>> cluster.lookup-unhashed: auto >>>>> cluster.lookup-optimize: on >>>>> cluster.favorite-child-policy: mtime >>>>> server.allow-insecure: on >>>>> transport.address-family: inet >>>>> client.bind-insecure: on >>>>> cluster.entry-self-heal: off >>>>> cluster.metadata-self-heal: off >>>>> performance.md-cache-timeout: 600 >>>>> cluster.self-heal-daemon: enable >>>>> performance.readdir-ahead: on >>>>> diagnostics.brick-log-level: INFO >>>>> nfs.disable: off >>>>> >>>>> Thank you for any assistance. >>>>> >>>>> - Patrick >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhi.song at windriver.com Thu Apr 25 02:21:05 2019 From: hongzhi.song at windriver.com (Hongzhi, Song) Date: Thu, 25 Apr 2019 10:21:05 +0800 Subject: [Gluster-users] Download server is crashed? Message-ID: <9d67dde1-f0f5-f304-e554-997120af4c72@windriver.com> Hi all, I try to download .tar.gz from https://download.gluster.org/pub/gluster/glusterfs/LATEST/glusterfs-6.1.tar.gz and https://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-6.1.tar.gz But connection is always been closed. Anyone else meet this issue? --Hongzhi -------------- next part -------------- An HTML attachment was scrubbed... URL: From vbellur at redhat.com Thu Apr 25 19:45:57 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Thu, 25 Apr 2019 12:45:57 -0700 Subject: [Gluster-users] Download server is crashed? In-Reply-To: <9d67dde1-f0f5-f304-e554-997120af4c72@windriver.com> References: <9d67dde1-f0f5-f304-e554-997120af4c72@windriver.com> Message-ID: On Wed, Apr 24, 2019 at 7:36 PM Hongzhi, Song wrote: > Hi all, > > I try to download .tar.gz from > https://download.gluster.org/pub/gluster/glusterfs/LATEST/ > glusterfs-6.1.tar.gz > > > and > https://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-6.1.tar.gz > > > But connection is always been closed. Anyone else meet this issue? > > > Checked now and am able to download successfully. Do you still observe the problem? Thanks, Vijay -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From phaley at mit.edu Fri Apr 26 01:05:45 2019 From: phaley at mit.edu (Pat Haley) Date: Thu, 25 Apr 2019 21:05:45 -0400 Subject: [Gluster-users] Expanding brick size in glusterfs 3.7.11 Message-ID: <29119358-18e7-caab-2535-e2530255fa75@mit.edu> Hi, Last summer we added a new brick to our gluster volume (running glusterfs 3.7.11).? The new brick was a new server with with 12 of 24 disk bays filled (we couldn't afford to fill them all at the time).? These 12 disks are managed in a hardware RAID-6.? We have recently been able to purchase another 12 disks.? We would like to just add these new disks to the existing hardware RAID and thus expand the size of the brick.? If we can successfully add them to the hardware RAID like this, will gluster have any problems with the expanded brick size? -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Pat Haley Email: phaley at mit.edu Center for Ocean Engineering Phone: (617) 253-6824 Dept. of Mechanical Engineering Fax: (617) 253-8125 MIT, Room 5-213 http://web.mit.edu/phaley/www/ 77 Massachusetts Avenue Cambridge, MA 02139-4301 From hongzhi.song at windriver.com Fri Apr 26 01:27:04 2019 From: hongzhi.song at windriver.com (Hongzhi, Song) Date: Fri, 26 Apr 2019 09:27:04 +0800 Subject: [Gluster-users] Download server is crashed? In-Reply-To: References: <9d67dde1-f0f5-f304-e554-997120af4c72@windriver.com> Message-ID: Yeah, it's works for me. Thanks very much. --Hongzhi On 4/26/19 3:45 AM, Vijay Bellur wrote: > > > On Wed, Apr 24, 2019 at 7:36 PM Hongzhi, Song > > wrote: > > Hi all, > > I try to download .tar.gz from > https://download.gluster.org/pub/gluster/glusterfs/LATEST/glusterfs-6.1.tar.gz > > > and > https://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-6.1.tar.gz > > > But connection is always been closed. Anyone else meet this issue? > > > > Checked now and am able to download successfully. Do you still observe > the problem? > > Thanks, > Vijay -------------- next part -------------- An HTML attachment was scrubbed... URL: From jim.kinney at gmail.com Fri Apr 26 03:35:43 2019 From: jim.kinney at gmail.com (Jim Kinney) Date: Thu, 25 Apr 2019 23:35:43 -0400 Subject: [Gluster-users] Expanding brick size in glusterfs 3.7.11 In-Reply-To: <29119358-18e7-caab-2535-e2530255fa75@mit.edu> References: <29119358-18e7-caab-2535-e2530255fa75@mit.edu> Message-ID: I've expanded bricks using lvm and there was no problems at all with gluster seeing the change. The expansion was performed basically simultaneously on both existing bricks of a replica. I would expect the raid expansion to behave similarly. On April 25, 2019 9:05:45 PM EDT, Pat Haley wrote: > >Hi, > >Last summer we added a new brick to our gluster volume (running >glusterfs 3.7.11).? The new brick was a new server with with 12 of 24 >disk bays filled (we couldn't afford to fill them all at the time).? >These 12 disks are managed in a hardware RAID-6.? We have recently been > >able to purchase another 12 disks.? We would like to just add these new > >disks to the existing hardware RAID and thus expand the size of the >brick.? If we can successfully add them to the hardware RAID like this, > >will gluster have any problems with the expanded brick size? > >-- > >-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- >Pat Haley Email: phaley at mit.edu >Center for Ocean Engineering Phone: (617) 253-6824 >Dept. 
of Mechanical Engineering Fax: (617) 253-8125 >MIT, Room 5-213 http://web.mit.edu/phaley/www/ >77 Massachusetts Avenue >Cambridge, MA 02139-4301 > >_______________________________________________ >Gluster-users mailing list >Gluster-users at gluster.org >https://lists.gluster.org/mailman/listinfo/gluster-users -- Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pascal.suter at dalco.ch Fri Apr 26 06:22:22 2019 From: pascal.suter at dalco.ch (Pascal Suter) Date: Fri, 26 Apr 2019 08:22:22 +0200 Subject: [Gluster-users] Expanding brick size in glusterfs 3.7.11 In-Reply-To: References: <29119358-18e7-caab-2535-e2530255fa75@mit.edu> Message-ID: <5091be7d-8af9-dd7d-a258-2b2a981f2abf@dalco.ch> I may add to that that i have expanded linux filesystems (xfs and ext4) both via LVM and some by adding disks to a hardware raid. from the OS point of view it does not make a difference, the procedure once the block device on which the filesytem resides is expanded is prettymuch the same and so far always worked like a charm. one word of caution though: i've just recently had a case with a raid 6 across 12 disks (1TB, a 5 year old RAID array) where during a planned power outage a disk failed, when turnging the storage back on, a second failed right after that and the third failed during rebuild. luckily this was a retired server used for backup only, so no harm done.. but this just shows us, that under the "ritght" circumstances, multi disk failures are possible. the more disks you have in your raidset the higher the chance of a disk failure.. by doubling the amount of disks in your raidset you double the chance of a disk failure and therefore a double or tripple disk failure as well. long story short.. i'd consider creating a second raid acorss your 12 new disks and adding this as a second brick to gluster storage.. that's what gluster's for after all .. to scale your storage :) in the case of raid 6 you will loose the capacity of two disks but you will gain alot in terms of redundancy and dataprotection. also you will not have the performance impact of the raid expansion.. this is usually a rather long process which will eat a lot of your performance while it's ongoing. of course, if you have mirrored bricks, that's a different story, but i assume you don't. cheers Pascal On 26.04.19 05:35, Jim Kinney wrote: > I've expanded bricks using lvm and there was no problems at all with > gluster seeing the change. The expansion was performed basically > simultaneously on both existing bricks of a replica. I would expect > the raid expansion to behave similarly. > > On April 25, 2019 9:05:45 PM EDT, Pat Haley wrote: > > Hi, > > Last summer we added a new brick to our gluster volume (running > glusterfs 3.7.11).? The new brick was a new server with with 12 of 24 > disk bays filled (we couldn't afford to fill them all at the time). > These 12 disks are managed in a hardware RAID-6.? We have recently been > able to purchase another 12 disks.? We would like to just add these new > disks to the existing hardware RAID and thus expand the size of the > brick.? If we can successfully add them to the hardware RAID like this, > will gluster have any problems with the expanded brick size? > > -- > ------------------------------------------------------------------------ > Pat Haley Email: phaley at mit.edu > Center for Ocean Engineering Phone: (617) 253-6824 > Dept. 
of Mechanical Engineering Fax: (617) 253-8125 > MIT, Room 5-213http://web.mit.edu/phaley/www/ > 77 Massachusetts Avenue > Cambridge, MA 02139-4301 > ------------------------------------------------------------------------ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > -- > Sent from my Android device with K-9 Mail. All tyopes are thumb > related and reflect authenticity. > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From aspandey at redhat.com Fri Apr 26 06:44:07 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Fri, 26 Apr 2019 02:44:07 -0400 (EDT) Subject: [Gluster-users] Expanding brick size in glusterfs 3.7.11 In-Reply-To: <5091be7d-8af9-dd7d-a258-2b2a981f2abf@dalco.ch> References: <29119358-18e7-caab-2535-e2530255fa75@mit.edu> <5091be7d-8af9-dd7d-a258-2b2a981f2abf@dalco.ch> Message-ID: <626137912.15064407.1556261047198.JavaMail.zimbra@redhat.com> Pat, I would like to see the final configuration of your gluster volume after you added bricks on new node. You mentioned that - "The new brick was a new server with with 12 of 24 disk bays filled (we couldn't afford to fill them all at the time). These 12 disks are managed in a hardware RAID-6." If all the new bricks are on one new node then probably that is not a good situation to be in . @Pascal, I agree with your suggestion.. "long story short.. i'd consider creating a second raid acorss your 12 new disks and adding this as a second brick to gluster storage.. that's what gluster's for after all .. to scale your storage :) in the case of raid 6 you will loose the capacity of two disks but you will gain alot in terms of redundancy and dataprotection." --- Ashish ----- Original Message ----- From: "Pascal Suter" To: gluster-users at gluster.org Sent: Friday, April 26, 2019 11:52:22 AM Subject: Re: [Gluster-users] Expanding brick size in glusterfs 3.7.11 I may add to that that i have expanded linux filesystems (xfs and ext4) both via LVM and some by adding disks to a hardware raid. from the OS point of view it does not make a difference, the procedure once the block device on which the filesytem resides is expanded is prettymuch the same and so far always worked like a charm. one word of caution though: i've just recently had a case with a raid 6 across 12 disks (1TB, a 5 year old RAID array) where during a planned power outage a disk failed, when turnging the storage back on, a second failed right after that and the third failed during rebuild. luckily this was a retired server used for backup only, so no harm done.. but this just shows us, that under the "ritght" circumstances, multi disk failures are possible. the more disks you have in your raidset the higher the chance of a disk failure.. by doubling the amount of disks in your raidset you double the chance of a disk failure and therefore a double or tripple disk failure as well. long story short.. i'd consider creating a second raid acorss your 12 new disks and adding this as a second brick to gluster storage.. that's what gluster's for after all .. to scale your storage :) in the case of raid 6 you will loose the capacity of two disks but you will gain alot in terms of redundancy and dataprotection. also you will not have the performance impact of the raid expansion.. 
this is usually a rather long process which will eat a lot of your performance while it's ongoing. of course, if you have mirrored bricks, that's a different story, but i assume you don't. cheers Pascal On 26.04.19 05:35, Jim Kinney wrote: I've expanded bricks using lvm and there was no problems at all with gluster seeing the change. The expansion was performed basically simultaneously on both existing bricks of a replica. I would expect the raid expansion to behave similarly. On April 25, 2019 9:05:45 PM EDT, Pat Haley wrote:
Hi, Last summer we added a new brick to our gluster volume (running glusterfs 3.7.11).? The new brick was a new server with with 12 of 24 disk bays filled (we couldn't afford to fill them all at the time).? These 12 disks are managed in a hardware RAID-6.? We have recently been able to purchase another 12 disks.? We would like to just add these new disks to the existing hardware RAID and thus expand the size of the brick.? If we can successfully add them to the hardware RAID like this, will gluster have any problems with the expanded brick size? -- Pat Haley Email: phaley at mit.edu Center for Ocean Engineering Phone: (617) 253-6824 Dept. of Mechanical Engineering Fax: (617) 253-8125 MIT, Room 5-213 http://web.mit.edu/phaley/www/ 77 Massachusetts Avenue Cambridge, MA 02139-4301 Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -- Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
_______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Fri Apr 26 07:54:00 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Fri, 26 Apr 2019 13:24:00 +0530 Subject: [Gluster-users] One more way to contact Gluster team - Slack (gluster.slack.com) Message-ID: Hi All, We wanted to move to Slack from IRC for our official communication channel from sometime, but couldn't as we didn't had a proper URL for us to register. 'gluster' was taken and we didn't knew who had it registered. Thanks to constant ask from Satish, Slack team has now agreed to let us use https://gluster.slack.com and I am happy to invite you all there. (Use this link to join) Please note that, it won't be a replacement for mailing list. But can be used by all developers and users for quick communication. Also note that, no information there would be 'stored' beyond 10k lines as we are using the free version of Slack. Regards, Amar -------------- next part -------------- An HTML attachment was scrubbed... URL: From mscherer at redhat.com Fri Apr 26 08:16:39 2019 From: mscherer at redhat.com (Michael Scherer) Date: Fri, 26 Apr 2019 10:16:39 +0200 Subject: [Gluster-users] [Gluster-devel] One more way to contact Gluster team - Slack (gluster.slack.com) In-Reply-To: References: Message-ID: Le vendredi 26 avril 2019 ? 13:24 +0530, Amar Tumballi Suryanarayan a ?crit : > Hi All, > > We wanted to move to Slack from IRC for our official communication > channel > from sometime, but couldn't as we didn't had a proper URL for us to > register. 'gluster' was taken and we didn't knew who had it > registered. > Thanks to constant ask from Satish, Slack team has now agreed to let > us use > https://gluster.slack.com and I am happy to invite you all there. > (Use this > link > < > https://join.slack.com/t/gluster/shared_invite/enQtNjIxMTA1MTk3MDE1LWIzZWZjNzhkYWEwNDdiZWRiOTczMTc4ZjdiY2JiMTc3MDE5YmEyZTRkNzg0MWJiMWM3OGEyMDU2MmYzMTViYTA > > > to > join) > > Please note that, it won't be a replacement for mailing list. But can > be > used by all developers and users for quick communication. Also note > that, > no information there would be 'stored' beyond 10k lines as we are > using the > free version of Slack. Aren't we concerned about the ToS of slack ? Last time I did read them, they were quite scary (like, if you use your corporate email, you engage your employer, and that wasn't the worst part). Also, to anticipate the question, my employer Legal department told me to not setup a bridge between IRC and slack, due to the said ToS. -- Michael Scherer Sysadmin, Community Infrastructure -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: This is a digitally signed message part URL: From hunter86_bg at yahoo.com Fri Apr 26 09:30:24 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Fri, 26 Apr 2019 12:30:24 +0300 Subject: [Gluster-users] Expanding brick size in glusterfs 3.7.11 Message-ID: <917597svcslwraoaswvs1d5k.1556271024750@email.android.com> I have gluster bricks ontop of thin LVM and when I resized my LV, everything went live and without issues. As gluster is working ontop the File System, it relies on the information from it (in my case XFS). 
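The kind of online grow being described is normally just a couple of steps once the extra capacity is visible to LVM; a rough sketch for an XFS brick on LVM (the volume group, logical volume and mount point names here are invented):

# lvextend -L +20T /dev/vg_bricks/brick1
# xfs_growfs /bricks/brick1
# df -h /bricks/brick1

Gluster simply reports the larger filesystem once it has grown, so nothing needs to change on the gluster volume itself.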
Just make sure that bricks for replica volumes have the same size (verify with 'df /brick/path'). Best Regards, Strahil NikolovOn Apr 26, 2019 04:05, Pat Haley wrote: > > > Hi, > > Last summer we added a new brick to our gluster volume (running > glusterfs 3.7.11).? The new brick was a new server with with 12 of 24 > disk bays filled (we couldn't afford to fill them all at the time).? > These 12 disks are managed in a hardware RAID-6.? We have recently been > able to purchase another 12 disks.? We would like to just add these new > disks to the existing hardware RAID and thus expand the size of the > brick.? If we can successfully add them to the hardware RAID like this, > will gluster have any problems with the expanded brick size? > > -- > > -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- > Pat Haley????????????????????????? Email:? phaley at mit.edu > Center for Ocean Engineering?????? Phone:? (617) 253-6824 > Dept. of Mechanical Engineering??? Fax:??? (617) 253-8125 > MIT, Room 5-213??????????????????? http://web.mit.edu/phaley/www/ > 77 Massachusetts Avenue > Cambridge, MA? 02139-4301 > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From scott.c.worthington at gmail.com Fri Apr 26 09:59:24 2019 From: scott.c.worthington at gmail.com (Scott Worthington) Date: Fri, 26 Apr 2019 04:59:24 -0500 Subject: [Gluster-users] [Gluster-devel] One more way to contact Gluster team - Slack (gluster.slack.com) In-Reply-To: References: Message-ID: Hello, are you not _BOTH_ Red Hat FTEs or contractors? On Fri, Apr 26, 2019, 3:16 AM Michael Scherer wrote: > Le vendredi 26 avril 2019 ? 13:24 +0530, Amar Tumballi Suryanarayan a > ?crit : > > Hi All, > > > > We wanted to move to Slack from IRC for our official communication > > channel > > from sometime, but couldn't as we didn't had a proper URL for us to > > register. 'gluster' was taken and we didn't knew who had it > > registered. > > Thanks to constant ask from Satish, Slack team has now agreed to let > > us use > > https://gluster.slack.com and I am happy to invite you all there. > > (Use this > > link > > < > > > https://join.slack.com/t/gluster/shared_invite/enQtNjIxMTA1MTk3MDE1LWIzZWZjNzhkYWEwNDdiZWRiOTczMTc4ZjdiY2JiMTc3MDE5YmEyZTRkNzg0MWJiMWM3OGEyMDU2MmYzMTViYTA > > > > > to > > join) > > > > Please note that, it won't be a replacement for mailing list. But can > > be > > used by all developers and users for quick communication. Also note > > that, > > no information there would be 'stored' beyond 10k lines as we are > > using the > > free version of Slack. > > Aren't we concerned about the ToS of slack ? Last time I did read them, > they were quite scary (like, if you use your corporate email, you > engage your employer, and that wasn't the worst part). > > Also, to anticipate the question, my employer Legal department told me > to not setup a bridge between IRC and slack, due to the said ToS. > > -- > Michael Scherer > Sysadmin, Community Infrastructure > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From harold at redhat.com Fri Apr 26 12:20:21 2019 From: harold at redhat.com (Harold Miller) Date: Fri, 26 Apr 2019 08:20:21 -0400 Subject: [Gluster-users] [Gluster-devel] One more way to contact Gluster team - Slack (gluster.slack.com) In-Reply-To: References: Message-ID: Has Red Hat security cleared the Slack systems for confidential / customer information? If not, it will make it difficult for support to collect/answer questions. Harold Miller, Associate Manager, Red Hat, Enterprise Cloud Support Desk - US (650) 254-4346 On Fri, Apr 26, 2019 at 6:00 AM Scott Worthington < scott.c.worthington at gmail.com> wrote: > Hello, are you not _BOTH_ Red Hat FTEs or contractors? > > On Fri, Apr 26, 2019, 3:16 AM Michael Scherer wrote: > >> Le vendredi 26 avril 2019 ? 13:24 +0530, Amar Tumballi Suryanarayan a >> ?crit : >> > Hi All, >> > >> > We wanted to move to Slack from IRC for our official communication >> > channel >> > from sometime, but couldn't as we didn't had a proper URL for us to >> > register. 'gluster' was taken and we didn't knew who had it >> > registered. >> > Thanks to constant ask from Satish, Slack team has now agreed to let >> > us use >> > https://gluster.slack.com and I am happy to invite you all there. >> > (Use this >> > link >> > < >> > >> https://join.slack.com/t/gluster/shared_invite/enQtNjIxMTA1MTk3MDE1LWIzZWZjNzhkYWEwNDdiZWRiOTczMTc4ZjdiY2JiMTc3MDE5YmEyZTRkNzg0MWJiMWM3OGEyMDU2MmYzMTViYTA >> > > >> > to >> > join) >> > >> > Please note that, it won't be a replacement for mailing list. But can >> > be >> > used by all developers and users for quick communication. Also note >> > that, >> > no information there would be 'stored' beyond 10k lines as we are >> > using the >> > free version of Slack. >> >> Aren't we concerned about the ToS of slack ? Last time I did read them, >> they were quite scary (like, if you use your corporate email, you >> engage your employer, and that wasn't the worst part). >> >> Also, to anticipate the question, my employer Legal department told me >> to not setup a bridge between IRC and slack, due to the said ToS. >> >> -- >> Michael Scherer >> Sysadmin, Community Infrastructure >> >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- HAROLD MILLER ASSOCIATE MANAGER, ENTERPRISE CLOUD SUPPORT Red Hat Harold at RedHat.com T: (650)-254-4346 TRIED. TESTED. TRUSTED. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkeithle at redhat.com Fri Apr 26 12:57:43 2019 From: kkeithle at redhat.com (Kaleb Keithley) Date: Fri, 26 Apr 2019 08:57:43 -0400 Subject: [Gluster-users] [Gluster-devel] One more way to contact Gluster team - Slack (gluster.slack.com) In-Reply-To: References: Message-ID: On Fri, Apr 26, 2019 at 8:21 AM Harold Miller wrote: > Has Red Hat security cleared the Slack systems for confidential / customer > information? > > If not, it will make it difficult for support to collect/answer questions. > I'm pretty sure Amar meant as a replacement for the freenode #gluster and #gluster-dev channels, given that he sent this to the public gluster mailing lists @gluster.org. Nobody should have even been posting confidential and/or customer information to any of those lists or channels. And AFAIK nobody ever has. 
Amar, would you like to clarify which IRC channels you meant? > Harold Miller, Associate Manager, > Red Hat, Enterprise Cloud Support > Desk - US (650) 254-4346 > > > > On Fri, Apr 26, 2019 at 6:00 AM Scott Worthington < > scott.c.worthington at gmail.com> wrote: > >> Hello, are you not _BOTH_ Red Hat FTEs or contractors? >> >> On Fri, Apr 26, 2019, 3:16 AM Michael Scherer >> wrote: >> >>> Le vendredi 26 avril 2019 ? 13:24 +0530, Amar Tumballi Suryanarayan a >>> ?crit : >>> > Hi All, >>> > >>> > We wanted to move to Slack from IRC for our official communication >>> > channel >>> > from sometime, but couldn't as we didn't had a proper URL for us to >>> > register. 'gluster' was taken and we didn't knew who had it >>> > registered. >>> > Thanks to constant ask from Satish, Slack team has now agreed to let >>> > us use >>> > https://gluster.slack.com and I am happy to invite you all there. >>> > (Use this >>> > link >>> > < >>> > >>> https://join.slack.com/t/gluster/shared_invite/enQtNjIxMTA1MTk3MDE1LWIzZWZjNzhkYWEwNDdiZWRiOTczMTc4ZjdiY2JiMTc3MDE5YmEyZTRkNzg0MWJiMWM3OGEyMDU2MmYzMTViYTA >>> > > >>> > to >>> > join) >>> > >>> > Please note that, it won't be a replacement for mailing list. But can >>> > be >>> > used by all developers and users for quick communication. Also note >>> > that, >>> > no information there would be 'stored' beyond 10k lines as we are >>> > using the >>> > free version of Slack. >>> >>> Aren't we concerned about the ToS of slack ? Last time I did read them, >>> they were quite scary (like, if you use your corporate email, you >>> engage your employer, and that wasn't the worst part). >>> >>> Also, to anticipate the question, my employer Legal department told me >>> to not setup a bridge between IRC and slack, due to the said ToS. >>> >>> -- >>> Michael Scherer >>> Sysadmin, Community Infrastructure >>> >>> >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > > HAROLD MILLER > > ASSOCIATE MANAGER, ENTERPRISE CLOUD SUPPORT > > Red Hat > > > > Harold at RedHat.com T: (650)-254-4346 > > TRIED. TESTED. TRUSTED. > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Fri Apr 26 13:20:09 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Fri, 26 Apr 2019 18:50:09 +0530 Subject: [Gluster-users] [Gluster-devel] One more way to contact Gluster team - Slack (gluster.slack.com) In-Reply-To: References: Message-ID: On Fri, Apr 26, 2019 at 6:27 PM Kaleb Keithley wrote: > > > On Fri, Apr 26, 2019 at 8:21 AM Harold Miller wrote: > >> Has Red Hat security cleared the Slack systems for confidential / >> customer information? >> >> If not, it will make it difficult for support to collect/answer questions. >> > > I'm pretty sure Amar meant as a replacement for the freenode #gluster and > #gluster-dev channels, given that he sent this to the public gluster > mailing lists @gluster.org. Nobody should have even been posting > confidential and/or customer information to any of those lists or channels. > And AFAIK nobody ever has. 
> > Yep, I am only talking about IRC (from freenode, #gluster, #gluster-dev etc). Also, I am not saying we are 'replacing IRC'. Gluster as a project started in pre-Slack era, and we have many users who prefer to stay in IRC. So, for now, no pressure to make a statement calling Slack channel as a 'Replacement' to IRC. > Amar, would you like to clarify which IRC channels you meant? > > Thanks Kaleb. I was bit confused on why the concern of it came up in this group. > >> On Fri, Apr 26, 2019 at 6:00 AM Scott Worthington < >> scott.c.worthington at gmail.com> wrote: >> >>> Hello, are you not _BOTH_ Red Hat FTEs or contractors? >>> >>> Yes! but come from very different internal teams. Michael supports Gluster (the project) team's Infrastructure needs, and has valid concerns from his perspective :-) I, on the other hand, bother more about code, users, and how to make sure we are up-to-date with other technologies and communities, from the engineering view point. > On Fri, Apr 26, 2019, 3:16 AM Michael Scherer wrote: >>> >>>> Le vendredi 26 avril 2019 ? 13:24 +0530, Amar Tumballi Suryanarayan a >>>> ?crit : >>>> > Hi All, >>>> > >>>> > We wanted to move to Slack from IRC for our official communication >>>> > channel >>>> > from sometime, but couldn't as we didn't had a proper URL for us to >>>> > register. 'gluster' was taken and we didn't knew who had it >>>> > registered. >>>> > Thanks to constant ask from Satish, Slack team has now agreed to let >>>> > us use >>>> > https://gluster.slack.com and I am happy to invite you all there. >>>> > (Use this >>>> > link >>>> > < >>>> > >>>> https://join.slack.com/t/gluster/shared_invite/enQtNjIxMTA1MTk3MDE1LWIzZWZjNzhkYWEwNDdiZWRiOTczMTc4ZjdiY2JiMTc3MDE5YmEyZTRkNzg0MWJiMWM3OGEyMDU2MmYzMTViYTA >>>> > > >>>> > to >>>> > join) >>>> > >>>> > Please note that, it won't be a replacement for mailing list. But can >>>> > be >>>> > used by all developers and users for quick communication. Also note >>>> > that, >>>> > no information there would be 'stored' beyond 10k lines as we are >>>> > using the >>>> > free version of Slack. >>>> >>>> Aren't we concerned about the ToS of slack ? Last time I did read them, >>>> they were quite scary (like, if you use your corporate email, you >>>> engage your employer, and that wasn't the worst part). >>>> >>>> Also, to anticipate the question, my employer Legal department told me >>>> to not setup a bridge between IRC and slack, due to the said ToS. >>>> >>>> Again, re-iterating here. Not planning to use any bridges from IRC to Slack. I re-read the Slack API Terms and condition. And it makes sense. They surely don't want us to build another slack, or abuse slack with too many API requests made for collecting logs. Currently, to start with, we are not adding any bots (other than github bot). Hopefully, that will keep us under proper usage guidelines. -Amar > -- >>>> Michael Scherer >>>> Sysadmin, Community Infrastructure >>>> >>>> >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> >> HAROLD MILLER >> >> ASSOCIATE MANAGER, ENTERPRISE CLOUD SUPPORT >> >> Red Hat >> >> >> >> Harold at RedHat.com T: (650)-254-4346 >> >> TRIED. TESTED. TRUSTED. 
>> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From harold at redhat.com Fri Apr 26 13:26:35 2019 From: harold at redhat.com (Harold Miller) Date: Fri, 26 Apr 2019 09:26:35 -0400 Subject: [Gluster-users] [Gluster-devel] One more way to contact Gluster team - Slack (gluster.slack.com) In-Reply-To: References: Message-ID: Amar, Thanks for the clarification. I'll go climb back into my cave. :) Harold On Fri, Apr 26, 2019 at 9:20 AM Amar Tumballi Suryanarayan < atumball at redhat.com> wrote: > > > On Fri, Apr 26, 2019 at 6:27 PM Kaleb Keithley > wrote: > >> >> >> On Fri, Apr 26, 2019 at 8:21 AM Harold Miller wrote: >> >>> Has Red Hat security cleared the Slack systems for confidential / >>> customer information? >>> >>> If not, it will make it difficult for support to collect/answer >>> questions. >>> >> >> I'm pretty sure Amar meant as a replacement for the freenode #gluster and >> #gluster-dev channels, given that he sent this to the public gluster >> mailing lists @gluster.org. Nobody should have even been posting >> confidential and/or customer information to any of those lists or channels. >> And AFAIK nobody ever has. >> >> > Yep, I am only talking about IRC (from freenode, #gluster, #gluster-dev > etc). Also, I am not saying we are 'replacing IRC'. Gluster as a project > started in pre-Slack era, and we have many users who prefer to stay in IRC. > So, for now, no pressure to make a statement calling Slack channel as a > 'Replacement' to IRC. > > >> Amar, would you like to clarify which IRC channels you meant? >> >> > > Thanks Kaleb. I was bit confused on why the concern of it came up in this > group. > > > >> >>> On Fri, Apr 26, 2019 at 6:00 AM Scott Worthington < >>> scott.c.worthington at gmail.com> wrote: >>> >>>> Hello, are you not _BOTH_ Red Hat FTEs or contractors? >>>> >>>> > Yes! but come from very different internal teams. > > Michael supports Gluster (the project) team's Infrastructure needs, and > has valid concerns from his perspective :-) I, on the other hand, bother > more about code, users, and how to make sure we are up-to-date with other > technologies and communities, from the engineering view point. > > >> On Fri, Apr 26, 2019, 3:16 AM Michael Scherer >>>> wrote: >>>> >>>>> Le vendredi 26 avril 2019 ? 13:24 +0530, Amar Tumballi Suryanarayan a >>>>> ?crit : >>>>> > Hi All, >>>>> > >>>>> > We wanted to move to Slack from IRC for our official communication >>>>> > channel >>>>> > from sometime, but couldn't as we didn't had a proper URL for us to >>>>> > register. 'gluster' was taken and we didn't knew who had it >>>>> > registered. >>>>> > Thanks to constant ask from Satish, Slack team has now agreed to let >>>>> > us use >>>>> > https://gluster.slack.com and I am happy to invite you all there. >>>>> > (Use this >>>>> > link >>>>> > < >>>>> > >>>>> https://join.slack.com/t/gluster/shared_invite/enQtNjIxMTA1MTk3MDE1LWIzZWZjNzhkYWEwNDdiZWRiOTczMTc4ZjdiY2JiMTc3MDE5YmEyZTRkNzg0MWJiMWM3OGEyMDU2MmYzMTViYTA >>>>> > > >>>>> > to >>>>> > join) >>>>> > >>>>> > Please note that, it won't be a replacement for mailing list. But can >>>>> > be >>>>> > used by all developers and users for quick communication. Also note >>>>> > that, >>>>> > no information there would be 'stored' beyond 10k lines as we are >>>>> > using the >>>>> > free version of Slack. 
>>>>> >>>>> Aren't we concerned about the ToS of slack ? Last time I did read them, >>>>> they were quite scary (like, if you use your corporate email, you >>>>> engage your employer, and that wasn't the worst part). >>>>> >>>>> Also, to anticipate the question, my employer Legal department told me >>>>> to not setup a bridge between IRC and slack, due to the said ToS. >>>>> >>>>> > Again, re-iterating here. Not planning to use any bridges from IRC to > Slack. I re-read the Slack API Terms and condition. And it makes sense. > They surely don't want us to build another slack, or abuse slack with too > many API requests made for collecting logs. > > Currently, to start with, we are not adding any bots (other than github > bot). Hopefully, that will keep us under proper usage guidelines. > > -Amar > > >> -- >>>>> Michael Scherer >>>>> Sysadmin, Community Infrastructure >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >>> -- >>> >>> HAROLD MILLER >>> >>> ASSOCIATE MANAGER, ENTERPRISE CLOUD SUPPORT >>> >>> Red Hat >>> >>> >>> >>> Harold at RedHat.com T: (650)-254-4346 >>> >>> TRIED. TESTED. TRUSTED. >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > -- > Amar Tumballi (amarts) > -- HAROLD MILLER ASSOCIATE MANAGER, ENTERPRISE CLOUD SUPPORT Red Hat Harold at RedHat.com T: (650)-254-4346 TRIED. TESTED. TRUSTED. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shonrs at redhat.com Fri Apr 26 13:29:23 2019 From: shonrs at redhat.com (Shon Stephens) Date: Fri, 26 Apr 2019 09:29:23 -0400 Subject: [Gluster-users] Gluster 5 Geo-replication Guide Message-ID: Dear All, Is there a good, step by step guide for setting up geo-replication with Glusterfs 5? The docs are a difficult to decipher read, for me, and seem more feature guide than actual instruction. Thank you, Shon -- SHON STEPHENS SENIOR CONSULTANT Red Hat T: 571-781-0787 M: 703-297-0682 TRIED. TESTED. TRUSTED. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mscherer at redhat.com Fri Apr 26 14:43:53 2019 From: mscherer at redhat.com (Michael Scherer) Date: Fri, 26 Apr 2019 16:43:53 +0200 Subject: [Gluster-users] [Gluster-devel] One more way to contact Gluster team - Slack (gluster.slack.com) In-Reply-To: References: Message-ID: <52de190db15f5c1225029b6785f266d72751657e.camel@redhat.com> Le vendredi 26 avril 2019 ? 04:59 -0500, Scott Worthington a ?crit : > Hello, are you not _BOTH_ Red Hat FTEs or contractors? We do, that was just a turn of phrase to be more generic. > On Fri, Apr 26, 2019, 3:16 AM Michael Scherer > wrote: > > > Le vendredi 26 avril 2019 ? 13:24 +0530, Amar Tumballi Suryanarayan > > a > > ?crit : > > > Hi All, > > > > > > We wanted to move to Slack from IRC for our official > > > communication > > > channel > > > from sometime, but couldn't as we didn't had a proper URL for us > > > to > > > register. 'gluster' was taken and we didn't knew who had it > > > registered. 
> > > Thanks to constant ask from Satish, Slack team has now agreed to > > > let > > > us use > > > https://gluster.slack.com and I am happy to invite you all there. > > > (Use this > > > link > > > < > > > > > > > https://join.slack.com/t/gluster/shared_invite/enQtNjIxMTA1MTk3MDE1LWIzZWZjNzhkYWEwNDdiZWRiOTczMTc4ZjdiY2JiMTc3MDE5YmEyZTRkNzg0MWJiMWM3OGEyMDU2MmYzMTViYTA > > > > > > > > > > to > > > join) > > > > > > Please note that, it won't be a replacement for mailing list. But > > > can > > > be > > > used by all developers and users for quick communication. Also > > > note > > > that, > > > no information there would be 'stored' beyond 10k lines as we are > > > using the > > > free version of Slack. > > > > Aren't we concerned about the ToS of slack ? Last time I did read > > them, > > they were quite scary (like, if you use your corporate email, you > > engage your employer, and that wasn't the worst part). > > > > Also, to anticipate the question, my employer Legal department told > > me > > to not setup a bridge between IRC and slack, due to the said ToS. > > > > -- > > Michael Scherer > > Sysadmin, Community Infrastructure > > > > > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users -- Michael Scherer Sysadmin, Community Infrastructure -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: This is a digitally signed message part URL: From amye at redhat.com Fri Apr 26 15:10:42 2019 From: amye at redhat.com (Amye Scavarda) Date: Fri, 26 Apr 2019 08:10:42 -0700 Subject: [Gluster-users] Gluster Monthly Newsletter, April 2019 Message-ID: Gluster Monthly Newsletter, April 2019 Upcoming Community Happy Hour at Red Hat Summit! Tue, May 7, 2019, 6:30 PM ? 7:30 PM EDT https://cephandglusterhappyhour_rhsummit.eventbrite.com has all the details. Gluster 7 Roadmap Discussion kicked off for our 7 roadmap on the mailing lists, see [Gluster-users] GlusterFS v7.0 (and v8.0) roadmap discussion https://lists.gluster.org/pipermail/gluster-users/2019-March/036139.html for more details. Community Survey Feedback is available! Come see how other people are using Gluster and what features they?d like! 
https://www.gluster.org/community-survey-feedback-2019/ Gluster Friday Five: Find our Friday Five podcast at https://www.youtube.com/channel/UCfilWh0JA5NfCjbqq1vsBVA ---- Contributors Top Contributing Companies: Red Hat, Comcast, DataLab, Gentoo Linux, Facebook, BioDec, Samsung, Etersoft Top Contributors in April: Atin Mukherjee, Sanju Rakonde, Kotresh HR, Pranith Kumar Karampuri, Kinglong Mee ---- Noteworthy Threads: [Gluster-users] [Gluster-devel] Upgrade testing to gluster 6 https://lists.gluster.org/pipermail/gluster-users/2019-April/036229.html [Gluster-users] Heketi v9.0.0 available for download https://lists.gluster.org/pipermail/gluster-users/2019-April/036285.html [Gluster-users] Proposal: Changes in Gluster Community meetings https://lists.gluster.org/pipermail/gluster-users/2019-April/036337.html [Gluster-users] XFS, WORM and the Year-2038 Problem https://lists.gluster.org/pipermail/gluster-users/2019-April/036356.html [Gluster-users] Community Happy Hour at Red Hat Summit https://lists.gluster.org/pipermail/gluster-users/2019-April/036422.html [Gluster-devel] Backporting important fixes in release branches https://lists.gluster.org/pipermail/gluster-devel/2019-April/056041.html [Gluster-devel] BZ updates https://lists.gluster.org/pipermail/gluster-devel/2019-April/056153.html [Gluster-users] One more way to contact Gluster team - Slack (gluster.slack.com) https://lists.gluster.org/pipermail/gluster-users/2019-April/036440.html Events: Red Hat Summit, May 4-6, 2019 - https://www.redhat.com/en/summit/2019 Open Source Summit and KubeCon + CloudNativeCon Shanghai, June 24-26, 2019 https://www.lfasiallc.com/events/kubecon-cloudnativecon-china-2019/ DevConf India, August 2- 3 2019, Bengaluru - https://devconf.info/in DevConf USA, August 15-17, 2019, Boston - https://devconf.info/us/ -- Amye Scavarda | amye at redhat.com | Gluster Community Lead From amye at redhat.com Fri Apr 26 15:25:22 2019 From: amye at redhat.com (Amye Scavarda) Date: Fri, 26 Apr 2019 08:25:22 -0700 Subject: [Gluster-users] Signing off from Gluster Message-ID: It's been a delight to work with this community for the past few years, and as of today, I'm stepping away for a new opportunity. Amar Tumballi has already taken up several of the community initiatives that were in flight. You've already seen his work in creating new communication channels and meeting times, and I look forward to seeing what he does in the future! Mike Perez, the Ceph Commmunity Lead, has graciously volunteered to support the community in the interim, and he's copied on this message as well. Stormy Peters, the Community Team Manager is also available for questions. Thank you all! -- amye -- Amye Scavarda | amye at redhat.com | Gluster Community Lead From snowmailer at gmail.com Fri Apr 26 15:33:07 2019 From: snowmailer at gmail.com (Martin Toth) Date: Fri, 26 Apr 2019 17:33:07 +0200 Subject: [Gluster-users] Signing off from Gluster In-Reply-To: References: Message-ID: <0842FB62-3DBD-4012-B3BF-8C303DB36E00@gmail.com> Thanks for all. We will miss you! BR! > On 26 Apr 2019, at 17:25, Amye Scavarda wrote: > > It's been a delight to work with this community for the past few > years, and as of today, I'm stepping away for a new opportunity. Amar > Tumballi has already taken up several of the community initiatives > that were in flight. You've already seen his work in creating new > communication channels and meeting times, and I look forward to seeing > what he does in the future! 
> > Mike Perez, the Ceph Commmunity Lead, has graciously volunteered to > support the community in the interim, and he's copied on this message > as well. Stormy Peters, the Community Team Manager is also available > for questions. > > Thank you all! > -- amye > > -- > Amye Scavarda | amye at redhat.com | Gluster Community Lead > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From hetz at hetz.biz Sat Apr 27 15:05:32 2019 From: hetz at hetz.biz (Hetz Ben Hamo) Date: Sat, 27 Apr 2019 18:05:32 +0300 Subject: [Gluster-users] puzzled about calculation Message-ID: Hi, I've looked at a YouTube video about Gluster volumes creation. The video is here: https://www.youtube.com/watch?v=9SRsvFZZa5E One thing that is weird to me is this: the guy creates a volume of replica 2, where each brick is 66TB (see at approx 12:10), yet at the end on windows where it shows the shares - each share is 132TB... Shouldn't each share be 66TB or am I missing something? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From amukherj at redhat.com Sat Apr 27 16:38:40 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Sat, 27 Apr 2019 22:08:40 +0530 Subject: [Gluster-users] puzzled about calculation In-Reply-To: References: Message-ID: On Sat, 27 Apr 2019 at 20:36, Hetz Ben Hamo wrote: > Hi, > > I've looked at a YouTube video about Gluster volumes creation. The video > is here: > https://www.youtube.com/watch?v=9SRsvFZZa5E > > One thing that is weird to me is this: the guy creates a volume of replica > 2, where each brick is 66TB (see at approx 12:10), yet at the end on > windows where it shows the shares - each share is 132TB... > Its a 2x2 volume which means it will comprise of total of 66X4 TB of space where 66X2=132 TB will be the size of actual storage space offered to the unified namespace. > Shouldn't each share be 66TB or am I missing something? > > Thanks > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- - Atin (atinm) -------------- next part -------------- An HTML attachment was scrubbed... URL: From hetz at hetz.biz Sat Apr 27 16:40:04 2019 From: hetz at hetz.biz (Hetz Ben Hamo) Date: Sat, 27 Apr 2019 19:40:04 +0300 Subject: [Gluster-users] puzzled about calculation In-Reply-To: References: Message-ID: Ok, thanks for the clarification. ;) On Sat, Apr 27, 2019 at 7:38 PM Atin Mukherjee wrote: > > > On Sat, 27 Apr 2019 at 20:36, Hetz Ben Hamo wrote: > >> Hi, >> >> I've looked at a YouTube video about Gluster volumes creation. The video >> is here: >> https://www.youtube.com/watch?v=9SRsvFZZa5E >> >> One thing that is weird to me is this: the guy creates a volume of >> replica 2, where each brick is 66TB (see at approx 12:10), yet at the end >> on windows where it shows the shares - each share is 132TB... >> > > Its a 2x2 volume which means it will comprise of total of 66X4 TB of space > where 66X2=132 TB will be the size of actual storage space offered to the > unified namespace. > > >> Shouldn't each share be 66TB or am I missing something? >> >> Thanks >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -- > - Atin (atinm) > -------------- next part -------------- An HTML attachment was scrubbed... 
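To make the arithmetic concrete, here is a hypothetical 2 x 2 distributed-replicate layout built from four 66 TB bricks; the host and brick names are invented purely for illustration:

# gluster volume create bigvol replica 2 \
    host1:/bricks/b1 host2:/bricks/b1 host3:/bricks/b1 host4:/bricks/b1

host1/host2 form one replica pair and host3/host4 the other, and the two pairs are distributed over. Raw capacity is therefore 4 x 66 TB = 264 TB, while the unified namespace offers 2 x 66 TB = 132 TB, which is the 132 TB visible on the Windows share. (Newer releases will also warn that replica 2 is prone to split-brain.)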
URL: From revirii at googlemail.com Mon Apr 29 07:17:06 2019 From: revirii at googlemail.com (Hu Bert) Date: Mon, 29 Apr 2019 09:17:06 +0200 Subject: [Gluster-users] Upgrade 5.5 -> 5.6: network traffic bug fixed? In-Reply-To: References: Message-ID: Good morning, back in office... ;-) i reactivated quick-read on both volumes and watched traffic, which now looks normal. Well, i did umount/mount of both gluster volumes after installing the upgrades 5.5 -> 5.6, but it seems that this wasn't enough? And the changes took place after doing a reboot (kernel update...) of all clients? Maybe some processes were still running? I'll keep watching network traffic and report if i see that it's higher than usual. Best regards, Hubert Am Di., 23. Apr. 2019 um 15:34 Uhr schrieb Poornima Gurusiddaiah : > > Hi, > > Thank you for the update, sorry for the delay. > > I did some more tests, but couldn't see the behaviour of spiked network bandwidth usage when quick-read is on. After upgrading, have you remounted the clients? As in the fix will not be effective until the process is restarted. > If you have already restarted the client processes, then there must be something related to workload in the live system that is triggering a bug in quick-read. Would need wireshark capture if possible, to debug further. > > Regards, > Poornima > > On Tue, Apr 16, 2019 at 6:25 PM Hu Bert wrote: >> >> Hi Poornima, >> >> thx for your efforts. I made a couple of tests and the results are the >> same, so the options are not related. Anyway, i'm not able to >> reproduce the problem on my testing system, although the volume >> options are the same. >> >> About 1.5 hours ago i set performance.quick-read to on again and >> watched: load/iowait went up (not bad at the moment, little traffic), >> but network traffic went up - from <20 MBit/s up to 160 MBit/s. After >> deactivating quick-read traffic dropped to < 20 MBit/s again. >> >> munin graph: https://abload.de/img/network-client4s0kle.png >> >> The 2nd peak is from the last test. >> >> >> Thx, >> Hubert >> >> Am Di., 16. Apr. 2019 um 09:43 Uhr schrieb Hu Bert : >> > >> > In my first test on my testing setup the traffic was on a normal >> > level, so i thought i was "safe". But on my live system the network >> > traffic was a multiple of the traffic one would expect. >> > performance.quick-read was enabled in both, the only difference in the >> > volume options between live and testing are: >> > >> > performance.read-ahead: testing on, live off >> > performance.io-cache: testing on, live off >> > >> > I ran another test on my testing setup, deactivated both and copied 9 >> > GB of data. Now the traffic went up as well, from before ~9-10 MBit/s >> > up to 100 MBit/s with both options off. Does performance.quick-read >> > require one of those options set to 'on'? >> > >> > I'll start another test shortly, and activate on of those 2 options, >> > maybe there's a connection between those 3 options? >> > >> > >> > Best Regards, >> > Hubert >> > >> > Am Di., 16. Apr. 2019 um 08:57 Uhr schrieb Poornima Gurusiddaiah >> > : >> > > >> > > Thank you for reporting this. I had done testing on my local setup and the issue was resolved even with quick-read enabled. Let me test it again. >> > > >> > > Regards, >> > > Poornima >> > > >> > > On Mon, Apr 15, 2019 at 12:25 PM Hu Bert wrote: >> > >> >> > >> fyi: after setting performance.quick-read to off network traffic >> > >> dropped to normal levels, client load/iowait back to normal as well. 
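For reference, the quick-read toggle and client remount discussed above boil down to the following steps; the volume name, server and mount point are placeholders rather than the ones from this setup:

# gluster volume set myvol performance.quick-read off

and on each client:

# umount /mnt/myvol
# mount -t glusterfs server1:/myvol /mnt/myvol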
>> > >> >> > >> client: https://abload.de/img/network-client-afterihjqi.png >> > >> server: https://abload.de/img/network-server-afterwdkrl.png >> > >> >> > >> Am Mo., 15. Apr. 2019 um 08:33 Uhr schrieb Hu Bert : >> > >> > >> > >> > Good Morning, >> > >> > >> > >> > today i updated my replica 3 setup (debian stretch) from version 5.5 >> > >> > to 5.6, as i thought the network traffic bug (#1673058) was fixed and >> > >> > i could re-activate 'performance.quick-read' again. See release notes: >> > >> > >> > >> > https://review.gluster.org/#/c/glusterfs/+/22538/ >> > >> > http://git.gluster.org/cgit/glusterfs.git/commit/?id=34a2347780c2429284f57232f3aabb78547a9795 >> > >> > >> > >> > Upgrade went fine, and then i was watching iowait and network traffic. >> > >> > It seems that the network traffic went up after upgrade and >> > >> > reactivation of performance.quick-read. Here are some graphs: >> > >> > >> > >> > network client1: https://abload.de/img/network-clientfwj1m.png >> > >> > network client2: https://abload.de/img/network-client2trkow.png >> > >> > network server: https://abload.de/img/network-serverv3jjr.png >> > >> > >> > >> > gluster volume info: https://pastebin.com/ZMuJYXRZ >> > >> > >> > >> > Just wondering if the network traffic bug really got fixed or if this >> > >> > is a new problem. I'll wait a couple of minutes and then deactivate >> > >> > performance.quick-read again, just to see if network traffic goes down >> > >> > to normal levels. >> > >> > >> > >> > >> > >> > Best regards, >> > >> > Hubert >> > >> _______________________________________________ >> > >> Gluster-users mailing list >> > >> Gluster-users at gluster.org >> > >> https://lists.gluster.org/mailman/listinfo/gluster-users From spisla80 at gmail.com Mon Apr 29 08:49:08 2019 From: spisla80 at gmail.com (David Spisla) Date: Mon, 29 Apr 2019 10:49:08 +0200 Subject: [Gluster-users] XFS, WORM and the Year-2038 Problem In-Reply-To: References: Message-ID: Hello Gluster Community, here is a possible explanation why the LastAccess date is changed at brick level resp why can XFS ever have a date of e.g. Can store 2070 in an INT32 field: It's amazing that you can set timestamps well above 2038 for the atime and these are also displayed via the usual system tools. After a while, it was observed that the values change and are mapped to the range between 1902-1969. I suspect that the initially successful setting of a well over 2038 stationary atime corresponds to an *in-memory* representation of the timestamp. This seems to allow setting over 2038. The *on-disk* representation of XFS, on the other hand, only allows the maximum value of 2038, values above are then mapped to the range 1902-1969, which is the negative number range of a signed int32. This is what I have taken from this thread: https://lkml.org/lkml/2014/6/1/240 Finally I observed, that after reboot or remount of the XFS Filesystem the in-memory representation changes to the on-disk representation. Concerning the WORM functionality it seems to be neccessary to enable the ctime feature, otherwise the information of the Retention would be lost, if the Retention date is above 2038 in case of reboot or remount of the XFS Filesystem. Regards David Spisla Am Mo., 15. Apr. 2019 um 11:51 Uhr schrieb David Spisla : > Hello Amar, > > Am Mo., 15. Apr. 2019 um 11:27 Uhr schrieb Amar Tumballi Suryanarayan < > atumball at redhat.com>: > >> >> >> On Mon, Apr 15, 2019 at 2:40 PM David Spisla wrote: >> >>> Hi folks, >>> I tried out default retention periods e.g. 
to set the Retention date to >>> 2071. When I did the WORMing, everything seems to be OK. From FUSE and also >>> at Brick-Level, the retention was set to 2071 on all nodes.Additionally I >>> enabled the storage.ctime option, so that the timestamps are stored in the >>> mdata xattr, too. But after a while I obeserved, that on Brick-Level the >>> atime (which stores the retention) was switched to 1934: >>> >>> # stat /gluster/brick1/glusterbrick/data/file3.txt >>> File: /gluster/brick1/glusterbrick/data/file3.txt >>> Size: 5 Blocks: 16 IO Block: 4096 regular file >>> Device: 830h/2096d Inode: 115 Links: 2 >>> Access: (0544/-r-xr--r--) Uid: ( 2000/ gluster) Gid: ( 2000/ >>> gluster) >>> Access: 1934-12-13 20:45:51.000000000 +0000 >>> Modify: 2019-04-10 09:50:09.000000000 +0000 >>> Change: 2019-04-10 10:13:39.703623917 +0000 >>> Birth: - >>> >>> From FUSE I get the correct atime: >>> # stat /gluster/volume1/data/file3.txt >>> File: /gluster/volume1/data/file3.txt >>> Size: 5 Blocks: 1 IO Block: 131072 regular file >>> Device: 2eh/46d Inode: 10812026387234582248 Links: 1 >>> Access: (0544/-r-xr--r--) Uid: ( 2000/ gluster) Gid: ( 2000/ >>> gluster) >>> Access: 2071-01-19 03:14:07.000000000 +0000 >>> Modify: 2019-04-10 09:50:09.000000000 +0000 >>> Change: 2019-04-10 10:13:39.705341476 +0000 >>> Birth: - >>> >>> >> From FUSE you get the time of what the clients set, as we now store >> timestamp as extended attribute, not the 'stat->st_atime'. >> >> This is called 'ctime' feature which we introduced in glusterfs-5.0, It >> helps us to support statx() variables. >> > So I am assuming that the values in the default xfs timestamps are not > important for WORM, if I use storage.ctime? > Does it work correctly with other clients like samba-vfs-glusterfs? > >> >> >>> I find out that XFS supports only 32-Bit timestamp values. So in my >>> expectation it should not be possible to set the atime to 2071. But at >>> first it was 2071 and later it was switched to 1934 due to the YEAR-2038 >>> problem. I am asking myself: >>> 1. Why it is possible to set atime on XFS greater than 2038? >>> 2. And why this atime switched to a time lower 1970 after a while? >>> >>> Regards >>> David Spisla >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> Amar Tumballi (amarts) >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mscherer at redhat.com Mon Apr 29 09:20:54 2019 From: mscherer at redhat.com (Michael Scherer) Date: Mon, 29 Apr 2019 11:20:54 +0200 Subject: [Gluster-users] [Gluster-devel] One more way to contact Gluster team - Slack (gluster.slack.com) In-Reply-To: References: Message-ID: <9ab07686a7beb07344fe39f76023dee4743619c8.camel@redhat.com> Le vendredi 26 avril 2019 ? 18:50 +0530, Amar Tumballi Suryanarayan a ?crit : > On Fri, Apr 26, 2019 at 6:27 PM Kaleb Keithley > wrote: > > > > > > > On Fri, Apr 26, 2019 at 8:21 AM Harold Miller > > wrote: > > > > > Has Red Hat security cleared the Slack systems for confidential / > > > customer information? > > > > > > If not, it will make it difficult for support to collect/answer > > > questions. > > > > > > > I'm pretty sure Amar meant as a replacement for the freenode > > #gluster and > > #gluster-dev channels, given that he sent this to the public > > gluster > > mailing lists @gluster.org. 
Nobody should have even been posting > > confidential and/or customer information to any of those lists or > > channels. > > And AFAIK nobody ever has. > > > > > > Yep, I am only talking about IRC (from freenode, #gluster, #gluster- > dev etc). Also, I am not saying we are 'replacing IRC'. Gluster as a > project started in pre-Slack era, and we have many users who prefer > to stay in IRC. > So, for now, no pressure to make a statement calling Slack channel as > a 'Replacement' to IRC. > > > > Amar, would you like to clarify which IRC channels you meant? > > > > > > Thanks Kaleb. I was bit confused on why the concern of it came up in > this group. Well, unless people start to be on both irc and slack and everything, that's fragmentation. Also, since people can't access old logs (per design with the free plan of slack), but they are still here on slack servers, how is it going to work from a GDPR point of view ? Shouldn't it requires a update to the privacy policy ? -- Michael Scherer Sysadmin, Community Infrastructure -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: This is a digitally signed message part URL: From joao.bauto at neuro.fchampalimaud.org Mon Apr 29 10:25:32 2019 From: joao.bauto at neuro.fchampalimaud.org (=?UTF-8?B?Sm/Do28gQmHDunRv?=) Date: Mon, 29 Apr 2019 11:25:32 +0100 Subject: [Gluster-users] parallel-readdir prevents directories and files listing - Bug 1670382 Message-ID: Hi, I have an 8 brick distributed volume where Windows and Linux clients mount the volume via samba and headless compute servers using gluster native fuse. With parallel-readdir on, if a Windows client creates a new folder, the folder is indeed created but invisible to the Windows client. Accessing the same samba share in a Linux client, the folder is again visible and with normal behavior. The same folder is also visible when mounting via gluster native fuse. The Windows client can list existing directories and rename them while, for files, everything seems to be working fine. 
Gluster servers: CentOS 7.5 with Gluster 5.3 and Samba 4.8.3-4.el7.0.1 from @fasttrack Clients tested: Windows 10, Ubuntu 18.10, CentOS 7.5 https://bugzilla.redhat.com/show_bug.cgi?id=1670382 Volume Name: tank Type: Distribute Volume ID: 9582685f-07fa-41fd-b9fc-ebab3a6989cf Status: Started Snapshot Count: 0 Number of Bricks: 8 Transport-type: tcp Bricks: Brick1: swp-gluster-01:/tank/volume1/brick Brick2: swp-gluster-02:/tank/volume1/brick Brick3: swp-gluster-03:/tank/volume1/brick Brick4: swp-gluster-04:/tank/volume1/brick Brick5: swp-gluster-01:/tank/volume2/brick Brick6: swp-gluster-02:/tank/volume2/brick Brick7: swp-gluster-03:/tank/volume2/brick Brick8: swp-gluster-04:/tank/volume2/brick Options Reconfigured: performance.parallel-readdir: on performance.readdir-ahead: on performance.cache-invalidation: on performance.md-cache-timeout: 600 storage.batch-fsync-delay-usec: 0 performance.write-behind-window-size: 32MB performance.stat-prefetch: on performance.read-ahead: on performance.read-ahead-page-count: 16 performance.rda-request-size: 131072 performance.quick-read: on performance.open-behind: on performance.nl-cache-timeout: 600 performance.nl-cache: on performance.io-thread-count: 64 performance.io-cache: off performance.flush-behind: on performance.client-io-threads: off performance.write-behind: off performance.cache-samba-metadata: on network.inode-lru-limit: 0 features.cache-invalidation-timeout: 600 features.cache-invalidation: on cluster.readdir-optimize: on cluster.lookup-optimize: on client.event-threads: 4 server.event-threads: 16 features.quota-deem-statfs: on nfs.disable: on features.quota: on features.inode-quota: on cluster.enable-shared-storage: disable Cheers, Jo?o Ba?to -------------- next part -------------- An HTML attachment was scrubbed... URL: From aspandey at redhat.com Mon Apr 29 10:43:59 2019 From: aspandey at redhat.com (aspandey at redhat.com) Date: Mon, 29 Apr 2019 10:43:59 +0000 Subject: [Gluster-users] Invitation: Gluster Community Meeting (APAC friendly hours) @ Tue Apr 30, 2019 11:30am - 12:30pm (IST) (gluster-users@gluster.org) Message-ID: <000000000000463e3a0587a8f6ae@google.com> You have been invited to the following event. Title: Gluster Community Meeting (APAC friendly hours) Bridge: https://bluejeans.com/836554017 Meeting minutes: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both Previous Meeting notes: http://github.com/gluster/community When: Tue Apr 30, 2019 11:30am ? 12:30pm India Standard Time - Kolkata Where: https://bluejeans.com/836554017 Calendar: gluster-users at gluster.org Who: * aspandey at redhat.com - organizer * gluster-users at gluster.org * maintainers at gluster.org * gluster-devel at gluster.org Event details: https://www.google.com/calendar/event?action=VIEW&eid=N2NpMWp1YjRkbmZoYjhxNWMyZ2ZxdTB1dmUgZ2x1c3Rlci11c2Vyc0BnbHVzdGVyLm9yZw&tok=MTkjYXNwYW5kZXlAcmVkaGF0LmNvbWRmNDE5YmMxMTg3ZTY4ZDA5ZWUwODY4MjJjMDYwOGEzZDNiMGVlNzE&ctz=Asia%2FKolkata&hl=en&es=0 Invitation from Google Calendar: https://www.google.com/calendar/ You are receiving this courtesy email at the account gluster-users at gluster.org because you are an attendee of this event. To stop receiving future updates for this event, decline this event. Alternatively you can sign up for a Google account at https://www.google.com/calendar/ and control your notification settings for your entire calendar. 
Forwarding this invitation could allow any recipient to send a response to the organizer and be added to the guest list, or invite others regardless of their own invitation status, or to modify your RSVP. Learn more at https://support.google.com/calendar/answer/37135#forwarding -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 1882 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: invite.ics Type: application/ics Size: 1923 bytes Desc: not available URL: From jthottan at redhat.com Tue Apr 30 07:20:11 2019 From: jthottan at redhat.com (Jiffin Tony Thottan) Date: Tue, 30 Apr 2019 12:50:11 +0530 Subject: [Gluster-users] Proposing to previous ganesha HA cluster solution back to gluster code as gluster-7 feature Message-ID: Hi all, Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back to gluster code repository with some improvement and targetting for next gluster release 7. I have opened up an issue [1] with details and posted initial set of patches [2] Please share your thoughts on the same Regards, Jiffin [1]https://github.com/gluster/glusterfs/issues/663 [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jim.kinney at gmail.com Tue Apr 30 12:19:42 2019 From: jim.kinney at gmail.com (Jim Kinney) Date: Tue, 30 Apr 2019 08:19:42 -0400 Subject: [Gluster-users] Proposing to previous ganesha HA cluster solution back to gluster code as gluster-7 feature In-Reply-To: References: Message-ID: <9BE7F129-DE42-46A5-896B-81460E605E9E@gmail.com> +1! I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: >Hi all, > >Some of you folks may be familiar with HA solution provided for >nfs-ganesha by gluster using pacemaker and corosync. > >That feature was removed in glusterfs 3.10 in favour for common HA >project "Storhaug". Even Storhaug was not progressed > >much from last two years and current development is in halt state, >hence >planning to restore old HA ganesha solution back > >to gluster code repository with some improvement and targetting for >next >gluster release 7. > >I have opened up an issue [1] with details and posted initial set of >patches [2] > >Please share your thoughts on the same > >Regards, > >Jiffin > >[1]https://github.com/gluster/glusterfs/issues/663 > > >[2] >https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) -- Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. -------------- next part -------------- An HTML attachment was scrubbed... 
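For anyone who never used the old scheme, the pre-3.10 setup was driven by a small /etc/ganesha/ganesha-ha.conf plus one CLI switch. The sketch below is only a rough reconstruction; the node names and addresses are invented, and the exact key names and command varied a little between releases:

HA_NAME="ganesha-ha-demo"
HA_CLUSTER_NODES="node1.example.com,node2.example.com"
VIP_node1="192.168.122.201"
VIP_node2="192.168.122.202"

# gluster nfs-ganesha enable

Pacemaker and corosync then own the virtual IPs and move them to a surviving node on failure, so NFS clients keep their mounts.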
URL: From hunter86_bg at yahoo.com Tue Apr 30 13:04:50 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Tue, 30 Apr 2019 16:04:50 +0300 Subject: [Gluster-users] Proposing to previous ganesha HA clustersolution back to gluster code as gluster-7 feature Message-ID: Keep in mind that corosync/pacemaker is hard for proper setup by new admins/users. I'm still trying to remediate the effects of poor configuration at work. Also, storhaug is nice for hyperconverged setups where the host is not only hosting bricks, but other workloads. Corosync/pacemaker require proper fencing to be setup and most of the stonith resources 'shoot the other node in the head'. I would be happy to see an easy to deploy (let say 'cluster.enable-ha-ganesha true') and gluster to be bringing up the Floating IPs and taking care of the NFS locks, so no disruption will be felt by the clients. Still, this will be a lot of work to achieve. Best Regards, Strahil NikolovOn Apr 30, 2019 15:19, Jim Kinney wrote: > > +1! > I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. > > On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: >> >> Hi all, >> >> Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. >> >> That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed >> >> much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back >> >> to gluster code repository with some improvement and targetting for next gluster release 7. >> >> I have opened up an issue [1] with details and posted initial set of patches [2] >> >> Please share your thoughts on the same >> >> Regards, >> >> Jiffin?? >> >> [1] https://github.com/gluster/glusterfs/issues/663 >> >> [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) > > > -- > Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renaud.Fortier at fsaa.ulaval.ca Tue Apr 30 13:11:49 2019 From: Renaud.Fortier at fsaa.ulaval.ca (Renaud Fortier) Date: Tue, 30 Apr 2019 13:11:49 +0000 Subject: [Gluster-users] Proposing to previous ganesha HA cluster solution back to gluster code as gluster-7 feature In-Reply-To: <9BE7F129-DE42-46A5-896B-81460E605E9E@gmail.com> References: <9BE7F129-DE42-46A5-896B-81460E605E9E@gmail.com> Message-ID: <7d75b62f0eb0495782c46ef8521790d5@ul-exc-pr-mbx13.ulaval.ca> IMO, you should keep storhaug and maintain it. At the beginning, we were with pacemaker and corosync. Then we move to storhaug with the upgrade to gluster 4.1.x. Now you are talking about going back like it was. Maybe it will be better with pacemake and corosync but the important is to have a solution that will be stable and maintained. thanks Renaud De : gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] De la part de Jim Kinney Envoy? : 30 avril 2019 08:20 ? : gluster-users at gluster.org; Jiffin Tony Thottan ; gluster-users at gluster.org; Gluster Devel ; gluster-maintainers at gluster.org; nfs-ganesha ; devel at lists.nfs-ganesha.org Objet : Re: [Gluster-users] Proposing to previous ganesha HA cluster solution back to gluster code as gluster-7 feature +1! 
I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan > wrote: Hi all, Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back to gluster code repository with some improvement and targetting for next gluster release 7. I have opened up an issue [1] with details and posted initial set of patches [2] Please share your thoughts on the same Regards, Jiffin [1] https://github.com/gluster/glusterfs/issues/663 [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) -- Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Tue Apr 30 13:29:51 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Tue, 30 Apr 2019 13:29:51 +0000 (UTC) Subject: [Gluster-users] Proposing to previous ganesha HA clustersolution back to gluster code as gluster-7 feature In-Reply-To: References: Message-ID: <1028413072.2343069.1556630991785@mail.yahoo.com> Hi, I'm posting this again as it got bounced. Keep in mind that corosync/pacemaker? is hard for proper setup by new admins/users. I'm still trying to remediate the effects of poor configuration at work. Also, storhaug is nice for hyperconverged setups where the host is not only hosting bricks, but? other? workloads. Corosync/pacemaker require proper fencing to be setup and most of the stonith resources 'shoot the other node in the head'. I would be happy to see an easy to deploy (let say 'cluster.enable-ha-ganesha true') and gluster to be bringing up the Floating IPs and taking care of the NFS locks, so no disruption will be felt by the clients. Still, this will be a lot of work to achieve. Best Regards, Strahil Nikolov On Apr 30, 2019 15:19, Jim Kinney wrote: >?? > +1! > I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. > > On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: >>?? >> Hi all, >> >> Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. >> >> That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed >> >> much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back >> >> to gluster code repository with some improvement and targetting for next gluster release 7. >> >>??I have opened up an issue [1] with details and posted initial set of patches [2] >> >> Please share your thoughts on the same >> >> >> Regards, >> >> Jiffin?? >> >> [1] https://github.com/gluster/glusterfs/issues/663 >> >> [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) >> >> > > -- > Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. Keep in mind that corosync/pacemaker? 
is hard for proper setup by new admins/users. I'm still trying to remediate the effects of poor configuration at work. Also, storhaug is nice for hyperconverged setups where the host is not only hosting bricks, but? other? workloads. Corosync/pacemaker require proper fencing to be setup and most of the stonith resources 'shoot the other node in the head'. I would be happy to see an easy to deploy (let say 'cluster.enable-ha-ganesha true') and gluster to be bringing up the Floating IPs and taking care of the NFS locks, so no disruption will be felt by the clients. Still, this will be a lot of work to achieve. Best Regards, Strahil NikolovOn Apr 30, 2019 15:19, Jim Kinney wrote: > > +1! > I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. > > On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: >> >> Hi all, >> >> Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. >> >> That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed >> >> much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back >> >> to gluster code repository with some improvement and targetting for next gluster release 7. >> >> I have opened up an issue [1] with details and posted initial set of patches [2] >> >> Please share your thoughts on the same >> >> Regards, >> >> Jiffin?? >> >> [1] https://github.com/gluster/glusterfs/issues/663 >> >> [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) > > > -- > Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. From gbok at gelbergroup.com Sat Apr 27 02:02:17 2019 From: gbok at gelbergroup.com (Greg Bok) Date: Sat, 27 Apr 2019 02:02:17 -0000 Subject: [Gluster-users] Synchronous Client-Side Replication Behavior Message-ID: What are the guarantees at write completion time with client-side replication and replicated volume types? When the write completes has it successfully been written to all replicas or just one? How does write-behind/flush-behind affect this behavior? -------------- next part -------------- An HTML attachment was scrubbed... URL:
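As a starting point for experimenting with this, the two options mentioned can be inspected and, where strict write-through behaviour is preferred, disabled per volume; the volume name below is a placeholder:

# gluster volume get myvol performance.write-behind
# gluster volume get myvol performance.flush-behind
# gluster volume set myvol performance.write-behind off

With write-behind disabled, the client no longer acknowledges writes out of its own buffer, which removes one source of ambiguity when reasoning about what has actually reached the replicas. The exact guarantees at write completion still depend on the AFR quorum settings of the volume.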