From archon810 at gmail.com Sat Aug 1 04:45:43 2020 From: archon810 at gmail.com (Artem Russakovskii) Date: Fri, 31 Jul 2020 21:45:43 -0700 Subject: [Gluster-users] [Gluster-devel] Announcing Gluster release 7.7 In-Reply-To: References: Message-ID: Got it, thanks. Already started upgrading the fleet to 15.2, so we'll be able to upgrade from 7.6 soon. On Thu, Jul 30, 2020, 10:39 PM Shwetha Acharya wrote: > Hi Artem, > > As per current Tentative plans for community packages > we > are supporting Leap15.2 only. > > Regards, > Shwetha > > On Fri, Jul 31, 2020 at 1:03 AM Artem Russakovskii > wrote: > >> Hi, >> >> >> https://download.opensuse.org/repositories/home:/glusterfs:/Leap15.1-7/openSUSE_Leap_15.1/x86_64/ >> is still missing 7.7. Is there an ETA please? >> >> Thanks. >> >> >> Sincerely, >> Artem >> >> -- >> Founder, Android Police , APK Mirror >> , Illogical Robot LLC >> beerpla.net | @ArtemR >> >> >> On Wed, Jul 22, 2020 at 9:27 AM Rinku Kothiya >> wrote: >> >>> Hi, >>> >>> The Gluster community is pleased to announce the release of Gluster7.7 >>> (packages available at [1]). >>> Release notes for the release can be found at [2]. >>> >>> Major changes, features and limitations addressed in this release: >>> None >>> >>> Please Note: Some of the packages are unavailable and we are working on >>> it. We will release them soon. 
>>> >>> Thanks, >>> Gluster community >>> >>> References: >>> >>> [1] Packages for 7.7: >>> https://download.gluster.org/pub/gluster/glusterfs/7/7.7/ >>> >>> [2] Release notes for 7.7: >>> https://docs.gluster.org/en/latest/release-notes/7.7/ >>> ________ >>> >>> >>> >>> Community Meeting Calendar: >>> >>> Schedule - >>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>> Bridge: https://bluejeans.com/441850968 >>> >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >> ________ >> >> >> >> Community Meeting Calendar: >> >> Schedule - >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> Bridge: https://bluejeans.com/441850968 >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From spalai at redhat.com Mon Aug 3 05:46:36 2020 From: spalai at redhat.com (Susant Palai) Date: Mon, 3 Aug 2020 11:16:36 +0530 Subject: [Gluster-users] Rebalance improvement. Message-ID: <90AD956E-EB56-4A00-AB8F-C44D3A1BE0E1@redhat.com> Hi, Recently we pushed some performance improvements for the rebalance crawl, which used to consume a significant share of the overall rebalance time. The patch [1] was recently merged upstream and may land as an experimental feature in the upcoming upstream release. The improvement currently works only for pure-distribute volumes (which can still be expanded). Things to look forward to in the future: - Parallel crawl in rebalance - Global layout Once these improvements are in place, we should be able to reduce the overall rebalance time significantly. We would ask our community to try out the feature and give us feedback. More information regarding the same will follow.
Thanks & Regards, Susant Palai [1] https://review.gluster.org/#/c/glusterfs/+/24443/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From revirii at googlemail.com Mon Aug 3 06:46:21 2020 From: revirii at googlemail.com (Hu Bert) Date: Mon, 3 Aug 2020 08:46:21 +0200 Subject: [Gluster-users] [Gluster-devel] Announcing Gluster release 7.7 In-Reply-To: References: Message-ID: Hi there, just wanted to say thanks to all the developers, maintainers etc. This release (7) has brought us a small but nice performance improvement. Utilization and IOs per disk decreased, latency dropped. See attached images. I read the release notes but couldn't identify the specific changes/features for this improvement. Maybe someone could point to them - but no hurry... :-) Best regards, Hubert Am Mi., 22. Juli 2020 um 18:27 Uhr schrieb Rinku Kothiya : > > Hi, > > The Gluster community is pleased to announce the release of Gluster7.7 (packages available at [1]). > Release notes for the release can be found at [2]. > > Major changes, features and limitations addressed in this release: > None > > Please Note: Some of the packages are unavailable and we are working on it. We will release them soon. > > Thanks, > Gluster community > > References: > > [1] Packages for 7.7: > https://download.gluster.org/pub/gluster/glusterfs/7/7.7/ > > [2] Release notes for 7.7: > https://docs.gluster.org/en/latest/release-notes/7.7/ > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- A non-text attachment was scrubbed... Name: diskstats_iops-week.png Type: image/png Size: 68055 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: diskstats_utilization-week.png Type: image/png Size: 61232 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: diskstats_latency-week.png Type: image/png Size: 54543 bytes Desc: not available URL: From spalai at redhat.com Mon Aug 3 07:17:01 2020 From: spalai at redhat.com (Susant Palai) Date: Mon, 3 Aug 2020 12:47:01 +0530 Subject: [Gluster-users] Rebalance improvement. In-Reply-To: <90AD956E-EB56-4A00-AB8F-C44D3A1BE0E1@redhat.com> References: <90AD956E-EB56-4A00-AB8F-C44D3A1BE0E1@redhat.com> Message-ID: CentOS users can add the following repo and install the build from the master branch to try out the feature. [Testing purposes only; not ready for consumption in a production env.]

[gluster-nightly-master]
baseurl=http://artifacts.ci.centos.org/gluster/nightly/master/7/x86_64/
gpgcheck=0
keepalive=1
enabled=1
repo_gpgcheck=0
name=Gluster Nightly builds (master branch)

A summary of perf numbers from our test lab (times in minutes):

DirSize - 1Million      Old   New   %diff
Depth - 100 (Run 1)     353    74   +377%
Depth - 100 (Run 2)     348    72   +377~%
Depth - 50              246   122   +100%
Depth - 3               174   114   +52%

Susant

On Mon, Aug 3, 2020 at 11:16 AM Susant Palai wrote: > Hi, > Recently, we have pushed some performance improvements for Rebalance > Crawl which used to consume a significant amount of time, out of the entire > rebalance process. > > > The patch [1] is recently merged in upstream and may land as an > experimental feature in the upcoming upstream release. > > The improvement currently works only for pure-distribute Volume. (which > can be expanded). > > > Things to look forward to in future : > - Parallel Crawl in Rebalance > - Global Layout > > Once these improvements are in place, we would be able to reduce the > overall rebalance time by a significant time. > > Would request our community to try out the feature and give us feedback. > > More information regarding the same will follow.
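[Editor's note] For anyone scripting the step Susant describes, a minimal sketch of dropping the stanza into a repo file. The target directory variable and the follow-up install command are assumptions, not from the thread; on a real CentOS 7 box the file belongs in /etc/yum.repos.d/ and would be followed by a yum install as root:

```shell
# Sketch only: writes the nightly repo stanza from the mail above to a file.
# REPO_DIR is an assumption -- point it at /etc/yum.repos.d (as root) to use it.
REPO_DIR="${REPO_DIR:-.}"
cat > "$REPO_DIR/gluster-nightly-master.repo" <<'EOF'
[gluster-nightly-master]
name=Gluster Nightly builds (master branch)
baseurl=http://artifacts.ci.centos.org/gluster/nightly/master/7/x86_64/
enabled=1
gpgcheck=0
repo_gpgcheck=0
EOF
# For real use, something like `yum install glusterfs-server` would then pull
# the nightly build; gpgcheck=0 matches the stanza above (unsigned nightlies).
grep '^baseurl=' "$REPO_DIR/gluster-nightly-master.repo"
```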
> > > Thanks & Regards, > Susant Palai > > > [1] https://review.gluster.org/#/c/glusterfs/+/24443/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aravinda at kadalu.io Mon Aug 3 08:28:40 2020 From: aravinda at kadalu.io (Aravinda VK) Date: Mon, 3 Aug 2020 13:58:40 +0530 Subject: [Gluster-users] Rebalance improvement. In-Reply-To: References: <90AD956E-EB56-4A00-AB8F-C44D3A1BE0E1@redhat.com> Message-ID: <36E38592-A906-48A5-B437-84C7C37057F2@kadalu.io> Interesting numbers. Thanks for the effort. What is the unit of old/new numbers? seconds? > On 03-Aug-2020, at 12:47 PM, Susant Palai wrote: > > Centos Users can add the following repo and install the build from the master branch to try out the feature. [Testing purpose only, not ready for consumption in production env.] > > [gluster-nightly-master] > baseurl=http://artifacts.ci.centos.org/gluster/nightly/master/7/x86_64/ > gpgcheck=0 > keepalive=1 > enabled=1 > repo_gpgcheck = 0 > name=Gluster Nightly builds (master branch) > > A summary of perf numbers from our test lab : > > DirSize - 1Million Old New %diff > Depth - 100 (Run 1) 353 74 +377% > Depth - 100 (Run 2) 348 72 +377~% > Depth - 50 246 122 +100% > Depth - 3 174 114 +52% > > Susant > > > On Mon, Aug 3, 2020 at 11:16 AM Susant Palai > wrote: > Hi, > Recently, we have pushed some performance improvements for Rebalance Crawl which used to consume a significant amount of time, out of the entire rebalance process. > > > The patch [1] is recently merged in upstream and may land as an experimental feature in the upcoming upstream release. > > The improvement currently works only for pure-distribute Volume. (which can be expanded). > > > Things to look forward to in future : > - Parallel Crawl in Rebalance > - Global Layout > > Once these improvements are in place, we would be able to reduce the overall rebalance time by a significant time. > > Would request our community to try out the feature and give us feedback. 
> > More information regarding the same will follow. > > > Thanks & Regards, > Susant Palai > > > [1] https://review.gluster.org/#/c/glusterfs/+/24443/ ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users Aravinda Vishwanathapura https://kadalu.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From spalai at redhat.com Mon Aug 3 08:39:37 2020 From: spalai at redhat.com (Susant Palai) Date: Mon, 3 Aug 2020 14:09:37 +0530 Subject: [Gluster-users] Rebalance improvement. In-Reply-To: <36E38592-A906-48A5-B437-84C7C37057F2@kadalu.io> References: <90AD956E-EB56-4A00-AB8F-C44D3A1BE0E1@redhat.com> <36E38592-A906-48A5-B437-84C7C37057F2@kadalu.io> Message-ID: > On 03-Aug-2020, at 13:58, Aravinda VK wrote: > > Interesting numbers. Thanks for the effort. > > What is the unit of old/new numbers? seconds? Minutes. > >> On 03-Aug-2020, at 12:47 PM, Susant Palai > wrote: >> >> Centos Users can add the following repo and install the build from the master branch to try out the feature. [Testing purpose only, not ready for consumption in production env.] >> >> [gluster-nightly-master] >> baseurl=http://artifacts.ci.centos.org/gluster/nightly/master/7/x86_64/ >> gpgcheck=0 >> keepalive=1 >> enabled=1 >> repo_gpgcheck = 0 >> name=Gluster Nightly builds (master branch) >> >> A summary of perf numbers from our test lab : >> >> DirSize - 1Million Old New %diff >> Depth - 100 (Run 1) 353 74 +377% >> Depth - 100 (Run 2) 348 72 +377~% >> Depth - 50 246 122 +100% >> Depth - 3 174 114 +52% >> >> Susant >> >> >> On Mon, Aug 3, 2020 at 11:16 AM Susant Palai > wrote: >> Hi, >> Recently, we have pushed some performance improvements for Rebalance Crawl which used to consume a significant amount of time, out of the entire rebalance process. 
>> >> >> The patch [1] is recently merged in upstream and may land as an experimental feature in the upcoming upstream release. >> >> The improvement currently works only for pure-distribute Volume. (which can be expanded). >> >> >> Things to look forward to in future : >> - Parallel Crawl in Rebalance >> - Global Layout >> >> Once these improvements are in place, we would be able to reduce the overall rebalance time by a significant time. >> >> Would request our community to try out the feature and give us feedback. >> >> More information regarding the same will follow. >> >> >> Thanks & Regards, >> Susant Palai >> >> >> [1] https://review.gluster.org/#/c/glusterfs/+/24443/ ________ >> >> >> >> Community Meeting Calendar: >> >> Schedule - >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> Bridge: https://bluejeans.com/441850968 >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > Aravinda Vishwanathapura > https://kadalu.io > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From archon810 at gmail.com Mon Aug 3 17:54:24 2020 From: archon810 at gmail.com (Artem Russakovskii) Date: Mon, 3 Aug 2020 10:54:24 -0700 Subject: [Gluster-users] Gluster linear scale-out performance In-Reply-To: References: Message-ID: > > Do you kill all gluster processes (not just glusterd but even the brick > processes) before issuing reboot? This is necessary to prevent I/O stalls. > There is stop-all-gluster-processes.sh which should be available as a part > of the gluster installation (maybe in /usr/share/glusterfs/scripts/) which > you can use. Can you check if this helps? > A reboot shuts down gracefully, so those processes are shut down before the reboot begins. We've moved on to discussing this matter in the gluster slack, there's a lot more info there now about the above. 
The gist is that heavy xfs fragmentation when bricks are almost full (95-96%) made healing as well as disk accesses a lot more expensive and slow, and prone to hanging. What's still not clear is why a slowdown of one brick/gluster instance similarly affects all bricks/gluster instances on other servers, and how that can be optimized/mitigated. Sincerely, Artem -- Founder, Android Police, APK Mirror, Illogical Robot LLC beerpla.net | @ArtemR On Thu, Jul 30, 2020 at 8:21 PM Ravishankar N wrote: > > On 25/07/20 4:35 am, Artem Russakovskii wrote: > > Speaking of fio, could the gluster team please help me understand > something? > > We've been having lots of performance issues related to gluster using > attached block storage on Linode. At some point, I figured out that Linode > has a cap of 500 IOPS on their block storage > > (with spikes to 1500 IOPS). The block storage we use is formatted xfs with > 4KB bsize (block size). > > I then ran a bunch of fio tests on the block storage itself (not the > gluster fuse mount), which performed horribly when the bs parameter was set > to 4k: > fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 > --name=test --filename=test --bs=4k --iodepth=64 --size=4G > --readwrite=randwrite --ramp_time=4 > During these tests, fio ETA crawled to over an hour, at some point dropped > to 45min and I did see 500-1500 IOPS flash by briefly, then it went back > down to 0. I/O seems majorly choked for some reason, likely because gluster > is using some of it. Transfer speed with such 4k block size is 2 MB/s with > spikes to 6MB/s. This causes the load on the server to spike up to 100+ and > brings down all our servers.
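[Editor's note] The throughput Artem reports drops straight out of the IOPS cap: throughput = IOPS x I/O size. A quick back-of-the-envelope with the figures from his post (500 IOPS sustained, 1500 in bursts) reproduces what he saw at each fio --bs setting:

```shell
# throughput = IOPS * block size, reported in decimal MB/s.
# 500 and 1500 IOPS are the Linode cap/burst figures quoted above.
for bs_kib in 4 64 256; do
  awk -v bs="$bs_kib" 'BEGIN {
    printf "bs=%dk: %.1f MB/s sustained, %.1f MB/s burst\n",
           bs, 500 * bs * 1024 / 1e6, 1500 * bs * 1024 / 1e6
  }'
done
# bs=4k gives 2.0 MB/s sustained and 6.1 MB/s burst -- matching the
# "2 MB/s with spikes to 6MB/s" observed above.
```

So at bs=4k the IOPS cap alone pins writes near 2 MB/s; larger block sizes don't make the disk faster, they just move more bytes per capped operation, which is consistent with the roughly 50 MB/s and 200 MB/s seen later at 64k and 256k.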
>
> Jobs: 1 (f=1): [w(1)][20.3%][r=0KiB/s,w=5908KiB/s][r=0,w=1477 IOPS][eta 43m:00s]
> Jobs: 1 (f=1): [w(1)][21.5%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 44m:54s]
>
> xfs_info /mnt/citadel_block1
> meta-data=/dev/sdc       isize=512    agcount=103, agsize=26214400 blks
>          =               sectsz=512   attr=2, projid32bit=1
>          =               crc=1        finobt=1, sparse=0, rmapbt=0
>          =               reflink=0
> data     =               bsize=4096   blocks=2684354560, imaxpct=25
>          =               sunit=0      swidth=0 blks
> naming   =version 2      bsize=4096   ascii-ci=0, ftype=1
> log      =internal log   bsize=4096   blocks=51200, version=2
>          =               sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none           extsz=4096   blocks=0, rtextents=0
>
> When I increase the --bs param to fio from 4k to, say, 64k, transfer speed > goes up significantly and is more like 50MB/s, and at 256k, it's 200MB/s. > > So what I'm trying to understand is: > > 1. How does the xfs block size (4KB) relate to the block size in fio > tests? If we're limited by IOPS, and xfs block size is 4KB, how can fio > produce better results with varying --bs param? > 2. Would increasing the xfs data block size to something like 64-256KB > help with our issue of choking IO and skyrocketing load? > > I have experienced similar behavior when running fio tests with bs=4k on a > gluster volume backed by XFS with a high load (numjobs=32). When I > observed the strace of the brick processes (strace -f -T -p $PID), I saw > fsync system calls taking around 2500 seconds which is insane. I'm not sure > if this is specific to the way fio does its i/o pattern and the way XFS > handles it. When I used 64k block sizes, the fio tests completed just fine. > > > 1. The worst hangs and load spikes happen when we reboot one of the > gluster servers, but not when it's down - when it comes back online. Even > with gluster not showing anything pending heal, my guess is it's still > trying to do lots of IO between the 4 nodes for some reason, but I don't > understand why.
> > Do you kill all gluster processes (not just glusterd but even the brick > processes) before issuing reboot? This is necessary to prevent I/O stalls. > There is stop-all-gluster-processes.sh which should be available as a part > of the gluster installation (maybe in /usr/share/glusterfs/scripts/) which > you can use. Can you check if this helps? > > Regards, > > Ravi > > I've been banging my head on the wall with this problem for months. > Appreciate any feedback here. > > Thank you. > > gluster volume info below > > Volume Name: SNIP_data1 > Type: Replicate > Volume ID: SNIP > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 4 = 4 > Transport-type: tcp > Bricks: > Brick1: nexus2:/mnt/SNIP_block1/SNIP_data1 > Brick2: forge:/mnt/SNIP_block1/SNIP_data1 > Brick3: hive:/mnt/SNIP_block1/SNIP_data1 > Brick4: citadel:/mnt/SNIP_block1/SNIP_data1 > Options Reconfigured: > cluster.quorum-count: 1 > cluster.quorum-type: fixed > network.ping-timeout: 5 > network.remote-dio: enable > performance.rda-cache-limit: 256MB > performance.readdir-ahead: on > performance.parallel-readdir: on > network.inode-lru-limit: 500000 > performance.md-cache-timeout: 600 > performance.cache-invalidation: on > performance.stat-prefetch: on > features.cache-invalidation-timeout: 600 > features.cache-invalidation: on > cluster.readdir-optimize: on > performance.io-thread-count: 32 > server.event-threads: 4 > client.event-threads: 4 > performance.read-ahead: off > cluster.lookup-optimize: on > performance.cache-size: 1GB > cluster.self-heal-daemon: enable > transport.address-family: inet > nfs.disable: on > performance.client-io-threads: on > cluster.granular-entry-heal: enable > cluster.data-self-heal-algorithm: full > > > Sincerely, > Artem > > -- > Founder, Android Police , APK Mirror > , Illogical Robot LLC > beerpla.net | @ArtemR > > > On Thu, Jul 23, 2020 at 12:08 AM Qing Wang wrote: > >> Hi, >> >> I have one more question about the Gluster linear scale-out performance >> 
regarding the "write-behind off" case specifically -- when "write-behind" >> is off, and still the stripe volumes and other settings as early thread >> posted, the storage I/O seems not to relate to the number of storage >> nodes. In my experiment, no matter I have 2 brick server nodes or 8 brick >> server nodes, the aggregated gluster I/O performance is ~100MB/sec. And fio >> benchmark measurement gives the same result. If "write behind" is on, then >> the storage performance is linear scale-out along with the # of brick >> server nodes increasing. >> >> No matter the write behind option is on/off, I thought the gluster I/O >> performance should be pulled and aggregated together as a whole. If that is >> the case, why do I get a consistent gluster performance (~100MB/sec) when >> "write behind" is off? Please advise me if I misunderstood something. >> >> Thanks, >> Qing >> >> >> >> >> On Tue, Jul 21, 2020 at 7:29 PM Qing Wang wrote: >> >>> fio gives me the correct linear scale-out results, and you're right, the >>> storage cache is the root cause that makes the dd measurement results not >>> accurate at all. >>> >>> Thanks, >>> Qing >>> >>> >>> On Tue, Jul 21, 2020 at 2:53 PM Yaniv Kaul wrote: >>> >>>> >>>> >>>> On Tue, 21 Jul 2020, 21:43 Qing Wang wrote: >>>> >>>>> Hi Yaniv, >>>>> >>>>> Thanks for the quick response. I forget to mention I am testing the >>>>> writing performance, not reading. In this case, would the client cache hit >>>>> rate still be a big issue? >>>>> >>>> >>>> It's not hitting the storage directly. Since it's also single threaded, >>>> it may also not saturate it. I highly recommend testing properly. >>>> Y. >>>> >>>> >>>>> I'll use fio to run my test once again, thanks for the suggestion. 
>>>>> >>>>> Thanks, >>>>> Qing >>>>> >>>>> On Tue, Jul 21, 2020 at 2:38 PM Yaniv Kaul wrote: >>>>> >>>>>> >>>>>> >>>>>> On Tue, 21 Jul 2020, 21:30 Qing Wang wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am trying to test Gluster linear scale-out performance by adding >>>>>>> more storage server/bricks, and measure the storage I/O performance. To >>>>>>> vary the storage server number, I create several "stripe" volumes that >>>>>>> contain 2 brick servers, 3 brick servers, 4 brick servers, and so on. On >>>>>>> gluster client side, I used "dd if=/dev/zero >>>>>>> of=/mnt/glusterfs/dns_test_data_26g bs=1M count=26000" to create 26G data >>>>>>> (or larger size), and those data will be distributed to the corresponding >>>>>>> gluster servers (each has gluster brick on it) and "dd" returns the final >>>>>>> I/O throughput. The Internet is 40G infiniband, although I didn't do any >>>>>>> specific configurations to use advanced features. >>>>>>> >>>>>> >>>>>> Your dd command is inaccurate, as it'll hit the client cache. It is >>>>>> also single threaded. I suggest switching to fio. >>>>>> Y. >>>>>> >>>>>> >>>>>>> What confuses me is that the storage I/O seems not to relate to the >>>>>>> number of storage nodes, but Gluster documents said it should be linear >>>>>>> scaling. For example, when "write-behind" is on, and when Infiniband "jumbo >>>>>>> frame" (connected mode) is on, I can get ~800 MB/sec reported by "dd", no >>>>>>> matter I have 2 brick servers or 8 brick servers -- for 2 server case, each >>>>>>> server can have ~400 MB/sec; for 4 server case, each server can have >>>>>>> ~200MB/sec. That said, each server I/O does aggregate to the final storage >>>>>>> I/O (800 MB/sec), but this is not "linear scale-out". >>>>>>> >>>>>>> Can somebody help me to understand why this is the case? I certainly >>>>>>> can have some misunderstanding/misconfiguration here. Please correct me if >>>>>>> I do, thanks! 
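[Editor's note] On Yaniv's caching point: if dd must be used at all, it can at least be made to account for the real write-out. A hedged sketch with standard GNU dd flags (the scratch path and small size are illustrative, not the gluster mount from the thread):

```shell
# conv=fdatasync forces a flush of the written data before dd reports its
# rate, so the number includes actual write-out rather than page-cache
# absorption. oflag=direct would bypass the cache entirely, but is omitted
# here because not every filesystem supports O_DIRECT.
TARGET="${TARGET:-./ddtest.bin}"
dd if=/dev/zero of="$TARGET" bs=1M count=16 conv=fdatasync 2>&1 | tail -n 1
stat -c %s "$TARGET"   # 16 MiB = 16777216 bytes
```

Even so, fio with --direct=1 and several jobs (as used elsewhere in the thread) remains the better tool, since dd is single-threaded either way.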
>>>>>>> >>>>>>> Best, >>>>>>> Qing >>>>>>> ________ >>>>>>> >>>>>>> >>>>>>> >>>>>>> Community Meeting Calendar: >>>>>>> >>>>>>> Schedule - >>>>>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>>>>>> Bridge: https://bluejeans.com/441850968 >>>>>>> >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>> ________ >> >> >> >> Community Meeting Calendar: >> >> Schedule - >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> Bridge: https://bluejeans.com/441850968 >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> > > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bob at computerisms.ca Tue Aug 4 03:01:17 2020 From: bob at computerisms.ca (Computerisms Corporation) Date: Mon, 3 Aug 2020 20:01:17 -0700 Subject: [Gluster-users] performance Message-ID: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca> Hi Gurus, I have been trying to wrap my head around performance improvements on my gluster setup, and I don't seem to be making any progress. I mean forward progress. making it worse takes practically no effort at all. My gluster is distributed-replicated across 6 bricks and 2 servers, with an arbiter on each server. I designed it like this so I have an expansion path to more servers in the future (like the staggered arbiter diagram in the red hat documentation). gluster v info output is below. I have compiled gluster 7.6 from sources on both servers. Servers are 6core/3.4Ghz with 32 GB RAM, no swap, and SSD and gigabit network connections. 
They are running debian, and are being used as redundant web servers. There are some 3 million files on the Gluster storage, averaging 130KB/file. Currently only one of the two servers is serving web services. There are well over 100 sites, and apache server-status claims around 5 hits per second, depending on time of day, so there is a fair bit of logging going on. The gluster is only holding website data and config files that will be common between the two servers; no databases or anything like that on the Gluster. When the serving server is under load, the load average is consistently 12-20. glusterfs is always at the top with 150%-250% cpu, and each of 3 bricks at roughly 50-70%, so consistently pegging 4 of the 6 cores. apache processes will easily eat up all the rest of the cpus after that. And web page response time is underwhelming at best. Interestingly, mostly because it is not something I have ever experienced before, software interrupts sit between 1 and 5 on each core, but the last core is usually sitting around 20. I have never encountered a high load average where the si number was significant. I have googled the crap out of that (as well as gluster performance in general); there are nearly limitless posts about what it is, but I have yet to see one thing explaining what to do about it. Sadly I can't really shut down the gluster process to confirm if that is the cause, but it's a pretty good bet, I think. When the system is not under load, glusterfs will be running at around 100% with each of the 3 bricks around 35%, so using 2 cores when doing not much of anything. nload shows the network cards rarely climb above 300 Mbps unless I am doing a direct file transfer between the servers, in which case it gets right up to the 1Gbps limit. RAM is never above 15GB unless I am causing it to happen. atop shows a disk busy percentage that is often above 50% and sometimes hits 100%, and is nowhere near as consistently excessive as the cpu cores are.
The cpu definitely seems to be the bottleneck. When I found out about the groups directory, I figured one of those must be useful to me, but as best as I can tell they are not. But I am really hoping that someone has configured a system like mine and has a good group file they might share for this situation, or a peek at their volume info output? Or maybe this is really just about as good as I should expect? Maybe the fix is that I need more/faster cores? I hope not, as that isn't really an option. Anyway, here is my volume info as promised.

root at mooglian:/Computerisms/sites/computerisms.ca/log# gluster v info

Volume Name: webisms
Type: Distributed-Replicate
Volume ID: 261901e7-60b4-4760-897d-0163beed356e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: mooglian:/var/GlusterBrick/replset-0/webisms-replset-0
Brick2: moogle:/var/GlusterBrick/replset-0/webisms-replset-0
Brick3: moogle:/var/GlusterBrick/replset-0-arb/webisms-replset-0-arb (arbiter)
Brick4: moogle:/var/GlusterBrick/replset-1/webisms-replset-1
Brick5: mooglian:/var/GlusterBrick/replset-1/webisms-replset-1
Brick6: mooglian:/var/GlusterBrick/replset-1-arb/webisms-replset-1-arb (arbiter)
Options Reconfigured:
auth.allow: xxxx
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
performance.stat-prefetch: on
network.inode-lru-limit: 200000
performance.write-behind-window-size: 4MB
performance.readdir-ahead: on
performance.io-thread-count: 64
performance.cache-size: 8GB
server.event-threads: 8
client.event-threads: 8
performance.nl-cache-timeout: 600

-- Bob Miller Cell: 867-334-7117 Office: 867-633-3760 Office: 867-322-0362 www.computerisms.ca From hunter86_bg at yahoo.com Tue Aug 4 04:00:06 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Tue, 04 Aug 2020 07:00:06 +0300 Subject: [Gluster-users] performance In-Reply-To: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca>
References: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca> Message-ID: <7991483E-5365-4C87-89FA-C871AED18062@yahoo.com> On 4 August 2020 at 6:01:17 GMT+03:00, Computerisms Corporation wrote: >Hi Gurus, > >I have been trying to wrap my head around performance improvements on >my >gluster setup, and I don't seem to be making any progress. I mean >forward progress. making it worse takes practically no effort at all. > >My gluster is distributed-replicated across 6 bricks and 2 servers, >with >an arbiter on each server. I designed it like this so I have an >expansion path to more servers in the future (like the staggered >arbiter >diagram in the red hat documentation). gluster v info output is below. > >I have compiled gluster 7.6 from sources on both servers. There is a 7.7 version which is fixing some stuff. Why do you have to compile it from source? >Servers are 6core/3.4Ghz with 32 GB RAM, no swap, and SSD and gigabit >network connections. They are running debian, and are being used as >redundant web servers. There is some 3Million files on the Gluster >Storage averaging 130KB/file. This type of workload is called 'metadata-intensive'. There are some recommendations for this type of workload: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/small_file_performance_enhancements Keep an eye on the section that mentions dirty-ratio = 5 & dirty-background-ratio = 2. >Currently only one of the two servers is > >serving web services. There are well over 100 sites, and apache >server-status claims around 5 hits per second, depending on time of >day, >so a fair bit of logging going on. The gluster is only holding website
glusterfs is always at the top with 150%-250% cpu, and each of >3 >bricks at roughly 50-70%, so consistently pegging 4 of the 6 cores. >apache processes will easily eat up all the rest of the cpus after >that. > And web page response time is underwhelming at best. > >Interestingly, mostly because it is not something I have ever >experienced before, software interrupts sit between 1 and 5 on each >core, but the last core is usually sitting around 20. Have never >encountered a high load average where the si number was ever >significant. I have googled the crap out of that (as well as gluster >performance in general), there are nearly limitless posts about what it > >is, but have yet to see one thing to explain what to do about it. There is an explanation about that in the link I provided above: Configuring a higher event threads value than the available processing units could again cause context switches on these threads. As a result reducing the number deduced from the previous step to a number that is less that the available processing units is recommended. >Sadly >I can't really shut down the gluster process to confirm if that is the >cause, but it's a pretty good bet, I think. > >When the system is not under load, glusterfs will be running at around >100% with each of the 3 bricks around 35%, so using 2 cores when doing >not much of anything. > >nload shows the network cards rarely climb above 300 Mbps unless I am >doing a direct file transfer between the servers, in which case it gets > >right up to the 1Gbps limit. RAM is never above 15GB unless I am >causing it to happen. atop show a disk busy percentage, it is often >above 50% and sometimes will hit 100%, and is no where near as >consistently showing excessive usage like the cpu cores are. The cpu >definitely seems to be the bottleneck. >When I found out about the groups directory, I figured one of those >must >be useful to me, but as best as I can tell they are not. 
But I am >really hoping that someone has configured a system like mine and has a >good group file they might share for this situation, or a peak at their > >volume info output? > >or maybe this is really just about as good as I should expect? Maybe >the fix is that I need more/faster cores? I hope not, as that isn't >really an option. > >Anyway, here is my volume info as promised. > >root at mooglian:/Computerisms/sites/computerisms.ca/log# gluster v info > >Volume Name: webisms >Type: Distributed-Replicate >Volume ID: 261901e7-60b4-4760-897d-0163beed356e >Status: Started >Snapshot Count: 0 >Number of Bricks: 2 x (2 + 1) = 6 >Transport-type: tcp >Bricks: >Brick1: mooglian:/var/GlusterBrick/replset-0/webisms-replset-0 >Brick2: moogle:/var/GlusterBrick/replset-0/webisms-replset-0 >Brick3: moogle:/var/GlusterBrick/replset-0-arb/webisms-replset-0-arb >(arbiter) >Brick4: moogle:/var/GlusterBrick/replset-1/webisms-replset-1 >Brick5: mooglian:/var/GlusterBrick/replset-1/webisms-replset-1 >Brick6: mooglian:/var/GlusterBrick/replset-1-arb/webisms-replset-1-arb >(arbiter) >Options Reconfigured: >auth.allow: xxxx >performance.client-io-threads: off >nfs.disable: on >storage.fips-mode-rchecksum: on >transport.address-family: inet >performance.stat-prefetch: on >network.inode-lru-limit: 200000 >performance.write-behind-window-size: 4MB >performance.readdir-ahead: on >performance.io-thread-count: 64 >performance.cache-size: 8GB >server.event-threads: 8 >client.event-threads: 8 >performance.nl-cache-timeout: 600 As 'storage.fips-mode-rchecksum' is using sha256, you can try to disable it - which should use the less cpu intensive md5. Yet, I have never played with that option ... Check the RH page about the tunings and try different values for the event threads. 
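Concretely, the knobs discussed above can be tried together as a sketch like the following. The sysctl values are the ones from the Red Hat small-file guide; the volume name `webisms` is taken from the volume info in this thread, and the thread counts are illustrative starting points for a 6-core box shared with apache, not prescriptions:

```shell
# Kernel dirty-page settings recommended for metadata-intensive
# workloads (persist them in /etc/sysctl.d/ once they prove out).
sysctl vm.dirty_ratio=5
sysctl vm.dirty_background_ratio=2

# Event threads: keep below the number of cores actually free for
# gluster; start low and measure before raising.
gluster volume set webisms server.event-threads 2
gluster volume set webisms client.event-threads 2

# The metadata-cache group profile enables the md-cache related
# options in one shot for small-file workloads.
gluster volume set webisms group metadata-cache
```

Changing one option at a time and watching load for a while between changes makes it much easier to attribute any effect.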
Best Regards, Strahil Nikolov From archon810 at gmail.com Tue Aug 4 04:42:45 2020 From: archon810 at gmail.com (Artem Russakovskii) Date: Mon, 3 Aug 2020 21:42:45 -0700 Subject: [Gluster-users] performance In-Reply-To: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca> References: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca> Message-ID: I tried putting all web files (specifically WordPress php and static files as well as various cache files) on gluster before, and the results were miserable on a busy site - our usual ~8-10 load quickly turned into 100+ and killed everything. I had to go back to running just the user uploads (which are static files in the Wordpress uploads/ dir) on gluster and using rsync (via lsyncd) for the frequently executed php / cache. I'd love to figure this out as well and tune gluster for heavy reads and moderate writes, but I haven't cracked that recipe yet. On Mon, Aug 3, 2020, 8:08 PM Computerisms Corporation wrote: > Hi Gurus, > > I have been trying to wrap my head around performance improvements on my > gluster setup, and I don't seem to be making any progress. I mean > forward progress. making it worse takes practically no effort at all. > > My gluster is distributed-replicated across 6 bricks and 2 servers, with > an arbiter on each server. I designed it like this so I have an > expansion path to more servers in the future (like the staggered arbiter > diagram in the red hat documentation). gluster v info output is below. > I have compiled gluster 7.6 from sources on both servers. > > Servers are 6core/3.4Ghz with 32 GB RAM, no swap, and SSD and gigabit > network connections. They are running debian, and are being used as > redundant web servers. There is some 3Million files on the Gluster > Storage averaging 130KB/file. Currently only one of the two servers is > serving web services. 
There are well over 100 sites, and apache > server-status claims around 5 hits per second, depending on time of day, > so a fair bit of logging going on. The gluster is only holding website > data and config files that will be common between the two servers, no > databases or anything like that on the Gluster. > > When the serving server is under load load average is consistently > 12-20. glusterfs is always at the top with 150%-250% cpu, and each of 3 > bricks at roughly 50-70%, so consistently pegging 4 of the 6 cores. > apache processes will easily eat up all the rest of the cpus after that. > And web page response time is underwhelming at best. > > Interestingly, mostly because it is not something I have ever > experienced before, software interrupts sit between 1 and 5 on each > core, but the last core is usually sitting around 20. Have never > encountered a high load average where the si number was ever > significant. I have googled the crap out of that (as well as gluster > performance in general), there are nearly limitless posts about what it > is, but have yet to see one thing to explain what to do about it. Sadly > I can't really shut down the gluster process to confirm if that is the > cause, but it's a pretty good bet, I think. > > When the system is not under load, glusterfs will be running at around > 100% with each of the 3 bricks around 35%, so using 2 cores when doing > not much of anything. > > nload shows the network cards rarely climb above 300 Mbps unless I am > doing a direct file transfer between the servers, in which case it gets > right up to the 1Gbps limit. RAM is never above 15GB unless I am > causing it to happen. atop show a disk busy percentage, it is often > above 50% and sometimes will hit 100%, and is no where near as > consistently showing excessive usage like the cpu cores are. The cpu > definitely seems to be the bottleneck. 
> > When I found out about the groups directory, I figured one of those must > be useful to me, but as best as I can tell they are not. But I am > really hoping that someone has configured a system like mine and has a > good group file they might share for this situation, or a peak at their > volume info output? > > or maybe this is really just about as good as I should expect? Maybe > the fix is that I need more/faster cores? I hope not, as that isn't > really an option. > > Anyway, here is my volume info as promised. > > root at mooglian:/Computerisms/sites/computerisms.ca/log# gluster v info > > Volume Name: webisms > Type: Distributed-Replicate > Volume ID: 261901e7-60b4-4760-897d-0163beed356e > Status: Started > Snapshot Count: 0 > Number of Bricks: 2 x (2 + 1) = 6 > Transport-type: tcp > Bricks: > Brick1: mooglian:/var/GlusterBrick/replset-0/webisms-replset-0 > Brick2: moogle:/var/GlusterBrick/replset-0/webisms-replset-0 > Brick3: moogle:/var/GlusterBrick/replset-0-arb/webisms-replset-0-arb > (arbiter) > Brick4: moogle:/var/GlusterBrick/replset-1/webisms-replset-1 > Brick5: mooglian:/var/GlusterBrick/replset-1/webisms-replset-1 > Brick6: mooglian:/var/GlusterBrick/replset-1-arb/webisms-replset-1-arb > (arbiter) > Options Reconfigured: > auth.allow: xxxx > performance.client-io-threads: off > nfs.disable: on > storage.fips-mode-rchecksum: on > transport.address-family: inet > performance.stat-prefetch: on > network.inode-lru-limit: 200000 > performance.write-behind-window-size: 4MB > performance.readdir-ahead: on > performance.io-thread-count: 64 > performance.cache-size: 8GB > server.event-threads: 8 > client.event-threads: 8 > performance.nl-cache-timeout: 600 > > > -- > Bob Miller > Cell: 867-334-7117 > Office: 867-633-3760 > Office: 867-322-0362 > www.computerisms.ca > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > 
Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bob at computerisms.ca Tue Aug 4 19:47:44 2020 From: bob at computerisms.ca (Computerisms Corporation) Date: Tue, 4 Aug 2020 12:47:44 -0700 Subject: [Gluster-users] performance In-Reply-To: <7991483E-5365-4C87-89FA-C871AED18062@yahoo.com> References: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca> <7991483E-5365-4C87-89FA-C871AED18062@yahoo.com> Message-ID: <345b06c4-5996-9aa3-f846-0944c60ee398@computerisms.ca> Hi Strahil, thanks for your response. >> >> I have compiled gluster 7.6 from sources on both servers. > > There is a 7.7 version which is fixing somw stuff. Why do you have to compile it from source ? Because I have often found with other stuff in the past compiling from source makes a bunch of problems go away. software generally works the way the developers expect it to if you use the sources, so they are better able to help if required. so now I generally compile most of my center-piece softwares and use packages for all the supporting stuff. > >> Servers are 6core/3.4Ghz with 32 GB RAM, no swap, and SSD and gigabit >> network connections. They are running debian, and are being used as >> redundant web servers. There is some 3Million files on the Gluster >> Storage averaging 130KB/file. > > This type of workload is called 'metadata-intensive'. does this mean the metadata-cache group file would be a good one to enable? will try. waited 10 minutes, no change that I can see. > There are some recommendations for this type of workload: > https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/small_file_performance_enhancements > > Keep an eye on the section that mentions dirty-ratio?= 5 &dirty-background-ration?= 2. I have actually read that whole manual, and specifically that page several times. 
And also this one: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.1/html/administration_guide/small_file_performance_enhancements Perhaps I am not understanding it correctly. I tried these suggestions before and it got worse, not better. so I have been operating under the assumption that maybe these guidelines are not appropriate for newer versions. But will try again, adjusting the dirty ratios. Load average went from around 15 to 35 in about 2-3 minutes, but 20 minutes later, it is back down to 20. It may be having a minimal positive impact on cpu, though, I haven't seen the main glusterfs go over 200% since I changed this, and the brick processes are hovering just below 50% where they were consistently above 50% before. Might just be time of day with the system not as busy. after watching for 30 minutes, load average is fluctuating between 10 and 30, but cpu idle appears marginally better on average than it was. >> Interestingly, mostly because it is not something I have ever >> experienced before, software interrupts sit between 1 and 5 on each >> core, but the last core is usually sitting around 20. Have never >> encountered a high load average where the si number was ever >> significant. I have googled the crap out of that (as well as gluster >> performance in general), there are nearly limitless posts about what it >> >> is, but have yet to see one thing to explain what to do about it. > > There is an explanation about that in the link I provided above: > > Configuring a higher event threads value than the available processing units could again cause context switches on these threads. As a result reducing the number deduced from the previous step to a number that is less than the available processing units is recommended. Okay, again, have played with these numbers before and it did not pan out as expected. 
if I understand it correctly, I have 3 brick processes (glusterfsd), so the "deduced" number should be 3, and I should set it lower than that, so 2. but it also says "If a specific thread consumes more number of CPU cycles than needed, increasing the event thread count would enhance the performance of the Red Hat Storage Server." which is why I had it at 8. but will set it to 2 now. load average is at 17 to start, waiting a while to see what happens. so 15 minutes later, load average is currently 12, but is fluctuating between 10 and 20, have seen no significant change in cpu usage or anything else in top. now try also changing server.outstanding-rpc-limit to 256 and wait. 15 minutes later; load has been above 30 but is currently back down to 12. no significant change in cpu. try increasing to 512 and wait. 15 minutes later, load average is 50. no significant difference in cpu. Software interrupts remain around where they were. wa from top remains about where it was. not sure why load average is climbing so high. changing rpc-limit to 128. ugh. 10 minutes later, load average just popped over 100. resetting rpc-limit. now trying cluster.lookup-optimize on, lazy rebalancing (probably a bad idea on the live system, but how much worse can it get?) Ya, bad idea, 80 hours estimated to complete, load is over 50 and server is crawling. disabling rebalance and turning lookup-optimize off, for now. right now the only suggested parameter I haven't played with is the performance.io-thread-count, which I currently have at 64. sigh. an hour later load average is 80 and climbing. apache processes are numbering in the hundreds and I am constantly having to restart it. this brings load average down to 5, but as apache processes climb and are held open load average gets up to over 100 again within 3-4 minutes, and system starts going non-responsive. 
so followed all the recommendations, maybe the dirty settings had a small positive impact, but overall system is most definitely worse for having made the changes. I have returned the configs back to how they were except the dirty settings and the metadata-cache group. increased performance.cache-size to 16GB for now, because that is the one thing that seems to help when I "tune" (aka make worse) the system. have had to restart apache a couple dozen times or more, but after another 30 minutes or so system has pretty much settled back to how it was before I started. cpu is like I originally stated, all 6 cores maxed out most of the time, software interrupts still have all cpus running around 5 with the last one consistently sitting around 20-25. Disk is busy but not usually maxed out. RAM is about half used. network load peaks at about 1/3 capacity. load average is between 10 and 20. sites are responding, but sluggish. so am I not reading these recommendations and following the instructions correctly? am I not waiting long enough after each implementation, should I be making 1 change per day instead of thinking 15 minutes should be enough for the system to catch up? I have read the full red hat documentation and the significant majority of the gluster docs, maybe I am missing something else there? should these settings have had a different effect than they did? For what it's worth, I am running ext4 as my underlying fs and I have read a few times that XFS might have been a better choice. But that is not a trivial experiment to make at this time with the system in production. It's one thing (and still a bad thing to be sure) to semi-bork the system for an hour or two while I play with configurations, but would take a day or so offline to reformat and restore the data. > > As 'storage.fips-mode-rchecksum' is using sha256, you can try to disable it - which should use the less cpu intensive md5. Yet, I have never played with that option ... Done. 
no significant difference that I can see. > Check the RH page about the tunings and try different values for the event threads. in the past I have tried 2, 4, 8, 16, and 32. Playing with just those I never noticed that any of them made any difference. Though I might have some different options now than I did then, so might try these again throughout the day... Thanks again for your time Strahil, if you have any more thoughts would love to hear them. > > Best Regards, > Strahil Nikolov > From bob at computerisms.ca Tue Aug 4 19:48:51 2020 From: bob at computerisms.ca (Computerisms Corporation) Date: Tue, 4 Aug 2020 12:48:51 -0700 Subject: [Gluster-users] performance In-Reply-To: References: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca> Message-ID: <4b507ee0-1028-f006-2fb7-461a6bc0a3ef@computerisms.ca> Hi Artem, would also like this recipe. If you have any comments on my answer to Strahil, would love to hear them... On 2020-08-03 9:42 p.m., Artem Russakovskii wrote: > I tried putting all web files (specifically WordPress php and static > files as well as various cache files) on gluster before, and the results > were miserable on a busy site - our usual ~8-10 load quickly turned into > 100+ and killed everything. > > I had to go back to running just the user uploads (which are static > files in the Wordpress uploads/ dir) on gluster and using rsync (via > lsyncd) for the frequently executed php / cache. > > I'd love to figure this out as well and tune gluster for heavy reads and > moderate writes, but I haven't cracked that recipe yet. > > On Mon, Aug 3, 2020, 8:08 PM Computerisms Corporation > > wrote: > > Hi Gurus, > > I have been trying to wrap my head around performance improvements > on my > gluster setup, and I don't seem to be making any progress. I mean > forward progress. making it worse takes practically no effort at all. > > My gluster is distributed-replicated across 6 bricks and 2 servers, > with > an arbiter on each server. 
> [...] From hunter86_bg at yahoo.com Tue Aug 4 22:51:59 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Wed, 05 Aug 2020 01:51:59 +0300 Subject: [Gluster-users] performance In-Reply-To: <345b06c4-5996-9aa3-f846-0944c60ee398@computerisms.ca> References: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca> <7991483E-5365-4C87-89FA-C871AED18062@yahoo.com> <345b06c4-5996-9aa3-f846-0944c60ee398@computerisms.ca> Message-ID: 
<2CD68ED2-199F-407D-B0CC-385793BA16FD@yahoo.com> On 4 August 2020 at 22:47:44 GMT+03:00, Computerisms Corporation wrote: >Hi Strahil, thanks for your response. > >>> >>> I have compiled gluster 7.6 from sources on both servers. >> >> There is a 7.7 version which is fixing some stuff. Why do you have >to compile it from source ? > >Because I have often found with other stuff in the past compiling from >source makes a bunch of problems go away. software generally works the > >way the developers expect it to if you use the sources, so they are >better able to help if required. so now I generally compile most of my > >center-piece softwares and use packages for all the supporting stuff. Hm... OK. I guess you can try 7.7 whenever it's possible. >> >>> Servers are 6core/3.4Ghz with 32 GB RAM, no swap, and SSD and >gigabit >>> network connections. They are running debian, and are being used as >>> redundant web servers. There is some 3Million files on the Gluster >>> Storage averaging 130KB/file. >> >> This type of workload is called 'metadata-intensive'. > >does this mean the metadata-cache group file would be a good one to >enable? will try. > >waited 10 minutes, no change that I can see. > >> There are some recommendations for this type of workload: >> >https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3/html/administration_guide/small_file_performance_enhancements >> >> Keep an eye on the section that mentions dirty-ratio = 5 >& dirty-background-ratio = 2. > >I have actually read that whole manual, and specifically that page >several times. And also this one: > >https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.1/html/administration_guide/small_file_performance_enhancements > >Perhaps I am not understanding it correctly. I tried these suggestions > >before and it got worse, not better. so I have been operating under >the >assumption that maybe these guidelines are not appropriate for newer >versions. 
Actually, the settings are not changed much, so they should work for you. >But will try again. adjusting the dirty ratios. > >Load average went from around 15 to 35 in about 2-3 minutes, but 20 >minutes later, it is back down to 20. It may be having a minimal >positive impact on cpu, though, I haven't see the main glusterfs go >over >200% since I changed this, an the brick processes are hovering just >below 50% where they were consistently above 50% before. Might just >be >time of day with the system not as busy. > >after watching for 30 minutes, load average is fluctuating between 10 >and 30, but cpu idle appears marginally better on average than it was. > >>> Interestingly, mostly because it is not something I have ever >>> experienced before, software interrupts sit between 1 and 5 on each >>> core, but the last core is usually sitting around 20. Have never >>> encountered a high load average where the si number was ever >>> significant. I have googled the crap out of that (as well as >gluster >>> performance in general), there are nearly limitless posts about what >it >>> >>> is, but have yet to see one thing to explain what to do about it. This is happening on all nodes ? I got a similar situation caused by bad NIC (si in top was way high), but the chance for bad NIC on all servers is very low. You can still patch OS + Firmware on your next maintenance. >> There is an explanation about that in the link I provided above: >> >> Configuring a higher event threads value than the available >processing units could again cause context switches on these threads. >As a result reducing the number deduced from the previous step to a >number that is less that the available processing units is recommended. > >Okay, again, have played with these numbers before and it did not pan >out as expected. if I understand it correctly, I have 3 brick >processes >(glusterfsd), so the "deduced" number should be 3, and I should set it >lower than that, so 2. 
but it also says "If a specific thread consumes > >more number of CPU cycles than needed, increasing the event thread >count >would enhance the performance of the Red Hat Storage Server." which is > >why I had it at 8. Yeah, but you got only 6 cores and they are not dedicated for gluster only. I think that you need to test with lower values. >but will set it to 2 now. load average is at 17 to start, waiting a >while to see what happens. > >so 15 minutes later, load average is currently 12, but is fluctuating >between 10 and 20, have seen no significant change in cpu usage or >anything else in top. > >now try also changing server.outstanding-rpc-limit to 256 and wait. > >15 minutes later; load has been above 30 but is currently back down to >12. no significant change in cpu. try increasing to 512 and wait. > >15 minutes later, load average is 50. no signficant difference in cpu. > >Software interrupts remain around where they were. wa from top remains > >about where it was. not sure why load average is climbing so high. >changing rpc-limit to 128. > >ugh. 10 minutes later, load average just popped over 100. resetting >rpc-limit. > >now trying cluster.lookup-optimize on, lazy rebalancing (probably a bad > >idea on the live system, but how much worse can it get?) Ya, bad idea, > >80 hours estimated to complete, load is over 50 and server is crawling. > >disabling rebalance and turning lookup-optimize off, for now. > >right now the only suggested parameter I haven't played with is the >performance.io-thread-count, which I currently have at 64. I think that as you have SSDs only, you might have some results by changing this one. >sigh. an hour later load average is 80 and climbing. apache processes > >are numbering in the hundreds and I am constantly having to restart it. > >this brings load average down to 5, but as apache processes climb and >are held open load average gets up to over 100 again with 3-4 minutes, >and system starts going non-responsive. 
rinse and repeat. > >so followed all the recommendations, maybe the dirty settings had a >small positive impact, but overall system is most definitely worse for >having made the changes. > >I have returned the configs back to how they were except the dirty >settings and the metadata-cache group. increased >performance.cache-size >to 16GB for now, because that is the one thing that seems to help when >I >"tune" (aka make worse) the system. have had to restart apache a >couple >dozen times or more, but after another 30 minutes or so system has >pretty much settled back to how it was before I started. cpu is like I > >originally stated, all 6 cores maxed out most of the time, software >interrupts still have all cpus running around 5 with the last one >consistently sitting around 20-25. Disk is busy but not usually maxed >out. RAM is about half used. network load peaks at about 1/3 >capacity. >load average is between 10 and 20. sites are responding, but sluggish. > >so am I not reading these recommendations and following the >instructions >correctly? am I not waiting long enough after each implementation, >should I be making 1 change per day instead of thinking 15 minutes >should be enough for the system to catch up? I have read the full red >hat documentation and the significant majority of the gluster docs, >maybe I am missing something else there? should these settings have >had >a different effect than they did? > >For what it's worth, I am running ext4 as my underlying fs and I have >read a few times that XFS might have been a better choice. But that is > >not a trivial experiment to make at this time with the system in >production. It's one thing (and still a bad thing to be sure) to >semi-bork the system for an hour or two while I play with >configurations, but would take a day or so offline to reformat and >restore the data. XFS should bring better performance, but if the issue is not in FS -> it won't make a change... 
What I/O scheduler are you using for the SSDs (you can check via 'cat /sys/block/sdX/queue/scheduler')? >> >> As 'storage.fips-mode-rchecksum' is using sha256, you can try to >disable it - which should use the less cpu intensive md5. Yet, I have >never played with that option ... > >Done. no significant difference that I can see. > >> Check the RH page about the tunings and try different values for the >event threads. > >in the past I have tried 2, 4, 8, 16, and 32. Playing with just those >I >never noticed that any of them made any difference. Though I might >have >some different options now than I did then, so might try these again >throughout the day... Are you talking about server or client event threads (or both)? >Thanks again for your time Strahil, if you have any more thoughts would > >love to hear them. Can you check if you use 'noatime' for the bricks? It won't bring any effect on the CPU side, but it might help with the I/O. I see that your indicator for high load is loadavg, but have you actually checked how many processes are in 'R' or 'D' state? Some monitoring checks can raise loadavg artificially. Also, are you using software mirroring (either mdadm or striped/mirrored LVs)? 
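The checks above can be collected into one quick diagnostic pass. This is only a sketch of how one might gather them on each node; it assumes a Linux box with procps, and none of it changes any state:

```shell
#!/bin/sh
# Count runnable (R) vs uninterruptible-sleep (D) processes.
# loadavg counts both, so a high loadavg with many D-state
# processes points at I/O wait rather than CPU saturation.
ps -eo stat= | awk '{ s[substr($1,1,1)]++ }
  END { printf "R=%d D=%d\n", s["R"]+0, s["D"]+0 }'

# Active I/O scheduler per block device (the bracketed entry
# is the one currently in use).
for q in /sys/block/*/queue/scheduler; do
  if [ -r "$q" ]; then echo "$q: $(cat "$q")"; fi
done

# Check mount options for noatime -- nodiratime alone still
# updates atime on regular files.
grep -i atime /proc/mounts

# Any software RAID in play?
if [ -r /proc/mdstat ]; then cat /proc/mdstat; fi
```

Running it once under load and once idle, and comparing the R/D counts, separates "CPU-bound" from "waiting on disk" before touching any volume options.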
>> >> >> Best Regards, >> Strahil Nikolov >> >________ > > > >Community Meeting Calendar: > >Schedule - >Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >Bridge: https://bluejeans.com/441850968 > >Gluster-users mailing list >Gluster-users at gluster.org >https://lists.gluster.org/mailman/listinfo/gluster-users From bob at computerisms.ca Wed Aug 5 01:53:34 2020 From: bob at computerisms.ca (Computerisms Corporation) Date: Tue, 4 Aug 2020 18:53:34 -0700 Subject: [Gluster-users] performance In-Reply-To: <2CD68ED2-199F-407D-B0CC-385793BA16FD@yahoo.com> References: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca> <7991483E-5365-4C87-89FA-C871AED18062@yahoo.com> <345b06c4-5996-9aa3-f846-0944c60ee398@computerisms.ca> <2CD68ED2-199F-407D-B0CC-385793BA16FD@yahoo.com> Message-ID: <64ee1b88-42d6-75d2-05ff-4703d168cc25@computerisms.ca> Hi Strahil, thanks again for sticking with me on this. > Hm... OK. I guess you can try 7.7 whenever it's possible. Acknowledged. >> Perhaps I am not understanding it correctly. I tried these suggestions >> >> before and it got worse, not better. so I have been operating under >> the >> assumption that maybe these guidelines are not appropriate for newer >> versions. > > Actually, the settings are not changed much, so they should work for you. Okay, then maybe I am doing something incorrectly, or not understanding some fundamental piece of things that I should be. >>>> Interestingly, mostly because it is not something I have ever >>>> experienced before, software interrupts sit between 1 and 5 on each >>>> core, but the last core is usually sitting around 20. Have never >>>> encountered a high load average where the si number was ever >>>> significant. I have googled the crap out of that (as well as >> gluster >>>> performance in general), there are nearly limitless posts about what >> it >>>> >>>> is, but have yet to see one thing to explain what to do about it. > > This is happening on all nodes ? 
> I got a similar situation caused by bad NIC (si in top was way high), but the chance for bad NIC on all servers is very low. > You can still patch OS + Firmware on your next maintenance. Yes, but it's not to the same extreme. The other node is currently not actually serving anything to the internet, so right now it's only function is replicated gluster and databases. On the 2nd node there is also one core, the first one in this case as opposed to the last one on the main node, but it sits between 10 and 15 instead of 20 and 25, and the remaining cores will be between 0 and 2 instead of 1 and 5. I have no evidence of any bad hardware, and these servers were both commissioned only within the last couple of months. But will still poke around on this path. >> more number of CPU cycles than needed, increasing the event thread >> count >> would enhance the performance of the Red Hat Storage Server." which is >> >> why I had it at 8. > > Yeah, but you got only 6 cores and they are not dedicated for gluster only. I think that you need to test with lower values. Okay, I will change these values a few times over the next couple of hours and see what happens. >> right now the only suggested parameter I haven't played with is the >> performance.io-thread-count, which I currently have at 64. > > I think that as you have SSDs only, you might have some results by changing this one. Okay, will also modify this incrementally. do you think it can go higher? I think I got this number from a thread on this list, but I am not really sure what would be a reasonable value for my system. >> >> For what it's worth, I am running ext4 as my underlying fs and I have >> read a few times that XFS might have been a better choice. But that is >> >> not a trivial experiment to make at this time with the system in >> production. 
It's one thing (and still a bad thing to be sure) to >> semi-bork the system for an hour or two while I play with >> configurations, but would take a day or so offline to reformat and >> restore the data. > > XFS should bring better performance, but if the issue is not in FS -> it won't make a change... > What I/O scheduler are you using for the SSDs (you can check via 'cat /sys/block/sdX/queue/scheduler)? # cat /sys/block/vda/queue/scheduler [mq-deadline] none >> in the past I have tried 2, 4, 8, 16, and 32. Playing with just those >> I >> never noticed that any of them made any difference. Though I might >> have >> some different options now than I did then, so might try these again >> throughout the day... > > Are you talking about server or client event threads (or both)? It never occurred to me to set them to different values. so far when I set one I set the other to the same value. > >> Thanks again for your time Strahil, if you have any more thoughts would >> >> love to hear them. > > Can you check if you use 'noatime' for the bricks ? It won't bring any effect on the CPU side, but it might help with the I/O. I checked into this, and I have nodiratime set, but not noatime. from what I can gather, it should provide nearly the same benefit performance wise while leaving the atime attribute on the files. Never know, I may decide I want those at some point in the future. > I see that your indicator for high load is loadavg, but have you actually checked how many processes are in 'R' or 'D' state ? > Some monitoring checks can raise loadavg artificially. occasionally a batch of processes will be in R state, and I see the D state show up from time to time, but mostly everything is S. > Also, are you using software mirroring (either mdadm or striped/mirrored LVs )? No, single disk. And I opted to not put the gluster on a thinLVM, as I don't see myself using the lvm snapshots in this scenario. 
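For reference, the tunables being discussed in this thread are all applied per volume with `gluster volume set`. The sketch below is not a recommendation: the volume name `webvol` and the brick mount point `/srv/brick1` are placeholders, and the values are only starting points to measure against on a box whose 6 cores are shared between gluster, Apache and the databases:

```shell
# Event threads: start low and raise only while throughput actually improves
gluster volume set webvol server.event-threads 4
gluster volume set webvol client.event-threads 4

# Per-brick io-thread worker pool (default is 16)
gluster volume set webvol performance.io-thread-count 32

# Write-behind only behaves like a cache when flush-behind is enabled
gluster volume set webvol performance.write-behind-window-size 512MB
gluster volume set webvol performance.flush-behind on

# Confirm what is currently in effect
gluster volume get webvol all | grep -E 'event-threads|io-thread|behind'

# noatime on the brick filesystem also covers directory atimes (it is a
# superset of nodiratime) and can be applied to a mounted brick on the fly:
mount -o remount,noatime /srv/brick1
```

Changing one option at a time and watching loadavg and atop between changes makes it much easier to attribute any improvement to a specific setting.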
So, we just moved into a quieter time of the day, but maybe I just stumbled onto something. I was trying to figure out if/how I could throw more RAM at the problem. The gluster docs say write-behind is not a cache unless flush-behind is on. So it seems that is a way to throw RAM at it? I put performance.write-behind-window-size: 512MB and performance.flush-behind: on and the whole system calmed down pretty much immediately. Could be just timing, though, will have to see tomorrow during business hours whether the system stays at a reasonable load. I will still test the other options you suggested tonight, though, this is probably too good to be true. Can't thank you enough for your input, Strahil, your help is truly appreciated! > >>> >>> >>> Best Regards, >>> Strahil Nikolov >>> >> ________ >> >> >> >> Community Meeting Calendar: >> >> Schedule - >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> Bridge: https://bluejeans.com/441850968 >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users From gilberto.nunes32 at gmail.com Wed Aug 5 02:00:52 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Tue, 4 Aug 2020 23:00:52 -0300 Subject: [Gluster-users] Two VMS as arbiter... Message-ID: Hi there. I have two physical servers deployed as replica 2 and, obviously, I got a split-brain. So I am thinking of using two virtual machines, each one on a physical server.... Then these two VMs would act as an arbiter of the gluster set.... Is this doable? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From bob at computerisms.ca Wed Aug 5 02:47:05 2020 From: bob at computerisms.ca (Computerisms Corporation) Date: Tue, 4 Aug 2020 19:47:05 -0700 Subject: [Gluster-users] Two VMS as arbiter... In-Reply-To: References: Message-ID: <4610d2cc-eafa-6a5b-d778-797e6ce7e994@computerisms.ca> Hi Gilberto, My understanding is there can only be one arbiter per replicated set.
I don't have a lot of practice with gluster, so this could be bad advice, but the way I dealt with it on my two servers was to use 6 bricks as distributed-replicated (this is also relatively easy to migrate to 3 servers if that happens for you in the future):

Server1      Server2
brick1       brick1.5
arbiter1.5   brick2
brick2.5     arbiter2.5

On 2020-08-04 7:00 p.m., Gilberto Nunes wrote: > Hi there. > I have two physical servers deployed as replica 2 and, obviously, I got > a split-brain. > So I am thinking in use two virtual machines,each one in physical > servers.... > Then this two VMS act as a artiber of gluster set.... > > Is this doable? > > Thanks > > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > From gilberto.nunes32 at gmail.com Wed Aug 5 03:25:07 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Wed, 5 Aug 2020 00:25:07 -0300 Subject: [Gluster-users] Two VMS as arbiter... In-Reply-To: <4610d2cc-eafa-6a5b-d778-797e6ce7e994@computerisms.ca> References: <4610d2cc-eafa-6a5b-d778-797e6ce7e994@computerisms.ca> Message-ID: Hi Bob! Could you, please, send me more detail about this configuration? I will appreciate that! Thank you --- Gilberto Nunes Ferreira (47) 3025-5907 (47) 99676-7530 - Whatsapp / Telegram Skype: gilberto.nunes36 Em ter., 4 de ago. de 2020 às 23:47, Computerisms Corporation < bob at computerisms.ca> escreveu: > Hi Gilberto, > > My understanding is there can only be one arbiter per replicated set.
I > don't have a lot of practice with gluster, so this could be bad advice, > but the way I dealt with it on my two servers was to use 6 bricks as > distributed-replicated (this is also relatively easy to migrate to 3 > servers if that happens for you in the future): > > Server1 Server2 > brick1 brick1.5 > arbiter1.5 brick2 > brick2.5 arbiter2.5 > > On 2020-08-04 7:00 p.m., Gilberto Nunes wrote: > > Hi there. > > I have two physical servers deployed as replica 2 and, obviously, I got > > a split-brain. > > So I am thinking in use two virtual machines,each one in physical > > servers.... > > Then this two VMS act as a artiber of gluster set.... > > > > Is this doable? > > > > Thanks > > > > ________ > > > > > > > > Community Meeting Calendar: > > > > Schedule - > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > > Bridge: https://bluejeans.com/441850968 > > > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bob at computerisms.ca Wed Aug 5 05:14:28 2020 From: bob at computerisms.ca (Computerisms Corporation) Date: Tue, 4 Aug 2020 22:14:28 -0700 Subject: [Gluster-users] Two VMS as arbiter... In-Reply-To: References: <4610d2cc-eafa-6a5b-d778-797e6ce7e994@computerisms.ca> Message-ID: <6496d212-9ffa-5112-fc14-aee578b25f01@computerisms.ca> check the example of the chained configuration on this page: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/creating_arbitrated_replicated_volumes and apply it to two servers... On 2020-08-04 8:25 p.m., Gilberto Nunes wrote: > Hi Bob! 
> > Could you, please, send me more detail about this configuration? > I will appreciate that! > > Thank you > --- > Gilberto Nunes Ferreira > > (47) 3025-5907 > ** > (47) 99676-7530 - Whatsapp / Telegram > > Skype: gilberto.nunes36 > > > > > > Em ter., 4 de ago. de 2020 ?s 23:47, Computerisms Corporation > > escreveu: > > Hi Gilberto, > > My understanding is there can only be one arbiter per replicated > set.? I > don't have a lot of practice with gluster, so this could be bad advice, > but the way I dealt with it on my two servers was to use 6 bricks as > distributed-replicated (this is also relatively easy to migrate to 3 > servers if that happens for you in the future): > > Server1? ? ?Server2 > brick1? ? ? brick1.5 > arbiter1.5? brick2 > brick2.5? ? arbiter2.5 > > On 2020-08-04 7:00 p.m., Gilberto Nunes wrote: > > Hi there. > > I have two physical servers deployed as replica 2 and, obviously, > I got > > a split-brain. > > So I am thinking in use two virtual machines,each one in physical > > servers.... > > Then this two VMS act as a artiber of gluster set.... > > > > Is this doable? > > > > Thanks > > > > ________ > > > > > > > > Community Meeting Calendar: > > > > Schedule - > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > > Bridge: https://bluejeans.com/441850968 > > > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > From gilberto.nunes32 at gmail.com Wed Aug 5 11:57:10 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Wed, 5 Aug 2020 08:57:10 -0300 Subject: [Gluster-users] Two VMS as arbiter... 
In-Reply-To: <6496d212-9ffa-5112-fc14-aee578b25f01@computerisms.ca> References: <4610d2cc-eafa-6a5b-d778-797e6ce7e994@computerisms.ca> <6496d212-9ffa-5112-fc14-aee578b25f01@computerisms.ca> Message-ID: hum I see... like this: [image: image.png] --- Gilberto Nunes Ferreira (47) 3025-5907 (47) 99676-7530 - Whatsapp / Telegram Skype: gilberto.nunes36 Em qua., 5 de ago. de 2020 ?s 02:14, Computerisms Corporation < bob at computerisms.ca> escreveu: > check the example of the chained configuration on this page: > > > https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/creating_arbitrated_replicated_volumes > > and apply it to two servers... > > On 2020-08-04 8:25 p.m., Gilberto Nunes wrote: > > Hi Bob! > > > > Could you, please, send me more detail about this configuration? > > I will appreciate that! > > > > Thank you > > --- > > Gilberto Nunes Ferreira > > > > (47) 3025-5907 > > ** > > (47) 99676-7530 - Whatsapp / Telegram > > > > Skype: gilberto.nunes36 > > > > > > > > > > > > Em ter., 4 de ago. de 2020 ?s 23:47, Computerisms Corporation > > > escreveu: > > > > Hi Gilberto, > > > > My understanding is there can only be one arbiter per replicated > > set. I > > don't have a lot of practice with gluster, so this could be bad > advice, > > but the way I dealt with it on my two servers was to use 6 bricks as > > distributed-replicated (this is also relatively easy to migrate to 3 > > servers if that happens for you in the future): > > > > Server1 Server2 > > brick1 brick1.5 > > arbiter1.5 brick2 > > brick2.5 arbiter2.5 > > > > On 2020-08-04 7:00 p.m., Gilberto Nunes wrote: > > > Hi there. > > > I have two physical servers deployed as replica 2 and, obviously, > > I got > > > a split-brain. > > > So I am thinking in use two virtual machines,each one in physical > > > servers.... > > > Then this two VMS act as a artiber of gluster set.... > > > > > > Is this doable? 
> > > > > > Thanks > > > > > > ________ > > > > > > > > > > > > Community Meeting Calendar: > > > > > > Schedule - > > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > > > Bridge: https://bluejeans.com/441850968 > > > > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > ________ > > > > > > > > Community Meeting Calendar: > > > > Schedule - > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > > Bridge: https://bluejeans.com/441850968 > > > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 54749 bytes Desc: not available URL: From mathias.waack at seim-partner.de Wed Aug 5 13:48:52 2020 From: mathias.waack at seim-partner.de (Mathias Waack) Date: Wed, 5 Aug 2020 15:48:52 +0200 Subject: [Gluster-users] Repair after accident Message-ID: <5822bb92-432e-e08e-d230-7adbf57127ce@seim-partner.de> Hi all, we are running a gluster setup with two nodes:

Status of volume: gvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.1.x:/zbrick                   49152     0          Y       13350
Brick 192.168.1.y:/zbrick                   49152     0          Y       5965
Self-heal Daemon on localhost               N/A       N/A        Y       14188
Self-heal Daemon on 192.168.1.93            N/A       N/A        Y       6003

Task Status of Volume gvol
------------------------------------------------------------------------------
There are no active volume tasks

The glusterfs hosts the data volumes of a bunch of containers. The underlying fs is zfs.
A few days ago one of the containers created a lot of files in one of its data volumes, and at the end it completely filled up the space of the glusterfs volume. But this happened only on one host; on the other host there was still enough space. We finally were able to identify this container and found out that the sizes of the data on /zbrick were different on both hosts for this container. Now we made the big mistake of deleting these files on both hosts in the /zbrick volume, not on the mounted glusterfs volume. Later we found the reason for this behavior: the network driver on the second node partially crashed (which means we were able to log in on the node, so we assumed the network was running, but the card was already dropping packets at this time) at the same time as the failed container started to fill up the gluster volume. After rebooting the second node, the gluster became available again. Now the glusterfs volume is running again - but it is still (nearly) full: the files created by the container are not visible, but they still count against the amount of free space. How can we fix this? In addition there are some files which are no longer accessible since this accident: tail access.log.old tail: cannot open 'access.log.old' for reading: Input/output error It looks like the files affected by this error are the ones which were changed during the accident. Is there a way to fix this too? Thanks, Mathias From gilberto.nunes32 at gmail.com Wed Aug 5 14:07:10 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Wed, 5 Aug 2020 11:07:10 -0300 Subject: [Gluster-users] Two VMS as arbiter... In-Reply-To: References: <4610d2cc-eafa-6a5b-d778-797e6ce7e994@computerisms.ca> <6496d212-9ffa-5112-fc14-aee578b25f01@computerisms.ca> Message-ID: Well...
I did the following:

gluster vol create VMS replica 3 arbiter 1 pve01:/DATA/brick1 pve02:/DATA/brick1.5 pve01:/DATA/arbiter1.5 pve02:/DATA/brick2 pve01:/DATA/brick2.5 pve02:/DATA/arbiter2.5 force

And now I have:

gluster vol info

Volume Name: VMS
Type: Distributed-Replicate
Volume ID: 1bd712f5-ccb9-4322-8275-abe363d1ffdd
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: pve01:/DATA/brick1
Brick2: pve02:/DATA/brick1.5
Brick3: pve01:/DATA/arbiter1.5 (arbiter)
Brick4: pve02:/DATA/brick2
Brick5: pve01:/DATA/brick2.5
Brick6: pve02:/DATA/arbiter2.5 (arbiter)
Options Reconfigured:
cluster.quorum-count: 1
cluster.quorum-reads: false
cluster.self-heal-daemon: enable
cluster.heal-timeout: 10
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

These values I set myself, in order to see if they could improve the time for the volume to become available when pve01 goes down via ifupdown:
cluster.quorum-count: 1
cluster.quorum-reads: false
cluster.self-heal-daemon: enable
cluster.heal-timeout: 10

Nevertheless, it took more than 1 minute for the volume VMS to become available on the other host (pve02). Is there any trick to reduce this time? Thanks --- Gilberto Nunes Ferreira Em qua., 5 de ago. de 2020 às 08:57, Gilberto Nunes < gilberto.nunes32 at gmail.com> escreveu: > hum I see... like this: > [image: image.png] > --- > Gilberto Nunes Ferreira > > (47) 3025-5907 > (47) 99676-7530 - Whatsapp / Telegram > > Skype: gilberto.nunes36 > > > > > > Em qua., 5 de ago. de 2020 às 02:14, Computerisms Corporation < > bob at computerisms.ca> escreveu: > >> check the example of the chained configuration on this page: >> >> >> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/creating_arbitrated_replicated_volumes >> >> and apply it to two servers... >> >> On 2020-08-04 8:25 p.m., Gilberto Nunes wrote: >> > Hi Bob!
>> > >> > Could you, please, send me more detail about this configuration? >> > I will appreciate that! >> > >> > Thank you >> > --- >> > Gilberto Nunes Ferreira >> > >> > (47) 3025-5907 >> > ** >> > (47) 99676-7530 - Whatsapp / Telegram >> > >> > Skype: gilberto.nunes36 >> > >> > >> > >> > >> > >> > Em ter., 4 de ago. de 2020 ?s 23:47, Computerisms Corporation >> > > escreveu: >> > >> > Hi Gilberto, >> > >> > My understanding is there can only be one arbiter per replicated >> > set. I >> > don't have a lot of practice with gluster, so this could be bad >> advice, >> > but the way I dealt with it on my two servers was to use 6 bricks as >> > distributed-replicated (this is also relatively easy to migrate to 3 >> > servers if that happens for you in the future): >> > >> > Server1 Server2 >> > brick1 brick1.5 >> > arbiter1.5 brick2 >> > brick2.5 arbiter2.5 >> > >> > On 2020-08-04 7:00 p.m., Gilberto Nunes wrote: >> > > Hi there. >> > > I have two physical servers deployed as replica 2 and, obviously, >> > I got >> > > a split-brain. >> > > So I am thinking in use two virtual machines,each one in physical >> > > servers.... >> > > Then this two VMS act as a artiber of gluster set.... >> > > >> > > Is this doable? 
>> > > >> > > Thanks >> > > >> > > ________ >> > > >> > > >> > > >> > > Community Meeting Calendar: >> > > >> > > Schedule - >> > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> > > Bridge: https://bluejeans.com/441850968 >> > > >> > > Gluster-users mailing list >> > > Gluster-users at gluster.org >> > > https://lists.gluster.org/mailman/listinfo/gluster-users >> > > >> > ________ >> > >> > >> > >> > Community Meeting Calendar: >> > >> > Schedule - >> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> > Bridge: https://bluejeans.com/441850968 >> > >> > Gluster-users mailing list >> > Gluster-users at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 54749 bytes Desc: not available URL: From bob at computerisms.ca Wed Aug 5 16:44:28 2020 From: bob at computerisms.ca (Computerisms Corporation) Date: Wed, 5 Aug 2020 09:44:28 -0700 Subject: [Gluster-users] performance In-Reply-To: <64ee1b88-42d6-75d2-05ff-4703d168cc25@computerisms.ca> References: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca> <7991483E-5365-4C87-89FA-C871AED18062@yahoo.com> <345b06c4-5996-9aa3-f846-0944c60ee398@computerisms.ca> <2CD68ED2-199F-407D-B0CC-385793BA16FD@yahoo.com> <64ee1b88-42d6-75d2-05ff-4703d168cc25@computerisms.ca> Message-ID: Hi List, > So, we just moved into a quieter time of the day, but maybe I just > stumbled onto something.? I was trying to figure out if/how I could > throw more RAM at the problem.? gluster docs says write behind is not a > cache unless flush-behind is on.? So seems that is a way to throw ram to > it?? I put performance.write-behind-window-size: 512MB and > performance.flush-behind: on and the whole system calmed down pretty > much immediately.? 
could be just timing, though, will have to see > tomorrow during business hours whether the system stays at a reasonable > load. So, reporting back that this seems to have definitely had a significant positive effect. So far today I have not seen the load average climb over 13, with the 15-minute average hovering around 7. CPUs are still spiking from time to time, but they are not staying maxed out all the time, and frequently I am seeing brief periods of up to 80% idle. The glusterfs process is still spiking up to 180% or so, but consistently running around 70%, and the brick processes are still spiking up to 70-80%, but consistently running around 20%. Disk has only been above 50% in atop once so far today, when it spiked up to 92%, and still lots of RAM left over. So far nload even seems to indicate I could get away with a 100Mbit network connection. Websites are snappy relative to what they were, still a bit sluggish on the first page of any given site, but tolerable or close to it. Apache processes are opening and closing right away, instead of stacking up. Overall, the system is performing pretty much like I would expect it to without gluster. I haven't played with any of the other settings yet, just going to leave it like this for a day. I have to admit I am a little bit suspicious. I have been arguing with Gluster for a very long time, and I have never known it to play this nice. Kind of feels like when your girl tells you she is "fine"; conversation has stopped, but you aren't really sure if it's done... > > I will still test the other options you suggested tonight, though, this > is probably too good to be true. > > Can't thank you enough for your input, Strahil, your help is truly > appreciated!
> > > > > >> >>>> >>>> >>>> Best Regards, >>>> Strahil Nikolov >>>> >>> ________ >>> >>> >>> >>> Community Meeting Calendar: >>> >>> Schedule - >>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>> Bridge: https://bluejeans.com/441850968 >>> >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From gilberto.nunes32 at gmail.com Wed Aug 5 20:41:57 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Wed, 5 Aug 2020 17:41:57 -0300 Subject: [Gluster-users] Two VMS as arbiter... In-Reply-To: References: <4610d2cc-eafa-6a5b-d778-797e6ce7e994@computerisms.ca> <6496d212-9ffa-5112-fc14-aee578b25f01@computerisms.ca> Message-ID: I'm in trouble here. When I shut down the pve01 server, the shared folder over glusterfs is EMPTY! There is supposed to be a qcow2 file inside it. The content shows up fine again, just after I power pve01 back on... Some advice? Thanks --- Gilberto Nunes Ferreira (47) 3025-5907 (47) 99676-7530 - Whatsapp / Telegram Skype: gilberto.nunes36 Em qua., 5 de ago. de 2020 às 11:07, Gilberto Nunes < gilberto.nunes32 at gmail.com> escreveu: > Well...
> I do the follow: > > gluster vol create VMS replica 3 arbiter 1 pve01:/DATA/brick1 > pve02:/DATA/brick1.5 pve01:/DATA/arbiter1.5 pve02:/DATA/brick2 pv > e01:/DATA/brick2.5 pve02:/DATA/arbiter2.5 force > > And now I have: > gluster vol info > > Volume Name: VMS > Type: Distributed-Replicate > Volume ID: 1bd712f5-ccb9-4322-8275-abe363d1ffdd > Status: Started > Snapshot Count: 0 > Number of Bricks: 2 x (2 + 1) = 6 > Transport-type: tcp > Bricks: > Brick1: pve01:/DATA/brick1 > Brick2: pve02:/DATA/brick1.5 > Brick3: pve01:/DATA/arbiter1.5 (arbiter) > Brick4: pve02:/DATA/brick2 > Brick5: pve01:/DATA/brick2.5 > Brick6: pve02:/DATA/arbiter2.5 (arbiter) > Options Reconfigured: > cluster.quorum-count: 1 > cluster.quorum-reads: false > cluster.self-heal-daemon: enable > cluster.heal-timeout: 10 > storage.fips-mode-rchecksum: on > transport.address-family: inet > nfs.disable: on > performance.client-io-threads: off > > This values I have put it myself, in order to see if could improve the > time to make the volume available, when pve01 goes down with ifupdown > cluster.quorum-count: 1 > cluster.quorum-reads: false > cluster.self-heal-daemon: enable > cluster.heal-timeout: 10 > > Nevertheless, it took more than 1 minutes to the volume VMS available in > the other host (pve02). > Is there any trick to reduce this time ? > > Thanks > > --- > Gilberto Nunes Ferreira > > > > > > > Em qua., 5 de ago. de 2020 ?s 08:57, Gilberto Nunes < > gilberto.nunes32 at gmail.com> escreveu: > >> hum I see... like this: >> [image: image.png] >> --- >> Gilberto Nunes Ferreira >> >> (47) 3025-5907 >> (47) 99676-7530 - Whatsapp / Telegram >> >> Skype: gilberto.nunes36 >> >> >> >> >> >> Em qua., 5 de ago. 
de 2020 ?s 02:14, Computerisms Corporation < >> bob at computerisms.ca> escreveu: >> >>> check the example of the chained configuration on this page: >>> >>> >>> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/creating_arbitrated_replicated_volumes >>> >>> and apply it to two servers... >>> >>> On 2020-08-04 8:25 p.m., Gilberto Nunes wrote: >>> > Hi Bob! >>> > >>> > Could you, please, send me more detail about this configuration? >>> > I will appreciate that! >>> > >>> > Thank you >>> > --- >>> > Gilberto Nunes Ferreira >>> > >>> > (47) 3025-5907 >>> > ** >>> > (47) 99676-7530 - Whatsapp / Telegram >>> > >>> > Skype: gilberto.nunes36 >>> > >>> > >>> > >>> > >>> > >>> > Em ter., 4 de ago. de 2020 ?s 23:47, Computerisms Corporation >>> > > escreveu: >>> > >>> > Hi Gilberto, >>> > >>> > My understanding is there can only be one arbiter per replicated >>> > set. I >>> > don't have a lot of practice with gluster, so this could be bad >>> advice, >>> > but the way I dealt with it on my two servers was to use 6 bricks >>> as >>> > distributed-replicated (this is also relatively easy to migrate to >>> 3 >>> > servers if that happens for you in the future): >>> > >>> > Server1 Server2 >>> > brick1 brick1.5 >>> > arbiter1.5 brick2 >>> > brick2.5 arbiter2.5 >>> > >>> > On 2020-08-04 7:00 p.m., Gilberto Nunes wrote: >>> > > Hi there. >>> > > I have two physical servers deployed as replica 2 and, >>> obviously, >>> > I got >>> > > a split-brain. >>> > > So I am thinking in use two virtual machines,each one in >>> physical >>> > > servers.... >>> > > Then this two VMS act as a artiber of gluster set.... >>> > > >>> > > Is this doable? 
>>> > > >>> > > Thanks >>> > > >>> > > ________ >>> > > >>> > > >>> > > >>> > > Community Meeting Calendar: >>> > > >>> > > Schedule - >>> > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>> > > Bridge: https://bluejeans.com/441850968 >>> > > >>> > > Gluster-users mailing list >>> > > Gluster-users at gluster.org >>> > > https://lists.gluster.org/mailman/listinfo/gluster-users >>> > > >>> > ________ >>> > >>> > >>> > >>> > Community Meeting Calendar: >>> > >>> > Schedule - >>> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>> > Bridge: https://bluejeans.com/441850968 >>> > >>> > Gluster-users mailing list >>> > Gluster-users at gluster.org >>> > https://lists.gluster.org/mailman/listinfo/gluster-users >>> > >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 54749 bytes Desc: not available URL: From hunter86_bg at yahoo.com Wed Aug 5 20:54:37 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Wed, 05 Aug 2020 23:54:37 +0300 Subject: [Gluster-users] Two VMS as arbiter... In-Reply-To: References: <4610d2cc-eafa-6a5b-d778-797e6ce7e994@computerisms.ca> <6496d212-9ffa-5112-fc14-aee578b25f01@computerisms.ca> Message-ID: <1EF050E2-FDB5-42DE-BF6B-4AA08997CB4B@yahoo.com> If I understood you correctly, you are looking for this: Option: network.ping-timeout Default Value: 42 Description: Time duration for which the client waits to check if the server is responsive. Best Regards, Strahil Nikolov On August 5, 2020 at 17:07:10 GMT+03:00, Gilberto Nunes wrote: >Well...
>I do the follow: > >gluster vol create VMS replica 3 arbiter 1 pve01:/DATA/brick1 >pve02:/DATA/brick1.5 pve01:/DATA/arbiter1.5 pve02:/DATA/brick2 pv >e01:/DATA/brick2.5 pve02:/DATA/arbiter2.5 force > >And now I have: >gluster vol info > >Volume Name: VMS >Type: Distributed-Replicate >Volume ID: 1bd712f5-ccb9-4322-8275-abe363d1ffdd >Status: Started >Snapshot Count: 0 >Number of Bricks: 2 x (2 + 1) = 6 >Transport-type: tcp >Bricks: >Brick1: pve01:/DATA/brick1 >Brick2: pve02:/DATA/brick1.5 >Brick3: pve01:/DATA/arbiter1.5 (arbiter) >Brick4: pve02:/DATA/brick2 >Brick5: pve01:/DATA/brick2.5 >Brick6: pve02:/DATA/arbiter2.5 (arbiter) >Options Reconfigured: >cluster.quorum-count: 1 >cluster.quorum-reads: false >cluster.self-heal-daemon: enable >cluster.heal-timeout: 10 >storage.fips-mode-rchecksum: on >transport.address-family: inet >nfs.disable: on >performance.client-io-threads: off > >This values I have put it myself, in order to see if could improve the >time >to make the volume available, when pve01 goes down with ifupdown >cluster.quorum-count: 1 >cluster.quorum-reads: false >cluster.self-heal-daemon: enable >cluster.heal-timeout: 10 > >Nevertheless, it took more than 1 minutes to the volume VMS available >in >the other host (pve02). >Is there any trick to reduce this time ? > >Thanks > >--- >Gilberto Nunes Ferreira > > > > > > >Em qua., 5 de ago. de 2020 ?s 08:57, Gilberto Nunes < >gilberto.nunes32 at gmail.com> escreveu: > >> hum I see... like this: >> [image: image.png] >> --- >> Gilberto Nunes Ferreira >> >> (47) 3025-5907 >> (47) 99676-7530 - Whatsapp / Telegram >> >> Skype: gilberto.nunes36 >> >> >> >> >> >> Em qua., 5 de ago. de 2020 ?s 02:14, Computerisms Corporation < >> bob at computerisms.ca> escreveu: >> >>> check the example of the chained configuration on this page: >>> >>> >>> >https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/creating_arbitrated_replicated_volumes >>> >>> and apply it to two servers... 
>>>
>>> On 2020-08-04 8:25 p.m., Gilberto Nunes wrote:
>>> > Hi Bob!
>>> >
>>> > Could you, please, send me more details about this configuration?
>>> > I would appreciate that!
>>> >
>>> > Thank you
>>> > ---
>>> > Gilberto Nunes Ferreira
>>> >
>>> > (47) 3025-5907
>>> > (47) 99676-7530 - Whatsapp / Telegram
>>> >
>>> > Skype: gilberto.nunes36
>>> >
>>> > On Tue, Aug 4, 2020 at 23:47, Computerisms Corporation
>>> > wrote:
>>> >
>>> >     Hi Gilberto,
>>> >
>>> >     My understanding is there can only be one arbiter per replicated
>>> >     set. I don't have a lot of practice with gluster, so this could
>>> >     be bad advice, but the way I dealt with it on my two servers was
>>> >     to use 6 bricks as distributed-replicated (this is also
>>> >     relatively easy to migrate to 3 servers if that happens for you
>>> >     in the future):
>>> >
>>> >     Server1       Server2
>>> >     brick1        brick1.5
>>> >     arbiter1.5    brick2
>>> >     brick2.5      arbiter2.5
>>> >
>>> >     On 2020-08-04 7:00 p.m., Gilberto Nunes wrote:
>>> >      > Hi there.
>>> >      > I have two physical servers deployed as replica 2 and,
>>> >      > obviously, I got a split-brain.
>>> >      > So I am thinking of using two virtual machines, one on each
>>> >      > physical server....
>>> >      > Then these two VMs would act as an arbiter for the gluster
>>> >      > set....
>>> >      >
>>> >      > Is this doable?
>>> > >
>>> > > Thanks
>>> > >
>>> > > ________
>>> > >
>>> > > Community Meeting Calendar:
>>> > >
>>> > > Schedule -
>>> > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>> > > Bridge: https://bluejeans.com/441850968
>>> > >
>>> > > Gluster-users mailing list
>>> > > Gluster-users at gluster.org
>>> > > https://lists.gluster.org/mailman/listinfo/gluster-users
>>

From hunter86_bg at yahoo.com  Wed Aug 5 21:07:53 2020
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Thu, 06 Aug 2020 00:07:53 +0300
Subject: [Gluster-users] performance
In-Reply-To: <64ee1b88-42d6-75d2-05ff-4703d168cc25@computerisms.ca>
References: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca>
 <7991483E-5365-4C87-89FA-C871AED18062@yahoo.com>
 <345b06c4-5996-9aa3-f846-0944c60ee398@computerisms.ca>
 <2CD68ED2-199F-407D-B0CC-385793BA16FD@yahoo.com>
 <64ee1b88-42d6-75d2-05ff-4703d168cc25@computerisms.ca>
Message-ID: <68274322-B514-4555-A236-D159B16D42FC@yahoo.com>

On 5 August 2020 at 4:53:34 GMT+03:00, Computerisms Corporation wrote:
>Hi Strahil,
>
>thanks again for sticking with me on this.
>> Hm... OK. I guess you can try 7.7 whenever it's possible.
>
>Acknowledged.
>
>>> Perhaps I am not understanding it correctly. I tried these
>>> suggestions before and it got worse, not better. So I have been
>>> operating under the assumption that maybe these guidelines are not
>>> appropriate for newer versions.
>>
>> Actually, the settings are not changed much, so they should work for you.
>
>Okay, then maybe I am doing something incorrectly, or not understanding
>some fundamental piece of things that I should be.
To be honest, the documentation seems pretty useless to me.

>>>>> Interestingly, mostly because it is not something I have ever
>>>>> experienced before, software interrupts sit between 1 and 5 on each
>>>>> core, but the last core is usually sitting around 20. Have never
>>>>> encountered a high load average where the si number was ever
>>>>> significant. I have googled the crap out of that (as well as
>>>>> gluster performance in general), there are nearly limitless posts
>>>>> about what it is, but have yet to see one thing to explain what to
>>>>> do about it.
>>
>> Is this happening on all nodes?
>> I got a similar situation caused by a bad NIC (si in top was way
>> high), but the chance of a bad NIC on all servers is very low.
>> You can still patch OS + firmware on your next maintenance.
>
>Yes, but it's not to the same extreme. The other node is currently not
>actually serving anything to the internet, so right now its only
>function is replicated gluster and databases. On the 2nd node there is
>also one core, the first one in this case as opposed to the last one on
>the main node, but it sits between 10 and 15 instead of 20 and 25, and
>the remaining cores will be between 0 and 2 instead of 1 and 5.
>I have no evidence of any bad hardware, and these servers were both
>commissioned only within the last couple of months. But will still poke
>around on this path.

It could be bad firmware also. If you get the opportunity, flash the
firmware and bump the OS to the max.

>>> "... more number of CPU cycles than needed, increasing the event
>>> thread count would enhance the performance of the Red Hat Storage
>>> Server." which is why I had it at 8.
>>
>> Yeah, but you have only 6 cores and they are not dedicated to gluster
>> only. I think that you need to test with lower values.
>
>Okay, I will change these values a few times over the next couple of
>hours and see what happens.
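Testing lower event-thread values, as suggested above, is a per-volume change. A sketch (the volume name MYVOL is a placeholder; client- and server-side thread counts are separate options and can be set independently):

```shell
# Try values at or below the core count; on a 6-core box
# too many epoll threads mostly adds contention
gluster volume set MYVOL client.event-threads 4
gluster volume set MYVOL server.event-threads 4

# Verify what is currently in effect
gluster volume get MYVOL client.event-threads
gluster volume get MYVOL server.event-threads
```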
>
>>> right now the only suggested parameter I haven't played with is the
>>> performance.io-thread-count, which I currently have at 64.
>>
>> I think that as you have SSDs only, you might see some results by
>> changing this one.
>
>Okay, will also modify this incrementally. Do you think it can go
>higher? I think I got this number from a thread on this list, but I am
>not really sure what would be a reasonable value for my system.

I guess you can try to increase it a little bit and check how it goes.

>>> For what it's worth, I am running ext4 as my underlying fs and I have
>>> read a few times that XFS might have been a better choice. But that
>>> is not a trivial experiment to make at this time with the system in
>>> production. It's one thing (and still a bad thing to be sure) to
>>> semi-bork the system for an hour or two while I play with
>>> configurations, but it would take a day or so offline to reformat and
>>> restore the data.
>>
>> XFS should bring better performance, but if the issue is not in the FS
>> -> it won't make a change...
>> What I/O scheduler are you using for the SSDs (you can check via 'cat
>> /sys/block/sdX/queue/scheduler')?
>
># cat /sys/block/vda/queue/scheduler
>[mq-deadline] none

Deadline prioritizes reads in a 2:1 ratio (default tunings). You can
consider testing 'none' if your SSDs are good.

I see vda, so please share details on the infra, as this is very
important. Virtual disks have their limitations, and if you are on a VM
there might be a chance to increase the CPU count.

If you are on a VM, I would recommend using more (in number) and smaller
disks in stripe sets (either raid0 via mdadm, or a pure striped LV).
Also, if you are on a VM there is no reason to reorder your I/O requests
in the VM, just to do it again on the hypervisor. In such a case 'none'
can bring better performance, but this varies with the workload.

>>> in the past I have tried 2, 4, 8, 16, and 32.
>>> Playing with just those I never noticed that any of them made any
>>> difference. Though I might have some different options now than I did
>>> then, so might try these again throughout the day...
>>
>> Are you talking about server or client event threads (or both)?
>
>It never occurred to me to set them to different values. So far when I
>set one I set the other to the same value.

Yeah, this makes sense.

>>> Thanks again for your time Strahil, if you have any more thoughts
>>> would love to hear them.
>>
>> Can you check if you use 'noatime' for the bricks? It won't bring any
>> effect on the CPU side, but it might help with the I/O.
>
>I checked into this, and I have nodiratime set, but not noatime. From
>what I can gather, it should provide nearly the same benefit
>performance-wise while leaving the atime attribute on the files. Never
>know, I may decide I want those at some point in the future.

All necessary data is in the file attributes on the brick. I doubt you
will need to have access times on the brick itself. Another possibility
is to use 'relatime'.

>> I see that your indicator for high load is loadavg, but have you
>> actually checked how many processes are in 'R' or 'D' state?
>> Some monitoring checks can raise loadavg artificially.
>
>Occasionally a batch of processes will be in R state, and I see the D
>state show up from time to time, but mostly everything is S.
>
>> Also, are you using software mirroring (either mdadm or
>> striped/mirrored LVs)?
>
>No, single disk. And I opted to not put the gluster on a thin LVM, as I
>don't see myself using the LVM snapshots in this scenario.
>
>So, we just moved into a quieter time of the day, but maybe I just
>stumbled onto something. I was trying to figure out if/how I could
>throw more RAM at the problem. The gluster docs say write-behind is not
>a cache unless flush-behind is on. So it seems that is a way to throw
>RAM at it?
>I put performance.write-behind-window-size: 512MB and
>performance.flush-behind: on, and the whole system calmed down pretty
>much immediately. Could be just timing, though; will have to see
>tomorrow during business hours whether the system stays at a reasonable
>load.
>
>I will still test the other options you suggested tonight, though this
>is probably too good to be true.
>
>Can't thank you enough for your input, Strahil, your help is truly
>appreciated!
>
>>>> Best Regards,
>>>> Strahil Nikolov

From hunter86_bg at yahoo.com  Wed Aug 5 21:15:23 2020
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Thu, 06 Aug 2020 00:15:23 +0300
Subject: [Gluster-users] Two VMS as arbiter...
In-Reply-To: 
References: <4610d2cc-eafa-6a5b-d778-797e6ce7e994@computerisms.ca>
 <6496d212-9ffa-5112-fc14-aee578b25f01@computerisms.ca>
Message-ID: <77292D07-DE1E-4797-A54C-13086317763C@yahoo.com>

This could happen if you have pending heals. Did you reboot that node
recently?
Did you set automatic unsplit-brain?

Check for pending heals and files in split-brain.

If not, you can check
https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
(look at point 5).

Best Regards,
Strahil Nikolov

On 5 August 2020 at 23:41:57 GMT+03:00, Gilberto Nunes wrote:
>I'm in trouble here.
>When I shut down the pve01 server, the shared folder over glusterfs is
>EMPTY!
>It's supposed to be a qcow2 file inside it.
>The content shows up correctly again, just after I power pve01 back
>on...
>
>Some advice?
>
>Thanks
>
>---
>Gilberto Nunes Ferreira
>
>(47) 3025-5907
>(47) 99676-7530 - Whatsapp / Telegram
>
>Skype: gilberto.nunes36

From gilberto.nunes32 at gmail.com  Wed Aug 5 22:56:58 2020
From: gilberto.nunes32 at gmail.com (Gilberto Nunes)
Date: Wed, 5 Aug 2020 19:56:58 -0300
Subject: [Gluster-users] Two VMS as arbiter...
In-Reply-To: <77292D07-DE1E-4797-A54C-13086317763C@yahoo.com>
References: <4610d2cc-eafa-6a5b-d778-797e6ce7e994@computerisms.ca>
 <6496d212-9ffa-5112-fc14-aee578b25f01@computerisms.ca>
 <77292D07-DE1E-4797-A54C-13086317763C@yahoo.com>
Message-ID: 

Ok... Thanks a lot Strahil!

This "gluster volume set VMS cluster.favorite-child-policy size" did
the trick for me here!

Cheers
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36
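The fix Gilberto lands on here, cluster.favorite-child-policy, tells the self-heal logic which replica wins when copies disagree, instead of leaving the file in split-brain. A sketch of setting it and checking for remaining split-brain entries, using the VMS volume from this thread:

```shell
# Auto-resolve split-brains in favour of the biggest copy
gluster volume set VMS cluster.favorite-child-policy size

# List any files still reported in split-brain
gluster volume heal VMS info split-brain
```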
URL: From archon810 at gmail.com Thu Aug 6 00:28:44 2020 From: archon810 at gmail.com (Artem Russakovskii) Date: Wed, 5 Aug 2020 17:28:44 -0700 Subject: [Gluster-users] performance In-Reply-To: References: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca> <7991483E-5365-4C87-89FA-C871AED18062@yahoo.com> <345b06c4-5996-9aa3-f846-0944c60ee398@computerisms.ca> <2CD68ED2-199F-407D-B0CC-385793BA16FD@yahoo.com> <64ee1b88-42d6-75d2-05ff-4703d168cc25@computerisms.ca> Message-ID: I'm very curious whether these improvements hold up over the next few days. Please report back. Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | @ArtemR On Wed, Aug 5, 2020 at 9:44 AM Computerisms Corporation wrote: > Hi List, > > > So, we just moved into a quieter time of the day, but maybe I just > > stumbled onto something. I was trying to figure out if/how I could > > throw more RAM at the problem. gluster docs says write behind is not a > > cache unless flush-behind is on. So seems that is a way to throw ram to > > it? I put performance.write-behind-window-size: 512MB and > > performance.flush-behind: on and the whole system calmed down pretty > > much immediately. could be just timing, though, will have to see > > tomorrow during business hours whether the system stays at a reasonable > > load. > > so reporting back that this seems to have definitely had a significant > positive effect. > > So far today I have not seen the load average climb over 13 with the > 15minute average hovering around 7. cpus are still spiking from time to > time, but they are not staying maxed out all the time, and frequently I > am seeing brief periods of up to 80% idle. glusterfs process still > spiking up to 180% or so, but consistently running around 70%, and the > brick processes still spiking up to 70-80%, but consistently running > around 20%. Disk has only been above 50% in atop once so far today when > it spiked up to 92%, and still lots of RAM left over. 
> So far nload even seems to indicate I could get away with a 100Mbit
> network connection. Websites are snappy relative to what they were,
> still a bit sluggish on the first page of any given site, but
> tolerable or close to it. Apache processes are opening and closing
> right away, instead of stacking up.
>
> Overall, the system is performing pretty much like I would expect it
> to without gluster. I haven't played with any of the other settings
> yet, just going to leave it like this for a day.
>
> I have to admit I am a little bit suspicious. I have been arguing with
> Gluster for a very long time, and I have never known it to play this
> nice. Kind of feels like when your girl tells you she is "fine";
> conversation has stopped, but you aren't really sure if it's done...
>
> > I will still test the other options you suggested tonight, though,
> > this is probably too good to be true.
> >
> > Can't thank you enough for your input, Strahil, your help is truly
> > appreciated!
> >
> >>>> Best Regards,
> >>>> Strahil Nikolov
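The two options Bob credits for the turnaround can be applied together on any volume; per the gluster docs he cites, flush-behind is what lets the write-behind window behave like a cache. A sketch (the volume name MYVOL is a placeholder, and a 512MB window per client is large, so size it against available RAM):

```shell
gluster volume set MYVOL performance.flush-behind on
gluster volume set MYVOL performance.write-behind-window-size 512MB

# Check the resulting settings
gluster volume get MYVOL performance.flush-behind
gluster volume get MYVOL performance.write-behind-window-size
```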
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hunter86_bg at yahoo.com  Thu Aug 6 04:08:51 2020
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Thu, 06 Aug 2020 07:08:51 +0300
Subject: [Gluster-users] Two VMS as arbiter...
In-Reply-To: 
References: <4610d2cc-eafa-6a5b-d778-797e6ce7e994@computerisms.ca>
 <6496d212-9ffa-5112-fc14-aee578b25f01@computerisms.ca>
 <77292D07-DE1E-4797-A54C-13086317763C@yahoo.com>
Message-ID: 

As you mentioned qcow2 files, check the virt group
(/var/lib/glusterd/groups or something like that). It has optimal
settings for VMs and is used by oVirt.

WARNING: If you decide to enable the group, which will also enable
sharding, NEVER EVER DISABLE SHARDING -> ONCE ENABLED, IT STAYS
ENABLED!!! Sharding helps reduce locking during replica heals.

WARNING2: As the virt group uses sharding (which splits files into
fixed-size shards), you should consider cluster.favorite-child-policy
with the value ctime/mtime.

Best Regards,
Strahil Nikolov

On 6 August 2020 at 1:56:58 GMT+03:00, Gilberto Nunes wrote:
>Ok... Thanks a lot Strahil!
>
>This "gluster volume set VMS cluster.favorite-child-policy size" did
>the trick for me here!
>
>Cheers
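Strahil's virt group suggestion is a one-step change; the group file ships with the glusterfs packages (typically under /var/lib/glusterd/groups/virt) and sets shard, cache, and network options tuned for VM images. A sketch for the VMS volume, including the split-brain policy he recommends for sharded volumes; remember his warning that sharding must never be disabled once enabled:

```shell
# Apply the packaged set of VM-friendly options (this enables sharding)
gluster volume set VMS group virt

# With fixed-size shards, prefer the most recently modified copy
gluster volume set VMS cluster.favorite-child-policy mtime
```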
In-Reply-To: 
References: <4610d2cc-eafa-6a5b-d778-797e6ce7e994@computerisms.ca>
 <6496d212-9ffa-5112-fc14-aee578b25f01@computerisms.ca>
 <77292D07-DE1E-4797-A54C-13086317763C@yahoo.com>
Message-ID: 

What do you mean "sharding"? Do you mean sharing folders between two
servers to host qcow2 or raw vm images?
Here I am using Proxmox, which uses qemu but not virsh.

Thanks
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36

Em qui., 6 de ago. de 2020 às 01:09, Strahil Nikolov escreveu:

> As you mentioned qcow2 files, check the virt group
> (/var/lib/glusterfs/group or something like that). It has optimal settings
> for VMs and is used by oVirt.
>
> WARNING: If you decide to enable the group, which will also enable
> sharding, NEVER EVER DISABLE SHARDING -> ONCE ENABLED STAYS ENABLED !!!
> Sharding helps reduce locking during replica heals.
>
> WARNING2: As the virt group uses sharding (fixes the size of file into
> shard size), you should consider cluster.favorite-child-policy with value
> ctime/mtime.
>
> Best Regards,
> Strahil Nikolov
>
> On 6 August 2020 at 1:56:58 GMT+03:00, Gilberto Nunes <
> gilberto.nunes32 at gmail.com> wrote:
> >Ok... Thanks a lot Strahil
> >
> >This gluster volume set VMS cluster.favorite-child-policy size did the
> >trick for me here!
> >
> >Cheers
> >---
> >Gilberto Nunes Ferreira
> >
> >(47) 3025-5907
> >(47) 99676-7530 - Whatsapp / Telegram
> >
> >Skype: gilberto.nunes36
> >
> >Em qua., 5 de ago. de 2020 às 18:15, Strahil Nikolov
> >escreveu:
> >
> >> This could happen if you have pending heals. Did you reboot that node
> >> recently?
> >> Did you set automatic unsplit-brain?
> >>
> >> Check for pending heals and files in splitbrain.
> >>
> >> If not, you can check
> >> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
> >> (look at point 5).
> >>
> >> Best Regards,
> >> Strahil Nikolov
> >>
> >> On 5 August 2020 at 23:41:57 GMT+03:00, Gilberto Nunes <
> >> gilberto.nunes32 at gmail.com> wrote:
> >> >I'm in trouble here.
> >> >When I shutdown the pve01 server, the shared folder over glusterfs is
> >> >EMPTY!
> >> >It's supposed to be a qcow2 file inside it.
> >> >The content is shown right, just after I power on pve01 backup...
> >> >
> >> >Some advice?
> >> >
> >> >Thanks
> >> >
> >> >---
> >> >Gilberto Nunes Ferreira
> >> >
> >> >(47) 3025-5907
> >> >(47) 99676-7530 - Whatsapp / Telegram
> >> >
> >> >Skype: gilberto.nunes36
> >> >
> >> >Em qua., 5 de ago. de 2020 às 11:07, Gilberto Nunes <
> >> >gilberto.nunes32 at gmail.com> escreveu:
> >> >
> >> >> Well...
> >> >> I did the following:
> >> >>
> >> >> gluster vol create VMS replica 3 arbiter 1 pve01:/DATA/brick1
> >> >> pve02:/DATA/brick1.5 pve01:/DATA/arbiter1.5 pve02:/DATA/brick2
> >> >> pve01:/DATA/brick2.5 pve02:/DATA/arbiter2.5 force
> >> >>
> >> >> And now I have:
> >> >> gluster vol info
> >> >>
> >> >> Volume Name: VMS
> >> >> Type: Distributed-Replicate
> >> >> Volume ID: 1bd712f5-ccb9-4322-8275-abe363d1ffdd
> >> >> Status: Started
> >> >> Snapshot Count: 0
> >> >> Number of Bricks: 2 x (2 + 1) = 6
> >> >> Transport-type: tcp
> >> >> Bricks:
> >> >> Brick1: pve01:/DATA/brick1
> >> >> Brick2: pve02:/DATA/brick1.5
> >> >> Brick3: pve01:/DATA/arbiter1.5 (arbiter)
> >> >> Brick4: pve02:/DATA/brick2
> >> >> Brick5: pve01:/DATA/brick2.5
> >> >> Brick6: pve02:/DATA/arbiter2.5 (arbiter)
> >> >> Options Reconfigured:
> >> >> cluster.quorum-count: 1
> >> >> cluster.quorum-reads: false
> >> >> cluster.self-heal-daemon: enable
> >> >> cluster.heal-timeout: 10
> >> >> storage.fips-mode-rchecksum: on
> >> >> transport.address-family: inet
> >> >> nfs.disable: on
> >> >> performance.client-io-threads: off
> >> >>
> >> >> These values I put in myself, in order to see if I could improve the
> >> >> time it takes to make the volume available when pve01 goes down with
> >> >> ifupdown
cluster.quorum-count: 1
> >> >> cluster.quorum-reads: false
> >> >> cluster.self-heal-daemon: enable
> >> >> cluster.heal-timeout: 10
> >> >>
> >> >> Nevertheless, it took more than 1 minute for the volume VMS to become
> >> >> available on the other host (pve02).
> >> >> Is there any trick to reduce this time?
> >> >>
> >> >> Thanks
> >> >>
> >> >> ---
> >> >> Gilberto Nunes Ferreira
[remainder of the quoted thread trimmed; it repeats messages quoted above
verbatim]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From gilberto.nunes32 at gmail.com Thu Aug 6 13:37:07 2020
From: gilberto.nunes32 at gmail.com (Gilberto Nunes)
Date: Thu, 6 Aug 2020 10:37:07 -0300
Subject: [Gluster-users] Two VMS as arbiter...
In-Reply-To: 
References: <4610d2cc-eafa-6a5b-d778-797e6ce7e994@computerisms.ca>
 <6496d212-9ffa-5112-fc14-aee578b25f01@computerisms.ca>
 <77292D07-DE1E-4797-A54C-13086317763C@yahoo.com>
Message-ID: 

Oh I see... I was confused by the terms... Now I read this and
everything becomes clear...

https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/shard/
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/configuring_red_hat_virtualization_with_red_hat_gluster_storage/chap-hosting_virtual_machine_images_on_red_hat_storage_volumes

Should I use cluster.granular-entry-heal too, since I am working with
big files?
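For reference, the option asked about above is spelled cluster.granular-entry-heal, and on current Gluster releases it is toggled through the volume heal command. A minimal sketch, assuming the volume name VMS used elsewhere in this thread:

```shell
# Enable granular entry self-heal for volume VMS; in granular mode only
# the changed entries of a directory are recorded and healed, rather than
# re-scanning the whole directory.
gluster volume heal VMS granular-entry-heal enable

# Confirm the resulting option value:
gluster volume get VMS cluster.granular-entry-heal
```

These commands need a live Gluster cluster, so they are shown here only as a sketch.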
Thanks
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36

Em qui., 6 de ago. de 2020 às 09:32, Gilberto Nunes <
gilberto.nunes32 at gmail.com> escreveu:
[quoted copy of the 09:32 message and the earlier thread trimmed; it
repeats the messages above verbatim]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hunter86_bg at yahoo.com Thu Aug 6 17:14:42 2020
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Thu, 06 Aug 2020 20:14:42 +0300
Subject: [Gluster-users] Two VMS as arbiter...
In-Reply-To: 
References: <4610d2cc-eafa-6a5b-d778-797e6ce7e994@computerisms.ca>
 <6496d212-9ffa-5112-fc14-aee578b25f01@computerisms.ca>
 <77292D07-DE1E-4797-A54C-13086317763C@yahoo.com>
Message-ID: <7A2FF09A-AE59-4C89-B596-968D55BD5521@yahoo.com>

The settings I have in my group are:

[root at ovirt1 ~]# cat /var/lib/glusterd/groups/virt
performance.quick-read=off
performance.read-ahead=off
performance.io-cache=off
performance.low-prio-threads=32
network.remote-dio=enable
cluster.eager-lock=enable
cluster.quorum-type=auto
cluster.server-quorum-type=server
cluster.data-self-heal-algorithm=full
cluster.locking-scheme=granular
cluster.shd-max-threads=8
cluster.shd-wait-qlength=10000
features.shard=on
user.cifs=off
cluster.choose-local=off
client.event-threads=4
server.event-threads=4
performance.client-io-threads=on

I'm not sure whether sharded files are treated as big or not. If your
brick disks are faster than your network bandwidth, you can enable
'cluster.choose-local'.

Keep in mind that some users report issues with sparse qcow2 images
during intensive writes (the suspicion is that the shard xlator cannot
create the shards fast enough -> the default shard size (64MB) is way
smaller than Red Hat's supported size, which is 512MB), so I would
recommend you use preallocated qcow2 disks as much as possible, or bump
the shard size.

Sharding was developed especially for virt usage.

Consider using another cluster.favorite-child-policy, as all shards
have the same size.

Best Regards,
Strahil Nikolov

On 6 August 2020 at 16:37:07 GMT+03:00, Gilberto Nunes <
gilberto.nunes32 at gmail.com> wrote:
>Oh I see... I was confused by the terms... Now I read this and
>everything becomes clear...
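Back-of-the-envelope arithmetic for the shard-size point above — the 100 GiB image size below is a made-up example, not a figure from this thread:

```shell
# Shards per image = image size / features.shard-block-size.
# Sizes expressed in MB; both divisions are exact here.
image_mb=$((100 * 1024))        # hypothetical 100 GiB disk image
echo $((image_mb / 64))         # 64MB default shard size  -> 1600 shards
echo $((image_mb / 512))        # 512MB shard size         -> 200 shards
```

Fewer shard files means fewer file creations the shard xlator has to keep up with during a burst of sparse writes. The size is controlled by the features.shard-block-size volume option; per the documentation, a changed value only applies to files created after the change.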
[quoted copy of the earlier thread trimmed; it repeats the messages above
verbatim]

From gilberto.nunes32 at gmail.com Thu Aug 6 18:15:33 2020
From: gilberto.nunes32 at gmail.com (Gilberto Nunes)
Date: Thu, 6 Aug 2020 15:15:33 -0300
Subject: [Gluster-users] Two VMS as arbiter...
In-Reply-To: <7A2FF09A-AE59-4C89-B596-968D55BD5521@yahoo.com>
References: <4610d2cc-eafa-6a5b-d778-797e6ce7e994@computerisms.ca>
 <6496d212-9ffa-5112-fc14-aee578b25f01@computerisms.ca>
 <77292D07-DE1E-4797-A54C-13086317763C@yahoo.com>
 <7A2FF09A-AE59-4C89-B596-968D55BD5521@yahoo.com>
Message-ID: 

The options that worked best in my tests, to avoid split-brain, were
the following:

gluster vol set VMS cluster.heal-timeout 20
gluster volume heal VMS enable
gluster vol set VMS cluster.quorum-reads false
gluster vol set VMS cluster.quorum-count 1
gluster vol set VMS network.ping-timeout 2
gluster volume set VMS cluster.favorite-child-policy mtime
gluster volume heal VMS granular-entry-heal enable
gluster volume set VMS cluster.data-self-heal-algorithm full

For cluster.favorite-child-policy I had used "size", but I read in
several places that mtime is better...

I ran several exhaustive tests... powering off hosts, migrating VMs,
creating folders and files inside the VMs... activating HA, etc...
After the "crash", i.e. after the host that was restarted/shut down
comes back, the volume looks like this:

Brick pve02:/DATA/brick
/images/100/vm-100-disk-0.qcow2 - Possibly undergoing heal
Status: Connected
Number of entries: 1

indicating that healing is taking place... After a few minutes/hours,
depending on the hardware speed, "Possibly undergoing" disappears...
But at no time was there data loss... While the heal was still shown as
possibly underway I migrated the VM from one side to the other, also
without problems...

In the tests I performed here, healing a 10G VM HD with 4G in use took
30 minutes... Remember that I'm using VirtualBox with 2 VMs in it, 2G
of RAM each, each VM being a Proxmox host. In a real environment this
time is much shorter, and it also depends on the size of the VM's HD!

Cheers
---
Gilberto Nunes Ferreira

Em qui., 6 de ago.
de 2020 ?s 14:14, Strahil Nikolov escreveu: > The settings I got in my group is: > [root at ovirt1 ~]# cat /var/lib/glusterd/groups/virt > performance.quick-read=off > performance.read-ahead=off > performance.io-cache=off > performance.low-prio-threads=32 > network.remote-dio=enable > cluster.eager-lock=enable > cluster.quorum-type=auto > cluster.server-quorum-type=server > cluster.data-self-heal-algorithm=full > cluster.locking-scheme=granular > cluster.shd-max-threads=8 > cluster.shd-wait-qlength=10000 > features.shard=on > user.cifs=off > cluster.choose-local=off > client.event-threads=4 > server.event-threads=4 > performance.client-io-threads=on > > I'm not sure that sharded files are treated as big or not.If your > brick disks are faster than your network bandwidth, you can enable > 'cluster.choose-local' . > > Keep in mind that some users report issues with sparse qcow2 images > during intensive writes (suspected shard xlator cannot create fast enough > the shards -> default shard size (64MB) is way smaller than the RedHat's > supported size which is 512MB) and I would recommend you to use > preallocated qcow2 disks as much as possible or to bump the shard size. > > Sharding was developed especially for Virt usage. > > Consider using another cluster.favorite-child-policy , as all shards > have the same size. > > Best Regards, > Strahil Nikolov > > > > ?? 6 ?????? 2020 ?. 16:37:07 GMT+03:00, Gilberto Nunes < > gilberto.nunes32 at gmail.com> ??????: > >Oh I see... I was confused because the terms... Now I read this and > >everything becomes clear... > > > > > https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/shard/ > > > > > https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/configuring_red_hat_virtualization_with_red_hat_gluster_storage/chap-hosting_virtual_machine_images_on_red_hat_storage_volumes > > > > > >Should I use cluster.granular-entrey-heal-enable too, since I am > >working > >with big files? 
> >
> >Thanks
> >
> >---
> >Gilberto Nunes Ferreira
> >
> >(47) 3025-5907
> >(47) 99676-7530 - Whatsapp / Telegram
> >
> >Skype: gilberto.nunes36
> >
> >Em qui., 6 de ago. de 2020 às 09:32, Gilberto Nunes <
> >gilberto.nunes32 at gmail.com> escreveu:
> >
> >> What do you mean by "sharding"? Do you mean sharing folders between two
> >> servers to host qcow2 or raw VM images?
> >> Here I am using Proxmox, which uses qemu but not virsh.
> >>
> >> Thanks
> >> ---
> >> Gilberto Nunes Ferreira
> >>
> >> (47) 3025-5907
> >> (47) 99676-7530 - Whatsapp / Telegram
> >>
> >> Skype: gilberto.nunes36
> >>
> >> Em qui., 6 de ago. de 2020 às 01:09, Strahil Nikolov <
> >> hunter86_bg at yahoo.com> escreveu:
> >>
> >>> As you mentioned qcow2 files, check the virt group
> >>> (/var/lib/glusterfs/group or something like that). It has optimal
> >>> settings for VMs and is used by oVirt.
> >>>
> >>> WARNING: If you decide to enable the group, which will also enable
> >>> sharding, NEVER EVER DISABLE SHARDING -> ONCE ENABLED IT STAYS
> >>> ENABLED!!! Sharding helps reduce locking during replica heals.
> >>>
> >>> WARNING2: As the virt group uses sharding (which fixes the file size
> >>> at the shard size), you should consider cluster.favorite-child-policy
> >>> with value ctime/mtime.
> >>>
> >>> Best Regards,
> >>> Strahil Nikolov
> >>>
> >>> On 6 August 2020 1:56:58 GMT+03:00, Gilberto Nunes <
> >>> gilberto.nunes32 at gmail.com> wrote:
> >>> >Ok... Thanks a lot Strahil
> >>> >
> >>> >This "gluster volume set VMS cluster.favorite-child-policy size" did
> >>> >the trick for me here!
> >>> >
> >>> >Cheers
> >>> >---
> >>> >Gilberto Nunes Ferreira
> >>> >
> >>> >(47) 3025-5907
> >>> >(47) 99676-7530 - Whatsapp / Telegram
> >>> >
> >>> >Skype: gilberto.nunes36
> >>> >
> >>> >Em qua., 5 de ago. de 2020 às 18:15, Strahil Nikolov
> >>> >
> >>> >escreveu:
> >>> >
> >>> >> This could happen if you have pending heals.
Did you reboot that > >node > >>> >> recently ? > >>> >> Did you set automatic unsplit-brain ? > >>> >> > >>> >> Check for pending heals and files in splitbrain. > >>> >> > >>> >> If not, you can check > >>> >> > >>> > >>https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ > >>> >> (look at point 5). > >>> >> > >>> >> Best Regards, > >>> >> Strahil Nikolov > >>> >> > >>> >> ?? 5 ?????? 2020 ?. 23:41:57 GMT+03:00, Gilberto Nunes < > >>> >> gilberto.nunes32 at gmail.com> ??????: > >>> >> >I'm in trouble here. > >>> >> >When I shutdown the pve01 server, the shared folder over > >glusterfs > >>> >is > >>> >> >EMPTY! > >>> >> >It's supposed to be a qcow2 file inside it. > >>> >> >The content is show right, just after I power on pve01 backup... > >>> >> > > >>> >> >Some advice? > >>> >> > > >>> >> > > >>> >> >Thanks > >>> >> > > >>> >> >--- > >>> >> >Gilberto Nunes Ferreira > >>> >> > > >>> >> >(47) 3025-5907 > >>> >> >(47) 99676-7530 - Whatsapp / Telegram > >>> >> > > >>> >> >Skype: gilberto.nunes36 > >>> >> > > >>> >> > > >>> >> > > >>> >> > > >>> >> > > >>> >> >Em qua., 5 de ago. de 2020 ?s 11:07, Gilberto Nunes < > >>> >> >gilberto.nunes32 at gmail.com> escreveu: > >>> >> > > >>> >> >> Well... 
> >>> >> >> I did the following:
> >>> >> >>
> >>> >> >> gluster vol create VMS replica 3 arbiter 1 pve01:/DATA/brick1
> >>> >> >> pve02:/DATA/brick1.5 pve01:/DATA/arbiter1.5 pve02:/DATA/brick2
> >>> >> >> pve01:/DATA/brick2.5 pve02:/DATA/arbiter2.5 force
> >>> >> >>
> >>> >> >> And now I have:
> >>> >> >> gluster vol info
> >>> >> >>
> >>> >> >> Volume Name: VMS
> >>> >> >> Type: Distributed-Replicate
> >>> >> >> Volume ID: 1bd712f5-ccb9-4322-8275-abe363d1ffdd
> >>> >> >> Status: Started
> >>> >> >> Snapshot Count: 0
> >>> >> >> Number of Bricks: 2 x (2 + 1) = 6
> >>> >> >> Transport-type: tcp
> >>> >> >> Bricks:
> >>> >> >> Brick1: pve01:/DATA/brick1
> >>> >> >> Brick2: pve02:/DATA/brick1.5
> >>> >> >> Brick3: pve01:/DATA/arbiter1.5 (arbiter)
> >>> >> >> Brick4: pve02:/DATA/brick2
> >>> >> >> Brick5: pve01:/DATA/brick2.5
> >>> >> >> Brick6: pve02:/DATA/arbiter2.5 (arbiter)
> >>> >> >> Options Reconfigured:
> >>> >> >> cluster.quorum-count: 1
> >>> >> >> cluster.quorum-reads: false
> >>> >> >> cluster.self-heal-daemon: enable
> >>> >> >> cluster.heal-timeout: 10
> >>> >> >> storage.fips-mode-rchecksum: on
> >>> >> >> transport.address-family: inet
> >>> >> >> nfs.disable: on
> >>> >> >> performance.client-io-threads: off
> >>> >> >>
> >>> >> >> These values I set myself, to see if they could improve the time
> >>> >> >> it takes for the volume to become available when pve01 goes down
> >>> >> >> with ifupdown:
> >>> >> >> cluster.quorum-count: 1
> >>> >> >> cluster.quorum-reads: false
> >>> >> >> cluster.self-heal-daemon: enable
> >>> >> >> cluster.heal-timeout: 10
> >>> >> >>
> >>> >> >> Nevertheless, it took more than 1 minute for the volume VMS to
> >>> >> >> become available on the other host (pve02).
> >>> >> >> Is there any trick to reduce this time?
> >>> >> >>
> >>> >> >> Thanks
> >>> >> >>
> >>> >> >> ---
> >>> >> >> Gilberto Nunes Ferreira
> >>> >> >>
> >>> >> >> Em qua., 5 de ago.
de 2020 ?s 08:57, Gilberto Nunes < > >>> >> >> gilberto.nunes32 at gmail.com> escreveu: > >>> >> >> > >>> >> >>> hum I see... like this: > >>> >> >>> [image: image.png] > >>> >> >>> --- > >>> >> >>> Gilberto Nunes Ferreira > >>> >> >>> > >>> >> >>> (47) 3025-5907 > >>> >> >>> (47) 99676-7530 - Whatsapp / Telegram > >>> >> >>> > >>> >> >>> Skype: gilberto.nunes36 > >>> >> >>> > >>> >> >>> > >>> >> >>> > >>> >> >>> > >>> >> >>> > >>> >> >>> Em qua., 5 de ago. de 2020 ?s 02:14, Computerisms Corporation > >< > >>> >> >>> bob at computerisms.ca> escreveu: > >>> >> >>> > >>> >> >>>> check the example of the chained configuration on this page: > >>> >> >>>> > >>> >> >>>> > >>> >> >>>> > >>> >> > > >>> >> > >>> > > >>> > > > https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/creating_arbitrated_replicated_volumes > >>> >> >>>> > >>> >> >>>> and apply it to two servers... > >>> >> >>>> > >>> >> >>>> On 2020-08-04 8:25 p.m., Gilberto Nunes wrote: > >>> >> >>>> > Hi Bob! > >>> >> >>>> > > >>> >> >>>> > Could you, please, send me more detail about this > >>> >configuration? > >>> >> >>>> > I will appreciate that! > >>> >> >>>> > > >>> >> >>>> > Thank you > >>> >> >>>> > --- > >>> >> >>>> > Gilberto Nunes Ferreira > >>> >> >>>> > > >>> >> >>>> > (47) 3025-5907 > >>> >> >>>> > ** > >>> >> >>>> > (47) 99676-7530 - Whatsapp / Telegram > >>> >> >>>> > > >>> >> >>>> > Skype: gilberto.nunes36 > >>> >> >>>> > > >>> >> >>>> > > >>> >> >>>> > > >>> >> >>>> > > >>> >> >>>> > > >>> >> >>>> > Em ter., 4 de ago. de 2020 ?s 23:47, Computerisms > >Corporation > >>> >> >>>> > > > >escreveu: > >>> >> >>>> > > >>> >> >>>> > Hi Gilberto, > >>> >> >>>> > > >>> >> >>>> > My understanding is there can only be one arbiter per > >>> >> >replicated > >>> >> >>>> > set. 
I don't have a lot of practice with gluster, so this could be bad advice,
> >>> >> >>>> but the way I dealt with it on my two servers was to use 6
> >>> >> >>>> bricks as distributed-replicated (this is also relatively easy
> >>> >> >>>> to migrate to 3 servers if that happens for you in the future):
> >>> >> >>>>
> >>> >> >>>> Server1       Server2
> >>> >> >>>> brick1        brick1.5
> >>> >> >>>> arbiter1.5    brick2
> >>> >> >>>> brick2.5      arbiter2.5
> >>> >> >>>>
> >>> >> >>>> On 2020-08-04 7:00 p.m., Gilberto Nunes wrote:
> >>> >> >>>> > Hi there.
> >>> >> >>>> > I have two physical servers deployed as replica 2 and,
> >>> >> >>>> > obviously, I got a split-brain.
> >>> >> >>>> > So I am thinking of using two virtual machines, each one in a
> >>> >> >>>> > physical server....
> >>> >> >>>> > Then these two VMs act as an arbiter of the gluster set....
> >>> >> >>>> >
> >>> >> >>>> > Is this doable?
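Bob's two-server layout above maps onto a single create command of the kind Gilberto later runs. The hostnames and paths below are placeholders, not taken from any real cluster; `replica 3 arbiter 1` consumes the brick list in groups of three, the third brick of each group becoming the arbiter:

```shell
# Chained distributed-replicated layout on two servers: each replica set
# keeps one data copy per server, and the arbiters alternate between hosts.
# 'force' is needed because gluster warns when two bricks of one replica
# set land on the same server.
gluster volume create VMS replica 3 arbiter 1 \
  server1:/DATA/brick1  server2:/DATA/brick1.5  server1:/DATA/arbiter1.5 \
  server2:/DATA/brick2  server1:/DATA/brick2.5  server2:/DATA/arbiter2.5 \
  force
```

This is only a sketch of the layout discussed in the thread; it cannot protect against a full split-brain the way a third physical node would, which is the trade-off Bob hints at.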
> >>> >> >>>> > > > >>> >> >>>> > > Thanks > >>> >> >>>> > > > >>> >> >>>> > > ________ > >>> >> >>>> > > > >>> >> >>>> > > > >>> >> >>>> > > > >>> >> >>>> > > Community Meeting Calendar: > >>> >> >>>> > > > >>> >> >>>> > > Schedule - > >>> >> >>>> > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > >>> >> >>>> > > Bridge: https://bluejeans.com/441850968 > >>> >> >>>> > > > >>> >> >>>> > > Gluster-users mailing list > >>> >> >>>> > > Gluster-users at gluster.org > >>> >> > > >>> >> >>>> > > > >>> >https://lists.gluster.org/mailman/listinfo/gluster-users > >>> >> >>>> > > > >>> >> >>>> > ________ > >>> >> >>>> > > >>> >> >>>> > > >>> >> >>>> > > >>> >> >>>> > Community Meeting Calendar: > >>> >> >>>> > > >>> >> >>>> > Schedule - > >>> >> >>>> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > >>> >> >>>> > Bridge: https://bluejeans.com/441850968 > >>> >> >>>> > > >>> >> >>>> > Gluster-users mailing list > >>> >> >>>> > Gluster-users at gluster.org > >>> > > >>> >> >>>> > > >https://lists.gluster.org/mailman/listinfo/gluster-users > >>> >> >>>> > > >>> >> >>>> > >>> >> >>> > >>> >> > >>> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From archon810 at gmail.com Thu Aug 6 18:39:20 2020 From: archon810 at gmail.com (Artem Russakovskii) Date: Thu, 6 Aug 2020 11:39:20 -0700 Subject: [Gluster-users] [Gluster-devel] Announcing Gluster release 7.7 In-Reply-To: References: Message-ID: Looks like someone built gluster 7.7 for OpenSUSE 15.1 after all. Yay. Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | @ArtemR On Sun, Aug 2, 2020 at 11:46 PM Hu Bert wrote: > Hi there, > > just wanted to say thanks to all the developers, maintainers etc. This > release (7) has brought us a small but nice performance improvement. > Utilization and IOs per disk decreased, latency dropped. See attached > images. 
> > I read the release notes but couldn't identify the specific > changes/features for this improvement. Maybe someone could point to > them - but no hurry... :-) > > > Best regards, > Hubert > > Am Mi., 22. Juli 2020 um 18:27 Uhr schrieb Rinku Kothiya < > rkothiya at redhat.com>: > > > > Hi, > > > > The Gluster community is pleased to announce the release of Gluster7.7 > (packages available at [1]). > > Release notes for the release can be found at [2]. > > > > Major changes, features and limitations addressed in this release: > > None > > > > Please Note: Some of the packages are unavailable and we are working on > it. We will release them soon. > > > > Thanks, > > Gluster community > > > > References: > > > > [1] Packages for 7.7: > > https://download.gluster.org/pub/gluster/glusterfs/7/7.7/ > > > > [2] Release notes for 7.7: > > https://docs.gluster.org/en/latest/release-notes/7.7/ > > ________ > > > > > > > > Community Meeting Calendar: > > > > Schedule - > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > > Bridge: https://bluejeans.com/441850968 > > > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From mathias.waack at seim-partner.de Fri Aug 7 07:24:38 2020
From: mathias.waack at seim-partner.de (Mathias Waack)
Date: Fri, 7 Aug 2020 09:24:38 +0200
Subject: [Gluster-users] Repair after accident
In-Reply-To: <5822bb92-432e-e08e-d230-7adbf57127ce@seim-partner.de>
References: <5822bb92-432e-e08e-d230-7adbf57127ce@seim-partner.de>
Message-ID: 

Hi all,

maybe I should add some more information:

The container which filled up the space was running on node x, which still
shows a nearly filled fs:

192.168.1.x:/gvol  2.6T  2.5T  149G  95% /gluster

Nearly the same situation on the underlying brick partition on node x:

zdata/brick  2.6T  2.4T  176G  94% /zbrick

On node y the network card crashed; glusterfs shows the same values:

192.168.1.y:/gvol  2.6T  2.5T  149G  95% /gluster

but different values on the brick:

zdata/brick  2.9T  1.6T  1.4T  54% /zbrick

I think this happened because glusterfs still has hardlinks to the deleted
files on node x? So I can find these files with:

find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> '

But now I am lost. How can I verify these files really belong to the right
container? Or can I just delete these files because there is no way to
access them? Or does glusterfs offer a way to solve this situation?

Mathias

On 05.08.20 15:48, Mathias Waack wrote:
> Hi all,
>
> we are running a gluster setup with two nodes:
>
> Status of volume: gvol
> Gluster process                            TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.1.x:/zbrick                  49152     0          Y       13350
> Brick 192.168.1.y:/zbrick                  49152     0          Y       5965
> Self-heal Daemon on localhost              N/A       N/A        Y       14188
> Self-heal Daemon on 192.168.1.93           N/A       
N/A        Y       6003
>
> Task Status of Volume gvol
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> The glusterfs hosts a bunch of containers with their data volumes. The
> underlying fs is zfs. A few days ago one of the containers created a lot
> of files in one of its data volumes, and in the end it completely
> filled up the space of the glusterfs volume. But this happened only on
> one host; on the other host there was still enough space. We finally
> were able to identify this container and found out that the sizes of the
> data on /zbrick were different on both hosts for this container. Then
> we made the big mistake of deleting these files on both hosts in the
> /zbrick volume, not on the mounted glusterfs volume.
>
> Later we found the reason for this behavior: the network driver on the
> second node partially crashed (which means we were able to log in on
> the node, so we assumed the network was running, but the card was
> already dropping packets at this time) at the same time as the failed
> container started to fill up the gluster volume. After rebooting the
> second node the gluster became available again.
>
> Now the glusterfs volume is running again - but it is still (nearly)
> full: the files created by the container are not visible, but they
> still count towards the amount of used space. How can we fix this?
>
> In addition there are some files which are no longer accessible since
> this accident:
>
> tail access.log.old
> tail: cannot open 'access.log.old' for reading: Input/output error
>
> It looks like the files affected by this error are ones which were
> changed during the accident. Is there a way to fix this too?
>
> Thanks
>    
Mathias > > > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From hunter86_bg at yahoo.com Fri Aug 7 12:32:46 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Fri, 07 Aug 2020 15:32:46 +0300 Subject: [Gluster-users] Repair after accident In-Reply-To: References: <5822bb92-432e-e08e-d230-7adbf57127ce@seim-partner.de> Message-ID: <333712BC-D10B-4759-AED1-7793F1C17AC6@yahoo.com> Have you tried to gluster heal and check if the files are back into their place? I always thought that those hard links are used by the healing mechanism and if that is true - gluster should restore the files to their original location and then wiping the correct files from FUSE will be easy. Best Regards, Strahil Nikolov ?? 7 ?????? 2020 ?. 10:24:38 GMT+03:00, Mathias Waack ??????: >Hi all, > >maybe I should add some more information: > >The container which filled up the space was running on node x, which >still shows a nearly filled fs: > >192.168.1.x:/gvol? 2.6T? 2.5T? 149G? 95% /gluster > >nearly the same situation on the underlying brick partition on node x: > >zdata/brick???? 2.6T? 2.4T? 176G? 94% /zbrick > >On node y the network card crashed, glusterfs shows the same values: > >192.168.1.y:/gvol? 2.6T? 2.5T? 149G? 95% /gluster > >but different values on the brick: > >zdata/brick???? 2.9T? 1.6T? 1.4T? 54% /zbrick > >I think this happened because glusterfs still has hardlinks to the >deleted files on node x? So I can find these files with: > >find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> ' > >But now I am lost. How can I verify these files really belongs to the >right container? Or can I just delete this files because there is no >way >to access it? Or offers glusterfs a way to solve this situation? 
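Why `find -links 1` surfaces exactly those deleted files can be sketched locally. The paths and gfid names below are invented for illustration; the assumed detail is that on a brick every regular file normally carries a second hardlink under `.glusterfs/<xx>/<yy>/<gfid>`, so deleting the visible copy directly on the brick leaves that gfid link behind with a link count of 1:

```shell
# Simulate a brick: a healthy file has two hardlinks (visible path plus
# its gfid link); a file deleted brick-side leaves an orphaned gfid link.
brick=$(mktemp -d)/zbrick
mkdir -p "$brick/.glusterfs/ab/cd"
printf 'healthy\n' > "$brick/data.bin"
ln "$brick/data.bin" "$brick/.glusterfs/ab/cd/gfid-healthy"   # link count 2
printf 'orphan\n'  > "$brick/.glusterfs/ab/cd/gfid-orphan"    # link count 1
# Mathias's search, restricted to regular files:
find "$brick/.glusterfs" -type f -links 1
```

As Strahil suggests, it seems safest to run a heal first and only treat the link-count-1 entries that remain afterwards as reclaimable, since the self-heal daemon relies on the intact gfid links.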
> >Mathias > >On 05.08.20 15:48, Mathias Waack wrote: >> Hi all, >> >> we are running a gluster setup with two nodes: >> >> Status of volume: gvol >> Gluster process???????????????????????????? TCP Port? RDMA Port >> Online? Pid >> >------------------------------------------------------------------------------ > >> >> Brick 192.168.1.x:/zbrick????????????????? 49152???? 0 Y 13350 >> Brick 192.168.1.y:/zbrick????????????????? 49152???? 0 Y 5965 >> Self-heal Daemon on localhost?????????????? N/A?????? N/A Y 14188 >> Self-heal Daemon on 192.168.1.93??????????? N/A?????? N/A Y 6003 >> >> Task Status of Volume gvol >> >------------------------------------------------------------------------------ > >> >> There are no active volume tasks >> >> The glusterfs hosts a bunch of containers with its data volumes. The >> underlying fs is zfs. Few days ago one of the containers created a >lot >> of files in one of its data volumes, and at the end it completely >> filled up the space of the glusterfs volume. But this happened only >on >> one host, on the other host there was still enough space. We finally >> were able to identify this container and found out, the sizes of the >> data on /zbrick were different on both hosts for this container. Now >> we made the big mistake to delete these files on both hosts in the >> /zbrick volume, not on the mounted glusterfs volume. >> >> Later we found the reason for this behavior: the network driver on >the >> second node partially crashed (which means we ware able to login on >> the node, so we assumed the network was running, but the card was >> already dropping packets at this time) at the same time, as the >failed >> container started to fill up the gluster volume. After rebooting the >> second node? the gluster became available again. >> >> Now the glusterfs volume is running again- but it is still (nearly) >> full: the files created by the container are not visible, but they >> still count into amount of free space. 
How can we fix this? >> >> In addition there are some files which are no longer accessible since > >> this accident: >> >> tail access.log.old >> tail: cannot open 'access.log.old' for reading: Input/output error >> >> Looks like affected by this error are files which have been changed >> during the accident. Is there a way to fix this too? >> >> Thanks >> ??? Mathias >> >> >> ________ >> >> >> >> Community Meeting Calendar: >> >> Schedule - >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> Bridge: https://bluejeans.com/441850968 >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >________ > > > >Community Meeting Calendar: > >Schedule - >Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >Bridge: https://bluejeans.com/441850968 > >Gluster-users mailing list >Gluster-users at gluster.org >https://lists.gluster.org/mailman/listinfo/gluster-users From crl.langlois at gmail.com Fri Aug 7 17:14:07 2020 From: crl.langlois at gmail.com (carl langlois) Date: Fri, 7 Aug 2020 13:14:07 -0400 Subject: [Gluster-users] Keep having unsync entries Message-ID: Hi all, I am currently upgrading my ovirt cluster and after doing the upgrade on one node i end up having unsync entries that heal by the headl command. My setup is a 2+1 with 4 volume. 
here is a snapshot of one volume's info:

Volume Name: data
Type: Replicate
Volume ID: 71c999a4-b769-471f-8169-a1a66b28f9b0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovhost1:/gluster_bricks/data/data
Brick2: ovhost2:/gluster_bricks/data/data
Brick3: ovhost3:/gluster_bricks/data/data (arbiter)
Options Reconfigured:
server.allow-insecure: on
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable
features.shard-block-size: 64MB

Also the output of "v heal data info":

gluster> v heal data info
Brick ovhost1:/gluster_bricks/data/data
/4e59777c-5b7b-4bf1-8463-1c818067955e/dom_md/ids
/__DIRECT_IO_TEST__
Status: Connected
Number of entries: 2

Brick ovhost2:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

Brick ovhost3:/gluster_bricks/data/data
/4e59777c-5b7b-4bf1-8463-1c818067955e/dom_md/ids
/__DIRECT_IO_TEST__
Status: Connected
Number of entries: 2

It does not seem to be a split-brain either:

gluster> v heal data info split-brain
Brick ovhost1:/gluster_bricks/data/data
Status: Connected
Number of entries in split-brain: 0

Brick ovhost2:/gluster_bricks/data/data
Status: Connected
Number of entries in split-brain: 0

Brick ovhost3:/gluster_bricks/data/data
Status: Connected
Number of entries in split-brain: 0

Not sure how to resolve this issue.
The gluster version is 3.2.15.

Regards

Carl
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From hunter86_bg at yahoo.com Fri Aug 7 18:00:15 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Fri, 07 Aug 2020 21:00:15 +0300 Subject: [Gluster-users] Keep having unsync entries In-Reply-To: References: Message-ID: <6EA80B60-FB4B-4B68-883C-E81BC1A95FFC@yahoo.com> I think Ravi made a change to prevent that in gluster v6.6 You can rsync the 2 files from ovhost1 and run a full heal (I don't know why heal without 'full' doesn't clean up the entries). Anyways, ovirt can live without these 2 , but as you don't want to risk any downtimes - just rsync them from ovhost1 and run a 'gluster volume heal data full'. By the way , which version of ovirt do you use ? Gluster v3 was used in 4.2.X Best Regards, Strahil Nikolov ?? 7 ?????? 2020 ?. 20:14:07 GMT+03:00, carl langlois ??????: >Hi all, > >I am currently upgrading my ovirt cluster and after doing the upgrade >on >one node i end up having unsync entries that heal by the headl command. >My setup is a 2+1 with 4 volume. >here is a snapshot of one a volume info >Volume Name: data >Type: Replicate >Volume ID: 71c999a4-b769-471f-8169-a1a66b28f9b0 >Status: Started >Snapshot Count: 0 >Number of Bricks: 1 x (2 + 1) = 3 >Transport-type: tcp >Bricks: >Brick1: ovhost1:/gluster_bricks/data/data >Brick2: ovhost2:/gluster_bricks/data/data >Brick3: ovhost3:/gluster_bricks/data/data (arbiter) >Options Reconfigured: >server.allow-insecure: on >nfs.disable: on >transport.address-family: inet >performance.quick-read: off >performance.read-ahead: off >performance.io-cache: off >performance.low-prio-threads: 32 >network.remote-dio: enable >cluster.eager-lock: enable >cluster.quorum-type: auto >cluster.server-quorum-type: server >cluster.data-self-heal-algorithm: full >cluster.locking-scheme: granular >cluster.shd-max-threads: 8 >cluster.shd-wait-qlength: 10000 >features.shard: on >user.cifs: off >storage.owner-uid: 36 >storage.owner-gid: 36 >network.ping-timeout: 30 >performance.strict-o-direct: on 
>cluster.granular-entry-heal: enable >features.shard-block-size: 64MB > >Also the output of v headl data info > >gluster> v heal data info >Brick ovhost1:/gluster_bricks/data/data >/4e59777c-5b7b-4bf1-8463-1c818067955e/dom_md/ids >/__DIRECT_IO_TEST__ >Status: Connected >Number of entries: 2 > >Brick ovhost2:/gluster_bricks/data/data >Status: Connected >Number of entries: 0 > >Brick ovhost3:/gluster_bricks/data/data >/4e59777c-5b7b-4bf1-8463-1c818067955e/dom_md/ids >/__DIRECT_IO_TEST__ >Status: Connected >Number of entries: 2 > >does not seem to be a split brain also. >gluster> v heal data info split-brain >Brick ovhost1:/gluster_bricks/data/data >Status: Connected >Number of entries in split-brain: 0 > >Brick ovhost2:/gluster_bricks/data/data >Status: Connected >Number of entries in split-brain: 0 > >Brick ovhost3:/gluster_bricks/data/data >Status: Connected >Number of entries in split-brain: 0 > >not sure how to resolve this issue. >gluster version is 3.2.15 > >Regards > >Carl From gilberto.nunes32 at gmail.com Fri Aug 7 18:03:06 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Fri, 7 Aug 2020 15:03:06 -0300 Subject: [Gluster-users] Pending healing... Message-ID: Hi I have a pending entry like this gluster vol heal VMS info summary Brick glusterfs01:/DATA/vms Status: Connected Total Number of entries: 1 Number of entries in heal pending: 1 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick glusterfs02:/DATA/vms Status: Connected Total Number of entries: 1 Number of entries in heal pending: 1 Number of entries in split-brain: 0 Number of entries possibly healing: 0 How can I solve this? Should I follow this? https://icicimov.github.io/blog/high-availability/GlusterFS-metadata-split-brain-recovery/ --- Gilberto Nunes Ferreira -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bob at computerisms.ca Fri Aug 7 18:27:56 2020 From: bob at computerisms.ca (Computerisms Corporation) Date: Fri, 7 Aug 2020 11:27:56 -0700 Subject: [Gluster-users] performance In-Reply-To: References: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca> <7991483E-5365-4C87-89FA-C871AED18062@yahoo.com> <345b06c4-5996-9aa3-f846-0944c60ee398@computerisms.ca> <2CD68ED2-199F-407D-B0CC-385793BA16FD@yahoo.com> <64ee1b88-42d6-75d2-05ff-4703d168cc25@computerisms.ca> Message-ID: <40dab403-411b-aa6a-83e5-7a45021ac866@computerisms.ca> Hi Artem and others, Happy to report the system has been relatively stable for the remainder of the week. I have one wordpress site that seems to get hung processes when someone logs in with an incorrect password. Since it is only one, and reliably reproduceable, I am not sure if the issue is to do with Gluster or Wordpress itself, but afaik it was not doing it some months back before the system was using Gluster so I am guessing some combo of both. Regardless, that is the one and only time apache processes stacked up to over 150, and that still only brought the load average up to just under 25; the system did go a bit sluggish, but remained fairly responsive throughout until I restarted apache. Otherwise 15 minute load average consistently runs between 8 and 11 during peak hours and between 4 and 7 during off hours, and other than the one time I have not seen the one-minute load average go over 15. all resources still spike to full capacity from time to time, but it never remains that way for long like it did before. For site responsiveness, first visit to any given site is quite slow, like 3-5 seconds on straight html pages, 10-15 seconds for some of the more bloated WP themes, but clicking links within the site after the first page is loaded is relatively quick, like 1 second on straight html pages, and ~5-6 seconds on the bloated themes. Again, not sure if that is a Gluster related thing or something else. 
So, still holding my breath a bit, but seems this solution is working, at least for me. I haven't played with any of the other settings yet to see if I can improve it further, probably will next week. thinking to increase the write behind window size further to see what happens, as well as play with the settings suggested by Strahil. On 2020-08-05 5:28 p.m., Artem Russakovskii wrote: > I'm very curious whether these improvements hold up over the next few > days. Please report back. > > Sincerely, > Artem > > -- > Founder, Android Police , APK Mirror > , Illogical Robot LLC > beerpla.net | @ArtemR > > > On Wed, Aug 5, 2020 at 9:44 AM Computerisms Corporation > > wrote: > > Hi List, > > > So, we just moved into a quieter time of the day, but maybe I just > > stumbled onto something.? I was trying to figure out if/how I could > > throw more RAM at the problem.? gluster docs says write behind is > not a > > cache unless flush-behind is on.? So seems that is a way to throw > ram to > > it?? I put performance.write-behind-window-size: 512MB and > > performance.flush-behind: on and the whole system calmed down pretty > > much immediately.? could be just timing, though, will have to see > > tomorrow during business hours whether the system stays at a > reasonable > > load. > > so reporting back that this seems to have definitely had a significant > positive effect. > > So far today I have not seen the load average climb over 13 with the > 15minute average hovering around 7.? cpus are still spiking from > time to > time, but they are not staying maxed out all the time, and frequently I > am seeing brief periods of up to 80% idle.? glusterfs process still > spiking up to 180% or so, but consistently running around 70%, and the > brick processes still spiking up to 70-80%, but consistently running > around 20%.? Disk has only been above 50% in atop once so far today > when > it spiked up to 92%, and still lots of RAM left over.? 
So far nload > even > seems indicates I could get away with a 100Mbit network connection. > Websites are snappy relative to what they were, still a bit sluggish on > the first page of any given site, but tolerable or close to.? Apache > processes are opening and closing right away, instead of stacking up. > > Overall, system is performing pretty much like I would expect it to > without gluster.? I haven't played with any of the other settings yet, > just going to leave it like this for a day. > > I have to admit I am a little bit suspicious.? I have been arguing with > Gluster for a very long time, and I have never known it to play this > nice.? kind feels like when your girl tells you she is "fine"; > conversation has stopped, but you aren't really sure if it's done... > > > > > I will still test the other options you suggested tonight, > though, this > > is probably too good to be true. > > > > Can't thank you enough for your input, Strahil, your help is truly > > appreciated! > > > > > > > > > > > > > >> > >>>> > >>>> > >>>> Best Regards, > >>>> Strahil Nikolov > >>>> > >>> ________ > >>> > >>> > >>> > >>> Community Meeting Calendar: > >>> > >>> Schedule - > >>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > >>> Bridge: https://bluejeans.com/441850968 > >>> > >>> Gluster-users mailing list > >>> Gluster-users at gluster.org > >>> https://lists.gluster.org/mailman/listinfo/gluster-users > > ________ > > > > > > > > Community Meeting Calendar: > > > > Schedule - > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > > Bridge: https://bluejeans.com/441850968 > > > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users 
From crl.langlois at gmail.com Fri Aug 7 18:34:27 2020 From: crl.langlois at gmail.com (carl langlois) Date: Fri, 7 Aug 2020 14:34:27 -0400 Subject: [Gluster-users] Keep having unsync entries In-Reply-To: <6EA80B60-FB4B-4B68-883C-E81BC1A95FFC@yahoo.com> References: <6EA80B60-FB4B-4B68-883C-E81BC1A95FFC@yahoo.com> Message-ID: Hi Strahil Thanks for the quick answer. I will try to rsync them manually like you suggested. I am still on 4.2.x. I am in the process of moving my cluster to 4.3, but need to move to 4.2.8 first. Moving to 4.2.8 is not an easy task, since I need to pin the base OS to 7.6 before moving to 4.2.8. Hope moving to 4.3 will be easy :-) ... I suspect 4.4 will be a pain to upgrade to, since there is no upgrade path from 7.8 -> 8 ... :-( Anyway, thanks for the hints. Regards Carl On Fri, Aug 7, 2020 at 2:00 PM Strahil Nikolov wrote: > I think Ravi made a change to prevent that in gluster v6.6 > > You can rsync the 2 files from ovhost1 and run a full heal (I don't > know why heal without 'full' doesn't clean up the entries). > > Anyways, oVirt can live without these 2, but as you don't want to risk > any downtime - just rsync them from ovhost1 and run a 'gluster volume > heal data full'. > > By the way, which version of oVirt do you use? Gluster v3 was used in > 4.2.X > > Best Regards, > Strahil Nikolov > > On 7 August 2020 at 20:14:07 GMT+03:00, carl langlois <crl.langlois at gmail.com> wrote: > >Hi all, > > > >I am currently upgrading my ovirt cluster, and after doing the upgrade > >on > >one node I end up having unsynced entries that won't heal via the heal command. > >My setup is a 2+1 with 4 volumes.
> >here is a snapshot of one volume's info > >Volume Name: data > >Type: Replicate > >Volume ID: 71c999a4-b769-471f-8169-a1a66b28f9b0 > >Status: Started > >Snapshot Count: 0 > >Number of Bricks: 1 x (2 + 1) = 3 > >Transport-type: tcp > >Bricks: > >Brick1: ovhost1:/gluster_bricks/data/data > >Brick2: ovhost2:/gluster_bricks/data/data > >Brick3: ovhost3:/gluster_bricks/data/data (arbiter) > >Options Reconfigured: > >server.allow-insecure: on > >nfs.disable: on > >transport.address-family: inet > >performance.quick-read: off > >performance.read-ahead: off > >performance.io-cache: off > >performance.low-prio-threads: 32 > >network.remote-dio: enable > >cluster.eager-lock: enable > >cluster.quorum-type: auto > >cluster.server-quorum-type: server > >cluster.data-self-heal-algorithm: full > >cluster.locking-scheme: granular > >cluster.shd-max-threads: 8 > >cluster.shd-wait-qlength: 10000 > >features.shard: on > >user.cifs: off > >storage.owner-uid: 36 > >storage.owner-gid: 36 > >network.ping-timeout: 30 > >performance.strict-o-direct: on > >cluster.granular-entry-heal: enable > >features.shard-block-size: 64MB > > > >Also the output of v heal data info > > > >gluster> v heal data info > >Brick ovhost1:/gluster_bricks/data/data > >/4e59777c-5b7b-4bf1-8463-1c818067955e/dom_md/ids > >/__DIRECT_IO_TEST__ > >Status: Connected > >Number of entries: 2 > > > >Brick ovhost2:/gluster_bricks/data/data > >Status: Connected > >Number of entries: 0 > > > >Brick ovhost3:/gluster_bricks/data/data > >/4e59777c-5b7b-4bf1-8463-1c818067955e/dom_md/ids > >/__DIRECT_IO_TEST__ > >Status: Connected > >Number of entries: 2 > > > >It does not seem to be split brain either.
> >gluster> v heal data info split-brain > >Brick ovhost1:/gluster_bricks/data/data > >Status: Connected > >Number of entries in split-brain: 0 > > > >Brick ovhost2:/gluster_bricks/data/data > >Status: Connected > >Number of entries in split-brain: 0 > > > >Brick ovhost3:/gluster_bricks/data/data > >Status: Connected > >Number of entries in split-brain: 0 > > > >Not sure how to resolve this issue. > >The gluster version is 3.2.15. > > > >Regards > > > >Carl From mathias.waack at seim-partner.de Fri Aug 7 18:39:59 2020 From: mathias.waack at seim-partner.de (Mathias Waack) Date: Fri, 7 Aug 2020 20:39:59 +0200 Subject: [Gluster-users] Repair after accident In-Reply-To: <333712BC-D10B-4759-AED1-7793F1C17AC6@yahoo.com> References: <5822bb92-432e-e08e-d230-7adbf57127ce@seim-partner.de> <333712BC-D10B-4759-AED1-7793F1C17AC6@yahoo.com> Message-ID: <151fd00d-94d1-1053-7202-bcd60735c1fc@seim-partner.de> Hi Strahil, but I cannot find these files in the heal info: find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> ' ... 7443397 132463 -rw------- 1 999 docker 1073741824 Aug 3 10:35 /zbrick/.glusterfs/b5/3c/b53c8e46-068b-4286-94a6-7cf54f711983 Now looking for this file in the heal info: gluster volume heal gvol info | grep b53c8e46-068b-4286-94a6-7cf54f711983 shows nothing. So I do not know what I have to heal... Mathias On 07.08.20 14:32, Strahil Nikolov wrote: > Have you tried a gluster heal and checked if the files are back in their place? > > I always thought that those hard links are used by the healing mechanism, and if that is true - gluster should restore the files to their original location, and then wiping the correct files from FUSE will be easy. > > Best Regards, > Strahil Nikolov > > On 7 August 2020 at
10:24:38 GMT+03:00, Mathias Waack wrote: >> Hi all, >> >> maybe I should add some more information: >> >> The container which filled up the space was running on node x, which >> still shows a nearly filled fs: >> >> 192.168.1.x:/gvol 2.6T 2.5T 149G 95% /gluster >> >> nearly the same situation on the underlying brick partition on node x: >> >> zdata/brick 2.6T 2.4T 176G 94% /zbrick >> >> On node y the network card crashed; glusterfs shows the same values: >> >> 192.168.1.y:/gvol 2.6T 2.5T 149G 95% /gluster >> >> but different values on the brick: >> >> zdata/brick 2.9T 1.6T 1.4T 54% /zbrick >> >> I think this happened because glusterfs still has hardlinks to the >> deleted files on node x? So I can find these files with: >> >> find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> ' >> >> But now I am lost. How can I verify these files really belong to the >> right container? Or can I just delete these files, because there is no >> way >> to access them? Or does glusterfs offer a way to solve this situation? >> >> Mathias >> >> On 05.08.20 15:48, Mathias Waack wrote: >>> Hi all, >>> >>> we are running a gluster setup with two nodes: >>> >>> Status of volume: gvol >>> Gluster process                   TCP Port  RDMA Port >>> Online  Pid >>> >> ------------------------------------------------------------------------------ >> >>> Brick 192.168.1.x:/zbrick         49152     0 Y 13350 >>> Brick 192.168.1.y:/zbrick         49152     0 Y 5965 >>> Self-heal Daemon on localhost     N/A       N/A Y 14188 >>> Self-heal Daemon on 192.168.1.93  N/A       N/A Y 6003 >>> >>> Task Status of Volume gvol >>> >> ------------------------------------------------------------------------------ >> >>> There are no active volume tasks >>> >>> The glusterfs volume hosts data volumes for a bunch of containers. The >>> underlying fs is zfs.
A few days ago one of the containers created a >> lot >>> of files in one of its data volumes, and in the end it completely >>> filled up the space of the glusterfs volume. But this happened only >> on >>> one host; on the other host there was still enough space. We finally >>> were able to identify this container and found out that the sizes of the >>> data on /zbrick were different on both hosts for this container. Then >>> we made the big mistake of deleting these files on both hosts in the >>> /zbrick volume, not on the mounted glusterfs volume. >>> >>> Later we found the reason for this behavior: the network driver on >> the >>> second node partially crashed (which means we were able to log in on >>> the node, so we assumed the network was running, but the card was >>> already dropping packets at this time) at the same time as the >> failed >>> container started to fill up the gluster volume. After rebooting the >>> second node, the gluster became available again. >>> >>> Now the glusterfs volume is running again - but it is still (nearly) >>> full: the files created by the container are not visible, but they >>> still count against the amount of free space. How can we fix this? >>> >>> In addition, there are some files which are no longer accessible since >>> this accident: >>> >>> tail access.log.old >>> tail: cannot open 'access.log.old' for reading: Input/output error >>> >>> It looks like the files affected by this error are ones which were changed >>> during the accident. Is there a way to fix this too? >>> >>> Thanks, >>>
Mathias >>> >>> >>> ________ >>> >>> Community Meeting Calendar: >>> >>> Schedule - >>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>> Bridge: https://bluejeans.com/441850968 >>> >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users From gilberto.nunes32 at gmail.com Sat Aug 8 04:27:14 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Sat, 8 Aug 2020 01:27:14 -0300 Subject: [Gluster-users] Monitoring tools for GlusterFS Message-ID: Hi guys... I am missing tools that could be used to monitor healing, for example, and other things like resource usage... What do you recommend? Tools that can be used from the CLI and show a percentage as healing is under way would be nice! Thanks. --- Gilberto Nunes Ferreira From adrianquintero at gmail.com Sat Aug 8 05:09:38 2020 From: adrianquintero at gmail.com (Adrian Quintero) Date: Sat, 8 Aug 2020 01:09:38 -0400 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: Message-ID: Hello Gilberto, I've had the same questions, and some of the community friends were kind enough to send me a few things to look for. However, as you have stated, it would be interesting to know exactly how to monitor the healing process.
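Regarding the "percentage as healing is under way" wish: there is no built-in progress meter that I know of, but a rough CLI proxy is to sum the per-brick pending-entry counts from 'gluster volume heal <VOL> info' and watch the total shrink over time. A minimal sketch (the count_pending helper is my own; the sample input reuses the heal output quoted earlier in this digest, and in practice you would pipe the live command into the function):

```shell
# Sum the "Number of entries: N" lines that `gluster volume heal VOL info`
# prints per brick, giving a single pending-heal count.
count_pending() {
    awk '/^Number of entries:/ {s += $NF} END {print s + 0}'
}

# Captured sample output (from the heal info shown earlier in this digest).
sample='Brick ovhost1:/gluster_bricks/data/data
Number of entries: 2

Brick ovhost2:/gluster_bricks/data/data
Number of entries: 0

Brick ovhost3:/gluster_bricks/data/data
Number of entries: 2'

printf '%s\n' "$sample" | count_pending    # prints 4

# Live usage would look something like:
#   while :; do gluster volume heal data info | count_pending; sleep 30; done
```

This only tracks entry counts, not bytes, so it is a coarse proxy for progress; newer releases also offer 'gluster volume heal <VOL> statistics heal-count', which prints the per-brick counts without listing every entry.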
A friend mentioned some things we should monitor from a Gluster perspective, including but not limited to: - Thin LVM (the pool should never get full) - Number of snapshots - Quotas (both inodes and total size) - GeoRep status - Gluster brick status (2 out of 3 down is an outage for 'replica 3' volumes) - Pending heals - Errors in Gluster brick logs (can indicate FS issues) - Errors in other Gluster logs As for the resources used, please look at https://docs.gluster.org/en/latest/Administrator%20Guide/Monitoring%20Workload/ - it has helped me quite a bit. If you find out about the healing process and how to monitor it, please let me know. Regards, Adrian Quintero From hunter86_bg at yahoo.com Sat Aug 8 06:57:16 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Sat, 8 Aug 2020 06:57:16 +0000 (UTC) Subject: [Gluster-users] Keep having unsync entries In-Reply-To: References: <6EA80B60-FB4B-4B68-883C-E81BC1A95FFC@yahoo.com> Message-ID: <855486521.1248366.1596869836761@mail.yahoo.com> Keep in mind that 4.3 is using Gluster v6. I'm on the latest 4.3.10, but with Gluster v7. I was hit by a very rare ACL bug (reported by some other guys here and in the oVirt ML), and thus I recommend you test functionality after every gluster major upgrade (start, stop, create snapshot, remove snapshot, etc.). In my case 6.6+ and 7.1+ were problematic, but you have no way to skip them. Best Regards, Strahil Nikolov On Friday, 7 August 2020 at 21:34:39 GMT+3, carl langlois wrote: Hi Strahil Thanks for the quick answer. I will try to rsync them manually like you suggested. I am still on 4.2.x. I am in the process of moving my cluster to 4.3 but need to move to 4.2.8 first. But moving to 4.2.8 is not an easy task since I need to pin the base OS to 7.6 before moving to 4.2.8. Hope moving to 4.3 will be easy :-) ... I suspect 4.4 to be a pain to upgrade since there is no upgrade path from 7.8 -> 8 ...
:-( Anyway thanks for the hints. Regards Carl On Fri, Aug 7, 2020 at 2:00 PM Strahil Nikolov wrote: > I think Ravi made a change to prevent that in gluster v6.6 > > You can? rsync the? 2 files from? ovhost1 and run a full heal (I don't know why heal without 'full' doesn't clean up the entries). > > Anyways, ovirt can live without these 2 , but as you don't want to risk any downtimes? - just rsync them from ovhost1 and run a 'gluster volume heal data full'. > > By the way , which version of ovirt do you use ? Gluster v3 was? used? in 4.2.X > > Best Regards, > Strahil Nikolov > > > > ?? 7 ?????? 2020 ?. 20:14:07 GMT+03:00, carl langlois ??????: >>Hi all, >> >>I am currently upgrading my ovirt cluster and after doing the upgrade >>on >>one node i end up having unsync entries that heal by the headl command. >>My setup is a 2+1? with 4 volume. >>here is a snapshot of one a volume info >>Volume Name: data >>Type: Replicate >>Volume ID: 71c999a4-b769-471f-8169-a1a66b28f9b0 >>Status: Started >>Snapshot Count: 0 >>Number of Bricks: 1 x (2 + 1) = 3 >>Transport-type: tcp >>Bricks: >>Brick1: ovhost1:/gluster_bricks/data/data >>Brick2: ovhost2:/gluster_bricks/data/data >>Brick3: ovhost3:/gluster_bricks/data/data (arbiter) >>Options Reconfigured: >>server.allow-insecure: on >>nfs.disable: on >>transport.address-family: inet >>performance.quick-read: off >>performance.read-ahead: off >>performance.io-cache: off >>performance.low-prio-threads: 32 >>network.remote-dio: enable >>cluster.eager-lock: enable >>cluster.quorum-type: auto >>cluster.server-quorum-type: server >>cluster.data-self-heal-algorithm: full >>cluster.locking-scheme: granular >>cluster.shd-max-threads: 8 >>cluster.shd-wait-qlength: 10000 >>features.shard: on >>user.cifs: off >>storage.owner-uid: 36 >>storage.owner-gid: 36 >>network.ping-timeout: 30 >>performance.strict-o-direct: on >>cluster.granular-entry-heal: enable >>features.shard-block-size: 64MB >> >>Also the output of v headl data info >> >>gluster> v heal 
data info >>Brick ovhost1:/gluster_bricks/data/data >>/4e59777c-5b7b-4bf1-8463-1c818067955e/dom_md/ids >>/__DIRECT_IO_TEST__ >>Status: Connected >>Number of entries: 2 >> >>Brick ovhost2:/gluster_bricks/data/data >>Status: Connected >>Number of entries: 0 >> >>Brick ovhost3:/gluster_bricks/data/data >>/4e59777c-5b7b-4bf1-8463-1c818067955e/dom_md/ids >>/__DIRECT_IO_TEST__ >>Status: Connected >>Number of entries: 2 >> >>does not seem to be a split brain also. >>gluster> v heal data info split-brain >>Brick ovhost1:/gluster_bricks/data/data >>Status: Connected >>Number of entries in split-brain: 0 >> >>Brick ovhost2:/gluster_bricks/data/data >>Status: Connected >>Number of entries in split-brain: 0 >> >>Brick ovhost3:/gluster_bricks/data/data >>Status: Connected >>Number of entries in split-brain: 0 >> >>not sure how to resolve this issue. >>gluster version is 3.2.15 >> >>Regards >> >>Carl From hunter86_bg at yahoo.com Sat Aug 8 07:00:55 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Sat, 8 Aug 2020 07:00:55 +0000 (UTC) Subject: [Gluster-users] Repair after accident In-Reply-To: <151fd00d-94d1-1053-7202-bcd60735c1fc@seim-partner.de> References: <5822bb92-432e-e08e-d230-7adbf57127ce@seim-partner.de> <333712BC-D10B-4759-AED1-7793F1C17AC6@yahoo.com> <151fd00d-94d1-1053-7202-bcd60735c1fc@seim-partner.de> Message-ID: <1181804313.1257349.1596870055548@mail.yahoo.com> In glusterfs the long string is called a "gfid" and does not represent the name. Best Regards, Strahil Nikolov On Friday, 7 August 2020 at 21:40:11 GMT+3, Mathias Waack wrote: Hi Strahil, but I cannot find these files in the heal info: find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> ' ... 7443397 132463 -rw------- 1 999 docker 1073741824 Aug 3 10:35 /zbrick/.glusterfs/b5/3c/b53c8e46-068b-4286-94a6-7cf54f711983 Now looking for this file in the heal info: gluster volume heal gvol info | grep b53c8e46-068b-4286-94a6-7cf54f711983 shows nothing.
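A side note on the path in the find output above: the location of a gfid's hard link under .glusterfs is derived mechanically from the gfid string itself - the first two hex characters and the next two form the two directory levels, which is why b53c8e46-... lives under b5/3c. A small sketch (the helper name is my own invention):

```shell
# Map a gfid to its hard-link location inside a brick's .glusterfs tree:
#   <brick>/.glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full gfid>
gfid_to_backend_path() {
    gfid=$1
    brick=$2
    printf '%s/.glusterfs/%s/%s/%s\n' \
        "$brick" \
        "$(printf '%s' "$gfid" | cut -c1-2)" \
        "$(printf '%s' "$gfid" | cut -c3-4)" \
        "$gfid"
}

gfid_to_backend_path b53c8e46-068b-4286-94a6-7cf54f711983 /zbrick
# prints /zbrick/.glusterfs/b5/3c/b53c8e46-068b-4286-94a6-7cf54f711983
```

Going the other direction - from a gfid back to the user-visible path - is what the gfid-to-path guide in the Gluster troubleshooting docs covers.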
So I do not know, what I have to heal... Mathias On 07.08.20 14:32, Strahil Nikolov wrote: > Have you tried to gluster heal and check if the files are back into their place? > > I always thought that those hard links are used? by the healing mechanism? and if that is true - gluster should restore the files to their original location and then wiping the correct files from FUSE will be easy. > > Best Regards, > Strahil Nikolov > > ?? 7 ?????? 2020 ?. 10:24:38 GMT+03:00, Mathias Waack ??????: >> Hi all, >> >> maybe I should add some more information: >> >> The container which filled up the space was running on node x, which >> still shows a nearly filled fs: >> >> 192.168.1.x:/gvol? 2.6T? 2.5T? 149G? 95% /gluster >> >> nearly the same situation on the underlying brick partition on node x: >> >> zdata/brick???? 2.6T? 2.4T? 176G? 94% /zbrick >> >> On node y the network card crashed, glusterfs shows the same values: >> >> 192.168.1.y:/gvol? 2.6T? 2.5T? 149G? 95% /gluster >> >> but different values on the brick: >> >> zdata/brick???? 2.9T? 1.6T? 1.4T? 54% /zbrick >> >> I think this happened because glusterfs still has hardlinks to the >> deleted files on node x? So I can find these files with: >> >> find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> ' >> >> But now I am lost. How can I verify these files really belongs to the >> right container? Or can I just delete this files because there is no >> way >> to access it? Or offers glusterfs a way to solve this situation? >> >> Mathias >> >> On 05.08.20 15:48, Mathias Waack wrote: >>> Hi all, >>> >>> we are running a gluster setup with two nodes: >>> >>> Status of volume: gvol >>> Gluster process???????????????????????????? TCP Port? RDMA Port >>> Online? Pid >>> >> ------------------------------------------------------------------------------ >> >>> Brick 192.168.1.x:/zbrick????????????????? 49152???? 0 Y 13350 >>> Brick 192.168.1.y:/zbrick????????????????? 49152???? 
0 Y 5965 >>> Self-heal Daemon on localhost?????????????? N/A?????? N/A Y 14188 >>> Self-heal Daemon on 192.168.1.93??????????? N/A?????? N/A Y 6003 >>> >>> Task Status of Volume gvol >>> >> ------------------------------------------------------------------------------ >> >>> There are no active volume tasks >>> >>> The glusterfs hosts a bunch of containers with its data volumes. The >>> underlying fs is zfs. Few days ago one of the containers created a >> lot >>> of files in one of its data volumes, and at the end it completely >>> filled up the space of the glusterfs volume. But this happened only >> on >>> one host, on the other host there was still enough space. We finally >>> were able to identify this container and found out, the sizes of the >>> data on /zbrick were different on both hosts for this container. Now >>> we made the big mistake to delete these files on both hosts in the >>> /zbrick volume, not on the mounted glusterfs volume. >>> >>> Later we found the reason for this behavior: the network driver on >> the >>> second node partially crashed (which means we ware able to login on >>> the node, so we assumed the network was running, but the card was >>> already dropping packets at this time) at the same time, as the >> failed >>> container started to fill up the gluster volume. After rebooting the >>> second node? the gluster became available again. >>> >>> Now the glusterfs volume is running again- but it is still (nearly) >>> full: the files created by the container are not visible, but they >>> still count into amount of free space. How can we fix this? >>> >>> In addition there are some files which are no longer accessible since >>> this accident: >>> >>> tail access.log.old >>> tail: cannot open 'access.log.old' for reading: Input/output error >>> >>> Looks like affected by this error are files which have been changed >>> during the accident. Is there a way to fix this too? >>> >>> Thanks >>>? ??? 
Mathias >>> >>> >>> ________ >>> >>> >>> >>> Community Meeting Calendar: >>> >>> Schedule - >>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>> Bridge: https://bluejeans.com/441850968 >>> >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> ________ >> >> >> >> Community Meeting Calendar: >> >> Schedule - >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> Bridge: https://bluejeans.com/441850968 >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users ________ Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users From mathias.waack at seim-partner.de Sat Aug 8 15:02:10 2020 From: mathias.waack at seim-partner.de (Mathias Waack) Date: Sat, 8 Aug 2020 17:02:10 +0200 Subject: [Gluster-users] Repair after accident In-Reply-To: <1181804313.1257349.1596870055548@mail.yahoo.com> References: <5822bb92-432e-e08e-d230-7adbf57127ce@seim-partner.de> <333712BC-D10B-4759-AED1-7793F1C17AC6@yahoo.com> <151fd00d-94d1-1053-7202-bcd60735c1fc@seim-partner.de> <1181804313.1257349.1596870055548@mail.yahoo.com> Message-ID: <235f1f25-c9dd-cb75-3f49-b4a82bffb4d6@seim-partner.de> So b53c8e46-068b-4286-94a6-7cf54f711983 is not a gfid? What else is it? Mathias On 08.08.20 09:00, Strahil Nikolov wrote: > In glusterfs the long string is called "gfid" and does not represent the name. > > Best Regards, > Strahil Nikolov > > > > > > > ? ?????, 7 ?????? 2020 ?., 21:40:11 ???????+3, Mathias Waack ??????: > > > > > > Hi Strahil, > > but I cannot find these files in the heal info: > > find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> ' > ... > 7443397? 132463 -rw-------?? 1 999????? docker?? 1073741824 Aug? 
3 10:35 > /zbrick/.glusterfs/b5/3c/b53c8e46-068b-4286-94a6-7cf54f711983 > > Now looking for this file in the heal infos: > > gluster volume heal gvol info | grep b53c8e46-068b-4286-94a6-7cf54f711983 > > shows nothing. > > So I do not know, what I have to heal... > > Mathias > > On 07.08.20 14:32, Strahil Nikolov wrote: >> Have you tried to gluster heal and check if the files are back into their place? >> >> I always thought that those hard links are used? by the healing mechanism? and if that is true - gluster should restore the files to their original location and then wiping the correct files from FUSE will be easy. >> >> Best Regards, >> Strahil Nikolov >> >> ?? 7 ?????? 2020 ?. 10:24:38 GMT+03:00, Mathias Waack ??????: >>> Hi all, >>> >>> maybe I should add some more information: >>> >>> The container which filled up the space was running on node x, which >>> still shows a nearly filled fs: >>> >>> 192.168.1.x:/gvol? 2.6T? 2.5T? 149G? 95% /gluster >>> >>> nearly the same situation on the underlying brick partition on node x: >>> >>> zdata/brick???? 2.6T? 2.4T? 176G? 94% /zbrick >>> >>> On node y the network card crashed, glusterfs shows the same values: >>> >>> 192.168.1.y:/gvol? 2.6T? 2.5T? 149G? 95% /gluster >>> >>> but different values on the brick: >>> >>> zdata/brick???? 2.9T? 1.6T? 1.4T? 54% /zbrick >>> >>> I think this happened because glusterfs still has hardlinks to the >>> deleted files on node x? So I can find these files with: >>> >>> find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> ' >>> >>> But now I am lost. How can I verify these files really belongs to the >>> right container? Or can I just delete this files because there is no >>> way >>> to access it? Or offers glusterfs a way to solve this situation? >>> >>> Mathias >>> >>> On 05.08.20 15:48, Mathias Waack wrote: >>>> Hi all, >>>> >>>> we are running a gluster setup with two nodes: >>>> >>>> Status of volume: gvol >>>> Gluster process???????????????????????????? TCP Port? 
RDMA Port >>>> Online? Pid >>>> >>> ------------------------------------------------------------------------------ >>> >>>> Brick 192.168.1.x:/zbrick????????????????? 49152???? 0 Y 13350 >>>> Brick 192.168.1.y:/zbrick????????????????? 49152???? 0 Y 5965 >>>> Self-heal Daemon on localhost?????????????? N/A?????? N/A Y 14188 >>>> Self-heal Daemon on 192.168.1.93??????????? N/A?????? N/A Y 6003 >>>> >>>> Task Status of Volume gvol >>>> >>> ------------------------------------------------------------------------------ >>> >>>> There are no active volume tasks >>>> >>>> The glusterfs hosts a bunch of containers with its data volumes. The >>>> underlying fs is zfs. Few days ago one of the containers created a >>> lot >>>> of files in one of its data volumes, and at the end it completely >>>> filled up the space of the glusterfs volume. But this happened only >>> on >>>> one host, on the other host there was still enough space. We finally >>>> were able to identify this container and found out, the sizes of the >>>> data on /zbrick were different on both hosts for this container. Now >>>> we made the big mistake to delete these files on both hosts in the >>>> /zbrick volume, not on the mounted glusterfs volume. >>>> >>>> Later we found the reason for this behavior: the network driver on >>> the >>>> second node partially crashed (which means we ware able to login on >>>> the node, so we assumed the network was running, but the card was >>>> already dropping packets at this time) at the same time, as the >>> failed >>>> container started to fill up the gluster volume. After rebooting the >>>> second node? the gluster became available again. >>>> >>>> Now the glusterfs volume is running again- but it is still (nearly) >>>> full: the files created by the container are not visible, but they >>>> still count into amount of free space. How can we fix this? 
>>>> >>>> In addition there are some files which are no longer accessible since >>>> this accident: >>>> >>>> tail access.log.old >>>> tail: cannot open 'access.log.old' for reading: Input/output error >>>> >>>> Looks like affected by this error are files which have been changed >>>> during the accident. Is there a way to fix this too? >>>> >>>> Thanks >>>> ? ??? Mathias >>>> >>>> >>>> ________ >>>> >>>> >>>> >>>> Community Meeting Calendar: >>>> >>>> Schedule - >>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>>> Bridge: https://bluejeans.com/441850968 >>>> >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> ________ >>> >>> >>> >>> Community Meeting Calendar: >>> >>> Schedule - >>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>> Bridge: https://bluejeans.com/441850968 >>> >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From hunter86_bg at yahoo.com Sat Aug 8 16:05:23 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Sat, 08 Aug 2020 19:05:23 +0300 Subject: [Gluster-users] Repair after accident In-Reply-To: <235f1f25-c9dd-cb75-3f49-b4a82bffb4d6@seim-partner.de> References: <5822bb92-432e-e08e-d230-7adbf57127ce@seim-partner.de> <333712BC-D10B-4759-AED1-7793F1C17AC6@yahoo.com> <151fd00d-94d1-1053-7202-bcd60735c1fc@seim-partner.de> <1181804313.1257349.1596870055548@mail.yahoo.com> <235f1f25-c9dd-cb75-3f49-b4a82bffb4d6@seim-partner.de> Message-ID: <785C240A-515B-4AD7-B30E-BA00F2DF2E06@yahoo.com> If you read my previous email, you will see that i noted that the string IS GFID and not the name of the file :) You can 
find the name following the procedure at: https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/ Of course, that will be slow for all entries in .glusterfs, and you will need to create a script to match all gfids to brick paths. I guess the fastest way to find the deleted files (as far as I understood, they were deleted on the brick directly and the entries in .glusterfs were left) is to create a script that does the following: 0. Create a ramfs for the files: findmnt /mnt || mount -t ramfs -o size=128MB - /mnt 1. Get all inodes: ionice -c 2 -n 7 nice -n 15 find /full/path/to/brick -type f -exec ls -i {} \; >/mnt/data 2. Get only the inodes: nice -n 15 awk '{print $1}' /mnt/data > /mnt/inode_only 3. Now the fun starts -> find the inodes that are not duplicated (note that uniq needs sorted input): nice -n 15 sort -n /mnt/inode_only | uniq -u > /mnt/gfid-only 4. Once you have the inodes, you can verify that they exist only in the .glusterfs dir: for i in $(cat /mnt/gfid-only); do ionice -c 2 -n 7 nice -n 15 find /path/to/.glusterfs -inum $i ; echo;echo; done 5. If it's OK -> delete: for i in $(cat /mnt/gfid-only); do ionice -c 2 -n 7 nice -n 15 find /path/to/brick -inum $i -delete ; done Lastly, repeat on all bricks. Good luck! P.S.: Consider creating a gluster snapshot before that - just in case... Better safe than sorry. P.S.: If you think that you have enough resources, you can remove the ionice/nice stuff. It is just there to guarantee you won't eat too many resources. Best Regards, Strahil Nikolov On 8 August 2020 at 18:02:10 GMT+03:00, Mathias Waack wrote: >So b53c8e46-068b-4286-94a6-7cf54f711983 is not a gfid? What else is it? > >Mathias > >On 08.08.20 09:00, Strahil Nikolov wrote: >> In glusterfs the long string is called a "gfid" and does not represent >the name. >> >> Best Regards, >> Strahil Nikolov >> >> On Friday, 7 August
2020 ?., 21:40:11 ???????+3, Mathias Waack > ??????: >> >> >> >> >> >> Hi Strahil, >> >> but I cannot find these files in the heal info: >> >> find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> ' >> ... >> 7443397? 132463 -rw-------?? 1 999????? docker?? 1073741824 Aug? 3 >10:35 >> /zbrick/.glusterfs/b5/3c/b53c8e46-068b-4286-94a6-7cf54f711983 >> >> Now looking for this file in the heal infos: >> >> gluster volume heal gvol info | grep >b53c8e46-068b-4286-94a6-7cf54f711983 >> >> shows nothing. >> >> So I do not know, what I have to heal... >> >> Mathias >> >> On 07.08.20 14:32, Strahil Nikolov wrote: >>> Have you tried to gluster heal and check if the files are back into >their place? >>> >>> I always thought that those hard links are used? by the healing >mechanism? and if that is true - gluster should restore the files to >their original location and then wiping the correct files from FUSE >will be easy. >>> >>> Best Regards, >>> Strahil Nikolov >>> >>> ?? 7 ?????? 2020 ?. 10:24:38 GMT+03:00, Mathias Waack > ??????: >>>> Hi all, >>>> >>>> maybe I should add some more information: >>>> >>>> The container which filled up the space was running on node x, >which >>>> still shows a nearly filled fs: >>>> >>>> 192.168.1.x:/gvol? 2.6T? 2.5T? 149G? 95% /gluster >>>> >>>> nearly the same situation on the underlying brick partition on node >x: >>>> >>>> zdata/brick???? 2.6T? 2.4T? 176G? 94% /zbrick >>>> >>>> On node y the network card crashed, glusterfs shows the same >values: >>>> >>>> 192.168.1.y:/gvol? 2.6T? 2.5T? 149G? 95% /gluster >>>> >>>> but different values on the brick: >>>> >>>> zdata/brick???? 2.9T? 1.6T? 1.4T? 54% /zbrick >>>> >>>> I think this happened because glusterfs still has hardlinks to the >>>> deleted files on node x? So I can find these files with: >>>> >>>> find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> ' >>>> >>>> But now I am lost. How can I verify these files really belongs to >the >>>> right container? 
Or can I just delete this files because there is >no >>>> way >>>> to access it? Or offers glusterfs a way to solve this situation? >>>> >>>> Mathias >>>> >>>> On 05.08.20 15:48, Mathias Waack wrote: >>>>> Hi all, >>>>> >>>>> we are running a gluster setup with two nodes: >>>>> >>>>> Status of volume: gvol >>>>> Gluster process???????????????????????????? TCP Port? RDMA Port >>>>> Online? Pid >>>>> >>>> >------------------------------------------------------------------------------ >>>> >>>>> Brick 192.168.1.x:/zbrick????????????????? 49152???? 0 Y 13350 >>>>> Brick 192.168.1.y:/zbrick????????????????? 49152???? 0 Y 5965 >>>>> Self-heal Daemon on localhost?????????????? N/A?????? N/A Y 14188 >>>>> Self-heal Daemon on 192.168.1.93??????????? N/A?????? N/A Y 6003 >>>>> >>>>> Task Status of Volume gvol >>>>> >>>> >------------------------------------------------------------------------------ >>>> >>>>> There are no active volume tasks >>>>> >>>>> The glusterfs hosts a bunch of containers with its data volumes. >The >>>>> underlying fs is zfs. Few days ago one of the containers created a >>>> lot >>>>> of files in one of its data volumes, and at the end it completely >>>>> filled up the space of the glusterfs volume. But this happened >only >>>> on >>>>> one host, on the other host there was still enough space. We >finally >>>>> were able to identify this container and found out, the sizes of >the >>>>> data on /zbrick were different on both hosts for this container. >Now >>>>> we made the big mistake to delete these files on both hosts in the >>>>> /zbrick volume, not on the mounted glusterfs volume. >>>>> >>>>> Later we found the reason for this behavior: the network driver on >>>> the >>>>> second node partially crashed (which means we ware able to login >on >>>>> the node, so we assumed the network was running, but the card was >>>>> already dropping packets at this time) at the same time, as the >>>> failed >>>>> container started to fill up the gluster volume. 
After rebooting >the >>>>> second node? the gluster became available again. >>>>> >>>>> Now the glusterfs volume is running again- but it is still >(nearly) >>>>> full: the files created by the container are not visible, but they >>>>> still count into amount of free space. How can we fix this? >>>>> >>>>> In addition there are some files which are no longer accessible >since >>>>> this accident: >>>>> >>>>> tail access.log.old >>>>> tail: cannot open 'access.log.old' for reading: Input/output error >>>>> >>>>> Looks like affected by this error are files which have been >changed >>>>> during the accident. Is there a way to fix this too? >>>>> >>>>> Thanks >>>>> ? ??? Mathias >>>>> >>>>> >>>>> ________ >>>>> >>>>> >>>>> >>>>> Community Meeting Calendar: >>>>> >>>>> Schedule - >>>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>>>> Bridge: https://bluejeans.com/441850968 >>>>> >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> ________ >>>> >>>> >>>> >>>> Community Meeting Calendar: >>>> >>>> Schedule - >>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>>> Bridge: https://bluejeans.com/441850968 >>>> >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >> ________ >> >> >> >> Community Meeting Calendar: >> >> Schedule - >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> Bridge: https://bluejeans.com/441850968 >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >________ > > > >Community Meeting Calendar: > >Schedule - >Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >Bridge: https://bluejeans.com/441850968 > >Gluster-users mailing list >Gluster-users at gluster.org >https://lists.gluster.org/mailman/listinfo/gluster-users From mathias.waack at seim-partner.de Sat Aug 8 17:05:29 2020 From: mathias.waack at 
seim-partner.de (Mathias Waack) Date: Sat, 8 Aug 2020 19:05:29 +0200 Subject: [Gluster-users] Repair after accident In-Reply-To: <785C240A-515B-4AD7-B30E-BA00F2DF2E06@yahoo.com> References: <5822bb92-432e-e08e-d230-7adbf57127ce@seim-partner.de> <333712BC-D10B-4759-AED1-7793F1C17AC6@yahoo.com> <151fd00d-94d1-1053-7202-bcd60735c1fc@seim-partner.de> <1181804313.1257349.1596870055548@mail.yahoo.com> <235f1f25-c9dd-cb75-3f49-b4a82bffb4d6@seim-partner.de> <785C240A-515B-4AD7-B30E-BA00F2DF2E06@yahoo.com> Message-ID: Oh I see, I got you wrong. Now I am going to start to understand the whole thing. Thank you for the comprehensive explanation. For good luck it is weekend, so I can start digging into this... Thanks ??? Mathias On 08.08.20 18:05, Strahil Nikolov wrote: > If you read my previous email, you will see that i noted that the string IS GFID and not the name of the file :) > > > You can find the name following the procedure at: https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/ > > > Of course, that will be slow for all entries in .glusterfs and you will need to create a script to match all gfids to brick path. > > > I guess the fastest way to find the deleted files (As far as I understood they were deleted on the brick directly and entries in .glusterfs were left) is to create a script that: > > 0.Create a ramfs for the files: > findmnt /mnt || mount -t ramfs -o size=128MB - /mnt > > 1. Get all inodes > ionice -c 2 -n 7 nice -n 15 find /full/path/to/brick -type f -exec ls -i {} \; >/mnt/data > > 2. Get only the inodes: > nice -n 15 awk '{print $1}' /mnt/data > /mnt/inode_only > 3. Now the fun starts now-> find inodes that are not duplicate: > > nice -n 15 uniq -u /mnt/inode_only > /mnt/gfid-only > > 4. Once you have the inodes, you can verify that they do exists only in .gluster dir > for i in $(cat /mnt/gfid-only); do ionice -c 2 -n 7 nice -n 15 find /path/to/.glusterfs -inum $i ; echo;echo; done > > 5. 
If it's OK -> delete > > for i in $(cat /mnt/gfid-only); do ionice -c 2 -n 7 nice -n 15 find /path/to/brick -inum $i -delete ; done > > > Last , repeat on all bricks > Good luck! > > > P.S.: Consider creating a gluster snapshot before that - just in case... Better safe than sorry. > > P.S: If you think that you got enough resources, you can remove ionice/nice stuff . They are just to guarantee you won't eat too many resources. > > > Best Regards, > Strahil Nikolov > > > > > ?? 8 ?????? 2020 ?. 18:02:10 GMT+03:00, Mathias Waack ??????: >> So b53c8e46-068b-4286-94a6-7cf54f711983 is not a gfid? What else is it? >> >> Mathias >> >> On 08.08.20 09:00, Strahil Nikolov wrote: >>> In glusterfs the long string is called "gfid" and does not represent >> the name. >>> Best Regards, >>> Strahil Nikolov >>> >>> >>> >>> >>> >>> >>> ? ?????, 7 ?????? 2020 ?., 21:40:11 ???????+3, Mathias Waack >> ??????: >>> >>> >>> >>> >>> Hi Strahil, >>> >>> but I cannot find these files in the heal info: >>> >>> find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> ' >>> ... >>> 7443397? 132463 -rw-------?? 1 999????? docker?? 1073741824 Aug? 3 >> 10:35 >>> /zbrick/.glusterfs/b5/3c/b53c8e46-068b-4286-94a6-7cf54f711983 >>> >>> Now looking for this file in the heal infos: >>> >>> gluster volume heal gvol info | grep >> b53c8e46-068b-4286-94a6-7cf54f711983 >>> shows nothing. >>> >>> So I do not know, what I have to heal... >>> >>> Mathias >>> >>> On 07.08.20 14:32, Strahil Nikolov wrote: >>>> Have you tried to gluster heal and check if the files are back into >> their place? >>>> I always thought that those hard links are used? by the healing >> mechanism? and if that is true - gluster should restore the files to >> their original location and then wiping the correct files from FUSE >> will be easy. >>>> Best Regards, >>>> Strahil Nikolov >>>> >>>> ?? 7 ?????? 2020 ?. 
10:24:38 GMT+03:00, Mathias Waack >> ??????: >>>>> Hi all, >>>>> >>>>> maybe I should add some more information: >>>>> >>>>> The container which filled up the space was running on node x, >> which >>>>> still shows a nearly filled fs: >>>>> >>>>> 192.168.1.x:/gvol? 2.6T? 2.5T? 149G? 95% /gluster >>>>> >>>>> nearly the same situation on the underlying brick partition on node >> x: >>>>> zdata/brick???? 2.6T? 2.4T? 176G? 94% /zbrick >>>>> >>>>> On node y the network card crashed, glusterfs shows the same >> values: >>>>> 192.168.1.y:/gvol? 2.6T? 2.5T? 149G? 95% /gluster >>>>> >>>>> but different values on the brick: >>>>> >>>>> zdata/brick???? 2.9T? 1.6T? 1.4T? 54% /zbrick >>>>> >>>>> I think this happened because glusterfs still has hardlinks to the >>>>> deleted files on node x? So I can find these files with: >>>>> >>>>> find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> ' >>>>> >>>>> But now I am lost. How can I verify these files really belongs to >> the >>>>> right container? Or can I just delete this files because there is >> no >>>>> way >>>>> to access it? Or offers glusterfs a way to solve this situation? >>>>> >>>>> Mathias >>>>> >>>>> On 05.08.20 15:48, Mathias Waack wrote: >>>>>> Hi all, >>>>>> >>>>>> we are running a gluster setup with two nodes: >>>>>> >>>>>> Status of volume: gvol >>>>>> Gluster process???????????????????????????? TCP Port? RDMA Port >>>>>> Online? Pid >>>>>> >> ------------------------------------------------------------------------------ >>>>>> Brick 192.168.1.x:/zbrick????????????????? 49152???? 0 Y 13350 >>>>>> Brick 192.168.1.y:/zbrick????????????????? 49152???? 0 Y 5965 >>>>>> Self-heal Daemon on localhost?????????????? N/A?????? N/A Y 14188 >>>>>> Self-heal Daemon on 192.168.1.93??????????? N/A?????? 
N/A Y 6003 >>>>>> >>>>>> Task Status of Volume gvol >>>>>> >> ------------------------------------------------------------------------------ >>>>>> There are no active volume tasks >>>>>> >>>>>> The glusterfs hosts a bunch of containers with its data volumes. >> The >>>>>> underlying fs is zfs. Few days ago one of the containers created a >>>>> lot >>>>>> of files in one of its data volumes, and at the end it completely >>>>>> filled up the space of the glusterfs volume. But this happened >> only >>>>> on >>>>>> one host, on the other host there was still enough space. We >> finally >>>>>> were able to identify this container and found out, the sizes of >> the >>>>>> data on /zbrick were different on both hosts for this container. >> Now >>>>>> we made the big mistake to delete these files on both hosts in the >>>>>> /zbrick volume, not on the mounted glusterfs volume. >>>>>> >>>>>> Later we found the reason for this behavior: the network driver on >>>>> the >>>>>> second node partially crashed (which means we ware able to login >> on >>>>>> the node, so we assumed the network was running, but the card was >>>>>> already dropping packets at this time) at the same time, as the >>>>> failed >>>>>> container started to fill up the gluster volume. After rebooting >> the >>>>>> second node? the gluster became available again. >>>>>> >>>>>> Now the glusterfs volume is running again- but it is still >> (nearly) >>>>>> full: the files created by the container are not visible, but they >>>>>> still count into amount of free space. How can we fix this? >>>>>> >>>>>> In addition there are some files which are no longer accessible >> since >>>>>> this accident: >>>>>> >>>>>> tail access.log.old >>>>>> tail: cannot open 'access.log.old' for reading: Input/output error >>>>>> >>>>>> Looks like affected by this error are files which have been >> changed >>>>>> during the accident. Is there a way to fix this too? >>>>>> >>>>>> Thanks >>>>>> ? ??? 
Mathias >>>>>> >>>>>> >>>>>> ________ >>>>>> >>>>>> >>>>>> >>>>>> Community Meeting Calendar: >>>>>> >>>>>> Schedule - >>>>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>>>>> Bridge: https://bluejeans.com/441850968 >>>>>> >>>>>> Gluster-users mailing list >>>>>> Gluster-users at gluster.org >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> ________ >>>>> >>>>> >>>>> >>>>> Community Meeting Calendar: >>>>> >>>>> Schedule - >>>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>>>> Bridge: https://bluejeans.com/441850968 >>>>> >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> ________ >>> >>> >>> >>> Community Meeting Calendar: >>> >>> Schedule - >>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>> Bridge: https://bluejeans.com/441850968 >>> >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> ________ >> >> >> >> Community Meeting Calendar: >> >> Schedule - >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> Bridge: https://bluejeans.com/441850968 >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users From gilberto.nunes32 at gmail.com Sun Aug 9 15:16:57 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Sun, 9 Aug 2020 12:16:57 -0300 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: I try it but get the error bellow ./gstatus.py -v Traceback (most recent call last): File "./gstatus.py", line 245, in main() File "./gstatus.py", line 133, in main cluster.initialise() File "/root/gstatus/gstatus/libgluster/cluster.py", line 97, in initialise self.define_nodes() File "/root/gstatus/gstatus/libgluster/cluster.py", line 170, in define_nodes 
local_ip_list = get_ipv4_addr() # Grab all IP's File "/root/gstatus/gstatus/libutils/network.py", line 130, in get_ipv4_addr namestr = names.tobytes() AttributeError: 'array.array' object has no attribute 'tobytes' --- Gilberto Nunes Ferreira (47) 3025-5907 (47) 99676-7530 - Whatsapp / Telegram Skype: gilberto.nunes36 Em s?b., 8 de ago. de 2020 ?s 18:28, Ingo Fischer escreveu: > Hey, > > I use gstatus https://github.com/gluster/gstatus > > Ingo > > Am 08.08.20 um 06:27 schrieb Gilberto Nunes: > > Hi guys... I miss some tools that could be used in order to monitor > > healing for example, and others things like resources used... What do > > you recommend? > > Tools that could be used in CLI but that shows a percentage as healing > > is under way would be nice! > > > > Thanks. > > > > --- > > Gilberto Nunes Ferreira > > > > > > > > > > ________ > > > > > > > > Community Meeting Calendar: > > > > Schedule - > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > > Bridge: https://bluejeans.com/441850968 > > > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Sun Aug 9 15:27:22 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Sun, 09 Aug 2020 18:27:22 +0300 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: You got a problem with your gstatus. How did you deploy it ? What is your gluster version ?Mine is working quite fine with 7.7 Best Regards, Strahil Nikolov ?? 9 ?????? 2020 ?. 
18:16:57 GMT+03:00, Gilberto Nunes ??????: >I try it but get the error bellow > >./gstatus.py -v > >Traceback (most recent call last): > File "./gstatus.py", line 245, in > main() > File "./gstatus.py", line 133, in main > cluster.initialise() >File "/root/gstatus/gstatus/libgluster/cluster.py", line 97, in >initialise > self.define_nodes() > File "/root/gstatus/gstatus/libgluster/cluster.py", line 170, in >define_nodes > local_ip_list = get_ipv4_addr() # Grab all IP's > File "/root/gstatus/gstatus/libutils/network.py", line 130, in >get_ipv4_addr > namestr = names.tobytes() >AttributeError: 'array.array' object has no attribute 'tobytes' > > >--- >Gilberto Nunes Ferreira > >(47) 3025-5907 >(47) 99676-7530 - Whatsapp / Telegram > >Skype: gilberto.nunes36 > > > > > >Em s?b., 8 de ago. de 2020 ?s 18:28, Ingo Fischer >escreveu: > >> Hey, >> >> I use gstatus https://github.com/gluster/gstatus >> >> Ingo >> >> Am 08.08.20 um 06:27 schrieb Gilberto Nunes: >> > Hi guys... I miss some tools that could be used in order to >monitor >> > healing for example, and others things like resources used... What >do >> > you recommend? >> > Tools that could be used in CLI but that shows a percentage as >healing >> > is under way would be nice! >> > >> > Thanks. >> > >> > --- >> > Gilberto Nunes Ferreira >> > >> > >> > >> > >> > ________ >> > >> > >> > >> > Community Meeting Calendar: >> > >> > Schedule - >> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> > Bridge: https://bluejeans.com/441850968 >> > >> > Gluster-users mailing list >> > Gluster-users at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> > >> From gilberto.nunes32 at gmail.com Sun Aug 9 17:12:35 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Sun, 9 Aug 2020 14:12:35 -0300 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: How did you deploy it ? 
- git clone, ./gstatus.py, and python gstatus.py install then gstatus What is your gluster version ? Latest stable to Debian Buster (v8) --- Gilberto Nunes Ferreira (47) 3025-5907 (47) 99676-7530 - Whatsapp / Telegram Skype: gilberto.nunes36 Em dom., 9 de ago. de 2020 ?s 12:28, Strahil Nikolov escreveu: > You got a problem with your gstatus. > How did you deploy it ? > What is your gluster version ?Mine is working quite fine with 7.7 > > Best Regards, > Strahil Nikolov > > ?? 9 ?????? 2020 ?. 18:16:57 GMT+03:00, Gilberto Nunes < > gilberto.nunes32 at gmail.com> ??????: > >I try it but get the error bellow > > > >./gstatus.py -v > > > >Traceback (most recent call last): > > File "./gstatus.py", line 245, in > > main() > > File "./gstatus.py", line 133, in main > > cluster.initialise() > >File "/root/gstatus/gstatus/libgluster/cluster.py", line 97, in > >initialise > > self.define_nodes() > > File "/root/gstatus/gstatus/libgluster/cluster.py", line 170, in > >define_nodes > > local_ip_list = get_ipv4_addr() # Grab all IP's > > File "/root/gstatus/gstatus/libutils/network.py", line 130, in > >get_ipv4_addr > > namestr = names.tobytes() > >AttributeError: 'array.array' object has no attribute 'tobytes' > > > > > >--- > >Gilberto Nunes Ferreira > > > >(47) 3025-5907 > >(47) 99676-7530 - Whatsapp / Telegram > > > >Skype: gilberto.nunes36 > > > > > > > > > > > >Em s?b., 8 de ago. de 2020 ?s 18:28, Ingo Fischer > >escreveu: > > > >> Hey, > >> > >> I use gstatus https://github.com/gluster/gstatus > >> > >> Ingo > >> > >> Am 08.08.20 um 06:27 schrieb Gilberto Nunes: > >> > Hi guys... I miss some tools that could be used in order to > >monitor > >> > healing for example, and others things like resources used... What > >do > >> > you recommend? > >> > Tools that could be used in CLI but that shows a percentage as > >healing > >> > is under way would be nice! > >> > > >> > Thanks. 
> >> > > >> > --- > >> > Gilberto Nunes Ferreira > >> > > >> > > >> > > >> > > >> > ________ > >> > > >> > > >> > > >> > Community Meeting Calendar: > >> > > >> > Schedule - > >> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > >> > Bridge: https://bluejeans.com/441850968 > >> > > >> > Gluster-users mailing list > >> > Gluster-users at gluster.org > >> > https://lists.gluster.org/mailman/listinfo/gluster-users > >> > > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Sun Aug 9 18:37:35 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Sun, 9 Aug 2020 18:37:35 +0000 (UTC) Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: <292755878.1582157.1596998255554@mail.yahoo.com> Hi Gilberto, I just tested latest master branch on CentOS8 -> total failure (both python2 and python3). I have opened a github issue at?https://github.com/gluster/gstatus/issues/30? I guess, you can add details for OS, packages and gluster version (maybe a traceback also). The more interesing part is that the old zip I got is working on CentOS8 (with python2.7) without issues: [root at glustera gstatus-master]# python2 gstatus.py ? ? ????Product: Community ?????????Capacity: ?22.00 GiB(raw bricks) ?????Status: HEALTHY ?????????????????????520.00 MiB(raw used) ??Glusterfs: 8.0 ???????????????????????????6.00 GiB(usable from volumes) ?OverCommit: No ???????????????Snapshots: ??0 If you wish, I can try to upload it somewhere for you. Best Regards, Strahil Nikolov ? ??????, 9 ?????? 2020 ?., 20:13:13 ???????+3, Gilberto Nunes ??????: How did you deploy it ? - git clone, ./gstatus.py, and python gstatus.py install then gstatus What is your gluster version ? Latest?stable to Debian Buster (v8) --- Gilberto Nunes Ferreira (47) 3025-5907 (47) 99676-7530 - Whatsapp / Telegram Skype: gilberto.nunes36 Em dom., 9 de ago. 
de 2020 ?s 12:28, Strahil Nikolov escreveu: > You got a problem with your gstatus. > How did you deploy it ? > What is your gluster version ?Mine is working quite fine with 7.7 > > Best Regards, > Strahil Nikolov > > ?? 9 ?????? 2020 ?. 18:16:57 GMT+03:00, Gilberto Nunes ??????: >>I try it but get the error bellow >> >>./gstatus.py -v >> >>Traceback (most recent call last): >>? File "./gstatus.py", line 245, in >>? ? main() >>? File "./gstatus.py", line 133, in main >>? ? cluster.initialise() >>File "/root/gstatus/gstatus/libgluster/cluster.py", line 97, in >>initialise >>? ? self.define_nodes() >>? File "/root/gstatus/gstatus/libgluster/cluster.py", line 170, in >>define_nodes >>? ? local_ip_list = get_ipv4_addr()? # Grab all IP's >>? File "/root/gstatus/gstatus/libutils/network.py", line 130, in >>get_ipv4_addr >>? ? namestr = names.tobytes() >>AttributeError: 'array.array' object has no attribute 'tobytes' >> >> >>--- >>Gilberto Nunes Ferreira >> >>(47) 3025-5907 >>(47) 99676-7530 - Whatsapp / Telegram >> >>Skype: gilberto.nunes36 >> >> >> >> >> >>Em s?b., 8 de ago. de 2020 ?s 18:28, Ingo Fischer >>escreveu: >> >>> Hey, >>> >>> I use gstatus https://github.com/gluster/gstatus >>> >>> Ingo >>> >>> Am 08.08.20 um 06:27 schrieb Gilberto Nunes: >>> > Hi guys...? I miss some tools that could be used in order to >>monitor >>> > healing for example, and others things like resources used...? What >>do >>> > you recommend? >>> > Tools that could be used in CLI but that shows a percentage as >>healing >>> > is under way would be nice! >>> > >>> > Thanks. 
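[Editorial aside: the AttributeError quoted above is a Python 2/3 incompatibility rather than a gluster problem: array.array.tobytes() only exists on Python 3.2+, while Python 2 spells the same operation tostring(). A minimal compatibility shim - purely illustrative, not a patch to gstatus itself - would be:]

```python
import array

def array_to_bytes(arr):
    # tobytes() was added in Python 3.2; Python 2's array.array
    # only offers the equivalent tostring().
    if hasattr(arr, "tobytes"):
        return arr.tobytes()
    return arr.tostring()

names = array.array("B", [104, 105])
print(array_to_bytes(names))  # -> b'hi' on Python 3
```

[Running gstatus under a matching Python version, or using the 1.0.0 release announced later in this thread (which requires Python >= 3.6), sidesteps the mismatch.]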
>>> > >>> > --- >>> > Gilberto Nunes Ferreira >>> > >>> > >>> > >>> > >>> > ________ >>> > >>> > >>> > >>> > Community Meeting Calendar: >>> > >>> > Schedule - >>> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>> > Bridge: https://bluejeans.com/441850968 >>> > >>> > Gluster-users mailing list >>> > Gluster-users at gluster.org >>> > https://lists.gluster.org/mailman/listinfo/gluster-users >>> > >>> > From rkothiya at redhat.com Mon Aug 10 19:17:40 2020 From: rkothiya at redhat.com (Rinku Kothiya) Date: Tue, 11 Aug 2020 00:47:40 +0530 Subject: [Gluster-users] [Gluster-devel] Announcing Gluster release 6.10 Message-ID: Hi, The Gluster community is pleased to announce the release of Gluster 6.10 (packages available at [1]). Release notes for the release can be found at [2]. This is the last minor release of 6. Users are highly encouraged to upgrade to newer releases of GlusterFS. Please Note: Some of the packages are unavailable and we are working on it. We will release them soon. Thanks, Gluster community References: [1] Packages for 6.10: https://download.gluster.org/pub/gluster/glusterfs/6/6.10/ [2] Release notes for 6.10: https://docs.gluster.org/en/latest/release-notes/6.10/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dm at belkam.com Wed Aug 12 07:00:52 2020 From: dm at belkam.com (Dmitry Melekhov) Date: Wed, 12 Aug 2020 11:00:52 +0400 Subject: [Gluster-users] gluster over vdo, problem with gfapi Message-ID: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> Hello! We are testing gluster 8 on centos 8.2 and we try to use a volume created over vdo. This is a 2-node setup. There is lvm created over vdo, and an xfs filesystem. The test vm runs just fine if we run the vm over fuse: [disk XML stripped by the list archive] /root/pool/ is the fuse mount. but if we try to run: [disk XML stripped by the list archive] then the vm boot dies, qemu says - no bootable device. It works without cache='directsync' though. But live migration does not work. 
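[Editorial aside: the disk definitions in the mail above were stripped by the archive's HTML scrubber. For context, a libvirt disk that goes through libgfapi instead of a FUSE mount typically looks like the sketch below; the host address, volume name and image path are illustrative assumptions, not values recovered from the mail:]

```xml
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='directsync'/>
  <source protocol='gluster' name='pool/test.img'>
    <host name='192.168.1.x' port='24007'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```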
btw, everything works OK if we run the VM on a gluster volume without vdo... Any ideas what can cause this and how it can be fixed? Thank you! From dm at belkam.com Wed Aug 12 07:14:04 2020 From: dm at belkam.com (dm) Date: Wed, 12 Aug 2020 11:14:04 +0400 Subject: [Gluster-users] gluster over vdo, problem with gfapi In-Reply-To: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> Message-ID: btw, part of the brick log: [2020-08-12 07:08:32.646082] I [MSGID: 115029] [server-handshake.c:561:server_setvolume] 0-pool-server: accepted client from CTX_ID:9eea4bec-a522-4a29-be83-5d66c04ce6ee-GRAPH_ID:0-PID:7652-HOST:nabu-PC_NAME:pool-client-2-RECON_NO:-0 (version: 8.0) with subvol /wall/pool/brick [2020-08-12 07:08:32.669522] E [MSGID: 113040] [posix-inode-fd-ops.c:1727:posix_readv] 0-pool-posix: read failed on gfid=231fbad6-8d8d-4555-8137-2362a06fc140, fd=0x7f342800ca38, offset=0 size=512, buf=0x7f345450f000 [Invalid argument] [2020-08-12 07:08:32.669565] E [MSGID: 115068] [server-rpc-fops_v2.c:1374:server4_readv_cbk] 0-pool-server: READ info [{frame=34505}, {READV_fd_no=0}, {uuid_utoa=231fbad6-8d8d-4555-8137-2362a06fc140}, {client=CTX_ID:9eea4bec-a522-4a29-be83-5d66c04ce6ee-GRAPH_ID:0-PID:7652-HOST:nabu-PC_NAME:pool-client-2-RECON_NO:-0}, {error-xlator=pool-posix}, {errno=22}, {error=Invalid argument}] [2020-08-12 07:08:33.241625] E [MSGID: 113040] [posix-inode-fd-ops.c:1727:posix_readv] 0-pool-posix: read failed on gfid=231fbad6-8d8d-4555-8137-2362a06fc140, fd=0x7f342800ca38, offset=0 size=512, buf=0x7f345450f000 [Invalid argument] [2020-08-12 07:08:33.241669] E [MSGID: 115068] [server-rpc-fops_v2.c:1374:server4_readv_cbk] 0-pool-server: READ info [{frame=34507}, {READV_fd_no=0}, {uuid_utoa=231fbad6-8d8d-4555-8137-2362a06fc140}, {client=CTX_ID:9eea4bec-a522-4a29-be83-5d66c04ce6ee-GRAPH_ID:0-PID:7652-HOST:nabu-PC_NAME:pool-client-2-RECON_NO:-0}, {error-xlator=pool-posix}, {errno=22}, {error=Invalid argument}] 
[2020-08-12 07:09:45.897326] W [socket.c:767:__socket_rwv] 0-tcp.pool-server: readv on 192.168.222.25:49081 failed (No data available) [2020-08-12 07:09:45.897357] I [MSGID: 115036] [server.c:498:server_rpc_notify] 0-pool-server: disconnecting connection [{client-uid=CTX_ID:9eea4bec-a522-4a29-be83-5d66c04ce6ee-GRAPH_ID:0 -PID:7652-HOST:nabu-PC_NAME:pool-client-2-RECON_NO:-0}] Thank you! 12.08.2020 11:00, Dmitry Melekhov ?????: > Hello! > > > We are testing gluster 8 on centos 8.2 and we try to use volume > created over vdo. > > This is 2 nodes setup. > > There is lvm created over vdo, and xfs filesystem. > > > Test vm runs just fine if? we run vm over fuse: > > > ? > ???? > ???? > ???? > > > /root/pool/ is fuse mount. > > > but if we try to run: > > > ? > ???? > ???? > ?????? > ???? > ???? > ?? > > > then vm boot dies, qemu says- no bootable device. > > > It works without cache='directsync' though. > > But live migration does not work. > > > btw, everything work OK if we run VM on gluster volume without vdo... > > Any ideas what can cause this and how it can be fixed? > > > Thank you! > From dm at belkam.com Wed Aug 12 07:39:44 2020 From: dm at belkam.com (dm) Date: Wed, 12 Aug 2020 11:39:44 +0400 Subject: [Gluster-users] gluster over vdo, problem with gfapi In-Reply-To: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> Message-ID: Some more info, really we have lvm over lvm here: lvm-vdo-lvm... Thank you! 12.08.2020 11:00, Dmitry Melekhov ?????: > Hello! > > > We are testing gluster 8 on centos 8.2 and we try to use volume > created over vdo. > > This is 2 nodes setup. > > There is lvm created over vdo, and xfs filesystem. > > > Test vm runs just fine if? we run vm over fuse: > > > ? > ???? > ???? > ???? > > > /root/pool/ is fuse mount. > > > but if we try to run: > > > ? > ???? > ???? > ?????? > ???? > ???? > ?? > > > then vm boot dies, qemu says- no bootable device. 
> > > It works without cache='directsync' though. > > But live migration does not work. > > > btw, everything work OK if we run VM on gluster volume without vdo... > > Any ideas what can cause this and how it can be fixed? > > > Thank you! > From dm at belkam.com Wed Aug 12 07:46:59 2020 From: dm at belkam.com (dm) Date: Wed, 12 Aug 2020 11:46:59 +0400 Subject: [Gluster-users] gluster over vdo, problem with gfapi In-Reply-To: References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> Message-ID: <0da4807e-b342-ee5b-d43c-5da882c05315@belkam.com> 12.08.2020 11:39, dm ?????: > Some more info, really we have lvm over lvm here: > > lvm-vdo-lvm... > > Thank you! > Sorry, this is wrong, I forgot we replaced this, vdo now is over physical drive... So, only one lvm layer here. > > 12.08.2020 11:00, Dmitry Melekhov ?????: >> Hello! >> >> >> We are testing gluster 8 on centos 8.2 and we try to use volume >> created over vdo. >> >> This is 2 nodes setup. >> >> There is lvm created over vdo, and xfs filesystem. >> >> >> Test vm runs just fine if? we run vm over fuse: >> >> >> ? >> ???? >> ???? >> ???? >> >> >> /root/pool/ is fuse mount. >> >> >> but if we try to run: >> >> >> ? >> ???? >> ???? >> ?????? >> ???? >> ???? >> ?? >> >> >> then vm boot dies, qemu says- no bootable device. >> >> >> It works without cache='directsync' though. >> >> But live migration does not work. >> >> >> btw, everything work OK if we run VM on gluster volume without vdo... >> >> Any ideas what can cause this and how it can be fixed? >> >> >> Thank you! 
>> > From amar at kadalu.io Wed Aug 12 08:55:00 2020 From: amar at kadalu.io (Amar Tumballi) Date: Wed, 12 Aug 2020 14:25:00 +0530 Subject: [Gluster-users] gluster over vdo, problem with gfapi In-Reply-To: <0da4807e-b342-ee5b-d43c-5da882c05315@belkam.com> References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> <0da4807e-b342-ee5b-d43c-5da882c05315@belkam.com> Message-ID: Hi Dimitry, Was this working earlier and now failing on Version 8 or is this a new setup which you did first time? -Amar On Wed, Aug 12, 2020 at 1:17 PM dm wrote: > 12.08.2020 11:39, dm ?????: > > Some more info, really we have lvm over lvm here: > > > > lvm-vdo-lvm... > > > > Thank you! > > > > Sorry, this is wrong, I forgot we replaced this, > > vdo now is over physical drive... > > So, only one lvm layer here. > > > > > 12.08.2020 11:00, Dmitry Melekhov ?????: > >> Hello! > >> > >> > >> We are testing gluster 8 on centos 8.2 and we try to use volume > >> created over vdo. > >> > >> This is 2 nodes setup. > >> > >> There is lvm created over vdo, and xfs filesystem. > >> > >> > >> Test vm runs just fine if we run vm over fuse: > >> > >> > >> > >> > >> > >> > >> > >> > >> /root/pool/ is fuse mount. > >> > >> > >> but if we try to run: > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> then vm boot dies, qemu says- no bootable device. > >> > >> > >> It works without cache='directsync' though. > >> > >> But live migration does not work. > >> > >> > >> btw, everything work OK if we run VM on gluster volume without vdo... > >> > >> Any ideas what can cause this and how it can be fixed? > >> > >> > >> Thank you! > >> > > > > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -- -- https://kadalu.io Container Storage made easy! 
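[Editorial aside: one plausible reading of that errno=22, not confirmed anywhere in this thread: the failing request in the brick log is a 512-byte O_DIRECT read at offset 0, and qemu's cache='none'/'directsync' modes use O_DIRECT, whose offsets and sizes must be multiples of the device's logical block size. A VDO device presents 4096-byte logical blocks unless 512-byte emulation was enabled at creation time, so a 512-byte direct read would be rejected with EINVAL. The arithmetic is trivial to sketch:]

```shell
# Toy alignment check mirroring the failing request from the brick log:
# offset=0 size=512 against a 4096-byte logical block size.
offset=0 size=512 lbs=4096
if [ $((offset % lbs)) -eq 0 ] && [ $((size % lbs)) -eq 0 ]; then
  echo "aligned"
else
  echo "unaligned -> O_DIRECT fails with EINVAL (errno=22)"
fi
```

[On a real setup, `blockdev --getss /dev/mapper/<vdo-device>` would show the logical sector size under the brick filesystem; the device name here is hypothetical.]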
-------------- next part -------------- An HTML attachment was scrubbed... URL: From dm at belkam.com Wed Aug 12 09:00:19 2020 From: dm at belkam.com (Dmitry Melekhov) Date: Wed, 12 Aug 2020 13:00:19 +0400 Subject: [Gluster-users] gluster over vdo, problem with gfapi In-Reply-To: References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> <0da4807e-b342-ee5b-d43c-5da882c05315@belkam.com> Message-ID: 12.08.2020 12:55, Amar Tumballi ?????: > Hi Dimitry, > > Was this working earlier and now failing on Version 8 or is this a new > setup which you did first time? > Hello! This is first time we? are testing gluster over vdo. Thank you! From sasundar at redhat.com Wed Aug 12 13:34:26 2020 From: sasundar at redhat.com (Satheesaran Sundaramoorthi) Date: Wed, 12 Aug 2020 19:04:26 +0530 Subject: [Gluster-users] gluster over vdo, problem with gfapi In-Reply-To: References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> <0da4807e-b342-ee5b-d43c-5da882c05315@belkam.com> Message-ID: On Wed, Aug 12, 2020 at 2:30 PM Dmitry Melekhov wrote: > 12.08.2020 12:55, Amar Tumballi ?????: > > Hi Dimitry, > > > > Was this working earlier and now failing on Version 8 or is this a new > > setup which you did first time? > > > Hello! > > > This is first time we are testing gluster over vdo. > > Thank you! > > > Hello Dmitry, I have been testing the RHEL downstream variant of gluster with RHEL 8.2, where VMs are created with their images on fuse mounted gluster volume with VDO. This worked good. But I see you are using 'gfapi', so that could be different. Though I don't have valuable inputs to help you, do you see 'gfapi' good enough than using fuse mounted volume -- Satheesaran S -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hunter86_bg at yahoo.com Wed Aug 12 13:50:21 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Wed, 12 Aug 2020 16:50:21 +0300 Subject: [Gluster-users] gluster over vdo, problem with gfapi In-Reply-To: References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> <0da4807e-b342-ee5b-d43c-5da882c05315@belkam.com> Message-ID: <637AF5D4-E2FE-435B-AD2E-B3295DC7E719@yahoo.com> Libgfapi brings far better performance , but qemu has some limitations. If it works on FUSE , but not on libgfapi -> it seems obvious. Have you tried to connect from C7 to the Gluster TSP via libgfapi. Also, is SELINUX in enforcing or not ? Best Regards, Strahil Nikolov ?? 12 ?????? 2020 ?. 16:34:26 GMT+03:00, Satheesaran Sundaramoorthi ??????: >On Wed, Aug 12, 2020 at 2:30 PM Dmitry Melekhov wrote: > >> 12.08.2020 12:55, Amar Tumballi ?????: >> > Hi Dimitry, >> > >> > Was this working earlier and now failing on Version 8 or is this a >new >> > setup which you did first time? >> > >> Hello! >> >> >> This is first time we are testing gluster over vdo. >> >> Thank you! >> >> >> Hello Dmitry, > >I have been testing the RHEL downstream variant of gluster with RHEL >8.2, >where VMs are created with their images on fuse mounted gluster volume >with >VDO. >This worked good. > >But I see you are using 'gfapi', so that could be different. 
>Though I don't have valuable inputs to help you, do you see 'gfapi' >good >enough than using fuse mounted volume > >-- Satheesaran S From dm at belkam.com Wed Aug 12 15:03:29 2020 From: dm at belkam.com (Dmitry Melekhov) Date: Wed, 12 Aug 2020 19:03:29 +0400 Subject: [Gluster-users] gluster over vdo, problem with gfapi In-Reply-To: <637AF5D4-E2FE-435B-AD2E-B3295DC7E719@yahoo.com> References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> <0da4807e-b342-ee5b-d43c-5da882c05315@belkam.com> <637AF5D4-E2FE-435B-AD2E-B3295DC7E719@yahoo.com> Message-ID: 12.08.2020 17:50, Strahil Nikolov ?????: > Libgfapi brings far better performance , Yes, and several vms do not rely on the same mount point... > but qemu has some limitations. > > > If it works on FUSE , but not on libgfapi -> it seems obvious. Not obvious for me, we tested vdo locally, i.e. without gluster and qemu works with cache=none or cache=directsync without problems, so problem is somewhere in gluster. > > Have you tried to connect from C7 to the Gluster TSP via libgfapi. No, but we tested the same setup with gluster 7 with the same result before we upgraded to 8. > > Also, is SELINUX in enforcing or not ? selinux is disabled... Thank you! > > Best Regards, > Strahil Nikolov > > ?? 12 ?????? 2020 ?. 16:34:26 GMT+03:00, Satheesaran Sundaramoorthi ??????: >> On Wed, Aug 12, 2020 at 2:30 PM Dmitry Melekhov wrote: >> >>> 12.08.2020 12:55, Amar Tumballi ?????: >>>> Hi Dimitry, >>>> >>>> Was this working earlier and now failing on Version 8 or is this a >> new >>>> setup which you did first time? >>>> >>> Hello! >>> >>> >>> This is first time we are testing gluster over vdo. >>> >>> Thank you! >>> >>> >>> Hello Dmitry, >> I have been testing the RHEL downstream variant of gluster with RHEL >> 8.2, >> where VMs are created with their images on fuse mounted gluster volume >> with >> VDO. >> This worked good. >> >> But I see you are using 'gfapi', so that could be different. 
>> Though I don't have valuable inputs to help you, do you see 'gfapi' >> good >> enough than using fuse mounted volume We think that gfapi is better for 2 reasons: 1. it is faster; 2. each qemu process connects to gluster cluster , so there is no one point of failure- fuse mount... Thank you! >> >> -- Satheesaran S From sacchi at kadalu.io Wed Aug 12 16:52:19 2020 From: sacchi at kadalu.io (Sachidananda Urs) Date: Wed, 12 Aug 2020 22:22:19 +0530 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: On Sun, Aug 9, 2020 at 10:43 PM Gilberto Nunes wrote: > How did you deploy it ? - git clone, ./gstatus.py, and python gstatus.py > install then gstatus > > What is your gluster version ? Latest stable to Debian Buster (v8) > > > Hello Gilberto. I just made a 1.0.0 release. gstatus binary is available to download from (requires python >= 3.6) https://github.com/gluster/gstatus/releases/tag/v1.0.0 You can find the complete documentation here: https://github.com/gluster/gstatus/blob/master/README Follow the below steps for a quick method to test it out: # curl -LO https://github.com/gluster/gstatus/releases/download/v1.0.0/gstatus # chmod +x gstatus # ./gstatus -a # ./gstatus --help If you like what you see. You can move it to /usr/local/bin Would like to hear your feedback. Any feature requests/bugs/PRs are welcome. -sac -------------- next part -------------- An HTML attachment was scrubbed... URL: From gilberto.nunes32 at gmail.com Wed Aug 12 17:01:45 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Wed, 12 Aug 2020 14:01:45 -0300 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: It's work! 
./gstatus -a Cluster: Status: Healthy GlusterFS: 8.0 Nodes: 2/2 Volumes: 1/1 Volumes: VMS Replicate Started (UP) - 2/2 Bricks Up Capacity: (28.41% used) 265.00 GiB/931.00 GiB (used/total) Bricks: Distribute Group 1: glusterfs01:/DATA/vms (Online) glusterfs02:/DATA/vms (Online) Awesome, thanks! --- Gilberto Nunes Ferreira (47) 3025-5907 (47) 99676-7530 - Whatsapp / Telegram Skype: gilberto.nunes36 Em qua., 12 de ago. de 2020 ?s 13:52, Sachidananda Urs escreveu: > > > On Sun, Aug 9, 2020 at 10:43 PM Gilberto Nunes > wrote: > >> How did you deploy it ? - git clone, ./gstatus.py, and python gstatus.py >> install then gstatus >> >> What is your gluster version ? Latest stable to Debian Buster (v8) >> >> >> > Hello Gilberto. I just made a 1.0.0 release. > gstatus binary is available to download from (requires python >= 3.6) > https://github.com/gluster/gstatus/releases/tag/v1.0.0 > > You can find the complete documentation here: > https://github.com/gluster/gstatus/blob/master/README > > Follow the below steps for a quick method to test it out: > > # curl -LO > https://github.com/gluster/gstatus/releases/download/v1.0.0/gstatus > > # chmod +x gstatus > > # ./gstatus -a > # ./gstatus --help > > If you like what you see. You can move it to /usr/local/bin > > Would like to hear your feedback. Any feature requests/bugs/PRs are > welcome. > > -sac > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Wed Aug 12 19:25:05 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Wed, 12 Aug 2020 22:25:05 +0300 Subject: [Gluster-users] gluster over vdo, problem with gfapi In-Reply-To: References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> <0da4807e-b342-ee5b-d43c-5da882c05315@belkam.com> <637AF5D4-E2FE-435B-AD2E-B3295DC7E719@yahoo.com> Message-ID: <27761A0B-DF92-4CFC-8F69-8125928856F1@yahoo.com> I am not sure that it is ok to use any caching (at least ovirt doesn't uses) . 
Have you set the 'virt' group of settings ? They seem to be optimal , but keep in mind that if you enable them -> you will enable sharding which cannot be 'disabled' afterwards. The fact that it works on C7 is strange, with wifh version of gluster did you test. Best Regards, Strahil Nikolov ?? 12 ?????? 2020 ?. 18:03:29 GMT+03:00, Dmitry Melekhov ??????: > >12.08.2020 17:50, Strahil Nikolov ?????: >> Libgfapi brings far better performance , > >Yes, and several vms do not rely on the same mount point... > > >> but qemu has some limitations. >> >> >> If it works on FUSE , but not on libgfapi -> it seems obvious. > > >Not obvious for me, we tested vdo locally, i.e. without gluster and >qemu >works with cache=none or cache=directsync without problems, > >so problem is somewhere in gluster. > >> >> Have you tried to connect from C7 to the Gluster TSP via libgfapi. >No, but we tested the same setup with gluster 7 with the same result >before we upgraded to 8. >> >> Also, is SELINUX in enforcing or not ? > >selinux is disabled... > > >Thank you! > >> >> Best Regards, >> Strahil Nikolov >> >> ?? 12 ?????? 2020 ?. 16:34:26 GMT+03:00, Satheesaran Sundaramoorthi > ??????: >>> On Wed, Aug 12, 2020 at 2:30 PM Dmitry Melekhov >wrote: >>> >>>> 12.08.2020 12:55, Amar Tumballi ?????: >>>>> Hi Dimitry, >>>>> >>>>> Was this working earlier and now failing on Version 8 or is this a >>> new >>>>> setup which you did first time? >>>>> >>>> Hello! >>>> >>>> >>>> This is first time we are testing gluster over vdo. >>>> >>>> Thank you! >>>> >>>> >>>> Hello Dmitry, >>> I have been testing the RHEL downstream variant of gluster with RHEL >>> 8.2, >>> where VMs are created with their images on fuse mounted gluster >volume >>> with >>> VDO. >>> This worked good. >>> >>> But I see you are using 'gfapi', so that could be different. 
>>> Though I don't have valuable inputs to help you, do you see 'gfapi' >>> good >>> enough than using fuse mounted volume > > >We think that gfapi is better for 2 reasons: > >1. it is faster; > >2. each qemu process connects to gluster cluster , so there is no one >point of failure- fuse mount... > > >Thank you! > >>> >>> -- Satheesaran S From hunter86_bg at yahoo.com Wed Aug 12 19:28:07 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Wed, 12 Aug 2020 22:28:07 +0300 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: <54415920-F5A1-43BD-B55B-1E25B9C9C069@yahoo.com> I couldn't make it work on C8... Maybe I was cloning the wrong branch. Details can be found at https://github.com/gluster/gstatus/issues/30#issuecomment-673041743 Best Regards, Strahil Nikolov ?? 12 ?????? 2020 ?. 20:01:45 GMT+03:00, Gilberto Nunes ??????: >It's work! >./gstatus -a > >Cluster: > Status: Healthy GlusterFS: 8.0 > Nodes: 2/2 Volumes: 1/1 > >Volumes: > VMS Replicate Started (UP) - 2/2 >Bricks Up > Capacity: (28.41% >used) 265.00 GiB/931.00 GiB (used/total) > Bricks: > Distribute Group 1: > >glusterfs01:/DATA/vms > (Online) > >glusterfs02:/DATA/vms > (Online) > > > >Awesome, thanks! >--- >Gilberto Nunes Ferreira > >(47) 3025-5907 >(47) 99676-7530 - Whatsapp / Telegram > >Skype: gilberto.nunes36 > > > > > >Em qua., 12 de ago. de 2020 ?s 13:52, Sachidananda Urs > >escreveu: > >> >> >> On Sun, Aug 9, 2020 at 10:43 PM Gilberto Nunes > >> wrote: >> >>> How did you deploy it ? - git clone, ./gstatus.py, and python >gstatus.py >>> install then gstatus >>> >>> What is your gluster version ? Latest stable to Debian Buster (v8) >>> >>> >>> >> Hello Gilberto. I just made a 1.0.0 release. 
>> gstatus binary is available to download from (requires python >= 3.6) >> https://github.com/gluster/gstatus/releases/tag/v1.0.0 >> >> You can find the complete documentation here: >> https://github.com/gluster/gstatus/blob/master/README >> >> Follow the below steps for a quick method to test it out: >> >> # curl -LO >> https://github.com/gluster/gstatus/releases/download/v1.0.0/gstatus >> >> # chmod +x gstatus >> >> # ./gstatus -a >> # ./gstatus --help >> >> If you like what you see. You can move it to /usr/local/bin >> >> Would like to hear your feedback. Any feature requests/bugs/PRs are >> welcome. >> >> -sac >> From archon810 at gmail.com Wed Aug 12 22:03:29 2020 From: archon810 at gmail.com (Artem Russakovskii) Date: Wed, 12 Aug 2020 15:03:29 -0700 Subject: [Gluster-users] Pending healing... In-Reply-To: References: Message-ID: Remove the "summary" part of the command, which should list the exact file pending heal. Then launch the heal manually. If it still doesn't heal, try running md5sum on the file and see if it heals after that. Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | @ArtemR On Fri, Aug 7, 2020 at 11:03 AM Gilberto Nunes wrote: > Hi > > I have a pending entry like this > > gluster vol heal VMS info summary > Brick glusterfs01:/DATA/vms > Status: Connected > Total Number of entries: 1 > Number of entries in heal pending: 1 > Number of entries in split-brain: 0 > Number of entries possibly healing: 0 > > Brick glusterfs02:/DATA/vms > Status: Connected > Total Number of entries: 1 > Number of entries in heal pending: 1 > Number of entries in split-brain: 0 > Number of entries possibly healing: 0 > > How can I solve this? > Should I follow this? 
> > > https://icicimov.github.io/blog/high-availability/GlusterFS-metadata-split-brain-recovery/ > > > > --- > Gilberto Nunes Ferreira > > > > > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From archon810 at gmail.com Wed Aug 12 22:08:14 2020 From: archon810 at gmail.com (Artem Russakovskii) Date: Wed, 12 Aug 2020 15:08:14 -0700 Subject: [Gluster-users] performance In-Reply-To: <40dab403-411b-aa6a-83e5-7a45021ac866@computerisms.ca> References: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca> <7991483E-5365-4C87-89FA-C871AED18062@yahoo.com> <345b06c4-5996-9aa3-f846-0944c60ee398@computerisms.ca> <2CD68ED2-199F-407D-B0CC-385793BA16FD@yahoo.com> <64ee1b88-42d6-75d2-05ff-4703d168cc25@computerisms.ca> <40dab403-411b-aa6a-83e5-7a45021ac866@computerisms.ca> Message-ID: Hmm, in our case of running gluster across Linode block storage (which itself runs inside Ceph, as I found out), the only thing that helped with the hangs so far was defragmenting xfs. I tried changing many things, including the scheduler to "none" and this performance.write-behind-window-size setting, and nothing seemed to help or provide any meaningful difference. Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | @ArtemR On Fri, Aug 7, 2020 at 11:28 AM Computerisms Corporation < bob at computerisms.ca> wrote: > Hi Artem and others, > > Happy to report the system has been relatively stable for the remainder > of the week. I have one wordpress site that seems to get hung processes > when someone logs in with an incorrect password. 
Since it is only one, > and reliably reproduceable, I am not sure if the issue is to do with > Gluster or Wordpress itself, but afaik it was not doing it some months > back before the system was using Gluster so I am guessing some combo of > both. > > Regardless, that is the one and only time apache processes stacked up to > over 150, and that still only brought the load average up to just under > 25; the system did go a bit sluggish, but remained fairly responsive > throughout until I restarted apache. Otherwise 15 minute load average > consistently runs between 8 and 11 during peak hours and between 4 and 7 > during off hours, and other than the one time I have not seen the > one-minute load average go over 15. all resources still spike to full > capacity from time to time, but it never remains that way for long like > it did before. > > For site responsiveness, first visit to any given site is quite slow, > like 3-5 seconds on straight html pages, 10-15 seconds for some of the > more bloated WP themes, but clicking links within the site after the > first page is loaded is relatively quick, like 1 second on straight html > pages, and ~5-6 seconds on the bloated themes. Again, not sure if that > is a Gluster related thing or something else. > > So, still holding my breath a bit, but seems this solution is working, > at least for me. I haven't played with any of the other settings yet to > see if I can improve it further, probably will next week. thinking to > increase the write behind window size further to see what happens, as > well as play with the settings suggested by Strahil. > > On 2020-08-05 5:28 p.m., Artem Russakovskii wrote: > > I'm very curious whether these improvements hold up over the next few > > days. Please report back. 
> > > > Sincerely, > > Artem > > > > -- > > Founder, Android Police , APK Mirror > > , Illogical Robot LLC > > beerpla.net | @ArtemR > > > > > > On Wed, Aug 5, 2020 at 9:44 AM Computerisms Corporation > > > wrote: > > > > Hi List, > > > > > So, we just moved into a quieter time of the day, but maybe I just > > > stumbled onto something. I was trying to figure out if/how I > could > > > throw more RAM at the problem. gluster docs says write behind is > > not a > > > cache unless flush-behind is on. So seems that is a way to throw > > ram to > > > it? I put performance.write-behind-window-size: 512MB and > > > performance.flush-behind: on and the whole system calmed down > pretty > > > much immediately. could be just timing, though, will have to see > > > tomorrow during business hours whether the system stays at a > > reasonable > > > load. > > > > so reporting back that this seems to have definitely had a > significant > > positive effect. > > > > So far today I have not seen the load average climb over 13 with the > > 15minute average hovering around 7. cpus are still spiking from > > time to > > time, but they are not staying maxed out all the time, and > frequently I > > am seeing brief periods of up to 80% idle. glusterfs process still > > spiking up to 180% or so, but consistently running around 70%, and > the > > brick processes still spiking up to 70-80%, but consistently running > > around 20%. Disk has only been above 50% in atop once so far today > > when > > it spiked up to 92%, and still lots of RAM left over. So far nload > > even > > seems indicates I could get away with a 100Mbit network connection. > > Websites are snappy relative to what they were, still a bit sluggish > on > > the first page of any given site, but tolerable or close to. Apache > > processes are opening and closing right away, instead of stacking up. > > > > Overall, system is performing pretty much like I would expect it to > > without gluster. 
I haven't played with any of the other settings > yet, > > just going to leave it like this for a day. > > > > I have to admit I am a little bit suspicious. I have been arguing > with > > Gluster for a very long time, and I have never known it to play this > > nice. kind feels like when your girl tells you she is "fine"; > > conversation has stopped, but you aren't really sure if it's done... > > > > > > > > I will still test the other options you suggested tonight, > > though, this > > > is probably too good to be true. > > > > > > Can't thank you enough for your input, Strahil, your help is truly > > > appreciated! > > > > > > > > > > > > > > > > > > > > >> > > >>>> > > >>>> > > >>>> Best Regards, > > >>>> Strahil Nikolov > > >>>> > > >>> ________ > > >>> > > >>> > > >>> > > >>> Community Meeting Calendar: > > >>> > > >>> Schedule - > > >>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > > >>> Bridge: https://bluejeans.com/441850968 > > >>> > > >>> Gluster-users mailing list > > >>> Gluster-users at gluster.org > > >>> https://lists.gluster.org/mailman/listinfo/gluster-users > > > ________ > > > > > > > > > > > > Community Meeting Calendar: > > > > > > Schedule - > > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > > > Bridge: https://bluejeans.com/441850968 > > > > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > ________ > > > > > > > > Community Meeting Calendar: > > > > Schedule - > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > > Bridge: https://bluejeans.com/441850968 > > > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > -------------- next part -------------- An HTML attachment was scrubbed... 
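[Editorial note] The write-behind settings discussed in this thread can be applied per volume from the gluster CLI. A minimal sketch, with the volume name `gv0` as a placeholder; the 512MB window and flush-behind are the values Computerisms reported using, not a general recommendation:

```shell
# With flush-behind on, write-behind acts as a real buffer: writes are
# acknowledged to the client and flushed to the bricks asynchronously
gluster volume set gv0 performance.write-behind-window-size 512MB
gluster volume set gv0 performance.flush-behind on

# Confirm the options took effect
gluster volume get gv0 performance.write-behind-window-size
gluster volume get gv0 performance.flush-behind
```

Note that flush-behind trades durability for latency: data still buffered on the client side can be lost if the client crashes before the flush completes.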
URL: From gilberto.nunes32 at gmail.com Wed Aug 12 22:20:10 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Wed, 12 Aug 2020 19:20:10 -0300 Subject: [Gluster-users] Pending healing... In-Reply-To: References: Message-ID: Well after a couple of hours the healing process was completed.... Thanks anyway. Em qua, 12 de ago de 2020 19:04, Artem Russakovskii escreveu: > Remove the "summary" part of the command, which should list the exact file > pending heal. > > Then launch the heal manually. If it still doesn't heal, try running > md5sum on the file and see if it heals after that. > > Sincerely, > Artem > > -- > Founder, Android Police , APK Mirror > , Illogical Robot LLC > beerpla.net | @ArtemR > > > On Fri, Aug 7, 2020 at 11:03 AM Gilberto Nunes > wrote: > >> Hi >> >> I have a pending entry like this >> >> gluster vol heal VMS info summary >> Brick glusterfs01:/DATA/vms >> Status: Connected >> Total Number of entries: 1 >> Number of entries in heal pending: 1 >> Number of entries in split-brain: 0 >> Number of entries possibly healing: 0 >> >> Brick glusterfs02:/DATA/vms >> Status: Connected >> Total Number of entries: 1 >> Number of entries in heal pending: 1 >> Number of entries in split-brain: 0 >> Number of entries possibly healing: 0 >> >> How can I solve this? >> Should I follow this? >> >> >> https://icicimov.github.io/blog/high-availability/GlusterFS-metadata-split-brain-recovery/ >> >> >> >> --- >> Gilberto Nunes Ferreira >> >> >> >> >> ________ >> >> >> >> Community Meeting Calendar: >> >> Schedule - >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> Bridge: https://bluejeans.com/441850968 >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> > -------------- next part -------------- An HTML attachment was scrubbed... 
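[Editorial note] The inspect-then-heal sequence Artem describes can be sketched as gluster CLI steps. The volume name `VMS` is taken from the thread; the file path is a hypothetical example:

```shell
# Show the exact files/GFIDs pending heal (the "summary" variant hides them)
gluster volume heal VMS info

# Kick off a heal of the pending entries
gluster volume heal VMS

# If an entry stays pending, read the file from a client mount point;
# the resulting lookup can trigger self-heal on that one file
md5sum /mnt/vms/images/example-vm.qcow2
```

A heavier fallback is `gluster volume heal VMS full`, which walks the whole volume rather than only the pending entries.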
URL: From dm at belkam.com Thu Aug 13 02:23:12 2020 From: dm at belkam.com (Dmitry Melekhov) Date: Thu, 13 Aug 2020 06:23:12 +0400 Subject: [Gluster-users] gluster over vdo, problem with gfapi In-Reply-To: <27761A0B-DF92-4CFC-8F69-8125928856F1@yahoo.com> References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> <0da4807e-b342-ee5b-d43c-5da882c05315@belkam.com> <637AF5D4-E2FE-435B-AD2E-B3295DC7E719@yahoo.com> <27761A0B-DF92-4CFC-8F69-8125928856F1@yahoo.com> Message-ID: 12.08.2020 23:25, Strahil Nikolov ?????: > I am not sure that it is ok to use any caching (at least ovirt doesn't uses) . > > Have you set the 'virt' group of settings ? They seem to be optimal , but keep in mind that if you enable them -> you will enable sharding which cannot be 'disabled' afterwards. Sorry, I don't follow, as I said everything works until we set cache=none or cache=directsync in libvirt, i.e. there is no relation with other gluster settings. > > The fact that it works on C7 is strange, with wifh version of gluster did you test. > > Dunno, we run qemu on the same host as gluster itself, so we have the same gfapi version as gluster server. From hunter86_bg at yahoo.com Thu Aug 13 03:31:27 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Thu, 13 Aug 2020 06:31:27 +0300 Subject: [Gluster-users] gluster over vdo, problem with gfapi In-Reply-To: References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> <0da4807e-b342-ee5b-d43c-5da882c05315@belkam.com> <637AF5D4-E2FE-435B-AD2E-B3295DC7E719@yahoo.com> <27761A0B-DF92-4CFC-8F69-8125928856F1@yahoo.com> Message-ID: <12B971B2-2BB7-43F1-8495-25410605BC58@yahoo.com> ?? 13 ?????? 2020 ?. 5:23:12 GMT+03:00, Dmitry Melekhov ??????: > >12.08.2020 23:25, Strahil Nikolov ?????: >> I am not sure that it is ok to use any caching (at least ovirt >doesn't uses) . >> >> Have you set the 'virt' group of settings ? 
They seem to be optimal >, but keep in mind that if you enable them -> you will enable sharding > which cannot be 'disabled' afterwards. > > >Sorry, I don't follow, as I said everything works until we set >cache=none or cache=directsync in libvirt, > >i.e. there is no relation with other gluster settings. > >> >> The fact that it works on C7 is strange, with wifh version of >gluster did you test. >> >> >Dunno, we run qemu on the same host as gluster itself, so we have the >same gfapi version as gluster server. I ment dis you use C7 with Gluster 7 (or older) or C7 with the new Gluster 8. Anyways, if it worked before , it should run now - open an issue in github and I guess someone from the devs will take a look. Best Regards, Strahil Nikolov From sacchi at kadalu.io Thu Aug 13 03:59:57 2020 From: sacchi at kadalu.io (Sachidananda Urs) Date: Thu, 13 Aug 2020 09:29:57 +0530 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: <54415920-F5A1-43BD-B55B-1E25B9C9C069@yahoo.com> References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> <54415920-F5A1-43BD-B55B-1E25B9C9C069@yahoo.com> Message-ID: On Thu, Aug 13, 2020 at 12:58 AM Strahil Nikolov wrote: > I couldn't make it work on C8... > Maybe I was cloning the wrong branch. > > Details can be found at > https://github.com/gluster/gstatus/issues/30#issuecomment-673041743 I have commented on the issue: https://github.com/gluster/gstatus/issues/30#issuecomment-673238987 These are the steps: $ git clone https://github.com/gluster/gstatus.git $ cd gstatus $ VERSION=1.0.0 make gen-version # python3 setup.py install make gen-version will create version.py Thanks, sac -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dm at belkam.com Thu Aug 13 04:14:09 2020
From: dm at belkam.com (Dmitry Melekhov)
Date: Thu, 13 Aug 2020 08:14:09 +0400
Subject: [Gluster-users] gluster over vdo, problem with gfapi
In-Reply-To: <12B971B2-2BB7-43F1-8495-25410605BC58@yahoo.com>
References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com>
 <0da4807e-b342-ee5b-d43c-5da882c05315@belkam.com>
 <637AF5D4-E2FE-435B-AD2E-B3295DC7E719@yahoo.com>
 <27761A0B-DF92-4CFC-8F69-8125928856F1@yahoo.com>
 <12B971B2-2BB7-43F1-8495-25410605BC58@yahoo.com>
Message-ID:

13.08.2020 07:31, Strahil Nikolov wrote:
> I ment dis you use C7 with Gluster 7 (or older) or C7 with the new Gluster 8.
Frankly, I don't know what you mean by C7.. :-(
>
> Anyways,
> if it worked before , it should run now - open an issue in github and I guess someone from the devs will take a look.
>

No, it never worked...

But opening an issue is a good idea, thank you!

From dm at belkam.com Thu Aug 13 04:19:49 2020
From: dm at belkam.com (dm)
Date: Thu, 13 Aug 2020 08:19:49 +0400
Subject: [Gluster-users] gluster over vdo, problem with gfapi
In-Reply-To:
References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com>
 <0da4807e-b342-ee5b-d43c-5da882c05315@belkam.com>
 <637AF5D4-E2FE-435B-AD2E-B3295DC7E719@yahoo.com>
 <27761A0B-DF92-4CFC-8F69-8125928856F1@yahoo.com>
 <12B971B2-2BB7-43F1-8495-25410605BC58@yahoo.com>
Message-ID: <99d6ee01-14cf-7cfe-7d70-0e17474fff0f@belkam.com>

btw, all I wrote before was about raw file format,
if it is qcow2 then, using gfapi:

 virsh create /kvmconf/stewjon.xml
error: Failed to create domain from /kvmconf/stewjon.xml
error: internal error: process exited while connecting to monitor:
[2020-08-13 04:17:37.326933] E [MSGID: 108006]
[afr-common.c:6073:__afr_handle_child_down_event] 0-pool-replicate-0:
All subvolumes are down. Going offline until at least one of them comes
[2020-08-13 04:17:47.220840] I [io-stats.c:4054:fini] 0-pool: io-stats translator unloaded 2020-08-13T04:17:47.222064Z qemu-kvm: -drive file=gluster://127.0.0.1:24007/pool/stewjon.qcow2,file.debug=4,format=qcow2,if=none,id=drive-virtio-disk0,cache=directsync: Could not read qcow2 header: Invalid argument very interesting... only problem here- should I report this to qemu, gluster or vdo? :-( 13.08.2020 08:14, Dmitry Melekhov ?????: > 13.08.2020 07:31, Strahil Nikolov ?????: >> I ment dis you use C7 with Gluster 7 (or older) or C7 with the new >> Gluster 8. > Frankly, I don't know what you mean by C7.. :-( >> >> Anyways, >> if it worked before , it should run now - open an issue in github and >> I guess someone? from the devs will take a? look. >> > > No, it never worked... > > > But opening issue is good idea, thank you! > > From hunter86_bg at yahoo.com Thu Aug 13 06:03:25 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Thu, 13 Aug 2020 09:03:25 +0300 Subject: [Gluster-users] gluster over vdo, problem with gfapi In-Reply-To: References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> <0da4807e-b342-ee5b-d43c-5da882c05315@belkam.com> <637AF5D4-E2FE-435B-AD2E-B3295DC7E719@yahoo.com> <27761A0B-DF92-4CFC-8F69-8125928856F1@yahoo.com> <12B971B2-2BB7-43F1-8495-25410605BC58@yahoo.com> Message-ID: C7 -> CentOS7 Just try with the virt group enabled on a test setup . Best Regards, Strahil Nikolov ?? 13 ?????? 2020 ?. 7:14:09 GMT+03:00, Dmitry Melekhov ??????: >13.08.2020 07:31, Strahil Nikolov ?????: >> I ment dis you use C7 with Gluster 7 (or older) or C7 with the new >Gluster 8. >Frankly, I don't know what you mean by C7.. :-( >> >> Anyways, >> if it worked before , it should run now - open an issue in github and >I guess someone from the devs will take a look. >> > >No, it never worked... > > >But opening issue is good idea, thank you! 
From hunter86_bg at yahoo.com Thu Aug 13 06:06:03 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Thu, 13 Aug 2020 09:06:03 +0300 Subject: [Gluster-users] gluster over vdo, problem with gfapi In-Reply-To: <99d6ee01-14cf-7cfe-7d70-0e17474fff0f@belkam.com> References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com> <0da4807e-b342-ee5b-d43c-5da882c05315@belkam.com> <637AF5D4-E2FE-435B-AD2E-B3295DC7E719@yahoo.com> <27761A0B-DF92-4CFC-8F69-8125928856F1@yahoo.com> <12B971B2-2BB7-43F1-8495-25410605BC58@yahoo.com> <99d6ee01-14cf-7cfe-7d70-0e17474fff0f@belkam.com> Message-ID: <303469F3-1CDC-4029-AEB9-DFBD6FB3D5A2@yahoo.com> I don't think it is VDO, but I can be wrong. My ovirt setup is VDO + Gluster v7.7 + CentOS 7.8 . I tested libgfapi a long time ago and it worked. If you wish you can ask in the ovirt users' mailing list how qemu is using libgfapi. Best Regards, Strahil Nikolov ?? 13 ?????? 2020 ?. 7:19:49 GMT+03:00, dm ??????: >btw, all I wrote before was about raw file format, >if it is qcow2 then, using gfapi: > > > ?virsh create /kvmconf/stewjon.xml >error: Failed to create domain from /kvmconf/stewjon.xml >error: internal error: process exited while connecting to monitor: >[2020-08-13 04:17:37.326933] E [MSGID: 108006] >[afr-common.c:6073:__afr_handle_child_down_event] 0-pool-replicate-0: >All subvolumes are down. Going offline until at least one of them comes > >back up. >[2020-08-13 04:17:47.220840] I [io-stats.c:4054:fini] 0-pool: io-stats >translator unloaded >2020-08-13T04:17:47.222064Z qemu-kvm: -drive >file=gluster://127.0.0.1:24007/pool/stewjon.qcow2,file.debug=4,format=qcow2,if=none,id=drive-virtio-disk0,cache=directsync: > >Could not read qcow2 header: Invalid argument > >very interesting... > >only problem here- should I report this to qemu, gluster or vdo? :-( > > >13.08.2020 08:14, Dmitry Melekhov ?????: >> 13.08.2020 07:31, Strahil Nikolov ?????: >>> I ment dis you use C7 with Gluster 7 (or older) or C7 with the new >>> Gluster 8. 
>> Frankly, I don't know what you mean by C7.. :-(
>>>
>>> Anyways,
>>> if it worked before , it should run now - open an issue in github
>and
>>> I guess someone from the devs will take a look.
>>>
>>
>> No, it never worked...
>>
>>
>> But opening issue is good idea, thank you!
>>
>>

From dm at belkam.com Thu Aug 13 06:12:55 2020
From: dm at belkam.com (Dmitry Melekhov)
Date: Thu, 13 Aug 2020 10:12:55 +0400
Subject: [Gluster-users] gluster over vdo, problem with gfapi
In-Reply-To: <303469F3-1CDC-4029-AEB9-DFBD6FB3D5A2@yahoo.com>
References: <0896ab00-bc2f-6ec4-ab21-5800006c2947@belkam.com>
 <0da4807e-b342-ee5b-d43c-5da882c05315@belkam.com>
 <637AF5D4-E2FE-435B-AD2E-B3295DC7E719@yahoo.com>
 <27761A0B-DF92-4CFC-8F69-8125928856F1@yahoo.com>
 <12B971B2-2BB7-43F1-8495-25410605BC58@yahoo.com>
 <99d6ee01-14cf-7cfe-7d70-0e17474fff0f@belkam.com>
 <303469F3-1CDC-4029-AEB9-DFBD6FB3D5A2@yahoo.com>
Message-ID:

13.08.2020 10:06, Strahil Nikolov wrote:
> I don't think it is VDO, but I can be wrong.
>
> My ovirt setup is VDO + Gluster v7.7 + CentOS 7.8 . I tested libgfapi a long time ago and it worked.
> If you wish you can ask in the ovirt users' mailing list how qemu is using libgfapi.
>
>
As I wrote, without vdo everything works just fine on the same server. And
it works with vdo over fuse mount in any case, and with gfapi if the cache
setting is default.

I don't think that the ovirt mailing list is the right place to ask - we
don't use ovirt.

Thank you!

P.S. We decided to wait for some replies here and then open an issue in
gluster...

Thank you!

From kees.dejong+lst at neobits.nl Thu Aug 13 06:13:03 2020
From: kees.dejong+lst at neobits.nl (K. de Jong)
Date: Thu, 13 Aug 2020 08:13:03 +0200
Subject: [Gluster-users] 4 node cluster (best performance + redundancy setup?)
Message-ID:

I posted something in the subreddit [1], but I saw the suggestion elsewhere
that the mailing list is more active. I've been reading the docs.
And from this [2] overview the distributed replicated [3] and dispersed + redundancy [4] sound the most interesting. Each node (Raspberry Pi 4, 2x 8GB and 2x 4GB version) has a 4TB HDD disk attached via a docking station. I'm still waiting for the 4th Raspberry Pi, so I can't really experiment with the intended setup. But the setup of 2 replicas and 1 arbiter was quite disappointing. I got between 6MB/s and 60 MB/s, depending on the test (I did a broad range of tests with bonnie++ and simply dd). Without GlusterFS a simple dd of a 1GB file is about 100+ MB/s. 100MB/s is okay for this cluster. My goal is the following: * Run a HA environment with Pacemaker (services like Nextcloud, Dovecot, Apache). * One node should be able to fail without downtime. * Performance and storage efficiency should be reasonable with the given hardware. So with that I mean, when everything is a replica then storage is stuck at 4TB. And I would prefer to have some more than that limitation, but with redundancy. However, when reading the docs about disperse, I see some interesting points. A big pro is "providing space-efficient protection against disk or server failures". But the following is interesting as well: "The total number of bricks must be greater than 2 * redundancy". So, I want the cluster to be available when one node fails. And be able to recreate the data on a new disk, on that forth node. I also read about the RMW efficiency, I guess 2 sets of 2 is the only thing that will work with that performance and disk efficiency in mind. Because 1 redundancy would mess up the RMW cycle. My questions: * With 4 nodes; is it possible to use disperse and redundancy? And is a redundancy count of 2 the best (and only) choice when dealing with 4 disks? * The example does show a 4 node disperse command, but has as output `There isn't an optimal redundancy value for this configuration. Do you want to create the volume with redundancy 1 ? (y/n)`. 
I'm not sure if it's okay to simply select 'y' as an answer. The output is a bit vague, because it says it's not optimal, so it will be just slow, but will work I guess? * The RMW (Read-Modify-Write) cycle is probably what's meant. 512 * (#Bricks - redundancy) would be in this case for me 512 * (4-1) = 1536 bytes, which doesn't seem optimal, because it's a weird number, it's not a power of 2 (512, 1024, 2048, etc.). Choosing a replica of 2 would translate to 1024, which would seem more "okay". But I don't know for sure. * Or am I better off by simply creating 2 pairs of replicas (so no disperse)? So in that sense I would have 8TB available, and one node can fail. This would provide some read performance benefits. * What would be a good way to integrate this with Pacemaker? By that I mean: should I manage the gluster resource with Pacemaker? Or simply try to mount the glusterfs; if it's not available, then dependent resources can't start anyway. So in other words, let glusterfs handle failover itself. Any advice/tips? [1] [2] [3] [4] -------------- next part -------------- An HTML attachment was scrubbed... URL: From aspandey at redhat.com Thu Aug 13 09:49:51 2020 From: aspandey at redhat.com (Ashish Pandey) Date: Thu, 13 Aug 2020 05:49:51 -0400 (EDT) Subject: [Gluster-users] 4 node cluster (best performance + redundancy setup?) In-Reply-To: References: Message-ID: <264420059.37752007.1597312191618.JavaMail.zimbra@redhat.com> ----- Original Message ----- From: "K. de Jong" To: gluster-users at gluster.org Sent: Thursday, August 13, 2020 11:43:03 AM Subject: [Gluster-users] 4 node cluster (best performance + redundancy setup?) I posted something in the subreddit [1], but I saw the suggestion elsewhere that the mailinglist is more active. I've been reading the docs. And from this [2] overview the distributed replicated [3] and dispersed + redundancy [4] sound the most interesting.
Each node (Raspberry Pi 4, 2x 8GB and 2x 4GB version) has a 4TB HD disk attached via a docking station. I'm still waiting for the 4th Raspberry Pi, so I can't really experiment with the intended setup. But the setup of 2 replicas and 1 arbiter was quite disappointing. I got between 6MB/s and 60 MB/s, depending on the test (I did a broad range of tests with bonnie++ and simply dd). Without GlusterFS a simple dd of a 1GB file is about 100+ MB/s. 100MB/s is okay for this cluster. My goal is the following: * Run a HA environment with Pacemaker (services like Nextcloud, Dovecot, Apache). * One node should be able to fail without downtime. * Performance and storage efficiency should be reasonable with the given hardware. So with that I mean, when everything is a replica then storage is stuck at 4TB. And I would prefer to have some more than that limitation, but with redundancy. However, when reading the docs about disperse, I see some interesting points. A big pro is "providing space-efficient protection against disk or server failures". But the following is interesting as well: "The total number of bricks must be greater than 2 * redundancy". So, I want the cluster to be available when one node fails. And be able to recreate the data on a new disk, on that forth node. I also read about the RMW efficiency, I guess 2 sets of 2 is the only thing that will work with that performance and disk efficiency in mind. Because 1 redundancy would mess up the RMW cycle. My questions: * With 4 nodes; is it possible to use disperse and redundancy? And is a redundancy count of 2 the best (and only) choice when dealing with 4 disks? With 4 nodes, yes, it is possible to use a disperse volume. A redundancy count of 2 is not the best, but it is the one most often used, in my interactions with users. A disperse volume with 4 bricks is also possible, but it might not be the best configuration.
I would suggest having 6 bricks in a 4+2 configuration: 4 data bricks and 2 redundant bricks; in other words, 2 is the maximum number of bricks which can go bad while you can still use the disperse volume. If you have a number of disks on the 4 nodes, you can create the 4+2 disperse volume in a different way while maintaining the requirements of EC (disperse volume). * The example does show a 4 node disperse command, but has as output `There isn't an optimal redundancy value for this configuration. Do you want to create the volume with redundancy 1 ? (y/n)`. I'm not sure if it's okay to simply select 'y' as an answer. The output is a bit vague, because it says it's not optimal, so it will be just slow, but will work I guess? It will not be optimal from the point of view of the calculation we make. You want the best configuration, where you can have maximum redundancy (failure tolerance) and also maximum storage capacity. In that regard, it will not be an optimal solution. Performance can also be a factor. * The RMW (Read-Modify-Write) cycle is probably what's meant. 512 * (#Bricks - redundancy) would be in this case for me 512 * (4-1) = 1536 bytes, which doesn't seem optimal, because it's a weird number, it's not a power of 2 (512, 1024, 2048, etc.). Choosing a replica of 2 would translate to 1024, which would seem more "okay". But I don't know for sure. Yes, you are right. * Or am I better off by simply creating 2 pairs of replicas (so no disperse)? So in that sense I would have 8TB available, and one node can fail. This would provide some read performance benefits. * What would be a good way to integrate this with Pacemaker? With that I mean, should I manage the gluster resource with Pacemaker? Or simply try to mount the glusterfs, if it's not available, then depending resources can't start anyway. So in other words, let glusterfs handle failover itself. Gluster can handle failover at the replica or disperse level as per its implementation.
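The stripe-width arithmetic discussed in this thread can be sanity-checked in a few lines; a minimal sketch that only encodes the 512-byte fragment size and the 512 * (#Bricks - redundancy) rule quoted above:

```python
def stripe_width(bricks: int, redundancy: int, fragment: int = 512) -> int:
    """Effective write block of a disperse volume: fragment size times data bricks."""
    if bricks <= 2 * redundancy:
        raise ValueError("total bricks must be greater than 2 * redundancy")
    return fragment * (bricks - redundancy)

# 4 bricks with redundancy 1: 1536 bytes, not a power of two, hence the
# "not optimal" warning; a 4+2 layout gives 2048 bytes, which aligns well.
print(stripe_width(4, 1))  # 1536
print(stripe_width(6, 2))  # 2048
```

Any write smaller than this stripe width forces a read-modify-write cycle, which is why power-of-two widths tend to perform better.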
Even if you want to go with replica, replica 2 does not look like the best option; you should go for replica 3 or an arbiter volume to have the best fault tolerance. However, that will cost you a lot of storage capacity. Any advice/tips? [1] https://www.reddit.com/r/gluster/comments/i8ifdd/4_node_cluster_best_performance_redundancy_setup/ [2] https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/ [3] https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/#creating-distributed-replicated-volumes [4] https://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/#creating-distributed-dispersed-volumes ________ Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From joao.bauto at neuro.fchampalimaud.org Fri Aug 14 01:35:14 2020 From: joao.bauto at neuro.fchampalimaud.org (=?UTF-8?B?Sm/Do28gQmHDunRv?=) Date: Fri, 14 Aug 2020 02:35:14 +0100 Subject: [Gluster-users] Wrong directory quota usage Message-ID: Hi all, We have a 4-node distributed cluster with 2 bricks per node running Gluster 7.7 + ZFS. We use directory quota to limit the space used by our members on each project. Two days ago we noticed inconsistent space used reported by Gluster in the quota list. A small snippet of gluster volume quota vol list, Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded? /projectA 5.0TB 80%(4.0TB) 3.1TB 1.9TB No No */projectB 100.0TB 80%(80.0TB) 16383.4PB 740.9TB No No* /projectC 70.0TB 80%(56.0TB) 50.0TB 20.0TB No No The total space available in the cluster is 360TB, the quota for projectB is 100TB and, as you can see, it's reporting 16383.4PB used and 740TB available (already decreased from 750TB).
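A "Used" value of 16383.4PB is suspiciously close to 2^64 bytes, which usually means a signed 64-bit accounting counter has gone negative somewhere. As a quick check, the quota xattr can be decoded; a rough sketch, assuming (this layout is an assumption, not something stated in this thread) that the 24-byte quota.size value packs three big-endian signed 64-bit fields (bytes used, file count, directory count):

```python
import struct

def decode_quota_size(hex_value: str):
    # Assumed layout: three big-endian signed 64-bit integers
    # (bytes used, file count, directory count).
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    if len(raw) != 24:
        raise ValueError("expected a 24-byte xattr value")
    return struct.unpack(">qqq", raw)

# The trusted.glusterfs.quota.size.1 value reported for projectB:
size, files, dirs = decode_quota_size(
    "0x0000ab0f227a860000000000478e33acffffffffffffc112")
print(size, files, dirs)  # the last field decodes to -16110
```

A negative field is a sign of corrupted accounting rather than real usage, which fits the symptom of an impossibly large "Used" column.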
There was an issue in Gluster 3.x related to the wrong directory quota ( https://lists.gluster.org/pipermail/gluster-users/2016-February/025305.html and https://lists.gluster.org/pipermail/gluster-users/2018-November/035374.html) but it's marked as solved (not sure if the solution still applies). *On projectB* # getfattr -d -m . -e hex projectB # file: projectB trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f35e69800098ed9 trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc trusted.glusterfs.mdata=0x010000000000000000000000005f355c59000000000939079f000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0 trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 trusted.glusterfs.quota.dirty=0x3000 trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff trusted.glusterfs.quota.size.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 *On projectA* # getfattr -d -m . -e hex projectA # file: projectA trusted.gfid=0x05b09ded19354c0eb544d22d4659582e trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f1aeb9f00044c64 trusted.glusterfs.dht=0xe1a4060c000000001fffffff3ffffffd trusted.glusterfs.mdata=0x010000000000000000000000005f1ac6a10000000018f30a4e000000005c338fab0000000017a3135a000000005b0694fb000000001584a21b trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000067de3bbe20000000000000128610000000000033498 trusted.glusterfs.quota.dirty=0x3000 trusted.glusterfs.quota.limit-set.1=0x0000460000000000ffffffffffffffff trusted.glusterfs.quota.size.1=0x0000067de3bbe20000000000000128610000000000033498 Any idea on what's happening and how to fix it? Thanks! *Jo?o Ba?to* --------------- *Scientific Computing and Software Platform* Champalimaud Research Champalimaud Center for the Unknown Av. 
Brasília, Doca de Pedrouços 1400-038 Lisbon, Portugal fchampalimaud.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From gilberto.nunes32 at gmail.com Fri Aug 14 04:34:05 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Fri, 14 Aug 2020 01:34:05 -0300 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: Hi, Could you improve the output to show "Possibly undergoing heal" as well? gluster vol heal VMS info Brick gluster01:/DATA/vms Status: Connected Number of entries: 0 Brick gluster02:/DATA/vms /images/100/vm-100-disk-0.raw - Possibly undergoing heal Status: Connected Number of entries: 1 Thanks --- Gilberto Nunes Ferreira Em qua., 12 de ago. de 2020 às 13:52, Sachidananda Urs escreveu: > > > On Sun, Aug 9, 2020 at 10:43 PM Gilberto Nunes > wrote: > >> How did you deploy it ? - git clone, ./gstatus.py, and python gstatus.py >> install then gstatus >> >> What is your gluster version ? Latest stable to Debian Buster (v8) >> >> >> > Hello Gilberto. I just made a 1.0.0 release. > gstatus binary is available to download from (requires python >= 3.6) > https://github.com/gluster/gstatus/releases/tag/v1.0.0 > > You can find the complete documentation here: > https://github.com/gluster/gstatus/blob/master/README > > Follow the below steps for a quick method to test it out: > > # curl -LO > https://github.com/gluster/gstatus/releases/download/v1.0.0/gstatus > > # chmod +x gstatus > > # ./gstatus -a > # ./gstatus --help > > If you like what you see, you can move it to /usr/local/bin > > Would like to hear your feedback. Any feature requests/bugs/PRs are > welcome. > > -sac > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From hunter86_bg at yahoo.com Fri Aug 14 09:16:44 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Fri, 14 Aug 2020 12:16:44 +0300 Subject: [Gluster-users] Wrong directory quota usage In-Reply-To: References: Message-ID: Hi Jo?o, Based on your output it seems that the quota size is different on the 2 bricks. Have you tried to remove the quota and then recreate it ? Maybe it will be the easiest way to fix it. Best Regards, Strahil Nikolov ?? 14 ?????? 2020 ?. 4:35:14 GMT+03:00, "Jo?o Ba?to" ??????: >Hi all, > >We have a 4-node distributed cluster with 2 bricks per node running >Gluster >7.7 + ZFS. We use directory quota to limit the space used by our >members on >each project. Two days ago we noticed inconsistent space used reported >by >Gluster in the quota list. > >A small snippet of gluster volume quota vol list, > > Path Hard-limit Soft-limit Used >Available Soft-limit exceeded? Hard-limit exceeded? >/projectA 5.0TB 80%(4.0TB) 3.1TB 1.9TB > No No >*/projectB 100.0TB 80%(80.0TB) 16383.4PB 740.9TB > No No* >/projectC 70.0TB 80%(56.0TB) 50.0TB 20.0TB > No No > >The total space available in the cluster is 360TB, the quota for >projectB >is 100TB and, as you can see, its reporting 16383.4PB used and 740TB >available (already decreased from 750TB). > >There was an issue in Gluster 3.x related to the wrong directory quota >( >https://lists.gluster.org/pipermail/gluster-users/2016-February/025305.html > and >https://lists.gluster.org/pipermail/gluster-users/2018-November/035374.html) >but it's marked as solved (not sure if the solution still applies). > >*On projectB* ># getfattr -d -m . 
-e hex projectB ># file: projectB >trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f35e69800098ed9 >trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc >trusted.glusterfs.mdata=0x010000000000000000000000005f355c59000000000939079f000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0 >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 >trusted.glusterfs.quota.dirty=0x3000 >trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff >trusted.glusterfs.quota.size.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 > >*On projectA* ># getfattr -d -m . -e hex projectA ># file: projectA >trusted.gfid=0x05b09ded19354c0eb544d22d4659582e >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f1aeb9f00044c64 >trusted.glusterfs.dht=0xe1a4060c000000001fffffff3ffffffd >trusted.glusterfs.mdata=0x010000000000000000000000005f1ac6a10000000018f30a4e000000005c338fab0000000017a3135a000000005b0694fb000000001584a21b >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000067de3bbe20000000000000128610000000000033498 >trusted.glusterfs.quota.dirty=0x3000 >trusted.glusterfs.quota.limit-set.1=0x0000460000000000ffffffffffffffff >trusted.glusterfs.quota.size.1=0x0000067de3bbe20000000000000128610000000000033498 > >Any idea on what's happening and how to fix it? > >Thanks! >*Jo?o Ba?to* >--------------- > >*Scientific Computing and Software Platform* >Champalimaud Research >Champalimaud Center for the Unknown >Av. 
Brasília, Doca de Pedrouços >1400-038 Lisbon, Portugal >fchampalimaud.org From joao.bauto at neuro.fchampalimaud.org Fri Aug 14 11:39:49 2020 From: joao.bauto at neuro.fchampalimaud.org (=?UTF-8?B?Sm/Do28gQmHDunRv?=) Date: Fri, 14 Aug 2020 12:39:49 +0100 Subject: [Gluster-users] Wrong directory quota usage In-Reply-To: References: Message-ID: Hi Strahil, I have tried removing the quota for that specific directory and setting it again but it didn't work (maybe it has to be a quota disable and enable in the volume options). Currently testing a solution by Hari with the quota_fsck.py script (https://medium.com/@harigowtham/ glusterfs-quota-fix-accounting-840df33fcd3a) and it's detecting a lot of size mismatches in files. Thank you, *João Baúto* --------------- *Scientific Computing and Software Platform* Champalimaud Research Champalimaud Center for the Unknown Av. Brasília, Doca de Pedrouços 1400-038 Lisbon, Portugal fchampalimaud.org Strahil Nikolov escreveu no dia sexta, 14/08/2020 à(s) 10:16: > Hi João, > > Based on your output it seems that the quota size is different on the 2 > bricks. > > Have you tried to remove the quota and then recreate it ? Maybe it will be > the easiest way to fix it. > > Best Regards, > Strahil Nikolov > > > On 14 August 2020 at 4:35:14 GMT+03:00, "João Baúto" < > joao.bauto at neuro.fchampalimaud.org> wrote: > >Hi all, > > > >We have a 4-node distributed cluster with 2 bricks per node running > >Gluster > >7.7 + ZFS. We use directory quota to limit the space used by our > >members on > >each project. Two days ago we noticed inconsistent space used reported > >by > >Gluster in the quota list. > > > >A small snippet of gluster volume quota vol list, > > > > Path Hard-limit Soft-limit Used > >Available Soft-limit exceeded? Hard-limit exceeded?
> >/projectA 5.0TB 80%(4.0TB) 3.1TB 1.9TB > > No No > >*/projectB 100.0TB 80%(80.0TB) 16383.4PB 740.9TB > > No No* > >/projectC 70.0TB 80%(56.0TB) 50.0TB 20.0TB > > No No > > > >The total space available in the cluster is 360TB, the quota for > >projectB > >is 100TB and, as you can see, its reporting 16383.4PB used and 740TB > >available (already decreased from 750TB). > > > >There was an issue in Gluster 3.x related to the wrong directory quota > >( > > > https://lists.gluster.org/pipermail/gluster-users/2016-February/025305.html > > and > > > https://lists.gluster.org/pipermail/gluster-users/2018-November/035374.html > ) > >but it's marked as solved (not sure if the solution still applies). > > > >*On projectB* > ># getfattr -d -m . -e hex projectB > ># file: projectB > >trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c > > >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f35e69800098ed9 > >trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc > > >trusted.glusterfs.mdata=0x010000000000000000000000005f355c59000000000939079f000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0 > > >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 > >trusted.glusterfs.quota.dirty=0x3000 > >trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff > > >trusted.glusterfs.quota.size.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 > > > >*On projectA* > ># getfattr -d -m . 
-e hex projectA > ># file: projectA > >trusted.gfid=0x05b09ded19354c0eb544d22d4659582e > > >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f1aeb9f00044c64 > >trusted.glusterfs.dht=0xe1a4060c000000001fffffff3ffffffd > > >trusted.glusterfs.mdata=0x010000000000000000000000005f1ac6a10000000018f30a4e000000005c338fab0000000017a3135a000000005b0694fb000000001584a21b > > >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000067de3bbe20000000000000128610000000000033498 > >trusted.glusterfs.quota.dirty=0x3000 > >trusted.glusterfs.quota.limit-set.1=0x0000460000000000ffffffffffffffff > > >trusted.glusterfs.quota.size.1=0x0000067de3bbe20000000000000128610000000000033498 > > > >Any idea on what's happening and how to fix it? > > > >Thanks! > >*Jo?o Ba?to* > >--------------- > > > >*Scientific Computing and Software Platform* > >Champalimaud Research > >Champalimaud Center for the Unknown > >Av. Bras?lia, Doca de Pedrou?os > >1400-038 Lisbon, Portugal > >fchampalimaud.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sacchi at kadalu.io Fri Aug 14 13:55:05 2020 From: sacchi at kadalu.io (Sachidananda Urs) Date: Fri, 14 Aug 2020 19:25:05 +0530 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: On Fri, Aug 14, 2020 at 10:04 AM Gilberto Nunes wrote: > Hi > Could you improve the output to show "Possibly undergoing heal" as well? > gluster vol heal VMS info > Brick gluster01:/DATA/vms > Status: Connected > Number of entries: 0 > > Brick gluster02:/DATA/vms > /images/100/vm-100-disk-0.raw - Possibly undergoing heal > Status: Connected > Number of entries: 1 > > We plan to add heal count. For example: Self-Heal: 456 pending files. Or something similar. If we list files, and if the number of files is high it takes a long time and fills the screen making it quite cumbersome. 
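Until such a summary lands in gstatus, a pending-heal count can be scraped from the existing CLI output; a rough Python sketch, assuming the plain-text heal-info format quoted above (which may vary between Gluster versions):

```python
import re

def count_heal_entries(heal_info: str) -> int:
    """Sum every 'Number of entries:' counter in `gluster vol heal <vol> info` output."""
    return sum(int(n) for n in re.findall(r"Number of entries:\s*(\d+)", heal_info))

# Sample taken from the heal-info output quoted earlier in this thread.
sample = """Brick gluster01:/DATA/vms
Status: Connected
Number of entries: 0

Brick gluster02:/DATA/vms
/images/100/vm-100-disk-0.raw - Possibly undergoing heal
Status: Connected
Number of entries: 1
"""
print(count_heal_entries(sample))  # 1
```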
-sac > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gilberto.nunes32 at gmail.com Fri Aug 14 14:08:57 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Fri, 14 Aug 2020 11:08:57 -0300 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: Yes, I see! With many small files it is complicated... Here I am generally using 2 or 3 large files (VM disk images!)... I think there could at least be some progress bar or percentage for the healing process... some ETA, or similar... Otherwise the tool is nice and promising... Thanks anyway. --- Gilberto Nunes Ferreira Em sex., 14 de ago. de 2020 às 10:55, Sachidananda Urs escreveu: > > > On Fri, Aug 14, 2020 at 10:04 AM Gilberto Nunes < > gilberto.nunes32 at gmail.com> wrote: > >> Hi >> Could you improve the output to show "Possibly undergoing heal" as well? >> gluster vol heal VMS info >> Brick gluster01:/DATA/vms >> Status: Connected >> Number of entries: 0 >> >> Brick gluster02:/DATA/vms >> /images/100/vm-100-disk-0.raw - Possibly undergoing heal >> Status: Connected >> Number of entries: 1 >> >> > > We plan to add heal count. For example: > > Self-Heal: 456 pending files. > > Or something similar. If we list files, and if the number of files is high > it takes a long time and fills the screen making it quite cumbersome. > > -sac > >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthewb at uvic.ca Fri Aug 14 17:22:16 2020 From: matthewb at uvic.ca (Matthew Benstead) Date: Fri, 14 Aug 2020 10:22:16 -0700 Subject: [Gluster-users] Geo-replication causes OOM Message-ID: <7d66d907-0802-4aeb-961e-964dd401fe24@uvic.ca> Hi, We are building a new storage system, and after geo-replication has been running for a few hours the server runs out of memory and the oom-killer starts killing bricks.
It runs fine without geo-replication on, and the server has 64GB of RAM. I have stopped geo-replication for now. Any ideas what to tune? [root at storage01 ~]# gluster --version | head -1 glusterfs 7.7 [root at storage01 ~]# cat /etc/centos-release; uname -r CentOS Linux release 7.8.2003 (Core) 3.10.0-1127.10.1.el7.x86_64 [root at storage01 ~]# df -h /storage2/ Filesystem Size Used Avail Use% Mounted on 10.0.231.91:/storage 328T 228T 100T 70% /storage2 [root at storage01 ~]# cat /proc/meminfo | grep MemTotal MemTotal: 65412064 kB [root at storage01 ~]# free -g total used free shared buff/cache available Mem: 62 18 0 0 43 43 Swap: 3 0 3 [root at storage01 ~]# gluster volume info Volume Name: storage Type: Distributed-Replicate Volume ID: cf94a8f2-324b-40b3-bf72-c3766100ea99 Status: Started Snapshot Count: 0 Number of Bricks: 3 x (2 + 1) = 9 Transport-type: tcp Bricks: Brick1: 10.0.231.91:/data/storage_a/storage Brick2: 10.0.231.92:/data/storage_b/storage Brick3: 10.0.231.93:/data/storage_c/storage (arbiter) Brick4: 10.0.231.92:/data/storage_a/storage Brick5: 10.0.231.93:/data/storage_b/storage Brick6: 10.0.231.91:/data/storage_c/storage (arbiter) Brick7: 10.0.231.93:/data/storage_a/storage Brick8: 10.0.231.91:/data/storage_b/storage Brick9: 10.0.231.92:/data/storage_c/storage (arbiter) Options Reconfigured: changelog.changelog: on geo-replication.ignore-pid-check: on geo-replication.indexing: on network.ping-timeout: 10 features.inode-quota: on features.quota: on nfs.disable: on features.quota-deem-statfs: on storage.fips-mode-rchecksum: on performance.readdir-ahead: on performance.parallel-readdir: on cluster.lookup-optimize: on client.event-threads: 4 server.event-threads: 4 performance.cache-size: 256MB You can see the memory spike and then drop as bricks are killed - this happened twice in the graph below: You can see two brick
processes are down: [root at storage01 ~]# gluster volume status Status of volume: storage Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 10.0.231.91:/data/storage_a/storage N/A N/A N N/A Brick 10.0.231.92:/data/storage_b/storage 49152 0 Y 1627 Brick 10.0.231.93:/data/storage_c/storage 49152 0 Y 259966 Brick 10.0.231.92:/data/storage_a/storage 49153 0 Y 1642 Brick 10.0.231.93:/data/storage_b/storage 49153 0 Y 259975 Brick 10.0.231.91:/data/storage_c/storage 49153 0 Y 20656 Brick 10.0.231.93:/data/storage_a/storage 49154 0 Y 259983 Brick 10.0.231.91:/data/storage_b/storage N/A N/A N N/A Brick 10.0.231.92:/data/storage_c/storage 49154 0 Y 1655 Self-heal Daemon on localhost N/A N/A Y 20690 Quota Daemon on localhost N/A N/A Y 172136 Self-heal Daemon on 10.0.231.93 N/A N/A Y 260010 Quota Daemon on 10.0.231.93 N/A N/A Y 128115 Self-heal Daemon on 10.0.231.92 N/A N/A Y 1702 Quota Daemon on 10.0.231.92 N/A N/A Y 128564 Task Status of Volume storage ------------------------------------------------------------------------------ There are no active volume tasks Logs: [2020-08-13 20:58:22.186540] I [MSGID: 106143] [glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick (null) on port 49154 [2020-08-13 20:58:22.196110] I [MSGID: 106005] [glusterd-handler.c:5960:__glusterd_brick_rpc_notify] 0-management: Brick 10.0.231.91:/data/storage_b/storage has disconnected from glusterd. [2020-08-13 20:58:22.196752] I [MSGID: 106143] [glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick /data/storage_b/storage on port 49154 [2020-08-13 21:05:23.418966] I [MSGID: 106143] [glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick (null) on port 49152 [2020-08-13 21:05:23.420881] I [MSGID: 106005] [glusterd-handler.c:5960:__glusterd_brick_rpc_notify] 0-management: Brick 10.0.231.91:/data/storage_a/storage has disconnected from glusterd. 
[2020-08-13 21:05:23.421334] I [MSGID: 106143] [glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick /data/storage_a/storage on port 49152 [Thu Aug 13 13:58:17 2020] Out of memory: Kill process 20664 (glusterfsd) score 422 or sacrifice child [Thu Aug 13 13:58:17 2020] Killed process 20664 (glusterfsd), UID 0, total-vm:32884384kB, anon-rss:29625096kB, file-rss:0kB, shmem-rss:0kB [Thu Aug 13 14:05:18 2020] Out of memory: Kill process 20647 (glusterfsd) score 467 or sacrifice child [Thu Aug 13 14:05:18 2020] Killed process 20647 (glusterfsd), UID 0, total-vm:36265116kB, anon-rss:32767744kB, file-rss:520kB, shmem-rss:0kB0 glustershd logs: [2020-08-13 20:58:22.181368] W [socket.c:775:__socket_rwv] 0-storage-client-7: readv on 10.0.231.91:49154 failed (No data available) [2020-08-13 20:58:22.185413] I [MSGID: 114018] [client.c:2347:client_rpc_notify] 0-storage-client-7: disconnected from storage-client-7. Client process will keep trying to connect to glusterd until brick's port is available [2020-08-13 20:58:25.211872] E [MSGID: 114058] [client-handshake.c:1455:client_query_portmap_cbk] 0-storage-client-7: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. [2020-08-13 20:58:25.211934] I [MSGID: 114018] [client.c:2347:client_rpc_notify] 0-storage-client-7: disconnected from storage-client-7. Client process will keep trying to connect to glusterd until brick's port is available [2020-08-13 21:00:28.386633] I [socket.c:865:__socket_shutdown] 0-storage-client-7: intentional socket shutdown(8) [2020-08-13 21:02:34.565373] I [socket.c:865:__socket_shutdown] 0-storage-client-7: intentional socket shutdown(8) [2020-08-13 21:02:58.000263] W [MSGID: 114031] [client-rpc-fops_v2.c:920:client4_0_getxattr_cbk] 0-storage-client-7: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001). 
Key: trusted.glusterfs.pathinfo [Transport endpoint is not connected] [2020-08-13 21:02:58.000460] W [MSGID: 114029] [client-rpc-fops_v2.c:4469:client4_0_getxattr] 0-storage-client-7: failed to send the fop [2020-08-13 21:04:40.733823] I [socket.c:865:__socket_shutdown] 0-storage-client-7: intentional socket shutdown(8) [2020-08-13 21:05:23.418987] W [socket.c:775:__socket_rwv] 0-storage-client-0: readv on 10.0.231.91:49152 failed (No data available) [2020-08-13 21:05:23.419365] I [MSGID: 114018] [client.c:2347:client_rpc_notify] 0-storage-client-0: disconnected from storage-client-0. Client process will keep trying to connect to glusterd until brick's port is available [2020-08-13 21:05:26.423218] E [MSGID: 114058] [client-handshake.c:1455:client_query_portmap_cbk] 0-storage-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. [2020-08-13 21:06:46.919942] I [socket.c:865:__socket_shutdown] 0-storage-client-7: intentional socket shutdown(8) [2020-08-13 21:05:26.423274] I [MSGID: 114018] [client.c:2347:client_rpc_notify] 0-storage-client-0: disconnected from storage-client-0. 
Client process will keep trying to connect to glusterd until brick's port is available [2020-08-13 21:07:29.667896] I [socket.c:865:__socket_shutdown] 0-storage-client-0: intentional socket shutdown(8) [2020-08-13 21:08:05.660858] I [MSGID: 100041] [glusterfsd-mgmt.c:1111:glusterfs_handle_svc_attach] 0-glusterfs: received attach request for volfile-id=shd/storage [2020-08-13 21:08:05.660948] I [MSGID: 100040] [glusterfsd-mgmt.c:106:mgmt_process_volfile] 0-glusterfs: No change in volfile, continuing [2020-08-13 21:08:05.661326] I [rpc-clnt.c:1963:rpc_clnt_reconfig] 0-storage-client-7: changing port to 49154 (from 0) [2020-08-13 21:08:05.664638] I [MSGID: 114057] [client-handshake.c:1375:select_server_supported_programs] 0-storage-client-7: Using Program GlusterFS 4.x v1, Num (1298437), Version (400) [2020-08-13 21:08:05.665266] I [MSGID: 114046] [client-handshake.c:1105:client_setvolume_cbk] 0-storage-client-7: Connected to storage-client-7, attached to remote volume '/data/storage_b/storage'. [2020-08-13 21:08:05.713533] I [rpc-clnt.c:1963:rpc_clnt_reconfig] 0-storage-client-0: changing port to 49152 (from 0) [2020-08-13 21:08:05.716535] I [MSGID: 114057] [client-handshake.c:1375:select_server_supported_programs] 0-storage-client-0: Using Program GlusterFS 4.x v1, Num (1298437), Version (400) [2020-08-13 21:08:05.717224] I [MSGID: 114046] [client-handshake.c:1105:client_setvolume_cbk] 0-storage-client-0: Connected to storage-client-0, attached to remote volume '/data/storage_a/storage'. Thanks, ?-Matthew -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: kdhgpokcnegaegcf.png Type: image/png Size: 73797 bytes Desc: not available URL: From hunter86_bg at yahoo.com Sat Aug 15 05:16:12 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Sat, 15 Aug 2020 08:16:12 +0300 Subject: [Gluster-users] Wrong directory quota usage In-Reply-To: References: Message-ID: Hi João, most probably enable/disable should help. Have you checked all bricks on the ZFS? Your example is for projectA vs projectB. What about the 'projectB' directories on all bricks of the volume? If enable/disable doesn't help, I have an idea, but I have never tested it, so I can't guarantee that it will help: - Create a new dir via FUSE - Set the quota on that new dir as you would like to set it on projectB - Use getfattr on the bricks to identify if everything is the same on all bricks If all are the same, you can use setfattr with the same values from the new dir on the 'projectB' volume's brick directories and remove the dirty flag. When you stat that dir ('du' or 'stat' from FUSE should work), the quota should get fixed. Best Regards, Strahil Nikolov On 14 August 2020 at 14:39:49 GMT+03:00, "João Baúto" wrote: >Hi Strahil, > >I have tried removing the quota for that specific directory and setting >it >again but it didn't work (maybe it has to be a quota disable and enable >in the volume options). Currently testing a solution >by Hari with the quota_fsck.py script (https://medium.com/@harigowtham/ >glusterfs-quota-fix-accounting-840df33fcd3a) and it's detecting a lot of >size mismatches in files. > >Thank you, >*João Baúto* >--------------- > >*Scientific Computing and Software Platform* >Champalimaud Research >Champalimaud Center for the Unknown >Av. Brasília, Doca de Pedrouços >1400-038 Lisbon, Portugal >fchampalimaud.org > > >Strahil Nikolov escreveu no dia sexta, >14/08/2020 >à(s) 10:16: > >> Hi João, >> >> Based on your output it seems that the quota size is different on the >2 >> bricks.
>>
>> Have you tried to remove the quota and then recreate it? Maybe it will be
>> the easiest way to fix it.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>>
>> On 14 August 2020 at 4:35:14 GMT+03:00, "João Baúto" <
>> joao.bauto at neuro.fchampalimaud.org> wrote:
>> >Hi all,
>> >
>> >We have a 4-node distributed cluster with 2 bricks per node running
>> >Gluster 7.7 + ZFS. We use directory quota to limit the space used by our
>> >members on each project. Two days ago we noticed inconsistent space used
>> >reported by Gluster in the quota list.
>> >
>> >A small snippet of gluster volume quota vol list,
>> >
>> >    Path        Hard-limit   Soft-limit     Used    Available  Soft-limit exceeded?  Hard-limit exceeded?
>> >/projectA      5.0TB        80%(4.0TB)     3.1TB      1.9TB            No                    No
>> >*/projectB    100.0TB       80%(80.0TB)  16383.4PB  740.9TB            No                    No*
>> >/projectC     70.0TB        80%(56.0TB)   50.0TB     20.0TB            No                    No
>> >
>> >The total space available in the cluster is 360TB, the quota for projectB
>> >is 100TB and, as you can see, it's reporting 16383.4PB used and 740TB
>> >available (already decreased from 750TB).
>> >
>> >There was an issue in Gluster 3.x related to wrong directory quota
>> >(
>> >https://lists.gluster.org/pipermail/gluster-users/2016-February/025305.html
>> > and
>> >https://lists.gluster.org/pipermail/gluster-users/2018-November/035374.html
>> >)
>> >but it's marked as solved (not sure if the solution still applies).
>> >
>> >*On projectB*
>> ># getfattr -d -m .
-e hex projectB
>> ># file: projectB
>> >trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c
>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f35e69800098ed9
>> >trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc
>> >trusted.glusterfs.mdata=0x010000000000000000000000005f355c59000000000939079f000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0
>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000ab0f227a860000000000478e33acffffffffffffc112
>> >trusted.glusterfs.quota.dirty=0x3000
>> >trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff
>> >trusted.glusterfs.quota.size.1=0x0000ab0f227a860000000000478e33acffffffffffffc112
>> >
>> >*On projectA*
>> ># getfattr -d -m . -e hex projectA
>> ># file: projectA
>> >trusted.gfid=0x05b09ded19354c0eb544d22d4659582e
>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f1aeb9f00044c64
>> >trusted.glusterfs.dht=0xe1a4060c000000001fffffff3ffffffd
>> >trusted.glusterfs.mdata=0x010000000000000000000000005f1ac6a10000000018f30a4e000000005c338fab0000000017a3135a000000005b0694fb000000001584a21b
>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000067de3bbe20000000000000128610000000000033498
>> >trusted.glusterfs.quota.dirty=0x3000
>> >trusted.glusterfs.quota.limit-set.1=0x0000460000000000ffffffffffffffff
>> >trusted.glusterfs.quota.size.1=0x0000067de3bbe20000000000000128610000000000033498
>> >
>> >Any idea on what's happening and how to fix it?
>> >
>> >Thanks!
>> >*João Baúto*
>> >---------------
>> >
>> >*Scientific Computing and Software Platform*
>> >Champalimaud Research
>> >Champalimaud Center for the Unknown
>> >Av.
Brasília, Doca de Pedrouços
>> >1400-038 Lisbon, Portugal
>> >fchampalimaud.org
>>

From hunter86_bg at yahoo.com  Sat Aug 15 05:21:26 2020
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Sat, 15 Aug 2020 08:21:26 +0300
Subject: [Gluster-users] Monitoring tools for GlusterFS
In-Reply-To: 
References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> 
Message-ID: <2B3A014B-C78F-415F-A425-4500202BF6A8@yahoo.com>

Usually sharding is used for that purpose. Each shard is of a fixed size.
For details: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/configuring_red_hat_virtualization_with_red_hat_gluster_storage/chap-hosting_virtual_machine_images_on_red_hat_storage_volumes

Best Regards,
Strahil Nikolov

On 14 August 2020 at 17:08:57 GMT+03:00, Gilberto Nunes wrote:
>Yes! I see!
>For many small files it is complicated...
>Here I am generally using 2 or 3 large files (VM disk images!)...
>I think there could at least be some progress bar or percentage for the
>healing process... some ETA, or similar...
>Otherwise the tool is nice and promising...
>Thanks anyway.
>
>---
>Gilberto Nunes Ferreira
>
>
>
>On Fri, 14 Aug 2020 at 10:55, Sachidananda Urs wrote:
>
>>
>>
>> On Fri, Aug 14, 2020 at 10:04 AM Gilberto Nunes <
>> gilberto.nunes32 at gmail.com> wrote:
>>
>>> Hi
>>> Could you improve the output to show "Possibly undergoing heal" as well?
>>> gluster vol heal VMS info
>>> Brick gluster01:/DATA/vms
>>> Status: Connected
>>> Number of entries: 0
>>>
>>> Brick gluster02:/DATA/vms
>>> /images/100/vm-100-disk-0.raw - Possibly undergoing heal
>>> Status: Connected
>>> Number of entries: 1
>>>
>>>
>> We plan to add a heal count. For example:
>>
>> Self-Heal: 456 pending files.
>>
>> Or something similar. If we list files, and the number of files is high,
>> it takes a long time and fills the screen, making it quite cumbersome.
>>
>> -sac
>>
>>>

From hunter86_bg at yahoo.com  Sat Aug 15 05:35:16 2020
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Sat, 15 Aug 2020 08:35:16 +0300
Subject: [Gluster-users] Geo-replication causes OOM
In-Reply-To: <7d66d907-0802-4aeb-961e-964dd401fe24@uvic.ca>
References: <7d66d907-0802-4aeb-961e-964dd401fe24@uvic.ca>
Message-ID: <7A418761-7531-4B2F-9430-20976A22FFE6@yahoo.com>

Hey Matthew,

Can you check the memory leak with valgrind? It will be something like:

Find the geo-rep process via ps and note all the parameters it was started with.
Next, stop geo-rep. Then start it with valgrind:
valgrind --log-file="filename" --tool=memcheck --leak-check=full

It might help narrow down the problem.

Best Regards,
Strahil Nikolov

On 14 August 2020 at 20:22:16 GMT+03:00, Matthew Benstead wrote:
>Hi,
>
>We are building a new storage system, and after geo-replication has been
>running for a few hours the server runs out of memory and oom-killer
>starts killing bricks. It runs fine without geo-replication on, and the
>server has 64GB of RAM. I have stopped geo-replication for now.
>
>Any ideas what to tune?
>
>[root at storage01 ~]# gluster --version | head -1
>glusterfs 7.7
>
>[root at storage01 ~]# cat /etc/centos-release; uname -r
>CentOS Linux release 7.8.2003 (Core)
>3.10.0-1127.10.1.el7.x86_64
>
>[root at storage01 ~]# df -h /storage2/
>Filesystem            Size  Used Avail Use% Mounted on
>10.0.231.91:/storage  328T  228T  100T  70% /storage2
>
>[root at storage01 ~]# cat /proc/meminfo | grep MemTotal
>MemTotal:       65412064 kB
>
>[root at storage01 ~]# free -g
>              total        used        free      shared  buff/cache   available
>Mem:             62          18           0           0          43          43
>Swap:             3           0          
3 > > >[root at storage01 ~]# gluster volume info > >Volume Name: storage >Type: Distributed-Replicate >Volume ID: cf94a8f2-324b-40b3-bf72-c3766100ea99 >Status: Started >Snapshot Count: 0 >Number of Bricks: 3 x (2 + 1) = 9 >Transport-type: tcp >Bricks: >Brick1: 10.0.231.91:/data/storage_a/storage >Brick2: 10.0.231.92:/data/storage_b/storage >Brick3: 10.0.231.93:/data/storage_c/storage (arbiter) >Brick4: 10.0.231.92:/data/storage_a/storage >Brick5: 10.0.231.93:/data/storage_b/storage >Brick6: 10.0.231.91:/data/storage_c/storage (arbiter) >Brick7: 10.0.231.93:/data/storage_a/storage >Brick8: 10.0.231.91:/data/storage_b/storage >Brick9: 10.0.231.92:/data/storage_c/storage (arbiter) >Options Reconfigured: >changelog.changelog: on >geo-replication.ignore-pid-check: on >geo-replication.indexing: on >network.ping-timeout: 10 >features.inode-quota: on >features.quota: on >nfs.disable: on >features.quota-deem-statfs: on >storage.fips-mode-rchecksum: on >performance.readdir-ahead: on >performance.parallel-readdir: on >cluster.lookup-optimize: on >client.event-threads: 4 >server.event-threads: 4 >performance.cache-size: 256MB > >You can see the memory spike and reduce as bricks are killed - this >happened twice in the graph below: > > > >You can see two brick processes are down: > >[root at storage01 ~]# gluster volume status >Status of volume: storage >Gluster process TCP Port RDMA Port Online > Pid >------------------------------------------------------------------------------ >Brick 10.0.231.91:/data/storage_a/storage N/A N/A N > N/A >Brick 10.0.231.92:/data/storage_b/storage 49152 0 Y > 1627 >Brick 10.0.231.93:/data/storage_c/storage 49152 0 Y > 259966 >Brick 10.0.231.92:/data/storage_a/storage 49153 0 Y > 1642 >Brick 10.0.231.93:/data/storage_b/storage 49153 0 Y > 259975 >Brick 10.0.231.91:/data/storage_c/storage 49153 0 Y > 20656 >Brick 10.0.231.93:/data/storage_a/storage 49154 0 Y > 259983 >Brick 10.0.231.91:/data/storage_b/storage N/A N/A N > N/A >Brick 
10.0.231.92:/data/storage_c/storage 49154 0 Y > 1655 >Self-heal Daemon on localhost N/A N/A Y > 20690 >Quota Daemon on localhost N/A N/A Y > 172136 >Self-heal Daemon on 10.0.231.93 N/A N/A Y > 260010 >Quota Daemon on 10.0.231.93 N/A N/A Y > 128115 >Self-heal Daemon on 10.0.231.92 N/A N/A Y > 1702 >Quota Daemon on 10.0.231.92 N/A N/A Y > 128564 > >Task Status of Volume storage >------------------------------------------------------------------------------ >There are no active volume tasks > >Logs: > >[2020-08-13 20:58:22.186540] I [MSGID: 106143] >[glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick >(null) on port 49154 >[2020-08-13 20:58:22.196110] I [MSGID: 106005] >[glusterd-handler.c:5960:__glusterd_brick_rpc_notify] 0-management: >Brick 10.0.231.91:/data/storage_b/storage has disconnected from >glusterd. >[2020-08-13 20:58:22.196752] I [MSGID: 106143] >[glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick >/data/storage_b/storage on port 49154 > >[2020-08-13 21:05:23.418966] I [MSGID: 106143] >[glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick >(null) on port 49152 >[2020-08-13 21:05:23.420881] I [MSGID: 106005] >[glusterd-handler.c:5960:__glusterd_brick_rpc_notify] 0-management: >Brick 10.0.231.91:/data/storage_a/storage has disconnected from >glusterd. 
>[2020-08-13 21:05:23.421334] I [MSGID: 106143] >[glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick >/data/storage_a/storage on port 49152 > > > >[Thu Aug 13 13:58:17 2020] Out of memory: Kill process 20664 >(glusterfsd) score 422 or sacrifice child >[Thu Aug 13 13:58:17 2020] Killed process 20664 (glusterfsd), UID 0, >total-vm:32884384kB, anon-rss:29625096kB, file-rss:0kB, shmem-rss:0kB > >[Thu Aug 13 14:05:18 2020] Out of memory: Kill process 20647 >(glusterfsd) score 467 or sacrifice child >[Thu Aug 13 14:05:18 2020] Killed process 20647 (glusterfsd), UID 0, >total-vm:36265116kB, anon-rss:32767744kB, file-rss:520kB, >shmem-rss:0kB0 > > > >glustershd logs: > >[2020-08-13 20:58:22.181368] W [socket.c:775:__socket_rwv] >0-storage-client-7: readv on 10.0.231.91:49154 failed (No data >available) >[2020-08-13 20:58:22.185413] I [MSGID: 114018] >[client.c:2347:client_rpc_notify] 0-storage-client-7: disconnected from >storage-client-7. Client process will keep trying to connect to >glusterd until brick's port is available >[2020-08-13 20:58:25.211872] E [MSGID: 114058] >[client-handshake.c:1455:client_query_portmap_cbk] 0-storage-client-7: >failed to get the port number for remote subvolume. Please run 'gluster >volume status' on server to see if brick process is running. >[2020-08-13 20:58:25.211934] I [MSGID: 114018] >[client.c:2347:client_rpc_notify] 0-storage-client-7: disconnected from >storage-client-7. Client process will keep trying to connect to >glusterd until brick's port is available >[2020-08-13 21:00:28.386633] I [socket.c:865:__socket_shutdown] >0-storage-client-7: intentional socket shutdown(8) >[2020-08-13 21:02:34.565373] I [socket.c:865:__socket_shutdown] >0-storage-client-7: intentional socket shutdown(8) >[2020-08-13 21:02:58.000263] W [MSGID: 114031] >[client-rpc-fops_v2.c:920:client4_0_getxattr_cbk] 0-storage-client-7: >remote operation failed. Path: / >(00000000-0000-0000-0000-000000000001). 
Key: trusted.glusterfs.pathinfo >[Transport endpoint is not connected] >[2020-08-13 21:02:58.000460] W [MSGID: 114029] >[client-rpc-fops_v2.c:4469:client4_0_getxattr] 0-storage-client-7: >failed to send the fop >[2020-08-13 21:04:40.733823] I [socket.c:865:__socket_shutdown] >0-storage-client-7: intentional socket shutdown(8) >[2020-08-13 21:05:23.418987] W [socket.c:775:__socket_rwv] >0-storage-client-0: readv on 10.0.231.91:49152 failed (No data >available) >[2020-08-13 21:05:23.419365] I [MSGID: 114018] >[client.c:2347:client_rpc_notify] 0-storage-client-0: disconnected from >storage-client-0. Client process will keep trying to connect to >glusterd until brick's port is available >[2020-08-13 21:05:26.423218] E [MSGID: 114058] >[client-handshake.c:1455:client_query_portmap_cbk] 0-storage-client-0: >failed to get the port number for remote subvolume. Please run 'gluster >volume status' on server to see if brick process is running. >[2020-08-13 21:06:46.919942] I [socket.c:865:__socket_shutdown] >0-storage-client-7: intentional socket shutdown(8) >[2020-08-13 21:05:26.423274] I [MSGID: 114018] >[client.c:2347:client_rpc_notify] 0-storage-client-0: disconnected from >storage-client-0. 
Client process will keep trying to connect to
>glusterd until brick's port is available
>[2020-08-13 21:07:29.667896] I [socket.c:865:__socket_shutdown]
>0-storage-client-0: intentional socket shutdown(8)
>[2020-08-13 21:08:05.660858] I [MSGID: 100041]
>[glusterfsd-mgmt.c:1111:glusterfs_handle_svc_attach] 0-glusterfs:
>received attach request for volfile-id=shd/storage
>[2020-08-13 21:08:05.660948] I [MSGID: 100040]
>[glusterfsd-mgmt.c:106:mgmt_process_volfile] 0-glusterfs: No change in
>volfile, continuing
>[2020-08-13 21:08:05.661326] I [rpc-clnt.c:1963:rpc_clnt_reconfig]
>0-storage-client-7: changing port to 49154 (from 0)
>[2020-08-13 21:08:05.664638] I [MSGID: 114057]
>[client-handshake.c:1375:select_server_supported_programs]
>0-storage-client-7: Using Program GlusterFS 4.x v1, Num (1298437),
>Version (400)
>[2020-08-13 21:08:05.665266] I [MSGID: 114046]
>[client-handshake.c:1105:client_setvolume_cbk] 0-storage-client-7:
>Connected to storage-client-7, attached to remote volume
>'/data/storage_b/storage'.
>[2020-08-13 21:08:05.713533] I [rpc-clnt.c:1963:rpc_clnt_reconfig]
>0-storage-client-0: changing port to 49152 (from 0)
>[2020-08-13 21:08:05.716535] I [MSGID: 114057]
>[client-handshake.c:1375:select_server_supported_programs]
>0-storage-client-0: Using Program GlusterFS 4.x v1, Num (1298437),
>Version (400)
>[2020-08-13 21:08:05.717224] I [MSGID: 114046]
>[client-handshake.c:1105:client_setvolume_cbk] 0-storage-client-0:
>Connected to storage-client-0, attached to remote volume
>'/data/storage_a/storage'.
>
>
>Thanks,
> -Matthew

From ssivakum at redhat.com  Sat Aug 15 10:57:01 2020
From: ssivakum at redhat.com (Srijan Sivakumar)
Date: Sat, 15 Aug 2020 16:27:01 +0530
Subject: [Gluster-users] Wrong directory quota usage
In-Reply-To: 
References: 
Message-ID: 

Hi João,

The quota accounting error is what we're looking at here. I think you've already looked into the blog post by Hari and are using the script to fix the accounting issue.
That should help you out in fixing this issue.

Let me know if you face any issues while using it.

Regards,
Srijan Sivakumar


On Fri 14 Aug, 2020, 17:10 João Baúto, wrote:

> Hi Strahil,
>
> I have tried removing the quota for that specific directory and setting it
> again but it didn't work (maybe it has to be a quota disable and enable
> in the volume options). Currently testing a solution by Hari with the
> quota_fsck.py script (https://medium.com/@harigowtham/glusterfs-quota-fix-accounting-840df33fcd3a)
> and it's detecting a lot of size mismatches in files.
>
> Thank you,
> *João Baúto*
> ---------------
>
> *Scientific Computing and Software Platform*
> Champalimaud Research
> Champalimaud Center for the Unknown
> Av. Brasília, Doca de Pedrouços
> 1400-038 Lisbon, Portugal
> fchampalimaud.org
>
>
> Strahil Nikolov wrote on Friday, 14/08/2020 at 10:16:
>
>> Hi João,
>>
>> Based on your output it seems that the quota size is different on the 2
>> bricks.
>>
>> Have you tried to remove the quota and then recreate it? Maybe it will
>> be the easiest way to fix it.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>>
>> On 14 August 2020 at 4:35:14 GMT+03:00, "João Baúto" <
>> joao.bauto at neuro.fchampalimaud.org> wrote:
>> >Hi all,
>> >
>> >We have a 4-node distributed cluster with 2 bricks per node running
>> >Gluster 7.7 + ZFS. We use directory quota to limit the space used by our
>> >members on each project. Two days ago we noticed inconsistent space used
>> >reported by Gluster in the quota list.
>> >
>> >A small snippet of gluster volume quota vol list,
>> >
>> >    Path        Hard-limit   Soft-limit     Used    Available  Soft-limit exceeded?  Hard-limit exceeded?
>> >/projectA      5.0TB        80%(4.0TB)     3.1TB      1.9TB            No                    No
>> >*/projectB    100.0TB       80%(80.0TB)  16383.4PB  740.9TB            No                    No*
>> >/projectC     70.0TB        80%(56.0TB)   50.0TB     20.0TB            No                    No
>> >
>> >The total space available in the cluster is 360TB, the quota for projectB
>> >is 100TB and, as you can see, it's reporting 16383.4PB used and 740TB
>> >available (already decreased from 750TB).
>> >
>> >There was an issue in Gluster 3.x related to wrong directory quota
>> >(
>> >https://lists.gluster.org/pipermail/gluster-users/2016-February/025305.html
>> > and
>> >https://lists.gluster.org/pipermail/gluster-users/2018-November/035374.html
>> >)
>> >but it's marked as solved (not sure if the solution still applies).
>> >
>> >*On projectB*
>> ># getfattr -d -m . -e hex projectB
>> ># file: projectB
>> >trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c
>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f35e69800098ed9
>> >trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc
>> >trusted.glusterfs.mdata=0x010000000000000000000000005f355c59000000000939079f000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0
>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000ab0f227a860000000000478e33acffffffffffffc112
>> >trusted.glusterfs.quota.dirty=0x3000
>> >trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff
>> >trusted.glusterfs.quota.size.1=0x0000ab0f227a860000000000478e33acffffffffffffc112
>> >
>> >*On projectA*
>> ># getfattr -d -m .
-e hex projectA
>> ># file: projectA
>> >trusted.gfid=0x05b09ded19354c0eb544d22d4659582e
>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f1aeb9f00044c64
>> >trusted.glusterfs.dht=0xe1a4060c000000001fffffff3ffffffd
>> >trusted.glusterfs.mdata=0x010000000000000000000000005f1ac6a10000000018f30a4e000000005c338fab0000000017a3135a000000005b0694fb000000001584a21b
>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000067de3bbe20000000000000128610000000000033498
>> >trusted.glusterfs.quota.dirty=0x3000
>> >trusted.glusterfs.quota.limit-set.1=0x0000460000000000ffffffffffffffff
>> >trusted.glusterfs.quota.size.1=0x0000067de3bbe20000000000000128610000000000033498
>> >
>> >Any idea on what's happening and how to fix it?
>> >
>> >Thanks!
>> >*João Baúto*
>> >---------------
>> >
>> >*Scientific Computing and Software Platform*
>> >Champalimaud Research
>> >Champalimaud Center for the Unknown
>> >Av. Brasília, Doca de Pedrouços
>> >1400-038 Lisbon, Portugal
>> >fchampalimaud.org
>>
> ________
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>

From ian.geiser at hiveio.com  Sat Aug 15 15:24:19 2020
From: ian.geiser at hiveio.com (Ian Geiser)
Date: Sat, 15 Aug 2020 11:24:19 -0400
Subject: [Gluster-users] Passive detection of self-heal
Message-ID: 

Greetings, I am trying to monitor the start/stop of a selfheal in the cluster without needing to poll the cli. Is there a passive way to monitor if the cluster is in a state of selfheal? It looked like checking the xattrop directory for a file count worked in some cases, but it was not accurate. Is there a better way? Thanks!
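[Editorial note] The xattrop check mentioned in the question above can be sketched in a few lines. This is an illustrative, untested sketch rather than anything from the original thread: the brick path in BRICK_PATHS is a placeholder, and it assumes the pending-heal index is the standard .glusterfs/indices/xattrop directory inside each brick, where the persistent xattrop-<gfid> base entry is skipped and the remaining entries are counted:

```python
#!/usr/bin/env python3
"""Rough sketch of a passive pending-heal check via the xattrop index.

Assumptions (not from the original mails): bricks live at the paths in
BRICK_PATHS, and pending-heal entries are hard links inside each brick's
.glusterfs/indices/xattrop directory, alongside a base entry named
"xattrop-<gfid>" that is always present and must be skipped."""
import os

BRICK_PATHS = ["/data/storage_a/storage"]  # hypothetical brick list

def pending_heal_count(brick_root):
    """Count index entries under a brick, excluding the base file."""
    index_dir = os.path.join(brick_root, ".glusterfs", "indices", "xattrop")
    try:
        entries = os.listdir(index_dir)
    except FileNotFoundError:
        return 0  # not a brick, or index not created yet
    return sum(1 for e in entries if not e.startswith("xattrop-"))

if __name__ == "__main__":
    for brick in BRICK_PATHS:
        print(brick, pending_heal_count(brick))
```

Polling this count from a local script avoids shelling out to `gluster vol heal <vol> info`, though, as the question itself notes, the index is only an approximation of what the self-heal daemon will actually process.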
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From joao.bauto at neuro.fchampalimaud.org  Sat Aug 15 23:27:56 2020
From: joao.bauto at neuro.fchampalimaud.org (João Baúto)
Date: Sun, 16 Aug 2020 00:27:56 +0100
Subject: [Gluster-users] Wrong directory quota usage
In-Reply-To: 
References: 
Message-ID: 

Hi Srijan & Strahil,

I ran the quota_fsck script mentioned in Hari's blog post on all bricks
and it detected a lot of size mismatches.

The script was executed as,

   - python quota_fsck.py --sub-dir projectB --fix-issues /mnt/tank
   /tank/volume2/brick (on all nodes and bricks)

Here is a snippet from the script,

Size Mismatch /tank/volume2/brick/projectB {'parents': {'00000000-0000-0000-0000-000000000001': {'contri_file_count': 18446744073035296610L, 'contri_size': 18446645297413872640L, 'contri_dir_count': 18446744073709527653L}}, 'version': '1', 'file_count': 18446744073035296610L, 'dirty': False, 'dir_count': 18446744073709527653L, 'size': 18446645297413872640L} 15204281691754
MARKING DIRTY: /tank/volume2/brick/projectB
stat on /mnt/tank/projectB
Files verified : 683223
Directories verified : 46823
Objects Fixed : 705230

Checking the xattr in the bricks I can see the directory in question
marked as dirty,
# getfattr -d -m.
-e hex /tank/volume2/brick/projectB
getfattr: Removing leading '/' from absolute path names
# file: tank/volume2/brick/projectB
trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c
trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f372478000a7705
trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc
trusted.glusterfs.mdata=0x010000000000000000000000005f3724750000000013ddf679000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea
trusted.glusterfs.quota.dirty=0x3100
trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff
trusted.glusterfs.quota.size.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea

Now, my question is how do I trigger Gluster to recalculate the quota for
this directory? Is it automatic but it takes a while? Because the quota
list did change but not to a good "result".

    Path        Hard-limit   Soft-limit     Used    Available  Soft-limit exceeded?  Hard-limit exceeded?
/projectB     100.0TB       80%(80.0TB)  16383.9PB  190.1TB           No                    No

I would like to avoid a disable/enable quota in the volume as it removes
the configs.

Thank you for all the help!
*João Baúto*
---------------

*Scientific Computing and Software Platform*
Champalimaud Research
Champalimaud Center for the Unknown
Av. Brasília, Doca de Pedrouços
1400-038 Lisbon, Portugal
fchampalimaud.org


Srijan Sivakumar wrote on Saturday, 15/08/2020 at 11:57:

> Hi João,
>
> The quota accounting error is what we're looking at here. I think you've
> already looked into the blog post by Hari and are using the script to fix
> the accounting issue.
> That should help you out in fixing this issue.
>
> Let me know if you face any issues while using it.
>
> Regards,
> Srijan Sivakumar
>
>
> On Fri 14 Aug, 2020, 17:10 João Baúto,
> wrote:
>
>> Hi Strahil,
>>
>> I have tried removing the quota for that specific directory and setting
>> it again but it didn't work (maybe it has to be a quota disable and enable
>> in the volume options). Currently testing a solution by Hari with the
>> quota_fsck.py script (https://medium.com/@harigowtham/glusterfs-quota-fix-accounting-840df33fcd3a)
>> and it's detecting a lot of size mismatches in files.
>>
>> Thank you,
>> *João Baúto*
>> ---------------
>>
>> *Scientific Computing and Software Platform*
>> Champalimaud Research
>> Champalimaud Center for the Unknown
>> Av. Brasília, Doca de Pedrouços
>> 1400-038 Lisbon, Portugal
>> fchampalimaud.org
>>
>>
>> Strahil Nikolov wrote on Friday, 14/08/2020 at 10:16:
>>
>>> Hi João,
>>>
>>> Based on your output it seems that the quota size is different on the 2
>>> bricks.
>>> >/projectA 5.0TB 80%(4.0TB) 3.1TB 1.9TB >>> > No No >>> >*/projectB 100.0TB 80%(80.0TB) 16383.4PB 740.9TB >>> > No No* >>> >/projectC 70.0TB 80%(56.0TB) 50.0TB 20.0TB >>> > No No >>> > >>> >The total space available in the cluster is 360TB, the quota for >>> >projectB >>> >is 100TB and, as you can see, its reporting 16383.4PB used and 740TB >>> >available (already decreased from 750TB). >>> > >>> >There was an issue in Gluster 3.x related to the wrong directory quota >>> >( >>> > >>> https://lists.gluster.org/pipermail/gluster-users/2016-February/025305.html >>> > and >>> > >>> https://lists.gluster.org/pipermail/gluster-users/2018-November/035374.html >>> ) >>> >but it's marked as solved (not sure if the solution still applies). >>> > >>> >*On projectB* >>> ># getfattr -d -m . -e hex projectB >>> ># file: projectB >>> >trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c >>> >>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f35e69800098ed9 >>> >trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc >>> >>> >trusted.glusterfs.mdata=0x010000000000000000000000005f355c59000000000939079f000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0 >>> >>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 >>> >trusted.glusterfs.quota.dirty=0x3000 >>> >trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff >>> >>> >trusted.glusterfs.quota.size.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 >>> > >>> >*On projectA* >>> ># getfattr -d -m . 
-e hex projectA >>> ># file: projectA >>> >trusted.gfid=0x05b09ded19354c0eb544d22d4659582e >>> >>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f1aeb9f00044c64 >>> >trusted.glusterfs.dht=0xe1a4060c000000001fffffff3ffffffd >>> >>> >trusted.glusterfs.mdata=0x010000000000000000000000005f1ac6a10000000018f30a4e000000005c338fab0000000017a3135a000000005b0694fb000000001584a21b >>> >>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000067de3bbe20000000000000128610000000000033498 >>> >trusted.glusterfs.quota.dirty=0x3000 >>> >trusted.glusterfs.quota.limit-set.1=0x0000460000000000ffffffffffffffff >>> >>> >trusted.glusterfs.quota.size.1=0x0000067de3bbe20000000000000128610000000000033498 >>> > >>> >Any idea on what's happening and how to fix it? >>> > >>> >Thanks! >>> >*Jo?o Ba?to* >>> >--------------- >>> > >>> >*Scientific Computing and Software Platform* >>> >Champalimaud Research >>> >Champalimaud Center for the Unknown >>> >Av. Bras?lia, Doca de Pedrou?os >>> >1400-038 Lisbon, Portugal >>> >fchampalimaud.org >>> >> ________ >> >> >> >> Community Meeting Calendar: >> >> Schedule - >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> Bridge: https://bluejeans.com/441850968 >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ssivakum at redhat.com Sun Aug 16 05:10:46 2020 From: ssivakum at redhat.com (Srijan Sivakumar) Date: Sun, 16 Aug 2020 10:40:46 +0530 Subject: [Gluster-users] Wrong directory quota usage In-Reply-To: References: Message-ID: Hi Jo?o, Yes it'll take some time given the file system size as it has to change the xattrs in each level and then crawl upwards. stat is done by the script itself so the crawl is initiated. 
Regards, Srijan Sivakumar On Sun 16 Aug, 2020, 04:58 Jo?o Ba?to, wrote: > Hi Srijan & Strahil, > > I ran the quota_fsck script mentioned in Hari's blog post in all bricks > and it detected a lot of size mismatch. > > The script was executed as, > > - python quota_fsck.py --sub-dir projectB --fix-issues /mnt/tank > /tank/volume2/brick (in all nodes and bricks) > > Here is a snippet from the script, > > Size Mismatch /tank/volume2/brick/projectB {'parents': > {'00000000-0000-0000-0000-000000000001': {'contri_file_count': > 18446744073035296610L, 'contri_size': 18446645297413872640L, > 'contri_dir_count': 18446744073709527653L}}, 'version': '1', 'file_count': > 18446744073035296610L, 'dirty': False, 'dir_count': 18446744073709527653L, > 'size': 18446645297413872640L} 15204281691754 > MARKING DIRTY: /tank/volume2/brick/projectB > stat on /mnt/tank/projectB > Files verified : 683223 > Directories verified : 46823 > Objects Fixed : 705230 > > Checking the xattr in the bricks I can see the directory in question > marked as dirty, > # getfattr -d -m. -e hex /tank/volume2/brick/projectB > getfattr: Removing leading '/' from absolute path names > # file: tank/volume2/brick/projectB > trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c > > trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f372478000a7705 > trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc > > trusted.glusterfs.mdata=0x010000000000000000000000005f3724750000000013ddf679000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0 > > trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea > trusted.glusterfs.quota.dirty=0x3100 > trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff > > trusted.glusterfs.quota.size.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea > > Now, my question is how do I trigger Gluster to recalculate the quota for > this directory? Is it automatic but it takes a while? 
Because the quota > list did change but not to a good "result". > > Path Hard-limit Soft-limit Used > Available Soft-limit exceeded? Hard-limit exceeded? > /projectB 100.0TB 80%(80.0TB) 16383.9PB 190.1TB > No No > > I would like to avoid a disable/enable quota in the volume as it removes > the configs. > > Thank you for all the help! > *Jo?o Ba?to* > --------------- > > *Scientific Computing and Software Platform* > Champalimaud Research > Champalimaud Center for the Unknown > Av. Bras?lia, Doca de Pedrou?os > 1400-038 Lisbon, Portugal > fchampalimaud.org > > > Srijan Sivakumar escreveu no dia s?bado, 15/08/2020 > ?(s) 11:57: > >> Hi Jo?o, >> >> The quota accounting error is what we're looking at here. I think you've >> already looked into the blog post by Hari and are using the script to fix >> the accounting issue. >> That should help you out in fixing this issue. >> >> Let me know if you face any issues while using it. >> >> Regards, >> Srijan Sivakumar >> >> >> On Fri 14 Aug, 2020, 17:10 Jo?o Ba?to, < >> joao.bauto at neuro.fchampalimaud.org> wrote: >> >>> Hi Strahil, >>> >>> I have tried removing the quota for that specific directory and setting >>> it again but it didn't work (maybe it has to be a quota disable and enable >>> in the volume options). Currently testing a solution >>> by Hari with the quota_fsck.py script (https://medium.com/@harigowtham/ >>> glusterfs-quota-fix-accounting-840df33fcd3a) and its detecting a lot of >>> size mismatch in files. >>> >>> Thank you, >>> *Jo?o Ba?to* >>> --------------- >>> >>> *Scientific Computing and Software Platform* >>> Champalimaud Research >>> Champalimaud Center for the Unknown >>> Av. Bras?lia, Doca de Pedrou?os >>> 1400-038 Lisbon, Portugal >>> fchampalimaud.org >>> >>> >>> Strahil Nikolov escreveu no dia sexta, >>> 14/08/2020 ?(s) 10:16: >>> >>>> Hi Jo?o, >>>> >>>> Based on your output it seems that the quota size is different on the 2 >>>> bricks. 
>>>> >>>> Have you tried to remove the quota and then recreate it ? Maybe it will >>>> be the easiest way to fix it. >>>> >>>> Best Regards, >>>> Strahil Nikolov >>>> >>>> >>>> ?? 14 ?????? 2020 ?. 4:35:14 GMT+03:00, "Jo?o Ba?to" < >>>> joao.bauto at neuro.fchampalimaud.org> ??????: >>>> >Hi all, >>>> > >>>> >We have a 4-node distributed cluster with 2 bricks per node running >>>> >Gluster >>>> >7.7 + ZFS. We use directory quota to limit the space used by our >>>> >members on >>>> >each project. Two days ago we noticed inconsistent space used reported >>>> >by >>>> >Gluster in the quota list. >>>> > >>>> >A small snippet of gluster volume quota vol list, >>>> > >>>> > Path Hard-limit Soft-limit Used >>>> >Available Soft-limit exceeded? Hard-limit exceeded? >>>> >/projectA 5.0TB 80%(4.0TB) 3.1TB 1.9TB >>>> > No No >>>> >*/projectB 100.0TB 80%(80.0TB) 16383.4PB 740.9TB >>>> > No No* >>>> >/projectC 70.0TB 80%(56.0TB) 50.0TB 20.0TB >>>> > No No >>>> > >>>> >The total space available in the cluster is 360TB, the quota for >>>> >projectB >>>> >is 100TB and, as you can see, its reporting 16383.4PB used and 740TB >>>> >available (already decreased from 750TB). >>>> > >>>> >There was an issue in Gluster 3.x related to the wrong directory quota >>>> >( >>>> > >>>> https://lists.gluster.org/pipermail/gluster-users/2016-February/025305.html >>>> > and >>>> > >>>> https://lists.gluster.org/pipermail/gluster-users/2018-November/035374.html >>>> ) >>>> >but it's marked as solved (not sure if the solution still applies). >>>> > >>>> >*On projectB* >>>> ># getfattr -d -m . 
-e hex projectB >>>> ># file: projectB >>>> >trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c >>>> >>>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f35e69800098ed9 >>>> >trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc >>>> >>>> >trusted.glusterfs.mdata=0x010000000000000000000000005f355c59000000000939079f000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0 >>>> >>>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 >>>> >trusted.glusterfs.quota.dirty=0x3000 >>>> >trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff >>>> >>>> >trusted.glusterfs.quota.size.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 >>>> > >>>> >*On projectA* >>>> ># getfattr -d -m . -e hex projectA >>>> ># file: projectA >>>> >trusted.gfid=0x05b09ded19354c0eb544d22d4659582e >>>> >>>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f1aeb9f00044c64 >>>> >trusted.glusterfs.dht=0xe1a4060c000000001fffffff3ffffffd >>>> >>>> >trusted.glusterfs.mdata=0x010000000000000000000000005f1ac6a10000000018f30a4e000000005c338fab0000000017a3135a000000005b0694fb000000001584a21b >>>> >>>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000067de3bbe20000000000000128610000000000033498 >>>> >trusted.glusterfs.quota.dirty=0x3000 >>>> >trusted.glusterfs.quota.limit-set.1=0x0000460000000000ffffffffffffffff >>>> >>>> >trusted.glusterfs.quota.size.1=0x0000067de3bbe20000000000000128610000000000033498 >>>> > >>>> >Any idea on what's happening and how to fix it? >>>> > >>>> >Thanks! >>>> >*Jo?o Ba?to* >>>> >--------------- >>>> > >>>> >*Scientific Computing and Software Platform* >>>> >Champalimaud Research >>>> >Champalimaud Center for the Unknown >>>> >Av. 
Brasília, Doca de Pedrouços
>>>> >1400-038 Lisbon, Portugal
>>>> >fchampalimaud.org

From matthewb at uvic.ca Mon Aug 17 20:46:24 2020
From: matthewb at uvic.ca (Matthew Benstead)
Date: Mon, 17 Aug 2020 13:46:24 -0700
Subject: [Gluster-users] Geo-replication causes OOM
In-Reply-To: <7A418761-7531-4B2F-9430-20976A22FFE6@yahoo.com>
References: <7d66d907-0802-4aeb-961e-964dd401fe24@uvic.ca> <7A418761-7531-4B2F-9430-20976A22FFE6@yahoo.com>
Message-ID: <74f920aa-6ef0-c49f-99dd-ab23b5821279@uvic.ca>

Thanks Strahil,

Would the geo rep process be the gsyncd.py processes? It seems like it's the glusterfsd and auxiliary mounts that are holding all the memory right now...

Could this be related to the open-behind bug mentioned here: https://github.com/gluster/glusterfs/issues/1444 and here: https://github.com/gluster/glusterfs/issues/1440 ?

Thanks,
-Matthew

Matthew Benstead
System Administrator
Pacific Climate Impacts Consortium
University of Victoria, UH1
PO Box 1800, STN CSC
Victoria, BC, V8W 2Y2
Phone: 1-250-721-8432
Email: matthewb at uvic.ca

On 2020-08-14 10:35 p.m., Strahil Nikolov wrote:
> Hey Matthew,
>
> Can you check with valgrind the memory leak ?
>
> It will be something like:
> Find the geo rep process via ps and note all parameters it was started with.
> Next stop geo rep.
>
> Then start it with valgrind:
> valgrind --log-file="filename" --tool=memcheck --leak-check=full
>
> It might help narrowing the problem.
>
> Best Regards,
> Strahil Nikolov
>
> On 14 August 2020 at
20:22:16 GMT+03:00, Matthew Benstead wrote:
>> Hi,
>>
>> We are building a new storage system, and after geo-replication has been running for a few hours the server runs out of memory and oom-killer starts killing bricks. It runs fine without geo-replication on, and the server has 64GB of RAM. I have stopped geo-replication for now.
>>
>> Any ideas what to tune?
>>
>> [root at storage01 ~]# gluster --version | head -1
>> glusterfs 7.7
>>
>> [root at storage01 ~]# cat /etc/centos-release; uname -r
>> CentOS Linux release 7.8.2003 (Core)
>> 3.10.0-1127.10.1.el7.x86_64
>>
>> [root at storage01 ~]# df -h /storage2/
>> Filesystem            Size  Used Avail Use% Mounted on
>> 10.0.231.91:/storage  328T  228T  100T  70% /storage2
>>
>> [root at storage01 ~]# cat /proc/meminfo | grep MemTotal
>> MemTotal:       65412064 kB
>>
>> [root at storage01 ~]# free -g
>>               total        used        free      shared  buff/cache   available
>> Mem:             62          18           0           0          43          43
>> Swap:             3           0
3 >> >> >> [root at storage01 ~]# gluster volume info >> >> Volume Name: storage >> Type: Distributed-Replicate >> Volume ID: cf94a8f2-324b-40b3-bf72-c3766100ea99 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 3 x (2 + 1) = 9 >> Transport-type: tcp >> Bricks: >> Brick1: 10.0.231.91:/data/storage_a/storage >> Brick2: 10.0.231.92:/data/storage_b/storage >> Brick3: 10.0.231.93:/data/storage_c/storage (arbiter) >> Brick4: 10.0.231.92:/data/storage_a/storage >> Brick5: 10.0.231.93:/data/storage_b/storage >> Brick6: 10.0.231.91:/data/storage_c/storage (arbiter) >> Brick7: 10.0.231.93:/data/storage_a/storage >> Brick8: 10.0.231.91:/data/storage_b/storage >> Brick9: 10.0.231.92:/data/storage_c/storage (arbiter) >> Options Reconfigured: >> changelog.changelog: on >> geo-replication.ignore-pid-check: on >> geo-replication.indexing: on >> network.ping-timeout: 10 >> features.inode-quota: on >> features.quota: on >> nfs.disable: on >> features.quota-deem-statfs: on >> storage.fips-mode-rchecksum: on >> performance.readdir-ahead: on >> performance.parallel-readdir: on >> cluster.lookup-optimize: on >> client.event-threads: 4 >> server.event-threads: 4 >> performance.cache-size: 256MB >> >> You can see the memory spike and reduce as bricks are killed - this >> happened twice in the graph below: >> >> >> >> You can see two brick processes are down: >> >> [root at storage01 ~]# gluster volume status >> Status of volume: storage >> Gluster process TCP Port RDMA Port Online >> Pid >> ------------------------------------------------------------------------------ >> Brick 10.0.231.91:/data/storage_a/storage N/A N/A N >> N/A >> Brick 10.0.231.92:/data/storage_b/storage 49152 0 Y >> 1627 >> Brick 10.0.231.93:/data/storage_c/storage 49152 0 Y >> 259966 >> Brick 10.0.231.92:/data/storage_a/storage 49153 0 Y >> 1642 >> Brick 10.0.231.93:/data/storage_b/storage 49153 0 Y >> 259975 >> Brick 10.0.231.91:/data/storage_c/storage 49153 0 Y >> 20656 >> Brick 
10.0.231.93:/data/storage_a/storage 49154 0 Y >> 259983 >> Brick 10.0.231.91:/data/storage_b/storage N/A N/A N >> N/A >> Brick 10.0.231.92:/data/storage_c/storage 49154 0 Y >> 1655 >> Self-heal Daemon on localhost N/A N/A Y >> 20690 >> Quota Daemon on localhost N/A N/A Y >> 172136 >> Self-heal Daemon on 10.0.231.93 N/A N/A Y >> 260010 >> Quota Daemon on 10.0.231.93 N/A N/A Y >> 128115 >> Self-heal Daemon on 10.0.231.92 N/A N/A Y >> 1702 >> Quota Daemon on 10.0.231.92 N/A N/A Y >> 128564 >> >> Task Status of Volume storage >> ------------------------------------------------------------------------------ >> There are no active volume tasks >> >> Logs: >> >> [2020-08-13 20:58:22.186540] I [MSGID: 106143] >> [glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick >> (null) on port 49154 >> [2020-08-13 20:58:22.196110] I [MSGID: 106005] >> [glusterd-handler.c:5960:__glusterd_brick_rpc_notify] 0-management: >> Brick 10.0.231.91:/data/storage_b/storage has disconnected from >> glusterd. >> [2020-08-13 20:58:22.196752] I [MSGID: 106143] >> [glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick >> /data/storage_b/storage on port 49154 >> >> [2020-08-13 21:05:23.418966] I [MSGID: 106143] >> [glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick >> (null) on port 49152 >> [2020-08-13 21:05:23.420881] I [MSGID: 106005] >> [glusterd-handler.c:5960:__glusterd_brick_rpc_notify] 0-management: >> Brick 10.0.231.91:/data/storage_a/storage has disconnected from >> glusterd. 
>> [2020-08-13 21:05:23.421334] I [MSGID: 106143] >> [glusterd-pmap.c:389:pmap_registry_remove] 0-pmap: removing brick >> /data/storage_a/storage on port 49152 >> >> >> >> [Thu Aug 13 13:58:17 2020] Out of memory: Kill process 20664 >> (glusterfsd) score 422 or sacrifice child >> [Thu Aug 13 13:58:17 2020] Killed process 20664 (glusterfsd), UID 0, >> total-vm:32884384kB, anon-rss:29625096kB, file-rss:0kB, shmem-rss:0kB >> >> [Thu Aug 13 14:05:18 2020] Out of memory: Kill process 20647 >> (glusterfsd) score 467 or sacrifice child >> [Thu Aug 13 14:05:18 2020] Killed process 20647 (glusterfsd), UID 0, >> total-vm:36265116kB, anon-rss:32767744kB, file-rss:520kB, >> shmem-rss:0kB0 >> >> >> >> glustershd logs: >> >> [2020-08-13 20:58:22.181368] W [socket.c:775:__socket_rwv] >> 0-storage-client-7: readv on 10.0.231.91:49154 failed (No data >> available) >> [2020-08-13 20:58:22.185413] I [MSGID: 114018] >> [client.c:2347:client_rpc_notify] 0-storage-client-7: disconnected from >> storage-client-7. Client process will keep trying to connect to >> glusterd until brick's port is available >> [2020-08-13 20:58:25.211872] E [MSGID: 114058] >> [client-handshake.c:1455:client_query_portmap_cbk] 0-storage-client-7: >> failed to get the port number for remote subvolume. Please run 'gluster >> volume status' on server to see if brick process is running. >> [2020-08-13 20:58:25.211934] I [MSGID: 114018] >> [client.c:2347:client_rpc_notify] 0-storage-client-7: disconnected from >> storage-client-7. Client process will keep trying to connect to >> glusterd until brick's port is available >> [2020-08-13 21:00:28.386633] I [socket.c:865:__socket_shutdown] >> 0-storage-client-7: intentional socket shutdown(8) >> [2020-08-13 21:02:34.565373] I [socket.c:865:__socket_shutdown] >> 0-storage-client-7: intentional socket shutdown(8) >> [2020-08-13 21:02:58.000263] W [MSGID: 114031] >> [client-rpc-fops_v2.c:920:client4_0_getxattr_cbk] 0-storage-client-7: >> remote operation failed. 
Path: / >> (00000000-0000-0000-0000-000000000001). Key: trusted.glusterfs.pathinfo >> [Transport endpoint is not connected] >> [2020-08-13 21:02:58.000460] W [MSGID: 114029] >> [client-rpc-fops_v2.c:4469:client4_0_getxattr] 0-storage-client-7: >> failed to send the fop >> [2020-08-13 21:04:40.733823] I [socket.c:865:__socket_shutdown] >> 0-storage-client-7: intentional socket shutdown(8) >> [2020-08-13 21:05:23.418987] W [socket.c:775:__socket_rwv] >> 0-storage-client-0: readv on 10.0.231.91:49152 failed (No data >> available) >> [2020-08-13 21:05:23.419365] I [MSGID: 114018] >> [client.c:2347:client_rpc_notify] 0-storage-client-0: disconnected from >> storage-client-0. Client process will keep trying to connect to >> glusterd until brick's port is available >> [2020-08-13 21:05:26.423218] E [MSGID: 114058] >> [client-handshake.c:1455:client_query_portmap_cbk] 0-storage-client-0: >> failed to get the port number for remote subvolume. Please run 'gluster >> volume status' on server to see if brick process is running. >> [2020-08-13 21:06:46.919942] I [socket.c:865:__socket_shutdown] >> 0-storage-client-7: intentional socket shutdown(8) >> [2020-08-13 21:05:26.423274] I [MSGID: 114018] >> [client.c:2347:client_rpc_notify] 0-storage-client-0: disconnected from >> storage-client-0. 
Client process will keep trying to connect to >> glusterd until brick's port is available >> [2020-08-13 21:07:29.667896] I [socket.c:865:__socket_shutdown] >> 0-storage-client-0: intentional socket shutdown(8) >> [2020-08-13 21:08:05.660858] I [MSGID: 100041] >> [glusterfsd-mgmt.c:1111:glusterfs_handle_svc_attach] 0-glusterfs: >> received attach request for volfile-id=shd/storage >> [2020-08-13 21:08:05.660948] I [MSGID: 100040] >> [glusterfsd-mgmt.c:106:mgmt_process_volfile] 0-glusterfs: No change in >> volfile, continuing >> [2020-08-13 21:08:05.661326] I [rpc-clnt.c:1963:rpc_clnt_reconfig] >> 0-storage-client-7: changing port to 49154 (from 0) >> [2020-08-13 21:08:05.664638] I [MSGID: 114057] >> [client-handshake.c:1375:select_server_supported_programs] >> 0-storage-client-7: Using Program GlusterFS 4.x v1, Num (1298437), >> Version (400) >> [2020-08-13 21:08:05.665266] I [MSGID: 114046] >> [client-handshake.c:1105:client_setvolume_cbk] 0-storage-client-7: >> Connected to storage-client-7, attached to remote volume >> '/data/storage_b/storage'. >> [2020-08-13 21:08:05.713533] I [rpc-clnt.c:1963:rpc_clnt_reconfig] >> 0-storage-client-0: changing port to 49152 (from 0) >> [2020-08-13 21:08:05.716535] I [MSGID: 114057] >> [client-handshake.c:1375:select_server_supported_programs] >> 0-storage-client-0: Using Program GlusterFS 4.x v1, Num (1298437), >> Version (400) >> [2020-08-13 21:08:05.717224] I [MSGID: 114046] >> [client-handshake.c:1105:client_setvolume_cbk] 0-storage-client-0: >> Connected to storage-client-0, attached to remote volume >> '/data/storage_a/storage'. >> >> >> Thanks, >> ?-Matthew -------------- next part -------------- An HTML attachment was scrubbed... URL: From gilberto.nunes32 at gmail.com Tue Aug 18 12:56:49 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Tue, 18 Aug 2020 09:56:49 -0300 Subject: [Gluster-users] GlusterFS performance for big files... Message-ID: Hi friends... 
I have a 2-node GlusterFS setup, which has the following configuration:

gluster vol info

Volume Name: VMS
Type: Replicate
Volume ID: a4ec9cfb-1bba-405c-b249-8bd5467e0b91
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server02:/DATA/vms
Brick2: server01:/DATA/vms
Options Reconfigured:
performance.read-ahead: off
performance.io-cache: on
performance.cache-refresh-timeout: 1
performance.cache-size: 1073741824
performance.io-thread-count: 64
performance.write-behind-window-size: 64MB
cluster.granular-entry-heal: enable
cluster.self-heal-daemon: enable
performance.client-io-threads: on
cluster.data-self-heal-algorithm: full
cluster.favorite-child-policy: mtime
network.ping-timeout: 2
cluster.quorum-count: 1
cluster.quorum-reads: false
cluster.heal-timeout: 20
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on

The disks are SSD and SAS.
Network connections between the servers are dedicated 1GB (no switch!).
Files are 500G, 200G, 200G, 250G, 200G and 100G in size.

Performance so far is ok...

Any other advice which could point me, let me know!

Thanks

---
Gilberto Nunes Ferreira

From ykaul at redhat.com Tue Aug 18 13:19:17 2020
From: ykaul at redhat.com (Yaniv Kaul)
Date: Tue, 18 Aug 2020 16:19:17 +0300
Subject: [Gluster-users] GlusterFS performance for big files...
In-Reply-To: References: Message-ID:

On Tue, Aug 18, 2020 at 3:57 PM Gilberto Nunes wrote:

> Hi friends...
> > I have a 2-nodes GlusterFS, with has the follow configuration: > gluster vol info > > Volume Name: VMS > Type: Replicate > Volume ID: a4ec9cfb-1bba-405c-b249-8bd5467e0b91 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 2 = 2 > Transport-type: tcp > Bricks: > Brick1: server02:/DATA/vms > Brick2: server01:/DATA/vms > Options Reconfigured: > performance.read-ahead: off > performance.io-cache: on > performance.cache-refresh-timeout: 1 > performance.cache-size: 1073741824 > performance.io-thread-count: 64 > performance.write-behind-window-size: 64MB > cluster.granular-entry-heal: enable > cluster.self-heal-daemon: enable > performance.client-io-threads: on > cluster.data-self-heal-algorithm: full > cluster.favorite-child-policy: mtime > network.ping-timeout: 2 > cluster.quorum-count: 1 > cluster.quorum-reads: false > cluster.heal-timeout: 20 > storage.fips-mode-rchecksum: on > transport.address-family: inet > nfs.disable: on > > HDDs are SSD and SAS > Network connections between the servers are dedicated 1GB (no switch!). > You can't get good performance on 1Gb. > Files are 500G 200G 200G 250G 200G 100G size each. > > Performance so far so good is ok... > What's your workload? Read? Write? sequential? random? many files? With more bricks and nodes, you should probably use sharding. What are your expectations, btw? Y. > Any other advice which could point me, let me know! > > Thanks > > > > --- > Gilberto Nunes Ferreira > > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... 
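[Editor's note: sharding, as suggested above, is enabled per volume with the gluster CLI. A sketch against the volume name used in this thread (VMS); the option names are standard Gluster ones, but verify the defaults on your version. Only files created after enabling get sharded, and `features.shard` should never be turned off again on a volume that already contains sharded files:]

```shell
# Apply the "virt" option group, tuned for VM image workloads:
gluster volume set VMS group virt

# Or enable sharding explicitly. Only files written after this are sharded;
# never disable features.shard again on a volume holding sharded files.
gluster volume set VMS features.shard on
gluster volume set VMS features.shard-block-size 64MB
```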
URL: From sankarshan.mukhopadhyay at gmail.com Tue Aug 18 13:28:37 2020 From: sankarshan.mukhopadhyay at gmail.com (sankarshan) Date: Tue, 18 Aug 2020 18:58:37 +0530 Subject: [Gluster-users] GlusterFS performance for big files... In-Reply-To: References: Message-ID: On Tue, 18 Aug 2020 at 18:50, Yaniv Kaul wrote: > > > > On Tue, Aug 18, 2020 at 3:57 PM Gilberto Nunes wrote: >> >> Hi friends... >> >> I have a 2-nodes GlusterFS, with has the follow configuration: >> gluster vol info >> I'd be interested in the chosen configuration for this deployment - the 2 node set up. Was there a specific requirement which led to this? >> Volume Name: VMS >> Type: Replicate >> Volume ID: a4ec9cfb-1bba-405c-b249-8bd5467e0b91 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 x 2 = 2 >> Transport-type: tcp >> Bricks: >> Brick1: server02:/DATA/vms >> Brick2: server01:/DATA/vms >> Options Reconfigured: >> performance.read-ahead: off >> performance.io-cache: on >> performance.cache-refresh-timeout: 1 >> performance.cache-size: 1073741824 >> performance.io-thread-count: 64 >> performance.write-behind-window-size: 64MB >> cluster.granular-entry-heal: enable >> cluster.self-heal-daemon: enable >> performance.client-io-threads: on >> cluster.data-self-heal-algorithm: full >> cluster.favorite-child-policy: mtime >> network.ping-timeout: 2 >> cluster.quorum-count: 1 >> cluster.quorum-reads: false >> cluster.heal-timeout: 20 >> storage.fips-mode-rchecksum: on >> transport.address-family: inet >> nfs.disable: on >> >> HDDs are SSD and SAS >> Network connections between the servers are dedicated 1GB (no switch!). > > > You can't get good performance on 1Gb. >> >> Files are 500G 200G 200G 250G 200G 100G size each. >> >> Performance so far so good is ok... > > > What's your workload? Read? Write? sequential? random? many files? > With more bricks and nodes, you should probably use sharding. > > What are your expectations, btw? > Y. 
> >> >> Any other advice which could point me, let me know! >> >> Thanks >> >> >> >> --- >> Gilberto Nunes Ferreira >> >> ________ >> >> >> >> Community Meeting Calendar: >> >> Schedule - >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> Bridge: https://bluejeans.com/441850968 >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- sankarshan mukhopadhyay From gilberto.nunes32 at gmail.com Tue Aug 18 13:47:01 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Tue, 18 Aug 2020 10:47:01 -0300 Subject: [Gluster-users] GlusterFS performance for big files... In-Reply-To: References: Message-ID: >> What's your workload? I have 6 KVM VMs which have Windows and Linux installed on it. >> Read? >> Write? iostat (I am using sdc as the main storage) cavg-cpu: %user %nice %system %iowait %steal %idle 9.15 0.00 1.25 1.38 0.00 88.22 Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util sdc 0.00 1.00 0.00 1.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.50 >> sequential? random? sequential >> many files? 6 files 500G 200G 200G 250G 200G 100G size each. With more bricks and nodes, you should probably use sharding. For now I have only two bricks/nodes.... Plan for more is now out of the question! What are your expectations, btw? I ran many environments with Proxmox Virtual Environment, which use QEMU (not virt) and LXC...But I use majority KVM (QEMU) virtual machines. My goal is to use glusterfs since I think it's more resource demanding such as memory and cpu and nic, when compared to ZFS or CEPH. 
--- Gilberto Nunes Ferreira (47) 3025-5907 (47) 99676-7530 - Whatsapp / Telegram Skype: gilberto.nunes36 Em ter., 18 de ago. de 2020 ?s 10:29, sankarshan < sankarshan.mukhopadhyay at gmail.com> escreveu: > On Tue, 18 Aug 2020 at 18:50, Yaniv Kaul wrote: > > > > > > > > On Tue, Aug 18, 2020 at 3:57 PM Gilberto Nunes < > gilberto.nunes32 at gmail.com> wrote: > >> > >> Hi friends... > >> > >> I have a 2-nodes GlusterFS, with has the follow configuration: > >> gluster vol info > >> > > I'd be interested in the chosen configuration for this deployment - > the 2 node set up. Was there a specific requirement which led to this? > > >> Volume Name: VMS > >> Type: Replicate > >> Volume ID: a4ec9cfb-1bba-405c-b249-8bd5467e0b91 > >> Status: Started > >> Snapshot Count: 0 > >> Number of Bricks: 1 x 2 = 2 > >> Transport-type: tcp > >> Bricks: > >> Brick1: server02:/DATA/vms > >> Brick2: server01:/DATA/vms > >> Options Reconfigured: > >> performance.read-ahead: off > >> performance.io-cache: on > >> performance.cache-refresh-timeout: 1 > >> performance.cache-size: 1073741824 > >> performance.io-thread-count: 64 > >> performance.write-behind-window-size: 64MB > >> cluster.granular-entry-heal: enable > >> cluster.self-heal-daemon: enable > >> performance.client-io-threads: on > >> cluster.data-self-heal-algorithm: full > >> cluster.favorite-child-policy: mtime > >> network.ping-timeout: 2 > >> cluster.quorum-count: 1 > >> cluster.quorum-reads: false > >> cluster.heal-timeout: 20 > >> storage.fips-mode-rchecksum: on > >> transport.address-family: inet > >> nfs.disable: on > >> > >> HDDs are SSD and SAS > >> Network connections between the servers are dedicated 1GB (no switch!). > > > > > > You can't get good performance on 1Gb. > >> > >> Files are 500G 200G 200G 250G 200G 100G size each. > >> > >> Performance so far so good is ok... > > > > > > What's your workload? Read? Write? sequential? random? many files? > > With more bricks and nodes, you should probably use sharding. 
> > > > What are your expectations, btw? > > Y. > > > >> > >> Any other advice which could point me, let me know! > >> > >> Thanks > >> > >> > >> > >> --- > >> Gilberto Nunes Ferreira > >> > >> ________ > >> > >> > >> > >> Community Meeting Calendar: > >> > >> Schedule - > >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > >> Bridge: https://bluejeans.com/441850968 > >> > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > ________ > > > > > > > > Community Meeting Calendar: > > > > Schedule - > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > > Bridge: https://bluejeans.com/441850968 > > > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > sankarshan mukhopadhyay > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Tue Aug 18 17:03:06 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Tue, 18 Aug 2020 20:03:06 +0300 Subject: [Gluster-users] GlusterFS performance for big files... In-Reply-To: References: Message-ID: <3D24023B-D4C8-4E23-8421-E660CD057E50@yahoo.com> There is a 'virt' group optimized for virtual workloads. Usually I recommend to start from ground up in order to optimize on all levels. - I/O scheduler of the bricks (either (mq-)deadline or noop/none) - CPU cstates - Tuned profile (swappiness, dirty settings) - MTU of the gluster network, the bigger the better - Gluster tunables (virt group is a good start) If your gluster nodes are actually in the cloud, it is recommended (at least for AWS) to use a stripe over 8 virtual disks for each brick. Keep in mind that shard size on RH Gluster Storage is using 512MB while the default on community edition is 64MB. Best Regards, Strahil Nikolov ?? 18 ?????? 2020 ?. 16:47:01 GMT+03:00, Gilberto Nunes ??????: >>> What's your workload? 
>I have 6 KVM VMs which have Windows and Linux installed on it. > >>> Read? >>> Write? >iostat (I am using sdc as the main storage) >cavg-cpu: %user %nice %system %iowait %steal %idle > 9.15 0.00 1.25 1.38 0.00 88.22 > >Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s >%rrqm > %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util >sdc 0.00 1.00 0.00 1.50 0.00 0.00 >0.00 > 0.00 0.00 0.00 0.00 0.00 1.50 > > >>> sequential? random? >sequential >>> many files? >6 files 500G 200G 200G 250G 200G 100G size each. >With more bricks and nodes, you should probably use sharding. >For now I have only two bricks/nodes.... Plan for more is now out of >the >question! > >What are your expectations, btw? > >I ran many environments with Proxmox Virtual Environment, which use >QEMU >(not virt) and LXC...But I use majority KVM (QEMU) virtual machines. >My goal is to use glusterfs since I think it's more resource demanding >such >as memory and cpu and nic, when compared to ZFS or CEPH. > > >--- >Gilberto Nunes Ferreira > >(47) 3025-5907 >(47) 99676-7530 - Whatsapp / Telegram > >Skype: gilberto.nunes36 > > > > > >Em ter., 18 de ago. de 2020 ?s 10:29, sankarshan < >sankarshan.mukhopadhyay at gmail.com> escreveu: > >> On Tue, 18 Aug 2020 at 18:50, Yaniv Kaul wrote: >> > >> > >> > >> > On Tue, Aug 18, 2020 at 3:57 PM Gilberto Nunes < >> gilberto.nunes32 at gmail.com> wrote: >> >> >> >> Hi friends... >> >> >> >> I have a 2-nodes GlusterFS, with has the follow configuration: >> >> gluster vol info >> >> >> >> I'd be interested in the chosen configuration for this deployment - >> the 2 node set up. Was there a specific requirement which led to >this? 
>> >> >> Volume Name: VMS >> >> Type: Replicate >> >> Volume ID: a4ec9cfb-1bba-405c-b249-8bd5467e0b91 >> >> Status: Started >> >> Snapshot Count: 0 >> >> Number of Bricks: 1 x 2 = 2 >> >> Transport-type: tcp >> >> Bricks: >> >> Brick1: server02:/DATA/vms >> >> Brick2: server01:/DATA/vms >> >> Options Reconfigured: >> >> performance.read-ahead: off >> >> performance.io-cache: on >> >> performance.cache-refresh-timeout: 1 >> >> performance.cache-size: 1073741824 >> >> performance.io-thread-count: 64 >> >> performance.write-behind-window-size: 64MB >> >> cluster.granular-entry-heal: enable >> >> cluster.self-heal-daemon: enable >> >> performance.client-io-threads: on >> >> cluster.data-self-heal-algorithm: full >> >> cluster.favorite-child-policy: mtime >> >> network.ping-timeout: 2 >> >> cluster.quorum-count: 1 >> >> cluster.quorum-reads: false >> >> cluster.heal-timeout: 20 >> >> storage.fips-mode-rchecksum: on >> >> transport.address-family: inet >> >> nfs.disable: on >> >> >> >> HDDs are SSD and SAS >> >> Network connections between the servers are dedicated 1GB (no >switch!). >> > >> > >> > You can't get good performance on 1Gb. >> >> >> >> Files are 500G 200G 200G 250G 200G 100G size each. >> >> >> >> Performance so far so good is ok... >> > >> > >> > What's your workload? Read? Write? sequential? random? many files? >> > With more bricks and nodes, you should probably use sharding. >> > >> > What are your expectations, btw? >> > Y. >> > >> >> >> >> Any other advice which could point me, let me know! 
>> >> Thanks
>> >>
>> >> ---
>> >> Gilberto Nunes Ferreira
>>
>> --
>> sankarshan mukhopadhyay

From joao.bauto at neuro.fchampalimaud.org Tue Aug 18 20:15:31 2020
From: joao.bauto at neuro.fchampalimaud.org (João Baúto)
Date: Tue, 18 Aug 2020 21:15:31 +0100
Subject: [Gluster-users] Wrong directory quota usage
In-Reply-To: References: Message-ID:

Hi Srijan,

Is there a way of getting the status of the crawl process? We are going to expand this cluster, adding 12 new bricks (around 500TB), and we rely heavily on the quota feature to control the space usage for each project. It's been running since Saturday (nothing changed) and I am unsure whether it's going to finish tomorrow or in weeks.

Thank you!
*João Baúto*
---------------

*Scientific Computing and Software Platform*
Champalimaud Research
Champalimaud Center for the Unknown
Av. Brasília, Doca de Pedrouços
1400-038 Lisbon, Portugal
fchampalimaud.org

Srijan Sivakumar wrote on Sunday, 16/08/2020 at 06:11:

> Hi João,
>
> Yes, it'll take some time given the file system size, as it has to change the xattrs at each level and then crawl upwards.
>
> stat is done by the script itself, so the crawl is initiated.
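[Editor's note: the crawl can also be kicked off by hand. The quota_fsck.py log earlier in the thread shows a "stat on /mnt/tank/projectB" step, and Srijan notes above that the stat from the script is what initiates the crawl. A sketch of such a recursive stat — whether a plain stat is sufficient to force re-accounting on every Gluster version is an assumption:]

```python
import os

def crawl_stat(mountpoint: str) -> int:
    """Recursively stat every entry below `mountpoint` (meant to be a
    glusterfs client mount, e.g. /mnt/tank/projectB from this thread)
    so each lookup reaches the bricks. Returns the entries stat-ed."""
    touched = 1
    os.stat(mountpoint)
    for root, dirs, files in os.walk(mountpoint):
        for name in dirs + files:
            os.stat(os.path.join(root, name))
            touched += 1
    return touched
```

On a real volume this must be pointed at a client (or auxiliary) mount of the volume, never at the brick path itself, so that the marker translator sees the lookups.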
> > Regards, > Srijan Sivakumar > > On Sun 16 Aug, 2020, 04:58 Jo?o Ba?to, > wrote: > >> Hi Srijan & Strahil, >> >> I ran the quota_fsck script mentioned in Hari's blog post in all bricks >> and it detected a lot of size mismatch. >> >> The script was executed as, >> >> - python quota_fsck.py --sub-dir projectB --fix-issues /mnt/tank >> /tank/volume2/brick (in all nodes and bricks) >> >> Here is a snippet from the script, >> >> Size Mismatch /tank/volume2/brick/projectB {'parents': >> {'00000000-0000-0000-0000-000000000001': {'contri_file_count': >> 18446744073035296610L, 'contri_size': 18446645297413872640L, >> 'contri_dir_count': 18446744073709527653L}}, 'version': '1', 'file_count': >> 18446744073035296610L, 'dirty': False, 'dir_count': 18446744073709527653L, >> 'size': 18446645297413872640L} 15204281691754 >> MARKING DIRTY: /tank/volume2/brick/projectB >> stat on /mnt/tank/projectB >> Files verified : 683223 >> Directories verified : 46823 >> Objects Fixed : 705230 >> >> Checking the xattr in the bricks I can see the directory in question >> marked as dirty, >> # getfattr -d -m. 
-e hex /tank/volume2/brick/projectB >> getfattr: Removing leading '/' from absolute path names >> # file: tank/volume2/brick/projectB >> trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c >> >> trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f372478000a7705 >> trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc >> >> trusted.glusterfs.mdata=0x010000000000000000000000005f3724750000000013ddf679000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0 >> >> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea >> trusted.glusterfs.quota.dirty=0x3100 >> trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff >> >> trusted.glusterfs.quota.size.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea >> >> Now, my question is how do I trigger Gluster to recalculate the quota for >> this directory? Is it automatic but it takes a while? Because the quota >> list did change, but not to a good "result". >> >> Path Hard-limit Soft-limit Used >> Available Soft-limit exceeded? Hard-limit exceeded? >> /projectB 100.0TB 80%(80.0TB) 16383.9PB 190.1TB >> No No >> >> I would like to avoid a quota disable/enable on the volume, as it removes >> the configs. >> >> Thank you for all the help! >> *João Baúto* >> --------------- >> >> *Scientific Computing and Software Platform* >> Champalimaud Research >> Champalimaud Center for the Unknown >> Av. Brasília, Doca de Pedrouços >> 1400-038 Lisbon, Portugal >> fchampalimaud.org >> >> >> Srijan Sivakumar wrote on Saturday, >> 15/08/2020 at 11:57: >> >>> Hi João, >>> >>> The quota accounting error is what we're looking at here. I think you've >>> already looked into the blog post by Hari and are using the script to fix >>> the accounting issue. >>> That should help you out in fixing this issue. >>> >>> Let me know if you face any issues while using it.
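[Editor's note] The huge counters in the quota_fsck output quoted above (values like 18446744073035296610) are small negative numbers that have wrapped around when printed as unsigned 64-bit integers; that same underflow is what produces the impossible "16383.xPB" figures in quota list. A minimal sketch in plain Python (no Gluster required) that reinterprets the printed values as signed:

```python
import struct

def to_signed64(value):
    # Reinterpret an unsigned 64-bit integer as two's-complement signed.
    return struct.unpack("<q", struct.pack("<Q", value))[0]

# Counter values exactly as printed by quota_fsck.py for the broken
# contribution record on /tank/volume2/brick/projectB:
for name, value in [
    ("contri_size", 18446645297413872640),
    ("contri_file_count", 18446744073035296610),
    ("contri_dir_count", 18446744073709527653),
]:
    print(name, "=", to_signed64(value))
```

All three come out negative (about -98.8 TB, -674255006 files, -23963 directories): the contribution counters have gone below zero, rather than the directory actually holding exabytes of data.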
> >>> Regards, >>> Srijan Sivakumar >>> >>> On Fri 14 Aug, 2020, 17:10 João Baúto, < >>> joao.bauto at neuro.fchampalimaud.org> wrote: >>> >>>> Hi Strahil, >>>> >>>> I have tried removing the quota for that specific directory and setting >>>> it again, but it didn't work (maybe it has to be a quota disable and enable >>>> in the volume options). Currently testing a solution >>>> by Hari with the quota_fsck.py script (https://medium.com/@harigowtham/ >>>> glusterfs-quota-fix-accounting-840df33fcd3a), and it's detecting a lot >>>> of size mismatches in files. >>>> >>>> Thank you, >>>> *João Baúto* >>>> --------------- >>>> >>>> *Scientific Computing and Software Platform* >>>> Champalimaud Research >>>> Champalimaud Center for the Unknown >>>> Av. Brasília, Doca de Pedrouços >>>> 1400-038 Lisbon, Portugal >>>> fchampalimaud.org >>>> >>>> >>>> Strahil Nikolov wrote on Friday, >>>> 14/08/2020 at 10:16: >>>>> >>>>> Hi João, >>>>> >>>>> Based on your output it seems that the quota size is different on the >>>>> 2 bricks. >>>>> >>>>> Have you tried to remove the quota and then recreate it? Maybe it >>>>> will be the easiest way to fix it. >>>>> >>>>> Best Regards, >>>>> Strahil Nikolov >>>>> >>>>> >>>>> On 14 August 2020 at 4:35:14 GMT+03:00, "João Baúto" < >>>>> joao.bauto at neuro.fchampalimaud.org> wrote: >Hi all, >>>>> > >>>>> >We have a 4-node distributed cluster with 2 bricks per node running >>>>> >Gluster >>>>> >7.7 + ZFS. We use directory quota to limit the space used by our >>>>> >members on >>>>> >each project. Two days ago we noticed inconsistent space used reported >>>>> >by >>>>> >Gluster in the quota list. >>>>> > >>>>> >A small snippet of gluster volume quota vol list, >>>>> > >>>>> > Path Hard-limit Soft-limit Used >>>>> >Available Soft-limit exceeded? Hard-limit exceeded?
>>>>> >/projectA 5.0TB 80%(4.0TB) 3.1TB >>>>> 1.9TB >>>>> > No No >>>>> >*/projectB 100.0TB 80%(80.0TB) 16383.4PB 740.9TB >>>>> > No No* >>>>> >/projectC 70.0TB 80%(56.0TB) 50.0TB 20.0TB >>>>> > No No >>>>> > >>>>> >The total space available in the cluster is 360TB, the quota for >>>>> >projectB >>>>> >is 100TB and, as you can see, it's reporting 16383.4PB used and 740TB >>>>> >available (already decreased from 750TB). >>>>> > >>>>> >There was an issue in Gluster 3.x related to wrong directory quota >>>>> >( >>>>> > >>>>> https://lists.gluster.org/pipermail/gluster-users/2016-February/025305.html >>>>> > and >>>>> > >>>>> https://lists.gluster.org/pipermail/gluster-users/2018-November/035374.html >>>>> ) >>>>> >but it's marked as solved (not sure if the solution still applies). >>>>> > >>>>> >*On projectB* >>>>> ># getfattr -d -m . -e hex projectB >>>>> ># file: projectB >>>>> >trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c >>>>> >>>>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f35e69800098ed9 >>>>> >trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc >>>>> >>>>> >trusted.glusterfs.mdata=0x010000000000000000000000005f355c59000000000939079f000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0 >>>>> >>>>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 >>>>> >trusted.glusterfs.quota.dirty=0x3000 >>>>> >trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff >>>>> >>>>> >trusted.glusterfs.quota.size.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 >>>>> > >>>>> >*On projectA* >>>>> ># getfattr -d -m .
-e hex projectA >>>>> ># file: projectA >>>>> >trusted.gfid=0x05b09ded19354c0eb544d22d4659582e >>>>> >>>>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f1aeb9f00044c64 >>>>> >trusted.glusterfs.dht=0xe1a4060c000000001fffffff3ffffffd >>>>> >>>>> >trusted.glusterfs.mdata=0x010000000000000000000000005f1ac6a10000000018f30a4e000000005c338fab0000000017a3135a000000005b0694fb000000001584a21b >>>>> >>>>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000067de3bbe20000000000000128610000000000033498 >>>>> >trusted.glusterfs.quota.dirty=0x3000 >>>>> >trusted.glusterfs.quota.limit-set.1=0x0000460000000000ffffffffffffffff >>>>> >>>>> >trusted.glusterfs.quota.size.1=0x0000067de3bbe20000000000000128610000000000033498 >>>>> > >>>>> >Any idea on what's happening and how to fix it? >>>>> > >>>>> >Thanks! >>>>> >*João Baúto* >>>>> >--------------- >>>>> > >>>>> >*Scientific Computing and Software Platform* >>>>> >Champalimaud Research >>>>> >Champalimaud Center for the Unknown >>>>> >Av. Brasília, Doca de Pedrouços >>>>> >1400-038 Lisbon, Portugal >>>>> >fchampalimaud.org >>>>> >>>> ________ >>>> >>>> >>>> >>>> Community Meeting Calendar: >>>> >>>> Schedule - >>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>>> Bridge: https://bluejeans.com/441850968 >>>> >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From ssivakum at redhat.com Tue Aug 18 20:42:22 2020 From: ssivakum at redhat.com (Srijan Sivakumar) Date: Wed, 19 Aug 2020 02:12:22 +0530 Subject: [Gluster-users] Wrong directory quota usage In-Reply-To: References: Message-ID: Hi João, There isn't a straightforward way of tracking the crawl, but as gluster uses find and stat during the crawl, one can run the following command, # ps aux | grep find If the output is of the form, "root 1513 0.0 0.1 127224 2636 ?
S 12:24 0.00 /usr/bin/find . -exec /usr/bin/stat {} \" then it means that the crawl is still going on. Thanks and Regards, SRIJAN SIVAKUMAR Associate Software Engineer Red Hat T: +91-9727532362 TRIED. TESTED. TRUSTED. On Wed, Aug 19, 2020 at 1:46 AM João Baúto < joao.bauto at neuro.fchampalimaud.org> wrote: > [...]
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From joao.bauto at neuro.fchampalimaud.org Tue Aug 18 22:23:19 2020 From: joao.bauto at neuro.fchampalimaud.org (João Baúto) Date: Tue, 18 Aug 2020 23:23:19 +0100 Subject: [Gluster-users] Wrong directory quota usage In-Reply-To: References: Message-ID: Hi Srijan, I didn't get any result with that command, so I went to our other cluster (we are merging two clusters; data is replicated) and activated the quota feature on the same directory. Running the same command on each node, I get output similar to yours; one process per brick, I'm assuming. root 1746822 1.4 0.0 230324 2992 ? S 23:06 0:04 /usr/bin/find . -exec /usr/bin/stat {} \ ; root 1746858 5.3 0.0 233924 6644 ? S 23:06 0:15 /usr/bin/find . -exec /usr/bin/stat {} \ ; root 1746889 3.3 0.0 233592 6452 ? S 23:06 0:10 /usr/bin/find . -exec /usr/bin/stat {} \ ; root 1746930 3.1 0.0 230476 3232 ? S 23:06 0:09 /usr/bin/find . -exec /usr/bin/stat {} \ ; At this point, is it easier to just disable and re-enable the feature and force a new crawl? We don't mind a temporary increase in CPU and IO usage. Thank you again! *João Baúto* --------------- *Scientific Computing and Software Platform* Champalimaud Research Champalimaud Center for the Unknown Av.
Brasília, Doca de Pedrouços 1400-038 Lisbon, Portugal fchampalimaud.org Srijan Sivakumar wrote on Tuesday, 18/08/2020 at 21:42: > [...]
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From ssivakum at redhat.com Wed Aug 19 06:24:41 2020 From: ssivakum at redhat.com (Srijan Sivakumar) Date: Wed, 19 Aug 2020 11:54:41 +0530 Subject: [Gluster-users] Wrong directory quota usage In-Reply-To: References: Message-ID: Hi João, If the crawl is not going on and the values are still not reflecting properly, then it means the crawl process has ended abruptly.
Yes, technically disabling and enabling the quota will trigger crawl but it'd do a complete crawl of the filesystem, hence would take time and be resource consuming. Usually disabling-enabling is the last thing to do if the accounting isn't reflecting properly but if you're going to merge these two clusters then probably you can go ahead with the merging and then enable quota. -- Thanks and Regards, SRIJAN SIVAKUMAR Associate Software Engineer Red Hat T: +91-9727532362 TRIED. TESTED. TRUSTED. On Wed, Aug 19, 2020 at 3:53 AM Jo?o Ba?to < joao.bauto at neuro.fchampalimaud.org> wrote: > Hi Srijan, > > I didn't get any result with that command so I went to our other cluster > (we are merging two clusters, data is replicated) and activated the quota > feature on the same directory. Running the same command on each node I get > a similar output to yours. One process per brick I'm assuming. > > root 1746822 1.4 0.0 230324 2992 ? S 23:06 0:04 > /usr/bin/find . -exec /usr/bin/stat {} \ ; > root 1746858 5.3 0.0 233924 6644 ? S 23:06 0:15 > /usr/bin/find . -exec /usr/bin/stat {} \ ; > root 1746889 3.3 0.0 233592 6452 ? S 23:06 0:10 > /usr/bin/find . -exec /usr/bin/stat {} \ ; > root 1746930 3.1 0.0 230476 3232 ? S 23:06 0:09 > /usr/bin/find . -exec /usr/bin/stat {} \ ; > > At this point, is it easier to just disable and enable the feature and > force a new crawl? We don't mind a temporary increase in CPU and IO usage. > > Thank you again! > *Jo?o Ba?to* > --------------- > > *Scientific Computing and Software Platform* > Champalimaud Research > Champalimaud Center for the Unknown > Av. 
Bras?lia, Doca de Pedrou?os > 1400-038 Lisbon, Portugal > fchampalimaud.org > > > Srijan Sivakumar escreveu no dia ter?a, 18/08/2020 > ?(s) 21:42: > >> Hi Jo?o, >> >> There isn't a straightforward way of tracking the crawl but as gluster >> uses find and stat during crawl, one can run the following command, >> # ps aux | grep find >> >> If the output is of the form, >> "root 1513 0.0 0.1 127224 2636 ? S 12:24 0.00 >> /usr/bin/find . -exec /usr/bin/stat {} \" >> then it means that the crawl is still going on. >> >> >> Thanks and Regards, >> >> SRIJAN SIVAKUMAR >> >> Associate Software Engineer >> >> Red Hat >> >> >> >> >> >> T: +91-9727532362 >> >> >> TRIED. TESTED. TRUSTED. >> >> >> On Wed, Aug 19, 2020 at 1:46 AM Jo?o Ba?to < >> joao.bauto at neuro.fchampalimaud.org> wrote: >> >>> Hi Srijan, >>> >>> Is there a way of getting the status of the crawl process? >>> We are going to expand this cluster, adding 12 new bricks (around 500TB) >>> and we rely heavily on the quota feature to control the space usage for >>> each project. It's been running since Saturday (nothing changed) and >>> unsure if it's going to finish tomorrow or in weeks. >>> >>> Thank you! >>> *Jo?o Ba?to* >>> --------------- >>> >>> *Scientific Computing and Software Platform* >>> Champalimaud Research >>> Champalimaud Center for the Unknown >>> Av. Bras?lia, Doca de Pedrou?os >>> 1400-038 Lisbon, Portugal >>> fchampalimaud.org >>> >>> >>> Srijan Sivakumar escreveu no dia domingo, >>> 16/08/2020 ?(s) 06:11: >>> >>>> Hi Jo?o, >>>> >>>> Yes it'll take some time given the file system size as it has to change >>>> the xattrs in each level and then crawl upwards. >>>> >>>> stat is done by the script itself so the crawl is initiated. 
>>>> >>>> Regards, >>>> Srijan Sivakumar >>>> >>>> On Sun 16 Aug, 2020, 04:58 Jo?o Ba?to, < >>>> joao.bauto at neuro.fchampalimaud.org> wrote: >>>> >>>>> Hi Srijan & Strahil, >>>>> >>>>> I ran the quota_fsck script mentioned in Hari's blog post in all >>>>> bricks and it detected a lot of size mismatch. >>>>> >>>>> The script was executed as, >>>>> >>>>> - python quota_fsck.py --sub-dir projectB --fix-issues /mnt/tank >>>>> /tank/volume2/brick (in all nodes and bricks) >>>>> >>>>> Here is a snippet from the script, >>>>> >>>>> Size Mismatch /tank/volume2/brick/projectB {'parents': >>>>> {'00000000-0000-0000-0000-000000000001': {'contri_file_count': >>>>> 18446744073035296610L, 'contri_size': 18446645297413872640L, >>>>> 'contri_dir_count': 18446744073709527653L}}, 'version': '1', 'file_count': >>>>> 18446744073035296610L, 'dirty': False, 'dir_count': 18446744073709527653L, >>>>> 'size': 18446645297413872640L} 15204281691754 >>>>> MARKING DIRTY: /tank/volume2/brick/projectB >>>>> stat on /mnt/tank/projectB >>>>> Files verified : 683223 >>>>> Directories verified : 46823 >>>>> Objects Fixed : 705230 >>>>> >>>>> Checking the xattr in the bricks I can see the directory in question >>>>> marked as dirty, >>>>> # getfattr -d -m. 
-e hex /tank/volume2/brick/projectB >>>>> getfattr: Removing leading '/' from absolute path names >>>>> # file: tank/volume2/brick/projectB >>>>> trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c >>>>> >>>>> trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f372478000a7705 >>>>> trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc >>>>> >>>>> trusted.glusterfs.mdata=0x010000000000000000000000005f3724750000000013ddf679000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0 >>>>> >>>>> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea >>>>> trusted.glusterfs.quota.dirty=0x3100 >>>>> trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff >>>>> >>>>> trusted.glusterfs.quota.size.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea >>>>> >>>>> Now, my question is how do I trigger Gluster to recalculate the quota >>>>> for this directory? Is it automatic but it takes a while? Because the quota >>>>> list did change but not to a good "result". >>>>> >>>>> Path Hard-limit Soft-limit Used >>>>> Available Soft-limit exceeded? Hard-limit exceeded? >>>>> /projectB 100.0TB 80%(80.0TB) 16383.9PB 190.1TB >>>>> No No >>>>> >>>>> I would like to avoid a disable/enable quota in the volume as it >>>>> removes the configs. >>>>> >>>>> Thank you for all the help! >>>>> *Jo?o Ba?to* >>>>> --------------- >>>>> >>>>> *Scientific Computing and Software Platform* >>>>> Champalimaud Research >>>>> Champalimaud Center for the Unknown >>>>> Av. Bras?lia, Doca de Pedrou?os >>>>> 1400-038 Lisbon, Portugal >>>>> fchampalimaud.org >>>>> >>>>> >>>>> Srijan Sivakumar escreveu no dia s?bado, >>>>> 15/08/2020 ?(s) 11:57: >>>>> >>>>>> Hi Jo?o, >>>>>> >>>>>> The quota accounting error is what we're looking at here. I think >>>>>> you've already looked into the blog post by Hari and are using the script >>>>>> to fix the accounting issue. >>>>>> That should help you out in fixing this issue. 
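The xattr values quoted above can be decoded directly. A hedged sketch, assuming the 24-byte `trusted.glusterfs.quota.size.1`/`.contri.1` payload packs three big-endian signed 64-bit fields (size in bytes, file count, directory count), which is the quota xattr layout used since the `.1`-suffixed format appeared; treat the field layout as an assumption and verify against your version:

```shell
# Decode a trusted.glusterfs.quota.size.1 / .contri.1 hex payload.
# Assumed layout: size (bytes), file count, dir count -- each a big-endian
# signed 64-bit integer. Bash arithmetic is 64-bit signed, so a counter that
# underflowed shows up as a negative number.
decode_quota_size() {
    local hex="$1"
    echo "size=$((16#${hex:0:16})) files=$((16#${hex:16:16})) dirs=$((16#${hex:32:16}))"
}

# Healthy post-fsck value from projectB above:
decode_quota_size "00000ca6ccf7a80000000000000790a1000000000000b6ea"
# -> size=13910542886912 files=495777 dirs=46826  (~12.7 TiB, ~46.8k dirs)

# The broken contri value reported for projectB earlier in the thread:
decode_quota_size "0000ab0f227a860000000000478e33acffffffffffffc112"
# dirs decodes to -16110: an underflowed counter like this is what the
# quota list renders as an absurd "16383.4PB used".
```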
>>>>>> Let me know if you face any issues while using it. >>>>>> >>>>>> Regards, >>>>>> Srijan Sivakumar >>>>>> >>>>>> >>>>>> On Fri 14 Aug, 2020, 17:10 João Baúto, < >>>>>> joao.bauto at neuro.fchampalimaud.org> wrote: >>>>>> >>>>>>> Hi Strahil, >>>>>>> >>>>>>> I have tried removing the quota for that specific directory and >>>>>>> setting it again but it didn't work (maybe it has to be a quota disable and >>>>>>> enable in the volume options). Currently testing a solution >>>>>>> by Hari with the quota_fsck.py script (https://medium.com/@ >>>>>>> harigowtham/glusterfs-quota-fix-accounting-840df33fcd3a) and it's >>>>>>> detecting a lot of size mismatches in files. >>>>>>> >>>>>>> Thank you, >>>>>>> *João Baúto* >>>>>>> --------------- >>>>>>> >>>>>>> *Scientific Computing and Software Platform* >>>>>>> Champalimaud Research >>>>>>> Champalimaud Center for the Unknown >>>>>>> Av. Brasília, Doca de Pedrouços >>>>>>> 1400-038 Lisbon, Portugal >>>>>>> fchampalimaud.org >>>>>>> >>>>>>> >>>>>>> Strahil Nikolov wrote on Friday, >>>>>>> 14/08/2020 at 10:16: >>>>>>> >>>>>>>> Hi João, >>>>>>>> >>>>>>>> Based on your output it seems that the quota size is different on >>>>>>>> the 2 bricks. >>>>>>>> >>>>>>>> Have you tried to remove the quota and then recreate it? Maybe it >>>>>>>> will be the easiest way to fix it. >>>>>>>> >>>>>>>> Best Regards, >>>>>>>> Strahil Nikolov >>>>>>>> >>>>>>>> >>>>>>>> On 14 August 2020 at 4:35:14 GMT+03:00, "João Baúto" < >>>>>>>> joao.bauto at neuro.fchampalimaud.org> wrote: >>>>>>>> >Hi all, >>>>>>>> > >>>>>>>> >We have a 4-node distributed cluster with 2 bricks per node running >>>>>>>> >Gluster >>>>>>>> >7.7 + ZFS. We use directory quota to limit the space used by our >>>>>>>> >members on >>>>>>>> >each project. Two days ago we noticed inconsistent space used >>>>>>>> reported >>>>>>>> >by >>>>>>>> >Gluster in the quota list.
>>>>>>>> > >>>>>>>> >A small snippet of gluster volume quota vol list, >>>>>>>> > >>>>>>>> > Path Hard-limit Soft-limit Used >>>>>>>> >Available Soft-limit exceeded? Hard-limit exceeded? >>>>>>>> >/projectA 5.0TB 80%(4.0TB) 3.1TB >>>>>>>> 1.9TB >>>>>>>> > No No >>>>>>>> >*/projectB 100.0TB 80%(80.0TB) 16383.4PB 740.9TB >>>>>>>> > No No* >>>>>>>> >/projectC 70.0TB 80%(56.0TB) 50.0TB >>>>>>>> 20.0TB >>>>>>>> > No No >>>>>>>> > >>>>>>>> >The total space available in the cluster is 360TB, the quota for >>>>>>>> >projectB >>>>>>>> >is 100TB and, as you can see, its reporting 16383.4PB used and >>>>>>>> 740TB >>>>>>>> >available (already decreased from 750TB). >>>>>>>> > >>>>>>>> >There was an issue in Gluster 3.x related to the wrong directory >>>>>>>> quota >>>>>>>> >( >>>>>>>> > >>>>>>>> https://lists.gluster.org/pipermail/gluster-users/2016-February/025305.html >>>>>>>> > and >>>>>>>> > >>>>>>>> https://lists.gluster.org/pipermail/gluster-users/2018-November/035374.html >>>>>>>> ) >>>>>>>> >but it's marked as solved (not sure if the solution still applies). >>>>>>>> > >>>>>>>> >*On projectB* >>>>>>>> ># getfattr -d -m . 
-e hex projectB >>>>>>>> ># file: projectB >>>>>>>> >trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c >>>>>>>> >>>>>>>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f35e69800098ed9 >>>>>>>> >trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc >>>>>>>> >>>>>>>> >trusted.glusterfs.mdata=0x010000000000000000000000005f355c59000000000939079f000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0 >>>>>>>> >>>>>>>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 >>>>>>>> >trusted.glusterfs.quota.dirty=0x3000 >>>>>>>> >>>>>>>> >trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff >>>>>>>> >>>>>>>> >trusted.glusterfs.quota.size.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 >>>>>>>> > >>>>>>>> >*On projectA* >>>>>>>> ># getfattr -d -m . -e hex projectA >>>>>>>> ># file: projectA >>>>>>>> >trusted.gfid=0x05b09ded19354c0eb544d22d4659582e >>>>>>>> >>>>>>>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f1aeb9f00044c64 >>>>>>>> >trusted.glusterfs.dht=0xe1a4060c000000001fffffff3ffffffd >>>>>>>> >>>>>>>> >trusted.glusterfs.mdata=0x010000000000000000000000005f1ac6a10000000018f30a4e000000005c338fab0000000017a3135a000000005b0694fb000000001584a21b >>>>>>>> >>>>>>>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000067de3bbe20000000000000128610000000000033498 >>>>>>>> >trusted.glusterfs.quota.dirty=0x3000 >>>>>>>> >>>>>>>> >trusted.glusterfs.quota.limit-set.1=0x0000460000000000ffffffffffffffff >>>>>>>> >>>>>>>> >trusted.glusterfs.quota.size.1=0x0000067de3bbe20000000000000128610000000000033498 >>>>>>>> > >>>>>>>> >Any idea on what's happening and how to fix it? >>>>>>>> > >>>>>>>> >Thanks! >>>>>>>> >*Jo?o Ba?to* >>>>>>>> >--------------- >>>>>>>> > >>>>>>>> >*Scientific Computing and Software Platform* >>>>>>>> >Champalimaud Research >>>>>>>> >Champalimaud Center for the Unknown >>>>>>>> >Av. 
Brasília, Doca de Pedrouços >>>>>>>> >1400-038 Lisbon, Portugal >>>>>>>> >fchampalimaud.org >>>>>>> ________ >>>>>>> >>>>>>> >>>>>>> Community Meeting Calendar: >>>>>>> >>>>>>> Schedule - >>>>>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>>>>>> Bridge: https://bluejeans.com/441850968 >>>>>>> >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>> >> >> -- >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vd at d7informatics.de Wed Aug 19 11:21:51 2020 From: vd at d7informatics.de (Volker Dormeyer) Date: Wed, 19 Aug 2020 13:21:51 +0200 Subject: [Gluster-users] Gluster cluster questions Message-ID: <94347073-adf0-b14c-3f7c-e856f364b971@d7informatics.de> Hello All, I am new to this list. I am planning to create a cluster with several Gluster nodes and have the following questions: a) Is it possible to build a cluster with some of the bricks on SSD and some of them on HDD? In the end I want two separate pools: one fast pool with SSD storage and a second, slow pool with HDD. Is there a way to define a scenario like this in Gluster? The question is not related to SSD caching. b) Or shall I split this into two Gluster clusters, one fast and the other one slow? My imagination is to build something like this:

Kubernetes Cluster 1   ...   Kubernetes Cluster N
         |                             |
 storage class slow    <->    storage class fast
-----------------------------------------------------
                 \           /
                    Heketi
                 /           \
   Gluster 1 (fast)           Gluster 2 (slow)

Best Regards, Volker From aravinda at kadalu.io Wed Aug 19 11:47:51 2020 From: aravinda at kadalu.io (Aravinda VK) Date: Wed, 19 Aug 2020 17:17:51 +0530 Subject: [Gluster-users] Gluster cluster questions In-Reply-To: <94347073-adf0-b14c-3f7c-e856f364b971@d7informatics.de> References: <94347073-adf0-b14c-3f7c-e856f364b971@d7informatics.de> Message-ID: <8D98E9C4-673E-4D55-8A88-A89DD2B5A8F2@kadalu.io> Hi Volker, > On 19-Aug-2020, at 4:51 PM, Volker Dormeyer wrote: > > Hello All, > > I am new to this list. > > I am planning to create a cluster with several Gluster nodes and have > the following questions: > > a) Is it possible to build a cluster with some of the bricks on SSD and > some of them on HDD? In the end I want two separate pools: one fast > pool with SSD storage and a second, slow pool with HDD. Is there a way to > define a scenario like this in Gluster? The question is not related to > SSD caching. A cluster in GlusterFS is only required to form the group of nodes from which volumes can be provisioned; disks are not tied to the cluster. Volumes can be created as required, for example one volume using HDDs and another volume using SSDs. (I am not sure whether this flexibility is available in Heketi; it may support tagging the devices and provisioning using those tags.) > > b) Or shall I split this into two Gluster clusters, one fast and > the other one slow? As answered above, splitting the cluster is not required. Create two volumes. > > My imagination is to build something like this: > > Kubernetes Cluster 1 ... Kubernetes Cluster N > | | > storage class slow <-> storage class fast > ----------------------------------------------------- > \ / > Heketi > / \ > Gluster 1 (fast) Gluster 2 (slow) https://kadalu.io can be used as an alternative to Heketi to use GlusterFS with Kubernetes. Create two storage pools, one with SSD and another with HDD, and create a storage class to use each pool.
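Aravinda's point about one cluster with two volumes could look roughly like the sketch below. This is a hedged illustration only: the hostnames (`node1`..`node3`), brick paths, volume names, and `replica 3` layout are hypothetical placeholders, not taken from this thread.

```shell
# One trusted storage pool across all nodes (run from node1)
gluster peer probe node2
gluster peer probe node3

# "Fast" volume from SSD-backed bricks
gluster volume create fastvol replica 3 \
    node1:/bricks/ssd/fastvol node2:/bricks/ssd/fastvol node3:/bricks/ssd/fastvol
gluster volume start fastvol

# "Slow" volume from HDD-backed bricks, in the same pool
gluster volume create slowvol replica 3 \
    node1:/bricks/hdd/slowvol node2:/bricks/hdd/slowvol node3:/bricks/hdd/slowvol
gluster volume start slowvol
```

Each Kubernetes storage class would then point at one of the two volumes.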
https://kadalu.io/docs/k8s-storage/latest/storage-classes > > > Best Regards, > Volker > > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users Aravinda Vishwanathapura https://kadalu.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From joao.bauto at neuro.fchampalimaud.org Wed Aug 19 14:42:08 2020 From: joao.bauto at neuro.fchampalimaud.org (=?UTF-8?B?Sm/Do28gQmHDunRv?=) Date: Wed, 19 Aug 2020 15:42:08 +0100 Subject: [Gluster-users] Wrong directory quota usage In-Reply-To: References: Message-ID: Hi Srijan, Before I do the disable/enable just want to check something with you. The other cluster where the crawling is running, I can see the find command and this one which seems to be the one triggering the crawler (4 processes, one per brick in all nodes) /usr/sbin/glusterfs -s localhost --volfile-id client_per_brick/tank.client.hostname.tank-volume1-brick.vol --use-readdirp=yes --client-pid -100 -l /var/log/glusterfs/quota_crawl/tank-volume1-brick.log /var/run/gluster/tmp/mntYbIVwT Can I manually trigger this command? Thanks! *Jo?o Ba?to* --------------- *Scientific Computing and Software Platform* Champalimaud Research Champalimaud Center for the Unknown Av. Bras?lia, Doca de Pedrou?os 1400-038 Lisbon, Portugal fchampalimaud.org Srijan Sivakumar escreveu no dia quarta, 19/08/2020 ?(s) 07:25: > Hi Jo?o, > > If the crawl is not going on and the values are still not reflecting > properly then it means the crawl process has ended abruptly. > > Yes, technically disabling and enabling the quota will trigger crawl but > it'd do a complete crawl of the filesystem, hence would take time and be > resource consuming. 
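If the disable/enable route is taken, note (as João mentions elsewhere in the thread) that disabling the quota removes the configured limits, so they have to be recorded and re-applied afterwards. A rough sketch; the volume name `tank` and the limit below are placeholders, not confirmed values from this setup:

```shell
# Save the current limits first -- "quota disable" wipes them.
gluster volume quota tank list > /root/tank-quota-limits.txt

gluster volume quota tank disable
gluster volume quota tank enable    # triggers a full crawl of the volume

# Re-apply each saved limit, for example:
gluster volume quota tank limit-usage /projectB 100TB
```

The full crawl on enable is the time- and resource-consuming step Srijan warns about above.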
Usually disabling-enabling is the last thing to do if > the accounting isn't reflecting properly but if you're going to merge these > two clusters then probably you can go ahead with the merging and then > enable quota. > > -- > Thanks and Regards, > > SRIJAN SIVAKUMAR > > Associate Software Engineer > > Red Hat > > > > > > T: +91-9727532362 > > > TRIED. TESTED. TRUSTED. > > On Wed, Aug 19, 2020 at 3:53 AM Jo?o Ba?to < > joao.bauto at neuro.fchampalimaud.org> wrote: > >> Hi Srijan, >> >> I didn't get any result with that command so I went to our other cluster >> (we are merging two clusters, data is replicated) and activated the quota >> feature on the same directory. Running the same command on each node I get >> a similar output to yours. One process per brick I'm assuming. >> >> root 1746822 1.4 0.0 230324 2992 ? S 23:06 0:04 >> /usr/bin/find . -exec /usr/bin/stat {} \ ; >> root 1746858 5.3 0.0 233924 6644 ? S 23:06 0:15 >> /usr/bin/find . -exec /usr/bin/stat {} \ ; >> root 1746889 3.3 0.0 233592 6452 ? S 23:06 0:10 >> /usr/bin/find . -exec /usr/bin/stat {} \ ; >> root 1746930 3.1 0.0 230476 3232 ? S 23:06 0:09 >> /usr/bin/find . -exec /usr/bin/stat {} \ ; >> >> At this point, is it easier to just disable and enable the feature and >> force a new crawl? We don't mind a temporary increase in CPU and IO usage. >> >> Thank you again! >> *Jo?o Ba?to* >> --------------- >> >> *Scientific Computing and Software Platform* >> Champalimaud Research >> Champalimaud Center for the Unknown >> Av. Bras?lia, Doca de Pedrou?os >> 1400-038 Lisbon, Portugal >> fchampalimaud.org >> >> >> Srijan Sivakumar escreveu no dia ter?a, 18/08/2020 >> ?(s) 21:42: >> >>> Hi Jo?o, >>> >>> There isn't a straightforward way of tracking the crawl but as gluster >>> uses find and stat during crawl, one can run the following command, >>> # ps aux | grep find >>> >>> If the output is of the form, >>> "root 1513 0.0 0.1 127224 2636 ? S 12:24 0.00 >>> /usr/bin/find . 
-exec /usr/bin/stat {} \" >>> then it means that the crawl is still going on. >>> [...] -------------- next part -------------- An HTML attachment was scrubbed...
URL: From vd at d7informatics.de Wed Aug 19 15:02:03 2020 From: vd at d7informatics.de (Volker Dormeyer) Date: Wed, 19 Aug 2020 17:02:03 +0200 Subject: [Gluster-users] Gluster cluster questions In-Reply-To: <8D98E9C4-673E-4D55-8A88-A89DD2B5A8F2@kadalu.io> References: <94347073-adf0-b14c-3f7c-e856f364b971@d7informatics.de> <8D98E9C4-673E-4D55-8A88-A89DD2B5A8F2@kadalu.io> Message-ID: <109576d9-8829-b837-92de-4eb81397465b@d7informatics.de> Hi Aravinda, Thank you!! On 8/19/20 1:47 PM, Aravinda VK wrote: > https://kadalu.io can be used as an alternative to Heketi to use > GlusterFS with Kubernetes. Create two storage pools, one with SSD and > another with HDD, and create a storage class to use each pool. > > https://kadalu.io/docs/k8s-storage/latest/storage-classes This sounds promising! I read through the documentation. A few questions came up: 1. Do I need Kubernetes on the storage cluster? This would be a good thing. 2. Which components are required on the client clusters, i.e. the Kubernetes clusters which make use of the storage cluster? At least a client to mount the storage would be required. 3. How far along is the ARM64 implementation? This is from the GitHub site: The release versions and 'latest' versions are not yet ARM ready! But we have an image for |linux/arm64|, |linux/arm/v7| platform support! Best Regards, Volker From aravinda at kadalu.io Wed Aug 19 16:06:23 2020 From: aravinda at kadalu.io (Aravinda VK) Date: Wed, 19 Aug 2020 21:36:23 +0530 Subject: [Gluster-users] Gluster cluster questions In-Reply-To: <109576d9-8829-b837-92de-4eb81397465b@d7informatics.de> References: <94347073-adf0-b14c-3f7c-e856f364b971@d7informatics.de> <8D98E9C4-673E-4D55-8A88-A89DD2B5A8F2@kadalu.io> <109576d9-8829-b837-92de-4eb81397465b@d7informatics.de> Message-ID: <50B8BF0E-7D9B-49C6-8889-15C48A235E26@kadalu.io> Hi Volker, > On 19-Aug-2020, at 8:32 PM, Volker Dormeyer wrote: > > Hi Aravinda, > > Thank you!! > > On 8/19/20 1:47 PM, Aravinda VK wrote: > >> https://kadalu.io can be used as an alternative to Heketi to use >> GlusterFS with Kubernetes. Create two storage pools, one with SSD and >> another with HDD, and create a storage class to use each pool. >> >> https://kadalu.io/docs/k8s-storage/latest/storage-classes > > This sounds promising! I read through the documentation. A few questions > came up: > > 1. Do I need Kubernetes on the storage cluster? This would be a good > thing. > `kubectl kadalu install` will install all the pods required to run the GlusterFS server and client. Kadalu also supports an external GlusterFS cluster outside Kubernetes. > 2. Which components are required on the client clusters, i.e. the > Kubernetes clusters which make use of the storage cluster? At least a > client to mount the storage would be required. The Kadalu CSI pods include the GlusterFS client bits; the CSI node plugin automatically mounts the PV when the application Pod starts. > > 3. How far along is the ARM64 implementation? > > This is from the GitHub site: > > The release versions and 'latest' versions are not yet ARM ready! But we > have an image for |linux/arm64|, |linux/arm/v7| platform support! > Our recent tests show very promising results; we hope the upcoming release will have the latest images with Arm support. Feel free to open issues/RFEs here: https://github.com/kadalu/kadalu/issues > > Best Regards, > Volker > Aravinda Vishwanathapura https://kadalu.io From ssivakum at redhat.com Wed Aug 19 17:03:59 2020 From: ssivakum at redhat.com (Srijan Sivakumar) Date: Wed, 19 Aug 2020 22:33:59 +0530 Subject: [Gluster-users] Wrong directory quota usage In-Reply-To: References: Message-ID: Hi João, I'd recommend going with the disable/enable of the quota, as that would eventually do the same thing and is a better option than manually running the said command with hand-picked parameters. -- Thanks and Regards, SRIJAN SIVAKUMAR Associate Software Engineer Red Hat T: +91-9727532362 TRIED.
TESTED. TRUSTED. On Wed, Aug 19, 2020 at 8:12 PM Jo?o Ba?to < joao.bauto at neuro.fchampalimaud.org> wrote: > Hi Srijan, > > Before I do the disable/enable just want to check something with you. The > other cluster where the crawling is running, I can see the find command and > this one which seems to be the one triggering the crawler (4 processes, one > per brick in all nodes) > > /usr/sbin/glusterfs -s localhost --volfile-id > client_per_brick/tank.client.hostname.tank-volume1-brick.vol > --use-readdirp=yes --client-pid -100 -l > /var/log/glusterfs/quota_crawl/tank-volume1-brick.log > /var/run/gluster/tmp/mntYbIVwT > > Can I manually trigger this command? > > Thanks! > *Jo?o Ba?to* > --------------- > > *Scientific Computing and Software Platform* > Champalimaud Research > Champalimaud Center for the Unknown > Av. Bras?lia, Doca de Pedrou?os > 1400-038 Lisbon, Portugal > fchampalimaud.org > > > Srijan Sivakumar escreveu no dia quarta, 19/08/2020 > ?(s) 07:25: > >> Hi Jo?o, >> >> If the crawl is not going on and the values are still not reflecting >> properly then it means the crawl process has ended abruptly. >> >> Yes, technically disabling and enabling the quota will trigger crawl but >> it'd do a complete crawl of the filesystem, hence would take time and be >> resource consuming. Usually disabling-enabling is the last thing to do if >> the accounting isn't reflecting properly but if you're going to merge these >> two clusters then probably you can go ahead with the merging and then >> enable quota. >> >> -- >> Thanks and Regards, >> >> SRIJAN SIVAKUMAR >> >> Associate Software Engineer >> >> Red Hat >> >> >> >> >> >> T: +91-9727532362 >> >> >> TRIED. TESTED. TRUSTED. 
>> >> On Wed, Aug 19, 2020 at 3:53 AM Jo?o Ba?to < >> joao.bauto at neuro.fchampalimaud.org> wrote: >> >>> Hi Srijan, >>> >>> I didn't get any result with that command so I went to our other cluster >>> (we are merging two clusters, data is replicated) and activated the quota >>> feature on the same directory. Running the same command on each node I get >>> a similar output to yours. One process per brick I'm assuming. >>> >>> root 1746822 1.4 0.0 230324 2992 ? S 23:06 0:04 >>> /usr/bin/find . -exec /usr/bin/stat {} \ ; >>> root 1746858 5.3 0.0 233924 6644 ? S 23:06 0:15 >>> /usr/bin/find . -exec /usr/bin/stat {} \ ; >>> root 1746889 3.3 0.0 233592 6452 ? S 23:06 0:10 >>> /usr/bin/find . -exec /usr/bin/stat {} \ ; >>> root 1746930 3.1 0.0 230476 3232 ? S 23:06 0:09 >>> /usr/bin/find . -exec /usr/bin/stat {} \ ; >>> >>> At this point, is it easier to just disable and enable the feature and >>> force a new crawl? We don't mind a temporary increase in CPU and IO usage. >>> >>> Thank you again! >>> *Jo?o Ba?to* >>> --------------- >>> >>> *Scientific Computing and Software Platform* >>> Champalimaud Research >>> Champalimaud Center for the Unknown >>> Av. Bras?lia, Doca de Pedrou?os >>> 1400-038 Lisbon, Portugal >>> fchampalimaud.org >>> >>> >>> Srijan Sivakumar escreveu no dia ter?a, >>> 18/08/2020 ?(s) 21:42: >>> >>>> Hi Jo?o, >>>> >>>> There isn't a straightforward way of tracking the crawl but as gluster >>>> uses find and stat during crawl, one can run the following command, >>>> # ps aux | grep find >>>> >>>> If the output is of the form, >>>> "root 1513 0.0 0.1 127224 2636 ? S 12:24 0.00 >>>> /usr/bin/find . -exec /usr/bin/stat {} \" >>>> then it means that the crawl is still going on. >>>> >>>> >>>> Thanks and Regards, >>>> >>>> SRIJAN SIVAKUMAR >>>> >>>> Associate Software Engineer >>>> >>>> Red Hat >>>> >>>> >>>> >>>> >>>> >>>> T: +91-9727532362 >>>> >>>> >>>> TRIED. TESTED. TRUSTED. 
>>>> >>>> On Wed, Aug 19, 2020 at 1:46 AM João Baúto < >>>> joao.bauto at neuro.fchampalimaud.org> wrote: >>>> >>>>> Hi Srijan, >>>>> >>>>> Is there a way of getting the status of the crawl process? >>>>> We are going to expand this cluster, adding 12 new bricks (around >>>>> 500TB) and we rely heavily on the quota feature to control the space usage >>>>> for each project. It's been running since Saturday (nothing changed) >>>>> and unsure if it's going to finish tomorrow or in weeks. >>>>> >>>>> Thank you! >>>>> *João Baúto* >>>>> --------------- >>>>> >>>>> *Scientific Computing and Software Platform* >>>>> Champalimaud Research >>>>> Champalimaud Center for the Unknown >>>>> Av. Brasília, Doca de Pedrouços >>>>> 1400-038 Lisbon, Portugal >>>>> fchampalimaud.org >>>>> >>>>> >>>>> Srijan Sivakumar escreveu no dia domingo, >>>>> 16/08/2020 à(s) 06:11: >>>>> >>>>>> Hi João, >>>>>> >>>>>> Yes it'll take some time given the file system size as it has to >>>>>> change the xattrs in each level and then crawl upwards. >>>>>> >>>>>> stat is done by the script itself so the crawl is initiated. >>>>>> >>>>>> Regards, >>>>>> Srijan Sivakumar >>>>>> >>>>>> On Sun 16 Aug, 2020, 04:58 João Baúto, < >>>>>> joao.bauto at neuro.fchampalimaud.org> wrote: >>>>>> >>>>>>> Hi Srijan & Strahil, >>>>>>> >>>>>>> I ran the quota_fsck script mentioned in Hari's blog post in all >>>>>>> bricks and it detected a lot of size mismatch.
>>>>>>> >>>>>>> The script was executed as, >>>>>>> >>>>>>> - python quota_fsck.py --sub-dir projectB --fix-issues /mnt/tank >>>>>>> /tank/volume2/brick (in all nodes and bricks) >>>>>>> >>>>>>> Here is a snippet from the script, >>>>>>> >>>>>>> Size Mismatch /tank/volume2/brick/projectB {'parents': >>>>>>> {'00000000-0000-0000-0000-000000000001': {'contri_file_count': >>>>>>> 18446744073035296610L, 'contri_size': 18446645297413872640L, >>>>>>> 'contri_dir_count': 18446744073709527653L}}, 'version': '1', 'file_count': >>>>>>> 18446744073035296610L, 'dirty': False, 'dir_count': 18446744073709527653L, >>>>>>> 'size': 18446645297413872640L} 15204281691754 >>>>>>> MARKING DIRTY: /tank/volume2/brick/projectB >>>>>>> stat on /mnt/tank/projectB >>>>>>> Files verified : 683223 >>>>>>> Directories verified : 46823 >>>>>>> Objects Fixed : 705230 >>>>>>> >>>>>>> Checking the xattr in the bricks I can see the directory in question >>>>>>> marked as dirty, >>>>>>> # getfattr -d -m. -e hex /tank/volume2/brick/projectB >>>>>>> getfattr: Removing leading '/' from absolute path names >>>>>>> # file: tank/volume2/brick/projectB >>>>>>> trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c >>>>>>> >>>>>>> trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f372478000a7705 >>>>>>> trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc >>>>>>> >>>>>>> trusted.glusterfs.mdata=0x010000000000000000000000005f3724750000000013ddf679000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0 >>>>>>> >>>>>>> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea >>>>>>> trusted.glusterfs.quota.dirty=0x3100 >>>>>>> >>>>>>> trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff >>>>>>> >>>>>>> trusted.glusterfs.quota.size.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea >>>>>>> >>>>>>> Now, my question is how do I trigger Gluster to recalculate the >>>>>>> quota for this directory? 
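[Editor's note: for reference, the `trusted.glusterfs.quota.size.1` value shown above is three big-endian 64-bit counters packed together — bytes used, file count, directory count. That layout is inferred from how the quota_fsck.py script parses these values, not from official documentation, so take this decoding sketch with that caveat:

```python
import struct

def decode_quota_size(xattr_hex):
    """Unpack a 24-byte quota xattr into (bytes_used, file_count, dir_count),
    assuming three big-endian unsigned 64-bit fields."""
    raw = bytes.fromhex(xattr_hex[2:] if xattr_hex.startswith("0x") else xattr_hex)
    return struct.unpack(">QQQ", raw)

# The post-fsck value from the getfattr output above:
size, files, dirs = decode_quota_size(
    "0x00000ca6ccf7a80000000000000790a1000000000000b6ea")
print(size, files, dirs)  # 13910542886912 495777 46826
print(round(size / 2**40, 2), "TiB")  # 12.65 TiB
```

The decoded directory count (46,826) is in the same ballpark as the 46,823 directories the script reports verifying, which is a quick sanity check that the counters are plausible again after the fix-up.]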
Is it automatic but it takes a while? Because the >>>>>>> quota list did change but not to a good "result". >>>>>>> >>>>>>> Path Hard-limit Soft-limit Used >>>>>>> Available Soft-limit exceeded? Hard-limit exceeded? >>>>>>> /projectB 100.0TB 80%(80.0TB) 16383.9PB 190.1TB >>>>>>> No No >>>>>>> >>>>>>> I would like to avoid a disable/enable quota in the volume as it >>>>>>> removes the configs. >>>>>>> >>>>>>> Thank you for all the help! >>>>>>> *João Baúto* >>>>>>> --------------- >>>>>>> >>>>>>> *Scientific Computing and Software Platform* >>>>>>> Champalimaud Research >>>>>>> Champalimaud Center for the Unknown >>>>>>> Av. Brasília, Doca de Pedrouços >>>>>>> 1400-038 Lisbon, Portugal >>>>>>> fchampalimaud.org >>>>>>> >>>>>>> >>>>>>> Srijan Sivakumar escreveu no dia sábado, >>>>>>> 15/08/2020 à(s) 11:57: >>>>>>> >>>>>>>> Hi João, >>>>>>>> >>>>>>>> The quota accounting error is what we're looking at here. I think >>>>>>>> you've already looked into the blog post by Hari and are using the script >>>>>>>> to fix the accounting issue. >>>>>>>> That should help you out in fixing this issue. >>>>>>>> >>>>>>>> Let me know if you face any issues while using it. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Srijan Sivakumar >>>>>>>> >>>>>>>> >>>>>>>> On Fri 14 Aug, 2020, 17:10 João Baúto, < >>>>>>>> joao.bauto at neuro.fchampalimaud.org> wrote: >>>>>>>> >>>>>>>>> Hi Strahil, >>>>>>>>> >>>>>>>>> I have tried removing the quota for that specific directory and >>>>>>>>> setting it again but it didn't work (maybe it has to be a quota disable and >>>>>>>>> enable in the volume options). Currently testing a solution >>>>>>>>> by Hari with the quota_fsck.py script (https://medium.com/@ >>>>>>>>> harigowtham/glusterfs-quota-fix-accounting-840df33fcd3a) and it's >>>>>>>>> detecting a lot of size mismatch in files.
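[Editor's note: one thing that makes the numbers above less mysterious — values like 18446744073035296610 sit just below 2^64, which is what a slightly negative counter looks like when printed as an unsigned 64-bit integer. Assuming the on-disk counters are two's-complement int64 (an assumption, but consistent with every figure in this thread), reinterpreting them as signed gives plausible magnitudes:

```python
def as_signed64(u):
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    return u - 2**64 if u >= 2**63 else u

# Counters from the "Size Mismatch" dump above:
print(as_signed64(18446645297413872640))  # -98776295678976 bytes (~ -89.8 TiB)
print(as_signed64(18446744073035296610))  # -674255006 files
print(as_signed64(18446744073709527653))  # -23963 directories

# A byte counter just below 2**64, rendered as unsigned, is just under
# 2**14 PiB -- hence the "16383.9PB" shown by `gluster volume quota ... list`.
print(18446645297413872640 / 2**50)  # ~16383.91 (PiB)
```

In other words, the accounting has underflowed into negative sizes and counts, which is exactly the condition the quota_fsck script's dirty-marking and re-crawl is meant to repair.]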
>>>>>>>>> >>>>>>>>> Thank you, >>>>>>>>> *João Baúto* >>>>>>>>> --------------- >>>>>>>>> >>>>>>>>> *Scientific Computing and Software Platform* >>>>>>>>> Champalimaud Research >>>>>>>>> Champalimaud Center for the Unknown >>>>>>>>> Av. Brasília, Doca de Pedrouços >>>>>>>>> 1400-038 Lisbon, Portugal >>>>>>>>> fchampalimaud.org >>>>>>>>> >>>>>>>>> >>>>>>>>> Strahil Nikolov escreveu no dia sexta, >>>>>>>>> 14/08/2020 à(s) 10:16: >>>>>>>>> >>>>>>>>>> Hi João, >>>>>>>>>> >>>>>>>>>> Based on your output it seems that the quota size is different on >>>>>>>>>> the 2 bricks. >>>>>>>>>> >>>>>>>>>> Have you tried to remove the quota and then recreate it ? Maybe >>>>>>>>>> it will be the easiest way to fix it. >>>>>>>>>> >>>>>>>>>> Best Regards, >>>>>>>>>> Strahil Nikolov >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> На 14 август 2020 г. 4:35:14 GMT+03:00, "João Baúto" < >>>>>>>>>> joao.bauto at neuro.fchampalimaud.org> написа: >Hi all, >>>>>>>>>> > >>>>>>>>>> >We have a 4-node distributed cluster with 2 bricks per node >>>>>>>>>> running >>>>>>>>>> >Gluster >>>>>>>>>> >7.7 + ZFS. We use directory quota to limit the space used by our >>>>>>>>>> >members on >>>>>>>>>> >each project. Two days ago we noticed inconsistent space used >>>>>>>>>> reported >>>>>>>>>> >by >>>>>>>>>> >Gluster in the quota list. >>>>>>>>>> > >>>>>>>>>> >A small snippet of gluster volume quota vol list, >>>>>>>>>> > >>>>>>>>>> > Path Hard-limit Soft-limit Used >>>>>>>>>> >Available Soft-limit exceeded? Hard-limit exceeded?
>>>>>>>>>> >/projectA 5.0TB 80%(4.0TB) 3.1TB >>>>>>>>>> 1.9TB >>>>>>>>>> > No No >>>>>>>>>> >*/projectB 100.0TB 80%(80.0TB) 16383.4PB 740.9TB >>>>>>>>>> > No No* >>>>>>>>>> >/projectC 70.0TB 80%(56.0TB) 50.0TB >>>>>>>>>> 20.0TB >>>>>>>>>> > No No >>>>>>>>>> > >>>>>>>>>> >The total space available in the cluster is 360TB, the quota for >>>>>>>>>> >projectB >>>>>>>>>> >is 100TB and, as you can see, its reporting 16383.4PB used and >>>>>>>>>> 740TB >>>>>>>>>> >available (already decreased from 750TB). >>>>>>>>>> > >>>>>>>>>> >There was an issue in Gluster 3.x related to the wrong directory >>>>>>>>>> quota >>>>>>>>>> >( >>>>>>>>>> > >>>>>>>>>> https://lists.gluster.org/pipermail/gluster-users/2016-February/025305.html >>>>>>>>>> > and >>>>>>>>>> > >>>>>>>>>> https://lists.gluster.org/pipermail/gluster-users/2018-November/035374.html >>>>>>>>>> ) >>>>>>>>>> >but it's marked as solved (not sure if the solution still >>>>>>>>>> applies). >>>>>>>>>> > >>>>>>>>>> >*On projectB* >>>>>>>>>> ># getfattr -d -m . -e hex projectB >>>>>>>>>> ># file: projectB >>>>>>>>>> >trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c >>>>>>>>>> >>>>>>>>>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f35e69800098ed9 >>>>>>>>>> >trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc >>>>>>>>>> >>>>>>>>>> >trusted.glusterfs.mdata=0x010000000000000000000000005f355c59000000000939079f000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0 >>>>>>>>>> >>>>>>>>>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 >>>>>>>>>> >trusted.glusterfs.quota.dirty=0x3000 >>>>>>>>>> >>>>>>>>>> >trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff >>>>>>>>>> >>>>>>>>>> >trusted.glusterfs.quota.size.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 >>>>>>>>>> > >>>>>>>>>> >*On projectA* >>>>>>>>>> ># getfattr -d -m . 
-e hex projectA >>>>>>>>>> ># file: projectA >>>>>>>>>> >trusted.gfid=0x05b09ded19354c0eb544d22d4659582e >>>>>>>>>> >>>>>>>>>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f1aeb9f00044c64 >>>>>>>>>> >trusted.glusterfs.dht=0xe1a4060c000000001fffffff3ffffffd >>>>>>>>>> >>>>>>>>>> >trusted.glusterfs.mdata=0x010000000000000000000000005f1ac6a10000000018f30a4e000000005c338fab0000000017a3135a000000005b0694fb000000001584a21b >>>>>>>>>> >>>>>>>>>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000067de3bbe20000000000000128610000000000033498 >>>>>>>>>> >trusted.glusterfs.quota.dirty=0x3000 >>>>>>>>>> >>>>>>>>>> >trusted.glusterfs.quota.limit-set.1=0x0000460000000000ffffffffffffffff >>>>>>>>>> >>>>>>>>>> >trusted.glusterfs.quota.size.1=0x0000067de3bbe20000000000000128610000000000033498 >>>>>>>>>> > >>>>>>>>>> >Any idea on what's happening and how to fix it? >>>>>>>>>> > >>>>>>>>>> >Thanks! >>>>>>>>>> >*João Baúto* >>>>>>>>>> >--------------- >>>>>>>>>> > >>>>>>>>>> >*Scientific Computing and Software Platform* >>>>>>>>>> >Champalimaud Research >>>>>>>>>> >Champalimaud Center for the Unknown >>>>>>>>>> >Av. Brasília, Doca de Pedrouços >>>>>>>>>> >1400-038 Lisbon, Portugal >>>>>>>>>> >fchampalimaud.org >>>>>>>>>> >>>>>>>>> ________ >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Community Meeting Calendar: >>>>>>>>> >>>>>>>>> Schedule - >>>>>>>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>>>>>>>> Bridge: https://bluejeans.com/441850968 >>>>>>>>> >>>>>>>>> Gluster-users mailing list >>>>>>>>> Gluster-users at gluster.org >>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>>>> >>>>>>>> >>>> >>>> -- >>>> >>> >> >> -- Thanks and Regards, SRIJAN SIVAKUMAR Associate Software Engineer Red Hat T: +91-9727532362 TRIED. TESTED. TRUSTED. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From vd at d7informatics.de Wed Aug 19 19:06:01 2020 From: vd at d7informatics.de (Volker Dormeyer) Date: Wed, 19 Aug 2020 21:06:01 +0200 Subject: [Gluster-users] Gluster cluster questions In-Reply-To: <50B8BF0E-7D9B-49C6-8889-15C48A235E26@kadalu.io> References: <94347073-adf0-b14c-3f7c-e856f364b971@d7informatics.de> <8D98E9C4-673E-4D55-8A88-A89DD2B5A8F2@kadalu.io> <109576d9-8829-b837-92de-4eb81397465b@d7informatics.de> <50B8BF0E-7D9B-49C6-8889-15C48A235E26@kadalu.io> Message-ID: <33560d25-5920-0c3b-e627-b38d4ca05172@d7informatics.de> Thank you Aravinda! I think I will try it. On 8/19/20 6:06 PM, Aravinda VK wrote: > Hi Volker, > >> On 19-Aug-2020, at 8:32 PM, Volker Dormeyer wrote: >> >> Hi Aravinda, >> >> Thank you!! >> >> On 8/19/20 1:47 PM, Aravinda VK wrote: >> >>> https://kadalu.io can be used as an alternative to Heketi to use >>> GlusterFS with Kubernetes. Create two Storage pools, one with SSD and >>> another with HDD, and create Storage classes to use these pools. >>> >>> https://kadalu.io/docs/k8s-storage/latest/storage-classes >> This sounds promising! I read through the documentation. A few questions >> came up: >> >> 1. I need Kubernetes on the storage cluster? Right? This would be a good >> thing. >> > `kubectl kadalu install` will install all pods which are required to run the GlusterFS server and client. Kadalu also supports external GlusterFS outside Kubernetes. > >> 2. Which components are required on the client Cluster? I mean the >> Kubernetes Clusters which make use of the storage cluster? At least a >> client to mount the storage would be required. > Kadalu CSI pods include the GlusterFS client bits; the CSI node plugin automatically mounts the PV when the application Pod starts. > >> 3. How far is the ARM64 implementation? >> >> This is from the github site: >> >> The release versions and 'latest' versions are not yet ARM ready! But we >> have an image for |linux/arm64|,|linux/arm/v7| platform support!
>> > Our recent tests are showing very promising results; we hope the upcoming release will have the latest images with Arm support. Feel free to open issues/RFE here > https://github.com/kadalu/kadalu/issues > >> Best Regards, >> Volker >> > Aravinda Vishwanathapura > https://kadalu.io > > > From bob at computerisms.ca Thu Aug 20 00:46:41 2020 From: bob at computerisms.ca (Computerisms Corporation) Date: Wed, 19 Aug 2020 17:46:41 -0700 Subject: [Gluster-users] performance In-Reply-To: <68274322-B514-4555-A236-D159B16D42FC@yahoo.com> References: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca> <7991483E-5365-4C87-89FA-C871AED18062@yahoo.com> <345b06c4-5996-9aa3-f846-0944c60ee398@computerisms.ca> <2CD68ED2-199F-407D-B0CC-385793BA16FD@yahoo.com> <64ee1b88-42d6-75d2-05ff-4703d168cc25@computerisms.ca> <68274322-B514-4555-A236-D159B16D42FC@yahoo.com> Message-ID: <0166c1ff-83c0-4d5f-aa96-6cd8a2518cd1@computerisms.ca> Hi Strahil, so over the last two weeks, the system has been relatively stable. I have powered off both servers at least once, for about 5 minutes each time. The server came up, auto-healed what it needed to, so all of that part is working as expected. I will answer things inline and follow with more questions: >>> Hm... OK. I guess you can try 7.7 whenever it's possible. >> >> Acknowledged. Still on my list. > It could be a bad firmware also. If you get the opportunity, flash the firmware and bump the OS to the max. The datacenter says everything was up to date as of installation; not really wanting them to take the servers offline for long enough to redo all the hardware. >>>> more number of CPU cycles than needed, increasing the event thread >>>> count >>>> would enhance the performance of the Red Hat Storage Server." which >> is >>>> why I had it at 8. >>> Yeah, but you got only 6 cores and they are not dedicated for >> gluster only. I think that you need to test with lower values. I figured out my magic number for client/server threads; it should be 5.
I set it to 5, observed no change I could attribute to it, so tried 4, and got the same thing; no visible effect. >>>> right now the only suggested parameter I haven't played with is the >>>> performance.io-thread-count, which I currently have at 64. >> not really sure what would be a reasonable value for my system. > I guess you can try to increase it a little bit and check how is it going. Turns out if you try to set this higher than 64, you get an error saying 64 is the max. >>> What I/O scheduler are you using for the SSDs (you can check via 'cat >> /sys/block/sdX/queue/scheduler)? >> >> # cat /sys/block/vda/queue/scheduler >> [mq-deadline] none > > Deadline prioritizes reads in a 2:1 ratio /default tunings/ . You can consider testing 'none' if your SSDs are good. I did this. I would say it did have a positive effect, but it was a minimal one. > I see vda , please share details on the infra as this is very important. Virtual disks have their limitations and if you are on a VM, then there might be a chance to increase the CPU count. > If you are on a VM, I would recommend you to use more (in numbers) and smaller disks in stripe sets (either raid0 via mdadm, or pure striped LV). > Also, if you are on a VM -> there is no reason to reorder your I/O requests in the VM, just to do it again on the hypervisor. In such a case 'none' can bring better performance, but this varies with the workload. hm, this is a good question, one I have been asking the datacenter for a while, but they are a little bit slippery on what exactly it is they have going on there. They advertise the servers as metal with a virtual layer. The virtual layer is so you can log into a site and power the server down or up, mount an ISO to boot from, access a console, and some other nifty things. You can't any more, but when they first introduced the system, you could even access the BIOS of the server.
But apparently, and they swear up and down by this, it is a physical server, with real dedicated SSDs and real sticks of RAM. I have found virtio and qemu as loaded kernel modules, so certainly there is something virtual involved, but other than that and their nifty little tools, it has always acted and worked like a metal server to me. > All necessary data is in the file attributes on the brick. I doubt you will need to have access times on the brick itself. Another possibility is to use 'relatime'. remounted all bricks with noatime, no significant difference. >> cache unless flush-behind is on. So seems that is a way to throw ram >> to >> it? I put performance.write-behind-window-size: 512MB and >> performance.flush-behind: on and the whole system calmed down pretty >> much immediately. could be just timing, though, will have to see >> tomorrow during business hours whether the system stays at a reasonable Tried increasing this to its max of 1GB, no noticeable change from 512MB. The 2nd server is not acting in line with the first server. glusterfsd processes are running at 50-80% of a core each, with one brick often going over 200%, whereas they usually stick to 30-45% on the first server. apache processes consume as much as 90% of a core whereas they rarely go over 15% on the first server, and they frequently stack up to having more than 100 running at once, which drives load average up to 40-60. It's very much like the first server was before I found the flush-behind setting, but not as bad; at least it isn't going completely non-responsive. Additionally, it is still taking an excessive time to load the first page of most sites. I am guessing I need to increase read speeds to fix this, so I have played with performance.io-cache/cache-max-file-size(slight positive change), read-ahead/read-ahead-page-count(negative change till page count set to max of 16, then no noticeable difference), and rda-cache-limit/rda-request-size(minimal positive effect).
I still have RAM to spare, so would be nice if I could be using it to improve things on the read side of things, but have found no magic bullet like flush-behind was. I found a good number of more options to try, have been going a little crazy with them, will post them at the bottom. I found a post that suggested mount options are also important: https://lists.gluster.org/pipermail/gluster-users/2018-September/034937.html I confirmed these are in the man pages, so I tried umounting and re-mounting with the -o option to include these thusly: mount -t glusterfs moogle:webisms /Computerisms/ -o negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5 But I don't think they are working: /# mount | grep glus moogle:webisms on /Computerisms type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072) would be grateful if there are any other suggestions anyone can think of. root at moogle:/# gluster v info Volume Name: webisms Type: Distributed-Replicate Volume ID: 261901e7-60b4-4760-897d-0163beed356e Status: Started Snapshot Count: 0 Number of Bricks: 2 x (2 + 1) = 6 Transport-type: tcp Bricks: Brick1: mooglian:/var/GlusterBrick/replset-0/webisms-replset-0 Brick2: moogle:/var/GlusterBrick/replset-0/webisms-replset-0 Brick3: moogle:/var/GlusterBrick/replset-0-arb/webisms-replset-0-arb (arbiter) Brick4: moogle:/var/GlusterBrick/replset-1/webisms-replset-1 Brick5: mooglian:/var/GlusterBrick/replset-1/webisms-replset-1 Brick6: mooglian:/var/GlusterBrick/replset-1-arb/webisms-replset-1-arb (arbiter) Options Reconfigured: performance.rda-cache-limit: 1GB performance.client-io-threads: off nfs.disable: on storage.fips-mode-rchecksum: off transport.address-family: inet performance.stat-prefetch: on network.inode-lru-limit: 200000 performance.write-behind-window-size: 1073741824 performance.readdir-ahead: on performance.io-thread-count: 64 performance.cache-size: 12GB server.event-threads: 4 
client.event-threads: 4 performance.nl-cache-timeout: 600 auth.allow: xxxxxx performance.open-behind: off performance.quick-read: off cluster.lookup-optimize: off cluster.rebal-throttle: lazy features.cache-invalidation: on features.cache-invalidation-timeout: 600 performance.cache-invalidation: on performance.md-cache-timeout: 600 performance.flush-behind: on cluster.read-hash-mode: 0 performance.strict-o-direct: on cluster.readdir-optimize: on cluster.lookup-unhashed: off performance.cache-refresh-timeout: 30 performance.enable-least-priority: off cluster.choose-local: on performance.rda-request-size: 128KB performance.read-ahead: on performance.read-ahead-page-count: 16 performance.cache-max-file-size: 5MB performance.io-cache: on From sacchi at kadalu.io Thu Aug 20 15:53:25 2020 From: sacchi at kadalu.io (Sachidananda Urs) Date: Thu, 20 Aug 2020 21:23:25 +0530 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: On Fri, Aug 14, 2020 at 10:04 AM Gilberto Nunes wrote: > Hi > Could you improve the output to show "Possibly undergoing heal" as well? > gluster vol heal VMS info > Brick gluster01:/DATA/vms > Status: Connected > Number of entries: 0 > > Brick gluster02:/DATA/vms > /images/100/vm-100-disk-0.raw - Possibly undergoing heal > Status: Connected > Number of entries: 1 > Gilberto, the release 1.0.2 ( https://github.com/gluster/gstatus/releases/tag/v1.0.2) has included self-heal status. The output looks like this: root at master-node:/home/sac/work/gstatus# gstatus Cluster: Status: Healthy GlusterFS: 9dev Nodes: 3/3 Volumes: 1/1 Volumes: snap-1 Replicate Started (UP) - 2/2 Bricks Up Capacity: (9.43% used) 4.00 GiB/40.00 GiB (used/total) Self-Heal: slave-1:/mnt/brick1/snapr1/r11 (13 File(s) to heal). Snapshots: 2 Quota: On Hope that helps. -sac > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gilberto.nunes32 at gmail.com Thu Aug 20 16:06:58 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Thu, 20 Aug 2020 13:06:58 -0300 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: Awesome, thanks! That's nice! --- Gilberto Nunes Ferreira Em qui., 20 de ago. de 2020 às 12:53, Sachidananda Urs escreveu: > > > On Fri, Aug 14, 2020 at 10:04 AM Gilberto Nunes < > gilberto.nunes32 at gmail.com> wrote: > >> Hi >> Could you improve the output to show "Possibly undergoing heal" as well? >> gluster vol heal VMS info >> Brick gluster01:/DATA/vms >> Status: Connected >> Number of entries: 0 >> >> Brick gluster02:/DATA/vms >> /images/100/vm-100-disk-0.raw - Possibly undergoing heal >> Status: Connected >> Number of entries: 1 >> > > Gilberto, the release 1.0.2 ( > https://github.com/gluster/gstatus/releases/tag/v1.0.2) has included > self-heal status. > The output looks like this: > > root at master-node:/home/sac/work/gstatus# gstatus > > > Cluster: > > Status: Healthy GlusterFS: 9dev > > Nodes: 3/3 Volumes: 1/1 > > > Volumes: > > snap-1 Replicate Started (UP) - 2/2 > Bricks Up > > Capacity: (9.43% > used) 4.00 GiB/40.00 GiB (used/total) > > Self-Heal: > > slave-1:/mnt/brick1/snapr1/r11 > (13 File(s) to heal). > > Snapshots: 2 > > Quota: On > > Hope that helps. > > -sac > >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From joao.bauto at neuro.fchampalimaud.org Thu Aug 20 17:19:25 2020 From: joao.bauto at neuro.fchampalimaud.org (João Baúto) Date: Thu, 20 Aug 2020 18:19:25 +0100 Subject: [Gluster-users] Wrong directory quota usage In-Reply-To: References: Message-ID: Hi Srijan, After a 3rd run of the quota_fsck script, the quotas got fixed! Working normally again. Thank you for your help!
*João Baúto* --------------- *Scientific Computing and Software Platform* Champalimaud Research Champalimaud Center for the Unknown Av. Brasília, Doca de Pedrouços 1400-038 Lisbon, Portugal fchampalimaud.org Srijan Sivakumar escreveu no dia quarta, 19/08/2020 à(s) 18:04: > Hi João, > > I'd recommend to go with the disable/enable of the quota as that'd > eventually do the same thing. Rather than manually changing the parameters > in the said command, that would be the better option. > > -- > Thanks and Regards, > > SRIJAN SIVAKUMAR > > Associate Software Engineer > > Red Hat > > > > > > T: +91-9727532362 > > > TRIED. TESTED. TRUSTED. > > On Wed, Aug 19, 2020 at 8:12 PM João Baúto < > joao.bauto at neuro.fchampalimaud.org> wrote: > >> Hi Srijan, >> >> Before I do the disable/enable just want to check something with you. The >> other cluster where the crawling is running, I can see the find command and >> this one which seems to be the one triggering the crawler (4 processes, one >> per brick in all nodes) >> >> /usr/sbin/glusterfs -s localhost --volfile-id >> client_per_brick/tank.client.hostname.tank-volume1-brick.vol >> --use-readdirp=yes --client-pid -100 -l >> /var/log/glusterfs/quota_crawl/tank-volume1-brick.log >> /var/run/gluster/tmp/mntYbIVwT >> >> Can I manually trigger this command? >> >> Thanks! >> *João Baúto* >> --------------- >> >> *Scientific Computing and Software Platform* >> Champalimaud Research >> Champalimaud Center for the Unknown >> Av. Brasília, Doca de Pedrouços >> 1400-038 Lisbon, Portugal >> fchampalimaud.org >> >> >> Srijan Sivakumar escreveu no dia quarta, >> 19/08/2020 à(s) 07:25: >> >>> Hi João, >>> >>> If the crawl is not going on and the values are still not reflecting >>> properly then it means the crawl process has ended abruptly. >>> >>> Yes, technically disabling and enabling the quota will trigger crawl but >>> it'd do a complete crawl of the filesystem, hence would take time and be >>> resource consuming.
Usually disabling-enabling is the last thing to do if >>> the accounting isn't reflecting properly but if you're going to merge these >>> two clusters then probably you can go ahead with the merging and then >>> enable quota. >>> >>> -- >>> Thanks and Regards, >>> >>> SRIJAN SIVAKUMAR >>> >>> Associate Software Engineer >>> >>> Red Hat >>> >>> >>> >>> >>> >>> T: +91-9727532362 >>> >>> >>> TRIED. TESTED. TRUSTED. >>> >>> On Wed, Aug 19, 2020 at 3:53 AM Jo?o Ba?to < >>> joao.bauto at neuro.fchampalimaud.org> wrote: >>> >>>> Hi Srijan, >>>> >>>> I didn't get any result with that command so I went to our other >>>> cluster (we are merging two clusters, data is replicated) and activated the >>>> quota feature on the same directory. Running the same command on each node >>>> I get a similar output to yours. One process per brick I'm assuming. >>>> >>>> root 1746822 1.4 0.0 230324 2992 ? S 23:06 0:04 >>>> /usr/bin/find . -exec /usr/bin/stat {} \ ; >>>> root 1746858 5.3 0.0 233924 6644 ? S 23:06 0:15 >>>> /usr/bin/find . -exec /usr/bin/stat {} \ ; >>>> root 1746889 3.3 0.0 233592 6452 ? S 23:06 0:10 >>>> /usr/bin/find . -exec /usr/bin/stat {} \ ; >>>> root 1746930 3.1 0.0 230476 3232 ? S 23:06 0:09 >>>> /usr/bin/find . -exec /usr/bin/stat {} \ ; >>>> >>>> At this point, is it easier to just disable and enable the feature and >>>> force a new crawl? We don't mind a temporary increase in CPU and IO usage. >>>> >>>> Thank you again! >>>> *Jo?o Ba?to* >>>> --------------- >>>> >>>> *Scientific Computing and Software Platform* >>>> Champalimaud Research >>>> Champalimaud Center for the Unknown >>>> Av. 
Bras?lia, Doca de Pedrou?os >>>> 1400-038 Lisbon, Portugal >>>> fchampalimaud.org >>>> >>>> >>>> Srijan Sivakumar escreveu no dia ter?a, >>>> 18/08/2020 ?(s) 21:42: >>>> >>>>> Hi Jo?o, >>>>> >>>>> There isn't a straightforward way of tracking the crawl but as gluster >>>>> uses find and stat during crawl, one can run the following command, >>>>> # ps aux | grep find >>>>> >>>>> If the output is of the form, >>>>> "root 1513 0.0 0.1 127224 2636 ? S 12:24 0.00 >>>>> /usr/bin/find . -exec /usr/bin/stat {} \" >>>>> then it means that the crawl is still going on. >>>>> >>>>> >>>>> Thanks and Regards, >>>>> >>>>> SRIJAN SIVAKUMAR >>>>> >>>>> Associate Software Engineer >>>>> >>>>> Red Hat >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> T: +91-9727532362 >>>>> >>>>> >>>>> TRIED. TESTED. TRUSTED. >>>>> >>>>> >>>>> On Wed, Aug 19, 2020 at 1:46 AM Jo?o Ba?to < >>>>> joao.bauto at neuro.fchampalimaud.org> wrote: >>>>> >>>>>> Hi Srijan, >>>>>> >>>>>> Is there a way of getting the status of the crawl process? >>>>>> We are going to expand this cluster, adding 12 new bricks (around >>>>>> 500TB) and we rely heavily on the quota feature to control the space usage >>>>>> for each project. It's been running since Saturday (nothing changed) >>>>>> and unsure if it's going to finish tomorrow or in weeks. >>>>>> >>>>>> Thank you! >>>>>> *Jo?o Ba?to* >>>>>> --------------- >>>>>> >>>>>> *Scientific Computing and Software Platform* >>>>>> Champalimaud Research >>>>>> Champalimaud Center for the Unknown >>>>>> Av. Bras?lia, Doca de Pedrou?os >>>>>> 1400-038 Lisbon, Portugal >>>>>> fchampalimaud.org >>>>>> >>>>>> >>>>>> Srijan Sivakumar escreveu no dia domingo, >>>>>> 16/08/2020 ?(s) 06:11: >>>>>> >>>>>>> Hi Jo?o, >>>>>>> >>>>>>> Yes it'll take some time given the file system size as it has to >>>>>>> change the xattrs in each level and then crawl upwards. >>>>>>> >>>>>>> stat is done by the script itself so the crawl is initiated. 
>>>>>>> >>>>>>> Regards, >>>>>>> Srijan Sivakumar >>>>>>> >>>>>>> On Sun 16 Aug, 2020, 04:58 Jo?o Ba?to, < >>>>>>> joao.bauto at neuro.fchampalimaud.org> wrote: >>>>>>> >>>>>>>> Hi Srijan & Strahil, >>>>>>>> >>>>>>>> I ran the quota_fsck script mentioned in Hari's blog post in all >>>>>>>> bricks and it detected a lot of size mismatch. >>>>>>>> >>>>>>>> The script was executed as, >>>>>>>> >>>>>>>> - python quota_fsck.py --sub-dir projectB --fix-issues >>>>>>>> /mnt/tank /tank/volume2/brick (in all nodes and bricks) >>>>>>>> >>>>>>>> Here is a snippet from the script, >>>>>>>> >>>>>>>> Size Mismatch /tank/volume2/brick/projectB {'parents': >>>>>>>> {'00000000-0000-0000-0000-000000000001': {'contri_file_count': >>>>>>>> 18446744073035296610L, 'contri_size': 18446645297413872640L, >>>>>>>> 'contri_dir_count': 18446744073709527653L}}, 'version': '1', 'file_count': >>>>>>>> 18446744073035296610L, 'dirty': False, 'dir_count': 18446744073709527653L, >>>>>>>> 'size': 18446645297413872640L} 15204281691754 >>>>>>>> MARKING DIRTY: /tank/volume2/brick/projectB >>>>>>>> stat on /mnt/tank/projectB >>>>>>>> Files verified : 683223 >>>>>>>> Directories verified : 46823 >>>>>>>> Objects Fixed : 705230 >>>>>>>> >>>>>>>> Checking the xattr in the bricks I can see the directory in >>>>>>>> question marked as dirty, >>>>>>>> # getfattr -d -m. 
-e hex /tank/volume2/brick/projectB >>>>>>>> getfattr: Removing leading '/' from absolute path names >>>>>>>> # file: tank/volume2/brick/projectB >>>>>>>> trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c >>>>>>>> >>>>>>>> trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f372478000a7705 >>>>>>>> trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc >>>>>>>> >>>>>>>> trusted.glusterfs.mdata=0x010000000000000000000000005f3724750000000013ddf679000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0 >>>>>>>> >>>>>>>> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea >>>>>>>> trusted.glusterfs.quota.dirty=0x3100 >>>>>>>> >>>>>>>> trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff >>>>>>>> >>>>>>>> trusted.glusterfs.quota.size.1=0x00000ca6ccf7a80000000000000790a1000000000000b6ea >>>>>>>> >>>>>>>> Now, my question is how do I trigger Gluster to recalculate the >>>>>>>> quota for this directory? Is it automatic but it takes a while? Because the >>>>>>>> quota list did change but not to a good "result". >>>>>>>> >>>>>>>> Path Hard-limit Soft-limit Used >>>>>>>> Available Soft-limit exceeded? Hard-limit exceeded? >>>>>>>> /projectB 100.0TB 80%(80.0TB) 16383.9PB 190.1TB >>>>>>>> No No >>>>>>>> >>>>>>>> I would like to avoid a disable/enable quota in the volume as it >>>>>>>> removes the configs. >>>>>>>> >>>>>>>> Thank you for all the help! >>>>>>>> *Jo?o Ba?to* >>>>>>>> --------------- >>>>>>>> >>>>>>>> *Scientific Computing and Software Platform* >>>>>>>> Champalimaud Research >>>>>>>> Champalimaud Center for the Unknown >>>>>>>> Av. Bras?lia, Doca de Pedrou?os >>>>>>>> 1400-038 Lisbon, Portugal >>>>>>>> fchampalimaud.org >>>>>>>> >>>>>>>> >>>>>>>> Srijan Sivakumar escreveu no dia s?bado, >>>>>>>> 15/08/2020 ?(s) 11:57: >>>>>>>> >>>>>>>>> Hi Jo?o, >>>>>>>>> >>>>>>>>> The quota accounting error is what we're looking at here. 
I think
>>>>>>>>> you've already looked into the blog post by Hari and are using the script
>>>>>>>>> to fix the accounting issue.
>>>>>>>>> That should help you out in fixing this issue.
>>>>>>>>>
>>>>>>>>> Let me know if you face any issues while using it.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Srijan Sivakumar
>>>>>>>>>
>>>>>>>>> On Fri 14 Aug, 2020, 17:10 João Baúto, <
>>>>>>>>> joao.bauto at neuro.fchampalimaud.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Strahil,
>>>>>>>>>>
>>>>>>>>>> I have tried removing the quota for that specific directory and
>>>>>>>>>> setting it again, but it didn't work (maybe it has to be a quota disable and
>>>>>>>>>> enable in the volume options). Currently testing a solution
>>>>>>>>>> by Hari with the quota_fsck.py script (https://medium.com/@
>>>>>>>>>> harigowtham/glusterfs-quota-fix-accounting-840df33fcd3a) and it's
>>>>>>>>>> detecting a lot of size mismatch in files.
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> *João Baúto*
>>>>>>>>>> ---------------
>>>>>>>>>>
>>>>>>>>>> *Scientific Computing and Software Platform*
>>>>>>>>>> Champalimaud Research
>>>>>>>>>> Champalimaud Center for the Unknown
>>>>>>>>>> Av. Brasília, Doca de Pedrouços
>>>>>>>>>> 1400-038 Lisbon, Portugal
>>>>>>>>>> fchampalimaud.org
>>>>>>>>>>
>>>>>>>>>> Strahil Nikolov wrote on Friday, 14/08/2020 at 10:16:
>>>>>>>>>>
>>>>>>>>>>> Hi João,
>>>>>>>>>>>
>>>>>>>>>>> Based on your output it seems that the quota size is different
>>>>>>>>>>> on the 2 bricks.
>>>>>>>>>>>
>>>>>>>>>>> Have you tried to remove the quota and then recreate it? Maybe
>>>>>>>>>>> it will be the easiest way to fix it.
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Strahil Nikolov
>>>>>>>>>>>
>>>>>>>>>>> On 14 August 2020 at 4:35:14 GMT+03:00, "João Baúto" <
>>>>>>>>>>> joao.bauto at neuro.fchampalimaud.org> wrote:
>>>>>>>>>>> >Hi all,
>>>>>>>>>>> >
>>>>>>>>>>> >We have a 4-node distributed cluster with 2 bricks per node running
>>>>>>>>>>> >Gluster 7.7 + ZFS. We use directory quota to limit the space used by our
>>>>>>>>>>> >members on each project. Two days ago we noticed inconsistent space used
>>>>>>>>>>> >reported by Gluster in the quota list.
>>>>>>>>>>> >
>>>>>>>>>>> >A small snippet of gluster volume quota vol list,
>>>>>>>>>>> >
>>>>>>>>>>> > Path        Hard-limit  Soft-limit       Used  Available  Soft-limit exceeded?  Hard-limit exceeded?
>>>>>>>>>>> >/projectA         5.0TB  80%(4.0TB)      3.1TB      1.9TB                    No                    No
>>>>>>>>>>> >*/projectB      100.0TB  80%(80.0TB) 16383.4PB    740.9TB                    No                    No*
>>>>>>>>>>> >/projectC        70.0TB  80%(56.0TB)    50.0TB     20.0TB                    No                    No
>>>>>>>>>>> >
>>>>>>>>>>> >The total space available in the cluster is 360TB, the quota for
>>>>>>>>>>> >projectB is 100TB and, as you can see, it's reporting 16383.4PB used and
>>>>>>>>>>> >740TB available (already decreased from 750TB).
>>>>>>>>>>> >
>>>>>>>>>>> >There was an issue in Gluster 3.x related to wrong directory quota
>>>>>>>>>>> >(
>>>>>>>>>>> https://lists.gluster.org/pipermail/gluster-users/2016-February/025305.html
>>>>>>>>>>> > and
>>>>>>>>>>> https://lists.gluster.org/pipermail/gluster-users/2018-November/035374.html
>>>>>>>>>>> )
>>>>>>>>>>> >but it's marked as solved (not sure if the solution still applies).
>>>>>>>>>>> >
>>>>>>>>>>> >*On projectB*
>>>>>>>>>>> ># getfattr -d -m .
-e hex projectB >>>>>>>>>>> ># file: projectB >>>>>>>>>>> >trusted.gfid=0x3ca2bce0455945efa6662813ce20fc0c >>>>>>>>>>> >>>>>>>>>>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f35e69800098ed9 >>>>>>>>>>> >trusted.glusterfs.dht=0xe1a4060c000000003ffffffe5ffffffc >>>>>>>>>>> >>>>>>>>>>> >trusted.glusterfs.mdata=0x010000000000000000000000005f355c59000000000939079f000000005ce2aff90000000007fdacb0000000005ce2aff90000000007fdacb0 >>>>>>>>>>> >>>>>>>>>>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 >>>>>>>>>>> >trusted.glusterfs.quota.dirty=0x3000 >>>>>>>>>>> >>>>>>>>>>> >trusted.glusterfs.quota.limit-set.1=0x0000640000000000ffffffffffffffff >>>>>>>>>>> >>>>>>>>>>> >trusted.glusterfs.quota.size.1=0x0000ab0f227a860000000000478e33acffffffffffffc112 >>>>>>>>>>> > >>>>>>>>>>> >*On projectA* >>>>>>>>>>> ># getfattr -d -m . -e hex projectA >>>>>>>>>>> ># file: projectA >>>>>>>>>>> >trusted.gfid=0x05b09ded19354c0eb544d22d4659582e >>>>>>>>>>> >>>>>>>>>>> >trusted.glusterfs.9582685f-07fa-41fd-b9fc-ebab3a6989cf.xtime=0x5f1aeb9f00044c64 >>>>>>>>>>> >trusted.glusterfs.dht=0xe1a4060c000000001fffffff3ffffffd >>>>>>>>>>> >>>>>>>>>>> >trusted.glusterfs.mdata=0x010000000000000000000000005f1ac6a10000000018f30a4e000000005c338fab0000000017a3135a000000005b0694fb000000001584a21b >>>>>>>>>>> >>>>>>>>>>> >trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x0000067de3bbe20000000000000128610000000000033498 >>>>>>>>>>> >trusted.glusterfs.quota.dirty=0x3000 >>>>>>>>>>> >>>>>>>>>>> >trusted.glusterfs.quota.limit-set.1=0x0000460000000000ffffffffffffffff >>>>>>>>>>> >>>>>>>>>>> >trusted.glusterfs.quota.size.1=0x0000067de3bbe20000000000000128610000000000033498 >>>>>>>>>>> > >>>>>>>>>>> >Any idea on what's happening and how to fix it? >>>>>>>>>>> > >>>>>>>>>>> >Thanks! 
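[Archive note] The quota xattrs above can be decoded to make the accounting underflow visible directly. A minimal Python sketch; the three-field layout assumed here (used bytes, then file count, then directory count, each a big-endian 64-bit integer) is an assumption, but it is consistent with the counters quota_fsck prints earlier in the thread:

```python
import struct

# trusted.glusterfs.quota.size.1 from the projectB brick shown above
raw = bytes.fromhex("0000ab0f227a860000000000478e33acffffffffffffc112")

# Assumed layout: size, file count, dir count as big-endian 64-bit ints.
# Unpacking them as *signed* values ('q') exposes the underflow that the
# quota listing otherwise renders as a value near 2^64 (the 16383.4PB).
size, files, dirs = struct.unpack(">qqq", raw)

print(size)   # positive byte count
print(files)  # positive file count
print(dirs)   # -> -16110 (the directory counter has wrapped below zero)
```

The wrapped directory count is exactly the kind of mismatch quota_fsck reports as huge unsigned numbers such as 18446744073709527653.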
>>>>>>>>>>> >*João Baúto*
>>>>>>>>>>> >---------------
>>>>>>>>>>> >
>>>>>>>>>>> >*Scientific Computing and Software Platform*
>>>>>>>>>>> >Champalimaud Research
>>>>>>>>>>> >Champalimaud Center for the Unknown
>>>>>>>>>>> >Av. Brasília, Doca de Pedrouços
>>>>>>>>>>> >1400-038 Lisbon, Portugal
>>>>>>>>>>> >fchampalimaud.org
>>>>>>>>>> ________
>>>>>>>>>>
>>>>>>>>>> Community Meeting Calendar:
>>>>>>>>>>
>>>>>>>>>> Schedule -
>>>>>>>>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>>>>>>>>> Bridge: https://bluejeans.com/441850968
>>>>>>>>>>
>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Thanks and Regards,
> SRIJAN SIVAKUMAR
> Associate Software Engineer
> Red Hat
> T: +91-9727532362
> TRIED. TESTED. TRUSTED.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From vd at d7informatics.de Thu Aug 20 19:57:49 2020
From: vd at d7informatics.de (Volker Dormeyer)
Date: Thu, 20 Aug 2020 21:57:49 +0200
Subject: [Gluster-users] Kadalu
Message-ID: <25a97bf3-aa7c-1c81-d072-13c2bca6f612@d7informatics.de>

Hi all,

The more I read on Kadalu, the more questions I have. Please tell me if
this is the wrong place to ask about Kadalu.

When I use the external mode to access Gluster, I need to specify a
Gluster node, but what happens to my service if this node is not
reachable anymore? Or what happens in general as soon as this node
fails?

Can I run the Gluster services together with Kadalu on a Kubernetes
cluster and provide storage to a second Kubernetes cluster without local
storage?

Thank you,
Volker

From kees.dejong+lst at neobits.nl Thu Aug 20 21:23:03 2020
From: kees.dejong+lst at neobits.nl (K. de Jong)
Date: Thu, 20 Aug 2020 23:23:03 +0200
Subject: [Gluster-users] 4 node cluster (best performance + redundancy setup?)
In-Reply-To: <264420059.37752007.1597312191618.JavaMail.zimbra@redhat.com>
References: <264420059.37752007.1597312191618.JavaMail.zimbra@redhat.com>
Message-ID:

On Thu, 2020-08-13 at 05:49 -0400, Ashish Pandey wrote:
> > With 4 nodes, yes it is possible to use disperse volume.
> > Redundancy count 2 is not the best but most often used as far as my
> > interaction with users.
> > disperse volume with 4 bricks is also possible but it might not be a
> > best configuration.
> > I would suggest to have 6 bricks and 4+2 configuration
> > where 4 - Data bricks
> > and 2 - Redundant bricks, in other words the maximum number of bricks
> > which can go bad while you can still use the disperse volume.
> >
> > If you have a number of disks on 4 nodes, you can create the 4+2
> > disperse volume in a different way while maintaining the requirement
> > of EC (disperse volume)

Thank you for your reply. I finally received my 4th disk and I started
to experiment with different modes. But it seems like I can't do much
with 4 bricks (and using them all).

My idea was to have a 3+1 setup, so that one node (brick) can fail and
everything still works without losing the minimum quorum of 3. But
using disperse with redundancy doesn't accept this; at least one brick
needs to be set for redundancy. But then the RMW (Read-Modify-Write)
cycle is not efficient: 512 * (4-1) = 1536 bytes.

Setting 2 disks for redundancy is not recommended in terms of
split-brain scenarios. An uneven number needs to be configured, i.e.
not 2 or 4. A replica set of 4 is also not allowed, since there has to
be a majority in the quorum. So, an uneven number is required, which is
not 4. Using arbiters makes no difference in this context (of course).

How would I best achieve a 3+1 setup? Because to maintain a running
system without split-brain, I need at least 3 nodes. With 4, one should
be able to fail. But the modes I've explored here do not seem to
support that. So maybe there is an option to have a disk in standby?
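[Archive note] The sizing rule discussed in this thread can be sketched quickly. This is only an illustration of the arithmetic quoted above (512-byte fragments, stripe = 512 * (#bricks - redundancy), and the constraint that #bricks must be greater than 2 * redundancy):

```python
def stripe_size(bricks: int, redundancy: int) -> int:
    """Stripe (RMW block) size in bytes for a disperse volume."""
    if bricks <= 2 * redundancy:
        raise ValueError("disperse requires #bricks > 2 * redundancy")
    return 512 * (bricks - redundancy)

def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0

print(stripe_size(4, 1))                     # -> 1536, not a power of two
print(is_power_of_two(stripe_size(4, 1)))    # -> False, awkward RMW size
print(stripe_size(6, 2))                     # -> 2048, the suggested 4+2
print(is_power_of_two(stripe_size(6, 2)))    # -> True

try:
    stripe_size(4, 2)                        # the 2-of-4 idea from the mail
except ValueError as e:
    print("invalid:", e)                     # fails the bricks > 2*redundancy rule
```

This matches the thread: 4 bricks with redundancy 1 gives the "weird" 1536-byte stripe, 4 bricks with redundancy 2 violates the constraint, and the 4+2 layout Ashish suggests yields a power-of-two stripe.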
Performance and disk efficiency are of course always nice too. But I'm
wondering now if 4 disks is even possible at all.

On Thu, Aug 13, 2020 at 05:49, Ashish Pandey wrote:
>
> *From:* "K. de Jong"
> *To:* gluster-users at gluster.org
> *Sent:* Thursday, August 13, 2020 11:43:03 AM
> *Subject:* [Gluster-users] 4 node cluster (best performance +
> redundancy setup?)
>
> I posted something in the subreddit [1], but I saw the suggestion
> elsewhere that the mailing list is more active.
>
> I've been reading the docs. And from this [2] overview the distributed
> replicated [3] and dispersed + redundancy [4] sound the most
> interesting.
>
> Each node (Raspberry Pi 4, 2x 8GB and 2x 4GB version) has a 4TB HD
> disk attached via a docking station. I'm still waiting for the 4th
> Raspberry Pi, so I can't really experiment with the intended setup. But
> the setup of 2 replicas and 1 arbiter was quite disappointing. I got
> between 6MB/s and 60 MB/s, depending on the test (I did a broad range
> of tests with bonnie++ and simply dd). Without GlusterFS a simple dd of
> a 1GB file is about 100+ MB/s. 100MB/s is okay for this cluster.
>
> My goal is the following:
> * Run a HA environment with Pacemaker (services like Nextcloud,
> Dovecot, Apache).
> * One node should be able to fail without downtime.
> * Performance and storage efficiency should be reasonable with the
> given hardware. By that I mean: when everything is a replica, then
> storage is stuck at 4TB. And I would prefer to have some more than that
> limitation, but with redundancy.
>
> However, when reading the docs about disperse, I see some interesting
> points. A big pro is "providing space-efficient protection against disk
> or server failures". But the following is interesting as well: "The
> total number of bricks must be greater than 2 * redundancy". So, I want
> the cluster to be available when one node fails. And be able to
> recreate the data on a new disk, on that fourth node.
I also read about
> the RMW efficiency, I guess 2 sets of 2 is the only thing that will
> work with that performance and disk efficiency in mind. Because 1
> redundancy would mess up the RMW cycle.
>
> My questions:
> * With 4 nodes; is it possible to use disperse and redundancy? And is a
> redundancy count of 2 the best (and only) choice when dealing with 4
> disks?

With 4 nodes, yes, it is possible to use a disperse volume.
Redundancy count 2 is not the best, but most often used, as far as my
interaction with users goes.
A disperse volume with 4 bricks is also possible, but it might not be
the best configuration.
I would suggest having 6 bricks in a 4+2 configuration,
where 4 - data bricks
and 2 - redundant bricks, in other words the maximum number of bricks
which can go bad while you can still use the disperse volume.

If you have a number of disks on 4 nodes, you can create the 4+2
disperse volume in a different way while maintaining the requirement of
EC (disperse volume).

> * The example does show a 4 node disperse command, but has as output
> `There isn't an optimal redundancy value for this configuration. Do you
> want to create the volume with redundancy 1 ? (y/n)`. I'm not sure if
> it's okay to simply select 'y' as an answer. The output is a bit vague,
> because it says it's not optimal, so it will be just slow, but will
> work I guess?

It will not be optimal from the point of view of the calculation which
we make. You want a configuration where you can have maximum redundancy
(failure tolerance) and also maximum storage capacity. In that regard,
it will not be an optimal solution. Performance can also be a factor.

> * The RMW (Read-Modify-Write) cycle is probably what's meant. 512 *
> (#Bricks - redundancy) would be in this case for me 512 * (4-1) = 1536
> bytes, which doesn't seem optimal, because it's a weird number, it's
> not a power of 2 (512, 1024, 2048, etc.). Choosing a replica of 2 would
Choosing a replica of 2 would > translate to 1024, which would seem more "okay". But I don't know for > sure. > > Yes, you are right. > > * Or am I better off by simply creating 2 pairs of replicas (so no > disperse)? So in that sense I would have 8TB available, and one node > can fail. This would provide some read performance benefits. > * What would be a good way to integrate this with Pacemaker? With that > I mean, should I manage the gluster resource with Pacemaker? Or simply > try to mount the glusterfs, if it's not available, then depending > resources can't start anyway. So in other words, let glusterfs handle > failover itself. > > > gluster can handle fail over on replica or disperse level as per its > implementation. > Even if you want to go for replica, it does not replica 2 does not > look like a best option, you should > go for replica 3 or arbiter volume to have best fault tolerance. > However, that will cost you a big storage capacity. > > > Any advice/tips? > > > > > [1] > > [2] > > [3] > > [4] > > > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gilberto.nunes32 at gmail.com Thu Aug 20 22:36:52 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Thu, 20 Aug 2020 19:36:52 -0300 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: Hi Sachidananda! I am trying to use the latest release of gstatus, but when I cut off one of the nodes, I get timeout... 
gluster vol heal VMS info
Brick glusterfs01:/DATA/vms
Status: Transport endpoint is not connected
Number of entries: -

Brick glusterfs02:/DATA/vms
/images/100/vm-100-disk-0.qcow2
Status: Connected
Number of entries: 1

proxmox02:~# gstatus -a
Error : Request timed out

gluster vol heal VMS info works, but gstatus -a gets me a timeout.
Any suggestions?

Thanks
---
Gilberto Nunes Ferreira

On Thu, 20 Aug 2020 at 12:53, Sachidananda Urs wrote:

>
> On Fri, Aug 14, 2020 at 10:04 AM Gilberto Nunes <
> gilberto.nunes32 at gmail.com> wrote:
>
>> Hi
>> Could you improve the output to show "Possibly undergoing heal" as well?
>> gluster vol heal VMS info
>> Brick gluster01:/DATA/vms
>> Status: Connected
>> Number of entries: 0
>>
>> Brick gluster02:/DATA/vms
>> /images/100/vm-100-disk-0.raw - Possibly undergoing heal
>> Status: Connected
>> Number of entries: 1
>>
>
> Gilberto, the release 1.0.2 (
> https://github.com/gluster/gstatus/releases/tag/v1.0.2) has included
> self-heal status.
> The output looks like this:
>
> root at master-node:/home/sac/work/gstatus# gstatus
>
> Cluster:
>          Status: Healthy                GlusterFS: 9dev
>          Nodes: 3/3                     Volumes: 1/1
>
> Volumes:
>
> snap-1    Replicate    Started (UP) - 2/2 Bricks Up
>           Capacity: (9.43% used) 4.00 GiB/40.00 GiB (used/total)
>           Self-Heal:
>              slave-1:/mnt/brick1/snapr1/r11 (13 File(s) to heal).
>           Snapshots: 2
>           Quota: On
>
> Hope that helps.
>
> -sac
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From amar at kadalu.io Fri Aug 21 04:15:05 2020
From: amar at kadalu.io (Amar Tumballi)
Date: Fri, 21 Aug 2020 09:45:05 +0530
Subject: [Gluster-users] Kadalu
In-Reply-To: <25a97bf3-aa7c-1c81-d072-13c2bca6f612@d7informatics.de>
References: <25a97bf3-aa7c-1c81-d072-13c2bca6f612@d7informatics.de>
Message-ID:

Let me try to answer you..
On Fri, 21 Aug, 2020, 1:28 am Volker Dormeyer, wrote: > Hi all, > > The more I read on kadalu the more questions I have, please tell me, if > this is the wrong place to ask about kadalu. > > We are using GitHub issues (https://github.com/kadalu/kadalu/issues) or slack channel (https://kadalu.slack.com) to discuss kadalu issues. This is Ok, but not every issue of kadalu may be discussed here. > When I use the external mode to access Gluster, I need to specify a > Gluster node, but what happend to my service if this node is not > reachable anymore. Or what does happen in general as soon as this node > fails? > > the node is used for 'mounting' (ie, to fetch the volume info), so, if you are having a HA setup with replica 3, even if the node goes down, gluster file system continues to work, ie, all PVs will be working fine. We can enhance the 'options:' in storage spec to take 'backup-volfile-server', so even the mount issue can be resolved. Should be an RFE to project. > Can I run the Gluster services together with kadalu on a Kubernetes > cluster and provide storage to a second Kubernetes cluster without local > storage? > > Is the second storage cluster having access to nodes in this cluster? (ie, reachability?) if yes, it works as an 'External' gluster setup for that second cluster. Should work technically, haven't tried that setup. > Thank you, > Volker > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bob at computerisms.ca Fri Aug 21 04:44:35 2020 From: bob at computerisms.ca (Computerisms Corporation) Date: Thu, 20 Aug 2020 21:44:35 -0700 Subject: [Gluster-users] client side profiling Message-ID: <8caca4a7-eb76-5693-7524-f20a833eca03@computerisms.ca> Hi List, I am still struggling with my setup. One server is working reasonably well for serving websites, but serving sites from the 2nd server is still using excessive amounts of cpu; a bit of which is gluster, but most of which is apache. 
Gluster docs mentions client-side-profiling: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Performance%20Testing/#client-side-profiling specifically: "In short, use client-side profiling for understanding "why is my application unresponsive"?" Great, I think this is what I want. instructions are: *gluster volume profile your-volume start *setfattr -n trusted.io-stats-dump -v /tmp/io-stats-pre.txt /your/mountpoint *This will generate the specified file on the client Okay: root at moogle:/usr/src/gluster-profile-analysis-master# gluster vol profile webisms start Profile on Volume webisms is already started root at moogle:/usr/src/gluster-profile-analysis-master# setfattr -n trusted.io-stats-dump -v /tmp/stats.txt /Computerisms root at moogle:/usr/src/gluster-profile-analysis-master# ls /tmp/stats.txt ls: cannot access '/tmp/stats.txt': No such file or directory thought for sure I am doing something wrong, so I had a look at the gvp-client.sh script, and it appears I am doing the command correctly, there is just no output file. Am I missing something? or is this an outdated methodology that no longer works? -- Bob Miller Cell: 867-334-7117 Office: 867-633-3760 Office: 867-322-0362 www.computerisms.ca From hunter86_bg at yahoo.com Fri Aug 21 05:32:50 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Fri, 21 Aug 2020 08:32:50 +0300 Subject: [Gluster-users] performance In-Reply-To: <0166c1ff-83c0-4d5f-aa96-6cd8a2518cd1@computerisms.ca> References: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca> <7991483E-5365-4C87-89FA-C871AED18062@yahoo.com> <345b06c4-5996-9aa3-f846-0944c60ee398@computerisms.ca> <2CD68ED2-199F-407D-B0CC-385793BA16FD@yahoo.com> <64ee1b88-42d6-75d2-05ff-4703d168cc25@computerisms.ca> <68274322-B514-4555-A236-D159B16D42FC@yahoo.com> <0166c1ff-83c0-4d5f-aa96-6cd8a2518cd1@computerisms.ca> Message-ID: ?? 20 ?????? 2020 ?. 
3:46:41 GMT+03:00, Computerisms Corporation ??????: >Hi Strahil, > >so over the last two weeks, the system has been relatively stable. I >have powered off both servers at least once, for about 5 minutes each >time. server came up, auto-healed what it needed to, so all of that >part is working as expected. > >will answer things inline and follow with more questions: > >>>> Hm... OK. I guess you can try 7.7 whenever it's possible. >>> >>> Acknowledged. > >Still on my list. >> It could be a bad firmware also. If you get the opportunity, flash >the firmware and bump the OS to the max. > >Datacenter says everything was up to date as of installation, not >really >wanting them to take the servers offline for long enough to redo all >the >hardware. > >>>>> more number of CPU cycles than needed, increasing the event thread >>>>> count >>>>> would enhance the performance of the Red Hat Storage Server." >which >>> is >>>>> why I had it at 8. >>>> Yeah, but you got only 6 cores and they are not dedicated for >>> gluster only. I think that you need to test with lower values. > >figured out my magic number for client/server threads, it should be 5. >I set it to 5, observed no change I could attribute to it, so tried 4, >and got the same thing; no visible effect. > >>>>> right now the only suggested parameter I haven't played with is >the >>>>> performance.io-thread-count, which I currently have at 64. >>> not really sure what would be a reasonable value for my system. >> I guess you can try to increase it a little bit and check how is it >going. > >turns out if you try to set this higher than 64, you get an error >saying >64 is the max. > >>>> What I/O scheduler are you using for the SSDs (you can check via >'cat >>> /sys/block/sdX/queue/scheduler)? >>> >>> # cat /sys/block/vda/queue/scheduler >>> [mq-deadline] none >> >> Deadline prioritizes reads in a 2:1 ratio /default tunings/ . You >can consider testing 'none' if your SSDs are good. > >I did this. 
I would say it did have a positive effect, but it was a >minimal one. > >> I see vda , please share details on the infra as this is very >important. Virtual disks have their limitations and if you are on a VM, > then there might be chance to increase the CPU count. >> If you are on a VM, I would recommend you to use more (in numbers) >and smaller disks in stripe sets (either raid0 via mdadm, or pure >striped LV). >> Also, if you are on a VM -> there is no reason to reorder your I/O >requests in the VM, just to do it again on the Hypervisour. In such >case 'none' can bring better performance, but this varies on the >workload. > >hm, this is a good question, one I have been asking the datacenter for >a >while, but they are a little bit slippery on what exactly it is they >have going on there. They advertise the servers as metal with a >virtual >layer. The virtual layer is so you can log into a site and power the >server down or up, mount an ISO to boot from, access a console, and >some >other nifty things. can't any more, but when they first introduced the > >system, you could even access the BIOS of the server. But apparently, >and they swear up and down by this, it is a physical server, with real >dedicated SSDs and real sticks of RAM. I have found virtio and qemu as > >loaded kernel modules, so certainly there is something virtual >involved, >but other than that and their nifty little tools, it has always acted >and worked like a metal server to me. You can use 'virt-what' binary to find if and what type of Virtualization is used. I have a suspicion you are ontop of Openstack (which uses CEPH), so I guess you can try to get more info. For example, an Openstack instance can have '0x1af4' in '/sys/block/vdX/device/vendor' (replace X with actual device letter). 
Another check could be: /usr/lib/udev/scsi_id -g -u -d /dev/vda And also, you can try to take a look with smartctl from smartmontools package: smartctl -a /dev/vdX >> All necessary data is in the file attributes on the brick. I doubt >you will need to have access times on the brick itself. Another >possibility is to use 'relatime'. > >remounted all bricks with noatime, no significant difference. > >>> cache unless flush-behind is on. So seems that is a way to throw >ram >>> to >>> it? I put performance.write-behind-window-size: 512MB and >>> performance.flush-behind: on and the whole system calmed down pretty >>> much immediately. could be just timing, though, will have to see >>> tomorrow during business hours whether the system stays at a >reasonable > >Tried increasing this to its max of 1GB, no noticeable change from >512MB. > >The 2nd server is not acting inline with the first server. glusterfsd >processes are running at 50-80% of a core each, with one brick often >going over 200%, where as they usually stick to 30-45% on the first >server. apache processes consume as much as 90% of a core where as >they >rarely go over 15% on the first server, and they frequently stack up to > >having more than 100 running at once, which drives load average up to >40-60. It's very much like the first server was before I found the >flush-behind setting, but not as bad; at least it isn't going >completely >non-responsive. > >Additionally, it is still taking an excessive time to load the first >page of most sites. I am guessing I need to increase read speeds to >fix >this, so I have played with >performance.io-cache/cache-max-file-size(slight positive change), >read-ahead/read-ahead-page-count(negative change till page count set to > >max of 16, then no noticeable difference), and >rda-cache-limit/rda-request-size(minimal positive effect). 
I still >have >RAM to spare, so would be nice if I could be using it to improve things >on the read side of things, but have found no magic bullet like >flush-behind was. > >I found a good number of more options to try, have been going a little >crazy with them, will post them at the bottom. I found a post that >suggested mount options are also important: > >https://lists.gluster.org/pipermail/gluster-users/2018-September/034937.html > >I confirmed these are in the man pages, so I tried umounting and >re-mounting with the -o option to include these thusly: > >mount -t glusterfs moogle:webisms /Computerisms/ -o >negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5 > >But I don't think they are working: > >/# mount | grep glus >moogle:webisms on /Computerisms type fuse.glusterfs >(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072) > >would be grateful if there are any other suggestions anyone can think >of. > >root at moogle:/# gluster v info > >Volume Name: webisms >Type: Distributed-Replicate >Volume ID: 261901e7-60b4-4760-897d-0163beed356e >Status: Started >Snapshot Count: 0 >Number of Bricks: 2 x (2 + 1) = 6 >Transport-type: tcp >Bricks: >Brick1: mooglian:/var/GlusterBrick/replset-0/webisms-replset-0 >Brick2: moogle:/var/GlusterBrick/replset-0/webisms-replset-0 >Brick3: moogle:/var/GlusterBrick/replset-0-arb/webisms-replset-0-arb >(arbiter) >Brick4: moogle:/var/GlusterBrick/replset-1/webisms-replset-1 >Brick5: mooglian:/var/GlusterBrick/replset-1/webisms-replset-1 >Brick6: mooglian:/var/GlusterBrick/replset-1-arb/webisms-replset-1-arb >(arbiter) >Options Reconfigured: >performance.rda-cache-limit: 1GB >performance.client-io-threads: off >nfs.disable: on >storage.fips-mode-rchecksum: off >transport.address-family: inet >performance.stat-prefetch: on >network.inode-lru-limit: 200000 >performance.write-behind-window-size: 1073741824 >performance.readdir-ahead: on >performance.io-thread-count: 
64
>performance.cache-size: 12GB
>server.event-threads: 4
>client.event-threads: 4
>performance.nl-cache-timeout: 600
>auth.allow: xxxxxx
>performance.open-behind: off
>performance.quick-read: off
>cluster.lookup-optimize: off
>cluster.rebal-throttle: lazy
>features.cache-invalidation: on
>features.cache-invalidation-timeout: 600
>performance.cache-invalidation: on
>performance.md-cache-timeout: 600
>performance.flush-behind: on
>cluster.read-hash-mode: 0
>performance.strict-o-direct: on
>cluster.readdir-optimize: on
>cluster.lookup-unhashed: off
>performance.cache-refresh-timeout: 30
>performance.enable-least-priority: off
>cluster.choose-local: on
>performance.rda-request-size: 128KB
>performance.read-ahead: on
>performance.read-ahead-page-count: 16
>performance.cache-max-file-size: 5MB
>performance.io-cache: on

From hunter86_bg at yahoo.com Fri Aug 21 05:33:05 2020
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Fri, 21 Aug 2020 08:33:05 +0300
Subject: [Gluster-users] client side profiling
In-Reply-To: <8caca4a7-eb76-5693-7524-f20a833eca03@computerisms.ca>
References: <8caca4a7-eb76-5693-7524-f20a833eca03@computerisms.ca>
Message-ID: <70905929-5C0E-4425-B28B-32D78C9C352D@yahoo.com>

>master# gluster vol profile webisms start
>Profile on Volume webisms is already started

It seems that it was already started. Can you stop it and check the
node's load before starting it again?

Best Regards,
Strahil Nikolov

On 21 August 2020 at 7:44:35 GMT+03:00, Computerisms Corporation wrote:
>Hi List,
>
>I am still struggling with my setup. One server is working reasonably
>well for serving websites, but serving sites from the 2nd server is
>still using excessive amounts of cpu; a bit of which is gluster, but
>most of which is apache.
> >Gluster docs mentions client-side-profiling: > >https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Performance%20Testing/#client-side-profiling > >specifically: > >"In short, use client-side profiling for understanding "why is my >application unresponsive"?" > >Great, I think this is what I want. instructions are: > >*gluster volume profile your-volume start >*setfattr -n trusted.io-stats-dump -v /tmp/io-stats-pre.txt >/your/mountpoint >*This will generate the specified file on the client > >Okay: > >root at moogle:/usr/src/gluster-profile-analysis-master# gluster vol >profile webisms start >Profile on Volume webisms is already started >root at moogle:/usr/src/gluster-profile-analysis-master# setfattr -n >trusted.io-stats-dump -v /tmp/stats.txt /Computerisms >root at moogle:/usr/src/gluster-profile-analysis-master# ls /tmp/stats.txt >ls: cannot access '/tmp/stats.txt': No such file or directory > >thought for sure I am doing something wrong, so I had a look at the >gvp-client.sh script, and it appears I am doing the command correctly, >there is just no output file. Am I missing something? or is this an >outdated methodology that no longer works? From jahernan at redhat.com Fri Aug 21 05:34:30 2020 From: jahernan at redhat.com (Xavi Hernandez) Date: Fri, 21 Aug 2020 07:34:30 +0200 Subject: [Gluster-users] client side profiling In-Reply-To: <8caca4a7-eb76-5693-7524-f20a833eca03@computerisms.ca> References: <8caca4a7-eb76-5693-7524-f20a833eca03@computerisms.ca> Message-ID: Hi, see the comments inline On Fri, Aug 21, 2020 at 6:44 AM Computerisms Corporation < bob at computerisms.ca> wrote: > Hi List, > > I am still struggling with my setup. One server is working reasonably > well for serving websites, but serving sites from the 2nd server is > still using excessive amounts of cpu; a bit of which is gluster, but > most of which is apache. 
> > Gluster docs mentions client-side-profiling: > > > https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Performance%20Testing/#client-side-profiling > > specifically: > > "In short, use client-side profiling for understanding "why is my > application unresponsive"?" > > Great, I think this is what I want. instructions are: > > *gluster volume profile your-volume start > *setfattr -n trusted.io-stats-dump -v /tmp/io-stats-pre.txt > /your/mountpoint > *This will generate the specified file on the client > > Okay: > > root at moogle:/usr/src/gluster-profile-analysis-master# gluster vol > profile webisms start > Profile on Volume webisms is already started > root at moogle:/usr/src/gluster-profile-analysis-master# setfattr -n > trusted.io-stats-dump -v /tmp/stats.txt /Computerisms > root at moogle:/usr/src/gluster-profile-analysis-master# ls /tmp/stats.txt > ls: cannot access '/tmp/stats.txt': No such file or directory > > thought for sure I am doing something wrong, so I had a look at the > gvp-client.sh script, and it appears I am doing the command correctly, > there is just no output file. Am I missing something? or is this an > outdated methodology that no longer works? > For security reasons, the value passed cannot represent a full path, so this was changed to only tell the name of a file. The file itself is stored inside /var/run/gluster. If you look there, there should be a file like '-tmp-stats.txt' (replacing '/' by '_') which contains the client profile. Regards, Xavi -------------- next part -------------- An HTML attachment was scrubbed... URL: From amar at kadalu.io Fri Aug 21 05:37:11 2020 From: amar at kadalu.io (Amar Tumballi) Date: Fri, 21 Aug 2020 11:07:11 +0530 Subject: [Gluster-users] client side profiling In-Reply-To: <70905929-5C0E-4425-B28B-32D78C9C352D@yahoo.com> References: <8caca4a7-eb76-5693-7524-f20a833eca03@computerisms.ca> <70905929-5C0E-4425-B28B-32D78C9C352D@yahoo.com> Message-ID: Checked why this didn't work! 
Due to some CVE vulnerability concerns, we changed the output location to /var/run/gluster/.

On Fri, Aug 21, 2020 at 11:03 AM Strahil Nikolov wrote:

> >master# gluster vol profile webisms start
> >Profile on Volume webisms is already started
>
> It seems that it was already started. Can you stop it and check node's
> load before starting it again?
>
> Best Regards,
> Strahil Nikolov
>
> On 21 August 2020 at 7:44:35 GMT+03:00, Computerisms Corporation <
> bob at computerisms.ca> wrote:
> >Hi List,
> >
> >I am still struggling with my setup. One server is working reasonably
> >well for serving websites, but serving sites from the 2nd server is
> >still using excessive amounts of cpu; a bit of which is gluster, but
> >most of which is apache.
> >
> >Gluster docs mentions client-side-profiling:
> >
> >
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Performance%20Testing/#client-side-profiling
> >
> >specifically:
> >
> >"In short, use client-side profiling for understanding "why is my
> >application unresponsive"?"
> >
> >Great, I think this is what I want. instructions are:
> >
> >*gluster volume profile your-volume start
> >*setfattr -n trusted.io-stats-dump -v /tmp/io-stats-pre.txt
> >/your/mountpoint
> >*This will generate the specified file on the client
> >
> >Okay:
> >
> >root at moogle:/usr/src/gluster-profile-analysis-master# gluster vol
> >profile webisms start
> >Profile on Volume webisms is already started
> >root at moogle:/usr/src/gluster-profile-analysis-master# setfattr -n
> >trusted.io-stats-dump -v /tmp/stats.txt /Computerisms
> >root at moogle:/usr/src/gluster-profile-analysis-master# ls /tmp/stats.txt
> >ls: cannot access '/tmp/stats.txt': No such file or directory
> >
> >thought for sure I am doing something wrong, so I had a look at the
> >gvp-client.sh script, and it appears I am doing the command correctly,
> >there is just no output file. Am I missing something? or is this an
> >outdated methodology that no longer works?
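Putting the two answers together: the value passed to `trusted.io-stats-dump` is now treated as a bare tag, and the dump is written under `/var/run/gluster` with the mount path folded into the file name. The exact name mangling varies by version, so search for the tag rather than guessing. The snippet below simulates that on a scratch directory; the mangled name shown is an assumption for illustration only.

```shell
# On a live client the real sequence (per the replies above) would be:
#   gluster volume profile webisms start
#   setfattr -n trusted.io-stats-dump -v stats.txt /Computerisms
# and the dump then lands under /var/run/gluster. Simulated here:
RUNDIR=$(mktemp -d)                          # stands in for /var/run/gluster
touch "$RUNDIR/mnt-Computerisms-stats.txt"   # assumed mangled name, illustration only
find "$RUNDIR" -name '*stats.txt*'           # locate the dump by its tag
```

The point of the `find` is that you never need to reconstruct the mangled name by hand.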
> ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -- -- https://kadalu.io Container Storage made easy! -------------- next part -------------- An HTML attachment was scrubbed... URL: From bob at computerisms.ca Fri Aug 21 07:02:34 2020 From: bob at computerisms.ca (Computerisms Corporation) Date: Fri, 21 Aug 2020 00:02:34 -0700 Subject: [Gluster-users] performance In-Reply-To: References: <696b3c28-519b-c3e3-ce5d-e60d2f194d4c@computerisms.ca> <7991483E-5365-4C87-89FA-C871AED18062@yahoo.com> <345b06c4-5996-9aa3-f846-0944c60ee398@computerisms.ca> <2CD68ED2-199F-407D-B0CC-385793BA16FD@yahoo.com> <64ee1b88-42d6-75d2-05ff-4703d168cc25@computerisms.ca> <68274322-B514-4555-A236-D159B16D42FC@yahoo.com> <0166c1ff-83c0-4d5f-aa96-6cd8a2518cd1@computerisms.ca> Message-ID: <4e21c13a-059e-d9b0-1332-595ed6edf9f9@computerisms.ca> Hi Strahil, > You can use 'virt-what' binary to find if and what type of Virtualization is used. cool, did not know about that. trouble server: root at moogle:/# virt-what hyperv kvm good server: root at mooglian:/# virt-what kvm > I have a suspicion you are ontop of Openstack (which uses CEPH), so I guess you can try to get more info. > For example, an Openstack instance can have '0x1af4' in '/sys/block/vdX/device/vendor' (replace X with actual device letter). > Another check could be: > /usr/lib/udev/scsi_id -g -u -d /dev/vda This command returns no output on the bad server. 
Good server returns:
root at mooglian:/# /usr/lib/udev/scsi_id -g -u -d /dev/vda
-bash: /usr/lib/udev/scsi_id: No such file or directory

> And also, you can try to take a look with smartctl from smartmontools package:
> smartctl -a /dev/vdX

Both servers return:
/dev/vda: Unable to detect device type

When I asked them about this earlier this week I was told the two servers are identical, but I guess there is something different about the server giving me trouble. I will go back to them and see what they have to say.

Thanks for pointing me at this...

From bob at computerisms.ca Fri Aug 21 07:03:43 2020
From: bob at computerisms.ca (Computerisms Corporation)
Date: Fri, 21 Aug 2020 00:03:43 -0700
Subject: [Gluster-users] client side profiling
In-Reply-To: <70905929-5C0E-4425-B28B-32D78C9C352D@yahoo.com>
References: <8caca4a7-eb76-5693-7524-f20a833eca03@computerisms.ca> <70905929-5C0E-4425-B28B-32D78C9C352D@yahoo.com>
Message-ID: <453c1b09-06b0-abfa-33f2-fb1a6eb743d3@computerisms.ca>

stopped and started it many times. Load is fine (2-3) so long as I am not serving sites...

On 2020-08-20 10:33 p.m., Strahil Nikolov wrote:
>> master# gluster vol profile webisms start
>> Profile on Volume webisms is already started
>
> It seems that it was already started. Can you stop it and check node's load before starting it again?
>
> Best Regards,
> Strahil Nikolov
>
> On 21 August 2020 at 7:44:35 GMT+03:00, Computerisms Corporation wrote:
>> Hi List,
>>
>> I am still struggling with my setup. One server is working reasonably
>> well for serving websites, but serving sites from the 2nd server is
>> still using excessive amounts of cpu; a bit of which is gluster, but
>> most of which is apache.
>> >> Great, I think this is what I want. instructions are: >> >> *gluster volume profile your-volume start >> *setfattr -n trusted.io-stats-dump -v /tmp/io-stats-pre.txt >> /your/mountpoint >> *This will generate the specified file on the client >> >> Okay: >> >> root at moogle:/usr/src/gluster-profile-analysis-master# gluster vol >> profile webisms start >> Profile on Volume webisms is already started >> root at moogle:/usr/src/gluster-profile-analysis-master# setfattr -n >> trusted.io-stats-dump -v /tmp/stats.txt /Computerisms >> root at moogle:/usr/src/gluster-profile-analysis-master# ls /tmp/stats.txt >> ls: cannot access '/tmp/stats.txt': No such file or directory >> >> thought for sure I am doing something wrong, so I had a look at the >> gvp-client.sh script, and it appears I am doing the command correctly, >> there is just no output file. Am I missing something? or is this an >> outdated methodology that no longer works? From bob at computerisms.ca Fri Aug 21 07:05:19 2020 From: bob at computerisms.ca (Computerisms Corporation) Date: Fri, 21 Aug 2020 00:05:19 -0700 Subject: [Gluster-users] client side profiling In-Reply-To: References: <8caca4a7-eb76-5693-7524-f20a833eca03@computerisms.ca> Message-ID: Hi Xavi, Amar, > For security reasons, the value passed cannot represent a full path, so > this was changed to only tell the name of a file. The file itself is > stored inside /var/run/gluster. > > If you look there, there should be a file like '-tmp-stats.txt' > (replacing '/' by '_') which contains the client profile. Indeed, found the missing file there. Thank you. tomorrow's task is to see if I can learn anything from that information... > > Regards, > > Xavi From diego.zuccato at unibo.it Fri Aug 21 11:56:17 2020 From: diego.zuccato at unibo.it (Diego Zuccato) Date: Fri, 21 Aug 2020 13:56:17 +0200 Subject: [Gluster-users] How to fix I/O error ? (resend) Message-ID: Hello all. 
I have a volume setup as: -8<-- root at str957-biostor:~# gluster v info BigVol Volume Name: BigVol Type: Distributed-Replicate Volume ID: c51926bd-6715-46b2-8bb3-8c915ec47e28 Status: Started Snapshot Count: 0 Number of Bricks: 28 x (2 + 1) = 84 Transport-type: tcp Bricks: Brick1: str957-biostor2:/srv/bricks/00/BigVol Brick2: str957-biostor:/srv/bricks/00/BigVol Brick3: str957-biostq:/srv/arbiters/00/BigVol (arbiter) [...] Options Reconfigured: cluster.granular-entry-heal: enable client.event-threads: 8 server.event-threads: 8 server.ssl: on client.ssl: on nfs.disable: on performance.readdir-ahead: on transport.address-family: inet features.bitrot: on features.scrub: Active features.scrub-freq: biweekly auth.ssl-allow: str957-bio* ssl.certificate-depth: 1 cluster.self-heal-daemon: enable features.quota: on features.inode-quota: on features.quota-deem-statfs: on server.manage-gids: on features.scrub-throttle: aggressive -8<-- After a couple failures (a disk on biostor2 went "missing", and glusterd on biostq got killed by OOM) I noticed that some files can't be accessed from the clients: -8<-- $ ls -lh 1_germline_CGTACTAG_L005_R* -rwxr-xr-x 1 e.f domain^users 2,0G apr 24 2015 1_germline_CGTACTAG_L005_R1_001.fastq.gz -rwxr-xr-x 1 e.f domain^users 2,0G apr 24 2015 1_germline_CGTACTAG_L005_R2_001.fastq.gz $ ls -lh 1_germline_CGTACTAG_L005_R1_001.fastq.gz ls: cannot access '1_germline_CGTACTAG_L005_R1_001.fastq.gz': Input/output error -8<-- (note that if I request ls for more files, it works...). The files have exactly the same contents (verified via md5sum). The only difference is in getfattr: trusted.bit-rot.version is 0x17000000000000005f3f9e670002ad5b on a node and 0x12000000000000005f3ce7af000dccad on the other. 
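A mismatch like this is typically resolved by hand: on ONE replica (the bad copy), remove both the named file and its hard link under the brick's `.glusterfs` directory, then let self-heal copy it back from the good brick. A hedged sketch follows, simulated on a plain scratch directory since on a real brick it must run as root; the file name matches the one above, but the gfid value and link path are illustrative (on a real brick the gfid comes from `getfattr -n trusted.gfid -e hex <file>`).

```shell
# Simulated brick: every file on a brick has a hard link under
# .glusterfs/<aa>/<bb>/<gfid>. A GFID mismatch is cleared by removing
# BOTH entries on the bad replica only, then triggering self-heal.
brick=$(mktemp -d)                            # stands in for /srv/bricks/00/BigVol
gfid=d70a4a6d-05fc-4988-8041-5e7f62155fe5     # illustrative gfid
mkdir -p "$brick/.glusterfs/d7/0a"
echo data > "$brick/1_germline_CGTACTAG_L005_R1_001.fastq.gz"
ln "$brick/1_germline_CGTACTAG_L005_R1_001.fastq.gz" "$brick/.glusterfs/d7/0a/$gfid"
# The manual fix, on the bad replica only:
rm "$brick/1_germline_CGTACTAG_L005_R1_001.fastq.gz" "$brick/.glusterfs/d7/0a/$gfid"
# Then, on a real cluster: gluster volume heal BigVol full
```

Removing only one of the two entries leaves a dangling gfid link behind, which is why both must go together.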
On the client, the log reports:
-8<-
[2020-08-21 11:32:52.208809] W [MSGID: 108008]
[afr-self-heal-name.c:354:afr_selfheal_name_gfid_mismatch_check]
4-BigVol-replicate-13: GFID mismatch for
/1_germline_CGTACTAG_L005_R1_001.fastq.gz
d70a4a6d-05fc-4988-8041-5e7f62155fe5 on BigVol-client-55 and
f249f88a-909f-489d-8d1d-d428e842ee96 on BigVol-client-34
[2020-08-21 11:32:52.209768] W [fuse-bridge.c:471:fuse_entry_cbk]
0-glusterfs-fuse: 233606: LOOKUP()
/[...]/1_germline_CGTACTAG_L005_R1_001.fastq.gz => -1 (Errore di
input/output)
-8<--

As suggested on IRC, I tested the RAM, but the only thing I got was a
"Peer rejected" status due to another OOM kill. No problem, I've been
able to resolve it, but the original problem still remains.

What else can I do?

TIA!

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

From vd at d7informatics.de Fri Aug 21 12:19:55 2020
From: vd at d7informatics.de (Volker Dormeyer)
Date: Fri, 21 Aug 2020 14:19:55 +0200
Subject: [Gluster-users] Kadalu
In-Reply-To: References: <25a97bf3-aa7c-1c81-d072-13c2bca6f612@d7informatics.de>
Message-ID: <4c3d9af2-399a-e399-f933-5b197cc79abc@d7informatics.de>

Hello Amar,

thank you - I'm going to test this.

Volker

On 8/21/20 5:45 AM, Amar Tumballi wrote:
> Let me try to answer you..
>
> When I use the external mode to access Gluster, I need to specify a
> Gluster node, but what happens to my service if this node is not
> reachable anymore? Or what does happen in general as soon as this node
> fails?
>
> the node is used for 'mounting' (ie, to fetch the volume info), so, if
> you are having a HA setup with replica 3, even if the node goes down,
> the gluster file system continues to work, ie, all PVs will be working fine.
>
> We can enhance the 'options:' in storage spec to take
> 'backup-volfile-server', so even the mount issue can be resolved.
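Outside Kubernetes, the same single-point-of-mount concern is usually handled on a classic fuse mount with the `backup-volfile-servers` mount option, which only affects volfile fetching at mount time. A hedged fragment, with host and volume names made up:

```
# /etc/fstab - server1 is only consulted at mount time; if it is down,
# the volfile is fetched from server2 or server3 instead
server1:/gvol0  /mnt/gvol0  glusterfs  defaults,_netdev,backup-volfile-servers=server2:server3  0 0
```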
> Should be an RFE to project. > ? > > Can I run the Gluster services together with kadalu on a Kubernetes > cluster and provide storage to a second Kubernetes cluster without > local > storage? > > > Is the second storage cluster having access to nodes in this cluster? > (ie, reachability?) if yes, it works as an 'External' gluster setup > for that second cluster. But works. From sacchi at kadalu.io Fri Aug 21 14:36:27 2020 From: sacchi at kadalu.io (Sachidananda Urs) Date: Fri, 21 Aug 2020 20:06:27 +0530 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: On Fri, Aug 21, 2020 at 4:07 AM Gilberto Nunes wrote: > Hi Sachidananda! > I am trying to use the latest release of gstatus, but when I cut off one > of the nodes, I get timeout... > I tried to reproduce, but couldn't. How did you cut off the node? I killed all the gluster processes on one of the nodes and I see this. You can see one of the bricks is shown as offline. And nodes are 2/3. Can you please tell me the steps to reproduce the issue. root at master-node:/mnt/gluster/movies# gstatus -a Cluster: Status: Degraded GlusterFS: 9dev Nodes: 2/3 Volumes: 1/1 Volumes: snap-1 Replicate Started (PARTIAL) - 1/2 Bricks Up Capacity: (12.02% used) 5.00 GiB/40.00 GiB (used/total) Self-Heal: slave-1:/mnt/brick1/snapr1/r11 (7 File(s) to heal). Snapshots: 2 Name: snap_1_today_GMT-2020.08.15-15.39.10 Status: Started Created On: 2020-08-15 15:39:10 +0000 Name: snap_2_today_GMT-2020.08.15-15.39.20 Status: Stopped Created On: 2020-08-15 15:39:20 +0000 Bricks: Distribute Group 1: slave-1:/mnt/brick1/snapr1/r11 (Online) slave-2:/mnt/brick1/snapr2/r22 (Offline) Quota: Off Note: glusterd/glusterfsd is down in one or more nodes. Sizes might not be accurate. root at master-node:/mnt/gluster/movies# > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From archon810 at gmail.com Sat Aug 22 17:21:52 2020 From: archon810 at gmail.com (Artem Russakovskii) Date: Sat, 22 Aug 2020 10:21:52 -0700 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: The output currently has some whitespace issues. 1. The space shift under Cluster is different than under Volumes, making the output look a bit inconsistent. 2. Can you please fix the tabulation for when volume names are varying in length? This output is shifted and looks messy as a result for me. Cluster: Status: Healthy GlusterFS: 7.7 Nodes: 4/4 Volumes: 3/3 Volumes: XX2 Replicate Started (UP) - 4/4 Bricks Up Capacity: (54.03% used) 553.00 GiB/1024.00 GiB (used/total) XXXXXXXXXXXXX_data3 Replicate Started (UP) - 4/4 Bricks Up Capacity: (78.41% used) 392.00 GiB/500.00 GiB (used/total) XXXXXXXXX_data1 Replicate Started (UP) - 4/4 Bricks Up Capacity: (94.24% used) 9.00 TiB/10.00 TiB (used/total) Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | @ArtemR On Fri, Aug 21, 2020 at 7:36 AM Sachidananda Urs wrote: > > > On Fri, Aug 21, 2020 at 4:07 AM Gilberto Nunes > wrote: > >> Hi Sachidananda! >> I am trying to use the latest release of gstatus, but when I cut off one >> of the nodes, I get timeout... >> > > I tried to reproduce, but couldn't. How did you cut off the node? I killed > all the gluster processes on one of the nodes and I see this. > You can see one of the bricks is shown as offline. And nodes are 2/3. Can > you please tell me the steps to reproduce the issue. > > root at master-node:/mnt/gluster/movies# gstatus -a > > > Cluster: > > Status: Degraded GlusterFS: 9dev > > Nodes: 2/3 Volumes: 1/1 > > > Volumes: > > snap-1 Replicate Started (PARTIAL) - > 1/2 Bricks Up > > Capacity: (12.02% > used) 5.00 GiB/40.00 GiB (used/total) > > Self-Heal: > > slave-1:/mnt/brick1/snapr1/r11 > (7 File(s) to heal). 
> > Snapshots: 2 > > Name: > snap_1_today_GMT-2020.08.15-15.39.10 > > Status: Started > Created On: 2020-08-15 15:39:10 +0000 > > Name: > snap_2_today_GMT-2020.08.15-15.39.20 > > Status: Stopped > Created On: 2020-08-15 15:39:20 +0000 > > Bricks: > > Distribute Group > 1: > > slave-1:/mnt/brick1/snapr1/r11 > (Online) > > slave-2:/mnt/brick1/snapr2/r22 > (Offline) > > Quota: Off > > Note: > glusterd/glusterfsd is down in one or more nodes. > > Sizes might > not be accurate. > > > > root at master-node:/mnt/gluster/movies# > >> ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mabi at protonmail.ch Sat Aug 22 17:53:33 2020 From: mabi at protonmail.ch (mabi) Date: Sat, 22 Aug 2020 17:53:33 +0000 Subject: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected) Message-ID: Hello, I just started an upgrade of my 3 nodes replica (incl arbiter) of GlusterFS from 6.9 to 7.7 but unfortunately after upgrading the first node, that node gets rejected due to the following error: [2020-08-22 17:43:00.240990] E [MSGID: 106012] [glusterd-utils.c:3537:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvolume differ. local cksum = 3013120651, remote cksum = 0 on peer myfirstnode.domain.tld So glusterd process is running but not glusterfsd. I am exactly in the same issue as described here: https://www.gitmemory.com/Adam2Marsh But I do not see any solutions or workaround. So now I am stuck with a degraded GlusterFS cluster. Could someone please advise me as soon as possible on what I should do? Is there maybe any workarounds? Thank you very much in advance for your response. 
Best regards, Mabi From mabi at protonmail.ch Sun Aug 23 07:46:24 2020 From: mabi at protonmail.ch (mabi) Date: Sun, 23 Aug 2020 07:46:24 +0000 Subject: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected) In-Reply-To: References: Message-ID: Hello, So to be precise I am exactly having the following issue: https://github.com/gluster/glusterfs/issues/1332 I could not wait any longer to find some workarounds or quick fixes so I decided to downgrade my rejected from 7.7 back to 6.9 which worked. I would be really glad if someone could fix this issue or provide me a workaround which works because version 6 of GlusterFS is not supported anymore so I would really like to move on to the stable version 7. Thank you very much in advance. Best regards, Mabi ??????? Original Message ??????? On Saturday, August 22, 2020 7:53 PM, mabi wrote: > Hello, > > I just started an upgrade of my 3 nodes replica (incl arbiter) of GlusterFS from 6.9 to 7.7 but unfortunately after upgrading the first node, that node gets rejected due to the following error: > > [2020-08-22 17:43:00.240990] E [MSGID: 106012] [glusterd-utils.c:3537:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvolume differ. local cksum = 3013120651, remote cksum = 0 on peer myfirstnode.domain.tld > > So glusterd process is running but not glusterfsd. > > I am exactly in the same issue as described here: > > https://www.gitmemory.com/Adam2Marsh > > But I do not see any solutions or workaround. So now I am stuck with a degraded GlusterFS cluster. > > Could someone please advise me as soon as possible on what I should do? Is there maybe any workarounds? > > Thank you very much in advance for your response. 
> > Best regards, > Mabi From sacchi at kadalu.io Sun Aug 23 13:20:15 2020 From: sacchi at kadalu.io (Sachidananda Urs) Date: Sun, 23 Aug 2020 18:50:15 +0530 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: On Sat, Aug 22, 2020 at 10:52 PM Artem Russakovskii wrote: > The output currently has some whitespace issues. > > 1. The space shift under Cluster is different than under Volumes, making > the output look a bit inconsistent. > 2. Can you please fix the tabulation for when volume names are varying in > length? This output is shifted and looks messy as a result for me. > Artem, since the volume names vary for longer volumes the column number increases and users have to scroll to right. To overcome this, I have decided to print the volume name in a row by itself. PR: https://github.com/gluster/gstatus/pull/44 fixes the issue. The output looks like this: root at master-node:/home/sac/work/gstatus# gstatus Cluster: Status: Healthy GlusterFS: 9dev Nodes: 3/3 Volumes: 2/2 Volumes: snap-1 Replicate Started (UP) - 2/2 Bricks Up Capacity: (12.04% used) 5.00 GiB/40.00 GiB (used/total) Snapshots: 2 Quota: On very_very_long_long_name_to_test_the_gstatus_display Replicate Started (UP) - 2/2 Bricks Up Capacity: (12.04% used) 5.00 GiB/40.00 GiB (used/total) root at master-node:/home/sac/work/gstatus# -sac -------------- next part -------------- An HTML attachment was scrubbed... URL: From nladha at redhat.com Mon Aug 24 09:14:39 2020 From: nladha at redhat.com (Nikhil Ladha) Date: Mon, 24 Aug 2020 14:44:39 +0530 Subject: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected) Message-ID: Hello Mabi You don't need to follow the offline upgrade procedure. Please do follow the online upgrade procedure only. 
Upgrade the nodes one by one, you will notice the `Peer Rejected` state, after upgrading one node or so, but once all the nodes are upgraded it will be back to `Peer in Cluster(Connected)`. Also, if any of the shd's are not online you can try restarting that node to fix that. I have tried this on my own setup so I am pretty sure, it should work for you as well. This is the workaround for the time being so that you are able to upgrade, we are working on the issue to come up with a fix for it ASAP. And, yes if you face any issues even after upgrading all the nodes to 7.7, you will be able to downgrade in back to 6.9, which I think you have already tried and it works as per your previous mail. Regards Nikhil Ladha -------------- next part -------------- An HTML attachment was scrubbed... URL: From mabi at protonmail.ch Mon Aug 24 11:48:07 2020 From: mabi at protonmail.ch (mabi) Date: Mon, 24 Aug 2020 11:48:07 +0000 Subject: [Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected) In-Reply-To: References: Message-ID: Dear Nikhil, Thank you for your answer. So does this mean that all my FUSE clients where I have the volume mounted will not loose at any time their connection during the whole upgrade procedure of all 3 nodes? I am asking because if I understand correctly there will be an overlap of time where more than one node will not be running the glusterfsd (brick) process so this means that the quorum is lost and then my FUSE clients will loose connection to the volume? I just want to be sure that there will not be any downtime. Best regards, Mabi ??????? Original Message ??????? On Monday, August 24, 2020 11:14 AM, Nikhil Ladha wrote: > Hello Mabi > > You don't need to follow the offline upgrade procedure. Please do follow the online upgrade procedure only. Upgrade the nodes one by one, you will notice the `Peer Rejected` state, after upgrading one node or so, but once all the nodes are upgraded it will be back to `Peer in Cluster(Connected)`. 
Also, if any of the shd's are not online you can try restarting that node to fix that. I have tried this on my own setup so I am pretty sure, it should work for you as well. > This is the workaround for the time being so that you are able to upgrade, we are working on the issue to come up with a fix for it ASAP. > > And, yes if you face any issues even after upgrading all the nodes to 7.7, you will be able to downgrade in back to 6.9, which I think you have already tried and it works as per your previous mail. > > Regards > Nikhil Ladha -------------- next part -------------- An HTML attachment was scrubbed... URL: From diego.zuccato at unibo.it Mon Aug 24 13:23:03 2020 From: diego.zuccato at unibo.it (Diego Zuccato) Date: Mon, 24 Aug 2020 15:23:03 +0200 Subject: [Gluster-users] How to fix I/O error ? (resend) In-Reply-To: References: Message-ID: <4c279753-56d4-ae24-8baa-9d739757bd0a@unibo.it> Il 21/08/20 13:56, Diego Zuccato ha scritto: Hello again. I also tried disabling bitrot (and re-enabling it afterwards) and the procedure for recovery from split-brain[*] removing the file and its link from one of the nodes, but no luck. I'm now completely out of ideas :( How can I resync those gfids ? Tks! Diego [*] even if "gluster volume heal BigVol info split-brain" reports 0 for every brick. > Hello all. > > I have a volume setup as: > -8<-- > root at str957-biostor:~# gluster v info BigVol > > Volume Name: BigVol > Type: Distributed-Replicate > Volume ID: c51926bd-6715-46b2-8bb3-8c915ec47e28 > Status: Started > Snapshot Count: 0 > Number of Bricks: 28 x (2 + 1) = 84 > Transport-type: tcp > Bricks: > Brick1: str957-biostor2:/srv/bricks/00/BigVol > Brick2: str957-biostor:/srv/bricks/00/BigVol > Brick3: str957-biostq:/srv/arbiters/00/BigVol (arbiter) > [...] 
> Options Reconfigured: > cluster.granular-entry-heal: enable > client.event-threads: 8 > server.event-threads: 8 > server.ssl: on > client.ssl: on > nfs.disable: on > performance.readdir-ahead: on > transport.address-family: inet > features.bitrot: on > features.scrub: Active > features.scrub-freq: biweekly > auth.ssl-allow: str957-bio* > ssl.certificate-depth: 1 > cluster.self-heal-daemon: enable > features.quota: on > features.inode-quota: on > features.quota-deem-statfs: on > server.manage-gids: on > features.scrub-throttle: aggressive > -8<-- > > After a couple failures (a disk on biostor2 went "missing", and glusterd > on biostq got killed by OOM) I noticed that some files can't be accessed > from the clients: > -8<-- > $ ls -lh 1_germline_CGTACTAG_L005_R* > -rwxr-xr-x 1 e.f domain^users 2,0G apr 24 2015 > 1_germline_CGTACTAG_L005_R1_001.fastq.gz > -rwxr-xr-x 1 e.f domain^users 2,0G apr 24 2015 > 1_germline_CGTACTAG_L005_R2_001.fastq.gz > $ ls -lh 1_germline_CGTACTAG_L005_R1_001.fastq.gz > ls: cannot access '1_germline_CGTACTAG_L005_R1_001.fastq.gz': > Input/output error > -8<-- > (note that if I request ls for more files, it works...). > > The files have exactly the same contents (verified via md5sum). The only > difference is in getfattr: trusted.bit-rot.version is > 0x17000000000000005f3f9e670002ad5b on a node and > 0x12000000000000005f3ce7af000dccad on the other. 
> > On the client, the log reports: > -8<- > [2020-08-21 11:32:52.208809] W [MSGID: 108008] > [afr-self-heal-name.c:354:afr_selfheal_name_gfid_mismatch_check] > 4-BigVol-replicate-13: GFID mismatch for > /1_germline_CGTACTAG_L005_R1_001.fastq.gz > d70a4a6d-05fc-4988-8041-5e7f62155fe5 on BigVol-client-55 and > f249f88a-909f-489d-8d1d-d428e842ee96 on BigVol-client-34 > [2020-08-21 11:32:52.209768] W [fuse-bridge.c:471:fuse_entry_cbk] > 0-glusterfs-fuse: 233606: LOOKUP() > /[...]/1_germline_CGTACTAG_L005_R1_001.fastq.gz => -1 (Errore di > input/output) > -8<-- > > As suggested on IRC, I tested the RAM, but the only thing I got have > been a "Peer rejected" status due to another OOM kill. No problem, I've > been able to resolve it, but the original problem still remains. > > What else can I do? > > TIA! > > -- > Diego Zuccato > DIFA - Dip. di Fisica e Astronomia > Servizi Informatici > Alma Mater Studiorum - Universit? di Bologna > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy > tel.: +39 051 20 95786 > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Universit? di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 From dcunningham at voisonics.com Tue Aug 25 03:24:05 2020 From: dcunningham at voisonics.com (David Cunningham) Date: Tue, 25 Aug 2020 15:24:05 +1200 Subject: [Gluster-users] Geo-replication log file not closed Message-ID: Hello, We're having an issue with the rotated gsyncd.log not being released. 
Here's the output of 'lsof': # lsof | grep 'gsyncd.log.1' python2 4495 root 3w REG 8,1 991675023 4332241 /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted) python2 4495 4496 root 3w REG 8,1 991675023 4332241 /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted) python2 4495 4507 root 3w REG 8,1 991675023 4332241 /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted) python2 4508 root 3w REG 8,1 991675023 4332241 /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted) python2 4508 root 5w REG 8,1 991675023 4332241 /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted) python2 4508 4511 root 3w REG 8,1 991675023 4332241 /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted) ... etc... Those processes are: # ps -ef | egrep '4495|4508' root 4495 1 0 Aug10 ? 00:00:59 /usr/bin/python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py --path=/nodirectwritedata/gluster/gvol0 --monitor -c /var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf --iprefix=/var :gvol0 --glusterd-uuid=b7521445-ee93-4fed-8ced-6a609fa8c7d4 nvfs10::gvol0 root 4508 4495 0 Aug10 ? 
00:01:56 python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py agent gvol0 nvfs10::gvol0 --local-path /nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 9,12,11,10 And here's the relevant part of the /etc/logrotate.d/glusterfs-georep script: /var/log/glusterfs/geo-replication/*/*.log { sharedscripts rotate 52 missingok compress delaycompress notifempty postrotate for pid in `ps -aef | grep glusterfs | egrep "\-\-aux-gfid-mount" | awk '{print $2}'`; do /usr/bin/kill -HUP $pid > /dev/null 2>&1 || true done endscript } If I run the postrotate part manually: # ps -aef | grep glusterfs | egrep "\-\-aux-gfid-mount" | awk '{print $2}' 4520 # ps -aef | grep 4520 root 4520 1 0 Aug10 ? 01:24:23 /usr/sbin/glusterfs --aux-gfid-mount --acl --log-level=INFO --log-file=/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/mnt-nodirectwritedata-gluster-gvol0.log --volfile-server=localhost --volfile-id=gvol0 --client-pid=-1 /tmp/gsyncd-aux-mount-Tq_3sU Perhaps the problem is that the kill -HUP in the logrotate script doesn't act on the right process? If so, does anyone have a command to get the right PID? Thanks in advance for any help. -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Tue Aug 25 03:46:24 2020 From: dcunningham at voisonics.com (David Cunningham) Date: Tue, 25 Aug 2020 15:46:24 +1200 Subject: [Gluster-users] How safe are major version upgrades? Message-ID: Hello, We have a production system with around 50GB of data running GlusterFS 5.13. It has 3 replicating/mirrored nodes, and also geo-replicates to another site. How safe would it be to upgrade to a more recent major version, eg 7.x? 
I'm not sure how recommended in-place upgrades are, or if a complete re-install is necessary for safety. We have a maximum window of around 4 hours for this upgrade and would not want any significant risk of an unsuccessful upgrade at the end of that time. Is version 8.0 considered stable? Thanks in advance, -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From diego.zuccato at unibo.it Tue Aug 25 13:18:02 2020 From: diego.zuccato at unibo.it (Diego Zuccato) Date: Tue, 25 Aug 2020 15:18:02 +0200 Subject: [Gluster-users] How to fix I/O error ? (resend) In-Reply-To: <4c279753-56d4-ae24-8baa-9d739757bd0a@unibo.it> References: <4c279753-56d4-ae24-8baa-9d739757bd0a@unibo.it> Message-ID: <66c74779-a32c-9f12-59c3-7a1e76f41d3a@unibo.it> Il 24/08/20 15:23, Diego Zuccato ha scritto: > I'm now completely out of ideas :( Actually I have one last idea. My nodes are installed from standard Debian "stable" repos. That means they're version 3.8.8 ! I understand it's an ancient version. What's the recommended upgrade path to a current version? Possibly keeping the data safe: I have nowhere to move all those TBs to... -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Universit? di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 From amar at kadalu.io Tue Aug 25 13:27:13 2020 From: amar at kadalu.io (Amar Tumballi) Date: Tue, 25 Aug 2020 18:57:13 +0530 Subject: [Gluster-users] How to fix I/O error ? (resend) In-Reply-To: <66c74779-a32c-9f12-59c3-7a1e76f41d3a@unibo.it> References: <4c279753-56d4-ae24-8baa-9d739757bd0a@unibo.it> <66c74779-a32c-9f12-59c3-7a1e76f41d3a@unibo.it> Message-ID: On Tue, Aug 25, 2020 at 6:48 PM Diego Zuccato wrote: > Il 24/08/20 15:23, Diego Zuccato ha scritto: > > > I'm now completely out of ideas :( > Actually I have one last idea. 
My nodes are installed from standard > Debian "stable" repos. That means they're version 3.8.8! > I understand it's an ancient version. > What's the recommended upgrade path to a current version? Possibly > keeping the data safe: I have nowhere to move all those TBs to... > I am not aware of any data layout changes we did between current latest (7.7) and 3.8.8. But due to some issues, 'online' migration is not possible; even the clients need to be updated, so you have to umount the volume once. Regards, Amar > -- > Diego Zuccato > DIFA - Dip. di Fisica e Astronomia > Servizi Informatici > Alma Mater Studiorum - Università di Bologna > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy > tel.: +39 051 20 95786 > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -- -- https://kadalu.io Container Storage made easy! -------------- next part -------------- An HTML attachment was scrubbed... URL: From jordielau at outlook.com Tue Aug 25 23:57:19 2020 From: jordielau at outlook.com (liu zhijing) Date: Tue, 25 Aug 2020 23:57:19 +0000 Subject: [Gluster-users] upgrade gluster from old version Message-ID: Hi everyone! I found it is hard to upgrade from GlusterFS 3.7.6 to 7.7, so I removed all the installed packages and deleted all the config files except the bricks, and reinstalled GlusterFS 7.7, then reconfigured a volume with the same name but a temporary brick. After everything was OK, I stopped the volume, replaced the old brick, and changed the brick's volume ID to the new volume ID; at last the volume started successfully. What I want to know is whether this is the right way to upgrade. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From diego.zuccato at unibo.it Wed Aug 26 06:43:04 2020 From: diego.zuccato at unibo.it (Diego Zuccato) Date: Wed, 26 Aug 2020 08:43:04 +0200 Subject: [Gluster-users] How to fix I/O error ? (resend) In-Reply-To: References: <4c279753-56d4-ae24-8baa-9d739757bd0a@unibo.it> <66c74779-a32c-9f12-59c3-7a1e76f41d3a@unibo.it> Message-ID: <20656535-6048-5f36-fed5-75553a53dd3f@unibo.it> Il 25/08/20 15:27, Amar Tumballi ha scritto: > I am not aware of any data layout changes we did between current?latest > (7.7) and 3.8.8. But due to some issues, 'online' migration is not > possible, even the clients needs to be updated, so you have to umount > the volume once. Tks for the info. Actually the issue is less bad than I thought: I checked on a client that (somehow) still used Debian oldstable. Current stable uses 5.5, still old but not prehistoric :) Too bad the original issue still persists, even after removing the file and its hardlink from .gluster dir :( Maybe the upgrade can fix it? Or I risk breaking it even more? -- Diego Zuccato DIFA - Dip. di Fisica e Astronomia Servizi Informatici Alma Mater Studiorum - Universit? di Bologna V.le Berti-Pichat 6/2 - 40127 Bologna - Italy tel.: +39 051 20 95786 From srakonde at redhat.com Wed Aug 26 10:04:52 2020 From: srakonde at redhat.com (Sanju Rakonde) Date: Wed, 26 Aug 2020 15:34:52 +0530 Subject: [Gluster-users] upgrade gluster from old version In-Reply-To: References: Message-ID: Hi, I believe you can do an offline upgrade (I have never tried upgrading from 3.7 to 7.7, so there might be issues). If you want to do a fresh install, after installing the 7.7 packages, you can use the same old bricks to create the volumes. but you need to add force at the end of volume create command. On Wed, Aug 26, 2020 at 5:27 AM liu zhijing wrote: > hi everyone! 
> I found it is hard to upgrade from a gluster version from 3.7.6 to 7.7, so > I removed all the installed packages and deleted all the config files except the > bricks, and reinstalled the glusterfs 7.7, then reconfigured the same volume > name but used a temp brick. After everything was ok, I stopped the volume > and replaced the old brick, changed the brick volume id to the new volume id, > and at last started the volume successfully. > What I want to know is whether this is the right way to upgrade. > > > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -- Thanks, Sanju -------------- next part -------------- An HTML attachment was scrubbed... URL: From ramon.selga at gmail.com Wed Aug 26 11:11:31 2020 From: ramon.selga at gmail.com (Ramon Selga) Date: Wed, 26 Aug 2020 13:11:31 +0200 Subject: [Gluster-users] upgrade gluster from old version In-Reply-To: References: Message-ID: <605e0d08-a3bb-2696-2e6e-04843dc47bc8@gmail.com> Tested several times recently: upgrade 3.12.15 to 7.7 without problems. Upgrade servers and clients first to 3.12.15 (an old version; take a look at the repo site). If the volumes are replicated you can do it online, one by one, watching the self-heal process carefully. For disperse volumes you must stop them before upgrading. Hope it helps! Ramon On 26/08/20 at 12:04, Sanju Rakonde wrote: > Hi, > > I believe you can do an offline upgrade (I have never tried upgrading from 3.7 > to 7.7, so there might be issues). > > If you want to do a fresh install, after installing the 7.7 packages, you can > use the same old bricks to create the volumes, but you need to add force at > the end of the volume create command. > > On Wed, Aug 26, 2020 at 5:27 AM liu zhijing > wrote: > > hi everyone! > I found it is hard to upgrade
from a gluster version from 3.7.6 to 7.7,so > I remove all the install package and delete all config file except the > brick, and I resinstall the glusterfs 7.7, then reconfig the same volume > name but use a temp brick.after everything is ok ,then i stop the volume > and replace the old brick , change the brick volume id as the new volume > id , at last ?start the volume successfully. > What I want to know is ?if this is the right way to upgrade . > > > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Thanks, > Sanju > > > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From revirii at googlemail.com Wed Aug 26 13:39:04 2020 From: revirii at googlemail.com (Hu Bert) Date: Wed, 26 Aug 2020 15:39:04 +0200 Subject: [Gluster-users] How safe are major version upgrades? In-Reply-To: References: Message-ID: Hi, we have 2 replicate-3 systems, and i upgraded both online from 5.12 to 6.8 and then to 7.6. No (big) problems here, upgrade took between 10 to 20 minutes (wait until healing is done) - but no geo replication, so i can't say anything about that part. Best regards, Hubert Am Di., 25. Aug. 2020 um 05:47 Uhr schrieb David Cunningham : > > Hello, > > We have a production system with around 50GB of data running GlusterFS 5.13. It has 3 replicating/mirrored nodes, and also geo-replicates to another site. > > How safe would it be to upgrade to a more recent major version, eg 7.x? 
I'm not sure how recommended in-place upgrades are, or if a complete re-install is necessary for safety. > > We have a maximum window of around 4 hours for this upgrade and would not want any significant risk of an unsuccessful upgrade at the end of that time. > > Is version 8.0 considered stable? > > Thanks in advance, > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From revirii at googlemail.com Wed Aug 26 13:43:22 2020 From: revirii at googlemail.com (Hu Bert) Date: Wed, 26 Aug 2020 15:43:22 +0200 Subject: [Gluster-users] upgrade gluster from old version In-Reply-To: References: Message-ID: Hi, i'd check the release logs of every x.0-version and the upgrade guide if there are any params that are not supported anymore. If you have one of these params set, you need to disable them, e.g.: https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_6/ Best regards, Hubert Am Mi., 26. Aug. 2020 um 12:11 Uhr schrieb Sanju Rakonde : > > Hi, > > I believe you can do an offline upgrade (I have never tried upgrading from 3.7 to 7.7, so there might be issues). > > If you want to do a fresh install, after installing the 7.7 packages, you can use the same old bricks to create the volumes. but you need to add force at the end of volume create command. > > On Wed, Aug 26, 2020 at 5:27 AM liu zhijing wrote: >> >> hi everyone! 
>> I found It is hard to upgrade from a gluster version from 3.7.6 to 7.7,so I remove all the install package and delete all config file except the brick, and I resinstall the glusterfs 7.7, then reconfig the same volume name but use a temp brick.after everything is ok ,then i stop the volume and replace the old brick , change the brick volume id as the new volume id , at last start the volume successfully. >> What I want to know is if this is the right way to upgrade . >> >> >> ________ >> >> >> >> Community Meeting Calendar: >> >> Schedule - >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> Bridge: https://bluejeans.com/441850968 >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Thanks, > Sanju > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From dcunningham at voisonics.com Wed Aug 26 22:20:15 2020 From: dcunningham at voisonics.com (David Cunningham) Date: Thu, 27 Aug 2020 10:20:15 +1200 Subject: [Gluster-users] How safe are major version upgrades? In-Reply-To: References: Message-ID: Thank you for that Hubert. Does anyone know if an in-place upgrade between major versions is officially okay or not? On Thu, 27 Aug 2020 at 01:39, Hu Bert wrote: > Hi, > we have 2 replicate-3 systems, and i upgraded both online from 5.12 to > 6.8 and then to 7.6. No (big) problems here, upgrade took between 10 > to 20 minutes (wait until healing is done) - but no geo replication, > so i can't say anything about that part. > > Best regards, > Hubert > > Am Di., 25. Aug. 2020 um 05:47 Uhr schrieb David Cunningham > : > > > > Hello, > > > > We have a production system with around 50GB of data running GlusterFS > 5.13. 
It has 3 replicating/mirrored nodes, and also geo-replicates to > another site. > > > > How safe would it be to upgrade to a more recent major version, eg 7.x? > I'm not sure how recommended in-place upgrades are, or if a complete > re-install is necessary for safety. > > > > We have a maximum window of around 4 hours for this upgrade and would > not want any significant risk of an unsuccessful upgrade at the end of that > time. > > > > Is version 8.0 considered stable? > > > > Thanks in advance, > > > > -- > > David Cunningham, Voisonics Limited > > http://voisonics.com/ > > USA: +1 213 221 1092 > > New Zealand: +64 (0)28 2558 3782 > > ________ > > > > > > > > Community Meeting Calendar: > > > > Schedule - > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > > Bridge: https://bluejeans.com/441850968 > > > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From wkmail at bneit.com Wed Aug 26 22:45:44 2020 From: wkmail at bneit.com (WK) Date: Wed, 26 Aug 2020 15:45:44 -0700 Subject: [Gluster-users] set: failed: Quorum not met. Volume operation not allowed. Message-ID: <4498aa16-4929-ea1e-f017-88cde1baff12@bneit.com> So we migrated a number of VMs from a small Gluster 2+1A volume to a newer cluster. Then a few days later the client said he wanted an old forgotten file that had been left behind on the deprecated system. However the arbiter and one of the brick nodes had been scrapped, leaving only a single gluster node. The volume I need uses shards so I am not excited about having to piece it back together.
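(For reference, piecing one sharded file back together by hand is tractable when the surviving brick is intact: the shard translator keeps block 0 at the file's normal path on the brick and blocks N >= 1 under .shard/<GFID>.N, and the GFID can be read with getfattr -n trusted.gfid -e hex on the brick path. A rough, unsupported sketch — the function name, paths, GFID and shard size below are all illustrative, and the block size must match whatever features.shard-block-size was on the volume, 64MB by default:)

```shell
# Unsupported sketch: rebuild a sharded file straight from one brick.
# Block 0 lives at the file's normal brick path; blocks N >= 1 live as
# .shard/<GFID>.N, so each piece is copied to byte offset N*bs.
reassemble_sharded_file() {
    brick=$1 file=$2 gfid=$3 out=$4 bs=$5   # bs = features.shard-block-size
    dd if="$brick/$file" of="$out" bs="$bs" conv=notrunc 2>/dev/null
    for shard in "$brick/.shard/$gfid".*; do
        [ -e "$shard" ] || continue          # glob matched nothing
        n=${shard##*.}                       # shard index -> offset n*bs
        dd if="$shard" of="$out" bs="$bs" seek="$n" conv=notrunc 2>/dev/null
    done
}

# Hypothetical invocation (brick path, file name and GFID are placeholders;
# 67108864 is the 64MB default shard block size):
# reassemble_sharded_file /gluster/brick1 images/vm1.img \
#     6b2b0c2f-1234-5678-9abc-def012345678 /tmp/restored.img 67108864
```

Writing each shard at seek=N with dd, rather than simply concatenating them, keeps the result correct even when some shard files are absent because they were holes in a sparse VM image.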
I powered up the single node and tried to mount the volume, and of course it refused to mount due to quorum, and gluster volume status shows the volume offline. In the past I had worked around this issue by disabling quorum, but that was years ago, so I googled it and found list messages suggesting the following: gluster volume set VOL cluster.quorum-type none gluster volume set VOL cluster.server-quorum-type none However, the gluster 6.9 system refuses to accept those set commands due to the quorum and spits out the set failed error. So in modern Gluster, what is the preferred method for starting and mounting a single node/volume that was once part of an actual 3 node cluster? Thanks. -wk -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.cameron at camerontech.com Wed Aug 26 22:58:29 2020 From: thomas.cameron at camerontech.com (Thomas Cameron) Date: Wed, 26 Aug 2020 17:58:29 -0500 Subject: [Gluster-users] How safe are major version upgrades? In-Reply-To: References: Message-ID: <5c797ef8-b77c-c717-8edf-c7196054fce1@camerontech.com> On 8/26/2020 5:20 PM, David Cunningham wrote: > Thank you for that Hubert. > > Does anyone know if an in-place upgrade between major versions is > officially okay or not? As a tangential question, what versions are safe to move from and to? I've inherited an ooooold CentOS 7 machine running an ancient version of Gluster (I want to say 3.x) from EPEL. Can I move from that to 7 directly? Or do I need to go from 3 to 4 to 5 to 6 to 7? Thomas From dcunningham at voisonics.com Thu Aug 27 03:59:59 2020 From: dcunningham at voisonics.com (David Cunningham) Date: Thu, 27 Aug 2020 15:59:59 +1200 Subject: [Gluster-users] Geo-replication force active server Message-ID: Hello, We have geo-replication, and one of the nodes on the primary side (in mirror replication with the other primary nodes), is acting a little badly.
These have been mentioned in other emails to the list about high CPU usage and not closing log files. At the moment it's hard to tell whether the problems are because this is the active geo-replication push node, or if it's something else to do with the server it's running on. How can we force a particular node to be the active geo-replication push node? If we can make a different node the push and the problems move too, then we know it's geo-replication and not the server that's the problem. Thanks in advance, -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ksubrahm at redhat.com Thu Aug 27 04:28:35 2020 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Thu, 27 Aug 2020 09:58:35 +0530 Subject: [Gluster-users] set: failed: Quorum not met. Volume operation not allowed. In-Reply-To: <4498aa16-4929-ea1e-f017-88cde1baff12@bneit.com> References: <4498aa16-4929-ea1e-f017-88cde1baff12@bneit.com> Message-ID: Hi, Since your two nodes are scrapped and there is no chance that they will come back in later time, you can try reducing the replica count to 1 by removing the down bricks from the volume and then mounting the volume back to access the data which is available on the only up brick. The remove brick command looks like this: gluster volume remove-brick VOLNAME replica 1 :/brick-path :/brick-path force Regards, Karthik On Thu, Aug 27, 2020 at 4:24 AM WK wrote: > > So we migrated a number of VMs from a small Gluster 2+1A volume to a newer cluster. > > Then a few days later the client said he wanted an old forgotten file that had been left behind on the the deprecated system. > > However the arbiter and one of the brick nodes had been scraped, leaving only a single gluster node. > > The volume I need uses shards so I am not excited about having to piece it back together. 
> I powered it up the single node and tried to mount the volume and of course it refused to mount due to quorum and gluster volume status shows the volume offline > > In the past I had worked around this issue by disabling quorum, but that was years ago, so I googled it and found list messages suggesting the following: > > gluster volume set VOL cluster.quorum-type none > gluster volume set VOL cluster.server-quorum-type none > > However, the gluster 6.9 system refuses to accept those set commands due to the quorum and spits out the set failed error. > > So in modern Gluster, what is the preferred method for starting and mounting a single node/volume that was once part of an actual 3 node cluster? > > Thanks. > > -wk > > > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From phaley at mit.edu Thu Aug 27 14:32:16 2020 From: phaley at mit.edu (Pat Haley) Date: Thu, 27 Aug 2020 10:32:16 -0400 Subject: [Gluster-users] gluster volume rebalance making things more unbalanced Message-ID: Hi, We have a distributed gluster volume spread across 4 bricks. Yesterday I noticed that the remaining space was uneven (about 2.7TB, 1.7TB, 1TB, 1TB) so I issued the following rebalance command * gluster volume rebalance start force Today I see that instead, things have gotten even more unbalanced (64G 853G 6.2T 20K). I'm killing the rebalance now. What should I do to make sure that I get a successful rebalance? Thanks Pat -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Pat Haley Email: phaley at mit.edu Center for Ocean Engineering Phone: (617) 253-6824 Dept.
of Mechanical Engineering Fax: (617) 253-8125 MIT, Room 5-213 http://web.mit.edu/phaley/www/ 77 Massachusetts Avenue Cambridge, MA 02139-4301 -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Thu Aug 27 14:43:40 2020 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Thu, 27 Aug 2020 14:43:40 +0000 (UTC) Subject: [Gluster-users] gluster volume rebalance making things more unbalanced In-Reply-To: References: Message-ID: <1666903524.6222266.1598539420219@mail.yahoo.com> Sadly I have no idea why rebalance did that, so you should check the logs on all nodes for clues. Is there any reason why you used "force" in that command? Best Regards, Strahil Nikolov On Thursday, 27 August 2020, 17:32:24 GMT+3, Pat Haley wrote: Hi, We have distributed gluster volume spread across 4 bricks. Yesterday I noticed that the remaining space was uneven (about 2.7TB, 1.7TB, 1TB, 1TB) so I issued the following rebalance command * gluster volume rebalance start force Today I see that instead, things have gotten even more unbalanced (64G 853G 6.2T 20K). I'm killing the rebalance now. What should I do to make sure that I get a successful rebalance? Thanks Pat -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Pat Haley Email: phaley at mit.edu Center for Ocean Engineering Phone: (617) 253-6824 Dept.
of Mechanical Engineering Fax: (617) 253-8125 MIT, Room 5-213 http://web.mit.edu/phaley/www/ 77 Massachusetts Avenue Cambridge, MA 02139-4301 ________ Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users From phaley at mit.edu Thu Aug 27 14:46:58 2020 From: phaley at mit.edu (Pat Haley) Date: Thu, 27 Aug 2020 10:46:58 -0400 Subject: [Gluster-users] gluster volume rebalance making things more unbalanced In-Reply-To: <1666903524.6222266.1598539420219@mail.yahoo.com> References: <1666903524.6222266.1598539420219@mail.yahoo.com> Message-ID: <0fb9460c-e22e-b64a-edf3-9dbaa8527d00@mit.edu> Hi Strahil The documentation I looked at: * https://docs.google.com/document/d/18iGX6I7I0yHUZ1zAfLIEnXDRPkIoMh6CHmwyyb6J4GM/edit#heading=h.oogvisuwd2qd suggested that not using force might leave some links behind that could affect performance. Thanks Pat On 8/27/20 10:43 AM, Strahil Nikolov wrote: > Sadly I have no idea why rebalance did that, so you should check the logs on all nodes for clues. > > Is there any reason why you used "force" in that command? > > > Best Regards, > Strahil Nikolov > > > > > > > On Thursday, 27 August 2020, 17:32:24 GMT+3, Pat Haley wrote: > > > > > > > > > > > Hi, > > We have distributed gluster volume spread across 4 bricks. Yesterday I noticed that the remaining space was uneven (about 2.7TB, 1.7TB, 1TB, 1TB) so I issued the following rebalance command > > > * gluster volume rebalance start force > > > Today I see that instead, things have gotten even more unbalanced (64G 853G 6.2T 20K). I'm killing the rebalance now. What should I do to make sure that I get a successful rebalance?
> > Thanks > > Pat -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Pat Haley Email: phaley at mit.edu Center for Ocean Engineering Phone: (617) 253-6824 Dept. of Mechanical Engineering Fax: (617) 253-8125 MIT, Room 5-213 http://web.mit.edu/phaley/www/ 77 Massachusetts Avenue Cambridge, MA 02139-4301 -------------- next part -------------- An HTML attachment was scrubbed... URL: From joe at julianfamily.org Thu Aug 27 14:53:37 2020 From: joe at julianfamily.org (Joe Julian) Date: Thu, 27 Aug 2020 07:53:37 -0700 Subject: [Gluster-users] gluster volume rebalance making things more unbalanced In-Reply-To: References: Message-ID: <36D22279-C136-404B-94D0-0F5ACC10FD87@julianfamily.org> When a file should be moved based on its dht hash mapping but the target that it should be moved to has less free space than the origin, the rebalance command does not move the file and leaves the dht pointer in place. When you use "force", you override that behavior and always move each file regardless of free space. In theory, eventually when the rebalance is finished you should end up with utilization mostly balanced but as the rebalance is processing you may end up in the state you show. On August 27, 2020 7:32:16 AM PDT, Pat Haley wrote: > >Hi, > >We have distributed gluster volume spread across 4 bricks. Yesterday I >noticed that the remaining space was uneven (about 2.7TB, 1.7TB, 1TB, >1TB) so I issued the following rebalance command > > * |gluster volume rebalance start force| > >Today I see that instead, things have gotten even more unbalanced (64G >853G 6.2T 20K).? I'm killing the rebalance now.? What should I do to >make sure that I get a successful rebalance? > >Thanks > >Pat|| > >-- > >-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- >Pat Haley Email: phaley at mit.edu >Center for Ocean Engineering Phone: (617) 253-6824 >Dept. 
of Mechanical Engineering Fax: (617) 253-8125 >MIT, Room 5-213 http://web.mit.edu/phaley/www/ >77 Massachusetts Avenue >Cambridge, MA 02139-4301 -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkothiya at redhat.com Thu Aug 27 17:32:32 2020 From: rkothiya at redhat.com (Rinku Kothiya) Date: Thu, 27 Aug 2020 23:02:32 +0530 Subject: [Gluster-users] [Gluster-devel] Announcing Gluster release 8.1 Message-ID: Hi, The Gluster community is pleased to announce the release of Gluster 8.1 (packages available at [1]). Release notes for the release can be found at [2]. Major changes, features, improvements and limitations addressed in this release: - Performance improvement in the creation of large files (VM disks in oVirt) by bringing down trivial lookups of non-existent shards. Issue (#1425) - Fsync in the replication module uses eager-lock functionality, which improves the performance of VM workloads by more than 50% for small blocks of approximately 4KB with write-heavy workloads. Issue (#1253) Thanks, Gluster community References: [1] Packages for 8.1: https://download.gluster.org/pub/gluster/glusterfs/8/8.1/ [2] Release notes for 8.1: https://docs.gluster.org/en/latest/release-notes/8.1/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gilberto.nunes32 at gmail.com Thu Aug 27 18:35:45 2020 From: gilberto.nunes32 at gmail.com (Gilberto Nunes) Date: Thu, 27 Aug 2020 15:35:45 -0300 Subject: [Gluster-users] Eager lock Message-ID: Hi there, I wonder if eager lock for a 2-node gluster brings some improvement, especially in this new gluster 8.1... Are there any pros? Thanks --- Gilberto Nunes Ferreira -------------- next part -------------- An HTML attachment was scrubbed...
URL: From wkmail at bneit.com Thu Aug 27 19:47:56 2020 From: wkmail at bneit.com (WK) Date: Thu, 27 Aug 2020 12:47:56 -0700 Subject: [Gluster-users] set: failed: Quorum not met. Volume operation not allowed. In-Reply-To: References: <4498aa16-4929-ea1e-f017-88cde1baff12@bneit.com> Message-ID: No Luck.? Same problem. I stopped the volume. I ran the remove-brick command. It warned about not being able to migrate files from removed bricks and asked if I want to continue. when I say 'yes' Gluster responds with 'failed: Quorum not met Volume operation not allowed' -wk On 8/26/2020 9:28 PM, Karthik Subrahmanya wrote: > Hi, > > Since your two nodes are scrapped and there is no chance that they > will come back in later time, you can try reducing the replica count > to 1 by removing the down bricks from the volume and then mounting the > volume back to access the data which is available on the only up > brick. > The remove brick command looks like this: > > gluster volume remove-brick VOLNAME replica 1 > :/brick-path > :/brick-path force > > Regards, > Karthik > > > On Thu, Aug 27, 2020 at 4:24 AM WK wrote: >> So we migrated a number of VMs from a small Gluster 2+1A volume to a newer cluster. >> >> Then a few days later the client said he wanted an old forgotten file that had been left behind on the the deprecated system. >> >> However the arbiter and one of the brick nodes had been scraped, leaving only a single gluster node. >> >> The volume I need uses shards so I am not excited about having to piece it back together. 
>> I powered it up the single node and tried to mount the volume and of >> course it refused to mount due to quorum and gluster volume status >> shows the volume offline >> >> In the past I had worked around this issue by disabling quorum, but >> that was years ago, so I googled it and found list messages >> suggesting the following: >> >> gluster volume set VOL cluster.quorum-type none >> gluster volume set VOL cluster.server-quorum-type none >> >> However, the gluster 6.9 system refuses to accept those set commands >> due to the quorum and spits out the set failed error. >> >> So in modern Gluster, what is the preferred method for starting and >> mounting a single node/volume that was once part of an actual 3 node >> cluster? >> >> Thanks. >> >> -wk >> >> >> ________ >> >> >> >> Community Meeting Calendar: >> >> Schedule - >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >> Bridge: https://bluejeans.com/441850968 >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users From wkmail at bneit.com Thu Aug 27 21:07:14 2020 From: wkmail at bneit.com (WK) Date: Thu, 27 Aug 2020 14:07:14 -0700 Subject: [Gluster-users] set: failed: Quorum not met. Volume operation not allowed. SUCCESS In-Reply-To: References: <4498aa16-4929-ea1e-f017-88cde1baff12@bneit.com> Message-ID: So success! I don't know why, but when I set "server-quorum-type" to none FIRST it seemed to work without complaining about quorum. Then quorum-type was able to be set to none as well: gluster volume set VOL cluster.server-quorum-type none gluster volume set VOL cluster.quorum-type none Finally I used Karthik's remove-brick command and it worked this time and I am now copying off the needed image. So I guess order counts. Thanks. -wk On 8/27/2020 12:47 PM, WK wrote: > No Luck. Same problem. > > I stopped the volume. > > I ran the remove-brick command. It warned about not being able to > migrate files from removed bricks and asked if I want to continue.
> > when I say 'yes' > > Gluster responds with 'failed: Quorum not met Volume operation not > allowed' > > > -wk > > On 8/26/2020 9:28 PM, Karthik Subrahmanya wrote: >> Hi, >> >> Since your two nodes are scrapped and there is no chance that they >> will come back in later time, you can try reducing the replica count >> to 1 by removing the down bricks from the volume and then mounting the >> volume back to access the data which is available on the only up >> brick. >> The remove brick command looks like this: >> >> gluster volume remove-brick VOLNAME replica 1 >> :/brick-path >> :/brick-path force >> >> Regards, >> Karthik >> >> >> On Thu, Aug 27, 2020 at 4:24 AM WK wrote: >>> So we migrated a number of VMs from a small Gluster 2+1A volume to a >>> newer cluster. >>> >>> Then a few days later the client said he wanted an old forgotten >>> file that had been left behind on the the deprecated system. >>> >>> However the arbiter and one of the brick nodes had been scraped, >>> leaving only a single gluster node. >>> >>> The volume I need uses shards so I am not excited about having to >>> piece it back together. >>> >>> I powered it up the single node and tried to mount the volume and of >>> course it refused to mount due to quorum and gluster volume status >>> shows the volume offline >>> >>> In the past I had worked around this issue by disabling quorum, but >>> that was years ago, so I googled it and found list messages >>> suggesting the following: >>> >>> ? gluster volume set VOL cluster.quorum-type none >>> ? gluster volume set VOL cluster.server-quorum-type none >>> >>> However, the gluster 6.9 system refuses to accept those set commands >>> due to the quorum and spits out the set failed error. >>> >>> So in modern Gluster, what is the preferred method for starting and >>> mounting a? single node/volume that was once part of a actual 3 node >>> cluster? >>> >>> Thanks. 
>>> >>> -wk >>> >>> >>> ________ >>> >>> >>> >>> Community Meeting Calendar: >>> >>> Schedule - >>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC >>> Bridge: https://bluejeans.com/441850968 >>> >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users > ________ > > > > Community Meeting Calendar: > > Schedule - > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC > Bridge: https://bluejeans.com/441850968 > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From ksubrahm at redhat.com Fri Aug 28 04:15:36 2020 From: ksubrahm at redhat.com (Karthik Subrahmanya) Date: Fri, 28 Aug 2020 09:45:36 +0530 Subject: [Gluster-users] set: failed: Quorum not met. Volume operation not allowed. SUCCESS In-Reply-To: References: <4498aa16-4929-ea1e-f017-88cde1baff12@bneit.com> Message-ID: Hi, You had server-quorum enabled which could be the cause of the errors you were getting at the first place. In latest releases only client-quorum is enabled and the server-quorum is disabled by default. Yes, the order matters in such cases. Regards, Karthik On Fri, Aug 28, 2020 at 2:37 AM WK wrote: > > So success! > > I dont know why but when I set "server-quorum-type" to none FIRST it > seemed to work without complaining about quorum. > > then quorum-type was able to be set to none as well > > gluster volume set VOL cluster.server-quorum-type none > gluster volume set VOL cluster.quorum-type none > > Finally I used Karthik's remove-brick command and it worked this time > and I am now copying off the needed image. > > So I guess order counts. > > Thanks. > > -wk > > > > On 8/27/2020 12:47 PM, WK wrote: > > No Luck. Same problem. > > > > I stopped the volume. > > > > I ran the remove-brick command. It warned about not being able to > > migrate files from removed bricks and asked if I want to continue. 
> > > > when I say 'yes' > > > > Gluster responds with 'failed: Quorum not met Volume operation not > > allowed' > > > > > > -wk > > > > On 8/26/2020 9:28 PM, Karthik Subrahmanya wrote: > >> Hi, > >> > >> Since your two nodes are scrapped and there is no chance that they > >> will come back in later time, you can try reducing the replica count > >> to 1 by removing the down bricks from the volume and then mounting the > >> volume back to access the data which is available on the only up > >> brick. > >> The remove brick command looks like this: > >> > >> gluster volume remove-brick VOLNAME replica 1 > >> :/brick-path > >> :/brick-path force > >> > >> Regards, > >> Karthik > >> > >> > >> On Thu, Aug 27, 2020 at 4:24 AM WK wrote: > >>> So we migrated a number of VMs from a small Gluster 2+1A volume to a > >>> newer cluster. > >>> > >>> Then a few days later the client said he wanted an old forgotten > >>> file that had been left behind on the the deprecated system. > >>> > >>> However the arbiter and one of the brick nodes had been scraped, > >>> leaving only a single gluster node. > >>> > >>> The volume I need uses shards so I am not excited about having to > >>> piece it back together. > >>> > >>> I powered it up the single node and tried to mount the volume and of > >>> course it refused to mount due to quorum and gluster volume status > >>> shows the volume offline > >>> > >>> In the past I had worked around this issue by disabling quorum, but > >>> that was years ago, so I googled it and found list messages > >>> suggesting the following: > >>> > >>> gluster volume set VOL cluster.quorum-type none > >>> gluster volume set VOL cluster.server-quorum-type none > >>> > >>> However, the gluster 6.9 system refuses to accept those set commands > >>> due to the quorum and spits out the set failed error. > >>> > >>> So in modern Gluster, what is the preferred method for starting and > >>> mounting a single node/volume that was once part of a actual 3 node > >>> cluster? 
> >>>
> >>> Thanks.
> >>>
> >>> -wk
> >>>
> >>>
> >>> ________
> >>>
> >>>
> >>>
> >>> Community Meeting Calendar:
> >>>
> >>> Schedule -
> >>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> >>> Bridge: https://bluejeans.com/441850968
> >>>
> >>> Gluster-users mailing list
> >>> Gluster-users at gluster.org
> >>> https://lists.gluster.org/mailman/listinfo/gluster-users
> > ________
> >
> >
> >
> > Community Meeting Calendar:
> >
> > Schedule -
> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > Bridge: https://bluejeans.com/441850968
> >
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
>

From diego.zuccato at unibo.it  Fri Aug 28 06:34:51 2020
From: diego.zuccato at unibo.it (Diego Zuccato)
Date: Fri, 28 Aug 2020 08:34:51 +0200
Subject: [Gluster-users] Node sizing
Message-ID: <8f52420e-d601-91b4-ee7b-42570b8d12b4@unibo.it>

Hello all.
I just noticed that rebuilding arbiter bricks is using lots of CPU and RAM. I thought it was quite a lightweight op, so I installed the arbiter node in a VM, but 8 CPUs and 16GB RAM are maxed out (and a bit of swap gets used, too).
The volume is 28*(2+1) 10TB bricks. Gluster v5.5.
Is there some rule of thumb for sizing nodes? I couldn't find anything...
TIA.

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

From felix.koelzow at gmx.de  Fri Aug 28 08:31:08 2020
From: felix.koelzow at gmx.de (=?UTF-8?Q?Felix_K=c3=b6lzow?=)
Date: Fri, 28 Aug 2020 10:31:08 +0200
Subject: [Gluster-users] How to fix I/O error ?
(resend)
In-Reply-To: <20656535-6048-5f36-fed5-75553a53dd3f@unibo.it>
References: <4c279753-56d4-ae24-8baa-9d739757bd0a@unibo.it>
 <66c74779-a32c-9f12-59c3-7a1e76f41d3a@unibo.it>
 <20656535-6048-5f36-fed5-75553a53dd3f@unibo.it>
Message-ID: 

Dear Diego,

I faced a similar issue on gluster 6.0 and I was able to resolve it (at least in my case).

Observation: I faced a directory where a simple ls leads to input/output error. I cd into the corresponding directory on the brick and I did an ls command and it works. I got a list of all the file names:

# ls -1v * > /tmp/mylist

Afterwards, I cd into the directory of interest on the MOUNTPOINT and I removed all the files which are obviously hidden due to the input/output error:

# while read item
# do
#     rm -rf "$item"
# done < /tmp/mylist

That's it. Afterwards, I copied the deleted files back from our backup.

Please give me a hint if this procedure also works for you.

Regards,
Felix

On 26/08/2020 08:43, Diego Zuccato wrote:
> Il 25/08/20 15:27, Amar Tumballi ha scritto:
>
>> I am not aware of any data layout changes we did between current latest
>> (7.7) and 3.8.8. But due to some issues, 'online' migration is not
>> possible, even the clients need to be updated, so you have to umount
>> the volume once.
> Tks for the info.
> Actually the issue is less bad than I thought: I checked on a client
> that (somehow) still used Debian oldstable. Current stable uses 5.5,
> still old but not prehistoric :)
>
> Too bad the original issue still persists, even after removing the file
> and its hardlink from .gluster dir :(
> Maybe the upgrade can fix it? Or I risk breaking it even more?
>

From diego.zuccato at unibo.it  Fri Aug 28 11:47:02 2020
From: diego.zuccato at unibo.it (Diego Zuccato)
Date: Fri, 28 Aug 2020 13:47:02 +0200
Subject: [Gluster-users] How to fix I/O error ?
(resend)
In-Reply-To: 
References: <4c279753-56d4-ae24-8baa-9d739757bd0a@unibo.it>
 <66c74779-a32c-9f12-59c3-7a1e76f41d3a@unibo.it>
 <20656535-6048-5f36-fed5-75553a53dd3f@unibo.it>
Message-ID: <634928b2-fb66-fefa-643f-4e9e92082432@unibo.it>

Il 28/08/20 10:31, Felix Kölzow ha scritto:
> I faced a directory where a simple ls leads to input/output error.
I saw something similar, but the directory was OK, except some files that reported "??" (IIRC in the size field). That got healed automatically.

> I cd into the corresponding directory on the brick and I did an ls
> command and it works.
Well, you have to check all the bricks of a replica to be sure to get all the files.

> # while read item
> # do
> #     rm -rf "$item"
> # done < /tmp/mylist
Before this I'd have saved the files outside of the bricks :)

> That's it. Afterwards, I copied the deleted files back from our backup.
Ah, you had a backup! :)

> Please give me a hint if this procedure also works for you.
Different situation. But could probably work. Except for the fact we don't have a backup of those files :(

Our volume is mostly used for archiving, so writes are rare. I know really well redundancy is no substitute for a backup (with redundancy only, if a file gets deleted, it's lost -- for this, a WORM translator could be useful :) ).

BTW, in my case I noticed that having the two replicas online and bringing down the arbiters brought the files back online, so I completely removed the arbiter bricks (degrading to replica 2) and I'm now slowly re-adding 'em to have "replica 3 arbiter 1" again (see "node sizing" thread).

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università
di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

From phaley at mit.edu  Fri Aug 28 16:57:09 2020
From: phaley at mit.edu (Pat Haley)
Date: Fri, 28 Aug 2020 12:57:09 -0400
Subject: [Gluster-users] Removing spurious hostname from peer configuration
Message-ID: <2c6d030c-c6c3-4cee-ebda-7c2a43e1a356@mit.edu>

Hi All,

We have a distributed gluster filesystem across 2 servers. We recently realized that one of the servers (mseas-data3) has 2 hostnames for the other server (mseas-data2). One of these is on an external port that we rarely use. When that port went down following a power outage, we ended up in a weird state where the gluster filesystem was being served to the rest of the cluster but (a) mseas-data3 kept indicating that mseas-data2 was disconnected in response to "gluster peer status" and (b) we kept having to restart the glusterd daemon on mseas-data3. Since we don't use the external port much and didn't think gluster used it at all, it was a while before we diagnosed the problem.

Now we would like to expunge that external hostname by making the following changes:

_Current setting on MSEAS-DATA3_

/var/lib/glusterd/peers/c1110fd9-cb99-4ca1-b18a-536a122d67ef
uuid=c1110fd9-cb99-4ca1-b18a-536a122d67ef
state=3
hostname1=MSEAS-DATA2.MIT.EDU
hostname2=mseas-data2

_Proposed change on MSEAS-DATA3_

/var/lib/glusterd/peers/c1110fd9-cb99-4ca1-b18a-536a122d67ef
uuid=c1110fd9-cb99-4ca1-b18a-536a122d67ef
state=3
#hostname1=MSEAS-DATA2.MIT.EDU
hostname1=mseas-data2

(manually changing a configuration file). Is this the correct approach? Do we need to make this change in additional files as well? Do we need to bring down the volume and daemons first?

Any advice will be greatly appreciated.

Thanks

Pat

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept.
of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hunter86_bg at yahoo.com  Fri Aug 28 19:20:53 2020
From: hunter86_bg at yahoo.com (Strahil Nikolov)
Date: Fri, 28 Aug 2020 19:20:53 +0000 (UTC)
Subject: [Gluster-users] Removing spurious hostname from peer configuration
In-Reply-To: <2c6d030c-c6c3-4cee-ebda-7c2a43e1a356@mit.edu>
References: <2c6d030c-c6c3-4cee-ebda-7c2a43e1a356@mit.edu>
Message-ID: <1141710855.2227.1598642453723@mail.yahoo.com>

Hi Pat,

I have done something similar with downtime (on a lab), but I think you don't need any downtime.

Here is my idea:
- Edit all nodes and remove that file
- Stop Glusterd on all nodes (Glusterd, but not the bricks' processes)
- Start Glusterd on all nodes

It seems that if you edit on 1 node and restart, the node will get the configuration from the other node which will have the 2 hostnames. Thus, you need to power down all glusterd processes in the TSP.

Yet, I haven't done it exactly like that (I stopped also the bricks, but I could afford it) and you need to do a test on a set of VMs.

Best Regards,
Strahil Nikolov

On Friday, 28 August 2020, 19:57:18 GMT+3, Pat Haley wrote:

Hi All,

We have a distributed gluster filesystem across 2 servers. We recently realized that one of the servers (mseas-data3) has 2 hostnames for the other server (mseas-data2). One of these is on an external port that we rarely use. When that port went down following a power outage, we ended up in a weird state where the gluster filesystem was being served to the rest of the cluster but (a) mseas-data3 kept indicating that mseas-data2 was disconnected in response to "gluster peer status" and (b) we kept having to restart the glusterd daemon on mseas-data3.
Since we don't use the external port much and didn't think gluster used it at all, it was a while before we diagnosed the problem.

Now we would like to expunge that external hostname by making the following changes:

Current setting on MSEAS-DATA3

/var/lib/glusterd/peers/c1110fd9-cb99-4ca1-b18a-536a122d67ef
uuid=c1110fd9-cb99-4ca1-b18a-536a122d67ef
state=3
hostname1=MSEAS-DATA2.MIT.EDU
hostname2=mseas-data2

Proposed change on MSEAS-DATA3

/var/lib/glusterd/peers/c1110fd9-cb99-4ca1-b18a-536a122d67ef
uuid=c1110fd9-cb99-4ca1-b18a-536a122d67ef
state=3
#hostname1=MSEAS-DATA2.MIT.EDU
hostname1=mseas-data2

(manually changing a configuration file). Is this the correct approach? Do we need to make this change in additional files as well? Do we need to bring down the volume and daemons first?

Any advice will be greatly appreciated.

Thanks

Pat

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301

________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

From archon810 at gmail.com  Sat Aug 29 17:39:49 2020
From: archon810 at gmail.com (Artem Russakovskii)
Date: Sat, 29 Aug 2020 10:39:49 -0700
Subject: [Gluster-users] Monitoring tools for GlusterFS
In-Reply-To: 
References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de>
Message-ID: 

Another small tweak: in your README, you have this:
"curl -LO v1.0.3 gstatus (download) "
This makes it impossible to just easily copy paste.
Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | @ArtemR On Sun, Aug 23, 2020 at 6:20 AM Sachidananda Urs wrote: > > > On Sat, Aug 22, 2020 at 10:52 PM Artem Russakovskii > wrote: > >> The output currently has some whitespace issues. >> >> 1. The space shift under Cluster is different than under Volumes, making >> the output look a bit inconsistent. >> 2. Can you please fix the tabulation for when volume names are varying in >> length? This output is shifted and looks messy as a result for me. >> > > Artem, since the volume names vary for longer volumes the column number > increases and users have to scroll to right. > To overcome this, I have decided to print the volume name in a row by > itself. PR: https://github.com/gluster/gstatus/pull/44 fixes the issue. > > The output looks like this: > > root at master-node:/home/sac/work/gstatus# gstatus > > > Cluster: > > Status: Healthy GlusterFS: 9dev > > Nodes: 3/3 Volumes: 2/2 > > > Volumes: > > > snap-1 > > Replicate Started (UP) - 2/2 Bricks Up > > Capacity: (12.04% used) 5.00 GiB/40.00 > GiB (used/total) > > Snapshots: 2 > > Quota: On > > > very_very_long_long_name_to_test_the_gstatus_display > > Replicate Started (UP) - 2/2 Bricks Up > > Capacity: (12.04% used) 5.00 GiB/40.00 > GiB (used/total) > > > > root at master-node:/home/sac/work/gstatus# > > > -sac > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sacchi at kadalu.io Sun Aug 30 13:55:40 2020 From: sacchi at kadalu.io (Sachidananda Urs) Date: Sun, 30 Aug 2020 19:25:40 +0530 Subject: [Gluster-users] Monitoring tools for GlusterFS In-Reply-To: References: <58f109a7-6d62-4814-425d-7728ea4f8338@fischer-ka.de> Message-ID: On Sat, Aug 29, 2020 at 11:10 PM Artem Russakovskii wrote: > Another small tweak: in your README, you have this: > "curl -LO v1.0.3 gstatus (download) > " > This makes it impossible to just easily copy paste. 
You should just put > the link in there, and wrap in code formatting blocks. > Ack. PR: https://github.com/gluster/gstatus/pull/48 should fix the issue. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Mon Aug 31 04:11:40 2020 From: dcunningham at voisonics.com (David Cunningham) Date: Mon, 31 Aug 2020 16:11:40 +1200 Subject: [Gluster-users] Geo-replication log file not closed In-Reply-To: References: Message-ID: Hello all, Apparently we don't want to "kill -HUP" the two processes that have rotated log file still open: root 4495 1 0 Aug10 ? 00:00:59 /usr/bin/python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py --path=/nodirectwritedata/gluster/gvol0 --monitor -c /var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf --iprefix=/var :gvol0 --glusterd-uuid=b7521445-ee93-4fed-8ced-6a609fa8c7d4 nvfs10::gvol0 root 4508 4495 0 Aug10 ? 00:01:56 python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py agent gvol0 nvfs10::gvol0 --local-path /nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 9,12,11,10 ... a kill -HUP on those processes stops them rather than re-opening the log file. Does anyone know if these processes are supposed to have gsyncd.log open? If so, how do we tell them to close and re-open their file handle? Thanks in advance! On Tue, 25 Aug 2020 at 15:24, David Cunningham wrote: > Hello, > > We're having an issue with the rotated gsyncd.log not being released. 
> Here's the output of 'lsof': > > # lsof | grep 'gsyncd.log.1' > python2 4495 root 3w REG 8,1 > 991675023 4332241 > /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted) > python2 4495 4496 root 3w REG 8,1 > 991675023 4332241 > /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted) > python2 4495 4507 root 3w REG 8,1 > 991675023 4332241 > /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted) > python2 4508 root 3w REG 8,1 > 991675023 4332241 > /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted) > python2 4508 root 5w REG 8,1 > 991675023 4332241 > /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted) > python2 4508 4511 root 3w REG 8,1 > 991675023 4332241 > /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log.1 (deleted) > ... etc... > > Those processes are: > # ps -ef | egrep '4495|4508' > root 4495 1 0 Aug10 ? 00:00:59 /usr/bin/python2 > /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py > --path=/nodirectwritedata/gluster/gvol0 --monitor -c > /var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf > --iprefix=/var :gvol0 --glusterd-uuid=b7521445-ee93-4fed-8ced-6a609fa8c7d4 > nvfs10::gvol0 > root 4508 4495 0 Aug10 ? 
00:01:56 python2 > /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py agent gvol0 > nvfs10::gvol0 --local-path /nodirectwritedata/gluster/gvol0 --local-node > cafs30 --local-node-id b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id > cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 9,12,11,10 > > And here's the relevant part of the /etc/logrotate.d/glusterfs-georep > script: > > /var/log/glusterfs/geo-replication/*/*.log { > sharedscripts > rotate 52 > missingok > compress > delaycompress > notifempty > postrotate > for pid in `ps -aef | grep glusterfs | egrep "\-\-aux-gfid-mount" | > awk '{print $2}'`; do > /usr/bin/kill -HUP $pid > /dev/null 2>&1 || true > done > endscript > } > > If I run the postrotate part manually: > # ps -aef | grep glusterfs | egrep "\-\-aux-gfid-mount" | awk '{print $2}' > 4520 > > # ps -aef | grep 4520 > root 4520 1 0 Aug10 ? 01:24:23 /usr/sbin/glusterfs > --aux-gfid-mount --acl --log-level=INFO > --log-file=/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/mnt-nodirectwritedata-gluster-gvol0.log > --volfile-server=localhost --volfile-id=gvol0 --client-pid=-1 > /tmp/gsyncd-aux-mount-Tq_3sU > > Perhaps the problem is that the kill -HUP in the logrotate script doesn't > act on the right process? If so, does anyone have a command to get the > right PID? > > Thanks in advance for any help. > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL:
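The postrotate loop in the logrotate script above only matches the glusterfs `--aux-gfid-mount` helper, which is why the python gsyncd processes holding gsyncd.log.1 open are never signalled. One way to find the PIDs that actually hold the rotated (deleted) file open, without parsing lsof's column layout, is to scan /proc/<pid>/fd. The sketch below is a hypothetical helper (`pids_holding_deleted` is not part of gluster or its logrotate scripts), and, as noted above, sending those gsyncd processes a HUP stops them rather than making them reopen the log, so any signalling driven by this list should be tested carefully first.

```python
import glob
import os


def pids_holding_deleted(path_fragment):
    """Return the PIDs of processes that still hold an open descriptor
    to a deleted file whose original path contains path_fragment.

    Similar in spirit to `lsof | grep 'gsyncd.log.1'`, but usable from
    a script without parsing lsof's column layout.
    """
    pids = []
    for fd_dir in glob.glob("/proc/[0-9]*/fd"):
        pid = int(fd_dir.split("/")[2])
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            # process exited, or we lack permission to inspect it
            continue
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            # the kernel appends " (deleted)" to the link target once
            # the file has been unlinked but is still held open
            if path_fragment in target and target.endswith("(deleted)"):
                pids.append(pid)
                break
    return pids


if __name__ == "__main__":
    print(pids_holding_deleted("gsyncd.log.1"))
```

For this particular daemon, an alternative that avoids signalling anything is logrotate's copytruncate directive, which copies the log aside and truncates the original in place (at the cost of possibly losing a few lines written during the copy), so the processes can keep their existing file handle.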