<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Patrick,<div class=""><br class=""></div><div class="">I would definitely upgrade your two nodes from 3.12.14 to 3.12.15. You also mention ZFS, and that error you show makes me think you need to check to be sure you have “xattr=sa” and “acltype=posixacl” set on your ZFS volumes.</div><div class=""><br class=""></div><div class="">You also observed your bricks are crossing the 95% full line, ZFS performance will degrade significantly the closer you get to full. In my experience, this starts somewhere between 10% and 5% free space remaining, so you’re in that realm. </div><div class=""><br class=""></div><div class="">How’s your free memory on the servers doing? Do you have your zfs arc cache limited to something less than all the RAM? It shares pretty well, but I’ve encountered situations where other things won’t try and take ram back properly if they think it’s in use, so ZFS never gets the opportunity to give it up.</div><div class=""><br class=""></div><div class="">Since your volume is a disperse-replica, you might try tuning disperse.shd-max-threads, default is 1, I’d try it at 2, 4, or even more if the CPUs are beefy enough. And setting server.event-threads to 4 and client.event-threads to 8 has proven helpful in many cases. After you get upgraded to 3.12.15, enabling performance.stat-prefetch may help as well. I don’t know if it matters, but I’d also recommend resetting performance.least-prio-threads to the default of 1 (or try 2 or 4) and/or also setting <a href="http://performance.io" class="">performance.io</a>-thread-count to 32 if those have beefy CPUs.</div><div class=""><br class=""></div><div class="">Beyond those general ideas, more info about your hardware (CPU and RAM) and workload (VMs, direct storage for web servers or enders, etc) may net you some more ideas. Then you’re going to have to do more digging into brick logs looking for errors and/or warnings to see what’s going on.</div><div class=""><br class=""></div><div class=""> -Darrell</div><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Apr 20, 2019, at 8:22 AM, Patrick Rennie <<a href="mailto:patrickmrennie@gmail.com" class="">patrickmrennie@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hello Gluster Users, <div class=""><br class=""></div><div class="">I am hoping someone can help me with resolving an ongoing issue I've been having, I'm new to mailing lists so forgive me if I have gotten anything wrong. We have noticed our performance deteriorating over the last few weeks, easily measured by trying to do an ls on one of our top-level folders, and timing it, which usually would take 2-5 seconds, and now takes up to 20 minutes, which obviously renders our cluster basically unusable. This has been intermittent in the past but is now almost constant and I am not sure how to work out the exact cause. We have noticed some errors in the brick logs, and have noticed that if we kill the right brick process, performance instantly returns back to normal, this is not always the same brick, but it indicates to me something in the brick processes or background tasks may be causing extreme latency. Due to this ability to fix it by killing the right brick process off, I think it's a specific file, or folder, or operation which may be hanging and causing the increased latency, but I am not sure how to work it out. One last thing to add is that our bricks are getting quite full (~95% full), we are trying to migrate data off to new storage but that is going slowly, not helped by this issue. I am currently trying to run a full heal as there appear to be many files needing healing, and I have all brick processes running so they have an opportunity to heal, but this means performance is very poor. It currently takes over 15-20 minutes to do an ls of one of our top-level folders, which just contains 60-80 other folders, this should take 2-5 seconds. This is all being checked by FUSE mount locally on the storage node itself, but it is the same for other clients and VMs accessing the cluster. Initially, it seemed our NFS mounts were not affected and operated at normal speed, but testing over the last day has shown that our NFS clients are also extremely slow, so it doesn't seem specific to FUSE as I first thought it might be. </div><div class=""><br class=""></div><div class="">I am not sure how to proceed from here, I am fairly new to gluster having inherited this setup from my predecessor and trying to keep it going. I have included some info below to try and help with diagnosis, please let me know if any further info would be helpful. I would really appreciate any advice on what I could try to work out the cause. Thank you in advance for reading this, and any suggestions you might be able to offer. </div><div class=""><br class=""></div><div class="">- Patrick</div><div class=""><br class=""></div><div class="">This is an example of the main error I see in our brick logs, there have been others, I can post them when I see them again too:</div><div class="">[2019-04-20 04:54:43.055680] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-gvAA01-posix: getxattr failed on /brick1/<filename> library: system.posix_acl_default [Operation not supported]<br class=""></div><div class=""><div class="">[2019-04-20 05:01:29.476313] W [posix.c:4929:posix_getxattr] 0-gvAA01-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag)<br class=""></div></div><div class=""><br class=""></div><div class="">Our setup consists of 2 storage nodes and an arbiter node. I have noticed our nodes are on slightly different versions, I'm not sure if this could be an issue. We have 9 bricks on each node, made up of ZFS RAIDZ2 pools - total capacity is around 560TB. </div><div class="">We have bonded 10gbps NICS on each node, and I have tested bandwidth with iperf and found that it's what would be expected from this config. </div><div class="">Individual brick performance seems ok, I've tested several bricks using dd and can write a 10GB files at 1.7GB/s. </div><div class=""><div class=""><br class=""></div><div class=""># dd if=/dev/zero of=/brick1/test/test.file bs=1M count=10000</div><div class="">10000+0 records in</div><div class="">10000+0 records out</div><div class="">10485760000 bytes (10 GB, 9.8 GiB) copied, 6.20303 s, 1.7 GB/s</div></div><div class=""><br class=""></div><div class="">Node 1:<br class=""></div><div class=""><div class=""># glusterfs --version</div><div class="">glusterfs 3.12.15</div></div><div class=""><br class=""></div><div class="">Node 2:</div><div class=""><div class=""># glusterfs --version</div><div class="">glusterfs 3.12.14</div></div><div class=""><br class=""></div><div class="">Arbiter:</div><div class=""><div class=""># glusterfs --version</div><div class="">glusterfs 3.12.14</div></div><div class=""><br class=""></div><div class="">Here is our gluster volume status:</div><div class=""><br class=""></div><div class=""><div class=""># gluster volume status</div><div class="">Status of volume: gvAA01</div><div class="">Gluster process TCP Port RDMA Port Online Pid</div><div class="">------------------------------------------------------------------------------</div><div class="">Brick 01-B:/brick1/gvAA01/brick 49152 0 Y 7219</div><div class="">Brick 02-B:/brick1/gvAA01/brick 49152 0 Y 21845</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck1 49152 0 Y 6931</div><div class="">Brick 01-B:/brick2/gvAA01/brick 49153 0 Y 7239</div><div class="">Brick 02-B:/brick2/gvAA01/brick 49153 0 Y 9916</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck2 49153 0 Y 6939</div><div class="">Brick 01-B:/brick3/gvAA01/brick 49154 0 Y 7235</div><div class="">Brick 02-B:/brick3/gvAA01/brick 49154 0 Y 21858</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck3 49154 0 Y 6947</div><div class="">Brick 01-B:/brick4/gvAA01/brick 49155 0 Y 31840</div><div class="">Brick 02-B:/brick4/gvAA01/brick 49155 0 Y 9933</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck4 49155 0 Y 6956</div><div class="">Brick 01-B:/brick5/gvAA01/brick 49156 0 Y 7233</div><div class="">Brick 02-B:/brick5/gvAA01/brick 49156 0 Y 9942</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck5 49156 0 Y 6964</div><div class="">Brick 01-B:/brick6/gvAA01/brick 49157 0 Y 7234</div><div class="">Brick 02-B:/brick6/gvAA01/brick 49157 0 Y 9952</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck6 49157 0 Y 6974</div><div class="">Brick 01-B:/brick7/gvAA01/brick 49158 0 Y 7248</div><div class="">Brick 02-B:/brick7/gvAA01/brick 49158 0 Y 9960</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck7 49158 0 Y 6984</div><div class="">Brick 01-B:/brick8/gvAA01/brick 49159 0 Y 7253</div><div class="">Brick 02-B:/brick8/gvAA01/brick 49159 0 Y 9970</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck8 49159 0 Y 6993</div><div class="">Brick 01-B:/brick9/gvAA01/brick 49160 0 Y 7245</div><div class="">Brick 02-B:/brick9/gvAA01/brick 49160 0 Y 9984</div><div class="">Brick 00-A:/arbiterAA01/gvAA01/bri</div><div class="">ck9 49160 0 Y 7001</div><div class="">NFS Server on localhost 2049 0 Y 17276</div><div class="">Self-heal Daemon on localhost N/A N/A Y 25245</div><div class="">NFS Server on 02-B 2049 0 Y 9089</div><div class="">Self-heal Daemon on 02-B N/A N/A Y 17838</div><div class="">NFS Server on 00-a 2049 0 Y 15660</div><div class="">Self-heal Daemon on 00-a N/A N/A Y 16218</div><div class=""><br class=""></div><div class="">Task Status of Volume gvAA01</div><div class="">------------------------------------------------------------------------------</div><div class="">There are no active volume tasks</div></div><div class=""><br class=""></div><div class="">And gluster volume info: </div><div class=""><br class=""></div><div class=""><div class=""># gluster volume info</div><div class=""><br class=""></div><div class="">Volume Name: gvAA01</div><div class="">Type: Distributed-Replicate</div><div class="">Volume ID: ca4ece2c-13fe-414b-856c-2878196d6118</div><div class="">Status: Started</div><div class="">Snapshot Count: 0</div><div class="">Number of Bricks: 9 x (2 + 1) = 27</div><div class="">Transport-type: tcp</div><div class="">Bricks:</div><div class="">Brick1: 01-B:/brick1/gvAA01/brick</div><div class="">Brick2: 02-B:/brick1/gvAA01/brick</div><div class="">Brick3: 00-A:/arbiterAA01/gvAA01/brick1 (arbiter)</div><div class="">Brick4: 01-B:/brick2/gvAA01/brick</div><div class="">Brick5: 02-B:/brick2/gvAA01/brick</div><div class="">Brick6: 00-A:/arbiterAA01/gvAA01/brick2 (arbiter)</div><div class="">Brick7: 01-B:/brick3/gvAA01/brick</div><div class="">Brick8: 02-B:/brick3/gvAA01/brick</div><div class="">Brick9: 00-A:/arbiterAA01/gvAA01/brick3 (arbiter)</div><div class="">Brick10: 01-B:/brick4/gvAA01/brick</div><div class="">Brick11: 02-B:/brick4/gvAA01/brick</div><div class="">Brick12: 00-A:/arbiterAA01/gvAA01/brick4 (arbiter)</div><div class="">Brick13: 01-B:/brick5/gvAA01/brick</div><div class="">Brick14: 02-B:/brick5/gvAA01/brick</div><div class="">Brick15: 00-A:/arbiterAA01/gvAA01/brick5 (arbiter)</div><div class="">Brick16: 01-B:/brick6/gvAA01/brick</div><div class="">Brick17: 02-B:/brick6/gvAA01/brick</div><div class="">Brick18: 00-A:/arbiterAA01/gvAA01/brick6 (arbiter)</div><div class="">Brick19: 01-B:/brick7/gvAA01/brick</div><div class="">Brick20: 02-B:/brick7/gvAA01/brick</div><div class="">Brick21: 00-A:/arbiterAA01/gvAA01/brick7 (arbiter)</div><div class="">Brick22: 01-B:/brick8/gvAA01/brick</div><div class="">Brick23: 02-B:/brick8/gvAA01/brick</div><div class="">Brick24: 00-A:/arbiterAA01/gvAA01/brick8 (arbiter)</div><div class="">Brick25: 01-B:/brick9/gvAA01/brick</div><div class="">Brick26: 02-B:/brick9/gvAA01/brick</div><div class="">Brick27: 00-A:/arbiterAA01/gvAA01/brick9 (arbiter)</div><div class="">Options Reconfigured:</div><div class="">cluster.shd-max-threads: 4</div><div class="">performance.least-prio-threads: 16</div><div class="">cluster.readdir-optimize: on</div><div class="">performance.quick-read: off</div><div class="">performance.stat-prefetch: off</div><div class="">cluster.data-self-heal: on</div><div class="">cluster.lookup-unhashed: auto</div><div class="">cluster.lookup-optimize: on</div><div class="">cluster.favorite-child-policy: mtime</div><div class="">server.allow-insecure: on</div><div class="">transport.address-family: inet</div><div class="">client.bind-insecure: on</div><div class="">cluster.entry-self-heal: off</div><div class="">cluster.metadata-self-heal: off</div><div class="">performance.md-cache-timeout: 600</div><div class="">cluster.self-heal-daemon: enable</div><div class="">performance.readdir-ahead: on</div><div class="">diagnostics.brick-log-level: INFO</div><div class="">nfs.disable: off</div></div><div class="gmail-yj6qo"></div><br class="gmail-Apple-interchange-newline"><div class="">Thank you for any assistance. </div><div class=""><br class=""></div><div class="">- Patrick</div></div>
_______________________________________________<br class="">Gluster-users mailing list<br class=""><a href="mailto:Gluster-users@gluster.org" class="">Gluster-users@gluster.org</a><br class="">https://lists.gluster.org/mailman/listinfo/gluster-users</div></blockquote></div><br class=""></div></body></html>