<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi Xavi,<div class=""><br class=""></div><div class="">I went back and checked the ZFS stats, and ZFS seemed to be behaving normally during the event (one glusterfsd process started to use 100% CPU at around 1:40, which is when the ZFS ARC target size started to increase):</div><div class=""><br class=""></div><div class=""><img apple-inline="yes" id="3D89C67E-FDFC-4C4E-938B-6B2091A7E206" width="1249" height="558" src="cid:9F17D897-B4F0-44B0-BE17-65D100748CD5@akunacapital.local" class=""></div><div class=""><br class=""></div><div class="">So it looks like ZFS was always able to: 1. keep data and metadata usage under the target (half of the total physical RAM, i.e. 32GB); 2. release RAM when the OS required it.</div><div class=""><br class=""></div><div class="">Today I observed another hang on a different server. htop showed that most of the glusterfsd processes were stuck in D status using 0 CPU, while one or two of them were in R status using 100%. I also saw an updatedb.mlocate process in D status with 100% CPU (a scheduled daily cron job). I am not sure if they are related. 
But since I don't use mlocate, I disabled it.</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Yuhao<br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Aug 23, 2018, at 18:28, Xavi Hernandez <<a href="mailto:jahernan@redhat.com" class="">jahernan@redhat.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="auto" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 14px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><div class="">Hi Yuhao,</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Sorry for the late answer; I was on holiday and have just returned. <br class=""><br class=""><div class="gmail_quote" dir="auto"><div dir="ltr" class="">On Wed, 8 Aug 2018, 07:49 Yuhao Zhang, <<a href="mailto:zzyzxd@gmail.com" class="">zzyzxd@gmail.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word; line-break: after-white-space;" class="">Hi Xavi,<div class=""><br class=""></div><div class="">Thank you for the suggestions; they are extremely helpful. I hadn't thought it could be a ZFS problem. I went back and checked a longer monitoring window, and now I can see a pattern. Please see the attached Grafana screenshot (also available here: <a href="https://cl.ly/070J2y3n1u0F" target="_blank" rel="noreferrer" class="">https://cl.ly/070J2y3n1u0F</a>. 
Note that the data gaps are periods when I took the server down for rebooting):</div><div class=""><br class=""></div><div class=""><img id="m_7243594476962546097E2C98C60-010D-41C6-A758-54A51DE54118" width="1249" height="586" src="cid:B5E7D357-9D6B-4E75-B715-830927DE979F@akunacapital.local" class=""><br class=""><div class=""><br class=""></div><div class="">Between 8/4 and 8/6, I ran two transfer tests and hit the gluster hanging problem twice: once during the first transfer, and again shortly after the second transfer. I marked both with pink lines. </div><div class=""><br class=""></div><div class="">It looks like free memory was almost exhausted during my transfer tests. The system had a very large amount of cached memory, which I think was due to the ZFS ARC. However, I am under the impression that ZFS releases space from the ARC when it observes low available system memory, and I am not sure why it didn't do that.</div></div></div></blockquote></div></div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Yes, it should release memory, but for some reason I don't understand, when there's high metadata load it's not able to release the allocated memory fast enough (or so it seems). I've observed high CPU utilization by a ZFS process at this point.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class=""><div class="gmail_quote" dir="auto"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word; line-break: after-white-space;" class=""><div class=""><div class=""> </div><div class=""><br class=""></div><div class="">I didn't tweak any related ZFS parameters. zfs_arc_max was set to 0 (the default value). According to the docs, it is "Max arc size of ARC in bytes. If set to 0 then it will consume 1/2 of system RAM." 
So it appeared that this setting didn't work.</div></div></div></blockquote></div></div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">In my experience, this limit is not respected under high metadata load. Using 1/8 of system RAM seemed to keep memory consumption under control, at least for the workloads I used.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">In theory, ZFS 0.7.x.y should solve the memory management problems, but I haven't tested it. </div><div dir="auto" class=""><br class=""></div><div dir="auto" class=""><div class="gmail_quote" dir="auto"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word; line-break: after-white-space;" class=""><div class=""><div class=""><br class=""></div><div class="">When the server was under heavy IO, the used memory actually decreased, which I can't explain.</div></div></div></blockquote></div></div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">I've only seen this problem when accessing large numbers of different files (typical of a copy, rsync or find on a volume with thousands or millions of files and directories). High IO on a small set of files doesn't cause any trouble.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">It's related to metadata caching: high IO on a small set of files doesn't require much metadata. 
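The 1/8-of-RAM ARC cap Xavi suggests can be applied at runtime through the ZFS module parameter. A minimal sketch, assuming ZFS on Linux; writing the parameter needs root, takes effect without reloading the module, and does not persist across reboots:

```shell
# Compute 1/8 of physical RAM in bytes (MemTotal in /proc/meminfo is in kB).
arc_max=$(awk '/^MemTotal:/ { print int($2 * 1024 / 8) }' /proc/meminfo)
echo "zfs_arc_max = ${arc_max} bytes"

# Apply immediately, if the ZFS module is loaded and we have permission.
if [ -w /sys/module/zfs/parameters/zfs_arc_max ]; then
    echo "${arc_max}" > /sys/module/zfs/parameters/zfs_arc_max
fi

# Compare current ARC size against its target, if ZFS is present.
if [ -r /proc/spl/kstat/zfs/arcstats ]; then
    awk '$1 == "size" || $1 == "c_max" { print $1, $3 }' /proc/spl/kstat/zfs/arcstats
fi
```

As Xavi notes, the ARC may still overshoot this limit under heavy metadata load, so treat it as a target rather than a hard ceiling.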
</div><div dir="auto" class=""><br class=""></div><div dir="auto" class=""><div class="gmail_quote" dir="auto"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word; line-break: after-white-space;" class=""><div class=""><div class=""><br class=""></div><div class="">May I ask if you, or anyone else in this group, has recommendations on ZFS settings for my setup? My server has 64GB of physical memory and 150GB of SSD space reserved for L2ARC. The zpool has 6 vdevs, each with 10 x 12TB hard drives in raidz2. Total usable space in the zpool is 482TB.</div></div></div></blockquote></div></div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">As I said, I would try 1/8 of system memory for the ARC (it will use more than that anyway). Dropping caches also helps when memory is getting exhausted; it causes ZFS to release memory faster, though I don't consider it a good solution.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Also make sure that zfs_txg_timeout is set to 5 or a similar value to avoid long bursts of disk access. Other options to consider, depending on the use case, are zfs_prefetch_disable=1 and zfs_nocacheflush=1.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">For better performance with gluster, the xattr option on ZFS datasets should be set to "sa", but this needs to be done at volume creation, before any files are created; otherwise it will only apply to newly created files. To use "sa" safely, ZFS version 0.6.5.8 or higher should be used. 
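These settings can be sketched together as follows, assuming ZFS on Linux. The 8589934592 value is 1/8 of this server's 64GB of RAM in bytes, tank/gluster is a placeholder dataset name, and the config is written to /tmp here so the sketch is harmless to run (install it as /etc/modprobe.d/zfs.conf to make it take effect at module load):

```shell
# Sketch of a modprobe config persisting the module parameters across reboots
# (written to /tmp here; the real file would be /etc/modprobe.d/zfs.conf).
cat > /tmp/zfs.conf <<'EOF'
options zfs zfs_arc_max=8589934592
options zfs zfs_txg_timeout=5
options zfs zfs_prefetch_disable=1
EOF
cat /tmp/zfs.conf

# Store xattrs in inodes ("sa") instead of hidden directories -- gluster uses
# xattrs heavily. Placeholder dataset name; only affects files created after
# the change, so ideally set it before the bricks are populated.
if command -v zfs >/dev/null 2>&1; then
    zfs set xattr=sa tank/gluster
    zfs get xattr tank/gluster
fi
```

zfs_nocacheflush=1 is deliberately left out of the sketch: it is only safe when the write caches are non-volatile or battery-backed.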
</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Xavi</div><div dir="auto" class=""><br class=""></div><div dir="auto" class=""><div class="gmail_quote" dir="auto"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word; line-break: after-white-space;" class=""><div class=""><div class=""><br class=""></div><div class="">Thank you,</div><div class="">Yuhao</div><div class=""><br class=""></div><div class=""><blockquote type="cite" class=""><div class="">On Aug 7, 2018, at 01:36, Xavi Hernandez <<a href="mailto:jahernan@redhat.com" target="_blank" rel="noreferrer" class="">jahernan@redhat.com</a>> wrote:</div><br class="m_7243594476962546097Apple-interchange-newline"><div class=""><div dir="auto" class=""><div class="">Hi Yuhao, <br class=""><br class=""><div class="gmail_quote"><div dir="ltr" class="">On Mon, 6 Aug 2018, 15:26 Yuhao Zhang, <<a href="mailto:zzyzxd@gmail.com" target="_blank" rel="noreferrer" class="">zzyzxd@gmail.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word; line-break: after-white-space;" class=""><div class="">Hello,</div><div class=""><br class=""></div>I just experienced another hang an hour ago, and the server was not even under heavy IO.<div class=""><br class=""></div><div class="">Atin, I attached the process monitoring results and another statedump.</div><div class=""><br class=""></div><div class="">Xavi, ZFS was fine; during the hang, I could still write directly to the ZFS volume. 
My ZFS version: ZFS: Loaded module v0.6.5.6-0ubuntu16, ZFS pool version 5000, ZFS filesystem version 5</div></div></blockquote></div></div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">I highly recommend upgrading to at least version 0.6.5.8. It fixes a kernel panic that can happen when ZFS is used with gluster. However, this is not your current problem.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Top statistics show low available memory and high CPU utilization by the kswapd process (along with one of the gluster processes). I've seen frequent memory management problems with ZFS. Have you configured any ZFS parameters? It's highly recommended to tweak some memory limits.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">If that is the problem, there's one thing that should alleviate it (and help confirm whether it's related):</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">echo 3 >/proc/sys/vm/drop_caches</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">This should be done on all bricks from time to time. You can wait until the problem appears, but in that case recovery can take longer. </div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">I think this should fix the high CPU usage of kswapd. 
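The drop_caches step can be scripted to run periodically on each brick server. A minimal sketch (needs root to actually drop the caches; the every-6-hours cron schedule is an arbitrary example, not a recommendation from the thread):

```shell
# Flush dirty data to disk first, then ask the kernel to drop the page
# cache plus reclaimable slab objects such as dentries and inodes
# (3 = pagecache + slab). Writing the file requires root.
sync
if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches
fi

# Example /etc/cron.d entry to run this on a schedule (arbitrary interval):
# 0 */6 * * * root /bin/sync && echo 3 > /proc/sys/vm/drop_caches
```

As Xavi says, this is a mitigation rather than a fix: it forces ZFS to give memory back faster, at the cost of a cold cache afterwards.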
If so, we'll need to tweak some ZFS parameters.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">I'm not sure whether the high CPU usage of gluster is related to this or not.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Xavi</div><div dir="auto" class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); padding-left: 1ex;"><div style="word-wrap: break-word; line-break: after-white-space;" class=""><div class=""><div class=""><br class=""></div><div class="">Thank you,</div><div class="">Yuhao</div><div class=""></div></div></div></blockquote></div></div></div></div></blockquote></div><br class=""></div></div></blockquote></div></div></div><span id="cid:%3C%3E">&lt;Image 2018-08-07 at 23.59.09.png&gt;</span></div></blockquote></div><br class=""></div></body></html>