[Gluster-users] High load on glusterfs client
Sebastian.Gumprich at t-systems.com
Thu Mar 10 11:26:51 UTC 2016
Hi Krutika,
Thanks for your input. I disabled client-side heal and will monitor whether it happens again.
Regards
Sebastian
From: Krutika Dhananjay [mailto:kdhananj at redhat.com]
Sent: Tuesday, 8 March 2016 04:04
To: Gumprich, Sebastian
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] High load on glusterfs client
Could you try disabling client-side heals and see if it works for you?
Here's what you'd need to do:
#gluster volume set <VOL> entry-self-heal off
#gluster volume set <VOL> data-self-heal off
#gluster volume set <VOL> metadata-self-heal off
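If the same three settings have to be applied to more than one volume, the commands above could be generated from a small script. The following is only a minimal Python sketch and not part of the original advice: the function name `heal_off_commands` and the dry-run approach are assumptions made for illustration, and the option names are copied unchanged from the commands above.

```python
import shlex

# Option names copied unchanged from the three commands above.
CLIENT_HEAL_OPTIONS = ("entry-self-heal", "data-self-heal", "metadata-self-heal")


def heal_off_commands(volume):
    """Return the argv lists that would switch client-side heals off for `volume`."""
    return [
        ["gluster", "volume", "set", volume, option, "off"]
        for option in CLIENT_HEAL_OPTIONS
    ]


if __name__ == "__main__":
    # Dry run only: print the commands instead of executing them.
    # "opt" is the volume name that appears in the quoted message.
    for argv in heal_off_commands("opt"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the commands first keeps the change reviewable; they still have to be run as root on one of the servers.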
-Krutika
On Wed, Mar 2, 2016 at 12:37 AM, <Sebastian.Gumprich at t-systems.com> wrote:
Hello everyone,
I’m experiencing high load on our glusterfs clients.
Here’s the setup:
There are two glusterfs servers,
nfs01 and nfs02, with the following configuration:
[root@nfs01 ~]# gluster volume info opt
Volume Name: opt
Type: Replicate
Volume ID: 5b77070f-5378-45ec-9eda-5f7dd007ff8a
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: nfs01:/opt/bkk
Brick2: nfs02:/opt/bkk
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.cache-size: 512MB
performance.cache-refresh-timeout: 10
performance.read-ahead: off
performance.write-behind-window-size: 4MB
network.ping-timeout: 2
performance.io-thread-count: 16
performance.cache-max-file-size: 2MB
performance.md-cache-timeout: 1
Then there are two clients (web01 and web02) that mount the volume via a virtual IP address (nfs-VIP):
nfs-VIP:/opt on /opt/bkk type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
The operating system on all servers is CentOS Linux release 7.2.1511 (Core).
The GlusterFS version is glusterfs 3.7.6, built on Nov 9 2015 15:20:26.
The volume holds the dynamic PHP web content of a TYPO3 CMS.
On the client (web01), the following is logged in gluster.log:
iner_08850598886fb5f39c9cf1d269d7e20677f97ede.php>, e09948dd-1e9b-4430-8f55-3df64cda2385 on opt-client-1 and ba80a475-7b83-4c83-bd0c-798a108bfb63 on opt-client-0. Skipping conservative merge on the file.
[2016-03-01 18:40:50.570040] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-opt-replicate-0: Gfid mismatch detected for <4c6dda77-6a2b-4996-bca4-9ace4cee45cc/News_News_layout_Detail_html_bd113d9c433c8f88376e47547db3b94e698a5ecd.php>, 739ee14c-2d5d-458b-bffd-83595bfcbe6a on opt-client-1 and 5a311733-731e-4478-ad3c-a70fbf66ba30 on opt-client-0. Skipping conservative merge on the file.
[2016-03-01 18:40:50.572992] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-opt-replicate-0: Gfid mismatch detected for <4c6dda77-6a2b-4996-bca4-9ace4cee45cc/News_News_partial_Detail_FalMediaContainer_9c1b3fd40fca9019726b3f6b8bc04618ffadab7b.php>, bb6907a1-ce80-4e03-92df-6fbc69d24a4d on opt-client-1 and 6f88aa67-cb81-4c26-94b2-e3aaa8704e8d on opt-client-0. Skipping conservative merge on the file.
[2016-03-01 18:40:50.791704] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-opt-replicate-0: Gfid mismatch detected for <4c6dda77-6a2b-4996-bca4-9ace4cee45cc/Powermail_Form_action_create_f40464a6a7f73d86cda514065167d59a7ddece73.php>, 5e5b224b-ea20-4d38-8504-61b24f5d6a3b on opt-client-1 and fab07af5-2aa5-4873-a6e2-6265ec78e304 on opt-client-0. Skipping conservative merge on the file.
[2016-03-01 18:40:54.085964] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-opt-replicate-0: Gfid mismatch detected for <4c6dda77-6a2b-4996-bca4-9ace4cee45cc/News_News_action_detail_8d30b654cd8343fe40616b8a2f8a5343b1ed776e.php>, 4d75a687-b9ab-4f97-b698-38668d1981ae on opt-client-1 and 110b315e-2e28-4859-a8b9-e0f1629faa3c on opt-client-0. Skipping conservative merge on the file.
[2016-03-01 18:40:56.153651] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-opt-replicate-0: Gfid mismatch detected for <4c6dda77-6a2b-4996-bca4-9ace4cee45cc/Powermail_Form_layout_Default_aae217b167ad82f4b1258bb01fa73f305844dbd8.php>, 6f7e2709-8c14-486a-85a2-a3cb48af4ca5 on opt-client-1 and 6ab62408-0406-4834-96b9-a51e18441d4c on opt-client-0. Skipping conservative merge on the file.
[2016-03-01 18:41:05.476126] I [MSGID: 108026] [afr-self-heal-entry.c:593:afr_selfheal_entry_do] 0-opt-replicate-0: performing entry selfheal on 7a922c37-48d0-4dfb-8abb-18a435c948af
[2016-03-01 18:41:05.597093] I [MSGID: 108026] [afr-self-heal-common.c:651:afr_log_selfheal] 0-opt-replicate-0: Completed entry selfheal on 7a922c37-48d0-4dfb-8abb-18a435c948af. source=1 sinks=0
[2016-03-01 18:41:05.790944] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-opt-replicate-0: Gfid mismatch detected for <4c6dda77-6a2b-4996-bca4-9ace4cee45cc/Powermail_Form_action_form_f0755f8526150f023fd98252b510a40c49586dbd.php>, 118668d9-608a-477a-b655-bcc6c2298bf4 on opt-client-1 and a87943a4-e18a-4642-adff-1ad765496533 on opt-client-0. Skipping conservative merge on the file.
[2016-03-01 18:41:06.649695] W [MSGID: 108008] [afr-self-heal-name.c:359:afr_selfheal_name_gfid_mismatch_check] 0-opt-replicate-0: GFID mismatch for <gfid:4c6dda77-6a2b-4996-bca4-9ace4cee45cc>/Powermail_Form_action_form_f0755f8526150f023fd98252b510a40c49586dbd.php 118668d9-608a-477a-b655-bcc6c2298bf4 on opt-client-1 and a87943a4-e18a-4642-adff-1ad765496533 on opt-client-0
[2016-03-01 18:41:06.661277] W [fuse-bridge.c:462:fuse_entry_cbk] 0-glusterfs-fuse: 184415191: LOOKUP() /releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_action_form_f0755f8526150f023fd98252b510a40c49586dbd.php => -1 (Input/output error)
[2016-03-01 18:41:06.680968] W [fuse-bridge.c:462:fuse_entry_cbk] 0-glusterfs-fuse: 184422672: LOOKUP() /releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_action_form_f0755f8526150f023fd98252b510a40c49586dbd.php => -1 (Input/output error)
[2016-03-01 18:41:06.680222] W [MSGID: 108008] [afr-self-heal-name.c:359:afr_selfheal_name_gfid_mismatch_check] 0-opt-replicate-0: GFID mismatch for <gfid:4c6dda77-6a2b-4996-bca4-9ace4cee45cc>/Powermail_Form_action_form_f0755f8526150f023fd98252b510a40c49586dbd.php 118668d9-608a-477a-b655-bcc6c2298bf4 on opt-client-1 and a87943a4-e18a-4642-adff-1ad765496533 on opt-client-0
There are many more of these entries; this is just a very small excerpt. The files that have a mismatch are temporary PHP cache files.
When I delete these files, the load goes down and the number of files listed in the volume heal info drops (see below).
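To see which cache files are affected without reading the whole log by eye, the "Gfid mismatch detected" messages can be grouped by file name. The following is a minimal Python sketch under the assumption that every message has the same shape as the lines in the excerpt above; the names `MISMATCH_RE` and `mismatched_entries` are made up for this illustration.

```python
import re
from collections import Counter

# Shape assumed from the excerpt:
#   Gfid mismatch detected for <parent-gfid/name>, A on client-x and B on client-y.
MISMATCH_RE = re.compile(
    r"Gfid mismatch detected for <(?P<parent>[0-9a-f-]{36})/(?P<name>[^>]+)>, "
    r"(?P<gfid_a>[0-9a-f-]{36}) on (?P<client_a>\S+) and "
    r"(?P<gfid_b>[0-9a-f-]{36}) on (?P<client_b>\S+)\."
)


def mismatched_entries(log_lines):
    """Count mismatch messages per (parent gfid, file name)."""
    counts = Counter()
    for line in log_lines:
        match = MISMATCH_RE.search(line)
        if match:
            counts[(match.group("parent"), match.group("name"))] += 1
    return counts


if __name__ == "__main__":
    # One message taken from the excerpt above (timestamp and source location omitted).
    sample = [
        "0-opt-replicate-0: Gfid mismatch detected for "
        "<4c6dda77-6a2b-4996-bca4-9ace4cee45cc/"
        "Powermail_Form_action_create_f40464a6a7f73d86cda514065167d59a7ddece73.php>, "
        "5e5b224b-ea20-4d38-8504-61b24f5d6a3b on opt-client-1 and "
        "fab07af5-2aa5-4873-a6e2-6265ec78e304 on opt-client-0. "
        "Skipping conservative merge on the file.",
    ]
    for (parent, name), count in mismatched_entries(sample).most_common():
        print(count, parent, name)
```

In practice the function would be fed the lines of the client log instead of the embedded sample.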
Here’s the output of gluster volume heal opt info. Note that this output was taken *after* deleting most of the cache files; before that, there were many more entries.
[root@nfs01 fluid_template]# gluster volume heal opt info
Brick nfs01:/opt/bkk
<gfid:23fc1027-0aec-4b84-9ffb-c164a9d43d20>
<gfid:92cb9dde-2721-4c11-93a6-2582ed9edd5d>
<gfid:a0dbcf8a-67f8-4870-ab57-3d5d1218601c>
<gfid:947cbcc4-1978-4b9e-b726-2acd0a4fda5a>
<gfid:440b9b36-bad5-4cb6-b935-8a004132340a>
Number of entries: 5
Brick nfs02:/opt/bkk
/releases/1.0.1/typo3temp/Cache/Code/fluid_template - Possibly undergoing heal
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_partial_Form_Select_7bf809152d985037de761d8d375d286e44b4f13a.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_partial_Misc_GoogleAdwordsConversion_f7254aeb252ea43cd89f9051b5a43109d47938f1.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_partial_Form_File_4b3a3f667c475577847aa77118f3af5666ecb2c6.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_action_form_f0755f8526150f023fd98252b510a40c49586dbd.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_partial_Form_Input_b3e08744b23680f0a14e60e716f9994d9580e3f4.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/News_News_action_detail_8d30b654cd8343fe40616b8a2f8a5343b1ed776e.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/News_News_layout_Detail_html_bd113d9c433c8f88376e47547db3b94e698a5ecd.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/News_News_partial_Detail_Opengraph_b98680f3686dccf00e22181e66d11ca9de7a44bd.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/News_News_partial_Detail_FalMediaContainer_9c1b3fd40fca9019726b3f6b8bc04618ffadab7b.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/News_News_partial_Detail_MediaContainer_08850598886fb5f39c9cf1d269d7e20677f97ede.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_partial_Form_Text_c10766db8d335d5cd9555878aef5d886dcb6926e.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_action_create_f40464a6a7f73d86cda514065167d59a7ddece73.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_layout_Default_aae217b167ad82f4b1258bb01fa73f305844dbd8.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_partial_Misc_HoneyPod_fc83c414f744612c3cb44c8827372a30f17791d0.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_partial_Form_Textarea_39d24d8e3e2813636dfff2a89b7cefb8e9117c97.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_partial_Form_Submit_86e69c50ccebf20584db2e3c74859373c53d320f.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_partial_Form_Check_a2a11c64ac58dab16eab29e4cda88518c15a4d25.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_partial_Misc_FormError_7cade8e8fc1d23c761360c0efbe8cb145eed2e39.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_partial_Form_Radio_d038a263b5ea81f0f7795e1e47516e5e2937cbd9.php
/releases/1.0.1/typo3temp/Cache/Code/fluid_template/Powermail_Form_partial_Form_Hidden_a7651f5498e0d36b4e2eae5fca015ef0e9365067.php
Number of entries: 21
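To track whether the backlog shrinks over time, the heal info output can be reduced to a per-brick list of entries. The following is a minimal Python sketch that assumes the output keeps the layout shown above ("Brick ..." header, one entry per line, "Number of entries: N" footer); the function name `entries_per_brick` is hypothetical.

```python
# Status text that heal info appends to some entries in the output above.
HEAL_SUFFIX = " - Possibly undergoing heal"


def entries_per_brick(heal_info_text):
    """Map each brick to the list of entries listed under it."""
    bricks = {}
    current = None
    for raw in heal_info_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Number of entries:"):
            continue
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            bricks[current] = []
        elif current is not None:
            if line.endswith(HEAL_SUFFIX):
                line = line[: -len(HEAL_SUFFIX)]
            bricks[current].append(line)
    return bricks


if __name__ == "__main__":
    # Shortened copy of the output above.
    sample = """Brick nfs01:/opt/bkk
<gfid:23fc1027-0aec-4b84-9ffb-c164a9d43d20>
Number of entries: 1
Brick nfs02:/opt/bkk
/releases/1.0.1/typo3temp/Cache/Code/fluid_template - Possibly undergoing heal
Number of entries: 1
"""
    for brick, entries in entries_per_brick(sample).items():
        gfid_only = [e for e in entries if e.startswith("<gfid:")]
        print(brick, len(entries), "entries,", len(gfid_only), "listed by gfid only")
```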
Here are some heal statistics taken during the high load:
Starting time of crawl: Tue Mar 1 19:09:45 2016
Ending time of crawl: Tue Mar 1 19:09:52 2016
Type of crawl: INDEX
No. of entries healed: 2
No. of entries in split-brain: 0
No. of heal failed entries: 168
And here’s the performance monitoring output, covering about 60 seconds of high load:
Brick: nfs01:/opt/bkk
------------------------------
Cumulative Stats:
Block Size: 1b+ 2b+ 4b+
No. of Reads: 18 33 308
No. of Writes: 64 88 2994
Block Size: 8b+ 16b+ 32b+
No. of Reads: 117 154 15612
No. of Writes: 370 369 1432
Block Size: 64b+ 128b+ 256b+
No. of Reads: 3721 12884 19917
No. of Writes: 7585 900221 135011
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 23929 12251 19835
No. of Writes: 63067 30950 23540
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 9096 9449 5566
No. of Writes: 40455 36397 13926
Block Size: 32768b+ 65536b+ 131072b+
No. of Reads: 5159 6055 34001
No. of Writes: 20722 6600 12762
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 212065 FORGET
0.00 0.00 us 0.00 us 0.00 us 4118713 RELEASE
0.00 0.00 us 0.00 us 0.00 us 9931097 RELEASEDIR
0.00 47.00 us 47.00 us 47.00 us 1 GETXATTR
0.00 124.00 us 124.00 us 124.00 us 1 XATTROP
0.00 144.00 us 144.00 us 144.00 us 1 UNLINK
0.00 37.17 us 35.00 us 41.00 us 6 STATFS
0.01 36.84 us 32.00 us 61.00 us 19 FSTAT
0.01 48.80 us 44.00 us 62.00 us 20 STAT
0.05 75.52 us 54.00 us 156.00 us 48 REMOVEXATTR
0.05 81.69 us 69.00 us 146.00 us 48 SETATTR
0.05 37.16 us 14.00 us 436.00 us 109 FLUSH
0.05 38.31 us 18.00 us 115.00 us 108 FINODELK
0.06 73.16 us 46.00 us 170.00 us 63 OPEN
0.10 38.70 us 20.00 us 167.00 us 195 INODELK
0.11 335.37 us 40.00 us 563.00 us 27 READDIR
0.15 50.58 us 27.00 us 392.00 us 232 OPENDIR
0.15 81.74 us 35.00 us 215.00 us 144 FXATTROP
0.16 130.53 us 68.00 us 697.00 us 100 WRITE
0.93 1532.04 us 171.00 us 15356.00 us 48 CREATE
1.40 284.90 us 25.00 us 1146.00 us 390 READDIRP
10.20 52.29 us 28.00 us 2323.00 us 15482 READLINK
18.94 33.25 us 11.00 us 27242.00 us 45219 ENTRYLK
67.58 93.74 us 32.00 us 27521.00 us 57223 LOOKUP
Duration: 6492593 seconds
Data Read: 5679496942 bytes
Data Written: 4510536316 bytes
Interval 1 Stats:
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 0 0 0
No. of Writes: 1 50 2
Block Size: 32768b+
No. of Reads: 0
No. of Writes: 47
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 48 FORGET
0.00 0.00 us 0.00 us 0.00 us 104 RELEASE
0.00 0.00 us 0.00 us 0.00 us 231 RELEASEDIR
0.00 124.00 us 124.00 us 124.00 us 1 XATTROP
0.00 144.00 us 144.00 us 144.00 us 1 UNLINK
0.00 36.40 us 35.00 us 37.00 us 5 STATFS
0.01 36.84 us 32.00 us 61.00 us 19 FSTAT
0.01 48.80 us 44.00 us 62.00 us 20 STAT
0.05 75.52 us 54.00 us 156.00 us 48 REMOVEXATTR
0.05 37.69 us 14.00 us 436.00 us 101 FLUSH
0.06 81.69 us 69.00 us 146.00 us 48 SETATTR
0.06 74.26 us 46.00 us 170.00 us 53 OPEN
0.06 38.31 us 18.00 us 115.00 us 108 FINODELK
0.10 311.33 us 40.00 us 563.00 us 24 READDIR
0.11 38.70 us 20.00 us 167.00 us 195 INODELK
0.16 50.58 us 27.00 us 392.00 us 231 OPENDIR
0.17 81.74 us 35.00 us 215.00 us 144 FXATTROP
0.18 130.53 us 68.00 us 697.00 us 100 WRITE
1.03 1532.04 us 171.00 us 15356.00 us 48 CREATE
1.56 284.90 us 25.00 us 1146.00 us 390 READDIRP
11.31 52.27 us 28.00 us 2323.00 us 15395 READLINK
18.24 33.67 us 11.00 us 27242.00 us 38571 ENTRYLK
66.83 94.49 us 32.00 us 27521.00 us 50338 LOOKUP
Duration: 68 seconds
Data Read: 0 bytes
Data Written: 3347998 bytes
Brick: nfs02:/opt/bkk
------------------------------
Cumulative Stats:
Block Size: 1b+ 2b+ 4b+
No. of Reads: 26 49 541
No. of Writes: 64 94 3848
Block Size: 8b+ 16b+ 32b+
No. of Reads: 218 205 1267
No. of Writes: 452 417 1448
Block Size: 64b+ 128b+ 256b+
No. of Reads: 6097 39042 11111
No. of Writes: 8617 924503 136768
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 120819 37802 16506
No. of Writes: 64399 35996 24999
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 76162 20449 10948
No. of Writes: 41302 37488 14034
Block Size: 32768b+ 65536b+ 131072b+
No. of Reads: 7733 7306 31648
No. of Writes: 20849 6750 12886
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 231622 FORGET
0.00 0.00 us 0.00 us 0.00 us 6123626 RELEASE
0.00 0.00 us 0.00 us 0.00 us 10869781 RELEASEDIR
0.00 116.00 us 116.00 us 116.00 us 1 XATTROP
0.00 40.00 us 38.00 us 43.00 us 6 STATFS
0.01 40.85 us 33.00 us 97.00 us 13 FSTAT
0.01 46.26 us 29.00 us 100.00 us 23 STAT
0.04 73.96 us 53.00 us 150.00 us 48 REMOVEXATTR
0.04 76.85 us 61.00 us 99.00 us 48 SETATTR
0.04 35.83 us 28.00 us 155.00 us 103 UNLINK
0.04 36.28 us 15.00 us 142.00 us 112 FINODELK
0.05 38.68 us 13.00 us 220.00 us 133 FLUSH
0.09 324.11 us 28.00 us 589.00 us 28 READDIR
0.12 85.88 us 36.00 us 215.00 us 144 FXATTROP
0.12 124.55 us 76.00 us 192.00 us 100 WRITE
0.13 78.76 us 19.00 us 529.00 us 161 GETXATTR
0.20 78.65 us 43.00 us 384.00 us 261 OPEN
0.23 54.09 us 2.00 us 260.00 us 426 OPENDIR
0.60 1261.23 us 174.00 us 10655.00 us 48 CREATE
0.66 81.15 us 17.00 us 9254.00 us 819 INODELK
1.21 279.97 us 23.00 us 1587.00 us 434 READDIRP
8.10 52.61 us 27.00 us 1283.00 us 15496 READLINK
15.48 34.54 us 10.00 us 13810.00 us 45133 ENTRYLK
72.84 104.30 us 14.00 us 14613.00 us 70322 LOOKUP
Duration: 6492593 seconds
Data Read: 6308054987 bytes
Data Written: 4579768980 bytes
Interval 1 Stats:
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 0 0 0
No. of Writes: 1 50 2
Block Size: 32768b+
No. of Reads: 0
No. of Writes: 47
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 48 FORGET
0.00 0.00 us 0.00 us 0.00 us 286 RELEASE
0.00 0.00 us 0.00 us 0.00 us 400 RELEASEDIR
0.00 116.00 us 116.00 us 116.00 us 1 XATTROP
0.00 40.40 us 38.00 us 43.00 us 5 STATFS
0.01 40.85 us 33.00 us 97.00 us 13 FSTAT
0.01 46.14 us 29.00 us 100.00 us 22 STAT
0.04 73.96 us 53.00 us 150.00 us 48 REMOVEXATTR
0.04 76.85 us 61.00 us 99.00 us 48 SETATTR
0.04 35.83 us 28.00 us 155.00 us 103 UNLINK
0.05 36.28 us 15.00 us 142.00 us 112 FINODELK
0.05 39.64 us 13.00 us 220.00 us 117 FLUSH
0.10 330.28 us 28.00 us 589.00 us 25 READDIR
0.14 85.88 us 36.00 us 215.00 us 144 FXATTROP
0.14 124.55 us 76.00 us 192.00 us 100 WRITE
0.14 79.13 us 19.00 us 529.00 us 159 GETXATTR
0.22 79.51 us 43.00 us 384.00 us 235 OPEN
0.25 54.27 us 2.00 us 260.00 us 400 OPENDIR
0.70 1261.23 us 174.00 us 10655.00 us 48 CREATE
0.77 81.15 us 17.00 us 9254.00 us 819 INODELK
1.26 283.31 us 23.00 us 1587.00 us 386 READDIRP
7.67 52.88 us 27.00 us 1283.00 us 12595 READLINK
15.47 34.83 us 10.00 us 13810.00 us 38576 ENTRYLK
72.91 105.05 us 14.00 us 14613.00 us 60295 LOOKUP
Duration: 68 seconds
Data Read: 0 bytes
Data Written: 3347998 bytes
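The per-fop tables above are easier to compare once they are sorted by their share of latency. In the interval statistics for nfs01, for example, LOOKUP, ENTRYLK and READLINK together account for roughly 96% of the latency. The following is a minimal Python sketch that assumes every fop row has the shape "%-latency, avg us, min us, max us, calls, fop" as shown above; the names `FopStat`, `parse_fop_rows` and `top_fops` are made up for this illustration.

```python
import re
from typing import NamedTuple


class FopStat(NamedTuple):
    pct_latency: float
    avg_us: float
    min_us: float
    max_us: float
    calls: int
    fop: str


# One table row, e.g. "66.83 94.49 us 32.00 us 27521.00 us 50338 LOOKUP".
ROW_RE = re.compile(
    r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+us\s+(\d+\.\d+)\s+us\s+(\d+\.\d+)\s+us"
    r"\s+(\d+)\s+([A-Z]+)\s*$"
)


def parse_fop_rows(text):
    """Return one FopStat per table row; all other lines are ignored."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            pct, avg, low, high, calls, fop = match.groups()
            rows.append(
                FopStat(float(pct), float(avg), float(low), float(high), int(calls), fop)
            )
    return rows


def top_fops(rows, n=3):
    """Return the n rows with the largest share of latency."""
    return sorted(rows, key=lambda row: row.pct_latency, reverse=True)[:n]


if __name__ == "__main__":
    # Four rows copied from the nfs01 "Interval 1 Stats" table above.
    sample = """1.03 1532.04 us 171.00 us 15356.00 us 48 CREATE
11.31 52.27 us 28.00 us 2323.00 us 15395 READLINK
18.24 33.67 us 11.00 us 27242.00 us 38571 ENTRYLK
66.83 94.49 us 32.00 us 27521.00 us 50338 LOOKUP
"""
    for row in top_fops(parse_fop_rows(sample)):
        print(row.fop, row.pct_latency, row.calls)
```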
Can anybody tell me how to fix the problem with the high load and these cache files?
Thanks in advance!
Regards
Sebastian