[Gluster-users] performance.cache-size for high-RAM clients/servers, other tweaks for performance, and improvements to Gluster docs

Nithya Balachandran nbalacha at redhat.com
Sun Apr 8 07:00:28 UTC 2018


+Raghavendra and Manoj for their insights.

On 6 April 2018 at 13:53, Artem Russakovskii <archon810 at gmail.com> wrote:

> I restarted rsync, and it has been sitting there for almost a minute,
> having barely moved a few bytes in that time:
>
> 2014/11/545b06baa3d98/com.google.android.apps.inputmethod.zhuyin-2.1.0.
> 79226761-armeabi-v7a-175-minAPI14.apk
>       6,389,760  45%   18.76kB/s    0:06:50
>
> I straced each of the 3 processes rsync created and saw this (note: every
> time there were several seconds of no output, I ctrl-C'ed and detached from
> strace):
>
> citadel:/home/archon810 # strace -p 16776
> Process 16776 attached
> select(6, [5], [4], [5], {45, 293510})  = 1 (out [4], left {44, 71342})
> write(4, "\4\200\0\7\3513>\2755\360[\372\317\337DZ\36\324\300o\235\
> 377\247\367\177%\37\226\352\377\256\351"..., 32776) = 32776
> ioctl(1, TIOCGPGRP, [16776])            = 0
> write(1, "\r      4,292,608  30%   27.07kB/"..., 46) = 46
> select(6, [5], [4], [5], {60, 0})       = 1 (out [4], left {59, 999998})
> write(4, "\4\200\0\7\270\224\277\24\31\247f\32\233x\t\276l\f-\254\r\
> 246\324\360\30\235\350\6\34\304\230\242"..., 32776) = 32776
> select(6, [5], [4], [5], {60, 0})       = 1 (out [4], left {59, 999998})
> write(4, "\4\200\0\7\346_\363\36\33\320}Dd\5_\327\250\237i\242?B\
> 276e\245\202Z\213\301[\25S"..., 32776) = 32776
> select(6, [5], [4], [5], {60, 0})       = 1 (out [4], left {59, 999998})
> write(4, "\4\200\0\7\330\303\221\357\225\37h\373\366X\306L\f>\234\\
> %n\253\266\5\372c\257>V\366\255"..., 32776) = 32776
> select(6, [5], [4], [5], {60, 0})       = 1 (out [4], left {59, 999998})
> write(4, "\4\200\0\7i\301\17u\224{/O\213\330\33\317\272\246\221\22\
> 261|w\244\5\307|\21\373\v\356k"..., 32776) = 32776
> select(6, [5], [4], [5], {60, 0})       = 1 (out [4], left {59, 999998})
> write(4, "\4\200\0\7\270\277\233\206n\304:\362_\213~\356bm\5\350\
> 337\26\203\225\332\277\372\275\247<\307\22"..., 32776) = 32776
> read(3, "\316\214\260\341:\263P\214\373n\313\10\333
> }\323\364Q\353\r\232d\204\257\\Q\306/\277\253/\356"..., 262144) = 262144
> select(6, [5], [4], [5], {60, 0})       = 1 (out [4], left {59, 999998})
> write(4, "\4\200\0\7\314\233\274S08\330\276\226\267\233\360rp\
> 210x)\320\0314\223\323\3335Y\312\313\307"..., 32776) = 32776
> select(6, [5], [4], [5], {60, 0})       = 1 (out [4], left {59, 999998})
> write(4, "\4\200\0\7\316\214\260\341:\263P\214\373n\313\10\333
> }\323\364Q\353\r\232d\204\257\\Q\306/"..., 32776) = 32776
> select(6, [5], [4], [5], {60, 0}^CProcess 16776 detached
>  <detached ...>
> citadel:/home/archon810 # strace -p 16777
> Process 16777 attached
> select(4, [3], [], [3], {38, 210908}^CProcess 16777 detached
>  <detached ...>
> citadel:/home/archon810 # strace -p 16776
> Process 16776 attached
> select(6, [5], [4], [5], {48, 295874}^CProcess 16776 detached
>  <detached ...>
> citadel:/home/archon810 # strace -p 16778
> Process 16778 attached
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999996})
> read(0, "\0\200\0\0\4\200\0\7\3508\343\204\207\255\4\212y\230&&\372\30*\322\f\325v\335\230
> \16v"..., 32768) = 32768
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999998})
> read(0, "\373\30\2\2667\371\207)", 8)   = 8
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
> read(0, "\0\200\0\0\4\200\0\7\6\213\2223\233\36-\350,\303\0\234\7`
> \317\276H\353u\217\275\316\333@"..., 32768) = 32768
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
> read(0, "\375\33\367_\357\330\362\222", 8) = 8
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
> read(0, "\0\200\0\0\4\200\0\7`Nv\355\275\336wzQ\365\264\364\20AX\
> 365DG\372\311\216\212\375\276"..., 32768) = 32768
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
> read(0, "\374)\300\264}\21\226s", 8)    = 8
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
> read(0, "\0\200\0\0\4\200\0\7\10:\v\342O\305\374\5:Y+ \250\315\24\202J-@
> \256WC\320\371"..., 32768) = 32768
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
> read(0, "\3023\24O\343y\312\204", 8)    = 8
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
> read(0, "\0\200\0\0\4\200\0\7\27\22^\n/S.\215\362T\f\257Q\207z\241~B\3\32\32344\17"...,
> 32768) = 32768
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999998})
> read(0, "\367P\222\262\224\17\25\250", 8) = 8
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
> read(0, "\0\200\0\0\4\200\0\7FujR\213\372\341E\232\360\n\257\323\
> 233>\364\245\37\3\31\314\20\206\362"..., 32768) = 32768
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
> read(0, "\203o\300\341\37\340(8", 8)    = 8
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999998})
> read(0, "\0\200\0\0\4\200\0\7n\211\357\301\217\210\23\341$\342d8\
> 25N\2035[\260\1\206B\206!\2"..., 32768) = 32768
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
> read(0, "|\222\223\336\201w\325\356", 8) = 8
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
> read(0, "\0\200\0\0\4\200\0\7\220\216Y\343\362\366\231\372?\
> 334N^\303\35\374cC;\vtx\231<w"..., 32768) = 32768
> select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999998})
> read(0, ";k\v\314\21\375\3\274", 8)     = 8
> write(3, "\3508\343\204\207\255\4\212y\230&&\372\30*\322\f\325v\335\230
> \16v\213O//\332\4\24\24"..., 262144^C
>
> I'm really not sure what to make of this. In the time it took me to write
> the above, the file still hasn't finished copying.
>
> 2014/11/545b06baa3d98/com.google.android.apps.inputmethod.zhuyin-2.1.0.
> 79226761-armeabi-v7a-175-minAPI14.apk
>      10,321,920  73%   33.31kB/s    0:01:53
>
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
> <http://www.apkmirror.com/>, Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
> <http://twitter.com/ArtemR>
>
> On Fri, Apr 6, 2018 at 1:12 AM, Artem Russakovskii <archon810 at gmail.com>
> wrote:
>
>> Hi again,
>>
>> I'd like to expand on the performance issues and plead for help. Here's
>> one case which shows these odd hiccups: https://i.imgur.com/CXBPjTK.gifv.
>>
>> In this GIF, I switch back and forth between copy operations on 2
>> servers; on both, I'm copying a 10GB dir full of .apk and image files.
>>
>> On server "hive", I'm copying straight from the main disk to an attached
>> block storage volume (xfs). As you can see, the transfers are relatively
>> speedy and don't hiccup.
>> On server "citadel", I'm copying the same set of data to a 4-replica
>> gluster volume which uses block storage as its brick. As you can see,
>> performance is much worse, and there are frequent pauses of many seconds
>> where nothing seems to be happening - it just freezes.
>>
>> All 4 servers have the same specs, and all of them have performance
>> issues with gluster and no such issues when raw xfs block storage is used.
>>
>> hive has long since finished copying the data, while citadel is barely
>> chugging along and will probably take another half an hour to an hour. I
>> have over 1TB of data to migrate, and once we go live, I'm not even sure
>> gluster would be able to keep up rather than bring the machines and
>> services down.
>>
>>
>>
>> Here's the cluster config, though performance didn't seem to be any
>> different before I applied these customizations vs. after.
>>
>> Volume Name: apkmirror_data1
>> Type: Replicate
>> Volume ID: 11ecee7e-d4f8-497a-9994-ceb144d6841e
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 4 = 4
>> Transport-type: tcp
>> Bricks:
>> Brick1: nexus2:/mnt/nexus2_block1/apkmirror_data1
>> Brick2: forge:/mnt/forge_block1/apkmirror_data1
>> Brick3: hive:/mnt/hive_block1/apkmirror_data1
>> Brick4: citadel:/mnt/citadel_block1/apkmirror_data1
>> Options Reconfigured:
>> cluster.quorum-count: 1
>> cluster.quorum-type: fixed
>> network.ping-timeout: 5
>> network.remote-dio: enable
>> performance.rda-cache-limit: 256MB
>> performance.readdir-ahead: on
>> performance.parallel-readdir: on
>> network.inode-lru-limit: 500000
>> performance.md-cache-timeout: 600
>> performance.cache-invalidation: on
>> performance.stat-prefetch: on
>> features.cache-invalidation-timeout: 600
>> features.cache-invalidation: on
>> cluster.readdir-optimize: on
>> performance.io-thread-count: 32
>> server.event-threads: 4
>> client.event-threads: 4
>> performance.read-ahead: off
>> cluster.lookup-optimize: on
>> performance.cache-size: 1GB
>> cluster.self-heal-daemon: enable
>> transport.address-family: inet
>> nfs.disable: on
>> performance.client-io-threads: on
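>>
>> (In case it matters: each of the options above was applied with "gluster
>> volume set", roughly like the following, and the effective values can be
>> read back with "gluster volume get" - using performance.cache-size as an
>> example:
>>
>> gluster volume set apkmirror_data1 performance.cache-size 1GB
>> gluster volume get apkmirror_data1 performance.cache-size
>> gluster volume get apkmirror_data1 all
>>
>> The option names are exactly the ones in the list above.)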
>>
>>
>> The mounts are done as follows in /etc/fstab:
>> /dev/disk/by-id/scsi-0Linode_Volume_citadel_block1 /mnt/citadel_block1 xfs defaults 0 2
>> localhost:/apkmirror_data1 /mnt/apkmirror_data1 glusterfs defaults,_netdev 0 0
>>
>> I'm really not sure whether direct-io-mode mount tweaks would do anything
>> here, what the value should be set to, or what it is by default.
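>>
>> (If I were to test it, I assume it would look something like the
>> following on a throwaway mount point - /mnt/apkmirror_test is just a
>> hypothetical path here:
>>
>> mount -t glusterfs -o direct-io-mode=enable localhost:/apkmirror_data1 /mnt/apkmirror_test
>>
>> or the equivalent direct-io-mode option added to the fstab entry - but I
>> don't know whether that's the right knob to turn or what it defaults to.)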
>>
>> The OS is OpenSUSE 42.3, 64-bit. 80GB of RAM, 20 CPUs, hosted by Linode.
>>
>> I'd really appreciate any help in the matter.
>>
>> Thank you.
>>
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>> <http://www.apkmirror.com/>, Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>> <http://twitter.com/ArtemR>
>>
>> On Thu, Apr 5, 2018 at 11:13 PM, Artem Russakovskii <archon810 at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to squeeze performance out of gluster on four 80GB RAM,
>>> 20-CPU machines where Gluster runs on attached block storage (Linode) as
>>> 4 replica bricks, and so far everything I've tried results in sub-optimal
>>> performance.
>>>
>>> There are many files - mostly images, several million of them - and many
>>> operations take minutes. Copying multiple files (even small ones)
>>> suddenly freezes up for seconds at a time, then continues, and iostat
>>> frequently shows large r_await and w_await values with 100% utilization
>>> for the attached block device, etc.
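>>>
>>> (The iostat numbers come from watching extended device stats with
>>> something like:
>>>
>>> iostat -xm 2
>>>
>>> and reading the r_await, w_await and %util columns for the attached
>>> block device.)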
>>>
>>> Anyway, there are many guides out there for small-file performance
>>> improvements, but more explanation is needed, and I think more tweaks
>>> should be possible.
>>>
>>> My question today is about performance.cache-size. Is this the size of a
>>> cache in RAM? If so, how do I view the current cache usage to see whether
>>> it's getting full and whether I should increase it? Is it advisable to
>>> bump it up if I have many tens of gigs of RAM free?
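>>>
>>> (For what it's worth, I know I can read back the configured value with
>>>
>>> gluster volume get apkmirror_data1 performance.cache-size
>>>
>>> but that only shows the setting, not actual usage. My guess is that
>>> usage would have to come from a statedump of the fuse client - e.g.
>>> sending SIGUSR1 to the glusterfs mount process and looking at the
>>> io-cache section of the dump under /var/run/gluster - but I'm not sure
>>> that's the intended way, hence the question.)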
>>>
>>>
>>>
>>> More generally, in the 2 months since I first started working with
>>> gluster and took a production system live, I've been feeling frustrated
>>> because Gluster has a lot of poorly documented and confusing options. I
>>> really wish the documentation could be improved with examples and better
>>> explanations.
>>>
>>> Specifically, it'd be absolutely amazing if the docs offered a strategy
>>> for setting each value and ways of determining more optimal values. For
>>> example, for performance.cache-size, if the docs said something like "run
>>> command abc to see your current cache size, and if it's hurting, raise
>>> it, but be aware that it's limited by RAM," that would already be a huge
>>> improvement. And so on with other options.
>>>
>>>
>>>
>>> The gluster team is quite helpful on this mailing list, but in a
>>> reactive rather than proactive way. Perhaps it's the tunnel vision that
>>> sets in after working on a project for so long, where plain-language
>>> explanations and even proper documentation of options take a back seat,
>>> but I encourage you to be more proactive about helping us understand and
>>> optimize Gluster.
>>>
>>> Thank you.
>>>
>>> Sincerely,
>>> Artem
>>>
>>> --
>>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>>> <http://www.apkmirror.com/>, Illogical Robot LLC
>>> beerpla.net | +ArtemRussakovskii
>>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>>> <http://twitter.com/ArtemR>
>>>
>>
>>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>

