[Gluster-users] performance.cache-size for high-RAM clients/servers, other tweaks for performance, and improvements to Gluster docs

Artem Russakovskii archon810 at gmail.com
Fri Apr 6 08:23:28 UTC 2018


I restarted rsync, and this file has been sitting there for almost a minute,
having barely moved in that time:

2014/11/545b06baa3d98/com.google.android.apps.inputmethod.zhuyin-2.1.0.79226761-armeabi-v7a-175-minAPI14.apk
      6,389,760  45%   18.76kB/s    0:06:50

I straced each of the 3 processes rsync spawned and saw the following (note:
whenever there were several seconds of no output, I Ctrl-C'ed to detach from
strace):

citadel:/home/archon810 # strace -p 16776
Process 16776 attached
select(6, [5], [4], [5], {45, 293510})  = 1 (out [4], left {44, 71342})
write(4,
"\4\200\0\7\3513>\2755\360[\372\317\337DZ\36\324\300o\235\377\247\367\177%\37\226\352\377\256\351"...,
32776) = 32776
ioctl(1, TIOCGPGRP, [16776])            = 0
write(1, "\r      4,292,608  30%   27.07kB/"..., 46) = 46
select(6, [5], [4], [5], {60, 0})       = 1 (out [4], left {59, 999998})
write(4,
"\4\200\0\7\270\224\277\24\31\247f\32\233x\t\276l\f-\254\r\246\324\360\30\235\350\6\34\304\230\242"...,
32776) = 32776
select(6, [5], [4], [5], {60, 0})       = 1 (out [4], left {59, 999998})
write(4,
"\4\200\0\7\346_\363\36\33\320}Dd\5_\327\250\237i\242?B\276e\245\202Z\213\301[\25S"...,
32776) = 32776
select(6, [5], [4], [5], {60, 0})       = 1 (out [4], left {59, 999998})
write(4,
"\4\200\0\7\330\303\221\357\225\37h\373\366X\306L\f>\234\\%n\253\266\5\372c\257>V\366\255"...,
32776) = 32776
select(6, [5], [4], [5], {60, 0})       = 1 (out [4], left {59, 999998})
write(4,
"\4\200\0\7i\301\17u\224{/O\213\330\33\317\272\246\221\22\261|w\244\5\307|\21\373\v\356k"...,
32776) = 32776
select(6, [5], [4], [5], {60, 0})       = 1 (out [4], left {59, 999998})
write(4,
"\4\200\0\7\270\277\233\206n\304:\362_\213~\356bm\5\350\337\26\203\225\332\277\372\275\247<\307\22"...,
32776) = 32776
read(3, "\316\214\260\341:\263P\214\373n\313\10\333
}\323\364Q\353\r\232d\204\257\\Q\306/\277\253/\356"..., 262144) = 262144
select(6, [5], [4], [5], {60, 0})       = 1 (out [4], left {59, 999998})
write(4,
"\4\200\0\7\314\233\274S08\330\276\226\267\233\360rp\210x)\320\0314\223\323\3335Y\312\313\307"...,
32776) = 32776
select(6, [5], [4], [5], {60, 0})       = 1 (out [4], left {59, 999998})
write(4, "\4\200\0\7\316\214\260\341:\263P\214\373n\313\10\333
}\323\364Q\353\r\232d\204\257\\Q\306/"..., 32776) = 32776
select(6, [5], [4], [5], {60, 0}^CProcess 16776 detached
 <detached ...>
citadel:/home/archon810 # strace -p 16777
Process 16777 attached
select(4, [3], [], [3], {38, 210908}^CProcess 16777 detached
 <detached ...>
citadel:/home/archon810 # strace -p 16776
Process 16776 attached
select(6, [5], [4], [5], {48, 295874}^CProcess 16776 detached
 <detached ...>
citadel:/home/archon810 # strace -p 16778
Process 16778 attached
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999996})
read(0,
"\0\200\0\0\4\200\0\7\3508\343\204\207\255\4\212y\230&&\372\30*\322\f\325v\335\230
\16v"..., 32768) = 32768
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999998})
read(0, "\373\30\2\2667\371\207)", 8)   = 8
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
read(0,
"\0\200\0\0\4\200\0\7\6\213\2223\233\36-\350,\303\0\234\7`\317\276H\353u\217\275\316\333@"...,
32768) = 32768
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
read(0, "\375\33\367_\357\330\362\222", 8) = 8
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
read(0,
"\0\200\0\0\4\200\0\7`Nv\355\275\336wzQ\365\264\364\20AX\365DG\372\311\216\212\375\276"...,
32768) = 32768
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
read(0, "\374)\300\264}\21\226s", 8)    = 8
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
read(0, "\0\200\0\0\4\200\0\7\10:\v\342O\305\374\5:Y+
\250\315\24\202J-@\256WC\320\371"...,
32768) = 32768
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
read(0, "\3023\24O\343y\312\204", 8)    = 8
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
read(0,
"\0\200\0\0\4\200\0\7\27\22^\n/S.\215\362T\f\257Q\207z\241~B\3\32\32344\17"...,
32768) = 32768
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999998})
read(0, "\367P\222\262\224\17\25\250", 8) = 8
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
read(0,
"\0\200\0\0\4\200\0\7FujR\213\372\341E\232\360\n\257\323\233>\364\245\37\3\31\314\20\206\362"...,
32768) = 32768
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
read(0, "\203o\300\341\37\340(8", 8)    = 8
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999998})
read(0,
"\0\200\0\0\4\200\0\7n\211\357\301\217\210\23\341$\342d8\25N\2035[\260\1\206B\206!\2"...,
32768) = 32768
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
read(0, "|\222\223\336\201w\325\356", 8) = 8
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999999})
read(0,
"\0\200\0\0\4\200\0\7\220\216Y\343\362\366\231\372?\334N^\303\35\374cC;\vtx\231<w"...,
32768) = 32768
select(1, [0], [], [0], {60, 0})        = 1 (in [0], left {59, 999998})
read(0, ";k\v\314\21\375\3\274", 8)     = 8
write(3, "\3508\343\204\207\255\4\212y\230&&\372\30*\322\f\325v\335\230
\16v\213O//\332\4\24\24"..., 262144^C

I'm really not sure what to make of this. In the time it took me to write the
above, the file still hasn't finished copying:

2014/11/545b06baa3d98/com.google.android.apps.inputmethod.zhuyin-2.1.0.79226761-armeabi-v7a-175-minAPI14.apk
     10,321,920  73%   33.31kB/s    0:01:53
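
If it would help pinpoint where the time is going, next time I can re-attach
strace with timestamps and per-syscall durations, something like this
(assuming 16776 is still the process doing the writes; the output file name is
arbitrary):

citadel:/home/archon810 # strace -tt -T -p 16776 -o /tmp/rsync-writer.strace

The -T flag should show how long each read/write actually blocks, which would
hopefully make it clearer whether the time is being spent on the gluster mount
or elsewhere.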


Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
<https://plus.google.com/+ArtemRussakovskii> | @ArtemR
<http://twitter.com/ArtemR>

On Fri, Apr 6, 2018 at 1:12 AM, Artem Russakovskii <archon810 at gmail.com>
wrote:

> Hi again,
>
> I'd like to expand on the performance issues and plead for help. Here's
> one case which shows these odd hiccups: https://i.imgur.com/CXBPjTK.gifv.
>
> In this GIF, I switch back and forth between copy operations on 2 servers;
> on both, I'm copying a 10GB dir full of .apk and image files.
>
> On server "hive" I'm copying straight from the main disk to an attached
> block volume (xfs). As you can see, the transfers are relatively speedy and
> don't hiccup.
> On server "citadel" I'm copying the same set of data to the 4-way replicated
> Gluster volume, which uses the same kind of block storage for its brick. As
> you can see, performance is much worse, and there are frequent pauses of
> many seconds where nothing seems to be happening - it just freezes.
>
> All 4 servers have the same specs, and all of them have these performance
> issues with Gluster but no such issues when the raw xfs block storage is
> used directly.
>
> hive finished copying the data long ago, while citadel is barely chugging
> along and will probably take another half an hour to an hour. I have over
> 1TB of data to migrate, and once we go live, I'm not even sure Gluster will
> be able to keep up rather than bring the machines and services down.
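>
> For what it's worth, I could also run a crude sequential-write comparison on
> both servers to take rsync out of the equation, something like this (ddtest
> is just a scratch file name I made up):
>
> # straight to the attached xfs block volume
> dd if=/dev/zero of=/mnt/citadel_block1/ddtest bs=1M count=1024 conv=fsync
> # through the gluster fuse mount
> dd if=/dev/zero of=/mnt/apkmirror_data1/ddtest bs=1M count=1024 conv=fsync
>
> That only measures large sequential writes, though, and my real workload is
> millions of small files, so it'd only be a rough sanity check.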
>
>
>
> Here's the volume config, though performance didn't seem to differ before
> vs. after I applied these customizations.
>
> Volume Name: apkmirror_data1
> Type: Replicate
> Volume ID: 11ecee7e-d4f8-497a-9994-ceb144d6841e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: nexus2:/mnt/nexus2_block1/apkmirror_data1
> Brick2: forge:/mnt/forge_block1/apkmirror_data1
> Brick3: hive:/mnt/hive_block1/apkmirror_data1
> Brick4: citadel:/mnt/citadel_block1/apkmirror_data1
> Options Reconfigured:
> cluster.quorum-count: 1
> cluster.quorum-type: fixed
> network.ping-timeout: 5
> network.remote-dio: enable
> performance.rda-cache-limit: 256MB
> performance.readdir-ahead: on
> performance.parallel-readdir: on
> network.inode-lru-limit: 500000
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> cluster.readdir-optimize: on
> performance.io-thread-count: 32
> server.event-threads: 4
> client.event-threads: 4
> performance.read-ahead: off
> cluster.lookup-optimize: on
> performance.cache-size: 1GB
> cluster.self-heal-daemon: enable
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: on
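>
> If it would help, I can also capture per-operation latency from gluster
> itself while the copy runs; as far as I understand, something like this
> should do it:
>
> gluster volume profile apkmirror_data1 start
> # ... let the rsync run for a few minutes ...
> gluster volume profile apkmirror_data1 info
> gluster volume profile apkmirror_data1 stop
>
> and the info output should show which FOPs (LOOKUP, WRITE, FSYNC, etc.) are
> eating the time on each brick.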
>
>
> The mounts are done as follows in /etc/fstab:
> /dev/disk/by-id/scsi-0Linode_Volume_citadel_block1 /mnt/citadel_block1 xfs defaults 0 2
> localhost:/apkmirror_data1 /mnt/apkmirror_data1 glusterfs defaults,_netdev 0 0
>
> I'm really not sure if direct-io-mode mount tweaks would do anything here,
> what the value should be set to, and what it is by default.
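>
> If it's worth experimenting with, I assume the fstab entry would look
> something like this (with direct-io-mode=enable or disable):
>
> localhost:/apkmirror_data1 /mnt/apkmirror_data1 glusterfs defaults,_netdev,direct-io-mode=disable 0 0
>
> but I'd appreciate confirmation on whether that's the right knob here and
> what the default behavior actually is.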
>
> The OS is OpenSUSE 42.3, 64-bit. 80GB of RAM, 20 CPUs, hosted by Linode.
>
> I'd really appreciate any help in the matter.
>
> Thank you.
>
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
> <http://www.apkmirror.com/>, Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
> <http://twitter.com/ArtemR>
>
> On Thu, Apr 5, 2018 at 11:13 PM, Artem Russakovskii <archon810 at gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm trying to squeeze performance out of Gluster on four 80GB-RAM, 20-CPU
>> machines, where Gluster runs on attached Linode block storage as 4
>> replicated bricks, and so far everything I've tried has resulted in
>> sub-optimal performance.
>>
>> There are many files - mostly images, several million of them - and many
>> operations take minutes. Copying multiple files (even small ones) suddenly
>> freezes up for seconds at a time, then continues; iostat frequently shows
>> large r_await and w_await values with 100% utilization on the attached
>> block device; and so on.
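>>
>> (For reference, those numbers come from watching the device with something
>> like:
>>
>> iostat -xm 2
>>
>> which is where I see the large r_await/w_await values and the 100% util.)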
>>
>> Anyway, there are many guides out there for small-file performance
>> improvements, but more explanation is needed, and I think more tweaks
>> should be possible.
>>
>> My question today is about performance.cache-size. Is this the size of a
>> cache held in RAM? If so, how do I view the current cache usage to see
>> whether it's getting full and I should increase the limit? Is it advisable
>> to bump it up if I have many tens of gigs of RAM free?
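>>
>> From what I can tell, I can at least read back the configured value with:
>>
>> gluster volume get apkmirror_data1 performance.cache-size
>>
>> and I believe a statedump (gluster volume statedump apkmirror_data1, or
>> kill -USR1 on the fuse client's glusterfs process, with the dump files
>> landing under /var/run/gluster/) is supposed to include io-cache
>> statistics, but I'm not sure that's the right way to check how full the
>> cache actually gets - hence the question.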
>>
>>
>>
>> More generally, in the 2 months since I first started working with Gluster
>> and put a production system live, I've been feeling frustrated, because
>> Gluster has a lot of poorly-documented and confusing options. I really wish
>> the documentation could be improved with examples and better explanations.
>>
>> Specifically, it'd be absolutely amazing if the docs offered a strategy
>> for setting each value and ways of determining more optimal values. For
>> example, for performance.cache-size, if they said something like "run
>> command abc to see your current cache size, and if it's hurting, raise it,
>> but be aware that it's limited by RAM," that would already be a huge
>> improvement to the docs. And so on with the other options.
>>
>>
>>
>> The Gluster team is quite helpful on this mailing list, but in a reactive
>> rather than proactive way. Perhaps it's the tunnel vision that sets in
>> after working on a project for so long, where less-technical explanations
>> and even proper documentation of the options take a back seat, but I
>> encourage you to be more proactive about helping us understand and optimize
>> Gluster.
>>
>> Thank you.
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
>> <http://www.apkmirror.com/>, Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>> <https://plus.google.com/+ArtemRussakovskii> | @ArtemR
>> <http://twitter.com/ArtemR>
>>
>
>