[Gluster-users] kernel parameters for improving gluster writes on millions of small writes (long)
Harry Mangalam
hjmangalam at gmail.com
Thu Jul 26 15:12:11 UTC 2012
I had not, though I had searched for something like this for a good
while yesterday (?!). Back to Google class for me...
Thanks very much!
hjm
On Thu, Jul 26, 2012 at 8:07 AM, John Mark Walker <johnmark at redhat.com> wrote:
> Harry,
>
> Have you seen this post?
>
> http://community.gluster.org/a/linux-kernel-tuning-for-glusterfs/
>
>
> Be sure to read all the comments; Ben England chimes in there, and he's one of the performance engineers at Red Hat.
>
> -JM
>
>
> ----- Harry Mangalam <hjmangalam at gmail.com> wrote:
>> This is a continuation of my previous posts about improving write
>> performance when trapping millions of small writes to a gluster
>> filesystem. Previously, I was able to improve write performance by
>> ~30x by running STDOUT through gzip to consolidate and reduce the
>> output stream.
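>>
>> (That pattern was simply a pipe, with the names here hypothetical:
>>
>>   the_app --input genome.fa | gzip -c > reads.out.gz
>>
>> so gluster sees a few large sequential writes from gzip instead of
>> millions of tiny ones.)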
>>
>> Today brought another, similar problem with yet another
>> bioinformatics program. (These programs typically handle the 'short
>> reads' that come out of most current sequencing hardware, each read
>> being 30-150 characters plus some metadata, stored in an ASCII file
>> containing millions of such entries.) Reading these files doesn't
>> seem to be a problem (at least on our systems), but writing them is
>> quite awful.
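>>
>> (As an illustration, a typical record in such a file, FASTQ-style,
>> looks roughly like:
>>
>>   @read_0001 some metadata
>>   GATTACAGATTACAGATTACAGATTACAGA
>>   +
>>   IIIIIIIIIIIIIIIIIIIIIIIIIIIIII
>>
>> i.e. each entry is only a few hundred bytes, and the apps tend to
>> write them out one record at a time.)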
>>
>> The program is called 'art_illumina' from the Broad Inst's 'ALLPATHS'
>> suite and it generates an artificial Illumina data set from an input
>> genome. In this case about 5GB of the type of data described above.
>> Like before, the gluster client process goes to >100% CPU and the
>> program itself slows to ~20-30% of a CPU. In this case the app's
>> output cannot be externally trapped by redirecting through gzip,
>> since the output flag specifies only the base filename for two files
>> that are created internally and then written directly. That also
>> prevents setting up a named pipe to trap and process the output.
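>>
>> (For a single output stream, the usual named-pipe trick would be
>> something like this, names again hypothetical:
>>
>>   mkfifo reads.pipe
>>   gzip -c < reads.pipe > reads.gz &
>>   the_app -o reads.pipe
>>
>> but since art_illumina takes only a base filename and opens both
>> output files itself, there is no single stream to point a fifo at.)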
>>
>> Since this gluster storage was set up specifically for
>> bioinformatics, this is a recurring problem, and while some of these
>> issues can be dealt with by trapping and converting output, it would
>> be VERY NICE if we could deal with it at the OS level.
>>
>> The gluster volume is running over IPoIB on QDR IB and looks like this:
>> Volume Name: gl
>> Type: Distribute
>> Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
>> Status: Started
>> Number of Bricks: 8
>> Transport-type: tcp,rdma
>> Bricks:
>> Brick1: bs2:/raid1
>> Brick2: bs2:/raid2
>> Brick3: bs3:/raid1
>> Brick4: bs3:/raid2
>> Brick5: bs4:/raid1
>> Brick6: bs4:/raid2
>> Brick7: bs1:/raid1
>> Brick8: bs1:/raid2
>> Options Reconfigured:
>> performance.write-behind-window-size: 1024MB
>> performance.flush-behind: on
>> performance.cache-size: 268435456
>> nfs.disable: on
>> performance.io-cache: on
>> performance.quick-read: on
>> performance.io-thread-count: 64
>> auth.allow: 10.2.*.*,10.1.*.*
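>>
>> (These were set with the usual CLI, e.g.:
>>
>>   gluster volume set gl performance.write-behind-window-size 1024MB
>>
>> and similarly for the rest.)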
>>
>> I've tried to increase every caching option that might improve this
>> kind of performance, but it doesn't seem to help. At this point, I'm
>> wondering whether changing the client (or server) kernel parameters
>> will help.
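>>
>> (The obvious candidates would seem to be the vm dirty-page knobs on
>> the client, e.g. something like the following, with the values only
>> illustrative, not tested:
>>
>>   # flush dirty pages sooner, and cap how many can accumulate
>>   sysctl -w vm.dirty_background_ratio=2
>>   sysctl -w vm.dirty_ratio=10
>>
>> though I don't know what's sensible on a ~512GB-RAM client.)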
>>
>> The client's meminfo is:
>> $ cat /proc/meminfo
>> MemTotal: 529425924 kB
>> MemFree: 241833188 kB
>> Buffers: 355248 kB
>> Cached: 279699444 kB
>> SwapCached: 0 kB
>> Active: 2241580 kB
>> Inactive: 278287248 kB
>> Active(anon): 190988 kB
>> Inactive(anon): 287952 kB
>> Active(file): 2050592 kB
>> Inactive(file): 277999296 kB
>> Unevictable: 16856 kB
>> Mlocked: 16856 kB
>> SwapTotal: 563198732 kB
>> SwapFree: 563198732 kB
>> Dirty: 1656 kB
>> Writeback: 0 kB
>> AnonPages: 486876 kB
>> Mapped: 19808 kB
>> Shmem: 164 kB
>> Slab: 1475476 kB
>> SReclaimable: 1205944 kB
>> SUnreclaim: 269532 kB
>> KernelStack: 5928 kB
>> PageTables: 27312 kB
>> NFS_Unstable: 0 kB
>> Bounce: 0 kB
>> WritebackTmp: 0 kB
>> CommitLimit: 827911692 kB
>> Committed_AS: 536852 kB
>> VmallocTotal: 34359738367 kB
>> VmallocUsed: 1227732 kB
>> VmallocChunk: 33888774404 kB
>> HardwareCorrupted: 0 kB
>> AnonHugePages: 376832 kB
>> HugePages_Total: 0
>> HugePages_Free: 0
>> HugePages_Rsvd: 0
>> HugePages_Surp: 0
>> Hugepagesize: 2048 kB
>> DirectMap4k: 201088 kB
>> DirectMap2M: 15509504 kB
>> DirectMap1G: 521142272 kB
>>
>> and the server's meminfo is:
>>
>> $ cat /proc/meminfo
>> MemTotal: 32861400 kB
>> MemFree: 1232172 kB
>> Buffers: 29116 kB
>> Cached: 30017272 kB
>> SwapCached: 44 kB
>> Active: 18840852 kB
>> Inactive: 11772428 kB
>> Active(anon): 492928 kB
>> Inactive(anon): 75264 kB
>> Active(file): 18347924 kB
>> Inactive(file): 11697164 kB
>> Unevictable: 0 kB
>> Mlocked: 0 kB
>> SwapTotal: 16382900 kB
>> SwapFree: 16382680 kB
>> Dirty: 8 kB
>> Writeback: 0 kB
>> AnonPages: 566876 kB
>> Mapped: 14212 kB
>> Shmem: 1276 kB
>> Slab: 429164 kB
>> SReclaimable: 324752 kB
>> SUnreclaim: 104412 kB
>> KernelStack: 3528 kB
>> PageTables: 16956 kB
>> NFS_Unstable: 0 kB
>> Bounce: 0 kB
>> WritebackTmp: 0 kB
>> CommitLimit: 32813600 kB
>> Committed_AS: 3053096 kB
>> VmallocTotal: 34359738367 kB
>> VmallocUsed: 340196 kB
>> VmallocChunk: 34342345980 kB
>> HardwareCorrupted: 0 kB
>> AnonHugePages: 200704 kB
>> HugePages_Total: 0
>> HugePages_Free: 0
>> HugePages_Rsvd: 0
>> HugePages_Surp: 0
>> Hugepagesize: 2048 kB
>> DirectMap4k: 6656 kB
>> DirectMap2M: 2072576 kB
>> DirectMap1G: 31457280 kB
>>
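>> (While the job runs, the dirty/writeback counters can be watched
>> with:
>>
>>   watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'
>>
>> and Dirty stays tiny on the client, which suggests the small writes
>> are being flushed through almost immediately rather than batched.)
>>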
>> Does this suggest any approach? Is there a doc that suggests optimal
>> kernel parameters for gluster?
>>
>> I guess the only other option is to mount the glusterfs volume via
>> NFS and use the NFS client's caching...? That would help a single
>> process but would decrease the overall cluster bandwidth
>> considerably.
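>>
>> (That would mean re-enabling NFS on the volume, since we run with
>> nfs.disable on, and then mounting gluster's built-in NFS server,
>> e.g., with the server and mountpoint here illustrative:
>>
>>   gluster volume set gl nfs.disable off
>>   mount -t nfs -o vers=3,proto=tcp bs1:/gl /mnt/gl
>>
>> The gluster NFS server only speaks NFSv3 over TCP, hence those
>> options.)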
>>
>> --
>> Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
>> [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
>> 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
>> MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)