[Gluster-users] kernel parameters for improving gluster writes on millions of small writes (long)
John Mark Walker
johnmark at redhat.com
Thu Jul 26 15:07:19 UTC 2012
Harry,
Have you seen this post?
http://community.gluster.org/a/linux-kernel-tuning-for-glusterfs/
Be sure to read all the comments; Ben England, one of the performance engineers at Red Hat, chimes in there.
-JM
----- Harry Mangalam <hjmangalam at gmail.com> wrote:
> This is a continuation of my previous posts about improving write
> performance when trapping millions of small writes to a gluster
> filesystem. I was able to improve write performance by ~30x by running
> STDOUT through gzip to consolidate and reduce the output stream.
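>
> (Roughly what that workaround looks like; 'some_tool' and the paths
> here are placeholders, not the actual program:)
>
>   # consolidate many small writes into one compressed stream before it
>   # hits the gluster mount (hypothetical tool name and paths)
>   some_tool --input genome.fa | gzip -c > /gl/results/output.txt.gz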
>
> Today I hit another, similar problem with yet another bioinformatics
> program. (These programs typically handle the 'short reads' that come
> out of most sequencing hardware, each read being 30-150 characters,
> plus some metadata, stored in ASCII files containing millions of such
> entries.) Reading them doesn't seem to be a problem (at least on our
> systems), but writing them is quite awful.
>
> The program is called 'art_illumina', from the Broad Institute's
> 'ALLPATHS' suite; it generates an artificial Illumina data set from an
> input genome, in this case about 5 GB of the type of data described
> above.
> Like before, the gluster process goes to >100% CPU and the program
> itself slows to ~20-30% of a CPU. In this case, the app's output
> cannot be externally trapped by redirecting through gzip, since the
> output flag specifies only the base filename for 2 files that are
> created internally and then written directly. That prevents even
> setting up a named pipe to trap and process the output.
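>
> (For tools that write a single output file whose name you control, the
> named-pipe trick would look something like the sketch below, with a
> hypothetical tool name and paths; it just doesn't apply here, since
> art_illumina derives both output names internally from the base name.)
>
>   # hypothetical sketch of the named-pipe approach (not usable here)
>   mkfifo /tmp/out.fifo
>   # reader: compress whatever gets written to the fifo onto gluster
>   gzip -c < /tmp/out.fifo > /gl/results/output.txt.gz &
>   # writer: point the tool's single output file at the fifo
>   some_tool --input genome.fa --output /tmp/out.fifo
>   wait
>   rm /tmp/out.fifo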
>
> Since this gluster storage was set up specifically for bioinformatics,
> this is a recurring problem. While some of the issues can be dealt
> with by trapping and converting output, it would be VERY NICE if we
> could deal with it at the OS level.
>
> The gluster volume is running over IPoIB on QDR IB and looks like this:
> Volume Name: gl
> Type: Distribute
> Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
> Status: Started
> Number of Bricks: 8
> Transport-type: tcp,rdma
> Bricks:
> Brick1: bs2:/raid1
> Brick2: bs2:/raid2
> Brick3: bs3:/raid1
> Brick4: bs3:/raid2
> Brick5: bs4:/raid1
> Brick6: bs4:/raid2
> Brick7: bs1:/raid1
> Brick8: bs1:/raid2
> Options Reconfigured:
> performance.write-behind-window-size: 1024MB
> performance.flush-behind: on
> performance.cache-size: 268435456
> nfs.disable: on
> performance.io-cache: on
> performance.quick-read: on
> performance.io-thread-count: 64
> auth.allow: 10.2.*.*,10.1.*.*
>
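> (For anyone wanting to reproduce or adjust these settings: tunables
> like the above are applied to a live volume with 'gluster volume set',
> e.g.:)
>
>   gluster volume set gl performance.write-behind-window-size 1024MB
>   gluster volume set gl performance.flush-behind on
>   gluster volume set gl performance.io-thread-count 64
>   gluster volume info gl   # changes show up under "Options Reconfigured"
>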
> I've tried increasing every caching option that might improve this
> kind of performance, but none of it seems to make a difference. At
> this point, I'm wondering whether changing the client (or server)
> kernel parameters will help.
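>
> (The kind of client-side knobs I have in mind are the VM writeback
> sysctls; a rough sketch of what I'd experiment with, with illustrative
> values rather than tested recommendations:)
>
>   # let the client accumulate more dirty pages before forcing writeback
>   sysctl -w vm.dirty_background_ratio=10
>   sysctl -w vm.dirty_ratio=40
>   # hold dirty data a bit longer before it is considered "old"
>   sysctl -w vm.dirty_expire_centisecs=3000
>   # persist by putting the same keys in /etc/sysctl.conf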
>
> The client's meminfo is:
> cat /proc/meminfo
> MemTotal: 529425924 kB
> MemFree: 241833188 kB
> Buffers: 355248 kB
> Cached: 279699444 kB
> SwapCached: 0 kB
> Active: 2241580 kB
> Inactive: 278287248 kB
> Active(anon): 190988 kB
> Inactive(anon): 287952 kB
> Active(file): 2050592 kB
> Inactive(file): 277999296 kB
> Unevictable: 16856 kB
> Mlocked: 16856 kB
> SwapTotal: 563198732 kB
> SwapFree: 563198732 kB
> Dirty: 1656 kB
> Writeback: 0 kB
> AnonPages: 486876 kB
> Mapped: 19808 kB
> Shmem: 164 kB
> Slab: 1475476 kB
> SReclaimable: 1205944 kB
> SUnreclaim: 269532 kB
> KernelStack: 5928 kB
> PageTables: 27312 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 827911692 kB
> Committed_AS: 536852 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 1227732 kB
> VmallocChunk: 33888774404 kB
> HardwareCorrupted: 0 kB
> AnonHugePages: 376832 kB
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> DirectMap4k: 201088 kB
> DirectMap2M: 15509504 kB
> DirectMap1G: 521142272 kB
>
> and the server's meminfo is:
>
> $ cat /proc/meminfo
> MemTotal: 32861400 kB
> MemFree: 1232172 kB
> Buffers: 29116 kB
> Cached: 30017272 kB
> SwapCached: 44 kB
> Active: 18840852 kB
> Inactive: 11772428 kB
> Active(anon): 492928 kB
> Inactive(anon): 75264 kB
> Active(file): 18347924 kB
> Inactive(file): 11697164 kB
> Unevictable: 0 kB
> Mlocked: 0 kB
> SwapTotal: 16382900 kB
> SwapFree: 16382680 kB
> Dirty: 8 kB
> Writeback: 0 kB
> AnonPages: 566876 kB
> Mapped: 14212 kB
> Shmem: 1276 kB
> Slab: 429164 kB
> SReclaimable: 324752 kB
> SUnreclaim: 104412 kB
> KernelStack: 3528 kB
> PageTables: 16956 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 32813600 kB
> Committed_AS: 3053096 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 340196 kB
> VmallocChunk: 34342345980 kB
> HardwareCorrupted: 0 kB
> AnonHugePages: 200704 kB
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> DirectMap4k: 6656 kB
> DirectMap2M: 2072576 kB
> DirectMap1G: 31457280 kB
>
> Does this suggest any approach? Is there a doc that suggests optimal
> kernel parameters for gluster?
>
> I guess the only other option is to mount the glusterfs via NFS and
> rely on the NFS client's caching? That would help a single process
> but decrease the overall cluster bandwidth considerably.
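>
> (Roughly what I mean; note that nfs.disable is currently 'on' for this
> volume, so gluster's built-in NFS server (NFSv3 over TCP) would have to
> be re-enabled first. The hostname and mountpoint are just examples:)
>
>   gluster volume set gl nfs.disable off
>   # mount on a client so the kernel NFS client can cache and coalesce writes
>   mount -t nfs -o vers=3,proto=tcp bs1:/gl /mnt/gl-nfs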
>
> --
> Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
> [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
> 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
> MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)