[Gluster-users] kernel parameters for improving gluster writes on millions of small writes (long)
John Mark Walker
johnmark at redhat.com
Thu Jul 26 15:07:19 UTC 2012
Harry,
Have you seen this post?
http://community.gluster.org/a/linux-kernel-tuning-for-glusterfs/
Be sure to read all the comments; Ben England, one of the performance engineers at Red Hat, chimes in there.
-JM
----- Harry Mangalam <hjmangalam at gmail.com> wrote:
> This is a continuation of my previous posts about improving write
> performance when trapping millions of small writes to a gluster
> filesystem. I was able to improve write performance by ~30x by running
> STDOUT through gzip to consolidate and reduce the output stream.
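>
> (Roughly what that workaround looks like; 'some_tool' and the paths
> here are placeholders, not the actual program:)
>
>   # consolidate many small writes into one compressed stream before it
>   # hits the gluster mount (hypothetical tool name and paths)
>   some_tool --input genome.fa | gzip -c > /gl/results/output.txt.gz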
>
> Today I hit another, similar problem with yet another bioinformatics
> program. (These programs typically handle the 'short reads' that come
> out of most sequencing hardware, each read being 30-150 characters,
> plus some metadata, stored in ASCII files containing millions of such
> entries.) Reading them doesn't seem to be a problem (at least on our
> systems), but writing them is quite awful.
>
> The program is called 'art_illumina', from the Broad Institute's
> 'ALLPATHS' suite; it generates an artificial Illumina data set from an
> input genome, in this case about 5 GB of the type of data described
> above.
> Like before, the gluster process goes to >100% CPU and the program
> itself slows to ~20-30% of a CPU. In this case, the app's output
> cannot be externally trapped by redirecting through gzip, since the
> output flag specifies only the base filename for 2 files that are
> created internally and then written directly. That prevents even
> setting up a named pipe to trap and process the output.
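>
> (For tools that write a single output file whose name you control, the
> named-pipe trick would look something like the sketch below, with a
> hypothetical tool name and paths; it just doesn't apply here, since
> art_illumina derives both output names internally from the base name.)
>
>   # hypothetical sketch of the named-pipe approach (not usable here)
>   mkfifo /tmp/out.fifo
>   # reader: compress whatever gets written to the fifo onto gluster
>   gzip -c < /tmp/out.fifo > /gl/results/output.txt.gz &
>   # writer: point the tool's single output file at the fifo
>   some_tool --input genome.fa --output /tmp/out.fifo
>   wait
>   rm /tmp/out.fifo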
>
> Since this gluster storage was set up specifically for bioinformatics,
> this is a recurring problem. While some of the issues can be dealt
> with by trapping and converting output, it would be VERY NICE if we
> could deal with it at the OS level.
>
> The gluster volume is running over IPoIB on QDR IB and looks like this:
> Volume Name: gl
> Type: Distribute
> Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
> Status: Started
> Number of Bricks: 8
> Transport-type: tcp,rdma
> Bricks:
> Brick1: bs2:/raid1
> Brick2: bs2:/raid2
> Brick3: bs3:/raid1
> Brick4: bs3:/raid2
> Brick5: bs4:/raid1
> Brick6: bs4:/raid2
> Brick7: bs1:/raid1
> Brick8: bs1:/raid2
> Options Reconfigured:
> performance.write-behind-window-size: 1024MB
> performance.flush-behind: on
> performance.cache-size: 268435456
> nfs.disable: on
> performance.io-cache: on
> performance.quick-read: on
> performance.io-thread-count: 64
> auth.allow: 10.2.*.*,10.1.*.*
>
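> (For anyone wanting to reproduce or adjust these settings: tunables
> like the above are applied to a live volume with 'gluster volume set',
> e.g.:)
>
>   gluster volume set gl performance.write-behind-window-size 1024MB
>   gluster volume set gl performance.flush-behind on
>   gluster volume set gl performance.io-thread-count 64
>   gluster volume info gl   # changes show up under "Options Reconfigured"
>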
> I've tried increasing every caching option that might improve this
> kind of performance, but none of it seems to make a difference. At
> this point, I'm wondering whether changing the client (or server)
> kernel parameters will help.
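>
> (The kind of client-side knobs I have in mind are the VM writeback
> sysctls; a rough sketch of what I'd experiment with, with illustrative
> values rather than tested recommendations:)
>
>   # let the client accumulate more dirty pages before forcing writeback
>   sysctl -w vm.dirty_background_ratio=10
>   sysctl -w vm.dirty_ratio=40
>   # hold dirty data a bit longer before it is considered "old"
>   sysctl -w vm.dirty_expire_centisecs=3000
>   # persist by putting the same keys in /etc/sysctl.conf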
>
> The client's meminfo is:
> cat /proc/meminfo
> MemTotal: 529425924 kB
> MemFree: 241833188 kB
> Buffers: 355248 kB
> Cached: 279699444 kB
> SwapCached: 0 kB
> Active: 2241580 kB
> Inactive: 278287248 kB
> Active(anon): 190988 kB
> Inactive(anon): 287952 kB
> Active(file): 2050592 kB
> Inactive(file): 277999296 kB
> Unevictable: 16856 kB
> Mlocked: 16856 kB
> SwapTotal: 563198732 kB
> SwapFree: 563198732 kB
> Dirty: 1656 kB
> Writeback: 0 kB
> AnonPages: 486876 kB
> Mapped: 19808 kB
> Shmem: 164 kB
> Slab: 1475476 kB
> SReclaimable: 1205944 kB
> SUnreclaim: 269532 kB
> KernelStack: 5928 kB
> PageTables: 27312 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 827911692 kB
> Committed_AS: 536852 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 1227732 kB
> VmallocChunk: 33888774404 kB
> HardwareCorrupted: 0 kB
> AnonHugePages: 376832 kB
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> DirectMap4k: 201088 kB
> DirectMap2M: 15509504 kB
> DirectMap1G: 521142272 kB
>
> and the server's meminfo is:
>
> $ cat /proc/meminfo
> MemTotal: 32861400 kB
> MemFree: 1232172 kB
> Buffers: 29116 kB
> Cached: 30017272 kB
> SwapCached: 44 kB
> Active: 18840852 kB
> Inactive: 11772428 kB
> Active(anon): 492928 kB
> Inactive(anon): 75264 kB
> Active(file): 18347924 kB
> Inactive(file): 11697164 kB
> Unevictable: 0 kB
> Mlocked: 0 kB
> SwapTotal: 16382900 kB
> SwapFree: 16382680 kB
> Dirty: 8 kB
> Writeback: 0 kB
> AnonPages: 566876 kB
> Mapped: 14212 kB
> Shmem: 1276 kB
> Slab: 429164 kB
> SReclaimable: 324752 kB
> SUnreclaim: 104412 kB
> KernelStack: 3528 kB
> PageTables: 16956 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 32813600 kB
> Committed_AS: 3053096 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 340196 kB
> VmallocChunk: 34342345980 kB
> HardwareCorrupted: 0 kB
> AnonHugePages: 200704 kB
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> DirectMap4k: 6656 kB
> DirectMap2M: 2072576 kB
> DirectMap1G: 31457280 kB
>
> Does this suggest any approach? Is there a doc that suggests optimal
> kernel parameters for gluster?
>
> I guess the only other option is to mount the glusterfs via NFS and
> rely on the NFS client's caching? That would help a single process
> but decrease the overall cluster bandwidth considerably.
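>
> (Roughly what I mean; note that nfs.disable is currently 'on' for this
> volume, so gluster's built-in NFS server (NFSv3 over TCP) would have to
> be re-enabled first. The hostname and mountpoint are just examples:)
>
>   gluster volume set gl nfs.disable off
>   # mount on a client so the kernel NFS client can cache and coalesce writes
>   mount -t nfs -o vers=3,proto=tcp bs1:/gl /mnt/gl-nfs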
>
> --
> Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
> [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
> 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
> MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)