[Gluster-users] GlusterFS OOM Issue

Steven King sking at kingrst.com
Mon Feb 11 18:04:46 UTC 2013


Thanks Brian,

Our system has a Supermicro X8-series motherboard with a 4-drive hardware 
RAID 10. The CPU is a quad-core Intel Xeon X3440 @ 2.53GHz.

At the time the system had 4GB RAM and about 3GB swap. We have since 
upgraded the RAM to 16GB and swap is the same.

Current memory usage:
  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 1488 root     20   0 8768m 8.5g 2032 R    2 53.9 181:15.04 glusterfs
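
(For reference, here is roughly how I've been sampling it, per your 
suggestion to watch VIRT and RES over time. A minimal sketch; the PID 
1488 is the current client process from top above, and the log path is 
just what I happen to use:

  while true; do
      # append elapsed time, virtual size (VSZ) and resident set size (RSS), in kB
      ps -o etime=,vsz=,rss= -p 1488 >> /var/tmp/glusterfs-mem.log
      sleep 60
  done

The RSS column is the one that keeps growing, and VIRT tracks it fairly 
closely, as the 8768m/8.5g figures above show.)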

Over time the process consumes all of the available memory on the 
system, quickly at first, and then more slowly begins to eat up swap.

I've tried forcing the kernel to drop caches and reclaim memory; 
however, the page cache is only about 1GB, and as the free output below 
shows, roughly 14GB of the 16GB is in use by processes rather than 
cache, so there is little for a reclaim to give back.
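
(Specifically, the usual drop_caches knob, along the lines of:

  root at ifx05:~# sync
  root at ifx05:~# echo 3 > /proc/sys/vm/drop_caches

where 3 drops the page cache plus the dentry and inode caches.)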

Current output of free -m before a cache reclaim:
root at ifx05:~# free -m
             total       used       free     shared    buffers     cached
Mem:         16079      15865        213          0        374       1224
-/+ buffers/cache:      14267       1812
Swap:         3814          1       3813

Here is the output from OOM killer:

Feb 8 08:05:36 ifx05 kernel: [679946.164642] glusterfsd invoked 
oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Feb 8 08:05:36 ifx05 kernel: [679946.270816] glusterfsd cpuset=/ 
mems_allowed=0
Feb 8 08:05:36 ifx05 kernel: [679946.325069] Pid: 2070, comm: glusterfsd 
Not tainted 3.2.0-0.bpo.3-amd64 #1
Feb 8 08:05:36 ifx05 kernel: [679946.408416] Call Trace:
Feb 8 08:05:36 ifx05 kernel: [679946.438667] [<ffffffff810bf159>] ? 
dump_header+0x76/0x1a7
Feb 8 08:05:36 ifx05 kernel: [679946.505292] [<ffffffff81173f34>] ? 
security_real_capable_noaudit+0x34/0x59
Feb 8 08:05:36 ifx05 kernel: [679946.589592] [<ffffffff810bf088>] ? 
oom_unkillable_task+0x5f/0x92
Feb 8 08:05:36 ifx05 kernel: [679946.663604] [<ffffffff810bf5af>] ? 
oom_kill_process+0x52/0x28d
Feb 8 08:05:36 ifx05 kernel: [679946.735431] [<ffffffff810bfabb>] ? 
out_of_memory+0x2d1/0x337
Feb 8 08:05:36 ifx05 kernel: [679946.805178] [<ffffffff810c3cbd>] ? 
__alloc_pages_nodemask+0x5d8/0x731
Feb 8 08:05:48 ifx05 kernel: [679946.884294] [<ffffffff810ef944>] ? 
alloc_pages_current+0xa7/0xc9
Feb 8 08:05:48 ifx05 kernel: [679946.958201] [<ffffffff810be8e1>] ? 
filemap_fault+0x26d/0x35c
Feb 8 08:05:48 ifx05 kernel: [679947.027962] [<ffffffff810dad70>] ? 
__do_fault+0xc6/0x438
Feb 8 08:05:48 ifx05 kernel: [679947.093631] [<ffffffff810dbf09>] ? 
handle_pte_fault+0x352/0x965
Feb 8 08:05:48 ifx05 kernel: [679947.166503] [<ffffffff81120595>] ? 
getxattr+0xee/0x119
Feb 8 08:05:48 ifx05 kernel: [679947.230005] [<ffffffff81120595>] ? 
getxattr+0xee/0x119
Feb 8 08:05:48 ifx05 kernel: [679947.293519] [<ffffffff81368e2e>] ? 
do_page_fault+0x327/0x34c
Feb 8 08:05:48 ifx05 kernel: [679947.363275] [<ffffffff81111e9e>] ? 
user_path_at_empty+0x55/0x7d
Feb 8 08:05:48 ifx05 kernel: [679947.436234] [<ffffffff81109949>] ? 
sys_newlstat+0x24/0x2d
Feb 8 08:05:48 ifx05 kernel: [679947.502876] [<ffffffff81117bb9>] ? 
dput+0x29/0xf2
Feb 8 08:05:50 ifx05 kernel: [679947.561177] [<ffffffff81366235>] ? 
page_fault+0x25/0x30
Feb 8 08:05:50 ifx05 kernel: [679947.625726] Mem-Info:
Feb 8 08:05:50 ifx05 kernel: [679947.653900] Node 0 DMA per-cpu:
Feb 8 08:05:50 ifx05 kernel: [679947.692559] CPU 0: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679947.750881] CPU 1: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679947.809187] CPU 2: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679947.867501] CPU 3: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679947.925813] CPU 4: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679947.984128] CPU 5: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.042445] CPU 6: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.100757] CPU 7: hi: 0, btch: 1 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.159068] Node 0 DMA32 per-cpu:
Feb 8 08:05:50 ifx05 kernel: [679948.199814] CPU 0: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.258130] CPU 1: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.316447] CPU 2: hi: 186, btch: 31 
usd: 30
Feb 8 08:05:50 ifx05 kernel: [679948.374756] CPU 3: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.433069] CPU 4: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.491381] CPU 5: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.549695] CPU 6: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.608011] CPU 7: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.666323] Node 0 Normal per-cpu:
Feb 8 08:05:50 ifx05 kernel: [679948.708109] CPU 0: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.766425] CPU 1: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.824739] CPU 2: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.883052] CPU 3: hi: 186, btch: 31 
usd: 59
Feb 8 08:05:50 ifx05 kernel: [679948.941365] CPU 4: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679948.999678] CPU 5: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679949.057996] CPU 6: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679949.116306] CPU 7: hi: 186, btch: 31 usd: 0
Feb 8 08:05:50 ifx05 kernel: [679949.174626] active_anon:702601 
inactive_anon:260864 isolated_anon:55
Feb 8 08:05:50 ifx05 kernel: [679949.174628] active_file:630 
inactive_file:833 isolated_file:86
Feb 8 08:05:50 ifx05 kernel: [679949.174629] unevictable:0 dirty:0 
writeback:0 unstable:0
Feb 8 08:05:50 ifx05 kernel: [679949.174630] free:21828 
slab_reclaimable:5760 slab_unreclaimable:5863
Feb 8 08:05:50 ifx05 kernel: [679949.174631] mapped:378 shmem:163 
pagetables:5053 bounce:0
Feb 8 08:05:50 ifx05 kernel: [679949.533783] Node 0 DMA free:15904kB 
min:256kB low:320kB high:384kB active_anon:0kB inactive_anon:0kB 
active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB present:15680kB mlocked:0kB dirty:0kB writeback:0kB 
mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB 
kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB 
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Feb 8 08:05:50 ifx05 kernel: [679949.974901] lowmem_reserve[]: 0 2991 
4001 4001
Feb 8 08:05:50 ifx05 kernel: [679950.029570] Node 0 DMA32 free:54652kB 
min:50332kB low:62912kB high:75496kB active_anon:2356292kB 
inactive_anon:589348kB active_file:1656kB inactive_file:2088kB 
unevictable:0kB isolated(anon):128kB isolated(file):0kB 
present:3063584kB mlocked:0kB dirty:0kB writeback:0kB mapped:1136kB 
shmem:648kB slab_reclaimable:14680kB slab_unreclaimable:5804kB 
kernel_stack:776kB pagetables:11180kB unstable:0kB bounce:0kB 
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb 8 08:05:50 ifx05 kernel: [679950.518329] lowmem_reserve[]: 0 0 1010 1010
Feb 8 08:05:50 ifx05 kernel: [679950.569883] Node 0 Normal free:16616kB 
min:16992kB low:21240kB high:25488kB active_anon:453668kB 
inactive_anon:454080kB active_file:676kB inactive_file:392kB 
unevictable:0kB isolated(anon):220kB isolated(file):128kB 
present:1034240kB mlocked:0kB dirty:0kB writeback:0kB mapped:508kB 
shmem:4kB slab_reclaimable:8360kB slab_unreclaimable:17648kB 
kernel_stack:1632kB pagetables:9032kB unstable:0kB bounce:0kB 
writeback_tmp:0kB pages_scanned:3 all_unreclaimable? no
Feb 8 08:05:50 ifx05 kernel: [679951.055531] lowmem_reserve[]: 0 0 0 0
Feb 8 08:05:50 ifx05 kernel: [679951.100836] Node 0 DMA: 0*4kB 0*8kB 
0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB 
= 15904kB
Feb 8 08:05:50 ifx05 kernel: [679951.230148] Node 0 DMA32: 12135*4kB 
0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 
1*4096kB = 54684kB
Feb 8 08:05:50 ifx05 kernel: [679951.365691] Node 0 Normal: 843*4kB 
502*8kB 173*16kB 55*32kB 9*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 
0*2048kB 1*4096kB = 16716kB
Feb 8 08:05:50 ifx05 kernel: [679951.505398] 94663 total pagecache pages
Feb 8 08:05:50 ifx05 kernel: [679951.552277] 92078 pages in swap cache
Feb 8 08:05:50 ifx05 kernel: [679951.597096] Swap cache stats: add 
6081494, delete 5990029, find 1251231/1858053
Feb 8 08:05:50 ifx05 kernel: [679951.685541] Free swap = 0kB
Feb 8 08:05:50 ifx05 kernel: [679951.720980] Total swap = 3906556kB
Feb 8 08:05:50 ifx05 kernel: [679951.775776] 1048560 pages RAM
Feb 8 08:05:50 ifx05 kernel: [679951.812253] 35003 pages reserved
Feb 8 08:05:50 ifx05 kernel: [679951.851847] 10245 pages shared
Feb 8 08:05:50 ifx05 kernel: [679951.889380] 982341 pages non-shared
Feb 8 08:05:50 ifx05 kernel: [679951.932102] [ pid ] uid tgid total_vm 
rss cpu oom_adj oom_score_adj name
Feb 8 08:05:50 ifx05 kernel: [679952.021615] [ 329] 0 329 4263 1 0 -17 
-1000 udevd
Feb 8 08:05:50 ifx05 kernel: [679952.021619] [ 826] 1 826 2036 21 1 0 0 
portmap
Feb 8 08:05:50 ifx05 kernel: [679952.021622] [ 906] 102 906 3608 2 1 0 0 
rpc.statd
Feb 8 08:05:50 ifx05 kernel: [679952.021625] [ 1051] 0 1051 6768 0 7 0 0 
rpc.idmapd
Feb 8 08:05:50 ifx05 kernel: [679952.021628] [ 1212] 0 1212 992 1 2 0 0 
acpid
Feb 8 08:05:50 ifx05 kernel: [679952.021631] [ 1221] 0 1221 4691 1 2 0 0 atd
Feb 8 08:05:50 ifx05 kernel: [679952.021634] [ 1253] 0 1253 5619 20 3 0 
0 cron
Feb 8 08:05:50 ifx05 kernel: [679952.021637] [ 1438] 0 1438 12307 29 3 
-17 -1000 sshd
Feb 8 08:05:50 ifx05 kernel: [679952.021640] [ 1450] 0 1450 9304 29 3 0 
0 master
Feb 8 08:05:50 ifx05 kernel: [679952.021642] [ 1457] 107 1457 9860 37 2 
0 0 qmgr
Feb 8 08:05:50 ifx05 kernel: [679952.021645] [ 1491] 0 1491 54480 636 3 
0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021648] [ 1495] 0 1495 70879 1081 6 
0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021651] [ 1499] 0 1499 37250 25 4 0 
0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021653] [ 1503] 0 1503 70880 934 4 
0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021656] [ 1507] 0 1507 111360 1936 
5 0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021659] [ 1511] 0 1511 87515 1429 4 
0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021661] [ 1515] 0 1515 70617 489 0 
0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021664] [ 1519] 0 1519 70901 956 7 
0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021667] [ 1523] 0 1523 88380 2041 2 
0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021670] [ 1527] 0 1527 54818 1421 7 
0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021673] [ 1531] 0 1531 37250 27 4 0 
0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021675] [ 1535] 0 1535 54224 315 1 
0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021678] [ 1539] 0 1539 37250 0 4 0 
0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021681] [ 1543] 0 1543 72001 1561 2 
0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021684] [ 1547] 0 1547 87954 2595 5 
0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021687] [ 1551] 0 1551 37209 11 4 0 
0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021689] [ 1559] 0 1559 71410 1301 5 
0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021692] [ 1563] 0 1563 121841 3270 
2 0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021694] [ 1567] 0 1567 70925 1217 5 
0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021697] [ 1571] 0 1571 54241 1169 2 
0 0 glusterfsd
Feb 8 08:05:50 ifx05 kernel: [679952.021699] [ 1635] 105 1635 9597 32 3 
0 0 ntpd
Feb 8 08:05:50 ifx05 kernel: [679952.021702] [ 1670] 0 1670 1495 1 0 0 0 
getty
Feb 8 08:05:50 ifx05 kernel: [679952.021705] [ 2659] 107 2659 10454 47 2 
0 0 tlsmgr
Feb 8 08:05:50 ifx05 kernel: [679952.021708] [ 6691] 0 6691 47094 44 1 0 
0 glusterd
Feb 8 08:05:50 ifx05 kernel: [679952.021711] [ 7449] 0 7449 10818 5 1 0 
0 syslog-ng
Feb 8 08:05:50 ifx05 kernel: [679952.021713] [ 7450] 0 7450 12452 157 1 
0 0 syslog-ng
Feb 8 08:05:50 ifx05 kernel: [679952.021716] [ 9475] 108 9475 12591 8 5 
0 0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021719] [ 9476] 108 9476 12591 241 
7 0 0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021722] [ 9477] 108 9477 12591 24 3 
0 0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021725] [ 9478] 108 9478 12591 24 4 
0 0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021728] [ 9479] 108 9479 12591 24 0 
0 0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021731] [ 9480] 108 9480 12591 24 4 
0 0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021734] [ 9481] 108 9481 12591 24 1 
0 0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021737] [ 9482] 108 9482 12591 112 
4 0 0 zabbix_agentd
Feb 8 08:05:50 ifx05 kernel: [679952.021740] [ 9600] 106 9600 11810 300 
0 0 0 snmpd
Feb 8 08:05:50 ifx05 kernel: [679952.021743] [10556] 0 10556 57500 82 4 
0 0 glusterfs
Feb 8 08:05:50 ifx05 kernel: [679952.021746] [31815] 0 31815 27485 245 4 
0 0 ruby
Feb 8 08:05:50 ifx05 kernel: [679952.021749] [ 2842] 0 2842 4262 1 3 -17 
-1000 udevd
Feb 8 08:05:50 ifx05 kernel: [679952.021754] [30513] 0 30513 1724628 
814692 5 0 0 glusterfs
Feb 8 08:05:50 ifx05 kernel: [679952.021757] [19809] 107 19809 9820 89 2 
0 0 pickup
Feb 8 08:05:50 ifx05 kernel: [679952.021759] [20590] 0 20590 8214 61 2 0 
0 cron
Feb 8 08:05:50 ifx05 kernel: [679952.021762] [20591] 0 20591 1001 24 2 0 
0 sh
Feb 8 08:05:50 ifx05 kernel: [679952.021765] [20592] 0 20592 40966 18786 
3 0 0 puppet
Feb 8 08:05:50 ifx05 kernel: [679952.021767] [20861] 0 20861 41134 19137 
5 0 0 puppet
Feb 8 08:05:50 ifx05 kernel: [679952.021770] Out of memory: Kill process 
30513 (glusterfs) score 816 or sacrifice child
Feb 8 08:05:50 ifx05 kernel: [679952.021772] Killed process 30513 
(glusterfs) total-vm:6898512kB, anon-rss:3257908kB, file-rss:860kB
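
(For what it's worth, the kernel reported a badness score of 816 for 
the glusterfs client there, and that score can be read directly from 
/proc to see which process would be targeted next time; a quick check 
against the current client PID from top:

  root at ifx05:~# cat /proc/1488/oom_score

As a stopgap, oom_score_adj could be lowered so the OOM killer prefers 
other victims, e.g.:

  root at ifx05:~# echo -500 > /proc/1488/oom_score_adj

though that only shifts the problem around rather than fixing whatever 
is leaking.)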

On 2/9/13 2:26 PM, Brian Foster wrote:
> On 02/08/2013 05:14 PM, Steven King wrote:
>> Hello,
>>
>> I am running GlusterFS version 3.2.7-2~bpo60+1 on Debian 6.0.6. Today, I
>> have experienced a glusterfs process causing the server to invoke
>> oom_killer.
>>
>> How exactly would I go about investigating this and coming up with a fix?
>>
> The OOM killer output to syslog and details on your hardware might be
> useful to include.
>
> Following that, you could monitor the address space (VIRT) and resident
> set size (RES/RSS) of the relevant processes with top on your server. For
> example, is there a sudden increase in set size, or does it constantly
> and gradually increase?
>
> Brian

-- 
Steve King

Network/Linux Engineer - AdSafe Media
Cisco Certified Network Professional
CompTIA Linux+ Certified Professional
CompTIA A+ Certified Professional



