[Bugs] [Bug 1399578] New: [compound FOPs]: Memory leak while doing FOPs with brick down
bugzilla at redhat.com
Tue Nov 29 10:39:48 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1399578
Bug ID: 1399578
Summary: [compound FOPs]: Memory leak while doing FOPs with
brick down
Product: GlusterFS
Version: mainline
Component: core
Keywords: Triaged
Severity: urgent
Assignee: kdhananj at redhat.com
Reporter: kdhananj at redhat.com
CC: amukherj at redhat.com, bugs at gluster.org,
kdhananj at redhat.com, nchilaka at redhat.com,
rhs-bugs at redhat.com, sasundar at redhat.com,
sbhaloth at redhat.com, storage-qa-internal at redhat.com
Depends On: 1398315
+++ This bug was initially created as a clone of Bug #1398315 +++
Description of problem:
======================
I am raising this bug because even after the file is completely written to the
brick (with one brick down), the memory is not released. Hence there is a very
high chance of a memory leak. This is seen in both the brick process and the
fuse client.
Fuse client: I checked at 10-minute intervals after the write completed and saw
no change in the memory consumed (a sketch of such periodic sampling follows
the snapshots below):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2951 root 20 0 86.157g 0.014t 0 S 0.3 93.3 27:39.73 glusterfs
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2951 root 20 0 86.157g 0.014t 0 S 2.0 93.3 27:43.90 glusterfs
The same is seen with the brick process:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1431 root 20 0 608672 24324 4256 S 0.0 0.3 0:01.99 glusterd
3914 root 20 0 4461344 3.097g 4348 S 0.0 40.5 15:00.45 glusterfsd
3937 root 20 0 672724 31104 3092 S 0.0 0.4 0:01.82 glusterfs
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1431 root 20 0 608672 24324 4256 S 0.0 0.3 0:02.00 glusterd
3914 root 20 0 4461344 3.097g 4348 S 0.0 40.5 15:00.45 glusterfsd
3937 root 20 0 672724 31104 3092 S 0.0 0.4 0:01.84 glusterfs
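For reference, a minimal sketch of how these RSS samples could be collected at
a fixed interval instead of manual top checks. The PIDs are taken from the top
output above and are illustrative only; the 10-minute interval matches the
manual checks described above.

#!/usr/bin/env python
# Minimal sketch: periodically sample VmRSS from /proc/<pid>/status for the
# gluster processes. PIDs below come from the top snapshots above and are
# illustrative, not part of the original report.
import time

def rss_kb(pid):
    """Return VmRSS in kB for a PID, or None if the process has exited."""
    try:
        with open("/proc/%d/status" % pid) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except IOError:
        return None
    return None

PIDS = {"fuse client (glusterfs)": 2951, "brick (glusterfsd)": 3914}

while True:
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    for name, pid in PIDS.items():
        kb = rss_kb(pid)
        print("%s  %-25s VmRSS = %s" %
              (stamp, name, "%d kB" % kb if kb is not None else "process gone"))
    time.sleep(600)  # 10-minute interval, matching the manual checks above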
Version-Release number of selected component (if applicable):
==========
3.8.4-5
Steps to Reproduce:
1. Create a 1x2 replica volume.
2. Enable compound FOPs and fuse-mount the volume on a client.
3. Keep track of the memory consumed by both the brick processes and the
client process.
4. Create a 10 GB file with dd.
5. After about 5 GB has been written, bring down one brick.
After the file is completely written, note down the memory consumed by the
brick and the fuse client. Then leave the setup idle and check again after
15 minutes: no memory has been freed. (A rough reproduction sketch follows
these steps.)
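A rough reproduction sketch of steps 2 and 4, driven from a node that has the
gluster CLI. It assumes an already created and started 1x2 replica volume; the
volume name "testvol", hostname "server1", mount path, and the option name
cluster.use-compound-fops are assumptions for illustration, not taken from the
original setup.

#!/usr/bin/env python
# Rough sketch of steps 2 and 4 above. Assumes a started 1x2 replica volume
# "testvol"; volume name, hostname, mount path and the option name
# cluster.use-compound-fops are illustrative assumptions.
import subprocess

MOUNT = "/mnt/testvol"

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

# Step 2: enable compound FOPs and fuse-mount the volume on the client.
run(["gluster", "volume", "set", "testvol", "cluster.use-compound-fops", "on"])
run(["mount", "-t", "glusterfs", "server1:/testvol", MOUNT])

# Step 4: create a 10 GB file with dd. Step 5 (killing one brick's glusterfsd
# on the server once roughly 5 GB has been written) is done manually.
run(["dd", "if=/dev/zero", "of=%s/largefile" % MOUNT, "bs=1M", "count=10240"])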
Note: I would like to track the client-side and brick-side leaks as two
different issues. However, if RCA finds that the root cause is the same, we can
mark one of them as a duplicate of the other.
--- Additional comment from nchilaka on 2016-11-24 07:48:52 EST ---
and here comes the OOM Kill :)
[Thu Nov 24 18:13:50 2016] glusterfs invoked oom-killer: gfp_mask=0x200da,
order=0, oom_score_adj=0
[Thu Nov 24 18:13:50 2016] glusterfs cpuset=/ mems_allowed=0-1
[Thu Nov 24 18:13:50 2016] CPU: 3 PID: 2953 Comm: glusterfs Not tainted
3.10.0-510.el7.x86_64 #1
[Thu Nov 24 18:13:50 2016] Hardware name: Supermicro
X9DRW-3LN4F+/X9DRW-3TF+/X9DRW-3LN4F+/X9DRW-3TF+, BIOS 1.0b 05/29/2012
[Thu Nov 24 18:13:50 2016] ffff880475a93ec0 000000002e7f78c3 ffff88046bb23990
ffffffff81685ccc
[Thu Nov 24 18:13:50 2016] ffff88046bb23a20 ffffffff81680c77 ffffffff812ae65b
ffff880476e27d00
[Thu Nov 24 18:13:50 2016] ffff880476e27d18 ffffffff00000202 fffeefff00000000
0000000000000001
[Thu Nov 24 18:13:50 2016] Call Trace:
[Thu Nov 24 18:13:50 2016] [<ffffffff81685ccc>] dump_stack+0x19/0x1b
[Thu Nov 24 18:13:50 2016] [<ffffffff81680c77>] dump_header+0x8e/0x225
[Thu Nov 24 18:13:50 2016] [<ffffffff812ae65b>] ?
cred_has_capability+0x6b/0x120
[Thu Nov 24 18:13:50 2016] [<ffffffff8113cb03>] ? delayacct_end+0x33/0xb0
[Thu Nov 24 18:13:50 2016] [<ffffffff8118460e>] oom_kill_process+0x24e/0x3c0
[Thu Nov 24 18:13:50 2016] [<ffffffff81184e46>] out_of_memory+0x4b6/0x4f0
[Thu Nov 24 18:13:50 2016] [<ffffffff81681780>]
__alloc_pages_slowpath+0x5d7/0x725
[Thu Nov 24 18:13:50 2016] [<ffffffff8118af55>]
__alloc_pages_nodemask+0x405/0x420
[Thu Nov 24 18:13:50 2016] [<ffffffff811d209a>] alloc_pages_vma+0x9a/0x150
[Thu Nov 24 18:13:50 2016] [<ffffffff811c2e8b>]
read_swap_cache_async+0xeb/0x160
[Thu Nov 24 18:13:50 2016] [<ffffffff811c2fa8>] swapin_readahead+0xa8/0x110
[Thu Nov 24 18:13:50 2016] [<ffffffff811b120c>] handle_mm_fault+0xb1c/0xfe0
[Thu Nov 24 18:13:50 2016] [<ffffffff81691794>] __do_page_fault+0x154/0x450
[Thu Nov 24 18:13:50 2016] [<ffffffff81691ac5>] do_page_fault+0x35/0x90
[Thu Nov 24 18:13:50 2016] [<ffffffff8168dfc0>] ? bstep_iret+0xf/0xf
[Thu Nov 24 18:13:50 2016] [<ffffffff8168dd88>] page_fault+0x28/0x30
[Thu Nov 24 18:13:50 2016] Mem-Info:
[Thu Nov 24 18:13:50 2016] active_anon:3322839 inactive_anon:510929
isolated_anon:0
active_file:174 inactive_file:754 isolated_file:0
unevictable:0 dirty:0 writeback:136 unstable:0
slab_reclaimable:11575 slab_unreclaimable:22836
mapped:291 shmem:742 pagetables:45178 bounce:0
free:32274 free_pcp:30 free_cma:0
[Thu Nov 24 18:13:50 2016] Node 0 DMA free:15848kB min:84kB low:104kB
high:124kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15932kB
managed:15848kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB
unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 1763 7777 7777
[Thu Nov 24 18:13:50 2016] Node 0 DMA32 free:33960kB min:10020kB low:12524kB
high:15028kB active_anon:1239140kB inactive_anon:445404kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:2052108kB managed:1807368kB mlocked:0kB dirty:0kB writeback:0kB
mapped:612kB shmem:608kB slab_reclaimable:1464kB slab_unreclaimable:8624kB
kernel_stack:336kB pagetables:3624kB unstable:0kB bounce:0kB free_pcp:120kB
local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 0 6014 6014
[Thu Nov 24 18:13:50 2016] Node 0 Normal free:34060kB min:34180kB low:42724kB
high:51268kB active_anon:4993472kB inactive_anon:713816kB active_file:632kB
inactive_file:3244kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:6291456kB managed:6158340kB mlocked:0kB dirty:0kB writeback:344kB
mapped:456kB shmem:2316kB slab_reclaimable:13080kB slab_unreclaimable:44936kB
kernel_stack:3136kB pagetables:50900kB unstable:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:20840
all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 0 0 0
[Thu Nov 24 18:13:50 2016] Node 1 Normal free:45228kB min:45820kB low:57272kB
high:68728kB active_anon:7058744kB inactive_anon:884496kB active_file:64kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:8388608kB managed:8255248kB mlocked:0kB dirty:0kB writeback:200kB
mapped:96kB shmem:44kB slab_reclaimable:31756kB slab_unreclaimable:37784kB
kernel_stack:2400kB pagetables:126188kB unstable:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:8515
all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 0 0 0
[Thu Nov 24 18:13:50 2016] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB
(U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) =
15848kB
[Thu Nov 24 18:13:50 2016] Node 0 DMA32: 1013*4kB (UM) 688*8kB (UEM) 266*16kB
(UEM) 54*32kB (UEM) 16*64kB (UEM) 6*128kB (EM) 7*256kB (EM) 5*512kB (M)
8*1024kB (UEM) 2*2048kB (M) 0*4096kB = 33972kB
[Thu Nov 24 18:13:50 2016] Node 0 Normal: 206*4kB (UEM) 158*8kB (UEM) 87*16kB
(UEM) 59*32kB (UEM) 87*64kB (UEM) 38*128kB (UM) 24*256kB (UE) 6*512kB (UEM)
10*1024kB (M) 0*2048kB 0*4096kB = 35256kB
[Thu Nov 24 18:13:50 2016] Node 1 Normal: 164*4kB (UEM) 114*8kB (UEM) 68*16kB
(UEM) 47*32kB (UEM) 17*64kB (UEM) 6*128kB (UM) 11*256kB (UEM) 35*512kB (UM)
19*1024kB (UM) 0*2048kB 0*4096kB = 46208kB
[Thu Nov 24 18:13:50 2016] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[Thu Nov 24 18:13:50 2016] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[Thu Nov 24 18:13:50 2016] Node 1 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[Thu Nov 24 18:13:50 2016] Node 1 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[Thu Nov 24 18:13:50 2016] 28778 total pagecache pages
[Thu Nov 24 18:13:50 2016] 27044 pages in swap cache
[Thu Nov 24 18:13:50 2016] Swap cache stats: add 2112210, delete 2085166, find
18057/22414
[Thu Nov 24 18:13:50 2016] Free swap = 0kB
[Thu Nov 24 18:13:50 2016] Total swap = 8257532kB
[Thu Nov 24 18:13:50 2016] 4187026 pages RAM
[Thu Nov 24 18:13:50 2016] 0 pages HighMem/MovableOnly
[Thu Nov 24 18:13:50 2016] 127825 pages reserved
[Thu Nov 24 18:13:50 2016] [ pid ] uid tgid total_vm rss nr_ptes
swapents oom_score_adj name
[Thu Nov 24 18:13:50 2016] [ 731] 0 731 9204 172 21
49 0 systemd-journal
[Thu Nov 24 18:13:50 2016] [ 752] 0 752 67411 0 34
608 0 lvmetad
[Thu Nov 24 18:13:50 2016] [ 768] 0 768 11319 1 23
245 -1000 systemd-udevd
[Thu Nov 24 18:13:50 2016] [ 1091] 0 1091 13854 23 28
87 -1000 auditd
[Thu Nov 24 18:13:50 2016] [ 1113] 0 1113 4860 81 14
38 0 irqbalance
[Thu Nov 24 18:13:50 2016] [ 1116] 81 1116 8207 95 17
52 -900 dbus-daemon
[Thu Nov 24 18:13:50 2016] [ 1119] 997 1119 28962 47 26
50 0 chronyd
[Thu Nov 24 18:13:50 2016] [ 1127] 998 1127 132067 81 55
1658 0 polkitd
[Thu Nov 24 18:13:50 2016] [ 1128] 0 1128 6048 43 16
30 0 systemd-logind
[Thu Nov 24 18:13:50 2016] [ 1131] 0 1131 31556 26 19
130 0 crond
[Thu Nov 24 18:13:50 2016] [ 1141] 0 1141 81800 261 82
4781 0 firewalld
[Thu Nov 24 18:13:50 2016] [ 1148] 0 1148 27509 1 10
31 0 agetty
[Thu Nov 24 18:13:50 2016] [ 1150] 0 1150 109534 294 68
345 0 NetworkManager
[Thu Nov 24 18:13:50 2016] [ 1250] 0 1250 28206 1 55
3122 0 dhclient
[Thu Nov 24 18:13:50 2016] [ 1508] 0 1508 54944 164 38
135 0 rsyslogd
[Thu Nov 24 18:13:50 2016] [ 1511] 0 1511 138288 91 89
2576 0 tuned
[Thu Nov 24 18:13:50 2016] [ 1516] 0 1516 28335 1 11
38 0 rhsmcertd
[Thu Nov 24 18:13:50 2016] [ 1538] 0 1538 20617 25 42
189 -1000 sshd
[Thu Nov 24 18:13:50 2016] [ 1552] 0 1552 26971 0 9
24 0 rhnsd
[Thu Nov 24 18:13:50 2016] [ 2331] 0 2331 22244 16 41
239 0 master
[Thu Nov 24 18:13:50 2016] [ 2363] 89 2363 22270 15 44
235 0 pickup
[Thu Nov 24 18:13:50 2016] [ 2365] 89 2365 22287 14 44
236 0 qmgr
[Thu Nov 24 18:13:50 2016] [ 2869] 0 2869 35726 28 71
291 0 sshd
[Thu Nov 24 18:13:50 2016] [ 2873] 0 2873 29316 81 15
492 0 bash
[Thu Nov 24 18:13:50 2016] [ 2951] 0 2951 22585439 3780372 43885
2041242 0 glusterfs
[Thu Nov 24 18:13:50 2016] [ 2969] 0 2969 35726 26 68
291 0 sshd
[Thu Nov 24 18:13:50 2016] [ 2973] 0 2973 28846 72 14
39 0 bash
[Thu Nov 24 18:13:50 2016] [ 2998] 0 2998 31927 68 17
70 0 screen
[Thu Nov 24 18:13:50 2016] [ 2999] 0 2999 38218 4753 32
4734 0 bash
[Thu Nov 24 18:13:50 2016] [ 3674] 0 3674 35726 316 72
0 0 sshd
[Thu Nov 24 18:13:50 2016] [ 3678] 0 3678 28846 109 12
0 0 bash
[Thu Nov 24 18:13:50 2016] [ 3815] 0 3815 130941 18547 177
0 0 yum
[Thu Nov 24 18:13:50 2016] Out of memory: Kill process 2951 (glusterfs) score
929 or sacrifice child
[Thu Nov 24 18:13:50 2016] Killed process 2951 (glusterfs) total-vm:90341756kB,
anon-rss:15121488kB, file-rss:0kB, shmem-rss:0kB
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1398315
[Bug 1398315] [compound FOPs]: Memory leak while doing FOPs with brick down