[Bugs] [Bug 1399578] New: [compound FOPs]: Memory leak while doing FOPs with brick down

bugzilla at redhat.com
Tue Nov 29 10:39:48 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1399578

            Bug ID: 1399578
           Summary: [compound FOPs]: Memory leak while doing FOPs with
                    brick down
           Product: GlusterFS
           Version: mainline
         Component: core
          Keywords: Triaged
          Severity: urgent
          Assignee: kdhananj at redhat.com
          Reporter: kdhananj at redhat.com
                CC: amukherj at redhat.com, bugs at gluster.org,
                    kdhananj at redhat.com, nchilaka at redhat.com,
                    rhs-bugs at redhat.com, sasundar at redhat.com,
                    sbhaloth at redhat.com, storage-qa-internal at redhat.com
        Depends On: 1398315



+++ This bug was initially created as a clone of Bug #1398315 +++

Description of problem:
======================
I am raising this bug because, even after the file is completely written to the
brick (with one brick down), the memory is not getting released. Hence there is
a very high chance of a memory leak. This is seen in both the brick process and
the fuse client.

Fuse client: I checked at 10-minute intervals after the write completed and saw
no change in the memory consumed:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2951 root      20   0 86.157g 0.014t      0 S   0.3 93.3  27:39.73 glusterfs

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2951 root      20   0 86.157g 0.014t      0 S   2.0 93.3  27:43.90 glusterfs



The same is true of the brick process:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1431 root      20   0  608672  24324   4256 S   0.0  0.3   0:01.99 glusterd
 3914 root      20   0 4461344 3.097g   4348 S   0.0 40.5  15:00.45 glusterfsd
 3937 root      20   0  672724  31104   3092 S   0.0  0.4   0:01.82 glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1431 root      20   0  608672  24324   4256 S   0.0  0.3   0:02.00 glusterd
 3914 root      20   0 4461344 3.097g   4348 S   0.0 40.5  15:00.45 glusterfsd
 3937 root      20   0  672724  31104   3092 S   0.0  0.4   0:01.84 glusterfs
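
As an aside, this kind of check is easy to script. The following is a minimal
illustrative sketch (not the tooling used for the snapshots above): it polls
VmRSS from /proc for a given PID at a fixed interval and flags the case where
resident memory never drops. The interval and sample count are arbitrary
placeholders.

#!/usr/bin/env python
# rss_watch.py -- minimal sketch: poll VmRSS of a PID from /proc and
# report whether it ever drops. A flat RSS long after the I/O has
# finished is consistent with the leak described in this bug.
import sys
import time

def rss_kb(pid):
    """Return VmRSS in kB, read from /proc/<pid>/status."""
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("no VmRSS for pid %d (kernel thread?)" % pid)

def watch(pid, interval=600, samples=6):
    """Take `samples` readings, `interval` seconds apart (10 min here)."""
    readings = []
    for i in range(samples):
        readings.append(rss_kb(pid))
        print("pid %d VmRSS: %d kB" % (pid, readings[-1]))
        if i < samples - 1:
            time.sleep(interval)
    if min(readings) >= readings[0]:
        print("RSS never dropped below the first sample -> possible leak")

if __name__ == "__main__":
    watch(int(sys.argv[1]))

For example, "python rss_watch.py 2951" would track the fuse client shown
above.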

Version-Release number of selected component (if applicable):
==========
3.8.4-5

Steps to Reproduce:
1. Create a 1x2 volume.
2. Enable compound FOPs and fuse-mount the volume on a client.
3. Keep track of the memory consumed by both the brick processes and the
client process.
4. Create a 10 GB file with dd.
5. After about 5 GB has been written, bring down one brick.
(A rough scripted version of these steps is sketched below.)

After the file is completely written, note down the memory consumed by the
brick and the fuse client.

Then leave the setup idle and check again after 15 minutes: no memory has been
freed.
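
For reference, the scripted version of the reproducer. This is an untested
sketch: the volume name, hosts, brick paths, mount point, and the 60-second
delay standing in for "after about 5 GB" are all placeholders, and it assumes
compound FOPs are switched on via the cluster.use-compound-fops volume option.

#!/usr/bin/env python
# repro.py -- untested sketch of the steps above; all names are placeholders.
import subprocess
import time

VOL, MNT = "testvol", "/mnt/testvol"
BRICKS = ["server1:/bricks/b1", "server2:/bricks/b2"]

def sh(cmd):
    print("+ " + cmd)
    subprocess.check_call(cmd, shell=True)

# Step 1: create and start a 1x2 (replica 2) volume.
sh("gluster volume create %s replica 2 %s force" % (VOL, " ".join(BRICKS)))
sh("gluster volume start %s" % VOL)

# Step 2: enable compound FOPs (option name assumed) and fuse-mount.
sh("gluster volume set %s cluster.use-compound-fops on" % VOL)
sh("mount -t glusterfs server1:/%s %s" % (VOL, MNT))

# Steps 4/5: write a 10 GB file; part-way through, kill the second
# brick's glusterfsd on its host to simulate the brick going down.
dd = subprocess.Popen("dd if=/dev/zero of=%s/bigfile bs=1M count=10240"
                      % MNT, shell=True)
time.sleep(60)  # crude stand-in for "after about 5 GB is written"
host, path = BRICKS[1].split(":")
sh("ssh %s \"pkill -f 'glusterfsd.*%s'\"" % (host, path))
dd.wait()
# Step 3 runs alongside: sample brick/fuse RSS throughout (see the
# rss_watch.py sketch above) and once more after 15 idle minutes.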


Note: I would like to track the client leak and the brick leak as two
different issues. However, if the RCA finds that the root cause is the same,
we can go ahead and dup one of them to the other.
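
For the RCA itself, a GlusterFS statedump should show which allocation types
are holding the memory: "gluster volume statedump <VOLNAME>" covers the
bricks, and sending SIGUSR1 to the glusterfs client process makes it write a
dump (under /var/run/gluster by default). Below is a hedged sketch that
triggers a client dump and ranks the memusage sections by size; the
"usage-type ... memusage" / "size=" field names follow the statedump format as
I understand it for this release, so verify against an actual dump.

#!/usr/bin/env python
# statedump_rank.py -- hedged sketch: trigger a statedump via SIGUSR1
# and print the allocation types holding the most memory. Field names
# ("usage-type ... memusage", "size=") are assumed from the 3.8-era
# statedump format; check them against a real dump before relying on this.
import glob
import os
import re
import signal
import sys
import time

def dump_and_rank(pid, dump_dir="/var/run/gluster", top=10):
    os.kill(pid, signal.SIGUSR1)  # glusterfs writes a statedump on SIGUSR1
    time.sleep(2)                 # give the process a moment to finish writing
    latest = max(glob.glob("%s/*.dump.*" % dump_dir), key=os.path.getmtime)
    sizes, current = {}, None
    for line in open(latest):
        m = re.search(r"usage-type (\S+) memusage", line)
        if m:
            current = m.group(1)
        elif current and line.startswith("size="):
            # accumulate: the same type can appear under several xlators
            sizes[current] = sizes.get(current, 0) + int(line.split("=")[1])
    for name, size in sorted(sizes.items(), key=lambda kv: -kv[1])[:top]:
        print("%12d bytes  %s" % (size, name))

if __name__ == "__main__":
    dump_and_rank(int(sys.argv[1]))

Comparing two such rankings, one taken right after the dd completes and one
after the idle period, should show which allocation types never shrink.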

--- Additional comment from nchilaka on 2016-11-24 07:48:52 EST ---

and here comes the OOM Kill :)
[Thu Nov 24 18:13:50 2016] glusterfs invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
[Thu Nov 24 18:13:50 2016] glusterfs cpuset=/ mems_allowed=0-1
[Thu Nov 24 18:13:50 2016] CPU: 3 PID: 2953 Comm: glusterfs Not tainted 3.10.0-510.el7.x86_64 #1
[Thu Nov 24 18:13:50 2016] Hardware name: Supermicro X9DRW-3LN4F+/X9DRW-3TF+/X9DRW-3LN4F+/X9DRW-3TF+, BIOS 1.0b 05/29/2012
[Thu Nov 24 18:13:50 2016]  ffff880475a93ec0 000000002e7f78c3 ffff88046bb23990 ffffffff81685ccc
[Thu Nov 24 18:13:50 2016]  ffff88046bb23a20 ffffffff81680c77 ffffffff812ae65b ffff880476e27d00
[Thu Nov 24 18:13:50 2016]  ffff880476e27d18 ffffffff00000202 fffeefff00000000 0000000000000001
[Thu Nov 24 18:13:50 2016] Call Trace:
[Thu Nov 24 18:13:50 2016]  [<ffffffff81685ccc>] dump_stack+0x19/0x1b
[Thu Nov 24 18:13:50 2016]  [<ffffffff81680c77>] dump_header+0x8e/0x225
[Thu Nov 24 18:13:50 2016]  [<ffffffff812ae65b>] ? cred_has_capability+0x6b/0x120
[Thu Nov 24 18:13:50 2016]  [<ffffffff8113cb03>] ? delayacct_end+0x33/0xb0
[Thu Nov 24 18:13:50 2016]  [<ffffffff8118460e>] oom_kill_process+0x24e/0x3c0
[Thu Nov 24 18:13:50 2016]  [<ffffffff81184e46>] out_of_memory+0x4b6/0x4f0
[Thu Nov 24 18:13:50 2016]  [<ffffffff81681780>] __alloc_pages_slowpath+0x5d7/0x725
[Thu Nov 24 18:13:50 2016]  [<ffffffff8118af55>] __alloc_pages_nodemask+0x405/0x420
[Thu Nov 24 18:13:50 2016]  [<ffffffff811d209a>] alloc_pages_vma+0x9a/0x150
[Thu Nov 24 18:13:50 2016]  [<ffffffff811c2e8b>] read_swap_cache_async+0xeb/0x160
[Thu Nov 24 18:13:50 2016]  [<ffffffff811c2fa8>] swapin_readahead+0xa8/0x110
[Thu Nov 24 18:13:50 2016]  [<ffffffff811b120c>] handle_mm_fault+0xb1c/0xfe0
[Thu Nov 24 18:13:50 2016]  [<ffffffff81691794>] __do_page_fault+0x154/0x450
[Thu Nov 24 18:13:50 2016]  [<ffffffff81691ac5>] do_page_fault+0x35/0x90
[Thu Nov 24 18:13:50 2016]  [<ffffffff8168dfc0>] ? bstep_iret+0xf/0xf
[Thu Nov 24 18:13:50 2016]  [<ffffffff8168dd88>] page_fault+0x28/0x30
[Thu Nov 24 18:13:50 2016] Mem-Info:
[Thu Nov 24 18:13:50 2016] active_anon:3322839 inactive_anon:510929
isolated_anon:0
 active_file:174 inactive_file:754 isolated_file:0
 unevictable:0 dirty:0 writeback:136 unstable:0
 slab_reclaimable:11575 slab_unreclaimable:22836
 mapped:291 shmem:742 pagetables:45178 bounce:0
 free:32274 free_pcp:30 free_cma:0
[Thu Nov 24 18:13:50 2016] Node 0 DMA free:15848kB min:84kB low:104kB
high:124kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15932kB
managed:15848kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB
unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 1763 7777 7777
[Thu Nov 24 18:13:50 2016] Node 0 DMA32 free:33960kB min:10020kB low:12524kB
high:15028kB active_anon:1239140kB inactive_anon:445404kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:2052108kB managed:1807368kB mlocked:0kB dirty:0kB writeback:0kB
mapped:612kB shmem:608kB slab_reclaimable:1464kB slab_unreclaimable:8624kB
kernel_stack:336kB pagetables:3624kB unstable:0kB bounce:0kB free_pcp:120kB
local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 0 6014 6014
[Thu Nov 24 18:13:50 2016] Node 0 Normal free:34060kB min:34180kB low:42724kB
high:51268kB active_anon:4993472kB inactive_anon:713816kB active_file:632kB
inactive_file:3244kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:6291456kB managed:6158340kB mlocked:0kB dirty:0kB writeback:344kB
mapped:456kB shmem:2316kB slab_reclaimable:13080kB slab_unreclaimable:44936kB
kernel_stack:3136kB pagetables:50900kB unstable:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:20840
all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 0 0 0
[Thu Nov 24 18:13:50 2016] Node 1 Normal free:45228kB min:45820kB low:57272kB
high:68728kB active_anon:7058744kB inactive_anon:884496kB active_file:64kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:8388608kB managed:8255248kB mlocked:0kB dirty:0kB writeback:200kB
mapped:96kB shmem:44kB slab_reclaimable:31756kB slab_unreclaimable:37784kB
kernel_stack:2400kB pagetables:126188kB unstable:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:8515
all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 0 0 0
[Thu Nov 24 18:13:50 2016] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB
(U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) =
15848kB
[Thu Nov 24 18:13:50 2016] Node 0 DMA32: 1013*4kB (UM) 688*8kB (UEM) 266*16kB
(UEM) 54*32kB (UEM) 16*64kB (UEM) 6*128kB (EM) 7*256kB (EM) 5*512kB (M)
8*1024kB (UEM) 2*2048kB (M) 0*4096kB = 33972kB
[Thu Nov 24 18:13:50 2016] Node 0 Normal: 206*4kB (UEM) 158*8kB (UEM) 87*16kB
(UEM) 59*32kB (UEM) 87*64kB (UEM) 38*128kB (UM) 24*256kB (UE) 6*512kB (UEM)
10*1024kB (M) 0*2048kB 0*4096kB = 35256kB
[Thu Nov 24 18:13:50 2016] Node 1 Normal: 164*4kB (UEM) 114*8kB (UEM) 68*16kB
(UEM) 47*32kB (UEM) 17*64kB (UEM) 6*128kB (UM) 11*256kB (UEM) 35*512kB (UM)
19*1024kB (UM) 0*2048kB 0*4096kB = 46208kB
[Thu Nov 24 18:13:50 2016] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[Thu Nov 24 18:13:50 2016] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[Thu Nov 24 18:13:50 2016] Node 1 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
[Thu Nov 24 18:13:50 2016] Node 1 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[Thu Nov 24 18:13:50 2016] 28778 total pagecache pages
[Thu Nov 24 18:13:50 2016] 27044 pages in swap cache
[Thu Nov 24 18:13:50 2016] Swap cache stats: add 2112210, delete 2085166, find
18057/22414
[Thu Nov 24 18:13:50 2016] Free swap  = 0kB
[Thu Nov 24 18:13:50 2016] Total swap = 8257532kB
[Thu Nov 24 18:13:50 2016] 4187026 pages RAM
[Thu Nov 24 18:13:50 2016] 0 pages HighMem/MovableOnly
[Thu Nov 24 18:13:50 2016] 127825 pages reserved
[Thu Nov 24 18:13:50 2016] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[Thu Nov 24 18:13:50 2016] [  731]     0   731     9204      172      21       49             0 systemd-journal
[Thu Nov 24 18:13:50 2016] [  752]     0   752    67411        0      34      608             0 lvmetad
[Thu Nov 24 18:13:50 2016] [  768]     0   768    11319        1      23      245         -1000 systemd-udevd
[Thu Nov 24 18:13:50 2016] [ 1091]     0  1091    13854       23      28       87         -1000 auditd
[Thu Nov 24 18:13:50 2016] [ 1113]     0  1113     4860       81      14       38             0 irqbalance
[Thu Nov 24 18:13:50 2016] [ 1116]    81  1116     8207       95      17       52          -900 dbus-daemon
[Thu Nov 24 18:13:50 2016] [ 1119]   997  1119    28962       47      26       50             0 chronyd
[Thu Nov 24 18:13:50 2016] [ 1127]   998  1127   132067       81      55     1658             0 polkitd
[Thu Nov 24 18:13:50 2016] [ 1128]     0  1128     6048       43      16       30             0 systemd-logind
[Thu Nov 24 18:13:50 2016] [ 1131]     0  1131    31556       26      19      130             0 crond
[Thu Nov 24 18:13:50 2016] [ 1141]     0  1141    81800      261      82     4781             0 firewalld
[Thu Nov 24 18:13:50 2016] [ 1148]     0  1148    27509        1      10       31             0 agetty
[Thu Nov 24 18:13:50 2016] [ 1150]     0  1150   109534      294      68      345             0 NetworkManager
[Thu Nov 24 18:13:50 2016] [ 1250]     0  1250    28206        1      55     3122             0 dhclient
[Thu Nov 24 18:13:50 2016] [ 1508]     0  1508    54944      164      38      135             0 rsyslogd
[Thu Nov 24 18:13:50 2016] [ 1511]     0  1511   138288       91      89     2576             0 tuned
[Thu Nov 24 18:13:50 2016] [ 1516]     0  1516    28335        1      11       38             0 rhsmcertd
[Thu Nov 24 18:13:50 2016] [ 1538]     0  1538    20617       25      42      189         -1000 sshd
[Thu Nov 24 18:13:50 2016] [ 1552]     0  1552    26971        0       9       24             0 rhnsd
[Thu Nov 24 18:13:50 2016] [ 2331]     0  2331    22244       16      41      239             0 master
[Thu Nov 24 18:13:50 2016] [ 2363]    89  2363    22270       15      44      235             0 pickup
[Thu Nov 24 18:13:50 2016] [ 2365]    89  2365    22287       14      44      236             0 qmgr
[Thu Nov 24 18:13:50 2016] [ 2869]     0  2869    35726       28      71      291             0 sshd
[Thu Nov 24 18:13:50 2016] [ 2873]     0  2873    29316       81      15      492             0 bash
[Thu Nov 24 18:13:50 2016] [ 2951]     0  2951 22585439  3780372   43885  2041242             0 glusterfs
[Thu Nov 24 18:13:50 2016] [ 2969]     0  2969    35726       26      68      291             0 sshd
[Thu Nov 24 18:13:50 2016] [ 2973]     0  2973    28846       72      14       39             0 bash
[Thu Nov 24 18:13:50 2016] [ 2998]     0  2998    31927       68      17       70             0 screen
[Thu Nov 24 18:13:50 2016] [ 2999]     0  2999    38218     4753      32     4734             0 bash
[Thu Nov 24 18:13:50 2016] [ 3674]     0  3674    35726      316      72        0             0 sshd
[Thu Nov 24 18:13:50 2016] [ 3678]     0  3678    28846      109      12        0             0 bash
[Thu Nov 24 18:13:50 2016] [ 3815]     0  3815   130941    18547     177        0             0 yum
[Thu Nov 24 18:13:50 2016] Out of memory: Kill process 2951 (glusterfs) score 929 or sacrifice child
[Thu Nov 24 18:13:50 2016] Killed process 2951 (glusterfs) total-vm:90341756kB, anon-rss:15121488kB, file-rss:0kB, shmem-rss:0kB


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1398315
[Bug 1398315] [compound FOPs]: Memory leak while doing FOPs with brick down