[Bugs] [Bug 1401021] New: OOM kill of nfs-ganesha on one node while fs-sanity test suite is executed.

bugzilla at redhat.com bugzilla at redhat.com
Fri Dec 2 15:06:15 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1401021

            Bug ID: 1401021
           Summary: OOM kill of nfs-ganesha on one node while fs-sanity
                    test suite is executed.
           Product: GlusterFS
           Version: 3.9
         Component: distribute
          Keywords: Triaged
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: jthottan at redhat.com
                CC: aloganat at redhat.com, bugs at gluster.org,
                    jthottan at redhat.com, kkeithle at redhat.com,
                    mzywusko at redhat.com, ndevos at redhat.com,
                    rhs-bugs at redhat.com, sbhaloth at redhat.com,
                    skoduri at redhat.com, sraj at redhat.com,
                    storage-qa-internal at redhat.com
        Depends On: 1381452, 1397052



+++ This bug was initially created as a clone of Bug #1397052 +++

+++ This bug was initially created as a clone of Bug #1381452 +++

Description of problem:

OOM kill of nfs-ganesha on one node while posix_compliance test suite is
executed.

Version-Release number of selected component (if applicable):

[root at dhcp42-59 ~]# rpm -qa|grep ganesha
nfs-ganesha-2.4.0-2.el6rhs.x86_64
nfs-ganesha-gluster-2.4.0-2.el6rhs.x86_64
glusterfs-ganesha-3.8.4-2.el6rhs.x86_64


How reproducible:

Once

Steps to Reproduce:
1. Create a ganesha cluster, create a volume and enable ganesha on it.
2. Mount the volume with vers=4 on the client and start executing the
posix_compliance test suite.
3. Observe that once the posix_compliance test suite finishes, ganesha gets
oom_killed on the mounted node, with the messages below in dmesg:


pcs invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
pcs cpuset=/ mems_allowed=0
Pid: 3248, comm: pcs Not tainted 2.6.32-642.4.2.el6.x86_64 #1
Call Trace:
 [<ffffffff81131420>] ? dump_header+0x90/0x1b0
 [<ffffffff8123bfec>] ? security_real_capable_noaudit+0x3c/0x70
 [<ffffffff811318a2>] ? oom_kill_process+0x82/0x2a0
 [<ffffffff811317e1>] ? select_bad_process+0xe1/0x120
 [<ffffffff81131ce0>] ? out_of_memory+0x220/0x3c0
 [<ffffffff8113e6bc>] ? __alloc_pages_nodemask+0x93c/0x950
 [<ffffffff81177a0a>] ? alloc_pages_vma+0x9a/0x150
 [<ffffffff81159d8d>] ? handle_pte_fault+0x73d/0xb20
 [<ffffffff810567c7>] ? pte_alloc_one+0x37/0x50
 [<ffffffff81193f79>] ? do_huge_pmd_anonymous_page+0xb9/0x3b0
 [<ffffffff8115a409>] ? handle_mm_fault+0x299/0x3d0
 [<ffffffff81052156>] ? __do_page_fault+0x146/0x500
 [<ffffffff811609d5>] ? do_mmap_pgoff+0x335/0x380
 [<ffffffff8154f03e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff8154c345>] ? page_fault+0x25/0x30
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:   0
CPU    2: hi:  186, btch:  31 usd:   0
CPU    3: hi:  186, btch:  31 usd:   0
Node 0 Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd:   0
CPU    1: hi:  186, btch:  31 usd:   0
CPU    2: hi:  186, btch:  31 usd:  30
CPU    3: hi:  186, btch:  31 usd:  92
active_anon:1618270 inactive_anon:292236 isolated_anon:0
 active_file:0 inactive_file:12 isolated_file:0
 unevictable:22804 dirty:13 writeback:0 unstable:0
 free:25293 slab_reclaimable:4863 slab_unreclaimable:20291
 mapped:12015 shmem:14677 pagetables:7894 bounce:0
Node 0 DMA free:15716kB min:124kB low:152kB high:184kB active_anon:0kB
inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
isolated(anon):0kB isolated(file):0kB present:15320kB mlocked:0kB dirty:0kB
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB
kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3512 8057 8057
Node 0 DMA32 free:47568kB min:29404kB low:36752kB high:44104kB
active_anon:2723100kB inactive_anon:544028kB active_file:4kB inactive_file:28kB
unevictable:24kB isolated(anon):0kB isolated(file):0kB present:3596500kB
mlocked:24kB dirty:4kB writeback:0kB mapped:328kB shmem:4kB
slab_reclaimable:48kB slab_unreclaimable:196kB kernel_stack:0kB
pagetables:8600kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:64
all_unreclaimable? yes
lowmem_reserve[]: 0 0 4545 4545
Node 0 Normal free:37888kB min:38052kB low:47564kB high:57076kB
active_anon:3749980kB inactive_anon:624916kB active_file:0kB inactive_file:20kB
unevictable:91192kB isolated(anon):0kB isolated(file):0kB present:4654080kB
mlocked:62576kB dirty:48kB writeback:0kB mapped:47732kB shmem:58704kB
slab_reclaimable:19404kB slab_unreclaimable:80968kB kernel_stack:11680kB
pagetables:22976kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:313
all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 3*4kB 1*8kB 1*16kB 2*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB
1*2048kB 3*4096kB = 15716kB
Node 0 DMA32: 376*4kB 137*8kB 37*16kB 31*32kB 33*64kB 5*128kB 4*256kB 4*512kB
1*1024kB 2*2048kB 8*4096kB = 47896kB
Node 0 Normal: 8430*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB
1*1024kB 1*2048kB 0*4096kB = 37800kB
39246 total pagecache pages
22008 pages in swap cache
Swap cache stats: add 1306301, delete 1284293, find 109183/161893
Free swap  = 0kB
Total swap = 3145724kB
2097151 pages RAM
82361 pages reserved
28737 pages shared
1970987 pages non-shared
[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[  641]     0   641     2776      105   1     -17         -1000 udevd
[ 1362]     0  1362     2281       83   2       0             0 dhclient
[ 1426]     0  1426     6900      158   2     -17         -1000 auditd
[ 1494]     0  1494     4578      139   2       0             0 irqbalance
[ 1512]    32  1512     4746      148   2       0             0 rpcbind
[ 1568]    81  1568    24364      226   2       0             0 dbus-daemon
[ 1590]     0  1590    47244      238   0       0             0 cupsd
[ 1622]     0  1622     1021      133   0       0             0 acpid
[ 1634]    68  1634     9500      305   2       0             0 hald
[ 1635]     0  1635     5101      132   0       0             0 hald-runner
[ 1667]     0  1667     5631      118   3       0             0 hald-addon-inpu
[ 1674]    68  1674     4503      161   3       0             0 hald-addon-acpi
[ 1701]     0  1701    96537      251   0       0             0 automount
[ 6072]     0  6072     1698       35   1       0             0 mcelog
[ 6089]     0  6089    16560       89   2     -17         -1000 sshd
[ 6168]     0  6168    20226      287   2       0             0 master
[ 6172]    89  6172    20289      273   2       0             0 qmgr
[ 6197]     0  6197    45773      220   2       0             0 abrtd
[ 6224]     0  6224     5278       71   2       0             0 atd
[ 6238]     0  6238    25235      113   1       0             0 rhnsd
[ 6250]     0  6250    27088       99   2       0             0 rhsmcertd
[ 6267]     0  6267    16092      105   2       0             0 certmonger
[ 6312]     0  6312     1017      115   2       0             0 mingetty
[ 6314]     0  6314     1017      115   2       0             0 mingetty
[ 6316]     0  6316     1017      115   0       0             0 mingetty
[ 6318]     0  6318     1017      115   2       0             0 mingetty
[ 6320]     0  6320     1017      115   0       0             0 mingetty
[ 6322]     0  6322     1017      115   2       0             0 mingetty
[ 8731]     0  8731    29221      197   2       0             0 crond
[ 8756]     0  8756    62806      244   0       0             0 rsyslogd
[ 9346]     0  9346   167548      405   0       0             0 glusterd
[10370]     0 10370   373001      555   1       0             0 glusterfsd
[10473]     0 10473   211916      436   3       0             0 glusterfs
[11607]     0 11607    15925      197   3       0             0 check_gluster_s
[25769]     0 25769   175887     7686   0     -17         -1000 dmeventd
[27899]     0 27899    44284      284   2       0             0 tuned
[ 8055]     0  8055     5852      186   1       0             0 rpc.statd
[26157]     0 26157    25553      340   2       0             0 sshd
[26258]     0 26258    27089      204   2       0             0 bash
[29145]     0 29145   320209     3119   0       0             0 glusterfsd
[29165]     0 29165   320210     3635   0       0             0 glusterfsd
[29185]     0 29185   319695     4118   3       0             0 glusterfsd
[29206]     0 29206   263494      945   1       0             0 glusterfs
[31559]     0 31559     2973      100   0     -17         -1000 udevd
[31904]     0 31904  3793222  1859244   1       0             0 ganesha.nfsd
[32496]     0 32496     2972       97   2     -17         -1000 udevd
[  444]     0   444   152050    19556   2       0             0 corosync
[  510]     0   510    49291      143   2     -16          -941 fenced
[  525]     0   525    52509      138   1     -16          -941 dlm_controld
[  589]     0   589    32284      133   2     -16          -941 gfs_controld
[  751]     0   751    20100      313   2       0             0 pacemakerd
[  758]   189   758    23995     1230   2       0             0 cib
[  759]     0   759    23889      346   2       0             0 stonithd
[  760]     0   760    15627      365   2       0             0 lrmd
[  761]   189   761    21431      392   2       0             0 attrd
[  762]   189   762    29577      740   0       0             0 pengine
[  763]     0   763    34234      737   2       0             0 crmd
[  790]     0   790    35543    10087   3       0             0 ruby
[ 3248]     0  3248    45176     1674   2       0             0 pcs
Out of memory: Kill process 31904 (ganesha.nfsd) score 884 or sacrifice child
Killed process 31904, UID 0, (ganesha.nfsd) total-vm:15172888kB,
anon-rss:7435736kB, file-rss:1240kB


Actual results:

OOM kill of nfs-ganesha on one node while posix_compliance test suite is
executed.

Expected results:

There should not be any OOM kills.

Additional info:

sosreports will be attached.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-10-04
03:01:08 EDT ---

This bug is automatically being proposed for the current release of Red Hat
Gluster Storage 3 under active development, by setting the release flag
'rhgs-3.2.0' to '?'.

If this bug should be proposed for a different release, please manually change
the proposed release flag.

--- Additional comment from Shashank Raj on 2016-10-04 06:05:30 EDT ---

sosreports and logs can be accessed at
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1381452

--- Additional comment from Soumya Koduri on 2016-10-05 01:57:50 EDT ---

Shashank,

Please turn off features.cache-invalidation for that volume and re-run the
tests. If the oom_score of the ganesha process is still high after the tests
complete, please tune the number of nfs-ganesha worker threads to 16 using
the config option below and re-try the tests.

NFS_Core_Param
{
        Nb_Worker = 16;
}

--- Additional comment from Shashank Raj on 2016-10-05 04:37:24 EDT ---

Tried running posix_compliance again with features.cache-invalidation both on
and off, and I am not able to reproduce this issue again.

So it seems some other test in fs-sanity is the culprit for this issue.

Will keep trying it and will update the bug accordingly. For now, changing the
bug title as appropriate.

--- Additional comment from John Skeoch on 2016-11-07 22:54:17 EST ---

User sraj at redhat.com's account has been closed

--- Additional comment from John Skeoch on 2016-11-07 22:57:23 EST ---

User sraj at redhat.com's account has been closed

--- Additional comment from Soumya Koduri on 2016-11-08 04:16:26 EST ---

(In reply to Shashank Raj from comment #4)
> Tried running posix_compliance again with both features.cache-invalidation
> on/off and i am not able to reproduce this issue again.
> 
> So it seems some other test in fs-sanity is the culprit for this issue.
> 
> Will keep trying it and update bug accordingly. For now changing the bug
> title as appropriate.

Surabhi,

Could you please check the same and update the bug with the details of the test
which may be causing this issue.

--- Additional comment from Arthy Loganathan on 2016-11-17 09:59:22 EST ---

For a 6X2 volume, while executing posix_compliance tests, ganesha always gets
oom_killed on the mounted node once the oom_score reaches ~870.

sosreports are at, http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1381452/

--- Additional comment from Soumya Koduri on 2016-11-17 10:01:58 EST ---

Does that mean the issue is not seen with a smaller number of replica bricks
(for example, a 2*2 volume configuration)?

--- Additional comment from Arthy Loganathan on 2016-11-17 22:37:04 EST ---

I have tried volumes with fewer bricks, such as a plain distribute volume with
2 bricks and a 1*2 volume, and the issue is not seen.

As Jiffin suggested, I have executed the following test with a 6*2 volume:

prove -vf /opt/qa/tools/posix-testsuite/tests/rename/00.t

and the oom_score increases drastically while this test is running.

dmesg:

[248560.640500] Call Trace:
[248560.640511]  [<ffffffff81685eac>] dump_stack+0x19/0x1b
[248560.640516]  [<ffffffff81680e57>] dump_header+0x8e/0x225
[248560.640523]  [<ffffffff812ae71b>] ? cred_has_capability+0x6b/0x120
[248560.640530]  [<ffffffff8113cb03>] ? delayacct_end+0x33/0xb0
[248560.640537]  [<ffffffff8118460e>] oom_kill_process+0x24e/0x3c0
[248560.640542]  [<ffffffff810936ce>] ? has_capability_noaudit+0x1e/0x30
[248560.640545]  [<ffffffff81184e46>] out_of_memory+0x4b6/0x4f0
[248560.640548]  [<ffffffff81681960>] __alloc_pages_slowpath+0x5d7/0x725
[248560.640552]  [<ffffffff8118af55>] __alloc_pages_nodemask+0x405/0x420
[248560.640556]  [<ffffffff811cf10a>] alloc_pages_current+0xaa/0x170
[248560.640563]  [<ffffffff8106a587>] pte_alloc_one+0x17/0x40
[248560.640568]  [<ffffffff811adb23>] __pte_alloc+0x23/0x170
[248560.640571]  [<ffffffff811b1535>] handle_mm_fault+0xe25/0xfe0
[248560.640574]  [<ffffffff811b76d5>] ? do_mmap_pgoff+0x305/0x3c0
[248560.640579]  [<ffffffff81691994>] __do_page_fault+0x154/0x450
[248560.640581]  [<ffffffff81691cc5>] do_page_fault+0x35/0x90
[248560.640584]  [<ffffffff8168df88>] page_fault+0x28/0x30
[248560.640586] Mem-Info:
[248560.640591] active_anon:1620957 inactive_anon:292997 isolated_anon:0
 active_file:0 inactive_file:974 isolated_file:0
 unevictable:6562 dirty:0 writeback:0 unstable:0
 slab_reclaimable:7116 slab_unreclaimable:13556
 mapped:5683 shmem:8641 pagetables:7532 bounce:0
 free:25150 free_pcp:474 free_cma:0
[248560.640595] Node 0 DMA free:15852kB min:132kB low:164kB high:196kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15936kB
managed:15852kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB
unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[248560.640602] lowmem_reserve[]: 0 3327 7805 7805
[248560.640605] Node 0 DMA32 free:46200kB min:28752kB low:35940kB high:43128kB
active_anon:2727832kB inactive_anon:545796kB active_file:0kB
inactive_file:2536kB unevictable:16040kB isolated(anon):0kB isolated(file):0kB
present:3653620kB managed:3408880kB mlocked:16040kB dirty:0kB writeback:0kB
mapped:16596kB shmem:12260kB slab_reclaimable:9708kB slab_unreclaimable:24740kB
kernel_stack:4944kB pagetables:11448kB unstable:0kB bounce:0kB free_pcp:792kB
local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:285
all_unreclaimable? yes
[248560.640611] lowmem_reserve[]: 0 0 4478 4478
[248560.640613] Node 0 Normal free:38548kB min:38696kB low:48368kB high:58044kB
active_anon:3755996kB inactive_anon:626192kB active_file:0kB
inactive_file:1360kB unevictable:10208kB isolated(anon):0kB isolated(file):0kB
present:4718592kB managed:4585756kB mlocked:10208kB dirty:0kB writeback:0kB
mapped:6136kB shmem:22304kB slab_reclaimable:18756kB slab_unreclaimable:29484kB
kernel_stack:7872kB pagetables:18680kB unstable:0kB bounce:0kB free_pcp:1104kB
local_pcp:160kB free_cma:0kB writeback_tmp:0kB pages_scanned:1049
all_unreclaimable? yes
[248560.640618] lowmem_reserve[]: 0 0 0 0
[248560.640620] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U)
1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) =
15852kB
[248560.640630] Node 0 DMA32: 1564*4kB (UE) 956*8kB (UE) 723*16kB (UEM)
388*32kB (UEM) 120*64kB (UEM) 5*128kB (EM) 0*256kB 0*512kB 0*1024kB 0*2048kB
0*4096kB = 46208kB
[248560.640639] Node 0 Normal: 1174*4kB (UEM) 1024*8kB (UEM) 691*16kB (UEM)
305*32kB (UEM) 53*64kB (UEM) 9*128kB (M) 1*256kB (M) 0*512kB 0*1024kB 0*2048kB
0*4096kB = 38504kB
[248560.640649] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=2048kB
[248560.640651] 18106 total pagecache pages
[248560.640653] 6146 pages in swap cache
[248560.640654] Swap cache stats: add 1107386, delete 1101240, find
294696/305552
[248560.640655] Free swap  = 0kB
[248560.640656] Total swap = 2097148kB
[248560.640657] 2097037 pages RAM
[248560.640658] 0 pages HighMem/MovableOnly
[248560.640659] 94415 pages reserved
[248560.640660] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents
oom_score_adj name
[248560.640666] [  685]     0   685    17664     2179      39       49         
   0 systemd-journal
[248560.640668] [  716]     0   716   220817      676      46     1476         
   0 lvmetad
[248560.640671] [  722]     0   722    11679      635      22      546        
-1000 systemd-udevd
[248560.640675] [  881]     0   881   179084     6113      49        0        
-1000 dmeventd
[248560.640685] [ 1273]     0  1273    13854      234      26       89        
-1000 auditd
[248560.640688] [ 1292]     0  1292     4826      217      14       37         
   0 irqbalance
[248560.640690] [ 1293]    81  1293     8197      262      17       71         
-900 dbus-daemon
[248560.640692] [ 1296]     0  1296     6156      261      15      138         
   0 systemd-logind
[248560.640695] [ 1299]   998  1299   132067      351      54     1894         
   0 polkitd
[248560.640697] [ 1310]   997  1310    28962      310      26       42         
   0 chronyd
[248560.640699] [ 1311]    32  1311    16237      175      34      104         
   0 rpcbind
[248560.640701] [ 1322]     0  1322    50303      142      40      114         
   0 gssproxy
[248560.640704] [ 1334]     0  1334    82865      469      84     5904         
   0 firewalld
[248560.640706] [ 1691]     0  1691    28206      115      52     3081         
   0 dhclient
[248560.640708] [ 1785]     0  1785    28335       98      12       37         
   0 rhsmcertd
[248560.640710] [ 1787]     0  1787   138291      385      87     2567         
   0 tuned
[248560.640712] [ 1798]     0  1798    20617       91      42      190        
-1000 sshd
[248560.640715] [ 1916]     0  1916    22244      222      42      238         
   0 master
[248560.640717] [ 1918]    89  1918    22287      245      44      236         
   0 qmgr
[248560.640719] [ 2334]     0  2334    31556      209      17      133         
   0 crond
[248560.640721] [ 2385]     0  2385    26978      101       8       37         
   0 rhnsd
[248560.640723] [ 2388]     0  2388    27509      164      10       33         
   0 agetty
[248560.640726] [17375]    29 17375    10605      230      24      177         
   0 rpc.statd
[248560.640728] [16763]     0 16763    72838     1270      59      105         
   0 rsyslogd
[248560.640730] [16951]     0 16951   151619      470      86    12040         
   0 glusterd
[248560.640733] [27747]     0 27747   428530     2595     125    10071         
   0 glusterfsd
[248560.640735] [27962]     0 27962   226969     5025      89     6433         
   0 glusterfs
[248560.640737] [29536]     0 29536    49589     2611      63     2017         
   0 corosync
[248560.640739] [29552]     0 29552    33157      377      64     1026         
   0 pacemakerd
[248560.640741] [29554]   189 29554    35595     2224      72     1416         
   0 cib
[248560.640744] [29555]     0 29555    34361      885      69      479         
   0 stonithd
[248560.640746] [29556]     0 29556    26273      371      52      228         
   0 lrmd
[248560.640748] [29557]   189 29557    31731      940      64      345         
   0 attrd
[248560.640750] [29558]   189 29558    38963     2038      71      241         
   0 pengine
[248560.640752] [29559]   189 29559    47014     2147      79      880         
   0 crmd
[248560.640754] [29577]     0 29577   244360     8064      98     2064         
   0 pcsd
[248560.640757] [ 6278]     0  6278  3262386  1857506    4856   406101         
   0 ganesha.nfsd
[248560.640759] [22343]     0 22343    35726      306      72      290         
   0 sshd
[248560.640761] [22358]     0 22358    28879      278      14       48         
   0 bash
[248560.640764] [27763]     0 27763   330732     1777     113     8853         
   0 glusterfsd
[248560.640767] [27785]     0 27785   330733     2281     115     9062         
   0 glusterfsd
[248560.640769] [27804]     0 27804   314090     3314     110     8708         
   0 glusterfsd
[248560.640771] [27836]     0 27836   255249     6445     106    14561         
   0 glusterfs
[248560.640773] [22453]    89 22453    22270      479      42        0         
   0 pickup
[248560.640776] [ 5088]     0  5088    35726      635      71        0         
   0 sshd
[248560.640778] [ 5111]     0  5111    28879      319      15        0         
   0 bash
[248560.640780] [11672]     0 11672    26984      136      10        0         
   0 tail
[248560.640782] [14710]     0 14710    35726      581      72        0         
   0 sshd
[248560.640784] [14745]     0 14745    28879      311      14        0         
   0 bash
[248560.640787] [15813]     0 15813    28910      333      14        0         
   0 ganesha_grace
[248560.640789] [15819]     0 15819    28910      180      10        0         
   0 ganesha_grace
[248560.640791] [15820]     0 15820    30197      552      62        0         
   0 crm_attribute
[248560.640793] [15821]     0 15821    28877      274      14        0         
   0 portblock
[248560.640795] [15824]     0 15824    28811      185      14        0         
   0 ganesha_mon
[248560.640797] [15825]     0 15825    28877      125      10        0         
   0 portblock
[248560.640800] [15826]     0 15826    28811       98      11        0         
   0 ganesha_mon
[248560.640801] [15827]     0 15827    26974      127      10        0         
   0 basename
[248560.640803] Out of memory: Kill process 6278 (ganesha.nfsd) score 870 or
sacrifice child
[248560.640886] Killed process 6278 (ganesha.nfsd) total-vm:13049544kB,
anon-rss:7430024kB, file-rss:0kB, shmem-rss:0kB

--- Additional comment from Worker Ant on 2016-11-21 09:07:18 EST ---

REVIEW: http://review.gluster.org/15894 (dht/rename : Incase of failure remove
linkto file properly) posted (#1) for review on master by jiffin tony Thottan
(jthottan at redhat.com)

--- Additional comment from Worker Ant on 2016-12-01 00:31:55 EST ---

REVIEW: http://review.gluster.org/15894 (dht/rename : Incase of failure remove
linkto file properly) posted (#2) for review on master by jiffin tony Thottan
(jthottan at redhat.com)

--- Additional comment from Jiffin on 2016-12-01 00:35:05 EST ---

The above issue happens when the rename/00.t test is executed on nfs-ganesha
clients. Steps executed in that script:
 * create a file using root
 * rename the file using a non-root user; it fails with EACCES
 * delete the file
 * create a directory using root
 * rename the directory using a non-root user; the test hangs and slowly leads
to an OOM kill of ganesha
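The failure mode in those steps can be sketched with a toy permission model
(Python; the function and names here are invented for illustration and are
NOT the real DHT code). It models only the interaction the fix addresses:
DHT's linkto file is created as root, so when the non-root caller's rename
fails with EACCES, a cleanup unlink attempted with the caller's credentials
fails too, leaving a stale root-owned linkto file behind:

```python
# Illustrative sketch only -- not GlusterFS code. `files` maps name -> owner
# uid; a ".linkto" suffix stands in for DHT's linkto file at the destination.
ROOT = 0

def rename_with_cleanup(files, src, dst, caller_uid, cleanup_as_root):
    """Attempt a rename; on failure, try to remove the linkto file.

    Returns (error, stale_linkto_left).
    """
    # DHT first creates a linkto file at the destination's hashed subvol,
    # owned by root.
    files[dst + ".linkto"] = ROOT
    # The rename itself runs with the caller's credentials and fails because
    # the caller does not own the source file.
    if caller_uid != files[src]:
        if cleanup_as_root or caller_uid == files[dst + ".linkto"]:
            del files[dst + ".linkto"]     # cleanup succeeds
            return "EACCES", False
        return "EACCES", True              # stale linkto left behind
    files[dst] = files.pop(src)
    return None, False

# Buggy behaviour: cleanup runs with the non-root caller's credentials.
files = {"abc": ROOT}
err, stale = rename_with_cleanup(files, "abc", "xyz", caller_uid=1000,
                                 cleanup_as_root=False)
# err == "EACCES", stale is True: the root-owned linkto survives.

# Behaviour intended by the fix: remove the linkto file properly on failure.
files2 = {"abc": ROOT}
err2, stale2 = rename_with_cleanup(files2, "abc", "xyz", caller_uid=1000,
                                   cleanup_as_root=True)
# err2 == "EACCES", stale2 is False: no stale linkto remains.
```

With the stale linkto left behind, a directory later created under the same
name exists as a directory on most subvols but as a linkto file on one, which
is exactly the mixed dentry state the RCA below describes.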

RCA put forward by Du for the OOM kill of ganesha:
Note that when we hit this bug, we have a scenario of a dentry being present
as:
 * a linkto file on one subvol
 * a directory on the rest of the subvols

When a lookup happens on the dentry in such a scenario, the control flow goes
into an infinite loop of:
dht_lookup_everywhere
dht_lookup_everywhere_cbk
dht_lookup_unlink_cbk
dht_lookup_everywhere_done
dht_lookup_directory (as local->dir_count > 0)
dht_lookup_dir_cbk (sets local->need_selfheal = 1 as the entry is a linkto
file on one of the subvols)
dht_lookup_everywhere (as need_selfheal = 1)

This infinite loop can cause increased consumption of memory due to:
1) dht_lookup_directory assigns a new layout to local->layout unconditionally
2) Most of the functions in this loop do a stack_wind of various fops.

This results in a growing call stack (note that the call stack is destroyed
only after the lookup response is received by FUSE, which never happens in
this case).
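The looping call sequence above can be sketched as a toy state machine
(Python; the state names mirror the functions in the RCA, but the transition
rules are a simplification invented for this example, not GlusterFS code).
Each pass through dht_lookup_directory allocates a fresh layout, and the loop
never reaches a state that replies to the client:

```python
# Illustrative simulation of the looping control flow described in the RCA.
def simulate_lookup(max_steps):
    """Follow the transitions for max_steps; return (states_visited, layouts)."""
    transitions = {
        "dht_lookup_everywhere": "dht_lookup_everywhere_cbk",
        "dht_lookup_everywhere_cbk": "dht_lookup_unlink_cbk",
        "dht_lookup_unlink_cbk": "dht_lookup_everywhere_done",
        "dht_lookup_everywhere_done": "dht_lookup_directory",  # dir_count > 0
        "dht_lookup_directory": "dht_lookup_dir_cbk",
        "dht_lookup_dir_cbk": "dht_lookup_everywhere",  # need_selfheal = 1
    }
    state, layouts, visited = "dht_lookup_everywhere", 0, []
    for _ in range(max_steps):
        visited.append(state)
        if state == "dht_lookup_directory":
            layouts += 1  # a new layout is assigned unconditionally
        state = transitions[state]
    return visited, layouts

visited, layouts = simulate_lookup(60)
# The six-state cycle repeats forever; there is no terminal "reply to client"
# state, so layouts (and the wound call frames) grow without bound.
```

In 60 simulated steps the cycle completes ten times and allocates ten
layouts; in the real translator the analogous growth in layouts and wound
call frames is what drives ganesha.nfsd toward the OOM kill seen above.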

--- Additional comment from Worker Ant on 2016-12-01 01:10:08 EST ---

REVIEW: http://review.gluster.org/15894 (dht/rename : Incase of failure remove
linkto file properly) posted (#3) for review on master by jiffin tony Thottan
(jthottan at redhat.com)

--- Additional comment from Worker Ant on 2016-12-01 10:47:38 EST ---

COMMIT: http://review.gluster.org/15894 committed in master by Vijay Bellur
(vbellur at redhat.com) 
------
commit 57d59f4be205ae0c7888758366dc0049bdcfe449
Author: Jiffin Tony Thottan <jthottan at redhat.com>
Date:   Mon Nov 21 18:08:14 2016 +0530

    dht/rename : Incase of failure remove linkto file properly

    Generally linkto file is created using root user. Consider following
    case, a user is trying to rename a file which he is not permitted.
    So the rename fails with EACESS and when rename tries to cleanup the
    linkto file, it fails.

    The above issue happens when rename/00.t test executed on nfs-ganesha
    clients :
    Steps executed in script
    * create a file "abc" using root
    * rename the file "abc" to "xyz" using a non root user, it fails with
EACESS
    * delete "abc"
    * create directory "abc" using root
    * again try ot rename "abc" to "xyz" using non root user, test hungs here
    which slowly leds to OOM kill of ganesha process

    RCA put forwarded by Du for OOM kill of ganesha
    Note that when we hit this bug, we've a scenario of a dentry being
    present as:
        * a linkto file on one subvol
        * a directory on rest of subvols

    When a lookup happens on the dentry in such a scenario, the control flow
    goes into an infinite loop of:

        dht_lookup_everywhere
        dht_lookup_everywhere_cbk
        dht_lookup_unlink_cbk
        dht_lookup_everywhere_done
        dht_lookup_directory (as local->dir_count > 0)
        dht_lookup_dir_cbk (sets to local->need_selfheal = 1 as the entry is a
linkto file on one of the subvol)
        dht_lookup_everywhere (as need_selfheal = 1).

    This infinite loop can cause increased consumption of memory due to:
    1) dht_lookup_directory assigns a new layout to local->layout
unconditionally
    2)  Most of the functions in this loop do a stack_wind of various fops.

    This results in growing of call stack (note that call-stack is destroyed
only after lookup response is
    received by fuse - which never happens in this case)

    Thanks Du for root causing the oom kill and Sushant for suggesting the fix

    Change-Id: I1e16bc14aa685542afbd21188426ecb61fd2689d
    BUG: 1397052
    Signed-off-by: Jiffin Tony Thottan <jthottan at redhat.com>
    Reviewed-on: http://review.gluster.org/15894
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Raghavendra G <rgowdapp at redhat.com>

--- Additional comment from Jiffin on 2016-12-02 09:37:08 EST ---

The patch got merged in upstream master.

Downstream patch link
https://code.engineering.redhat.com/gerrit/#/c/92002/1


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1381452
[Bug 1381452] OOM kill of nfs-ganesha on one node while fs-sanity test
suite is executed.
https://bugzilla.redhat.com/show_bug.cgi?id=1397052
[Bug 1397052] OOM kill of nfs-ganesha on one node while fs-sanity test
suite is executed.
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
