[Gluster-users] OOM Kills glustershd process in 3.10.1
Amudhan P
amudhan83 at gmail.com
Fri Apr 28 06:45:25 UTC 2017
Thanks for pointing that out, I will check it.
On Thu, Apr 27, 2017 at 1:51 PM, Edvin Ekström <edvin.ekstrom at screen9.com>
wrote:
> I've encountered the same issue; however, in my case it seems to have been
> caused by a kernel bug that was present between 4.4.0-58 and 4.4.0-63 (
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1655842). Seeing that
> you are running 4.4.0-62, I would suggest upgrading and checking whether the
> error persists.
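>
> For reference, a minimal sketch of how that check and upgrade might look on
> Ubuntu 16.04 (the linux-generic metapackage is an assumption; use whatever
> kernel flavour your nodes actually track):
>
>     # kernel currently booted
>     uname -r
>     # pull in the latest packaged kernel and reboot into it
>     sudo apt-get update
>     sudo apt-get install linux-generic
>     sudo reboot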
>
> Edvin Ekström,
>
> On 2017-04-26 09:09, Amudhan P wrote:
>
> I did a volume start force and now the self-heal daemon is up on the node
> that was down.
>
> But bitrot has now triggered the crawling process on all nodes. Why is it
> crawling the disks again if the process was already running?
>
> [output from bitd.log]
> [2017-04-13 06:01:23.930089] I [glusterfsd-mgmt.c:1778:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
> [2017-04-26 06:51:46.998935] I [MSGID: 100030] [glusterfsd.c:2460:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 3.10.1 (args: /usr/local/sbin/glusterfs -s localhost --volfile-id gluster/bitd -p /var/lib/glusterd/bitd/run/bitd.pid -l /var/log/glusterfs/bitd.log -S /var/run/gluster/02f1dd346d47b9006f9bf64e347338fd.socket --global-timer-wheel)
> [2017-04-26 06:51:47.002732] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
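>
> (If it helps, a minimal sketch for checking what the bitrot daemons are
> actually doing, assuming a volume named <volname>; the scrub status command
> has been available since 3.7:)
>
>     # per-node scrub progress and error counts
>     gluster volume bitrot <volname> scrub status
>     # confirm the bitrot and scrubber daemons are listed as online
>     gluster volume status <volname>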
>
>
> On Tue, Apr 25, 2017 at 11:01 PM, Amudhan P <amudhan83 at gmail.com> wrote:
>
>> Yes, I have enabled the bitrot process and it is currently running the
>> signer process on some nodes.
>>
>> Disabling and re-enabling bitrot doesn't make a difference; it will start
>> the crawl process again, right?
>>
>>
>> On Tuesday, April 25, 2017, Atin Mukherjee <amukherj at redhat.com> wrote:
>> >
>> >
>> > On Tue, Apr 25, 2017 at 9:22 PM, Amudhan P <amudhan83 at gmail.com> wrote:
>> >>
>> >> Hi Pranith,
>> >> if I restart the glusterd service on that node alone, will it work?
>> >> Because I feel that doing a volume force start will trigger the bitrot
>> >> process to crawl the disks on all nodes.
>> >
>> > Have you enabled bitrot? If not, then the process will not exist. As a
>> > workaround you can always disable this option before executing volume
>> > start force. Please note that volume start force doesn't affect any
>> > running processes.
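>> >
>> > A minimal sketch of that workaround, assuming a volume named <volname>
>> > (as noted elsewhere in the thread, re-enabling bitrot later is likely to
>> > kick off a fresh crawl):
>> >
>> >     gluster volume bitrot <volname> disable
>> >     gluster volume start <volname> force
>> >     gluster volume bitrot <volname> enable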
>> >
>> >>
>> >> Yes, the rebalance fix-layout is in progress.
>> >> regards
>> >> Amudhan
>> >>
>> >> On Tue, Apr 25, 2017 at 9:15 PM, Pranith Kumar Karampuri <
>> pkarampu at redhat.com> wrote:
>> >>>
>> >>> You can restart the process using:
>> >>> gluster volume start <volname> force
>> >>>
>> >>> Did shd on this node heal a lot of data? Based on the kind of memory
>> >>> usage it showed, it seems like there is a leak.
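>> >>>
>> >>> If it helps, a rough sketch of how the shd can be brought back and a
>> >>> statedump captured for leak analysis (paths are the stock defaults and
>> >>> <volname> is a placeholder):
>> >>>
>> >>>     # respawn the self-heal daemon without disturbing running bricks
>> >>>     gluster volume start <volname> force
>> >>>     # check that the Self-heal Daemon shows up as online again
>> >>>     gluster volume status <volname>
>> >>>     # SIGUSR1 makes gluster processes write a statedump, by default
>> >>>     # under /var/run/gluster, which can be inspected for leaking pools
>> >>>     kill -USR1 $(pgrep -f glustershd)
>> >>>     ls /var/run/gluster/*dump*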
>> >>>
>> >>>
>> >>> Sunil,
>> >>> Could you check whether there are any leaks in this particular version
>> >>> that we might have missed in our testing?
>> >>>
>> >>> On Tue, Apr 25, 2017 at 8:37 PM, Amudhan P <amudhan83 at gmail.com>
>> wrote:
>> >>>>
>> >>>> Hi,
>> >>>> On one of my nodes the glustershd process was killed due to OOM, and
>> >>>> this happened on only one node out of a 40-node cluster.
>> >>>> The node is running Ubuntu 16.04.2.
>> >>>> dmesg output:
>> >>>> [Mon Apr 24 17:21:38 2017] nrpe invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
>> >>>> [Mon Apr 24 17:21:38 2017] nrpe cpuset=/ mems_allowed=0
>> >>>> [Mon Apr 24 17:21:38 2017] CPU: 0 PID: 12626 Comm: nrpe Not tainted 4.4.0-62-generic #83-Ubuntu
>> >>>> [Mon Apr 24 17:21:38 2017]  0000000000000286 00000000fc26b170 ffff88048bf27af0 ffffffff813f7c63
>> >>>> [Mon Apr 24 17:21:38 2017]  ffff88048bf27cc8 ffff88082a663c00 ffff88048bf27b60 ffffffff8120ad4e
>> >>>> [Mon Apr 24 17:21:38 2017]  ffff88087781a870 ffff88087781a860 ffffea0011285a80 0000000100000001
>> >>>> [Mon Apr 24 17:21:38 2017] Call Trace:
>> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff813f7c63>] dump_stack+0x63/0x90
>> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff8120ad4e>] dump_header+0x5a/0x1c5
>> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff811926c2>] oom_kill_process+0x202/0x3c0
>> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81192ae9>] out_of_memory+0x219/0x460
>> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81198a5d>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
>> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81198e56>] __alloc_pages_nodemask+0x286/0x2a0
>> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81198f0b>] alloc_kmem_pages_node+0x4b/0xc0
>> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
>> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff8122d013>] ? __fd_install+0x33/0xe0
>> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81713d01>] ? release_sock+0x111/0x160
>> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff810805a0>] _do_fork+0x80/0x360
>> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff8122429c>] ? SyS_select+0xcc/0x110
>> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81080929>] SyS_clone+0x19/0x20
>> >>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff818385f2>] entry_SYSCALL_64_fastpath+0x16/0x71
>> >>>> [Mon Apr 24 17:21:38 2017] Mem-Info:
>> >>>> [Mon Apr 24 17:21:38 2017] active_anon:553952 inactive_anon:206987 isolated_anon:0
>> >>>>  active_file:3410764 inactive_file:3460179 isolated_file:0
>> >>>>  unevictable:4914 dirty:212868 writeback:0 unstable:0
>> >>>>  slab_reclaimable:386621 slab_unreclaimable:31829
>> >>>>  mapped:6112 shmem:211 pagetables:6178 bounce:0
>> >>>>  free:82623 free_pcp:213 free_cma:0
>> >>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA free:15880kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15964kB managed:15880kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
>> >>>> [Mon Apr 24 17:21:38 2017] lowmem_reserve[]: 0 1868 31944 31944 31944
>> >>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA32 free:133096kB min:3948kB low:4932kB high:5920kB active_anon:170764kB inactive_anon:206296kB active_file:394236kB inactive_file:525288kB unevictable:980kB isolated(anon):0kB isolated(file):0kB present:2033596kB managed:1952976kB mlocked:980kB dirty:1552kB writeback:0kB mapped:3904kB shmem:724kB slab_reclaimable:502176kB slab_unreclaimable:8916kB kernel_stack:1952kB pagetables:1408kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>> >>>> [Mon Apr 24 17:21:38 2017] lowmem_reserve[]: 0 0 30076 30076 30076
>> >>>> [Mon Apr 24 17:21:38 2017] Node 0 Normal free:181516kB min:63600kB low:79500kB high:95400kB active_anon:2045044kB inactive_anon:621652kB active_file:13248820kB inactive_file:13315428kB unevictable:18676kB isolated(anon):0kB isolated(file):0kB present:31322112kB managed:30798036kB mlocked:18676kB dirty:849920kB writeback:0kB mapped:20544kB shmem:120kB slab_reclaimable:1044308kB slab_unreclaimable:118400kB kernel_stack:33792kB pagetables:23304kB unstable:0kB bounce:0kB free_pcp:852kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>> >>>> [Mon Apr 24 17:21:38 2017] lowmem_reserve[]: 0 0 0 0 0
>> >>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
>> >>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA32: 18416*4kB (UME) 7480*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 133504kB
>> >>>> [Mon Apr 24 17:21:38 2017] Node 0 Normal: 44972*4kB (UMEH) 13*8kB (EH) 13*16kB (H) 13*32kB (H) 8*64kB (H) 2*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 181384kB
>> >>>> [Mon Apr 24 17:21:38 2017] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
>> >>>> [Mon Apr 24 17:21:38 2017] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>> >>>> [Mon Apr 24 17:21:38 2017] 6878703 total pagecache pages
>> >>>> [Mon Apr 24 17:21:38 2017] 2484 pages in swap cache
>> >>>> [Mon Apr 24 17:21:38 2017] Swap cache stats: add 3533870, delete 3531386, find 3743168/4627884
>> >>>> [Mon Apr 24 17:21:38 2017] Free swap  = 14976740kB
>> >>>> [Mon Apr 24 17:21:38 2017] Total swap = 15623164kB
>> >>>> [Mon Apr 24 17:21:38 2017] 8342918 pages RAM
>> >>>> [Mon Apr 24 17:21:38 2017] 0 pages HighMem/MovableOnly
>> >>>> [Mon Apr 24 17:21:38 2017] 151195 pages reserved
>> >>>> [Mon Apr 24 17:21:38 2017] 0 pages cma reserved
>> >>>> [Mon Apr 24 17:21:38 2017] 0 pages hwpoisoned
>> >>>> [Mon Apr 24 17:21:38 2017] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
>> >>>> [Mon Apr 24 17:21:38 2017] [  566]     0   566    15064      460      33       3     1108             0 systemd-journal
>> >>>> [Mon Apr 24 17:21:38 2017] [  602]     0   602    23693      182      16       3        0             0 lvmetad
>> >>>> [Mon Apr 24 17:21:38 2017] [  613]     0   613    11241      589      21       3      264         -1000 systemd-udevd
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1381]   100  1381    25081      440      19       3       25             0 systemd-timesyn
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1447]     0  1447     1100      307       7       3        0             0 acpid
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1449]     0  1449     7252      374      21       3       47             0 cron
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1451]     0  1451    77253      994      19       3       10             0 lxcfs
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1483]     0  1483     6511      413      18       3       42             0 atd
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1505]     0  1505     7157      286      18       3       36             0 systemd-logind
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1508]   104  1508    64099      376      27       4      712             0 rsyslogd
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1510]   107  1510    10723      497      25       3       45          -900 dbus-daemon
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1521]     0  1521    68970      178      38       3      170             0 accounts-daemon
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1526]     0  1526     6548      785      16       3       63             0 smartd
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1528]     0  1528    54412      146      31       5     1806             0 snapd
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1578]     0  1578     3416      335      11       3       24             0 mdadm
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1595]     0  1595    16380      470      35       3      157         -1000 sshd
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1610]     0  1610    69295      303      40       4       57             0 polkitd
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1618]     0  1618     1306       31       8       3        0             0 iscsid
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1619]     0  1619     1431      877       8       3        0           -17 iscsid
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1624]     0  1624   126363     8027     122       4    22441             0 glusterd
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1688]     0  1688     4884      430      15       3       46             0 irqbalance
>> >>>> [Mon Apr 24 17:21:38 2017] [ 1699]     0  1699     3985      348      13       3        0             0 agetty
>> >>>> [Mon Apr 24 17:21:38 2017] [ 7001]     0  7001   500631    27874     145       5     3356             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [ 8136]     0  8136   500631    28760     141       5     2390             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [ 9280]     0  9280   533529    27752     135       5     3200             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [12626]   111 12626     5991      420      16       3      113             0 nrpe
>> >>>> [Mon Apr 24 17:21:38 2017] [14342]     0 14342   533529    28377     135       5     2176             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [14361]     0 14361   534063    29190     136       5     1972             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [14380]     0 14380   533529    28104     136       6     2437             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [14399]     0 14399   533529    27552     131       5     2808             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [14418]     0 14418   533529    29588     138       5     2697             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [14437]     0 14437   517080    28671     146       5     2170             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [14456]     0 14456   533529    28083     139       5     3359             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [14475]     0 14475   533529    28054     134       5     2954             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [14494]     0 14494   533529    28594     135       5     2311             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [14513]     0 14513   533529    28911     138       5     2833             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [14532]     0 14532   533529    28259     134       6     3145             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [14551]     0 14551   533529    27875     138       5     2267             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [14570]     0 14570   484716    28247     142       5     2875             0 glusterfsd
>> >>>> [Mon Apr 24 17:21:38 2017] [27646]     0 27646  3697561   202086    2830      17    16528             0 glusterfs
>> >>>> [Mon Apr 24 17:21:38 2017] [27655]     0 27655   787371    29588     197       6    25472             0 glusterfs
>> >>>> [Mon Apr 24 17:21:38 2017] [27665]     0 27665   689585      605     108       6     7008             0 glusterfs
>> >>>> [Mon Apr 24 17:21:38 2017] [29878]     0 29878   193833    36054     241       4    41182             0 glusterfs
>> >>>> [Mon Apr 24 17:21:38 2017] Out of memory: Kill process 27646 (glusterfs) score 17 or sacrifice child
>> >>>> [Mon Apr 24 17:21:38 2017] Killed process 27646 (glusterfs) total-vm:14790244kB, anon-rss:795040kB, file-rss:13304kB
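>> >>>>
>> >>>> (For what it's worth, a minimal sketch for watching whether glustershd's
>> >>>> RSS keeps growing before the OOM killer steps in; the log path is just an
>> >>>> example and the pgrep pattern assumes the default glustershd command line:)
>> >>>>
>> >>>>     # record RSS/VSZ of the self-heal daemon once a minute
>> >>>>     while true; do
>> >>>>         date
>> >>>>         pgrep -f glustershd | xargs -r -I{} ps -o pid,rss,vsz,etime,args -p {}
>> >>>>         sleep 60
>> >>>>     done >> /tmp/glustershd-mem.log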
>> >>>> /var/log/glusterfs/glusterd.log
>> >>>> [2017-04-24 11:53:51.359603] I [MSGID: 106006] [glusterd-svc-mgmt.c:327:glusterd_svc_common_rpc_notify] 0-management: glustershd has disconnected from glusterd.
>> >>>> What could have gone wrong?
>> >>>> regards
>> >>>> Amudhan
>> >>>>
>> >>>> _______________________________________________
>> >>>> Gluster-users mailing list
>> >>>> Gluster-users at gluster.org
>> >>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Pranith
>> >>
>> >>
>> >> _______________________________________________
>> >> Gluster-users mailing list
>> >> Gluster-users at gluster.org
>> >> http://lists.gluster.org/mailman/listinfo/gluster-users
>> >
>> >
>>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>