Yes, I have enabled bitrot, and the signer process is currently running on some nodes.

Disabling and re-enabling bitrot doesn't make a difference; it will just start the crawl process again, right?
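Just so I have the workaround straight, this is roughly the sequence I understand (a sketch only, not yet run on the cluster; <volname> is a placeholder for our volume name):

# rough sketch of the suggested workaround
gluster volume bitrot <volname> disable        # stop the bitrot daemon/signer first
gluster volume start <volname> force           # respawn the killed self-heal daemon
gluster volume bitrot <volname> enable         # turn bitrot signing back on
gluster volume bitrot <volname> scrub status   # check the bitrot/scrub state afterwards

My worry is that the last two steps put the signer crawl back to square one on every node.
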
On Tuesday, April 25, 2017, Atin Mukherjee <amukherj@redhat.com> wrote:
>
>
> On Tue, Apr 25, 2017 at 9:22 PM, Amudhan P <amudhan83@gmail.com> wrote:
>>
>> Hi Pranith,
>> If I restart the glusterd service on that node alone, will it work? Because I feel that doing volume start force will trigger the bitrot process to crawl the disks on all nodes.
>
> Have you enabled bitrot? If not, the process will not exist. As a workaround you can always disable this option before executing volume start force. Please note that volume start force doesn't affect any running processes.
>
>>
>> Yes, a rebalance fix-layout is in progress.
>> Regards,
>> Amudhan
>>
>> On Tue, Apr 25, 2017 at 9:15 PM, Pranith Kumar Karampuri <pkarampu@redhat.com> wrote:
>>>
>>> You can restart the process using:
>>> gluster volume start <volname> force
>>>
>>> Did shd on this node heal a lot of data? Based on the kind of memory usage it showed, it seems like there is a leak.
>>>
>>>
>>> Sunil,
>>>       Could you check whether there are any leaks in this particular version that we might have missed in our testing?
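If the restarted shd starts growing again, I can capture a statedump from it before it gets OOM-killed so the leak can be inspected. A rough sketch of what I would run on the affected node (assuming pgrep can find the glustershd process and that statedumps go to the default /var/run/gluster directory):

SHD_PID=$(pgrep -f glustershd)   # pid of the glustershd (glusterfs) process on this node
kill -USR1 "$SHD_PID"            # SIGUSR1 asks a gluster process to write a statedump
ls /var/run/gluster/             # a glusterdump.<pid>.dump.* file should appear here

Would a statedump like that help track this down?
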
>>>
>>> On Tue, Apr 25, 2017 at 8:37 PM, Amudhan P <amudhan83@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> On one of my nodes the glustershd process was killed due to OOM; this happened on only one node out of a 40-node cluster.
>>>> The node is running Ubuntu 16.04.2.
>>>>
>>>> dmesg output:
>>>> [Mon Apr 24 17:21:38 2017] nrpe invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
>>>> [Mon Apr 24 17:21:38 2017] nrpe cpuset=/ mems_allowed=0
>>>> [Mon Apr 24 17:21:38 2017] CPU: 0 PID: 12626 Comm: nrpe Not tainted 4.4.0-62-generic #83-Ubuntu
>>>> [Mon Apr 24 17:21:38 2017]  0000000000000286 00000000fc26b170 ffff88048bf27af0 ffffffff813f7c63
>>>> [Mon Apr 24 17:21:38 2017]  ffff88048bf27cc8 ffff88082a663c00 ffff88048bf27b60 ffffffff8120ad4e
>>>> [Mon Apr 24 17:21:38 2017]  ffff88087781a870 ffff88087781a860 ffffea0011285a80 0000000100000001
>>>> [Mon Apr 24 17:21:38 2017] Call Trace:
>>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff813f7c63>] dump_stack+0x63/0x90
>>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff8120ad4e>] dump_header+0x5a/0x1c5
>>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff811926c2>] oom_kill_process+0x202/0x3c0
>>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81192ae9>] out_of_memory+0x219/0x460
>>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81198a5d>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
>>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81198e56>] __alloc_pages_nodemask+0x286/0x2a0
>>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81198f0b>] alloc_kmem_pages_node+0x4b/0xc0
>>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
>>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff8122d013>] ? __fd_install+0x33/0xe0
>>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81713d01>] ? release_sock+0x111/0x160
>>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff810805a0>] _do_fork+0x80/0x360
>>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff8122429c>] ? SyS_select+0xcc/0x110
>>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff81080929>] SyS_clone+0x19/0x20
>>>> [Mon Apr 24 17:21:38 2017]  [<ffffffff818385f2>] entry_SYSCALL_64_fastpath+0x16/0x71
>>>> [Mon Apr 24 17:21:38 2017] Mem-Info:
>>>> [Mon Apr 24 17:21:38 2017] active_anon:553952 inactive_anon:206987 isolated_anon:0
>>>>               active_file:3410764 inactive_file:3460179 isolated_file:0
>>>>               unevictable:4914 dirty:212868 writeback:0 unstable:0
>>>>               slab_reclaimable:386621 slab_unreclaimable:31829
>>>>               mapped:6112 shmem:211 pagetables:6178 bounce:0
>>>>               free:82623 free_pcp:213 free_cma:0
>>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA free:15880kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15964kB managed:15880kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
>>>> [Mon Apr 24 17:21:38 2017] lowmem_reserve[]: 0 1868 31944 31944 31944
>>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA32 free:133096kB min:3948kB low:4932kB high:5920kB active_anon:170764kB inactive_anon:206296kB active_file:394236kB inactive_file:525288kB unevictable:980kB isolated(anon):0kB isolated(file):0kB present:2033596kB managed:1952976kB mlocked:980kB dirty:1552kB writeback:0kB mapped:3904kB shmem:724kB slab_reclaimable:502176kB slab_unreclaimable:8916kB kernel_stack:1952kB pagetables:1408kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>> [Mon Apr 24 17:21:38 2017] lowmem_reserve[]: 0 0 30076 30076 30076
>>>> [Mon Apr 24 17:21:38 2017] Node 0 Normal free:181516kB min:63600kB low:79500kB high:95400kB active_anon:2045044kB inactive_anon:621652kB active_file:13248820kB inactive_file:13315428kB unevictable:18676kB isolated(anon):0kB isolated(file):0kB present:31322112kB managed:30798036kB mlocked:18676kB dirty:849920kB writeback:0kB mapped:20544kB shmem:120kB slab_reclaimable:1044308kB slab_unreclaimable:118400kB kernel_stack:33792kB pagetables:23304kB unstable:0kB bounce:0kB free_pcp:852kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>> [Mon Apr 24 17:21:38 2017] lowmem_reserve[]: 0 0 0 0 0
>>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
>>>> [Mon Apr 24 17:21:38 2017] Node 0 DMA32: 18416*4kB (UME) 7480*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 133504kB
>>>> [Mon Apr 24 17:21:38 2017] Node 0 Normal: 44972*4kB (UMEH) 13*8kB (EH) 13*16kB (H) 13*32kB (H) 8*64kB (H) 2*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 181384kB
>>>> [Mon Apr 24 17:21:38 2017] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
>>>> [Mon Apr 24 17:21:38 2017] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>>>> [Mon Apr 24 17:21:38 2017] 6878703 total pagecache pages
>>>> [Mon Apr 24 17:21:38 2017] 2484 pages in swap cache
>>>> [Mon Apr 24 17:21:38 2017] Swap cache stats: add 3533870, delete 3531386, find 3743168/4627884
>>>> [Mon Apr 24 17:21:38 2017] Free swap  = 14976740kB
>>>> [Mon Apr 24 17:21:38 2017] Total swap = 15623164kB
>>>> [Mon Apr 24 17:21:38 2017] 8342918 pages RAM
>>>> [Mon Apr 24 17:21:38 2017] 0 pages HighMem/MovableOnly
>>>> [Mon Apr 24 17:21:38 2017] 151195 pages reserved
>>>> [Mon Apr 24 17:21:38 2017] 0 pages cma reserved
>>>> [Mon Apr 24 17:21:38 2017] 0 pages hwpoisoned
>>>> [Mon Apr 24 17:21:38 2017] [ pid ]  uid  tgid total_vm    rss nr_ptes nr_pmds swapents oom_score_adj name
>>>> [Mon Apr 24 17:21:38 2017] [  566]   0  566   15064    460    33    3   1108       0 systemd-journal
>>>> [Mon Apr 24 17:21:38 2017] [  602]   0  602   23693    182    16    3     0       0 lvmetad
>>>> [Mon Apr 24 17:21:38 2017] [  613]   0  613   11241    589    21    3    264     -1000 systemd-udevd
>>>> [Mon Apr 24 17:21:38 2017] [ 1381]  100  1381   25081    440    19    3    25       0 systemd-timesyn
>>>> [Mon Apr 24 17:21:38 2017] [ 1447]   0  1447   1100    307    7    3     0       0 acpid
>>>> [Mon Apr 24 17:21:38 2017] [ 1449]   0  1449   7252    374    21    3    47       0 cron
>>>> [Mon Apr 24 17:21:38 2017] [ 1451]   0  1451   77253    994    19    3    10       0 lxcfs
>>>> [Mon Apr 24 17:21:38 2017] [ 1483]   0  1483   6511    413    18    3    42       0 atd
>>>> [Mon Apr 24 17:21:38 2017] [ 1505]   0  1505   7157    286    18    3    36       0 systemd-logind
>>>> [Mon Apr 24 17:21:38 2017] [ 1508]  104  1508   64099    376    27    4    712       0 rsyslogd
>>>> [Mon Apr 24 17:21:38 2017] [ 1510]  107  1510   10723    497    25    3    45      -900 dbus-daemon
>>>> [Mon Apr 24 17:21:38 2017] [ 1521]   0  1521   68970    178    38    3    170       0 accounts-daemon
>>>> [Mon Apr 24 17:21:38 2017] [ 1526]   0  1526   6548    785    16    3    63       0 smartd
>>>> [Mon Apr 24 17:21:38 2017] [ 1528]   0  1528   54412    146    31    5   1806       0 snapd
>>>> [Mon Apr 24 17:21:38 2017] [ 1578]   0  1578   3416    335    11    3    24       0 mdadm
>>>> [Mon Apr 24 17:21:38 2017] [ 1595]   0  1595   16380    470    35    3    157     -1000 sshd
>>>> [Mon Apr 24 17:21:38 2017] [ 1610]   0  1610   69295    303    40    4    57       0 polkitd
>>>> [Mon Apr 24 17:21:38 2017] [ 1618]   0  1618   1306    31    8    3     0       0 iscsid
>>>> [Mon Apr 24 17:21:38 2017] [ 1619]   0  1619   1431    877    8    3     0      -17 iscsid
>>>> [Mon Apr 24 17:21:38 2017] [ 1624]   0  1624  126363   8027   122    4   22441       0 glusterd
>>>> [Mon Apr 24 17:21:38 2017] [ 1688]   0  1688   4884    430    15    3    46       0 irqbalance
>>>> [Mon Apr 24 17:21:38 2017] [ 1699]   0  1699   3985    348    13    3     0       0 agetty
>>>> [Mon Apr 24 17:21:38 2017] [ 7001]   0  7001  500631   27874   145    5   3356       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [ 8136]   0  8136  500631   28760   141    5   2390       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [ 9280]   0  9280  533529   27752   135    5   3200       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [12626]  111 12626   5991    420    16    3    113       0 nrpe
>>>> [Mon Apr 24 17:21:38 2017] [14342]   0 14342  533529   28377   135    5   2176       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [14361]   0 14361  534063   29190   136    5   1972       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [14380]   0 14380  533529   28104   136    6   2437       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [14399]   0 14399  533529   27552   131    5   2808       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [14418]   0 14418  533529   29588   138    5   2697       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [14437]   0 14437  517080   28671   146    5   2170       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [14456]   0 14456  533529   28083   139    5   3359       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [14475]   0 14475  533529   28054   134    5   2954       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [14494]   0 14494  533529   28594   135    5   2311       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [14513]   0 14513  533529   28911   138    5   2833       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [14532]   0 14532  533529   28259   134    6   3145       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [14551]   0 14551  533529   27875   138    5   2267       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [14570]   0 14570  484716   28247   142    5   2875       0 glusterfsd
>>>> [Mon Apr 24 17:21:38 2017] [27646]   0 27646  3697561  202086   2830    17   16528       0 glusterfs
>>>> [Mon Apr 24 17:21:38 2017] [27655]   0 27655  787371   29588   197    6   25472       0 glusterfs
>>>> [Mon Apr 24 17:21:38 2017] [27665]   0 27665  689585    605   108    6   7008       0 glusterfs
>>>> [Mon Apr 24 17:21:38 2017] [29878]   0 29878  193833   36054   241    4   41182       0 glusterfs
>>>> [Mon Apr 24 17:21:38 2017] Out of memory: Kill process 27646 (glusterfs) score 17 or sacrifice child
>>>> [Mon Apr 24 17:21:38 2017] Killed process 27646 (glusterfs) total-vm:14790244kB, anon-rss:795040kB, file-rss:13304kB
>>>>
>>>> /var/log/glusterfs/glusterd.log:
>>>> [2017-04-24 11:53:51.359603] I [MSGID: 106006] [glusterd-svc-mgmt.c:327:glusterd_svc_common_rpc_notify] 0-management: glustershd has disconnected from glusterd.
>>>>
>>>> What could have gone wrong?
>>>>
>>>> Regards,
>>>> Amudhan
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users@gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>> --
>>> Pranith
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>