<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>I've encountered the same issue; in my case, however, it seems to
have been caused by a kernel bug that was present between
4.4.0-58 and 4.4.0-63
(<a class="moz-txt-link-freetext" href="https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1655842">https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1655842</a>).
Since you are running 4.4.0-62, I would suggest upgrading the kernel and
checking whether the error persists. <br>
</p>
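<p>As a rough illustration of the check suggested above, the sketch below
tests whether a kernel release string falls inside the range named in this
message. The helper names are hypothetical, and treating both endpoints
(58 and 63) as affected is an assumption; the linked bug report is the
authority on which builds carry the fix.</p>

```python
import re

# Range as stated in the message above: 4.4.0-58 to 4.4.0-63.
# ASSUMPTION: both endpoints are treated as affected.
AFFECTED_BASE = "4.4.0"
AFFECTED_ABI = range(58, 64)


def parse_release(release):
    """Split e.g. '4.4.0-62-generic' into ('4.4.0', 62); None if it does not match."""
    match = re.match(r"(\d+\.\d+\.\d+)-(\d+)", release)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def is_affected(release):
    """Return True if the release lies in the range named in the message."""
    parsed = parse_release(release)
    if parsed is None:
        return False
    base, abi = parsed
    return base == AFFECTED_BASE and abi in AFFECTED_ABI


if __name__ == "__main__":
    import platform

    running = platform.release()  # same string that `uname -r` prints
    print(running, "affected" if is_affected(running) else "not in the stated range")
```

<p>With the release shown in the quoted dmesg output, 4.4.0-62-generic,
this check reports the kernel as inside the stated range.</p>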
<pre class="moz-signature" cols="72">Edvin Ekström,</pre>
<div class="moz-cite-prefix">On 2017-04-26 09:09, Amudhan P wrote:<br>
</div>
<blockquote
cite="mid:CABhA=2-GMnh1Aw3SyZMNFpHRtv3sx+s_hZ7WEWcHo5N+Hf++Aw@mail.gmail.com"
type="cite">
<div dir="ltr">I ran volume start force, and the self-heal daemon
is now up on the node where it was down.
<div><br>
</div>
<div>But bitrot has now triggered the crawling process on all nodes.
Why is it crawling the disks again if the process was already
running?</div>
<div><br>
</div>
<div>[output from bitd.log]</div>
<div>
<div>[2017-04-13 06:01:23.930089] I
[glusterfsd-mgmt.c:1778:mgmt_getspec_cbk] 0-glusterfs: No
change in volfile, continuing</div>
<div>[2017-04-26 06:51:46.998935] I [MSGID: 100030]
[glusterfsd.c:2460:main] 0-/usr/local/sbin/glusterfs:
Started running /usr/local/sbin/glusterfs version 3.10.1
(args: /usr/local/sbin/glusterfs -s localhost --volfile-id
gluster/bitd -p /var/lib/glusterd/bitd/run/bitd.pid -l
/var/log/glusterfs/bitd.log -S
/var/run/gluster/02f1dd346d47b9006f9bf64e347338fd.socket
--global-timer-wheel)</div>
<div>[2017-04-26 06:51:47.002732] I [MSGID: 101190]
[event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll:
Started thread with index 1</div>
<div><br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Apr 25, 2017 at 11:01 PM,
Amudhan P <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:amudhan83@gmail.com" target="_blank">amudhan83@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">Yes, I have enabled the
bitrot process, and the signer process is currently running
on some nodes.<br>
<br>
Disabling and re-enabling bitrot won't make a difference;
it will start the crawl process again, right?
<div class="gmail-HOEnZb">
<div class="gmail-h5"><br>
<br>
On Tuesday, April 25, 2017, Atin Mukherjee &lt;<a
moz-do-not-send="true"
href="mailto:amukherj@redhat.com" target="_blank">amukherj@redhat.com</a>&gt;
wrote:<br>
><br>
><br>
> On Tue, Apr 25, 2017 at 9:22 PM, Amudhan P &lt;<a
moz-do-not-send="true"
href="mailto:amudhan83@gmail.com" target="_blank">amudhan83@gmail.com</a>&gt;
wrote:<br>
>><br>
>> Hi Pranith,<br>
>> If I restart only the glusterd service on that node,
will it work? I ask because I suspect that volume
start force will trigger the bitrot process to crawl
disks on all nodes.<br>
><br>
> Have you enabled bitrot? If not, the
process will not exist. As a workaround,
you can always disable this option before executing
volume start force. Please note that volume start force
doesn't affect any running processes.<br>
> <br>
>><br>
>> Yes, rebalance fix-layout is in progress.<br>
>> regards<br>
>> Amudhan<br>
>><br>
>> On Tue, Apr 25, 2017 at 9:15 PM, Pranith
Kumar Karampuri &lt;<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;
wrote:<br>
>>><br>
>>> You can restart the process using:<br>
>>> gluster volume start &lt;volname&gt;
force<br>
>>><br>
>>> Did shd on this node heal a lot of
data? Based on the memory usage it showed,
it seems like there is a leak.<br>
>>><br>
>>><br>
>>> Sunil,<br>
>>> Could you check whether there are any
leaks in this particular version that we might have
missed in our testing?<br>
>>><br>
>>> On Tue, Apr 25, 2017 at 8:37 PM,
Amudhan P &lt;<a moz-do-not-send="true"
href="mailto:amudhan83@gmail.com" target="_blank">amudhan83@gmail.com</a>&gt;
wrote:<br>
>>>><br>
>>>> Hi,<br>
>>>> On one of my nodes, the glustershd
process was killed by the OOM killer. This happened on only
one node in a 40-node cluster.<br>
>>>> The node is running Ubuntu 16.04.2.<br>
>>>> dmesg output:<br>
>>>> [Mon Apr 24 17:21:38 2017] nrpe
invoked oom-killer: gfp_mask=0x26000c0, order=2,
oom_score_adj=0<br>
>>>> [Mon Apr 24 17:21:38 2017] nrpe
cpuset=/ mems_allowed=0<br>
>>>> [Mon Apr 24 17:21:38 2017] CPU: 0
PID: 12626 Comm: nrpe Not tainted 4.4.0-62-generic
#83-Ubuntu<br>
>>>> [Mon Apr 24 17:21:38 2017]
0000000000000286 00000000fc26b170 ffff88048bf27af0
ffffffff813f7c63<br>
>>>> [Mon Apr 24 17:21:38 2017]
ffff88048bf27cc8 ffff88082a663c00 ffff88048bf27b60
ffffffff8120ad4e<br>
>>>> [Mon Apr 24 17:21:38 2017]
ffff88087781a870 ffff88087781a860 ffffea0011285a80
0000000100000001<br>
>>>> [Mon Apr 24 17:21:38 2017] Call
Trace:<br>
>>>> [Mon Apr 24 17:21:38 2017]
[&lt;ffffffff813f7c63&gt;] dump_stack+0x63/0x90<br>
>>>> [Mon Apr 24 17:21:38 2017]
[&lt;ffffffff8120ad4e&gt;] dump_header+0x5a/0x1c5<br>
>>>> [Mon Apr 24 17:21:38 2017]
[&lt;ffffffff811926c2&gt;]
oom_kill_process+0x202/0x3c0<br>
>>>> [Mon Apr 24 17:21:38 2017]
[&lt;ffffffff81192ae9&gt;]
out_of_memory+0x219/0x460<br>
>>>> [Mon Apr 24 17:21:38 2017]
[&lt;ffffffff81198a5d&gt;] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70<br>
>>>> [Mon Apr 24 17:21:38 2017]
[&lt;ffffffff81198e56&gt;]
__alloc_pages_nodemask+0x286/0x2a0<br>
>>>> [Mon Apr 24 17:21:38 2017]
[&lt;ffffffff81198f0b&gt;]
alloc_kmem_pages_node+0x4b/0xc0<br>
>>>> [Mon Apr 24 17:21:38 2017]
[&lt;ffffffff8107ea5e&gt;]
copy_process+0x1be/0x1b70<br>
>>>> [Mon Apr 24 17:21:38 2017]
[&lt;ffffffff8122d013&gt;] ? __fd_install+0x33/0xe0<br>
>>>> [Mon Apr 24 17:21:38 2017]
[&lt;ffffffff81713d01&gt;] ?
release_sock+0x111/0x160<br>
>>>> [Mon Apr 24 17:21:38 2017]
[&lt;ffffffff810805a0&gt;] _do_fork+0x80/0x360<br>
>>>> [Mon Apr 24 17:21:38 2017]
[&lt;ffffffff8122429c&gt;] ? SyS_select+0xcc/0x110<br>
>>>> [Mon Apr 24 17:21:38 2017]
[&lt;ffffffff81080929&gt;] SyS_clone+0x19/0x20<br>
>>>> [Mon Apr 24 17:21:38 2017]
[&lt;ffffffff818385f2&gt;]
entry_SYSCALL_64_fastpath+0x16/0x71<br>
>>>> [Mon Apr 24 17:21:38 2017]
Mem-Info:<br>
>>>> [Mon Apr 24 17:21:38 2017]
active_anon:553952 inactive_anon:206987
isolated_anon:0<br>
>>>>
active_file:3410764 inactive_file:3460179
isolated_file:0<br>
>>>>
unevictable:4914 dirty:212868 writeback:0 unstable:0<br>
>>>>
slab_reclaimable:386621 slab_unreclaimable:31829<br>
>>>>
mapped:6112 shmem:211 pagetables:6178 bounce:0<br>
>>>>
free:82623 free_pcp:213 free_cma:0<br>
>>>> [Mon Apr 24 17:21:38 2017] Node 0
DMA free:15880kB min:32kB low:40kB high:48kB
active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:15964kB managed:15880kB
mlocked:0kB dirty:0kB
writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:0kB
kernel_stack:0kB pagetables:0kB
unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? yes<br>
>>>> [Mon Apr 24 17:21:38 2017]
lowmem_reserve[]: 0 1868 31944 31944 31944<br>
>>>> [Mon Apr 24 17:21:38 2017] Node 0
DMA32 free:133096kB min:3948kB low:4932kB
high:5920kB active_anon:170764kB
inactive_anon:206296kB
active_file:394236kB inactive_file:525288kB
unevictable:980kB isolated(anon):0kB
isolated(file):0kB present:2033596kB
managed:1952976kB mlocked:980kB dirty:1552kB
writeback:0kB mapped:3904kB shmem:724kB
slab_reclaimable:502176kB
slab_unreclaimable:8916kB kernel_stack:1952kB
pagetables:1408kB unstable:0kB bounce:0kB
free_pcp:0kB local_pcp:0kB
free_cma:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no<br>
>>>> [Mon Apr 24 17:21:38 2017]
lowmem_reserve[]: 0 0 30076 30076 30076<br>
>>>> [Mon Apr 24 17:21:38 2017] Node 0
Normal free:181516kB min:63600kB low:79500kB
high:95400kB active_anon:2045044kB
inactive_anon:621652kB
active_file:13248820kB inactive_file:13315428kB
unevictable:18676kB isolated(anon):0kB
isolated(file):0kB present:31322112kB
managed:30798036kB mlocked:18676kB dirty:849920kB
writeback:0kB mapped:20544kB shmem:120kB
slab_reclaimable:1044308kB
slab_unreclaimable:118400kB kernel_stack:33792kB
pagetables:23304kB unstable:0kB bounce:0kB
free_pcp:852kB local_pcp:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable?
no<br>
>>>> [Mon Apr 24 17:21:38 2017]
lowmem_reserve[]: 0 0 0 0 0<br>
>>>> [Mon Apr 24 17:21:38 2017] Node 0
DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U)
1*128kB (U) 1*256kB (U) 0*512kB
1*1024kB (U) 1*2048kB (M) 3*4096kB
(M) = 15880kB<br>
>>>> [Mon Apr 24 17:21:38 2017] Node 0
DMA32: 18416*4kB (UME) 7480*8kB (UME) 0*16kB 0*32kB
0*64kB 0*128kB 0*256kB
0*512kB 0*1024kB 0*2048kB 0*4096kB =
133504kB<br>
>>>> [Mon Apr 24 17:21:38 2017] Node 0
Normal: 44972*4kB (UMEH) 13*8kB (EH) 13*16kB (H)
13*32kB (H) 8*64kB (H)
2*128kB (H) 0*256kB 0*512kB 0*1024kB
0*2048kB 0*4096kB = 181384kB<br>
>>>> [Mon Apr 24 17:21:38 2017] Node 0
hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=1048576kB<br>
>>>> [Mon Apr 24 17:21:38 2017] Node 0
hugepages_total=0 hugepages_free=0 hugepages_surp=0
hugepages_size=2048kB<br>
>>>> [Mon Apr 24 17:21:38 2017] 6878703
total pagecache pages<br>
>>>> [Mon Apr 24 17:21:38 2017] 2484
pages in swap cache<br>
>>>> [Mon Apr 24 17:21:38 2017] Swap
cache stats: add 3533870, delete 3531386, find
3743168/4627884<br>
>>>> [Mon Apr 24 17:21:38 2017] Free
swap = 14976740kB<br>
>>>> [Mon Apr 24 17:21:38 2017] Total
swap = 15623164kB<br>
>>>> [Mon Apr 24 17:21:38 2017] 8342918
pages RAM<br>
>>>> [Mon Apr 24 17:21:38 2017] 0 pages
HighMem/MovableOnly<br>
>>>> [Mon Apr 24 17:21:38 2017] 151195
pages reserved<br>
>>>> [Mon Apr 24 17:21:38 2017] 0 pages
cma reserved<br>
>>>> [Mon Apr 24 17:21:38 2017] 0 pages
hwpoisoned<br>
>>>> [Mon Apr 24 17:21:38 2017] [ pid ]
uid tgid total_vm rss nr_ptes nr_pmds
swapents oom_score_adj name<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 566]
0 566 15064 460 33 3
1108 0 systemd-journal<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 602]
0 602 23693 182 16 3
0 0 lvmetad<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 613]
0 613 11241 589 21 3
264 -1000 systemd-udevd<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1381]
100 1381 25081 440 19 3
25 0 systemd-timesyn<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1447]
0 1447 1100 307 7 3
0 0 acpid<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1449]
0 1449 7252 374 21 3
47 0 cron<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1451]
0 1451 77253 994 19 3
10 0 lxcfs<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1483]
0 1483 6511 413 18 3
42 0 atd<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1505]
0 1505 7157 286 18 3
36 0 systemd-logind<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1508]
104 1508 64099 376 27 4
712 0 rsyslogd<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1510]
107 1510 10723 497 25 3
45 -900 dbus-daemon<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1521]
0 1521 68970 178 38 3
170 0 accounts-daemon<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1526]
0 1526 6548 785 16 3
63 0 smartd<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1528]
0 1528 54412 146 31 5
1806 0 snapd<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1578]
0 1578 3416 335 11 3
24 0 mdadm<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1595]
0 1595 16380 470 35 3
157 -1000 sshd<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1610]
0 1610 69295 303 40 4
57 0 polkitd<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1618]
0 1618 1306 31 8 3
0 0 iscsid<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1619]
0 1619 1431 877 8 3
0 -17 iscsid<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1624]
0 1624 126363 8027 122 4
22441 0 glusterd<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1688]
0 1688 4884 430 15 3
46 0 irqbalance<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 1699]
0 1699 3985 348 13 3
0 0 agetty<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 7001]
0 7001 500631 27874 145 5
3356 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 8136]
0 8136 500631 28760 141 5
2390 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [ 9280]
0 9280 533529 27752 135 5
3200 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [12626]
111 12626 5991 420 16 3
113 0 nrpe<br>
>>>> [Mon Apr 24 17:21:38 2017] [14342]
0 14342 533529 28377 135 5
2176 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [14361]
0 14361 534063 29190 136 5
1972 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [14380]
0 14380 533529 28104 136 6
2437 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [14399]
0 14399 533529 27552 131 5
2808 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [14418]
0 14418 533529 29588 138 5
2697 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [14437]
0 14437 517080 28671 146 5
2170 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [14456]
0 14456 533529 28083 139 5
3359 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [14475]
0 14475 533529 28054 134 5
2954 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [14494]
0 14494 533529 28594 135 5
2311 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [14513]
0 14513 533529 28911 138 5
2833 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [14532]
0 14532 533529 28259 134 6
3145 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [14551]
0 14551 533529 27875 138 5
2267 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [14570]
0 14570 484716 28247 142 5
2875 0 glusterfsd<br>
>>>> [Mon Apr 24 17:21:38 2017] [27646]
0 27646 3697561 202086 2830 17
16528 0 glusterfs<br>
>>>> [Mon Apr 24 17:21:38 2017] [27655]
0 27655 787371 29588 197 6
25472 0 glusterfs<br>
>>>> [Mon Apr 24 17:21:38 2017] [27665]
0 27665 689585 605 108 6
7008 0 glusterfs<br>
>>>> [Mon Apr 24 17:21:38 2017] [29878]
0 29878 193833 36054 241 4
41182 0 glusterfs<br>
>>>> [Mon Apr 24 17:21:38 2017] Out of
memory: Kill process 27646 (glusterfs) score 17 or
sacrifice child<br>
>>>> [Mon Apr 24 17:21:38 2017] Killed
process 27646 (glusterfs) total-vm:14790244kB,
anon-rss:795040kB, file-rss:13304kB<br>
>>>> /var/log/glusterfs/glusterd.<wbr>log<br>
>>>> [2017-04-24 11:53:51.359603] I
[MSGID: 106006] [glusterd-svc-mgmt.c:327:<wbr>glusterd_svc_common_rpc_<wbr>notify]
0-management: glustershd has disconnected from
glusterd.<br>
>>>> What could have gone wrong?<br>
>>>> regards<br>
>>>> Amudhan<br>
>>>><br>
>>>> ______________________________<wbr>_________________<br>
>>>> Gluster-users mailing list<br>
>>>> <a moz-do-not-send="true"
href="mailto:Gluster-users@gluster.org"
target="_blank">Gluster-users@gluster.org</a><br>
>>>> <a moz-do-not-send="true"
href="http://lists.gluster.org/mailman/listinfo/gluster-users"
target="_blank">http://lists.gluster.org/<wbr>mailman/listinfo/gluster-users</a><br>
>>><br>
>>><br>
>>><br>
>>> --<br>
>>> Pranith<br>
>><br>
>><br>
><br>
>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
<br>
</blockquote>
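<p>For reading the process table in the quoted dmesg output: the total_vm
and rss columns are counted in pages, while the final "Killed process" line
reports kilobytes. The sketch below converts one to the other for PID 27646,
assuming a 4 kB page size (the smallest block size in the free lists above is
4kB, but the page size is still an assumption here). The function name is
hypothetical.</p>

```python
PAGE_KB = 4  # ASSUMPTION: 4 kB pages, the usual x86_64 default


def pages_to_kb(pages, page_kb=PAGE_KB):
    """Convert a page count from the OOM process table into kilobytes."""
    return pages * page_kb


# Values copied from the quoted dmesg output for PID 27646 (glusterfs).
total_vm_pages = 3697561
rss_pages = 202086

# Values copied from the "Killed process 27646" line.
reported_total_vm_kb = 14790244
reported_anon_rss_kb = 795040
reported_file_rss_kb = 13304

total_vm_kb = pages_to_kb(total_vm_pages)
rss_kb = pages_to_kb(rss_pages)

print(total_vm_kb)  # 14790244, equal to the reported total-vm
print(rss_kb)       # 808344, equal to anon-rss + file-rss
print(rss_kb == reported_anon_rss_kb + reported_file_rss_kb)  # True
```

<p>Under that assumption the two views agree: about 14 GiB of virtual
address space but well under 1 GiB resident at the time of the kill, on a
node reporting 8342918 pages of RAM.</p>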
<br>
</body>
</html>