<div dir="ltr"><br><div>Hi Amudhan,</div><div><br></div><div>Replies inline.</div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 8, 2017 at 6:37 AM, Amudhan P <span dir="ltr"><<a href="mailto:amudhan83@gmail.com" target="_blank">amudhan83@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi,</div><div><br></div><div>I am using glusterfs 3.10.1 with 30 nodes each with 36 bricks and 10 nodes each with 16 bricks in a single cluster. </div><div><br></div><div>By default I have paused scrub process to have it run manually. for the first time, i was trying to run scrub-on-demand and it was running fine, </div><div>but after some time, i decided to pause scrub process due to high CPU usage and user reporting folder listing taking time. </div><div>But scrub pause resulted below message in some of the nodes.</div><div>Also, i can see that scrub daemon is not showing in volume status for some nodes.</div><div><br></div><div>Error msg type 1</div><div>--</div><div><br></div><div>[2017-09-01 10:04:45.840248] I [bit-rot.c:1683:notify] 0-glustervol-bit-rot-0: BitRot scrub ondemand called</div><div>[2017-09-01 10:05:05.094948] I [glusterfsd-mgmt.c:52:mgmt_<wbr>cbk_spec] 0-mgmt: Volume file changed</div><div>[2017-09-01 10:05:06.401792] I [glusterfsd-mgmt.c:52:mgmt_<wbr>cbk_spec] 0-mgmt: Volume file changed</div><div>[2017-09-01 10:05:07.544524] I [MSGID: 118035] [bit-rot-scrub.c:1297:br_<wbr>scrubber_scale_up] 0-glustervol-bit-rot-0: Scaling up scrubbe</div><div>rs [0 => 36]</div><div>[2017-09-01 10:05:07.552893] I [MSGID: 118048] [bit-rot-scrub.c:1547:br_<wbr>scrubber_log_option] 0-glustervol-bit-rot-0: SCRUB TUNABLES::</div><div> [Frequency: biweekly, Throttle: lazy]</div><div>[2017-09-01 10:05:07.552942] I [MSGID: 118038] [bit-rot-scrub.c:948:br_<wbr>fsscan_schedule] 0-glustervol-bit-rot-0: Scrubbing is schedule</div><div>d to run at 2017-09-15 
10:05:07</div><div>[2017-09-01 10:05:07.553457] I [glusterfsd-mgmt.c:1778:mgmt_<wbr>getspec_cbk] 0-glusterfs: No change in volfile, continuing</div><div>[2017-09-01 10:05:20.953815] I [bit-rot.c:1683:notify] 0-glustervol-bit-rot-0: BitRot scrub ondemand called</div><div>[2017-09-01 10:05:20.953845] I [MSGID: 118038] [bit-rot-scrub.c:1085:br_<wbr>fsscan_ondemand] 0-glustervol-bit-rot-0: Ondemand Scrubbing s</div><div>cheduled to run at 2017-09-01 10:05:21</div><div>[2017-09-01 10:05:22.216937] I [MSGID: 118044] [bit-rot-scrub.c:615:br_<wbr>scrubber_log_time] 0-glustervol-bit-rot-0: Scrubbing started a</div><div>t 2017-09-01 10:05:22</div><div>[2017-09-01 10:05:22.306307] I [glusterfsd-mgmt.c:52:mgmt_<wbr>cbk_spec] 0-mgmt: Volume file changed</div><div>[2017-09-01 10:05:24.684900] I [glusterfsd-mgmt.c:1778:mgmt_<wbr>getspec_cbk] 0-glusterfs: No change in volfile, continuing</div><div>[2017-09-06 08:37:26.422267] I [glusterfsd-mgmt.c:52:mgmt_<wbr>cbk_spec] 0-mgmt: Volume file changed</div><div>[2017-09-06 08:37:28.351821] I [glusterfsd-mgmt.c:52:mgmt_<wbr>cbk_spec] 0-mgmt: Volume file changed</div><div>[2017-09-06 08:37:30.350786] I [MSGID: 118034] [bit-rot-scrub.c:1342:br_<wbr>scrubber_scale_down] 0-glustervol-bit-rot-0: Scaling down scr</div><div>ubbers [36 => 0]</div><div>pending frames:</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : 
type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>patchset: git://<a href="http://git.gluster.org/glusterfs.git" target="_blank">git.gluster.org/<wbr>glusterfs.git</a></div><div>signal received: 11</div><div>time of crash:</div><div>2017-09-06 08:37:30</div><div>configuration details:</div><div>argp 1</div><div>backtrace 1</div><div>dlfcn 1</div><div>libpthread 1</div><div>llistxattr 1</div><div>setfsid 1</div><div>spinlock 1</div><div>epoll.h 1</div><div>xattr.h 1</div><div>st_atim.tv_nsec 1</div><div>package-string: glusterfs 3.10.1</div><div>/usr/lib/libglusterfs.so.0(_<wbr>gf_msg_backtrace_nomem+0x78)[<wbr>0x7fda0ab0b4f8]</div><div>/usr/lib/libglusterfs.so.0(gf_<wbr>print_trace+0x324)[<wbr>0x7fda0ab14914]</div><div>/lib/x86_64-linux-gnu/libc.so.<wbr>6(+0x36d40)[0x7fda09ef9d40]</div><div>/usr/lib/libglusterfs.so.0(<wbr>syncop_readv_cbk+0x17)[<wbr>0x7fda0ab429e7]</div><div>/usr/lib/glusterfs/3.10.1/<wbr>xlator/protocol/client.so(+<wbr>0x2db4b)[0x7fda04986b4b]</div><div>/usr/lib/libgfrpc.so.0(rpc_<wbr>clnt_handle_reply+0x90)[<wbr>0x7fda0a8d5490]</div><div>/usr/lib/libgfrpc.so.0(rpc_<wbr>clnt_notify+0x1e7)[<wbr>0x7fda0a8d5777]</div><div>/usr/lib/libgfrpc.so.0(rpc_<wbr>transport_notify+0x23)[<wbr>0x7fda0a8d17d3]</div><div>/usr/lib/glusterfs/3.10.1/rpc-<wbr>transport/socket.so(+0x7194)[<wbr>0x7fda05826194]</div><div>/usr/lib/glusterfs/3.10.1/rpc-<wbr>transport/socket.so(+0x9635)[<wbr>0x7fda05828635]</div><div>/usr/lib/libglusterfs.so.0(+<wbr>0x83db0)[0x7fda0ab64db0]</div><div>/lib/x86_64-linux-gnu/<wbr>libpthread.so.0(+0x8182)[<wbr>0x7fda0a290182]</div><div>/lib/x86_64-linux-gnu/libc.so.<wbr>6(clone+0x6d)[0x7fda09fbd47d]</div><div>--------------</div><div><br></div><div>Error msg type 2</div><div><br></div><div>[2017-09-01 10:01:20.387248] I [MSGID: 118035] [bit-rot-scrub.c:1297:br_<wbr>scrubber_scale_up] 0-glustervol-bit-rot-0: 
Scaling up scrubbe</div><div>rs [0 => 36]</div><div>[2017-09-01 10:01:20.392544] I [MSGID: 118048] [bit-rot-scrub.c:1547:br_<wbr>scrubber_log_option] 0-glustervol-bit-rot-0: SCRUB TUNABLES::</div><div> [Frequency: biweekly, Throttle: lazy]</div><div>[2017-09-01 10:01:20.392571] I [MSGID: 118038] [bit-rot-scrub.c:948:br_<wbr>fsscan_schedule] 0-glustervol-bit-rot-0: Scrubbing is schedule</div><div>d to run at 2017-09-15 10:01:20</div><div>[2017-09-01 10:01:20.392727] I [glusterfsd-mgmt.c:1778:mgmt_<wbr>getspec_cbk] 0-glusterfs: No change in volfile, continuing</div><div>[2017-09-01 10:01:35.078694] I [bit-rot.c:1683:notify] 0-glustervol-bit-rot-0: BitRot scrub ondemand called</div><div>[2017-09-01 10:01:35.078735] I [MSGID: 118038] [bit-rot-scrub.c:1085:br_<wbr>fsscan_ondemand] 0-glustervol-bit-rot-0: Ondemand Scrubbing s</div><div>cheduled to run at 2017-09-01 10:01:36</div><div>[2017-09-01 10:01:36.355827] I [glusterfsd-mgmt.c:52:mgmt_<wbr>cbk_spec] 0-mgmt: Volume file changed</div><div>[2017-09-01 10:01:37.018622] I [MSGID: 118044] [bit-rot-scrub.c:615:br_<wbr>scrubber_log_time] 0-glustervol-bit-rot-0: Scrubbing started a</div><div>t 2017-09-01 10:01:37</div><div>[2017-09-01 10:01:37.601774] I [glusterfsd-mgmt.c:1778:mgmt_<wbr>getspec_cbk] 0-glusterfs: No change in volfile, continuing</div><div>[2017-09-06 08:33:37.738627] I [glusterfsd-mgmt.c:52:mgmt_<wbr>cbk_spec] 0-mgmt: Volume file changed</div><div>[2017-09-06 08:33:39.812894] I [glusterfsd-mgmt.c:52:mgmt_<wbr>cbk_spec] 0-mgmt: Volume file changed</div><div>[2017-09-06 08:33:41.828432] I [MSGID: 118034] [bit-rot-scrub.c:1342:br_<wbr>scrubber_scale_down] 0-glustervol-bit-rot-0: Scaling down scr</div><div>ubbers [36 => 0]</div><div>[2017-09-06 08:33:41.884031] I [MSGID: 118051] [bit-rot-ssm.c:80:br_scrub_<wbr>ssm_state_stall] 0-glustervol-bit-rot-0: Volume is under ac</div><div>tive scrubbing. 
Pausing scrub..</div><div>[2017-09-06 08:34:26.477106] C [rpc-clnt-ping.c:160:rpc_clnt_<wbr>ping_timer_expired] 0-glustervol-client-970: server <a href="http://192.168.0.21:49177" target="_blank">192.168.0.21:49177</a> has</div><div>not responded in the last 42 seconds, disconnecting.</div><div>[2017-09-06 08:34:29.477438] C [rpc-clnt-ping.c:160:rpc_clnt_<wbr>ping_timer_expired] 0-glustervol-client-980: server <a href="http://192.168.0.21:49178" target="_blank">192.168.0.21:49178</a> has</div><div>not responded in the last 42 seconds, disconnecting.</div><div>[2017-09-06 08:34:37.478198] C [rpc-clnt-ping.c:160:rpc_clnt_<wbr>ping_timer_expired] 0-glustervol-client-1040: server <a href="http://192.168.0.21:49184" target="_blank">192.168.0.21:49184</a> has</div><div> not responded in the last 42 seconds, disconnecting.</div><div>[2017-09-06 08:34:40.478550] C [rpc-clnt-ping.c:160:rpc_clnt_<wbr>ping_timer_expired] 0-glustervol-client-1070: server <a href="http://192.168.0.21:49187" target="_blank">192.168.0.21:49187</a> has</div><div> not responded in the last 42 seconds, disconnecting.</div><div>[2017-09-06 08:34:56.480200] C [rpc-clnt-ping.c:160:rpc_clnt_<wbr>ping_timer_expired] 0-glustervol-client-990: server <a href="http://192.168.0.21:49179" target="_blank">192.168.0.21:49179</a> has</div><div>not responded in the last 42 seconds, disconnecting.</div><div>[2017-09-06 08:34:59.480520] C [rpc-clnt-ping.c:160:rpc_clnt_<wbr>ping_timer_expired] 0-glustervol-client-760: server <a href="http://192.168.0.21:49156" target="_blank">192.168.0.21:49156</a> has</div><div>not responded in the last 42 seconds, disconnecting.</div><div>[2017-09-06 08:35:01.480751] C [rpc-clnt-ping.c:160:rpc_clnt_<wbr>ping_timer_expired] 0-glustervol-client-1020: server <a href="http://192.168.0.21:49182" target="_blank">192.168.0.21:49182</a> has</div><div> not responded in the last 42 seconds, disconnecting.</div><div>[2017-09-06 08:35:05.481223] C 
[rpc-clnt-ping.c:160:rpc_clnt_<wbr>ping_timer_expired] 0-glustervol-client-790: server <a href="http://192.168.0.21:49159" target="_blank">192.168.0.21:49159</a> has not responded in the last 42 seconds, disconnecting.</div><div>[2017-09-06 09:03:43.637208] E [rpc-clnt.c:200:call_bail] 0-glusterfs: bailing out frame type(GlusterFS Handshake) op(GETSPEC(2)) xid = 0x8 sent = 2017-09-06 08:33:39.813002. timeout = 1800 for <a href="http://127.0.0.1:24007" target="_blank">127.0.0.1:24007</a></div><div>[2017-09-06 09:03:44.637338] E [rpc-clnt.c:200:call_bail] 0-glustervol-client-760: bailing out frame type(GlusterFS 3.3) op(READ(12)) xid = 0x160f941 sent = 2017-09-06 08:33:41.843336. timeout = 1800 for <a href="http://192.168.0.21:49156" target="_blank">192.168.0.21:49156</a></div><div>[2017-09-06 09:03:44.637726] W [MSGID: 114031] [client-rpc-fops.c:2992:<wbr>client3_3_readv_cbk] 0-glustervol-client-760: remote operation failed [Transport endpoint is not connected]</div><div>pending frames:</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) 
op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>frame : type(0) op(0)</div><div>patchset: git://<a href="http://git.gluster.org/glusterfs.git" target="_blank">git.gluster.org/<wbr>glusterfs.git</a></div><div>signal received: 11</div><div>time of crash:</div><div>2017-09-06 09:03:44</div><div>configuration details:</div><div>argp 1</div><div>backtrace 1</div><div>dlfcn 1</div><div>libpthread 1</div><div>llistxattr 1</div><div>setfsid 1</div><div>spinlock 1</div><div>epoll.h 1</div><div>xattr.h 1</div><div>st_atim.tv_nsec 1</div><div>package-string: glusterfs 3.10.1</div><div>/usr/lib/libglusterfs.so.0(_<wbr>gf_msg_backtrace_nomem+0x78)[<wbr>0x7f26721934f8]</div><div>/usr/lib/libglusterfs.so.0(gf_<wbr>print_trace+0x324)[<wbr>0x7f267219c914]</div><div>/lib/x86_64-linux-gnu/libc.so.<wbr>6(+0x36d40)[0x7f2671581d40]</div><div>/usr/lib/libglusterfs.so.0(<wbr>syncop_readv_cbk+0x17)[<wbr>0x7f26721ca9e7]</div><div>/usr/lib/glusterfs/3.10.1/<wbr>xlator/protocol/client.so(+<wbr>0x2db4b)[0x7f2667dd3b4b]</div><div>/usr/lib/libgfrpc.so.0(+<wbr>0xf92c)[0x7f2671f5c92c]</div><div>/usr/lib/libglusterfs.so.0(+<wbr>0x36eb2)[0x7f267219feb2]</div><div>/lib/x86_64-linux-gnu/<wbr>libpthread.so.0(+0x8182)[<wbr>0x7f2671918182]</div><div>/lib/x86_64-linux-gnu/libc.so.<wbr>6(clone+0x6d)[0x7f267164547d]</div><div>-------</div><div><br></div><div><br></div><div>My queries are below:-</div><div><br></div><div>1. 
To resume scrub process should I restart glusterd service in node where scrub daemon is not running or do a volume force start</div></div></blockquote><div><br></div><div>For resuming the scrub process, you can do a volume start force (gluster volume start <volume name> force).</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div><div>2. if resumed, will it start from where it was stopped.</div></div></blockquote><div><br></div><div>If resumed this way, it will start from the beginning. Resuming from where it left off happens only when the scrubber continues after a pause; in this case the scrubber process itself is being restarted, so it will scan the volume from the beginning.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div><div>3. I am assuming, scrub by default assigns thread by calculating the number of bricks in the node. need an option to change it in gluster volume command.</div><div> Because in my case my node has 12 CPU's (Intel Xeon CPU 6 core + HT) when scrub was running it consumed all CPU 99%.</div><div> or it should be intelligent enough to scale down depending on available CPUs in the node.</div><div><br></div></div></blockquote><div><br></div><div>The scrubber's threads are scaled up and down mainly based on the throttle mode in use, not the CPU count. By default the scrubber uses LAZY throttling. Have you changed the throttle to a higher value, such as NORMAL or AGGRESSIVE? Also, what is the scrub frequency?</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div></div><div>4. Why was this crash? </div><div><br></div><div><br></div></div></blockquote><div><br></div><div>Can you please provide the core file associated with the crash? 
It will help us understand why the scrubber crashed.</div><div><br></div><div>Can you please provide the following information to analyze the issue further?</div><div><br></div><div>A) Core file for the crash</div><div>B) Output of the following commands:</div><div> "gluster volume info"</div><div> "gluster volume status"</div><div>C) Gluster logs from the node where the scrubber crashed (present in /var/log/glusterfs)</div><div><br></div><div><br></div><div>Regards,</div><div>Raghavendra</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div></div><div>regards</div><span class="HOEnZb"><font color="#888888"><div>Amudhan P</div></font></span></div>
<br>______________________________<wbr>_________________<br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
<a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/<wbr>mailman/listinfo/gluster-users</a><br></blockquote></div><br></div></div>
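<div><br></div><div>The CLI steps discussed in the replies above can be sketched as a short shell sequence. This is a sketch, not a tested procedure: it builds the commands as strings for review rather than executing them, and it assumes the volume name "glustervol" taken from the log messages (substitute your own).</div>

```shell
#!/bin/sh
# Sketch of the gluster(8) CLI calls from the thread above, built as strings
# so the sequence can be reviewed before running on a real cluster.
# Assumption: the volume is named "glustervol", as in the posted logs.
VOLNAME="glustervol"

# Answer 1: restart the volume's daemons (including the scrub daemon) on all
# nodes without interrupting client I/O.
restart_cmd="gluster volume start $VOLNAME force"

# Answer 3: scrub thread scaling follows the throttle mode, not the CPU
# count; LAZY is the default and the least CPU-hungry setting.
throttle_cmd="gluster volume bitrot $VOLNAME scrub-throttle lazy"

# Afterwards, confirm the scrub daemon is back and watch its progress.
status_cmd="gluster volume bitrot $VOLNAME scrub status"

printf '%s\n' "$restart_cmd" "$throttle_cmd" "$status_cmd"
```

<div>Running the real commands of course requires glusterd to be up on the node; "scrub status" also reports per-node scrub state, which is a quick way to verify the daemon reappeared after the start force.</div>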