<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Feb 23, 2018 at 1:04 PM, Niels de Vos <span dir="ltr">&lt;<a href="mailto:ndevos@redhat.com" target="_blank">ndevos@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">On Wed, Feb 21, 2018 at 08:25:21PM +0530, Atin Mukherjee wrote:<br>

&gt; On Wed, Feb 21, 2018 at 4:24 PM, Xavi Hernandez &lt;<a href="mailto:jahernan@redhat.com">jahernan@redhat.com</a>&gt; wrote:<br>

&gt;<br>

&gt; &gt; Hi all,<br>

&gt; &gt;<br>

&gt; &gt; currently glusterd sends a SIGKILL to stop gNFS, while all other services<br>

&gt; &gt; are stopped with a SIGTERM signal first (this can be seen in<br>

&gt; &gt; glusterd_svc_stop() function of mgmt/glusterd xlator).<br>

&gt; &gt;<br>

&gt;<br>

&gt; &gt; The question is why it cannot be stopped with SIGTERM as all other<br>

&gt; &gt; services. Using SIGKILL blindly while write I/O is happening can cause<br>

&gt; &gt; multiple inconsistencies at the same time. For a replicated volume this is<br>

&gt; &gt; not a problem because it will take one of the replicas as the &quot;good&quot; one<br>

&gt; &gt; and continue, but for a disperse volume, if the number of inconsistencies<br>

&gt; &gt; is bigger than the redundancy value, a serious problem could appear.<br>

&gt; &gt;<br>

&gt; &gt; The probability of this is very small (I&#39;ve tried to reproduce this<br>

&gt; &gt; problem on my laptop but I&#39;ve been unable), but it exists.<br>

&gt; &gt;<br>

&gt; &gt; Is there any known issue that prevents gNFS to be stopped with a SIGTERM ?<br>

&gt; &gt; or can it be changed safely ?<br>

&gt; &gt;<br>

&gt;<br>

&gt; I firmly believe that we need to send SIGTERM as that&#39;s the right way to<br>

&gt; gracefully shutdown a running process but what I&#39;d request from NFS folks<br>

&gt; to confirm if there&#39;s any background on why it was done with SIGKILL.<br>

<br>

</span>No background about this is known to me. I had a quick look through the<br>

git logs, but could not find an explanation.<br>

<br>

I agree that SIGTERM would be more appropriate.<br>

<br></blockquote><div><br></div><div><br></div><div>I think there were two reasons for replacing SIGTERM with SIGKILL in gNFS:</div><div><br></div><div>1.  To avoid races in the graceful shutdown path that would affect the restart of the gNFS process.</div><div><br></div><div>2.  A graceful shutdown of gNFS might have caused clients to return errors to applications.</div><div><br></div><div>Improvements made to the graceful shutdown path in GlusterFS might have already addressed the first reason. I am not entirely certain whether the second was ever an issue, or whether it still is one. If we attempt replacing SIGKILL with SIGTERM, it would be worth testing both scenarios carefully.</div><div><br></div><div>I also see references to other SIGKILLs in glusterd and other components:</div><div><br></div><div><div>xlators/mgmt/glusterd/src/glusterd-bitd-svc.c:1</div><div>xlators/mgmt/glusterd/src/glusterd-geo-rep.c:3</div><div>xlators/mgmt/glusterd/src/glusterd-nfs-svc.c:1</div><div>xlators/mgmt/glusterd/src/glusterd-proc-mgmt.c:1</div><div>xlators/mgmt/glusterd/src/glusterd-quota.c:1</div><div>xlators/mgmt/glusterd/src/glusterd-scrub-svc.c:1</div><div>xlators/mgmt/glusterd/src/glusterd-svc-helper.c:1</div><div>xlators/mgmt/glusterd/src/glusterd-utils.c:2</div><div>xlators/nfs/server/src/nlm4.c:1</div></div><div><br></div><div>It might be worth analyzing why we need these SIGKILLs and documenting the reasons if they are indeed necessary.</div><div><br></div><div>HTH,</div><div>Vijay</div></div></div></div>