[Gluster-devel] GF_PARENT_DOWN on SIGKILL

Vijay Bellur vbellur at redhat.com
Mon Jul 25 19:51:46 UTC 2016


On 07/25/2016 02:41 AM, Xavier Hernandez wrote:
> Hi Jeff,
>
> On 22/07/16 15:37, Jeff Darcy wrote:
>>> Gah! sorry sorry, I meant to send the mail as SIGTERM. Not SIGKILL.
>>> So xavi and I were wondering why cleanup_and_exit() is not sending
>>> GF_PARENT_DOWN event.
>>
>> OK, then that grinding sound you hear is my brain shifting gears.  ;)  It
>> seems that cleanup_and_exit will call xlator.fini in a few cases, but
>> it doesn't do anything that would send notify events.  I'll bet the
>> answer to "why" is just that nobody thought of it or got around to it.
>> The next question I'd ask is: can you do what you need to do from
>> ec.fini instead?
>> That would require enabling it in should_call_fini as well, but otherwise
>> seems pretty straightforward.
>
> As far as I know, there's no explicit guarantee on the order in which
> fini is called, so we cannot rely on it to do cleanup because ec needs
> all its underlying xlators to be fully functional to finish the cleanup.
>
> If this can be explicitly enforced and maintained, I think it could be
> moved but with some tricks, since fini is expected to be a synchronous
> operation and the ec cleanup is asynchronous.
>
>>
>> If the answer to that question is no, then things get more complicated.
>> Can we do one loop that sends GF_EVENT_PARENT_DOWN events, then another
>> that calls fini?  Can we just do a basic list traversal (as we do now for
>> fini) or do we need to do something more complicated to deal with cluster
>> translators?  I think a separate loop doing basic list traversal would
>> work, even with brick multiplexing, so it's probably worth just coding it
>> up as an experiment.
>
> The main "difficulty" here is the asynchronous behavior of the cleanup.
> Nothing else can be shut down until the cleanup finishes.
>
> Maybe the GF_EVENT_PARENT_DOWN should account for this
> asynchronous/delayed operation, while the fini should be kept as a
> synchronous cleanup and resource release operation.
>

+1. GF_EVENT_PARENT_DOWN or similar can let translators know that we are
winding down. Once the translators are done with their respective
asynchronous operations, they would need to acknowledge that they are
ready for fini(). Once all translators ack, we could go about invoking
fini() as the final cleanup.
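
A rough sketch of that two-phase idea, just to make the ordering concrete.
None of this is existing GlusterFS code: demo_xlator_t, demo_graph_t and all
the demo_* functions are hypothetical stand-ins, and the real notify/fini
plumbing would look different. The assumption is that each translator acks
only after it has seen the wind-down notice and its in-flight asynchronous
work has drained, and that fini is refused until every translator has acked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a translator; this is NOT the real xlator_t. */
typedef struct demo_xlator {
    const char *name;
    int pending_ops;   /* asynchronous operations still in flight */
    bool parent_down;  /* wind-down notice has been delivered */
    bool acked;        /* translator reported it is ready for fini */
    bool finished;     /* synchronous fini has run */
} demo_xlator_t;

/* Hypothetical stand-in for the graph that owns the translators. */
typedef struct demo_graph {
    demo_xlator_t *xl;
    size_t count;
    size_t acks;       /* number of translators that have acked */
} demo_graph_t;

/* Ack once, and only when notified and no async work remains. */
void demo_try_ack(demo_graph_t *g, demo_xlator_t *x)
{
    if (x->parent_down && !x->acked && x->pending_ops == 0) {
        x->acked = true;
        g->acks++;
    }
}

/* Phase 1: tell every translator that we are winding down. */
void demo_notify_parent_down(demo_graph_t *g)
{
    for (size_t i = 0; i < g->count; i++) {
        g->xl[i].parent_down = true;
        demo_try_ack(g, &g->xl[i]); /* idle translators ack immediately */
    }
}

/* Completion callback for one asynchronous operation of a translator. */
void demo_op_done(demo_graph_t *g, demo_xlator_t *x)
{
    if (x->pending_ops > 0)
        x->pending_ops--;
    demo_try_ack(g, x);
}

/* Phase 2: run fini on all translators, but only after every ack is in.
 * Returns false (and does nothing) if some translator is still busy. */
bool demo_fini_all(demo_graph_t *g)
{
    if (g->acks != g->count)
        return false;
    for (size_t i = 0; i < g->count; i++) {
        assert(g->xl[i].pending_ops == 0);
        g->xl[i].finished = true;
    }
    return true;
}
```

With a graph holding an "ec"-like translator that has two operations in
flight and an idle child, demo_notify_parent_down() collects one ack right
away, demo_fini_all() keeps returning false while the busy translator
drains, and it succeeds only after the second demo_op_done() call. That
keeps fini itself synchronous while the asynchronous part happens between
the two phases.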

-Vijay


