[Gluster-devel] Gap in protocol client-server handshake
Raghavendra Gowdappa
rgowdapp at redhat.com
Mon Feb 29 11:58:13 UTC 2016
----- Original Message -----
> From: "Avra Sengupta" <asengupt at redhat.com>
> To: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Monday, February 29, 2016 5:20:53 PM
> Subject: [Gluster-devel] Gap in protocol client-server handshake
>
> Hi,
>
> Currently on a successful connection between protocol server and client,
> the protocol client initiates a CHILD_UP event in the client stack. At
> this point in time, only the connection between server and client is
> established, and there is no guarantee that the server side stack is
> ready to serve requests. This works today because most server side
> translators do not depend on anything else before they can serve
> requests, and hence they are up by the time the client stack
> translators receive the CHILD_UP (initiated by client handshake).
>
> The gap is exposed when certain server side translators, NSR-Server
> for example, have a couple of protocol clients as their children
> (connecting them to other bricks) and cannot serve requests until a
> quorum of their children is up. Hence these translators *should*
> defer sending CHILD_UP until they have enough children up, and the
> same CHILD_UP event needs to be propagated to the client stack
> translators.
Yes. We have seen this problem (mostly in the form of brick process crashes).
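
To make the deferral concrete, here is a minimal sketch of a translator
that holds back CHILD_UP until a quorum of its children is up. It is
written in Python purely for illustration, not in the C of the real
translators. The class name QuorumGate, the notify_parent callback and
the simple-majority rule are all assumptions made for this example and
are not taken from the patch or from NSR.

```python
from enum import Enum


class Event(Enum):
    CHILD_UP = "CHILD_UP"
    CHILD_DOWN = "CHILD_DOWN"


class QuorumGate:
    """Hypothetical stand-in for a server side translator with several children."""

    def __init__(self, child_count, notify_parent):
        self.child_count = child_count
        self.up = set()              # indices of children currently up
        self.announced_up = False    # last state reported to the parent
        self.notify_parent = notify_parent

    def has_quorum(self):
        # Assumption: simple majority. The real quorum rule may differ.
        return len(self.up) > self.child_count // 2

    def on_child_event(self, child, event):
        if event is Event.CHILD_UP:
            self.up.add(child)
        else:
            self.up.discard(child)
        quorum = self.has_quorum()
        # Notify the parent only when the quorum state actually changes.
        if quorum != self.announced_up:
            self.announced_up = quorum
            self.notify_parent(Event.CHILD_UP if quorum else Event.CHILD_DOWN)


seen = []
gate = QuorumGate(child_count=3, notify_parent=seen.append)
gate.on_child_event(0, Event.CHILD_UP)    # 1 of 3 up: nothing sent yet
gate.on_child_event(1, Event.CHILD_UP)    # 2 of 3 up: CHILD_UP sent
gate.on_child_event(1, Event.CHILD_DOWN)  # back to 1 of 3: CHILD_DOWN sent
```

The parent sees one CHILD_UP when quorum is reached and one CHILD_DOWN
when it is lost, rather than one event per child.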
>
> I have sent a patch (http://review.gluster.org/#/c/13549/) addressing
> this, where we maintain a child_up variable in both the protocol client
> and protocol server translators. The protocol server updates this value
> based on the CHILD_UP and CHILD_DOWN events it receives from the
> translators below it. On receiving such an event it forwards that event
> to the client. The protocol client on receiving such an event forwards
> it up the client stack, thereby letting the client translators correctly
> know that the server is up and ready to serve.
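
As I read the description, the event flow could be sketched roughly as
below. This is Python for illustration only. The class and method names
are hypothetical and are not the ones used in the patch. Only the
child_up variable on both sides comes from the description above.

```python
class ProtocolServerSketch:
    """Hypothetical model of the server side: tracks child_up, forwards events."""

    def __init__(self):
        self.child_up = False
        self.connected_clients = []

    def notify_from_below(self, event):
        # Only CHILD_UP / CHILD_DOWN from the translators below matter here.
        if event not in ("CHILD_UP", "CHILD_DOWN"):
            return
        self.child_up = event == "CHILD_UP"
        # Forward the event to every client that is already connected.
        for client in self.connected_clients:
            client.on_server_event(event)


class ProtocolClientSketch:
    """Hypothetical model of the client side: mirrors child_up, passes it up."""

    def __init__(self):
        self.child_up = False
        self.events_sent_up = []   # stands in for the client stack above

    def on_server_event(self, event):
        self.child_up = event == "CHILD_UP"
        self.events_sent_up.append(event)


server = ProtocolServerSketch()
client = ProtocolClientSketch()
server.connected_clients.append(client)
server.notify_from_below("CHILD_UP")
server.notify_from_below("CHILD_DOWN")
```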
>
> Clients connecting later (long after a server has initialized and
> processed its CHILD_UP events) receive a child_up status as part of
> the handshake and, based on the server's child_up status, either
> propagate a CHILD_UP event or defer it.
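
A small sketch of the late-connecting case, again in Python and with
hypothetical names. I am assuming the handshake reply can be modelled
as a dict with a "child_up" entry. The actual representation in the
patch may well differ.

```python
class LateClientSketch:
    """Hypothetical client that connects after the server has initialized."""

    def __init__(self):
        self.child_up = False
        self.events_sent_up = []   # stands in for the client stack above

    def on_handshake_reply(self, reply):
        # Assumption: the reply carries the server's current child_up status.
        self.child_up = bool(reply["child_up"])
        if self.child_up:
            # Server stack is ready: propagate CHILD_UP right away.
            self.events_sent_up.append("CHILD_UP")
        # Otherwise defer: CHILD_UP is sent once the server reports it.

    def on_server_event(self, event):
        self.child_up = event == "CHILD_UP"
        self.events_sent_up.append(event)


ready = LateClientSketch()
ready.on_handshake_reply({"child_up": True})     # CHILD_UP sent at once

waiting = LateClientSketch()
waiting.on_handshake_reply({"child_up": False})  # deferred
waiting.on_server_event("CHILD_UP")              # sent only now
```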
>
> Please have a look at the patch, and kindly state if you have any
> concerns or foresee any scenarios of interest which we might have
> missed.
Thanks for the patch. I'll review it.
>
> Regards,
> Avra