[Gluster-devel] Mount hangs because of connection delays

Thu Jul 2 14:39:29 UTC 2015

I agree that a generic solution for all cluster xlators would be good.

The only question I have is whether concurrent notifications are
handled specially anywhere.

For example, if the client xlator sends EC_CHILD_DOWN after a timeout,
an EC_CHILD_UP may follow immediately if the brick connects at that
moment. In this case, the cluster xlator could receive both
notifications in any order (we have multi-threading), which is dangerous
if EC_CHILD_DOWN is processed after EC_CHILD_UP: the brick would be
considered down even though it is connected.

I've seen that protocol/client doesn't send a notification until the
previous one has completed. However, this assumes that no xlator
delays the notification (i.e. sends it in the background at a later
moment). Is that a requirement for processing notifications? Otherwise,
the concurrent notifications problem could appear even if
protocol/client serializes them.
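
To illustrate the case I'm worried about, here is a small toy model in
Python. It is not Gluster code: the class names, the string event names
and the flush() step are all made up for the example. The sender is
strictly serialized, yet an intermediate layer that defers CHILD_DOWN
causes the cluster xlator to process it after CHILD_UP.

```python
# Toy model, not Gluster code: every name here is hypothetical.

class ClusterXlator:
    """Tracks child state from the last notification it processed."""

    def __init__(self):
        self.child_up = False
        self.log = []

    def notify(self, event):
        self.log.append(event)
        self.child_up = (event == "CHILD_UP")


class DelayingXlator:
    """Intermediate layer that defers CHILD_DOWN and delivers it later."""

    def __init__(self, parent):
        self.parent = parent
        self.deferred = []

    def notify(self, event):
        if event == "CHILD_DOWN":
            # Returns immediately, so the sender believes the
            # notification has already completed.
            self.deferred.append(event)
        else:
            self.parent.notify(event)

    def flush(self):
        # Stands in for the background delivery "at a later moment".
        while self.deferred:
            self.parent.notify(self.deferred.pop(0))


def client_sends(first_hop):
    """Serialized sender: one notify() call at a time, in order."""
    first_hop.notify("CHILD_DOWN")  # timeout fired
    first_hop.notify("CHILD_UP")    # brick connected right afterwards


# Case 1: notifications go straight to the cluster xlator.
direct = ClusterXlator()
client_sends(direct)

# Case 2: a delaying layer sits in between.
delayed = ClusterXlator()
middle = DelayingXlator(delayed)
client_sends(middle)
middle.flush()

print(direct.log, direct.child_up)
print(delayed.log, delayed.child_up)
```

In the first case the final state is "up"; in the second the cluster
xlator ends with the child marked down although the brick is connected.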

Xavi

On 07/02/2015 03:34 PM, Pranith Kumar Karampuri wrote:
> hi,
>      When the glusterfs mount process is coming up, all cluster xlators
> wait for at least one event from all the children before propagating
> the status upwards. Sometimes the client xlator takes up to 2 minutes
> to propagate this event
> (https://bugzilla.redhat.com/show_bug.cgi?id=1054694#c0). Due to this,
> Xavi implemented a timer in ec notify where we treat a child as down
> if it doesn't come up in 10 seconds. A similar patch went up for
> review at http://review.gluster.org/#/c/11113 for afr. Kritika raised
> an interesting point in the review: all cluster xlators need to have
> this logic for the mount not to hang, and the correct place to fix it
> would be the client xlator itself, i.e. add the timer logic in the
> client xlator. That seems like a better approach. I just want to take
> inputs from everyone before we go ahead in that direction.
> That is, on PARENT_UP the client xlator will start a timer, and if no
> rpc notification is received within that timeout, it treats the client
> xlator as down.
>
> Pranith
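
For reference, a rough Python sketch of how I understand the proposal
quoted above. This is only a model under my own assumptions, not the
real protocol/client code: the class, the method names, the string
events and the timeout parameter are hypothetical. The lock is held
while notifying the parent so that a timeout-driven CHILD_DOWN and a
later CHILD_UP cannot be delivered out of order by this layer.

```python
import threading


class ClientXlator:
    """Hypothetical model of a client xlator with a startup timer."""

    def __init__(self, notify_parent, timeout):
        self.notify_parent = notify_parent  # callback taking an event name
        self.timeout = timeout              # seconds to wait after PARENT_UP
        self.lock = threading.Lock()
        self.reported = False               # has any status been sent yet?
        self.timer = None

    def parent_up(self):
        # Start the timer when PARENT_UP is received.
        self.timer = threading.Timer(self.timeout, self._on_timeout)
        self.timer.daemon = True
        self.timer.start()

    def _on_timeout(self):
        with self.lock:
            if self.reported:
                return  # an rpc notification already arrived
            self.reported = True
            self.notify_parent("CHILD_DOWN")

    def rpc_notify(self, connected):
        with self.lock:
            self.reported = True
            if self.timer is not None:
                self.timer.cancel()  # harmless if it already fired
            self.notify_parent("CHILD_UP" if connected else "CHILD_DOWN")


# Brick connects only after the timeout: parent sees DOWN, then UP.
late = []
xl = ClientXlator(late.append, timeout=0.05)
xl.parent_up()
xl.timer.join()  # wait until the timer thread has fired
xl.rpc_notify(connected=True)

# Brick connects before the timeout: parent sees only UP.
early = []
xl2 = ClientXlator(early.append, timeout=5)
xl2.parent_up()
xl2.rpc_notify(connected=True)

print(late)
print(early)
```

Whether holding a lock across the notification is acceptable in the
real xlator stack is a separate question; the sketch only shows the
ordering it is meant to guarantee.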