[Gluster-devel] Mount hangs because of connection delays
ravishankar at redhat.com
Thu Jul 2 17:24:47 UTC 2015
On 07/02/2015 07:04 PM, Pranith Kumar Karampuri wrote:
> When the glusterfs mount process is coming up, all cluster xlators wait
> for at least one event from each of their children before propagating
> the status upwards. Sometimes the client xlator takes up to 2 minutes
> to propagate this event
> (https://bugzilla.redhat.com/show_bug.cgi?id=1054694#c0). Because of
> this, Xavi implemented a timer in ec notify where we treat a child as
> down if it doesn't come up within 10 seconds. A similar patch for AFR
> went up for review at http://review.gluster.org/#/c/11113. Kritika
> raised an interesting point in the review: all cluster xlators would
> need this logic for the mount not to hang, so the correct place to fix
> it would be the client xlator itself, i.e. add the timer logic in the
> client xlator. That seems like a better approach.
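To make the hang concrete, here is a minimal Python sketch of the wait-plus-timer behaviour described in the quoted text, as it might look inside a cluster xlator such as EC or AFR. It is a toy model, not GlusterFS code: `ClusterXlator`, `ChildState`, `notify()` and `tick()` are hypothetical names, time is passed in explicitly instead of using real timers, and the upward status is simply "up if any child is up", an assumption for illustration only. Only the initial wait is modelled.

```python
from enum import Enum


class ChildState(Enum):
    UNKNOWN = "unknown"  # no event received from this child yet
    UP = "up"
    DOWN = "down"


class ClusterXlator:
    """Hypothetical model of a cluster xlator waiting for its children.
    Not GlusterFS code; the caller supplies the current time."""

    def __init__(self, n_children, child_timeout=10.0, start=0.0):
        self.children = [ChildState.UNKNOWN] * n_children
        self.child_timeout = child_timeout  # fixed 10 s, as in the ec change
        self.start = start
        self.propagated = None  # status sent upwards, once decided

    def notify(self, index, state, now):
        """Record the first up/down event from child `index`."""
        self.children[index] = state
        return self.tick(now)

    def tick(self, now):
        """Re-evaluate on every event and whenever the timer fires."""
        if self.propagated is not None:
            return self.propagated
        if now - self.start >= self.child_timeout:
            # Timer expired: treat every still-silent child as down.
            self.children = [
                ChildState.DOWN if s is ChildState.UNKNOWN else s
                for s in self.children
            ]
        if ChildState.UNKNOWN in self.children:
            return None  # keep waiting; nothing goes upwards yet
        # Assumption for illustration: up if any child is up.
        up = any(s is ChildState.UP for s in self.children)
        self.propagated = ChildState.UP if up else ChildState.DOWN
        return self.propagated
```

Without the timeout branch in `tick()`, a single child whose first event takes two minutes holds up the whole mount for those two minutes; with it, the mount proceeds after 10 seconds with that child treated as down.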
I think it makes sense to handle the change only in the relevant cluster
xlators like AFR/EC, because of the notion of high availability
associated with them. In my limited understanding, protocol/client is
the originator (?) of the child up/down events. While it looks okay to
allow cluster xlators to make certain decisions because the 'originator'
did not respond within a specific time, altering the originator itself
without giving the upper xlators a chance to make their own choices
seems incorrect to me. Perhaps I'm wrong, but setting an unconditional
10 second timer on protocol/client seems to defeat the purpose of having
a configurable `network.ping-timeout` volume set option.
Just my two cents. :)
> I just want to get input from everyone before we go ahead with that,
> i.e. on PARENT_UP the client xlator will start a timer, and if no RPC
> notification is received within that timeout it treats the client
> xlator as down.
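For comparison, a similarly hedged sketch of the client-side proposal: protocol/client arms a timer on PARENT_UP and reports CHILD_DOWN itself if no RPC notification arrives in time. `ClientXlator`, `parent_up()`, `rpc_notify()` and `tick()` are hypothetical names, and time is again passed in explicitly. The timeout is a constructor argument rather than a fixed 10 seconds, which is one way to address the `network.ping-timeout` concern raised above; whether it should actually be tied to that option is an open question in this thread, not something the sketch settles.

```python
class ClientXlator:
    """Hypothetical model of the proposal: protocol/client starts a timer
    on PARENT_UP and reports CHILD_DOWN if no RPC notification arrives
    in time. Illustrative only, not GlusterFS code."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds; configurable rather than fixed
        self.deadline = None    # None means the timer is not armed
        self.events = []        # events sent up to the parent, in order

    def parent_up(self, now):
        """PARENT_UP received: start connecting and arm the timer."""
        self.deadline = now + self.timeout

    def rpc_notify(self, connected):
        """RPC layer reported connect/disconnect: disarm timer, pass it up."""
        self.deadline = None
        self.events.append("CHILD_UP" if connected else "CHILD_DOWN")

    def tick(self, now):
        """Timer check: if the deadline passed with no RPC event, report down."""
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            self.events.append("CHILD_DOWN")
```

A connection that arrives after the timer has fired still produces CHILD_UP, so in this model the parent sees the child come back rather than leaving it marked down.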