[Bugs] [Bug 1254137] Rebalance fix-layout fails after some time with a timeout
bugzilla at redhat.com
Fri Oct 9 06:43:35 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1254137
Raghavendra G <rgowdapp at redhat.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rgowdapp at redhat.com
--- Comment #8 from Raghavendra G <rgowdapp at redhat.com> ---
We block reading from the socket until the event handler completes. This can cause
spurious disconnects due to ping-timer expiry if handlers take a long time. In
this bug, the load seems to come from readdirp. I just looked at the readdirp reply
path. It involves looping over the dentry list in several translators:
1. protocol/client constructs the dentry list, and hence traverses it.
2. afr loops over the dentries.
3. dht loops over the dentries.
4. syncop_readdirp_cbk (the rebalance process uses syncops) copies each dentry and
constructs a new list.
I suspect that such heavy processing in the handler might have prevented the
client from reading the ping response from the socket (if the ping response was
queued behind the readdirp response), resulting in expiry of the ping-timer.
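
To make the suspected sequence concrete, here is a minimal model of it in Python. It is not GlusterFS code: the names Msg, ping_read_time_blocking and ping_timed_out are hypothetical, and the handler costs and the timeout value are illustrative numbers, not measurements from this bug. It only shows that when the reader waits for each handler to finish, a ping reply queued behind a slow readdirp reply is read late.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    kind: str            # "readdirp" or "ping" (illustrative labels)
    handler_cost: float  # seconds the event handler spends on this reply


def ping_read_time_blocking(queue):
    """Time at which the ping reply is read when the reader waits for
    each handler to complete before reading the next message."""
    now = 0.0
    for msg in queue:
        if msg.kind == "ping":
            return now           # the ping reply is finally read here
        now += msg.handler_cost  # reader is stalled while the handler runs
    return None                  # no ping reply in the queue at all


def ping_timed_out(queue, ping_timeout):
    """True if the ping reply is read only after the timer would expire."""
    read_at = ping_read_time_blocking(queue)
    return read_at is None or read_at > ping_timeout
```

With a readdirp handler that takes 50 seconds and a 42-second timer, ping_timed_out([Msg("readdirp", 50.0), Msg("ping", 0.0)], 42.0) reports a timeout even though the ping reply was already sitting in the socket buffer.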
One solution would be to resume reading from the socket as soon as we have
read a complete rpc msg. We need not wait for the rpc-program/rpc-clnt above the
transport to finish processing the reply.
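
A rough sketch of that idea, in Python rather than the actual transport code: the reader hands each complete message to a worker pool and goes straight back to reading. The function read_loop, the handlers mapping and the use of a thread pool are all assumptions made for illustration. This is not how the GlusterFS socket transport is structured, and a real change would also have to deal with ordering and locking, which this sketch ignores.

```python
from concurrent.futures import ThreadPoolExecutor


def read_loop(messages, handlers, pool):
    """Hypothetical reader loop that does not wait for handlers.

    Each item in `messages` stands for one complete rpc msg already read
    from the socket. The handler is submitted to `pool`, and the loop
    continues with the next message immediately.
    """
    futures = []
    for kind in messages:
        # Hand off processing; do not block the reader on the handler.
        futures.append(pool.submit(handlers[kind]))
    return futures
```

With at least two workers in the pool, a ping reply queued behind a readdirp reply gets handled while the readdirp handler is still walking its dentry list, so the ping-timer can be reset in time.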
--
You are receiving this mail because:
You are on the CC list for the bug.