[Bugs] [Bug 1254137] Rebalance fix-layout fails after some time with a timeout

Fri Oct 9 06:43:35 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1254137

Raghavendra G <rgowdapp at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rgowdapp at redhat.com

--- Comment #8 from Raghavendra G <rgowdapp at redhat.com> ---
We block reading from the socket until the event handler completes. This can
cause spurious disconnects from ping-timer expiry if handlers take a long time.
In this bug, the load seems to come from readdirp. I just looked at the readdirp
reply path. It involves looping over the dentry list in several translators:

1. protocol/client constructs the dentry list, and hence traverses it.
2. afr does a loop over dentries
3. dht does a loop over dentries
4. syncop_readdirp_cbk (the rebalance process uses syncops) copies each dentry
and constructs a new list.

I suspect that such heavy processing in the handler might have prevented the
client from reading the ping response from the socket (if the ping response was
queued behind the readdirp response), causing the ping-timer to expire.

One solution would be to resume reading from the socket as soon as we have read
a complete rpc msg. We need not wait for the rpc-program/rpc-clnt layer above
the transport to process the reply.
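
A minimal sketch of that idea, assuming a reader that hands each complete
message to a worker thread and goes straight back to reading. Nothing here is
GlusterFS code: run_decoupled, the message names, and the choice to note the
ping reply directly in the reader are assumptions made for illustration.

```python
import queue
import threading


def run_decoupled(messages):
    """Reader loop that enqueues complete messages for a worker thread
    instead of running their handlers inline (hypothetical sketch)."""
    work = queue.Queue()
    ping_read = threading.Event()
    results = {}

    def slow_readdirp_handler():
        # Stands in for the dentry-list processing. It waits up to 5 seconds
        # to observe whether the reader got to the ping reply in the meantime.
        results["ping_read_before_handler_finished"] = ping_read.wait(timeout=5.0)

    def worker():
        while True:
            msg = work.get()
            if msg is None:  # sentinel: no more messages
                return
            if msg == "readdirp_reply":
                slow_readdirp_handler()

    t = threading.Thread(target=worker)
    t.start()

    read_order = []
    for msg in messages:  # reader: note the message, hand it off, keep reading
        read_order.append(msg)
        if msg == "ping_reply":
            ping_read.set()  # assumption: the reader itself notes the ping reply
        else:
            work.put(msg)
    work.put(None)
    t.join()
    return read_order, results


read_order, results = run_decoupled(["readdirp_reply", "ping_reply"])
print(read_order, results)
```

The reader never waits on the readdirp handler here, so the ping reply is read
while (or before) that handler runs. Ordering between replies processed by the
worker would still need care in a real implementation.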
