[Gluster-devel] Priority based ping packet for 3.10
Raghavendra G
raghavendra at gluster.com
Thu Jan 19 10:30:46 UTC 2017
On Thu, Jan 19, 2017 at 3:59 PM, Raghavendra G <raghavendra at gluster.com>
wrote:
> The more relevant question would be: with TCP_KEEPALIVE and
> TCP_USER_TIMEOUT on sockets, do we really need the ping-pong framework in
> clients? We might need it in transport/rdma setups, but my question is
> concentrating on transport/socket (an illustrative snippet of these socket
> options is below).
>
> In other words, I would like to hear why we need a heart-beat mechanism in
> the first place. One scenario might be a healthy socket-level connection
> but an unhealthy brick/client (like a deadlocked one). Are there enough
> such realistic scenarios to make ping-pong/heartbeat necessary? In what
> other ways can a brick/client go bad?
>
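(For reference, on Linux the keepalive knobs and TCP_USER_TIMEOUT referred to
above can be set per socket roughly as in the minimal sketch below. The values
are illustrative, not the ones glusterfs configures.)

    /* Minimal sketch, assuming a connected client socket on Linux: the
     * keepalive knobs and TCP_USER_TIMEOUT mentioned above.  Values are
     * illustrative, not what glusterfs actually configures. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    static int
    set_tcp_timeouts (int sock)
    {
            int          on      = 1;
            int          idle    = 20;     /* idle seconds before probes start   */
            int          intvl   = 2;      /* seconds between probes             */
            int          cnt     = 9;      /* unanswered probes before giving up */
            unsigned int user_to = 42000;  /* ms data may stay unacknowledged    */

            if (setsockopt (sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof (on)))
                    return -1;
            if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof (idle)))
                    return -1;
            if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof (intvl)))
                    return -1;
            if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof (cnt)))
                    return -1;
            /* Linux >= 2.6.37: abort the connection if written data stays
             * unacknowledged for this long. */
            if (setsockopt (sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
                            &user_to, sizeof (user_to)))
                    return -1;
            return 0;
    }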
> On Thu, Jan 19, 2017 at 3:36 PM, Raghavendra G <raghavendra at gluster.com>
> wrote:
>
>>
>>
>> On Thu, Jan 19, 2017 at 1:50 PM, Mohammed Rafi K C <rkavunga at redhat.com>
>> wrote:
>>
>>> Hi,
>>>
>>> The patch for priority-based ping packets [1] is ready for review. As
>>> Shyam mentioned in the comment on patch set 12, it doesn't solve the
>>> problem of network congestion or of disk latency. Also, it won't
>>> prioritize the reply to ping packets at the server end (we don't have a
>>> straightforward way to identify the prognum in the reply).
>>>
>>>
>>> So my question is: is it worth taking this patch, or do we need to think
>>> through a more generic solution?
>>>
>>
>> Though ping requests can take more time to reach the server due to heavy
>> traffic, realistically speaking the common reasons for ping-timer expiry
>> have been either
>>
>> 1. the client not being able to read the ping response [2], or
>> 2. the server not being able to read the ping request.
>>
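(For context, the heartbeat under discussion boils down to something like the
sketch below; this is illustrative only, not the actual rpc-clnt ping code.
Both failure modes above show up the same way: the last-pong timestamp stops
advancing, and the timer eventually fires.)

    /* Illustrative heartbeat skeleton, not the actual rpc-clnt ping code:
     * the reader thread stamps the time whenever a ping reply arrives, and
     * a timer thread declares the connection dead when that stamp is older
     * than ping_timeout. */
    #include <pthread.h>
    #include <stdbool.h>
    #include <time.h>

    struct heartbeat {
            pthread_mutex_t lock;
            time_t          last_pong;     /* set by the thread reading replies */
            int             ping_timeout;  /* seconds, cf. network.ping-timeout */
    };

    /* Called from the reply-reading path when a ping response is seen. */
    static void
    heartbeat_pong (struct heartbeat *hb)
    {
            pthread_mutex_lock (&hb->lock);
            hb->last_pong = time (NULL);
            pthread_mutex_unlock (&hb->lock);
    }

    /* Called periodically; true means "treat the connection as dead". */
    static bool
    heartbeat_expired (struct heartbeat *hb)
    {
            bool expired;

            pthread_mutex_lock (&hb->lock);
            expired = (time (NULL) - hb->last_pong) > hb->ping_timeout;
            pthread_mutex_unlock (&hb->lock);

            return expired;
    }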
>> Speaking about 2 above, Kritika, Pranith and I were just discussing this
>> morning an issue where they had hit ping-timer expiry in replicated setups
>> when disk usage was high. The reason for this, as Pranith pointed out, was:
>> 1. posix has some fops (like posix_xattrop, posix_fxattrop) which do
>> syscalls while holding a lock on the inode (inode->lock).
>> 2. During high disk-usage scenarios, syscall latencies were high
>> (sometimes >= the ping-timeout value).
>> 3. Before being handed over to a new thread at the io-threads xlator, a fop
>> gets executed in one of the threads that reads incoming messages from the
>> socket. This execution path includes translators like protocol/server,
>> index, quota-enforcer and marker, and these translators might access
>> inode-ctx, which involves locking the inode (inode->lock). Due to this
>> locking, the latency of the syscall gets transferred to the poller thread.
>> Since the poller thread is waiting on inode->lock, it won't be able to read
>> ping requests from the network in time, resulting in ping-timer expiry
>> (the sketch below illustrates the pattern).
>>
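(An illustrative sketch of the pattern in 1-3 above, not actual glusterfs
code: a slow syscall made while inode->lock is held stalls any other thread,
including a poller thread, that merely needs the same lock to touch inode
ctx. The xattr name is made up for the example.)

    /* Illustrative only, not glusterfs code. */
    #include <pthread.h>
    #include <sys/xattr.h>

    struct fake_inode {
            pthread_mutex_t lock;   /* stands in for inode->lock */
            /* per-xlator inode ctx would live behind this lock */
    };

    /* posix_xattrop-like path: a getxattr under the inode lock.  On a
     * loaded disk this syscall can take seconds. */
    static void
    xattrop_like_fop (struct fake_inode *inode, const char *path)
    {
            char buf[256];

            pthread_mutex_lock (&inode->lock);
            lgetxattr (path, "trusted.example", buf, sizeof (buf));
            pthread_mutex_unlock (&inode->lock);
    }

    /* Poller-thread path (protocol/server, index, marker, ...): it only
     * wants to read inode ctx, but it now waits for the syscall above and
     * cannot get back to reading ping requests off the socket. */
    static void
    poller_touch_inode_ctx (struct fake_inode *inode)
    {
            pthread_mutex_lock (&inode->lock);
            /* read/update inode ctx here */
            pthread_mutex_unlock (&inode->lock);
    }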
>> I think Kritika is working on a patch to eliminate the locking on the inode
>> in 1 above. We also need to reduce the actual fop execution in the poller
>> thread. IOW, we need to hand over the fop execution to
>> io-threads/syncop-threads as early as we can. [3] helps in this scenario,
>> as it adds the socket back for polling immediately after reading the entire
>> msg but before execution of the fop begins. So, even though fop execution
>> is happening in a poller thread, msgs from the same connection can be read
>> in other poller threads in parallel (and we can scale up the number of
>> epoll threads when load is high).
>>
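(Roughly, the idea behind [3] looks like the sketch below, using epoll with
EPOLLONESHOT. read_whole_message() and process_fop() are hypothetical
placeholders for the real message-reading and fop-execution paths; this is
not the actual socket.c/event-epoll.c code.)

    /* Rough sketch: re-arm the socket for polling as soon as the whole
     * message has been read, so another poller thread can pick up the next
     * message (e.g. a ping) while this one is still executing the fop. */
    #include <sys/epoll.h>

    extern void *read_whole_message (int sock);   /* placeholder */
    extern void  process_fop (void *msg);         /* placeholder */

    static void
    handle_pollin (int epfd, int sock)
    {
            struct epoll_event ev  = {0, };
            void              *msg = read_whole_message (sock);

            /* Re-arm *before* processing, not after. */
            ev.events  = EPOLLIN | EPOLLONESHOT;
            ev.data.fd = sock;
            epoll_ctl (epfd, EPOLL_CTL_MOD, sock, &ev);

            process_fop (msg);
    }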
>> Also, note that there is no way we can send the entire ping request as
>> "URGENT" data over the network. So, the prioritization in [1] applies only
>> to the queue of messages waiting to be written to the network. Though I
>> suggested [1], the more I think of it, the less relevant it seems.
>>
>> [2] http://review.gluster.org/12402
>> [3] http://review.gluster.org/15036
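(A sketch of the kind of prioritization [1] does on that queue: a ping entry
is linked at the head of the pending-write list instead of its tail, so it is
the next thing written out. The list type and helpers here are illustrative,
not the actual ioq handling, and this can only help messages that have not
yet been handed to the kernel.)

    /* Illustrative pending-write list, not the actual ioq code. */
    struct msg_entry {
            struct msg_entry *next;
            struct msg_entry *prev;
            /* iovecs/buffers to be written would hang off here */
    };

    struct write_queue {
            struct msg_entry head;  /* sentinel of a circular doubly-linked list */
    };

    static void
    write_queue_init (struct write_queue *q)
    {
            q->head.next = q->head.prev = &q->head;
    }

    static void
    enqueue_tail (struct write_queue *q, struct msg_entry *e)  /* normal msg */
    {
            e->prev            = q->head.prev;
            e->next            = &q->head;
            q->head.prev->next = e;
            q->head.prev       = e;
    }

    static void
    enqueue_head (struct write_queue *q, struct msg_entry *e)  /* ping msg */
    {
            e->next            = q->head.next;
            e->prev            = &q->head;
            q->head.next->prev = e;
            q->head.next       = e;
    }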
>>
>>
>>>
>>> Note: We could make this patch more generic so that any packet, not just
>>> ping packets, can be marked as priority and added at the head of the queue.
>>>
>>> [1] : http://review.gluster.org/#/c/11935/
>>>
>>> Regards
>>>
>>> Rafi KC
>>>
>>>
>>
>>
>>
>> --
>> Raghavendra G
>>
>
>
>
> --
> Raghavendra G
>
--
Raghavendra G