[Gluster-devel] Priority based ping packet for 3.10

Raghavendra G raghavendra at gluster.com
Thu Jan 19 10:29:30 UTC 2017


The more relevant question would be: with TCP_KEEPALIVE and TCP_USER_TIMEOUT
set on sockets, do we really need the ping-pong framework in clients? We
might still need it for transport/rdma setups, but my question concerns
transport/socket. In other words, I would like to hear why we need a
heart-beat mechanism in the first place. One scenario might be a healthy
socket-level connection but an unhealthy brick/client (like a deadlocked
one). Are there enough such realistic scenarios to make ping-pong/heartbeat
necessary? In what other ways can a brick/client go bad?
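
For reference, this is roughly what those two options look like at the
socket level. A minimal sketch only; the numeric values are illustrative,
not what Gluster actually configures, and on older glibc TCP_USER_TIMEOUT
may need <linux/tcp.h>:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Enable kernel-level liveness checks on a connected TCP socket.
     * With these set, a dead peer or a blackholed path is detected by
     * the kernel itself, without any application-level ping-pong. */
    static int
    set_tcp_health_opts (int sock)
    {
            int keepalive = 1;                 /* turn on SO_KEEPALIVE     */
            int idle      = 20;                /* secs idle before probing */
            int interval  = 2;                 /* secs between probes      */
            int count     = 9;                 /* failed probes => dead    */
            unsigned int user_timeout = 30000; /* ms data may stay unacked */

            if (setsockopt (sock, SOL_SOCKET, SO_KEEPALIVE,
                            &keepalive, sizeof (keepalive)) < 0)
                    return -1;
            if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPIDLE,
                            &idle, sizeof (idle)) < 0)
                    return -1;
            if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPINTVL,
                            &interval, sizeof (interval)) < 0)
                    return -1;
            if (setsockopt (sock, IPPROTO_TCP, TCP_KEEPCNT,
                            &count, sizeof (count)) < 0)
                    return -1;
            if (setsockopt (sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
                            &user_timeout, sizeof (user_timeout)) < 0)
                    return -1;
            return 0;
    }

Note that these options only cover socket/transport health; they say nothing
about a brick process that is alive on the network but deadlocked internally,
which is exactly the scenario I'm asking about above.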

On Thu, Jan 19, 2017 at 3:36 PM, Raghavendra G <raghavendra at gluster.com>
wrote:

>
>
> On Thu, Jan 19, 2017 at 1:50 PM, Mohammed Rafi K C <rkavunga at redhat.com>
> wrote:
>
>> Hi,
>>
>> The patch for priority based ping packets [1] is ready for review. As
>> Shyam mentioned in his comment on patch set 12, it doesn't solve the
>> problem of network congestion or of disk latency. Also, it won't
>> prioritize the reply to ping packets at the server end (we don't have a
>> straightforward way to identify the prognum in the reply).
>>
>>
>> So my question is: is it worth taking the patch, or do we need to think
>> through a more generic solution?
>>
>
> Though ping requests can take more time to reach the server due to heavy
> traffic, realistically speaking the common reasons for ping-timer expiry
> have been either:
>
> 1. the client not being able to read the ping response [2], or
> 2. the server not being able to read the ping request.
>
> Speaking about 2 above, Kritika, Pranith and I were just discussing this
> morning an issue where ping-timer expiry was hit in replicated setups
> when disk usage was high. The reason for this, as Pranith pointed out,
> was:
> 1. posix has some fops (like posix_xattrop, posix_fxattrop) which do
> syscalls while holding a lock on the inode (inode->lock).
> 2. During high disk usage, syscall latencies were high (sometimes >= the
> ping-timeout value).
> 3. Before being handed over to a new thread at the io-threads xlator, a
> fop gets executed in one of the threads that read incoming messages from
> the socket. This execution path includes translators like
> protocol/server, index, quota-enforcer and marker, and these translators
> might access the inode-ctx, which involves locking the inode
> (inode->lock). Because of this, the locking latency of the syscall gets
> transferred to the poller thread. Since the poller thread is waiting on
> inode->lock, it won't be able to read ping requests from the network in
> time, resulting in ping-timer expiry (see the sketch below).
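>
> To make the contention concrete, here is a minimal sketch of the pattern
> (my_inode_t, do_slow_xattr_syscall and friends are illustrative, not the
> actual Gluster symbols):
>
>     #include <pthread.h>
>     #include <unistd.h>
>
>     typedef struct {
>             pthread_mutex_t lock;        /* stands in for inode->lock */
>             /* ... inode ctx ... */
>     } my_inode_t;
>
>     /* stand-in for a slow xattr syscall on a heavily loaded disk */
>     static void
>     do_slow_xattr_syscall (void)
>     {
>             sleep (5);
>     }
>
>     /* posix-like fop (cf. posix_xattrop): the syscall runs while
>      * inode->lock is held, so the lock is held for the whole syscall
>      * latency. */
>     void
>     xattrop_fop (my_inode_t *inode)
>     {
>             pthread_mutex_lock (&inode->lock);
>             do_slow_xattr_syscall ();
>             pthread_mutex_unlock (&inode->lock);
>     }
>
>     /* poller-thread path (protocol/server, index, marker, ...):
>      * touching the inode-ctx needs the same lock, so the poller thread
>      * blocks for the syscall's latency and stops reading the socket,
>      * ping requests included. */
>     void
>     poller_path (my_inode_t *inode)
>     {
>             pthread_mutex_lock (&inode->lock);
>             /* read/update inode ctx */
>             pthread_mutex_unlock (&inode->lock);
>     }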
>
> I think Kritika is working on a patch to eliminate the inode locking in 1
> above. We also need to reduce the amount of fop execution done in the
> poller thread; IOW, we need to hand fop execution over to
> io-threads/syncop-threads as early as we can. [3] helps in this scenario
> as it adds the socket back for polling immediately after reading the
> entire msg, but before execution of the fop begins. So, even though fop
> execution is happening in a poller thread, msgs from the same connection
> can be read in other poller threads in parallel (and we can scale up the
> number of epoll threads when load is high).
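>
> This is not the actual code from [3], just a minimal sketch of the epoll
> one-shot re-arm pattern it relies on (read_entire_msg and execute_fop are
> placeholders):
>
>     #include <sys/epoll.h>
>
>     static void read_entire_msg (int sock) { (void) sock; /* drain msg */ }
>     static void execute_fop (void)         { /* run the fop */ }
>
>     /* With EPOLLONESHOT the fd is disabled once an event fires.
>      * Re-arming it right after the full msg is read, but before the
>      * fop is executed, lets another poller thread pick up the next
>      * msg (e.g. a ping) on the same connection. */
>     void
>     handle_pollin (int epfd, int sock)
>     {
>             struct epoll_event ev;
>
>             read_entire_msg (sock);
>
>             ev.events  = EPOLLIN | EPOLLONESHOT;
>             ev.data.fd = sock;
>             epoll_ctl (epfd, EPOLL_CTL_MOD, sock, &ev);   /* re-arm now */
>
>             execute_fop ();   /* may block; reads continue elsewhere */
>     }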
>
> Also, note that there is no way we can send the entire ping request as
> "URGENT" data over the network. So, the prioritization in [1] applies only
> to the queue of messages waiting to be written to the network. Though I
> suggested [1], the more I think about it, the less relevant it seems.
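>
> To be explicit about what "prioritization" means here, it is just a
> head-of-queue insert on the connection's pending-write list; anything
> already handed to the kernel is unaffected. A sketch (msg_t and the field
> names are illustrative, not the real rpc-transport structures):
>
>     #include <stddef.h>
>
>     typedef struct msg {
>             struct msg *next;
>             int         is_ping;
>             /* ... serialized rpc payload ... */
>     } msg_t;
>
>     typedef struct {
>             msg_t *head;       /* next msg to be written to the socket */
>             msg_t *tail;
>     } out_queue_t;
>
>     void
>     enqueue (out_queue_t *q, msg_t *m)
>     {
>             if (m->is_ping && q->head != NULL) {
>                     /* priority: jump ahead of queued msgs */
>                     m->next = q->head;
>                     q->head = m;
>             } else {
>                     /* normal: append at the tail */
>                     m->next = NULL;
>                     if (q->tail != NULL)
>                             q->tail->next = m;
>                     else
>                             q->head = m;
>                     q->tail = m;
>             }
>     }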
>
> [2] http://review.gluster.org/12402
> [3] http://review.gluster.org/15036
>
>
>>
>> Note: We could make this patch more generic so that any packet, not just
>> ping packets, can be marked as priority and added to the head of the
>> queue.
>>
>> [1] : http://review.gluster.org/#/c/11935/
>>
>> Regards
>>
>> Rafi KC
>>
>
>
>
> --
> Raghavendra G
>



-- 
Raghavendra G