[Gluster-devel] Priority based ping packet for 3.10

Thu Jan 19 10:06:44 UTC 2017

On Thu, Jan 19, 2017 at 1:50 PM, Mohammed Rafi K C <rkavunga at redhat.com>
wrote:

> Hi,
>
> The patch for priority-based ping packets [1] is ready for review. As
> Shyam mentioned in the comment on patch set 12, it doesn't solve the
> problem of network congestion or disk latency. It also won't
> prioritize the reply to ping packets at the server end (we don't have a
> straightforward way to identify prognum in the reply).
>
>
> So my question is: is it worth taking the patch, or do we need to think
> through a more generic solution?
>

Though ping requests can take longer to reach the server under heavy
traffic, realistically speaking the common reasons for ping-timer expiry
have been either

1. the client not being able to read the ping response [2], or
2. the server not being able to read the ping request.

Regarding 2 above, Kritika, Pranith and I were discussing just this morning
an issue where they had hit ping-timer expiry in replicated setups when
disk usage was high. The reason, as Pranith pointed out, was:
1. posix has some fops (like posix_xattrop, posix_fxattrop) which make
syscalls while holding a lock on the inode (inode->lock).
2. During high disk usage, syscall latencies were high (sometimes greater
than or equal to the ping-timeout value).
3. Before being handed over to a new thread at the io-threads xlator, a fop
is executed in one of the threads that read incoming messages from the
socket. This execution path includes translators like protocol/server,
index, quota-enforcer and marker, and these translators might access
inode-ctx, which involves locking the inode (inode->lock). Because of this
locking, the syscall latency is transferred to the poller thread. Since the
poller thread is waiting on inode->lock, it can't read ping requests from
the network in time, resulting in ping-timer expiry.
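
To make the chain in 1-3 concrete, here is a minimal simulation in Python
(not gluster code). The lock stands in for inode->lock, and the names
slow_xattrop, poller, SYSCALL_LATENCY and PING_TIMEOUT are made up for the
illustration, with timings scaled down to fractions of a second:

```python
import threading
import time

inode_lock = threading.Lock()  # stands in for inode->lock
SYSCALL_LATENCY = 0.5          # hypothetical slow syscall under disk load (seconds)
PING_TIMEOUT = 0.2             # hypothetical ping-timeout, scaled down for the demo


def slow_xattrop(started):
    """Worker thread: makes the 'syscall' while holding the inode lock."""
    with inode_lock:
        started.set()
        time.sleep(SYSCALL_LATENCY)  # models the slow syscall made under the lock


def poller(ping_sent_at, result):
    """Poller thread: needs the inode lock (inode-ctx access) before reading the ping."""
    with inode_lock:  # blocks until the slow syscall has finished
        pass
    result["ping_delay"] = time.monotonic() - ping_sent_at


def simulate():
    started = threading.Event()
    result = {}
    worker = threading.Thread(target=slow_xattrop, args=(started,))
    worker.start()
    started.wait()  # the lock is held before the ping is "sent"
    ping_sent_at = time.monotonic()
    poll = threading.Thread(target=poller, args=(ping_sent_at, result))
    poll.start()
    worker.join()
    poll.join()
    result["timer_expired"] = result["ping_delay"] >= PING_TIMEOUT
    return result
```

Calling simulate() should report a ping delay close to SYSCALL_LATENCY, so
the (scaled-down) timer expires even though the network itself was idle.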

I think Kritika is working on a patch to eliminate the locking on the inode
in 1 above. We also need to reduce the actual fop execution in the poller
thread. IOW, we need to hand over fop execution to
io-threads/syncop-threads as early as we can. [3] helps in this scenario,
as it adds the socket back for polling immediately after the entire msg has
been read but before execution of the fop begins. So, even though fop
execution is happening in a poller thread, msgs from the same connection
can be read in other poller threads in parallel (and we can scale up the
number of epoll threads when load is high).
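
A rough Python sketch of the "hand over as early as we can" idea follows.
It is not what [3] implements (that patch re-adds the socket for polling);
it only models the effect on a ping queued behind a slow fop. The names
poller_loop, execute_fop and FOP_LATENCY are hypothetical:

```python
import time
from concurrent.futures import ThreadPoolExecutor

FOP_LATENCY = 0.3  # hypothetical slow fop (seconds)


def execute_fop(name, log):
    time.sleep(FOP_LATENCY)  # models slow fop execution (e.g. a slow syscall)
    log.append(("fop-done", name))


def poller_loop(messages, inline):
    """Read messages in order; run fops inline or hand them to a worker pool."""
    log = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for kind, name in messages:
            if kind == "ping":
                log.append(("ping-reply", name))
            elif inline:
                execute_fop(name, log)  # poller is stuck until the fop returns
            else:
                pool.submit(execute_fop, name, log)  # poller goes straight back to reading
    return log  # the pool has been shut down, so all fops have finished
```

With messages = [("fop", "writev"), ("ping", "p1")], the inline variant
replies to the ping only after the fop has finished, while the hand-off
variant replies to it first.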

Also, note that there is no way we can send an entire ping request as
"URGENT" data over the network. So, the prioritization in [1] applies only
to the queue of messages waiting to be written to the network. So, though I
suggested [1], the more I think about it, the less relevant it seems.
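
For reference, the kind of prioritization being discussed can be sketched
as below. This is only an illustration in Python, assuming a simple
per-connection queue; OutgoingQueue, enqueue and drain are made-up names
and not the data structures used in [1]:

```python
from collections import deque


class OutgoingQueue:
    """Hypothetical per-connection queue of messages waiting to be written."""

    def __init__(self):
        self._q = deque()

    def enqueue(self, msg, priority=False):
        if priority:
            self._q.appendleft(msg)  # priority messages (e.g. ping) go to the head
        else:
            self._q.append(msg)      # everything else waits its turn at the tail

    def drain(self):
        """Return messages in the order they would be written to the socket."""
        out = list(self._q)
        self._q.clear()
        return out
```

If "write-1" and "write-2" are enqueued normally and "ping" is then
enqueued with priority=True, drain() returns the ping first. The reordering
only affects what is still sitting in this queue; it does nothing for a
ping that has been written out but is not being read at the other end. In
this sketch, two priority messages would also come out in reverse order
relative to each other.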

[2] http://review.gluster.org/12402
[3] http://review.gluster.org/15036

>
> Note: We could make this patch more generic, so that any packet can be
> marked as priority and added at the head of the queue, not just ping
> packets.
>
> [1] : http://review.gluster.org/#/c/11935/
>
> Regards
>
> Rafi KC

-- 
Raghavendra G