[Gluster-devel] Problems with ec/nfs.t in regression tests

Thu Feb 12 18:35:37 UTC 2015

On 02/12/2015 01:27 PM, Xavier Hernandez wrote:
> On 12.02.2015 19:09, Pranith Kumar Karampuri wrote:
>
>> On 02/12/2015 11:34 PM, Pranith Kumar Karampuri wrote:
>>> On 02/12/2015 08:15 PM, Xavier Hernandez wrote:
>>>> I've investigated further and the problem seems worse than I thought.
>>>> NFS sends a huge number of requests without waiting for answers
>>>> (I've seen more than 1400 requests in flight). Many factors probably
>>>> influence the resulting load, and ec could be one of them, but the
>>>> problem is not exclusive to ec: I've repeated the test using
>>>> replica 3 and replica 2 volumes and it still happens. The test
>>>> basically writes a 1GB file to an NFS mount using 'dd'. With a
>>>> smaller file, the test passes.
>>> Running the NFS client and the gluster NFS server on the same machine
>>> with BIG file dd operations is known to cause hangs. anon-fd-quota.t
>>> used to cause similar problems, so we changed the test to not involve
>>> NFS mounts.
>> I don't recollect the exact scenario. Avati found the memory allocation
>> deadlock in 2010, when I had just joined gluster. Raghavendra Bhat
>> raised the bug then. I've CCed him on the thread in case he knows the
>> exact scenario.
>>
> I've been doing some tests with Shyam and it seems that the root cause is the edge-triggered epoll introduced in the multi-threaded epoll patch. It has a side effect that makes the outstanding-rpc-limit option nearly useless, so gluster gets flooded with requests, causing timeouts and disconnections on slow/busy machines.

Elaborating on this, the MT epoll patch makes epoll edge triggered (ET),
so on a POLLIN event we attempt to read as much as we can. If the
client manages to supply 'n' RPCs before our read returns EAGAIN or
EWOULDBLOCK, we will read all of them and not honor the server-side
throttle.
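To make the ET behaviour concrete, here is a minimal Python model of what an edge-triggered POLLIN handler does. It is an illustration, not gluster's socket code: it does not register anything with epoll, it only reproduces the "read until EAGAIN" loop over a local socketpair. RPC_SIZE (fixed-size frames), OUTSTANDING_LIMIT (standing in for outstanding-rpc-limit) and et_on_pollin are hypothetical names chosen for this sketch.

```python
import socket

RPC_SIZE = 16           # hypothetical fixed-size RPC frame, for illustration only
OUTSTANDING_LIMIT = 4   # hypothetical stand-in for the server-side outstanding-rpc-limit

def et_on_pollin(sock, outstanding):
    """Model of an edge-triggered handler: the edge is reported once, so it
    keeps reading until the socket returns EAGAIN/EWOULDBLOCK. The
    outstanding-RPC limit is never consulted inside the loop."""
    buf = b""
    while True:
        try:
            chunk = sock.recv(4096)
        except BlockingIOError:        # EAGAIN / EWOULDBLOCK: socket fully drained
            break
        if not chunk:                  # peer closed the connection
            break
        buf += chunk
        # Every complete RPC is handed to the upper layers immediately.
        while len(buf) >= RPC_SIZE:
            outstanding.append(buf[:RPC_SIZE])
            buf = buf[RPC_SIZE:]
    return buf                          # partial RPC, if any, kept for the next edge

# The client queues 100 RPCs before the server handles a single edge.
client, server = socket.socketpair()
server.setblocking(False)
for i in range(100):
    client.sendall(i.to_bytes(RPC_SIZE, "big"))

outstanding = []
leftover = et_on_pollin(server, outstanding)
print(len(outstanding), len(outstanding) > OUTSTANDING_LIMIT)  # 100 True
client.close()
server.close()
```

All 100 RPCs become outstanding in one pass, far above the limit of 4, which is the "n RPCs read before EAGAIN" case described above.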

Previously, we read RPC by RPC and epoll was not ET (it was level
triggered), hence when we reached the throttle limit, we stopped reading
from the socket. The network pipes would fill up when this happens, so
the client would also be unable to write more RPCs, and hence
outstanding (or ongoing) RPCs would be limited.
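For contrast, a minimal Python model of the previous level-triggered, throttled behaviour. select.select is level triggered, so readiness keeps being reported while data is pending; the handler takes one RPC per readiness check and stops once the limit is reached, leaving the rest queued in the kernel socket buffer. RPC_SIZE, OUTSTANDING_LIMIT, recv_exact and lt_service are hypothetical names for this sketch, not gluster functions.

```python
import select
import socket

RPC_SIZE = 16          # hypothetical fixed-size RPC frame, for illustration only
OUTSTANDING_LIMIT = 4  # hypothetical stand-in for the server-side outstanding-rpc-limit

def recv_exact(sock, size):
    """Read exactly `size` bytes from a socket reported readable."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def lt_service(sock, outstanding, limit):
    """Level-triggered model: one RPC per readiness check, and stop reading
    once `limit` RPCs are outstanding. Unread RPCs stay in the kernel buffer,
    which eventually fills and blocks the writer."""
    read = 0
    while len(outstanding) < limit:
        readable, _, _ = select.select([sock], [], [], 0)  # level triggered
        if not readable:
            break
        outstanding.append(recv_exact(sock, RPC_SIZE))
        read += 1
    return read

client, server = socket.socketpair()
for i in range(100):
    client.sendall(i.to_bytes(RPC_SIZE, "big"))

outstanding = []
first = lt_service(server, outstanding, OUTSTANDING_LIMIT)   # reads 4, then throttles
del outstanding[:2]                                          # two RPCs complete
second = lt_service(server, outstanding, OUTSTANDING_LIMIT)  # reads only 2 more
print(first, second, len(outstanding))                       # 4 2 4
client.close()
server.close()
```

Outstanding work never exceeds the limit. The 94 unread RPCs remain in the socket buffer, which is the back-pressure that throttles the client.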

With epoll in ET mode, we are breaking this.

Shyam