[Gluster-devel] Problems with ec/nfs.t in regression tests

Thu Feb 12 18:27:13 UTC 2015

On 12.02.2015 19:09, Pranith Kumar Karampuri wrote: 

> On 02/12/2015 11:34 PM, Pranith Kumar Karampuri wrote:
>
>> On 02/12/2015 08:15 PM, Xavier Hernandez wrote:
>> 
>>> I've investigated further and the problem seems worse. It seems that
>>> NFS sends a huge number of requests without waiting for answers (I've
>>> seen more than 1400 requests outstanding). Many factors probably
>>> influence the load this causes, and ec could be one of them, but the
>>> problem is not exclusive to ec. I've repeated the test using replica 3
>>> and replica 2 volumes and the problem still happens. The test
>>> basically writes a 1GB file to an NFS mount using 'dd'. With a smaller
>>> file, the test passes.
>> Using an NFS client and the gluster NFS server on the same machine
>> with big-file dd operations is known to cause hangs. anon-fd-quota.t
>> used to give similar problems, so we changed the test to not involve
>> NFS mounts.
> 
> I don't recollect the exact scenario. Avati found the memory
> allocation deadlock in 2010, when I had just joined gluster.
> Raghavendra Bhat raised this bug then. I've CCed him on the thread in
> case he knows the exact scenario.

I've been doing some tests with Shyam and it seems that the root cause
is the edge-triggered epoll introduced in the multi-threaded epoll
patch. It has a side effect that makes the outstanding-rpc-limit option
nearly useless, so gluster gets flooded with requests, causing timeouts
and disconnections on slow/busy machines.
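
To illustrate how edge-triggered notification can defeat a limit on
outstanding requests, here is a toy Python model. It is not gluster
code and does not call epoll: the function serve(), the limit of 16 and
the one-request-per-wakeup behaviour are assumptions made for the
sketch, and the exact mechanism in gluster may differ. It relies only on
the general rule that an edge-triggered handler must keep reading until
the socket is empty, whereas a level-triggered handler can stop early
and will be notified again.

```python
from collections import deque


def serve(pending, limit, edge_triggered):
    """Toy model: return the peak number of in-flight requests.

    pending        -- requests the client has already queued on the socket
    limit          -- hypothetical stand-in for outstanding-rpc-limit
    edge_triggered -- True to model EPOLLET-style draining
    """
    socket_queue = deque(range(pending))
    in_flight = 0
    peak = 0
    while socket_queue:
        if edge_triggered:
            # Edge-triggered: readiness is reported once per state change,
            # so the handler drains the socket completely. Nothing here
            # can stop at `limit` without risking a stalled connection.
            while socket_queue:
                socket_queue.popleft()
                in_flight += 1
                peak = max(peak, in_flight)
        else:
            # Level-triggered: the handler may stop reading at the limit;
            # the socket stays readable and is reported again later.
            if in_flight >= limit:
                in_flight -= 1  # model one request completing
                continue
            socket_queue.popleft()
            in_flight += 1
            peak = max(peak, in_flight)
    return peak


if __name__ == "__main__":
    print("level-triggered peak:", serve(1400, 16, edge_triggered=False))
    print("edge-triggered peak: ", serve(1400, 16, edge_triggered=True))
```

In this model the level-triggered server never holds more than the
limit, while the edge-triggered one ends up holding every request the
client queued, which matches the kind of backlog described above.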

I've opened bug #1192114 for this problem.

Xavi
