[Gluster-devel] Too many open files

Brent A Nelson brent at phys.ufl.edu
Thu Apr 5 06:09:26 UTC 2007


That's correct.  I had commented out unify when narrowing down the mtime 
bug (which turned out to be writebehind) and then decided I had no reason 
to put it back in for this two-brick filesystem.  It was mounted without 
unify when this issue occurred.

Thanks,

Brent

On Wed, 4 Apr 2007, Anand Avati wrote:

> Can you confirm that you were NOT using unify in the setup?
>
> regards,
> avati
>
>
> On Thu, Apr 05, 2007 at 01:09:16AM -0400, Brent A Nelson wrote:
>> Awesome!
>>
>> Thanks,
>>
>> Brent
>>
>> On Wed, 4 Apr 2007, Anand Avati wrote:
>>
>>> Brent,
>>> thank you so much for your effort in sending the output!
>>> From the log it is clear that the leaked fds are all for directories. Indeed,
>>> there was an issue with the releasedir() call reaching all the nodes. The
>>> fix should be committed to tla today.
>>>
>>> Thanks!!
>>>
>>> avati
>>>
>>>
>>>
>>> On Wed, Apr 04, 2007 at 09:18:48PM -0400, Brent A Nelson wrote:
>>>> I avoided restarting, as this issue would take a while to reproduce.
>>>>
>>>> jupiter01 and jupiter02 are mirrors of each other.  All performance
>>>> translators are in use, except for writebehind (due to the mtime bug).
>>>>
>>>> jupiter01:
>>>> ls -l /proc/26466/fd |wc
>>>>  65536  655408 7358168
>>>> See attached for ls -l output.
>>>>
>>>> jupiter02:
>>>> ls -l /proc/3651/fd |wc
>>>> ls -l /proc/3651/fd
>>>> total 11
>>>> lrwx------ 1 root root 64 2007-04-04 20:43 0 -> /dev/null
>>>> lrwx------ 1 root root 64 2007-04-04 20:43 1 -> /dev/null
>>>> lrwx------ 1 root root 64 2007-04-04 20:43 10 -> socket:[2565251]
>>>> lrwx------ 1 root root 64 2007-04-04 20:43 2 -> /dev/null
>>>> l-wx------ 1 root root 64 2007-04-04 20:43 3 ->
>>>> /var/log/glusterfs/glusterfsd.log
>>>> lrwx------ 1 root root 64 2007-04-04 20:43 4 -> socket:[2255275]
>>>> lrwx------ 1 root root 64 2007-04-04 20:43 5 -> socket:[2249710]
>>>> lr-x------ 1 root root 64 2007-04-04 20:43 6 -> eventpoll:[2249711]
>>>> lrwx------ 1 root root 64 2007-04-04 20:43 7 -> socket:[2255306]
>>>> lr-x------ 1 root root 64 2007-04-04 20:43 8 ->
>>>> /etc/glusterfs/glusterfs-client.vol
>>>> lr-x------ 1 root root 64 2007-04-04 20:43 9 ->
>>>> /etc/glusterfs/glusterfs-client.vol
>>>>
>>>> Note that it looks like all those extra directories listed on jupiter01
>>>> were locally rsynched from jupiter01's Lustre filesystems onto the
>>>> glusterfs client on jupiter01.  A very large rsync from a different
>>>> machine to jupiter02 didn't go nuts.
>>>>
>>>> Thanks,
>>>>
>>>> Brent
>>>>
>>>> On Wed, 4 Apr 2007, Anand Avati wrote:
>>>>
>>>>> Brent,
>>>>> I hope the system is still in the same state, so we can dig some info out.
>>>>> To verify that it is a file descriptor leak, can you please run this
>>>>> test: on the server, run ps ax and get the PID of glusterfsd, then do
>>>>> an ls -l on /proc/<pid>/fd/ and mail us the output. That
>>>>> should give a precise idea of what is happening.
>>>>> If the system has already been reset out of that state, please give us the
>>>>> spec file you are using and the commands you ran (for the major jobs,
>>>>> like the heavy rsync) so that we can try to reproduce the error in our
>>>>> setup.
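[Editor's note: the diagnostic Avati describes above can be sketched as a small
shell check. It is demonstrated here on the shell's own PID via $$, since on the
server the PID would come from running ps ax and finding glusterfsd.]

```shell
# Sketch of the fd-leak check described above. We demonstrate on the
# shell's own PID ($$); on the server you would substitute the PID of
# glusterfsd obtained from `ps ax`.
pid=$$

# Each entry under /proc/<pid>/fd is a symlink to an open file, socket,
# or directory; a descriptor leak shows up as an ever-growing list here.
ls -l /proc/"$pid"/fd

# A quick count is usually enough to spot the leak:
echo "open fds: $(ls /proc/"$pid"/fd | wc -l)"
```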
>>>>>
>>>>> regards,
>>>>> avati
>>>>>
>>>>>
>>>>> On Wed, Apr 04, 2007 at 01:12:33PM -0400, Brent A Nelson wrote:
>>>>>> I put a 2-node GlusterFS mirror into use internally yesterday, as
>>>>>> GlusterFS was looking pretty solid, and I rsynced a whole bunch of stuff
>>>>>> to it.  Today, however, an ls on any of the three clients gives me:
>>>>>>
>>>>>> ls: /backup: Too many open files
>>>>>>
>>>>>> It looks like glusterfsd hit a limit.  Is this a bug
>>>>>> (glusterfs/glusterfsd
>>>>>> forgetting to close files; essentially, a file descriptor leak), or do I
>>>>>> just need to increase the limit somewhere?
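[Editor's note: for reference, the limits Brent asks about can be inspected from
the shell as sketched below; as the thread goes on to establish, the descriptors
here were leaked rather than legitimately exhausted, so raising the limit would
only have delayed the error.]

```shell
# Per-process soft limit on open file descriptors
# (the limit a process like glusterfsd would be hitting):
ulimit -n

# System-wide ceiling on open files:
cat /proc/sys/fs/file-max

# Raising the soft limit for processes started from this shell
# (the value is illustrative only):
# ulimit -n 65536
```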
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Brent
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-devel mailing list
>>>>>> Gluster-devel at nongnu.org
>>>>>> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>>>>>>
>>>>>
>>>>> --
>>>>> Shaw's Principle:
>>>>>      Build a system that even a fool can use,
>>>>>      and only a fool will want to use it.
>>>>>
>>>
>>
>
>