[Gluster-devel] Too many open files
Brent A Nelson
brent at phys.ufl.edu
Fri Apr 6 19:39:37 UTC 2007
PS: You also need an xlators/encryption//Makefile.am for autogen.sh to
succeed.
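
(In case it helps anyone else: an empty placeholder file was enough to get the
autotools run past this here. It's purely a local workaround, and the path below
is just the one referenced above, so adjust it to whatever automake actually
asks for.)

  # local workaround: create the missing Makefile.am so automake can proceed
  touch xlators/encryption/Makefile.am
  # then re-run the autotools bootstrap
  ./autogen.sh
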
Thanks,
Brent
On Fri, 6 Apr 2007, Brent A Nelson wrote:
> glusterfsd dies on both nodes almost immediately (I can ls successfully once
> before it dies, but as soon as I cd in, they're dead). The glusterfs processes
> are still running, but of course I get "Transport endpoint is not connected."
>
> Also, glusterfsd and glusterfs no longer seem to know where to log by default
> and refuse to start unless I pass the -l option to each.
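>
> For now I just start both with an explicit log file, roughly like this (the
> spec-file paths and the mount point below are only examples, not necessarily
> what anyone else uses; the point here is just the -l option):
>
> # server: spec file plus an explicit log file
> glusterfsd -f /etc/glusterfs/glusterfs-server.vol -l /var/log/glusterfs/glusterfsd.log
> # client: spec file, explicit log file, then the mount point
> glusterfs -f /etc/glusterfs/glusterfs-client.vol -l /var/log/glusterfs/glusterfs.log /mnt/glusterfs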
>
> Thanks,
>
> Brent
>
> On Fri, 6 Apr 2007, Anand Avati wrote:
>
>> Brent,
>> the fix has been committed. can you please check if it works for you?
>>
>> regards,
>> avati
>>
>> On Thu, Apr 05, 2007 at 02:09:26AM -0400, Brent A Nelson wrote:
>>> That's correct. I had commented out unify when narrowing down the mtime
>>> bug (which turned out to be writebehind) and then decided I had no reason
>>> to put it back in for this two-brick filesystem. It was mounted without
>>> unify when this issue occurred.
>>>
>>> Thanks,
>>>
>>> Brent
>>>
>>> On Wed, 4 Apr 2007, Anand Avati wrote:
>>>
>>>> Can you confirm that you were NOT using unify in the setup?
>>>>
>>>> regards,
>>>> avati
>>>>
>>>>
>>>> On Thu, Apr 05, 2007 at 01:09:16AM -0400, Brent A Nelson wrote:
>>>>> Awesome!
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Brent
>>>>>
>>>>> On Wed, 4 Apr 2007, Anand Avati wrote:
>>>>>
>>>>>> Brent,
>>>>>> thank you so much for your effort in sending the output!
>>>>>> from the log it is clear the leaked fds are all for directories. Indeed
>>>>>> there was an issue with the releasedir() call reaching all the nodes. The
>>>>>> fix should be committed to tla today.
>>>>>>
>>>>>> Thanks!!
>>>>>>
>>>>>> avati
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 04, 2007 at 09:18:48PM -0400, Brent A Nelson wrote:
>>>>>>> I avoided restarting, as this issue would take a while to reproduce.
>>>>>>>
>>>>>>> jupiter01 and jupiter02 are mirrors of each other. All performance
>>>>>>> translators are in use, except for writebehind (due to the mtime bug).
>>>>>>>
>>>>>>> jupiter01:
>>>>>>> ls -l /proc/26466/fd |wc
>>>>>>> 65536 655408 7358168
>>>>>>> See attached for ls -l output.
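>>>>>>>
>>>>>>> If it helps, a quick way to count just the directory fds in that listing
>>>>>>> (the entries under /proc/<pid>/fd are symlinks, and test -d follows them;
>>>>>>> the PID is of course specific to this box):
>>>>>>>
>>>>>>> # count the open fds of glusterfsd that point at directories
>>>>>>> for f in /proc/26466/fd/*; do [ -d "$f" ] && echo "$f"; done | wc -l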
>>>>>>>
>>>>>>> jupiter02:
>>>>>>> ls -l /proc/3651/fd |wc
>>>>>>> ls -l /proc/3651/fd
>>>>>>> total 11
>>>>>>> lrwx------ 1 root root 64 2007-04-04 20:43 0 -> /dev/null
>>>>>>> lrwx------ 1 root root 64 2007-04-04 20:43 1 -> /dev/null
>>>>>>> lrwx------ 1 root root 64 2007-04-04 20:43 10 -> socket:[2565251]
>>>>>>> lrwx------ 1 root root 64 2007-04-04 20:43 2 -> /dev/null
>>>>>>> l-wx------ 1 root root 64 2007-04-04 20:43 3 -> /var/log/glusterfs/glusterfsd.log
>>>>>>> lrwx------ 1 root root 64 2007-04-04 20:43 4 -> socket:[2255275]
>>>>>>> lrwx------ 1 root root 64 2007-04-04 20:43 5 -> socket:[2249710]
>>>>>>> lr-x------ 1 root root 64 2007-04-04 20:43 6 -> eventpoll:[2249711]
>>>>>>> lrwx------ 1 root root 64 2007-04-04 20:43 7 -> socket:[2255306]
>>>>>>> lr-x------ 1 root root 64 2007-04-04 20:43 8 -> /etc/glusterfs/glusterfs-client.vol
>>>>>>> lr-x------ 1 root root 64 2007-04-04 20:43 9 -> /etc/glusterfs/glusterfs-client.vol
>>>>>>>
>>>>>>> Note that it looks like all those extra directories listed on jupiter01
>>>>>>> were locally rsynced from jupiter01's Lustre filesystems onto the
>>>>>>> glusterfs client on jupiter01. A very large rsync from a different
>>>>>>> machine to jupiter02 didn't trigger the problem.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Brent
>>>>>>>
>>>>>>> On Wed, 4 Apr 2007, Anand Avati wrote:
>>>>>>>
>>>>>>>> Brent,
>>>>>>>> I hope the system is still in the same state so we can dig some info out.
>>>>>>>> To verify that it is a file descriptor leak, can you please run this
>>>>>>>> test: on the server, run ps ax and get the PID of glusterfsd, then do
>>>>>>>> an ls -l on /proc/<pid>/fd/ and please mail the output of that. That
>>>>>>>> should give a precise idea of what is happening.
>>>>>>>> If the system has been reset out of that state, please give us the
>>>>>>>> spec file you are using and the commands you ran (for the major jobs,
>>>>>>>> like the heavy rsync) so that we can try to reproduce the error in our
>>>>>>>> setup.
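>>>>>>>>
>>>>>>>> For example, something like this on the server should capture the whole
>>>>>>>> fd list in one go (assuming only a single glusterfsd process is running;
>>>>>>>> the output file name is just an example):
>>>>>>>>
>>>>>>>> # save the fd table of the running glusterfsd so it can be mailed
>>>>>>>> ls -l /proc/$(pidof glusterfsd)/fd/ > glusterfsd-fds.txt
>>>>>>>> # a quick count of how many fds are open
>>>>>>>> wc -l glusterfsd-fds.txt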
>>>>>>>>
>>>>>>>> regards,
>>>>>>>> avati
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Apr 04, 2007 at 01:12:33PM -0400, Brent A Nelson wrote:
>>>>>>>>> I put a 2-node GlusterFS mirror into use internally yesterday, as
>>>>>>>>> GlusterFS was looking pretty solid, and I rsynced a whole bunch of stuff
>>>>>>>>> to it. Today, however, an ls on any of the three clients gives me:
>>>>>>>>>
>>>>>>>>> ls: /backup: Too many open files
>>>>>>>>>
>>>>>>>>> It looks like glusterfsd hit a limit. Is this a bug (glusterfs/glusterfsd
>>>>>>>>> forgetting to close files; essentially, a file descriptor leak), or do I
>>>>>>>>> just need to increase the limit somewhere?
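>>>>>>>>>
>>>>>>>>> In case it really is just a limit, these are the places I would check
>>>>>>>>> (the per-process fd limit of the shell that launches glusterfsd, and the
>>>>>>>>> system-wide file table), though if it is a leak, raising them would only
>>>>>>>>> postpone the error:
>>>>>>>>>
>>>>>>>>> # per-process open-file limit in the launching shell
>>>>>>>>> ulimit -n
>>>>>>>>> # system-wide file handles: allocated, free, maximum
>>>>>>>>> cat /proc/sys/fs/file-nr
>>>>>>>>> cat /proc/sys/fs/file-max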
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Brent
>>>>>>>>>
>>>>>>>>>
>>>
>>
>> --
>> Shaw's Principle:
>> Build a system that even a fool can use,
>> and only a fool will want to use it.
>>
>