[Gluster-devel] glustershd status

Harshavardhana harsha at harshavardhana.net
Thu Jul 17 17:25:09 UTC 2014


This is a small-memory system (1024M) and the disk space for the
volume is 9 gigs; I do not think it has anything to do with AFR per se -
the same bug is also reproducible on the bricks and the NFS server.
It might also be that we aren't able to capture glusterdumps properly
on non-Linux platforms - one of the reasons I used the Valgrind output.

The Valgrind output points at 'lost memory' blocks - you can also see
the screenshots, which show memory ramping up within seconds with no
I/O, in fact with no data on the volume at all.
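
For reference, the Valgrind run was along these lines (a sketch - the
exact glustershd arguments are an assumption and differ per setup):

    valgrind --leak-check=full --track-origins=yes --log-file=shd.vg \
        glusterfs -N -s localhost --volfile-id gluster/glustershd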

The work-around I have seen contain this issue is to disable the
self-heal daemon and NFS - after that the memory stays sane (a sketch
of the commands is below). One more interesting observation: after
running the Gluster management daemon in debug mode, I can see that

the RPC_CLNT_CONNECT event is constantly being triggered - shouldn't it
occur only once per process notification?
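
For anyone who wants to try the same work-around, these are the
standard volume options (<volname> is a placeholder), plus roughly how
I watched the connect notifications - the glusterd invocation here is
from memory:

    gluster volume set <volname> cluster.self-heal-daemon off
    gluster volume set <volname> nfs.disable on

    # run the management daemon in the foreground with debug logging
    # and watch for repeated connect notifications
    glusterd --debug 2>&1 | grep RPC_CLNT_CONNECT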


On Thu, Jul 17, 2014 at 3:38 AM, Krishnan Parthasarathi
<kparthas at redhat.com> wrote:
> Harsha,
>
> I don't actively work on AFR, so I might have missed some things.
> I looked for the following things in the statedump for any
> memory-allocation-related oddities.
> 1) grep "pool-misses" *dump*
> This tells us whether there were any objects whose allocated mem-pool wasn't sufficient
> for the load it was working under.
> I see that the pool-misses were zero, which means we are doing well with the mem-pools we allocated.
>
> 2) grep "hot-count" *dump*
> This tells us the number of objects of any kind that are 'active' in the process when the statedump
> was taken. This should allow us to see whether the numbers are explicable.
> I see that the maximum hot-count across the statedumps of all processes is 50, which isn't alarming and doesn't point to any obvious memory leak.
>
> The above observations indicate that some object that is not mem-pool allocated is being leaked.
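>
> A rough way to chase that from the same dumps - the memusage field
> names here are from memory and may vary by release:
>
> # largest outstanding allocation counts per usage-type
> grep "num_allocs" *dump* | sort -t= -k2 -n | tail -20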
>
> Hope this helps,
> Krish
>
> ----- Original Message -----
>> Here you go KP - https://bugzilla.redhat.com/show_bug.cgi?id=1120570
>>
>> On Thu, Jul 17, 2014 at 12:37 AM, Krishnan Parthasarathi
>> <kparthas at redhat.com> wrote:
>> > Harsha,
>> >
>> > In addition to the valgrind output, a statedump of the glustershd process
>> > taken when the leak is observed would be really helpful.
>> >
>> > thanks,
>> > Krish
>> >
>> > ----- Original Message -----
>> >> Nope, spoke too early - using poll() has no effect on the memory usage
>> >> on Linux, so it's actually back to FreeBSD.
>> >>
>> >> On Thu, Jul 17, 2014 at 12:07 AM, Harshavardhana
>> >> <harsha at harshavardhana.net> wrote:
>> >> > KP,
>> >> >
>> >> > I do have 3.2 gigs' worth of valgrind output which indicates this
>> >> > issue; trying to reproduce it on Linux.
>> >> >
>> >> > My hunch is that compiling with --disable-epoll might actually
>> >> > trigger this issue on Linux too (sketch below). Will update here
>> >> > once I have done that testing.
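>> >> > Something like this, assuming the usual autotools flow in the
>> >> > tree:
>> >> >
>> >> >     ./configure --disable-epoll && make && make install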
>> >> >
>> >> >
>> >> > On Wed, Jul 16, 2014 at 11:44 PM, Krishnan Parthasarathi
>> >> > <kparthas at redhat.com> wrote:
>> >> >> Emmanuel,
>> >> >>
>> >> >> Could you take a statedump* of the glustershd process when it has leaked
>> >> >> enough memory to be observable, and share the output? This might
>> >> >> tell us what kinds of objects we are allocating in abnormally high numbers.
>> >> >>
>> >> >> * statedump of a glusterfs process
>> >> >> #kill -USR1 <pid of process>
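>> >> >> e.g., roughly - the dump lands under /var/run/gluster, or /tmp on
>> >> >> older builds, so the exact path is version-dependent:
>> >> >> # kill -USR1 $(pgrep -f glustershd)
>> >> >> # ls /var/run/gluster/*dump*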
>> >> >>
>> >> >> HTH,
>> >> >> Krish
>> >> >>
>> >> >>
>> >> >> ----- Original Message -----
>> >> >>> On Wed, Jul 16, 2014 at 11:32:06PM -0700, Harshavardhana wrote:
>> >> >>> > On a side note while looking into this issue - I uncovered a memory
>> >> >>> > leak too: after successful registration with glusterd, the self-heal
>> >> >>> > daemon and NFS server are killed by the FreeBSD memory manager. Have you
>> >> >>> > observed any memory leaks?
>> >> >>> > I have the valgrind output and it clearly indicates large memory
>> >> >>> > leaks - perhaps it is just a FreeBSD thing!
>> >> >>>
>> >> >>> I observed memory leaks in long-term usage. My favourite test case
>> >> >>> is building NetBSD on a replicated/distributed volume, and I can see
>> >> >>> processes growing a lot during the build. I reported it some time ago,
>> >> >>> and some leaks were plugged, but obviously some remain.
>> >> >>>
>> >> >>> valgrind was never ported to NetBSD, hence I lack the investigative
>> >> >>> tools, but I bet the leaks exist on FreeBSD and Linux as well.
>> >> >>>
>> >> >>> --
>> >> >>> Emmanuel Dreyfus
>> >> >>> manu at netbsd.org
>> >> >>> _______________________________________________
>> >> >>> Gluster-devel mailing list
>> >> >>> Gluster-devel at gluster.org
>> >> >>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>> >> >>>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Religious confuse piety with mere ritual, the virtuous confuse
>> >> > regulation with outcomes
>> >>
>> >>
>> >>
>> >> --
>> >> Religious confuse piety with mere ritual, the virtuous confuse
>> >> regulation with outcomes
>> >>
>>
>>
>>
>> --
>> Religious confuse piety with mere ritual, the virtuous confuse
>> regulation with outcomes
>>



-- 
Religious confuse piety with mere ritual, the virtuous confuse
regulation with outcomes

