[Gluster-devel] glustershd status

Krishnan Parthasarathi kparthas at redhat.com
Fri Jul 18 04:05:48 UTC 2014


Harsha,
I haven't gotten around to looking at the valgrind output. I am not sure I will be able to do it soon, since I am travelling next week.
Are you seeing an equal number of disconnect messages in the glusterd logs? What is the ip:port you observe in the RPC_CLNT_CONNECT messages? Could you attach the logs to the bug?
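
For reference, something like the following should surface both kinds of messages (a sketch; the log path assumes a default install):

# count connect vs. disconnect events seen by glusterd
grep -c "RPC_CLNT_CONNECT" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
grep -ci "disconnect" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log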

thanks,
Krish

----- Original Message -----
> This is a small-memory system (about 1024M), and the disk space for the
> volume is 9 GB; I do not think it has anything to do with AFR per se -
> the same bug is also reproducible on the bricks and the NFS server. It
> might also be that we aren't able to capture glusterdumps properly on
> non-Linux platforms - that is one of the reasons I used the Valgrind output.
> 
> The Valgrind output points to 'lost memory' blocks - you can also see the
> screenshots, which show memory ramping up within seconds with no I/O and,
> in fact, no data on the volume.
> 
> The work-around I have found to contain this issue is to disable the
> self-heal daemon and NFS - after that, memory usage stays stable. One
> interesting observation after running the Gluster management daemon in
> debug mode - I can see that
> 
> the RPC_CLNT_CONNECT event is being triggered constantly - shouldn't it
> occur only once per process notification?
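> 
> A sketch of that workaround (assuming the standard cluster.self-heal-daemon
> and nfs.disable volume options; the volume name is a placeholder):
> 
> # turn off the self-heal daemon and the gluster NFS server for the volume
> gluster volume set <volname> cluster.self-heal-daemon off
> gluster volume set <volname> nfs.disable on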
> 
> 
> On Thu, Jul 17, 2014 at 3:38 AM, Krishnan Parthasarathi
> <kparthas at redhat.com> wrote:
> > Harsha,
> >
> > I don't actively work on AFR, so I might have missed some things.
> > I looked for the following things in the statedump for any
> > memory-allocation related oddities.
> > 1) grep "pool-misses" *dump*
> > This tells us whether there were any objects whose allocated mem-pool
> > wasn't sufficient for the load it was working under.
> > I see that the pool-misses were zero, which means we are doing fine with
> > the mem-pools we allocated.
> >
> > 2) grep "hot-count" *dump*
> > This tells us the number of objects of each kind that were 'active' in
> > the process when the statedump was taken. This should allow us to see
> > whether the numbers we observe are explicable.
> > I see that the maximum hot-count across the processes' statedumps is 50,
> > which isn't alarming and doesn't point to any obvious memory leak.
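> >
> > Both checks in one place (a sketch; it assumes the statedump files are in
> > the current directory - by default they are written under /var/run/gluster):
> >
> > # non-zero pool-misses would mean a mem-pool was too small for its load
> > grep "pool-misses" *dump*
> > # hot-count shows how many objects of each type were live at dump time
> > grep "hot-count" *dump*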
> >
> > The above observations indicate that some object that is not mem-pool
> > allocated is being leaked.
> >
> > Hope this helps,
> > Krish
> >
> > ----- Original Message -----
> >> Here you go KP - https://bugzilla.redhat.com/show_bug.cgi?id=1120570
> >>
> >> On Thu, Jul 17, 2014 at 12:37 AM, Krishnan Parthasarathi
> >> <kparthas at redhat.com> wrote:
> >> > Harsha,
> >> >
> >> > In addition to the valgrind output, a statedump of the glustershd
> >> > process taken when the leak is observed would be really helpful.
> >> >
> >> > thanks,
> >> > Krish
> >> >
> >> > ----- Original Message -----
> >> >> Nope, I spoke too early: using poll() has no effect on the memory usage
> >> >> on Linux, so it's actually back to FreeBSD.
> >> >>
> >> >> On Thu, Jul 17, 2014 at 12:07 AM, Harshavardhana
> >> >> <harsha at harshavardhana.net> wrote:
> >> >> > KP,
> >> >> >
> >> >> > I do have 3.2 GB worth of valgrind output which points to this
> >> >> > issue, and I am trying to reproduce it on Linux.
> >> >> >
> >> >> > My hunch is that compiling with --disable-epoll might actually
> >> >> > trigger this issue on Linux too. I will update here
> >> >> > once I have done that testing.
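> >> >> >
> >> >> > A minimal sketch of that rebuild (only --disable-epoll comes from the
> >> >> > discussion above; the autotools steps themselves are an assumption):
> >> >> >
> >> >> > # rebuild glusterfs forcing the poll() based event loop
> >> >> > ./autogen.sh
> >> >> > ./configure --disable-epoll
> >> >> > make && make install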
> >> >> >
> >> >> >
> >> >> > On Wed, Jul 16, 2014 at 11:44 PM, Krishnan Parthasarathi
> >> >> > <kparthas at redhat.com> wrote:
> >> >> >> Emmanuel,
> >> >> >>
> >> >> >> Could you take a statedump* of the glustershd process once it has
> >> >> >> leaked enough memory for the growth to be observable, and share the
> >> >> >> output? This might tell us which kinds of objects we are allocating
> >> >> >> in abnormally high numbers.
> >> >> >>
> >> >> >> * statedump of a glusterfs process
> >> >> >> #kill -USR1 <pid of process>
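> >> >> >>
> >> >> >> For example (a sketch; the pgrep pattern and the dump location are
> >> >> >> assumptions - statedumps usually land under /var/run/gluster):
> >> >> >>
> >> >> >> # ask the self-heal daemon to write a statedump, then look for it
> >> >> >> kill -USR1 $(pgrep -f glustershd)
> >> >> >> ls -l /var/run/gluster/*.dump.*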
> >> >> >>
> >> >> >> HTH,
> >> >> >> Krish
> >> >> >>
> >> >> >>
> >> >> >> ----- Original Message -----
> >> >> >>> On Wed, Jul 16, 2014 at 11:32:06PM -0700, Harshavardhana wrote:
> >> >> >>> > On a side note, while looking into this issue I uncovered a memory
> >> >> >>> > leak too: after successful registration with glusterd, the self-heal
> >> >> >>> > daemon and the NFS server are killed by the FreeBSD memory manager.
> >> >> >>> > Have you observed any memory leaks?
> >> >> >>> > I have the valgrind output and it clearly indicates large memory
> >> >> >>> > leaks - perhaps it could be just a FreeBSD thing!
> >> >> >>>
> >> >> >>> I observed memory leaks under long-term usage. My favourite test case
> >> >> >>> is building NetBSD on a replicated/distributed volume, and I can see
> >> >> >>> processes growing a lot during the build. I reported it some time ago,
> >> >> >>> and some leaks were plugged, but obviously some remain.
> >> >> >>>
> >> >> >>> valgrind was never ported to NetBSD, hence I lack investigative tools,
> >> >> >>> but I bet the leaks exist on FreeBSD and Linux as well.
> >> >> >>>
> >> >> >>> --
> >> >> >>> Emmanuel Dreyfus
> >> >> >>> manu at netbsd.org
> >> >> >>> _______________________________________________
> >> >> >>> Gluster-devel mailing list
> >> >> >>> Gluster-devel at gluster.org
> >> >> >>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
> >> >> >>>
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Religious confuse piety with mere ritual, the virtuous confuse
> >> >> > regulation with outcomes
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Religious confuse piety with mere ritual, the virtuous confuse
> >> >> regulation with outcomes
> >> >>
> >>
> >>
> >>
> >> --
> >> Religious confuse piety with mere ritual, the virtuous confuse
> >> regulation with outcomes
> >>
> 
> 
> 
> --
> Religious confuse piety with mere ritual, the virtuous confuse
> regulation with outcomes
> 

