[Gluster-devel] FreeBSD port of GlusterFS racks up a lot of CPU usage

Raghavendra Gowdappa rgowdapp at redhat.com
Tue Jan 19 11:53:29 UTC 2016



----- Original Message -----
> From: "Martin Simmons" <martin at lispworks.com>
> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> Cc: rmacklem at uoguelph.ca, jdarcy at redhat.com, raghavendra at gluster.com, gluster-devel at gluster.org,
> freebsd-fs at freebsd.org, xhernandez at datalab.es, jkh at ixsystems.com
> Sent: Tuesday, January 19, 2016 3:35:07 PM
> Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot of CPU usage
> 
> >>>>> On Tue, 19 Jan 2016 01:01:19 -0500 (EST), Raghavendra Gowdappa said:
> > 
> > ----- Original Message -----
> > > From: "Rick Macklem" <rmacklem at uoguelph.ca>
> > > To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> > > Cc: "Jeff Darcy" <jdarcy at redhat.com>, "Raghavendra G"
> > > <raghavendra at gluster.com>, "freebsd-fs"
> > > <freebsd-fs at freebsd.org>, "Hubbard Jordan" <jkh at ixsystems.com>, "Xavier
> > > Hernandez" <xhernandez at datalab.es>, "Gluster
> > > Devel" <gluster-devel at gluster.org>
> > > Sent: Tuesday, January 19, 2016 4:07:09 AM
> > > Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot of
> > > CPU usage
> > > 
> > > Raghavendra Gowdappa wrote:
> > > > 
> > > > 
> > > > ----- Original Message -----
> > > > > From: "Rick Macklem" <rmacklem at uoguelph.ca>
> > > > > To: "Jeff Darcy" <jdarcy at redhat.com>
> > > > > Cc: "Raghavendra G" <raghavendra at gluster.com>, "freebsd-fs"
> > > > > <freebsd-fs at freebsd.org>, "Hubbard Jordan"
> > > > > <jkh at ixsystems.com>, "Xavier Hernandez" <xhernandez at datalab.es>,
> > > > > "Gluster
> > > > > Devel" <gluster-devel at gluster.org>
> > > > > Sent: Saturday, January 9, 2016 7:29:59 AM
> > > > > Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot
> > > > > of
> > > > > CPU usage
> > > > > 
> > > > > Jeff Darcy wrote:
> > > > > > > > I don't know anything about gluster's poll implementation so I
> > > > > > > > may
> > > > > > > > be totally wrong, but would it be possible to use an eventfd
> > > > > > > > (or a
> > > > > > > > pipe if eventfd is not supported) to signal the need to add
> > > > > > > > more
> > > > > > > > file descriptors to the poll call ?
> > > > > > > >
> > > > > > > >
> > > > > > > > The poll call should listen on this new fd. When we need to
> > > > > > > > change
> > > > > > > > the fd list, we should simply write to the eventfd or pipe from
> > > > > > > > another thread.  This will cause the poll call to return and we
> > > > > > > > will
> > > > > > > > be able to change the fd list without having a short timeout
> > > > > > > > nor
> > > > > > > > having to decide on any trade-off.
> > > > > > > 
> > > > > > >
> > > > > > > Thats a nice idea. Based on my understanding of why timeouts are
> > > > > > > being
> > > > > > > used, this approach can work.
> > > > > > 
> > > > > > The own-thread code which preceded the current poll implementation
> > > > > > did
> > > > > > something similar, using a pipe fd to be woken up for new
> > > > > > *outgoing*
> > > > > > messages.  That code still exists, and might provide some insight
> > > > > > into
> > > > > > how to do this for the current poll code.
> > > > > I took a look at event-poll.c and found something interesting...
> > > > > - A pipe called "breaker" is already set up by event_pool_new_poll()
> > > > > and
> > > > >   closed by event_pool_destroy_poll(), however it never gets used for
> > > > >   anything.
> > > > 
> > > > I did a check on history, but couldn't find any information on why it
> > > > was
> > > > removed. Can you send this patch to http://review.gluster.org ? We can
> > > > review and merge the patch over there. If you are not aware,
> > > > development
> > > > work flow can be found at:
> > > > 
> > > > http://www.gluster.org/community/documentation/index.php/Developers
> > > > 
> > > Actually, the patch turned out to be a flop. Sometimes a fuse mount would
> > > end
> > > up with an empty file system with the patch. (I don't know why it was
> > > broken,
> > > but maybe the original author tan into issues as well?)
> > 
> > +static void
> > +event_pool_changed (struct event_pool *event_pool)
> > +{
> > +
> > +        /* Write a byte into the breaker pipe to wake up poll(). */
> > +        if (event_pool->breaker[1] >= 0)
> > +                write(event_pool->breaker[1], "X", 1);
> > +}
> > 
> > breaker is set to non-blocking on both read and write ends. So, probably
> > write might be failing sometimes with EAGAIN/EBUSY and thereby preventing
> > the socket from being registered. Probably that might be the reason?
> 
> In what situations does write return EAGAIN/EBUSY for a pipe?  If it is only
> when the pipe's internal buffer is full, then there is still unread data and
> the reading end will wake up anyway, so there should be no need to write
> more.

That's a valid point. I was kind of thinking out loud since I couldn't figure out what was wrong in earlier approach. I was particularly interested in whether write can fail because of any error. Since we were not checking the return value, a write failure can result in a socket not being registered. Though I couldn't think of any particular reason for failure, felt that its a loose end that can be tied up.

On a different note, I found a bug in current poll implementation which could've resulted in high cpu usage. I've attached the diff, which wouldn't do an array copy (of registered events to ufd) every "timeout" seconds and which I hope which will bring down cpu usage. The issue was event_pool->changed was never reset and hence every timeout seconds, we would interpret that event_pool has changed even though it has not.

> 
> 
> >         if (event_pool->breaker[1] >= 0) {
> >                  do {
> >                         ret = write(event_pool->breaker[1], "X", 1);
> >                  } while (ret != 1);
> >         }
> 
> This will busy-wait, which doesn't look good.
> 
> 
> > Also similar logic might be required while flushing out junk from read end
> > too.
> 
> Failure to read everything should be benign because it will just cause poll
> to
> return again immediately next time.
> 
> 
> > > 
> > > Anyhow, I am now using the 3.7.6 event-poll.c code except that I have
> > > increased
> > > the timeout from 1msec->10msec. (Going from 1->5->10 didn't seem to cause
> > > a
> > > problem, but I got slower test runs when I increased to 20msec, so I've
> > > settled on
> > > 10mses. This does reduce the CPU usage when the GlusterFS file systems
> > > aren't
> > > active.)
> > > I will submit this one line change to your workflow if it continues to
> > > test
> > > ok.
> > > 
> > > Thanks for everyone's input, rick
> > > 
> > > > > 
> > > > > So, I added a few lines of code that writes a byte to it whenever the
> > > > > list
> > > > > of
> > > > > file descriptors is changed and read when poll() returns, if its
> > > > > revents
> > > > > is
> > > > > set.
> > > > > I also changed the timeout to -1 (infinity) and it seems to work for
> > > > > a
> > > > > trivial
> > > > > test.
> > > > > --> Btw, I also noticed the "changed" variable gets set to 1 on a
> > > > > change,
> > > > > but
> > > > >     never reset to 0. I didn't change this, since it looks "racey".
> > > > >     (ie.
> > > > >     I
> > > > >     think you could easily get a race between a thread that clears it
> > > > >     and
> > > > >     one
> > > > >     that adds a new fd.)
> > > > > 
> > > > > A slightly safer version of the patch would set a long (100msec ??)
> > > > > timeout
> > > > > instead
> > > > > of -1.
> > > > > 
> > > > > Anyhow, I've attached the patch in case anyone would like to try it
> > > > > and
> > > > > will
> > > > > create a bug report for this after I've had more time to test it.
> > > > > (I only use a couple of laptops, so my testing will be minimal.)
> > > > > 
> > > > > Thanks for all the help, rick
> > > > > 
> > > > > > _______________________________________________
> > > > > > freebsd-fs at freebsd.org mailing list
> > > > > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > > > > > To unsubscribe, send any mail to
> > > > > > "freebsd-fs-unsubscribe at freebsd.org"
> > > > > > 
> > > > > 
> > > > 
> > > 
> > _______________________________________________
> > freebsd-fs at freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
> > 
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: poll.diff
Type: text/x-patch
Size: 413 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-devel/attachments/20160119/e5f8a6ca/attachment-0001.bin>


More information about the Gluster-devel mailing list