[Gluster-devel] Feature: FOP Statistics JSON Dumps

Richard Wareing rwareing at fb.com
Tue Sep 22 21:54:10 UTC 2015


Hey Ben,

So the UI for it is simply to read the dump files in /var/lib/glusterd/stats.  For example, for gNFSd you can do this:

cat /var/lib/glusterd/stats/glusterfs_nfsd.dump


To see the output.  The reason we favor this "procfs" style interface is that:

1. There are 0 dependencies on CLIs which can hang.
2. All dumps are independent of one another: if gNFSd on that host is having issues, this shouldn't prevent glusterfsd from sending us stats.
3. The output can be sent to an analytics/alarm engine of your choice.  Or simply run grep w/ "watch" in a loop to watch "live" when doing debugging.
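
As a rough illustration of point 2, here is a minimal Python sketch that reads every dump in the stats directory and keeps failures isolated per daemon. The function name read_all_dumps and the "*.dump" glob are assumptions for illustration, and it assumes each dump file is a single valid JSON object; it is not part of the patch.

```python
import json
from pathlib import Path

STATS_DIR = Path("/var/lib/glusterd/stats")  # directory named in this thread


def read_all_dumps(stats_dir=STATS_DIR):
    """Read every *.dump file, keeping failures isolated per daemon.

    Hypothetical helper: assumes each dump is one JSON object.
    """
    results, errors = {}, {}
    for dump in sorted(Path(stats_dir).glob("*.dump")):
        try:
            results[dump.stem] = json.loads(dump.read_text())
        except (OSError, ValueError) as exc:
            # One unreadable or half-written dump must not hide the others.
            errors[dump.stem] = str(exc)
    return results, errors
```

A collector built this way can still report glusterfsd stats when the gNFSd dump is missing or malformed.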

Since we have this feature, we never actually use "profile" at all: there's really no need, since you have the data 24x7 at 5-second intervals. You only need to enable diagnostics.latency-measurement and diagnostics.count-fop-hits, and set diagnostics.ios-dump-interval to a non-zero value; the data will then land in /var/lib/glusterd/stats/<daemon>.dump .
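
For reference, a small Python sketch that builds the corresponding "gluster volume set" command lines without running them. The option names are the ones listed above; the values ("on", and "5" read as seconds) and the volume name "myvol" are assumptions, so check them against your build before use.

```python
# Option names come from this thread; the values are assumptions.
STATS_OPTIONS = {
    "diagnostics.latency-measurement": "on",
    "diagnostics.count-fop-hits": "on",
    "diagnostics.ios-dump-interval": "5",  # assumed to be seconds; any non-zero value
}


def volume_set_commands(volume, options=STATS_OPTIONS):
    """Return one `gluster volume set` argv list per option (nothing is executed)."""
    return [["gluster", "volume", "set", volume, key, value]
            for key, value in options.items()]


for argv in volume_set_commands("myvol"):  # "myvol" is a placeholder volume name
    print(" ".join(argv))
```

Each printed line can be reviewed and then run by hand on a node that has the gluster CLI.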

Bug is updated w/ example output, but here's a teaser:


{
*SNIP*
"gluster.nfsd.inter.fop.removexattr.latency_ave_usec": "0.00",
"gluster.nfsd.inter.fop.removexattr.latency_min_usec": "0.00",
"gluster.nfsd.inter.fop.removexattr.latency_max_usec": "0.00",
"gluster.nfsd.inter.fop.opendir.per_sec": "2.60",
"gluster.nfsd.inter.fop.opendir.latency_ave_usec": "1658.92",
"gluster.nfsd.inter.fop.opendir.latency_min_usec": "715.00",
"gluster.nfsd.inter.fop.opendir.latency_max_usec": "7179.00",
"gluster.nfsd.inter.fop.fsyncdir.per_sec": "0.00",
"gluster.nfsd.inter.fop.fsyncdir.latency_ave_usec": "0.00",
"gluster.nfsd.inter.fop.fsyncdir.latency_min_usec": "0.00",
"gluster.nfsd.inter.fop.fsyncdir.latency_max_usec": "0.00",
"gluster.nfsd.inter.fop.access.per_sec": "43.19",
"gluster.nfsd.inter.fop.access.latency_ave_usec": "323.51",
"gluster.nfsd.inter.fop.access.latency_min_usec": "144.00",
"gluster.nfsd.inter.fop.access.latency_max_usec": "6639.00",
"gluster.nfsd.inter.fop.create.per_sec": "0.00",
*SNIP*
}
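
Because the keys are flat dotted names, a consumer will usually want to regroup them per FOP. The following is a minimal Python sketch, assuming every per-FOP key contains ".fop.<name>.<metric>" as in the sample above and that every value is a numeric string; group_by_fop is a hypothetical helper, not something the patch provides.

```python
import json
from collections import defaultdict


def group_by_fop(stats, marker=".fop."):
    """Turn flat '<prefix>.fop.<name>.<metric>' keys into {name: {metric: float}}."""
    grouped = defaultdict(dict)
    for key, value in stats.items():
        _, sep, tail = key.partition(marker)
        if not sep:
            continue  # not a per-FOP counter
        fop, _, metric = tail.partition(".")
        grouped[fop][metric] = float(value)
    return dict(grouped)


# A few entries copied from the sample output above.
sample = json.loads("""{
  "gluster.nfsd.inter.fop.opendir.per_sec": "2.60",
  "gluster.nfsd.inter.fop.opendir.latency_ave_usec": "1658.92",
  "gluster.nfsd.inter.fop.access.per_sec": "43.19",
  "gluster.nfsd.inter.fop.access.latency_ave_usec": "323.51"
}""")

# Find the FOP with the highest rate in this interval.
busiest = max(group_by_fop(sample).items(),
              key=lambda kv: kv[1].get("per_sec", 0.0))
print(busiest[0], busiest[1]["latency_ave_usec"])  # prints: access 323.51
```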

There are also aggregate counters, which track from process birth to death, and these are exported as well.

Richard


________________________________________
From: Ben England [bengland at redhat.com]
Sent: Tuesday, September 22, 2015 11:04 AM
To: Richard Wareing
Cc: gluster-devel at gluster.org
Subject: Re: [Gluster-devel] Feature: FOP Statistics JSON Dumps

Richard, what's great about your patch (besides lockless counters) is:

- JSON is easier to parse (particularly in Python).  Compare that to parsing "gluster volume profile" output, which is much more difficult.  This will enable tools to display profiling data in a user-friendly way.  It would be nice if you attached a sample output to bz 1261700.

- client side capture - io-stats translator is at the top of the translator stack so we would see latencies just like the application sees them.  "gluster volume profile" provides server-side latencies but this can be deceptive and fails to report "user experience" latencies.

I'm not that clear on the UI for it.  It would be nice if the "gluster volume" command could be set up to automatically poll this data at a fixed rate, like many other perf utilities (example: iostat), so that a user could capture a Gluster profile over time with a single command; at present the support team has to give them a script to do it.  This would make it trivial for a user to share what their application is doing from a Gluster perspective, as well as how Gluster is performing from the client's perspective.  The /usr/sbin/gluster utility can run on the client now since it is in the gluster-cli RPM, right?

So in other words it would be great to replace this:

gluster volume profile $volume_name start
gluster volume profile $volume_name info > /tmp/past
for min in `seq 1 $sample_count` ; do
  sleep $sample_interval
  gluster volume profile $volume_name info
done > gvp.log
gluster volume profile $volume_name stop

With this:

gluster volume profile $volume_name $sample_interval $sample_count > gvp.log

And be able to run this command on the client to use your patch there.
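
Until the CLI supports something like that, the same interval/count sampling can be approximated against the JSON dump files directly. This is only a sketch under assumptions: sample_dump is a hypothetical helper, it assumes the dump is one valid JSON object, and it writes one timestamped snapshot per line.

```python
import json
import sys
import time


def sample_dump(path, interval, count, out=sys.stdout, sleep=time.sleep):
    """Write up to `count` snapshots of one dump file, one JSON object per line."""
    taken = 0
    for i in range(count):
        if i:
            sleep(interval)  # wait between samples, not before the first one
        try:
            with open(path) as fh:
                stats = json.load(fh)
        except (OSError, ValueError):
            continue  # skip a missing or partially written dump
        out.write(json.dumps({"ts": time.time(), "stats": stats}) + "\n")
        taken += 1
    return taken
```

For example, sample_dump("/var/lib/glusterd/stats/glusterfs_nfsd.dump", 5, 12) would capture about a minute of data; whether a client-side dump lands under the same directory is not stated in this thread.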

thx

-ben

----- Original Message -----
> From: "Richard Wareing" <rwareing at fb.com>
> To: gluster-devel at gluster.org
> Sent: Wednesday, September 9, 2015 10:24:54 PM
> Subject: [Gluster-devel] Feature: FOP Statistics JSON Dumps
>
> Hey all,
>
> I just uploaded a clean patch for our FOP statistics dump feature @
> https://bugzilla.redhat.com/show_bug.cgi?id=1261700 .
>
> Patches cleanly to v3.6.x/v3.7.x release branches, also includes io-stats
> support for intel arch atomic operations (ifdef'd for portability) such that
> you can collect data 24x7 with a negligible latency hit in the IO path.
> We've been using this for quite some time and there appeared to have been
> some interest at the dev summit to have this in mainline; so here it is.
>
> Take a look, and I hope you find it useful.
>
> Richard

