[Gluster-devel] [Gluster-Maintainers] Metrics: and how to get them out from gluster
Aravinda
avishwan at redhat.com
Fri Sep 1 06:57:50 UTC 2017
On 09/01/2017 10:57 AM, Amar Tumballi wrote:
> Disclaimer: This email is long, and took significant time to
> write. Please take the time to read, review and give feedback, so we can have
> some metrics-related tasks done by Gluster 4.0.
>
> ---
> ** History:*
>
> To understand what is happening inside a GlusterFS process, over the
> years, we have opened many bugs, coded a few things with regard
> to statedump, and put some effort into the io-stats translator to
> improve Gluster's monitoring capabilities.
>
> But surely more is required! Some glimpse of it is captured
> in [1], [2], [3] & [4]. Also, I sent an email to this group [5]
> about possibilities of capturing this information.
>
> ** Current problem:*
>
> When we talk about metrics or monitoring, we have to consider handing
> this data to a tool which can record the readings periodically;
> without a time series, no metric will make sense! So, the first
> challenge is how to get them out. Should getting the metrics
> out of each process involve 'glusterd', or should we use
> signals? This leads us to *'challenge #1'.*
>
> Next: should we depend on io-stats to do the reporting? If yes, how
> do we get information from between any two layers? Should we load
> io-stats between all the nodes of the translator graph, or should we
> utilize the STACK_WIND/UNWIND framework to get the details? This is our
> *'challenge #2'*
>
> Once the above decision is taken, the question becomes: "what
> about 'metrics' from other translators? Who gives them out (ie, dumps
> them)? Why do we need something similar to statedump; can't we read
> the info from statedump itself?" When we say 'metrics', we mean
> a key with a number associated with it; statedump has a lot more,
> and no fixed format. If it is different from statedump, then what is our
> answer for translator code to give out metrics? This is our
> *'challenge #3*'
>
> If we solve the above challenges, then I guess we are in
> decent shape for further development. Let's go through them one by one,
> in detail.
>
> ** Problems and proposed solutions:*
>
> *a) how to dump metrics data ?*
>
> Currently, I propose the signal handler way, as it gives us control
> to choose which processes we capture information on,
> and will be much faster than communicating through another tool. Also,
> considering we need these metrics taken every 10 seconds or so,
> we need an efficient way to get them out.
>
> But even there, we have challenges, because we have already assigned
> both the USR1 and USR2 signal handlers: one for statedump, the other for
> toggling latency monitoring, respectively. It makes sense for statedump
> to continue using USR1, but toggling options should technically
> (for correctness too) be handled by glusterd volume set options, and
> there should be a better way to handle it through our
> 'reconfigure()' framework in graph-switch. Proposal sent in github
> issue #303 [6].
>
> If we are good with above proposal, then we can make use of USR2 for
> metrics dump. Next issue will be about the format of the file itself,
> which we will discuss at the end of the email.
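For reference, the signal-driven dump proposed above could be sketched roughly as below. This is Python purely for illustration: the real implementation would live in the C codebase, and the dump path and metric names here are made up.

```python
import os
import signal
import time

# Hypothetical in-process counters; the real ones live in each xlator.
METRICS = {"fop.write.count": 120, "fop.read.count": 345}
DUMP_PATH = "/tmp/gluster-metrics-demo.txt"  # stand-in path, not the real one

def dump_metrics(signum, frame):
    # Write one "<key> <value>" line per metric, matching the proposed format.
    with open(DUMP_PATH, "w") as f:
        f.write("# metrics dumped at %d\n" % time.time())
        for key, value in sorted(METRICS.items()):
            f.write("%s %d\n" % (key, value))

# statedump keeps USR1; metrics dumping would take over USR2.
signal.signal(signal.SIGUSR2, dump_metrics)

if __name__ == "__main__":
    # Simulate an external monitor running 'kill -USR2 <pid>'.
    os.kill(os.getpid(), signal.SIGUSR2)
    print(open(DUMP_PATH).read())
```

An external collector would then send USR2 every 10 seconds and scrape the resulting file.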
How about using UDP to push data from Gluster processes?

- Signal handling would only be required while reloading the volfile
when metrics are enabled/disabled.
- Push metrics to a pre-configured UDP address (socket file); if a
listener exists it captures them, else the UDP message is lost.
- Will not affect I/O performance, since it is asynchronous and errors
are not checked by the sender.
- If a metric is lost, no data/io is impacted. We don't need crash
consistency or high accuracy while collecting metrics.
- The receiver can consume data quickly without blocking incoming
messages, and can produce outputs in different formats asynchronously.

Usage:
- Enable metrics using a volume option.
- Start the UDP server (metrics receiver), for example
./gluster-metrics-receiver (the socket file should be predefined,
say `/var/run/gluster/metrics.socket`).

Limitations:
Similar to the `strace` command, running two receivers at the same time
is not possible. Running multiple receivers is possible by using a
different socket address for each process/pid, for example
`metrics.<pid>.socket`, with the receiver listening via
`./gluster-metrics-receiver -p <pid>`.
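A rough sketch of this push model, using a Unix datagram socket. The socket path and helper names below are hypothetical, not actual Gluster code; the point is the fire-and-forget behaviour described above.

```python
import os
import socket

SOCK_PATH = "/tmp/gluster-metrics-demo.socket"  # stand-in for /var/run/gluster/metrics.socket

def start_receiver(path=SOCK_PATH):
    # The receiver owns the socket file; metrics arrive as datagrams.
    if os.path.exists(path):
        os.unlink(path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock

def push_metric(key, value, path=SOCK_PATH):
    # Fire-and-forget: if no receiver is listening, drop the metric
    # silently so the I/O path is never blocked or failed by monitoring.
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.sendto(("%s %s" % (key, value)).encode(), path)
    except OSError:
        pass  # listener absent: metric is lost, data path unaffected
    finally:
        sock.close()

if __name__ == "__main__":
    receiver = start_receiver()
    push_metric("fop.write.count", 120)
    print(receiver.recv(1024).decode())  # -> fop.write.count 120
    receiver.close()
```

When the receiver is not running, `push_metric()` simply returns, which is the "metric is lost, no impact on data/io" property claimed above.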
>
> NOTE: The above approach is already implemented in the 'experimental'
> branch, excluding the handling of [6].
>
> *b) where to measure the latency and fops counts?*
>
> One possible way is to load io-stats between all the nodes,
> but it has its own limitations. Mainly: how do we configure options in
> each of these translators, and will having too many translators slow down
> operations? (ie, one extra 'frame' is created for every fop, so in a
> graph of 20 xlators, that is 20 extra frame creates for a single fop).
>
> I propose we handle this in the 'STACK_WIND/UNWIND' macros themselves, and
> provide a placeholder to store all this data in the translator structure
> itself. This will be cleaner, and no changes are required in the code
> base other than in 'stack.h' (and some in 'xlator.h').
>
> Also, we can provide an 'option monitoring enable' (or disable) option as
> a default option for every translator, and can handle it at
> xlator_init() time itself. (This is not a blocker for 4.0, but good to
> have.) Idea proposed @ github #304 [7].
>
> NOTE: this approach is already working pretty well in the 'experimental'
> branch, excluding [7]. Depending on feedback, we can improve it further.
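As a language-neutral illustration of the WIND/UNWIND idea: the real change would be in the C macros in 'stack.h', but the accounting logic amounts to something like the sketch below (all names here are invented for illustration).

```python
import time
from collections import defaultdict

class Xlator:
    """Minimal stand-in for a translator with a per-fop metrics placeholder."""
    def __init__(self, name):
        self.name = name
        # analogous to the proposed placeholder in the xlator structure
        self.metrics = defaultdict(lambda: {"count": 0, "latency_ns": 0})

    def record(self, fop, elapsed_ns):
        m = self.metrics[fop]
        m["count"] += 1
        m["latency_ns"] += elapsed_ns

def wind(xl, fop, fn, *args):
    # STACK_WIND notes the start time; STACK_UNWIND accounts the elapsed
    # time against the translator -- no extra io-stats frame needed.
    start = time.monotonic_ns()
    result = fn(*args)
    xl.record(fop, time.monotonic_ns() - start)
    return result

if __name__ == "__main__":
    xl = Xlator("posix")
    wind(xl, "writev", lambda: "ok")
    wind(xl, "writev", lambda: "ok")
    print(xl.metrics["writev"]["count"])  # -> 2
```

Since the measurement lives in the call/return path itself, every translator in the graph gets counts and latency for free, which is the advantage over stacking io-stats instances.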
>
> *c) framework for xlators to provide private metrics*
>
> One possible solution is to use the statedump functions. But to cause the
> least disruption to existing code, I propose two new xlator methods,
> 'dump_metrics()' and 'reset_metrics()', which can be
> dlopen()'d into the xlator structure.
>
> 'dump_metrics()' dumps the private metrics in the expected format and
> will be called from the global dump-metrics framework;
> 'reset_metrics()' would be called from a CLI command when someone
> wants to restart metrics from 0 to check / validate a few things in a
> running cluster. This helps debuggability.
>
> Further feedback welcome.
>
> NOTE: sample code is already implemented in the 'experimental' branch,
> and the protocol/server xlator uses this framework to dump metrics from
> the rpc layer and client connections.
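A toy sketch of how the two proposed methods might fit together. Python is used for illustration only: the real methods would be C functions resolved via dlopen(), and the field names below are invented.

```python
class ServerXlator:
    """Hypothetical translator exposing the two proposed methods."""
    def __init__(self):
        # invented private counters, analogous to rpc/connection stats
        self.priv = {"rpc.requests": 0, "connections": 0}

    def dump_metrics(self, out):
        # called by the global dump-metrics framework
        for key, value in sorted(self.priv.items()):
            out.append("server.%s %d" % (key, value))

    def reset_metrics(self):
        # called from a CLI command to restart counters from 0
        for key in self.priv:
            self.priv[key] = 0

def dump_all(xlators):
    # the framework walks the graph; translators without the method
    # are simply skipped (stand-in for the dlopen()/symbol lookup)
    lines = []
    for xl in xlators:
        if hasattr(xl, "dump_metrics"):
            xl.dump_metrics(lines)
    return lines
```

A translator that does not provide the methods just contributes nothing, so existing xlators need no changes until they opt in.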
>
> *d) format of the 'metrics' file.*
>
> If you want data that can be plotted on a graph, you need a key (a
> string) and a value (a number), collected over time. So, this
> file should output data for monitoring systems, not for
> debuggability. We have 'statedump' for debuggability.
>
> So, I propose a plain text file, where data would be dumped like below.
>
> ```
> # anything starting from # would be treated as comment.
> <key><space><value>
> # anything after the value would be ignored.
> ```
> Any better solutions are welcome. Ideally, we should keep this format
> friendly for external projects to consume, like Tendrl [8],
> Graphite, Prometheus, etc. Also note that, once we agree on the format,
> it will be very hard to change, as external projects will depend on it.
>
> I would like to hear the feedback from people who are experienced with
> monitoring systems here.
>
> NOTE: the above format works fine with the 'glustermetrics' project [9]
> and is working decently on the 'experimental' branch.
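For consumers, parsing the proposed format is straightforward. A minimal parser sketch (illustrative only; the key names used are invented):

```python
def parse_metrics(text):
    """Parse the proposed '<key><space><value>' plain-text format."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments and blank lines are skipped
        parts = line.split()
        if len(parts) >= 2:
            try:
                # anything after the value is ignored
                metrics[parts[0]] = float(parts[1])
            except ValueError:
                pass  # value must be a number
    return metrics

if __name__ == "__main__":
    sample = "# dumped at t0\nfop.write.count 120 trailing text ignored\n"
    print(parse_metrics(sample))  # -> {'fop.write.count': 120.0}
```

Tools like Graphite or Prometheus exporters could wrap something like this to turn each dump into a timestamped sample.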
>
> ------
>
> ** Discussions:*
>
> Let me know how you all want to take the discussion forward.
>
> Should we move to github and discuss each issue there? Should I rebase
> and send the current patches from 'experimental' to the 'master' branch and
> discuss them in our review system? Or should we continue over email here?
>
> Regards,
> Amar
>
> References:
>
> [1] - https://github.com/gluster/glusterfs/issues/137
> [2] - https://github.com/gluster/glusterfs/issues/141
> [3] - https://github.com/gluster/glusterfs/issues/275
> [4] - https://github.com/gluster/glusterfs/issues/168
> [5] - http://lists.gluster.org/pipermail/maintainers/2017-August/002954.html
> (last email of the thread).
> [6] - https://github.com/gluster/glusterfs/issues/303
> [7] - https://github.com/gluster/glusterfs/issues/304
> [8] - https://github.com/Tendrl
> [9] - https://github.com/amarts/glustermetrics
>
> --
> Amar Tumballi (amarts)
>
>
> _______________________________________________
> maintainers mailing list
> maintainers at gluster.org
> http://lists.gluster.org/mailman/listinfo/maintainers
--
regards
Aravinda VK
http://aravindavk.in