[Gluster-devel] Volgen support for loading trace and io-stats translators at specific points in the graph

Krutika Dhananjay kdhananj at redhat.com
Wed May 31 09:38:52 UTC 2017


On Tue, May 30, 2017 at 6:42 PM, Shyam <srangana at redhat.com> wrote:

> On 05/30/2017 05:28 AM, Krutika Dhananjay wrote:
>
>> You're right. With brick graphs, this will be a problem.
>>
>> Couple of options:
>>
>> 1. To begin with, we identify points where we think it would be useful
>> to load io-stats in the brick graph, and unconditionally have
>> glusterd-volgen load it in the volfile only at these places (not very
>> useful if we want to load the trace xl though; plus, this again makes
>> io-stats placement static).
>>
>
> I think this is needed (easier to get in), so +1 for this.
>
> Additionally, if this is chosen, we may need specific triggers for each
> instance, to target measuring the io-stats. IOW, generic io-stats can
> measure below


I tried this recently with the existing code by configuring the
stats-dump-interval and fop-sample-interval options, and each instance
dumps its stats into a file under /var/lib/glusterd/stats. There's one file
per io-stats xl. The downside is that with each interval, the stats from
the previous interval get overwritten. I'm planning to change this by
adding a timestamp + pid suffix to the dump file name.
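
In case anyone wants to reproduce this, the knobs are exposed under the
diagnostics volume-set namespace; the interval values below are only
examples:

# gluster volume set <VOL> diagnostics.stats-dump-interval 30
# gluster volume set <VOL> diagnostics.fop-sample-interval 1

(diagnostics.latency-measurement and diagnostics.count-fop-hits additionally
control what io-stats records.)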



> FUSE (as an example) and below server-protocol. Then, we may want to
> enable io-threads (assuming this is one instance on the brick that is a
> static placement), or POSIX (or both/all) specifically, rather than have them
> enabled by default when io-stats is turned on (which is the current
> behaviour).
>
>
I didn't follow the io-threads and posix part. Could you rephrase?
I'm thinking of changing the code so that io-stats is loaded above and
below io-threads and also above posix. Not sure if you meant the same thing
in your statement above ;) Would that be fine? Or is there value in loading
it elsewhere?
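
To make that concrete, the brick volfile would then end up with fragments
roughly like the following (the io-stats instance names here are made up
for illustration, and the xlators that normally sit between posix and
io-threads are left out):

# illustrative only; names and exact placement are not final
volume testvol-io-stats-above-posix
    type debug/io-stats
    subvolumes testvol-posix
end-volume

volume testvol-io-stats-below-io-threads
    type debug/io-stats
    subvolumes <whichever xl sits directly below io-threads>
end-volume

volume testvol-io-threads
    type performance/io-threads
    subvolumes testvol-io-stats-below-io-threads
end-volume

volume testvol-io-stats-above-io-threads
    type debug/io-stats
    subvolumes testvol-io-threads
end-volume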

-Krutika


> Does this make sense?
>
>
>> 2. Embed the trace/io-stats functionality within the xlator_t object
>> itself, and keep the accounting disabled by default. Only when required,
>> the user can perhaps enable the accounting options with volume-set or
>> through the volume-profile start command for the brief period during
>> which they want to capture the stats, and disable it as soon as they're
>> done.
>>
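
For context, the volume-profile workflow referred to above is the existing
one:

# gluster volume profile <VOL> start
# gluster volume profile <VOL> info
# gluster volume profile <VOL> stop

Option 2 would essentially widen what these commands toggle, instead of
injecting a separate io-stats xlator into the graph.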
>
> This is a better longer-term solution IMO. This way there is no further
> injection of the io-stats xlator, and we get a lot more control over this.
>
> Depending on time to completion, I would choose 1/2 as presented above.
> This is because I see a lot of value in this and in answering user queries
> on what is slowing down their systems, so the sooner we have this the
> better (say 3.12); if (2) is possible by then, more power to it.
>
>
>> Let me know what you think.
>>
>> -Krutika
>>
>> On Fri, May 26, 2017 at 9:19 PM, Shyam <srangana at redhat.com> wrote:
>>
>>     On 05/26/2017 05:44 AM, Krutika Dhananjay wrote:
>>
>>         Hi,
>>
>>         debug/io-stats and debug/trace are immensely useful for isolating
>>         translators that are performance bottlenecks and those that are
>>         causing iatt inconsistencies, respectively.
>>
>>         There are other translators under xlators/debug too, such as
>>         error-gen, which are useful for debugging/testing our code.
>>
>>         The trick is to load these above and below one or more suspect
>>         translators, run the test, and analyse the output they dump to
>>         debug your problem.
>>
>>         Unfortunately, there is no way to load these at specific points
>>         in the graph using the volume-set CLI as of today. Our only
>>         option is to manually edit the volfile, restart the process, and
>>         be super-careful not to perform *any* volume-{reset,set,profile}
>>         operation, or graph-switch operations in general, that could
>>         rewrite the volfile and wipe out all previous edits to it.
>>
>>         I propose the following CLI for achieving the same:
>>
>>         # gluster volume set <VOL> {debug.trace, debug.io-stats,
>>         debug.error-gen} <xl-name>
>>
>>         where <xl-name> represents the name of the translator above which
>>         you want this translator loaded (as its parent).
>>
>>         For example, if I have a 2x2 dist-rep volume named testvol and I
>>         want to load trace above and below the first child of DHT, I
>>         execute the following commands:
>>
>>         # gluster volume set <VOL> debug.trace testvol-replicate-0
>>         # gluster volume set <VOL> debug.trace testvol-client-0
>>         # gluster volume set <VOL> debug.trace testvol-client-1
>>
>>         The corresponding debug/trace translators will be named
>>         testvol-replicate-0-trace-parent, testvol-client-0-trace-parent,
>>         testvol-client-1-trace-parent and so on.
>>
>>         To revert the change, the user simply uses the volume-reset CLI:
>>
>>         # gluster volume reset <VOL> testvol-replicate-0-trace-parent
>>         # gluster volume reset <VOL> testvol-client-0-trace-parent
>>         # gluster volume reset <VOL> testvol-client-1-trace-parent
>>
>>         What should happen when the translator with a
>>         trace/io-stats/error-gen parent gets disabled? Well, glusterd
>>         should be made to take care of removing the trace xl from the
>>         graph as well.
>>
>>
>>
>>         Comments and suggestions welcome.
>>
>>
>>     +1, dynamic placement of io-stats was something that I added to this
>>     spec [1] as well. So I am all for the change.
>>
>>     I have one problem, though, that bothered me when I wrote the spec:
>>     currently, brick volfiles are static and do not undergo a graph
>>     change (or the code is not yet ready to do that). So when we want to
>>     do this on the bricks, what happens? Do you have solutions for the
>>     same? I am interested, hence asking!
>>
>>     [1] Initial feature description for improved io-stats:
>>     https://review.gluster.org/#/c/16558/1/under_review/Performance_monitoring_and_debugging.md
>>
>>
>>