[Gluster-devel] How long should metrics collection on a cluster take?

Pranith Kumar Karampuri pkarampu at redhat.com
Thu Jul 26 04:29:51 UTC 2018


On Wed, Jul 25, 2018 at 10:48 PM, John Strunk <jstrunk at redhat.com> wrote:

> I have not put together a list. Perhaps the following will help w/ the
> context though...
>
> The "reconcile loop" of the operator will take the cluster CRs and
> reconcile them against the actual cluster config. At the 20k foot level,
> this amounts to something like determining there should be 8 gluster pods
> running, and making the appropriate changes if that doesn't match reality.
> In practical terms, the construction of this reconciliation loop can be
> thought of as a set (array) of 3-tuples: [{should_act() -> bool, can_act ->
> bool, action() -> ok, error}, {..., ..., ...}, ...]
>
> Each capability of the operator would be expressed as one of these tuples.
> should_act() : true if the action() should be taken
> can_act() : true if the prerequisites for taking the action are met
> action() : make the change. Only run if should && can.
> (note that I believe should_act() and can_act() should not be separate in
> the implementation, for reasons I'll not go into here)
>
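To make the tuple idea concrete, here is a rough sketch of what it might
look like in Go. Every type and name below is made up for illustration;
none of it is taken from actual operator code:

package sketch

// ClusterState would hold whatever is fetched from the kube API and the
// gluster cluster at the start of a reconcile cycle.
type ClusterState struct {
	// desired CRs, pod specs, per-brick/per-node status, heal counts,
	// free capacity, ...
}

// Capability is one {should_act, can_act, action} tuple.
type Capability interface {
	// ShouldAct returns true if the action should be taken.
	ShouldAct(s *ClusterState) bool
	// CanAct returns true if the prerequisites for the action are met.
	CanAct(s *ClusterState) bool
	// Act makes the change; only run if ShouldAct && CanAct.
	Act(s *ClusterState) error
}

// The operator's reconciler is then just an ordered set of capabilities.
type Reconciler struct {
	capabilities []Capability
}
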
> An example action might be "upgrade the container image for pod X". The
> associated should_act would be triggered if the "image=" of the pod doesn't
> match the desired "image=" in the operator CRs. The can_act evaluation
> would verify that it's ok to do this... Off the top of my head:
> - All volumes w/ a brick on this pod should be fully healed
> - Sufficient cluster nodes should be up such that quorum is not lost when
> this node goes down (does this matter?)
> - The proposed image is compatible with the current version of the CSI
> driver(s), the operator, and other gluster pods
> - Probably some other stuff
> The action() would update the "image=" in the Deployment to trigger the
> rollout.
>
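Continuing that sketch, the image-upgrade example might look roughly like
the following. The accessor methods on ClusterState (CurrentImage,
AllBricksHealed, QuorumSafe, ...) are placeholders standing in for the
real kube/gluster queries, not real API:

// Placeholder accessors on ClusterState; the real ones would read the
// data gathered from the kube API / gluster at the start of the cycle.
func (s *ClusterState) CurrentImage(deploy string) string             { return "" }
func (s *ClusterState) DesiredImage(deploy string) string             { return "" }
func (s *ClusterState) AllBricksHealed(deploy string) bool            { return false }
func (s *ClusterState) QuorumSafe(deploy string) bool                 { return false }
func (s *ClusterState) ImageCompatible(image string) bool             { return false }
func (s *ClusterState) SetDeploymentImage(deploy, image string) error { return nil }

// imageUpgrade is the "upgrade the container image for pod X" capability
// expressed as one tuple.
type imageUpgrade struct {
	deployment string
}

func (u imageUpgrade) ShouldAct(s *ClusterState) bool {
	// should_act: the running image doesn't match the desired "image=" in the CRs.
	return s.CurrentImage(u.deployment) != s.DesiredImage(u.deployment)
}

func (u imageUpgrade) CanAct(s *ClusterState) bool {
	// can_act: the preconditions listed above all hold.
	return s.AllBricksHealed(u.deployment) &&
		s.QuorumSafe(u.deployment) &&
		s.ImageCompatible(s.DesiredImage(u.deployment))
}

func (u imageUpgrade) Act(s *ClusterState) error {
	// action: update "image=" in the Deployment to trigger the rollout.
	return s.SetDeploymentImage(u.deployment, s.DesiredImage(u.deployment))
}

The point is only the shape: should_act and can_act never mutate anything,
and everything they need is already in the state fetched for that cycle.
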
> The idea is that queries would be made, both to the kube API and the
> gluster cluster, to verify the necessary preconditions for an action
> before that action is invoked. There would obviously be commonality among
> the preconditions for various actions, so the results should be fetched
> exactly once per reconcile cycle. Also note, 1 cycle == at most 1 action()
> due to the action changing the state of the system.
>
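And the cycle itself, again only as a sketch on top of the types above,
with fetchClusterState standing in for the single round of kube API +
gluster queries per cycle:

// One reconcile cycle, continuing the sketch above.
func (r *Reconciler) fetchClusterState() (*ClusterState, error) {
	// Placeholder: gather up/down status, heal counts, free capacity per
	// brick and node, etc., exactly once per cycle.
	return &ClusterState{}, nil
}

func (r *Reconciler) reconcileOnce() error {
	state, err := r.fetchClusterState() // queried once, shared by all checks
	if err != nil {
		return err
	}
	for _, c := range r.capabilities {
		if c.ShouldAct(state) && c.CanAct(state) {
			// At most one action per cycle: the action changes the state
			// of the system, so stop and re-evaluate on the next cycle.
			return c.Act(state)
		}
	}
	return nil // nothing to do this cycle
}
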
> Given that we haven't designed (or even listed) all the potential
> action()s, I can't give you a list of everything to query. I guarantee
> we'll need to know the up/down status, heal counts, and free capacity for
> each brick and node.
>

Thanks for the detailed explanation. This helps. One question though: is the
5-second interval a hard limit, or is it possible to configure it?


>
> -John
>
> On Wed, Jul 25, 2018 at 11:56 AM Pranith Kumar Karampuri <
> pkarampu at redhat.com> wrote:
>
>>
>>
>> On Wed, Jul 25, 2018 at 8:17 PM, John Strunk <jstrunk at redhat.com> wrote:
>>
>>> To add an additional data point... The operator will need to regularly
>>> reconcile the true state of the gluster cluster with the desired state
>>> stored in kubernetes. This task will run frequently (the
>>> operator-framework default is every 5s, even if there are no config
>>> changes).
>>> The actual amount of data we will need to query from the cluster is
>>> currently TBD and likely significantly affected by Heketi/GD1 vs. GD2
>>> choice.
>>>
>>
>> Do we have even a partial list of the data we will gather? I just want to
>> understand what this might entail...
>>
>>
>>>
>>> -John
>>>
>>>
>>> On Wed, Jul 25, 2018 at 5:45 AM Pranith Kumar Karampuri <
>>> pkarampu at redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Jul 24, 2018 at 10:10 PM, Sankarshan Mukhopadhyay <
>>>> sankarshan.mukhopadhyay at gmail.com> wrote:
>>>>
>>>>> On Tue, Jul 24, 2018 at 9:48 PM, Pranith Kumar Karampuri
>>>>> <pkarampu at redhat.com> wrote:
>>>>> > hi,
>>>>> >       Quite a few of the commands used to monitor gluster currently
>>>>> > take almost a second to produce output.
>>>>>
>>>>> Is this at the minimum recommended cluster size?
>>>>>
>>>>
>>>> Yes, with a single volume with 3 bricks, i.e. a 3-node cluster.
>>>>
>>>>
>>>>>
>>>>> > Some categories of these commands:
>>>>> > 1) Any command that needs to do some sort of mount/glfs_init.
>>>>> >      Examples: 1) the heal info family of commands 2) statfs to find
>>>>> > space availability etc. (On my laptop, on a replica 3 volume with all
>>>>> > local bricks, glfs_init takes 0.3 seconds on average.)
>>>>> > 2) glusterd commands that need to wait for the previous command to
>>>>> > unlock. If the previous command is something related to an lvm
>>>>> > snapshot, which takes quite a few seconds, it will be even more time
>>>>> > consuming.
>>>>> >
>>>>> > Nowadays container workloads have hundreds of volumes, if not
>>>>> > thousands. If we want to serve any monitoring solution at this scale
>>>>> > (I have seen customers use up to 600 volumes at a time, and it will
>>>>> > only get bigger), and let's say collecting metrics takes 2 seconds
>>>>> > per volume (taking the worst example, which has all major features
>>>>> > enabled like snapshot/geo-rep/quota etc.), that means it will take
>>>>> > 20 minutes to collect the metrics of a cluster with 600 volumes.
>>>>> > What are the ways in which we can make this number more manageable?
>>>>> > I was initially thinking maybe it is possible to get gd2 to execute
>>>>> > commands in parallel on different volumes, so potentially we could
>>>>> > get this done in ~2 seconds. But quite a few of the metrics need a
>>>>> > mount or the equivalent of a mount (glfs_init) to collect
>>>>> > information like statfs, number of pending heals, quota usage, etc.
>>>>> > This may lead to high memory usage, as the size of the mounts tends
>>>>> > to be high.
>>>>> >
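To put rough numbers on the parallel idea: a sketch in Go of bounding how
many volumes are collected concurrently, which is the knob that trades
collection time against the memory held by live mounts. collectVolume here
is just a stand-in for the real glfs_init + statfs + heal-count work:

package main

import (
	"fmt"
	"sync"
	"time"
)

type volumeMetrics struct {
	Volume       string
	FreeBytes    uint64
	PendingHeals int
}

// collectVolume stands in for the expensive per-volume work
// (glfs_init + statfs + heal counts + quota usage, ...).
func collectVolume(vol string) volumeMetrics {
	time.Sleep(2 * time.Second) // pretend each volume takes ~2s
	return volumeMetrics{Volume: vol}
}

// collectAll gathers metrics for all volumes with at most maxParallel
// collections (i.e. live mounts) in flight at any time.
func collectAll(volumes []string, maxParallel int) []volumeMetrics {
	sem := make(chan struct{}, maxParallel) // bounds concurrent mounts
	results := make([]volumeMetrics, len(volumes))
	var wg sync.WaitGroup
	for i, vol := range volumes {
		wg.Add(1)
		go func(i int, vol string) {
			defer wg.Done()
			sem <- struct{}{}        // take a slot
			defer func() { <-sem }() // free it
			results[i] = collectVolume(vol)
		}(i, vol)
	}
	wg.Wait()
	return results
}

func main() {
	// 600 volumes at ~2s each: serial is ~20 minutes; with 30 concurrent
	// collectors it is ~40 seconds, at the cost of 30 live mounts.
	vols := make([]string, 600)
	for i := range vols {
		vols[i] = fmt.Sprintf("vol-%03d", i)
	}
	metrics := collectAll(vols, 30)
	fmt.Println("collected", len(metrics), "volumes")
}
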
>>>>>
>>>>> I am not sure that the "worst example" (it certainly is not) is a good
>>>>> place to start from.
>>>>
>>>>
>>>> I didn't understand your statement. Are you saying 600 volumes is a
>>>> worst-case example?
>>>>
>>>>
>>>>> That said, for any environment
>>>>> with that number of disposable volumes, what kind of metrics actually
>>>>> make any sense or have any impact?
>>>>>
>>>>
>>>> The same metrics you track for long-running volumes. It is just that
>>>> the way the metrics are interpreted will be different. On a
>>>> long-running volume, you would look at the metrics and try to find out
>>>> why the volume has not been giving the expected performance over the
>>>> last hour. Whereas in this case, you would look at the metrics and find
>>>> the reason why volumes that were created and deleted in the last hour
>>>> didn't give the expected performance.
>>>>
>>>>
>>>>>
>>>>> > I wanted to seek suggestions from others on how to come to a
>>>>> conclusion
>>>>> > about which path to take and what problems to solve.
>>>>> >
>>>>> > I will be happy to raise github issues based on our conclusions on
>>>>> this mail
>>>>> > thread.
>>>>> >
>>>>> > --
>>>>> > Pranith
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> sankarshan mukhopadhyay
>>>>> <https://about.me/sankarshan.mukhopadhyay>
>>>>> _______________________________________________
>>>>> Gluster-devel mailing list
>>>>> Gluster-devel at gluster.org
>>>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Pranith
>>>> _______________________________________________
>>>> Gluster-devel mailing list
>>>> Gluster-devel at gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>
>>>
>>
>>
>> --
>> Pranith
>>
>


-- 
Pranith

