[Gluster-devel] How long should metrics collection on a cluster take?

John Strunk jstrunk at redhat.com
Thu Jul 26 15:00:05 UTC 2018


It is configurable. Use the default as an indication of scale: 5s may become
30s; it won't be 5m.
Also remember, this is the maximum, not minimum. A change to a watched kube
resource will cause an immediate reconcile. The periodic, timer-based loop
is just a fallback to catch state changes not represented in the kube API.
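
A minimal sketch of that "maximum, not minimum" behaviour, assuming a plain
Python thread rather than the real operator-framework machinery. The names
run_reconciler, wake, stop and max_interval_s are hypothetical; a watch
callback would call wake.set() when a kube resource changes.

```python
import threading


def run_reconciler(reconcile, wake, stop, max_interval_s=5.0):
    """Call reconcile() as soon as `wake` is set, or after max_interval_s at the latest.

    `wake` and `stop` are threading.Event objects (hypothetical wiring).
    Returns the number of reconcile passes that were run.
    """
    runs = 0
    while not stop.is_set():
        # wait() returns True immediately if a change was signalled,
        # and False if the fallback timer expired first.
        triggered = wake.wait(timeout=max_interval_s)
        wake.clear()
        if stop.is_set():
            break
        reconcile("event" if triggered else "timer")
        runs += 1
    return runs
```

Raising max_interval_s from 5 to 30 only slows the fallback path; a
signalled change is still handled without waiting for the timer.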

On Thu, Jul 26, 2018 at 12:57 AM Pranith Kumar Karampuri <
pkarampu at redhat.com> wrote:

>
>
> On Thu, Jul 26, 2018 at 9:59 AM, Pranith Kumar Karampuri <
> pkarampu at redhat.com> wrote:
>
>>
>>
>> On Wed, Jul 25, 2018 at 10:48 PM, John Strunk <jstrunk at redhat.com> wrote:
>>
>>> I have not put together a list. Perhaps the following will help w/ the
>>> context though...
>>>
>>> The "reconcile loop" of the operator will take the cluster CRs and
>>> reconcile them against the actual cluster config. At the 20k foot level,
>>> this amounts to something like determining there should be 8 gluster pods
>>> running, and making the appropriate changes if that doesn't match reality.
>>> In practical terms, the construction of this reconciliation loop can be
>>> thought of as a set (array) of 3-tuples: [{should_act() -> bool, can_act() ->
>>> bool, action() -> ok, error}, {..., ..., ...}, ...]
>>>
>>> Each capability of the operator would be expressed as one of these
>>> tuples.
>>> should_act() : true if the action() should be taken
>>> can_act() : true if the prerequisites for taking the action are met
>>> action() : make the change. Only run if should && can.
>>> (note that I believe should_act() and can_act() should not be separate
>>> in the implementation, for reasons I'll not go into here)
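
The tuple structure described above could be sketched roughly as follows.
This is an illustration only, not the operator's code: Capability and
reconcile_once are hypothetical names, the state is a plain dict, and an
action signals failure by raising instead of returning (ok, error). It
stops after the first action that runs, following the thread's point that
one cycle performs at most one action().

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Capability:
    """One operator capability: two predicates and the change they guard."""
    name: str
    should_act: Callable[[dict], bool]   # does desired state differ from reality?
    can_act: Callable[[dict], bool]      # are the prerequisites met?
    action: Callable[[dict], None]       # make the change; raises on error


def reconcile_once(capabilities: Sequence[Capability], state: dict) -> Optional[str]:
    """Run the first capability whose predicates both hold; return its name or None."""
    for cap in capabilities:
        if cap.should_act(state) and cap.can_act(state):
            cap.action(state)
            # The action changed the system, so the observed state is stale.
            return cap.name
    return None
```

Ordering the list gives a simple priority scheme: earlier capabilities win
when several are eligible in the same cycle.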
>>>
>>> An example action might be "upgrade the container image for pod X". The
>>> associated should_act would be triggered if the "image=" of the pod doesn't
>>> match the desired "image=" in the operator CRs. The can_act evaluation
>>> would be verifying that it's ok to do this... Off the top of my head:
>>> - All volumes w/ a brick on this pod should be fully healed
>>> - Sufficient cluster nodes should be up such that quorum is not lost
>>> when this node goes down (does this matter?)
>>> - The proposed image is compatible with the current version of the CSI
>>> driver(s), the operator, and other gluster pods
>>> - Probably some other stuff
>>> The action() would update the "image=" in the Deployment to trigger the
>>> rollout
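
As a rough illustration of the image-upgrade example, the two predicates
might look like the sketch below. Everything here is assumed for the sake
of the example: the dict keys, the function names, and in particular the
quorum rule (a strict majority of nodes must stay up), which the thread
itself leaves as an open question.

```python
def should_upgrade(pod: dict, desired_image: str) -> bool:
    """True when the pod's image differs from the one requested in the CR."""
    return pod["image"] != desired_image


def can_upgrade(pod: dict, cluster: dict, desired_image: str) -> bool:
    """Check the preconditions listed above against an already-fetched cluster view."""
    # Every volume with a brick on this pod must have no pending heals.
    healed = all(cluster["pending_heals"][vol] == 0 for vol in pod["volumes"])
    # Assumed quorum rule: a strict majority of nodes stays up once this
    # pod's node goes down (the pod's node is counted in nodes_up).
    quorum_kept = (cluster["nodes_up"] - 1) > cluster["nodes_total"] / 2
    # Assumed compatibility source: a precomputed set of acceptable images.
    compatible = desired_image in cluster["compatible_images"]
    return healed and quorum_kept and compatible
```

Both functions only read their inputs, so they can be evaluated against a
single view of the cluster without touching it.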
>>>
>>> The idea is that queries would be made, both to the kube API and the
>>> gluster cluster to verify the necessary preconditions for an action prior
>>> to that action being invoked. There would obviously be commonality among
>>> the preconditions for various actions, so the results should be fetched
>>> exactly once per reconcile cycle. Also note, 1 cycle == at most 1 action()
>>> due to the action changing the state of the system.
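
One way to get "fetched exactly once per reconcile cycle" is a small
per-cycle cache that the predicates share, sketched below. CycleSnapshot
and the fetcher keys are hypothetical; real fetchers would wrap calls to
the kube API or the gluster cluster.

```python
from typing import Any, Callable, Dict


class CycleSnapshot:
    """Lazily runs each named query at most once; create a new one per cycle."""

    def __init__(self, fetchers: Dict[str, Callable[[], Any]]):
        self._fetchers = fetchers
        self._cache: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        # First request pays for the query; later requests reuse the result.
        if key not in self._cache:
            self._cache[key] = self._fetchers[key]()
        return self._cache[key]
```

Because a new snapshot is built for every cycle, results never outlive the
action() that may have invalidated them.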
>>>
>>> Given that we haven't designed (or even listed) all the potential
>>> action()s, I can't give you a list of everything to query. I guarantee
>>> we'll need to know the up/down status, heal counts, and free capacity for
>>> each brick and node.
>>>
>>
>> Thanks for the detailed explanation. This helps. One question though, is
>> 5 seconds a hard limit or is there a possibility to configure it?
>>
>
> I put together an idea for reducing the mgmt operation latency involving
> mounts at https://github.com/gluster/glusterd2/issues/1069, comments
> welcome.
> @john I'd still like to know whether the hard limit can be
> configured...
>
>
>>
>>
>>>
>>> -John
>>>
>>> On Wed, Jul 25, 2018 at 11:56 AM Pranith Kumar Karampuri <
>>> pkarampu at redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jul 25, 2018 at 8:17 PM, John Strunk <jstrunk at redhat.com>
>>>> wrote:
>>>>
>>>>> To add an additional data point... The operator will need to regularly
>>>>> reconcile the true state of the gluster cluster with the desired state
>>>>> stored in kubernetes. This task will be required frequently (i.e.,
>>>>> operator-framework defaults to every 5s even if there are no config
>>>>> changes).
>>>>> The actual amount of data we will need to query from the cluster is
>>>>> currently TBD and likely significantly affected by Heketi/GD1 vs. GD2
>>>>> choice.
>>>>>
>>>>
>>>> Do we have any partial list of data we will gather? Just want to
>>>> understand what this might entail already...
>>>>
>>>>
>>>>>
>>>>> -John
>>>>>
>>>>>
>>>>> On Wed, Jul 25, 2018 at 5:45 AM Pranith Kumar Karampuri <
>>>>> pkarampu at redhat.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 24, 2018 at 10:10 PM, Sankarshan Mukhopadhyay <
>>>>>> sankarshan.mukhopadhyay at gmail.com> wrote:
>>>>>>
>>>>>>> On Tue, Jul 24, 2018 at 9:48 PM, Pranith Kumar Karampuri
>>>>>>> <pkarampu at redhat.com> wrote:
>>>>>>> > hi,
>>>>>>> >       Quite a few commands to monitor gluster at the moment take
>>>>>>> > almost a second to give output.
>>>>>>>
>>>>>>> Is this at the (most) minimum recommended cluster size?
>>>>>>>
>>>>>>
>>>>>> Yes, with a single volume with 3 bricks i.e. 3 nodes in cluster.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> > Some categories of these commands:
>>>>>>> > 1) Any command that needs to do some sort of mount/glfs_init.
>>>>>>> >      Examples: 1) heal info family of commands 2) statfs to find
>>>>>>> > space-availability etc. (On my laptop, for a replica 3 volume with
>>>>>>> > all local bricks, glfs_init takes 0.3 seconds on average.)
>>>>>>> > 2) glusterd commands that need to wait for the previous command to
>>>>>>> > unlock. If the previous command is something related to an lvm
>>>>>>> > snapshot, which takes quite a few seconds, it would be even more
>>>>>>> > time consuming.
>>>>>>> >
>>>>>>> > Nowadays container workloads have hundreds of volumes, if not
>>>>>>> > thousands. Suppose we want to serve a monitoring solution at this
>>>>>>> > scale (I have seen customers use up to 600 volumes at a time, and it
>>>>>>> > will only get bigger), and let's say collecting metrics takes 2
>>>>>>> > seconds per volume (let us take the worst example, which has all
>>>>>>> > major features enabled, like snapshot/geo-rep/quota etc.). That
>>>>>>> > means it will take 20 minutes to collect metrics of a cluster with
>>>>>>> > 600 volumes. What are the ways in which we can make this number more
>>>>>>> > manageable? I was initially thinking maybe it is possible to get gd2
>>>>>>> > to execute commands in parallel on different volumes, so potentially
>>>>>>> > we could get this done in ~2 seconds. But quite a few of the metrics
>>>>>>> > need a mount or the equivalent of a mount (glfs_init) to collect
>>>>>>> > different information like statfs, number of pending heals, quota
>>>>>>> > usage etc. This may lead to high memory usage, as the size of the
>>>>>>> > mounts tends to be high.
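
The arithmetic above, and the trade-off between parallelism and memory, can
be put into a small back-of-the-envelope model. This is an idealised sketch
only: the function names and the mb_per_mount parameter are hypothetical,
and it assumes every volume takes the same time and that each worker holds
one mount (or glfs_init instance) at a time.

```python
import math


def collection_time_s(volumes: int, per_volume_s: float, workers: int = 1) -> float:
    """Idealised wall-clock time when volumes are handled in batches of `workers`."""
    return math.ceil(volumes / workers) * per_volume_s


def peak_mount_memory_mb(workers: int, mb_per_mount: float) -> float:
    """Memory held by concurrent mounts, assuming one mount per worker."""
    return workers * mb_per_mount


# Serial collection: 600 volumes * 2 s = 1200 s, i.e. 20 minutes.
serial_minutes = collection_time_s(600, 2.0) / 60
# Fully parallel collection: one batch of 600, i.e. about 2 s.
fully_parallel_s = collection_time_s(600, 2.0, workers=600)
```

Picking a worker count between 1 and 600 trades collection time against the
number of mounts alive at once, which is where the memory concern comes in.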
>>>>>>> >
>>>>>>>
>>>>>>> I am not sure if starting from the "worst example" (it certainly is
>>>>>>> not) is a good place to start from.
>>>>>>
>>>>>>
>>>>>> I didn't understand your statement. Are you saying 600 volumes is a
>>>>>> worst example?
>>>>>>
>>>>>>
>>>>>>> That said, for any environment
>>>>>>> with that number of disposable volumes, what kind of metrics
>>>>>>> actually make any sense/impact?
>>>>>>>
>>>>>>
>>>>>> Same metrics you track for long running volumes. It is just that the
>>>>>> way the metrics are interpreted will be different. On a long running
>>>>>> volume, you would look at the metrics and try to find why the volume
>>>>>> did not perform as expected in the last hour. Whereas in this case,
>>>>>> you would look at the metrics and find the reason why volumes that
>>>>>> were created and deleted in the last hour didn't perform as expected.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> > I wanted to seek suggestions from others on how to come to a
>>>>>>> > conclusion about which path to take and what problems to solve.
>>>>>>> >
>>>>>>> > I will be happy to raise github issues based on our conclusions on
>>>>>>> > this mail thread.
>>>>>>> >
>>>>>>> > --
>>>>>>> > Pranith
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> sankarshan mukhopadhyay
>>>>>>> <https://about.me/sankarshan.mukhopadhyay>
>>>>>>> _______________________________________________
>>>>>>> Gluster-devel mailing list
>>>>>>> Gluster-devel at gluster.org
>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Pranith
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Pranith
>>>>
>>>
>>
>>
>> --
>> Pranith
>>
>
>
>
> --
> Pranith
>