[Gluster-devel] How long should metrics collection on a cluster take?

Pranith Kumar Karampuri pkarampu at redhat.com
Thu Jul 26 15:06:34 UTC 2018


On Thu, Jul 26, 2018 at 8:30 PM, John Strunk <jstrunk at redhat.com> wrote:

> It is configurable. Use the default as a notion of scale... 5s may become
> 30s; it won't be 5m.
> Also remember, this is the maximum, not the minimum. A change to a watched
> kube resource will cause an immediate reconcile. The periodic, timer-based
> loop is just a fallback to catch state changes not represented in the kube
> API.
>
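For anyone else looking for that knob: assuming the operator is built on
operator-framework / controller-runtime, the fallback interval John describes
is the manager's sync period. A rough sketch of setting it (illustrative only;
the exact option name and where it lives differ between controller-runtime
versions, so check against whatever version the operator actually vendors):

package main

import (
        "time"

        ctrl "sigs.k8s.io/controller-runtime"
)

func newManager() (ctrl.Manager, error) {
        // Stretch the periodic (fallback) reconcile from the default to 30s.
        // Changes to watched kube resources still trigger an immediate
        // reconcile, independent of this timer.
        resync := 30 * time.Second
        return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
                SyncPeriod: &resync,
        })
}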

Cool, got it. Let us wait and see whether anyone has objections to the
proposed solution.

I request everyone to comment if they see any issues with
https://github.com/gluster/glusterd2/issues/1069
I think the EC/AFR/Quota components will definitely be affected by this
approach. CCing them.
Please feel free to CC anyone who works on commands that require a mount to
report status.


>
> On Thu, Jul 26, 2018 at 12:57 AM Pranith Kumar Karampuri <
> pkarampu at redhat.com> wrote:
>
>>
>>
>> On Thu, Jul 26, 2018 at 9:59 AM, Pranith Kumar Karampuri <
>> pkarampu at redhat.com> wrote:
>>
>>>
>>>
>>> On Wed, Jul 25, 2018 at 10:48 PM, John Strunk <jstrunk at redhat.com>
>>> wrote:
>>>
>>>> I have not put together a list. Perhaps the following will help w/ the
>>>> context though...
>>>>
>>>> The "reconcile loop" of the operator will take the cluster CRs and
>>>> reconcile them against the actual cluster config. At the 20k foot level,
>>>> this amounts to something like determining there should be 8 gluster pods
>>>> running, and making the appropriate changes if that doesn't match reality.
>>>> In practical terms, the construction of this reconciliation loop can be
>>>> thought of as a set (array) of 3-tuples: [{should_act() -> bool,
>>>> can_act() -> bool, action() -> (ok, error)}, {..., ..., ...}, ...]
>>>>
>>>> Each capability of the operator would be expressed as one of these
>>>> tuples.
>>>> should_act() : true if the action() should be taken
>>>> can_act() : true if the prerequisites for taking the action are met
>>>> action() : make the change. Only run if should && can.
>>>> (note that I believe should_act() and can_act() should not be separate
>>>> in the implementation, for reasons I'll not go into here)
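To make sure I am reading this right, here is a minimal Go sketch of what such
a tuple set could look like. All type, field and function names below are
illustrative placeholders, not from any actual operator code:

// clusterStatus holds everything queried from kube and gluster once per
// reconcile cycle (up/down status, heal counts, free capacity, ...).
type clusterStatus struct {
        NodesUp      int
        QuorumNeeded int
        CurrentImage string
        DesiredImage string
        PendingHeals map[string]int    // per brick
        FreeBytes    map[string]uint64 // per brick
}

// capability is one of the 3-tuples described above.
type capability struct {
        name      string
        shouldAct func(*clusterStatus) bool  // true if the change is needed
        canAct    func(*clusterStatus) bool  // true if preconditions are met
        action    func(*clusterStatus) error // perform the change
}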
>>>>
>>>> An example action might be "upgrade the container image for pod X". The
>>>> associated should_act would be triggered if the "image=" of the pod doesn't
>>>> match the desired "image=" in the operator CRs. The can_act evaluation
>>>> would be verifying that it's ok to do this... Thinking off the top of my
>>>> head:
>>>> - All volumes w/ a brick on this pod should be fully healed
>>>> - Sufficient cluster nodes should be up such that quorum is not lost
>>>> when this node goes down (does this matter?)
>>>> - The proposed image is compatible with the current version of the CSI
>>>> driver(s), the operator, and other gluster pods
>>>> - Probably some other stuff
>>>> The action() would update the "image=" in the Deployment to trigger the
>>>> rollout.
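Continuing that illustrative sketch for this exact example (the status fields
and patchDeploymentImage() are hypothetical stand-ins for the real
kube/gluster plumbing):

// patchDeploymentImage stands in for the kube API call that updates the
// Deployment's container image.
func patchDeploymentImage(image string) error { return nil }

var upgradeImage = capability{
        name: "upgrade-gluster-image",
        shouldAct: func(s *clusterStatus) bool {
                // The running "image=" differs from the one in the CR.
                return s.CurrentImage != s.DesiredImage
        },
        canAct: func(s *clusterStatus) bool {
                // All bricks on this pod healed, and enough nodes up that
                // quorum survives the restart; version-compatibility checks
                // (CSI driver, operator, other gluster pods) would go here.
                for _, pending := range s.PendingHeals {
                        if pending != 0 {
                                return false
                        }
                }
                return s.NodesUp > s.QuorumNeeded
        },
        action: func(s *clusterStatus) error {
                // Update "image=" in the Deployment to trigger the rollout.
                return patchDeploymentImage(s.DesiredImage)
        },
}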
>>>>
>>>> The idea is that queries would be made, both to the kube API and the
>>>> gluster cluster, to verify the necessary preconditions for an action prior
>>>> to that action being invoked. There would obviously be commonality among
>>>> the preconditions for various actions, so the results should be fetched
>>>> exactly once per reconcile cycle. Also note, 1 cycle == at most 1 action()
>>>> due to the action changing the state of the system.
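And the cycle itself, as I understand your description: gather the status
once, then run at most one action per cycle. Again just a sketch, with
fetchClusterStatus() standing in for the kube and gluster queries:

// fetchClusterStatus stands in for the queries against the kube API and the
// gluster cluster, issued exactly once per cycle.
func fetchClusterStatus() (*clusterStatus, error) {
        return &clusterStatus{}, nil
}

func reconcile(caps []capability) error {
        s, err := fetchClusterStatus()
        if err != nil {
                return err
        }
        for _, c := range caps {
                if c.shouldAct(s) && c.canAct(s) {
                        // At most one action per cycle, because the action
                        // changes the state of the system; the next cycle
                        // re-evaluates against fresh status.
                        return c.action(s)
                }
        }
        return nil // nothing to do this cycle
}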
>>>>
>>>> Given that we haven't designed (or even listed) all the potential
>>>> action()s, I can't give you a list of everything to query. I guarantee
>>>> we'll need to know the up/down status, heal counts, and free capacity for
>>>> each brick and node.
>>>>
>>>
>>> Thanks for the detailed explanation. This helps. One question though: is
>>> the 5 seconds a hard limit, or is there a way to configure it?
>>>
>>
>> I put together an idea for reducing the mgmt operation latency involving
>> mounts at https://github.com/gluster/glusterd2/issues/1069, comments
>> welcome.
>> @John: I still want to know whether the hard limit can be configured...
>>
>>
>>>
>>>
>>>>
>>>> -John
>>>>
>>>> On Wed, Jul 25, 2018 at 11:56 AM Pranith Kumar Karampuri <
>>>> pkarampu at redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Jul 25, 2018 at 8:17 PM, John Strunk <jstrunk at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> To add an additional data point... The operator will need to
>>>>>> regularly reconcile the true state of the gluster cluster with the desired
>>>>>> state stored in kubernetes. This task will be required frequently (i.e.,
>>>>>> operator-framework defaults to every 5s even if there are no config
>>>>>> changes).
>>>>>> The actual amount of data we will need to query from the cluster is
>>>>>> currently TBD and likely significantly affected by the Heketi/GD1 vs. GD2
>>>>>> choice.
>>>>>>
>>>>>
>>>>> Do we have even a partial list of the data we will gather? I just want
>>>>> to understand what this might entail...
>>>>>
>>>>>
>>>>>>
>>>>>> -John
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 25, 2018 at 5:45 AM Pranith Kumar Karampuri <
>>>>>> pkarampu at redhat.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 24, 2018 at 10:10 PM, Sankarshan Mukhopadhyay <
>>>>>>> sankarshan.mukhopadhyay at gmail.com> wrote:
>>>>>>>
>>>>>>>> On Tue, Jul 24, 2018 at 9:48 PM, Pranith Kumar Karampuri
>>>>>>>> <pkarampu at redhat.com> wrote:
>>>>>>>> > hi,
>>>>>>>> >       Quite a few commands to monitor gluster at the moment take
>>>>>>>> > almost a second to give output.
>>>>>>>>
>>>>>>>> Is this at the minimum recommended cluster size?
>>>>>>>>
>>>>>>>
>>>>>>> Yes, with a single volume with 3 bricks, i.e. 3 nodes in the cluster.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> > Some categories of these commands:
>>>>>>>> > 1) Any command that needs to do some sort of mount/glfs_init.
>>>>>>>> >      Examples: 1) heal info family of commands 2) statfs to find
>>>>>>>> > space-availability etc. (On my laptop replica 3 volume with all
>>>>>>>> > local bricks, glfs_init takes 0.3 seconds on average)
>>>>>>>> > 2) glusterd commands that need to wait for the previous command
>>>>>>>> > to unlock. If the previous command is something related to lvm
>>>>>>>> > snapshot, which takes quite a few seconds, it would be even more
>>>>>>>> > time consuming.
>>>>>>>> >
>>>>>>>> > Nowadays container workloads have hundreds of volumes, if not
>>>>>>>> > thousands. If we want to serve any monitoring solution at this
>>>>>>>> > scale (I have seen customers use up to 600 volumes at a time, and
>>>>>>>> > it will only get bigger), and let's say collecting metrics takes 2
>>>>>>>> > seconds per volume (taking the worst case, with all major features
>>>>>>>> > enabled, like snapshot/geo-rep/quota etc.), that means it will take
>>>>>>>> > 20 minutes to collect the metrics of a cluster with 600 volumes.
>>>>>>>> > What are the ways in which we can make this number more manageable?
>>>>>>>> > I was initially thinking it may be possible to get gd2 to execute
>>>>>>>> > commands in parallel on different volumes, so potentially we could
>>>>>>>> > get this done in ~2 seconds. But quite a few of the metrics need a
>>>>>>>> > mount, or the equivalent of a mount (glfs_init), to collect
>>>>>>>> > information like statfs, the number of pending heals, quota usage
>>>>>>>> > etc. This may lead to high memory usage, as the memory footprint of
>>>>>>>> > the mounts tends to be high.
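To make the parallel-collection idea concrete, here is a rough Go sketch of
the kind of bounded fan-out I have in mind: collect per-volume metrics
concurrently, but cap how many mounts/glfs_init instances are alive at once
so memory stays bounded. volMetrics and collectVolumeMetrics() are
placeholders for whatever gd2 ends up exposing:

package metrics

import "sync"

// volMetrics is a placeholder for whatever we gather per volume
// (statfs results, pending heal counts, quota usage, ...).
type volMetrics struct {
        FreeBytes    uint64
        PendingHeals uint64
}

// collectVolumeMetrics stands in for the glfs_init-equivalent work done for
// a single volume.
func collectVolumeMetrics(vol string) (volMetrics, error) {
        return volMetrics{}, nil
}

// collectAllMetrics gathers metrics for every volume in parallel, but never
// keeps more than maxMounts mounts alive at the same time.
func collectAllMetrics(volumes []string, maxMounts int) map[string]volMetrics {
        var (
                mu      sync.Mutex
                wg      sync.WaitGroup
                results = make(map[string]volMetrics, len(volumes))
                sem     = make(chan struct{}, maxMounts)
        )
        for _, vol := range volumes {
                wg.Add(1)
                go func(vol string) {
                        defer wg.Done()
                        sem <- struct{}{}        // take a mount slot
                        defer func() { <-sem }() // release it
                        m, err := collectVolumeMetrics(vol)
                        if err != nil {
                                return // real code would record the failure
                        }
                        mu.Lock()
                        results[vol] = m
                        mu.Unlock()
                }(vol)
        }
        wg.Wait()
        return results
}

With something like this, 600 volumes at ~2 seconds each and, say, 16
concurrent mounts would finish in roughly 75 seconds instead of 20 minutes,
at the cost of 16 mounts' worth of memory.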
>>>>>>>> >
>>>>>>>>
>>>>>>>> I am not sure if starting from the "worst example" (it certainly is
>>>>>>>> not) is a good place to start from.
>>>>>>>
>>>>>>>
>>>>>>> I didn't understand your statement. Are you saying 600 volumes is the
>>>>>>> "worst example"?
>>>>>>>
>>>>>>>
>>>>>>>> That said, for any environment
>>>>>>>> with that number of disposable volumes, what kind of metrics
>>>>>>>> actually make sense or have any impact?
>>>>>>>>
>>>>>>>
>>>>>>> The same metrics you track for long-running volumes. It is just that
>>>>>>> the way the metrics are interpreted will be different. On a
>>>>>>> long-running volume, you would look at the metrics and try to find out
>>>>>>> why the volume has not been giving the expected performance over the
>>>>>>> last hour. Whereas in this case, you would look at the metrics and find
>>>>>>> the reason why volumes that were created and deleted in the last hour
>>>>>>> didn't give the expected performance.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> > I wanted to seek suggestions from others on how to come to a
>>>>>>>> > conclusion about which path to take and what problems to solve.
>>>>>>>> >
>>>>>>>> > I will be happy to raise github issues based on our conclusions
>>>>>>>> > on this mail thread.
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Pranith
>>>>>>>> >
>>>>>>>>
>>>>>>>> --
>>>>>>>> sankarshan mukhopadhyay
>>>>>>>> <https://about.me/sankarshan.mukhopadhyay>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Pranith
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Pranith
>>>>>
>>>>
>>>
>>>
>>> --
>>> Pranith
>>>
>>
>>
>>
>> --
>> Pranith
>>
>


-- 
Pranith