[Gluster-devel] How long should metrics collection on a cluster take?

Thu Jul 26 04:57:32 UTC 2018

On Thu, Jul 26, 2018 at 9:59 AM, Pranith Kumar Karampuri <
pkarampu at redhat.com> wrote:

>
>
> On Wed, Jul 25, 2018 at 10:48 PM, John Strunk <jstrunk at redhat.com> wrote:
>
>> I have not put together a list. Perhaps the following will help w/ the
>> context though...
>>
>> The "reconcile loop" of the operator will take the cluster CRs and
>> reconcile them against the actual cluster config. At the 20k foot level,
>> this amounts to something like determining there should be 8 gluster pods
>> running, and making the appropriate changes if that doesn't match reality.
>> In practical terms, the construction of this reconciliation loop can be
>> thought of as a set (array) of 3-tuples: [{should_act() -> bool, can_act()
>> -> bool, action() -> ok, error}, {..., ..., ...}, ...]
>>
>> Each capability of the operator would be expressed as one of these tuples.
>> should_act() : true if the action() should be taken
>> can_act() : true if the prerequisites for taking the action are met
>> action() : make the change. Only run if should && can.
>> (note that I believe should_act() and can_act() should not be separate in
>> the implementation, for reasons I'll not go into here)
>>
>> An example action might be "upgrade the container image for pod X". The
>> associated should_act would be triggered if the "image=" of the pod doesn't
>> match the desired "image=" in the operator CRs. The can_act evaluation
>> would be verifying that it's ok to do this... Thinking from the top of my
>> head:
>> - All volumes w/ a brick on this pod should be fully healed
>> - Sufficient cluster nodes should be up such that quorum is not lost when
>> this node goes down (does this matter?)
>> - The proposed image is compatible with the current version of the CSI
>> driver(s), the operator, and other gluster pods
>> - Probably some other stuff
>> The action() would update the "image=" in the Deployment to trigger the
>> rollout
>>
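A minimal Go sketch of the tuple structure described above, to make the
"at most one action per cycle" behaviour concrete. The names Capability,
ReconcileOnce and ErrNothingToDo are hypothetical and are not taken from the
operator's code; the image strings and the "healed" flag are stand-ins for
real queries against the kube API and the gluster cluster.

```go
package main

import (
	"errors"
	"fmt"
)

// Capability is one (should_act, can_act, action) tuple.
// All names here are illustrative assumptions, not the operator's API.
type Capability struct {
	Name      string
	ShouldAct func() bool  // true if the action should be taken
	CanAct    func() bool  // true if the prerequisites are met
	Action    func() error // makes the change
}

// ErrNothingToDo is returned when no capability is both wanted and allowed.
var ErrNothingToDo = errors.New("nothing to do")

// ReconcileOnce runs at most one action, because an action changes the
// state that every other predicate was evaluated against.
func ReconcileOnce(caps []Capability) (string, error) {
	for _, c := range caps {
		if c.ShouldAct() && c.CanAct() {
			return c.Name, c.Action()
		}
	}
	return "", ErrNothingToDo
}

func main() {
	current, desired := "gluster:v1", "gluster:v2" // placeholder image names
	healed := true                                 // stand-in for a heal-count query
	caps := []Capability{{
		Name:      "upgrade-image",
		ShouldAct: func() bool { return current != desired },
		CanAct:    func() bool { return healed },
		Action:    func() error { current = desired; return nil },
	}}
	name, err := ReconcileOnce(caps)
	fmt.Println(name, err, current)
}
```

Returning after the first runnable action keeps the loop simple: the next
cycle re-evaluates every predicate against the new state.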
>> The idea is that queries would be made, both to the kube API and the
>> gluster cluster to verify the necessary preconditions for an action prior
>> to that action being invoked. There would obviously be commonality among
>> the preconditions for various actions, so the results should be fetched
>> exactly once per reconcile cycle. Also note, 1 cycle == at most 1 action()
>> due to the action changing the state of the system.
>>
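One way to get "fetched exactly once per reconcile cycle" is a small
memoizing wrapper that is created at the start of a cycle and thrown away at
the end. The sketch below assumes a hypothetical fetch function keyed by a
string such as "heal-count/vol1"; ClusterState and the key format are
illustrative only.

```go
package main

import "fmt"

// ClusterState memoizes query results for the duration of one reconcile
// cycle. The fetch function stands in for a real (slow) cluster query.
type ClusterState struct {
	fetch func(key string) (int, error)
	cache map[string]int
}

// NewClusterState creates an empty per-cycle cache around fetch.
func NewClusterState(fetch func(string) (int, error)) *ClusterState {
	return &ClusterState{fetch: fetch, cache: make(map[string]int)}
}

// Get returns the cached value for key, calling fetch on first use only.
func (s *ClusterState) Get(key string) (int, error) {
	if v, ok := s.cache[key]; ok {
		return v, nil
	}
	v, err := s.fetch(key)
	if err != nil {
		return 0, err // errors are not cached, so a later call retries
	}
	s.cache[key] = v
	return v, nil
}

func main() {
	calls := 0
	state := NewClusterState(func(key string) (int, error) {
		calls++
		return 0, nil // pretend the volume has zero pending heals
	})
	// Two preconditions that both need the same heal count.
	state.Get("heal-count/vol1")
	state.Get("heal-count/vol1")
	fmt.Println("backend queries:", calls)
}
```

Because the cache lives only as long as one cycle, every cycle starts from
fresh data and stale results cannot leak into a later decision.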
>> Given that we haven't designed (or even listed) all the potential
>> action()s, I can't give you a list of everything to query. I guarantee
>> we'll need to know the up/down status, heal counts, and free capacity for
>> each brick and node.
>>
>
> Thanks for the detailed explanation. This helps. One question though, is 5
> seconds a hard limit or is there a possibility to configure it?
>

I put together an idea for reducing the mgmt operation latency involving
mounts at https://github.com/gluster/glusterd2/issues/1069, comments
welcome.
@John, I still want to know whether there is a way to configure the hard
limit...

>
>
>>
>> -John
>>
>> On Wed, Jul 25, 2018 at 11:56 AM Pranith Kumar Karampuri <
>> pkarampu at redhat.com> wrote:
>>
>>>
>>>
>>> On Wed, Jul 25, 2018 at 8:17 PM, John Strunk <jstrunk at redhat.com> wrote:
>>>
>>>> To add an additional data point... The operator will need to regularly
>>>> reconcile the true state of the gluster cluster with the desired state
>>>> stored in kubernetes. This task will be required frequently (i.e.,
>>>> operator-framework defaults to every 5s even if there are no config
>>>> changes).
>>>> The actual amount of data we will need to query from the cluster is
>>>> currently TBD and likely significantly affected by Heketi/GD1 vs. GD2
>>>> choice.
>>>>
>>>
>>> Do we have any partial list of data we will gather? Just want to
>>> understand what this might entail already...
>>>
>>>
>>>>
>>>> -John
>>>>
>>>>
>>>> On Wed, Jul 25, 2018 at 5:45 AM Pranith Kumar Karampuri <
>>>> pkarampu at redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Jul 24, 2018 at 10:10 PM, Sankarshan Mukhopadhyay <
>>>>> sankarshan.mukhopadhyay at gmail.com> wrote:
>>>>>
>>>>>> On Tue, Jul 24, 2018 at 9:48 PM, Pranith Kumar Karampuri
>>>>>> <pkarampu at redhat.com> wrote:
>>>>>> > hi,
>>>>>> >       Quite a few commands to monitor gluster at the moment take
>>>>>> > almost a second to give output.
>>>>>>
>>>>>> Is this at the (most) minimum recommended cluster size?
>>>>>>
>>>>>
>>>>> Yes, with a single volume with 3 bricks i.e. 3 nodes in cluster.
>>>>>
>>>>>
>>>>>>
>>>>>> > Some categories of these commands:
>>>>>> > 1) Any command that needs to do some sort of mount/glfs_init.
>>>>>> >      Examples: 1) heal info family of commands 2) statfs to find
>>>>>> > space availability, etc. (On my laptop, with a replica 3 volume with
>>>>>> > all local bricks, glfs_init takes 0.3 seconds on average.)
>>>>>> > 2) glusterd commands that need to wait for the previous command to
>>>>>> > unlock. If the previous command is something related to an lvm
>>>>>> > snapshot, which takes quite a few seconds, it would be even more
>>>>>> > time consuming.
>>>>>> >
>>>>>> > Nowadays container workloads have hundreds of volumes, if not
>>>>>> > thousands. Suppose we want to serve any monitoring solution at this
>>>>>> > scale (I have seen customers use up to 600 volumes at a time, and it
>>>>>> > will only get bigger), and let's say collecting metrics takes 2
>>>>>> > seconds per volume (let us take the worst example, which has all
>>>>>> > major features enabled, like snapshot/geo-rep/quota etc.). That will
>>>>>> > mean it takes 20 minutes to collect metrics of a cluster with 600
>>>>>> > volumes. What are the ways in which we can make this number more
>>>>>> > manageable? I was initially thinking maybe it is possible to get gd2
>>>>>> > to execute commands in parallel on different volumes, so potentially
>>>>>> > we could get this done in ~2 seconds. But quite a few of the metrics
>>>>>> > need a mount or the equivalent of a mount (glfs_init) to collect
>>>>>> > different information like statfs, number of pending heals, quota
>>>>>> > usage etc. This may lead to high memory usage, as the size of the
>>>>>> > mounts tends to be high.
>>>>>> >
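The arithmetic above is 600 volumes x 2 s = 1200 s = 20 minutes when done
one volume at a time. A middle ground between fully sequential and fully
parallel collection is to cap the number of collections in flight, which
also caps the number of simultaneous mounts and so the memory used. The Go
sketch below is illustrative only: CollectAll and the collect callback are
hypothetical names, and it assumes per-volume collections are independent
of each other. With a limit of 50 and equal 2 s collections, 600 volumes
would take roughly 12 rounds x 2 s, about 24 s.

```go
package main

import (
	"fmt"
	"sync"
)

// CollectAll runs collect for every volume with at most `limit` collections
// in flight at once, and returns the result for each volume.
func CollectAll(volumes []string, limit int, collect func(string) int) map[string]int {
	if limit < 1 {
		limit = 1 // a zero-sized semaphore would block forever
	}
	results := make(map[string]int, len(volumes))
	var mu sync.Mutex
	var wg sync.WaitGroup
	sem := make(chan struct{}, limit)
	for _, v := range volumes {
		wg.Add(1)
		sem <- struct{}{} // blocks while `limit` collections are running
		go func(vol string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when done
			r := collect(vol)
			mu.Lock()
			results[vol] = r
			mu.Unlock()
		}(v)
	}
	wg.Wait()
	return results
}

func main() {
	volumes := make([]string, 600)
	for i := range volumes {
		volumes[i] = fmt.Sprintf("vol%d", i)
	}
	// Hypothetical collector: stands in for a heal-info or statfs query.
	results := CollectAll(volumes, 50, func(vol string) int { return len(vol) })
	fmt.Println("collected metrics for", len(results), "volumes")
}
```

The limit is the knob that trades collection time against the memory cost
of concurrent mounts raised in the paragraph above.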
>>>>>>
>>>>>> I am not sure if starting from the "worst example" (it certainly is
>>>>>> not) is a good place to start from.
>>>>>
>>>>>
>>>>> I didn't understand your statement. Are you saying 600 volumes is a
>>>>> worst example?
>>>>>
>>>>>
>>>>>> That said, for any environment
>>>>>> with that number of disposable volumes, what kind of metrics do
>>>>>> actually make any sense/impact?
>>>>>>
>>>>>
>>>>> Same metrics you track for long running volumes. It is just that the
>>>>> way the metrics are interpreted will be different. On a long running
>>>>> volume, you would look at the metrics and try to find why the volume is
>>>>> not giving the expected performance in the last 1 hour. Whereas in this
>>>>> case, you would look at the metrics and find the reason why volumes that
>>>>> were created and deleted in the last hour didn't give the expected
>>>>> performance.
>>>>>
>>>>>
>>>>>>
>>>>>> > I wanted to seek suggestions from others on how to come to a
>>>>>> > conclusion about which path to take and what problems to solve.
>>>>>> >
>>>>>> > I will be happy to raise github issues based on our conclusions on
>>>>>> > this mail thread.
>>>>>> >
>>>>>> > --
>>>>>> > Pranith
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> sankarshan mukhopadhyay
>>>>>> <https://about.me/sankarshan.mukhopadhyay>
>>>>>> _______________________________________________
>>>>>> Gluster-devel mailing list
>>>>>> Gluster-devel at gluster.org
>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Pranith
>>>>
>>>>
>>>
>>>
>>> --
>>> Pranith
>>>
>>
>
>
> --
> Pranith
>

-- 
Pranith