[Gluster-devel] How long should metrics collection on a cluster take?

Thu Jul 26 04:19:39 UTC 2018

On Tue, Jul 24, 2018 at 10:11 PM Sankarshan Mukhopadhyay <
sankarshan.mukhopadhyay at gmail.com> wrote:

> On Tue, Jul 24, 2018 at 9:48 PM, Pranith Kumar Karampuri
> <pkarampu at redhat.com> wrote:
> > hi,
> >       Quite a few commands to monitor gluster at the moment take almost a
> > second to give output.
>
> Is this at the (most) minimum recommended cluster size?
>
> > Some categories of these commands:
> > 1) Any command that needs to do some sort of mount/glfs_init.
> >      Examples: 1) heal info family of commands 2) statfs to find
> > space-availability etc (On my laptop replica 3 volume with all local
> bricks,
> > glfs_init takes 0.3 seconds on average)
> > 2) glusterd commands that need to wait for the previous command to
> unlock.
> > If the previous command is something related to lvm snapshot which takes
> > quite a few seconds, it would be even more time consuming.
> >
> > Nowadays container workloads have hundreds of volumes if not thousands.
> If
> > we want to serve any monitoring solution at this scale (I have seen
> > customers use upto 600 volumes at a time, it will only get bigger) and
> lets
> > say collecting metrics per volume takes 2 seconds per volume(Let us take
> the
> > worst example which has all major features enabled like
> > snapshot/geo-rep/quota etc etc), that will mean that it will take 20
> minutes
> > to collect metrics of the cluster with 600 volumes. What are the ways in
> > which we can make this number more manageable? I was initially thinking
> may
> > be it is possible to get gd2 to execute commands in parallel on different
> > volumes, so potentially we could get this done in ~2 seconds. But quite a
> > few of the metrics need a mount or equivalent of a mount(glfs_init) to
> > collect different information like statfs, number of pending heals, quota
> > usage etc. This may lead to high memory usage as the size of the mounts
> tend
> > to be high.
> >
>
> I am not sure if starting from the "worst example" (it certainly is
> not) is a good place to start from. That said, for any environment
> with that number of disposable volumes, what kind of metrics do
> actually make any sense/impact?
>

This is really interesting question. When we have more number of disposable
volumes, I think we need metrics like available size in the cluster to
create more volumes than the utilization per volumes. (If we need to
observe the usage patterns of applications then we need per volume
utilization as well)

>
> > I wanted to seek suggestions from others on how to come to a conclusion
> > about which path to take and what problems to solve.
> >
> > I will be happy to raise github issues based on our conclusions on this
> mail
> > thread.
> >
> > --
> > Pranith
> >
>
>
>
>
>
> --
> sankarshan mukhopadhyay
> <https://about.me/sankarshan.mukhopadhyay>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel

--
regards
Aravinda VK
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-devel/attachments/20180726/5dde96f6/attachment.html>