[Gluster-users] Extremely slow cluster performance

Sun Apr 21 16:24:39 UTC 2019

Hi Strahil,

Thanks again for your help. I checked, and most of my clients are on 3.13.2,
which I think is the default version packaged with Ubuntu.
I upgraded a test VM to v5.6 and tested again, but there is no difference:
performance accessing the cluster is the same.
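
A timed listing like the one used for this comparison could be scripted roughly as follows. This is a minimal sketch, not the exact commands run here: the helper name time_listing and the 600-second default cap are illustrative assumptions, and it relies on timeout(1) from GNU coreutils being installed on the client.

```shell
#!/bin/sh
# Hypothetical helper: time a directory listing and give up after a cap.
# Usage: time_listing DIR [MAX_SECONDS]
time_listing() {
    dir=$1
    max=${2:-600}            # assumed default cap in seconds
    start=$(date +%s)
    # timeout(1) exits with status 124 if the cap is reached,
    # and ls exits non-zero if the directory cannot be listed.
    if timeout "$max" ls "$dir" > /dev/null 2>&1; then
        status=ok
    else
        status=failed_or_timed_out
    fi
    end=$(date +%s)
    # One line per run: path, outcome, elapsed wall-clock seconds.
    echo "$dir $status $((end - start))s"
}
```

Running `time_listing /cluster/folder` on a 3.13.2 client and again on the v5.6 test VM gives one comparable line per client.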

Cheers,
-Patrick

On Sun, Apr 21, 2019 at 11:39 PM Strahil <hunter86_bg at yahoo.com> wrote:

> This looks more like a FUSE problem.
> Are the clients on v3.12.xx?
> Can you set up a VM for a test and run FUSE mounts using v5.6 and v6.x?
>
> Best Regards,
> Strahil Nikolov
> On Apr 21, 2019 17:24, Patrick Rennie <patrickmrennie at gmail.com> wrote:
>
> Hi Strahil,
>
> Thank you for your reply and your suggestions. I'm not sure which logs
> would be most relevant to check to diagnose this issue: the brick logs,
> the cluster mount logs, the shd logs, or something else? I have
> posted a few entries that I have seen repeated a few times already. I will
> continue to post anything further that I see.
> I am working on migrating data to some new storage, so this will slowly
> free up space, although this is a production cluster and new data is being
> uploaded every day, sometimes faster than I can migrate it off. I have
> several other similar clusters and none of them have the same problem. One
> of the others is actually at 98-99% right now (big problem, I know) but still
> performs perfectly fine compared to this cluster, so I am not sure low space
> is the root cause here.
>
> I currently have 13 VMs accessing this cluster. I have checked each one,
> and all of them use one of the two fstab entries below to mount the cluster:
>
> HOSTNAME:/gvAA01   /mountpoint    glusterfs defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable,use-readdirp=no 0 0
> HOSTNAME:/gvAA01   /mountpoint    glusterfs defaults,_netdev,rw,log-level=WARNING,direct-io-mode=disable
> I also have a few other VMs which use NFS to access the cluster, and these
> machines appear to be significantly quicker. Initially I get a similar
> delay with NFS, but if I cancel the first "ls" and try it again I get < 1
> sec lookups. The same listing can take over 10 minutes with the FUSE/gluster
> client, and the trick of cancelling and trying again doesn't work there.
> Sometimes the NFS queries have no delay at all, so this is a bit strange to
> me.
> HOSTNAME:/gvAA01        /mountpoint/ nfs defaults,_netdev,vers=3,async,noatime 0 0
>
> Example:
> user@VM:~$ time ls /cluster/folder
> ^C
>
> real    9m49.383s
> user    0m0.001s
> sys     0m0.010s
>
> user@VM:~$ time ls /cluster/folder
> <results>
>
> real    0m0.069s
> user    0m0.001s
> sys     0m0.007s
>
> ---
>
> I have run the profiling as you suggested. I let it run for around a
> minute, then cancelled it and saved the profile info.
>
> root@HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 start
> Starting volume profile on gvAA01 has been successful
> root@HOSTNAME:/var/log/glusterfs# time ls /cluster/folder
> ^C
>
> real    1m1.660s
> user    0m0.000s
> sys     0m0.002s
>
> root@HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 info >> ~/profile.txt
> root@HOSTNAME:/var/log/glusterfs# gluster volume profile gvAA01 stop
>
> I will attach the results to this email, as they run to over 1000 lines.
> Unfortunately, I'm not sure what I'm looking at but possibly somebody will
> be able to help me make sense of it and let me know if it highlights any
> specific issues.
>
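
One way to get a first read on a long profile dump is to rank file operations (FOPs) by their share of latency. The sketch below is only that: top_fops is a hypothetical helper, and it assumes the saved output contains "Brick:" header lines followed by nine-column data rows (%-latency, then average, minimum and maximum latency each followed by a unit, then the call count and the FOP name). Check that layout against the actual file before trusting the result.

```shell
#!/bin/sh
# Hypothetical helper: list the profile rows with the highest %-latency.
# Usage: top_fops PROFILE_FILE [LIMIT]
# Assumed row layout (nine whitespace-separated fields):
#   %-latency  avg unit  min unit  max unit  calls  FOP
top_fops() {
    profile_file=$1
    limit=${2:-10}
    awk '
        # Remember which brick the following rows belong to.
        $1 == "Brick:" { brick = $2 }
        # Keep only data rows: numeric %-latency and numeric call count.
        NF == 9 && $1 ~ /^[0-9]+(\.[0-9]+)?$/ && $8 ~ /^[0-9]+$/ {
            printf "%s %s %s %s%s %s\n", $1, $9, $8, $2, $3, brick
        }
    ' "$profile_file" | sort -k1,1 -rn | head -n "$limit"
}
```

For example, `top_fops ~/profile.txt 15` would print the 15 rows with the highest %-latency as "percent FOP calls avg-latency brick". Because the profile output has both cumulative and interval sections, the same FOP can appear more than once per brick.
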
> Happy to try any further suggestions. Thank you,
>
> -Patrick
>
> On Sun, Apr 21, 2019 at 7:55 PM Strahil <hunter86_bg at yahoo.com> wrote:
>
> By the way, can you provide the 'volume info' output and the mount options
> on all clients?
> Maybe there is an option that uses a lot of resources due to some
> client's mount options.
>
> Best Regards,
> Strahil Nikolov
> On Apr 21, 2019 10:55, Patrick Rennie <patrickmrennie at gmail.com> wrote:
>
> Just another small update: I'm continuing to watch my brick logs and I
> just saw these errors come up in the recent events too. I am going to
> continue to post any errors I see in the hope of finding the right one to
> try and fix.
> This is from the logs on brick1:
>
>