[Gluster-devel] [rhhi-dev] Performance drops between Hypervisors in Gluster backed RHEV / RHS.

Yaniv Kaul ykaul at redhat.com
Wed Jan 17 18:26:38 UTC 2018


On Wed, Jan 17, 2018 at 8:19 PM, Ben Turner <bturner at redhat.com> wrote:

> Hi all.  I am seeing the strangest thing, I have no explanation, and I
> was hoping I could get your input here.  I have a RHEV / RHS setup (non-
> RHHI, just traditional) where we have two hypervisors with the exact same
> specs, mounting the same Gluster volume, and one sees half the performance
> inside the VM that the other does.  For these tests we live migrated the VM
> from one HV to the other and tested back and forth.  The perf we are
> seeing is:
>
> RHEV3  read  - 210 MB / sec      483 GB RAM       Dell R620    CPU - 8 cores
>        write - 150 MB / sec
>
> RHEV5  read  - 592 MB / sec      483 GB RAM       Dell R620    CPU - 8 cores
>        write - 280 MB / sec
>
> On identical HW, mounting the same volume, and live migrating the VM back
> and forth, the perf inside the VM on one HV is about half of what it is on
> the other!  To eliminate the RHEV / VM layer we ran similar tests directly
> on the mount and saw almost the same throughput difference.
>
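A minimal sketch of the kind of direct-on-the-mount check described above,
assuming a FUSE mount at a placeholder path (the path, block size, and file
size below are illustrative, not the ones actually used):

    # Hypothetical: sequential write then read straight on the Gluster FUSE
    # mount on the HV, bypassing the VM layer (paths and sizes are placeholders).
    dd if=/dev/zero of=/mnt/glustervol/perftest bs=1M count=4096 conv=fdatasync
    # Drop the HV page cache so the read comes from Gluster, not from RAM.
    echo 3 > /proc/sys/vm/drop_caches
    dd if=/mnt/glustervol/perftest of=/dev/null bs=1M
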
> After that we ran iperf tests: both systems hit the same 9+ Gb / sec, and
> in fact the speeds were almost identical over iperf.  Next the customer
> swapped cables / ports on the physical systems / router; again, same
> behavior.  We compared and contrasted configs, NW stats, and driver / FW
> versions; again, all the same.  We both validated identical configs on
> BJeans, and I later went through sosreport for anything I missed.  I am
> completely stumped.  Does anyone have any ideas on where else to look?
> RHEV3 is using less memory and has fewer VMs running as well!  While I
> think we have eliminated a RHEV / Virt issue, I left the RHHI guys in CC
> in case they have any ideas.
>
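A sketch of the sort of host-to-host network check mentioned above, assuming
plain iperf between the two HVs (hostnames and options are placeholders):

    # On RHEV5 (placeholder hostname), start the iperf server:
    iperf -s
    # On RHEV3, push traffic to it for 30 seconds over 4 parallel streams:
    iperf -c rhev5.example.com -t 30 -P 4
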
> I am happy to provide any info / open a bug / whatever y'all think; any
> guidance on next steps would be appreciated.  Note: this is a production
> cluster with hundreds of VMs, so while we can test and move VMs around, we
> can't just stop them all.  One last observation: these HVs have a lot of
> RAM, and I think the VMs are heavily leveraging page cache on the HV.
>

RHV uses direct I/O, so I don't see why they'd use a lot of page cache.
I would assume opening a support case makes sense.  I didn't really
understand whether it's a Gluster or an RHV issue, though.
You should be able to run fio on the hosts to eliminate those.  I personally
like https://github.com/pcuzner/fio-tools
Y.
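
For what it's worth, a minimal fio run on each host against the Gluster mount,
with direct=1 so the HV page cache is taken out of the picture, might look
like this (the path, file size, and queue depth are placeholders):

    # Sequential write, then read, with O_DIRECT on the host-side Gluster mount.
    fio --name=seqwrite --filename=/mnt/glustervol/fiotest --rw=write \
        --bs=1M --size=4G --direct=1 --ioengine=libaio --iodepth=16
    fio --name=seqread --filename=/mnt/glustervol/fiotest --rw=read \
        --bs=1M --size=4G --direct=1 --ioengine=libaio --iodepth=16

Comparing the numbers from the two hosts this way keeps the result independent
of both the guest and the HV page cache.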


>
> Thanks in advance all!
>
> -b
>
> ---
> Note: This list is intended for discussions relating to Hyperconvergence
> and Red Hat Storage products, customers and/or support.
>