[Gluster-devel] [rhhi-dev] Performance drops between Hypervisors in Gluster backed RHEV / RHS.
Yaniv Kaul
ykaul at redhat.com
Wed Jan 17 18:26:38 UTC 2018
On Wed, Jan 17, 2018 at 8:19 PM, Ben Turner <bturner at redhat.com> wrote:
> Hi all. I am seeing the strangest thing, I have no explanation for it, and I
> was hoping I could get your input here. I have a RHEV / RHS setup (non-RHHI,
> just traditional) where two hypervisors with the exact same specs mount the
> same Gluster volume, yet one sees half the performance inside the VM that the
> other does. For these tests we live migrated the VM from one HV to the other,
> testing back and forth. The perf we are seeing is:
>
> RHEV3 (Dell r620, 8-core CPU, 483 GB RAM):  read - 210 MB / sec, write - 150 MB / sec
>
> RHEV5 (Dell r620, 8-core CPU, 483 GB RAM):  read - 592 MB / sec, write - 280 MB / sec
>
> On identical HW, mounting the same volume, and live migrating the VM back and
> forth, the perf inside the VM on one HV is about half of the perf on the
> other! To eliminate the RHEV / VM layer we ran similar tests directly on the
> Gluster mount and saw almost the same throughput difference.
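> (Purely as an illustration of that kind of on-mount check, a direct-I/O
> sequential test can be as simple as the following; the mount path is a
> placeholder, not the actual path on this cluster:
>
>   # write test, bypassing the hypervisor page cache
>   dd if=/dev/zero of=/mnt/glustervol/ddtest.img bs=1M count=4096 oflag=direct
>   # read test
>   dd if=/mnt/glustervol/ddtest.img of=/dev/null bs=1M iflag=direct
>
> Run on both HVs against the same volume for a like-for-like comparison.)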
>
> After that we ran iperf tests; both systems showed the same 9+ Gb / sec, and
> in fact the speeds were almost identical. After that the customer swapped
> cables / ports on the physical systems / router, and again saw the same
> behavior. We compared and contrasted configs, NW stats, and driver / FW
> versions; again, all the same. We both validated identical configs on BJeans,
> and I later went through the sosreports for anything I missed. I am
> completely stumped. Does anyone have any ideas on where else to look? RHEV3
> is using less memory and has fewer VMs running as well! While I think we have
> eliminated a RHEV / Virt issue, I left the RHHI guys in CC in case they have
> any ideas.
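> (For reference, a point-to-point check of this sort, sketched here with
> iperf3 and a placeholder address rather than the exact commands that were
> used:
>
>   # on one HV
>   iperf3 -s
>   # on the other HV
>   iperf3 -c <other-hv-address> -t 30 -P 4
> )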
>
> I am happy to provide any info / open a bug / whatever y'all think; any
> guidance on next steps would be appreciated. Note - this is a production
> cluster with 100s of VMs, so while we can test and move VMs around we can't
> just stop them all. One last observation - these HVs have a bunch of RAM and
> I think the VMs are heavily leveraging page cache on the HV.
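> (If a maintenance window allows, one way to test that page-cache theory would
> be to drop the HV's caches and re-run the read test; this is just a sketch of
> the idea:
>
>   sync; echo 3 > /proc/sys/vm/drop_caches
> )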
>
RHV uses direct IO, so I don't see why they'd use a lot of page cache.
I would assume opening a case makes sense; I didn't really understand whether
it's a Gluster or RHV issue, though.
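(As a quick sanity check of the direct-I/O point, and purely as a sketch with
a placeholder domain name, the disk cache mode of a running guest can be
confirmed from the host; RHV normally sets cache='none', i.e. O_DIRECT:

  virsh dumpxml <vm-name> | grep 'cache='
)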
You should be able to run fio directly on the hosts to rule those layers out.
I personally like https://github.com/pcuzner/fio-tools for that.
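(A minimal fio job along those lines, with the directory, size, and runtime as
placeholder values rather than a recommendation for this cluster, could be:

  fio --name=seqread --directory=/mnt/glustervol --ioengine=libaio \
      --direct=1 --rw=read --bs=1M --size=4G --runtime=60 --time_based

and the same with --rw=write, run on both RHEV3 and RHEV5 for comparison.)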
Y.
>
> Thanks in advance all!
>
> -b
>
> ---
> Note: This list is intended for discussions relating to Hyperconvergence
> and Red Hat Storage products, customers and/or support.
>