<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 17, 2018 at 8:19 PM, Ben Turner <span dir="ltr">&lt;<a href="mailto:bturner@redhat.com" target="_blank">bturner@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi all.  I am seeing the strangest thing, I have no explanation for it, and I was hoping I could get your input here.  I have a RHEV / RHS setup (non RHHI, just traditional) where we have two hypervisors with the exact same specs that mount the same Gluster volume, and one sees 1/2 the performance inside the VM as the other.  For these tests we would live migrate the VM from one HV to the other, testing back and forth.  The perf we are seeing is:<br>

<br>

RHEV3 read - 210 MB / sec      483 GB RAM       Dell r620    CPU - 8 cores<br>

      write - 150 MB / sec<br>

<br>

RHEV5 read - 592 MB / sec      483 GB RAM       Dell r620    CPU - 8 cores<br>

      write - 280 MB / sec<br>

<br>

On identical HW, mounting the same volume, live migrating the VM back and forth, the perf inside the VM is about 1/2 of the perf when run on the other HV!  To eliminate the RHEV / VM layer we ran similar tests directly on the mount and saw almost the same level of throughput difference.<br>

<br>

After that we ran iperf tests: both systems showed the same 9+ Gb / sec, in fact the speeds were almost identical over iperf.  After that the customer swapped cables / ports on the physical systems / router; again, same behavior.  We compared and contrasted configs, NW stats, and driver / FW versions.  Again, all the same.  Both validated identical configs on BJeans, and I later went through the sosreport for anything I missed.  I am completely stumped.  Does anyone have any ideas on where else to look?  RHEV3 is using less memory and has fewer VMs running as well!  While I think that we have eliminated a RHEV / Virt issue, I left the RHHI guys in CC in case they had any ideas.<br>

<br>

I am happy to provide any info / open a bug / whatever y&#39;all think; any guidance on next steps would be appreciated.  Note - this is a production cluster with 100s of VMs, so while we can test and move VMs around we can&#39;t just stop them all.  One last observation - these HVs have a bunch of RAM and I think the VMs are heavily leveraging page cache on the HV.<br></blockquote><div><br></div><div>RHV uses direct IO, so I don&#39;t see why they&#39;d use a lot of page cache.</div><div>I would assume opening a case makes sense. I didn&#39;t really understand whether it&#39;s a Gluster or RHV issue, though.</div><div>You should be able to run fio on the hosts to eliminate those. I personally like <a href="https://github.com/pcuzner/fio-tools">https://github.com/pcuzner/fio-tools</a></div><div>Y.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Thanks in advance all!<br>

<br>

-b<br>

<br>

---<br>

Note: This list is intended for discussions relating to Hyperconvergence and Red Hat Storage products, customers and/or support.<br>

</blockquote></div><br></div></div>