[Gluster-devel] Performance drops between Hypervisors in Gluster backed RHEV / RHS.

Ben Turner bturner at redhat.com
Wed Jan 17 18:19:27 UTC 2018


Hi all.  I am seeing the strangest thing and I have no explanation, so I was hoping I could get your input here.  I have a RHEV / RHS setup (non-RHHI, just traditional) with two hypervisors that have the exact same specs and mount the same Gluster volume, yet one sees half the performance inside the VM that the other does.  For these tests we live migrated the VM from one HV to the other and tested back and forth.  The perf we are seeing is:

RHEV3:  read 210 MB/sec, write 150 MB/sec    483 GB RAM    Dell r620    8-core CPU
RHEV5:  read 592 MB/sec, write 280 MB/sec    483 GB RAM    Dell r620    8-core CPU

On identical HW, mounting the same volume, and live migrating the VM back and forth, throughput inside the VM on one HV is about half of what we see on the other!  To eliminate the RHEV / VM layer we ran similar tests directly on the Gluster mount and saw almost the same throughput difference.
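
For reference, the direct-on-mount test was along these lines - a minimal sketch, the path and file size below are illustrative rather than the exact ones we used:

    MNT=/path/to/fuse/mount                                   # the Gluster volume as fuse-mounted on the HV
    dd if=/dev/zero of=$MNT/perftest.img bs=1M count=4096 conv=fsync   # sequential write
    sync; echo 3 > /proc/sys/vm/drop_caches                   # avoid reading back from HV page cache
    dd if=$MNT/perftest.img of=/dev/null bs=1M                # sequential read
    rm -f $MNT/perftest.img

Run on each HV against the same volume, this shows the same ~2x gap as the in-VM numbers above.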

After that we ran iperf tests; both systems did the same 9+ Gb/sec, and in fact the speeds were almost identical.  The customer then swapped cables and ports on the physical systems / router - same behavior.  We compared and contrasted configs, network stats, and driver / FW versions; again, all the same.  We both validated identical configs over BJeans, and I later went through the sosreports for anything I missed.  I am completely stumped.  Does anyone have any ideas on where else to look?  RHEV3 is using less memory and has fewer VMs running as well!  While I think we have eliminated a RHEV / virt issue, I left the RHHI guys in CC in case they have any ideas.
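
For completeness, the comparison was roughly along the lines of the commands below - interface and volume names are placeholders, not necessarily the exact invocations we used:

    iperf3 -s                                # on one HV
    iperf3 -c <other-hv> -t 30 -P 4          # from the other; both directions showed 9+ Gb/sec
    ethtool eth0                             # link speed / duplex (interface name varies)
    ethtool -i eth0                          # driver and firmware versions
    ethtool -S eth0 | grep -iE 'err|drop'    # error / drop counters
    mount | grep glusterfs                   # confirm identical mount options on both HVs
    gluster volume info <volname>            # same volume options visible from both sides

All of the above came back identical (or clean) on both hypervisors.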

I am happy to provide any info / open a bug / whatever y'all think; any guidance on next steps would be appreciated.  Note - this is a production cluster with 100s of VMs, so while we can test and move VMs around we can't just stop them all.  One last observation - these HVs have a lot of RAM and I think the VMs are heavily leveraging the page cache on the HV, as sketched below.
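
One way to sanity-check the page cache theory on each HV would be something like the following (illustrative only; dropping caches is safe but will cause a brief I/O dip, so we have not done it broadly on this production cluster):

    free -h                                          # how much RAM is sitting in buff/cache
    grep -E 'Cached|Dirty|Writeback' /proc/meminfo   # cached vs. dirty pages
    sync; echo 3 > /proc/sys/vm/drop_caches          # flush clean page cache
    # ...then re-run the read test on the mount and inside the VM and compare the two HVs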

Thanks in advance all!

-b

