[Gluster-infra] Distributed Testing and Memory issues

Tue Mar 20 06:13:10 UTC 2018

https://github.com/gluster/glusterfs/blob/master/extras/distributed-testing/distributed-test-runner.py#L334

This call can be expensive in terms of memory because it sends the entire zip of the files from client to server. It can be optimized to send in batches if there are memory issues. You can try the change, or let me know if you want me to make it. Otherwise this code should be very slim on memory.
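
As a rough illustration of the batching idea (a sketch, not the code in distributed-test-runner.py), the zip could be read and sent one chunk at a time so that neither side holds the whole archive in memory. The names iter_chunks, send_in_batches, and CHUNK_SIZE are hypothetical, and Python 3's xmlrpc.client is assumed:

```python
import xmlrpc.client

CHUNK_SIZE = 1024 * 1024  # 1 MiB per call; an arbitrary illustrative value


def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield successive byte chunks of a file, one chunk in memory at a time."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def send_in_batches(path, send_chunk, chunk_size=CHUNK_SIZE):
    """Send a file piece by piece and return the number of chunks sent.

    send_chunk is any callable taking (index, xmlrpc.client.Binary).
    In a real setup it would be a remote method on a ServerProxy
    (hypothetical; no such method exists in the current runner).
    """
    count = 0
    for index, chunk in enumerate(iter_chunks(path, chunk_size)):
        # Binary wraps raw bytes so they are base64-encoded on the wire.
        send_chunk(index, xmlrpc.client.Binary(chunk))
        count += 1
    return count
```

On the server side, the matching handler would append each chunk to a file on disk and unzip only after the last chunk arrives. That handler is not shown here and would need to be added to the server.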

Thanks!

From: Deepshikha Khandelwal <dkhandel at redhat.com>
Date: Monday, March 19, 2018 at 9:24 PM
To: Karthikeyan Radhakrishnan <krad at fb.com>
Cc: Nigel Babu <nigelb at redhat.com>, gluster-infra <gluster-infra at gluster.org>, Jeff Darcy <jeff at pl.atyp.us>
Subject: Re: Distributed Testing and Memory issues

On Sun, Mar 18, 2018 at 12:12 PM, Karthikeyan Radhakrishnan <krad at fb.com> wrote:
Hi Nigel,

This is awesome!

MemoryError is very weird. We @Facebook have never seen that. The test server/client is too thin to cause memory pressure, but the tests they run can cause such issues. How much memory does the machine you are running on have?
I'm running this on machines with 2 GB of memory, which I think should be enough to set up the distributed test framework.
Is the machine under pressure when you see the errors? The best way would be to add an RPC to query memory stats and observe.
These are newly created machines running just the XML-RPC server process. I checked with top and found that this process is using about 77% of memory at the initial stage itself, when the tester part of the code scans and skips kicking the host/server for availability.
RPC is a new thing for me, so I'm not aware of RPC query calls. If you could tell me more about this, it would be helpful.
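
For reference, an RPC that reports memory usage, as suggested above, might look like the sketch below. It is only an illustration: memory_stats and make_server are hypothetical names that do not exist in the test runner, Python 3's xmlrpc.server is assumed, and the resource module is available on Unix only.

```python
import resource
from xmlrpc.server import SimpleXMLRPCServer


def memory_stats():
    """Return the peak resident set size of the server process.

    On Linux, ru_maxrss is reported in kilobytes (macOS reports bytes).
    """
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {"max_rss": usage.ru_maxrss}


def make_server(host="127.0.0.1", port=0):
    """Build an XML-RPC server exposing memory_stats.

    Port 0 asks the OS for any free port; a real deployment would
    use a fixed port instead.
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(memory_stats, "memory_stats")
    return server

# To run it: make_server(port=9000).serve_forever()  (9000 is an arbitrary choice)
```

A client could then poll it with xmlrpc.client.ServerProxy("http://<host>:9000").memory_stats() before and after the file transfer to see where the memory goes.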

Let me accelerate setting up some common space (like AWS) where we can repro such problems.
It would be great.

Thanks!
-Karthik

From: Nigel Babu <nigelb at redhat.com>
Date: Saturday, March 17, 2018 at 7:03 AM
To: Karthikeyan Radhakrishnan <krad at fb.com>
Cc: gluster-infra <gluster-infra at gluster.org>, Deepshikha Khandelwal <dkhandel at redhat.com>, Jeff Darcy <jeff at pl.atyp.us>
Subject: Distributed Testing and Memory issues

Hey Karthik,

Deepshikha has been working on testing the distributed test framework that you contributed (thank you!). Instead of writing our own code to chunk the tests, we've decided to just consume what you've written so we can work on making it run both at FB and upstream.

We're running into MemoryError exceptions from the threads. Do you know the best way to debug this, or could you let us know how much memory your machines have? That'll help us solve this sooner upstream.

PS: This email is CC'd to gluster-infra and is archived publicly.

--
nigelb
