[Gluster-users] High I/O And Processor Utilization

Mon Jan 18 12:08:37 UTC 2016

----- Original Message -----

> From: "Kyle Harris" <kyle.harris98 at gmail.com>
> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> Cc: "Lindsay Mathieson" <lindsay.mathieson at gmail.com>, "Krutika Dhananjay"
> <kdhananj at redhat.com>, gluster-users at gluster.org, "Paul Cuzner"
> <pcuzner at redhat.com>
> Sent: Friday, January 15, 2016 4:46:43 AM
> Subject: Re: [Gluster-users] High I/O And Processor Utilization

> I thought I might take a minute to update everyone on this situation. I
> rebuilt GlusterFS using a shard size of 256MB and then imported all VMs
> back on to the cluster. I rebuilt it from scratch rather than just doing an
> export/import on the data so I could start everything fresh. I wish now I
> had used 512MB, but unfortunately I didn’t see that suggestion in time.
> Anyway, the good news is the system load has greatly decreased. The
> systems are all now in a usable state.
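
To put the shard sizes discussed in this thread side by side, here is a minimal sketch of the arithmetic. It assumes a sharded image is split into roughly ceil(image size / shard size) pieces and that "MB" means binary MiB. The 100 GiB image size and the helper name `shard_count` are hypothetical, picked only for illustration.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def shard_count(image_bytes: int, shard_bytes: int) -> int:
    """Number of shard-sized pieces needed to hold an image (ceiling division)."""
    if shard_bytes <= 0:
        raise ValueError("shard size must be positive")
    return (image_bytes + shard_bytes - 1) // shard_bytes


# Hypothetical 100 GiB VM image; shard sizes are the ones named in the thread.
image = 100 * GIB
for label, size in [("4MB", 4 * MIB), ("256MB", 256 * MIB),
                    ("512MB", 512 * MIB), ("1GB", 1 * GIB)]:
    print(f"shard size {label}: {shard_count(image, size)} pieces")
```

Going from 256MB to 512MB halves the number of pieces for the same image, while each piece that needs healing carries up to twice as much data. Which side of that trade-off matters more probably depends on the workload.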

> The bad news is that I am still seeing a bunch of heals and am not sure why.
> Because of that, I am also still seeing the drives slow down from over 110
> MB/sec without Gluster running on them to ~ 25 MB/sec with bricks running on
> the drives. So it seems to me there is still an issue here.

Kyle,
Could you share
1) the output of `gluster volume info <VOLNAME>`, and
2) the client/mount logs and the glustershd.log files?
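
For anyone gathering the same information, a rough sketch of what to collect might look like the following. It only builds and prints the commands and paths; it does not run anything. The log directory `/var/log/glusterfs` and the client-log naming (mount path with slashes turned into dashes) are assumptions about a default install, and the helper names and the `/mnt/gluster` mount point are hypothetical.

```python
import shlex
from typing import List

# Assumption: default Gluster log directory; adjust if your install differs.
LOG_DIR = "/var/log/glusterfs"


def diagnostic_commands(volname: str) -> List[List[str]]:
    """Commands whose output is worth attaching to a report about heals."""
    return [
        ["gluster", "volume", "info", volname],
        ["gluster", "volume", "heal", volname, "info"],
    ]


def shd_log_path() -> str:
    """Self-heal daemon log (assumed default location)."""
    return f"{LOG_DIR}/glustershd.log"


def client_log_path(mount_point: str) -> str:
    """Client/mount log, assuming it is named after the mount path."""
    name = mount_point.strip("/").replace("/", "-")
    return f"{LOG_DIR}/{name}.log"


# Volume name taken from the thread; the mount point is a made-up example.
for cmd in diagnostic_commands("datastore1"):
    print(" ".join(shlex.quote(arg) for arg in cmd))
print(shd_log_path())
print(client_log_path("/mnt/gluster"))
```

The glustershd.log file exists on each server node, so it would need to be collected from every node that hosts a brick.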

> Also, as fate would have it, I am having issues due to a bug in the Gluster
> NFS implementation on this version (3.7) that requires me to set nfs.acl to
> off, so I am also hoping for a fix for that soon. I think this is the bug,
> but I am not sure: https://bugzilla.redhat.com/show_bug.cgi?id=1238318

CC'ing Niels and Soumya wrt the NFS related issue. 

-Krutika 

> So to sum up, things are working but the performance leaves much to be
> desired, mostly, I suspect, due to all the heals taking place.

> - Kyle

> On Mon, Jan 11, 2016 at 9:32 PM, Pranith Kumar Karampuri <
> pkarampu at redhat.com > wrote:

> > On 01/12/2016 08:52 AM, Lindsay Mathieson wrote:

> > > On 11/01/16 15:37, Krutika Dhananjay wrote:

> > > > Kyle,

> > > > Based on the testing we have done from our end, we've found that 512MB
> > > > is a good number that is neither too big nor too small, and provides
> > > > good performance both on the IO side and with respect to self-heal.

> > > Hi Krutika, I experimented a lot with different chunk sizes, didn't find
> > > all that much difference between 4MB and 1GB.

> > > But benchmarks are tricky things - I used CrystalDiskMark inside a VM,
> > > which is probably not the best assessment. And two of the bricks on my
> > > replica 3 are very slow, just test drives, not production. So I guess that
> > > would affect things :)

> > > These are my current settings - what do you use?

> > > Volume Name: datastore1
> > > Type: Replicate
> > > Volume ID: 1261175d-64e1-48b1-9158-c32802cc09f0
> > > Status: Started
> > > Number of Bricks: 1 x 3 = 3
> > > Transport-type: tcp
> > > Bricks:
> > > Brick1: vnb.proxmox.softlog:/vmdata/datastore1
> > > Brick2: vng.proxmox.softlog:/vmdata/datastore1
> > > Brick3: vna.proxmox.softlog:/vmdata/datastore1
> > > Options Reconfigured:
> > > network.remote-dio: enable
> > > cluster.eager-lock: enable
> > > performance.io-cache: off
> > > performance.read-ahead: off
> > > performance.quick-read: off
> > > performance.stat-prefetch: off
> > > performance.strict-write-ordering: on
> > > performance.write-behind: off
> > > nfs.enable-ino32: off
> > > nfs.addr-namelookup: off
> > > nfs.disable: on
> > > performance.cache-refresh-timeout: 4
> > > performance.io-thread-count: 32
> > > cluster.server-quorum-type: server
> > > cluster.quorum-type: auto
> > > client.event-threads: 4
> > > server.event-threads: 4
> > > cluster.self-heal-window-size: 256
> > > features.shard-block-size: 512MB
> > > features.shard: on
> > > performance.readdir-ahead: off
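
Since the question above is about comparing settings between setups, here is a small sketch of how two `gluster volume info` outputs could be compared option by option. The parser assumes the "key: value" layout shown in this thread and tolerates leading `>` quote markers. The function names are hypothetical, and the second setup is simulated by changing one value to the 256MB shard size mentioned earlier in the thread.

```python
from typing import Dict, Optional, Tuple

SAMPLE = """\
> > > Volume Name: datastore1
> > > Type: Replicate
> > > Bricks:
> > > Brick1: vnb.proxmox.softlog:/vmdata/datastore1
> > > Options Reconfigured:
> > > cluster.self-heal-window-size: 256
> > > features.shard-block-size: 512MB
> > > features.shard: on
"""


def parse_reconfigured_options(volume_info: str) -> Dict[str, str]:
    """Collect the key/value pairs listed after 'Options Reconfigured:'."""
    options: Dict[str, str] = {}
    in_options = False
    for raw in volume_info.splitlines():
        line = raw.lstrip("> ").strip()  # drop email quote markers, if any
        if not line:
            continue
        if line == "Options Reconfigured:":
            in_options = True
            continue
        if in_options and ": " in line:
            key, value = line.split(": ", 1)
            options[key.strip()] = value.strip()
    return options


def diff_options(mine: Dict[str, str], theirs: Dict[str, str]
                 ) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Options whose values differ, as (mine, theirs); None means unset."""
    keys = sorted(set(mine) | set(theirs))
    return {k: (mine.get(k), theirs.get(k))
            for k in keys if mine.get(k) != theirs.get(k)}


options = parse_reconfigured_options(SAMPLE)
other = dict(options)
other["features.shard-block-size"] = "256MB"  # simulated second setup
print(diff_options(options, other))
```

Only the lines after "Options Reconfigured:" are treated as options, so the volume name and brick lines are ignored.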

> > Most of these tests are done by Paul Cuzner (CCed).

> > Pranith