[Gluster-users] Glusterfs performance tweaks

Sat Apr 11 16:39:14 UTC 2015

I am seeing similar behaviour.

All systems are identical.

This is my setup:

############################################################
root@store-1:~# uname -a
Linux store-1 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt7-1 (2015-03-01) x86_64 GNU/Linux
root@store-1:~# glusterfsd --version
glusterfs 3.6.2 built on Jan 21 2015 14:23:41
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
root@store-1:~# gluster peer status
Number of Peers: 3

Hostname: 172.16.155.23
Uuid: 094c8ec4-f472-4d32-b3cb-705458cca949
State: Peer in Cluster (Connected)

Hostname: 172.16.155.22
Uuid: 00042351-cc66-4528-ac66-75ed7ea01b44
State: Peer in Cluster (Connected)

Hostname: 172.16.155.24
Uuid: 113307a5-79c8-47c3-a6fa-8b725a56f807
State: Peer in Cluster (Connected)
root@store-1:~# gluster volume info

Volume Name: cinder-disperse
Type: Disperse
Volume ID: a70cf0c4-9320-4e7a-8e6b-a9b6242d151e
Status: Started
Number of Bricks: 1 x (3 + 1) = 4
Transport-type: tcp
Bricks:
Brick1: 172.16.155.21:/data/cidi
Brick2: 172.16.155.22:/data/cidi
Brick3: 172.16.155.23:/data/cidi
Brick4: 172.16.155.24:/data/cidi
Options Reconfigured:
storage.owner-gid: 114
storage.owner-uid: 108
performance.cache-size: 24576MB
performance.io-thread-count: 64
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: none
cluster.quorum-type: none
network.remote-dio: disable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
server.allow-insecure: on
root@store-1:~# iperf -c 172.16.155.22
------------------------------------------------------------
Client connecting to 172.16.155.22, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 172.16.155.21 port 46843 connected with 172.16.155.22 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  7.92 GBytes  6.80 Gbits/sec
root@store-1:~# iperf -c 172.16.155.23
------------------------------------------------------------
Client connecting to 172.16.155.23, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 172.16.155.21 port 59368 connected with 172.16.155.23 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  7.94 GBytes  6.82 Gbits/sec
root@store-1:~# iperf -c 172.16.155.24
------------------------------------------------------------
Client connecting to 172.16.155.24, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 172.16.155.21 port 33656 connected with 172.16.155.24 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  7.94 GBytes  6.82 Gbits/sec
root@store-1:~#
root@store-1:~# time sh -c "dd if=/dev/zero of=/data/cidi/test.tmp bs=4k count=2000000 && sync"
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB) copied, 8.58092 s, 955 MB/s

real 0m20.749s
user 0m0.276s
sys 0m6.304s
root@store-1:/data#

############################################################

root@store-2:~# time sh -c "dd if=/dev/zero of=/data/cidi/test.tmp bs=4k count=2000000 && sync"
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB) copied, 9.60108 s, 853 MB/s

real 0m20.751s
user 0m0.264s
sys 0m6.128s

############################################################

root@store-3:~# time sh -c "dd if=/dev/zero of=/data/cidi/test.tmp bs=4k count=2000000 && sync"
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB) copied, 9.27243 s, 883 MB/s

real 0m20.979s
user 0m0.284s
sys 0m6.168s

############################################################

root@store-4:~# time sh -c "dd if=/dev/zero of=/data/cidi/test.tmp bs=4k count=2000000 && sync"
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB) copied, 9.12428 s, 898 MB/s

real 0m20.067s
user 0m0.244s
sys 0m6.304s

############################################################
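
Note that the rate dd prints does not include the final sync, so it mostly shows how fast the page cache absorbs the data. A rough way to compare the four bricks is to divide the bytes written by the "real" time instead. The sketch below does that; the helper name effective_mbps is made up for illustration, and the numbers are the ones from the runs above.

```shell
#!/bin/sh
# Hypothetical helper: throughput in MB/s (10^6 bytes) over the whole run,
# i.e. bytes written divided by the wall-clock "real" time, sync included.
effective_mbps() {
    # $1 = bytes written, $2 = elapsed seconds
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

# "real" times reported above for store-1 .. store-4
for secs in 20.749 20.751 20.979 20.067; do
    effective_mbps 8192000000 "$secs"
done
```

By this measure each brick sustains roughly 390 to 410 MB/s, not the 853 to 955 MB/s that dd reports before the sync.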

I then mounted the volume from another node:

############################################################

root@nodo-3:~# iperf -c 172.16.155.21
------------------------------------------------------------
Client connecting to 172.16.155.21, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 172.16.155.13 port 59299 connected with 172.16.155.21 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  7.08 GBytes  6.08 Gbits/sec
root@nodo-3:~# mount -t glusterfs 172.16.155.21:/cinder-disperse /mnt/cinder-disperse
root@nodo-3:~# time sh -c "dd if=/dev/zero of=/mnt/cinder-disperse/test.tmp bs=4k count=2000000 && sync"
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB) copied, 72.6074 s, 113 MB/s

real 1m13.842s
user 0m0.584s
sys 0m19.868s
root@nodo-3:~#

############################################################
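
For a rough sense of where the limit is, the write to the mounted volume can be turned into an estimate of the traffic the client has to send. This is only a sketch under one assumption: on a disperse volume with 3 data bricks and 1 redundancy brick the client sends about (3 + 1) / 3 times the payload, ignoring protocol overhead. The helper name wire_gbps is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: estimated client network traffic in Gbit/s for a
# write to a disperse volume, assuming payload * (data + redundancy) / data
# bytes go on the wire and ignoring RPC/TCP overhead.
wire_gbps() {
    # $1 = bytes written, $2 = elapsed seconds, $3 = data bricks, $4 = redundancy bricks
    awk -v bytes="$1" -v secs="$2" -v k="$3" -v r="$4" \
        'BEGIN { printf "%.2f\n", bytes / secs * (k + r) / k * 8 / 1000000000 }'
}

# 8192000000 bytes in 73.842 s ("real" time on nodo-3) on the 3 + 1 volume
wire_gbps 8192000000 73.842 3 1
```

Under that assumption the client pushes a little over 1 Gbit/s, well below the 6.08 Gbit/s that iperf measured from the same node, so raw network bandwidth does not look like the limiting factor here.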

Is this the write speed I should expect?
Can I improve it in some way?

Thanks a lot.

M.

2015-04-11 6:19 GMT+02:00 Prasun Gera <prasun.gera at gmail.com>:

> There is something that's not clear in what you are describing. Gluster
> doesn't come into play until you access your data through the glusterfs
> mount. You can even stop your gluster volume and stop the glusterfs daemon
> to confirm that it is not really interfering with your writes to the brick
> in any way. What you are describing sounds like an issue with the way you
> have partitioned your drive or set up the filesystem, which is probably xfs
> in case of glusterfs if you are using defaults. Are you comparing the same
> file system in both your cases?
>
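
One quick way to answer that last question is to compare the filesystem type under the two directories being tested. A minimal sketch, assuming GNU coreutils stat is available; the function name same_fs_type and the example paths are only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: succeed if both paths live on the same filesystem type
# (for example both xfs). Relies on GNU stat's -f (filesystem) mode.
same_fs_type() {
    a=$(stat -f -c %T "$1") || return 2
    b=$(stat -f -c %T "$2") || return 2
    echo "$1: $a"
    echo "$2: $b"
    [ "$a" = "$b" ]
}

# Example: compare the brick directory with the directory used for the
# test without gluster (placeholder paths, adjust to the real ones).
# same_fs_type /bricks/1 /mnt
```

A matching type does not prove the same device or mount options; running `df -T` on both paths also shows the backing device.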
> On Fri, Apr 10, 2015 at 11:45 AM, Punit Dambiwal <hypunit at gmail.com>
> wrote:
>
>> Hi Ben,
>>
>> What I mean is that if I do not use the SSD as a brick, and do not even
>> install glusterfs on the server, I get a throughput of about 300 MB/s. But
>> once I install glusterfs and add this SSD to a glusterfs volume, I get
>> 16 MB/s.
>>
>> On Fri, Apr 10, 2015 at 9:32 PM, Ben Turner <bturner at redhat.com> wrote:
>>
>>> ----- Original Message -----
>>> > From: "Punit Dambiwal" <hypunit at gmail.com>
>>> > To: "Ben Turner" <bturner at redhat.com>
>>> > Cc: "Vijay Bellur" <vbellur at redhat.com>, gluster-users at gluster.org
>>> > Sent: Thursday, April 9, 2015 9:36:59 PM
>>> > Subject: Re: [Gluster-users] Glusterfs performance tweaks
>>> >
>>> > Hi Ben,
>>> >
>>> > But without glusterfs, if I run the same command with dsync on the same
>>> > SSD, I get good throughput. The whole setup (CPU, RAM, network) is the
>>> > same; the only difference is that there is no glusterfs.
>>> >
>>> > [root@cpu09 mnt]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
>>> > 4096+0 records in
>>> > 4096+0 records out
>>> > 268435456 bytes (268 MB) copied, 0.935646 s, 287 MB/s
>>> > [root@cpu09 mnt]#
>>> >
>>> > But on top of glusterfs the performance is far too slow. I run an SSD
>>> > trim every night to help garbage collection. I think something needs to
>>> > be done on the gluster or OS side to improve the performance; otherwise
>>> > there is no point in using all SSDs with gluster, because with all SSDs
>>> > you get performance slower than SATA.
>>> >
>>> >
>>> >
>>> > On Fri, Apr 10, 2015 at 2:12 AM, Ben Turner <bturner at redhat.com> wrote:
>>> >
>>> > > ----- Original Message -----
>>> > > > From: "Punit Dambiwal" <hypunit at gmail.com>
>>> > > > To: "Vijay Bellur" <vbellur at redhat.com>
>>> > > > Cc: gluster-users at gluster.org
>>> > > > Sent: Wednesday, April 8, 2015 9:55:38 PM
>>> > > > Subject: Re: [Gluster-users] Glusterfs performance tweaks
>>> > > >
>>> > > > Hi Vijay,
>>> > > >
>>> > > > If i run the same command directly on the brick...
>>>
>>> What does this mean then?  Running directly on the brick to me means
>>> running directly on the SSD.  The command below is the same thing as above,
>>> what changed?
>>>
>>> -b
>>>
>>> > > >
>>> > > > [root@cpu01 1]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
>>> > > > 4096+0 records in
>>> > > > 4096+0 records out
>>> > > > 268435456 bytes (268 MB) copied, 16.8022 s, 16.0 MB/s
>>> > > > [root@cpu01 1]# pwd
>>> > > > /bricks/1
>>> > > > [root@cpu01 1]#
>>> > > >
>>> > >
>>> > > This is your problem.  Gluster is only as fast as its slowest piece, and
>>> > > here your storage is the bottleneck.  Given that you get 16 MB/s to the
>>> > > brick and 12 MB/s through gluster, that works out to about 25% overhead,
>>> > > which is what I would expect with a single thread, single brick, single
>>> > > client scenario.  This may have something to do with the way SSDs write?
>>> > > On my SSD at my desk I only get 11.4 MB/s when I run that dd command:
>>> > >
>>> > > # dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
>>> > > 4096+0 records in
>>> > > 4096+0 records out
>>> > > 268435456 bytes (268 MB) copied, 23.065 s, 11.4 MB/s
>>> > >
>>> > > My thought is that maybe using dsync is forcing the SSD to clean the
>>> > > data or something else before writing to it:
>>> > >
>>> > > http://www.blog.solidstatediskshop.com/2012/how-does-an-ssd-write/
>>> > >
>>> > > Do your drives support fstrim?  It may be worth it to trim before you
>>> > > run and see what results you get.  Other than tuning the SSD / OS to
>>> > > perform better on the back end there isn't much we can do from the
>>> > > gluster perspective on that specific dd with the dsync flag.
>>> > >
>>> > > -b
>>> > >
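
The numbers in this exchange can be put side by side with a little arithmetic. With oflag=dsync each 64k write has to reach stable storage before the next one starts, so the total time divided by the number of writes gives an approximate per-write latency. The sketch below uses the figures quoted in this thread; the helper names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: average time per write in milliseconds.
per_write_ms() {
    # $1 = total seconds, $2 = number of writes (dd count)
    awk -v secs="$1" -v n="$2" 'BEGIN { printf "%.2f\n", secs / n * 1000 }'
}

# Hypothetical helper: percentage of throughput lost going from a baseline
# rate to a slower observed rate (both in the same unit).
overhead_pct() {
    # $1 = baseline rate, $2 = observed rate
    awk -v base="$1" -v obs="$2" 'BEGIN { printf "%.0f\n", (1 - obs / base) * 100 }'
}

per_write_ms 0.935646 4096   # cpu09, SSD without gluster
per_write_ms 16.8022 4096    # cpu01, in /bricks/1
overhead_pct 16 12           # brick vs. gluster rates quoted above
```

The last line reproduces the 25% overhead figure; the first two show the per-write latency going from a fraction of a millisecond to about 4 ms for the same dd command.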
>>> > > >
>>> > > > On Wed, Apr 8, 2015 at 6:44 PM, Vijay Bellur <vbellur at redhat.com> wrote:
>>> > > >
>>> > > >
>>> > > >
>>> > > > On 04/08/2015 02:57 PM, Punit Dambiwal wrote:
>>> > > >
>>> > > >
>>> > > >
>>> > > > Hi,
>>> > > >
>>> > > > I am getting very slow throughput with glusterfs (dead slow, even SATA
>>> > > > is better), although I am using all SSDs in my environment.
>>> > > >
>>> > > > I have the following setup:
>>> > > > A. 4 host machines with CentOS 7 (GlusterFS 3.6.2 | Distributed
>>> > > > Replicated | replica=2)
>>> > > > B. Each server has 24 SSDs as bricks (without HW RAID | JBOD)
>>> > > > C. Each server has 2 additional SSDs for the OS
>>> > > > D. Network 2*10G with bonding (2*E5 CPU and 64GB RAM)
>>> > > >
>>> > > > Note: performance/throughput is slower than normal SATA 7200 RPM, even
>>> > > > though I am using all SSDs in my environment.
>>> > > >
>>> > > > Gluster Volume options :-
>>> > > >
>>> > > > +++++++++++++++
>>> > > > Options Reconfigured:
>>> > > > performance.nfs.write-behind-window-size: 1024MB
>>> > > > performance.io-thread-count: 32
>>> > > > performance.cache-size: 1024MB
>>> > > > cluster.quorum-type: auto
>>> > > > cluster.server-quorum-type: server
>>> > > > diagnostics.count-fop-hits: on
>>> > > > diagnostics.latency-measurement: on
>>> > > > nfs.disable: on
>>> > > > user.cifs: enable
>>> > > > auth.allow: *
>>> > > > performance.quick-read: off
>>> > > > performance.read-ahead: off
>>> > > > performance.io-cache: off
>>> > > > performance.stat-prefetch: off
>>> > > > cluster.eager-lock: enable
>>> > > > network.remote-dio: enable
>>> > > > storage.owner-uid: 36
>>> > > > storage.owner-gid: 36
>>> > > > server.allow-insecure: on
>>> > > > network.ping-timeout: 0
>>> > > > diagnostics.brick-log-level: INFO
>>> > > > +++++++++++++++++++
>>> > > >
>>> > > > Test with SATA and Glusterfs SSD….
>>> > > > ———————
>>> > > > Dell EQL (SATA disk 7200 RPM)
>>> > > > —-
>>> > > > [root@mirror ~]#
>>> > > > 4096+0 records in
>>> > > > 4096+0 records out
>>> > > > 268435456 bytes (268 MB) copied, 20.7763 s, 12.9 MB/s
>>> > > > [root@mirror ~]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
>>> > > > 4096+0 records in
>>> > > > 4096+0 records out
>>> > > > 268435456 bytes (268 MB) copied, 23.5947 s, 11.4 MB/s
>>> > > >
>>> > > > GlusterFS SSD
>>> > > > —
>>> > > > [root@sv-VPN1 ~]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
>>> > > > 4096+0 records in
>>> > > > 4096+0 records out
>>> > > > 268435456 bytes (268 MB) copied, 66.2572 s, 4.1 MB/s
>>> > > > [root@sv-VPN1 ~]# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
>>> > > > 4096+0 records in
>>> > > > 4096+0 records out
>>> > > > 268435456 bytes (268 MB) copied, 62.6922 s, 4.3 MB/s
>>> > > > ————————
>>> > > >
>>> > > > Please let me know what I should do to improve the performance of my
>>> > > > glusterfs.
>>> > > >
>>> > > >
>>> > > > What is the throughput that you get when you run these commands on the
>>> > > > disks directly, without gluster in the picture?
>>> > > >
>>> > > > By running dd with dsync you are ensuring that there is no buffering
>>> > > > anywhere in the stack, and that is the reason why low throughput is
>>> > > > being observed.
>>> > > >
>>> > > > -Vijay
>>> > > >
>>> > > >
>>> > > >
>>> > > > _______________________________________________
>>> > > > Gluster-users mailing list
>>> > > > Gluster-users at gluster.org
>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users
>>> > >
>>> >
>>>
>>
>