[Gluster-devel] [Gluster-users] Regarding the write performance in replica 1 volume in 1Gbps Ethernet, get about 50MB/s while writing single file.

Fri Sep 5 09:35:26 UTC 2014

Hi Jaden,

Sorry, from your subject I misunderstood your setup.

In a pure distributed volume, each file is stored on exactly one brick.
That brick is chosen by the elastic hash algorithm.
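
To illustrate the idea, here is a much simplified sketch. It is not GlusterFS's
actual code: the use of MD5, the equal-sized hash ranges and the name
pick_brick are assumptions made only for this example.

```python
import hashlib
from typing import Sequence


def pick_brick(filename: str, bricks: Sequence[str]) -> str:
    """Map a file name to exactly one brick by hashing the name.

    Illustration only: MD5 and equal-sized ranges are stand-ins,
    not the hash or layout that GlusterFS really uses.
    """
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    hash32 = int.from_bytes(digest[:4], "big")  # 32-bit hash value
    span = 2**32 // len(bricks)                 # size of each brick's hash range
    index = min(hash32 // span, len(bricks) - 1)
    return bricks[index]


if __name__ == "__main__":
    bricks = ["brick%d" % i for i in range(1, 11)]  # hypothetical brick names
    for name in ("vm-disk-1.img", "vm-disk-2.img", "backup.tar"):
        print(name, "->", pick_brick(name, bricks))
```

The point is that one file name always lands on one brick, so a single-file
write can never be spread over several disks.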

If you get near wire speed (1Gbps, or about 120MBps) when writing several files
at once but only roughly half that when writing a single file, maybe each brick
is limiting the write speed: a "green" SATA disk spinning at 5400rpm reaches at
most about 75MBps when writing big files sequentially (an enterprise SATA disk
spinning at 7200rpm reaches around 115MBps).
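
One way to compare this with the latency figures quoted below is to convert a
mean per-request latency into a throughput. This is only a rough sketch: it
assumes a single 128KB write is outstanding at a time, and the function name
is made up for this example.

```python
def sync_write_throughput_mb_s(request_bytes: int, latency_us: float) -> float:
    """Throughput in MB/s when each request must finish before the next starts.

    Bytes per microsecond is numerically the same as MB/s (10^6 bytes per second).
    """
    return request_bytes / latency_us


REQUEST_BYTES = 128 * 1024  # 128KB per write, as in the quoted stats

# Mean WRITE latencies taken from the quoted client-side and server-side stats.
client_mb_s = sync_write_throughput_mb_s(REQUEST_BYTES, 2665.328491)
server_mb_s = sync_write_throughput_mb_s(REQUEST_BYTES, 751.569458)

if __name__ == "__main__":
    print("client-side limit: %.1f MB/s" % client_mb_s)
    print("server-side limit: %.1f MB/s" % server_mb_s)
```

With those figures the client-side value comes out at about 49MB/s, while the
server-side write time on its own would allow roughly 174MB/s, always under the
assumption of one request in flight.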

Could you please tell me what type of bricks you have on each server node?

I'll try to emulate your setup and test it.

Thank you!

On 04/09/14 at 03:20, Jaden Liang wrote:
> Hi Ramon,
>
> I am running on gluster FUSE client.
>
> I may not have described my testing environment clearly. Let me explain. The
> volume is configured on 2 servers. There is no replication at all; it is just
> a distributed volume. So I don't think it is a replicated-data issue.
> Actually, we can reach 100MB/s when writing multiple files at the same time.
>
> Here is the volume info:
>
> # gluster volume info vs_vol_rep1
> Volume Name: vs_vol_rep1
> Type: Distribute
> Volume ID: cd137b57-e98a-4755-939a-7fc578f2a8c0
> Status: Started
> Number of Bricks: 10
> Transport-type: tcp
> Bricks:
> Brick1: host-001e67a3486c:/sf/data/vs/local/dfb0edaa-cfcb-4536-b5cb-a89aabaf8b4d/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick2: host-001e67a3486c:/sf/data/vs/local/ac752388-1c2d-43a2-9396-7bedaf9abce2/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick3: host-001e67a3486c:/sf/data/vs/local/6ef6c20e-ed59-4f3c-a354-a47caf11bbb0/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick4: host-001e67a3486c:/sf/data/vs/local/4fa375da-265f-4436-8385-6af949581e16/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick5: host-001e67a3486c:/sf/data/vs/local/184f174a-c5ee-45e8-8cbc-20ae518ad7b1/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick6: host-001e67a3486c:/sf/data/vs/local/0a20eb9a-bba4-4cfd-be8f-542eac7a1f98/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick7: host-001e67a3486c:/sf/data/vs/local/03648144-fec1-4471-9aa7-45fc2123867a/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick8: host-001e67a349d4:/sf/data/vs/local/e7de2d40-6ebd-4867-b2a6-c19c669ecc83/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick9: host-001e67a349d4:/sf/data/vs/local/896da577-cd03-42a0-8f5c-469759dd7f7b/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick10: host-001e67a349d4:/sf/data/vs/local/6f274934-7e8b-4145-9f3a-bab549e2a95d/49ea070f-1480-4838-8182-95d1a6f17d81
> Options Reconfigured:
> diagnostics.latency-measurement: on
> nfs.disable: on
>
> On Wednesday, September 3, 2014, Ramon Selga <ramon.selga at gmail.com> wrote:
>
>     Hi Jaden,
>
>     May I ask for some more information about your setup?
>
>     Are you using the NFS client or the gluster FUSE client?
>
>     If you are using the NFS client, write data goes to one node of the
>     replica pair, and that node sends the replica data to the other node. If
>     you are using one switch for client and server connections and one 1GbE
>     port on each device, data received by the first node is re-sent to the
>     other node simultaneously and, in theory, you may reach speeds closer to
>     100MBps.
>
>     With the gluster FUSE client, write data goes simultaneously to both
>     server nodes, each stream using half the bandwidth of the client's 1GbE
>     port, because replication is done on the client side. That results in a
>     write speed of around 50MBps (<60MBps).
>
>     I hope this helps.
>
>     On 03/09/14 at 07:02, Jaden Liang wrote:
>>     Hi all,
>>
>>     We did some more tests and analysis yesterday. It looks like 50MB/s is
>>     the theoretical top speed for a replica 1 volume over a 1Gbps network.
>>     GlusterFS writes one 128KB block at a time, then waits for the reply.
>>     Sending 128KB takes about 1ms on a 1Gbps network. On the server side, it
>>     took about 800us to 1000us to write 128KB to the HDD and return, plus
>>     another 100us to 200us of other elapsed time. GlusterFS therefore takes
>>     about 2ms-2.2ms to finish writing one 128KB block, which is about 50MB/s.
>>
>>     The question is: why doesn't glusterfs pipeline writes and reads to
>>     speed up this chatty process?
>>
>>     On Tuesday, September 2, 2014, Jaden Liang <jaden1q84 at gmail.com> wrote:
>>
>>         Hello, gluster-devel and gluster-users team,
>>
>>         We are running a performance test on a replica 1 volume and found
>>         that single-file sequential write performance only reaches about
>>         50MB/s on 1Gbps Ethernet. However, if we test sequential writes to
>>         multiple files, the write performance can go up to 120MB/s, which is
>>         the top speed of the network.
>>
>>         We also tried to use the stat xlator to find out where the bottleneck
>>         in single-file write performance is. Here is the stat data:
>>
>>         Client-side:
>>         ......
>>         vs_vol_rep1-client-8.latency.WRITE=total:21834371.000000us,
>>         mean:2665.328491us, count:8192, max:4063475, min:1849
>>         ......
>>
>>         Server-side:
>>         ......
>>         /data/sdb1/brick1.latency.WRITE=total:6156857.000000us,
>>         mean:751.569458us, count:8192, max:230864, min:611
>>         ......
>>
>>         Note that the test writes a single 1GB file sequentially to a
>>         replica 1 volume over a 1Gbps Ethernet network.
>>
>>         On the client side, we can see there are 8192 write requests in total.
>>         Every request writes 128KB of data. The total elapsed time is
>>         21834371us, about 21 seconds. The mean time per request is 2665us,
>>         about 2.6ms, which means it can only serve about 380 requests per
>>         second. There are also other time-consuming operations such as statfs
>>         and lookup, but those are not the major cause.
>>
>>         On the server side, the mean time per request is 751us, including
>>         writing the data to the HDD. So we think that is not the major cause.
>>
>>         We also modified some code to measure the time spent in the system
>>         epoll path. It only took about 20us from enqueueing the data to
>>         finishing sending it out.
>>
>>         Now we are looking into the RPC mechanism in glusterfs. Still, we
>>         think the gluster-devel or gluster-users teams may have encountered
>>         this issue before. Therefore, any suggestions would be appreciated.
>>         Does anyone know of such an issue?
>>
>>         Best regards,
>>         Jaden Liang
>>         9/2/2014
>>
>>
>>
>>
>>
>>     _______________________________________________
>>     Gluster-devel mailing list
>>     Gluster-devel at gluster.org
>>     http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>
