[Gluster-users] Slow write times to gluster disk

Ravishankar N ravishankar at redhat.com
Tue Apr 11 04:21:21 UTC 2017


On 04/11/2017 12:42 AM, Pat Haley wrote:
>
> Hi Ravi,
>
> Thanks for the reply.  And yes, we are using the gluster native (fuse) 
> mount.  Since this is not my area of expertise I have a few questions 
> (mostly clarifications)
>
> Is a factor-of-20 slow-down typical when comparing a fuse-mounted 
> filesystem to an NFS-mounted filesystem, or should we also be 
> looking for additional issues?  (Note that the first dd test described 
> below was run on the server that hosts the filesystems, so no network 
> communication was involved.)

Though both the gluster bricks and the mounts are on the same physical 
machine in your setup, the I/O still passes through the kernel/user-space 
fuse stack, although I don't know whether a 20x slow-down of a gluster 
fuse mount versus an NFS share is normal. Why don't you do a gluster NFS 
mount on the machine, run the same dd test, and compare it with the 
gluster fuse mount results?
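
If it helps, here is a rough sketch of that comparison (the volume name 
and mount point below are placeholders for whatever you actually use; 
gnfs serves NFSv3, so force vers=3 on the client):

    # mount the volume via gluster NFS (gnfs) instead of fuse
    mount -t nfs -o vers=3,nolock mseas-data2:/<volname> /mnt/gnfs-test

    # same dd test, but through the gnfs mount
    dd if=/dev/zero of=/mnt/gnfs-test/zero-gnfs bs=1M count=1000

    umount /mnt/gnfs-test

If the gnfs numbers come out close to your plain NFS numbers, that 
points at the fuse stack rather than at the bricks.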

>
> You also mention tweaking "write-behind xlator settings".  Would you 
> expect better speed improvements from switching the mount from fuse 
> to gnfs or from tweaking the settings?  Also, are these mutually 
> exclusive, or would there be additional benefits from both switching 
> to gnfs and tweaking?
You should test these out and find the answers yourself. :-)
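
For reference, the write-behind knobs are set per volume; a minimal 
sketch (the volume name and the 4MB window size are only placeholders -- 
see `gluster volume set help` for the defaults and valid ranges):

    # make sure the write-behind translator is enabled (it is by default)
    gluster volume set <volname> performance.write-behind on

    # enlarge the per-file write-behind buffer (the default is 1MB)
    gluster volume set <volname> performance.write-behind-window-size 4MB

Then rerun the same dd test on the fuse mount and see whether the 
numbers move.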

>
> My next question is to make sure I'm clear on the comment "if the 
> gluster node containing the gnfs server goes down, all mounts done 
> using that node will fail".  If you have 2 servers, each contributing 
> one brick to the overall gluster volume, and one server fails, then 
> with gnfs nothing on either server is visible to other nodes, while 
> with fuse only the files on the dead server are not visible.  Is this 
> what you meant?
Yes. For gnfs mounts, all I/O from the various mounts goes to the gnfs 
server process (on the machine whose IP was used at the time of 
mounting), which then sends the I/O to the brick processes. For fuse, 
the gluster fuse mount itself talks directly to the bricks.
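
In mount-command terms, the two paths look roughly like this (hostnames 
and the volume name are placeholders; for fuse, the server named in the 
mount command is only used to fetch the volume file, after which the 
client talks to every brick directly):

    # fuse (native) mount: the client connects to all bricks itself
    mount -t glusterfs -o backup-volfile-servers=server2 server1:/<volname> /gdata

    # gnfs mount: all I/O funnels through the gnfs server on server1
    mount -t nfs -o vers=3 server1:/<volname> /gdata
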
>
> Finally, you mention "even for gnfs mounts, you can achieve fail-over 
> by using CTDB".  Do you know if CTDB would have any performance impact 
> (i.e. in a worst-case scenario, could adding CTDB to gnfs erase the 
> speed benefits of going to gnfs in the first place)?
I don't think it would. You can even achieve load balancing via CTDB by 
using different gnfs servers for different clients. But I don't know if 
this is needed or helpful in your current setup, where everything 
(bricks and clients) seems to be on just one server.
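
For what it's worth, the CTDB side is mostly two small config files plus 
a lock file on shared storage; a minimal sketch (all IPs and the 
interface name below are placeholders) would be:

    /etc/ctdb/nodes (one internal IP per gluster node):
        10.0.0.1
        10.0.0.2

    /etc/ctdb/public_addresses (floating IPs that CTDB moves between nodes):
        10.0.1.100/24 eth0
        10.0.1.101/24 eth0

Clients mount the gnfs share using the floating IPs, so when a node dies 
its address (and the mounts behind it) migrate to a surviving node.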

-Ravi
> Thanks
>
> Pat
>
>
> On 04/08/2017 12:58 AM, Ravishankar N wrote:
>> Hi Pat,
>>
>> I'm assuming you are using the gluster native (fuse) mount. If it 
>> helps, you could try mounting it via gluster NFS (gnfs) and then see 
>> if there is an improvement in speed. Fuse mounts are slower than gnfs 
>> mounts, but you get the benefit of avoiding a single point of failure 
>> (unlike fuse mounts, if the gluster node containing the gnfs server 
>> goes down, all mounts done using that node will fail). For fuse 
>> mounts, you could try tweaking the write-behind xlator settings to 
>> see if they help. See the performance.write-behind and 
>> performance.write-behind-window-size options in `gluster volume set 
>> help`. Of course, even for gnfs mounts, you can achieve fail-over by 
>> using CTDB.
>>
>> Thanks,
>> Ravi
>>
>> On 04/08/2017 12:07 AM, Pat Haley wrote:
>>>
>>> Hi,
>>>
>>> We noticed a dramatic slowness when writing to a gluster disk 
>>> compared to writing to an NFS disk. Specifically, when using dd 
>>> (data duplicator) to write a 4.3 GB file of zeros:
>>>
>>>   * on NFS disk (/home): 9.5 Gb/s
>>>   * on gluster disk (/gdata): 508 Mb/s
>>>
>>> The gluster disk is 2 bricks joined together, no replication or 
>>> anything else. The hardware is (literally) the same:
>>>
>>>   * one server with 70 hard disks and a hardware RAID card
>>>   * 4 disks in a RAID-6 group (the NFS disk)
>>>   * 32 disks in a RAID-6 group (the max allowed by the card,
>>>     /mnt/brick1)
>>>   * 32 disks in another RAID-6 group (/mnt/brick2)
>>>   * 2 hot spares
>>>
>>> Some additional information and more tests results (after changing 
>>> the log level):
>>>
>>> glusterfs 3.7.11 built on Apr 27 2016 14:09:22
>>> CentOS release 6.8 (Final)
>>> RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 
>>> [Invader] (rev 02)
>>>
>>>
>>>
>>> *Create the file to /gdata (gluster)*
>>> [root at mseas-data2 gdata]# dd if=/dev/zero of=/gdata/zero1 bs=1M 
>>> count=1000
>>> 1000+0 records in
>>> 1000+0 records out
>>> 1048576000 bytes (1.0 GB) copied, 1.91876 s, *546 MB/s*
>>>
>>> *Create the file to /home (ext4)*
>>> [root at mseas-data2 gdata]# dd if=/dev/zero of=/home/zero1 bs=1M 
>>> count=1000
>>> 1000+0 records in
>>> 1000+0 records out
>>> 1048576000 bytes (1.0 GB) copied, 0.686021 s, *1.5 GB/s* - 3 times 
>>> as fast
>>>
>>>
>>> *Copy from /gdata to /gdata (gluster to gluster)*
>>> [root at mseas-data2 gdata]# dd if=/gdata/zero1 of=/gdata/zero2
>>> 2048000+0 records in
>>> 2048000+0 records out
>>> 1048576000 bytes (1.0 GB) copied, 101.052 s, *10.4 MB/s* - realllyyy 
>>> slooowww
>>>
>>>
>>> *Copy from /gdata to /gdata, 2nd time (gluster to gluster)*
>>> [root at mseas-data2 gdata]# dd if=/gdata/zero1 of=/gdata/zero2
>>> 2048000+0 records in
>>> 2048000+0 records out
>>> 1048576000 bytes (1.0 GB) copied, 92.4904 s, *11.3 MB/s* - realllyyy 
>>> slooowww again
>>>
>>>
>>>
>>> *Copy from /home to /home (ext4 to ext4)*
>>> [root at mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero2
>>> 2048000+0 records in
>>> 2048000+0 records out
>>> 1048576000 bytes (1.0 GB) copied, 3.53263 s, *297 MB/s* - 30 times as fast
>>>
>>>
>>> *Copy from /home to /home, 2nd time (ext4 to ext4)*
>>> [root at mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero3
>>> 2048000+0 records in
>>> 2048000+0 records out
>>> 1048576000 bytes (1.0 GB) copied, 4.1737 s, *251 MB/s* - 30 times as 
>>> fast
>>>
>>>
>>> As a test, can we copy data directly to the xfs mountpoint 
>>> (/mnt/brick1) and bypass gluster?
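>>>
>>> For example, something along these lines (writing a scratch file 
>>> directly on the brick filesystem and removing it right afterwards):
>>>
>>>     dd if=/dev/zero of=/mnt/brick1/raw-write-test bs=1M count=1000
>>>     rm /mnt/brick1/raw-write-test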
>>>
>>>
>>> Any help you could give us would be appreciated.
>>>
>>> Thanks
>>>
>>> -- 
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley                          Email:  phaley at mit.edu
>>> Center for Ocean Engineering       Phone:  (617) 253-6824
>>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA  02139-4301
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>
> -- 
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley                          Email:  phaley at mit.edu
> Center for Ocean Engineering       Phone:  (617) 253-6824
> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA  02139-4301

