[Gluster-users] Slow write times to gluster disk

Pat Haley phaley at mit.edu
Fri May 12 14:34:04 UTC 2017


Hi Pranith,

My question was about setting up a gluster volume on an ext4 partition.  
I thought we had the bricks mounted as xfs for compatibility with gluster?

Pat


On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>
>
> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <phaley at mit.edu> wrote:
>
>
>     Hi Pranith,
>
>     The /home partition is mounted as ext4
>     /home              ext4 defaults,usrquota,grpquota      1 2
>
>     The brick partitions are mounted as xfs
>     /mnt/brick1  xfs defaults        0 0
>     /mnt/brick2  xfs defaults        0 0
>
>     Will this cause a problem with creating a volume under /home?
>
>
> I don't think the disk is the bottleneck. Can you repeat the tests you
> ran earlier on the new volume to confirm?
>
>
>     Pat
>
>
>
>     On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>
>>
>>     On Thu, May 11, 2017 at 8:57 PM, Pat Haley <phaley at mit.edu> wrote:
>>
>>
>>         Hi Pranith,
>>
>>         Unfortunately, we don't have similar hardware for a small
>>         scale test.  All we have is our production hardware.
>>
>>
>>     You mentioned a /home partition that has fewer disks. We can
>>     create a plain distribute volume inside one of the directories
>>     there and remove the setup after we are done. What do you say?
>>
>>
>>         Pat
>>
>>
>>
>>
>>         On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>         On Thu, May 11, 2017 at 2:48 AM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>>             Hi Pranith,
>>>
>>>             Since we are mounting the partitions as the bricks, I
>>>             tried the dd test writing to
>>>             <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>             The results without oflag=sync were 1.6 GB/s (faster
>>>             than gluster but not as fast as I was expecting given
>>>             the 1.2 GB/s to the no-gluster area w/ fewer disks).
>>>
>>>
>>>         Okay, then 1.6 GB/s is what we need to target,
>>>         considering your volume is just distribute. Is there any way
>>>         you can run tests on similar hardware but at a small scale,
>>>         so we can run the workload and learn more about the
>>>         bottlenecks in the system? We can probably try to reach
>>>         1.2 GB/s on the /home partition you were telling me about
>>>         yesterday. Let me know if that is something you are okay to do.
>>>
>>>
>>>             Pat
>>>
>>>
>>>
>>>             On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>             On Wed, May 10, 2017 at 10:15 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>
>>>>
>>>>                 Hi Pranith,
>>>>
>>>>                 Not entirely sure (this isn't my area of
>>>>                 expertise). I'll run your answer by some other
>>>>                 people who are more familiar with this.
>>>>
>>>>                 I am also uncertain about how to interpret the
>>>>                 results when we also add the dd tests writing to
>>>>                 the /home area (no gluster, still on the same machine)
>>>>
>>>>                   * dd test without oflag=sync (rough average of
>>>>                     multiple tests)
>>>>                       o gluster w/ fuse mount: 570 MB/s
>>>>                       o gluster w/ nfs mount: 390 MB/s
>>>>                       o nfs (no gluster): 1.2 GB/s
>>>>                   * dd test with oflag=sync (rough average of
>>>>                     multiple tests)
>>>>                       o gluster w/ fuse mount: 5 MB/s
>>>>                       o gluster w/ nfs mount: 200 MB/s
>>>>                       o nfs (no gluster): 20 MB/s
>>>>
>>>>                 Given that the non-gluster area is a RAID-6 of 4
>>>>                 disks while each brick of the gluster area is a
>>>>                 RAID-6 of 32 disks, I would naively expect the
>>>>                 writes to the gluster area to be roughly 8x faster
>>>>                 than to the non-gluster.
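The naive expectation above can be worked through as a rough calculation. This is only a sketch under an idealised assumption (sequential write throughput scaling linearly with disk count, ignoring controller, bus, parity and network limits); the helper name `raid6_data_disks` is hypothetical, and the measured figures are the rough no-sync averages quoted above, taken in the same unit.

```python
def raid6_data_disks(total_disks):
    """RAID-6 spends two disks' worth of capacity on parity."""
    if total_disks < 4:
        raise ValueError("RAID-6 needs at least 4 disks")
    return total_disks - 2


home_disks = 4    # non-gluster area, from the message above
brick_disks = 32  # each gluster brick, from the message above

# Ratio by raw disk count (the "roughly 8x" in the message).
raw_ratio = brick_disks / home_disks

# Ratio by data-bearing disks only (assumption: parity disks add no throughput).
data_ratio = raid6_data_disks(brick_disks) / raid6_data_disks(home_disks)

# Measured no-sync averages from the list above: 570 (gluster fuse) vs 1200 (no gluster).
measured_ratio = 570 / 1200

print(f"expected by raw disks:  {raw_ratio:.1f}x")
print(f"expected by data disks: {data_ratio:.1f}x")
print(f"measured gluster/fuse vs non-gluster: {measured_ratio:.3f}x")
```

Under either idealised count the gluster area would be expected to be several times faster, while the measured fuse figure is less than half of the non-gluster one, which is the gap in question.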
>>>>
>>>>
>>>>             I think a better test is to write to a file over nfs,
>>>>             without any gluster, at a location that is not inside
>>>>             the brick but is on the same disk(s). If you are mounting
>>>>             the partition as the brick, then we can write to a file
>>>>             inside the .glusterfs directory, something like
>>>>             <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>
>>>>
>>>>                 I still think we have a speed issue, but I can't tell
>>>>                 if fuse vs nfs is part of the problem.
>>>>
>>>>
>>>>             I got interested in the post because I read that fuse
>>>>             speed is lower than nfs speed, which is
>>>>             counter-intuitive to my understanding, so I wanted
>>>>             clarification. Now that it is clear that fuse
>>>>             outperformed nfs without sync, we can resume
>>>>             testing as described above and try to find the cause.
>>>>             Based on your email-id I am guessing you are from
>>>>             Boston, and I am from Bangalore, so if you are okay with
>>>>             doing this debugging over multiple days because of
>>>>             timezones, I will be happy to help. Please be a bit
>>>>             patient with me; I am under a release crunch, but I am
>>>>             very curious about the problem you posted.
>>>>
>>>>                 Was there anything useful in the profiles?
>>>>
>>>>
>>>>             Unfortunately the profiles didn't help me much. I think we
>>>>             are collecting the profiles from an active volume, so
>>>>             they contain a lot of information that does not pertain to
>>>>             dd, which makes it difficult to find the contribution of dd.
>>>>             So I went through your post again and found something I
>>>>             didn't pay much attention to earlier, i.e. oflag=sync.
>>>>             I ran my own tests on my setup with FUSE and then sent that
>>>>             reply.
>>>>
>>>>
>>>>                 Pat
>>>>
>>>>
>>>>
>>>>                 On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>>>                 Okay good. At least this validates my doubts.
>>>>>                 Handling of O_SYNC in gluster NFS and fuse is a bit
>>>>>                 different.
>>>>>                 When an application opens a file with O_SYNC on a fuse
>>>>>                 mount, each write syscall has to be written to
>>>>>                 disk as part of the syscall, whereas in the case of
>>>>>                 NFS there is no concept of open. NFS performs the
>>>>>                 write through a handle saying it needs to be a
>>>>>                 synchronous write, so the write() syscall is performed
>>>>>                 first and then fsync() is performed. So a write on an
>>>>>                 fd with O_SYNC becomes write+fsync. My guess is
>>>>>                 that when multiple threads do this
>>>>>                 write+fsync() operation on the same file, multiple
>>>>>                 writes are batched together to be written to disk,
>>>>>                 so the throughput on the disk increases.
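The two write paths described above can be sketched on a local file as follows. This is a minimal illustration, assuming a POSIX system where Python exposes `os.O_SYNC`; the function names are hypothetical, and it does not reproduce what the gluster fuse or NFS code does internally. It only shows the difference between opening with O_SYNC and doing write() followed by fsync().

```python
import os

BLOCK = b"\0" * 1048576  # 1 MiB, the same block size as the dd tests


def write_all(fd, data):
    """Write the whole buffer, retrying on short writes."""
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]


def write_with_o_sync(path, blocks):
    """Open with O_SYNC: each write() returns only once the data is on stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
    try:
        for _ in range(blocks):
            write_all(fd, BLOCK)
    finally:
        os.close(fd)
    return os.path.getsize(path)


def write_then_fsync(path, blocks):
    """Plain open: each write() is followed by an explicit fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(blocks):
            write_all(fd, BLOCK)
            os.fsync(fd)
    finally:
        os.close(fd)
    return os.path.getsize(path)
```

Timing the two functions against the same target directory (for example with `time.perf_counter()`) gives a rough local comparison, though it says nothing about how several concurrent write+fsync requests might be batched on a server.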
>>>>>
>>>>>                 Does it answer your doubts?
>>>>>
>>>>>                 On Wed, May 10, 2017 at 9:35 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>
>>>>>
>>>>>                     Without the oflag=sync and only a single test
>>>>>                     of each, the FUSE is going faster than NFS:
>>>>>
>>>>>                     FUSE:
>>>>>                     mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>                     4096+0 records in
>>>>>                     4096+0 records out
>>>>>                     4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>
>>>>>
>>>>>                     NFS:
>>>>>                     mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>                     4096+0 records in
>>>>>                     4096+0 records out
>>>>>                     4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
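As a quick check on the two dd results above, the reported rates can be recomputed from the byte counts and elapsed times. This is a small sketch; the helper name `dd_rate_mb_per_s` is hypothetical, and it assumes dd's "MB" means 10^6 bytes, as GNU dd reports.

```python
def dd_rate_mb_per_s(num_bytes, seconds):
    """Throughput in decimal megabytes per second (1 MB = 10**6 bytes)."""
    return num_bytes / seconds / 1e6


total_bytes = 4096 * 1048576  # count * bs from the dd command line

fuse = dd_rate_mb_per_s(total_bytes, 7.46961)
nfs = dd_rate_mb_per_s(total_bytes, 11.4264)

print(total_bytes)             # 4294967296
print(round(fuse), round(nfs)) # 575 376
print(round(fuse / nfs, 2))    # 1.53
```

So in this single no-sync run the fuse mount was about 1.5 times faster than the gluster nfs mount.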
>>>>>
>>>>>
>>>>>
>>>>>                     On 05/10/2017 11:53 AM, Pranith Kumar
>>>>>                     Karampuri wrote:
>>>>>>                     Could you let me know the speed without
>>>>>>                     oflag=sync on both the mounts? No need to
>>>>>>                     collect profiles.
>>>>>>
>>>>>>                     On Wed, May 10, 2017 at 9:17 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>
>>>>>>
>>>>>>                         Here is what I see now:
>>>>>>
>>>>>>                         [root at mseas-data2 ~]# gluster volume info
>>>>>>
>>>>>>                         Volume Name: data-volume
>>>>>>                         Type: Distribute
>>>>>>                         Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>                         Status: Started
>>>>>>                         Number of Bricks: 2
>>>>>>                         Transport-type: tcp
>>>>>>                         Bricks:
>>>>>>                         Brick1: mseas-data2:/mnt/brick1
>>>>>>                         Brick2: mseas-data2:/mnt/brick2
>>>>>>                         Options Reconfigured:
>>>>>>                         diagnostics.count-fop-hits: on
>>>>>>                         diagnostics.latency-measurement: on
>>>>>>                         nfs.exports-auth-enable: on
>>>>>>                         diagnostics.brick-sys-log-level: WARNING
>>>>>>                         performance.readdir-ahead: on
>>>>>>                         nfs.disable: on
>>>>>>                         nfs.export-volumes: off
>>>>>>
>>>>>>
>>>>>>
>>>>>>                         On 05/10/2017 11:44 AM, Pranith Kumar
>>>>>>                         Karampuri wrote:
>>>>>>>                         Is this the volume info you have?
>>>>>>>
>>>>>>>                         > [root at mseas-data2 ~]# gluster volume info
>>>>>>>                         >
>>>>>>>                         > Volume Name: data-volume
>>>>>>>                         > Type: Distribute
>>>>>>>                         > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>                         > Status: Started
>>>>>>>                         > Number of Bricks: 2
>>>>>>>                         > Transport-type: tcp
>>>>>>>                         > Bricks:
>>>>>>>                         > Brick1: mseas-data2:/mnt/brick1
>>>>>>>                         > Brick2: mseas-data2:/mnt/brick2
>>>>>>>                         > Options Reconfigured:
>>>>>>>                         > performance.readdir-ahead: on
>>>>>>>                         > nfs.disable: on
>>>>>>>                         > nfs.export-volumes: off
>>>>>>>
>>>>>>>                         I copied this from an old thread from
>>>>>>>                         2016. This is a distribute volume. Did you
>>>>>>>                         change any of the options in between?
>>>>>>
>>>>>>                         -- 
>>>>>>
>>>>>>                         -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>                         Pat Haley                          Email:  phaley at mit.edu
>>>>>>                         Center for Ocean Engineering       Phone:  (617) 253-6824
>>>>>>                         Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>>>>>>                         MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>>>>>>                         77 Massachusetts Avenue
>>>>>>                         Cambridge, MA  02139-4301
>>>>>>
>>>>>>                     -- 
>>>>>>                     Pranith
-- 

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301