[Gluster-users] Slow write times to gluster disk

Pat Haley phaley at mit.edu
Tue May 30 17:06:51 UTC 2017


Hi Pranith,

I ran the same 'dd' test both in the gluster test volume and in the
.glusterfs directory of each brick.  The median results (12 dd trials in
each test) are similar to before:

  * gluster test volume: 586.5 MB/s
  * bricks (in .glusterfs): 1.4 GB/s
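
For anyone who wants to repeat the measurement, a median over repeated dd runs can be obtained with something like the sketch below. The function names, the scratch file name and the use of GNU date's %N for timing are illustrative assumptions, not part of the original test, and the throughput is computed from wall-clock time rather than parsed from dd's own report.

```shell
# Median of the numbers read from stdin, one per line.
median() {
    sort -n | awk '{ v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) print v[(NR + 1) / 2]
            else print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# Run one buffered dd trial and print its throughput in MB/s (10^6 bytes/s).
# $1 = directory to write into, $2 = number of 1 MiB blocks.
dd_trial() {
    out="$1/ddtest.$$"                # hypothetical scratch file name
    start=$(date +%s.%N)              # %N (nanoseconds) needs GNU date
    dd if=/dev/zero of="$out" bs=1048576 count="$2" 2>/dev/null
    end=$(date +%s.%N)
    rm -f "$out"
    awk -v s="$start" -v e="$end" -v n="$2" 'BEGIN {
        if (e - s <= 0) exit 1        # timer unusable, report nothing
        printf "%.1f\n", n * 1048576 / (e - s) / 1e6
    }'
}
```

For example, `for i in $(seq 12); do dd_trial /path/to/dir 4096; done | median` would print the median of 12 trials of 4096 MiB each, where /path/to/dir stands for either the gluster mount or a brick's .glusterfs directory.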

The profile for the gluster test-volume is in

http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
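
A profile restricted to a single workload can be collected roughly as sketched below, using the `gluster volume profile <VOLNAME> start|info|stop` commands. The function name and the GLUSTER override variable are assumptions made for illustration; only the profile subcommands themselves come from the gluster CLI.

```shell
# GLUSTER can be overridden (for example GLUSTER=echo) to preview the commands.
GLUSTER=${GLUSTER:-gluster}

# Profile one workload against a volume: start profiling, run the workload,
# print the collected statistics to stdout, then stop profiling.
# $1 = volume name, remaining arguments = the command to run.
profile_workload() {
    vol=$1; shift
    "$GLUSTER" volume profile "$vol" start >&2 || return 1
    "$@" >&2                          # workload output goes to stderr
    "$GLUSTER" volume profile "$vol" info
    "$GLUSTER" volume profile "$vol" stop >&2
}
```

A call such as `profile_workload test-volume dd if=/dev/zero of=<mount-point>/zeros.txt bs=1048576 count=4096 > profile.txt` would then capture only what accumulated while dd was running, provided nothing else is using the volume at the time.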

Thanks

Pat



On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
> Let's start with the same 'dd' test we were using, to see what the
> numbers are. Please provide profile numbers for that test as well. From
> there we will start tuning the volume to see what we can do.
>
> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley at mit.edu> wrote:
>
>
>     Hi Pranith,
>
>     Thanks for the tip.  We now have the gluster volume mounted under
>     /home.  What tests do you recommend we run?
>
>     Thanks
>
>     Pat
>
>
>
>     On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>
>>
>>     On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley at mit.edu> wrote:
>>
>>
>>         Hi Pranith,
>>
>>         Sorry for the delay.  I never received your reply (but I
>>         did receive Ben Turner's follow-up to your reply).  So we
>>         tried to create a gluster volume under /home using different
>>         variations of
>>
>>         gluster volume create test-volume
>>         mseas-data2:/home/gbrick_test_1
>>         mseas-data2:/home/gbrick_test_2 transport tcp
>>
>>         However we keep getting errors of the form
>>
>>         Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>
>>
>>         Any thoughts on what we're doing wrong?
>>
>>
>>     I think transport tcp has to come before the brick list. In any
>>     case, tcp is the default transport, so there is no need to
>>     specify it; just remove those two words from the command.
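
As a concrete illustration of the advice above, here is a minimal sketch that creates and starts the test volume without the transport keyword. The function name and the GLUSTER override variable are hypothetical; the volume name and brick paths are the ones quoted in this thread.

```shell
# GLUSTER can be overridden (for example GLUSTER=echo) to preview the commands.
GLUSTER=${GLUSTER:-gluster}

# Create and start a plain distribute volume.
# $1 = volume name, remaining arguments = bricks as HOST:/absolute/path.
# "transport tcp" is left out because tcp is the default. If it is given,
# it goes before the brick list:
#   gluster volume create NAME transport tcp HOST:/path ...
create_test_volume() {
    name=$1; shift
    "$GLUSTER" volume create "$name" "$@" &&
    "$GLUSTER" volume start "$name"
}
```

With the bricks from the message above this would be called as `create_test_volume test-volume mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2`.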
>>
>>
>>         Also, do you have a list of the tests we should be running once
>>         we get this volume created?  Given the time-zone difference,
>>         it might help if we can run a small battery of tests and post
>>         the results rather than test, post, new test, post, and so on.
>>
>>
>>     As far as I remember, this is the first time I am doing
>>     performance analysis for users. In our team there are separate
>>     engineers who do these tests. Ben, who replied earlier, is one
>>     such engineer.
>>
>>     Ben,
>>         Have any suggestions?
>>
>>
>>         Thanks
>>
>>         Pat
>>
>>
>>
>>         On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>         On Thu, May 11, 2017 at 9:32 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>>             Hi Pranith,
>>>
>>>             The /home partition is mounted as ext4
>>>             /home ext4 defaults,usrquota,grpquota   1 2
>>>
>>>             The brick partitions are mounted as xfs
>>>             /mnt/brick1  xfs defaults        0 0
>>>             /mnt/brick2  xfs defaults        0 0
>>>
>>>             Will this cause a problem with creating a volume under
>>>             /home?
>>>
>>>
>>>         I don't think the bottleneck is the disk. Can you run the same
>>>         tests on your new volume to confirm?
>>>
>>>
>>>             Pat
>>>
>>>
>>>
>>>             On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>             On Thu, May 11, 2017 at 8:57 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>
>>>>
>>>>                 Hi Pranith,
>>>>
>>>>                 Unfortunately, we don't have similar hardware for a
>>>>                 small scale test.  All we have is our production
>>>>                 hardware.
>>>>
>>>>
>>>>             You mentioned a /home partition that has fewer disks. We
>>>>             can create a plain distribute volume inside one of its
>>>>             directories and remove the setup once we are done. What
>>>>             do you say?
>>>>
>>>>
>>>>                 Pat
>>>>
>>>>
>>>>
>>>>
>>>>                 On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>>                 On Thu, May 11, 2017 at 2:48 AM, Pat Haley <phaley at mit.edu> wrote:
>>>>>
>>>>>
>>>>>                     Hi Pranith,
>>>>>
>>>>>                     Since we are mounting the partitions as the
>>>>>                     bricks, I tried the dd test writing to
>>>>>                     <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>                     The results without oflag=sync were 1.6 GB/s
>>>>>                     (faster than gluster but not as fast as I was
>>>>>                     expecting given the 1.2 GB/s to the no-gluster
>>>>>                     area w/ fewer disks).
>>>>>
>>>>>
>>>>>                 Okay, then 1.6 GB/s is what we need to target,
>>>>>                 considering your volume is just distribute. Is
>>>>>                 there any way you can do tests on similar hardware
>>>>>                 but at a small scale, just so we can run the
>>>>>                 workload to learn more about the bottlenecks in
>>>>>                 the system? We can probably try to get the speed
>>>>>                 to 1.2 GB/s on the /home partition you were
>>>>>                 telling me about yesterday. Let me know if that is
>>>>>                 something you are okay to do.
>>>>>
>>>>>
>>>>>                     Pat
>>>>>
>>>>>
>>>>>
>>>>>                     On 05/10/2017 01:27 PM, Pranith Kumar
>>>>>                     Karampuri wrote:
>>>>>>
>>>>>>
>>>>>>                     On Wed, May 10, 2017 at 10:15 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>
>>>>>>
>>>>>>                         Hi Pranith,
>>>>>>
>>>>>>                         Not entirely sure (this isn't my area of
>>>>>>                         expertise). I'll run your answer by some
>>>>>>                         other people who are more familiar with this.
>>>>>>
>>>>>>                         I am also uncertain about how to
>>>>>>                         interpret the results when we also add
>>>>>>                         the dd tests writing to the /home area
>>>>>>                         (no gluster, still on the same machine)
>>>>>>
>>>>>>                           * dd test without oflag=sync (rough
>>>>>>                             average of multiple tests)
>>>>>>                               o gluster w/ fuse mount: 570 MB/s
>>>>>>                               o gluster w/ nfs mount: 390 MB/s
>>>>>>                               o nfs (no gluster): 1.2 GB/s
>>>>>>                           * dd test with oflag=sync (rough
>>>>>>                             average of multiple tests)
>>>>>>                               o gluster w/ fuse mount: 5 MB/s
>>>>>>                               o gluster w/ nfs mount: 200 MB/s
>>>>>>                               o nfs (no gluster): 20 MB/s
>>>>>>
>>>>>>                         Given that the non-gluster area is a
>>>>>>                         RAID-6 of 4 disks while each brick of the
>>>>>>                         gluster area is a RAID-6 of 32 disks, I
>>>>>>                         would naively expect the writes to the
>>>>>>                         gluster area to be roughly 8x faster than
>>>>>>                         to the non-gluster.
>>>>>>
>>>>>>
>>>>>>                     I think a better test is to write to a file
>>>>>>                     over nfs, without any gluster, in a location
>>>>>>                     that is not inside the brick but is on the
>>>>>>                     same disk(s). If you are mounting the
>>>>>>                     partition as the brick, then we can write to
>>>>>>                     a file inside the .glusterfs directory,
>>>>>>                     something like
>>>>>>                     <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>>
>>>>>>
>>>>>>
>>>>>>                         I still think we have a speed issue, I
>>>>>>                         can't tell if fuse vs nfs is part of the
>>>>>>                         problem.
>>>>>>
>>>>>>
>>>>>>                     I got interested in the post because I read
>>>>>>                     that fuse speed is lower than nfs speed,
>>>>>>                     which is counter-intuitive to my
>>>>>>                     understanding, so I wanted clarification. Now
>>>>>>                     that it is clear that fuse outperformed nfs
>>>>>>                     without sync, we can resume testing as
>>>>>>                     described above and try to find the cause.
>>>>>>                     Based on your email-id I am guessing you are
>>>>>>                     in Boston, and I am in Bangalore, so if you
>>>>>>                     are okay with this debugging taking multiple
>>>>>>                     days because of timezones, I will be happy to
>>>>>>                     help. Please be a bit patient with me: I am
>>>>>>                     under a release crunch, but I am very curious
>>>>>>                     about the problem you posted.
>>>>>>
>>>>>>                         Was there anything useful in the profiles?
>>>>>>
>>>>>>
>>>>>>                     Unfortunately the profiles didn't help me
>>>>>>                     much. I think we are collecting them from an
>>>>>>                     active volume, so they contain a lot of
>>>>>>                     information that does not pertain to dd, and
>>>>>>                     it is difficult to isolate dd's contribution.
>>>>>>                     So I went through your post again and found
>>>>>>                     something I hadn't paid much attention to
>>>>>>                     earlier, i.e. oflag=sync. I ran my own tests
>>>>>>                     on my setup with FUSE and then sent that
>>>>>>                     reply.
>>>>>>
>>>>>>
>>>>>>                         Pat
>>>>>>
>>>>>>
>>>>>>
>>>>>>                         On 05/10/2017 12:15 PM, Pranith Kumar
>>>>>>                         Karampuri wrote:
>>>>>>>                         Okay, good. At least this confirms my
>>>>>>>                         suspicion. Handling of O_SYNC in gluster
>>>>>>>                         NFS and fuse is a bit different.
>>>>>>>                         When an application opens a file with
>>>>>>>                         O_SYNC on a fuse mount, each write
>>>>>>>                         syscall has to be written to disk as
>>>>>>>                         part of the syscall, whereas in the case
>>>>>>>                         of NFS there is no concept of open. NFS
>>>>>>>                         performs the write through a handle,
>>>>>>>                         saying it needs to be a synchronous
>>>>>>>                         write, so the write() syscall is
>>>>>>>                         performed first and then fsync(). So a
>>>>>>>                         write on an fd with O_SYNC becomes
>>>>>>>                         write+fsync. My guess is that when
>>>>>>>                         multiple threads do this write+fsync()
>>>>>>>                         operation on the same file, multiple
>>>>>>>                         writes are batched together to be
>>>>>>>                         written to disk, which increases the
>>>>>>>                         throughput on the disk.
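
The write modes discussed here can be approximated locally with GNU dd flags, as in the sketch below. The function and mode names are hypothetical, and the mapping is only an analogy: oflag=sync opens the output with O_SYNC so every write() is synchronous, while conv=fsync issues a single fsync() after the last write, which is not the per-write write+fsync sequence described for NFS. Note also that conv=sync, used in the FUSE and NFS runs quoted further down, only pads short input blocks with zeros and does not request synchronous I/O.

```shell
# Write $3 blocks of 1 MiB to file $2 in the given mode, then remove the file.
#   buffered : plain write(); data may still be in the page cache when dd exits
#   osync    : oflag=sync, output opened with O_SYNC, each write() is synchronous
#   fsync    : conv=fsync, one fsync() after the last write
write_test() {
    mode=$1 file=$2 blocks=$3
    case $mode in
        buffered) flags= ;;
        osync)    flags=oflag=sync ;;
        fsync)    flags=conv=fsync ;;
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    # $flags is deliberately unquoted so that an empty value adds no argument.
    dd if=/dev/zero of="$file" bs=1048576 count="$blocks" $flags
    status=$?
    rm -f "$file"
    return $status
}
```

Running `write_test buffered`, `write_test osync` and `write_test fsync` against the same directory gives three numbers that are easier to compare than a mix of flagged and unflagged runs; dd prints its throughput summary on stderr.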
>>>>>>>
>>>>>>>                         Does it answer your doubts?
>>>>>>>
>>>>>>>                         On Wed, May 10, 2017 at 9:35 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>
>>>>>>>
>>>>>>>                             Without oflag=sync, and with only a
>>>>>>>                             single test of each, FUSE is faster
>>>>>>>                             than NFS:
>>>>>>>
>>>>>>>                             FUSE:
>>>>>>>                             mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>                             4096+0 records in
>>>>>>>                             4096+0 records out
>>>>>>>                             4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>>>>
>>>>>>>
>>>>>>>                             NFS:
>>>>>>>                             mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>>                             4096+0 records in
>>>>>>>                             4096+0 records out
>>>>>>>                             4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                             On 05/10/2017 11:53 AM, Pranith
>>>>>>>                             Kumar Karampuri wrote:
>>>>>>>>                             Could you let me know the speed
>>>>>>>>                             without oflag=sync on both the
>>>>>>>>                             mounts? No need to collect profiles.
>>>>>>>>
>>>>>>>>                             On Wed, May 10, 2017 at 9:17 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                                 Here is what I see now:
>>>>>>>>
>>>>>>>>                                 [root at mseas-data2 ~]# gluster volume info
>>>>>>>>
>>>>>>>>                                 Volume Name: data-volume
>>>>>>>>                                 Type: Distribute
>>>>>>>>                                 Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>                                 Status: Started
>>>>>>>>                                 Number of Bricks: 2
>>>>>>>>                                 Transport-type: tcp
>>>>>>>>                                 Bricks:
>>>>>>>>                                 Brick1: mseas-data2:/mnt/brick1
>>>>>>>>                                 Brick2: mseas-data2:/mnt/brick2
>>>>>>>>                                 Options Reconfigured:
>>>>>>>>                                 diagnostics.count-fop-hits: on
>>>>>>>>                                 diagnostics.latency-measurement: on
>>>>>>>>                                 nfs.exports-auth-enable: on
>>>>>>>>                                 diagnostics.brick-sys-log-level: WARNING
>>>>>>>>                                 performance.readdir-ahead: on
>>>>>>>>                                 nfs.disable: on
>>>>>>>>                                 nfs.export-volumes: off
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                                 On 05/10/2017 11:44 AM, Pranith
>>>>>>>>                                 Kumar Karampuri wrote:
>>>>>>>>>                                 Is this the volume info you have?
>>>>>>>>>
>>>>>>>>>                                 > [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>                                 >
>>>>>>>>>                                 > Volume Name: data-volume
>>>>>>>>>                                 > Type: Distribute
>>>>>>>>>                                 > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>                                 > Status: Started
>>>>>>>>>                                 > Number of Bricks: 2
>>>>>>>>>                                 > Transport-type: tcp
>>>>>>>>>                                 > Bricks:
>>>>>>>>>                                 > Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>                                 > Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>                                 > Options Reconfigured:
>>>>>>>>>                                 > performance.readdir-ahead: on
>>>>>>>>>                                 > nfs.disable: on
>>>>>>>>>                                 > nfs.export-volumes: off
>>>>>>>>>
>>>>>>>>>                                 I copied this from the old thread
>>>>>>>>>                                 from 2016. This is a distribute
>>>>>>>>>                                 volume. Did you change any of
>>>>>>>>>                                 the options in between?
>>>>>>>>
>>>>>>>>                                 -- 
>>>>>>>>
>>>>>>>>                                 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>>>>>>                                 Pat Haley                          Email:  phaley at mit.edu
>>>>>>>>                                 Center for Ocean Engineering       Phone:  (617) 253-6824
>>>>>>>>                                 Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>>>>>>>>                                 MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>>>>>>>>                                 77 Massachusetts Avenue
>>>>>>>>                                 Cambridge, MA  02139-4301
>>>>>>>>
>>>>>>>>                             -- 
>>>>>>>>                             Pranith
>>>>>>>
>>>>>>>
>>>>>>>                         -- 
>>>>>>>                         Pranith
>>>>>>
>>>>>>
>>>>>>                     -- 
>>>>>>                     Pranith
>>>>>
>>>>>
>>>>>                 -- 
>>>>>                 Pranith
>>>>
>>>>
>>>>             -- 
>>>>             Pranith
>>>
>>>
>>>         -- 
>>>         Pranith
>>
>>
>>
>>
>>
>>     -- 
>>     Pranith
>
>
>
>
>
> -- 
> Pranith

-- 

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301
