[Gluster-users] Slow write times to gluster disk

Pat Haley phaley at mit.edu
Tue May 30 15:46:18 UTC 2017


Hi Pranith,

Thanks for the tip.  We now have the gluster volume mounted under 
/home.  What tests do you recommend we run?
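
(For reference, the dd test used earlier in this thread is roughly of the
form below, run both with and without oflag=sync; zeros.txt is just a
scratch file we remove afterwards:

    dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt
    dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt oflag=sync
)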

Thanks

Pat


On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>
>
> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley at mit.edu> wrote:
>
>
>     Hi Pranith,
>
>     Sorry for the delay.  I never received your reply (but I did
>     receive Ben Turner's follow-up to your reply).  So we tried to
>     create a gluster volume under /home using different variations of
>
>     gluster volume create test-volume mseas-data2:/home/gbrick_test_1
>     mseas-data2:/home/gbrick_test_2 transport tcp
>
>     However we keep getting errors of the form
>
>     Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>
>
>     Any thoughts on what we're doing wrong?
>
>
> I believe 'transport tcp' has to come before the brick list. In any case, 
> tcp is the default transport, so there is no need to specify it; just 
> remove those two words from the command.
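>
> For example, a sketch of the corrected invocation, reusing the volume and
> brick names from your attempt ('transport tcp' can also simply be dropped
> since tcp is the default):
>
>     gluster volume create test-volume transport tcp \
>         mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2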
>
>
>     Also, do you have a list of the tests we should be running once we
>     get this volume created?  Given the time-zone difference it might
>     help if we can run a small battery of tests and post the results,
>     rather than going test, post, new test, post, and so on.
>
>
> As far as I remember, this is the first time I am doing performance 
> analysis for a user. In our team there are separate engineers who do 
> these tests; Ben, who replied earlier, is one such engineer.
>
> Ben,
>     Have any suggestions?
>
>
>     Thanks
>
>     Pat
>
>
>
>     On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>
>>
>>     On Thu, May 11, 2017 at 9:32 PM, Pat Haley
>>     <phaley at mit.edu> wrote:
>>
>>
>>         Hi Pranith,
>>
>>         The /home partition is mounted as ext4
>>         /home              ext4 defaults,usrquota,grpquota      1 2
>>
>>         The brick partitions are mounted as xfs
>>         /mnt/brick1  xfs defaults        0 0
>>         /mnt/brick2  xfs defaults        0 0
>>
>>         Will this cause a problem with creating a volume under /home?
>>
>>
>>     I don't think the bottleneck is the disk. Can you run the same tests
>>     on your new volume to confirm?
>>
>>
>>         Pat
>>
>>
>>
>>         On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>         On Thu, May 11, 2017 at 8:57 PM, Pat Haley
>>>         <phaley at mit.edu> wrote:
>>>
>>>
>>>             Hi Pranith,
>>>
>>>             Unfortunately, we don't have similar hardware for a
>>>             small scale test. All we have is our production hardware.
>>>
>>>
>>>         You said something about the /home partition, which has fewer
>>>         disks; we can create a plain distribute volume inside one of
>>>         its directories. After we are done, we can remove the
>>>         setup. What do you say?
>>>
>>>
>>>             Pat
>>>
>>>
>>>
>>>
>>>             On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>>             On Thu, May 11, 2017 at 2:48 AM, Pat Haley
>>>>             <phaley at mit.edu> wrote:
>>>>
>>>>
>>>>                 Hi Pranith,
>>>>
>>>>                 Since we are mounting the partitions as the bricks,
>>>>                 I tried the dd test writing to
>>>>                 <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>                 The results without oflag=sync were 1.6 Gb/s
>>>>                 (faster than gluster but not as fast as I was
>>>>                 expecting given the 1.2 Gb/s to the no-gluster area
>>>>                 w/ fewer disks).
>>>>
>>>>
>>>>             Okay, then 1.6 Gb/s is what we need to target,
>>>>             considering your volume is plain distribute. Is there
>>>>             any way you can run tests on similar hardware but at a
>>>>             smaller scale, just so we can run the workload and learn
>>>>             more about the bottlenecks in the system? We can
>>>>             probably try to get the speed to 1.2 Gb/s on the /home
>>>>             partition you were telling me about yesterday. Let me
>>>>             know if that is something you are okay to do.
>>>>
>>>>
>>>>                 Pat
>>>>
>>>>
>>>>
>>>>                 On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>>                 On Wed, May 10, 2017 at 10:15 PM, Pat Haley
>>>>>                 <phaley at mit.edu> wrote:
>>>>>
>>>>>
>>>>>                     Hi Pranith,
>>>>>
>>>>>                     Not entirely sure (this isn't my area of
>>>>>                     expertise). I'll run your answer by some other
>>>>>                     people who are more familiar with this.
>>>>>
>>>>>                     I am also uncertain about how to interpret the
>>>>>                     results when we also add the dd tests writing
>>>>>                     to the /home area (no gluster, still on the
>>>>>                     same machine)
>>>>>
>>>>>                       * dd test without oflag=sync (rough average
>>>>>                         of multiple tests)
>>>>>                           o gluster w/ fuse mount : 570 Mb/s
>>>>>                           o gluster w/ nfs mount: 390 Mb/s
>>>>>                           o nfs (no gluster):  1.2 Gb/s
>>>>>                       * dd test with oflag=sync (rough average of
>>>>>                         multiple tests)
>>>>>                           o gluster w/ fuse mount:  5 Mb/s
>>>>>                           o gluster w/ nfs mount: 200 Mb/s
>>>>>                           o nfs (no gluster): 20 Mb/s
>>>>>
>>>>>                     Given that the non-gluster area is a RAID-6 of
>>>>>                     4 disks while each brick of the gluster area
>>>>>                     is a RAID-6 of 32 disks, I would naively
>>>>>                     expect the writes to the gluster area to be
>>>>>                     roughly 8x faster than to the non-gluster.
>>>>>
>>>>>
>>>>>                 I think a better test is to write to a file over
>>>>>                 nfs, without any gluster, to a location that is not
>>>>>                 inside the brick but some other location on the same
>>>>>                 disk(s). If you are mounting the partition as the
>>>>>                 brick, then we can write to a file inside the
>>>>>                 .glusterfs directory, something like
>>>>>                 <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
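>>>>>
>>>>>                 (A sketch of such a test with dd; 'ddtest.tmp' is
>>>>>                 just a hypothetical name for the scratch file to
>>>>>                 remove after the run:
>>>>>
>>>>>                 dd if=/dev/zero count=4096 bs=1048576 of=<brick-path>/.glusterfs/ddtest.tmp
>>>>>                 )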
>>>>>
>>>>>
>>>>>
>>>>>                     I still think we have a speed issue, I can't
>>>>>                     tell if fuse vs nfs is part of the problem.
>>>>>
>>>>>
>>>>>                 I got interested in this post because I read that
>>>>>                 fuse was slower than nfs, which is counter-intuitive
>>>>>                 to my understanding, so I wanted clarification. Now
>>>>>                 that I have it, and fuse outperformed nfs without
>>>>>                 sync, we can resume testing as described above and
>>>>>                 try to find what the issue is. Based on your
>>>>>                 email-id I am guessing you are in Boston and I am in
>>>>>                 Bangalore, so if you are okay with this debugging
>>>>>                 stretching over multiple days because of the
>>>>>                 timezones, I will be happy to help. Please be a bit
>>>>>                 patient with me; I am under a release crunch, but I
>>>>>                 am very curious about the problem you posted.
>>>>>
>>>>>                     Was there anything useful in the profiles?
>>>>>
>>>>>
>>>>>                 Unfortunately the profiles didn't help me much. I
>>>>>                 think we are collecting them from an active volume,
>>>>>                 so they contain a lot of information that does not
>>>>>                 pertain to dd, which makes it difficult to isolate
>>>>>                 dd's contribution. So I went through your post again
>>>>>                 and found something I hadn't paid much attention to
>>>>>                 earlier, i.e. oflag=sync, did my own tests on my
>>>>>                 setup with FUSE, and sent that reply.
>>>>>
>>>>>
>>>>>                     Pat
>>>>>
>>>>>
>>>>>
>>>>>                     On 05/10/2017 12:15 PM, Pranith Kumar
>>>>>                     Karampuri wrote:
>>>>>>                     Okay, good. At least this validates my doubts.
>>>>>>                     Handling O_SYNC in gluster NFS and fuse is a
>>>>>>                     bit different.
>>>>>>                     When an application opens a file with O_SYNC on
>>>>>>                     a fuse mount, each write syscall has to be
>>>>>>                     written to disk as part of the syscall, whereas
>>>>>>                     in the case of NFS there is no concept of
>>>>>>                     open. NFS performs the write through a handle
>>>>>>                     saying it needs to be a synchronous write, so
>>>>>>                     the write() syscall is performed first and then
>>>>>>                     fsync() is performed; a write on an fd with
>>>>>>                     O_SYNC becomes write+fsync. My guess is that
>>>>>>                     when multiple threads do this write+fsync()
>>>>>>                     operation on the same file, multiple writes are
>>>>>>                     batched together before being written to disk,
>>>>>>                     so the throughput on the disk increases.
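>>>>>>
>>>>>>                     (As a rough shell-level analogue of the two
>>>>>>                     write styles, a sketch rather than exactly what
>>>>>>                     gluster NFS does internally: with dd,
>>>>>>                     oflag=sync issues each write synchronously,
>>>>>>                     while conv=fsync lets the writes be buffered
>>>>>>                     and issues a single fsync at the end:
>>>>>>
>>>>>>                     dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt oflag=sync
>>>>>>                     dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=fsync
>>>>>>                     )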
>>>>>>
>>>>>>                     Does it answer your doubts?
>>>>>>
>>>>>>                     On Wed, May 10, 2017 at 9:35 PM, Pat Haley
>>>>>>                     <phaley at mit.edu> wrote:
>>>>>>
>>>>>>
>>>>>>                         Without the oflag=sync and only a single
>>>>>>                         test of each, the FUSE is going faster
>>>>>>                         than NFS:
>>>>>>
>>>>>>                         FUSE:
>>>>>>                         mseas-data2(dri_nascar)% dd if=/dev/zero
>>>>>>                         count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>                         4096+0 records in
>>>>>>                         4096+0 records out
>>>>>>                         4294967296 bytes (4.3 GB) copied, 7.46961
>>>>>>                         s, 575 MB/s
>>>>>>
>>>>>>
>>>>>>                         NFS
>>>>>>                         mseas-data2(HYCOM)% dd if=/dev/zero
>>>>>>                         count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>>                         4096+0 records in
>>>>>>                         4096+0 records out
>>>>>>                         4294967296 bytes (4.3 GB) copied, 11.4264
>>>>>>                         s, 376 MB/s
>>>>>>
>>>>>>
>>>>>>
>>>>>>                         On 05/10/2017 11:53 AM, Pranith Kumar
>>>>>>                         Karampuri wrote:
>>>>>>>                         Could you let me know the speed without
>>>>>>>                         oflag=sync on both the mounts? No need
>>>>>>>                         to collect profiles.
>>>>>>>
>>>>>>>                         On Wed, May 10, 2017 at 9:17 PM, Pat Haley
>>>>>>>                         <phaley at mit.edu> wrote:
>>>>>>>
>>>>>>>
>>>>>>>                             Here is what I see now:
>>>>>>>
>>>>>>>                             [root at mseas-data2 ~]# gluster volume
>>>>>>>                             info
>>>>>>>
>>>>>>>                             Volume Name: data-volume
>>>>>>>                             Type: Distribute
>>>>>>>                             Volume ID:
>>>>>>>                             c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>                             Status: Started
>>>>>>>                             Number of Bricks: 2
>>>>>>>                             Transport-type: tcp
>>>>>>>                             Bricks:
>>>>>>>                             Brick1: mseas-data2:/mnt/brick1
>>>>>>>                             Brick2: mseas-data2:/mnt/brick2
>>>>>>>                             Options Reconfigured:
>>>>>>>                             diagnostics.count-fop-hits: on
>>>>>>>                             diagnostics.latency-measurement: on
>>>>>>>                             nfs.exports-auth-enable: on
>>>>>>>                             diagnostics.brick-sys-log-level: WARNING
>>>>>>>                             performance.readdir-ahead: on
>>>>>>>                             nfs.disable: on
>>>>>>>                             nfs.export-volumes: off
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                             On 05/10/2017 11:44 AM, Pranith
>>>>>>>                             Kumar Karampuri wrote:
>>>>>>>>                             Is this the volume info you have?
>>>>>>>>
>>>>>>>>                             > [root at mseas-data2 ~]# gluster volume info
>>>>>>>>                             > Volume Name: data-volume
>>>>>>>>                             > Type: Distribute
>>>>>>>>                             > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>                             > Status: Started
>>>>>>>>                             > Number of Bricks: 2
>>>>>>>>                             > Transport-type: tcp
>>>>>>>>                             > Bricks:
>>>>>>>>                             > Brick1: mseas-data2:/mnt/brick1
>>>>>>>>                             > Brick2: mseas-data2:/mnt/brick2
>>>>>>>>                             > Options Reconfigured:
>>>>>>>>                             > performance.readdir-ahead: on
>>>>>>>>                             > nfs.disable: on
>>>>>>>>                             > nfs.export-volumes: off
>>>>>>>>                             I copied this from the old thread from
>>>>>>>>                             2016. This is a distribute volume.
>>>>>>>>                             Did you change any of the options
>>>>>>>>                             in between?
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
>
>
>
> -- 
> Pranith

-- 

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301
