[Gluster-users] Slow write times to gluster disk
Pat Haley
phaley at mit.edu
Wed May 31 01:40:34 UTC 2017
Hi Pranith,
The "dd" command was:
dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
There were 2 instances where dd reported 22 seconds. The output from the
dd tests is in
http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
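For reference, the per-trial throughputs in a saved dd log like the one above can be reduced to a median with a short script (the log-file name and format are assumptions: this expects GNU dd summary lines all reporting in MB/s):

```shell
#!/bin/sh
# Median throughput from saved dd output. Each dd run ends with a summary
# line such as: "4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s".
# Assumes every run reports in the same unit (MB/s here).
median_mbs() {
  grep 'copied' "$1" |      # keep only the dd summary lines
  awk '{print $(NF-1)}' |   # next-to-last field is the throughput number
  sort -n |
  awk '{a[NR]=$1} END {
    if (NR % 2) print a[(NR+1)/2]
    else printf "%.1f\n", (a[NR/2] + a[NR/2+1]) / 2
  }'
}
```

For example, `median_mbs dd_testvol_gluster.txt` would print the median of the 12 trials. (Note that `conv=sync` in the dd command above only pads short input blocks; it is `oflag=sync` that forces synchronous writes.)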
Pat
On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
> Pat,
> What is the command you used? As per the following output, it
> seems like at least one write operation took 16 seconds, which is
> really bad:
>
>   96.39    1165.10 us    89.00 us    16487014.00 us    393212    WRITE
>
>
> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley at mit.edu
> <mailto:phaley at mit.edu>> wrote:
>
>
> Hi Pranith,
>
> I ran the same 'dd' test both in the gluster test volume and in
> the .glusterfs directory of each brick. The median results (12 dd
> trials in each test) are similar to before:
>
> * gluster test volume: 586.5 MB/s
> * bricks (in .glusterfs): 1.4 GB/s
>
> The profile for the gluster test-volume is in
>
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
>
> Thanks
>
> Pat
>
>
>
>
> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>> Let's start with the same 'dd' test we were testing with, to see
>> what the numbers are. Please provide profile numbers for the
>> same. From there on we will start tuning the volume to see what
>> we can do.
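The profile collection being asked for can be sketched roughly as follows (volume name from the thread; the dd target path is a placeholder for a file on the mounted volume). The gluster commands must run on a server node, so they are emitted here as a dry run rather than executed:

```shell
#!/bin/sh
# Dry-run sketch: the steps to capture a profile for a single dd run.
profile_steps() {
cat <<'EOF'
gluster volume profile test-volume start
dd if=/dev/zero count=4096 bs=1048576 of=/home/zeros.txt conv=sync
gluster volume profile test-volume info > profile_testvol_gluster.txt
gluster volume profile test-volume stop
EOF
}
profile_steps
```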
>>
>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley at mit.edu
>> <mailto:phaley at mit.edu>> wrote:
>>
>>
>> Hi Pranith,
>>
>> Thanks for the tip. We now have the gluster volume mounted
>> under /home. What tests do you recommend we run?
>>
>> Thanks
>>
>> Pat
>>
>>
>>
>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley at mit.edu
>>> <mailto:phaley at mit.edu>> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> Sorry for the delay. I never received your reply
>>> (but I did receive Ben Turner's follow-up to your
>>> reply). So we tried to create a gluster volume under
>>> /home using different variations of
>>>
>>> gluster volume create test-volume
>>> mseas-data2:/home/gbrick_test_1
>>> mseas-data2:/home/gbrick_test_2 transport tcp
>>>
>>> However we keep getting errors of the form
>>>
>>> Wrong brick type: transport, use
>>> <HOSTNAME>:<export-dir-abs-path>
>>>
>>> Any thoughts on what we're doing wrong?
>>>
>>>
>>> I think 'transport tcp' should come at the beginning. In any
>>> case, transport tcp is the default, so there is no need to
>>> specify it; just remove those two words from the CLI.
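Concretely, dropping the two words gives the following (a sketch using the brick paths from the thread; the command string is built and echoed here rather than run against a live cluster):

```shell
#!/bin/sh
# Corrected create command: "transport tcp" is the default, so it is
# simply omitted rather than moved to the front.
create_cmd="gluster volume create test-volume \
mseas-data2:/home/gbrick_test_1 \
mseas-data2:/home/gbrick_test_2"
echo "$create_cmd"
```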
>>>
>>>
>>> Also, do you have a list of the tests we should be running
>>> once we get this volume created? Given the time-zone
>>> difference it might help if we can run a small battery
>>> of tests and post the results, rather than iterating
>>> one test and one post at a time.
>>>
>>>
>>> This is the first time I am doing performance analysis for
>>> users, as far as I remember. In our team there are separate
>>> engineers who do these tests. Ben, who replied earlier, is one
>>> such engineer.
>>>
>>> Ben,
>>> Have any suggestions?
>>>
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>>
>>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley
>>>> <phaley at mit.edu <mailto:phaley at mit.edu>> wrote:
>>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> The /home partition is mounted as ext4
>>>> /home ext4 defaults,usrquota,grpquota 1 2
>>>>
>>>> The brick partitions are mounted as xfs
>>>> /mnt/brick1 xfs defaults 0 0
>>>> /mnt/brick2 xfs defaults 0 0
>>>>
>>>> Will this cause a problem with creating a volume
>>>> under /home?
>>>>
>>>>
>>>> I don't think the bottleneck is the disk. Could you run the
>>>> same tests on your new volume to confirm?
>>>>
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley
>>>>> <phaley at mit.edu <mailto:phaley at mit.edu>> wrote:
>>>>>
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> Unfortunately, we don't have similar hardware
>>>>> for a small scale test. All we have is our
>>>>> production hardware.
>>>>>
>>>>>
>>>>> You said something about the /home partition, which has
>>>>> fewer disks; we can create a plain distribute
>>>>> volume inside one of its directories. After we
>>>>> are done, we can remove the setup. What do you say?
>>>>>
>>>>>
>>>>> Pat
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 05/11/2017 07:05 AM, Pranith Kumar
>>>>> Karampuri wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley
>>>>>> <phaley at mit.edu <mailto:phaley at mit.edu>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Pranith,
>>>>>>
>>>>>> Since we are mounting the partitions as
>>>>>> the bricks, I tried the dd test writing
>>>>>> to
>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>>>> The results without oflag=sync were 1.6
>>>>>> Gb/s (faster than gluster but not as fast
>>>>>> as I was expecting given the 1.2 Gb/s to
>>>>>> the no-gluster area w/ fewer disks).
>>>>>>
>>>>>>
>>>>>> Okay, then 1.6Gb/s is what we need to target,
>>>>>> considering your volume is just
>>>>>> distribute. Is there any way you can do tests
>>>>>> on similar hardware but at a smaller scale,
>>>>>> just so we can run the workload and learn more
>>>>>> about the bottlenecks in the system? We can
>>>>>> probably try to get the speed to 1.2Gb/s on
>>>>>> the /home partition you were telling me
>>>>>> about yesterday. Let me know if that is something
>>>>>> you are okay to do.
>>>>>>
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/10/2017 01:27 PM, Pranith Kumar
>>>>>> Karampuri wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 10, 2017 at 10:15 PM, Pat
>>>>>>> Haley <phaley at mit.edu
>>>>>>> <mailto:phaley at mit.edu>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Pranith,
>>>>>>>
>>>>>>> Not entirely sure (this isn't my
>>>>>>> area of expertise). I'll run your
>>>>>>> answer by some other people who are
>>>>>>> more familiar with this.
>>>>>>>
>>>>>>> I am also uncertain about how to
>>>>>>> interpret the results when we also
>>>>>>> add the dd tests writing to the
>>>>>>> /home area (no gluster, still on the
>>>>>>> same machine):
>>>>>>>
>>>>>>> * dd test without oflag=sync
>>>>>>> (rough average of multiple tests)
>>>>>>> o gluster w/ fuse mount : 570 Mb/s
>>>>>>> o gluster w/ nfs mount: 390 Mb/s
>>>>>>> o nfs (no gluster): 1.2 Gb/s
>>>>>>> * dd test with oflag=sync (rough
>>>>>>> average of multiple tests)
>>>>>>> o gluster w/ fuse mount: 5 Mb/s
>>>>>>> o gluster w/ nfs mount: 200 Mb/s
>>>>>>> o nfs (no gluster): 20 Mb/s
>>>>>>>
>>>>>>> Given that the non-gluster area is a
>>>>>>> RAID-6 of 4 disks while each brick
>>>>>>> of the gluster area is a RAID-6 of
>>>>>>> 32 disks, I would naively expect the
>>>>>>> writes to the gluster area to be
>>>>>>> roughly 8x faster than to the
>>>>>>> non-gluster.
>>>>>>>
>>>>>>>
>>>>>>> I think a better test is to try writing
>>>>>>> to a file using nfs, without any
>>>>>>> gluster, to a location that is not inside
>>>>>>> the brick but some other location that is
>>>>>>> on the same disk(s). If you are mounting the
>>>>>>> partition as the brick, then we can
>>>>>>> write to a file inside the .glusterfs
>>>>>>> directory, something like
>>>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
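That suggested test can be wrapped in a small helper (a sketch: the target path is a hypothetical stand-in for `<brick-path>/.glusterfs/<file-to-be-removed-after-test>`, and the default sizes match the thread's 4 GiB runs):

```shell
#!/bin/sh
# Write a throwaway file with synchronous writes, report dd's numbers,
# then clean up. target should be something like
# /mnt/brick1/.glusterfs/ddtest.tmp on a real brick.
run_dd_test() {
  target=$1
  count=${2:-4096}   # 4096 x 1 MiB = 4 GiB, as in the thread
  dd if=/dev/zero of="$target" bs=1048576 count="$count" oflag=sync 2>&1
  rm -f "$target"
}
```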
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I still think we have a speed issue,
>>>>>>> but I can't tell if fuse vs nfs is part
>>>>>>> of the problem.
>>>>>>>
>>>>>>>
>>>>>>> I got interested in the post because I
>>>>>>> read that fuse speed is lower than nfs
>>>>>>> speed, which is counter-intuitive to my
>>>>>>> understanding, so I wanted clarification.
>>>>>>> Now that I have it (fuse outperformed
>>>>>>> nfs without sync), we can resume testing
>>>>>>> as described above and try to find the
>>>>>>> cause. Based on your email address I am
>>>>>>> guessing you are in Boston and I am in
>>>>>>> Bangalore, so if you are okay with doing
>>>>>>> this debugging over multiple days because
>>>>>>> of the timezones, I will be happy to help.
>>>>>>> Please be a bit patient with me; I am
>>>>>>> under a release crunch, but I am very
>>>>>>> curious about the problem you posted.
>>>>>>>
>>>>>>> Was there anything useful in the
>>>>>>> profiles?
>>>>>>>
>>>>>>>
>>>>>>> Unfortunately the profiles didn't help me
>>>>>>> much. I think we are collecting the
>>>>>>> profiles from an active volume, so they
>>>>>>> have a lot of information that does not
>>>>>>> pertain to dd, which makes it difficult to
>>>>>>> isolate dd's contribution. So I went
>>>>>>> through your post again and found
>>>>>>> something I didn't pay much attention to
>>>>>>> earlier, namely oflag=sync; I then did my
>>>>>>> own tests with FUSE on my setup and sent
>>>>>>> that reply.
>>>>>>>
>>>>>>>
>>>>>>> Pat
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/10/2017 12:15 PM, Pranith
>>>>>>> Kumar Karampuri wrote:
>>>>>>>> Okay, good. At least this validates
>>>>>>>> my doubts. Handling O_SYNC in
>>>>>>>> gluster NFS and fuse is a bit
>>>>>>>> different.
>>>>>>>> When an application opens a file with
>>>>>>>> O_SYNC on a fuse mount, each
>>>>>>>> write syscall has to be written to
>>>>>>>> disk as part of the syscall, whereas
>>>>>>>> in the case of NFS there is no
>>>>>>>> concept of open. NFS performs the write
>>>>>>>> through a handle, saying it needs to
>>>>>>>> be a synchronous write, so the write()
>>>>>>>> syscall is performed first and then it
>>>>>>>> performs fsync(). So a write on an
>>>>>>>> fd with O_SYNC becomes write+fsync.
>>>>>>>> My guess is that when multiple
>>>>>>>> threads do this write+fsync()
>>>>>>>> operation on the same file,
>>>>>>>> multiple writes are batched
>>>>>>>> together before being written to disk,
>>>>>>>> so the throughput on the disk
>>>>>>>> increases.
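The two paths described here can be mimicked with dd itself (sizes are small and illustrative): `oflag=sync` forces every write to be synchronous, like O_SYNC on the fuse mount, while `conv=fsync` does plain writes followed by a single fsync at the end, closer to the write+fsync pattern where writes can be batched before hitting disk:

```shell
#!/bin/sh
# Per-write sync: each 1 MiB write must reach disk before the next starts.
dd if=/dev/zero of=/tmp/osync_demo bs=1048576 count=8 oflag=sync 2>&1
# Write-then-fsync: writes are buffered and flushed once at the end,
# so the disk can absorb them in larger batches.
dd if=/dev/zero of=/tmp/fsync_demo bs=1048576 count=8 conv=fsync 2>&1
rm -f /tmp/osync_demo /tmp/fsync_demo
```

Comparing the two reported rates on the same mount gives a rough feel for the cost of per-write synchronization.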
>>>>>>>>
>>>>>>>> Does it answer your doubts?
>>>>>>>>
>>>>>>>> On Wed, May 10, 2017 at 9:35 PM,
>>>>>>>> Pat Haley <phaley at mit.edu
>>>>>>>> <mailto:phaley at mit.edu>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Without the oflag=sync and only
>>>>>>>> a single test of each, the FUSE
>>>>>>>> is going faster than NFS:
>>>>>>>>
>>>>>>>> FUSE:
>>>>>>>> mseas-data2(dri_nascar)% dd
>>>>>>>> if=/dev/zero count=4096
>>>>>>>> bs=1048576 of=zeros.txt conv=sync
>>>>>>>> 4096+0 records in
>>>>>>>> 4096+0 records out
>>>>>>>> 4294967296 bytes (4.3 GB)
>>>>>>>> copied, 7.46961 s, 575 MB/s
>>>>>>>>
>>>>>>>>
>>>>>>>> NFS
>>>>>>>> mseas-data2(HYCOM)% dd
>>>>>>>> if=/dev/zero count=4096
>>>>>>>> bs=1048576 of=zeros.txt conv=sync
>>>>>>>> 4096+0 records in
>>>>>>>> 4096+0 records out
>>>>>>>> 4294967296 bytes (4.3 GB)
>>>>>>>> copied, 11.4264 s, 376 MB/s
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 05/10/2017 11:53 AM, Pranith
>>>>>>>> Kumar Karampuri wrote:
>>>>>>>>> Could you let me know the
>>>>>>>>> speed without oflag=sync on
>>>>>>>>> both the mounts? No need to
>>>>>>>>> collect profiles.
>>>>>>>>>
>>>>>>>>> On Wed, May 10, 2017 at 9:17
>>>>>>>>> PM, Pat Haley <phaley at mit.edu
>>>>>>>>> <mailto:phaley at mit.edu>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here is what I see now:
>>>>>>>>>
>>>>>>>>> [root at mseas-data2 ~]#
>>>>>>>>> gluster volume info
>>>>>>>>>
>>>>>>>>> Volume Name: data-volume
>>>>>>>>> Type: Distribute
>>>>>>>>> Volume ID:
>>>>>>>>> c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>> Status: Started
>>>>>>>>> Number of Bricks: 2
>>>>>>>>> Transport-type: tcp
>>>>>>>>> Bricks:
>>>>>>>>> Brick1:
>>>>>>>>> mseas-data2:/mnt/brick1
>>>>>>>>> Brick2:
>>>>>>>>> mseas-data2:/mnt/brick2
>>>>>>>>> Options Reconfigured:
>>>>>>>>> diagnostics.count-fop-hits: on
>>>>>>>>> diagnostics.latency-measurement:
>>>>>>>>> on
>>>>>>>>> nfs.exports-auth-enable: on
>>>>>>>>> diagnostics.brick-sys-log-level:
>>>>>>>>> WARNING
>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>> nfs.disable: on
>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05/10/2017 11:44 AM,
>>>>>>>>> Pranith Kumar Karampuri wrote:
>>>>>>>>>> Is this the volume info
>>>>>>>>>> you have?
>>>>>>>>>>
>>>>>>>>>> [root at mseas-data2 ~]# gluster volume info
>>>>>>>>>>
>>>>>>>>>> Volume Name: data-volume
>>>>>>>>>> Type: Distribute
>>>>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>>>> Status: Started
>>>>>>>>>> Number of Bricks: 2
>>>>>>>>>> Transport-type: tcp
>>>>>>>>>> Bricks:
>>>>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>>>>> Options Reconfigured:
>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>> nfs.disable: on
>>>>>>>>>> nfs.export-volumes: off
>>>>>>>>>>
>>>>>>>>>> I copied this from an old
>>>>>>>>>> thread from 2016. This is
>>>>>>>>>> a distribute volume. Did
>>>>>>>>>> you change any of the
>>>>>>>>>> options in between?
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: phaley at mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301