[Gluster-users] Slow write times to gluster disk
Pat Haley
phaley at mit.edu
Tue May 30 15:46:18 UTC 2017
Hi Pranith,
Thanks for the tip. We now have the gluster volume mounted under
/home. What tests do you recommend we run?
Thanks
Pat
On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>
>
> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley at mit.edu
> <mailto:phaley at mit.edu>> wrote:
>
>
> Hi Pranith,
>
> Sorry for the delay. I never received your reply (but I did receive
> Ben Turner's follow-up to it). So we tried to
> create a gluster volume under /home using different variations of
>
> gluster volume create test-volume mseas-data2:/home/gbrick_test_1
> mseas-data2:/home/gbrick_test_2 transport tcp
>
> However we keep getting errors of the form
>
> Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>
>
> Any thoughts on what we're doing wrong?
>
>
> I think "transport tcp" should go before the brick list. In any case,
> tcp is the default transport, so there is no need to specify it; just
> remove those two words from the command.
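>
> For example (a sketch only, reusing the brick directories from your
> command above), either of these forms should be accepted:
>
> gluster volume create test-volume transport tcp mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2
> gluster volume create test-volume mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2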
>
>
> Also, do you have a list of the tests we should be running once we
> get this volume created? Given the time-zone difference, it might
> help if we can run a small battery of tests and post the results,
> rather than going through an iterative test-post, test-post cycle.
>
>
> This is the first time I am doing performance analysis with users, as
> far as I remember. In our team there are separate engineers who do
> these tests; Ben, who replied earlier, is one such engineer.
>
> Ben,
> Do you have any suggestions?
>
>
> Thanks
>
> Pat
>
>
>
> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>
>>
>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <phaley at mit.edu
>> <mailto:phaley at mit.edu>> wrote:
>>
>>
>> Hi Pranith,
>>
>> The /home partition is mounted as ext4
>> /home ext4 defaults,usrquota,grpquota 1 2
>>
>> The brick partitions are mounted as xfs
>> /mnt/brick1 xfs defaults 0 0
>> /mnt/brick2 xfs defaults 0 0
>>
>> Will this cause a problem with creating a volume under /home?
>>
>>
>> I don't think the bottleneck is the disk. Could you do the same
>> tests on your new volume to confirm?
>>
>>
>> Pat
>>
>>
>>
>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <phaley at mit.edu
>>> <mailto:phaley at mit.edu>> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> Unfortunately, we don't have similar hardware for a
>>> small scale test. All we have is our production hardware.
>>>
>>>
>>> You said something about the /home partition, which has fewer
>>> disks; we can create a plain distribute volume inside one of
>>> those directories. After we are done, we can remove the setup.
>>> What do you say?
>>>
>>>
>>> Pat
>>>
>>>
>>>
>>>
>>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley
>>>> <phaley at mit.edu <mailto:phaley at mit.edu>> wrote:
>>>>
>>>>
>>>> Hi Pranith,
>>>>
>>>> Since we are mounting the partitions as the bricks,
>>>> I tried the dd test writing to
>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>> The results without oflag=sync were 1.6 Gb/s
>>>> (faster than gluster but not as fast as I was
>>>> expecting given the 1.2 Gb/s to the no-gluster area
>>>> w/ fewer disks).
>>>>
>>>>
>>>> Okay, then 1.6 Gb/s is what we need to target, considering your
>>>> volume is plain distribute. Is there any way you can run the tests
>>>> on similar hardware at a smaller scale, just so we can exercise the
>>>> workload and learn more about the bottlenecks in the system? We can
>>>> probably try to get the speed to 1.2 Gb/s on the /home partition
>>>> you were telling me about yesterday. Let me know if that is
>>>> something you are okay with doing.
>>>>
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley
>>>>> <phaley at mit.edu <mailto:phaley at mit.edu>> wrote:
>>>>>
>>>>>
>>>>> Hi Pranith,
>>>>>
>>>>> Not entirely sure (this isn't my area of
>>>>> expertise). I'll run your answer by some other
>>>>> people who are more familiar with this.
>>>>>
>>>>> I am also uncertain about how to interpret the
>>>>> results when we also add the dd tests writing
>>>>> to the /home area (no gluster, still on the
>>>>> same machine)
>>>>>
>>>>>       * dd test without oflag=sync (rough average of multiple tests)
>>>>>           o gluster w/ fuse mount: 570 Mb/s
>>>>>           o gluster w/ nfs mount: 390 Mb/s
>>>>>           o nfs (no gluster): 1.2 Gb/s
>>>>>       * dd test with oflag=sync (rough average of multiple tests)
>>>>>           o gluster w/ fuse mount: 5 Mb/s
>>>>>           o gluster w/ nfs mount: 200 Mb/s
>>>>>           o nfs (no gluster): 20 Mb/s
>>>>>
>>>>> Given that the non-gluster area is a RAID-6 of
>>>>> 4 disks while each brick of the gluster area
>>>>> is a RAID-6 of 32 disks, I would naively
>>>>> expect the writes to the gluster area to be
>>>>> roughly 8x faster than to the non-gluster.
>>>>>
>>>>>
>>>>> I think a better test is to try writing to a file over NFS
>>>>> without any gluster, to a location that is not inside the brick
>>>>> but some other location on the same disk(s). If you are mounting
>>>>> the partition as the brick, then we can write to a file inside
>>>>> the .glusterfs directory, something like
>>>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
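>>>>>
>>>>> For example (a sketch mirroring your earlier dd runs; substitute
>>>>> the actual brick path and any throwaway filename, and delete the
>>>>> file afterwards):
>>>>>
>>>>> dd if=/dev/zero bs=1048576 count=4096 of=<brick-path>/.glusterfs/<file-to-be-removed-after-test>
>>>>> dd if=/dev/zero bs=1048576 count=4096 of=<brick-path>/.glusterfs/<file-to-be-removed-after-test> oflag=sync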
>>>>>
>>>>>
>>>>>
>>>>> I still think we have a speed issue; I can't tell whether
>>>>> FUSE vs NFS is part of the problem.
>>>>>
>>>>>
>>>>> I got interested in the post because I read that FUSE speed is
>>>>> lower than NFS speed, which is counter-intuitive to my
>>>>> understanding, so I wanted clarification. Now that I have that
>>>>> clarification (FUSE outperformed NFS without sync), we can resume
>>>>> testing as described above and try to find out what is going on.
>>>>> Based on your email address I am guessing you are in Boston, and I
>>>>> am in Bangalore, so if you are okay with this debugging stretching
>>>>> over multiple days because of the time zones, I will be happy to
>>>>> help. Please be a bit patient with me; I am under a release
>>>>> crunch, but I am very curious about the problem you posted.
>>>>>
>>>>> Was there anything useful in the profiles?
>>>>>
>>>>>
>>>>> Unfortunately, the profiles didn't help me much. I think we are
>>>>> collecting them from an active volume, so they contain a lot of
>>>>> information that does not pertain to dd, which makes it difficult
>>>>> to isolate dd's contribution. So I went through your post again
>>>>> and found something I hadn't paid much attention to earlier,
>>>>> i.e. oflag=sync, did my own tests on my setup with FUSE, and sent
>>>>> that reply.
>>>>>
>>>>>
>>>>> Pat
>>>>>
>>>>>
>>>>>
>>>>> On 05/10/2017 12:15 PM, Pranith Kumar
>>>>> Karampuri wrote:
>>>>>> Okay, good. At least this validates my doubts. Handling O_SYNC
>>>>>> in gluster NFS and FUSE is a bit different.
>>>>>> When an application opens a file with O_SYNC on a FUSE mount,
>>>>>> each write syscall has to be written to disk as part of that
>>>>>> syscall, whereas in the case of NFS there is no concept of open.
>>>>>> NFS performs the write through a handle marked as a synchronous
>>>>>> write, so the write() is performed first and then an fsync() is
>>>>>> issued; a write on an fd with O_SYNC effectively becomes
>>>>>> write+fsync. My guess is that when multiple threads do this
>>>>>> write+fsync() operation on the same file, multiple writes get
>>>>>> batched together before being written to disk, so the throughput
>>>>>> on the disk increases.
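>>>>>>
>>>>>> If you want to see the FUSE half of this yourself (a sketch only,
>>>>>> not something we need for the debugging; <fuse-mount-path> and
>>>>>> <throwaway-file> are placeholders), strace on a dd run shows the
>>>>>> O_SYNC flag on the open and plain write()s with no fsync():
>>>>>>
>>>>>> strace -e trace=open,openat,write,fsync dd if=/dev/zero of=<fuse-mount-path>/<throwaway-file> bs=1M count=10 oflag=sync
>>>>>>
>>>>>> The NFS-side write+fsync happens inside the gluster NFS server,
>>>>>> so it is not visible from a client-side strace.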
>>>>>>
>>>>>> Does it answer your doubts?
>>>>>>
>>>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley
>>>>>> <phaley at mit.edu <mailto:phaley at mit.edu>> wrote:
>>>>>>
>>>>>>
>>>>>> Without the oflag=sync and only a single
>>>>>> test of each, the FUSE is going faster
>>>>>> than NFS:
>>>>>>
>>>>>> FUSE:
>>>>>> mseas-data2(dri_nascar)% dd if=/dev/zero
>>>>>> count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>> 4096+0 records in
>>>>>> 4096+0 records out
>>>>>> 4294967296 bytes (4.3 GB) copied, 7.46961
>>>>>> s, 575 MB/s
>>>>>>
>>>>>>
>>>>>> NFS
>>>>>> mseas-data2(HYCOM)% dd if=/dev/zero
>>>>>> count=4096 bs=1048576 of=zeros.txt conv=sync
>>>>>> 4096+0 records in
>>>>>> 4096+0 records out
>>>>>> 4294967296 bytes (4.3 GB) copied, 11.4264
>>>>>> s, 376 MB/s
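>>>>>>
>>>>>> (Note: conv=sync only pads short input blocks with zeros; with
>>>>>> /dev/zero and bs=1048576 every block is already full, so it does
>>>>>> not force anything to disk. If a flush-at-the-end comparison is
>>>>>> useful, a sketch would be:
>>>>>> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=fsync)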
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 05/10/2017 11:53 AM, Pranith Kumar
>>>>>> Karampuri wrote:
>>>>>>> Could you let me know the speed without
>>>>>>> oflag=sync on both the mounts? No need
>>>>>>> to collect profiles.
>>>>>>>
>>>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat
>>>>>>> Haley <phaley at mit.edu
>>>>>>> <mailto:phaley at mit.edu>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Here is what I see now:
>>>>>>>
>>>>>>> [root at mseas-data2 ~]# gluster volume
>>>>>>> info
>>>>>>>
>>>>>>> Volume Name: data-volume
>>>>>>> Type: Distribute
>>>>>>> Volume ID:
>>>>>>> c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>> Status: Started
>>>>>>> Number of Bricks: 2
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>> Options Reconfigured:
>>>>>>> diagnostics.count-fop-hits: on
>>>>>>> diagnostics.latency-measurement: on
>>>>>>> nfs.exports-auth-enable: on
>>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>>> performance.readdir-ahead: on
>>>>>>> nfs.disable: on
>>>>>>> nfs.export-volumes: off
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/10/2017 11:44 AM, Pranith
>>>>>>> Kumar Karampuri wrote:
>>>>>>>> Is this the volume info you have?
>>>>>>>>
>>>>>>>> > [root at mseas-data2 ~]# gluster volume info
>>>>>>>> > Volume Name: data-volume
>>>>>>>> > Type: Distribute
>>>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>>> > Status: Started
>>>>>>>> > Number of Bricks: 2
>>>>>>>> > Transport-type: tcp
>>>>>>>> > Bricks:
>>>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>>>> > Options Reconfigured:
>>>>>>>> > performance.readdir-ahead: on
>>>>>>>> > nfs.disable: on
>>>>>>>> > nfs.export-volumes: off
>>>>>>>> I copied this from the old thread from 2016. This is a
>>>>>>>> distribute volume. Did you change any of the options in
>>>>>>>> between?
>>>>>>>
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: phaley at mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301