[Gluster-users] Slow write times to gluster disk
Pat Haley
phaley at mit.edu
Thu May 11 16:02:38 UTC 2017
Hi Pranith,
The /home partition is mounted as ext4
/home ext4 defaults,usrquota,grpquota 1 2
The brick partitions are mounted as xfs
/mnt/brick1 xfs defaults 0 0
/mnt/brick2 xfs defaults 0 0
Will this cause a problem with creating a volume under /home?
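
For reference, here is roughly what I assume we would do (a sketch only;
the volume name, directories, and mount point are placeholders I would
remove after the test):

# create a throwaway plain distribute volume on /home
mkdir -p /home/gtest/brick1 /home/gtest/brick2
gluster volume create test-volume mseas-data2:/home/gtest/brick1 \
    mseas-data2:/home/gtest/brick2
gluster volume start test-volume
mount -t glusterfs mseas-data2:/test-volume /mnt/gtest
# ... run the dd tests against /mnt/gtest ...
umount /mnt/gtest
gluster volume stop test-volume
gluster volume delete test-volume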
Pat
On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>
>
> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <phaley at mit.edu> wrote:
>
>
> Hi Pranith,
>
> Unfortunately, we don't have similar hardware for a small scale
> test. All we have is our production hardware.
>
>
> You said something about a /home partition which has fewer disks; we
> can create a plain distribute volume inside one of its directories.
> After we are done, we can remove the setup. What do you say?
>
>
> Pat
>
>
>
>
> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>
>>
>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <phaley at mit.edu> wrote:
>>
>>
>> Hi Pranith,
>>
>> Since we are mounting the partitions as the bricks, I tried
>> the dd test writing to
>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>. The
>> results without oflag=sync were 1.6 GB/s (faster than gluster
>> but not as fast as I was expecting, given the 1.2 GB/s to the
>> no-gluster area w/ fewer disks).
>>
>>
>> Okay, then 1.6 GB/s is what we need to target, considering
>> your volume is just distribute. Is there any way you can do tests
>> on similar hardware but at a small scale, just so we can run the
>> workload and learn more about the bottlenecks in the system? We
>> can probably try to get the speed to 1.2 GB/s on the /home
>> partition you were telling me about yesterday. Let me know if
>> that is something you are okay to do.
>>
>>
>> Pat
>>
>>
>>
>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>> Hi Pranith,
>>>
>>> Not entirely sure (this isn't my area of expertise).
>>> I'll run your answer by some other people who are more
>>> familiar with this.
>>>
>>> I am also uncertain about how to interpret the results
>>> when we also add the dd tests writing to the /home area
>>> (no gluster, still on the same machine):
>>>
>>> * dd test without oflag=sync (rough average of multiple tests)
>>> o gluster w/ fuse mount: 570 MB/s
>>> o gluster w/ nfs mount: 390 MB/s
>>> o nfs (no gluster): 1.2 GB/s
>>> * dd test with oflag=sync (rough average of multiple tests)
>>> o gluster w/ fuse mount: 5 MB/s
>>> o gluster w/ nfs mount: 200 MB/s
>>> o nfs (no gluster): 20 MB/s
>>>
>>> Given that the non-gluster area is a RAID-6 of 4 disks
>>> while each brick of the gluster area is a RAID-6 of 32
>>> disks, I would naively expect the writes to the gluster
>>> area to be roughly 8x faster than to the non-gluster.
>>>
>>>
>>> I think a better test is to try writing to a file over
>>> nfs, without any gluster, to a location that is not inside the
>>> brick but some other location on the same disk(s). If you
>>> are mounting the partition as the brick, then we can write
>>> to a file inside the .glusterfs directory, something like
>>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
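>>>
>>> For example (a rough sketch; the file name is just a
>>> placeholder to be removed afterwards):
>>>
>>> # write 4 GB straight to the brick's backing filesystem,
>>> # bypassing gluster, then clean up
>>> dd if=/dev/zero of=/mnt/brick1/.glusterfs/ddtest.tmp \
>>> bs=1048576 count=4096
>>> rm /mnt/brick1/.glusterfs/ddtest.tmp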
>>>
>>>
>>> I still think we have a speed issue; I can't tell if
>>> fuse vs nfs is part of the problem.
>>>
>>>
>>> I got interested in the post because I read that fuse speed
>>> is less than nfs speed, which is counter-intuitive to my
>>> understanding, so I wanted clarification. Now that I have my
>>> clarification, with fuse outperforming nfs without sync, we
>>> can resume testing as described above and try to find what
>>> it is. Based on your email address I am guessing you are in
>>> Boston and I am in Bangalore, so if you are okay with doing
>>> this debugging over multiple days because of timezones, I
>>> will be happy to help. Please be a bit patient with me; I am
>>> under a release crunch, but I am very curious about the
>>> problem you posted.
>>>
>>> Was there anything useful in the profiles?
>>>
>>>
>>> Unfortunately the profiles didn't help me much. I think we are
>>> collecting the profiles from an active volume, so they contain
>>> a lot of information that does not pertain to dd, which makes
>>> it difficult to isolate dd's contribution. So I went through
>>> your post again and found something I didn't pay much
>>> attention to earlier, i.e. oflag=sync, so I did my own tests
>>> on my setup with FUSE and sent that reply.
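>>>
>>> If it helps, one way to narrow a profile down to just the dd
>>> run might be something like this (a sketch; it assumes your
>>> gluster version supports clearing the profile counters, and
>>> the mount path is a placeholder):
>>>
>>> gluster volume profile data-volume info clear # reset counters
>>> dd if=/dev/zero of=/gluster-mount/zeros.txt bs=1048576 count=4096
>>> gluster volume profile data-volume info > dd-profile.txt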
>>>
>>>
>>> Pat
>>>
>>>
>>>
>>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>> Okay good. At least this validates my doubts. Handling
>>>> O_SYNC in gluster NFS and fuse is a bit different.
>>>> When an application opens a file with O_SYNC on a fuse mount,
>>>> each write syscall has to be written to disk as
>>>> part of the syscall, whereas in the case of NFS, there is
>>>> no concept of open. NFS performs the write through a handle
>>>> saying it needs to be a synchronous write, so the write()
>>>> syscall is performed first and then it performs fsync(). So
>>>> a write on an fd with O_SYNC becomes write+fsync. My guess
>>>> is that when multiple threads do this
>>>> write+fsync() operation on the same file, multiple
>>>> writes are batched together to be written to disk, so
>>>> the throughput on the disk increases.
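>>>>
>>>> In dd terms the two behaviours roughly correspond to the
>>>> following (a sketch; conv=fsync syncs only once at the end,
>>>> so it is not an exact match for per-write write()+fsync(),
>>>> but it shows the two extremes):
>>>>
>>>> # open with O_SYNC: every write is synchronous
>>>> dd if=/dev/zero of=zeros.txt bs=1048576 count=4096 oflag=sync
>>>> # write everything first, fsync once at the end
>>>> dd if=/dev/zero of=zeros.txt bs=1048576 count=4096 conv=fsync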
>>>>
>>>> Does it answer your doubts?
>>>>
>>>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>
>>>>
>>>> Without the oflag=sync and only a single test of
>>>> each, the FUSE is going faster than NFS:
>>>>
>>>> FUSE:
>>>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096
>>>> bs=1048576 of=zeros.txt conv=sync
>>>> 4096+0 records in
>>>> 4096+0 records out
>>>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
>>>>
>>>>
>>>> NFS
>>>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096
>>>> bs=1048576 of=zeros.txt conv=sync
>>>> 4096+0 records in
>>>> 4096+0 records out
>>>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
>>>>
>>>>
>>>>
>>>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>>> Could you let me know the speed without oflag=sync
>>>>> on both the mounts? No need to collect profiles.
>>>>>
>>>>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <phaley at mit.edu> wrote:
>>>>>
>>>>>
>>>>> Here is what I see now:
>>>>>
>>>>> [root at mseas-data2 ~]# gluster volume info
>>>>>
>>>>> Volume Name: data-volume
>>>>> Type: Distribute
>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>> Status: Started
>>>>> Number of Bricks: 2
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>> Options Reconfigured:
>>>>> diagnostics.count-fop-hits: on
>>>>> diagnostics.latency-measurement: on
>>>>> nfs.exports-auth-enable: on
>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>> performance.readdir-ahead: on
>>>>> nfs.disable: on
>>>>> nfs.export-volumes: off
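>>>>>
>>>>> For what it's worth, the diagnostics.* options above are, as
>>>>> I understand it, what profiling turns on; they can also be
>>>>> set by hand, e.g.:
>>>>>
>>>>> gluster volume set data-volume diagnostics.latency-measurement on
>>>>> gluster volume set data-volume diagnostics.count-fop-hits on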
>>>>>
>>>>>
>>>>>
>>>>> On 05/10/2017 11:44 AM, Pranith Kumar
>>>>> Karampuri wrote:
>>>>>> Is this the volume info you have?
>>>>>>
>>>>>> > [root at mseas-data2 ~]# gluster volume info
>>>>>> >
>>>>>> > Volume Name: data-volume
>>>>>> > Type: Distribute
>>>>>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>> > Status: Started
>>>>>> > Number of Bricks: 2
>>>>>> > Transport-type: tcp
>>>>>> > Bricks:
>>>>>> > Brick1: mseas-data2:/mnt/brick1
>>>>>> > Brick2: mseas-data2:/mnt/brick2
>>>>>> > Options Reconfigured:
>>>>>> > performance.readdir-ahead: on
>>>>>> > nfs.disable: on
>>>>>> > nfs.export-volumes: off
>>>>>> I copied this from an old thread from 2016.
>>>>>> This is a distribute volume. Did you change any
>>>>>> of the options in between?
>>>>>
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: phaley at mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301