[Gluster-users] Slow write times to gluster disk

Pranith Kumar Karampuri pkarampu at redhat.com
Tue Jun 27 04:47:40 UTC 2017


On Mon, Jun 26, 2017 at 7:40 PM, Pat Haley <phaley at mit.edu> wrote:

>
> Hi All,
>
> Decided to try another test of gluster mounted via FUSE vs gluster
> mounted via NFS, this time using the software we run in production (i.e.
> our ocean model writing a netCDF file).
>
> gluster mounted via NFS: the run took 2.3 hr
>
> gluster mounted via FUSE: the run took 44.2 hr
>
> The only problem with using gluster mounted via NFS is that it does not
> respect the group write permissions which we need.
>
> We have an exercise coming up in a couple of weeks.  It seems to me
> that in order to improve our write times before then, it would be good to
> solve the group write permissions for gluster mounted via NFS now.  We can
> then revisit gluster mounted via FUSE afterwards.
>
> What information would you need to help us force gluster mounted via NFS
> to respect the group write permissions?
>

+Niels, +Jiffin

I have added two more engineers who work on NFS to check why this problem
happens in your environment. Let's see what information they need in order
to find and fix the issue.
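
In the meantime, one known cause of gluster NFS ignoring group write
permissions is the AUTH_SYS limit of 16 auxiliary groups per RPC request:
if the affected users belong to more than 16 groups, the group that grants
write access may never reach the server. I cannot confirm that is what is
happening in your environment, but a possible mitigation to test is letting
the bricks resolve the full group list server-side:

gluster volume set data-volume server.manage-gids on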


>
> Thanks
>
> Pat
>
>
>
>
> On 06/24/2017 01:43 AM, Pranith Kumar Karampuri wrote:
>
>
>
> On Fri, Jun 23, 2017 at 9:10 AM, Pranith Kumar Karampuri
> <pkarampu at redhat.com> wrote:
>
>>
>>
>> On Fri, Jun 23, 2017 at 2:23 AM, Pat Haley <phaley at mit.edu> wrote:
>>
>>>
>>> Hi,
>>>
>>> Today we experimented with some of the FUSE options that we found in the
>>> list.
>>>
>>> Changing these options had no effect:
>>>
>>> gluster volume set test-volume performance.cache-max-file-size 2MB
>>> gluster volume set test-volume performance.cache-refresh-timeout 4
>>> gluster volume set test-volume performance.cache-size 256MB
>>> gluster volume set test-volume performance.write-behind-window-size 4MB
>>> gluster volume set test-volume performance.write-behind-window-size 8MB
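>>>
>>> (If your gluster version supports it, 'gluster volume get test-volume all'
>>> lists the values actually in effect after these sets; older releases lack
>>> 'volume get'.)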
>>>
>>>
>> This is a good coincidence: I am meeting with the write-behind
>> maintainer (+Raghavendra G) today about this same question. I think we
>> will have something by EOD IST. I will update you.
>>
>
> Sorry, I forgot to update you. It seems there is a bug in write-behind,
> and the Facebook team sent a patch, http://review.gluster.org/16079, to fix
> it. But even with that patch I am not seeing any improvement; maybe I am
> doing something wrong. I will update you if I find anything more.
>
>>> Changing the following option from its default value made writes slower:
>>>
>>> gluster volume set test-volume performance.write-behind off (on by default)
>>>
>>> Changing the following options initially appeared to give a 10% increase
>>> in speed, but this vanished in subsequent tests (we think the apparent
>>> increase may have been due to a lighter workload on the computer from
>>> other users):
>>>
>>> gluster volume set test-volume performance.stat-prefetch on
>>> gluster volume set test-volume client.event-threads 4
>>> gluster volume set test-volume server.event-threads 4
>>>
>>> Can anything be gleaned from these observations?  Are there other things
>>> we can try?
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>>
>>> On 06/20/2017 12:06 PM, Pat Haley wrote:
>>>
>>>
>>> Hi Ben,
>>>
>>> Sorry this took so long, but we had a real-time forecasting exercise
>>> last week and I could only get to this now.
>>>
>>> Backend Hardware/OS:
>>>
>>>    - Much of the information on our back end system is included at the
>>>    top of http://lists.gluster.org/pipermail/gluster-users/2017-April/030529.html
>>>    - The specific model of the hard disks is Seagate ENTERPRISE
>>>    CAPACITY V.4 6TB (ST6000NM0024). The rated speed is 6 Gb/s (the SATA
>>>    interface rate, not sustained disk throughput).
>>>    - Note: there is one physical server that hosts both the NFS and the
>>>    GlusterFS areas
>>>
>>> Latest tests
>>>
>>> I have had time to run one of the dd tests you requested against the
>>> underlying XFS FS.  The median rate was 170 MB/s.  The dd results
>>> and iostat record are in
>>>
>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestXFS/
>>>
>>> I'll add tests for the other brick and to the NFS area later.
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>> On 06/12/2017 06:06 PM, Ben Turner wrote:
>>>
>>> Ok, you are correct, you have a pure distributed volume, i.e. no replication overhead.  So normally for pure dist I use:
>>>
>>> throughput = (slower of disk or NIC bandwidth) * 0.6-0.7
>>>
>>> In your case we have:
>>>
>>> 1200 MB/s * 0.6 = 720 MB/s
>>>
>>> So you are seeing a little less throughput than I would expect in your configuration.  What I like to do here is:
>>>
>>> -First, tell me more about your back end storage: will it sustain 1200 MB / sec?  What kind of HW?  How many disks?  What type and specs are the disks?  What kind of RAID are you using?
>>>
>>> -Second, can you refresh me on your workload?  Are you doing reads / writes or both?  If both, what mix?  Since we are using DD I assume you are working with large file sequential I/O, is this correct?
>>>
>>> -Run some DD tests on the back end XFS FS.  I normally have /xfs-mount/gluster-brick; if you have something similar, just mkdir a test dir on the XFS -> /xfs-mount/my-test-dir.  Inside the test dir run:
>>>
>>> If you are focusing on a write workload run:
>>>
>>> # dd if=/dev/zero of=/xfs-mount/file bs=1024k count=10000 conv=fdatasync
>>>
>>> If you are focusing on a read workload run:
>>>
>>> # echo 3 > /proc/sys/vm/drop_caches
>>> # dd if=/xfs-mount/file of=/dev/null bs=1024k count=10000
>>>
>>> ** MAKE SURE TO DROP CACHE IN BETWEEN READS!! **
>>>
>>> Run this in a loop similar to how you did in:
>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
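>>>
>>> For example, a minimal write-loop sketch (assuming the /xfs-mount/my-test-dir suggested above; for the read loop, add the drop_caches line before each dd):
>>>
>>> for i in $(seq 1 12); do
>>>   dd if=/dev/zero of=/xfs-mount/my-test-dir/file bs=1024k count=10000 conv=fdatasync 2>&1 | tail -n 1
>>> done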
>>>
>>> Run this on both servers one at a time and if you are running on a SAN then run again on both at the same time.  While this is running gather iostat for me:
>>>
>>> # iostat -c -m -x 1 > iostat-$(hostname).txt
>>>
>>> Let's see how the back end performs on both servers while capturing iostat, then see how the same workload / data looks on gluster.
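>>>
>>> One sketch for capturing both at once (kill %1 stops the backgrounded iostat when the dd finishes):
>>>
>>> # iostat -c -m -x 1 > iostat-$(hostname).txt &
>>> # dd if=/dev/zero of=/xfs-mount/my-test-dir/file bs=1024k count=10000 conv=fdatasync
>>> # kill %1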
>>>
>>> -Last thing, when you run your kernel NFS tests are you using the same filesystem / storage you are using for the gluster bricks?  I want to be sure we have an apples to apples comparison here.
>>>
>>> -b
>>>
>>>
>>>
>>> ----- Original Message -----
>>>
>>> From: "Pat Haley" <phaley at mit.edu> <phaley at mit.edu>
>>> To: "Ben Turner" <bturner at redhat.com> <bturner at redhat.com>
>>> Sent: Monday, June 12, 2017 5:18:07 PM
>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>
>>>
>>> Hi Ben,
>>>
>>> Here is the output:
>>>
>>> [root at mseas-data2 ~]# gluster volume info
>>>
>>> Volume Name: data-volume
>>> Type: Distribute
>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>> Status: Started
>>> Number of Bricks: 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: mseas-data2:/mnt/brick1
>>> Brick2: mseas-data2:/mnt/brick2
>>> Options Reconfigured:
>>> nfs.exports-auth-enable: on
>>> diagnostics.brick-sys-log-level: WARNING
>>> performance.readdir-ahead: on
>>> nfs.disable: on
>>> nfs.export-volumes: off
>>>
>>>
>>> On 06/12/2017 05:01 PM, Ben Turner wrote:
>>>
>>> What is the output of gluster v info?  That will tell us more about your
>>> config.
>>>
>>> -b
>>>
>>> ----- Original Message -----
>>>
>>> From: "Pat Haley" <phaley at mit.edu> <phaley at mit.edu>
>>> To: "Ben Turner" <bturner at redhat.com> <bturner at redhat.com>
>>> Sent: Monday, June 12, 2017 4:54:00 PM
>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>
>>>
>>> Hi Ben,
>>>
>>> I guess I'm confused about what you mean by replication.  If I look at
>>> the underlying bricks I only ever have a single copy of any file.  It
>>> either resides on one brick or the other  (directories exist on both
>>> bricks but not files).  We are not using gluster for redundancy (or at
>>> least that wasn't our intent).   Is that what you meant by replication
>>> or is it something else?
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>> On 06/12/2017 04:28 PM, Ben Turner wrote:
>>>
>>> ----- Original Message -----
>>>
>>> From: "Pat Haley" <phaley at mit.edu> <phaley at mit.edu>
>>> To: "Ben Turner" <bturner at redhat.com> <bturner at redhat.com>, "Pranith Kumar Karampuri"<pkarampu at redhat.com> <pkarampu at redhat.com>
>>> Cc: "Ravishankar N" <ravishankar at redhat.com> <ravishankar at redhat.com>, gluster-users at gluster.org,
>>> "Steve Postma" <SPostma at ztechnet.com> <SPostma at ztechnet.com>
>>> Sent: Monday, June 12, 2017 2:35:41 PM
>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>
>>>
>>> Hi Guys,
>>>
>>> I was wondering what our next steps should be to solve the slow write
>>> times.
>>>
>>> Recently I was debugging a large code that writes a lot of output at
>>> every time step.  When I tried writing to our gluster disks, it was
>>> taking over a day to do a single time step, whereas if I had the same
>>> program (same hardware, network) write to our nfs disk the time per
>>> time-step was about 45 minutes.  What we are shooting for here is
>>> to get similar times whether we write to gluster or to nfs.
>>>
>>> I can see in your test:
>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>>
>>> You averaged ~600 MB / sec (expected for replica 2 with 10G: {~1200 MB /
>>> sec} / #replicas{2} = 600).  Gluster does client side replication, so with
>>> replica 2 you will only ever see 1/2 the speed of the slowest part of the
>>> stack (NW, disk, RAM, CPU).  This is usually NW or disk, and 600 is
>>> normally a best case.  Now in your output I do see instances where you
>>> went down to 200 MB / sec.  I can only explain this in three ways:
>>>
>>> 1.  You are not using conv=fdatasync and writes are actually going to page
>>> cache and then being flushed to disk.  During the fsync the memory is not
>>> yet available and the disks are busy flushing dirty pages.
>>> 2.  Your storage RAID group is shared across multiple LUNs (like in a SAN)
>>> and when write times are slow the RAID group is busy servicing other
>>> LUNs.
>>> 3.  Gluster bug / config issue / some other unknown unknown.
>>>
>>> So I see 2 issues here:
>>>
>>> 1.  NFS does in 45 minutes what takes gluster over 24 hours.
>>> 2.  Sometimes your throughput drops dramatically.
>>>
>>> WRT #1 - have a look at my estimates above.  My formula for guesstimating
>>> gluster perf is: throughput = (slower of NIC or storage bandwidth) /
>>> #replicas * overhead (figure 0.7 or 0.8).  Also, the larger the
>>> record size the better for glusterfs mounts; I normally like to be at
>>> LEAST 64k, up to 1024k:
>>>
>>> # dd if=/dev/zero of=/gluster-mount/file bs=1024k count=10000
>>> conv=fdatasync
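>>>
>>> As a worked example of that formula (assuming a 10G NIC at ~1200 MB / sec is the slower component and replica 2): 1200 / 2 * 0.7 = ~420 MB / sec, or with 0.8 overhead ~480 MB / sec, which brackets a realistic write target.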
>>>
>>> WRT #2 - Again, I question your testing and your storage config.  Try
>>> using
>>> conv=fdatasync for your DDs, use a larger record size, and make sure that
>>> your back end storage is not causing your slowdowns.  Also remember that
>>> with replica 2 you will take a ~50% hit on writes because the client uses
>>> 50% of its bandwidth to write to one replica and 50% to the other.
>>>
>>> -b
>>>
>>>
>>>
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>> On 06/02/2017 01:07 AM, Ben Turner wrote:
>>>
>>> Are you sure using conv=sync is what you want?  I normally use
>>> conv=fdatasync; I'll look up the difference between the two and see if
>>> it affects your test.
>>>
>>>
>>> -b
>>>
>>> ----- Original Message -----
>>>
>>> From: "Pat Haley" <phaley at mit.edu> <phaley at mit.edu>
>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com> <pkarampu at redhat.com>
>>> Cc: "Ravishankar N" <ravishankar at redhat.com> <ravishankar at redhat.com>,gluster-users at gluster.org,
>>> "Steve Postma" <SPostma at ztechnet.com> <SPostma at ztechnet.com>, "Ben
>>> Turner" <bturner at redhat.com> <bturner at redhat.com>
>>> Sent: Tuesday, May 30, 2017 9:40:34 PM
>>> Subject: Re: [Gluster-users] Slow write times to gluster disk
>>>
>>>
>>> Hi Pranith,
>>>
>>> The "dd" command was:
>>>
>>>         dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>
>>> There were 2 instances where dd reported 22 seconds.  The output from the
>>> dd tests is in
>>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
>>>
>>> Pat
>>>
>>> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
>>>
>>> Pat,
>>>       What is the command you used? As per the following output, it seems
>>> like at least one write operation took 16 seconds, which is really bad:
>>>
>>>       96.39    1165.10 us    89.00 us    16487014.00 us    393212    WRITE
>>>
>>>
>>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>>        Hi Pranith,
>>>
>>>        I ran the same 'dd' test both in the gluster test volume and in
>>>        the .glusterfs directory of each brick.  The median results (12 dd
>>>        trials in each test) are similar to before:
>>>
>>>          * gluster test volume: 586.5 MB/s
>>>          * bricks (in .glusterfs): 1.4 GB/s
>>>
>>>        The profile for the gluster test-volume is in
>>>
>>>        http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
>>>
>>>        Thanks
>>>
>>>        Pat
>>>
>>>
>>>
>>>
>>>        On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
>>>
>>>        Let's start with the same 'dd' test we were testing with, to see
>>>        what the numbers are.  Please provide profile numbers for the
>>>        same.  From there on we will start tuning the volume to see what
>>>        we can do.
>>>
>>>        On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>>            Hi Pranith,
>>>
>>>            Thanks for the tip.  We now have the gluster volume mounted
>>>            under /home.  What tests do you recommend we run?
>>>
>>>            Thanks
>>>
>>>            Pat
>>>
>>>
>>>
>>>            On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
>>>
>>>            On Tue, May 16, 2017 at 9:20 PM, Pat Haley
>>>            <phaley at mit.edu> wrote:
>>>
>>>
>>>                Hi Pranith,
>>>
>>>                Sorry for the delay.  I never received your reply
>>>                (but I did receive Ben Turner's follow-up to it).
>>>                So we tried to create a gluster volume under /home
>>>                using different variations of
>>>
>>>                gluster volume create test-volume
>>>                mseas-data2:/home/gbrick_test_1
>>>                mseas-data2:/home/gbrick_test_2 transport tcp
>>>
>>>                However we keep getting errors of the form
>>>
>>>                Wrong brick type: transport, use
>>>                <HOSTNAME>:<export-dir-abs-path>
>>>
>>>                Any thoughts on what we're doing wrong?
>>>
>>>
>>>            You should give 'transport tcp' before the brick list, I
>>>            think.  Anyway, transport tcp is the default, so there is no
>>>            need to specify it; just remove those two words from the
>>>            command.
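>>>
>>>            A minimal sketch of both working forms, using the bricks from
>>>            your attempt:
>>>
>>>            # gluster volume create test-volume transport tcp mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2
>>>
>>>            or, relying on the tcp default:
>>>
>>>            # gluster volume create test-volume mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2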
>>>
>>>
>>>                Also, do you have a list of the tests we should be
>>>                running once we get this volume created?  Given the
>>>                time-zone difference it might help if we can run a small
>>>                battery of tests and post the results, rather than
>>>                test-post-new test-post... .
>>>
>>>
>>>            This is the first time I am doing performance analysis
>>>            directly with users, as far as I remember.  In our team there
>>>            are separate engineers who do these tests; Ben, who replied
>>>            earlier, is one such engineer.
>>>
>>>            Ben, do you have any suggestions?
>>>
>>>
>>>                Thanks
>>>
>>>                Pat
>>>
>>>
>>>
>>>                On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
>>>
>>>                On Thu, May 11, 2017 at 9:32 PM, Pat Haley
>>>                <phaley at mit.edu> wrote:
>>>
>>>
>>>                    Hi Pranith,
>>>
>>>                    The /home partition is mounted as ext4:
>>>                        /home ext4 defaults,usrquota,grpquota 1 2
>>>
>>>                    The brick partitions are mounted as xfs:
>>>                        /mnt/brick1 xfs defaults 0 0
>>>                        /mnt/brick2 xfs defaults 0 0
>>>
>>>                    Will this cause a problem with creating a volume
>>>                    under /home?
>>>
>>>
>>>                I don't think the bottleneck is the disk.  Can you run
>>>                the same tests you did on your new volume to confirm?
>>>
>>>
>>>                    Pat
>>>
>>>
>>>
>>>                    On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
>>>
>>>                    On Thu, May 11, 2017 at 8:57 PM, Pat Haley
>>>                    <phaley at mit.edu> wrote:
>>>
>>>
>>>                        Hi Pranith,
>>>
>>>                        Unfortunately, we don't have similar hardware
>>>                        for a small scale test.  All we have is our
>>>                        production hardware.
>>>
>>>
>>>                    You said something about a /home partition with
>>>                    fewer disks; we can create a plain distribute volume
>>>                    inside one of its directories.  After we are done, we
>>>                    can remove the setup.  What do you say?
>>>
>>>
>>>                        Pat
>>>
>>>
>>>
>>>
>>>                        On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
>>>
>>>                        On Thu, May 11, 2017 at 2:48 AM, Pat Haley
>>>                        <phaley at mit.edu> wrote:
>>>
>>>
>>>                            Hi Pranith,
>>>
>>>                            Since we are mounting the partitions as the
>>>                            bricks, I tried the dd test writing to
>>>                            <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
>>>                            The results without oflag=sync were 1.6 GB/s
>>>                            (faster than gluster but not as fast as I was
>>>                            expecting, given the 1.2 GB/s to the
>>>                            no-gluster area w/ fewer disks).
>>>
>>>
>>>                        Okay, then 1.6 GB/s is what we need to target,
>>>                        considering your volume is just distribute.  Is
>>>                        there any way you can do tests on similar hardware
>>>                        but at a small scale, just so we can run the
>>>                        workload and learn more about the bottlenecks in
>>>                        the system?  We can probably try to get the speed
>>>                        to 1.2 GB/s on the /home partition you were
>>>                        telling me about yesterday.  Let me know if that
>>>                        is something you are okay to do.
>>>
>>>
>>>                            Pat
>>>
>>>
>>>
>>>                            On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
>>>
>>>                            On Wed, May 10, 2017 at 10:15 PM, Pat Haley
>>>                            <phaley at mit.edu> wrote:
>>>
>>>
>>>                                Hi Pranith,
>>>
>>>                                Not entirely sure (this isn't my area of
>>>                                expertise).  I'll run your answer by some
>>>                                other people who are more familiar with
>>>                                this.
>>>
>>>                                I am also uncertain about how to interpret
>>>                                the results when we also add the dd tests
>>>                                writing to the /home area (no gluster,
>>>                                still on the same machine):
>>>
>>>                                  * dd test without oflag=sync (rough
>>>                                    average of multiple tests)
>>>                                      o gluster w/ fuse mount: 570 MB/s
>>>                                      o gluster w/ nfs mount: 390 MB/s
>>>                                      o nfs (no gluster): 1.2 GB/s
>>>                                  * dd test with oflag=sync (rough
>>>                                    average of multiple tests)
>>>                                      o gluster w/ fuse mount: 5 MB/s
>>>                                      o gluster w/ nfs mount: 200 MB/s
>>>                                      o nfs (no gluster): 20 MB/s
>>>
>>>                                Given that the non-gluster area is a
>>>                                RAID-6 of 4 disks while each brick of the
>>>                                gluster area is a RAID-6 of 32 disks, I
>>>                                would naively expect writes to the gluster
>>>                                area to be roughly 8x faster than to the
>>>                                non-gluster area.
>>>
>>>
>>>                            I think a better test is to try to write a
>>>                            file using nfs, without any gluster, to a
>>>                            location that is not inside the brick but some
>>>                            other location on the same disk(s).  If you
>>>                            are mounting the partition as the brick, then
>>>                            we can write to a file inside the .glusterfs
>>>                            directory, something like
>>>                            <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
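>>>
>>>                            For example, a sketch using the brick paths
>>>                            from your volume info (remove the test file
>>>                            when done):
>>>
>>>                            # dd if=/dev/zero of=/mnt/brick1/.glusterfs/ddtest.tmp bs=1048576 count=4096
>>>                            # rm /mnt/brick1/.glusterfs/ddtest.tmp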
>>>
>>>
>>>
>>>                                I still think we have a speed issue; I
>>>                                can't tell if fuse vs nfs is part of the
>>>                                problem.
>>>
>>>
>>>                            I got interested in the post because I read
>>>                            that fuse speed is less than nfs speed, which
>>>                            is counter-intuitive to my understanding, so I
>>>                            wanted clarifications.  Now that I got my
>>>                            clarifications, where fuse outperformed nfs
>>>                            without sync, we can resume testing as
>>>                            described above and try to find what it is.
>>>                            Based on your email-id I am guessing you are
>>>                            from Boston and I am from Bangalore, so if you
>>>                            are okay with doing this debugging over
>>>                            multiple days because of the timezones, I will
>>>                            be happy to help.  Please be a bit patient
>>>                            with me; I am under a release crunch, but I am
>>>                            very curious about the problem you posted.
>>>
>>>                                Was there anything useful in the
>>>                                profiles?
>>>
>>>                            Unfortunately the profiles didn't help me
>>>                            much.  I think we are collecting the profiles
>>>                            from an active volume, so they have a lot of
>>>                            information that does not pertain to dd, which
>>>                            makes it difficult to find dd's contribution.
>>>                            So I went through your post again and found
>>>                            something I didn't pay much attention to
>>>                            earlier, i.e. oflag=sync, so I did my own
>>>                            tests on my setup with FUSE and sent that
>>>                            reply.
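>>>
>>>                            One way to cut down that noise on a rerun (a
>>>                            sketch, assuming the standard profile CLI):
>>>
>>>                            # gluster volume profile data-volume info clear
>>>                            # dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
>>>                            # gluster volume profile data-volume info > profile-dd-only.txt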
>>>
>>>
>>>                                Pat
>>>
>>>
>>>
>>>                                On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
>>>
>>>                                Okay, good.  At least this validates my
>>>                                doubts.  Handling O_SYNC in gluster NFS
>>>                                and fuse is a bit different.  When an
>>>                                application opens a file with O_SYNC on a
>>>                                fuse mount, each write syscall has to be
>>>                                written to disk as part of the syscall,
>>>                                whereas in the case of NFS there is no
>>>                                concept of open: NFS performs the write
>>>                                through a handle, saying it needs to be a
>>>                                synchronous write, so the write() syscall
>>>                                is performed first and then it performs
>>>                                fsync().  So a write on an fd with O_SYNC
>>>                                becomes write+fsync.  I suspect that when
>>>                                multiple threads do this write+fsync()
>>>                                operation on the same file, multiple
>>>                                writes are batched together to be written
>>>                                to disk, which is my guess for why the
>>>                                throughput on the disk increases.
>>>
>>>                                Does that answer your doubts?
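>>>
>>>                                A small dd sketch of the two patterns
>>>                                (paths are placeholders; oflag=sync opens
>>>                                the file with O_SYNC, while conv=fsync
>>>                                loosely mimics buffered writes followed by
>>>                                a flush):
>>>
>>>                                # dd if=/dev/zero of=/fuse-mount/f bs=1M count=100 oflag=sync
>>>                                # dd if=/dev/zero of=/fuse-mount/f bs=1M count=100 conv=fsync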
>>>
>>>                                On Wed, May 10, 2017 at 9:35 PM,
>>>                                Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>>                                    Without the oflag=sync and only a
>>>                                    single test of each, the FUSE is going
>>>                                    faster than NFS:
>>>
>>>                                    FUSE:
>>>                                    mseas-data2(dri_nascar)% dd
>>>                                    if=/dev/zero count=4096
>>>                                    bs=1048576 of=zeros.txt
>>>                                    conv=sync
>>>                                    4096+0 records in
>>>                                    4096+0 records out
>>>                                    4294967296 bytes (4.3 GB)
>>>                                    copied, 7.46961 s, 575 MB/s
>>>
>>>
>>>                                    NFS
>>>                                    mseas-data2(HYCOM)% dd
>>>                                    if=/dev/zero count=4096
>>>                                    bs=1048576 of=zeros.txt
>>>                                    conv=sync
>>>                                    4096+0 records in
>>>                                    4096+0 records out
>>>                                    4294967296 bytes (4.3 GB)
>>>                                    copied, 11.4264 s, 376 MB/s
>>>
>>>
>>>
>>>                                    On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
>>>
>>>                                    Could you let me know the speed
>>>                                    without oflag=sync on both the mounts?
>>>                                    No need to collect profiles.
>>>
>>>                                    On Wed, May 10, 2017 at 9:17 PM,
>>>                                    Pat Haley <phaley at mit.edu> wrote:
>>>
>>>
>>>                                        Here is what I see now:
>>>
>>>                                        [root at mseas-data2 ~]# gluster volume info
>>>
>>>                                        Volume Name: data-volume
>>>                                        Type: Distribute
>>>                                        Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>                                        Status: Started
>>>                                        Number of Bricks: 2
>>>                                        Transport-type: tcp
>>>                                        Bricks:
>>>                                        Brick1: mseas-data2:/mnt/brick1
>>>                                        Brick2: mseas-data2:/mnt/brick2
>>>                                        Options Reconfigured:
>>>                                        diagnostics.count-fop-hits: on
>>>                                        diagnostics.latency-measurement: on
>>>                                        nfs.exports-auth-enable: on
>>>                                        diagnostics.brick-sys-log-level: WARNING
>>>                                        performance.readdir-ahead: on
>>>                                        nfs.disable: on
>>>                                        nfs.export-volumes: off
>>>
>>>
>>>
>>>                                        On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
>>>
>>>                                        Is this the volume info you have?
>>>
>>>                                        > [root at mseas-data2 ~]# gluster volume info
>>>                                        >
>>>                                        > Volume Name: data-volume
>>>                                        > Type: Distribute
>>>                                        > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>                                        > Status: Started
>>>                                        > Number of Bricks: 2
>>>                                        > Transport-type: tcp
>>>                                        > Bricks:
>>>                                        > Brick1: mseas-data2:/mnt/brick1
>>>                                        > Brick2: mseas-data2:/mnt/brick2
>>>                                        > Options Reconfigured:
>>>                                        > performance.readdir-ahead: on
>>>                                        > nfs.disable: on
>>>                                        > nfs.export-volumes: off
>>>
>>>                                        I copied this from an old thread
>>>                                        from 2016.  This is a distribute
>>>                                        volume.  Did you change any of the
>>>                                        options in between?
>>>
>
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley                          Email:  phaley at mit.edu
> Center for Ocean Engineering       Phone:  (617) 253-6824
> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA  02139-4301
>
>


-- 
Pranith