[Gluster-users] Slow write times to gluster disk
Niels de Vos
ndevos at redhat.com
Tue Jun 27 08:13:17 UTC 2017
On Tue, Jun 27, 2017 at 10:17:40AM +0530, Pranith Kumar Karampuri wrote:
> On Mon, Jun 26, 2017 at 7:40 PM, Pat Haley <phaley at mit.edu> wrote:
>
> >
> > Hi All,
> >
> > We decided to try another test of gluster mounted via FUSE vs gluster
> > mounted via NFS, this time using the software we run in production (i.e.
> > our ocean model writing a netCDF file).
> >
> > gluster mounted via NFS: the run took 2.3 hr
> >
> > gluster mounted via FUSE: the run took 44.2 hr
> >
> > The only problem with using gluster mounted via NFS is that it does not
> > respect the group write permissions which we need.
> >
> > We have an exercise coming up in a couple of weeks. It seems to me
> > that in order to improve our write times before then, it would be good to
> > solve the group write permissions for gluster mounted via NFS now. We can
> > then revisit gluster mounted via FUSE afterwards.
> >
> > What information would you need to help us force gluster mounted via NFS
> > to respect the group write permissions?
> >
>
> +Niels, +Jiffin
>
> I added 2 more guys who work on NFS to check why this problem happens in
> your environment. Let's see what information they may need to find the
> problem and solve this issue.
Hi Pat,
depending on the number of groups that a user is part of, you may need
to change some volume options. A complete description of the limitations
on the number of groups can be found here:
https://github.com/gluster/glusterdocs/blob/master/Administrator%20Guide/Handling-of-users-with-many-groups.md
HTH,
Niels
>
>
> >
> > Thanks
> >
> > Pat
> >
> >
> >
> >
> > On 06/24/2017 01:43 AM, Pranith Kumar Karampuri wrote:
> >
> >
> >
> > On Fri, Jun 23, 2017 at 9:10 AM, Pranith Kumar Karampuri <
> > pkarampu at redhat.com> wrote:
> >
> >>
> >>
> >> On Fri, Jun 23, 2017 at 2:23 AM, Pat Haley <phaley at mit.edu> wrote:
> >>
> >>>
> >>> Hi,
> >>>
> >>> Today we experimented with some of the FUSE options that we found in the
> >>> list.
> >>>
> >>> Changing these options had no effect:
> >>>
> >>> gluster volume set test-volume performance.cache-max-file-size 2MB
> >>> gluster volume set test-volume performance.cache-refresh-timeout 4
> >>> gluster volume set test-volume performance.cache-size 256MB
> >>> gluster volume set test-volume performance.write-behind-window-size 4MB
> >>> gluster volume set test-volume performance.write-behind-window-size 8MB
> >>>
> >>>
> >> This is a good coincidence, I am meeting with write-behind
> >> maintainer(+Raghavendra G) today for the same doubt. I think we will have
> >> something by EOD IST. I will update you.
> >>
> >
> > Sorry, forgot to update you. It seems there is a bug in write-behind,
> > and the Facebook developers sent a patch http://review.gluster.org/16079 to
> > fix it. But even with that I am not seeing any improvement. Maybe I am
> > doing something wrong. Will update you if I find anything more.
> >
> >>> Changing the following option from its default value made the speed slower:
> >>>
> >>> gluster volume set test-volume performance.write-behind off (on by default)
> >>>
> >>> Changing the following options initially appeared to give a 10% increase
> >>> in speed, but this vanished in subsequent tests (we think the apparent
> >>> increase may have been due to a lighter workload on the computer from other
> >>> users):
> >>>
> >>> gluster volume set test-volume performance.stat-prefetch on
> >>> gluster volume set test-volume client.event-threads 4
> >>> gluster volume set test-volume server.event-threads 4
> >>>
> >>> Can anything be gleaned from these observations? Are there other things
> >>> we can try?
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>>
> >>>
> >>> On 06/20/2017 12:06 PM, Pat Haley wrote:
> >>>
> >>>
> >>> Hi Ben,
> >>>
> >>> Sorry this took so long, but we had a real-time forecasting exercise
> >>> last week and I could only get to this now.
> >>>
> >>> Backend Hardware/OS:
> >>>
> >>> - Much of the information on our back end system is included at the
> >>> top of http://lists.gluster.org/pipermail/gluster-users/2017-April/030529.html
> >>> - The specific model of the hard disks is Seagate ENTERPRISE
> >>> CAPACITY V.4 6TB (ST6000NM0024). The rated interface speed is 6 Gb/s.
> >>> - Note: there is one physical server that hosts both the NFS and the
> >>> GlusterFS areas
> >>>
> >>> Latest tests
> >>>
> >>> I have had time to run one of the dd tests you requested against the
> >>> underlying XFS file system. The median rate was 170 MB/s. The dd results
> >>> and iostat record are in
> >>>
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestXFS/
> >>>
> >>> I'll add tests for the other brick and to the NFS area later.
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>>
> >>> On 06/12/2017 06:06 PM, Ben Turner wrote:
> >>>
> >>> OK, you are correct: you have a pure distributed volume, i.e. no replication overhead. For pure distribute I normally use:
> >>>
> >>> throughput = min(disk throughput, NIC throughput) * .6-.7
> >>>
> >>> In your case we have:
> >>>
> >>> 1200 MB/s * .6 = 720 MB/s
> >>>
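Ben's rule of thumb can be written as a small function. This is a sketch only: estimate_throughput is a hypothetical helper, and the overhead factor (0.6 to 0.8 in this thread) and the ~1200 MB/s for a 10G link are his rough estimates, not measured constants. The replicas parameter covers the replica-2 reasoning in his earlier message, where client-side replication splits the client's bandwidth between replicas.

```python
def estimate_throughput(disk_mb_s: float, nic_mb_s: float,
                        replicas: int = 1, overhead: float = 1.0) -> float:
    """Rough Gluster write-throughput estimate in MB/s.

    Takes the slower of back-end storage and network, divides by the
    number of replicas (the client sends every write to each replica),
    then scales by an empirical overhead factor.
    """
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    bottleneck = min(disk_mb_s, nic_mb_s)
    return bottleneck / replicas * overhead


if __name__ == "__main__":
    # 1400 MB/s: roughly the brick-level dd rate Pat reported (1.4 GB/s),
    # so the ~1200 MB/s 10G link is the bottleneck here.
    print(estimate_throughput(1400, 1200, overhead=0.6))  # pure distribute
    print(estimate_throughput(1400, 1200, replicas=2))    # replica 2, no overhead
```

With the NIC as the bottleneck, the first call reproduces Ben's 720 MB/s figure and the second his ~600 MB/s replica-2 expectation.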
> >>> So you are seeing a little less throughput than I would expect in your configuration. What I like to do here is:
> >>>
> >>> -First tell me more about your back end storage, will it sustain 1200 MB / sec? What kind of HW? How many disks? What type and specs are the disks? What kind of RAID are you using?
> >>>
> >>> -Second, can you refresh me on your workload? Are you doing reads / writes or both? If both, what mix? Since we are using dd, I assume you are working with large-file sequential I/O; is this correct?
> >>>
> >>> -Run some dd tests on the back-end XFS file system. I normally have /xfs-mount/gluster-brick; if you have something similar, just mkdir a test directory on the XFS, e.g. /xfs-mount/my-test-dir. Inside the test dir run:
> >>>
> >>> If you are focusing on a write workload run:
> >>>
> >>> # dd if=/dev/zero of=/xfs-mount/file bs=1024k count=10000 conv=fdatasync
> >>>
> >>> If you are focusing on a read workload run:
> >>>
> >>> # echo 3 > /proc/sys/vm/drop_caches
> >>> # dd if=/xfs-mount/file of=/dev/null bs=1024k count=10000
> >>>
> >>> ** MAKE SURE TO DROP CACHE IN BETWEEN READS!! **
> >>>
> >>> Run this in a loop similar to how you did in:
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
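A minimal Python sketch of that loop, assuming GNU dd and root access (needed to write /proc/sys/vm/drop_caches). The helper names (parse_dd_rate, write_cmd, read_cmd, drop_caches, run_trials) are hypothetical, not part of any Gluster tooling. It flushes and drops the page cache before every read, as Ben insists, and reports the median of 12 trials, the statistic Pat uses elsewhere in the thread.

```python
from __future__ import annotations

import re
import statistics
import subprocess

# GNU dd summary line, e.g. "4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s".
# Newer coreutils print "(4.3 GB, 4.0 GiB)" in the parentheses; that matches too.
DD_SUMMARY = re.compile(r"^(\d+) bytes .*copied, ([\d.]+) s", re.MULTILINE)


def parse_dd_rate(stderr_text: str) -> float:
    """Throughput in MB/s (10**6 bytes per second, as dd reports it)."""
    match = DD_SUMMARY.search(stderr_text)
    if match is None:
        raise ValueError("no dd summary line found")
    nbytes, seconds = int(match.group(1)), float(match.group(2))
    return nbytes / seconds / 1e6


def write_cmd(path: str, count: int = 10000) -> list[str]:
    # Sequential 1 MiB writes; conv=fdatasync flushes before dd reports.
    return ["dd", "if=/dev/zero", f"of={path}", "bs=1024k",
            f"count={count}", "conv=fdatasync"]


def read_cmd(path: str, count: int = 10000) -> list[str]:
    return ["dd", f"if={path}", "of=/dev/null", "bs=1024k", f"count={count}"]


def drop_caches() -> None:
    # Needs root: write back dirty pages, then drop page cache, dentries, inodes.
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def run_trials(path: str, trials: int = 12) -> tuple[float, float]:
    """Median write and read throughput (MB/s) over several dd runs."""
    writes, reads = [], []
    for _ in range(trials):
        w = subprocess.run(write_cmd(path), capture_output=True, text=True, check=True)
        writes.append(parse_dd_rate(w.stderr))
        drop_caches()  # make the read come from disk, not the page cache
        r = subprocess.run(read_cmd(path), capture_output=True, text=True, check=True)
        reads.append(parse_dd_rate(r.stderr))
    return statistics.median(writes), statistics.median(reads)
```

For example, run_trials("/xfs-mount/file") would be the back-end run; pointing it at a file on the gluster mount gives the comparison Ben asks for.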
> >>>
> >>> Run this on both servers one at a time and if you are running on a SAN then run again on both at the same time. While this is running gather iostat for me:
> >>>
> >>> # iostat -c -m -x 1 > iostat-$(hostname).txt
> >>>
> >>> Let's see how the back end performs on both servers while capturing iostat, then see how the same workload / data looks on gluster.
> >>>
> >>> -Last thing, when you run your kernel NFS tests are you using the same filesystem / storage you are using for the gluster bricks? I want to be sure we have an apples to apples comparison here.
> >>>
> >>> -b
> >>>
> >>>
> >>>
> >>> ----- Original Message -----
> >>>
> >>> From: "Pat Haley" <phaley at mit.edu>
> >>> To: "Ben Turner" <bturner at redhat.com>
> >>> Sent: Monday, June 12, 2017 5:18:07 PM
> >>> Subject: Re: [Gluster-users] Slow write times to gluster disk
> >>>
> >>>
> >>> Hi Ben,
> >>>
> >>> Here is the output:
> >>>
> >>> [root at mseas-data2 ~]# gluster volume info
> >>>
> >>> Volume Name: data-volume
> >>> Type: Distribute
> >>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> >>> Status: Started
> >>> Number of Bricks: 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: mseas-data2:/mnt/brick1
> >>> Brick2: mseas-data2:/mnt/brick2
> >>> Options Reconfigured:
> >>> nfs.exports-auth-enable: on
> >>> diagnostics.brick-sys-log-level: WARNING
> >>> performance.readdir-ahead: on
> >>> nfs.disable: on
> >>> nfs.export-volumes: off
> >>>
> >>>
> >>> On 06/12/2017 05:01 PM, Ben Turner wrote:
> >>>
> >>> What is the output of gluster v info? That will tell us more about your
> >>> config.
> >>>
> >>> -b
> >>>
> >>> ----- Original Message -----
> >>>
> >>> From: "Pat Haley" <phaley at mit.edu>
> >>> To: "Ben Turner" <bturner at redhat.com>
> >>> Sent: Monday, June 12, 2017 4:54:00 PM
> >>> Subject: Re: [Gluster-users] Slow write times to gluster disk
> >>>
> >>>
> >>> Hi Ben,
> >>>
> >>> I guess I'm confused about what you mean by replication. If I look at
> >>> the underlying bricks I only ever have a single copy of any file. It
> >>> either resides on one brick or the other (directories exist on both
> >>> bricks but not files). We are not using gluster for redundancy (or at
> >>> least that wasn't our intent). Is that what you meant by replication
> >>> or is it something else?
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>> On 06/12/2017 04:28 PM, Ben Turner wrote:
> >>>
> >>> ----- Original Message -----
> >>>
> >>> From: "Pat Haley" <phaley at mit.edu>
> >>> To: "Ben Turner" <bturner at redhat.com>, "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> >>> Cc: "Ravishankar N" <ravishankar at redhat.com>, gluster-users at gluster.org,
> >>> "Steve Postma" <SPostma at ztechnet.com>
> >>> Sent: Monday, June 12, 2017 2:35:41 PM
> >>> Subject: Re: [Gluster-users] Slow write times to gluster disk
> >>>
> >>>
> >>> Hi Guys,
> >>>
> >>> I was wondering what our next steps should be to solve the slow write
> >>> times.
> >>>
> >>> Recently I was debugging a large code and writing a lot of output at
> >>> every time step. When I tried writing to our gluster disks, it was
> >>> taking over a day to do a single time step whereas if I had the same
> >>> program (same hardware, network) write to our nfs disk the time per
> >>> time-step was about 45 minutes. What we are shooting for here would be
> >>> to have similar times on either gluster or nfs.
> >>>
> >>> I can see in your test:
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
> >>>
> >>> You averaged ~600 MB/sec (expected for replica 2 with 10G: ~1200 MB/sec
> >>> / 2 replicas = 600). Gluster does client-side replication, so with
> >>> replica 2 you will only ever see 1/2 the speed of the slowest part of
> >>> the stack (NW, disk, RAM, CPU). This is usually NW or disk, and 600 is
> >>> normally a best case. Now, in your output I do see the instances where
> >>> you went down to 200 MB/sec. I can only explain this in three ways:
> >>>
> >>> 1. You are not using conv=fdatasync, and writes are actually going to
> >>> page cache and then being flushed to disk. During the fsync the memory
> >>> is not yet available and the disks are busy flushing dirty pages.
> >>> 2. Your storage RAID group is shared across multiple LUNs (like in a
> >>> SAN), and when write times are slow the RAID group is busy servicing
> >>> other LUNs.
> >>> 3. Gluster bug / config issue / some other unknown unknown.
> >>>
> >>> So I see 2 issues here:
> >>>
> >>> 1. NFS does in 45 minutes what gluster takes 24 hours to do.
> >>> 2. Sometimes your throughput drops dramatically.
> >>>
> >>> WRT #1 - have a look at my estimates above. My formula for guesstimating
> >>> gluster perf is: throughput = min(NIC throughput, storage throughput)
> >>> / # replicas * overhead (figure .7 or .8). Also, the larger the record
> >>> size the better for glusterfs mounts; I normally like to be at LEAST
> >>> 64k, up to 1024k:
> >>>
> >>> # dd if=/dev/zero of=/gluster-mount/file bs=1024k count=10000 conv=fdatasync
> >>>
> >>> WRT #2 - Again, I question your testing and your storage config. Try
> >>> using conv=fdatasync for your dd runs, use a larger record size, and make
> >>> sure that your back-end storage is not causing your slowdowns. Also
> >>> remember that with replica 2 you will take a ~50% hit on writes, because
> >>> the client uses 50% of its bandwidth to write to one replica and 50% to
> >>> the other.
> >>>
> >>> -b
> >>>
> >>>
> >>>
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>>
> >>> On 06/02/2017 01:07 AM, Ben Turner wrote:
> >>>
> >>> Are you sure using conv=sync is what you want? I normally use
> >>> conv=fdatasync; I'll look up the difference between the two and see if
> >>> it affects your test.
> >>>
> >>>
> >>> -b
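For reference, in GNU dd conv=sync pads every input block with NULs up to the input block size; it does not flush anything to disk. conv=fdatasync makes dd call fdatasync() on the output once before it finishes, so the reported time includes getting the data to disk, while oflag=sync opens the output with O_SYNC so every write is synchronous. Since reads from /dev/zero normally return full blocks, conv=sync most likely changes nothing in Pat's tests, and those timings largely reflect the page cache. The Python sketch below imitates the two write patterns; dd_like_write and its arguments are hypothetical.

```python
import os
import tempfile
import time


def dd_like_write(path: str, bs: int = 1 << 20, count: int = 64,
                  flush_at_end: bool = True) -> float:
    """Write `count` blocks of `bs` zero bytes; return elapsed seconds.

    flush_at_end=False: like plain dd; data may still be in the page cache.
    flush_at_end=True:  like dd conv=fdatasync; one fdatasync() at the end.
    """
    block = b"\0" * bs
    # os.fdatasync is not available everywhere (e.g. macOS); fall back to fsync.
    datasync = getattr(os, "fdatasync", os.fsync)
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(count):
            view = memoryview(block)
            while view:  # handle short writes
                view = view[os.write(fd, view):]
        if flush_at_end:
            datasync(fd)
    finally:
        os.close(fd)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "zeros.bin")
        total_mb = 64 * (1 << 20) / 1e6
        for flush in (False, True):
            secs = dd_like_write(target, flush_at_end=flush)
            print(f"flush_at_end={flush}: {total_mb / secs:.0f} MB/s")
```

On a machine with plenty of free memory, the unflushed run usually reports a much higher rate, which is the page-cache effect Ben describes.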
> >>>
> >>> ----- Original Message -----
> >>>
> >>> From: "Pat Haley" <phaley at mit.edu>
> >>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> >>> Cc: "Ravishankar N" <ravishankar at redhat.com>, gluster-users at gluster.org,
> >>> "Steve Postma" <SPostma at ztechnet.com>, "Ben Turner" <bturner at redhat.com>
> >>> Sent: Tuesday, May 30, 2017 9:40:34 PM
> >>> Subject: Re: [Gluster-users] Slow write times to gluster disk
> >>>
> >>>
> >>> Hi Pranith,
> >>>
> >>> The "dd" command was:
> >>>
> >>> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> >>>
> >>> There were 2 instances where dd reported 22 seconds. The output from
> >>> the dd tests is in
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
> >>>
> >>> Pat
> >>>
> >>> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
> >>>
> >>> Pat,
> >>> What is the command you used? As per the following output, it seems
> >>> like at least one write operation took 16 seconds, which is really bad.
> >>>
> >>> 96.39   1165.10 us   89.00 us   16487014.00 us   393212   WRITE
> >>>
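That line is one row of gluster volume profile <VOLNAME> info output. Assuming the usual column order (%-latency, Avg-latency, Min-Latency, Max-Latency, No. of calls, Fop, with latencies in microseconds), a small hypothetical parser makes the 16-second maximum explicit. The asterisks in the archived copy were emphasis around the maximum and are discarded.

```python
from dataclasses import dataclass


@dataclass
class FopLatency:
    pct_latency: float  # share of total latency spent in this fop, in %
    avg_us: float
    min_us: float
    max_us: float
    calls: int
    fop: str


def parse_profile_row(row: str) -> FopLatency:
    """Parse a row like '96.39 1165.10 us 89.00 us 16487014.00 us 393212 WRITE'."""
    # Drop the 'us' unit tokens and any emphasis asterisks left by the archive.
    tokens = [t for t in row.replace("*", " ").split() if t != "us"]
    pct, avg, lo, hi, calls, fop = tokens
    return FopLatency(float(pct), float(avg), float(lo), float(hi), int(calls), fop)


if __name__ == "__main__":
    row = parse_profile_row("96.39 1165.10 us 89.00 us*16487014.00 us* 393212 WRITE")
    print(f"{row.fop}: max {row.max_us / 1e6:.1f} s, "
          f"avg {row.avg_us / 1e3:.2f} ms over {row.calls} calls")
```

16487014 us is about 16.5 s, against an average of roughly 1.2 ms per write, which is why a single outlier stands out.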
> >>>
> >>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>
> >>>
> >>> Hi Pranith,
> >>>
> >>> I ran the same 'dd' test both in the gluster test volume and in the
> >>> .glusterfs directory of each brick. The median results (12 dd trials in
> >>> each test) are similar to before:
> >>>
> >>> * gluster test volume: 586.5 MB/s
> >>> * bricks (in .glusterfs): 1.4 GB/s
> >>>
> >>> The profile for the gluster test-volume is in
> >>>
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>>
> >>>
> >>>
> >>> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
> >>>
> >>> Let's start with the same 'dd' test we were testing with, to see what
> >>> the numbers are. Please provide profile numbers for the same. From
> >>> there on we will start tuning the volume to see what we can do.
> >>>
> >>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>
> >>>
> >>> Hi Pranith,
> >>>
> >>> Thanks for the tip. We now have the gluster volume mounted under
> >>> /home. What tests do you recommend we run?
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>>
> >>>
> >>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>
> >>>
> >>> Hi Pranith,
> >>>
> >>> Sorry for the delay. I never received your reply (but I did receive
> >>> Ben Turner's follow-up to your reply). So we tried to create a gluster
> >>> volume under /home using different variations of
> >>>
> >>> gluster volume create test-volume mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2 transport tcp
> >>>
> >>> However, we keep getting errors of the form
> >>>
> >>> Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>
> >>>
> >>> Any thoughts on what we're doing wrong?
> >>>
> >>>
> >>> I think you should give "transport tcp" before the bricks. Anyway, tcp
> >>> is the default transport, so there is no need to specify it; just remove
> >>> those two words from the CLI.
> >>>
> >>>
> >>> Also, do you have a list of the tests we should be running once we get
> >>> this volume created? Given the time-zone difference, it might help if we
> >>> can run a small battery of tests and post the results rather than
> >>> test-post-new test-post... .
> >>>
> >>>
> >>> As far as I remember, this is the first time I am doing performance
> >>> analysis for users. In our team there are separate engineers who do
> >>> these tests. Ben, who replied earlier, is one such engineer.
> >>>
> >>> Ben,
> >>> Do you have any suggestions?
> >>>
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>>
> >>>
> >>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
> >>>
> >>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>
> >>>
> >>> Hi Pranith,
> >>>
> >>> The /home partition is mounted as ext4:
> >>> /home ext4 defaults,usrquota,grpquota 1 2
> >>>
> >>> The brick partitions are mounted as xfs:
> >>> /mnt/brick1 xfs defaults 0 0
> >>> /mnt/brick2 xfs defaults 0 0
> >>>
> >>> Will this cause a problem with creating a volume under /home?
> >>>
> >>>
> >>> I don't think the bottleneck is disk. Can you do the same tests you did
> >>> on your new volume to confirm?
> >>>
> >>>
> >>> Pat
> >>>
> >>>
> >>>
> >>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>
> >>>
> >>> Hi Pranith,
> >>>
> >>> Unfortunately, we don't have similar hardware for a small-scale test.
> >>> All we have is our production hardware.
> >>>
> >>>
> >>> You said something about the /home partition, which has fewer disks; we
> >>> can create a plain distribute volume inside one of those directories.
> >>> After we are done, we can remove the setup. What do you say?
> >>>
> >>>
> >>> Pat
> >>>
> >>>
> >>>
> >>>
> >>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <phaley at mit.edu> wrote:
> >>>
> >>>
> >>> Hi Pranith,
> >>>
> >>> Since we are mounting the partitions as the bricks, I tried the dd test
> >>> writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
> >>> The results without oflag=sync were 1.6 GB/s (faster than gluster, but
> >>> not as fast as I was expecting given the 1.2 GB/s to the no-gluster
> >>> area w/ fewer disks).
> >>>
> >>>
> >>> Okay, then 1.6 GB/s is what we need to target, considering your volume
> >>> is just distribute. Is there any way you can do tests on similar
> >>> hardware but at a small scale, just so we can run the workload to learn
> >>> more about the bottlenecks in the system? We can probably try to get
> >>> the speed to 1.2 GB/s on the /home partition you were telling me about
> >>> yesterday. Let me know if that is something you are okay to do.
> >>>
> >>>
> >>> Pat
> >>>
> >>>
> >>>
> >>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
> >>>
> >>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>
> >>>
> >>> Hi Pranith,
> >>>
> >>> Not entirely sure (this isn't my area of expertise). I'll run your
> >>> answer by some other people who are more familiar with this.
> >>>
> >>> I am also uncertain about how to interpret the results when we also add
> >>> the dd tests writing to the /home area (no gluster, still on the same
> >>> machine):
> >>>
> >>> * dd test without oflag=sync (rough average of multiple tests)
> >>>     o gluster w/ fuse mount: 570 MB/s
> >>>     o gluster w/ nfs mount: 390 MB/s
> >>>     o nfs (no gluster): 1.2 GB/s
> >>> * dd test with oflag=sync (rough average of multiple tests)
> >>>     o gluster w/ fuse mount: 5 MB/s
> >>>     o gluster w/ nfs mount: 200 MB/s
> >>>     o nfs (no gluster): 20 MB/s
> >>>
> >>> Given that the non-gluster area is a RAID-6 of 4 disks while each brick
> >>> of the gluster area is a RAID-6 of 32 disks, I would naively expect the
> >>> writes to the gluster area to be roughly 8x faster than to the
> >>> non-gluster area.
> >>>
> >>>
> >>> I think a better test is to write to a file using nfs without any
> >>> gluster, to a location that is not inside the brick but some other
> >>> location on the same disk(s). If you are mounting the partition as the
> >>> brick, then we can write to a file inside the .glusterfs directory,
> >>> something like <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
> >>>
> >>>
> >>>
> >>> I still think we have a speed issue; I can't tell if fuse vs nfs is
> >>> part of the problem.
> >>>
> >>>
> >>> I got interested in the post because I read that fuse speed is lower
> >>> than nfs speed, which is counter-intuitive to my understanding, so I
> >>> wanted clarification. Now that I have it (fuse outperformed nfs without
> >>> sync), we can resume testing as described above and try to find what it
> >>> is. Based on your email address I am guessing you are in Boston, and I
> >>> am in Bangalore, so if you are okay with this debugging taking multiple
> >>> days because of the time zones, I will be happy to help. Please be a bit
> >>> patient with me; I am under a release crunch, but I am very curious
> >>> about the problem you posted.
> >>>
> >>> Was there anything useful in the profiles?
> >>>
> >>>
> >>> Unfortunately the profiles didn't help me much. I think we are
> >>> collecting the profiles from an active volume, so they contain a lot of
> >>> information that does not pertain to dd, which makes it difficult to
> >>> find dd's contribution. So I went through your post again and found
> >>> something I didn't pay much attention to earlier, i.e. oflag=sync, so I
> >>> did my own tests on my setup with FUSE and sent that reply.
> >>>
> >>>
> >>> Pat
> >>>
> >>>
> >>>
> >>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
> >>>
> >>> Okay, good. At least this validates my doubts. Handling of O_SYNC in
> >>> gluster NFS and fuse is a bit different. When an application opens a
> >>> file with O_SYNC on a fuse mount, each write syscall has to be written
> >>> to disk as part of the syscall, whereas in the case of NFS there is no
> >>> concept of open. NFS performs the write through a handle saying it
> >>> needs to be a synchronous write, so the write() syscall is performed
> >>> first and then fsync(). So a write on an fd with O_SYNC becomes
> >>> write+fsync. My guess is that when multiple threads do this
> >>> write+fsync() operation on the same file, multiple writes are batched
> >>> together to be written to disk, so the throughput on the disk
> >>> increases.
> >>>
> >>> Does it answer your doubts?
> >>>
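Pranith contrasts two ways of making a write durable. The Python sketch below, assuming a Linux/Unix host where os.O_SYNC is available, shows both patterns side by side: opening the file with O_SYNC so each write() returns only after the data reaches stable storage (the FUSE case with oflag=sync), and a plain write() followed by fsync() (how he describes the NFS path handling a synchronous write). The function names are hypothetical, and the sketch does not model the server-side batching he suspects.

```python
from __future__ import annotations

import os
import tempfile


def _write_all(fd: int, data: bytes) -> None:
    # os.write may write fewer bytes than requested; loop until done.
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]


def write_with_osync(path: str, blocks: list[bytes]) -> None:
    """Each write() is synchronous because the fd was opened with O_SYNC."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o644)
    try:
        for block in blocks:
            _write_all(fd, block)  # returns once the data is on stable storage
    finally:
        os.close(fd)


def write_then_fsync(path: str, blocks: list[bytes]) -> None:
    """Plain write() followed by an explicit fsync() after every block."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for block in blocks:
            _write_all(fd, block)  # may land in the page cache first
            os.fsync(fd)           # then force data and metadata to disk
    finally:
        os.close(fd)


if __name__ == "__main__":
    data = [b"x" * 4096] * 4
    with tempfile.TemporaryDirectory() as tmp:
        for func in (write_with_osync, write_then_fsync):
            target = os.path.join(tmp, func.__name__)
            func(target, data)
            print(func.__name__, os.path.getsize(target))
```

Both functions leave the same bytes on disk; the difference is where the wait for stable storage happens, which is what makes the two mounts behave so differently under oflag=sync.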
> >>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>
> >>>
> >>> Without the oflag=sync and only a single test of each, the FUSE is
> >>> going faster than NFS:
> >>>
> >>> FUSE:
> >>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> >>> 4096+0 records in
> >>> 4096+0 records out
> >>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
> >>>
> >>>
> >>> NFS:
> >>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> >>> 4096+0 records in
> >>> 4096+0 records out
> >>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
> >>>
> >>>
> >>>
> >>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>> Could you let me know the speed without oflag=sync on both the mounts?
> >>> No need to collect profiles.
> >>>
> >>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <phaley at mit.edu> wrote:
> >>>
> >>>
> >>> Here is what I see now:
> >>>
> >>> [root at mseas-data2 ~]# gluster volume info
> >>>
> >>> Volume Name: data-volume
> >>> Type: Distribute
> >>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> >>> Status: Started
> >>> Number of Bricks: 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: mseas-data2:/mnt/brick1
> >>> Brick2: mseas-data2:/mnt/brick2
> >>> Options Reconfigured:
> >>> diagnostics.count-fop-hits: on
> >>> diagnostics.latency-measurement: on
> >>> nfs.exports-auth-enable: on
> >>> diagnostics.brick-sys-log-level: WARNING
> >>> performance.readdir-ahead: on
> >>> nfs.disable: on
> >>> nfs.export-volumes: off
> >>>
> >>>
> >>>
> >>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>> Is this the volume info you have?
> >>>
> >>> > [root at mseas-data2 ~]# gluster volume info
> >>> >
> >>> > Volume Name: data-volume
> >>> > Type: Distribute
> >>> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> >>> > Status: Started
> >>> > Number of Bricks: 2
> >>> > Transport-type: tcp
> >>> > Bricks:
> >>> > Brick1: mseas-data2:/mnt/brick1
> >>> > Brick2: mseas-data2:/mnt/brick2
> >>> > Options Reconfigured:
> >>> > performance.readdir-ahead: on
> >>> > nfs.disable: on
> >>> > nfs.export-volumes: off
> >>>
> >>> I copied this from an old thread from 2016. This is a distribute
> >>> volume. Did you change any of the options in between?
> >>>
> >>> --
> >>>
> >>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> >>> Pat Haley Email: phaley at mit.edu
> >>> Center for Ocean Engineering Phone: (617) 253-6824
> >>> Dept. of Mechanical Engineering Fax: (617) 253-8125
> >>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
> >>> 77 Massachusetts Avenue
> >>> Cambridge, MA 02139-4301
> >>>
> >>> --
> >>> Pranith
> >>>
> >>
> >>
> >> --
> >> Pranith
> >>
> >
> >
> >
> > --
> > Pranith
> >
> >
> >
> >
>
>
> --
> Pranith