[Gluster-users] Slow write times to gluster disk

Ben Turner bturner at redhat.com
Mon May 15 01:24:53 UTC 2017


----- Original Message -----
> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> To: "Pat Haley" <phaley at mit.edu>
> Cc: gluster-users at gluster.org, "Steve Postma" <SPostma at ztechnet.com>
> Sent: Friday, May 12, 2017 11:17:11 PM
> Subject: Re: [Gluster-users] Slow write times to gluster disk
> 
> 
> 
> On Sat, May 13, 2017 at 8:44 AM, Pranith Kumar Karampuri <
> pkarampu at redhat.com > wrote:
> 
> 
> 
> 
> 
> On Fri, May 12, 2017 at 8:04 PM, Pat Haley < phaley at mit.edu > wrote:
> 
> 
> 
> 
> Hi Pranith,
> 
> My question was about setting up a gluster volume on an ext4 partition. I
> thought we had the bricks mounted as xfs for compatibility with gluster?
> 
> Oh that should not be a problem. It works fine.
> 
> It is just that XFS has much higher limits than ext4 for things like hard
> links (at least the last time I checked :-) ), so XFS is the better choice.

One of the biggest reasons to use XFS, IMHO, is that most of the testing and large-scale deployments (at least the ones I know of) use XFS as the backend. While ext4 should work, I don't think it has had the same level of testing as XFS.

-b 



> 
> 
> 
> 
> 
> 
> 
> Pat
> 
> 
> 
> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
> 
> 
> 
> 
> 
> On Thu, May 11, 2017 at 9:32 PM, Pat Haley < phaley at mit.edu > wrote:
> 
> 
> 
> 
> Hi Pranith,
> 
> The /home partition is mounted as ext4
> /home ext4 defaults,usrquota,grpquota 1 2
> 
> The brick partitions are mounted as xfs
> /mnt/brick1 xfs defaults 0 0
> /mnt/brick2 xfs defaults 0 0
> 
> Will this cause a problem with creating a volume under /home?
> 
> I don't think the disk is the bottleneck. Could you run the same tests you
> ran before on the new volume to confirm?
> 
> 
> 
> 
> Pat
> 
> 
> 
> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
> 
> 
> 
> 
> 
> On Thu, May 11, 2017 at 8:57 PM, Pat Haley < phaley at mit.edu > wrote:
> 
> 
> 
> 
> Hi Pranith,
> 
> Unfortunately, we don't have similar hardware for a small scale test. All we
> have is our production hardware.
> 
> You mentioned a /home partition that has fewer disks. We can create a plain
> distribute volume inside one of its directories and remove the setup once
> we are done. What do you say?
> 
> 
> 
> 
> 
> Pat
> 
> 
> 
> 
> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
> 
> 
> 
> 
> 
> On Thu, May 11, 2017 at 2:48 AM, Pat Haley < phaley at mit.edu > wrote:
> 
> 
> 
> 
> Hi Pranith,
> 
> Since we are mounting the partitions as the bricks, I tried the dd test
> writing to <brick-path>/.glusterfs/<file-to-be-removed-after-test>. The
> results without oflag=sync were 1.6 GB/s (faster than gluster but not as
> fast as I was expecting given the 1.2 GB/s to the no-gluster area w/ fewer
> disks).
> 
> Okay, then 1.6 GB/s is what we need to target, considering your volume is
> just distribute. Is there any way you can run tests on similar hardware but
> at a smaller scale, so we can run the workload and learn more about the
> bottlenecks in the system? We could try to get the speed up to 1.2 GB/s on
> the /home partition you were telling me about yesterday. Let me know if
> that is something you are okay with.
> 
> 
> 
> 
> Pat
> 
> 
> 
> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
> 
> 
> 
> 
> 
> On Wed, May 10, 2017 at 10:15 PM, Pat Haley < phaley at mit.edu > wrote:
> 
> 
> 
> 
> Hi Pranith,
> 
> Not entirely sure (this isn't my area of expertise). I'll run your answer by
> some other people who are more familiar with this.
> 
> I am also uncertain about how to interpret the results when we also add the
> dd tests writing to the /home area (no gluster, still on the same machine)
> 
> 
>     * dd test without oflag=sync (rough average of multiple tests)
>         * gluster w/ fuse mount: 570 MB/s
>         * gluster w/ nfs mount: 390 MB/s
>         * nfs (no gluster): 1.2 GB/s
>     * dd test with oflag=sync (rough average of multiple tests)
>         * gluster w/ fuse mount: 5 MB/s
>         * gluster w/ nfs mount: 200 MB/s
>         * nfs (no gluster): 20 MB/s
> 
> Given that the non-gluster area is a RAID-6 of 4 disks while each brick of
> the gluster area is a RAID-6 of 32 disks, I would naively expect the writes
> to the gluster area to be roughly 8x faster than to the non-gluster.
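As a rough check on the 8x estimate: RAID-6 keeps two disks' worth of parity per array, so a 4-disk array has 2 data disks and a 32-disk array has 30. The sketch below is only arithmetic, under the idealized assumption that sequential write throughput scales with the number of data disks. The function name is made up for illustration, and a real array is also limited by its controller, stripe geometry, and filesystem.

```python
def raid6_data_disks(total_disks: int) -> int:
    """Disks' worth of capacity left for data in a single RAID-6 array."""
    if total_disks < 4:
        raise ValueError("RAID-6 needs at least 4 disks")
    return total_disks - 2  # two disks' worth of parity per array


home_disks = 4    # non-gluster area, as stated in the thread
brick_disks = 32  # each gluster brick, as stated in the thread

# Naive ratio from the total disk count (the 8x figure in the message).
ratio_by_total = brick_disks / home_disks
# Ratio from the data disk count, which is what carries the payload.
ratio_by_data = raid6_data_disks(brick_disks) / raid6_data_disks(home_disks)

print(f"ratio by total disk count: {ratio_by_total:.0f}x")
print(f"ratio by data disk count:  {ratio_by_data:.0f}x")
```

Counting data disks instead of total disks makes the idealized ratio larger, not smaller, so the gap to the measured numbers does not come from how the disks are counted.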
> 
> I think a better test is to write a file over nfs, without any gluster, to
> a location that is not inside the brick but is on the same disk(s). If you
> are mounting the partition as the brick, then we can write to a file inside
> the .glusterfs directory, something like
> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
> 
> 
> 
> 
> 
> I still think we have a speed issue; I can't tell whether fuse vs nfs is
> part of the problem.
> 
> I got interested in the post because I read that fuse was slower than nfs,
> which is counter-intuitive to my understanding, so I wanted clarification.
> Now that I have it (fuse outperformed nfs without sync), we can resume
> testing as described above and try to find the cause. Based on your email
> address I am guessing you are in Boston, and I am in Bangalore, so if you
> are okay with this debugging taking multiple days because of the timezones,
> I will be happy to help. Please be a bit patient with me: I am under a
> release crunch, but I am very curious about the problem you posted.
> 
> 
> 
> 
> Was there anything useful in the profiles?
> 
> Unfortunately the profiles didn't help me much. I think we are collecting
> them from an active volume, so they contain a lot of information unrelated
> to dd, which makes it difficult to isolate dd's contribution. So I went
> through your post again and found something I hadn't paid much attention
> to earlier, i.e. oflag=sync. I then ran my own tests on my setup with FUSE
> and sent that reply.
> 
> 
> 
> 
> 
> Pat
> 
> 
> 
> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
> 
> 
> 
> Okay, good. At least this validates my suspicion. Handling of O_SYNC in
> gluster NFS and fuse is a bit different.
> When an application opens a file with O_SYNC on a fuse mount, each write
> syscall has to be written to disk as part of the syscall. In the case of
> NFS there is no concept of open: NFS performs the write through a handle
> and flags it as a synchronous write, so the write() syscall is performed
> first and then fsync(). So a write on an fd with O_SYNC becomes
> write+fsync. My guess is that when multiple threads do this write+fsync()
> operation on the same file, multiple writes are batched together before
> being written to disk, so the throughput on the disk increases.
> 
> Does it answer your doubts?
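The two patterns described above can be tried side by side with a short script. This is a minimal sketch, not a Gluster benchmark: it writes to a local temporary directory, the names `write_all`, `timed_write`, and `zeros.bin` are made up for illustration, and it assumes a POSIX system where `os.O_SYNC` is available. To compare mounts, the scratch directory would have to be replaced with a directory on the FUSE or NFS mount being tested.

```python
import os
import tempfile
import time

BLOCK = b"\0" * (1 << 20)  # 1 MiB per write, like bs=1048576 in the dd runs
COUNT = 8                  # kept small here; the thread used count=4096


def write_all(fd: int, data: bytes) -> None:
    """Call os.write until every byte of data has been accepted."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def timed_write(path: str, use_o_sync: bool) -> float:
    """Write COUNT blocks to path and return the elapsed seconds."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if use_o_sync:
        # Pattern 1: the fd is opened with O_SYNC, so each write() is
        # synchronous by itself.
        flags |= os.O_SYNC
    fd = os.open(path, flags, 0o600)
    start = time.perf_counter()
    try:
        for _ in range(COUNT):
            write_all(fd, BLOCK)
            if not use_o_sync:
                # Pattern 2: a plain write() followed by an explicit fsync().
                os.fsync(fd)
    finally:
        os.close(fd)
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "zeros.bin")
    for label, use_o_sync in (("O_SYNC", True), ("write+fsync", False)):
        seconds = timed_write(path, use_o_sync)
        print(f"{label}: {os.path.getsize(path)} bytes in {seconds:.3f} s")
```

On a local disk the two timings are usually close; the point of the sketch is only to make the two syscall sequences concrete, since the thread attributes the difference to how the gluster FUSE and NFS paths handle them.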
> 
> On Wed, May 10, 2017 at 9:35 PM, Pat Haley < phaley at mit.edu > wrote:
> 
> 
> 
> 
> Without the oflag=sync and only a single test of each, the FUSE is going
> faster than NFS:
> 
> FUSE:
> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt
> conv=sync
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
> 
> 
> NFS
> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt
> conv=sync
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
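The rates dd prints can be recomputed from the byte count and elapsed time on the same line, which also confirms that the unit is decimal megabytes per second. The helper below is a small sketch with a made-up function name, and it assumes the GNU dd summary format shown in the two transcripts. Note also that both runs use conv=sync, which in GNU dd pads short input blocks with NULs and does not request synchronous output, so these are indeed the "without oflag=sync" numbers.

```python
import re

# Matches e.g. "4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s"
DD_SUMMARY = re.compile(r"(\d+) bytes .* copied, ([\d.]+) s,")


def throughput_mb_per_s(summary_line: str) -> float:
    """Recompute decimal MB/s (10**6 bytes per second) from a dd summary line."""
    match = DD_SUMMARY.search(summary_line)
    if match is None:
        raise ValueError(f"not a dd summary line: {summary_line!r}")
    byte_count = int(match.group(1))
    seconds = float(match.group(2))
    return byte_count / seconds / 1e6


fuse = throughput_mb_per_s("4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s")
nfs = throughput_mb_per_s("4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s")
print(f"FUSE {fuse:.0f} MB/s, NFS {nfs:.0f} MB/s, ratio {fuse / nfs:.2f}")
```

For these two single runs FUSE comes out roughly 1.5 times faster than NFS.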
> 
> 
> 
> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
> 
> 
> 
> Could you let me know the speed without oflag=sync on both the mounts? No
> need to collect profiles.
> 
> On Wed, May 10, 2017 at 9:17 PM, Pat Haley < phaley at mit.edu > wrote:
> 
> 
> 
> 
> Here is what I see now:
> 
> [root at mseas-data2 ~]# gluster volume info
> 
> Volume Name: data-volume
> Type: Distribute
> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: mseas-data2:/mnt/brick1
> Brick2: mseas-data2:/mnt/brick2
> Options Reconfigured:
> diagnostics.count-fop-hits: on
> diagnostics.latency-measurement: on
> nfs.exports-auth-enable: on
> diagnostics.brick-sys-log-level: WARNING
> performance.readdir-ahead: on
> nfs.disable: on
> nfs.export-volumes: off
> 
> 
> 
> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
> 
> 
> 
> Is this the volume info you have?
> 
> > [root at mseas-data2 ~]# gluster volume info
> >
> > Volume Name: data-volume
> > Type: Distribute
> > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> > Status: Started
> > Number of Bricks: 2
> > Transport-type: tcp
> > Bricks:
> > Brick1: mseas-data2:/mnt/brick1
> > Brick2: mseas-data2:/mnt/brick2
> > Options Reconfigured:
> > performance.readdir-ahead: on
> > nfs.disable: on
> > nfs.export-volumes: off
>
> I copied this from an old thread from 2016. This is a distribute volume.
> Did you change any of the options in between?
> --
> 
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley                          Email:  phaley at mit.edu
> Center for Ocean Engineering       Phone:  (617) 253-6824
> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA  02139-4301
> --
> Pranith
> 
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

