[Gluster-users] sparse files on EC volume
Xavi Hernandez
jahernan at redhat.com
Tue Sep 26 07:55:41 UTC 2017
Hi Dmitri,
On 22/09/17 17:07, Dmitri Chebotarov wrote:
>
> Hello
>
> I'm running some tests to compare performance between Gluster FUSE mount
> and formatted sparse files (located on the same Gluster FUSE mount).
>
> The Gluster volume is EC (same for both tests).
>
> I'm seeing HUGE difference and trying to figure out why.
Could you describe the hardware configuration you are using?
Do you have a plain disk formatted with XFS for each brick, or do you have
some RAID configuration?
>
> Here is an example:
>
> GlusterFUSE mount:
>
> # cd /mnt/glusterfs
> # rm -f testfile1 ; dd if=/dev/zero of=testfile1 bs=1G count=1
> 1+0 records in
> 1+0 records out
> 1073741824 bytes (1.1 GB) copied, 9.74757 s, *110 MB/s*
>
> Sparse file (located on GlusterFUSE mount):
>
> # truncate -s 100GB /mnt/glusterfs/xfs-100G.img
> # mkfs.xfs /mnt/glusterfs/xfs-100G.img
> # mount -o loop /mnt/glusterfs/xfs-100G.img /mnt/xfs-100G
> # cd /mnt/xfs-100G
> # rm -f testfile1 ; dd if=/dev/zero of=testfile1 bs=1G count=1
> 1+0 records in
> 1+0 records out
> 1073741824 bytes (1.1 GB) copied, 1.20576 s, *891 MB/s*
>
> The same goes for working with small files (i.e. code file, make, etc)
> with the same data located on FUSE mount vs formatted sparse file on the
> same FUSE mount.
>
> What would explain such difference?
First of all, tests with relatively small files tend to be
misleading because of the operating system's caching (to
minimize that, you can add the 'conv=fsync' option to dd). You should run
tests with file sizes bigger than the amount of physical memory on the
servers. This way you minimize cache effects and see the real sustained
performance.
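
As a rough sketch of such a test (TEST_DIR and SIZE_MB are hypothetical
parameters of mine, not something from your setup; the small default size is
only there so the script can be tried anywhere):

```shell
#!/bin/sh
# Sketch of a write test that includes the flush in the measured time.
# TEST_DIR and SIZE_MB are hypothetical names introduced for illustration.
TEST_DIR="${TEST_DIR:-$(mktemp -d)}"   # point this at the mount under test
SIZE_MB="${SIZE_MB:-16}"               # for a real run, use more than the servers' RAM

rm -f "${TEST_DIR}/testfile1"

# conv=fsync makes dd call fsync() on the output file before it exits,
# so the reported rate covers the time needed to flush cached data.
dd if=/dev/zero of="${TEST_DIR}/testfile1" bs=1M count="${SIZE_MB}" conv=fsync

# Record the resulting size as a sanity check, then clean up.
written=$(stat -c %s "${TEST_DIR}/testfile1")
echo "wrote ${written} bytes"
rm -f "${TEST_DIR}/testfile1"
```

For example, TEST_DIR=/mnt/glusterfs SIZE_MB=65536 would write 64 GiB to the
FUSE mount; the same run with TEST_DIR=/mnt/xfs-100G gives the comparable
number for the loop-mounted image.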
A second important point to note is that gluster is a distributed file
system that can be accessed simultaneously by more than one client. This
means that consistency must be assured in all cases, so data is sent to
the bricks sooner than a local filesystem would normally write it to disk.
In your case, all data saved to the fuse volume will most probably be
present on bricks once the dd command completes. On the other hand, the
test through the formatted sparse file is most probably keeping most
of the data in the cache of the client machine.
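
One way to check this on the client is to look at the amount of dirty
(not yet written back) page cache around the dd run. This is only a sketch:
it assumes a Linux client with /proc/meminfo, the Dirty counter is
system-wide so other activity will also show up in it, and TEST_DIR and
SIZE_MB are hypothetical parameters.

```shell
#!/bin/sh
# Sketch: observe how much written data is still in the client's page cache.
# TEST_DIR and SIZE_MB are hypothetical names introduced for illustration.
TEST_DIR="${TEST_DIR:-$(mktemp -d)}"   # e.g. the loop-mounted filesystem
SIZE_MB="${SIZE_MB:-16}"

# Dirty: memory waiting to be written back to storage, in kB (system-wide).
dirty_kb() { awk '/^Dirty:/ {print $2}' /proc/meminfo; }

before=$(dirty_kb)
dd if=/dev/zero of="${TEST_DIR}/testfile1" bs=1M count="${SIZE_MB}" 2>/dev/null
after=$(dirty_kb)
echo "Dirty before dd: ${before} kB, right after dd: ${after} kB"

# sync forces the cached data out; Dirty should drop again afterwards.
sync
flushed=$(dirty_kb)
echo "Dirty after sync: ${flushed} kB"
rm -f "${TEST_DIR}/testfile1"
```

If the value right after dd is much higher than before, the throughput
reported by dd mostly reflects writes into local memory.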
Note that using the formatted sparse file allows better use of the
local cache, which improves access to (relatively) small files, but on the
other hand, this filesystem can only be used from a single client
(single mount). If this client fails for some reason, you will lose
access to your data.
>
> How does Gluster work with sparse files in general? I may move some of
> the data on gluster volumes to formatted sparse files..
Gluster works fine with sparse files. However, you should consider the
previous points before choosing the formatted sparse files option. I
guess that the sustained throughput will be very similar for bigger files.
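
If you want to verify that an image file is really sparse, you can compare
its apparent size with the space actually allocated. A minimal sketch,
assuming GNU coreutils (IMG_DIR and the file name are hypothetical):

```shell
#!/bin/sh
# Sketch: compare apparent size and allocated space of a sparse file.
# IMG_DIR and sparse-test.img are hypothetical names for illustration.
IMG_DIR="${IMG_DIR:-$(mktemp -d)}"     # e.g. a directory on the gluster mount
IMG="${IMG_DIR}/sparse-test.img"

# Create a 100 MiB file without writing any data blocks.
truncate -s 100M "${IMG}"

apparent=$(stat -c %s "${IMG}")        # logical size in bytes
blocks=$(stat -c %b "${IMG}")          # number of allocated blocks
blksize=$(stat -c %B "${IMG}")         # size in bytes of each of those blocks
allocated=$((blocks * blksize))

echo "apparent: ${apparent} bytes, allocated: ${allocated} bytes"
rm -f "${IMG}"
```

The allocated value grows as the filesystem inside the image writes data,
while the apparent size stays at what truncate set.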
Regards,
Xavi
>
> Thank you.
>
>