[Gluster-users] Gluster-users Digest, Vol 9, Issue 66
Keith Freedman
freedman at FreeFormIT.com
Fri Jan 23 18:56:07 UTC 2009
At 10:18 AM 1/23/2009, Evan wrote:
>I added the following to the bottom of my spec file:
>
>volume writebehind
> type performance/write-behind
> option aggregate-size 10MB # default is 0bytes
> option flush-behind off # default is 'off'
> subvolumes afr
>end-volume
>
>which gives me the following results when making a 10MB file
># time dd if=/dev/zero of=/tmp/disktest count=10240 bs=1024
>10240+0 records in
>10240+0 records out
>10485760 bytes (10 MB) copied, 0.173179 s, 60.5 MB/s
>
>real 0m0.183s
>user 0m0.000s
>sys 0m0.204s
>
># time dd if=/dev/zero of=/mnt/gluster/disktest count=10240 bs=1024
>10240+0 records in
>10240+0 records out
>10485760 bytes (10 MB) copied, 5.50861 s, 1.9 MB/s
>
>real 0m5.720s
>user 0m0.000s
>sys 0m0.060s
>
>Although this is better than I had before, is there any way to have
>Gluster write the data to the localBrick and then sync/AFR in the
>background, so I could expect to see something closer to the 60 MB/s
>I see when writing to the local disk directly?
What you really want is delayed replication. I've asked for this
on this mailing list recently, and was told it's something
they've considered (more as a DR feature than an HA feature), but
it's not currently on the list of priorities.
The issue, as I see it, is that if it's an HA feature, you really
need to ensure the data is replicated before you let your
application think the write has completed. If replication were
delayed and the server went down, that data would be lost forever,
which is bad for HA.
If it's a DR feature, you're probably OK, because disaster
recovery scenarios can usually withstand some data loss,
and you're more interested in a point-in-time snapshot of the data.
FUSE adds overhead, and so does TCP/IP, especially when the data
goes over the wire as many small blocks rather than a few large ones.
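One quick way to see how much the block size matters (just a suggestion, reusing the same dd paths from your tests above) is to repeat the 10MB write with a much larger block size:

# time dd if=/dev/zero of=/mnt/gluster/disktest count=10 bs=1M

If that runs noticeably faster than the bs=1024 run, per-request overhead over FUSE and the network is a large part of what you're seeing.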
Ideally, GlusterFS's write-behind would be smart enough to aggregate
those smaller blocks of data into one large write. I think that would
solve much of the problem you're seeing in your tests.
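As a rough, untested sketch (using only the option names that already appear in the spec you posted), you could also try turning flush-behind on, so the flush at close() returns to the application before the buffered data has reached both AFR subvolumes:

volume writebehind
  type performance/write-behind
  option aggregate-size 10MB   # batch small application writes into larger network writes
  option flush-behind on       # flush in the background instead of blocking close()
  subvolumes afr
end-volume

Keep in mind that this only hides latency from the application; the data still has to cross the 1.544 Mbps link before it really exists on the second node, so it carries the same data-loss trade-off as the delayed replication discussed above.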
>Thanks
>
>Raghavendra G <raghavendra at zresearch.com> wrote:
>above afr with afr as a subvolume
>
>On Fri, Jan 23, 2009 at 12:59 AM, Evan
><Gluster at devnada.com> wrote:
>Where should I put the write-behind translator?
>Just above afr with afr as a subvolume? Or should I put it just
>above my localBrick volume and below afr?
>
>
>Here is the output using /dev/zero:
># time dd if=/dev/zero of=/mnt/gluster/disktest count=1024 bs=1024
>
>1024+0 records in
>1024+0 records out
>1048576 bytes (1.0 MB) copied, 1.90119 s, 552 kB/s
>
>real 0m2.098s
>user 0m0.000s
>sys 0m0.016s
>
># time dd if=/dev/zero of=/tmp/disktest count=1024 bs=1024
>
>1024+0 records in
>1024+0 records out
>1048576 bytes (1.0 MB) copied, 0.0195388 s, 53.7 MB/s
>
>real 0m0.026s
>user 0m0.000s
>sys 0m0.028s
>
>
>Thanks
>
>
>On Thu, Jan 22, 2009 at 12:52 PM, Anand Avati
><avati at zresearch.com> wrote:
>Do you have write-behind loaded on the client side? For IO testing,
>use /dev/zero instead of /dev/urandom.
>
>avati
>
>On Fri, Jan 23, 2009 at 2:14 AM, Evan
><Gluster at devnada.com> wrote:
> > I have a 2-node, single-process AFR setup with 1.544 Mbps of bandwidth
> > between the 2 nodes. When I write a 1MB file to the Gluster share, it
> > seems to AFR to the other node in real time, killing my disk IO speeds
> > on the Gluster mount point. Is there any way to fix this? Ideally I
> > would like to see near-native disk IO speeds from/to the local Gluster
> > mount point and let the AFR catch up in the background as bandwidth
> > becomes available.
> >
> > Gluster Spec File (same on both nodes): http://pastebin.com/m58dc49d4
> > IO speed tests:
> > # time dd if=/dev/urandom of=/mnt/gluster/disktest count=1024 bs=1024
> > 1024+0 records in
> > 1024+0 records out
> > 1048576 bytes (1.0 MB) copied, 8.34701 s, 126 kB/s
> >
> > real 0m8.547s
> > user 0m0.000s
> > sys 0m0.372s
> >
> > # time dd if=/dev/urandom of=/tmp/disktest count=1024 bs=1024
> > 1024+0 records in
> > 1024+0 records out
> > 1048576 bytes (1.0 MB) copied, 0.253865 s, 4.1 MB/s
> >
> > real 0m0.259s
> > user 0m0.000s
> > sys 0m0.284s
> >
> >
> > Thanks
> >
>
>--
>Raghavendra G