[Gluster-users] Hopefully answering some mirroring questions asked here and offline

Joe Landman landman at scalableinformatics.com
Mon May 2 22:08:05 UTC 2011


Hi folks

  We've fielded a number of mirroring questions offline, and have 
watched or participated in discussions here.  I thought it was important 
to make sure some of these are answered and searchable on the list.

  One major question that kept arising was as follows:

q:  If I have a large image file (say a VM vmdk/other format) on a 
mirrored volume, will one small change of a few bytes result in a resync 
of the entire file?

a:  No.

To test this, we created a 20GB file on a mirror volume.

root at metal:/local2/home/landman# ls -alF /mirror1gfs/big.file
-rw-r--r-- 1 root root 21474836490 2011-05-02 12:44 /mirror1gfs/big.file
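
(For anyone reproducing this: a sparse file of roughly the same size can 
be created instantly with truncate, so you don't have to wait on 20GB of 
real writes.  The path below is a stand-in, not the mirrored mount used 
here.)

```shell
# Create a sparse 20 GB test file; no data blocks are allocated yet,
# so this returns immediately.
truncate -s 20G /tmp/big.file
ls -lF /tmp/big.file
```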

Then, using the following quick-and-dirty Perl script, we appended about 
10-20 bytes to the file.

#!/usr/bin/env perl

use strict;
use warnings;

# append a short "end <pid>" marker to the end of the file
my $file = shift or die "usage: $0 file\n";
open(my $fh, ">>", $file) or die "cannot open $file: $!\n";
print $fh "end ", $$, "\n";
close($fh);


root at metal:/local2/home/landman# ./app.pl /mirror1gfs/big.file

Then I had to write a quick-and-dirty tail replacement, as I discovered 
that tail wasn't seeking ... (yes, it started reading every 'line' of 
that file ...)

#!/usr/bin/env perl

use strict;
use warnings;

# print the last 200 bytes of the file without reading the whole thing
my $file = shift or die "usage: $0 file\n";
my $buf;

open(my $fh, "<", $file) or die "cannot open $file: $!\n";
seek $fh, -200, 2;             # seek to 200 bytes before end of file
read $fh, $buf, 200;
printf "buffer: '%s'\n", $buf;
close($fh);


root at metal:/local2/home/landman# ./tail.pl /mirror1gfs/big.file
buffer: 'end 19362'
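
(As an aside: at least with GNU coreutils, tail given a byte count via 
-c is supposed to seek on a regular file rather than read it through; 
whether that held up on this FUSE mount I can't say, but it may be worth 
trying before writing a replacement.  A small stand-in demo on a 
placeholder path, not the gluster mount:)

```shell
# Write a small file ending in a known marker, then grab the last
# 10 bytes by byte count rather than by lines.
printf 'lots of data\nend 19362\n' > /tmp/tailtest.file
tail -c 10 /tmp/tailtest.file
```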

While running app.pl, I did not see any massive resyncs; I had dstat 
running in another window.

You might say that this is irrelevant, since we only appended, and 
appends could be special-cased.

So I wrote a random updater that writes at random offsets throughout the 
large file (much like the write pattern against a VM vmdk and similar 
files).


#!/usr/bin/env perl

use strict;
use warnings;

# write a short marker at a random offset within the file
my $file = shift or die "usage: $0 file\n";
my @stat = stat($file) or die "cannot stat $file: $!\n";
my $loc  = int(rand($stat[7]));    # random offset within the file size

# open read-write for in-place update; an append-mode open ('+>>')
# would force every write to the end of the file regardless of seek
open(my $fh, "+<", $file) or die "cannot open $file: $!\n";
seek $fh, $loc, 0;
print $fh "I was here!!!";
printf "loc: %i\n", $loc;
close($fh);

root at metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 17598205436
root at metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 16468787891
root at metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 9271612568
root at metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 1356667302
root at metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 12365324308
root at metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 15654714313
root at metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 10127739152
root at metal:/local2/home/landman# ./randupd.pl /mirror1gfs/big.file
loc: 10259920623

and again, no massive resyncs.
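
(A purely local sanity check of the same idea, independent of gluster: 
an in-place write only dirties the blocks it actually touches, which you 
can see from the allocated-block count on a sparse file.  Paths and the 
offset below are stand-ins.)

```shell
# Make a sparse 1 GB file, drop 13 bytes somewhere in the middle,
# and confirm that only a block or two gets allocated -- the file is
# not rewritten end to end.
truncate -s 1G /tmp/demo.file
printf 'I was here!!!' | dd of=/tmp/demo.file bs=1 seek=123456789 conv=notrunc 2>/dev/null
stat -c 'size=%s allocated_512B_blocks=%b' /tmp/demo.file
```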

So I think it's fairly safe to say that the concern about massive 
resyncs for small updates is not something we see in the field.

Regards,

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


