[Gluster-devel] Replication Initialization from Existing Directory

gordan at bobich.net gordan at bobich.net
Thu Apr 24 13:39:06 UTC 2008


On Thu, 24 Apr 2008, Krishna Srinivas wrote:

>>  I'm trying to move a large volume of data from local disk to GlusterFS. I
>> could just copy it, but copying ~ 1TB of data is slow. So, what I've tried
>> to do (with some randomly generated data for a test case) is to specify the
>> directory already containing the data as the data source for the underlying
>> storage brick.
>>
>>  I then fire up glusterfsd and glusterfs on the same machine, and I can see
>> all the data via the mountpoint.
>>
>>  On another node, I start glusterfsd and glusterfs, and I can see and read
>> the data. But, the data doesn't appear on the underlying data brick on the
>> 2nd node after I have done cat * > /dev/null in the mounted directory.
>>
>>  So it looks like GlusterFS isn't causing the data to get copied on reads in
>> this scenario.
>>
>>  Can anyone hazard a guess as to why this might be? I am guessing that it's
>> to do with the fact that the xattrs/metadata have not been initialized by
>> glusterfs because the files were added "underneath" rather than via the
>> mountpoint. Is there a workaround for this, e.g. by manually setting some
>> xattrs on the files (in a hope that this might be faster than copying the
>> whole volume)?
>
> Your guess is right. Just set the xattr "trusted.glusterfs.version" to 3 on
> the entire tree of files/dirs (including the exported directory itself), then
> try find + cat; it should work.

Thanks for that, much appreciated. Would I be correct in assuming that 
setting this attribute to "3" is actually meant to mean "set it to a 
higher value on the server than it might be set on the client"?
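Just to make sure I understand the mechanics, I'm assuming the tagging is done 
with setfattr over a find on the underlying export (not the mountpoint), run 
as root, something like the following. The paths are just placeholders, and 
I'm also assuming the value wants to be the plain decimal string "3":

  # on the node that already holds the data, tag the export and everything under it
  find /data/export -exec setfattr -n trusted.glusterfs.version -v 3 {} \;

  # then walk the GlusterFS mountpoint to force reads (and hence self-heal)
  find /mnt/glusterfs -type f -exec cat {} \; > /dev/null

Is that roughly what you had in mind?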

Another question: if the value of trusted.glusterfs.version is higher on 
node2 (for whatever reason, e.g. reusing an old store dump directory) than on 
node1, where a fresh storage brick is initialized, would that lead to the 
empty directory on node2 clobbering all the data on node1, because the empty 
directory has the higher version number? I can see how that would make sense 
if that is the case, but it would also mean there are major dangers involved 
in manually initializing the store. The problem is similar to a split-brain 
situation where the same file was modified on both sides: half of the files 
simply get silently dropped/replaced.
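
If the version comparison does work that way, I guess the safe approach 
before letting self-heal loose is to check the xattrs on both exports by hand 
and make sure the node with the real data has at least as high a version as 
the other one. Something like this, assuming the standard attr tools and the 
same placeholder paths as above:

  # run as root on each node, against the underlying export directory
  getfattr -n trusted.glusterfs.version -e text /data/export

  # or dump all trusted.* attributes to see what else glusterfs has set
  getfattr -d -m trusted -e text /data/export

and only then tag/bump the version on the node that actually holds the data.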

Thanks.

Gordan
