[Gluster-devel] Replication Initialization from Existing Directory

gordan at bobich.net
Thu Apr 24 15:30:09 UTC 2008


On Thu, 24 Apr 2008, Krishna Srinivas wrote:

>>>>  I'm trying to move a large volume of data from local disk to
>>>> GlusterFS. I could just copy it, but copying ~ 1TB of data is slow.
>>>> So, what I've tried to do (with some randomly generated data for a
>>>> test case) is to specify the directory already containing the data as
>>>> the data source for the underlying storage brick.
>>>>
>>>>  I then fire up glusterfsd and glusterfs on the same machine, and I
>>>> can see all the data via the mountpoint.
>>>>
>>>>  On another node, I start glusterfsd and glusterfs, and I can see and
>>>> read the data. But the data doesn't appear on the underlying data
>>>> brick on the 2nd node after I have done cat * > /dev/null in the
>>>> mounted directory.
>>>>
>>>>  So it looks like GlusterFS isn't causing the data to get copied on
>>>> reads in this scenario.
>>>>
>>>>  Can anyone hazard a guess as to why this might be? I am guessing
>>>> that it's to do with the fact that the xattrs/metadata have not been
>>>> initialized by glusterfs because the files were added "underneath"
>>>> rather than via the mountpoint. Is there a workaround for this, e.g.
>>>> by manually setting some xattrs on the files (in the hope that this
>>>> might be faster than copying the whole volume)?
>>>>
>>>
>>> Your guess is right. Just set the xattr "trusted.glusterfs.version" to
>>> 3 on the entire tree of files/dirs (including the exported directory)
>>> and try find + cat; it should work.
>>>
>>
>>  Thanks for that, much appreciated. Would I be correct in assuming that
>> setting this attribute to "3" is actually meant to mean "set it to a higher
>> value on the server than it might be set on the client"?
>
> A versionless file is assumed to be at version "1", so we should set it
> to at least "2"; setting it to "3" is just being paranoid :)

Sure, in a completely uninitialized network of nodes (all nodes without 
xattrs set on ANY file), I can see how that works.
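
For the record, here's roughly what I'm planning to run (untested sketch;
it assumes the attr tools are installed, that /gluster-store is the
exported brick directory and /mnt/glusterfs the mountpoint, that it's run
as root since trusted.* xattrs need it, and that the translator is happy
with the value written as a plain string - probably worth comparing
against a file that was created through the mountpoint with getfattr
first):

  # tag every file and directory under the export, export dir included
  find /gluster-store -exec setfattr -n trusted.glusterfs.version -v 3 {} \;

  # then read everything through the mountpoint to trigger self-heal
  find /mnt/glusterfs -type f -exec cat {} \; > /dev/null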

>>  Another question -  if the value of the trusted.glusterfs.version is higher
>> on node2 (for whatever reason, e.g. reusing an old store dump directory)
>> than on node1, where a fresh storage brick is initialized, would that lead
>> to the empty directory on node2 clobbering all the data on node1, because
>> the empty directory has a higher version number? I can see how that would
>> make sense if it is the case, but that means that there are major dangers
>> involved in manually initializing the store. The problem is similar to what
>> happens in a split-brain situation (half of the files simply get silently
>> dropped/replaced) if the same file was modified on both sides.
>
> glusterfs uses two xattrs to decide which is the latest copy, i.e.
> trusted.glusterfs.createtime and trusted.glusterfs.version. But
> the split-brain situation is still a problem.

I was thinking more along the lines of the case I was testing:
node1 is the one with source files to be replicated:
node1:/gluster-store, trusted.glusterfs.version=3
node1:/gluster-store/*, trusted.glusterfs.version=3

node2 is empty, but its store was used before:
node2:/gluster-store, trusted.glusterfs.version=18

Would this cause node1 to be synced to node2's (empty, higher-versioned)
state, i.e. all the files on node1 getting purged?

Or would it merely mean that node2 doesn't get any files sent to it
(neither the metadata needed for ls, nor, for lack of that metadata, the
files themselves)?

In other words, is deletion replication implicit, or does it have to be
explicit? I.e. if a file gets removed while a node is down and that node
later rejoins, does the file get replicated back to the other nodes, or
does the deletion get replicated to the rejoining node? Or, to put it
another way, does the "latest wins" strategy apply to directory contents
in the same way it does to files, or is there a special case here to
limit the possible damage when there is node churn?
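
For what it's worth, before letting the two nodes see each other again I
can at least inspect what each side thinks it has. A rough sketch (run as
root, since these are trusted.* attributes; the paths and the sample file
name are just from my test setup):

  # dump the glusterfs xattrs on the export directory and on a sample file
  getfattr -d -m trusted.glusterfs -e text /gluster-store
  getfattr -d -m trusted.glusterfs -e text /gluster-store/somefile

Comparing trusted.glusterfs.version (and trusted.glusterfs.createtime)
across the two bricks should at least show in advance which direction
things would heal.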

Thanks.

Gordan




