[Gluster-devel] Re: Uh, another gotcha with AFR, pre-existing data specific

Tue May 6 13:35:15 UTC 2008

Thanks, I was wondering about that. Below I suggested using glusterfs to do
the copy and skipping an external rsync, but I didn't know how that would
perform compared to rsync. Your data seems to confirm that, if the xattrs
get set correctly, letting gluster do the copying and creating would be a
good, fast solution.
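
For anyone who wants to script the "find" approach discussed in the quoted
thread, here is a rough sketch. It assumes the volume is already mounted
through a glusterfs client and that reading the first byte of a file (the
same thing `head -c 1` does in the thread) is enough to make AFR look at it.
The function name and the mount path are made up for illustration; they are
not part of glusterfs.

```python
import os


def trigger_self_heal(mount_root):
    """Read one byte from every regular file under mount_root.

    This mirrors running `head -c 1 <file>` over the output of `find`.
    Whether the read actually causes AFR to replicate the file depends on
    the glusterfs version and configuration (an assumption, not verified).
    Returns the number of files that were opened.
    """
    touched = 0
    for dirpath, _dirnames, filenames in os.walk(mount_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks, sockets, and similar
            with open(path, "rb") as handle:
                handle.read(1)  # a one-byte read, nothing is modified
            touched += 1
    return touched


if __name__ == "__main__":
    # Hypothetical client mount point; adjust for your own setup.
    print(trigger_self_heal("/mnt/glusterfs"))
```

The walk is single-threaded on purpose, so the sync load stays easy to
reason about; it only reads and never writes to the volume.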

> -----Original Message-----
> From: gluster-devel-bounces+chawkins=veracitynetworks.com at nongnu.org
> [mailto:gluster-devel-bounces+chawkins=veracitynetworks.com at nongnu.org]
> On Behalf Of gordan at bobich.net
> Sent: Tuesday, May 06, 2008 9:26 AM
> To: gluster-devel at nongnu.org
> Subject: RE: [Gluster-devel] Re: Uh, another gotcha with AFR,
> pre-existing data specific
> 
> Something that may be worth mentioning here is that on my 
> glusterfs syncs using the find method I seem to get as close 
> to the wire-speed of my network as I've ever seen. The snmp 
> bandwidth graphs confirm it at > 90Mb/s on a 100Mb network, 
> as does the sync time.
> 
> I mention this because it means you are unlikely to achieve a higher
> speed with rsync.
> 
> Gordan
> 
> On Tue, 6 May 2008, Christopher Hawkins wrote:
> 
> > I have a need for something quite similar to that. Seems reasonable
> > to me...
> >
> >
> > My guess is that what you want to do is:
> >
> > Set xattr version = 2 for all gluster files on server 1. Start gluster
> > with AFR. Do a find on all files on server 1, which should copy them
> > to server 2, whether they already exist there or not (but really
> > server2 should pretty much be empty).
> >
> > Now all files should be in sync. I will be doing some testing on this
> > soon and intend to write a script to handle it... The MD5 check is a
> > good idea and I'll try to find a way to build that in. Let me know how
> > this goes for you... Your feedback will be very helpful!
> >
> > PS - You say "it did no good" but I disagree. In your setup below,
> > even though it copied the data all over again to server 2, it did so
> > while server 1 was online because you manually set the xattrs on the
> > existing data on server 1. You had much less downtime than if you had
> > re-copied all the data from server 1 TO server 1, from non-gluster
> > storage into gluster storage.
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: gluster-devel-bounces+chawkins=veracitynetworks.com at nongnu.org
> >> [mailto:gluster-devel-bounces+chawkins=veracitynetworks.com at nongnu.org]
> >> On Behalf Of Brandon Lamb
> >> Sent: Monday, May 05, 2008 9:07 PM
> >> To: Gluster Devel
> >> Subject: [Gluster-devel] Re: Uh, another gotcha with AFR,
> >> pre-existing data specific
> >>
> >> On Mon, May 5, 2008 at 5:58 PM, Brandon Lamb <brandonlamb at gmail.com>
> >> wrote:
> >>> I just did some testing, and came to the conclusion that trying to
> >>> set up AFR using one server with pre-existing data and a blank
> >>> server, and copying your data and removing xattrs on the copied
> >>> data then initiating AFR DOES NO GOOD.
> >>>
> >>> server1 - 400 megs of data in 10 tarballs, removed all xattr
> >>> server2 - copied the files from server1
> >>> server1 - started glusterfsd, then ran setfattr to set
> >>> trusted.glusterfs.version to 1; files on server2 have no xattr.
> >>>
> >>> At this point I should have identical copies of data *assuming* I
> >>> had no writes in between.
> >>>
> >>> SO, now in a client I do head -c 1 file0.tar.bz2 and it seems that
> >>> since files on server2 have NO xattr, it copies them all over
> >>> again!!!
> >>>
> >>> So, is there no viable way to PRECOPY a copy of pre-existing data?
> >>> Looks like what we will have to do is a directory by directory
> >>> migration or stop all services that rely on the data store, copy
> >>> the data to both machines while there are no writes (no changes)
> >>> going on, then start everything back up.
> >>>
> >>> For those of us that need this in a mail storage scenario, this is
> >>> not good. I can't stop my entire mail system for 4 hours while I
> >>> copy over 170 gigs of 4 million files.
> >>>
> >>> Now I will have to think of something a little more tricky like
> >>> moving a single maildir subdirectory letter at a time.
> >>>
> >>> Thoughts, comments, suggestions?
> >>
> >> I am stepping way over my head now, but here goes...
> >>
> >> Is there any way, either with a translator or I don't know what, to
> >> implement some kind of algorithm (md5 or whatnot) against the file
> >> in this situation? Something that could/would only be used when
> >> initially setting up a cluster? I don't know if that would require
> >> just a separate script written in whatever language or if it would
> >> be something that could belong to glusterfs or what.
> >>
> >> In the case I just described, it would be nice to have something to
> >> go through all files such as the find trick, and have both servers
> >> do an md5 check or whatever and if they are the same update the
> >> version on the COPY to the same version?
> >>
> >> Is this TOTALLY broken/whackass to do? I know pretty much nothing
> >> about file system schematics and such so please don't beat me up
> >> too badly.
> >>
> >> =P
> >>
> >>
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> Gluster-devel at nongnu.org
> >> http://lists.nongnu.org/mailman/listinfo/gluster-devel
> >>
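
On the MD5 idea raised in the quoted thread above: a minimal sketch of the
comparison step, assuming both copies of the data can be reached as local
directory paths from one machine (a simplification; in practice each server
would probably hash its own files and only the digests would be exchanged).
The function names are hypothetical, and the sketch only reports which files
match. It does not touch any xattrs.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_root, copy_root):
    """Compare every file under source_root with its twin in copy_root.

    Returns three lists of paths relative to source_root:
    (identical, differing, missing).
    """
    identical, differing, missing = [], [], []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(copy_root, rel)
            if not os.path.isfile(dst):
                missing.append(rel)
            elif md5_of(src) == md5_of(dst):
                identical.append(rel)
            else:
                differing.append(rel)
    return identical, differing, missing
```

Only the files in the "identical" list would be candidates for having the
version on the copy set to match, and only if nothing wrote to either side
while the hashes were being computed.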