[Gluster-devel] Re: Uh, another gotcha with AFR, pre-existing data specific
Christopher Hawkins
chawkins at veracitynetworks.com
Tue May 6 13:35:15 UTC 2008
Thanks, I was wondering about that. Below I suggested using glusterfs to do
the copy and forgetting about an external rsync, but I didn't know how that
would perform compared to an rsync. Looks like your data confirms that, if
the xattrs get set correctly, letting gluster do the copying / creating
would be a good, fast solution.
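
For reference, the find method could be sketched roughly as follows. This is
only a minimal sketch, assuming that opening each file through the client
mount and reading a byte is what prompts AFR to sync it (as the head -c 1
test quoted below suggests). The function name sweep_mount and the mount
path in the note after the code are made up for illustration and are not
part of glusterfs.

```python
import os


def sweep_mount(mount_root, nbytes=1):
    """Walk mount_root and read the first nbytes of every file.

    Assumption: reading through the glusterfs client mount is enough
    to make AFR check (and, if needed, copy) each file.
    Returns (number of files read, list of (path, error) failures).
    """
    opened = 0
    failed = []
    for dirpath, _dirnames, filenames in os.walk(mount_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    fh.read(nbytes)
                opened += 1
            except OSError as exc:
                # Keep going; report unreadable files at the end.
                failed.append((path, exc))
    return opened, failed
```

It would be run on a client, e.g. sweep_mount("/mnt/glusterfs") with a
hypothetical mount point, and the failed list checked afterwards.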
> -----Original Message-----
> From: gluster-devel-bounces+chawkins=veracitynetworks.com at nongnu.org
> [mailto:gluster-devel-bounces+chawkins=veracitynetworks.com at nongnu.org] On Behalf Of gordan at bobich.net
> Sent: Tuesday, May 06, 2008 9:26 AM
> To: gluster-devel at nongnu.org
> Subject: RE: [Gluster-devel] Re: Uh, another gotcha with AFR, pre-existing data specific
>
> Something that may be worth mentioning here is that on my
> glusterfs syncs using the find method I seem to get as close
> to the wire speed of my network as I've ever seen. The SNMP
> bandwidth graphs confirm it at over 90 Mb/s on a 100 Mb/s network,
> as does the sync time.
>
> I mention this because it means you are unlikely to achieve a
> higher speed with rsync.
>
> Gordan
>
> On Tue, 6 May 2008, Christopher Hawkins wrote:
>
> > I have a need for something quite similar to that. Seems reasonable to me...
> >
> >
> > My guess is that what you want to do is:
> >
> > Set xattr version = 2 for all gluster files on server 1. Start gluster
> > with AFR. Do a find on all files on server 1, which should copy them
> > to server 2, whether they already exist there or not (but really
> > server2 should pretty much be empty).
> >
> > Now all files should be in sync. I will be doing some testing on this
> > soon and intend to write a script to handle it... The MD5 check is a
> > good idea and I'll try to find a way to build that in. Let me know how
> > this goes for you... Your feedback will be very helpful!
> >
> > PS - You say "it did no good" but I disagree. In your setup below,
> > even though it copied the data all over again to server 2, it did so
> > while server 1 was online because you manually set the xattrs on the
> > existing data on server 1. You had much less downtime than if you had
> > re-copied all the data from server 1 TO server 1, from non-gluster
> > storage into gluster storage.
> >
> > Chris
> >
> >> -----Original Message-----
> >> From: gluster-devel-bounces+chawkins=veracitynetworks.com at nongnu.org
> >> [mailto:gluster-devel-bounces+chawkins=veracitynetworks.com at nongnu.org] On Behalf Of Brandon Lamb
> >> Sent: Monday, May 05, 2008 9:07 PM
> >> To: Gluster Devel
> >> Subject: [Gluster-devel] Re: Uh, another gotcha with AFR, pre-existing data specific
> >>
> >> On Mon, May 5, 2008 at 5:58 PM, Brandon Lamb <brandonlamb at gmail.com> wrote:
> >>> I just did some testing, and came to the conclusion that trying to
> >>> set up afr using one server with pre-existing data and a blank server,
> >>> and copying your data and removing xattrs on the copied data then
> >>> initiating afr DOES NO GOOD.
> >>>
> >>> server1 - 400 megs of data in 10 tarballs, removed all xattrs
> >>> server2 - copied the files from server1
> >>> server1 - started glusterfsd, then used setfattr to set
> >>> trusted.glusterfs.version to 1; files on server2 have no xattrs.
> >>>
> >>> At this point I should have identical copies of data *assuming* I had
> >>> no writes in between.
> >>>
> >>> SO, now in a client I do head -c 1 file0.tar.bz2 and it seems that
> >>> since files on server2 have NO xattr, it copies them all over again!!!
> >>>
> >>> So, is there no viable way to PRECOPY a copy of pre-existing data?
> >>> Looks like what we will have to do is a directory by directory
> >>> migration or stop all services that rely on the data store, copy the
> >>> data to both machines while there are no writes (no changes) going on,
> >>> then start everything back up.
> >>>
> >>> For those of us that need this in a mail storage scenario, this is not
> >>> good. I can't stop my entire mail system for 4 hours while I copy over
> >>> 170 gigs in 4 million files.
> >>>
> >>> Now I will have to think of something a little more tricky like moving
> >>> a single maildir subdirectory letter at a time.
> >>>
> >>> Thoughts, comments, suggestions?
> >>
> >> I am stepping way over my head now, but here goes...
> >>
> >> Is there any way, either with a translator or I don't know what, to
> >> implement some kind of algorithm (md5 or whatnot) against the file in
> >> this situation? Something that could/would only be used when initially
> >> setting up a cluster? I don't know if that would require just a separate
> >> script written in whatever language or if it would be something that
> >> could belong to glusterfs or what.
> >>
> >> In the case I just described, it would be nice to have something to
> >> go through all files such as the find trick, and have both servers do
> >> an md5 check or whatever and if they are the same update the version on
> >> the COPY to the same version?
> >>
> >> Is this TOTALLY broken/whackass to do? I know pretty much nothing
> >> about file system schematics and such so please don't beat me up too
> >> badly.
> >>
> >> =P
> >>
> >>
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> Gluster-devel at nongnu.org
> >> http://lists.nongnu.org/mailman/listinfo/gluster-devel
> >>
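
The MD5 idea from the quoted thread could be prototyped outside glusterfs
along these lines. This is a minimal sketch, assuming each server can hash
its own backend directory and the two manifests are then compared in one
place; md5_manifest and compare_manifests are hypothetical helper names. It
deliberately stops short of writing trusted.glusterfs.version on the copy,
since the value format AFR expects is not confirmed in this thread.

```python
import hashlib
import os


def md5_manifest(root, chunk_size=1 << 20):
    """Map each file's path (relative to root) to its MD5 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(path, "rb") as fh:
                # Read in chunks so large tarballs do not fill memory.
                for chunk in iter(lambda: fh.read(chunk_size), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


def compare_manifests(primary, copy):
    """Sort the primary's paths into identical, differing and missing."""
    identical, differing, missing = [], [], []
    for relpath, digest in sorted(primary.items()):
        if relpath not in copy:
            missing.append(relpath)
        elif copy[relpath] == digest:
            identical.append(relpath)
        else:
            differing.append(relpath)
    return identical, differing, missing
```

Only the files reported as identical would be candidates for having their
version set on the copy; differing or missing files would be left for AFR
to copy.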