[Gluster-users] Rsync

Hiren Joshi josh at moonfruit.com
Mon Oct 5 10:00:36 UTC 2009


Just a quick update: The rsync is *still* not finished. 

> -----Original Message-----
> From: gluster-users-bounces at gluster.org 
> [mailto:gluster-users-bounces at gluster.org] On Behalf Of Hiren Joshi
> Sent: 01 October 2009 16:50
> To: Pavan Vilas Sondur
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Rsync
> 
> Thanks!
> 
> I'm keeping a close eye on the "is glusterfs DHT really distributed?"
> thread =)
> 
> I tried nodelay on and unhashd no. I tarred about 400G to the share in
> about 17 hours (~6MB/s?) and am running an rsync now. Will post the
> results when it's done.
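
As a quick sanity check on the ~6MB/s estimate above, here is a minimal
Python sketch. The function name is illustrative only, and whether "400G"
means 1024 MB or 1000 MB per GB is an assumption either way, so both are
shown.

```python
def throughput_mb_per_s(size_gb, hours, mb_per_gb=1024):
    """Average throughput in MB/s for size_gb transferred in the given hours."""
    return size_gb * mb_per_gb / (hours * 3600)

# Figures from the message: about 400G tarred in about 17 hours.
rate_binary = throughput_mb_per_s(400, 17)                    # assuming 1 GB = 1024 MB
rate_decimal = throughput_mb_per_s(400, 17, mb_per_gb=1000)   # assuming 1 GB = 1000 MB

print(f"{rate_binary:.1f} MB/s")   # prints "6.7 MB/s"
print(f"{rate_decimal:.1f} MB/s")  # prints "6.5 MB/s"
```

Either way the result is a little over 6MB/s, in line with the estimate.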
> 
> > -----Original Message-----
> > From: Pavan Vilas Sondur [mailto:pavan at gluster.com] 
> > Sent: 01 October 2009 09:00
> > To: Hiren Joshi
> > Cc: gluster-users at gluster.org
> > Subject: Re: Rsync
> > 
> > Hi,
> > We're looking into the problem on similar setups and working on it. 
> > Meanwhile can you let us know if performance increases if you 
> > use this option:
> > 
> > 'option transport.socket.nodelay on' in each of your
> > protocol/client and protocol/server volumes.
> > 
> > Pavan
> > 
> > On 28/09/09 11:25 +0100, Hiren Joshi wrote:
> > > Another update:
> > > It took 1240 minutes (over 20 hours) to complete on the simplified
> > > system (without mirroring). What else can I do to debug?
> > > 
> > > > -----Original Message-----
> > > > From: gluster-users-bounces at gluster.org 
> > > > [mailto:gluster-users-bounces at gluster.org] On Behalf Of Hiren Joshi
> > > > Sent: 24 September 2009 13:05
> > > > To: Pavan Vilas Sondur
> > > > Cc: gluster-users at gluster.org
> > > > Subject: Re: [Gluster-users] Rsync
> > > > 
> > > >  
> > > > 
> > > > > -----Original Message-----
> > > > > From: Pavan Vilas Sondur [mailto:pavan at gluster.com] 
> > > > > Sent: 24 September 2009 12:42
> > > > > To: Hiren Joshi
> > > > > Cc: gluster-users at gluster.org
> > > > > Subject: Re: Rsync
> > > > > 
> > > > > Can you let us know the following:
> > > > > 
> > > > >  * What is the exact directory structure?
> > > > /abc/def/ghi/jkl/[1-4]
> > > > where abc, def, ghi and jkl are each one of a thousand dirs.
> > > > 
> > > > >  * How many files are there in each individual directory and 
> > > > > of what size?
> > > > Each of the [1-4] dirs has about 100 files in, all under 1MB.
> > > > 
> > > > >  * It looks like each server process has 6 export 
> > > > > directories. Can you run one server process each for a single 
> > > > > export directory and check if the rsync speeds up?
> > > > I had no idea you could do that. How? Would I need to create 6 config
> > > > files and start gluster:
> > > > 
> > > > /usr/sbin/glusterfsd -f /etc/glusterfs/export1.vol or similar?
> > > > 
> > > > I'll give this a go....
> > > > 
> > > > >  * Also, do you have any benchmarks with a similar setup on say, NFS?
> > > > NFS will create the dir tree in about 20 minutes, then start copying the
> > > > files over; it takes about 2-3 hours.
> > > > 
> > > > > 
> > > > > Pavan
> > > > > 
> > > > > On 24/09/09 12:13 +0100, Hiren Joshi wrote:
> > > > > > It's been running for over 24 hours now.
> > > > > > Network traffic is nominal, top shows about 200-400% cpu (7 cores so
> > > > > > it's not too bad).
> > > > > > About 14G of memory used (the rest is being used as disk cache).
> > > > > > 
> > > > > > Thoughts?
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > <snip>
> > > > > > > > > > 
> > > > > > > > > > An update, after running the rsync for a day, I killed it and remounted
> > > > > > > > > > all the disks (the underlying filesystem, not the gluster) with noatime,
> > > > > > > > > > the rsync completed in about 600 minutes. I'm now going to try one level
> > > > > > > > > > up (about 1,000,000,000 dirs).
> > > > > > > > > > 
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > From: Pavan Vilas Sondur [mailto:pavan at gluster.com] 
> > > > > > > > > > > Sent: 23 September 2009 07:55
> > > > > > > > > > > To: Hiren Joshi
> > > > > > > > > > > Cc: gluster-users at gluster.org
> > > > > > > > > > > Subject: Re: Rsync
> > > > > > > > > > > 
> > > > > > > > > > > Hi Hiren,
> > > > > > > > > > > What glusterfs version are you using? Can you send us the
> > > > > > > > > > > volfiles and the log files?
> > > > > > > > > > > 
> > > > > > > > > > > Pavan
> > > > > > > > > > > 
> > > > > > > > > > > On 22/09/09 16:01 +0100, Hiren Joshi wrote:
> > > > > > > > > > > > I forgot to mention, the mount is mounted with direct-io, would this
> > > > > > > > > > > > make a difference? 
> > > > > > > > > > > > 
> > > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > > From: gluster-users-bounces at gluster.org 
> > > > > > > > > > > > > [mailto:gluster-users-bounces at gluster.org] On Behalf Of Hiren Joshi
> > > > > > > > > > > > > Sent: 22 September 2009 11:40
> > > > > > > > > > > > > To: gluster-users at gluster.org
> > > > > > > > > > > > > Subject: [Gluster-users] Rsync
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Hello all,
> > > > > > > > > > > > >  
> > > > > > > > > > > > > I'm getting what I think is bizarre behaviour.... I have about 400G to
> > > > > > > > > > > > > rsync (rsync -av) onto a gluster share, the data is in a directory
> > > > > > > > > > > > > structure which has about 1000 directories per parent and about 1000
> > > > > > > > > > > > > directories in each of them.
> > > > > > > > > > > > >  
> > > > > > > > > > > > > When I try to rsync an end leaf directory (this has about 4 dirs and 100
> > > > > > > > > > > > > files in each) the operation takes about 10 seconds. When I go one level
> > > > > > > > > > > > > above (1000 dirs with about 4 dirs in each with about 100 files in each)
> > > > > > > > > > > > > the operation takes about 10 minutes.
> > > > > > > > > > > > >  
> > > > > > > > > > > > > Now, if I then go one level above that (that's 1000 dirs with 1000 dirs
> > > > > > > > > > > > > in each with about 4 dirs in each with about 100 files in each) the
> > > > > > > > > > > > > operation takes days! Top shows glusterfsd takes 300-600% cpu usage
> > > > > > > > > > > > > (2X4core), I have about 48G of memory (usage is 0% as expected).
> > > > > > > > > > > > >  
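
To put the figures in the original message in perspective, here is a rough
back-of-the-envelope sketch in Python. It assumes exactly 1000 directories
per level, 4 subdirectories per leaf and 100 files in each (the message
only says "about"), and it assumes that run time grows linearly with the
number of directories, which nothing in this thread establishes. All names
are illustrative.

```python
FANOUT = 1000          # assumed: directories per parent
DIRS_PER_LEAF = 4      # assumed: subdirectories in each leaf directory
FILES_PER_DIR = 100    # assumed: files in each of those subdirectories

def file_count(leaf_dirs):
    """Number of files underneath the given number of leaf directories."""
    return leaf_dirs * DIRS_PER_LEAF * FILES_PER_DIR

files_leaf = file_count(1)                     # a single leaf directory
files_one_up = file_count(FANOUT)              # one level above
files_two_up = file_count(FANOUT * FANOUT)     # two levels above

# Reported: one level above the leaf took about 10 minutes.
observed_minutes_one_up = 10

# Linear extrapolation (an assumption) to the level above that.
projected_minutes = observed_minutes_one_up * FANOUT
projected_days = projected_minutes / (60 * 24)

print(files_leaf, files_one_up, files_two_up)
print(f"{projected_days:.1f} days")  # prints "6.9 days"
```

Under those assumptions, a run time of roughly a week at the top level
would be plain linear scaling from the 10 minute case rather than a sudden
cliff, although it says nothing about why each level is slow to begin with.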
> > > > > > > > > > > > > Has anyone seen anything like this? How can I speed it up?
> > > > > > > > > > > > >  
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >  
> > > > > > > > > > > > > Josh.
> > > > > > > > > > > > > 
> > > > > > > > > > > > _______________________________________________
> > > > > > > > > > > > Gluster-users mailing list
> > > > > > > > > > > > Gluster-users at gluster.org
> > > > > > > > > > > > http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> > > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > 
> > > > 
> > 
> 


