[Gluster-users] Is it ok to add a new brick with files already on it?

SINCOCK John J.Sincock at fugro.com
Thu Oct 16 03:03:32 UTC 2014


Ah, apologies. Yes, gluster can be good and fast on large file writes; it is the large number of small files that slows gluster down, and I know this is pretty much unavoidable.

I'm still not clear, though, on exactly how to ensure these files can be seen by gluster.

I.e., Franco, I'm not sure exactly what you mean when you say the files will be "out of place", and that a rebalance will take time. Directly filling the bricks on our new server should actually bring the used/available space ratio on this server close to what it is on our other 2 nodes. So when gluster finds these files, whether on first access or during a rebalance, I don’t think it would need to shift much data between the nodes just to even out the free space.
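To put rough numbers on that, here is a small sketch using only the figures quoted further down this thread (existing volume: 73 TB with 53 TB used; new server: 32 TB holding the 20 TB to be rsynced). The dictionary and function names are purely illustrative.

```python
# Capacity figures quoted earlier in this thread, in terabytes.
existing = {"capacity": 73, "used": 53}    # current 2-node volume
new_server = {"capacity": 32, "used": 20}  # new server after the rsync


def fill_ratio(node):
    """Fraction of a node's capacity that is in use."""
    return node["used"] / node["capacity"]


# Totals for the planned 3-node volume.
combined = {
    "capacity": existing["capacity"] + new_server["capacity"],
    "used": existing["used"] + new_server["used"],
}

for name, node in [("existing", existing), ("new", new_server), ("combined", combined)]:
    print(f"{name:9s} {node['used']:3d}/{node['capacity']:3d} TB  {fill_ratio(node):.1%} full")
```

That works out to roughly 73% full on the existing nodes against 62.5% on the new server, so the fill levels are close but not identical.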

As I understand it, gluster at the very least requires xattrs to be set on every file. Obviously these will be set if data is copied in via gluster and gluster places the files on the bricks. But I'm not clear how, if, or when files become part of the gluster volume if they are not explicitly copied onto a brick that is already part of the volume via a proper gluster-aware mount.
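One way to see which files on a brick gluster has not yet marked would be to look for the xattr directly. The sketch below assumes that gluster tags each file it manages with an xattr named trusted.gfid and keeps its own bookkeeping under a .glusterfs directory at the brick root; both names are assumptions here, not something confirmed in this thread. It is Linux-only, and reading trusted.* xattrs needs root.

```python
import os

# Assumption: gluster marks each file it manages on a brick with a
# "trusted.gfid" xattr. Reading trusted.* xattrs needs root on Linux.
GFID_XATTR = "trusted.gfid"


def has_gfid(path):
    """Return True if `path` carries the gfid xattr, False otherwise."""
    try:
        os.getxattr(path, GFID_XATTR, follow_symlinks=False)
        return True
    except OSError:
        # No such attribute, xattrs unsupported, or not permitted to read it.
        return False


def files_without_gfid(brick_root):
    """Yield files under `brick_root` that lack the gfid xattr."""
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Assumption: skip gluster's own bookkeeping directory, if present.
        if ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not has_gfid(full):
                yield full
```

Run as root against the brick directory (not the gluster mount), iterating over files_without_gfid() would list the files that were copied in directly and have not yet been picked up.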

I guess I’m hoping for one of two things:
1)	that if you add a brick with data already on it to a gluster volume, gluster will go through and set the xattrs on all the files and make them available as part of the process of adding the brick; or
2)	that there is some way to trigger gluster to re-scan a brick, to make itself aware of files that have been copied in “behind” gluster.
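On option 2, Franco's earlier reply suggests that the first access through gluster is what makes it add its links for a file. If that holds, one way to force it for everything would be to walk the gluster client mount and stat every entry. A minimal sketch under that assumption, untested against a real volume; the function name is made up for illustration.

```python
import os


def touch_every_entry(mount_root):
    """lstat() every file and directory under `mount_root`.

    Returns (entries_seen, errors). Meant to be run against the gluster
    client mount, not against the brick directory itself.
    """
    seen = 0
    errors = 0

    def on_walk_error(exc):
        # Called by os.walk when a directory cannot be listed.
        nonlocal errors
        errors += 1

    for dirpath, dirnames, filenames in os.walk(mount_root, onerror=on_walk_error):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
                seen += 1
            except OSError:
                errors += 1
    return seen, errors
```

The returned counts give a quick sanity check: the number of entries seen through the mount can be compared with a file count taken on the source before the rsync.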

I do apologise if I'm missing the point or making this seem a lot harder than it is. It's just that, when dealing with large amounts of data, we have to be certain: I can't afford to waste 2 days copying data onto the server and then find I can't add the files to the gluster volume without deleting it all and spending 5 more days transferring all the files again via gluster.

Thanks again, I really do appreciate any advice that can really nail this down and clarify the situation.

-----Original Message-----
From: Franco Broi [mailto:franco.broi at iongeo.com] 
Sent: Thursday, 16 October 2014 12:21 PM
To: SINCOCK John; gluster-users
Subject: Re: [Gluster-users] Is it ok to add a new brick with files already on it?


Gluster may be slow when creating lots of small files but it is not slow writing.

I don't see a problem with what you want to do as long as you realise that many of the files will be out of place and a future rebalance would take a very long time - if you decide to run one.

On Wed, 2014-10-15 at 21:12 -0500, Ryan Nix wrote: 
> Interesting.  Still, I think it's better to let the Gluster client 
> handle the syncing.  What happens if, for some strange reason, the 
> rsync process dies in the middle of the night?  Gluster, on the other 
> hand, will keep working to get the data on the other bricks without 
> human intervention.  I recently used Gluster to sync 3 TB of data to 
> another brick over a 1Gbps link in about 13 hours on decent hardware.
> 
> On Wed, Oct 15, 2014 at 9:04 PM, SINCOCK John <J.Sincock at fugro.com>
> wrote:
>          
>         
>         We have 20 Terabytes to rsync onto a new server (which will
>         have 32 TB capacity),
>         
>         And we then want to add that server to an existing 2-node
>         gluster of 73TB (53 TB used, 20 TB free), to give a 3-node
>         gluster with 105TB capacity, 73TB used.
>         
>          
>         
>         The reason I want to do it this way, if possible, is that
>         Gluster is slow on writes, especially for small files, and we
>         have a LOT of small files, so I’m pretty sure it will be a LOT
>         faster to rsync directly to the new server (which is the one
>         that has free space anyway), and then add that server to the
>         gluster – if it is possible to have gluster recognise those
>         files.
>         
>          
>         
>          
>         
>         From: Ryan Nix [mailto:ryan.nix at gmail.com] 
>         Sent: Thursday, 16 October 2014 11:58 AM
>         To: SINCOCK John
>         Cc: Franco Broi; gluster-users
>         
>         
>         Subject: Re: [Gluster-users] Is it ok to add a new brick with
>         files already on it? 
>          
>         
>         So Gluster, at its core, uses rsync to copy the data to the
>         other bricks.  Why not let Gluster do the heavy lifting?
>         
>         
>          
>         
>         On Wed, Oct 15, 2014 at 7:35 PM, SINCOCK John
>         <J.Sincock at fugro.com> wrote:
>         
>         
>         In a related question... it seems, if it is possible to add
>         filesystems already containing data, as new bricks, then it
>         should also be possible to:
>         
>         1) create empty bricks
>         2) add them to the gluster volume while they are empty
>         3) rsync data directly onto the underlying empty bricks,
>         circumventing gluster, ie not through the gluster mountpoint
>         4) somehow get gluster to recognise the data that has been
>         copied into the bricks?
>         
>         How would you go about getting gluster to see the data you've
>         rsynced directly in?
>         My concern would be that all the data rsynced directly onto
>         the bricks will just sit there, invisible to glusterfs.
>         
>         Thanks again for any info!
>         
>         
>         -----Original Message-----
>         From: Franco Broi [mailto:franco.broi at iongeo.com]
>         Sent: Thursday, 16 October 2014 10:06 AM
>         To: SINCOCK John
>         Cc: gluster-users at gluster.org
>         Subject: Re: [Gluster-users] Is it ok to add a new brick with
>         files already on it?
>         
>         
>         
>         I've never added a brick with existing files but I did start a
>         new Gluster volume on disks that already contained data and I
>         was able to access the files without problem. Of course the
>         files will be out of place but the first time you access them,
>         Gluster will add links to speed up future lookups.
>         
>         On Thu, 2014-10-16 at 09:57 +1030, SINCOCK John wrote:
>         > Hi Everyone,
>         >
>         >
>         >
>         > All the instructions I’ve been able to find on adding a brick to a
>         > gluster, seem to assume the brick is empty when it’s added.
>         >
>         >
>         >
>         > So my question is, is it possible for a new brick, loaded up with
>         > files, to be added to a gluster (and for all the files already on that
>         > brick, to be indexed and added into the gluster). Apologies if the
>         > question is answered elsewhere, but I couldn’t find anyone addressing
>         > this specific question, and certainty helps when you’re dealing with
>         > 10’s of terabytes of data... ;-)
>         >
>         >
>         >
>         > Thanks in advance for any info or tips!
>         >
>         >
>         >
>         >
>         > _______________________________________________
>         > Gluster-users mailing list
>         > Gluster-users at gluster.org
>         >
>         http://supercolony.gluster.org/mailman/listinfo/gluster-users
>         
>         
>         
>         
>          
>         
>         
> 
> 



