[Gluster-users] Inviting comments on my plans
Shawn Heisey
gluster at elyograg.org
Mon Nov 19 16:36:11 UTC 2012
On 11/19/2012 3:18 AM, Fernando Frediani (Qube) wrote:
> Hi,
>
> I agree with the comment about Fedora and wouldn't choose it as a distribution, but if you are comfortable with it, go ahead, as I don't think this will be the major pain.
>
> RAID: I see where you are coming from in choosing not to have any RAID, and I have thought about doing the same myself, mainly for performance reasons. But as mentioned, how are you going to handle the drive swap? If you think you can somehow automate it, please share with us, as I believe running the disks independently is a major performance gain.
There will be no automation. I'll have to do everything myself --
telling the RAID controller to make the disk available to the OS,
putting a filesystem on it, re-adding it to gluster, etc. Although
drive failure is inevitable, I do not expect it to be a common occurrence.
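For the curious, the manual procedure would look roughly like this. It is only a sketch: the device name (/dev/sdf), mount point (/bricks/d5), old brick path, and volume name (myvol) are all hypothetical, and the exact replace-brick syntax varies between gluster versions:

    # Controller has been told to present the new disk as /dev/sdf.
    mkfs.xfs -i size=512 /dev/sdf   # bigger inodes leave room for xattrs
    mount /dev/sdf /bricks/d5
    mkdir -p /bricks/d5/myvol
    # Swap the dead brick for the new one; self-heal then repopulates
    # it from the replica partner.
    gluster volume replace-brick myvol \
        server1:/bricks/d5-failed/myvol server1:/bricks/d5/myvol \
        commit force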
> What you are planning to do with XFS+BTRFS, I am not quite sure will work as you expect. Ideally you need to take snapshots from the distributed filesystem; otherwise you might think you are getting a consistent copy of the data when you are not, as you are not supposed to be reading or writing anywhere other than on the Gluster mount.
The filesystem will be mounted on /bricks/fsname, but gluster will be
pointed at /bricks/fsname/volname. I would put snapshots in
/bricks/fsname/snapshots. Gluster would never see the snapshot data.
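To make that concrete, the layout on each brick filesystem would look
something like this (paths illustrative):

    /bricks/d1            <- the brick filesystem's mount point
    /bricks/d1/myvol      <- the directory handed to gluster as the brick
    /bricks/d1/snapshots  <- snapshots kept beside the brick directory

Since gluster only walks the tree under /bricks/d1/myvol, anything that
lives beside that directory on the same filesystem stays invisible to
clients.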
> Performance: Simple and short - if you can sacrifice one disk per host AND choose not to go with independent disks (no RAID), go with RAID 5.
> As your system grows, reads and writes should (in theory) be distributed across all bricks. If a disk fails you can easily replace it, and even in the unlikely event that you lose two disks in a server and lose its data entirely, you still have a copy of it in another place and can rebuild it with a bit of patience, so no data loss.
> Also, we have had more than enough reports of bad performance in Gluster for all kinds of configurations (including RAID 10), so I don't think anyone should expect Gluster to perform that well; using RAID 5, 6 or 10 underneath shouldn't make much difference, and RAID 10 would only waste space. If you are storing bulk data (multimedia, images, big files), great: it will be streamed, sequential I/O and should be acceptable. But if you are storing things that do a lot of small I/O, or virtual machines, I'm not sure Gluster is the best choice for you, and you should think carefully about it.
A big problem I would face if I went with RAID5 is that I won't
initially have all drive bays populated. The server has 12 drive
bays. If I populate 8 bays per server to start out, what happens when I
need to fill in the other 4 bays?
If I make a new RAID5 array out of the added drives, I lose the
capacity of another disk to parity, and I have no option other than
adding at least three drives at a time; growing one disk at a time
would be off the table. I could probably expand the existing RAID
array instead, but that is a process that will literally take days,
during which the entire array is in a fragile state with horrible
performance. If others have experience doing this on Dell hardware
and have had consistently good luck with it, then my objection may be
unfounded.
With individual disks instead of RAID, I can add one disk at a time to a
server pair.
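With a replica 2 volume, adding capacity is then just a matter of
adding one matching brick on each server of the pair and rebalancing.
A sketch, again with hypothetical names:

    # Add one new disk's brick on each server of the replica pair.
    gluster volume add-brick myvol \
        server1:/bricks/d9/myvol server2:/bricks/d9/myvol
    # Spread existing files across the new bricks.
    gluster volume rebalance myvol start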
We will be storing photo, text, and video assets, currently about 80
million of them, with most of them being photos. Each asset consists of
a main file and a handful of very small metadata files. If it's a video
asset, then we actually have several "main" files - different formats
and bitrates. We have a website that acts as a front end to all this data.
Because of other systems (MySQL and Solr), we normally do not need to
access the storage until someone wishes to see a detail page for an
individual asset, or download the asset. We have plans to migrate the
primary metadata access to another system with better performance,
possibly a NoSQL database. We will keep the metadata files around so we
have the ability to rebuild the primary system, but the goal is to only
access the storage when we are retrieving the asset for a user
download. The systems that process incoming data would obviously need
to access the storage often.
Thanks,
Shawn