[Gluster-devel] AFR comments. Maximizing free space use when using mirroring.

DeeDee Park deedee6905 at hotmail.com
Tue Jul 24 19:30:12 UTC 2007


Here is my 2c on AFR.

When I set up file servers, the first priority is always to get them up
and running; adding mirrors/high availability comes later. Sometimes,
due to business concerns, that second priority does not happen until
something drastic happens. By the time there is budget for additional
drives, the drive sizes available are typically much bigger (remember,
they double every 2 years or so). Over the years I've bought 40GB,
80GB, 120GB, 160GB, 200GB, 250GB, 300GB, 500GB, 750GB... So what I'm
saying is that when new drives are bought, either to expand the total
file server size or to add additional replicas, the new drives are
most of the time bigger than the original drives.

In the current implementation of AFR, the second brick (in a not
well-managed environment) will most likely be bigger than the first
brick, thus underutilizing the additional storage space due to the
mismatch in disk sizes.
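
To make the size mismatch concrete, here is roughly what a plain
two-brick AFR looks like in a client volfile (hostnames and volume
names here are made up, and I'm going from memory on the 1.3 syntax):

  # first brick, e.g. the old 120GB box
  volume brick1
    type protocol/client
    option transport-type tcp/client
    option remote-host server1
    option remote-subvolume posix1
  end-volume

  # second brick, e.g. the newer 750GB box
  volume brick2
    type protocol/client
    option transport-type tcp/client
    option remote-host server2
    option remote-subvolume posix1
  end-volume

  # mirror everything across the pair
  volume mirror
    type cluster/afr
    option replicate *:2
    subvolumes brick1 brick2
  end-volume

Since every file is written to both bricks, the usable capacity of the
pair is capped by the smaller brick, and the extra space on brick2
just sits idle.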

The idea I have is that I want to use as many available commodity
parts as I can find, build the largest file server for my customer's
needs, and reallocate the remaining space for replicas. I still have a
lot of 120GB drives sitting around from a few years ago, and I've got
500/750GB drives. It is a difficult task to match each 120GB drive
with another 120GB drive to optimize disk usage for AFR purposes. I
could have two 500GB drives for replication *:2, but if I want to move
to *:3 in the future, most likely I'll have some 750GB drives lying
around. Using a 750GB drive as my third brick would most likely waste
the remaining 250GB.

Just as the RR or ALU schedulers put files anywhere, I originally
envisioned that AFR did the same. If my dataset is larger than the
largest possible RAID I can afford, then one brick will never carry
all the files.

What I think would be cool is to have AFR on top of unify, so that if
the dataset is spread across X drives, that is fine: the remote
mirrors would not require the same hardware, and I would just need to
purchase approximately that much space again at the new co-lo (2X in
total). I can just ask a client, "How much disk space are you
currently using?" If they say 20TB, all on 200GB drives (=100 drives),
then I can set up the additional glusterfs replica to hold that 20TB
on 750GB drives. I would only have to buy 27 750GB drives to make up
my 20TB, instead of 100 750GB drives to replicate the existing 200GB
drives one-for-one. (It doesn't make sense to buy 200GB drives when
larger drives are available for purchase.)
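
Here's a rough sketch of the kind of volfile I have in mind (names are
made up, and again I'm going from memory on the syntax): each site
runs its own unify over whatever mix of drives it happens to have,
with its own namespace brick, and AFR mirrors the two unify volumes
instead of individual drives:

  # Site A: six old 120GB bricks unified into one volume
  volume unify-a
    type cluster/unify
    # alu (or rr) places each file on any brick with room
    option scheduler alu
    # namespace brick, assumed defined earlier in the spec
    option namespace ns-a
    subvolumes d120-1 d120-2 d120-3 d120-4 d120-5 d120-6
  end-volume

  # Co-lo: two newer 750GB bricks with comparable total capacity
  volume unify-b
    type cluster/unify
    option scheduler alu
    option namespace ns-b
    subvolumes d750-1 d750-2
  end-volume

  # every file lives once per site
  volume mirror
    type cluster/afr
    option replicate *:2
    subvolumes unify-a unify-b
  end-volume

The schedulers decide which drive inside each site gets a given file,
so neither side has to match the other drive-for-drive; only the
totals need to be comparable.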

Also, I work from the premise that 100% of the dataset is critical (it
is all user data), and I cannot say which file extensions should or
should not be replicated. The example that *.c is more critical than
*.o is probably true, but users have told me they have .o files from
systems that are no longer available, so for those users the .o files
are critical. Since I cannot specify *.c:2,*.o:1 for some users and
*.c:2,*.o:2 for others (nor would I really want to get that involved
in the details of user data, or expect to have that much free time to
investigate at that level of detail), it only makes sense to replicate
everything, e.g. *:2 or *:3. Per-pattern replication is a cool feature
to have. But if a user specifies *.c:2,*.o:1, that also assumes (with
the current implementation of AFR) that the 2nd brick should be
smaller than the first brick (and then I have questions about what
happens when there isn't enough space, etc.).
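
For reference, my understanding is that the per-pattern counts are
given to AFR as a comma-separated list (again, syntax from memory and
names made up):

  volume afr
    type cluster/afr
    # *.c on the first two subvolumes, *.o on the first only,
    # everything else mirrored
    option replicate *.c:2,*.o:1,*:2
    subvolumes brick1 brick2
  end-volume

With *.o:1 the .o files land only on the first subvolume, which is why
the second brick could get away with being smaller than the first.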






