[Gluster-devel] AFR comments. Maximizing free space use when using mirroring.

Krishna Srinivas krishna at zresearch.com
Tue Jul 31 15:29:14 UTC 2007


On 7/25/07, DeeDee Park <deedee6905 at hotmail.com> wrote:
> Here is my 2c on AFR.
>
> When I set up file servers, the first priority is always to get them up and
> running, and the next priority, later on, is to add mirrors/high
> availability. Sometimes, due to business concerns, the second priority does
> not happen until something drastic happens. By the time there is budget for
> additional drives, the drive sizes available are typically much bigger
> (remember, they double every 2 years or so). So when I've bought drives,
> I've gotten 40GB, 80GB, 120GB, 160GB, 200GB, 250GB, 300GB, 500GB, 750GB...
> What I'm saying is that when new drives are bought, either to expand the
> total file server size or to add additional replicas, the new drives are
> most of the time bigger than the original drives purchased.
>
> In the current implementation of AFR, the second brick (in a not so well
> managed environment) will most likely be bigger than the first brick, thus
> underutilizing the additional storage space due to the mismatch in disk
> sizes.
>
> The idea I have is that I want to use as many available commodity parts as
> I can find to build the largest file server for my customer's needs,
> reallocating the remaining space for replicas. I still have a lot of these
> 120GB drives sitting around from a few years ago, and I've got 500/750GB
> drives. It seems to be a difficult task to match each 120GB drive with
> another 120GB drive to optimize disk usage for AFR purposes. I could have
> two 500GB drives for replication *:2, but if I want to move to *:3 in the
> future, most likely I'll have some 750GB drives lying around. Using a 750GB
> drive as my third brick would most likely waste the remaining 250GB.

You can just put the brick on the bigger drive first in the subvolume list.
This should fix the problem, right?
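
For reference, the 250GB figure in the quoted paragraph can be checked with
a small sketch. It assumes full replication (*:3), where every brick holds a
complete copy, so the usable size is bounded by the smallest brick. The
function name is made up for illustration and is not part of GlusterFS.

```python
def mirror_capacity(brick_sizes_gb):
    """Usable size and per-brick unused space when every file is on every brick.

    Assumption: full replication, so each brick stores the whole dataset and
    the smallest brick limits how much data the mirror set can hold.
    """
    usable = min(brick_sizes_gb)
    unused = [size - usable for size in brick_sizes_gb]
    return usable, unused


# Two 500GB bricks plus a 750GB brick added later for *:3.
usable, unused = mirror_capacity([500, 500, 750])
print(usable, unused)  # 500 [0, 0, 250]
```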

>
> Just as RR or ALU puts files anywhere, I originally envisioned that AFR did
> the same. If my dataset is larger than the largest possible RAID I can
> afford, then one brick will never carry all the files.

No, that would complicate the functionality of AFR and its self-heal feature.

>
> What I think would be cool would be to have the AFR on top of the unify so
> that if the dataset is spread across X drives, that is fine, the remote
> mirrors would not require the same hardware, and I would just need to
> purchase the approximate 2X hard drive space at the new co-lo. I can just
> ask a client "How much disk space are you currently using?". If they say
> 20TB all using 200GB drives (=100 drives), then I can set up the additional
> glusterfs replica to utilize 20TB using 750GB drives. I would like to have
> to buy 27 750GB drives to make up my 20TB, instead of having to buy 100
> 750GB drives to replicate the existing 100 200GB drives. (It doesn't make
> sense to buy 200GB drives when larger drives are available for purchase.)

I just tried AFR over unify and there is some problem; I shall look into it.
However, we need to see if it is advisable to use this kind of setup.
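
The drive counts in the quoted paragraph (100 drives at 200GB, 27 drives at
750GB) follow from a simple ceiling division. A minimal sketch, assuming
decimal units (20TB = 20,000GB) and a hypothetical helper name:

```python
def drives_needed(dataset_gb, drive_gb):
    """Smallest number of drives whose combined size covers the dataset."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-dataset_gb // drive_gb)


print(drives_needed(20_000, 200))  # 100
print(drives_needed(20_000, 750))  # 27
```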

>
> Also, I have the premise that 100% of the dataset is critical (It is all
> user data), and I cannot say which file extensions should be replicated or
> should not be replicated. The example that *.c is more critical than *.o
> is probably true, but I know users have told me that they have .o files from
> systems that are no longer available, so those .o files for that user are
> critical. Since I cannot specify *.c:2,*.o:1 for some users and *.c:2,*.o:2
> for others (nor would I really want to get that involved in the user data
> details or think I'll have that much free time to investigate that level of
> detail), it only makes sense to replicate everything, e.g. *:2 or *:3. It is a
> cool feature to have. But also if a user specifies *.c:2,*.o:1, then that
> assumes (with the current implementation of AFR), that the 2nd brick should
> be smaller than the first brick (Then I have questions as to what happens
> when there isn't enough space etc).

That is right, the second brick should be smaller than or equal to the first.
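
To see why, here is a rough sketch of how much space each brick would need
under a pattern such as *.c:2,*.o:1. It assumes that a file with N copies is
stored on the first N subvolumes, that the first matching pattern wins, and
that unmatched files get one copy. These are assumptions made for
illustration, and the helper names are hypothetical, not GlusterFS code.

```python
from fnmatch import fnmatchcase


def parse_replicate(spec):
    """Turn a string like '*.c:2,*.o:1' into [('*.c', 2), ('*.o', 1)]."""
    rules = []
    for item in spec.split(","):
        pattern, count = item.rsplit(":", 1)
        rules.append((pattern.strip(), int(count)))
    return rules


def space_per_brick(files, spec, n_bricks):
    """Space used on each brick, given {filename: size} and a replicate spec."""
    rules = parse_replicate(spec)
    used = [0] * n_bricks
    for name, size in files.items():
        # Assumption: first matching pattern wins; unmatched files get 1 copy.
        copies = next((c for p, c in rules if fnmatchcase(name, p)), 1)
        # Assumption: copies land on the first `copies` bricks, in order.
        for i in range(min(copies, n_bricks)):
            used[i] += size
    return used


files = {"main.c": 2, "main.o": 5, "util.c": 1}
print(space_per_brick(files, "*.c:2,*.o:1", 2))  # [8, 3]
print(space_per_brick(files, "*:2", 2))          # [8, 8]
```

Under these assumptions the first brick always holds at least as much as any
later brick, which matches the sizing rule above.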

Regards,
Krishna