[Gluster-users] cluster.min-free-disk separate for each, brick
Dan Bretherton
d.a.bretherton at reading.ac.uk
Thu Sep 8 22:51:19 UTC 2011
> On Wed, Sep 7, 2011 at 4:27 PM, Dan Bretherton
> <d.a.bretherton at reading.ac.uk> wrote:
>
> On 17/08/11 16:19, Dan Bretherton wrote:
>
> Dan Bretherton wrote:
>
> On 15/08/11 20:00, gluster-users-request at gluster.org wrote:
>
> Message: 1
> Date: Sun, 14 Aug 2011 23:24:46 +0300
> From: "Deyan Chepishev - SuperHosting.BG" <dchepishev at superhosting.bg>
> Subject: [Gluster-users] cluster.min-free-disk separate for each brick
> To: gluster-users at gluster.org
> Message-ID: <4E482F0E.3030604 at superhosting.bg>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Hello,
>
> I have a Gluster setup with very different brick sizes.
>
> brick1: 9T
> brick2: 9T
> brick3: 37T
>
> With this configuration, if I set the parameter
> cluster.min-free-disk to 10%, it applies to all bricks, which
> is quite awkward with these brick sizes: 10% of a small brick
> is ~1T, but for the big brick it is ~3.7T. What happens in the
> end is that if all bricks reach 90% usage and I continue
> writing, the small ones eventually fill up to 100% while the
> big one still has plenty of free space.
>
> My question is: is there a way to set cluster.min-free-disk
> per brick instead of setting it for the entire volume, or any
> other way to work around this problem?
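>
> For reference, I set the option like this ("myvol" here is just a
> placeholder for the volume name):
>
>     # one percentage threshold applies to every brick in the volume
>     gluster volume set myvol cluster.min-free-disk 10%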
>
> Thank you in advance
>
> Regards,
> Deyan
>
> Hello Deyan,
>
> I have exactly the same problem and I have asked about
> it before - see links below.
>
> http://community.gluster.org/q/in-version-3-1-4-how-can-i-set-the-minimum-amount-of-free-disk-space-on-the-bricks/
>
> http://gluster.org/pipermail/gluster-users/2011-May/007788.html
>
> My understanding is that the patch referred to in
> Amar's reply in the May thread prevents a
> "migrate-data" rebalance operation from failing by
> running out of space on smaller bricks, but that doesn't solve
> the problem we are having. Being able to set
> min-free-disk for each brick separately would be
> useful, as would being able to set this value as a
> number of bytes rather than a percentage. However,
> even if these features were present we would still
> have a problem when the amount of free space becomes
> less than min-free-disk, because this just results in
> a warning message in the logs and doesn't actually
> prevent more files from being written. In other
> words, min-free-disk is a soft limit rather than a
> hard limit. When a volume is more than 90% full there
> may still be hundreds of gigabytes of free space
> spread over the large bricks, but the small bricks may
> each only have a few gigabytes left or even less.
> Users run "df", see lots of free space in the volume,
> and continue writing files. However, when GlusterFS
> chooses to write a file to a small brick, the write
> fails with "device full" errors if the file grows too
> large, which happens often here because for some
> applications files are typically several gigabytes in
> size.
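>
> To illustrate the pattern (the mount point, server names and
> numbers below are made up):
>
>     # on a client the volume as a whole looks fine...
>     $ df -h /mnt/myvol
>     Filesystem        Size  Used Avail Use% Mounted on
>     server1:/myvol     55T   50T  5.0T  91% /mnt/myvol
>
>     # ...but on a server one of the small bricks is almost full
>     $ df -h /export/brick1
>     Filesystem        Size  Used Avail Use% Mounted on
>     /dev/vg0/brick1   9.0T  8.9T  100G  99% /export/brick1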
>
> I would really like to know if there is a way to make
> min-free-disk a hard limit. Ideally, GlusterFS would
> choose the brick on which to write a file based on how
> much free space it has left, rather than choosing a
> brick at random (or however it is done now). That
> would solve the problem of non-uniform brick sizes
> without the need for a hard min-free-disk limit.
>
> Amar's comment in the May thread about QA testing
> being done only on volumes with uniform brick sizes
> prompted me to start standardising on a uniform brick
> size for each volume in my cluster. My impression is
> that implementing the features needed for users with
> non-uniform brick sizes is not a priority for Gluster,
> and that users are all expected to use uniform brick
> sizes. I really think this fact should be stated
> clearly in the GlusterFS documentation, in the
> sections on creating volumes in the Administration
> Guide for example. That would stop other users from
> going down the path that I did initially, which has
> given me a real headache because I am now having to
> move tens of terabytes of data off bricks that are
> larger than the new standard size.
>
> Regards
> Dan.
>
> Hello,
>
> This is really bad news, because I have already migrated my
> data, and I have just realized that I am screwed because
> Gluster simply does not take brick sizes into account.
> It is impossible for me to move to uniform brick sizes.
>
> Currently we use 2TB HDDs, but disks keep growing and
> soon we will probably use 3TB HDDs or whatever other
> larger sizes appear on the market. So if we choose to use
> RAID5 with some level of redundancy (for example 6 HDDs in
> RAID5, no matter what their size is), this will sooner or
> later lead us to non-uniform bricks, which is a problem;
> it is not reasonable to expect that we always can, or want
> to, provide uniform-size bricks.
>
> By this way of thinking, if we currently get 10T from
> 6x2T drives in RAID5, then at some point, when a single
> 10T disk exists, we will have to use no RAID at all just
> because Gluster cannot handle non-uniform bricks.
>
> Regards,
> Deyan
>
>
> I think Amar might have provided the answer in his posting to
> the thread yesterday, which has just appeared in my autospam
> folder.
>
> http://gluster.org/pipermail/gluster-users/2011-August/008579.html
>
> "With size option, you can have a hardbound on min-free-disk"
>
> This means that you can set a hard limit on min-free-disk, and
> set a value in GB that is bigger than the biggest file that is
> ever likely to be written. This looks likely to solve our
> problem and make non-uniform brick sizes a practical
> proposition. I wish I had known about this back in May when I
> embarked on my cluster restructuring exercise; the issue was
> discussed in this thread in May as well:
> http://gluster.org/pipermail/gluster-users/2011-May/007794.html
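>
> If I understand Amar correctly, the command would be something
> like this (the volume name is a placeholder, and 20GB is just an
> example value larger than the biggest file we are ever likely to
> write):
>
>     # a size value, rather than a percentage, is meant to act as
>     # a hard bound
>     gluster volume set myvol cluster.min-free-disk 20GB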
>
> Once I have moved all the data off the large bricks and
> standardised on a uniform brick size, it will be relatively
> easy to stick to this because I use LVM. I create logical
> volumes for new bricks when a volume needs extending. The
> only problem with this approach is what happens when the
> amount of free space left on a server is less than the size of
> the brick you want to create. The only option then would be
> to use new servers, potentially wasting several TB of free
> space on existing servers. The standard brick size for most
> of my volumes is 3TB, which allows me to use a mixture of
> small servers and large servers in a volume and limits the
> amount of free space that would be wasted if there wasn't
> quite enough free space on a server to create another brick.
> Another consequence of having 3TB bricks is that a single
> server typically has two or more bricks belonging to the
> same volume, although I do my best to distribute the volumes
> across different servers in order to spread the load. I am
> not aware of any problems associated with exporting multiple
> bricks from a single server, and it has not caused me any
> trouble so far.
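>
> For what it's worth, creating a new brick with LVM looks roughly
> like this for me (the volume group, brick and volume names are
> illustrative, and I use XFS here only as an example filesystem):
>
>     # carve a 3TB logical volume out of the server's volume group
>     lvcreate -L 3T -n brick7 vg0
>     mkfs.xfs /dev/vg0/brick7
>     mkdir -p /export/brick7
>     mount /dev/vg0/brick7 /export/brick7
>
>     # extend the Gluster volume with the new brick
>     gluster volume add-brick myvol server3:/export/brick7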
>
> -Dan.
>
> Hello Deyan,
>
> Have you tried giving min-free-disk a value in gigabytes, and if
> so, does it prevent new files from being written to your bricks
> when they are nearly full? I recently tried it myself and found
> that min-free-disk had no effect at all. I deliberately filled my
> test/backup volume and most of the bricks became 100% full. I had
> set min-free-disk to "20GB", as reported in "gluster volume ...
> info" below.
>
> cluster.min-free-disk: 20GB
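>
> For completeness, this is how I set and then checked the option
> (the volume name is omitted above, so "testvol" here is a
> stand-in):
>
>     gluster volume set testvol cluster.min-free-disk 20GB
>     # confirm the option was accepted
>     gluster volume info testvol | grep min-free-disk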
>
> Unless I am doing something wrong, it seems as though we cannot
> "have a hardbound on min-free-disk" after all, and uniform brick
> size is therefore an essential requirement. It still doesn't say
> that in the documentation, at least not in the volume creation
> sections.
>
>
> -Dan.
>
> On 08/09/11 06:35, Raghavendra Bhat wrote:
> > This is how it is supposed to work.
> >
> > Suppose a distribute volume is created with 2 bricks. The 1st
> > brick has 25GB of free space, the 2nd has 35GB of free space. If
> > one sets a 30GB minimum-free-disk through volume set (gluster
> > volume set <volname> min-free-disk 30GB), then whenever a file is
> > created and it hashes to the 1st brick (which has 25GB of free
> > space), the actual file will be created on the 2nd brick, and a
> > linkfile pointing to the actual file will be created on the 1st
> > brick. A warning message indicating that the minimum free disk
> > limit has been crossed, and that more nodes should be added, will
> > be printed in the glusterfs log file. So any file which hashes to
> > the 1st brick will be created on the 2nd brick.
> >
> > Once the free space of the 2nd brick also drops below 30GB, files
> > will be created on their respective hashed bricks only. There
> > will be a warning message in the log file about the 2nd brick
> > also crossing the minimum free disk limit.
> >
> > Regards,
> > Raghavendra Bhat
>
Dear Raghavendra,
Thanks for explaining this to me. This mechanism should allow a volume
to function correctly with non-uniform brick sizes even though
min-free-disk is not a hard limit. I can understand now why I had so
many problems with the default value of 10% for min-free-disk. 10% of a
large brick can be very large compared to 10% of a small brick, so once
all bricks had less than 10% free space and continued filling at the
same rate, the small bricks usually filled up long before the large
ones, giving "device full" errors even when df still showed plenty of
free space in the volume. At least now we can minimise this effect by
setting min-free-disk to a value in GB.
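
As an aside, if I have understood the linkfile mechanism correctly, the
stubs should be visible on the hashed brick as zero-byte files with
sticky-bit-only permissions (the brick path and file name below are
hypothetical):

    # on the brick the file hashed to, the stub looks like this
    $ ls -l /export/brick1/data/model-output.nc
    ---------T 1 root root 0 Sep  8 12:00 /export/brick1/data/model-output.nc

    # the xattr records which brick holds the real file
    getfattr -n trusted.glusterfs.dht.linkto /export/brick1/data/model-output.nc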
-Dan.