[Gluster-users] cluster.min-free-disk separate for each brick

Dan Bretherton d.a.bretherton at reading.ac.uk
Thu Sep 29 11:28:22 UTC 2011


On 08/09/11 23:51, Dan Bretherton wrote:
>
>> On Wed, Sep 7, 2011 at 4:27 PM, Dan Bretherton 
>> <d.a.bretherton at reading.ac.uk> wrote:
>>
>>
>>     On 17/08/11 16:19, Dan Bretherton wrote:
>>
>>             Dan Bretherton wrote:
>>
>>
>>                 On 15/08/11 20:00, gluster-users-request at gluster.org wrote:
>>
>>                     Message: 1
>>                     Date: Sun, 14 Aug 2011 23:24:46 +0300
>>                     From: "Deyan Chepishev - SuperHosting.BG"
>>                     <dchepishev at superhosting.bg>
>>                     Subject: [Gluster-users] cluster.min-free-disk
>>                      separate for each brick
>>                     To: gluster-users at gluster.org
>>                     Message-ID: <4E482F0E.3030604 at superhosting.bg>
>>                     Content-Type: text/plain; charset=UTF-8; format=flowed
>>
>>                     Hello,
>>
>>                     I have a Gluster setup with very different brick
>>                     sizes:
>>
>>                     brick1: 9T
>>                     brick2: 9T
>>                     brick3: 37T
>>
>>                     With this configuration, if I set the parameter
>>                     cluster.min-free-disk to 10%, it applies to all
>>                     bricks, which is awkward with these brick sizes:
>>                     10% of a small brick is ~1T, but 10% of the big
>>                     brick is ~3.7T.  What happens in the end is that
>>                     if all bricks reach 90% usage and I continue
>>                     writing, the small ones eventually fill up to
>>                     100% while the big one still has plenty of free
>>                     space.
>>
>>                     My question is: is there a way to set
>>                     cluster.min-free-disk per brick instead of
>>                     setting it for the entire volume, or any other
>>                     way to work around this problem?
>>
>>                     Thank you in advance
>>
>>                     Regards,
>>                     Deyan
>>
>>                 Hello Deyan,
>>
>>                 I have exactly the same problem and I have asked
>>                 about it before - see links below.
>>
>>                 http://community.gluster.org/q/in-version-3-1-4-how-can-i-set-the-minimum-amount-of-free-disk-space-on-the-bricks/
>>
>>                 http://gluster.org/pipermail/gluster-users/2011-May/007788.html
>>
>>                 My understanding is that the patch referred to in
>>                 Amar's reply in the May thread prevents a
>>                 "migrate-data" rebalance operation failing by running
>>                 out of space on smaller bricks, but that doesn't
>>                 solve the problem we are having.  Being able to set
>>                 min-free-disk for each brick separately would be
>>                 useful, as would being able to set this value as a
>>                 number of bytes rather than a percentage.  However,
>>                 even if these features were present we would still
>>                 have a problem when the amount of free space becomes
>>                 less than min-free-disk, because this just results in
>>                 a warning message in the logs and doesn't actually
>>                 prevent more files from being written.  In other
>>                 words, min-free-disk is a soft limit rather than a
>>                 hard limit.  When a volume is more than 90% full
>>                 there may still be hundreds of gigabytes of free
>>                 space spread over the large bricks, but the small
>>                 bricks may each only have a few gigabytes left or
>>                 even less.  Users do "df" and see lots of free space
>>                 in the volume so they continue writing files.
>>                  However, when GlusterFS chooses to write a file to a
>>                 small brick, the write fails with "device full"
>>                 errors if the file grows too large, which is often
>>                 the case here with files typically several gigabytes
>>                 in size for some applications.
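>>
>>                 To see where the space actually is, it helps to
>>                 compare df on a client mount with df on each brick
>>                 filesystem on the servers; a minimal sketch (the
>>                 mount points are only examples):
>>
>>                     # on a client: free space aggregated across all bricks
>>                     df -h /mnt/myvol
>>
>>                     # on each server: free space of one individual brick
>>                     df -h /exports/brick1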
>>
>>                 I would really like to know if there is a way to make
>>                 min-free-disk a hard limit.  Ideally, GlusterFS would
>>                 choose a brick on which to write a file based on how
>>                 much free space it has left rather than choosing a
>>                 brick at random (or however it is done now).  That
>>                 would solve the problem of non-uniform brick sizes
>>                 without the need for a hard min-free-disk limit.
>>
>>                 Amar's comment in the May thread about QA testing
>>                 being done only on volumes with uniform brick sizes
>>                 prompted me to start standardising on a uniform brick
>>                 size for each volume in my cluster.  My impression is
>>                 that implementing the features needed for users with
>>                 non-uniform brick sizes is not a priority for
>>                 Gluster, and that users are all expected to use
>>                 uniform brick sizes.  I really think this fact should
>>                 be stated clearly in the GlusterFS documentation, in
>>                 the sections on creating volumes in the
>>                 Administration Guide for example.  That would stop
>>                 other users from going down the path that I did
>>                 initially, which has given me a real headache because
>>                 I am now having to move tens of terabytes of data off
>>                 bricks that are larger than the new standard size.
>>
>>                 Regards
>>                 Dan.
>>
>>             Hello,
>>
>>             This is really bad news, because I have already migrated
>>             my data, and I have just realized that I am screwed
>>             because Gluster just does not care about brick sizes.
>>             It is impossible for me to move to uniform brick sizes.
>>
>>             Currently we use 2TB HDDs, but disk sizes keep growing
>>             and soon we will probably use 3TB HDDs or whatever larger
>>             sizes appear on the market.  So if we choose to use RAID5
>>             with some level of redundancy (for example 6 HDDs in
>>             RAID5, whatever their size), this will sooner or later
>>             lead us to non-uniform bricks, which is a problem; it is
>>             not reasonable to expect that we can, or want to, always
>>             provide uniform-size bricks.
>>
>>             Following this way of thinking, if we currently have 10T
>>             from 6x2T disks in RAID5, then at some point, when 10T
>>             fits on a single disk, we would have to use no RAID at
>>             all just because Gluster cannot handle non-uniform bricks.
>>
>>             Regards,
>>             Deyan
>>
>>
>>         I think Amar might have provided the answer in his posting to
>>         the thread yesterday, which has just appeared in my autospam
>>         folder.
>>
>>         http://gluster.org/pipermail/gluster-users/2011-August/008579.html
>>
>>             With size option, you can have a hardbound on min-free-disk
>>
>>         This means that you can set a hard limit on min-free-disk,
>>         and set a value in GB that is bigger than the biggest file
>>         that is ever likely to be written.  This looks likely to
>>         solve our problem and make non-uniform brick sizes a
>>         practical proposition.  I wish I had known about this back in
>>         May when I embarked on my cluster restructuring exercise; the
>>         issue was discussed in this thread in May as well:
>>         http://gluster.org/pipermail/gluster-users/2011-May/007794.html
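>>
>>         For reference, this is roughly how the size-based value can
>>         be set and then checked; a minimal sketch, assuming a volume
>>         called "myvol" (the volume name is only an example):
>>
>>             # ask for at least 20GB of free space to be kept on each brick
>>             gluster volume set myvol cluster.min-free-disk 20GB
>>
>>             # confirm that the option has been recorded for the volume
>>             gluster volume info myvol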
>>
>>         Once I have moved all the data off the large bricks and
>>         standardised on a uniform brick size, it will be relatively
>>         easy to stick to this because I use LVM.  I create logical
>>         volumes for new bricks when a volume needs extending.  The
>>         only problem with this approach is what happens when the
>>         amount of free space left on a server is less than the size
>>         of the brick you want to create.  The only option then would
>>         be to use new servers, potentially wasting several TB of free
>>         space on existing servers.  The standard brick size for most
>>         of my volumes is 3TB, which allows me to use a mixture of
>>         small servers and large servers in a volume and limits the
>>         amount of free space that would be wasted if there wasn't
>>         quite enough free space on a server to create another brick.
>>         Another consequence of having 3TB bricks is that a single
>>         server typically has two or more bricks belonging to the
>>         same volume, although I do my best to distribute the volumes
>>         across different servers in order to spread the load.  I am
>>         not aware of any problems associated with exporting multiple
>>         bricks from a single server, and it has not caused me any
>>         trouble so far.
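>>
>>         For anyone interested, the brick-creation steps I follow with
>>         LVM are roughly as below; a minimal sketch, assuming a volume
>>         group called "vg0", a GlusterFS volume called "myvol" and XFS
>>         as the brick filesystem (the names and the filesystem choice
>>         are only examples):
>>
>>             # carve out a fixed-size 3TB logical volume for the new brick
>>             lvcreate -L 3T -n myvol_brick5 vg0
>>             mkfs.xfs /dev/vg0/myvol_brick5
>>             mkdir -p /mnt/myvol_brick5
>>             mount /dev/vg0/myvol_brick5 /mnt/myvol_brick5
>>
>>             # add the new brick to the existing volume
>>             gluster volume add-brick myvol server1:/mnt/myvol_brick5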
>>
>>         -Dan.
>>
>>     Hello Deyan,
>>
>>     Have you tried giving min-free-disk a value in gigabytes, and if
>>     so does it prevent new files being written to your bricks when
>>     they are nearly full?  I recently tried it myself and found that
>>     min-free-disk had no effect at all.  I deliberately filled my
>>     test/backup volume and most of the bricks became 100% full.  I set
>>     min-free-disk to "20GB", as reported in "gluster volume ... info"
>>     below.
>>
>>     cluster.min-free-disk: 20GB
>>
>>     Unless I am doing something wrong it seems as though we can not
>>     "have a hardbound on min-free-disk" after all, and uniform brick
>>     size is therefore an essential requirement.  It still doesn't say
>>     that in the documentation, at least not in the volume creation
>>     sections.
>>
>>
>>     -Dan.
>>
>> On 08/09/11 06:35, Raghavendra Bhat wrote:
>> > This is how it is supposed to work.
>> >
>> > Suppose a distribute volume is created with 2 bricks. The 1st brick 
>> > has 25GB of free space and the 2nd brick has 35GB of free space. If 
>> > one sets a minimum-free-disk of 30GB through volume set (gluster 
>> > volume set <volname> min-free-disk 30GB), then whenever files are 
>> > created, if a file is hashed to the 1st brick (which has only 25GB 
>> > of free space), the actual file will be created on the 2nd brick and 
>> > a linkfile pointing to the actual file will be created on the 1st 
>> > brick. A warning message, indicating that the minimum free disk 
>> > limit has been crossed and that more nodes should be added, will be 
>> > printed in the glusterfs log file. So any file which is hashed to 
>> > the 1st brick will be created on the 2nd brick.
>> >
>> > Once the free space of the 2nd brick also drops below 30GB, files 
>> > will be created on their respective hashed bricks only. There will 
>> > be a warning message in the log file about the 2nd brick also 
>> > crossing the minimum free disk limit.
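>> >
>> > If you want to verify this on the bricks themselves, a linkfile 
>> > normally shows up as an empty file with the sticky bit set, plus an 
>> > xattr naming the subvolume that holds the real data; a rough 
>> > illustration, assuming a brick mounted at /export/brick1 and a file 
>> > called bigfile.dat (both names are only examples):
>> >
>> >     # linkfiles appear as zero-byte files with mode ---------T
>> >     ls -l /export/brick1/bigfile.dat
>> >
>> >     # this xattr points at the subvolume holding the actual file
>> >     getfattr -n trusted.glusterfs.dht.linkto /export/brick1/bigfile.dat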
>> >
>> > Regards,
>> > Raghavendra Bhat
>>
> Dear Raghavendra,
> Thanks for explaining this to me.  This mechanism should allow a 
> volume to function correctly with non-uniform brick sizes even though 
> min-free-disk is not a hard limit.  I can understand now why I had so 
> many problems with the default value of 10% for min-free-disk.  10% of 
> a large brick can be very large compared to 10% of a small brick, so 
> once all the bricks had less than 10% free space and continued filling 
> at the same rate, the small bricks usually filled up long before the 
> large ones, giving "device full" errors even when df still showed a 
> lot of free space in the volume.  At least now we can minimise this effect by 
> setting min-free-disk to a value in GB.
>
> -Dan.
>
Dear Raghavendra,
Unfortunately I am still having problems with some bricks filling up 
completely, despite having "cluster.min-free-disk: 20GB".  In one case I 
am still seeing warnings about bricks being nearly full in percentage 
terms in the client logs, so I am wondering if the volume is still using 
cluster.min-free-disk: 10%, and ignoring the 20GB setting I changed it 
to.  When I changed cluster.min-free-disk, should it have taken effect 
immediately, or is there something else I should have done to activate 
the change?

In your example above, suppose there are 9 bricks instead of 2 (as in 
my volume), and they all have less than 30GB of free space except for 
one which is nearly empty; is GlusterFS clever enough to find that 
nearly empty brick every time it creates new files?  I expected all 
new files to be created in my nearly empty brick but that has not 
happened.  Some files have gone in there but most have gone to nearly 
full bricks, one of which has now filled up completely.  I have done 
rebalance...fix-layout a number of times.  What can I do to fix this 
problem?  The volumes with one or more full bricks are unusable because 
users are getting "device full" errors for some writes, even though both 
volumes are showing several TB of free space.
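
To be specific, the commands I have been running are roughly the 
following (a sketch, with "myvol" standing in for the real volume 
name); I am wondering whether the data-migrating form is what is 
actually needed here:

    # what I have run so far - recalculates layouts but moves no data
    gluster volume rebalance myvol fix-layout start
    gluster volume rebalance myvol status

    # the variant I have not yet tried, which also migrates existing files
    gluster volume rebalance myvol migrate-data start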

Regards
-Dan Bretherton.