[Gluster-users] free space with varying brick sizes

Kingsley gluster at gluster.dogwind.com
Sun Oct 19 12:53:33 UTC 2014


Hi,

I've spotted what may be a small bug (or an unavoidable feature?) with
the way a gluster volume reports free space while a replicated brick is
re-syncing, or it may be that there's a setting I need to change.

Using gluster 3.5.2 on CentOS 7, I created a replica-3 volume across 3
servers. The servers had very different specs from each other, and
varying disk sizes.
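
For reference, the volume was created roughly along these lines
(hostnames as in the status output below):

# gluster volume create gv0 replica 3 data210:/data/brick/gv0 data310:/data/brick/gv0 data410:/data/brick/gv0
# gluster volume start gv0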

# gluster volume status
Status of volume: gv0
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick data210:/data/brick/gv0                           49153   Y       2131
Brick data310:/data/brick/gv0                           49152   Y       2211
Brick data410:/data/brick/gv0                           49152   Y       2346
NFS Server on localhost                                 2049    Y       2360
Self-heal Daemon on localhost                           N/A     Y       2364
NFS Server on data310                                   2049    Y       2225
Self-heal Daemon on data310                             N/A     Y       2229
NFS Server on 172.20.1.2                                2049    Y       2146
Self-heal Daemon on 172.20.1.2                          N/A     Y       2141

Task Status of Volume gv0
------------------------------------------------------------------------------


Sensibly, to the clients mounting the volume, Gluster showed the free
space as being the amount of free space on the smallest brick.

I wrote about 120GB of data to the cluster, and then simulated a brick
failure and replacement by doing the following on the server that had
the smallest disks (I didn't have a 4th server to hand to introduce;
rough commands are sketched after the list):

      * stopped the gluster service
      * killed any remaining gluster processes
      * uninstalled gluster (yum remove glusterfs-server glusterfs)
      * deleted /var/lib/glusterd/
      * deleted /data/brick/gv0/
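
Roughly, on CentOS 7 that came down to:

# systemctl stop glusterd
# pkill gluster
# yum remove glusterfs-server glusterfs
# rm -rf /var/lib/glusterd /data/brick/gv0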

I then re-installed gluster and re-introduced the server to the cluster
by following the instructions here:

http://gluster.org/community/documentation/index.php/Gluster_3.4:_Brick_Restoration_-_Replace_Crashed_Server
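
In short, that page's procedure amounts to something like this (the
UUID is a placeholder; the page has the full detail):

healthy-server# gluster peer status              (note the failed server's UUID)
new-server# yum install glusterfs-server
new-server# systemctl start glusterd && systemctl stop glusterd
new-server#   (set UUID=<old-uuid> in /var/lib/glusterd/glusterd.info)
new-server# systemctl start glusterd
new-server# gluster peer probe data310           (pull the peer and volume config back)
new-server# gluster volume heal gv0 full         (kick off the resync)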


What I noticed while the data was re-syncing back to the 'new' brick
was that, on the client, the free space *and* the used space were values
taken from the smallest brick, which had not yet finished rebuilding its
data. As these are test servers, /data/brick is on the root file system:


client# df
Filesystem              1K-blocks     Used Available Use% Mounted on
/dev/mapper/centos-root 146933660  1273948 145659712   1% /
data310:gv0             234708992 90838272 143870720  39% /mnt/gv0


brick-server# df
Filesystem              1K-blocks     Used Available Use% Mounted on
/dev/mapper/centos-root 234709020 90860416 143848604  39% /
/dev/sda1                  508588   177704    330884  35% /boot
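
Incidentally, the resync progress itself can be watched more directly
than via df, e.g.:

# gluster volume heal gv0 info

which lists the entries still pending heal on each brick.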


The problem with this is that once the data has finished replicating,
the smallest brick will be over 50% full. While the brick is still
rebuilding, a client could therefore write enough data to the volume
that the smallest brick would be unable to hold all of the volume's data
once the rebuild has finished, ie the space on that brick would be
oversubscribed.
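
To put rough numbers on it using the figures above (ignoring the couple
of GB of OS data sharing the brick's file system):

      smallest brick capacity               ~234 GB
      data already on the volume            ~120 GB
      healed so far at the snapshot          ~91 GB  -> df reports ~144 GB free
      real free space once healing is done  ~234 GB - ~120 GB = ~114 GB

so during the rebuild a client could be allowed to write roughly 30 GB
more than the brick will actually be able to hold.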

This is obviously more likely to cause a problem if clients write to a
rebuilding volume that is already quite full and the new brick has only
just started replicating, so hopefully it's a rare case.

What I think would be a more failsafe behaviour is for gluster to report
to clients the volume size based on the smallest brick in the replica
group, but the used space based on the most space used on any of the
up-to-date bricks. I appreciate this value may not be so easily derived
if the file system containing the brick also contains other data.
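
As a crude manual check in the meantime, the used space can be compared
across the replica bricks by hand and the largest figure treated as the
real usage, e.g. (hostnames as above, and assuming ssh access to each
brick server):

client# for h in data210 data310 data410; do ssh $h df -k /data/brick/gv0; done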

Is this a setting I need to change, or is this a bug?

Cheers,
Kingsley.


