[Gluster-devel] filesystem full / unify
Amar S. Tumballi
amar at zresearch.com
Thu Feb 28 21:05:54 UTC 2008
Let me explain how it works currently, and why.
When a file is created on a filesystem, there is no way for the filesystem to
know its future size. It may be a 0-byte file, or 4KB, or 1GB, or 100GB.
Hence, while creating a file, GlusterFS just creates it on the best possible
node (depending on the scheduler) at the time of creation. I hope that explains
why the 40GB file got created on the afr (gluster1/2).
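For illustration, the unify side of the setup you quote below would be declared
roughly like this in the spec file (the volume names and the namespace brick
here are illustrative assumptions, not your actual configuration):

volume afr1
  type cluster/afr
  subvolumes gluster1 gluster2
end-volume

volume unify0
  type cluster/unify
  option namespace ns                    # unify needs a separate namespace volume
  option scheduler alu
  option alu.limits.min-free-disk 6GB    # skip bricks with less than 6GB free when creating files
  subvolumes afr1 afr2 afr3 afr4
end-volume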
---- snip ----
|-------------- u n i f y ----------------|
| gluster1 <-- afr ---> gluster2 |
| gluster3 <-- afr ---> gluster4 |
| gluster5 <-- afr ---> gluster6 |
| gluster7 <-- afr ---> gluster8 |
|-------------- u n i f y ----------------|
gluster1 / 2 has 8GB free disk space
gluster3 / 4 has 2GB free disk space (*)
gluster5 / 6 has 2GB free disk space (*)
gluster7 / 8 has 2TB free disk space
(*) => no writes due to alu.limits.min-free-disk 6GB
Now, when I try to copy a large file (40GB) into the gluster, it starts
writing to gluster1/2. After 8GB gluster1/2 has no more free disk space
and the copy process dies.
Is there a configuration option for unify to write to another
gluster-server if one server runs out of space during a write?
---- end-snip ----
Well, it would be great if we could handle such cases too (it's on the
roadmap). Currently there is a 'switch' scheduler in 1.3.8pre1 or the latest
tla archives, with which you can specify where a file should go if its name
matches a specific pattern (this helps if you have a naming pattern for all
the large files).
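A spec-file sketch of how that might look (the switch.case option name and its
pattern syntax are written from memory here, so please verify against the
switch scheduler documentation in 1.3.8pre1 before relying on it):

volume unify0
  type cluster/unify
  option namespace ns
  option scheduler switch
  # send files whose names match *.iso (say, the known large files) to the
  # big bricks; everything else follows the default placement
  option switch.case *.iso:afr4
  subvolumes afr1 afr2 afr3 afr4
end-volume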
Another way of tackling this problem is the stripe translator, which stripes a
file across multiple subvolumes, but it may not suit everyone.
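Roughly, such a stripe volume would look like this (the subvolume names are
again illustrative, and the block-size value is just an example):

volume stripe0
  type cluster/stripe
  # every file is cut into 1MB blocks spread across all subvolumes, so a 40GB
  # file consumes space on every pair instead of filling a single one
  option block-size *:1MB
  subvolumes afr1 afr2 afr3 afr4
end-volume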
It would be ideal if users had a way to indicate the expected file size at
creation time, but there is no such option in any of the currently available
kernels/system calls.
Now, we can have a discussion on how to handle such situations if anyone has
ideas. Please do think about all the parameters before suggesting one. For
example, I could say "well, when the file reaches 6GB on afr (gluster1/2) you
are left with only 2GB there, hence move it to afr (gluster7/8) where you have
2TB", but then, while a single write() is failing with ENOSPC, I have to move
the whole 6GB from one node to another internally, which stalls the
application for that long (by then, the 'transport-timeout' would have been
hit too). The next problem is: where will you go if the file size reaches 2TB?
Open for design ideas.
Thanks and regards,
Amar
> Regards,
>
> jan
>
--
Amar Tumballi
Gluster/GlusterFS Hacker
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Supercomputing and Superstorage!