[Gluster-users] No space left on device (when there is actually lots of free space)
Kali Hernandez
kali at thenetcircle.com
Tue Apr 6 04:07:53 UTC 2010
Hi all,
We are running glusterfs 3.0.3, installed from the RHEL RPMs, on 30 nodes
(physical machines, not virtual). Our config pairs the machines two by two
as mirrors under the replicate translator, and then aggregates the 15
resulting mirrors under the stripe translator. We previously used
distribute instead of stripe, but had the same problem.
We are copying (using cp) a lot of files which reside under the same
directory, and I have been monitoring the whole copy process to check
where the failure starts.
In the middle of the copy process we get this error:
cp: cannot create regular file
`/mnt/gluster_new/videos/1251512-3CA86758640A31E7770EBC7629AEC10F.mpg':
No space left on device
cp: cannot create regular file
`/mnt/gluster_new/videos/1758650-3AF69C6B7FDAC0A40D85EABA8C85490D.mswmm':
No space left on device
cp: cannot create regular file
`/mnt/gluster_new/videos/179183-A018B5FBE6DCCF04A3BB99C814CD9EAB.wmv':
No space left on device
cp: cannot create regular file
`/mnt/gluster_new/videos/2448602-568B1ACF53675DC762485F2B26539E0D.wmv':
No space left on device
cp: cannot create regular file
`/mnt/gluster_new/videos/626249-7B7FFFE0B9C56E9BE5733409CB73BCDF_300.jpg':
No space left on device
cp: cannot create regular file
`/mnt/gluster_new/videos/1962299-B7CDFF12FB1AD41DF3660BF0C7045CBC.avi':
No space left on device
(hundreds of times)
When I look at the storage distribution, I can see this:
node    Size Used Avail Use% Mounted on
node 10 37G 14G 23G 38% /glusterfs_storage
node 11 37G 14G 23G 37% /glusterfs_storage
node 12 37G 14G 23G 37% /glusterfs_storage
node 13 37G 14G 23G 37% /glusterfs_storage
node 14 37G 13G 24G 36% /glusterfs_storage
node 15 37G 13G 24G 36% /glusterfs_storage
node 16 37G 13G 24G 35% /glusterfs_storage
node 17 49G 12G 36G 26% /glusterfs_storage
node 18 37G 12G 25G 33% /glusterfs_storage
node 19 37G 12G 25G 33% /glusterfs_storage
node 20 37G 14G 23G 38% /glusterfs_storage
node 21 37G 14G 23G 37% /glusterfs_storage
node 22 37G 14G 23G 37% /glusterfs_storage
node 23 37G 14G 23G 37% /glusterfs_storage
node 24 37G 13G 24G 36% /glusterfs_storage
node 25 37G 13G 24G 36% /glusterfs_storage
node 26 37G 13G 24G 35% /glusterfs_storage
node 27 49G 12G 36G 26% /glusterfs_storage
node 28 37G 12G 25G 33% /glusterfs_storage
node 29 37G 12G 25G 33% /glusterfs_storage
node 35 40G 40G 0 100% /glusterfs_storage
node 36 40G 22G 18G 56% /glusterfs_storage
node 37 40G 18G 22G 45% /glusterfs_storage
node 38 40G 16G 24G 40% /glusterfs_storage
node 39 40G 15G 25G 37% /glusterfs_storage
node 45 40G 40G 0 100% /glusterfs_storage
node 46 40G 22G 18G 56% /glusterfs_storage
node 47 40G 18G 22G 45% /glusterfs_storage
node 48 40G 16G 24G 40% /glusterfs_storage
node 49 40G 15G 25G 37% /glusterfs_storage
(Mirror pairings: nodes 10-19 are paired with 20-29, and 35-39 with 45-49.)
As you can see, space usage is fairly even across most of the nodes,
except for the node pair 35/45, which has run out of space. Thus, every
time I try to copy more data onto the cluster, I run into the
"No space left on device" error mentioned above.
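To spot such outliers without reading the table by eye, a small script can flag the full nodes. This is only a sketch: the rows are a subset copied from the table above, and the function names and the 95% threshold are made up for illustration.

```python
# A few rows copied from the table above, in the same column order:
# node <id> <size> <used> <avail> <use%> <mount point>
ROWS = """\
node 10 37G 14G 23G 38% /glusterfs_storage
node 17 49G 12G 36G 26% /glusterfs_storage
node 35 40G 40G 0 100% /glusterfs_storage
node 36 40G 22G 18G 56% /glusterfs_storage
node 45 40G 40G 0 100% /glusterfs_storage
"""


def parse_row(line):
    """Split one table line into the fields needed for the check."""
    _, node, _size, _used, avail, pct, mount = line.split()
    return {
        "node": int(node),
        "avail_gb": int(avail.rstrip("G")),  # "0" has no unit suffix
        "use_pct": int(pct.rstrip("%")),
        "mount": mount,
    }


def full_nodes(text, threshold_pct=95):
    """Return the node numbers whose usage is at or above threshold_pct."""
    rows = [parse_row(line) for line in text.splitlines() if line.strip()]
    return [row["node"] for row in rows if row["use_pct"] >= threshold_pct]


print(full_nodes(ROWS))  # [35, 45]
```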
From the mountpoint point of view, the gluster free space looks like this:
Filesystem                        1M-blocks   Used Available Use% Mounted on
[...]
/etc/glusterfs/glusterfs.vol.new     586617 240197    340871  42% /mnt/gluster_new
So basically, I get out-of-space errors while there is still around
340 GB free on the cluster.
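As a sanity check, the aggregate figure can be rebuilt from the per-node numbers. The sketch below takes the "available" column for one side of each mirror (the partner nodes report the same values) and shows how the total hides the one pair that has nothing left. The dictionary name is just for illustration; the numbers are the rounded ones from the per-node table.

```python
# Available space in GB per mirror, taken from nodes 10-19 and 35-39
# in the per-node table (nodes 20-29 and 45-49 report the same values).
AVAIL_GB = {
    "mirror1": 23, "mirror2": 23, "mirror3": 23, "mirror4": 23,
    "mirror5": 24, "mirror6": 24, "mirror7": 24, "mirror8": 36,
    "mirror9": 25, "mirror10": 25,
    "mirror11": 0, "mirror12": 18, "mirror13": 22, "mirror14": 24,
    "mirror15": 25,
}

total_gb = sum(AVAIL_GB.values())
exhausted = [name for name, gb in AVAIL_GB.items() if gb == 0]

# The total matches the roughly 340 GB seen at the mountpoint,
# yet one mirror has no room at all.
print(total_gb, exhausted)  # 339 ['mirror11']
```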
I tried using the distribute translator instead of stripe; in fact, that
was our first setup. We suspected that a big file might start copying
(we usually store really big .tar.gz backups here) and run out of space
on its node partway through, so we switched to stripe, because
theoretically glusterfs would then place the next block of the file on
another node. But in both cases (distribute and stripe) we run into the
same problem.
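To make that expectation concrete, here is a minimal sketch of the block placement we had in mind for stripe. It assumes plain round-robin placement, with block k of a file going to subvolume k modulo 15; that is an assumption on my part and has not been checked against the glusterfs source. The function name is hypothetical.

```python
MB = 1024 * 1024


def subvolumes_touched(file_size, block_size=4 * MB, n_subvolumes=15):
    """Return the 0-based indices of subvolumes that get at least one block.

    ASSUMPTION: block k is written to subvolume k % n_subvolumes, starting
    at the first subvolume. This models the expectation, not verified behaviour.
    """
    if file_size <= 0:
        return []
    n_blocks = -(-file_size // block_size)  # ceiling division
    return list(range(min(n_blocks, n_subvolumes)))


# A 3 MB file fits in one block and stays on a single subvolume.
print(subvolumes_touched(3 * MB))         # [0]
# A 41 MB file needs 11 blocks, so it reaches index 10 (the 11th subvolume).
print(subvolumes_touched(41 * MB)[-1])    # 10
# A 500 MB backup touches all 15 subvolumes.
print(len(subvolumes_touched(500 * MB)))  # 15
```

If placement really works like this, any large file needs room on every mirror, including a full one, so stripe would not route around a full pair.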
So I am wondering whether this is a limit on the number of files in a
single directory or filesystem, or something else?
Any ideas on this issue?
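One thing that can be checked on each storage node is whether the backend filesystem has run out of inodes rather than blocks, since that also produces "No space left on device". A minimal sketch using Python's os.statvfs follows; the function names are made up, and on a real node the path would be the export directory (/glusterfs_storage in our case) instead of the current directory used here.

```python
import os


def space_report(path):
    """Return free space in MB and inode counts for the filesystem holding path."""
    st = os.statvfs(path)
    return {
        "free_mb": st.f_bavail * st.f_frsize // (1024 * 1024),
        "total_inodes": st.f_files,
        "free_inodes": st.f_favail,
    }


def out_of_inodes(report):
    """True when the filesystem tracks inodes and none are left for new files."""
    return report["total_inodes"] > 0 and report["free_inodes"] == 0


if __name__ == "__main__":
    # "." is a stand-in; point this at the brick directory on a storage node.
    report = space_report(".")
    print(report, out_of_inodes(report))
```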
Our config is as follows:
Each node has
--------------
volume posix
type storage/posix
option directory /glusterfs_storage
end-volume
volume locks
type features/posix-locks
subvolumes posix
end-volume
volume server
type protocol/server
option transport-type tcp
option auth.addr.locks.allow 10.20.0.*
subvolumes locks
end-volume
-------------
And the mount client has:
=======
##### Old blades (37 GB each, except rsid-a-27, 49 GB)
volume rsid-a-10
type protocol/client
option transport-type tcp
option remote-host 10.20.0.150
option remote-subvolume locks
end-volume
volume rsid-a-11
type protocol/client
option transport-type tcp
option remote-host 10.20.0.151
option remote-subvolume locks
end-volume
volume rsid-a-12
type protocol/client
option transport-type tcp
option remote-host 10.20.0.152
option remote-subvolume locks
end-volume
volume rsid-a-13
type protocol/client
option transport-type tcp
option remote-host 10.20.0.153
option remote-subvolume locks
end-volume
volume rsid-a-14
type protocol/client
option transport-type tcp
option remote-host 10.20.0.154
option remote-subvolume locks
end-volume
volume rsid-a-15
type protocol/client
option transport-type tcp
option remote-host 10.20.0.155
option remote-subvolume locks
end-volume
volume rsid-a-16
type protocol/client
option transport-type tcp
option remote-host 10.20.0.156
option remote-subvolume locks
end-volume
volume rsid-a-17
type protocol/client
option transport-type tcp
option remote-host 10.20.0.157
option remote-subvolume locks
end-volume
volume rsid-a-18
type protocol/client
option transport-type tcp
option remote-host 10.20.0.158
option remote-subvolume locks
end-volume
volume rsid-a-19
type protocol/client
option transport-type tcp
option remote-host 10.20.0.159
option remote-subvolume locks
end-volume
volume rsid-a-20
type protocol/client
option transport-type tcp
option remote-host 10.20.0.160
option remote-subvolume locks
end-volume
volume rsid-a-21
type protocol/client
option transport-type tcp
option remote-host 10.20.0.161
option remote-subvolume locks
end-volume
volume rsid-a-22
type protocol/client
option transport-type tcp
option remote-host 10.20.0.162
option remote-subvolume locks
end-volume
volume rsid-a-23
type protocol/client
option transport-type tcp
option remote-host 10.20.0.163
option remote-subvolume locks
end-volume
volume rsid-a-24
type protocol/client
option transport-type tcp
option remote-host 10.20.0.164
option remote-subvolume locks
end-volume
volume rsid-a-25
type protocol/client
option transport-type tcp
option remote-host 10.20.0.165
option remote-subvolume locks
end-volume
volume rsid-a-26
type protocol/client
option transport-type tcp
option remote-host 10.20.0.166
option remote-subvolume locks
end-volume
volume rsid-a-27
type protocol/client
option transport-type tcp
option remote-host 10.20.0.167
option remote-subvolume locks
end-volume
volume rsid-a-28
type protocol/client
option transport-type tcp
option remote-host 10.20.0.168
option remote-subvolume locks
end-volume
volume rsid-a-29
type protocol/client
option transport-type tcp
option remote-host 10.20.0.169
option remote-subvolume locks
end-volume
##### New blades (40 GB each)
volume rsid-a-35
type protocol/client
option transport-type tcp
option remote-host 10.20.0.180
option remote-subvolume locks
end-volume
volume rsid-a-36
type protocol/client
option transport-type tcp
option remote-host 10.20.0.181
option remote-subvolume locks
end-volume
volume rsid-a-37
type protocol/client
option transport-type tcp
option remote-host 10.20.0.182
option remote-subvolume locks
end-volume
volume rsid-a-38
type protocol/client
option transport-type tcp
option remote-host 10.20.0.183
option remote-subvolume locks
end-volume
volume rsid-a-39
type protocol/client
option transport-type tcp
option remote-host 10.20.0.184
option remote-subvolume locks
end-volume
volume rsid-a-45
type protocol/client
option transport-type tcp
option remote-host 10.20.0.190
option remote-subvolume locks
end-volume
volume rsid-a-46
type protocol/client
option transport-type tcp
option remote-host 10.20.0.191
option remote-subvolume locks
end-volume
volume rsid-a-47
type protocol/client
option transport-type tcp
option remote-host 10.20.0.192
option remote-subvolume locks
end-volume
volume rsid-a-48
type protocol/client
option transport-type tcp
option remote-host 10.20.0.193
option remote-subvolume locks
end-volume
volume rsid-a-49
type protocol/client
option transport-type tcp
option remote-host 10.20.0.194
option remote-subvolume locks
end-volume
### mirroring blade volumes, 1x <---> 2x and 3x <---> 4x
volume mirror1
type cluster/replicate
subvolumes rsid-a-10 rsid-a-20
end-volume
volume mirror2
type cluster/replicate
subvolumes rsid-a-11 rsid-a-21
end-volume
volume mirror3
type cluster/replicate
subvolumes rsid-a-12 rsid-a-22
end-volume
volume mirror4
type cluster/replicate
subvolumes rsid-a-13 rsid-a-23
end-volume
volume mirror5
type cluster/replicate
subvolumes rsid-a-14 rsid-a-24
end-volume
volume mirror6
type cluster/replicate
subvolumes rsid-a-15 rsid-a-25
end-volume
volume mirror7
type cluster/replicate
subvolumes rsid-a-16 rsid-a-26
end-volume
volume mirror8
type cluster/replicate
subvolumes rsid-a-17 rsid-a-27
end-volume
volume mirror9
type cluster/replicate
subvolumes rsid-a-18 rsid-a-28
end-volume
volume mirror10
type cluster/replicate
subvolumes rsid-a-19 rsid-a-29
end-volume
volume mirror11
type cluster/replicate
subvolumes rsid-a-35 rsid-a-45
end-volume
volume mirror12
type cluster/replicate
subvolumes rsid-a-36 rsid-a-46
end-volume
volume mirror13
type cluster/replicate
subvolumes rsid-a-37 rsid-a-47
end-volume
volume mirror14
type cluster/replicate
subvolumes rsid-a-38 rsid-a-48
end-volume
volume mirror15
type cluster/replicate
subvolumes rsid-a-39 rsid-a-49
end-volume
### final volume, striped mirrors: 15x2 blades. 4 MB block size to allow
### small files to fit completely in a single storage node
### (currently only new blades are mirrored)
volume stripe
type cluster/stripe
option block-size 4MB
subvolumes mirror1 mirror2 mirror3 mirror4 mirror5 mirror6 mirror7 mirror8 mirror9 mirror10 mirror11 mirror12 mirror13 mirror14 mirror15
end-volume
volume writebehind
type performance/write-behind
option cache-size 4MB
option disable-for-first-nbytes 128KB
subvolumes stripe
end-volume
volume iocache
type performance/io-cache
subvolumes writebehind
option cache-size 4MB
option cache-timeout 5
end-volume
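Since the client and mirror stanzas above are highly repetitive, they could be generated instead of written by hand. The sketch below reproduces the same stanza text from the node numbers; the helper names are hypothetical, and the address arithmetic (node number plus 140 for the old blades, plus 145 for the new ones) is simply read off the remote-host lines above.

```python
def client_volume(name, host):
    """One protocol/client stanza in the same layout as the client volfile."""
    return "\n".join([
        f"volume {name}",
        "type protocol/client",
        "option transport-type tcp",
        f"option remote-host {host}",
        "option remote-subvolume locks",
        "end-volume",
    ])


def mirror_volume(index, left, right):
    """One cluster/replicate stanza pairing two client volumes."""
    return "\n".join([
        f"volume mirror{index}",
        "type cluster/replicate",
        f"subvolumes {left} {right}",
        "end-volume",
    ])


def all_clients():
    """Client stanzas for nodes 10-29 (old blades) and 35-39, 45-49 (new blades)."""
    old = [client_volume(f"rsid-a-{n}", f"10.20.0.{n + 140}") for n in range(10, 30)]
    new_nodes = list(range(35, 40)) + list(range(45, 50))
    new = [client_volume(f"rsid-a-{n}", f"10.20.0.{n + 145}") for n in new_nodes]
    return old + new


def all_mirrors():
    """mirror1-10 pair nodes 1x with 2x; mirror11-15 pair nodes 3x with 4x."""
    pairs = [(10 + i, 20 + i) for i in range(10)] + [(35 + i, 45 + i) for i in range(5)]
    return [
        mirror_volume(i + 1, f"rsid-a-{a}", f"rsid-a-{b}")
        for i, (a, b) in enumerate(pairs)
    ]


if __name__ == "__main__":
    print("\n".join(all_clients() + all_mirrors()))
```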