[Gluster-devel] seeking advice: how to upgrade from 1.3.0pre4 to tla patch628?
Sascha Ottolski
ottolski at web.de
Tue Jan 8 08:31:19 UTC 2008
Hi list,
after some rather depressing, unsuccessful attempts, I'm wondering if someone
has a hint about how we could accomplish the above task on a production
system: every time we have tried it so far, we had to roll back, because the
load on the servers and the clients climbed so high that our application
became unusable. Our understanding is that the introduction of the namespace
is killing us, but we have not found a way to get around the problem.
The setup: 4 servers, each with two bricks and a namespace; the bricks are on
separate RAID arrays. The clients AFR the bricks so that servers 1 and 2
mirror each other, as do servers 3 and 4. The four resulting AFR volumes are
then unified (see config below). The setup works so far, but is not very
stable (e.g. we see memory leaks on the client side). The upgraded version has
the four namespaces AFR-ed as well. We have about 20 connected clients that
only write, and do so rarely, and 7 clients that only read, but massively
(that is, Apache webservers serving the images). All machines are connected
through Gigabit Ethernet.
Maybe the source of the problem is what we store on the cluster: about
12 million images, adding up to ~300 GB, in a very deeply nested directory
structure. So, lots of relatively small files. And we are about to add
another 15 million files of even smaller size; they consume only 50 GB in
total, most of them only 1 or 2 KB.
Now, if we start the new GlusterFS with a new, empty namespace, it takes only
minutes for the load on the servers to reach around 1.5, and for the load on
the reading clients to jump as high as 200(!). Obviously, no more images get
delivered to connected browsers. You can imagine that we did not even remotely
consider adding the load of a forced namespace rebuild on top of that, so all
the load seems to be coming from self-heal.
In an earlier attempt with 1.3.2, this picture didn't change much even after a
forced rebuild of the namespace (which took about 24(!) hours). Also, using
only one namespace brick and no AFR did help, but it became clear that the
server holding the namespace was much more heavily loaded than the others.
So far, we have not found a proper way to reproduce the problems on a test
system, which makes it even harder to find a solution :-(
One idea that comes to mind: could we somehow prepare the namespace bricks on
the old-version cluster, to reduce the need for the self-healing mechanism
after the upgrade?
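To make the idea concrete: if I understand the unify namespace correctly, the
namespace brick essentially mirrors the directory tree with zero-byte
placeholder files, and GlusterFS adds its own metadata on top. If that
assumption holds (please correct me if it does not), a rough sketch like the
one below, run against the old client mount point (the source path is just an
example) on each server, or run once and rsynced to the other namespace
bricks, could pre-populate /data-ns1 before switching over:

#!/usr/bin/env python
# Rough sketch only: assumes the unify namespace brick merely mirrors
# the directory tree with zero-byte placeholder files -- exactly the
# assumption I would like to have confirmed.
import os

SRC = "/mnt/glusterfs"   # old client mount (example path)
DST = "/data-ns1"        # namespace brick to pre-populate

for root, dirs, files in os.walk(SRC):
    # path of the current directory relative to the source root
    rel = root[len(SRC):].lstrip(os.sep)
    target_dir = os.path.join(DST, rel)
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    for name in files:
        placeholder = os.path.join(target_dir, name)
        if not os.path.exists(placeholder):
            # create an empty file; glusterfs would still have to attach
            # whatever metadata it keeps on namespace entries
            open(placeholder, "a").close()

Even if that shortcut is valid, self-heal would presumably still have to fix
up the extended attributes, but hopefully that is much cheaper than building
the whole tree of 12+ million entries from scratch.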
Thanks for reading this far; I hope I've drawn the picture thoroughly. Please
let me know if anything is missing.
Cheers, Sascha
server config:
volume fsbrick1
type storage/posix
option directory /data1
end-volume
volume fsbrick2
type storage/posix
option directory /data2
end-volume
volume nsfsbrick1
type storage/posix
option directory /data-ns1
end-volume
volume brick1
type performance/io-threads
option thread-count 8
option queue-limit 1024
subvolumes fsbrick1
end-volume
volume brick2
type performance/io-threads
option thread-count 8
option queue-limit 1024
subvolumes fsbrick2
end-volume
### Add network serving capability to above bricks.
volume server
type protocol/server
option transport-type tcp/server # For TCP/IP transport
option listen-port 6996 # Default is 6996
option client-volume-filename /etc/glusterfs/glusterfs-client.vol
subvolumes brick1 brick2 nsfsbrick1
option auth.ip.brick1.allow * # Allow access to "brick1" volume
option auth.ip.brick2.allow * # Allow access to "brick2" volume
option auth.ip.nsfsbrick1.allow * # Allow access to "nsfsbrick1" volume
end-volume
-----------------------------------------------------------------------
client config:
volume fsc1
type protocol/client
option transport-type tcp/client
option remote-host 10.10.1.95
option remote-subvolume brick1
end-volume
volume fsc1r
type protocol/client
option transport-type tcp/client
option remote-host 10.10.1.95
option remote-subvolume brick2
end-volume
volume fsc2
type protocol/client
option transport-type tcp/client
option remote-host 10.10.1.96
option remote-subvolume brick1
end-volume
volume fsc2r
type protocol/client
option transport-type tcp/client
option remote-host 10.10.1.96
option remote-subvolume brick2
end-volume
volume fsc3
type protocol/client
option transport-type tcp/client
option remote-host 10.10.1.97
option remote-subvolume brick1
end-volume
volume fsc3r
type protocol/client
option transport-type tcp/client
option remote-host 10.10.1.97
option remote-subvolume brick2
end-volume
volume fsc4
type protocol/client
option transport-type tcp/client
option remote-host 10.10.1.98
option remote-subvolume brick1
end-volume
volume fsc4r
type protocol/client
option transport-type tcp/client
option remote-host 10.10.1.98
option remote-subvolume brick2
end-volume
volume afr1
type cluster/afr
subvolumes fsc1 fsc2r
end-volume
volume afr2
type cluster/afr
subvolumes fsc2 fsc1r
end-volume
volume afr3
type cluster/afr
subvolumes fsc3 fsc4r
end-volume
volume afr4
type cluster/afr
subvolumes fsc4 fsc3r
end-volume
volume ns1
type protocol/client
option transport-type tcp/client
option remote-host 10.10.1.95
option remote-subvolume nsfsbrick1
end-volume
volume ns2
type protocol/client
option transport-type tcp/client
option remote-host 10.10.1.96
option remote-subvolume nsfsbrick1
end-volume
volume ns3
type protocol/client
option transport-type tcp/client
option remote-host 10.10.1.97
option remote-subvolume nsfsbrick1
end-volume
volume ns4
type protocol/client
option transport-type tcp/client
option remote-host 10.10.1.98
option remote-subvolume nsfsbrick1
end-volume
volume afrns
type cluster/afr
subvolumes ns1 ns2 ns3 ns4
end-volume
volume bricks
type cluster/unify
subvolumes afr1 afr2 afr3 afr4
option namespace afrns
option scheduler alu
option alu.limits.min-free-disk 5%
option alu.limits.max-open-files 10000
option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
option alu.disk-usage.entry-threshold 2GB
option alu.disk-usage.exit-threshold 60MB
option alu.open-files-usage.entry-threshold 1024
option alu.open-files-usage.exit-threshold 32
end-volume
volume readahead
type performance/read-ahead
option page-size 256KB
option page-count 2
subvolumes bricks
end-volume
volume write-behind
type performance/write-behind
option aggregate-size 1MB
subvolumes readahead
end-volume
volume io-cache
type performance/io-cache
option page-size 128KB
option cache-size 64MB
subvolumes write-behind
end-volume