[Gluster-devel] seeking advice: how to upgrade from 1.3.0pre4 to tla patch628?

Sascha Ottolski ottolski at web.de
Tue Jan 8 08:31:19 UTC 2008


Hi list,

After several rather depressing, unsuccessful attempts, I'm wondering if someone
has a hint as to how we could accomplish the above task on a production
system: every time we have tried so far, we had to roll back, because the load on
the servers and the clients climbed so high that our application became unusable.

Our understanding is that the introduction of the namespace is what is killing us,
but we have not found a way to get around the problem.

The setup: 4 servers, each with two bricks and a namespace; the bricks are on
separate RAID arrays. The clients AFR the bricks so that servers 1 and 2 mirror each
other, as do servers 3 and 4. The four resulting AFR volumes are then unified (see
config below). The setup works so far, but is not very stable (i.e. we see
memory leaks on the client side). The upgraded version has the four namespace bricks
AFR-ed as well. We have about 20 connected clients that only write, and only rarely,
and 7 clients that only read, but massively (the Apache webservers
serving the images). All machines are connected through gigabit Ethernet.

Maybe the source of the problem is what we store on the cluster: about
12 million images, adding up to ~300 GB, in a very deeply nested directory
structure. So, lots of relatively small files. And we are about to add
another 15 million files of even smaller size; they consume only 50 GB in total,
most of them only 1 or 2 KB each.

Now, if we start the new GlusterFS with a new, empty namespace, it takes only
minutes for the load on the servers to reach around 1.5, and on the reading
clients to jump as high as 200(!). Obviously, no more images get delivered to the
connected browsers. You can imagine that we did not even remotely consider
adding the load of forcibly rebuilding the namespace on top of that, so all the
load seems to be coming from self-heal.

In an earlier attempt with 1.3.2, this picture didn't change much even after a
forced rebuild of the namespace (which took about 24(!) hours). Using
only one namespace brick and no AFR did help, but it became clear that the
server holding the namespace was much more heavily loaded than the others.

So far we have not found a proper way to reproduce the problems on a test
system, which makes it even harder to find a solution :-(

One idea that comes to mind: could we somehow prepare the namespace bricks
while the cluster still runs the old version, so that less self-healing is
needed after the upgrade? A rough sketch of what we have in mind follows below.
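
(This is only a sketch, based on our assumption that the unify namespace brick
needs nothing more than the directory tree plus a zero-length entry for every
file; the paths are just the ones from our setup. Please correct us if that
assumption is wrong.)

#!/usr/bin/env python
# Pre-populate a namespace brick from an existing data brick.
# ASSUMPTION: the unify namespace only needs the directory tree plus
# zero-length placeholder files -- not verified against the sources.
import os

DATA_BRICK = "/data1"     # existing data brick on this server
NS_BRICK   = "/data-ns1"  # namespace brick to be prepared

for root, dirs, files in os.walk(DATA_BRICK):
    # path of the current directory relative to the data brick
    rel = root[len(DATA_BRICK):].lstrip(os.sep)
    ns_dir = os.path.join(NS_BRICK, rel) if rel else NS_BRICK
    if not os.path.isdir(ns_dir):
        os.makedirs(ns_dir)
    for name in files:
        ns_file = os.path.join(ns_dir, name)
        if not os.path.exists(ns_file):
            open(ns_file, "w").close()  # create empty placeholder entry

We would have to run this for both data bricks on each server (or against the
mounted unify volume) and then let the namespace AFR take care of the remaining
copies.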

Thanks for reading this far. I hope I've drawn the picture thoroughly; please
let me know if anything is missing.


Cheers, Sascha


server config:

volume fsbrick1
  type storage/posix
  option directory /data1
end-volume

volume fsbrick2
  type storage/posix
  option directory /data2
end-volume

volume nsfsbrick1
  type storage/posix
  option directory /data-ns1
end-volume

volume brick1
  type performance/io-threads
  option thread-count 8
  option queue-limit 1024
  subvolumes fsbrick1
end-volume

volume brick2
  type performance/io-threads
  option thread-count 8
  option queue-limit 1024
  subvolumes fsbrick2
end-volume

### Add network serving capability to above bricks.
volume server
  type protocol/server
  option transport-type tcp/server     # For TCP/IP transport
  option listen-port 6996              # Default is 6996
  option client-volume-filename /etc/glusterfs/glusterfs-client.vol
  subvolumes brick1 brick2 nsfsbrick1
  option auth.ip.brick1.allow * # Allow access to "brick1" volume
  option auth.ip.brick2.allow * # Allow access to "brick2" volume
  option auth.ip.nsfsbrick1.allow * # Allow access to "nsfsbrick1" volume
end-volume

-----------------------------------------------------------------------

client config:

volume fsc1
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.10.1.95
  option remote-subvolume brick1
end-volume

volume fsc1r
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.10.1.95
  option remote-subvolume brick2
end-volume

volume fsc2
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.10.1.96
  option remote-subvolume brick1
end-volume

volume fsc2r
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.10.1.96
  option remote-subvolume brick2
end-volume

volume fsc3
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.10.1.97
  option remote-subvolume brick1
end-volume

volume fsc3r
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.10.1.97
  option remote-subvolume brick2
end-volume

volume fsc4
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.10.1.98
  option remote-subvolume brick1
end-volume

volume fsc4r
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.10.1.98
  option remote-subvolume brick2
end-volume

volume afr1
  type cluster/afr
  subvolumes fsc1 fsc2r   # brick1 on server 1 mirrored with brick2 on server 2
end-volume

volume afr2
  type cluster/afr
  subvolumes fsc2 fsc1r   # brick1 on server 2 mirrored with brick2 on server 1
end-volume

volume afr3
  type cluster/afr
  subvolumes fsc3 fsc4r   # brick1 on server 3 mirrored with brick2 on server 4
end-volume

volume afr4
  type cluster/afr
  subvolumes fsc4 fsc3r   # brick1 on server 4 mirrored with brick2 on server 3
end-volume

volume ns1
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.10.1.95
  option remote-subvolume nsfsbrick1
end-volume

volume ns2
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.10.1.96
  option remote-subvolume nsfsbrick1
end-volume

volume ns3
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.10.1.97
  option remote-subvolume nsfsbrick1
end-volume

volume ns4
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.10.1.98
  option remote-subvolume nsfsbrick1
end-volume

volume afrns
  type cluster/afr
  subvolumes ns1 ns2 ns3 ns4
end-volume

volume bricks
  type cluster/unify
  subvolumes afr1 afr2 afr3 afr4
  option namespace afrns
  option scheduler alu
  option alu.limits.min-free-disk  5%
  option alu.limits.max-open-files 10000
  option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
  option alu.disk-usage.entry-threshold 2GB 
  option alu.disk-usage.exit-threshold  60MB
  option alu.open-files-usage.entry-threshold 1024
  option alu.open-files-usage.exit-threshold 32
end-volume

volume readahead
  type performance/read-ahead
  option page-size 256KB
  option page-count 2
  subvolumes bricks
end-volume

volume write-behind
  type performance/write-behind
  option aggregate-size 1MB
  subvolumes readahead
end-volume

volume io-cache
  type performance/io-cache
  option page-size 128KB
  option cache-size 64MB
  subvolumes write-behind
end-volume





