[Gluster-devel] Problem with scheduler load-balancing
drizzo201-cfs at yahoo.com
Fri Nov 9 16:26:15 UTC 2007
This is a repost. Don't think the first one made it through.
I am having trouble getting a configuration based on this profile to work properly.
http://www.gluster.org/docs/index.php/Advanced_Striping_with_GlusterFS
Using dd to create files with the "striped" extensions works OK: files get striped across all four servers. When I use dd to create files without a "stripe"-matched extension, they're created on a single server. My problem occurs when I run unstriped dd's from the three clients simultaneously --- all of the file creates end up on the same glusterfsd server.
I've used both the rr and alu schedulers to try to get the non-striped file creates spread across the four servers, but it doesn't work. If I let the dd's run to completion and start a new set of dd's, the new files are created on a different server, but as before, all three dd's go to the same server.
Single-client, single-threaded striped writes run at ~105MB/s. Single-client, single-threaded non-striped writes run at ~85MB/s. When I run three "unstriped" client dd's concurrently and all IO goes to the same server, total throughput drops to ~50MB/s with each client getting a third of the total (lots of disk thrashing). The dd test is "dd if=/dev/zero of=/mnt/cluster/testX bs=64k count=64k"; I just add ".img" to get the file striped.
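For reproducibility, here is the workload as a small script. The mount point and sizes are parameterized so it can be dry-run outside the cluster; the real runs use MOUNT=/mnt/cluster with bs=64k count=64k (4GB files), and the three concurrent writers run on three separate clients rather than as background jobs on one.

```shell
# Reproduction of the dd workload. Defaults are scaled down for a
# dry run; set MOUNT=/mnt/cluster and COUNT=64k for the real test.
MOUNT=${MOUNT:-$(mktemp -d)}
BS=${BS:-64k}
COUNT=${COUNT:-4}

# Unstriped: no extension match, so the file should land on one
# server chosen by the unify scheduler.
dd if=/dev/zero of="$MOUNT/test1" bs="$BS" count="$COUNT" 2>/dev/null

# Striped: .img matches the stripe block-size pattern, so the file
# is spread across all four stripe subvolumes.
dd if=/dev/zero of="$MOUNT/test1.img" bs="$BS" count="$COUNT" 2>/dev/null

# Three concurrent unstriped writers (one per client in the real
# test); these are the creates that all end up on the same server.
for i in 2 3 4; do
    dd if=/dev/zero of="$MOUNT/test$i" bs="$BS" count="$COUNT" 2>/dev/null &
done
wait
ls -l "$MOUNT"
```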
I'm using the gluster-patched fuse client (-glfs5) and gluster 1.3.7. The interconnect is TCP over GbE, with each client having a single connection and each server having a bonded dual-GbE interface. Three servers are running SLES10 SP1; the 4th server is running CentOS 4.5. Local file systems are XFS and ext3 mounted with extended attributes, on unshared, single-partition internal 250GB SATA disks.
Glusterfsd servers are started with log level NORMAL and don't show any problems at this log level.
server# glusterfsd -f /etc/glusterfs/bp-server.vol.iot -L NORMAL -l /var/log/glusterfs/glusterfsd.log
Clients are started with log level DEBUG.
client# glusterfs -l /var/log/glusterfs/glusterfs.log -L DEBUG --server=demo1 /mnt/cluster
Here is a snippet from one of the clients. The other two clients have the same entries.
2007-11-07 12:09:31 D [inode.c:351:__active_inode] fuse/inode: activating inode(97417), lru=21/1024
2007-11-07 12:09:31 D [inode.c:308:__destroy_inode] fuse/inode: destroy inode(0) [@0xaee30e28]
2007-11-07 12:14:01 D [inode.c:381:__passive_inode] fuse/inode: passivating inode(97417), lru=22/1024
2007-11-07 14:33:53 D [fuse-bridge.c:422:fuse_lookup] glusterfs-fuse: LOOKUP 1/rlx11-8 (/rlx11-8)
2007-11-07 14:33:54 D [fuse-bridge.c:377:fuse_entry_cbk] glusterfs-fuse: ERR => -1 (2)
2007-11-07 14:33:54 D [inode.c:308:__destroy_inode] fuse/inode: destroy inode(0) [@0xaf00a2b0]
2007-11-07 14:33:54 D [inode.c:559:__create_inode] fuse/inode: create inode(97692)
2007-11-07 14:33:54 D [inode.c:351:__active_inode] fuse/inode: activating inode(97692), lru=22/1024
2007-11-07 14:33:54 D [inode.c:308:__destroy_inode] fuse/inode: destroy inode(0) [@0xaf00f6b0]
2007-11-07 14:46:32 D [inode.c:381:__passive_inode] fuse/inode: passivating inode(97692), lru=23/1024
Any ideas?
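For reference, I'm confirming placement by counting files directly on each server's backend directory (a quick sketch of the check, run per server; the paths match the server spec below):

```shell
# count_files DIR -> number of regular files directly under DIR.
# An even spread across servers means the scheduler is balancing;
# all files in one server's directory means it is not.
count_files() {
    find "$1" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d ' '
}

# Backend directories from the server spec; missing directories
# simply report 0.
for dir in /gluster/unify /gluster/stripe; do
    echo "$dir: $(count_files "$dir") files"
done
```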
All four servers are set up the same way. Here is the spec file from the first server:
########################
volume posix-unify
type storage/posix
option directory /gluster/unify
end-volume
volume posix-stripe
type storage/posix
option directory /gluster/stripe
end-volume
volume posix-namespace
type storage/posix
option directory /export/namespace
end-volume
#volume plocks
# type features/posix-locks
# option mandatory on
# subvolumes posix-unify posix-stripe
# end-volume
volume iot-posix-unify
type performance/io-threads
option thread-count 8
subvolumes posix-unify
end-volume
volume iot-posix-stripe
type performance/io-threads
option thread-count 8
subvolumes posix-stripe
end-volume
volume server
type protocol/server
option transport-type tcp/server
option auth.ip.iot-posix-unify.allow 192.168.1.*
option auth.ip.iot-posix-stripe.allow 192.168.1.*
option auth.ip.posix-namespace.allow 192.168.1.*
option client-volume-filename /etc/glusterfs/bp-client.vol.iot
subvolumes iot-posix-unify iot-posix-stripe posix-namespace
end-volume
########################
Here is the client spec;
volume client-namespace
type protocol/client
option transport-type tcp/client
option remote-host 192.168.1.201
option remote-subvolume posix-namespace
end-volume
volume client-unify-1
type protocol/client
option transport-type tcp/client
option remote-host 192.168.1.201
option remote-subvolume iot-posix-unify
end-volume
volume client-unify-2
type protocol/client
option transport-type tcp/client
option remote-host 192.168.1.202
option remote-subvolume iot-posix-unify
end-volume
volume client-unify-3
type protocol/client
option transport-type tcp/client
option remote-host 192.168.1.203
option remote-subvolume iot-posix-unify
end-volume
volume client-unify-4
type protocol/client
option transport-type tcp/client
option remote-host 192.168.1.224
option remote-subvolume iot-posix-unify
end-volume
volume client-stripe-1
type protocol/client
option transport-type tcp/client
option remote-host 192.168.1.201
option remote-subvolume iot-posix-stripe
end-volume
volume client-stripe-2
type protocol/client
option transport-type tcp/client
option remote-host 192.168.1.202
option remote-subvolume iot-posix-stripe
end-volume
volume client-stripe-3
type protocol/client
option transport-type tcp/client
option remote-host 192.168.1.203
option remote-subvolume iot-posix-stripe
end-volume
volume client-stripe-4
type protocol/client
option transport-type tcp/client
option remote-host 192.168.1.224
option remote-subvolume iot-posix-stripe
end-volume
volume unify
type cluster/unify
# option scheduler rr
### ** ALU Scheduler Option **
option scheduler alu
option alu.limits.min-free-disk 5% #%
option alu.limits.max-open-files 10000
option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
option alu.disk-usage.entry-threshold 2GB
option alu.disk-usage.exit-threshold 128MB
option alu.open-files-usage.entry-threshold 1024
option alu.open-files-usage.exit-threshold 32
option alu.read-usage.entry-threshold 20 #%
option alu.read-usage.exit-threshold 4 #%
option alu.write-usage.entry-threshold 20 #%
option alu.write-usage.exit-threshold 4 #%
# option alu.disk-speed-usage.entry-threshold 0 # DO NOT SET IT. SPEED IS CONSTANT!!!
# option alu.disk-speed-usage.exit-threshold 0 # DO NOT SET IT. SPEED IS CONSTANT!!!
option alu.stat-refresh.interval 10sec
option namespace client-namespace
subvolumes client-unify-1 client-unify-2 client-unify-3 client-unify-4
end-volume
volume stripe
type cluster/stripe
option block-size *.img:2MB,*.tmp:2MB,*DUMMY*:1MB # files matching these patterns are striped with the given block size, e.g. *.img files get 2MB stripe blocks
subvolumes unify client-stripe-1 client-stripe-2 client-stripe-3 client-stripe-4
# subvolumes client-stripe-1 client-stripe-2 client-stripe-3 client-stripe-4
end-volume
volume iot
type performance/io-threads
option thread-count 8
subvolumes stripe
end-volume
volume wb
type performance/write-behind
# option thread-count 8
subvolumes iot
end-volume
volume ra
type performance/read-ahead
# option thread-count 8
subvolumes wb
end-volume
#volume ioc
# type performance/io-cache
# option thread-count 8
# subvolumes ra
# end-volume
####################
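As an aside, my understanding of the block-size option in the stripe volume is that it glob-matches the filename and only matching files get striped; anything else falls through to the unify subvolume and its scheduler. A sketch of that matching logic as I read it (not the actual translator code):

```shell
# stripe_size NAME -> the stripe block size NAME would get under
# the block-size option above; "unify" means no pattern matched
# and the file goes to the unify subvolume instead.
stripe_size() {
    case "$1" in
        *.img)   echo 2MB ;;
        *.tmp)   echo 2MB ;;
        *DUMMY*) echo 1MB ;;
        *)       echo unify ;;
    esac
}

stripe_size test1       # unstriped: scheduled by unify
stripe_size test1.img   # striped with 2MB blocks
```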