[Gluster-users] GlusterFS spewing errors with Xen tap:aio block driver
Jim Phillips
jim at ergophobia.org
Wed Oct 8 20:03:36 UTC 2008
Greetings,
I'm trying to get to the bottom of a problem I'm having combining Xen
and GlusterFS. I've googled this extensively with no success. Every
once in a while (more frequently than I'd like in a production
environment), the GlusterFS client will start spewing errors into
glusterfs.log similar to:
2008-10-08 16:49:21 E [client-protocol.c:1158:client_writev] brick-gridcpu02: : returning EBADFD
2008-10-08 16:49:21 E [fuse-bridge.c:1645:fuse_writev_cbk] glusterfs-fuse: 352231064: WRITE => -1 (77)
2008-10-08 16:49:53 E [client-protocol.c:1158:client_writev] brick-gridcpu02: : returning EBADFD
2008-10-08 16:49:53 E [fuse-bridge.c:1645:fuse_writev_cbk] glusterfs-fuse: 352231241: WRITE => -1 (77)
2008-10-08 16:49:53 E [client-protocol.c:1158:client_writev] brick-gridcpu02: : returning EBADFD
2008-10-08 16:49:53 E [fuse-bridge.c:1645:fuse_writev_cbk] glusterfs-fuse: 352231243: WRITE => -1 (77)
It will literally fill the disk up in a matter of hours if we don't
catch it early enough. This seems to happen only around disk images
opened by the Xen tap:aio block driver, and it goes away if I "xm
destroy" the right virtual instance or "xm migrate" it to another Xen
server. The DomU itself shows a disk error in dmesg and remounts the
disk read-only. How much functionality remains in the DomU varies,
from being able to shut it down cleanly to not even being able to
log in to run the shutdown command.
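To catch this before the disk fills, the log can be scanned for the error pattern quoted above. The following is only a minimal sketch: the regular expression is derived from the few log lines in this post, and the function name count_ebadfd is made up for illustration. It assumes each log entry sits on a single line in the real glusterfs.log.

```python
import re
from collections import Counter

# Pattern derived from the client_writev error lines quoted in this post.
# Assumption: other GlusterFS versions may format these lines differently.
EBADFD_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) E "
    r"\[client-protocol\.c:\d+:client_writev\] "
    r"(?P<brick>[\w-]+): : returning EBADFD"
)


def count_ebadfd(lines):
    """Return a Counter mapping brick name -> number of EBADFD write errors."""
    counts = Counter()
    for line in lines:
        match = EBADFD_LINE.match(line.strip())
        if match:
            counts[match.group("brick")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2008-10-08 16:49:21 E [client-protocol.c:1158:client_writev] "
        "brick-gridcpu02: : returning EBADFD",
        "2008-10-08 16:49:21 E [fuse-bridge.c:1645:fuse_writev_cbk] "
        "glusterfs-fuse: 352231064: WRITE => -1 (77)",
        "2008-10-08 16:49:53 E [client-protocol.c:1158:client_writev] "
        "brick-gridcpu02: : returning EBADFD",
    ]
    for brick, count in count_ebadfd(sample).items():
        print(f"{brick}: {count} EBADFD write errors")
```

Run periodically against the tail of the log, a non-zero count for any brick would be the cue to find and migrate the affected DomU.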
I'm going to cross-post this to both the Xen and GlusterFS user mailing
lists, so I hope to get a response from one side or the other.
Specs are:
Hardware: Dell PowerEdge (1955 I think), with PERC3 SCSI disks in RAID1
OS: Ubuntu 8.04, amd64, Kernel 2.6.24-19-xen
GlusterFS 1.3.10, 1.3.10-0ubuntu1~hardy2 from https://launchpad.net/~neil-aldur/+ppa-packages
Xen 3.2.0, 3.2.0-0ubuntu10, from Ubuntu
GlusterFS client configuration:
# file: /etc/glusterfs/glusterfs-client.vol
volume brick-gridfs01
type protocol/client
option transport-type tcp/client
option remote-host atl1gridfs01
option remote-port 6997
option remote-subvolume brick
end-volume
volume brick-gridcpu01
type protocol/client
option transport-type tcp/client
option remote-host atl1gridcpu01
option remote-port 6997
option remote-subvolume brick
end-volume
volume brick-gridcpu02
type protocol/client
option transport-type tcp/client
option remote-host atl1gridcpu02
option remote-port 6997
option remote-subvolume brick
end-volume
volume brick-gridcpu03
type protocol/client
option transport-type tcp/client
option remote-host atl1gridcpu03
option remote-port 6997
option remote-subvolume brick
end-volume
volume brick-gridcpu04
type protocol/client
option transport-type tcp/client
option remote-host atl1gridcpu04
option remote-port 6997
option remote-subvolume brick
end-volume
volume namespace-gridfs01
type protocol/client
option transport-type tcp/client
option remote-host atl1gridfs01
option remote-port 6997
option remote-subvolume brick-ns
end-volume
volume unify0
type cluster/unify
option scheduler alu
option alu.limits.min-free-disk 5%
option alu.limits.max-open-files 10000
option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
option alu.disk-usage.entry-threshold 2GB # Kick in if the discrepancy in disk-usage between volumes is more than 2GB
option alu.disk-usage.exit-threshold 60MB # Don't stop writing to the least-used volume until the discrepancy is 1988MB
option alu.open-files-usage.entry-threshold 1024 # Kick in if the discrepancy in open files is 1024
option alu.open-files-usage.exit-threshold 32 # Don't stop until 992 files have been written to the least-used volume
# option alu.read-usage.entry-threshold 20% # Kick in when the read-usage discrepancy is 20%
# option alu.read-usage.exit-threshold 4% # Don't stop until the discrepancy has been reduced to 16% (20% - 4%)
# option alu.write-usage.entry-threshold 20% # Kick in when the write-usage discrepancy is 20%
# option alu.write-usage.exit-threshold 4% # Don't stop until the discrepancy has been reduced to 16%
# option alu.disk-speed-usage.entry-threshold # NEVER SET IT. SPEED IS CONSTANT!!!
# option alu.disk-speed-usage.exit-threshold # NEVER SET IT. SPEED IS CONSTANT!!!
option alu.stat-refresh.interval 10sec # Refresh the statistics used for decision-making every 10 seconds
# option alu.stat-refresh.num-file-create 10
option namespace namespace-gridfs01
subvolumes brick-gridfs01 brick-gridcpu01 brick-gridcpu02 brick-gridcpu03 brick-gridcpu04
end-volume
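For reference, the entry/exit threshold arithmetic described in the comments of the unify volume above can be sketched as follows. This reflects only my reading of those comments (start balancing when the discrepancy exceeds the entry threshold, stop once it has dropped to entry minus exit). It is not taken from the GlusterFS scheduler source, and the names parse_size_mb, stop_point and Hysteresis are invented for the illustration.

```python
def parse_size_mb(text):
    """Convert a size such as '2GB' or '60MB' to megabytes (1GB = 1024MB)."""
    factors = {"MB": 1, "GB": 1024}
    suffix = text[-2:].upper()
    if suffix not in factors:
        raise ValueError(f"unsupported size: {text!r}")
    return int(text[:-2]) * factors[suffix]


def stop_point(entry_threshold, exit_threshold):
    """Discrepancy at which balancing stops, per the config comments."""
    return entry_threshold - exit_threshold


class Hysteresis:
    """Two-threshold switch; the exact comparisons are an assumption."""

    def __init__(self, entry_threshold, exit_threshold):
        self.entry = entry_threshold
        self.stop = stop_point(entry_threshold, exit_threshold)
        self.active = False

    def update(self, discrepancy):
        """Feed the current discrepancy; return True while balancing is on."""
        if not self.active and discrepancy > self.entry:
            self.active = True
        elif self.active and discrepancy <= self.stop:
            self.active = False
        return self.active


if __name__ == "__main__":
    entry = parse_size_mb("2GB")
    exit_ = parse_size_mb("60MB")
    print("disk-usage stop point (MB):", stop_point(entry, exit_))
    print("open-files stop point:", stop_point(1024, 32))
```

With the values from the config, the disk-usage stop point works out to 2048MB - 60MB = 1988MB and the open-files stop point to 1024 - 32 = 992, matching the figures in the comments.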
GlusterFS Server Config:
# file: /etc/glusterfs/glusterfs-server.vol
volume posix
type storage/posix
option directory /opt/gridfs/export
end-volume
volume plocks
type features/posix-locks
subvolumes posix
end-volume
volume brick
type performance/io-threads
option thread-count 4
subvolumes plocks
end-volume
volume brick-ns
type storage/posix
option directory /opt/gridfs/namespace
end-volume
volume server
type protocol/server
option transport-type tcp/server
option listen-port 6997
option auth.ip.brick.allow *
option auth.ip.brick-ns.allow *
subvolumes brick brick-ns
end-volume
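As a sanity check on the two configs, here is a rough sketch that parses both volfiles and confirms that every remote-subvolume named on the client side is actually exported by the protocol/server volume. The parser handles only the simple "volume ... end-volume" layout shown in this post, not the full volfile grammar, and the function names are hypothetical.

```python
def parse_volfile(text):
    """Parse 'volume NAME ... end-volume' blocks into a dict keyed by name."""
    volumes = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        words = line.split()
        if words[0] == "volume" and len(words) == 2:
            current = {"type": None, "options": {}, "subvolumes": []}
            volumes[words[1]] = current
        elif words[0] == "end-volume":
            current = None
        elif current is None:
            continue  # stray text outside any volume block
        elif words[0] == "type" and len(words) > 1:
            current["type"] = words[1]
        elif words[0] == "option" and len(words) > 1:
            current["options"][words[1]] = " ".join(words[2:])
        elif words[0] == "subvolumes":
            current["subvolumes"] = words[1:]
    return volumes


def exported_names(server_volumes):
    """Names listed as subvolumes of any protocol/server volume."""
    names = set()
    for vol in server_volumes.values():
        if vol["type"] == "protocol/server":
            names.update(vol["subvolumes"])
    return names


def unexported_remotes(client_volumes, server_volumes):
    """Map client volume -> remote-subvolume that the server does not export."""
    exported = exported_names(server_volumes)
    missing = {}
    for name, vol in client_volumes.items():
        remote = vol["options"].get("remote-subvolume")
        if vol["type"] == "protocol/client" and remote not in exported:
            missing[name] = remote
    return missing


if __name__ == "__main__":
    client = """
    volume brick-gridcpu02
    type protocol/client
    option remote-host atl1gridcpu02
    option remote-subvolume brick
    end-volume
    """
    server = """
    volume server
    type protocol/server
    subvolumes brick brick-ns
    end-volume
    """
    print(unexported_remotes(parse_volfile(client), parse_volfile(server)))
```

An empty result means every client volume points at a name the server exports, which is what I would expect for the configs above.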
Jim Phillips
jim at ergophobia.org