[Gluster-users] setup issues and errors
Lyric Hartley
lyric at aboutus.org
Thu Feb 26 07:46:58 UTC 2009
Hello All,
We recently switched from NFS to gluster for sharing images between a
cluster of web servers. I have noticed a few issues and am hoping
someone has some advice.
One of the main reasons to switch was redundancy: if one server goes
down, the clients continue to write and read images, and when the
server comes back, it gets synced. When we had trouble with NFS, the
whole site was crippled or down, and we are trying to get away from
this.
We launched gluster into production a couple of weeks ago. We have
seen a few issues since then.
1) When one of the servers was under load (from something else running
on the same box), we started getting a lot of timeout errors from the
app. I turned off the gluster server on that host and things were OK
again.
2) We rebooted one of the clients and gluster was not started on
reboot, so we made some config changes and rebooted again to check
that it came up. It did not (our issue, I know), so we started it
manually. We then started getting a lot of timeout errors from our
app, and soon after from all the other clients too. I ended up killing
gluster and remounting all the clients, and things seem to be OK now.
Sorry to be so vague, I just don't have a lot of data yet.
3) A client box became totally unresponsive and had to be power
cycled. We suspect it was gluster related, as it had a really high
load not long after the event above.
This is a snippet from the logs on one of the servers; most of the log
looks like this:
2009-02-25 17:48:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:44380)
2009-02-25 17:48:01 W [posix.c:959:posix_create] brick-ns: open on
/images/b/bd/Logo-southerncrosshumanitarian-org.jpg: No such file or
directory
2009-02-25 17:49:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:44383)
2009-02-25 17:49:58 W [posix.c:959:posix_create] brick-ns: open on
/images/a/a1/Logo-replica-designers-com.gif: No such file or directory
2009-02-25 17:50:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:44388)
2009-02-25 17:51:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:44391)
2009-02-25 17:51:47 W [posix.c:959:posix_create] brick-ns: open on
/images/9/9b/Logo-brisbanetraybodys-com-au.png: No such file or
directory
2009-02-25 17:52:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:44396)
2009-02-25 17:52:29 W [posix.c:959:posix_create] brick-ns: open on
/images/e/e5/Logo-callverse-com.gif: No such file or directory
2009-02-25 17:53:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:53321)
2009-02-25 17:54:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:53326)
2009-02-25 17:55:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:53336)
2009-02-25 17:55:36 W [posix.c:959:posix_create] brick-ns: open on
/images/6/6f/jigsaw-logo.png: No such file or directory
2009-02-25 17:56:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:53337)
2009-02-25 17:57:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:53342)
2009-02-25 17:58:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:54064)
2009-02-25 17:59:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:54067)
2009-02-25 18:00:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:54072)
2009-02-25 18:01:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:54079)
2009-02-25 18:01:30 W [posix.c:959:posix_create] brick-ns: open on
/images/8/86/Portrait-KARACTERE.jpg: No such file or directory
2009-02-25 18:01:34 W [posix.c:959:posix_create] brick-ns: open on
/images/e/e4/Cropped-Portrait-KARACTERE.jpg: No such file or directory
2009-02-25 18:02:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:54084)
2009-02-25 18:03:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:43887)
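To get a quick overview of a log like the one above, a small script can
join the wrapped lines back into records, count them by level and
source function, and measure the spacing between the "EOF from peer"
entries. This is only a sketch: the helper names (parse_records,
summarize) and the SAMPLE text are made up for illustration, and the
record format is inferred from the snippet above rather than from any
gluster documentation.

```python
import re
from collections import Counter
from datetime import datetime

# A record starts with "YYYY-MM-DD HH:MM:SS <level> [file:line:function]".
# This pattern is inferred from the pasted log, not from a format spec.
RECORD_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ([A-Z]) "
    r"\[([^:\]]+):(\d+):([^\]]+)\]\s*(.*)$"
)


def parse_records(lines):
    """Join wrapped lines and yield (timestamp, level, function, message)."""
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = RECORD_START.match(line)
        if match:
            if current:
                yield tuple(current)
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            current = [stamp, match.group(2), match.group(5), match.group(6)]
        elif current:
            # Continuation of a record that was wrapped onto the next line.
            current[3] = (current[3] + " " + line).strip()
    if current:
        yield tuple(current)


def summarize(lines):
    """Return (counts per (level, function), counts of gaps between EOFs)."""
    records = list(parse_records(lines))
    counts = Counter((level, func) for _, level, func, _ in records)
    eof_times = [ts for ts, _, _, msg in records if "EOF from peer" in msg]
    gaps = Counter(
        int((later - earlier).total_seconds())
        for earlier, later in zip(eof_times, eof_times[1:])
    )
    return counts, gaps


# Hypothetical input: the first few records of the server log above.
SAMPLE = """\
2009-02-25 17:48:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:44380)
2009-02-25 17:48:01 W [posix.c:959:posix_create] brick-ns: open on
/images/b/bd/Logo-southerncrosshumanitarian-org.jpg: No such file or
directory
2009-02-25 17:49:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:44383)
2009-02-25 17:49:58 W [posix.c:959:posix_create] brick-ns: open on
/images/a/a1/Logo-replica-designers-com.gif: No such file or directory
2009-02-25 17:50:01 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (127.0.0.1:44388)
""".splitlines()

if __name__ == "__main__":
    counts, gaps = summarize(SAMPLE)
    for (level, func), n in counts.most_common():
        print(level, func, n)
    print("seconds between EOF entries:", dict(gaps))
```

On the full snippet, the EOF entries come from 127.0.0.1 at one-minute
intervals, so a summary like this may help separate a regular local
connection from the errors that line up with the timeouts.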
From the log on one of the clients:
2009-02-25 17:27:12 E [client-protocol.c:4430:client_lookup_cbk]
brick-ns1: no proper reply from server, returning ENOTCONN
2009-02-25 17:27:12 E [client-protocol.c:325:client_protocol_xfer]
brick-ns1: transport_submit failed
2009-02-25 17:29:13 C [client-protocol.c:212:call_bail] brick-ns1:
bailing transport
2009-02-25 17:29:13 E [client-protocol.c:4834:client_protocol_cleanup]
brick-ns1: forced unwinding frame type(2) op(6) reply=@0x2aaab4467d80
2009-02-25 17:29:13 E [client-protocol.c:4277:client_unlock_cbk]
brick-ns1: no proper reply from server, returning ENOTCONN
2009-02-25 17:29:13 E [client-protocol.c:325:client_protocol_xfer]
brick-ns1: transport_submit failed
2009-02-25 17:34:28 E [afr.c:4625:afr_create_cbk] afr-ns:
(path=/images/8/8a/Portrait-Dan_Korn.jpg child=brick-ns2) op_ret=-1
op_errno=2
2009-02-25 17:34:28 E [afr.c:4625:afr_create_cbk] afr-ns:
(path=/images/6/65/Cropped-Portrait-Dan_Korn.jpg child=brick-ns2)
op_ret=-1 op_errno=2
2009-02-25 17:36:35 C [client-protocol.c:212:call_bail] brick-ns1:
bailing transport
2009-02-25 17:36:35 E [client-protocol.c:4834:client_protocol_cleanup]
brick-ns1: forced unwinding frame type(1) op(40) reply=@0x2aaab408f390
2009-02-25 17:36:35 E [client-protocol.c:4613:client_checksum_cbk]
brick-ns1: no proper reply from server, returning ENOTCONN
2009-02-25 17:56:53 C [client-protocol.c:212:call_bail] brick-ns1:
bailing transport
2009-02-25 17:56:53 E [client-protocol.c:4834:client_protocol_cleanup]
brick-ns1: forced unwinding frame type(2) op(5) reply=@0x2aaab46410a0
2009-02-25 17:56:53 E [client-protocol.c:4246:client_lock_cbk]
brick-ns1: no proper reply from server, returning ENOTCONN
2009-02-25 17:56:53 E [client-protocol.c:325:client_protocol_xfer]
brick-ns1: transport_submit failed
2009-02-25 17:58:34 C [client-protocol.c:212:call_bail] brick-ns1:
bailing transport
2009-02-25 17:58:34 E [client-protocol.c:4834:client_protocol_cleanup]
brick-ns1: forced unwinding frame type(1) op(34) reply=@0x2aaab41634e0
2009-02-25 17:58:34 E [client-protocol.c:4430:client_lookup_cbk]
brick-ns1: no proper reply from server, returning ENOTCONN
2009-02-25 17:58:34 E [client-protocol.c:325:client_protocol_xfer]
brick-ns1: transport_submit failed
2009-02-25 17:59:24 C [client-protocol.c:212:call_bail] brick-ns1:
bailing transport
2009-02-25 17:59:24 E [client-protocol.c:4834:client_protocol_cleanup]
brick-ns1: forced unwinding frame type(2) op(6) reply=@0x2aaab407b890
2009-02-25 17:59:24 E [client-protocol.c:4277:client_unlock_cbk]
brick-ns1: no proper reply from server, returning ENOTCONN
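The afr_create_cbk entries in the client log report a numeric
op_errno. A short sketch for turning that number into its symbolic
name, using Python's standard errno table, is below. The decode
function, the regular expression, and the SAMPLE string are
illustrative assumptions based on the two afr lines above, and the
errno mapping is the one of the machine running the script.

```python
import errno
import os
import re

# Matches the "(path=... child=...) op_ret=N op_errno=N" part of the
# afr_create_cbk lines shown in the client log. Inferred from that log only.
AFR_ERROR = re.compile(
    r"\(path=(\S+) child=(\S+)\)\s+op_ret=(-?\d+)\s+op_errno=(\d+)"
)


def decode(message):
    """Return the fields of an afr error message with op_errno decoded."""
    match = AFR_ERROR.search(message)
    if match is None:
        return None
    op_errno = int(match.group(4))
    return {
        "path": match.group(1),
        "child": match.group(2),
        "op_ret": int(match.group(3)),
        "op_errno": op_errno,
        "errno_name": errno.errorcode.get(op_errno, "UNKNOWN"),
        "errno_text": os.strerror(op_errno),
    }


# One client log record from above, with the wrapped lines joined.
SAMPLE = (
    "afr-ns: (path=/images/8/8a/Portrait-Dan_Korn.jpg child=brick-ns2) "
    "op_ret=-1 op_errno=2"
)

if __name__ == "__main__":
    info = decode(SAMPLE)
    print(info["child"], info["path"], info["errno_name"], info["errno_text"])
```

On Linux, errno 2 is ENOENT, which matches the "No such file or
directory" warnings that posix_create logs for brick-ns on the server.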
______________
server conf
_____________
volume brick
type storage/posix
option directory /opt/glusterfs/share/
end-volume
volume brick-ns
type storage/posix
option directory /opt/glusterfs/share-ns/
end-volume
volume server
type protocol/server
option transport-type tcp/server
option client-volume-filename /etc/glusterfs/glusterfs-client.vol
subvolumes brick brick-ns
option auth.ip.brick.allow 10.* # Allow access to "brick" volume
option auth.ip.brick-ns.allow 10.* # Allow access to "brick-ns" volume
end-volume
# performance changes
volume locks
type features/posix-locks
option mandatory-locks on
subvolumes brick
end-volume
volume iothreads
type performance/io-threads
option thread-count 8
subvolumes locks
end-volume
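In the server file above, the server volume lists only brick and
brick-ns as subvolumes, while locks and iothreads are defined after
it. A rough sketch for checking which volumes in a volfile are not
referenced by any other volume is below. The parser handles only the
keywords that appear in this post (volume, type, option, subvolumes,
end-volume), the function names are made up, and whether this layout
has anything to do with the timeouts is not established here.

```python
def parse_volfile(text):
    """Return {name: {"type": str, "subvolumes": [str], "options": dict}}."""
    volumes = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        words = line.split()
        if words[0] == "volume" and len(words) == 2:
            current = {"type": None, "subvolumes": [], "options": {}}
            volumes[words[1]] = current
        elif words[0] == "end-volume":
            current = None
        elif current is None:
            raise ValueError(f"line outside any volume block: {raw!r}")
        elif words[0] == "type" and len(words) == 2:
            current["type"] = words[1]
        elif words[0] == "subvolumes":
            current["subvolumes"] = words[1:]
        elif words[0] == "option" and len(words) >= 2:
            current["options"][words[1]] = " ".join(words[2:])
    return volumes


def unreferenced(volumes):
    """List volumes, in file order, that no other volume uses as a subvolume."""
    referenced = {sub for vol in volumes.values() for sub in vol["subvolumes"]}
    return [name for name in volumes if name not in referenced]


# The server volfile from this post, copied as posted.
SERVER_VOLFILE = """
volume brick
type storage/posix
option directory /opt/glusterfs/share/
end-volume
volume brick-ns
type storage/posix
option directory /opt/glusterfs/share-ns/
end-volume
volume server
type protocol/server
option transport-type tcp/server
option client-volume-filename /etc/glusterfs/glusterfs-client.vol
subvolumes brick brick-ns
option auth.ip.brick.allow 10.* # Allow access to "brick" volume
option auth.ip.brick-ns.allow 10.* # Allow access to "brick-ns" volume
end-volume
volume locks
type features/posix-locks
option mandatory-locks on
subvolumes brick
end-volume
volume iothreads
type performance/io-threads
option thread-count 8
subvolumes locks
end-volume
"""

if __name__ == "__main__":
    vols = parse_volfile(SERVER_VOLFILE)
    print("not referenced by any other volume:", unreferenced(vols))
    print("server exports:", vols["server"]["subvolumes"])
```

Run against the server file, this reports server and iothreads as the
volumes nothing else refers to, and shows that the server volume
points at brick and brick-ns directly rather than at locks or
iothreads.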
___________
client config
__________
volume brick1
type protocol/client
option transport-type tcp/client
option remote-host cumulus.adm # IP address of the remote brick
option remote-subvolume brick # name of the remote volume
end-volume
volume brick2
type protocol/client
option transport-type tcp/client
option remote-host dbs3.adm
option remote-subvolume brick
end-volume
volume brick-ns1
type protocol/client
option transport-type tcp/client
option remote-host cumulus.adm
option remote-subvolume brick-ns # Note the different remote volume name.
end-volume
volume brick-ns2
type protocol/client
option transport-type tcp/client
option remote-host dbs3.adm
option remote-subvolume brick-ns # Note the different remote volume name.
end-volume
volume afr1
type cluster/afr
subvolumes brick1 brick2
end-volume
volume afr-ns
type cluster/afr
subvolumes brick-ns1 brick-ns2
end-volume
volume unify
type cluster/unify
option namespace afr-ns
option scheduler rr
subvolumes afr1
end-volume
# performance changes
volume writebehind
type performance/write-behind
option aggregate-size 128KB
option window-size 1MB
subvolumes unify
end-volume
volume cache
type performance/io-cache
option cache-size 512MB
subvolumes writebehind
end-volume
Any advice is greatly appreciated.