[Gluster-users] 2.0.6

David Saez Padros david at ols.es
Sat Aug 22 21:10:13 UTC 2009


> Do you mean you were unable to login to the machine over the network? unable to have a responsive console shell? machine would not respond to ICMP on the network? 

Yes, the machine only responded to pings; no ssh access was possible.

> Do you still have the logfiles and volfiles and can you describe the steps to reproduce in a bug report?

No, I don't have the logfiles. Searching on Google I found someone else
having a similar problem due to autoscaling, but I supposed I had it
disabled, as the logfile did not report any configuration error when
using "option thread-count 8". By contrast, when I tried to disable it
explicitly using "option autoscaling off", these lines appeared in
the logfile (this may be another bug):

[2009-08-16 14:29:57] W [io-threads.c:2206:init] export: 'thread-count' 
is specified with 'autoscaling' on. Ignoring'thread-count' option.

Then I also faced some problems where shutting down one server made the
replicated brick inaccessible, even though the other server was
accessible, with some strange log lines like:

[2009-08-16 13:52:34] E [socket.c:744:socket_connect_finish] remote2: 
connection to  failed (Connection refused)
[2009-08-16 13:52:37] W [fuse-bridge.c:1837:fuse_statfs_cbk] 
glusterfs-fuse: 8: ERR => -1 (Transport endpoint is not connected)

Then I realized that I had the same brick names in both replicated
gluster file systems, which made it very difficult to know which of
the file systems was failing, so I changed the vol files so that each
brick name was unique, and all the problems disappeared.
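
A name clash like this could be caught before starting the daemons with
a small check over the vol files. The following is only a minimal
sketch: it assumes the usual "volume NAME ... end-volume" layout, and
the file labels, volume names and vol file contents in EXAMPLE are
hypothetical, invented for illustration, not my real configuration.

```python
import re
from collections import defaultdict

# Matches a "volume NAME" declaration; "end-volume" lines do not match.
VOLUME_RE = re.compile(r"^\s*volume\s+(\S+)")


def find_duplicate_volume_names(volfiles):
    """Return {name: [labels]} for volume names declared more than once.

    `volfiles` maps a label (for example a file path) to the volfile text.
    """
    seen = defaultdict(list)
    for label, text in volfiles.items():
        for line in text.splitlines():
            line = line.split("#", 1)[0]  # ignore trailing comments
            match = VOLUME_RE.match(line)
            if match:
                seen[match.group(1)].append(label)
    return {name: labels for name, labels in seen.items() if len(labels) > 1}


# Hypothetical vol files for two replicated file systems (illustration only).
EXAMPLE = {
    "fs1.vol": (
        "volume remote1\n  type protocol/client\nend-volume\n"
        "volume remote2\n  type protocol/client\nend-volume\n"
        "volume mirror-a\n  type cluster/replicate\n"
        "  subvolumes remote1 remote2\nend-volume\n"
    ),
    "fs2.vol": (
        "volume remote1\n  type protocol/client\nend-volume\n"
        "volume remote2\n  type protocol/client\nend-volume\n"
        "volume mirror-b\n  type cluster/replicate\n"
        "  subvolumes remote1 remote2\nend-volume\n"
    ),
}

if __name__ == "__main__":
    for name, labels in find_duplicate_volume_names(EXAMPLE).items():
        print(f"volume '{name}' is declared in: {', '.join(labels)}")
```

With names reused across both vol files, a log line that mentions
"remote2" cannot be traced back to one file system, which is the
confusion described above.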

> As a rule of thumb, if your server hangs to the degree of not even having a usable shell, it just means that heavy IO via glusterfs triggered some bug in the operating system. Try to get kernel output via dmesg or console logs if you have any. glusterfsd only issues system calls and does not do anything funky with the server. Think of some application local to the server causing such a hang. glusterfsd is no different in that respect.

glusterfs was unifying 6 bricks on the server, so it was
really doing something. We also had other processes doing intensive
math calculations (although before we used glusterfs, and for almost
2 years, those processes never hung the system). The first two times
that we copied a lot of data to the gluster brick, the system hung as
reported. After the configuration changes we have done the same only
once, and the system did not hang. Tomorrow we will again need to
copy a lot of data, and if it happens again I will try to get as much
information as possible.

best regards ...

    David Saez Padros                http://www.ols.es
    On-Line Services 2000 S.L.       telf    +34 902 50 29 75
