[Gluster-devel] GlusterFS hangs/fails: Transport endpoint is not connected

Fred Hucht fred at thp.uni-due.de
Tue Nov 25 12:35:01 UTC 2008


Hi!

The glusterfsd.log files on all nodes are virtually empty; the only
entry on 2008-11-25 reads

2008-11-25 03:13:48 E [io-threads.c:273:iot_flush] sc1-ioth: fd context is NULL, returning EBADFD

on every node. I don't think this is related to our problems.
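
For reference, a sweep along these lines is enough to check all the
server logs at once (a sketch only; the hostnames and the log path are
assumptions about our layout, adjust as needed):

#!/bin/sh
# Scan every node's server log for anything besides the call_bail flood.
# "master", "node1".."node87" and /var/log/glusterfsd.log are placeholders.
for h in master $(seq 87 | sed 's/^/node/'); do
    echo "=== $h ==="
    ssh "$h" 'grep -v call_bail /var/log/glusterfsd.log | grep -E " (E|C) \["'
done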

Regards,

      Fred

On 25.11.2008, at 13:17, Basavanagowda Kanur wrote:

> Fred,
>   Can you also provide us server logs?
>
> --
> gowda
>
>
> On Tue, Nov 25, 2008 at 4:57 PM, Fred Hucht <fred at thp.uni-due.de>  
> wrote:
> Hi devels!
>
> We are considering GlusterFS as the parallel file server (8 server
> nodes) for our parallel Opteron cluster (88 nodes, ~500 cores), as
> well as for a unified nufa /scratch distributed over all nodes. We use
> the cluster in a scientific environment (theoretical physics) and run
> Scientific Linux with kernel 2.6.25.16. After similar problems with
> 1.3.x we installed 1.4.0qa61 and set up a /scratch for testing using
> the following script "glusterconf.sh", which runs locally on all nodes
> at startup and writes the two config files
> /usr/local/etc/glusterfs-{server,client}.vol:
>
> ---------------------------------- 8< snip >8 ----------------------------------
> #!/bin/sh
>
> HOST=$(hostname -s)
>
> if [ "$HOST" = master ]; then
>    MASTER_IP=127.0.0.1
>    HOST_IP=127.0.0.1
>    HOST_N=0
> else
>    MASTER_IP=192.168.1.254
>    HOST_IP=$(hostname -i)
>    HOST_N=${HOST_IP##*.}   # node number = last octet of the node's IP
> fi
>
> LOCAL=sc$HOST_N
>
> ###################################################################
> # write /usr/local/etc/glusterfs-server.vol
> {
>
> cat <<EOF
> ###
> ### Server config automatically created by $PWD/$0
> ###
>
> EOF
>
> if [ "$HOST" = master ]; then
>    SERVERVOLUMES="scns"
>    cat <<EOF
> volume scns
>  type storage/posix
>  option directory /export/scratch_ns
> end-volume
>
> EOF
> else # if master
>    SERVERVOLUMES=""
> fi   # if master
>
> SERVERVOLUMES="$SERVERVOLUMES $LOCAL"
> cat <<EOF
> volume $LOCAL-posix
>  type storage/posix
>  option directory /export/scratch
> end-volume
>
> volume $LOCAL-locks
>  type features/posix-locks
>  subvolumes $LOCAL-posix
> end-volume
>
> volume $LOCAL-ioth
>  type performance/io-threads
>  option thread-count 4
>  subvolumes $LOCAL-locks
> end-volume
>
> volume $LOCAL
>  type performance/read-ahead
>  subvolumes $LOCAL-ioth
> end-volume
>
> volume server
>  type protocol/server
>  option transport-type tcp/server
>  subvolumes $SERVERVOLUMES
> EOF
>
> for vol in $SERVERVOLUMES;do
>    cat <<EOF
>  option auth.addr.$vol.allow 127.0.0.1,192.168.1.*
> EOF
> done
>
> cat <<EOF
> end-volume
>
> EOF
>
> } > /usr/local/etc/glusterfs-server.vol
>
> ###################################################################
> # write /usr/local/etc/glusterfs-client.vol
> {
> cat <<EOF
> ###
> ### Client config automatically created by $PWD/$0
> ###
>
> volume scns
>  type protocol/client
>  option transport-type tcp/client
>  option remote-host $MASTER_IP
>  option remote-subvolume scns
> end-volume
>
> volume sc0
>  type protocol/client
>  option transport-type tcp/client
>  option remote-host $MASTER_IP
>  option remote-subvolume sc0
> end-volume
>
> EOF
>
> UNIFY="sc0"
>
> # leave out node66 at the moment...
>
> for n in $(seq 65) $(seq 67 87);do
>    VOL=sc$n
>    UNIFY="$UNIFY $VOL"
>        cat <<EOF
> volume $VOL
>  type protocol/client
>  option transport-type tcp/client
>  option remote-host 192.168.1.$n
>  option remote-subvolume $VOL
> end-volume
>
> EOF
> done
>
> cat <<EOF
> volume scratch
>  type cluster/unify
>  subvolumes $UNIFY
>  option namespace scns
>  option scheduler nufa
>  option nufa.limits.min-free-disk 15
>  option nufa.refresh-interval 10
>  option nufa.local-volume-name $LOCAL
> end-volume
>
> volume scratch-io-threads
>  type performance/io-threads
>  option thread-count 4
>  subvolumes scratch
> end-volume
>
> volume scratch-write-behind
>  type performance/write-behind
>  option aggregate-size 128kB
>  option flush-behind off
>  subvolumes scratch-io-threads
> end-volume
>
> volume scratch-read-ahead
>  type performance/read-ahead
>  option page-size 128kB # unit in bytes
>  option page-count 2    # cache per file  = (page-count x page-size)
>  subvolumes scratch-write-behind
> end-volume
>
> volume scratch-io-cache
>  type performance/io-cache
>  option cache-size 64MB
>  option page-size 512kB
>  subvolumes scratch-read-ahead
> end-volume
>
> EOF
>
> } > /usr/local/etc/glusterfs-client.vol
> ---------------------------------- 8< snip >8 ----------------------------------
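>
> The two generated volfiles are then consumed in the usual way when the
> nodes boot; roughly like this (a sketch, flag spellings may differ
> between the 1.3.x and 1.4 command lines):
>
> glusterfsd -f /usr/local/etc/glusterfs-server.vol
> glusterfs  -f /usr/local/etc/glusterfs-client.vol /scratch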
>
> The cluster uses MPI over InfiniBand, while GlusterFS runs over TCP/IP
> on Gigabit Ethernet. I use FUSE 2.7.4 with the patch
> fuse-2.7.3glfs10.diff (is that OK? The patch applied cleanly).
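>
> For completeness, the patched FUSE was set up along these lines (a
> sketch, not an exact transcript; the -p level and build options are
> assumptions):
>
> cd fuse-2.7.4
> patch -p1 < ../fuse-2.7.3glfs10.diff   # glfs patch, originally against 2.7.3
> ./configure && make && make install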
>
> Everything is fine until some of the nodes used by a job block on
> access to /scratch or, some time later, report
>
> df: `/scratch': Transport endpoint is not connected
>
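> Once a node is in that state, recovering /scratch means recycling the
> FUSE mount; a minimal sketch, assuming the client volfile path from
> the script above (exact flags may vary with the FUSE/GlusterFS
> versions):
>
> fusermount -u /scratch || umount -l /scratch     # drop the dead FUSE mount
> glusterfs -f /usr/local/etc/glusterfs-client.vol /scratch
>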
> The glusterfs.log on node36 is flooded by
>
> 2008-11-25 07:30:35 E [client-protocol.c:243:call_bail] sc70: activating bail-out. pending frames = 3. last sent = 2008-11-25 07:29:52. last received = 2008-11-25 07:29:49. transport-timeout = 42
> 2008-11-25 07:30:35 C [client-protocol.c:250:call_bail] sc70: bailing transport
> ...(~100MB)
>
> (~2 lines per node every 10 seconds). Furthermore, at the end of
> glusterfs.log I find:
>
> grep -v call_bail glusterfs.log
> ...
> 2008-11-25 10:00:46 E [socket.c:1187:socket_submit] sc0: transport not connected to submit (priv->connected = 255)
> ...
> 2008-11-25 10:00:46 E [socket.c:1187:socket_submit] sc87: transport not connected to submit (priv->connected = 255)
> 2008-11-25 10:00:46 E [socket.c:1187:socket_submit] scns: transport not connected to submit (priv->connected = 255)
> 2008-11-25 10:05:03 E [fuse-bridge.c:1886:fuse_statfs_cbk] glusterfs-fuse: 1353: ERR => -1 (Transport endpoint is not connected)
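>
> The 42 seconds in the bail-out messages is the client protocol's
> transport-timeout; as an experiment it can be widened in each
> protocol/client block that glusterconf.sh writes, e.g. (the option
> name is taken from the log output above, please correct me if the
> spelling differs in 1.4):
>
> volume sc70
>  type protocol/client
>  option transport-type tcp/client
>  option remote-host 192.168.1.70
>  option remote-subvolume sc70
>  option transport-timeout 120   # default appears to be 42 seconds
> end-volume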
>
> On node68 I find
>
> 2008-11-24 23:20:12 W [client-protocol.c:93:this_ino_set] sc0: inode number(201326854) changed for inode(0x6130d0)
> 2008-11-24 23:20:12 W [client-protocol.c:93:this_ino_set] scns: inode number(37749030) changed for inode(0x6130d0)
> 2008-11-24 23:20:58 E [client-protocol.c:243:call_bail] scns: activating bail-out. pending frames = 3. last sent = 2008-11-24 23:20:12. last received = 2008-11-24 23:20:12. transport-timeout = 42
> 2008-11-24 23:20:58 C [client-protocol.c:250:call_bail] scns: bailing transport
> 2008-11-24 23:20:58 E [client-protocol.c:243:call_bail] sc0: activating bail-out. pending frames = 3. last sent = 2008-11-24 23:20:12. last received = 2008-11-24 23:20:12. transport-timeout = 42
> 2008-11-24 23:20:58 C [client-protocol.c:250:call_bail] sc0: bailing transport
> ...(~100MB)
>
> only for scns and sc0 and then
>
> 2008-11-25 10:01:31 E [client-protocol.c:243:call_bail] sc1: activating bail-out. pending frames = 1. last sent = 2008-11-25 10:00:46. last received = 2008-11-24 23:20:12. transport-timeout = 42
> 2008-11-25 10:01:31 C [client-protocol.c:250:call_bail] sc1: bailing transport
> ...(~100MB)
>
> for all nodes, as well as
>
> 2008-11-25 10:00:46 E [socket.c:1187:socket_submit] sc0: transport not connected to submit (priv->connected = 255)
> 2008-11-25 10:00:46 E [socket.c:1187:socket_submit] scns: transport not connected to submit (priv->connected = 255)
> 2008-11-25 11:23:18 E [socket.c:1187:socket_submit] sc1: transport not connected to submit (priv->connected = 255)
> 2008-11-25 11:23:18 E [socket.c:1187:socket_submit] sc2: transport not connected to submit (priv->connected = 255)
> ...
>
> The third affected node, node77, says:
>
> 2008-11-24 22:07:20 W [client-protocol.c:93:this_ino_set] sc0: inode number(201326854) changed for inode(0x7f97d6c0ac70)
> 2008-11-24 22:07:20 W [client-protocol.c:93:this_ino_set] scns: inode number(37749030) changed for inode(0x7f97d6c0ac70)
> 2008-11-24 22:08:07 E [client-protocol.c:243:call_bail] sc10: activating bail-out. pending frames = 7. last sent = 2008-11-24 22:07:24. last received = 2008-11-24 22:07:20. transport-timeout = 42
> 2008-11-24 22:08:07 C [client-protocol.c:250:call_bail] sc10: bailing transport
> ...(~100MB)
>
> and then
>
> 2008-11-25 10:00:46 E [socket.c:1187:socket_submit] sc0: transport not connected to submit (priv->connected = 255)
> ...
> 2008-11-25 10:00:46 E [socket.c:1187:socket_submit] sc87: transport not connected to submit (priv->connected = 255)
> 2008-11-25 10:00:46 E [socket.c:1187:socket_submit] scns: transport not connected to submit (priv->connected = 255)
>
>
> As I said, similar problems occurred with version 1.3.x. If these
> problems cannot be solved, we will have to use a different file
> system, so any help is greatly appreciated.
>
> Have fun,
>
>     Fred
>
> Dr. Fred Hucht <fred at thp.Uni-DuE.de>
> Institute for Theoretical Physics
> University of Duisburg-Essen, 47048 Duisburg, Germany
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
>
>
> -- 
> hard work often pays off after time, but laziness always pays off now

Dr. Fred Hucht <fred at thp.Uni-DuE.de>
Institute for Theoretical Physics
University of Duisburg-Essen, 47048 Duisburg, Germany
