[Gluster-users] Crash in glusterfsd 3.4.0 beta1 and "Transport endpoint is not connected"

Alessandro De Salvo Alessandro.DeSalvo at roma1.infn.it
Thu May 23 15:36:37 UTC 2013


Hi,
after digging a little more, I see this with gdb:

Core was generated by `/usr/sbin/glusterfs --volfile-id=/adsroma1-gluster-backup --volfile-server=pc-a'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fe93f1341cf in ioc_open_cbk (frame=0x7fe9444d6434, cookie=<optimized out>, this=0x8873a0, op_ret=0, op_errno=117, fd=0x8e3a3c, xdata=0x0) at io-cache.c:554
554                     ioc_table_lock (ioc_inode->table);
(gdb) list
549
550             if (op_ret != -1) {
551                     inode_ctx_get (fd->inode, this, &tmp_ioc_inode);
552                     ioc_inode = (ioc_inode_t *)(long)tmp_ioc_inode;
553
554                     ioc_table_lock (ioc_inode->table);
555                     {
556                             list_move_tail (&ioc_inode->inode_lru,
557                                             &table->inode_lru[ioc_inode->weight]);
558                     }
(gdb) print (ioc_inode_t *)(long)tmp_ioc_inode
$4 = (ioc_inode_t *) 0x0

So it seems the inode context is NULL in my case, hence the crash.
This looks closely related to this code change:

https://forge.gluster.org/glusterfs-core/glusterfs/commit/b6e10801bee030fe7fcd1ec3bfac947ce44d023d/diffs?diffmode=sidebyside&fragment=1

So, will a patch for this issue be released in the next beta?
Can we just rebuild gluster with the above patch included to make it work? Still, I do not understand why the inode context is NULL in the first place.
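In the meantime, a guard along these lines in ioc_open_cbk should at least avoid the dereference (just a sketch of mine against the 3.4.0beta1 sources, not an official fix; upstream may well handle the missing inode context differently):

        if (op_ret != -1) {
                inode_ctx_get (fd->inode, this, &tmp_ioc_inode);
                ioc_inode = (ioc_inode_t *)(long)tmp_ioc_inode;

                /* inode_ctx_get() leaves tmp_ioc_inode at 0 when no
                 * context was ever set for this inode, which is exactly
                 * what the core above shows, so bail out instead of
                 * dereferencing a NULL ioc_inode */
                if (ioc_inode != NULL) {
                        ioc_table_lock (ioc_inode->table);
                        {
                                list_move_tail (&ioc_inode->inode_lru,
                                                &table->inode_lru[ioc_inode->weight]);
                        }
                        ioc_table_unlock (ioc_inode->table);
                }
        }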
Thanks,

	Alessandro

On 23 May 2013, at 11:07, Daniel Müller wrote:

> I had the same problem with gluster 3.2,
> syncing two bricks.
> Look at your glusterfs-export.log, mount-glusterfs.log, or similar.
> In my case the problem was caused by some files like this one:
> -->  [2013-04-25 12:36:19.127124] E
> [afr-self-heal-metadata.c:521:afr_sh_metadata_fix] 0-sambavol-replicate-0:
> Unable to self-heal permissions/ownership of
> '/windows/winuser/xxxxx/xxx/xxx/xxx 2013/xxx.xls' (possible split-brain).
> Please fix the file on all backend volumes
> 
> After removing these files, everything was up and running again.
> 
> Good Luck
> Daniel
> 
> -----------------------------------------------
> EDV Daniel Müller
> 
> Head of IT
> Tropenklinik Paul-Lechler-Krankenhaus
> Paul-Lechler-Str. 24
> 72076 Tübingen
> 
> Tel.: 07071/206-463, Fax: 07071/206-499
> eMail: mueller at tropenklinik.de
> Internet: www.tropenklinik.de
> -----------------------------------------------
> -----Original Message-----
> From: gluster-users-bounces at gluster.org
> [mailto:gluster-users-bounces at gluster.org] On Behalf Of Alessandro De
> Salvo
> Sent: Thursday, 23 May 2013 10:18
> To: gluster-users at gluster.org
> Subject: [Gluster-users] Crash in glusterfsd 3.4.0 beta1 and "Transport
> endpoint is not connected"
> 
> Hi,
> I have a replicated volume between two Fedora 18 machines using glusterfs
> 3.4.0 beta1 from rawhide. All is fine with glusterd, and the replication is
> performed correctly, but every time I try to access any file from the fuse
> mounts I see the following errors in /var/log/glusterfs/<mountpoint>.log,
> leading to "Transport endpoint is not connected", so the filesystems get
> unmounted:
> 
> [2013-05-23 08:06:24.302332] I [afr-common.c:3709:afr_notify] 0-adsroma1-gluster-data01-replicate-0: Subvolume 'adsroma1-gluster-data01-client-1' came back up; going online.
> [2013-05-23 08:06:24.302706] I [client-handshake.c:450:client_set_lk_version_cbk] 0-adsroma1-gluster-data01-client-1: Server lk version = 1
> [2013-05-23 08:06:24.316318] I [client-handshake.c:1658:select_server_supported_programs] 0-adsroma1-gluster-data01-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2013-05-23 08:06:24.336718] I [client-handshake.c:1456:client_setvolume_cbk] 0-adsroma1-gluster-data01-client-0: Connected to 127.0.0.1:49157, attached to remote volume '/gluster/data01/files'.
> [2013-05-23 08:06:24.336732] I [client-handshake.c:1468:client_setvolume_cbk] 0-adsroma1-gluster-data01-client-0: Server and Client lk-version numbers are not same, reopening the fds
> [2013-05-23 08:06:24.344178] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: switched to graph 0
> [2013-05-23 08:06:24.344372] I [client-handshake.c:450:client_set_lk_version_cbk] 0-adsroma1-gluster-data01-client-0: Server lk version = 1
> [2013-05-23 08:06:24.344502] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
> [2013-05-23 08:06:24.345008] I [afr-common.c:2059:afr_set_root_inode_on_first_lookup] 0-adsroma1-gluster-data01-replicate-0: added root inode
> [2013-05-23 08:06:24.345240] I [afr-common.c:2122:afr_discovery_cbk] 0-adsroma1-gluster-data01-replicate-0: selecting local read_child adsroma1-gluster-data01-client-0
> 
> 
> 
> 
> pending frames:
> frame : type(1) op(READ)
> frame : type(1) op(OPEN)
> frame : type(0) op(0)
> 
> patchset: git://git.gluster.com/glusterfs.git
> signal received: 11
> time of crash: 2013-05-23 08:08:20
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> fdatasync 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.4.0beta1
> /usr/lib64/libc.so.6[0x3c51035b50]
> /usr/lib64/glusterfs/3.4.0beta1/xlator/performance/io-cache.so(ioc_open_cbk+0x8b)[0x7fb93cd2bc4b]
> /usr/lib64/glusterfs/3.4.0beta1/xlator/performance/read-ahead.so(ra_open_cbk+0x1c1)[0x7fb93cf3a951]
> /usr/lib64/glusterfs/3.4.0beta1/xlator/cluster/distribute.so(dht_open_cbk+0xe0)[0x7fb93d37f890]
> /usr/lib64/glusterfs/3.4.0beta1/xlator/cluster/replicate.so(afr_open_cbk+0x29c)[0x7fb93d5bf60c]
> /usr/lib64/glusterfs/3.4.0beta1/xlator/protocol/client.so(client3_3_open_cbk+0x174)[0x7fb93d82f5c4]
> /usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x3c5300e880]
> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x101)[0x3c5300ea81]
> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x3c5300b0d3]
> /usr/lib64/glusterfs/3.4.0beta1/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7fb93eefa6a4]
> /usr/lib64/glusterfs/3.4.0beta1/rpc-transport/socket.so(socket_event_handler+0x11c)[0x7fb93eefa9dc]
> /usr/lib64/libglusterfs.so.0[0x3c5285923b]
> /usr/sbin/glusterfs(main+0x3a4)[0x4049d4]
> /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x3c51021c35]
> /usr/sbin/glusterfs[0x404d49]
> ---------
> 
> 
> The volume is defined as follows:
> 
> Volume Name: adsroma1-gluster-data01
> Type: Replicate
> Volume ID: 1ca608c7-8a9d-4d8c-ac05-fabc2d2c2565
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: pc-ads-02.roma1.infn.it:/gluster/data01/files
> Brick2: pc-ads-03.roma1.infn.it:/gluster/data01/files
> 
> Is it a known problem with this beta version?
> Any hint?
> Thanks,
> 
> 	Alessandro
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users