[Gluster-users] Glusterfs gives up with endpoint not connected

Daniel Müller mueller at tropenklinik.de
Thu Mar 28 12:54:01 UTC 2013


[root at tuepdc glusterfs]# grep crash *
mnt-glusterfs.log:time of crash: 2013-03-26 16:29:03
mnt-glusterfs.log:time of crash: 2013-03-27 11:27:09
mnt-glusterfs.log:time of crash: 2013-03-27 12:27:52
mnt-glusterfs.log:time of crash: 2013-03-27 18:03:47
mnt-glusterfs.log:time of crash: 2013-03-28 08:57:07
mnt-glusterfs.log:time of crash: 2013-03-28 09:30:22
mnt-glusterfs.log:time of crash: 2013-03-28 10:47:06
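
The grep above only shows when each crash happened. The backtrace that Pranith asks for in the reply below follows each "time of crash" line. Here is a minimal Python sketch that collects every complete crash dump from mnt-glusterfs.log so it can be pasted whole. It assumes the layout seen in the excerpts of this thread: a "pending frames:" line, then the signal, configuration details and backtrace frames, closed by a line of dashes. The function names are hypothetical, and the start and end markers are inferred from this log rather than taken from GlusterFS documentation.

```python
import sys


def extract_crash_dumps(lines):
    """Group crash dumps from a GlusterFS mount log.

    Assumption inferred from the log excerpts in this thread (not from
    GlusterFS documentation): a dump starts at a "pending frames:" line and
    ends at a line made only of dashes ("---------").
    """
    dumps = []
    current = None
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith("pending frames:"):
            if current is not None:
                dumps.append(current)  # previous dump was never closed
            current = [text]
        elif current is not None:
            current.append(text)
            if text and set(text) == {"-"}:
                dumps.append(current)  # dash line closes the dump
                current = None
    if current is not None:
        dumps.append(current)  # log ended inside a dump (e.g. truncated copy)
    return dumps


def main(argv):
    if len(argv) < 2:
        print("usage: extract_crash_dumps.py MOUNT_LOG", file=sys.stderr)
        return
    # Latin-1 decoding never fails and keeps umlauts such as "ä" readable
    # if the file names were written in that encoding.
    with open(argv[1], encoding="latin-1") as fh:
        for number, dump in enumerate(extract_crash_dumps(fh), 1):
            when = next((l for l in dump if l.startswith("time of crash:")),
                        "time of crash: unknown")
            print(f"=== crash dump {number} ({when}) ===")
            print("\n".join(dump))


if __name__ == "__main__":
    main(sys.argv)
```

Pass the mount log path as the only argument. A dump that is cut off at the end of the file is still returned, so a partially written trace is not silently lost.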

=====================================================================

ambavol-replicate-0: size differs for /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-Mai2013.xls
[2013-03-26 16:28:57.650746] I [afr-common.c:735:afr_lookup_done] 0-sambavol-replicate-0: background  meta-data data self-heal triggered. path: /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-Mai2013.xls
[2013-03-26 16:29:03.808123] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-sambavol-client-1: remote operation failed: Stale NFS file handle
[2013-03-26 16:29:03.890754] I [afr-common.c:581:afr_lookup_collect_xattr] 0-sambavol-replicate-0: data self-heal is pending for /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls.
[2013-03-26 16:29:03.890807] I [afr-common.c:735:afr_lookup_done] 0-sambavol-replicate-0: background  meta-data data self-heal triggered. path: /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls
[2013-03-26 16:29:03.891570] I [client3_1-fops.c:1226:client3_1_inodelk_cbk] 0-sambavol-client-1: remote operation failed: No such file or directory
[2013-03-26 16:29:03.892425] I [client3_1-fops.c:366:client3_1_open_cbk] 0-sambavol-client-1: remote operation failed: No such file or directory
[2013-03-26 16:29:03.892445] I [afr-self-heal-data.c:1002:afr_sh_data_open_cbk] 0-sambavol-replicate-0: open of /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls failed on child sambavol-client-1 (No such file or directory)
[2013-03-26 16:29:03.892550] I [client3_1-fops.c:1226:client3_1_inodelk_cbk] 0-sambavol-client-0: remote operation failed: Invalid argument
[2013-03-26 16:29:03.892567] I [afr-lk-common.c:568:afr_unlock_inodelk_cbk] 0-sambavol-replicate-0: /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls: unlock failed Invalid argument
[2013-03-26 16:29:03.893072] I [afr-common.c:581:afr_lookup_collect_xattr] 0-sambavol-replicate-0: data self-heal is pending for /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls.
[2013-03-26 16:29:03.894570] I [afr-common.c:581:afr_lookup_collect_xattr] 0-sambavol-replicate-0: data self-heal is pending for /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls.
[2013-03-26 16:29:03.894594] W [afr-common.c:634:afr_lookup_self_heal_check] 0-sambavol-replicate-0: /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls: gfid different on subvolume
[2013-03-26 16:29:03.894610] I [afr-common.c:735:afr_lookup_done] 0-sambavol-replicate-0: background  meta-data data self-heal triggered. path: /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls
[2013-03-26 16:29:03.895996] I [afr-self-heal-common.c:537:afr_sh_mark_sources] 0-sambavol-replicate-0: split-brain possible, no source detected
[2013-03-26 16:29:03.896014] E [afr-self-heal-metadata.c:521:afr_sh_metadata_fix] 0-sambavol-replicate-0: Unable to self-heal permissions/ownership of '/windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls' (possible split-brain). Please fix the file on all backend volumes
[2013-03-26 16:29:03.896954] I [afr-self-heal-metadata.c:81:afr_sh_metadata_done] 0-sambavol-replicate-0: aborting selfheal of /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls
[2013-03-26 16:29:03.970126] I [client3_1-fops.c:366:client3_1_open_cbk] 0-sambavol-client-1: remote operation failed: No such file or directory
pending frames:
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)

patchset: v3.2.0
signal received: 11
time of crash: 2013-03-26 16:29:03
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
======================================================================================================================
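
This excerpt ends with afr_sh_metadata_fix reporting a possible split-brain and asking to fix the file on all backend volumes, just before the crash. A preceding warning says the gfid differs on a subvolume. Here is a minimal Python sketch that lists every path the mount log flags in either of these two ways, so each file can be compared on both bricks. The two regular expressions are modelled on the 3.2.0 message wording quoted in this thread and are assumptions. Other GlusterFS releases may word these messages differently, and the function names are hypothetical.

```python
import re
import sys

# Patterns copied from the GlusterFS 3.2.0 mount-log messages quoted in this
# thread; other releases may word these messages differently.
SPLIT_BRAIN_RE = re.compile(
    r"Unable to self-heal \S+ of '(?P<path>[^']+)' \(possible split-brain\)"
)
GFID_MISMATCH_RE = re.compile(
    r"\] \S+: (?P<path>/.+): gfid different on subvolume"
)
PATTERNS = (
    (SPLIT_BRAIN_RE, "possible split-brain"),
    (GFID_MISMATCH_RE, "gfid different"),
)


def flagged_paths(lines):
    """Map each flagged file path to the set of reasons it was flagged."""
    found = {}
    for line in lines:
        for regex, reason in PATTERNS:
            match = regex.search(line)
            if match:
                found.setdefault(match.group("path"), set()).add(reason)
    return found


def main(argv):
    if len(argv) < 2:
        print("usage: flagged_paths.py MOUNT_LOG", file=sys.stderr)
        return
    # Latin-1 never fails to decode, so odd bytes in file names do not abort.
    with open(argv[1], encoding="latin-1") as fh:
        for path, reasons in sorted(flagged_paths(fh).items()):
            print(f"{path}\t{', '.join(sorted(reasons))}")


if __name__ == "__main__":
    main(sys.argv)
```

The paths are relative to the volume root. Each one would still have to be inspected on both bricks listed in the volume info (192.168.130.199 and 192.168.130.200, under /raid5hs/glusterfs/export), as the error message itself asks.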

/winuser/steimle/Buha/Stundennachweis/Stundennachweis 2013/Stundennachweis.xls' (possible split-brain). Please fix the file on all backend volumes
[2013-03-27 11:27:09.431579] I [afr-self-heal-metadata.c:81:afr_sh_metadata_done] 0-sambavol-replicate-0: aborting selfheal of /windows/winuser/steimle/Buha/Stundennachweis/Stundennachweis 2013/Stundennachweis.xls
[2013-03-27 11:27:09.432480] I [client3_1-fops.c:366:client3_1_open_cbk] 0-sambavol-client-1: remote operation failed: No such file or directory
pending frames:
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)

patchset: v3.2.0
signal received: 11
time of crash: 2013-03-27 11:27:09
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.2.0
/lib64/libc.so.6[0x30c0a302d0]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/performance/io-cache.so(ioc_open_cbk+0x9b)[0x2aaaaba4d7fb]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/performance/read-ahead.so(ra_open_cbk+0x205)[0x2aaaab842935]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/performance/write-behind.so(wb_open_cbk+0xf4)[0x2aaaab632784]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/cluster/replicate.so(afr_open_cbk+0x232)[0x2aaaab3f8a32]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/protocol/client.so(client3_1_open_cbk+0x19f)[0x2aaaab1bfdaf]
/opt/glusterfs/3.2.0/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2)[0x2b7efe01b3d2]
/opt/glusterfs/3.2.0/lib64/libgfrpc.so.0(rpc_clnt_notify+0x8d)[0x2b7efe01b5cd]
/opt/glusterfs/3.2.0/lib64/libgfrpc.so.0(rpc_transport_notify+0x27)[0x2b7efe0162e7]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x2aaaaad705af]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/rpc-transport/socket.so(socket_event_handler+0x188)[0x2aaaaad70758]
/opt/glusterfs/3.2.0/lib64/libglusterfs.so.0[0x2b7efdddb811]
/opt/glusterfs/3.2.0/sbin/glusterfs(main+0x407)[0x405577]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x30c0a1d994]
/opt/glusterfs/3.2.0/sbin/glusterfs[0x4036f9]
---------
[2013-03-27 12:14:26.544930] W [write-behind.c:3023:init] 0-sambavol-write-behind: disabling write-behind for first 0 bytes
[2013-03-27 12:14:26.544980] I [client.c:1987:build_client_config] 0-sambavol-client-1: setting ping-timeout to 5
[2013-03-27 12:14:26.547225] I [client.c:1987:build_client_config] 0-sambavol-client-0: setting ping-timeout to 5
[2013-03-27 12:14:26.549258] I [client.c:1935:notify] 0-sambavol-client-0: parent translators are ready, attempting connect on transport
[2013-03-27 12:14:26.553418] I [client.c:1935:notify] 0-sambavol-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume sambavol-client-0
  2:     type protocol/client
  3:     option remote-host 192.168.130.199
  4:     option remote-subvolume /raid5hs/glusterfs/export
  5:     option transport-type tcp

================================================================================================================
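
The backtrace in the excerpt above appears to show the segfault (signal 11) being raised inside io-cache's ioc_open_cbk. At that point an open reply is travelling back up from protocol/client through replicate, write-behind and read-ahead. glibc-style backtraces list the innermost frame first. Here is a minimal Python sketch that pulls this translator chain out of a longer trace. The frame regex is modelled only on the lines shown here and is an assumption. The function names are hypothetical, and the sample frames are copied from the log above.

```python
import os
import re

# A backtrace frame as printed in the crash dump, for example:
#   /opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/performance/io-cache.so(ioc_open_cbk+0x9b)[0x2aaaaba4d7fb]
#   /lib64/libc.so.6[0x30c0a302d0]   (no symbol name available)
FRAME_RE = re.compile(
    r"^(?P<module>/\S+?)"                           # shared object or binary
    r"(?:\((?P<func>[^+)]*)(?:\+0x[0-9a-f]+)?\))?"  # optional (symbol+offset)
    r"\[0x[0-9a-f]+\]$"                             # return address
)


def parse_backtrace(lines):
    """Return (module_path, function) tuples in the order they were printed.

    glibc's backtrace() lists the innermost frame first. Frames without a
    symbol name get the function "?".
    """
    frames = []
    for line in lines:
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("module"), match.group("func") or "?"))
    return frames


def translator_frames(frames):
    """Keep frames from translator libraries (paths containing /xlator/)."""
    return [
        (os.path.splitext(os.path.basename(module))[0], func)
        for module, func in frames
        if "/xlator/" in module
    ]


if __name__ == "__main__":
    trace = [
        "/lib64/libc.so.6[0x30c0a302d0]",
        "/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/performance/io-cache.so(ioc_open_cbk+0x9b)[0x2aaaaba4d7fb]",
        "/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/performance/read-ahead.so(ra_open_cbk+0x205)[0x2aaaab842935]",
        "/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/cluster/replicate.so(afr_open_cbk+0x232)[0x2aaaab3f8a32]",
        "/opt/glusterfs/3.2.0/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2)[0x2b7efe01b3d2]",
    ]
    for name, func in translator_frames(parse_backtrace(trace)):
        print(f"{name}: {func}")
```

With these sample frames it prints "io-cache: ioc_open_cbk", "read-ahead: ra_open_cbk" and "replicate: afr_open_cbk", one per line. The libc and libgfrpc frames are dropped because they do not live under an xlator/ directory.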

Should I look for anything more specific?




-----------------------------------------------
Daniel Müller, IT

Head of IT
Tropenklinik Paul-Lechler-Krankenhaus
Paul-Lechler-Str. 24
72076 Tübingen

Tel.: 07071/206-463, Fax: 07071/206-499
eMail: mueller at tropenklinik.de
Internet: www.tropenklinik.de
-----------------------------------------------

-----Original Message-----
From: Pranith Kumar K [mailto:pkarampu at redhat.com] 
Sent: Thursday, 28 March 2013 12:34
To: mueller at tropenklinik.de
Cc: gluster-users at gluster.org; Reinhard Marstaller
Subject: Re: [Gluster-users] Glusterfs gives up with endpoint not connected

On 03/28/2013 03:48 PM, Daniel Müller wrote:
> Dear all,
>
> Out of the blue, glusterfs no longer works reliably. Every now and
> then it stops working, reports "Endpoint not connected", and writes
> core files:
>
> [root at tuepdc /]# file core.15288
> core.15288: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), 
> SVR4-style, from 'glusterfs'
>
> My version:
> [root at tuepdc /]# glusterfs --version
> glusterfs 3.2.0 built on Apr 22 2011 18:35:40
> Repository revision: v3.2.0
> Copyright (c) 2006-2010 Gluster Inc. <http://www.gluster.com>
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU Affero General Public License.
>
> My /var/log/glusterfs/bricks/raid5hs-glusterfs-export.log:
>
> [2013-03-28 10:47:07.243980] I [server.c:438:server_rpc_notify] 0-sambavol-server: disconnected connection from 192.168.130.199:1023
> [2013-03-28 10:47:07.244000] I [server-helpers.c:783:server_connection_destroy] 0-sambavol-server: destroyed connection of tuepdc.local-16600-2013/03/28-09:32:28:258428-sambavol-client-0
>
>
> [root at tuepdc bricks]# gluster volume info
>
> Volume Name: sambavol
> Type: Replicate
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.130.199:/raid5hs/glusterfs/export
> Brick2: 192.168.130.200:/raid5hs/glusterfs/export
> Options Reconfigured:
> network.ping-timeout: 5
> performance.quick-read: on
>
> Gluster is running on ext3 on RAID 5 with a hot spare on both hosts:
> [root at tuepdc bricks]# mdadm --detail /dev/md0
> /dev/md0:
>          Version : 0.90
>    Creation Time : Wed May 11 10:08:30 2011
>       Raid Level : raid5
>       Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
>    Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>     Raid Devices : 3
>    Total Devices : 4
> Preferred Minor : 0
>      Persistence : Superblock is persistent
>
>      Update Time : Thu Mar 28 11:13:21 2013
>            State : clean
>   Active Devices : 3
> Working Devices : 4
>   Failed Devices : 0
>    Spare Devices : 1
>
>           Layout : left-symmetric
>       Chunk Size : 64K
>
>             UUID : c484e093:018a2517:56e38f5e:1a216491
>           Events : 0.250
>
>      Number   Major   Minor   RaidDevice State
>         0       8       49        0      active sync   /dev/sdd1
>         1       8       65        1      active sync   /dev/sde1
>         2       8       97        2      active sync   /dev/sdg1
>
>         3       8       81        -      spare   /dev/sdf1
>
> [root at tuepdc glusterfs]# tail -f  mnt-glusterfs.log
> [2013-03-28 10:57:40.882566] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-sambavol-client-0: changing port to 24009 (from 0)
> [2013-03-28 10:57:40.883636] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-sambavol-client-1: changing port to 24009 (from 0)
> [2013-03-28 10:57:44.806649] I [client-handshake.c:1080:select_server_supported_programs] 0-sambavol-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
> [2013-03-28 10:57:44.806857] I [client-handshake.c:913:client_setvolume_cbk] 0-sambavol-client-0: Connected to 192.168.130.199:24009, attached to remote volume '/raid5hs/glusterfs/export'.
> [2013-03-28 10:57:44.806876] I [afr-common.c:2514:afr_notify] 0-sambavol-replicate-0: Subvolume 'sambavol-client-0' came back up; going online.
> [2013-03-28 10:57:44.811557] I [fuse-bridge.c:3316:fuse_graph_setup] 0-fuse: switched to graph 0
> [2013-03-28 10:57:44.811773] I [fuse-bridge.c:2897:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.10
> [2013-03-28 10:57:44.812139] I [afr-common.c:836:afr_fresh_lookup_cbk] 0-sambavol-replicate-0: added root inode
> [2013-03-28 10:57:44.812217] I [client-handshake.c:1080:select_server_supported_programs] 0-sambavol-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
> [2013-03-28 10:57:44.812767] I [client-handshake.c:913:client_setvolume_cbk] 0-sambavol-client-1: Connected to 192.168.130.200:24009, attached to remote volume '/raid5hs/glusterfs/export'.
>
>
>
>
> How can I fix this issue?
>
> Daniel
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
Could you paste the traceback that is printed in the mount's log file
for that crash?

Search for "crash" in the logs; you will see the trace after it. Paste
that here.

Pranith.



