[Gluster-users] Glusterfs gives up with endpoint not connected

Daniel Müller mueller at tropenklinik.de
Tue Apr 9 13:43:51 UTC 2013


After some testing I found out that my rsync backup job was causing the
'endpoint not connected' error.
After stopping the cron job, everything seemed to be OK.
Is there a way to take backup snapshots from the volumes by gluster itself?
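For what it's worth, glusterfs 3.2 has no built-in volume snapshots (those
came in later releases), but it does ship geo-replication, which can mirror
a volume to a remote directory and could stand in for the rsync cron job.
A minimal sketch; the backup host and target path below are placeholders,
not taken from this setup:

# start mirroring the volume to a remote directory (placeholder target):
gluster volume geo-replication sambavol backuphost:/data/backup/sambavol start
# check the session state before relying on it as a backup:
gluster volume geo-replication sambavol backuphost:/data/backup/sambavol status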



-----------------------------------------------
EDV Daniel Müller

Leitung EDV
Tropenklinik Paul-Lechler-Krankenhaus
Paul-Lechler-Str. 24
72076 Tübingen

Tel.: 07071/206-463, Fax: 07071/206-499
eMail: mueller at tropenklinik.de
Internet: www.tropenklinik.de
-----------------------------------------------
-----Original Message-----
From: Daniel Müller [mailto:mueller at tropenklinik.de]
Sent: Thursday, March 28, 2013 14:17
To: 'mueller at tropenklinik.de'; 'Pranith Kumar K'
Cc: 'Reinhard Marstaller'; 'gluster-users at gluster.org'
Subject: RE: [Gluster-users] Glusterfs gives up with endpoint not connected

The third part, the output of /var/log/messages concerning the RAID5 HS array:

[root@tuepdc /]# tail -f /var/log/messages
Mar 28 13:21:32 tuepdc kernel: SCSI device sdd: drive cache: write back
Mar 28 13:21:32 tuepdc kernel: SCSI device sde: 1953525168 512-byte hdwr sectors (1000205 MB)
Mar 28 13:21:32 tuepdc kernel: sde: Write Protect is off
Mar 28 13:21:32 tuepdc kernel: SCSI device sde: drive cache: write back
Mar 28 13:21:32 tuepdc kernel: SCSI device sdf: 1953525168 512-byte hdwr sectors (1000205 MB)
Mar 28 13:21:32 tuepdc kernel: sdf: Write Protect is off
Mar 28 13:21:32 tuepdc kernel: SCSI device sdf: drive cache: write back
Mar 28 13:21:32 tuepdc kernel: SCSI device sdg: 1953525168 512-byte hdwr sectors (1000205 MB)
Mar 28 13:21:32 tuepdc kernel: sdg: Write Protect is off
Mar 28 13:21:32 tuepdc kernel: SCSI device sdg: drive cache: write back

-----------------------------------------------
EDV Daniel Müller

Leitung EDV
Tropenklinik Paul-Lechler-Krankenhaus
Paul-Lechler-Str. 24
72076 Tübingen

Tel.: 07071/206-463, Fax: 07071/206-499
eMail: mueller at tropenklinik.de
Internet: www.tropenklinik.de
-----------------------------------------------

-----Original Message-----
From: gluster-users-bounces at gluster.org
[mailto:gluster-users-bounces at gluster.org] On Behalf Of Daniel Müller
Sent: Thursday, March 28, 2013 13:57
To: 'Pranith Kumar K'
Cc: 'Reinhard Marstaller'; gluster-users at gluster.org
Subject: Re: [Gluster-users] Glusterfs gives up with endpoint not connected

Now part two of raid5hs-glusterfs-export.log:

attr (utimes) on /raid5hs/glusterfs/export/windows/winuser/schneider/schneider/Verwaltung/baummaßnahmen/Bauvorhaben Umsetzung/Parkierung/SKIZZE_TPLK_Lageplan.pdf failed: Read-only file system
pending frames:

patchset: v3.2.0
signal received: 11
time of crash: 2013-03-25 22:50:46
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.2.0
/lib64/libc.so.6[0x30c0a302d0]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/features/marker.so(marker_setattr_cbk+0x139)[0x2aaaaba9de79]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/performance/io-threads.so(iot_setattr_cbk+0x88)[0x2aaaab88d718]
/opt/glusterfs/3.2.0/lib64/libglusterfs.so.0(default_setattr_cbk+0x88)[0x2b1a834a5f28]
/opt/glusterfs/3.2.0/lib64/libglusterfs.so.0(default_setattr_cbk+0x88)[0x2b1a834a5f28]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/storage/posix.so(posix_setattr+0x1fc)[0x2aaaab2560bc]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/features/access-control.so(ac_setattr_resume+0xe9)[0x2aaaab469039]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/features/access-control.so(ac_setattr+0x49)[0x2aaaab46a979]
/opt/glusterfs/3.2.0/lib64/libglusterfs.so.0(default_setattr+0xe9)[0x2b1a8349f659]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/performance/io-threads.so(iot_setattr_wrapper+0xe9)[0x2aaaab890749]
/opt/glusterfs/3.2.0/lib64/libglusterfs.so.0(call_resume+0xd81)[0x2b1a834b0191]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/performance/io-threads.so(iot_worker+0x119)[0x2aaaab894229]
/lib64/libpthread.so.0[0x30c160673d]
/lib64/libc.so.6(clone+0x6d)[0x30c0ad44bd]
---------
[2013-03-26 08:04:48.577056] W [socket.c:419:__socket_keepalive] 0-socket: failed to set keep idle on socket 8
[2013-03-26 08:04:48.613068] W [socket.c:1846:socket_server_event_handler] 0-socket.glusterfsd: Failed to set keep-alive: Operation not supported
[2013-03-26 08:04:49.187484] W [graph.c:274:gf_add_cmdline_options] 0-sambavol-server: adding option 'listen-port' for volume 'sambavol-server' with value '24009'
[2013-03-26 08:04:49.253395] W [rpc-transport.c:447:validate_volume_options] 0-tcp.sambavol-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2013-03-26 08:04:49.287651] E [posix.c:4369:init] 0-sambavol-posix: Directory '/raid5hs/glusterfs/export' doesn't exist, exiting.
[2013-03-26 08:04:49.287709] E [xlator.c:1390:xlator_init] 0-sambavol-posix: Initialization of volume 'sambavol-posix' failed, review your volfile again
[2013-03-26 08:04:49.287721] E [graph.c:331:glusterfs_graph_init] 0-sambavol-posix: initializing translator failed
[2013-03-26 08:04:49.287731] E [graph.c:503:glusterfs_graph_activate] 0-graph: init failed
[2013-03-26 08:04:49.287982] W [glusterfsd.c:700:cleanup_and_exit] (-->/opt/glusterfs/3.2.0/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2) [0x2b38e21d63d2] (--:
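
Taken together, the two failures above point at the brick filesystem rather
than at glusterfs itself: first setattr fails with 'Read-only file system',
then after a restart the brick exits because '/raid5hs/glusterfs/export' is
missing, which is what you would see if the ext3 filesystem had remounted
read-only or was no longer mounted. A quick sanity check on the brick host
(a sketch; paths are taken from the log above):

grep raid5hs /proc/mounts           # the ext3 mount should be listed with "rw"
ls -ld /raid5hs/glusterfs/export    # the brick directory must exist
touch /raid5hs/glusterfs/export/.rw-test && rm /raid5hs/glusterfs/export/.rw-test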

-----------------------------------------------
EDV Daniel Müller

Leitung EDV
Tropenklinik Paul-Lechler-Krankenhaus
Paul-Lechler-Str. 24
72076 Tübingen

Tel.: 07071/206-463, Fax: 07071/206-499
eMail: mueller at tropenklinik.de
Internet: www.tropenklinik.de
-----------------------------------------------

-----Original Message-----
From: Pranith Kumar K [mailto:pkarampu at redhat.com]
Sent: Thursday, March 28, 2013 12:34
To: mueller at tropenklinik.de
Cc: gluster-users at gluster.org; Reinhard Marstaller
Subject: Re: [Gluster-users] Glusterfs gives up with endpoint not connected

On 03/28/2013 03:48 PM, Daniel Müller wrote:
> Dear all,
>
> Right out of the blue, glusterfs is not working fine any more; every now
> and then it stops working, telling me 'Endpoint not connected' and
> writing core files:
>
> [root@tuepdc /]# file core.15288
> core.15288: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), 
> SVR4-style, from 'glusterfs'
>
> My Version:
> [root@tuepdc /]# glusterfs --version
> glusterfs 3.2.0 built on Apr 22 2011 18:35:40
> Repository revision: v3.2.0
> Copyright (c) 2006-2010 Gluster Inc. <http://www.gluster.com>
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU
> Affero General Public License.
>
> My /var/log/glusterfs/bricks/raid5hs-glusterfs-export.log
>
> [2013-03-28 10:47:07.243980] I [server.c:438:server_rpc_notify] 0-sambavol-server: disconnected connection from 192.168.130.199:1023
> [2013-03-28 10:47:07.244000] I [server-helpers.c:783:server_connection_destroy] 0-sambavol-server: destroyed connection of tuepdc.local-16600-2013/03/28-09:32:28:258428-sambavol-client-0
>
>
> [root@tuepdc bricks]# gluster volume info
>
> Volume Name: sambavol
> Type: Replicate
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.130.199:/raid5hs/glusterfs/export
> Brick2: 192.168.130.200:/raid5hs/glusterfs/export
> Options Reconfigured:
> network.ping-timeout: 5
> performance.quick-read: on
>
> Gluster is running on an ext3 RAID5 HS array on both hosts.
> [root@tuepdc bricks]# mdadm --detail /dev/md0
> /dev/md0:
>          Version : 0.90
>    Creation Time : Wed May 11 10:08:30 2011
>       Raid Level : raid5
>       Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
>    Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>     Raid Devices : 3
>    Total Devices : 4
> Preferred Minor : 0
>      Persistence : Superblock is persistent
>
>      Update Time : Thu Mar 28 11:13:21 2013
>            State : clean
>   Active Devices : 3
> Working Devices : 4
>   Failed Devices : 0
>    Spare Devices : 1
>
>           Layout : left-symmetric
>       Chunk Size : 64K
>
>             UUID : c484e093:018a2517:56e38f5e:1a216491
>           Events : 0.250
>
>      Number   Major   Minor   RaidDevice State
>         0       8       49        0      active sync   /dev/sdd1
>         1       8       65        1      active sync   /dev/sde1
>         2       8       97        2      active sync   /dev/sdg1
>
>         3       8       81        -      spare   /dev/sdf1
>
> [root@tuepdc glusterfs]# tail -f mnt-glusterfs.log
> [2013-03-28 10:57:40.882566] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-sambavol-client-0: changing port to 24009 (from 0)
> [2013-03-28 10:57:40.883636] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-sambavol-client-1: changing port to 24009 (from 0)
> [2013-03-28 10:57:44.806649] I [client-handshake.c:1080:select_server_supported_programs] 0-sambavol-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
> [2013-03-28 10:57:44.806857] I [client-handshake.c:913:client_setvolume_cbk] 0-sambavol-client-0: Connected to 192.168.130.199:24009, attached to remote volume '/raid5hs/glusterfs/export'.
> [2013-03-28 10:57:44.806876] I [afr-common.c:2514:afr_notify] 0-sambavol-replicate-0: Subvolume 'sambavol-client-0' came back up; going online.
> [2013-03-28 10:57:44.811557] I [fuse-bridge.c:3316:fuse_graph_setup] 0-fuse: switched to graph 0
> [2013-03-28 10:57:44.811773] I [fuse-bridge.c:2897:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.10
> [2013-03-28 10:57:44.812139] I [afr-common.c:836:afr_fresh_lookup_cbk] 0-sambavol-replicate-0: added root inode
> [2013-03-28 10:57:44.812217] I [client-handshake.c:1080:select_server_supported_programs] 0-sambavol-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
> [2013-03-28 10:57:44.812767] I [client-handshake.c:913:client_setvolume_cbk] 0-sambavol-client-1: Connected to 192.168.130.200:24009, attached to remote volume '/raid5hs/glusterfs/export'.
>
>
>
>
> How can I fix this issue?
>
> Daniel
>
> -----------------------------------------------
> EDV Daniel Müller
>
> Leitung EDV
> Tropenklinik Paul-Lechler-Krankenhaus
> Paul-Lechler-Str. 24
> 72076 Tübingen
>
> Tel.: 07071/206-463, Fax: 07071/206-499
> eMail: mueller at tropenklinik.de
> Internet: www.tropenklinik.de
> -----------------------------------------------
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
Could you paste the traceback that is printed in the mount's log file for
that crash?

Search for "crash" in the logs; you will see the trace right after that
marker. Paste it here.
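
For example, something like this should pull it out (a sketch; the log
path assumes the mnt-glusterfs.log shown earlier in this thread):

# print the 30 lines that follow the "crash" marker in the mount log:
grep -A 30 "crash" /var/log/glusterfs/mnt-glusterfs.log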

Pranith.

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users



