[Gluster-users] Error report: glusterfs2.0rc4 abend -- "readv failed (Bad address) "

Anand Avati avati at gluster.com
Wed Mar 18 14:18:29 UTC 2009


do you have a core dump which can be inspected? do you still see the
error after taking the 1.3 client off?

Avati

On Sat, Mar 14, 2009 at 10:42 PM, Andrew McGill <list2008 at lunch.za.net> wrote:
> Hello,
>
> I upgraded from glusterfs-1.3.12.tar.gz to glusterfs-2.0.0rc4.tar.gz because I
> could not complete a rdiff-backup without inexplicable errors.  My efforts
> have been rewarded with a crash, for which some logs are displayed below.
>
> The backend is unify, with multiple afr subvolumes of two ext3 volumes each.
>
> Here is how the client side of glusterfs died:
>
> 2009-03-14 13:32:15 W [afr-self-heal-data.c:798:afr_sh_data_fix] afr4: Picking
> favorite child u100-rs1 as authentic source to resolve conflicting data
> of /backup5/robbie.foo.co.za/rdiff-backup-data/mirror_metadata.2009-03-14T08:00:17+02:00.snapshot.gz
> 2009-03-14 13:32:15 W [afr-self-heal-data.c:646:afr_sh_data_open_cbk] afr4:
> sourcing file /backup5/robbie.foo.co.za/rdiff-backup-data/mirror_meta
> data.2009-03-14T08:00:17+02:00.snapshot.gz from u100-rs1 to other sinks
> 2009-03-14 13:32:22 E [socket.c:102:__socket_rwv] u50-dcc1: readv failed (Bad
> address)
> 2009-03-14 13:32:22 E [socket.c:634:__socket_proto_state_machine] u50-dcc1:
> read (Bad address) in state 3 (192.168.227.65:6996)
> 2009-03-14 13:32:22 E [saved-frames.c:169:saved_frames_unwind] u50-dcc1:
> forced unwinding frame type(1) op(READ)
> 2009-03-14 13:32:22 E [socket.c:102:__socket_rwv] u50-dr1: readv failed (Bad
> address)
> 2009-03-14 13:32:22 E [socket.c:634:__socket_proto_state_machine] u50-dr1:
> read (Bad address) in state 3 (192.168.227.31:6996)
> 2009-03-14 13:32:22 E [saved-frames.c:169:saved_frames_unwind] u50-dr1: forced
> unwinding frame type(1) op(READ)
> 2009-03-14 13:32:22 E [fuse-bridge.c:1548:fuse_readv_cbk] glusterfs-fuse:
> 5998294: READ => -1 (Transport endpoint is not connected)
> 2009-03-14 13:33:03 E [socket.c:102:__socket_rwv] u50-dr2: readv failed (Bad
> address)
> 2009-03-14 13:33:03 E [socket.c:634:__socket_proto_state_machine] u50-dr2:
> read (Bad address) in state 3 (192.168.227.32:6996)
> 2009-03-14 13:33:03 E [saved-frames.c:169:saved_frames_unwind] u50-dr2: forced
> unwinding frame type(1) op(READ)
> 2009-03-14 13:33:03 E [socket.c:102:__socket_rwv] u50-rs3: readv failed (Bad
> address)
> 2009-03-14 13:33:03 E [socket.c:634:__socket_proto_state_machine] u50-rs3:
> read (Bad address) in state 3 (192.168.227.59:6996)
> 2009-03-14 13:33:03 E [saved-frames.c:169:saved_frames_unwind] u50-rs3: forced
> unwinding frame type(1) op(READ)
> 2009-03-14 13:33:03 E [fuse-bridge.c:1548:fuse_readv_cbk] glusterfs-fuse:
> 6006118: READ => -1 (Transport endpoint is not connected)
> 2009-03-14 13:33:03 E [fuse-bridge.c:1548:fuse_readv_cbk] glusterfs-fuse:
> 6006119: READ => -1 (Transport endpoint is not connected)
> 2009-03-14 13:33:03 E [fuse-bridge.c:1548:fuse_readv_cbk] glusterfs-fuse:
> 6006120: READ => -1 (Transport endpoint is not connected)
> 2009-03-14 13:33:03 E [fuse-bridge.c:1548:fuse_readv_cbk] glusterfs-fuse:
> 6006121: READ => -1 (Transport endpoint is not connected)
> pending frames:
> œ·
> patchset: cb602a1d7d41587c24379cb2636961ab91446f86 +
> signal received: 6
> configuration details:argp 1
> backtrace 1
> dlfcn 1
> fdatasync 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 2.0.0rc4
> [0x381420]
> /lib/libc.so.6(abort+0x101)[0xb86451]
> /usr/lib/glusterfs/2.0.0rc4/xlator/mount/fuse.so[0x54b9a8]
> /lib/libpthread.so.0[0xd302db]
> /lib/libc.so.6(clone+0x5e)[0xc2912e]
> ---------
>
>
> On the server side, the following messages don't enlighten me, but do remind
> me that there was another client running version 1.13 still connecting.  It
> looks like the server just noticed that the client died.
>
> 2009-03-14 13:30:03 E [socket.c:583:__socket_proto_state_machine] server:
> socket header validate failed (192.168.227.167:1023). possible mismatch of
>  transport-type between server and client volumes, or version mismatch
> 2009-03-14 13:30:03 N [server-protocol.c:8048:notify] server:
> 192.168.227.167:1023 disconnected
> 2009-03-14 13:31:45 E [socket.c:463:__socket_proto_validate_header] server:
> socket header signature does not match :O (42.6c.6f)
> 2009-03-14 13:31:45 E [socket.c:583:__socket_proto_state_machine] server:
> socket header validate failed (192.168.227.167:1023). possible mismatch of
>  transport-type between server and client volumes, or version mismatch
> 2009-03-14 13:31:45 N [server-protocol.c:8048:notify] server:
> 192.168.227.167:1023 disconnected
> 2009-03-14 13:32:22 E [socket.c:102:__socket_rwv] server: readv failed
> (Connection reset by peer)
> 2009-03-14 13:32:22 E [socket.c:561:__socket_proto_state_machine] server: read
> (Connection reset by peer) in state 1 (192.168.227.5:1020)
> 2009-03-14 13:32:22 N [server-protocol.c:8048:notify] server:
> 192.168.227.5:1020 disconnected
> 2009-03-14 13:32:22 N [server-protocol.c:7295:mop_setvolume] server: accepted
> client from 192.168.227.5:1020
> 2009-03-14 13:35:48 N [server-protocol.c:8048:notify] server:
> 192.168.227.5:1017 disconnected
> 2009-03-14 13:35:48 N [server-protocol.c:8048:notify] server:
> 192.168.227.5:1020 disconnected
> 2009-03-14 13:35:48 N [server-helpers.c:515:server_connection_destroy] server:
> destroyed connection of backup5.foo.com-23205-2009/03/14-07:10:
> 52:777008-u50-dcc1
>
>
> On another server brick, the 25Gb volume u50-dr1-raw was full (it should have
> been 50Gb like its peer).  As I recall, the free space of the second volume
> of AFR which does not get checked for disk space (a bug, IMHO).
>
> It said this, which could have led to the client-side failure a few minutes
> later (the clocks are in sync):
>
> 2009-03-14 13:30:23 W [posix.c:773:posix_mkdir] u50-dr1-raw: mkdir
> of /backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1: No
> space left on device
> 2009-03-14 13:30:23 E [server-protocol.c:3478:server_stub_resume] server:
> 1109657: INODELK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-
> backup.tmp.1) on u50-dr1 returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3478:server_stub_resume] server:
> 1109658: INODELK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-
> backup.tmp.1) on u50-dr1 returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server:
> 3184942: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-
> backup.tmp.1) on u50-dr1 for key <nul> returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server:
> 3184943: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-
> backup.tmp.1) on u50-dr1 for key <nul> returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server:
> 3184947: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-
> backup.tmp.1) on u50-dr1 for key hl returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server:
> 3184949: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-
> backup.tmp.1) on u50-dr1 for key hl returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3478:server_stub_resume] server:
> 1109660: INODELK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-
> backup.tmp.1/hl) on u50-dr1 returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3478:server_stub_resume] server:
> 1109661: INODELK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-
> backup.tmp.1/hl) on u50-dr1 returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server:
> 3184952: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-
> backup.tmp.1/hl) on u50-dr1 for key <nul> returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server:
> 3184953: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-
> backup.tmp.1/hl) on u50-dr1 for key <nul> returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:2774:server_stub_resume] server:
> 1109663: XATTROP (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-
> backup.tmp.1) on u50-dr1 returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:2868:server_stub_resume] server:
> 1109665: RMDIR (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-ba
> ckup.tmp.1) on u50-dr1 returning error: -1 (2)
> 2009-03-14 13:31:45 E [socket.c:463:__socket_proto_validate_header] server:
> socket header signature does not match :O (42.6c.6f)
> 2009-03-14 13:31:45 E [socket.c:583:__socket_proto_state_machine] server:
> socket header validate failed (192.168.227.167:1022). possible mismatch of
>  transport-type between server and client volumes, or version mismatch
> 2009-03-14 13:31:45 N [server-protocol.c:8048:notify] server:
> 192.168.227.167:1022 disconnected
> 2009-03-14 13:32:22 E [socket.c:102:__socket_rwv] server: readv failed
> (Connection reset by peer)
> 2009-03-14 13:32:22 E [socket.c:561:__socket_proto_state_machine] server: read
> (Connection reset by peer) in state 1 (192.168.227.5:1016)
> 2009-03-14 13:32:22 N [server-protocol.c:8048:notify] server:
> 192.168.227.5:1016 disconnected
> 2009-03-14 13:32:23 N [server-protocol.c:7295:mop_setvolume] server: accepted
> client from 192.168.227.5:1016
>
>
> I may have to move the backup in question off glusterfs (if I can just find
> the space somewhere), since it has taken 4 days to realise that the backing
> up is not just slow, but faulty.  (Of course, if I can't fix it, I'll win a
> trip to the data center to install a new machine to replace the system.)
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
>




More information about the Gluster-users mailing list