[Gluster-devel] Possible bug in the communications layer ?
Xavier Hernandez
xhernandez at datalab.es
Thu Apr 28 12:43:01 UTC 2016
Hi,
I've seen what seems to be a bug in the communications layer. The first
sign is an "XDR decoding failed" error in the logs.
This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
volume is a distributed-disperse 4*(4+2).
I'm able to reproduce the problem easily with the following test:
iozone -t2 -s10g -r1024k -i0 -w -F <nfs mount>/iozone{1..2}.dat
echo 3 >/proc/sys/vm/drop_caches
iozone -t2 -s10g -r1024k -i1 -w -F <nfs mount>/iozone{1..2}.dat
The error happens soon after starting the read test.
As can be seen in the data below, client3_3_readv_cbk() is processing an
iovec of 116 bytes, when it should be 154 bytes (the buffer in memory
really does seem to contain 154 bytes). The data on the network looks ok
(at least I haven't been able to identify any problem with it), so this
must be a processing error on the client side.
The last field in the truncated buffer of the serialized data corresponds
to the length of the xdata field: 0x26. So at least 38 more bytes should
be present.
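To make the arithmetic explicit: 0x26 is 38 decimal, and 116 + 38 = 154,
which matches the complete buffer dumped further below. A minimal sketch of
the check I did by hand (not Gluster code; the helper name is made up for
illustration):

#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Read the last 4 bytes of the received iovec as the XDR (big-endian)
 * length of the trailing xdata blob, and derive how large the complete
 * reply should have been. For the values above: 116 + 0x26 = 154. */
static size_t expected_reply_size(const unsigned char *buf, size_t iov_len)
{
    uint32_t xdata_len;

    memcpy(&xdata_len, buf + iov_len - sizeof(xdata_len), sizeof(xdata_len));
    xdata_len = ntohl(xdata_len);       /* 0x00000026 -> 38 */

    return iov_len + xdata_len;         /* 116 + 38 = 154 */
}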
My guess is that some corner case is being hit when reading fragmented
network packets under high load.
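To illustrate what I mean (this is not the actual socket.c code, just a
generic example): when a message arrives split across several poll-in
events, the reader has to remember how much of the current record it has
already consumed and resume from there, instead of treating every readable
event as the start of a whole record:

#include <errno.h>
#include <unistd.h>

/* Keep reading until 'total' bytes of the current record have arrived.
 * Returns the number of bytes accumulated so far; the caller must store
 * this and resume on the next poll-in event if the record is still
 * incomplete. */
static ssize_t read_record(int fd, char *buf, size_t already, size_t total)
{
    size_t done = already;

    while (done < total) {
        ssize_t n = read(fd, buf + done, total - done);
        if (n == 0)
            break;                  /* peer closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, just retry */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;              /* no more data yet; resume later */
            return -1;              /* real error */
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}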
Debug information:
Breakpoint 1, client3_3_readv_cbk (req=0x7f540e64106c,
iov=0x7f540e6410ac, count=<optimized out>, myframe=0x7f54259a4d54) at
client-rpc-fops.c:3021
3021 gf_msg (this->name, GF_LOG_ERROR, EINVAL,
(gdb) print *iov
$1 = {iov_base = 0x7f53e994e018, iov_len = 116}
(gdb) x/116xb 0x7f53e994e018
0x7f53e994e018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
0x7f53e994e020: 0xa8 0xbf 0xa3 0xe0 0x5f 0x48 0x4c 0x1e
0x7f53e994e028: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
0x7f53e994e030: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
0x7f53e994e038: 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x00
0x7f53e994e040: 0x00 0x00 0x81 0xa0 0x00 0x00 0x00 0x01
0x7f53e994e048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7f53e994e050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7f53e994e058: 0x00 0x00 0x00 0x00 0xa0 0x00 0x00 0x00
0x7f53e994e060: 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00
0x7f53e994e068: 0x00 0x50 0x00 0x00 0x57 0x22 0x04 0x1f
0x7f53e994e070: 0x25 0x38 0x92 0x91 0x57 0x22 0x04 0xb3
0x7f53e994e078: 0x03 0x53 0x1b 0x13 0x57 0x22 0x04 0xb3
0x7f53e994e080: 0x06 0xf5 0xe1 0x99 0x00 0x00 0x80 0x00
0x7f53e994e088: 0x00 0x00 0x00 0x26
(gdb) x/154xb 0x7f53e994e018
0x7f53e994e018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
0x7f53e994e020: 0xa8 0xbf 0xa3 0xe0 0x5f 0x48 0x4c 0x1e
0x7f53e994e028: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
0x7f53e994e030: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
0x7f53e994e038: 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x00
0x7f53e994e040: 0x00 0x00 0x81 0xa0 0x00 0x00 0x00 0x01
0x7f53e994e048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7f53e994e050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7f53e994e058: 0x00 0x00 0x00 0x00 0xa0 0x00 0x00 0x00
0x7f53e994e060: 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00
0x7f53e994e068: 0x00 0x50 0x00 0x00 0x57 0x22 0x04 0x1f
0x7f53e994e070: 0x25 0x38 0x92 0x91 0x57 0x22 0x04 0xb3
0x7f53e994e078: 0x03 0x53 0x1b 0x13 0x57 0x22 0x04 0xb3
0x7f53e994e080: 0x06 0xf5 0xe1 0x99 0x00 0x00 0x80 0x00
0x7f53e994e088: 0x00 0x00 0x00 0x26 0x00 0x00 0x00 0x01
0x7f53e994e090: 0x00 0x00 0x00 0x17 0x00 0x00 0x00 0x02
0x7f53e994e098: 0x67 0x6c 0x75 0x73 0x74 0x65 0x72 0x66
0x7f53e994e0a0: 0x73 0x2e 0x69 0x6e 0x6f 0x64 0x65 0x6c
0x7f53e994e0a8: 0x6b 0x2d 0x63 0x6f 0x75 0x6e 0x74 0x00
0x7f53e994e0b0: 0x31 0x00
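For what it's worth, the 38 trailing bytes that are missing from the iovec
decode cleanly, which again suggests the wire data itself is fine. Assuming
the layout the bytes suggest (a 4-byte entry count, then per entry a 4-byte
key length, a 4-byte value length, the NUL-terminated key, and the value),
the blob contains a single entry: glusterfs.inodelk-count = 1. A
hypothetical decoder, not the real unserialization code:

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Walk the serialized dict as laid out in the dump above:
 *   count | keylen | vallen | key bytes + NUL | value bytes
 * The value is NUL-terminated in this dump (vallen 2 covers "1\0"), so
 * printing it with %s is fine here. For the 38 bytes above this prints
 * "glusterfs.inodelk-count = 1". */
static void dump_xdata(const char *p, size_t len)
{
    const char *end = p + len;
    uint32_t count, keylen, vallen;

    memcpy(&count, p, 4); count = ntohl(count); p += 4;
    while (count-- > 0 && p + 8 <= end) {
        memcpy(&keylen, p, 4); keylen = ntohl(keylen); p += 4;
        memcpy(&vallen, p, 4); vallen = ntohl(vallen); p += 4;
        printf("%.*s = %s\n", (int)keylen, p, p + keylen + 1);
        p += keylen + 1 + vallen;
    }
}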
(gdb) bt
#0 client3_3_readv_cbk (req=0x7f540e64106c, iov=0x7f540e6410ac,
count=<optimized out>, myframe=0x7f54259a4d54) at client-rpc-fops.c:3021
#1 0x00007f542a677ab0 in rpc_clnt_handle_reply
(clnt=clnt@entry=0x7f54101cdef0, pollin=pollin@entry=0x7f54000091f0) at
rpc-clnt.c:764
#2 0x00007f542a677d6f in rpc_clnt_notify (trans=<optimized out>,
mydata=0x7f54101cdf20, event=<optimized out>, data=0x7f54000091f0) at
rpc-clnt.c:925
#3 0x00007f542a673853 in rpc_transport_notify
(this=this@entry=0x7f54101ddb70,
event=event@entry=RPC_TRANSPORT_MSG_RECEIVED,
data=data@entry=0x7f54000091f0) at rpc-transport.c:546
#4 0x00007f541d881666 in socket_event_poll_in
(this=this@entry=0x7f54101ddb70) at socket.c:2237
#5 0x00007f541d8842c4 in socket_event_handler (fd=fd@entry=30,
idx=idx@entry=20, data=0x7f54101ddb70, poll_in=1, poll_out=0,
poll_err=0) at socket.c:2350
#6 0x00007f542a90aa4a in event_dispatch_epoll_handler
(event=0x7f540effc540, event_pool=0xd3d9a0) at event-epoll.c:575
#7 event_dispatch_epoll_worker (data=0x7f54100d0890) at event-epoll.c:678
#8 0x00007f542d0fbdc5 in start_thread () from /lib64/libpthread.so.0
#9 0x00007f542c7bc28d in clone () from /lib64/libc.so.6
Xavi