[Gluster-devel] Possible bug in the communications layer ?

Xavier Hernandez xhernandez at datalab.es
Thu Apr 28 17:10:44 UTC 2016


 

I've filed bug https://bugzilla.redhat.com/show_bug.cgi?id=1331502
and added Raghavendra Gowdappa to the CC list (he appears as a
maintainer of RPC).

Xavi 

On 28.04.2016 18:42, Xavier Hernandez wrote:

> Hi Niels, 
> 
> On 28.04.2016 15:44, Niels de Vos wrote: 
>
>> On Thu, Apr 28, 2016 at 02:43:01PM +0200, Xavier Hernandez wrote:
>>
>>> Hi,
>>>
>>> I've seen what seems to be a bug in the communications layer. The
>>> first sign is an "XDR decoding failed" error in the logs. This
>>> happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
>>> volume is a distributed-disperse 4*(4+2). I'm able to reproduce the
>>> problem easily with the following test:
>>>
>>>     iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
>>>     echo 3 >/proc/sys/vm/drop_caches
>>>     iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
>>>
>>> The error happens soon after starting the read test.
>> 
>> Do you know if this only happens on disperse volumes, or also with
>> others?
> 
> I have only seen the problem with EC. Not sure if it's because I do
> most of my tests with EC or because other volumes do not manifest
> this problem. However, I've only seen this since I started testing
> Ganesha. I have never seen the problem with FUSE.
>
> I think the reason is that FUSE is slower than Ganesha (a little more
> than 120 MB/s vs 400 MB/s), so the combination of events needed to
> cause this problem would be much less likely to happen with FUSE.
> Since EC also talks to many bricks simultaneously (6 in this case),
> this may make it more sensitive to communication problems than a
> replicated volume.
> 
>> If you have captured a network trace, could you provide it to me?
>> You can use 'editcap -s0 ...' to copy only the relevant packets.
>> But I don't have an issue downloading a few GB either, if that is
>> easier for you.
>> 
>>> As can be seen in the data below, client3_3_readv_cbk() is
>>> processing an iovec of 116 bytes, but it should be 154 bytes (the
>>> buffer in memory really does seem to contain 154 bytes). The data
>>> on the network seems ok (at least I haven't been able to identify
>>> any problem), so this must be a processing error on the client
>>> side. The last field in the cut buffer of the serialized data
>>> corresponds to the length of the xdata field: 0x26. So at least 38
>>> more bytes should be present. My guess is that some corner case is
>>> hit when reading fragmented network packets under high load.
>>>
>>> Debug information:
>>>
>>> Breakpoint 1, client3_3_readv_cbk (req=0x7f540e64106c,
>>>     iov=0x7f540e6410ac, count=, myframe=0x7f54259a4d54)
>>>     at client-rpc-fops.c:3021
>>> 3021        gf_msg (this->name, GF_LOG_ERROR, EINVAL,
>>> (gdb) print *iov
>>> $1 = {iov_base = 0x7f53e994e018, iov_len = 116}
>>> (gdb) x/116xb 0x7f53e994e018
>>> 0x7f53e994e018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
>>> 0x7f53e994e020: 0xa8 0xbf 0xa3 0xe0 0x5f 0x48 0x4c 0x1e
>>
>> Hmm, I'm not sure how this is laid out in memory. 0x80 would be one
>> of the first bytes in the RPC payload; it signals 'last record' for
>> the RPC procedure, and we only send one record anyway. The four
>> bytes combined like (0x80 0x.. 0x.. 0x..) should be (0x80 |
>> rpc-record-size). Reading this in Wireshark from a .pcap.gz is much
>> easier :)
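
The record marking Niels refers to is the standard ONC RPC record
marking over TCP: each record fragment is preceded by a 4-byte marker
whose top bit flags the last fragment and whose low 31 bits give the
fragment length. A minimal parsing sketch (illustrative only, not
GlusterFS code):

    /* Decode the 4-byte RPC record marker that precedes each record
     * fragment on a TCP transport. */
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>      /* ntohl */

    struct rpc_record_marker {
        int      last_fragment;  /* top bit: last fragment of the record */
        uint32_t fragment_size;  /* low 31 bits: bytes after the marker  */
    };

    static struct rpc_record_marker
    parse_record_marker(const unsigned char hdr[4])
    {
        uint32_t raw;
        struct rpc_record_marker m;

        memcpy(&raw, hdr, sizeof(raw));
        raw = ntohl(raw);
        m.last_fragment = (raw & 0x80000000u) != 0;
        m.fragment_size = raw & 0x7fffffffu;
        return m;
    }

A leading 0x80 byte on the wire therefore means "last fragment" with a
record size below 2^24, which matches what Niels describes.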
> 
> The RPC header is already decoded here. The 116 bytes are only the
> content of the readv answer, as decoded by the xdr_gfs3_read_rsp()
> function.
>
> This means that the first 4 bytes are the op_ret and the next 4 are
> the op_errno, followed by an encoded iatt (with the gfid and ino
> number as the first fields). Then there is a size field and the
> length of a stream of bytes (corresponding to the encoded xdata).
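
For reference, that decode step looks roughly like the sketch below. It
assumes the gfs3_read_rsp type and the rpcgen-generated
xdr_gfs3_read_rsp() routine from GlusterFS's XDR definitions (the
header name and exact field set are assumptions here); the point is
only the wire order of the fields and that a 116-byte buffer is too
short for the 154 bytes the encoding needs, so the decode fails.

    /* Sketch only: hand a readv reply body to the generated XDR
     * routine. With only 116 of the expected 154 bytes available, the
     * decode of the trailing xdata opaque fails and the caller logs
     * "XDR decoding failed". */
    #include <string.h>
    #include <rpc/rpc.h>          /* XDR, xdrmem_create, XDR_DECODE */
    #include "glusterfs3-xdr.h"   /* gfs3_read_rsp (assumed header name) */

    static int
    decode_readv_reply(char *buf, unsigned int len, gfs3_read_rsp *rsp)
    {
        XDR xdr;

        memset(rsp, 0, sizeof(*rsp));
        xdrmem_create(&xdr, buf, len, XDR_DECODE);

        /* Wire order: op_ret, op_errno, the encoded iatt (gfid and ino
         * first), the size field, then xdata as a 4-byte length
         * followed by that many bytes. */
        if (!xdr_gfs3_read_rsp(&xdr, rsp))
            return -1;

        return 0;
    }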
> 
> I have a network capture. I can upload it if you want, but I think
> this is what you are trying to see:
>
> (gdb) f 1
> #1  0x00007fa88cb5bab0 in rpc_clnt_handle_reply
>     (clnt=clnt@entry=0x7fa870413e30,
>     pollin=pollin@entry=0x7fa86400efe0) at rpc-clnt.c:764
> 764         req->cbkfn (req, req->rsp, req->rspcnt, saved_frame->frame);
> (gdb) print *pollin
> $1 = {vector = {{iov_base = 0x7fa7ef580000, iov_len = 140},
>       {iov_base = 0x7fa7ef460000, iov_len = 32808},
>       {iov_base = 0x0, iov_len = 0} }, count = 2,
>     vectored = 1 '01', private = 0x7fa86c00ec60,
>     iobref = 0x7fa86400e560, hdr_iobuf = 0x7fa868022ee0,
>     is_reply = 1 '01'}
> (gdb) x/140xb 0x7fa7ef580000
> 0x7fa7ef580000: 0x00 0x00 0x89 0x68 0x00 0x00 0x00 0x01
> 0x7fa7ef580008: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fa7ef580010: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fa7ef580018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
> 0x7fa7ef580020: 0xc3 0x8e 0x35 0xf0 0x31 0xa1 0x45 0x01
> 0x7fa7ef580028: 0x8a 0x21 0x06 0x4b 0x08 0x4c 0x59 0xdf
> 0x7fa7ef580030: 0x8a 0x21 0x06 0x4b 0x08 0x4c 0x59 0xdf
> 0x7fa7ef580038: 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x00
> 0x7fa7ef580040: 0x00 0x00 0x81 0xa0 0x00 0x00 0x00 0x01
> 0x7fa7ef580048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fa7ef580050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> 0x7fa7ef580058: 0x00 0x00 0x00 0x00 0xa0 0x00 0x00 0x00
> 0x7fa7ef580060: 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00
> 0x7fa7ef580068: 0x00 0x50 0x00 0x00 0x57 0x22 0x33 0x1f
> 0x7fa7ef580070: 0x24 0x0c 0xc2 0x58 0x57 0x22 0x33 0xa1
> 0x7fa7ef580078: 0x34 0x11 0x3b 0x54 0x57 0x22 0x33 0xa1
> 0x7fa7ef580080: 0x34 0x20 0x7d 0x7a 0x00 0x00 0x80 0x00
> 0x7fa7ef580088: 0x00 0x00 0x00 0x26
>
> (gdb) f 3
> #3  0x00007fa88cb57853 in rpc_transport_notify
>     (this=this@entry=0x7fa870423ab0,
>     event=event@entry=RPC_TRANSPORT_MSG_RECEIVED,
>     data=data@entry=0x7fa86400efe0) at rpc-transport.c:546
> 546         ret = this->notify (this, this->mydata, event, data);
> (gdb) print *this
> $3 = {ops = 0x7fa87ff76740, listener = 0x0, private = 0x7fa870424750,
>     xl_private = 0x0, xl = 0x7fa87000b5a0, mydata = 0x7fa870413e60,
>     lock = {__data = {__lock = 0, __count = 0, __owner = 0,
>         __nusers = 0, __kind = 0, __spins = 0,
>         __list = {__prev = 0x0, __next = 0x0}}, __size = '00',
>       __align = 0}, refcount = 2, outstanding_rpc_count = 0,
>     ctx = 0x2688740, options = 0x7fa88788529c,
>     name = 0x7fa8704245f0 "test-client-2", dnscache = 0x7fa870472b90,
>     drc_client = 0x0, buf = 0x0, init = 0x7fa87fd6b760,
>     fini = 0x7fa87fd6b690, reconfigure = 0x7fa87fd6b290,
>     notify = 0x7fa88cb5bbb0, notify_data = 0x0,
>     peerinfo = {sockaddr = {ss_family = 2, __ss_align = 0,
>         __ss_padding = '00'}, sockaddr_len = 16,
>       identifier = "193.109.174.232:49152", '00',
>       max_op_version = 0, min_op_version = 0, volname = '00'},
>     myinfo = {sockaddr = {ss_family = 2, __ss_align = 0,
>         __ss_padding = '00'}, sockaddr_len = 16,
>       identifier = "193.109.174.230:65205", '00',
>       max_op_version = 0, min_op_version = 0, volname = '00'},
>     total_bytes_read = 390116824, total_bytes_write = 5374554184,
>     list = {next = 0x7fa870424568, prev = 0x7fa870424568},
>     bind_insecure = 1, dl_handle = 0x26efad0, ssl_name = 0x0,
>     clnt_options = 0x0}
> 
> The first 4 bytes of the dump correspond to the XID of the answer
> (0x8968). We can see that the answer comes from 193.109.174.232:49152.
> The output of the following command is attached:
>
> # tshark -r tcpdump.pcap -n -V -Y "rpc.xid == 0x8968 and
>     ip.addr == 193.109.174.232 and tcp.port == 49152"
>
> We can see that the answer is received in two TCP segments: one of
> 116 bytes and the other of 32836 bytes. Probably this is the source
> of the problem.
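
If that guess is right, the failure mode is the classic one of running
the XDR decode before a fragmented record has been completely
reassembled. The sketch below shows the general pattern of waiting for
a full record fragment before decoding; it is a simplification for
illustration only, not the actual logic in GlusterFS's socket.c.

    /* Illustrative only: accumulate one complete RPC record fragment,
     * however many TCP segments it arrives in, before decoding it. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <arpa/inet.h>

    /* Read exactly 'len' bytes, looping over short reads. */
    static int
    read_full(int fd, void *buf, size_t len)
    {
        size_t done = 0;

        while (done < len) {
            ssize_t n = read(fd, (char *)buf + done, len - done);
            if (n <= 0)
                return -1;      /* error or unexpected EOF */
            done += n;
        }
        return 0;
    }

    /* Return a malloc'ed buffer holding one whole fragment (and its
     * length), or NULL on failure. Only then is it safe to run the
     * XDR decode of the reply on it. */
    static unsigned char *
    read_rpc_fragment(int fd, uint32_t *fragment_len)
    {
        uint32_t marker;
        unsigned char *buf;

        if (read_full(fd, &marker, sizeof(marker)) != 0)
            return NULL;

        marker = ntohl(marker);
        *fragment_len = marker & 0x7fffffffu;  /* drop last-fragment bit */

        buf = malloc(*fragment_len);
        if (buf == NULL)
            return NULL;

        if (read_full(fd, buf, *fragment_len) != 0) {
            free(buf);
            return NULL;
        }
        return buf;
    }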
> 
>> Don't hesitate to file a bug for this, and include the exact error
>> messages you get. If possi
>
> I'll file a bug with all this information.
>
> Thanks,
>
> Xavi
>
>> Thanks,
>> Niels
>>
>>> 0x7f53e994e028: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
>>> 0x7f53e994e030: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
>>> 0x7f53e994e038: 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x00
>>> 0x7f53e994e040: 0x00 0x00 0x81 0xa0 0x00 0x00 0x00 0x01
>>> 0x7f53e994e048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>>> 0x7f53e994e050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>>> 0x7f53e994e058: 0x00 0x00 0x00 0x00 0xa0 0x00 0x00 0x00
>>> 0x7f53e994e060: 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00
>>> 0x7f53e994e068: 0x00 0x50 0x00 0x00 0x57 0x22 0x04 0x1f
>>> 0x7f53e994e070: 0x25 0x38 0x92 0x91 0x57 0x22 0x04 0xb3
>>> 0x7f53e994e078: 0x03 0x53 0x1b 0x13 0x57 0x22 0x04 0xb3
>>> 0x7f53e994e080: 0x06 0xf5 0xe1 0x99 0x00 0x00 0x80 0x00
>>> 0x7f53e994e088: 0x00 0x00 0x00 0x26
>>> (gdb) x/154xb 0x7f53e994e018
>>> 0x7f53e994e018: 0x00 0x00 0x80 0x00 0x00 0x00 0x00 0x00
>>> 0x7f53e994e020: 0xa8 0xbf 0xa3 0xe0 0x5f 0x48 0x4c 0x1e
>>> 0x7f53e994e028: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
>>> 0x7f53e994e030: 0x80 0xa3 0x8a 0xd8 0x9d 0xa1 0x1c 0x75
>>> 0x7f53e994e038: 0x00 0x00 0x00 0x00 0x00 0x00 0x08 0x00
>>> 0x7f53e994e040: 0x00 0x00 0x81 0xa0 0x00 0x00 0x00 0x01
>>> 0x7f53e994e048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>>> 0x7f53e994e050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>>> 0x7f53e994e058: 0x00 0x00 0x00 0x00 0xa0 0x00 0x00 0x00
>>> 0x7f53e994e060: 0x00 0x00 0x10 0x00 0x00 0x00 0x00 0x00
>>> 0x7f53e994e068: 0x00 0x50 0x00 0x00 0x57 0x22 0x04 0x1f
>>> 0x7f53e994e070: 0x25 0x38 0x92 0x91 0x57 0x22 0x04 0xb3
>>> 0x7f53e994e078: 0x03 0x53 0x1b 0x13 0x57 0x22 0x04 0xb3
>>> 0x7f53e994e080: 0x06 0xf5 0xe1 0x99 0x00 0x00 0x80 0x00
>>> 0x7f53e994e088: 0x00 0x00 0x00 0x26 0x00 0x00 0x00 0x01
>>> 0x7f53e994e090: 0x00 0x00 0x00 0x17 0x00 0x00 0x00 0x02
>>> 0x7f53e994e098: 0x67 0x6c 0x75 0x73 0x74 0x65 0x72 0x66
>>> 0x7f53e994e0a0: 0x73 0x2e 0x69 0x6e 0x6f 0x64 0x65 0x6c
>>> 0x7f53e994e0a8: 0x6b 0x2d 0x63 0x6f 0x75 0x6e 0x74 0x00
>>> 0x7f53e994e0b0: 0x31 0x00
>>> (gdb) bt
>>> #0  client3_3_readv_cbk (req=0x7f540e64106c, iov=0x7f540e6410ac,
>>>     count=, myframe=0x7f54259a4d54) at client-rpc-fops.c:3021
>>> #1  0x00007f542a677ab0 in rpc_clnt_handle_reply
>>>     (clnt=clnt@entry=0x7f54101cdef0,
>>>     pollin=pollin@entry=0x7f54000091f0) at rpc-clnt.c:764
>>> #2  0x00007f542a677d6f in rpc_clnt_notify (trans=,
>>>     mydata=0x7f54101cdf20, event=, data=0x7f54000091f0)
>>>     at rpc-clnt.c:925
>>> #3  0x00007f542a673853 in rpc_transport_notify
>>>     (this=this@entry=0x7f54101ddb70,
>>>     event=event@entry=RPC_TRANSPORT_MSG_RECEIVED,
>>>     data=data@entry=0x7f54000091f0) at rpc-transport.c:546
>>> #4  0x00007f541d881666 in socket_event_poll_in
>>>     (this=this@entry=0x7f54101ddb70) at socket.c:2237
>>> #5  0x00007f541d8842c4 in socket_event_handler (fd=fd@entry=30,
>>>     idx=idx@entry=20, data=0x7f54101ddb70, poll_in=1, poll_out=0,
>>>     poll_err=0) at socket.c:2350
>>> #6  0x00007f542a90aa4a in event_dispatch_epoll_handler
>>>     (event=0x7f540effc540, event_pool=0xd3d9a0) at event-epoll.c:575
>>> #7  event_dispatch_epoll_worker (data=0x7f54100d0890)
>>>     at event-epoll.c:678
>>> #8  0x00007f542d0fbdc5 in start_thread () from /lib64/libpthread.so.0
>>> #9  0x00007f542c7bc28d in clone () from /lib64/libc.so.6
>>>
>>> Xavi


