[Gluster-devel] Possible bug in the communications layer ?

Xavier Hernandez xhernandez at datalab.es
Thu Apr 28 14:45:36 UTC 2016


 

Hi Jeff, 

On 28.04.2016 15:20, Jeff Darcy wrote: 

>> This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
>> volume is a distributed-disperse 4*(4+2). I'm able to reproduce the
>> problem easily doing the following test:
>>
>>   iozone -t2 -s10g -r1024k -i0 -w -F/iozone{1..2}.dat
>>   echo 3 >/proc/sys/vm/drop_caches
>>   iozone -t2 -s10g -r1024k -i1 -w -F/iozone{1..2}.dat
>>
>> The error happens soon after starting the read test. As can be seen in
>> the data below, client3_3_readv_cbk() is processing an iovec of 116
>> bytes, but it should be 154 bytes (the buffer in memory really does seem
>> to contain 154 bytes). The data on the network seems ok (at least I
>> haven't been able to identify any problem), so this must be a processing
>> error on the client side. The last field in the cut buffer of the
>> serialized data corresponds to the length of the xdata field: 0x26. So
>> at least 38 more bytes should be present.
>
> Nice detective work, Xavi. It would be *very* interesting to see what
> the value of the "count" parameter is (it's unfortunately optimized out).
> I'll bet it's two, and iov[1].iov_len is 38. I have a weak memory of
> some problems with how this iov is put together, a couple of years ago,
> and it looks like you might have tripped over one more.

It seems you are right. The count is 2, and the first 38 bytes of the
second vector contain the remaining data of the xdata field. The rest of
the data in the second vector seems to be the payload of the readv fop,
plus 2 bytes of padding:

(gdb) f 0
#0  client3_3_readv_cbk (req=0x7fdc4051a31c, iov=0x7fdc4051a35c, count=<optimized out>,
    myframe=0x7fdc520d505c) at client-rpc-fops.c:3021
3021            gf_msg (this->name, GF_LOG_ERROR, EINVAL,
(gdb) print *iov
$2 = {iov_base = 0x7fdc14b0d018, iov_len = 116}
(gdb) f 1
#1  0x00007fdc56dafab0 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fdc3c1f4bb0,
    pollin=pollin@entry=0x7fdc34010f20) at rpc-clnt.c:764
764             req->cbkfn (req, req->rsp, req->rspcnt, saved_frame->frame);
(gdb) print *pollin
$3 = {vector = {{iov_base = 0x7fdc14b0d000, iov_len = 140}, {iov_base = 0x7fdc14a4d000,
      iov_len = 32808}, {iov_base = 0x0, iov_len = 0} <repeats 14 times>}, count = 2,
  vectored = 1 '\001', private = 0x7fdc340106c0, iobref = 0x7fdc34006660,
  hdr_iobuf = 0x7fdc3c4c07c0, is_reply = 1 '\001'}
(gdb) f 0
#0  client3_3_readv_cbk (req=0x7fdc4051a31c, iov=0x7fdc4051a35c, count=<optimized out>,
    myframe=0x7fdc520d505c) at client-rpc-fops.c:3021
3021            gf_msg (this->name, GF_LOG_ERROR, EINVAL,
(gdb) print iov[1]
$4 = {iov_base = 0x7fdc14a4d000, iov_len = 32808}
(gdb) print iov[2]
$5 = {iov_base = 0x2, iov_len = 140583741974112}
(gdb) x/128xb 0x7fdc14a4d000
0x7fdc14a4d000: 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x17
0x7fdc14a4d008: 0x00 0x00 0x00 0x02 0x67 0x6c 0x75 0x73
0x7fdc14a4d010: 0x74 0x65 0x72 0x66 0x73 0x2e 0x69 0x6e
0x7fdc14a4d018: 0x6f 0x64 0x65 0x6c 0x6b 0x2d 0x63 0x6f
0x7fdc14a4d020: 0x75 0x6e 0x74 0x00 0x31 0x00 0x00 0x00
0x7fdc14a4d028: 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c
0x7fdc14a4d030: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d038: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d040: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d048: 0x5c 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d058: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d060: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d068: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d070: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d078: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
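
For reference, the first bytes of iov[1] shown above decode cleanly as the
tail of the xdata dictionary. The following standalone sketch (not the
actual GlusterFS decoding code; it just assumes the usual serialized-dict
layout of a big-endian pair count followed by, per pair, key length, value
length, key plus a NUL byte, then the value) reproduces the 38-byte figure:

/* Sketch only: decode the first 38 bytes of iov[1] assuming the
 * serialized-dict layout described above. */
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t be32(const unsigned char *p)
{
        uint32_t v;
        memcpy(&v, p, sizeof(v));
        return ntohl(v);
}

int main(void)
{
        /* First 38 bytes of iov[1], copied from the x/128xb dump above. */
        static const unsigned char buf[] = {
                0x00, 0x00, 0x00, 0x01,             /* pair count  = 1     */
                0x00, 0x00, 0x00, 0x17,             /* key length  = 23    */
                0x00, 0x00, 0x00, 0x02,             /* value length = 2    */
                'g', 'l', 'u', 's', 't', 'e', 'r', 'f', 's', '.',
                'i', 'n', 'o', 'd', 'e', 'l', 'k', '-', 'c', 'o',
                'u', 'n', 't', 0x00,                /* key + NUL           */
                '1', 0x00                           /* value "1" (2 bytes) */
        };

        uint32_t count  = be32(buf);
        uint32_t keylen = be32(buf + 4);
        uint32_t vallen = be32(buf + 8);

        printf("pairs=%u key=%.*s value=%.*s\n", count,
               (int)keylen, (const char *)buf + 12,
               (int)vallen, (const char *)buf + 12 + keylen + 1);

        /* 12-byte header + 23-byte key + NUL + 2-byte value = 38 = 0x26 */
        printf("xdata bytes: %u\n", 12 + keylen + 1 + vallen);
        return 0;
}

So the 12 bytes of header plus the 23-byte "glusterfs.inodelk-count" key,
its NUL terminator and the 2-byte value "1" add up to exactly the 0x26 (38)
bytes announced for xdata at the end of the first vector; the remaining
32770 bytes of iov[1] would then be the readv payload plus the 2 bytes of
padding mentioned above.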

> Maybe it's related to all that epoll stuff.

I'm currently using 4 epoll threads (this improves ec performance). I'll
try to repeat the tests with a single epoll thread, but I'm not sure this
will be enough to draw any conclusion if the problem doesn't manifest,
since ec through fuse with 4 epoll threads doesn't seem to trigger the
problem.
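
(For reference, the thread count mentioned here is the multi-threaded epoll
setting, so the single-threaded run should just be a matter of something
like

  gluster volume set <volname> client.event-threads 1

assuming that option is also honoured by the gfapi client that Ganesha
uses.)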

Xavi
 