[Gluster-devel] Possible bug in the communications layer ?

Raghavendra Gowdappa rgowdapp at redhat.com
Fri Apr 29 06:37:59 UTC 2016



----- Original Message -----
> From: "Xavier Hernandez" <xhernandez at datalab.es>
> To: "Jeff Darcy" <jdarcy at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Thursday, April 28, 2016 8:15:36 PM
> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> 
> 
> 
> Hi Jeff,
> 
> On 28.04.2016 15:20, Jeff Darcy wrote:
> 
> 
> 
> This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The
> volume is a distributed-disperse 4*(4+2). I'm able to reproduce the problem
> easily doing the following test:
> 
>     iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
>     echo 3 >/proc/sys/vm/drop_caches
>     iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat
> 
> The error happens soon after starting the read test. As can be seen in the
> data below, client3_3_readv_cbk() is processing an iovec of 116 bytes,
> however it should be 154 bytes (the buffer in memory really seems to contain
> 154 bytes). The data on the network seems ok (at least I haven't been able
> to identify any problem), so this must be a processing error on the client
> side. The last field in the cut buffer of the serialized data corresponds to
> the length of the xdata field: 0x26. So at least 38 more bytes should be
> present.
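
For reference, the arithmetic behind the figures above: the last four bytes
of the 116-byte header vector are the XDR length of xdata (0x26 = 38), so
the complete header should span 116 + 38 = 154 bytes. A minimal standalone
sketch of that calculation (illustrative only, not GlusterFS code):

    /* Recomputes the expected header size from the truncated header's
     * trailing xdata-length field. Values taken from the dumps below. */
    #include <stdint.h>
    #include <stdio.h>

    /* XDR encodes lengths as big-endian 32-bit integers. */
    static uint32_t
    be32_at (const uint8_t *p)
    {
            return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                   ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    }

    int
    main (void)
    {
            /* Last four bytes of the 116-byte vector: the xdata length. */
            const uint8_t tail[4] = { 0x00, 0x00, 0x00, 0x26 };

            uint32_t xdata_len = be32_at (tail);        /* 38  */
            size_t   received  = 116;                   /* observed iov_len */
            size_t   expected  = received + xdata_len;  /* 154 */

            printf ("xdata_len=%u received=%zu expected=%zu\n",
                    (unsigned) xdata_len, received, expected);
            return 0;
    }
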
> Nice detective work, Xavi.  It would be *very* interesting to see what
> the value of the "count" parameter is (it's unfortunately optimized out).
> I'll bet it's two, and iov[1].iov_len is 38.  I have a weak memory of
> some problems with how this iov is put together, a couple of years ago,
> and it looks like you might have tripped over one more.
> It seems you are right. The count is 2 and the first 38 bytes of the second
> vector contain the remaining data of the xdata field.

This is the bug. client3_3_readv_cbk (and, for that matter, all the
actors/cbks) expects the response in at most two vectors:
1. The program header containing the request or response. This part is
   subject to XDR encoding/decoding. The vector should point to a buffer
   that contains the entire program header/response contiguously.
2. If the procedure carries a payload (like a readv response or a write
   request), the second vector points to the entire (contiguous) payload.
   Note that this payload is raw and is not subject to encoding/decoding.

In your case this _clean_ separation is broken: part of the program header
slips into the second vector, which is supposed to contain only the read
data (maybe because of RPC fragmentation). I think this is a bug in the
socket layer. I'll update more on this.
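
To make that expectation concrete, here is a rough sketch of the kind of
defensive check a callback could perform (hypothetical code and types, not
the actual client3_3_readv_cbk; "decoded_rsp" is an invented placeholder
for whatever the XDR decoder reports):

    /* Sketch of the two-vector contract described above: iov[0] must hold
     * the whole XDR-decodable program header, iov[1] (if present) only the
     * raw payload. Hypothetical helper, not GlusterFS code. */
    #include <stddef.h>
    #include <sys/uio.h>

    struct decoded_rsp {
            size_t consumed;  /* bytes of header the XDR decoder consumed */
            size_t payload;   /* payload size announced in the header     */
    };

    /* Returns 0 when the reply honours the contract, -1 otherwise. */
    int
    check_reply_layout (const struct iovec *iov, int count,
                        const struct decoded_rsp *rsp)
    {
            if (count < 1 || count > 2)
                    return -1;

            /* If the decoder needed more bytes than iov[0] provides, part
             * of the header has leaked into the payload vector, which is
             * exactly what the dumps in this thread show. */
            if (rsp->consumed > iov[0].iov_len)
                    return -1;

            /* The payload, if any, must live entirely in iov[1]. */
            if (rsp->payload > 0 &&
                (count != 2 || iov[1].iov_len < rsp->payload))
                    return -1;

            return 0;
    }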

> The rest of the data in the second vector seems to be the payload of the
> readv fop, plus 2 bytes of padding:
> (gdb) f 0
> #0  client3_3_readv_cbk (req=0x7fdc4051a31c, iov=0x7fdc4051a35c,
> count=<optimized out>, myframe=0x7fdc520d505c) at client-rpc-fops.c:3021
> 3021                    gf_msg (this->name, GF_LOG_ERROR, EINVAL,
> (gdb) print *iov
> $2 = {iov_base = 0x7fdc14b0d018, iov_len = 116}
> (gdb) f 1
> #1  0x00007fdc56dafab0 in rpc_clnt_handle_reply
> (clnt=clnt at entry=0x7fdc3c1f4bb0, pollin=pollin at entry=0x7fdc34010f20) at
> rpc-clnt.c:764
> 764             req->cbkfn (req, req->rsp, req->rspcnt, saved_frame->frame);
> (gdb) print *pollin
> $3 = {vector = {{iov_base = 0x7fdc14b0d000, iov_len = 140}, {iov_base =
> 0x7fdc14a4d000, iov_len = 32808}, {iov_base = 0x0, iov_len = 0} <repeats 14
> times>}, count = 2,
> vectored = 1 '\001', private = 0x7fdc340106c0, iobref = 0x7fdc34006660,
> hdr_iobuf = 0x7fdc3c4c07c0, is_reply = 1 '\001'}
> (gdb) f 0
> #0  client3_3_readv_cbk (req=0x7fdc4051a31c, iov=0x7fdc4051a35c,
> count=<optimized out>, myframe=0x7fdc520d505c) at client-rpc-fops.c:3021
> 3021                    gf_msg (this->name, GF_LOG_ERROR, EINVAL,
> (gdb) print iov[1]
> $4 = {iov_base = 0x7fdc14a4d000, iov_len = 32808}
> (gdb) print iov[2]
> $5 = {iov_base = 0x2, iov_len = 140583741974112}
> (gdb) x/128xb 0x7fdc14a4d000
> 0x7fdc14a4d000: 0x00    0x00    0x00    0x01    0x00    0x00    0x00    0x17
> 0x7fdc14a4d008: 0x00    0x00    0x00    0x02    0x67    0x6c    0x75    0x73
> 0x7fdc14a4d010: 0x74    0x65    0x72    0x66    0x73    0x2e    0x69    0x6e
> 0x7fdc14a4d018: 0x6f    0x64    0x65    0x6c    0x6b    0x2d    0x63    0x6f
> 0x7fdc14a4d020: 0x75    0x6e    0x74    0x00    0x31    0x00    0x00    0x00
> 0x7fdc14a4d028: 0x5c    0x5c    0x5c    0x5c    0x5c    0x5c    0x5c    0x5c
> 0x7fdc14a4d030: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
> 0x7fdc14a4d038: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
> 0x7fdc14a4d040: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
> 0x7fdc14a4d048: 0x5c    0x00    0x00    0x00    0x00    0x00    0x00    0x00
> 0x7fdc14a4d050: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
> 0x7fdc14a4d058: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
> 0x7fdc14a4d060: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
> 0x7fdc14a4d068: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
> 0x7fdc14a4d070: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
> 0x7fdc14a4d078: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
> 
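
Side note: if I'm reading the dump right, the first 38 bytes of iov[1]
decode as a one-entry serialized dict, which matches the xdata length seen
in the truncated header (the exact on-wire dict layout sketched below is my
assumption):

    /* Assumed breakdown of the first 38 bytes of iov[1]:
     *
     *   00 00 00 01                      count  = 1
     *   00 00 00 17                      keylen = 23
     *   00 00 00 02                      vallen = 2
     *   "glusterfs.inodelk-count" 00     key + NUL (24 bytes)
     *   "1" 00                           value (2 bytes)
     *
     * 4 + 4 + 4 + 24 + 2 = 38 = 0x26, i.e. exactly the xdata bytes missing
     * from the header vector; the 0x5c bytes that follow (after 2 bytes of
     * padding) look like the iozone payload.
     */
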
> 
> 
> Maybe it's related to all that epoll stuff.
> I'm currently using 4 epoll threads (this improves ec performance). I'll try
> to repeat the tests with a single epoll thread, but I'm not sure I'll be able
> to draw any conclusion if the problem doesn't manifest, since ec through fuse
> with 4 epoll threads doesn't seem to trigger the problem.
> Xavi
> 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel

