[Gluster-devel] Possible bug in the communications layer ?

Mon May 9 10:35:54 UTC 2016

----- Original Message -----
> From: "Xavier Hernandez" <xhernandez at datalab.es>
> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Monday, May 9, 2016 3:07:16 PM
> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> 
> Hi Raghavendra,
> 
> I've finally found the bug. It was obvious but I didn't see it.

Same here :).

> 
>   1561         case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
>   1562                 default_read_size = xdr_sizeof ((xdrproc_t)
> xdr_gfs3_read_rsp,
>   1563                                                 &read_rsp);
>   1564
>   1565                 proghdr_buf = frag->fragcurrent;
>   1566
>   1567                 __socket_proto_init_pending (priv,
> default_read_size);
>   1568
>   1569                 frag->call_body.reply.accepted_success_state
>   1570                         = SP_STATE_READING_PROC_HEADER;
>   1571
>   1572                 /* fall through */
>   1573
>   1574         case SP_STATE_READING_PROC_HEADER:
>   1575                 __socket_proto_read (priv, ret);
>   1576
>   1577                 gf_trace_add("xdrmem_create", default_read_size,
> (uintptr_t)proghdr_buf);
>   1578                 /* there can be 'xdata' in read response, figure
> it out */
>   1579                 xdrmem_create (&xdr, proghdr_buf, default_read_size,
>   1580                                XDR_DECODE);
>   1581
>   1582                 /* This will fail if there is xdata sent from
> server, if not,
>   1583                    well and good, we don't need to worry about  */
>   1584                 xdr_gfs3_read_rsp (&xdr, &read_rsp);
>   1585
>   1586                 free (read_rsp.xdata.xdata_val);
>   1587
>   1588                 /* need to round off to proper roof (%4), as XDR
> packing pads
>   1589                    the end of opaque object with '0' */
>   1590                 size = roof (read_rsp.xdata.xdata_len, 4);
>   1591
>   1592                 if (!size) {
>   1593                         frag->call_body.reply.accepted_success_state
>   1594                                 = SP_STATE_READ_PROC_OPAQUE;
>   1595                         goto read_proc_opaque;
>   1596                 }
>   1597
>   1598                 __socket_proto_init_pending (priv, size);
>   1599
>   1600                 frag->call_body.reply.accepted_success_state
>   1601                         = SP_STATE_READING_PROC_OPAQUE;
> 
> The main problem here is that we are using two local variables
> (proghdr_buf and default_read_size) in two distinct states that might be
> called at different times.
> 
> The particular case that is failing is the following:
> 
> 1. In state SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT, everything is prepared
> to read 116 bytes. default_read_size is set to 116 and proghdr_buf
> points to the buffer where data will be written.
> 
> 2. In state SP_STATE_READING_PROC_HEADER, a partial read of 88 bytes is
> done. At this point the function returns and proghdr_buf and
> default_read_size are lost.
> 
> 3. When more data is available, this function is called again and it
> starts executing at state SP_STATE_READING_PROC_HEADER.
> 
> 4. The remaining 28 bytes are read.
> 
> 5. When it checks the buffer and tries to decode it to see if there's
> xdata present, it uses the default values of proghdr_buf and
> default_read_size, that are 0. This causes the decode to leave
> read_rsp.xdata.xdata_len set to 0.
> 
> 6. The program interprets that xdata_len being 0 means that there's no
> xdata, so it continues reading the remaining of the RPC packet into the
> payload buffer.
> 
> If you want, I can send a patch for this.

Yes. That would be helpful. The analysis is correct and moving initialization of prog_hdrbuf to line 1578 will fix the issue. If you are too busy, please let me know and I can patch it up too :).

Thanks for debugging the issue :).

regards,
Raghavendra.

> 
> Xavi
> 
> On 05/05/16 10:21, Xavier Hernandez wrote:
> > I've undone all changes and now I'm unable to reproduce the problem, so
> > the modification I did is probably incorrect and not the root cause, as
> > you described.
> >
> > I'll continue investigating...
> >
> > Xavi
> >
> > On 04/05/16 15:01, Xavier Hernandez wrote:
> >> On 04/05/16 14:47, Raghavendra Gowdappa wrote:
> >>>
> >>>
> >>> ----- Original Message -----
> >>>> From: "Xavier Hernandez" <xhernandez at datalab.es>
> >>>> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> >>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> >>>> Sent: Wednesday, May 4, 2016 5:37:56 PM
> >>>> Subject: Re: [Gluster-devel] Possible bug in the communications layer ?
> >>>>
> >>>> I think I've found the problem.
> >>>>
> >>>> 1567         case SP_STATE_READING_PROC_HEADER:
> >>>>   1568                 __socket_proto_read (priv, ret);
> >>>>   1569
> >>>>   1570                 /* there can be 'xdata' in read response, figure
> >>>> it out */
> >>>>   1571                 xdrmem_create (&xdr, proghdr_buf,
> >>>> default_read_size,
> >>>>   1572                                XDR_DECODE);
> >>>>   1573
> >>>>   1574                 /* This will fail if there is xdata sent from
> >>>> server, if not,
> >>>>   1575                    well and good, we don't need to worry
> >>>> about  */
> >>>>   1576                 xdr_gfs3_read_rsp (&xdr, &read_rsp);
> >>>>   1577
> >>>>   1578                 free (read_rsp.xdata.xdata_val);
> >>>>   1579
> >>>>   1580                 /* need to round off to proper roof (%4), as XDR
> >>>> packing pads
> >>>>   1581                    the end of opaque object with '0' */
> >>>>   1582                 size = roof (read_rsp.xdata.xdata_len, 4);
> >>>>   1583
> >>>>   1584                 if (!size) {
> >>>>   1585
> >>>> frag->call_body.reply.accepted_success_state
> >>>>   1586                                 = SP_STATE_READ_PROC_OPAQUE;
> >>>>   1587                         goto read_proc_opaque;
> >>>>   1588                 }
> >>>>   1589
> >>>>   1590                 __socket_proto_init_pending (priv, size);
> >>>>   1591
> >>>>   1592                 frag->call_body.reply.accepted_success_state
> >>>>   1593                         = SP_STATE_READING_PROC_OPAQUE;
> >>>>   1594
> >>>>   1595         case SP_STATE_READING_PROC_OPAQUE:
> >>>>   1596                 __socket_proto_read (priv, ret);
> >>>>   1597
> >>>>   1598                 frag->call_body.reply.accepted_success_state
> >>>>   1599                         = SP_STATE_READ_PROC_OPAQUE;
> >>>>
> >>>> On line 1568 we read, at most, 116 bytes because we calculate the size
> >>>> of a read response without xdata. Then we detect that we really need
> >>>> more data for xdata (BTW, read_rsp.xdata.xdata_val will be always
> >>>> allocated even if xdr_gfs3_read_rsp() fails ?)
> >>>
> >>> No. It need not be. Its guaranteed that only on a successful
> >>> completion it is allocated. However, _if_ decoding fails only because
> >>> xdr stream doesn't include xdata bits, but xdata_len is zero (by
> >>> initializing it to default_read_size), then xdr library would've
> >>> filled read_rsp.xdata.xdata_len (read_rsp.xdata.xdata_val can still be
> >>> NULL).
> >>
> >> The question is: is it guaranteed that after an unsuccessful completion
> >> xdata_val will be NULL (i.e. not touched by the function, even if
> >> xadata_len is != 0) ? otherwise the free() could corrupt memory.
> >>
> >>>
> >>>>
> >>>> So we get into line 1596 with the pending info initialized to read the
> >>>> remaining data. This is the __socket_proto_read macro:
> >>>>
> >>>>    166 /* This will be used in a switch case and breaks from the switch
> >>>> case if all
> >>>>    167  * the pending data is not read.
> >>>>    168  */
> >>>>    169 #define __socket_proto_read(priv, ret)
> >>>>        \
> >>>>    170                 {
> >>>>        \
> >>>>    171                 size_t bytes_read = 0;
> >>>>        \
> >>>>    172                 struct gf_sock_incoming *in;
> >>>>        \
> >>>>    173                 in = &priv->incoming;
> >>>>        \
> >>>>    174
> >>>>        \
> >>>>    175                 __socket_proto_update_pending (priv);
> >>>>        \
> >>>>    176
> >>>>        \
> >>>>    177                 ret = __socket_readv (this,
> >>>>        \
> >>>>    178                                       in->pending_vector, 1,
> >>>>        \
> >>>>    179                                       &in->pending_vector,
> >>>>        \
> >>>>    180                                       &in->pending_count,
> >>>>        \
> >>>>    181                                       &bytes_read);
> >>>>        \
> >>>>    182                 if (ret == -1)
> >>>>        \
> >>>>    183                         break;
> >>>>        \
> >>>>    184                 __socket_proto_update_priv_after_read (priv,
> >>>> ret,
> >>>> bytes_read); \
> >>>>    185         }
> >>>>
> >>>> We read from the socket using __socket_readv(). It it fails, we quit.
> >>>> However if the socket doesn't have more data to read, this function
> >>>> does
> >>>> not return -1:
> >>>>
> >>>>    555                         ret = __socket_cached_read (this,
> >>>> opvector, opcount);
> >>>>    556
> >>>>    557                         if (ret == 0) {
> >>>>    558
> >>>> gf_log(this->name,GF_LOG_DEBUG,"EOF on socket");
> >>>>    559                                 errno = ENODATA;
> >>>>    560                                 ret = -1;
> >>>>    561                         }
> >>>>    562                         if (ret == -1 && errno == EAGAIN) {
> >>>>    563                                 /* done for now */
> >>>>    564                                 break;
> >>>>    565                         }
> >>>>    566                         this->total_bytes_read += ret;
> >>>>
> >>>> If __socket_cached_read() fails with errno == EAGAIN, we break and
> >>>> return opcount, which is >= 0. Causing the process to continue instead
> >>>> of waiting for more data.
> >>>
> >>> No. If you observe, there is a call to another macro
> >>> __socket_proto_update_priv_after_read at line 184. As can be seen in
> >>> the definition below, if (ret > 0) - which is the case in case of
> >>> short-read - we do break. I hope I am not missing anything here :).
> >>>
> >>> #define __socket_proto_update_priv_after_read(priv, ret,
> >>> bytes_read)    \
> >>>
> >>> {                                                               \
> >>>         struct gf_sock_incoming_frag *frag;                     \
> >>>                 frag =
> >>> &priv->incoming.frag;                            \
> >>>
> >>> \
> >>>                 frag->fragcurrent +=
> >>> bytes_read;                        \
> >>>                 frag->bytes_read +=
> >>> bytes_read;                         \
> >>>
> >>> \
> >>>                 if ((ret > 0) || (frag->remaining_size != 0))
> >>> {         \
> >>>             if (frag->remaining_size != 0 && ret == 0) {    \
> >>>                 __socket_proto_reset_pending (priv);    \
> >>>
> >>> }                                               \
> >>>
> >>> \
> >>>                         gf_log (this->name,
> >>> GF_LOG_TRACE,               \
> >>>                                 "partial read on non-blocking
> >>> socket"); \
> >>>
> >>> \
> >>>
> >>> break;                                          \
> >>>
> >>> }                                                       \
> >>>         }
> >>>
> >>
> >> I didn't see this. However the fact is that doing the change I've
> >> described, it seems to work.
> >>
> >> I'll analyze it a bit more.
> >>
> >> Xavi
> >>
> >>>>
> >>>> As a side note, there's another problem here: if errno is not EAGAIN,
> >>>> we'll update this->total_bytes_read substracting one. This shouldn't be
> >>>> done when ret < 0.
> >>>>
> >>>> There are other places where ret is set to -1, but opcount is returned.
> >>>> I guess that we should also set opcount = -1 on these places, but I
> >>>> don't have a deep knowledge about this implementation.
> >>>>
> >>>> I've done a quick test checking for (ret != 0) instead of (ret ==
> >>>> -1) in
> >>>> __socket_proto_read() and it seemed to work.
> >>>>
> >>>> Could anyone with more knowledge about the communications layer verify
> >>>> this and explain what would be the best solution ?
> >>>>
> >>>> Xavi
> >>>>
> >>>> On 29/04/16 14:52, Xavier Hernandez wrote:
> >>>>> With your patch applied, it seems that the bug is not hit.
> >>>>>
> >>>>> I guess it's a timing issue that the new logging hides. Maybe no more
> >>>>> data available after reading the partial readv header ? (it will
> >>>>> arrive
> >>>>> later).
> >>>>>
> >>>>> I'll continue testing...
> >>>>>
> >>>>> Xavi
> >>>>>
> >>>>> On 29/04/16 13:48, Raghavendra Gowdappa wrote:
> >>>>>> Attaching the patch.
> >>>>>>
> >>>>>> ----- Original Message -----
> >>>>>>> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> >>>>>>> To: "Xavier Hernandez" <xhernandez at datalab.es>
> >>>>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> >>>>>>> Sent: Friday, April 29, 2016 5:14:02 PM
> >>>>>>> Subject: Re: [Gluster-devel] Possible bug in the communications
> >>>>>>> layer ?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> ----- Original Message -----
> >>>>>>>> From: "Xavier Hernandez" <xhernandez at datalab.es>
> >>>>>>>> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> >>>>>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> >>>>>>>> Sent: Friday, April 29, 2016 1:21:57 PM
> >>>>>>>> Subject: Re: [Gluster-devel] Possible bug in the communications
> >>>>>>>> layer ?
> >>>>>>>>
> >>>>>>>> Hi Raghavendra,
> >>>>>>>>
> >>>>>>>> yes, the readv response contains xdata. The dict length is 38
> >>>>>>>> (0x26)
> >>>>>>>> and, at the moment of failure, rsp.xdata.xdata_len already contains
> >>>>>>>> 0x26.
> >>>>>>>
> >>>>>>> rsp.xdata.xdata_len having 0x26 even when decoding failed indicates
> >>>>>>> that the
> >>>>>>> approach used in socket.c to get the length of xdata is correct.
> >>>>>>> However, I
> >>>>>>> cannot find any other way of xdata going into payload vector other
> >>>>>>> than
> >>>>>>> xdata_len being zero. Just to be double sure, I've a patch
> >>>>>>> containing
> >>>>>>> debug
> >>>>>>> message printing xdata_len when decoding fails in socket.c. Can you
> >>>>>>> please
> >>>>>>> apply the patch, run the tests and revert back with results?
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Xavi
> >>>>>>>>
> >>>>>>>> On 29/04/16 09:10, Raghavendra Gowdappa wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> ----- Original Message -----
> >>>>>>>>>> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> >>>>>>>>>> To: "Xavier Hernandez" <xhernandez at datalab.es>
> >>>>>>>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> >>>>>>>>>> Sent: Friday, April 29, 2016 12:36:43 PM
> >>>>>>>>>> Subject: Re: [Gluster-devel] Possible bug in the communications
> >>>>>>>>>> layer ?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> ----- Original Message -----
> >>>>>>>>>>> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> >>>>>>>>>>> To: "Xavier Hernandez" <xhernandez at datalab.es>
> >>>>>>>>>>> Cc: "Jeff Darcy" <jdarcy at redhat.com>, "Gluster Devel"
> >>>>>>>>>>> <gluster-devel at gluster.org>
> >>>>>>>>>>> Sent: Friday, April 29, 2016 12:07:59 PM
> >>>>>>>>>>> Subject: Re: [Gluster-devel] Possible bug in the communications
> >>>>>>>>>>> layer ?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> ----- Original Message -----
> >>>>>>>>>>>> From: "Xavier Hernandez" <xhernandez at datalab.es>
> >>>>>>>>>>>> To: "Jeff Darcy" <jdarcy at redhat.com>
> >>>>>>>>>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>
> >>>>>>>>>>>> Sent: Thursday, April 28, 2016 8:15:36 PM
> >>>>>>>>>>>> Subject: Re: [Gluster-devel] Possible bug in the communications
> >>>>>>>>>>>> layer
> >>>>>>>>>>>> ?
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Jeff,
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 28.04.2016 15:20, Jeff Darcy wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> This happens with Gluster 3.7.11 accessed through Ganesha and
> >>>>>>>>>>>> gfapi.
> >>>>>>>>>>>> The
> >>>>>>>>>>>> volume is a distributed-disperse 4*(4+2). I'm able to
> >>>>>>>>>>>> reproduce the
> >>>>>>>>>>>> problem
> >>>>>>>>>>>> easily doing the following test: iozone -t2 -s10g -r1024k
> >>>>>>>>>>>> -i0 -w
> >>>>>>>>>>>> -F/iozone{1..2}.dat echo 3 >/proc/sys/vm/drop_caches iozone -t2
> >>>>>>>>>>>> -s10g
> >>>>>>>>>>>> -r1024k -i1 -w -F/iozone{1..2}.dat The error happens soon after
> >>>>>>>>>>>> starting
> >>>>>>>>>>>> the
> >>>>>>>>>>>> read test. As can be seen in the data below,
> >>>>>>>>>>>> client3_3_readv_cbk() is
> >>>>>>>>>>>> processing an iovec of 116 bytes, however it should be of 154
> >>>>>>>>>>>> bytes
> >>>>>>>>>>>> (the
> >>>>>>>>>>>> buffer in memory really seems to contain 154 bytes). The
> >>>>>>>>>>>> data on
> >>>>>>>>>>>> the
> >>>>>>>>>>>> network
> >>>>>>>>>>>> seems ok (at least I haven't been able to identify any
> >>>>>>>>>>>> problem), so
> >>>>>>>>>>>> this
> >>>>>>>>>>>> must be a processing error on the client side. The last
> >>>>>>>>>>>> field in
> >>>>>>>>>>>> cut
> >>>>>>>>>>>> buffer
> >>>>>>>>>>>> of the sequentialized data corresponds to the length of the
> >>>>>>>>>>>> xdata
> >>>>>>>>>>>> field:
> >>>>>>>>>>>> 0x26. So at least 38 more byte should be present.
> >>>>>>>>>>>> Nice detective work, Xavi.  It would be *very* interesting to
> >>>>>>>>>>>> see what
> >>>>>>>>>>>> the value of the "count" parameter is (it's unfortunately
> >>>>>>>>>>>> optimized
> >>>>>>>>>>>> out).
> >>>>>>>>>>>> I'll bet it's two, and iov[1].iov_len is 38.  I have a weak
> >>>>>>>>>>>> memory of
> >>>>>>>>>>>> some problems with how this iov is put together, a couple of
> >>>>>>>>>>>> years
> >>>>>>>>>>>> ago,
> >>>>>>>>>>>> and it looks like you might have tripped over one more.
> >>>>>>>>>>>> It seems you are right. The count is 2 and the first 38
> >>>>>>>>>>>> bytes of
> >>>>>>>>>>>> the
> >>>>>>>>>>>> second
> >>>>>>>>>>>> vector contains the remaining data of xdata field.
> >>>>>>>>>>>
> >>>>>>>>>>> This is the bug. client3_3_readv_cbk (and for that matter all
> >>>>>>>>>>> the
> >>>>>>>>>>> actors/cbks) expects response in utmost two vectors:
> >>>>>>>>>>> 1. Program header containing request or response. This is
> >>>>>>>>>>> subjected to
> >>>>>>>>>>> decoding/encoding. This vector should point to a buffer that
> >>>>>>>>>>> contains
> >>>>>>>>>>> the
> >>>>>>>>>>> entire program header/response contiguously.
> >>>>>>>>>>> 2. If the procedure returns payload (like readv response or a
> >>>>>>>>>>> write
> >>>>>>>>>>> request),
> >>>>>>>>>>> second vector contains the buffer pointing to the entire
> >>>>>>>>>>> (contiguous)
> >>>>>>>>>>> payload. Note that this payload is raw and is not subjected to
> >>>>>>>>>>> encoding/decoding.
> >>>>>>>>>>>
> >>>>>>>>>>> In your case, this _clean_ separation is broken with part of
> >>>>>>>>>>> program
> >>>>>>>>>>> header
> >>>>>>>>>>> slipping into 2nd vector supposed to contain read data (may be
> >>>>>>>>>>> because
> >>>>>>>>>>> of
> >>>>>>>>>>> rpc fragmentation). I think this is a bug in socket layer. I'll
> >>>>>>>>>>> update
> >>>>>>>>>>> more
> >>>>>>>>>>> on this.
> >>>>>>>>>>
> >>>>>>>>>> Does your read response include xdata too? I think the code
> >>>>>>>>>> related to
> >>>>>>>>>> reading xdata in readv response is a bit murky.
> >>>>>>>>>>
> >>>>>>>>>> <socket.c/__socket_read_accepted_successful_reply>
> >>>>>>>>>>
> >>>>>>>>>>         case SP_STATE_ACCEPTED_SUCCESS_REPLY_INIT:
> >>>>>>>>>>                 default_read_size = xdr_sizeof ((xdrproc_t)
> >>>>>>>>>>                 xdr_gfs3_read_rsp,
> >>>>>>>>>>                                                 &read_rsp);
> >>>>>>>>>>
> >>>>>>>>>>                 proghdr_buf = frag->fragcurrent;
> >>>>>>>>>>
> >>>>>>>>>>                 __socket_proto_init_pending (priv,
> >>>>>>>>>> default_read_size);
> >>>>>>>>>>
> >>>>>>>>>>                 frag->call_body.reply.accepted_success_state
> >>>>>>>>>>                         = SP_STATE_READING_PROC_HEADER;
> >>>>>>>>>>
> >>>>>>>>>>                 /* fall through */
> >>>>>>>>>>
> >>>>>>>>>>         case SP_STATE_READING_PROC_HEADER:
> >>>>>>>>>>                 __socket_proto_read (priv, ret);
> >>>>>>>>>>
> >>>>>>>>>>>>>>> By this time we've read read response _minus_ the xdata
> >>>>>>>>>
> >>>>>>>>> I meant we have read "readv response header"
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>                 /* there can be 'xdata' in read response, figure
> >>>>>>>>>> it out
> >>>>>>>>>>                 */
> >>>>>>>>>>                 xdrmem_create (&xdr, proghdr_buf,
> >>>>>>>>>> default_read_size,
> >>>>>>>>>>                                XDR_DECODE);
> >>>>>>>>>>
> >>>>>>>>>>>>>>>> We created xdr stream above with "default_read_size" (this
> >>>>>>>>>>>>>>>> doesn't
> >>>>>>>>>>>>>>>> include xdata)
> >>>>>>>>>>
> >>>>>>>>>>                 /* This will fail if there is xdata sent from
> >>>>>>>>>> server, if
> >>>>>>>>>>                 not,
> >>>>>>>>>>                    well and good, we don't need to worry
> >>>>>>>>>> about  */
> >>>>>>>>>>
> >>>>>>>>>>>>>>>> what if xdata is present and decoding failed (as length
> >>>>>>>>>>>>>>>> of xdr
> >>>>>>>>>>>>>>>> stream
> >>>>>>>>>>>>>>>> above - default_read_size - doesn't include xdata)?
> >>>>>>>>>>>>>>>> would we
> >>>>>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>> valid value in read_rsp.xdata.xdata_len? This is the part
> >>>>>>>>>>>>>>>> I am
> >>>>>>>>>>>>>>>> confused about. If read_rsp.xdata.xdata_len is not
> >>>>>>>>>>>>>>>> correct then
> >>>>>>>>>>>>>>>> there
> >>>>>>>>>>>>>>>> is a possibility that xdata might not be entirely
> >>>>>>>>>>>>>>>> present in
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> vector socket passes to higher layers as progheader (with
> >>>>>>>>>>>>>>>> part or
> >>>>>>>>>>>>>>>> entire xdata spilling over to payload vector).
> >>>>>>>>>>
> >>>>>>>>>>                 xdr_gfs3_read_rsp (&xdr, &read_rsp);
> >>>>>>>>>>
> >>>>>>>>>>                 free (read_rsp.xdata.xdata_val);
> >>>>>>>>>>
> >>>>>>>>>>                 /* need to round off to proper roof (%4), as XDR
> >>>>>>>>>> packing
> >>>>>>>>>>                 pads
> >>>>>>>>>>                    the end of opaque object with '0' */
> >>>>>>>>>>                 size = roof (read_rsp.xdata.xdata_len, 4);
> >>>>>>>>>>
> >>>>>>>>>>                 if (!size) {
> >>>>>>>>>>
> >>>>>>>>>> frag->call_body.reply.accepted_success_state
> >>>>>>>>>>                                 = SP_STATE_READ_PROC_OPAQUE;
> >>>>>>>>>>                         goto read_proc_opaque;
> >>>>>>>>>>                 }
> >>>>>>>>>>
> >>>>>>>>>> </socket.c>
> >>>>>>>>>>
> >>>>>>>>>> Can you please confirm whether there was an xdata in the readv
> >>>>>>>>>> response
> >>>>>>>>>> (may
> >>>>>>>>>> be by looking in bricks) whose decoding failed?
> >>>>>>>>>>
> >>>>>>>>>> regards,
> >>>>>>>>>> Raghavendra
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> Gluster-devel mailing list
> >>>>>>>>>> Gluster-devel at gluster.org
> >>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>> _______________________________________________
> >>>>> Gluster-devel mailing list
> >>>>> Gluster-devel at gluster.org
> >>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
> >>>>
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> Gluster-devel at gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-devel
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
>