[Gluster-devel] gluster doesn't like Oracle's FSINFO RPC call

Niels de Vos ndevos at redhat.com
Fri Apr 12 22:50:33 UTC 2013


On Fri, Apr 12, 2013 at 03:58:04PM -0400, Michael Brown wrote:
> KERBOOM
> 
> [michael at fleming1 ~]$ sudo mount -a -t nfs
> [sudo] password for michael:
> mount: fearless1:/gv0 failed, reason given by server: No such file or
> directory
> mount: fearless1:/gv0/fleming1/db0/ALTUS_config failed, reason given by
> server: unknown nfs status return value: 22
> mount: fearless1:/gv0/fleming1/db0/ALTUS_data failed, reason given by
> server: unknown nfs status return value: 22
> mount: fearless1:/gv0/fleming1/db0/ALTUS_flash failed, reason given by
> server: unknown nfs status return value: 22
> mount.nfs: mount point /db/flash_recovery_area/ALTUS/onlinelog does not
> exist
> 
> nfs.log:
> [2013-04-12 15:55:16.507084] E [nfs3.c:305:__nfs3_get_volume_id]
> (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_fsinfo+0x22c)
> [0x7f45bfbb852c]
> (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_fsinfo_reply+0x29)
> [0x7f45bfbb2ce9]
> (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_request_xlator_deviceid+0x51)
> [0x7f45bfbb2481]))) 0-nfs-nfsv3: invalid argument: xl
> [2013-04-12 15:55:16.538560] E [nfs3.c:4706:nfs3_fsinfo] 0-nfs-nfsv3:
> Bad Handle
> [2013-04-12 15:55:16.538580] W [nfs3-helpers.c:3389:nfs3_log_common_res]
> 0-nfs-nfsv3: XID: 242c1550, FSINFO: NFS: 10001(Illegal NFS file handle),
> POSIX: 14(Bad address)
> [2013-04-12 15:55:16.538617] E [nfs3.c:305:__nfs3_get_volume_id]
> (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_fsinfo+0x22c)
> [0x7f45bfbb852c]
> (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_fsinfo_reply+0x29)
> [0x7f45bfbb2ce9]
> (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_request_xlator_deviceid+0x51)
> [0x7f45bfbb2481]))) 0-nfs-nfsv3: invalid argument: xl
> 
> (I tried both with and without modifying your uint32_t size to a
> 'int32_t size' to correct the signedness of the argument)
> 
> Get ahold of me in IRC and let's get this figured out. I've got a
> debugger attached.

23:51 < ndevos> Supermathie: ah, I've thought of the error in my 
   suggestion - that function is used to encode and decode
23:52 < ndevos> which means, that the size parameter must be set 
   correctly - the .data_len attribute contain the size when encoding, 
   and should be overwritten when decoding
23:53 < ndevos> KERBOOM happens when an idea is only half looked at :-/

Maybe something like the attached patch works better? It should
encode/decode both the length and the fhandle value. Compile tested only.
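
For reference, here is a minimal, self-contained sketch of what "encode/decode
both the length and the fhandle value" means on the wire. It assumes the
variable-length opaque layout from RFC 4506: a 4-byte big-endian length, the
bytes themselves, then zero padding up to a multiple of 4. The names struct fh,
fh_encode, fh_decode, xdr_pad4 and FH_MAX are made up for illustration; this is
not the gluster code and it deliberately avoids the xdr_* library calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FH_MAX 64 /* assumption: same limit as NFS3_FHSIZE in RFC 1813 */

/* Hypothetical stand-in for nfs_fh3: a length plus inline storage. */
struct fh {
        uint32_t      len;
        unsigned char val[FH_MAX];
};

/* Round n up to the next multiple of 4, the XDR block size. */
static size_t
xdr_pad4 (size_t n)
{
        return (n + 3u) & ~(size_t)3u;
}

/* Encode: length word, payload, zero padding.
 * Returns the number of bytes written, or 0 on error. */
static size_t
fh_encode (const struct fh *fh, unsigned char *buf, size_t cap)
{
        size_t padded = xdr_pad4 (fh->len);

        if (fh->len > FH_MAX || cap < 4 + padded)
                return 0;

        buf[0] = (unsigned char)(fh->len >> 24);
        buf[1] = (unsigned char)(fh->len >> 16);
        buf[2] = (unsigned char)(fh->len >> 8);
        buf[3] = (unsigned char)(fh->len);
        memcpy (buf + 4, fh->val, fh->len);
        memset (buf + 4 + fh->len, 0, padded - fh->len);

        return 4 + padded;
}

/* Decode: read the length first, then exactly that many payload bytes,
 * and account for the padding that follows.
 * Returns the number of bytes consumed, or 0 on error. */
static size_t
fh_decode (struct fh *fh, const unsigned char *buf, size_t avail)
{
        uint32_t len;

        if (avail < 4)
                return 0;

        len = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
              ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];

        if (len > FH_MAX || avail < 4 + xdr_pad4 (len))
                return 0;

        fh->len = len;
        memcpy (fh->val, buf + 4, len);

        return 4 + xdr_pad4 (len);
}
```

With a 34-byte fhandle, fh_encode() writes 4 + 36 = 40 bytes. fh_decode() reads
the length word first, so it knows how many of the following bytes are payload
and how many are padding.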

Niels

> 
> M.
> 
> On 13-04-12 11:32 AM, Niels de Vos wrote:
> > On Fri, Apr 12, 2013 at 05:23:08PM +0200, Niels de Vos wrote:
> >> On Thu, Apr 11, 2013 at 12:37:30PM -0400, Michael Brown wrote:
> >>> That actually broke everything (including Linux trying to mount NFS).
> >>>
> >>> I've modified it slightly to be:
> >>>
> >>> bool_t
> >>> xdr_nfs_fh3 (XDR *xdrs, nfs_fh3 *objp)
> >>> {
> >>>         if (!xdr_bytes (xdrs, (char **)&objp->data.data_val, (u_int *)
> >>> &objp->data.data_len, NFS3_FHSIZE))
> >>>                 if (!xdr_opaque (xdrs, &objp, (u_int *)
> >>> &objp->data.data_len))
> >>>                         return FALSE;
> >>>         return TRUE;
> >>> }
> >>>
> >>> (i.e. only call the xdr_opaque function if the xdr_bytes decode fails)
> >> Nah, that won't work. The xdr_* functions are modifying the position of 
> >> the cursor in the XDR-stream. Subsequent reads will continue where the 
> >> previous one finished.
> >>
> >> What you probably need to do is something like this:
> >>
> >> xdr_nfs_fh3 (XDR *xdrs, nfs_fh3 *objp)
> >> {
> >> 	uint32_t size;
> >>
> >> 	if (!xdr_int (xdrs, &size))
> >> 		if (!xdr_opaque (xdrs, (u_int *)&objp->data.data_len, size))
> > ^ that should be objp->data.data_val of course :-/
> >
> >> 			return FALSE
> >> 	return TRUE;
> >> }
> >>
> >> That will read the size of the fhandle first, to determine how long the opaque 
> >> fhandle is, and use that size to read it.
> >>
> >> Cheers,
> >> Niels
> >>
> >>> But I get no change in behaviour.
> >>>
> >>> Also get these warnings:
> >>>
> >>> xdr-nfs3.c: In function 'xdr_nfs_fh3':
> >>> xdr-nfs3.c:197: warning: passing argument 2 of 'xdr_opaque' from
> >>> incompatible pointer type
> >>> /usr/include/rpc/xdr.h:313: note: expected 'caddr_t' but argument is of
> >>> type 'struct nfs_fh3 **'
> >>> xdr-nfs3.c:197: warning: passing argument 3 of 'xdr_opaque' makes
> >>> integer from pointer without a cast
> >>> /usr/include/rpc/xdr.h:313: note: expected 'u_int' but argument is of
> >>> type 'u_int *'
> >>>
> >>> M.
> >>>
> >>> On 13-04-11 07:42 AM, Niels de Vos wrote:
> >>>> My guess is that this (untested) change would fix it, can you try that?
> >>>>
> >>>> --- a/rpc/xdr/src/xdr-nfs3.c
> >>>> +++ b/rpc/xdr/src/xdr-nfs3.c
> >>>> @@ -184,7 +184,7 @@ xdr_specdata3 (XDR *xdrs, specdata3 *objp)
> >>>>  bool_t
> >>>>  xdr_nfs_fh3 (XDR *xdrs, nfs_fh3 *objp)
> >>>>  {
> >>>> -	 if (!xdr_bytes (xdrs, (char **)&objp->data.data_val, (u_int *) &objp->data.data_len, NFS3_FHSIZE))
> >>>> +	 if (!xdr_opaque (xdrs, &objp, (u_int *) &objp->data.data_len))
> >>>>  		 return FALSE;
> >>>>  	return TRUE;
> >>>>  }
> >>>>
> >>>>
> >>>> HTH,
> >>>> Niels
> >>>>
> >>>>> All I get out of gluster is:
> >>>>> [2013-04-08 12:54:32.206312] E [nfs3.c:4741:nfs3svc_fsinfo] 0-nfs-nfsv3:
> >>>>> Error decoding arguments
> >>>>>
> >>>>>
> >>>>> I've attached abridged packet captures and text explanations of the
> >>>>> packets (thanks to wireshark).
> >>>>>
> >>>>> Can someone please look at this and determine if it's gluster's parsing
> >>>>> of the RPC call to blame, or if it's Oracle?
> >>>>>
> >>>>> This is the same setup on which I reported the NFS race condition bug.
> >>>>> It does have that patch applied.
> >>>>> Details:
> >>>>> http://lists.gnu.org/archive/html/gluster-devel/2013-04/msg00014.html
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Michael
> >>>>>
> >>>>> -- 
> >>>>> Michael Brown               | `One of the main causes of the fall of
> >>>>> Systems Consultant          | the Roman Empire was that, lacking zero,
> >>>>> Net Direct Inc.             | they had no way to indicate successful
> >>>>> ☎: +1 519 883 1172 x5106    | termination of their C programs.' - Firth
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>> _______________________________________________
> >>>>> Gluster-devel mailing list
> >>>>> Gluster-devel at nongnu.org
> >>>>> https://lists.nongnu.org/mailman/listinfo/gluster-devel
> >>>
> >>> -- 
> >>> Michael Brown               | `One of the main causes of the fall of
> >>> Systems Consultant          | the Roman Empire was that, lacking zero,
> >>> Net Direct Inc.             | they had no way to indicate successful
> >>> ☎: +1 519 883 1172 x5106    | termination of their C programs.' - Firth
> >>>
> >> -- 
> >> Niels de Vos
> >> Sr. Software Maintenance Engineer
> >> Support Engineering Group
> >> Red Hat Global Support Services
> >>
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> Gluster-devel at nongnu.org
> >> https://lists.nongnu.org/mailman/listinfo/gluster-devel
> 
> 
> -- 
> Michael Brown               | `One of the main causes of the fall of
> Systems Consultant          | the Roman Empire was that, lacking zero,
> Net Direct Inc.             | they had no way to indicate successful
> ☎: +1 519 883 1172 x5106    | termination of their C programs.' - Firth
> 

-- 
Niels de Vos
Sr. Software Maintenance Engineer
Support Engineering Group
Red Hat Global Support Services
-------------- next part --------------
From 2f7f6b952ed89f5cf8181db351e1965d8400f493 Mon Sep 17 00:00:00 2001
From: Niels de Vos <ndevos at redhat.com>
Date: Sat, 13 Apr 2013 00:41:43 +0200
Subject: [PATCH] nfs: encode/decode fhandles as opaque and not as bytes

At least one client (Oracle DNFS) does not pass an XDR roundup'd byte
array as a fhandle on FSINFO.

XDR (http://tools.ietf.org/html/rfc4506, the encoding used for the RPC
protocol) uses 'blocks' for alignment. A fhandle byte array that is
34 bytes long needs to be (34 / 4 + 1)*4 = 36 bytes in size. The
'length' given in the structure tells the consumer to ignore the two
trailing bytes.

The NFSv3 specification (http://tools.ietf.org/html/rfc1813#page-21)
defines the nfs_fh3 as an opaque (not bytes) structure.

BUG: 950121
Change-Id: Id723a38ef0ec6e7f1d9f29683321ea32e00503c7
Reported-by: Michael Brown <michael at supermathie.net>
Signed-off-by: Niels de Vos <ndevos at redhat.com>
---
 rpc/xdr/src/xdr-nfs3.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/rpc/xdr/src/xdr-nfs3.c b/rpc/xdr/src/xdr-nfs3.c
index a497e9f..39dbf5c 100644
--- a/rpc/xdr/src/xdr-nfs3.c
+++ b/rpc/xdr/src/xdr-nfs3.c
@@ -184,7 +184,9 @@ xdr_specdata3 (XDR *xdrs, specdata3 *objp)
 bool_t
 xdr_nfs_fh3 (XDR *xdrs, nfs_fh3 *objp)
 {
-	 if (!xdr_bytes (xdrs, (char **)&objp->data.data_val, (u_int *) &objp->data.data_len, NFS3_FHSIZE))
+	 if (!xdr_uint32 (xdrs, (u_int *) &objp->data.data_len))
+		 return FALSE;
+	 if (!xdr_opaque (xdrs, (char *) objp->data.data_val, (u_int) objp->data.data_len))
 		 return FALSE;
 	return TRUE;
 }
-- 
1.7.1


