[Gluster-devel] compound fop design first cut

Soumya Koduri skoduri at redhat.com
Wed Dec 9 07:37:11 UTC 2015

On 12/09/2015 11:44 AM, Pranith Kumar Karampuri wrote:
> On 12/09/2015 06:37 AM, Vijay Bellur wrote:
>> On 12/08/2015 03:45 PM, Jeff Darcy wrote:
>>> On December 8, 2015 at 12:53:04 PM, Ira Cooper (ira at redhat.com) wrote:
>>>> Raghavendra Gowdappa writes:
>>>> I propose that we define a "compound op" that contains ops.
>>>> Within each op, there are fields that can be "inherited" from the
>>>> previous op, via use of a sentinel value.
>>>> Sentinel is -1, for all of these examples.
>>>> So:
>>>> LOOKUP (1, "foo") (Sets the gfid value to be picked up by compounding;
>>>> 1 is the root directory, as a gfid, by convention.)
>>>> OPEN(-1, O_RDWR) (Uses the gfid value, sets the glfd compound value.)
>>>> WRITE(-1, "foo", 3) (Uses the glfd compound value.)
>>>> CLOSE(-1) (Uses the glfd compound value)
>>> So, basically, what the programming-language types would call futures
>>> and promises.  It’s a good and well studied concept, which is necessary
>>> to solve the second-order problem of how to specify an argument in
>>> sub-operation N+1 that’s not known until sub-operation N completes.
>>> To be honest, some of the highly general approaches suggested here scare
>>> me too.  Wrapping up the arguments for one sub-operation in xdata for
>>> another would get pretty hairy if we ever try to go beyond two
>>> sub-operations and have to nest sub-operation #3’s args within
>>> sub-operation #2’s xdata which is itself encoded within sub-operation
>>> #1’s xdata.  There’s also not much clarity about how to handle errors in
>>> that model.  Encoding N sub-operations’ arguments in a linear structure
>>> as Shyam proposes seems a bit cleaner that way.  If I were to continue
>>> down that route I’d suggest just having start_compound and end_compound
>>> fops, plus an extra field (or by-convention xdata key) that either the
>>> client-side or server-side translator could use to build whatever
>>> structure it wants and schedule sub-operations however it wants.
>>> However, I’d be even more comfortable with an even simpler approach that
>>> avoids the need to solve what the database folks (who have dealt with
>>> complex transactions for years) would tell us is a really hard problem.
>>> Instead of designing for every case we can imagine, let’s design for the
>>> cases that we know would be useful for improving performance. Open plus
>>> read/write plus close is an obvious one.  Raghavendra mentions
>>> create+inodelk as well.  For each of those, we can easily define a
>>> structure that contains the necessary fields, we don’t need a
>>> client-side translator, and the server-side translator can take care of
>>> “forwarding” results from one sub-operation to the next.  We could even
>>> use GF_FOP_IPC to prototype this.  If we later find that the number of
>>> “one-off” compound requests is growing too large, then at least we’ll
>>> have some experience to guide our design of a more general alternative.
>>> Right now, I think we’re trying to look further ahead than we can see
>>> clearly.
> Yes, agreed. This makes implementation on the client side simpler as well,
> so it is welcome.
> Just updating the solution:
> 1) New RPCs are going to be implemented.
> 2) The client stack will use these new fops.
> 3) On the server side we have the server xlator implementing these new fops
> to decode the RPC request, then resolve_resume, and a
> compound-op-receiver (a better name for this is welcome) which sends one op
> after the other and then sends the compound fop response.
> List of compound fops identified so far:
> Swift/S3:
> PUT: creat(), write()s, setxattr(), fsync(), close(), rename()
> Dht:
> mkdir + inodelk
> Afr:
> xattrop+writev, xattrop+unlock to begin with.
> Could everyone who needs compound fops add to this list?
> I see that Niels is back on 14th. Does anyone else know the list of
> compound fops he has in mind?
From the discussions we had with Niels regarding Kerberos support in
GlusterFS, I think the following set of compound fops is required:

set_uid +
set_gid +
set_lkowner (or kerberos principal name) +

Also, gfapi does a lookup (the first time, or to refresh the inode) before
performing the actual fops most of the time. It may really help if we can
club such fops -


Coming to the design proposed, I agree with Shyam's, Ira's and Jeff's
thoughts. Defining a different compound fop for each specific set of
operations and wrapping up those arguments in xdata seems rather complex
and difficult to maintain going forward. Having worked with NFS, may I
suggest that we follow (or work along similar lines to) the approach
taken by the NFS protocol to define and implement compound procedures?

    The basic structure of the NFS COMPOUND procedure is:

    | tag | minorversion | numops | op + args | op + args | op + args |

    and the reply's structure is:

    | last status | tag | numres | status + op + results |

Each compound procedure contains the number of operations, followed by
the list of 'op_code + arguments_for_that_fop' entries.

So, along similar lines, we just need to define a new RPC structure for
COMPOUND fops (something like below) and XDR-encode/decode each of
the ops based on its op number.

      union argop switch (uint32_t op_num) {
              case <OPCODE>: <argument>;
      };

      struct COMPOUNDargs {
              uint32_t    version;
              uint32_t    numops;
              argop       argarray<>;
      };

      union resop switch (uint32_t op_num) {
              case <OPCODE>: <result>;
      };

      struct COMPOUNDres {
              uint32_t    status;
              resop       resarray<>;
      };

An xlator which would like to club fops can define this new COMPOUND
fop with the list of operations. For example, AFR can construct this
compound fop as

compound_fop (struct COMPOUNDargs c_args);

c_args.version = 1;
c_args.numops = 2;
c_args.argarray[0].op_args = fxattr_op_args;
c_args.argarray[1].op_args = writev_op_args;
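
To make the construction above a bit more concrete, here is a rough C
sketch of how the request could look as a tagged union. Everything in it
is an assumption made for illustration: the op codes, the argument
structs, MAX_OPS, compound_add() and build_afr_compound() are hypothetical
names, not existing GlusterFS definitions, and the real structures would
be generated from the XDR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical op codes; the real fop numbers would be used instead. */
enum op_code { OP_FXATTROP = 1, OP_WRITEV = 2 };

/* Placeholder argument types, not the real GlusterFS request structs. */
struct fxattrop_args { int64_t fd; int32_t flags; };
struct writev_args   { int64_t fd; uint64_t offset; uint32_t size; };

/* One entry of the request: the op code selects the valid union member. */
struct argop {
    uint32_t op_num;
    union {
        struct fxattrop_args fxattrop;
        struct writev_args   writev;
    } u;
};

#define MAX_OPS 8 /* arbitrary limit for this sketch */

struct compound_args {
    uint32_t     version;
    uint32_t     numops;
    struct argop argarray[MAX_OPS];
};

/* Append one op; returns 0 on success, -1 if the request is full. */
int compound_add(struct compound_args *c, const struct argop *op)
{
    assert(c != NULL && op != NULL);
    if (c->numops >= MAX_OPS)
        return -1;
    c->argarray[c->numops++] = *op;
    return 0;
}

/* Build the AFR-style request: fxattrop followed by writev. */
int build_afr_compound(struct compound_args *c, int64_t fd,
                       uint64_t offset, uint32_t size)
{
    struct argop op;

    memset(c, 0, sizeof(*c));
    c->version = 1;

    memset(&op, 0, sizeof(op));
    op.op_num = OP_FXATTROP;
    op.u.fxattrop.fd = fd;
    if (compound_add(c, &op) != 0)
        return -1;

    memset(&op, 0, sizeof(op));
    op.op_num = OP_WRITEV;
    op.u.writev.fd = fd;
    op.u.writev.offset = offset;
    op.u.writev.size = size;
    return compound_add(c, &op);
}
```

Keeping numops next to the array means ops are always appended in order,
so the receiver can walk argarray[0..numops-1] without any nesting.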

On the server side, the new compound xlator, on receiving this compound
fop, can split it into the individual fops and execute them one by one,
as you already mentioned.

Any thoughts?


> Pranith.
>> Starting with a well defined set of operations for compounding has its
>> advantages. It would be easier to understand and maintain correctness
>> across the stack. Some of our translators perform transactions &
>> create/update internal metadata for certain fops. It would be easier
>> for such translators if the compound operations are well defined and
>> does not entail deep introspection of a generic representation to
>> ensure that the right behavior gets reflected at the end of a compound
>> operation.
>> -Vijay
