[Gluster-devel] compound fop design first cut

Wed Dec 9 14:41:03 UTC 2015

On 12/09/2015 02:37 AM, Soumya Koduri wrote:
>
>
> On 12/09/2015 11:44 AM, Pranith Kumar Karampuri wrote:
>>
>>
>> On 12/09/2015 06:37 AM, Vijay Bellur wrote:
>>> On 12/08/2015 03:45 PM, Jeff Darcy wrote:
>>>>
>>>>
>>>>
>>>> On December 8, 2015 at 12:53:04 PM, Ira Cooper (ira at redhat.com) wrote:
>>>>> Raghavendra Gowdappa writes:
>>>>> I propose that we define a "compound op" that contains ops.
>>>>>
>>>>> Within each op, there are fields that can be "inherited" from the
>>>>> previous op, via use of a sentinel value.
>>>>>
>>>>> Sentinel is -1, for all of these examples.
>>>>>
>>>>> So:
>>>>>
>>>>> LOOKUP (1, "foo") (Sets the gfid value to be picked up by compounding;
>>>>> 1 is the root directory, as a gfid, by convention.)
>>>>> OPEN(-1, O_RDWR) (Uses the gfid value, sets the glfd compound value.)
>>>>> WRITE(-1, "foo", 3) (Uses the glfd compound value.)
>>>>> CLOSE(-1) (Uses the glfd compound value)
>>>>
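To make the sentinel idea above concrete, here is a minimal sketch in C of how a compound executor might resolve "inherited" arguments. Everything in it is an assumption for illustration: the names (sub_op, compound_ctx, run_compound), the fake gfid/fd values, and the rule that LOOKUP/OPEN consume a gfid while WRITE/CLOSE consume an fd. It is not taken from any existing GlusterFS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMPOUND_INHERIT (-1) /* sentinel: "use what the previous op produced" */
#define FAKE_GFID 42          /* hypothetical gfid that LOOKUP "returns" */
#define FAKE_FD   7           /* hypothetical fd that OPEN "returns" */

enum op_kind { OP_LOOKUP, OP_OPEN, OP_WRITE, OP_CLOSE };

struct sub_op {
    enum op_kind kind;
    int64_t      handle; /* gfid (LOOKUP/OPEN) or fd (WRITE/CLOSE), or sentinel */
};

/* Values forwarded from one sub-operation to the next. */
struct compound_ctx {
    int64_t gfid;
    int64_t fd;
};

/* Return the explicit handle, or the inherited one when the sentinel is set. */
static int64_t resolve_handle(const struct sub_op *op,
                              const struct compound_ctx *ctx)
{
    if (op->handle != COMPOUND_INHERIT)
        return op->handle;
    return (op->kind == OP_LOOKUP || op->kind == OP_OPEN) ? ctx->gfid : ctx->fd;
}

/* Run the chain in order, recording the handle each op actually used.
 * Returns the number of ops executed; stops early if an op asks to
 * inherit a value that no earlier op has set. */
size_t run_compound(const struct sub_op *ops, size_t n, int64_t *resolved)
{
    struct compound_ctx ctx = { COMPOUND_INHERIT, COMPOUND_INHERIT };
    size_t i;

    assert(n == 0 || (ops != NULL && resolved != NULL));

    for (i = 0; i < n; i++) {
        int64_t h = resolve_handle(&ops[i], &ctx);
        if (h == COMPOUND_INHERIT)
            break; /* nothing to inherit */
        resolved[i] = h;
        switch (ops[i].kind) {
        case OP_LOOKUP: ctx.gfid = FAKE_GFID;      break; /* sets gfid */
        case OP_OPEN:   ctx.fd = FAKE_FD;          break; /* sets fd   */
        case OP_WRITE:                             break;
        case OP_CLOSE:  ctx.fd = COMPOUND_INHERIT; break; /* fd is gone */
        }
    }
    return i;
}
```

Feeding it the LOOKUP/OPEN/WRITE/CLOSE chain from the example resolves the three sentinel values to the gfid and fd set by the earlier ops, while a lone WRITE(-1) executes nothing because there is no fd to inherit.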
>>>> So, basically, what the programming-language types would call futures
>>>> and promises.  It’s a good and well studied concept, which is necessary
>>>> to solve the second-order problem of how to specify an argument in
>>>> sub-operation N+1 that’s not known until sub-operation N completes.
>>>>
>>>> To be honest, some of the highly general approaches suggested here
>>>> scare me too.  Wrapping up the arguments for one sub-operation in
>>>> xdata for another would get pretty hairy if we ever try to go beyond
>>>> two sub-operations and have to nest sub-operation #3’s args within
>>>> sub-operation #2’s xdata which is itself encoded within sub-operation
>>>> #1’s xdata.  There’s also not much clarity about how to handle errors
>>>> in that model.  Encoding N sub-operations’ arguments in a linear
>>>> structure as Shyam proposes seems a bit cleaner that way.  If I were
>>>> to continue down that route I’d suggest just having start_compound and
>>>> end_compound fops, plus an extra field (or by-convention xdata key)
>>>> that either the client-side or server-side translator could use to
>>>> build whatever structure it wants and schedule sub-operations however
>>>> it wants.
>>>>
>>>> However, I’d be even more comfortable with an even simpler approach
>>>> that avoids the need to solve what the database folks (who have dealt
>>>> with complex transactions for years) would tell us is a really hard
>>>> problem.  Instead of designing for every case we can imagine, let’s
>>>> design for the cases that we know would be useful for improving
>>>> performance.  Open plus read/write plus close is an obvious one.
>>>> Raghavendra mentions create+inodelk as well.  For each of those, we
>>>> can easily define a structure that contains the necessary fields, we
>>>> don’t need a client-side translator, and the server-side translator
>>>> can take care of “forwarding” results from one sub-operation to the
>>>> next.  We could even use GF_FOP_IPC to prototype this.  If we later
>>>> find that the number of “one-off” compound requests is growing too
>>>> large, then at least we’ll have some experience to guide our design
>>>> of a more general alternative.  Right now, I think we’re trying to
>>>> look further ahead than we can see clearly.
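
As a rough illustration of the "one structure per known compound" idea, here is a sketch of an open+write+close request in C. It uses plain POSIX calls on a local file as a stand-in for the real fops, and the names compound_owc_req, compound_owc_rsp and compound_open_write_close are made up for this example. The point is only that the fd produced by the open is "forwarded" to the write and close on the serving side, so the caller never sees it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request: every field the three sub-operations need. */
struct compound_owc_req {
    const char *path;
    int         flags;
    mode_t      mode;
    const void *buf;
    size_t      len;
};

/* Hypothetical response: one errno plus the write result. */
struct compound_owc_rsp {
    int     op_errno; /* 0 on success, else errno of the first failure */
    ssize_t written;  /* bytes written, or -1 if the write did not run/failed */
};

/* Execute open, write and close as one unit. Returns 0 on success, -1 on
 * failure. A short write is reported as-is in rsp->written. */
int compound_open_write_close(const struct compound_owc_req *req,
                              struct compound_owc_rsp *rsp)
{
    int fd;

    assert(req != NULL && rsp != NULL);
    rsp->op_errno = 0;
    rsp->written = -1;

    fd = open(req->path, req->flags, req->mode);
    if (fd < 0) {
        rsp->op_errno = errno; /* open failed: skip write and close */
        return -1;
    }

    rsp->written = write(fd, req->buf, req->len); /* fd forwarded from open */
    if (rsp->written < 0)
        rsp->op_errno = errno;

    /* Always close; keep the first error if the write already failed. */
    if (close(fd) < 0 && rsp->op_errno == 0)
        rsp->op_errno = errno;

    return rsp->op_errno ? -1 : 0;
}
```

Error handling is deliberately simple here: the first failure wins and later sub-operations are skipped, except for close, which always runs once the open has succeeded.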
>> Yes, agreed. This makes the implementation on the client side simpler as
>> well, so it is welcome.
>>
>> Just updating the solution.
>> 1) New RPCs are going to be implemented.
>> 2) The client stack will use these new fops.
>> 3) On the server side, the server xlator will implement these new fops:
>> it decodes the RPC request, then calls resolve_resume and a
>> compound-op-receiver (a better name for this is welcome), which sends
>> one op after another and then sends the compound fop response.

@Pranith, I assume you would expand on this at a later date (something
along the lines of what Soumya has done below), right?

>>
>> List of compound fops identified so far:
>> Swift/S3:
>> PUT: creat(), write()s, setxattr(), fsync(), close(), rename()
>>
>> Dht:
>> mkdir + inodelk
>>
>> Afr:
>> xattrop+writev, xattrop+unlock to begin with.
>>
>> Could everyone who needs compound fops add to this list?
>>
>> I see that Niels is back on 14th. Does anyone else know the list of
>> compound fops he has in mind?
>>
> From the discussions we had with Niels regarding Kerberos support
> on GlusterFS, I think the following set of compound fops is
> required.
>
> set_uid +
> set_gid +
> set_lkowner (or kerberos principal name) +
> actual_fop
>
> Also, gfapi does a lookup (the first time, or to refresh the inode)
> before performing the actual fops most of the time. It may really help
> if we can club such fops -

@Soumya +5 (just a random number :) )

This came to my mind as well, and is a good candidate for compounding.

>
> LOOKUP + FOP (OPEN etc)
>
> Coming to the design proposed, I agree with Shyam, Ira and Jeff's
> thoughts. Defining a different compound fop for each specific set of
> operations and wrapping up those arguments in xdata seems rather complex
> and difficult to maintain going forward. Having worked with NFS, may I
> suggest that we follow (or work along similar lines to) the approach
> taken by the NFS protocol to define and implement compound procedures?
>
>     The basic structure of the NFS COMPOUND procedure is:
>
>     +-----+--------------+--------+-----------+-----------+-----------+--
>     | tag | minorversion | numops | op + args | op + args | op + args |
>     +-----+--------------+--------+-----------+-----------+-----------+--
>
>     and the reply's structure is:
>
>        +------------+-----+--------+-----------------------+--
>        |last status | tag | numres | status + op + results |
>        +------------+-----+--------+-----------------------+--
>
> Each compound procedure will contain the number of operations followed
> by the list of 'op_code+arguments_for_that_fop'.
>
> So, along similar lines, we just need to define a new RPC structure for
> COMPOUND fops (something like below) and xdr encode/decode each of
> the ops based on the op number.
>
>       /* One operation: its op number plus the matching arguments. */
>       struct argop {
>               uint32_t    op_num;
>               union argop_u switch (op_num) {
>                   case <OPCODE>: <argument>;
>                   ...
>               } op_args;
>       };
>
>       struct COMPOUNDargs {
>               uint32_t    version;
>               uint32_t    numops;
>               argop       argarray<>;
>       };
>
>     RESULT
>
>       /* One result, discriminated by the op number it answers. */
>       union resop switch (opnum resop) {
>               case <OPCODE>: <result>;
>               ...
>       };
>
>       struct COMPOUNDres {
>               uint32_t    status;
>               resop       resarray<>;
>       };
>
> An xlator that would like to club fops can build this new COMPOUND
> fop with the list of operations. For example, AFR can construct this
> compound fop as
>
> compound_fop (struct COMPOUNDargs c_args);
>
> c_args.version = 1;
> c_args.numops = 2;
> c_args.argarray[0].op_num = fxattr_op_num;
> c_args.argarray[0].op_args = fxattr_op_args;
> c_args.argarray[1].op_num = writev_op_num;
> c_args.argarray[1].op_args = writev_op_args;
>
> On the server side, the new compound xlator, on receiving this compound
> fop, can split the fops and execute them one by one, as you already
> mentioned.
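
For what it is worth, here is a small C sketch of how such a server-side loop could look, following the NFS convention that processing stops at the first failing op and the overall status is that of the last op evaluated. The op numbers, the argument structs, the stub handlers and the function compound_execute are all hypothetical; a real implementation would wind each op down the xlator stack instead of calling a stub.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_FXATTROP = 1, OP_WRITEV = 2 }; /* hypothetical op numbers */

struct fxattrop_args { int64_t fd; int32_t delta; };
struct writev_args   { int64_t fd; const char *buf; uint32_t len; };

struct argop {
    uint32_t op_num;
    union {
        struct fxattrop_args fxattrop;
        struct writev_args   writev;
    } op_args;
};

struct resop { uint32_t op_num; int32_t status; };

struct compound_args { uint32_t version; uint32_t numops; const struct argop *argarray; };
struct compound_res  { int32_t status; uint32_t numres; struct resop *resarray; };

/* Stub handlers standing in for the real fops; 0 means success. */
static int32_t do_fxattrop(const struct fxattrop_args *a) { (void)a; return 0; }
static int32_t do_writev(const struct writev_args *a)
{
    return (a->buf != NULL && a->len > 0) ? 0 : -EINVAL;
}

/* Pick the handler from the op number, as an XDR decoder would. */
static int32_t dispatch(const struct argop *op)
{
    switch (op->op_num) {
    case OP_FXATTROP: return do_fxattrop(&op->op_args.fxattrop);
    case OP_WRITEV:   return do_writev(&op->op_args.writev);
    default:          return -ENOSYS; /* unknown op */
    }
}

/* Execute ops in order; stop at the first failure. res->resarray must have
 * room for args->numops entries. */
void compound_execute(const struct compound_args *args, struct compound_res *res)
{
    uint32_t i;

    assert(args != NULL && res != NULL && res->resarray != NULL);
    res->status = 0;
    res->numres = 0;

    for (i = 0; i < args->numops; i++) {
        const struct argop *op = &args->argarray[i];
        int32_t st = dispatch(op);

        res->resarray[i].op_num = op->op_num;
        res->resarray[i].status = st;
        res->numres = i + 1;
        res->status = st; /* status of the last op evaluated */
        if (st != 0)
            break;        /* later ops are not attempted */
    }
}
```

With this shape, the caller can tell from numres exactly how far the compound got, which also gives a fairly clear answer to the error-handling question raised earlier in the thread.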
>
> Any thoughts?
>
> Thanks,
> Soumya
>
>
>> Pranith.
>>>
>>> Starting with a well defined set of operations for compounding has its
>>> advantages. It would be easier to understand and maintain correctness
>>> across the stack. Some of our translators perform transactions &
>>> create/update internal metadata for certain fops. It would be easier
>>> for such translators if the compound operations are well defined and
>>> do not entail deep introspection of a generic representation to
>>> ensure that the right behavior gets reflected at the end of a compound
>>> operation.
>>>
>>> -Vijay
>>>
>>>
>>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel