[Gluster-devel] compound fop design first cut

Wed Dec 9 18:18:46 UTC 2015

On 12/09/2015 08:11 PM, Shyam wrote:
> On 12/09/2015 02:37 AM, Soumya Koduri wrote:
>>
>>
>> On 12/09/2015 11:44 AM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>> On 12/09/2015 06:37 AM, Vijay Bellur wrote:
>>>> On 12/08/2015 03:45 PM, Jeff Darcy wrote:
>>>>>
>>>>>
>>>>>
>>>>> On December 8, 2015 at 12:53:04 PM, Ira Cooper (ira at redhat.com) wrote:
>>>>>> Raghavendra Gowdappa writes:
>>>>>> I propose that we define a "compound op" that contains ops.
>>>>>>
>>>>>> Within each op, there are fields that can be "inherited" from the
>>>>>> previous op, via use of a sentinel value.
>>>>>>
>>>>>> Sentinel is -1, for all of these examples.
>>>>>>
>>>>>> So:
>>>>>>
>>>>>> LOOKUP (1, "foo") (Sets the gfid value to be picked up by compounding;
>>>>>> 1 is the root directory, as a gfid, by convention.)
>>>>>> OPEN(-1, O_RDWR) (Uses the gfid value, sets the glfd compound value.)
>>>>>> WRITE(-1, "foo", 3) (Uses the glfd compound value.)
>>>>>> CLOSE(-1) (Uses the glfd compound value.)
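
A minimal C sketch of the sentinel idea quoted above, assuming that LOOKUP
and OPEN take a gfid while WRITE and CLOSE take an fd. The names
run_compound, STUB_GFID and STUB_FD are hypothetical, and the ops are stubs
that only record which handle each one would act on; nothing here is
GlusterFS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INHERIT   (-1)  /* sentinel from the example above */
#define STUB_GFID 42    /* hypothetical gfid a real LOOKUP would return */
#define STUB_FD   7     /* hypothetical fd a real OPEN would return */

enum op_kind { OP_LOOKUP, OP_OPEN, OP_WRITE, OP_CLOSE };

struct op {
    enum op_kind kind;
    int64_t      handle;  /* gfid for LOOKUP/OPEN, fd for WRITE/CLOSE */
};

/* Walks the ops in order and stores in resolved[i] the handle op i acts on.
 * Returns 0 on success, or -1 if an op asks to inherit a value that no
 * earlier op in the compound has produced. */
static int run_compound(const struct op *ops, size_t n, int64_t *resolved)
{
    int64_t gfid = INHERIT, fd = INHERIT;  /* nothing produced yet */

    assert(ops != NULL && resolved != NULL);
    for (size_t i = 0; i < n; i++) {
        int wants_gfid = (ops[i].kind == OP_LOOKUP || ops[i].kind == OP_OPEN);
        int64_t h = ops[i].handle;

        if (h == INHERIT)
            h = wants_gfid ? gfid : fd;   /* pick up the carried value */
        if (h == INHERIT)
            return -1;                    /* nothing to inherit from */
        resolved[i] = h;

        switch (ops[i].kind) {            /* stubbed results of each op */
        case OP_LOOKUP: gfid = STUB_GFID; break;
        case OP_OPEN:   fd = STUB_FD;     break;
        case OP_WRITE:                    break;
        case OP_CLOSE:  fd = INHERIT;     break;
        }
    }
    return 0;
}
```

Feeding it the four ops from the example (LOOKUP with an explicit 1, then
OPEN, WRITE and CLOSE with -1) resolves the handles to 1, STUB_GFID,
STUB_FD and STUB_FD, while a lone WRITE(-1) is rejected.
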
>>>>>
>>>>> So, basically, what the programming-language types would call futures
>>>>> and promises.  It’s a good and well studied concept, which is necessary
>>>>> to solve the second-order problem of how to specify an argument in
>>>>> sub-operation N+1 that’s not known until sub-operation N completes.
>>>>>
>>>>> To be honest, some of the highly general approaches suggested here scare
>>>>> me too.  Wrapping up the arguments for one sub-operation in xdata for
>>>>> another would get pretty hairy if we ever try to go beyond two
>>>>> sub-operations and have to nest sub-operation #3’s args within
>>>>> sub-operation #2’s xdata which is itself encoded within sub-operation
>>>>> #1’s xdata.  There’s also not much clarity about how to handle errors in
>>>>> that model.  Encoding N sub-operations’ arguments in a linear structure
>>>>> as Shyam proposes seems a bit cleaner that way.  If I were to continue
>>>>> down that route I’d suggest just having start_compound and end_compound
>>>>> fops, plus an extra field (or by-convention xdata key) that either the
>>>>> client-side or server-side translator could use to build whatever
>>>>> structure it wants and schedule sub-operations however it wants.
>>>>>
>>>>> However, I’d be even more comfortable with an even simpler approach that
>>>>> avoids the need to solve what the database folks (who have dealt with
>>>>> complex transactions for years) would tell us is a really hard problem.
>>>>> Instead of designing for every case we can imagine, let’s design for the
>>>>> cases that we know would be useful for improving performance. Open plus
>>>>> read/write plus close is an obvious one.  Raghavendra mentions
>>>>> create+inodelk as well.  For each of those, we can easily define a
>>>>> structure that contains the necessary fields, we don’t need a
>>>>> client-side translator, and the server-side translator can take care of
>>>>> “forwarding” results from one sub-operation to the next. We could even
>>>>> use GF_FOP_IPC to prototype this.  If we later find that the number of
>>>>> “one-off” compound requests is growing too large, then at least we’ll
>>>>> have some experience to guide our design of a more general alternative.
>>>>> Right now, I think we’re trying to look further ahead than we can see
>>>>> clearly.
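
To illustrate the "one structure per known case" approach quoted above,
here is a rough C sketch of an open + write + close request handled in a
single call, with the fd forwarded from one step to the next on the
receiving side. The names owc_req, owc_rsp and owc_handle are hypothetical,
a plain path stands in for a gfid, and POSIX calls stand in for the real
fops; this is not how the server xlator would actually be written.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-purpose request: every field the three steps need. */
struct owc_req {
    const char *path;    /* stand-in for a gfid */
    int         flags;   /* open(2) flags */
    off_t       offset;  /* where to write */
    const void *buf;     /* data to write */
    size_t      len;     /* number of bytes in buf */
};

struct owc_rsp {
    int     op_errno;    /* 0 on success, else errno of the failing step */
    ssize_t written;     /* bytes written, or -1 if the write did not succeed */
};

/* Runs open, write and close back to back; the fd produced by open is
 * forwarded to the later steps without another round trip to the caller. */
static struct owc_rsp owc_handle(const struct owc_req *req)
{
    struct owc_rsp rsp = { 0, -1 };
    int fd;

    assert(req != NULL);
    fd = open(req->path, req->flags, 0600);
    if (fd < 0) {
        rsp.op_errno = errno;            /* open failed: skip the rest */
        return rsp;
    }
    rsp.written = pwrite(fd, req->buf, req->len, req->offset);
    if (rsp.written < 0)
        rsp.op_errno = errno;
    if (close(fd) < 0 && rsp.op_errno == 0)
        rsp.op_errno = errno;            /* report close errors too */
    return rsp;
}
```

The trade-off is the one described above: each new combination needs its
own structure and handler, but error handling stays local and easy to
follow.
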
>>> Yes, agreed. This makes the implementation on the client side simpler as
>>> well, so it is welcome.
>>>
>>> Just updating the solution.
>>> 1) New RPCs are going to be implemented.
>>> 2) The client stack will use these new fops.
>>> 3) On the server side, the server xlator implements these new fops to
>>> decode the RPC request and then calls resolve_resume. A
>>> compound-op-receiver (a better name for this is welcome) then sends one
>>> op after another and sends the compound fop response.
>
> @Pranith, I assume you would expand on this at a later date (something
> along the lines of what Soumya has done below), right?

I will talk to her tomorrow to learn more about this. I am not saying this
is what I will be implementing (there doesn't seem to be any consensus
yet), but I would love to know how it is implemented.

Pranith
>
>>>
>>> List of compound fops identified so far:
>>> Swift/S3:
>>> PUT: creat(), write()s, setxattr(), fsync(), close(), rename()
>>>
>>> Dht:
>>> mkdir + inodelk
>>>
>>> Afr:
>>> xattrop+writev, xattrop+unlock to begin with.
>>>
>>> Could everyone who needs compound fops add to this list?
>>>
>>> I see that Niels is back on 14th. Does anyone else know the list of
>>> compound fops he has in mind?
>>>
>> From the discussions we had with Niels regarding Kerberos support
>> on GlusterFS, I think the following set of compound fops is
>> required.
>>
>> set_uid +
>> set_gid +
>> set_lkowner (or kerberos principal name) +
>> actual_fop
>>
>> Also, gfapi does a lookup (the first time, or to refresh the inode) before
>> performing the actual fops most of the time. It may really help if we can
>> club such fops -
>
> @Soumya +5 (just a random number :) )
>
> This came to my mind as well, and is a good candidate for compounding.
>
>>
>> LOOKUP + FOP (OPEN etc)
>>
>> Coming to the design proposed, I agree with Shyam, Ira and Jeff's
>> thoughts. Defining different compound fops for each specific set of
>> operations and wrapping up those arguments in xdata seems rather complex
>> and difficult to maintain going forward. Having worked with NFS,
>> may I suggest that we follow (or work along similar lines to) the approach
>> taken by the NFS protocol to define and implement compound procedures?
>>
>>     The basic structure of the NFS COMPOUND procedure is:
>>
>>     +-----+--------------+--------+-----------+-----------+-----------+--
>>     | tag | minorversion | numops | op + args | op + args | op + args |
>>     +-----+--------------+--------+-----------+-----------+-----------+--
>>
>>     and the reply's structure is:
>>
>>        +------------+-----+--------+-----------------------+--
>>        |last status | tag | numres | status + op + results |
>>        +------------+-----+--------+-----------------------+--
>>
>> Each compound procedure will contain the number of operations followed
>> by the list of 'op_code+arguments_for_that_fop'
>>
>> So on similar lines, we just need to define new RPC structure for
>> COMPOUND fops (something like below) and xdr encode/decode of each of
>> the ops based on the op number.
>>
>>       struct argop {
>>               uint32_t    op_num;
>>               union switch (op_num) {
>>                       case <OPCODE>: <argument>;
>>                       ...
>>               } op_args;
>>       };
>>
>>       struct COMPOUNDargs {
>>               uint32_t    version;
>>               uint32_t    numops;
>>               argop       argarray<>;
>>       };
>>
>>     RESULT
>>
>>       union resop switch (opnum resop) {
>>               case <OPCODE>: <result>;
>>               ...
>>       };
>>
>>       struct COMPOUNDres {
>>               uint32_t    status;
>>               resop       resarray<>;
>>       };
>>
>> An xlator that would like to club fops can build this new COMPOUND
>> fop with its list of operations. For example, AFR can construct this
>> compound fop as
>>
>> compound_fop (struct COMPOUNDargs c_args);
>>
>> c_args.version = 1;
>> c_args.numops = 2;
>> c_args.argarray[0].op_num = fxattr_op_num;
>> c_args.argarray[0].op_args = fxattr_op_args;
>> c_args.argarray[1].op_num = writev_op_num;
>> c_args.argarray[1].op_args = writev_op_args;
>>
>> On the server side, the new compound xlator, on receiving this compound
>> fop, can split it into its fops and execute them one by one, as you have
>> already mentioned.
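
The request layout and the server-side split described above could be
sketched in C roughly as follows. The op numbers, the argument structs, the
MAX_OPS bound, run_one() and compound_execute() are all hypothetical
stand-ins, and the individual fops are stubs. Stopping at the first failing
op and reporting its status as the overall status is borrowed from how NFSv4
COMPOUND behaves; it is an assumption here, not something agreed in this
thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OPS 8  /* arbitrary bound for this sketch */

enum fop_num { FOP_FXATTROP = 1, FOP_WRITEV = 2 };  /* hypothetical numbers */

struct fxattrop_args { int64_t fd; int32_t delta; };
struct writev_args   { int64_t fd; const char *buf; uint32_t len; };

struct argop {
    uint32_t op_num;                  /* selects the union member below */
    union {
        struct fxattrop_args fxattrop;
        struct writev_args   writev;
    } op_args;
};

struct compound_args {
    uint32_t     version;
    uint32_t     numops;
    struct argop argarray[MAX_OPS];
};

struct compound_res {
    int32_t  status;        /* status of the last op that was executed */
    uint32_t numres;        /* number of ops that were executed */
    int32_t  res[MAX_OPS];  /* per-op status */
};

/* Stub dispatcher: a real one would wind each fop down the xlator stack. */
static int32_t run_one(const struct argop *a)
{
    switch (a->op_num) {
    case FOP_FXATTROP: return 0;
    case FOP_WRITEV:   return a->op_args.writev.buf ? 0 : -EFAULT;
    default:           return -EOPNOTSUPP;
    }
}

/* Executes the ops one by one, stopping at the first failure. */
static struct compound_res compound_execute(const struct compound_args *c)
{
    struct compound_res r = { 0, 0, { 0 } };

    assert(c != NULL && c->numops <= MAX_OPS);
    for (uint32_t i = 0; i < c->numops; i++) {
        r.res[i] = run_one(&c->argarray[i]);
        r.status = r.res[i];
        r.numres = i + 1;
        if (r.status != 0)
            break;
    }
    return r;
}
```

With the AFR example (fxattrop followed by writev), both ops run and the
status is 0. If the first op fails, numres is 1 and the second op is never
attempted.
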
>>
>> Any thoughts?
>>
>> Thanks,
>> Soumya
>>
>>
>>> Pranith.
>>>>
>>>> Starting with a well defined set of operations for compounding has its
>>>> advantages. It would be easier to understand and maintain correctness
>>>> across the stack. Some of our translators perform transactions &
>>>> create/update internal metadata for certain fops. It would be easier
>>>> for such translators if the compound operations are well defined and
>>>> do not entail deep introspection of a generic representation to
>>>> ensure that the right behavior gets reflected at the end of a compound
>>>> operation.
>>>>
>>>> -Vijay
>>>>
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-devel