[Gluster-devel] compound fop design first cut

Fri Dec 11 10:02:13 UTC 2015

On 12/09/2015 11:48 PM, Pranith Kumar Karampuri wrote:
>
>
> On 12/09/2015 08:11 PM, Shyam wrote:
>> On 12/09/2015 02:37 AM, Soumya Koduri wrote:
>>>
>>>
>>> On 12/09/2015 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>> On 12/09/2015 06:37 AM, Vijay Bellur wrote:
>>>>> On 12/08/2015 03:45 PM, Jeff Darcy wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On December 8, 2015 at 12:53:04 PM, Ira Cooper (ira at redhat.com) wrote:
>>>>>>> Raghavendra Gowdappa writes:
>>>>>>> I propose that we define a "compound op" that contains ops.
>>>>>>>
>>>>>>> Within each op, there are fields that can be "inherited" from the
>>>>>>> previous op, via use of a sentinel value.
>>>>>>>
>>>>>>> Sentinel is -1, for all of these examples.
>>>>>>>
>>>>>>> So:
>>>>>>>
>>>>>>> LOOKUP (1, "foo") (Sets the gfid value to be picked up by compounding;
>>>>>>> 1 is the root directory, as a gfid, by convention.)
>>>>>>> OPEN(-1, O_RDWR) (Uses the gfid value, sets the glfd compound value.)
>>>>>>> WRITE(-1, "foo", 3) (Uses the glfd compound value.)
>>>>>>> CLOSE(-1) (Uses the glfd compound value.)
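A minimal C sketch of how the sentinel could be resolved when the sub-operations above run in order. It assumes a single int64 "target" field per sub-op and uses stand-in fops instead of real ones; the names (COMPOUND_INHERIT, sub_req, run_compound) are hypothetical, and the OPEN flags are omitted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: -1 in a target field means "use the value
 * produced by an earlier sub-operation in this compound". */
#define COMPOUND_INHERIT (-1)

enum sub_op { SUB_LOOKUP, SUB_OPEN, SUB_WRITE, SUB_CLOSE };

struct sub_req {
    enum sub_op op;
    int64_t     target; /* parent gfid (LOOKUP), gfid (OPEN), fd (WRITE/CLOSE) */
    const char *data;   /* name for LOOKUP, buffer for WRITE */
    size_t      len;    /* WRITE length */
};

/* Values carried from one sub-operation to the next. */
struct compound_state {
    int64_t gfid;    /* set by LOOKUP */
    int64_t fd;      /* set by OPEN */
    size_t  written; /* accumulated by WRITE */
};

/* Stand-ins for real fops so the sketch runs on its own. */
static int64_t fake_lookup(int64_t parent, const char *name)
{
    (void)parent;
    (void)name;
    return 42; /* pretend the looked-up entry has gfid 42 */
}
static int64_t fake_open(int64_t gfid) { (void)gfid; return 7; }

/* Replace the sentinel with the matching value from earlier results. */
static int64_t resolve(const struct sub_req *r, const struct compound_state *st)
{
    if (r->target != COMPOUND_INHERIT)
        return r->target;
    return (r->op == SUB_LOOKUP || r->op == SUB_OPEN) ? st->gfid : st->fd;
}

/* Runs sub-ops in order; returns how many completed before a failure. */
static int run_compound(const struct sub_req *ops, int n,
                        struct compound_state *st)
{
    assert(st != NULL);
    st->gfid = COMPOUND_INHERIT;
    st->fd = COMPOUND_INHERIT;
    st->written = 0;

    for (int i = 0; i < n; i++) {
        int64_t target = resolve(&ops[i], st);
        if (target == COMPOUND_INHERIT)
            return i; /* nothing earlier produced the value to inherit */

        switch (ops[i].op) {
        case SUB_LOOKUP:
            st->gfid = fake_lookup(target, ops[i].data);
            break;
        case SUB_OPEN:
            st->fd = fake_open(target);
            break;
        case SUB_WRITE:
            st->written += ops[i].len;
            break;
        case SUB_CLOSE:
            break;
        }
    }
    return n;
}
```

Running LOOKUP (1, "foo"), OPEN(-1), WRITE(-1, "foo", 3), CLOSE(-1) through run_compound completes all four sub-ops; an OPEN(-1) with no earlier LOOKUP stops at index 0 because there is nothing to inherit.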
>>>>>>
>>>>>> So, basically, what the programming-language types would call futures
>>>>>> and promises.  It’s a good and well-studied concept, which is necessary
>>>>>> to solve the second-order problem of how to specify an argument in
>>>>>> sub-operation N+1 that’s not known until sub-operation N completes.
>>>>>>
>>>>>> To be honest, some of the highly general approaches suggested here scare
>>>>>> me too.  Wrapping up the arguments for one sub-operation in xdata for
>>>>>> another would get pretty hairy if we ever try to go beyond two
>>>>>> sub-operations and have to nest sub-operation #3’s args within
>>>>>> sub-operation #2’s xdata, which is itself encoded within sub-operation
>>>>>> #1’s xdata.  There’s also not much clarity about how to handle errors in
>>>>>> that model.  Encoding N sub-operations’ arguments in a linear structure
>>>>>> as Shyam proposes seems a bit cleaner that way.  If I were to continue
>>>>>> down that route I’d suggest just having start_compound and end_compound
>>>>>> fops, plus an extra field (or by-convention xdata key) that either the
>>>>>> client-side or server-side translator could use to build whatever
>>>>>> structure it wants and schedule sub-operations however it wants.
>>>>>>
>>>>>> However, I’d be even more comfortable with an even simpler approach that
>>>>>> avoids the need to solve what the database folks (who have dealt with
>>>>>> complex transactions for years) would tell us is a really hard problem.
>>>>>> Instead of designing for every case we can imagine, let’s design for the
>>>>>> cases that we know would be useful for improving performance.  Open plus
>>>>>> read/write plus close is an obvious one.  Raghavendra mentions
>>>>>> create+inodelk as well.  For each of those, we can easily define a
>>>>>> structure that contains the necessary fields, we don’t need a
>>>>>> client-side translator, and the server-side translator can take care of
>>>>>> “forwarding” results from one sub-operation to the next.  We could even
>>>>>> use GF_FOP_IPC to prototype this.  If we later find that the number of
>>>>>> “one-off” compound requests is growing too large, then at least we’ll
>>>>>> have some experience to guide our design of a more general alternative.
>>>>>> Right now, I think we’re trying to look further ahead than we can see
>>>>>> clearly.
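To make the fixed-shape alternative concrete, here is a sketch of an open+write+close request with one dedicated struct and one server-side handler, in which the fd never leaves the server. POSIX open/write/close stand in for the fops the server translator would actually wind, and the struct and function names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical fixed-shape compound request: one struct per known
 * combination instead of a generic list of ops. */
struct open_write_close_req {
    const char *path;
    int         flags;
    const void *buf;
    size_t      len;
};

struct open_write_close_rsp {
    int     op_errno;  /* 0 on success, else errno of the failing step */
    int     failed_at; /* 0 = none, 1 = open, 2 = write, 3 = close */
    ssize_t written;
};

/* Server-side handler: the fd returned by open is forwarded to write
 * and close internally. close is attempted even if write fails, so the
 * fd is not leaked; the first failure is the one reported. */
static void handle_open_write_close(const struct open_write_close_req *req,
                                    struct open_write_close_rsp *rsp)
{
    assert(req != NULL && rsp != NULL);
    memset(rsp, 0, sizeof(*rsp));

    int fd = open(req->path, req->flags, 0644);
    if (fd < 0) {
        rsp->op_errno = errno;
        rsp->failed_at = 1;
        return;
    }

    rsp->written = write(fd, req->buf, req->len);
    if (rsp->written < 0) {
        rsp->op_errno = errno;
        rsp->failed_at = 2;
    }

    if (close(fd) < 0 && rsp->failed_at == 0) {
        rsp->op_errno = errno;
        rsp->failed_at = 3;
    }
}
```

The trade-off described above shows up directly: error reporting is simple (failed_at says which step failed), but every new combination needs its own struct and handler.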
>>>> Yes, agreed. This makes the implementation on the client side simpler
>>>> as well, so it is welcome.
>>>>
>>>> Just updating the solution:
>>>> 1) New RPCs are going to be implemented.
>>>> 2) The client stack will use these new fops.
>>>> 3) On the server side, the server xlator implements these new fops: it
>>>>    decodes the RPC request, does resolve_resume, and hands off to a
>>>>    compound-op-receiver (a better name is welcome), which sends one op
>>>>    after the other and then sends the compound fop response.
>>
>> @Pranith, I assume you would expand on this at a later date (something
>> along the lines of what Soumya has done below), right?
>
> I will talk to her tomorrow to learn more about this. I am not saying
> this is what I will implement (there doesn't seem to be any consensus
> yet), but I would love to know how it is implemented.

Soumya and I discussed this, and the NFS way of packing the args seems to
work out at a high level. The sentinel-value-based approach may also be
possible. What I will do now is look at the structure in depth and work
out how all the fops mentioned in this thread can be implemented with it.
I will update you all on my findings in a couple of days.

Pranith
>
> Pranith
>>
>>>>
>>>> List of compound fops identified so far:
>>>> Swift/S3:
>>>> PUT: creat(), write()s, setxattr(), fsync(), close(), rename()
>>>>
>>>> Dht:
>>>> mkdir + inodelk
>>>>
>>>> Afr:
>>>> xattrop+writev, xattrop+unlock to begin with.
>>>>
>>>> Could everyone who needs compound fops add to this list?
>>>>
>>>> I see that Niels is back on the 14th. Does anyone else know which
>>>> compound fops he has in mind?
>>>>
>>> From the discussions we had with Niels regarding Kerberos support in
>>> GlusterFS, I think the following set of compound fops is required:
>>>
>>> set_uid +
>>> set_gid +
>>> set_lkowner (or kerberos principal name) +
>>> actual_fop
>>>
>>> Also, gfapi does a lookup (the first time, or to refresh the inode)
>>> before performing the actual fop most of the time. It may really help
>>> if we can club such fops:
>>
>> @Soumya +5 (just a random number :) )
>>
>> This came to my mind as well, and is a good candidate for compounding.
>>
>>>
>>> LOOKUP + FOP (OPEN etc)
>>>
>>> Coming to the proposed design, I agree with Shyam, Ira and Jeff's
>>> thoughts. Defining a different compound fop for each specific set of
>>> operations and wrapping up their arguments in xdata seems rather
>>> complex and difficult to maintain going forward. Having worked with
>>> NFS, I would suggest that we follow (or work along similar lines to)
>>> the approach the NFS protocol takes to define and implement compound
>>> procedures.
>>>
>>>     The basic structure of the NFS COMPOUND procedure is:
>>>
>>>     +-----+--------------+--------+-----------+-----------+-----------+--
>>>     | tag | minorversion | numops | op + args | op + args | op + args |
>>>     +-----+--------------+--------+-----------+-----------+-----------+--
>>>
>>>     and the reply's structure is:
>>>
>>>        +------------+-----+--------+-----------------------+--
>>>        |last status | tag | numres | status + op + results |
>>>        +------------+-----+--------+-----------------------+--
>>>
>>> Each compound procedure contains the number of operations followed by
>>> the list of 'op_code + arguments_for_that_fop'.
>>>
>>> So, along similar lines, we just need to define a new RPC structure for
>>> COMPOUND fops (something like below) and XDR-encode/decode each of the
>>> ops based on its op number.
>>>
>>>       union argop switch (uint32_t op_num) {
>>>               case <OPCODE>: <argument>;
>>>               ...
>>>       };
>>>
>>>       struct COMPOUNDargs {
>>>               uint32_t      version;
>>>               uint32_t      numops;
>>>               argop         argarray<>;
>>>       };
>>>
>>>     RESULT
>>>
>>>       union resop switch (uint32_t op_num) {
>>>               case <OPCODE>: <result>;
>>>               ...
>>>       };
>>>
>>>       struct COMPOUNDres {
>>>               uint32_t      status;
>>>               resop         resarray<>;
>>>       };
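A hand-rolled sketch of what "XDR-encode each op based on its op number" amounts to, following the XDR rules in RFC 4506: 4-byte big-endian integers, an element count before a variable-length array, the discriminant before a union arm, and opaque data padded to a multiple of 4 bytes. The opcodes and the simplified fxattrop/writev argument structs are hypothetical; real code would use rpcgen-generated routines rather than this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes and simplified argument types for two sub-ops. */
enum { OP_FXATTROP = 1, OP_WRITEV = 2 };

struct fxattrop_args { uint32_t fd; };
struct writev_args {
    uint32_t fd;
    uint64_t offset;
    const uint8_t *data;
    uint32_t len;
};

struct argop {
    uint32_t op_num; /* union discriminant */
    union {
        struct fxattrop_args fxattrop;
        struct writev_args   writev;
    } op_args;
};

struct compound_args {
    uint32_t            version;
    uint32_t            numops;
    const struct argop *argarray;
};

/* Minimal XDR writer into a fixed buffer. */
struct xdr_buf { uint8_t *p; size_t len, cap; };

static int put_u32(struct xdr_buf *b, uint32_t v)
{
    if (b->cap - b->len < 4)
        return -1;
    b->p[b->len++] = (uint8_t)(v >> 24);
    b->p[b->len++] = (uint8_t)(v >> 16);
    b->p[b->len++] = (uint8_t)(v >> 8);
    b->p[b->len++] = (uint8_t)v;
    return 0;
}

static int put_u64(struct xdr_buf *b, uint64_t v) /* XDR unsigned hyper */
{
    return (put_u32(b, (uint32_t)(v >> 32)) < 0 ||
            put_u32(b, (uint32_t)v) < 0) ? -1 : 0;
}

/* Variable-length opaque: length, bytes, zero padding to 4 bytes. */
static int put_opaque(struct xdr_buf *b, const uint8_t *d, uint32_t n)
{
    size_t padded = ((size_t)n + 3) & ~(size_t)3;

    if (put_u32(b, n) < 0 || b->cap - b->len < padded)
        return -1;
    if (n > 0)
        memcpy(b->p + b->len, d, n);
    memset(b->p + b->len + n, 0, padded - n);
    b->len += padded;
    return 0;
}

/* Each argop is the discriminant followed by the arm it selects. */
static int encode_argop(struct xdr_buf *b, const struct argop *op)
{
    if (put_u32(b, op->op_num) < 0)
        return -1;
    switch (op->op_num) {
    case OP_FXATTROP:
        return put_u32(b, op->op_args.fxattrop.fd);
    case OP_WRITEV:
        if (put_u32(b, op->op_args.writev.fd) < 0 ||
            put_u64(b, op->op_args.writev.offset) < 0)
            return -1;
        return put_opaque(b, op->op_args.writev.data, op->op_args.writev.len);
    default:
        return -1; /* unknown opcode: refuse to encode */
    }
}

static int encode_compound(struct xdr_buf *b, const struct compound_args *c)
{
    if (put_u32(b, c->version) < 0 || put_u32(b, c->numops) < 0)
        return -1;
    if (put_u32(b, c->numops) < 0) /* argarray<> count prefix */
        return -1;
    for (uint32_t i = 0; i < c->numops; i++)
        if (encode_argop(b, &c->argarray[i]) < 0)
            return -1;
    return 0;
}
```

Encoding AFR's fxattrop + writev pair with a 3-byte payload produces 44 bytes. Note that argarray<> already carries its own element count on the wire, so an explicit numops field ends up encoded twice.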
>>>
>>> An xlator that wants to club fops can define this new COMPOUND fop
>>> with the list of operations. For example, AFR can construct this
>>> compound fop as
>>>
>>> compound_fop (struct COMPOUNDargs c_args);
>>>
>>> c_args.version = 1;
>>> c_args.numops = 2;
>>> c_args.argarray[0].op_num = fxattrop_op_num;
>>> c_args.argarray[0].op_args = fxattrop_op_args;
>>> c_args.argarray[1].op_num = writev_op_num;
>>> c_args.argarray[1].op_args = writev_op_args;
>>>
>>> On the server side, the new compound xlator, on receiving this compound
>>> fop, can split it and execute the fops one by one, as you already
>>> mentioned.
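A sketch of that server-side split-and-execute step, assuming NFSv4 COMPOUND semantics (RFC 7530): ops run in order, processing stops at the first failure, the reply carries a result for every op that was attempted, and the overall status is that of the last op attempted. The opcodes, the single-int argument, and the stand-in handlers are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OPS 8

/* Hypothetical opcodes; a real table would cover every compoundable fop. */
enum { OP_FXATTROP = 1, OP_WRITEV = 2, OP_UNLOCK = 3, OP_MAX };

/* Arguments simplified to one int per op for the sketch. */
struct argop { uint32_t op_num; int arg; };
struct resop { uint32_t op_num; int op_errno; };

struct compound_res {
    int          status; /* status of the last op attempted */
    uint32_t     numres; /* ops attempted, including a failed one */
    struct resop resarray[MAX_OPS];
};

typedef int (*op_handler)(int arg); /* returns 0 or an errno value */

/* Stand-in handlers so the sketch runs without a real brick. */
static int do_fxattrop(int arg) { return arg < 0 ? EBADF : 0; }
static int do_writev(int arg)   { return arg < 0 ? EIO : 0; }
static int do_unlock(int arg)   { (void)arg; return 0; }

static const op_handler handlers[OP_MAX] = {
    [OP_FXATTROP] = do_fxattrop,
    [OP_WRITEV]   = do_writev,
    [OP_UNLOCK]   = do_unlock,
};

/* Run the sub-ops in order and stop at the first failure. */
static void compound_execute(const struct argop *ops, uint32_t numops,
                             struct compound_res *res)
{
    assert(res != NULL);
    res->status = 0;
    res->numres = 0;
    if (numops > MAX_OPS) {
        res->status = E2BIG; /* reject oversized compounds up front */
        return;
    }

    for (uint32_t i = 0; i < numops; i++) {
        uint32_t op = ops[i].op_num;
        int err = (op < (uint32_t)OP_MAX && handlers[op] != NULL)
                      ? handlers[op](ops[i].arg)
                      : EOPNOTSUPP;

        res->resarray[i].op_num = op;
        res->resarray[i].op_errno = err;
        res->numres = i + 1;
        res->status = err;
        if (err != 0)
            break; /* later ops are not attempted */
    }
}
```

Whether stop-at-first-error suits every pairing is an open question; for AFR's xattrop+unlock, for example, the unlock arguably should still run if the xattrop fails.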
>>>
>>> Any thoughts?
>>>
>>> Thanks,
>>> Soumya
>>>
>>>
>>>> Pranith.
>>>>>
>>>>> Starting with a well-defined set of operations for compounding has its
>>>>> advantages. It would be easier to understand and to maintain
>>>>> correctness across the stack. Some of our translators perform
>>>>> transactions and create/update internal metadata for certain fops. It
>>>>> would be easier for such translators if the compound operations are
>>>>> well defined and do not entail deep introspection of a generic
>>>>> representation to ensure that the right behavior is reflected at the
>>>>> end of a compound operation.
>>>>>
>>>>> -Vijay
>>>>>
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-devel mailing list
>>>> Gluster-devel at gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>