[Gluster-devel] compound fop design first cut
Pranith Kumar Karampuri
pkarampu at redhat.com
Fri Dec 11 10:02:13 UTC 2015
On 12/09/2015 11:48 PM, Pranith Kumar Karampuri wrote:
>
>
> On 12/09/2015 08:11 PM, Shyam wrote:
>> On 12/09/2015 02:37 AM, Soumya Koduri wrote:
>>>
>>>
>>> On 12/09/2015 11:44 AM, Pranith Kumar Karampuri wrote:
>>>>
>>>>
>>>> On 12/09/2015 06:37 AM, Vijay Bellur wrote:
>>>>> On 12/08/2015 03:45 PM, Jeff Darcy wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On December 8, 2015 at 12:53:04 PM, Ira Cooper (ira at redhat.com) wrote:
>>>>>>> Raghavendra Gowdappa writes:
>>>>>>> I propose that we define a "compound op" that contains ops.
>>>>>>>
>>>>>>> Within each op, there are fields that can be "inherited" from the
>>>>>>> previous op, via use of a sentinel value.
>>>>>>>
>>>>>>> Sentinel is -1, for all of these examples.
>>>>>>>
>>>>>>> So:
>>>>>>>
>>>>>>> LOOKUP(1, "foo")     (Sets the gfid value to be picked up by
>>>>>>>                       compounding; 1 is the root directory, as a
>>>>>>>                       gfid, by convention.)
>>>>>>> OPEN(-1, O_RDWR)     (Uses the gfid value, sets the glfd compound value.)
>>>>>>> WRITE(-1, "foo", 3)  (Uses the glfd compound value.)
>>>>>>> CLOSE(-1)            (Uses the glfd compound value.)
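To make the sentinel chaining concrete, here is a minimal C sketch. The names (sub_op_t, run_compound, fake_lookup, fake_open) are hypothetical, and the fake backend only fabricates gfid/fd values so the substitution can be traced. It assumes LOOKUP and OPEN inherit the most recent gfid while WRITE and CLOSE inherit the most recent fd.

```c
#include <stddef.h>
#include <stdint.h>

#define SENTINEL (-1)   /* "inherit from the previous op", as in the proposal */

/* Hypothetical sub-op encoding; not an actual GlusterFS structure. */
typedef enum { OP_LOOKUP, OP_OPEN, OP_WRITE, OP_CLOSE } op_type_t;

typedef struct {
    op_type_t type;
    int64_t   target;  /* LOOKUP: parent gfid; OPEN: gfid; WRITE/CLOSE: fd */
} sub_op_t;

/* The "compound values" that later ops may inherit. */
typedef struct {
    int64_t gfid;      /* last gfid produced by LOOKUP */
    int64_t fd;        /* last fd produced by OPEN */
} compound_state_t;

/* Stand-in backend (hypothetical): fabricates values so chaining is visible. */
static int64_t fake_lookup(int64_t parent) { return parent * 100 + 7; }
static int64_t fake_open(int64_t gfid)     { return gfid + 1000; }

/* Runs ops in order, replacing SENTINEL targets with the inherited value.
 * resolved[i] receives the concrete target used by op i.
 * Returns -1 if every op ran, else the index of the first op that tried
 * to inherit a value no earlier op had produced. */
static int run_compound(sub_op_t *ops, size_t n, int64_t *resolved)
{
    compound_state_t st = { SENTINEL, SENTINEL };

    for (size_t i = 0; i < n; i++) {
        int64_t t = ops[i].target;
        if (t == SENTINEL) {
            /* LOOKUP/OPEN inherit a gfid; WRITE/CLOSE inherit an fd. */
            t = (ops[i].type == OP_LOOKUP || ops[i].type == OP_OPEN)
                    ? st.gfid : st.fd;
            if (t == SENTINEL)
                return (int)i;          /* nothing to inherit yet */
        }
        resolved[i] = t;
        if (ops[i].type == OP_LOOKUP)
            st.gfid = fake_lookup(t);   /* sets the gfid compound value */
        else if (ops[i].type == OP_OPEN)
            st.fd = fake_open(t);       /* sets the glfd compound value */
    }
    return -1;
}
```

With {LOOKUP(1), OPEN(-1), WRITE(-1), CLOSE(-1)}, OPEN receives the gfid produced by LOOKUP, and WRITE and CLOSE receive the fd produced by OPEN. A lone WRITE(-1) with no preceding OPEN is rejected at index 0.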
>>>>>>
>>>>>> So, basically, what the programming-language types would call futures
>>>>>> and promises. It’s a good and well-studied concept, which is necessary
>>>>>> to solve the second-order problem of how to specify an argument in
>>>>>> sub-operation N+1 that’s not known until sub-operation N completes.
>>>>>>
>>>>>> To be honest, some of the highly general approaches suggested here scare
>>>>>> me too. Wrapping up the arguments for one sub-operation in xdata for
>>>>>> another would get pretty hairy if we ever try to go beyond two
>>>>>> sub-operations and have to nest sub-operation #3’s args within
>>>>>> sub-operation #2’s xdata, which is itself encoded within sub-operation
>>>>>> #1’s xdata. There’s also not much clarity about how to handle errors in
>>>>>> that model. Encoding N sub-operations’ arguments in a linear structure,
>>>>>> as Shyam proposes, seems a bit cleaner that way. If I were to continue
>>>>>> down that route, I’d suggest just having start_compound and end_compound
>>>>>> fops, plus an extra field (or by-convention xdata key) that either the
>>>>>> client-side or server-side translator could use to build whatever
>>>>>> structure it wants and schedule sub-operations however it wants.
>>>>>>
>>>>>> However, I’d be even more comfortable with an even simpler approach that
>>>>>> avoids the need to solve what the database folks (who have dealt with
>>>>>> complex transactions for years) would tell us is a really hard problem.
>>>>>> Instead of designing for every case we can imagine, let’s design for the
>>>>>> cases that we know would be useful for improving performance. Open plus
>>>>>> read/write plus close is an obvious one. Raghavendra mentions
>>>>>> create+inodelk as well. For each of those, we can easily define a
>>>>>> structure that contains the necessary fields, we don’t need a
>>>>>> client-side translator, and the server-side translator can take care of
>>>>>> “forwarding” results from one sub-operation to the next. We could even
>>>>>> use GF_FOP_IPC to prototype this. If we later find that the number of
>>>>>> “one-off” compound requests is growing too large, then at least we’ll
>>>>>> have some experience to guide our design of a more general alternative.
>>>>>> Right now, I think we’re trying to look further ahead than we can see
>>>>>> clearly.
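A rough C sketch of what one such fixed-shape compound could look like for open + write + close, assuming the server-side handler owns the fd and forwards it between steps. The type and function names are hypothetical, and plain POSIX open(2)/write(2)/close(2) on a local path stand in for the brick-side fops.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-shape request: every field the three sub-ops need
 * is spelled out; nothing is generic or nested in xdata. */
typedef struct {
    const char *path;
    int         flags;     /* open(2) flags, e.g. O_WRONLY | O_CREAT */
    mode_t      mode;
    const void *buf;       /* payload for the WRITE step */
    size_t      len;
} open_write_close_req_t;

enum owc_step { OWC_NONE = 0, OWC_OPEN, OWC_WRITE, OWC_CLOSE };

typedef struct {
    enum owc_step failed_step; /* OWC_NONE if every step succeeded */
    int           op_errno;    /* errno of the failed step */
    ssize_t       written;     /* bytes written (short writes show up here) */
} open_write_close_rsp_t;

/* Server-side handler: runs the steps in order and "forwards" the fd from
 * OPEN to WRITE and CLOSE, so the client never needs to see it. */
static open_write_close_rsp_t
handle_open_write_close(const open_write_close_req_t *req)
{
    open_write_close_rsp_t rsp = { OWC_NONE, 0, -1 };

    int fd = open(req->path, req->flags, req->mode);
    if (fd < 0) {
        rsp.failed_step = OWC_OPEN;
        rsp.op_errno = errno;
        return rsp;
    }

    rsp.written = write(fd, req->buf, req->len);
    if (rsp.written < 0) {
        rsp.failed_step = OWC_WRITE;
        rsp.op_errno = errno;
        close(fd);              /* still release the fd on failure */
        return rsp;
    }

    if (close(fd) < 0) {
        rsp.failed_step = OWC_CLOSE;
        rsp.op_errno = errno;
    }
    return rsp;
}
```

Because the request layout is fixed, error handling stays explicit: the response names the step that failed, and the handler still closes the fd if WRITE fails.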
>>>> Yes, agreed. This makes the implementation on the client side simpler
>>>> as well, so it is welcome.
>>>>
>>>> Here is the updated solution:
>>>> 1) New RPCs are going to be implemented.
>>>> 2) The client stack will use these new fops.
>>>> 3) On the server side, the server xlator implements these new fops: it
>>>> decodes the RPC request, then calls resolve_resume and a
>>>> compound-op-receiver (a better name for this is welcome), which sends
>>>> one op after the other and then sends the compound fop response.
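One possible shape for the compound-op-receiver, sketched in C under the assumption that sub-ops run strictly in order and the first failure stops the chain (as an NFSv4 COMPOUND does). The fop signature, the reply layout, the 16-op limit and the two toy sub-fops are all hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical sub-fop signature: returns 0 or a negative errno. */
typedef int (*sub_fop_fn)(void *args, void *ctx);

typedef struct {
    sub_fop_fn fn;
    void      *args;
} compound_entry_t;

typedef struct {
    int    last_status;   /* status of the last op that was attempted */
    size_t numres;        /* how many ops actually ran */
    int    status[16];    /* per-op status; first numres entries valid */
} compound_reply_t;

/* Winds the decoded sub-ops one after another and stops at the first
 * failure, since later ops may depend on earlier results. Returns 0 if
 * the request was processed (see rsp), or -1 if it has too many ops. */
static int compound_receive(const compound_entry_t *ops, size_t numops,
                            void *ctx, compound_reply_t *rsp)
{
    rsp->last_status = 0;
    rsp->numres = 0;
    if (numops > sizeof rsp->status / sizeof rsp->status[0])
        return -1;

    for (size_t i = 0; i < numops; i++) {
        int ret = ops[i].fn(ops[i].args, ctx);
        rsp->status[rsp->numres++] = ret;
        rsp->last_status = ret;
        if (ret != 0)
            break;
    }
    return 0;
}

/* Toy sub-fops used to exercise the receiver. */
static int count_fop(void *args, void *ctx)
{
    (void)args;
    ++*(int *)ctx;        /* records that this op ran */
    return 0;
}

static int enoent_fop(void *args, void *ctx)
{
    (void)args;
    (void)ctx;
    return -ENOENT;       /* simulates a failing sub-op */
}
```

For the sequence {count_fop, enoent_fop, count_fop}, only the first two ops run, the reply carries two statuses, and last_status is -ENOENT.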
>>
>> @Pranith, I assume you would expand on this at a later date
>> (something along the lines of what Soumya has done below), right?
>
> I will talk to her tomorrow to learn more about this. I'm not saying
> this is what I will be implementing (there doesn't seem to be any
> consensus yet), but I would love to know how it is implemented.
Soumya and I had a discussion about this, and it seems like the NFS way
of stuffing the args works out at a high level. The sentinel-value-based
approach may also be possible. What I will do now is look at the
structure in depth and work out how all the fops mentioned in this
thread can be implemented. I will update you guys about my findings in
a couple of days.
Pranith
>
> Pranith
>>
>>>>
>>>> List of compound fops identified so far:
>>>> Swift/S3:
>>>> PUT: creat(), write()s, setxattr(), fsync(), close(), rename()
>>>>
>>>> Dht:
>>>> mkdir + inodelk
>>>>
>>>> Afr:
>>>> xattrop+writev, xattrop+unlock to begin with.
>>>>
>>>> Could everyone who needs compound fops add to this list?
>>>>
>>>> I see that Niels is back on the 14th. Does anyone else know the list
>>>> of compound fops he has in mind?
>>>>
>>> From the discussions we had with Niels regarding Kerberos support in
>>> GlusterFS, I think the following set of compound fops is required:
>>>
>>> set_uid +
>>> set_gid +
>>> set_lkowner (or kerberos principal name) +
>>> actual_fop
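A minimal C sketch of how the credential prefix of such a compound could be applied, assuming set_uid/set_gid/set_lkowner only fill a per-request context that the actual fop then runs with. All names here are hypothetical, and how the server would authenticate these values (for example via Kerberos) is outside the sketch.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical per-request context filled by the credential sub-ops. */
typedef struct {
    uid_t uid;
    gid_t gid;
    char  lk_owner[64];   /* lk-owner, or a Kerberos principal name */
} req_creds_t;

typedef enum { CRED_SET_UID, CRED_SET_GID, CRED_SET_LKOWNER } cred_op_t;

typedef struct {
    cred_op_t op;
    union {
        uid_t       uid;
        gid_t       gid;
        const char *owner;
    } val;
} cred_sub_op_t;

/* Applies the credential prefix of a compound to *creds; the actual fop
 * would then be wound with these credentials. Returns 0, or -1 if an
 * lk-owner string does not fit. */
static int apply_creds(const cred_sub_op_t *ops, size_t n, req_creds_t *creds)
{
    for (size_t i = 0; i < n; i++) {
        switch (ops[i].op) {
        case CRED_SET_UID:
            creds->uid = ops[i].val.uid;
            break;
        case CRED_SET_GID:
            creds->gid = ops[i].val.gid;
            break;
        case CRED_SET_LKOWNER:
            if (strlen(ops[i].val.owner) >= sizeof creds->lk_owner)
                return -1;
            strcpy(creds->lk_owner, ops[i].val.owner);
            break;
        }
    }
    return 0;
}
```

Keeping the credential ops as plain context setters means the actual fop needs no knowledge of how its credentials arrived.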
>>>
>>> Also, gfapi performs a lookup (the first time, or to refresh the inode)
>>> before performing the actual fop most of the time. It may really help if
>>> we can club such fops:
>>
>> @Soumya +5 (just a random number :) )
>>
>> This came to my mind as well, and is a good candidate for compounding.
>>
>>>
>>> LOOKUP + FOP (OPEN etc)
>>>
>>> Coming to the proposed design, I agree with Shyam, Ira and Jeff's
>>> thoughts. Defining different compound fops for each specific set of
>>> operations and wrapping up those arguments in xdata seems rather complex
>>> and difficult to maintain going forward. Having worked with NFS, may I
>>> suggest that we follow (or work along similar lines to) the approach
>>> taken by the NFS protocol to define and implement compound procedures?
>>>
>>> The basic structure of the NFS COMPOUND procedure is:
>>>
>>> +-----+--------------+--------+-----------+-----------+-----------+--
>>> | tag | minorversion | numops | op + args | op + args | op + args |
>>> +-----+--------------+--------+-----------+-----------+-----------+--
>>>
>>> and the reply's structure is:
>>>
>>> +------------+-----+--------+-----------------------+--
>>> |last status | tag | numres | status + op + results |
>>> +------------+-----+--------+-----------------------+--
>>>
>>> Each compound procedure contains the number of operations followed by
>>> the list of 'op_code + arguments_for_that_fop'.
>>>
>>> So, along similar lines, we just need to define a new RPC structure for
>>> COMPOUND fops (something like the one below) and XDR-encode/decode each
>>> of the ops based on the op number.
>>>
>>> union argop switch (unsigned int op_num) {
>>> case <OPCODE>:
>>>         <argument>;
>>> ...
>>> };
>>>
>>> struct COMPOUNDargs {
>>>         unsigned int version;
>>>         unsigned int numops;
>>>         argop        argarray<>;
>>> };
>>>
>>> RESULT
>>>
>>> union resop switch (unsigned int op_num) {
>>> case <OPCODE>:
>>>         <result>;
>>> ...
>>> };
>>>
>>> struct COMPOUNDres {
>>>         unsigned int status;
>>>         resop        resarray<>;
>>> };
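To illustrate encoding/decoding each op based on its op number in the linear NFS-style layout (numops, then op number + args for each entry), here is a small C sketch using XDR-style big-endian 4-byte units. The op numbers, argument fields and function names are hypothetical; real fops would also carry gfids, fds and xdata.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical op numbers and argument shapes. */
enum { CMP_OP_XATTROP = 1, CMP_OP_WRITEV = 2 };

typedef struct {
    uint32_t op_num;
    union {
        struct { uint32_t optype; } xattrop;
        struct { uint64_t offset; uint32_t size; } writev;
    } args;
} cmp_argop_t;

/* XDR-style big-endian 4-byte unit helpers. */
static uint8_t *put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
    return p + 4;
}

static const uint8_t *get_u32(const uint8_t *p, uint32_t *v)
{
    *v = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
         (uint32_t)p[2] << 8 | p[3];
    return p + 4;
}

/* Writes numops, then op_num + args per op. Returns bytes used, or 0 for
 * an unknown op number. buf must be large enough. */
static size_t encode_compound(const cmp_argop_t *ops, uint32_t numops,
                              uint8_t *buf)
{
    uint8_t *p = put_u32(buf, numops);
    for (uint32_t i = 0; i < numops; i++) {
        p = put_u32(p, ops[i].op_num);
        switch (ops[i].op_num) {
        case CMP_OP_XATTROP:
            p = put_u32(p, ops[i].args.xattrop.optype);
            break;
        case CMP_OP_WRITEV:   /* 64-bit offset as two units, like a hyper */
            p = put_u32(p, (uint32_t)(ops[i].args.writev.offset >> 32));
            p = put_u32(p, (uint32_t)ops[i].args.writev.offset);
            p = put_u32(p, ops[i].args.writev.size);
            break;
        default:
            return 0;
        }
    }
    return (size_t)(p - buf);
}

/* The op number selects the decoder for the following args. Returns
 * numops, or -1 on a short buffer, too many ops, or an unknown op. */
static int decode_compound(const uint8_t *buf, size_t len,
                           cmp_argop_t *ops, uint32_t max)
{
    const uint8_t *p = buf, *end = buf + len;
    uint32_t numops, hi, lo;

    if (end - p < 4)
        return -1;
    p = get_u32(p, &numops);
    if (numops > max)
        return -1;
    for (uint32_t i = 0; i < numops; i++) {
        if (end - p < 4)
            return -1;
        p = get_u32(p, &ops[i].op_num);
        switch (ops[i].op_num) {
        case CMP_OP_XATTROP:
            if (end - p < 4)
                return -1;
            p = get_u32(p, &ops[i].args.xattrop.optype);
            break;
        case CMP_OP_WRITEV:
            if (end - p < 12)
                return -1;
            p = get_u32(p, &hi);
            p = get_u32(p, &lo);
            ops[i].args.writev.offset = (uint64_t)hi << 32 | lo;
            p = get_u32(p, &ops[i].args.writev.size);
            break;
        default:
            return -1;
        }
    }
    return (int)numops;
}
```

Encoding an xattrop followed by a writev takes 4 + 8 + 16 = 28 bytes; a truncated buffer or an unknown op number makes decode_compound return -1 instead of reading past the end.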
>>>
>>> The xlator that wants to club fops can define this new COMPOUND fop
>>> with the list of operations. For example, AFR can construct this
>>> compound fop as:
>>>
>>> compound_fop (struct COMPOUNDargs c_args);
>>>
>>> c_args.version = 1;
>>> c_args.numops = 2;
>>> c_args.argarray[0].op_num = fxattr_op_num;
>>> c_args.argarray[0].op_args = fxattr_op_args;
>>> c_args.argarray[1].op_num = writev_op_num;
>>> c_args.argarray[1].op_args = writev_op_args;
>>>
>>> On the server side, the new compound xlator, on receiving this compound
>>> fop, can split the fops and execute them one by one, as you already
>>> mentioned.
>>>
>>> Any thoughts?
>>>
>>> Thanks,
>>> Soumya
>>>
>>>
>>>> Pranith.
>>>>>
>>>>> Starting with a well-defined set of operations for compounding has its
>>>>> advantages. It would be easier to understand and maintain correctness
>>>>> across the stack. Some of our translators perform transactions and
>>>>> create/update internal metadata for certain fops. It would be easier
>>>>> for such translators if the compound operations are well defined and
>>>>> do not entail deep introspection of a generic representation to
>>>>> ensure that the right behavior is reflected at the end of a compound
>>>>> operation.
>>>>>
>>>>> -Vijay
>>>>>
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-devel mailing list
>>>> Gluster-devel at gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>