[Gluster-devel] compound fop design first cut

Pranith Kumar Karampuri pkarampu at redhat.com
Mon Dec 7 09:08:28 UTC 2015


Draft of the design doc:

The main motivation for this feature is to reduce network round trips by
sending more than one fop in a single network operation, preferably without
introducing new RPCs.

There are 2 new xlators: compound-fop-sender and compound-fop-receiver.
compound-fop-sender is going to be loaded on top of each client xlator on the
mount/client, and compound-fop-receiver is going to be loaded below the server
xlator on the bricks. On the mount/client side, the xlators between the caller
and compound-fop-sender can choose to implement this extra compound fop
handling. Once the compound fop reaches compound-fop-sender, it chooses a base
fop, encodes the other fop in the base fop's xdata, and winds the base fop to
the client xlator. The client xlator sends the base fop with the encoded xdata
to the server xlator on the brick using the RPC of the base fop.

Once the server xlator does resolve_and_resume(), it winds the base fop to the
compound-fop-receiver xlator, which decodes the extra fop from the xdata of
the base fop. Based on the order encoded in the xdata, it executes the
separate fops one after the other and stores the cbk response arguments of
both operations. It then encodes the response of the extra fop onto the base
fop's response xdata and unwinds the fop to the server xlator, which sends the
response in the base RPC's response structure. The client xlator unwinds the
base fop to compound-fop-sender, which decodes the response into the compound
fop's response arguments and unwinds to the parent xlators.

I will take the fxattrop+write operation that we want to implement in afr as
an example to explain how things may look.

compound_fop_sender_fxattrop_write (call_frame_t *frame, xlator_t *this,
         fd_t *fd,
         gf_xattrop_flags_t xattrop_flags,
         dict_t *fxattrop_dict,
         dict_t *fxattrop_xdata,
         struct iovec *vector,
         int32_t count,
         off_t off,
         uint32_t flags,
         struct iobref *iobref,
         dict_t *writev_xdata)
{
         0) Remember the compound-fop
            Take base-fop as write()
            In writev_xdata add the following key,value pairs:
         1) "xattrop-flags" -> xattrop_flags
         2) for-each-fxattrop_dict key -> "fxattrop-dict-<actual-key>", value
         3) for-each-fxattrop_xdata key -> "fxattrop-xdata-<actual-key>", value
         4) "order" -> "fxattrop, writev"
         5) "compound-fops" -> "fxattrop"
         6) Wind writev()

         /* in the writev cbk: decode the response args and call
            parent_fxattrop_write_cbk */
}

<compound_fop_sender_parent>_fxattrop_write_cbk (call_frame_t *frame,
                                         void *cookie,
                                         xlator_t *this,
                                         int32_t fxattrop_op_ret,
                                         int32_t fxattrop_op_errno,
                                         dict_t *fxattrop_dict,
                                         dict_t *fxattrop_xdata,
                                         int32_t writev_op_ret,
                                         int32_t writev_op_errno,
                                         struct iatt *writev_prebuf,
                                         struct iatt *writev_postbuf,
                                         dict_t *writev_xdata)
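
To make the sender-side encoding steps a bit more concrete, here is a minimal
C sketch. It is not GlusterFS code: struct toy_dict, toy_set(), toy_get(),
toy_copy_prefixed() and encode_fxattrop_into_writev() are hypothetical
stand-ins for dict_t and its helpers, and all values are plain strings to keep
it short. Only the key names are taken from the steps listed for
compound_fop_sender_fxattrop_write().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PAIRS 16
#define MAX_STR   64

/* Hypothetical stand-in for dict_t: a tiny fixed-size string map. */
struct toy_dict {
    int  count;
    char key[MAX_PAIRS][MAX_STR];
    char val[MAX_PAIRS][MAX_STR];
};

/* Append a pair; returns 0 on success, -1 if the map is full or a string
 * does not fit. */
int toy_set(struct toy_dict *d, const char *k, const char *v)
{
    assert(d != NULL && k != NULL && v != NULL);
    if (d->count >= MAX_PAIRS || strlen(k) >= MAX_STR || strlen(v) >= MAX_STR)
        return -1;
    strcpy(d->key[d->count], k);
    strcpy(d->val[d->count], v);
    d->count++;
    return 0;
}

/* Look up a key; returns NULL when it is absent. */
const char *toy_get(const struct toy_dict *d, const char *k)
{
    for (int i = 0; i < d->count; i++)
        if (strcmp(d->key[i], k) == 0)
            return d->val[i];
    return NULL;
}

/* Copy every pair of src into dst under "<prefix><actual-key>". */
int toy_copy_prefixed(struct toy_dict *dst, const struct toy_dict *src,
                      const char *prefix)
{
    char k[MAX_STR];

    for (int i = 0; i < src->count; i++) {
        int n = snprintf(k, sizeof(k), "%s%s", prefix, src->key[i]);
        if (n < 0 || n >= (int)sizeof(k))
            return -1;
        if (toy_set(dst, k, src->val[i]) != 0)
            return -1;
    }
    return 0;
}

/* Steps 1-5 of the sender: fold the fxattrop arguments into writev's xdata.
 * Step 6 (winding writev) is left out of this sketch. */
int encode_fxattrop_into_writev(struct toy_dict *writev_xdata,
                                int xattrop_flags,
                                const struct toy_dict *fxattrop_dict,
                                const struct toy_dict *fxattrop_xdata)
{
    char flags[MAX_STR];

    snprintf(flags, sizeof(flags), "%d", xattrop_flags);
    if (toy_set(writev_xdata, "xattrop-flags", flags) != 0 ||
        toy_copy_prefixed(writev_xdata, fxattrop_dict, "fxattrop-dict-") != 0 ||
        toy_copy_prefixed(writev_xdata, fxattrop_xdata, "fxattrop-xdata-") != 0 ||
        toy_set(writev_xdata, "order", "fxattrop, writev") != 0 ||
        toy_set(writev_xdata, "compound-fops", "fxattrop") != 0)
        return -1;
    return 0;
}
```

The receiver would do the reverse: strip the "fxattrop-dict-" and
"fxattrop-xdata-" prefixes to rebuild the two dictionaries before winding
fxattrop().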

compound_fop_receiver_writev (call_frame_t *frame, xlator_t *this, fd_t *fd,
         struct iovec *vector,
         int32_t count,
         off_t off,
         uint32_t flags,
         struct iobref *iobref,
         dict_t *writev_xdata)
         0) Check if writev_xdata has "compound-fops", else default_writev()
         1) decode writev_xdata from above encoding -> xattrop flags,
            fxattrop_dict, fxattrop_xdata
         2) get "order"
         3) Store all the above in 'local'
         4) wind fxattrop() with
            compound_receiver_fxattrop_cbk_writev_wind() as cbk

compound_receiver_fxattrop_cbk_writev_wind (call_frame_t *frame, void *cookie,
                                             xlator_t *this, int32_t op_ret,
                                             int32_t op_errno, dict_t *dict,
                                             dict_t *xdata)
         0) store fxattrop cbk_args
         1) Perform writev() with writev_params with
            compound_receiver_writev_cbk() as the 'cbk'

compound_receiver_writev_cbk (call_frame_t *frame, void *cookie,
                      xlator_t *this,
                      int32_t op_ret, int32_t op_errno, struct iatt *prebuf,
                      struct iatt *postbuf, dict_t *xdata)
         0) store writev cbk_args
         1) Encode fxattrop response to writev_xdata with an encoding
            similar to the one in compound_fop_sender_fxattrop_write()
         2) unwind writev()
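
The receiver-side callback chaining can be sketched roughly as below. Again
this is only an illustration and not GlusterFS code: struct toy_local, the
toy_wind_*() functions and the callbacks are hypothetical, the "child xlator"
completes each fop synchronously, the presence of the compound key is passed
in as a flag, and the return values are made up. Error handling (for example
what happens to writev when fxattrop fails) is not covered by this draft, so
the sketch leaves it out as well.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-call state, playing the role of 'local'. */
struct toy_local {
    int  fxattrop_op_ret;
    int  writev_op_ret;
    int  unwound;
    char trace[64];     /* records the order in which things happened */
};

typedef void (*toy_cbk_t)(struct toy_local *local, int op_ret);

static void toy_trace(struct toy_local *l, const char *s)
{
    strncat(l->trace, s, sizeof(l->trace) - strlen(l->trace) - 1);
}

/* Stand-ins for winding to the child xlator: do the fop, then call the cbk. */
void toy_wind_fxattrop(struct toy_local *local, toy_cbk_t cbk)
{
    toy_trace(local, "fxattrop;");
    cbk(local, 0);          /* made-up op_ret */
}

void toy_wind_writev(struct toy_local *local, toy_cbk_t cbk)
{
    toy_trace(local, "writev;");
    cbk(local, 4096);       /* made-up op_ret: bytes written */
}

/* Last step: store the writev cbk args, then unwind the base fop. The real
 * xlator would encode the fxattrop response into writev's xdata here. */
void toy_writev_cbk(struct toy_local *local, int op_ret)
{
    local->writev_op_ret = op_ret;
    toy_trace(local, "unwind;");
    local->unwound = 1;
}

/* Middle step: store the fxattrop cbk args, then wind writev. */
void toy_fxattrop_cbk_writev_wind(struct toy_local *local, int op_ret)
{
    local->fxattrop_op_ret = op_ret;
    toy_wind_writev(local, toy_writev_cbk);
}

/* Entry point: without the compound key this is just a plain writev. */
void toy_receiver_writev(struct toy_local *local, int has_compound_key)
{
    if (!has_compound_key) {
        toy_wind_writev(local, toy_writev_cbk);
        return;
    }
    toy_wind_fxattrop(local, toy_fxattrop_cbk_writev_wind);
}
```

With the compound key present, the trace reads "fxattrop;writev;unwind;",
which is the "fxattrop, writev" order the sender encoded.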

This example is just to show how things may look, but the actual
implementation may just have all base-fops calling a common function to
perform the operations in the order given in the receiver xl. Yet to think
about that. It is probably better to encode the fop-number from
glusterfs_fop_t rather than the fop-string in the xdata.
This is phase-1 of the change, because we don't want to change RPCs.
In phase-2 we can implement the compound fops that are commonly used by a lot
of translators throughout the stack, so that quota/bitrot/geo-rep/barrier etc.
handle them.
In phase-3, maybe just in time for 4.0, we can convert them to on-the-wire
RPCs.

Thanks to Raghavendra G, Krutika, Ravi and Anuradha for the discussions.
