[Gluster-devel] Upcalls Infrastructure
skoduri at redhat.com
Mon Feb 23 05:09:22 UTC 2015
The patches below include preliminary upcall framework support, but only
the 'cache_invalidation' use-case is handled.
Kindly review the changes.
Lease_Lock support will be submitted in new patches after addressing the
changes proposed in the earlier mail.
On 02/19/2015 12:30 PM, Soumya Koduri wrote:
> We recently uncovered a few issues with respect to lease_locks
> support and had discussions around them. Thanks to everyone involved.
> The new changes proposed in the design (in addition to the ones
> discussed in the earlier mail) are -
> * Earlier, if a client took a lease-lock and a conflicting fop was
> requested by another client, a RECALL_LEASE CBK event was sent to
> the first client, and until the first client unlocked the LEASE_LOCK we
> returned an EDELAY/ERETRY error to the conflicting fops. This works for
> protocol clients (like NFS/SMB), which keep retrying on receiving that
> error, but not for FUSE clients or any of the other auxiliary services
> (like rebalance/self-heal/quota), which error out immediately.
> To resolve that, we now choose to block the fops based on the flags
> passed ('BLOCK' by default, or 'NON_BLOCK' for protocol clients).
> The blocking will be done the same way the locks xlator currently
> blocks lock requests (maintain a queue of call stubs and wake them up
> once the LEASE_LOCK is released/recalled).
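The queue-of-call-stubs scheme above can be sketched as follows. This is a
simplified, self-contained model with hypothetical names (lease_state,
blocked_stub, lease_block_or_continue), not the actual locks-xlator code:

```c
#include <stdlib.h>

/* Simplified model of the proposed behaviour: conflicting fops are
 * parked as call stubs in a queue while a LEASE_LOCK is held, and
 * resumed in FIFO order when it is released/recalled. */

typedef void (*fop_resume_fn)(void *frame);

struct blocked_stub {
    fop_resume_fn        resume;  /* continuation for the parked fop */
    void                *frame;   /* its saved call frame */
    struct blocked_stub *next;
};

struct lease_state {
    int                  lease_held;  /* non-zero while LEASE_LOCK active */
    struct blocked_stub *waitq;       /* FIFO of parked fops */
};

/* For a conflicting fop: park it if the lease is held, otherwise let
 * it proceed immediately. Returns 1 if the fop was blocked. */
int lease_block_or_continue(struct lease_state *ls,
                            fop_resume_fn resume, void *frame)
{
    if (!ls->lease_held) {
        resume(frame);
        return 0;
    }
    struct blocked_stub *stub = calloc(1, sizeof(*stub));
    stub->resume = resume;
    stub->frame  = frame;
    /* append at the tail to keep FIFO ordering */
    struct blocked_stub **pp = &ls->waitq;
    while (*pp)
        pp = &(*pp)->next;
    *pp = stub;
    return 1;
}

/* On LEASE_LOCK release/recall: wake every parked fop in order. */
void lease_release(struct lease_state *ls)
{
    ls->lease_held = 0;
    while (ls->waitq) {
        struct blocked_stub *stub = ls->waitq;
        ls->waitq = stub->next;
        stub->resume(stub->frame);
        free(stub);
    }
}

/* demo continuation used for illustration: just counts resumed fops */
static int resumed;
static void sample_resume(void *frame) { (void)frame; resumed++; }
```

A 'NON_BLOCK' caller would simply get EDELAY/ERETRY instead of being queued.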
> * Earlier, when a lease_lk request came in, the upcall xlator mapped it
> to a POSIX lock over the entire file before granting it. If the same
> client then took an fcntl lock, it would be merged with the earlier
> lock, and unlocking either of them would drop the merged lock, losing
> the other. To avoid that, we plan to define a new lk_entry type
> (LEASE_LOCK) in the 'locks' xlator to store lease-locks, and add
> support so that it is never merged with locks of any other type.
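A minimal sketch of the no-merge rule, assuming a simplified lk_entry with
only a type and a byte range (the real locks-xlator structures differ):

```c
/* The point is only that LEASE_LOCK is a distinct entry type that is
 * never merged with fcntl locks, so unlocking one cannot drop the
 * other. Names and layout here are illustrative. */
enum lk_type { FCNTL_LOCK, LEASE_LOCK };

struct lk_entry {
    enum lk_type type;
    long         start, end;  /* byte range, inclusive */
};

/* Merge only plain fcntl locks whose ranges overlap or touch. */
int locks_can_merge(const struct lk_entry *a, const struct lk_entry *b)
{
    if (a->type == LEASE_LOCK || b->type == LEASE_LOCK)
        return 0;  /* lease-locks always stay as separate entries */
    return a->start <= b->end + 1 && b->start <= a->end + 1;
}
```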
> * In addition, before granting a lease-lock, we now check if there are
> existing open fds on the file with conflicting access. If there are,
> the lease-lock will not be granted.
> * While sending the RECALL_LEASE CBK event, a new timer event will be
> registered to notify us of a recall timeout, so that we can purge the
> lease-locks forcefully and wake up the blocked fops.
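The recall-timeout path could look roughly like this. The timer here is a
stand-in driven by an explicit tick for illustration; it is not the real
gf_timer interface, and all names are hypothetical:

```c
#include <stddef.h>

/* A one-shot timer is armed when RECALL_LEASE goes out; if the lease
 * is still held when it fires, the lease is purged forcefully and the
 * blocked fops are woken. */
typedef void (*timer_cbk_t)(void *data);

struct lease {
    int held;               /* non-zero while the LEASE_LOCK is active */
    int blocked_fops_woken; /* set when queued fops are resumed */
};

struct recall_timer {
    long        fire_at;
    timer_cbk_t cbk;
    void       *data;
};

static void recall_timeout_cbk(void *data)
{
    struct lease *l = data;
    if (!l->held)
        return;                 /* client released in time: nothing to do */
    l->held = 0;                /* purge the lease forcefully */
    l->blocked_fops_woken = 1;  /* resume the queued conflicting fops */
}

/* arm the timer when the RECALL_LEASE CBK event is sent */
void recall_send(struct recall_timer *t, struct lease *l,
                 long now, long timeout)
{
    t->fire_at = now + timeout;
    t->cbk     = recall_timeout_cbk;
    t->data    = l;
}

/* called from the event loop; fires the timer once its deadline passes */
void timer_tick(struct recall_timer *t, long now)
{
    if (t->cbk && now >= t->fire_at) {
        t->cbk(t->data);
        t->cbk = NULL;  /* one-shot */
    }
}
```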
> * A few enhancements which may be considered:
> * To start with, upcall entries are maintained in a linked list.
> We may change this to an RBT (red-black tree) for better performance.
> * Store upcall entries in the inode/fd ctx for faster lookup.
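The difference between the two lookups can be illustrated with a simplified
model; upcall_entry and inode_ctx here are hypothetical stand-ins, not the
real xlator types:

```c
#include <stddef.h>

/* Contrast: scanning the xlator-wide list of upcall entries versus
 * caching the entry pointer in a per-inode context slot. */
struct upcall_entry {
    void                *inode;  /* key */
    struct upcall_entry *next;
};

struct inode_ctx {
    struct upcall_entry *upcall;  /* cached pointer: O(1) lookup */
};

/* O(n): walk the global list (the initial approach) */
struct upcall_entry *upcall_find_list(struct upcall_entry *head, void *inode)
{
    for (; head; head = head->next)
        if (head->inode == inode)
            return head;
    return NULL;
}

/* O(1): consult the per-inode context (the proposed enhancement) */
struct upcall_entry *upcall_find_ctx(struct inode_ctx *ctx)
{
    return ctx->upcall;
}
```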
> On 01/22/2015 02:31 PM, Soumya Koduri wrote:
>> I have updated the feature page with more design details and the
>> dependencies/limitations of this support.
>> Kindly check it and provide your inputs.
>> A few items which may be addressed for the 3.7 release are -
>> - With replica bricks maintained by AFR, the upcall state is
>> maintained and processed on all the replica bricks. This results in
>> duplicate notifications being sent by all those bricks in case of
>> non-idempotent fops.
>> - Hence we need support in AFR to filter out such duplicate
>> callback notifications. Similar support is needed for EC as well.
>> - One of the approaches suggested by the AFR team is to cache the
>> upcall notifications received for around 1 min (their current
>> lifetime) to detect and filter out the duplicate notifications sent
>> by the replica bricks.
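A sketch of such a filter, assuming notifications are keyed by (gfid, event)
and remembered for the same ~60s lifetime; all names are hypothetical:

```c
#include <stdlib.h>

#define DEDUP_WINDOW 60  /* seconds, matching the ~1 min lifetime */

/* Remember recently delivered notifications so that repeats arriving
 * from the other replica bricks can be dropped. */
struct seen_notif {
    unsigned long      gfid_hash;  /* stand-in for the file's gfid */
    int                event;      /* notification type */
    long               seen_at;
    struct seen_notif *next;
};

/* Returns 1 if the notification should be delivered, 0 if it is a
 * duplicate already seen within the window. */
int notif_should_deliver(struct seen_notif **cache,
                         unsigned long gfid_hash, int event, long now)
{
    for (struct seen_notif *n = *cache; n; n = n->next) {
        if (n->gfid_hash == gfid_hash && n->event == event &&
            now - n->seen_at < DEDUP_WINDOW)
            return 0;  /* duplicate from another replica brick */
    }
    /* first sighting (or window expired): record it and deliver */
    struct seen_notif *n = calloc(1, sizeof(*n));
    n->gfid_hash = gfid_hash;
    n->event     = event;
    n->seen_at   = now;
    n->next      = *cache;
    *cache       = n;
    return 1;
}
```

Expired cache entries would be pruned by the same periodic cleanup that
expires upcall state.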
>> *Cleanup during network disconnect - protocol/server*
>> - At present, in case of a network disconnect between the
>> glusterfs server and the client, the protocol/server xlator looks up
>> the fd table associated with that client and sends a 'flush' op for
>> each of those fds to clean up the locks associated with them.
>> - We need similar support to flush the lease-locks taken. Hence,
>> while granting a lease-lock, we plan to associate that upcall entry
>> with the corresponding fd_ctx or inode_ctx so that it can be easily
>> tracked when it needs to be cleaned up. This will also speed up the
>> lookup of upcall entries while processing fops on the same inode/fd.
>> Note: the above cleanup applies only to the upcall state associated
>> with lease-locks. The other entries maintained (e.g. for
>> cache-invalidation) will be cleaned up by the reaper thread (which
>> will be used to clean up expired entries in this xlator) once they
>> expire.
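A reaper pass over a simplified entry list might look like this; the types
and the 60s lifetime are illustrative, and in the real xlator a dedicated
thread would run this periodically:

```c
#include <stdlib.h>

#define UPCALL_LIFETIME 60  /* seconds; illustrative value */

struct upcall_ent {
    long               last_access;  /* last time this entry was touched */
    struct upcall_ent *next;
};

/* Remove expired entries in place; returns the number reaped. */
int upcall_reap_expired(struct upcall_ent **head, long now)
{
    int reaped = 0;
    struct upcall_ent **pp = head;
    while (*pp) {
        if (now - (*pp)->last_access >= UPCALL_LIFETIME) {
            struct upcall_ent *dead = *pp;
            *pp = dead->next;   /* unlink before freeing */
            free(dead);
            reaped++;
        } else {
            pp = &(*pp)->next;  /* still live: advance */
        }
    }
    return reaped;
}
```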
>> *Replay of the lease-locks state*
>> - At present, replay of locks by the client xlator (after a network
>> disconnect and reconnect) appears to be disabled.
>> - When it is re-enabled, we need to add support to replay
>> lease-locks as well.
>> - Until then, this will be treated as a limitation and documented,
>> as suggested by KP.
>> On 12/16/2014 09:36 AM, Krishnan Parthasarathi wrote:
>>>>> - Is there a new connection from glusterfsd (upcall xlator) to
>>>>> a client accessing a file? If so, how does the upcall xlator reuse
>>>>> connections when the same client accesses multiple files, or
>>>>> does it?
>>>> No. We are using the same connection which the client initiates to
>>>> send in fops. Thanks for initially pointing me to the 'client_t'
>>>> structure. As these connection details are available only in the
>>>> server xlator, I am passing them to the upcall xlator by storing
>>>> them in
>>>>> - In the event of a network separation (i.e., a partition) between
>>>>> a client and a server, how does the client discover or detect that
>>>>> the server has 'freed' up its previously registered upcall
>>>>> notification?
>>>> The rpc connection details of each client are stored based on its
>>>> client-uid. So in the case of a network partition, when the client
>>>> comes back online, IMO it re-initiates the connection (along with a
>>>> new client-uid).
>>> How would a client discover that a server has purged its upcall
>>> entries? For instance, a client could assume that the server would
>>> notify it of changes as before (while the server has purged the
>>> client's upcall entries) and assume that it still holds the
>>> lease/lock. How would you avoid that?
>>>> Please correct me if that's not the case. So there will be new
>>>> entries created/added in this xlator. However, we still need to
>>>> decide how to clean up the old, timed-out and stale entries:
>>>> * either clean up entries as and when we find an expired or
>>>> stale entry (in case a notification fails),
>>>> * or spawn a new thread which periodically scans through this
>>>> list and cleans up those entries.
>>> There are a couple of aspects to resource cleanup in this context.
>>> 1) When to clean up; e.g. on expiry of a timer.
>>> 2) The order of cleanup; this involves clearly establishing the
>>> relationships among inode, upcall entry and client_t(s). We should
>>> document this.
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org