[Gluster-devel] Question about EC locking
Xavier Hernandez
xhernandez at datalab.es
Fri Jan 13 12:03:29 UTC 2017
Hi,
On 13/01/17 10:58, jayakrishnan mm wrote:
> Hi Xavier,
> I went through the source code. Some questions remain.
>
> 1. If two clients try to write to same file, it should succeed, even if
> they overlap. (Locks should ensure it happens in sequence, in the bricks).
> from the source code
> lock->flock.l_type = F_WRLCK;
> lock->flock.l_whence = SEEK_SET;
>
> fop->flock.l_len += ec_adjust_offset(fop->xl->private,
> &fop->flock.l_start, 1);
> fop->flock.l_len = ec_adjust_size(fop->xl->private,
> fop->flock.l_len, 1);
> if flock.l_len is 0, the entire file is locked for writing
>
> In my test case with 2 clients, I always get flock.l_len as 0. But
> still I am able to write to the same file from both clients at the
> same time.
How can you be sure you are really writing at the same time? Do you get
partial writes from any of the clients?
>
> If it is acquiring locks chunk by chunk, why am I always getting
> l_len = 0?
EC doesn't acquire partial locks. The entire file is locked when a
modification is needed. This makes it possible to reuse locks for future
operations (eager locking).
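As an aside on the l_len = 0 case quoted above: the convention that a zero
length covers the whole file comes from POSIX struct flock. A minimal sketch
with plain fcntl() locks follows. It only illustrates that convention; it is
not how EC takes its inodelk locks internally, and the helper names
whole_file_wrlock() and try_lock_whole_file() are made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Describe a write lock on the whole file.
 * Per POSIX, l_len == 0 means "from l_start to the end of the file,
 * no matter how large the file grows". */
void whole_file_wrlock(struct flock *fl)
{
    memset(fl, 0, sizeof(*fl));
    fl->l_type = F_WRLCK;
    fl->l_whence = SEEK_SET;
    fl->l_start = 0;
    fl->l_len = 0;
}

/* Hypothetical helper: try to take the lock without blocking.
 * Returns 0 on success and -1 (with errno set) if it cannot be taken. */
int try_lock_whole_file(int fd)
{
    struct flock fl;

    assert(fd >= 0);
    whole_file_wrlock(&fl);
    return fcntl(fd, F_SETLK, &fl);
}
```

Calling try_lock_whole_file() on an open descriptor from two different
processes should let only one of them succeed at a time, which is the same
serialization effect discussed here, just at the level of a local file.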
> Why am I not getting the actual write size and offset (for
> flock.l_len & flock.l_start respectively) for each write FOP?
> (In afr, it is set to transaction.len and transaction.start respectively,
> which in turn are the write length & offset for the normal write case.)
Because an erasure code splits the data into smaller fragments for each
brick, offsets and lengths need to be adjusted.
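To make the adjustment more concrete, here is a rough sketch of the idea:
align the start down to a stripe boundary, grow the length by the trimmed
bytes, round it up to whole stripes, and divide both by the number of data
fragments to get per-brick values. The struct and the sketch_* functions are
hypothetical stand-ins written for this mail, not the real ec_t,
ec_adjust_offset() or ec_adjust_size(), and the numbers used below are
assumed values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout description; not the real ec_t structure. */
struct sketch_layout {
    uint64_t fragments;   /* data fragments per stripe */
    uint64_t stripe_size; /* bytes of user data per stripe */
};

/* Align *offset down to a stripe boundary, optionally scale it to
 * per-brick units, and return the number of bytes trimmed ("head"). */
uint64_t sketch_adjust_offset(const struct sketch_layout *l,
                              uint64_t *offset, int scale)
{
    assert(l->stripe_size > 0 && l->fragments > 0);
    uint64_t head = *offset % l->stripe_size;
    uint64_t aligned = *offset - head;

    *offset = scale ? aligned / l->fragments : aligned;
    return head;
}

/* Round size up to a whole number of stripes, optionally scaled
 * to per-brick units. */
uint64_t sketch_adjust_size(const struct sketch_layout *l,
                            uint64_t size, int scale)
{
    assert(l->stripe_size > 0 && l->fragments > 0);
    uint64_t rounded =
        (size + l->stripe_size - 1) / l->stripe_size * l->stripe_size;

    return scale ? rounded / l->fragments : rounded;
}

/* Same two-step pattern as in the snippet quoted earlier in the thread. */
void sketch_adjust_range(const struct sketch_layout *l,
                         uint64_t *start, uint64_t *len)
{
    *len += sketch_adjust_offset(l, start, 1);
    *len = sketch_adjust_size(l, *len, 1);
}
```

With 4 data fragments and a 2048-byte stripe, a 1000-byte range at offset
5000 becomes a per-brick range starting at 1024 with length 512. A range with
start 0 and length 0 stays at 0 and 0, which is consistent with always seeing
l_len = 0 when the whole file is locked.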
>
> 2. As per source code ,a full file lock is taken by the shd also.
>
> ec_heal_inodelk(heal, F_WRLCK, 1, 0, 0);
> which means offset=0 & size=0 in ec_heal_lock() function in ec-heal.c
> flock.l_start = offset;
> flock.l_len = size;
> Does it mean that, for a single file, a write cannot happen simultaneously
> with healing?
Correct. The heal procedure acts like an additional client. If a client and
the heal process try to write at the same time, they must be serialized,
like any other regular write. However, heal only takes the full lock for
some critical operations. Regular self-heal of file contents is done by
locking chunk by chunk.
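As a rough illustration of what "chunk by chunk" means for the lock ranges,
the sketch below computes the (start, len) pair for the n-th chunk of a file.
The function name, the struct and the chunk size are assumptions made for
this example; the real heal code uses its own sizes and its own locking
calls, which are not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte range to lock while one chunk is healed. */
struct sketch_range {
    uint64_t start;
    uint64_t len;
};

/* Fill *out with the index-th chunk of a file of file_size bytes.
 * Returns 1 if that chunk exists, 0 once we are past the end of the file.
 * The length is never 0 here, so a chunk range is never mistaken for the
 * "whole file" convention (l_len == 0). */
int sketch_heal_chunk(uint64_t file_size, uint64_t chunk_size,
                      uint64_t index, struct sketch_range *out)
{
    assert(chunk_size > 0);
    uint64_t start = index * chunk_size;

    if (start >= file_size) {
        return 0;
    }

    uint64_t remaining = file_size - start;
    out->start = start;
    out->len = remaining < chunk_size ? remaining : chunk_size;
    return 1;
}
```

A healer following this pattern would lock one range, repair it, unlock it
and move to the next index, so a client write only has to wait for the chunk
currently being healed instead of the whole file.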
Xavi
>
> Correct me , if I am wrong.
>
> Best Regards
> JK
>
>
>
>
>
>
> On Wed, Dec 14, 2016 at 12:07 PM, jayakrishnan mm
> <jayakrishnan.mm at gmail.com> wrote:
>
> Thanks Xavier, for making it clear.
> Regards
> JK
>
>
> On Dec 13, 2016 3:52 PM, "Xavier Hernandez" <xhernandez at datalab.es> wrote:
>
> Hi JK,
>
>
> On 12/13/2016 08:34 AM, jayakrishnan mm wrote:
>
> Dear Xavi,
>
> How do I test the locks, for example the locks for a write fop? I have two
> independent clients, both trying to write to the same file.
>
>
> 1. According to my understanding, both can successfully write if the
> offsets don't overlap. I mean, the WRITE FOP takes a chunk lock on the
> file. As long as the clients don't try to write to the same chunk, it
> should be OK. If no locks are present, it can lead to inconsistency.
>
>
> With locks all writes will be fine as defined by posix (i.e. the
> final result will be equivalent to the sequential execution of
> both operations, though in an undefined order), even if they
> overlap. Without locks, there are chances that some bricks
> execute the operations in one order and the remaining bricks
> execute the same operations in the reverse order, causing data
> corruption.
>
>
>
>
> 2. Different FOPs can always run simultaneously (for example, WRITE and
> READ FOPs, or two READ FOPs).
>
>
> All fops can be executed concurrently. If there's any chance
> that two operations could interfere, locks are taken in the
> appropriate places. For example, reads cannot be merged with
> overlapping writes. Otherwise they could return inconsistent data.
>
>
>
> 3. WRITE and some metadata FOP (like setattr) together: these cannot happen
> together with locks, even though the chances are very low.
>
>
> As in 2, if there's any possible interference, the appropriate
> locks will be taken.
>
> You can look at the code to see which locks are taken for each
> fop. See the corresponding ec_manager_<fop>() function, in the
> EC_STATE_LOCK switch case. There you will see calls to
> ec_lock_prepare_xxx() for each taken lock.
>
> Xavi
>
>
> Pls. clarify.
>
> Best regards
> JK
>
>
>
> On Wed, Nov 30, 2016 at 5:49 PM, jayakrishnan mm
> <jayakrishnan.mm at gmail.com> wrote:
>
> Hi Xavier,
>
> Thank you very much for your explanation. This helped me to
> understand more about locking in EC.
>
> Best Regards
> JK
>
>
> On Mon, Nov 28, 2016 at 4:17 PM, Xavier Hernandez
> <xhernandez at datalab.es> wrote:
>
> Hi,
>
> On 11/28/2016 02:59 AM, jayakrishnan mm wrote:
>
> Hi Xavier,
>
> Notice that EC xlator uses blocking locks. Any specific
> reason for this?
>
>
> In a distributed filesystem like gluster a synchronization
> mechanism is a must to avoid data corruption.
>
>
> Do you think this will affect the performance ?
>
>
> Of course the need for locks has a performance impact, and we
> cannot avoid them if we want to guarantee data integrity. However, some
> optimizations have been applied, especially eager locking,
> which allows a lock to be reused without unlocking/locking again.
>
>
> (In comparison, AFR first tries non-blocking locks and, if not
> successful, then tries blocking locks.)
>
>
> EC also tries a non-blocking lock first.
>
>
> Also, why are two locks needed per FOP? One for normal I/O and
> another for self-healing?
>
>
> The only fop that currently needs two locks is 'rename', and
> only when source and destination directories are different. All
> other fops only take one lock at most.
>
> Best regards,
>
> Xavi
>
>
> Best regards
> JK
>
>