[Gluster-devel] Fwd: Question about EC locking
jayakrishnan mm
jayakrishnan.mm at gmail.com
Thu Feb 2 07:12:32 UTC 2017
On Fri, Jan 13, 2017 at 8:03 PM, Xavier Hernandez <xhernandez at datalab.es>
wrote:
> Hi,
>
> On 13/01/17 10:58, jayakrishnan mm wrote:
>
>> Hi Xavier,
>> I went through the source code. Some questions remain.
>>
>> 1. If two clients try to write to the same file, it should succeed, even
>> if the writes overlap. (Locks should ensure they happen in sequence on the
>> bricks.) From the source code:
>> lock->flock.l_type = F_WRLCK;
>> lock->flock.l_whence = SEEK_SET;
>>
>> fop->flock.l_len += ec_adjust_offset(fop->xl->private,
>> &fop->flock.l_start, 1);
>> fop->flock.l_len = ec_adjust_size(fop->xl->private,
>> fop->flock.l_len, 1);
>> If flock.l_len is 0, the entire file is locked for writing.
>>
>> In my test case with 2 clients, I always get flock.l_len as 0. But
>> still I am able to write to the same file from both clients at the
>> same time.
>>
>
> How can you be sure you are really writing at the same time? Do you get
> partial writes from any of the clients?
I am not sure whether they are happening simultaneously. I am using fio to
generate the writes.
>
>
>
>> If it is acquiring locks chunk by chunk, why am I always getting
>> l_len = 0?
>>
>
> EC doesn't acquire partial locks. The entire file is locked when a
> modification is needed. This makes it possible to reuse locks for future
> operations (eager locking).
>
>> Why am I not getting the actual write size and offset (for
>> flock.l_len & flock.l_start respectively) for each write FOP?
>> (In afr, they are set to transaction.len and transaction.start
>> respectively, which in turn are the write length & offset for the normal
>> write case.)
>>
>
> Because an erasure code splits the data into smaller fragments for each
> brick, so offsets and lengths need to be adjusted.
>
>
>> 2. As per the source code, a full file lock is also taken by the shd.
>>
>> ec_heal_inodelk(heal, F_WRLCK, 1, 0, 0);
>> which means offset=0 & size=0 in ec_heal_lock() function in ec-heal.c
>> flock.l_start = offset;
>> flock.l_len = size;
>> Does it mean that a write to a single file cannot happen simultaneously
>> with healing?
>>
>
> Correct. The heal procedure is like an additional client. If a client and
> the heal process try to write at the same time, they must be serialized,
> like any other regular write. However, heal only takes the full lock for
> some critical operations. Regular self heal of file contents is done
> locking chunk by chunk.
>
I have a question about index heal/full heal.
As per the code, the index healer thread (ec_shd_index_healer) is created
when there is a child_up event OR when there is a
TRANSLATOR_OP/GF_SHD_OP_HEAL_INDEX.
When does the second case arise?
The full heal thread (ec_shd_full_healer) is created only when
TRANSLATOR_OP/GF_SHD_OP_HEAL_FULL arises. Does this happen only during a
replace-brick operation?
Thanks & regards
JK
>
> Xavi
>
>
>> Correct me if I am wrong.
>>
>> Best Regards
>> JK
>>
>>
>>
>>
>>
>>
>> On Wed, Dec 14, 2016 at 12:07 PM, jayakrishnan mm
>> <jayakrishnan.mm at gmail.com> wrote:
>>
>> Thanks Xavier, for making it clear.
>> Regards
>> JK
>>
>>
>> On Dec 13, 2016 3:52 PM, "Xavier Hernandez" <xhernandez at datalab.es>
>> wrote:
>>
>> Hi JK,
>>
>>
>> On 12/13/2016 08:34 AM, jayakrishnan mm wrote:
>>
>> Dear Xavi,
>>
>> How do I test the locks, for example the locks for the write fop? I have
>> two independent clients, both trying to write to the same file.
>>
>>
>> 1. According to my understanding, both can successfully write if the
>> offsets don't overlap. I mean, the WRITE FOP takes a chunk lock on the
>> file. As long as the clients don't try to write to the same chunk, it
>> should be OK. If no locks are present, it can lead to inconsistency.
>>
>>
>> With locks all writes will be fine as defined by POSIX (i.e. the
>> final result will be equivalent to the sequential execution of
>> both operations, though in an undefined order), even if they
>> overlap. Without locks, there are chances that some bricks
>> execute the operations in one order and the remaining bricks
>> execute the same operations in the reverse order, causing data
>> corruption.
>>
>>
>>
>>
>> 2. Different FOPs can always run simultaneously. (Example: WRITE and
>> READ FOPs, or two READ FOPs.)
>>
>>
>> All fops can be executed concurrently. If there's any chance
>> that two operations could interfere, locks are taken in the
>> appropriate places. For example, reads cannot be merged with
>> overlapping writes. Otherwise they could return inconsistent data.
>>
>>
>>
>> 3. A WRITE and some metadata FOP (like setattr) cannot happen together
>> when locks are used, even though the chances of this are very low.
>>
>>
>> As in 2, if there's any possible interference, the appropriate
>> locks will be taken.
>>
>> You can look at the code to see which locks are taken for each
>> fop. See the corresponding ec_manager_<fop>() function, in the
>> EC_STATE_LOCK switch case. There you will see calls to
>> ec_lock_prepare_xxx() for each taken lock.
>>
>> Xavi
>>
>>
>> Please clarify.
>>
>> Best regards
>> JK
>>
>>
>>
>> On Wed, Nov 30, 2016 at 5:49 PM, jayakrishnan mm
>> <jayakrishnan.mm at gmail.com> wrote:
>>
>> Hi Xavier,
>>
>> Thank you very much for your explanation. This helped me to
>> understand more about locking in EC.
>>
>> Best Regards
>> JK
>>
>>
>> On Mon, Nov 28, 2016 at 4:17 PM, Xavier Hernandez
>> <xhernandez at datalab.es> wrote:
>>
>> Hi,
>>
>> On 11/28/2016 02:59 AM, jayakrishnan mm wrote:
>>
>> Hi Xavier,
>>
>> I notice that the EC xlator uses blocking locks. Is there any specific
>> reason for this?
>>
>>
>> In a distributed filesystem like gluster a synchronization mechanism
>> is a must to avoid data corruption.
>>
>>
>> Do you think this will affect the performance ?
>>
>>
>> Of course the need for locks has a performance impact, and we cannot
>> avoid them if data integrity is to be guaranteed. However, some
>> optimizations have been applied, especially eager locking, which
>> allows a lock to be reused without unlocking/locking again.
>>
>>
>> (In comparison, AFR first tries non-blocking locks and, if not
>> successful, then tries blocking locks.)
>>
>>
>> EC also tries a non-blocking lock first.
>>
>>
>> Also, why are two locks needed per FOP? One for normal I/O and
>> another for self healing?
>>
>>
>> The only fop that currently needs two locks is 'rename', and only
>> when source and destination directories are different. All other
>> fops only take one lock at most.
>>
>> Best regards,
>>
>> Xavi
>>
>>
>> Best regards
>> JK
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>>
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>>
>>
>>
>>
>>
>>
>