[Gluster-devel] important change to syncop infra

Thu Dec 12 06:39:34 UTC 2013

The decision of doing away with -O2 is beyond our control, and we shouldn't
have code which depend on optimization to be disabled to behave properly.
Representing -errno as return is the cleanest fix (that's how other
projects which use setcontext/getcontext are behaving too.) If there are
any new further issues which arise from setcontext/getcontext, I'm tempted
to change the internal implementation to use a a vanilla pthread pool.

Avati

On Wed, Dec 11, 2013 at 10:29 PM, Anand Subramanian <ansubram at redhat.com>wrote:

> Is doing away with -O2 an option that was ever considered or is it that we
> simply must have O2 on?
>
> (I understand that turning off O2 can open some so-far-unexposed can of
> worms and a lot of soaking maybe required, and also that we may have had a
> good set of perf related reasons to have settled on -O2 in the first place,
> but wanted to understand nevertheless...)
>
> Anand
>
>
>
> On 12/11/2013 02:21 PM, Pranith Kumar Karampuri wrote:
>
>> hi,
>>      We found a day-1 bug when syncop_xxx() infra is used inside a
>> synctask with compilation optimization (CFLAGS -O2). This bug has been
>> dormant for at least 2 years.
>> There are around ~400(rebalance, replace-brick, bd, self-heal-daemon,
>> quota, fuse lock/fd migration) places where syncop is used in the code base
>> all of which are potential candidates which can take the hit.
>>
>> I sent first round of patch at http://review.gluster.com/6475 to catch
>> regressions upstream.
>> These are the files that are affected by the changes I introduced to fix
>> this:
>>
>>   api/src/glfs-fops.c                             |  36
>> ++++++++++++++++++++++++++++++++++
>>   api/src/glfs-handleops.c                        |  15 ++++++++++++++
>>   api/src/glfs-internal.h                         |   7 +++++++
>>   api/src/glfs-resolve.c                          |  10 ++++++++++
>>   libglusterfs/src/syncop.c                       | 117
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> ++++++++++++-------------------------------------
>>   xlators/cluster/afr/src/afr-self-heald.c        |  45
>> +++++++++++++++++++++++++++++-------------
>>   xlators/cluster/afr/src/pump.c                  |  12 ++++++++++--
>>   xlators/cluster/dht/src/dht-helper.c            |  24
>> +++++++++++++++--------
>>   xlators/cluster/dht/src/dht-rebalance.c         | 168
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> ++++++++++++++++++++++++++++++++++--------------------------
>> -------------------------------------
>>   xlators/cluster/dht/src/dht-selfheal.c          |   6 ++++--
>>   xlators/features/locks/src/posix.c              |   3 ++-
>>   xlators/features/qemu-block/src/bdrv-xlator.c   |  15 ++++----------
>>   xlators/features/qemu-block/src/qb-coroutines.c |  14 ++++++++++----
>>   xlators/mount/fuse/src/fuse-bridge.c            |  16 ++++++++++-----
>>
>> Please review your respective component for these changes in gerrit.
>>
>> Thanks
>> Pranith.
>>
>> Detailed explanation of the Root cause:
>> We found the bug in 'gf_defrag_migrate_data' in rebalance operation:
>>
>> Lets look at interesting parts of the function:
>>
>> int
>> gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t
>> *loc,
>>                          dict_t *migrate_data)
>> {
>> .....
>> code section - [ Loop ]
>>          while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL,
>>                                         &entries)) != 0) {
>> .....
>> code section - [ ERRNO-1 ] (errno of readdirp is stored in
>> readdir_operrno by a thread)
>>                  /* Need to keep track of ENOENT errno, that means, there
>> is no
>>                     need to send more readdirp() */
>>                  readdir_operrno = errno;
>> .....
>> code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread)
>>                          ret = syncop_getxattr (this, &entry_loc, &dict,
>>                                                 GF_XATTR_LINKINFO_KEY);
>> code section - [ ERRNO-2]   (checking for failures of syncop_getxattr().
>> This may not always be executed in same thread which executed [SYNCOP-1])
>>                          if (ret < 0) {
>>                                  if (errno != ENODATA) {
>>                                          loglevel = GF_LOG_ERROR;
>>                                          defrag->total_failures += 1;
>> .....
>> }
>>
>> the function above could be executed by thread(t1) till [SYNCOP-1] and
>> code from [ERRNO-2] can be executed by a different thread(t2) because of
>> the way syncop-infra schedules the tasks.
>>
>> when the code is compiled with -O2 optimization this is the assembly code
>> that is generated:
>>   [ERRNO-1]
>> 1165                        readdir_operrno = errno; <<---- errno gets
>> expanded as *(__errno_location())
>>     0x00007fd149d48b60 <+496>:        callq  0x7fd149d410c0
>> <__errno_location at plt>
>>     0x00007fd149d48b72 <+514>:        mov    %rax,0x50(%rsp) <<------
>> Address returned by __errno_location() is stored in a special location in
>> stack for later use.
>>     0x00007fd149d48b77 <+519>:        mov    (%rax),%eax
>>     0x00007fd149d48b79 <+521>:        mov    %eax,0x78(%rsp)
>> ....
>>   [ERRNO-2]
>> 1281                                        if (errno != ENODATA) {
>>     0x00007fd149d492ae <+2366>:        mov    0x50(%rsp),%rax <<-----
>>  Because it already stored the address returned by __errno_location(), it
>> just dereferences the address to get the errno value. BUT THIS CODE NEED
>> NOT BE EXECUTED BY SAME THREAD!!!
>>     0x00007fd149d492b3 <+2371>:        mov    $0x9,%ebp
>>     0x00007fd149d492b8 <+2376>:        mov    (%rax),%edi
>>     0x00007fd149d492ba <+2378>:        cmp    $0x3d,%edi
>>
>> The problem is that __errno_location() value of t1 and t2 are different.
>> So [ERRNO-2] ends up reading errno of t1 instead of errno of t2 even though
>> t2 is executing [ERRNO-2] code section.
>>
>> When code is compiled without any optimization for [ERRNO-2]:
>> 1281                                        if (errno != ENODATA) {
>>     0x00007fd58e7a326f <+2237>:        callq  0x7fd58e797300
>> <__errno_location at plt><<--- As it is calling __errno_location() again it
>> gets the location from t2 so it works as intended.
>>     0x00007fd58e7a3274 <+2242>:        mov    (%rax),%eax
>>     0x00007fd58e7a3276 <+2244>:        cmp    $0x3d,%eax
>>     0x00007fd58e7a3279 <+2247>:        je     0x7fd58e7a32a1
>> <gf_defrag_migrate_data+2287>
>>
>> Fix:
>> We decided to make syncop_xxx() return (-errno) value as the return value
>> in case of errors and all the functions which make syncop_xxx() will need
>> to use (-ret) to figure out the reason for failure in case of syncop_xxx()
>> failures.
>>
>> Pranith
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at nongnu.org
>> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-devel/attachments/20131211/baa38ac8/attachment-0001.html>