[Gluster-devel] important change to syncop infra
Anand Subramanian
ansubram at redhat.com
Thu Dec 12 06:29:38 UTC 2013
Is doing away with -O2 an option that was ever considered or is it that
we simply must have O2 on?
(I understand that turning off O2 can open some so-far-unexposed can of
worms and a lot of soaking maybe required, and also that we may have had
a good set of perf related reasons to have settled on -O2 in the first
place, but wanted to understand nevertheless...)
Anand
On 12/11/2013 02:21 PM, Pranith Kumar Karampuri wrote:
> hi,
> We found a day-1 bug when syncop_xxx() infra is used inside a synctask with compilation optimization (CFLAGS -O2). This bug has been dormant for at least 2 years.
> There are around ~400(rebalance, replace-brick, bd, self-heal-daemon, quota, fuse lock/fd migration) places where syncop is used in the code base all of which are potential candidates which can take the hit.
>
> I sent first round of patch at http://review.gluster.com/6475 to catch regressions upstream.
> These are the files that are affected by the changes I introduced to fix this:
>
> api/src/glfs-fops.c | 36 ++++++++++++++++++++++++++++++++++
> api/src/glfs-handleops.c | 15 ++++++++++++++
> api/src/glfs-internal.h | 7 +++++++
> api/src/glfs-resolve.c | 10 ++++++++++
> libglusterfs/src/syncop.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------
> xlators/cluster/afr/src/afr-self-heald.c | 45 +++++++++++++++++++++++++++++-------------
> xlators/cluster/afr/src/pump.c | 12 ++++++++++--
> xlators/cluster/dht/src/dht-helper.c | 24 +++++++++++++++--------
> xlators/cluster/dht/src/dht-rebalance.c | 168 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------------------------------
> xlators/cluster/dht/src/dht-selfheal.c | 6 ++++--
> xlators/features/locks/src/posix.c | 3 ++-
> xlators/features/qemu-block/src/bdrv-xlator.c | 15 ++++----------
> xlators/features/qemu-block/src/qb-coroutines.c | 14 ++++++++++----
> xlators/mount/fuse/src/fuse-bridge.c | 16 ++++++++++-----
>
> Please review your respective component for these changes in gerrit.
>
> Thanks
> Pranith.
>
> Detailed explanation of the Root cause:
> We found the bug in 'gf_defrag_migrate_data' in rebalance operation:
>
> Lets look at interesting parts of the function:
>
> int
> gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t *loc,
> dict_t *migrate_data)
> {
> .....
> code section - [ Loop ]
> while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL,
> &entries)) != 0) {
> .....
> code section - [ ERRNO-1 ] (errno of readdirp is stored in readdir_operrno by a thread)
> /* Need to keep track of ENOENT errno, that means, there is no
> need to send more readdirp() */
> readdir_operrno = errno;
> .....
> code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread)
> ret = syncop_getxattr (this, &entry_loc, &dict,
> GF_XATTR_LINKINFO_KEY);
> code section - [ ERRNO-2] (checking for failures of syncop_getxattr(). This may not always be executed in same thread which executed [SYNCOP-1])
> if (ret < 0) {
> if (errno != ENODATA) {
> loglevel = GF_LOG_ERROR;
> defrag->total_failures += 1;
> .....
> }
>
> the function above could be executed by thread(t1) till [SYNCOP-1] and code from [ERRNO-2] can be executed by a different thread(t2) because of the way syncop-infra schedules the tasks.
>
> when the code is compiled with -O2 optimization this is the assembly code that is generated:
> [ERRNO-1]
> 1165 readdir_operrno = errno; <<---- errno gets expanded as *(__errno_location())
> 0x00007fd149d48b60 <+496>: callq 0x7fd149d410c0 <__errno_location at plt>
> 0x00007fd149d48b72 <+514>: mov %rax,0x50(%rsp) <<------ Address returned by __errno_location() is stored in a special location in stack for later use.
> 0x00007fd149d48b77 <+519>: mov (%rax),%eax
> 0x00007fd149d48b79 <+521>: mov %eax,0x78(%rsp)
> ....
> [ERRNO-2]
> 1281 if (errno != ENODATA) {
> 0x00007fd149d492ae <+2366>: mov 0x50(%rsp),%rax <<----- Because it already stored the address returned by __errno_location(), it just dereferences the address to get the errno value. BUT THIS CODE NEED NOT BE EXECUTED BY SAME THREAD!!!
> 0x00007fd149d492b3 <+2371>: mov $0x9,%ebp
> 0x00007fd149d492b8 <+2376>: mov (%rax),%edi
> 0x00007fd149d492ba <+2378>: cmp $0x3d,%edi
>
> The problem is that __errno_location() value of t1 and t2 are different. So [ERRNO-2] ends up reading errno of t1 instead of errno of t2 even though t2 is executing [ERRNO-2] code section.
>
> When code is compiled without any optimization for [ERRNO-2]:
> 1281 if (errno != ENODATA) {
> 0x00007fd58e7a326f <+2237>: callq 0x7fd58e797300 <__errno_location at plt><<--- As it is calling __errno_location() again it gets the location from t2 so it works as intended.
> 0x00007fd58e7a3274 <+2242>: mov (%rax),%eax
> 0x00007fd58e7a3276 <+2244>: cmp $0x3d,%eax
> 0x00007fd58e7a3279 <+2247>: je 0x7fd58e7a32a1 <gf_defrag_migrate_data+2287>
>
> Fix:
> We decided to make syncop_xxx() return (-errno) value as the return value in case of errors and all the functions which make syncop_xxx() will need to use (-ret) to figure out the reason for failure in case of syncop_xxx() failures.
>
> Pranith
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
More information about the Gluster-devel
mailing list