[Gluster-devel] important change to syncop infra
Pranith Kumar Karampuri
pkarampu at redhat.com
Thu Jan 16 12:01:18 UTC 2014
I just reposted this patch addressing all the comments except two. I added those comments from older version of patchset to this one. Could some one who knows the following two files please help me resolve them:
http://review.gluster.org/#/c/6475/5/xlators/features/qemu-block/src/bdrv-xlator.c
http://review.gluster.org/#/c/6475/5/xlators/features/qemu-block/src/qb-coroutines.c
Pranith
----- Original Message -----
> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> To: "Anand Avati" <avati at gluster.org>, "Brian Foster" <bfoster at redhat.com>
> Cc: gluster-devel at nongnu.org
> Sent: Thursday, January 2, 2014 9:16:19 PM
> Subject: Re: [Gluster-devel] important change to syncop infra
>
> Avati, Brian,
> Could you guys please do a round of review for the following files and
> provide your valuable inputs.
> http://review.gluster.com/#/c/6475/4/xlators/features/qemu-block/src/bdrv-xlator.c
> http://review.gluster.com/#/c/6475/4/xlators/features/qemu-block/src/qb-coroutines.c
>
> Pranith.
>
> ----- Original Message -----
> > From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> > To: "Anand Avati" <avati at gluster.org>
> > Cc: gluster-devel at nongnu.org
> > Sent: Friday, December 13, 2013 9:09:03 AM
> > Subject: Re: [Gluster-devel] important change to syncop infra
> >
> > hi,
> > If there are no more objections, I will send the final patch on
> > Monday.
> >
> > Pranith.
> > ----- Original Message -----
> > > From: "Anand Avati" <avati at gluster.org>
> > > To: "Anand Subramanian" <anands at redhat.com>
> > > Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>,
> > > gluster-devel at nongnu.org
> > > Sent: Thursday, December 12, 2013 12:09:34 PM
> > > Subject: Re: [Gluster-devel] important change to syncop infra
> > >
> > > The decision of doing away with -O2 is beyond our control, and we
> > > shouldn't
> > > have code which depend on optimization to be disabled to behave properly.
> > > Representing -errno as return is the cleanest fix (that's how other
> > > projects which use setcontext/getcontext are behaving too.) If there are
> > > any new further issues which arise from setcontext/getcontext, I'm
> > > tempted
> > > to change the internal implementation to use a a vanilla pthread pool.
> > >
> > > Avati
> > >
> > >
> > > On Wed, Dec 11, 2013 at 10:29 PM, Anand Subramanian
> > > <ansubram at redhat.com>wrote:
> > >
> > > > Is doing away with -O2 an option that was ever considered or is it that
> > > > we
> > > > simply must have O2 on?
> > > >
> > > > (I understand that turning off O2 can open some so-far-unexposed can of
> > > > worms and a lot of soaking maybe required, and also that we may have
> > > > had
> > > > a
> > > > good set of perf related reasons to have settled on -O2 in the first
> > > > place,
> > > > but wanted to understand nevertheless...)
> > > >
> > > > Anand
> > > >
> > > >
> > > >
> > > > On 12/11/2013 02:21 PM, Pranith Kumar Karampuri wrote:
> > > >
> > > >> hi,
> > > >> We found a day-1 bug when syncop_xxx() infra is used inside a
> > > >> synctask with compilation optimization (CFLAGS -O2). This bug has been
> > > >> dormant for at least 2 years.
> > > >> There are around ~400(rebalance, replace-brick, bd, self-heal-daemon,
> > > >> quota, fuse lock/fd migration) places where syncop is used in the code
> > > >> base
> > > >> all of which are potential candidates which can take the hit.
> > > >>
> > > >> I sent first round of patch at http://review.gluster.com/6475 to catch
> > > >> regressions upstream.
> > > >> These are the files that are affected by the changes I introduced to
> > > >> fix
> > > >> this:
> > > >>
> > > >> api/src/glfs-fops.c | 36
> > > >> ++++++++++++++++++++++++++++++++++
> > > >> api/src/glfs-handleops.c | 15 ++++++++++++++
> > > >> api/src/glfs-internal.h | 7 +++++++
> > > >> api/src/glfs-resolve.c | 10 ++++++++++
> > > >> libglusterfs/src/syncop.c | 117
> > > >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >> ++++++++++++-------------------------------------
> > > >> xlators/cluster/afr/src/afr-self-heald.c | 45
> > > >> +++++++++++++++++++++++++++++-------------
> > > >> xlators/cluster/afr/src/pump.c | 12 ++++++++++--
> > > >> xlators/cluster/dht/src/dht-helper.c | 24
> > > >> +++++++++++++++--------
> > > >> xlators/cluster/dht/src/dht-rebalance.c | 168
> > > >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >> ++++++++++++++++++++++++++++++++++--------------------------
> > > >> -------------------------------------
> > > >> xlators/cluster/dht/src/dht-selfheal.c | 6 ++++--
> > > >> xlators/features/locks/src/posix.c | 3 ++-
> > > >> xlators/features/qemu-block/src/bdrv-xlator.c | 15 ++++----------
> > > >> xlators/features/qemu-block/src/qb-coroutines.c | 14 ++++++++++----
> > > >> xlators/mount/fuse/src/fuse-bridge.c | 16
> > > >> ++++++++++-----
> > > >>
> > > >> Please review your respective component for these changes in gerrit.
> > > >>
> > > >> Thanks
> > > >> Pranith.
> > > >>
> > > >> Detailed explanation of the Root cause:
> > > >> We found the bug in 'gf_defrag_migrate_data' in rebalance operation:
> > > >>
> > > >> Lets look at interesting parts of the function:
> > > >>
> > > >> int
> > > >> gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag,
> > > >> loc_t
> > > >> *loc,
> > > >> dict_t *migrate_data)
> > > >> {
> > > >> .....
> > > >> code section - [ Loop ]
> > > >> while ((ret = syncop_readdirp (this, fd, 131072, offset,
> > > >> NULL,
> > > >> &entries)) != 0) {
> > > >> .....
> > > >> code section - [ ERRNO-1 ] (errno of readdirp is stored in
> > > >> readdir_operrno by a thread)
> > > >> /* Need to keep track of ENOENT errno, that means,
> > > >> there
> > > >> is no
> > > >> need to send more readdirp() */
> > > >> readdir_operrno = errno;
> > > >> .....
> > > >> code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread)
> > > >> ret = syncop_getxattr (this, &entry_loc,
> > > >> &dict,
> > > >> GF_XATTR_LINKINFO_KEY);
> > > >> code section - [ ERRNO-2] (checking for failures of
> > > >> syncop_getxattr().
> > > >> This may not always be executed in same thread which executed
> > > >> [SYNCOP-1])
> > > >> if (ret < 0) {
> > > >> if (errno != ENODATA) {
> > > >> loglevel = GF_LOG_ERROR;
> > > >> defrag->total_failures += 1;
> > > >> .....
> > > >> }
> > > >>
> > > >> the function above could be executed by thread(t1) till [SYNCOP-1] and
> > > >> code from [ERRNO-2] can be executed by a different thread(t2) because
> > > >> of
> > > >> the way syncop-infra schedules the tasks.
> > > >>
> > > >> when the code is compiled with -O2 optimization this is the assembly
> > > >> code
> > > >> that is generated:
> > > >> [ERRNO-1]
> > > >> 1165 readdir_operrno = errno; <<---- errno gets
> > > >> expanded as *(__errno_location())
> > > >> 0x00007fd149d48b60 <+496>: callq 0x7fd149d410c0
> > > >> <__errno_location at plt>
> > > >> 0x00007fd149d48b72 <+514>: mov %rax,0x50(%rsp) <<------
> > > >> Address returned by __errno_location() is stored in a special location
> > > >> in
> > > >> stack for later use.
> > > >> 0x00007fd149d48b77 <+519>: mov (%rax),%eax
> > > >> 0x00007fd149d48b79 <+521>: mov %eax,0x78(%rsp)
> > > >> ....
> > > >> [ERRNO-2]
> > > >> 1281 if (errno != ENODATA) {
> > > >> 0x00007fd149d492ae <+2366>: mov 0x50(%rsp),%rax <<-----
> > > >> Because it already stored the address returned by __errno_location(),
> > > >> it
> > > >> just dereferences the address to get the errno value. BUT THIS CODE
> > > >> NEED
> > > >> NOT BE EXECUTED BY SAME THREAD!!!
> > > >> 0x00007fd149d492b3 <+2371>: mov $0x9,%ebp
> > > >> 0x00007fd149d492b8 <+2376>: mov (%rax),%edi
> > > >> 0x00007fd149d492ba <+2378>: cmp $0x3d,%edi
> > > >>
> > > >> The problem is that __errno_location() value of t1 and t2 are
> > > >> different.
> > > >> So [ERRNO-2] ends up reading errno of t1 instead of errno of t2 even
> > > >> though
> > > >> t2 is executing [ERRNO-2] code section.
> > > >>
> > > >> When code is compiled without any optimization for [ERRNO-2]:
> > > >> 1281 if (errno != ENODATA) {
> > > >> 0x00007fd58e7a326f <+2237>: callq 0x7fd58e797300
> > > >> <__errno_location at plt><<--- As it is calling __errno_location() again
> > > >> it
> > > >> gets the location from t2 so it works as intended.
> > > >> 0x00007fd58e7a3274 <+2242>: mov (%rax),%eax
> > > >> 0x00007fd58e7a3276 <+2244>: cmp $0x3d,%eax
> > > >> 0x00007fd58e7a3279 <+2247>: je 0x7fd58e7a32a1
> > > >> <gf_defrag_migrate_data+2287>
> > > >>
> > > >> Fix:
> > > >> We decided to make syncop_xxx() return (-errno) value as the return
> > > >> value
> > > >> in case of errors and all the functions which make syncop_xxx() will
> > > >> need
> > > >> to use (-ret) to figure out the reason for failure in case of
> > > >> syncop_xxx()
> > > >> failures.
> > > >>
> > > >> Pranith
> > > >>
> > > >> _______________________________________________
> > > >> Gluster-devel mailing list
> > > >> Gluster-devel at nongnu.org
> > > >> https://lists.nongnu.org/mailman/listinfo/gluster-devel
> > > >>
> > > >
> > > >
> > > > _______________________________________________
> > > > Gluster-devel mailing list
> > > > Gluster-devel at nongnu.org
> > > > https://lists.nongnu.org/mailman/listinfo/gluster-devel
> > > >
> > >
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at nongnu.org
> > https://lists.nongnu.org/mailman/listinfo/gluster-devel
> >
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>
More information about the Gluster-devel
mailing list