[Gluster-devel] important change to syncop infra
Pranith Kumar Karampuri
pkarampu at redhat.com
Wed Dec 11 13:13:34 UTC 2013
hi Xavier,
> The str_uuid pointer used can point to the uuid of another thread that
> could have been modified. I don't know very well the internals of
> syncops and synctask, but I think it's even possible that another task
> running on the same thread while the current one was sleeping could
> modify the contents of str_uuid.
>
> This case seems very obvious and useless but maybe other combinations
> are harder to detect.
>
> I think that currently they are only used by uuid_utoa() and
> lkowner_utoa() to write log messages, which seems safe, but we will have
> to be careful when using syncops to avoid possible future problems like
> this.
Probably the reason why I never considered these functions as a danger is for some reason
I always thought they are supposed to be used in logging functions. After your
response I went through the commits and indeed they are introduced just for logging.
Here is the snippet from the bug (https://bugzilla.redhat.com/show_bug.cgi?id=GLUSTER-2308)
as part of which this is introduced:
"This function would take in a uuid_t and return a static char buffer.
This would be useful in logging the uuid_t. We already make use of
uuid_unparse() provided by libuuid to convert uuid_t to string but that
takes an additional out buffer parameter, does not return the out
buffer and hence cannot be used directly within logging functions like
gf_log (). This necessitates uuid_unparse () to be done even if we are
not going to log the uuid (due to the log message being at a lower log
level) and hence it does have a performance impact. In all such
instances, let us get rid of uuid_unparse() and use uuid_utoa() instead.
Prototype :-> char * uuid_utoa (uuid_t uuid)"
lkowner_utoa is implemented much later for similar usage.
Both these functions have re-entrant variants, lkowner_utoa_r() and uuid_utoa_r().
So while the case that you showed for uuid_utoa() exposes the bug, I am not sure if it is supposed to be used in that way to begin with.
Even in gf_log() if we want to log more than 1 uuid we should not be using this function because it ends up logging same uuid twice.
Pranith.
> Xavi
>
> El 11/12/13 12:02, Pranith Kumar Karampuri ha escrit:
> >> it's really a very interesting and hard to find bug, but I'm wondering
> >> how we can prevent this to happen in the general case. There might be
> >> other operations based on pointers to thread local storage that will
> >> suffer this problem. Probably 'errno' is one of the most dangerous, but
> >> TLS is also used to resolve 'THIS' in a very similar way to errno, and
> >> there are temporary uuid and lkowner values also stored into TLS. More
> >> things could appear in the future. These are also potential cases to
> >> have problems with syncops.
> >
> >> An access to THIS before and after a syncop might trigger this bug, right
> >> ?
> > for errno it happens because it has the attribute __attribute__((const)).
> >
> > # grep errno_location /usr/include/bits/errno.h
> > extern int *__errno_location (void) __THROW __attribute__ ((__const__));
> > # define errno (*__errno_location ())
> >
> > (The following article explains it best IMO:
> > http://lwn.net/Articles/285332)
> >
> > But it is not there for lkowner, uuid-buf, THIS. Just to confirm, I added
> > the following code
> > ------------------------------------------------------
> > + xlator_t *tmp_this = NULL;
> >
> > gf_log (this->name, GF_LOG_INFO, "migrate data called on %s",
> > loc->path);
> > @@ -1258,8 +1259,10 @@ gf_defrag_migrate_data (xlator_t *this,
> > gf_defrag_info_t *defrag, loc_t *loc,
> > * and also that guarantees that file has to be
> > mostly
> > * migrated */
> >
> > + tmp_this = THIS;
> > ret = syncop_getxattr (this, &entry_loc, &dict,
> > GF_XATTR_LINKINFO_KEY);
> > + tmp_this = THIS;
> > if (ret < 0) {
> > if (errno != ENODATA) {
> > loglevel = GF_LOG_ERROR;
> > @@ -1267,7 +1270,7 @@ gf_defrag_migrate_data (xlator_t *this,
> > gf_defrag_info_t *defrag, loc_t *loc,
> > } else {
> > loglevel = GF_LOG_TRACE;
> > }
> > - gf_log (this->name, loglevel, "%s: failed
> > to "
> > + gf_log (tmp_this->name, loglevel, "%s:
> > failed to "
> > -----------------------------------------------------
> > assembly code it generated:
> > =====================================================
> > 1262 tmp_this = THIS;
> > 0x00007faf17120dbc <+1788>: xor %eax,%eax
> > 0x00007faf17120dbe <+1790>: callq 0x7faf171191d0
> > <__glusterfs_this_location at plt> <<----------------first call to THIS
> >
> > 1263 ret = syncop_getxattr (this, &entry_loc,
> > &dict,
> > 0x00007faf17120dc3 <+1795>: mov 0x60(%rsp),%rdx
> > 0x00007faf17120dc8 <+1800>: mov 0x28(%rsp),%rsi
> > 0x00007faf17120dcd <+1805>: lea 0x2b550(%rip),%rcx #
> > 0x7faf1714c324
> > 0x00007faf17120dd4 <+1812>: mov 0x18(%rsp),%rdi
> > 0x00007faf17120dd9 <+1817>: callq 0x7faf17118a30 <syncop_getxattr at plt>
> > 0x00007faf17120dde <+1822>: mov %eax,%ebp
> >
> > 1264 GF_XATTR_LINKINFO_KEY);
> > ---Type <return> to continue, or q <return> to quit---
> > 1265 tmp_this = THIS;
> > 0x00007faf17120de0 <+1824>: xor %eax,%eax
> > 0x00007faf17120de2 <+1826>: callq 0x7faf171191d0
> > <__glusterfs_this_location at plt> <<----------------second call to THIS
> > 0x00007faf17120de9 <+1833>: mov (%rax),%r15
> >
> > 1266 if (ret < 0) {
> > 0x00007faf17120de7 <+1831>: test %ebp,%ebp
> > 0x00007faf17120dec <+1836>: js 0x7faf1712100e
> > <gf_defrag_migrate_data+2382>
> >
> > 1267 if (errno != ENODATA) {
> > 0x00007faf1712100e <+2382>: mov 0x50(%rsp),%rax
> > <<<---------------------------------------------check that errno does
> > not callq __errno_location
> > 0x00007faf17121013 <+2387>: mov $0x9,%ebp
> > 0x00007faf17121018 <+2392>: mov (%rax),%edi
> > 0x00007faf1712101a <+2394>: cmp $0x3d,%edi
> >
> > ================================================
> >
> >> I think it's very difficult to track all these cases and handle them
> >> correctly. Another solution could be to tell the compiler to forget
> >> previous pointer values when a syncop is called, but I don't see any way
> >> to do so. Since it's storing pointers to values, any barrier or volatile
> >> seems useless.
> > We can do this by putting __attribute__((noinline)) for such functions. But
> > since errno is not in our control we can't do that.
> >
> >> Xavi
> >>
> >>
> >> El 11/12/13 09:51, Pranith Kumar Karampuri ha escrit:
> >>> hi,
> >>> We found a day-1 bug when syncop_xxx() infra is used inside a
> >>> synctask
> >>> with compilation optimization (CFLAGS -O2). This bug has been
> >>> dormant
> >>> for at least 2 years.
> >>> There are around ~400(rebalance, replace-brick, bd, self-heal-daemon,
> >>> quota, fuse lock/fd migration) places where syncop is used in the code
> >>> base all of which are potential candidates which can take the hit.
> >>>
> >>> I sent first round of patch at http://review.gluster.com/6475 to catch
> >>> regressions upstream.
> >>> These are the files that are affected by the changes I introduced to fix
> >>> this:
> >>>
> >>> api/src/glfs-fops.c | 36
> >>> ++++++++++++++++++++++++++++++++++
> >>> api/src/glfs-handleops.c | 15 ++++++++++++++
> >>> api/src/glfs-internal.h | 7 +++++++
> >>> api/src/glfs-resolve.c | 10 ++++++++++
> >>> libglusterfs/src/syncop.c | 117
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------
> >>> xlators/cluster/afr/src/afr-self-heald.c | 45
> >>> +++++++++++++++++++++++++++++-------------
> >>> xlators/cluster/afr/src/pump.c | 12 ++++++++++--
> >>> xlators/cluster/dht/src/dht-helper.c | 24
> >>> +++++++++++++++--------
> >>> xlators/cluster/dht/src/dht-rebalance.c | 168
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------------------------------
> >>> xlators/cluster/dht/src/dht-selfheal.c | 6 ++++--
> >>> xlators/features/locks/src/posix.c | 3 ++-
> >>> xlators/features/qemu-block/src/bdrv-xlator.c | 15 ++++----------
> >>> xlators/features/qemu-block/src/qb-coroutines.c | 14 ++++++++++----
> >>> xlators/mount/fuse/src/fuse-bridge.c | 16 ++++++++++-----
> >>>
> >>> Please review your respective component for these changes in gerrit.
> >>>
> >>> Thanks
> >>> Pranith.
> >>>
> >>> Detailed explanation of the Root cause:
> >>> We found the bug in 'gf_defrag_migrate_data' in rebalance operation:
> >>>
> >>> Lets look at interesting parts of the function:
> >>>
> >>> int
> >>> gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t
> >>> *loc,
> >>> dict_t *migrate_data)
> >>> {
> >>> .....
> >>> code section - [ Loop ]
> >>> while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL,
> >>> &entries)) != 0) {
> >>> .....
> >>> code section - [ ERRNO-1 ] (errno of readdirp is stored in
> >>> readdir_operrno
> >>> by a thread)
> >>> /* Need to keep track of ENOENT errno, that means,
> >>> there
> >>> is no
> >>> need to send more readdirp() */
> >>> readdir_operrno = errno;
> >>> .....
> >>> code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread)
> >>> ret = syncop_getxattr (this, &entry_loc, &dict,
> >>> GF_XATTR_LINKINFO_KEY);
> >>> code section - [ ERRNO-2] (checking for failures of syncop_getxattr().
> >>> This may not always be executed in same thread which executed [SYNCOP-1])
> >>> if (ret < 0) {
> >>> if (errno != ENODATA) {
> >>> loglevel = GF_LOG_ERROR;
> >>> defrag->total_failures += 1;
> >>> .....
> >>> }
> >>>
> >>> the function above could be executed by thread(t1) till [SYNCOP-1] and
> >>> code
> >>> from [ERRNO-2] can be executed by a different thread(t2) because of the
> >>> way syncop-infra schedules the tasks.
> >>>
> >>> when the code is compiled with -O2 optimization this is the assembly code
> >>> that is generated:
> >>> [ERRNO-1]
> >>> 1165 readdir_operrno = errno; <<---- errno gets
> >>> expanded as *(__errno_location())
> >>> 0x00007fd149d48b60 <+496>: callq 0x7fd149d410c0
> >>> <__errno_location at plt>
> >>> 0x00007fd149d48b72 <+514>: mov %rax,0x50(%rsp) <<------
> >>> Address returned by __errno_location() is stored in a special
> >>> location
> >>> in stack for later use.
> >>> 0x00007fd149d48b77 <+519>: mov (%rax),%eax
> >>> 0x00007fd149d48b79 <+521>: mov %eax,0x78(%rsp)
> >>> ....
> >>> [ERRNO-2]
> >>> 1281 if (errno != ENODATA) {
> >>> 0x00007fd149d492ae <+2366>: mov 0x50(%rsp),%rax <<-----
> >>> Because it already stored the address returned by
> >>> __errno_location(),
> >>> it just dereferences the address to get the errno value. BUT THIS
> >>> CODE
> >>> NEED NOT BE EXECUTED BY SAME THREAD!!!
> >>> 0x00007fd149d492b3 <+2371>: mov $0x9,%ebp
> >>> 0x00007fd149d492b8 <+2376>: mov (%rax),%edi
> >>> 0x00007fd149d492ba <+2378>: cmp $0x3d,%edi
> >>>
> >>> The problem is that __errno_location() value of t1 and t2 are different.
> >>> So
> >>> [ERRNO-2] ends up reading errno of t1 instead of errno of t2 even though
> >>> t2 is executing [ERRNO-2] code section.
> >>>
> >>> When code is compiled without any optimization for [ERRNO-2]:
> >>> 1281 if (errno != ENODATA) {
> >>> 0x00007fd58e7a326f <+2237>: callq 0x7fd58e797300
> >>> <__errno_location at plt><<--- As it is calling __errno_location()
> >>> again
> >>> it gets the location from t2 so it works as intended.
> >>> 0x00007fd58e7a3274 <+2242>: mov (%rax),%eax
> >>> 0x00007fd58e7a3276 <+2244>: cmp $0x3d,%eax
> >>> 0x00007fd58e7a3279 <+2247>: je 0x7fd58e7a32a1
> >>> <gf_defrag_migrate_data+2287>
> >>>
> >>> Fix:
> >>> We decided to make syncop_xxx() return (-errno) value as the return value
> >>> in case of errors and all the functions which make syncop_xxx() will need
> >>> to use (-ret) to figure out the reason for failure in case of
> >>> syncop_xxx()
> >>> failures.
> >>>
> >>> Pranith
> >>>
> >>> _______________________________________________
> >>> Gluster-devel mailing list
> >>> Gluster-devel at nongnu.org
> >>> https://lists.nongnu.org/mailman/listinfo/gluster-devel
> >>
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at nongnu.org
> > https://lists.nongnu.org/mailman/listinfo/gluster-devel
>
>
More information about the Gluster-devel
mailing list