[Bugs] [Bug 1334700] readdir-ahead does not fetch xattrs that md-cache needs in its internal calls

bugzilla at redhat.com bugzilla at redhat.com
Tue May 10 13:05:36 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1334700

Prashanth Pai <ppai at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(ppai at redhat.com)  |



--- Comment #3 from Prashanth Pai <ppai at redhat.com> ---
A gluster-swift test case fails when "cache-swift-metadata" is turned on
(default).

Problem: glusterfs wrongly reports to gluster-swift that the xattr does not
exist even though it exists on the backend. This triggers gluster-swift to
create the xattr, overwriting the existing xattr and causing serious metadata
loss!
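
To make the failure mode concrete, here is a minimal C sketch of that
read-or-create pattern (gluster-swift itself is Python; this just mirrors the
fgetxattr()/fsetxattr() syscalls visible in the strace below, and
default_metadata is an illustrative placeholder, not the real Swift payload):

#include <errno.h>
#include <fcntl.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Sketch of the read-or-create metadata pattern. If the xattr lookup
 * spuriously returns ENODATA, the fsetxattr() below replaces the real
 * backend metadata with freshly generated defaults -- the metadata loss
 * described above. */
static const char default_metadata[] =
        "{\"Content-Length\":0,\"X-Type\":\"Object\"}";

int read_or_create_metadata(const char *path)
{
        int fd = open(path, O_RDONLY | O_CLOEXEC);
        if (fd < 0)
                return -1;

        ssize_t sz = fgetxattr(fd, "user.swift.metadata", NULL, 0);
        if (sz == -1 && errno == ENODATA)
                /* md-cache wrongly said the xattr is absent: the existing
                 * metadata on the bricks is overwritten here. */
                fsetxattr(fd, "user.swift.metadata", default_metadata,
                          sizeof(default_metadata) - 1, 0);

        close(fd);
        return 0;
}

int main(int argc, char **argv)
{
        return (argc > 1) ? read_or_create_metadata(argv[1]) : 1;
}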

This is roughly the series of operations performed by the gluster-swift
object server:

[pid  2937] open("/mnt/gluster-object/one/c1/manifest", O_RDONLY|O_CLOEXEC) = 10
[pid  2937] fgetxattr(10, "user.swift.metadata", 0x0, 0) = 216
[pid  2937] fgetxattr(10, "user.swift.metadata", "{"Content-Length":"0","X-Object-Manifest":"c1/segments1/","ETag":"d41d8cd98f00b204e9800998ecf8427e","X-Timestamp":"1462164335.27522","X-Object-Type":"file","X-Type":"Object","Content-Type":"application/octet-stream"}", 216) = 216
[pid  2937] close(10)                   = 0
[pid  2937] close(8)                    = 0
[pid  2937] open("/mnt/gluster-object/one/c1/segments1/0", O_RDONLY|O_CLOEXEC) = 10
[pid  2937] fgetxattr(10, "user.swift.metadata", 0x0, 0) = 189
[pid  2937] fgetxattr(10, "user.swift.metadata", "{"Content-Length":"3","ETag":"f97c5d29941bfb1b2fdab0874906ab82","X-Timestamp":"1461849857.48425","X-Object-Type":"file","X-Type":"Object","Content-Type":"application/x-www-form-urlencoded"}", 189) = 189
[pid  2937] close(10)                   = 0
[pid  2937] close(8)                    = 0
[pid  2937] open("/mnt/gluster-object/one/c1/segments1/1", O_RDONLY|O_CLOEXEC) = 10
[pid  2937] fgetxattr(10, "user.swift.metadata", 0x0, 0) = 189
[pid  2937] fgetxattr(10, "user.swift.metadata", "{"Content-Length":"3","ETag":"b8a9f715dbb64fd5c56e7783c6820a61","X-Timestamp":"1461849867.95959","X-Object-Type":"file","X-Type":"Object","Content-Type":"application/x-www-form-urlencoded"}", 189) = 189
[pid  2937] close(10)                   = 0
[pid  2937] close(8)                    = 0
[pid  2937] open("/mnt/gluster-object/one/c1/manifest", O_RDONLY|O_CLOEXEC) = 10
[pid  2937] fgetxattr(10, "user.swift.metadata", 0x0, 0) = -1 ENODATA (No data available)
[pid  2937] close(13)                   = 0
[pid  2937] fgetxattr(10, "user.swift.metadata", 0x0, 0) = -1 ENODATA (No data available)
[pid  2937] fsetxattr(10, "user.swift.metadata", "{"Content-Length":0,"ETag":"d41d8cd98f00b204e9800998ecf8427e","X-Timestamp":"1462164335.35395","X-Object-Type":"file","X-Type":"Object","Content-Type":"application/octet-stream"}", 178, 0) = 0
[pid  2937] fgetxattr(10, "user.swift.metadata", 0x0, 0) = 178
[pid  2937] fgetxattr(10, "user.swift.metadata", "{"Content-Length":0,"ETag":"d41d8cd98f00b204e9800998ecf8427e","X-Timestamp":"1462164335.35395","X-Object-Type":"file","X-Type":"Object","Content-Type":"application/octet-stream"}", 178) = 178
[pid  2937] close(10)                   = 0
[pid  2937] close(8)                    = 0

As can be seen in the strace output above, in the second set of fgetxattr()
operations the glusterfs client returns ENODATA to gluster-swift. This should
never happen, as the xattr named user.swift.metadata always exists on the
object in the backend bricks.
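
For readers decoding the trace: each pair of fgetxattr() calls is the
standard two-step xattr read, sketched here (read_swift_metadata is a
hypothetical helper name):

#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Two-step xattr read as seen in the strace: probe with a NULL buffer to
 * learn the value's size, then fetch it into an allocated buffer. */
ssize_t read_swift_metadata(int fd, char **out)
{
        ssize_t sz = fgetxattr(fd, "user.swift.metadata", NULL, 0);
        if (sz < 0)
                return -1;      /* e.g. the spurious ENODATA above */

        char *buf = malloc(sz + 1);
        if (!buf)
                return -1;

        sz = fgetxattr(fd, "user.swift.metadata", buf, sz);
        if (sz < 0) {
                free(buf);
                return -1;
        }
        buf[sz] = '\0';
        *out = buf;
        return sz;
}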

This issue cannot be reproduced when either one of these volume options is
turned off:

# gluster volume set one cache-swift-metadata off
OR
# gluster volume set one readdir-ahead off


1. md-cache caches the user.swift.metadata xattr on either the lookup_cbk or
getxattr_cbk path.
2. md-cache returns the xattr from its cache the first time gluster-swift
asks for it.
3. When gluster-swift asks for the xattr a second time after a brief period,
md-cache finds that it is not in its cache and reports ENODATA back to
gluster-swift. This is apparently how md-cache does negative caching.

Between operations 2 and 3 above, readdirp_cbk has updated the cache for the
file, and during this update the xattr in question is not present (note
xattr = 0x0 in the struct md_cache dumps below). A reproducer sketch follows
the gdb output.

#0  mdc_inode_xatt_set (this=<value optimized out>, inode=<value optimized out>, dict=<value optimized out>) at md-cache.c:598
#1  0x00007f6f26ac975e in mdc_readdirp_cbk (frame=0x7f6f3308c678, cookie=<value optimized out>, this=0x7f6f20010630, op_ret=4, op_errno=2, entries=0x7f6f251dc850, xdata=0x7f6f32aa7b14) at md-cache.c:1998
#2  0x00007f6f26edc814 in qr_readdirp_cbk (frame=0x7f6f3308ca80, cookie=<value optimized out>, this=<value optimized out>, op_ret=4, op_errno=2, entries=0x7f6f251dc850, xdata=0x7f6f32aa7b14) at quick-read.c:520
$28 = (struct md_cache *) 0x7f6f20078090
$29 = {md_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}, group = {read = 1 '\001', write = 0 '\000', exec = 1 '\001'}, other = {read = 1 '\001', write = 0 '\000', exec = 1 '\001'}}, md_nlink = 1, md_uid = 0, md_gid = 0, md_atime = 1461997370, md_atime_nsec = 442368834, md_mtime = 1461997370, md_mtime_nsec = 442368834, md_ctime = 1461997370, md_ctime_nsec = 476368690, md_rdev = 0, md_size = 0, md_blocks = 0, xattr = 0x0, linkname = 0x0, ia_time = 1461997385, xa_time = 1461997385, lock = 1}

Breakpoint 1, mdc_inode_xatt_get (this=<value optimized out>, inode=<value optimized out>, dict=0x7f6f251dcac8) at md-cache.c:675
675                    if (!mdc->xattr)
#0  mdc_inode_xatt_get (this=<value optimized out>, inode=<value optimized out>, dict=0x7f6f251dcac8) at md-cache.c:675
#1  0x00007f6f26aca4c1 in mdc_getxattr (frame=0x7f6f3308c9d4, this=0x7f6f20010630, loc=0x7f6f1c0493e0, key=0x7f6f1c048350 "user.swift.metadata", xdata=0x0) at md-cache.c:1808
#2  0x00007f6f268ac681 in io_stats_getxattr (frame=0x7f6f3308cb2c, this=0x7f6f200119e0, loc=0x7f6f1c0493e0, name=0x7f6f1c048350 "user.swift.metadata", xdata=0x0) at io-stats.c:2289
$97 = (struct md_cache *) 0x7f6f20078090
$98 = {md_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 1 '\001', write = 1 '\001', exec = 1 '\001'}, group = {read = 1 '\001', write = 0 '\000', exec = 1 '\001'}, other = {read = 1 '\001', write = 0 '\000', exec = 1 '\001'}}, md_nlink = 1, md_uid = 0, md_gid = 0, md_atime = 1461997370, md_atime_nsec = 442368834, md_mtime = 1461997370, md_mtime_nsec = 442368834, md_ctime = 1461997370, md_ctime_nsec = 476368690, md_rdev = 0, md_size = 0, md_blocks = 0, xattr = 0x0, linkname = 0x0, ia_time = 1461997385, xa_time = 1461997385, lock = 0}
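
The sequence above can be driven by hand. A hedged reproducer sketch follows
(the paths are taken from the strace in this comment; whether the second
fgetxattr() actually fails depends on a readdirp running in between, so it is
timing-sensitive):

#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>
#include <unistd.h>

int main(void)
{
        const char *obj = "/mnt/gluster-object/one/c1/manifest";
        const char *dir = "/mnt/gluster-object/one/c1";
        char buf[4096];

        int fd = open(obj, O_RDONLY | O_CLOEXEC);
        if (fd < 0)
                return 1;

        /* First read: served from md-cache, succeeds. */
        ssize_t sz = fgetxattr(fd, "user.swift.metadata", buf, sizeof(buf));
        printf("first  fgetxattr: %zd (%s)\n",
               sz, sz < 0 ? strerror(errno) : "ok");

        /* Trigger readdirp on the parent; readdir-ahead's internal
         * readdirp calls fetch entries without the xattr, after which
         * md-cache holds xattr = 0x0 for the file. */
        DIR *dp = opendir(dir);
        if (dp) {
                while (readdir(dp) != NULL)
                        ;
                closedir(dp);
        }

        /* Second read: md-cache now answers ENODATA from its cache. */
        sz = fgetxattr(fd, "user.swift.metadata", buf, sizeof(buf));
        printf("second fgetxattr: %zd (%s)\n",
               sz, sz < 0 ? strerror(errno) : "ok");

        close(fd);
        return 0;
}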


During readdirp, md-cache sets the xattr keys it wants to cache in xdata so
that they are fetched from the backend bricks. However, in its opendir_cbk
path, the readdir-ahead xlator internally issues readdirp() calls that do not
have any of those keys set in xdata, so the xattrs are never fetched. On the
next readdirp() call, readdir-ahead returns the dirent entries from its
cache, and as these entries do not contain the xattrs, md-cache updates its
cache with wrong information that is inconsistent with the state on the
backend bricks. This wrong information is served to FUSE applications for a
small window of time.
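
One way to observe that window is to compare the xattr as seen through the
FUSE mount against the same file on a backend brick. A small sketch (the
mount path comes from the strace above; the brick path is only an assumed
example of this volume's brick layout):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* During the inconsistency window the mount may report ENODATA while the
 * brick still holds the value. */
static void probe(const char *label, const char *path)
{
        char buf[4096];
        ssize_t sz = getxattr(path, "user.swift.metadata", buf, sizeof(buf));
        if (sz < 0)
                printf("%-6s %s -> %s\n", label, path, strerror(errno));
        else
                printf("%-6s %s -> %zd bytes\n", label, path, sz);
}

int main(void)
{
        probe("mount", "/mnt/gluster-object/one/c1/manifest");
        probe("brick", "/bricks/one/c1/manifest");  /* assumed brick path */
        return 0;
}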
