[Bugs] [Bug 1464327] glusterfs client crashes when reading large directory

bugzilla at redhat.com bugzilla at redhat.com
Wed Jul 5 15:34:31 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1464327



--- Comment #3 from Csaba Henk <csaba at redhat.com> ---
Created attachment 1294659
  --> https://bugzilla.redhat.com/attachment.cgi?id=1294659&action=edit
Diagnostic patch

# Analysis 2/2

To explore the connection between the inode table corruption and the stack
overflow with
frame_fill_groups, we need to get our hands a bit more dirty. We'll add a
routine that does a basic
sanity check on inode tables: it runs through the bucket list of the name_hash
and check if the
list_{next,prev} pointers are zero. If none of them is zero, then it logs a
message that the table
is "all good", otherwise logs the offending buckets. ("All good" is euphemistic
though-- this is of
course just a heuristic, as the list pointers can be corrupt in other ways than
being 0.) Then we
insert this sanity check to a number of locations which are hit during various
fops. Also we add
logs that display the hash value of the dentries being added to the table.
These changes are
contained in the attached patch. Re-running the test with it, we get logs like
this:

[2017-07-04 23:04:57.798261] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #1
fuse_getattr_resume/950: table 0x7fffe4006050 (inode 0x7fffe4005b10): all good!
[2017-07-04 23:04:57.798683] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #2
fuse_attr_cbk/852: table 0x7fffe4006050 (inode 0x7fffe4005b10): all good!
[2017-07-04 23:04:57.809346] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #3
fuse_lookup_resume/704: table 0x7fffe4006050 (inode 0x7fffe4004730): all good!
[2017-07-04 23:04:57.809892] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #4
fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe4004730): all good!
[2017-07-04 23:04:57.810612] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #5
fuse_lookup_resume/704: table 0x7fffe4006050 (inode 0x7fffe4004730): all good!
[2017-07-04 23:04:57.811149] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #6
fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe4004730): all good!
[2017-07-04 23:04:57.811852] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #7
fuse_lookup_resume/704: table 0x7fffe4006050 (inode 0x7fffe4004730): all good!
[2017-07-04 23:04:57.812440] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #8
fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe4004730): all good!
[2017-07-04 23:04:57.813398] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #9
fuse_lookup_resume/704: table 0x7fffe4006050 (inode 0x7fffe4004730): all good!
...
[2017-07-04 23:05:08.291447] W [inode.c:94:__dentry_hash] 0-DEBUG: table
0x7fffe4006050 (inode 0x7fffe400b6e0): adding dentry to name hash at
0x7ffff7e36600, index 9819
[2017-07-04 23:05:08.291718] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #40
fuse_readdirp_cbk/3018: table 0x7fffe4006050 (inode 0x7fffe400b6e0): all good!
[2017-07-04 23:05:08.291991] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #41
fuse_readdirp_cbk/3012: table 0x7fffe4006050 (inode 0x7fffe400bb00): all good!
[2017-07-04 23:05:08.292249] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #42
fuse_readdirp_cbk/3013: table 0x7fffe4006050 (inode 0x7fffe4005b10): all good!
[2017-07-04 23:05:08.292322] W [inode.c:94:__dentry_hash] 0-DEBUG: table
0x7fffe4006050 (inode 0x7fffe400bb00): adding dentry to name hash at
0x7ffff7e2d080, index 7427
[2017-07-04 23:05:08.292810] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #43
fuse_readdirp_cbk/3018: table 0x7fffe4006050 (inode 0x7fffe400bb00): all good!
[2017-07-04 23:05:08.293270] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #44
fuse_readdirp_cbk/3012: table 0x7fffe4006050 (inode 0x7fffe400bf20): all good!
[2017-07-04 23:05:08.293570] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #45
fuse_readdirp_cbk/3013: table 0x7fffe4006050 (inode 0x7fffe4005b10): all good!
[2017-07-04 23:05:08.293637] W [inode.c:94:__dentry_hash] 0-DEBUG: table
0x7fffe4006050 (inode 0x7fffe400bf20): adding dentry to name hash at
0x7ffff7e46710, index 13932
[2017-07-04 23:05:08.294232] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #46
fuse_readdirp_cbk/3018: table 0x7fffe4006050 (inode 0x7fffe400bf20): all good!
[2017-07-04 23:05:08.294882] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #47
fuse_readdirp_cbk/3012: table 0x7fffe4006050 (inode 0x7fffe400c340): all good!
[2017-07-04 23:05:08.295142] W [fuse-bridge.c:78:inode_check] 0-DEBUG: #48
fuse_readdirp_cbk/3013: table 0x7fffe4006050 (inode 0x7fffe4005b10): all good!
...
14957 [2017-07-04 23:05:08.303786] W [fuse-bridge.c:78:inode_check] 0-DEBUG:
#102 fuse_attr_cbk/852: table 0x7fffe4006050 (inode 0x7fffe4005b10): all good!
14958 [2017-07-04 23:05:08.304070] W [fuse-bridge.c:78:inode_check] 0-DEBUG:
#103 fuse_lookup_resume/704: table 0x7fffe4006050 (inode 0x7fffe400b6e0): all
good!
14959 [2017-07-04 23:05:08.304259] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e44cc0, index 13511
14960 [2017-07-04 23:05:08.304279] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e44cd0, index 13512
14961 [2017-07-04 23:05:08.304292] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e44cf0, index 13514
14962 [2017-07-04 23:05:08.304304] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e44d00, index 13515
14963 [2017-07-04 23:05:08.304316] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e44d60, index 13521
14964 [2017-07-04 23:05:08.304351] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e451c0, index 13591
14965 [2017-07-04 23:05:08.304368] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e45230, index 13598
14966 [2017-07-04 23:05:08.304383] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e45240, index 13599
14967 [2017-07-04 23:05:08.304398] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e45250, index 13600
14968 [2017-07-04 23:05:08.304410] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e45260, index 13601
14969 [2017-07-04 23:05:08.304421] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e45270, index 13602
14970 [2017-07-04 23:05:08.304432] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e45280, index 13603
14971 [2017-07-04 23:05:08.304442] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e1): corrupt
entry in name hash at 0x7ffff7e452a0, index 13605
14972 [2017-07-04 23:05:08.304457] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e45300, index 13611
14973 [2017-07-04 23:05:08.304471] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e45320, index 13613
14974 [2017-07-04 23:05:08.304486] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e45350, index 13616
14975 [2017-07-04 23:05:08.304498] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e45370, index 13618
14976 [2017-07-04 23:05:08.304509] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e453a0, index 13621
14977 [2017-07-04 23:05:08.304520] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e453b0, index 13622
14978 [2017-07-04 23:05:08.304531] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e453c0, index 13623
14979 [2017-07-04 23:05:08.304542] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e453f0, index 13626
14980 [2017-07-04 23:05:08.304553] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e45400, index 13627
14981 [2017-07-04 23:05:08.304564] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e45420, index 13629
14982 [2017-07-04 23:05:08.304575] W [fuse-bridge.c:73:inode_check] 0-DEBUG:
#104 fuse_entry_cbk/502: table 0x7fffe4006050 (inode 0x7fffe400b6e0): corrupt
entry in name hash at 0x7ffff7e45440, index 13631

Breaking after corruption was detected -- which is possible via the
__inode_check_trap_bad hook
provided by the patch, we can investigate the degree of corruption in gdb.
There is another
heuristic check that can err to the opposite direction (reporting false
negatives): check if the
list_{next,prev} pointers of given list_head (the kind of struct that populates
the buckets) point
to itself. If yes, that's definitely a valid entry, representing an empty list
(in the context of
the hash table, meaning no item was hashed there). If no, then it can be either
a corrupted
list_head or a valid non-empty list -- in case of the name_hash, the latter can
be filtered out by
matching against the log entries on the hash indices of the hashed dentries. In
gdb, we can perform
these investigations in an interval that contains the noticed corruptions as
follows:

(gdb)  set $i=0
(gdb)  while ($i < 150)
> printf "%d %d %p ", $i, $i+13500, $inode->table->name_hash+13500+$i
> p $inode->table->name_hash[13500+$i++]
> end

This would print lines like this:

21 13521 0x7ffff7e44d60 $11 = {next = 0x0, prev = 0x7ffff51dc3b4}
22 13522 0x7ffff7e44d70 $12 = {next = 0xffffffffffffffff, prev = 0x1600000000}
23 13523 0x7ffff7e44d80 $13 = {next = 0x7ffff51dc3bb, prev = 0x7ffff7e44d80}
24 13524 0x7ffff7e44d90 $14 = {next = 0x7ffff7e44d90, prev = 0x7ffff7e44d90}
25 13525 0x7ffff7e44da0 $15 = {next = 0x7ffff7e44da0, prev = 0x7ffff7e44da0}
26 13526 0x7ffff7e44db0 $16 = {next = 0x7ffff7e44db0, prev = 0x3000000018}
27 13527 0x7ffff7e44dc0 $17 = {next = 0x7ffff7e45460, prev = 0x7ffff7e453a0}
28 13528 0x7ffff7e44dd0 $18 = {next = 0x7ffff7e44dd0, prev = 0x7ffff7e44dd0}

(here lines of index 24, 25 and 29 are sane as the addresses agree (and we
checked that no reported
hash index falls into the interval), the rest are corrupt).

What makes our investigation easy is that these addresses do not vary between
various runs of the
same setup (same glusterfs build tested against same directory with same
glusterfs configuration).
So we can re-run the test with making use of the findings so far. In the hope
of catching the
corruption "red-handed", we can use watchpoints to notice a change at the
memory addresses we
collected. For example, to see when the bogus value prev = 0x3000000018 gets
inserted to entry 26
(hash 13526, address 0x7ffff7e44db0), we could use the following watch
directive:

(gdb) watch *0x7ffff7e44db8

It would be great to set watchpoints on all list pointers occurring in the
interval we investigate.
Alas, the instruction set of the Intel architectures support only two such
watchpoints. So we need
to re-run the test with various address pairs under watch, and see what is
corrupt at the time of a
change in a value there. With some endurance it becomes clear that the
corruption spreads from
higher addresses towards the lower ones, and thus to capture the Original
Corruption we need to set
a watchpoint on the highest corrupted address -- in the example setup it was
at:

133 13633 0x7ffff7e45460 $123 = {next = 0x7ffff7e45460, prev = 0x7fffec000ac0}

where prev = 0x7fffec000ac0 is bogus and thus

(gdb) watch *0x7ffff7e45468

is the relevant watch directive. With this we get:


Thread 7 "glusterfs" hit Hardware watchpoint 1: *0x7ffff7e45468

Old value = -136031136
New value = -335541568
frame_fill_groups (frame=0x7fffec000ac0) at fuse-helpers.c:150
150             xlator_t       *this          = frame->this;
(gdb) bt
#0  frame_fill_groups (frame=0x7fffec000ac0) at fuse-helpers.c:150
#1  0x00007ffff51b9938 in get_groups (priv=0x65d550, frame=0x7fffec000ac0) at
fuse-helpers.c:302
#2  0x00007ffff51b9c71 in get_call_frame_for_req (state=0x7fffe0003910) at
fuse-helpers.c:368
#3  0x00007ffff51bf575 in fuse_lookup_resume (state=0x7fffe0003910) at
fuse-bridge.c:705
#4  0x00007ffff51bf2c4 in fuse_fop_resume (state=0x7fffe0003910) at
fuse-bridge.c:666
#5  0x00007ffff51bc972 in fuse_resolve_done (state=0x7fffe0003910) at
fuse-resolve.c:663
#6  0x00007ffff51bca48 in fuse_resolve_all (state=0x7fffe0003910) at
fuse-resolve.c:690
#7  0x00007ffff51bc953 in fuse_resolve (state=0x7fffe0003910) at
fuse-resolve.c:654
#8  0x00007ffff51bca1f in fuse_resolve_all (state=0x7fffe0003910) at
fuse-resolve.c:686
#9  0x00007ffff51bcaa6 in fuse_resolve_continue (state=0x7fffe0003910) at
fuse-resolve.c:706
#10 0x00007ffff51bad4e in fuse_resolve_entry_cbk (frame=0x7fffe00016c0,
cookie=0x7fffe00017d0, this=0x656d90, op_ret=0, op_errno=0,
inode=0x7fffdc007e80, buf=0x7ffff7e86ce0, xattr=0x0, postparent=0x7ffff7e86c70)
at fuse-resolve.c:76
#11 0x00007ffff7b99862 in default_lookup_cbk (frame=0x7fffe00017d0,
cookie=0x7fffec000ac0, this=0x67ad30, op_ret=0, op_errno=0,
inode=0x7fffdc007e80, buf=0x7ffff7e86ce0, xdata=0x0, postparent=0x7ffff7e86c70)
at defaults.c:1265
#12 0x00007ffff2537bae in mdc_lookup (frame=0x7fffec000ac0, this=0x679820,
loc=0x7fffe0001930, xdata=0x0) at md-cache.c:1116
#13 0x00007ffff7ba2a9c in default_lookup_resume (frame=0x7fffe00017d0,
this=0x67ad30, loc=0x7fffe0001930, xdata=0x0) at defaults.c:1872
#14 0x00007ffff7b14e66 in call_resume_wind (stub=0x7fffe00018e0) at
call-stub.c:2149
#15 0x00007ffff7b23c39 in call_resume (stub=0x7fffe00018e0) at call-stub.c:2512
#16 0x00007ffff232bfd4 in iot_worker (data=0x683410) at io-threads.c:224
#17 0x00007ffff693f5ba in start_thread (arg=0x7ffff7e87700) at
pthread_create.c:333
#18 0x00007ffff62177cd in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) l
145     #define FUSE_MAX_AUX_GROUPS 32 /* We can get only up to 32 aux groups
from /proc */
146     void
147     frame_fill_groups (call_frame_t *frame)
148     {
149     #if defined(GF_LINUX_HOST_OS)
150             xlator_t       *this          = frame->this;
151             fuse_private_t *priv          = this->private;
152             char            filename[32];
153             char            line[4096];
154             char           *ptr           = NULL;


So in accordance with the ElectricFence hint, the corruption happens at / near
to entering
frame_fill_groups(). Let's see the offending bucket entry at this point:

(gdb) fr 2
(gdb) p state->loc.inode->table->name_hash[13633]
$23 = {next = 0x7ffff7e45460, prev = 0x7fffec000ac0}

Looking at it carefully: the bogus value that overwrites the prev member of
this list_head
("prev = 0x7fffec000ac0") is nothing else but the address of the frame argument
of
frame_fill_groups ("#0  frame_fill_groups (frame=0x7fffec000ac0) at
fuse-helpers.c:150")!
That is, the stack of thread 7, when it is extended to host frame_fill_groups()
is overflowing and what it starts to flood is the inode table.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
Unsubscribe from this bug https://bugzilla.redhat.com/token.cgi?t=ynH3lDzITs&a=cc_unsubscribe


More information about the Bugs mailing list