[Bugs] [Bug 1150244] New: glusterfsd hangs on IO when underlying ext4 filesystem corrupts an xattr
bugzilla at redhat.com
Tue Oct 7 18:51:20 UTC 2014
https://bugzilla.redhat.com/show_bug.cgi?id=1150244
Bug ID: 1150244
Summary: glusterfsd hangs on IO when underlying ext4 filesystem
corrupts an xattr
Product: GlusterFS
Version: 3.5.2
Component: unclassified
Severity: high
Assignee: gluster-bugs at redhat.com
Reporter: rglick at radix.trade
CC: bugs at gluster.org
Description of problem:
The glusterfsd process hangs (it does not respond to glusterfs requests but
appears to still be running) when the underlying ext4 filesystem gets a
corrupted xattr.
IO to the affected brick gets stuck (the glusterfsd process turns into a zombie
when killed); only a reboot, fsck, and subsequent startup of gluster-server
resolves the issue.
This may be related to (or a subset of)
https://bugzilla.redhat.com/show_bug.cgi?id=832609
Kernel messages look like this:
Oct 7 05:34:30 ghost9 kernel: [82029.008044] ------------[ cut here
]------------
Oct 7 05:34:30 ghost9 kernel: [82029.008063] WARNING: CPU: 4 PID: 2257 at
/build/buildd/linux-lts-saucy-3.11.0/fs/ext4/ext4_jbd2.c:259
__ext4_handle_dirty_metadata+0x1a9/0x1c0()
Oct 7 05:34:30 ghost9 kernel: [82029.008065] Modules linked in:
rpcsec_gss_krb5 nfsv4 snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_hwdep nouveau ttm snd_pcm mei_me snd_timer drm_kms_helper drm psmouse nfsd
mei snd eeepc_wmi soundcore asus_wmi lpc_ich snd_page_alloc sparse_keymap
i2c_algo_bit mxm_wmi video serio_raw mac_hid wmi lp nfs_acl auth_rpcgss parport
nfs fscache lockd sunrpc ixgbe dca ahci libahci e1000e firewire_ohci
firewire_core ptp mdio crc_itu_t pps_core
Oct 7 05:34:30 ghost9 kernel: [82029.008104] CPU: 4 PID: 2257 Comm: glusterfsd
Not tainted 3.11.0-20-generic #34~precise1-Ubuntu
Oct 7 05:34:30 ghost9 kernel: [82029.008106] Hardware name: System
manufacturer System Product Name/P9X79 WS, BIOS 4306 08/22/2013
Oct 7 05:34:30 ghost9 kernel: [82029.008108] 0000000000000103
ffff880fdd365998 ffffffff8173dd2d 0000000000000007
Oct 7 05:34:30 ghost9 kernel: [82029.008111] 0000000000000000
ffff880fdd3659d8 ffffffff8106540c ffff880fdde52180
Oct 7 05:34:30 ghost9 kernel: [82029.008112] ffff880eb9af5000
00000000ffffff8b ffff8800878b08b0 ffff880fdde52180
Oct 7 05:34:30 ghost9 kernel: [82029.008115] Call Trace:
Oct 7 05:34:30 ghost9 kernel: [82029.008123] [<ffffffff8173dd2d>]
dump_stack+0x46/0x58
Oct 7 05:34:30 ghost9 kernel: [82029.008128] [<ffffffff8106540c>]
warn_slowpath_common+0x8c/0xc0
Oct 7 05:34:30 ghost9 kernel: [82029.008130] [<ffffffff8106545a>]
warn_slowpath_null+0x1a/0x20
Oct 7 05:34:30 ghost9 kernel: [82029.008132] [<ffffffff8127f7c9>]
__ext4_handle_dirty_metadata+0x1a9/0x1c0
Oct 7 05:34:30 ghost9 kernel: [82029.008136] [<ffffffff81290f03>]
ext4_xattr_release_block+0x103/0x1f0
Oct 7 05:34:30 ghost9 kernel: [82029.008138] [<ffffffff81291524>]
ext4_xattr_block_set+0x204/0x710
Oct 7 05:34:30 ghost9 kernel: [82029.008140] [<ffffffff81292170>]
ext4_xattr_set_handle+0x370/0x490
Oct 7 05:34:30 ghost9 kernel: [82029.008143] [<ffffffff81292329>] ?
ext4_xattr_set+0x99/0x140
Oct 7 05:34:30 ghost9 kernel: [82029.008145] [<ffffffff81292355>]
ext4_xattr_set+0xc5/0x140
Oct 7 05:34:30 ghost9 kernel: [82029.008147] [<ffffffff81292e8d>]
ext4_xattr_trusted_set+0x2d/0x30
Oct 7 05:34:30 ghost9 kernel: [82029.008153] [<ffffffff811d8b6b>]
generic_setxattr+0x6b/0x90
Oct 7 05:34:30 ghost9 kernel: [82029.008155] [<ffffffff811d949b>]
__vfs_setxattr_noperm+0x7b/0x1c0
Oct 7 05:34:30 ghost9 kernel: [82029.008159] [<ffffffff81337d8e>] ?
evm_inode_setxattr+0xe/0x10
Oct 7 05:34:30 ghost9 kernel: [82029.008162] [<ffffffff811d969c>]
vfs_setxattr+0xbc/0xc0
Oct 7 05:34:30 ghost9 kernel: [82029.008164] [<ffffffff811d97de>]
setxattr+0x13e/0x1e0
Oct 7 05:34:30 ghost9 kernel: [82029.008170] [<ffffffff817494fe>] ?
_raw_spin_lock+0xe/0x20
Oct 7 05:34:30 ghost9 kernel: [82029.008178] [<ffffffff811b6ee3>] ?
__sb_start_write+0x53/0x110
Oct 7 05:34:30 ghost9 kernel: [82029.008181] [<ffffffff811d3492>] ?
mnt_clone_write+0x12/0x30
Oct 7 05:34:30 ghost9 kernel: [82029.008183] [<ffffffff811d9c7e>]
SyS_fsetxattr+0xbe/0x100
Oct 7 05:34:30 ghost9 kernel: [82029.008187] [<ffffffff811d9e5d>] ?
SyS_fgetxattr+0x7d/0xd0
Oct 7 05:34:30 ghost9 kernel: [82029.008193] [<ffffffff8175291d>]
system_call_fastpath+0x1a/0x1f
Oct 7 05:34:30 ghost9 kernel: [82029.008195] ---[ end trace 655f8cd7683964af
]---
Oct 7 05:34:30 ghost9 kernel: [82029.008198] EXT4-fs:
ext4_handle_dirty_xattr_block:167: aborting transaction: error 117 in
__ext4_handle_dirty_metadata
Oct 7 05:34:30 ghost9 kernel: [82029.008388] EXT4-fs error (device sda1):
ext4_handle_dirty_xattr_block:167: inode #15879459: block 63987149: comm
glusterfsd: journal_dirty_metadata failed: handle type 10 started at line 1173,
credits 24/24, errcode -117
Oct 7 05:34:30 ghost9 kernel: [82029.008415] EXT4-fs error (device sda1) in
ext4_reserve_inode_write:4841: Readonly filesystem
Oct 7 05:34:30 ghost9 kernel: [82029.008464] EXT4-fs error (device sda1) in
ext4_dirty_inode:4960: error 117
Oct 7 05:34:30 ghost9 kernel: [82029.008505] EXT4-fs error (device sda1) in
ext4_xattr_release_block:558: error 117
Oct 7 05:34:30 ghost9 kernel: [82029.008575] BUG: unable to handle kernel NULL
pointer dereference at 0000000000000028
Oct 7 05:34:30 ghost9 kernel: [82029.008585] IP: [<ffffffff812708c1>]
__ext4_error_inode+0x31/0x120
Oct 7 05:34:30 ghost9 kernel: [82029.008598] PGD 0
Oct 7 05:34:30 ghost9 kernel: [82029.008603] Oops: 0000 [#1] SMP
Oct 7 05:34:30 ghost9 kernel: [82029.008609] Modules linked in:
rpcsec_gss_krb5 nfsv4 snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_hwdep nouveau ttm snd_pcm mei_me snd_timer drm_kms_helper drm psmouse nfsd
mei snd eeepc_wmi soundcore asus_wmi lpc_ich snd_page_alloc sparse_keymap
i2c_algo_bit mxm_wmi video serio_raw mac_hid wmi lp nfs_acl auth_rpcgss parport
nfs fscache lockd sunrpc ixgbe dca ahci libahci e1000e firewire_ohci
firewire_core ptp mdio crc_itu_t pps_core
Oct 7 05:34:30 ghost9 kernel: [82029.008698] CPU: 0 PID: 2257 Comm: glusterfsd
Tainted: G W 3.11.0-20-generic #34~precise1-Ubuntu
Oct 7 05:34:30 ghost9 kernel: [82029.008705] Hardware name: System
manufacturer System Product Name/P9X79 WS, BIOS 4306 08/22/2013
Oct 7 05:34:30 ghost9 kernel: [82029.008711] task: ffff880fd8219770 ti:
ffff880fdd364000 task.ti: ffff880fdd364000
Oct 7 05:34:30 ghost9 kernel: [82029.008716] RIP: 0010:[<ffffffff812708c1>]
[<ffffffff812708c1>] __ext4_error_inode+0x31/0x120
Oct 7 05:34:30 ghost9 kernel: [82029.008727] RSP: 0018:ffff880fdd365968
EFLAGS: 00010282
Oct 7 05:34:30 ghost9 kernel: [82029.008731] RAX: 0000000000000000 RBX:
0000000000000000 RCX: 0000000003c804f2
Oct 7 05:34:30 ghost9 kernel: [82029.008737] RDX: 0000000000001131 RSI:
ffffffff81830eb0 RDI: 0000000000000000
Oct 7 05:34:30 ghost9 kernel: [82029.008745] RBP: ffff880fdd365a08 R08:
ffffffff81b23460 R09: 000000000000000a
Oct 7 05:34:30 ghost9 kernel: [82029.008750] R10: 0000000000000000 R11:
0000000000000000 R12: 0000000000001131
Oct 7 05:34:30 ghost9 kernel: [82029.008755] R13: 0000000000000000 R14:
ffff880fdde52180 R15: ffffffff81b23460
Oct 7 05:34:30 ghost9 kernel: [82029.008761] FS: 00007fcb17efe700(0000)
GS:ffff88103fc00000(0000) knlGS:0000000000000000
Oct 7 05:34:30 ghost9 kernel: [82029.008766] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Oct 7 05:34:30 ghost9 kernel: [82029.008770] CR2: 0000000000000028 CR3:
0000000fd6d29000 CR4: 00000000001407f0
Oct 7 05:34:30 ghost9 kernel: [82029.008776] Stack:
Oct 7 05:34:30 ghost9 kernel: [82029.008779] ffff880fdd365988
ffffffff811e8050 ffff880fe4d82000 ffff880fddd4cc98
Oct 7 05:34:30 ghost9 kernel: [82029.008790] ffff880fdd365998
ffffffff811e8093 ffff880fdde52180 ffffffff81838030
Oct 7 05:34:30 ghost9 kernel: [82029.008801] ffff880fdd365a08
ffffffff8127f28d ffff880fdd3659e8 ffff880fe4d82000
Oct 7 05:34:30 ghost9 kernel: [82029.008811] Call Trace:
Oct 7 05:34:30 ghost9 kernel: [82029.008821] [<ffffffff811e8050>] ?
__sync_dirty_buffer+0xa0/0xd0
Oct 7 05:34:30 ghost9 kernel: [82029.008828] [<ffffffff811e8093>] ?
sync_dirty_buffer+0x13/0x20
Oct 7 05:34:30 ghost9 kernel: [82029.008836] [<ffffffff8127f28d>] ?
ext4_journal_abort_handle+0x4d/0xe0
Oct 7 05:34:30 ghost9 kernel: [82029.008843] [<ffffffff8127f737>]
__ext4_handle_dirty_metadata+0x117/0x1c0
Oct 7 05:34:30 ghost9 kernel: [82029.008854] [<ffffffff812913f3>] ?
ext4_xattr_block_set+0xd3/0x710
Oct 7 05:34:30 ghost9 kernel: [82029.008865] [<ffffffff8125444a>]
ext4_do_update_inode+0x36a/0x560
Oct 7 05:34:30 ghost9 kernel: [82029.008873] [<ffffffff81255e47>]
ext4_mark_iloc_dirty+0x67/0x90
Oct 7 05:34:30 ghost9 kernel: [82029.008879] [<ffffffff8129204f>]
ext4_xattr_set_handle+0x24f/0x490
Oct 7 05:34:30 ghost9 kernel: [82029.008886] [<ffffffff81292355>]
ext4_xattr_set+0xc5/0x140
Oct 7 05:34:30 ghost9 kernel: [82029.009104] [<ffffffff81292e8d>]
ext4_xattr_trusted_set+0x2d/0x30
Oct 7 05:34:30 ghost9 kernel: [82029.009534] [<ffffffff811d8b6b>]
generic_setxattr+0x6b/0x90
Oct 7 05:34:30 ghost9 kernel: [82029.010056] [<ffffffff811d949b>]
__vfs_setxattr_noperm+0x7b/0x1c0
Oct 7 05:34:30 ghost9 kernel: [82029.010569] [<ffffffff81337d8e>] ?
evm_inode_setxattr+0xe/0x10
Oct 7 05:34:30 ghost9 kernel: [82029.011084] [<ffffffff811d969c>]
vfs_setxattr+0xbc/0xc0
Oct 7 05:34:30 ghost9 kernel: [82029.011604] [<ffffffff811d97de>]
setxattr+0x13e/0x1e0
Oct 7 05:34:30 ghost9 kernel: [82029.012121] [<ffffffff817494fe>] ?
_raw_spin_lock+0xe/0x20
Oct 7 05:34:30 ghost9 kernel: [82029.012648] [<ffffffff811b6ee3>] ?
__sb_start_write+0x53/0x110
Oct 7 05:34:30 ghost9 kernel: [82029.013143] [<ffffffff811d3492>] ?
mnt_clone_write+0x12/0x30
Oct 7 05:34:30 ghost9 kernel: [82029.013631] [<ffffffff811d9c7e>]
SyS_fsetxattr+0xbe/0x100
Oct 7 05:34:30 ghost9 kernel: [82029.014109] [<ffffffff811d9e5d>] ?
SyS_fgetxattr+0x7d/0xd0
Oct 7 05:34:30 ghost9 kernel: [82029.014578] [<ffffffff8175291d>]
system_call_fastpath+0x1a/0x1f
Oct 7 05:34:30 ghost9 kernel: [82029.015037] Code: 48 89 e5 48 81 ec a0 00 00
00 48 89 5d d8 4c 89 65 e0 41 89 d4 4c 89 6d e8 4c 89 75 f0 48 89 fb 4c 89 7d
f8 4c 89 4d c8 4d 89 c7 <48> 8b 47 28 48 8b 57 40 49 89 f5 49 89 ce 48 8b 80 50
03 00 00
Oct 7 05:34:30 ghost9 kernel: [82029.016080] RIP [<ffffffff812708c1>]
__ext4_error_inode+0x31/0x120
Oct 7 05:34:30 ghost9 kernel: [82029.016559] RSP <ffff880fdd365968>
Oct 7 05:34:30 ghost9 kernel: [82029.017041] CR2: 0000000000000028
Oct 7 05:34:30 ghost9 kernel: [82029.019503] ---[ end trace 655f8cd7683964b0
]---
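To make the relevant entries easier to spot in a trace like the one above, here is a minimal Python sketch that rejoins the mail-wrapped log lines and extracts the "EXT4-fs error" records (device, function, source line, and inode/block when present). The function names join_wrapped and ext4_errors and the regular expressions are illustrative assumptions based only on the log format shown in this report; they are not part of GlusterFS or any kernel tooling.

```python
import re

# Assumption: every real syslog entry starts with "<Mon> <day> <time> <host> kernel: [<uptime>]".
# Lines that do not match are treated as mail-wrapped continuations.
ENTRY_START = re.compile(
    r"^\w{3}\s+\d+ \d\d:\d\d:\d\d \S+ kernel: \[\s*\d+\.\d+\]"
)

# Matches both forms seen in this report:
#   EXT4-fs error (device sda1): func:LINE: inode #N: block M: ...
#   EXT4-fs error (device sda1) in func:LINE: ...
EXT4_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\)(?: in|:) "
    r"(?P<function>\w+):(?P<line>\d+)"
    r"(?:: inode #(?P<inode>\d+): block (?P<block>\d+))?"
)


def join_wrapped(lines):
    """Merge wrapped continuation lines back into one string per log entry."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def ext4_errors(lines):
    """Return one dict per 'EXT4-fs error' entry; absent fields are None."""
    found = []
    for entry in join_wrapped(lines):
        match = EXT4_ERROR.search(entry)
        if match:
            found.append(match.groupdict())
    return found
```

Feeding the log lines from this report to ext4_errors() should surface the four "EXT4-fs error" entries, the first of which carries the inode and block numbers. For reference, errno 117 on Linux is EUCLEAN ("Structure needs cleaning").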
Version-Release number of selected component (if applicable):
3.5.2-ubuntu1~precise1
How reproducible:
Unable to reproduce on demand, but this happens approximately once per week in
a 10-node cluster with 20 compute clients.
Steps to Reproduce:
1. NA
Actual results:
Extended attributes corrupted (not sure if this is an ext4 issue or a gluster
issue). Brick becomes unresponsive instead of crashing or failing gracefully.
Expected results:
No filesystem corruption.
IO fails, or brick goes down and replica responds.
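As an illustration of the "fail instead of hang" behaviour described above, the following is a rough Python sketch of an external watchdog probe: it performs a small write and fsync in a child process and treats a timeout or failure as an unhealthy brick. Everything here (the brick_is_healthy name, the .probe file name, the 5 second timeout) is a hypothetical choice for this sketch, not existing GlusterFS behaviour or configuration.

```python
import subprocess
import sys

# Hypothetical probe run in a child process so that a hung filesystem call
# blocks the child, not the watchdog itself.
PROBE = (
    "import os, sys\n"
    "path = os.path.join(sys.argv[1], '.probe')\n"
    "fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)\n"
    "os.write(fd, b'x')\n"
    "os.fsync(fd)\n"
    "os.close(fd)\n"
    "os.unlink(path)\n"
)


def brick_is_healthy(brick_dir, timeout=5.0):
    """Return True if a small write+fsync in brick_dir completes in time."""
    child = subprocess.Popen(
        [sys.executable, "-c", PROBE, brick_dir],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # kill() may have no effect if the child is stuck inside the kernel;
        # we deliberately do not wait for it again.
        child.kill()
        return False
```

A read-only remount (as in the "Readonly filesystem" message in the log) would make the probe exit non-zero, and a stuck filesystem would trip the timeout; in either case the caller could take the brick out of service so that the replica answers.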
Additional info: