[Bugs] [Bug 1150244] New: glusterfsd hangs on IO when underlying ext4 filesystem corrupts an xattr

bugzilla at redhat.com
Tue Oct 7 18:51:20 UTC 2014


https://bugzilla.redhat.com/show_bug.cgi?id=1150244

            Bug ID: 1150244
           Summary: glusterfsd hangs on IO when underlying ext4 filesystem
                    corrupts an xattr
           Product: GlusterFS
           Version: 3.5.2
         Component: unclassified
          Severity: high
          Assignee: gluster-bugs at redhat.com
          Reporter: rglick at radix.trade
                CC: bugs at gluster.org



Description of problem:

The glusterfsd process hangs (it does not respond to glusterfs requests but
still appears to be running) when the underlying ext4 filesystem gets a
corrupted xattr.

IO to the affected brick gets stuck, and the glusterfsd process turns into a
zombie when killed. Only a reboot, fsck, and subsequent startup of
gluster-server resolves the issue.

This may be related to (or a subset of?)
https://bugzilla.redhat.com/show_bug.cgi?id=832609

The kernel messages look like this:

Oct  7 05:34:30 ghost9 kernel: [82029.008044] ------------[ cut here
]------------
Oct  7 05:34:30 ghost9 kernel: [82029.008063] WARNING: CPU: 4 PID: 2257 at
/build/buildd/linux-lts-saucy-3.11.0/fs/ext4/ext4_jbd2.c:259
__ext4_handle_dirty_metadata+0x1a9/0x1c0()
Oct  7 05:34:30 ghost9 kernel: [82029.008065] Modules linked in:
rpcsec_gss_krb5 nfsv4 snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_hwdep nouveau ttm snd_pcm mei_me snd_timer drm_kms_helper drm psmouse nfsd
mei snd eeepc_wmi soundcore asus_wmi lpc_ich snd_page_alloc sparse_keymap
i2c_algo_bit mxm_wmi video serio_raw mac_hid wmi lp nfs_acl auth_rpcgss parport
nfs fscache lockd sunrpc ixgbe dca ahci libahci e1000e firewire_ohci
firewire_core ptp mdio crc_itu_t pps_core
Oct  7 05:34:30 ghost9 kernel: [82029.008104] CPU: 4 PID: 2257 Comm: glusterfsd
Not tainted 3.11.0-20-generic #34~precise1-Ubuntu
Oct  7 05:34:30 ghost9 kernel: [82029.008106] Hardware name: System
manufacturer System Product Name/P9X79 WS, BIOS 4306 08/22/2013
Oct  7 05:34:30 ghost9 kernel: [82029.008108]  0000000000000103
ffff880fdd365998 ffffffff8173dd2d 0000000000000007
Oct  7 05:34:30 ghost9 kernel: [82029.008111]  0000000000000000
ffff880fdd3659d8 ffffffff8106540c ffff880fdde52180
Oct  7 05:34:30 ghost9 kernel: [82029.008112]  ffff880eb9af5000
00000000ffffff8b ffff8800878b08b0 ffff880fdde52180
Oct  7 05:34:30 ghost9 kernel: [82029.008115] Call Trace:
Oct  7 05:34:30 ghost9 kernel: [82029.008123]  [<ffffffff8173dd2d>]
dump_stack+0x46/0x58
Oct  7 05:34:30 ghost9 kernel: [82029.008128]  [<ffffffff8106540c>]
warn_slowpath_common+0x8c/0xc0
Oct  7 05:34:30 ghost9 kernel: [82029.008130]  [<ffffffff8106545a>]
warn_slowpath_null+0x1a/0x20
Oct  7 05:34:30 ghost9 kernel: [82029.008132]  [<ffffffff8127f7c9>]
__ext4_handle_dirty_metadata+0x1a9/0x1c0
Oct  7 05:34:30 ghost9 kernel: [82029.008136]  [<ffffffff81290f03>]
ext4_xattr_release_block+0x103/0x1f0
Oct  7 05:34:30 ghost9 kernel: [82029.008138]  [<ffffffff81291524>]
ext4_xattr_block_set+0x204/0x710
Oct  7 05:34:30 ghost9 kernel: [82029.008140]  [<ffffffff81292170>]
ext4_xattr_set_handle+0x370/0x490
Oct  7 05:34:30 ghost9 kernel: [82029.008143]  [<ffffffff81292329>] ?
ext4_xattr_set+0x99/0x140
Oct  7 05:34:30 ghost9 kernel: [82029.008145]  [<ffffffff81292355>]
ext4_xattr_set+0xc5/0x140
Oct  7 05:34:30 ghost9 kernel: [82029.008147]  [<ffffffff81292e8d>]
ext4_xattr_trusted_set+0x2d/0x30
Oct  7 05:34:30 ghost9 kernel: [82029.008153]  [<ffffffff811d8b6b>]
generic_setxattr+0x6b/0x90
Oct  7 05:34:30 ghost9 kernel: [82029.008155]  [<ffffffff811d949b>]
__vfs_setxattr_noperm+0x7b/0x1c0
Oct  7 05:34:30 ghost9 kernel: [82029.008159]  [<ffffffff81337d8e>] ?
evm_inode_setxattr+0xe/0x10
Oct  7 05:34:30 ghost9 kernel: [82029.008162]  [<ffffffff811d969c>]
vfs_setxattr+0xbc/0xc0
Oct  7 05:34:30 ghost9 kernel: [82029.008164]  [<ffffffff811d97de>]
setxattr+0x13e/0x1e0
Oct  7 05:34:30 ghost9 kernel: [82029.008170]  [<ffffffff817494fe>] ?
_raw_spin_lock+0xe/0x20
Oct  7 05:34:30 ghost9 kernel: [82029.008178]  [<ffffffff811b6ee3>] ?
__sb_start_write+0x53/0x110
Oct  7 05:34:30 ghost9 kernel: [82029.008181]  [<ffffffff811d3492>] ?
mnt_clone_write+0x12/0x30
Oct  7 05:34:30 ghost9 kernel: [82029.008183]  [<ffffffff811d9c7e>]
SyS_fsetxattr+0xbe/0x100
Oct  7 05:34:30 ghost9 kernel: [82029.008187]  [<ffffffff811d9e5d>] ?
SyS_fgetxattr+0x7d/0xd0
Oct  7 05:34:30 ghost9 kernel: [82029.008193]  [<ffffffff8175291d>]
system_call_fastpath+0x1a/0x1f
Oct  7 05:34:30 ghost9 kernel: [82029.008195] ---[ end trace 655f8cd7683964af
]---
Oct  7 05:34:30 ghost9 kernel: [82029.008198] EXT4-fs:
ext4_handle_dirty_xattr_block:167: aborting transaction: error 117 in
__ext4_handle_dirty_metadata
Oct  7 05:34:30 ghost9 kernel: [82029.008388] EXT4-fs error (device sda1):
ext4_handle_dirty_xattr_block:167: inode #15879459: block 63987149: comm
glusterfsd: journal_dirty_metadata failed: handle type 10 started at line 1173,
credits 24/24, errcode -117
Oct  7 05:34:30 ghost9 kernel: [82029.008415] EXT4-fs error (device sda1) in
ext4_reserve_inode_write:4841: Readonly filesystem
Oct  7 05:34:30 ghost9 kernel: [82029.008464] EXT4-fs error (device sda1) in
ext4_dirty_inode:4960: error 117
Oct  7 05:34:30 ghost9 kernel: [82029.008505] EXT4-fs error (device sda1) in
ext4_xattr_release_block:558: error 117
Oct  7 05:34:30 ghost9 kernel: [82029.008575] BUG: unable to handle kernel NULL
pointer dereference at 0000000000000028
Oct  7 05:34:30 ghost9 kernel: [82029.008585] IP: [<ffffffff812708c1>]
__ext4_error_inode+0x31/0x120
Oct  7 05:34:30 ghost9 kernel: [82029.008598] PGD 0 
Oct  7 05:34:30 ghost9 kernel: [82029.008603] Oops: 0000 [#1] SMP 
Oct  7 05:34:30 ghost9 kernel: [82029.008609] Modules linked in:
rpcsec_gss_krb5 nfsv4 snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_hwdep nouveau ttm snd_pcm mei_me snd_timer drm_kms_helper drm psmouse nfsd
mei snd eeepc_wmi soundcore asus_wmi lpc_ich snd_page_alloc sparse_keymap
i2c_algo_bit mxm_wmi video serio_raw mac_hid wmi lp nfs_acl auth_rpcgss parport
nfs fscache lockd sunrpc ixgbe dca ahci libahci e1000e firewire_ohci
firewire_core ptp mdio crc_itu_t pps_core
Oct  7 05:34:30 ghost9 kernel: [82029.008698] CPU: 0 PID: 2257 Comm: glusterfsd
Tainted: G        W    3.11.0-20-generic #34~precise1-Ubuntu
Oct  7 05:34:30 ghost9 kernel: [82029.008705] Hardware name: System
manufacturer System Product Name/P9X79 WS, BIOS 4306 08/22/2013
Oct  7 05:34:30 ghost9 kernel: [82029.008711] task: ffff880fd8219770 ti:
ffff880fdd364000 task.ti: ffff880fdd364000
Oct  7 05:34:30 ghost9 kernel: [82029.008716] RIP: 0010:[<ffffffff812708c1>] 
[<ffffffff812708c1>] __ext4_error_inode+0x31/0x120
Oct  7 05:34:30 ghost9 kernel: [82029.008727] RSP: 0018:ffff880fdd365968 
EFLAGS: 00010282
Oct  7 05:34:30 ghost9 kernel: [82029.008731] RAX: 0000000000000000 RBX:
0000000000000000 RCX: 0000000003c804f2
Oct  7 05:34:30 ghost9 kernel: [82029.008737] RDX: 0000000000001131 RSI:
ffffffff81830eb0 RDI: 0000000000000000
Oct  7 05:34:30 ghost9 kernel: [82029.008745] RBP: ffff880fdd365a08 R08:
ffffffff81b23460 R09: 000000000000000a
Oct  7 05:34:30 ghost9 kernel: [82029.008750] R10: 0000000000000000 R11:
0000000000000000 R12: 0000000000001131
Oct  7 05:34:30 ghost9 kernel: [82029.008755] R13: 0000000000000000 R14:
ffff880fdde52180 R15: ffffffff81b23460
Oct  7 05:34:30 ghost9 kernel: [82029.008761] FS:  00007fcb17efe700(0000)
GS:ffff88103fc00000(0000) knlGS:0000000000000000
Oct  7 05:34:30 ghost9 kernel: [82029.008766] CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Oct  7 05:34:30 ghost9 kernel: [82029.008770] CR2: 0000000000000028 CR3:
0000000fd6d29000 CR4: 00000000001407f0
Oct  7 05:34:30 ghost9 kernel: [82029.008776] Stack:
Oct  7 05:34:30 ghost9 kernel: [82029.008779]  ffff880fdd365988
ffffffff811e8050 ffff880fe4d82000 ffff880fddd4cc98
Oct  7 05:34:30 ghost9 kernel: [82029.008790]  ffff880fdd365998
ffffffff811e8093 ffff880fdde52180 ffffffff81838030
Oct  7 05:34:30 ghost9 kernel: [82029.008801]  ffff880fdd365a08
ffffffff8127f28d ffff880fdd3659e8 ffff880fe4d82000
Oct  7 05:34:30 ghost9 kernel: [82029.008811] Call Trace:
Oct  7 05:34:30 ghost9 kernel: [82029.008821]  [<ffffffff811e8050>] ?
__sync_dirty_buffer+0xa0/0xd0
Oct  7 05:34:30 ghost9 kernel: [82029.008828]  [<ffffffff811e8093>] ?
sync_dirty_buffer+0x13/0x20
Oct  7 05:34:30 ghost9 kernel: [82029.008836]  [<ffffffff8127f28d>] ?
ext4_journal_abort_handle+0x4d/0xe0
Oct  7 05:34:30 ghost9 kernel: [82029.008843]  [<ffffffff8127f737>]
__ext4_handle_dirty_metadata+0x117/0x1c0
Oct  7 05:34:30 ghost9 kernel: [82029.008854]  [<ffffffff812913f3>] ?
ext4_xattr_block_set+0xd3/0x710
Oct  7 05:34:30 ghost9 kernel: [82029.008865]  [<ffffffff8125444a>]
ext4_do_update_inode+0x36a/0x560
Oct  7 05:34:30 ghost9 kernel: [82029.008873]  [<ffffffff81255e47>]
ext4_mark_iloc_dirty+0x67/0x90
Oct  7 05:34:30 ghost9 kernel: [82029.008879]  [<ffffffff8129204f>]
ext4_xattr_set_handle+0x24f/0x490
Oct  7 05:34:30 ghost9 kernel: [82029.008886]  [<ffffffff81292355>]
ext4_xattr_set+0xc5/0x140
Oct  7 05:34:30 ghost9 kernel: [82029.009104]  [<ffffffff81292e8d>]
ext4_xattr_trusted_set+0x2d/0x30
Oct  7 05:34:30 ghost9 kernel: [82029.009534]  [<ffffffff811d8b6b>]
generic_setxattr+0x6b/0x90
Oct  7 05:34:30 ghost9 kernel: [82029.010056]  [<ffffffff811d949b>]
__vfs_setxattr_noperm+0x7b/0x1c0
Oct  7 05:34:30 ghost9 kernel: [82029.010569]  [<ffffffff81337d8e>] ?
evm_inode_setxattr+0xe/0x10
Oct  7 05:34:30 ghost9 kernel: [82029.011084]  [<ffffffff811d969c>]
vfs_setxattr+0xbc/0xc0
Oct  7 05:34:30 ghost9 kernel: [82029.011604]  [<ffffffff811d97de>]
setxattr+0x13e/0x1e0
Oct  7 05:34:30 ghost9 kernel: [82029.012121]  [<ffffffff817494fe>] ?
_raw_spin_lock+0xe/0x20
Oct  7 05:34:30 ghost9 kernel: [82029.012648]  [<ffffffff811b6ee3>] ?
__sb_start_write+0x53/0x110
Oct  7 05:34:30 ghost9 kernel: [82029.013143]  [<ffffffff811d3492>] ?
mnt_clone_write+0x12/0x30
Oct  7 05:34:30 ghost9 kernel: [82029.013631]  [<ffffffff811d9c7e>]
SyS_fsetxattr+0xbe/0x100
Oct  7 05:34:30 ghost9 kernel: [82029.014109]  [<ffffffff811d9e5d>] ?
SyS_fgetxattr+0x7d/0xd0
Oct  7 05:34:30 ghost9 kernel: [82029.014578]  [<ffffffff8175291d>]
system_call_fastpath+0x1a/0x1f
Oct  7 05:34:30 ghost9 kernel: [82029.015037] Code: 48 89 e5 48 81 ec a0 00 00
00 48 89 5d d8 4c 89 65 e0 41 89 d4 4c 89 6d e8 4c 89 75 f0 48 89 fb 4c 89 7d
f8 4c 89 4d c8 4d 89 c7 <48> 8b 47 28 48 8b 57 40 49 89 f5 49 89 ce 48 8b 80 50
03 00 00 
Oct  7 05:34:30 ghost9 kernel: [82029.016080] RIP  [<ffffffff812708c1>]
__ext4_error_inode+0x31/0x120
Oct  7 05:34:30 ghost9 kernel: [82029.016559]  RSP <ffff880fdd365968>
Oct  7 05:34:30 ghost9 kernel: [82029.017041] CR2: 0000000000000028
Oct  7 05:34:30 ghost9 kernel: [82029.019503] ---[ end trace 655f8cd7683964b0
]---
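
For triage, the "EXT4-fs error" records in a log like the one above can be
pulled out mechanically. The following is a minimal sketch, not part of the
original report: the function names (unwrap, parse_ext4_errors) and the
regular expressions are illustrative assumptions, written against the line
format seen in this log only. It rejoins records that the mail wrapped across
lines, then extracts the device, reporting function, inode, block, and error
code. On Linux, errno 117 is EUCLEAN ("Structure needs cleaning").

```python
import re

# Syslog prefix as it appears in this report, e.g.
# "Oct  7 05:34:30 ghost9 kernel: [82029.008388] " (assumed format).
PREFIX = re.compile(r"^\w{3}\s+\d+ \d\d:\d\d:\d\d \S+ kernel: \[\s*\d+\.\d+\]\s?")

# Matches both "EXT4-fs error (device X): func:line: ..." and
# "EXT4-fs error (device X) in func:line: ...".
ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\)(?::| in) "
    r"(?P<function>\w+):(?P<line>\d+): (?P<detail>.*)"
)


def unwrap(lines):
    """Rejoin records that were wrapped across several lines."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if PREFIX.match(line):
            # A new record starts here; drop the syslog prefix.
            records.append(PREFIX.sub("", line, count=1))
        elif records and line.strip():
            # Continuation of the previous record.
            records[-1] += " " + line.strip()
    return records


def _first_int(pattern, text):
    """Return the first captured integer for pattern, or None."""
    match = re.search(pattern, text)
    return int(match.group(1)) if match else None


def parse_ext4_errors(lines):
    """Return one dict per 'EXT4-fs error' record found in lines."""
    errors = []
    for record in unwrap(lines):
        match = ERROR.search(record)
        if not match:
            continue
        detail = match.group("detail")
        errors.append({
            "device": match.group("device"),
            "function": match.group("function"),
            "inode": _first_int(r"inode #(\d+)", detail),
            "block": _first_int(r"block (\d+)", detail),
            "errcode": _first_int(r"(?:errcode -|error )(\d+)", detail),
        })
    return errors


# Three records copied from the log above, wrapped as in the report.
SAMPLE = """\
Oct  7 05:34:30 ghost9 kernel: [82029.008388] EXT4-fs error (device sda1):
ext4_handle_dirty_xattr_block:167: inode #15879459: block 63987149: comm
glusterfsd: journal_dirty_metadata failed: handle type 10 started at line 1173,
credits 24/24, errcode -117
Oct  7 05:34:30 ghost9 kernel: [82029.008415] EXT4-fs error (device sda1) in
ext4_reserve_inode_write:4841: Readonly filesystem
Oct  7 05:34:30 ghost9 kernel: [82029.008464] EXT4-fs error (device sda1) in
ext4_dirty_inode:4960: error 117
""".splitlines()

if __name__ == "__main__":
    for err in parse_ext4_errors(SAMPLE):
        print(err)
```

The inode and block numbers from the first record are the ones to hand to
fsck or debugfs when inspecting the damaged xattr block.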

Version-Release number of selected component (if applicable):

3.5.2-ubuntu1~precise1

How reproducible:

Unable to reproduce on demand, but this happens approximately once per week
in a 10 node cluster with 20 compute clients.

Steps to Reproduce:
1. NA

Actual results:

Extended attributes corrupted (not sure if this is an ext4 issue or a gluster
issue). Brick becomes unresponsive instead of crashing or failing gracefully.

Expected results:

No filesystem corruption.
IO fails, or brick goes down and replica responds.
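
Until the brick fails gracefully on its own, an external monitor could at
least notice when the brick's filesystem has been flagged read-only, as the
"Readonly filesystem" message in the kernel log suggests. The following is a
rough sketch under stated assumptions: it assumes the filesystem is remounted
read-only after the error (which depends on the ext4 errors= mount option),
the mount point /export/brick1 is hypothetical, and the helper names are
invented for illustration. It only parses text in the /proc/mounts format and
will not detect a process that is already stuck in the kernel.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text.

    Each line has the form: device mountpoint fstype options dump pass.
    Returns None if the mount point is not listed.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Later entries shadow earlier ones for the same mount point.
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text, mount_point):
    """True if mount_point is listed and carries the 'ro' option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options


# Hypothetical sample; on a live system, read the text from /proc/mounts.
SAMPLE_MOUNTS = """\
/dev/sda1 /export/brick1 ext4 ro,relatime,data=ordered 0 0
/dev/sdb1 /export/brick2 ext4 rw,relatime,data=ordered 0 0
"""

if __name__ == "__main__":
    for brick in ("/export/brick1", "/export/brick2"):
        state = "read-only" if is_read_only(SAMPLE_MOUNTS, brick) else "ok"
        print(brick, state)
```

On a live system the text would come from open("/proc/mounts").read(); what
to do when a brick turns read-only (alerting, stopping the brick so the
replica takes over) is left to the operator.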

Additional info:
