[Bugs] [Bug 1707195] VM stuck in a shutdown because of a pending fuse request
bugzilla at redhat.com
Tue May 7 02:48:00 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1707195
--- Comment #1 from Raghavendra G <rgowdapp at redhat.com> ---
The VM fails to shut down and gets stuck in the 'Powering down' status because
its 'qemu-kvm' process ends up in a zombie/defunct state:
more ps-Ll.txt
F S UID PID PPID LWP C PRI NI ADDR SZ WCHAN TTY TIME CMD
6 Z 107 20631 1 20631 0 80 0 -       0 do_exi ?  8:45 [qemu-kvm] <defunct>
3 D 107 20631 1 20635 0 80 0 - 2386845 fuse_r ?  1:12 [qemu-kvm] <defunct>
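For what it's worth, LWP 20635 above is in uninterruptible sleep (state D) with
WCHAN fuse_r, i.e. it is blocked waiting for a reply to a FUSE request, while the
thread group leader (20631) has already exited, which is why the process lingers
as <defunct>. A minimal sketch for spotting such threads on a live host (not part
of the collected data; it only reads /proc):
~~~
#!/usr/bin/env python3
# List threads in uninterruptible sleep (state D) together with their wchan.
# The stuck qemu-kvm thread (LWP 20635, WCHAN fuse_r in the ps output above)
# would show up here while the problem is present.
import os

def d_state_threads():
    for pid in filter(str.isdigit, os.listdir('/proc')):
        task_dir = f'/proc/{pid}/task'
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(f'{task_dir}/{tid}/stat') as f:
                    stat = f.read()
                with open(f'{task_dir}/{tid}/wchan') as f:
                    wchan = f.read().strip()
            except OSError:
                continue
            comm = stat[stat.index('(') + 1:stat.rindex(')')]
            state = stat.rsplit(')', 1)[1].split()[0]  # field after "(comm)"
            if state == 'D':
                yield pid, tid, comm, wchan

if __name__ == '__main__':
    for pid, tid, comm, wchan in d_state_threads():
        print(f'pid={pid} tid={tid} comm={comm} wchan={wchan}')
~~~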
The customer has collected a crash dump of the affected hypervisor, as well as
statedumps from all the glusterfs processes running on that machine while the
problem was present.
Thread ID 20635 is the one of interest:
crash> bt 20635
PID: 20635 TASK: ffff9ed3926eb0c0 CPU: 7 COMMAND: "IO iothread1"
#0 [ffff9ec8e351fa28] __schedule at ffffffff91967747
#1 [ffff9ec8e351fab0] schedule at ffffffff91967c49
#2 [ffff9ec8e351fac0] __fuse_request_send at ffffffffc09d24e5 [fuse]
#3 [ffff9ec8e351fb30] fuse_request_send at ffffffffc09d26e2 [fuse]
#4 [ffff9ec8e351fb40] fuse_send_write at ffffffffc09dbc76 [fuse]
#5 [ffff9ec8e351fb70] fuse_direct_io at ffffffffc09dc0d6 [fuse]
#6 [ffff9ec8e351fc58] __fuse_direct_write at ffffffffc09dc562 [fuse]
#7 [ffff9ec8e351fca8] fuse_direct_IO at ffffffffc09dd3ca [fuse]
#8 [ffff9ec8e351fd70] generic_file_direct_write at ffffffff913b8663
#9 [ffff9ec8e351fdc8] fuse_file_aio_write at ffffffffc09ddbd5 [fuse]
#10 [ffff9ec8e351fe60] do_io_submit at ffffffff91497a73
#11 [ffff9ec8e351ff40] sys_io_submit at ffffffff91497f40
#12 [ffff9ec8e351ff50] tracesys at ffffffff9197505b (via system_call)
RIP: 00007f9ff0758697 RSP: 00007f9db86814b8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: ffffffffffffffff
RDX: 00007f9db86814d0 RSI: 0000000000000001 RDI: 00007f9ff268e000
RBP: 0000000000000080 R8: 0000000000000080 R9: 000000000000006a
R10: 0000000000000078 R11: 0000000000000246 R12: 00007f9db86814c0
R13: 0000560264b9b518 R14: 0000560264b9b4f0 R15: 00007f9db8681bb0
ORIG_RAX: 00000000000000d1 CS: 0033 SS: 002b
From the core, this is the file the above process is writing to:
crash> files -d 0xffff9ec8e8f9f740
DENTRY INODE SUPERBLK TYPE PATH
ffff9ec8e8f9f740 ffff9ed39e705700 ffff9ee009adc000 REG
/rhev/data-center/mnt/glusterSD/172.16.20.21:_vmstore2/e5dd645f-88bb-491c-9145-38fa229cbc4d/images/8e84c1ed-48ba-4b82-9882-c96e6f260bab/29bba0a1-6c7b-4358-9ef2-f8080405778d
So in this case we're accessing the vmstore2 volume.
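If it helps to confirm whether the mount itself is wedged for new I/O (and not
just this one request), a hypothetical probe like the one below issues a small
O_DIRECT write on the same mount, mirroring the direct-I/O path the blocked
thread is in. The scratch filename is made up, and note that if the mount is
wedged the probe itself will get stuck in state D, so run it from a disposable
shell:
~~~
#!/usr/bin/env python3
# Hypothetical probe: attempt a 4 KiB O_DIRECT write on the vmstore2 fuse mount.
# If the mount is wedged, the write never returns and the probe itself ends up
# in state D.
import mmap
import os

MOUNT = '/rhev/data-center/mnt/glusterSD/172.16.20.21:_vmstore2'
PROBE = os.path.join(MOUNT, 'odirect-probe.tmp')  # scratch file, name made up

# O_DIRECT needs an aligned buffer; an anonymous mmap is page-aligned.
buf = mmap.mmap(-1, 4096)
buf.write(b'x' * 4096)

fd = os.open(PROBE, os.O_CREAT | os.O_WRONLY | os.O_DIRECT, 0o600)
try:
    os.write(fd, buf)  # blocks here if the fuse request never completes
    print('direct write completed')
finally:
    os.close(fd)
    os.unlink(PROBE)
~~~
A probe that never prints 'direct write completed' would suggest that new writes
to the volume hang as well, not only the original request.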
This is the glusterfs process:
root 4863 0.0 0.0 1909580 49316 ? S<sl Apr04 25:51
/usr/sbin/glusterfs --volfile-server=172.16.20.21 --volfile-server=172.16.20.22
--volfile-server=172.16.20.23 --volfile-id=/vmstore2
/rhev/data-center/mnt/glusterSD/172.16.20.21:_vmstore2
Backtrace of this process in the crash dump:
crash> bt 4863
PID: 4863 TASK: ffff9edfa9ff9040 CPU: 11 COMMAND: "glusterfs"
#0 [ffff9ed3a332fc28] __schedule at ffffffff91967747
#1 [ffff9ed3a332fcb0] schedule at ffffffff91967c49
#2 [ffff9ed3a332fcc0] futex_wait_queue_me at ffffffff9130cf76
#3 [ffff9ed3a332fd00] futex_wait at ffffffff9130dc5b
#4 [ffff9ed3a332fe48] do_futex at ffffffff9130f9a6
#5 [ffff9ed3a332fed8] sys_futex at ffffffff9130fec0
#6 [ffff9ed3a332ff50] system_call_fastpath at ffffffff91974ddb
RIP: 00007f6e5eeccf47 RSP: 00007ffdd311c7d0 RFLAGS: 00000246
RAX: 00000000000000ca RBX: 00007f6e59496700 RCX: ffffffffffffffff
RDX: 0000000000001308 RSI: 0000000000000000 RDI: 00007f6e594969d0
RBP: 00007f6e60552780 R8: 0000000000000000 R9: 00007f6e5e6e314d
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6e59496d28
R13: 0000000000000000 R14: 0000000000000006 R15: 00007ffdd311c920
ORIG_RAX: 00000000000000ca CS: 0033 SS: 002b
We have a few pending frames in this process. Reviewing the corresponding
statedump:
grep complete=0 glusterdump.4863.dump.1556091368 -c
7
Looking for these pending frames in the statedump:
~~~
[global.callpool.stack.1]
stack=0x7f6e4007c828
uid=107
gid=107
pid=20635
unique=5518502
lk-owner=bd2351a6cc7fcb8b
op=WRITE
type=1
cnt=6
[global.callpool.stack.1.frame.1]
frame=0x7f6dec04de38
ref_count=0
translator=vmstore2-write-behind
complete=0
parent=vmstore2-open-behind
wind_from=default_writev_resume
wind_to=(this->children->xlator)->fops->writev
unwind_to=default_writev_cbk
[global.callpool.stack.1.frame.2]
frame=0x7f6dec0326f8
ref_count=1
translator=vmstore2-open-behind
complete=0
parent=vmstore2-md-cache
wind_from=mdc_writev
wind_to=(this->children->xlator)->fops->writev
unwind_to=mdc_writev_cbk
[global.callpool.stack.1.frame.3]
frame=0x7f6dec005bf8
ref_count=1
translator=vmstore2-md-cache
complete=0
parent=vmstore2-io-threads
wind_from=default_writev_resume
wind_to=(this->children->xlator)->fops->writev
unwind_to=default_writev_cbk
[global.callpool.stack.1.frame.4]
frame=0x7f6e400ab0f8
ref_count=1
translator=vmstore2-io-threads
complete=0
parent=vmstore2
wind_from=io_stats_writev
wind_to=(this->children->xlator)->fops->writev
unwind_to=io_stats_writev_cbk
[global.callpool.stack.1.frame.5]
frame=0x7f6e4007c6c8
ref_count=1
translator=vmstore2
complete=0
parent=fuse
wind_from=fuse_write_resume
wind_to=FIRST_CHILD(this)->fops->writev
unwind_to=fuse_writev_cbk
[global.callpool.stack.1.frame.6]
frame=0x7f6e4002cb98
ref_count=1
translator=fuse
complete=0
~~~
The deepest frame in this stack (frame.1, translator=vmstore2-write-behind,
complete=0) has no child frame below it, so the writev was handed to the
'write-behind' translator but never wound further down or unwound back. I'd
appreciate some help figuring out the cause of the hang.
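In case it is useful for going through the rest of the collected statedumps,
below is a minimal sketch (it only assumes the key=value layout visible in the
excerpt above) that prints every frame with complete=0 together with the
translator it is parked in:
~~~
#!/usr/bin/env python3
# Print each incomplete frame (complete=0) found in one or more glusterfs
# statedumps, together with the translator and wind_from recorded for it.
import sys

def incomplete_frames(path):
    section, keys = None, {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith('[') and line.endswith(']'):
                # flush the previous section before starting a new one
                if keys.get('complete') == '0':
                    yield section, keys
                section, keys = line[1:-1], {}
            elif '=' in line and section is not None:
                key, _, value = line.partition('=')
                keys[key] = value
    if keys.get('complete') == '0':
        yield section, keys

for path in sys.argv[1:]:
    print(f'=== {path}')
    for section, keys in incomplete_frames(path):
        print(f"{section}: translator={keys.get('translator')} "
              f"wind_from={keys.get('wind_from', '-')}")
~~~
Saved as, say, pending-frames.py and run against the dump above
(`python3 pending-frames.py glusterdump.4863.dump.1556091368`), it should report
the same seven complete=0 frames that the grep count shows.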
Thank you,
Natalia