[Gluster-users] gluster fuse disk state problem
Anthony J. Biacco
abiacco at formatdynamics.com
Wed May 18 16:34:36 UTC 2011
After updating to gluster 3.2.0, I noticed on 2 separate dist/repl
setups that a writing process hung in the disk 'D' (uninterruptible
sleep) state on the mount.
Nothing short of a reboot would kill the process, not even a kill -9.
An strace showed no information once the process hit the 'D' state.
It could be fuse, or not. I wasn't running 3.1.4 for very long, maybe a
month and a half, but this problem showed up with 3.2.0 right away,
within a day.
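For anyone trying to check whether they are hitting the same condition, here is a minimal sketch that lists processes currently in the 'D' state by reading /proc/<pid>/stat. It assumes a Linux /proc filesystem; the function names parse_stat and find_d_state are made up for illustration and are not part of gluster or fuse.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen].strip())
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for processes in uninterruptible sleep ('D')."""
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process exited during the scan, or stat was unreadable
        if state == "D":
            hits.append((pid, comm))
    return hits


if __name__ == "__main__":
    for pid, comm in find_d_state():
        print(pid, comm)
```

A process in this state does not act on signals until it leaves the uninterruptible sleep, which is consistent with kill -9 having no effect here.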
* First instance of problem
Occurred while rsyncing files from a client's gluster mount to a remote
server, e.g. rsync /gluster_path user@host::remote_path
Setup:
2 gluster servers, RAID1 replicated
Servers/Client: RHEL5.5, kernel 2.6.18-238.1.1.el5.x86_64
fuse-libs-2.7.4-8.el5
glusterfs-core-3.2.0-1
fuse-2.7.4-8.el5
glusterfs-fuse-3.2.0-1
Errors:
INFO: task rsync:10652 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsync D ffff81000103f1a0 0 10652 10642 (NOTLB)
ffff81021d641c08 0000000000000086 ffff810133ef61d0 ffffffff883e8219
ffff810201df8600 000000000000000a ffff8101121c9080 ffff81022fce1100
000616db22314a67 000000000000f9a2 ffff8101121c9268 00000007883ecf35
Call Trace:
[<ffffffff883e8219>] :fuse:flush_bg_queue+0x2b/0x48
[<ffffffff8006ec4e>] do_gettimeofday+0x40/0x90
[<ffffffff80028b0b>] sync_page+0x0/0x43
[<ffffffff800637ca>] io_schedule+0x3f/0x67
[<ffffffff80028b49>] sync_page+0x3e/0x43
[<ffffffff8006390e>] __wait_on_bit_lock+0x36/0x66
[<ffffffff8003fdc1>] __lock_page+0x5e/0x64
[<ffffffff800a28e2>] wake_bit_function+0x0/0x23
[<ffffffff8000c3d7>] do_generic_mapping_read+0x1df/0x359
[<ffffffff8000d1c3>] file_read_actor+0x0/0x159
[<ffffffff8000c69d>] __generic_file_aio_read+0x14c/0x198
[<ffffffff800c8c08>] generic_file_read+0xac/0xc5
[<ffffffff801ab4a8>] tty_default_put_char+0x1d/0x1f
[<ffffffff800a28b4>] autoremove_wake_function+0x0/0x2e
[<ffffffff8003a8cf>] tty_ldisc_deref+0x68/0x7b
[<ffffffff8000b78d>] vfs_read+0xcb/0x171
[<ffffffff80011c7e>] sys_read+0x45/0x6e
[<ffffffff8005d116>] system_call+0x7e/0x83
* Second instance of problem
Occurred with apache accessing a CGI script on a client's gluster mount.
The process was also in the disk 'D' state, as in the first setup.
Setup:
2 gluster servers, RAID1 replicated
Servers: CentOS 5.6, kernel 2.6.18-194.32.1.el5.x86_64
Client: CentOS 5.6, kernel 2.6.18-238.9.1.el5
fuse-2.7.4-8.el5
glusterfs-core-3.2.0-1
fuse-libs-2.7.4-8.el5
glusterfs-fuse-3.2.0-1
Errors:
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
index.cgi D ffff810001c7eaa0 0 14772 26749 14840 (NOTLB)
ffff81002b3efc08 0000000000000082 ffff81005e929920 ffffffff885121ac
ffff81001772fd00 0000000000000009 ffff810013180100 ffff810001fd30c0
0013494028f4652a 00000000001826e3 ffff8100131802e8 00000001000d597a
Call Trace:
[<ffffffff885121ac>] :fuse:request_send_nowait+0x56/0x78
[<ffffffff8006e1d7>] do_gettimeofday+0x40/0x90
[<ffffffff80028b44>] sync_page+0x0/0x43
[<ffffffff800637ea>] io_schedule+0x3f/0x67
[<ffffffff80028b82>] sync_page+0x3e/0x43
[<ffffffff8006392e>] __wait_on_bit_lock+0x36/0x66
[<ffffffff8003fce0>] __lock_page+0x5e/0x64
[<ffffffff800a0b8d>] wake_bit_function+0x0/0x23
[<ffffffff8000c373>] do_generic_mapping_read+0x1df/0x359
[<ffffffff8000d18c>] file_read_actor+0x0/0x159
[<ffffffff8000c639>] __generic_file_aio_read+0x14c/0x198
[<ffffffff800c6964>] generic_file_read+0xac/0xc5
[<ffffffff800a0b5f>] autoremove_wake_function+0x0/0x2e
[<ffffffff80062ff8>] thread_return+0x62/0xfe
[<ffffffff8000b729>] vfs_read+0xcb/0x171
[<ffffffff80011c15>] sys_read+0x45/0x6e
[<ffffffff8005d116>] system_call+0x7e/0x83
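Both traces appear to share the same read path (sys_read, vfs_read, generic_file_read, down to __lock_page and io_schedule) with a fuse frame at the top. A rough sketch for checking that mechanically, assuming the "[<address>] symbol+0xoff/0xlen" frame format shown above; parse_frames and common_symbols are hypothetical helper names:

```python
import re

# Matches frames such as
#   [<ffffffff883e8219>] :fuse:flush_bg_queue+0x2b/0x48
#   [<ffffffff800637ca>] io_schedule+0x3f/0x67
# The ":module:" prefix is optional. Symbols containing '.' are not handled.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?::(?P<module>\w+):)?(?P<symbol>\w+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_frames(trace_text):
    """Return a list of (module_or_None, symbol) for each frame line."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("module"), match.group("symbol")))
    return frames


def common_symbols(trace_a, trace_b):
    """Symbols present in both traces, in the order they appear in trace_a."""
    in_b = {symbol for _, symbol in parse_frames(trace_b)}
    shared = []
    for _, symbol in parse_frames(trace_a):
        if symbol in in_b and symbol not in shared:
            shared.append(symbol)
    return shared
```

Pasting the two "Call Trace:" blocks into common_symbols() returns the frames they have in common, which makes it easier to see that only the top fuse frame and a few unrelated stack entries differ.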
-Tony
---------------------------
Manager, IT Operations
Format Dynamics, Inc.
P: 303-228-7327
F: 303-228-7305
abiacco at formatdynamics.com
http://www.formatdynamics.com