[Gluster-users] Hanging writes after upgrading "clients" to debian squeeze
Stefan Becker
sbecker at rapidsoft.de
Sun Feb 5 22:03:16 UTC 2012
Hi Brian,
OK, tcpdump and strace are both "raining" output. The client is doing a lot, and it is connected to both .40 and .41. During my tests I found out that the hangs are random. There is one touch I have been trying the whole day, and it is still hanging. Other touches (other directories/filenames) work at random. Here is what I found using dmesg:
[17514.155548] INFO: task touch:25873 blocked for more than 120 seconds.
[17514.155583] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17514.155627] touch D 0000000000000000 0 25873 20727 0x00000004
[17514.155630] ffff88023f0e2a60 0000000000000086 0000000000000000 ffffffff81049fd0
[17514.155633] ffff8801dc7c3c28 000000000000f9e0 ffff8801dc7c3fd8 0000000000015780
[17514.155635] 0000000000015780 ffff88023bc8e2e0 ffff88023bc8e5d8 000000058103a866
[17514.155638] Call Trace:
[17514.155644] [<ffffffff81049fd0>] ? try_to_wake_up+0x249/0x259
[17514.155646] [<ffffffff8103f80c>] ? __wake_up+0x30/0x44
[17514.155651] [<ffffffffa0181ab1>] ? fuse_request_send+0x1a2/0x255 [fuse]
[17514.155654] [<ffffffff810649da>] ? autoremove_wake_function+0x0/0x2e
[17514.155657] [<ffffffffa0182992>] ? fuse_request_alloc+0x22/0x27 [fuse]
[17514.155660] [<ffffffffa0187da2>] ? fuse_file_alloc+0xc4/0xeb [fuse]
[17514.155663] [<ffffffffa0184646>] ? fuse_create+0x1ce/0x38f [fuse]
[17514.155667] [<ffffffff810f7180>] ? vfs_create+0x6d/0x89
[17514.155669] [<ffffffff810f80a9>] ? do_filp_open+0x31e/0x94b
[17514.155673] [<ffffffff810cc2d5>] ? handle_mm_fault+0x3b8/0x80f
[17514.155676] [<ffffffff810ec8af>] ? do_sys_open+0x55/0xfc
[17514.155678] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
That does not look too good. It reminds me of a kernel/gluster bug I had a year ago that could only be fixed by turning off "quickread" (it is still off, I checked).
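To check whether other processes are being flagged as well, the dmesg buffer can be filtered for the hung-task lines. This is only a sketch: hung_tasks is a hypothetical helper name, and it assumes the message format is exactly the "INFO: task NAME:PID blocked for more than ..." line shown in the trace.

```shell
# Print "name pid" for every task the kernel reported as hung.
# Reads dmesg output on stdin; assumes the message format seen above.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than.*/\1 \2/p' | sort -u
}

# Usage (on the affected client):
# dmesg | hung_tasks
```

If only touch shows up, the problem is probably limited to the create path seen in the call trace; if other tools appear too, it is more general.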
Any other ideas?
-
Stefan
-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Stefan Becker
Sent: Sunday, 5 February 2012 22:30
To: Brian Candler
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Hanging writes after upgrading "clients" to debian squeeze
Hi Brian,
thanks for your help, I will play around with what you said and come back with results or the solution :)
Greets,
Stefan
-----Original Message-----
From: Brian Candler [mailto:B.Candler at pobox.com]
Sent: Sunday, 5 February 2012 22:28
To: Stefan Becker
Cc: Whit Blauvelt; gluster-users at gluster.org
Subject: Re: [Gluster-users] Hanging writes after upgrading "clients" to debian squeeze
On Sun, Feb 05, 2012 at 09:49:47PM +0100, Stefan Becker wrote:
> - no ip tables involved
OK. So how about this on the client:
tcpdump -i eth0 -nn host 10.10.100.40 or host 10.10.100.41
(replace eth0 as necessary)
That will show you traffic to and from the bricks. When you issue a write
(e.g. touch /path/to/foo), does traffic only go out to one brick? Do you
see any TCP retransmissions? Does 'netstat -nt' show TCP connections to both
bricks? Does Send-Q stay at zero most of the time, or is it stuck at a
non-zero value?
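The Send-Q check can be roughly automated. The sketch below filters 'netstat -nt' output for connections to the bricks whose Send-Q is non-zero. stuck_sendq is a hypothetical helper name, and it assumes IPv4 addresses and the usual Linux netstat column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State).

```shell
# Print connections to the given brick IPs whose Send-Q is non-zero.
# Reads `netstat -nt` output on stdin; brick addresses are the arguments.
stuck_sendq() {
    awk -v bricks="$*" '
        BEGIN { n = split(bricks, b, " ") }
        $1 ~ /^tcp/ {
            # Column 3 is Send-Q, column 5 is the foreign address (ip:port).
            split($5, fa, ":")
            for (i = 1; i <= n; i++)
                if (fa[1] == b[i] && $3 > 0)
                    print $5, "Send-Q=" $3
        }'
}

# Usage (run repeatedly while a touch is hanging):
# netstat -nt | stuck_sendq 10.10.100.40 10.10.100.41
```

A brick that keeps appearing across several runs would suggest data is queued locally and not being acknowledged; no output at all means Send-Q was zero at that instant.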
You could also try:
strace -p <pid-of-glusterfs-process>
on the client as well. You should see writev(fd,...) and readv(fd,...) with
different fds for communication to each of the bricks. Then try issuing
a single write.
The strace output may not tell you much by itself, but if you compare what
you see on a non-upgraded (working) client versus an upgraded (broken)
client, you might be able to see what it's getting stuck on.
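One possible way to make that comparison easier is to summarise each trace as a count of readv/writev calls per fd. This is a sketch only: fd_io_counts is a hypothetical helper name, and it assumes plain 'strace -p' output where each line starts with the syscall name (no -f pid prefixes or timestamps).

```shell
# Count readv/writev calls per file descriptor in an strace log.
# Reads the trace on stdin and prints lines like "writev fd=12 3".
fd_io_counts() {
    awk -F'[(,]' '
        /^(readv|writev)\(/ { count[$1 " fd=" $2]++ }
        END { for (k in count) print k, count[k] }' | sort
}

# Usage: capture a trace on each client while issuing a single write,
# then compare the two summaries.
# strace -o trace.txt -p <pid-of-glusterfs-process>
# fd_io_counts < trace.txt
```

The fd numbers will differ between clients, so compare the pattern and not the numbers: a working client should show writev traffic on two fds (one per brick), and a broken one may show only one.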
Regards,
Brian.