From nobody Mon Aug  4 12:26:02 2014
Received: from SPAMFW.knight.com (209.191.246.40) by
	JC1WSXCH01.global.knight.com (10.20.58.99) with Microsoft SMTP Server
	id 8.3.83.0; Tue, 12 Jul 2011 10:34:13 -0400
Received: from gluster.org (184-106-200-248.static.cloud-ips.com
	[184.106.200.248]) by SPAMFW.knight.com with ESMTP id iFkXzBqHP6VfqN4l
	for <jburnash@knight.com>; Tue, 12 Jul 2011 10:33:47 -0400 (EDT)
Received: from gluster.com-mirror1 (localhost [127.0.0.1])	by gluster.org
	(Postfix) with ESMTP id BD2EFC5C08A;
	Tue, 12 Jul 2011 07:37:36 -0700 (PDT)
Received: from MAIL11.knight.com (jc1mx01.knighttrading.com [209.191.246.37])
	by gluster.org (Postfix) with ESMTP id CE85AC5C087	for
	<gluster-users@gluster.org>; Tue, 12 Jul 2011 07:37:35 -0700 (PDT)
Received: from pr1wsxch01.global.knight.com ([10.255.58.99]) by
	MAIL11.knight.com with Microsoft SMTPSVC(6.0.3790.4675);
	Tue, 12 Jul 2011 10:33:42 -0400
Received: from JC1WSXCH01.global.knight.com (10.20.58.99) by
	pr1wsxch01.global.knight.com (10.255.58.99) with Microsoft SMTP
	Server	(TLS) id 8.3.83.0; Tue, 12 Jul 2011 10:33:41 -0400
Received: from EXCHANGE3.global.knight.com ([169.254.1.141]) by
	JC1WSXCH01.global.knight.com ([::1]) with mapi;
	Tue, 12 Jul 2011 10:33:41 -0400
From: "Burnash, James" <jburnash@knight.com>
To: "gluster-users@gluster.org" <gluster-users@gluster.org>
Sender: "gluster-users-bounces@gluster.org" <gluster-users-bounces@gluster.org>
Date: Tue, 12 Jul 2011 10:33:41 -0400
Subject: [Gluster-users] CentOS 5.5 kernel bugs can cause temporary hangs
	upon client access to GlusterFS
Thread-Topic: [Gluster-users] CentOS 5.5 kernel bugs can cause temporary
	hangs upon client access to GlusterFS
Thread-Index: AcxAn/iI32kci4tMSluwAmLk4PEdbg==
Message-ID: <9AD565C4A8561349B7227B79DDB98873708266EB18@EXCHANGE3.global.knight.com>
List-Help: <mailto:gluster-users-request@gluster.org?subject=help>
List-Subscribe: <http://gluster.org/cgi-bin/mailman/listinfo/gluster-users>,
	<mailto:gluster-users-request@gluster.org?subject=subscribe>
List-Unsubscribe: <http://gluster.org/cgi-bin/mailman/options/gluster-users>, 
	<mailto:gluster-users-request@gluster.org?subject=unsubscribe>
Accept-Language: en-US
Content-Language: en-US
X-MS-Exchange-Organization-AuthMechanism: 10
X-MS-Exchange-Organization-AuthSource: JC1WSXCH01.global.knight.com
X-MS-Has-Attach: 
X-Auto-Response-Suppress: All
X-MS-TNEF-Correlator: 
acceptlanguage: en-US
x-pp-processed: __PP2__5afd85e8-dd41-4b77-b2d3-4f094625b58d
x-asg-orig-subj: [Gluster-users] CentOS 5.5 kernel bugs can cause temporary
	hangs upon client access to GlusterFS
x-barracuda-start-time: 1310481227
x-barracuda-url: http://SPAM:80/cgi-mod/mark.cgi
x-asg-debug-id: 1310481226-02483a65ff00560001-KAUq6I
x-barracuda-connect: 184-106-200-248.static.cloud-ips.com[184.106.200.248]
x-barracuda-envelope-from: gluster-users-bounces@gluster.org
errors-to: gluster-users-bounces@gluster.org
x-virus-scanned: by bsmtpd at knight.com
list-id: Gluster General Discussion List  <gluster-users.gluster.org>
x-brightmail-tracker: AAAAAA==
x-auditid: d1bff625-00000e1c000006d0-50-4e1c5b463736
delivered-to: gluster-users@gluster.org
x-original-to: gluster-users@gluster.org
list-archive: <http://gluster.org/pipermail/gluster-users>
x-beenthere: gluster-users@gluster.org
x-mailman-version: 2.1.11
list-post: <mailto:gluster-users@gluster.org>
x-barracuda-apparent-source-ip: 184.106.200.248
x-barracuda-user-whitelist: jburnash@knight.com
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

Got a complaint from a user: the native GlusterFS mount point was completely inaccessible from many (if not all) of the clients trying to read from or write to it.

It's apparently not the fault of GlusterFS. Here's the entry from the messages file:

Jul  8 16:15:13 jc1letgfs13 kernel: [3022057.692284] INFO: task glusterfsd:12902 blocked for more than 120 seconds.
Jul  8 16:15:13 jc1letgfs13 kernel: [3022057.692544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul  8 16:15:13 jc1letgfs13 kernel: [3022057.693037] glusterfsd    D ffffffff80151248     0 12902      1         12904 12903 (NOTLB)
Jul  8 16:15:13 jc1letgfs13 kernel: [3022057.693553]  ffff81061190bbf8 0000000000000086 ffff81061190bea8 0000000000000000
Jul  8 16:15:13 jc1letgfs13 kernel: [3022057.694099]  000000000000000c 000000000000000a ffff810627eec0c0 ffff810c27f32100
Jul  8 16:15:13 jc1letgfs13 kernel: [3022057.694660]  000abc5dc58f770c 0000000000005135 ffff810627eec2a8 000000038000b3fd

... and here's one for a non-Gluster process:

Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.761299] INFO: task jbd2/cciss!c2d0:4090 blocked for more than 120 seconds.
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.761908] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.762505] jbd2/cciss!c2 D ffffffff80151248     0  4090    456          4091  4085 (L-TLB)
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.763129]  ffff810617e45d60 0000000000000046 ffff810617e45da0 ffffffff8008ccb0
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.763753]  ffff810617e45cf0 000000000000000a ffff81063d22e820 ffff810c20b3c100
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.764370]  000abbf070cd535b 0000000000003c6a ffff81063d22ea08 0000000300000000
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.764693] Call Trace:
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.765247]  [<ffffffff8008ccb0>] find_busiest_group+0x20d/0x621
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.765543]  [<ffffffff88342fad>] :jbd2:jbd2_journal_commit_transaction+0x191/0x1080
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.766064]  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.766327]  [<ffffffff8003ddd5>] lock_timer_base+0x1b/0x3c
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.766588]  [<ffffffff8004b6b6>] try_to_del_timer_sync+0x7f/0x88
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.766853]  [<ffffffff88346d72>] :jbd2:kjournald2+0x9a/0x1ec
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.767109]  [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.767374]  [<ffffffff88346cd8>] :jbd2:kjournald2+0x0/0x1ec
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.767627]  [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.767880]  [<ffffffff80032bdc>] kthread+0xfe/0x132
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.768138]  [<ffffffff8005efb1>] child_rip+0xa/0x11
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.768399]  [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.768656]  [<ffffffff80032ade>] kthread+0x0/0x132
Jul  8 16:07:13 jc1letgfs13 kernel: [3021577.768922]  [<ffffffff8005efa7>] child_rip+0x0/0x11
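
Each entry in the call trace above has the form [<address>] symbol+offset/size. The offset is how far into the function the address falls, and the size is the function's total length. In this kernel's output, symbols from a loadable module appear to carry a ":module:" prefix (":jbd2:" here). Below is a minimal sketch that splits each entry into those parts. It assumes exactly this older line format, and the names parse_frame, Frame and summarize are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed format: one entry of the old x86_64 "Call Trace:" output, e.g.
#   [<ffffffff88342fad>] :jbd2:jbd2_journal_commit_transaction+0x191/0x1080
# The optional ":name:" prefix is taken to mean the symbol is in a loadable module.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>[^:\s]+):)?"
    r"(?P<symbol>[^\s+]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


class Frame(NamedTuple):
    address: int
    module: Optional[str]  # None for symbols in the core kernel image
    symbol: str
    offset: int            # bytes past the start of the function
    size: int              # total length of the function in bytes


def parse_frame(line: str) -> Optional[Frame]:
    """Parse the first trace entry found in a line, or return None."""
    m = FRAME_RE.search(line)  # search() skips the syslog prefix and [uptime]
    if m is None:
        return None
    return Frame(
        address=int(m["addr"], 16),
        module=m["module"],
        symbol=m["symbol"],
        offset=int(m["offset"], 16),
        size=int(m["size"], 16),
    )


def summarize(lines: List[str]) -> List[str]:
    """Return "module:symbol" (or just "symbol") for each entry, in log order."""
    out = []
    for line in lines:
        f = parse_frame(line)
        if f is not None:
            out.append(f"{f.module}:{f.symbol}" if f.module else f.symbol)
    return out
```

Traces in this format may be a scan of the stack rather than a clean unwind. Some entries, especially those at +0x0, may therefore be stale pointers rather than live calls. The jbd2_journal_commit_transaction and kjournald2 entries are the ones that place the stalled thread in the jbd2 journal commit path.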

Haven't found the specific bug number for this (CentOS 5.5) yet.

Running GlusterFS 3.1.3 on the clients and on 2 servers set up as Replicated-Distribute.

Hopefully this will help others. I will be upgrading these servers to CentOS 5.6 as soon as possible.

Kudos to my coworker Joe Collette for running this issue to ground.

James Burnash
Unix Engineer
Knight Capital Group


_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users