[Gluster-users] NFS blocked for more than 120 seconds on gluster 3.7.12
Georg Schoenberger
g.schoenberger at xortex.com
Mon Jul 11 16:13:23 UTC 2016
Hi Folks,
Last week I upgraded gluster from 3.7.11 to 3.7.12. Unfortunately, I have since hit the same problem twice:
a task blocked for more than 120 seconds, leaving me no option but a hard reset of the node!
kernel: [1705322.676270] INFO: task apache2:3092 blocked for more than 120 seconds.
kernel: [1705322.682202] Not tainted 3.13.0-88-generic #135-Ubuntu
kernel: [1705322.682770] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: [1705322.683738] apache2 D ffff880a03d93180 0 3092 1800 0x00000000
kernel: [1705322.683761] ffff88016f865b38 0000000000000082 ffff8809cbde8000 0000000000013180
kernel: [1705322.683769] ffff88016f865fd8 0000000000013180 ffff8809cbde8000 ffff880a03d93a18
kernel: [1705322.683775] ffff880a03f92d90 0000000000000002 ffffffffa029e670 ffff88016f865bb0
kernel: [1705322.683782] Call Trace:
kernel: [1705322.683978] [<ffffffffa029e670>] ? nfs_free_request+0xb0/0xb0 [nfs]
kernel: [1705322.684043] [<ffffffff8172e10d>] io_schedule+0x9d/0x130
kernel: [1705322.684096] [<ffffffffa029e67e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
kernel: [1705322.684103] [<ffffffff8172e582>] __wait_on_bit+0x62/0x90
kernel: [1705322.684162] [<ffffffff81617018>] ? sk_reset_timer+0x18/0x30
kernel: [1705322.684193] [<ffffffffa029e670>] ? nfs_free_request+0xb0/0xb0 [nfs]
kernel: [1705322.684203] [<ffffffff8172e627>] out_of_line_wait_on_bit+0x77/0x90
kernel: [1705322.684234] [<ffffffff810adac0>] ? autoremove_wake_function+0x40/0x40
kernel: [1705322.684258] [<ffffffffa029ea33>] nfs_wait_on_request+0x33/0x40 [nfs]
kernel: [1705322.684276] [<ffffffffa02a3a40>] nfs_updatepage+0x150/0x660 [nfs]
kernel: [1705322.684290] [<ffffffffa0294dfb>] nfs_write_end+0x5b/0x340 [nfs]
kernel: [1705322.684318] [<ffffffff811522da>] generic_file_buffered_write+0x16a/0x250
kernel: [1705322.684336] [<ffffffff81153991>] __generic_file_aio_write+0x1c1/0x3d0
kernel: [1705322.684340] [<ffffffff81153bf8>] generic_file_aio_write+0x58/0xa0
kernel: [1705322.684354] [<ffffffffa029406b>] nfs_file_write+0xbb/0x1d0 [nfs]
kernel: [1705322.684387] [<ffffffff811c096a>] do_sync_write+0x5a/0x90
kernel: [1705322.684394] [<ffffffff811c10f4>] vfs_write+0xb4/0x1f0
kernel: [1705322.684399] [<ffffffff811c1b29>] SyS_write+0x49/0xa0
kernel: [1705322.684411] [<ffffffff8173a4dd>] system_call_fastpath+0x1a/0x1f
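The trace shows apache2 stuck in uninterruptible sleep (state D) while waiting on the NFS client. A minimal sketch for spotting other tasks in the same state. It assumes a Linux host with procps `ps`, and the helper name `d_state_tasks` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print "PID WCHAN COMMAND" for every task whose
# state starts with D (uninterruptible sleep, e.g. "D" or "D+").
# Expects ps-style input on stdin: one header line, then
# "PID STAT WCHAN COMMAND" columns. COMMAND is taken as the 4th field,
# so command names containing spaces are truncated to their first word.
d_state_tasks() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3, $4 }'
}

# Live usage (assumes procps ps; wchan:32 widens the wait-channel column).
ps -eo pid,stat,wchan:32,comm | d_state_tasks
```

Running it a few times while the node is slow shows whether the blocked tasks all wait in NFS-related kernel functions.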
Some research turned up these links:
* https://www.gluster.org/pipermail/gluster-users/2016-July/027327.html
* http://serverfault.com/questions/500222/kernel-3-8-apache2-with-wsgi-info-task-apache2-blocked-for-more-than-120-sec
* https://www.novell.com/support/kb/doc.php?id=7010287l
The gluster volume serves a home directory for apache/php-fpm, and the server is usually quite busy handling requests.
Since I did not have any problems with 3.7.11 over the previous few weeks, I wonder whether this is related to the 3.7.11 -> 3.7.12 upgrade
(or is it just the file system blocking?).
My setup is:
# dpkg -l | grep gluster
ii glusterfs-client 3.7.12-ubuntu1~trusty1 amd64 clustered file-system (client package)
ii glusterfs-common 3.7.12-ubuntu1~trusty1 amd64 GlusterFS common libraries and translator modules
ii glusterfs-server 3.7.12-ubuntu1~trusty1 amd64 clustered file-system (server package)
The gluster volume is NFS-mounted on the same host that serves it:
XXX:/gldata on /home/gldata type nfs (rw,nfsvers=3,addr=XXX,_netdev)
Are there any known race conditions with 3.7.12 and NFS?
Should I apply the vm.dirty_ratio settings mentioned in the links above?
Should I downgrade to 3.7.11 or upgrade to 3.7.13?
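On the `vm.dirty_ratio` question: per the kernel documentation, a process generating writes starts writing out dirty data itself once dirty pages reach this percentage of free plus reclaimable memory, and on a large-memory host that can be a lot of data to push through NFS at once. A rough sketch for reading the current values and estimating that ceiling, assuming a Linux `/proc`. `dirty_limit_mib` is a hypothetical helper, and using MemTotal makes its result an upper bound:

```shell
#!/bin/sh
# Hypothetical helper: approximate dirty-page ceiling in MiB,
# computed as MemTotal (KiB) * ratio (%) / 100 / 1024 with integer math.
# Upper bound only: the kernel applies vm.dirty_ratio to free plus
# reclaimable memory, which is smaller than MemTotal.
dirty_limit_mib() {
    mem_kib=$1
    ratio=$2
    echo $(( mem_kib * ratio / 100 / 1024 ))
}

# Show the current settings, if this host exposes them via /proc.
# If vm.dirty_bytes is set instead, dirty_ratio reads as 0 and the
# estimate below does not apply.
if [ -r /proc/meminfo ] && [ -r /proc/sys/vm/dirty_ratio ]; then
    mem_kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    ratio=$(cat /proc/sys/vm/dirty_ratio)
    echo "vm.dirty_ratio = ${ratio}%"
    echo "vm.dirty_background_ratio = $(cat /proc/sys/vm/dirty_background_ratio)%"
    echo "approx. dirty ceiling: $(dirty_limit_mib "$mem_kib" "$ratio") MiB"
fi
if [ -r /proc/sys/kernel/hung_task_timeout_secs ]; then
    echo "kernel.hung_task_timeout_secs = $(cat /proc/sys/kernel/hung_task_timeout_secs)"
fi
```

For example, a host with 64 GiB (67108864 KiB) of RAM and `vm.dirty_ratio` at 20 gives `dirty_limit_mib 67108864 20` = 13107 MiB.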
I appreciate any help,
THX Georg