[Gluster-devel] glusterfsd crash due to page allocation failure

Glomski, Patrick patrick.glomski at corvidtec.com
Mon Dec 21 20:55:08 UTC 2015


Hello,

We've recently upgraded from gluster 3.6.6 to 3.7.6 and have started
encountering page allocation failures in dmesg (a sample stack trace is appended).

It appears that glusterfsd now sometimes fills up the cache completely and
crashes with a page allocation failure. I *believe* it mainly happens when
copying lots of new data to the system, running a 'find', or similar. Hosts
are all Scientific Linux 6.6 and these errors occur consistently on two
separate gluster pools.
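
For context, the "order:5" in the appended trace means the kernel needed 32
contiguous pages (128 KiB) and could not assemble them, which looks more like
memory fragmentation under cache pressure than a plain out-of-memory
condition. In case it is useful to anyone trying to reproduce this, a quick
way to watch for that on a brick would be something like the following (plain
SL6 tooling, nothing gluster-specific; the 10-second interval is arbitrary):

    # free block counts per allocation order, per zone; the columns for
    # order 5 and above shrink toward zero shortly before the failures appear
    watch -n 10 cat /proc/buddyinfo

    # how much memory is sitting in reclaimable caches vs. truly free
    grep -E 'MemFree|Buffers|^Cached' /proc/meminfo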

Has anyone else seen this issue and are there any known fixes for it via
sysctl kernel parameters or other means?
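
To be concrete about what I mean by sysctl parameters, these are the sort of
knobs I had in mind; the values are guesses on my part and nothing here is
verified to help, so treat it as a sketch rather than a recommendation:

    # keep a larger reserve of free pages so reclaim kicks in earlier and
    # higher-order blocks are less likely to be exhausted (value is a guess)
    sysctl -w vm.min_free_kbytes=262144

    # prefer reclaiming dentry/inode caches more aggressively (default is 100)
    sysctl -w vm.vfs_cache_pressure=200

    # blunt periodic workaround: drop reclaimable slab objects
    # (echo 3 would also drop the page cache)
    echo 2 > /proc/sys/vm/drop_caches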

Please let me know of any other diagnostic information that would help.

Thanks,
Patrick


[1458118.134697] glusterfsd: page allocation failure. order:5, mode:0x20
[1458118.134701] Pid: 6010, comm: glusterfsd Not tainted 2.6.32-573.3.1.el6.x86_64 #1
[1458118.134702] Call Trace:
[1458118.134714]  [<ffffffff8113770c>] ? __alloc_pages_nodemask+0x7dc/0x950
[1458118.134728]  [<ffffffffa0321800>] ? mlx4_ib_post_send+0x680/0x1f90 [mlx4_ib]
[1458118.134733]  [<ffffffff81176e92>] ? kmem_getpages+0x62/0x170
[1458118.134735]  [<ffffffff81177aaa>] ? fallback_alloc+0x1ba/0x270
[1458118.134736]  [<ffffffff811774ff>] ? cache_grow+0x2cf/0x320
[1458118.134738]  [<ffffffff81177829>] ? ____cache_alloc_node+0x99/0x160
[1458118.134743]  [<ffffffff8145f732>] ? pskb_expand_head+0x62/0x280
[1458118.134744]  [<ffffffff81178479>] ? __kmalloc+0x199/0x230
[1458118.134746]  [<ffffffff8145f732>] ? pskb_expand_head+0x62/0x280
[1458118.134748]  [<ffffffff8146001a>] ? __pskb_pull_tail+0x2aa/0x360
[1458118.134751]  [<ffffffff8146f389>] ? harmonize_features+0x29/0x70
[1458118.134753]  [<ffffffff8146f9f4>] ? dev_hard_start_xmit+0x1c4/0x490
[1458118.134758]  [<ffffffff8148cf8a>] ? sch_direct_xmit+0x15a/0x1c0
[1458118.134759]  [<ffffffff8146ff68>] ? dev_queue_xmit+0x228/0x320
[1458118.134762]  [<ffffffff8147665d>] ? neigh_connected_output+0xbd/0x100
[1458118.134766]  [<ffffffff814abc67>] ? ip_finish_output+0x287/0x360
[1458118.134767]  [<ffffffff814abdf8>] ? ip_output+0xb8/0xc0
[1458118.134769]  [<ffffffff814ab04f>] ? __ip_local_out+0x9f/0xb0
[1458118.134770]  [<ffffffff814ab085>] ? ip_local_out+0x25/0x30
[1458118.134772]  [<ffffffff814ab580>] ? ip_queue_xmit+0x190/0x420
[1458118.134773]  [<ffffffff81137059>] ? __alloc_pages_nodemask+0x129/0x950
[1458118.134776]  [<ffffffff814c0c54>] ? tcp_transmit_skb+0x4b4/0x8b0
[1458118.134778]  [<ffffffff814c319a>] ? tcp_write_xmit+0x1da/0xa90
[1458118.134779]  [<ffffffff81178cbd>] ? __kmalloc_node+0x4d/0x60
[1458118.134780]  [<ffffffff814c3a80>] ? tcp_push_one+0x30/0x40
[1458118.134782]  [<ffffffff814b410c>] ? tcp_sendmsg+0x9cc/0xa20
[1458118.134786]  [<ffffffff8145836b>] ? sock_aio_write+0x19b/0x1c0
[1458118.134788]  [<ffffffff814581d0>] ? sock_aio_write+0x0/0x1c0
[1458118.134791]  [<ffffffff8119169b>] ? do_sync_readv_writev+0xfb/0x140
[1458118.134797]  [<ffffffff810a14b0>] ? autoremove_wake_function+0x0/0x40
[1458118.134801]  [<ffffffff8123e92f>] ? selinux_file_permission+0xbf/0x150
[1458118.134804]  [<ffffffff812316d6>] ? security_file_permission+0x16/0x20
[1458118.134806]  [<ffffffff81192746>] ? do_readv_writev+0xd6/0x1f0
[1458118.134807]  [<ffffffff811928a6>] ? vfs_writev+0x46/0x60
[1458118.134809]  [<ffffffff811929d1>] ? sys_writev+0x51/0xd0
[1458118.134812]  [<ffffffff810e88ae>] ? __audit_syscall_exit+0x25e/0x290
[1458118.134816]  [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b

