[Gluster-devel] glusterfsd crash due to page allocation failure

Niels de Vos ndevos at redhat.com
Tue Dec 22 17:55:00 UTC 2015


On Tue, Dec 22, 2015 at 05:15:57PM +0000, David Robinson wrote:
> Niels,
> 
> > 1. how is infiniband involved/configured in this environment?
> 
> gfsib01bkp and gfs02bkp are connected via infiniband. We are using tcp
> transport as I was never able to get RDMA to work.
> 
> Volume Name: gfsbackup
> Type: Distribute
> Volume ID: e78d5123-d9bc-4d88-9c73-61d28abf0b41
> Status: Started
> Number of Bricks: 7
> Transport-type: tcp
> Bricks:
> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/gfsbackup
> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/gfsbackup
> Brick3: gfsib02bkp.corvidtec.com:/data/brick01bkp/gfsbackup
> Brick4: gfsib02bkp.corvidtec.com:/data/brick02bkp/gfsbackup
> Brick5: gfsib02bkp.corvidtec.com:/data/brick03bkp/gfsbackup
> Brick6: gfsib02bkp.corvidtec.com:/data/brick04bkp/gfsbackup
> Brick7: gfsib02bkp.corvidtec.com:/data/brick05bkp/gfsbackup
> 
> > 2. was there a change/update of the driver (kernel update maybe?)
> Before upgrading these servers from gluster 3.6.6 to 3.7.6, I did a 'yum
> update' which did upgrade the kernel.
> Current kernel is 2.6.32-573.12.1.el6.x86_64
> 
> > 3. do you get a coredump of the glusterfsd process when this happens?
> There are a series of core files in / from around the same time that
> this happens.
> -rw-------    1 root root  168865792 Dec 22 10:45 core.3700
> -rw-------    1 root root  168861696 Dec 22 10:45 core.3661
> -rw-------    1 root root  168861696 Dec 22 10:45 core.3706
> -rw-------    1 root root  168861696 Dec 22 10:45 core.3677
> -rw-------    1 root root  168861696 Dec 22 10:45 core.3669
> -rw-------    1 root root  168857600 Dec 22 10:45 core.3654
> -rw-------    1 root root  254345216 Dec 22 10:45 core.3693
> -rw-------    1 root root  254341120 Dec 22 10:45 core.3685
> 
> > 4. is this a fuse mount process, or a brick process? (check by PID?)
> I have rebooted the machine as it was in a bad state and I could no longer
> write to the gluster volume.
> When it happens again, I will check the PID.
> 
> This machine has both brick processes and fuse mounts.  The storage servers
> mount the volume through a fuse mount and then I use rsync to back up my
> primary storage system.
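
For reference, core files named core.<PID> carry the PID in the filename, so
even after the reboot each core can be tied back to a PID, and whether that
PID was a brick or a fuse client can be read from the core itself. A minimal
sketch (the filename is one from the listing above; the `file`/argv checks are
commented out since they only make sense on the affected host):

```shell
corename="core.3700"          # example name from the listing above
pid="${corename#core.}"       # strip the "core." prefix to recover the PID
echo "$pid"                   # -> 3700
# On the affected host one would then run (not executed here):
#   file "/$corename"         # prints the crashing executable and its argv
# In the argv, "glusterfsd ... --brick-name ..." indicates a brick process,
# while "glusterfs --volfile-server=..." indicates a fuse mount client.
```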

Many thanks for the details. Could you file a bug for this, mentioning
the exact glusterfs package version and attach at least one of the
cores?
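
When filing, the exact package version and a backtrace from one of the cores
are the key pieces. A hedged sketch of collecting them (the gdb path is a
typical EL6 default, not confirmed from this report, and only makes sense on
the affected host):

```shell
# Exact glusterfs package version for the bug report:
rpm -q glusterfs glusterfs-server 2>/dev/null || echo "query on affected host"
# Backtrace from one of the cores (run on the affected host, not here):
#   gdb /usr/sbin/glusterfsd /core.3700 -batch -ex 'thread apply all bt'
```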

Niels


> 
> David
> 
> 
> 
> >> Hello,
> >>
> >> We've recently upgraded from gluster 3.6.6 to 3.7.6 and have started
> >> encountering dmesg page allocation errors (stack trace is appended).
> >>
> >> It appears that glusterfsd now sometimes fills up the cache completely and
> >> crashes with a page allocation failure. I *believe* it mainly happens when
> >> copying lots of new data to the system, running a 'find', or similar.
> >> Hosts are all Scientific Linux 6.6 and these errors occur consistently on
> >> two separate gluster pools.
> >>
> >> Has anyone else seen this issue and are there any known fixes for it via
> >> sysctl kernel parameters or other means?
> >>
> >> Please let me know of any other diagnostic information that would help.
> >
> >Could you explain a little more about this? The message below is from the
> >kernel, telling you that the mlx4_ib (Mellanox Infiniband?) driver is
> >requesting more contiguous memory than is immediately available.
> >
> >So, the questions I have regarding this:
> >
> >1. how is infiniband involved/configured in this environment?
> >2. was there a change/update of the driver (kernel update maybe?)
> >3. do you get a coredump of the glusterfsd process when this happens?
> >4. is this a fuse mount process, or a brick process? (check by PID?)
> >
> >Thanks,
> >Niels
> >
> >
> >>
> >> Thanks,
> >> Patrick
> >>
> >>
> >> > [1458118.134697] glusterfsd: page allocation failure. order:5, mode:0x20
> >> > [1458118.134701] Pid: 6010, comm: glusterfsd Not tainted 2.6.32-573.3.1.el6.x86_64 #1
> >> > [1458118.134702] Call Trace:
> >> > [1458118.134714]  [<ffffffff8113770c>] ? __alloc_pages_nodemask+0x7dc/0x950
> >> > [1458118.134728]  [<ffffffffa0321800>] ? mlx4_ib_post_send+0x680/0x1f90 [mlx4_ib]
> >> > [1458118.134733]  [<ffffffff81176e92>] ? kmem_getpages+0x62/0x170
> >> > [1458118.134735]  [<ffffffff81177aaa>] ? fallback_alloc+0x1ba/0x270
> >> > [1458118.134736]  [<ffffffff811774ff>] ? cache_grow+0x2cf/0x320
> >> > [1458118.134738]  [<ffffffff81177829>] ? ____cache_alloc_node+0x99/0x160
> >> > [1458118.134743]  [<ffffffff8145f732>] ? pskb_expand_head+0x62/0x280
> >> > [1458118.134744]  [<ffffffff81178479>] ? __kmalloc+0x199/0x230
> >> > [1458118.134746]  [<ffffffff8145f732>] ? pskb_expand_head+0x62/0x280
> >> > [1458118.134748]  [<ffffffff8146001a>] ? __pskb_pull_tail+0x2aa/0x360
> >> > [1458118.134751]  [<ffffffff8146f389>] ? harmonize_features+0x29/0x70
> >> > [1458118.134753]  [<ffffffff8146f9f4>] ? dev_hard_start_xmit+0x1c4/0x490
> >> > [1458118.134758]  [<ffffffff8148cf8a>] ? sch_direct_xmit+0x15a/0x1c0
> >> > [1458118.134759]  [<ffffffff8146ff68>] ? dev_queue_xmit+0x228/0x320
> >> > [1458118.134762]  [<ffffffff8147665d>] ? neigh_connected_output+0xbd/0x100
> >> > [1458118.134766]  [<ffffffff814abc67>] ? ip_finish_output+0x287/0x360
> >> > [1458118.134767]  [<ffffffff814abdf8>] ? ip_output+0xb8/0xc0
> >> > [1458118.134769]  [<ffffffff814ab04f>] ? __ip_local_out+0x9f/0xb0
> >> > [1458118.134770]  [<ffffffff814ab085>] ? ip_local_out+0x25/0x30
> >> > [1458118.134772]  [<ffffffff814ab580>] ? ip_queue_xmit+0x190/0x420
> >> > [1458118.134773]  [<ffffffff81137059>] ? __alloc_pages_nodemask+0x129/0x950
> >> > [1458118.134776]  [<ffffffff814c0c54>] ? tcp_transmit_skb+0x4b4/0x8b0
> >> > [1458118.134778]  [<ffffffff814c319a>] ? tcp_write_xmit+0x1da/0xa90
> >> > [1458118.134779]  [<ffffffff81178cbd>] ? __kmalloc_node+0x4d/0x60
> >> > [1458118.134780]  [<ffffffff814c3a80>] ? tcp_push_one+0x30/0x40
> >> > [1458118.134782]  [<ffffffff814b410c>] ? tcp_sendmsg+0x9cc/0xa20
> >> > [1458118.134786]  [<ffffffff8145836b>] ? sock_aio_write+0x19b/0x1c0
> >> > [1458118.134788]  [<ffffffff814581d0>] ? sock_aio_write+0x0/0x1c0
> >> > [1458118.134791]  [<ffffffff8119169b>] ? do_sync_readv_writev+0xfb/0x140
> >> > [1458118.134797]  [<ffffffff810a14b0>] ? autoremove_wake_function+0x0/0x40
> >> > [1458118.134801]  [<ffffffff8123e92f>] ? selinux_file_permission+0xbf/0x150
> >> > [1458118.134804]  [<ffffffff812316d6>] ? security_file_permission+0x16/0x20
> >> > [1458118.134806]  [<ffffffff81192746>] ? do_readv_writev+0xd6/0x1f0
> >> > [1458118.134807]  [<ffffffff811928a6>] ? vfs_writev+0x46/0x60
> >> > [1458118.134809]  [<ffffffff811929d1>] ? sys_writev+0x51/0xd0
> >> > [1458118.134812]  [<ffffffff810e88ae>] ? __audit_syscall_exit+0x25e/0x290
> >> > [1458118.134816]  [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
> >> >
> >
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> Gluster-devel at gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-devel
> >
> 
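
A general note on the failure mode above, for anyone finding this thread
later: "order:5" means the kernel needed 2^5 physically contiguous pages,
which can fail under memory fragmentation even when plenty of free memory
remains. A small sketch of the arithmetic, plus the usual (hedged, not a
confirmed fix for this report) places to look on an affected host:

```shell
# An order:5 allocation asks for 2^5 contiguous pages of 4 KiB each.
order=5
page_kib=4
echo $(( (1 << order) * page_kib ))   # -> 128 (KiB of contiguous memory)
# On an affected host (not executed here):
#   cat /proc/buddyinfo         # free-block counts per order; low counts at
#                               # order >= 5 indicate fragmentation
#   sysctl vm.min_free_kbytes   # raising this keeps larger reserves free
```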

