[Gluster-users] CentOS Freeze with GlusterFS Error

Fri Jan 23 05:20:01 UTC 2015

My gut still says it could be related to the multipath.
I never got an answer on whether the bricks are using the multipath'ed
devices through the mpathXX device, or whether you are directly using the
dm-X devices?

If dm-X, then are you ensuring that you are NOT using 2 dm-X devices that
map to the same LUN on the backend SAN?
My hunch is that if you are doing that, xfs'ing the 2 dm-X devices and
using them as separate bricks, anything can happen.
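
To make that check concrete, here is a rough sketch in Python. It is only an
illustration, not something taken from the actual setup: the brick paths,
device names and WWIDs are made up, and it assumes you have already written
down which device each brick sits on and the WWID of each device (for
example, copied by hand from `multipath -ll` output).

```python
from collections import defaultdict


def find_shared_luns(brick_devices, device_wwids):
    """Return {wwid: [brick paths]} for every LUN that backs more than one brick.

    brick_devices: brick path -> block device it is formatted on (hypothetical)
    device_wwids:  block device -> WWID of the backend LUN (hypothetical)
    """
    bricks_by_wwid = defaultdict(list)
    for brick, device in sorted(brick_devices.items()):
        wwid = device_wwids.get(device)
        if wwid is None:
            # Unknown device: skip rather than guess which LUN it maps to.
            continue
        bricks_by_wwid[wwid].append(brick)
    # Keep only the LUNs that are shared by two or more bricks.
    return {w: b for w, b in bricks_by_wwid.items() if len(b) > 1}


if __name__ == "__main__":
    # Made-up inventory for illustration only.
    brick_devices = {
        "/bricks/b1": "dm-2",
        "/bricks/b2": "dm-3",
        "/bricks/b3": "dm-4",
    }
    device_wwids = {
        "dm-2": "wwid-A",
        "dm-3": "wwid-A",  # same LUN as dm-2: this is the bad case
        "dm-4": "wwid-B",
    }
    for wwid, bricks in find_shared_luns(brick_devices, device_wwids).items():
        print("LUN %s backs more than one brick: %s" % (wwid, ", ".join(bricks)))
```

If this reports anything for the real inventory, two bricks are writing
independent XFS filesystems onto the same LUN, which is the situation I am
worried about.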

So try removing multipath, or even before that, stop the glusterfs volumes
(which should stop the glusterfsd processes, hence no IO on the xfs bricks)
and see if this re-creates.
Since we are seeing glusterfsd every time the kernel bug shows up, it may
not be a coincidence but a consequence of an invalid multipath setup.

thanx,
deepak

On Thu, Jan 22, 2015 at 12:57 AM, Niels de Vos <ndevos at redhat.com> wrote:

> On Wed, Jan 21, 2015 at 10:11:20PM +0530, chamara samarakoon wrote:
> > HI All,
> >
> >
> > The same error occurred again before I could try anything else, so I took
> > a screenshot with more details of the incident.
>
> This shows an XFS error. So it can be a problem with XFS, or something
> that contributes to it in the XFS path. I would guess it is caused by an
> issue on the disk(s) because there is a mention of corruption.
> However, it could also be bad RAM, or another hardware component that
> is used to access data from the disks. I suggest you take two
> approaches:
>
> 1. run hardware tests - if the error is detected, contact your HW vendor
> 2. open a support case with the vendor of the OS and check for updates
>
> Gluster can stress filesystems in ways that are not very common, and
> there have been issues found in XFS due to this. Your OS support vendor
> should be able to tell you if the latest and related XFS fixes are
> included in your kernel.
>
> HTH,
> Niels
>
> >
> >
> > 
> >
> > Thank You,
> > Chamara
> >
> >
> >
> > On Tue, Jan 20, 2015 at 5:33 PM, chamara samarakoon <chthsa123 at gmail.com>
> > wrote:
> >
> > > HI All,
> > >
> > > Thank you for the valuable feedback. I will test the suggested solutions
> > > and update the thread.
> > >
> > > Regards,
> > > Chamara
> > >
> > > On Tue, Jan 20, 2015 at 4:17 PM, Deepak Shetty <dpkshetty at gmail.com>
> > > wrote:
> > >
> > >> In addition, I would also like to add that I do suspect (just my hunch)
> > >> that it could be related to multipath.
> > >> If you can try without multipath and it doesn't re-create, I think
> > >> that would be a good data point for the kernel/OS vendor to debug further.
> > >>
> > >> my 2 cents again :)
> > >>
> > >> thanx,
> > >> deepak
> > >>
> > >>
> > >> On Tue, Jan 20, 2015 at 2:32 PM, Niels de Vos <ndevos at redhat.com> wrote:
> > >>
> > >>> On Tue, Jan 20, 2015 at 11:55:40AM +0530, Deepak Shetty wrote:
> > >>> > What does "Controller" mean: the OpenStack controller node or something
> > >>> > else (like an HBA)?
> > >>> > Your picture says it's SAN but the text says multi-path mount. SAN would
> > >>> > mean block devices, so I am assuming you have redundant block devices on
> > >>> > the compute host, and you are mkfs'ing them and then creating bricks for
> > >>> > gluster?
> > >>> >
> > >>> >
> > >>> > The stack trace looks like you hit a kernel bug and glusterfsd happens to
> > >>> > be running on the CPU at the time... my 2 cents
> > >>>
> > >>> That definitely is a kernel issue. You should contact your OS support
> > >>> vendor about this.
> > >>>
> > >>> The bits you copy/pasted are not sufficient to see what caused it. The
> > >>> glusterfsd process is just a casualty of the kernel issue, and it is not
> > >>> likely this can be fixed in Gluster. I suspect you need a kernel
> > >>> patch/update.
> > >>>
> > >>> Niels
> > >>>
> > >>> >
> > >>> > thanx,
> > >>> > deepak
> > >>> >
> > >>> > On Tue, Jan 20, 2015 at 11:29 AM, chamara samarakoon <chthsa123 at gmail.com>
> > >>> > wrote:
> > >>> >
> > >>> > > Hi All,
> > >>> > >
> > >>> > >
> > >>> > > We have set up an OpenStack cloud as below, and
> > >>> > > "/var/lib/nova/instances" is a Gluster volume.
> > >>> > >
> > >>> > > CentOS - 6.5
> > >>> > > Kernel -  2.6.32-431.29.2.el6.x86_64
> > >>> > > GlusterFS - glusterfs 3.5.2 built on Jul 31 2014 18:47:54
> > >>> > > OpenStack - RDO using Packstack
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > > 
> > >>> > >
> > >>> > >
> > >>> > > Recently the Controller node freezes with the following error (which
> > >>> > > requires a hard reboot). As a result, the Gluster volumes on the compute
> > >>> > > nodes cannot reach the controller, and because of that all the instances
> > >>> > > on the compute nodes go read-only, which forces us to restart all
> > >>> > > instances.
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > > *BUG: scheduling while atomic : glusterfsd/42725/0xffffffff*
> > >>> > > *BUG: unable to handle kernel paging request at 0000000038a60d0a8*
> > >>> > > *IP: [<fffffffff81058e5d>] task_rq_lock+0x4d/0xa0*
> > >>> > > *PGD 1065525067 PUD 0*
> > >>> > > *Oops: 0000 [#1] SMP*
> > >>> > > *last sysfs file : /sys/device/pci0000:80/0000:80:02.0/0000:86:00.0/host2/port-2:0/end_device-2:0/target2:0:0/2:0:0:1/state*
> > >>> > > *CPU 0*
> > >>> > > *Modules linked in : xtconntrack iptable_filter ip_tables ipt_REDIRECT
> > >>> > > fuse ipv openvswitch vxlan iptable_mangle *
> > >>> > >
> > >>> > > Please advise on the above incident. Feedback on the OpenStack +
> > >>> > > GlusterFS setup is also appreciated.
> > >>> > >
> > >>> > > Thank You,
> > >>> > > Chamara
> > >>> > >
> > >>> > >
> > >>> > > _______________________________________________
> > >>> > > Gluster-users mailing list
> > >>> > > Gluster-users at gluster.org
> > >>> > > http://www.gluster.org/mailman/listinfo/gluster-users
> > >>> > >
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> > >
> > >
> > > --
> > > chthsa
> > >
> >
> >
> >
> > --
> > chthsa
>
>
>