[Gluster-users] Lots of mount points failing with core dumps, help!

Wed Aug 6 00:54:33 UTC 2014

I think all the mounts that have failed were mounted with 3.4.3 prior to
the update. Not sure why they continued to work for several days before
failing, but remounting them with 3.5 appears to fix the problem. Running
fusermount -zu makes them eventually exit with a core dump.
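
For anyone in the same position, here is a rough sketch of how the remount
could be scripted. It only prints the commands rather than running them. It
assumes the GlusterFS FUSE mounts appear in /proc/mounts with filesystem type
"fuse.glusterfs" and a "server:/volume" device field. The function names are
made up for illustration, and it lists every GlusterFS mount, not just the
ones made before the update.

```shell
#!/bin/sh
# List GlusterFS FUSE mounts as "device mountpoint" pairs.
# Reads a mounts table in /proc/mounts format (default: /proc/mounts),
# whose fields are: device mountpoint fstype options dump pass.
list_gluster_mounts() {
    awk '$3 == "fuse.glusterfs" { print $1, $2 }' "${1:-/proc/mounts}"
}

# Print (do not execute) a lazy unmount followed by a fresh mount
# for each GlusterFS mount found, so the list can be reviewed first.
print_remount_commands() {
    list_gluster_mounts "$1" | while read -r dev mnt; do
        printf 'fusermount -zu %s && mount -t glusterfs %s %s\n' \
            "$mnt" "$dev" "$mnt"
    done
}
```

Calling print_remount_commands with no argument reads /proc/mounts on the
local node. The output can be checked by hand and then piped to sh as root.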

So no more live updates!

Cheers,

On Tue, 2014-08-05 at 14:24 +0800, Franco Broi wrote: 
> On Mon, 2014-08-04 at 12:31 +0200, Niels de Vos wrote: 
> > On Mon, Aug 04, 2014 at 05:05:10PM +0800, Franco Broi wrote:
> > > 
> > > A bit more background to this.
> > > 
> > > I was running 3.4.3 on all the clients (120+ nodes) but I also have a
> > > 3.5 volume which I wanted to mount on the same nodes. The 3.4.3 client
> > > > mounts of the 3.5 volume would sometimes hang on mount, requiring a
> > > > volume stop/start to clear. I raised this issue on this list but it was
> > > never resolved. I also tried to downgrade the 3.5 volume to 3.4 but that
> > > also didn't work.
> > > 
> > > I had a single client node running 3.5 and it was able to mount both
> > > volumes so I decided to update everything on the client side.
> > > 
> > > Middle of last week I did a glusterfs update from 3.4.3 to 3.5.1 and
> > > everything appeared to be ok. The existing 3.4.3 mounts continued to
> > > work and I was able to mount the 3.5 volume without any of the hanging
> > > problems I was seeing before. Great, I thought.
> > > 
> > > Today mount points started to fail, both for the 3.4 volume with the 3.4
> > > client and for the 3.5 volume with the 3.5 client.
> > > 
> > > I've been remounting the filesystems as they break but it's a pretty
> > > unstable environment.
> > > 
> > > BTW, is there some way to get gluster to write its core files somewhere
> > > other than the root filesystem? If I could do that I might at least get
> > > a complete core dump to run gdb on.
> > 
> > You can set a sysctl with a path, for example:
> > 
> >     # mkdir /var/cores
> >     # mount /dev/local_vg/cores /var/cores
> >     # sysctl -w kernel.core_pattern=/var/cores/core
> 
> Thanks for that.
> 
> > 
> > > I am not sure whether the "mismatching layouts" can cause a segmentation
> > > fault. In any case, it would be good to get the extended attributes for
> > > the directories in question. The xattrs contain the hash range (layout)
> > > that determines where the files are located.
> > 
> > For all bricks (replace the "..." with the path for the brick):
> > 
> >    # getfattr -m. -ehex -d .../promax_data/115_endurance/31fasttrackstk
> > 
> > > Please also include the output of "gluster volume info $VOLUME".
> 
> Please see attached.
> 
> 
> > 
> > > You should also file a bug for this; core dumps should definitely not
> > > happen.
> > 
> > Thanks,
> > Niels
> > 
> > 
> > 
> > >
> > > Cheers,
> > > 
> > > On Mon, 2014-08-04 at 12:53 +0530, Pranith Kumar Karampuri wrote: 
> > > > CC dht folks
> > > > 
> > > > Pranith
> > > > On 08/04/2014 11:52 AM, Franco Broi wrote:
> > > > > > > I've had a sudden spate of mount points failing with "Transport endpoint
> > > > > > > not connected" errors and core dumps. The dumps are so large and my root
> > > > > > > partitions so small that I haven't managed to get a decent traceback.
> > > > >
> > > > > BFD: Warning: //core.2351 is truncated: expected core file size >=
> > > > > 165773312, found: 154107904.
> > > > > [New Thread 2351]
> > > > > [New Thread 2355]
> > > > > [New Thread 2359]
> > > > > [New Thread 2356]
> > > > > [New Thread 2354]
> > > > > [New Thread 2360]
> > > > > [New Thread 2352]
> > > > > Cannot access memory at address 0x1700000006
> > > > > (gdb) where
> > > > > #0  glusterfs_signals_setup (ctx=0x8b17c0) at glusterfsd.c:1715
> > > > > Cannot access memory at address 0x7fffaa46b2e0
> > > > >
> > > > >
> > > > > Log file is full of messages like this:
> > > > >
> > > > > [2014-08-04 06:10:11.160482] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> > > > > [2014-08-04 06:10:11.160495] I [dht-layout.c:718:dht_layout_dir_mismatch] 0-data-dht: /promax_data/115_endurance/31fasttrackstk - disk layout missing
> > > > > [2014-08-04 06:10:11.160502] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> > > > > [2014-08-04 06:10:11.160514] I [dht-layout.c:718:dht_layout_dir_mismatch] 0-data-dht: /promax_data/115_endurance/31fasttrackstk - disk layout missing
> > > > > [2014-08-04 06:10:11.160522] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> > > > > [2014-08-04 06:10:11.160622] I [dht-layout.c:718:dht_layout_dir_mismatch] 0-data-dht: /promax_data/115_endurance/31fasttrackstk - disk layout missing
> > > > > [2014-08-04 06:10:11.160634] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> > > > >
> > > > >
> > > > > I'm running 3.5.1 on the client side and 3.4.3 on the server.
> > > > >
> > > > > Any quick help much appreciated.
> > > > >
> > > > > > > Cheers,
> > > > >
> > > > > _______________________________________________
> > > > > Gluster-users mailing list
> > > > > Gluster-users at gluster.org
> > > > > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> > > > 
> > > 
> > > 
>