[Gluster-users] Lots of mount points failing with core dumps, help!

Mon Aug 4 09:05:10 UTC 2014

A bit more background to this.

I was running 3.4.3 on all the clients (120+ nodes), but I also have a
3.5 volume that I wanted to mount on the same nodes. The 3.4.3 client
mounts of the 3.5 volume would sometimes hang at mount time, and
clearing the hang required a volume stop/start. I raised this issue on
this list but it was never resolved. I also tried to downgrade the 3.5
volume to 3.4, but that didn't work either.

I had a single client node running 3.5 that was able to mount both
volumes, so I decided to update everything on the client side.

In the middle of last week I updated glusterfs from 3.4.3 to 3.5.1 and
everything appeared to be OK. The existing 3.4.3 mounts continued to
work and I was able to mount the 3.5 volume without any of the hanging
problems I was seeing before. Great, I thought.

Today mount points started to fail, both for the 3.4 volume with the 3.4
client and for the 3.5 volume with the 3.5 client.

I've been remounting the filesystems as they break, but the environment
is pretty unstable.

BTW, is there some way to get gluster to write its core files somewhere
other than the root filesystem? If I could do that I might at least get
a complete core dump to run gdb on.
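
For reference, on Linux the location of core files is normally governed
by the kernel's core_pattern setting rather than by gluster itself. The
following is a minimal sketch under that assumption; the directory
/data/cores and the helper names are hypothetical, and writing the
setting requires root.

```python
import os
import shutil

# Kernel setting that controls where core files are written (Linux).
CORE_PATTERN_FILE = "/proc/sys/kernel/core_pattern"


def build_core_pattern(directory):
    """Return a core_pattern that places cores in `directory`.

    %e = executable name, %p = PID, %t = UNIX time of the dump.
    """
    return os.path.join(os.path.abspath(directory), "core.%e.%p.%t")


def has_room(directory, needed_bytes):
    """True if the filesystem holding `directory` has enough free space."""
    return shutil.disk_usage(directory).free >= needed_bytes


def set_core_pattern(directory):
    """Point the kernel at `directory` (root only, lost on reboot)."""
    with open(CORE_PATTERN_FILE, "w") as f:
        f.write(build_core_pattern(directory) + "\n")


if __name__ == "__main__":
    target = "/data/cores"  # hypothetical directory on a larger filesystem
    print(build_core_pattern(target))
    if os.path.isdir(target):
        # 165773312 is the expected core size reported by gdb in this thread.
        print("enough room:", has_room(target, 165773312))
```

The process's core size limit (RLIMIT_CORE) also has to allow a dump of
that size, otherwise the core is still truncated.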

Cheers,

On Mon, 2014-08-04 at 12:53 +0530, Pranith Kumar Karampuri wrote: 
> CC dht folks
> 
> Pranith
> On 08/04/2014 11:52 AM, Franco Broi wrote:
> > I've had a sudden spate of mount points failing with Transport endpoint
> > not connected and core dumps. The dumps are so large and my root
> > partitions so small that I haven't managed to get a decent traceback.
> >
> > BFD: Warning: //core.2351 is truncated: expected core file size >=
> > 165773312, found: 154107904.
> > [New Thread 2351]
> > [New Thread 2355]
> > [New Thread 2359]
> > [New Thread 2356]
> > [New Thread 2354]
> > [New Thread 2360]
> > [New Thread 2352]
> > Cannot access memory at address 0x1700000006
> > (gdb) where
> > #0  glusterfs_signals_setup (ctx=0x8b17c0) at glusterfsd.c:1715
> > Cannot access memory at address 0x7fffaa46b2e0
> >
> >
> > Log file is full of messages like this:
> >
> > [2014-08-04 06:10:11.160482] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> > [2014-08-04 06:10:11.160495] I [dht-layout.c:718:dht_layout_dir_mismatch] 0-data-dht: /promax_data/115_endurance/31fasttrackstk - disk layout missing
> > [2014-08-04 06:10:11.160502] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> > [2014-08-04 06:10:11.160514] I [dht-layout.c:718:dht_layout_dir_mismatch] 0-data-dht: /promax_data/115_endurance/31fasttrackstk - disk layout missing
> > [2014-08-04 06:10:11.160522] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> > [2014-08-04 06:10:11.160622] I [dht-layout.c:718:dht_layout_dir_mismatch] 0-data-dht: /promax_data/115_endurance/31fasttrackstk - disk layout missing
> > [2014-08-04 06:10:11.160634] I [dht-common.c:623:dht_revalidate_cbk] 0-data-dht: mismatching layouts for /promax_data/115_endurance/31fasttrackstk
> >
> >
> > I'm running 3.5.1 on the client side and 3.4.3 on the server.
> >
> > Any quick help much appreciated.
> >
> > Cheers,
> >
>
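
Since the quoted client log repeats the same two dht messages many
times, it can help to condense it to one count per function and path
before looking for a pattern. This is a minimal sketch that assumes the
log line format shown in the excerpt above; the function name summarize
is hypothetical.

```python
import re
from collections import Counter

# Matches: [timestamp] LEVEL [file.c:line:function] translator: message
LOG_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<xlator>\S+):\s+(?P<msg>.*)"
)


def summarize(lines):
    """Count log messages per (function, path); skip non-matching lines."""
    counts = Counter()
    for raw in lines:
        m = LOG_RE.search(raw)  # search() tolerates "> > " quote prefixes
        if not m:
            continue
        path = re.search(r"(/\S+)", m.group("msg"))
        counts[(m.group("func"), path.group(1) if path else None)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < client.log
    for (func, path), n in summarize(sys.stdin).most_common():
        print(n, func, path)
```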