[Gluster-users] glusterfs client crashes

Tue Feb 23 15:53:29 UTC 2016

2.17-106.el7 is the latest glibc available on CentOS 7. I tried the one-liner
on older versions as well, and the test reports them as "likely buggy" too.

I also found this CentOS issue: https://bugs.centos.org/view.php?id=10426

# rpm -qa | grep glibc
glibc-2.17-106.el7_2.4.x86_64
glibc-common-2.17-106.el7_2.4.x86_64

# objdump -r -d /lib64/libc.so.6 | grep -C 20 _int_free | grep -C 10 cmpxchg \
    | head -21 | grep -A 3 cmpxchg | tail -1 \
    | (grep '%r' && echo "Your libc is likely buggy." || echo "Your libc looks OK.")
   7ca3e: 48 85 c9             test   %rcx,%rcx
Your libc is likely buggy.
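As a complementary check that does not depend on disassembly, a minimal sketch could compare the installed glibc release against 2.17-121.el7, the build the Red Hat bug quoted below reports as fixed. The function name glibc_release_is_fixed is hypothetical, and the comparison only looks at the leading release number, so a fix backported into a z-stream build (an el7_2.x update, for example) would not be recognized.

```shell
# Minimal sketch (assumption: CentOS/RHEL 7, glibc VERSION is 2.17).
# Flags glibc packages whose release predates 2.17-121.el7, the build the
# Red Hat bug in this thread reports as fixed. Only the leading release
# number is compared; z-stream backports (e.g. 106.el7_2.x) are NOT
# recognized as fixed by this naive check.

FIXED_RELEASE=121   # taken from "glibc-2.17-121.el7" in the bug report

# Hypothetical helper: returns 0 (true) if a release string such as
# "106.el7_2.4" starts with a number >= FIXED_RELEASE, 1 otherwise.
glibc_release_is_fixed() {
    rel_num=${1%%[!0-9]*}   # keep only the leading digits ("106.el7_2.4" -> "106")
    [ -n "$rel_num" ] && [ "$rel_num" -ge "$FIXED_RELEASE" ]
}

# Only query rpm on an RPM-based host where glibc is actually installed.
if command -v rpm >/dev/null 2>&1 && rpm -q glibc >/dev/null 2>&1; then
    # One line per installed arch, e.g. "2.17-106.el7_2.4.x86_64".
    rpm -q --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' glibc |
    while read -r pkg; do
        rel=${pkg#*-}       # drop the "2.17-" version prefix
        if glibc_release_is_fixed "$rel"; then
            echo "$pkg: at or after the reported fix"
        else
            echo "$pkg: older than 2.17-121.el7, possibly affected"
        fi
    done
fi
```

On the host above this would report glibc-2.17-106.el7_2.4 as older than the fixed build, which agrees with the disassembly test but says nothing about whether a given backport contains the fix.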

Kind regards,
Fredrik Widlund

On Tue, Feb 23, 2016 at 4:27 PM, Raghavendra Gowdappa <rgowdapp at redhat.com>
wrote:

> We came across a glibc bug that could have caused some of these
> corruptions. While searching for possible problems, we found an issue (
> https://bugzilla.redhat.com/show_bug.cgi?id=1305406) that is fixed in
> glibc-2.17-121.el7. The bug report includes the following test script to
> determine whether a given glibc is affected, and we ran it on our local
> setup as described in the bug:
>
> ----------------
> # objdump -r -d /lib64/libc.so.6 | grep -C 20 _int_free | grep -C 10 cmpxchg \
>     | head -21 | grep -A 3 cmpxchg | tail -1 \
>     | (grep '%r' && echo "Your libc is likely buggy." || echo "Your libc looks OK.")
>
>    7cc36:    48 85 c9                 test   %rcx,%rcx
> Your libc is likely buggy.
> ----------------
>
> Could you check whether running the above command on your setup gives the
> same output, i.e. "Your libc is likely buggy."?
>
> Thanks to Nithya, Krutika and Pranith for working on this.
>
> ----- Original Message -----
> > From: "Fredrik Widlund" <fredrik.widlund at gmail.com>
> > To: gluster at deej.net
> > Cc: gluster-users at gluster.org
> > Sent: Tuesday, February 23, 2016 5:51:37 PM
> > Subject: Re: [Gluster-users] glusterfs client crashes
> >
> > Hi,
> >
> > I have experienced what looks like a very similar crash: Gluster 3.7.6 on
> > CentOS 7. There were no errors on the bricks or on the other clients
> > mounted at the time, and the load was relatively high.
> >
> > Remounting the filesystem brought it back online.
> >
> >
> > pending frames:
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(STAT)
> > frame : type(1) op(STAT)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(1) op(READ)
> > frame : type(0) op(0)
> > patchset: git://git.gluster.com/glusterfs.git
> > signal received: 6
> > time of crash:
> > 2016-02-22 10:28:45
> > configuration details:
> > argp 1
> > backtrace 1
> > dlfcn 1
> > libpthread 1
> > llistxattr 1
> > setfsid 1
> > spinlock 1
> > epoll.h 1
> > xattr.h 1
> > st_atim.tv_nsec 1
> > package-string: glusterfs 3.7.6
> > /lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f83387f7012]
> > /lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f83388134dd]
> > /lib64/libc.so.6(+0x35670)[0x7f8336ee5670]
> > /lib64/libc.so.6(gsignal+0x37)[0x7f8336ee55f7]
> > /lib64/libc.so.6(abort+0x148)[0x7f8336ee6ce8]
> > /lib64/libc.so.6(+0x75317)[0x7f8336f25317]
> > /lib64/libc.so.6(+0x7cfe1)[0x7f8336f2cfe1]
> > /lib64/libglusterfs.so.0(loc_wipe+0x27)[0x7f83387f4d47]
> > /usr/lib64/glusterfs/3.7.6/xlator/performance/md-cache.so(mdc_local_wipe+0x11)[0x7f8329c8e5f1]
> > /usr/lib64/glusterfs/3.7.6/xlator/performance/md-cache.so(mdc_stat_cbk+0x10c)[0x7f8329c8f4fc]
> > /lib64/libglusterfs.so.0(default_stat_cbk+0xac)[0x7f83387fcc5c]
> > /usr/lib64/glusterfs/3.7.6/xlator/cluster/distribute.so(dht_file_attr_cbk+0x149)[0x7f832ab2a409]
> > /usr/lib64/glusterfs/3.7.6/xlator/protocol/client.so(client3_3_stat_cbk+0x3c6)[0x7f832ad6d266]
> > /lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f83385c5b80]
> > /lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7f83385c5e3f]
> > /lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f83385c1983]
> > /usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0x9506)[0x7f832d261506]
> > /usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0xc3f4)[0x7f832d2643f4]
> > /lib64/libglusterfs.so.0(+0x878ea)[0x7f83388588ea]
> > /lib64/libpthread.so.0(+0x7dc5)[0x7f833765fdc5]
> > /lib64/libc.so.6(clone+0x6d)[0x7f8336fa621d]
> >
> >
> >
> > Kind regards,
> > Fredrik Widlund
> >
> > On Tue, Feb 23, 2016 at 1:00 PM, <gluster-users-request at gluster.org>
> > wrote:
> >
> >
> > Date: Mon, 22 Feb 2016 15:08:47 -0500
> > From: Dj Merrill <gluster at deej.net>
> > To: Gaurav Garg <ggarg at redhat.com>
> > Cc: gluster-users at gluster.org
> > Subject: Re: [Gluster-users] glusterfs client crashes
> > Message-ID: <56CB6ACF.5080408 at deej.net>
> > Content-Type: text/plain; charset=utf-8; format=flowed
> >
> > On 2/21/2016 2:23 PM, Dj Merrill wrote:
> > > Very interesting. They were reporting both bricks offline, but the
> > > processes on both servers were still running. Restarting glusterfsd on
> > > one of the servers brought them both back online.
> >
> > I realize I wasn't clear in my comments yesterday and would like to
> > elaborate on this a bit further. The "very interesting" comment was
> > sparked because when we were running 3.7.6, the bricks were not
> > reporting as offline when a client was having an issue, so this is new
> > behaviour now that we are running 3.7.8 (or a different issue entirely).
> >
> > The other point that I was not clear on is that we may have one client
> > reporting the "Transport endpoint is not connected" error, but the other
> > 40+ clients all continue to work properly. This is the case with both
> > 3.7.6 and 3.7.8.
> >
> > I'm curious: how can the other clients continue to work fine if both
> > Gluster 3.7.8 servers are reporting the bricks as offline?
> >
> > What does "offline" mean in this context?
> >
> >
> > Regarding the server logs, here is what I have found so far on both
> > Gluster servers (glusterfs1 and glusterfs2):
> >
> > [2016-02-21 08:06:02.785788] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
> > 0-glusterfs: No change in volfile, continuing
> > [2016-02-21 18:48:20.677010] W [socket.c:588:__socket_rwv]
> > 0-gv0-client-1: readv on (sanitized IP of glusterfs2):49152 failed (No
> > data available)
> > [2016-02-21 18:48:20.677096] I [MSGID: 114018]
> > [client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from
> > gv0-client-1. Client process will keep trying to connect to glusterd
> > until brick's port is available
> > [2016-02-21 18:48:31.148564] E [MSGID: 114058]
> > [client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-1:
> > failed to get the port number for remote subvolume. Please run 'gluster
> > volume status' on server to see if brick process is running.
> > [2016-02-21 18:48:40.941715] W [socket.c:588:__socket_rwv] 0-glusterfs:
> > readv on (sanitized IP of glusterfs2):24007 failed (No data available)
> > [2016-02-21 18:48:51.184424] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
> > 0-glusterfs: No change in volfile, continuing
> > [2016-02-21 18:48:51.972068] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
> > 0-mgmt: Volume file changed
> > [2016-02-21 18:48:51.980210] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
> > 0-mgmt: Volume file changed
> > [2016-02-21 18:48:51.985211] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
> > 0-mgmt: Volume file changed
> > [2016-02-21 18:48:51.995002] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
> > 0-mgmt: Volume file changed
> > [2016-02-21 18:48:53.006079] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
> > 0-glusterfs: No change in volfile, continuing
> > [2016-02-21 18:48:53.018104] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
> > 0-glusterfs: No change in volfile, continuing
> > [2016-02-21 18:48:53.024060] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
> > 0-glusterfs: No change in volfile, continuing
> > [2016-02-21 18:48:53.035170] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
> > 0-glusterfs: No change in volfile, continuing
> > [2016-02-21 18:48:53.045637] I [rpc-clnt.c:1847:rpc_clnt_reconfig]
> > 0-gv0-client-1: changing port to 49152 (from 0)
> > [2016-02-21 18:48:53.051991] I [MSGID: 114057]
> > [client-handshake.c:1437:select_server_supported_programs]
> > 0-gv0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> > [2016-02-21 18:48:53.052439] I [MSGID: 114046]
> > [client-handshake.c:1213:client_setvolume_cbk] 0-gv0-client-1: Connected
> > to gv0-client-1, attached to remote volume '/export/brick1/sdb1'.
> > [2016-02-21 18:48:53.052486] I [MSGID: 114047]
> > [client-handshake.c:1224:client_setvolume_cbk] 0-gv0-client-1: Server
> > and Client lk-version numbers are not same, reopening the fds
> > [2016-02-21 18:48:53.052668] I [MSGID: 114035]
> > [client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-1:
> > Server lk version = 1
> > [2016-02-21 18:48:31.148706] I [MSGID: 114018]
> > [client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from
> > gv0-client-1. Client process will keep trying to connect to glusterd
> > until brick's port is available
> > [2016-02-21 18:49:12.271865] W [socket.c:588:__socket_rwv] 0-glusterfs:
> > readv on (sanitized IP of glusterfs2):24007 failed (No data available)
> > [2016-02-21 18:49:15.637745] W [socket.c:588:__socket_rwv]
> > 0-gv0-client-1: readv on (sanitized IP of glusterfs2):49152 failed (No
> > data available)
> > [2016-02-21 18:49:15.637824] I [MSGID: 114018]
> > [client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from
> > gv0-client-1. Client process will keep trying to connect to glusterd
> > until brick's port is available
> > [2016-02-21 18:49:24.198431] E [socket.c:2278:socket_connect_finish]
> > 0-glusterfs: connection to (sanitized IP of glusterfs2):24007 failed
> > (Connection refused)
> > [2016-02-21 18:49:26.204811] E [socket.c:2278:socket_connect_finish]
> > 0-gv0-client-1: connection to (sanitized IP of glusterfs2):24007 failed
> > (Connection refused)
> > [2016-02-21 18:49:38.366559] I [MSGID: 108031]
> > [afr-common.c:1883:afr_local_discovery_cbk] 0-gv0-replicate-0: selecting
> > local read_child gv0-client-0
> > [2016-02-21 18:50:54.605535] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
> > 0-glusterfs: No change in volfile, continuing
> > [2016-02-21 18:50:54.605639] E [MSGID: 114058]
> > [client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-1:
> > failed to get the port number for remote subvolume. Please run 'gluster
> > volume status' on server to see if brick process is running.
> >
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users
>
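The server-side logs quoted above mix informational (I) entries with the warnings (W) and errors (E) that matter when chasing disconnects such as the gv0-client-1 ones. A minimal sketch for filtering them, assuming the one-entry-per-line layout GlusterFS writes to disk (the wrapping above comes from the mail client); the function name gluster_log_problems and the log path in the usage line are hypothetical.

```shell
# Minimal sketch: print only warning (W) and error (E) entries from a
# GlusterFS log, assuming one entry per line in the layout seen above:
#   [YYYY-MM-DD HH:MM:SS.usec] LEVEL [source:line:function] message
# Hypothetical helper name; reads the given files, or stdin if none.
gluster_log_problems() {
    # With default whitespace splitting, fields 1-2 are the bracketed
    # timestamp and field 3 is the single-letter severity (I, W, E, ...).
    awk '$3 == "W" || $3 == "E"' "$@"
}
```

Usage, with a hypothetical client log path: `gluster_log_problems /var/log/glusterfs/mnt-gv0.log | grep -F gv0-client-1` narrows the output to the brick connection that kept dropping.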