[Bugs] [Bug 1381970] GlusterFS Daemon stops working after a longer runtime and higher file workload due to design flaws ?

bugzilla at redhat.com
Sun Nov 27 12:15:29 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1381970

Giuseppe Ragusa <giuseppe.ragusa at hotmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |giuseppe.ragusa at hotmail.com



--- Comment #2 from Giuseppe Ragusa <giuseppe.ragusa at hotmail.com> ---
I can confirm this bug (or something very similar) on the latest 3.7.17
release on CentOS 7.2.

I have some distributed-replicated volumes (replica 3 with arbiter), some of
which have NFS enabled (Gluster NFS, not NFS-Ganesha).

After a few hours of NFS activity, all NFS translator processes die on the
non-arbiter-brick nodes (the arbiter bricks for all volumes are confined to
one node).

Simply stopping and then restarting one of the NFS-enabled volumes makes the
NFS translator processes restart for all volumes, and all is fine until the
next occurrence.
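
The stop/start workaround described above can be scripted. What follows is
only a minimal sketch, assuming the standard gluster CLI is on the PATH and
that it is run as root on one of the cluster nodes; the volume name "myvol"
and the function name are hypothetical. Note that stopping a volume
interrupts every client of that volume, so this is a stopgap and not a fix.

```python
import subprocess


def restart_volume(volname, run=subprocess.run):
    """Stop and start one Gluster volume, then query its NFS server status.

    `run` is injectable so the command sequence can be checked without a
    live cluster; by default it is subprocess.run.
    """
    commands = [
        # --mode=script suppresses the interactive confirmation prompt
        # that "volume stop" would otherwise ask for.
        ["gluster", "--mode=script", "volume", "stop", volname],
        ["gluster", "--mode=script", "volume", "start", volname],
        # Report whether the Gluster NFS server process is online again.
        ["gluster", "volume", "status", volname, "nfs"],
    ]
    for cmd in commands:
        # check=True raises CalledProcessError if a command fails, so a
        # failed stop is never followed by a blind start.
        run(cmd, check=True)
    return commands


if __name__ == "__main__":
    restart_volume("myvol")  # hypothetical volume name
```

Restarting a single NFS-enabled volume is used here because, as reported
above, that was enough to bring the NFS processes back for all volumes.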

Here are the relevant entries from /var/log/glusterfs/nfs.log for roughly the
same event on the two non-arbiter nodes (clocks are synchronized between
nodes; nothing relevant was found on the arbiter node).

Node names are shockley (arbiter), read and hall.

The Gluster cluster has been formed on a dedicated (3x1Gbps LACP bonded)
interface.
Traffic to and from the NFS clients runs over a separate dedicated (2x1Gbps
LACP bonded) interface.

read:

[2016-11-26 19:07:41.439354] W [socket.c:596:__socket_rwv] 0-NLM-client: readv on 172.25.15.39:4001 failed (No data available)
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2016-11-26 19:07:51
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.17
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7fef26c4b622]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7fef26c70ddd]
/lib64/libc.so.6(+0x35670)[0x7fef25337670]
/lib64/libc.so.6(+0x132ad6)[0x7fef25434ad6]
/usr/lib64/glusterfs/3.7.17/xlator/nfs/server.so(nlm_set_rpc_clnt+0x62)[0x7fef185de912]
/usr/lib64/glusterfs/3.7.17/xlator/nfs/server.so(nlm_rpcclnt_notify+0x35)[0x7fef185e12a5]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x214)[0x7fef26a199b4]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fef26a15843]
/usr/lib64/glusterfs/3.7.17/rpc-transport/socket.so(+0x8d77)[0x7fef1b851d77]
/usr/lib64/glusterfs/3.7.17/rpc-transport/socket.so(+0x919f)[0x7fef1b85219f]
/lib64/libglusterfs.so.0(+0x9554a)[0x7fef26cba54a]
/lib64/libpthread.so.0(+0x7dc5)[0x7fef25ab3dc5]
/lib64/libc.so.6(clone+0x6d)[0x7fef253f8ced]
---------


hall:

[2016-11-26 19:25:42.784519] W [socket.c:596:__socket_rwv] 0-NLM-client: readv on 172.25.15.39:4001 failed (No data available)
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2016-11-26 19:25:52
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.17
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f11e5c0d622]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f11e5c32ddd]
/lib64/libc.so.6(+0x35670)[0x7f11e42f9670]
/lib64/libc.so.6(+0x132ad6)[0x7f11e43f6ad6]
/usr/lib64/glusterfs/3.7.17/xlator/nfs/server.so(nlm_set_rpc_clnt+0x62)[0x7f11d34ab912]
/usr/lib64/glusterfs/3.7.17/xlator/nfs/server.so(nlm_rpcclnt_notify+0x35)[0x7f11d34ae2a5]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x214)[0x7f11e59db9b4]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f11e59d7843]
/usr/lib64/glusterfs/3.7.17/rpc-transport/socket.so(+0x8d77)[0x7f11da813d77]
/usr/lib64/glusterfs/3.7.17/rpc-transport/socket.so(+0x919f)[0x7f11da81419f]
/lib64/libglusterfs.so.0(+0x9554a)[0x7f11e5c7c54a]
/lib64/libpthread.so.0(+0x7dc5)[0x7f11e4a75dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f11e43baced]
---------

Nothing relevant was logged on shockley (the arbiter node).
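
To compare crash dumps like the two above across nodes without reading them
by eye, the signal number, crash time and backtrace symbols can be pulled out
of nfs.log with a small parser. This is only a sketch, assuming the dump
layout shown in the excerpts above; the function names are hypothetical, and
the sample is a shortened copy of the dump from node "read".

```python
import re
from datetime import datetime

# One backtrace frame, e.g. "/lib64/libc.so.6(+0x35670)[0x7fef25337670]".
FRAME_RE = re.compile(r"^(?P<lib>/\S+?)\((?P<sym>[^)]*)\)\[0x[0-9a-f]+\]$")


def parse_crash(text):
    """Extract signal, crash time and (library, symbol) frames from a dump."""
    lines = [line.strip() for line in text.splitlines()]
    info = {"signal": None, "time": None, "frames": []}
    for i, line in enumerate(lines):
        if line.startswith("signal received:"):
            info["signal"] = int(line.split(":", 1)[1])
        elif line == "time of crash:" and i + 1 < len(lines):
            # The timestamp is on the line after the "time of crash:" label.
            info["time"] = datetime.strptime(lines[i + 1], "%Y-%m-%d %H:%M:%S")
        else:
            match = FRAME_RE.match(line)
            if match:
                info["frames"].append((match.group("lib"), match.group("sym")))
    return info


def first_nfs_symbol(info):
    """Return the innermost frame that lies in the NFS translator, if any."""
    for lib, sym in info["frames"]:
        if lib.endswith("/xlator/nfs/server.so"):
            return sym
    return None


# Shortened copy of the dump logged on node "read".
SAMPLE = """\
signal received: 11
time of crash:
2016-11-26 19:07:51
package-string: glusterfs 3.7.17
/lib64/libc.so.6(+0x35670)[0x7fef25337670]
/lib64/libc.so.6(+0x132ad6)[0x7fef25434ad6]
/usr/lib64/glusterfs/3.7.17/xlator/nfs/server.so(nlm_set_rpc_clnt+0x62)[0x7fef185de912]
/usr/lib64/glusterfs/3.7.17/xlator/nfs/server.so(nlm_rpcclnt_notify+0x35)[0x7fef185e12a5]
"""

crash = parse_crash(SAMPLE)
print(crash["signal"], crash["time"], first_nfs_symbol(crash))
```

In both dumps above the innermost NFS translator frame is nlm_set_rpc_clnt,
reached from nlm_rpcclnt_notify, which is the detail such a comparison is
meant to surface.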
