[Bugs] [Bug 1450378] New: GNFS crashed while taking lock on a file from 2 different clients having same volume mounted from 2 different servers
bugzilla at redhat.com
Fri May 12 11:31:40 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1450378
Bug ID: 1450378
Summary: GNFS crashed while taking lock on a file from 2
different clients having same volume mounted from 2
different servers
Product: GlusterFS
Version: 3.10
Component: nfs
Keywords: Triaged
Priority: medium
Assignee: ndevos at redhat.com
Reporter: ndevos at redhat.com
CC: bugs at gluster.org
Depends On: 1381970
Description of problem:
Mount a volume from 2 different servers to 2 different clients.
Create a file.
Take a lock on that same file from the 2 different clients.
In that case the GNFS server crashed.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create disperseVol 2 x (4 + 2) and enable MDCache and GNFS on it
2. Mount the volume from two different servers to 2 different clients
3. Create a 512-byte file from 1 client on the mount point
4. Take a lock from client 1. The lock is acquired (see the sketch after these
steps)
5. Try taking the lock from client 2. The lock blocks (as it is already held by
client 1)
6. Release the lock from client 1. Take the lock from client 2
7. Again try taking the lock from client 1.
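For reference, a lock of this kind can be taken from each client with fcntl()
on the NFS mount; over NFSv3 the client forwards it to the GNFS server via the
NLM protocol, which is the code path that crashes here. This is only a minimal
sketch, assuming a hypothetical mount point and file /mnt/nfs/testfile; it is
not the exact reproducer used for this bug.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        /* path is illustrative; use the file created in step 3 */
        int fd = open("/mnt/nfs/testfile", O_RDWR);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        struct flock fl = {
                .l_type   = F_WRLCK,   /* exclusive (write) lock */
                .l_whence = SEEK_SET,
                .l_start  = 0,
                .l_len    = 0,         /* 0 = lock the whole file */
        };

        /* F_SETLKW blocks until the lock is granted, so on client 2
           this call should wait while client 1 still holds the lock */
        if (fcntl(fd, F_SETLKW, &fl) < 0) {
                perror("fcntl(F_SETLKW)");
                close(fd);
                return 1;
        }

        printf("lock acquired; press Enter to release\n");
        getchar();

        fl.l_type = F_UNLCK;           /* release the lock */
        fcntl(fd, F_SETLK, &fl);
        close(fd);
        return 0;
}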
Actual results:
The lock is granted to client 1, which should not happen.
That issue is reported in bug https://bugzilla.redhat.com/show_bug.cgi?id=1411338
The GNFS server crashed.
Expected results:
GNFS should handle taking locks from 2 different clients on the same volume
mounted from 2 different servers.
Additional info:
--- Additional comment from Niels de Vos on 2017-01-10 13:30 CET ---
While working on the attached test-script I managed to get a coredump too. This
happened while manually executing the commands I wanted to put into the script.
Now the script is running and has already done 100+ iterations, still with no
crashes...
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id
gluster/nfs -p /var/lib/glusterd/'.
Program terminated with signal 11, Segmentation fault.
#0 __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
164 movdqu (%rdi), %xmm1
(gdb) bt
#0 __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
#1 0x00007fafa65986f2 in nlm_set_rpc_clnt (rpc_clnt=0x7faf8c005200,
caller_name=0x0) at nlm4.c:345
#2 0x00007fafa659b1d5 in nlm_rpcclnt_notify (rpc_clnt=0x7faf8c005200,
mydata=0x7faf9f66b06c, fn=<optimized out>, data=<optimized out>) at nlm4.c:930
#3 0x00007fafb48a0a84 in rpc_clnt_notify (trans=<optimized out>,
mydata=0x7faf8c005230, event=<optimized out>, data=0x7faf8c00cd70) at
rpc-clnt.c:994
#4 0x00007fafb489c973 in rpc_transport_notify (this=this at entry=0x7faf8c00cd70,
event=event at entry=RPC_TRANSPORT_CONNECT, data=data at entry=0x7faf8c00cd70) at
rpc-transport.c:541
#5 0x00007fafa9391c67 in socket_connect_finish (this=0x7faf8c00cd70) at
socket.c:2343
#6 0x00007fafa9396315 in socket_event_handler (fd=<optimized out>, idx=10,
data=0x7faf8c00cd70, poll_in=0, poll_out=4, poll_err=0) at socket.c:2386
#7 0x00007fafb4b2ece0 in event_dispatch_epoll_handler (event=0x7faf9e568e80,
event_pool=0x7fafb545e6e0) at event-epoll.c:571
#8 event_dispatch_epoll_worker (data=0x7fafa0033d50) at event-epoll.c:674
#9 0x00007fafb3937df5 in start_thread (arg=0x7faf9e569700) at
pthread_create.c:308
#10 0x00007fafb327e1ad in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) f 2
#2 0x00007fafa659b1d5 in nlm_rpcclnt_notify (rpc_clnt=0x7faf8c005200,
mydata=0x7faf9f66b06c, fn=<optimized out>, data=<optimized out>) at nlm4.c:930
930 ret = nlm_set_rpc_clnt (rpc_clnt, caller_name);
(gdb) l
925 cs = mydata;
926 caller_name = cs->args.nlm4_lockargs.alock.caller_name;
927
928 switch (fn) {
929 case RPC_CLNT_CONNECT:
930 ret = nlm_set_rpc_clnt (rpc_clnt, caller_name);
931 if (ret == -1) {
932 gf_msg (GF_NLM, GF_LOG_ERROR, 0,
933 NFS_MSG_RPC_CLNT_ERROR, "Failed to set "
934 "rpc clnt");
(gdb) p cs->args.nlm4_lockargs
$1 = {
cookie = {
nlm4_netobj_len = 0,
nlm4_netobj_val = 0x0
},
block = 0,
exclusive = 0,
alock = {
caller_name = 0x0,
fh = {
nlm4_netobj_len = 0,
nlm4_netobj_val = 0x0
},
oh = {
nlm4_netobj_len = 0,
nlm4_netobj_val = 0x0
},
svid = 0,
l_offset = 0,
l_len = 0
},
reclaim = 0,
state = 0
}
It seems that the nlm4_lockargs are empty... No idea how that can happen, will
investigate a little more.
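The backtrace above shows nlm_set_rpc_clnt() being called with caller_name =
0x0 (frame #1), which strcmp() then dereferences at nlm4.c:345. A guard of
roughly the following shape in nlm_rpcclnt_notify() would avoid the NULL
dereference when the lockargs are empty; this is only a sketch based on the
gdb listing above, not the actual fix submitted for this bug.

        /* sketch only: bail out of the RPC_CLNT_CONNECT case when the
         * nlm4_lockargs carry no caller_name, instead of passing NULL
         * into nlm_set_rpc_clnt() where strcmp() would crash */
        cs = mydata;
        caller_name = cs->args.nlm4_lockargs.alock.caller_name;

        switch (fn) {
        case RPC_CLNT_CONNECT:
                if (!caller_name) {
                        gf_msg (GF_NLM, GF_LOG_ERROR, 0,
                                NFS_MSG_RPC_CLNT_ERROR,
                                "no caller_name in nlm4_lockargs, "
                                "dropping RPC_CLNT_CONNECT event");
                        ret = -1;
                        break;
                }
                ret = nlm_set_rpc_clnt (rpc_clnt, caller_name);
                /* rest of the case unchanged */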
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1381970
[Bug 1381970] GlusterFS Daemon stops working after a longer runtime and
higher file workload due to design flaws?