[Bugs] [Bug 1450378] New: GNFS crashed while taking lock on a file from 2 different clients having same volume mounted from 2 different servers
bugzilla at redhat.com
Fri May 12 11:31:40 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1450378
Bug ID: 1450378
Summary: GNFS crashed while taking lock on a file from 2
different clients having same volume mounted from 2
different servers
Product: GlusterFS
Version: 3.10
Component: nfs
Keywords: Triaged
Priority: medium
Assignee: ndevos at redhat.com
Reporter: ndevos at redhat.com
CC: bugs at gluster.org
Depends On: 1381970
Description of problem:
Mount a volume from 2 different servers to 2 different clients.
Create a file.
Take a lock on that same file from the 2 different clients.
In that case the GNFS server crashed.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create disperseVol 2 x (4 + 2) and enable MDCache and GNFS on it
2. Mount the volume from two different servers to 2 different clients
3. Create a 512-byte file from 1 client on the mount point
4. Take a lock from client 1. The lock is acquired (see the sketch after these
steps)
5. Try taking the lock from client 2. The lock blocks (as it is already held by
client 1)
6. Release the lock from client 1. Take the lock from client 2
7. Again try taking the lock from client 1.
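For reference, a lock of this kind can be taken from each client with fcntl()
on the NFS mount; over NFSv3 the client forwards it to the GNFS server via the
NLM protocol, which is the code path that crashes here. This is only a minimal
sketch, assuming a hypothetical mount point and file /mnt/nfs/testfile; it is
not the exact reproducer used for this bug.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        /* path is illustrative; use the file created in step 3 */
        int fd = open("/mnt/nfs/testfile", O_RDWR);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        struct flock fl = {
                .l_type   = F_WRLCK,   /* exclusive (write) lock */
                .l_whence = SEEK_SET,
                .l_start  = 0,
                .l_len    = 0,         /* 0 = lock the whole file */
        };

        /* F_SETLKW blocks until the lock is granted, so on client 2
           this call should wait while client 1 still holds the lock */
        if (fcntl(fd, F_SETLKW, &fl) < 0) {
                perror("fcntl(F_SETLKW)");
                close(fd);
                return 1;
        }

        printf("lock acquired; press Enter to release\n");
        getchar();

        fl.l_type = F_UNLCK;           /* release the lock */
        fcntl(fd, F_SETLK, &fl);
        close(fd);
        return 0;
}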
Actual results:
The lock is granted to client 1, which should not happen.
That issue is reported in bug https://bugzilla.redhat.com/show_bug.cgi?id=1411338
The GNFS server crashed.
Expected results:
GNFS should handle taking locks from 2 different clients on the same volume
mounted from 2 different servers.
Additional info:
--- Additional comment from Niels de Vos on 2017-01-10 13:30 CET ---
While working on the attached test-script I managed to get a coredump too. This
happened while manually executing the commands I wanted to put into the script.
Now the script is running and has already done 100+ iterations, still with no
crashes...
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id
gluster/nfs -p /var/lib/glusterd/'.
Program terminated with signal 11, Segmentation fault.
#0 __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
164 movdqu (%rdi), %xmm1
(gdb) bt
#0 __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
#1 0x00007fafa65986f2 in nlm_set_rpc_clnt (rpc_clnt=0x7faf8c005200,
caller_name=0x0) at nlm4.c:345
#2 0x00007fafa659b1d5 in nlm_rpcclnt_notify (rpc_clnt=0x7faf8c005200,
mydata=0x7faf9f66b06c, fn=<optimized out>, data=<optimized out>) at nlm4.c:930
#3 0x00007fafb48a0a84 in rpc_clnt_notify (trans=<optimized out>,
mydata=0x7faf8c005230, event=<optimized out>, data=0x7faf8c00cd70) at
rpc-clnt.c:994
#4 0x00007fafb489c973 in rpc_transport_notify (this=this at entry=0x7faf8c00cd70,
event=event at entry=RPC_TRANSPORT_CONNECT, data=data at entry=0x7faf8c00cd70) at
rpc-transport.c:541
#5 0x00007fafa9391c67 in socket_connect_finish (this=0x7faf8c00cd70) at
socket.c:2343
#6 0x00007fafa9396315 in socket_event_handler (fd=<optimized out>, idx=10,
data=0x7faf8c00cd70, poll_in=0, poll_out=4, poll_err=0) at socket.c:2386
#7 0x00007fafb4b2ece0 in event_dispatch_epoll_handler (event=0x7faf9e568e80,
event_pool=0x7fafb545e6e0) at event-epoll.c:571
#8 event_dispatch_epoll_worker (data=0x7fafa0033d50) at event-epoll.c:674
#9 0x00007fafb3937df5 in start_thread (arg=0x7faf9e569700) at
pthread_create.c:308
#10 0x00007fafb327e1ad in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) f 2
#2 0x00007fafa659b1d5 in nlm_rpcclnt_notify (rpc_clnt=0x7faf8c005200,
mydata=0x7faf9f66b06c, fn=<optimized out>, data=<optimized out>) at nlm4.c:930
930 ret = nlm_set_rpc_clnt (rpc_clnt, caller_name);
(gdb) l
925 cs = mydata;
926 caller_name = cs->args.nlm4_lockargs.alock.caller_name;
927
928 switch (fn) {
929 case RPC_CLNT_CONNECT:
930 ret = nlm_set_rpc_clnt (rpc_clnt, caller_name);
931 if (ret == -1) {
932 gf_msg (GF_NLM, GF_LOG_ERROR, 0,
933 NFS_MSG_RPC_CLNT_ERROR, "Failed to set "
934 "rpc clnt");
(gdb) p cs->args.nlm4_lockargs
$1 = {
cookie = {
nlm4_netobj_len = 0,
nlm4_netobj_val = 0x0
},
block = 0,
exclusive = 0,
alock = {
caller_name = 0x0,
fh = {
nlm4_netobj_len = 0,
nlm4_netobj_val = 0x0
},
oh = {
nlm4_netobj_len = 0,
nlm4_netobj_val = 0x0
},
svid = 0,
l_offset = 0,
l_len = 0
},
reclaim = 0,
state = 0
}
It seems that the nlm4_lockargs are empty... No idea how that can happen, will
investigate a little more.
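The backtrace above shows nlm_set_rpc_clnt() being called with caller_name =
0x0 (frame #1), which strcmp() then dereferences at nlm4.c:345. A guard of
roughly the following shape in nlm_rpcclnt_notify() would avoid the NULL
dereference when the lockargs are empty; this is only a sketch based on the
gdb listing above, not the actual fix submitted for this bug.

        /* sketch only: bail out of the RPC_CLNT_CONNECT case when the
         * nlm4_lockargs carry no caller_name, instead of passing NULL
         * into nlm_set_rpc_clnt() where strcmp() would crash */
        cs = mydata;
        caller_name = cs->args.nlm4_lockargs.alock.caller_name;

        switch (fn) {
        case RPC_CLNT_CONNECT:
                if (!caller_name) {
                        gf_msg (GF_NLM, GF_LOG_ERROR, 0,
                                NFS_MSG_RPC_CLNT_ERROR,
                                "no caller_name in nlm4_lockargs, "
                                "dropping RPC_CLNT_CONNECT event");
                        ret = -1;
                        break;
                }
                ret = nlm_set_rpc_clnt (rpc_clnt, caller_name);
                /* rest of the case unchanged */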
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1381970
[Bug 1381970] GlusterFS Daemon stops working after a longer runtime and
higher file workload due to design flaws?