[Bugs] [Bug 1184417] New: Segmentation fault in locks while disconnecting client

bugzilla at redhat.com bugzilla at redhat.com
Wed Jan 21 10:42:16 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1184417

            Bug ID: 1184417
           Summary: Segmentation fault in locks while disconnecting client
           Product: GlusterFS
           Version: mainline
         Component: locks
          Assignee: bugs at gluster.org
          Reporter: xhernandez at datalab.es
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Description of problem:

When a client holding a lock disconnects without releasing it, and a
clear-locks operation has been executed on that lock after it was acquired,
the glusterfsd process on the brick dies with a segmentation fault.

Version-Release number of selected component (if applicable): mainline


How reproducible:

Always in the described situation, but it is hard to trigger under normal
circumstances. It can show up as a side effect of another bug.

Steps to Reproduce:
1. Check out revision 63dc6e1942dffcddd99c5048a498ca00eead8baa (this revision
is just before the patch that fixes the bug on the ec side)
2. Compile and install
3. glusterd
4. gluster volume create test disperse 3 redundancy 1 server:/bricks/test{1..3}
force
5. gluster volume start test
6. mount -t glusterfs server:/test /gluster/test
7. gluster volume clear-locks test / kind all inode

The last step causes the crash on the bricks.

Actual results:

Segmentation fault on bricks.

Expected results:

Nothing special or visible should happen.

Additional info:

This is a backtrace of a crash:

Program received signal SIGSEGV, Segmentation fault.
0x00007fc901daf145 in lkowner_unparse (buf_len=2176, 
    buf=0x1f7aff0
"00000000471a2536-320000005fa3a797-ff7f000018000000-3000000020a4a797-ff7f000060a3a797-ff7f",
'0' <repeats 12 times>, "-", '0' <repeats 16 times>,
"-0000000050a1fe01-0000000070d4fc01-0000000088b2f301-0000000049652e36-32", '0'
<repeats 11 times>..., lkowner=0x7fff97a7a330) at lkowner.h:43
43                      sprintf (&buf[j], "%02hhx", lkowner->data[i]);
(gdb) bt
#0  0x00007fc901daf145 in lkowner_unparse (buf_len=2176, 
    buf=0x1f7aff0
"00000000471a2536-320000005fa3a797-ff7f000018000000-3000000020a4a797-ff7f000060a3a797-ff7f",
'0' <repeats 12 times>, "-", '0' <repeats 16 times>,
"-0000000050a1fe01-0000000070d4fc01-0000000088b2f301-0000000049652e36-32", '0'
<repeats 11 times>..., lkowner=0x7fff97a7a330) at lkowner.h:43
#1  lkowner_utoa (lkowner=lkowner@entry=0x7fff97a7a330) at common-utils.c:2177
#2  0x00007fc8f6e151c5 in pl_inodelk_log_cleanup (lock=0x7fff97a79e90) at
inodelk.c:401
#3  pl_inodelk_client_cleanup (this=this@entry=0x1f85b30,
ctx=ctx@entry=0x7fc8e8000e20) at inodelk.c:429
#4  0x00007fc8f6e12b82 in pl_client_disconnect_cbk (this=0x1f85b30,
client=<optimized out>) at posix.c:2563
#5  0x00007fc901dea23d in gf_client_disconnect (client=client@entry=0x1feae10)
at client_t.c:393
#6  0x00007fc8f5f60c28 in server_connection_cleanup (this=this@entry=0x1f8e4e0,
client=client@entry=0x1feae10, flags=flags@entry=3) at server-helpers.c:353
#7  0x00007fc8f5f5babe in server_rpc_notify (rpc=<optimized out>, xl=0x1f8e4e0,
event=<optimized out>, data=0x1fe8970) at server.c:531
#8  0x00007fc901b65d9f in rpcsvc_handle_disconnect (svc=0x1f9db20,
trans=trans at entry=0x1fe8970) at rpcsvc.c:741
#9  0x00007fc901b65ec8 in rpcsvc_notify (trans=0x1fe8970, mydata=<optimized
out>, event=<optimized out>, data=0x1fe8970) at rpcsvc.c:779
#10 0x00007fc901b68dc3 in rpc_transport_notify (this=this@entry=0x1fe8970,
event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x1fe8970) at
rpc-transport.c:518
#11 0x00007fc8f7cb3962 in socket_event_poll_err (this=0x1fe8970) at
socket.c:1161
#12 socket_event_handler (fd=<optimized out>, idx=6, data=data@entry=0x1fe8970,
poll_in=1, poll_out=0, poll_err=<optimized out>) at socket.c:2354
#13 0x00007fc901dec232 in event_dispatch_epoll_handler (i=<optimized out>,
events=0x1f7a0f0, event_pool=0x1f59ac0) at event-epoll.c:384
#14 event_dispatch_epoll (event_pool=0x1f59ac0) at event-epoll.c:445
#15 0x0000000000404ec9 in main (argc=19, argv=0x7fff97a7bb58) at
glusterfsd.c:2052
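
As a side note on frame #0: lkowner_unparse() formats the lock owner byte by
byte (the sprintf at lkowner.h:43 shown above). The fragment below is only an
approximate sketch of that loop with illustrative names, not the upstream
code; it shows why an owner taken from an already-freed lock (see the cause
below) is fatal here, since the loop is driven by lkowner->len and a garbage
length keeps it walking over invalid memory.

#include <stdio.h>

/* Approximate sketch of the per-byte formatting loop in frame #0.
 * fake_lkowner and fake_lkowner_unparse are illustrative names only;
 * this is not the upstream lkowner.h code. */
struct fake_lkowner {
        int len;                      /* number of valid bytes in data[] */
        unsigned char data[1024];
};

static void
fake_lkowner_unparse (struct fake_lkowner *lkowner, char *buf, int buf_len)
{
        int i, j = 0;

        (void) buf_len;  /* 2176 in the trace; its enforcement is omitted */

        /* The loop is driven by lkowner->len. When lkowner belongs to a
         * lock that has already been freed, len and data are garbage, so
         * the loop formats "owner" bytes far beyond the valid data and the
         * output buffer until it touches unmapped memory: the SIGSEGV in
         * the backtrace above. */
        for (i = 0; i < lkowner->len; i++) {
                if (i && !(i % 8))
                        buf[j++] = '-';   /* the '-' groups seen in buf */
                sprintf (&buf[j], "%02hhx", lkowner->data[i]);
                j += 2;
        }
        buf[j] = '\0';
}

int main (void)
{
        struct fake_lkowner owner = { .len = 4,
                                      .data = { 0xde, 0xad, 0xbe, 0xef } };
        char buf[64];

        fake_lkowner_unparse (&owner, buf, sizeof (buf));
        printf ("%s\n", buf);   /* prints "deadbeef" for a sane owner */
        return 0;
}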

The cause is that pl_inodelk_client_cleanup() (frame #3) traverses the
inodelk_lockers list of the client, but the clear-locks command did not remove
the lock from this list before destroying it. The cleanup therefore walks a
lock structure that has already been freed and reads garbage contents, which
is why the crash finally shows up while formatting the stale lock owner in
frames #1/#0.
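
To make the failure mode concrete, here is a minimal, self-contained sketch of
the same pattern. It is not GlusterFS code: fake_lock, clear_locks() and
client_cleanup() are illustrative names and the list handling is simplified,
but the sequence mirrors the report: the lock is destroyed without being
unlinked from the per-client list, and the later disconnect cleanup walks that
list straight into freed memory.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

struct list_node {
        struct list_node *next;
        struct list_node *prev;
};

struct fake_lock {
        struct list_node client_list;   /* membership in the per-client list */
        unsigned char owner[16];        /* stands in for the lock owner */
};

static void
list_init (struct list_node *head)
{
        head->next = head->prev = head;
}

static void
list_add (struct list_node *head, struct list_node *node)
{
        node->next = head->next;
        node->prev = head;
        head->next->prev = node;
        head->next = node;
}

/* Buggy "clear-locks" path: destroys the lock but never unlinks it from the
 * per-client list, so the list keeps a dangling pointer to it. */
static void
clear_locks (struct fake_lock *lock)
{
        /* the fix is to unlink lock->client_list here, before freeing */
        memset (lock, 0xfe, sizeof (*lock)); /* mimic reuse of freed memory */
        free (lock);
}

/* Disconnect cleanup path: walks the per-client list the way
 * pl_inodelk_client_cleanup() does and touches the stale entry. */
static void
client_cleanup (struct list_node *client_locks)
{
        struct list_node *n;

        for (n = client_locks->next; n != client_locks; n = n->next) {
                struct fake_lock *lock = (struct fake_lock *)
                        ((char *) n - offsetof (struct fake_lock, client_list));

                /* Use-after-free: lock points at memory that clear_locks()
                 * already released, so this read (and the n->next above) is
                 * undefined behaviour and typically ends in SIGSEGV. */
                printf ("cleaning up lock of owner starting with %02hhx\n",
                        lock->owner[0]);
        }
}

int main (void)
{
        struct list_node client_locks;
        struct fake_lock *lock = calloc (1, sizeof (*lock));

        if (!lock)
                return 1;

        list_init (&client_locks);
        list_add (&client_locks, &lock->client_list);

        clear_locks (lock);              /* frees the lock, leaves it linked */
        client_cleanup (&client_locks);  /* traverses the dangling pointer */
        return 0;
}

Under this reading, the fix on the locks side would be to unlink the lock from
every list it is on, including the per-client one, before the clear-locks path
destroys it, so that a later client disconnect has nothing stale to traverse.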

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

