[Bugs] [Bug 1488354] gluster-blockd process crashed and core generated

bugzilla at redhat.com bugzilla at redhat.com
Tue Sep 5 08:10:46 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1488354

Pranith Kumar K <pkarampu at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|bugs at gluster.org            |pkarampu at redhat.com



--- Comment #1 from Pranith Kumar K <pkarampu at redhat.com> ---
(gdb) bt
#0  0x00007f549eaf3c30 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x00007f549e207f15 in fd_anonymous () from /lib64/libglusterfs.so.0
#2  0x00007f54869d1927 in shard_common_inode_write_do ()
   from /usr/lib64/glusterfs/3.8.4/xlator/features/shard.so
#3  0x00007f54869d1c7d in shard_common_inode_write_post_mknod_handler ()
   from /usr/lib64/glusterfs/3.8.4/xlator/features/shard.so
#4  0x00007f54869ca77f in shard_common_mknod_cbk ()
   from /usr/lib64/glusterfs/3.8.4/xlator/features/shard.so
#5  0x00007f5486c1164b in dht_newfile_cbk ()
   from /usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so
#6  0x00007f5486e71ab1 in afr_mknod_unwind ()
   from /usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so
#7  0x00007f5486e73eeb in __afr_dir_write_cbk ()
   from /usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so
#8  0x00007f5486e7482d in afr_mknod_wind_cbk ()
   from /usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so
#9  0x00007f54870f6168 in client3_3_mknod_cbk ()
   from /usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so
#10 0x00007f549dfac840 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
#11 0x00007f549dfacb27 in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#12 0x00007f549dfa89e3 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#13 0x00007f5490be63d6 in socket_event_poll_in ()
   from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#14 0x00007f5490be897c in socket_event_handler ()
   from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#15 0x00007f549e23e1e6 in event_dispatch_epoll_worker ()
   from /lib64/libglusterfs.so.0
#16 0x00007f549eaf1e25 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f549d8b034d in clone () from /lib64/libc.so.6


Based on the core file, the only way this can happen is if not all of the
shards were created.
(gdb) fr 2
#2  0x00007f54869d1927 in shard_common_inode_write_do (frame=0x7f548c0dbbe0, 
    this=0x7f54800120d0) at shard.c:3883
3883                            anon_fd = fd_anonymous (local->inode_list[i]);
(gdb) p i
$1 = 255
(gdb) p local->inode_list[i]
$2 = (inode_t *) 0x0
(gdb) p local->inode_list[i-1]
$3 = (inode_t *) 0x7f5474765440
(gdb) p local->offset
$4 = 0
(gdb) p local->num_blocks
$5 = 256
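
For context on the crash itself: fd_anonymous() is called with that NULL
inode and ends up taking the inode's lock, so the NULL pointer is
dereferenced inside pthread_mutex_lock(), which matches frames #0/#1 of the
backtrace. A toy reproduction of just that failure mode (plain pthreads, not
the actual gluster code):

/* Toy reproduction only -- not gluster code.  Mimics the crash: a shard
 * inode that was never created leaves a NULL entry in the list, and taking
 * a mutex embedded in that (NULL) structure faults inside
 * pthread_mutex_lock(). */
#include <pthread.h>

struct toy_inode {
        pthread_mutex_t lock;
};

int
main (void)
{
        /* 256 expected shards, but slot 255 was never populated. */
        struct toy_inode *inode_list[256] = { 0 };

        /* Equivalent of fd_anonymous (local->inode_list[255]) when the
         * shard is missing: locking through the NULL entry segfaults. */
        pthread_mutex_lock (&inode_list[255]->lock);
        return 0;
}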

Based on this data, I went through the code and found two races:
1) In shard_common_mknod_cbk(), local->eexist_count is incremented without
   holding frame->lock.
2) In shard_common_lookup_shards_cbk(), local->create_count is incremented
   without holding frame->lock.

Either race can leave these counts lower than they should be, so mknod ends
up being issued for only 255 shards instead of all 256.
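
To illustrate the undercount, here is a minimal, self-contained sketch
(hypothetical names, plain pthreads; not the shard xlator code) of why an
increment performed without frame->lock can lose updates when callbacks run
concurrently, and how serializing the increment with a lock avoids it:

/* Toy demonstration only -- hypothetical names, not gluster code.
 * NUM_SHARDS threads each report completion by bumping a shared counter,
 * the same shape as eexist_count/create_count being bumped from parallel
 * callbacks.  Without the mutex, the read-modify-write cycles can
 * interleave and the final count comes up short (e.g. 255 instead of 256).
 * Holding the lock around the increment (analogous to protecting the
 * counter with frame->lock) makes the count reliable. */
#include <pthread.h>
#include <stdio.h>

#define NUM_SHARDS 256

static int             create_count = 0;
static pthread_mutex_t count_lock   = PTHREAD_MUTEX_INITIALIZER;

static void *
shard_done (void *arg)
{
        (void) arg;

        /* Remove these two lock calls to reproduce the lost-update race. */
        pthread_mutex_lock (&count_lock);
        create_count++;
        pthread_mutex_unlock (&count_lock);

        return NULL;
}

int
main (void)
{
        pthread_t threads[NUM_SHARDS];
        int       i;

        for (i = 0; i < NUM_SHARDS; i++)
                pthread_create (&threads[i], NULL, shard_done, NULL);
        for (i = 0; i < NUM_SHARDS; i++)
                pthread_join (threads[i], NULL);

        printf ("create_count = %d (expected %d)\n", create_count, NUM_SHARDS);
        return 0;
}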

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are on the CC list for the bug.
You are the assignee for the bug.

