[Gluster-users] query about glusterfs 3.12-3 write-behind.c coredump
Li, Deqian (NSB - CN/Hangzhou)
deqian.li at nokia-sbell.com
Wed Jan 30 03:02:57 UTC 2019
Hi,
Yes, thanks very much for your quick response.
I've attached the whole file; it's not very big.
Br,
Li Deqian
From: Raghavendra Gowdappa <rgowdapp at redhat.com>
Sent: Wednesday, January 30, 2019 11:00 AM
To: Li, Deqian (NSB - CN/Hangzhou) <deqian.li at nokia-sbell.com>
Cc: gluster-users <gluster-users at gluster.org>
Subject: Re: query about glusterfs 3.12-3 write-behind.c coredump
On Wed, Jan 30, 2019 at 7:35 AM Li, Deqian (NSB - CN/Hangzhou) <deqian.li at nokia-sbell.com<mailto:deqian.li at nokia-sbell.com>> wrote:
Hi,
Could you help check this coredump?
We are using glusterfs 3.12-3 (a 3-way replicated brick setup) for stability testing under high CPU load (around 80%, generated by stress) while doing I/O.
After several hours a coredump happened on the glusterfs side.
[Current thread is 1 (Thread 0x7ffff37d2700 (LWP 3696))]
Missing separate debuginfos, use: dnf debuginfo-install rcp-pack-glusterfs-1.8.1_11_g99e9ca6-RCP2.wf28.x86_64
(gdb) bt
#0 0x00007ffff0d5c845 in wb_fulfill (wb_inode=0x7fffd406b3b0, liabilities=0x7fffdc234b50) at write-behind.c:1148
#1 0x00007ffff0d5e4d5 in wb_process_queue (wb_inode=0x7fffd406b3b0) at write-behind.c:1718
#2 0x00007ffff0d5eda7 in wb_writev (frame=0x7fffe0086290, this=0x7fffec014b00, fd=0x7fffe4034070, vector=0x7fffdc445720, count=1, offset=67108863, flags=32770, iobref=0x7fffdc00d550, xdata=0x0)
at write-behind.c:1825
#3 0x00007ffff0b51fcb in du_writev_resume (ret=0, frame=0x7fffdc0305a0, opaque=0x7fffdc0305a0) at disk-usage.c:490
#4 0x00007ffff7b3510d in synctask_wrap () at syncop.c:377
#5 0x00007ffff60d0660 in ?? () from /lib64/libc.so.6
#6 0x0000000000000000 in ?? ()
(gdb) p wb_inode
$1 = (wb_inode_t *) 0x7fffd406b3b0
(gdb) frame 2
#2 0x00007ffff0d5eda7 in wb_writev (frame=0x7fffe0086290, this=0x7fffec014b00, fd=0x7fffe4034070, vector=0x7fffdc445720, count=1, offset=67108863, flags=32770, iobref=0x7fffdc00d550, xdata=0x0)
at write-behind.c:1825
1825 in write-behind.c
(gdb) p *fd
$2 = {pid = 18154, flags = 32962, refcount = 0, inode_list = {next = 0x7fffe4034080, prev = 0x7fffe4034080}, inode = 0x0, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0,
__nusers = 0, __kind = -1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>, __align = 0}},
_ctx = 0x7fffe4022930, xl_count = 17, lk_ctx = 0x7fffe40350e0, anonymous = _gf_false}
(gdb) p fd
$3 = (fd_t *) 0x7fffe4034070
(gdb) p wb_inode->this
$1 = (xlator_t *) 0xffffffffffffff00
After adding test logs I found that the FOP sequence on the write-behind xlator side was out of order, as shown below. On the FUSE side the FLUSH comes after write2, but on the WB side the FLUSH arrives between write2's 'wb_do_unwinds' and its 'wb_fulfill'.
So I think this is the problem: the FLUSH and the later RELEASE can destroy the fd, which would explain wb_inode->this being 0xffffffffffffff00. Do you think so?
I also think our newly added disk-usage xlator's synctask_new delays the WRITE operation, while the FLUSH is not delayed (because it does not go through the disk-usage deferral).
Do you agree with this speculation? And how should we fix it? (We don't want to move the disk-usage xlator; see the sketch after the wb_fulfill snippet below.)
Problematic FOP sequence:

FUSE side:                WB side:
Write 1                   write 1
Write 2                   write 2 (wb_do_unwinds)
FLUSH                     FLUSH
Release                   Release (destroy fd)
                          write 2 (wb_fulfill)  --> then coredump
int
wb_fulfill (wb_inode_t *wb_inode, list_head_t *liabilities)
{
        wb_request_t *req = NULL;
        wb_request_t *head = NULL;
        wb_request_t *tmp = NULL;
        wb_conf_t    *conf = NULL;
        off_t         expected_offset = 0;
        size_t        curr_aggregate = 0;
        size_t        vector_count = 0;
        int           ret = 0;

        conf = wb_inode->this->private;      --> this line coredumps

        list_for_each_entry_safe (req, tmp, liabilities, winds) {
                list_del_init (&req->winds);
        ...
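For the "how to fix" part: one low-risk idea (a minimal sketch only, not a tested patch, and it only addresses the use-after-free, not the FLUSH ordering semantics) would be to pin the fd in the disk-usage xlator for the lifetime of the deferred write, assuming du_local_init/du_local_t do not already hold such a reference. local->fd and du_writev_cbk below are assumptions about our disk-usage.c, not quotes from it:

        /* in du_writev(), before handing the write to the synctask: take an
         * extra reference so a FLUSH+RELEASE that overtakes the deferred WRITE
         * cannot destroy the fd/inode underneath write-behind */
        local->fd = fd_ref (fd);                 /* assumed du_local_t field */

        ret = synctask_new (this->ctx->env, du_get_du_info,
                            du_writev_resume, frame, frame);

        /* in du_writev_cbk() (assumed name), once the wound write unwinds: */
        if (local->fd) {
                fd_unref (local->fd);            /* drop the pin */
                local->fd = NULL;
        }

Even with the pin, write-behind would still see the FLUSH before the delayed WRITE, so the flush semantics are still wrong; a fuller fix would have to hold the FLUSH in the disk-usage xlator until pending deferred writes drain.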
volume ccs-write-behind
68: type performance/write-behind
69: subvolumes ccs-dht
70: end-volume
71:
72: volume ccs-disk-usage --> we added a new xlator here for the write op, just to check whether the disk is full; writes go through synctask_new.
73: type performance/disk-usage
74: subvolumes ccs-write-behind
75: end-volume
76:
77: volume ccs-read-ahead
78: type performance/read-ahead
79: subvolumes ccs-disk-usage
80: end-volume
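For context on why the FLUSH is not delayed: ccs-disk-usage defines no flush fop, so FLUSH goes through the stock pass-through (paraphrased below from the libglusterfs defaults, not copied verbatim), while the WRITE is parked in a synctask. That is enough for FLUSH and RELEASE to overtake a not-yet-wound WRITE at ccs-write-behind:

        /* effective FLUSH path through ccs-disk-usage: a plain pass-through,
         * no synctask, so it reaches write-behind before the deferred write */
        int
        default_flush (call_frame_t *frame, xlator_t *this, fd_t *fd, dict_t *xdata)
        {
                STACK_WIND_TAIL (frame, FIRST_CHILD (this),
                                 FIRST_CHILD (this)->fops->flush, fd, xdata);
                return 0;
        }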
PS: part of our new translator code:
int
du_writev (call_frame_t *frame, xlator_t *this, fd_t *fd,
           struct iovec *vector, int count, off_t off, uint32_t flags,
           struct iobref *iobref, dict_t *xdata)
{
        int          op_errno = -1;
        int          ret      = -1;
        du_local_t  *local    = NULL;
        loc_t        tmp_loc  = {0,};

        VALIDATE_OR_GOTO (frame, err);
        VALIDATE_OR_GOTO (this, err);
        VALIDATE_OR_GOTO (fd, err);

        tmp_loc.gfid[15] = 1;
        tmp_loc.inode = fd->inode;
        tmp_loc.parent = fd->inode;

        local = du_local_init (frame, &tmp_loc, fd, GF_FOP_WRITE);
        if (!local) {
                op_errno = ENOMEM;
                goto err;
        }

        local->vector = iov_dup (vector, count);
        local->offset = off;
        local->count = count;
        local->flags = flags;
        local->iobref = iobref_ref (iobref);

        ret = synctask_new (this->ctx->env, du_get_du_info,
                            du_writev_resume, frame, frame);
[Raghavendra Gowdappa, replying inline:]
Can you paste the code of:
  * du_get_du_info
  * du_writev_resume
        if (ret) {
                op_errno = -1;
                gf_log (this->name, GF_LOG_WARNING,
                        "synctask_new return failure ret(%d)", ret);
                goto err;
        }

        return 0;

err:
        op_errno = (op_errno == -1) ? errno : op_errno;
        DU_STACK_UNWIND (writev, frame, -1, op_errno, NULL, NULL, NULL);
        return 0;
}
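Since the actual du_get_du_info and du_writev_resume are only in the attached disk-usage.c (linked below), a purely hypothetical shape of the resume callback, reconstructed from the backtrace above (frame #3 shows du_writev_resume winding into wb_writev), might look like the following; du_writev_cbk, local->fd and the ENOSPC handling are guesses, not the actual code:

        /* hypothetical only -- the real function is in the attachment */
        int
        du_writev_resume (int ret, call_frame_t *frame, void *opaque)
        {
                du_local_t *local = frame->local;

                /* if du_get_du_info reported the disk as full, fail the write
                 * (guessed behaviour) */
                if (ret < 0) {
                        DU_STACK_UNWIND (writev, frame, -1, ENOSPC,
                                         NULL, NULL, NULL);
                        return 0;
                }

                /* otherwise hand the stashed write down to ccs-write-behind;
                 * this is the call that ends up in wb_writev in the backtrace */
                STACK_WIND (frame, du_writev_cbk,
                            FIRST_CHILD (frame->this),
                            FIRST_CHILD (frame->this)->fops->writev,
                            local->fd, local->vector, local->count,
                            local->offset, local->flags, local->iobref, NULL);
                return 0;
        }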
Br,
Li Deqian
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: disk-usage.c
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190130/8b4827c2/attachment.c>