[Gluster-users] write request hung in write-behind

Raghavendra Gowdappa rgowdapp at redhat.com
Tue Jun 4 01:55:25 UTC 2019


On Mon, Jun 3, 2019 at 1:11 PM Xie Changlong <zgrep at 139.com> wrote:

> First, a correction to my earlier mail: the write request is followed by
> 771 (not 1545) FLUSH requests. I've attached the gnfs dump file; there are
> 774 pending call-stacks in total, 771 of them pending in write-behind, and
> the deepest call-stack is in afr.
>

+Ravishankar Narayanankutty <ranaraya at redhat.com> +Karampuri, Pranith
<pkarampu at redhat.com>

Are you sure these were not call-stacks of in-progress ops? One way of
confirming that would be to take statedumps periodically (say 3 min apart).
Hung call stacks will be common to all the statedumps.
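For example, a rough sketch of that (assuming 20106, taken from your dump
file name, is still the gnfs pid and that statedumps land in the default
/var/run/gluster directory):

# take three statedumps, 3 minutes apart; SIGUSR1 makes the gluster
# process write a statedump of its current state
for i in 1 2 3; do kill -USR1 20106; sleep 180; done

# a stack address that shows up in every dump is likely genuinely hung
grep -h "stack=0x" /var/run/gluster/glusterdump.20106.dump.* | sort | uniq -c | sort -rn | head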


> [global.callpool.stack.771]
> stack=0x7f517f557f60
> uid=0
> gid=0
> pid=0
> unique=0
> lk-owner=
> op=stack
> type=0
> cnt=3
>
> [global.callpool.stack.771.frame.1]
> frame=0x7f517f655880
> ref_count=0
> translator=cl35vol01-replicate-7
> complete=0
> parent=cl35vol01-dht
> wind_from=dht_writev
> wind_to=subvol->fops->writev
> unwind_to=dht_writev_cbk
>
> [global.callpool.stack.771.frame.2]
> frame=0x7f518ed90340
> ref_count=1
> translator=cl35vol01-dht
> complete=0
> parent=cl35vol01-write-behind
> wind_from=wb_fulfill_head
> wind_to=FIRST_CHILD (frame->this)->fops->writev
> unwind_to=wb_fulfill_cbk
>
> [global.callpool.stack.771.frame.3]
> frame=0x7f516d3baf10
> ref_count=1
> translator=cl35vol01-write-behind
> complete=0
>
> [global.callpool.stack.772]
> stack=0x7f51607a5a20
> uid=0
> gid=0
> pid=0
> unique=0
> lk-owner=a0715b77517f0000
> op=stack
> type=0
> cnt=1
>
> [global.callpool.stack.772.frame.1]
> frame=0x7f516ca2d1b0
> ref_count=0
> translator=cl35vol01-replicate-7
> complete=0
>
> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep translator | wc -l
> 774
> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep complete | wc -l
> 774
> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep -E "complete=0" | wc -l
> 774
> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep translator | grep write-behind | wc -l
> 771
> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep translator | grep replicate-7 | wc -l
> 2
> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5 glusterdump.20106.dump.1559038081 | grep translator | grep glusterfs | wc -l
> 1
>
>
>
>
> From: Raghavendra Gowdappa <rgowdapp at redhat.com>
> Date: 2019/06/03 (Monday) 14:46
> To: Xie Changlong <zgrep at 139.com>;
> Cc: gluster-users <gluster-users at gluster.org>;
> Subject: Re: write request hung in write-behind
>
>
>
> On Mon, Jun 3, 2019 at 11:57 AM Xie Changlong <zgrep at 139.com> wrote:
>
>> Hi all
>>
>> Testing gluster 3.8.4-54.15 gnfs, I saw a write request hung in
>> write-behind, followed by 1545 FLUSH requests. I found a similar bugfix,
>> https://bugzilla.redhat.com/show_bug.cgi?id=1626787, but I'm not sure if
>> it's the right one.
>>
>> [xlator.performance.write-behind.wb_inode]
>> path=/575/1e/5751e318f21f605f2aac241bf042e7a8.jpg
>> inode=0x7f51775b71a0
>> window_conf=1073741824
>> window_current=293822
>> transit-size=293822
>> dontsync=0
>>
>> [.WRITE]
>> request-ptr=0x7f516eec2060
>> refcount=1
>> wound=yes
>> generation-number=1
>> req->op_ret=293822
>> req->op_errno=0
>> sync-attempts=1
>> sync-in-progress=yes
>>
>
> Note that the sync is still in progress. This means write-behind has wound
> the write request to its children and is yet to receive the response
> (unless there is a bug in the accounting of sync-in-progress). So it's
> likely that there are call-stacks into the children of write-behind which
> are not complete yet. Are you sure the deepest hung call-stack is in
> write-behind? Can you check for frames with "complete=0"?
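> For instance, something along these lines (just a sketch against the dump
> file name from your mail) would list, per translator, how many frames are
> still incomplete, so the deepest hung xlator stands out:
>
> # pair each complete=0 line with the translator= line just above it
> awk '/translator=/ {x=$0} /complete=0/ {print x}' glusterdump.20106.dump.1559038081 | sort | uniq -c | sort -rn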
>
> size=293822
>> offset=1048576
>> lied=-1
>> append=0
>> fulfilled=0
>> go=-1
>>
>> [.FLUSH]
>> request-ptr=0x7f517c2badf0
>> refcount=1
>> wound=no
>> generation-number=2
>> req->op_ret=-1
>> req->op_errno=116
>> sync-attempts=0
>>
>> [.FLUSH]
>> request-ptr=0x7f5173e9f7b0
>> refcount=1
>> wound=no
>> generation-number=2
>> req->op_ret=0
>> req->op_errno=0
>> sync-attempts=0
>>
>> [.FLUSH]
>> request-ptr=0x7f51640b8ca0
>> refcount=1
>> wound=no
>> generation-number=2
>> req->op_ret=0
>> req->op_errno=0
>> sync-attempts=0
>>
>> [.FLUSH]
>> request-ptr=0x7f516f3979d0
>> refcount=1
>> wound=no
>> generation-number=2
>> req->op_ret=0
>> req->op_errno=0
>> sync-attempts=0
>>
>> [.FLUSH]
>> request-ptr=0x7f516f6ac8d0
>> refcount=1
>> wound=no
>> generation-number=2
>> req->op_ret=0
>> req->op_errno=0
>> sync-attempts=0
>>
>>
>> Any comments would be appreciated!
>>
>> Thanks
>> -Xie
>>
>>
>>