[Gluster-users] write request hung in write-behind

Pranith Kumar Karampuri pkarampu at redhat.com
Thu Jun 6 07:02:46 UTC 2019


On Tue, Jun 4, 2019 at 7:36 AM Xie Changlong <zgrep at 139.com> wrote:

> In my case, all 'df' commands on specific (not all) NFS clients hung
> forever. The temporary workaround is to disable
> performance.nfs.write-behind and cluster.eager-lock.
>
> I'll try to gather more information if I encounter this problem again.
>

If you observe this issue again, take successive statedumps of the
processes (at least a minute apart) and run
https://github.com/gluster/glusterfs/blob/master/extras/identify-hangs.sh
on them; it will report information about the hangs.
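
For example, here is a rough sketch. I'm assuming the volume is named
cl35vol01, that the dumps land in the default /var/run/gluster directory,
and that the script takes that directory as its argument; please check the
script header for the exact usage.

# dump the gnfs process state a few times, about a minute apart
gluster volume statedump cl35vol01 nfs    # or: kill -USR1 <pid-of-gnfs>
sleep 60
gluster volume statedump cl35vol01 nfs
sleep 60
gluster volume statedump cl35vol01 nfs
# then point the helper script at the directory holding the dumps
./identify-hangs.sh /var/run/gluster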


>
>
>
> From: Raghavendra Gowdappa <rgowdapp at redhat.com>
> Date: 2019/06/04 (Tuesday) 09:55
> To: Xie Changlong <zgrep at 139.com>; Ravishankar Narayanankutty
> <ranaraya at redhat.com>; Karampuri, Pranith <pkarampu at redhat.com>;
> Cc: gluster-users <gluster-users at gluster.org>;
> Subject: Re: Re: write request hung in write-behind
>
>
>
> On Mon, Jun 3, 2019 at 1:11 PM Xie Changlong <zgrep at 139.com> wrote:
>
>> First, let me correct myself: the write request is followed by 771 (not
>> 1545) FLUSH requests. I've attached the gnfs dump file; there are 774
>> pending call-stacks in total, 771 of them pending in write-behind, and
>> the deepest call-stack is in afr.
>>
>
> +Ravishankar Narayanankutty <ranaraya at redhat.com> +Karampuri, Pranith
> <pkarampu at redhat.com>
>
> Are you sure these were not call-stacks of in-progress ops? One way of
> confirming that would be to take statedumps periodically (say 3 min apart).
> Hung call stacks will be common to all the statedumps.
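>
> Roughly something like the following, where the two dump filenames are
> just placeholders for statedumps of the same gnfs process taken a few
> minutes apart:
>
> # collect the call-stack addresses from each statedump
> grep '^stack=' glusterdump.20106.dump.T1 | sort > stacks.T1
> grep '^stack=' glusterdump.20106.dump.T2 | sort > stacks.T2
> # stacks present in both dumps are the candidates for genuine hangs
> comm -12 stacks.T1 stacks.T2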
>
>
>> [global.callpool.stack.771]
>> stack=0x7f517f557f60
>> uid=0
>> gid=0
>> pid=0
>> unique=0
>> lk-owner=
>> op=stack
>> type=0
>> cnt=3
>>
>> [global.callpool.stack.771.frame.1]
>> frame=0x7f517f655880
>> ref_count=0
>> translator=cl35vol01-replicate-7
>> complete=0
>> parent=cl35vol01-dht
>> wind_from=dht_writev
>> wind_to=subvol->fops->writev
>> unwind_to=dht_writev_cbk
>>
>> [global.callpool.stack.771.frame.2]
>> frame=0x7f518ed90340
>> ref_count=1
>> translator=cl35vol01-dht
>> complete=0
>> parent=cl35vol01-write-behind
>> wind_from=wb_fulfill_head
>> wind_to=FIRST_CHILD (frame->this)->fops->writev
>> unwind_to=wb_fulfill_cbk
>>
>> [global.callpool.stack.771.frame.3]
>> frame=0x7f516d3baf10
>> ref_count=1
>> translator=cl35vol01-write-behind
>> complete=0
>>
>> [global.callpool.stack.772]
>> stack=0x7f51607a5a20
>> uid=0
>> gid=0
>> pid=0
>> unique=0
>> lk-owner=a0715b77517f0000
>> op=stack
>> type=0
>> cnt=1
>>
>> [global.callpool.stack.772.frame.1]
>> frame=0x7f516ca2d1b0
>> ref_count=0
>> translator=cl35vol01-replicate-7
>> complete=0
>>
>> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5
>>  glusterdump.20106.dump.1559038081  |grep translator | wc -l
>> 774
>> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5
>>  glusterdump.20106.dump.1559038081 |grep complete |wc -l
>> 774
>> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5
>>  glusterdump.20106.dump.1559038081 |grep -E "complete=0" |wc -l
>> 774
>> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5
>>  glusterdump.20106.dump.1559038081  |grep translator | grep write-behind
>> |wc -l
>> 771
>> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5
>>  glusterdump.20106.dump.1559038081  |grep translator | grep replicate-7 |
>> wc -l
>> 2
>> [root at rhel-201 35]# grep -rn "global.callpool.stack.*.frame.1" -A 5
>>  glusterdump.20106.dump.1559038081  |grep translator | grep glusterfs | wc
>> -l
>> 1
>>
>>
>>
>>
>> From: Raghavendra Gowdappa <rgowdapp at redhat.com>
>> Date: 2019/06/03 (Monday) 14:46
>> To: Xie Changlong <zgrep at 139.com>;
>> Cc: gluster-users <gluster-users at gluster.org>;
>> Subject: Re: write request hung in write-behind
>>
>>
>>
>> On Mon, Jun 3, 2019 at 11:57 AM Xie Changlong <zgrep at 139.com> wrote:
>>
>>> Hi all
>>>
>>> While testing gluster 3.8.4-54.15 gnfs, I saw a write request hung in
>>> write-behind, followed by 1545 FLUSH requests. I found a similar
>>> bugfix, https://bugzilla.redhat.com/show_bug.cgi?id=1626787, but I'm
>>> not sure it's the right one.
>>>
>>> [xlator.performance.write-behind.wb_inode]
>>> path=/575/1e/5751e318f21f605f2aac241bf042e7a8.jpg
>>> inode=0x7f51775b71a0
>>> window_conf=1073741824
>>> window_current=293822
>>> transit-size=293822
>>> dontsync=0
>>>
>>> [.WRITE]
>>> request-ptr=0x7f516eec2060
>>> refcount=1
>>> wound=yes
>>> generation-number=1
>>> req->op_ret=293822
>>> req->op_errno=0
>>> sync-attempts=1
>>> sync-in-progress=yes
>>>
>>
>> Note that the sync is still in progress. This means write-behind has
>> wound the write request to its children and has yet to receive the
>> response (unless there is a bug in the accounting of sync-in-progress).
>> So it's likely that there are call-stacks into the children of
>> write-behind which are not complete yet. Are you sure the deepest hung
>> call-stack is in write-behind? Can you check for frames with
>> "complete=0"?
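>>
>> Something like the following should show it; the filename is just a
>> placeholder for your actual statedump:
>>
>> # count the frames that have not yet completed
>> grep -c 'complete=0' glusterdump.<pid>.dump
>> # and see which translators those pending frames belong to
>> grep -B 2 'complete=0' glusterdump.<pid>.dump | grep '^translator=' | sort | uniq -c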
>>
>> size=293822
>>> offset=1048576
>>> lied=-1
>>> append=0
>>> fulfilled=0
>>> go=-1
>>>
>>> [.FLUSH]
>>> request-ptr=0x7f517c2badf0
>>> refcount=1
>>> wound=no
>>> generation-number=2
>>> req->op_ret=-1
>>> req->op_errno=116
>>> sync-attempts=0
>>>
>>> [.FLUSH]
>>> request-ptr=0x7f5173e9f7b0
>>> refcount=1
>>> wound=no
>>> generation-number=2
>>> req->op_ret=0
>>> req->op_errno=0
>>> sync-attempts=0
>>>
>>> [.FLUSH]
>>> request-ptr=0x7f51640b8ca0
>>> refcount=1
>>> wound=no
>>> generation-number=2
>>> req->op_ret=0
>>> req->op_errno=0
>>> sync-attempts=0
>>>
>>> [.FLUSH]
>>> request-ptr=0x7f516f3979d0
>>> refcount=1
>>> wound=no
>>> generation-number=2
>>> req->op_ret=0
>>> req->op_errno=0
>>> sync-attempts=0
>>>
>>> [.FLUSH]
>>> request-ptr=0x7f516f6ac8d0
>>> refcount=1
>>> wound=no
>>> generation-number=2
>>> req->op_ret=0
>>> req->op_errno=0
>>> sync-attempts=0
>>>
>>>
>>> Any comments would be appreciated!
>>>
>>> Thanks
>>> -Xie
>>>
>>>
>>>

-- 
Pranith