[Gluster-users] hanging httpd processes.
Amar Tumballi
atumball at redhat.com
Fri Apr 7 20:48:28 UTC 2017
On Sat, Apr 8, 2017 at 12:02 AM, Alvin Starr <alvin at netvel.net> wrote:
> Thanks for the help.
>
> That seems to have fixed it.
>
> We were seeing hangs clocking up at a rate of a few hundred a day, and for
> the last week there have been none.
>
>
>
Thanks for confirming this. Good to know one of the major hurdles for you is
resolved.
-Amar
>
> On 03/31/2017 05:54 AM, Mohit Agrawal wrote:
>
> Hi,
>
> As you have mentioned the client and server versions in this thread, it
> shows that the package versions differ between the two (client and server).
> We would recommend that you upgrade both servers and clients to
> glusterfs 3.10.1.
> If it is not possible to upgrade both (client and server), then upgrading
> only the client is required (see the example below).
>
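> For example, on the Centos 6 clients something along these lines should
> work (a sketch assuming the CentOS Storage SIG repository; the server name
> and mount point below are placeholders):
>
>     yum install centos-release-gluster310   # enables the 3.10 repo
>     yum update glusterfs glusterfs-fuse glusterfs-libs
>     # remount so the upgraded client code is actually in use
>     umount /mnt/gluster
>     mount -t glusterfs server1:/edocs-production /mnt/gluster
>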
> Thanks
> Mohit Agrawal
>
> On Fri, Mar 31, 2017 at 2:27 PM, Mohit Agrawal <moagrawa at redhat.com>
> wrote:
>
>> Hi,
>>
>> As per the attached glusterdump/stackdump, it seems this is a known issue
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1372211), and the issue is
>> already fixed by the patch (https://review.gluster.org/#/c/15380/).
>>
>> The issue happens in this case:
>> Assume a file is opened with fd1 and fd2.
>> 1. Some WRITE ops to fd1 got an error, and they were added back to the
>> 'todo' queue because of that error.
>> 2. fd2 is closed, and a FLUSH op is sent to write-behind.
>> 3. The FLUSH cannot be unwound because it is not a legal waiter for those
>> failed WRITEs (as the function __wb_request_waiting_on() says), and those
>> failed WRITEs also cannot be completed while fd1 is still open. So fd2 is
>> stuck in the close() syscall (see the sketch below).
>>
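>> As an illustration (a hypothetical user-space reproducer, not the actual
>> translator code; the mount path is a placeholder), the triggering
>> sequence looks roughly like this:
>>
>>     #include <fcntl.h>
>>     #include <unistd.h>
>>
>>     int main(void)
>>     {
>>         char buf[4096] = {0};
>>
>>         /* two descriptors on the same file on a glusterfs mount */
>>         int fd1 = open("/mnt/gluster/file", O_RDWR | O_CREAT, 0644);
>>         int fd2 = open("/mnt/gluster/file", O_RDWR);
>>
>>         /* 1. writes on fd1 are cached by write-behind; if they fail on
>>          *    the brick, they go back on write-behind's 'todo' queue */
>>         write(fd1, buf, sizeof(buf));
>>
>>         /* 2. closing fd2 sends a FLUSH down the client stack;
>>          * 3. with the bug, that FLUSH waits behind fd1's failed writes,
>>          *    which finish only when fd1 closes, so close(fd2) hangs */
>>         close(fd2);
>>
>>         close(fd1);
>>         return 0;
>>     }
>>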
>> The statedump also shows that the fd of the FLUSH op is not the same as
>> the fd of the WRITE ops.
>> Kindly upgrade the package to 3.10.1 and share the result.
>>
>>
>>
>> Thanks
>> Mohit Agrawal
>>
>> On Fri, Mar 31, 2017 at 12:29 PM, Amar Tumballi <atumball at redhat.com> wrote:
>>
>> > Hi Alvin,
>> >
>> > Thanks for the dump output. It helped a bit.
>> >
>> > For now, I recommend turning off the open-behind and read-ahead
>> > performance translators for you to get rid of this situation, as I
>> > noticed hung FLUSH operations from these translators (example commands
>> > below).
>> >
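>> > For instance, assuming the volume is named edocs-production (taken from
>> > the translator names in the statedump), the two translators can be
>> > disabled with:
>> >
>> >     gluster volume set edocs-production performance.open-behind off
>> >     gluster volume set edocs-production performance.read-ahead off
>> >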
>> It looks like I gave the wrong advice; looking at the below snippet:
>>
>> [global.callpool.stack.61]
>> stack=0x7f6c6f628f04
>> uid=48
>> gid=48
>> pid=11077
>> unique=10048797
>> lk-owner=a73ae5bdb5fcd0d2
>> op=FLUSH
>> type=1
>> cnt=5
>>
>> [global.callpool.stack.61.frame.1]
>> frame=0x7f6c6f793d88
>> ref_count=0
>> translator=edocs-production-write-behind
>> complete=0
>> parent=edocs-production-read-ahead
>> wind_from=ra_flush
>> wind_to=FIRST_CHILD(this)->fops->flush
>> unwind_to=ra_flush_cbk
>>
>> [global.callpool.stack.61.frame.2]
>> frame=0x7f6c6f796c90
>> ref_count=1
>> translator=edocs-production-read-ahead
>> complete=0
>> parent=edocs-production-open-behind
>> wind_from=default_flush_resume
>> wind_to=FIRST_CHILD(this)->fops->flush
>> unwind_to=default_flush_cbk
>>
>> [global.callpool.stack.61.frame.3]
>> frame=0x7f6c6f79b724
>> ref_count=1
>> translator=edocs-production-open-behind
>> complete=0
>> parent=edocs-production
>> wind_from=io_stats_flush
>> wind_to=FIRST_CHILD(this)->fops->flush
>> unwind_to=io_stats_flush_cbk
>>
>> [global.callpool.stack.61.frame.4]
>> frame=0x7f6c6f79b474
>> ref_count=1
>> translator=edocs-production
>> complete=0
>> parent=fuse
>> wind_from=fuse_flush_resume
>> wind_to=FIRST_CHILD(this)->fops->flush
>> unwind_to=fuse_err_cbk
>>
>> [global.callpool.stack.61.frame.5]
>> frame=0x7f6c6f796684
>> ref_count=1
>> translator=fuse
>> complete=0
>>
>> Most probably, the issue is with write-behind's flush. So please turn off
>> write-behind and test. If you no longer have any hung httpd processes,
>> please let us know.
>>
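>> For example, on the same volume:
>>
>>     gluster volume set edocs-production performance.write-behind off
>>
>> The option change should reach clients without remounting, and it can be
>> reverted with 'performance.write-behind on' once a fixed build is in place.
>>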
>> -Amar
>>
>>
>> > -Amar
>> >
>> > On Wed, Mar 29, 2017 at 6:56 AM, Alvin Starr <alvin at netvel.net> wrote:
>> >
>> >> We are running gluster 3.8.9-1 on Centos 7.3.1611 for the servers, and
>> >> 3.7.11-2 on Centos 6.8 for the clients.
>> >>
>> >> We are seeing httpd processes hang in fuse_request_send or sync_page.
>> >>
>> >> These calls are from PHP 5.3.3-48 scripts.
>> >>
>> >> I am attaching a tgz file that contains the process dump from glusterfsd
>> >> and the hung pids, along with the offending pids' stacks from
>> >> /proc/{pid}/stack (gathered along the lines of the sketch below).
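>> >>
>> >> (A sketch of how such data can be gathered, assuming the default
>> >> statedump settings:)
>> >>
>> >>     # kernel-side stack of every hung httpd process (as root)
>> >>     for pid in $(pgrep httpd); do
>> >>         echo "== $pid =="
>> >>         cat /proc/$pid/stack
>> >>     done
>> >>
>> >>     # SIGUSR1 makes gluster processes write a statedump,
>> >>     # by default under /var/run/gluster
>> >>     kill -USR1 $(pidof glusterfsd)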
>> >>
>> >> This has been a low-level annoyance for a while, but it has become a
>> >> much bigger issue because the number of hung processes went from a few
>> >> a week to a few hundred a day.
>> >>
>> >> --
>> >> Alvin Starr || voice: (905)513-7688
>> >> Netvel Inc. || Cell: (416)806-0133
>> >> alvin at netvel.net ||
>> >
>> > --
>> > Amar Tumballi (amarts)
>>
>
> --
> Alvin Starr || voice: (905)513-7688
> Netvel Inc. || Cell: (416)806-0133
> alvin at netvel.net ||
>
>
--
Amar Tumballi (amarts)