[Gluster-users] hanging httpd processes.

Fri Mar 31 09:54:10 UTC 2017

Hi,

As you mentioned the client and server versions earlier in the thread, the
package versions differ between the clients and the servers.
We recommend upgrading both the servers and the clients to rhs-3.10.1.
If it is not possible to upgrade both, then at minimum the clients must be
upgraded.

Thanks
Mohit Agrawal

On Fri, Mar 31, 2017 at 2:27 PM, Mohit Agrawal <moagrawa at redhat.com> wrote:

> Hi,
>
> As per the attached glusterdump/stackdump, this appears to be a known issue (https://bugzilla.redhat.com/show_bug.cgi?id=1372211), and it has already been fixed by the patch (https://review.gluster.org/#/c/15380/).
>
> The issue occurs in the following case.
> Assume a file is opened with fd1 and fd2.
> 1. Some WRITE ops to fd1 got errors; they were added back to the 'todo'
>    queue because of those errors.
> 2. fd2 is closed, and a FLUSH op is sent to write-behind.
> 3. The FLUSH cannot be unwound because it is not a legal waiter for those
>    failed writes (as the function __wb_request_waiting_on() says), and those
>    failed WRITEs also cannot be ended while fd1 is not closed. fd2 is stuck
>    in the close syscall.
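
The three steps above can be illustrated with a small toy model. This is only a sketch of the scenario as described in this mail, not GlusterFS code: the names `Request`, `ToyQueue`, `write_failed` and `flush` are hypothetical, and the rule "only a FLUSH on the same fd may end a failed WRITE" is an assumption taken from step 3.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    op: str              # "WRITE" in this toy model
    fd: str              # file descriptor label, e.g. "fd1"
    failed: bool = False


@dataclass
class ToyQueue:
    """Toy stand-in for a per-file 'todo' queue (hypothetical, not write-behind)."""
    todo: list = field(default_factory=list)

    def write_failed(self, fd):
        # Step 1: a WRITE that got an error is added back to the 'todo' queue.
        self.todo.append(Request("WRITE", fd, failed=True))

    def flush(self, fd):
        """Return True if the FLUSH can be unwound, False if it stays stuck."""
        # Assumption from step 3: a FLUSH is a legal waiter only for failed
        # WRITEs on its own fd, so only those can be ended here.
        self.todo = [r for r in self.todo if not (r.failed and r.fd == fd)]
        # Any failed WRITE left over (on another fd) keeps this FLUSH pending.
        return not any(r.failed for r in self.todo)


queue = ToyQueue()
queue.write_failed("fd1")              # step 1: WRITE on fd1 fails
fd2_close_returns = queue.flush("fd2") # steps 2 and 3: close(fd2) sends FLUSH
fd1_close_returns = queue.flush("fd1") # closing fd1 ends the failed WRITE
print(fd2_close_returns, fd1_close_returns)
```

In this model the FLUSH for fd2 stays pending until fd1 is closed, which matches the hang in the close syscall described above.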
>
> The statedump also shows that the fd of the FLUSH op is not the same as the
> fd of the WRITE op.
> Kindly upgrade the packages to 3.10.1 and share the result.
>
>
>
> Thanks
> Mohit Agrawal
>
>
> On Fri, Mar 31, 2017 at 12:29 PM, Amar Tumballi <atumball at redhat.com> wrote:
>
> > Hi Alvin,
> >
> > Thanks for the dump output. It helped a bit.
> >
> > For now, I recommend turning off the open-behind and read-ahead performance
> > translators to get rid of this situation, as I noticed hung FLUSH
> > operations from these translators.
> >
> Looks like I gave wrong advice by looking at the snippet below:
>
> [global.callpool.stack.61]
> stack=0x7f6c6f628f04
> uid=48
> gid=48
> pid=11077
> unique=10048797
> lk-owner=a73ae5bdb5fcd0d2
> op=FLUSH
> type=1
> cnt=5
>
> [global.callpool.stack.61.frame.1]
> frame=0x7f6c6f793d88
> ref_count=0
> translator=edocs-production-write-behind
> complete=0
> parent=edocs-production-read-ahead
> wind_from=ra_flush
> wind_to=FIRST_CHILD (this)->fops->flush
> unwind_to=ra_flush_cbk
>
> [global.callpool.stack.61.frame.2]
> frame=0x7f6c6f796c90
> ref_count=1
> translator=edocs-production-read-ahead
> complete=0
> parent=edocs-production-open-behind
> wind_from=default_flush_resume
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=default_flush_cbk
>
> [global.callpool.stack.61.frame.3]
> frame=0x7f6c6f79b724
> ref_count=1
> translator=edocs-production-open-behind
> complete=0
> parent=edocs-production
> wind_from=io_stats_flush
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=io_stats_flush_cbk
>
> [global.callpool.stack.61.frame.4]
> frame=0x7f6c6f79b474
> ref_count=1
> translator=edocs-production
> complete=0
> parent=fuse
> wind_from=fuse_flush_resume
> wind_to=FIRST_CHILD(this)->fops->flush
> unwind_to=fuse_err_cbk
>
> [global.callpool.stack.61.frame.5]
> frame=0x7f6c6f796684
> ref_count=1
> translator=fuse
> complete=0
>
> Most probably, the issue is with write-behind's flush. So please turn off
> write-behind and test. If you no longer have any hung httpd processes,
> please let us know.
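
The reading of the snippet above (the FLUSH is pending at the bottom-most translator, write-behind) can be sketched as a small script. This is a minimal sketch, assuming the statedump layout is exactly the `[global.callpool.stack.N]` / `[global.callpool.stack.N.frame.M]` sections with `key=value` lines shown in this thread; the function names `parse_callpool` and `stuck_flush_translators` are hypothetical and not part of any Gluster tool.

```python
import re
from collections import defaultdict

# Matches "[global.callpool.stack.61]" and "[global.callpool.stack.61.frame.1]".
SECTION = re.compile(r"^\[global\.callpool\.stack\.(\d+)(?:\.frame\.(\d+))?\]$")


def parse_callpool(text):
    """Group key=value lines by call stack and by frame within the stack."""
    stacks = defaultdict(lambda: {"info": {}, "frames": {}})
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = SECTION.match(line)
        if match:
            stack = stacks[match.group(1)]
            if match.group(2) is None:
                current = stack["info"]
            else:
                current = stack["frames"].setdefault(match.group(2), {})
        elif line.startswith("["):
            current = None  # some other section; ignore its keys
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return stacks


def stuck_flush_translators(text):
    """For each FLUSH stack, list incomplete frames that are nobody's parent."""
    result = {}
    for stack_id, stack in parse_callpool(text).items():
        if stack["info"].get("op") != "FLUSH":
            continue
        frames = list(stack["frames"].values())
        parents = {f.get("parent") for f in frames}
        pending = [
            f.get("translator")
            for f in frames
            if f.get("complete") == "0" and f.get("translator") not in parents
        ]
        if pending:
            result[stack_id] = pending
    return result


# Abbreviated copy of the first two frames quoted in this thread.
SAMPLE = """\
[global.callpool.stack.61]
op=FLUSH
cnt=5

[global.callpool.stack.61.frame.1]
translator=edocs-production-write-behind
complete=0
parent=edocs-production-read-ahead

[global.callpool.stack.61.frame.2]
translator=edocs-production-read-ahead
complete=0
parent=edocs-production-open-behind
"""

print(stuck_flush_translators(SAMPLE))
```

Run against a full statedump file, a sketch like this would list, per hung FLUSH, the translator where the call is currently waiting.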
>
> -Amar
>
>
> > -Amar
> >
> > On Wed, Mar 29, 2017 at 6:56 AM, Alvin Starr <alvin at netvel.net> wrote:
> >
> > > We are running gluster 3.8.9-1 on Centos 7.3.1611 for the servers and on
> > > the clients 3.7.11-2 on Centos 6.8
> > >
> > > We are seeing httpd processes hang in fuse_request_send or sync_page.
> > >
> > > These calls are from PHP 5.3.3-48 scripts
> > >
> > > I am attaching  a tgz file that contains the process dump from glusterfsd
> > > and the hung pids along with the offending pid's stacks from
> > > /proc/{pid}/stack.
> > >
> > > This has been a low level annoyance for a while but it has become a much
> > > bigger issue because the number of hung processes went from a few a week to
> > > a few hundred a day.
> > >
> > > --
> > > Alvin Starr                   ||   voice: (905)513-7688
> > > Netvel Inc.                   ||   Cell:  (416)806-0133
> > > alvin at netvel.net              ||
> > >
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users at gluster.org
> > > http://lists.gluster.org/mailman/listinfo/gluster-users
> > >
> >
> > --
> > Amar Tumballi (amarts)
> >