[Gluster-users] hanging httpd processes.

Alvin Starr alvin at netvel.net
Fri Mar 31 15:06:40 UTC 2017


Since things are in production, upgrades need to be scheduled, so it 
may take a while before I can get everything up to 3.10.

I have set the write-behind option to off on the servers, but do I need 
to restart the servers, or can I get away with just a umount/mount on the clients?
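
For reference, this is roughly how the option change can be scripted. It is only a sketch: it builds the `gluster volume set` / `gluster volume get` argument lists and prints them without running anything. The volume name `edocs-production` is an assumption taken from the translator names in the statedump quoted below, and the helper name `write_behind_commands` is made up for illustration.

```python
import shlex


def write_behind_commands(volume, state="off"):
    """Return gluster CLI invocations as argument lists (nothing is executed).

    `volume` is an assumed volume name; adjust it to the real one.
    """
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    option = "performance.write-behind"
    return [
        # Change the option on the volume.
        ["gluster", "volume", "set", volume, option, state],
        # Read the option back to confirm the new value.
        ["gluster", "volume", "get", volume, option],
    ]


if __name__ == "__main__":
    for argv in write_behind_commands("edocs-production"):
        print(" ".join(shlex.quote(a) for a in argv))
```

The printed lines can be reviewed and then run by hand on one of the servers.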

Thank you for taking a look at this for us.


On 03/31/2017 05:54 AM, Mohit Agrawal wrote:
> Hi,
>
> The client and server versions you mentioned in the thread show that the 
> package versions differ between client and server.
> We recommend upgrading both servers and clients to rhs-3.10.1.
> If it is not possible to upgrade both client and server, then upgrading 
> only the clients is required.
>
> Thanks
> Mohit Agrawal
>
> On Fri, Mar 31, 2017 at 2:27 PM, Mohit Agrawal <moagrawa at redhat.com> wrote:
>
>     Hi,
>
>     As per the attached glusterdump/stackdump, this appears to be a known
>     issue (https://bugzilla.redhat.com/show_bug.cgi?id=1372211) that is
>     already fixed by the patch https://review.gluster.org/#/c/15380/.
>
>     The issue happens in the following case. Assume a file is opened
>     with fd1 and fd2.
>
>     1. Some WRITE ops to fd1 got an error; they were added back to the
>        'todo' queue because of those errors.
>     2. fd2 is closed, and a FLUSH op is sent to write-behind.
>     3. The FLUSH cannot be unwound because it is not a legal waiter for
>        those failed writes (as the function __wb_request_waiting_on()
>        says), and those failed WRITEs cannot be ended as long as fd1 is
>        not closed. fd2 is stuck in the close syscall.
>
>     The statedump also shows that the fd of the flush op is not the same
>     as the fd of the write op.
>
>     Kindly upgrade the package to 3.10.1 and share the result.
>
>     Thanks
>     Mohit Agrawal
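
The three-step scenario quoted above can be illustrated with a toy model. This is a deliberately simplified sketch, not the write-behind implementation: the class `ToyInodeQueue` and its methods are hypothetical, and the only rule it encodes is the one described in the quote, namely that a failed WRITE leaves the queue only when its own fd is flushed, and a FLUSH can unwind only once the queue is empty.

```python
from collections import deque


class ToyInodeQueue:
    """Toy per-file 'todo' queue; NOT the real write-behind data structure."""

    def __init__(self):
        self.todo = deque()  # entries are (op, fd) tuples, oldest first

    def write_failed(self, fd):
        # Step 1: a WRITE that hit an error is put back on the todo queue.
        self.todo.append(("WRITE", fd))

    def flush(self, fd):
        # Steps 2-3: a FLUSH only discards failed WRITEs issued on its own fd.
        self.todo = deque(e for e in self.todo if e != ("WRITE", fd))
        # In this model the FLUSH can unwind only when the queue is empty.
        return "unwound" if not self.todo else "stuck"


def demo():
    q = ToyInodeQueue()
    q.write_failed("fd1")
    first = q.flush("fd2")   # close(fd2) while the failed WRITE on fd1 is queued
    second = q.flush("fd1")  # flushing fd1 drains the queue
    return first, second


if __name__ == "__main__":
    print(demo())
```

In this model the flush on fd2 reports "stuck" while the flush on fd1 reports "unwound", which mirrors the hang in close() described in the quote.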
>
>     On Fri, Mar 31, 2017 at 12:29 PM, Amar Tumballi <atumball at redhat.com> wrote:
>
>     > Hi Alvin,
>     >
>     > Thanks for the dump output. It helped a bit.
>     >
>     > For now, recommend turning off open-behind and read-ahead performance
>     > translators for you to get rid of this situation, as I noticed hung FLUSH
>     > operations from these translators.
>     >
>     Looks like I gave wrong advice by looking at the snippet below:
>
>     > [global.callpool.stack.61]
>     > stack=0x7f6c6f628f04
>     > uid=48
>     > gid=48
>     > pid=11077
>     > unique=10048797
>     > lk-owner=a73ae5bdb5fcd0d2
>     > op=FLUSH
>     > type=1
>     > cnt=5
>     >
>     > [global.callpool.stack.61.frame.1]
>     > frame=0x7f6c6f793d88
>     > ref_count=0
>     > translator=edocs-production-write-behind
>     > complete=0
>     > parent=edocs-production-read-ahead
>     > wind_from=ra_flush
>     > wind_to=FIRST_CHILD (this)->fops->flush
>     > unwind_to=ra_flush_cbk
>     >
>     > [global.callpool.stack.61.frame.2]
>     > frame=0x7f6c6f796c90
>     > ref_count=1
>     > translator=edocs-production-read-ahead
>     > complete=0
>     > parent=edocs-production-open-behind
>     > wind_from=default_flush_resume
>     > wind_to=FIRST_CHILD(this)->fops->flush
>     > unwind_to=default_flush_cbk
>     >
>     > [global.callpool.stack.61.frame.3]
>     > frame=0x7f6c6f79b724
>     > ref_count=1
>     > translator=edocs-production-open-behind
>     > complete=0
>     > parent=edocs-production
>     > wind_from=io_stats_flush
>     > wind_to=FIRST_CHILD(this)->fops->flush
>     > unwind_to=io_stats_flush_cbk
>     >
>     > [global.callpool.stack.61.frame.4]
>     > frame=0x7f6c6f79b474
>     > ref_count=1
>     > translator=edocs-production
>     > complete=0
>     > parent=fuse
>     > wind_from=fuse_flush_resume
>     > wind_to=FIRST_CHILD(this)->fops->flush
>     > unwind_to=fuse_err_cbk
>     >
>     > [global.callpool.stack.61.frame.5]
>     > frame=0x7f6c6f796684
>     > ref_count=1
>     > translator=fuse
>     > complete=0
>     >
>     Most probably, the issue is with write-behind's flush. So please turn off
>     write-behind and test. If you don't have any hung httpd processes, please
>     let us know.
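
To check other dumps for the same pattern, a small script along these lines might help. It is a minimal sketch that assumes the statedump is plain text made of `[section]` headers followed by `key=value` lines, as in the excerpt above; the function names `parse_statedump` and `incomplete_translators` are made up for illustration, and the sample is an abbreviated copy of the quoted stack.

```python
def parse_statedump(text):
    """Parse '[section]' headers and 'key=value' lines into a dict of dicts."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


def incomplete_translators(sections, op="FLUSH"):
    """For each call stack with the given op, list translators of incomplete frames."""
    result = {}
    for name, fields in sections.items():
        # Skip frame sections and stacks carrying a different operation.
        if ".frame." in name or fields.get("op") != op:
            continue
        prefix = name + ".frame."
        result[name] = [
            f["translator"]
            for n, f in sections.items()
            if n.startswith(prefix) and f.get("complete") == "0" and "translator" in f
        ]
    return result


# Abbreviated copy of the quoted excerpt (most fields omitted).
SAMPLE = """\
[global.callpool.stack.61]
op=FLUSH
cnt=5

[global.callpool.stack.61.frame.1]
translator=edocs-production-write-behind
complete=0
parent=edocs-production-read-ahead

[global.callpool.stack.61.frame.2]
translator=edocs-production-read-ahead
complete=0
"""

if __name__ == "__main__":
    for stack, xlators in incomplete_translators(parse_statedump(SAMPLE)).items():
        print(stack, xlators)
```

Stacks whose list starts with a write-behind translator would be the ones matching the hang discussed here.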
>
>     -Amar
>
>
>     > -Amar
>     >
>     > On Wed, Mar 29, 2017 at 6:56 AM, Alvin Starr <alvin at netvel.net> wrote:
>     >
>     >> We are running gluster 3.8.9-1 on Centos 7.3.1611 for the servers and on
>     >> the clients 3.7.11-2 on Centos 6.8
>     >>
>     >> We are seeing httpd processes hang in fuse_request_send or sync_page.
>     >>
>     >> These calls are from PHP 5.3.3-48 scripts
>     >>
>     >> I am attaching a tgz file that contains the process dump from glusterfsd
>     >> and the hung pids along with the offending pid's stacks from
>     >> /proc/{pid}/stack.
>     >>
>     >> This has been a low level annoyance for a while but it has become a much
>     >> bigger issue because the number of hung processes went from a few a week to
>     >> a few hundred a day.
>     >>
>     >>
>     >> --
>     >> Alvin Starr || voice: (905)513-7688
>     >> Netvel Inc. || Cell: (416)806-0133
>     >> alvin at netvel.net ||
>     >>
>     >>
>     >> _______________________________________________
>     >> Gluster-users mailing list
>     >> Gluster-users at gluster.org
>     >> http://lists.gluster.org/mailman/listinfo/gluster-users
>     >
>     >
>     > --
>     > Amar Tumballi (amarts)
>
-- 
Alvin Starr                   ||   voice: (905)513-7688
Netvel Inc.                   ||   Cell:  (416)806-0133
alvin at netvel.net              ||