[Bugs] [Bug 1311354] New: File operation hangs in 26-node cluster under heavy load
bugzilla at redhat.com
Wed Feb 24 02:28:33 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1311354
Bug ID: 1311354
Summary: File operation hangs in 26-node cluster under heavy load
Product: GlusterFS
Version: 3.5.5
Component: fuse
Severity: urgent
Assignee: bugs at gluster.org
Reporter: wymonsoon at gmail.com
CC: bugs at gluster.org
Created attachment 1130004
--> https://bugzilla.redhat.com/attachment.cgi?id=1130004&action=edit
client side log
Description of problem:
We are using GlusterFS 3.5.5.
The server side is deployed on a 26-node cluster; each node hosts one brick.
The client side is a 32-node cluster (including the 26 server nodes) that runs
distributed video transcoding. The GlusterFS volume is the shared file store
for the 32 servers, mounted with FUSE.
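For reference, the FUSE mount on each client is of the following form (the
volfile server and the mount point shown here are illustrative; any of the 26
brick hosts can serve as the volfile server):

  mount -t glusterfs hzsq-encode-33:/hzsq_encode_02 /mnt/gfs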
We found that when the workload is high, clients often hang on file operations
on the GlusterFS mount. The client log indicates that the client loses pings
from the server, which leads to a burst of "Transport endpoint is not
connected" errors in the log.
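Note that network.ping-timeout is set to 20 seconds on this volume (see the
volume info below), so a client that misses server pings for 20 seconds will
drop the connection and fail operations with "Transport endpoint is not
connected" until it reconnects. As a triage suggestion, the timeout could be
raised back toward the upstream default of 42 seconds with:

  gluster volume set hzsq_encode_02 network.ping-timeout 42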
Version-Release number of selected component (if applicable):
3.5.5
How reproducible:
Dozens of times per hour.
Steps to Reproduce:
1. Mount the volume on the client nodes with FUSE.
2. Run the distributed video-transcoding workload at high load.
3. Observe file operations on the mount hang.
Actual results:
File operations on the GlusterFS mount hang.
Expected results:
All file operations run correctly; no hangs.
Additional info:
OS: debian 8.2
Kernel: 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04) x86_64 GNU/Linux
During the hang period, TCP-level ping between the client and the servers
continues to work correctly.
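To rule out brick-side failures during a hang, the brick processes and client
connections can be inspected from any server node (shown as a triage
suggestion, not something we have captured yet):

  gluster volume status hzsq_encode_02
  gluster volume status hzsq_encode_02 clients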
Our volume info:
Volume Name: hzsq_encode_02
Type: Distributed-Replicate
Volume ID: 653b554b-47aa-4f25-a102-7ac6858f41e1
Status: Started
Number of Bricks: 13 x 2 = 26
Transport-type: tcp
Bricks:
Brick1: hzsq-encode-33:/data/gfs-brk
Brick2: hzsq-encode-34:/data/gfs-brk
Brick3: hzsq-encode-41:/data/gfs-brk
Brick4: hzsq-encode-42:/data/gfs-brk
Brick5: hzsq-encode-43:/data/gfs-brk
Brick6: hzsq-encode-44:/data/gfs-brk
Brick7: hzsq-encode-45:/data/gfs-brk
Brick8: hzsq-encode-46:/data/gfs-brk
Brick9: hzsq-encode-47:/data/gfs-brk
Brick10: hzsq-encode-48:/data/gfs-brk
Brick11: hzsq-encode-49:/data/gfs-brk
Brick12: hzsq-encode-50:/data/gfs-brk
Brick13: hzsq-encode-51:/data/gfs-brk
Brick14: hzsq-encode-52:/data/gfs-brk
Brick15: hzsq-encode-53:/data/gfs-brk
Brick16: hzsq-encode-54:/data/gfs-brk
Brick17: hzsq-encode-55:/data/gfs-brk
Brick18: hzsq-encode-56:/data/gfs-brk
Brick19: hzsq-encode-57:/data/gfs-brk
Brick20: hzsq-encode-58:/data/gfs-brk
Brick21: hzsq-encode-59:/data/gfs-brk
Brick22: hzsq-encode-60:/data/gfs-brk
Brick23: hzsq-encode-61:/data/gfs-brk
Brick24: hzsq-encode-62:/data/gfs-brk
Brick25: hzsq-encode-63:/data/gfs-brk
Brick26: hzsq-encode-64:/data/gfs-brk
Options Reconfigured:
nfs.disable: On
performance.io-thread-count: 32
performance.cache-refresh-timeout: 1
performance.write-behind-window-size: 1MB
performance.cache-size: 128MB
performance.flush-behind: On
server.outstanding-rpc-limit: 0
performance.read-ahead: On
performance.io-cache: On
performance.quick-read: off
nfs.outstanding-rpc-limit: 0
network.ping-timeout: 20
server.statedump-path: /tmp
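Since server.statedump-path is set to /tmp, a statedump could be captured
while a client is hung to show pending frames on the bricks (the dump files
are written to /tmp on each brick host):

  gluster volume statedump hzsq_encode_02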