[Bugs] [Bug 1635784] brick process segfault

Sat Oct 6 06:59:54 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1635784


--- Comment #4 from kyung-pyo,kim <hgichon at gmail.com> ---
Yesterday, another brick died with the same symptom.
Core file: http://ac2repo.gluesys.com/ac2repo/down/core.50570.tgz

Volume configuration:
- We have 10 EC (disperse) volumes on 6 nodes, each volume 4+2.
- Each brick is 37TB.
- This is the information for one cluster of 10 EC volumes.
- All volumes have the same configuration.
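As a rough sense of scale: a 4+2 disperse volume splits each file into 4 data fragments plus 2 redundancy fragments, so usable capacity is about (6 - 2) x brick size. A minimal sketch, assuming all six bricks are 37TB and ignoring filesystem overhead (the helper name is hypothetical):

```python
def disperse_usable_tb(bricks: int, redundancy: int, brick_tb: float) -> float:
    """Approximate usable capacity of one disperse (EC) volume.

    Only (bricks - redundancy) fragments carry data; the rest are
    redundancy. Filesystem and metadata overhead are ignored (assumption).
    """
    if not 0 < redundancy < bricks:
        raise ValueError("redundancy must be between 1 and bricks - 1")
    return (bricks - redundancy) * brick_tb


raw_per_volume = 6 * 37
per_volume = disperse_usable_tb(bricks=6, redundancy=2, brick_tb=37)
print(f"raw per volume:     {raw_per_volume} TB")   # 222 TB
print(f"usable per volume:  {per_volume} TB")       # 148 TB
print(f"usable, 10 volumes: {10 * per_volume} TB")  # 1480 TB
```

Losing every brick on one node therefore means each of the 10 volumes has to rebuild one brick's worth of fragments.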

Data characteristics of each volume:
- df -i: 8M
- 95% mp4 files (~10MB each), plus some txt files with metadata

File/Dir Layout
- /
  └── indexdir (about 1000)
       └── datadir (about 800)
            └── data: about 10 files (txt and mp4)
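The layout can be cross-checked against the df -i figure. A minimal sketch, assuming the tree is nested (every indexdir holds about 800 datadirs and every datadir about 10 files) and using only the approximate counts given above:

```python
# Approximate counts from the layout above (assumed nested).
INDEX_DIRS = 1000
DATA_DIRS_PER_INDEX = 800
FILES_PER_DATA_DIR = 10

data_dirs = INDEX_DIRS * DATA_DIRS_PER_INDEX
files = data_dirs * FILES_PER_DATA_DIR
directories = 1 + INDEX_DIRS + data_dirs  # root + indexdirs + datadirs

print(f"files:       {files:,}")                      # 8,000,000
print(f"directories: {directories:,}")                # 801,001
print(f"total:       {files + directories:,}")        # 8,801,001
```

This lands in the same range as the ~8M reported by df -i; it ignores any inodes used by GlusterFS's internal metadata on the bricks, so it is only an order-of-magnitude check.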
NOTE:
- An aggressive self-heal job is currently running.
- We deleted all bricks on one node and are now monitoring the self-heal status.
- Self-heal daemon memory usage reaches 75GB.
- Before the brick segfaulted, there were many gfid fd cleanup calls.

Volume Name: xxxxxxx-01
Type: Disperse
Volume ID: cac0ab6a-55bd-48ed-ac7a-92f0cb4aca80
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: xxxxxxx-GLUSTER2-1:/gluster/brick1/data
Brick2: xxxxxxx-GLUSTER2-2:/gluster/brick1/data
Brick3: xxxxxxx-GLUSTER2-3:/gluster/brick1/data
Brick4: xxxxxxx-GLUSTER2-4:/gluster/brick1/data
Brick5: xxxxxxx-GLUSTER2-5:/gluster/brick1/data
Brick6: xxxxxxx-GLUSTER2-6:/gluster/brick1/data
Options Reconfigured:
performance.io-thread-count: 64
performance.least-prio-threads: 64
performance.high-prio-threads: 64
performance.normal-prio-threads: 64
performance.low-prio-threads: 64
server.event-threads: 1024
client.event-threads: 32
cluster.lookup-optimize: on
performance.parallel-readdir: on
cluster.use-compound-fops: on
performance.nl-cache: on
performance.nl-cache-positive-entry: on
performance.nl-cache-limit: 1GB
network.inode-lru-limit: 200000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
disperse.shd-wait-qlength: 32768
disperse.shd-max-threads: 16
disperse.self-heal-window-size: 16
disperse.heal-wait-qlength: 2048
disperse.background-heals: 64
performance.write-behind-window-size: 50MB
performance.cache-size: 4GB
cluster.shd-wait-qlength: 32768
cluster.background-self-heal-count: 64
cluster.self-heal-window-size: 16
transport.address-family: inet
nfs.disable: on
cluster.localtime-logging: enable
