[Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
Chen Chen
chenchen at smartquerier.com
Fri Mar 25 03:29:04 UTC 2016
Hi Everyone,
I have a "2 x (4 + 2) = 12 Distributed-Disperse" volume. After upgrading
to 3.7.8 I noticed the volume frequently goes out of service. The
glustershd.log is flooded by:
[ec-combine.c:866:ec_combine_check] 0-mainvol-disperse-1: Mismatching
xdata in answers of 'LOOKUP'
[ec-common.c:116:ec_check_status] 0-mainvol-disperse-1: Operation failed
on some subvolumes (up=3F, mask=3F, remaining=0, good=1E, bad=21)
[ec-common.c:71:ec_heal_report] 0-mainvol-disperse-1: Heal failed
[Invalid argument]
[ec-combine.c:206:ec_iatt_combine] 0-mainvol-disperse-0: Failed to
combine iatt (inode: xxx, links: 1-1, uid: 1000-1000, gid: 1000-1000,
rdev: 0-0, size: xxx-xxx, mode: 100600-100600)
even in a normal working state, and sometimes 1000+ lines of:
[client-rpc-fops.c:466:client3_3_open_cbk] 0-mainvol-client-7: remote
operation failed. Path: <gfid:xxxx> (xxxx) [Too many open files]
and the brick went offline. "top open" showed "Max open fds: 899195".
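For reference, here is a rough way to compare a brick's fd usage against its
process limit (the PID and brick path below are just taken from the attached
status as an example; substitute your own):

  # fd limit and current fd count of one brick process
  grep 'open files' /proc/16501/limits
  ls /proc/16501/fd | wc -l
  # per-file open counts and the "Max open fds" figure from gluster itself
  gluster volume top mainvol open brick sm11:/mnt/disk1/mainvol
  # open fd tables for all bricks in the volume
  gluster volume status mainvol fd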
Can anyone tell me what happened, and what I should do? I was trying
to deal with a terrible IOPS problem, but things got even worse.
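In case it helps with reading the first error: my understanding is that the
masks in the ec_check_status line are per-brick bitmaps over the six bricks
of disperse-1 (bit 0 = first brick), so a quick conversion gives:

  # convert the hex masks to binary (bc drops leading zeros)
  for m in 3F 1E 21; do echo "obase=2; ibase=16; $m" | bc; done
  # 3F -> 111111 : all six bricks were up
  # 1E -> 011110 : the four middle bricks answered consistently ("good")
  # 21 -> 100001 : the first and last bricks mismatched ("bad")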
Each server has 2 x E5-2630v3 CPUs (32 threads/server) and 32 GB RAM.
Additional info is in the attachments. Many thanks.
Sincerely yours,
Chen
--
Chen Chen
Shanghai SmartQuerier Biotechnology Co., Ltd.
Add: Room 410, 781 Cai Lun Road, China (Shanghai) Pilot Free Trade Zone
Shanghai 201203, P. R. China
Mob: +86 15221885893
Email: chenchen at smartquerier.com
Web: www.smartquerier.com
-------------- next part --------------
Volume Name: mainvol
Type: Distributed-Disperse
Volume ID: 2e190c59-9e28-43a5-b22a-24f75e9a580b
Status: Started
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: sm11:/mnt/disk1/mainvol
Brick2: sm12:/mnt/disk1/mainvol
Brick3: sm13:/mnt/disk1/mainvol
Brick4: sm14:/mnt/disk2/mainvol
Brick5: sm15:/mnt/disk2/mainvol
Brick6: sm16:/mnt/disk2/mainvol
Brick7: sm11:/mnt/disk2/mainvol
Brick8: sm12:/mnt/disk2/mainvol
Brick9: sm13:/mnt/disk2/mainvol
Brick10: sm14:/mnt/disk1/mainvol
Brick11: sm15:/mnt/disk1/mainvol
Brick12: sm16:/mnt/disk1/mainvol
Options Reconfigured:
server.outstanding-rpc-limit: 256
network.remote-dio: false
performance.io-cache: true
performance.readdir-ahead: on
auth.allow: 172.16.135.*
performance.cache-size: 16GB
client.event-threads: 8
server.event-threads: 8
performance.io-thread-count: 32
performance.write-behind-window-size: 4MB
nfs.disable: on
diagnostics.client-log-level: WARNING
diagnostics.brick-log-level: WARNING
cluster.lookup-optimize: on
cluster.readdir-optimize: on
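For reference, the reconfigured options above were applied with the usual
set command, e.g.:

  gluster volume set mainvol performance.cache-size 16GB
  gluster volume set mainvol server.event-threads 8

and a single option can be reverted to its default with
"gluster volume reset mainvol <option>".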
-------------- next part --------------
Status of volume: mainvol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick sm11:/mnt/disk1/mainvol 49152 0 Y 16501
Brick sm12:/mnt/disk1/mainvol 49152 0 Y 15007
Brick sm13:/mnt/disk1/mainvol 49154 0 Y 13123
Brick sm14:/mnt/disk2/mainvol 49154 0 Y 14947
Brick sm15:/mnt/disk2/mainvol 49152 0 Y 13236
Brick sm16:/mnt/disk2/mainvol 49152 0 Y 14762
Brick sm11:/mnt/disk2/mainvol 49153 0 Y 23039
Brick sm12:/mnt/disk2/mainvol 49153 0 Y 19614
Brick sm13:/mnt/disk2/mainvol 49155 0 Y 15387
Brick sm14:/mnt/disk1/mainvol 49155 0 Y 23231
Brick sm15:/mnt/disk1/mainvol 49153 0 Y 28494
Brick sm16:/mnt/disk1/mainvol 49153 0 Y 17656
Self-heal Daemon on localhost N/A N/A Y 25029
Self-heal Daemon on sm11 N/A N/A Y 23634
Self-heal Daemon on sm13 N/A N/A Y 17394
Self-heal Daemon on sm14 N/A N/A Y 31322
Self-heal Daemon on sm12 N/A N/A Y 19609
Self-heal Daemon on hw10 N/A N/A Y 14926
Self-heal Daemon on sm16 N/A N/A Y 17648