[Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd

Chen Chen chenchen at smartquerier.com
Thu Apr 7 03:21:42 UTC 2016


Hi Ashish,

I experienced another blocked inode lock today:

================================================================
[xlator.features.locks.mainvol-locks.inode]
path=/home/analyzer/workdir/NTD/bam/A1649.bam
mandatory=0
inodelk-count=3
lock-dump.domain.domain=mainvol-disperse-0:self-heal
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=547a5f1e967f0000, client=0x7f03ac0037c0, connection-id=hw10-27049-2016/04/05-06:37:22:818320-mainvol-client-0-0-0, granted at 2016-04-06 09:11:14
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=b046611e967f0000, client=0x7f03ac0037c0, connection-id=hw10-27049-2016/04/05-06:37:22:818320-mainvol-client-0-0-0, blocked at 2016-04-06 09:11:14
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=5433d109967f0000, client=0x7f03ac0037c0, connection-id=hw10-27049-2016/04/05-06:37:22:818320-mainvol-client-0-0-0, blocked at 2016-04-06 09:11:34
--

[xlator.features.locks.mainvol-locks.inode]
path=/home/analyzer/workdir/NTD/bam/A1588.bam
mandatory=0
inodelk-count=3
lock-dump.domain.domain=mainvol-disperse-0:self-heal
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=b8185e1e967f0000, client=0x7f03ac0037c0, connection-id=hw10-27049-2016/04/05-06:37:22:818320-mainvol-client-0-0-0, granted at 2016-04-06 09:11:14
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d44e5b1e967f0000, client=0x7f03ac0037c0, connection-id=hw10-27049-2016/04/05-06:37:22:818320-mainvol-client-0-0-0, blocked at 2016-04-06 09:11:14
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=3c15d109967f0000, client=0x7f03ac0037c0, connection-id=hw10-27049-2016/04/05-06:37:22:818320-mainvol-client-0-0-0, blocked at 2016-04-06 09:11:34
================================================================
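
To spot these quickly, counting the blocked entries in the statedumps works
well, e.g. (assuming the dumps land in the default /var/run/gluster):

    grep -c "(BLOCKED)" /var/run/gluster/*.dump.*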

Could it be caused by the increased "nfs.outstanding-rpc-limit"? The volume
info is attached. I'll decrease it to the default of 16 to test.
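
For reference, this is roughly what I plan to run (assuming 16 is indeed the
default on 3.7.x):

    # put the option back to its default
    gluster volume reset mainvol nfs.outstanding-rpc-limit
    # or pin it explicitly
    gluster volume set mainvol nfs.outstanding-rpc-limit 16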

Best wishes,
Chen

On 4/4/2016 6:11 PM, Ashish Pandey wrote:
> Hi Chen,
>
> As I suspected, there are many blocked calls for inodelk in sm11/mnt-disk1-mainvol.31115.dump.1459760675.
>
> =============================================
> [xlator.features.locks.mainvol-locks.inode]
> path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
> mandatory=0
> inodelk-count=4
> lock-dump.domain.domain=mainvol-disperse-0:self-heal
> lock-dump.domain.domain=mainvol-disperse-0
> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc2d3dfcc57f0000, client=0x7ff03435d5f0, connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0, blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58
> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=1414371e1a7f0000, client=0x7ff034204490, connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0, blocked at 2016-04-01 16:58:51
> inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=a8eb14cd9b7f0000, client=0x7ff01400dbd0, connection-id=sm14-879-2016/04/01-07:51:56:133106-mainvol-client-0-0-0, blocked at 2016-04-01 17:03:41
> inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=b41a0482867f0000, client=0x7ff01800e670, connection-id=sm15-30906-2016/04/01-07:51:45:711474-mainvol-client-0-0-0, blocked at 2016-04-01 17:05:09
> =============================================
>
> This could be the cause of the hang.
> Possible workaround -
> If there is no IO going on for this volume, we can restart the volume using "gluster v start <volume-name> force". This will also restart the NFS process, which will release the locks and
> should get us out of this issue.
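>
> If restarting the whole volume is not an option, "gluster volume clear-locks" might be a more targeted alternative; it can drop blocked or granted locks on a single inode. A sketch (not verified here; please check "gluster volume help" for the exact syntax on your version):
>
>     gluster volume clear-locks mainvol /home/analyzer/softs/bin/GenomeAnalysisTK.jar kind all inode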
>
> Ashish
>
>
>
>
> ----- Original Message -----
> From: "Chen Chen" <chenchen at smartquerier.com>
> To: "Ashish Pandey" <aspandey at redhat.com>
> Cc: gluster-users at gluster.org
> Sent: Monday, April 4, 2016 2:56:37 PM
> Subject: Re: [Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
>
> Hi Ashish,
>
> Yes, I only uploaded the directory of one node (sm11). All nodes are
> showing the same kind of errors, more or less at the same time.
>
> I'm sending the info for the other 5 nodes. Logs of all bricks (except
> the "dead" 1x2) are also appended. One of the nodes (sm16) refused to let
> me ssh into it; "volume status" said it is still alive, and showmount on
> it is working too.
>
> The node "hw10" works as a pure NFS server and doesn't have any bricks.
>
> The dump file and logs are again in my Dropbox (3.8M)
> https://dl.dropboxusercontent.com/u/56671522/statedump.tar.xz
>
> Best wishes,
> Chen
>
> On 4/4/2016 4:27 PM, Ashish Pandey wrote:
>>
>> Hi Chen,
>>
>> By looking at the log in mnt-disk1-mainvol.log, I suspect this hang is because of inode lock contention.
>> I think the logs provided are for one brick only.
>> To confirm, we would require statedumps for all the brick processes and the NFS server:
>>
>> For bricks: gluster volume statedump <volname>
>> For nfs server: gluster volume statedump <volname> nfs
>>
>> The directory where the statedump files are created can be found with the 'gluster --print-statedumpdir' command.
>> If it is not present, create this directory.
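>>
>> For example, on this setup something like the following should produce the dumps (assuming the default directory /var/run/gluster; adjust if --print-statedumpdir says otherwise):
>>
>>     gluster --print-statedumpdir
>>     mkdir -p /var/run/gluster            # only if the printed directory is missing
>>     gluster volume statedump mainvol
>>     gluster volume statedump mainvol nfs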
>>
>> Logs for all the bricks are also required.
>> You should also try restarting the volume, which could resolve this hang if it is caused by an inode lock:
>>
>> gluster volume start <volname> force
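>>
>> (On this setup that would be "gluster volume start mainvol force"; the "force" keyword allows the command to run on an already-started volume and restarts the NFS process, which releases its locks.)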
>>
>> Ashish
>>
>>
>>
>>
>>
>>
>> ----- Original Message -----
>> From: "Chen Chen" <chenchen at smartquerier.com>
>> To: "Ashish Pandey" <aspandey at redhat.com>
>> Cc: gluster-users at gluster.org
>> Sent: Sunday, April 3, 2016 2:13:22 PM
>> Subject: Re: [Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
>>
>> Hi Ashish Pandey,
>>
>> After some investigation I updated the servers from 3.7.6 to 3.7.9. I
>> also switched from the native FUSE mount to an NFS mount (which boosted
>> performance a lot in my tests) on April 1st.
>>
>> Then, after two days of running, the cluster appeared to be locked: "ls"
>> hangs, there is no network usage, and the volume profile showed no r/w
>> activity on the bricks. "dmesg" showed that NFS went dead within 12 hrs
>> (Apr 2 01:13), but "showmount" and "volume status" said the NFS server
>> is responding and all bricks are alive.
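>>
>> Roughly the checks I ran:
>>
>>     showmount -e localhost               # exports are still listed
>>     gluster volume status mainvol        # all bricks reported online
>>     gluster volume profile mainvol info  # no read/write FOPs on the bricks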
>>
>> I'm not sure what happened (glustershd.log and nfs.log didn't show
>> anything interesting), so I dumped the whole log folder instead. It was
>> a bit too large (5MB, filled with Errors and Warnings) and my mail was
>> rejected multiple times by the mailing list, so I could only attach a
>> snapshot of all the logs. You can grab the full version at
>> https://dl.dropboxusercontent.com/u/56671522/glusterfs.tar.xz instead.
>>
>> The volume profile info is also attached. Hope it helps.
>>
>> Best wishes,
>> Chen
>>
>> On 3/27/2016 2:38 AM, Ashish Pandey wrote:
>>> Hi Chen,
>>>
>>> Could you please send us the following logs:
>>> 1 - brick logs - under /var/log/glusterfs/bricks/
>>> 2 - mount logs
>>>
>>> Also, some information about what kind of IO was happening (read, write, unlink, rename on different mounts) would help us understand this issue better.
>>>
>>> ---
>>> Ashish
>>>
>>> ----- Original Message -----
>>> From: "Chen Chen" <chenchen at smartquerier.com>
>>> To: gluster-users at gluster.org
>>> Sent: Friday, March 25, 2016 8:59:04 AM
>>> Subject: [Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
>>>
>>> Hi Everyone,
>>>
>>> I have a "2 x (4 + 2) = 12" Distributed-Disperse volume. After upgrading
>>> to 3.7.8, I noticed the volume is frequently out of service. The
>>> glustershd.log is flooded with:
>>>
>>> [ec-combine.c:866:ec_combine_check] 0-mainvol-disperse-1: Mismatching
>>> xdata in answers of 'LOOKUP'
>>> [ec-common.c:116:ec_check_status] 0-mainvol-disperse-1: Operation failed
>>> on some subvolumes (up=3F, mask=3F, remaining=0, good=1E, bad=21)
>>> [ec-common.c:71:ec_heal_report] 0-mainvol-disperse-1: Heal failed
>>> [Invalid argument]
>>> [ec-combine.c:206:ec_iatt_combine] 0-mainvol-disperse-0: Failed to
>>> combine iatt (inode: xxx, links: 1-1, uid: 1000-1000, gid: 1000-1000,
>>> rdev: 0-0, size: xxx-xxx, mode: 100600-100600)
>>>
>>> in the normal working state, and sometimes 1000+ lines of:
>>>
>>> [client-rpc-fops.c:466:client3_3_open_cbk] 0-mainvol-client-7: remote
>>> operation failed. Path: <gfid:xxxx> (xxxx) [Too many open files]
>>>
>>> and the brick went offline. "top open" showed "Max open fds: 899195".
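>>>
>>> (That figure came from the open-fd counters, roughly:
>>>
>>>     gluster volume top mainvol open
>>>
>>> which lists the current and maximum open fds per brick.)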
>>>
>>> Can anyone suggest what happened, and what I should do? I was trying
>>> to deal with the terrible IOPS problem, but things got even worse.
>>>
>>> Each server has 2 x E5-2630v3 (32 threads/server) and 32 GB RAM.
>>> Additional info is in the attachments. Many thanks.
>>>
>>> Sincerely yours,
>>> Chen
>>>
>>
>

-- 
Chen Chen
上海慧算生物技术有限公司
Shanghai SmartQuerier Biotechnology Co., Ltd.
Add: Room 410, 781 Cai Lun Road, China (Shanghai) Pilot Free Trade Zone
         Shanghai 201203, P. R. China
Mob: +86 15221885893
Email: chenchen at smartquerier.com
Web: www.smartquerier.com
-------------- next part --------------
Volume Name: mainvol
Type: Distributed-Disperse
Volume ID: 2e190c59-9e28-43a5-b22a-24f75e9a580b
Status: Started
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: sm11:/mnt/disk1/mainvol
Brick2: sm12:/mnt/disk1/mainvol
Brick3: sm13:/mnt/disk1/mainvol
Brick4: sm14:/mnt/disk2/mainvol
Brick5: sm15:/mnt/disk2/mainvol
Brick6: sm16:/mnt/disk2/mainvol
Brick7: sm11:/mnt/disk2/mainvol
Brick8: sm12:/mnt/disk2/mainvol
Brick9: sm13:/mnt/disk2/mainvol
Brick10: sm14:/mnt/disk1/mainvol
Brick11: sm15:/mnt/disk1/mainvol
Brick12: sm16:/mnt/disk1/mainvol
Options Reconfigured:
nfs.outstanding-rpc-limit: 64
nfs.rpc-auth-allow: 172.168.135.*,127.0.0.1,::1
network.remote-dio: false
performance.io-cache: true
performance.cache-size: 16GB
auth.allow: 172.16.135.*
performance.readdir-ahead: on
client.event-threads: 8
server.event-threads: 8
performance.io-thread-count: 32
performance.write-behind-window-size: 4MB
nfs.disable: false
diagnostics.client-log-level: WARNING
diagnostics.brick-log-level: WARNING
cluster.lookup-optimize: on
cluster.readdir-optimize: on
server.outstanding-rpc-limit: 256

