[Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd

Chen Chen chenchen at smartquerier.com
Mon Apr 4 11:44:14 UTC 2016


Hi Ashish,

There was heavy IO load on the cluster when it got locked down. I fear 
the processes waiting for IO will all crash.

Furthermore, both "start force" and "stop" told me "Error : Request 
timed out". I'm not sure if it was caused by the semi-dead node. I'll 
hard reset the node tomorrow and see if it helps.

Besides, what caused the lock and how can I avoid it? Any advice is 
appreciated.

Best wishes,
Chen

On 4/4/2016 6:11 PM, Ashish Pandey wrote:
> Hi Chen,
>
> As I suspected, there are many blocked calls for inodelk in sm11/mnt-disk1-mainvol.31115.dump.1459760675.
>
> =============================================
> [xlator.features.locks.mainvol-locks.inode]
> path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
> mandatory=0
> inodelk-count=4
> lock-dump.domain.domain=mainvol-disperse-0:self-heal
> lock-dump.domain.domain=mainvol-disperse-0
> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc2d3dfcc57f0000, client=0x7ff03435d5f0, connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0, blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58
> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=1414371e1a7f0000, client=0x7ff034204490, connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0, blocked at 2016-04-01 16:58:51
> inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=a8eb14cd9b7f0000, client=0x7ff01400dbd0, connection-id=sm14-879-2016/04/01-07:51:56:133106-mainvol-client-0-0-0, blocked at 2016-04-01 17:03:41
> inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=b41a0482867f0000, client=0x7ff01800e670, connection-id=sm15-30906-2016/04/01-07:51:45:711474-mainvol-client-0-0-0, blocked at 2016-04-01 17:05:09
> =============================================
>
> This could be the cause of the hang.
> Possible workaround:
> If there is no IO going on for this volume, we can restart the volume using 'gluster v start <volume-name> force'.
> This will also restart the NFS process, which will release the locks and get us out of this issue.
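
A minimal Python sketch for finding entries like the ones in the statedump excerpt above, i.e. which connection holds an inode lock and which connections are blocked behind it. It assumes the line layout shown in that excerpt (a "path=" line followed by "inodelk.inodelk[N](STATE)=..." lines). The function names parse_inodelks and blocked_waiters are hypothetical helpers, not part of Gluster, and other Gluster versions may format these lines differently.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt in this thread:
# inodelk.inodelk[1](BLOCKED)=type=WRITE, ..., connection-id=<id>, blocked at <timestamp>
ENTRY = re.compile(
    r"inodelk\.inodelk\[(?P<index>\d+)\]\((?P<state>ACTIVE|BLOCKED)\)="
    r".*?connection-id=(?P<conn>[^,]+)"
    r".*?blocked at (?P<blocked>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)

# Two entries copied verbatim from the statedump quoted in this thread.
SAMPLE = """\
[xlator.features.locks.mainvol-locks.inode]
path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc2d3dfcc57f0000, client=0x7ff03435d5f0, connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0, blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=1414371e1a7f0000, client=0x7ff034204490, connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0, blocked at 2016-04-01 16:58:51
"""


def parse_inodelks(text):
    """Group inodelk entries under the most recent path= line seen before them."""
    locks = {}
    path = None
    for raw in text.splitlines():
        # Strip leading '>' and space characters so mail-quoted dumps also parse.
        line = raw.lstrip("> ").strip()
        if line.startswith("path="):
            path = line[len("path="):]
            continue
        match = ENTRY.search(line)
        if match:
            locks.setdefault(path, []).append({
                "index": int(match.group("index")),
                "state": match.group("state"),
                "connection": match.group("conn"),
                "blocked_at": datetime.strptime(
                    match.group("blocked"), "%Y-%m-%d %H:%M:%S"
                ),
            })
    return locks


def blocked_waiters(locks):
    """Return only the BLOCKED entries, keyed by path; paths with none are dropped."""
    result = {}
    for path, entries in locks.items():
        waiting = [e for e in entries if e["state"] == "BLOCKED"]
        if waiting:
            result[path] = waiting
    return result


if __name__ == "__main__":
    parsed = parse_inodelks(SAMPLE)
    for path, waiting in blocked_waiters(parsed).items():
        holders = [e["connection"] for e in parsed[path] if e["state"] == "ACTIVE"]
        print(path)
        print("  held by:", ", ".join(holders) or "(no ACTIVE entry found)")
        for entry in waiting:
            print("  waiting since", entry["blocked_at"], "-", entry["connection"])
```

To use it on a real dump, read the file contents and pass them to parse_inodelks in place of SAMPLE.
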
>
> Ashish
>
>
>
>
> ----- Original Message -----
> From: "Chen Chen" <chenchen at smartquerier.com>
> To: "Ashish Pandey" <aspandey at redhat.com>
> Cc: gluster-users at gluster.org
> Sent: Monday, April 4, 2016 2:56:37 PM
> Subject: Re: [Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
>
> Hi Ashish,
>
> Yes, I only uploaded the directory of one node (sm11). All nodes are
> showing the same kind of errors at more or less the same time.
>
> I'm sending the info for the other 5 nodes. Logs of all bricks (except
> the "dead" 1x2) are also included. One of the nodes (sm16) refused to let
> me ssh into it. volume status said it is still alive and showmount on it
> is working too.
>
> The node "hw10" works as a pure NFS server and doesn't have any bricks.
>
> The dump file and logs are again in my Dropbox (3.8M)
> https://dl.dropboxusercontent.com/u/56671522/statedump.tar.xz
>
> Best wishes,
> Chen
>
> On 4/4/2016 4:27 PM, Ashish Pandey wrote:
>>
>> Hi Chen,
>>
>> Looking at the log in mnt-disk1-mainvol.log and mnt-disk1-mainvol.log, I suspect this hang is because of inode lock contention.
>> I think the logs provided are for one brick only.
>> To make sure of it, we would need statedumps for all the brick processes and the NFS server.
>>
>> For bricks: gluster volume statedump <volname>
>> For nfs server: gluster volume statedump <volname> nfs
>>
>> The directory where statedump files are created can be found using the 'gluster --print-statedumpdir' command.
>> If it is not present, create this directory.
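
The statedump steps above can be scripted. The following is a rough Python sketch that only wraps the three gluster commands quoted in this message; statedump_commands and collect_statedumps are hypothetical helper names, and the sketch assumes the gluster CLI is on PATH and is run with sufficient privileges on a server node.

```python
import subprocess


def statedump_commands(volname):
    """Return the gluster invocations named in this thread as argv lists."""
    return [
        ["gluster", "--print-statedumpdir"],                 # where dumps are written
        ["gluster", "volume", "statedump", volname],         # brick processes
        ["gluster", "volume", "statedump", volname, "nfs"],  # NFS server
    ]


def collect_statedumps(volname, run=subprocess.run):
    """Run each command in order and return (argv, returncode, stdout) tuples.

    `run` is injectable so the function can be exercised without a cluster.
    """
    results = []
    for argv in statedump_commands(volname):
        done = run(argv, capture_output=True, text=True)
        results.append((argv, done.returncode, (done.stdout or "").strip()))
    return results


if __name__ == "__main__":
    # "mainvol" is the volume name that appears in the logs in this thread.
    for argv, code, out in collect_statedumps("mainvol"):
        print(" ".join(argv), "-> exit", code)
        if out:
            print("   ", out)
```

A non-zero return code from any step is worth checking before looking for the dump files.
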
>>
>> Logs for all the bricks are also required.
>> You should try restarting the volume, which could resolve this hang if it is caused by an inode lock.
>>
>> gluster volume start <volname> force
>>
>> Ashish

-- 
Chen Chen
上海慧算生物技术有限公司
Shanghai SmartQuerier Biotechnology Co., Ltd.
Add: Room 410, 781 Cai Lun Road, China (Shanghai) Pilot Free Trade Zone
         Shanghai 201203, P. R. China
Mob: +86 15221885893
Email: chenchen at smartquerier.com
Web: www.smartquerier.com
