[Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
Chen Chen
chenchen at smartquerier.com
Sun Apr 3 08:43:22 UTC 2016
Hi Ashish Pandey,
After some investigation, I updated the server from 3.7.6 to 3.7.9. I
also switched from the native FUSE mount to an NFS mount (which boosted
performance a lot when I tested) on April 1st.
Then, after two days of running, the cluster appeared to be locked up:
"ls" hung, there was no network usage, and volume profile showed no r/w
activity on the bricks. "dmesg" showed the NFS went dead in 12 hrs (Apr 2
01:13), but "showmount" and "volume status" said the NFS server was
responding and all bricks were alive.
I'm not sure what happened (glustershd.log and nfs.log didn't show
anything interesting), so I dumped the whole log folder instead. It was
a bit too large (5MB, filled with Error and Warning messages) and my mail
was rejected multiple times by the mailing list, so I can only attach a
snapshot of all the logs. You can grab the full version at
https://dl.dropboxusercontent.com/u/56671522/glusterfs.tar.xz instead.
The volume profile info is also attached. Hope it helps.
Best wishes,
Chen
On 3/27/2016 2:38 AM, Ashish Pandey wrote:
> Hi Chen,
>
> Could you please send us the following logs:
> 1 - brick logs - under /var/log/messages/brick/
> 2 - mount logs
>
> Also, please share some information about what kind of IO was happening (read, write, unlink, rename on different mounts) so that we can understand this issue better.
>
> ---
> Ashish
>
> ----- Original Message -----
> From: "陈陈" <chenchen at smartquerier.com>
> To: gluster-users at gluster.org
> Sent: Friday, March 25, 2016 8:59:04 AM
> Subject: [Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
>
> Hi Everyone,
>
> I have a "2 x (4 + 2) = 12 Distributed-Disperse" volume. After upgrading
> to 3.7.8, I noticed the volume is frequently out of service. The
> glustershd.log is flooded with:
>
> [ec-combine.c:866:ec_combine_check] 0-mainvol-disperse-1: Mismatching
> xdata in answers of 'LOOKUP'
> [ec-common.c:116:ec_check_status] 0-mainvol-disperse-1: Operation failed
> on some subvolumes (up=3F, mask=3F, remaining=0, good=1E, bad=21)
> [ec-common.c:71:ec_heal_report] 0-mainvol-disperse-1: Heal failed
> [Invalid argument]
> [ec-combine.c:206:ec_iatt_combine] 0-mainvol-disperse-0: Failed to
> combine iatt (inode: xxx, links: 1-1, uid: 1000-1000, gid: 1000-1000,
> rdev: 0-0, size: xxx-xxx, mode: 100600-100600)
>
> in normal working state, and sometimes 1000+ lines of:
>
> [client-rpc-fops.c:466:client3_3_open_cbk] 0-mainvol-client-7: remote
> operation failed. Path: <gfid:xxxx> (xxxx) [Too many open files]
>
> and the brick went offline. "top open" showed "Max open fds: 899195".
>
> Can anyone suggest what happened and what I should do? I was trying
> to deal with a terrible IOPS problem, but things got even worse.
>
> Each server has 2 x E5-2630v3 (32 threads/server) and 32GB RAM. Additional
> information is in the attachments. Many thanks.
>
> Sincerely yours,
> Chen
>
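For the "Too many open files" errors quoted above, here is a minimal sketch of how one might compare a process's open file descriptor count with its limit on Linux. It assumes the standard /proc layout; the function names and the sample limits text are illustrative only and are not taken from this cluster.

```python
import os

# Illustrative sample of /proc/<pid>/limits content (not from the cluster).
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 4096                 files
"""


def parse_max_open_files(limits_text):
    """Return (soft, hard) 'Max open files' limits; None means unlimited."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns after the three-word label are: soft, hard, units.
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def count_open_fds(pid):
    """Count entries in /proc/<pid>/fd (Linux only; needs read permission)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage(pid):
    """Return (open_fds, soft_limit, hard_limit) for a running process."""
    with open(f"/proc/{pid}/limits") as fh:
        soft, hard = parse_max_open_files(fh.read())
    return count_open_fds(pid), soft, hard


print(parse_max_open_files(SAMPLE_LIMITS))  # → (1024, 4096)
```

On a server, calling fd_usage(pid) with the PID of a brick process (the Pid column of "gluster volume status") would show how close that brick is to its limit.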
--
Chen Chen
上海慧算生物技术有限公司
Shanghai SmartQuerier Biotechnology Co., Ltd.
Add: Room 410, 781 Cai Lun Road, China (Shanghai) Pilot Free Trade Zone
Shanghai 201203, P. R. China
Mob: +86 15221885893
Email: chenchen at smartquerier.com
Web: www.smartquerier.com
-------------- next part --------------
Brick: sm16:/mnt/disk2/mainvol
------------------------------
Cumulative Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 37968 18257117 657317
No. of Writes: 27442 4436 607
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 19407384 3134980 3417081
No. of Writes: 641 1008 1217
Block Size: 32768b+ 65536b+
No. of Reads: 1028960 9867913
No. of Writes: 6889 20508938
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 529 FORGET
0.00 0.00 us 0.00 us 0.00 us 81095 RELEASE
0.00 0.00 us 0.00 us 0.00 us 15748 RELEASEDIR
Duration: 167705 seconds
Data Read: 869764755456 bytes
Data Written: 1344596574720 bytes
Interval 1 Stats:
Duration: 255 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: sm16:/mnt/disk1/mainvol
------------------------------
Cumulative Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 25731 124811 62170
No. of Writes: 25591 5235 539
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 1780063 41332 1599410
No. of Writes: 668 901 1155
Block Size: 32768b+ 65536b+
No. of Reads: 597009 7867435
No. of Writes: 7347 18906027
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 500 FORGET
0.00 0.00 us 0.00 us 0.00 us 2585213 RELEASE
0.00 0.00 us 0.00 us 0.00 us 15757 RELEASEDIR
Duration: 167705 seconds
Data Read: 572226195968 bytes
Data Written: 1239575955968 bytes
Interval 1 Stats:
Duration: 255 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: sm11:/mnt/disk1/mainvol
------------------------------
Cumulative Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 38428 18330601 659152
No. of Writes: 27442 4436 607
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 19680537 3186133 3557387
No. of Writes: 641 1008 1217
Block Size: 32768b+ 65536b+
No. of Reads: 961274 10006889
No. of Writes: 6889 20508938
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 529 FORGET
0.00 0.00 us 0.00 us 0.00 us 81097 RELEASE
0.00 0.00 us 0.00 us 0.00 us 15742 RELEASEDIR
Duration: 167705 seconds
Data Read: 880603889664 bytes
Data Written: 1344596574720 bytes
Interval 1 Stats:
Duration: 255 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: sm11:/mnt/disk2/mainvol
------------------------------
Cumulative Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 26415 118603 62244
No. of Writes: 25591 5235 539
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 1851055 41928 1466117
No. of Writes: 668 901 1155
Block Size: 32768b+ 65536b+
No. of Reads: 641012 7944255
No. of Writes: 7347 18906027
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 500 FORGET
0.00 0.00 us 0.00 us 0.00 us 2585238 RELEASE
0.00 0.00 us 0.00 us 0.00 us 15755 RELEASEDIR
Duration: 167705 seconds
Data Read: 576850006016 bytes
Data Written: 1239575955968 bytes
Interval 1 Stats:
Duration: 255 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: sm14:/mnt/disk2/mainvol
------------------------------
Cumulative Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 37320 17789029 655061
No. of Writes: 27442 4436 607
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 19600027 3110591 3185336
No. of Writes: 641 1008 1217
Block Size: 32768b+ 65536b+
No. of Reads: 1043031 9626406
No. of Writes: 6889 20508938
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 529 FORGET
0.00 0.00 us 0.00 us 0.00 us 81097 RELEASE
0.00 0.00 us 0.00 us 0.00 us 15744 RELEASEDIR
Duration: 167705 seconds
Data Read: 850640217600 bytes
Data Written: 1344596574720 bytes
Interval 1 Stats:
Duration: 255 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: sm14:/mnt/disk1/mainvol
------------------------------
Cumulative Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 25430 118856 63730
No. of Writes: 25591 5235 539
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 1896584 24957 1611272
No. of Writes: 668 901 1155
Block Size: 32768b+ 65536b+
No. of Reads: 687858 7551537
No. of Writes: 7347 18906027
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 500 FORGET
0.00 0.00 us 0.00 us 0.00 us 2585228 RELEASE
0.00 0.00 us 0.00 us 0.00 us 15753 RELEASEDIR
Duration: 167704 seconds
Data Read: 554862222336 bytes
Data Written: 1239575955968 bytes
Interval 1 Stats:
Duration: 255 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: sm13:/mnt/disk1/mainvol
------------------------------
Cumulative Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 37852 18005136 657018
No. of Writes: 27442 4436 607
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 19471162 3132895 3154404
No. of Writes: 641 1008 1217
Block Size: 32768b+ 65536b+
No. of Reads: 1018312 9702965
No. of Writes: 6889 20508938
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 529 FORGET
0.00 0.00 us 0.00 us 0.00 us 81097 RELEASE
0.00 0.00 us 0.00 us 0.00 us 15741 RELEASEDIR
Duration: 167705 seconds
Data Read: 854245903872 bytes
Data Written: 1344596574720 bytes
Interval 1 Stats:
Duration: 255 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: sm13:/mnt/disk2/mainvol
------------------------------
Cumulative Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 25869 100519 63939
No. of Writes: 25591 5235 539
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 1853310 41271 1394883
No. of Writes: 668 901 1155
Block Size: 32768b+ 65536b+
No. of Reads: 576972 7517939
No. of Writes: 7347 18906027
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 500 FORGET
0.00 0.00 us 0.00 us 0.00 us 2585248 RELEASE
0.00 0.00 us 0.00 us 0.00 us 15754 RELEASEDIR
Duration: 167705 seconds
Data Read: 545438357504 bytes
Data Written: 1239575955968 bytes
Interval 1 Stats:
Duration: 255 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: sm15:/mnt/disk1/mainvol
------------------------------
Cumulative Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 25376 124010 62769
No. of Writes: 25591 5235 539
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 1842626 25247 1747332
No. of Writes: 668 901 1155
Block Size: 32768b+ 65536b+
No. of Reads: 615409 7695723
No. of Writes: 7347 18906027
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 500 FORGET
0.00 0.00 us 0.00 us 0.00 us 2585252 RELEASE
0.00 0.00 us 0.00 us 0.00 us 15752 RELEASEDIR
Duration: 167705 seconds
Data Read: 564089530880 bytes
Data Written: 1239575955968 bytes
Interval 1 Stats:
Duration: 255 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: sm15:/mnt/disk2/mainvol
------------------------------
Cumulative Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 37794 17969276 655026
No. of Writes: 27442 4436 607
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 19297777 3087656 3290762
No. of Writes: 641 1008 1217
Block Size: 32768b+ 65536b+
No. of Reads: 1025743 9707300
No. of Writes: 6889 20508938
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 529 FORGET
0.00 0.00 us 0.00 us 0.00 us 81097 RELEASE
0.00 0.00 us 0.00 us 0.00 us 15742 RELEASEDIR
Duration: 167705 seconds
Data Read: 855877165568 bytes
Data Written: 1344596574720 bytes
Interval 1 Stats:
Duration: 255 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: sm12:/mnt/disk2/mainvol
------------------------------
Cumulative Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 26499 99466 63056
No. of Writes: 25591 5235 539
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 1870342 42157 1397655
No. of Writes: 668 901 1155
Block Size: 32768b+ 65536b+
No. of Reads: 548533 7738956
No. of Writes: 7347 18906027
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 500 FORGET
0.00 0.00 us 0.00 us 0.00 us 2585231 RELEASE
0.00 0.00 us 0.00 us 0.00 us 15751 RELEASEDIR
Duration: 167706 seconds
Data Read: 559290661888 bytes
Data Written: 1239575955968 bytes
Interval 1 Stats:
Duration: 256 seconds
Data Read: 0 bytes
Data Written: 0 bytes
Brick: sm12:/mnt/disk1/mainvol
------------------------------
Cumulative Stats:
Block Size: 512b+ 1024b+ 2048b+
No. of Reads: 38786 18260049 659154
No. of Writes: 27442 4436 607
Block Size: 4096b+ 8192b+ 16384b+
No. of Reads: 19442314 3161210 3400222
No. of Writes: 641 1008 1217
Block Size: 32768b+ 65536b+
No. of Reads: 933426 9923716
No. of Writes: 6889 20508938
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 529 FORGET
0.00 0.00 us 0.00 us 0.00 us 81097 RELEASE
0.00 0.00 us 0.00 us 0.00 us 21405 RELEASEDIR
0.75 2.66 us 2.00 us 3.00 us 35 OPENDIR
14.07 49.86 us 23.00 us 92.00 us 35 LOOKUP
33.15 58.74 us 20.00 us 116.00 us 70 READDIR
52.04 92.23 us 20.00 us 921.00 us 70 GETXATTR
Duration: 167706 seconds
Data Read: 870389523968 bytes
Data Written: 1344596574720 bytes
Interval 1 Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.00 0.00 us 0.00 us 0.00 us 32 RELEASEDIR
0.76 2.69 us 2.00 us 3.00 us 32 OPENDIR
13.85 49.22 us 23.00 us 92.00 us 32 LOOKUP
32.81 58.28 us 20.00 us 116.00 us 64 READDIR
52.59 93.42 us 20.00 us 921.00 us 64 GETXATTR
Duration: 256 seconds
Data Read: 0 bytes
Data Written: 0 bytes
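To compare bricks without reading every block above, here is a rough sketch that pulls the cumulative "Data Read" and "Data Written" byte counts per brick out of profile text in the layout shown in this attachment. The function name cumulative_io is hypothetical, and the parser only assumes the "Brick:", "Cumulative Stats:" and "Interval N Stats:" markers seen above; the sample is a trimmed copy of the first brick's entry.

```python
SAMPLE_PROFILE = """\
Brick: sm16:/mnt/disk2/mainvol
Cumulative Stats:
Duration: 167705 seconds
Data Read: 869764755456 bytes
Data Written: 1344596574720 bytes
Interval 1 Stats:
Duration: 255 seconds
Data Read: 0 bytes
Data Written: 0 bytes
"""


def cumulative_io(profile_text):
    """Map each brick to its cumulative 'Data Read' / 'Data Written' bytes."""
    totals = {}
    brick = None
    section = None
    for raw in profile_text.splitlines():
        line = raw.strip()
        if line.startswith("Brick:"):
            # Keep everything after the first colon, e.g. "sm16:/mnt/disk2/mainvol".
            brick = line.split(":", 1)[1].strip()
            totals[brick] = {}
            section = None
        elif line.endswith("Stats:"):
            section = "cumulative" if line.startswith("Cumulative") else "interval"
        elif (brick and section == "cumulative"
              and line.startswith(("Data Read:", "Data Written:"))):
            key, rest = line.split(":", 1)
            totals[brick][key] = int(rest.split()[0])
    return totals


for name, io in cumulative_io(SAMPLE_PROFILE).items():
    print(name, io["Data Read"], io["Data Written"])
```

Interval counters are skipped on purpose, so the zero-byte "Interval 1" values do not overwrite the cumulative ones.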
-------------- next part --------------
[root@sm11 glusterfs]# tail bricks/*.log
==> bricks/mnt-disk1-mainvol.log <==
[2016-04-01 12:25:33.612779] E [MSGID: 115056] [server-rpc-fops.c:689:server_opendir_cbk] 0-mainvol-server: 10971356: OPENDIR /home/analyzer/personal/tcliu/projects/NTD/case_vcfs/HIGH/GALNT11 (e49e2adf-dc3f-41f5-96d5-b14b40f35d5f) ==> (Permission denied) [Permission denied]
[2016-04-01 12:29:46.857938] E [MSGID: 113018] [posix.c:234:posix_lookup] 0-mainvol-posix: post-operation lstat on parent /mnt/disk1/mainvol/.glusterfs/f3/83/f3833a3a-6c47-415d-ad1b-f3c6a7a57681 failed [No such file or directory]
[2016-04-01 12:29:46.859504] E [MSGID: 113018] [posix.c:234:posix_lookup] 0-mainvol-posix: post-operation lstat on parent /mnt/disk1/mainvol/.glusterfs/f3/83/f3833a3a-6c47-415d-ad1b-f3c6a7a57681 failed [No such file or directory]
[2016-04-01 12:33:45.228956] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk1/mainvol/.glusterfs/3b/50/3b50d2e8-4956-4f96-ac19-3053a04bb676 while doing xattrop: Key:trusted.ec.version [No such file or directory]
[2016-04-01 14:31:38.579476] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk1/mainvol/.glusterfs/6b/b4/6bb472bd-df7a-47ce-8b9b-54f859e72b15 while doing xattrop: Key:trusted.ec.version [No such file or directory]
[2016-04-01 15:20:17.888807] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk1/mainvol/.glusterfs/ab/9e/ab9e50a7-e233-4398-a459-3146d46554bf while doing xattrop: Key:trusted.ec.version [No such file or directory]
[2016-04-01 15:22:29.297448] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk1/mainvol/.glusterfs/e7/d5/e7d5cb3c-6a47-45c9-8e90-d6863ba392ba while doing xattrop: Key:trusted.ec.version [No such file or directory]
[2016-04-01 16:30:42.257752] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk1/mainvol/.glusterfs/4e/a4/4ea4bea2-b2fa-4a8c-9b28-f085edac24bb while doing xattrop: Key:trusted.ec.version [No such file or directory]
[2016-04-01 16:30:42.257885] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk1/mainvol/.glusterfs/35/6b/356b1bd0-38e2-4a38-98aa-d070573b8ff2 while doing xattrop: Key:trusted.ec.version [No such file or directory]
[2016-04-01 16:37:55.570342] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk1/mainvol/.glusterfs/42/69/426917ac-590b-4046-a6c8-f6b552d56c88 while doing xattrop: Key:trusted.ec.version [No such file or directory]
==> bricks/mnt-disk2-mainvol.log <==
[2016-04-01 08:42:09.301205] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk2/mainvol/.glusterfs/8e/dd/8edd9615-5efd-4ff6-a3bd-fd6847588b03 while doing xattrop: Key:trusted.ec.version [No such file or directory]
[2016-04-01 08:42:46.330855] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk2/mainvol/.glusterfs/3c/7e/3c7e5913-9f70-468b-ae58-314048bc555a while doing xattrop: Key:trusted.ec.version [No such file or directory]
[2016-04-01 11:48:46.297341] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk2/mainvol/.glusterfs/03/39/0339da12-5bae-42fb-a8e8-ced5e4526547 while doing xattrop: Key:trusted.ec.version [No such file or directory]
[2016-04-01 12:07:58.239150] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk2/mainvol/.glusterfs/c3/36/c3363729-efc7-4336-8175-ad63aa6797bc while doing xattrop: Key:trusted.ec.version [No such file or directory]
[2016-04-01 12:25:07.579365] E [MSGID: 115056] [server-rpc-fops.c:689:server_opendir_cbk] 0-mainvol-server: 3977749: OPENDIR <gfid:70c326b9-e98d-41fb-b1da-2f5e52440347>/FOLH1B (49a59e47-3f92-46d1-b745-6ae0a0e61db4) ==> (Permission denied) [Permission denied]
[2016-04-01 12:25:33.602114] E [MSGID: 115056] [server-rpc-fops.c:689:server_opendir_cbk] 0-mainvol-server: 3980178: OPENDIR <gfid:70c326b9-e98d-41fb-b1da-2f5e52440347>/FOLH1B (49a59e47-3f92-46d1-b745-6ae0a0e61db4) ==> (Permission denied) [Permission denied]
[2016-04-01 12:25:33.612870] E [MSGID: 115056] [server-rpc-fops.c:689:server_opendir_cbk] 0-mainvol-server: 3980186: OPENDIR <gfid:70c326b9-e98d-41fb-b1da-2f5e52440347>/GALNT11 (e49e2adf-dc3f-41f5-96d5-b14b40f35d5f) ==> (Permission denied) [Permission denied]
[2016-04-01 12:33:45.228275] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk2/mainvol/.glusterfs/3b/50/3b50d2e8-4956-4f96-ac19-3053a04bb676 while doing xattrop: Key:trusted.ec.version [No such file or directory]
[2016-04-01 14:31:38.579305] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk2/mainvol/.glusterfs/3f/3d/3f3d9c37-be02-48e0-973b-ef8c4f3c295c while doing xattrop: Key:trusted.ec.version [No such file or directory]
[2016-04-01 14:52:40.571078] E [MSGID: 113001] [posix.c:5194:_posix_handle_xattr_keyvalue_pair] 0-mainvol-posix: getxattr failed on /mnt/disk2/mainvol/.glusterfs/b0/7b/b07b5553-a55d-4059-8777-a0ec40e51132 while doing xattrop: Key:trusted.ec.version [No such file or directory]
[root@sm11 glusterfs]# tail *.log
==> cli.log <==
[2016-04-03 07:33:00.323469] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-04-03 07:33:00.323577] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
[2016-04-03 07:33:00.420750] I [cli-rpc-ops.c:2139:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2016-04-03 07:33:00.420987] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2016-04-03 08:15:28.427738] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.9
[2016-04-03 08:15:28.436907] I [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not installed
[2016-04-03 08:15:28.437338] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-04-03 08:15:28.437433] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
[2016-04-03 08:15:28.551696] I [cli-rpc-ops.c:2139:gf_cli_set_volume_cbk] 0-cli: Received resp to set
[2016-04-03 08:15:28.551936] I [input.c:36:cli_batch] 0-: Exiting with: 0
==> cmd_history.log <==
[2016-04-01 08:00:00.856908] : volume set help : SUCCESS
[2016-04-01 08:03:45.605250] : volume set help : SUCCESS
[2016-04-03 06:26:59.505978] : volume set help : SUCCESS
[2016-04-03 06:41:39.827425] : volume set help : SUCCESS
[2016-04-03 06:41:53.469277] : volume set help : SUCCESS
[2016-04-03 06:42:13.859466] : volume set help : SUCCESS
[2016-04-03 07:06:58.119033] : volume set help : SUCCESS
[2016-04-03 07:07:08.245910] : volume set help : SUCCESS
[2016-04-03 07:33:00.420496] : volume set help : SUCCESS
[2016-04-03 08:15:28.551440] : volume set help : SUCCESS
==> data.log <==
[2016-03-30 06:17:21.185246] W [MSGID: 114060] [client-handshake.c:724:client3_3_reopen_cbk] 0-mainvol-client-9: reopen on <gfid:69707d8f-989a-4cba-b724-33db1e8b8bbe> failed. [Stale file handle]
[2016-03-30 06:17:21.185988] W [MSGID: 114060] [client-handshake.c:724:client3_3_reopen_cbk] 0-mainvol-client-9: reopen on <gfid:9f4bafa4-b932-410a-877c-265edb553155> failed. [Stale file handle]
[2016-03-30 06:17:21.186031] W [MSGID: 114060] [client-handshake.c:724:client3_3_reopen_cbk] 0-mainvol-client-9: reopen on <gfid:9f4bafa4-b932-410a-877c-265edb553155> failed. [Stale file handle]
[2016-03-30 06:17:47.088748] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-mainvol-disperse-1: Operation failed on some subvolumes (up=3F, mask=37, remaining=0, good=37, bad=8)
The message "W [MSGID: 122035] [ec-common.c:419:ec_child_select] 0-mainvol-disperse-1: Executing operation with some subvolumes unavailable (8)" repeated 5 times between [2016-03-30 06:16:14.119064] and [2016-03-30 06:17:14.120278]
[2016-03-30 06:17:21.155916] W [MSGID: 114060] [client-handshake.c:724:client3_3_reopen_cbk] 0-mainvol-client-9: reopen on <gfid:a293e6b6-357f-4cce-934e-f21757615648> failed. [Stale file handle]
[2016-03-30 06:17:21.156052] W [MSGID: 114060] [client-handshake.c:724:client3_3_reopen_cbk] 0-mainvol-client-9: reopen on <gfid:fb838b06-bd89-4cd1-931d-49f16185e742> failed. [Stale file handle]
The message "W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-mainvol-disperse-1: Operation failed on some subvolumes (up=3F, mask=37, remaining=0, good=37, bad=8)" repeated 3 times between [2016-03-30 06:17:47.088748] and [2016-03-30 06:18:07.271744]
[2016-03-30 06:18:14.124001] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-mainvol-disperse-1: Operation failed on some subvolumes (up=3F, mask=37, remaining=0, good=37, bad=8)
[2016-04-01 03:14:01.680654] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f425f2e8dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f42609538b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f4260953739] ) 0-: received signum (15), shutting down
==> etc-glusterfs-glusterd.vol.log <==
[2016-04-03 06:26:53.908779] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2016-04-03 06:26:59.507550] I [socket.c:3383:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2016-04-03 06:26:59.507588] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 12) to rpc-transport (socket.management)
[2016-04-03 06:26:59.507613] E [MSGID: 106430] [glusterd-utils.c:474:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2016-04-03 06:42:13.859506] I [socket.c:3383:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2016-04-03 06:42:13.859520] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 12) to rpc-transport (socket.management)
[2016-04-03 06:42:13.859534] E [MSGID: 106430] [glusterd-utils.c:474:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2016-04-03 07:07:08.245951] I [socket.c:3383:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2016-04-03 07:07:08.245966] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 12) to rpc-transport (socket.management)
[2016-04-03 07:07:08.245981] E [MSGID: 106430] [glusterd-utils.c:474:glusterd_submit_reply] 0-glusterd: Reply submission failed
==> glfsheal-mainvol.log <==
==> glustershd.log <==
[2016-04-02 17:03:08.694924] E [MSGID: 114031] [client-rpc-fops.c:466:client3_3_open_cbk] 0-mainvol-client-5: remote operation failed. Path: <gfid:c5df439e-1c6e-4105-b6c2-014a7be439cd> (c5df439e-1c6e-4105-b6c2-014a7be439cd) [Transport endpoint is not connected]
[2016-04-02 17:03:08.695053] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-mainvol-client-5: remote operation failed [Transport endpoint is not connected]
[2016-04-02 17:03:08.703770] E [rpc-clnt.c:362:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f2bc87bca52] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f2bc85878de] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f2bc85879ee] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f2bc858937a] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f2bc8589ba8] ))))) 0-mainvol-client-11: forced unwinding frame type(GlusterFS 3.3) op(INODELK(29)) called at 2016-04-02 17:03:08.697566 (xid=0x4fb1e9)
[2016-04-02 17:03:08.638750] W [MSGID: 122056] [ec-combine.c:866:ec_combine_check] 0-mainvol-disperse-1: Mismatching xdata in answers of 'LOOKUP'
[2016-04-02 17:03:08.700878] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-mainvol-client-5: remote operation failed [Transport endpoint is not connected]
The message "E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-mainvol-client-11: remote operation failed [Transport endpoint is not connected]" repeated 7 times between [2016-04-02 17:03:08.628255] and [2016-04-02 17:03:08.748082]
[2016-04-02 17:33:09.795869] E [rpc-clnt.c:201:call_bail] 0-mainvol-client-11: bailing out frame type(GlusterFS 3.3) op(OPEN(11)) xid = 0x4fb1f6 sent = 2016-04-02 17:03:08.750519. timeout = 1800 for 172.16.135.16:49153
[2016-04-02 17:33:09.795952] E [MSGID: 114031] [client-rpc-fops.c:466:client3_3_open_cbk] 0-mainvol-client-11: remote operation failed. Path: <gfid:db0c1d6c-f733-4bc3-8c76-0b1cc8d6cbe7> (db0c1d6c-f733-4bc3-8c76-0b1cc8d6cbe7) [Transport endpoint is not connected]
[2016-04-02 18:01:59.992552] E [rpc-clnt.c:201:call_bail] 0-mainvol-client-11: bailing out frame type(GlusterFS 3.3) op(OPEN(11)) xid = 0x4fb221 sent = 2016-04-02 17:31:56.361972. timeout = 1800 for 172.16.135.16:49153
[2016-04-02 18:01:59.992618] E [MSGID: 114031] [client-rpc-fops.c:466:client3_3_open_cbk] 0-mainvol-client-11: remote operation failed. Path: <gfid:9afe87ba-855f-492c-901f-b618f5247705> (9afe87ba-855f-492c-901f-b618f5247705) [Transport endpoint is not connected]
==> mainvol-rebalance.log <==
230: volume mainvol
231: type debug/io-stats
232: option log-level WARNING
233: option latency-measurement off
234: option count-fop-hits off
235: subvolumes mainvol-dht
236: end-volume
237:
+------------------------------------------------------------------------------+
[2016-04-01 05:03:14.643881] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f9c219c9dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f9c230348b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f9c23034739] ) 0-: received signum (15), shutting down
==> nfs.log <==
[2016-04-01 12:54:42.426597] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-mainvol-disperse-1: Operation failed on some subvolumes (up=3F, mask=3F, remaining=10, good=2D, bad=2)
[2016-04-01 12:59:47.138952] E [MSGID: 114030] [client-rpc-fops.c:3022:client3_3_readv_cbk] 0-mainvol-client-4: XDR decoding failed [Invalid argument]
[2016-04-01 12:59:47.139022] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-mainvol-client-4: remote operation failed [Invalid argument]
[2016-04-01 12:59:47.141444] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-mainvol-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=4, good=2B, bad=10)
[2016-04-01 13:38:07.961739] E [MSGID: 114030] [client-rpc-fops.c:3022:client3_3_readv_cbk] 0-mainvol-client-7: XDR decoding failed [Invalid argument]
[2016-04-01 13:38:07.962014] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-mainvol-client-7: remote operation failed [Invalid argument]
[2016-04-01 13:38:07.964187] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-mainvol-disperse-1: Operation failed on some subvolumes (up=3F, mask=3F, remaining=10, good=2D, bad=2)
[2016-04-01 15:16:17.152097] E [MSGID: 114030] [client-rpc-fops.c:3022:client3_3_readv_cbk] 0-mainvol-client-1: XDR decoding failed [Invalid argument]
[2016-04-01 15:16:17.159452] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-mainvol-client-1: remote operation failed [Invalid argument]
[2016-04-01 15:16:17.159833] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-mainvol-disperse-0: Operation failed on some subvolumes (up=3F, mask=3F, remaining=1, good=3C, bad=2)
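A note on reading the ec_check_status lines above: the up, mask, remaining, good and bad values look like hexadecimal bitmasks with one bit per brick of a 4+2 disperse set (3F is six set bits). The sketch below decodes them under that assumption. That bit 0 corresponds to the first brick of the subvolume is also an assumption, and decode_mask is a hypothetical helper, not part of GlusterFS.

```python
def decode_mask(hex_mask, width=6):
    """Return the 0-based bit positions set in a hexadecimal bitmask."""
    value = int(hex_mask, 16)
    return [bit for bit in range(width) if value & (1 << bit)]


# Fields copied from the first nfs.log line above (0-mainvol-disperse-1).
fields = {"up": "3F", "mask": "3F", "remaining": "10", "good": "2D", "bad": "2"}
decoded = {name: decode_mask(value) for name, value in fields.items()}

for name, bits in decoded.items():
    print(f"{name:<9} -> {bits}")
```

Under this reading, good=2D and bad=2 in that line would mean that bit 1 gave a failing or mismatching answer while bits 0, 2, 3 and 5 agreed, with bit 4 still pending.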