[Bugs] [Bug 1296496] [georep+disperse]: Geo-Rep session went to faulty with errors "[Errno 5] Input/output error"
bugzilla at redhat.com
Thu Jan 14 12:04:42 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1296496
Kotresh HR <khiremat at redhat.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |khiremat at redhat.com
--- Comment #1 from Kotresh HR <khiremat at redhat.com> ---
Description of problem:
=======================
The geo-rep session went Faulty with the following errors in the geo-rep logs:
[2015-12-24 10:57:45.463694] E [syncdutils(/rhs/brick2/ct-8):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 662, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1439, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 586, in crawlwrap
    '.', '.'.join([str(self.uuid), str(gconf.slave_id)]))
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 323, in ff
    return f(*a)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 489, in stime_mnt
    8)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 55, in lgetxattr
    return cls._query_xattr(path, siz, 'lgetxattr', attr)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 47, in _query_xattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 5] Input/output error
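
The traceback ends in an 8-byte lgetxattr call made from stime_mnt, with a key built in crawlwrap from the master volume UUID and the slave ID. As a minimal sketch of what that call asks for, the Python below rebuilds the xattr name that appears in the getfattr and FUSE log output and decodes an 8-byte value. The '!II' layout (two big-endian 32-bit integers, seconds then nanoseconds) is an assumption about how gsyncd packs stime, and the helper names are hypothetical rather than taken from the gsyncd sources.

```python
import struct

# Values copied from this report.
MASTER_UUID = "1dd75524-8cc9-4d93-9b14-518021c8df3f"  # Volume ID of tiervolume
SLAVE_ID = "5fc0f095-23c4-4b96-9d94-69decd14f1d4"


def stime_xattr_name(master_uuid, slave_id, prefix="trusted.glusterfs"):
    """Build the stime xattr key (hypothetical helper, naming is ours)."""
    return ".".join([prefix, master_uuid, slave_id, "stime"])


def decode_stime(raw):
    """Decode an 8-byte stime value.

    Assumption: the value is two big-endian unsigned 32-bit integers,
    (seconds, nanoseconds).
    """
    if len(raw) != 8:
        raise ValueError("expected 8 bytes, got %d" % len(raw))
    return struct.unpack("!II", raw)


if __name__ == "__main__":
    print(stime_xattr_name(MASTER_UUID, SLAVE_ID))
    # Hex value reported for brick ct-7 in the getfattr dumps.
    print(decode_stime(bytes.fromhex("567bc34f00000000")))
```

Under that assumed layout, the value 0x567bc34f00000000 reported for brick ct-7 decodes to 1450951503 seconds and 0 nanoseconds, a timestamp on 2015-12-24 shortly before the failures in the logs.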
getfattr on the slave mount also reports an Input/output error:
[root at dhcp37-133 ~]# getfattr -d -m . -e hex /mnt/test/
getfattr: Removing leading '/' from absolute path names
# file: mnt/test/
security.selinux=0x73797374656d5f753a6f626a6563745f723a6675736566735f743a733000
/mnt/test/: trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime: Input/output error
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x3000
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800
[root at dhcp37-133 ~]#
[root at dhcp37-133 ~]# mount | grep test
10.70.37.165:/tiervolume on /mnt/test type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root at dhcp37-133 ~]#
Client log snippet:
===================
# less /var/log/glusterfs/mnt-test.log
[2015-12-24 10:28:50.791227] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-tiervolume-disperse-1: Heal failed [Input/output error]
The message "W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-tiervolume-disperse-1: Heal failed [Input/output error]" repeated 6 times between [2015-12-24 10:28:50.791227] and [2015-12-24 10:28:51.062863]
[2015-12-24 10:39:42.715503] W [MSGID: 122056] [ec-combine.c:866:ec_combine_check] 0-tiervolume-disperse-1: Mismatching xdata in answers of 'LOOKUP'
[2015-12-24 10:39:42.718869] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-tiervolume-disperse-1: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2015-12-24 10:39:42.727887] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-tiervolume-disperse-1: Heal failed [Input/output error]
[2015-12-24 10:39:42.919641] N [MSGID: 122031] [ec-generic.c:1133:ec_combine_xattrop] 0-tiervolume-disperse-1: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
[2015-12-24 10:39:42.919750] W [MSGID: 122040] [ec-common.c:907:ec_prepare_update_cbk] 0-tiervolume-disperse-1: Failed to get size and version [Input/output error]
[2015-12-24 10:39:42.926486] W [fuse-bridge.c:3355:fuse_xattr_cbk] 0-glusterfs-fuse: 15: GETXATTR(trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime) / => -1 (Input/output error)
[2015-12-24 10:39:42.925954] N [MSGID: 122031] [ec-generic.c:1133:ec_combine_xattrop] 0-tiervolume-disperse-1: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
[2015-12-24 10:39:42.926445] W [MSGID: 122040] [ec-common.c:907:ec_prepare_update_cbk] 0-tiervolume-disperse-1: Failed to get size and version [Input/output error]
[2015-12-24 10:58:58.908160] W [MSGID: 122056] [ec-combine.c:866:ec_combine_check] 0-tiervolume-disperse-1: Mismatching xdata in answers of 'LOOKUP'
[2015-12-24 10:58:58.909422] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-tiervolume-disperse-1: Operation failed on some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2015-12-24 10:58:58.918637] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 0-tiervolume-disperse-1: Heal failed [Input/output error]
[2015-12-24 10:58:58.922502] N [MSGID: 122031] [ec-generic.c:1133:ec_combine_xattrop] 0-tiervolume-disperse-1: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
[2015-12-24 10:58:58.924043] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 0-tiervolume-disperse-1: Operation failed on some subvolumes (up=3F, mask=3E, remaining=0, good=3E, bad=1)
The message "N [MSGID: 122031] [ec-generic.c:1133:ec_combine_xattrop] 0-tiervolume-disperse-1: Mismatching dictionary in answers of 'GF_FOP_XATTROP'" repeated 2 times between [2015-12-24 10:58:58.922502] and [2015-12-24 10:58:58.972485]
[2015-12-24 10:58:58.973055] W [MSGID: 122040] [ec-common.c:907:ec_prepare_update_cbk] 0-tiervolume-disperse-1: Failed to get size and version [Input/output error]
[2015-12-24 10:58:58.973187] W [fuse-bridge.c:3355:fuse_xattr_cbk] 0-glusterfs-fuse: 19: GETXATTR(trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime) / => -1 (Input/output error)
[2015-12-24 10:58:58.989738] N [MSGID: 122031] [ec-generic.c:1133:ec_combine_xattrop] 0-tiervolume-disperse-1: Mismatching dictionary in answers of 'GF_FOP_XATTROP'
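
The ec_check_status warnings in the client log report their state as hexadecimal bitmasks (up, mask, remaining, good, bad). The sketch below shows one way to read them. It assumes that each bit position corresponds to one brick of 0-tiervolume-disperse-1, in the order the bricks are listed by gluster volume info (ct-7 through ct-12). That mapping and the helper names are assumptions made for illustration; the log itself does not state them.

```python
import re

# Assumed bit order: bit 0 is the first brick of the subvolume, and so on.
SUBVOL_BRICKS = ["ct-7", "ct-8", "ct-9", "ct-10", "ct-11", "ct-12"]

# Status text copied from the client log.
STATUS_LINE = ("Operation failed on some subvolumes "
               "(up=3F, mask=3F, remaining=0, good=3E, bad=1)")


def parse_ec_status(line):
    """Return the name=value pairs of a status line as integers (hex input)."""
    fields = re.findall(r"(\w+)=([0-9A-Fa-f]+)", line)
    return {name: int(value, 16) for name, value in fields}


def bricks_in_mask(mask, bricks=SUBVOL_BRICKS):
    """List the bricks whose bit is set in the mask."""
    return [brick for i, brick in enumerate(bricks) if (mask >> i) & 1]


if __name__ == "__main__":
    status = parse_ec_status(STATUS_LINE)
    for name in ("up", "good", "bad"):
        print(name, bricks_in_mask(status[name]))
```

Under that assumption, bad=1 points at the first brick of the subvolume (ct-7) and good=3E at the remaining five.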
ec.version values for disperse subvolume 1:
===========================================
[root at dhcp37-165 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-7/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-7/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.version=0x00000000000000000000000000000011
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34f00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b7704
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ba2cafe3
trusted.tier.tier-dht.commithash=0x3330313736373533323400
[root at dhcp37-165 ~]#
[root at dhcp37-133 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-8/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-8/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x0000000000000000000000000000000e
trusted.ec.version=0x00000000000000000000000000000020
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34c00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b781e
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800
[root at dhcp37-133 ~]#
[root at dhcp37-160 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-9/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-9/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x0000000000000000000000000000000e
trusted.ec.version=0x00000000000000000000000000000020
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34b00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b780d
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800
[root at dhcp37-160 ~]#
[root at dhcp37-158 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-10/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-10/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x0000000000000000000000000000000e
trusted.ec.version=0x00000000000000000000000000000020
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34f00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b7c51
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800
[root at dhcp37-158 ~]#
[root at dhcp37-110 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-11/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-11/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x0000000000000000000000000000000e
trusted.ec.version=0x00000000000000000000000000000020
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34b00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b7714
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800
[root at dhcp37-110 ~]#
[root at dhcp37-155 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-12/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-12/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x0000000000000000000000000000000e
trusted.ec.version=0x00000000000000000000000000000020
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34f00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b7989
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800
[root at dhcp37-155 ~]#
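
To make the differences between the six dumps above easier to see, the sketch below groups the bricks by the value of a few xattrs. The values are copied from the getfattr output with leading zeros dropped; the dictionary layout and the function name are hypothetical. It only tabulates what the dumps show and does not establish which difference leads to the Input/output error.

```python
from collections import defaultdict

# Per-brick values from the getfattr dumps; None means the xattr is absent.
# "stime" is the trusted.glusterfs.<master>.<slave>.stime value.
BRICK_XATTRS = {
    "ct-7":  {"ec.version": 0x11, "ec.dirty": None, "stime": 0x567bc34f00000000},
    "ct-8":  {"ec.version": 0x20, "ec.dirty": 0x0e, "stime": 0x567bc34c00000000},
    "ct-9":  {"ec.version": 0x20, "ec.dirty": 0x0e, "stime": 0x567bc34b00000000},
    "ct-10": {"ec.version": 0x20, "ec.dirty": 0x0e, "stime": 0x567bc34f00000000},
    "ct-11": {"ec.version": 0x20, "ec.dirty": 0x0e, "stime": 0x567bc34b00000000},
    "ct-12": {"ec.version": 0x20, "ec.dirty": 0x0e, "stime": 0x567bc34f00000000},
}


def group_by_value(xattrs, key):
    """Map each distinct value of one xattr to the bricks that carry it."""
    groups = defaultdict(list)
    for brick, attrs in xattrs.items():
        groups[attrs[key]].append(brick)
    return dict(groups)


if __name__ == "__main__":
    for key in ("ec.version", "ec.dirty", "stime"):
        for value, bricks in group_by_value(BRICK_XATTRS, key).items():
            shown = "absent" if value is None else hex(value)
            print(f"{key:<10} {shown:<20} {', '.join(bricks)}")
```

In these dumps ct-7 is the only brick with trusted.ec.version 0x11 and no trusted.ec.dirty, and the stime xattr takes three different values across the subvolume.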
[root at dhcp37-165 ~]# gluster volume info tiervolume
Volume Name: tiervolume
Type: Distributed-Disperse
Volume ID: 1dd75524-8cc9-4d93-9b14-518021c8df3f
Status: Started
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.165:/rhs/brick1/ct-1
Brick2: 10.70.37.133:/rhs/brick1/ct-2
Brick3: 10.70.37.160:/rhs/brick1/ct-3
Brick4: 10.70.37.158:/rhs/brick1/ct-4
Brick5: 10.70.37.110:/rhs/brick1/ct-5
Brick6: 10.70.37.155:/rhs/brick1/ct-6
Brick7: 10.70.37.165:/rhs/brick2/ct-7
Brick8: 10.70.37.133:/rhs/brick2/ct-8
Brick9: 10.70.37.160:/rhs/brick2/ct-9
Brick10: 10.70.37.158:/rhs/brick2/ct-10
Brick11: 10.70.37.110:/rhs/brick2/ct-11
Brick12: 10.70.37.155:/rhs/brick2/ct-12
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
[root at dhcp37-165 ~]#
<Note: The volume was tiered when the issue was originally seen. The output
above was taken after the tier was detached.>
Steps carried out:
==================
1. Create a tiered master volume {HT: 3x2, CT: 2x(4+2)}
2. Create a slave volume (4x2)
3. Create the geo-rep session
4. Start the geo-rep session
Actual results:
===============
All the passive bricks went Faulty.
Expected results:
=================
Geo-Rep should be ACTIVE