[Bugs] [Bug 1296496] [georep+disperse]: Geo-Rep session went to faulty with errors "[Errno 5] Input/output error"

bugzilla at redhat.com bugzilla at redhat.com
Thu Jan 14 12:04:42 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1296496

Kotresh HR <khiremat at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |khiremat at redhat.com



--- Comment #1 from Kotresh HR <khiremat at redhat.com> ---
Description of problem:
=======================

The geo-rep session went Faulty with the following errors in the geo-rep logs:

[2015-12-24 10:57:45.463694] E
[syncdutils(/rhs/brick2/ct-8):276:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 662, in
main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1439, in
service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 586, in
crawlwrap
    '.', '.'.join([str(self.uuid), str(gconf.slave_id)]))
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 323, in ff
    return f(*a)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 489, in
stime_mnt
    8)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 55, in
lgetxattr
    return cls._query_xattr(path, siz, 'lgetxattr', attr)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 47, in
_query_xattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in
raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 5] Input/output error
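
The last frame of the traceback builds the exception from a raw errno value, and nothing above it catches the error, so the worker fails. As a minimal sketch only (classify_xattr_error is a hypothetical helper, not part of syncdaemon), a caller could tell a missing xattr apart from the I/O error seen here:

```python
import errno
import os


def raise_oserr(errn):
    """Raise OSError from an errno value, as the last traceback frame does."""
    raise OSError(errn, os.strerror(errn))


def classify_xattr_error(exc):
    """Hypothetical helper: map an xattr OSError to a coarse category."""
    if exc.errno == errno.ENODATA:
        return "missing"    # the xattr is simply not set on the path
    if exc.errno == errno.EIO:
        return "io-error"   # the volume could not give a consistent answer
    return "other"


if __name__ == "__main__":
    try:
        raise_oserr(errno.EIO)
    except OSError as exc:
        print(exc.errno, classify_xattr_error(exc))
```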


getfattr on the slave mount fails with an Input/output error:

[root at dhcp37-133 ~]# getfattr -d -m . -e hex /mnt/test/
getfattr: Removing leading '/' from absolute path names
# file: mnt/test/
security.selinux=0x73797374656d5f753a6f626a6563745f723a6675736566735f743a733000
/mnt/test/:
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime:
Input/output error
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x3000
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800

[root at dhcp37-133 ~]# 

[root at dhcp37-133 ~]# mount | grep test
10.70.37.165:/tiervolume on /mnt/test type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root at dhcp37-133 ~]# 


Client log snippet:
===================

# less /var/log/glusterfs/mnt-test.log 

[2015-12-24 10:28:50.791227] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-tiervolume-disperse-1: Heal failed [Input/output error]
The message "W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-tiervolume-disperse-1: Heal failed [Input/output error]" repeated 6 times
between [2015-12-24 10:28:50.791227] and [2015-12-24 10:28:51.062863]
[2015-12-24 10:39:42.715503] W [MSGID: 122056]
[ec-combine.c:866:ec_combine_check] 0-tiervolume-disperse-1: Mismatching xdata
in answers of 'LOOKUP'
[2015-12-24 10:39:42.718869] W [MSGID: 122053]
[ec-common.c:116:ec_check_status] 0-tiervolume-disperse-1: Operation failed on
some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2015-12-24 10:39:42.727887] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-tiervolume-disperse-1: Heal failed [Input/output error]
[2015-12-24 10:39:42.919641] N [MSGID: 122031]
[ec-generic.c:1133:ec_combine_xattrop] 0-tiervolume-disperse-1: Mismatching
dictionary in answers of 'GF_FOP_XATTROP'
[2015-12-24 10:39:42.919750] W [MSGID: 122040]
[ec-common.c:907:ec_prepare_update_cbk] 0-tiervolume-disperse-1: Failed to get
size and version [Input/output error]
[2015-12-24 10:39:42.926486] W [fuse-bridge.c:3355:fuse_xattr_cbk]
0-glusterfs-fuse: 15:
GETXATTR(trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime)
/ => -1 (Input/output error)
[2015-12-24 10:39:42.925954] N [MSGID: 122031]
[ec-generic.c:1133:ec_combine_xattrop] 0-tiervolume-disperse-1: Mismatching
dictionary in answers of 'GF_FOP_XATTROP'
[2015-12-24 10:39:42.926445] W [MSGID: 122040]
[ec-common.c:907:ec_prepare_update_cbk] 0-tiervolume-disperse-1: Failed to get
size and version [Input/output error]
[2015-12-24 10:58:58.908160] W [MSGID: 122056]
[ec-combine.c:866:ec_combine_check] 0-tiervolume-disperse-1: Mismatching xdata
in answers of 'LOOKUP'
[2015-12-24 10:58:58.909422] W [MSGID: 122053]
[ec-common.c:116:ec_check_status] 0-tiervolume-disperse-1: Operation failed on
some subvolumes (up=3F, mask=3F, remaining=0, good=3E, bad=1)
[2015-12-24 10:58:58.918637] W [MSGID: 122002] [ec-common.c:71:ec_heal_report]
0-tiervolume-disperse-1: Heal failed [Input/output error]
[2015-12-24 10:58:58.922502] N [MSGID: 122031]
[ec-generic.c:1133:ec_combine_xattrop] 0-tiervolume-disperse-1: Mismatching
dictionary in answers of 'GF_FOP_XATTROP'
[2015-12-24 10:58:58.924043] W [MSGID: 122053]
[ec-common.c:116:ec_check_status] 0-tiervolume-disperse-1: Operation failed on
some subvolumes (up=3F, mask=3E, remaining=0, good=3E, bad=1)
The message "N [MSGID: 122031] [ec-generic.c:1133:ec_combine_xattrop]
0-tiervolume-disperse-1: Mismatching dictionary in answers of 'GF_FOP_XATTROP'"
repeated 2 times between [2015-12-24 10:58:58.922502] and [2015-12-24
10:58:58.972485]
[2015-12-24 10:58:58.973055] W [MSGID: 122040]
[ec-common.c:907:ec_prepare_update_cbk] 0-tiervolume-disperse-1: Failed to get
size and version [Input/output error]
[2015-12-24 10:58:58.973187] W [fuse-bridge.c:3355:fuse_xattr_cbk]
0-glusterfs-fuse: 19:
GETXATTR(trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime)
/ => -1 (Input/output error)
[2015-12-24 10:58:58.989738] N [MSGID: 122031]
[ec-generic.c:1133:ec_combine_xattrop] 0-tiervolume-disperse-1: Mismatching
dictionary in answers of 'GF_FOP_XATTROP'
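
The ec_check_status lines report hexadecimal bitmasks (up=3F, good=3E, bad=1). A minimal sketch for expanding them into brick positions follows. It assumes that bit i stands for the i-th brick of the disperse subvolume in "gluster volume info" order; that mapping is an assumption, although it agrees with ct-7 being the one brick whose trusted.ec.version differs in the brick dumps in this report. The function names are hypothetical.

```python
import re


def brick_indices(mask_hex):
    """Return the zero-based bit positions set in a hexadecimal mask."""
    mask = int(mask_hex, 16)
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def parse_status(text):
    """Parse 'up=3F, mask=3F, ...' into {name: [bit positions]}."""
    pairs = re.findall(r"(\w+)=([0-9A-Fa-f]+)", text)
    return {name: brick_indices(value) for name, value in pairs}


if __name__ == "__main__":
    status = parse_status("(up=3F, mask=3F, remaining=0, good=3E, bad=1)")
    for name, bits in status.items():
        print(name, bits)
```

Under that assumption, bad=1 singles out position 0 of tiervolume-disperse-1, while positions 1 to 5 are reported as good.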


The ec.version values for disperse subvolume 1 are as follows:
==============================================================

[root at dhcp37-165 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-7/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-7/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.version=0x00000000000000000000000000000011
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34f00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b7704
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ba2cafe3
trusted.tier.tier-dht.commithash=0x3330313736373533323400

[root at dhcp37-165 ~]# 
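
The stime and xtime values in the dump above are 8-byte hex strings. A minimal sketch for decoding them follows. It assumes the value is two unsigned 32-bit big-endian integers (seconds first, then a sub-second field), which is how the syncdaemon appears to pack these marks; treat the layout as an assumption. decode_mark is a hypothetical name.

```python
import struct
from datetime import datetime, timezone


def decode_mark(hex_value):
    """Decode an 8-byte stime/xtime hex string into (seconds, subseconds)."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 8:
        raise ValueError("expected 8 bytes, got %d" % len(raw))
    # Assumed layout: two unsigned 32-bit integers in network byte order.
    return struct.unpack("!II", raw)


if __name__ == "__main__":
    for name, value in [("stime", "0x567bc34f00000000"),
                        ("xtime", "0x567bc348000b7704")]:
        sec, sub = decode_mark(value)
        when = datetime.fromtimestamp(sec, tz=timezone.utc)
        print(name, sec, sub, when.isoformat())
```

Decoding the stime of each brick the same way makes it easy to see that the bricks of this subvolume do not all carry the same value.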



[root at dhcp37-133 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-8/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-8/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x0000000000000000000000000000000e
trusted.ec.version=0x00000000000000000000000000000020
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34c00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b781e
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800

[root at dhcp37-133 ~]#


[root at dhcp37-160 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-9/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-9/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x0000000000000000000000000000000e
trusted.ec.version=0x00000000000000000000000000000020
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34b00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b780d
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800

[root at dhcp37-160 ~]# 



[root at dhcp37-158 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-10/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-10/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x0000000000000000000000000000000e
trusted.ec.version=0x00000000000000000000000000000020
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34f00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b7c51
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800

[root at dhcp37-158 ~]# 


[root at dhcp37-110 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-11/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-11/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x0000000000000000000000000000000e
trusted.ec.version=0x00000000000000000000000000000020
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34b00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b7714
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800

[root at dhcp37-110 ~]# 


[root at dhcp37-155 ~]# getfattr -d -e hex -m . /rhs/brick2/ct-12/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick2/ct-12/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.dirty=0x0000000000000000000000000000000e
trusted.ec.version=0x00000000000000000000000000000020
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.5fc0f095-23c4-4b96-9d94-69decd14f1d4.stime=0x567bc34f00000000
trusted.glusterfs.1dd75524-8cc9-4d93-9b14-518021c8df3f.xtime=0x567bc348000b7989
trusted.glusterfs.dht=0x00000001000000007ffa1668ffffffff
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.1=0x000000000000000000000000000000000000000000000003
trusted.glusterfs.volume-id=0x1dd755248cc94d939b14518021c8df3f
trusted.tier.tier-dht=0x000000010000000000000000ffffffff
trusted.tier.tier-dht.commithash=0x3330313736383334313800

[root at dhcp37-155 ~]# 
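
The six dumps above can be compared mechanically. The sketch below splits each trusted.ec.version value into two 64-bit big-endian counters and reports the bricks that disagree with the majority. Reading the two halves as a data counter followed by a metadata counter is an assumption about the disperse translator, and the function names are hypothetical.

```python
import struct
from collections import Counter


def split_ec_counters(hex_value):
    """Split a 16-byte trusted.ec.* value into two 64-bit integers."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    # Assumed layout: (data counter, metadata counter), big-endian.
    return struct.unpack("!QQ", raw)


def minority_bricks(versions):
    """Return the bricks whose value differs from the most common one."""
    majority, _ = Counter(versions.values()).most_common(1)[0]
    return sorted(b for b, v in versions.items() if v != majority)


# Values copied from the getfattr output in this report.
V11 = "0x" + "0" * 30 + "11"
V20 = "0x" + "0" * 30 + "20"
EC_VERSION = {"ct-7": V11, "ct-8": V20, "ct-9": V20,
              "ct-10": V20, "ct-11": V20, "ct-12": V20}

if __name__ == "__main__":
    for brick in minority_bricks(EC_VERSION):
        print(brick, split_ec_counters(EC_VERSION[brick]))
```

On this data only ct-7 is reported. It is also the only brick without a trusted.ec.dirty xattr in the dumps.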


[root at dhcp37-165 ~]# gluster volume info tiervolume 

Volume Name: tiervolume
Type: Distributed-Disperse
Volume ID: 1dd75524-8cc9-4d93-9b14-518021c8df3f
Status: Started
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.165:/rhs/brick1/ct-1
Brick2: 10.70.37.133:/rhs/brick1/ct-2
Brick3: 10.70.37.160:/rhs/brick1/ct-3
Brick4: 10.70.37.158:/rhs/brick1/ct-4
Brick5: 10.70.37.110:/rhs/brick1/ct-5
Brick6: 10.70.37.155:/rhs/brick1/ct-6
Brick7: 10.70.37.165:/rhs/brick2/ct-7
Brick8: 10.70.37.133:/rhs/brick2/ct-8
Brick9: 10.70.37.160:/rhs/brick2/ct-9
Brick10: 10.70.37.158:/rhs/brick2/ct-10
Brick11: 10.70.37.110:/rhs/brick2/ct-11
Brick12: 10.70.37.155:/rhs/brick2/ct-12
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
[root at dhcp37-165 ~]# 

<Note: It was a tiered volume when the original issue was seen. The output
above was taken after detaching the tier.>
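
For reference, the "2 x (4 + 2) = 12" layout can be mapped to brick paths with a short sketch. It assumes that bricks are grouped into disperse subvolumes consecutively, in the order listed by "gluster volume info", which matches the ct-7 to ct-12 dumps given for disperse subvolume 1. disperse_subvolumes is a hypothetical helper.

```python
def disperse_subvolumes(bricks, data=4, redundancy=2):
    """Group bricks into consecutive disperse sets of data + redundancy."""
    size = data + redundancy
    if len(bricks) % size != 0:
        raise ValueError("brick count is not a multiple of %d" % size)
    return [bricks[i:i + size] for i in range(0, len(bricks), size)]


# Brick list copied from the volume info output above.
HOSTS = ["10.70.37.165", "10.70.37.133", "10.70.37.160",
         "10.70.37.158", "10.70.37.110", "10.70.37.155"]
BRICKS = ["%s:/rhs/brick%d/ct-%d" % (HOSTS[(n - 1) % 6], 1 + (n - 1) // 6, n)
          for n in range(1, 13)]

if __name__ == "__main__":
    for index, subvol in enumerate(disperse_subvolumes(BRICKS)):
        print("tiervolume-disperse-%d:" % index)
        for brick in subvol:
            print("  " + brick)
```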


Steps Carried Out:
==================
1. Create a tiered master volume {HT: 3x2, CT: 2x(4+2)}
2. Create a slave volume (4x2)
3. Create a geo-rep session
4. Start the geo-rep session

Actual results:
===============

All the passive bricks went to Faulty.


Expected results:
=================

The geo-rep session should be Active.
