[Bugs] [Bug 1502104] New: [geo-rep]: RSYNC throwing internal errors

Sat Oct 14 08:49:31 UTC 2017

https://bugzilla.redhat.com/show_bug.cgi?id=1502104

            Bug ID: 1502104
           Summary: [geo-rep]: RSYNC throwing internal errors
           Product: GlusterFS
           Version: 3.12
         Component: geo-replication
          Assignee: bugs at gluster.org
          Reporter: khiremat at redhat.com
                CC: avishwan at redhat.com, bugs at gluster.org,
                    csaba at redhat.com, rallan at redhat.com,
                    rhinduja at redhat.com, rhs-bugs at redhat.com,
                    sheggodu at redhat.com, storage-qa-internal at redhat.com
        Depends On: 1476876, 1500433

+++ This bug was initially created as a clone of Bug #1500433 +++

+++ This bug was initially created as a clone of Bug #1476876 +++

Description of problem:
=======================
Rsync throws internal errors of the form "rsync: get_xattr_data: lgetxattr", as seen in the geo-replication logs:

[2017-07-31 09:46:14.352732] W [master(/rhs/brick3/b16):1067:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1501494366
[2017-07-31 09:46:15.125684] E [resource(/rhs/brick2/b10):1044:rsync] SSH: SYNC Error(Rsync): rsync: get_xattr_data: lgetxattr("/proc/3840/cwd/.gfid/00000000-0000-0000-0000-000000000001","trusted.glusterfs.volume-mark.2d516aed-ad11-43cc-8741-32bfc7391b74",0) failed: No data available (61)
[2017-07-31 09:46:15.126796] E [master(/rhs/brick2/b10):1046:process] _GMaster: changelogs CHANGELOG.1501494366 could not be processed completely - moving on...
[2017-07-31 09:46:15.132359] E [resource(/rhs/brick1/b4):1044:rsync] SSH: SYNC Error(Rsync): rsync: get_xattr_data: lgetxattr("/proc/3838/cwd/.gfid/00000000-0000-0000-0000-000000000001","trusted.glusterfs.volume-mark.2d516aed-ad11-43cc-8741-32bfc7391b74",0) failed: No data available (61)
[2017-07-31 09:46:15.133415] E [master(/rhs/brick1/b4):1046:process] _GMaster: changelogs CHANGELOG.1501494366 could not be processed completely - moving on...
[2017-07-31 09:46:15.158014] W [master(/rhs/brick3/b16):1067:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1501494366
[2017-07-31 09:46:16.12286] E [resource(/rhs/brick3/b16):1044:rsync] SSH: SYNC Error(Rsync): rsync: get_xattr_data: lgetxattr("/proc/3839/cwd/.gfid/00000000-0000-0000-0000-000000000001","trusted.glusterfs.volume-mark.2d516aed-ad11-43cc-8741-32bfc7391b74",0) failed: No data available (61)
[2017-07-31 09:46:16.13156] E [master(/rhs/brick3/b16):1046:process] _GMaster: changelogs CHANGELOG.1501494366 could not be processed completely - moving on...
[2017-07-31 09:47:21.598099] I [master(/rhs/brick2/b10):1132:crawl] _GMaster: slave's time: (1501494365, 0)
[2017-07-31 09:47:21.616106] I [master(/rhs/brick1/b4):1132:crawl] _GMaster: slave's time: (1501494365, 0)
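The failing call appears to be rsync reading the value of an xattr that it found in the file's attribute list. The sketch below is a hypothetical illustration of that list-then-get pattern. It is not rsync's or gsyncd's code. The helper `read_listed_xattrs` and the in-memory `fake_store` are invented for the example. The sketch assumes, as the log suggests, that the volume-mark name is listed but its lookup fails with ENODATA, which is errno 61 on Linux.

```python
import errno
import os
from typing import Callable, Dict, Iterable, List, Tuple


def read_listed_xattrs(
    names: Iterable[str], get_value: Callable[[str], bytes]
) -> Tuple[Dict[str, bytes], List[str]]:
    """Fetch the value of every listed xattr name (hypothetical helper).

    Names whose lookup fails with ENODATA ("listed but not readable") are
    reported as rsync-style error strings instead of aborting the loop.
    """
    values: Dict[str, bytes] = {}
    errors: List[str] = []
    for name in names:
        try:
            values[name] = get_value(name)
        except OSError as exc:
            if exc.errno != errno.ENODATA:
                raise  # anything other than "no such attribute" is unexpected here
            errors.append(
                f'lgetxattr("{name}") failed: {os.strerror(exc.errno)} ({exc.errno})'
            )
    return values, errors


# Hypothetical in-memory stand-in for a file's xattrs: the volume-mark name is
# listed, but it has no readable value, mimicking the failure in the log.
VOLUME_MARK = "trusted.glusterfs.volume-mark.2d516aed-ad11-43cc-8741-32bfc7391b74"
fake_store = {"trusted.gfid": b"\x00" * 16}
listed = ["trusted.gfid", VOLUME_MARK]


def fake_get(name: str) -> bytes:
    if name not in fake_store:
        raise OSError(errno.ENODATA, os.strerror(errno.ENODATA))
    return fake_store[name]


values, errors = read_listed_xattrs(listed, fake_get)
print(sorted(values))
print(errors)
```

On Linux, you could point the same helper at a real file with `read_listed_xattrs(os.listxattr(path, follow_symlinks=False), lambda n: os.getxattr(path, n, follow_symlinks=False))`. Reading `trusted.*` names generally requires root.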

Version-Release number of selected component (if applicable):
==============================================================
mainline

Steps to Reproduce:
=====================
1. Create a 6-node master cluster and a 6-node slave cluster.
2. Create a 9x2 distributed-replicate (DR) master volume and slave volume.
3. Create and start a non-root geo-replication session.
4. Mount the master and slave volumes.
5. Create data on the master mount:

for i in {create,chmod,chown,chgrp,hardlink,symlink,truncate,rename}; do
    echo "------------------- This iteration is for fop $i -----------------" >> /root/result
    # Generate test data on the master mount for this fop
    crefi --multi -n 5 -b 10 -d 10 --max=10k --min=5k --random -T 10 -t text --fop="$i" /mnt/master/ 1>/dev/null 2>&1
    sleep 10
    # Checksum the master
    echo "---Arequal Master for $i---" >> /root/result
    /root/arequal-checksum -p /mnt/master/ >> /root/result
    # Give geo-replication 10 minutes to sync, then checksum the slave
    sleep 600
    echo "---Arequal Slave for $i---" >> /root/result
    /root/arequal-checksum -p /mnt/slave/ >> /root/result
done

All fops (create, chmod, chgrp, chown, hardlink, symlink, truncate, rename) are synced.

How reproducible:
==============
Seen twice out of 4 trials on a non-root setup.

--- Additional comment from Worker Ant on 2017-10-10 10:48:22 EDT ---

REVIEW: https://review.gluster.org/18479 (geo-rep: Filter out volume-mark
xattr) posted (#1) for review on master by Kotresh HR (khiremat at redhat.com)

--- Additional comment from Worker Ant on 2017-10-13 12:26:09 EDT ---

COMMIT: https://review.gluster.org/18479 committed in master by Jeff Darcy
(jeff at pl.atyp.us) 
------
commit c64fd0d4b0ef313bb44aae68a376ec0c9ee8657a
Author: Kotresh HR <khiremat at redhat.com>
Date:   Tue Oct 10 10:27:01 2017 -0400

    geo-rep: Filter out volume-mark xattr

    The volume-mark xattr, maintained at the brick root
    of the slave volume, is specific to geo-replication
    and should be filtered out for all other clients.
    It should also be filtered out of the list-xattr
    (listxattr) output on all mounts, including the
    geo-rep mount, because otherwise rsync might try to
    read and set it.

    Change-Id: If9eb5a3af18051083c853e70d93b2819e8eea222
    BUG: 1500433
    Signed-off-by: Kotresh HR <khiremat at redhat.com>
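The commit message above describes two rules:

* A getxattr on the volume-mark xattr is hidden from every client except geo-replication.
* The name is dropped from listxattr results on every mount, geo-rep included, so that rsync does not pick it up.

The following Python sketch shows only that decision logic. It is not the code from review 18479. The helper names are hypothetical, and so is the prefix match on `trusted.glusterfs.volume-mark`, which is taken from the xattr name in the log.

```python
from typing import Iterable, List

# Assumption: volume-mark xattrs are named "trusted.glusterfs.volume-mark.<uuid>",
# as in the log, so a prefix match identifies them.
VOLUME_MARK_PREFIX = "trusted.glusterfs.volume-mark"


def is_volume_mark(name: str) -> bool:
    return name.startswith(VOLUME_MARK_PREFIX)


def filter_listxattr(names: Iterable[str]) -> List[str]:
    """listxattr: drop volume-mark names on every mount, geo-rep included."""
    return [name for name in names if not is_volume_mark(name)]


def allow_getxattr(name: str, is_georep_client: bool) -> bool:
    """getxattr by name: only the geo-rep client may read a volume-mark xattr."""
    return is_georep_client or not is_volume_mark(name)


listed = [
    "trusted.gfid",
    "trusted.glusterfs.volume-mark.2d516aed-ad11-43cc-8741-32bfc7391b74",
]
print(filter_listxattr(listed))  # ['trusted.gfid']
print(allow_getxattr(listed[1], is_georep_client=True))  # True
print(allow_getxattr(listed[1], is_georep_client=False))  # False
```

The explicit lookup stays open for the geo-rep client while the name is hidden from listings. This presumably lets geo-replication keep using the mark, while rsync, which discovers xattrs by listing them, never sees it.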

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1476876
[Bug 1476876] [geo-rep]: RSYNC throwing internal errors
https://bugzilla.redhat.com/show_bug.cgi?id=1500433
[Bug 1500433] [geo-rep]: RSYNC throwing internal errors