[Gluster-users] Setting gfid failed on slave geo-rep node

Mon Feb 1 13:31:29 UTC 2016

Hi,

Thank you for your answer. Below is the output of the requested commands. There is just one issue with the GFID: the getfattr command does not seem to work. I ran it on the master, but running it on the slave node also returns "Operation not supported".

# getfattr -n glusterfs.gfid.string  -m .  logo-login-09.svg
logo-login-04.svg: glusterfs.gfid.string: Operation not supported

# file logo-login-09.svg
logo-login-04.svg: ASCII text, with very long lines, with no line terminators

# gluster version
3.7.6

# gluster volume info
Volume Name: myvolume
Type: Replicate
Volume ID: *REMOVED*
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gfs1a.domain.tld:/data/myvolume/brick
Brick2: gfs1b.domain.tld:/data/myvolume/brick
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
nfs.disable: on

# gluster volume geo-replication status
MASTER NODE    MASTER VOL    MASTER BRICK            SLAVE USER    SLAVE                                     SLAVE NODE            STATUS     CRAWL STATUS       LAST_SYNCED
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
gfs1a          myvolume      /data/myvolume/brick    root          ssh://gfs1geo.domain.tld::myvolume-geo    gfs1geo.domain.tld    Active     Changelog Crawl    2016-02-01 09:29:26
gfs1b          myvolume      /data/myvolume/brick    root          ssh://gfs1geo.domain.tld::myvolume-geo    gfs1geo.domain.tld    Passive    N/A                N/A

Regards
ML

On Monday, February 1, 2016 1:30 PM, Saravanakumar Arumugam <sarumuga at redhat.com> wrote:
Hi,

On 02/01/2016 02:14 PM, ML mail wrote:
> Hello,
>
> I just set up distributed geo-replication to a slave for my two-node replicated volume and noticed quite a few error messages (around 70 of them) in the slave's brick log file.
>
> The exact log file is: /var/log/glusterfs/bricks/data-myvolume-geo-brick.log
>
> [2016-01-31 22:19:29.524370] E [MSGID: 113020] [posix.c:1221:posix_mknod] 0-myvolume-geo-posix: setting gfid on /data/myvolume-geo/brick/data/username/files/shared/logo-login-09.svg.ocTransferId1789604916.part failed
> [2016-01-31 22:19:29.535478] W [MSGID: 113026] [posix.c:1338:posix_mkdir] 0-myvolume-geo-posix: mkdir (/data/username/files_encryption/keys/files/shared/logo-login-09.svg.ocTransferId1789604916.part): gfid (15bbcec6-a332-4c21-81e4-c52472b1e13d) isalready associated with directory (/data/myvolume-geo/brick/.glusterfs/49/5d/495d6868-4844-4632-8ff9-ad9646a878fe/logo-login-09.svg). Hence,both directories will share same gfid and thiscan lead to inconsistencies.
Can you grep for this gfid (of the corresponding files) in the changelogs and share those files?

{
For example:

1. Get gfid of the files like this:

# getfattr -n glusterfs.gfid.string  -m .  /mnt/slave/file456
getfattr: Removing leading '/' from absolute path names
# file: mnt/slave/file456
glusterfs.gfid.string="05b22446-de9e-42df-a63e-399c24d690c4"

2. grep for the corresponding gfid in the brick back end, like below:

[root@gfvm3 changelogs]# grep 05b22446-de9e-42df-a63e-399c24d690c4 /opt/volume_test/tv_2/b1/.glusterfs/changelogs/ -rn
Binary file /opt/volume_test/tv_2/b1/.glusterfs/changelogs/CHANGELOG.1454135265 matches
Binary file /opt/volume_test/tv_2/b1/.glusterfs/changelogs/CHANGELOG.1454135476 matches

}
This will help in understanding which operations carried out on the master volume lead to this inconsistency.
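
The two steps above can be sketched as a pair of small shell helpers. This is only an illustration: the function names are made up here and are not part of GlusterFS. It assumes the changelogs contain the gfid in its dashed text form (as the grep example above suggests). It also assumes that, when glusterfs.gfid.string is not available, the gfid can be read as root directly from a file on the brick with `getfattr -n trusted.gfid -e hex <file>`, which prints it as one hex string.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not GlusterFS commands.

# Convert a hex-encoded gfid (e.g. 0x05b22446de9e42dfa63e399c24d690c4,
# assumed to come from `getfattr -n trusted.gfid -e hex` on a brick file)
# into the dashed form that appears in the changelogs.
gfid_hex_to_uuid() {
    hex=${1#0x}    # drop the leading 0x if present
    printf '%s\n' "$hex" |
        sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/'
}

# List the changelog files of a brick that mention a given gfid.
#   $1 = gfid in dashed form
#   $2 = brick root directory (e.g. /data/myvolume/brick)
changelogs_for_gfid() {
    grep -rl -- "$1" "$2/.glusterfs/changelogs/"
}
```

For example, `changelogs_for_gfid "$(gfid_hex_to_uuid 0x05b22446de9e42dfa63e399c24d690c4)" /opt/volume_test/tv_2/b1` would list the matching CHANGELOG files, which can then be shared.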

Also, get the following:
gluster version
gluster volume info
gluster volume geo-replication status

>
> This doesn't look good at all, because the file mentioned in the error message (logo-login-09.svg.ocTransferId1789604916.part) is left there with 0 bytes and does not get deleted or cleaned up by glusterfs. This leaves my geo-rep slave node in an inconsistent state that does not reflect the state of the master nodes. The master nodes don't have that file anymore (which is correct). Below is an "ls" of the file concerned, with the correct file on top.
>
>
> -rw-r--r-- 2 www-data www-data   24312 Jan  6  2014 logo-login-09.svg
> -rw-r--r-- 1 root     root           0 Jan 31 23:19 logo-login-09.svg.ocTransferId1789604916.part
Rename issues in geo-replication have been fixed recently. This looks similar to one of them.

Thanks,
Saravana