[Bugs] [Bug 1181669] New: File replicas differ in content even as heal info lists 0 entries in replica 2 setup

bugzilla at redhat.com
Tue Jan 13 14:56:45 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1181669

            Bug ID: 1181669
           Summary: File replicas differ in content even as heal info
                    lists 0 entries in replica 2 setup
           Product: GlusterFS
           Version: 3.4.2
         Component: replicate
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: chalcogen_eg_oxygen at yahoo.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Description of problem:

I have a replica 2 setup as follows:
------------
root@gfs_serv0:~] gluster v info gfs_replicated_vol

Volume Name: gfs_replicated_vol
Type: Replicate
Volume ID: 4e72e0cc-318f-4706-92af-4a56fc793063
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gfs_serv0:/mnt/bricks/gfs_replicated_vol/brick
Brick2: gfs_serv1:/mnt/bricks/gfs_replicated_vol/brick
Options Reconfigured:
nfs.disable: on
cluster.self-heal-daemon: on
nfs.enable-ino32: on
network.ping-timeout: 10
root@gfs_serv0:~]

I checked heal info, and it reports no pending entries, i.e. all files are in sync:
------------
root@gfs_serv0:~] gluster volume heal gfs_replicated_vol info


Gathering Heal info on volume _tftpboot has been successful

Brick gfs_serv0:/mnt/bricks/gfs_replicated_vol/brick

Number of entries: 0


Brick gfs_serv1:/mnt/bricks/gfs_replicated_vol/brick

Number of entries: 0

root@gfs_serv0:~]

However, for a few files, the md5sums of the replicas on the two bricks differ,
even though their sizes are identical. To illustrate:
------------
root@gfs_serv0:~] ls -li /mnt/bricks/gfs_replicated_vol/brick/bin/bash

4187497 -rwxr-xr-x 2 root root 666648 Dec  6 22:48 /mnt/bricks/gfs_replicated_vol/brick/bin/bash

root@gfs_serv0:~] md5sum /mnt/bricks/gfs_replicated_vol/brick/bin/bash

fc61db7be6eeda79f0b0bff58e622ace  /mnt/bricks/gfs_replicated_vol/brick/bin/bash

root@gfs_serv1:~] ls -li /mnt/bricks/gfs_replicated_vol/brick/bin/bash

8389144 -rwxr-xr-x 2 root root 666648 Dec  6 22:48 /mnt/bricks/gfs_replicated_vol/brick/bin/bash

root@gfs_serv1:~] md5sum /mnt/bricks/gfs_replicated_vol/brick/bin/bash

154b9852621b1651aff4af0764897c9a  /mnt/bricks/gfs_replicated_vol/brick/bin/bash
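
Since the two replicas have the same size but different checksums, it can help to find out which part of the file diverges. The following is a minimal sketch, not part of GlusterFS: block_sums is a hypothetical helper name, and the 128 KiB default block size is an arbitrary choice. It prints one checksum per fixed-size block so that the output from each server can be compared with diff.

```shell
# Print one "<block-index> <md5>" line per fixed-size block of a file.
# Usage: block_sums FILE [BLOCK_SIZE_IN_BYTES]
# block_sums and the 131072-byte default are illustrative assumptions.
block_sums() {
    file=$1
    bs=${2:-131072}
    size=$(wc -c < "$file")
    # Number of blocks, rounding up so a trailing partial block is included.
    nblocks=$(( (size + bs - 1) / bs ))
    i=0
    while [ "$i" -lt "$nblocks" ]; do
        # Read exactly one block at index i and checksum it.
        sum=$(dd if="$file" bs="$bs" skip="$i" count=1 2>/dev/null | md5sum | cut -d' ' -f1)
        echo "$i $sum"
        i=$((i + 1))
    done
}
```

Running this against the brick path on each server, copying one output file across, and diffing the two would show which block indices differ, which may hint at whether the divergence is a single write or spread throughout the file.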

Further information on the two replicas:
------------
root@gfs_serv0:~] getfattr -m . -d -e hex /mnt/bricks/gfs_replicated_vol/brick/bin/bash

getfattr: Removing leading '/' from absolute path names

# file: mnt/bricks/gfs_replicated_vol/brick/bin/bash

trusted.afr.gfs_replicated_vol-client-0=0x000000000000000000000000

trusted.afr.gfs_replicated_vol-client-1=0x000000000000000000000000

trusted.gfid=0xc577569a74cb4f23825daef95e9dcbb4



root@gfs_serv1:~] getfattr -m . -d -e hex /mnt/bricks/gfs_replicated_vol/brick/bin/bash

getfattr: Removing leading '/' from absolute path names

# file: mnt/bricks/gfs_replicated_vol/brick/bin/bash

trusted.afr.gfs_replicated_vol-client-0=0x000000000000000000000000

trusted.afr.gfs_replicated_vol-client-1=0x000000000000000000000000

trusted.gfid=0xc577569a74cb4f23825daef95e9dcbb4
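
For readers unfamiliar with the trusted.afr values above: as far as I understand the AFR changelog layout, each value packs three 32-bit big-endian counters for pending data, metadata, and entry operations. The sketch below decodes a value as printed by getfattr -e hex; decode_afr is a hypothetical helper name, and the field layout is stated as an assumption rather than taken from this report.

```shell
# Decode a trusted.afr.* value (as printed by `getfattr -e hex`) into its
# three pending-operation counters. Assumes the layout is three 32-bit
# big-endian integers: data, metadata, entry. decode_afr is illustrative.
decode_afr() {
    hex=${1#0x}                       # strip the leading 0x, if present
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got ${#hex}" >&2
        return 1
    fi
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    # printf converts each 0x-prefixed field to decimal.
    printf 'data=%d metadata=%d entry=%d\n' "0x$data" "0x$meta" "0x$entry"
}

decode_afr 0x000000000000000000000000
```

Under that assumption, the all-zero values on both bricks mean that neither replica records any pending operation against the other, which is consistent with heal info listing 0 entries even though the contents differ.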


I tried to force a heal by triggering a lookup on the file with
ls -l /gfs_replicated_vol/brick/bin/bash, but that made no difference.

Version-Release number of selected component (if applicable):

3.4.2 (as given in the Version field above)


How reproducible: Intermittent. 


Steps to Reproduce:

I do not have a simple way to reproduce this; it is something we see on our
servers on rare occasions.

Actual results:

Files are not in sync, even though gluster volume heal info does not report a
single entry.

Expected results:

When files are not in sync, glusterfs should be able to identify them and
report them to the system administrator in the heal info output.
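
Until heal info catches such cases, an out-of-band check is one way to find affected files. The following is a rough sketch under my own assumptions: brick_manifest and manifest_mismatches are hypothetical helper names, the .glusterfs directory at the brick root is skipped because it holds internal hard links rather than user-visible paths, and file names are assumed not to contain newlines or backslashes. Files being written to while the manifests are generated may show up as false positives.

```shell
# Print a sorted "md5  ./relative/path" manifest for all regular files in a
# brick directory, skipping the internal .glusterfs tree.
brick_manifest() {
    (
        cd "$1" || exit 1
        find . -path ./.glusterfs -prune -o -type f -exec md5sum {} + |
            LC_ALL=C sort -k 2
    )
}

# Given two manifest files, print every relative path whose checksum
# differs or that appears in only one of the manifests.
manifest_mismatches() {
    diff "$1" "$2" | sed -n 's/^[<>] [0-9a-f]*  //p' | LC_ALL=C sort -u
}
```

A possible workflow is to run brick_manifest on each server against its brick path, copy one manifest to the other server, and pass both files to manifest_mismatches.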

Additional info:

None


