[Gluster-devel] missing files

Xavier Hernandez xhernandez at datalab.es
Thu Feb 5 10:14:22 UTC 2015


Is the failure repeatable? Does it happen with the same directories?

It's very strange that the directories appear on the volume after you do
an 'ls' on the bricks. Could it be that you only did a single 'ls' on the
fuse mount, which did not show the directories? Is it possible that this
'ls' triggered a self-heal that repaired the problem, whatever it was,
so that when you did another 'ls' on the fuse mount after the 'ls' on
the bricks, the directories were there?
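
To tell these cases apart next time, it may help to record what the
mount shows before touching the bricks. Below is a rough sketch, assuming
a POSIX shell on one of the servers. The function name missing_on_mount
and the mount path in the usage line are made up for illustration, and
the function only sees the bricks that are local to that server. It lists
the mount first, then the bricks, and prints the entries found on a brick
but not on the mount:

```shell
# Print entries that exist in at least one brick directory but are not
# visible in the corresponding directory on the mount.
# Usage: missing_on_mount MOUNT_DIR BRICK_DIR [BRICK_DIR...]
# (hypothetical helper, not part of GlusterFS)
missing_on_mount() {
    mount_dir=$1
    shift
    tmp=$(mktemp -d) || return 1

    # List the mount first, so the listing is taken before the bricks
    # are touched.
    ls -A1 "$mount_dir" | LC_ALL=C sort -u > "$tmp/mount"

    # Union of the entries on the given bricks. The .glusterfs directory
    # at the root of a brick is internal and never visible to clients.
    for b in "$@"; do
        ls -A1 "$b"
    done | grep -v '^\.glusterfs$' | LC_ALL=C sort -u > "$tmp/bricks"

    # Keep only the lines that appear in the brick list alone.
    LC_ALL=C comm -13 "$tmp/mount" "$tmp/bricks"
    rm -rf "$tmp"
}
```

For example, with the volume mounted on a hypothetical /mnt/homegfs:

         missing_on_mount /mnt/homegfs/dir /data/brick0*/homegfs/dir

No output means the mount already shows everything the local bricks
have. File names containing newlines are not handled.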

The first 'ls' could have healed the files, so that the following 'ls'
on the bricks showed the files as if nothing had been damaged. If
that's the case, it's possible that there were some disconnections
during the copy.
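
A simple first pass to look for such disconnections is to count the log
lines that mention a disconnect. This is only a sketch: the function name
count_disconnects is made up, the match is a plain case-insensitive
search for "disconnect" rather than a specific message, and the log
locations in the usage line are the usual defaults, assumed here to
apply to this installation:

```shell
# For each log file given, print "FILE: N", where N is the number of
# lines containing "disconnect" (case-insensitive).
# (hypothetical helper, not part of GlusterFS)
count_disconnects() {
    for f in "$@"; do
        # grep -c prints 0 and returns non-zero when nothing matches,
        # so the exit status is ignored here.
        n=$(grep -ic 'disconnect' "$f" || true)
        printf '%s: %s\n' "$f" "$n"
    done
}
```

For example, on each server:

         count_disconnects /var/log/glusterfs/bricks/*.log

A non-zero count around the time of the copy would be worth a closer
look at the matching lines.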

I've added Pranith because he knows the replication and self-heal
details better.

Xavi

On 02/04/2015 07:23 PM, David F. Robinson wrote:
> Distributed/replicated
>
> Volume Name: homegfs
> Type: Distributed-Replicate
> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
> Status: Started
> Number of Bricks: 4 x 2 = 8
> Transport-type: tcp
> Bricks:
> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
> Options Reconfigured:
> performance.io-thread-count: 32
> performance.cache-size: 128MB
> performance.write-behind-window-size: 128MB
> server.allow-insecure: on
> network.ping-timeout: 10
> storage.owner-gid: 100
> geo-replication.indexing: off
> geo-replication.ignore-pid-check: on
> changelog.changelog: on
> changelog.fsync-interval: 3
> changelog.rollover-time: 15
> server.manage-gids: on
>
>
> ------ Original Message ------
> From: "Xavier Hernandez" <xhernandez at datalab.es>
> To: "David F. Robinson" <david.robinson at corvidtec.com>; "Benjamin
> Turner" <bennyturns at gmail.com>
> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>; "Gluster
> Devel" <gluster-devel at gluster.org>
> Sent: 2/4/2015 6:03:45 AM
> Subject: Re: [Gluster-devel] missing files
>
>> On 02/04/2015 01:30 AM, David F. Robinson wrote:
>>> Sorry. Thought about this a little more. I should have been clearer.
>>> The files were on both bricks of the replica, not just one side. So,
>>> both bricks had to have been up... The files/directories just don't show
>>> up on the mount.
>>> I was reading and saw a related bug
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it
>>> suggested to run:
>>>          find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;
>>
>> This command is specific to dispersed volumes. It won't do anything
>> (aside from the error you are seeing) on a replicated volume.
>>
>> I think you are using a replicated volume, right?
>>
>> In that case I'm not sure what could be happening. Is your volume a
>> pure replicated one or a distributed-replicated one? On a pure
>> replicated volume it doesn't make sense that some entries do not show
>> up in an 'ls' when the file is in both replicas (at least not without
>> an error message in the logs). On a distributed-replicated volume it
>> could be caused by some problem while combining the contents of each
>> replica set.
>>
>> What's the configuration of your volume ?
>>
>> Xavi
>>
>>>
>>> I get a bunch of errors for operation not supported:
>>> [root at gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n
>>> trusted.ec.heal {} \;
>>> find: warning: the -d option is deprecated; please use -depth instead,
>>> because the latter is a POSIX-compliant feature.
>>> wks_backup/homer_backup/backup: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal: Operation
>>> not supported
>>> wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal: Operation
>>> not supported
>>> wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal: Operation
>>> not supported
>>> wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal: Operation
>>> not supported
>>> wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal: Operation
>>> not supported
>>> wks_backup/homer_backup/logs: trusted.ec.heal: Operation not supported
>>> wks_backup/homer_backup: trusted.ec.heal: Operation not supported
>>> ------ Original Message ------
>>> From: "Benjamin Turner" <bennyturns at gmail.com>
>>> To: "David F. Robinson" <david.robinson at corvidtec.com>
>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>;
>>> "gluster-users at gluster.org" <gluster-users at gluster.org>
>>> Sent: 2/3/2015 7:12:34 PM
>>> Subject: Re: [Gluster-devel] missing files
>>>> It sounds to me like the files were only copied to one replica,
>>>> weren't there for the initial ls (which triggered a self-heal), and
>>>> were there for the last ls because they had been healed. Is there any
>>>> chance that one of the replicas was down during the rsync? It could
>>>> be that you lost a brick during the copy or something like that. To
>>>> confirm, I would look for disconnects in the brick logs and check
>>>> glustershd.log to verify the missing files were actually healed.
>>>>
>>>> -b
>>>>
>>>> On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
>>>> <david.robinson at corvidtec.com> wrote:
>>>>
>>>>     I rsync'd 20 TB over to my gluster system and noticed that I had
>>>>     some directories missing even though the rsync completed normally.
>>>>     The rsync logs showed that the missing files were transferred.
>>>>     I went to the bricks and did an 'ls -al
>>>>     /data/brick*/homegfs/dir/*', and the files were on the bricks.
>>>>     After I did this 'ls', the files showed up on the FUSE mounts.
>>>>     1) Why are the files hidden on the fuse mount?
>>>>     2) Why does the ls make them show up on the FUSE mount?
>>>>     3) How can I prevent this from happening again?
>>>>     Note, I also mounted the gluster volume using NFS and saw the same
>>>>     behavior. The files/directories were not shown until I did the
>>>>     "ls" on the bricks.
>>>>     David
>>>>     ===============================
>>>>     David F. Robinson, Ph.D.
>>>>     President - Corvid Technologies
>>>>     704.799.6944 x101 [office]
>>>>     704.252.1310 [cell]
>>>>     704.799.7974 [fax]
>>>>     David.Robinson at corvidtec.com
>>>>     http://www.corvidtechnologies.com
>>>>
>>>>     _______________________________________________
>>>>     Gluster-devel mailing list
>>>>     Gluster-devel at gluster.org
>>>>     http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>
>>>>
>>>
>>>
>

