[Gluster-devel] [Gluster-users] missing files

Pranith Kumar Karampuri pkarampu at redhat.com
Fri Feb 6 02:05:02 UTC 2015



----- Original Message -----
> From: "Ben Turner" <bturner at redhat.com>
> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "David F. Robinson" <david.robinson at corvidtec.com>
> Cc: "Xavier Hernandez" <xhernandez at datalab.es>, "Benjamin Turner" <bennyturns at gmail.com>, gluster-users at gluster.org,
> "Gluster Devel" <gluster-devel at gluster.org>
> Sent: Friday, February 6, 2015 3:25:28 AM
> Subject: Re: [Gluster-users] [Gluster-devel] missing files
> 
> ----- Original Message -----
> > From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> > To: "Xavier Hernandez" <xhernandez at datalab.es>, "David F. Robinson"
> > <david.robinson at corvidtec.com>, "Benjamin Turner"
> > <bennyturns at gmail.com>
> > Cc: gluster-users at gluster.org, "Gluster Devel" <gluster-devel at gluster.org>
> > Sent: Thursday, February 5, 2015 5:30:04 AM
> > Subject: Re: [Gluster-users] [Gluster-devel] missing files
> > 
> > 
> > On 02/05/2015 03:48 PM, Pranith Kumar Karampuri wrote:
> > > I believe David already fixed this. I hope this is the same
> > > permissions issue he told us about.
> > Oops, it is not. I will take a look.
> 
> Yes David, exactly like these:
> 
> data-brick02a-homegfs.log:[2015-02-03 19:09:34.568842] I
> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection
> from
> gfs02a.corvidtec.com-18563-2015/02/03-19:07:58:519134-homegfs-client-2-0-0
> data-brick02a-homegfs.log:[2015-02-03 19:09:41.286551] I
> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection
> from
> gfs01a.corvidtec.com-12804-2015/02/03-19:09:38:497808-homegfs-client-2-0-0
> data-brick02a-homegfs.log:[2015-02-03 19:16:35.906412] I
> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection
> from
> gfs02b.corvidtec.com-27190-2015/02/03-19:15:53:458467-homegfs-client-2-0-0
> data-brick02a-homegfs.log:[2015-02-03 19:51:22.761293] I
> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection
> from
> gfs01a.corvidtec.com-25926-2015/02/03-19:51:02:89070-homegfs-client-2-0-0
> data-brick02a-homegfs.log:[2015-02-03 20:54:02.772180] I
> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection
> from
> gfs01b.corvidtec.com-4175-2015/02/02-16:44:31:179119-homegfs-client-2-0-1
> 
> You can 100% verify my theory if you can correlate the times of the
> disconnects with the times that the missing files were healed.  Can you have
> a look at /var/log/glusterfs/glustershd.log?  That has all of the healed
> files + timestamps.  If we can see a disconnect during the rsync and a self
> heal of the missing file, I think we can safely conclude that the
> disconnects caused this.  I'll try this on my test systems.  How much data
> did you rsync?  Roughly what size were the files, and what did the directory
> layout look like?
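[Editorial note: the correlation check Ben describes can be sketched in shell. The log paths and the exact shape of the glustershd.log heal messages below are assumptions based on the snippets quoted in this thread; on a real system, point the greps at /var/log/glusterfs/bricks/*.log and /var/log/glusterfs/glustershd.log.]

```shell
# Stand-in sample logs so the pipeline below is self-contained; the
# glustershd.log line is a guessed format, not copied from a real system.
cat > /tmp/brick.log <<'EOF'
[2015-02-03 19:09:34.568842] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs02a.corvidtec.com-18563-...
EOF
cat > /tmp/glustershd.log <<'EOF'
[2015-02-03 19:09:41.100000] I [afr-self-heal-common.c] 0-homegfs-replicate-1: completed entry self-heal on wks_backup
EOF

# Extract just the bracketed timestamps, tag each with its source, and
# merge them in time order so disconnect/heal coincidences stand out:
{ grep -o '^\[[^]]*\]' /tmp/brick.log      | sed 's/$/ DISCONNECT/'
  grep -o '^\[[^]]*\]' /tmp/glustershd.log | sed 's/$/ HEAL/'
} | sort
```

A heal of a previously missing file shortly after a disconnect would support the theory that a brick was down while rsync wrote the file.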
> 
> @Pranith - Could bricks flapping up and down during the rsync be a possible
> cause here?  The files would be missing on the first ls (written to one
> subvol but not the other because it was down), that ls triggered self-heal,
> and that's why the files were there for the second ls.

No, that would be a bug. AFR should serve the directory contents from the brick that has those files.

> 
> -b
> 
>  
> > Pranith
> > >
> > > Pranith
> > > On 02/05/2015 03:44 PM, Xavier Hernandez wrote:
> > >> Is the failure repeatable ? with the same directories ?
> > >>
> > >> It's very weird that the directories appear on the volume when you do
> > >> an 'ls' on the bricks. Could it be that you only did a single 'ls'
> > >> on the fuse mount, which did not show the directory? Is it possible
> > >> that this 'ls' triggered a self-heal that repaired the problem,
> > >> whatever it was, and when you did another 'ls' on the fuse mount after
> > >> the 'ls' on the bricks, the directories were there?
> > >>
> > >> The first 'ls' could have healed the files, so the following 'ls' on
> > >> the bricks showed the files as if nothing had been damaged. If that's
> > >> the case, it's possible that there were some disconnections during the
> > >> copy.
> > >>
> > >> Added Pranith because he knows the replication and self-heal details better.
> > >>
> > >> Xavi
> > >>
> > >> On 02/04/2015 07:23 PM, David F. Robinson wrote:
> > >>> Distributed/replicated
> > >>>
> > >>> Volume Name: homegfs
> > >>> Type: Distributed-Replicate
> > >>> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
> > >>> Status: Started
> > >>> Number of Bricks: 4 x 2 = 8
> > >>> Transport-type: tcp
> > >>> Bricks:
> > >>> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
> > >>> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
> > >>> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
> > >>> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
> > >>> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
> > >>> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
> > >>> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
> > >>> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
> > >>> Options Reconfigured:
> > >>> performance.io-thread-count: 32
> > >>> performance.cache-size: 128MB
> > >>> performance.write-behind-window-size: 128MB
> > >>> server.allow-insecure: on
> > >>> network.ping-timeout: 10
> > >>> storage.owner-gid: 100
> > >>> geo-replication.indexing: off
> > >>> geo-replication.ignore-pid-check: on
> > >>> changelog.changelog: on
> > >>> changelog.fsync-interval: 3
> > >>> changelog.rollover-time: 15
> > >>> server.manage-gids: on
> > >>>
> > >>>
> > >>> ------ Original Message ------
> > >>> From: "Xavier Hernandez" <xhernandez at datalab.es>
> > >>> To: "David F. Robinson" <david.robinson at corvidtec.com>; "Benjamin
> > >>> Turner" <bennyturns at gmail.com>
> > >>> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>; "Gluster
> > >>> Devel" <gluster-devel at gluster.org>
> > >>> Sent: 2/4/2015 6:03:45 AM
> > >>> Subject: Re: [Gluster-devel] missing files
> > >>>
> > >>>> On 02/04/2015 01:30 AM, David F. Robinson wrote:
> > >>>>> Sorry. Thought about this a little more. I should have been clearer.
> > >>>>> The files were on both bricks of the replica, not just one side. So,
> > >>>>> both bricks had to have been up... The files/directories just
> > >>>>> don't show
> > >>>>> up on the mount.
> > >>>>> I was reading and saw a related bug
> > >>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it
> > >>>>> suggested to run:
> > >>>>>          find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;
> > >>>>
> > >> This command is specific to a dispersed volume. It won't do anything
> > >> (aside from the error you are seeing) on a replicated volume.
> > >>
> > >> I think you are using a replicated volume, right?
> > >>
> > >> In this case I'm not sure what can be happening. Is your volume a
> > >> pure replicated one or a distributed-replicated one? On a pure
> > >> replicated volume it doesn't make sense that some entries do not
> > >> show up in an 'ls' when the file is in both replicas (at least
> > >> without any error message in the logs). On a distributed-replicated
> > >> volume it could be caused by some problem while combining the
> > >> contents of each replica set.
> > >>>>
> > >>>> What's the configuration of your volume ?
> > >>>>
> > >>>> Xavi
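[Editorial note: for a replicated volume like David's, the rough equivalent of the trusted.ec.heal query is to dump the AFR changelog xattrs on the bricks instead. This is a hedged sketch against a live cluster, not a runnable demo: the brick path and subdirectory are taken from this thread, and it must be run as root on a brick server.]

```shell
# On a replicated volume, pending-heal state lives in the trusted.afr.*
# xattrs on each brick, not in trusted.ec.heal. Brick path below is an
# example from this thread; adjust to your layout.
find /data/brick01a/homegfs/wks_backup -exec \
    getfattr -h -d -m 'trusted.afr' -e hex {} \;

# A non-zero changelog value marks a pending heal. The same information
# is summarised, from any server in the pool, by:
gluster volume heal homegfs info
```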
> > >>>>
> > >>>>>
> > >>>>> I get a bunch of errors for operation not supported:
> > >>>>> [root at gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n
> > >>>>> trusted.ec.heal {} \;
> > >>>>> find: warning: the -d option is deprecated; please use -depth
> > >>>>> instead,
> > >>>>> because the latter is a POSIX-compliant feature.
> > >>>>> wks_backup/homer_backup/backup: trusted.ec.heal: Operation not
> > >>>>> supported
> > >>>>> wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal:
> > >>>>> Operation
> > >>>>> not supported
> > >>>>> wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal:
> > >>>>> Operation
> > >>>>> not supported
> > >>>>> wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal:
> > >>>>> Operation
> > >>>>> not supported
> > >>>>> wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal:
> > >>>>> Operation
> > >>>>> not supported
> > >>>>> wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal:
> > >>>>> Operation
> > >>>>> not supported
> > >>>>> wks_backup/homer_backup/logs: trusted.ec.heal: Operation not
> > >>>>> supported
> > >>>>> wks_backup/homer_backup: trusted.ec.heal: Operation not supported
> > >>>>> ------ Original Message ------
> > >>>>> From: "Benjamin Turner" <bennyturns at gmail.com>
> > >>>>> To: "David F. Robinson" <david.robinson at corvidtec.com>
> > >>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>;
> > >>>>> "gluster-users at gluster.org" <gluster-users at gluster.org>
> > >>>>> Sent: 2/3/2015 7:12:34 PM
> > >>>>> Subject: Re: [Gluster-devel] missing files
> > >>>>>> It sounds to me like the files were only copied to one replica,
> > >>>>>> weren't there for the initial ls, which triggered a self heal,
> > >>>>>> and were there for the last ls because they were healed. Is there
> > >>>>>> any chance that one of the replicas was down during the rsync? It
> > >>>>>> could be that you lost a brick during the copy or something like
> > >>>>>> that. To confirm, I would look for disconnects in the brick logs
> > >>>>>> as well as check glustershd.log to verify the missing files were
> > >>>>>> actually healed.
> > >>>>>>
> > >>>>>> -b
> > >>>>>>
> > >>>>>> On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
> > >>>>>> <david.robinson at corvidtec.com> wrote:
> > >>>>>>
> > >>>>>>     I rsync'd 20 TB over to my gluster system and noticed that I
> > >>>>>>     had some directories missing even though the rsync completed
> > >>>>>>     normally.  The rsync logs showed that the missing files were
> > >>>>>>     transferred.  I went to the bricks and did an 'ls -al
> > >>>>>>     /data/brick*/homegfs/dir/*', and the files were on the bricks.
> > >>>>>>     After I did this 'ls', the files then showed up on the FUSE
> > >>>>>>     mounts.
> > >>>>>>     1) Why are the files hidden on the fuse mount?
> > >>>>>>     2) Why does the ls make them show up on the FUSE mount?
> > >>>>>>     3) How can I prevent this from happening again?
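[Editorial note: the answers elsewhere in this thread boil down to lookups triggering self-heal, which is why an 'ls' makes files reappear. The classic way to force this volume-wide, without touching the bricks, is to stat everything through the mount; the mount path /mnt/homegfs below is an assumption. Demonstrated here against a scratch directory so the pipeline itself can run anywhere.]

```shell
# On the real system the command would be run against the FUSE mount,
# e.g. (hypothetical mount path):
#
#   find /mnt/homegfs -noleaf -print0 | xargs --null stat > /dev/null
#
# Each stat issues a lookup, which lets AFR notice and heal entries that
# exist on only one replica. Demo against a throwaway directory:
d=$(mktemp -d)
touch "$d/a" "$d/b"
find "$d" -noleaf -print0 | xargs --null stat > /dev/null && echo "lookups issued"
rm -rf "$d"
```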
> > >>>>>>     Note, I also mounted the gluster volume using NFS and saw the
> > >>>>>> same
> > >>>>>>     behavior. The files/directories were not shown until I did the
> > >>>>>>     "ls" on the bricks.
> > >>>>>>     David
> > >>>>>>     ===============================
> > >>>>>>     David F. Robinson, Ph.D.
> > >>>>>>     President - Corvid Technologies
> > >>>>>>     704.799.6944 x101 [office]
> > >>>>>>     704.252.1310 [cell]
> > >>>>>>     704.799.7974 [fax]
> > >>>>>>     David.Robinson at corvidtec.com
> > >>>>>>     http://www.corvidtechnologies.com
> > >>>>>>
> > >>>>>>     _______________________________________________
> > >>>>>>     Gluster-devel mailing list
> > >>>>>>     Gluster-devel at gluster.org
> > >>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>
> > >
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users at gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-users
> > 
> > 
> 

