[Gluster-devel] [Gluster-users] missing files

Fri Feb 6 02:05:43 UTC 2015

Correct!  I have seen (back in the day; it's been 3ish years since I have
seen it) having, say, 50+ volumes, each with a geo-rep session, take system
load levels to the point where pings couldn't be serviced within the ping
timeout.  So it is known to happen, but there has been a lot of work in the
geo-rep space to help here, some of which is discussed:

https://medium.com/@msvbhat/distributed-geo-replication-in-glusterfs-ec95f4393c50

(think tar + ssh and other fixes).  Your symptoms remind me of that case of
50+ geo-rep'd volumes; that's why I mentioned it from the start.  My current
shoot-from-the-hip theory is that when rsyncing all that data the servers got
too busy to service the pings, and it led to disconnects.  This is common
across all of the clustering / distributed software I have worked on: if
the system gets too busy to service heartbeats within the timeout, things go
crazy (think fork bomb on a single host).  Now this could be a case of me
putting symptoms from an old issue into what you are describing, but that's
where my head is at.  If I'm correct I should be able to repro using a
similar workload.  I think that the multi-threaded epoll changes that
_just_ landed in master will help resolve this, but they are so new I
haven't been able to test this.  I'll know more when I get a chance to test
tomorrow.
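
For anyone who wants to check the disconnect / self-heal correlation
discussed further down the thread, here is a minimal sketch in Python.  It
is not a gluster tool, just an illustration: the function names and the
5-minute window are my own assumptions, and it assumes each log entry is on
a single line in the format quoted below (timestamp in square brackets,
"disconnecting connection from" in the brick log, "Completed ... selfheal
on <gfid>" in glustershd.log).

```python
import re
from datetime import datetime, timedelta

# Timestamp as it appears in the quoted logs: [2015-02-03 20:54:02.772180]
TS = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\]")
# Completed heal entry, e.g. "Completed entry selfheal on <gfid>."
HEAL = re.compile(r"Completed (\w+) selfheal on ([0-9a-f-]{36})")


def parse_ts(line):
    """Return the first bracketed timestamp on the line, or None."""
    m = TS.search(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")


def disconnects(brick_lines):
    """Timestamps of client disconnects found in brick log lines."""
    times = []
    for line in brick_lines:
        if "disconnecting connection from" in line:
            t = parse_ts(line)
            if t is not None:
                times.append(t)
    return times


def heals(shd_lines):
    """(timestamp, heal type, gfid) for each completed self-heal."""
    found = []
    for line in shd_lines:
        m = HEAL.search(line)
        t = parse_ts(line)
        if m is not None and t is not None:
            found.append((t, m.group(1), m.group(2)))
    return found


def heals_after(disc_times, healed, window=timedelta(minutes=5)):
    """Map each disconnect to the heals completed within `window` after it.

    The window length is an arbitrary assumption; adjust to taste.
    """
    return {
        d: [h for h in healed if d <= h[0] <= d + window]
        for d in disc_times
    }
```

To use it, read the brick log and /var/log/glusterfs/glustershd.log into
lists of lines and pass them to disconnects() and heals().  A disconnect
followed by a batch of heals is consistent with the theory, but it does not
prove it.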

-b

On Thu, Feb 5, 2015 at 6:04 PM, David F. Robinson <
david.robinson at corvidtec.com> wrote:

> Isn't rsync what geo-rep uses?
>
> David  (Sent from mobile)
>
> ===============================
> David F. Robinson, Ph.D.
> President - Corvid Technologies
> 704.799.6944 x101 [office]
> 704.252.1310      [cell]
> 704.799.7974      [fax]
> David.Robinson at corvidtec.com
> http://www.corvidtechnologies.com
>
> > On Feb 5, 2015, at 5:41 PM, Ben Turner <bturner at redhat.com> wrote:
> >
> > ----- Original Message -----
> >> From: "Ben Turner" <bturner at redhat.com>
> >> To: "David F. Robinson" <david.robinson at corvidtec.com>
> >> Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "Xavier Hernandez"
> >> <xhernandez at datalab.es>, "Benjamin Turner" <bennyturns at gmail.com>,
> >> gluster-users at gluster.org, "Gluster Devel" <gluster-devel at gluster.org>
> >> Sent: Thursday, February 5, 2015 5:22:26 PM
> >> Subject: Re: [Gluster-users] [Gluster-devel] missing files
> >>
> >> ----- Original Message -----
> >>> From: "David F. Robinson" <david.robinson at corvidtec.com>
> >>> To: "Ben Turner" <bturner at redhat.com>
> >>> Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "Xavier Hernandez"
> >>> <xhernandez at datalab.es>, "Benjamin Turner" <bennyturns at gmail.com>,
> >>> gluster-users at gluster.org, "Gluster Devel" <gluster-devel at gluster.org>
> >>> Sent: Thursday, February 5, 2015 5:01:13 PM
> >>> Subject: Re: [Gluster-users] [Gluster-devel] missing files
> >>>
> >>> I'll send you the emails I sent Pranith with the logs. What causes
> >>> these disconnects?
> >>
> >> Thanks David!  Disconnects happen when there are interruptions in
> >> communication between peers; normally there is a ping timeout that
> >> happens.  It could be anything from a flaky NW to the system being too
> >> busy to respond to the pings.  My initial take is more towards the
> >> latter, as rsync is absolutely the worst use case for gluster - IIRC it
> >> writes in 4KB blocks.  I try to keep my writes at least 64KB, as in my
> >> testing that is the smallest block size I can write with before perf
> >> starts to really drop off.  I'll try something similar in the lab.
> >
> > Ok I do think that the file being self healed is RCA for what you were
> > seeing.  Let's look at one of the disconnects:
> >
> > data-brick02a-homegfs.log:[2015-02-03 20:54:02.772180] I
> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection
> from
> gfs01b.corvidtec.com-4175-2015/02/02-16:44:31:179119-homegfs-client-2-0-1
> >
> > And in the glustershd.log from the gfs01b_glustershd.log file:
> >
> > [2015-02-03 20:55:48.001797] I
> [afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0:
> performing entry selfheal on 6c79a368-edaa-432b-bef9-ec690ab42448
> > [2015-02-03 20:55:49.341996] I
> [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-0:
> Completed entry selfheal on 6c79a368-edaa-432b-bef9-ec690ab42448. source=1
> sinks=0
> > [2015-02-03 20:55:49.343093] I
> [afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0:
> performing entry selfheal on 792cb0d6-9290-4447-8cd7-2b2d7a116a69
> > [2015-02-03 20:55:50.463652] I
> [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-0:
> Completed entry selfheal on 792cb0d6-9290-4447-8cd7-2b2d7a116a69. source=1
> sinks=0
> > [2015-02-03 20:55:51.465289] I
> [afr-self-heal-metadata.c:54:__afr_selfheal_metadata_do]
> 0-homegfs-replicate-0: performing metadata selfheal on
> 403e661a-1c27-4e79-9867-c0572aba2b3c
> > [2015-02-03 20:55:51.466515] I
> [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-0:
> Completed metadata selfheal on 403e661a-1c27-4e79-9867-c0572aba2b3c.
> source=1 sinks=0
> > [2015-02-03 20:55:51.467098] I
> [afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0:
> performing entry selfheal on 403e661a-1c27-4e79-9867-c0572aba2b3c
> > [2015-02-03 20:55:55.257808] I
> [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-0:
> Completed entry selfheal on 403e661a-1c27-4e79-9867-c0572aba2b3c. source=1
> sinks=0
> > [2015-02-03 20:55:55.258548] I
> [afr-self-heal-metadata.c:54:__afr_selfheal_metadata_do]
> 0-homegfs-replicate-0: performing metadata selfheal on
> c612ee2f-2fb4-4157-a9ab-5a2d5603c541
> > [2015-02-03 20:55:55.259367] I
> [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-0:
> Completed metadata selfheal on c612ee2f-2fb4-4157-a9ab-5a2d5603c541.
> source=1 sinks=0
> > [2015-02-03 20:55:55.259980] I
> [afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0:
> performing entry selfheal on c612ee2f-2fb4-4157-a9ab-5a2d5603c541
> >
> > As you can see, the self heal logs are just spammed with files being
> > healed, and I looked at a couple of disconnects and I see self heals
> > getting run shortly after on the bricks that were down.  Now we need to
> > find the cause of the disconnects; I am thinking once the disconnects are
> > resolved the files should be properly copied over without SH having to fix
> > things.  Like I said, I'll give this a go on my lab systems and see if I
> > can repro the disconnects; I'll have time to run through it tomorrow.  If
> > in the meantime anyone else has a theory / anything to add here it would
> > be appreciated.
> >
> > -b
> >
> >> -b
> >>
> >>> David  (Sent from mobile)
> >>>
> >>>
> >>>> On Feb 5, 2015, at 4:55 PM, Ben Turner <bturner at redhat.com> wrote:
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> >>>>> To: "Xavier Hernandez" <xhernandez at datalab.es>, "David F. Robinson"
> >>>>> <david.robinson at corvidtec.com>, "Benjamin Turner"
> >>>>> <bennyturns at gmail.com>
> >>>>> Cc: gluster-users at gluster.org, "Gluster Devel"
> >>>>> <gluster-devel at gluster.org>
> >>>>> Sent: Thursday, February 5, 2015 5:30:04 AM
> >>>>> Subject: Re: [Gluster-users] [Gluster-devel] missing files
> >>>>>
> >>>>>
> >>>>>> On 02/05/2015 03:48 PM, Pranith Kumar Karampuri wrote:
> >>>>>> I believe David already fixed this. I hope this is the same
> >>>>>> permissions issue he told us about.
> >>>>> Oops, it is not. I will take a look.
> >>>>
> >>>> Yes David exactly like these:
> >>>>
> >>>> data-brick02a-homegfs.log:[2015-02-03 19:09:34.568842] I
> >>>> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting
> >>>> connection from
> >>>>
> gfs02a.corvidtec.com-18563-2015/02/03-19:07:58:519134-homegfs-client-2-0-0
> >>>> data-brick02a-homegfs.log:[2015-02-03 19:09:41.286551] I
> >>>> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting
> >>>> connection from
> >>>>
> gfs01a.corvidtec.com-12804-2015/02/03-19:09:38:497808-homegfs-client-2-0-0
> >>>> data-brick02a-homegfs.log:[2015-02-03 19:16:35.906412] I
> >>>> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting
> >>>> connection from
> >>>>
> gfs02b.corvidtec.com-27190-2015/02/03-19:15:53:458467-homegfs-client-2-0-0
> >>>> data-brick02a-homegfs.log:[2015-02-03 19:51:22.761293] I
> >>>> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting
> >>>> connection from
> >>>>
> gfs01a.corvidtec.com-25926-2015/02/03-19:51:02:89070-homegfs-client-2-0-0
> >>>> data-brick02a-homegfs.log:[2015-02-03 20:54:02.772180] I
> >>>> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting
> >>>> connection from
> >>>>
> gfs01b.corvidtec.com-4175-2015/02/02-16:44:31:179119-homegfs-client-2-0-1
> >>>>
> >>>> You can 100% verify my theory if you can correlate the time on the
> >>>> disconnects to the time that the missing files were healed.  Can you
> >>>> have a look at /var/log/glusterfs/glustershd.log?  That has all of the
> >>>> healed files + timestamps; if we can see a disconnect during the rsync
> >>>> and a self heal of the missing file I think we can safely assume that
> >>>> the disconnects may have caused this.  I'll try this on my test
> >>>> systems; how much data did you rsync?  What size-ish of files / an idea
> >>>> of the dir layout?
> >>>>
> >>>> @Pranith - Could bricks flapping up and down during the rsync be a
> >>>> possible cause here?  That is: the files were missing on the first ls
> >>>> (written to 1 subvol but not the other because it was down), the ls
> >>>> triggered SH, and that's why the files were there for the second ls.
> >>>>
> >>>> -b
> >>>>
> >>>>
> >>>>> Pranith
> >>>>>>
> >>>>>> Pranith
> >>>>>>> On 02/05/2015 03:44 PM, Xavier Hernandez wrote:
> >>>>>>> Is the failure repeatable ? with the same directories ?
> >>>>>>>
> >>>>>>> It's very weird that the directories appear on the volume when you
> >>>>>>> do an 'ls' on the bricks. Could it be that you only made a single
> >>>>>>> 'ls' on the fuse mount, which did not show the directory? Is it
> >>>>>>> possible that this 'ls' triggered a self-heal that repaired the
> >>>>>>> problem, whatever it was, and when you did another 'ls' on the fuse
> >>>>>>> mount after the 'ls' on the bricks, the directories were there?
> >>>>>>>
> >>>>>>> The first 'ls' could have healed the files, so that the following
> >>>>>>> 'ls' on the bricks showed the files as if nothing were damaged. If
> >>>>>>> that's the case, it's possible that there were some disconnections
> >>>>>>> during the copy.
> >>>>>>>
> >>>>>>> Added Pranith because he knows the replication and self-heal
> >>>>>>> details better.
> >>>>>>>
> >>>>>>> Xavi
> >>>>>>>
> >>>>>>>> On 02/04/2015 07:23 PM, David F. Robinson wrote:
> >>>>>>>> Distributed/replicated
> >>>>>>>>
> >>>>>>>> Volume Name: homegfs
> >>>>>>>> Type: Distributed-Replicate
> >>>>>>>> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
> >>>>>>>> Status: Started
> >>>>>>>> Number of Bricks: 4 x 2 = 8
> >>>>>>>> Transport-type: tcp
> >>>>>>>> Bricks:
> >>>>>>>> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
> >>>>>>>> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
> >>>>>>>> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
> >>>>>>>> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
> >>>>>>>> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
> >>>>>>>> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
> >>>>>>>> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
> >>>>>>>> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
> >>>>>>>> Options Reconfigured:
> >>>>>>>> performance.io-thread-count: 32
> >>>>>>>> performance.cache-size: 128MB
> >>>>>>>> performance.write-behind-window-size: 128MB
> >>>>>>>> server.allow-insecure: on
> >>>>>>>> network.ping-timeout: 10
> >>>>>>>> storage.owner-gid: 100
> >>>>>>>> geo-replication.indexing: off
> >>>>>>>> geo-replication.ignore-pid-check: on
> >>>>>>>> changelog.changelog: on
> >>>>>>>> changelog.fsync-interval: 3
> >>>>>>>> changelog.rollover-time: 15
> >>>>>>>> server.manage-gids: on
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ------ Original Message ------
> >>>>>>>> From: "Xavier Hernandez" <xhernandez at datalab.es>
> >>>>>>>> To: "David F. Robinson" <david.robinson at corvidtec.com>; "Benjamin
> >>>>>>>> Turner" <bennyturns at gmail.com>
> >>>>>>>> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>;
> >>>>>>>> "Gluster Devel" <gluster-devel at gluster.org>
> >>>>>>>> Sent: 2/4/2015 6:03:45 AM
> >>>>>>>> Subject: Re: [Gluster-devel] missing files
> >>>>>>>>
> >>>>>>>>>> On 02/04/2015 01:30 AM, David F. Robinson wrote:
> >>>>>>>>>> Sorry. Thought about this a little more. I should have been
> >>>>>>>>>> clearer. The files were on both bricks of the replica, not just
> >>>>>>>>>> one side. So, both bricks had to have been up... The
> >>>>>>>>>> files/directories just don't show up on the mount.
> >>>>>>>>>> I was reading and saw a related bug
> >>>>>>>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it
> >>>>>>>>>> suggested to run:
> >>>>>>>>>>        find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;
> >>>>>>>>>
> >>>>>>>>> This command is specific to a dispersed volume. It won't do
> >>>>>>>>> anything (aside from the error you are seeing) on a replicated
> >>>>>>>>> volume.
> >>>>>>>>>
> >>>>>>>>> I think you are using a replicated volume, right?
> >>>>>>>>>
> >>>>>>>>> In this case I'm not sure what can be happening. Is your volume a
> >>>>>>>>> pure replicated one or a distributed-replicated one? On a pure
> >>>>>>>>> replicated volume it doesn't make sense that some entries do not
> >>>>>>>>> show in an 'ls' when the file is in both replicas (at least
> >>>>>>>>> without any error message in the logs). On a
> >>>>>>>>> distributed-replicated volume it could be caused by some problem
> >>>>>>>>> while combining the contents of each replica set.
> >>>>>>>>>
> >>>>>>>>> What's the configuration of your volume?
> >>>>>>>>>
> >>>>>>>>> Xavi
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> I get a bunch of errors for operation not supported:
> >>>>>>>>>> [root at gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n trusted.ec.heal {} \;
> >>>>>>>>>> find: warning: the -d option is deprecated; please use -depth instead, because the latter is a POSIX-compliant feature.
> >>>>>>>>>> wks_backup/homer_backup/backup: trusted.ec.heal: Operation not supported
> >>>>>>>>>> wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal: Operation not supported
> >>>>>>>>>> wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal: Operation not supported
> >>>>>>>>>> wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal: Operation not supported
> >>>>>>>>>> wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal: Operation not supported
> >>>>>>>>>> wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal: Operation not supported
> >>>>>>>>>> wks_backup/homer_backup/logs: trusted.ec.heal: Operation not supported
> >>>>>>>>>> wks_backup/homer_backup: trusted.ec.heal: Operation not supported
> >>>>>>>>>> ------ Original Message ------
> >>>>>>>>>> From: "Benjamin Turner" <bennyturns at gmail.com>
> >>>>>>>>>> To: "David F. Robinson" <david.robinson at corvidtec.com>
> >>>>>>>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>;
> >>>>>>>>>> "gluster-users at gluster.org" <gluster-users at gluster.org>
> >>>>>>>>>> Sent: 2/3/2015 7:12:34 PM
> >>>>>>>>>> Subject: Re: [Gluster-devel] missing files
> >>>>>>>>>>> It sounds to me like the files were only copied to one replica,
> >>>>>>>>>>> weren't there for the initial ls, which triggered a self heal,
> >>>>>>>>>>> and were there for the last ls because they were healed. Is
> >>>>>>>>>>> there any chance that one of the replicas was down during the
> >>>>>>>>>>> rsync? It could be that you lost a brick during the copy or
> >>>>>>>>>>> something like that. To confirm, I would look for disconnects in
> >>>>>>>>>>> the brick logs as well as checking glustershd.log to verify the
> >>>>>>>>>>> missing files were actually healed.
> >>>>>>>>>>>
> >>>>>>>>>>> -b
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
> >>>>>>>>>>> <david.robinson at corvidtec.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>   I rsync'd 20-TB over to my gluster system and noticed that I
> >>>>>>>>>>>   had some directories missing even though the rsync completed
> >>>>>>>>>>>   normally. The rsync logs showed that the missing files were
> >>>>>>>>>>>   transferred.
> >>>>>>>>>>>   I went to the bricks and did an 'ls -al
> >>>>>>>>>>>   /data/brick*/homegfs/dir/*'; the files were on the bricks.
> >>>>>>>>>>>   After I did this 'ls', the files then showed up on the FUSE
> >>>>>>>>>>>   mounts.
> >>>>>>>>>>>   1) Why are the files hidden on the fuse mount?
> >>>>>>>>>>>   2) Why does the ls make them show up on the FUSE mount?
> >>>>>>>>>>>   3) How can I prevent this from happening again?
> >>>>>>>>>>>   Note, I also mounted the gluster volume using NFS and saw the
> >>>>>>>>>>>   same behavior. The files/directories were not shown until I
> >>>>>>>>>>>   did the "ls" on the bricks.
> >>>>>>>>>>>   David
> >>>>>>>>>>>   ===============================
> >>>>>>>>>>>   David F. Robinson, Ph.D.
> >>>>>>>>>>>   President - Corvid Technologies
> >>>>>>>>>>>   704.799.6944 x101 [office]
> >>>>>>>>>>>   704.252.1310 [cell]
> >>>>>>>>>>>   704.799.7974 [fax]
> >>>>>>>>>>>   David.Robinson at corvidtec.com
> >>>>>>>>>>>   http://www.corvidtechnologies.com
> >>>>>>>>>>>
> >>>>>>>>>>>   _______________________________________________
> >>>>>>>>>>>   Gluster-devel mailing list
> >>>>>>>>>>>   Gluster-devel at gluster.org
> >>>>>>>>>>>   http://www.gluster.org/mailman/listinfo/gluster-devel
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Gluster-users mailing list
> >>>>>> Gluster-users at gluster.org
> >>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
> >>>>>
> >>
>