[Gluster-devel] [Gluster-users] missing files
Joe Julian
joe at julianfamily.org
Thu Feb 5 23:01:09 UTC 2015
Out of curiosity, are you using --inplace?
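If you're not, it may be worth trying: by default rsync writes each file to a temporary name and then renames it into place, so the volume sees an extra create and rename for every file, while --inplace writes straight into the destination file. A hypothetical invocation (the paths are placeholders, and the -B value is only something to experiment with, not a recommendation):

    # not from this thread - just a sketch of the flags being discussed
    rsync -av --inplace --block-size=131072 /local/source/ /mnt/homegfs/dest/

Keep in mind that -B/--block-size sets the block size of rsync's delta-transfer algorithm; whether a larger value helps on gluster is something I'd measure rather than assume.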
On 02/05/2015 02:59 PM, David F. Robinson wrote:
> Should I run my rsync with --block-size set to something other than the default? Is there an optimal value? I think 128k is the max from my quick search. Didn't dig into it thoroughly though.
>
> David (Sent from mobile)
>
> ===============================
> David F. Robinson, Ph.D.
> President - Corvid Technologies
> 704.799.6944 x101 [office]
> 704.252.1310 [cell]
> 704.799.7974 [fax]
> David.Robinson at corvidtec.com
> http://www.corvidtechnologies.com
>
>> On Feb 5, 2015, at 5:41 PM, Ben Turner <bturner at redhat.com> wrote:
>>
>> ----- Original Message -----
>>> From: "Ben Turner" <bturner at redhat.com>
>>> To: "David F. Robinson" <david.robinson at corvidtec.com>
>>> Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "Xavier Hernandez" <xhernandez at datalab.es>, "Benjamin Turner"
>>> <bennyturns at gmail.com>, gluster-users at gluster.org, "Gluster Devel" <gluster-devel at gluster.org>
>>> Sent: Thursday, February 5, 2015 5:22:26 PM
>>> Subject: Re: [Gluster-users] [Gluster-devel] missing files
>>>
>>> ----- Original Message -----
>>>> From: "David F. Robinson" <david.robinson at corvidtec.com>
>>>> To: "Ben Turner" <bturner at redhat.com>
>>>> Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "Xavier Hernandez"
>>>> <xhernandez at datalab.es>, "Benjamin Turner"
>>>> <bennyturns at gmail.com>, gluster-users at gluster.org, "Gluster Devel"
>>>> <gluster-devel at gluster.org>
>>>> Sent: Thursday, February 5, 2015 5:01:13 PM
>>>> Subject: Re: [Gluster-users] [Gluster-devel] missing files
>>>>
>>>> I'll send you the emails I sent Pranith with the logs. What causes these
>>>> disconnects?
>>> Thanks David! Disconnects happen when there is an interruption in
>>> communication between peers, normally because a ping timeout expires.
>>> It could be anything from a flaky network to a system too busy to respond
>>> to the pings. My initial take leans more towards the latter, as rsync is
>>> absolutely the worst use case for gluster - IIRC it writes in 4kb blocks. I
>>> try to keep my writes at least 64KB, since in my testing that is the smallest
>>> block size I can write with before perf starts to really drop off. I'll try
>>> something similar in the lab.
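>>> If you want to find that knee on your own setup, a quick-and-dirty comparison
>>> like the following is roughly what I do (the mount point is a placeholder;
>>> conv=fsync keeps the numbers honest):
>>>
>>>     # ~1GiB written at each block size, directly on the fuse mount
>>>     dd if=/dev/zero of=/mnt/homegfs/ddtest.4k   bs=4k   count=262144 conv=fsync
>>>     dd if=/dev/zero of=/mnt/homegfs/ddtest.64k  bs=64k  count=16384  conv=fsync
>>>     dd if=/dev/zero of=/mnt/homegfs/ddtest.128k bs=128k count=8192   conv=fsync
>>>     rm -f /mnt/homegfs/ddtest.*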
>> OK, I do think the files being self-healed is the root cause of what you were seeing. Let's look at one of the disconnects:
>>
>> data-brick02a-homegfs.log:[2015-02-03 20:54:02.772180] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01b.corvidtec.com-4175-2015/02/02-16:44:31:179119-homegfs-client-2-0-1
>>
>> And in the self-heal daemon log from gfs01b (gfs01b_glustershd.log):
>>
>> [2015-02-03 20:55:48.001797] I [afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0: performing entry selfheal on 6c79a368-edaa-432b-bef9-ec690ab42448
>> [2015-02-03 20:55:49.341996] I [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-0: Completed entry selfheal on 6c79a368-edaa-432b-bef9-ec690ab42448. source=1 sinks=0
>> [2015-02-03 20:55:49.343093] I [afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0: performing entry selfheal on 792cb0d6-9290-4447-8cd7-2b2d7a116a69
>> [2015-02-03 20:55:50.463652] I [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-0: Completed entry selfheal on 792cb0d6-9290-4447-8cd7-2b2d7a116a69. source=1 sinks=0
>> [2015-02-03 20:55:51.465289] I [afr-self-heal-metadata.c:54:__afr_selfheal_metadata_do] 0-homegfs-replicate-0: performing metadata selfheal on 403e661a-1c27-4e79-9867-c0572aba2b3c
>> [2015-02-03 20:55:51.466515] I [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-0: Completed metadata selfheal on 403e661a-1c27-4e79-9867-c0572aba2b3c. source=1 sinks=0
>> [2015-02-03 20:55:51.467098] I [afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0: performing entry selfheal on 403e661a-1c27-4e79-9867-c0572aba2b3c
>> [2015-02-03 20:55:55.257808] I [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-0: Completed entry selfheal on 403e661a-1c27-4e79-9867-c0572aba2b3c. source=1 sinks=0
>> [2015-02-03 20:55:55.258548] I [afr-self-heal-metadata.c:54:__afr_selfheal_metadata_do] 0-homegfs-replicate-0: performing metadata selfheal on c612ee2f-2fb4-4157-a9ab-5a2d5603c541
>> [2015-02-03 20:55:55.259367] I [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-0: Completed metadata selfheal on c612ee2f-2fb4-4157-a9ab-5a2d5603c541. source=1 sinks=0
>> [2015-02-03 20:55:55.259980] I [afr-self-heal-entry.c:554:afr_selfheal_entry_do] 0-homegfs-replicate-0: performing entry selfheal on c612ee2f-2fb4-4157-a9ab-5a2d5603c541
>>
>> As you can see, the self-heal logs are spammed with files being healed, and for the couple of disconnects I looked at I see self-heals getting run shortly after on the bricks that were down. Now we need to find the cause of the disconnects; once the disconnects are resolved, the files should be copied over properly without self-heal having to fix things. Like I said, I'll give this a go on my lab systems and see if I can repro the disconnects; I'll have time to run through it tomorrow. If in the meantime anyone else has a theory or anything to add, it would be appreciated.
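>> If anyone wants to run the same correlation on their own logs, this is roughly
>> all I did (the log names are the ones from David's setup; adjust the globs to
>> match yours):
>>
>>     # client disconnects as seen by the bricks
>>     grep "disconnecting connection" /var/log/glusterfs/bricks/data-brick*-homegfs.log
>>
>>     # heals completed by the self-heal daemon
>>     grep "Completed .* selfheal" /var/log/glusterfs/glustershd.log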
>>
>> -b
>>
>>> -b
>>>
>>>> David (Sent from mobile)
>>>>
>>>> ===============================
>>>> David F. Robinson, Ph.D.
>>>> President - Corvid Technologies
>>>> 704.799.6944 x101 [office]
>>>> 704.252.1310 [cell]
>>>> 704.799.7974 [fax]
>>>> David.Robinson at corvidtec.com
>>>> http://www.corvidtechnologies.com
>>>>
>>>>> On Feb 5, 2015, at 4:55 PM, Ben Turner <bturner at redhat.com> wrote:
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>> To: "Xavier Hernandez" <xhernandez at datalab.es>, "David F. Robinson"
>>>>>> <david.robinson at corvidtec.com>, "Benjamin Turner"
>>>>>> <bennyturns at gmail.com>
>>>>>> Cc: gluster-users at gluster.org, "Gluster Devel"
>>>>>> <gluster-devel at gluster.org>
>>>>>> Sent: Thursday, February 5, 2015 5:30:04 AM
>>>>>> Subject: Re: [Gluster-users] [Gluster-devel] missing files
>>>>>>
>>>>>>
>>>>>>> On 02/05/2015 03:48 PM, Pranith Kumar Karampuri wrote:
>>>>>>> I believe David already fixed this. I hope this is the same issue he
>>>>>>> told about permissions issue.
>>>>>> Oops, it is not. I will take a look.
>>>>> Yes David, exactly like these:
>>>>>
>>>>> data-brick02a-homegfs.log:[2015-02-03 19:09:34.568842] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs02a.corvidtec.com-18563-2015/02/03-19:07:58:519134-homegfs-client-2-0-0
>>>>> data-brick02a-homegfs.log:[2015-02-03 19:09:41.286551] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-12804-2015/02/03-19:09:38:497808-homegfs-client-2-0-0
>>>>> data-brick02a-homegfs.log:[2015-02-03 19:16:35.906412] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs02b.corvidtec.com-27190-2015/02/03-19:15:53:458467-homegfs-client-2-0-0
>>>>> data-brick02a-homegfs.log:[2015-02-03 19:51:22.761293] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-25926-2015/02/03-19:51:02:89070-homegfs-client-2-0-0
>>>>> data-brick02a-homegfs.log:[2015-02-03 20:54:02.772180] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01b.corvidtec.com-4175-2015/02/02-16:44:31:179119-homegfs-client-2-0-1
>>>>>
>>>>> You can 100% verify my theory if you can correlate the times of the
>>>>> disconnects with the times that the missing files were healed. Can you have
>>>>> a look at /var/log/glusterfs/glustershd.log? That has all of the healed
>>>>> files + timestamps. If we can see a disconnect during the rsync and a self
>>>>> heal of the missing file, I think we can safely assume that the disconnects
>>>>> may have caused this. I'll try this on my test systems too - how much data
>>>>> did you rsync? Roughly what size of files, and an idea of the dir layout?
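>>>>> If it's easier than describing it, the output of something like this run
>>>>> against the source tree would give me a rough picture (the path is a
>>>>> placeholder):
>>>>>
>>>>>     # file count, total size, and average file size of the data you rsync'd
>>>>>     find /path/to/source -type f -printf '%s\n' | \
>>>>>         awk '{ n++; s += $1 } END { printf "%d files, %.1f GB, avg %.0f KB\n", n, s/1e9, s/n/1024 }'
>>>>>     # and a rough idea of the directory layout
>>>>>     find /path/to/source -type d | wc -l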
>>>>>
>>>>> @Pranith - Could this be what happened here: bricks flapping up and down
>>>>> during the rsync caused the files to be missing on the first ls (written to
>>>>> one subvol but not the other because it was down), the ls triggered
>>>>> self-heal, and that's why the files were there for the second ls?
>>>>>
>>>>> -b
>>>>>
>>>>>
>>>>>> Pranith
>>>>>>> Pranith
>>>>>>>> On 02/05/2015 03:44 PM, Xavier Hernandez wrote:
>>>>>>>> Is the failure repeatable? With the same directories?
>>>>>>>>
>>>>>>>> It's very weird that the directories appear on the volume when you do
>>>>>>>> an 'ls' on the bricks. Could it be that you only did a single 'ls'
>>>>>>>> on the fuse mount, which did not show the directory? Is it possible that
>>>>>>>> this 'ls' triggered a self-heal that repaired the problem, whatever
>>>>>>>> it was, and when you did another 'ls' on the fuse mount after the
>>>>>>>> 'ls' on the bricks, the directories were there?
>>>>>>>>
>>>>>>>> The first 'ls' could have healed the files, so that the
>>>>>>>> following 'ls' on the bricks showed the files as if nothing were
>>>>>>>> damaged. If that's the case, it's possible that there were some
>>>>>>>> disconnections during the copy.
>>>>>>>>
>>>>>>>> Added Pranith because he knows better replication and self-heal
>>>>>>>> details.
>>>>>>>>
>>>>>>>> Xavi
>>>>>>>>
>>>>>>>>> On 02/04/2015 07:23 PM, David F. Robinson wrote:
>>>>>>>>> Distributed/replicated
>>>>>>>>>
>>>>>>>>> Volume Name: homegfs
>>>>>>>>> Type: Distributed-Replicate
>>>>>>>>> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
>>>>>>>>> Status: Started
>>>>>>>>> Number of Bricks: 4 x 2 = 8
>>>>>>>>> Transport-type: tcp
>>>>>>>>> Bricks:
>>>>>>>>> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
>>>>>>>>> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
>>>>>>>>> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
>>>>>>>>> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
>>>>>>>>> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
>>>>>>>>> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
>>>>>>>>> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
>>>>>>>>> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
>>>>>>>>> Options Reconfigured:
>>>>>>>>> performance.io-thread-count: 32
>>>>>>>>> performance.cache-size: 128MB
>>>>>>>>> performance.write-behind-window-size: 128MB
>>>>>>>>> server.allow-insecure: on
>>>>>>>>> network.ping-timeout: 10
>>>>>>>>> storage.owner-gid: 100
>>>>>>>>> geo-replication.indexing: off
>>>>>>>>> geo-replication.ignore-pid-check: on
>>>>>>>>> changelog.changelog: on
>>>>>>>>> changelog.fsync-interval: 3
>>>>>>>>> changelog.rollover-time: 15
>>>>>>>>> server.manage-gids: on
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ------ Original Message ------
>>>>>>>>> From: "Xavier Hernandez" <xhernandez at datalab.es>
>>>>>>>>> To: "David F. Robinson" <david.robinson at corvidtec.com>; "Benjamin
>>>>>>>>> Turner" <bennyturns at gmail.com>
>>>>>>>>> Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>; "Gluster
>>>>>>>>> Devel" <gluster-devel at gluster.org>
>>>>>>>>> Sent: 2/4/2015 6:03:45 AM
>>>>>>>>> Subject: Re: [Gluster-devel] missing files
>>>>>>>>>
>>>>>>>>>>> On 02/04/2015 01:30 AM, David F. Robinson wrote:
>>>>>>>>>>> Sorry. Thought about this a little more. I should have been clearer.
>>>>>>>>>>> The files were on both bricks of the replica, not just one side. So,
>>>>>>>>>>> both bricks had to have been up... The files/directories just don't
>>>>>>>>>>> show up on the mount.
>>>>>>>>>>> I was reading and saw a related bug
>>>>>>>>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it
>>>>>>>>>>> suggested to run:
>>>>>>>>>>> find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;
>>>>>>>>>> This command is specific to a dispersed volume. It won't do
>>>>>>>>>> anything (aside from the error you are seeing) on a replicated volume.
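>>>>>>>>>> On a replicated volume the heal state is tracked in the trusted.afr.*
>>>>>>>>>> xattrs instead, so a rough equivalent would be something like this
>>>>>>>>>> (volume name and file path are placeholders):
>>>>>>>>>>
>>>>>>>>>>     # entries the self-heal daemon still considers pending
>>>>>>>>>>     gluster volume heal <volname> info
>>>>>>>>>>
>>>>>>>>>>     # or inspect the AFR changelog xattrs of a file directly on a brick
>>>>>>>>>>     getfattr -d -m trusted.afr -e hex /path/on/brick/to/file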
>>>>>>>>>>
>>>>>>>>>> I think you are using a replicated volume, right?
>>>>>>>>>>
>>>>>>>>>> In this case I'm not sure what can be happening. Is your volume a
>>>>>>>>>> pure replicated one or a distributed-replicated one? On a pure replicated
>>>>>>>>>> volume it doesn't make sense that some entries do not show in an 'ls' when
>>>>>>>>>> the file is in both replicas (at least without any error message in the
>>>>>>>>>> logs). On a distributed-replicated volume it could be caused by some
>>>>>>>>>> problem while combining the contents of each replica set.
>>>>>>>>>>
>>>>>>>>>> What's the configuration of your volume?
>>>>>>>>>>
>>>>>>>>>> Xavi
>>>>>>>>>>
>>>>>>>>>>> I get a bunch of errors for operation not supported:
>>>>>>>>>>> [root at gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n trusted.ec.heal {} \;
>>>>>>>>>>> find: warning: the -d option is deprecated; please use -depth instead, because the latter is a POSIX-compliant feature.
>>>>>>>>>>> wks_backup/homer_backup/backup: trusted.ec.heal: Operation not supported
>>>>>>>>>>> wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal: Operation not supported
>>>>>>>>>>> wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal: Operation not supported
>>>>>>>>>>> wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal: Operation not supported
>>>>>>>>>>> wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal: Operation not supported
>>>>>>>>>>> wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal: Operation not supported
>>>>>>>>>>> wks_backup/homer_backup/logs: trusted.ec.heal: Operation not supported
>>>>>>>>>>> wks_backup/homer_backup: trusted.ec.heal: Operation not supported
>>>>>>>>>>> ------ Original Message ------
>>>>>>>>>>> From: "Benjamin Turner" <bennyturns at gmail.com>
>>>>>>>>>>> To: "David F. Robinson" <david.robinson at corvidtec.com>
>>>>>>>>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>; "gluster-users at gluster.org" <gluster-users at gluster.org>
>>>>>>>>>>> Sent: 2/3/2015 7:12:34 PM
>>>>>>>>>>> Subject: Re: [Gluster-devel] missing files
>>>>>>>>>>>> It sounds to me like the files were only copied to one replica, weren't
>>>>>>>>>>>> there for the initial ls which triggered a self heal, and were there for
>>>>>>>>>>>> the last ls because they were healed. Is there any chance that one of the
>>>>>>>>>>>> replicas was down during the rsync? It could be that you lost a brick
>>>>>>>>>>>> during the copy or something like that. To confirm I would look for
>>>>>>>>>>>> disconnects in the brick logs as well as check glustershd.log to verify
>>>>>>>>>>>> the missing files were actually healed.
>>>>>>>>>>>>
>>>>>>>>>>>> -b
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
>>>>>>>>>>>> <david.robinson at corvidtec.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I rsync'd 20 TB over to my gluster system and noticed that I had
>>>>>>>>>>>> some directories missing even though the rsync completed normally.
>>>>>>>>>>>> The rsync logs showed that the missing files were transferred.
>>>>>>>>>>>> I went to the bricks and did an 'ls -al /data/brick*/homegfs/dir/*'
>>>>>>>>>>>> and the files were on the bricks. After I did this 'ls', the files
>>>>>>>>>>>> then showed up on the FUSE mounts.
>>>>>>>>>>>> 1) Why are the files hidden on the FUSE mount?
>>>>>>>>>>>> 2) Why does the ls make them show up on the FUSE mount?
>>>>>>>>>>>> 3) How can I prevent this from happening again?
>>>>>>>>>>>> Note, I also mounted the gluster volume using NFS and saw the same
>>>>>>>>>>>> behavior. The files/directories were not shown until I did the
>>>>>>>>>>>> "ls" on the bricks.
>>>>>>>>>>>> David
>>>>>>>>>>>> ===============================
>>>>>>>>>>>> David F. Robinson, Ph.D.
>>>>>>>>>>>> President - Corvid Technologies
>>>>>>>>>>>> 704.799.6944 x101 [office]
>>>>>>>>>>>> 704.252.1310 [cell]
>>>>>>>>>>>> 704.799.7974 [fax]
>>>>>>>>>>>> David.Robinson at corvidtec.com
>>>>>>>>>>>> http://www.corvidtechnologies.com
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Gluster-devel mailing list
>>>>>>>>>>>> Gluster-devel at gluster.org
>>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Gluster-devel mailing list
>>>>>>>>>>> Gluster-devel at gluster.org
>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel