[Gluster-users] missing files

Jeremy Enos jenos at ncsa.uiuc.edu
Tue Nov 24 01:39:09 UTC 2009


I have another clue to report:
So I have my export directory as:
/export
Mounted as:
/scratch

If I do "ls -lR /scratch", it's supposed to synchronize all files and
metadata, right?  Well, it doesn't seem to be doing that.
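(Editor's note: in GlusterFS 2.x, replicate self-heal is triggered by
the lookup on each individual path, and a plain readdir may not visit
entries that only one subvolume knows about.  The walk that is usually
suggested stats every path explicitly.  A sketch only, with /scratch
assumed as the mount point:)

```shell
#!/bin/sh
# Sketch: force a lookup (and hence a self-heal check) on every path
# under the mount by stat'ing each entry explicitly, rather than
# relying on the readdir that "ls -lR" performs.
MOUNT=/scratch   # assumed mount point; adjust for your system
[ -d "$MOUNT" ] && find "$MOUNT" -print0 | xargs -0 stat >/dev/null
```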

I have approx. 100 files in one problematic folder.  Only 50 show up
to ls, that is, until I list one specifically by name.  The missing
files also don't show up in the export directory until they are ls'd
by name in /scratch.

ls /scratch/file* # results in files 1-49 being listed
ls /export/file*  # same result as above
ls /export/file50.dat  #  no such file or directory
ls /scratch/file50.dat  # lists file as if nothing was ever wrong
ls /export/file50.dat  # shows up now after specific ls call in /scratch
ls /scratch/file*  # results in files 1-50 being listed now  (magic?)
ls /export/file*  # also results in files 1-50 being listed now

I'm considering doing a:
for n in `seq 51 100` ; do ls /scratch/file$n.dat ; done
just to recover the files.  However, I'm delaying that so I can keep
some files in the problematic state in case someone has additional
debugging steps for me.  Don't get me wrong: I appreciate any help I
can get with a free product like this.  But I'm actually surprised
that a report like this seems to be hitting a dead end on this list.
Isn't this alarming behavior?  Somehow the filesystem got into a state
where files were still recorded but weren't visible until explicitly
listed.  That should tell us something, but I'm no expert here.
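(Editor's note: the recovery loop above can be made a little more
robust.  Since listing a file by name is what brings it back, stat'ing
each expected name through the mount should recover the whole batch
and report anything that still fails to resolve.  A sketch; the file
names and mount point are assumptions taken from the example above:)

```shell
#!/bin/sh
# Sketch: stat each expected file through the gluster mount so the
# per-file lookup triggers self-heal, and report any name that still
# fails to resolve afterwards.
# Note: seq takes two arguments ("seq 51 100"), not a range ("seq 51-100").
for n in $(seq 51 100); do
    stat "/scratch/file$n.dat" >/dev/null 2>&1 \
        || echo "still missing: file$n.dat"
done
```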
thx-

    Jeremy

Jeremy Enos wrote:
> Can anyone tell me if there's hope of recovering data here?  Steps to
> take?  Anything?  Is something wrong with my configuration?  (raid1 over
> raid0)  If I don't have a clue what went wrong, why, or how to
> recover, then even formatting and starting fresh doesn't give me much
> hope for future reliability.
> thx-
>
>     Jeremy
>
> Jeremy Enos wrote:
>   
>> plain text send...
>>
>> Jeremy Enos wrote:
>>     
>>> What kind of tweaking and tampering was necessary to recover the lost
>>> data?
>>>
>>>     Jeremy
>>>
>>> My configuration:
>>> Oh yes- of course- don't know why I left this out.  Version and
>>> config files follow.
>>>
>>> [jenos@ac glusterfs]$ rpm -qa | grep gluster
>>> glusterfs-common-2.0.7-1.fc10.x86_64
>>> glusterfs-client-2.0.7-1.fc10.x86_64
>>>
>>>
>>> [jenos@ac glusterfs]$ cat glusterfs.vol
>>> #-----------IB remotes------------------
>>> volume remote1
>>>   type protocol/client
>>>   option transport-type ib-verbs/client
>>>   option remote-host ac11
>>>   option remote-subvolume ibstripe
>>> end-volume
>>>
>>> volume remote2
>>>   type protocol/client
>>>   option transport-type ib-verbs/client
>>>   option remote-host ac12
>>>   option remote-subvolume ibstripe
>>> end-volume
>>>
>>> volume remote3
>>>   type protocol/client
>>>   option transport-type ib-verbs/client
>>>   option remote-host ac13
>>>   option remote-subvolume ibstripe
>>> end-volume
>>>
>>> volume remote4
>>>   type protocol/client
>>>   option transport-type ib-verbs/client
>>>   option remote-host ac14
>>>   option remote-subvolume ibstripe
>>> end-volume
>>>
>>> volume remote5
>>>   type protocol/client
>>>   option transport-type ib-verbs/client
>>>   option remote-host ac15
>>>   option remote-subvolume ibstripe
>>> end-volume
>>>
>>> volume remote6
>>>   type protocol/client
>>>   option transport-type ib-verbs/client
>>>   option remote-host ac16
>>>   option remote-subvolume ibstripe
>>> end-volume
>>>
>>> volume remote7
>>>   type protocol/client
>>>   option transport-type ib-verbs/client
>>>   option remote-host ac17
>>>   option remote-subvolume ibstripe
>>> end-volume
>>>
>>> volume remote8
>>>   type protocol/client
>>>   option transport-type ib-verbs/client
>>>   option remote-host ac18
>>>   option remote-subvolume ibstripe
>>> end-volume
>>>
>>> volume remote9
>>>   type protocol/client
>>>   option transport-type ib-verbs/client
>>>   option remote-host ac19
>>>   option remote-subvolume ibstripe
>>> end-volume
>>>
>>> volume remote10
>>>   type protocol/client
>>>   option transport-type ib-verbs/client
>>>   option remote-host ac20
>>>   option remote-subvolume ibstripe
>>> end-volume
>>>
>>> #----------Stripe and Replicate------------------
>>>
>>> volume stripe1
>>>   type cluster/stripe
>>>   option block-size 1MB
>>>   subvolumes remote1 remote2 remote3 remote4 remote5
>>> end-volume
>>>
>>> volume stripe2
>>>   type cluster/stripe
>>>   option block-size 1MB
>>>   subvolumes remote6 remote7 remote8 remote9 remote10
>>> end-volume
>>>
>>> volume replicate
>>>   type cluster/replicate
>>>   option metadata-self-heal on
>>>   subvolumes stripe1 stripe2
>>> end-volume
>>>
>>> #------------Performance Options-------------------
>>>
>>> volume readahead
>>>   type performance/read-ahead
>>>   option page-count 4           # 2 is default option
>>>   option force-atime-update off # default is off
>>>   subvolumes replicate
>>> end-volume
>>>
>>> volume writebehind
>>>   type performance/write-behind
>>>   option cache-size 1MB
>>>   subvolumes readahead
>>> end-volume
>>>
>>> volume cache
>>>   type performance/io-cache
>>>   option cache-size 1GB
>>>   subvolumes writebehind
>>> end-volume
>>>
>>> [jenos@ac glusterfs]$ cat glusterfsd.vol
>>> volume posix
>>>   type storage/posix
>>>   option directory /export
>>> end-volume
>>>
>>> volume locks
>>>   type features/locks
>>>   subvolumes posix
>>> end-volume
>>>
>>> volume ibstripe
>>>   type performance/io-threads
>>>   option thread-count 4
>>>   subvolumes locks
>>> end-volume
>>>
>>> volume server-ib
>>>   type protocol/server
>>>   option transport-type ib-verbs/server
>>>   option auth.addr.ibstripe.allow *
>>>   subvolumes ibstripe
>>> end-volume
>>>
>>> volume server-tcp
>>>   type protocol/server
>>>   option transport-type tcp/server
>>>   option auth.addr.ibstripe.allow *
>>>   subvolumes ibstripe
>>> end-volume
>>>
>>> [jenos@ac glusterfs]$
>>>
>>>
>>>
>>> Krzysztof Strasburger wrote:
>>>       
>>>> On Wed, Nov 04, 2009 at 01:31:30AM -0600, Jeremy Enos wrote:
>>>>  
>>>>         
>>>>> Hi-
>>>>> I've got a problem where certain batches of files written out to
>>>>> gluster have disappeared.  Also, newly created files sometimes
>>>>> don't show up to ls unless they are explicitly specified to ls and
>>>>> other tools.
>>>>>
>>>>> In my export folder, everything appears fine.
>>>>> I have found that when I touch the missing file in gluster, it
>>>>> comes back, shows a file size, but appears empty.  I've tried
>>>>> umounting, restarting all glusterfsds, remounting, and it stayed
>>>>> the same.  Also, this problem did not show up immediately after
>>>>> setting up the filesystem, at least during basic tests.  Any ideas?
>>>>>     
>>>>>           
>>>> What is your configuration? I experienced similar problems with unify
>>>> after a disk crash. The namespace (replicated) was not rebuilt
>>>> correctly
>>>> after replacing the failing unit and I had to add some files manually
>>>> (OK, using a script, but an intervention was needed). No data loss,
>>>> only a bit of tweaking and tampering ;).
>>>> Krzysztof
>>>>         

