[Gluster-users] Migrating data from a failing filesystem

james.bellinger at icecube.wisc.edu
Wed Sep 24 18:14:45 UTC 2014


> On 09/24/2014 07:35 PM, james.bellinger at icecube.wisc.edu wrote:
>> Thanks for the info!
>> I issued the remove-brick start and, of course, the brick went
>> read-only in less than an hour.
>> This morning I checked the status twice, a couple of minutes apart,
>> and found:
>>
>>       Node Rebalanced-files       size     scanned      failures        status
>> ----------      -----------   --------   ---------   -----------   -----------
>> gfs-node04             6634   590.7GB       81799         14868   in progress
>> ...
>> gfs-node04             6669   596.5GB       86584         15271   in progress
>>
>> I'm not sure exactly what it is doing here: in the couple of minutes
>> between those two checks it scanned another 4785 files, hit 403 more
>> failures, and rebalanced only 35.
> What it is supposed to be doing is scanning all the files in the volume
> and, for the files present on itself, i.e. gfs-node04:/sdb, migrating
> (rebalancing) them onto the other bricks in the volume. Let it run to
> completion. The rebalance log should give you an idea of the 403 failures.

I'll have a look at that.
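Something like this should surface them, assuming the default location
for the rebalance log (the exact path may differ on this install):

   # on gfs-node04; log path is a guess based on gluster defaults,
   # and 'E' is the error severity letter in glusterfs log lines
   sudo grep ' E ' /var/log/glusterfs/scratch-rebalance.log | tail -50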

>>   The used amount on the partition hasn't
>> changed.
> Probably because, after copying the files to the other bricks, the
> unlinks/rmdirs on the source brick are failing since the FS is mounted
> read-only.
>> If anything, the _other_ brick on the server is shrinking!
> Because the data is being copied into this brick as a part of migration?

No, the space used on the read/write brick is decreasing.  The readonly
one isn't changing, of course.
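
For the record, this is roughly how I'm keeping an eye on the per-brick
usage; the 'detail' status subcommand and its field names are from
memory, so plain df on each node is the fallback:

   sudo gluster volume status scratch detail | grep -E 'Brick|Disk Space Free'
   df -h /sda /sdb    # run on each node, against its own bricks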

FWIW, this operation seems to have triggered a failure elsewhere, so I was
a little occupied in getting a filesystem working again.  (I can hardly
wait to remainder this system...)

>> (Which is related to the question I had before that you mention below.)
>>
>> gluster volume remove-brick scratch gfs-node04:/sdb start
> What is your original volume configuration? (gluster vol info scratch)?

$ sudo gluster volume info scratch

Volume Name: scratch
Type: Distribute
Volume ID: de1fbb47-3e5a-45dc-8df8-04f7f73a3ecc
Status: Started
Number of Bricks: 12
Transport-type: tcp,rdma
Bricks:
Brick1: gfs-node01:/sdb
Brick2: gfs-node01:/sdc
Brick3: gfs-node01:/sdd
Brick4: gfs-node03:/sda
Brick5: gfs-node03:/sdb
Brick6: gfs-node03:/sdc
Brick7: gfs-node04:/sda
Brick8: gfs-node04:/sdb
Brick9: gfs-node05:/sdb
Brick10: gfs-node06:/sdb
Brick11: gfs-node06:/sdc
Brick12: gfs-node05:/sdc
Options Reconfigured:
cluster.min-free-disk: 30GB


>> but...
>> df /sda
>> Filesystem           1K-blocks      Used Available Use% Mounted on
>> /dev/sda             12644872688 10672989432 1844930140  86% /sda
>> ...
>> /dev/sda             12644872688 10671453672 1846465900  86% /sda
>>
>> Have I shot myself in the other foot?
>> jim
>>
>>
>>
>>
>>
>>> On 09/23/2014 08:56 PM, james.bellinger at icecube.wisc.edu wrote:
>>>> I inherited a non-replicated gluster system based on antique hardware.
>>>>
>>>> One of the brick filesystems is flaking out, and remounts read-only.
>>>> I
>>>> repair it and remount it, but this is only postponing the inevitable.
>>>>
>>>> How can I migrate files off a failing brick that intermittently turns
>>>> read-only?  I have enough space, thanks to a catastrophic failure on
>>>> another brick; I don't want to present people with another one.  But if
>>>> I understand migration correctly, references have to be deleted, which
>>>> isn't possible if the filesystem turns read-only.
>>> What you could do is initiate the migration with `remove-brick start`
>>> and monitor the progress with `remove-brick status`. Irrespective of
>>> whether the rebalance completes or fails (due to the brick turning
>>> read-only), you can then update the volume configuration with
>>> `remove-brick commit`. If the brick still has files left after that,
>>> mount the gluster volume on that node and copy the remaining files from
>>> the brick to the volume via the mount.  You can then safely rebuild the
>>> array, add a different brick, or do whatever else you need.
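
For my own notes, that sequence as I understand it; the mount point and
the rsync options in the last two lines are my guesses rather than
anything from the docs:

   sudo gluster volume remove-brick scratch gfs-node04:/sdb start
   sudo gluster volume remove-brick scratch gfs-node04:/sdb status
   sudo gluster volume remove-brick scratch gfs-node04:/sdb commit
   # then, on gfs-node04, push whatever is left on the old brick back in
   sudo mount -t glusterfs gfs-node04:/scratch /mnt/scratch
   sudo rsync -a --exclude='.glusterfs' /sdb/ /mnt/scratch/   # skip gluster's internal dir
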
>>>
>>>> What I want to do is migrate the files off, remove it from gluster,
>>>> rebuild the array, rebuild the filesystem, and then add it back as a
>>>> brick.  (Actually what I'd really like is to hear that the students
>>>> are
>>>> all done with the system and I can turn the whole thing off, but
>>>> theses
>>>> aren't complete yet.)
>>>>
>>>> Any advice or words of warning will be appreciated.
>>> Looks like your bricks have been in trouble for over a year now
>>> (http://gluster.org/pipermail/gluster-users.old/2013-September/014319.html).
>>> Better get them fixed sooner rather than later! :-)
>> Oddly enough the old XRAID systems are holding up better than the VTRAK
>> arrays.  That doesn't help me much, though, since they're so small.
>>
>>> HTH,
>>> Ravi
>>>
>>>> James Bellinger
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>
>



