[Gluster-users] Gluster 3.6.9 missing files during remove migration operations

Ravishankar N ravishankar at redhat.com
Fri Apr 29 01:31:39 UTC 2016


3.6.9 does not contain all the fixes needed to trigger auto-heal when 
modifying the replica count using the replace-brick/add-brick commands.
For replace-brick, you might want to try out the manual steps mentioned 
in the "Replacing brick in Replicate/Distributed Replicate volumes" 
section of [1].
For add-brick, the steps mentioned by Anuradha in [2] should work.
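
For reference, a minimal sketch of triggering and monitoring the heal by
hand once the brick has been replaced or added (the volume name "testvol"
is a placeholder; [1] and [2] have the complete procedures):

    # 3.6.9 may not start self-heal automatically after the brick change,
    # so kick off a full heal and watch the pending entries drain
    gluster volume heal testvol full
    gluster volume heal testvol info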

HTH,
Ravi

[1] 
http://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick
[2] https://www.gluster.org/pipermail/gluster-users/2016-January/025083.html


On 04/29/2016 01:51 AM, Bernard Gardner wrote:
> Further to this, I've continued my testing and discovered that during 
> the same type of migration operation (add-brick followed by 
> remove-brick in a replica=2 config), shell wildcard expansion 
> sometimes returns multiple instances of the same filename - so it 
> seems that the namespace of a FUSE-mounted filesystem is somewhat 
> mutable while a brick removal is in progress. This behavior is 
> intermittent, but occurs often enough that I'd call it reproducible.
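>
> For what it's worth, the check itself is just shell globbing; a minimal
> illustration (the glob assumes the /mnt layout described below):
>
>     # any output here means the same name came back twice in one expansion
>     printf '%s\n' /mnt/*/*/*/* | sort | uniq -d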
>
> Does anyone have any feedback on my earlier question about whether 
> this is expected behavior or a bug?
>
> Thanks,
> Bernard.
>
> On 20 April 2016 at 19:55, Bernard Gardner <bernard at sprybts.com> wrote:
>
>     Hi,
>
>     I'm running Gluster 3.6.9 on Ubuntu 14.04 on a single test server
>     (under Vagrant and VirtualBox) with 4 filesystems in addition to
>     the root: 2 are XFS directly on the disk, and the other 2 are XFS
>     on LVM. The scenario I'm testing is the migration of our
>     production Gluster deployment onto LVM so that we can use the
>     snapshot features in 3.6 to implement offline backups.
>
>     On my test machine, I configured a volume with replica 2 and 2
>     bricks (both bricks on the same server). I then started the
>     volume, mounted it back onto the same server under /mnt, and
>     populated /mnt with a 3-level-deep hierarchy of 16 directories
>     per level, adding 10 files of 1kB to each leaf directory. So
>     there are 40960 files in the filesystem (16x16x16x10), named
>     like a/b/c/abc.0.
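>
>     For reference, a rough sketch of the setup (the volume name,
>     server name, and brick paths are placeholders for my actual
>     layout, and the loop is illustrative rather than the exact
>     script I used):
>
>         gluster volume create testvol replica 2 \
>             server1:/bricks/disk1/brick server1:/bricks/disk2/brick force
>         gluster volume start testvol
>         mount -t glusterfs server1:/testvol /mnt
>
>         # 16x16x16 directories with 10 x 1kB files in each leaf
>         for a in $(seq 0 15); do
>           for b in $(seq 0 15); do
>             for c in $(seq 0 15); do
>               mkdir -p /mnt/$a/$b/$c
>               for f in $(seq 0 9); do
>                 dd if=/dev/zero of=/mnt/$a/$b/$c/$a$b$c.$f bs=1k count=1
>               done
>             done
>           done
>         done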
>
>     For my first test, I did a "replace-brick commit force" to swap
>     the first brick in my config with a new brick on one of the
>     XFS-on-LVM filesystems. This resulted in the /mnt filesystem
>     appearing empty until I manually started a full heal on the
>     volume, after which the files and directories started to
>     re-appear on the mounted filesystem. After the heal completed
>     everything looked OK, but that's not going to work for our
>     production systems. This appeared to be what
>     https://www.gluster.org/pipermail/gluster-users/2012-October/011502.html
>     suggests for a replicated volume.
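>
>     Roughly, what I ran was the following (the volume name and brick
>     paths are placeholders for my actual layout):
>
>         gluster volume replace-brick testvol \
>             server1:/bricks/disk1/brick server1:/bricks/lvm1/brick \
>             commit force
>         # /mnt looked empty at this point until I started the heal
>         gluster volume heal testvol full
>         gluster volume heal testvol info    # repeated until nothing was pending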
>
>     For my second attempt, I rebuilt the test system from scratch,
>     built and mounted the gluster volume the same way, and populated
>     it with the same test file layout. I then did a volume add-brick
>     and added both of the XFS-on-LVM filesystems to the
>     configuration. The directory tree was copied to the new bricks,
>     but no files were moved. I then did a volume remove-brick on the
>     2 initial bricks, and the system started migrating the files to
>     the new filesystems. This looked more promising, but during the
>     migration I ran "find /mnt -type f | wc -l" a number of times,
>     and on one of those checks the number of files was 39280 instead
>     of 40960. I wasn't able to observe exactly which files were
>     missing; when I ran the command again immediately, and on every
>     other check during the migration, it reported 40960 files.
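>
>     Again roughly, with placeholder names standing in for my actual
>     volume and bricks:
>
>         gluster volume add-brick testvol \
>             server1:/bricks/lvm1/brick server1:/bricks/lvm2/brick
>         gluster volume remove-brick testvol \
>             server1:/bricks/disk1/brick server1:/bricks/disk2/brick start
>         gluster volume remove-brick testvol \
>             server1:/bricks/disk1/brick server1:/bricks/disk2/brick status
>
>         # checked repeatedly while the migration ran; this is where
>         # the count briefly dipped to 39280
>         find /mnt -type f | wc -l
>
>         # and once the status showed the migration as completed
>         gluster volume remove-brick testvol \
>             server1:/bricks/disk1/brick server1:/bricks/disk2/brick commit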
>
>     Is this expected behavior, or have I stumbled on a bug?
>
>     Is there a better workflow for completing this migration?
>
>     The production system runs in AWS and has 6 gluster servers
>     across 2 availability zones, each with a 600GB brick on an EBS
>     volume, configured into a single 1.8TB volume with replication
>     across the availability zones. We are planning to create the new
>     volumes with about 10% headroom left in the LVM config for
>     holding snapshots, and are hoping we can implement a backup
>     solution by taking a gluster snapshot followed by an EBS snapshot
>     to get a consistent point-in-time offline backup (and then
>     deleting the gluster snapshot once the EBS snapshot has been
>     taken). I haven't yet figured out the details of how we would
>     restore from the snapshots (I can test that scenario once I have
>     a working local test migration procedure and can migrate our test
>     environment in AWS to support snapshots).
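>
>     The backup step I have in mind would look something like this
>     (the snapshot name, volume name, and EBS volume id are
>     placeholders, and I haven't worked through the restore side yet):
>
>         gluster snapshot create nightly_backup testvol
>         aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
>             --description "gluster offline backup"
>         # once the EBS snapshot has completed:
>         gluster snapshot delete nightly_backup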
>
>     Thanks,
>     Bernard.
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

