[Gluster-users] Removing subvolume from dist/rep volume
Nithya Balachandran
nbalacha at redhat.com
Tue Jul 2 06:55:02 UTC 2019
Hi Dave,
Yes, files in split-brain are not migrated, as we cannot determine which
copy is the good one. Adding Ravi to look at this and see what can be done.
Also adding Krutika as this is a sharded volume.
The files with the "---------T" permissions are internal files and can be
ignored. Ravi and Krutika, please take a look at the other files.
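For reference, the "---------T" entries are typically DHT's internal link
files. If you want to confirm one, a rough check from the brick (the path
below is taken from your saruman listing; the xattr is the one DHT uses for
link files) might be:

  # run on saruman, against the brick path, not the mount
  getfattr -n trusted.glusterfs.dht.linkto -e text \
      /var/local/brick0/data/.shard/4cd094f4-0344-4660-98b0-83249d5bd659.22998

If that xattr is present, the file is just a pointer and can be ignored.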
Regards,
Nithya
On Fri, 28 Jun 2019 at 19:56, Dave Sherohman <dave at sherohman.org> wrote:
> On Thu, Jun 27, 2019 at 12:17:10PM +0530, Nithya Balachandran wrote:
> > There are some edge cases that may prevent a file from being migrated
> > during a remove-brick. Please do the following after this:
> >
> > 1. Check the remove-brick status for any failures. If there are any,
> > check the rebalance log file for errors.
> > 2. Even if there are no failures, check the removed bricks to see if
> > any files have not been migrated. If there are any, please check that
> > they are valid files on the brick and copy them to the volume from the
> > brick to the mount point.
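To make those two checks concrete, a rough sketch, assuming the volume is
palantir and the removed brick is saruman:/var/local/brick0/data (the brick
list and the mount point /mnt/palantir below are placeholders):

  # 1. look for failures in the remove-brick status output
  gluster volume remove-brick palantir <bricks-being-removed> status

  # 2. look for regular files left behind on the removed brick, skipping
  #    gluster's internal .glusterfs directory
  find /var/local/brick0/data -path '*/.glusterfs' -prune -o -type f -print

  # any valid leftover file can then be copied back in via the mount, e.g.
  #   cp <file-on-brick> /mnt/palantir/<its original path>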
>
> Well, looks like I hit one of those edge cases. Probably because of
> some issues around a reboot last September which left a handful of files
> in a state where self-heal identified them as needing to be healed but
> was unable to actually heal them. (Check the list archives for
> "Kicking a stuck heal", posted on Sept 4, if you want more details.)
>
> So I'm getting 9 failures on the arbiter (merlin), 8 on one data brick
> (gandalf), and 3 on the other (saruman). Looking in
> /var/log/gluster/palantir-rebalance.log, I see those numbers of
>
> migrate file failed: /.shard/291e9749-2d1b-47af-ad53-3a09ad4e64c6.229:
> failed to lock file on palantir-replicate-1 [Stale file handle]
>
> errors.
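A quick way to count those in the log, using the path you mention, might be:

  grep -c 'migrate file failed' /var/log/gluster/palantir-rebalance.log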
>
> Also, merlin has four errors, and gandalf has one, of the form:
>
> Gfid mismatch detected for
> <gfid:be318638-e8a0-4c6d-977d-7a937aa84806>/0f500288-ff62-4f0b-9574-53f510b4159f.2898>,
> 9f00c0fe-58c3-457e-a2e6-f6a006d1cfc6 on palantir-client-7 and
> 08bb7cdc-172b-4c21-916a-2a244c095a3e on palantir-client-1.
>
> There are no gfid mismatches recorded on saruman. All of the gfid
> mismatches are for <gfid:be318638-e8a0-4c6d-977d-7a937aa84806> and (on
> saruman) appear to correspond to 0-byte files (e.g.,
> .shard/0f500288-ff62-4f0b-9574-53f510b4159f.2898, in the case of the
> gfid mismatch quoted above).
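To see a mismatch like that directly, one option is to compare the gfid
xattr of the same shard on each brick of palantir-replicate-1 (only the
saruman path below is from your listing; the paths on the other bricks will
differ and are assumptions):

  # run on each brick holding a copy of the shard
  getfattr -n trusted.gfid -e hex \
      /var/local/brick0/data/.shard/0f500288-ff62-4f0b-9574-53f510b4159f.2898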
>
> For both types of errors, all affected files are in .shard/ and have
> UUID-style names, so I have no idea which actual files they belong to.
> File sizes are generally either 0 bytes or 4M (exactly), although one of
> them has a size slightly larger than 3M. So I'm assuming they're chunks
> of larger files (which would be almost all the files on the volume -
> it's primarily holding disk image files for kvm servers).
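On mapping shards back to files: the UUID prefix of a shard's name is
normally the gfid of the base file it belongs to, and on whichever brick
holds that base file, .glusterfs/<first 2 hex chars>/<next 2>/<full gfid>
is a hard link to it. A sketch for recovering the real name (the brick path
and gfid below are from your listing; the find is just an illustration):

  # shard 291e9749-2d1b-47af-ad53-3a09ad4e64c6.229 -> base file with that gfid
  ls -i /var/local/brick0/data/.glusterfs/29/1e/291e9749-2d1b-47af-ad53-3a09ad4e64c6
  find /var/local/brick0/data -inum <inode-from-above> -not -path '*/.glusterfs/*'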
>
> Web searches generally seem to consider gfid mismatches to be a form of
> split-brain, but `gluster volume heal palantir info split-brain` shows
> "Number of entries in split-brain: 0" for all bricks, including those
> bricks which are reporting gfid mismatches.
>
>
> Given all that, how do I proceed with cleaning up the stale handle
> issues? I would guess that this will involve somehow converting the
> shard filename to a "real" filename, then shutting down the
> corresponding VM and maybe doing some additional cleanup.
>
> And then there's the gfid mismatches. Since they're for 0-byte files,
> is it safe to just ignore them on the assumption that they only hold
> metadata? Or do I need to do some kind of split-brain resolution on
> them (even though gluster says no files are in split-brain)?
>
>
> Finally, a listing of /var/local/brick0/data/.shard on saruman, in case
> any of the information it contains (like file sizes/permissions) might
> provide clues to resolving the errors:
>
> --- cut here ---
> root at saruman:/var/local/brick0/data/.shard# ls -l
> total 63996
> -rw-rw---- 2 root libvirt-qemu 0 Sep 17 2018 0f500288-ff62-4f0b-9574-53f510b4159f.2864
> -rw-rw---- 2 root libvirt-qemu 0 Sep 17 2018 0f500288-ff62-4f0b-9574-53f510b4159f.2868
> -rw-rw---- 2 root libvirt-qemu 0 Sep 17 2018 0f500288-ff62-4f0b-9574-53f510b4159f.2879
> -rw-rw---- 2 root libvirt-qemu 0 Sep 17 2018 0f500288-ff62-4f0b-9574-53f510b4159f.2898
> -rw------- 2 root libvirt-qemu 4194304 May 17 14:42 291e9749-2d1b-47af-ad53-3a09ad4e64c6.229
> -rw------- 2 root libvirt-qemu 4194304 Jun 24 09:10 291e9749-2d1b-47af-ad53-3a09ad4e64c6.925
> -rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 26 12:54 2df12cb0-6cf4-44ae-8b0a-4a554791187e.266
> -rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 26 16:30 2df12cb0-6cf4-44ae-8b0a-4a554791187e.820
> -rw-r--r-- 2 root libvirt-qemu 4194304 Jun 17 20:22 323186b1-6296-4cbe-8275-b940cc9d65cf.27466
> -rw-r--r-- 2 root libvirt-qemu 4194304 Jun 27 05:01 323186b1-6296-4cbe-8275-b940cc9d65cf.32575
> -rw-r--r-- 2 root libvirt-qemu 3145728 Jun 11 13:23 323186b1-6296-4cbe-8275-b940cc9d65cf.3448
> ---------T 2 root libvirt-qemu 0 Jun 28 14:26 4cd094f4-0344-4660-98b0-83249d5bd659.22998
> -rw------- 2 root libvirt-qemu 4194304 Mar 13 2018 6cdd2e5c-f49e-492b-8039-239e71577836.1302
> ---------T 2 root libvirt-qemu 0 Jun 28 13:22 7530a2d1-d6ec-4a04-95a2-da1f337ac1ad.47131
> ---------T 2 root libvirt-qemu 0 Jun 28 13:22 7530a2d1-d6ec-4a04-95a2-da1f337ac1ad.52615
> -rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 27 08:56 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.100
> -rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 27 11:29 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.106
> -rw-rw-r-- 2 root libvirt-qemu 4194304 Jun 28 02:35 8fefae99-ed2a-4a8f-ab87-aa94c6bb6e68.137
> -rw-rw-r-- 2 root libvirt-qemu 4194304 Nov 4 2018 9544617c-901c-4613-a94b-ccfad4e38af1.165
> -rw-rw-r-- 2 root libvirt-qemu 4194304 Nov 4 2018 9544617c-901c-4613-a94b-ccfad4e38af1.168
> -rw-rw-r-- 2 root libvirt-qemu 4194304 Nov 5 2018 9544617c-901c-4613-a94b-ccfad4e38af1.193
> -rw-rw-r-- 2 root libvirt-qemu 4194304 Nov 6 2018 9544617c-901c-4613-a94b-ccfad4e38af1.3800
> ---------T 2 root libvirt-qemu 0 Jun 28 15:02 b48a5934-5e5b-4918-8193-6ff36f685f70.46559
> -rw-rw---- 2 root libvirt-qemu 0 Oct 12 2018 c5bde2f2-3361-4d1a-9c88-28751ef74ce6.3568
> -rw-r--r-- 2 root libvirt-qemu 4194304 Apr 13 2018 c953c676-152d-4826-80ff-bd307fa7f6e5.10724
> -rw-r--r-- 2 root libvirt-qemu 4194304 Apr 11 2018 c953c676-152d-4826-80ff-bd307fa7f6e5.3101
> --- cut here ---
>
> --
> Dave Sherohman