[Gluster-users] Does replace-brick migrate data?

Alan Orth alan.orth at gmail.com
Sat Jun 8 08:25:12 UTC 2019


Thank you, Nithya.

The "missing" directory is indeed present on all bricks. I enabled
client-log-level DEBUG on the volume and then noticed the following in the
FUSE mount log when doing a `stat` on the "missing" directory on the FUSE
mount:

[2019-06-08 08:03:30.240738] D [MSGID: 0]
[dht-common.c:3454:dht_do_fresh_lookup] 0-homes-dht: Calling fresh lookup
for /aorth/data on homes-replicate-2
[2019-06-08 08:03:30.241138] D [MSGID: 0]
[dht-common.c:3013:dht_lookup_cbk] 0-homes-dht: fresh_lookup returned for
/aorth/data with op_ret 0
[2019-06-08 08:03:30.241610] D [MSGID: 0]
[dht-common.c:1354:dht_lookup_dir_cbk] 0-homes-dht: Internal xattr
trusted.glusterfs.dht.mds is not present  on path /aorth/data gfid is
fb87699f-ebf3-4098-977d-85c3a70b849c
[2019-06-08 08:06:18.880961] D [MSGID: 0]
[dht-common.c:1559:dht_revalidate_cbk] 0-homes-dht: revalidate lookup of
/aorth/data returned with op_ret 0
[2019-06-08 08:06:18.880963] D [MSGID: 0]
[dht-common.c:1651:dht_revalidate_cbk] 0-homes-dht: internal xattr
trusted.glusterfs.dht.mds is not present on path /aorth/data gfid is
fb87699f-ebf3-4098-977d-85c3a70b849c
[2019-06-08 08:06:18.880996] D [MSGID: 0]
[dht-common.c:914:dht_common_mark_mdsxattr] 0-homes-dht: internal xattr
trusted.glusterfs.dht.mds is present on subvolon path /aorth/data gfid is
fb87699f-ebf3-4098-977d-85c3a70b849c
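
For reference, this is roughly how I enabled the debug logging above, and how
it can be reset afterwards so the client logs don't grow too large:

# gluster volume set homes diagnostics.client-log-level DEBUG
# gluster volume reset homes diagnostics.client-log-level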

One message says the trusted.glusterfs.dht.mds xattr is not present, then
the next says it is present. Is that relevant? I looked at the xattrs of
that directory on all the bricks and it does seem to be inconsistent (also
the modification times on the directory are different):

[root@wingu0 ~]# getfattr -d -m. -e hex /mnt/gluster/homes/aorth/data
getfattr: Removing leading '/' from absolute path names
# file: mnt/gluster/homes/aorth/data
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.homes-client-3=0x000000000000000200000002
trusted.afr.homes-client-5=0x000000000000000000000000
trusted.gfid=0xfb87699febf34098977d85c3a70b849c
trusted.glusterfs.dht=0xe7c11ff200000000b6dd59efffffffff

[root@wingu3 ~]# getfattr -d -m. -e hex /mnt/gluster/homes/aorth/data
getfattr: Removing leading '/' from absolute path names
# file: mnt/gluster/homes/aorth/data
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.homes-client-0=0x000000000000000000000000
trusted.afr.homes-client-1=0x000000000000000000000000
trusted.gfid=0xfb87699febf34098977d85c3a70b849c
trusted.glusterfs.dht=0xe7c11ff2000000000000000049251e2d
trusted.glusterfs.dht.mds=0x00000000

[root@wingu4 ~]# getfattr -d -m. -e hex /mnt/gluster/homes/aorth/data
getfattr: Removing leading '/' from absolute path names
# file: mnt/gluster/homes/aorth/data
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.homes-client-0=0x000000000000000000000000
trusted.afr.homes-client-1=0x000000000000000000000000
trusted.gfid=0xfb87699febf34098977d85c3a70b849c
trusted.glusterfs.dht=0xe7c11ff2000000000000000049251e2d
trusted.glusterfs.dht.mds=0x00000000

[root@wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/homes/aorth/data
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/sdb/homes/aorth/data
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.homes-client-2=0x000000000000000000000000
trusted.gfid=0xfb87699febf34098977d85c3a70b849c
trusted.glusterfs.dht=0xe7c11ff20000000049251e2eb6dd59ee

[root@wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/homes/aorth/data
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/sdc/homes/aorth/data
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0xfb87699febf34098977d85c3a70b849c
trusted.glusterfs.dht=0xe7c11ff200000000b6dd59efffffffff

[root@wingu06 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/homes/aorth/data
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/sdb/homes/aorth/data
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0xfb87699febf34098977d85c3a70b849c
trusted.glusterfs.dht=0xe7c11ff20000000049251e2eb6dd59ee
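
Decoding the trusted.glusterfs.dht values above (the last eight bytes are the
start and end of each brick's hash range, per Joe Julian's DHT post; the first
eight bytes are, I believe, a header/commit hash), the three distinct layouts
do appear to cover the full 32-bit range with no gaps:

wingu3, wingu4:           start 0x00000000  end 0x49251e2d
wingu05 sdb, wingu06 sdb: start 0x49251e2e  end 0xb6dd59ee
wingu0, wingu05 sdc:      start 0xb6dd59ef  end 0xffffffff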

This is a replica 2 volume on Gluster 5.6.

Thank you,

On Sat, Jun 8, 2019 at 5:28 AM Nithya Balachandran <nbalacha at redhat.com>
wrote:

>
>
> On Sat, 8 Jun 2019 at 01:29, Alan Orth <alan.orth at gmail.com> wrote:
>
>> Dear Ravi,
>>
>> In the last week I have completed a fix-layout and a full INDEX heal on
>> this volume. Now I've started a rebalance and I see a few terabytes of data
>> going around on different bricks since yesterday, which I'm sure is good.
>>
>> While I wait for the rebalance to finish, I'm wondering if you know what
>> would cause directories to be missing from the FUSE mount point? If I list
>> the directories explicitly I can see their contents, but they do not appear
>> in their parent directories' listing. In the case of duplicated files it is
>> always because the files are not on the correct bricks (according to the
>> Dynamo/Elastic Hash algorithm), and I can fix it by copying the file to the
>> correct brick(s) and removing it from the others (along with their
>> .glusterfs hard links). So what could cause directories to be missing?
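>>
>> (For anyone following along: a file's .glusterfs hard link can be located
>> from its gfid, e.g. `getfattr -n trusted.gfid -e hex /path/on/brick/to/file`.
>> With a made-up gfid of 0xabcdef1201234567890123456789abcd the hard link would
>> sit at <brick>/.glusterfs/ab/cd/abcdef12-0123-4567-8901-23456789abcd.)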
>>
> Hi Alan,
>
> The directories that don't show up in the parent directory listing are
> probably missing from the hashed subvol. Please check the
> backend bricks to see if they are missing on any of them.
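>
> For example, something along the lines of `ls -ld
> <brick-root>/path/to/the/missing/directory` on each brick host (with the real
> brick paths substituted) would confirm where it exists.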
>
> Regards,
> Nithya
>
> Thank you,
>>
>> Thank you,
>>
>> On Wed, Jun 5, 2019 at 1:08 AM Alan Orth <alan.orth at gmail.com> wrote:
>>
>>> Hi Ravi,
>>>
>>> You're right that I had mentioned using rsync to copy the brick content
>>> to a new host, but in the end I actually decided not to bring it up on a
>>> new brick. Instead I added the original brick back into the volume. So the
>>> xattrs and symlinks to .glusterfs on the original brick are fine. I think
>>> the problem probably lies with a remove-brick that got interrupted. A few
>>> weeks ago during the maintenance I had tried to remove a brick and then
>>> after twenty minutes and no obvious progress I stopped it—after that the
>>> bricks were still part of the volume.
>>>
>>> In the last few days I have run a fix-layout that took 26 hours and
>>> finished successfully. Then I started a full index heal and it has healed
>>> about 3.3 million files in a few days and I see a clear increase of network
>>> traffic from old brick host to new brick host over that time. Once the full
>>> index heal completes I will try to do a rebalance.
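>>>
>>> For reference, roughly the commands involved here (a plain index heal would
>>> be `gluster volume heal homes` without `full`; `heal ... info` shows
>>> progress, and the rebalance comes last, once healing is done):
>>>
>>> # gluster volume rebalance homes fix-layout start
>>> # gluster volume heal homes full
>>> # gluster volume heal homes info
>>> # gluster volume rebalance homes start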
>>>
>>> Thank you,
>>>
>>>
>>> On Mon, Jun 3, 2019 at 7:40 PM Ravishankar N <ravishankar at redhat.com>
>>> wrote:
>>>
>>>>
>>>> On 01/06/19 9:37 PM, Alan Orth wrote:
>>>>
>>>> Dear Ravi,
>>>>
>>>> The .glusterfs hardlinks/symlinks should be fine. I'm not sure how I
>>>> could verify them for six bricks and millions of files, though... :\
>>>>
>>>> Hi Alan,
>>>>
>>>> The reason I asked this is because you had mentioned in one of your
>>>> earlier emails that when you moved content from the old brick to the new
>>>> one, you had skipped the .glusterfs directory. So I was assuming that when
>>>> you added back this new brick to the cluster, it might have been missing
>>>> the .glusterfs entries. If that is the case, one way to verify could be to
>>>> check using a script if all files on the brick have a link-count of at
>>>> least 2 and all dirs have valid symlinks inside .glusterfs pointing to
>>>> themselves.
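>>>>
>>>> A minimal sketch of such a check, run against a brick root (the path is just
>>>> an example): the first find lists regular files outside .glusterfs that have
>>>> no second hard link, the second lists dangling gfid symlinks.
>>>>
>>>> # find /path/to/brick -path /path/to/brick/.glusterfs -prune -o -type f -links 1 -print
>>>> # find /path/to/brick/.glusterfs -type l -xtype l -print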
>>>>
>>>>
>>>> I had a small success in fixing some issues with duplicated files on
>>>> the FUSE mount point yesterday. I read quite a bit about the elastic
>>>> hashing algorithm that determines which files get placed on which bricks
>>>> based on the hash of their filename and the trusted.glusterfs.dht xattr on
>>>> brick directories (thanks to Joe Julian's blog post and Python script for
>>>> showing how it works¹). With that knowledge I looked closer at one of the
>>>> files that was appearing as duplicated on the FUSE mount and found that it
>>>> was also present on more bricks than its `replica 2` count. For this particular
>>>> file I found two "real" files and several zero-size files with
>>>> trusted.glusterfs.dht.linkto xattrs. Neither of the "real" files were on
>>>> the correct brick as far as the DHT layout is concerned, so I copied one of
>>>> them to the correct brick, deleted the others and their hard links, and did
>>>> a `stat` on the file from the FUSE mount point and it fixed itself. Yay!
>>>>
>>>> Could this have been caused by a replace-brick that got interrupted and
>>>> didn't finish re-labeling the xattrs?
>>>>
>>>> No, replace-brick only initiates AFR self-heal, which just copies the
>>>> contents from the other brick(s) of the *same* replica pair into the
>>>> replaced brick.  The link-to files are created by DHT when you rename a
>>>> file from the client. If the new name hashes to a different  brick, DHT
>>>> does not move the entire file there. It instead creates the link-to file
>>>> (the one with the dht.linkto xattrs) on the hashed subvol. The value of
>>>> this xattr points to the brick where the actual data resides (`getfattr -e
>>>> text` to see it for yourself). Perhaps you had attempted a rebalance or
>>>> remove-brick earlier and interrupted that?
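>>>>
>>>> For example, the dht.linkto value on the licenseserver.cfg copies discussed
>>>> earlier, 0x617070732d7265706c69636174652d3200, is just the string
>>>> "apps-replicate-2" (plus a trailing NUL), i.e. it points to the third
>>>> replica pair of the apps volume; something like
>>>> `getfattr -n trusted.glusterfs.dht.linkto -e text <file-on-brick>` shows it
>>>> in readable form.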
>>>>
>>>> Should I be thinking of some heuristics to identify and fix these
>>>> issues with a script (incorrect brick placement), or is this something a
>>>> fix layout or repeated volume heals can fix? I've already completed a whole
>>>> heal on this particular volume this week and it did heal about 1,000,000
>>>> files (mostly data and metadata, but about 20,000 entry heals as well).
>>>>
>>>> Maybe you should let the AFR self-heals complete first and then attempt
>>>> a full rebalance to take care of the dht link-to files. But  if the files
>>>> are in millions, it could take quite some time to complete.
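>>>> (e.g. wait until `gluster volume heal <volname> info` shows no pending
>>>> entries, then run `gluster volume rebalance <volname> start`.)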
>>>> Regards,
>>>> Ravi
>>>>
>>>> Thanks for your support,
>>>>
>>>> ¹ https://joejulian.name/post/dht-misses-are-expensive/
>>>>
>>>> On Fri, May 31, 2019 at 7:57 AM Ravishankar N <ravishankar at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On 31/05/19 3:20 AM, Alan Orth wrote:
>>>>>
>>>>> Dear Ravi,
>>>>>
>>>>> I spent a bit of time inspecting the xattrs on some files and
>>>>> directories on a few bricks for this volume and it looks a bit messy. Even
>>>>> if I could make sense of it for a few and potentially heal them manually,
>>>>> there are millions of files and directories in total so that's definitely
>>>>> not a scalable solution. After a few missteps with `replace-brick ...
>>>>> commit force` in the last week—one of which on a brick that was
>>>>> dead/offline—as well as some premature `remove-brick` commands, I'm unsure
>>>>> how to proceed and I'm getting demotivated. It's scary how quickly
>>>>> things get out of hand in distributed systems...
>>>>>
>>>>> Hi Alan,
>>>>> The one good thing about gluster is that the data is always
>>>>> available directly on the backend bricks even if your volume has
>>>>> inconsistencies at the gluster level. So theoretically, if your cluster is
>>>>> FUBAR, you could just create a new volume and copy all data onto it via its
>>>>> mount from the old volume's bricks.
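>>>>>
>>>>> A rough sketch of that approach, assuming the new volume is mounted at
>>>>> /mnt/newvol (an example path) and picking only one brick per replica pair
>>>>> so nothing is copied twice:
>>>>>
>>>>> # rsync -av --exclude=.glusterfs /path/to/old/brick/ /mnt/newvol/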
>>>>>
>>>>>
>>>>> I had hoped that bringing the old brick back up would help, but by the
>>>>> time I added it again a few days had passed and all the brick-id's had
>>>>> changed due to the replace/remove brick commands, not to mention that the
>>>>> trusted.afr.$volume-client-xx values were now probably pointing to the
>>>>> wrong bricks (?).
>>>>>
>>>>> Anyways, a few hours ago I started a full heal on the volume and I see
>>>>> that there is a sustained 100MiB/sec of network traffic going from the old
>>>>> brick's host to the new one. The completed heals reported in the logs look
>>>>> promising too:
>>>>>
>>>>> Old brick host:
>>>>>
>>>>> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E
>>>>> 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
>>>>>  281614 Completed data selfheal
>>>>>      84 Completed entry selfheal
>>>>>  299648 Completed metadata selfheal
>>>>>
>>>>> New brick host:
>>>>>
>>>>> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E
>>>>> 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
>>>>>  198256 Completed data selfheal
>>>>>   16829 Completed entry selfheal
>>>>>  229664 Completed metadata selfheal
>>>>>
>>>>> So that's good I guess, though I have no idea how long it will take or
>>>>> if it will fix the "missing files" issue on the FUSE mount. I've increased
>>>>> cluster.shd-max-threads to 8 to hopefully speed up the heal process.
>>>>>
>>>>> The afr xattrs should not cause files to disappear from mount. If the
>>>>> xattr names do not match what each AFR subvol expects for its children
>>>>> (e.g. in a replica 2 volume, trusted.afr.*-client-{0,1} for the 1st subvol,
>>>>> client-{2,3} for the 2nd subvol, and so on), then it won't heal the data,
>>>>> that is all. But in your case I see some inconsistencies, like one brick having the
>>>>> actual file (licenseserver.cfg) and the other having a linkto file
>>>>> (the one with the dht.linkto xattr) *in the same replica pair*.
>>>>>
>>>>>
>>>>> I'd be happy for any advice or pointers,
>>>>>
>>>>> Did you check if the .glusterfs hardlinks/symlinks exist and are in
>>>>> order for all bricks?
>>>>>
>>>>> -Ravi
>>>>>
>>>>>
>>>>> On Wed, May 29, 2019 at 5:20 PM Alan Orth <alan.orth at gmail.com> wrote:
>>>>>
>>>>>> Dear Ravi,
>>>>>>
>>>>>> Thank you for the link to the blog post series—it is very informative
>>>>>> and current! If I understand your blog post correctly then I think the
>>>>>> answer to your previous question about pending AFRs is: no, there are no
>>>>>> pending AFRs. I have identified one file that is a good test case to try to
>>>>>> understand what happened after I issued the `gluster volume replace-brick
>>>>>> ... commit force` a few days ago and then added the same original brick
>>>>>> back to the volume later. This is the current state of the replica 2
>>>>>> distribute/replicate volume:
>>>>>>
>>>>>> [root@wingu0 ~]# gluster volume info apps
>>>>>>
>>>>>> Volume Name: apps
>>>>>> Type: Distributed-Replicate
>>>>>> Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda
>>>>>> Status: Started
>>>>>> Snapshot Count: 0
>>>>>> Number of Bricks: 3 x 2 = 6
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: wingu3:/mnt/gluster/apps
>>>>>> Brick2: wingu4:/mnt/gluster/apps
>>>>>> Brick3: wingu05:/data/glusterfs/sdb/apps
>>>>>> Brick4: wingu06:/data/glusterfs/sdb/apps
>>>>>> Brick5: wingu0:/mnt/gluster/apps
>>>>>> Brick6: wingu05:/data/glusterfs/sdc/apps
>>>>>> Options Reconfigured:
>>>>>> diagnostics.client-log-level: DEBUG
>>>>>> storage.health-check-interval: 10
>>>>>> nfs.disable: on
>>>>>>
>>>>>> I checked the xattrs of one file that is missing from the volume's
>>>>>> FUSE mount (though I can read it if I access its full path explicitly), but
>>>>>> is present in several of the volume's bricks (some with full size, others
>>>>>> empty):
>>>>>>
>>>>>> [root@wingu0 ~]# getfattr -d -m. -e hex /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg
>>>>>>
>>>>>> getfattr: Removing leading '/' from absolute path names
>>>>>> # file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg
>>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>>> trusted.afr.apps-client-3=0x000000000000000000000000
>>>>>> trusted.afr.apps-client-5=0x000000000000000000000000
>>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>>> trusted.bit-rot.version=0x0200000000000000585a396f00046e15
>>>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
>>>>>>
>>>>>> [root@wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
>>>>>> getfattr: Removing leading '/' from absolute path names
>>>>>> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
>>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
>>>>>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667
>>>>>> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200
>>>>>>
>>>>>> [root@wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg
>>>>>> getfattr: Removing leading '/' from absolute path names
>>>>>> # file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg
>>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
>>>>>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667
>>>>>>
>>>>>> [root@wingu06 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
>>>>>> getfattr: Removing leading '/' from absolute path names
>>>>>> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
>>>>>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>>> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
>>>>>> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667
>>>>>> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200
>>>>>>
>>>>>> According to the trusted.afr.apps-client-xx xattrs this particular
>>>>>> file should be on bricks with id "apps-client-3" and "apps-client-5". It
>>>>>> took me a few hours to realize that the brick-id values are recorded in the
>>>>>> volume's volfiles in /var/lib/glusterd/vols/apps/bricks. After comparing
>>>>>> those brick-id values with a volfile backup from before the replace-brick,
>>>>>> I realized that the files are simply on the wrong brick now as far as
>>>>>> Gluster is concerned. This particular file is now on the brick for
>>>>>> "apps-client-4". As an experiment I copied this one file to the two
>>>>>> bricks listed in the xattrs and I was then able to see the file from the
>>>>>> FUSE mount (yay!).
>>>>>>
>>>>>> Other than replacing the brick, removing it, and then adding the old
>>>>>> brick on the original server back, there has been no change in the data
>>>>>> this entire time. Can I change the brick IDs in the volfiles so they
>>>>>> reflect where the data actually is? Or perhaps script something to reset
>>>>>> all the xattrs on the files/directories to point to the correct bricks?
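>>>>>>
>>>>>> (For reference, that mapping can be dumped with something like
>>>>>> `grep brick-id /var/lib/glusterd/vols/apps/bricks/*` on one of the
>>>>>> servers, though the exact key name may differ between Gluster versions.)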
>>>>>>
>>>>>> Thank you for any help or pointers,
>>>>>>
>>>>>> On Wed, May 29, 2019 at 7:24 AM Ravishankar N <ravishankar at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> On 29/05/19 9:50 AM, Ravishankar N wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 29/05/19 3:59 AM, Alan Orth wrote:
>>>>>>>
>>>>>>> Dear Ravishankar,
>>>>>>>
>>>>>>> I'm not sure if Brick4 had pending AFRs because I don't know what
>>>>>>> that means and it's been a few days so I am not sure I would be able to
>>>>>>> find that information.
>>>>>>>
>>>>>>> When you find some time, have a look at a blog
>>>>>>> <http://wp.me/peiBB-6b> series I wrote about AFR. I've tried to
>>>>>>> explain what one needs to know to debug replication related issues in it.
>>>>>>>
>>>>>>> Made a typo error. The URL for the blog is https://wp.me/peiBB-6b
>>>>>>>
>>>>>>> -Ravi
>>>>>>>
>>>>>>>
>>>>>>> Anyways, after wasting a few days rsyncing the old brick to a new
>>>>>>> host I decided to just try to add the old brick back into the volume
>>>>>>> instead of bringing it up on the new host. I created a new brick directory
>>>>>>> on the old host, moved the old brick's contents into that new directory
>>>>>>> (minus the .glusterfs directory), added the new brick to the volume, and
>>>>>>> then did Vlad's find/stat trick¹ from the brick to the FUSE mount point.
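>>>>>>>
>>>>>>> (Roughly, the idea of that trick is to walk the brick and trigger a named
>>>>>>> lookup for each path via the FUSE mount so gluster can register and heal
>>>>>>> each entry; a sketch with example paths, not the exact command from the
>>>>>>> linked post:
>>>>>>>
>>>>>>> # cd /path/to/brick && find . -not -path './.glusterfs/*' -exec stat /mnt/fuse/{} \; >/dev/null
>>>>>>> )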
>>>>>>>
>>>>>>> The interesting problem I have now is that some files don't appear
>>>>>>> in the FUSE mount's directory listings, but I can actually list them
>>>>>>> directly and even read them. What could cause that?
>>>>>>>
>>>>>>> Not sure, too many variables in the hacks that you did to take a
>>>>>>> guess. You can check if the contents of the .glusterfs folder are in order
>>>>>>> on the new brick (e.g. hard links for files and symlinks for directories
>>>>>>> are present, etc.).
>>>>>>> Regards,
>>>>>>> Ravi
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> ¹
>>>>>>> https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html
>>>>>>>
>>>>>>> On Fri, May 24, 2019 at 4:59 PM Ravishankar N <
>>>>>>> ravishankar at redhat.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On 23/05/19 2:40 AM, Alan Orth wrote:
>>>>>>>>
>>>>>>>> Dear list,
>>>>>>>>
>>>>>>>> I seem to have gotten into a tricky situation. Today I brought up a
>>>>>>>> shiny new server with new disk arrays and attempted to replace one brick of
>>>>>>>> a replica 2 distribute/replicate volume on an older server using the
>>>>>>>> `replace-brick` command:
>>>>>>>>
>>>>>>>> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes
>>>>>>>> wingu06:/data/glusterfs/sdb/homes commit force
>>>>>>>>
>>>>>>>> The command was successful and I see the new brick in the output of
>>>>>>>> `gluster volume info`. The problem is that Gluster doesn't seem to be
>>>>>>>> migrating the data,
>>>>>>>>
>>>>>>>> `replace-brick` definitely must heal (not migrate) the data. In
>>>>>>>> your case, data must have been healed from Brick-4 to the replaced Brick-3.
>>>>>>>> Are there any errors in the self-heal daemon logs of Brick-4's node? Does
>>>>>>>> Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a bit out of
>>>>>>>> date. The replace-brick command internally does all the setfattr steps that are
>>>>>>>> mentioned in the doc.
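>>>>>>>>
>>>>>>>> (Pending AFR xattrs can be inspected on the brick root itself, e.g. on
>>>>>>>> Brick-4's node:
>>>>>>>>
>>>>>>>> # getfattr -d -m trusted.afr -e hex /data/glusterfs/sdb/homes
>>>>>>>>
>>>>>>>> A non-zero trusted.afr.homes-client-* value means there are pending heals
>>>>>>>> blaming the brick that corresponds to that client index.)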
>>>>>>>>
>>>>>>>> -Ravi
>>>>>>>>
>>>>>>>>
>>>>>>>> and now the original brick that I replaced is no longer part of the
>>>>>>>> volume (and a few terabytes of data are just sitting on the old brick):
>>>>>>>>
>>>>>>>> # gluster volume info homes | grep -E "Brick[0-9]:"
>>>>>>>> Brick1: wingu4:/mnt/gluster/homes
>>>>>>>> Brick2: wingu3:/mnt/gluster/homes
>>>>>>>> Brick3: wingu06:/data/glusterfs/sdb/homes
>>>>>>>> Brick4: wingu05:/data/glusterfs/sdb/homes
>>>>>>>> Brick5: wingu05:/data/glusterfs/sdc/homes
>>>>>>>> Brick6: wingu06:/data/glusterfs/sdc/homes
>>>>>>>>
>>>>>>>> I see the Gluster docs have a more complicated procedure for
>>>>>>>> replacing bricks that involves getfattr/setfattr¹. How can I tell Gluster
>>>>>>>> about the old brick? I see that I have a backup of the old volfile thanks
>>>>>>>> to yum's rpmsave function if that helps.
>>>>>>>>
>>>>>>>> We are using Gluster 5.6 on CentOS 7. Thank you for any advice you
>>>>>>>> can give.
>>>>>>>>
>>>>>>>> ¹
>>>>>>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick
>>>>>>>>
>>>>>>>> --
>>>>>>>> Alan Orth
>>>>>>>> alan.orth at gmail.com
>>>>>>>> https://picturingjordan.com
>>>>>>>> https://englishbulgaria.net
>>>>>>>> https://mjanja.ch
>>>>>>>> "In heaven all the interesting people are missing." ―Friedrich
>>>>>>>> Nietzsche
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Gluster-users mailing list
>>>>>>>> Gluster-users at gluster.org
>>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Alan Orth
>>>>>>> alan.orth at gmail.com
>>>>>>> https://picturingjordan.com
>>>>>>> https://englishbulgaria.net
>>>>>>> https://mjanja.ch
>>>>>>> "In heaven all the interesting people are missing." ―Friedrich
>>>>>>> Nietzsche
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Alan Orth
>>>>>> alan.orth at gmail.com
>>>>>> https://picturingjordan.com
>>>>>> https://englishbulgaria.net
>>>>>> https://mjanja.ch
>>>>>> "In heaven all the interesting people are missing." ―Friedrich
>>>>>> Nietzsche
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Alan Orth
>>>>> alan.orth at gmail.com
>>>>> https://picturingjordan.com
>>>>> https://englishbulgaria.net
>>>>> https://mjanja.ch
>>>>> "In heaven all the interesting people are missing." ―Friedrich
>>>>> Nietzsche
>>>>>
>>>>>
>>>>
>>>> --
>>>> Alan Orth
>>>> alan.orth at gmail.com
>>>> https://picturingjordan.com
>>>> https://englishbulgaria.net
>>>> https://mjanja.ch
>>>> "In heaven all the interesting people are missing." ―Friedrich Nietzsche
>>>>
>>>>
>>>
>>> --
>>> Alan Orth
>>> alan.orth at gmail.com
>>> https://picturingjordan.com
>>> https://englishbulgaria.net
>>> https://mjanja.ch
>>> "In heaven all the interesting people are missing." ―Friedrich Nietzsche
>>>
>>
>>
>> --
>> Alan Orth
>> alan.orth at gmail.com
>> https://picturingjordan.com
>> https://englishbulgaria.net
>> https://mjanja.ch
>> "In heaven all the interesting people are missing." ―Friedrich Nietzsche
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>

-- 
Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." ―Friedrich Nietzsche