[Gluster-users] Heal flapping between Possibly undergoing heal and In split brain
Milos Cuculovic
cuculovic at mdpi.com
Thu Mar 21 10:56:48 UTC 2019
Sure, thank you for following up.
About the commands, here is what I see:
brick1:
—————————————————————————————————————
sudo gluster volume heal storage2 info
Brick storage3:/data/data-cluster
<gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
<gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
/dms/final_archive - Possibly undergoing heal
Status: Connected
Number of entries: 3
Brick storage4:/data/data-cluster
<gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
/dms/final_archive - Possibly undergoing heal
Status: Connected
Number of entries: 2
—————————————————————————————————————
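For anyone comparing the two listings by hand, here is a minimal sketch that groups the heal info output by brick and picks out the entries that appear under every brick. The function names are made up for illustration, and the parsing only assumes the plain-text layout shown above (a "Brick ..." line, then entries, then "Status:" and "Number of entries:"). An entry listed under both bricks is only a hint that both sides consider it pending, not proof of split-brain.

```python
from collections import defaultdict

# Output copied from the listing above.
SAMPLE = """\
Brick storage3:/data/data-cluster
<gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
<gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
/dms/final_archive - Possibly undergoing heal
Status: Connected
Number of entries: 3
Brick storage4:/data/data-cluster
<gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
/dms/final_archive - Possibly undergoing heal
Status: Connected
Number of entries: 2
"""


def parse_heal_info(text):
    """Return {brick: [(entry, state_or_None), ...]} (hypothetical helper)."""
    bricks = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            bricks[current]  # register the brick even if it lists nothing
        elif line.startswith(("Status:", "Number of entries")):
            continue
        elif current is not None:
            # "path - state" when a state is printed, otherwise just the entry
            entry, sep, state = line.partition(" - ")
            bricks[current].append((entry, state if sep else None))
    return dict(bricks)


def listed_on_all_bricks(parsed):
    """Entries that show up under every brick (hypothetical helper)."""
    sets = [{entry for entry, _ in entries} for entries in parsed.values()]
    return set.intersection(*sets) if sets else set()


parsed = parse_heal_info(SAMPLE)
for brick, entries in parsed.items():
    print(brick, len(entries))
print(listed_on_all_bricks(parsed))
```

With the output above, only /dms/final_archive is common to both bricks; the three gfid entries each appear on one side only.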
sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster/dms/final_archive
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.storage2-client-1=0x000000000000000000000010
trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000
—————————————————————————————————————
stat /data/data-cluster/dms/final_archive
File: '/data/data-cluster/dms/final_archive'
Size: 3497984 Blocks: 8768 IO Block: 4096 directory
Device: 807h/2055d Inode: 26427748396 Links: 72123
Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data)
Access: 2018-10-09 04:22:40.514629044 +0200
Modify: 2019-03-21 11:55:37.382278863 +0100
Change: 2019-03-21 11:55:37.382278863 +0100
Birth: -
—————————————————————————————————————
—————————————————————————————————————
brick2:
—————————————————————————————————————
sudo gluster volume heal storage2 info
Brick storage3:/data/data-cluster
<gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
<gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
/dms/final_archive - Possibly undergoing heal
Status: Connected
Number of entries: 3
Brick storage4:/data/data-cluster
<gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
/dms/final_archive - Possibly undergoing heal
Status: Connected
Number of entries: 2
—————————————————————————————————————
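The <gfid:...> entries have no path in the listing, so stat and getfattr cannot be pointed at them directly. A minimal sketch, assuming the usual brick backend layout where each GFID is linked under .glusterfs/<first two hex chars>/<next two>/<full gfid>; the helper name is hypothetical and nothing here touches the filesystem, it only builds the path to inspect on the brick.

```python
import uuid
from pathlib import PurePosixPath


def gfid_backend_path(brick_root, gfid):
    """Build the .glusterfs path for a GFID (hypothetical helper).

    Accepts "<gfid:...>" as printed by heal info, a plain UUID string,
    or the 0x-prefixed hex form printed by getfattr -e hex.
    """
    text = gfid.strip()
    if text.startswith("<gfid:") and text.endswith(">"):
        text = text[len("<gfid:"):-1]
    if text.lower().startswith("0x"):
        text = text[2:]
    canonical = str(uuid.UUID(text))  # normalises to 8-4-4-4-12 lowercase
    return (PurePosixPath(brick_root) / ".glusterfs"
            / canonical[:2] / canonical[2:4] / canonical)


# GFIDs taken from the heal info listing above.
for g in ("<gfid:256ca960-1601-4f0d-9b08-905c6fd52326>",
          "<gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>",
          "<gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>"):
    print(gfid_backend_path("/data/data-cluster", g))
```

The printed paths can then be passed to stat and getfattr -d -m . -e hex on the brick that lists the entry. For a directory the backend entry is normally a symlink, for a regular file a hard link.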
sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster/dms/final_archive
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.storage2-client-0=0x000000000000000000000001
trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000
—————————————————————————————————————
stat /data/data-cluster/dms/final_archive
File: '/data/data-cluster/dms/final_archive'
Size: 3497984 Blocks: 8760 IO Block: 4096 directory
Device: 807h/2055d Inode: 13563551265 Links: 72124
Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data)
Access: 2018-10-09 04:22:40.514629044 +0200
Modify: 2019-03-21 11:55:46.382565124 +0100
Change: 2019-03-21 11:55:46.382565124 +0100
Birth: -
—————————————————————————————————————
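To read the trusted.afr.storage2-client-* values above, here is a minimal sketch that assumes the usual AFR changelog layout: 12 bytes holding three big-endian 32-bit counters for pending data, metadata and entry operations, in that order. The function names are hypothetical, and the two values are the ones from the getfattr output above, written as three 8-digit groups for readability.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* value into its three pending counters."""
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    # Assumed order: data, metadata, entry (big-endian unsigned 32-bit each).
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def mutual_blame(first, second):
    """Operation types for which both bricks hold a non-zero counter."""
    return [kind for kind in ("data", "metadata", "entry")
            if first[kind] and second[kind]]


# brick1: trusted.afr.storage2-client-1, brick2: trusted.afr.storage2-client-0
brick1 = decode_afr_changelog("0x" "00000000" "00000000" "00000010")
brick2 = decode_afr_changelog("0x" "00000000" "00000000" "00000001")
print(brick1)
print(brick2)
print(mutual_blame(brick1, brick2))
```

Under that assumption, brick1 holds 16 pending entry operations against client-1 and brick2 holds 1 against client-0, with data and metadata at zero on both. Each brick marking the other as needing entry heal is the pattern usually described as an entry split-brain on the directory, while trusted.gfid is identical on both sides.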
Hope this helps.
- Kindest regards,
Milos Cuculovic
IT Manager
---
MDPI AG
Postfach, CH-4020 Basel, Switzerland
Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
Tel. +41 61 683 77 35
Fax +41 61 302 89 18
Email: cuculovic at mdpi.com
Skype: milos.cuculovic.mdpi
Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone.
> On 21 Mar 2019, at 11:43, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
>
> Can you attach the "glustershd.log" file which will be present under "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m . -e hex <file-path-on-brick>" output of all the entries listed in the heal info output from both the bricks?
>
> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic <cuculovic at mdpi.com> wrote:
> Thanks Karthik!
>
> I was trying to find some resolution methods from [2] but unfortunately none worked (I can explain what I tried if needed).
>
>> I guess the volume you are talking about is of type replica-2 (1x2).
> That's correct. I am aware of the arbiter solution but have not yet taken the time to implement it.
>
> From the info results I posted, how can I tell which situation I am in? No files are reported in split-brain, only directories. One brick has 3 entries and the other has 2.
>
> sudo gluster volume heal storage2 info
> [sudo] password for sshadmin:
> Brick storage3:/data/data-cluster
> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
> /dms/final_archive - Possibly undergoing heal
>
> Status: Connected
> Number of entries: 3
>
> Brick storage4:/data/data-cluster
> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
> /dms/final_archive - Possibly undergoing heal
>
> Status: Connected
> Number of entries: 2
>
>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
>>
>> Hi,
>>
>> Note: I guess the volume you are talking about is of type replica-2 (1x2). Replica 2 volumes are prone to split-brain. If you can convert it to arbiter or replica-3, that will handle most of the cases that lead to split-brain. For more information see [1].
>>
>> Resolving the split-brain: [2] explains how to interpret the heal info output and the different ways to resolve a split-brain: with the CLI, manually, or with the favorite-child-policy.
>> If you have an entry split-brain and it is a gfid split-brain (a file/dir having different gfids on the replica bricks), you can use the CLI option to resolve it. If a directory is in gfid split-brain in a distributed-replicate volume and you are using the source-brick option, make sure the source you pick is the brick of this subvolume whose gfid matches the one on the other distribute subvolume(s), where the gfid is correct.
>> If you have a type mismatch, follow the steps in [3] to resolve the split-brain.
>>
>> [1] https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/
>> [2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
>> [3] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain
>>
>> HTH,
>> Karthik
>>
>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic <cuculovic at mdpi.com> wrote:
>> I was now able to catch the split brain log:
>>
>> sudo gluster volume heal storage2 info
>> Brick storage3:/data/data-cluster
>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>> /dms/final_archive - Is in split-brain
>>
>> Status: Connected
>> Number of entries: 3
>>
>> Brick storage4:/data/data-cluster
>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>> /dms/final_archive - Is in split-brain
>>
>> Status: Connected
>> Number of entries: 2
>>
>> Milos
>>
>>> On 21 Mar 2019, at 09:07, Milos Cuculovic <cuculovic at mdpi.com> wrote:
>>>
>>> For the past 24 hours, since upgrading one of the servers from 4.0 to 4.1.7, the heal info has shown this:
>>>
>>> sudo gluster volume heal storage2 info
>>> Brick storage3:/data/data-cluster
>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>> /dms/final_archive - Possibly undergoing heal
>>>
>>> Status: Connected
>>> Number of entries: 3
>>>
>>> Brick storage4:/data/data-cluster
>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>> /dms/final_archive - Possibly undergoing heal
>>>
>>> Status: Connected
>>> Number of entries: 2
>>>
>>> The same entries stay there. From time to time the status of /dms/final_archive changes to split-brain, as the following command shows:
>>>
>>> sudo gluster volume heal storage2 info split-brain
>>> Brick storage3:/data/data-cluster
>>> /dms/final_archive
>>> Status: Connected
>>> Number of entries in split-brain: 1
>>>
>>> Brick storage4:/data/data-cluster
>>> /dms/final_archive
>>> Status: Connected
>>> Number of entries in split-brain: 1
>>>
>>> How can I find out which file is in split-brain? The files in /dms/final_archive are not very important; it is fine to remove the ones that differ (ideally, the split-brain would be resolved instead).
>>>
>>> I can only see the directory and the GFIDs. Any idea how to resolve this? I would like to continue with the upgrade on the 2nd server, and for that the heal needs to finish with 0 entries in "sudo gluster volume heal storage2 info".
>>>
>>> Thank you in advance, Milos.
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glustershd_brick2.log
Type: application/octet-stream
Size: 734197 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190321/9da778de/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glustershd_brick1.log
Type: application/octet-stream
Size: 1193622 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190321/9da778de/attachment-0003.obj>