[Gluster-users] Heal flapping between Possibly undergoing heal and In split brain
Karthik Subrahmanya
ksubrahm at redhat.com
Thu Mar 21 12:05:34 UTC 2019
Can you give me the stat & getfattr output of all those 6 entries from both
bricks, and the glfsheal-<volname>.log file from the node where you run
this command?
Meanwhile, can you also try running this with the source-brick option?
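For collecting that output, a small loop like the one below on each of the two
nodes should do (a sketch only, assuming the brick path /data/data-cluster and
the six entry names mentioned earlier in this thread):

for f in 41be9ff5ec05c4b1c989c6053e709e59 5543982fab4b56060aa09f667a8ae617 \
         a8b7f31775eebc8d1867e7f9de7b6eaf c1d3f3c2d7ae90e891e671e2f20d5d4b \
         e5934699809a3b6dcfc5945f408b978b e7cdc94f60d390812a5f9754885e119e; do
    # stat and extended attributes of each entry, as seen by this brick
    sudo stat /data/data-cluster/dms/final_archive/$f
    sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/$f
done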
On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic <cuculovic at mdpi.com> wrote:
> Thank you Karthik,
>
> I have run this for all files (see example below) and it says the file is
> not in split-brain:
>
> sudo gluster volume heal storage2 split-brain latest-mtime
> /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
> Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: File
> not in split-brain.
> Volume heal failed.
>
>
> - Kindest regards,
>
> Milos Cuculovic
> IT Manager
>
> ---
> MDPI AG
> Postfach, CH-4020 Basel, Switzerland
> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
> Tel. +41 61 683 77 35
> Fax +41 61 302 89 18
> Email: cuculovic at mdpi.com <cuculovic at mdpi.com>
> Skype: milos.cuculovic.mdpi
>
> Disclaimer: The information and files contained in this message
> are confidential and intended solely for the use of the individual or
> entity to whom they are addressed. If you have received this message in
> error, please notify me and delete this message from your system. You may
> not copy this message in its entirety or in part, or disclose its contents
> to anyone.
>
> On 21 Mar 2019, at 12:36, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
>
> Hi Milos,
>
> Thanks for the logs and the getfattr output.
> From the logs I can see that there are 6 entries under the
> directory "/data/data-cluster/dms/final_archive" named
> 41be9ff5ec05c4b1c989c6053e709e59
> 5543982fab4b56060aa09f667a8ae617
> a8b7f31775eebc8d1867e7f9de7b6eaf
> c1d3f3c2d7ae90e891e671e2f20d5d4b
> e5934699809a3b6dcfc5945f408b978b
> e7cdc94f60d390812a5f9754885e119e
> which have a gfid mismatch, so the heal is failing on this directory.
>
> You can use the CLI option to resolve these gfid mismatches. Any of the 3
> available methods can be used:
> 1. bigger-file
> gluster volume heal <VOLNAME> split-brain bigger-file <FILE>
>
> 2. latest-mtime
> gluster volume heal <VOLNAME> split-brain latest-mtime <FILE>
>
> 3. source-brick
> gluster volume heal <VOLNAME> split-brain source-brick
> <HOSTNAME:BRICKNAME> <FILE>
>
> where <FILE> must be the absolute path w.r.t. the volume, starting with '/'.
> If all those entries are directories, then go for either the latest-mtime or
> the source-brick option (see the example below).
> After you resolve all these gfid-mismatches, run the "gluster volume heal
> <volname>" command. Then check the heal info and let me know the result.
>
> Regards,
> Karthik
>
> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic <cuculovic at mdpi.com>
> wrote:
>
>> Sure, thank you for following up.
>>
>> About the commands, here is what I see:
>>
>> brick1:
>> —————————————————————————————————————
>> sudo gluster volume heal storage2 info
>> Brick storage3:/data/data-cluster
>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>> /dms/final_archive - Possibly undergoing heal
>>
>> Status: Connected
>> Number of entries: 3
>>
>> Brick storage4:/data/data-cluster
>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>> /dms/final_archive - Possibly undergoing heal
>>
>> Status: Connected
>> Number of entries: 2
>> —————————————————————————————————————
>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
>> getfattr: Removing leading '/' from absolute path names
>> # file: data/data-cluster/dms/final_archive
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.storage2-client-1=0x000000000000000000000010
>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.dht.mds=0x00000000
>> —————————————————————————————————————
>> stat /data/data-cluster/dms/final_archive
>> File: '/data/data-cluster/dms/final_archive'
>> Size: 3497984 Blocks: 8768 IO Block: 4096 directory
>> Device: 807h/2055d Inode: 26427748396 Links: 72123
>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data)
>> Access: 2018-10-09 04:22:40.514629044 +0200
>> Modify: 2019-03-21 11:55:37.382278863 +0100
>> Change: 2019-03-21 11:55:37.382278863 +0100
>> Birth: -
>> —————————————————————————————————————
>> —————————————————————————————————————
>>
>> brick2:
>> —————————————————————————————————————
>> sudo gluster volume heal storage2 info
>> Brick storage3:/data/data-cluster
>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>> /dms/final_archive - Possibly undergoing heal
>>
>> Status: Connected
>> Number of entries: 3
>>
>> Brick storage4:/data/data-cluster
>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>> /dms/final_archive - Possibly undergoing heal
>>
>> Status: Connected
>> Number of entries: 2
>> —————————————————————————————————————
>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
>> getfattr: Removing leading '/' from absolute path names
>> # file: data/data-cluster/dms/final_archive
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.storage2-client-0=0x000000000000000000000001
>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.dht.mds=0x00000000
>> —————————————————————————————————————
>> stat /data/data-cluster/dms/final_archive
>> File: '/data/data-cluster/dms/final_archive'
>> Size: 3497984 Blocks: 8760 IO Block: 4096 directory
>> Device: 807h/2055d Inode: 13563551265 Links: 72124
>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data)
>> Access: 2018-10-09 04:22:40.514629044 +0200
>> Modify: 2019-03-21 11:55:46.382565124 +0100
>> Change: 2019-03-21 11:55:46.382565124 +0100
>> Birth: -
>> —————————————————————————————————————
>>
>> Hope this helps.
>>
>> - Kindest regards,
>>
>> Milos Cuculovic
>> IT Manager
>>
>> ---
>> MDPI AG
>> Postfach, CH-4020 Basel, Switzerland
>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>> Tel. +41 61 683 77 35
>> Fax +41 61 302 89 18
>> Email: cuculovic at mdpi.com <cuculovic at mdpi.com>
>> Skype: milos.cuculovic.mdpi
>>
>>
>> On 21 Mar 2019, at 11:43, Karthik Subrahmanya <ksubrahm at redhat.com>
>> wrote:
>>
>> Can you attach the "glustershd.log" file (present under
>> "/var/log/glusterfs/") from both nodes, along with the "stat" & "getfattr
>> -d -m . -e hex <file-path-on-brick>" output of all the entries listed in
>> the heal info output, from both bricks?
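>>
>> For the entries that heal info shows only as <gfid:...>, the on-brick path
>> can usually be reached through the .glusterfs directory (a sketch, assuming
>> the standard .glusterfs layout under the brick root):
>>
>> # gfid 256ca960-1601-4f0d-9b08-905c6fd52326 -> .glusterfs/25/6c/<full gfid>
>> sudo stat /data/data-cluster/.glusterfs/25/6c/256ca960-1601-4f0d-9b08-905c6fd52326
>> sudo getfattr -d -m . -e hex /data/data-cluster/.glusterfs/25/6c/256ca960-1601-4f0d-9b08-905c6fd52326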
>>
>> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic <cuculovic at mdpi.com>
>> wrote:
>>
>>> Thanks Karthik!
>>>
>>> I was trying to find some resolution methods from [2] but unfortunately
>>> none worked (I can explain what I tried if needed).
>>>
>>> I guess the volume you are talking about is of type replica-2 (1x2).
>>>
>>> That’s correct, I am aware of the arbiter solution but didn’t take the
>>> time to implement it yet.
>>>
>>> From the info results I posted, how can I tell which situation I am in? No
>>> files are mentioned in split-brain, only directories. One brick has 3
>>> entries and the other has two.
>>>
>>> sudo gluster volume heal storage2 info
>>> [sudo] password for sshadmin:
>>> Brick storage3:/data/data-cluster
>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>> /dms/final_archive - Possibly undergoing heal
>>>
>>> Status: Connected
>>> Number of entries: 3
>>>
>>> Brick storage4:/data/data-cluster
>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>> /dms/final_archive - Possibly undergoing heal
>>>
>>> Status: Connected
>>> Number of entries: 2
>>>
>>> - Kindest regards,
>>>
>>> Milos Cuculovic
>>> IT Manager
>>>
>>> ---
>>> MDPI AG
>>> Postfach, CH-4020 Basel, Switzerland
>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>> Tel. +41 61 683 77 35
>>> Fax +41 61 302 89 18
>>> Email: cuculovic at mdpi.com <cuculovic at mdpi.com>
>>> Skype: milos.cuculovic.mdpi
>>>
>>>
>>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya <ksubrahm at redhat.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> Note: I guess the volume you are talking about is of type replica-2
>>> (1x2). Usually replica 2 volumes are prone to split-brain. If you can
>>> consider converting them to arbiter or replica-3, they will handle most of
>>> the cases which can lead to split-brains. For more information see [1].
>>>
>>> Resolving the split-brain: [2] talks about how to interpret the heal
>>> info output and different ways to resolve them using the CLI/manually/using
>>> the favorite-child-policy.
>>> If you have an entry split-brain and it is a gfid split-brain (a file/dir
>>> having different gfids on the replica bricks), then you can use the CLI
>>> option to resolve it. If a directory is in gfid split-brain in a
>>> distributed-replicate volume and you are using the source-brick option,
>>> please make sure the brick you pass as the source is the one in this
>>> replica subvolume whose gfid matches the correct gfid seen on the other
>>> distribute subvolume(s).
>>> If you have a type mismatch, then follow the steps in [3] to
>>> resolve the split-brain.
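>>>
>>> A minimal way to compare the gfids before choosing the source brick
>>> (a sketch, assuming the brick path shown in your heal info output;
>>> <entry> is a placeholder for the affected file/directory) is:
>>>
>>> # run on each node; the brick whose trusted.gfid matches the correct copy
>>> # is the one to pass as <HOSTNAME:BRICKNAME> to the source-brick command
>>> sudo getfattr -n trusted.gfid -e hex /data/data-cluster/dms/final_archive/<entry>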
>>>
>>> [1]
>>> https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/
>>> [2]
>>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
>>> [3]
>>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain
>>>
>>> HTH,
>>> Karthik
>>>
>>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic <cuculovic at mdpi.com>
>>> wrote:
>>>
>>>> I was now able to catch the split-brain status:
>>>>
>>>> sudo gluster volume heal storage2 info
>>>> Brick storage3:/data/data-cluster
>>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>>> /dms/final_archive - Is in split-brain
>>>>
>>>> Status: Connected
>>>> Number of entries: 3
>>>>
>>>> Brick storage4:/data/data-cluster
>>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>>> /dms/final_archive - Is in split-brain
>>>>
>>>> Status: Connected
>>>> Number of entries: 2
>>>>
>>>> Milos
>>>>
>>>> On 21 Mar 2019, at 09:07, Milos Cuculovic <cuculovic at mdpi.com> wrote:
>>>>
>>>> For the last 24h, since upgrading one of the servers from 4.0 to 4.1.7,
>>>> the heal shows this:
>>>>
>>>> sudo gluster volume heal storage2 info
>>>> Brick storage3:/data/data-cluster
>>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>>> /dms/final_archive - Possibly undergoing heal
>>>>
>>>> Status: Connected
>>>> Number of entries: 3
>>>>
>>>> Brick storage4:/data/data-cluster
>>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>>> /dms/final_archive - Possibly undergoing heal
>>>>
>>>> Status: Connected
>>>> Number of entries: 2
>>>>
>>>> The same files stay there. From time to time the status of
>>>> /dms/final_archive is in split-brain, as the following command shows:
>>>>
>>>> sudo gluster volume heal storage2 info split-brain
>>>> Brick storage3:/data/data-cluster
>>>> /dms/final_archive
>>>> Status: Connected
>>>> Number of entries in split-brain: 1
>>>>
>>>> Brick storage4:/data/data-cluster
>>>> /dms/final_archive
>>>> Status: Connected
>>>> Number of entries in split-brain: 1
>>>>
>>>> How can I tell which file is in split-brain? The files in
>>>> /dms/final_archive are not very important; it is fine to remove (or
>>>> ideally resolve the split-brain on) the ones that differ.
>>>>
>>>> I can only see the directory and GFIDs. Any idea how to resolve this
>>>> situation? I would like to continue with the upgrade on the 2nd server,
>>>> and for this the heal needs to finish with 0 entries in "sudo gluster
>>>> volume heal storage2 info".
>>>>
>>>> Thank you in advance, Milos.
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>>
>>
>