[Gluster-users] Heal flapping between Possibly undergoing heal and In split brain

Karthik Subrahmanya ksubrahm at redhat.com
Fri Mar 22 07:51:04 UTC 2019


Hi,

If it is a file then you can find the filename from the gfid by running the
following on the nodes hosting the bricks
find <brickpath> -samefile <brickpath>/.glusterfs/<first two characters of
gfid>/<next two characters of gfid>/<full gfid>

If it is a directory you can run the following on the nodes hosting the
bricks
ls -l <brickpath>/.glusterfs/<first two characters of gfid>/<next two
characters of gfid>/<full gfid>

Run these on both nodes and paste the output of these commands before
running the lookup from the client on these entries.
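
For example, taking the first gfid from your heal info output and assuming
the brick path is /data/data-cluster (adjust if yours differs), the commands
would look like:

find /data/data-cluster -samefile \
  /data/data-cluster/.glusterfs/25/6c/256ca960-1601-4f0d-9b08-905c6fd52326

or, if it turns out to be a directory:

ls -l /data/data-cluster/.glusterfs/25/6c/256ca960-1601-4f0d-9b08-905c6fd52326

Once you know the path relative to the brick, the lookup from the client is
just a stat of the same path under the volume mount point, e.g. (the mount
point here is only an assumption):

stat <mountpoint>/<path-relative-to-brick>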

Regards,
Karthik

On Fri, Mar 22, 2019 at 1:06 PM Milos Cuculovic <cuculovic at mdpi.com> wrote:

> I ran the heal info command a few minutes ago and here are the results:
>
> sudo gluster volume heal storage2 info
> Brick storage3:/data/data-cluster
> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
> Status: Connected
> Number of entries: 2
>
> Brick storage4:/data/data-cluster
> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
> <gfid:40d6937f-dbcd-4f2e-956e-0867b36dfe40>
> <gfid:a6d73fa0-9a7d-40f6-8f74-32ccdd4c2aa7>
> <gfid:47d6a492-a5a4-4cdb-87d2-a5de84c50787>
> <gfid:e0558faf-c95b-48c9-b6cf-1c0e53d9d577>
> <gfid:332866d2-79e1-406b-a2eb-c457daf3f05b>
> Status: Connected
> Number of entries: 6
>
>
> sudo gluster volume heal storage2 info split-brain
> Brick storage3:/data/data-cluster
> Status: Connected
> Number of entries in split-brain: 0
>
> Brick storage4:/data/data-cluster
> Status: Connected
> Number of entries in split-brain: 0
>
> The heal info entries (2 + 6) have been there since yesterday and do not change.
>
>
> If they are still there can you try doing a lookup on those entries from
> client and see whether they are getting healed?
>
> How can I do this when I only have the gfid?
>
> - Kindest regards,
>
> Milos Cuculovic
> IT Manager
>
> ---
> MDPI AG
> Postfach, CH-4020 Basel, Switzerland
> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
> Tel. +41 61 683 77 35
> Fax +41 61 302 89 18
> Email: cuculovic at mdpi.com <cuculovic at mdpi.com>
> Skype: milos.cuculovic.mdpi
>
> Disclaimer: The information and files contained in this message
> are confidential and intended solely for the use of the individual or
> entity to whom they are addressed. If you have received this message in
> error, please notify me and delete this message from your system. You may
> not copy this message in its entirety or in part, or disclose its contents
> to anyone.
>
> On 21 Mar 2019, at 14:34, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
>
> Now the split-brain on the directory is resolved.
> Are the entries in the latest heal info output not getting healed? Are
> they still present in the heal info output?
> If they are still there can you try doing a lookup on those entries from
> client and see whether they are getting healed?
>
>
> On Thu, Mar 21, 2019 at 6:49 PM Milos Cuculovic <cuculovic at mdpi.com>
> wrote:
>
>> Hey Karthik,
>>
>> Can you run the "gluster volume heal <volname>"
>>
>> sudo gluster volume heal storage2
>> Launching heal operation to perform index self heal on volume storage2
>> has been successful
>> Use heal info commands to check status.
>>
>> "gluster volume heal <volname> info”
>>
>> sudo gluster volume heal storage2 info
>> Brick storage3:/data/data-cluster
>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>> Status: Connected
>> Number of entries: 2
>>
>> Brick storage4:/data/data-cluster
>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>> <gfid:40d6937f-dbcd-4f2e-956e-0867b36dfe40>
>> <gfid:a6d73fa0-9a7d-40f6-8f74-32ccdd4c2aa7>
>> <gfid:47d6a492-a5a4-4cdb-87d2-a5de84c50787>
>> <gfid:e0558faf-c95b-48c9-b6cf-1c0e53d9d577>
>> <gfid:332866d2-79e1-406b-a2eb-c457daf3f05b>
>> Status: Connected
>> Number of entries: 6
>>
>>
>>
>> - Kindest regards,
>>
>> Milos Cuculovic
>> IT Manager
>>
>> ---
>> MDPI AG
>> Postfach, CH-4020 Basel, Switzerland
>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>> Tel. +41 61 683 77 35
>> Fax +41 61 302 89 18
>> Email: cuculovic at mdpi.com <cuculovic at mdpi.com>
>> Skype: milos.cuculovic.mdpi
>>
>> Disclaimer: The information and files contained in this message
>> are confidential and intended solely for the use of the individual or
>> entity to whom they are addressed. If you have received this message in
>> error, please notify me and delete this message from your system. You may
>> not copy this message in its entirety or in part, or disclose its contents
>> to anyone.
>>
>> On 21 Mar 2019, at 14:07, Karthik Subrahmanya <ksubrahm at redhat.com>
>> wrote:
>>
>> Hey Milos,
>>
>> I see that the gfid got healed for those directories from the getfattr
>> output, and the glfsheal log also has messages about deleting the entries
>> on one brick as part of healing, after which they got recreated on that
>> brick with the correct gfid. Can you run the "gluster volume heal <volname>"
>> & "gluster volume heal <volname> info" commands and paste the output here?
>> If you still see entries pending heal, give the latest glustershd.log
>> files from both the nodes along with the getfattr output of the files which
>> are listed in the heal info output.
>>
>> Regards,
>> Karthik
>>
>> On Thu, Mar 21, 2019 at 6:03 PM Milos Cuculovic <cuculovic at mdpi.com>
>> wrote:
>>
>>> Sure:
>>>
>>> brick1:
>>> ————————————————————————————————————————————————————————————
>>> ————————————————————————————————————————————————————————————
>>> sudo getfattr -d -m . -e hex
>>> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
>>> getfattr: Removing leading '/' from absolute path names
>>> # file:
>>> data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
>>> trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.dht.mds=0x00000000
>>>
>>> sudo getfattr -d -m . -e hex
>>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
>>> getfattr: Removing leading '/' from absolute path names
>>> # file:
>>> data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
>>> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.dht.mds=0x00000000
>>>
>>> sudo getfattr -d -m . -e hex
>>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf
>>> getfattr: Removing leading '/' from absolute path names
>>> # file:
>>> data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf
>>> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.dht.mds=0x00000000
>>>
>>> sudo getfattr -d -m . -e hex
>>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b
>>> getfattr: Removing leading '/' from absolute path names
>>> # file:
>>> data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b
>>> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.dht.mds=0x00000000
>>>
>>> sudo getfattr -d -m . -e hex
>>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b
>>> getfattr: Removing leading '/' from absolute path names
>>> # file:
>>> data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b
>>> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.dht.mds=0x00000000
>>>
>>> sudo getfattr -d -m . -e hex
>>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
>>> getfattr: Removing leading '/' from absolute path names
>>> # file:
>>> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
>>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.dht.mds=0x00000000
>>> ————————————————————————————————————————————————————————————
>>> sudo stat
>>> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
>>>   File:
>>> '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59'
>>>   Size: 33        Blocks: 0          IO Block: 4096   directory
>>> Device: 807h/2055d Inode: 40809094709  Links: 3
>>> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
>>> Access: 2019-03-20 11:06:26.994047597 +0100
>>> Modify: 2019-03-20 11:28:28.294689870 +0100
>>> Change: 2019-03-21 13:01:03.077654239 +0100
>>>  Birth: -
>>>
>>> sudo stat
>>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
>>>   File:
>>> '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617'
>>>   Size: 33        Blocks: 0          IO Block: 4096   directory
>>> Device: 807h/2055d Inode: 49399908865  Links: 3
>>> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
>>> Access: 2019-03-20 11:07:20.342140927 +0100
>>> Modify: 2019-03-20 11:28:28.318690015 +0100
>>> Change: 2019-03-21 13:01:03.133654344 +0100
>>>  Birth: -
>>>
>>> sudo stat
>>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf
>>>   File:
>>> '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf'
>>>   Size: 33        Blocks: 0          IO Block: 4096   directory
>>> Device: 807h/2055d Inode: 53706303549  Links: 3
>>> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
>>> Access: 2019-03-20 11:06:55.414097315 +0100
>>> Modify: 2019-03-20 11:28:28.362690281 +0100
>>> Change: 2019-03-21 13:01:03.141654359 +0100
>>>  Birth: -
>>>
>>> sudo stat
>>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b
>>>   File:
>>> '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b'
>>>   Size: 33        Blocks: 0          IO Block: 4096   directory
>>> Device: 807h/2055d Inode: 57990935591  Links: 3
>>> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
>>> Access: 2019-03-20 11:07:08.558120309 +0100
>>> Modify: 2019-03-20 11:28:14.226604869 +0100
>>> Change: 2019-03-21 13:01:03.189654448 +0100
>>>  Birth: -
>>>
>>> sudo stat
>>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b
>>>   File:
>>> '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b'
>>>   Size: 33        Blocks: 0          IO Block: 4096   directory
>>> Device: 807h/2055d Inode: 62291339781  Links: 3
>>> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
>>> Access: 2019-03-20 11:06:02.070003998 +0100
>>> Modify: 2019-03-20 11:28:28.458690861 +0100
>>> Change: 2019-03-21 13:01:03.281654621 +0100
>>>  Birth: -
>>>
>>> sudo stat
>>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
>>>   File:
>>> '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e'
>>>   Size: 33        Blocks: 0          IO Block: 4096   directory
>>> Device: 807h/2055d Inode: 66574223479  Links: 3
>>> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
>>> Access: 2019-03-20 11:28:10.826584325 +0100
>>> Modify: 2019-03-20 11:28:10.834584374 +0100
>>> Change: 2019-03-20 14:06:07.937449353 +0100
>>>  Birth: -
>>> root at storage3:/var/log/glusterfs#
>>> ————————————————————————————————————————————————————————————
>>> ————————————————————————————————————————————————————————————
>>>
>>> brick2:
>>> ————————————————————————————————————————————————————————————
>>> ————————————————————————————————————————————————————————————
>>> sudo getfattr -d -m . -e hex
>>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
>>> getfattr: Removing leading '/' from absolute path names
>>> # file:
>>> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.storage2-client-0=0x000000000000000000000000
>>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.dht.mds=0x00000000
>>>
>>> sudo getfattr -d -m . -e hex
>>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
>>> getfattr: Removing leading '/' from absolute path names
>>> # file:
>>> data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
>>> trusted.afr.storage2-client-0=0x000000000000000000000000
>>> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.dht.mds=0x00000000
>>>
>>> sudo getfattr -d -m . -e hex
>>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf
>>> getfattr: Removing leading '/' from absolute path names
>>> # file:
>>> data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf
>>> trusted.afr.storage2-client-0=0x000000000000000000000000
>>> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.dht.mds=0x00000000
>>>
>>> sudo getfattr -d -m . -e hex
>>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b
>>> getfattr: Removing leading '/' from absolute path names
>>> # file:
>>> data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b
>>> trusted.afr.storage2-client-0=0x000000000000000000000000
>>> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.dht.mds=0x00000000
>>>
>>> sudo getfattr -d -m . -e hex
>>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b
>>> getfattr: Removing leading '/' from absolute path names
>>> # file:
>>> data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b
>>> trusted.afr.storage2-client-0=0x000000000000000000000000
>>> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.dht.mds=0x00000000
>>>
>>> sudo getfattr -d -m . -e hex
>>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
>>> getfattr: Removing leading '/' from absolute path names
>>> # file:
>>> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.storage2-client-0=0x000000000000000000000000
>>> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.dht.mds=0x00000000
>>>
>>> ————————————————————————————————————————————————————————————
>>>
>>> sudo stat
>>> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
>>>   File:
>>> '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59'
>>>   Size: 33        Blocks: 0          IO Block: 4096   directory
>>> Device: 807h/2055d Inode: 42232631305  Links: 3
>>> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
>>> Access: 2019-03-20 11:06:26.994047597 +0100
>>> Modify: 2019-03-20 11:28:28.294689870 +0100
>>> Change: 2019-03-21 13:01:03.078748131 +0100
>>>  Birth: -
>>>
>>> sudo stat
>>> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
>>>   File:
>>> '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617'
>>>   Size: 33        Blocks: 0          IO Block: 4096   directory
>>> Device: 807h/2055d Inode: 78589109305  Links: 3
>>> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
>>> Access: 2019-03-20 11:07:20.342140927 +0100
>>> Modify: 2019-03-20 11:28:28.318690015 +0100
>>> Change: 2019-03-21 13:01:03.134748477 +0100
>>>  Birth: -
>>>
>>> sudo stat
>>> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf
>>>   File:
>>> '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf'
>>>   Size: 33        Blocks: 0          IO Block: 4096   directory
>>> Device: 807h/2055d Inode: 54972096517  Links: 3
>>> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
>>> Access: 2019-03-20 11:06:55.414097315 +0100
>>> Modify: 2019-03-20 11:28:28.362690281 +0100
>>> Change: 2019-03-21 13:01:03.162748650 +0100
>>>  Birth: -
>>>
>>> sudo stat
>>> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b
>>>   File:
>>> '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b'
>>>   Size: 33        Blocks: 0          IO Block: 4096   directory
>>> Device: 807h/2055d Inode: 40821259275  Links: 3
>>> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
>>> Access: 2019-03-20 11:07:08.558120309 +0100
>>> Modify: 2019-03-20 11:28:14.226604869 +0100
>>> Change: 2019-03-21 13:01:03.194748848 +0100
>>>  Birth: -
>>>
>>> sudo stat
>>> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b
>>>   File:
>>> '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b'
>>>   Size: 33        Blocks: 0          IO Block: 4096   directory
>>> Device: 807h/2055d Inode: 15876654    Links: 3
>>> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
>>> Access: 2019-03-20 11:06:02.070003998 +0100
>>> Modify: 2019-03-20 11:28:28.458690861 +0100
>>> Change: 2019-03-21 13:01:03.282749392 +0100
>>>  Birth: -
>>>
>>> sudo stat
>>> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
>>>   File:
>>> '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e'
>>>   Size: 33        Blocks: 0          IO Block: 4096   directory
>>> Device: 807h/2055d Inode: 49408944650  Links: 3
>>> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
>>> Access: 2019-03-20 11:28:10.826584325 +0100
>>> Modify: 2019-03-20 11:28:10.834584374 +0100
>>> Change: 2019-03-20 14:06:07.940849268 +0100
>>>  Birth: -
>>> ————————————————————————————————————————————————————————————
>>> ————————————————————————————————————————————————————————————
>>>
>>> The file is from brick 2, the one I upgraded and started the heal on.
>>>
>>>
>>> - Kindest regards,
>>>
>>> Milos Cuculovic
>>> IT Manager
>>>
>>> ---
>>> MDPI AG
>>> Postfach, CH-4020 Basel, Switzerland
>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>> Tel. +41 61 683 77 35
>>> Fax +41 61 302 89 18
>>> Email: cuculovic at mdpi.com <cuculovic at mdpi.com>
>>> Skype: milos.cuculovic.mdpi
>>>
>>> Disclaimer: The information and files contained in this message
>>> are confidential and intended solely for the use of the individual or
>>> entity to whom they are addressed. If you have received this message in
>>> error, please notify me and delete this message from your system. You may
>>> not copy this message in its entirety or in part, or disclose its contents
>>> to anyone.
>>>
>>> On 21 Mar 2019, at 13:05, Karthik Subrahmanya <ksubrahm at redhat.com>
>>> wrote:
>>>
>>> Can you give me the stat & getfattr output of all those 6 entries from
>>> both the bricks and the glfsheal-<volname>.log file from the node where you
>>> run this command?
>>> Meanwhile can you also try running this with the source-brick option?
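>>>
>>> For example (a sketch only; choosing storage4 as the source brick here is
>>> just an illustration, use whichever brick holds the good copy):
>>> sudo gluster volume heal storage2 split-brain source-brick
>>> storage4:/data/data-cluster /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59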
>>>
>>> On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic <cuculovic at mdpi.com>
>>> wrote:
>>>
>>>> Thank you Karthik,
>>>>
>>>> I have run this for all files (see example below) and it says the file
>>>> is not in split-brain:
>>>>
>>>> sudo gluster volume heal storage2 split-brain latest-mtime
>>>> /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
>>>> Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed:
>>>> File not in split-brain.
>>>> Volume heal failed.
>>>>
>>>>
>>>> - Kindest regards,
>>>>
>>>> Milos Cuculovic
>>>> IT Manager
>>>>
>>>> ---
>>>> MDPI AG
>>>> Postfach, CH-4020 Basel, Switzerland
>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>>> Tel. +41 61 683 77 35
>>>> Fax +41 61 302 89 18
>>>> Email: cuculovic at mdpi.com <cuculovic at mdpi.com>
>>>> Skype: milos.cuculovic.mdpi
>>>>
>>>> Disclaimer: The information and files contained in this message
>>>> are confidential and intended solely for the use of the individual or
>>>> entity to whom they are addressed. If you have received this message in
>>>> error, please notify me and delete this message from your system. You may
>>>> not copy this message in its entirety or in part, or disclose its contents
>>>> to anyone.
>>>>
>>>> On 21 Mar 2019, at 12:36, Karthik Subrahmanya <ksubrahm at redhat.com>
>>>> wrote:
>>>>
>>>> Hi Milos,
>>>>
>>>> Thanks for the logs and the getfattr output.
>>>> From the logs I can see that there are 6 entries under the
>>>> directory "/data/data-cluster/dms/final_archive" named
>>>> 41be9ff5ec05c4b1c989c6053e709e59
>>>> 5543982fab4b56060aa09f667a8ae617
>>>> a8b7f31775eebc8d1867e7f9de7b6eaf
>>>> c1d3f3c2d7ae90e891e671e2f20d5d4b
>>>> e5934699809a3b6dcfc5945f408b978b
>>>> e7cdc94f60d390812a5f9754885e119e
>>>> which have a gfid mismatch, so the heal is failing on this
>>>> directory.
>>>>
>>>> You can use the CLI option to resolve these files from gfid mismatch.
>>>> You can use any of the 3 methods available:
>>>> 1. bigger-file
>>>> gluster volume heal <VOLNAME> split-brain bigger-file <FILE>
>>>>
>>>> 2. latest-mtime
>>>> gluster volume heal <VOLNAME> split-brain latest-mtime <FILE>
>>>>
>>>> 3. source-brick
>>>> gluster volume heal <VOLNAME> split-brain source-brick
>>>> <HOSTNAME:BRICKNAME> <FILE>
>>>>
>>>> where <FILE> must be the absolute path w.r.t. the root of the volume, starting with '/'.
>>>> If all those entries are directories then go for either
>>>> latest-mtime/source-brick option.
>>>> After you resolve all these gfid-mismatches, run the "gluster volume
>>>> heal <volname>" command. Then check the heal info and let me know the
>>>> result.
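>>>>
>>>> For example, a minimal sketch of that sequence for the first of these
>>>> directories (latest-mtime is chosen here only as an illustration;
>>>> repeat the split-brain command for each of the six entries):
>>>> sudo gluster volume heal storage2 split-brain latest-mtime
>>>> /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
>>>> sudo gluster volume heal storage2
>>>> sudo gluster volume heal storage2 info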
>>>>
>>>> Regards,
>>>> Karthik
>>>>
>>>> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic <cuculovic at mdpi.com>
>>>> wrote:
>>>>
>>>>> Sure, thank you for following up.
>>>>>
>>>>> About the commands, here is what I see:
>>>>>
>>>>> brick1:
>>>>> —————————————————————————————————————
>>>>> sudo gluster volume heal storage2 info
>>>>> Brick storage3:/data/data-cluster
>>>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>>>> /dms/final_archive - Possibly undergoing heal
>>>>>
>>>>> Status: Connected
>>>>> Number of entries: 3
>>>>>
>>>>> Brick storage4:/data/data-cluster
>>>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>>>> /dms/final_archive - Possibly undergoing heal
>>>>>
>>>>> Status: Connected
>>>>> Number of entries: 2
>>>>> —————————————————————————————————————
>>>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: data/data-cluster/dms/final_archive
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.storage2-client-1=0x000000000000000000000010
>>>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.dht.mds=0x00000000
>>>>> —————————————————————————————————————
>>>>> stat /data/data-cluster/dms/final_archive
>>>>>   File: '/data/data-cluster/dms/final_archive'
>>>>>   Size: 3497984   Blocks: 8768       IO Block: 4096   directory
>>>>> Device: 807h/2055d Inode: 26427748396  Links: 72123
>>>>> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (
>>>>> 33/www-data)
>>>>> Access: 2018-10-09 04:22:40.514629044 +0200
>>>>> Modify: 2019-03-21 11:55:37.382278863 +0100
>>>>> Change: 2019-03-21 11:55:37.382278863 +0100
>>>>>  Birth: -
>>>>> —————————————————————————————————————
>>>>> —————————————————————————————————————
>>>>>
>>>>> brick2:
>>>>> —————————————————————————————————————
>>>>> sudo gluster volume heal storage2 info
>>>>> Brick storage3:/data/data-cluster
>>>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>>>> /dms/final_archive - Possibly undergoing heal
>>>>>
>>>>> Status: Connected
>>>>> Number of entries: 3
>>>>>
>>>>> Brick storage4:/data/data-cluster
>>>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>>>> /dms/final_archive - Possibly undergoing heal
>>>>>
>>>>> Status: Connected
>>>>> Number of entries: 2
>>>>> —————————————————————————————————————
>>>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: data/data-cluster/dms/final_archive
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.storage2-client-0=0x000000000000000000000001
>>>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.dht.mds=0x00000000
>>>>> —————————————————————————————————————
>>>>> stat /data/data-cluster/dms/final_archive
>>>>>   File: '/data/data-cluster/dms/final_archive'
>>>>>   Size: 3497984   Blocks: 8760       IO Block: 4096   directory
>>>>> Device: 807h/2055d Inode: 13563551265  Links: 72124
>>>>> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (
>>>>> 33/www-data)
>>>>> Access: 2018-10-09 04:22:40.514629044 +0200
>>>>> Modify: 2019-03-21 11:55:46.382565124 +0100
>>>>> Change: 2019-03-21 11:55:46.382565124 +0100
>>>>>  Birth: -
>>>>> —————————————————————————————————————
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> - Kindest regards,
>>>>>
>>>>> Milos Cuculovic
>>>>> IT Manager
>>>>>
>>>>> ---
>>>>> MDPI AG
>>>>> Postfach, CH-4020 Basel, Switzerland
>>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>>>> Tel. +41 61 683 77 35
>>>>> Fax +41 61 302 89 18
>>>>> Email: cuculovic at mdpi.com <cuculovic at mdpi.com>
>>>>> Skype: milos.cuculovic.mdpi
>>>>>
>>>>> Disclaimer: The information and files contained in this message
>>>>> are confidential and intended solely for the use of the individual or
>>>>> entity to whom they are addressed. If you have received this message in
>>>>> error, please notify me and delete this message from your system. You may
>>>>> not copy this message in its entirety or in part, or disclose its contents
>>>>> to anyone.
>>>>>
>>>>> On 21 Mar 2019, at 11:43, Karthik Subrahmanya <ksubrahm at redhat.com>
>>>>> wrote:
>>>>>
>>>>> Can you attach the "glustershd.log"  file which will be present under
>>>>> "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m
>>>>> . -e hex <file-path-on-brick>" output of all the entries listed in the heal
>>>>> info output from both the bricks?
>>>>>
>>>>> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic <cuculovic at mdpi.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Karthik!
>>>>>>
>>>>>> I tried some of the resolution methods from [2], but
>>>>>> unfortunately none worked (I can explain what I tried if needed).
>>>>>>
>>>>>> I guess the volume you are talking about is of type replica-2 (1x2).
>>>>>>
>>>>>> That’s correct; I am aware of the arbiter solution but still haven’t
>>>>>> taken the time to implement it.
>>>>>>
>>>>>> From the info results I posted, how can I know which situation I am in?
>>>>>> No files are mentioned in split-brain, only directories. One brick has 3
>>>>>> entries and the other has 2.
>>>>>>
>>>>>> sudo gluster volume heal storage2 info
>>>>>> [sudo] password for sshadmin:
>>>>>> Brick storage3:/data/data-cluster
>>>>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>>>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>>>>> /dms/final_archive - Possibly undergoing heal
>>>>>>
>>>>>> Status: Connected
>>>>>> Number of entries: 3
>>>>>>
>>>>>> Brick storage4:/data/data-cluster
>>>>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>>>>> /dms/final_archive - Possibly undergoing heal
>>>>>>
>>>>>> Status: Connected
>>>>>> Number of entries: 2
>>>>>>
>>>>>> - Kindest regards,
>>>>>>
>>>>>> Milos Cuculovic
>>>>>> IT Manager
>>>>>>
>>>>>> ---
>>>>>> MDPI AG
>>>>>> Postfach, CH-4020 Basel, Switzerland
>>>>>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>>>>> Tel. +41 61 683 77 35
>>>>>> Fax +41 61 302 89 18
>>>>>> Email: cuculovic at mdpi.com <cuculovic at mdpi.com>
>>>>>> Skype: milos.cuculovic.mdpi
>>>>>>
>>>>>> Disclaimer: The information and files contained in this message
>>>>>> are confidential and intended solely for the use of the individual or
>>>>>> entity to whom they are addressed. If you have received this message in
>>>>>> error, please notify me and delete this message from your system. You may
>>>>>> not copy this message in its entirety or in part, or disclose its contents
>>>>>> to anyone.
>>>>>>
>>>>>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya <ksubrahm at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Note: I guess the volume you are talking about is of type replica-2
>>>>>> (1x2). Usually replica 2 volumes are prone to split-brain. If you can
>>>>>> consider converting them to arbiter or replica-3, they will handle most of
>>>>>> the cases which can lead to split-brains. For more information see [1].
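>>>>>>
>>>>>> As an illustration only (the arbiter host and brick path below are
>>>>>> placeholders), converting the existing replica 2 volume to arbiter
>>>>>> would look roughly like:
>>>>>> gluster volume add-brick <VOLNAME> replica 3 arbiter 1
>>>>>> <arbiter-host>:<arbiter-brick-path>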
>>>>>>
>>>>>> Resolving the split-brain: [2] talks about how to interpret the heal
>>>>>> info output and different ways to resolve them using the CLI/manually/using
>>>>>> the favorite-child-policy.
>>>>>> If you are having an entry split-brain, and it is a gfid split-brain
>>>>>> (file/dir having different gfids on the replica bricks), then you can use
>>>>>> the CLI option to resolve them.
>>>>>> distributed-replicate volume and you are using the source-brick option
>>>>>> please make sure you use the brick of this subvolume, which has the same
>>>>>> gfid as that of the other distribute subvolume(s) where you have the
>>>>>> correct gfid, as the source.
>>>>>> If you are having a type mismatch then follow the steps in [3] to
>>>>>> resolve the split-brain.
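>>>>>>
>>>>>> For future data/metadata split-brains you can also let the self-heal
>>>>>> daemon pick a winner automatically via the favorite-child-policy
>>>>>> option described in [2]; a sketch (the policy value is only an
>>>>>> example, and this does not resolve gfid or type mismatches):
>>>>>> gluster volume set <VOLNAME> cluster.favorite-child-policy mtime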
>>>>>>
>>>>>> [1]
>>>>>> https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/
>>>>>> [2]
>>>>>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
>>>>>> [3]
>>>>>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain
>>>>>>
>>>>>> HTH,
>>>>>> Karthik
>>>>>>
>>>>>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic <cuculovic at mdpi.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I was now able to catch the split-brain in the heal info output:
>>>>>>>
>>>>>>> sudo gluster volume heal storage2 info
>>>>>>> Brick storage3:/data/data-cluster
>>>>>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>>>>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>>>>>> /dms/final_archive - Is in split-brain
>>>>>>>
>>>>>>> Status: Connected
>>>>>>> Number of entries: 3
>>>>>>>
>>>>>>> Brick storage4:/data/data-cluster
>>>>>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>>>>>> /dms/final_archive - Is in split-brain
>>>>>>>
>>>>>>> Status: Connected
>>>>>>> Number of entries: 2
>>>>>>>
>>>>>>> Milos
>>>>>>>
>>>>>>> On 21 Mar 2019, at 09:07, Milos Cuculovic <cuculovic at mdpi.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> For the last 24h, after upgrading one of the servers from 4.0 to 4.1.7,
>>>>>>> the heal info shows this:
>>>>>>>
>>>>>>> sudo gluster volume heal storage2 info
>>>>>>> Brick storage3:/data/data-cluster
>>>>>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>>>>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>>>>>> /dms/final_archive - Possibly undergoing heal
>>>>>>>
>>>>>>> Status: Connected
>>>>>>> Number of entries: 3
>>>>>>>
>>>>>>> Brick storage4:/data/data-cluster
>>>>>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>>>>>> /dms/final_archive - Possibly undergoing heal
>>>>>>>
>>>>>>> Status: Connected
>>>>>>> Number of entries: 2
>>>>>>>
>>>>>>> The same files stay there. From time to time the status of
>>>>>>> /dms/final_archive changes to split-brain, as the following command shows:
>>>>>>>
>>>>>>> sudo gluster volume heal storage2 info split-brain
>>>>>>> Brick storage3:/data/data-cluster
>>>>>>> /dms/final_archive
>>>>>>> Status: Connected
>>>>>>> Number of entries in split-brain: 1
>>>>>>>
>>>>>>> Brick storage4:/data/data-cluster
>>>>>>> /dms/final_archive
>>>>>>> Status: Connected
>>>>>>> Number of entries in split-brain: 1
>>>>>>>
>>>>>>> How can I know which file is in split-brain? The files in
>>>>>>> /dms/final_archive are not very important; it is fine to remove (or
>>>>>>> ideally resolve the split-brain for) the ones that differ.
>>>>>>>
>>>>>>> I can only see the directory and GFID. Any idea on how to resolve
>>>>>>> this situation? I would like to continue with the upgrade on the 2nd
>>>>>>> server, and for that the heal needs to finish with 0 entries in "sudo
>>>>>>> gluster volume heal storage2 info".
>>>>>>>
>>>>>>> Thank you in advance, Milos.
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>