[Gluster-users] Cluster not healing

James Wilkins nebulai at gmail.com
Mon Jan 23 20:42:01 UTC 2017


On 23 January 2017 at 20:04, Gambit15 <dougti+gluster at gmail.com> wrote:

> Have you verified that Gluster has marked the files as split-brain?
>

Gluster does not recognise all of the files as split-brain - in fact only a
handful are recognised as such. For the example I pasted, it's not listed -
however the gfid differs between the two bricks (I believe this should be
the same?)
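
For entries where the gfids differ like this, the usual manual approach (as I
understand the split-brain docs - treat this as an assumption to verify, not
something confirmed in this thread) is to delete the bad copy *and* its
.glusterfs hard link from whichever brick you judge wrong, then let self-heal
recreate it from the good brick. A minimal sketch using the paths/gfid from
the example below; the destructive commands are deliberately commented out:

```shell
# Sketch only - verify which copy is bad before running anything destructive.
gfid_link_path() {
  # GlusterFS keeps a hard link per file at <brick>/.glusterfs/<aa>/<bb>/<uuid>,
  # where aa and bb are the first two byte-pairs of the gfid uuid.
  brick=$1; gfid=$2
  p1=$(printf '%s' "$gfid" | cut -c1-2)
  p2=$(printf '%s' "$gfid" | cut -c3-4)
  printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$p1" "$p2" "$gfid"
}

BRICK=/storage/sdc/brick_storage01
FILE=$BRICK/projects/183-57c559ea4d60e-canary-test--node02/wordpress285-data/html/wp-content/themes/twentyfourteen/single.php
GFID=14f74b04-6793-4528-9dbd-3290a3665cbc  # uuid form of the second brick's trusted.gfid

# On the brick holding the bad copy only - double-check with getfattr first:
# rm "$FILE" "$(gfid_link_path "$BRICK" "$GFID")"
# gluster volume heal storage01
```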




>
> gluster volume heal <vol> info split-brain
>
> If you're fairly confident about which files are correct, you can automate
> the split-brain healing procedure.
>
> From the manual...
>
>>         volume heal <VOLNAME> split-brain bigger-file <FILE>
>>               Performs healing of <FILE> which is in split-brain by
>> choosing the bigger file in the replica as source.
>>
>>         volume heal <VOLNAME> split-brain source-brick
>> <HOSTNAME:BRICKNAME>
>>               Selects <HOSTNAME:BRICKNAME> as the source for all the
>> files that are in split-brain in that replica and heals them.
>>
>>         volume heal <VOLNAME> split-brain source-brick
>> <HOSTNAME:BRICKNAME> <FILE>
>>               Selects the split-brained <FILE> present in
>> <HOSTNAME:BRICKNAME> as source and completes heal.
>>
>
> D
>
> On 23 January 2017 at 16:28, James Wilkins <nebulai at gmail.com> wrote:
>
>> Hello,
>>
>> I have a couple of gluster clusters - set up with distributed/replicated
>> volumes - that have started incrementing the heal-count from statistics,
>> and for some files return an input/output error when those files are
>> accessed from a FUSE mount.
>>
>> If I take one volume, from one cluster, as an example:
>>
>> gluster volume heal storage01 statistics info
>> <snip>
>> Brick storage02.<redacted>:/storage/sdc/brick_storage01
>> Number of entries: 595
>> </snip>
>>
>> And then proceed to look at one of these files (there are two copies - one
>> on each server/brick)
>>
>> First brick:
>>
>> # getfattr -m . -d -e hex /storage/sdc/brick_storage01/projects/183-57c559ea4d60e-canary-test--node02/wordpress285-data/html/wp-content/themes/twentyfourteen/single.php
>> getfattr: Removing leading '/' from absolute path names
>> # file: storage/sdc/brick_storage01/projects/183-57c559ea4d60e-canary-test--node02/wordpress285-data/html/wp-content/themes/twentyfourteen/single.php
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.storage01-client-0=0x000000020000000100000000
>> trusted.bit-rot.version=0x02000000000000005874e2cd0000459d
>> trusted.gfid=0xda4253be1c2647b7b6ec5c045d61d216
>> trusted.glusterfs.quota.c9764826-596a-4886-9bc0-60ee9b3fce44.contri.1=0x00000000000006000000000000000001
>> trusted.pgfid.c9764826-596a-4886-9bc0-60ee9b3fce44=0x00000001
>>
>> Second Brick:
>>
>> # getfattr -m . -d -e hex /storage/sdc/brick_storage01/projects/183-57c559ea4d60e-canary-test--node02/wordpress285-data/html/wp-content/themes/twentyfourteen/single.php
>> getfattr: Removing leading '/' from absolute path names
>> # file: storage/sdc/brick_storage01/projects/183-57c559ea4d60e-canary-test--node02/wordpress285-data/html/wp-content/themes/twentyfourteen/single.php
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.bit-rot.version=0x020000000000000057868423000d6332
>> trusted.gfid=0x14f74b04679345289dbd3290a3665cbc
>> trusted.glusterfs.quota.47e007ee-6f91-4187-81f8-90a393deba2b.contri.1=0x00000000000006000000000000000001
>> trusted.pgfid.47e007ee-6f91-4187-81f8-90a393deba2b=0x00000001
>>
>>
>>
>> I can see that only the first brick has the appropriate
>> trusted.afr.<client> attribute - e.g. in this case
>>
>> trusted.afr.storage01-client-0=0x000000020000000100000000
>>
>> The files are the same size under stat - only the access/modify/change
>> times differ.
>>
>> My first question is - reading
>> https://gluster.readthedocs.io/en/latest/Troubleshooting/split-brain/ -
>> this suggests that I should have this field on both copies of the file, or
>> am I mis-reading?
>>
>> Secondly - am I correct that each one of these entries will require
>> manual fixing? (I have approx 6K files/directories in this state across
>> two clusters - which looks like an awful lot of manual fixing.)
>>
>> I've checked gluster volume info <vol> and all the appropriate
>> services/self-heal daemons are running. We've even tried a full heal and a
>> normal heal, and iterated over parts of the filesystem in question with
>> find/stat/md5sum.
>>
>> Any input appreciated.
>>
>> Cheers,
>>
>>
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
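
For what it's worth, the per-file source-brick command from the manual
excerpt above could be scripted over the split-brain list. A rough sketch -
the volume name matches this thread, but SRC is a placeholder for whichever
brick you trust, and it assumes each entry is printed as an absolute path:

```shell
# Sketch only: heal every listed split-brain entry using one brick as source.
VOL=storage01
SRC=storage02.example.com:/storage/sdc/brick_storage01  # placeholder - pick the trusted brick

split_brain_paths() {
  # `heal <vol> info split-brain` prints brick headers and entry counts as
  # well as file paths; keep only the unique absolute paths.
  grep '^/' | sort -u
}

heal_from_source() {
  gluster volume heal "$VOL" info split-brain \
    | split_brain_paths \
    | while IFS= read -r f; do
        gluster volume heal "$VOL" split-brain source-brick "$SRC" "$f"
      done
}

# Only run where the gluster CLI exists - and review the file list first.
command -v gluster >/dev/null 2>&1 && heal_from_source
```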