[Gluster-users] Input/output error - would not heal

lejeczek peljasz at yahoo.co.uk
Thu Feb 9 16:45:39 UTC 2017



On 09/02/17 06:07, Nag Pavan Chilakam wrote:
> ----- Original Message -----
> From: "lejeczek" <peljasz at yahoo.co.uk>
> To: "Nag Pavan Chilakam" <nchilaka at redhat.com>
> Cc: gluster-users at gluster.org
> Sent: Wednesday, 8 February, 2017 7:15:29 PM
> Subject: Re: [Gluster-users] Input/output error - would not heal
>
>
>
> On 08/02/17 06:11, Nag Pavan Chilakam wrote:
>> "gluster volume info" and "gluster vol status" would help in us debug faster.
>>
>> However, coming to gfid mismatch, yes the file "abbreviations.log" (I assume the other brick copy also to be " abbreviations.log" and not "breviations.log" ....typo mistake?) is in gfid mismatch leading to IO error(gfid splitbrain)
>> Resolving data and metadata splitbrains are not recommended to be done from backend brick.
>> But in case of a GFID splitbrain(like in file abbreviations.log), the only method available is resolving from backend brick
>> You can read more about this in http://gluster.readthedocs.io/en/latest/Troubleshooting/split-brain/?highlight=gfid   (Fixing Directory entry split-brain   section)
>> (There is a bug already existing to resolve gfid splitbrain using CLI )
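>>
>> A rough sketch of that backend procedure as I read the doc (all paths below are placeholders, and the gfid is just the one you reported for the slave-brick copy): on the brick whose copy you want to discard, remove both the file and its gfid hardlink under .glusterfs, then look the file up from a client mount so self-heal recreates it from the good copy:
>>
>> # on the bad brick only (replace /brick with your actual brick root)
>> getfattr -n trusted.gfid -e hex /brick/path/to/abbreviations.log   # note the gfid
>> rm /brick/path/to/abbreviations.log
>> # the gfid link lives at .glusterfs/<first 2 hex>/<next 2 hex>/<full gfid>:
>> rm /brick/.glusterfs/6e/9a/6e9a7fa1-bfbe-4a59-ad06-a78ee1625649
>> # then a lookup from a client mount triggers the heal:
>> stat /mnt/USER-HOME/path/to/abbreviations.log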
>>
> I've read that doc; however, I'm not sure what to do with
> the bits that are not covered there, namely: when some
> xattr does not exist on one copy but does on the other, like:
>
> 3]$ getfattr -d -m . -e hex .vim.backup/.bash_profile.swp
> # file: .vim.backup/.bash_profile.swp
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.USER-HOME-client-0=0x000000010000000100000000
> trusted.afr.USER-HOME-client-5=0x000000010000000100000000
>
> 2]$ getfattr -d -m . -e hex .vim.backup/.bash_profile.swp
> # file: .vim.backup/.bash_profile.swp
> security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
> trusted.afr.USER-HOME-client-5=0x000000010000000100000000
> trusted.afr.USER-HOME-client-6=0x000000010000000100000000
>
> That means the file .bash_profile.swp is possibly in a data and metadata split-brain.
> I need to understand the volume configuration; that is the reason I am asking for the volume info.
> From the above, I am guessing it is a replica 3 volume (3 replica copies).

as per my first email:
...
v3.9. It's a two-brick volume; it was three, but I removed one
brick, I think a few hours before the problem was first noticed.
...

and vol info:

Volume Name: USER-HOME
Type: Replicate
Volume ID: 9e4ed9b7-373a-413b-bc82-b6f978e82ec4
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.5.6.100:/__.aLocalStorages/3/0-GLUSTERs/0-USER
Brick2: 10.5.6.49:/__.aLocalStorages/3/0-GLUSTERs/0-USER
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
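
An aside that may help whoever reads this later -- this is my
understanding of the AFR changelog format, so treat it as such: each
trusted.afr.* value is three 4-byte big-endian counters of pending
data, metadata and entry operations, e.g.

0x 00000001 00000001 00000000
   data=1   meta=1   entry=0

and since both copies of .bash_profile.swp carry non-zero counters,
AFR cannot pick either one as a clean source.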

many thanks,
L.

>
> unless the doc talks about it and I've gone (temporarily)
> blind, but if it does not, it would be great to include
> more scenarios/cases there.
> many thx.
> L.
>
>>
>> thanks,
>> nagpavan
>>
>>
>> ----- Original Message -----
>> From: "lejeczek" <peljasz at yahoo.co.uk>
>> To: "Nag Pavan Chilakam" <nchilaka at redhat.com>
>> Cc: gluster-users at gluster.org
>> Sent: Tuesday, 7 February, 2017 10:53:07 PM
>> Subject: Re: [Gluster-users] Input/output error - would not heal
>>
>>
>>
>> On 07/02/17 12:50, Nag Pavan Chilakam wrote:
>>> Hi,
>>> Can you help us with more information on the volume, like the volume status and volume info?
>>> One reason for a "transport endpoint" error is that the brick could be down.
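>>> (A quick way to check: "gluster volume status <vname>" lists each brick process with an Online Y/N column.)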
>>>
>>> Also, I see that the syntax used for healing is wrong.
>>> You need to use it as below:
>>> gluster v heal <vname> split-brain source-brick <brick path> <file path relative to the brick root, i.e. treating the brick path as />
>>>
>>> In your case, if the brick path is "/G-store/1" and the file to be healed is "that_file", then use the syntax below (here I am assuming "that_file" lies directly under the brick path):
>>>
>>> gluster volume heal USER-HOME split-brain source-brick 10.5.6.100:/G-store/1 /that_file
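>>>
>>> Once the heal goes through, "gluster volume heal USER-HOME info split-brain" should show no remaining entry for that file.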
>> That was just my copy-paste typo; it still does not heal.
>> Interestingly, that file is not reported by heal.
>>
>> I've replied to the thread "GFID Mismatch - Automatic Correction?" --
>> I think my problem is similar. Here is a file the heal
>> actually sees:
>>
>>
>> $ gluster vol heal USER-HOME info
>> Brick 10.5.6.100:/__.aLocalStorages/3/0-GLUSTERs/0-USER.HOME
>> /aUser/.vim.backup/.bash_profile.swp
>> Status: Connected
>> Number of entries: 1
>>
>> Brick 10.5.6.49:/__.aLocalStorages/3/0-GLUSTERs/0-USER.HOME
>> /aUser/.vim.backup/.bash_profile.swp
>> Status: Connected
>> Number of entries: 1
>>
>> I'm copying and pasting what I said in my reply to that thread:
>> ...
>>
>> yep, I'm seeing the same, as follows:
>> 3]$ getfattr -d -m . -e hex .
>> # file: .
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>> trusted.afr.USER-HOME-client-2=0x000000000000000000000000
>> trusted.afr.USER-HOME-client-3=0x000000000000000000000000
>> trusted.afr.USER-HOME-client-5=0x000000000000000000000000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.gfid=0x06341b521ba94ab7938eca57f7a1824f
>> trusted.glusterfs.9e4ed9b7-373a-413b-bc82-b6f978e82ec4.xtime=0x5898e0cf000dd2fe
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00701c90fcb11200fffffef6f08c798e0000006a99819205
>> trusted.glusterfs.quota.dirty=0x3000
>> trusted.glusterfs.quota.size.1=0x00701c90fcb11200fffffef6f08c798e0000006a99819205
>> 3]$ getfattr -d -m . -e hex .vim.backup
>> # file: .vim.backup
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>> trusted.afr.USER-HOME-client-3=0x000000000000000000000000
>> trusted.gfid=0x0b3a223955534de89086679a4dce8156
>> trusted.glusterfs.9e4ed9b7-373a-413b-bc82-b6f978e82ec4.xtime=0x5898621c0005d720
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.quota.06341b52-1ba9-4ab7-938e-ca57f7a1824f.contri.1=0x000000000000040000000000000000020000000000000001
>> trusted.glusterfs.quota.dirty=0x3000
>> trusted.glusterfs.quota.size.1=0x000000000000040000000000000000020000000000000001
>> 3]$ getfattr -d -m . -e hex .vim.backup/.bash_profile.swp
>> # file: .vim.backup/.bash_profile.swp
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>> trusted.afr.USER-HOME-client-0=0x000000010000000100000000
>> trusted.afr.USER-HOME-client-5=0x000000010000000100000000
>> trusted.gfid=0xc2693670fc6d4fed953f21dcb77a02cf
>> trusted.glusterfs.9e4ed9b7-373a-413b-bc82-b6f978e82ec4.xtime=0x5896043c000baa55
>> trusted.glusterfs.quota.0b3a2239-5553-4de8-9086-679a4dce8156.contri.1=0x00000000000000000000000000000001
>> trusted.pgfid.0b3a2239-5553-4de8-9086-679a4dce8156=0x00000001
>>
>> 2]$ getfattr -d -m . -e hex .
>> # file: .
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
>> trusted.afr.USER-HOME-client-1=0x000000000000000000000000
>> trusted.afr.USER-HOME-client-2=0x000000000000000000000000
>> trusted.afr.USER-HOME-client-3=0x000000000000000000000000
>> trusted.afr.USER-HOME-client-5=0x000000000000000000000000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.gfid=0x06341b521ba94ab7938eca57f7a1824f
>> trusted.glusterfs.9e4ed9b7-373a-413b-bc82-b6f978e82ec4.xtime=0x5898e0d000016f82
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0xa5e66200a7a45000cb96fbf7d6336229fae7152d8851097b
>> trusted.glusterfs.quota.dirty=0x3000
>> trusted.glusterfs.quota.size.1=0xa5e66200a7a45000cb96fbf7d6336229fae7152d8851097b
>> 2]$ getfattr -d -m . -e hex .vim.backup
>> # file: .vim.backup
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
>> trusted.afr.USER-HOME-client-3=0x000000000000000000000000
>> trusted.gfid=0x0b3a223955534de89086679a4dce8156
>> trusted.glusterfs.9e4ed9b7-373a-413b-bc82-b6f978e82ec4.xtime=0x5898621b000855fe
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.quota.06341b52-1ba9-4ab7-938e-ca57f7a1824f.contri.1=0x000000000000040000000000000000020000000000000001
>> trusted.glusterfs.quota.dirty=0x3000
>> trusted.glusterfs.quota.size.1=0x000000000000040000000000000000020000000000000001
>> 2]$ getfattr -d -m . -e hex .vim.backup/.bash_profile.swp
>> # file: .vim.backup/.bash_profile.swp
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
>> trusted.afr.USER-HOME-client-5=0x000000010000000100000000
>> trusted.afr.USER-HOME-client-6=0x000000010000000100000000
>> trusted.gfid=0x8a5b6e4ad18a49d0bae920c9cf8673a5
>> trusted.glusterfs.9e4ed9b7-373a-413b-bc82-b6f978e82ec4.xtime=0x5896041400058191
>> trusted.glusterfs.quota.0b3a2239-5553-4de8-9086-679a4dce8156.contri.1=0x00000000000000000000000000000001
>> trusted.pgfid.0b3a2239-5553-4de8-9086-679a4dce8156=0x00000001
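>>
>> (Worth noting: the trusted.gfid values of the two copies above differ --
>> 0xc2693670... on the first brick vs 0x8a5b6e4a... on the second -- and the
>> latter matches the gfid in my failed heal attempt quoted further down, so
>> .bash_profile.swp looks gfid-mismatched as well.)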
>>
>>
>> and the log bit:
>>
>> GFID mismatch for
>> <gfid:335bf026-68bd-4bf4-9cba-63b65b12c0b1>/abbreviations.xlsx
>> 6e9a7fa1-bfbe-4a59-ad06-a78ee1625649 on USER-HOME-client-6
>> and 773b7ea3-31cf-4b24-94f0-0b61b573b082 on USER-HOME-client-0
>>
>> Most importantly, is there a workaround for the problem as of
>> now, before the bug (if it is one) gets fixed?
>> b.w.
>> L.
>>
>> -- end of paste
>>
>> but I have a few more files which also report I/O errors, and
>> heal does NOT even mention them.
>> On the brick that is the "master" (Samba was sharing it to the users):
>>
>> # file: abbreviations.log
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a64656661756c745f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.bit-rot.version=0x0200000000000000589081fd00060376
>> trusted.gfid=0x773b7ea331cf4b2494f00b61b573b082
>> trusted.glusterfs.quota.335bf026-68bd-4bf4-9cba-63b65b12c0b1.contri.1=0x0000000000002a000000000000000001
>> trusted.pgfid.335bf026-68bd-4bf4-9cba-63b65b12c0b1=0x00000001
>>
>> on the "slave" brick, was not serving files (certainly not
>> that file) to any users:
>>
>> # file: bbreviations.log
>> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.bit-rot.version=0x0200000000000000588c958a000b67ea
>> trusted.gfid=0x6e9a7fa1bfbe4a59ad06a78ee1625649
>> trusted.glusterfs.quota.335bf026-68bd-4bf4-9cba-63b65b12c0b1.contri.1=0x0000000000002a000000000000000001
>> trusted.pgfid.335bf026-68bd-4bf4-9cba-63b65b12c0b1=0x00000001
>>
>> A question that has probably been answered many times: is it
>> OK to tamper with (in my case, remove) files directly on the bricks?
>> many thanks,
>> L.
>>
>>
>>> regards,
>>> nag pavan
>>>
>>> ----- Original Message -----
>>> From: "lejeczek"<peljasz at yahoo.co.uk>
>>> To:gluster-users at gluster.org
>>> Sent: Tuesday, 7 February, 2017 2:00:51 AM
>>> Subject: [Gluster-users] Input/output error - would not heal
>>>
>>> hi all
>>>
>>> I'm hitting the following problem:
>>>
>>> $ gluster vol heal USER-HOME split-brain source-brick
>>> 10.5.6.100:/G-store/1
>>> Healing gfid:8a5b6e4a-d18a-49d0-bae9-20c9cf8673a5
>>> failed:Transport endpoint is not connected.
>>> Status: Connected
>>> Number of healed entries: 0
>>>
>>>
>>>
>>>
>>> $ gluster vol heal USER-HOME split-brain source-brick
>>> 10.5.6.100:/G-store/1/that_file
>>> Lookup failed on /that_file:Input/output  error
>>> Volume heal failed.
>>>
>>> v3.9. It's a two-brick volume; it was three, but I removed one
>>> brick, I think a few hours before the problem was first noticed.
>>> What to do now?
>>> many thanks,
>>> L
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users


