[Gluster-users] not healing one file

Richard Neuboeck hawk at tbi.univie.ac.at
Fri Oct 27 07:36:14 UTC 2017


Hi Karthik,

the procedure you described in [1] worked perfectly. After removing the
file and the hardlink on brick-3 it got healed. Client access is restored.
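
For this case the steps from [1] boil down to roughly the following (the
.glusterfs hardlink path is derived from the trusted.gfid value on
sphere-four quoted further down; the client mount point is only an example):

  # on sphere-four (brick-3), inside the brick directory
  cd /srv/gluster_home/brick
  rm romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
  rm .glusterfs/da/1c/da1c94b1-6435-44b1-8d5b-6f4654f60bf5
  # then trigger a lookup from a client so self-heal recreates the entry
  stat /mnt/home/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4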

Since there doesn't seem to be an access problem with Fedora's 3.10
client, I'll upgrade all servers to 3.12. Just in case.

Thank you so much for your help!
All the best
Richard

On 26.10.17 11:34, Karthik Subrahmanya wrote:
> Hi Richard,
> 
> Thanks for the information. As you said, there is a gfid mismatch for the
> file: on brick-1 and brick-2 the gfids are the same, while on brick-3 the
> gfid is different. This is not considered split-brain because we have two
> good copies here. Gluster 3.10 has no way to resolve this situation other
> than the manual intervention described in [1]. Basically, you need to
> remove the file and its gfid hardlink from brick-3 (treating the brick-3
> entry as the bad one). When you then do a lookup for the file from the
> mount, self-heal will recreate the entry on that brick.
> 
> From 3.12 onwards there are methods to resolve this situation with the cli
> option [2] and with favorite-child-policy [3]. For the time being you can
> use [1] to resolve this, and if you can consider upgrading to 3.12, that
> would give you options to handle these scenarios.
> 
> [1]
> http://docs.gluster.org/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain
> [2] https://review.gluster.org/#/c/17485/
> [3] https://review.gluster.org/#/c/16878/
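> 
> (A rough sketch, assuming the cli option in [2] extends the existing
> split-brain resolution command to gfid mismatches: on 3.12 one could pick a
> good copy as the source, for example
> 
>     gluster volume heal home split-brain source-brick \
>         sphere-five:/srv/gluster_home/brick \
>         /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> 
> The exact syntax should be checked against [2] and the 3.12 documentation.)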
> 
> HTH,
> Karthik
> 
> On Thu, Oct 26, 2017 at 12:40 PM, Richard Neuboeck
> <hawk at tbi.univie.ac.at> wrote:
> 
>     Hi Karthik,
> 
>     thanks for taking a look at this. I haven't been working with gluster
>     long enough to make heads or tails out of the logs. The logs are
>     attached to this mail, and here is the other information:
> 
>     # gluster volume info home
> 
>     Volume Name: home
>     Type: Replicate
>     Volume ID: fe6218ae-f46b-42b3-a467-5fc6a36ad48a
>     Status: Started
>     Snapshot Count: 1
>     Number of Bricks: 1 x 3 = 3
>     Transport-type: tcp
>     Bricks:
>     Brick1: sphere-six:/srv/gluster_home/brick
>     Brick2: sphere-five:/srv/gluster_home/brick
>     Brick3: sphere-four:/srv/gluster_home/brick
>     Options Reconfigured:
>     features.barrier: disable
>     cluster.quorum-type: auto
>     cluster.server-quorum-type: server
>     nfs.disable: on
>     performance.readdir-ahead: on
>     transport.address-family: inet
>     features.cache-invalidation: on
>     features.cache-invalidation-timeout: 600
>     performance.stat-prefetch: on
>     performance.cache-samba-metadata: on
>     performance.cache-invalidation: on
>     performance.md-cache-timeout: 600
>     network.inode-lru-limit: 90000
>     performance.cache-size: 1GB
>     performance.client-io-threads: on
>     cluster.lookup-optimize: on
>     cluster.readdir-optimize: on
>     features.quota: on
>     features.inode-quota: on
>     features.quota-deem-statfs: on
>     cluster.server-quorum-ratio: 51%
> 
> 
>     [root at sphere-four ~]# getfattr -d -e hex -m .
>     /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>     getfattr: Removing leading '/' from absolute path names
>     # file:
>     srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>     security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>     trusted.afr.dirty=0x000000000000000000000000
>     trusted.bit-rot.version=0x020000000000000059df20a40006f989
>     trusted.gfid=0xda1c94b1643544b18d5b6f4654f60bf5
>     trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x0000000000009a000000000000000001
>     trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x00000001
> 
>     [root at sphere-five ~]# getfattr -d -e hex -m .
>     /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>     getfattr: Removing leading '/' from absolute path names
>     # file:
>     srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>     security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>     trusted.afr.dirty=0x000000000000000000000000
>     trusted.afr.home-client-4=0x000000010000000100000000
>     trusted.bit-rot.version=0x020000000000000059df1f310006ce63
>     trusted.gfid=0xea8ecfd195fd4e48b994fd0a2da226f9
>     trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x0000000000009a000000000000000001
>     trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x00000001
> 
>     [root at sphere-six ~]# getfattr -d -e hex -m .
>     /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>     getfattr: Removing leading '/' from absolute path names
>     # file:
>     srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>     security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>     trusted.afr.dirty=0x000000000000000000000000
>     trusted.afr.home-client-4=0x000000010000000100000000
>     trusted.bit-rot.version=0x020000000000000059df11cd000548ec
>     trusted.gfid=0xea8ecfd195fd4e48b994fd0a2da226f9
>     trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x0000000000009a000000000000000001
>     trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x00000001
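> 
>     (Reading these xattrs: trusted.afr.<volume>-client-N packs three 32-bit
>     counters for pending data, metadata and entry operations blaming that
>     client, so 0x000000010000000100000000 on sphere-five and sphere-six
>     means one pending data and one pending metadata operation against
>     home-client-4, presumably the sphere-four brick. The differing
>     trusted.gfid values, 0xda1c94b1... on sphere-four versus 0xea8ecfd1...
>     on the other two, are the gfid mismatch Karthik describes above.)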
> 
>     Cheers
>     Richard
> 
>     On 26.10.17 07:41, Karthik Subrahmanya wrote:
>     > Hey Richard,
>     >
>     > Could you share the following information, please?
>     > 1. gluster volume info <volname>
>     > 2. getfattr output of that file from all the bricks
>     >     getfattr -d -e hex -m . <brickpath/filepath>
>     > 3. glustershd & glfsheal logs
>     >
>     > Regards,
>     > Karthik
>     >
>     > On Thu, Oct 26, 2017 at 10:21 AM, Amar Tumballi
>     > <atumball at redhat.com> wrote:
>     >
>     >     On a side note, try the recently released health report tool and
>     >     see if it diagnoses any issues in your setup. Currently you may
>     >     have to run it on all three machines.
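>     >
>     >     (Presumably this refers to the gluster-health-report tool announced
>     >     around this time; if so, running it on each node is roughly the
>     >     following. The package and command names are assumptions, not
>     >     confirmed in this thread.)
>     >
>     >     pip install gluster-health-report   # on each of the three servers
>     >     gluster-health-report               # report issues found on this node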
>     >
>     >
>     >
>     >     On 26-Oct-2017 6:50 AM, "Amar Tumballi" <atumball at redhat.com> wrote:
>     >
>     >         Thanks for this report. This week many of the developers are at
>     >         Gluster Summit in Prague; we will check this and respond next
>     >         week. Hope that's fine.
>     >
>     >         Thanks,
>     >         Amar
>     >
>     >
>     >         On 25-Oct-2017 3:07 PM, "Richard Neuboeck"
>     >         <hawk at tbi.univie.ac.at> wrote:
>     >
>     >             Hi Gluster Gurus,
>     >
>     >             I'm using a gluster volume as home for our users. The volume is
>     >             replica 3, running on CentOS 7, gluster version 3.10
>     >             (3.10.6-1.el7.x86_64). Clients are running Fedora 26 and also
>     >             gluster 3.10 (3.10.6-3.fc26.x86_64).
>     >
>     >             During the data backup I got an I/O error on one file. Manually
>     >             checking for this file on a client confirms this:
>     >
>     >             ls -l
>     >             romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/
>     >             ls: cannot access
>     >             'romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4':
>     >             Input/output error
>     >             total 2015
>     >             -rw-------. 1 romanoch tbi 998211 Sep 15 18:44 previous.js
>     >             -rw-------. 1 romanoch tbi  65222 Oct 17 17:57 previous.jsonlz4
>     >             -rw-------. 1 romanoch tbi 149161 Oct  1 13:46 recovery.bak
>     >             -?????????? ? ?        ?        ?            ? recovery.baklz4
>     >
>     >             Out of curiosity I checked all the bricks for this file. It's
>     >             present there. Making a checksum shows that the file is
>     >             different on
>     >             one of the three replica servers.
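>     >
>     >             A rough sketch of that check (hostnames and the brick path
>     >             are taken from this thread; sha256sum is assumed as the
>     >             checksum tool):
>     >
>     >             for h in sphere-four sphere-five sphere-six; do
>     >                 ssh "$h" sha256sum \
>     >                     /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>     >             done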
>     >
>     >             Querying healing information shows that the file should be
>     >             healed:
>     >             # gluster volume heal home info
>     >             Brick sphere-six:/srv/gluster_home/brick
>     >             /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>     >
>     >             Status: Connected
>     >             Number of entries: 1
>     >
>     >             Brick sphere-five:/srv/gluster_home/brick
>     >             /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>     >
>     >             Status: Connected
>     >             Number of entries: 1
>     >
>     >             Brick sphere-four:/srv/gluster_home/brick
>     >             Status: Connected
>     >             Number of entries: 0
>     >
>     >             Manually triggering heal doesn't report an error but also
>     >             does not
>     >             heal the file.
>     >             # gluster volume heal home
>     >             Launching heal operation to perform index self heal on
>     >             volume home has been successful
>     >
>     >             Same with a full heal
>     >             # gluster volume heal home full
>     >             Launching heal operation to perform full self heal on
>     >             volume home has been successful
>     >
>     >             According to the split brain query that's not the problem:
>     >             # gluster volume heal home info split-brain
>     >             Brick sphere-six:/srv/gluster_home/brick
>     >             Status: Connected
>     >             Number of entries in split-brain: 0
>     >
>     >             Brick sphere-five:/srv/gluster_home/brick
>     >             Status: Connected
>     >             Number of entries in split-brain: 0
>     >
>     >             Brick sphere-four:/srv/gluster_home/brick
>     >             Status: Connected
>     >             Number of entries in split-brain: 0
>     >
>     >
>     >             I have no idea why this situation arose in the first place
>     >             and also no idea how to solve this problem. I would highly
>     >             appreciate any helpful feedback I can get.
>     >
>     >             The only mention in the logs matching this file is a rename
>     >             operation:
>     >             /var/log/glusterfs/bricks/srv-gluster_home-brick.log:[2017-10-23
>     >             09:19:11.561661] I [MSGID: 115061]
>     >             [server-rpc-fops.c:1022:server_rename_cbk] 0-home-server:
>     >             5266153: RENAME
>     >             /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.jsonlz4
>     >             (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.jsonlz4) ->
>     >             /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>     >             (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.baklz4), client:
>     >             romulus.tbi.univie.ac.at-11894-2017/10/18-07:06:07:206366-home-client-3-0-0,
>     >             error-xlator: home-posix [No data available]
>     >
>     >             I enabled directory quotas the same day this problem showed
>     >             up, but I'm not sure how quotas could have an effect like
>     >             this (maybe if the limit were reached, but that's also not
>     >             the case).
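>     >
>     >             (To double-check the quota angle, something like the
>     >             following shows usage against the configured limits; the
>     >             /romanoch path is just an example since the actual limits
>     >             aren't shown here:)
>     >
>     >             gluster volume quota home list
>     >             gluster volume quota home list /romanoch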
>     >
>     >             Thanks again if anyone has an idea.
>     >             Cheers
>     >             Richard
>     >             --
>     >             /dev/null
>     >
>     >
>     >             _______________________________________________
>     >             Gluster-users mailing list
>     >             Gluster-users at gluster.org
>     >             http://lists.gluster.org/mailman/listinfo/gluster-users
>     >
>     >
>     >     _______________________________________________
>     >     Gluster-users mailing list
>     >     Gluster-users at gluster.org
>     >     http://lists.gluster.org/mailman/listinfo/gluster-users
>     >
>     >
> 
> 
