[Gluster-users] not healing one file

Karthik Subrahmanya ksubrahm at redhat.com
Thu Oct 26 09:34:14 UTC 2017


Hi Richard,

Thanks for the information. As you said, there is a gfid mismatch for
the file: on brick-1 & brick-2 the gfids are the same, while on brick-3
the gfid is different. This is not considered a split-brain because we
have two good copies here. Gluster 3.10 has no way to resolve this
situation other than manual intervention [1]. Basically what you need
to do is remove the file and the gfid hardlink from brick-3
(considering the brick-3 entry as the bad one). Then, when you do a
lookup for the file from the mount, the entry will be recreated on that
brick from the good copies.
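
For illustration, that manual fix would look roughly like the sketch
below. I'm assuming brick-3 is sphere-four (per your volume info),
since its gfid (0xda1c...) differs from the other two bricks
(0xea8e...); please double-check the gfid on your side before removing
anything. The gfid hardlink lives under
<brick>/.glusterfs/<aa>/<bb>/<full-gfid>, where aa and bb are the first
two byte pairs of the gfid.

On sphere-four (the brick with the mismatching gfid) only:

# rm /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
# rm /srv/gluster_home/brick/.glusterfs/da/1c/da1c94b1-6435-44b1-8d5b-6f4654f60bf5

Then trigger a lookup from a client mount so self-heal recreates the
entry from the good copies (/path/to/mount is a placeholder for your
client mount point):

# stat /path/to/mount/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4

After that, "gluster volume heal home info" should show the entry being
healed and then cleared.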

From 3.12 we have methods to resolve this situation with the CLI
option [2] and with favorite-child-policy [3]. For the time being you
can use [1] to resolve this, and if you can consider upgrading to 3.12,
that would give you better options to handle these scenarios.
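
Once you are on 3.12, the resolution would be along these lines (a
rough sketch; please take the exact syntax from the documentation of
that release, and <FILE> is a placeholder for the file path as seen
from the mount point):

# gluster volume heal home split-brain latest-mtime <FILE>

or, to let self-heal pick a winner automatically based on a policy:

# gluster volume set home cluster.favorite-child-policy mtime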

[1]
http://docs.gluster.org/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain
[2] https://review.gluster.org/#/c/17485/
[3] https://review.gluster.org/#/c/16878/

HTH,
Karthik

On Thu, Oct 26, 2017 at 12:40 PM, Richard Neuboeck <hawk at tbi.univie.ac.at>
wrote:

> Hi Karthik,
>
> thanks for taking a look at this. I haven't been working with gluster
> long enough to make heads or tails of the logs. The logs are attached
> to this mail and here is the other information:
>
> # gluster volume info home
>
> Volume Name: home
> Type: Replicate
> Volume ID: fe6218ae-f46b-42b3-a467-5fc6a36ad48a
> Status: Started
> Snapshot Count: 1
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: sphere-six:/srv/gluster_home/brick
> Brick2: sphere-five:/srv/gluster_home/brick
> Brick3: sphere-four:/srv/gluster_home/brick
> Options Reconfigured:
> features.barrier: disable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> performance.stat-prefetch: on
> performance.cache-samba-metadata: on
> performance.cache-invalidation: on
> performance.md-cache-timeout: 600
> network.inode-lru-limit: 90000
> performance.cache-size: 1GB
> performance.client-io-threads: on
> cluster.lookup-optimize: on
> cluster.readdir-optimize: on
> features.quota: on
> features.inode-quota: on
> features.quota-deem-statfs: on
> cluster.server-quorum-ratio: 51%
>
>
> [root at sphere-four ~]# getfattr -d -e hex -m . /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> getfattr: Removing leading '/' from absolute path names
> # file: srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.bit-rot.version=0x020000000000000059df20a40006f989
> trusted.gfid=0xda1c94b1643544b18d5b6f4654f60bf5
> trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x0000000000009a000000000000000001
> trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x00000001
>
> [root at sphere-five ~]# getfattr -d -e hex -m . /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> getfattr: Removing leading '/' from absolute path names
> # file: srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.home-client-4=0x000000010000000100000000
> trusted.bit-rot.version=0x020000000000000059df1f310006ce63
> trusted.gfid=0xea8ecfd195fd4e48b994fd0a2da226f9
> trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x0000000000009a000000000000000001
> trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x00000001
>
> [root at sphere-six ~]# getfattr -d -e hex -m . /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> getfattr: Removing leading '/' from absolute path names
> # file: srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.home-client-4=0x000000010000000100000000
> trusted.bit-rot.version=0x020000000000000059df11cd000548ec
> trusted.gfid=0xea8ecfd195fd4e48b994fd0a2da226f9
> trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x0000000000009a000000000000000001
> trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x00000001
>
> Cheers
> Richard
>
> On 26.10.17 07:41, Karthik Subrahmanya wrote:
> > Hey Richard,
> >
> > Could you share the following information, please?
> > 1. gluster volume info <volname>
> > 2. getfattr output of that file from all the bricks
> >     getfattr -d -e hex -m . <brickpath/filepath>
> > 3. glustershd & glfsheal logs
> >
> > Regards,
> > Karthik
> >
> > On Thu, Oct 26, 2017 at 10:21 AM, Amar Tumballi <atumball at redhat.com>
> > wrote:
> >
> >     On a side note, try the recently released health report tool and
> >     see if it diagnoses any issues in the setup. Currently you may
> >     have to run it on all three machines.
> >
> >
> >
> >     On 26-Oct-2017 6:50 AM, "Amar Tumballi" <atumball at redhat.com> wrote:
> >
> >         Thanks for this report. This week many of the developers are
> >         at Gluster Summit in Prague; we will be checking this and will
> >         respond next week. Hope that's fine.
> >
> >         Thanks,
> >         Amar
> >
> >
> >         On 25-Oct-2017 3:07 PM, "Richard Neuboeck"
> >         <hawk at tbi.univie.ac.at> wrote:
> >
> >             Hi Gluster Gurus,
> >
> >             I'm using a gluster volume as home for our users. The
> >             volume is replica 3, running on CentOS 7, gluster version
> >             3.10 (3.10.6-1.el7.x86_64). Clients are running Fedora 26
> >             and also gluster 3.10 (3.10.6-3.fc26.x86_64).
> >
> >             During the data backup I got an I/O error on one file.
> >             Manually checking for this file on a client confirms this:
> >
> >             ls -l romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/
> >             ls: cannot access 'romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4':
> >             Input/output error
> >             total 2015
> >             -rw-------. 1 romanoch tbi 998211 Sep 15 18:44 previous.js
> >             -rw-------. 1 romanoch tbi  65222 Oct 17 17:57 previous.jsonlz4
> >             -rw-------. 1 romanoch tbi 149161 Oct  1 13:46 recovery.bak
> >             -?????????? ? ?        ?        ?            ? recovery.baklz4
> >
> >             Out of curiosity I checked all the bricks for this file. It's
> >             present there. Making a checksum shows that the file is
> >             different on
> >             one of the three replica servers.
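> >             (Concretely, something along these lines on each of the
> >             three servers, using the brick path from the volume info
> >             above:)
> >
> >             # md5sum /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4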
> >
> >             Querying healing information shows that the file should be
> >             healed:
> >             # gluster volume heal home info
> >             Brick sphere-six:/srv/gluster_home/brick
> >             /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> >
> >             Status: Connected
> >             Number of entries: 1
> >
> >             Brick sphere-five:/srv/gluster_home/brick
> >             /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> >
> >             Status: Connected
> >             Number of entries: 1
> >
> >             Brick sphere-four:/srv/gluster_home/brick
> >             Status: Connected
> >             Number of entries: 0
> >
> >             Manually triggering heal doesn't report an error but also
> >             does not
> >             heal the file.
> >             # gluster volume heal home
> >             Launching heal operation to perform index self heal on
> >             volume home has been successful
> >
> >             Same with a full heal
> >             # gluster volume heal home full
> >             Launching heal operation to perform full self heal on
> >             volume home has been successful
> >
> >             According to the split brain query that's not the problem:
> >             # gluster volume heal home info split-brain
> >             Brick sphere-six:/srv/gluster_home/brick
> >             Status: Connected
> >             Number of entries in split-brain: 0
> >
> >             Brick sphere-five:/srv/gluster_home/brick
> >             Status: Connected
> >             Number of entries in split-brain: 0
> >
> >             Brick sphere-four:/srv/gluster_home/brick
> >             Status: Connected
> >             Number of entries in split-brain: 0
> >
> >
> >             I have no idea why this situation arose in the first
> >             place and also no idea how to solve this problem. I would
> >             highly appreciate any helpful feedback I can get.
> >
> >             The only mention in the logs matching this file is a rename
> >             operation:
> >             /var/log/glusterfs/bricks/srv-gluster_home-brick.log:
> >             [2017-10-23 09:19:11.561661] I [MSGID: 115061]
> >             [server-rpc-fops.c:1022:server_rename_cbk] 0-home-server:
> >             5266153: RENAME
> >             /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.jsonlz4
> >             (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.jsonlz4) ->
> >             /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> >             (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.baklz4),
> >             client: romulus.tbi.univie.ac.at-11894-2017/10/18-07:06:07:206366-home-client-3-0-0,
> >             error-xlator: home-posix [No data available]
> >
> >             I enabled directory quotas the same day this problem
> >             showed up, but I'm not sure how quotas could have an
> >             effect like this (unless maybe the limit is reached, but
> >             that's not the case here either).
> >
> >             Thanks again if anyone has an idea.
> >             Cheers
> >             Richard
> >             --
> >             /dev/null
> >
> >
> >             _______________________________________________
> >             Gluster-users mailing list
> >             Gluster-users at gluster.org
> >             http://lists.gluster.org/mailman/listinfo/gluster-users
> >
> >
> >     _______________________________________________
> >     Gluster-users mailing list
> >     Gluster-users at gluster.org
> >     http://lists.gluster.org/mailman/listinfo/gluster-users
> >
> >
>
>

