<div dir="ltr"><div><div>Hi Richard,<br><br></div>Thanks for the informations. As you said there is gfid mismatch for the file.<br></div><div>On brick-1 & brick-2 the gfids are same & on brick-3 the gfid is different.<br>This is not considered as split-brain because we have two good copies here.</div><div>Gluster 3.10 does not have a method to resolve this situation other than the<br>manual intervention [1]. Basically what you need to do is remove the file and<br>the gfid hardlink from brick-3 (considering brick-3 entry as bad). Then when<br>you do a lookup for the file from mount it will recreate the entry on the other brick.<br></div><br><div>Form 3.12 we have methods to resolve this situation with the cli option [2] and<br></div><div>with favorite-child-policy [3]. For the time being you can use [1] to resolve this<br></div><div>and if you can consider upgrading to 3.12 that would give you options to handle<br></div><div>these scenarios.<br><br>[1] <a href="http://docs.gluster.org/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain">http://docs.gluster.org/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain</a><br>[2] <a href="https://review.gluster.org/#/c/17485/">https://review.gluster.org/#/c/17485/</a><br>[3] <a href="https://review.gluster.org/#/c/16878/">https://review.gluster.org/#/c/16878/</a><br><br></div><div>HTH,<br></div><div>Karthik<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Oct 26, 2017 at 12:40 PM, Richard Neuboeck <span dir="ltr"><<a href="mailto:hawk@tbi.univie.ac.at" target="_blank">hawk@tbi.univie.ac.at</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Karthik,<br>

thanks for taking a look at this. I haven't been working with gluster long
enough to make heads or tails of the logs. The logs are attached to this
mail, and here is the other information:

# gluster volume info home

Volume Name: home
Type: Replicate
Volume ID: fe6218ae-f46b-42b3-a467-5fc6a36ad48a
Status: Started
Snapshot Count: 1
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: sphere-six:/srv/gluster_home/brick
Brick2: sphere-five:/srv/gluster_home/brick
Brick3: sphere-four:/srv/gluster_home/brick
Options Reconfigured:
features.barrier: disable
cluster.quorum-type: auto
cluster.server-quorum-type: server
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-samba-metadata: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 90000
performance.cache-size: 1GB
performance.client-io-threads: on
cluster.lookup-optimize: on
cluster.readdir-optimize: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
cluster.server-quorum-ratio: 51%

[root@sphere-four ~]# getfattr -d -e hex -m . /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
getfattr: Removing leading '/' from absolute path names
# file: srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x020000000000000059df20a40006f989
trusted.gfid=0xda1c94b1643544b18d5b6f4654f60bf5
trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x0000000000009a000000000000000001
trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x00000001

[root@sphere-five ~]# getfattr -d -e hex -m . /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
getfattr: Removing leading '/' from absolute path names
# file: srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.home-client-4=0x000000010000000100000000
trusted.bit-rot.version=0x020000000000000059df1f310006ce63
trusted.gfid=0xea8ecfd195fd4e48b994fd0a2da226f9
trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x0000000000009a000000000000000001
trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x00000001

[root@sphere-six ~]# getfattr -d -e hex -m . /srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
getfattr: Removing leading '/' from absolute path names
# file: srv/gluster_home/brick/romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.home-client-4=0x000000010000000100000000
trusted.bit-rot.version=0x020000000000000059df11cd000548ec
trusted.gfid=0xea8ecfd195fd4e48b994fd0a2da226f9
trusted.glusterfs.quota.48e9eea6-cda6-4e53-bb4a-72059debf4c2.contri.1=0x0000000000009a000000000000000001
trusted.pgfid.48e9eea6-cda6-4e53-bb4a-72059debf4c2=0x00000001

Cheers
Richard

On 26.10.17 07:41, Karthik Subrahmanya wrote:
> Hey Richard,
>
> Could you share the following information please?
> 1. gluster volume info <volname>
> 2. getfattr output of that file from all the bricks
> getfattr -d -e hex -m . <brickpath/filepath>
> 3. glustershd & glfsheal logs
>
> Regards,
> Karthik
>
> On Thu, Oct 26, 2017 at 10:21 AM, Amar Tumballi <atumball@redhat.com> wrote:
>
> On a side note, try the recently released health report tool and see if
> it diagnoses any issues in the setup. Currently you may have to run it
> on all three machines.
>
>
> On 26-Oct-2017 6:50 AM, "Amar Tumballi" <atumball@redhat.com> wrote:
>
> Thanks for this report. This week many of the developers are at
> Gluster Summit in Prague; we will be checking this and will respond
> next week. Hope that's fine.
>
> Thanks,
> Amar
>
>
> On 25-Oct-2017 3:07 PM, "Richard Neuboeck" <hawk@tbi.univie.ac.at> wrote:
>
> Hi Gluster Gurus,
>
> I'm using a gluster volume as home for our users. The volume is
> replica 3, running on CentOS 7, gluster version 3.10
> (3.10.6-1.el7.x86_64). Clients are running Fedora 26 and also
> gluster 3.10 (3.10.6-3.fc26.x86_64).
>
> During the data backup I got an I/O error on one file. Manually
> checking for this file on a client confirms this:
>
> ls -l romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/
> ls: cannot access 'romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4': Input/output error
> total 2015
> -rw-------. 1 romanoch tbi 998211 Sep 15 18:44 previous.js
> -rw-------. 1 romanoch tbi 65222 Oct 17 17:57 previous.jsonlz4
> -rw-------. 1 romanoch tbi 149161 Oct 1 13:46 recovery.bak
> -?????????? ? ? ? ? ? recovery.baklz4
>
> Out of curiosity I checked all the bricks for this file. It's
> present there. Making a checksum shows that the file is different on
> one of the three replica servers.
>
> Querying healing information shows that the file should be healed:
> # gluster volume heal home info
> Brick sphere-six:/srv/gluster_home/brick
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>
> Status: Connected
> Number of entries: 1
>
> Brick sphere-five:/srv/gluster_home/brick
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
>
> Status: Connected
> Number of entries: 1
>
> Brick sphere-four:/srv/gluster_home/brick
> Status: Connected
> Number of entries: 0
>
> Manually triggering heal doesn't report an error but also does not
> heal the file.
> # gluster volume heal home
> Launching heal operation to perform index self heal on volume home
> has been successful
>
> Same with a full heal
> # gluster volume heal home full
> Launching heal operation to perform full self heal on volume home
> has been successful
>
> According to the split brain query that's not the problem:
> # gluster volume heal home info split-brain
> Brick sphere-six:/srv/gluster_home/brick
> Status: Connected
> Number of entries in split-brain: 0
>
> Brick sphere-five:/srv/gluster_home/brick
> Status: Connected
> Number of entries in split-brain: 0
>
> Brick sphere-four:/srv/gluster_home/brick
> Status: Connected
> Number of entries in split-brain: 0
>
>
> I have no idea why this situation arose in the first place and also
> no idea how to solve this problem. I would highly appreciate any
> helpful feedback I can get.
>
> The only mention in the logs matching this file is a rename operation:
> /var/log/glusterfs/bricks/srv-gluster_home-brick.log:[2017-10-23
> 09:19:11.561661] I [MSGID: 115061]
> [server-rpc-fops.c:1022:server_rename_cbk] 0-home-server: 5266153:
> RENAME
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.jsonlz4
> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.jsonlz4) ->
> /romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
> (48e9eea6-cda6-4e53-bb4a-72059debf4c2/recovery.baklz4), client:
> romulus.tbi.univie.ac.at-11894-2017/10/18-07:06:07:206366-home-client-3-0-0,
> error-xlator: home-posix [No data available]
>
> I enabled directory quotas the same day this problem showed up, but
> I'm not sure how quotas could have an effect like this (maybe unless
> the limit is reached, but that's also not the case).
>
> Thanks again if anyone has an idea.
> Cheers
> Richard
> --
> /dev/null
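
To make the manual fix I describe above a bit more concrete, here is a rough
sketch of the steps as they would apply to your volume. I am assuming that
sphere-four (Brick3) holds the bad copy, since its trusted.gfid in your
getfattr output differs from the other two bricks, and /mnt/home below is just
a placeholder for your client mount point. Please double-check everything
against [1] before running it:

# on sphere-four only, as root
brick=/srv/gluster_home/brick
file=romanoch/.mozilla/firefox/vzzqqxrm.default-1396429081309/sessionstore-backups/recovery.baklz4
gfid=da1c94b1-6435-44b1-8d5b-6f4654f60bf5   # trusted.gfid of the bad copy, written with dashes

# remove the file from the brick itself (never from the client mount)
rm -f "$brick/$file"

# remove the matching gfid hardlink; it lives under
# <brick>/.glusterfs/<first 2 hex chars>/<next 2 hex chars>/<gfid>
rm -f "$brick/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"

# from any client, look the file up so the good copies are healed back,
# then kick off / check the heal
stat /mnt/home/$file
gluster volume heal home
gluster volume heal home info

Once you are on 3.12, setting something like
"gluster volume set home cluster.favorite-child-policy mtime" should let
self-heal pick a winner for such gfid mismatches automatically [3], but
please verify that against the 3.12 documentation before relying on it.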