<div dir="ltr"><div><div><div><div><div><div><div>Hey,<br><br></div>Can you give us the volume info output for this volume?<br></div>Why are you not able to get the xattrs from arbiter brick? It is the same way as you do it on data bricks.<br></div><div>The changelog xattrs are named trusted.afr.virt_images-<wbr>client-{1,2,3} in the getxattr outputs you have provided.<br></div><div>Did you do a remove-brick and add-brick any time? Otherwise it will be trusted.afr.virt_images-<wbr>client-{0,1,2} usually.<br></div><br></div>To overcome this scenario you can do what Ben Turner had suggested. Select the source copy and change the xattrs manually.<br></div>I am suspecting that it has hit the arbiter becoming source for data heal bug. But to confirm that we need the xattrs on the arbiter brick also.<br><br></div>Regards,<br></div>Karthik<br><div><div><div><div><div><div><div><br></div></div></div></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Dec 21, 2017 at 9:55 AM, Ben Turner <span dir="ltr"><<a href="mailto:bturner@redhat.com" target="_blank">bturner@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Here is the process for resolving split brain on replica 2:<br>
<br>
<a href="https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Recovering_from_File_Split-brain.html" rel="noreferrer" target="_blank">https://access.redhat.com/<wbr>documentation/en-US/Red_Hat_<wbr>Storage/2.1/html/<wbr>Administration_Guide/<wbr>Recovering_from_File_Split-<wbr>brain.html</a><br>
<br>
It should be pretty much the same for replica 3; you change the xattrs with something like:<br>
<br>
# setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000 /gfs/brick-b/a<br>
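For reference when choosing a value: each trusted.afr.* changelog value packs three big-endian 32-bit counters (pending data, metadata, and entry operations, in that order). A small shell sketch to decode the hex strings that getfattr -e hex prints (the sample values are taken from the outputs quoted below):

```shell
# Decode a trusted.afr.* changelog value (as printed by getfattr -e hex)
# into its three big-endian 32-bit counters:
# pending data ops | pending metadata ops | pending entry ops.
decode_afr() {
    v=${1#0x}
    d=$(printf '%s' "$v" | cut -c1-8)    # data changelog
    m=$(printf '%s' "$v" | cut -c9-16)   # metadata changelog
    e=$(printf '%s' "$v" | cut -c17-24)  # entry changelog
    printf 'data=%d metadata=%d entry=%d\n' "0x$d" "0x$m" "0x$e"
}

decode_afr 0x000002280000000000000000   # -> data=552 metadata=0 entry=0
decode_afr 0x000000000000000000000000   # -> all zero: nothing pending
```

A brick's trusted.afr.&lt;volume&gt;-client-N xattr counts operations that brick believes are still pending on brick N, so a non-zero data counter on two bricks blaming each other is what data split-brain looks like on disk.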
<br>
When I try to decide which copy to use I normally run things like:<br>
<br>
# stat /<path to brick>/path/to/file<br>
<br>
Check the access and change times of the file on the back-end bricks; I normally pick the copy with the latest access/change times. I'll also check:<br>
<br>
# md5sum /<path to brick>/path/to/file<br>
<br>
Compare the hashes of the file on both bricks to see if the data actually differs. If the data is the same, choosing the proper replica is easier.<br>
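The stat/md5sum comparison can be scripted per brick. A minimal local sketch (the brick paths and file contents below are stand-ins for the demo; on a real cluster you would run the same stat/md5sum commands on each brick host, e.g. over ssh):

```shell
#!/bin/sh
# Stand-in brick paths for the demo; replace with the real brick mounts.
BRICK_A=/tmp/demo-brick-a
BRICK_B=/tmp/demo-brick-b
F=fedora27.qcow2

mkdir -p "$BRICK_A" "$BRICK_B"
printf 'copy A\n' > "$BRICK_A/$F"   # fake diverged copies for the demo
printf 'copy B\n' > "$BRICK_B/$F"

# Show modification time and checksum side by side, as you would per brick.
for b in "$BRICK_A" "$BRICK_B"; do
    stat -c '%n mtime=%Y' "$b/$F"
    md5sum "$b/$F"
done

# Identical checksums mean the data matches and either copy can be the source.
if [ "$(md5sum < "$BRICK_A/$F")" = "$(md5sum < "$BRICK_B/$F")" ]; then
    echo "data identical"
else
    echo "data differs"
fi
```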
<br>
Any idea how you got into this situation? Did you have a loss of network connectivity? I see you are using server-side quorum; maybe check the logs for any loss of quorum? I wonder if there was a loss of quorum and some sort of race condition was hit:<br>
<br>
<a href="http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#server-quorum-and-some-pitfalls" rel="noreferrer" target="_blank">http://docs.gluster.org/en/<wbr>latest/Administrator%20Guide/<wbr>arbiter-volumes-and-quorum/#<wbr>server-quorum-and-some-<wbr>pitfalls</a><br>
<br>
"Unlike in client-quorum where the volume becomes read-only when quorum is lost, loss of server-quorum in a particular node makes glusterd kill the brick processes on that node (for the participating volumes) making even reads impossible."<br>
<br>
I wonder if the killing of brick processes could have led to some sort of race condition where writes were serviced on one data brick and the arbiter but not on the other data brick?<br>
<br>
If you can find a reproducer for this please open a BZ with it. I have been seeing something similar (I think) but I haven't been able to run the issue down yet.<br>
<span class="HOEnZb"><font color="#888888"><br>
-b<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
----- Original Message -----<br>
> From: "Henrik Juul Pedersen" <<a href="mailto:hjp@liab.dk">hjp@liab.dk</a>><br>
> To: <a href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a><br>
> Cc: "Henrik Juul Pedersen" <<a href="mailto:henrik@corepower.dk">henrik@corepower.dk</a>><br>
> Sent: Wednesday, December 20, 2017 1:26:37 PM<br>
> Subject: [Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware<br>
><br>
> Hi,<br>
><br>
> I have the following volume:<br>
><br>
> Volume Name: virt_images<br>
> Type: Replicate<br>
> Volume ID: 9f3c8273-4d9d-4af2-a4e7-<wbr>4cb4a51e3594<br>
> Status: Started<br>
> Snapshot Count: 2<br>
> Number of Bricks: 1 x (2 + 1) = 3<br>
> Transport-type: tcp<br>
> Bricks:<br>
> Brick1: virt3:/data/virt_images/brick<br>
> Brick2: virt2:/data/virt_images/brick<br>
> Brick3: printserver:/data/virt_images/<wbr>brick (arbiter)<br>
> Options Reconfigured:<br>
> features.quota-deem-statfs: on<br>
> features.inode-quota: on<br>
> features.quota: on<br>
> features.barrier: disable<br>
> features.scrub: Active<br>
> features.bitrot: on<br>
> nfs.rpc-auth-allow: on<br>
> server.allow-insecure: on<br>
> user.cifs: off<br>
> features.shard: off<br>
> cluster.shd-wait-qlength: 10000<br>
> cluster.locking-scheme: granular<br>
> cluster.data-self-heal-<wbr>algorithm: full<br>
> cluster.server-quorum-type: server<br>
> cluster.quorum-type: auto<br>
> cluster.eager-lock: enable<br>
> network.remote-dio: enable<br>
> performance.low-prio-threads: 32<br>
> performance.io-cache: off<br>
> performance.read-ahead: off<br>
> performance.quick-read: off<br>
> nfs.disable: on<br>
> transport.address-family: inet<br>
> server.outstanding-rpc-limit: 512<br>
><br>
> After a server reboot (brick 1) a single file has become unavailable:<br>
> # touch fedora27.qcow2<br>
> touch: setting times of 'fedora27.qcow2': Input/output error<br>
><br>
> Looking at the split brain status from the client side cli:<br>
> # getfattr -n replica.split-brain-status fedora27.qcow2<br>
> # file: fedora27.qcow2<br>
> replica.split-brain-status="<wbr>The file is not under data or metadata<br>
> split-brain"<br>
><br>
> However, in the client side log, a split brain is mentioned:<br>
> [2017-12-20 18:05:23.570762] E [MSGID: 108008]<br>
> [afr-transaction.c:2629:afr_<wbr>write_txn_refresh_done]<br>
> 0-virt_images-replicate-0: Failing SETATTR on gfid<br>
> 7a36937d-52fc-4b55-a932-<wbr>99e2328f02ba: split-brain observed.<br>
> [Input/output error]<br>
> [2017-12-20 18:05:23.576046] W [MSGID: 108027]<br>
> [afr-common.c:2733:afr_<wbr>discover_done] 0-virt_images-replicate-0: no<br>
> read subvols for /fedora27.qcow2<br>
> [2017-12-20 18:05:23.578149] W [fuse-bridge.c:1153:fuse_<wbr>setattr_cbk]<br>
> 0-glusterfs-fuse: 182: SETATTR() /fedora27.qcow2 => -1 (Input/output<br>
> error)<br>
><br>
> = Server side<br>
><br>
> No mention of a possible split brain:<br>
> # gluster volume heal virt_images info split-brain<br>
> Brick virt3:/data/virt_images/brick<br>
> Status: Connected<br>
> Number of entries in split-brain: 0<br>
><br>
> Brick virt2:/data/virt_images/brick<br>
> Status: Connected<br>
> Number of entries in split-brain: 0<br>
><br>
> Brick printserver:/data/virt_images/<wbr>brick<br>
> Status: Connected<br>
> Number of entries in split-brain: 0<br>
><br>
> The info command shows the file:<br>
> # gluster volume heal virt_images info<br>
> Brick virt3:/data/virt_images/brick<br>
> /fedora27.qcow2<br>
> Status: Connected<br>
> Number of entries: 1<br>
><br>
> Brick virt2:/data/virt_images/brick<br>
> /fedora27.qcow2<br>
> Status: Connected<br>
> Number of entries: 1<br>
><br>
> Brick printserver:/data/virt_images/<wbr>brick<br>
> /fedora27.qcow2<br>
> Status: Connected<br>
> Number of entries: 1<br>
><br>
><br>
> The heal and heal full commands do nothing, and I can't find<br>
> anything in the logs about them trying and failing to fix the file.<br>
><br>
> Trying to manually resolve the split brain from cli gives the following:<br>
> # gluster volume heal virt_images split-brain source-brick<br>
> virt3:/data/virt_images/brick /fedora27.qcow2<br>
> Healing /fedora27.qcow2 failed: File not in split-brain.<br>
> Volume heal failed.<br>
><br>
> The attrs from virt2 and virt3 are as follows:<br>
> [root@virt2 brick]# getfattr -d -m . -e hex fedora27.qcow2<br>
> # file: fedora27.qcow2<br>
> trusted.afr.dirty=<wbr>0x000000000000000000000000<br>
> trusted.afr.virt_images-<wbr>client-1=<wbr>0x000002280000000000000000<br>
> trusted.afr.virt_images-<wbr>client-3=<wbr>0x000000000000000000000000<br>
> trusted.bit-rot.version=<wbr>0x1d000000000000005a3aa0db000c<wbr>6563<br>
> trusted.gfid=<wbr>0x7a36937d52fc4b55a93299e2328f<wbr>02ba<br>
> trusted.gfid2path.<wbr>c076c6ac27a43012=<wbr>0x30303030303030302d303030302d<wbr>303030302d303030302d3030303030<wbr>303030303030312f6665646f726132<wbr>372e71636f7732<br>
> trusted.glusterfs.quota.<wbr>00000000-0000-0000-0000-<wbr>000000000001.contri.1=<wbr>0x00000000a49eb000000000000000<wbr>0001<br>
> trusted.pgfid.00000000-0000-<wbr>0000-0000-000000000001=<wbr>0x00000001<br>
><br>
> # file: fedora27.qcow2<br>
> trusted.afr.dirty=<wbr>0x000000000000000000000000<br>
> trusted.afr.virt_images-<wbr>client-2=<wbr>0x000003ef0000000000000000<br>
> trusted.afr.virt_images-<wbr>client-3=<wbr>0x000000000000000000000000<br>
> trusted.bit-rot.version=<wbr>0x19000000000000005a3a9f82000c<wbr>382a<br>
> trusted.gfid=<wbr>0x7a36937d52fc4b55a93299e2328f<wbr>02ba<br>
> trusted.gfid2path.<wbr>c076c6ac27a43012=<wbr>0x30303030303030302d303030302d<wbr>303030302d303030302d3030303030<wbr>303030303030312f6665646f726132<wbr>372e71636f7732<br>
> trusted.glusterfs.quota.<wbr>00000000-0000-0000-0000-<wbr>000000000001.contri.1=<wbr>0x00000000a2fbe000000000000000<wbr>0001<br>
> trusted.pgfid.00000000-0000-<wbr>0000-0000-000000000001=<wbr>0x00000001<br>
><br>
> I don't know how to find similar information from the arbiter...<br>
><br>
> Versions are the same on all three systems:<br>
> # glusterd --version<br>
> glusterfs 3.12.2<br>
><br>
> # gluster volume get all cluster.op-version<br>
> Option Value<br>
> ------ -----<br>
> cluster.op-version 31202<br>
><br>
> I might try upgrading to version 3.13.0 tomorrow, but I want to hear<br>
> you out first.<br>
><br>
> How do I fix this? Do I have to manually change the file attributes?<br>
><br>
> Also, in the guides for manual resolution through setfattr, all the<br>
> bricks are listed with a "trusted.afr.<volume>-client-<<wbr>brick>". But in<br>
> my system (as can be seen above), I only see the other bricks. So<br>
> which attributes should be changed into what?<br>
><br>
><br>
><br>
> I hope someone might know a solution. If you need any more information<br>
> I'll try and provide it. I can probably change the virtual machine to<br>
> another image for now.<br>
><br>
> Best regards,<br>
> Henrik Juul Pedersen<br>
> LIAB ApS<br>
> ______________________________<wbr>_________________<br>
> Gluster-users mailing list<br>
> <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
> <a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/<wbr>mailman/listinfo/gluster-users</a><br>
><br>
______________________________<wbr>_________________<br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
<a href="http://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer" target="_blank">http://lists.gluster.org/<wbr>mailman/listinfo/gluster-users</a><br>
</div></div></blockquote></div><br></div>