<div dir="ltr"><div>Hi Henrik,<br><br></div>Thanks for providing the required outputs. See my replies inline.<br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Dec 21, 2017 at 10:42 PM, Henrik Juul Pedersen <span dir="ltr"><<a href="mailto:hjp@liab.dk" target="_blank">hjp@liab.dk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Karthik and Ben,<br>
<br>
I'll try and reply to you inline.<br>
<span class=""><br>
On 21 December 2017 at 07:18, Karthik Subrahmanya <<a href="mailto:ksubrahm@redhat.com">ksubrahm@redhat.com</a>> wrote:<br>
> Hey,<br>
><br>
> Can you give us the volume info output for this volume?<br>
<br>
</span># gluster volume info virt_images<br>
<div><div class="h5"><br>
Volume Name: virt_images<br>
Type: Replicate<br>
Volume ID: 9f3c8273-4d9d-4af2-a4e7-4cb4a51e3594<br>
Status: Started<br>
Snapshot Count: 2<br>
Number of Bricks: 1 x (2 + 1) = 3<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1: virt3:/data/virt_images/brick<br>
Brick2: virt2:/data/virt_images/brick<br>
Brick3: printserver:/data/virt_images/brick (arbiter)<br>
Options Reconfigured:<br>
features.quota-deem-statfs: on<br>
features.inode-quota: on<br>
features.quota: on<br>
features.barrier: disable<br>
features.scrub: Active<br>
features.bitrot: on<br>
nfs.rpc-auth-allow: on<br>
server.allow-insecure: on<br>
user.cifs: off<br>
features.shard: off<br>
cluster.shd-wait-qlength: 10000<br>
cluster.locking-scheme: granular<br>
cluster.data-self-heal-algorithm: full<br>
cluster.server-quorum-type: server<br>
cluster.quorum-type: auto<br>
cluster.eager-lock: enable<br>
network.remote-dio: enable<br>
performance.low-prio-threads: 32<br>
performance.io-cache: off<br>
performance.read-ahead: off<br>
performance.quick-read: off<br>
nfs.disable: on<br>
transport.address-family: inet<br>
server.outstanding-rpc-limit: 512<br>
<br>
</div></div><span class="">> Why are you not able to get the xattrs from arbiter brick? It is the same<br>
> way as you do it on data bricks.<br>
<br>
</span>Yes, I must have confused myself yesterday somehow; here it is in full<br>
from all three bricks:<br>
<br>
Brick 1 (virt2): # getfattr -d -m . -e hex fedora27.qcow2<br>
<span class=""># file: fedora27.qcow2<br>
trusted.afr.dirty=0x000000000000000000000000<br>
trusted.afr.virt_images-client-1=0x000002280000000000000000<br>
trusted.afr.virt_images-client-3=0x000000000000000000000000<br>
trusted.bit-rot.version=0x1d000000000000005a3aa0db000c6563<br>
trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba<br>
trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732<br>
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a49eb0000000000000000001<br>
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001<br>
<br>
</span>Brick 2 (virt3): # getfattr -d -m . -e hex fedora27.qcow2<br>
# file: fedora27.qcow2<br>
trusted.afr.dirty=0x000000000000000000000000<br>
<span class="">trusted.afr.virt_images-client-2=0x000003ef0000000000000000<br>
trusted.afr.virt_images-client-3=0x000000000000000000000000<br>
trusted.bit-rot.version=0x19000000000000005a3a9f82000c382a<br>
trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba<br>
trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732<br>
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a2fbe0000000000000000001<br>
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001<br>
<br>
</span>Brick 3 - arbiter (printserver): # getfattr -d -m . -e hex fedora27.qcow2<br>
<span class=""># file: fedora27.qcow2<br>
trusted.afr.dirty=0x000000000000000000000000<br>
trusted.afr.virt_images-client-1=0x000002280000000000000000<br>
</span>trusted.bit-rot.version=0x31000000000000005a39237200073206<br>
trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba<br>
trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732<br>
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000000000000000000000000001<br>
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001<br>
<br>
I was expecting trusted.afr.virt_images-client-{1,2,3} on all bricks?<br></blockquote><div>Since AFR-V2 there are no self-blaming xattrs, so a brick only blames the other bricks.<br></div><div>For example, brick1 can blame brick2 and brick3, but not itself.<br></div>
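<div>For reference, each trusted.afr changelog value is, as far as I know, three 32-bit counters (data, metadata and entry operations) packed into the 24 hex digits you see. A minimal sketch on virt2, using the brick path from your volume info (run it against the brick, not the mount):<br>
# getfattr -n trusted.afr.virt_images-client-1 -e hex /data/virt_images/brick/fedora27.qcow2<br>
trusted.afr.virt_images-client-1=0x000002280000000000000000<br>
Here the first 8 digits (0x00000228 = 552) are pending data operations blamed on client-1, while the metadata and entry parts are zero.<br></div>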
<span class=""><br>
> The changelog xattrs are named trusted.afr.virt_images-client-{1,2,3} in the<br>
> getxattr outputs you have provided.<br>
> Did you do a remove-brick and add-brick at any time? Otherwise it would usually be<br>
> trusted.afr.virt_images-client-{0,1,2}.<br>
<br>
</span>Yes, the bricks were moved around initially; brick 0 was re-created as<br>
brick 2, and the arbiter was added later on as well.<br>
<span class=""><br>
><br>
> To overcome this scenario you can do what Ben Turner had suggested. Select<br>
> the source copy and change the xattrs manually.<br>
<br>
</span>I wouldn't mind doing that, but again, the guides assume that I have<br>
trusted.afr.virt_images-client-{1,2,3} on all bricks, so I'm not sure<br>
what to change to what, where. </blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<span class=""><br>
> I suspect it has hit the "arbiter becoming source for data heal" bug.<br>
> But to confirm that we need the xattrs from the arbiter brick as well.<br>
><br>
> Regards,<br>
> Karthik<br>
><br>
><br>
> On Thu, Dec 21, 2017 at 9:55 AM, Ben Turner <<a href="mailto:bturner@redhat.com">bturner@redhat.com</a>> wrote:<br>
>><br>
>> Here is the process for resolving split brain on replica 2:<br>
>><br>
>><br>
>> <a href="https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Recovering_from_File_Split-brain.html" rel="noreferrer" target="_blank">https://access.redhat.com/<wbr>documentation/en-US/Red_Hat_<wbr>Storage/2.1/html/<wbr>Administration_Guide/<wbr>Recovering_from_File_Split-<wbr>brain.html</a><br>
>><br>
>> It should be pretty much the same for replica 3, you change the xattrs<br>
>> with something like:<br>
>><br>
>> # setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000<br>
>> /gfs/brick-b/a<br>
>><br>
>> When I try to decide which copy to use I normally run things like:<br>
>><br>
>> # stat /<path to brick>/path/to/file<br>
>><br>
>> Check out the access and change times of the file on the back end bricks.<br>
>> I normally pick the copy with the latest access / change times. I'll also<br>
>> check:<br>
>><br>
>> # md5sum /<path to brick>/path/to/file<br>
>><br>
>> Compare the hashes of the file on both bricks to see if the data actually<br>
>> differs. If the data is the same it makes choosing the proper replica<br>
>> easier.<br>
<br>
</span>The files on the bricks differ, so something was changed and<br>
not replicated.<br>
<br>
Thanks for the input. I've looked at that, but couldn't get it to fit,<br>
as I don't have trusted.afr.virt_images-client-{1,2,3} on all bricks.<br></blockquote><div>You can choose either copy as the good one, based on the latest ctime/mtime.<br></div><div>Before doing anything, keep a backup of both copies so that, if something goes wrong,<br></div><div>the data is still safe.<br></div><div>Then mark one copy as good (based on timestamps, size, or simply choosing a brick as the source),<br>and reset the xattrs blaming it on the other brick. Then do a lookup on that file from the mount.<br>That should resolve the issue.<br>Once you are done, please let us know the result.<br><br></div>
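<div>To make that concrete, here is a minimal sketch (illustration only - which client-N index you reset depends on which copy you keep, and the fuse volfile under /var/lib/glusterd/vols/virt_images/ should show which client-N maps to which brick; the mount point below is assumed):<br>
First back up the file on both data bricks:<br>
# cp -a /data/virt_images/brick/fedora27.qcow2 /root/fedora27.qcow2.bak<br>
Then, on the brick holding the copy you decided is bad, reset the changelog xattr that blames your chosen good copy (here assuming that is client-2):<br>
# setfattr -n trusted.afr.virt_images-client-2 -v 0x000000000000000000000000 /data/virt_images/brick/fedora27.qcow2<br>
Finally, trigger a lookup from the client mount and check that the heal completes:<br>
# stat /mnt/virt_images/fedora27.qcow2<br>
# gluster volume heal virt_images info<br></div>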
<span class=""><br>
>><br>
>> Any idea how you got in this situation? Did you have a loss of NW<br>
>> connectivity? I see you are using server side quorum, maybe check the logs<br>
>> for any loss of quorum? I wonder if there was a loss of quorum and there<br>
>> was some sort of race condition hit:<br>
>><br>
>><br>
>> <a href="http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#server-quorum-and-some-pitfalls" rel="noreferrer" target="_blank">http://docs.gluster.org/en/<wbr>latest/Administrator%20Guide/<wbr>arbiter-volumes-and-quorum/#<wbr>server-quorum-and-some-<wbr>pitfalls</a><br>
>><br>
>> "Unlike in client-quorum where the volume becomes read-only when quorum is<br>
>> lost, loss of server-quorum in a particular node makes glusterd kill the<br>
>> brick processes on that node (for the participating volumes) making even<br>
>> reads impossible."<br>
<br>
</span>I might have had a loss of server quorum, but I can't seem to see<br>
exactly why or when from the logs:<br>
<br>
Times are synchronized between servers. Virt 3 was rebooted for<br>
service at 17:29:39. The shutdown logs show an issue with unmounting<br>
the bricks, probably because glusterd was still running:<br>
Dec 20 17:29:39 virt3 systemd[1]: Failed unmounting /data/virt_images.<br>
Dec 20 17:29:39 virt3 systemd[1]: data-filserver.mount: Mount process<br>
exited, code=exited status=32<br>
Dec 20 17:29:39 virt3 systemd[1]: Failed unmounting /data/filserver.<br>
Dec 20 17:29:39 virt3 systemd[1]: Unmounted /virt_images.<br>
Dec 20 17:29:39 virt3 systemd[1]: Stopped target Network is Online.<br>
Dec 20 17:29:39 virt3 systemd[1]: Stopping GlusterFS, a clustered<br>
file-system server...<br>
Dec 20 17:29:39 virt3 systemd[1]: Stopping Network Name Resolution...<br>
Dec 20 17:29:39 virt3 systemd[1]: Stopped GlusterFS, a clustered<br>
file-system server.<br>
<br>
I believe it was around this time that the virtual machine (running on<br>
virt2) was stopped by qemu.<br>
<br>
<br>
Brick 1 (virt2) only experienced loss of quorum when starting gluster<br>
(glusterd.log confirms this):<br>
Dec 20 17:22:03 virt2 systemd[1]: Starting GlusterFS, a clustered<br>
file-system server...<br>
Dec 20 17:22:05 virt2 glusterd[739]: [2017-12-20 16:22:05.997472] C<br>
[MSGID: 106002]<br>
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum lost for volume filserver. Stopping local<br>
bricks.<br>
Dec 20 17:22:05 virt2 glusterd[739]: [2017-12-20 16:22:05.997666] C<br>
[MSGID: 106002]<br>
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum lost for volume virt_images. Stopping<br>
local bricks.<br>
Dec 20 17:22:06 virt2 systemd[1]: Started GlusterFS, a clustered<br>
file-system server.<br>
Dec 20 17:22:11 virt2 glusterd[739]: [2017-12-20 16:22:11.387238] C<br>
[MSGID: 106003]<br>
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum regained for volume filserver. Starting<br>
local bricks.<br>
Dec 20 17:22:11 virt2 glusterd[739]: [2017-12-20 16:22:11.390417] C<br>
[MSGID: 106003]<br>
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum regained for volume virt_images. Starting<br>
local bricks.<br>
-- Reboot --<br>
Dec 20 18:41:35 virt2 systemd[1]: Starting GlusterFS, a clustered<br>
file-system server...<br>
Dec 20 18:41:41 virt2 systemd[1]: Started GlusterFS, a clustered<br>
file-system server.<br>
Dec 20 18:41:43 virt2 glusterd[748]: [2017-12-20 17:41:43.387633] C<br>
[MSGID: 106003]<br>
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum regained for volume filserver. Starting<br>
local bricks.<br>
Dec 20 18:41:43 virt2 glusterd[748]: [2017-12-20 17:41:43.391080] C<br>
[MSGID: 106003]<br>
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum regained for volume virt_images. Starting<br>
local bricks.<br>
<br>
<br>
Brick 2 (virt3) shows a network outage on the 19th, but everything<br>
worked fine afterwards:<br>
Dec 19 13:11:34 virt3 glusterd[10058]: [2017-12-19 12:11:34.382207] C<br>
[MSGID: 106003]<br>
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum regained for volume filserver. Starting<br>
local bricks.<br>
Dec 19 13:11:34 virt3 glusterd[10058]: [2017-12-19 12:11:34.387324] C<br>
[MSGID: 106003]<br>
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum regained for volume virt_images. Starting<br>
local bricks.<br>
Dec 20 17:29:39 virt3 systemd[1]: Stopping GlusterFS, a clustered<br>
file-system server...<br>
Dec 20 17:29:39 virt3 systemd[1]: Stopped GlusterFS, a clustered<br>
file-system server.<br>
-- Reboot --<br>
Dec 20 17:30:21 virt3 systemd[1]: Starting GlusterFS, a clustered<br>
file-system server...<br>
Dec 20 17:30:22 virt3 glusterd[394]: [2017-12-20 16:30:22.826828] C<br>
[MSGID: 106002]<br>
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum lost for volume filserver. Stopping local<br>
bricks.<br>
Dec 20 17:30:22 virt3 glusterd[394]: [2017-12-20 16:30:22.827188] C<br>
[MSGID: 106002]<br>
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum lost for volume virt_images. Stopping<br>
local bricks.<br>
Dec 20 17:30:23 virt3 systemd[1]: Started GlusterFS, a clustered<br>
file-system server.<br>
Dec 20 17:30:29 virt3 glusterd[394]: [2017-12-20 16:30:29.488000] C<br>
[MSGID: 106003]<br>
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum regained for volume filserver. Starting<br>
local bricks.<br>
Dec 20 17:30:29 virt3 glusterd[394]: [2017-12-20 16:30:29.491446] C<br>
[MSGID: 106003]<br>
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum regained for volume virt_images. Starting<br>
local bricks.<br>
Dec 20 18:31:06 virt3 systemd[1]: Stopping GlusterFS, a clustered<br>
file-system server...<br>
Dec 20 18:31:06 virt3 systemd[1]: Stopped GlusterFS, a clustered<br>
file-system server.<br>
-- Reboot --<br>
Dec 20 18:31:46 virt3 systemd[1]: Starting GlusterFS, a clustered<br>
file-system server...<br>
Dec 20 18:31:46 virt3 glusterd[386]: [2017-12-20 17:31:46.958818] C<br>
[MSGID: 106002]<br>
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum lost for volume filserver. Stopping local<br>
bricks.<br>
Dec 20 18:31:46 virt3 glusterd[386]: [2017-12-20 17:31:46.959168] C<br>
[MSGID: 106002]<br>
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum lost for volume virt_images. Stopping<br>
local bricks.<br>
Dec 20 18:31:47 virt3 systemd[1]: Started GlusterFS, a clustered<br>
file-system server.<br>
Dec 20 18:33:10 virt3 glusterd[386]: [2017-12-20 17:33:10.156180] C<br>
[MSGID: 106001]<br>
[glusterd-volume-ops.c:1534:glusterd_op_stage_start_volume]<br>
0-management: Server quorum not met. Rejecting operation.<br>
Dec 20 18:35:58 virt3 glusterd[386]: [2017-12-20 17:35:58.440395] C<br>
[MSGID: 106003]<br>
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum regained for volume filserver. Starting<br>
local bricks.<br>
Dec 20 18:35:58 virt3 glusterd[386]: [2017-12-20 17:35:58.446203] C<br>
[MSGID: 106003]<br>
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum regained for volume virt_images. Starting<br>
local bricks.<br>
<br>
Brick 3 - arbiter (printserver) shows no loss of quorum at that time<br>
(again, glusterd.log confirms):<br>
Dec 19 15:33:24 printserver systemd[1]: Starting GlusterFS, a<br>
clustered file-system server...<br>
Dec 19 15:33:26 printserver glusterd[306]: [2017-12-19<br>
14:33:26.432369] C [MSGID: 106002]<br>
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum lost for volume filserver. Stopping local<br>
bricks.<br>
Dec 19 15:33:26 printserver glusterd[306]: [2017-12-19<br>
14:33:26.432606] C [MSGID: 106002]<br>
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum lost for volume virt_images. Stopping<br>
local bricks.<br>
Dec 19 15:33:26 printserver systemd[1]: Started GlusterFS, a clustered<br>
file-system server.<br>
Dec 19 15:34:18 printserver glusterd[306]: [2017-12-19<br>
14:34:18.158756] C [MSGID: 106003]<br>
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum regained for volume filserver. Starting<br>
local bricks.<br>
Dec 19 15:34:18 printserver glusterd[306]: [2017-12-19<br>
14:34:18.162242] C [MSGID: 106003]<br>
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum regained for volume virt_images. Starting<br>
local bricks.<br>
Dec 20 18:28:52 printserver systemd[1]: Stopping GlusterFS, a<br>
clustered file-system server...<br>
Dec 20 18:28:52 printserver systemd[1]: Stopped GlusterFS, a clustered<br>
file-system server.<br>
-- Reboot --<br>
Dec 20 18:30:40 printserver systemd[1]: Starting GlusterFS, a<br>
clustered file-system server...<br>
Dec 20 18:30:42 printserver glusterd[278]: [2017-12-20<br>
17:30:42.441675] C [MSGID: 106002]<br>
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum lost for volume filserver. Stopping local<br>
bricks.<br>
Dec 20 18:30:42 printserver glusterd[278]: [2017-12-20<br>
17:30:42.441929] C [MSGID: 106002]<br>
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum lost for volume virt_images. Stopping<br>
local bricks.<br>
Dec 20 18:30:42 printserver systemd[1]: Started GlusterFS, a clustered<br>
file-system server.<br>
Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20<br>
17:33:49.005534] C [MSGID: 106003]<br>
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum regained for volume filserver. Starting<br>
local bricks.<br>
Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20<br>
17:33:49.008010] C [MSGID: 106003]<br>
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]<br>
0-management: Server quorum regained for volume virt_images. Starting<br>
local bricks.<br>
<span class=""><br>
>><br>
>> I wonder if the killing of brick processes could have led to some sort of<br>
>> race condition where writes were serviced on one brick / the arbiter and not<br>
>> the other?<br>
>><br>
>> If you can find a reproducer for this please open a BZ with it, I have<br>
>> been seeing something similar(I think) but I haven't been able to run the<br>
>> issue down yet.<br>
>><br>
>> -b<br>
<br>
</span>I'm not sure if I can replicate this; a lot has been going on in my<br>
setup the past few days (trying to tune some horrible small-file and<br>
file creation/deletion performance).<br>
<br>
Thanks for looking into this with me.<br>
<div class="HOEnZb"><div class="h5"><br>
Best regards,<br>
Henrik Juul Pedersen<br>
LIAB ApS<br>
</div></div></blockquote></div><br></div></div>