[Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware

Henrik Juul Pedersen hjp at liab.dk
Thu Dec 21 17:12:15 UTC 2017


Hi Karthik and Ben,

I'll try and reply to you inline.

On 21 December 2017 at 07:18, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
> Hey,
>
> Can you give us the volume info output for this volume?

# gluster volume info virt_images

Volume Name: virt_images
Type: Replicate
Volume ID: 9f3c8273-4d9d-4af2-a4e7-4cb4a51e3594
Status: Started
Snapshot Count: 2
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: virt3:/data/virt_images/brick
Brick2: virt2:/data/virt_images/brick
Brick3: printserver:/data/virt_images/brick (arbiter)
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.barrier: disable
features.scrub: Active
features.bitrot: on
nfs.rpc-auth-allow: on
server.allow-insecure: on
user.cifs: off
features.shard: off
cluster.shd-wait-qlength: 10000
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
nfs.disable: on
transport.address-family: inet
server.outstanding-rpc-limit: 512

> Why are you not able to get the xattrs from arbiter brick? It is the same
> way as you do it on data bricks.

Yes, I must have confused myself yesterday somehow; here it is in full
from all three bricks:

Brick 1 (virt2): # getfattr -d -m . -e hex fedora27.qcow2
# file: fedora27.qcow2
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.virt_images-client-1=0x000002280000000000000000
trusted.afr.virt_images-client-3=0x000000000000000000000000
trusted.bit-rot.version=0x1d000000000000005a3aa0db000c6563
trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a49eb0000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001

Brick 2 (virt3): # getfattr -d -m . -e hex fedora27.qcow2
# file: fedora27.qcow2
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.virt_images-client-2=0x000003ef0000000000000000
trusted.afr.virt_images-client-3=0x000000000000000000000000
trusted.bit-rot.version=0x19000000000000005a3a9f82000c382a
trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a2fbe0000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001

Brick 3 - arbiter (printserver): # getfattr -d -m . -e hex fedora27.qcow2
# file: fedora27.qcow2
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.virt_images-client-1=0x000002280000000000000000
trusted.bit-rot.version=0x31000000000000005a39237200073206
trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000000000000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001

I was expecting trusted.afr.virt_images-client-{1,2,3} to be present on all three bricks?
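
If I read the AFR changelog format correctly, each value is three big-endian
32-bit counters (data / metadata / entry pending operations), so e.g. virt2's
trusted.afr.virt_images-client-1=0x000002280000000000000000 would mean 0x228 =
552 pending data operations against client-1, and nothing pending for metadata
or entries. A quick sanity check of the hex arithmetic:

# printf '%d\n' 0x228
552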

> The changelog xattrs are named trusted.afr.virt_images-client-{1,2,3} in the
> getxattr outputs you have provided.
> Did you do a remove-brick and add-brick any time? Otherwise it will be
> trusted.afr.virt_images-client-{0,1,2} usually.

Yes, the bricks were moved around initially; brick 0 was re-created as
brick 2, and the arbiter was added later on as well.
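
To be sure which client-N index maps to which brick after all that shuffling,
I assume I can read it out of the generated client volfile (file name and path
assumed for a default glusterd install):

# grep -E 'volume virt_images-client-|remote-host|remote-subvolume' /var/lib/glusterd/vols/virt_images/trusted-virt_images.tcp-fuse.vol

That should list each virt_images-client-N block together with the host and
brick path it points at.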

>
> To overcome this scenario you can do what Ben Turner had suggested. Select
> the source copy and change the xattrs manually.

I wouldn't mind doing that, but again, the guides assume that I have
trusted.afr.virt_images-client-{1,2,3} on all bricks, so I'm not sure
what to change to what, and where.
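
If, say, the volfile shows that client-2 is virt2 and I decide virt2 holds the
good copy, is the right move then to clear virt3's pending counters against
virt2 and let self-heal overwrite virt3? This is only my guess (untested),
something like:

On virt3 (the brick to be overwritten):
# setfattr -n trusted.afr.virt_images-client-2 -v 0x000000000000000000000000 /data/virt_images/brick/fedora27.qcow2

Then trigger a heal from any node:
# gluster volume heal virt_images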

> I am suspecting that it has hit the arbiter becoming source for data heal
> bug. But to confirm that we need the xattrs on the arbiter brick also.
>
> Regards,
> Karthik
>
>
> On Thu, Dec 21, 2017 at 9:55 AM, Ben Turner <bturner at redhat.com> wrote:
>>
>> Here is the process for resolving split brain on replica 2:
>>
>>
>> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Recovering_from_File_Split-brain.html
>>
>> It should be pretty much the same for replica 3; you change the xattrs
>> with something like:
>>
>> # setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000
>> /gfs/brick-b/a
>>
>> When I try to decide which copy to use I normally run things like:
>>
>> # stat /<path to brick>/path/to/file
>>
>> Check out the access and change times of the file on the back end bricks.
>> I normally pick the copy with the latest access / change times.  I'll also
>> check:
>>
>> # md5sum /<path to brick>/path/to/file
>>
>> Compare the hashes of the file on both bricks to see if the data actually
>> differs.  If the data is the same it makes choosing the proper replica
>> easier.

The files on the bricks differ, so something was changed and not
replicated.
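
For reference, a quick way to compare the two data bricks (assuming
passwordless ssh between the nodes; the arbiter copy holds no file data,
so I left it out):

# for h in virt2 virt3; do ssh $h md5sum /data/virt_images/brick/fedora27.qcow2; done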

Thanks for the input. I've looked at that, but couldn't make it fit,
as I don't have trusted.afr.virt_images-client-{1,2,3} on all bricks.
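
I also considered the policy-based CLI for this, something along the lines of:

# gluster volume heal virt_images split-brain source-brick virt2:/data/virt_images/brick /fedora27.qcow2

but since the CLI doesn't even report the file as being in split-brain (hence
the subject line), I assume that's a dead end here?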

>>
>> Any idea how you got in this situation?  Did you have a loss of NW
>> connectivity?  I see you are using server side quorum, maybe check the logs
>> for any loss of quorum?  I wonder if there was a loss of quorum and there
>> was some sort of race condition hit:
>>
>>
>> http://docs.gluster.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/#server-quorum-and-some-pitfalls
>>
>> "Unlike in client-quorum where the volume becomes read-only when quorum is
>> lost, loss of server-quorum in a particular node makes glusterd kill the
>> brick processes on that node (for the participating volumes) making even
>> reads impossible."

I might have had a loss of server quorum, but I can't seem to see
exactly why or when from the logs:
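
For reference, something like the following should pull out the quorum
messages from the journal and glusterd.log (systemd unit name and log path
are assumptions based on my setup):

# journalctl -u glusterd --since "2017-12-19" | grep -i quorum
# grep -i quorum /var/log/glusterfs/glusterd.log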

Times are synchronized between the servers. virt3 was rebooted for
service at 17:29:39. The shutdown logs show an issue unmounting the
bricks, probably because glusterd was still running:
Dec 20 17:29:39 virt3 systemd[1]: Failed unmounting /data/virt_images.
Dec 20 17:29:39 virt3 systemd[1]: data-filserver.mount: Mount process
exited, code=exited status=32
Dec 20 17:29:39 virt3 systemd[1]: Failed unmounting /data/filserver.
Dec 20 17:29:39 virt3 systemd[1]: Unmounted /virt_images.
Dec 20 17:29:39 virt3 systemd[1]: Stopped target Network is Online.
Dec 20 17:29:39 virt3 systemd[1]: Stopping GlusterFS, a clustered
file-system server...
Dec 20 17:29:39 virt3 systemd[1]: Stopping Network Name Resolution...
Dec 20 17:29:39 virt3 systemd[1]: Stopped GlusterFS, a clustered
file-system server.

I believe it was around this time that the virtual machine (running on
virt2) was stopped by qemu.


Brick 1 (virt2) only experienced loss of quorum when starting gluster
(glusterd.log confirms this):
Dec 20 17:22:03 virt2 systemd[1]: Starting GlusterFS, a clustered
file-system server...
Dec 20 17:22:05 virt2 glusterd[739]: [2017-12-20 16:22:05.997472] C
[MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume filserver. Stopping local
bricks.
Dec 20 17:22:05 virt2 glusterd[739]: [2017-12-20 16:22:05.997666] C
[MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume virt_images. Stopping
local bricks.
Dec 20 17:22:06 virt2 systemd[1]: Started GlusterFS, a clustered
file-system server.
Dec 20 17:22:11 virt2 glusterd[739]: [2017-12-20 16:22:11.387238] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume filserver. Starting
local bricks.
Dec 20 17:22:11 virt2 glusterd[739]: [2017-12-20 16:22:11.390417] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume virt_images. Starting
local bricks.
-- Reboot --
Dec 20 18:41:35 virt2 systemd[1]: Starting GlusterFS, a clustered
file-system server...
Dec 20 18:41:41 virt2 systemd[1]: Started GlusterFS, a clustered
file-system server.
Dec 20 18:41:43 virt2 glusterd[748]: [2017-12-20 17:41:43.387633] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume filserver. Starting
local bricks.
Dec 20 18:41:43 virt2 glusterd[748]: [2017-12-20 17:41:43.391080] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume virt_images. Starting
local bricks.


Brick 2 (virt3) shows a network outage on the 19th, but everything
worked fine afterwards:
Dec 19 13:11:34 virt3 glusterd[10058]: [2017-12-19 12:11:34.382207] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume filserver. Starting
local bricks.
Dec 19 13:11:34 virt3 glusterd[10058]: [2017-12-19 12:11:34.387324] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume virt_images. Starting
local bricks.
Dec 20 17:29:39 virt3 systemd[1]: Stopping GlusterFS, a clustered
file-system server...
Dec 20 17:29:39 virt3 systemd[1]: Stopped GlusterFS, a clustered
file-system server.
-- Reboot --
Dec 20 17:30:21 virt3 systemd[1]: Starting GlusterFS, a clustered
file-system server...
Dec 20 17:30:22 virt3 glusterd[394]: [2017-12-20 16:30:22.826828] C
[MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume filserver. Stopping local
bricks.
Dec 20 17:30:22 virt3 glusterd[394]: [2017-12-20 16:30:22.827188] C
[MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume virt_images. Stopping
local bricks.
Dec 20 17:30:23 virt3 systemd[1]: Started GlusterFS, a clustered
file-system server.
Dec 20 17:30:29 virt3 glusterd[394]: [2017-12-20 16:30:29.488000] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume filserver. Starting
local bricks.
Dec 20 17:30:29 virt3 glusterd[394]: [2017-12-20 16:30:29.491446] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume virt_images. Starting
local bricks.
Dec 20 18:31:06 virt3 systemd[1]: Stopping GlusterFS, a clustered
file-system server...
Dec 20 18:31:06 virt3 systemd[1]: Stopped GlusterFS, a clustered
file-system server.
-- Reboot --
Dec 20 18:31:46 virt3 systemd[1]: Starting GlusterFS, a clustered
file-system server...
Dec 20 18:31:46 virt3 glusterd[386]: [2017-12-20 17:31:46.958818] C
[MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume filserver. Stopping local
bricks.
Dec 20 18:31:46 virt3 glusterd[386]: [2017-12-20 17:31:46.959168] C
[MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume virt_images. Stopping
local bricks.
Dec 20 18:31:47 virt3 systemd[1]: Started GlusterFS, a clustered
file-system server.
Dec 20 18:33:10 virt3 glusterd[386]: [2017-12-20 17:33:10.156180] C
[MSGID: 106001]
[glusterd-volume-ops.c:1534:glusterd_op_stage_start_volume]
0-management: Server quorum not met. Rejecting operation.
Dec 20 18:35:58 virt3 glusterd[386]: [2017-12-20 17:35:58.440395] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume filserver. Starting
local bricks.
Dec 20 18:35:58 virt3 glusterd[386]: [2017-12-20 17:35:58.446203] C
[MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume virt_images. Starting
local bricks.

Brick 3 - arbiter (printserver) shows no loss of quorum at that time
(again, glusterd.log confirms):
Dec 19 15:33:24 printserver systemd[1]: Starting GlusterFS, a
clustered file-system server...
Dec 19 15:33:26 printserver glusterd[306]: [2017-12-19
14:33:26.432369] C [MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume filserver. Stopping local
bricks.
Dec 19 15:33:26 printserver glusterd[306]: [2017-12-19
14:33:26.432606] C [MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume virt_images. Stopping
local bricks.
Dec 19 15:33:26 printserver systemd[1]: Started GlusterFS, a clustered
file-system server.
Dec 19 15:34:18 printserver glusterd[306]: [2017-12-19
14:34:18.158756] C [MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume filserver. Starting
local bricks.
Dec 19 15:34:18 printserver glusterd[306]: [2017-12-19
14:34:18.162242] C [MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume virt_images. Starting
local bricks.
Dec 20 18:28:52 printserver systemd[1]: Stopping GlusterFS, a
clustered file-system server...
Dec 20 18:28:52 printserver systemd[1]: Stopped GlusterFS, a clustered
file-system server.
-- Reboot --
Dec 20 18:30:40 printserver systemd[1]: Starting GlusterFS, a
clustered file-system server...
Dec 20 18:30:42 printserver glusterd[278]: [2017-12-20
17:30:42.441675] C [MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume filserver. Stopping local
bricks.
Dec 20 18:30:42 printserver glusterd[278]: [2017-12-20
17:30:42.441929] C [MSGID: 106002]
[glusterd-server-quorum.c:355:glusterd_do_volume_quorum_action]
0-management: Server quorum lost for volume virt_images. Stopping
local bricks.
Dec 20 18:30:42 printserver systemd[1]: Started GlusterFS, a clustered
file-system server.
Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20
17:33:49.005534] C [MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume filserver. Starting
local bricks.
Dec 20 18:33:49 printserver glusterd[278]: [2017-12-20
17:33:49.008010] C [MSGID: 106003]
[glusterd-server-quorum.c:349:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume virt_images. Starting
local bricks.

>>
>> I wonder if the killing of brick processes could have led to some sort of
>> race condition where writes were serviced on one brick / the arbiter and not
>> the other?
>>
>> If you can find a reproducer for this please open a BZ with it, I have
>> been seeing something similar(I think) but I haven't been able to run the
>> issue down yet.
>>
>> -b

I'm not sure if I can reproduce this; a lot has been going on in my
setup over the past few days (trying to tune some horrible small-file
and file creation/deletion performance).

Thanks for looking into this with me.

Best regards,
Henrik Juul Pedersen
LIAB ApS

