[Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware
Henrik Juul Pedersen
hjp at liab.dk
Wed Dec 20 18:26:37 UTC 2017
Hi,
I have the following volume:
Volume Name: virt_images
Type: Replicate
Volume ID: 9f3c8273-4d9d-4af2-a4e7-4cb4a51e3594
Status: Started
Snapshot Count: 2
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: virt3:/data/virt_images/brick
Brick2: virt2:/data/virt_images/brick
Brick3: printserver:/data/virt_images/brick (arbiter)
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.barrier: disable
features.scrub: Active
features.bitrot: on
nfs.rpc-auth-allow: on
server.allow-insecure: on
user.cifs: off
features.shard: off
cluster.shd-wait-qlength: 10000
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
nfs.disable: on
transport.address-family: inet
server.outstanding-rpc-limit: 512
After a server reboot (brick 1) a single file has become unavailable:
# touch fedora27.qcow2
touch: setting times of 'fedora27.qcow2': Input/output error
Looking at the split brain status from the client side cli:
# getfattr -n replica.split-brain-status fedora27.qcow2
# file: fedora27.qcow2
replica.split-brain-status="The file is not under data or metadata split-brain"
However, in the client side log, a split brain is mentioned:
[2017-12-20 18:05:23.570762] E [MSGID: 108008] [afr-transaction.c:2629:afr_write_txn_refresh_done] 0-virt_images-replicate-0: Failing SETATTR on gfid 7a36937d-52fc-4b55-a932-99e2328f02ba: split-brain observed. [Input/output error]
[2017-12-20 18:05:23.576046] W [MSGID: 108027] [afr-common.c:2733:afr_discover_done] 0-virt_images-replicate-0: no read subvols for /fedora27.qcow2
[2017-12-20 18:05:23.578149] W [fuse-bridge.c:1153:fuse_setattr_cbk] 0-glusterfs-fuse: 182: SETATTR() /fedora27.qcow2 => -1 (Input/output error)
= Server side
No mention of a possible split brain:
# gluster volume heal virt_images info split-brain
Brick virt3:/data/virt_images/brick
Status: Connected
Number of entries in split-brain: 0
Brick virt2:/data/virt_images/brick
Status: Connected
Number of entries in split-brain: 0
Brick printserver:/data/virt_images/brick
Status: Connected
Number of entries in split-brain: 0
The info command shows the file:
# gluster volume heal virt_images info
Brick virt3:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1
Brick virt2:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1
Brick printserver:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1
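For comparing the two outputs programmatically, here is a small Python sketch that turns "heal info" text like the above into a per-brick list of entries. It assumes only the layout visible in this mail (a "Brick" line, entry lines, and "Status:" / "Number of entries" lines); parse_heal_info is a made-up helper name, not a Gluster API.

```python
def parse_heal_info(text):
    """Map each brick to the entries listed under it in 'heal info' output."""
    result = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            # A new brick section starts here.
            current = line[len("Brick "):]
            result[current] = []
        elif current is None or not line:
            continue
        elif line.startswith(("Status:", "Number of entries")):
            # Summary lines, not entries (also matches the split-brain variant).
            continue
        else:
            result[current].append(line)
    return result


# Sample text copied from the output in this mail.
SAMPLE = """\
Brick virt3:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1
Brick virt2:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1
Brick printserver:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1
"""

for brick, entries in parse_heal_info(SAMPLE).items():
    print(brick, entries)
```

Feeding it the "info split-brain" output instead would give an empty list for every brick, which is the mismatch described here.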
The heal and heal full commands do nothing, and I can't find
anything in the logs about them trying and failing to fix the file.
Trying to manually resolve the split brain from cli gives the following:
# gluster volume heal virt_images split-brain source-brick virt3:/data/virt_images/brick /fedora27.qcow2
Healing /fedora27.qcow2 failed: File not in split-brain.
Volume heal failed.
The attrs from virt2 and virt3 are as follows:
[root@virt2 brick]# getfattr -d -m . -e hex fedora27.qcow2
# file: fedora27.qcow2
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.virt_images-client-1=0x000002280000000000000000
trusted.afr.virt_images-client-3=0x000000000000000000000000
trusted.bit-rot.version=0x1d000000000000005a3aa0db000c6563
trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a49eb0000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
# file: fedora27.qcow2
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.virt_images-client-2=0x000003ef0000000000000000
trusted.afr.virt_images-client-3=0x000000000000000000000000
trusted.bit-rot.version=0x19000000000000005a3a9f82000c382a
trusted.gfid=0x7a36937d52fc4b55a93299e2328f02ba
trusted.gfid2path.c076c6ac27a43012=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6665646f726132372e71636f7732
trusted.glusterfs.quota.00000000-0000-0000-0000-000000000001.contri.1=0x00000000a2fbe0000000000000000001
trusted.pgfid.00000000-0000-0000-0000-000000000001=0x00000001
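For reading the dumps above: as far as I understand the AFR changelog format, each trusted.afr.<volume>-client-<N> value is 12 bytes holding three big-endian 32-bit counters of pending data, metadata and entry operations, in that order. The following is a minimal Python sketch that decodes the two non-zero values; the function name is my own and not part of any Gluster tool.

```python
import struct


def decode_afr_pending(hex_value):
    """Split a trusted.afr.* value into (data, metadata, entry) pending counts."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Three unsigned 32-bit integers in network (big-endian) byte order.
    return struct.unpack(">III", raw)


# Values copied from the two getfattr dumps in this mail.
dumps = [
    ("first dump, client-1", "0x000002280000000000000000"),
    ("second dump, client-2", "0x000003ef0000000000000000"),
]
for label, value in dumps:
    data, metadata, entry = decode_afr_pending(value)
    print(label, "data:", data, "metadata:", metadata, "entry:", entry)
```

Both values have only the data counter set (552 and 1007); the metadata and entry counters are zero, and each dump records its pending data operations against a different client index.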
I don't know how to find similar information from the arbiter...
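One possible way to look on the arbiter, sketched under the assumption that it uses the usual brick layout: a file on a brick is also reachable under .glusterfs/<first two hex digits of the gfid>/<next two>/<gfid>, so the trusted.gfid value can be turned into a path to inspect on printserver. The helper names below are hypothetical. The sketch also decodes trusted.gfid2path, whose value is plain ASCII in hex.

```python
import posixpath
import uuid


def _strip_0x(value):
    """Drop the 0x prefix that getfattr -e hex puts in front of values."""
    return value[2:] if value.startswith("0x") else value


def gfid_backend_path(brick_root, gfid_hex):
    """Build the .glusterfs path for a gfid (assumed standard brick layout)."""
    gfid = str(uuid.UUID(hex=_strip_0x(gfid_hex)))
    return posixpath.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def decode_gfid2path(hex_value):
    """Decode a trusted.gfid2path.* value into its ASCII text."""
    return bytes.fromhex(_strip_0x(hex_value)).decode("ascii")


# Values copied from the getfattr dump in this mail.
print(gfid_backend_path("/data/virt_images/brick",
                        "0x7a36937d52fc4b55a93299e2328f02ba"))
print(decode_gfid2path(
    "0x30303030303030302d303030302d303030302d303030302d"
    "3030303030303030303030312f6665646f726132372e71636f7732"))
```

The first line printed is a candidate path to run the same getfattr command against on the arbiter brick; the second should show a parent gfid followed by /fedora27.qcow2.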
Versions are the same on all three systems:
# glusterd --version
glusterfs 3.12.2
# gluster volume get all cluster.op-version
Option                  Value
------                  -----
cluster.op-version      31202
I might try upgrading to version 3.13.0 tomorrow, but I would like to hear
your advice first.
How do I fix this? Do I have to manually change the file attributes?
Also, in the guides for manual resolution through setfattr, all the
bricks are listed with a "trusted.afr.<volume>-client-<brick>". But in
my system (as can be seen above), each brick only carries attributes for
the other bricks. So which attributes should be changed, and to what?
I hope someone might know a solution. If you need any more information
I'll try and provide it. I can probably change the virtual machine to
another image for now.
Best regards,
Henrik Juul Pedersen
LIAB ApS