[Gluster-users] cannot resolve split-brain via 'select one brick as source'

Strahil Nikolov hunter86_bg at yahoo.com
Sun Dec 27 20:20:13 UTC 2020


Hi All,

I'm currently playing around with Gluster 8.3 and geo-replication.
I have created a 'replica 3' volume as the source and a 'replica 2' volume as the destination.
In this setup geo-replication is working fine, but during my tests I managed to cause a split-brain in some of the files:

[root@glustere mnt]# gluster volume info secondary
 
Volume Name: secondary
Type: Distributed-Replicate
Volume ID: 1b5717ee-aa9b-4eff-9989-ad4f0388b86c
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: glusterd:/bricks/brick-d1/brick
Brick2: glustere:/bricks/brick-e2/brick
Brick3: glustere:/bricks/brick-e1/brick
Brick4: glusterd:/bricks/brick-d2/brick
Options Reconfigured:
cluster.quorum-count: 1
cluster.quorum-type: fixed
features.read-only: on
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
performance.quick-read: off
performance.client-io-threads: off
cluster.enable-shared-storage: enable

[root@glustere mnt]# gluster volume heal secondary info summary
Brick glusterd:/bricks/brick-d1/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0


Brick glustere:/bricks/brick-e2/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0


Brick glustere:/bricks/brick-e1/brick
Status: Connected
Total Number of entries: 2641
Number of entries in heal pending: 0
Number of entries in split-brain: 2641
Number of entries possibly healing: 0


Brick glusterd:/bricks/brick-d2/brick
Status: Connected
Total Number of entries: 2641
Number of entries in heal pending: 0
Number of entries in split-brain: 2641
Number of entries possibly healing: 0

As node "glustere" was the last one to be rebooted, the source of truth should be 'glusterd:/bricks/brick-d2/brick'.

I tried to tell Gluster that, but it does not resolve the split-brain:

[root@glustere mnt]# gluster volume heal secondary split-brain source-brick glusterd:/bricks/brick-d2/brick | tail -n 5
Lookup failed on gfid:dbfb3794-4a1a-4540-b235-2fdce4d21d6a:Transport endpoint is not connected.
Lookup failed on gfid:d78fdf9c-1c00-4712-8229-cfc10b009ad3:Transport endpoint is not connected.
Status: Connected
Number of healed entries: 1

The logs clearly show that something is wrong:

[2020-12-27 20:14:07.100074] W [MSGID: 108027] [afr-common.c:2857:afr_attempt_readsubvol_set] 0-secondary-replicate-1: no read subvols for dbfb3794-4a1a-4540-b235-2fdce4d21d6a 
[2020-12-27 20:14:07.100106] I [MSGID: 109063] [dht-layout.c:641:dht_layout_normalize] 0-secondary-dht: Found anomalies [{path=dbfb3794-4a1a-4540-b235-2fdce4d21d6a}, {gfid=dbfb3794-4a1a-4540-b235-2fdce4d21d6a}, {holes=1}, {overlaps=0}] 
[2020-12-27 20:14:07.100903] W [MSGID: 108027] [afr-common.c:2857:afr_attempt_readsubvol_set] 0-secondary-replicate-1: no read subvols for d78fdf9c-1c00-4712-8229-cfc10b009ad3 
[2020-12-27 20:14:07.100935] I [MSGID: 109063] [dht-layout.c:641:dht_layout_normalize] 0-secondary-dht: Found anomalies [{path=d78fdf9c-1c00-4712-8229-cfc10b009ad3}, {gfid=d78fdf9c-1c00-4712-8229-cfc10b009ad3}, {holes=1}, {overlaps=0}]


Yet if I specify the file (by gfid) in the previous command, the heal succeeds:

[root@glustere mnt]# gluster volume heal secondary split-brain source-brick glusterd:/bricks/brick-d2/brick gfid:dbfb3794-4a1a-4540-b235-2fdce4d21d6a
Healed gfid:dbfb3794-4a1a-4540-b235-2fdce4d21d6a.
[root@glustere mnt]# gluster volume heal secondary split-brain source-brick glusterd:/bricks/brick-d2/brick gfid:d78fdf9c-1c00-4712-8229-cfc10b009ad3
Healed gfid:d78fdf9c-1c00-4712-8229-cfc10b009ad3.

[2020-12-27 20:16:27.471113] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-secondary-replicate-1: performing metadata selfheal on dbfb3794-4a1a-4540-b235-2fdce4d21d6a 
[2020-12-27 20:16:27.473157] I [MSGID: 108026] [afr-self-heal-common.c:1744:afr_log_selfheal] 0-secondary-replicate-1: Completed metadata selfheal on dbfb3794-4a1a-4540-b235-2fdce4d21d6a. sources=[1]  sinks=0  
[2020-12-27 20:16:38.303151] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-secondary-replicate-1: performing metadata selfheal on d78fdf9c-1c00-4712-8229-cfc10b009ad3 
[2020-12-27 20:16:38.305144] I [MSGID: 108026] [afr-self-heal-common.c:1744:afr_log_selfheal] 0-secondary-replicate-1: Completed metadata selfheal on d78fdf9c-1c00-4712-8229-cfc10b009ad3. sources=[1]  sinks=0 
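
Since the per-gfid form works, one possible workaround would be to script it over all affected entries. The following is only a sketch, untested against this volume. It assumes that 'gluster volume heal secondary info split-brain' lists the affected entries as gfid:<uuid> tokens (entries reported as plain paths would be missed). The helper names extract_gfids and heal_each and the DRY_RUN variable are made up for illustration and are not part of Gluster.

```shell
#!/usr/bin/env bash
# Sketch: heal each split-brain entry individually, using the per-gfid
# form of the command that succeeded above.
VOLUME="secondary"
SOURCE_BRICK="glusterd:/bricks/brick-d2/brick"

# Read text on stdin and print each distinct gfid:<uuid> token once.
# The same entry is reported by both bricks of the replica pair,
# so duplicates are dropped with sort -u.
extract_gfids() {
    grep -oE 'gfid:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' | sort -u
}

# Read one gfid per line on stdin.
# With DRY_RUN=1 (the default) only print the heal commands;
# with DRY_RUN=0 actually run them.
heal_each() {
    while read -r gfid; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo gluster volume heal "$VOLUME" split-brain source-brick "$SOURCE_BRICK" "$gfid"
        else
            gluster volume heal "$VOLUME" split-brain source-brick "$SOURCE_BRICK" "$gfid"
        fi
    done
}

# Intended usage (review the dry-run output first, then set DRY_RUN=0):
#   gluster volume heal "$VOLUME" info split-brain | extract_gfids | heal_each
```

Printing the commands by default is deliberate: with 2641 entries it seems safer to inspect the list before letting anything modify the bricks.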



I thought that 'source-brick' resolves both data and metadata split-brains. Am I wrong?

Best Regards,
Strahil Nikolov

