[Gluster-users] Upgrade from 6.9 to 7.7 stuck (peer is rejected)

mabi mabi at protonmail.ch
Mon Oct 26 18:38:22 UTC 2020


OK, I see. In that case I won't go down the path of disabling quota.

I was now able to remove the arbiter brick from the volume which has the quota issue, so it is now a simple 2-node replica with 1 brick per node.

Now I would like to add the brick back but I get the following error:

volume add-brick: failed: Host arbiternode.domain.tld is not in 'Peer in Cluster' state

In fact I checked, and the arbiter node is still rejected, as you can see here:

State: Peer Rejected (Connected)

On the arbiter node's glusterd.log file I see the following errors:

[2020-10-26 18:35:05.605124] E [MSGID: 106012] [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume woelkli-private differ. local cksum = 0, remote  cksum = 66908910 on peer node1.domain.tld
[2020-10-26 18:35:05.617009] E [MSGID: 106012] [glusterd-utils.c:3682:glusterd_compare_friend_volume] 0-management: Cksums of quota configuration of volume myvol-private differ. local cksum = 0, remote  cksum = 66908910 on peer node2.domain.tld
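As a side note, if I understand the glusterd on-disk layout correctly (this is an assumption on my part, and the paths are the glusterd defaults), the checksums being compared come from per-volume quota state files under /var/lib/glusterd, so the mismatch can be inspected directly on each node:

```shell
# Assumed locations of the per-volume quota state that glusterd
# checksums; run on each node and compare the output across nodes.
cat /var/lib/glusterd/vols/myvol-private/quota.cksum
md5sum /var/lib/glusterd/vols/myvol-private/quota.conf
```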

So although I have removed the arbiter brick from my volume, it still complains about the checksum of the quota configuration. I also tried restarting glusterd on my arbiter node, but that does not help: the peer is still rejected.

What should I do at this stage?
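For reference, the generic "Peer Rejected" recovery described in the Gluster troubleshooting documentation would look roughly like the sketch below. It assumes the default /var/lib/glusterd state directory; it must be run only on the rejected node (the arbiter here), and /var/lib/glusterd should be backed up first. node1.domain.tld stands in for any healthy peer.

```shell
# Sketch of the documented "Peer Rejected" recovery -- run ONLY on the
# rejected node, after backing up /var/lib/glusterd.
systemctl stop glusterd

# Keep glusterd.info (this node's UUID) and clear the rest of the
# cached cluster state, including the stale quota checksum files.
cd /var/lib/glusterd
find . -mindepth 1 ! -name glusterd.info -delete

systemctl start glusterd

# Probe any healthy peer so the volume configuration is fetched again.
gluster peer probe node1.domain.tld

systemctl restart glusterd
gluster peer status
```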


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, October 26, 2020 6:06 PM, Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

> Detaching the arbiter is pointless...
> Quota is an extended file attribute, and thus disabling and reenabling quota on a volume with millions of files will take a lot of time and lots of IOPS. I would leave it as a last resort. 
>
> Also, it was mentioned in the list about the following script that might help you:
> https://github.com/gluster/glusterfs/blob/devel/extras/quota/quota_fsck.py
>
> You can take a look in the mailing list for usage and more details.
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, 26 October 2020, 16:40:06 GMT+2, Diego Zuccato <diego.zuccato at unibo.it> wrote:
>
> On 26/10/20 15:09, mabi wrote:
>
> > Right, seen like that this sounds reasonable. Do you remember the exact command you ran to remove the brick? I was thinking it should be:
> > gluster volume remove-brick <VOLNAME> <BRICK> force
> > but should I use "force" or "start"?
>
> Memory does not serve me well (there are 28 disks, not 26!), but bash
> history does :)
>
> gluster volume remove-brick BigVol replica 2 \
>     str957-biostq:/srv/arbiters/{00..27}/BigVol force
>
> gluster peer detach str957-biostq
>
> gluster peer probe str957-biostq
>
> gluster volume add-brick BigVol replica 3 arbiter 1 \
>     str957-biostq:/srv/arbiters/{00..27}/BigVol
>
> You obviously have to wait for the remove-brick to complete before
> detaching the arbiter.
>
> > > IIRC it took about 3 days, but the arbiters are on a VM (8 CPU, 8 GB RAM)
> > > that uses an iSCSI disk. More than 80% continuous load on both CPUs and RAM.
> >
> > That's quite long, I must say, and I am in the same case as you: my arbiter is a VM.
>
> Give it all the CPU and RAM you can. In my experience, less than 8 GB of
> RAM is asking for trouble.
>
> -----------------------------------------------------------------------------------------
>
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users



