[Gluster-users] Two issues with my Gluster volume

Patrick Dijkgraaf bolderbast at duckstad.net
Sun Apr 17 18:48:21 UTC 2022


Hi all,

Some things I have found about the space issue:

 * shared-brick-count in /var/lib/glusterd/vols/data/* is higher than 1
   for some local bricks, even though those bricks are on separate file
   systems
 * I have duplicate brick-fsid values in
   /var/lib/glusterd/vols/data/bricks/*
 * I have restarted glusterd, but the duplicate brick-fsids remain
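
For reference, these are roughly the commands I used to spot this (the
df paths are just examples, my bricks are mounted elsewhere):

    # shared-brick-count per brick; values > 1 mean glusterd thinks
    # several bricks live on the same file system
    grep -r shared-brick-count /var/lib/glusterd/vols/data/

    # brick-fsid recorded by glusterd for each brick
    grep brick-fsid /var/lib/glusterd/vols/data/bricks/*

    # confirm the bricks really are separate file systems
    df -h /data/brick1 /data/brick2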

So I am wondering where the duplicate FSIDs come from, and how to
(forcefully) resolve them. Would it be safe to edit them directly in
/var/lib/glusterd/vols/data/bricks/* and then restart glusterd?
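
To be clear about what I mean by "altering them", something along these
lines is what I have in mind (the brick file name is made up, and I
would of course back everything up and not touch anything before
hearing back):

    systemctl stop glusterd
    cp -a /var/lib/glusterd/vols/data /root/glusterd-vols-data.bak
    # edit the brick-fsid= line in the affected file(s), e.g.
    #   /var/lib/glusterd/vols/data/bricks/server1:-data-brick3
    systemctl start glusterd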

I *may* at some point have accidentally replaced a brick to the wrong
location, either the parent file system or the path of another brick.
But I have since corrected this by replacing it again to the correct
location. Each time I used "gluster volume replace-brick".
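
For reference, the form of the command I used each time was along these
lines (brick paths are examples, not my exact ones):

    gluster volume replace-brick data \
        server1:/data/old-brick/brick \
        server1:/data/new-brick/brick \
        commit force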

I'm running up-to-date Arch Linux, by the way.
I have attached what I believe is all the relevant information needed
to diagnose the issue. Please let me know if I can provide anything
more to get this resolved.

Thanks!

-- 
regards / cheers,
Patrick Dijkgraaf

-----Original Message-----
From: Patrick Dijkgraaf <bolderbast at duckstad.net>
To: gluster-users at gluster.org
Subject: [Gluster-users] Two issues with my Gluster volume
Date: Sat, 16 Apr 2022 14:03:16 +0200
Mailer: Evolution 3.44.0 

Hi all, I hope this message finds you well.

I've been running a Gluster volume (32 bricks in distributed-replicated
mode) on my 2 home servers for about 1.5 years now. I'm generally very
happy with it!

Disks are distributed across 4 enclosures (2 enclosures per server). At
one point one of these enclosures failed (8 bricks down on 1 server)
but due to the awesomeness of Gluster (and my lack of monitoring :-( )
I only noticed this after about 6 weeks... This left me with A LOT of
pending heals, about 40k per brick if I remember correctly.

Well, I brought the failed bricks back online and let Gluster heal. And
it did, mostly... It left about 1 to 4 pending heals on multiple bricks
that won't heal, no matter what I've tried. I just let them be until I
had time to figure out what to do with them.
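
For reference, this is how I have been checking them (the volume is
named "data"):

    # per-brick list of entries pending heal
    gluster volume heal data info

    # shorter overview with counts per brick
    gluster volume heal data info summary

    # check whether any of them are split-brain
    gluster volume heal data info split-brain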

Also, because some disks were about to fail, I started replacing
bricks. Taking advantage of this, I replaced them with larger disks
(4TB -> 8TB). Healing took care of copying all data to the new bricks
and finished successfully. However, for some reason I do not see an
increase in total space on the systems where I have mounted the
Gluster volume.
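
To illustrate what I am comparing (the mount point is an example):

    # what clients see on the fuse mount
    df -h /mnt/data

    # what each brick reports, including total and free disk space
    gluster volume status data detail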

So in short, I have 2 issues:

   1. Some pending heals that I cannot get rid of
   2. Gluster total space being reported incorrectly

Ideally, I'd like to address issue 2 first (it seems the easier fix) and
then focus on issue 1. Does that sound OK?

I hope you guys can help me with these 2 issues. Thanks in advance!

Added as attachments:
 * Commands used to replace the brick
 * "gluster volume status data detail" output
 * grep -n "shared-brick-count" /var/lib/glusterd/vols/data/* output,
   as I read somewhere that this may be relevant...
