[Gluster-users] Confusion supreme

Zenon Panoussis oracle at provocation.net
Tue Jul 23 14:14:52 UTC 2024


> First step would be to ensure that all clients are connected 
> to all bricks - this will reduce the chance of new problems.

Well, when the disk broke, one brick was obviously offline. But
apart from that, I'm not sure I understand what you mean by
"ensure that all clients are connected to all bricks".

The way I have it is that on one node the local brick is mounted
and then its filesystem is used by applications. On the other two
nodes glusterfsd/glusterfs are running, but the bricks are not
mounted and not used. Which node is in use can vary, but it is
always only one.

> For some reason there are problems with the broken node. 

After replacing the broken disk I had the same problem on all.

> Did you reduce the replica to 2 before reinstalling the broken 
> node and re-adding it to the TSP ?

Yes. But even though I said "replica 2", the remove-brick command
refused to run without force. So I had to use force. Maybe that
is the cause of the subsequent inconsistencies.

> Try to get the attributes and the blames of a few files.

It's too late now; I fixed the problem, so I can no longer investigate.
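
For anyone who hits the same state before fixing it: the attributes
in question are, as far as I understand it, the trusted.afr.* xattrs
that "getfattr -d -m . -e hex <file>" prints for a file on a brick.
Each value is 12 bytes, read as three big-endian 32-bit counters of
pending data, metadata and entry operations. Below is a minimal
sketch for decoding one such value; the function name and the sample
value are made up for illustration and do not come from this volume.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* value into pending data/metadata/entry counts.

    hex_value is the string getfattr prints with -e hex, for example
    "0x000000020000000100000000" (a made-up sample).
    """
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Three unsigned 32-bit integers in network (big-endian) byte order.
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


if __name__ == "__main__":
    print(decode_afr_changelog("0x000000020000000100000000"))
```

A non-zero counter on one brick means that brick blames the peer
named in the xattr for operations it has not yet seen.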

What I found is that the unhealable files existed on all three
bricks, but with different contents, ownerships and permissions.
Something like

-rw-r--r--   2 2004 2004   4074 Jun 12  2006 brick1/.glusterfs/00/01/0001055c-41e1-49da-aa98-9bc0246f70cd
-rw-r--r--   2    0    0      0 Jun 12  2006 brick2/.glusterfs/00/01/0001055c-41e1-49da-aa98-9bc0246f70cd
-rw-r--r--   2    0    0      0 Jun 12  2006 brick3/.glusterfs/00/01/0001055c-41e1-49da-aa98-9bc0246f70cd

where the file in brick 1 is the good one and the root-owned empty
files in bricks 2 and 3 made healing impossible. (The above listings
are illustrative and I don't remember whether the file mtimes matched
or not.)
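
A mismatch like the one in the listing can be found mechanically.
The following is only a sketch, under the assumption that all brick
directories are reachable from one machine (on a real cluster each
brick lives on its own node, so the stat data would have to be
collected per node first). The function name is hypothetical. It
compares size, owner, group and mode only, not file contents.

```python
import os


def mismatched_gfid_files(bricks):
    """List paths under .glusterfs/ whose metadata differs between bricks.

    A path is reported if it is missing on some brick or if its
    (size, uid, gid, mode) tuple is not identical on all bricks.
    """
    seen = {}  # relative path -> {brick: (size, uid, gid, mode)}
    for brick in bricks:
        root = os.path.join(brick, ".glusterfs")
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                st = os.lstat(full)  # do not follow symlinks
                rel = os.path.relpath(full, root)
                seen.setdefault(rel, {})[brick] = (
                    st.st_size, st.st_uid, st.st_gid, st.st_mode)
    bad = []
    for rel, per_brick in seen.items():
        if len(per_brick) != len(bricks) or len(set(per_brick.values())) > 1:
            bad.append(rel)
    return sorted(bad)


if __name__ == "__main__":
    # Placeholder brick paths, as in the illustrative listing.
    for rel in mismatched_gfid_files(["brick1", "brick2", "brick3"]):
        print(rel)
```

Note that .glusterfs/ also holds Gluster's own housekeeping entries,
which this sketch does not filter out, and that on a live volume
some differences are transient and heal on their own.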

The solution was to rsync -a the unhealable files from .glusterfs/
on the good brick to .glusterfs/ on the bad bricks and restart
healing. Then shd reported copying the files' metadata and the
volume was healed.
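
As a rough sketch of that procedure, the snippet below only builds
the commands and does not run them. The function name, the brick
paths and the volume name are hypothetical, and using "gluster
volume heal <volname>" as the way to restart healing is an
assumption of the sketch, not something stated above.

```python
import os


def repair_commands(good_brick, bad_bricks, rel_paths, volume):
    """Build (but do not run) rsync commands plus a final heal trigger.

    rel_paths are paths relative to .glusterfs/, for example
    "00/01/0001055c-41e1-49da-aa98-9bc0246f70cd".
    """
    cmds = []
    for bad in bad_bricks:
        for rel in rel_paths:
            src = os.path.join(good_brick, ".glusterfs", rel)
            dst = os.path.join(bad, ".glusterfs", rel)
            # -a preserves ownership, permissions and mtimes.
            cmds.append(["rsync", "-a", src, dst])
    # Assumed way of restarting the heal once the files are in place.
    cmds.append(["gluster", "volume", "heal", volume])
    return cmds


if __name__ == "__main__":
    for cmd in repair_commands(
            "/bricks/brick1", ["/bricks/brick2", "/bricks/brick3"],
            ["00/01/0001055c-41e1-49da-aa98-9bc0246f70cd"], "myvol"):
        print(" ".join(cmd))
```

Printing the commands first makes it possible to review them before
touching anything inside a brick by hand.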

It is all very strange and I think I can smell bugs, but I can't
exactly put my finger on them.



Glory to Ukraine!
Putler is a dickhead!
