[Gluster-users] Replica bricks fungible?

Sun Jun 13 13:19:43 UTC 2021

Based on your output it seems that add-brick (with force) did not destroy the already existing data, right?

Have you checked the data integrity after the add-brick? For example, you can sha256sum several randomly picked files and compare across the bricks.
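
A minimal sketch of such a spot check, assuming GNU coreutils and findutils on every node. The function names make_sample and check_sample are made up for illustration and are not part of gluster. The first builds a checksum list from random files on one brick, the second verifies that list against another brick's directory:

```shell
# make_sample BRICK [N]: print SHA-256 sums of up to N randomly picked files
# in BRICK, with paths relative to BRICK. Gluster's internal .glusterfs tree
# is skipped so that only user-visible files are sampled.
make_sample() {
    ( cd "$1" && find . -path ./.glusterfs -prune -o -type f -print0 \
        | shuf -z -n "${2:-20}" | xargs -0 -r sha256sum )
}

# check_sample BRICK LIST: verify the sums in LIST against the files in BRICK.
# Prints only the files that fail and returns non-zero on any mismatch.
check_sample() {
    ( cd "$1" && sha256sum --quiet -c - ) < "$2"
}
```

One possible use: run make_sample /vol/gfs/gv0 20 > /tmp/sample.sha256 on node01, copy the list to node02 and node03, and run check_sample /vol/gfs/gv0 /tmp/sample.sha256 there. The path /tmp/sample.sha256 is only an example.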

Best Regards,
Strahil Nikolov

On Sunday, 13 June 2021, 15:22:55 GMT+3, Zenon Panoussis <oracle at provocation.net> wrote:

> Have you documented the procedure you followed?

There was a serious error in my previous reply to you:

  rsync -vvaz --progress node01:/gfsroot/gv0 /gfsroot/

That should have been 'rsync -vvazH' and the "H" is very
important. Gluster uses hard links to map file UUIDs to file
names, but rsync without -H ignores hard links and copies the
hardlinked data again into a new unrelated file, which breaks
gluster's coupling of data to metadata.
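
A rough way to spot that kind of damage on a brick, assuming that on a healthy brick every regular file has a second hard link under the brick's .glusterfs directory and so has a link count of at least 2. The function name unlinked_files is hypothetical:

```shell
# unlinked_files BRICK: list regular files outside .glusterfs whose hard-link
# count is 1, i.e. files that have no hard-linked twin anywhere on the brick.
unlinked_files() {
    ( cd "$1" && find . -path ./.glusterfs -prune -o -type f -links 1 -print )
}
```

If unlinked_files /vol/gfs/gv0 prints nothing on a copied brick, the hard links survived the copy. Any path it does print would be worth checking before adding the brick back.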

*

I have now also tried copying raw data on a three-brick replica
cluster (one brick per server) in a different way (do note the
hostname of the prompts below):

[root@node01 ~]# gluster volume status gv0
Status of volume: gv0
Gluster process                            TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node01:/vol/gfs/gv0                  49152     0          Y       35409
Brick node02:/vol/gfs/gv0                  49152     0          Y       6814
Brick node03:/vol/gfs/gv0                  49155     0          Y       21457

[root@node01 ~]# gluster volume heal gv0 statistics heal-count
(all 0)

[root@node02 ~]# umount 127.0.0.1:gv0
[root@node03 ~]# umount 127.0.0.1:gv0

[root@node01 ~]# gluster volume remove-brick gv0 replica 2 node03:/vol/gfs/gv0 force
[root@node01 ~]# gluster volume remove-brick gv0 replica 1 node02:/vol/gfs/gv0 force

You see here that, from node01 and with glusterd running on all
three nodes, I remove the other two nodes' bricks. This leaves
volume gv0 with one single brick and imposes a quorum of 1 (thank
you Strahil for this idea, albeit differently implemented here).

Now, left with a volume of only one single brick, I copy the data
to it on node01:

[root@node01 ~]# rsync -vva /datasource/blah 127.0.0.1:gv0/

This is fast. It is almost as fast as copying from one partition
to another on the same disk, because the gluster nodes do not have
to exchange multiple system calls over the network before they can
write a file. And there is no latency. System call latency of
~200 ms back and forth multiple times is what is killing me
(because of ADSL and 4,000 km between my node01 and the other
two), so this eliminates that problem.

In the next step I copied the raw gluster volume data to the other
two nodes. This is where 'rsync -H' is important:

[root@node02 ~]# rsync -vvazH node01:/vol/gfs/gv0 /vol/gfs/
[root@node03 ~]# rsync -vvazH node02:/vol/gfs/gv0 /vol/gfs/

This is also fast; it copies raw data from A to B without any
communications needing to travel back and forth between every
node and every other node. Hence, no exponential latency
multiplication stonewall.

Finally, when all the raw data is in place on all three nodes,
I add the two bricks back:

[root@node01 www]# gluster volume add-brick gv0 replica 2 node02:/vol/gfs/gv0 force
[root@node01 www]# gluster volume add-brick gv0 replica 3 node03:/vol/gfs/gv0 force

For comparison: Copying a mail store of about 1.1 million small
and very small files, total ~80 GB, to this same gluster volume
the normal way took me from the first days of January to early
May. Four months! Copying about 200,000 mostly small files
yesterday, total ~38 GB, in the above somewhat unorthodox way
took 12 hours from start to finish, including the transfer over
ADSL.
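
To put those figures side by side, here is a back-of-the-envelope sketch. It assumes that "the first days of January to early May" is about 120 days. The helper names per_hour and latency_days are made up for this example:

```shell
# per_hour AMOUNT HOURS: average amount per hour, three decimals.
per_hour() {
    LC_ALL=C awk -v a="$1" -v h="$2" 'BEGIN { printf "%.3f\n", a / h }'
}

# latency_days FILES ROUND_TRIPS RTT_SECONDS: days spent purely waiting if
# every file costs ROUND_TRIPS sequential round trips of RTT_SECONDS each.
latency_days() {
    LC_ALL=C awk -v f="$1" -v n="$2" -v r="$3" \
        'BEGIN { printf "%.1f\n", f * n * r / 86400 }'
}

HOURS_NORMAL=$((120 * 24))         # assumption: about 120 days

per_hour 1100000 "$HOURS_NORMAL"   # files per hour, the normal way
per_hour 200000 12                 # files per hour, copying raw bricks
per_hour 80 "$HOURS_NORMAL"        # GB per hour, the normal way
per_hour 38 12                     # GB per hour, copying raw bricks
latency_days 1100000 1 0.2         # days per round trip per file at ~200 ms
```

Under that assumption the normal way averaged roughly 382 files and 0.03 GB per hour, against roughly 16,700 files and 3.2 GB per hour for the raw brick copy. The last line shows that each sequential ~200 ms round trip per file adds about 2.5 days across 1.1 million files.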
