[Gluster-users] Replica 3 volume with forced quorum 1 fault tolerance and recovery
Strahil Nikolov
hunter86_bg at yahoo.com
Tue Dec 1 13:15:04 UTC 2020
Replica 3 with quorum 1?
This is not a good idea, and I doubt anyone will help you with it. A replica 3 volume is meant to tolerate the loss of 1 node; once a second one is dead, only 1 brick is left to accept writes.
Imagine that 2 bricks are down and data is written to brick 3. When bricks 1 and 2 come back up, how is gluster going to decide where to heal from?
2 is more than 1, so the majority wins and the third node should delete the file, rather than bricks 1 and 2 being healed from it.
What are you trying to achieve with quorum 1?
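The heal question above comes down to counting. As a minimal sketch (the function names are hypothetical, not Gluster APIs, and it assumes writes are accepted whenever the number of live bricks is at least the quorum count, as with cluster.quorum-type: fixed), the following C checks whether two disjoint sets of bricks could each accept writes on their own:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: writes are allowed when at least `quorum` bricks are up. */
static bool writes_allowed(int bricks_up, int quorum)
{
    assert(bricks_up >= 0 && quorum >= 1);
    return bricks_up >= quorum;
}

/*
 * Two groups of bricks that share no member can both meet quorum
 * only if the replica set is large enough to hold two such groups,
 * i.e. 2 * quorum <= replicas. When this is true, the groups can
 * accept conflicting writes at different times.
 */
static bool disjoint_writers_possible(int replicas, int quorum)
{
    assert(replicas >= 1 && quorum >= 1);
    return 2 * quorum <= replicas;
}
```

With 3 replicas, a quorum of 2 gives disjoint_writers_possible(3, 2) == false, while a quorum of 1 gives true. The second case is the one described above: brick 3 takes writes alone, and bricks 1 and 2 later come back without them.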
Best Regards,
Strahil Nikolov
On Tuesday, 1 December 2020, 14:09:32 GMT+2, Dmitry Antipov <dmantipov at yandex.ru> wrote:
It seems that the consistency of a replica 3 volume with quorum forced to 1
breaks after a few forced volume restarts initiated after 2 brick failures.
At least it breaks GFAPI clients, and even a volume restart doesn't help.
Volume setup is:
Volume Name: test0
Type: Replicate
Volume ID: 919352fb-15d8-49cb-b94c-c106ac68f072
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.1.112:/glusterfs/test0-000
Brick2: 192.168.1.112:/glusterfs/test0-001
Brick3: 192.168.1.112:/glusterfs/test0-002
Options Reconfigured:
cluster.quorum-count: 1
cluster.quorum-type: fixed
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
Client is fio with the following options:
[global]
name=write
filename=testfile
ioengine=gfapi_async
volume=test0
brick=localhost
create_on_open=1
rw=randwrite
direct=1
numjobs=1
time_based=1
runtime=600
[test-4-kbytes]
bs=4k
size=1G
iodepth=128
How to reproduce:
0) start the volume;
1) run fio;
2) run 'gluster volume status', select 2 arbitrary brick processes
and kill them;
3) make sure fio is OK;
4) wait a few seconds, then issue 'gluster volume start [VOL] force'
to restart bricks, and finally issue 'gluster volume status' again
to check whether all bricks are running;
5) restart from 2).
This is likely to work a few times but, sooner or later, it breaks
at 3) and fio detects an I/O error, most probably EIO or ENOTCONN. From
this point on, killing and restarting fio yields an error in glfs_creat(),
and even a manual volume restart doesn't help.
NOTE: as of 7914c6147adaf3ef32804519ced850168fff1711, fio's gfapi_async
engine is still incomplete and _silently ignores I/O errors_. Currently
I'm using the following tweak to detect and report them (YMMV, consider
experimental):
diff --git a/engines/glusterfs_async.c b/engines/glusterfs_async.c
index 0392ad6e..27ebb6f1 100644
--- a/engines/glusterfs_async.c
+++ b/engines/glusterfs_async.c
@@ -7,6 +7,7 @@
#include "gfapi.h"
#define NOT_YET 1
struct fio_gf_iou {
+ struct thread_data *td;
struct io_u *io_u;
int io_complete;
};
@@ -80,6 +81,7 @@ static int fio_gf_io_u_init(struct thread_data *td, struct io_u *io_u)
}
io->io_complete = 0;
io->io_u = io_u;
+ io->td = td;
io_u->engine_data = io;
return 0;
}
@@ -95,7 +97,20 @@ static void gf_async_cb(glfs_fd_t * fd, ssize_t ret, void *data)
struct fio_gf_iou *iou = io_u->engine_data;
dprint(FD_IO, "%s ret %zd\n", __FUNCTION__, ret);
- iou->io_complete = 1;
+ if (ret != io_u->xfer_buflen) {
+ if (ret >= 0) {
+ io_u->resid = io_u->xfer_buflen - ret;
+ io_u->error = 0;
+ iou->io_complete = 1;
+ } else
+ io_u->error = errno;
+ }
+
+ if (io_u->error) {
+ log_err("IO failed (%s).\n", strerror(io_u->error));
+ td_verror(iou->td, io_u->error, "xfer");
+ } else
+ iou->io_complete = 1;
}
static enum fio_q_status fio_gf_async_queue(struct thread_data fio_unused * td,
--
Dmitry
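
The decision made in the patched gf_async_cb() can be read in isolation. The sketch below mirrors that branch logic as a pure function; struct completion and classify_completion() are hypothetical names introduced here, not part of fio or gfapi, and the errno value is passed in explicitly (an assumption) so the function can be exercised without a real failure:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical result record; fio's real struct io_u carries more state. */
struct completion {
    int error;     /* 0 on success, otherwise an errno value */
    size_t resid;  /* bytes not transferred on a short I/O */
    int complete;  /* 1 if the request is marked finished */
};

/*
 * Classify the `ret` value handed to an async completion callback.
 * Precondition (assumed): expected > 0.
 * - ret < 0: failure; record the error and, as in the patch above,
 *   do not mark the request complete.
 * - 0 <= ret < expected: short transfer; record the residual bytes.
 * - otherwise: full transfer.
 */
static struct completion classify_completion(ssize_t ret, size_t expected,
                                             int saved_errno)
{
    struct completion c = { 0, 0, 0 };

    assert(expected > 0);
    if (ret < 0) {
        c.error = saved_errno;
        return c;
    }
    if ((size_t)ret < expected)
        c.resid = expected - (size_t)ret;
    c.complete = 1;
    return c;
}
```

For a 4k write, classify_completion(4096, 4096, 0) reports a finished request with no residual, classify_completion(1024, 4096, 0) reports 3072 residual bytes, and classify_completion(-1, 4096, EIO) carries EIO back to the caller.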