[Gluster-users] Replica 3 volume with forced quorum 1 fault tolerance and recovery
Dmitry Antipov
dmantipov at yandex.ru
Tue Dec 1 12:09:23 UTC 2020
It seems that the consistency of a replica 3 volume with quorum forced to 1
breaks after a few forced volume restarts following 2 brick failures.
At least it breaks GFAPI clients, and even a volume restart doesn't help.
Volume setup is:
Volume Name: test0
Type: Replicate
Volume ID: 919352fb-15d8-49cb-b94c-c106ac68f072
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.1.112:/glusterfs/test0-000
Brick2: 192.168.1.112:/glusterfs/test0-001
Brick3: 192.168.1.112:/glusterfs/test0-002
Options Reconfigured:
cluster.quorum-count: 1
cluster.quorum-type: fixed
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
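For reference, a volume like the one above can be recreated along these
lines (a sketch only; brick paths and the single-host 1 x 3 layout are taken
from the volume info above, and 'force' is assumed to be needed because all
three bricks live on the same server):

```shell
# Create a 1 x 3 replicate volume on a single host; 'force' overrides
# the warning about placing all replicas on the same server.
gluster volume create test0 replica 3 \
    192.168.1.112:/glusterfs/test0-000 \
    192.168.1.112:/glusterfs/test0-001 \
    192.168.1.112:/glusterfs/test0-002 force

# Force client-side quorum down to a single brick, as in the setup above.
gluster volume set test0 cluster.quorum-type fixed
gluster volume set test0 cluster.quorum-count 1

gluster volume start test0
```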
The client is fio with the following job file:
[global]
name=write
filename=testfile
ioengine=gfapi_async
volume=test0
brick=localhost
create_on_open=1
rw=randwrite
direct=1
numjobs=1
time_based=1
runtime=600
[test-4-kbytes]
bs=4k
size=1G
iodepth=128
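Assuming the job file above is saved as, say, write.fio (the filename is
arbitrary), it is run as follows; note that fio must have been built with
GFAPI support, i.e. with the libgfapi development headers available at
build time:

```shell
# Run the randwrite workload against volume test0 via libgfapi.
fio write.fio
```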
How to reproduce:
0) start the volume;
1) run fio;
2) run 'gluster volume status', select 2 arbitrary brick processes
and kill them;
3) make sure fio is still running without I/O errors;
4) wait a few seconds, then issue 'gluster volume start [VOL] force'
to restart the bricks, and finally issue 'gluster volume status' again
to check whether all bricks are running;
5) repeat from 2).
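The loop above can be sketched as a script (an illustration only; it
assumes fio was started separately, and that the brick processes are the
glusterfsd instances whose command lines mention the volume name, as they
do in a stock setup):

```shell
#!/bin/sh
VOL=test0

while :; do
    # Step 2: pick two arbitrary brick processes of the volume and kill
    # them.  The same PIDs are shown by 'gluster volume status'.
    pgrep -f "glusterfsd.*${VOL}" | head -n 2 | xargs -r kill

    # Step 3: give fio some time to notice; it should keep going, since
    # cluster.quorum-count is forced to 1.
    sleep 5

    # Step 4: restart the killed bricks and verify they are back up.
    gluster volume start "$VOL" force
    gluster volume status "$VOL"
    sleep 5
done
```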
This is likely to work a few times but, sooner or later, it breaks
at 3): fio detects an I/O error, most probably EIO or ENOTCONN. From
this point on, killing and restarting fio yields an error in glfs_creat(),
and even a manual volume restart doesn't help.
NOTE: as of 7914c6147adaf3ef32804519ced850168fff1711, fio's gfapi_async
engine is still incomplete and _silently ignores I/O errors_. Currently
I'm using the following tweak to detect and report them (YMMV, consider
experimental):
diff --git a/engines/glusterfs_async.c b/engines/glusterfs_async.c
index 0392ad6e..27ebb6f1 100644
--- a/engines/glusterfs_async.c
+++ b/engines/glusterfs_async.c
@@ -7,6 +7,7 @@
#include "gfapi.h"
#define NOT_YET 1
struct fio_gf_iou {
+ struct thread_data *td;
struct io_u *io_u;
int io_complete;
};
@@ -80,6 +81,7 @@ static int fio_gf_io_u_init(struct thread_data *td, struct io_u *io_u)
}
io->io_complete = 0;
io->io_u = io_u;
+ io->td = td;
io_u->engine_data = io;
return 0;
}
@@ -95,7 +97,20 @@ static void gf_async_cb(glfs_fd_t * fd, ssize_t ret, void *data)
struct fio_gf_iou *iou = io_u->engine_data;
dprint(FD_IO, "%s ret %zd\n", __FUNCTION__, ret);
- iou->io_complete = 1;
+ if (ret != io_u->xfer_buflen) {
+ if (ret >= 0) {
+ io_u->resid = io_u->xfer_buflen - ret;
+ io_u->error = 0;
+ iou->io_complete = 1;
+ } else
+ io_u->error = errno;
+ }
+
+ if (io_u->error) {
+ log_err("IO failed (%s).\n", strerror(io_u->error));
+ td_verror(iou->td, io_u->error, "xfer");
+ } else
+ iou->io_complete = 1;
}
static enum fio_q_status fio_gf_async_queue(struct thread_data fio_unused * td,
--
Dmitry