[Gluster-devel] op_ret setting in gd_commit_op_phase

Atin Mukherjee amukherj at redhat.com
Sat Aug 23 08:31:27 UTC 2014


Hi Devel-list,

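The current implementation of gd_commit_op_phase sets op_ret to a
non-zero value if any of the commit operations fails, and the transaction
is then reported as failed. This can leave the CLI output and the actual
volume state out of sync, as in the following scenario.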

Consider a cluster of 2 nodes:
1. Stop the volume at Node 1
2. Start the volume at Node 1 and, while the volume is starting up, bring
down Node 2.
3. Volume start fails with a message "volume start: test-vol: failed:
Commit failed on 00000000-0000-0000-0000-000000000000. Please check log
file for details."

4. gluster volume status now shows the volume as started although the
previous transaction failed.

In this case, since the local commit op succeeded, the changes to volinfo
were made, but op_ret was non-zero because the remote commit op failed on
the other node (which went down at the same time).
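To illustrate, here is a minimal sketch of the ordering described above. It
is not the actual glusterd code: every name in it (sketch_volinfo,
sketch_local_commit, sketch_remote_commit, sketch_commit_phase_current) is
hypothetical, and the remote commit is reduced to a "peer is up" flag. It
only assumes what is stated above, namely that the local commit runs first
and that any failed commit makes op_ret non-zero.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the volume state kept in volinfo. */
struct sketch_volinfo {
    bool started;
};

/* Hypothetical local commit: applies the change locally, returns 0. */
int sketch_local_commit(struct sketch_volinfo *vol)
{
    vol->started = true;
    return 0;
}

/* Hypothetical remote commit: fails with -1 when the peer is down. */
int sketch_remote_commit(bool peer_up)
{
    return peer_up ? 0 : -1;
}

/*
 * Models the current ordering: local commit first, then the peers.
 * Any failed commit leaves op_ret non-zero, even though the local
 * state has already been changed and is never rolled back.
 */
int sketch_commit_phase_current(struct sketch_volinfo *vol,
                                const bool *peers_up, int npeers)
{
    int op_ret = sketch_local_commit(vol);
    if (op_ret != 0)
        return op_ret;

    for (int i = 0; i < npeers; i++) {
        if (sketch_remote_commit(peers_up[i]) != 0)
            op_ret = -1;
    }
    return op_ret;
}
```

With a single peer that is down, this returns a non-zero op_ret while
vol.started is already true, which matches the mismatch seen in steps 3
and 4.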

I was thinking of moving the local commit op code after the remote
commit ops and then overriding op_ret and op_errstr with the result of
the local commit op. I know this fix can't solve the entire
inconsistency issue, as the current design doesn't have an UNDO
framework, but with it we can at least show a correct message in the CLI.
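A rough sketch of the proposed ordering, under the same caveat that this is
not the real glusterd code: the names demo_volinfo, demo_commit_result,
demo_local_commit, demo_remote_commit and demo_commit_phase_proposed, the
local_ok flag and the error strings are all made up for illustration. The
only thing taken from the proposal is that the remote commits run first and
the local commit result then overrides op_ret and op_errstr.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the volume state kept in volinfo. */
struct demo_volinfo {
    bool started;
};

/* Hypothetical result record; op_errstr is NULL when there is no error. */
struct demo_commit_result {
    int op_ret;
    const char *op_errstr;
    int failed_peers; /* kept only so the peer failures are not lost */
};

/* Hypothetical local commit: changes local state only on success. */
int demo_local_commit(struct demo_volinfo *vol, bool local_ok)
{
    if (!local_ok)
        return -1;
    vol->started = true;
    return 0;
}

/* Hypothetical remote commit: fails with -1 when the peer is down. */
int demo_remote_commit(bool peer_up)
{
    return peer_up ? 0 : -1;
}

/*
 * Proposed ordering: remote commits first, local commit last.
 * The local result overrides op_ret and op_errstr, so the value
 * returned to the CLI matches the local volinfo.
 */
struct demo_commit_result
demo_commit_phase_proposed(struct demo_volinfo *vol, bool local_ok,
                           const bool *peers_up, int npeers)
{
    struct demo_commit_result res = { 0, NULL, 0 };

    for (int i = 0; i < npeers; i++) {
        if (demo_remote_commit(peers_up[i]) != 0) {
            res.op_ret = -1;
            res.op_errstr = "commit failed on a peer";
            res.failed_peers++;
        }
    }

    if (demo_local_commit(vol, local_ok) == 0) {
        res.op_ret = 0;
        res.op_errstr = NULL;
    } else {
        res.op_ret = -1;
        res.op_errstr = "commit failed on localhost";
    }
    return res;
}
```

In the 2-node scenario above (peer down, local commit succeeds) this yields
op_ret == 0 with vol.started true, so the CLI message and the local state
agree. The peers can still diverge, which is the part that would need an
UNDO framework.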

Your thoughts would be highly appreciated.

~Atin

