Activation Sequence Tools
You can use the Artemis CLI to execute activation sequence maintenance/recovery tools for Pluggable Quorum Replication.
The two main commands are activation list and activation set, which can be used together to recover from a disaster affecting local or coordinated activation sequences.
Here is a disaster scenario built around the reference implementation (RI), which uses Apache Zookeeper and Apache Curator, to demonstrate how these commands are used.
Troubleshooting Case: Zookeeper Cluster disaster
A proper Zookeeper cluster should use at least 3 nodes, but what happens if all these nodes crash, losing the activation state information required to run an Artemis replication cluster?
During the disaster, i.e. while the Zookeeper nodes are unreachable, the brokers behave as follows:
- live ones shut down (and, if restarted by a script, should hang waiting to reconnect to the Zookeeper cluster)
- replicas become passive, waiting to reconnect to the Zookeeper cluster
The admin should:
- stop all Artemis brokers
- restart the Zookeeper cluster
- find the brokers with the highest local activation sequence for their NodeID by running this command from the bin folder of each broker:
$ ./artemis activation list --local
Local activation sequence for NodeID=7debb3d1-0d4b-11ec-9704-ae9213b68ac4: 1
- from the bin folder of the brokers with the highest local activation sequence, run:
# assuming 1 to be the highest local activation sequence obtained at the previous step
# for NodeID 7debb3d1-0d4b-11ec-9704-ae9213b68ac4
$ ./artemis activation set --remote --to 1
Forced coordinated activation sequence for NodeID=7debb3d1-0d4b-11ec-9704-ae9213b68ac4 from 0 to 1
- restart all brokers: the previously live ones should be able to become live again
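When there are many broker instances, the search step above can be scripted. The following is a minimal sketch, assuming the output of activation list --local has exactly the shape shown in the sample above. The function names and the instance directories passed as arguments are hypothetical and not part of Artemis.

```shell
#!/bin/sh
# Sketch only. Assumes each relevant output line looks like:
#   Local activation sequence for NodeID=<uuid>: <number>

# Read `activation list --local` output on stdin and print
# "<NodeID> <highest sequence>" once per NodeID, sorted by NodeID.
highest_per_node() {
  sed -n 's/^Local activation sequence for NodeID=\(.*\): \([0-9][0-9]*\)$/\1 \2/p' |
    awk '{ if (!($1 in max) || $2 + 0 > max[$1]) max[$1] = $2 + 0 }
         END { for (id in max) print id, max[id] }' |
    sort
}

# Run the list command in the bin folder of every broker instance
# directory given as an argument (hypothetical paths) and summarize.
scan_brokers() {
  for instance in "$@"; do
    (cd "$instance/bin" && ./artemis activation list --local)
  done | highest_per_node
}
```

For example, scan_brokers /opt/broker-a /opt/broker-b (hypothetical paths) prints one line per NodeID with its highest local sequence, which is the value to pass to activation set --remote --to on the matching broker.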
The more Zookeeper nodes there are, the lower the chance that a disaster like this requires admin intervention, because a larger Zookeeper cluster can tolerate more failures.
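As a rough illustration of that tolerance, a Zookeeper ensemble stays available only while a majority of its nodes is up, so n nodes tolerate (n - 1) / 2 failures, rounded down. The helper below is a hypothetical sketch of that arithmetic, not an Artemis or Zookeeper tool.

```shell
#!/bin/sh
# Failures a majority-quorum ensemble of n nodes can tolerate: floor((n - 1) / 2).
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 3 5 7; do
  echo "$n nodes tolerate $(tolerated_failures "$n") failure(s)"
done
```

Even-sized ensembles add no tolerance over the next smaller odd size (4 nodes still tolerate only 1 failure), which is why odd sizes are the usual choice.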