You can use the Artemis CLI to execute activation sequence maintenance/recovery tools for Pluggable Quorum Replication.
The two main commands are `activation list` and `activation set`, which can be used together to recover from a disaster affecting the local/coordinated activation sequences.
A proper ZooKeeper cluster should use at least 3 nodes, but what happens if all of these nodes crash, losing the activation state information required to manage replication?
During the disaster (i.e. while the ZooKeeper nodes are unreachable) the following occurs:
Active brokers shut down (and, if restarted, hang while waiting to reconnect to the ZooKeeper cluster)
Passive brokers unpair and wait to reconnect to the ZooKeeper cluster again
Necessary administrative action:
Stop all brokers
Restart ZooKeeper cluster
Search for the brokers with the highest local activation sequence for their NodeID by running this command from the `bin` folder of each broker:
```
$ ./artemis activation list --local
Local activation sequence for NodeID=7debb3d1-0d4b-11ec-9704-ae9213b68ac4: 1
```
Set the coordinated activation sequence to that value by running this command from the `bin` folder of the brokers with the highest local activation sequence:
```
# assuming 1 to be the highest local activation sequence obtained at the previous step
# for NodeID 7debb3d1-0d4b-11ec-9704-ae9213b68ac4
$ ./artemis activation set --remote --to 1
Forced coordinated activation sequence for NodeID=7debb3d1-0d4b-11ec-9704-ae9213b68ac4 from 0 to 1
```
Restart all brokers: the previously active ones should be able to become active again
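When there are many brokers, the comparison step can be scripted. The sketch below is a hypothetical helper (the broker names and the `outputs` dictionary are made-up sample data); it only assumes the `activation list --local` output format shown above and picks, for each NodeID, the broker with the highest local activation sequence:

```python
import re

# Hypothetical sample data: one line of `./artemis activation list --local`
# output per broker, collected manually from each broker's bin folder.
outputs = {
    "broker1": "Local activation sequence for NodeID=7debb3d1-0d4b-11ec-9704-ae9213b68ac4: 1",
    "broker2": "Local activation sequence for NodeID=7debb3d1-0d4b-11ec-9704-ae9213b68ac4: 0",
}

PATTERN = re.compile(r"Local activation sequence for NodeID=(\S+): (\d+)")

def parse(line):
    """Extract (node_id, sequence) from one line of CLI output."""
    m = PATTERN.search(line)
    return m.group(1), int(m.group(2))

# For each NodeID, remember which broker holds the highest local sequence.
best = {}
for broker, line in outputs.items():
    node_id, seq = parse(line)
    if node_id not in best or seq > best[node_id][1]:
        best[node_id] = (broker, seq)

for node_id, (broker, seq) in best.items():
    print(f"NodeID {node_id}: highest local sequence {seq} on {broker}")
```

The broker reported by the script is the one whose `bin` folder you would run `activation set` from in the next step.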
The more ZooKeeper nodes there are, the lower the chance that a disaster like this requires administrative intervention, because a larger ZooKeeper cluster can tolerate more failures.
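As a back-of-the-envelope check: a ZooKeeper ensemble stays available only while a strict majority of its nodes is up, so an ensemble of n nodes tolerates floor((n - 1) / 2) failures. A minimal sketch:

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Failures a ZooKeeper ensemble survives while keeping a strict majority."""
    return (ensemble_size - 1) // 2

# A 3-node ensemble survives 1 failure; 5 nodes survive 2; 7 survive 3.
for n in (3, 5, 7):
    print(f"{n} nodes tolerate {tolerated_failures(n)} failure(s)")
```

This is also why even-sized ensembles add little: 4 nodes tolerate the same single failure as 3.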