eG Monitoring
 

Measures reported by IgniteAlwFaiTest

Apache Ignite provides a high degree of fault tolerance and supports automatic job failover. If a node crashes or a job fails on a given node, the affected jobs are automatically transferred to other available nodes for re-execution. The Always Failover SPI (Service Provider Interface) ensures that when a job from a compute task fails, an attempt is made to reroute the failed job to a node that has not executed any other job from the same task. If no such node is available, an attempt is made to reroute the failed job to one of the nodes that may be running other jobs from the same task. If neither attempt succeeds, the job is not failed over.
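The node-selection order described above can be sketched as a small, self-contained Java class. This is a hypothetical illustration, not Ignite's actual AlwaysFailoverSpi code: the class name AlwaysFailoverSketch, the method selectNode, and the use of plain string node IDs are all assumptions made for this example. Among several candidates it simply takes the first one in list order, which is an arbitrary simplification.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical sketch of the Always Failover node-selection order.
 * Node IDs are plain strings; this is NOT Ignite's real AlwaysFailoverSpi implementation.
 */
final class AlwaysFailoverSketch {

    private AlwaysFailoverSketch() {}

    /**
     * Picks a node on which to re-execute a failed job.
     *
     * @param available  nodes currently available in the topology
     * @param failedNode node on which the job failed (never chosen again)
     * @param taskNodes  nodes that have executed, or are running, jobs of the same task
     * @return the chosen node, or empty if the job cannot be failed over
     */
    static Optional<String> selectNode(List<String> available, String failedNode, Set<String> taskNodes) {
        // First attempt: a node that has not executed any job from this task.
        for (String node : available) {
            if (!node.equals(failedNode) && !taskNodes.contains(node)) {
                return Optional.of(node);
            }
        }
        // Second attempt: any other node, even one that may be running jobs from this task.
        for (String node : available) {
            if (!node.equals(failedNode)) {
                return Optional.of(node);
            }
        }
        // Neither attempt succeeded: the job is not failed over.
        return Optional.empty();
    }
}
```

For example, with nodes n1, n2 and n3, a job that failed on n1 while n2 also ran jobs from the same task would be rerouted to n3; if all three nodes had run jobs from the task, the sketch falls back to n2.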

The Always Failover SPI is responsible for automatic failover, so it needs to be monitored to make sure it is working as expected.

This test monitors the Always Failover SPI to verify that jobs are rerouted to other nodes when failover occurs. If jobs are failing but no jobs are being failed over, administrators should check whether something is wrong with the SPI.
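The rule of thumb above can be expressed as a simple check over two successive polls. This is a hypothetical sketch: the class FailoverHealthCheck and its method are invented for illustration, both counters are assumed to be cumulative totals, and the source of the failed-job count is an assumption, since this test reports only the failover count.

```java
/**
 * Hypothetical health check: flags an interval in which jobs failed but none were failed over.
 * Both counters are ASSUMED to be cumulative totals sampled at two polling intervals.
 */
final class FailoverHealthCheck {

    private FailoverHealthCheck() {}

    /** Returns true when jobs failed during the interval but totalFailoverJobsCount did not grow. */
    static boolean needsInvestigation(long prevFailedJobs, long currFailedJobs,
                                      long prevFailoverJobs, long currFailoverJobs) {
        long failedDelta = currFailedJobs - prevFailedJobs;       // jobs that failed in this interval
        long failoverDelta = currFailoverJobs - prevFailoverJobs; // jobs failed over in this interval
        // A negative delta (e.g. counters reset after a restart) is not flagged.
        return failedDelta > 0 && failoverDelta == 0;
    }
}
```

Comparing deltas rather than raw totals avoids raising an alert for failovers that happened long before the current polling interval.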

Outputs of the test: One set of results for each Apache Ignite Server

The measures made by this test are as follows:

Measurement: totalFailoverJobsCount
Description: Indicates the total number of jobs that were failed over from the node where they originally executed to other nodes.
Measurement Unit: Number
Interpretation: If jobs are failing but this value does not increase, check whether the Always Failover SPI is working as expected.