eG Monitoring
 

Measures reported by AzrBackupJobsTest

The Azure Backup service runs background jobs for various operations, such as triggering a backup, performing a restore, or disabling backup.

If backup jobs fail, no copies of your critical data will be available. Without backups, data cannot be recovered when disaster strikes, and data loss becomes inevitable. This is why administrators should track the progress of these backup jobs, quickly detect job failures, and take appropriate action. Likewise, administrators should be able to rapidly identify jobs that have been running for an abnormally long time, so that the reasons for the delay can be quickly ascertained. The AzrBackupJobsTest helps with all of the above.

This test tracks the status of the backup jobs that have been triggered for an Azure Subscription and reports the count of jobs in different states. In the process, the test alerts administrators to failed jobs and to jobs that have been running for longer than a configured time limit. Detailed diagnostics identify the exact jobs that failed or are long-running, enabling administrators to troubleshoot the failure or the abnormal run time, as the case may be.
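
The counting described above can be illustrated with a minimal sketch. This is not how the eG agent is implemented; the `BackupJob` record, the status strings ("Failed", "InProgress", "Completed"), and the `summarize_jobs` helper are hypothetical names chosen for illustration. The sketch also assumes that only jobs still in progress can count as long-running.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable


@dataclass
class BackupJob:
    # Hypothetical job record; field names are illustrative only.
    name: str
    status: str           # assumed values: "Failed", "InProgress", "Completed"
    start_time: datetime


def summarize_jobs(jobs: Iterable[BackupJob], now: datetime,
                   long_running_limit_min: int) -> Dict[str, int]:
    """Count jobs per state and flag in-progress jobs that exceed the limit."""
    limit = timedelta(minutes=long_running_limit_min)
    measures = {
        "failed_jobs": 0,
        "progress_job": 0,
        "completed_job": 0,
        "long_running_jobs": 0,
    }
    for job in jobs:
        if job.status == "Failed":
            measures["failed_jobs"] += 1
        elif job.status == "Completed":
            measures["completed_job"] += 1
        elif job.status == "InProgress":
            measures["progress_job"] += 1
            # Assumption: a job is long-running once its elapsed time
            # is strictly greater than the configured limit.
            if now - job.start_time > limit:
                measures["long_running_jobs"] += 1
    return measures
```

In this sketch, `long_running_limit_min` plays the role of the test's LONG RUNNING JOBS LIMIT IN MIN setting. Passing `now` explicitly, rather than reading the clock inside the function, keeps the calculation easy to check against known inputs.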

Outputs of the test: One set of results for the Azure Subscription being monitored

The measures reported by this test are as follows:

| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| failed_jobs | Indicates the number of backup jobs that have failed. | Number | Ideally, the value of this measure should be 0. If a non-zero value is reported, it implies that one/more backup jobs have failed. In this case, you can use the detailed diagnosis of this measure to know which jobs failed. |
| progress_job | Indicates the number of backup jobs that are in progress. | Number | The detailed diagnosis of this measure, if enabled, lists the jobs in progress. |
| completed_job | Indicates the number of backup jobs that have completed. | Number | The detailed diagnosis of this measure, if enabled, lists the jobs that have completed. |
| long_running_jobs | Indicates the number of backup jobs that have been running for a duration beyond the configured LONG RUNNING JOBS LIMIT IN MIN. | Number | A high value is a cause for concern. In this case, use the detailed diagnosis of this measure to know which jobs have been running for a long time. |
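
The interpretation guidance for these measures could be applied programmatically along the following lines. This is only a sketch: `interpret_measures` and its `long_running_warn_at` parameter are hypothetical, and the documentation does not define what counts as a "high" value of long_running_jobs, so the default used here is an assumption.

```python
from typing import Dict, List


def interpret_measures(measures: Dict[str, int],
                       long_running_warn_at: int = 1) -> List[str]:
    """Return advisory messages for measures that need attention."""
    findings = []
    # Per the guidance, failed_jobs should ideally be 0.
    if measures.get("failed_jobs", 0) > 0:
        findings.append(
            "failed_jobs is non-zero: use the detailed diagnosis "
            "to see which jobs failed"
        )
    # Assumption: "high" means at or above a user-chosen threshold.
    if measures.get("long_running_jobs", 0) >= long_running_warn_at:
        findings.append(
            "long_running_jobs is high: use the detailed diagnosis "
            "to see which jobs are running long"
        )
    return findings
```

The progress_job and completed_job measures are informational in the guidance above, so the sketch raises no findings for them.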