eG Monitoring
 

Measures reported by IBMDB2RepLogGpTest

Recovery Point Objective (RPO) is the maximum amount of data you can afford to lose if the DB2 UDB database server crashes. Recovery Time Objective (RTO) is a metric that indicates how quickly you need to recover your applications, databases, and other services following a disaster (crash) in order to maintain business continuity.

In a high availability setup, the primary and standby databases should always be in sync. If the primary database crashes before its data is synced with the standby databases, a significant amount of data will be lost. To avoid such data loss, administrators need to periodically track how far each standby database is lagging behind, i.e., the amount of data that the standby database still has to receive to be in sync with the primary database. Similarly, if data and infrastructure are not recovered within the duration set for the Recovery Time Objective following a disaster, the business could suffer irreparable loss of data and data integrity. To avoid such eventualities and to ensure that the business returns to normal in a very short time, administrators should periodically track the RPO and RTO of the target DB2 UDB database server. The IBMDB2RepLogGpTest helps administrators in this regard.

For each database created on the target DB2 UDB database server, this test reports the amount of data that was lost when a switchover happened and the time lag noticed in the transport of logs between the primary and standby databases. Using this test, administrators can estimate the time and the amount of data required for the primary and standby databases to be in sync. This will help administrators fine-tune their high availability environment.
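The two quantities described above can be illustrated with a minimal sketch. It assumes that a log position and a log timestamp are available for both the primary and the standby; the `LogPosition` type, its fields, and the sample values are hypothetical and are not how the test itself collects its data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogPosition:
    """Hypothetical snapshot of how far a database has progressed in the log stream."""
    byte_offset: int    # position in the log stream, in bytes
    log_time: datetime  # timestamp of the latest log record at that position


def log_gap_bytes(primary: LogPosition, standby: LogPosition) -> int:
    """Bytes of log generated on the primary that the standby does not have yet."""
    return max(0, primary.byte_offset - standby.byte_offset)


def transport_lag_seconds(primary: LogPosition, standby: LogPosition) -> float:
    """Seconds by which the standby's latest log record trails the primary's."""
    return max(0.0, (primary.log_time - standby.log_time).total_seconds())


# Hypothetical sample values, for illustration only.
primary = LogPosition(byte_offset=5_000_000, log_time=datetime(2024, 1, 1, 12, 0, 30))
standby = LogPosition(byte_offset=4_200_000, log_time=datetime(2024, 1, 1, 12, 0, 18))

print(log_gap_bytes(primary, standby))          # 800000
print(transport_lag_seconds(primary, standby))  # 12.0
```

Both helpers clamp at zero, so a standby that has fully caught up reports no gap rather than a negative one.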

Outputs of the test: One set of results for each database created on the target database server instance being monitored

The measures made by this test are as follows:

Measurement: LaggingRPO
Description: Indicates the amount of data (in bytes) that was lost on this database when a database switchover happened.
Measurement Unit: Bytes
Interpretation: If too many log gaps are detected in the sequence of the log files, it implies that the primary and standby databases are not up to date. A consistent increase in the value of this measure affects the availability of data in the database.

Measurement: LaggingDurationsRTO
Description: Indicates the time lag noticed in the transport of logs to this database with respect to the generation of logs in the primary database.
Measurement Unit: Seconds
Interpretation: Given enough resources, in particular network bandwidth, a DB2 UDB standby database can keep pace with very high workloads. Where resources are constrained, the standby can begin to fall behind, resulting in a transport or apply lag. A transport lag is the amount of data, measured in time, that the standby has not received from the primary. A low value is desired for this measure.
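The interpretation guidance for the two measures (watch for a consistent increase in the amount of lagging data, and prefer a low time lag) could be checked along the following lines. This is only a sketch: the function names, the three-sample window, the 30-second threshold, and the sample values are all assumptions chosen for illustration, not settings of the test.

```python
from typing import Sequence


def is_consistently_increasing(samples: Sequence[float], window: int = 3) -> bool:
    """Return True if each of the last `window` samples exceeds the one before it."""
    if window < 2 or len(samples) < window:
        return False
    recent = samples[-window:]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


def exceeds_threshold(value: float, threshold: float) -> bool:
    """Return True if a measure is above the tolerated limit."""
    return value > threshold


# Hypothetical readings, for illustration only.
lag_bytes_history = [120_000, 150_000, 410_000, 900_000]  # successive LaggingRPO values
lag_seconds_latest = 45.0                                  # latest LaggingDurationsRTO value

print(is_consistently_increasing(lag_bytes_history))  # True
print(exceeds_threshold(lag_seconds_latest, 30.0))    # True
```

Using a short window of strictly increasing samples is one simple way to separate a sustained trend from a single spike; a real deployment would tune both the window and the threshold to its own RPO and RTO targets.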