|
Measures reported by OraRacMttrTest
Instance recovery, the process of recovering the redo thread of a failed instance, is a critical component affecting availability. In Oracle RAC, the SMON process of one surviving instance performs instance recovery for the failed instance. The sooner this happens, and the less I/O it consumes, the better the user experience with Oracle RAC.
Mean time to recovery (MTTR) is the average time that the Oracle server will take to recover from any failure. In order to limit recovery I/O and optimize cluster performance, you need to understand the MTTR target your system is currently achieving and what your potential MTTR target could be, given the I/O capacity.
This test reports the target and estimated MTTR, and monitors the key factors affecting MTTR, such as the redo log size and the number of redo blocks to be processed.
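As a rough illustration of where figures like these can come from: Oracle exposes per-instance recovery statistics in the `GV$INSTANCE_RECOVERY` view. The sketch below assumes (this document does not say so) that each measure maps onto the like-named column of that view. The `MEASURE_COLUMNS` mapping and the `build_query` and `row_to_measures` helpers are hypothetical names used only for illustration, and no database connection is made.

```python
# Assumed mapping from measure name to GV$INSTANCE_RECOVERY column.
# The test's real data source is not stated in this document.
MEASURE_COLUMNS = {
    "Target_MTTR": "TARGET_MTTR",
    "Estimated_MTTR": "ESTIMATED_MTTR",
    "Recovery_Estimated_IOs": "RECOVERY_ESTIMATED_IOS",
    "Target_redo_blocks": "TARGET_REDO_BLKS",
    "Actual_redo_blocks": "ACTUAL_REDO_BLKS",
    "Writes_logfile_size": "WRITES_LOGFILE_SIZE",
    "Writes_auto_tune": "WRITES_AUTOTUNE",
}


def build_query():
    """Build a SELECT that returns one row per RAC instance."""
    columns = ", ".join(MEASURE_COLUMNS.values())
    return f"SELECT INST_ID, {columns} FROM GV$INSTANCE_RECOVERY ORDER BY INST_ID"


def row_to_measures(row):
    """Convert one result row into (instance id, {measure name: value}).

    The row is expected to hold INST_ID first, followed by the columns
    in the same order as MEASURE_COLUMNS.
    """
    inst_id, *values = row
    if len(values) != len(MEASURE_COLUMNS):
        raise ValueError("row does not match the expected column list")
    return inst_id, dict(zip(MEASURE_COLUMNS, values))
```

A row fetched with any Oracle client library could be passed to `row_to_measures` to obtain values keyed by the measure names used in the table below.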
The measures made by this test are as follows:
| Measurement |
Description |
Measurement Unit |
Interpretation |
| Target_MTTR |
Indicates the effective mean time to recover (MTTR) target for this instance.
|
Secs |
Usually, the value of this measure should be equal to the value of the FAST_START_MTTR_TARGET initialization parameter. FAST_START_MTTR_TARGET specifies a target for the expected mean time to recover (MTTR), that is, the time (in seconds) that it should take to start up the instance and perform cache recovery.
After FAST_START_MTTR_TARGET is set, the database manages incremental checkpoint writes in an attempt to meet that target.
If FAST_START_MTTR_TARGET is set to such a small value that recovery cannot be completed within that time frame, then the value of this measure will be larger than FAST_START_MTTR_TARGET.
If FAST_START_MTTR_TARGET is set to such a high value that even in the worst case (the whole buffer cache is dirty) recovery would not take that long, then the value of this measure will be the same as the value of the Estimated MTTR measure.
If FAST_START_MTTR_TARGET is not specified, then again, the value of this measure will be the same as the value of the Estimated MTTR measure.
|
| Estimated_MTTR |
Indicates the current estimated mean time to recover (MTTR).
|
Secs |
This measure is calculated from the number of dirty buffers and log blocks (it is 0 if FAST_START_MTTR_TARGET is not specified). In effect, it tells you how long recovery of the instance could be expected to take, given the work your system is doing at the time of testing.
Because the estimate reflects the current state of the running database, it can be low shortly after startup: if the database has just opened, the system may contain only a few dirty buffers, so little cache recovery would be required if the instance failed at that moment. That is why the value of this measure can, for the moment, be lower than the minimum possible value of the Target_MTTR measure. |
| Recovery_Estimated_IOs |
Indicates the estimated number of dirty buffers in the buffer cache of this Oracle instance. |
Number |
|
| Target_redo_blocks |
Indicates the target number of redo blocks that must be processed during recovery of this Oracle instance. |
Number |
Instance recovery uses the contents of the online redo log files to rebuild the database buffer cache to the state it was in before the crash. It replays all changes extracted from the redo logs that refer to blocks that had not been written to disk at the time of the crash. Though instance recovery guarantees no corruption, the roll forward may take considerable time before the database can be opened. This time depends on two factors: how much redo has to be read, and how many read/write operations will be needed on the datafiles as the redo is applied. The values of these measures are good indicators of the amount of redo that must be read during recovery, and are therefore useful when determining the MTTR. |
| Actual_redo_blocks |
Indicates the actual number of redo blocks required to recover this Oracle instance. |
Number |
| Writes_logfile_size |
Indicates the number of writes driven by the smallest redo log file size for each Oracle instance.
| Number |
This measure reflects checkpoint writes that are driven by the redo log file size, which occurs when the redo log files are undersized. Because the FAST_START_MTTR_TARGET parameter is set to limit the instance recovery time, Oracle automatically tries to checkpoint as frequently as necessary to meet it. In that case, the log files should be large enough that they do not force additional checkpointing. |
| Writes_auto_tune |
Indicates the number of writes due to auto-tune checkpointing.
| Number |
The checkpoint auto-tuning mechanism inspects statistics on machine utilization, such as the rate of disk I/O and CPU usage. If there appears to be spare capacity, it uses that capacity to write out additional dirty buffers from the database buffer cache, thus pushing the checkpoint position forward. As a result, even if the FAST_START_MTTR_TARGET parameter is set to a high value (the highest possible is 3600 seconds; anything above that is rounded down), the actual recovery time may well be much less.
Enabling checkpoint auto-tuning with a high target should result in your instance always having the fastest possible recovery time that is consistent with maximum performance.
|
|
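The rules given in the Interpretation column for Target_MTTR can be sketched as a small decision function. This is a minimal illustration only, not part of the test: the function names and the returned labels are hypothetical, and treating a value of 0 as "not specified" is an assumption. The 3600-second ceiling is the one quoted in the Writes_auto_tune interpretation.

```python
# Upper bound for FAST_START_MTTR_TARGET, in seconds, as quoted in the table.
MAX_FAST_START_MTTR_TARGET = 3600


def effective_setting(fast_start_mttr_target):
    """Return the usable parameter value, or None if it is not specified.

    None or 0 is treated as "not specified" (an assumption of this sketch);
    values above the ceiling are rounded down to it.
    """
    if not fast_start_mttr_target:
        return None
    return min(fast_start_mttr_target, MAX_FAST_START_MTTR_TARGET)


def interpret_target_mttr(target_mttr, estimated_mttr, fast_start_mttr_target=None):
    """Classify a Target_MTTR reading against FAST_START_MTTR_TARGET.

    All values are in seconds. The returned labels are hypothetical.
    """
    setting = effective_setting(fast_start_mttr_target)
    if setting is None:
        # Parameter not specified: Target_MTTR is expected to match Estimated_MTTR.
        return "unset"
    if target_mttr > setting:
        # The setting is too small for recovery to finish within it.
        return "setting_unachievable"
    if target_mttr == setting:
        # The usual case: the effective target equals the configured one.
        return "on_target"
    if target_mttr == estimated_mttr:
        # The setting exceeds even worst-case recovery time.
        return "setting_above_worst_case"
    # Below the setting but not matching the estimate: none of the rules apply.
    return "inconclusive"
```

For example, `interpret_target_mttr(45, 20, 10)` falls into the "setting_unachievable" branch, because the effective target (45 seconds) is larger than the configured 10 seconds.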