|
Measures reported by HanaSavPntTest
SAP HANA persists in-memory data by using savepoints. Each SAP HANA service has its own separate savepoints. During a savepoint operation, the SAP HANA database flushes all changed data from memory to the data volumes. The data belonging to a savepoint represents a consistent state of the data on disk and remains so until the next savepoint operation has completed. Redo log entries are written to the log volumes for all changes to persistent data. In the event of a database restart (for example, after a crash), the data from the last completed savepoint is read from the data volumes, and the redo log entries written to the log volumes since that savepoint are replayed.
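The restart behavior described above can be illustrated with a toy model. This is only a sketch: the `ToyStore` class and its attribute names are hypothetical, a dict stands in for the data volume, and a list stands in for the log volume. It does not reflect how SAP HANA is implemented internally.

```python
class ToyStore:
    """Toy model of savepoint plus redo-log recovery (illustrative only)."""

    def __init__(self):
        self.memory = {}       # current in-memory state
        self.data_volume = {}  # state as of the last completed savepoint
        self.redo_log = []     # changes logged since the last savepoint

    def write(self, key, value):
        # Every change to persistent data gets a redo log entry.
        self.redo_log.append((key, value))
        self.memory[key] = value

    def savepoint(self):
        # Flush the changed data to the data volume. Log entries older than
        # the savepoint are no longer needed for a restart in this model.
        self.data_volume = dict(self.memory)
        self.redo_log.clear()

    def restart(self):
        # Rebuild memory from the last savepoint, then replay the redo log.
        self.memory = dict(self.data_volume)
        for key, value in self.redo_log:
            self.memory[key] = value


store = ToyStore()
store.write("a", 1)
store.savepoint()
store.write("b", 2)   # logged, but not yet part of any savepoint
store.memory = {}     # simulated crash: in-memory state is lost
store.restart()
print(store.memory)   # {'a': 1, 'b': 2}
```

The change to `b` survives the simulated crash even though no savepoint covered it, because the redo log entry is replayed on top of the last savepoint.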
HANA Savepoint is split into three individual stages:
Phase 2 (CRITICAL): This is the critical part of a savepoint operation, during which no concurrent write operations are allowed. This is achieved using the consistent change lock. To minimize the impact on concurrent operations, phase 2 must be kept as short as possible. The savepoint coordinator determines and stores the savepoint log position and the list of open transactions. In addition, pages that were changed during phase 1 are written to disk asynchronously.
To perform a savepoint write operation, SAP HANA needs to take a global database lock. This period is called the “critical phase” of a savepoint. While SAP HANA was designed to keep this period as short as possible, poor I/O performance can extend it to a length that causes a considerable performance impact. Because savepoints are used to implement backup and disaster recovery in SAP HANA, it is imperative to keep a close watch on the savepoint operation.
This test monitors the savepoint operation in the SAP HANA database server and reports the size of asynchronously flushed pages and row store pages, both during the page flush and during the critical phase. In addition, this test reports the maximum duration of the critical phase and of the blocking phase. This way, administrators can identify any increase in critical phase duration and promptly resolve the issue before it adversely affects server performance.
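As a rough illustration of how such measures could be derived, the sketch below aggregates a list of savepoint records into the measure names used by this test. The record field names, the byte and microsecond input units, and the choice to sum sizes over the measurement period are all assumptions made for the example; they are not taken from the test's actual implementation.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: sizes arrive in bytes


def summarize_savepoints(savepoints):
    """Aggregate hypothetical savepoint records into this test's measures."""

    def total_gb(field):
        # Assumption: sizes are summed over all savepoints in the period.
        return sum(sp[field] for sp in savepoints) / BYTES_PER_GB

    def max_ms(field):
        # Assumption: durations arrive in microseconds.
        return max((sp[field] for sp in savepoints), default=0) / 1000.0

    return {
        "TotalSize": total_gb("total_size"),
        "FlushedSize": total_gb("flushed_size"),
        "FlushedSizeInCriticalPh": total_gb("flushed_size_critical"),
        "FlushedRowStoreSize": total_gb("flushed_rowstore_size"),
        "FlushedRwStrCriticalPh": total_gb("flushed_rowstore_size_critical"),
        "MaxBlockingPhDuration": max_ms("blocking_phase_us"),
        "MaxCriticalPhDuration": max_ms("critical_phase_us"),
    }


# Two made-up savepoint records for demonstration.
sample = [
    {"total_size": 2 * BYTES_PER_GB, "flushed_size": BYTES_PER_GB,
     "flushed_size_critical": 0, "flushed_rowstore_size": BYTES_PER_GB // 2,
     "flushed_rowstore_size_critical": 0,
     "blocking_phase_us": 5000, "critical_phase_us": 1500},
    {"total_size": BYTES_PER_GB, "flushed_size": BYTES_PER_GB,
     "flushed_size_critical": 0, "flushed_rowstore_size": 0,
     "flushed_rowstore_size_critical": 0,
     "blocking_phase_us": 9000, "critical_phase_us": 4000},
]
measures = summarize_savepoints(sample)
print(measures)
```

Taking the maximum rather than the average for the two duration measures matches the test's stated intent of surfacing the worst blocking and critical phases in the period.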
Outputs of the test: One set of results for the target database server being monitored
The measures made by this test are as follows:
| Measurement |
Description |
Measurement Unit |
Interpretation |
| TotalSize |
Indicates the total amount of data that has been scheduled to be written to disk. |
GB |
|
| FlushedSize |
Indicates the size of asynchronously flushed pages. |
GB |
|
| FlushedSizeInCriticalPh |
Indicates the size of pages flushed in the critical phase. |
GB |
A low value is desired for this measure.
A high value of this measure indicates potential I/O overload. Normally, zero or only a few pages should be written in the critical phase, except in special situations such as a global savepoint for a data backup (and even in this case, the amount written in the critical phase should be on the order of 1% or less of the asynchronously flushed pages). A large amount of data written in the critical phase indicates that the I/O subsystem is overloaded and will most probably lead to longer blocking times for update transactions due to an increased CRITICAL_PHASE_DURATION.
|
| FlushedRowStoreSize |
Indicates the size of the asynchronously flushed row store pages. |
GB |
The row store is flushed only during a savepoint, whereas the column store also flushes data between savepoints to balance the load.
|
| FlushedRwStrCriticalPh |
Indicates the size of the row store pages flushed in the critical phase. |
GB |
|
| MaxBlockingPhDuration |
Indicates the maximum blocking phase duration. |
Milliseconds |
The majority of the savepoint is performed online without holding a lock, but the finalization of the savepoint requires a lock. This step is called the blocking phase of the savepoint. It consists of two major subphases, WaitForLock and Critical.
| Long durations of the blocking phase (outside of the critical phase) are typically caused by SAP HANA internal lock contention, whereas delays during the critical phase are often caused by problems in the disk I/O area.
| The detailed diagnosis of this measure lists the VolumeID, Initiation, Purpose, State, Blocking phase start time, Blocking phase duration (milliseconds), Critical phase start time, and Critical phase duration (milliseconds).
|
| MaxCriticalPhDuration |
Indicates the maximum time spent in the critical phase (during this time, updates are blocked). |
Milliseconds |
This measure shows the period of time during which update transactions were blocked by a savepoint. Normally, this should be in the milliseconds range, except for a global savepoint for a data backup, which may take longer due to global synchronization across all nodes.
| A long critical phase duration usually points to a problem, such as excessive I/O load.
| The detailed diagnosis of this measure lists the VolumeID, Initiation, Purpose, State, Blocking phase start time, Blocking phase duration (milliseconds), Critical phase start time, and Critical phase duration (milliseconds).
|
|
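The interpretation guidance in the table can be sketched as a small check. The function below is hypothetical: it applies the 1% guideline for data flushed in the critical phase, and it estimates the WaitForLock time as the blocking phase duration minus the critical phase duration, on the assumption that those two subphases make up the whole blocking phase. It deliberately sets no duration thresholds, since none are stated above.

```python
def diagnose_savepoint(flushed_gb, flushed_critical_gb,
                       blocking_ms, critical_ms, ratio_limit=0.01):
    """Apply the table's interpretation hints to one set of measures.

    ratio_limit defaults to the 1% guideline for critical-phase flushes.
    """
    # Share of asynchronously flushed data that was written in the critical phase.
    ratio = flushed_critical_gb / flushed_gb if flushed_gb > 0 else None

    # Assumption: blocking phase = WaitForLock + Critical.
    wait_for_lock_ms = max(blocking_ms - critical_ms, 0.0)

    if wait_for_lock_ms > critical_ms:
        dominant = "WaitForLock"   # hints at internal lock contention
    else:
        dominant = "Critical"      # hints at disk I/O problems

    return {
        "critical_flush_ratio": ratio,
        "exceeds_ratio_limit": ratio is not None and ratio > ratio_limit,
        "wait_for_lock_ms": wait_for_lock_ms,
        "dominant_subphase": dominant,
    }


result = diagnose_savepoint(flushed_gb=10.0, flushed_critical_gb=0.5,
                            blocking_ms=1200.0, critical_ms=200.0)
print(result)
```

In this made-up example, 5% of the flushed data was written in the critical phase, which is above the 1% guideline, and most of the blocking time was spent waiting for the lock rather than in the critical phase itself.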