eG Monitoring
 

Measures reported by CassCompactTest

In Cassandra, the term compaction covers several kinds of operations. What these operations have in common is that each takes one or more SSTables as input and writes new SSTables as output.

The Cassandra write process stores data in files called SSTables. SSTables are immutable. Instead of overwriting existing rows with inserts or updates, Cassandra writes new timestamped versions of the inserted or updated data in new SSTables. Cassandra does not perform deletes by removing the deleted data: instead, Cassandra marks it with tombstones.

Over time, Cassandra may write many versions of a row in different SSTables. Each version may have a unique set of columns stored with a different timestamp. As SSTables accumulate, the distribution of data can require accessing more and more SSTables to retrieve a complete row.

To keep the database healthy, Cassandra periodically merges SSTables and discards old data. This process is called compaction.

Compaction works on a collection of SSTables. From these SSTables, compaction collects all versions of each unique row and assembles one complete row, using the most up-to-date version (by timestamp) of each of the row's columns. The merge is efficient because rows are sorted by partition key within each SSTable, so the merge does not need random I/O. The new version of each row is written to a new SSTable. The old versions, along with any rows that are ready for deletion, are left in the old SSTables, and are deleted as soon as pending reads are completed.
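The merge described above can be illustrated with a minimal Python sketch. It is not Cassandra's implementation: the in-memory `SSTable` dictionaries, the `compact` function, and the use of `None` as a tombstone are assumptions made for illustration, and the sketch treats every tombstoned cell as ready for deletion.

```python
from typing import Dict, List, Optional, Tuple

# A cell is (timestamp, value); a value of None stands in for a tombstone.
Cell = Tuple[int, Optional[str]]
# A simplified "SSTable": partition key -> column name -> cell.
SSTable = Dict[str, Dict[str, Cell]]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge simplified SSTables, keeping the newest cell for each column."""
    merged: Dict[str, Dict[str, Cell]] = {}
    for table in sstables:
        for key, columns in table.items():
            row = merged.setdefault(key, {})
            for name, cell in columns.items():
                # The cell with the highest timestamp wins for each column.
                if name not in row or cell[0] > row[name][0]:
                    row[name] = cell

    result: SSTable = {}
    for key in sorted(merged):  # write rows out in partition-key order
        # Drop tombstoned cells; a row with no live cells is omitted entirely.
        live = {n: c for n, c in merged[key].items() if c[1] is not None}
        if live:
            result[key] = live
    return result


old = {"user1": {"name": (1, "Ann"), "city": (1, "Oslo")},
       "user2": {"name": (1, "Bob")}}
new = {"user1": {"city": (2, "Rome")},
       "user2": {"name": (3, None)}}  # tombstone for user2's only column

compacted = compact([old, new])
print(compacted)
```

Here `user1` keeps the original `name` and the newer `city`, while `user2` disappears because its only column was superseded by a tombstone.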

Compaction causes a temporary spike in disk space usage and disk I/O while old and new SSTables co-exist. As it completes, compaction frees up disk space occupied by old SSTables. It improves read performance by incrementally replacing old SSTables with compacted SSTables. Cassandra can read data directly from the new SSTable even before it finishes writing, instead of waiting for the entire compaction process to finish.

As Cassandra processes writes and reads, it replaces the old SSTables with new SSTables in the page cache. The process of caching the new SSTable, while directing reads away from the old one, is incremental, so it does not cause a dramatic cache miss. As a result, Cassandra provides predictable high performance even under heavy load.

If too many pending reads are being serviced from the old SSTables, the old SSTables cannot be deleted during compaction. This may delay the compactions that still need to be performed on the target Cassandra database node. If old SSTables are not deleted periodically, the disk may run short of space, which can prevent new data from being added to the SSTables. It is therefore necessary to monitor the compactions, and the size of the data compacted, on the target database node round the clock. The CassCompactTest test helps administrators in this regard.

This test monitors the target Cassandra database node and reports the rate at which data was compacted. In addition, this test reveals the count of compactions that are pending and the amount of data pending to be compacted per second. Using this test, administrators can be alerted to any irregularities in the compaction process and rectify them quickly, before disk space runs critically low.
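As an illustration of how per-second compaction rates can be derived, the sketch below computes them from two snapshots of cumulative counters taken one measurement period apart. The `Sample` class, its field names, and the `rates` function are hypothetical. The source does not state how the test collects or computes its values, so this is only one plausible approach.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical snapshot of cumulative compaction counters at one instant."""
    time_sec: float
    completed_compactions: int  # cumulative count of finished compactions
    bytes_compacted: int        # cumulative bytes written by compaction
    pending_compactions: int    # compactions waiting at this instant


def rates(prev: Sample, curr: Sample) -> dict:
    """Turn two counter snapshots into per-second rates for the period between them."""
    elapsed = curr.time_sec - prev.time_sec
    if elapsed <= 0:
        raise ValueError("samples must be taken in increasing time order")
    mb_compacted = (curr.bytes_compacted - prev.bytes_compacted) / (1024 * 1024)
    return {
        "Compaction_size": mb_compacted / elapsed,  # MB/sec
        "Completed_compactions":
            (curr.completed_compactions - prev.completed_compactions) / elapsed,
        "Pending_compactions": curr.pending_compactions,  # point-in-time count
    }


# Two snapshots 60 seconds apart: 30 compactions and 600 MB were completed.
prev = Sample(time_sec=0, completed_compactions=100,
              bytes_compacted=0, pending_compactions=4)
curr = Sample(time_sec=60, completed_compactions=130,
              bytes_compacted=600 * 1024 * 1024, pending_compactions=2)
result = rates(prev, curr)
print(result)
```

The pending count is reported as-is rather than as a rate, because it describes the backlog at the moment of measurement.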

Outputs of the test: One set of results for the target Cassandra database node that is being monitored.

The measures made by this test are as follows:

Measurement | Description | Measurement Unit | Interpretation
Compaction_size | Indicates the amount of data compacted per second (measured after the compaction process) during the last measurement period. | MB/sec |
Completed_compactions | Indicates the number of compactions completed per second during the last measurement period. | Compactions/sec | A high value is desired for this measure.
Pending_compactions | Indicates the number of compactions that are currently pending. | Number | Ideally, the value of this measure should be zero.
Total_compactions | Indicates the total number of compactions performed per second during the last measurement period. | Compactions/sec |
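Because the Pending_compactions measure should ideally be zero, one simple way to act on it is to flag a backlog only when the value stays above zero across several consecutive measurement periods. The sketch below assumes a hypothetical `sustained_backlog` helper and a window of three readings. Neither is part of the test itself, and the window size is an arbitrary choice for illustration.

```python
from typing import Sequence


def sustained_backlog(pending: Sequence[int], window: int = 3) -> bool:
    """Return True if the last `window` Pending_compactions readings are all above zero."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(pending) < window:
        return False  # not enough readings yet to judge
    return all(value > 0 for value in pending[-window:])


# Readings are ordered from oldest to newest.
growing = sustained_backlog([0, 2, 5, 7])
recovered = sustained_backlog([3, 0, 1, 2])
print(growing, recovered)
```

Requiring several consecutive non-zero readings avoids flagging the short-lived backlog that a single large compaction can cause.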