|
Measures reported by CassKeySpaceTest
A keyspace in Cassandra is a namespace that defines how data is replicated across nodes. A keyspace is defined at the cluster level and is visible to every node in the cluster. CQL stores data in tables (backed on each node by memtables in memory and SSTables on disk), whose schema defines the layout of the data in the table, and those tables are grouped in keyspaces. A keyspace defines a number of options that apply to all the tables it contains, the most prominent of which is the replication strategy used by the keyspace. It is generally encouraged to use one keyspace per application, and thus many clusters may define only one keyspace.
The keyspace is the top-level database object that controls the replication of the objects it contains at each datacenter in the cluster. Keyspaces contain tables, materialized views, and user-defined types, functions and aggregates.
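As an illustration of how a keyspace carries the replication settings for each datacenter, the following minimal sketch builds a CQL `CREATE KEYSPACE` statement as a string. The helper name `create_keyspace_cql`, the keyspace name `shop`, and the datacenter names `dc1` and `dc2` are hypothetical; the sketch assumes the `NetworkTopologyStrategy` replication class and does not connect to a cluster.

```python
def _cql_literal(value):
    """Render a Python value as a CQL map value: strings quoted, numbers bare."""
    return f"'{value}'" if isinstance(value, str) else str(value)


def create_keyspace_cql(name, replication_per_dc):
    """Build a CREATE KEYSPACE statement (hypothetical helper, for illustration).

    replication_per_dc maps a datacenter name to its replication factor.
    """
    if not replication_per_dc:
        raise ValueError("at least one datacenter is required")
    # The replication map always names the strategy class first.
    options = {"class": "NetworkTopologyStrategy"}
    options.update(replication_per_dc)
    pairs = ", ".join(f"'{key}': {_cql_literal(val)}" for key, val in options.items())
    return f"CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = {{{pairs}}};"


# Assumed example: three replicas in dc1 and two in dc2.
statement = create_keyspace_cql("shop", {"dc1": 3, "dc2": 2})
print(statement)
```

Every table later created in such a keyspace would inherit these replication settings, which is why the test reports its measures per keyspace.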
In the read path, Cassandra merges data on disk (in SSTables) with data in RAM (in memtables). To avoid checking every SSTable data file for the partition being requested, Cassandra employs a data structure known as a bloom filter. Bloom filters are maintained per SSTable, i.e. each SSTable on disk gets a corresponding bloom filter in memory.
Bloom filters are a probabilistic data structure that allows Cassandra to determine one of two possible states:
- The data definitely does not exist in the given file, or
- The data probably exists in the given file.
While bloom filters cannot guarantee that the data exists in a given SSTable, they can be made more accurate by allowing them to consume more RAM. As accuracy improves (as bloom_filter_fp_chance, the bloom filter false-positive chance, gets closer to 0), memory usage increases non-linearly; for example, a bloom filter with bloom_filter_fp_chance = 0.01 requires about three times as much memory as the same table with bloom_filter_fp_chance = 0.1.
If bloom filter false positives increase rapidly, memory usage may be lower, but the disk overhead can increase manifold, because Cassandra then reads SSTables that do not hold the requested data. It is therefore essential to contain bloom filter false positives before the disk is bombarded with requests. Similarly, the read and write requests to each keyspace should be monitored closely so that administrators can ensure that the data in the keyspace remains available and that requests are served with minimal disk overhead. The CassKeySpaceTest test helps administrators monitor each keyspace and contain bloom filter false positives.
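The two possible answers of a bloom filter can be illustrated with a toy implementation. This is a minimal sketch for explanation only, not Cassandra's implementation: the class name `ToyBloomFilter`, the bit-array size, the number of hash functions, and the use of SHA-256 are all assumptions chosen for readability.

```python
import hashlib


class ToyBloomFilter:
    """Toy bloom filter: answers 'definitely absent' or 'probably present'."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was never added (no false negatives).
        # True means the key was probably added (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = ToyBloomFilter()
sstable_filter.add("partition-42")
print(sstable_filter.might_contain("partition-42"))  # True: an added key is always reported
```

A lookup that returns `False` lets the read path skip the SSTable entirely; a lookup that returns `True` for a key that was never added is a false positive and costs an unnecessary disk read. Giving the filter more bits lowers the chance of such collisions, which is the memory-versus-accuracy trade-off described above.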
This test auto-discovers the keyspaces on the target Cassandra Database node and, for each keyspace, reports the count of SSTables and memory tables available. In addition, this test reveals the count of bloom filter false positives in each keyspace and the space used by the bloom filters. The test also provides insights into the read and write latency of each keyspace so that administrators can identify the keyspace that is lagging in serving requests.
Outputs of the test: One set of results for the target Cassandra Database node being monitored.
The measures made by this test are as follows:
| Measurement |
Description |
Measurement Unit |
Interpretation |
| Blmfilter_false_postive |
Indicates the number of bloom filter false positives in this keyspace. |
Number |
Typical values for bloom_filter_fp_chance are between 0.01 (1%) and 0.1 (10%) false-positive chance, where Cassandra may scan an SSTable for a row, only to find that it does not exist on the disk. The parameter should be tuned by use case:
- Users with more RAM and slower disks may benefit from setting the bloom_filter_fp_chance to a numerically lower number (such as 0.01) to avoid excess IO operations.
- Users with less RAM, more dense nodes, or very fast disks may tolerate a higher bloom_filter_fp_chance in order to save RAM at the expense of excess IO operations.
- In workloads that rarely read, or that only perform reads by scanning the entire data set (such as analytics workloads), setting the bloom_filter_fp_chance to a much higher number is acceptable.
|
| Blmfilter_false_pst_rate |
Indicates the bloom filter false positive ratio in this keyspace. |
Percent |
A low value is desired for this measure. |
| Blmfilter_space_used |
Indicates the disk space used by the bloom filter in this keyspace. |
MB |
A high value indicates that the data is available in the keyspace. |
| Live_SStable_count |
Indicates the number of SSTables that are currently live/active in this keyspace. |
Number |
Compare the value of this measure across the keyspaces to figure out the keyspace on which there are too many SSTables that are active/live. |
| Live_dsk_spce_used |
Indicates the disk space utilized by the SSTables that are live/active in this keyspace. |
MB |
A continuously increasing value of this measure indicates that the SSTables are up to date with the data. |
| Memtab_Col_count |
Indicates the number of columns present in the memory table available in this keyspace. |
Number |
|
| Memtab_switch_count |
Indicates the number of memory table flushes per second that resulted in the memory table in this keyspace being switched out. |
Switches/second |
|
| Memtable_live_data_size |
Indicates the size of the data stored in the memory table available in this keyspace. |
MB |
A continuously increasing value of this measure indicates that the memory tables are not flushing their data to the SSTables. Administrators should therefore check if adequate space is allocated to the SSTables. |
| Memtable_off_heap_size |
Indicates the off-heap memory size of the memory table available in this keyspace. |
MB |
|
| Memtable_on_heap_size |
Indicates the on-heap memory size of the memory table available in this keyspace. |
MB |
|
| Recent_blmfltr_fals_pstv |
Indicates the number of bloom filter false positives encountered recently in this keyspace. |
Number |
|
| Recnt_blmfltr_falspt_rat |
Indicates the bloom filter false positive ratio observed recently in this keyspace. |
Percent |
|
| Avg_read_latency |
Indicates the average time taken by this keyspace to respond to read requests. |
Milliseconds/request |
Compare the value of this measure across the keyspaces to determine the keyspace that is taking too long to respond to read requests. |
| Read_lat_99thpct |
Indicates the 99th percentile of the time taken by this keyspace to respond to read requests. |
Milliseconds |
|
| Avg_write_latency |
Indicates the average time taken by this keyspace to write the data for the requests. |
Milliseconds/request |
Compare the value of this measure across keyspaces to figure out the keyspace that is taking too long to write the data for the requests received. |
| Write_lat_99thpct |
Indicates the 99th percentile of the time taken by this keyspace to respond to write requests. |
Milliseconds |
|
| Avg_range_latency |
Indicates the average time taken by this keyspace to respond to range requests (queries that scan a range of partitions). |
Milliseconds/request |
|
| Range_lat_99thpct |
Indicates the 99th percentile of the time taken by this keyspace to respond to range requests. |
Milliseconds |
|
|
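Several interpretations in the table suggest comparing a measure across keyspaces. A minimal sketch of that comparison is shown here; the function name `slowest_keyspace`, the keyspace names, and all values in `sample` are hypothetical and are not output from the test.

```python
def slowest_keyspace(measures, metric="Avg_read_latency"):
    """Return (keyspace, value) for the keyspace with the highest metric value.

    measures maps a keyspace name to a dict of measure name -> value.
    Returns None when no keyspace reports the metric.
    """
    candidates = {ks: vals[metric] for ks, vals in measures.items() if metric in vals}
    if not candidates:
        return None
    name = max(candidates, key=candidates.get)
    return name, candidates[name]


# Hypothetical per-keyspace values, for illustration only.
sample = {
    "orders": {"Avg_read_latency": 2.4, "Live_SStable_count": 12},
    "sessions": {"Avg_read_latency": 9.1, "Live_SStable_count": 48},
    "audit": {"Avg_read_latency": 1.3, "Live_SStable_count": 7},
}

print(slowest_keyspace(sample))                        # ('sessions', 9.1)
print(slowest_keyspace(sample, "Live_SStable_count"))  # ('sessions', 48)
```

The same comparison applies to any measure whose interpretation calls for a cross-keyspace view, such as Avg_write_latency or Live_SStable_count.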