eG Monitoring
 

Measures reported by HiveMtConnCacheTest

Direct disk accesses are expensive operations that can increase processing overheads and eventually degrade the overall performance of the Apache Hive data warehouse. Administrators therefore focus primarily on improving disk cache usage, so that direct disk accesses are kept to a minimum. By closely monitoring the requests to the Apache Hive data warehouse and reporting the fraction of requests that were serviced by the disk cache, this test reveals whether or not the disk cache is being utilized effectively and helps assess the impact of this usage on the processing overheads of the data warehouse. From the metrics reported by this test, administrators can also determine whether the disk cache needs further fine-tuning.

Outputs of the test: One set of results for the target Apache Hive

The measures made by this test are as follows:

Measurement: cacheHit
Description: Indicates the number of requests serviced by the disk cache during the last measurement period.
Measurement Unit: Number
Interpretation: A high value is desired for this measure.

Measurement: cacheMiss
Description: Indicates the number of requests that were not serviced by the disk cache during the last measurement period.
Measurement Unit: Number
Interpretation: A low value is desired for this measure.

Measurement: cachHitRt
Description: Indicates the percentage of requests that were serviced by the disk cache during the last measurement period.
Measurement Unit: Percent
Interpretation: A high ratio of hits is ideal. A very low ratio indicates that a majority of requests were served by direct disk accesses only, which has an adverse impact on the overall health of the data warehouse.
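
The relationship between the hit count, the miss count, and the hit ratio can be illustrated with a minimal Python sketch. It assumes the ratio is computed as hits divided by total requests (hits plus misses), expressed as a percentage; the source does not state the exact formula the test uses. The function names and the 80 percent threshold are hypothetical and chosen only for illustration.

```python
def cache_hit_ratio(cache_hit: int, cache_miss: int) -> float:
    """Return the percentage of requests served by the cache.

    Assumption: ratio = hits / (hits + misses) * 100. The formula used
    by the test itself is not documented in the source.
    """
    if cache_hit < 0 or cache_miss < 0:
        raise ValueError("counts must be non-negative")
    total = cache_hit + cache_miss
    if total == 0:
        # No requests in the measurement period; this sketch reports 0.
        return 0.0
    return 100.0 * cache_hit / total


def needs_tuning(ratio_percent: float, threshold_percent: float = 80.0) -> bool:
    """Flag a low hit ratio. The default threshold is a hypothetical value."""
    return ratio_percent < threshold_percent


if __name__ == "__main__":
    ratio = cache_hit_ratio(cache_hit=90, cache_miss=10)
    print(f"hit ratio: {ratio:.1f}%, needs tuning: {needs_tuning(ratio)}")
```

A suitable threshold depends on the workload, so the value above should be treated as a placeholder rather than a recommendation.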