eG Monitoring
 

Measures reported by AzrCosmosDBTest

Azure Cosmos DB is a fully managed NoSQL database service that assures users of better responsiveness, automatic and instant scalability, high availability, and enterprise-grade security. The service also supports multi-region data distribution anywhere in the world, as well as open-source APIs and SDKs for popular languages. It takes database administration off your hands with automatic management, updates, and patching, and handles capacity management with cost-effective serverless and automatic scaling options that respond to application needs to match capacity with demand.

To begin using Azure Cosmos DB, first create an Azure Cosmos account in your Azure resource group in the required subscription, and then create databases, containers, and items under it.

Cosmos DB's guaranteed high availability, high throughput, low latency, and tunable consistency are some of the reasons why many mission-critical web, mobile, gaming, and IoT applications use it today. If this database service fails to deliver the guaranteed service levels, not only will the performance of the dependent business-critical applications deteriorate, but the user experience with those applications will also suffer. For instance, if a Cosmos DB account is plagued by issues such as high service downtime, frequent errors/failures, significant read/write latencies, inadequate throughput, and/or insufficient storage capacity, then the applications and users relying on that account for data storage and retrieval will be adversely impacted. To avoid this, administrators should track the status of, and the requests to, each Azure Cosmos DB account configured for the Azure subscription, quickly capture problems in the availability, overall health, and operations of that account, and resolve them before applications and users are affected. This is where the AzrCosmosDBTest helps!

For each Azure Cosmos DB account configured for the target Azure subscription, this test reports the status of that account and alerts administrators if the status is abnormal. The test also tracks read/write requests to each account, measures the responsiveness of the database service to these requests, and proactively alerts administrators to potential processing bottlenecks. The availability of the database service is checked periodically, and administrators are instantly alerted if the service becomes unavailable. Furthermore, the test monitors the database operations performed on every account, reveals the cost of each operation, and draws administrator attention to the operations that are costliest in terms of resource usage. Administrators are notified if requests are throttled because the databases/containers in the account are not provisioned with enough throughput to process costly operations. The storage space usage of the account is also monitored, and administrators are forewarned of potential space crunches. This way, the test helps administrators measure and evaluate the various service level criteria for the Azure Cosmos DB service, and determine whether or not the performance levels promised by this database service are achieved.

Outputs of the test: One set of results for each Azure Cosmos DB account configured for every resource group in the target Azure subscription.

The measures made by this test are as follows:

Measurement Description Measurement Unit Interpretation
Prvsng_status Indicates the current status of this Azure Cosmos DB account.   The values reported by this measure and their numeric equivalents are listed in the table below:

Measure Value Numeric Value
Succeeded 1
Updating 2
Error 3


Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of the Azure Cosmos DB account. In the graph of this measure, however, the status is represented using the numeric equivalents only.

Use the detailed diagnosis of this measure to know all about the Azure Cosmos DB account. The details displayed as part of the detailed diagnosis include the region in which the account is located, the API used by the account for creating databases, whether or not automatic failover is enabled for the account (and if so, the failover location), the default consistency level set for the account, and more.
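The mapping between the reported Measure Values and their graphed numeric equivalents can be captured in a small lookup. This is a minimal Python sketch; the names `status_to_numeric` and `needs_attention` are hypothetical, and flagging every status other than Succeeded is an assumed alerting policy, not necessarily the one the test applies.

```python
# Numeric equivalents used in the Prvsng_status graph (from the table above).
PROVISIONING_STATUS = {"Succeeded": 1, "Updating": 2, "Error": 3}


def status_to_numeric(status: str) -> int:
    """Map a reported Measure Value to the number plotted in the graph."""
    try:
        return PROVISIONING_STATUS[status]
    except KeyError:
        raise ValueError(f"unknown provisioning status: {status!r}") from None


def needs_attention(status: str) -> bool:
    """Hypothetical alert rule: anything other than 'Succeeded' is flagged.

    Whether a transient 'Updating' state should raise an alert is a policy
    choice; this sketch flags it alongside 'Error'.
    """
    return status_to_numeric(status) != PROVISIONING_STATUS["Succeeded"]
```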
tot_rqst Indicates the total number of HTTP requests processed by this account. Number  
Http2xx Indicates the number of HTTP requests that were successfully processed by this account. Number A high value is desired for this measure.
Http3xx Indicates the number of HTTP requests to which this account responded with warnings. Number A low value is desired for this measure.
Http400 Indicates the number of HTTP requests to which this account responded with the error code HTTP 400 Bad Request. Number Responses with the code HTTP 400 Bad Request are sent under the following circumstances:

  • The JSON, SQL, or JavaScript in the request body is invalid;

  • The required properties of a resource are not present or set in the body of the POST or PUT on the resource;

  • The consistency level for a GET operation is overridden with a stronger consistency level than the one set for the account;

  • A request that requires an x-ms-documentdb-partitionkey header does not include it.



Ideally therefore, the value of this measure should be 0.
Http401 Indicates the number of HTTP requests to which this account responded with the error code HTTP 401 Unauthorized. Number Responses with the code HTTP 401 Unauthorized are sent when the Authorization header is invalid for the requested resource.

Ideally therefore, the value of this measure should be 0.
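To make the response-code measures concrete, the following sketch tallies a list of raw HTTP status codes into buckets named after the measures in this table. It is an illustration only: the functions `bucket_for` and `tally` are hypothetical, and the sketch does not describe how the test actually collects these counts. Codes it does not bucket (for example, 404) still count toward tot_rqst.

```python
from collections import Counter
from typing import Iterable, Optional


def bucket_for(status: int) -> Optional[str]:
    """Return the measure a status code would count toward, or None."""
    if 200 <= status < 300:
        return "Http2xx"
    if 300 <= status < 400:
        return "Http3xx"
    if status == 400:
        return "Http400"
    if status == 401:
        return "Http401"
    if status == 503:
        return "srvc_unavlbl"
    return None  # codes this sketch does not bucket


def tally(statuses: Iterable[int]) -> Counter:
    """Count every response toward tot_rqst and toward its bucket, if any."""
    counts: Counter = Counter()
    for code in statuses:
        counts["tot_rqst"] += 1
        bucket = bucket_for(code)
        if bucket is not None:
            counts[bucket] += 1
    return counts
```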
throtld_rqst Indicates the number of requests that this account throttled. Number Azure Cosmos DB allows you to set provisioned throughput on your databases and containers. This provisioned throughput is set using RUs - i.e., Request Units. A Request Unit is a performance currency abstracting the system resources such as CPU, IOPS, and memory that are required to perform the database operations supported by Azure Cosmos DB. In short, the cost of database operations is measured by RUs.

A request is typically throttled if the cost of processing that request is more than the provisioned throughput. The most common solution to this problem is to scale up the RUs for the given collection.
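The throttling rule above can be illustrated with a deliberately simplified one-second model, assuming a single RU/s budget that requests consume in arrival order. Cosmos DB's real enforcement is more involved, so treat `throttled_indexes` and `rus_needed` (both hypothetical helpers) as a way to reason about the arithmetic, not as a model of the service.

```python
from typing import List, Sequence


def throttled_indexes(provisioned_rus: float, charges: Sequence[float]) -> List[int]:
    """Simplified one-second model of RU-based throttling.

    Requests draw on the per-second RU budget in arrival order; a request whose
    RU charge exceeds what is left of the budget is throttled (and not charged).
    """
    remaining = provisioned_rus
    throttled = []
    for i, charge in enumerate(charges):
        if charge > remaining:
            throttled.append(i)
        else:
            remaining -= charge
    return throttled


def rus_needed(charges: Sequence[float]) -> float:
    """RU/s that would have let every request in the second through."""
    return float(sum(charges))
```

For example, with 400 RU/s provisioned and requests costing 100, 250, 100, and 50 RUs, the third request is throttled; scaling up to 500 RU/s would admit all four.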
intrnl_srvr_err Indicates the number of internal server errors that this account encountered. Number Responses with the code HTTP 4007 refer to internal server errors that typically occur if the input bytes are not in the base64 format. Ideally, the value of this measure should be 0.
srvc_unavlbl Indicates the number of HTTP requests to which this account responded with the code HTTP 503 Service Unavailable. Number A response with the code HTTP 503 Service Unavailable is sent if the request could not be completed because the service was unavailable. This situation could happen due to network connectivity or service availability issues. It is safe to retry the operation. If the issue persists, contact support.

Ideally therefore, the value of this measure should be 0.
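Because retrying a 503 is considered safe, client code commonly wraps requests in a retry loop with exponential backoff. A minimal sketch, assuming a hypothetical `send` callable that returns a `(status_code, body)` pair; the attempt limit, base delay, and jitter are illustrative defaults, not values recommended by Microsoft or eG. After the last attempt the final response is returned, which is the point at which the guidance above suggests contacting support.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, object]  # (HTTP status code, body)


def call_with_retry(
    send: Callable[[], Response],
    retryable: frozenset = frozenset({503}),
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry retryable status codes with exponential backoff plus jitter."""
    attempt = 1
    while True:
        status, body = send()
        if status not in retryable or attempt >= max_attempts:
            return status, body
        # Delays of 0.5 s, 1 s, 2 s, ... plus up to 10% random jitter
        delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, 0.1 * delay))
        attempt += 1
```

Injecting `sleep` keeps the function testable without real waiting.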
avg_rqst Indicates the average number of requests this account processed per second. Number A consistent drop in the value of this measure could indicate a processing bottleneck.
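One way to make "a consistent drop" precise is to compare the most recent samples against a baseline built from earlier ones. In this sketch, `window` and `ratio` are hypothetical tuning parameters: a drop is reported only when every one of the last `window` samples falls below `ratio` times the mean of the preceding samples.

```python
from statistics import mean
from typing import Sequence


def consistent_drop(samples: Sequence[float], window: int = 3, ratio: float = 0.5) -> bool:
    """True if the last `window` samples all fall below `ratio` x the earlier mean."""
    if len(samples) <= window:
        return False  # not enough history to form a baseline
    baseline = mean(samples[:-window])
    return all(s < ratio * baseline for s in samples[-window:])
```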
obsrvd_rd_ltncy Indicates the read latency noticed in this account. Seconds Consistently high values for these measures are a clear indicator of processing latencies. To know where the latency is more pronounced (in read operations or in write operations), compare the values of these measures.
obsrvd_wrt_ltncy Indicates the write latency noticed in this account. Seconds
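Comparing the two latency measures can be reduced to a small helper. The function name `dominant_latency` and the 10% tolerance, which treats near-equal values as "similar", are assumptions chosen for illustration.

```python
def dominant_latency(read_s: float, write_s: float, tolerance: float = 0.1) -> str:
    """Return 'read', 'write', or 'similar' (within `tolerance` of the larger value)."""
    larger = max(read_s, write_s)
    if abs(read_s - write_s) <= tolerance * larger:
        return "similar"
    return "read" if read_s > write_s else "write"
```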
strg_cpcty Indicates the total storage capacity of this account across all its databases and containers. MB  
avlbl_strg Indicates the free/unused storage space in this account. MB A high value is desired for this measure. A very low value implies that the databases/containers in the account have almost run out of free space, which also means that storage space has been excessively utilized. If this usage pattern continues, serious storage space contention will soon occur. To avert this, you may want to know which type of objects (data objects or index objects) are hogging the storage space, and see if any of those objects can be removed to free up space. For that, first compare the value of the Total data size per account measure with that of the Total index size per account measure.
data_sz Indicates the total storage space in this account used up by data. MB In the event of abnormal storage space usage, compare the values of these measures to know which type of objects (data objects or index objects) are hogging storage space.
index_sz Indicates the total storage space in this account used up by indexes. MB
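The storage checks described above can be combined into one summary. This sketch assumes the four storage measures are sampled together; the 10% low-space threshold and the `StorageSample`/`storage_report` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorageSample:
    capacity_mb: float   # strg_cpcty
    available_mb: float  # avlbl_strg
    data_mb: float       # data_sz
    index_mb: float      # index_sz


def storage_report(s: StorageSample, low_free_pct: float = 10.0) -> dict:
    """Free-space percentage, a low-space flag, and the larger space consumer."""
    free_pct = 100.0 * s.available_mb / s.capacity_mb if s.capacity_mb else 0.0
    return {
        "free_pct": round(free_pct, 1),
        "low_space": free_pct < low_free_pct,
        "main_consumer": "data" if s.data_mb >= s.index_mb else "index",
    }
```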
dcmnt_cnt Indicates the number of documents in this account's storage. Number  
tot_rqst_unts Indicates the throughput used by this account in terms of request units. Number Azure Cosmos DB allows you to set provisioned throughput on your databases and containers. This provisioned throughput is set using RUs - i.e., Request Units. A Request Unit is a performance currency abstracting the system resources such as CPU, IOPS, and memory that are required to perform the database operations supported by Azure Cosmos DB. In short, the cost of database operations is measured by RUs.

If the value of this measure suddenly spikes, you may want to look up the value of the Throttled requests measure to check whether requests have been throttled. If the Throttled requests measure reports a non-zero value, it implies that RU consumption is higher than the provisioned throughput and has hence resulted in request throttling. To avoid this, you may want to increase the provisioned throughput to suit the actual usage, or identify the RU-intensive operations and see if they can be controlled. For the latter, compare the values of the following measures to identify the costly operation: Request units on query operations, Request units on update operations, Request units on delete operations, Request units on insert operations, Request units on count operations, and Request units on other operations.
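Finding the costly operation among the per-operation RU measures amounts to picking the largest value and noting its share of the total. A minimal sketch; `costliest_operation` is a hypothetical helper, and the dictionary keys are illustrative labels for the Request units on query/update/delete/insert/count/other operations measures.

```python
from typing import Dict, Tuple


def costliest_operation(ru_by_op: Dict[str, float]) -> Tuple[str, float]:
    """Return the operation with the highest RU charge and its share (%) of the total."""
    if not ru_by_op:
        raise ValueError("no operations supplied")
    op = max(ru_by_op, key=ru_by_op.__getitem__)
    total = sum(ru_by_op.values())
    share = 100.0 * ru_by_op[op] / total if total else 0.0
    return op, share
```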
max_rus Indicates the maximum number of reserved Request units (RUs) consumed by this account every second. Number Azure Cosmos DB reserved capacity pricing helps you enjoy cost savings of up to 65 percent and enhanced availability SLAs, while reducing the burden of capacity planning. After you buy Azure Cosmos DB reserved capacity, the reservation discount is automatically applied to Azure Cosmos DB resources that match the attributes and quantity of the reservation. A reservation covers the throughput provisioned for Azure Cosmos DB resources.

Using the values of these measures, you can ascertain how much of the reserved capacity is actually utilized by an account. Based on what you observe, you can even decide to increase the reserved capacity, so as to avail of additional cost benefits while aligning the reservation with real-time usage.
max_rupm_cnsumd Indicates the maximum number of reserved Request units (RUs) consumed by this account every minute. Number
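Reserved-capacity utilization is a simple ratio of consumption to reservation. This sketch assumes the reservation is expressed in RU/s and is supplied by the administrator (`reserved_rus_per_sec` is a hypothetical input; the test does not report it). The per-minute figure is converted to an average RU/s over that minute so that both measures can be compared against the same reservation. A result above 100 means peak consumption exceeded the reservation.

```python
def reservation_utilization(peak_rus_per_sec: float, reserved_rus_per_sec: float) -> float:
    """Peak consumption as a percentage of the reserved RU/s."""
    if reserved_rus_per_sec <= 0:
        raise ValueError("reserved capacity must be positive")
    return 100.0 * peak_rus_per_sec / reserved_rus_per_sec


def per_minute_to_per_second(rus_per_minute: float) -> float:
    """Average RU/s implied by an RU-per-minute figure such as max_rupm_cnsumd."""
    return rus_per_minute / 60.0
```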
mngo_qry_rqst_chrg Indicates the number of Request units (RU) consumed by query operations performed on databases/containers in this account. Number If the Total request units measure reports an unusually high value, then compare the value of these measures to identify the costly / RU-intensive operations.
mngo_updt_rqst_chrg Indicates the number of Request units (RU) consumed by update operations performed on databases/containers in this account. Number
mngo_dlt_rqst_chrg Indicates the number of Request units (RU) consumed by delete operations performed on databases/containers in this account. Number
mngo_insrt_rqst_chrg Indicates the number of Request units (RU) consumed by insert operations performed on databases/containers in this account. Number
mngo_cnt_rqst_chrg Indicates the number of Request units (RU) consumed by count operations performed on databases/containers in this account. Number
mngo_othr_rqst_chrg Indicates the number of Request units (RU) consumed by all operations, other than query, update, delete, insert, and count operations, that are performed on databases/containers in this account. Number
mgo_qry_rqst_rt Indicates the number of query requests processed by this account. Number  
mngo_updt_rqst_rt Indicates the number of update requests processed by this account. Number  
mngo_dlt_rqst_rt Indicates the number of delete requests processed by this account. Number  
mngo_insrt_rqst_rt Indicates the number of insert requests processed by this account. Number  
mngo_cnt_rqst_rt Indicates the number of count requests processed by this account. Number  
mngo_othr_rqst_rt Indicates the number of requests, other than query / update / delete / insert / count requests, that are processed by this account. Number  
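Pairing each RU charge measure (for example, mngo_qry_rqst_chrg) with the matching request-count measure (mngo_qry_rqst_rt) gives an average cost per request, which helps tell many cheap operations apart from a few expensive ones. A sketch, assuming both measures cover the same measurement interval; `ru_per_request` is a hypothetical helper.

```python
from typing import Dict, Tuple


def ru_per_request(stats: Dict[str, Tuple[float, int]]) -> Dict[str, float]:
    """stats maps operation -> (RU charge, request count); zero-count ops are skipped."""
    return {op: charge / count for op, (charge, count) in stats.items() if count > 0}
```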
mngo_qry_faild_rqst Indicates the number of query requests that this account failed to process. Number Ideally, the value of this measure should be 0. A non-zero value implies that one or more requests have failed.
mngo_updt_faild_rqst Indicates the number of update requests that this account failed to process. Number Ideally, the value of this measure should be 0. A non-zero value implies that one or more requests have failed.
mngo_dlt_faild_rqst Indicates the number of delete requests that this account failed to process. Number Ideally, the value of this measure should be 0. A non-zero value implies that one or more requests have failed.
mngo_insrt_faild_rqst Indicates the number of insert requests that this account failed to process. Number Ideally, the value of this measure should be 0. A non-zero value implies that one or more requests have failed.
mngo_cnt_faild_rqst Indicates the number of count requests that this account failed to process. Number Ideally, the value of this measure should be 0. A non-zero value implies that one or more requests have failed.
mngo_othr_faild_rqst Indicates the number of requests, other than query / update / delete / insert / count requests, that this account failed to process. Number Ideally, the value of this measure should be 0. A non-zero value implies that one or more requests have failed.
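A raw failure count is easier to judge next to the number of requests processed. This sketch converts counts into failure percentages, assuming each failed-request measure counts a subset of the requests in the corresponding request-count measure; `failure_rates` is a hypothetical helper.

```python
from typing import Dict, Tuple


def failure_rates(stats: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """stats maps operation -> (failed, processed); returns failure % for ops with failures."""
    return {
        op: 100.0 * failed / processed
        for op, (failed, processed) in stats.items()
        if failed > 0 and processed > 0
    }
```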
srvc_availblty Indicates whether or not this account is currently available. Number If this measure reports the value 100, it means that the database service provided by this account is available. The value 0, on the other hand, denotes that the database service delivered by this account is unavailable.
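Since each reading is either 100 or 0, averaging readings over a period gives the share of polls in which the account was available, and the longest run of zeros shows the longest observed outage, measured in polling intervals. A minimal sketch; `availability_summary` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def availability_summary(samples: Sequence[float]) -> Tuple[float, int]:
    """samples are srvc_availblty readings (100 = available, 0 = unavailable)."""
    if not samples:
        raise ValueError("no samples")
    pct = sum(samples) / len(samples)
    longest = run = 0
    for value in samples:
        run = run + 1 if value == 0 else 0  # extend or reset the current outage run
        longest = max(longest, run)
    return pct, longest
```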
cnsistncy_lvl Indicates the percentage of requests that meet the consistency guarantee of the consistency level chosen for this account. Percent Distributed databases that rely on replication for high availability, low latency, or both must make a fundamental tradeoff between read consistency, availability, latency, and throughput; in other words, they have to compromise on one for the sake of another. To improve read consistency with minimal impact on the other parameters, Azure Cosmos DB offers five well-defined levels of consistency, namely Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual.

Azure Cosmos DB guarantees that 100 percent of read requests meet the consistency guarantee for the consistency level chosen.

For instance, in the Strong level, reads are guaranteed to return the most recent committed version of an item. In bounded staleness consistency, reads are guaranteed to honor the consistent-prefix guarantee. In session consistency, within a single client session, reads are guaranteed to honor the consistent-prefix, monotonic reads, monotonic writes, read-your-writes, and writes-follow-reads guarantees. The consistent prefix consistency level guarantees that reads never see out-of-order writes. In eventual consistency, there is no ordering guarantee for reads; in the absence of any further writes, the replicas eventually converge. Eventual consistency is the weakest form of consistency because a client may read values that are older than the ones it had read before.

Use the value of this measure to determine what percentage of read requests meet the consistency guarantee of the consistency level chosen. Ideally, the value of this measure should be 100. Lower values indicate that the consistency guarantees are not being met; this is a cause for concern and should be investigated.
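The five consistency levels form an ordered scale, which can be encoded directly. The sketch below uses that ordering to check the HTTP 400 cause listed earlier (a request overriding the account's consistency with a stronger level) and to flag the cnsistncy_lvl measure dropping below 100. The enum and function names are hypothetical.

```python
from enum import IntEnum


class Consistency(IntEnum):
    """Consistency levels; a higher value means a stronger guarantee."""
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def override_allowed(account_default: Consistency, requested: Consistency) -> bool:
    """A per-request level may equal or relax the account default, never exceed it."""
    return requested <= account_default


def consistency_breached(pct_meeting_guarantee: float) -> bool:
    """Anything under 100 percent means some reads missed the chosen guarantee."""
    return pct_meeting_guarantee < 100.0
```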