eG Monitoring
 

Measures reported by AzrIoTHubTest

IoT Hub is a managed service hosted in the cloud that acts as a central message hub for communication between an IoT application and its attached devices. You can connect millions of devices and their backend solutions reliably and securely. Almost any device can be connected to an IoT hub.

Every IoT hub has an identity registry that stores information about the devices and modules permitted to connect to it. Before a device or module can connect, there must be an entry for that device or module in the IoT hub's identity registry. Azure IoT Hub maintains a device twin for each device that you connect to the hub. Device twins are JSON documents that store device state information, including metadata, configurations, and conditions.

A device or module must also authenticate with the IoT hub based on credentials stored in the identity registry. After authentication, the internet connection between the IoT device and IoT Hub is secured using the Transport Layer Security (TLS) standard.

Typically, IoT devices send telemetry from their sensors to back-end services in the cloud. Examples of telemetry received from a device include sensor data such as speed or temperature, an error message such as a missed event, or an information message indicating that the device is in good health. However, other types of communication are possible, such as a back-end service sending commands to your devices, e.g., a command that changes the frequency at which a device sends telemetry, to help diagnose a problem. IoT Hub implements commands by allowing you to invoke direct methods on devices. Direct methods represent a request-reply interaction with a device, similar to an HTTP call in that they succeed or fail immediately (after a user-specified timeout).
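The request-reply behavior of a direct method can be pictured with a small Python model. This is only a sketch of the interaction pattern described above, not the IoT Hub API: the function invoke_direct_method, the handler set_telemetry_interval, and the payload field interval_s are all hypothetical names introduced for illustration.

```python
import concurrent.futures
import time


def invoke_direct_method(handler, payload, timeout_s):
    """Model a request-reply call: return (status, result) once the handler
    replies, fails, or the caller-specified timeout expires."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(handler, payload)
    try:
        return ("succeeded", future.result(timeout=timeout_s))
    except concurrent.futures.TimeoutError:
        # No reply within the timeout: the call fails right away.
        return ("timed_out", None)
    except Exception as exc:
        # The handler itself raised an error.
        return ("failed", str(exc))
    finally:
        # Do not block the caller waiting for a slow handler to finish.
        pool.shutdown(wait=False)


def set_telemetry_interval(payload):
    """Hypothetical device-side handler that acknowledges a new interval."""
    return {"interval_s": payload["interval_s"]}


def slow_handler(payload):
    """Hypothetical handler that replies too late."""
    time.sleep(0.3)
    return payload


print(invoke_direct_method(set_telemetry_interval, {"interval_s": 5}, 1.0))
print(invoke_direct_method(slow_handler, {}, 0.05))
```

The caller always gets a definite outcome within the timeout, which is the property that distinguishes a direct method from a queued message.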

A built-in endpoint collects data from your device by default. The data is collected using a request-response pattern over dedicated IoT device endpoints, is available for a maximum duration of seven days, and can be used to take actions on a device. Data can also be routed to different services for further processing. Once a message route has been created, data stops flowing to the built-in endpoint unless a fallback route has been configured.
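The interplay between routes and the fallback route can be sketched as a simplified Python model. It assumes each route is a condition plus an endpoint name; the function route_message, the route dictionaries, and the endpoint name "alerts-queue" are hypothetical and only illustrate the behavior described here, not how IoT Hub evaluates routing queries internally.

```python
def route_message(message, routes, fallback_enabled):
    """Return the endpoint names a message would be delivered to.

    routes: list of {"endpoint": str, "condition": callable} (assumed shape).
    """
    matched = [r["endpoint"] for r in routes if r["condition"](message)]
    if matched:
        return matched
    if fallback_enabled:
        # No route matched: the fallback route sends it to the built-in endpoint.
        return ["built-in"]
    # No route matched and no fallback route: the message goes nowhere.
    return []


routes = [
    {"endpoint": "alerts-queue", "condition": lambda m: m["temperature"] > 80},
]

print(route_message({"temperature": 90}, routes, fallback_enabled=True))
print(route_message({"temperature": 20}, routes, fallback_enabled=True))
print(route_message({"temperature": 20}, routes, fallback_enabled=False))
```

In this model, a message that matches no route reaches the built-in endpoint only when the fallback route is enabled.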

Back-end apps can also be used to enable device administrators and operators to update and interact with IoT devices in bulk and at a scheduled time. Jobs execute device twin updates and direct methods against a set of devices at a scheduled time. For example, an operator may want to use a back-end app that initiates and tracks a job to reboot a set of devices in building 43 and floor 3 at a time that would not be disruptive to the operations of the building.

While on the one hand, IoT hubs simplify business workflows by enabling 'near-hands-free' communication and action between devices and backend solutions, on the other, they can also serve as a 'problem hotspot' because of the many moving parts within! For example, an unavailable IoT hub can suspend business operations, as IoT applications will no longer be able to communicate with devices attached to that hub. Also, if commands sent by back-end services to devices fail, they can cause unexpected errors or issues in a business-critical workflow. Likewise, some telemetry messages may not be delivered to the desired endpoints, thereby disrupting a crucial business practice. Moreover, slowness may be observed in message routing, which may consequently delay key business processes. Similarly, device twin updates, scheduled jobs, and job queries can also fail, resulting in problems in communication and corresponding action. Furthermore, if an IoT hub operates at a level higher than the established quotas, then the performance of the hub and of the business services that depend on it will be compromised. To avoid this, administrators need to closely track how each IoT hub interacts with attached devices, and rapidly detect anomalies. This is where the AzrIoTHubTest helps!

This test auto-discovers the IoT hubs created for each resource group of a target subscription. For every hub so discovered, the test reports the status of that hub, and alerts administrators if any hub is unavailable. Commands executed on the devices attached to each hub are monitored, and command aborts, rejections, and abandonment are brought to the attention of administrators. Message routing by every hub is also monitored; in the process, latencies in message delivery to specific endpoints are revealed. Furthermore, the test tracks operations (e.g., reads and updates) performed on the device twins maintained by each hub, and captures and reports operational failures. The progress of scheduled jobs is tracked, and job failures, cancellation failures, and job query failures are highlighted. The test also periodically measures the operational levels of every hub by reporting the count of devices registered with a hub, the size of requests and responses, the count of messages sent to and by devices, etc. Administrators are notified if any hub is about to use up its operational capacity, thus urging them to increase the capacity before performance suffers. This way, the test rapidly points administrators to problems in the functioning of an IoT hub and prompts them to immediately initiate corrective action, so that the hub operates uninterrupted.
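How device acknowledgements translate into the command measures of this test can be sketched in Python. The mapping below is an illustrative model only: the acknowledgement keywords and the tally function are hypothetical, while the measure names (Cmd_cmpltd, Cmd_aband, Cmd_rjctd) and the Enqueued and Dead lettered states are those described in the measures table of this test.

```python
from collections import Counter

# Assumed mapping: device acknowledgement -> (resulting message state, measure).
OUTCOMES = {
    "complete": ("Completed", "Cmd_cmpltd"),    # hub removes the message from the queue
    "abandon": ("Enqueued", "Cmd_aband"),       # hub puts the message back in the queue
    "reject": ("Dead lettered", "Cmd_rjctd"),   # hub dead-letters the message
}


def tally(acks):
    """Count acknowledgements per measure for a list of device responses."""
    counts = Counter()
    for ack in acks:
        _state, measure = OUTCOMES[ack]
        counts[measure] += 1
    return dict(counts)


print(tally(["complete", "abandon", "complete", "reject"]))
```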

Outputs of the test: One set of results for every IoT hub configured for each resource group of the target subscription

The measures made by this test are as follows:

Measurement Description Measurement Unit Interpretation
Status Indicates whether or not this IoT hub is available.   The values reported by this measure and their numeric equivalents are listed in the table below:

Measure Value Numeric Value
Available 1
Unavailable 0


Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current status of an IoT hub. The graph of this measure, however, represents the status of an IoT hub using the numeric equivalents only.

Use the detailed diagnosis of this measure to know the location, SKU, tier and capacity of the IoT hub.
Prvg_status Indicates the current provisioning status of this IoT hub.   The values reported by this measure and their numeric equivalents are listed in the table below:

Measure Value Numeric Value
Succeeded 1
Updating 2
Error 3
Unknown 0


Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current provisioning status. In the graph of this measure, however, the provisioning status is represented using the numeric equivalents only.
Tele_msg_snd Indicates the number of device-to-cloud telemetry messages attempted to be sent to this IoT hub. Number Make sure that the value of this measure is within the prescribed limits. If not, then throttling errors will occur, causing messages to be dropped.
Tele_msg_sent Indicates the number of device-to-cloud telemetry messages sent successfully to this IoT hub. Number  
Cmd_cmpltd Indicates the number of cloud-to-device message deliveries completed successfully by this IoT hub. Number To guarantee at-least-once message delivery, your IoT hub persists cloud-to-device messages in per-device queues. For the IoT hub to remove the messages from the queue, the devices must explicitly acknowledge completion. This approach guarantees resiliency against connectivity and device failures.
Cmd_aband Indicates the number of cloud-to-device commands abandoned by the devices attached to this IoT hub. Number When a device abandons a message, the IoT hub puts the message back in the queue, with the state set to Enqueued.
Cmd_rjctd Indicates the number of cloud-to-device commands rejected by the devices attached to this hub. Number Ideally, the value of this measure should be 0.

A non-zero value implies that one or more messages were rejected.

If a message is rejected, the IoT hub sets it to the Dead lettered state.
Tot_device Indicates the number of devices registered with this hub. Number The total number of devices plus modules that can be registered to a single IoT hub is capped at 1,000,000. If the value of this measure is equal to this cap, the performance of the hub will be compromised. To avoid this, you may want to increase the limit by contacting Microsoft support.
Cnnctd_dvics Indicates the number of devices currently connected to this hub. Number Ideally, the value of this measure should be the same as that of the Total devices measure. If the gap between these two measures is large, it could mean that many registered devices are currently disconnected from the hub.
Tele_msg_dlvrd Indicates the number of telemetry messages that this hub successfully delivered to endpoints. Number  
Drppd_msg Indicates the total count of messages dropped by this hub. Number Typically, a message is dropped because it did not match any routing query, or because the endpoint was dead and the message could not be delivered even after several retries.

Ideally, the value of this measure should be 0.
Orphnd_msg Indicates the number of messages that this hub orphaned. Number Orphaned messages are those that do not match any routes, including fallback routes.

Ideally, the value of this measure should be 0.
Invld_msg Indicates the number of messages that this hub could not deliver. Number A message is considered invalid if it is incompatible with the endpoint.

If this measure reports a non-zero value, then you may want to check the configuration of the endpoints for incompatibilities.
Msg_mtchng_cndtn Indicates the number of messages that this hub has written to the fallback endpoint. Number If message routing is turned on, you can enable the fallback route capability. Once a route is created, data stops flowing to the built-in endpoint, unless a route is created to that endpoint. If there are no routes to the built-in endpoint and a fallback route is enabled, only messages that don't match any query conditions on routes will be sent to the built-in endpoint. Also, if all existing routes are deleted, the fallback route must be enabled to receive all data at the built-in endpoint.

If this measure reports a non-zero value, it could imply that either no routes have been created to the built-in endpoint, or none of the routes created match the query conditions.
Msg_delvrd_to_endpnts Indicates the number of messages that this hub delivered to event hub endpoints. Number Apart from the built-in Event Hubs-compatible endpoint, you can also route data to custom endpoints of type Event Hubs.
Msg_laten_endpnts Indicates the average latency between message ingress to this IoT hub and message ingress into custom endpoints of type Event Hub. Milliseconds If users notice any slowness in message routing, then compare the value of this measure with the other latency metrics reported by this test to determine whether deliveries to a specific endpoint were much slower than the rest.
Msg_delvrd_queue Indicates the number of messages that this hub delivered to service bus queue endpoints. Number  
Msg_laten_queue Indicates the average latency between message ingress to this IoT hub and message ingress into a service bus queue endpoint. Milliseconds If users notice any slowness in message routing, then compare the value of this measure with the other latency metrics reported by this test to determine whether deliveries to a specific endpoint were much slower than the rest.
Msg_delvrd_topic Indicates the number of messages that this hub delivered to service bus topic endpoints. Number  
Msg_laten_topic Indicates the average latency between message ingress to this IoT hub and message ingress into a service bus topic endpoint. Milliseconds If users notice any slowness in message routing, then compare the value of this measure with the other latency metrics reported by this test to determine whether deliveries to a specific endpoint were much slower than the rest.
Msg_delvrd_built Indicates the number of messages that this hub delivered to the built-in endpoint. Number  
Msg_laten_built Indicates the average latency between message ingress to this IoT hub and message ingress into the built-in endpoint. Milliseconds If users notice any slowness in message routing, then compare the value of this measure with the other latency metrics reported by this test to determine whether deliveries to a specific endpoint were much slower than the rest.
Msg_delvrd_store Indicates the number of messages that this hub delivered to storage endpoints. Number  
Msg_laten_store Indicates the average latency between message ingress to this IoT hub and message ingress into a storage endpoint. Milliseconds If users notice any slowness in message routing, then compare the value of this measure with the other latency metrics reported by this test to determine whether deliveries to a specific endpoint were much slower than the rest.
Data_wrttn_store Indicates the amount of data this IoT hub delivered to storage endpoints. MB  
Blobs_wrttn_store Indicates the number of times this hub delivered blobs to storage endpoints. Number  
Sccs_twin_dvcs Indicates the count of all successful device-initiated twin reads from this hub. Number  
Faild_twin_dvcs Indicates the count of all failed device-initiated twin reads from this hub. Number Ideally, the value of this measure should be 0.
Rspns_reads_dvcs Indicates the average response size of all successful device-initiated twin reads from this hub. MB  
Sccs_updts_dvcs Indicates the count of all successful device-initiated twin updates to this hub. Number  
Faild_updts_dvcs Indicates the count of all failed device-initiated twin updates to this hub. Number  
Size_updts_dvcs Indicates the average size of all successful device-initiated twin updates to this hub. MB  
Sccs_drct_invctns Indicates the count of all successful direct method calls made by this hub. Number  
Faild_drct_invctns Indicates the count of all failed direct method calls made by this hub. Number Ideally, the value of this measure should be 0.
Rqst_size_drct_invctns Indicates the average request size of Cloud to Device method invocations made by this hub. MB  
Rsps_size_drct_invctns Indicates the average response size of Cloud to Device method invocations made by this hub. MB  
Sccs_rds_back Indicates the count of all successful back-end-initiated twin reads processed by this hub. Number  
Faild_rds_back Indicates the count of all failed back-end-initiated twin reads processed by this hub. Number Ideally, the value of this measure should be 0.
Rsps_size_rds_bck Indicates the average response size of back-end-initiated twin reads processed by this hub. MB  
Sccs_updts_back Indicates the count of all successful back-end-initiated twin updates processed by this hub. Number  
Faild_updts_back Indicates the count of all failed back-end-initiated twin updates processed by this hub. Number Ideally, the value of this measure should be 0.
Size_updts_back Indicates the average size of back-end-initiated twin updates processed by this hub. MB  
Sccs_twin_qurs Indicates the total number of twin queries processed by this hub that were successful. Number  
Faild_twin_qurs Indicates the total number of twin queries processed by this hub that failed. Number Ideally, the value of this measure should be 0.
Twin_qurs_rslt_size Indicates the average result size of successful twin queries processed by this hub. MB  
Sccs_updts_jobs Indicates the number of twin update jobs this hub successfully created. Number  
Faild_updts_jobs Indicates the number of twin update jobs this hub could not create. Number Ideally, the value of this measure should be 0.
Sccs_creatns_jobs Indicates the number of direct method invocation jobs successfully created by this hub. Number  
Faild_creatns_jobs Indicates the number of direct method invocation jobs this hub could not create. Number Ideally, the value of this measure should be 0.
Sccs_list_jobs Indicates the number of calls made by this hub to list jobs that succeeded. Number  
Faild_list_jobs Indicates the number of calls made by this hub to list jobs that failed. Number Ideally, the value of this measure should be 0.
Sccs_job_cancl Indicates the number of calls made by this hub to cancel jobs that succeeded. Number  
Faild_job_cancl Indicates the number of calls made by this hub to cancel jobs that failed. Number Ideally, the value of this measure should be 0.
Sccs_job_qurs Indicates the total count of calls to query jobs that were successfully processed by this hub. Number  
Faild_job_qurs Indicates the total count of calls to query jobs that this hub failed to process. Number Ideally, the value of this measure should be 0.
Cmpltd_jobs Indicates the total count of jobs completed by this hub. Number  
Faild_jobs Indicates the total count of jobs processed by this hub that failed. Number Ideally, the value of this measure should be 0.
Num_throt_error Indicates the number of throttling errors encountered by this hub. Number Throttling errors occur if an IoT hub's throttling limits have been exceeded for the requested operation. Ideally, the value of this measure should be 0. However, if this measure reports a non-zero value, then check if you are hitting the throttling limit by comparing your Telemetry message send attempts metric against the limits set.

Also, note that throttling errors occur only after the limit has been violated for too long a period. This is done so that your messages are not dropped if your IoT hub gets burst traffic. In the meantime, IoT hub processes the messages at the operation throttle rate, which might be slow if there is too much traffic in the backlog.
Tot_no_used Indicates the total number of messages used by this hub today. Number This is a cumulative value that is reset to zero at the beginning of each day.
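
The interpretation guidance in the table above can be combined into a simple health review. The Python sketch below assumes the measures of one hub are available as a dictionary keyed by measure name; the function review, the sample values, and the 20% disconnection threshold are hypothetical choices made for illustration and are not part of the test itself.

```python
DEVICE_CAP = 1_000_000  # cap on devices plus modules per hub, as stated above

# A subset of the measures whose ideal value is 0 according to the table.
SHOULD_BE_ZERO = [
    "Cmd_rjctd", "Drppd_msg", "Orphnd_msg", "Faild_twin_dvcs",
    "Faild_drct_invctns", "Faild_jobs", "Num_throt_error",
]


def review(measures, max_disconnected_ratio=0.2):
    """Return a list of human-readable findings for one hub's measures."""
    findings = []
    if measures.get("Status") == 0:  # 0 = Unavailable, 1 = Available
        findings.append("Status: hub is unavailable")
    for name in SHOULD_BE_ZERO:
        if measures.get(name, 0) > 0:
            findings.append(f"{name}: non-zero value {measures[name]}")
    total = measures.get("Tot_device", 0)
    connected = measures.get("Cnnctd_dvics", 0)
    if total >= DEVICE_CAP:
        findings.append("Tot_device: registration cap reached")
    # The threshold is an assumption; pick one that suits your deployment.
    if total > 0 and (total - connected) / total > max_disconnected_ratio:
        findings.append(f"Cnnctd_dvics: {total - connected} devices disconnected")
    return findings


sample = {"Status": 1, "Tot_device": 100, "Cnnctd_dvics": 50, "Cmd_rjctd": 2}
for finding in review(sample):
    print(finding)
```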