eG Monitoring
 

Measures reported by NutConnStatusTest

A Nutanix Prism instance typically manages one or more Nutanix Acropolis clusters. Each node in a cluster runs a Nutanix Controller VM (CVM), which serves all I/O operations for the hypervisor and for all VMs running on that node. A Prism service runs on every CVM, and an elected Prism Leader is responsible for handling HTTP requests. The cluster external IP is always hosted by the Prism Leader. If the Prism Leader fails, a new leader is elected and assumes the cluster external IP. The Prism service therefore becomes unavailable only if all the CVMs in a cluster fail. Such a failure can cause dependent applications to suffer prolonged outages. Likewise, if the Prism Leader takes too long to respond to HTTP requests, application performance degrades.

To avoid this, the availability and responsiveness of the Prism service should be checked periodically, and abnormalities should be reported promptly. NutConnStatusTest serves this purpose.

This test emulates an HTTP request to a Nutanix Prism, reports whether the Prism service is available or not, and if available, also reports how quickly it responds to the request. Sudden breaks in Prism availability and poor responsiveness of the Prism service can be promptly detected in the process. The test also reports the response code returned by the Prism service, so that the nature of the response - whether it is an error response or not - can be determined.
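
The kind of check described above can be approximated with a short Python sketch. It is illustrative only and is not the eG agent's implementation: the function name `probe_prism`, the injectable `opener` parameter, and the rule that any HTTP response (including an error status) counts as "available" are all assumptions made for this example.

```python
import time
import urllib.error
import urllib.request


def probe_prism(url, timeout=20.0, opener=urllib.request.urlopen):
    """Send one HTTP GET and return the three measures as a dict.

    Hypothetical helper: 'opener' exists only so the network call can be
    swapped out, and the availability rule below is an assumption.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            code = response.status
    except urllib.error.HTTPError as err:
        # The service answered, but with an error status (4xx or 5xx).
        code = err.code
    except (urllib.error.URLError, OSError):
        # No HTTP response at all: report the service as unavailable.
        return {"availability": 0, "response_time": None, "response_code": None}
    elapsed = time.monotonic() - start
    return {"availability": 100, "response_time": elapsed, "response_code": code}
```

A call such as `probe_prism("https://prism.example.com/")` (a placeholder URL) returns a dictionary whose keys mirror the measures listed below. A real deployment would also need to handle authentication and certificate validation, which this sketch leaves out.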

The measures made by this test are as follows:

| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| AVAILABILITY | Indicates whether or not the Prism service is available. | Percent | A value of 100 indicates that the Prism service is available. A value of 0 indicates that the Prism service is unavailable. |
| RESPONSE_TIME | Indicates the time taken by the Prism service to respond to HTTP requests. | Seconds | Ideally, the value of this measure should be low. A consistent increase indicates that the performance of the Nutanix Prism is gradually deteriorating. |
| RESPONSE_CODE | Indicates the response code returned for the emulated HTTP request. | Number | The response codes that this measure can report are described below. |

| Measure Value | Description |
|---|---|
| 200 | The API request was successful and received a response. |
| 201 | The API request was successful and created an object. |
| 400 | The API request was malformed and could not be processed. |
| 401 | You have no access and/or are not authorized. |
| 403 | You are authorized but do not have the privileges for this API. |
| 404 | The URL was not found. |
| 405 | The called method is not allowed or is not supported. |
| 408 | The request timed out (20 seconds maximum). |
| 500 | The API request was received but there was a server error. |
| 503 | Service unavailable at this time or too early to process. |
| 505 | HTTP other than 1.1 not supported. |
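
To decide whether a reported RESPONSE_CODE is an error response, the value can be grouped by its standard HTTP status class. The sketch below is a minimal illustration; the function name `classify_response_code` and the category labels are assumptions for this example and are not part of the test's output.

```python
def classify_response_code(code):
    """Group an HTTP status code by its standard class (hypothetical helper)."""
    if 200 <= code < 300:
        return "success"        # e.g. 200, 201
    if 400 <= code < 500:
        return "client error"   # e.g. 400, 401, 403, 404, 405, 408
    if 500 <= code < 600:
        return "server error"   # e.g. 500, 503
    return "other"              # informational or redirect classes
```

An alerting rule could, for instance, raise an alarm whenever the classification is anything other than "success".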