| Measurement |
Description |
Measurement Unit |
Interpretation |
| Tot_node_count |
Indicates the total number of nodes in the cluster. |
Number |
|
| Master_node |
Indicates the count of master nodes in the cluster. |
Number |
Use the detailed diagnosis of this measure to know which are the master nodes in the cluster. |
| Worker_node |
Indicates the number of worker nodes in the cluster. |
Number |
Use the detailed diagnosis of this measure to know which are the worker nodes in the cluster. |
| Added_node |
Indicates the number of nodes that were added to the cluster since the last measurement period. |
Number |
Use the detailed diagnosis of this measure to know which nodes were recently added to the cluster. |
| Removed_node |
Indicates the number of nodes that were removed from the cluster since the last measurement period. |
Number |
Use the detailed diagnosis of this measure to know which nodes were recently removed from the cluster. |
| Running_node |
Indicates the number of nodes in the cluster that are currently running. |
Number |
|
| NotRunning_node |
Indicates the number of nodes in the cluster that are not running presently. |
Number |
Use the detailed diagnosis of this measure to know which nodes are not running and why. |
| Unknown_node |
Indicates the number of nodes in the cluster that are currently in the Unknown state. |
Number |
Use the detailed diagnosis of this measure to know which nodes are in an Unknown state and why. |
| Tot_pods_capacity |
Indicates the maximum number of Pods that can be created on the nodes in the cluster. |
Number |
|
| Tot_pods_allocation |
Indicates the number of Pods that have been scheduled to nodes in the cluster. |
Number |
If the value of this measure is equal to or close to the value of the Pods capacity measure, it indicates that the cluster has exhausted, or is about to exhaust, its Pod capacity. In such a situation, you may want to add more nodes to your cluster or increase the Pod capacity of your cluster. |
| Tot_running_pods |
Indicates the number of Pods in the cluster that are in the Running state currently. |
Number |
If a Pod is in the Running state, it means that the Pod has been bound to a node, and all of the Containers have been created. At least one Container is still running, or is in the process of starting or restarting. Use the detailed diagnosis of this measure to know which Pods are in the Running state. |
| Tot_pending_pods |
Indicates the number of Pods in the cluster that are in the Pending state currently. |
Number |
If a Pod is in the Pending state, it means that the Pod has been accepted by the Kubernetes system, but one or more of its Containers has not yet been created. This includes the time the Pod spends waiting to be scheduled as well as the time spent downloading images over the network, which could take a while.
If a Pod is stuck in the Pending state, it means that it cannot be scheduled onto a node. Generally, this is because insufficient resources of one type or another prevent scheduling. If this is the case, do one or more of the following:
Add more nodes to the cluster.
Terminate unneeded Pods to make room for the pending Pods.
Check that the Pod is not larger than your nodes. For example, if all nodes have a capacity of cpu: 1, then a Pod with a request of cpu: 1.1 will never be scheduled.
Use the detailed diagnosis of this measure to know which Pods are in the Pending state. |
| Tot_succeeded_pods |
Indicates the number of Pods in the cluster that are in the Succeeded state currently. |
Number |
If a Pod is in the Succeeded state, it means that all Containers in the Pod have terminated in success, and will not be restarted. |
| Tot_failed_pods |
Indicates the number of Pods in the cluster that are in the Failed state currently. |
Number |
If a Pod is in the Failed state, it means that all Containers in the Pod have terminated, and at least one Container has terminated in failure. That is, the Container either exited with non-zero status or was terminated by the system. Use the detailed diagnosis of this measure to know which Pods are in the Failed state. Ideally, the value of this measure should be 0. |
| Tot_unknown_pods |
Indicates the number of Pods in the cluster that are in the Unknown state currently. |
Number |
If a Pod is in the Unknown state, it means that the state of the Pod could not be obtained, probably due to an error in communicating with the host of the Pod. Ideally, the value of this measure should be 0. |
| Tot_pods_percent |
Indicates the percentage of Pods in the cluster that are in a Running state currently. |
Percent |
The formula used for computing this measure is as follows: (Running pods/Pods capacity)*100. Ideally, the value of this measure should be high. |
| Tot_cpu_capacity |
Indicates the total number of CPU cores supported by the cluster. |
Number |
|
| Total_millicpu |
Indicates the total CPU capacity of the cluster. |
Millicpu |
|
| Tot_cpu_requests |
Indicates the minimum CPU resources guaranteed to the Pods in the cluster. |
Millicpu |
This is the sum of CPU requests configured for all containers in all Pods across nodes in the cluster. A request is the amount of a resource, here CPU, that the system will guarantee to a Pod. |
| Tot_cpu_limits |
Indicates the maximum amount of CPU resources that the Pods in the cluster can use. |
Millicpu |
This is the sum of CPU limits set for all containers in all Pods across nodes in the cluster. A limit is the maximum amount that the system will allow the Pod to use. |
| Percent_cpu_limits |
Indicates what percentage of the CPU capacity of the cluster is allocated as CPU limits to containers. In other words, this is the percentage of a cluster's CPU capacity that the containers are allowed to use. |
Percent |
The formula used for computing this measure is as follows: (CPU limits/CPU capacity)*100. If the value of this measure exceeds 100%, it means that one or more Pods are probably over-subscribing to the capacity of one or more nodes. |
| Percent_cpu_request |
Indicates what percentage of the total CPU capacity of the cluster is set as CPU requests for the containers in the cluster. In other words, this is the percentage of a cluster's CPU capacity that the containers on the cluster are guaranteed to receive. |
Percent |
The formula used for computing this measure is as follows: (CPU requests/CPU capacity)*100. If the value of this measure is unusually high, use the detailed diagnosis of this measure to review the CPU requests configured for each Pod in the cluster. In the process, you can accurately identify the Pod for which the maximum amount of CPU resources in the cluster is guaranteed, that is, the Pod that is hogging the CPU capacity of the cluster. |
| Tot_memory_capacity |
Indicates the total memory capacity of the cluster. |
GB |
|
| Tot_memory_request |
Indicates the minimum memory resources guaranteed to the Pods in the cluster. |
GB |
This is the sum of memory requests configured for all containers in all Pods across nodes in the cluster. A request is the amount of a resource, here memory, that the system will guarantee to a Pod. |
| Tot_memory_limits |
Indicates the maximum amount of memory resources that the Pods in the cluster can use. |
GB |
This is the sum of memory limits set for all containers in all Pods across nodes in the cluster. A limit is the maximum amount that the system will allow the Pod to use. |
| Percent_memory_limits |
Indicates what percentage of the memory capacity of the cluster is allocated as memory limits to containers in the cluster. In other words, this is the percentage of a cluster's memory capacity that the containers on the cluster are allowed to use. |
Percent |
The formula used for computing this measure is as follows: (Memory limits/Memory capacity)*100. If the value of this measure exceeds 100%, it means that one or more Pods are probably over-subscribing to the capacity of one or more nodes in the cluster. |
| Percent_memory_request |
Indicates what percentage of the total memory capacity of the cluster is set as memory requests for the containers in the cluster. In other words, this is the percentage of a cluster's memory capacity that the containers in the cluster are guaranteed to receive. |
Percent |
The formula used for computing this measure is as follows: (Memory requests/Memory capacity)*100. If the value of this measure is unusually high, use the detailed diagnosis of this measure to review the memory requests configured for each Pod in the cluster. In the process, you can accurately identify the Pod for which the maximum amount of memory resources in the cluster is guaranteed, that is, the Pod that is hogging the memory capacity of the cluster. |
| Tot_replicas_count |
Indicates the total number of non-terminated Pod replicas in the cluster that have been updated with changes (if any) made to Pod template specifications. |
Number |
Typically, whenever changes are made to a Deployment's Pod template (say, the labels or container images of the template are changed), a Deployment rollout is triggered. A new ReplicaSet is created, and the Deployment moves the Pods from the old ReplicaSet to the new one at a controlled rate. Ideally, the value of this measure should be the same as the value of the Total pods with deployment measure. If it is not, it means that the desired number of Pod replicas has not yet been fully updated with the changes to the Pod template. |
| Tot_readyReplica_count |
Indicates the number of ready Pods created in the cluster across Deployments. |
Number |
|
| Tot_available_count |
Indicates the number of available Pods created in the cluster across Deployments. |
Number |
A Pod is said to be Available if it is ready, without any containers crashing, for at least the duration configured against minReadySeconds in the Deployment specification. Ideally, the value of this measure should be the same as the value of the Total pods with deployment measure. If it is not, it means that the desired state of the Deployments is not the same as their actual state. |
| Tot_unavailable_count |
Indicates the total number of unavailable Pods created in the cluster across Deployments. |
Number |
Any Pod that is not ready, or that is ready but has containers crashing for a period of time beyond the minReadySeconds duration, is automatically considered Unavailable. Ideally, the value of this measure should be 0. If this measure reports a non-zero value, and particularly a value equal to or close to the value of the Total pods with deployment measure, it means that the desired state of the Deployments is not the same as their actual state. |
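
The percentage measures in the table are all computed the same way: a usage, request, or limit total divided by the matching capacity, multiplied by 100. The following is a minimal sketch of that arithmetic, not the monitoring product's actual implementation. The `ClusterSample` class, its field names, and the helper functions are hypothetical and exist only for illustration; the formulas and the "exceeds 100%" check are taken from the Interpretation column above.

```python
from dataclasses import dataclass


@dataclass
class ClusterSample:
    """Hypothetical snapshot of raw cluster totals (field names are illustrative)."""
    running_pods: int
    pods_capacity: int
    cpu_capacity_millicpu: float
    cpu_requests_millicpu: float
    cpu_limits_millicpu: float
    memory_capacity_gb: float
    memory_requests_gb: float
    memory_limits_gb: float


def percent(part: float, whole: float) -> float:
    """Return (part / whole) * 100; report 0.0 when the capacity is zero or negative."""
    if whole <= 0:
        return 0.0
    return part / whole * 100.0


def derived_measures(s: ClusterSample) -> dict:
    """Apply the formulas from the table to one sample."""
    return {
        "Tot_pods_percent": percent(s.running_pods, s.pods_capacity),
        "Percent_cpu_limits": percent(s.cpu_limits_millicpu, s.cpu_capacity_millicpu),
        "Percent_cpu_request": percent(s.cpu_requests_millicpu, s.cpu_capacity_millicpu),
        "Percent_memory_limits": percent(s.memory_limits_gb, s.memory_capacity_gb),
        "Percent_memory_request": percent(s.memory_requests_gb, s.memory_capacity_gb),
    }


def overcommitted(measures: dict) -> list:
    """List the limit percentages above 100%, which the table reads as probable over-subscription."""
    names = ("Percent_cpu_limits", "Percent_memory_limits")
    return [name for name in names if measures[name] > 100.0]


# Made-up numbers: CPU limits (10000 millicpu) exceed CPU capacity (8000 millicpu).
sample = ClusterSample(
    running_pods=55,
    pods_capacity=110,
    cpu_capacity_millicpu=8000,
    cpu_requests_millicpu=2000,
    cpu_limits_millicpu=10000,
    memory_capacity_gb=32,
    memory_requests_gb=8,
    memory_limits_gb=16,
)
measures = derived_measures(sample)
print(measures["Percent_cpu_limits"])  # prints 125.0
print(overcommitted(measures))         # prints ['Percent_cpu_limits']
```

In this made-up sample, the CPU limits add up to 125% of the CPU capacity, so Percent_cpu_limits is the only measure flagged. Requests stay at 25% of capacity, which is why over-subscription shows up in the limits percentage rather than in the requests percentage.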