eG Monitoring
 

Measures reported by EgClusterInfoTest

To ensure high availability of the eG monitoring solution, eG Enterprise offers a licensed Redundant Manager option. If the eG license enables this capability, two managers can be set up to operate as an Active-Active or an Active-Passive manager cluster; that is, a secondary manager can act as an active or passive standby for the primary manager. If the primary fails, the secondary automatically assumes the primary's role and performs all of its functions: receiving performance data from all eG agents, correlating the metrics, performing state computations, sending out email/SMS alerts (if configured) to users, and providing real-time performance and problem updates via the eG management console. Because this failover occurs seamlessly, eG administrators cannot easily tell whether the eG manager in use is operating in a redundant cluster and, if so, whether it is the primary or the secondary manager of the cluster.

Moreover, while the primary is unavailable, the secondary stores the performance metrics it collects in a local data folder; when the primary comes back up, the secondary automatically replicates this data to the primary. The maximum capacity of this data folder is configurable. To avoid data loss, administrators should periodically check whether the maximum size setting of the data folder is sufficient; to do so, they need to closely track the growth of the data folder. The EgClusterInfoTest makes all of this possible.

This test periodically checks whether the eG manager is operating in a cluster and, if so, reports the type of cluster. The test also reveals whether the eG manager being monitored is the primary or the secondary manager in the cluster. Regardless of manager type, the test reports the number of agents that are explicitly assigned to the manager and the number of agents that are actually reporting to it; this way, the test reveals whether there are agents that are mapped to the manager but are not actively reporting metrics, and helps administrators initiate an investigation. The test also enables administrators to track the usage of the data folder and determine whether the maximum amount of data that can be stored in that folder needs to be increased to avoid data loss during failover.

The measures reported by this test are as follows:

Measurement Description Measurement Unit Interpretation
cluster_type Indicates whether the eG manager is operating within a redundant cluster and, if so, the type of cluster.   The values that this measure can report and their numeric equivalents are listed in the table below:

Measure Value Numeric Value
Not supported 0
Active-Active 1
Active-Passive 2

Note:

By default, this measure reports the Measure Values listed in the table above to indicate the current cluster type of the manager. In the graph of this measure, however, these values are represented by their numeric equivalents only.

manager_type Indicates whether this eG manager is the primary manager in the cluster.   This measure is not reported if the cluster type of the manager (the cluster_type measure) is ‘Not supported’.

The values that this measure can report and their numeric equivalents are listed in the table below:

Measure Value Numeric Value
Yes 1
No 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether the manager is the primary manager. In the graph of this measure, however, these values are represented by their numeric equivalents only.

manager_status Indicates whether this manager is currently running.   The values that this measure can report and their numeric equivalents are listed in the table below:

Measure Value Numeric Value
Yes 1
No 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether the manager is running. In the graph of this measure, however, these values are represented by their numeric equivalents only.

other_manager_status Indicates whether the other manager in the cluster is currently running.   The values that this measure can report and their numeric equivalents are listed in the table below:

Measure Value Numeric Value
Yes 1
No 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether the other manager in the cluster is running. In the graph of this measure, however, these values are represented by their numeric equivalents only.

data_storage Indicates whether data is currently stored in this manager for transmission to the other manager in the cluster.   The values that this measure can report and their numeric equivalents are listed in the table below:

Measure Value Numeric Value
No 0
Yes 1
Uploading 2

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether data is stored in the manager. In the graph of this measure, however, these values are represented by their numeric equivalents only.
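
When reading graph values, it can help to translate the numeric equivalents back into their Measure Values. The following is a minimal Python sketch and not part of eG Enterprise; the dictionary STATE_LABELS and the helper decode_measure are hypothetical names introduced for illustration, and the mappings are copied from the tables above for cluster_type, manager_type, manager_status, other_manager_status, and data_storage.

```python
# Numeric-to-label mappings copied from the Measure Value tables above.
# STATE_LABELS and decode_measure are illustrative names, not eG APIs.
STATE_LABELS = {
    "cluster_type": {0: "Not supported", 1: "Active-Active", 2: "Active-Passive"},
    "manager_type": {0: "No", 1: "Yes"},
    "manager_status": {0: "No", 1: "Yes"},
    "other_manager_status": {0: "No", 1: "Yes"},
    "data_storage": {0: "No", 1: "Yes", 2: "Uploading"},
}


def decode_measure(measure, numeric_value):
    """Return the Measure Value label for a numeric graph value."""
    labels = STATE_LABELS[measure]
    # Fall back to a marker string for values the tables do not list.
    return labels.get(int(numeric_value), f"Unknown ({numeric_value})")


if __name__ == "__main__":
    print(decode_measure("cluster_type", 2))
    print(decode_measure("data_storage", 2))
```

A value that does not appear in the tables is returned as "Unknown (...)" rather than being guessed.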

no_of_files Indicates the number of files that are currently waiting to be sent by this manager to the other manager. Number The amount of data that a manager can store for transmission to other managers is controlled by two configuration settings, maxStoragePerFile and filesPerManager, in the eg_managers.ini file located in the <EG_INSTALL_DIR>\manager\config directory.

The maxStoragePerFile setting defines the amount of data (in MB) that can be stored in each temporary file used to hold data awaiting transmission to a manager. An eG manager can store data in multiple files for transmission to another manager. Multiple files are used (rather than a single file) to minimize the data read/write operations to memory required for transmission to the other manager. The filesPerManager setting defines the maximum number of data files per manager that are used for this temporary storage.

By default, both maxStoragePerFile and filesPerManager are set to 0. This implies that a manager does not save the data it receives from agents directly for transmission to another manager. If maxStoragePerFile is 10 and filesPerManager is 20, then up to 200 MB (10 MB x 20 files) of data can be saved for transmission to another manager.

If the values of the no_of_files and data_folder_size measures are consistently close to the limits set by the filesPerManager and maxStoragePerFile settings of the monitored eG manager, it is a clear indication that a large volume of data is being generated and readied for transmission by that manager, but its temporary storage has not been tuned adequately to handle this load. If these settings are not increased accordingly, significant data loss may result in the event of a manager failure.
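
The capacity arithmetic and the "close to the limit" check described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not eG functionality: the function names are hypothetical, the inputs are assumed to be the no_of_files and data_folder_size measure values together with the two settings from eg_managers.ini, and the 0.8 warning ratio is an arbitrary example threshold rather than an eG default.

```python
def storage_capacity_mb(max_storage_per_file, files_per_manager):
    """Total MB a manager can hold for transmission to another manager."""
    return max_storage_per_file * files_per_manager


def storage_usage(no_of_files, data_folder_size_mb,
                  max_storage_per_file, files_per_manager,
                  warn_ratio=0.8):
    """Compare the two measures against their configured limits.

    warn_ratio is an illustrative threshold, not an eG setting.
    """
    capacity = storage_capacity_mb(max_storage_per_file, files_per_manager)
    if capacity == 0:
        # Default settings (0 and 0): nothing is stored for transmission.
        return {"capacity_mb": 0, "files_ratio": None,
                "size_ratio": None, "near_limit": False}
    files_ratio = no_of_files / files_per_manager
    size_ratio = data_folder_size_mb / capacity
    return {
        "capacity_mb": capacity,
        "files_ratio": files_ratio,
        "size_ratio": size_ratio,
        "near_limit": max(files_ratio, size_ratio) >= warn_ratio,
    }


if __name__ == "__main__":
    # Settings from the example above: 10 MB per file, 20 files.
    print(storage_capacity_mb(10, 20))
    print(storage_usage(18, 170, 10, 20))
```

A single reading above the threshold is not conclusive; the text above refers to values that are consistently close to the limits, so such a check is best applied to a series of measurements.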

data_folder_size Indicates the amount of data currently waiting to be sent by this manager to the other manager. MB
assigned_agents Indicates the number of agents that have been explicitly assigned to this manager. Number  
reporting_agents Indicates the number of agents that are currently reporting metrics to this manager. Number  
test_data_queue Indicates the number of test data items that are currently waiting to be sent by this manager to the other manager. Number  
ddd_data_queue Indicates the number of DDD data items that are currently waiting to be sent by this manager to the other manager. Number  
max_thread_config Indicates the maximum number of threads allocated for processing the test and DDD data that are currently waiting to be sent by this manager to the other manager. Number  
test_data_thread_usage Indicates the percentage of threads used for processing the test data to be sent by this manager to the other manager. Percent This measure is computed as the ratio of the value of the test_data_queue measure to the value of the max_thread_config measure. A high value for this measure indicates that the eG manager is using a large number of threads to process the test data. It also indicates that the manager is rapidly running out of threads, and that requests for processing the DDD data may not be serviced, or may be deferred until the number of active requests for processing test data drops.

In such cases, consider increasing the maximum number of threads allocated to handle the test and DDD data. However, exercise caution when altering the maximum thread count: increasing it may consume too much memory and slow down the eG manager, while setting it too low will cause requests to block or time out.
ddd_data_thread_usage Indicates the percentage of threads used for processing the DDD data to be sent by this manager to the other manager. Percent This measure is computed as the ratio of the value of the ddd_data_queue measure to the value of the max_thread_config measure. A high value for this measure indicates that the eG manager is using a large number of threads to process the DDD data. It also indicates that the manager is rapidly running out of threads, and that requests for processing the test data may not be serviced, or may be deferred until the number of active requests for processing DDD data drops.

In such cases, consider increasing the maximum number of threads allocated to handle the test and DDD data. However, exercise caution when altering the maximum thread count: increasing it may consume too much memory and slow down the eG manager, while setting it too low will cause requests to block or time out.
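
The two thread usage measures can be reproduced from the queue and thread-count measures as sketched below. This is a minimal Python illustration, not the eG manager's implementation: the function name thread_usage_percent is hypothetical, and multiplying the ratio by 100 is an assumption based on the measures being reported in Percent.

```python
def thread_usage_percent(queue_value, max_thread_config):
    """Ratio of a queue measure to max_thread_config, as a percentage.

    Assumes the reported percentage is the ratio multiplied by 100.
    """
    if max_thread_config <= 0:
        raise ValueError("max_thread_config must be positive")
    return 100.0 * queue_value / max_thread_config


if __name__ == "__main__":
    # Example values chosen for illustration only.
    test_data_queue, ddd_data_queue, max_thread_config = 45, 5, 50
    print(thread_usage_percent(test_data_queue, max_thread_config))  # 90.0
    print(thread_usage_percent(ddd_data_queue, max_thread_config))   # 10.0
```

In this example, test data accounts for most of the thread pool, which is the situation in which DDD data processing may be deferred.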