eG Monitoring
 

Measures reported by EgHelperProcessTest

To ensure enterprise-class monitoring, the eG manager includes the capability to monitor its various components and to recover from failure of these components. When the eG manager is started, a separate eG recovery process is started. This process is called eGmon. Likewise, when the eG agent is started, a recovery process named eGagentmon also starts simultaneously.

The eGmon process periodically attempts to connect to the eG manager and access its various components, including the eG database. If it detects any problems during such access, the recovery process attempts further diagnosis. The specific actions performed by the recovery process are as follows:

  • If the eG manager is not accessible, the recovery process attempts to restart the eG manager. If it fails to restart the eG manager thrice in succession, the recovery process generates an alert message to the eG administrator (using the MAIL SENDER ID specified in the Mail Configuration settings of the administration interface).
  • If the eG manager is accessible, the recovery process tests the connections from the eG manager to the database server that it uses. If it detects problems, it alerts the administrator to potential problems with database server access. By connecting directly to the database server (i.e., without using any other eG manager components), the recovery process further determines whether the database access problem is caused by a database failure or by the eG manager's pool of database connections being insufficient to handle the current load on the manager.
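
The decision flow described in the two points above can be sketched as follows. This is only an illustration, not the eGmon implementation: the function run_recovery_pass, the RecoveryResult type, and the five callables passed in are hypothetical names. The sketch also assumes that the three restart attempts happen within a single pass, and that a successful direct database connection points to an insufficient connection pool while a failed one points to a database failure.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# The text says an alert follows three failed restarts in succession.
MAX_RESTART_ATTEMPTS = 3


@dataclass
class RecoveryResult:
    # Ordered log of what a single recovery pass did (hypothetical structure).
    actions: List[str] = field(default_factory=list)


def run_recovery_pass(
    manager_accessible: Callable[[], bool],
    restart_manager: Callable[[], bool],
    manager_db_ok: Callable[[], bool],
    direct_db_ok: Callable[[], bool],
    alert: Callable[[str], None],
) -> RecoveryResult:
    """Run one pass of the recovery checks; all callables are placeholders."""
    result = RecoveryResult()

    # Case 1: manager not accessible, so try to restart it, then alert.
    if not manager_accessible():
        for attempt in range(1, MAX_RESTART_ATTEMPTS + 1):
            result.actions.append(f"restart attempt {attempt}")
            if restart_manager():
                result.actions.append("manager restarted")
                return result
        alert("eG manager could not be restarted")
        result.actions.append("alerted: restart failed")
        return result

    # Case 2: manager accessible, so test its database connections.
    if manager_db_ok():
        result.actions.append("healthy")
        return result

    # Assumed inference: if a direct connection works, the database is up
    # and the manager's connection pool is the likely bottleneck.
    if direct_db_ok():
        cause = "connection pool insufficient for current load"
    else:
        cause = "database failure"
    alert(f"database access problem: {cause}")
    result.actions.append(f"alerted: {cause}")
    return result
```

Passing the checks in as callables keeps the flow itself easy to read and to exercise with stubs; in a real recovery process each callable would wrap an actual connection attempt or restart command.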

When the eG manager is stopped manually, the eG recovery process is also shut down.

In the same way, the eGagentmon process attempts to connect to the eG agent and, upon detecting accessibility issues, restarts the agent. Note, however, that if the eG agent is stopped manually, the agent recovery process is also shut down.

This test reports the health of the eGmon and eGagentmon processes. Using this test, you can determine whether these helper processes are running and, if they are, whether they are performing the checks that they are programmed to perform at the pre-configured intervals. This way, you can be proactively alerted to the inadvertent termination of these critical helper processes and to errors in their operation.

The measures reported by this test are as follows:

Measurement: Process_Count
Description: Indicates the number of instances of this process that are currently running.
Measurement Unit: Number
Interpretation: The value 1 is desired for this measure. Any value above 1 indicates that additional instances of the process are running unnecessarily and draining resources. Use the detailed diagnosis of this measure to find the process IDs of the additional instances, so that you can kill them to conserve resources.

Measurement: last_run_time
Description: Indicates the time (in minutes) that has elapsed since this process last checked the connection to the eG manager or agent (as the case may be).
Measurement Unit: Minutes
Interpretation: Typically, this should be the same as or close to the frequency configured for the check in the configuration files of the eG manager or agent (as the case may be). If not, it could indicate that the process is not functioning as per the configured schedule, which could be a cause for concern.

Measurement: has_process_restarted
Description: Indicates whether or not this process has restarted.
Interpretation: The values that this measure can report and their corresponding numeric values are listed in the table below:

  Measure Value    Numeric Value
  Yes              1
  No               0

Note:

By default, the test reports the Measure Values listed in the table above to indicate whether the process has restarted or not. In the graph of this measure, however, these values are represented using their numeric equivalents only.
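
As a rough illustration of how the three measures might be read together, the sketch below turns one set of readings into a list of findings. The function assess_helper_process, its parameters, and the one-minute tolerance are assumptions made for this example; they are not part of the eG product. Only the Yes/No to 1/0 mapping comes from the table above.

```python
from typing import List

# Mapping of Measure Values to Numeric Values, as listed in the table above.
RESTART_VALUES = {"Yes": 1, "No": 0}


def assess_helper_process(
    process_count: int,
    last_run_minutes: float,
    has_process_restarted: str,
    check_interval_minutes: float,
    tolerance_minutes: float = 1.0,  # assumed slack, not an eG setting
) -> List[str]:
    """Return human-readable findings for one helper process (hypothetical helper)."""
    findings: List[str] = []

    # Process_Count: exactly 1 is the desired value.
    if process_count == 0:
        findings.append("process is not running")
    elif process_count > 1:
        findings.append(f"{process_count - 1} extra instance(s) running")

    # last_run_time: should stay close to the configured check frequency.
    if last_run_minutes > check_interval_minutes + tolerance_minutes:
        findings.append("checks are overdue")

    # has_process_restarted: compare using the numeric equivalent.
    if RESTART_VALUES[has_process_restarted] == 1:
        findings.append("process has restarted")

    return findings
```

For example, calling assess_helper_process(3, 12.0, "Yes", 5.0) would flag two extra instances, overdue checks, and a restart, while a reading of one instance, a last run within the configured interval, and "No" yields an empty list.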