eG Monitoring
Alarm Window
The eG Enterprise system's patented alarm correlation technology automates the process of problem diagnosis. The eG system can be configured so that, whenever an alarm condition is detected, the manager automatically sends an email alert to the users who have requested to be alerted. The alarms in this window are specific to the services and servers monitored by the current user. Also, if a user has been configured to view alarms of a specific priority only, this window displays only the alarms of that priority. However, users with Supermonitor, Admin, ServerAdmin, or SuperAlarmViewer permissions can see all the alarms in the environment.
The alarms are categorized into Critical, Major, and Minor priority alarms. Following conventional management practices, eG Enterprise applies the color-coding scheme mentioned below to indicate alarm priorities:
The Current Alarms window also indicates the component type and the IP address/host name of the component that has encountered a problem, the layer that has been affected, and the date and time of the problem. Typically, whenever an alarm is raised for a host-level problem, the Component type is automatically set to Host system in the Current Alarms page, even if the affected component is, say, an Oracle database server or a Web server. As a result, the service desk may not be able to quickly determine the exact type of the affected component from the alarm information. Moreover, help desk personnel may prefer to see the operating system of the problem host as part of the alarm information displayed in the Current Alarms page, as this information greatly simplifies troubleshooting. To help the help desk easily understand, interpret, and resolve problems affecting a host's performance, you can optionally configure the eG Enterprise system to display the actual Component type, Host system, or the affected Operating system for host-level alarms in the Current Alarms page. To enable this capability, do the following:
Note: This configuration also affects the History of Alarms page, email/SMS alerts, and SNMP traps.
To learn more about the exact nature of a problem, move your mouse pointer over the alarm displayed in the Current Alarms window. Additional alarm information is then displayed: a brief description of the problem, the test that detected the problem, the test that reported the problem, the host on which the test executed, and the corresponding site name (if any).
Note: The Value column of the additional alarm details reports the last measure value and unit of the problem measure. The alarms window and email alerts display the last measure value only if the Show last measure value in alerts flag in the Mail Alert Configuration section of the Mail Alert Preferences page (Alerts -> Mail Settings -> Alert Settings) is set to Yes.
In addition, a Graph icon is available against every alarm. Clicking this icon opens a graph of the problem measure for a default period of 1 hour, which lets you observe time-of-day variations in the behavior of the problem measure.
Note: To override the default measure graph Timeline of 1 hour, do the following:
Moreover, to make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. For example, when the CPU usage of a host reaches its threshold, the agent can be configured to report more details, such as the top 10 processes consuming the most CPU. Optionally, this capability can also be configured to generate detailed measures periodically, regardless of whether problems occur.
If the detailed diagnostic capability is enabled for the problem measure indicated by the Current Alarms window, a special DD icon will be available against the corresponding alarm. Clicking this icon reveals detailed information pertaining to the problem condition, so that you can quickly and accurately zero in on the root cause of the problem. For instance, suppose the Current Alarms window indicates excessive CPU utilization on a host named iis. Against the ‘High CPU utilization’ alarm raised on Processor_0 of this host, you will find a ‘magnifying glass’ icon. Clicking this icon lists the top 10 CPU-consuming processes that were executing on Processor_0 of the host during the last hour (by default), enabling you to identify the exact process causing the issue. To view the detailed measures for any other processor on the same host, pick a different option from the Description list and click the Submit button. Similarly, you can change the Measurement and the Timeline for the detailed diagnosis.
Note: The DD icon will not appear in the Current Alarms window under the following circumstances:
Most administrators will agree that not all performance issues are caused by problems with the internal operations or the external network traffic/connectivity of a component. Sometimes, unplanned/unauthorized/accidental configuration changes can also adversely impact server performance. eG Enterprise optionally provides a dedicated Configuration Management module, which enables you to keep track of changes to the configuration of target components and analyze the performance impact of such changes.
Moreover, if the solution captures a configuration change in a component at around the same time that a performance issue was detected with that component, the Current Alarms window instantly draws your attention to the change by tagging that alarm with a special icon.
Clicking on the …
By default, the alarm window displays alarms of all priorities, as indicated by the default selections in the Filter by and Priority lists. To view only the critical alarms, select the Critical option from the Priority list box. Likewise, you can view the Critical & Major alarms together, or view the Major or Minor alarms alone, by selecting the corresponding options from the Priority list.
Besides Priority, alarms can also be filtered by Component Type, Services, Segments, or Zones. For instance, to view the alarms pertaining to a particular component type alone, pick the Component Type option from the Filter by list, and then select a component type of your choice from the Types list. Citrix administrators, for example, would typically be more concerned with issues affecting their mission-critical Citrix XenApp installations. To focus on Citrix-related issues alone, they can select Component Type from the Filter by list and then choose the Citrix XenApp option from the Types list. Likewise, service managers can filter the alarms list to view only those alarms that are impacting a particular business service's performance. For this, they need to select the Services option from the Filter by list and pick a service of interest from the Services list. In the same way, the Current Alarms window can be restricted to the performance degradations experienced by the components in a particular segment or zone.
Also, with a single mouse click, you can change the order in which the alarms are sorted in the Current Alarms window. By default, alarms are sorted in descending order of the Start Time of issues. To arrange them in ascending order of Start Time, simply click the Start Time column label. The current sort order is indicated by an ‘arrow’ symbol in the sorted column: an ‘up arrow’ signifies ascending order, and a ‘down arrow’ denotes descending order.
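The filtering and sorting behavior described above can be modeled with a short Python sketch. This is not eG Enterprise code or an eG API: the `Alarm` fields, the `PRIORITY_OPTIONS` table, the `filter_alarms` and `sort_by_start_time` helpers, and the sample alarms are all hypothetical. They only illustrate how a Priority option, an optional Filter by value, and the default descending Start Time order combine.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical alarm record; the field names are illustrative, not eG's schema.
@dataclass
class Alarm:
    priority: str        # "Critical", "Major", or "Minor"
    component_type: str  # for example, "Citrix XenApp"
    services: set        # business services affected by the alarm
    start_time: datetime


# Options of the Priority list described above, mapped to the priorities each one shows.
PRIORITY_OPTIONS = {
    "All": {"Critical", "Major", "Minor"},
    "Critical": {"Critical"},
    "Critical & Major": {"Critical", "Major"},
    "Major": {"Major"},
    "Minor": {"Minor"},
}


def filter_alarms(alarms, priority="All", component_type=None, service=None):
    """Keep alarms that match the Priority option and the optional Filter by value."""
    allowed = PRIORITY_OPTIONS[priority]
    return [
        a for a in alarms
        if a.priority in allowed
        and (component_type is None or a.component_type == component_type)
        and (service is None or service in a.services)
    ]


def sort_by_start_time(alarms, ascending=False):
    """Newest issues first by default, mirroring the descending Start Time order."""
    return sorted(alarms, key=lambda a: a.start_time, reverse=not ascending)


# Illustrative sample data only.
alarms = [
    Alarm("Critical", "Citrix XenApp", {"Online Banking"}, datetime(2024, 1, 1, 9, 0)),
    Alarm("Minor", "Oracle Database", {"Payroll"}, datetime(2024, 1, 1, 10, 0)),
    Alarm("Major", "Citrix XenApp", {"Payroll"}, datetime(2024, 1, 1, 8, 30)),
]

citrix = filter_alarms(alarms, priority="Critical & Major", component_type="Citrix XenApp")
for alarm in sort_by_start_time(citrix):
    print(alarm.priority, alarm.start_time)
```

Passing `ascending=True` corresponds to clicking the Start Time column label once. Filtering by Segments or Zones would follow the same pattern, with one more optional field on the record and one more condition in the filter.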
This way, you can quickly arrange the contents of the alarms window in ascending or descending order of any of the displayed columns.
In addition, the option to delete alarms can be enabled for specific monitor users registered with the eG Enterprise system. While creating or modifying a user's profile using the eG administrative interface, set the Allow alarm deletion flag to Yes for that user if you want to grant them the right to delete alarms. By default, the alarm deletion capability is disabled for all users of the eG monitoring console, including the admin and supermonitor users. If the capability has been explicitly enabled for a user, say the supermonitor, the Alarms window will display an additional Delete button. To delete an alarm, select the check box corresponding to the alarm, and then click the Delete button. This opens the Delete Alarm window, where you can provide a Reason for deleting the chosen alarm; providing a Reason is optional. Click the Submit button to save the reason and confirm the deletion of the alarm.
Note:
Optionally, an acknowledgement can be provided for an alarm displayed in the eG monitor interface. By acknowledging an alarm, a user indicates to other users that the issue raised by the alarm is being attended to. If need be, the user can even propose a course of action using this interface. In such a case, a user with Admin or Supermonitor privileges (roles) can edit the acknowledgement by adding their own comments or suggestions on the proposed action. The acknowledgement thus works in three ways:
The Acknowledge pop-up window will then appear, in which you can acknowledge the alarm. To save the acknowledgement, click the Submit button. This takes you back to the Current Alarms window, but this time, a symbol prefixes the acknowledged alarm. Moving your mouse pointer over the symbol reveals the details of the acknowledgement, such as its description, the user who acknowledged the alarm, and the date and time of the acknowledgement.
In large environments, the same set of components is often assigned to multiple users for monitoring. In such environments, some or all of the users with monitoring rights to a component might want to post their comments for an alarm related to that component. If acknowledgement rights are granted to all these users, each of them can log in to the monitor interface and provide an acknowledgement description for the same alarm using the procedure discussed above. eG Enterprise maintains a history of the acknowledgement descriptions provided by the multiple users with rights to monitor a single component, and lists the entire history the next time one of these users views the acknowledgement details in the Current Alarms window. This way, the administrative staff can share responsibility for resolving critical issues, brainstorm online to identify accurate remedies, and provide each other with quick updates on problem status.
An alarm can also be unacknowledged, but only by the user who originally submitted the acknowledgement. To unacknowledge an alarm, select the check box corresponding to the acknowledged alarm in the Current Alarms window and click the Acknowledge button. When the Acknowledge pop-up window appears, click the Unacknowledge button in it. This removes the ‘acknowledgement’ symbol from the Current Alarms window.
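The acknowledgement rules described above (a shared history of descriptions from several users, with edits and unacknowledgement limited to a user's own descriptions) can be sketched as a small Python model. The class and method names are hypothetical and are not part of eG Enterprise. The text does not say what the console shows when several users have acknowledged the same alarm and only one of them unacknowledges it; this sketch assumes the acknowledgement symbol stays while any description remains.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Acknowledgement:
    user: str
    description: str
    timestamp: datetime


@dataclass
class AlarmAcknowledgements:
    """Acknowledgement history of a single alarm (hypothetical model, not an eG API)."""
    history: list = field(default_factory=list)

    def acknowledge(self, user, description, when):
        # Any user with monitoring rights may add a description; earlier ones are kept.
        self.history.append(Acknowledgement(user, description, when))

    def edit(self, user, index, new_description):
        # A user may edit only the descriptions they originally provided.
        entry = self.history[index]
        if entry.user != user:
            raise PermissionError(f"{user} did not write this acknowledgement")
        entry.description = new_description

    def unacknowledge(self, user):
        # Only the user's own descriptions are withdrawn.
        remaining = [a for a in self.history if a.user != user]
        if len(remaining) == len(self.history):
            raise PermissionError(f"{user} has no acknowledgement to withdraw")
        self.history = remaining

    @property
    def is_acknowledged(self):
        # Assumption: the acknowledgement symbol stays while any description remains.
        return bool(self.history)


# Illustrative sample data only.
acks = AlarmAcknowledgements()
acks.acknowledge("john", "Restarting the application pool", datetime(2024, 1, 1, 9, 5))
acks.acknowledge("mary", "Also scheduling a disk cleanup", datetime(2024, 1, 1, 9, 20))
acks.unacknowledge("john")
for entry in acks.history:
    print(entry.timestamp, entry.user, entry.description)
```

Raising `PermissionError` is simply one way to express "not allowed" in the sketch; the console itself presumably just withholds the option.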
Note: A user can edit or unacknowledge only those acknowledgement descriptions that they originally provided.
Clicking on an alarm takes monitor users to a page that displays the layer model, tests, and measurements pertaining to the problem component.