eG Monitoring
 

The System Dashboard

The default layer model representation has many benefits - for starters, the model is consistent across applications, thus ensuring a shorter learning curve for the users. Secondly, from depicting the current state of an application to indicating the current issues related to that application, the layer model page serves as the most reliable platform for providing detailed, real-time information related to current application performance. Lastly, and most importantly, the layer model page automatically correlates the performance across all the layers of an application, and precisely indicates where the performance issues related to that application originated - at the host level? the network/TCP level? or the application level? - in short, from a correlation standpoint, the layer model is ideal.

On the flip side, though the layer model can accurately point to the 'problem layer', the actual 'problem' itself is hidden inside the tests, measures, and detailed diagnosis information mapped to the layer; getting to the root-cause of an issue using the layer model therefore involves a little time and a few mouse clicks. Finally, while it's true that the layer model page can clearly depict the current state of a component, it merely provides the means to launch those interfaces that reveal past performance/problems related to that component; this implies that when it comes to analyzing historical performance and deducing performance trends, problem patterns, and potential anomalies, the layer model page offers little help.

To eliminate these shortcomings, eG Enterprise now offers specialized dashboards along with the unique monitoring model of every application that is monitored. Once the problem layer is indicated by the layer model, you can switch to one of these dashboards to receive in-depth insights into the performance of the problem layer(s), and thus troubleshoot the issue at hand better. Typically, these dashboards facilitate the following:

  • Serve as a single, central console that not only depicts the current state of a layer, but also instantly indicates the root-cause of issues pertaining to that layer, thereby enabling administrators to go from problem effect to the problem source in no time!
  • Combine both raw and graphically represented data, and facilitate an in-depth analysis of not just live performance, but also the historical performance of a particular layer, thus shedding light on potential anomalies;
  • Aid administrators in effectively analyzing the past trends in the performance of a layer, so that they can easily forecast future performance;
  • Enable service level audits on-the-fly, and thus help administrators accurately determine when a layer slipped from the desired performance levels.

By default, the layer model representation of every application is accompanied by a System Dashboard and a Network Dashboard. In addition to these dashboards, a few selected applications are provided with an Application Dashboard as well.

The System Dashboard of an application allows you to focus on the performance of the operating system on which that particular application runs - i.e., the Operating System layer of an application. While viewing the layer model of an application using the Layer Model tab page, you can, if you so want, instantly switch to its System Dashboard for in-depth insights into system performance; for this, click on the System tab page.

Using the System Dashboard, administrators can determine the following:

  • The current status of the application host;
  • The problems that the host is currently facing, and the type and number of problems it encountered during the last 24 hours;
  • The current system configuration (if the eG license enables the Configuration Management capability);
  • The current state of the critical parameters related to system performance;
  • How some of the sensitive performance parameters have performed during the last 1 hour (by default);
  • The resource-hungry processors supported by the host, and the disk partitions on the host that are currently experiencing a space crunch.

By default, the System Dashboard provides an overview of system performance. Accordingly, the Overview option is chosen by default from the Subsystem list. If, instead of an overview, you prefer an inside view of system performance (for instance, to investigate how effectively the system in question has been using each of its resources, or to focus on the uptime of the host), you can pick a different option from the Subsystem list. The sections below discuss each of these options in detail. By default, the System Dashboard is enabled for all applications.

Overview

In the Overview mode, the System Dashboard reveals the following:

  1. The Current System Alerts section indicates the number of unresolved issues at the host-level, and also reveals how these issues have been distributed based on priority - i.e., the number of current issues of each priority. By clicking on an alarm priority, you can view the details of current alarms of that priority. This way, you not only determine how problem-prone your operating system is, but also figure out the number and type of current problems at the system-level.

  2. If too many alarms are displayed, you can use the text boxes placed at the end of this alarm window to perform quick searches based on the corresponding columns, and locate the specific alarm(s) of interest to you. For instance, to look for an alarm with a specific description, specify the whole/part of this description in the text box below the Description column. Doing so will automatically display the details of only that alarm containing the specified description.
  3. If you click on any alarm, an Alarm Details section will be introduced in the Alarms window itself, providing additional details of the alarm clicked on. These details include the Site affected by the problem for which the alarm was raised, the test that reported the problem, and the Last Measure value.
  4. To figure out the type of problems that occurred most often on the system during the last 1 hour, refer to the History of Events section. This section provides a bar graph that reveals the number of problems of each priority that the application host experienced during the last 1 hour. Clicking on a bar will lead you to the EVENT HISTORY page that displays the complete list of problem events that occurred at the system-level in the last 1 hour. This information provides administrators with quick and effective insights into recurring problems, and enables them to deduce problem patterns.
  5. Once back in the dashboard, you will find an At-A-Glance tab page that enables you to determine, at a glance, the current state of the system. This tab page begins with a Current OS Health section, which provides a pie chart revealing how problem-prone the system currently is - in other words, it indicates the service level that has been achieved by the system currently. Clicking on a slice will lead you to the EVENT HISTORY page again.
  6. This pie chart will be followed by a series of pre-configured host-level measures and their current values, with the help of which unhealthy metrics can be instantly detected and impact analysis easily performed. While measures that report percentage values are typically represented using a dial chart, other measures are reported using digital displays. Since all these values are rounded-off to two decimal places (by default), you are advised to move your mouse pointer over these "approximations", so that the "actuals" can be viewed as a tool tip.
  7. Also, to enable administrators to instantly and accurately detect deviations from the norm, the dial charts, by default, indicate the threshold settings of a measure along with the real-time values reported by that measure. If multi-level thresholds are set for a measure, then each such threshold will be indicated using the conventional color-codes (Red for Critical, Orange for Major, and Pink for Minor) used across the eG monitoring and reporting consoles. By default, the dial charts display the Maximum thresholds alone. If a measure is associated with Minimum thresholds only, then the dial chart will display the minimum thresholds settings instead. Thanks to the threshold representations in the dial charts, administrators can easily identify when and what type of thresholds were violated.
  8. Let us now return to the dashboard. If you click on any dial/digital graph in the dashboard, you will be led directly to the layer model page, where the exact layer-test-measure combination that corresponds to the dial/digital graph will be displayed.
  9. Now that we are done exploring the dial and digital graphs, let us proceed to focus on the other sections of the dashboard. A quick look at the current System Configuration helps determine whether a change in system configuration can make the host less vulnerable to performance issues. Note that the System Configuration will appear only if your eG license allows Configuration Management; if not, then this section will display a bar chart indicating the current status of the Operating System layer of the host, in terms of the percentage of time the host has been in normal/critical/major/minor/unknown states.
    • To add a new measure to this section, select the Test that reports the measure, then pick the Measure, provide a Display name for the measure, and then click the Add button. Finally, click the Update button.

  10. Let us now focus on the dashboard again. A list of critical system-related measures and their current state is provided under the head Key Performance Indicators in the dashboard, so that administrators can swiftly determine if the eG agent has detected any abnormalities with any of the factors that significantly influence system performance. This way, remedial measures can be immediately initiated. Clicking on a measure here will lead you to the Layer Model tab page displaying the monitoring model of the target application, and the value reported by the measure that was clicked on.
    • To add a new measure to this section, select the Test that reports the measure, then pick the Measure, provide a Display name for the measure, and then click the Add button. Finally, click the Update button.

  11. Moreover, corresponding to each of the core measures displayed in the Key Performance Indicators section, a miniature graph will be available, which provides a quick look at the variations in that measure during the last 1 hour (by default). By observing these variations more closely and clearly, say, over even longer time periods, you can rapidly detect disturbing performance trends and proactively isolate potential problems. To achieve this, click on the miniature graph to expand it. An enlarged graph then appears, in which you can alter the Timeline. In addition, if the time-of-day values reported by multiple descriptors are plotted in the graph, you can choose to focus on the historical performance of the best/worst descriptors alone by picking a TOP-N or LAST-N option from the Show list that appears in the expanded graph. For instance, if the graph tracks the usage of all the disk partitions on the host over time, you can pick the TOP-3 option from the Show list to make sure that the graph plots the historical values of only the 3 most heavily used disk partitions.
  12. Beneath the Key Performance Indicators section, you will find a CPU Usage Summary; from this summary, you can quickly understand how well all the processors supported by the system are currently utilizing the host's CPU resources, and also accurately identify those processors that are eroding these critical resources.
  13. Similarly, the Disk Usage Summary will reveal the current capacity and usage of each of the disk partitions on the host, so that you can swiftly isolate disk partitions that are running out of space.
  14. Thus, with the help of tabulated usage statistics, the At-A-Glance tab page turns the spotlight on resource-intensive processors and disk partitions on the host. Alternatively, if you prefer an interface that provides a graphical comparison of resource usage across processors and disk partitions (as the case may be), combined with quick insights into the root-cause of usage excesses (if any), you can switch to the Details tab page instead. For this purpose, click on the Details tab page.
  15. A set of pre-defined bar charts is provided, each of which focuses on the current usage of a key resource (disk/CPU/memory). Using these default graphs, you can easily and accurately determine the following:
    • Which processor is currently utilizing the maximum CPU resources? Which process currently executing on the host is causing this resource-drain?
    • Which disk partitions are left with limited free space?
    • Which disk partitions are the busiest in terms of the rate of I/O requests they handle? Which processes currently executing on the host are causing high disk I/O?
    • Which processes on the host are consuming memory excessively?

  16. These bar charts, in fact, can also be configured to aid effective postmortem analysis of resource usage. For instance, you can use one of these bar charts to find out which process caused the memory usage on the host to increase during a time period in the past. For this purpose, click on the corresponding bar chart. The graph will then be enlarged.
  17. By default, the resulting graph will display the top-10 processes that executed on the host during the specified Timeline, in the descending order of memory usage. Accordingly, the TOP-10 option is chosen by default from the Show list. To view only a limited number of processes in the graph, pick a different TOP-N or LAST-N option from the Show list.
  18. Clicking on the icon will lead you to the detailed diagnosis page indicating the Process ID of the memory-intensive processes, and the memory usage of each process.
  19. This way, with the help of the bar charts, you can quickly get to the source of any resource contention at the host. However, to engage in an elaborate historical analysis of the behavior of the host-level measures and isolate probable problems, the measure/summary/trend graphs offered by the History tab page will be more useful. To switch to this tab page, click on History.
  20. By default, the History tab page provides measure graphs for each of the key host-level measures, using which you can efficiently track the changes in the performance of the measures during the last 24 hours (by default). These graphs help determine when a measure, which is currently in an abnormal state, began exhibiting performance inconsistencies.

  21. Some measure graphs in the History tab page may plot values for multiple descriptors; such graphs will appear very cluttered, making analysis a nightmare! To view such measure graphs clearly, you will have to first enlarge the graph by clicking on it.
  22. If need be, you can change the Timeline of the enlarged graph or choose to view only a few of the descriptors in the graph by picking a TOP-N or LAST-N option from the Show list.
  23. To verify SLA-adherences/slippages of the host-level measures, click on the icon at the right, top corner of the History tab page. Summary graphs for the critical resource usage metrics will then appear. By default, these graphs will summarize the performance of the individual measures during the default period of 1 day; this default duration can be overridden using the procedure discussed below:
    • Click the button at the top of the dashboard to open the Dashboard Settings window.
    • Then, pick the Summary Graph option from the Default timeline for list, and then set a Timeline for the graphs.
    • Finally, click the Update button.

  24. You can change the graph timeline by clicking on the Timeline link. You can even expand the graph by clicking on it, and then alter its Timeline.
  25. In addition to the timeline and dimension, the enlarged summary graph also allows you to change its Duration. By default, the Duration is set to Hourly, indicating that the summary graphs plot only the hourly summaries by default. If required, you can change the Duration of the summary graph in the enlarged mode so that you can perform daily or monthly summary analysis.
  26. Similarly, to observe and understand the past trends in the performance of the host and to predict future measure behavior, click on the icon at the right, top corner of the History tab page. The trend graphs for the host-level metrics will then appear. By default, these graphs plot the maximum and minimum values registered by a measure during the default period of 1 day.
  27. If need be, you can change the graph timeline by clicking on the Timeline link. You can even expand the graph by clicking on it.
  28. Doing so will display the enlarged graph. By default, only hourly trend values are plotted in a trend graph. If need be, you can change the Duration of the trend graph in the enlarged mode, so that you can perform daily or monthly trend analysis. Likewise, you can change the graph Timeline.
  29. Also, by default, the trend graph only plots the minimum and maximum values registered by a measure. Accordingly, the Graph type is set to Min/Max in the enlarged mode.
  30. Note:

    In case of descriptor-based tests, the Summary and Trend graphs displayed in the History tab page typically plot the values for a single descriptor alone. To view the graph for another descriptor, pick a descriptor from the drop-down list made available above the corresponding summary/trend graph. For instance, note that the trend graph for CPU utilization plots the CPU usage trends of Processor_1 only. If you want to view the CPU usage trend graph for Processor_0 instead, pick the Processor_0 option from the drop-down list adjacent to the graph title, CPU utilization (%).

  31. At any point in time, you can switch to the measure graphs by clicking on the button.
  32. Typically, the History tab page displays measure, summary, and trend graphs for a default set of measures.
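
The color-coded threshold levels that the dial charts in this section display can be illustrated with a short sketch. The Python below is not eG Enterprise code. It assumes a measure that has maximum thresholds only, treats a value that reaches a threshold as a violation, and uses hypothetical names (`classify`, `SEVERITY_COLORS`, `SAMPLE_THRESHOLDS`) and made-up threshold values purely for illustration.

```python
from typing import Dict

# Color codes named in this section: Red for Critical, Orange for Major, Pink for Minor.
SEVERITY_COLORS: Dict[str, str] = {
    "Critical": "Red",
    "Major": "Orange",
    "Minor": "Pink",
}

# Hypothetical maximum thresholds for a percentage measure (illustrative values only).
SAMPLE_THRESHOLDS: Dict[str, float] = {
    "Critical": 95.0,
    "Major": 85.0,
    "Minor": 75.0,
}


def classify(value: float, max_thresholds: Dict[str, float]) -> str:
    """Return the most severe level whose maximum threshold the value reaches.

    Assumption: a value equal to or above a threshold counts as a violation.
    Levels missing from max_thresholds are skipped.
    """
    for severity in ("Critical", "Major", "Minor"):
        limit = max_thresholds.get(severity)
        if limit is not None and value >= limit:
            return severity
    return "Normal"


if __name__ == "__main__":
    for reading in (50.0, 78.0, 88.0, 97.0):
        state = classify(reading, SAMPLE_THRESHOLDS)
        color = SEVERITY_COLORS.get(state, "no violation color")
        print(f"{reading:6.2f} -> {state} ({color})")
```

With these sample thresholds, a reading of 88 would be classified as Major and drawn in Orange. A measure that has minimum thresholds only would need the comparison reversed.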

CPU

    You can also use the System dashboard to instantly identify CPU bottlenecks and the processes responsible for the same. For this, select the CPU option from the Subsystem list.

    1. The first section of the dashboard helps determine whether or not the host is currently facing any CPU-related issues; the dial charts and digital number displays provided by this section enable administrators to figure out whether or not all the key CPU health indicators are currently operating well within their threshold limits. The dial charts do not indicate the threshold settings for measures by default. However, you can click on the button at the top of the dashboard, navigate to the Dashboard Settings window, and set the Show Threshold in Dial Chart flag to Yes, so that the maximum/minimum thresholds (whichever is available) for a measure are reflected in the dial charts. With the 'actual' performance plotted alongside the 'expected' performance levels, you can instantly identify non-conformances, and immediately initiate relevant corrective measures.
    2. The dashboard also imparts to you the flexibility to pick and choose the metrics that need to be represented via dial charts and digital displays. For this purpose, do the following:

      • Click on the icon at the top of the dashboard. In the Dashboard Settings window that appears, select System from the Module list to indicate that measures need to be configured for the System Dashboard.
      • Since the new measure(s) is to be displayed in the dashboard of the CPU sub-system, select the CPU option from the Sub-System list.
      • To add measures for the dial graph, pick the Dial Graph option from the Add/Delete Measures for list. Upon selection of the Dial Graph option, the pre-configured measures for the dial graph will appear in the Existing Value(s) list. Similarly, to add a measure to the digital display, pick the Digital Graph option from the Add/Delete Measures for list. In this case, the Existing Value(s) list box will display all those measures for which digital displays pre-exist.
      • Next, select the Test that reports the said measure, pick the measure of interest from the Measures list, provide a Display name for the measure, and click the Add button to add the chosen measure to the Existing Value(s) list. Note that while configuring measures for a dial graph the 'Measures' list will display only those measures that report percentage values.
      • If you want to delete one/more measures from the dial/digital graphs, then, as soon as you choose the Dial Graph or Digital Graph option from the Add/Delete Measures for list, pick any of the displayed measures from the Existing Value(s) list, and click the Delete button.
      • Finally, click the Update button to register the changes.

      Note:

      Only users with Admin or Supermonitor privileges can enable/disable the system, network, and application dashboards, or can customize the contents of such dashboards using the Dashboard Settings window. Therefore, whenever a user without Admin or Supermonitor privileges logs into the monitoring console, the button will not appear.

    3. Let us now return to the CPU dashboard. If you click on a dial chart or a digital graph in the CPU dashboard, the Dashboard Settings window will appear, displaying the exact measure that is represented by the dial/digital chart, and the layer and test to which that measure is mapped.
    4. Below the dial and digital charts, you will find the Comparison tab page, which provides a default set of bar charts comparing the current CPU usage of the processors supported by the host and the processes executing on the host. Using these default bar graphs, you can accurately identify those processors that are excessively utilizing the CPU resources of the host, and those processes that are responsible for this CPU erosion.
    5. If required, you can override this default setting to include more bar charts in the Comparison tab page or exclude one/more existing ones. For this purpose, do the following:

      • Click the button at the top of the dashboard.
      • The Dashboard Settings window then appears. From the Module list, pick System, choose CPU as the Sub-System, and then select Comparison Graph from the Add/Delete Measures for list.
      • Then, pick the Test that reports the measures to be compared.
      • Next, select the Measure.
      • Provide a Display name for the measure, and click the Add button. The chosen measure will then be added to the Existing Value(s) list. Similarly, multiple measures can be added, so that multiple graphs appear. If need be, you can even delete one/more measures from the Existing Value(s) list; for this, select a measure from this list and click the Delete button. This will ensure that the corresponding bar graph is removed from the Comparison tab page.
      • Finally, click the Update button.

    6. Clicking on any of the bar charts in the Comparison tab page will expand the chart; for instance, to clearly view the top CPU consuming processes on the host, you can enlarge the Top CPU Consuming Processes bar chart as depicted.
    7. By default, the enlarged bar graph displays the TOP-10 processes currently executing on the host and the percentage CPU utilized by each. If need be, you can customize the enlarged bar graph, so that it displays only a few processes - say, only the top-5 processes in terms of CPU usage. For this, simply select the TOP-5 option from the Show list.
    8. Moreover, besides the current CPU usage, you can also compare the historical CPU usage of processes by clicking on the Compare History link. Doing so will allow you to alter the Timeline of the bar graph, and enable you to zero-in on that process that was devouring the CPU resources on the host during a particular time period in the past.
    9. You can also click on the icon to invoke the DETAILED DIAGNOSIS page. Since this page provides the PROCESS ID of the top CPU consumers on the host, it enables you to easily locate the CPU-intensive processes and initiate remedial measures.
    10. In addition to these bar charts, this tab page also provides you with a CPU Usage Summary by Top Processes table. While the charts focus on current CPU usage by default, the table reveals the top CPU consumers on the host during the last 1 hour (by default). Besides, the table also indicates how high and how low the CPU usage of each process has scaled during the same hour. This way, you can understand whether the high CPU usage of a process was just a sudden spike or a consistent phenomenon.
    11. You can add more such comparison tables to the Comparison tab page, if required. To do this, follow the steps given below:

      • Click the button at the top of the dashboard.
      • The Dashboard Settings window then appears. From the Module list, pick System, choose CPU as the Sub-System, and then select Comparison Table from the Add/Delete Measures for list.
      • Then, pick the Test that reports the measures to be compared.
      • Next, select the Measure.
      • Provide a Display name for the measure, and click the Add button. The chosen measure will then be added to the Existing Value(s) list. Similarly, multiple measures can be added to a table. However, note that all measures added should be reported by a single test only.
      • Then, specify a title for the comparison table in the Table Title text box.
      • In the Comparison By text box, provide a common name for the descriptors across which the chosen measure values are to be compared. For instance, if you are adding a table that compares the system CPU usage across processors, then the Comparison By text box can be set to Processors.
      • Finally, click the Update button.

    12. Besides the above, the tab page also embeds a CPU Usage History by Top Processes chart. This chart graphically compares the CPU usage of the processes that were executing on the host during the last 1 hour (by default), and enables the quick and easy identification of the process that is the top CPU consumer. Clicking on this chart enlarges it, enabling you to perform the comparative analysis more effectively.
    13. You can alter the Timeline of the comparison, change its dimension, and even invoke the detailed diagnosis page to determine the Process ID of the leading CPU consumer.
    14. To shift your focus from current performance to historical performance, click on the History tab page.
    15. By default, the History tab page displays time-of-day measure graphs revealing how the host has been using its CPU resources over the last 24 hours. Using these graphs, you can effortlessly figure out when exactly a CPU contention (if any) crept into the host. If required, you can click on the Timeline link to go further back in time and analyze the CPU usage.
    16. You can even override the default timeline of 24 hours by following the steps given below:

      • Click the button at the top of the dashboard.
      • The Dashboard Settings window then appears.
      • Then, pick the History Graph option from the Default timeline for list, and then set a Timeline for the history graphs.
      • Finally, click the Update button.

    17. Back in the dashboard, you can enlarge a graph in the History tab page by clicking on that graph.
    18. Here again, you can modify the graph Timeline. In case of graphs that plot values for multiple descriptors, you can selectively compare the CPU usage of a few descriptors alone by selecting a TOP-N or LAST-N option from the Show list.
    19. By clicking on the icon in the History tab page you can convert the measure graphs into summary graphs; these summary graphs, by default, reveal the percentage of time in the last 24 hours during which the CPU usage of the host has remained optimal. Besides indicating whether the overall CPU usage of the host has been within the acceptable limits, it also indicates how often during a day these usage levels have been compromised. You can click on a summary graph to enlarge it; furthermore, you can change the Timeline of the graph in the enlarged mode.
    20. The default timeline of 24 hours can be overridden using the Dashboard Settings window that appears when the button at the top of the dashboard is clicked. From the Default Timeline for list, pick the Summary Graph option. Then, select a timeline for the summary graph from the Timeline list. Finally, click the Update button.

    21. You can even expand the summary graph by clicking on it, and then alter its Timeline.
    22. Similarly, you can click on the icon in the History tab page to view trend graphs revealing the maximum and minimum CPU used by the host during the last 24 hours (by default). This enables you to better analyze the past trends in CPU usage, and foresee the future trends.
    23. The default timeline of 24 hours can be overridden using the Dashboard Settings window that appears when the button at the top of the dashboard is clicked. From the Default Timeline for list, pick the Trend Graph option. Then, select a timeline for the trend graph from the Timeline list. Finally, click the Update button.

    24. You can even expand the trend graph by clicking on it, and then alter its Timeline. By default, the trend graph plots the minimum and maximum values of a measure during the given timeline. In the enlarged mode, you can change the Graph type so that the average values or sum of trend values are plotted in the trend graphs instead.
    25. Note:

      In case of descriptor-based tests, the Summary and Trend graphs displayed in the History tab page typically plot the values for a single descriptor alone. To view the graph for another descriptor, pick a descriptor from the drop-down list made available above the corresponding summary/trend graph.

    26. At any point in time, you can switch to the measure graphs by clicking on the button.
    27. Typically, the History tab page displays measure, summary, and trend graphs for a default set of measures. If you want to add graphs for more measures to this tab page or remove one/more measures for which graphs pre-exist in this tab page, then, do the following:
      • Click the button at the top of the dashboard.
      • The Dashboard Settings window then appears. From the Module list, pick System, choose CPU as the Sub-System, and then, select History Graph from the Add/Delete Measures for list.
      • The measures for which graphs pre-exist in the History tab page will be automatically displayed in the Existing Value(s) list. To delete a measure, and in effect, its corresponding graph as well, select the measure from the Existing Value(s) list, click the Delete button, and then click the Update button.
      • To add a new graph, first, pick the Test that reports the measure for which a graph is to be generated.
      • Next, select the Measure of interest.
      • Provide a Display name for the measure. Then, click the Add button to add the measure to the Existing Value(s) list. Finally, click the Update button.
      • This will add a new measure, summary, and trend graph for the chosen measure to the History tab page.

      Note:

      Only users with Admin or Supermonitor privileges can enable/disable the system, network, and application dashboards, or can customize the contents of such dashboards using the Dashboard Settings window. Therefore, whenever a user without Admin or Supermonitor privileges logs into the monitoring console, the button will not appear.
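
The TOP-N and LAST-N options of the Show list, referred to throughout this section, amount to ranking descriptors by a measure value and keeping one end of the ranking. The following Python is a minimal sketch of that idea, not eG Enterprise code. The function name `select_descriptors`, the sample process names, and the CPU figures are hypothetical, and the sketch assumes that LAST-N means the N descriptors with the lowest values.

```python
from typing import Dict, List, Tuple


def select_descriptors(values: Dict[str, float], option: str) -> List[Tuple[str, float]]:
    """Return the descriptors picked by an option such as 'TOP-3' or 'LAST-2'.

    values maps a descriptor (a process, processor, or disk partition)
    to its measure value. Results are listed in descending order of value.
    """
    kind, _, count = option.partition("-")
    if kind not in ("TOP", "LAST") or not count.isdigit() or int(count) == 0:
        raise ValueError(f"unsupported option: {option!r}")
    n = int(count)

    # Rank every descriptor from the highest value to the lowest.
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)

    # TOP-N keeps the head of the ranking; LAST-N (assumed) keeps the tail.
    return ranked[:n] if kind == "TOP" else ranked[-n:]


if __name__ == "__main__":
    # Hypothetical CPU usage (%) per process.
    cpu_by_process = {"java": 61.4, "sqlservr": 22.9, "httpd": 7.5, "sshd": 0.3}
    print(select_descriptors(cpu_by_process, "TOP-3"))
    print(select_descriptors(cpu_by_process, "LAST-2"))
```

When fewer than N descriptors exist, the slice simply returns all of them, which matches what one would expect of a graph that has nothing more to plot.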

    Disk

    To take a closer look at how a host uses the disk space available to it, and to promptly detect probable usage excesses, select the Disk option from the Subsystem list.

    This dashboard provides the following:

    1. A quick look at the pie charts displayed in the Disk Usage Summary section can provide you with a clear idea of how well each disk partition on the host is being used currently; the partition that is currently running out of space can be swiftly identified from this section.
    2. In addition to the above, the Comparison tab page provides a default collection of bar charts. These default charts not only enable you to compare space usage across disk partitions, but also help you evaluate the level of activity on each disk. This way, you can accurately isolate those disks that are currently experiencing a space crunch and also those that are very busy processing requests. If these bar charts reveal that a particular disk is currently experiencing high I/O activity, you can use the Top Processes by I/O activity bar chart to zero in on the process on the host that is responsible for generating this I/O.
    3. You can include more comparison graphs, or remove one/more graphs from this preset list using the Dashboard Settings window that appears when the button at the top of the dashboard is clicked.

      In this window, do the following:

      • Select the System option from the Module list, as the change is to be effected on the System Dashboard.
      • Pick Disk as the Sub-System.
      • To add a new comparison graph, select Comparison Graph from the Add/Delete Measures for list.
      • Upon choosing Comparison Graph, all the measures for which comparison graphs pre-exist in the Disk dashboard will be listed in the Existing Value(s) list box. You can remove one/more measures by selecting the measure from this list and clicking the Delete button alongside. This will remove the comparison graph that corresponds to that measure from the Comparison tab page.
      • To add a new graph, select the Test that reports the measure to be compared, pick the measure of interest from the Measures list, and provide a Display name for the measure.
      • Then, click the Add button.
      • Finally, click the Update button.

      Note:

      Only users with Admin or Supermonitor privileges can enable/disable the system, network, and application dashboards, or can customize the contents of such dashboards using the Dashboard Settings window. Therefore, whenever a user without Admin or Supermonitor privileges logs into the monitoring console, the button will not appear.

    4. To view a bar chart in the Comparison tab page more clearly, click on it; this enlarges the chart as depicted.
    5. By default, the comparison bar charts compare the current disk usage or I/O. If need be, you can alter the timeline of a bar chart, so as to compare the status of disks at some point of time in the past. This enables you to investigate past problems better. For instance, you might want to know which process had generated very high disk I/O activity some time during the last 1 hour; to determine this, just click on the Compare History link, and then set Timeline to 1 hour. The resulting bar chart will then compare the I/O activity of those processes that were executing on the host during the last 1 hour.
    6. If your expanded bar chart appears cluttered owing to a large number of disk partitions/processes, you can easily filter out the 'not-very-important' disk partitions / processes from the chart. Besides enhancing the readability of your bar charts, this enables you to focus on selected descriptors alone. For instance, in the bar chart, you can choose to view only the top-5 processes on the basis of the level of disk I/O activity. To achieve this, simply select the TOP-5 option from the Show list.
    7. Besides, to view more details about the I/O-intensive processes on the host, you can simply invoke the detailed diagnosis page from the enlarged Top Processes by I/O activity bar chart. To do this, just click on the icon; this will bring up the list of I/O-intensive processes that were executing on the host during the Timeline specified in the enlarged graph, the Process ID of each process, and other metrics related to disk I/O.
    8. After analyzing current disk performance thoroughly, if you want to engage in an in-depth analysis of the historical disk usage metrics, switch to the History tab page by clicking on it.
    9. By default, the History tab page provides measure graphs revealing the time-of-day variations in disk usage and disk I/O during the last 24 hours. By carefully studying these measure graphs, you can accurately identify which disk experienced excessive usage / high I/O, and at what time during this period.
    10. You can even change the Timeline of these graphs, so that you can analyze disk performance over a broader time window; to achieve this, just click on the Timeline link, and pick a different date from the calendar that will pop out.
    11. If required, you can even override the default timeline (i.e., 24 hours) of the graph by following the steps given below:

      • Click the button at the top of the dashboard to invoke the Dashboard Settings window.
      • Select the History Graph option from the Default timeline for list.
      • Set a different default timeline by selecting an option from the Timeline list.
      • Finally, click the Update button.

    12. Like the bar charts, these measure graphs also enlarge when clicked.
    13. Here again, the graph Timeline can be altered. Moreover, you can reduce the number of disk partitions for which usage values are plotted in a single graph, by picking a top-n or last-n option from the Show list.
    14. To assess the overall health of the disk partitions and to perform efficient service level audits on disk usage, you can convert the measure graphs into summary graphs, on-the-fly. For this purpose, click on the icon at the right, top corner. Summary graphs for a pre-configured list of measures will then appear.
    15. By default, these summary graphs are plotted for a timeline of 24 hours. To override this default setting, do the following:

      • Click the button at the top of the dashboard to invoke the Dashboard Settings window.
      • Select the Summary Graph option from the Default timeline for list.
      • Set a different default timeline by selecting an option from the Timeline list.
      • Finally, click the Update button.

    16. You can even expand the summary graph by clicking on it, and then alter its Timeline.
    17. Similarly, to analyze past trends in disk usage and accordingly plan the future disk capacity of the host, you can view the historical trend graphs in the History tab page, instead of the measure/summary graphs. For this, click on the icon.
    18. By default, these trend graphs are plotted for a timeline of 24 hours. To override this default setting, do the following:
      • Click the button at the top of the dashboard to invoke the Dashboard Settings window.
      • Select the Trend Graph option from the Default timeline for list.
      • Set a different default timeline by selecting an option from the Timeline list.
      • Finally, click the Update button.

    19. You can even expand the trend graph by clicking on it, and then alter its Timeline. By default, the trend graph plots the minimum and maximum values of a measure during the given timeline. In the enlarged mode, you can change the Graph type so that the average values or sum of trend values are plotted in the trend graphs instead.
    20. Note:

      In case of descriptor-based tests, the Summary and Trend graphs displayed in the History tab page typically plot the values for a single descriptor alone. To view the graph for another descriptor, pick a descriptor from the drop-down list made available above the corresponding summary/trend graph.

    21. At any point in time, you can switch to the measure graphs by clicking on the button.
    22. Typically, the History tab page displays measure, summary, and trend graphs for a default set of measures. If you want to add graphs for more measures to this tab page or remove one/more measures for which graphs pre-exist in this tab page, then, do the following:
      • Click the button at the top of the dashboard.
      • The Dashboard Settings window then appears. From the Module list, pick System, choose Disk as the Sub-System, and then, select History Graph from the Add/Delete Measures for list.
      • The measures for which graphs pre-exist in the History tab page will be automatically displayed in the Existing Value(s) list. To delete a measure, and in effect, its corresponding graph as well, select the measure from the Existing Value(s) list, click the Delete button, and then click the Update button.
      • To add a new graph, first, pick the Test that reports the measure for which a graph is to be generated.
      • Next, select the Measure of interest.
      • Provide a Display name for the measure. Then, click the Add button to add the measure to the Existing Value(s) list. Finally, click the Update button.
      • This will add a new measure, summary, and trend graph for the chosen measure to the History tab page.

      Note:

      Only users with Admin or Supermonitor privileges can enable/disable the system, network, and application dashboards, or can customize the contents of such dashboards using the Dashboard Settings window. Therefore, whenever a user without Admin or Supermonitor privileges logs into the monitoring console, the button will not appear.
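
    The effect of the Show list on an enlarged bar chart or measure graph can be pictured with a small sketch: rank the descriptors by value, then keep only the first or last n. This is an illustration only, not eG Enterprise code. The process names, the I/O figures, the show function, and the 'LAST-2' label format are assumptions; only the TOP-5 option is named in the text above.

```python
# Hypothetical per-process disk I/O readings (process name -> KB/sec).
# Names and numbers are invented for illustration.
IO_BY_PROCESS = {
    "backup": 5200.0,
    "db_writer": 9100.0,
    "logrotate": 40.0,
    "indexer": 3100.0,
    "sshd": 2.0,
    "mail": 310.0,
}


def show(values, option):
    """Apply a Show-list style filter such as 'TOP-5' or 'LAST-2'."""
    kind, _, count = option.partition("-")
    n = int(count)
    if n <= 0:
        raise ValueError("the count must be a positive number")
    # Rank descriptors from the highest value to the lowest.
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    if kind.upper() == "TOP":
        return ranked[:n]
    if kind.upper() == "LAST":
        return ranked[-n:]
    raise ValueError(f"unsupported option: {option}")


top5 = show(IO_BY_PROCESS, "TOP-5")
last2 = show(IO_BY_PROCESS, "LAST-2")
```

    With the sample data, top5 drops only the least active process (sshd), which is the kind of decluttering the TOP-5 option is meant to provide.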

    Memory

    Apart from CPU and disk usage, you can pick the Memory option from the Subsystem list to determine how the system has been using its memory resources.

    1. Since the dial charts and digital number displays report the current state of a default set of memory usage-related metrics, continuous review of these metrics can provide you with a heads-up on any usage irregularities that may have surfaced recently.
    2. You can also include dial/digital graphs for additional measures in this dashboard using the procedure discussed below:

      • Click on the icon at the top of the dashboard. In the Dashboard Settings window that appears, select System from the Module list.
      • Then, pick Memory from the Sub-System list.
      • To add measures for the dial graph, pick the Dial Graph option from the Add/Delete Measures for list. Upon selection of the Dial Graph option, the pre-configured measures for the dial graph will appear in the Existing Value(s) list. Similarly, to add a measure for digital display, pick the Digital Graph option from the Add/Delete Measures for list. In this case, the Existing Value(s) list box will display all those measures for which digital displays pre-exist.
      • Next, select the Test that reports the said measure, pick the measure of interest from the Measures list, provide a Display name for the measure, and click the Add button to add the chosen measure to the Existing Value(s) list. Note that while configuring measures for a dial graph the 'Measures' list will display only those measures that report percentage values.
      • If you want to delete one/more dial/digital graphs, then, as soon as you choose the Dial Graph or Digital Graph option from the Add/Delete Measures for list, pick any of the displayed measures from the Existing Value(s) list, and click the Delete button.
      • Finally, click the Update button to register the changes.

      Note:

      Only users with Admin or Supermonitor privileges can enable/disable the system, network, and application dashboards, or can customize the contents of such dashboards using the Dashboard Settings window. Therefore, whenever a user without Admin or Supermonitor privileges logs into the monitoring console, the button will not appear.

    3. If the dial/digital graphs in the Memory dashboard indicate any abnormal memory usage, you can use the details displayed in the Comparison tab page to determine its root-cause. For instance, if you notice a sudden/gradual decrease in the value that appears in the Free memory digital display, it is a clear indicator that the host in question is consuming excessive memory; this could be owing to one/more memory-intensive processes executing on the host. To know which process is the leading memory consumer, you can either use the Top Memory Consuming Processes table in the Comparison tab page or the Memory usage by top processes graph. The table primarily reveals how much memory has been utilized, on an average, by each process on the host during the last 1 hour (by default); in addition, it also displays the minimum and maximum percentage of memory utilized by every process in the same 1 hour period. Besides enabling you to accurately identify the most memory-intensive process on the host, this table also helps you isolate those processes which have displayed erratic memory usage trends in the default 1 hour duration. The graph, on the other hand, aids you in visually comparing the memory usage of the processes on the host during the last hour (by default), and instantly nailing the process that is consuming the maximum memory resources.
    4. You can override the default 1 hour timeline of the comparison graph, by following the steps below:

      • Click on the button at the top of the dashboard.
      • Select Comparison Graph from the Default timeline for list.
      • Specify a Timeline for the comparison graph.
      • Click the Update button.

      To perform more in-depth analysis of memory usage, you can, if required, customize the Comparison tab page to display more comparison graphs and tables. To add a new comparison graph for instance, do the following:

      • Click on the button at the top of the dashboard.
      • Select the System option from the Module list, and pick Memory as the Sub-System.
      • To add a new comparison graph, select Comparison Graph from the Add/Delete Measures for list.
      • To add a new graph, select the Test that reports the measure to be compared, pick the measure of interest from the Measures list, and provide a Display name for the measure.
      • Then, click the Add button.
      • Finally, click the Update button.

    5. If too many processes are executing on the host, the comparison graph in the Comparison tab page is bound to appear cluttered. You can, therefore, click on the graph to expand it and view the plotted values clearly.
    6. You can, if need be, alter the Timeline of the enlarged graph. You can even click on the icon, to view the detailed diagnosis page, using which you can detect which process on the host was consuming excessive memory during the Timeline specified.
    7. While a comparative analysis of current/1-hour performance can reveal a memory contention that occurred recently, deeper historical analysis is necessary for deducing usage trends and problem patterns. To perform this historical analysis, use the History tab page. This tab page displays a number of graphs, plotted for a default period of 24 hours, that reveal the time-of-day variations in the memory usage of the host.
    8. You can override the default timeline of these measure graphs by following the steps given below:

      • Click on the button at the top of the dashboard.
      • Select History Graph from the Default timeline for list.
      • Specify a Timeline for the history graphs.
      • Click the Update button.

    9. Instead of changing the default timeline, you can change the timeline of the measure graphs on-the-fly too by clicking on the Timeline link, and can even enlarge a graph by simply clicking on it. The enlarged graph will appear.
    10. Here again, you can modify the Timeline of the graph.
    11. If required, you can configure the History tab page to display Summary graphs instead of the measure graphs. To achieve this, click on the icon at the right, top corner of the History tab page. Summary graphs for the critical memory usage measures will then appear. From these graphs, you can easily determine the percentage of time (during the last day by default) the system has experienced memory-related issues. Besides indicating how memory-efficient the system was during the default period, these graphs also enable you to gauge the efficiency of the administrative staff in resolving issues that might have surfaced.
    12. You can override the default timeline of 24 hours by following the steps given below:

      • Click the button at the top of the dashboard to invoke the Dashboard Settings window.
      • Select the Summary Graph option from the Default timeline for list.
      • Set a different default timeline by selecting an option from the Timeline list.
      • Finally, click the Update button.

    13. You can even expand the summary graph by clicking on it, and then alter its Timeline.
    14. Similarly, with a mere click of a button, you can have the History tab page display trend graphs instead of summary/measure graphs. For this, just click on the icon at the right, top corner.
    15. The trend graphs reveal the memory usage trends during the last 24 hours (by default). To override this default setting, do the following:
      • Click the button at the top of the dashboard to invoke the Dashboard Settings window.
      • Select the Trend Graph option from the Default timeline for list.
      • Set a different default timeline by selecting an option from the Timeline list.
      • Finally, click the Update button.

    16. You can even expand the trend graph by clicking on it, and then alter its Timeline. By default, the trend graph plots the minimum and maximum values of a measure during the given timeline. In the enlarged mode, you can change the Graph type so that the average values or sum of trend values are plotted in the trend graphs instead.
    17. Note:

      In case of descriptor-based tests, the Summary and Trend graphs displayed in the History tab page typically plot the values for a single descriptor alone. To view the graph for another descriptor, pick a descriptor from the drop-down list made available above the corresponding summary/trend graph.

    18. At any point in time, you can switch to the measure graphs by clicking on the button.
    19. Typically, the History tab page displays measure, summary, and trend graphs for a default set of measures. If you want to add graphs for more measures to this tab page or remove one/more measures for which graphs pre-exist in this tab page, then, do the following:
      • Click the button at the top of the dashboard.
      • The Dashboard Settings window then appears. From the Module list, pick System, choose Memory as the Sub-System, and then, select History Graph from the Add/Delete Measures for list.
      • The measures for which graphs pre-exist in the History tab page will be automatically displayed in the Existing Value(s) list. To delete a measure, and in effect, its corresponding graph as well, select the measure from the Existing Value(s) list, click the Delete button, and then click the Update button.
      • To add a new graph, first, pick the Test that reports the measure for which a graph is to be generated.
      • Next, select the Measure of interest.
      • Provide a Display name for the measure. Then, click the Add button to add the measure to the Existing Value(s) list. Finally, click the Update button.
      • This will add a new measure, summary, and trend graph for the chosen measure to the History tab page.

    Note:

    Only users with Admin or Supermonitor privileges can enable/disable the system, network, and application dashboards, or can customize the contents of such dashboards using the Dashboard Settings window. Therefore, whenever a user without Admin or Supermonitor privileges logs into the monitoring console, the button will not appear.
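
    To make the contents of the Top Memory Consuming Processes table more concrete, the sketch below derives the average, minimum, and maximum memory usage of each process from a set of samples and ranks the processes by average usage. It is an illustration under stated assumptions, not eG Enterprise code: the process names, the sample values, and the memory_table and most_erratic functions are hypothetical, and treating the gap between maximum and minimum as a sign of erratic usage is a simplification chosen for this example.

```python
from statistics import mean

# Hypothetical memory-usage samples (percent) taken over one hour,
# keyed by process name. All names and values are invented.
SAMPLES = {
    "java": [41.0, 43.0, 42.0],
    "postgres": [18.0, 35.0, 12.0],
    "nginx": [2.0, 2.0, 2.0],
}


def memory_table(samples):
    """Return rows of (process, avg, min, max), highest average first."""
    rows = [
        (name, mean(values), min(values), max(values))
        for name, values in samples.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def most_erratic(rows):
    """Pick the process with the widest gap between its max and min."""
    return max(rows, key=lambda row: row[3] - row[2])[0]


table = memory_table(SAMPLES)
```

    In this sample, the first row of table identifies java as the leading memory consumer, while most_erratic(table) singles out postgres, whose usage swings the most within the hour.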

    Uptime

    To view the uptime details of the system, you can select the Uptime option from the Subsystem list.

    This dashboard reveals the following:

    1. The first section of the dashboard reveals the total time for which the system has been up and running since it was last rebooted. The Uptime/Downtime Summary section provides a quick summary of the availability of the system during the last 24 hours (by default). The details include: the total duration for which the system has been up and running in the last 24 hours, the percentage uptime, the total duration (in the last 24 hours) for which the system was down, the percentage of downtime, and the number of reboots during the last 24 hours. Using these details, you can determine whether the agreed uptime levels for the system were met or not, and if not, by how much the system is falling short of its desired performance levels.
    2. You can also infer whether the system experienced any reboots during the last 24 hours (by default). To know more about each reboot, refer to the Reboot Summary section. For every reboot that occurred in the last 24 hours (by default), this section reveals when the system was shut down, when the reboot occurred, and how long the system remained down until it was rebooted. This clearly indicates the frequency of the reboots, and helps determine whether such reboots were scheduled or unexpected.
    3. You can override the default timeline (of 24 hours) of the Uptime/Downtime Summary and the Reboot Summary, by following the steps indicated below:

      • Click the button at the top of the dashboard to invoke the Dashboard Settings window.
      • Select the Uptime/Downtime Summary option from the Default timeline for list.
      • Set a different default timeline by selecting an option from the Timeline list.
      • Finally, click the Update button.

      Note:

      Only users with Admin or Supermonitor privileges can enable/disable the system, network, and application dashboards, or can customize the contents of such dashboards using the Dashboard Settings window. Therefore, whenever a user without Admin or Supermonitor privileges logs into the monitoring console, the button will not appear.

    4. This will be followed by a default set of graphs indicating how long during the last 24 hours (by default) the system has been up, and whether the reboots scheduled for the system occurred during this period or not. You can, if required, change the Timeline for the measure graphs and the data displayed in the Uptime/Downtime Summary and Reboot Summary sections, so that the uptime statistics of the host can be analyzed over a broader time period. To do this, just click on the Timeline link, and pick a different timeline from the calendar that pops out.
    5. You can even override the default timeline (of 24 hours) of the graphs, by following the steps given below:

      • Click the button at the top of the dashboard to invoke the Dashboard Settings window.
      • Select the History Graph option from the Default timeline for list.
      • Set a different default timeline by selecting an option from the Timeline list.
      • Finally, click the Update button.

    6. To view the measure graphs clearly, click on a graph of interest to you to enlarge it; you can alter the Timeline in the enlarged graph too. Using these graphs, breaks in the availability of the system and failure of reboot schedules can be accurately identified and investigated.
    7. Click on the icon at the right, top corner of the History section to view summary graphs, using which you can effectively perform service level audits on a host, based on the duration of its availability. Determine the percentage of time for which the host was operational during the last day (by default), and also be notified of reboots that might have occurred on the host during the default timeline. If required, you can click on the Timeline link to alter the graph timeline.
    8. The default timeline of 1 day (24 hours) for the summary graphs can be overridden using the steps detailed below:

      • Click the button at the top of the dashboard to invoke the Dashboard Settings window.
      • Select the Summary Graph option from the Default timeline for list.
      • Set a different default timeline by selecting an option from the Timeline list.
      • Finally, click the Update button.

    9. You can even expand the summary graph by clicking on it, and then alter its Timeline.
    10. Clicking on the icon in the History section will display trend graphs on system uptime; these trends reveal when during the last 24 hours (by default) uptime was the lowest, and when reboots failed. If required, you can click on the graph to expand it and alter its Timeline.
    11. The default timeline of 1 day (24 hours) for the trend graphs can be overridden using the steps detailed below:

      • Click the button at the top of the dashboard to invoke the Dashboard Settings window.
      • Select the Trend Graph option from the Default timeline for list.
      • Set a different default timeline by selecting an option from the Timeline list.
      • Finally, click the Update button.

    12. You can even expand the trend graph by clicking on it, and then alter its Timeline. By default, the trend graph plots the minimum and maximum values of a measure during the given timeline. In the enlarged mode, you can change the Graph type so that the average values or sum of trend values are plotted in the trend graphs instead.
    13. Note:

      In case of descriptor-based tests, the Summary and Trend graphs displayed in the History tab page typically plot the values for a single descriptor alone. To view the graph for another descriptor, pick a descriptor from the drop-down list made available above the corresponding summary/trend graph.

    14. At any point in time, you can switch to the measure graphs by clicking on the button.
    15. Typically, the History tab page displays measure, summary, and trend graphs for a default set of measures. If you want to add graphs for more measures to this tab page or remove one/more measures for which graphs pre-exist in this tab page, then, do the following:
      • Click the button at the top of the dashboard.
      • The Dashboard Settings window then appears. From the Module list, pick System, choose Uptime as the Sub-System, and then, select History Graph from the Add/Delete Measures for list.
      • The measures for which graphs pre-exist in the History tab page will be automatically displayed in the Existing Value(s) list. To delete a measure, and in effect, its corresponding graph as well, select the measure from the Existing Value(s) list, click the Delete button, and then click the Update button.
      • To add a new graph, first, pick the Test that reports the measure for which a graph is to be generated.
      • Next, select the Measure of interest.
      • Provide a Display name for the measure. Then, click the Add button to add the measure to the Existing Value(s) list. Finally, click the Update button.
      • This will add a new measure, summary, and trend graph for the chosen measure to the History tab page.

    Note:

    Only users with Admin or Supermonitor privileges can enable/disable the system, network, and application dashboards, or can customize the contents of such dashboards using the Dashboard Settings window. Therefore, whenever a user without Admin or Supermonitor privileges logs into the monitoring console, the button will not appear.
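
    The figures shown in the Uptime/Downtime Summary section (total uptime, total downtime, their percentages, and the number of reboots) follow from simple arithmetic over the reboot records of the chosen timeline. The sketch below works through that arithmetic; it is not eG Enterprise code. The reboot timestamps, the uptime_summary function, and the assumption that every outage falls entirely inside the window without overlapping another are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical reboot records inside a 24-hour window: each pair is
# (time the system went down, time it came back up). Invented values.
WINDOW_START = datetime(2024, 1, 1, 0, 0)
WINDOW_END = WINDOW_START + timedelta(hours=24)
REBOOTS = [
    (datetime(2024, 1, 1, 3, 0), datetime(2024, 1, 1, 3, 30)),
    (datetime(2024, 1, 1, 18, 0), datetime(2024, 1, 1, 18, 6)),
]


def uptime_summary(start, end, reboots):
    """Summarize uptime and downtime for the window [start, end]."""
    window = end - start
    # Total downtime is the sum of each outage's duration.
    downtime = sum((up - down for down, up in reboots), timedelta())
    uptime = window - downtime
    return {
        "uptime": uptime,
        "downtime": downtime,
        "uptime_pct": 100.0 * (uptime / window),
        "downtime_pct": 100.0 * (downtime / window),
        "reboots": len(reboots),
    }


summary = uptime_summary(WINDOW_START, WINDOW_END, REBOOTS)
```

    For the sample records, the two outages add up to 36 minutes of downtime in 24 hours, so the system was up 97.5% of the time; comparing such a percentage against the agreed uptime level shows by how much a target was missed.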