eG Monitoring
 

Monitoring Zones

Large infrastructures spanning geographies can pose a considerable monitoring challenge owing to the number of components involved and their wide distribution. Administrators of such infrastructures might therefore prefer to monitor the infrastructure by viewing it as smaller, more manageable business units. In eG parlance, these business units are termed ZONES. A zone can typically comprise individual components, segments, services, and/or other zones that require monitoring. For example, in the case of an infrastructure that is spread across the UK, USA, and Singapore, a zone named USA can be created consisting of all the components, segments, and services that are operating in the US branch alone. The USA zone can further contain an East-coast zone and a West-coast zone to represent infrastructure and services being supported on the two coasts of the US.

While a service/segment contains a group of inter-related components with inter-dependencies between them, a zone contains a group of components, services, segments, or zones that may or may not have inter-dependencies. Click here to know how to configure the zones.

Any number of zones can be configured using eG Enterprise. To quickly determine the state of the configured zones, log in to the eG monitor interface and click on the icon available in the Monitor tab. Then, select the Zones option from the Groups tile.

The Zones page appears, listing the zones that have been assigned to you, along with their states; if you log in as supermonitor, then all the fully-configured zones will be displayed on this page.

Note:

If you have not configured locations (using the eG map interface) for all the managed zones, then clicking on the Zones option in the Monitor Menu itself will invoke the zone list.

For each configured zone, the Zones page displays the state of elements (segments/components/services/other zones) within that zone. This way, you can quickly identify those infrastructure elements that are responsible for problems with a zone.

Note:

By default, against each zone displayed in the Zones page, the top-10 Components included in that zone will be displayed. Typically, to identify the top-10 components, eG Enterprise automatically sorts all the components included in the zone on the basis of their current state, arranges the sorted list in the alphabetical order of the component names, and picks the first 10 components of this list.

If you want more components to be displayed against each zone, do the following:

  • Log in to the eG administrative interface.
  • Click on the icon available in the Admin tab. Then, select the Monitor option from the Settings tile.
  • Click the Other Settings sub-node under the General node in the tree-structure of the Monitor Settings page.
  • In the right panel, modify the default value 10 that is displayed in the Components count in segment/service/zone list text box.
  • Click the Update button to save the changes.
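
The default top-10 selection described in the note above can be sketched as follows. This is only an illustration of the sort-then-truncate idea, not eG Enterprise's actual implementation: the state names, their severity ranking, the `Component` type, and the `top_components` function are all assumptions made for the example. The `count` parameter stands in for the Components count in segment/service/zone list setting.

```python
from typing import NamedTuple

# Assumed severity ranking, most severe first. The state names are illustrative.
STATE_RANK = {"critical": 0, "major": 1, "minor": 2, "unknown": 3, "normal": 4}


class Component(NamedTuple):
    name: str
    state: str


def top_components(components, count=10):
    """Order components by state severity, then by name, and keep the first `count`."""
    ordered = sorted(components, key=lambda c: (STATE_RANK[c.state], c.name.lower()))
    return ordered[:count]


# Hypothetical zone contents.
zone_components = [
    Component("web02", "normal"),
    Component("db01", "critical"),
    Component("app01", "minor"),
    Component("web01", "normal"),
]

shown = top_components(zone_components, count=3)
print([c.name for c in shown])
```

Raising `count` in this sketch corresponds to increasing the value in the text box mentioned in the steps above.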

Clicking on the right-arrow button alongside the Zones page will reveal a tree-structure in the left panel. This tree-structure consists of a Zones node, which displays all the configured zones as its sub-nodes. Besides the names of the zones, these sub-nodes also indicate the current state of each zone.

If you expand a node representing a zone in the tree, you will find sub-nodes representing each type of infrastructure element that has been included in the zone. For instance, if one/more services and segments have been added to a zone, then expanding that zone's node will reveal a Services and a Segments sub-node. To know which segments and services have been added to the zone, expand the Segments and Services sub-nodes, respectively.

Likewise, if a zone consists of sub-zones and independent components, then, expanding such a zone's node will reveal sub-nodes named Zones and Components, respectively. Here again, you can expand the Zones and Components sub-nodes to figure out which other zones and components have been included in the zone. If the zone includes aggregate components, then the node representing that zone will host an Aggregates sub-node, which when expanded, will list all the aggregate components that are part of that zone and their current state.
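
The way a zone node expands into per-type sub-nodes can be pictured with a small sketch. It assumes a zone is described by a plain dictionary of element lists; the dictionary layout, the element names, the `zone_sub_nodes` function, and the order of the sub-nodes are hypothetical and chosen only for illustration.

```python
# Hypothetical in-memory description of one zone; all element names are made up.
usa_zone = {
    "zones": ["West-coast", "East-coast"],
    "segments": [],
    "services": ["online_banking"],
    "components": ["web01", "db01"],
    "aggregates": [],
}

# Assumed display order of the sub-nodes; the text above does not specify one.
NODE_ORDER = ["Zones", "Segments", "Services", "Components", "Aggregates"]


def zone_sub_nodes(zone):
    """Return {sub-node label: sorted element names}, skipping empty element types."""
    nodes = {}
    for label in NODE_ORDER:
        members = zone.get(label.lower(), [])
        if members:  # a sub-node appears only if the zone holds such elements
            nodes[label] = sorted(members)
    return nodes


tree = zone_sub_nodes(usa_zone)
print(tree)
```

In this example no Segments or Aggregates sub-node is produced, because the zone contains no elements of those types.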

Click on a particular segment in the tree to view the Topology of that segment, using which you can determine the root-cause of performance issues in the segment. In the same way, you can click on a particular service in the tree to view the Topology of that service and accurately diagnose the source of any slowdowns that may be experienced by that service.

Click on the Associations icon to view the state of all the fully configured segments in the environment, and the users to whom each of them is assigned.

When you click on a node representing a (parent) zone in the tree - i.e., if you click on any of the sub-nodes of the global Zones node in the tree - the right panel will change to display three tab pages, namely - the Systems, Components, and Details tab pages. The sections that follow will discuss each of these tab pages in detail.

The Systems Tab Page

By default, selecting a zone node from the tree opens the Systems tab page. This tab page serves as a single, central interface that allows administrators to ascertain, from just a glance, the current operating system-level health of every component that is part of the chosen zone. The values of key host-level parameters are captured in real-time from all the components that belong to the chosen zone and are displayed here, so that administrators are instantly alerted to issues related to the network connection/traffic, TCP connectivity, and resource usage of every component.

Clicking on a System in the right panel will lead you to the layer model page, where you can view the exact layer where a problem has occurred, the test that reported the problem, and the problematic measure.

By default, the contents of the Systems tab page are sorted based on the state of the zone components listed therein. If more than one component exists in the same state, then the components of that state will be sorted in alphabetical order. If need be, you can change the sort order. For example, if you wish to sort the components listed in the Systems tab page in the descending order of the values of their Disk Usage, just click on the Disk Usage label. Doing so tags the Disk Usage label with a down arrow icon - this icon indicates that the Systems tab page is currently sorted in the descending order of the total disk space used by each component. To change the sort order to ‘ascending’, just click again on the Disk Usage label or the down arrow icon. Similarly, you can sort the Systems tab page based on any column available in the table.
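
The click-to-sort behavior described above (first click sorts a column in descending order, a second click on the same column switches to ascending) can be modeled with a short sketch. The `SortableTable` class, the row layout, and the sample values are hypothetical; this is not how the eG user interface is implemented.

```python
class SortableTable:
    """Minimal model of a table whose column labels toggle the sort order."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.sort_column = None
        self.descending = True

    def click(self, column):
        """Simulate clicking a column label and return the component names in order."""
        if self.sort_column == column:
            # Clicking the same label again flips the direction.
            self.descending = not self.descending
        else:
            # A newly selected column starts in descending order.
            self.sort_column = column
            self.descending = True
        self.rows.sort(key=lambda row: row[column], reverse=self.descending)
        return [row["name"] for row in self.rows]


# Hypothetical Systems tab contents.
systems = SortableTable([
    {"name": "app01", "disk_usage": 42.0},
    {"name": "db01", "disk_usage": 91.5},
    {"name": "web01", "disk_usage": 67.3},
])

first_click = systems.click("disk_usage")   # descending
second_click = systems.click("disk_usage")  # ascending
print(first_click, second_click)
```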

You can, if required, override the default measure list in the Systems tab page by adding more critical measures to the list or by removing one/more existing ones from the list. For this, do the following:

  • Click on the icon provided near the Back button in this page. In the Settings window that appears, select Systems from the Tabs flag.
  • To add more metrics to the Systems tab page, first, select the Add option from the Add/Delete Measures flag.
  • Next, select the layer for which you wish to add the test from the Layer drop down list. Now, select the Test that reports the measure of your choice, pick the measure of your interest from the Measures list, provide a Display name for the measure, and click the Add button to add the chosen measure to the Systems tab page.
  • If the test chosen is a descriptor-based test, then select the Yes option from the Does the test have descriptors? section. By default, this option is set to No. A Function section then appears in the Settings pop up window. Select either Sum or Average in the Function section based on the measure. If the measure value is in MB/GB, select Sum; if the measure value is a percentage, select Average. The solution computes the average or the total sum of values across descriptors and displays it in the Systems tab page.
  • If you want to delete one/more measures from this section, then, as soon as you choose the Delete option from the Add/Delete Measures flag, the Test drop down list will be populated with all the existing tests for which measures are displayed. Pick a Test and choose a Measure of your interest to delete from the Systems tab page.

Note:

While displaying values for descriptor-based measures in the Systems tab page, the eG Enterprise system does not display the actual values per descriptor. Instead, the solution computes the average or the total sum of values across descriptors and displays it in the corresponding measure column. For instance, for values reported as percentages, the solution computes the average value across descriptors. On the other hand, if the value is reported as a GB or MB, then the total sum of all the descriptor values of the component will be displayed against the component.
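
The Sum and Average roll-ups described in this note amount to the following arithmetic. The function name `aggregate_descriptors` and the sample descriptor values are assumptions made for illustration; only the choice between Sum and Average comes from the text above.

```python
def aggregate_descriptors(values, function):
    """Collapse per-descriptor values into the single figure shown for a component.

    `function` mirrors the Function option: "Sum" for sizes (MB/GB),
    "Average" for percentages.
    """
    values = list(values)
    if not values:
        return None  # nothing reported for this component
    if function == "Sum":
        return sum(values)
    if function == "Average":
        return sum(values) / len(values)
    raise ValueError(f"unsupported function: {function!r}")


# Hypothetical per-drive values for one component.
used_space_mb = {"C": 2048, "D": 4096}
busy_percent = {"C": 40.0, "D": 60.0}

total_used = aggregate_descriptors(used_space_mb.values(), "Sum")
avg_busy = aggregate_descriptors(busy_percent.values(), "Average")
print(total_used, avg_busy)
```

Summing a percentage across descriptors would produce values above 100, and averaging a size would understate the total, which is why the two functions are paired with the two kinds of units.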

The Components Tab Page

The Components tab page provides insights into the performance of the applications that are part of the chosen zone. For each application that has been added to a zone, users can configure key application-level metrics that are to be captured in real-time and displayed in the Components tab page. This way, users can ensure that they receive a heads-up on common, yet critical operational issues encountered by an application that is part of a zone, without having to go to the layer model page of that application for this purpose.

Since metrics are configured per application, the application level metrics displayed in this tab page will differ based on the type of the component. A Type drop-down list in this page will be populated with all the component types associated with the chosen zone. You can pick the component type of interest to you from the Type drop-down list to view the user-configured application level metrics of all zone components of that type.

Note:

The Type drop down list will be sorted based on the current state of the zone components of each type.

By default, the components listed in the Components tab page will be sorted in the order of their state - starting from the critical to the normal. If more than one component exists in the same state for the chosen component type, then the components of that state will be sorted in alphabetical order. If need be, you can change the sort order based on the application level metrics that are displayed against each component. For example, if you wish to sort the Oracle Database server list in the Components tab page in the descending order of Table Space usage, just click on the Table Space usage label. Doing so tags the Table Space usage label with a down arrow icon - this icon indicates that the Components tab page is currently sorted in the descending order of the table space usage of each zone component of type Oracle Database. To change the sort order to ‘ascending’, just click again on the Table Space usage label or the down arrow icon. Similarly, you can sort the Components tab page based on any column available in the table.

Clicking on a component here will lead you to the layer model page of that component, where the problem layer, test, and measures are revealed.

On the other hand, if no metrics have been configured for the Type chosen, then a message to that effect will appear.

To modify the measure-list associated with a component type, do the following:

  • Click on the icon provided near the Back button. In the Settings window that appears, select Components from the Tabs flag.
  • To add more metrics to the Components tab page, first, select the Add option from the Add/Delete Measures flag. Then, pick the Component Type to which the addition applies.
  • Next, select the layer for which you wish to add the test from the Layer drop down list. Then, select the Test that reports the measure of your choice, pick the measure of interest from the Measures list, provide a Display name for the measure, and click the Add button to add the chosen measure to the Components tab page.
  • If the test chosen is a descriptor-based test, then select the Yes option from the Does the test have descriptors? section. By default, this option is set to No. A Function section then appears in the Settings pop up window. Select either Sum or Average in the Function section based on the measure. If the measure value is in MB/GB, select Sum; if the measure value is a percentage, select Average. The solution computes the average or the total sum of values across descriptors and displays it in the Components tab page.
  • If you want to delete one/more measures from this section, then, as soon as you choose the Delete option from the Add/Delete Measures flag, the Test drop down list will be populated with all the existing tests for which measures are displayed. Pick a Test and choose a Measure of your interest to delete from the Components tab page.

Note:

While displaying values for descriptor-based measures in the Components tab page, the eG Enterprise system does not display the actual values per descriptor. Instead, the solution computes the average or the total sum of values across descriptors and displays it in the corresponding measure column. For instance, for values reported as percentages, the solution computes the average value across descriptors. On the other hand, if the value is reported as a GB or MB, then the total sum of all the descriptor values of the component will be displayed against the component.

The Details Tab Page

The Details tab page, when clicked, provides a quick overview of the performance of a chosen zone.

Just like the Monitor Dashboard, the Details tab page comprises four panels, each of which sheds light on a critical performance aspect of the chosen zone. The Current Status panel displays the total number of measurements that eG Enterprise has collected from the zone elements, and also indicates the percentage of measurements that are in abnormal, unknown, and normal states. A count of currently unresolved issues at the zone-level is also available here. This panel thus provides an overview of the health of the zone. Clicking on any of the states here will take you to the Current Alarms window, where you can view all open alarms of the corresponding priority.
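
The figures in the Current Status panel are simple proportions of the measurement total. The sketch below shows the arithmetic only; the function name `current_status`, the rounding to one decimal place, and the sample data are assumptions rather than documented product behavior.

```python
from collections import Counter

# The three states named for the Current Status panel.
STATES = ("abnormal", "unknown", "normal")


def current_status(measurement_states):
    """Return the measurement total and the percentage of measurements per state."""
    total = len(measurement_states)
    counts = Counter(measurement_states)
    percentages = {
        state: round(100.0 * counts[state] / total, 1) if total else 0.0
        for state in STATES
    }
    return {"total": total, "percentages": percentages}


# Hypothetical zone with 20 measurements.
sample = ["normal"] * 17 + ["abnormal"] * 2 + ["unknown"]
status = current_status(sample)
print(status)
```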

A zone can contain a wide variety of infrastructure elements, ranging from independent components to segments, services, and even other zones. The Infrastructure Health section, therefore, graphically represents the different categories of infrastructure elements that a zone contains, and how well each category is currently performing. The Sub Zones bar graph, for instance, indicates the number of zones that have been added to the zone being monitored, and the current state of these subzones. You can zoom into individual subzone performance by clicking on a division in the bar graph; the subzones which are in that particular state will then appear.

Clicking on the Sub Zones link in the Infrastructure Health section also invokes the Zones page, but in this case, the page displays all sub-zones that are part of the parent zone, regardless of state.

Either way, by default, the Zones page that appears displays the following sub-zones: direct sub-zones of the original zone, and zones (if any) that are included in the direct sub-zones. For instance, assume that 3 zones - zone A, zone B, and zone C - have been configured. While zone B has been directly assigned to zone A, zone C has been added to zone B. Now, while viewing the dashboard of zone A, if say, the Sub Zones link in the Infrastructure Health section is clicked, then the resulting Zones page will list the following by default:

  • zone B which is the direct sub-zone of zone A;
  • zone C which is added to zone B
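
The zone A / zone B / zone C example can be reproduced with a short traversal. This is a sketch under stated assumptions: the `children` mapping and the `listed_sub_zones` function are hypothetical, and the default depth of 2 reflects only what the text above describes (direct sub-zones plus the zones included in them). How more deeply nested zones are handled is not stated here.

```python
def listed_sub_zones(zone, children, depth=2):
    """Return the sub-zones of `zone` found within `depth` levels, nearest first."""
    result = []
    frontier = [zone]
    for _ in range(depth):
        next_frontier = []
        for parent in frontier:
            for child in children.get(parent, []):
                # Skip repeats so a zone assigned in two places is listed once.
                if child != zone and child not in result:
                    result.append(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return result


# zone B is directly assigned to zone A; zone C has been added to zone B.
children = {"zone A": ["zone B"], "zone B": ["zone C"]}

print(listed_sub_zones("zone A", children))
```

Calling the function with `depth=1` would return only the direct sub-zones, which is a convenient way to see the difference between the two levels.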

In the same way, the Components bar graph represents the number and state of the components that are part of the zone. Clicking on a division in the Components bar lists the components in that particular state. Instead, if you click on the Components label in the Infrastructure Health section, you can view the complete list of components associated with the chosen zone, regardless of state.

Likewise, the Services and Segments bars in the Infrastructure Health section indicate the number and state of services and segments (if any) that are part of the said zone. While clicking on any division in the Services graph provides you with a list of services in that particular state, segments of a specific state will be displayed when you click on the corresponding division in the Segments graph. Alternatively, you can click on the Services or Segments label (as the case may be) to view all the segments/services (as the case may be) included in the zone, regardless of their state.

If the Measures At-A-Glance section is enabled in the eG administrative interface, and if measures have been configured to be displayed in this section, then the Details tab page will display a Measures At-A-Glance section. The Measures At-A-Glance section provides the min/max values of critical performance data collected in real-time from the zone being monitored. A quick look at this panel will instantly reveal significant deviations in zone performance. Click on any of the measures in this section to view the layer model, tests, and measurements pertaining to the corresponding component.

Alongside the Measures At-A-Glance tab is an Event Analysis tab that primarily lists the top-5 layers at the zone-level that were most affected by performance issues. Corresponding to every layer name in this section, you will see the number of alarms that are currently open for that layer, the average duration of the open alarms, and the maximum duration for which an alarm had remained open. If you have a dedicated troubleshooting cell for the zone, then this information will serve as an effective indicator of the efficiency of the cell in resolving performance issues pertaining to the zone. To view the complete history of alarms in the environment, click on the Click here for more events >> link.
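
The per-layer figures in the Event Analysis tab (open alarm count, average duration, maximum duration) can be derived from a list of open alarms as sketched below. The input format, the layer names, the `event_analysis` function, and the rule of ranking layers by open alarm count are all assumptions; the text above does not say how the "most affected" layers are chosen.

```python
from collections import defaultdict


def event_analysis(open_alarms, top=5):
    """Summarize (layer, minutes_open) pairs into per-layer rows, busiest layers first."""
    durations_by_layer = defaultdict(list)
    for layer, minutes_open in open_alarms:
        durations_by_layer[layer].append(minutes_open)

    rows = [
        {
            "layer": layer,
            "open_alarms": len(durations),
            "avg_minutes": sum(durations) / len(durations),
            "max_minutes": max(durations),
        }
        for layer, durations in durations_by_layer.items()
    ]
    # Assumed ranking: most open alarms first, ties broken by layer name.
    rows.sort(key=lambda row: (-row["open_alarms"], row["layer"]))
    return rows[:top]


# Hypothetical open alarms as (layer, minutes open).
alarms = [
    ("Network", 30),
    ("Operating System", 10),
    ("Network", 90),
    ("Application Processes", 5),
    ("Network", 60),
]

report = event_analysis(alarms)
for row in report:
    print(row)
```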

Besides a layer-wise event analysis, this section also enables a component-wise review of events that occurred during the last hour (by default). By choosing a different duration from the Components with most events in the last list, you can view the zone components that experienced performance degradations during the chosen duration, and the number of problem events each such component encountered. This sheds light on the most problem-prone components in the zone. Clicking on a component or component-type in this section will lead you to the layer model of the corresponding component, revealing the current status of the component layers.
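
Counting problem events per component over a chosen duration is a filter followed by a tally. The sketch below assumes each event is a `(component, timestamp)` pair; that representation, the function name `components_with_most_events`, and the sample timestamps are hypothetical. The one-hour default matches the default duration mentioned above.

```python
from collections import Counter
from datetime import datetime, timedelta


def components_with_most_events(events, now, window=timedelta(hours=1)):
    """Count events per component that fall inside the window ending at `now`."""
    cutoff = now - window
    counts = Counter(
        component for component, occurred_at in events if cutoff <= occurred_at <= now
    )
    return counts.most_common()  # highest event count first


# Hypothetical event log.
now = datetime(2024, 1, 1, 12, 0)
events = [
    ("db01", datetime(2024, 1, 1, 11, 50)),
    ("db01", datetime(2024, 1, 1, 11, 30)),
    ("web01", datetime(2024, 1, 1, 11, 10)),
    ("web01", datetime(2024, 1, 1, 10, 30)),
    ("app01", datetime(2024, 1, 1, 9, 0)),
]

last_hour = components_with_most_events(events, now)
last_two_hours = components_with_most_events(events, now, window=timedelta(hours=2))
print(last_hour)
```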

The Components At-a-Glance section comprises a bar graph depicting the number of components of each type that are available in the monitored zone, and their respective states. Clicking on a bar will take you to a page that lists the individual components of the corresponding type and state. To view the complete list of zone components of a particular type, just click on the corresponding component-type in the Components At-A-Glance section.

Note:

By default, in the Components At-A-Glance section, the component-types are sorted in the descending order of the total number of monitored components of every type - in other words, in the descending order of the values in the Count column of the section. To change the sort order - i.e., to sort the component-types in the ascending order of the contents of the Count column - simply click on the down-arrow icon next to Count. To sort by a different column, say, the Server Type column, simply click on the corresponding column heading. This will instantly sort the contents in the alphabetical order of the names of the displayed server types. You can even override the default sort order, so that the component-types are by default arranged in the alphabetical order of their names, and not on the basis of the Count. To achieve this, first switch to the eG administrative interface. In the MONITOR SETTINGS - Other Display Settings page, set the Sort components in dashboards flag to By component types. This ensures that the contents of the Components At-a-Glance section are by default sorted in the ascending order of the component-types. Accordingly, the down-arrow icon, by default, appears next to the column heading, Server Type.
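
The two default orderings described in this note can be compared side by side with a small sketch. The `sort_component_types` function, its `sort_by` values, and the sample rows are hypothetical; they only mimic the effect of leaving the Sort components in dashboards flag at its default versus setting it to By component types.

```python
def sort_component_types(rows, sort_by="count"):
    """Order (component_type, count) rows the way the section would list them.

    sort_by="count"          -> descending by Count (the default described above)
    sort_by="component_type" -> ascending alphabetical by component type
    """
    if sort_by == "count":
        return sorted(rows, key=lambda row: row[1], reverse=True)
    if sort_by == "component_type":
        return sorted(rows, key=lambda row: row[0].lower())
    raise ValueError(f"unsupported sort key: {sort_by!r}")


# Hypothetical component types and counts for one zone.
rows = [("Oracle Database", 4), ("Windows", 12), ("IIS Web", 7)]

by_count = sort_component_types(rows)
by_type = sort_component_types(rows, sort_by="component_type")
print(by_count)
print(by_type)
```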