Agents Administration - Tests
 

Default Parameters for HdpMRJobDetTest

This test auto-discovers the users running MapReduce jobs on the cluster and, for each user, reports the count of jobs in different states. In the process, the test alerts administrators to failed jobs and jobs with errors. For each user, the test also measures how much time the jobs run by that user took to complete. This points administrators to slow jobs and the users running them. The test also highlights users whose jobs took the maximum time for map/reduce processing. Detailed diagnostics not only shed light on such jobs, but also indicate where job execution was bottlenecked: in running map tasks or in running reduce tasks. This greatly aids troubleshooting. Moreover, the test pinpoints jobs requiring more heap memory. This way, the test reveals to administrators whether improper job configuration is what caused job execution to slow down.

This page describes the default parameters that need to be configured for the HdpMRJobDetTest.

  • Use the TEST PERIOD list box to specify how often this test needs to be executed.

  • The eG agent collects metrics using Hadoop's WebHDFS REST API. Some of these API calls pull metrics from the NameNode, while others pull metrics from the resource manager. The NameNode is the master node in the Apache Hadoop HDFS architecture; it maintains and manages the blocks present on the DataNodes (slave nodes). It is a highly available server that manages the file system namespace and controls client access to files. To run API commands on the NameNode and pull metrics, the eG agent needs access to the NameNode's web port.

    To determine the correct web port of the NameNode, do the following:

    • Open the hdfs-default.xml file in the hadoop/conf/app directory.

    • Look for the dfs.namenode.http-address parameter in the file.

    • This parameter is configured with the IP address and base port on which the DFS NameNode web user interface listens. The format of this configuration is: <IP_Address>:<Port_Number>. Given below is a sample configuration:

      192.168.10.100:50070

    Configure the <Port_Number> in the specification as the Name Node Web Port. In the case of the above sample configuration, this will be 50070.
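The lookup described above can also be scripted. The following is a minimal sketch, assuming the configuration file uses the standard Hadoop layout of <property> elements with <name> and <value> children. The helper names find_property and namenode_web_port, and the inline SAMPLE_CONFIG text, are hypothetical and used only for illustration; in practice you would read the text from the actual configuration file.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical sample that mirrors the configuration shown above.
SAMPLE_CONFIG = """
<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>192.168.10.100:50070</value>
  </property>
</configuration>
"""


def find_property(xml_text: str, name: str) -> Optional[str]:
    """Return the value of the named Hadoop property, or None if absent."""
    root = ET.fromstring(xml_text.strip())
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def namenode_web_port(xml_text: str) -> int:
    """Extract the port part of dfs.namenode.http-address."""
    address = find_property(xml_text, "dfs.namenode.http-address")
    if address is None:
        raise KeyError("dfs.namenode.http-address is not set in this file")
    # The value has the form <IP_Address>:<Port_Number>.
    return int(address.rpartition(":")[2])


if __name__ == "__main__":
    print(namenode_web_port(SAMPLE_CONFIG))
```

The number printed is the value to enter as the Name Node Web Port.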

  • In some Hadoop configurations, a simple authentication user name may be required for running API commands and collecting metrics from the NameNode. When monitoring such Hadoop installations, specify the name of the simple authentication user here. If no such user is available or required, leave this parameter at its default value, none.

  • The eG agent collects metrics using Hadoop's WebHDFS REST API. While some of these API calls pull metrics from the NameNode, some others get metrics from the resource manager. The YARN Resource Manager Service (RM) is the central controlling authority for resource management and makes resource allocation decisions.

    To pull metrics from the resource manager, the eG agent first needs to connect to the resource manager. For this, you need to configure this test with the IP address/host name of the resource manager and its web port. Use the Resource Manager IP and Resource Manager Web Port parameters to configure these details.

    To determine the IP/host name and web port of the resource manager, do the following:

    • Open the yarn-site.xml file in the /opt/mapr/hadoop/hadoop-2.x.x/etc/hadoop directory.

    • Look for the yarn.resourcemanager.webapp.address parameter in the file.

    • This parameter is configured with the IP address/host name and web port of the resource manager. The format of this configuration is: <IP_Address_or_Host_Name>:<Port_Number>. Given below is a sample configuration:

      192.168.10.100:8080

    Configure the <IP_Address_or_Host_Name> in the specification as the Resource Manager IP, and the <Port_Number> as the Resource Manager Web Port. In the case of the above sample configuration, the Resource Manager IP will be 192.168.10.100 and the Resource Manager Web Port will be 8080.
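As an illustration of how the sample value maps to the two parameters, here is a small sketch that splits a <IP_Address_or_Host_Name>:<Port_Number> string into its parts. The function name split_address is hypothetical, and the sketch assumes an IPv4 address or host name (no IPv6 literals).

```python
from typing import Tuple


def split_address(address: str) -> Tuple[str, int]:
    """Split 'host:port' into (host, port), validating both parts."""
    host, sep, port = address.strip().rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected <host>:<port>, got {address!r}")
    return host, int(port)


if __name__ == "__main__":
    rm_ip, rm_web_port = split_address("192.168.10.100:8080")
    print("Resource Manager IP:", rm_ip)
    print("Resource Manager Web Port:", rm_web_port)
```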

  • In some Hadoop configurations, a simple authentication user name may be required for running API commands and collecting metrics from the resource manager. When monitoring such Hadoop installations, specify the name of the simple authentication user here. If no such user is available or required, leave this parameter at its default value, none.
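To show what a simple authentication user means in practice: with Hadoop's simple (pseudo) authentication, the user is typically passed to REST calls as a user.name query parameter. The sketch below only builds such a URL for the resource manager's cluster applications endpoint and does not contact any server. Which endpoints the eG agent actually calls is not stated on this page, so the endpoint choice, the http scheme, and the function name resource_manager_apps_url are assumptions made for illustration.

```python
from urllib.parse import urlencode


def resource_manager_apps_url(host: str, port: int, user: str = "none") -> str:
    """Build a resource manager REST URL, adding user.name only when a user is set."""
    # Assumption: plain HTTP and the /ws/v1/cluster/apps endpoint.
    base = f"http://{host}:{port}/ws/v1/cluster/apps"
    # The default value 'none' means no simple authentication user is needed.
    if user and user.strip().lower() != "none":
        return base + "?" + urlencode({"user.name": user.strip()})
    return base


if __name__ == "__main__":
    print(resource_manager_apps_url("192.168.10.100", 8080))
    print(resource_manager_apps_url("192.168.10.100", 8080, user="hdfs"))
```

Opening the resulting URL in a browser or with a command-line HTTP client is one way to check, before configuring the test, whether the resource manager accepts requests with or without a user name.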

  • By default, the detailed diagnosis of this test, if enabled, will report only the top-10 records. This is why the DD ROW COUNT parameter is set to 10 by default. If you want to include more or fewer records in detailed diagnosis, then change the value of this parameter accordingly.

  • To make diagnosis more efficient and accurate, the eG Enterprise suite embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the DETAILED DIAGNOSIS capability of this test for a particular server, choose the On option. To disable the capability, click on the Off option.

    The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

    • The eG manager license should allow the detailed diagnosis capability.

    • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.

  • Once the above values are provided, click on the UPDATE button to register the changes made.

When changing default configurations of tests, the values with “$” indicate variables that will be replaced by the eG system according to the specific server being managed. For instance, $hostName is the host/nickname of the target host, and $port is the port number of the server being monitored. For example, for a server xyz:80, $hostName will be changed automatically by the eG manager to “xyz” and $port will be changed to “80” when configuring a test.
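The substitution behaves like ordinary template expansion. The sketch below imitates it with Python's string.Template, which happens to use the same “$” prefix. It is only an analogy: how the eG manager performs the replacement internally is not described here, and the function name expand_defaults is hypothetical.

```python
from string import Template


def expand_defaults(value: str, host_name: str, port: int) -> str:
    """Replace $hostName and $port in a default parameter value."""
    return Template(value).substitute(hostName=host_name, port=port)


if __name__ == "__main__":
    # For a server xyz:80, both variables are filled in from the server's details.
    print(expand_defaults("$hostName:$port", "xyz", 80))
```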