|
Configuration of AWSEMapReduceTest
This test automatically discovers the clusters on Amazon EMR and tracks the status of each cluster and the progress of cluster activities. In the process, the test turns the spotlight on clusters that are processing data slowly and points to what could be slowing them down. Optionally, you can configure this test to report metrics for every job that a cluster runs. This enables administrators to identify the jobs that are slowing down data processing and to determine whether those jobs are running map tasks or reduce tasks.
The default parameters associated with this test are:
To monitor an Amazon EC2 instance, the eG agent has to be configured with the access key and secret key of a user with a valid AWS account. For this purpose, we recommend that you create a special user on the AWS cloud, obtain the access and secret keys of this user, and configure this test with these keys. To know the procedure for this, click here. Specify the access key and secret key so obtained in the AWS ACCESS KEY and AWS SECRET KEY text boxes. Confirm the access and secret keys by retyping them in the CONFIRM AWS ACCESS KEY and CONFIRM AWS SECRET KEY text boxes.
In some environments, all communication with the AWS EC2 cloud and its regions could be routed through a proxy server. In such environments, you should make sure that the eG agent connects to the cloud via the proxy server and collects metrics. To enable metrics collection via a proxy, specify the IP address of the proxy server and the port at which the server listens against the PROXY HOST and PROXY PORT parameters. By default, these parameters are set to none, indicating that the eG agent is not configured to communicate via a proxy.
If the proxy server requires authentication, specify a valid proxy user name and password in the PROXY USER NAME and PROXY PASSWORD parameters, respectively. Then, confirm the password by retyping it in the CONFIRM PASSWORD text box. By default, these parameters are set to none, indicating that the proxy server does not require authentication.
If a Windows NTLM proxy is to be used, you will additionally have to specify the Windows domain name and the Windows workstation name it requires against the PROXY DOMAIN and PROXY WORKSTATION parameters. If the environment does not support a Windows NTLM proxy, set these parameters to none.
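The proxy parameters above can be pictured with a minimal Python sketch. It is not how the eG agent is implemented; it only illustrates the convention that a value of none means "not configured". The function names `is_set` and `build_proxy_url` are hypothetical, the `http://` scheme is an assumption, and the NTLM parameters (PROXY DOMAIN, PROXY WORKSTATION) are not covered.

```python
from urllib.parse import quote

UNSET = "none"  # default value that this page documents as "not configured"


def is_set(value):
    """Return True when a parameter holds something other than 'none'."""
    return value is not None and value.strip().lower() != UNSET


def build_proxy_url(host, port, user=UNSET, password=UNSET):
    """Return a proxy URL string, or None when no proxy is configured.

    All arguments are strings, as they would be typed into the text boxes.
    """
    if not (is_set(host) and is_set(port)):
        return None  # PROXY HOST / PROXY PORT left at 'none': connect directly
    credentials = ""
    if is_set(user) and is_set(password):
        # Percent-encode so characters such as '@' or ':' do not break the URL.
        credentials = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    return f"http://{credentials}{host}:{port}"
```

With the defaults, `build_proxy_url("none", "none")` returns `None`, which corresponds to a direct connection; supplying only a host and port yields a URL without credentials.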
In the EXCLUDE REGION text box, you can provide a comma-separated list of region names or patterns of region names that you do not want to monitor. For instance, to exclude regions with names that contain ‘east’ or ‘west’ from monitoring, your specification should be: *east*,*west*.
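To see which regions a specification such as *east*,*west* would leave out, the following Python sketch applies shell-style wildcard matching to a list of region names. The exact matching rules used by the eG agent are not described here, so the glob semantics and the case-insensitive comparison are assumptions, and `parse_patterns` and `regions_to_monitor` are hypothetical helper names.

```python
from fnmatch import fnmatchcase


def parse_patterns(spec):
    """Split a comma-separated specification into trimmed, non-empty patterns."""
    return [p.strip() for p in spec.split(",") if p.strip()]


def regions_to_monitor(regions, exclude_spec):
    """Return the regions that match none of the exclusion patterns."""
    patterns = parse_patterns(exclude_spec)
    return [
        region for region in regions
        # Lower-case both sides: case-insensitive matching is an assumption.
        if not any(fnmatchcase(region.lower(), p.lower()) for p in patterns)
    ]


regions = ["us-east-1", "us-west-2", "eu-central-1", "ap-south-1"]
print(regions_to_monitor(regions, "*east*,*west*"))
# prints ['eu-central-1', 'ap-south-1']
```

Note that under these assumptions a substring pattern like *east* would also exclude a region such as ap-southeast-1, so it is worth checking a specification against the full region list before applying it.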
By default, the EMR FILTER NAME parameter is set to JobFlowId. This is the same as cluster ID, which is the unique identifier of a cluster in the form j-XXXXXXXXXXXXX. In this case, this test will report metrics for every cluster.
If required, you can override this default setting by setting the EMR FILTER NAME to JobId. You can use this to filter the metrics returned from a cluster down to those that apply to a single job within the cluster.
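The two EMR FILTER NAME values read like the Amazon CloudWatch dimensions of the same names for EMR metrics. Assuming that correspondence, which this page does not confirm, the sketch below shows how a filter name and an identifier could be combined into a CloudWatch-style dimension (a Name/Value pair). The helper `build_dimension`, the cluster ID pattern (one reading of j-XXXXXXXXXXXXX), and the sample identifiers are all hypothetical.

```python
import re

VALID_FILTERS = ("JobFlowId", "JobId")

# Assumed reading of the documented form j-XXXXXXXXXXXXX:
# 'j-' followed by upper-case letters and digits.
CLUSTER_ID = re.compile(r"^j-[A-Z0-9]+$")


def build_dimension(filter_name, value):
    """Return a Name/Value pair describing one metric filter."""
    if filter_name not in VALID_FILTERS:
        raise ValueError(f"EMR FILTER NAME must be one of {VALID_FILTERS}")
    if filter_name == "JobFlowId" and not CLUSTER_ID.match(value):
        raise ValueError(f"{value!r} does not look like a cluster ID")
    return {"Name": filter_name, "Value": value}


# Default setting: one set of metrics per cluster (sample ID is made up).
print(build_dimension("JobFlowId", "j-ABC123DEF4567"))
# Override: narrow the metrics to a single job within a cluster (sample ID is made up).
print(build_dimension("JobId", "job_201101010000_0001"))
```

The sketch validates only the cluster ID form, because that is the only format this page spells out.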
When changing the configuration for specific servers, a “*” beside the text box corresponding to a parameter signifies that its value has to be manually configured by the user. Parameter values that need to be configured will typically be prefixed with a “$” or contain a series of “*”. A parameter value of “none” indicates that the value can be changed if required.
|