eG Monitoring
 

Measures reported by HdpShufleErTest

In Hadoop MapReduce, shuffling transfers data from the mappers to the appropriate reducers. During the shuffle, the system sorts the map output and passes it to the reducers as input. Without the shuffle, reducers would not receive any input, so errors in the shuffling process can cause MapReduce jobs to fail or slow down. It is therefore important that administrators promptly capture these errors, diagnose their causes, and fix them. This is where the HdpShufleErTest test helps.
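
To make the data flow concrete, the following is a minimal, simplified sketch of a shuffle in plain Python. It is not Hadoop code: the `partition` and `shuffle` functions, the sample keys, and the partitioning rule are assumptions made purely for illustration.

```python
from collections import defaultdict

def partition(key, num_reducers):
    """Pick a reducer for a key (illustrative stand-in for a hash partitioner)."""
    # Summing code points keeps the result stable across runs,
    # unlike Python's built-in hash() for strings.
    return sum(ord(ch) for ch in key) % num_reducers

def shuffle(map_outputs, num_reducers):
    """Route (key, value) pairs from every mapper to a reducer, grouped and sorted by key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    # Each reducer receives its keys in sorted order.
    return [sorted(bucket.items()) for bucket in buckets]

# Hypothetical output of two mappers.
map_outputs = [
    [("apple", 1), ("lime", 1)],
    [("apple", 1), ("pear", 1)],
]
reducer_inputs = shuffle(map_outputs, num_reducers=2)
for reducer_id, groups in enumerate(reducer_inputs):
    print(reducer_id, groups)
```

If any step of this routing fails for a real job (for example, a mapper's output cannot be fetched or is addressed to the wrong reducer), the affected reducer is left without part of its input, which is why shuffle errors matter.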

This test monitors the shuffling process and reports the number and nature of the errors encountered in it. Detailed diagnostics reveal the precise jobs that were impacted by the shuffle errors. This way, the test not only alerts administrators to problems in the shuffling process, but also aids troubleshooting by revealing the jobs that were affected.
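
The relationship between the cluster-level counts and the per-job detailed diagnosis can be sketched as follows. This is only an illustration of the idea, not the test's actual implementation: the `summarize` function, the job IDs, and the input layout are hypothetical.

```python
from collections import Counter

def summarize(per_job_errors):
    """Return (total count per error type, affected job IDs per error type)."""
    totals = Counter()
    affected = {}
    for job_id, errors in per_job_errors.items():
        for error_type, count in errors.items():
            totals[error_type] += count
            if count > 0:
                # Only jobs that actually saw the error belong in the diagnosis.
                affected.setdefault(error_type, []).append(job_id)
    return totals, affected

# Hypothetical shuffle error counts collected per job.
per_job_errors = {
    "job_0001": {"CONNECTION": 2, "IO_ERROR": 0},
    "job_0002": {"CONNECTION": 1, "IO_ERROR": 4},
    "job_0003": {"CONNECTION": 0, "IO_ERROR": 0},
}
totals, affected = summarize(per_job_errors)
print(dict(totals))
print(affected)
```

Here `totals` plays the role of the measure values, while `affected` corresponds to the list of jobs that the detailed diagnosis would surface for each error type.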

Outputs of the test: One set of results for the target Hadoop cluster

The measures reported by this test are as follows:

| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| Bad_id_type | Indicates the number of BAD_ID errors that occurred in the shuffling process. | Number | BAD_ID errors relate to the interpretation of IDs from shuffle headers. If this measure reports a value greater than 0, use the detailed diagnosis of the measure to identify the jobs that were affected by these errors. |
| Connection_type | Indicates the number of errors of type CONNECTION. | Number | If this measure reports a value greater than 0, use the detailed diagnosis of the measure to identify the jobs that were affected by these errors. |
| Io_error_type | Indicates the number of errors of type IO_ERROR. | Number | IO_ERROR errors relate to reading and writing intermediate data. If this measure reports a value greater than 0, use the detailed diagnosis of the measure to identify the jobs that were affected by these errors. |
| Wrong_length_type | Indicates the number of errors of type WRONG_LENGTH. | Number | WRONG_LENGTH errors relate to invalid map output lengths reported in shuffle headers. The related WRONG_MAP errors relate to duplication of mapper output data (when the framework tries to process mapper output that has already been processed). If this measure reports a value greater than 0, use the detailed diagnosis of the measure to identify the jobs that were affected by these errors. |
| Wrong_reduce_type | Indicates the number of errors of type WRONG_REDUCE. | Number | WRONG_REDUCE errors relate to attempts to shuffle data for the wrong reducer (when the shuffle for a given reducer tries to fetch data intended for a different reducer). If this measure reports a value greater than 0, use the detailed diagnosis of the measure to identify the jobs that were affected by these errors. |
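
The "greater than 0" rule can also be checked by hand against a job's counter report. The sketch below assumes the report lists the counters as `NAME=value` lines under a `Shuffle Errors` heading; the sample text, the function names, and the parsing rules are illustrative assumptions rather than part of the eG test.

```python
import re

def parse_shuffle_errors(report):
    """Extract NAME=value pairs listed under a 'Shuffle Errors' heading."""
    counts = {}
    in_group = False
    for line in report.splitlines():
        text = line.strip()
        if text == "Shuffle Errors":
            in_group = True
            continue
        if in_group:
            match = re.fullmatch(r"([A-Z_]+)=(\d+)", text)
            if match is None:
                break  # the first non-counter line ends the group
            counts[match.group(1)] = int(match.group(2))
    return counts

def needs_attention(counts):
    """Return the error types whose count is greater than 0, in sorted order."""
    return sorted(name for name, value in counts.items() if value > 0)

# Hypothetical excerpt of a job counter report.
sample_report = """\
    Shuffle Errors
        BAD_ID=0
        CONNECTION=3
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=1
    File Input Format Counters
        Bytes Read=1024
"""
counts = parse_shuffle_errors(sample_report)
print(needs_attention(counts))
```

Any error type returned by `needs_attention` is one for which the detailed diagnosis of the corresponding measure is worth consulting.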