Measures reported by HdpFsTest
FsImage is a file stored on the NameNode's local (OS) filesystem that contains the complete directory structure (namespace) of HDFS, including the mapping of each file to its data blocks. Block locations (which DataNodes hold which blocks) are not persisted in the FsImage; the NameNode rebuilds them from the block reports that DataNodes send. The NameNode reads the FsImage when it starts.
The EditLog is a transaction log that records every change made to the HDFS namespace, such as the creation of a file, the addition of a new block, a change in replication, or a deletion. In short, it records the changes made since the last FsImage was created.
Every time the NameNode restarts, the EditLogs are applied to the FsImage to obtain the latest snapshot of the file system. However, NameNode restarts are rare in production clusters, so you may encounter the following issues:
The EditLog grows unwieldy in size, particularly when the NameNode runs for a long time without a restart;
NameNode restarts take longer, because a large number of changes must be merged;
If the NameNode crashes, significant metadata loss is possible, because the FsImage on disk is very old.
The Secondary NameNode helps overcome these issues by taking over from the NameNode the responsibility of merging the EditLogs with the FsImage.
The Secondary NameNode obtains the FsImage and EditLogs from the NameNode at regular intervals.
The Secondary NameNode loads both the FsImage and the EditLogs into main memory and applies each operation from the EditLogs to the FsImage.
Once a new FsImage is created, the Secondary NameNode copies it back to the NameNode.
The NameNode uses the new FsImage at its next restart, which reduces startup time.
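At its core, the checkpoint described above replays a log of operations over a snapshot. The toy Python sketch below illustrates only that idea. The namespace model (a dict mapping paths to block IDs and a replication factor), the `Namespace` class, the `checkpoint` and `apply_edit` functions, and the operation names are hypothetical simplifications; they are not the real FsImage or EditLog formats or any Hadoop API.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Toy stand-in for an FsImage (hypothetical model, not the real on-disk format)."""
    files: dict = field(default_factory=dict)  # path -> {"blocks": [...], "replication": int}
    last_txid: int = 0  # ID of the last transaction already folded into this snapshot


def apply_edit(ns: Namespace, txid: int, op: str, path: str, arg) -> None:
    """Apply one toy EditLog operation to the in-memory namespace."""
    if op == "create":
        ns.files[path] = {"blocks": [], "replication": arg}
    elif op == "add_block":
        ns.files[path]["blocks"].append(arg)
    elif op == "set_replication":
        ns.files[path]["replication"] = arg
    elif op == "delete":
        ns.files.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")
    ns.last_txid = txid


def checkpoint(image: Namespace, edits: list) -> Namespace:
    """Replay edits (sorted by txid) over a copy of the image and return the new image."""
    merged = Namespace(
        files={p: {"blocks": list(m["blocks"]), "replication": m["replication"]}
               for p, m in image.files.items()},
        last_txid=image.last_txid,
    )
    for txid, op, path, arg in edits:
        if txid > merged.last_txid:  # skip transactions the old image already contains
            apply_edit(merged, txid, op, path, arg)
    return merged


old_image = Namespace()
edit_log = [
    (1, "create", "/logs/a.txt", 3),
    (2, "add_block", "/logs/a.txt", 1001),
    (3, "create", "/tmp/b.txt", 3),
    (4, "set_replication", "/logs/a.txt", 2),
    (5, "delete", "/tmp/b.txt", None),
]
new_image = checkpoint(old_image, edit_log)
print(new_image.files)      # {'/logs/a.txt': {'blocks': [1001], 'replication': 2}}
print(new_image.last_txid)  # 5
```

Because the merge works on a copy, the old image stays usable until the new one is ready. The longer the edit log, the more operations the replay must perform, which is why an unmerged EditLog slows down NameNode restarts.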
However, this seemingly fail-proof process is not without issues. Delays in the process can cause the NameNode to start up without the latest FsImage at its disposal. Such delays can occur if:
The Secondary NameNode takes too long to download the EditLogs from the NameNode;
The NameNode is slow in uploading FsImages to the Secondary NameNode and/or in downloading the updated FsImages from the Secondary NameNode.
To avoid such delays, administrators must closely monitor the communication between the NameNode and the Secondary NameNode, proactively detect any slowness in the upload or download of FsImages and EditLogs, and promptly isolate and remove the source of the slowness. This is where the HdpFsTest test helps!
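This test monitors the rate at which EditLogs and FsImages are transferred between the NameNode and the Secondary NameNode, and the time each transfer takes.
In the process, the test sheds light on communication and processing latencies that could be slowing down uploads and downloads between the NameNode and the Secondary NameNode in the cluster.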
Outputs of the test: One set of results for the Hadoop cluster being monitored.
The measures made by this test are as follows:
| Measurement | Description | Measurement Unit | Interpretation |
| --- | --- | --- | --- |
| Edit_downloads | Indicates the rate at which the Secondary NameNode downloads EditLogs. | Downloads/Sec | A low value, or a steady decrease in the value, of this measure could indicate that the Secondary NameNode is slow in downloading edits. One reason for this could be the size of the edits: if too many changes need to be downloaded, the download will be slow. Another reason could be a poor-quality network connection between the NameNode and the Secondary NameNode. |
| Avg_edit_download_time | Indicates the time taken for the EditLogs to be downloaded by the Secondary NameNode. | Seconds | A low value is desired for this measure. An unusually high value indicates slowness in downloading edits. One reason for this could be the size of the edits: if too many changes need to be downloaded, the download will be slow. Another reason could be a poor-quality network connection between the NameNode and the Secondary NameNode. |
| Fsimage_downloads | Indicates the rate at which updated FsImages are downloaded from the Secondary NameNode. | Downloads/Sec | A high value is desired for this measure. A low value indicates latency in downloading the latest snapshot of the file system from the Secondary NameNode. One reason for this could be the size of the FsImage: if many changes were applied to the old FsImage, the resulting snapshot will be large, and large files naturally take longer to download. Another reason could be a poor-quality network connection between the NameNode and the Secondary NameNode. |
| Avg_fsimg_dwnld_time | Indicates the time taken to download updated FsImages from the Secondary NameNode. | Seconds | A low value is desired for this measure. A high value indicates that the NameNode is slow to download FsImages. One reason for this could be the size of the FsImage: if many changes were applied to the old FsImage, the resulting snapshot will be large, which can delay the download. Another reason could be a poor-quality network connection between the NameNode and the Secondary NameNode. |
| Fsimage_uploads | Indicates the rate at which FsImages are uploaded to the Secondary NameNode. | Uploads/Sec | A high value is desired for this measure. A low value indicates latency in uploading the FsImage from the NameNode to the Secondary NameNode. One reason for this could be the size of the FsImage: if the FsImage to be updated is large, the NameNode will take a while to upload it to the Secondary NameNode. Another reason could be a poor-quality network connection between the NameNode and the Secondary NameNode. |
| Avg_fsimg_upload_time | Indicates the time taken to upload the FsImage to the Secondary NameNode. | Seconds | A low value is desired for this measure. A high value indicates that the NameNode is slow to upload FsImages to the Secondary NameNode. One reason for this could be the size of the FsImage: if the FsImage to be updated is large, the NameNode will take a while to upload it to the Secondary NameNode. Another reason could be a poor-quality network connection between the NameNode and the Secondary NameNode. |
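To cross-check these measures by hand, an administrator might read comparable values from the NameNode's JMX servlet (`/jmx`). Its `NameNodeActivity` bean exposes cumulative counters (`GetEditNumOps`, `GetImageNumOps`, `PutImageNumOps`) and matching averages (`GetEditAvgTime`, `GetImageAvgTime`, `PutImageAvgTime`). The Python sketch below turns two samples of those counters into per-second rates and flags a steady decrease. Several things here are assumptions: the mapping of counters to the measures in the table, the millisecond unit of the averages, the sampling interval, and the trend rule in `is_steadily_decreasing`. HdpFsTest does not necessarily collect or compute its values this way. The sketch parses JSON text, so it runs without a live cluster. The sample values are invented for illustration.

```python
import json

BEAN_NAME = "Hadoop:service=NameNode,name=NameNodeActivity"
# Assumed mapping from HdpFsTest measure names to NameNodeActivity attributes.
COUNTERS = {
    "Edit_downloads": "GetEditNumOps",
    "Fsimage_downloads": "GetImageNumOps",
    "Fsimage_uploads": "PutImageNumOps",
}
AVERAGES = {
    "Avg_edit_download_time": "GetEditAvgTime",
    "Avg_fsimg_dwnld_time": "GetImageAvgTime",
    "Avg_fsimg_upload_time": "PutImageAvgTime",
}


def activity_bean(jmx_json: str) -> dict:
    """Return the NameNodeActivity bean from a /jmx response body."""
    for bean in json.loads(jmx_json)["beans"]:
        if bean.get("name") == BEAN_NAME:
            return bean
    raise KeyError(BEAN_NAME)


def rates(prev: dict, curr: dict, interval_s: float) -> dict:
    """Per-second rates from two samples of cumulative counters taken interval_s apart."""
    return {m: (curr[c] - prev[c]) / interval_s for m, c in COUNTERS.items()}


def avg_times_seconds(curr: dict) -> dict:
    """Average transfer times, assuming the bean reports them in milliseconds."""
    return {m: curr[c] / 1000.0 for m, c in AVERAGES.items()}


def is_steadily_decreasing(values: list, min_points: int = 3) -> bool:
    """True if the last min_points values are strictly decreasing (hypothetical rule)."""
    tail = values[-min_points:]
    return len(tail) == min_points and all(b < a for a, b in zip(tail, tail[1:]))


def sample(edits, images, puts, edit_ms, image_ms, put_ms) -> str:
    """Build a fake /jmx body with invented values, for illustration only."""
    return json.dumps({"beans": [
        {"name": "java.lang:type=Memory"},
        {"name": BEAN_NAME, "GetEditNumOps": edits, "GetImageNumOps": images,
         "PutImageNumOps": puts, "GetEditAvgTime": edit_ms,
         "GetImageAvgTime": image_ms, "PutImageAvgTime": put_ms},
    ]})


t0 = activity_bean(sample(10, 2, 2, 120.0, 800.0, 950.0))
t1 = activity_bean(sample(16, 5, 5, 150.0, 900.0, 1200.0))  # taken 60 s after t0
print(rates(t0, t1, 60))      # {'Edit_downloads': 0.1, 'Fsimage_downloads': 0.05, 'Fsimage_uploads': 0.05}
print(avg_times_seconds(t1))  # {'Avg_edit_download_time': 0.15, 'Avg_fsimg_dwnld_time': 0.9, 'Avg_fsimg_upload_time': 1.2}
print(is_steadily_decreasing([0.2, 0.15, 0.1, 0.05]))  # True
```

Against a live cluster, the JSON body would come from an HTTP GET on the NameNode web UI's `/jmx?qry=Hadoop:service=NameNode,name=NameNodeActivity` endpoint. The web UI port depends on the Hadoop version and configuration.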