eG Monitoring
 

Measures reported by NetCluFcpPortTest

A Fibre Channel (FC) port is a hardware pathway into and out of a node that performs data communication over an FC link, i.e., an FC channel. The FC ports are therefore the primary handlers of I/O requests from the NetApp Cluster. I/O load on the ports directly translates into load on the volumes of the cluster. This is why administrators need to continuously monitor the data traffic and read/write latency on each port, so that overloaded ports can be quickly identified and the load-balancing algorithm fine-tuned accordingly. Moreover, since port-related errors can deny hosts access to the data stored in the NetApp Cluster, port monitoring is imperative to enable administrators to quickly detect such errors and fix them to ensure the normal functioning of the cluster. This test helps in this regard. For each FC port on the NetApp Cluster, this test reports the rate at which data and I/O requests are handled and the number and nature of errors/failures encountered by the port. This way, administrators can be proactively alerted to potential port overloads and error conditions on FC ports, and can rapidly initiate remedial measures to avoid an impending system slowdown.

The measures made by this test are as follows:

Measurement Description Measurement Unit Interpretation
Avg_other_latency Indicates the average time taken to perform operations other than read and write through this port. Secs  
Other_ops Indicates the rate at which operations other than read and write are performed through this port. Ops/Sec  
Read_ops Indicates the rate at which data/block is read through this port. Ops/Sec Very high values for these measures indicate road-blocks to rapid reading/writing by the storage device through the port. By observing the variations in these measures over time, you can understand whether the latencies are sporadic or consistent on the port. Consistently high read/write latencies could indicate a persistent bottleneck in the port, which helps you identify over-utilized ports.
Avg_read_latency Indicates the average time taken to read a block/data through this port upon a user request. Secs
Write_ops Indicates the rate at which data/block is written through this port. Ops/Sec
Avg_write_latency Indicates the average time taken to write a block/data using this port upon a user request. Secs
Auth_failures Indicates the number of times authentication failure occurred on this port during the last measurement period. Number  
Link_down Indicates the number of times the Fibre Channel link was lost during the last measurement period. Number  
Lip_f8_received Indicates the number of loop failures detected at the receiver of this FC port during the last measurement period.   Loop initialization is an essential process for allowing new devices onto the loop, assigning Arbitrated Loop Physical Addresses (AL_PAs), providing notification of topology changes, and recovering from loop failure. Following loop initialization, the loop enters a stable monitoring mode and resumes normal activity. Depending on the number of node loop ports (NL_Ports) attached to the loop, an entire loop initialization may take a few milliseconds. A loop initialization can be triggered by a number of causes, the most common being the introduction of a new device. The new device could actually be a former device that has been powered on, or an active device that has been moved from one hub port to another.

A number of ordered sets have been defined to cover the various conditions that an NL_Port may sense as it launches the initialization process. These ordered sets, called loop initialization primitive sequences, are referred to collectively as LIPs. An NL_Port issues at least 12 LIPs to start loop initialization. During loop initialization, each downstream device that is part of the loop receives the LIP stream and enters a state known as Open-init, which suspends any current operations and prepares the device for the loop initialization procedure. The LIPs are forwarded along the loop until all NL_Ports, including the originator of the LIPs, are in the Open-init state. At this point, a temporary loop master is selected to conduct the rest of the initialization procedure. The first task of the temporary loop master is to issue a series of four frames that allow each device on the loop to select a unique AL_PA. A LIP reset is used to perform a vendor-specific reset at the loop port specified by this AL_PA value. These LIP resets are used to temporarily cure connectivity issues. Prolonged resets should be noted, and the underlying connectivity issues should be resolved.

Loop_init_err Indicates the number of loop initialization errors that occurred on this FC port during the last measurement period. Number Ideally, the value of this measure should be zero.
Loss_of_signal Indicates the number of times the signal was lost on this FC port during the last measurement period. Number Ideally, the value of this measure should be zero. A non-zero value for this measure indicates that the port detected a loss of the electrical or optical signal used to transfer data on the port.

This most likely points to a faulty connector or cable. Signal loss also occurs when the device connected to the port is restarted, replaced, or serviced, and the Fibre Channel cable connected to the port is temporarily disconnected.

If the port remains in the “loss of signal” state for longer than a specific period, it enters the link failure state, which could degrade the performance of the Fibre Channel link.

Loss_of_sync Indicates the number of times this FC port failed to synchronize during the last measurement period. Number Ideally, the value of this measure should be zero. A non-zero value for this measure indicates that the port went into the “loss of synchronization” state, where it encountered continuous disparity errors.

This most likely points to a faulty connector or cable. Loss of synchronization also occurs when the device connected to the port is restarted, replaced, or serviced, and the Fibre Channel cable connected to the port is temporarily disconnected.

If the port remains in the “loss of synchronization” state for longer than a specific period, it enters the link failure state, which could degrade the performance of the Fibre Channel link.

Prim_seq_err Indicates the number of Primitive Sequence protocol errors that occurred on this FC port during the last measurement period. Number Ideally, the value of this measure should be zero.
Spurious_int_count Indicates the number of spurious signals received by this FC port during the last measurement period. Number  
Virtual_link_down Indicates the number of times the virtual Fibre Channel link was lost on this FC port during the last measurement period. Number Ideally, the value of this measure should be zero. A non-zero value for this measure indicates that the port detected a loss of the electrical or optical signal used to transfer data on the port.
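
The interpretation guidance in the table above can be illustrated with a short Python sketch. It is not part of the eG product and makes several assumptions: the measure values are taken to be already available as plain Python lists and dictionaries, the function names classify_latency and nonzero_counters are hypothetical, and the latency threshold and the 80% cut-off are made-up illustrative values rather than eG defaults. The sketch labels a series of latency samples as sporadic or consistent, and lists the counters that the table says should ideally be zero but are not.

```python
# Counters that the measures table says should ideally be zero.
ZERO_IDEAL_COUNTERS = (
    "Loop_init_err",
    "Loss_of_signal",
    "Loss_of_sync",
    "Prim_seq_err",
    "Virtual_link_down",
)


def classify_latency(samples, threshold_secs, consistent_fraction=0.8):
    """Label latency samples (in seconds) as 'normal', 'sporadic', or 'consistent'.

    threshold_secs and consistent_fraction are illustrative assumptions,
    not values defined by the test.
    """
    if not samples:
        return "normal"
    # Count the samples that exceed the chosen latency threshold.
    high = sum(1 for s in samples if s > threshold_secs)
    if high == 0:
        return "normal"
    # Most samples being high suggests a persistent bottleneck on the port.
    if high / len(samples) >= consistent_fraction:
        return "consistent"
    return "sporadic"


def nonzero_counters(measures):
    """Return the zero-ideal counters that reported a non-zero value."""
    return {
        name: measures[name]
        for name in ZERO_IDEAL_COUNTERS
        if measures.get(name, 0) > 0
    }


if __name__ == "__main__":
    # Made-up Avg_read_latency samples for one port, in seconds.
    read_latency = [0.002, 0.003, 0.050, 0.002, 0.003]
    print(classify_latency(read_latency, threshold_secs=0.020))

    # Made-up error counters for the same port.
    port = {"Loss_of_signal": 2, "Loss_of_sync": 0, "Prim_seq_err": 0}
    print(nonzero_counters(port))
```

A port whose latency is labelled consistent is a candidate for the over-utilized ports mentioned above, while a non-zero error counter points to the cabling and connectivity checks described in the corresponding interpretation.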