eG Monitoring
 

Measures reported by DeferredQueueTest

If a message has been delivered to all of its deliverable recipients, but delivery to some recipients failed for a transient reason (one that may clear up so that delivery succeeds later), the message is placed in the deferred queue. The queue manager scans the deferred queue periodically, and during each scan a fraction of the deferred queue is moved back to the active queue for retry. Each message in the deferred queue is assigned a cool-off time, after which its delivery is retried.

One of the common causes of large deferred queues is the failure to validate recipients at the SMTP input stage. Because spammers routinely launch dictionary attacks from unrepliable sender addresses, the bounces for invalid recipient addresses clog the deferred queue. Recipient validation is therefore strongly recommended.

Another common cause of congestion is unwarranted flushing of the entire deferred queue. The deferred queue holds messages that are likely to fail delivery and are also likely to be slow to fail (that is, to time out). As a result, the most common reaction to a large deferred queue, flushing it, is likely to be counter-productive and typically makes the congestion worse. The deferred queue should not be flushed unless most of its content has recently become deliverable (for example, when a relay host comes back up after an outage).

If the deferred queue keeps growing, its messages are retried for delivery again and again, which may flood the active queue and cause brief congestion of the queues. To avoid this, administrators should continuously monitor the deferred queue and determine when the number of messages in it began to increase sharply. Administrators should also identify the domain to which most of the undelivered messages were addressed, so that the legitimacy of that domain can be examined. To help administrators with these tasks, eG Enterprise Suite provides the DeferredQueueTest.
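The domain analysis recommended above can also be approximated from the command line. The following is a minimal Python sketch, assuming Postfix 3.1 or later, whose `postqueue -j` command prints one JSON object per queued message (with fields such as `queue_name`, `queue_id`, and a `recipients` list whose entries carry an `address`). The function name `deferred_by_domain` and the sample data are hypothetical; this is not how DeferredQueueTest itself gathers its data. Each deferred message is counted once per distinct recipient domain.

```python
import json
from collections import Counter


def deferred_by_domain(json_lines):
    """Count deferred messages per recipient domain (hypothetical helper).

    `json_lines` is an iterable of strings in the `postqueue -j` format:
    one JSON object per queued message. A message addressed to several
    recipients in the same domain is counted once for that domain.
    """
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message.get("queue_name") != "deferred":
            continue  # skip active, hold, incoming, and maildrop entries
        # Domain = text after the last "@", lower-cased; a set dedupes per message.
        domains = {
            recipient["address"].rpartition("@")[2].lower()
            for recipient in message.get("recipients", [])
        }
        counts.update(domains)
    return counts


# Hypothetical sample in the postqueue -j format (fields abbreviated).
SAMPLE = [
    json.dumps({"queue_name": "deferred", "queue_id": "A1",
                "recipients": [{"address": "u1@example.com"},
                               {"address": "u2@Example.com"}]}),
    json.dumps({"queue_name": "deferred", "queue_id": "B2",
                "recipients": [{"address": "x@bad.test"}]}),
    json.dumps({"queue_name": "deferred", "queue_id": "C3",
                "recipients": [{"address": "y@bad.test"}]}),
    json.dumps({"queue_name": "active", "queue_id": "D4",
                "recipients": [{"address": "z@example.com"}]}),
]

print(deferred_by_domain(SAMPLE).most_common())
# [('bad.test', 2), ('example.com', 1)]
```

On a live server, the input lines could come from `subprocess.run(["postqueue", "-j"], capture_output=True, text=True).stdout.splitlines()`. The domain at the top of `most_common()` is the first candidate for a legitimacy check.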

This test periodically monitors the deferred queue of the target Postfix mail server and reports the total number of messages in the deferred queue, as well as a breakdown of that count by age, that is, the number of messages that have been in the deferred queue for each of several time ranges.
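The age ranges used by this test have upper bounds that double from 5 to 1280 minutes. A minimal Python sketch of such bucketing, assuming each message's arrival time is known in seconds since the epoch (as in the `arrival_time` field of `postqueue -j` output) and that age is measured from arrival. The helper `age_histogram` is hypothetical, and so is the boundary rule: an age that falls exactly on a bound is counted in the older range, since the test's own rule is not documented here.

```python
import bisect
import time

# Upper bounds, in minutes, of the age ranges named below; each bound
# doubles the previous one, and anything at or past 1280 lands in the last range.
BUCKET_BOUNDS = [5, 10, 20, 40, 80, 160, 320, 640, 1280]
BUCKET_NAMES = [
    "Less_than_5_mins",
    "Between_5_to_10_mins",
    "Between_10_to_20_mins",
    "Between_20_to_40_mins",
    "Between_40_to_80_mins",
    "Between_80_to_160_mins",
    "Between_160_to_320_mins",
    "Between_320_to_640_mins",
    "Between_640_to_1280_mins",
    "More_than_1280_mins",
]


def age_histogram(arrival_times, now=None):
    """Count messages per age range, given arrival times in epoch seconds."""
    now = time.time() if now is None else now
    counts = dict.fromkeys(BUCKET_NAMES, 0)
    for arrival in arrival_times:
        age_minutes = (now - arrival) / 60.0
        # bisect_right puts an age equal to a bound into the next (older) range,
        # e.g. exactly 5 minutes -> "Between_5_to_10_mins" (assumed boundary rule).
        index = bisect.bisect_right(BUCKET_BOUNDS, age_minutes)
        counts[BUCKET_NAMES[index]] += 1
    return counts


now = 1_700_000_000
arrivals = [now - 60, now - 7 * 60, now - 7 * 60, now - 2000 * 60]
histogram = age_histogram(arrivals, now=now)
print({name: count for name, count in histogram.items() if count})
# {'Less_than_5_mins': 1, 'Between_5_to_10_mins': 2, 'More_than_1280_mins': 1}
```

The per-range counts always sum to the total number of messages passed in, which corresponds to the Queue_size measure.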

The measures made by this test are as follows:

| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| Queue_size | Indicates the total number of messages in the deferred queue. | Number | An unusually high number of messages in the queue may indicate a problem with the queue or its endpoints. The detailed diagnosis of this measure lists the message count for each domain. |
| Less_than_5_mins | Indicates the number of messages that have been in the queue for less than 5 minutes. | Number | |
| Between_5_to_10_mins | Indicates the number of messages that have been in the queue for between 5 and 10 minutes. | Number | |
| Between_10_to_20_mins | Indicates the number of messages that have been in the queue for between 10 and 20 minutes. | Number | |
| Between_20_to_40_mins | Indicates the number of messages that have been in the queue for between 20 and 40 minutes. | Number | |
| Between_40_to_80_mins | Indicates the number of messages that have been in the queue for between 40 and 80 minutes. | Number | |
| Between_80_to_160_mins | Indicates the number of messages that have been in the queue for between 80 and 160 minutes. | Number | |
| Between_160_to_320_mins | Indicates the number of messages that have been in the queue for between 160 and 320 minutes. | Number | |
| Between_320_to_640_mins | Indicates the number of messages that have been in the queue for between 320 and 640 minutes. | Number | |
| Between_640_to_1280_mins | Indicates the number of messages that have been in the queue for between 640 and 1280 minutes. | Number | |
| More_than_1280_mins | Indicates the number of messages that have been in the queue for more than 1280 minutes. | Number | A low value is desired for this measure. |

When a host with a lot of deferred mail is down for some time, the entire deferred queue may reach its retry time simultaneously. This can lead to a very full active queue once the host comes back up. The phenomenon can repeat approximately every maximal_backoff_time seconds if the messages are deferred again after a brief burst of congestion. Since these messages are retried constantly, administrators should keep a close watch on the value of this measure. If this measure remains consistently high, the messages will be retried again and again, leading to congestion.
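Keeping watch on More_than_1280_mins can be automated with a simple rule. A minimal Python sketch of one possible rule, assuming the measure is sampled at regular intervals: raise a flag only when every one of the last N samples exceeds a threshold, so that a single spike does not trigger it. The class name `SustainedHighAlert`, the threshold, and the window size are hypothetical illustrations, not eG Enterprise settings.

```python
from collections import deque


class SustainedHighAlert:
    """Hypothetical rule: flag a measure that exceeds `threshold` in each of
    the last `window` samples (illustrative values, not eG defaults)."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # keeps only the newest samples

    def observe(self, value):
        """Record one sample; return True if the alert condition holds."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(v > self.threshold for v in self.samples)


# Example: More_than_1280_mins sampled at each test run.
alert = SustainedHighAlert(threshold=50, window=3)
print([alert.observe(v) for v in [10, 60, 70, 80, 40]])
# [False, False, False, True, False]
```

A longer window makes the rule slower to fire but less prone to false alarms from a brief burst after a host recovers.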