eG Monitoring
 

Measures reported by LnReplicationTest

A Domino cluster is a group of two or more servers that provides users with constant access to data, balances the workload between servers, improves server performance, and maintains performance when you increase the size of your enterprise. The servers in a cluster contain replicas of databases that you want to be readily available to users at all times. If a user tries to access a database on a cluster server that is not available, Domino opens a replica of that database on a different cluster server, if a replica is available. Domino continuously synchronizes databases so that whichever replica a user opens, the information is always the same.

A special component on each server in a cluster, the Cluster Replicator, performs replication between the databases. When the Cluster Replicator learns of a change to a database, it immediately pushes that change to all other replicas in the cluster. All replication events are stored in memory; if a destination server is not available, the Cluster Replicator continues to hold these events in memory until the destination server becomes available.
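The push-and-hold behavior described above can be illustrated with a toy model. This is only a sketch of the idea, not Domino's implementation: the class `ToyClusterReplicator`, its methods, and the server names are hypothetical and exist purely for illustration.

```python
from collections import deque


class ToyClusterReplicator:
    """Toy model (hypothetical): push each change to every replica,
    holding events in memory for destinations that are unavailable."""

    def __init__(self, destinations):
        # Assume every destination server starts out available.
        self.available = {name: True for name in destinations}
        # In-memory queue of events not yet delivered, per destination.
        self.pending = {name: deque() for name in destinations}
        # Events that have reached each destination, in order.
        self.delivered = {name: [] for name in destinations}

    def on_change(self, event):
        """Record a database change and try to push it to every replica."""
        for name in self.pending:
            self.pending[name].append(event)
            self._flush(name)

    def set_available(self, name, is_available):
        """Mark a destination up or down; deliver held events when it is up."""
        self.available[name] = is_available
        self._flush(name)

    def _flush(self, name):
        # Events stay queued in memory while the destination is down.
        if not self.available[name]:
            return
        while self.pending[name]:
            self.delivered[name].append(self.pending[name].popleft())
```

In this model, a change made while one destination is down is delivered to the other replicas right away and reaches the unavailable server as soon as `set_available(name, True)` is called for it.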

By default, every server in a cluster runs a single cluster replicator. To improve the performance of the Domino cluster, however, multiple replicators can be configured on a server. The decision to add replicators to a cluster server should be taken only after analyzing how well the default replicator on the server handles the replication requests it receives. The LnReplicationTest periodically monitors a cluster server's replicator-related activities and reports critical performance statistics, based on which administrators can decide whether or not to add more replicators to the server.

| Measurement | Description | Measurement Unit | Interpretation |
| --- | --- | --- | --- |
| Replications_successful | The rate of successful replications during the last measurement period | Replications/Sec | |
| Replications_failed | The rate of failed replications during the last measurement period | Replications/Sec | |
| Replication_docs_added | The rate at which replication docs were added during the last measurement period | Docs/Sec | |
| Replication_docs_deleted | The rate at which replication docs were deleted during the last measurement period | Docs/Sec | |
| Replication_docs_updated | The rate at which replication docs were updated during the last measurement period | Docs/Sec | |
| Avg_work_queue_length | The average work queue length since server startup | Number | |
| Curr_work_queue_length | The current number of databases awaiting replication by the cluster replicator | Number | If the value of this measure increases consistently, it could indicate a replication backlog - in other words, too many databases could be waiting to be replicated. In such a case, consider configuring more replicators on the server so that replication workload is shared and pending replication requests are cleared in a timely manner. Steady spikes in this measure could also indicate insufficient network bandwidth to process the transactions. If this is the case, you should consider setting up a private LAN for your cluster. |
| Avg_work_queue_wait_time | The average amount of time in seconds that a database spent on the work queue | Secs | Since the cluster replicator checks its queue every 15 seconds, this number should be under 15 during periods of light load. If this number is frequently higher than 30, you should consider adding one or more cluster replicators. |
| Data_received_rate | The amount of data received by the replicator during the last measurement period | Kbytes/Sec | |
| Data_sent_rate | The amount of data sent by the replicator during the last measurement period | Kbytes/Sec | |
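The interpretation guidance for Curr_work_queue_length and Avg_work_queue_wait_time can be turned into a simple check over a series of measurements. The following is a minimal sketch, not part of the eG product: the names `ReplicatorSample` and `should_consider_more_replicators` are hypothetical, "increases consistently" is assumed to mean that every sample is higher than the one before it, and "frequently higher than 30" is assumed to mean at least half of the samples.

```python
from dataclasses import dataclass
from typing import List

# Threshold taken from the Avg_work_queue_wait_time interpretation above.
HIGH_WAIT_SECS = 30.0


@dataclass
class ReplicatorSample:
    """One measurement period (hypothetical container for two measures)."""
    curr_work_queue_length: int
    avg_work_queue_wait_secs: float


def queue_is_growing(samples: List[ReplicatorSample]) -> bool:
    # Assumption: "increases consistently" means strictly increasing.
    lengths = [s.curr_work_queue_length for s in samples]
    return len(lengths) >= 2 and all(b > a for a, b in zip(lengths, lengths[1:]))


def wait_is_frequently_high(samples: List[ReplicatorSample],
                            fraction: float = 0.5) -> bool:
    # Assumption: "frequently" means at least `fraction` of the samples.
    if not samples:
        return False
    high = sum(1 for s in samples if s.avg_work_queue_wait_secs > HIGH_WAIT_SECS)
    return high / len(samples) >= fraction


def should_consider_more_replicators(samples: List[ReplicatorSample]) -> bool:
    """True if either measure suggests a replication backlog."""
    return queue_is_growing(samples) or wait_is_frequently_high(samples)
```

A steadily growing queue can also point to insufficient network bandwidth rather than too few replicators, so a positive result from a check like this is a prompt to investigate, not a decision in itself.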