eG Monitoring
 

Measures reported by SPFSearchTest

The Microsoft SharePoint Server search feature is implemented using two main services:

  • Indexing: Responsible for crawling content sources and building index files.
  • Searching: Responsible for finding all information matching the search query by searching the index files.

All searching is performed against the index files; if these files do not contain what the user is looking for, there will be no match. The index files are therefore critical to the success of the search feature of Microsoft SharePoint Server. In its simplest form, the search functionality is a Web page where the user enters a search query. The index role can be configured to run on its own Microsoft SharePoint server, or together with all the other roles, such as the Web service, Excel Services, and Forms Services. It performs its indexing tasks following this general workflow:

  1. SharePoint stores all configuration settings for the indexing in its database.
  2. When activated, the index service looks in SharePoint's databases to see which content sources to index and what type of indexing to perform, such as full or incremental.
  3. The index service starts a program called the Gatherer, which tries to open the content that should be indexed.
  4. For each information type, the Gatherer needs an Index Filter, or IFilter, that knows how to read the text inside that particular type of information. For example, to read an MS Word file, an IFilter for .DOC is needed.
  5. The Gatherer receives a stream of Unicode characters from the IFilter. It then uses a small program called a Word Breaker, whose job is to convert the stream of Unicode characters into words.
  6. Some words, such as "the", "a", and numbers, are not worth storing in the index, so the Gatherer compares each word found against a list of Noise Words. This list is a text file containing all the words that will be removed from the stream of words.
  7. The remaining words are stored in an index file, together with a link to the source. If that word already exists, only the source will be added, so one word can point to multiple sources.
  8. If the source was information stored in SharePoint, or a file in the file system, the index will also store the security settings for this source. This will prevent a user from getting search results that he or she is not allowed to open.
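
The workflow above can be illustrated with a small sketch. It is not SharePoint's implementation: the noise-word list, the regular-expression word breaker, and the function names `break_words`, `build_index`, and `search` are assumptions made for illustration, and the step where an IFilter extracts text from a file is replaced by passing plain strings.

```python
import re
from collections import defaultdict

# Hypothetical noise-word list; SharePoint reads its own list from a text file.
NOISE_WORDS = {"the", "a", "an", "and", "of", "to", "is"}

def break_words(text):
    """Stand-in for a Word Breaker: turn a character stream into lowercase words."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())

def is_noise(word):
    """Noise words and pure numbers are dropped before indexing."""
    return word in NOISE_WORDS or word.isdigit()

def build_index(documents):
    """Build a word -> sources index plus the security settings per source.

    `documents` maps a source link to a (text, allowed_users) pair.
    """
    index = defaultdict(set)   # one word can point to multiple sources
    acl = {}                   # security settings stored with each source
    for source, (text, allowed_users) in documents.items():
        acl[source] = set(allowed_users)
        for word in break_words(text):
            if not is_noise(word):
                index[word].add(source)
    return index, acl

def search(index, acl, word, user):
    """Return only the matching sources that `user` is allowed to open."""
    return sorted(s for s in index.get(word.lower(), ()) if user in acl[s])

# Example with made-up sources and users.
docs = {
    "budget.doc": ("The budget for 2024 is a draft", {"alice"}),
    "plan.doc": ("A draft of the plan", {"alice", "bob"}),
}
index, acl = build_index(docs)
print(search(index, acl, "draft", "bob"))  # ['plan.doc']
```

Both documents contain "draft", but only `plan.doc` is returned for `bob`, which mirrors how the stored security settings keep users from seeing results they cannot open.
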

Since the success of an indexing operation also depends on how the Gatherer program functions, administrators need to watch for irregularities in the Gatherer's behavior, so that anomalies are detected promptly and corrected before they can stall the indexing process.

This test monitors the SharePoint Foundation Search Gatherer and reports any issues in its performance.

The measures reported by this test are as follows:

| Measurement | Description | Measurement Unit | Interpretation |
|---|---|---|---|
| Filtering_threads | Indicates the current number of filtering threads in the system. | Number | |
| Idle_threads | Indicates the number of threads that are currently waiting for documents. | Number | These threads are not currently doing any work and will eventually be terminated. If you consistently have more than Max Threads/Hosts idle threads, you can schedule an additional crawl. If this number is 0, then you are starved. Do not schedule another crawl in this time period, and analyze the durations of your crawls during this time to see whether they are meeting your freshness goals. If your goals are not being met, you should reduce the number of crawls. |
| Network_threads | Indicates the number of threads that are waiting for a response from the filter process. | Number | If this measure shows no activity and its value is equal to that of the Filtering_threads measure, it indicates a network issue or the unavailability of the server being crawled. |
| Committing_threads | Indicates the number of threads that are committing transactions. | Number | |
| Plugin_threads | Indicates the number of threads currently waiting for plug-ins to complete an operation. | Number | These threads have the filtered documents and are processing them in one of several plug-ins. This is when the index and property store are created. |
| Loading_threads | Indicates the number of transactions that are loaded from the persisted crawl queue. | Number | |
| Link_processing_threads | Indicates the number of threads that are processing links. | Number | |
| Active_filter_processes | Indicates the number of filtering processes that are active in the system. | Number | |
| Filter_objects | Indicates the number of filter objects in the system. | Number | |
| Active_document_queue | Indicates the number of documents that are waiting for robot threads. | Number | If the value of this measure is 0, it implies that all the threads are filtering threads. |
| Admin_clients | Indicates the number of currently connected admin clients. | Number | |
| Performance_level | Indicates the amount of resources that the Gatherer service is allowed to use. | Number | |
| Current_servers | Indicates the number of servers that were recently accessed by the system. | Number | |
| Unavailable_servers | Indicates the number of servers that are currently unavailable to the system. | Number | A server becomes unavailable if requests made to it time out. |
| Stemmers_cached | Indicates the number of cached stemmer instances in the system. | Number | Stemmers are components shared by the search and indexing engines that generate inflected forms of a word. Too many cached stemmer instances may indicate a resource usage problem. |
| System_IO_rate | Indicates the rate of system I/O (disk) traffic detected during the back-off period. | KB/sec | During a back-off period, indexing is suspended. To manually back off the gatherer service, pause the search service. If the search service itself generates the back-off, an event is recorded and the search service is paused automatically. There is no automatic restart, so you must manually start the search service to end a back-off state. Note that there is little reason to start the search service until you have solved the problem that caused the back-off in the first place. |
| Time_outs | Indicates the number of timeouts detected by the system during the last measurement period. | Number | Ideally, this value should be zero. |
| Documents_filtered_rate | Indicates the rate at which documents are filtered in the system. | KB/sec | If this rate decreases over time, troubleshoot why your server is not filtering documents. Look for memory issues, processor issues, network issues, or site hit frequency rules that slow the gatherer process. |
| Successful_filter_rate | Indicates the rate at which documents are filtered successfully in the system. | KB/sec | |
| Delayed_documents | Indicates the number of documents that are currently delayed due to site hit frequency rules. | Number | If you have a large number of rules and this number is steadily increasing over time, consider relaxing or simplifying your site hit frequency rules. A very high number may indicate a conflict in the rules that the gatherer cannot resolve or follow efficiently. |
| Documents_in_memory | Indicates the number of document entries that are currently available in the memory of the system. | KB/sec | If the value of this measure is 0, it indicates that the indexing activity has been stopped. |
| Documents_filtered | Indicates the total number of documents filtered in the system during the last measurement period. | Number | |
| Docs_successful_filtered | Indicates the total number of documents that were successfully filtered in the system during the last measurement period. | Number | If the value of this measure is less than the value of the Documents_filtered measure, use the gatherer logs to find out why some documents failed filtering. |
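
The interpretation guidance in the table can be turned into simple checks. The following is a minimal sketch, not part of the eG product: the functions `interpret` and `steadily`, the dictionary layout, and the sample values are all hypothetical. Two simplifications are assumed: the Network_threads rule is reduced to a non-zero value equal to Filtering_threads (the "no activity" condition needs history and is not checked), and "steadily" is taken to mean strictly increasing or decreasing across a window of readings.

```python
def interpret(m):
    """Return advisory notes for one sample of measures (dict keyed by measure name)."""
    notes = []
    if m.get("Idle_threads") == 0:
        notes.append("Idle_threads is 0: starved; do not schedule another crawl.")
    net = m.get("Network_threads")
    # Simplified rule (assumption): non-zero and equal to Filtering_threads.
    if net and net == m.get("Filtering_threads"):
        notes.append("Network_threads equals Filtering_threads: "
                     "check the network and the server being crawled.")
    if m.get("Unavailable_servers", 0) > 0:
        notes.append("Unavailable_servers > 0: requests to some servers timed out.")
    if m.get("Time_outs", 0) > 0:
        notes.append("Time_outs is non-zero: ideally this value is zero.")
    if m.get("Documents_in_memory") == 0:
        notes.append("Documents_in_memory is 0: indexing activity has stopped.")
    filtered = m.get("Documents_filtered")
    ok = m.get("Docs_successful_filtered")
    if filtered is not None and ok is not None and ok < filtered:
        notes.append(f"{filtered - ok} document(s) failed filtering: "
                     "check the gatherer logs.")
    return notes

def steadily(values, increasing=True):
    """True if each reading is strictly above (or below) the previous one."""
    pairs = list(zip(values, values[1:]))
    if not pairs:
        return False  # a single reading shows no trend
    if increasing:
        return all(b > a for a, b in pairs)
    return all(b < a for a, b in pairs)

# Example with made-up readings.
sample = {
    "Idle_threads": 0, "Filtering_threads": 4, "Network_threads": 4,
    "Unavailable_servers": 0, "Time_outs": 2, "Documents_in_memory": 35,
    "Documents_filtered": 120, "Docs_successful_filtered": 110,
}
for note in interpret(sample):
    print(note)

# A falling Documents_filtered_rate or a rising Delayed_documents calls for troubleshooting.
print(steadily([40, 31, 25, 12], increasing=False))  # True
print(steadily([3, 8, 8, 15]))                       # False
```

In practice the thresholds and window length would be tuned to the crawl schedule; the sketch only shows how the single-sample rules and the trend-based rules differ.
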