eG Monitoring
Measures reported by FSWebCrawColTest

The FAST Search Web crawler collects content from a set of defined Web sites, which can be internal or external. In many ways, it works like a Web browser downloading content from Web servers. But unlike a Web browser, which responds only to user input via mouse clicks or the keyboard, the FAST Search Web crawler works from a set of configurable rules it must follow when it requests Web items. These rules govern, for example, how long to wait between requests for items, and how long to wait before checking for new or updated items.

The main configuration concept in the FAST Search Web crawler is a "collection". Each crawl collection contains the configuration applicable to that particular collection, such as which start addresses and crawl rules to apply. A typical solution might have crawl collections such as Extranet or Blogs.

The FAST Search Web crawler starts by comparing the start URL list against the include and exclude rules specified as parameters in the XML file containing the configuration of a crawl collection. The start URL list is specified with either the start_uris or start_uri_files setting, and the rules via the include_domains and exclude_domains settings. Valid URLs are then requested from their Web servers at the request rate configured in the delay setting. If a Web item is fetched successfully, it is parsed for hyperlinks and other meta-information, usually by an HTML parser built into the FAST Search Web crawler. The item's meta-information is stored in the FAST Search Web crawler meta-database, and its content (the HTML body) is stored in the FAST Search Web crawler store. The hyperlinks are filtered against the crawl rules and used as the next set of URLs to be downloaded.
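The URL-filtering step described above can be illustrated with a minimal Python sketch. The setting names include_domains and exclude_domains mirror the crawl-collection configuration, but the matching logic shown here (host-suffix matching, with exclude rules taking precedence) is a simplifying assumption, not the crawler's actual rule engine:

```python
from urllib.parse import urlparse

def filter_urls(urls, include_domains, exclude_domains):
    """Keep only URLs whose host matches an include rule and no exclude rule.

    Illustrative sketch only; the real FAST Search Web crawler supports
    richer rule types than plain domain-suffix matching.
    """
    accepted = []
    for url in urls:
        host = urlparse(url).hostname or ""
        # Exclude rules take precedence over include rules (assumption).
        if any(host == d or host.endswith("." + d) for d in exclude_domains):
            continue
        if any(host == d or host.endswith("." + d) for d in include_domains):
            accepted.append(url)
    return accepted
```

In a full crawler, the same filter would be applied both to the start URL list and to every hyperlink extracted from fetched pages before it is queued for download.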
This process continues until all reachable content has been gathered, until the refresh interval (the refresh setting) elapses, or until another configuration parameter limiting the scope of the crawl is reached. To determine how efficiently the Web crawler functions, you need to understand the current load generated by each crawl collection in terms of the number and size of documents crawled per collection, and the speed at which these documents are downloaded. The FSWebCrawColTest test provides these insights and helps assess the Web crawler's efficiency. The measures made by this test are as follows:
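The per-collection load described above (document count, total size, and download rate) can be summarized as in this small Python sketch. The function name and its inputs are hypothetical and are not the actual counters exposed by FSWebCrawColTest; the sketch only shows how such measures relate to raw crawl data:

```python
def crawl_load(doc_sizes_bytes, elapsed_seconds):
    """Summarize crawl load for one collection: document count,
    total size in KB, and download rates (docs/sec, KB/sec).

    Hypothetical helper for illustration; inputs are a list of fetched
    document sizes in bytes and the elapsed crawl time in seconds.
    """
    count = len(doc_sizes_bytes)
    total_kb = sum(doc_sizes_bytes) / 1024.0
    docs_per_sec = count / elapsed_seconds if elapsed_seconds else 0.0
    kb_per_sec = total_kb / elapsed_seconds if elapsed_seconds else 0.0
    return {"docs": count, "total_kb": total_kb,
            "docs_per_sec": docs_per_sec, "kb_per_sec": kb_per_sec}
```

Comparing these rates across collections makes it easy to spot a collection whose crawl throughput has dropped relative to its configured request rate.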