an issue that occurred recently was that a content source within our SSP for search (MOSS 2007) did not include any items. The crawl log of the SharePoint’s Central Administration stated the following:
The specified address was excluded from the index. The crawl rules may have to be modified to include this address. (The item was deleted because it was either not found or the crawler was denied access to it.)
Interestingly, some of the content sources we already had before were crawled without any obstacles, thus the (mis)configuration of the problematic application seemed suspicious. After checking the permissions of service accounts involved in the crawling process (not the cause), and after comparing the settings between the apps (not the cause as well) – the problem was in the crawl rules set up for this content source. The option for crawling complex URLs hasn’t been activated for the subdomain URL we wanted to crawl. Enabling the “Crawl complex URLs (URLs that contain a question mark (?))” option under Shared Services Administration: SSP > Search Administration > Crawl rules > Add or Edit Crawl Rule and starting the full crawl from the beginning solves the problem.
But still the question was, why the non-complex, normal URLs could not be crawled by the service. The cause was in our IIS configuration, which is globally set up to automatically detect cookie mode for session state. This results in appending a query string parameter to the URL at first request. So that the URL looks similar to this: http://www.ourdomain.com/index.html?AspxAutoDetectCookieSupport=1 .
Now it seems pretty clear why the crawler without the rule mentioned before had problems. It failed at the first request to the root URL, since the rule has not been met. Hence, it could not continue crawling and left the index empty with the error/warning message.
Hope this helps,