Bing’s search crawler, bingbot, has been improved to maximize crawl efficiency.
Bingbot, as one might assume, is similar to Googlebot. Its job is to discover new and updated content and add it to Bing’s index.
One of the concerns Bing has been hearing from webmasters regarding bingbot is that it doesn’t crawl frequently enough.
As a result, some site owners believe content in Bing’s index isn’t as fresh as it could be.
On the other hand, there are site owners who believe bingbot crawls too often, which causes constraints on website resources.
Bing says this is an engineering problem that hasn’t been fully resolved yet.
Making Bingbot More Efficient
The key issue is managing the frequency that bingbot needs to crawl a site to ensure new and updated content is included in the search index.
Bingbot also has to serve the requests of site owners. Some webmasters ask to have their sites crawled daily, although the majority would prefer to have their site crawled only when new URLs have been added or existing content has been updated.
“The challenge we face, is how to model the bingbot algorithms based on both what a webmaster wants for their specific site, the frequency in which content is added or updated, and how to do this at scale.”
Measuring and Maximizing Crawl Efficiency
Bing measures the intelligence of bingbot using a metric called crawl efficiency: how often bingbot discovers new or freshly updated content per page crawled.
Ideally, bingbot would crawl a URL only when the content has first been added to the web, or when a URL has been updated with fresh and useful content.
Crawl efficiency is lowered when bingbot crawls duplicated and/or unchanged content.
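Bing has not published a formula for crawl efficiency, but the description above suggests something like the share of crawled pages that yielded new or updated content. A minimal sketch of that idea, with all names and data purely illustrative:

```python
# Hypothetical sketch of the "crawl efficiency" idea described above:
# the fraction of crawled URLs that yielded new or freshly updated content.
# Bing's actual metric is not public; this is an illustration only.

def crawl_efficiency(crawl_log):
    """crawl_log: list of dicts, each with a 'changed' flag per crawled URL."""
    if not crawl_log:
        return 0.0
    useful = sum(1 for entry in crawl_log if entry["changed"])
    return useful / len(crawl_log)

log = [
    {"url": "/a", "changed": True},   # new content discovered
    {"url": "/b", "changed": False},  # unchanged since last crawl
    {"url": "/c", "changed": False},  # duplicate content
    {"url": "/d", "changed": True},   # updated content
]
print(crawl_efficiency(log))  # → 0.5
```

Under this framing, every crawl of a duplicate or unchanged page drags the ratio toward zero, which is exactly the waste Bing describes.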
What This Means for Site Owners
Bing says it has improved the crawl efficiency of bingbot over the past few months.
For site owners, that means their new and updated content should appear in Bing’s index in a timely manner.
In addition, website resources should not be strained by bingbot crawling duplicate content, or content that hasn’t changed since the last crawl.