SEO for Bing gets little discussion because there’s not a lot of information about it. The funny thing is that Bing used many cutting-edge technologies and techniques before Google did. Fabrice Canel, Principal Program Manager at Bing, recently shared a load of information with Jason Barnard of Kalicube about how Bing works and, more generally, how search engines work.
Criteria for Indexing Content
Fabrice is in charge of Bingbot Crawler, URLs Discovery and Selection, Document processing, and Bing Webmaster Tools. He’s a good person to turn to for information about search engines, particularly crawling and page selection.
Here Fabrice describes the crawling process. The important takeaway, I feel, is that Bing is picky about what it chooses to index.
A lot of people feel that every page of their site deserves a chance to get ranked. But neither Google nor Bing indexes everything.
They tend to leave behind certain kinds of pages.
The first characteristic of a page Bing would want to index is a page that is useful.
Fabrice Canel explained:
“We are business-driven obviously to satisfy the end customer but we have to pick and choose.
We cannot crawl everything on the internet there is an infinity number of URLs out there.
You have pages with calendars. You can go to next day forever.
So it’s really about detecting what is the most useful to satisfy a Microsoft Bing customer.”
Bing and Key Domains
Fabrice next talks about the concept of key domains and how key pages on the Internet guide Bing toward quality content.
This sounds like an algorithm that starts from a seed set of trusted sites, where the farther a site is in link distance from those key websites, the likelier it is to be spam or useless (link distance ranking algorithms).
I don’t want to put words into Fabrice’s mouth, the above is just my observation.
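To make the link distance idea concrete, here is a minimal sketch of how distance from a seed set could be computed with a breadth-first search. This illustrates the general concept only and is not a description of how Bing works. The link graph, the domain names, and the function name `link_distances` are all made up for the example.

```python
from collections import deque


def link_distances(graph, seeds):
    """Multi-source breadth-first search.

    Returns the smallest number of link hops from any seed page
    to every page that can be reached by following links.
    """
    distances = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in distances:  # first visit is the shortest path
                distances[target] = distances[page] + 1
                queue.append(target)
    return distances


# Hypothetical link graph: each key links out to the pages in its list.
graph = {
    "trusted-news.example": ["university.example", "blog.example"],
    "university.example": ["blog.example"],
    "blog.example": ["shop.example"],
    "shop.example": ["calendar.example/next-day"],
    "link-farm.example": ["shop.example"],
}
seeds = ["trusted-news.example"]  # assumed seed set of trusted pages

distances = link_distances(graph, seeds)
for page, hops in sorted(distances.items(), key=lambda item: item[1]):
    print(hops, page)
```

Pages that no seed links to, directly or indirectly, never receive a distance at all. In this toy graph that is the case for `link-farm.example`, which links out but is not linked to from the trusted side.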
I’ll let Fabrice speak for himself.
Jason asked:
“Would you say most content on the web is not useful or is that exaggerating?”
Fabrice answered:
“I think it’s a little bit exaggerated.
We are guided by key pages that are important on the internet and we follow links to understand what’s next.
And if we really focus on these key domains (key pages), then this is guiding us to quality content.
So the view that we have of the internet is not to go deep forever and crawl useless content.
It’s obviously to keep the index fresh and comprehensive, containing all of the most relevant content on the web.”
What Makes Bing Crawl Deep into Websites
Jason next asks about websites that get crawled deeply. Obviously, getting a search engine to index all of the pages of a site is important.
Here is how Jason framed the question:
“Right. And then I think that’s the key. You prefer going wide and going deep.
So if I have a site that’s at the top of the pile, you will tend to focus more on me than on trying to find new things that you don’t already know about?”
Fabrice provided a nuanced answer, reflecting the complicated nature of what gets chosen for crawling and indexing:
“It depends. If you have a site that is specialized and covers an interesting topic that customer cares about then we may obviously go deep.”
Machines Choose What to Crawl
We sometimes anthropomorphize search engines by saying things like “The search engine doesn’t like my site.”
But in reality there’s nothing in the algorithms that is about liking or trusting.
Machines don’t like.
Machines don’t trust.
Search engines are machines that are essentially programmed with goals.
Fabrice explains how Bing chooses whether or not to crawl deep:
“This is not me selecting where we go deep and not deep. Nor is it my team.
This is the machine.
Machine learning that is selecting to go deep or deeper based on what we feel is important for a Bing customer.”
That part about what is important for the customer is something to take note of. The search engine, in this case Bing, is tuned to identify pages that are important to customers.
When writing an article or even creating an ecommerce page, it might be useful to look at the page and ask, “How can I make this page important for those who visit it?”
Jason followed up with a question to tease out more information about what is involved in selecting what’s important to site visitors.
“You’re just giving the machine the goals you want it to achieve?”
Fabrice answered:
“The main input we give the Machine Learning algorithms is satisfying Bing customers.
And so we look at various dimensions to satisfy Bing customers.
Again, if you query for Facebook. You want the Facebook link at the top position. You don’t want some random blogs speaking about Facebook.”
Search Crawling is Broken and In Need of an Update
Jason asks Fabrice why IndexNow is helpful.
Fabrice responds by describing how crawling works today and why this method of finding content to index, which is nearly thirty years old, is in need of an update.
The long-standing way of crawling, still in use today, is to visit a website and “pull” its data, even if the web pages haven’t changed.
Search engines have to keep visiting the entire indexed web to check if any new pages, sentences or links have been added.
Fabrice asserts that the way search engines crawl websites needs to change because there’s a better way to go about it.
He explained the fundamental problem:
“So the model of crawling is really to learn, to try to figure out when things are changing.
When will Jason post again? We may be able to model it. We may be able to try to figure it out. But we really don’t know.
So what we are doing is we are pulling and pulling and crawling and crawling to see if something has changed.
This is a model of crawling today. We may learn from links, but at the end of the day, we go to the home page and figure it out. So this model needs to change.”
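The cost of the pull model can be shown with a small simulation. The sketch below is my own illustration, not Bing’s crawler: it assumes a crawler that visits one page once a day and compares a hash of the content with what it saw last time. The snapshot data and the function names are invented for the example.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Hash the page content so two versions can be compared cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def poll_for_changes(snapshots):
    """Simulate a pull crawler.

    Every visit costs one fetch, whether or not the page changed.
    A new version is only discovered when the fingerprint differs
    from the one recorded on the previous visit.
    """
    fetches = 0
    versions_found = 0
    last_seen = None
    for content in snapshots:
        fetches += 1
        current = fingerprint(content)
        if current != last_seen:
            versions_found += 1
            last_seen = current
    return fetches, versions_found


# Hypothetical page contents on seven consecutive daily visits:
# the site publishes a second post on day five.
daily_snapshots = ["post 1"] * 4 + ["post 1, post 2"] * 3

fetches, versions_found = poll_for_changes(daily_snapshots)
print(f"{fetches} fetches to discover {versions_found} versions")
```

In this toy week the crawler spends seven fetches to learn about two versions of the page. The other five visits return nothing new, which is the waste Fabrice is describing.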
Fabrice next explained the solution:
“We need to get input from the website owner Jason and Jason can tell us via a simple API that the website content has changed, helping us to discover this change – to be informed of a change, to send the crawler and to get latest content.
That’s an overall industry shift from crawling and crawling and crawling and crawling to discover if something has changed…”
The Present State of Search
Google tends to call the people who use its site users. Bing frames the people who search as customers, and with that come all of the little aphorisms implicit in a customer-first approach, such as “the customer is always right” and “give the customer what they want.”
Steve Jobs said this about customers and innovation, which relates a bit to Bing’s IndexNow but also to publishers:
“You can’t just ask customers what they want and then try to give that to them. By the time you get it built, they’ll want something new.”
The Future of Search is Push?
Bing has rolled out a new push technology called IndexNow. It’s a way for publishers to notify search engines to come crawl new or updated web pages. This saves hosting and data center resources in the form of electrical energy and bandwidth. It also makes it easier for publishers to know that the search engine will come and get the content sooner with a push method, rather than later as with the current crawl method.
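For publishers wondering what a push notification might look like in practice, here is a minimal sketch that builds an IndexNow submission without sending it. It is based on my reading of the public IndexNow protocol (a JSON POST with `host`, `key`, `keyLocation`, and `urlList` fields), so check the official documentation before relying on it. The host, the key value, and the key file location are placeholders, and the function name `build_indexnow_request` is my own.

```python
import json
import urllib.request

# Shared endpoint as I understand the IndexNow protocol; verify before use.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host, key, urls):
    """Build (but do not send) a POST request announcing changed URLs."""
    payload = {
        "host": host,
        "key": key,
        # Assumes the key file is hosted at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


request = build_indexnow_request(
    host="www.example.com",
    key="0123456789abcdef0123456789abcdef",  # placeholder, not a real key
    urls=["https://www.example.com/new-post"],
)
print(request.get_method(), request.full_url)
# To actually send the notification: urllib.request.urlopen(request)
```

The point of the design is that the site speaks only when something has changed, so the search engine no longer has to keep revisiting pages to find out.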
This is just a portion of what was discussed.
Watch the entire interview with Fabrice Canel