Fabrice Canel, Principal Program Manager at Bing, updated the Bing Webmaster Blog this week with six tips for sitemaps, including some advice specifically for large websites.
If you don’t have a sitemap yet, Bing recommends checking whether your website or CMS can generate one for you, or installing a sitemap plugin to do it.
If you would rather develop your own sitemaps, Bing suggests the following best practices.
Bing’s 6 Best Practices For Sitemaps
- First, follow the sitemaps reference at www.sitemaps.org. Common mistakes include treating HTML sitemaps as XML Sitemaps, malformed XML Sitemaps, XML Sitemaps that are too large (the maximum is 50,000 links and 10 megabytes uncompressed), and links in sitemaps that are not correctly encoded.
- Have relevant sitemaps linking to the most relevant content on your sites. Avoid duplicate links and dead links: a best practice is to generate sitemaps at least once a day, to minimize the number of broken links in sitemaps.
- Select the right format:
  - Use an RSS feed to list, in real time, all new and updated content posted on your site during the last 24 hours. Avoid listing only the 10 newest links on your site: search engines may not visit the RSS feed as often as you want and may miss new URLs. (The feed can also be submitted inside Bing Webmaster Tools as a Sitemap option.)
  - Use XML Sitemap files and a sitemap index file to generate a complete snapshot of all relevant URLs on your site daily.
- Consolidate sitemaps: avoid too many XML Sitemaps and too many RSS feeds per site. Ideally, have only one sitemap index file listing all relevant sitemap files and sitemap index files, and only one RSS feed listing the latest content on your site.
- Use sitemap properties and RSS properties as appropriate.
- Tell search engines where your sitemap XML URLs and RSS URLs are located by referencing them in your robots.txt files or by publishing the location of your sitemaps in search engines’ Webmaster Tools.
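As a rough illustration of the practices above, here is a minimal Python sketch that removes duplicate URLs, splits them into files of at most 50,000 links, escapes them for XML, and builds a sitemap index plus the robots.txt line that points to it. The function names and the example.com URLs are hypothetical, and the sketch does not check the uncompressed file size limit.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # link limit cited in the list above
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def _xml(url):
    """Escape &, <, >, and both quote characters so the URL is valid XML text."""
    return escape(url, {"'": "&apos;", '"': "&quot;"})


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Drop duplicate URLs (keeping first-seen order) and split into chunks."""
    unique = list(dict.fromkeys(urls))
    return [unique[i:i + size] for i in range(0, len(unique), size)]


def build_sitemap(urls):
    """Return one XML Sitemap document listing the given URLs."""
    entries = "\n".join(f"  <url><loc>{_xml(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{entries}\n</urlset>\n'
    )


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document pointing at each sitemap file."""
    entries = "\n".join(
        f"  <sitemap><loc>{_xml(u)}</loc></sitemap>" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}\n</sitemapindex>\n'
    )


def robots_txt_line(index_url):
    """Return the robots.txt directive that advertises the sitemap index."""
    return f"Sitemap: {index_url}"


if __name__ == "__main__":
    # Hypothetical URLs; a real site would pull these from its CMS or database.
    pages = ["https://example.com/", "https://example.com/?a=1&b=2"]
    for chunk in chunk_urls(pages):
        print(build_sitemap(chunk))
    print(build_sitemap_index(["https://example.com/sitemap-1.xml"]))
    print(robots_txt_line("https://example.com/sitemap-index.xml"))
```

Running a script like this from a daily scheduled job would also cover the advice to regenerate sitemaps at least once a day.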
Advice For Large Websites
Some sites are really large these days, with millions of URLs. Bing suggests thinking about how many of those URLs you really need in your sitemap.
In general, search engines will not crawl and index millions or billions of URLs from one site. Bing says it’s highly preferable that you link only to the most relevant web pages. That way, at least your relevant web pages are discovered, crawled and indexed.
Another best practice Bing offers to help ensure that search engines discover all the links on a very large website is to manage two sets of sitemap files.
For example, update sitemap set A on day one, update sitemap set B on day two, and continue alternating between A and B. Use a sitemap index file to link to Sitemaps A and Sitemaps B, or have two sitemap index files, one for A and one for B.
The reasoning behind this method is to give search engines enough time (24 hours) to download a set of sitemaps that has not been modified. This will help ensure that search engines have discovered all your site’s URLs in the past 24 to 48 hours.
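One possible way to implement the A/B alternation is to pick the set from the calendar date, so that each set stays untouched for a full day between updates. The Python sketch below assumes a daily job; the function names and the file naming scheme are hypothetical and are not something Bing prescribes.

```python
from datetime import date, timedelta


def set_to_update(day):
    """Return 'A' or 'B', switching on every consecutive calendar day."""
    # toordinal() increases by exactly 1 per day, so its parity alternates daily.
    return "A" if day.toordinal() % 2 == 0 else "B"


def sitemap_filenames(set_name, count):
    """Return hypothetical file names for the sitemap files in one set."""
    return [f"sitemap-{set_name.lower()}-{i}.xml" for i in range(1, count + 1)]


if __name__ == "__main__":
    today = date.today()
    for offset in range(4):
        day = today + timedelta(days=offset)
        chosen = set_to_update(day)
        # Only the chosen set is regenerated; the other set is left unmodified.
        print(day.isoformat(), "regenerate", sitemap_filenames(chosen, 2))
```

Both sets can be listed in a single sitemap index file, or each set can have its own index file.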
For full details, see this post on Bing’s Webmaster Blog.