This week’s question comes from Xaris, who asks:
“Why, even though I have correctly composed and linked the sitemap to a client’s website, and I have checked everything, am I having indexing problems with some articles, but not all of them, even after repeated requests to Google and Google Search Console? What could be the problem? I can’t figure it out.”
This is far from a unique problem; we’ve all experienced it! “I’ve done everything I can think of, but Google still isn’t indexing my pages.”
Is It Definitely Not Indexed?
The very first thing to check is whether the page is truly not indexed, or simply isn’t ranking well.
A page can appear to be unindexed simply because you can’t find it for the keywords you consider relevant. That doesn’t mean it isn’t in Google’s index; a quick site: search for the exact URL (e.g., site:example.com/my-article) will confirm whether Google has it.
For the purposes of this question, I’m going to give you advice on how to deal with both circumstances.
What Could Be The Issue?
There are many reasons a page might not be indexed by Google, or might not rank well. Let’s discuss the main ones.
Technical Issue
There are technical reasons, both mistakes and conscious decisions, that could be stopping Googlebot from reaching your page and indexing it.
Bots Blocked In Robots.txt
Google needs to be able to reach a page’s content if it is to understand the value of the page and ultimately serve it as a search result for relevant queries.
If Googlebot is blocked from visiting these pages via the robots.txt, that could explain why it isn’t indexing them.
Technically, Google can still index a page it can’t access, but it won’t be able to determine the content of the page and will have to rely on external signals, such as backlinks, to judge its relevance.
Even if Google knows a page exists via the sitemap, being unable to crawl it makes the page unlikely to rank.
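If you want to rule this out quickly, you can test a URL against your live robots.txt rules with Python’s standard library. A minimal sketch; the domain and article path are placeholders for your own:

```python
from urllib import robotparser

# Load and parse the live robots.txt (example.com is a placeholder).
parser = robotparser.RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()

# can_fetch() returns False if any rule blocks this user-agent from the URL.
page = "https://www.example.com/blog/my-unindexed-article/"
print("Googlebot allowed:", parser.can_fetch("Googlebot", page))
```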
Page Can’t Be Rendered
In a similar way, if the bot can crawl the page but it can’t render the content, it might choose not to index it. It will certainly be unlikely to rank the page well as it won’t be able to read the content of the page.
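One rough check here is to fetch the raw HTML, as a crawler does before rendering, and see whether your main content is present without any JavaScript execution. A minimal sketch, assuming the requests library is installed; the URL and phrase are placeholders you would swap for a real page and a sentence from its body:

```python
import requests

# Placeholders: a page you're diagnosing and a sentence from its main content.
url = "https://www.example.com/blog/my-unindexed-article/"
phrase = "a distinctive sentence from the article body"

html = requests.get(url, timeout=10).text

# If the phrase is absent from the raw HTML, the content is probably
# injected by JavaScript, so indexing depends on rendering succeeding.
if phrase.lower() in html.lower():
    print("Content is present in the raw HTML.")
else:
    print("Content missing from raw HTML - it may only appear after rendering.")
```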
Page Has A No-Index Tag
An obvious, but often overlooked, issue is that a noindex tag has been applied to the page. This will literally instruct Googlebot not to index the page.
This is a directive: an instruction Googlebot is committed to obeying, rather than a hint it may choose to ignore.
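The directive can live in two places: a robots meta tag in the HTML, or an X-Robots-Tag HTTP header. A quick sketch that checks both (the URL is a placeholder, and the meta check is a rough pattern match, so confirm by viewing the page source):

```python
import re
import requests

url = "https://www.example.com/blog/my-unindexed-article/"
response = requests.get(url, timeout=10)

# noindex can be sent as an HTTP response header...
header = response.headers.get("X-Robots-Tag", "")
if "noindex" in header.lower():
    print("noindex set via X-Robots-Tag header.")

# ...or declared in a robots meta tag in the <head>.
if re.search(r'<meta[^>]+noindex', response.text, re.I):
    print("Possible noindex meta tag - check the page source to confirm.")
```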
Server-Level Bot Blocking
There could be rules set at your server or CDN level that prevent Googlebot from crawling your site and discovering new pages.
This is quite a common issue when teams that aren’t well-versed in SEO are responsible for the technical maintenance of a website.
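A simple way to surface user-agent-based rules is to request the page twice, once with a browser-style user-agent and once with Googlebot’s. Treat the result as a hint only: some firewalls deliberately block spoofed Googlebot requests while allowing the real crawler, which is verified by reverse DNS. A sketch, with the URL as a placeholder:

```python
import requests

url = "https://www.example.com/blog/my-unindexed-article/"

# A 403/503 returned only for the Googlebot user-agent suggests a
# server- or CDN-level rule. (Some firewalls block *spoofed* Googlebot
# requests on purpose, so this is a hint, not proof.)
user_agents = {
    "Browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}
for name, ua in user_agents.items():
    status = requests.get(url, headers={"User-Agent": ua}, timeout=10).status_code
    print(f"{name}: HTTP {status}")
```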
Non-200 Server Response Codes
The pages you have added to the sitemap may be returning a server status code that signals to Googlebot that they aren’t available.
For example, if a page is returning a 4XX code, despite you being able to see the content on the page, Googlebot may decide it isn’t a live page and will not index it.
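Since the question started with a sitemap, a practical check is to fetch every URL in it and flag anything that doesn’t return a 200. A minimal sketch using requests and the standard sitemap namespace; the sitemap location is a placeholder:

```python
import requests
import xml.etree.ElementTree as ET

sitemap = "https://www.example.com/sitemap.xml"  # placeholder

# <loc> elements in the standard sitemap namespace hold the URLs.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(requests.get(sitemap, timeout=10).content)

# Flag anything that doesn't answer 200, since Googlebot may treat it as not live.
for loc in root.findall(".//sm:loc", ns):
    status = requests.get(loc.text, timeout=10).status_code
    if status != 200:
        print(f"{status}  {loc.text}")
```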
Slow Loading Page
It could be that your webpages are loading very slowly, which can diminish Google’s perception of their quality.
Slow responses also strain crawl budget: if pages take too long to load, Googlebot has to prioritize what it crawls, and your newer pages may not be crawled at all.
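You can get a rough sense of this with a timing check. requests records the time between sending a request and the response headers arriving, which approximates what a crawler experiences before content starts flowing. A sketch with a placeholder URL and an arbitrary two-second threshold for illustration:

```python
import requests

url = "https://www.example.com/blog/my-unindexed-article/"

# response.elapsed covers the span from sending the request to the
# response headers arriving - roughly the server's time to first byte.
response = requests.get(url, timeout=30)
seconds = response.elapsed.total_seconds()
print(f"Response time: {seconds:.2f}s")
if seconds > 2:  # arbitrary threshold for illustration
    print("Slow response - sluggish pages can eat into crawl budget.")
```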
Page Quality
There are also issues with the content of the website itself that could be preventing a page from being indexed.
Low Internal Links Suggesting Low-Value Page
One of the ways Google determines whether a page is worth ranking highly is through the internal links pointing to it. The links between pages on your website signal both what the linked-to page is about and whether it is an important part of your site. A page with few internal links may not seem valuable enough to rank well.
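If you want to quantify this, you can count how many of your known pages link to the one in question. A minimal sketch using the standard library plus requests; the page list and target URL are placeholders (in practice, you’d pull the page list from your sitemap):

```python
import requests
from html.parser import HTMLParser

# Placeholders: pages to scan and the page whose inlinks you're counting.
pages = ["https://www.example.com/", "https://www.example.com/blog/"]
target = "https://www.example.com/blog/my-unindexed-article/"

class LinkCollector(HTMLParser):
    """Collects every href found in an <a> tag."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [v for k, v in attrs if k == "href" and v]

inlinks = 0
for page in pages:
    collector = LinkCollector()
    collector.feed(requests.get(page, timeout=10).text)
    # Match absolute or root-relative hrefs pointing at the target.
    inlinks += sum(1 for href in collector.links
                   if href in (target, "/blog/my-unindexed-article/"))

print(f"Internal links pointing to the target: {inlinks}")
```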
Pages Don’t Add Value
One of the main reasons a page isn’t indexed by Google is that it isn’t perceived as being of high enough quality.
Google does not crawl and index every page it could; it prioritizes unique, engaging content.
If your pages are thin, or do not really add value to the internet, they may not be indexed even though they technically could be.
They Are Duplicates Or Near Duplicates
In a similar way, if Google perceives your pages to be exact or very near duplicate versions of existing pages, it may well not index your new ones.
Even if you have signaled that the page is unique by including it in your XML sitemap, and using a self-referencing canonical tag, Google will still make its own assessment as to whether a page is worth indexing.
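It is still worth confirming the canonical tag says what you think it does. A rough, regex-based sketch for a quick diagnostic (not a full HTML parse; the URL is a placeholder):

```python
import re
import requests

url = "https://www.example.com/blog/my-unindexed-article/"
html = requests.get(url, timeout=10).text

# Find the canonical <link> tag, then pull out its href,
# regardless of attribute order within the tag.
tag = re.search(r'<link[^>]*rel=["\']canonical["\'][^>]*>', html, re.I)
canonical = None
if tag:
    href = re.search(r'href=["\']([^"\']+)', tag.group(0), re.I)
    canonical = href.group(1) if href else None

print("Declared canonical:", canonical)
print("Self-referencing:", canonical == url)
```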
Manual Action
There is also the possibility that your webpage has been subject to a manual action, and that’s why Google is not indexing it.
For example, if the pages that you are trying to get Google to index are what it considers “thin affiliate pages,” you may not be able to rank them due to a manual penalty.
Manual actions are relatively rare and usually affect broader site areas, but it’s worth checking Search Console’s Manual Actions report to rule this out.
Identify The Issue
Knowing what could be the cause of your issue is only half the battle. Let’s look at how you could potentially narrow down the problem and then how you could fix it.
Check Bing Webmaster Tools
My first suggestion is to check if your page is indexed in Bing.
You may not be focusing much on Bing in your SEO strategy, but it is a quick way to determine whether this is a Google-focused issue, like a manual action or poor rankings, rather than something on your site that is preventing the page from being indexed.
Go to Bing Webmaster Tools and enter the page in its URL Inspection tool. From here, you will see if Bing is indexing the page or not. If it is, then you know this is something that is only affecting Google.
Check Google Search Console’s “Page” Report
Next, go to Google Search Console. Inspect the page and see if it is genuinely marked as not indexed. If it isn’t indexed, Google should give an explanation as to why.
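If you have many URLs to check, Search Console’s URL Inspection API lets you pull the same status programmatically. A minimal sketch using google-api-python-client, assuming you have already completed OAuth for a verified property and saved the token to token.json (the URLs are placeholders):

```python
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

# Assumes OAuth is already done and the authorized-user token is on disk.
creds = Credentials.from_authorized_user_file(
    "token.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
service = build("searchconsole", "v1", credentials=creds)

result = service.urlInspection().index().inspect(body={
    "inspectionUrl": "https://www.example.com/blog/my-unindexed-article/",
    "siteUrl": "https://www.example.com/",  # your verified property
}).execute()

status = result["inspectionResult"]["indexStatusResult"]
# coverageState carries statuses like "Crawled - currently not indexed".
print(status.get("coverageState"), "|", status.get("verdict"))
```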
For example, it could be that the page is:
Excluded By “Noindex”
If Google detects a noindex tag on the page, it will not index it. Under the URL Inspection tool results, it will tell you: “Page is not indexed: Excluded by ‘noindex’ tag.”
If this is the result you are getting for your pages, your next step will be to remove the noindex tag and resubmit the page to be crawled by Googlebot.
Discovered – Currently Not Indexed
The inspection tool might tell you: “Page is not indexed: Discovered – currently not indexed.”
If that is the case, you know for certain that it is an indexing issue, and not a problem with poor rankings, that is causing your page not to appear in Google Search.
Google explains that a URL appearing as “Discovered – currently not indexed” is:
“The page was found by Google, but not crawled yet. Typically, Google wanted to crawl the URL but this was expected to overload the site; therefore Google rescheduled the crawl. This is why the last crawl date is empty on the report.”
If you are seeing this status, there is a high chance that Google has looked at other pages on your website, deemed them not worth adding to the index, and is therefore not spending resources crawling these pages it knows about, because it expects them to be of similarly low quality.
To fix this issue, you need to signify a page’s quality and relevance to Googlebot. It is time to take a critical look at your website and identify if there are reasons why Google may consider your pages to be low quality.
For further details on how to improve a page, read my earlier article: “Why Are My Pages Discovered But Not Indexed?”
Crawled – Currently Not Indexed
If your inspected page returns a status of “Crawled – currently not indexed,” this means that Google is aware of the page, has crawled it, but doesn’t see value in adding it to the index.
If you see this status, your best move is to look for ways to improve the page’s quality.
Duplicate, Google Chose Different Canonical Than User
You may see an alert for the page you have inspected, which tells you this page is a “Duplicate, Google chose different canonical than user.”
This means Google sees the URL as a close duplicate of an existing page and is choosing that other page to display in the SERPs instead of the inspected one, despite you having correctly set a canonical tag.
The way to encourage Google to display both pages in the SERPs is to make sure each is unique and has sufficient content to be useful to readers.
Essentially, you need to give Google a reason to index both pages.
Fixing The Issues
Although your pages may not be indexed for any of several reasons, the fixes are all fairly similar.
It is likely that there is either a technical issue with the site, like an errant canonical tag or a robots.txt block, that has been preventing correct crawling and indexing of a page.
Or, there is an issue with the quality of the page, which is causing Google to not see it as valuable enough to be indexed.
Start by reviewing the potential technical causes. These will help you quickly identify whether this is a simple fix that you or your developers can make.
Once you have ruled out the technical issues, you are most likely looking at quality problems.
Depending on what you now think is causing the page to not appear in the SERPs, it may be that the page itself has quality issues, or a larger part of your website does.
If it is the former, consider E-E-A-T, uniqueness of the page in the scope of the internet, and how you can signify the page’s importance, such as through relevant backlinks.
If it is the latter, you may wish to run a content audit to help you narrow down ways to improve the overall perception of quality across your website.
Summary
Some investigation will be needed to identify whether your page is truly not indexed, or whether Google is just choosing not to rank it highly for queries you feel are relevant.
Once you have identified that, you can begin closing in on whether it is a technical or quality issue that is affecting your pages.
This is a frustrating issue to have, but the fixes are quite logical, and the investigation should hopefully reveal more ways to improve the crawling and indexing of your site.
More Resources:
- Website Indexing For Search Engines: How Does It Work?
- 13 Steps To Boost Your Site’s Crawlability And Indexability
- The Complete Technical SEO Audit Workbook
Featured Image: Paulo Bobita/Search Engine Journal