How to Check for Duplicate Content During an SEO Audit

Identifying duplicate content issues is a crucial part of your SEO audit. Here's what you need to check for and how to do it.

Chapter 5 - Duplicate Content

Different types of content issues can plague a site, from URL-based duplicates to literal duplicate content copied from page to page with few changes.

As if that weren’t enough, you have other WordPress-specific duplicate content issues to worry about, such as duplicate content on product pages and category pages.

Identify Duplicate Content Issues on Your Site Quickly

How to Check

Siteliner (made by Copyscape) can help you identify duplicate content issues on your site quickly.

It gives you an at-a-glance view of each page’s match percentage and which other pages it matches.

Screenshot: Siteliner duplicate content report
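The article doesn’t describe how Siteliner calculates its match percentage, so the following is only a rough sketch of how a page-to-page match percentage can be computed. It assumes you already have each page’s main text and uses word shingles (overlapping runs of words), a common near-duplicate technique; the 5-word shingle size and 50% threshold are arbitrary choices, and the sample pages are made up.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Split text into overlapping runs of `size` words ("shingles")."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def match_percentage(page_a: str, page_b: str, size: int = 5) -> float:
    """Percentage of page_a's shingles that also appear on page_b."""
    a, b = shingles(page_a, size), shingles(page_b, size)
    if not a:
        return 0.0
    return 100 * len(a & b) / len(a)


def find_duplicates(pages: dict, threshold: float = 50.0) -> list:
    """Return (url, matching_url, percent) for every pair at or above the threshold."""
    results = []
    for url in sorted(pages):
        for other in sorted(pages):
            if url != other:
                pct = match_percentage(pages[url], pages[other])
                if pct >= threshold:
                    results.append((url, other, round(pct, 1)))
    return results


pages = {
    "/red-widgets": "Our widgets are durable, affordable and ship free within two days.",
    "/blue-widgets": "Our widgets are durable, affordable and ship free within two days.",
    "/about": "We have been building widgets in Ohio since 1998.",
}
for url, other, pct in find_duplicates(pages):
    print(f"{url} matches {other}: {pct}%")
```

Comparing every page against every other page is fine for a small site; for thousands of pages you would want a tool like Siteliner or an indexed approach instead.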

Identify Which Pages of Your Site Were Duplicated Across the Web

How to Check

  • Use Copyscape to see which pages of your site have been duplicated across the web. Copyscape is considered one of the standard audit tools in SEO circles, and the private index feature of its premium service can help you identify duplicate content sitewide.
  • To cover all your bases, also search Google’s index for plagiarized copies of your site’s content. Select a section of text you want to check, paste it into Google’s search bar wrapped in quotation marks (for an exact-phrase search), and look for other sites where it appears.
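If you have many snippets to check, a small helper can build the search URLs for you. This is a sketch: it wraps the text in quotation marks for an exact-phrase search and can optionally add Google’s `-site:` operator to leave your own domain out of the results. The `example.com` domain and the sample sentence are placeholders.

```python
from urllib.parse import urlencode


def exact_match_search_url(snippet: str, exclude_domain: str = "") -> str:
    """Build a Google search URL that looks for `snippet` as an exact phrase."""
    phrase = " ".join(snippet.split())  # collapse line breaks and double spaces
    query = f'"{phrase}"'
    if exclude_domain:
        query += f" -site:{exclude_domain}"  # hide results from your own site
    return "https://www.google.com/search?" + urlencode({"q": query})


print(exact_match_search_url(
    "Our widgets are durable,\n affordable and ship free", "example.com"
))
```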

Check URLs for Duplicate Content

Identifying duplicate content isn’t just limited to text content on the page.

Checking for URLs that lead to duplicate content can also reveal issues that confuse Google when it crawls your site.
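One way to surface URL-based duplicates from a crawl export is to normalize each URL and group the variants that collapse to the same value. The sketch below assumes that the protocol, a `www.` prefix, trailing slashes, query-parameter order, fragments, and the listed tracking parameters never change what a page shows. That holds on many sites but not all, so treat the rules (and the hypothetical `TRACKING_PARAMS` list) as a starting point to adjust.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only track visits and never change page content.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "gclid", "fbclid", "sessionid",
}


def normalize_url(url: str) -> str:
    """Collapse common URL variants that usually serve the same content."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"  # path case is kept: paths can be case-sensitive
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    )
    # Force https and drop the #fragment, which is never sent to the server.
    return urlunsplit(("https", host, path, urlencode(query), ""))


def group_url_variants(urls) -> dict:
    """Return only the normalized URLs that more than one crawled URL maps to."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


crawled = [
    "https://example.com/shoes",
    "http://www.example.com/shoes/",
    "https://example.com/shoes?utm_source=newsletter",
    "https://example.com/hats",
]
print(group_url_variants(crawled))
```

Each group that comes back is a candidate for a redirect or a canonical tag pointing at one preferred URL.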

Check and investigate the following:

  • How recent content updates are.
  • Magnitude of content updates.
  • Historical trend of page updates.

How to Check

In Screaming Frog, scroll all the way to the right, and you’ll find a Last Modified column. This can help you:

  • Determine how recent content updates are and the magnitude of content updates on the site.
  • Develop historical trends of page updates.

If you’re obsessed with your competitors, you could go as far as performing a crawl on them every month and keeping this data on hand to determine what they’re doing.

This data is easy to keep updated and analyze in an Excel table, where you can spot historical trends in how competitors are developing their content.

This can be invaluable information.

Screenshot: Screaming Frog Last Modified column
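If you export the Last Modified values from each monthly crawl, a few lines of Python can turn them into a simple update trend. This sketch assumes the values use the standard HTTP `Last-Modified` date format (for example `Wed, 21 Oct 2015 07:28:00 GMT`); if your export formats dates differently, swap out the parser. Keep in mind that not every server sends this header, so blank values are skipped.

```python
from collections import Counter
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def updates_per_month(values) -> dict:
    """Count pages by the month they were last modified, e.g. {'2015-10': 2}."""
    months = Counter(
        parsedate_to_datetime(v).strftime("%Y-%m") for v in values if v.strip()
    )
    return dict(sorted(months.items()))


def days_since_update(value: str, now=None) -> int:
    """Whole days between a Last-Modified value and `now` (default: current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return (now - parsedate_to_datetime(value)).days


crawl = [
    "Wed, 21 Oct 2015 07:28:00 GMT",
    "Thu, 29 Oct 2015 12:00:00 GMT",
    "Mon, 02 Nov 2015 10:00:00 GMT",
    "",  # page whose server sent no Last-Modified header
]
print(updates_per_month(crawl))  # {'2015-10': 2, '2015-11': 1}
```

Running the same function on each month’s competitor crawl gives you one row per crawl for your Excel trend table.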

What to Check

  • Syndicated content.
  • Helpful supplementary content.

Understanding how content is segmented within a site, or syndicated, helps you separate a site’s original content from its syndicated content, especially when syndication is a major feature of the site.

This trick is especially useful for identifying thin content and creating custom filters for finding helpful supplementary content.
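The article doesn’t spell out the custom filter itself, so here is one hypothetical way to flag likely syndicated pages: syndicated copies often carry a `rel="canonical"` tag pointing to the original article on another domain. This sketch checks for that signal only. Syndication without a cross-domain canonical needs other rules, and a canonical to your own `www.` or non-`www.` twin would be flagged here too.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attrs.get("href")


def looks_syndicated(page_url: str, html: str) -> bool:
    """True if the page's canonical URL points to a different host."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonical:
        return False
    target = urljoin(page_url, finder.canonical)  # resolve relative canonicals
    return urlsplit(target).netloc.lower() != urlsplit(page_url).netloc.lower()


html = '<head><link rel="canonical" href="https://original-publisher.com/post"></head>'
print(looks_syndicated("https://example.com/post", html))
```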

Keyword Prominence

The above trick for creating custom filters can also help you identify keyword prominence – where the keyword appears in the first 100 words of a page’s content.
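Outside of Screaming Frog, the same check is easy to script against a page’s body text. This sketch treats a keyword as prominent when the whole phrase, in the same word order, ends within the first 100 words; the simple tokenizer is an assumption and ignores stemming and synonyms.

```python
import re


def tokenize(text: str) -> list:
    """Lowercase words; punctuation other than apostrophes and hyphens is dropped."""
    return re.findall(r"[\w'-]+", text.lower())


def keyword_position(text: str, keyword: str):
    """1-based word position where the exact keyword phrase starts, or None."""
    words, target = tokenize(text), tokenize(keyword)
    n = len(target)
    if n == 0:
        return None
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            return i + 1
    return None


def is_prominent(text: str, keyword: str, limit: int = 100) -> bool:
    """True if the full phrase appears within the first `limit` words."""
    start = keyword_position(text, keyword)
    return start is not None and start + len(tokenize(keyword)) - 1 <= limit


body = "Blue widgets are great. " + "filler " * 200 + "red widgets"
print(is_prominent(body, "blue widgets"), is_prominent(body, "red widgets"))
```

Because the match is order-sensitive, the same function doubles as a rough check on keyword word order.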

Keyword in H1, H2, H3 Tags

In Screaming Frog, click on the H1 tab to review the H1 tags, then click on the H2 tab to review the H2 tags.

There is no H3 tab, so set up a custom filter to identify H3 tags on the site.

Screenshot: Screaming Frog keywords in H tags
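To see what an H1 to H3 filter pulls out of a page, the sketch below uses Python’s built-in `html.parser` to collect heading text and return the headings that contain a keyword. It is a simplified stand-in, not Screaming Frog’s own extraction; the sample markup and keyword are made up.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of h1, h2 and h3 elements in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []  # list of (tag, text) pairs
        self._current = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)  # includes text inside nested tags like <em>

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())
            self.headings.append((tag, text))
            self._current = None


def headings_with_keyword(html: str, keyword: str) -> list:
    """Return (tag, text) for every H1-H3 heading containing the keyword."""
    collector = HeadingCollector()
    collector.feed(html)
    kw = keyword.lower()
    return [(tag, text) for tag, text in collector.headings if kw in text.lower()]


html = "<h1>Blue Widgets Guide</h1><h2>Why <em>blue widgets</em>?</h2><h3>Shipping</h3>"
print(headings_with_keyword(html, "blue widgets"))
```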

What to Check

  • Keyword word order.
  • Grammar and spelling.
  • Reading level.

Finding grammar and spelling problems during a site audit is painful, so it’s better to catch them before you publish. Doing so is a good step toward making sure your site is a solid performer.

If you aren’t a professional writer, use the Hemingway App to edit and write your content.

It can help identify major issues before you publish.

Screenshot: Hemingway App
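The article doesn’t say how Hemingway scores reading level, so as a stand-in, here is the Flesch-Kincaid grade-level formula: 0.39 × (words ÷ sentences) + 11.8 × (syllables ÷ words) − 15.59. The syllable counter is a crude vowel-group heuristic, so treat the scores as rough and compare pages against each other rather than trusting exact grades.

```python
import re


def count_syllables(word: str) -> int:
    """Crude estimate: count vowel groups, minus a likely-silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of `text` (higher means harder to read)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


simple = "The cat sat. The dog ran."
dense = ("Comprehensive evaluation methodologies necessitate "
         "considerable organizational investment.")
print(round(flesch_kincaid_grade(simple), 1), round(flesch_kincaid_grade(dense), 1))
```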

Number of Outbound Links

The number of outbound links on a page can interfere with a page’s performance.

SEOs have long held it as a best practice not to exceed 100 links per page.

Google has stated that it removed its guideline limiting pages to 100 links, but claims about outbound links still conflict.

John Mueller has stated that outbound links are not a ranking factor. Which is it?

It helps to look at case studies conducted by others for answers.

At least one study contradicts Mueller’s claim:

“The results are clear.

Outgoing relevant links to authoritative sites are considered in the algorithms and do have a positive impact on rankings.”

Context is important, because 100 outbound links on a page can be anything from 100 navigation links to 100 links assembled purely for a link farm.

The idea here is to audit the quality of those links as well as the quantity.

If you see something strange going on in terms of the quantities of links, it merits further investigation into both their quality and quantity.

If you want to perform a bonus check, you could always check this in Screaming Frog, although generally it isn’t required anymore.

How to Check

In Screaming Frog, after you identify the page you want to check outbound links on, click on the URL in the main window, then click on the Outlinks tab.

Screenshot: Screaming Frog Outlinks tab

Alternatively, you can click on Bulk Export > All Outlinks if you want a faster way to identify site-wide outbound links.

Screenshot: Screaming Frog Bulk Export > All Outlinks
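Once you have the All Outlinks export saved as CSV, you can count links per page and flag pages above the old 100-link rule of thumb for a closer quality review. The `Source` and `Destination` column names are assumptions based on Screaming Frog’s export; check them against your own file, since exports can differ between versions.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlsplit


def outlink_counts(csv_text: str, site_host: str):
    """Count all and external outlinks per source page from an outlinks CSV."""
    total, external = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        source, destination = row["Source"], row["Destination"]
        total[source] += 1
        # A different host counts as external; www and non-www count as different hosts.
        if urlsplit(destination).netloc.lower() != site_host.lower():
            external[source] += 1
    return total, external


def pages_over_limit(total: Counter, limit: int = 100) -> list:
    """Pages whose link count exceeds `limit`, worth a manual quality review."""
    return sorted(url for url, count in total.items() if count > limit)


sample = """Source,Destination
https://example.com/a,https://example.com/b
https://example.com/a,https://other.org/x
https://example.com/b,https://example.com/a
"""
total, external = outlink_counts(sample, "example.com")
print(total["https://example.com/a"], external["https://example.com/a"])  # 2 1
```

The count only tells you where to look; whether 150 links are navigation or a link farm still takes a human review.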

Number of Internal Links Pointing to Page

To identify the number of internal links pointing to a page, click on the URL in the main Screaming Frog window then click on the Inlinks tab.

You can also click on Bulk Export > All Inlinks to identify site-wide inlinks to all site pages.

Screenshot: Screaming Frog Inlinks tab
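The All Inlinks export can be summarized the same way. This sketch counts unique linking pages per destination, so a page linked twice from the same source (say, in the navigation and the body) counts once, and it lists pages with the fewest internal links first. The column names are again assumed, and pages with no inlinks at all won’t appear in a links export, so orphan pages need a separate check.

```python
import csv
import io
from collections import defaultdict


def unique_inlinks(csv_text: str) -> list:
    """Return (destination, unique linking pages) pairs, fewest inlinks first."""
    linking_pages = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        linking_pages[row["Destination"]].add(row["Source"])
    return sorted(
        ((dest, len(sources)) for dest, sources in linking_pages.items()),
        key=lambda pair: (pair[1], pair[0]),
    )


sample = """Source,Destination
https://example.com/,https://example.com/pricing
https://example.com/,https://example.com/pricing
https://example.com/blog,https://example.com/pricing
https://example.com/,https://example.com/blog
"""
for dest, count in unique_inlinks(sample):
    print(count, dest)
```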

Quality of Internal Links Pointing to Page

Using the exported Excel document from the step where we bulk exported the links, it’s easier to judge the quality of internal links pointing to each page on the site:

Screenshot: Inlinks report

Broken Links

Identifying broken links in an SEO audit can help you find pages that are showing up as broken to Google, and will give you an opportunity to fix them before they become major issues.

How to Check

Once Screaming Frog has finished your site crawl, click on the Internal tab, select HTML from the Filter: dropdown menu, and sort the pages by status code.

This will organize pages in descending order so you can see all of the error pages before the live 200 OK pages.

In this check, we want to identify all of the 4xx client errors (such as 404s), 5xx server errors, and other page errors.

Depending on context, it can be safe to ignore some 4xx errors and let those pages drop out of Google’s index, especially if it has been a while and they no longer appear in it.

But if they are indexed and have been for a while, you’ll probably want to redirect them to the proper destination.

Screenshot: Screaming Frog status codes
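If you export the Internal tab instead of sorting in the interface, this sketch buckets URLs by status class. It assumes `Address` and `Status Code` columns; anything that isn’t a 2xx to 5xx code (such as a 0 that Screaming Frog can report when a request gets no response) lands in an “other” bucket.

```python
import csv
import io
from collections import defaultdict


def group_by_status(csv_text: str) -> dict:
    """Map '2xx'/'3xx'/'4xx'/'5xx'/'other' to the URLs in that status class."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row["Status Code"].strip()
        if len(code) == 3 and code.isdigit() and code[0] in "2345":
            key = code[0] + "xx"
        else:
            key = "other"
        groups[key].append(row["Address"])
    return dict(groups)


def error_urls(groups: dict) -> list:
    """4xx and 5xx URLs, the ones this check is looking for."""
    return groups.get("4xx", []) + groups.get("5xx", [])


sample = """Address,Status Code
https://example.com/,200
https://example.com/old-page,404
https://example.com/api,503
https://example.com/blocked,0
"""
groups = group_by_status(sample)
print(error_urls(groups))
```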

Affiliate Links

If the goal of your audit is to identify and remove affiliate links from an affiliate-heavy website, then the next tip is a good path to follow.

How to Check

Affiliate links tend to share a common referral parameter or URL pattern that is identifiable across many different websites.

Utilizing a custom filter can help you find these links.

In addition, you can use conditional formatting in Excel to highlight affiliate links and see where they appear in the bulk exports from Screaming Frog.
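The pattern matching that a custom filter or conditional formatting rule performs can also be sketched in Python. The patterns below are hypothetical examples (an Amazon-style `tag=` parameter, generic `affid`/`aff_id` parameters, and cloaked `/go/` or `/recommends/` redirect folders); replace them with the ones your site actually uses.

```python
import re

# Hypothetical patterns: swap in the affiliate parameters and redirect folders your site uses.
AFFILIATE_PATTERNS = [
    re.compile(r"[?&]tag=", re.IGNORECASE),  # Amazon-style tracking tag
    re.compile(r"[?&](aff|affiliate)_?id=", re.IGNORECASE),  # generic affiliate IDs
    re.compile(r"/(go|recommends)/", re.IGNORECASE),  # cloaked redirect folders
]


def is_affiliate(url: str) -> bool:
    return any(pattern.search(url) for pattern in AFFILIATE_PATTERNS)


def find_affiliate_links(links) -> list:
    """Filter (source_page, destination_url) pairs down to affiliate destinations."""
    return [(source, dest) for source, dest in links if is_affiliate(dest)]


links = [
    ("https://example.com/review", "https://www.amazon.com/dp/B000000000?tag=examplesite-20"),
    ("https://example.com/review", "https://example.com/go/widget-deal"),
    ("https://example.com/review", "https://example.com/blog?page=2"),
]
for source, dest in find_affiliate_links(links):
    print(source, "->", dest)
```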

URL Length

To identify URLs over 115 characters in Screaming Frog, click on the URL tab, click on Filter then click on Over 115 Characters.

This will give you all the URLs on-site that are more than 115 characters and can help you identify issues with overly long URLs.

Screenshot: Screaming Frog URL length filter
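If you’re working from a plain URL list rather than the Screaming Frog interface, the same 115-character filter takes one line. This sketch also sorts the longest URLs first; the 115 default simply mirrors Screaming Frog’s filter rather than any official limit.

```python
def long_urls(urls, limit: int = 115) -> list:
    """(length, url) pairs for URLs longer than `limit` characters, longest first."""
    return sorted(((len(url), url) for url in urls if len(url) > limit), reverse=True)


urls = [
    "https://example.com/blue-widgets",
    "https://example.com/" + "a" * 100,
]
for length, url in long_urls(urls):
    print(length, url)
```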

Page Category

For a high-level overview of page categories, it’s useful to identify the top pages of the site via Screaming Frog’s site structure section, located on the far right of the spider tool.

How to Check

Using the site structure tab, you can identify the top URLs on the site, as well as which categories they fall into. In addition, you can identify page response time issues in the response times tab.

Screenshot: Screaming Frog page categories
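For a rough, scriptable version of the site structure view, you can group crawled URLs by their first path segment. This assumes the site’s categories live in the first folder of the URL (such as `/blog/` or `/shop/`), which is common but not universal; standalone top-level pages like `/about` will each show up as their own small group.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_categories(urls, n: int = 5) -> list:
    """Most common first path segments, as (category, page_count) pairs."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        counts[segments[0] if segments else "(home)"] += 1
    return counts.most_common(n)


urls = [
    "https://example.com/blog/a",
    "https://example.com/blog/b",
    "https://example.com/blog/c",
    "https://example.com/shop/x",
    "https://example.com/shop/y",
    "https://example.com/",
]
print(top_categories(urls, n=3))
```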

Image Credits

Featured Image: Paulo Bobita
All screenshots taken by author

VIP CONTRIBUTOR Brian Harnish Senior SEO Analyst at Bruce Clay, Inc.

Brian has been doing SEO since before it was called SEO, back in the days of 1998. Back then, SEO ...

How to Do an SEO Audit: The Ultimate Checklist