
Duplicate Content Session at SMX Advanced

Duplicate content and Buffy the Vampire Slayer. What do these have in common? They shed some light on the bizarre psyche of Google developers, but they were also at the heart of the Duplicate Content session at SMX Advanced.

Duplicate content in 60 seconds:

  • Determine whether your site is experiencing intentional or accidental duplicate content or both.
  • If intentional, block abusive IPs, detect user agents, block specific crawlers, add copyright information to the content, request the duplicate site remove the content or take legal action.
  • If accidental, control URLs through .htaccess rules, server-side 301 redirects, parameter or variable reduction, proper 404 pages and consistent linking strategies. Also, don’t duplicate pages in the secure and non-secure areas of your site.
  • If you still experience a problem, communicate with the search engines; they are proactively working on a solution but need examples and suggestions to better handle duplicate content.
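The URL-control fixes in that list can be sketched as Apache .htaccess rules. These are illustrative rules only, assuming Apache with mod_rewrite enabled, and `example.com` is a placeholder for your own hostname:

```apache
RewriteEngine On

# Force one canonical hostname: 301 the non-www host to the www host
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Collapse /index.html onto the directory root so only one URL collects links
RewriteCond %{THE_REQUEST} \ /([^?\ ]*/)?index\.html
RewriteRule ^(.*/)?index\.html$ /$1 [R=301,L]
```

Because these are server-side 301s, the engines consolidate link equity onto the surviving URL instead of splitting it across duplicates.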

After the You & A with Matt Cutts, Danny Sullivan moderated the organic session on duplicate content with the major search engines representin’ – the lovely Vanessa Fox (Product Manager from Google), Amit Kumar (Senior Engineering Manager from Yahoo! Search), Peter Linsey (Senior Product Manager for Search at Ask.com) and Eytan Seidman (Lead Program Manager of Live Search from Microsoft).

So, let’s dive in with some of the basics: What is duplicate content?

Intentional duplicate content = Content that is intentionally duplicated on either your or another website.

Accidental duplicate content = Content that is seen by the search engines as duplicate, but happens through passive or accidental methods.

Why is duplicate content an issue?

It fragments ranking signals, anchor text and other information across the duplicate URLs instead of consolidating them on the page you want to appear. It also impairs the user experience and consumes crawl resources.

How can you combat duplicate content?

It’s difficult for the search engines to decipher the canonical page of your site, so the best way to avoid accidental duplication is by controlling your content! You can do this in a variety of ways including:

  • Be consistent with your linking strategy both on-site and off (Jessica Bowman had an excellent article on this, “Should URLs in Links Use Index.html?”)
  • Reduce session parameters and variable tracking
  • Always deliver unique content even if the location isn’t unique
  • Use server-side 301 redirects rather than client-side redirects (meta refresh or JavaScript)
  • HTTP vs HTTPS – don’t duplicate the HTTP pages in a secure area
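A consistent linking strategy can also be enforced in code before URLs ever reach a page. Here is a minimal Python sketch (the function name and the parameter list are my own, illustrative choices) that strips session/tracking parameters and normalizes each URL to one canonical form:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that tend to create accidental duplicates (illustrative list)
TRACKING_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}

def canonicalize(url):
    """Return one canonical form of a URL: lowercased host,
    session/tracking parameters removed, /index.html collapsed."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    netloc = netloc.lower()
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    kept = [(k, v) for k, v in parse_qsl(query) if k.lower() not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))
```

Run every internal link through one function like this and the engines only ever see one URL per page, which is the whole point of a consistent linking strategy.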

As for intentional duplicate content, the options are limited but include:

  • Simply asking visitors not to steal content
  • Contact those that do steal your hard-earned content and ask that they remove it
  • Embed copyright or a creative commons notification in your content
  • Verify user-agents
  • Block unknown IP addresses from crawling the site
  • Block specific crawlers
  • If that doesn’t work, get the lawyers involved and go for blood
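Verifying user-agents, from the list above, usually means the double DNS check the engines recommend: reverse-resolve the claimed crawler IP, confirm the hostname belongs to the engine's domain, then forward-resolve the hostname and make sure it round-trips to the same IP. A Python sketch (the Googlebot hostname suffixes are the ones Google documents; the function names are mine):

```python
import socket

# Suffixes Google documents for genuine Googlebot reverse-DNS names
CRAWLER_DOMAINS = (".googlebot.com", ".google.com")

def hostname_is_google(hostname):
    """Check that a reverse-DNS name ends with a Google-owned crawler domain."""
    return hostname.rstrip(".").lower().endswith(CRAWLER_DOMAINS)

def verify_googlebot(ip):
    """Double DNS lookup: IP -> hostname -> IP must round-trip."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)            # reverse lookup
        if not hostname_is_google(hostname):
            return False
        return ip in socket.gethostbyname_ex(hostname)[2]    # forward lookup
    except (socket.herror, socket.gaierror, OSError):
        return False
```

Anything claiming to be Googlebot that fails this check is a scraper spoofing the user-agent string and is fair game for an IP block.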

A final note for both intentional and accidental duplicate content:

If you’ve located the source of a problem and made every attempt to rectify the situation, but it still isn’t resolved, contact the search engines. File a reinclusion request noting what happened, when, how you tried to fix the problem and where you find yourself today.

And now for some search engine specific advice:

Microsoft

- Consider whether duplicate content is adding value to your site

- If you’re the duplicator, be sure to give attribution

- Consider blocking local copies of pages with robots.txt
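Blocking a local duplicate copy with robots.txt, as Microsoft suggests, is just a Disallow rule over the duplicated path. The paths here are placeholders; point them at wherever your printer-friendly or archived copies actually live:

```
User-agent: *
Disallow: /print/
Disallow: /archive-copy/
```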

- There’s no such thing as a site-wide penalty

- Session parameter analysis occurs at crawl time

- Duplicates are also filtered when the site is crawled

- Technology exists to find near-duplicates; it ignores most markup and focuses on just the key concepts
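Near-duplicate detection of the kind Microsoft describes is commonly done with shingling: slide a window of w consecutive words over the markup-stripped text and compare the resulting sets with Jaccard similarity. A toy Python sketch of the idea (the window size and threshold are illustrative, not Microsoft's actual parameters):

```python
def shingles(text, w=4):
    """Set of w-word shingles from markup-stripped, lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicates(doc1, doc2, threshold=0.9):
    """Flag two documents as near-duplicates above a similarity threshold."""
    return jaccard(shingles(doc1), shingles(doc2)) >= threshold
```

Because the comparison runs on word shingles rather than raw HTML, two pages that differ only in templates or markup still score as near-duplicates, which matches the "ignores most markup" point above.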

Ask.com

- Duplicate content is not penalized.

- Templates are not considered for duplication, only the indexable content.

- Filter for high confidence, low tolerance on false positives.

Yahoo!

- Filters duplicates at crawl-time

- Less likely to extract links from duplicate pages

- Less likely to crawl new documents with duplicate pages

- Index-time filtering

- Less representation of duplicates when choosing crawled pages to put in index

- Legitimate forms of duplication include: newspapers, multiple languages, HTML/Word/PDF documents, partial duplication from boilerplates (navigation and common site elements)

- “Not found” error pages should return a 404 HTTP status code when crawled (returning a 200 instead isn’t abusive, but it makes crawling difficult)
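That last point is about soft 404s: if the error page returns a 200 status, every bad URL on the site becomes another duplicate of the same error page. A minimal WSGI sketch (the routes and page contents are made up) showing the status code being set correctly:

```python
# Hypothetical site: the only two real pages
PAGES = {"/": b"<h1>Home</h1>", "/about": b"<h1>About</h1>"}

def app(environ, start_response):
    """Serve known paths with 200; everything else gets a real 404 status."""
    path = environ.get("PATH_INFO", "/")
    if path in PAGES:
        start_response("200 OK", [("Content-Type", "text/html")])
        return [PAGES[path]]
    # A soft 404 would send "200 OK" here, turning every bad URL
    # into a crawlable duplicate of this error page.
    start_response("404 Not Found", [("Content-Type", "text/html")])
    return [b"<h1>Page not found</h1>"]
```

The friendly error copy can be anything you like; what matters to the crawler is the status line.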

Google

Vanessa threw a curve ball and decided not to duplicate presentations! Instead she requested feedback from the audience, but not before alienating anyone over the age of 30 with Buffy the Vampire Slayer metaphors.

And now it’s time for SEO to meet SMM.

Rhea Drysdale is Co-founder and CEO of Outspoken Media, which specializes in SEO consulting, link building, reputation management and social media. With more than seven years’ experience, Rhea has spoken at SMX, SES, Web 2.0 Expo, Pubcon, Blog World Expo and BlueGlass. She has also been featured on CNN.com, in the Wall Street Journal and in SEO: The Search Engine Optimization Bible as an industry insider.




5 thoughts on “Duplicate Content Session at SMX Advanced”

  1. I was at this session. If by “the lovely Vanessa Fox” you mean “airhead”, then you are right on. She may be lovely, but she added zero value.

  2. lol, wow. I think Vanessa humors SEOs because she can’t say too much, just my thought. Personally, I think the metaphors shed some good light into the duplicate content issue. Willow and Xander – you can have the good and bad sides of duplicate content. Take everything with a grain of salt and understand your position in your industry or company size knowing that there’s an ever-changing threshold.

  3. I do not really have the time to prevent duplication of my website’s content externally. However, I do a check every six months to ensure there are no illegally duplicated copies of my content across the web.