
About URL Tracking Parameters and Duplicate Content Issues

How do you deal with links that carry tracking parameters in the URL, either for referrer measurement in your analytics or for a partner program where the referring site receives some form of compensation for referred traffic and/or customers?

The webmaster's fear was about the duplicate URLs that are generated for the same page and the possible negative consequences in Google and other search engines as a result.

This question was asked by an attendee during the panel about SEO design and organic site structure on Wednesday at WebmasterWorld's "PubCon" 2007 in Las Vegas.

I was not 100% satisfied with the answers the panelists Mark Jackson, Paul Bruemer, Lyndsay Walker, Alan K'necht and moderator Todd Friesen from Range Online Media gave to this question, so I approached the attendee after the session to provide some tips and options for their particular problem.

Paid search tracking links don't create this problem in most cases, because those links are usually JavaScript-based or nofollowed and ignored by search engines. Partner links, on the other hand, are usually normal HTML links that are recognized and followed by search engines.

Duplicate Content Filter, NOT Penalty
First I would like to address the fear of possible penalties because of the duplicate content that is created as a result of those tracking links. Google and other search engines use duplicate content filters, not penalties, for duplicate pages within a single website. This means that the search engine will pick one URL for the page and suppress all others. In this example I am pretty sure that search engines will pick the URL without the URL parameters (for a number of reasons that I will not elaborate on here, because they are not relevant to this particular issue). This is different from the duplicate content issue across multiple domains, which is caused by canonical URL problems, redistribution of content or scraping.

Just to make sure, you might want to check manually whether any search engine returns the wrong version of the URL. A phrase search for the entire page title works best, in my opinion. A search for the URL with the tracking code might return a match, because, as stated before, the duplicate content filter is not a penalty that blocks content from being indexed. For this reason, a search for the URL does not tell you which URL the search engine chooses in normal search results.

Search engines might reduce the frequency or the number of pages they crawl if a site has excessive duplication issues, but the issues that cause that kind of throttling are different from the one discussed here.

What Are the Real Issues?
If you don't do anything, it will not be the end of the world, but it does create two problems.

The first problem is that “link juice” that should flow into one particular URL is wasted on a different one (the one with the tracking code).

The second, and probably more severe, problem is the possibility that a URL with the tracking code is returned for some queries and clicked by users, which skews your referrer tracking statistics, because visits that actually came from the search engines are recorded as referrals from a specific partner site.

You have three general options available. Actually four, but let me start with those three and elaborate on option 4 at the end of this post.

Option 1 – Block URLs
Block URLs with tracking code from search engine spiders. You can do this via the robots.txt file, but you have to be very careful that you do not exclude the URL without the tracking code as well. You can also exclude the URL programmatically, if the landing page is a dynamic script. You could add code to the page that sets the ROBOTS META tag in the HEAD section of the rendered HTML to NOINDEX if the page is called with a tracking parameter in the URL, and to INDEX if the call is made without the tracking parameter. Both methods ensure that only one URL for the page is indexed, but you lose the SEO benefit of the inbound links, because the PageRank or "link juice" flowing to the excluded pages is not counted and does not help your page rank better in the SERPs.
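As an illustration of the programmatic variant, here is a minimal sketch in classic ASP, assuming a placeholder tracking parameter named "partner" (the parameter name is an assumption for the example, not from any actual implementation):

    <%
    ' Emit NOINDEX when the page was requested with the tracking
    ' parameter (placeholder name "partner"), INDEX otherwise.
    Dim robotsValue
    If Request.QueryString("partner") <> "" Then
        robotsValue = "NOINDEX"
    Else
        robotsValue = "INDEX"
    End If
    %>
    <head>
    <meta name="ROBOTS" content="<%= robotsValue %>">
    </head>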

Option 2 – Redirect URLs
You redirect all requests for URLs with the tracking parameter to the URL without the tracking parameter via a 301 redirect. This can be accomplished via a mod_rewrite rule in the .htaccess file of your site if you use Apache as your web server, via a redirect rule in your rewrite software for Microsoft IIS (it does not come with IIS; it is separate software), or via code in the dynamic script of the landing page. This allows you to keep the SEO benefit of the link, but it might prevent the tracking of the referrer with your current tracking solution; more on that in a second.
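For the Apache route, a minimal sketch of such a mod_rewrite rule, again assuming a placeholder tracking parameter named "partner" and landing pages that need no other query parameters:

    # .htaccess: if the query string contains the tracking parameter
    # (placeholder name "partner"), 301 redirect to the same path
    # without any query string.
    RewriteEngine On
    RewriteCond %{QUERY_STRING} (^|&)partner= [NC]
    RewriteRule ^(.*)$ /$1? [R=301,L]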

Option 3 – Do Nothing
Leave everything as it is today and do nothing. As said before, it would not be the end of the world; it would only create the problems described above under "What Are the Real Issues?".

Notes on Options 1 and 2
If you decide on option 1, to eliminate the problem of having URLs with tracking code indexed by the search engines and to accept the loss of the value those inbound links provide, check out the free resources on robots.txt, META tags, .htaccess and 301 redirects available on my site at Cumbrowski.com.

I also have source code in classic ASP available for the detection and removal of URL parameters, followed by a 301 redirect to the URL without the parameter. The code examples can easily be translated into other scripting languages, such as PHP, Python, Perl or .NET.
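To give an idea of the approach, here is a minimal sketch in classic ASP. It assumes a placeholder tracking parameter named "partner", a placeholder domain, and a landing page that needs no other query parameters; it is not the actual code from my site:

    <%
    ' Detect the tracking parameter (placeholder name "partner") and
    ' 301 redirect to the same page without any query string.
    If Request.QueryString("partner") <> "" Then
        Response.Status = "301 Moved Permanently"
        Response.AddHeader "Location", "http://www.example.com" & Request.ServerVariables("SCRIPT_NAME")
        Response.End
    End If
    %>

A production version would preserve any other query parameters and read the host name from the request instead of hard-coding it.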

While the code option is the best one, because it allows possible tracking of the referrer while still performing a 301 redirect to benefit from the inbound link, that does not mean it is the only way for you to go.

Possible Tracking Issue with Redirection
You want to make sure that the tracking still works. This will be a problem if your current tracking is done via a third-party provider where you simply added a piece of JavaScript code to your HTML pages. The 301 redirect is a server-side redirect, so no HTML is rendered and the tracking code for the URL with the tracking parameter never executes. If the tracking solution is custom and supports server-side scripting to log the hit of the page with the tracking code prior to the 301 redirect, perfect.

A solution in between those two would be possible if the analytics provider allows the upload of custom tracking data into their system for processing and reporting. In this case, your programmer could write a simple logging script to record the hits prior to the redirect and, in addition, provide a tool to download those hits for upload into the analytics software.
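Such a logging step could be as simple as the following classic ASP sketch, which appends each hit to a plain text file before the redirect runs (the parameter name "partner" and the log file path are placeholders, not part of any existing solution):

    <%
    ' Append timestamp, tracking value and requested URL to a log file,
    ' then let the 301 redirect from the earlier example take over.
    Dim fso, logFile, trackValue
    trackValue = Request.QueryString("partner")
    If trackValue <> "" Then
        Set fso = Server.CreateObject("Scripting.FileSystemObject")
        Set logFile = fso.OpenTextFile(Server.MapPath("/logs/partner-hits.txt"), 8, True)
        logFile.WriteLine Now & vbTab & trackValue & vbTab & Request.ServerVariables("URL")
        logFile.Close
    End If
    %>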

Conclusion
Those are your options, and which one is right for you does not only depend on the technical abilities of your team and your analytics solution provider. You have to decide whether the gain from implementing any of the possible solutions outweighs the cost and effort needed to get it done. If the SEO benefit is only marginal and you don't expect that any of the affected pages would increase significantly in ranking, the whole ordeal of implementing the server-side tracking, URL parsing and redirecting might not be worth it. Blocking or excluding the URLs with the tracking code might be enough to ensure that your tracking stats are correct.

Option 4 – The Very Best Solution
The perfect solution would of course be to configure your analytics solution to report referrer traffic based on the referring website's URL, which eliminates the need for a special tracking parameter in the URL altogether.

This is not possible if the referring partner links to your site from an SSL-secured page (no referrer is passed in that case) or if the URL is also used in promotional emails sent by the partner on your behalf. If those two things are not a problem, this would be the way to go. It would avoid the duplicate URL issue and your page would get the SEO benefit of the link automatically.

I assumed for this post that this option had already been considered and ruled out for your tracking purposes.

I hope this answers some of the questions that people might have had regarding this subject and provides answers that allow you to make the right decisions about what to do or not to do in your specific case.

Cheers!

Carsten Cumbrowski
Affiliate Marketer, Internet Marketing Strategy Consultant, Blogger at ReveNews.com and Editor for SearchEngineJournal.com. More free resources for marketers are available at Cumbrowski.com.

Carsten Cumbrowski has years of experience in affiliate marketing and knows both sides of the business, as affiliate and as affiliate manager. Carsten has over 10 years of experience in web development and 20 years in programming and computers in general. He has a personal Internet marketing resources site at Cumbrowski.com. To learn more about Carsten, check out the "About" page at his website. For additional contact options see this page.


21 thoughts on “About URL Tracking Parameters and Duplicate Content Issues”

  1. Hello,
    Today I discovered that I have a few duplicate pages for my home page, with something that seems to be the AdWords tracking code.
    Something like gclid=CP780NXq3o8CFSO
    Is that possible?
    Could that be the reason for my site disappearing from the Google rankings, just to come back after a few days/weeks?
    This is like a cycle and I can't find the reason.
    I am not sure, because I can't find the allegedly duplicate page ranking low… what I do see is position 600 for the contact or about-us page instead of the home page.
    Thanks for any suggestion.

  2. I already did that after I discovered it.
    Question: how is it possible for that URL to be indexed?
    Just a big coincidence?
    Was GoogleBot on my site when someone clicked on the AdWords ad?
    And could this be the answer to my bigger problem?
    The site disappears from the home page for weeks, just to reappear on the same page after days or weeks, without my changing a thing.
    I suspected duplicate content with some directory listings and rewrote all the content, but no luck.
    Thanks

  3. There has been a big change at Google recently where it appears Google is trying to fix the issue on their end. Less than a month ago a large competitor of ours sent all of their ad links to a page that looks like

    /ads/default.aspx?id=asdf&another=asdf

    There were thousands of links to many different parameter versions. But they had one HUGE ad on a top-50 web property that was flowing PR, and the link with that particular parameter mix was ranking in the top 10 for a major KW based solely on this one ROS link.

    Then Google changed something and they started ranking the page

    /ads/default.aspx

    So it looks like they consolidated all the PR onto the one page from all affiliates.

    So Google seems to be trying to fix the issue themselves. Yahoo & MSN? Well you know how that goes.

  4. Yeah, that is a glitch, unless the same URL parameters are also used somewhere else.

    If they don't use JavaScript to hide the links, they should use nofollow. That is also what Google demands webmasters do with any "paid link" on their site (don't ask me what exactly a paid link is, because I don't know that either). Demanding something from webmasters that they don't do for their own ads would be hypocritical.

    Referate: As I stated early on in my post, this does NOT cause a penalty. It only triggers filters when the search results are returned, where search engines decide at query time which version of a page they show in the results and which versions they suppress.

  5. Nice thorough post on an issue that I have spoken much about in forums, newsgroups, and on SEOmoz. Fixing this problem has brought tremendous benefit for OneCall.com, mainly because most of our inbound traffic comes via our partners (in-house program). We still have duplicate content issues, but so far I have consolidated our catalog from about 200,000 pages down to 50,000 pages. Yes, we had that big of a problem.

    Where did it stem from? Affiliate tracking, Endeca URLs, Omniture URLs, promotion specific tracking.

    Now what are we doing? 301 redirects where we can. Using onclick events whenever possible and only putting the parameters in the onclick.

    The results are astronomical. ;-)

  6. Hi OneCallGuy,

    Yeah, in your example, Option 2 is certainly the way to go.

    The ASP sample code I referred to in the post was written for exactly this purpose, in addition to addressing the canonical URL issues and the problem of multiple domain names pointing to the same site, e.g. Domain.com and Domain.net etc.

    We had massive duplication due to affiliate links, other partner links and Endeca URL parameters, although we were able to cover a lot of the Endeca-related issues in different ways as well (those had to do with site design and some other technical details, such as adding and removing parameters server side prior to the Endeca Navigation Engine call).

    You might want to look at that source code, unless you have solved all your problems already. As I said, the code can easily be ported to other scripting languages like PHP etc.

  7. CarstenCumbrowski,

    Why would the ad buyer ask that to be done when that set of links is probably making them 6 figures a month from organic and that doesn’t even include the direct traffic? ;)

    When a site is paying 5 figures a month for a link, it’s kinda hard for Google to claim the link was bought for search rankings and penalize either site. Of course they might stop the PR passing, but they haven’t so far.

  8. When faced with a choice of two or more URLs for the same content, one of which has an extra parameter, and in trying to decide which URL to list, I believe that generally Google will go for the shorter URL and/or the one that also appears in the internal navigation within the site; that is, unless the Pagerank of the incoming link from the external site is way way higher than the PR delivered by the internal link.

    If your tracking URL does get to be indexed, and appears in the SERPs you will no longer be tracking people coming from that other site, but instead your counts will be polluted by a number of people coming to your site from the SERPs. Whatever you do, make sure that the alternative URL cannot be indexed by search engines.

    I prefer the robots meta tag rather than the robots.txt file, for several reasons. I believe that Matt Cutts also addressed this issue only a few weeks ago.

  9. g1smd,
    I agree with you and also believe that Google will choose the shorter URL over the one with more URL parameters, unless the number of inbound links (internal and/or external) suggests otherwise. I am also sure that Google is able to identify some obvious tracking parameters in the URL and ignore them automatically (as it does with obvious session IDs). But as always, you should not leave it to chance that Google might get it right, if you can control it.

    I mentioned in the post the problem of indexed tracking URLs polluting or skewing your tracking reports. I also recommend the use of the robots META tag over the robots.txt for excluding the versions of the URL that have tracking parameters. If you don't get it 100% right in the robots.txt, you either don't block the URL at all or, worse, block the whole page, including the version you want to have indexed.

    If Option 4 can't be done, Option 2 is the way to go. If Option 2 is not feasible, go with Option 1. If that is also not feasible, you will have to stick with Option 3, but be aware of its side effects, which you and I just mentioned once more.

  10. Ignacio: You can see referral information in Google Analytics and even drill down to low-traffic referrals during the selected time period. I am not aware of a feature, though, that lets you specify a list of predefined domains or pages that you would like to monitor for referrals. They are currently adding more and more features to Google Analytics, which makes it hard to keep up :), but I don't think they have this kind of option yet. The conversion goal configuration might work as a workaround, though. It would be worthwhile to check to be sure.

  11. Great post, thank you! It sounds like I don't need to be concerned with parameters in paid search links, but I do need to be concerned with affiliate links. What about links coming from comparison shopping sites like Shopzilla and Shopping.com?

  12. How about a mirror website? Under company policy we have two websites (with two domain names) with the same content. I still cannot figure out how to avoid the duplicate content issue.
    Any idea on this? Or should I sacrifice the mirror by blocking spiders?
    Waiting for your advice.
    Thanks.

  13. Arief asked: “How about a mirror website?”

    Decide which domain will be the primary one and 301 redirect all requests for the other domains to it. I had a client who had multiple domain names as well, all pointing to the same website (same code etc.).

    The site was a dynamic website, and I added code to every page (actually only to one script, which was included in all other scripts) to check whether the current domain is the same as the master domain. If it was not, the script would replace the current domain with the master one and perform a 301 redirect.

    I have sample code in classic ASP available on my website. See ASP 301 redirect code.
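    To illustrate the idea, a minimal sketch of that kind of check in classic ASP (the master domain here is just a placeholder, not the actual code from my site):

    <%
    ' Compare the requested host against the master domain (placeholder)
    ' and 301 redirect to the master domain if they differ.
    ' Note: for brevity this ignores the query string.
    Const MASTER_DOMAIN = "www.example.com"
    If LCase(Request.ServerVariables("HTTP_HOST")) <> MASTER_DOMAIN Then
        Response.Status = "301 Moved Permanently"
        Response.AddHeader "Location", "http://" & MASTER_DOMAIN & Request.ServerVariables("URL")
        Response.End
    End If
    %>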

    You can accomplish the same via a URL rewrite rule in the .htaccess file (if you use the Apache web server) or via an ISAPI plugin for MS IIS like Helikon's URLRewrite.

    I hope this makes sense.

  14. URLs with tracking codes are not the only ones causing duplicate content issues. PHP server-client coding has also created an issue because of its session handling. Many PHP sites have faced the problem of a single URL being indexed by search engines at least 5-6 times, with URLs like http://www.yoursite.com/index.php?phpsession=ifbsafbasofasfnasfcas983r2
    and the phpsession codes are different every time a spider crawls the site.

    The best solution to solve this problem would be using “” at the header of the page.

  15. Session IDs are a different kind of problem that requires a different kind of solution. If you have to use session IDs, at least provide a version without the session ID for the spiders. This would be an example of ethical cloaking. If you can, get rid of session IDs in the URL altogether; they cause all kinds of problems, not only for spiders.

  16. I do have a solution to this, which has already solved our problem. However, it seems like I couldn't get it in here. It is just a single line of PHP code in the header.