
SEO for HTTP and HTTPS

As the holiday season rolls in, ecommerce websites are going full force at their SEO, which inevitably includes HTTP/HTTPS pages that need to be optimized properly. I approached Matt Cutts with this question on Twitter and got a very simple answer:

[Screenshots of Matt Cutts’ Twitter replies]

Yet there’s virtually no information anywhere that helps you understand the potential challenges of HTTP/HTTPS optimization. Based on my observations and technical knowledge, here are the top things to watch out for when you are optimizing HTTP/HTTPS, and resolutions for each.

1. Duplicate Content and Canonicalization

Because the protocols (http/https) are different, they are considered two separate sites, so there’s a good chance to get penalized for duplicate content. If a search engine discovers two identical pages, it will generally take the page it saw first and ignore the others.

Solutions:

  1. Be smart about the site structure: to keep the engines from crawling and indexing HTTPS pages, structure the website so that HTTPS pages are only reachable through a form submission (log-in, sign-up, or payment pages). The common mistake is making these pages available via a standard link, which happens when you are not aware that the secure version of the site is being crawled and indexed.
  2. Use the robots.txt file to control which pages will be crawled and indexed.
  3. Use the .htaccess file to serve a separate robots file to the secure version of the site:
     - Create a file named robots_ssl.txt in your root.
     - Add the following to your .htaccess:
       RewriteCond %{SERVER_PORT} 443 [NC]
       RewriteRule ^robots.txt$ robots_ssl.txt [L]
  4. Remove yourdomain.com:443 from Webmaster Tools if the pages have already been crawled.
  5. For dynamic pages like PHP, emit a noindex meta tag when the page is served over the secure port:
       <?php
       if ($_SERVER["SERVER_PORT"] == 443) {
           echo '<meta name="robots" content="noindex,nofollow">';
       }
       ?>
  6. Drastic solution (may not always be possible): 301 redirect the HTTPS pages to the HTTP pages, with hopes that the link juice will transfer over (see the sketch after this list).
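As a minimal sketch of items 3 and 6, assuming Apache with mod_rewrite enabled (the domain name is a placeholder):

    # robots_ssl.txt – served in place of robots.txt on port 443; blocks all crawling
    User-agent: *
    Disallow: /

    # .htaccess – 301 redirect every HTTPS request to its HTTP counterpart
    RewriteEngine On
    RewriteCond %{SERVER_PORT} 443 [NC]
    RewriteRule ^(.*)$ http://yourdomain.com/$1 [R=301,L]

Note that these two are alternatives, not a pair: if you 301 every HTTPS request to HTTP, the secure pages (and robots_ssl.txt) are never served, so use the redirect only where you do not actually need a secure version.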

Additional ideas:

  1. Configure only portions of the site to use SSL, allowing data transfer between the server and the browser over an encrypted (secure) connection where it is actually needed. Note: the URLs of these pages will begin with https rather than http to indicate the secure protocol.
  2. If you already have HTTPS pages in the index, remove them with Webmaster Tools.

2. Linking

In certain instances, I’ve seen Google index the HTTPS version of a website (for example, PayPal.com), but since everyone tends to link to the HTTP version of a page, the HTTPS version may be left out in the woods in terms of PageRank (though this is not the case with PayPal). Now, if Google indexed only the HTTPS pages, you may be in trouble, because you get essentially no link juice from the HTTP pages. Of course, this may not be the case if you have targeted HTTPS from the beginning.
Solutions:

  1. The best practice is to get links to the HTTP versions of pages, and to do this you will need to make sure that important pages are available over HTTP (typically this is not a problem, as most HTTPS pages are not content-rich and have less value for the search engines).
  2. Keep a separate log file for the https domain and write a bit of code that emails you the new referral links every day or week (a sketch follows this list). Then contact the webmasters and ask them, with a friendly email, to change the links to http (this works for me most of the time).
  3. Keep normal sections under http only, to reduce the likelihood of people linking to https.
  4. Last-resort solution: wait for Google to start counting https links toward http pages (good luck with that!).
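As a rough sketch of the monitoring script in solution 2, assuming a PHP host and an Apache combined-format access log for the HTTPS site (the log path, domain, and email address are placeholders), something like this could run on a daily or weekly cron:

    <?php
    // Collect external referrers from the HTTPS access log and email the list.
    // The log path, domain, and address are assumptions – adjust for your setup.
    $log = '/var/log/apache2/ssl_access.log';
    $referrers = array();
    foreach (file($log) as $line) {
        // In the combined log format, the referrer is the second-to-last quoted field
        if (preg_match('/"([^"]*)" "[^"]*"$/', trim($line), $m)) {
            $ref = $m[1];
            if ($ref !== '-' && strpos($ref, 'yourdomain.com') === false) {
                $referrers[$ref] = true; // de-duplicate
            }
        }
    }
    mail('you@yourdomain.com', 'Sites linking to your HTTPS pages',
         implode("\n", array_keys($referrers)));
    ?>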

Maria Nikishyna is an accomplished search engine marketer specializing in paid and organic lead gen programs. When Maria is not optimizing websites, she’s an SEM blogger on Seattle Search Marketing Blog. Maria’s posts revolve around technical SEO issues, AdWords tricks, and SEO tips for webmasters.


27 thoughts on “SEO for HTTP and HTTPS”

  1. “Because the protocols (http/https) are different, they are considered two separate sites, so there’s a good chance to get penalized for duplicate content.”

    > There is no duplicate content penalty.

    1. Eric, I have seen many websites penalized for duplicate content and disagree that it does not exist. According to Google, when the content found on your site is largely the same as what is found elsewhere on your site or on other websites across the internet, it provides no value to the user. Thus, Google does not see such content as valuable. Of course, HTTP/HTTPS dupes are less likely to get penalized, but it is still a possibility.

      1. Matt Cutts has told us, on several occasions, that there is no duplicate content penalty. The main issue is that if you have duplicate pages (likely on separate domains) the SE may choose one page to index and rank while you would have preferred the other. I would imagine, however, that a spammy site containing multiple duplicate content-laden pages would get a penalty, but that would be more of a spam penalty or such.

  2. If you just use .htaccess to 301 the whole site to the https version, it won’t hurt your rankings at all. I had to do this for a site using an AJAX cart and everything was fine.

    I used to do it with a 302, but Google wound up breaking my indented listing apart, so I was #1 with http and #5 with an https page (a different page). I put up the .htaccess fix and it all sorted out in a couple of days.

    Finally, a filter versus a penalty is a semantic argument. Google does bad things to pages it considers duplicate, including graybar PR (supplemental index), but http versus https won’t get you any duplicate content penalties.

    1. Exactly! This is a situation that clearly illustrates INTENT. Most people do not intend to optimize their HTTPS to make it look different than their HTTP. Thus, no worries. If you do try to optimize these differently, well… you get what you deserve (filter, penalty, dunce award, a time out, a nun smacking your fingers with a ruler, etc.).

  3. The wording may be confusing – really, when Google chooses to rank one page (let’s say, the HTTPS) higher, that may be, and in most cases is, OK, yet in other cases it may be the opposite of what you want. While I agree with you that this would not exactly be a penalty, it is still worth thinking about.
    Cheers!

  4. Thank you for taking such a technical issue on, nice work.

    We have an issue with this. The content on our pages can differ based on HTTP or HTTPS. For example, we only receive/send session IDs on HTTPS connections. So when Customer “X” visits the HTTP version of the page, they will get a generic version, but that same customer will get a personalized version of the page when it’s requested via HTTPS.

    So back to your point: we want Googlebot to crawl our generic HTTP content, but the HTTPS content isn’t really what Googlebot should be looking at. How do we “redirect” Googlebot from the HTTPS pages to the HTTP content? Redirecting based on User-Agent starts looking like cloaking, so that’s no fun.

    I like the robots_ssl.txt option and the dynamic noindex,nofollow option. However, are you sure that if we *exclude* the HTTPS page, Google will not get “confused” and also delist the HTTP version?

    Example at https://www.t1shopper.com/service/nj/willingboro/08046/37I3/

  5. Matt’s advice has always worked for me. It is all about consistency, both on your website and off. Always link to the canonical URL, and even remove the other junk from the non-canonical URLs if you can.

    I even suggest that some clients use full URL paths in their internal links so that a crawler won’t start on an http page, then hit an https page, and have all the pages thereafter indexed as https rather than http.

    This also works with www vs no-www. It really is the same problem all in all.

  6. There is no difference between the two; the only things that will crop up are technical queries regarding SEO. SEO for both of these is the same as doing SEO for any other website.

  7. If I understand correctly, Google sees http and https as different domains, but we can get penalized because this could result in duplicate content? Should we use a canonical tag, or just a redirect to move out of https and over to http?

  8. @dhiraj It’s because you mostly do not want the https version indexed. Most often Google finds the page both with and without https, and usually you would want the http page indexed.

    Or, for example, when people finish a checkout on our site, they can get to all the other pages on our site over https, which will often give them browser alerts for mixed http/https images and includes.

    Thanks for the post.

  9. I did some SEO for an online shop where I basically specified in robots.txt what should get indexed and what should not, as well as using the canonical tag, which pointed to the HTTP version of the website.

    1. We thought about using the canonical tag, but it really wasn’t designed for this purpose. Per the “spec,” the canonical tag should only be used on a URL with a query string like “mypage.php?id=2&my_name=joe”. The canonical tag shouldn’t be used on a “clean” URL like “www.mycompany.com/my_page.php”.

      Will using the canonical tag “incorrectly” look spammy or careless to Google?

  10. It would seem like they would have to penalize a lot of sites. Not many people go through and make sure their https pages are seen differently.

    Tristan, I thought I just read that https pages can be indexed just as well as pages not using SSL. Maybe not.

  11. I have been working on the same e-commerce website for over 6 years now and have never run into an issue where any of the SEs chose an https page over an http one. I’m not saying it cannot happen! I’m just saying that there are likely more important things to worry about.

    Further, your site will not be penalized for having an https version that is identical to the http version, if there is a valid reason to have both (e.g., logged in vs. not logged in) and you are not a documented spammer. It’s about intent.

    But, if you are really paranoid, don’t put any SEO effort into the https. Also, exclude any https file you don’t want crawled in your robots.txt. You could even triple-up by using nofollow where necessary.

    Yeah, people may still link directly to your https page, which gives it a shot at being indexed and ranking, but by using the aforementioned, you’ll be fine.

    Seriously, there are way more important things to spend your time on.

    p.s. Nice article!

  12. There definitely is a duplicate penalty; however, the best way around this is to use metadata.

    I would also question the approach where you have a form and the form data is sent to HTTPS – the form should be displayed over HTTPS from the start.

    If you are an ecommerce site and you took that approach, you are seriously hurting your customers’ confidence, as they DO expect the page to be secure prior to sending data (to a secure location).

    Appropriate use of metadata will help Google et al. remove duplicates.

  13. I’m facing the same issue with one of my clients’ websites. The problem is that the website works in both http and https versions: if I type the address manually it opens in http, but if I edit it to https in the browser and continue, it displays exactly the same page in the https version.

    Unfortunately, Google has indexed the https version. More surprisingly, Google has given more PR value to the https pages than to the http pages.

    I believe that https pages do not rank well, hence I’m not doing well for most of the keywords.

    But my direct question is: why has Google given more weight to the https pages and assigned them more PR? I know some controversy will start over why I’m bothering about Google PR, but the question is WHY Google is giving the edge to https over http.

    Any experimental solution to get rid of this will be appreciated…

    If you guys are interested in the website URL as well, I can show it in a later report…

    1. Well this is an answer 2 years too late, but it might help someone else.

      1) Sometimes if there are 2 similar pages (http + https) google just uses the one it found first

      2) Sometimes if it starts looking at httpS, it just continues to do so

      Your BEST bet here is to have canonical tags that always link to the “http” version. Meaning, even when viewing the httpS version, the canonical tags will point to the http version of the same link.

      This is good because (a) Google will eventually prefer and only show your http versions on SERPs (b) your “link juice” not only gets transferred to the http version, but also ADDED to it.

      I’m by no means a pro at this stuff, but this is what a LOT of reading has taught me. Cheers.
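      For illustration of the tag just described, assuming a placeholder URL, the same line goes in the <head> of both the http and https copies of the page:

          <link rel="canonical" href="http://yourdomain.com/page.php" />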

  14. Thanks for this post! I must be living under a rock or something. I have a contact form on my site and think it is time for me to configure this portion of my site to SSL.

  15. “If you already have HTTPS pages in the index, remove them with Webmaster Tools”

    ^^^ This is a big HUGE NO-NO. ^^^

    Matt Cutts himself has answered this question and said multiple times: “definitely don’t do this”. Your site might be taken off Google for 6 months, and you’ll need to put in a re-inclusion request. (See the 4th answer: http://www.mattcutts.com/blog/seo-advice-url-canonicalization/)

    Please do correct this in your article – I wouldn’t want this to happen to anyone. (I almost tried it!) :)

  16. Hi. Thanks for this informative article. We are a financial site and are thinking of moving all HTTP pages to HTTPS through 301 redirects. Will the “Google juice” be transferred over? What kind of precautions do we need to take if we also change the robots.txt file and the Webmaster Tools settings? We basically do not want any page on HTTP, but we also want to mitigate any risk of losing SEO rankings. Would appreciate any ideas!

  17. I have the same question as Phoenix. Our developers would like to build the whole website in https, but in the manual and natural link-building process we generally use only http. How can we overcome this problem?

    Can 301 redirects and canonicalization be used in these cases so that the link juice is not lost?

    1. Yes, a 301 redirect or rel=”canonical” will solve your problem. However, not all link juice will be transferred to your canonical pages. You will lose 5–10% of the link juice.