HTML is a markup language that forms the basis of most webpages.
It is arguably one of the most fundamental parts of technical SEO.
Using HTML elements, SEO professionals are able to communicate information about the page to users and search bots.
This can help to clarify the importance, nature, and order of content on a page, as well as its relationship to other webpages.
The Difference Between Tags & Attributes
To understand the difference between tags and attributes we need to make sure we’ve got our terminology right.
Many people use the terms “tag” and “attribute” interchangeably, but let’s be precise.
The HTML element below is made up of three parts:
- The opening tag.
- The content specific to that tag.
- The closing tag.
<h1>Welcome to my page about kittens</h1>
- “<h1>” is the opening tag.
- “Welcome to my page about kittens” is the content of the element.
- “</h1>” is the closing tag.
This element is a heading and would be used as a visible title on a webpage to introduce the content about kittens.
Most elements must have both the opening <X> tag and the closing </X> tag in order to work.
There are also empty elements, like <br>, which do not have any content or closing tag.
Attributes are added to elements to modify them. They sit within the opening tag of the element, such as:
<link rel="canonical" href="https://www.example.com" />
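To put the terminology side by side, here is a small illustrative sketch. The URL and the text are placeholders, not taken from a real site.

```html
<!-- Element: opening tag + content + closing tag -->
<h2>Kitten care basics</h2>

<!-- Element with an attribute: href is the attribute name, the URL is its value -->
<a href="https://www.example.com/kittens/">Read more about kittens</a>

<!-- Empty element: no content and no closing tag -->
<br>
```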
3 Basic HTML Tags
In order to make a useful webpage, there are a few key tags that are needed.
The <!DOCTYPE html> tag (strictly speaking, a declaration rather than a tag) is the very first thing on a webpage.
It essentially tells the browser that the document is an HTML page.
The <head> tag introduces the first section of the page.
This is where information about the page that won’t be displayed in the page’s content is kept.
It’s important to know about the <head> as this is where some of the crucial tags for SEO need to be housed.
The <body> tag contains the information on the page that your visitors will see.
Here is where your copy, images, and videos will go.
The body will also house some of the other HTML tags we’ll talk about later.
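Put together, the three basic tags might look like the minimal sketch below. The <html> wrapper and the <meta charset> line are not covered above but are standard parts of a page; the title and copy are placeholders.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Information about the page; not displayed in the page content -->
    <meta charset="utf-8">
    <title>All About Kittens</title>
  </head>
  <body>
    <!-- Everything visitors see: copy, images, videos -->
    <h1>Welcome to my page about kittens</h1>
    <p>Kittens are young cats.</p>
  </body>
</html>
```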
Common Tags for SEO & the Attributes Used in Them
The <meta> tag sits within the <head> of the page.
It can contain attributes that describe information about the webpage that won’t actually be seen in the content of the page.
The information in this tag is often called “metadata,” and the attributes used with it control things like the “meta description” and the no-longer-used “meta keywords.”
The name attribute is used with the <meta> tag.
It is essentially a way of specifying to any bots that visit the page whether the information that follows applies to them.
For example, including <meta name="robots" content="noindex" /> means that all bots should take notice of the “noindex” directive.
You will often hear this called the “meta robots tag.”
If <meta name="googlebot" content="noindex" /> were used instead, only Google’s bot would need to take notice of the “noindex” directive.
This is a good way of giving some search bots commands that are not needed for all of them.
The “noindex” directive is one commonly used in SEO.
You will often hear it being called the “noindex tag,” but more accurately it is a value of the content attribute of the <meta> tag.
<meta name="robots" content="noindex" />
This piece of code allows publishers to determine what content can be included in a search engine’s index.
By adding the “noindex” directive, you are essentially telling a search engine it may not use this page within its index.
For example, this is useful if there is sensitive content you do not want to be available from organic search.
For instance, if you have areas on your site that should only be accessible to paid members, allowing this content into the search indices could expose it to people who have not logged in.
The “noindex” directive needs to be read to be followed. That is, the search bots need to be able to access the page to read the HTML code that contains the directive.
As such, be careful not to block the robots from accessing the page in the robots.txt.
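As a rough sketch of where the directive sits, the meta robots tag goes inside the <head> of the page you want kept out of the index. The page title here is a made-up example.

```html
<head>
  <title>Members-only kitten care guide</title>
  <!-- Applies to every bot that honors the robots meta tag -->
  <meta name="robots" content="noindex" />
</head>
```

For the tag to be read at all, the URL of this page must remain crawlable, so it should not be disallowed in robots.txt.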
The description, better known as the “meta description,” is set with the <meta> tag by giving the name attribute the value “description” and placing the text in the content attribute.
The content of this tag is used in the SERPs, underneath the content of the <title> tag.
It allows publishers to summarize the content on the page in a way that will help searchers determine if the page meets their needs.
This does not affect the rankings of a page but can help encourage clicks through to the page from the SERPs.
It is important to realize that in many instances Google will ignore the content of the description attribute in favor of using its own description in the SERPs.
For guidance on how to optimize your description attributes, see Jeff Riddall’s post.
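A meta description might look like the following sketch. The wording of the description is invented for illustration.

```html
<head>
  <!-- name identifies the kind of metadata; content holds the text itself -->
  <meta name="description" content="Learn how to feed, groom, and care for kittens in their first year, with advice on vet visits and toys." />
</head>
```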
The title tag is one you’ll be familiar with if you’ve been around SEO any length of time.
Also known colloquially as the “meta title,” it’s the tag you use to define the title of the page. It sits within the <head> of the page.
As such, it is not visible to users in the content of the webpage. However, it appears in the browser tab and in the SERPs, and it allows you to signify the relevance of a page to a searcher’s query, both to the search bots and to users.
It is an important element in SEO. To find out more about good practice with title tags read Corey Morris’ article.
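For illustration, a title tag sits in the <head> like this. The title text is a placeholder.

```html
<head>
  <!-- Shown in the browser tab and typically used as the headline of the search result -->
  <title>Kitten Care Guide: Feeding, Grooming & Vet Visits</title>
</head>
```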
<h1> to <h6>
The heading tags are used to indicate which parts of the HTML content are headings.
The tags sit within the <body> of the page and therefore the text is visible to users viewing the page’s content.
The heading tags should be used to help structure the page.
When creating a website, developers will usually ensure styles are associated with each type of heading tag.
This means the words wrapped in an <h1> tag should look different from the words wrapped in an <h2> tag.
This helps users to see when a section of text belongs to the section that went before it, as with titles and subtitles.
The heading tags also help the search bots to determine the structure of the content on a page.
For further information on the importance and use of header tags, see Sam Hollingsworth’s excellent article.
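A simple heading structure could be sketched as follows, with each level nested under the one above it. The headings and copy are placeholders.

```html
<body>
  <h1>Welcome to my page about kittens</h1>

  <h2>Feeding your kitten</h2>
  <p>Copy about feeding goes here.</p>

  <!-- An h3 is a subsection of the h2 that precedes it -->
  <h3>Wet food vs. dry food</h3>
  <p>Copy comparing food types goes here.</p>

  <h2>Grooming your kitten</h2>
  <p>Copy about grooming goes here.</p>
</body>
```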
Link Tags & Href Attribute
As SEO professionals, we spend a lot of time chasing links.
But do you know how a link is structured and therefore why some links are perceived to be worth more than others?
A standard hyperlink is essentially an <a> tag. Its format is as follows:
<a href="https://www.example.com">anchor text of link goes here</a>
The <a> tag indicates it is a link.
The href= attribute dictates the destination of the link (i.e., what page it is linking to).
The text that sits between the opening <a> tag and the closing </a> tag is the anchor text.
This is the text that a user will see on the page that looks clickable.
This is used for clickable links that will appear in the <body> of the page.
The <link> tag is used to link one resource to another and appears in the <head> of the page.
These links are not hyperlinks and are not clickable. They show the relationship between web documents.
The rel="nofollow" attribute tells bots that the URL within the href attribute is not one that can be followed by them.
Using the rel="nofollow" attribute will not affect a human user’s ability to click on the link and be taken to another page. It only affects bots.
This is used within SEO to stop the search engines from visiting a page, or from ascribing any benefit of one page linking to another.
This arguably renders a link useless from the traditional SEO link-building perspective as link equity will not pass through the link.
There are arguments to say that it is still a beneficial link if it causes visitors to view the linked-to page of course!
The “nofollow” attribute can be used by publishers to help search engines to know when a linked-to page is the result of payment, for example, an advert.
This can help avoid link penalties, as the publisher is admitting that the link is the result of a legitimate deal and not an attempt to manipulate the rankings.
The rel="nofollow" attribute can be used on an individual link basis like the following:
<a href="https://www.example.com" rel="nofollow">anchor text of link goes here</a>
Or it can be used to render all links on a page as “nofollow” by using it in the <head>, in the same way that “noindex” is used:
<meta name="robots" content="nofollow" />
For more information on when to use the rel="nofollow" attribute, you can read Julie Joyce’s article.
How Google Uses the rel="nofollow" Attribute
In 2019, Google announced that there would be some changes to the way it used the “nofollow” attribute.
This included informing us of some additional attributes that could be used instead of the “nofollow” to better express the relationship of the link to its target page.
These new attributes are rel="ugc" and rel="sponsored".
They are to be used to help Google understand when a publisher wishes for the target page to be discounted for ranking signal purposes.
The rel="sponsored" attribute is to identify when a link is the result of a paid deal such as an advert or sponsorship. The rel="ugc" attribute is to identify when a link has been added through user-generated content such as a forum.
Google also announced that these, and the “nofollow” attribute, would only be treated as hints.
Whereas previously the “nofollow” attribute would result in Googlebot ignoring the specified link, it now will take that hint under advisement but may still treat it as if the “nofollow” is not present.
For more information about this announcement, see Matt Southern’s write-up.
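As a sketch of how the newer rel values might be applied, the links below use placeholder URLs and anchor text.

```html
<!-- Paid placement, such as an advert or sponsorship -->
<a href="https://www.example.com/cat-food/" rel="sponsored">Our sponsor's cat food</a>

<!-- Link added by a user, for example in a forum post or comment -->
<a href="https://www.example.com/my-kitten-blog/" rel="ugc">My kitten blog</a>

<!-- Values can be combined in one space-separated rel attribute -->
<a href="https://www.example.com/forum-ad/" rel="ugc nofollow">Forum link</a>
```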
The purpose of the hreflang attribute is to help publishers whose sites show the same content in multiple languages.
It directs the search engines as to which version of the page should be shown to users so they can read it in their preferred language.
The hreflang attribute is used with the <link> tag. This attribute specifies the language of the content on the URL linked to.
It’s used within the <head> of the page and is formatted as follows:
<link rel="alternate" href="https://example.com" hreflang="en-gb" />
It’s broken down into several parts:
- The rel="alternate" attribute, which indicates that the page has an alternative version relevant to it.
- The href= attribute, which denotes the URL being linked to.
- The hreflang= attribute, which tells the search bots what language the linked page is written in. The language is a two-letter code taken from a standardized list known as the ISO 639-1 codes. It can optionally be followed by a region code (ISO 3166-1 alpha-2), as with the “gb” in the example above.
The hreflang attribute can also be used in the HTTP header for documents that aren’t in HTML (like a PDF) or in the website’s XML sitemap.
Using the hreflang attribute correctly can be tricky. For more information on its use, see Dan Taylor’s article on its proper implementation.
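A set of hreflang tags for a page available in three languages might look like the sketch below. The URLs are hypothetical. Google's guidance is that each language version should list itself as well as every other version, so the same block would appear on all three pages.

```html
<head>
  <!-- The same set of tags goes on every language version of the page -->
  <link rel="alternate" href="https://example.com/en-gb/" hreflang="en-gb" />
  <link rel="alternate" href="https://example.com/fr/" hreflang="fr" />
  <link rel="alternate" href="https://example.com/de/" hreflang="de" />
</head>
```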
The rel="canonical" attribute of the <link> tag enables SEO professionals to specify which other page on a website, or on another domain, should be counted as the canonical.
A page being the canonical essentially means it is the main page, of which others may be copies.
For search engine purposes, this is an indication of the page a publisher wants to be considered the main one to be ranked. The copies should not be ranked.
The canonical attribute looks like this:
<link rel="canonical" href="https://www.example.com/" />
The code should sit in the <head> of the page. The web page stated after the “href=” should be the page you want the search bots to consider the canonical page.
This tag is useful in situations where two or more pages may have identical or near-identical content on them.
Uses of the Canonical Attribute
A website might be set up in such a way that near-identical pages are useful for users, such as the product listing pages on an ecommerce site.
For instance, the main category page for a set of products, such as “shoes”, may have copy, headers, and a page title that have been written about “shoes.”
If a user were to click on a filter to show only brown, size 8 shoes, the URL might change but the copy, headers, and page title might remain the same as the “shoes” page.
This would result in two pages that are identical apart from the list of products that are shown.
In this instance, the website owner might wish to put a canonical tag on the “brown, size 8 shoes” page pointing to the “shoes” page.
This would help the search engines to understand that the “brown, size 8 shoes” page does not need to be ranked, whereas the “shoes” page is the more important of the two and should be ranked.
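In the shoes example, the canonical tag on the filtered page could be sketched like this. Both URLs are hypothetical and the query parameters are assumed for illustration.

```html
<!-- In the <head> of the filtered page, e.g. https://www.example.com/shoes/?colour=brown&size=8 -->
<link rel="canonical" href="https://www.example.com/shoes/" />
```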
Issues With the Canonical Attribute
It’s important to realize that the search engines only use the canonical attribute as a guide; it is not something that they have to follow.
There are many instances where the canonical attribute is ignored and another page selected as the canonical of the set.
For more information on how to use the canonical attribute correctly, see Matt Southern’s article.
The <img> tag is used to embed an image into a page of HTML.
The image tag does not insert the image into the page as such, but links to it in a way that allows the image to be visible on the page.
It essentially creates a container for an image that is hosted elsewhere.
The format of an <img> tag is as follows:
<img src="imagename.jpg" alt="this is the description of the image">
This tag contains two attributes: one that is essential for the tag to work, and another whose value can be left blank.
The src= attribute is used to reference the location of the image that is being displayed on the page.
If the image is located on the same domain as the container it will appear in, a relative URL (just the end part of the URL, not the domain) can be used.
If the image is to be pulled from another website, the absolute (whole) URL needs to be used.
Although this attribute doesn’t serve any SEO purpose as such, it is needed for the image tag to work.
The above image tag example also contains a second attribute, the alt= attribute.
This attribute is used to specify what alternate text should be shown if the image can’t be rendered.
The alt= attribute has to be present in the <img> tag, but can be left blank if no alternative text is wanted.
There is some benefit to considering the use of keywords within an image alt= attribute. Search engines cannot determine with precision what an image is of.
There have been great strides made in the major search engines’ abilities to identify what is in a picture. That technology is far from perfect, however.
As such, search engines will use the text in the alt= attribute to better understand what the image is of.
Use language that helps to reinforce the image’s relevance to the topic the page is about.
This can aid the search engines in identifying the relevance of that page for search queries.
It is crucial to remember that this is not the primary reason for the alt= attribute.
This text is used by screen readers and assistive technology to enable those who use this technology to understand the contents of the image.
The alt= attribute should be considered first and foremost to make websites accessible to those using this technology. This should not be sacrificed for SEO purposes.
For more information about how to optimize images, read Anna Crowe’s article.
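The points above about src= and alt= can be illustrated with a few hedged examples. The file names, URLs, and descriptions are placeholders.

```html
<!-- Image on the same domain: relative URL, with alt text describing what the image shows -->
<img src="/images/tabby-kitten-sleeping.jpg" alt="Tabby kitten asleep in a wicker basket">

<!-- Purely decorative image: alt is present but left blank -->
<img src="/images/divider.png" alt="">

<!-- Image hosted on another website: absolute URL -->
<img src="https://www.example.com/images/kitten.jpg" alt="Grey kitten playing with a ball of wool">
```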
This guide is an introduction to the core HTML tags and attributes you may hear about in SEO.
There are many more that go into making a functioning, crawlable, and indexable webpage, however.
The crossover between SEO and development skillsets is vast.
As an SEO professional, the more you know about how webpages are constructed, the better.
If you want to learn more about HTML and the tags that are available with it, you might enjoy a resource like W3Schools.
All screenshots taken by author, December 2020