Search engines create maps of the Internet called Link Graphs. These maps help search engines determine whether a site is relevant or low quality and how the site fits into the Internet. Link graphs are a part of ranking, and for that reason it’s important to understand what they are and how your strategies make sense with this way of mapping the Internet.
What are Link Graphs?
Search engines map the Internet by the link connections between each website. These maps of the Internet are called Link Graphs. Link graphs reveal multiple qualities about websites on the Internet.
- Link graphs show how sites are connected to each other.
- Link graphs can be used to identify what topics a website is about.
- Link graphs can be used to identify spammy sites.
Sites Link to Other Sites Related to their Topic
Sites about software and technology link to other sites about software and technology. Sites about cooking tend to link to other sites related to cooking.
The important takeaway about link graphs is that they can help tell search engines what a site is relevant for.
The link graph can also reveal networks of spam sites. While spam sites link to normal non-spam sites, normal sites do not tend to link to spam sites.
This has the effect of isolating spam sites into their own corners of the link graph.
I promise that any jargon will be explained and what seems complicated will be simplified.
Link Distance Ranking Algorithms
There are some algorithms that rank links. Whether or not Google uses these kinds of algorithms is not known for certain. We just know they exist and that they perform very well for discovering which sites are spam, which sites are normal and what the topics of the sites are.
The way this works is that a map of the web is created that has multiple starting points. Each starting point is called a Seed Site. Each seed site represents a site that’s expert, authoritative and trustworthy in its topic.
Sites that the seed site directly links to tend to be trustworthy and expert as well. What was discovered in this kind of algorithm is that the further away a site was from the original seed site, the less trustworthy, expert and authoritative that site tended to be.
For the purposes of illustrating the link relationship:
- If a seed site links to a site, let’s call it a child site.
- If that child site links to a site, let’s call it a grandchild site (of the seed site).
- Sites that are in between, we can call them etcetera.
The seed site-based link graph might look something like this:
Seed Sites > Children Sites > Grandchildren Sites > Etc. > Your Site > Etc. > Spam Sites
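The seed-to-spam chain above can be sketched in a few lines of Python. This is only an illustration of the general idea of link distance, not Google’s method or any published algorithm: it counts the fewest link hops from any seed site to every site it can reach. The function name `link_distances` and all of the `.example` domains are hypothetical.

```python
from collections import deque


def link_distances(graph, seeds):
    """Return the fewest link hops from any seed site to each reachable site.

    graph maps a site to the list of sites it links out to.
    Sites that no seed can reach are left out of the result.
    """
    distances = {seed: 0 for seed in seeds}
    queue = deque(distances)
    while queue:
        site = queue.popleft()
        for target in graph.get(site, []):
            if target not in distances:
                # First visit in a breadth-first walk is the shortest path.
                distances[target] = distances[site] + 1
                queue.append(target)
    return distances


# Hypothetical link graph: spam links to normal sites, but not the reverse.
graph = {
    "seed.example": ["child.example"],
    "child.example": ["grandchild.example"],
    "grandchild.example": ["yoursite.example"],
    "spam.example": ["seed.example", "child.example"],
}

distances = link_distances(graph, ["seed.example"])
for site, hops in distances.items():
    print(site, hops)
```

In this toy graph the spam site never receives a distance at all, because nothing reachable from the seed links to it. A real system would presumably weigh far more signals than hop count.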
Outbound Links And Relevance
Outbound links going out of a website (together with inbound links) can influence whether or not a site ranks at all.
When one site links to another site, they are connected within the link graph. All of those connections form groups, sometimes called neighborhoods.
Solar System Analogy
Stay with me, because now I’m going to make analogies of how sites link together to form link graphs, beginning with how a single site is linked together with itself.
For example, the Solar System could be thought of as a website.
The website home page could be thought of as the Sun. Earth, Mars, Saturn etc. can be considered analogous to pages from that website.
So the whole Solar System can be thought of as a website, as your website.
A Website is Analogous to the Solar System
Milky Way Galaxy
The Solar System exists within the Milky Way galaxy. The Milky Way galaxy consists of other suns and planets.
In our analogy, the Milky Way galaxy represents all the other websites that are like your website and that are also about your same topic.
So if your site is an ecommerce site selling auto parts, all those other auto parts ecommerce sites are interconnected with your auto parts site by links from forums, blogs, product sites, manufacturer sites, review sites, etc.
The Milky Way galaxy, in our analogy example, represents all the websites on the Internet that are specifically about auto parts ecommerce. But it can also be whatever your own website topic is.
This is something to think about:
Outbound links from one site to another site create a map of the Internet by topic.
So your website and all the other websites in your niche look like this to Google:
Analogy of Interconnected Sites on Same Topic
But… Your Niche Exists in the Greater Internet
The example site of an auto parts ecommerce store (solar system) exists within the overall topic of all the auto parts ecommerce stores on the Internet, in this example represented by the Milky Way galaxy.
The auto parts ecommerce store topic exists within a greater entity, which is the larger and more general topic of ecommerce.
The Milky Way exists as part of a cluster of other galaxies. This cluster is called the Virgo Cluster.
The Virgo Cluster is an analogy of all the sites about ecommerce.
Analogy of All Sites About Ecommerce
The Internet Link Map Reveals Topic Clusters
When search engines map the interconnections between websites, all the different topics tend to form clusters similar to how suns and planets form galaxies, including some overlap as we’ll see in a moment.
Sites about any given topic tend to be interconnected because similar sites tend to link to them.
For example, human resources-related sites tend to link to the same group of human resources related software sites and recruiting-related sites.
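One way to picture this is to compare which sites link to two given sites. The sketch below is an assumption for illustration only, not a description of how any search engine clusters topics: it scores two sites by the overlap of their inbound linkers (a Jaccard score from 0 to 1). The function names and `.example` domains are hypothetical.

```python
def inbound_linkers(graph):
    """Invert an outbound link graph: map each site to the set of sites linking to it."""
    inbound = {}
    for source, targets in graph.items():
        for target in targets:
            inbound.setdefault(target, set()).add(source)
    return inbound


def shared_linker_score(inbound, site_a, site_b):
    """Share of linking sites that the two sites have in common (0.0 to 1.0)."""
    linkers_a = inbound.get(site_a, set())
    linkers_b = inbound.get(site_b, set())
    union = linkers_a | linkers_b
    return len(linkers_a & linkers_b) / len(union) if union else 0.0


# Hypothetical outbound links: two HR sites and one cooking site.
graph = {
    "hr-blog.example": ["hr-software.example", "recruiting.example"],
    "hr-news.example": ["hr-software.example", "recruiting.example"],
    "cooking-blog.example": ["recipes.example"],
}

inbound = inbound_linkers(graph)
same_topic = shared_linker_score(inbound, "hr-software.example", "recruiting.example")
different_topic = shared_linker_score(inbound, "hr-software.example", "recipes.example")
print(same_topic, different_topic)
```

The two sites linked from the same human resources sources score high, while the recipe site shares no linkers with them and scores zero.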
The Milky Way exists within the Virgo Galaxy Cluster. The Virgo Galaxy Cluster can be said to represent all the sites that are about ecommerce.
So in that Virgo Galaxy Cluster example there are groups of interconnected sites about sports ecommerce, fishing ecommerce, toy ecommerce, makeup ecommerce, and so on across all the topics that ecommerce covers.
Cluster of Super Clusters
But the Internet is bigger than ecommerce. The Internet includes the topics of politics, social media, ecommerce, travel, handbag sales, toy ecommerce, legal, entertainment, news, everything.
Staying within our analogy of the Internet as cosmos, picture a supercluster of galaxy clusters, where the red dots in the image below are clusters of galaxies. This is what the Internet may look like to Google as a link-based map of the entire Internet:
Supercluster of Galaxy Clusters
All of the websites of the entire Internet arrange themselves by links into structures that can be said to resemble galaxies that represent other sites that are in the same topic.
Those galaxies can be said to exist within clusters of other topics that are related, like all the sites about ecommerce, all the sites about news, all the sites about travel, etc.
And the entire Internet can be visualized as a giant supercluster of clusters.
The above illustration is an analogy of how the Internet self-organizes by topic into a gigantic link-interconnected map.
Six Degrees of Website Separation
There is an idea that all people are six friends away from other people. A friend of a friend of a friend of a friend of a friend of a friend will ultimately lead to a connection to virtually anyone.
Whether that’s true or not is beside the point right now. What matters is that a similar thing happens with links.
The only difference with links is that there is an end point: the further away you get from a starting point, the more difference there is between the starting website where you began following links and the ending website further away.
What scientific researchers discovered is that if you begin at a starting point that you might call Expert and Authoritative, the further away you get from it the likelier it is that the site is spam.
The sites that are linked closer to the starting point tend to be more expert and authoritative and trustworthy.
That is the idea behind a type of ranking analysis called Link Distance Ranking.
Multiple scientific researchers (in and outside of Google) discovered that when you create a seed set of sites as starting points, it becomes even easier to weed out spam sites and to map out the Internet more accurately according to topic.
Link distance ranking algorithms provide a more granular level of categorization by topic to the Internet beyond the natural ordering that links provide.
Link Graphs Reveal Legit Links and Spam Links
Spam sites exist in their own cluster because that’s how the Internet naturally arranges itself by links, especially when you overlay a link mapping algorithm over the link graph.
In the early 2000s the search engines used statistical analysis to discover which linking patterns were unnatural. The sites with unnatural linking patterns were called the “statistical outliers” and those outliers were spam sites.
Later on researchers published link distance ranking algorithm research papers.
Today Google uses machine learning and AI to catch spam at the moment it discovers it when crawling and also at the point where Google places sites within the index. The exact processes involved in Google’s spam AI are not known. We only know that artificial intelligence is used.
Internet SEO scammers claim that they can trick Google by using mind-numbingly simple tricks. But when you understand how the Internet is ordered with a link graph, those child-level strategies are seen for what they are, implausible and sadly laughable.
Link Graphs and Ranking
There is very little you can do to control who links to you. Because of how link graphs work, the task of identifying spam is easier.
Knowing how link graphs and associated link graph mining technologies work helps to make sense of why Googlers are so confident about their ability to catch link spam and stop it from working.
While there is little you can do to control the creation of links to your site, there is a lot that you can do to control what your site links to.
For that reason, in my opinion, it’s a good idea to be careful to link to pages that are useful to users in the context of your content.
It may be useful to conduct a periodic audit of all outbound links to make sure that you’re not linking to pages that are gone and now return a 404 response. Another situation to look out for is links to sites that are gone and have been replaced by parked domain advertisements.
In the interest of user experience it’s also a good idea to scan your outbound links to be absolutely certain that there are no outbound links to insecure HTTP web pages.
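A basic version of that audit can be scripted. The sketch below uses only Python’s standard library and is a starting point rather than a complete auditing tool: it collects the outbound links from a page’s HTML, flags the ones that use plain HTTP, and includes a helper for looking up a link’s response status. The names `outbound_links`, `insecure_links` and `fetch_status` and the `.example` domains are hypothetical. It does not detect parked domains, and some servers reject the HEAD requests that `fetch_status` sends.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def outbound_links(html, base_url):
    """Return the http(s) links in html that point to a different host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    own_host = urlparse(base_url).netloc
    return [
        url for url in collector.links
        if urlparse(url).scheme in ("http", "https")
        and urlparse(url).netloc != own_host
    ]


def insecure_links(links):
    """Return the links that use plain HTTP instead of HTTPS."""
    return [url for url in links if urlparse(url).scheme == "http"]


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the site is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # for example 404 for a page that is gone
    except OSError:
        return None  # DNS failure, refused connection, timeout


# Hypothetical page content for illustration.
html = (
    '<a href="/about">About</a> '
    '<a href="http://old.example/page">Old</a> '
    '<a href="https://partner.example/">Partner</a>'
)
links = outbound_links(html, "https://yoursite.example/")
flagged = insecure_links(links)
print(links)
print(flagged)
```

Calling `fetch_status` on each collected link and listing the ones that return 404 or `None` would cover the dead-page part of the audit.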
There is little one can do to control who links to a site. But the sites you link to are entirely under your control. Poor outbound linking practices may send a negative signal that reflects poorly on the site and may contribute to a negative ranking influence.