YATI & ERNIE: Machine Learning in Yandex and Baidu

Google isn't the only search engine making strides in machine learning. See how Yandex and Baidu have advanced search with YATI and ERNIE, as well.

VIP CONTRIBUTOR Dan Taylor

January 27, 2021

VIP CONTRIBUTOR Dan Taylor Partner & Head of Technical SEO at SALT.agency

When it comes to machine learning and SEO, a number of advancements in the past decade have given Google a lot of publicity and praise for projects such as RankBrain, BERT, and SMITH.

With that said, Google isn’t the only search engine making great strides in advancing machine learning (ML).

Over a similar time frame to Google, Yandex has released comparable projects into its ranking processes, such as MatrixNet, Palekh, its second (more refined) iteration Korolyov, and most recently, YATI.

Baidu has also been developing machine learning technologies for search, with its most prominent ML model being ERNIE.

As I’m going to use the word Transformer a fair few times, it’s important to have a baseline understanding of what a Transformer is and how models like BERT and SMITH are connected to YATI and ERNIE.

Let’s start there.

What Are Transformers?

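In simple terms, a Transformer is a deep-learning architecture for handling sequential data such as natural language. Unlike the recurrent neural networks (RNNs) that preceded it, it relies on an attention mechanism rather than processing the input one step at a time.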

Transformers facilitate something known as parallelization.

What this means is that the words in an input sequence don’t need to be processed one after another, making it practical to train on much larger datasets.

From this, we have been gifted in SEO with pre-trained systems such as BERT, GPT, and SMITH.
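To make the parallelization point a little more concrete, here is a minimal sketch in plain Python. It is a toy, not how BERT, YATI, or ERNIE are implemented: the function names are made up for illustration, and the attention step leaves out the learned query, key, and value projections that real Transformers use. The thing to notice is that the recurrent loop needs the previous step's result before it can continue, while each attention output depends only on the raw inputs.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rnn_style(tokens):
    """Sequential: step t cannot start until step t-1 has finished."""
    state = [0.0] * len(tokens[0])
    outputs = []
    for tok in tokens:
        state = [math.tanh(s + x) for s, x in zip(state, tok)]
        outputs.append(state)
    return outputs

def self_attention(tokens):
    """Simplified self-attention (assumption: no learned projections).

    Every output row is computed from the raw inputs only, so the
    iterations of the outer loop are independent and could run in parallel.
    """
    scale = math.sqrt(len(tokens[0]))
    outputs = []
    for query in tokens:
        weights = softmax([dot(query, key) / scale for key in tokens])
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, tokens))
            for i in range(len(query))
        ])
    return outputs

# Three toy "word vectors"; the values are arbitrary.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
```

Because no attention row waits on another, hardware such as GPUs can compute them all at once, which is what makes training on very large text collections practical.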

What Is YATI (Yandex)?

Since 2017, there has been little in terms of new ML technology from Yandex.

However, at the end of 2020, Yandex launched a new ranking algorithm based on Transformer neural networks called YATI: Yet Another Transformer with Improvements.

It may not be poetic, but YATI has been hailed as the most significant and impactful change that Yandex has made to its search ranking algorithms since the introduction of MatrixNet in 2009.

As with all new search engine advancements, machine learning doesn’t replace the variables and parameters that we’ve worked with before but makes them better.

Like Google, Yandex has relied on a number of algorithms to improve search results for users.

But since 2016 and the introduction of neural networks to its algorithm, Yandex has been building a much stronger algorithm of its own.

How YATI Will Affect Yandex Optimization

Based on Yandex’s information and statements around the reveal of YATI at YaC2020, the new machine learning component of the algorithm will account for more than 50% of the final weighting.

This means that, because the algorithm better understands web documents and texts, smaller changes to pages such as swapping title tags or adding more keywords, and even using exact match domains, will no longer be as impactful (depending on competition and the niche).

As mentioned previously, this doesn’t mean that strong technical, on-page, and off-page SEO is no longer needed.

It just makes it harder to game the system going forward.
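A bit of toy arithmetic shows why small tweaks lose leverage when a machine learning component carries most of the weight. Everything in this sketch is an assumption made for illustration: Yandex has not published a formula, the linear blend is hypothetical, and the 0.6 weight simply stands in for "more than 50%".

```python
def blended_score(ml_relevance, classic_signals, ml_weight=0.6):
    """Hypothetical linear blend of two scores in the range 0..1.

    ml_weight=0.6 is an assumed stand-in for "more than 50%";
    it is not a published Yandex value.
    """
    return ml_weight * ml_relevance + (1.0 - ml_weight) * classic_signals

# A title tag tweak lifts the classic signals from 0.5 to 0.6,
# while the ML model's reading of the page stays the same.
before = blended_score(ml_relevance=0.5, classic_signals=0.5)
after = blended_score(ml_relevance=0.5, classic_signals=0.6)
gain = after - before  # roughly 0.04, less than half of the 0.1 tweak
```

Under these assumptions, a change that used to move the score by 0.1 now moves it by about 0.04, so what the model makes of the content itself matters more than any single on-page adjustment.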

Can You Optimize for YATI?

As YATI is an evolution of Yandex’s algorithms and not a revolution, for the most part, general Yandex optimization principles remain.

If anything, best practice has only been reinforced.

Fill Content Topic Gaps

Looking beyond keywords to topics, you need to make sure that your content is as rich with them as your competitors’.

For example, say you’re trying to attract users looking to buy protein powders and meal replacement shakes. If your competitors talk about ingredients, include a nutritional breakdown, and explain how the products are manufactured but you don’t, you’re the odd one out in the dataset.
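One rough way to spot these gaps is to list the topics each competing page covers and see which ones you are missing. The sketch below assumes you have already tagged each page with topics by hand or with a tool of your choice; the function name, the example domains, and the 50% threshold are all hypothetical.

```python
def topic_gaps(own_topics, competitor_topics_by_site, min_share=0.5):
    """Return topics covered by at least `min_share` of competitors but not by us."""
    own = {t.lower() for t in own_topics}
    counts = {}
    for topics in competitor_topics_by_site.values():
        # Use a set so a topic counts once per competitor.
        for topic in {t.lower() for t in topics}:
            counts[topic] = counts.get(topic, 0) + 1
    threshold = min_share * len(competitor_topics_by_site)
    return sorted(t for t, n in counts.items() if n >= threshold and t not in own)

# Hypothetical, hand-tagged topic lists for the protein powder example.
own_page = ["protein powder", "flavours", "price"]
competitors = {
    "competitor-a.example": ["protein powder", "ingredients", "nutritional breakdown"],
    "competitor-b.example": ["protein powder", "ingredients", "manufacturing", "price"],
    "competitor-c.example": ["ingredients", "nutritional breakdown", "flavours"],
}
gaps = topic_gaps(own_page, competitors)
# gaps == ['ingredients', 'nutritional breakdown']
```

Manufacturing is not flagged here because only one of the three competitors covers it. Lower `min_share` if you want to see rarer topics too.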

Structure Long Text Better

Breaking up pieces of text with subheaders can help users skim and find relevant parts of the text they want to read, as well as add structure for search engines.

Based on the documentation around YATI, it’s widely thought within the Russian search community that adding a subheader every 250 to 300 words of text can yield benefits.
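If you want to audit existing pages against that rule of thumb, a short script can flag stretches of text that run too long without a subheader. This is a sketch under stated assumptions: the page has already been parsed into ("heading", text) and ("paragraph", text) pairs, the function name is made up, and the 300-word limit reflects community opinion rather than anything Yandex has confirmed.

```python
def long_runs(blocks, max_words=300):
    """Return (preceding_heading, word_count) for runs longer than max_words."""
    flagged = []
    heading, words = None, 0
    for kind, text in blocks:
        if kind == "heading":
            # A new subheader closes the previous run of text.
            if words > max_words:
                flagged.append((heading, words))
            heading, words = text, 0
        else:
            words += len(text.split())
    if words > max_words:  # check the final run as well
        flagged.append((heading, words))
    return flagged

# Hypothetical page: filler words stand in for real copy.
page = [
    ("heading", "What is whey protein?"),
    ("paragraph", "word " * 120),
    ("heading", "How it is made"),
    ("paragraph", "word " * 200),
    ("paragraph", "word " * 150),
]
flagged = long_runs(page)
# flagged == [('How it is made', 350)]
```

Each flagged section is a candidate for an extra subheader, as long as the break makes sense to a reader.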

What Is ERNIE (Baidu)?

Moving on from Yandex’s ML advancements, let’s look at ERNIE.

Baidu, like Google and Yandex, has a history with AI and machine learning.

In 2016, Baidu open-sourced the PaddlePaddle platform, which had been used internally for a number of years to help develop:

Algorithms and technologies to better their search product.
Scalable image classification.
Machine translation of texts.
And Baidu’s advertising platform.

ERNIE (version 1.0) was introduced into PaddlePaddle and the wider Baidu ecosystem in early 2019, with an updated version (2.0) coming around July that year.

At the time of its introduction, ERNIE 2.0 outperformed BERT and XLNet on 16 NLP tasks, and ERNIE went on to top the public GLUE leaderboard.

XLNet, a joint project between Google and Carnegie Mellon University, had itself outperformed BERT.

In addition to helping advance technology and search products, another great outcome of ERNIE is a system called DuTongChuan, which is the first-ever context-aware simultaneous translation model.

ERNIE’s Impact on Search

ERNIE is an active part of the wider Baidu search algorithm and is used to both serve general search results and improve diversification within news feeds by removing duplicate stories (despite different headlines).
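To give a feel for the duplicate-story problem, here is a deliberately simple sketch. It is not ERNIE's method: Baidu has not published how the feed deduplication works, and a language model would compare meaning rather than exact wording. As a stand-in, the example uses Jaccard similarity over three-word shingles, a classic lexical technique, and the function names, sample stories, and 0.6 threshold are all assumptions.

```python
def shingles(text, size=3):
    """Set of overlapping word n-grams from the text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}

def jaccard(a, b):
    """Shared shingles divided by all shingles across both texts."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

def dedupe(stories, threshold=0.6):
    """Keep a story only if its body is not too similar to one already kept."""
    kept = []
    for headline, body in stories:
        if all(jaccard(body, kept_body) < threshold for _, kept_body in kept):
            kept.append((headline, body))
    return kept

# Hypothetical feed: the first two stories share a body but not a headline.
stories = [
    ("Rates rise again",
     "the central bank raised interest rates by half a point on tuesday citing persistent inflation"),
    ("Borrowers brace for higher costs",
     "the central bank raised interest rates by half a point on tuesday citing stubborn inflation"),
    ("Local team wins cup",
     "fans celebrated late into the night after the home side lifted the trophy"),
]
kept = dedupe(stories)
# kept contains "Rates rise again" and "Local team wins cup"
```

The headlines never enter the comparison, which is the point: two stories with different titles but near-identical bodies collapse into one feed entry.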

ERNIE also plays an active role in Baidu’s AI assistant, Xiao Du.

Using real-time models (similar to DuTongChuan), Xiao Du relies on ERNIE to better understand and more accurately respond to voice requests.

Much of the literature published around ERNIE is on how it works and processes data.

The actual impact it has had across Baidu search as a whole isn’t known. However, we also need to remember that Baidu’s SERPs are populated very differently from those of Google and Yandex.

Baidu pulls a number of rich snippets from its other products such as Baike, Zhidao, and Tieba. This means that organic queries may produce only one or two results on the first page.

Can You Optimize for ERNIE?

Similar to other ML algorithms being deployed across search, ERNIE is an evolution of existing principles.

Baidu’s core algorithms (Money Plant, Pomegranate, Ice Bucket) have encouraged webmasters to create better web experiences for users for a number of years.

Today, ERNIE is reinforcing these principles and rewarding websites that have invested in the user experience of search rather than trying to game it.
