Subscribe: Apple Podcast | Google Podcasts | Spotify
For episode 195 of The Search Engine Journal Show, I had the opportunity to interview Hamlet Batista, CEO at RankSense and a well-respected technical SEO.
Batista talks about how deep learning is transforming the way SEO tasks are automated and why learning and utilizing Python is a valuable skill for SEO professionals.
What is it about automating that has made it better and more effective today than before?
Hamlet Batista (HB): A lot of the stuff that couldn’t be possible four or five years ago, that we actually wanted to happen, now the building blocks are coming into place.
For us, the biggest change is that we had the desire to see all this happen.
Marketers are always going to be producing content. It’s an exciting thing.
Automated tagging, a menial task, doing this and that, that’s never something that is fun, but you have to do it.
And when I say the building blocks are coming into place, I see it in two tracks.
- The ability of marketers.
- The ability of machines.
Marketers are getting more and more technical skills.
Some started by using spreadsheets and doing simple formulas, now you’re seeing incredible work just in the formulas, in the sheets.
Marketers are getting more sophisticated. They’re using Google Data Studio, and there is a lot of technical work, regular expressions [involved].
A lot of the tools that we use, Google Analytics, Google Data Studio, Chrome, all of them add more advanced ways of doing things, programmatically.
So, you’ve got that track of the marketer getting more sophisticated by learning to script a lot of the tasks with some of the simpler tools such as spreadsheets.
On the other end, you have the machines also getting more sophisticated and more capable.
And they’re not going to replace the marketer, but they’re able to do stuff that was not even possible two years ago.
One of the biggest changes, I will say, is deep learning.
We started with machine learning, statistical models, and stuff like that. But deep learning is more on perceptive capabilities.
The machines now have the ability to see things, the ability to understand or extract insights from text, from data that is unstructured.
Just in the last year or two, the NLP community, it’s been dramatic with stuff that you’re able to do.
And I hope that I’ve been able to show the community about this latest research.
What are some examples of tasks that can be automated that SEOs might resonate with?
HB: Aggregating data with databases and doing the reporting, I will say that’s phase one of automation. That saves a lot of time…
You pull all these reports from these different tools, and you have to run an analysis on that, you have to write a report from that.
And there’s also phase two. It’s where we are right now.
We’re talking about not just the aggregation of information or the collection of information, which is the phase one automation, [but] also the analysis part of the work has been automated.
The writing and reporting are also possible to automate.
Let’s look at an article I wrote that I recently published on Search Engine Journal about automatically generating titles and descriptions.
The computer will read an article. It will write a summary of the article in an abstract way.
So, it’s not like it’s copying content from the article, but it’s actually summarizing the content in the article in a way that makes perfect sense for an end-user, which will typically take minutes to do for a user.
And that computer can do it on scale automatically.
It’s a giant leap from just automated reporting, data collection and aggregation, into actually having the computer do work, do writing, which is a complicated task for humans. It’s incredible.
I also have another article, How to Generate Text from Images with Python, which also shows you with code how the computer now also can take an image with no text anywhere and generate a description of an image.
You can imagine, [a video is] just a sequence of images, so you can take a frame from the video, and have the computer automatically caption it.
That’s what I was saying that the second phase of automation is based on perceptive tasks. It’s about tasks where the computer has to have similar senses as the human has.
Brent Csutoras (BC): But this is where the big caveat comes in for a lot of people who are probably listening: there were a lot of tools, a lot of scripts, a lot of things that tried to do this.
They said, “Oh, we can improve your paragraphs,” or, “We can write some text for you.”
And of course, with that, came a lot of bad uses of it, like spamming or trying to mash stuff. But, what we always found was that the quality lacked.
Because if you would have told me you could go and read an article and write a synopsis, I would say, “Sure, but it’s not going to be like if I wrote it myself. It’s not going to be the same quality.”
So, have we just evolved in technology and the learning and the AI capabilities to be able to truly write better? Is that the big factor that’s different between like seven years ago and today?
HB: That is the case, but I will tell you why.
The reason why is because you have the biggest companies in the world, with the largest investments in this technology.
You’re talking about Google, Facebook, DeepMind, Microsoft.
And then, they’re not only putting a lot of money, you’re talking about talent – the AI researchers, top researchers can be making a million dollars a year in a salary, so think about that.
That’s the kind of investment that these companies are making.
Now, let me give you the best part.
They’re making these huge investments and they are having the top researchers, and then they are competing against each other, and they’re making their work free and open. So, think about how crazy that is.
Brent Csutoras (BC): A lot of this sounds really exciting, and I’m sure people are listening. They’re like, “Yeah, gee, that’s great.”
But, how technical do you have to be to be able to do this?
HB: More good news on that is that there is also a massive effort on what they call democratizing AI.
So, it depends on how custom is your use case. If your use case is common, you’re probably going to find something where you don’t have to even write any code to use it.
So, all the top cloud providers have tools that they call AutoML.
Google, Microsoft, and Amazon have tools that you just provide your dataset. There are a few different ways.
You can use a pre-trained [model] that already exists, and it will solve your problem.
There are also black box tools like MonkeyLearn or BigML that you can use out of the box. And you say, “OK. Here is my problem. Run it through the tool. Give me the predictions that I need.”
And the predictions can be images, classification of numbers, whatever.
It depends on how customized is your use case. And then, you have different layers in between.
I love the ability to customize because it allows me to move away from the common use cases, so I can come up with more novel solutions.
And that’s why my approach requires a little bit more coding than normal, but even with that, and some of the examples that I used, they don’t need any coding at all.
If you use a tool from Uber called Ludwig, you only need to provide a configuration file, and you don’t even need to learn coding.
But, the more unique the use case, the more specific and the more novel you want to be about a particular solution, the more knowledge you need in terms of scripting, and in terms of the AI knowledge that you need for that.
What are some of the fundamentals people interested in this should learn?
HB: I think one of the fundamentals is called ETL, which means “Extract, Transform, Load.”
Those are principles that are going to be applicable always.
In machine learning, you have what are called pipelines. So, regardless of what model, platform or technology you use, you have to prepare the data to do the predictions.
Whatever problem you have, you pull the data from Google Analytics or Search. You’re going to download it in different formats.
The machine learning model, they need it in a different way than the one you extract it in.
So, you need to be able to transform whatever source or proprietary data you have into the format that you can fit into the models.
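As an editorial illustration of the extract, transform, load idea described above, here is a minimal Python sketch. It is not code from the interview. The sample export, its column names, and the choice of JSON Lines as the output format are all assumptions made for the example; a real Google Analytics export or model input format will differ.

```python
import csv
import io
import json

# Hypothetical export text. The column names are illustrative only and
# are not a real Google Analytics schema.
RAW_EXPORT = """page,clicks,impressions
/blue-widgets,120,"2,400"
/red-widgets,30,"1,500"
/contact,0,0
"""


def extract(raw_csv):
    """Extract: read the exported CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def to_int(value):
    """Convert numbers exported as text, such as "2,400", to integers."""
    return int(value.replace(",", ""))


def transform(rows):
    """Transform: turn text fields into numeric features."""
    records = []
    for row in rows:
        clicks = to_int(row["clicks"])
        impressions = to_int(row["impressions"])
        # Avoid dividing by zero for pages with no impressions.
        ctr = clicks / impressions if impressions else 0.0
        records.append({
            "page": row["page"],
            "clicks": clicks,
            "impressions": impressions,
            "ctr": round(ctr, 4),
        })
    return records


def load(records):
    """Load: serialize the records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(record) for record in records)


records = transform(extract(RAW_EXPORT))
print(load(records))
```

In practice the load step would write to whatever file or database the chosen tool reads from; the three-function split is only one way to keep each stage easy to swap out.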
I think at a basic level, that should be something that you should invest your time in because even if you’re going to use a black box tool, whatever tool you’re using, they all need that goal. They’re hungry for data.
And data is not a commodity. It’s hard to find it. So, if you are in a very specific niche, finding keyword intent datasets, it’s going to be very difficult.
Learning to produce these data sets yourself is so important.
So, learning a scripting language or a data transformation tool that allows you to take raw data in whatever format it is and reformat it for whatever tool you’re going to use to train and produce the predictions, that is the fundamental skill that you can learn.
And one of the easiest languages for doing data transformations is Python, and that’s why I think it’s also worth having an introduction to Python.
I have an article on Search Engine Journal, it’s a guide for SEO data analysis, which provides a lot of code snippets that you can use.
This is like you learning Spanish and English. So, you’re going to be able to communicate the same things, but just the syntax of the languages are different. That’s in theory.
Now, in practice, you’re going to find differences in prepackaged libraries. Libraries are the building blocks.
So, when you’re going to solve a problem, you’re not going to build the whole problem from scratch. You’re going to rely on third-party tools to do different parts of the problem.
And you’re going to find that one language has more capabilities or more extensive third-party support for certain types of problems than the other.
In practice, when I’m doing data analysis, when I’m doing machine learning or deep learning, I find Python to be a much better choice because of the extensive third party support.
The researchers are writing in Python. Facebook, Google, all these have amazing stuff. It’s coming to Python first.
So, it’s more about which language has better support for specific types of problems than the others. That’s why I would go one before the other.
This podcast is brought to you by Ahrefs and Opteo.
Featured Image: Paulo Bobita