How to Make Search Data the Center of Your Universe [PODCAST]

DeepCrawl's Jon Myers explains how you can harness and combine your GA, GSC, backlink, and log file data to get a comprehensive overview of your website.

SEJ STAFF Loren Baker

November 2, 2017

SEJ STAFF Loren Baker Founder at Foundation Digital


If you’ve been in the industry for a while now, you know how laborious search engine optimization (SEO) is.

You spend so much time and resources on various tactics hoping your site:

Ranks high on the SERPs.
Generates quality leads.
Gets a boost on conversions.

All easier said than done.

Before any of those can happen, SEO professionals need to be cognizant of their main duty: ensuring that the website is actually crawled and indexed by search engines.

Thankfully, with the help of powerful technology, you can now gather and evaluate important data regarding your website’s technical components.

Having a high level of knowledge about your site is crucial, as it allows you to improve your SEO performance accordingly.

I had the pleasure of interviewing Jon Myers, Chief Growth Officer at DeepCrawl, in this sponsored episode of Search Engine Nerds. Myers discusses how the latest technology can be used to combine data from various sources – such as Google Analytics, Google Search Console, backlinks, and log files – in order to get a comprehensive overview of your website.

You’ve been discussing the concept of the search universe, would you like to go a little bit deeper into that?

Jon Myers: We have an incredible amount of customers out there in the search space and in 60 countries now, I believe. Obviously, DeepCrawl is known well as an enterprise-level crawler that allows you to do technical SEO audits and understand what’s wrong with your site as your site sits today with Google.

What we’re trying to do now is push those boundaries a little bit further and we’re trying to… not step out of the crawling space, because we actually want to stay true to what we do and what we believe in which is to give you guys out there, enterprise-level SEO, the ability to get that powerful data back and understand what’s going on from a technical SEO point of view when you crawl your website.

What we’re doing with the search universe piece, which is something we’re just starting to roll out in Q4 this year, is to think about:

What data can I add to my crawl reports?
What value can I also ingest into it whilst I’m crawling from an external data source point of view to allow us to better optimize and be better SEOs at what we do?

Straight away, the thing we’re thinking about is backlink data. Everybody loves a good backlink; it’s the currency of the web.

Google loves to understand what’s good, what’s bad. How can we take that backlink data and drag that down into the crawl, as we run the crawl, to let you guys understand the state of your backlinks?

Everybody does a lot of link cleanup and backlink review and, obviously, there are some great tools out there in the marketplace for that.

At DeepCrawl, what we decided to do was build a conduit to allow you to take data from, say, Ahrefs, Moz, or Majestic SEO. As you run the crawl, you pull in all the goodness of all of those links that are flowing into your website, to see, alongside the internal link structure and internal site architecture, where those links land.

Are those good quality links landing on high authority pages? If they’re good quality links landing on low authority pages, maybe we want to do a little bit of internal site architecture cleanup work and move those good links up the hierarchy.

If you have poor quality links and you’ve got a lot of them, yes, you want to do the link cleanup piece. But let’s see how it’s affecting you from an internal site architecture point of view as well. That’s kind of the first part of the concept…
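The idea of lining up backlink quality against internal authority can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DeepCrawl's API: the `Page` record, the `internal_authority` score (standing in for something like the DeepRank score discussed later), the `quality_backlinks` count, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    internal_authority: float  # hypothetical 0-10 score taken from a crawl report
    quality_backlinks: int     # hypothetical count taken from a backlink tool export


def misplaced_link_equity(pages, authority_floor=3.0, min_links=5):
    """Return pages with many good external links but weak internal authority."""
    flagged = [
        p for p in pages
        if p.quality_backlinks >= min_links and p.internal_authority < authority_floor
    ]
    # Most-linked pages first: they are the biggest opportunity.
    return sorted(flagged, key=lambda p: p.quality_backlinks, reverse=True)


pages = [
    Page("/", 9.5, 120),
    Page("/blog/old-guide", 1.2, 40),
    Page("/category/shoes", 7.0, 3),
    Page("/archive/2012/promo", 0.8, 12),
]
flagged_urls = [p.url for p in misplaced_link_equity(pages)]
print(flagged_urls)
```

Pages returned by a check like this are candidates for the internal architecture cleanup Myers describes, such as linking to them from stronger pages or redirecting them higher up the hierarchy.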

Loren Baker: To date, DeepCrawl’s had the ability to integrate Google Analytics (GA) information, correct? You can see how many times that specific page has been visited…

The ability to look at that information from an analytics standpoint makes perfect sense. Now you’re telling me that you’ll be rolling out the ability for me to, say, utilize my Majestic login or Majestic API to see which of those pages are getting the most internal links, and which of those pages are possibly getting other signals, to make sound and data-driven decisions on what to do next.

JM: Absolutely… I mentioned the backlinks. There are kind of four sides to the search universe as we see it, and the backlinks would be one of them. You’re obviously right with the GA one; we’re touching on what we’re calling sort of the consumer segment of this idea. The wonderful thing is that GA is readily available and out there, but we’re also thinking about those larger enterprise solutions as well. Things like Omniture, and the ability to pull in Omniture data.

I was talking at an event the other day and I was talking about, obviously, Google drives awareness. But for me the consumer drives ROI.

We can lead the traffic to the website but then we’ve got to make sure that person when they’re on the site, they buy. Having that ability to see the most visited pages or the biggest dwell time pages, or the most liked products via GA or Omniture… It enables you to understand what is popular and what other products are being purchased.

Again, when you’re running a crawl, pull in that GA data, pull in that Omniture data, understand where those consumers are, understand not necessarily what are the most promoted pages to the likes of Google [but] actually, what are the most promoted pages to the consumers today, the ones that have the dollars in their pockets to purchase that stuff. That’s kind of step two of the search universe as we see it.

LB: Yeah, which does make total sense… To put out content that’s relevant to the product but then that content sits over here, right? And then the product and the e-commerce experience sit over here. The two aren’t necessarily always intertwined.

The ability to say, ‘Well hey, you’re getting X thousand amounts of visits per month to this post about this product that you sell but that traffic is not skipping or making its way over to the product page where they can actually purchase and make the e-commerce decision’ is fairly important as well.

JM: Absolutely agree… You can find pages, that maybe aren’t even indexed or don’t rank highly in Google, that are getting an incredible amount of traffic.

There’s a real opportunity there from your analytics data to then think:

Why isn’t that page indexing in Google?
What’s the issue from an SEO perspective?

…And make sure that page is implemented and promoted in the right way. It could be something incredibly simple as to why that page is not being seen in good light by Google.

Being able to take that crawl data, understand that issue, and bring in the consumer piece is, for me, an incredibly powerful combination. It’s driving ROI. It’s not just doing indexation work and SEO work, it’s actually showing tangible ROI back from the organic traffic that we all know drives great ROI anywhere. But actually having that ability to show that and take that to the CMO or your boss and say, ‘look this is driving something which is pretty potent for us.’
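The check described in this answer, finding pages that attract traffic but are not indexable, amounts to joining analytics data with crawl data. The sketch below assumes both have been exported to plain Python structures; the field names (`status`, `noindex`, `canonical`), the sample URLs, and the `min_sessions` threshold are all illustrative rather than taken from any particular tool.

```python
def indexing_blockers(crawl_row):
    """List plain-language reasons a crawled page may be kept out of the index."""
    reasons = []
    if crawl_row["status"] != 200:
        reasons.append(f"returns HTTP {crawl_row['status']}")
    if crawl_row["noindex"]:
        reasons.append("has a noindex directive")
    canonical = crawl_row.get("canonical")
    if canonical and canonical != crawl_row["url"]:
        reasons.append(f"canonicalises to {canonical}")
    return reasons


def popular_but_blocked(crawl_rows, sessions_by_url, min_sessions=1000):
    """Map each high-traffic URL to the reasons it may not be indexed."""
    report = {}
    for row in crawl_rows:
        sessions = sessions_by_url.get(row["url"], 0)
        reasons = indexing_blockers(row)
        if sessions >= min_sessions and reasons:
            report[row["url"]] = reasons
    return report


# Hypothetical crawl export and analytics export.
crawl_rows = [
    {"url": "/guide", "status": 200, "noindex": True, "canonical": "/guide"},
    {"url": "/shoes", "status": 200, "noindex": False, "canonical": "/shoes"},
    {"url": "/sale", "status": 200, "noindex": False, "canonical": "/shoes"},
]
sessions_by_url = {"/guide": 8200, "/shoes": 15000, "/sale": 300}

report = popular_but_blocked(crawl_rows, sessions_by_url)
print(report)
```

Here `/guide` is flagged because it is popular with visitors yet carries a noindex directive, which is the kind of "incredibly simple" cause Myers mentions; `/sale` has a canonical mismatch but too little traffic to make the list.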

LB: Yeah, there’s also the marriage in the search universe of SEO and paid search.

One thing that you got me thinking of when you were discussing how people are finding your site, there are a lot of sites out there specifically in the e-commerce space… They’re building up their site that’s intended for Google but then they’re also building micro pages, offer pages, email landing pages, and other pages that are not intended for Google and there’s no communication whatsoever with the SEO team and the other team about that.

At the end of the day, what happens is Google has ways of finding that information outside of your typical site navigation or in sitemaps. What can happen is multiple offer pages, multiple duplicate pages, pages that you do not intend to get ranked whatsoever, different pricing information, different product information…

It’s hard to get indexed, right? The ability to pull all of that up, as well. This is hitting home.

JM: It hits home, I agree with you, and it amazes me over the years of running paid search teams and running SEO teams. One of the first things I always tell a paid search team building those pages, and I mean those specific landing pages to drive the pay-per-click advertising, is to make them noindexed… Just don’t let those pages be on Google. You know that the traffic’s coming, but you’re right, nine times out of ten, people forget to do that piece.
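A quick way to audit whether paid landing pages really are noindexed is to look for a robots meta tag or an `X-Robots-Tag` response header. The following is a minimal sketch using only Python's standard library; it assumes the HTML and headers have already been fetched, and the helper names and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k: (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindexed(html, headers=None):
    """True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # Assumes the header dict uses the exact key "X-Robots-Tag".
    header_value = (headers or {}).get("X-Robots-Tag", "").lower()
    # "none" is shorthand for "noindex, nofollow".
    return bool({"noindex", "none"} & parser.directives) or "noindex" in header_value


ppc_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
organic_page = '<html><head><meta name="description" content="Shoes"></head></html>'

print(is_noindexed(ppc_page))
print(is_noindexed(organic_page))
print(is_noindexed(organic_page, {"X-Robots-Tag": "noindex"}))
```

Running a check like this over every URL in a paid search campaign would surface the landing pages where the noindex step was forgotten.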

Another great example you mentioned: obviously, as a publisher at SEJ, you publish an incredible amount of content. We’ve seen it with other publishers, maybe in the old-school publishing world, the newspaper groups, where basically they’ve used something like DeepCrawl in the dev world because they’re launching bunches of news pages all the time. They’re wondering why stuff’s not indexing or why their AMP pages are all over the place and stuff like that. It’s because they’re not doing the pre-QA before launch in the dev environment and checking the pages are right, the tags are in place, everything’s lined up for Google, the AMP pages…

I saw one a few weeks back where they’d done a whole bunch of AMP work and then actually hadn’t put the right tags in place to allow Google to effectively move forward and actually take those AMP pages on board. So they thought they’d done the job and done everything well, and Google just wasn’t pulling in any of the AMP pages because they’d simply not told Google they were there.

Actually, using the crawling technology in that sense, to do the pre-crawls in the dev kind of work, we’re seeing a lot of good traction in that space as well.
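The AMP problem described above usually comes down to discovery tags: the regular page needs a `<link rel="amphtml">` pointing at its AMP version, and the AMP page needs a `<link rel="canonical">` pointing back. A pre-launch check for that pairing might look like the sketch below, which uses only the standard library; the function names, URLs, and sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collect href values of <link> tags, keyed by their rel value."""

    def __init__(self):
        super().__init__()
        self.rels = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {k: (v or "") for k, v in attrs}
        for rel in attr.get("rel", "").lower().split():
            # Keep the first href seen for each rel value.
            self.rels.setdefault(rel, attr.get("href", ""))


def link_rels(html):
    parser = LinkRelParser()
    parser.feed(html)
    return parser.rels


def amp_pairing_issues(page_url, page_html, amp_url, amp_html):
    """Report which half of the page/AMP link pairing is missing or wrong."""
    issues = []
    if link_rels(page_html).get("amphtml") != amp_url:
        issues.append("page does not point to the AMP version with rel=amphtml")
    if link_rels(amp_html).get("canonical") != page_url:
        issues.append("AMP page does not point back with rel=canonical")
    return issues


page_url = "https://example.com/news/story"
amp_url = "https://example.com/news/story/amp"
page_html = '<head><link rel="canonical" href="https://example.com/news/story"></head>'
amp_html = '<head><link rel="canonical" href="https://example.com/news/story"></head>'

issues = amp_pairing_issues(page_url, page_html, amp_url, amp_html)
print(issues)
```

In this sample the AMP page is set up correctly, but the regular page never announces it, which matches the situation in the anecdote: the work was done, yet Google was never told the AMP pages were there.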

With DeepCrawl, you can set the ability to crawl your site in a dev environment?

JM: Yup, absolutely. Replicate the crawl before you launch the site.

It could be a lot of these types of companies: big retailers that may be launching a new product section and want to crawl it before they launch it and put it out there on the site, or it could just be a site migration. You’ve built the new site, you’ve done the site work, and you want to check it before you push it out. We can crawl in the dev environment and replicate how Google would see it. People get a lot of value from doing that sort of thing as well.

How can one utilize a tool like DeepCrawl or whatnot to analyze those backlinks to make sure they are or are not toxic before filing that ever important disavow?

JM: I think it’s a subtly different thing for us. As I’ve mentioned earlier, there are some great link tools out there in the marketplace. The ones that we’ve chosen to work with are Moz, Ahrefs, and Majestic because they just offer a huge scope and the ability to see effectively the link universe as such. The nuance we’re giving to our customers comes from the fact that, at the end of the day, we’re a crawler, so we’re looking to stay true to what we do, which is technical SEO and, ultimately, the ability to crawl information.

The bit which we want to ingest and bring into DeepCrawl sits alongside our ability to crawl the internal site architecture and understand how pages are interacting with each other. We give pages a DeepRank score, which allows you to understand what, from an SEO perspective, are high-quality and authority pages and what are low-quality pages.

I think that data layer of backlinks enables you to see what kind of links are coming in on what pages within the site. We feel that’s an incredibly powerful thing for an SEO to have because you can then start to think about what is the quality of the links.

Are they good quality links?
Are they toxic links?

Breaking them down and understanding the negative and positive effect that you’ll see from those links, but then adding that extra layer, which is that ability to see where those links land in the site and see if you’ve got some incredibly good quality links from like a Yahoo or something like that, that are landing on a level 47 page, and it’s a non-authority page deep in the site. What’s the point?

Let’s think about [what we can do]:

Some easy link cleanup.
Put some redirects in place.
Move those good quality links into the higher authority pages (maybe sub-categories on the retailer site or the top level).

You’re actually using it to do some positive work to actually re-engineer where to best use your backlinks. Because, as we all know, with big publishers over the years, you just get people linking to you. You’ll see a site with a million and a half links, and I can guarantee you half of those will just be organically fed links where people have just linked to the publisher… and they are just dropping into all manner of different places. They will be high-quality links or poor-quality, network-type blog links or whatever it might be.

There’s a nice opportunity just to understand how the whole ecosystem fits together and we want to kind of give our customers that opportunity to be able to really utilize the best backlink data from some great providers into crawl reports and take it to the next level of what you can actually action data for.

We’ve talked about Google Analytics data and we’ve discussed link data a bit. Where in the search universe does Google Search Console come into play?

JM: I think that, for me, it just comes into the analytics piece. I’ve put Google Analytics into the consumer space because we’re looking at visitors and how people are navigating. For GSC, we brought out, about two months ago, a pretty advanced integration to allow you guys to see all of your Google Search Console data within the DeepCrawl environment.

I think Google did a really good job with GSC. I think it’s a good revision of Webmaster Tools and they’ve built something, which is pretty handy for everybody out there. I can guarantee every SEO in the world has got a GSC account or a GA account.

Again, what we wanted to do was take all of that goodness of the data that you see within GSC and bring that into the DeepCrawl crawl environment so actually when you run a crawl, you just pull in the GSC data, you see that alongside your crawl report data, you see how that’s performing for your site and all of your internal architecture, alongside all of the external factors of GSC.

It’s just about the added layer… just to give you the ability to take everything that every SEO probably loves on a Monday morning and looks at GSC and sees what’s going on, but to actually directly and automatically marry that to your crawl data to allow you to start making actionable changes and calls based against GSC data or decisions that you will make to alter from the technical SEO perspective.

For us, it’s the third segment of four in the search universe, but it’s a pretty powerful segment to actually want to play with and make sure you can see that update alongside the crawl data.

How important would you say it is for a company to give their SEO the access to their log file data?

JM: I think it’s incredibly important. You’re always going to struggle in some cases. Basically, on some companies, maybe large enterprise-level companies, that lock and secure everything down. You’ll be struggling to get that data off them. If I was to say how important it is, I think it’s incredibly important to have that data.

The way that we’ve looked at it is not to just go and write, “here’s a massive data drop of absolutely everything,” because we don’t need that. We just kind of want the search engine bits. We don’t need all the other stuff because a log file can get incredibly large and bloated based on the size of the site.

You don’t want to have to weed through all that stuff to just get to the bit that you truly want. What we’re doing is, when we’re bringing in the file, we’re extracting the Google data and the search engine data for Googlebot and mobile bot and so on and so on. We’re just running that stuff into DeepCrawl so you actually get the bit that you really need to make the difference, which is probably 5-10 percent of the log file, and getting rid of the other part of it, trying to make your lives easier so you don’t have to go through that whole process.
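Extracting only the search engine requests from a server log can be approximated with a short script. The sketch below assumes logs in the common Apache/Nginx "combined" format and filters on user-agent substrings; the token list and sample lines are illustrative, and this is not how DeepCrawl itself processes logs. User agents can be spoofed, so a production version would also want to verify the requesting IP addresses.

```python
import re
from collections import Counter

# Combined Log Format:
# ip ident user [time] "method path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Illustrative, non-exhaustive list of crawler user-agent tokens.
BOT_TOKENS = ("googlebot", "bingbot", "yandex", "baiduspider", "duckduckbot")


def search_bot_hits(lines):
    """Yield parsed log entries whose user agent looks like a search engine bot."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        if any(token in agent for token in BOT_TOKENS):
            yield match.groupdict()


sample = [
    '66.249.66.1 - - [02/Nov/2017:10:00:00 +0000] "GET /products/shoes HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [02/Nov/2017:10:00:01 +0000] "GET /products/shoes HTTP/1.1" '
    '200 5120 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '66.249.66.1 - - [02/Nov/2017:10:00:05 +0000] "GET /old-page HTTP/1.1" '
    '404 312 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 '
    '(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

hits = list(search_bot_hits(sample))
status_counts = Counter(hit["status"] for hit in hits)
print([hit["path"] for hit in hits], dict(status_counts))
```

Only the two Googlebot requests survive the filter, and counting them by status code immediately shows a crawler hitting a 404, the sort of signal that is hard to spot in the full, unfiltered log.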

To listen to this Search Engine Nerds Podcast with Jon Myers:

Listen to the full episode at the top of this post
Subscribe via iTunes
Sign up on IFTTT to receive an email whenever the Search Engine Nerds podcast RSS feed has a new episode
Listen on Stitcher, Overcast, or Pocket Casts

Think you have what it takes to be a Search Engine Nerd? If so, message Loren Baker on Twitter, or email him at loren [at] searchenginejournal.com. You can also email Brent Csutoras at brent [at] alphabrandmedia.com.

Visit our Search Engine Nerds archive to listen to other Search Engine Nerds podcasts!

Image Credits
Featured Image: Paulo Bobita
