Bing announced this week that a few months ago it rolled out a spam filtering mechanism targeting a common spam technique known as URL keyword stuffing (KWS).
URL KWS is a black hat technique designed to manipulate search engines into giving a page a higher rank than it deserves.
URL KWS relies on two assumptions about ranking algorithms:
- Keyword matching is used, and
- Matching against the URL is especially valuable.
That’s an overly simplified view, considering search engines employ thousands of signals to determine page ranking. However, these two signals do indeed play a role.
Having identified these perceived ‘vulnerabilities’, spammers attempt to take advantage of them by creating keyword-rich domain names. Since the spammer’s goal is to maximize impressions, they tend to go after high-value, high-frequency, monetizable keywords (e.g. viagra, loan, payday, outlet, free).
Those are the basics of the URL KWS concept. Spammers implement it in a variety of ways, each with its own results. These are some of the more common approaches (note: the URLs mentioned below are fake and used only to make a point):
- Multiple hosts, with keyword-rich hostnames: http://account.free.online.savings.samedaypaydayloansusa.com
- Host/ domain names with repeating keywords: http://loan.payday.paydayloanspaydayloansusa.com
- URL clusters across the same domain, with varied hostnames composed of keyword permutations: http://contososhoeswomen.shoesonsale.com/, http://bestwomensrunningsneakers.shoesonsale.com/, http://discountrunningapparelforwomen.shoesonsale.com/
- URL squatting: This is a little different, as the spammer plays on the human tendency to misspell keywords, in effect siphoning traffic off existing (typically high-profile, high-traffic) sites. E.g. http://nytime.com (misspelling of http://nytimes.com), http://ebey.com (misspelling of http://ebay.com)
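To make two of these patterns concrete, here is a minimal Python sketch. It is not how Bing or any spammer toolkit works; the keyword list, the list of known hosts, and the function names are all assumptions made up for this illustration. It counts repeated keywords in a hostname and flags hostnames that sit within one edit of a well-known one.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical lists, chosen only for this illustration.
KEYWORDS = ["payday", "loan", "free", "savings"]
KNOWN_HOSTS = ["nytimes.com", "ebay.com"]


def hostname(url: str) -> str:
    """Return the lowercase hostname of a URL."""
    return (urlsplit(url).hostname or "").lower()


def keyword_counts(url: str) -> Dict[str, int]:
    """Count how often each keyword appears anywhere in the hostname."""
    host = hostname(url)
    return {kw: host.count(kw) for kw in KEYWORDS if kw in host}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def looks_like_squat(url: str, max_distance: int = 1) -> Optional[str]:
    """Return the known host this URL nearly matches, or None."""
    host = hostname(url)
    for known in KNOWN_HOSTS:
        if host != known and edit_distance(host, known) <= max_distance:
            return known
    return None
```

With these assumed lists, `keyword_counts("http://loan.payday.paydayloanspaydayloansusa.com")` reports "payday" and "loan" three times each, and `looks_like_squat("http://ebey.com")` returns `"ebay.com"`. Plain substring counting is deliberately crude: it treats "loans" as a hit for "loan", which is acceptable for a sketch but would need proper word segmentation in practice.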
How does Bing detect URL KWS?
Bing did not give out specific details on detection algorithms because spammers are likely to use that knowledge to their advantage. What Bing did reveal is that they look at a number of signals that suggest possible use of URL keyword stuffing, such as:
- Site size
- Number of hosts
- Number of words in host/ domain names and path
- Host/ domain/ path keyword co-occurrence (inc. unigrams and bigrams)
- % of the site cluster comprised of top frequency host/ domain name keywords
- Host/ domain names containing certain lexicons/ pattern combinations (e.g. [“year”, “event | product name”], http://www.turbotaxonline2014.com)
- Site/page content quality & popularity signals
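Since Bing did not publish its algorithms, the following Python sketch only illustrates how a few of the signals above (number of hosts, words per hostname, unigram and bigram co-occurrence, and the share of hosts containing the most frequent keyword) could be computed for a group of URLs. The vocabulary, the regex-based word matching, and every function and field name are assumptions for this example, not Bing's method.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical vocabulary; a real system would need a far larger lexicon.
VOCAB = ["shoes", "women", "running", "sneakers", "discount", "apparel", "best", "sale"]
# Longest words first so the regex prefers the longest match at each position.
WORD_RE = re.compile("|".join(sorted(VOCAB, key=len, reverse=True)))


def host_of(url):
    """Return the lowercase hostname of a URL."""
    return (urlsplit(url).hostname or "").lower()


def host_words(url):
    """Return the vocabulary words found, left to right, in the URL's hostname."""
    return WORD_RE.findall(host_of(url))


def cluster_features(urls):
    """Compute a few illustrative URL KWS features for a group of URLs."""
    words_by_host = {host_of(url): host_words(url) for url in urls}

    unigrams = Counter()
    bigrams = Counter()
    for words in words_by_host.values():
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs

    top_word = unigrams.most_common(1)[0][0] if unigrams else None
    hosts_with_top = sum(top_word in words for words in words_by_host.values())
    share = hosts_with_top / len(words_by_host) if words_by_host else 0.0
    return {
        "num_hosts": len(words_by_host),
        "words_per_host": {h: len(w) for h, w in words_by_host.items()},
        "unigrams": unigrams,
        "bigrams": bigrams,
        "top_word": top_word,
        "top_word_host_share": share,
    }
```

Run on the three fake shoesonsale.com URLs from the cluster example earlier, this finds three hosts, "shoes" as the most frequent keyword (four occurrences), and the bigram ("shoes", "sale") in every hostname. How such features would be weighted or thresholded is exactly the part Bing kept private, so the sketch stops at feature extraction.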
Bing also illustrates what kind of impact this spam filtering has had on users and the SEO community.
Where users are concerned, this update has impacted ~3% of Bing queries (on average, ~1 in 10 URLs was filtered out per impacted query).
Where the SEO community is concerned, around 5M sites, comprising over 130M URLs, have been impacted, resulting in a reduction of more than 75% in traffic to these sites from Bing.