This post may look like it was taken straight from the SEOBlackHat.com blog, but it is not.
I am going to show you in detail a very clever piece of trackback spam, one so good that even people who check their trackbacks manually could fall for it and let it slip through, out of their own greed for high-PR links from authority sites on .EDU or .GOV domains.
The spammer exploits this greed, along with a set of other clever methods, to make the scam invisible even to human eyes at first glance.
Trackback spam is annoying more than anything else. My relatively young blog at ReveNews.com already gets several hundred trackback attempts from spammers.
Unfortunately, there is no way of telling them that no trackback goes online before a human approves it. Trackbacks are not like comments on a post, so a delayed display of a trackback (if the author approves it) does not hurt anybody’s feelings. The MovableType trackback spam filter does not work as well as the comment spam filter. It is notorious for flagging good trackbacks as spam about as often as it lets real spam through as “seems to be alright”.
So I take the time every other day to scan the trackbacks flagged as junk and check whether a legitimate trackback ended up there. Since most are junk, I select them all first in order to delete them. I thought I had found a legitimate one and unchecked it. When I looked further, another one caught my eye. Wait, that one looks familiar. Was it just déjà vu?
I finished up and deleted all the obvious junk. What remained looked very interesting. If you have a problem reading it, click on the image for a larger version with more details. Look at the highlighted trackback first and ignore the rest. Looks like a real trackback to you too, right?
What raised one flag, the biggest of all, was the fact that 4 posts got a trackback within 2 days. The spammer was smart, because I don’t think it is a coincidence that the targeted posts were at least one month old but less than three: posts that are not new, but also not too old for a trackback.
Now, the trackback all by itself, without showing up with its family, would have passed my “eye-scan” test, and I am sure not just mine. Can anybody see any of the words I will not mention here? PSA ads from Google AdSense would not make this page any prettier.
Let’s have a closer look at it. Okay, the most obvious thing is the absence of any typical keywords that are usually targeted with this type of spam.
- (yellow) Well-selected text (2 partial sentences) that makes the excerpt look like something you would expect from a legit post that links to yours.
- (blue) This is the title of the post. It finishes the first ½ sentence in the excerpt. The sentence makes perfect sense, especially because the column to the right shows the post title and matches the content of the excerpt.
- (red) The name of the blog, which is often the website name, company or person, and is used perfectly to add the “what” and “who” or “where” elements that are commonly used by bloggers.
- (green) The site or blog where the trackback was supposedly triggered from. The excerpt seems to show the beginning of a conversation that was simply cut off at the end of the excerpt, making you believe that there is something going on right now. The partial URL in the excerpt matches the URL in the From column. Oh, a link from Stanford.edu, a PR8 site on an educational or governmental top-level domain (TLD). Yes, today must be a good day… not.
I always visit the site where the trackback came from to see what the other blogger is writing about. If it is related enough to my post to link to it, and I write about stuff that interests me, chances are that the post is interesting to me as well. I often even engage in the discussion over at the other blog and make new contacts and maybe even friends. That is what blogging is all about.
I followed the link this time too. At the Stanford site I did not find a post that links to my post, of course; what I got to see instead was part of this smart spam scheme.
Other than not finding at the destination what I expected, nothing looks suspicious at first glance. Let’s have a closer look at the details.
It appears to be a search result at the Stanford Site for some kind of research. It’s a university site, so what else would you expect, right? Now look at the message that shows the search phrase. Is that a link? Yes, it is. But how did that get in there?
Look between the words “research and recommendations on” and “products and flor-essence health”
This is query-string-encoded HTML code. Here is the code:
And this is the target site, the home of the trackback spammer. It is a site that does not have much content (at least it is not random scraper garbage). Whether that content is unique, consists of free-to-republish articles with the author bio wrongfully removed, or is stolen copyrighted content is a different story. You find the usual suspects for monetizing the site: AdSense ads and affiliate links.
Linkshare Affiliate ID: /buuDimF0Z8
Google AdSense Publisher ID: pub-5024999383890701
I am an affiliate myself and contacted the affiliate manager of Vitacost.com, asking them to remove this affiliate from their program, because affiliates like that are the cause of affiliate marketing’s negative reputation and of the prejudice that all affiliates are spammers.
I might add that the PR5 of the search results page at the Stanford site is probably only an estimated value and not the real PageRank. There are no backlinks to be found in Google yet, which does not surprise me, because the scam is too new for that.
The exploit used by the spammer relies on the fact that the web developer took the search phrase as is, without escaping it, and thereby opened the site to all kinds of exploits, some worse than this one, such as SQL injection and that sort of thing.
Here is an example of how the site should have done it to keep this exploit from working. I took the critical part of the search phrase used on the Stanford site and entered it at AListApart.com.
You can see that the site does not render the HTML code, but shows it as it was entered. There is nothing to gain for a potential spammer here.
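The difference between the two sites comes down to whether the search phrase is escaped before it is echoed back into the page. Here is a minimal sketch of that idea in Python. The phrase, the example.com link and the two render functions are my own placeholders for illustration; they are not the spammer's actual string and not the code of either site.

```python
from html import escape
from urllib.parse import quote_plus, unquote_plus

# Hypothetical search phrase with an embedded link.
# example.com is a placeholder, not the spammer's real target.
phrase = 'research on <a href="http://example.com/">products</a>'

# What the phrase looks like once it is packed into a query string.
encoded = quote_plus(phrase)


def render_unsafe(query_value: str) -> str:
    """Echo the decoded phrase as is: the injected link becomes real HTML."""
    return "<p>Results for: " + unquote_plus(query_value) + "</p>"


def render_safe(query_value: str) -> str:
    """Escape HTML special characters first: the link is shown as plain text."""
    return "<p>Results for: " + escape(unquote_plus(query_value)) + "</p>"


print(encoded)
print(render_unsafe(encoded))
print(render_safe(encoded))
```

Escaping the output like this only deals with the injected HTML. SQL injection is a separate problem and is usually handled with parameterized queries rather than output escaping.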
This spam is smart. It could even trick humans who check manually, if they are not careful. I thought about not making the details public, because other spammers might adopt them.
I decided to publish anyway, because I think it is important that bloggers and web developers learn about it, start closing the holes and become more diligent.
Here is the to-do list
- Web Developers: Check and fix your code for this exploit
- Bloggers: Watch out for these types of trackbacks. Double-check that each one comes from a post that really links to you and matches what you see in the excerpt
- Spammers: Use this exploit to death, so that even my mom will get to know about the scam and be able to detect it
- Search Engines: I don’t know, probably clean the affected SERPs by using your secret “b-y-h-a-n-d” algorithm
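The check for bloggers in the list above can be automated to a degree. This is only a rough sketch, assuming you already have the HTML of the page the trackback claims to come from. The `LinkCollector` class, the `links_to` function and all URLs are hypothetical names I made up for the illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to(page_html, my_post_url):
    """Return True if the page contains a link to my post (ignoring a trailing slash)."""
    collector = LinkCollector()
    collector.feed(page_html)
    target = my_post_url.rstrip("/")
    return any(href.rstrip("/") == target for href in collector.hrefs)


# Hypothetical pages: one that really links to the post, one that does not.
my_post = "http://blog.example.org/my-post/"
legit_page = '<p>See <a href="http://blog.example.org/my-post">this post</a>.</p>'
spam_page = '<p>Results for <a href="http://spam.example.com/">products</a></p>'

print(links_to(legit_page, my_post))
print(links_to(spam_page, my_post))
```

A page that passes this test can still be spam, so it does not replace reading the other post. It only catches the case described here, where the originating page does not link to you at all.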
The Stanford site was only one of 3 sites used by the spammer, and that is counting only the ones that targeted my blog. There must be hundreds if not thousands of sites out there that are being exploited in the same way. The other two sites that trackback to my blog are: www.vitaminworld.com and online.sagepub.com. I contacted all three sites and hope that they will quickly close the security holes that were exploited.
I have another trackback from earlier this week from the same guy that uses only the excerpt tricks shown above, but with a link directly to his site. At the end of this week he combined that trick with the search box exploit, which makes it virtually undetectable for humans unless you check out the site where the trackback originated.
Message to the spammer: The way you used the exploits shows that you are smart, creative and know people and what they want, all the things that make a good marketer. Don’t waste those talents on scams; do something clean and white-hat with them. I am sure that you will become successful without bending or breaking the rules.
Internet Marketer and Entrepreneur, operator of the Marketing and Web Development Resources portal at Cumbrowski.com