WebProNews

Tag: webspam

  • SEOmoz Takes On Webspam With Ambitious Project, Talks Penguin Update

    SEOmoz is working on a new spam research project aimed at classifying, identifying and removing (or at least limiting) the link juice that spam pages and sites can pass – a pretty ambitious goal, to say the least. Can SEOmoz do this better than Google itself?

    CEO Rand Fishkin announced the project on Google+ Monday evening, acknowledging that his company is “certainly not going to be as good at it or as scaled as Google,” but that it’s making for interesting research.

    Fishkin tells WebProNews that Google’s Penguin update was not the motivator behind the project, though he did have this to say about the update:

    “In terms of Penguin – it’s done a nice job of waking up a lot of folks who never thought Google would take this type of aggressive, anti-manipulative action, but I think the execution’s actually somewhat less high quality than what Google usually rolls out (lots of search results that look very strange or clearly got worse, and plenty of sites that probably shouldn’t have been hit).”

    You can read more about Penguin via our various articles on the topic here.

    “We’ve been wanting to work on this for a long time, but our data scientist was previously tied up on other items (and we’ve just hired a research assistant for the project),” Fishkin tells us. “The original catalyst was the vast quantity of emails and questions we get about whether a page/site is ‘safe’ to acquire links from, or whether certain offers (you know the kind – ‘$100 for 50 permanent text links guaranteed to boost your Google rankings!’) were worthwhile.”

    “Tragically, there’s a lot of money flowing from people who can barely afford it but don’t know better, to spammers who know that what they’re building could hurt their customers, and Google refuses to take action to show which spam they know about,” he continues. “Our eventual goal is to build a metric marketers and site owners can use to get a rough sense of a site’s potential spamminess in comparison to others.”

    “A score (or scores) of some kind would (eventually, assuming the project goes well) be included in Mozscape/OSE showing the spamminess of inlinks/outlinks,” he explained in the Google+ announcement.

    According to Fishkin, the SEOmoz algorithms will be conservative and focus on the most obvious and manipulative forms of spam. “For example, we’d probably catch a lot of very obvious/bad link farms, but not necessarily many private blog networks or paid links from reputable sites,” he said in response to a comment on his Google+ post.
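
    SEOmoz hasn’t shared how the metric will actually be computed, but conceptually, a conservative, pattern-based score of the kind Fishkin describes might look something like the toy sketch below (the feature names, thresholds, and weights are invented for illustration and are not anything SEOmoz has published):

    ```python
    # Toy sketch of a conservative "spamminess" score for a linking domain.
    # Feature names and thresholds are invented; SEOmoz has not published
    # how its planned metric will actually work.

    def spam_score(domain_stats):
        """Return a 0-100 score; higher means the domain looks more like an
        obvious link farm (many outlinks, few inlinks, thin content)."""
        score = 0
        # Obvious link farms tend to link out far more than they are linked to.
        if domain_stats["outlinks"] > 10 * max(domain_stats["inlinks"], 1):
            score += 40
        # Very little unique content per page is another common pattern.
        if domain_stats["avg_words_per_page"] < 100:
            score += 30
        # A large share of exact-match commercial anchor text is suspicious.
        if domain_stats["exact_match_anchor_ratio"] > 0.8:
            score += 30
        return min(score, 100)

    example = {
        "outlinks": 5000,
        "inlinks": 12,
        "avg_words_per_page": 60,
        "exact_match_anchor_ratio": 0.9,
    }
    print(spam_score(example))  # -> 100 for this obviously farm-like profile
    ```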

    Also in the comments, Fishkin indicated that the data would be presented in a “matches patterns of sites we’ve seen Google penalize/ban” kind of way rather than a “you are definitely webspam” type of thing.

    The data scientist Fishkin spoke of will present the findings at the company’s MozCon event in July. Fishkin expects an actual product launch late this year or early next year.

    Earlier this month, the company announced that it had raised $18 million in VC funding.

  • Google Penguin Update Recovery: Matt Cutts Says Watch These 2 Videos

    Danny Sullivan at Search Engine Land put up a great Penguin article with some new quotes from Matt Cutts. We’ve referenced some of the points made in other articles, but one important thing to note from the whole thing is that Cutts pointed to two very specific videos that people should watch if they want to clean up their sites and recover from the Penguin update.

    We often share Google’s Webmaster Help videos, which feature Cutts giving advice based on user-submitted questions (or sometimes his own questions). I’m sure we’ve run these in the past, but according to Sullivan, Cutts pointed to these:

    Guess what: in both videos, he talks about Google’s quality guidelines. That is your recovery manual, as far as Google is concerned. Here are some articles we’ve posted recently specifically on different aspects of the guidelines:

    Google Penguin Update: Don’t Forget About Duplicate Content

    Google Penguin Update: A Lesson In Cloaking

    Google Penguin Update Recovery: Hidden Text And Links

    Recover From Google Penguin Update: Get Better At Links

    Google Penguin Update: 12 Tips Directly From Google

    Google Penguin Update Recovery: Getting Better At Keywords

    Google Penguin Update: Seriously, Avoid Doorway Pages

    Google Penguin Update And Affiliate Programs

    So, in your recovery plan, take all of this into account, along with the tips that Cutts lent his seal of approval to.

    And when all else fails, according to Cutts, you might want to just start over with a new site.

  • Google Penguin Update: Google Granted Another Possibly Related Patent

    Google released the Penguin update a couple weeks ago, in an effort to rid its search engine results of webspam. The update targeted the kinds of things Google has always tried to rid its results of, but the update is supposed to make Google algorithmically better at it. That, combined with the ever-refreshing Panda update, could go a long way to keep Google’s results closer to spam-free than in previous years.

    Meanwhile, Google continues to secure related patents. Bill Slawski is always on top of the patents in the search industry, and recently pointed out some that may have a direct role in how Google handles Webspam. Today, Google was granted another, as Slawski points out. As usual, he does a wonderful job of making sense out of the patent.

    While it appears pretty complex, and there is more to it, part of it is about how Google can disassociate spam from legitimate content, which, at its most basic level, is the point of the Penguin update.

    It’s called Content Entity Management. Here’s the abstract:

    A first content entity and one or more associated second content entities are presented to one or more arbiters. Arbiter determinations relating to the association of at least one of the second content entities with the first content entity are received. A determination as to whether the at least one of the second content entities is to be disassociated from the first content entity based on the arbiter determinations can be made.

    “It makes sense for Google to have some kind of interface that could be used to both algorithmically identify webspam and allow human beings to take actions such as disassociating some kinds of content with others,” explains Slawski. “This patent presents a framework for such a system, but I expect that whatever system Google is using at this point is probably more sophisticated than what the patent describes.”

    The patent was filed as far back as March 2007.

    To the point about human beings, who, as Slawski acknowledges, could be Google’s human raters (and/or others on Google’s team), there is a part in the patent that says:

    In one example implementation, arbiters can also provide a rationale for disassociation. The rationale can, for example, be predefined, e.g., check boxes for categories such as “Obscene,” “Unrelated,” “Spam,” “Unintelligible,” etc. Alternatively, the rationale can be subjective, e.g., a text field can be provided in which an arbiter can provide reasons for an arbiter determination. The rationale can, for example, be reviewed by administrators for acceptance of a determination, or to tune arbiter agents, etc. In another implementation, the rationale provided by the two or more arbiters must also match, or be substantially similar, before the second content entity 110 is disassociated from the first content entity 108. Emphasis added.

    The actual background described in the filing talks a little about spam:

    A first content entity, e.g., a video and/or audio file, a web page for a particular subject or subject environment, a search query, a news article, etc., can have one or more associated second content entities, e.g., user ratings, reviews, tags, links to other web pages, a collection of search results based on a search query, links to file downloads, etc. The second content entities can, for example, be associated with the first content entity by a user input or by a relevance determination. For example, a user may associate a review with a video file on a web site, or a search engine may identify search results based on a search query.

    Frequently, however, the second content entities associated with the first content entity may not be relevant to the first content entity, and/or may be inappropriate, and/or may otherwise not be properly associated with the first content entity. For example, instead of providing a review of a product or video, users may include links to spam sites in the review text, or may include profanity, and/or other irrelevant or inappropriate content. Likewise, users can, for example, manipulate results of search engines or serving engines by artificially weighting a second content entity to influence the ranking of the second content entity. For example, the rank of a web page may be manipulated by creating multiple pages that link to the page using a common anchor text.

    Another part of the lengthy patent document mentions spam in relation to scoring:

    In another implementation, the content management engine 202 can, for example, search for one or more specific formats in the second content entities 110. For example, the specific formats may indicate a higher probability of a spam annotation. For example, the content management engine 202 can search for predetermined uniform resource locators (URLs) in the second content entities 110. If the content management engine 202 identifies a predetermined URL in a second content entity 110, the content management engine 202 can assign a low association score to the one second content entity 110.
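
    Stripped of the patent language, that scoring step boils down to: if a comment, review, or other “second content entity” contains a URL from a predetermined list, give it a low association score. Here’s a toy sketch of the idea (the host list and score values are invented, and the real system would obviously be far more involved):

    ```python
    # Toy version of the scoring step described above: if a comment or review
    # links to a host on a predetermined list, it gets a low association score.
    # The host list and the score values are invented for illustration.

    import re

    BAD_HOSTS = {"spam-pharmacy.example", "cheap-links.example"}  # hypothetical

    URL_RE = re.compile(r"https?://([^/\s]+)", re.IGNORECASE)

    def association_score(second_entity_text, default=1.0, low=0.1):
        """Return a low association score if the text links to a listed host."""
        for host in URL_RE.findall(second_entity_text):
            host = host.lower()
            if host.startswith("www."):
                host = host[4:]
            if host in BAD_HOSTS:
                return low
        return default

    print(association_score("Buy meds at http://spam-pharmacy.example/x"))  # 0.1
    print(association_score("Really helpful video, thanks."))               # 1.0
    ```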

    Another part, discussing comments, also talks about spam detection:

    In another implementation, a series of questions can be presented to an arbiter, e.g., “Is the comment interesting?,” “Is the comment offensive?,” “Does this comment appear to be a spam link?” etc. Based on the arbiter answers, the content management engine 102 or the content management engine 202 can, for example, determine whether one or more second content entities are to be disassociated with a first content entity item.
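
    The arbiter step, in turn, amounts to collecting answers and checking for agreement before disassociating anything. Here’s a rough sketch of that aggregation, using the two-arbiter agreement mentioned in the earlier excerpt (everything else here is assumed):

    ```python
    # Sketch of aggregating arbiter determinations into a disassociation
    # decision. The patent excerpt above says two or more arbiters must agree;
    # the rest of this (categories, return shape) is assumed for illustration.

    from collections import Counter

    def should_disassociate(arbiter_rationales, required_agreement=2):
        """arbiter_rationales: one entry per arbiter -- a category string such
        as "Spam", "Obscene", or "Unrelated", or None if no problem was seen."""
        votes = Counter(r for r in arbiter_rationales if r is not None)
        if not votes:
            return False, None
        rationale, count = votes.most_common(1)[0]
        return count >= required_agreement, rationale

    print(should_disassociate(["Spam", "Spam", None]))       # (True, 'Spam')
    print(should_disassociate(["Spam", "Unrelated", None]))  # (False, 'Spam')
    ```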

    The document is over 10,000 words of patent speak, so if you’re feeling up to that, by all means, give it a look. It’s always interesting to see the systems Google has patented, though it’s important to keep in mind that these aren’t necessarily being used in the way they’re described. Given the amount of time it takes for a company to be granted a patent, there’s always a high probability that the company has moved on to a completely different process, or at least a much-evolved version. And of course, various systems can work in conjunction with one another. It’s not as if any one patent is going to provide a full picture of what’s really going on behind the scenes.

    Still, there can be clues within such documents that can help us to understand some of the things Google is looking at, and possibly implementing.

    Image: Batman: The Animated Series

  • Google Penguin Update: Report Spam With Google Docs

    You can learn a lot of little helpful tidbits by listening to what Google’s head of webspam, Matt Cutts, has to say, and lucky for webmasters, he’s always saying something through various channels on the web. This includes YouTube videos, his blog, Google+, Twitter, Google’s official blogs and various forums and comments threads.

    In case you were wondering, according to Cutts (talking on Twitter), it’s fine if you want to send a link to a Google Docs spreadsheet when you report Penguin spam.

    @mattcutts Can we send a link to a Google Docs spreadsheet when reporting spam? #penguin

    Last week, Cutts tweeted that Google had read and processed almost all post-Penguin spam reports:

    @Penguin_Spam yup yup, we’ve read/processed almost all of them. A few recent ones left.

    I’m sure there have been some reports submitted since then, but clearly Google isn’t taking too long to sift through them.

  • Google Penguin Update Recovery: Getting Better At Keywords

    Last week, Google unleashed its Penguin update upon webmasters. The update, as you may know, was designed to decrease the rankings of sites engaging in black hat SEO tactics and webspam. One of the classic black hat tactics is keyword stuffing, so if you’ve been doing this and getting away with it in the past, there’s a good chance the update took you down a notch.

    Specifically, Google’s Matt Cutts said the update “will decrease rankings for sites that we believe are violating Google’s existing quality guidelines.” Avoiding keyword stuffing has long been one of these guidelines. The guideline says, “Don’t load pages with irrelevant keywords.”

    Google has a page about this in its help center, where it elaborates a little more. Here’s what Google says, verbatim, about keyword stuffing there:

    “Keyword stuffing” refers to the practice of loading a webpage with keywords in an attempt to manipulate a site’s ranking in Google’s search results. Filling pages with keywords results in a negative user experience, and can harm your site’s ranking. Focus on creating useful, information-rich content that uses keywords appropriately and in context.

    To fix this problem, review your site for misused keywords. Typically, these will be lists or paragraphs of keywords, often randomly repeated. Check carefully, because keywords can often be in the form of hidden text, or they can be hidden in title tags or alt attributes.
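
    Google doesn’t publish any density threshold, but if you want a quick self-audit along those lines, a rough sketch like the one below can surface pages where a single term dominates the copy (the five percent cutoff and minimum word count are arbitrary, and this is only a manual-review aid, not a replica of anything Google runs):

    ```python
    # Rough self-audit: flag terms that dominate a page's text. The 5% cutoff
    # and 50-word minimum are arbitrary; this only surfaces obviously
    # repetitive copy for manual review.

    import re
    from collections import Counter

    def stuffing_report(text, threshold=0.05, min_words=50):
        """List terms longer than three characters whose share of the page's
        words exceeds the threshold."""
        words = re.findall(r"[a-z']+", text.lower())
        if len(words) < min_words:
            return []
        counts = Counter(words)
        total = len(words)
        return [(word, round(count / total, 3))
                for word, count in counts.most_common(10)
                if len(word) > 3 and count / total > threshold]

    page = ("cheap viagra online buy viagra now viagra discount " * 20
            + "plus one ordinary sentence about shipping and returns " * 5)
    print(stuffing_report(page))  # 'viagra' and friends dominate this page
    ```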

    Unlike some of the other black hat tactics advised against in the guidelines, such as cloaking, Google specifically named keyword stuffing in its announcement of the Penguin update. Cutts even provided the following image in the announcement, highlighting this particular tactic:

    Penguin Announcement: Keyword Stuffing

    Cutts has spoken out about the practice plenty of times in the past. Here’s a humorous example of when he called out one site in particular about five years ago.

    More recently – last month, in fact – Cutts talked about a related violation in a Google+ update. He discussed phone number spam, which he essentially equates to keyword stuffing.

    “I wanted to clarify a quick point: when people search for a phone number and land on a page like the one below, it’s not really useful and a bad user experience. Also, we do consider it to be keyword stuffing to put so many phone numbers on a page,” he wrote. “There are a few websites that provide value-add for some phone numbers, e.g. sites that let people discuss a specific phone number that keeps calling them over and over. But if a site stuffs a large number of numbers on its pages without substantial value-add, that can violate our guidelines, not to mention annoy users.”

    Here’s the image he was referring to:

    Phone Number Spam

    Getting Better At Keywords

    Cutts has advised that you not spend any time worrying about the keywords meta tag (though Google does use the meta description tag):

    In March, Google released a video about 5 common SEO mistakes and 6 good ideas:

    One of the “good ideas” was:

    Include relevant words in your copy: Try to put yourself in the shoes of searchers. What would they query to find you? Your name/business name, location, products, etc., are important. It’s also helpful to use the same terms in your site that your users might type (e.g., you might be a trained “flower designer” but most searchers might type [florist]), and to answer the questions they might have (e.g., store hours, product specs, reviews). It helps to know your customers.

    I’d suggest including them in your titles as well.

    Matt Cutts has talked about keywords a lot in various Webmaster Help videos. If you want to make sure you’re getting keywords right, I’d advise watching some of these discussions (straight from the horse’s mouth). They’re generally short, and won’t require a lot of time:

  • Recovering From Google’s Penguin Update

    First, before you start your campaign for Penguin recovery, you should probably determine whether you were actually hit by the Penguin update, or by the Panda update (or even some other Google algorithm change).

    Shortly after the Penguin update rolled out, Google’s Matt Cutts revealed that Google had implemented a data refresh for the Panda update several days earlier. This threw off early analysis of the Penguin update’s effects on sites, as the Panda update was not initially accounted for. Searchmetrics put out a list of the top losers from the Penguin update, which was later revised to reflect the Panda refresh.

    Google also makes numerous other changes, and there’s no telling how many other adjustments they made between these two updates, and since the Penguin update. That said, these two would appear to be the major changes most likely to have had a big impact on your site in the last week or two.

    According to Cutts, the Panda refresh occurred around the 19th. The Penguin update (initially referred to as the Webspam Update) was announced on the 24th. The announcement indicated it could take a “few days”. Analyze your Google referrals: if they dropped off before the 24th (and around or after the 19th), you’re more likely looking at Panda; if the drop came on or after the 24th, Penguin is the more likely culprit, at least in theory.
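
    In practice, that check is just a before-and-after comparison of your daily Google referral counts around those two dates. Here’s a rough sketch of the idea (the 30 percent drop threshold and the date windows are illustrative, not anything Google has specified):

    ```python
    # Sketch of the diagnostic described above: compare average daily Google
    # referrals around the Panda refresh (~April 19, 2012) and the Penguin
    # announcement (April 24, 2012). The 30% drop threshold is arbitrary.

    from datetime import date

    PANDA_REFRESH = date(2012, 4, 19)   # per Cutts, the Panda data refresh
    PENGUIN_LAUNCH = date(2012, 4, 24)  # the Penguin announcement

    def likely_update(daily_referrals, drop=0.30):
        """daily_referrals: dict of datetime.date -> Google referral count."""
        def avg(start, end):
            vals = [v for d, v in daily_referrals.items() if start <= d < end]
            return sum(vals) / len(vals) if vals else 0.0

        before = avg(date(2012, 4, 1), PANDA_REFRESH)
        between = avg(PANDA_REFRESH, PENGUIN_LAUNCH)
        after = avg(PENGUIN_LAUNCH, date(2012, 5, 10))

        if before and between < before * (1 - drop):
            return "Panda (referrals fell before the 24th)"
        if between and after < between * (1 - drop):
            return "Penguin (referrals fell on or after the 24th)"
        return "No clear drop tied to either update"

    sample = {date(2012, 4, d): (1000 if d < 24 else 500) for d in range(1, 30)}
    print(likely_update(sample))  # "Penguin (referrals fell on or after the 24th)"
    ```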

    If it looks more likely to be Panda, the best advice is probably to focus on making your content itself better. Also, take a look at Google’s list of questions the company has publicly said it considers when assessing the quality of a site’s content. We’ve written about these in the past, but I’ll re-list them here:

    • Would you trust the information presented in this article?
    • Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
    • Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
    • Would you be comfortable giving your credit card information to this site?
    • Does this article have spelling, stylistic, or factual errors?
    • Are the topics driven by genuine interests of readers of the site, or does the site generate content by attempting to guess what might rank well in search engines?
    • Does the article provide original content or information, original reporting, original research, or original analysis?
    • Does the page provide substantial value when compared to other pages in search results?
    • How much quality control is done on content?
    • Does the article describe both sides of a story?
    • Is the site a recognized authority on its topic?
    • Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?
    • Was the article edited well, or does it appear sloppy or hastily produced?
    • For a health related query, would you trust information from this site?
    • Would you recognize this site as an authoritative source when mentioned by name?
    • Does this article provide a complete or comprehensive description of the topic?
    • Does this article contain insightful analysis or interesting information that is beyond obvious?
    • Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
    • Does this article have an excessive amount of ads that distract from or interfere with the main content?
    • Would you expect to see this article in a printed magazine, encyclopedia or book?
    • Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?
    • Are the pages produced with great care and attention to detail vs. less attention to detail?
    • Would users complain when they see pages from this site?

    Penguin is different. Penguin and Panda are designed to work together to increase the quality of Google’s search results. Whether or not you think this is actually happening is another story, but this does appear to be Google’s goal, and at the very least, that’s how it’s being presented to us.

    Google’s announcement of the Penguin update was titled “Another step to reward high-quality sites.”

    “The goal of many of our ranking changes is to help searchers find sites that provide a great user experience and fulfill their information needs,” Cutts wrote in the post. “We also want the ‘good guys’ making great sites for users, not just algorithms, to see their effort rewarded. To that end we’ve launched Panda changes that successfully returned higher-quality sites in search results. And earlier this year we launched a page layout algorithm that reduces rankings for sites that don’t make much content available ‘above the fold.’”

    If your site was hit by Penguin, you should, again, focus on quality content, and not trying to trick Google’s algorithm. All that Penguin is designed to do is to make Google better at busting you for abusing its algorithm. It’s designed to target those violating Google’s quality guidelines. The guidelines are not new. It’s not some new policy that is turning SEO on its ear. Google just found a way to get better at catching the webspam (again – at least in theory).

    So, with Penguin, rather than a list of questions Google uses to assess content, as with the Panda list, simply look at what Google has to say in the Quality Guidelines. Here they are broken down into 12 tips, but there is plenty more (straight from Google) to read as well. Google’s guidelines page has plenty of links talking about specific things not to do. We’ll be delving more into each of these in various articles, but in general, simply avoid breaking these rules, and you should be fine with Penguin. If it’s too late, you may have to start over, and start building a better link profile and web reputation without spammy tactics.

    Here’s a video Matt Cutts recently put out, discussing what will get you demoted or removed from Google’s index:

    Assuming that you were wrongfully hit by the Penguin update, Google has a form that you can fill out. That might be your best path to recovery, but you really need to determine whether or not you were in violation of the guidelines, because if you can look at your own site and say, “Hmm…maybe I shouldn’t have done this particular thing,” there’s a good chance Google will agree, and determine that you were not wrongfully hit.

    By the way, if you have engaged in tactics that do violate Google’s quality guidelines, but you have not been hit by the Penguin update, I wouldn’t get too comfortable. Google has another form, which it is encouraging people to fill out when they find webspam in search results.

    To report post-Penguin spam, fill out https://t.co/di4RpizN and add “penguin” in the details. We’re reading feedback.

    They’ve had this for quite a while, but now that some people are getting hit by the Penguin update, they’re going to be angry, and probably eager to point out stuff that Google missed, in an “if what I did was so bad, why wasn’t this person hit too?” kind of mentality.

    Another reason not to be too comfortable would be the fact that Google is likely to keep iterating upon the Penguin update. We’ve seen plenty of new versions and data refreshes of the Panda update come over the past year or so. Penguin is already targeting what Google has long been against. I can’t imagine that they won’t keep making adjustments to make it better.

  • Reconsideration Request Tips From Google [Updated]

    If you think you’ve been wrongfully hit by Google’s Penguin update, Google has provided a form that you can fill out, in hopes that Google will see the light and get your site back into the mix.

    The update is all about targeting those in violation of Google’s quality guidelines. It’s an algorithmic approach designed to make Google better at what it has been trying to do all along. For those Google has manually de-indexed, there is still a path to redemption, so it seems likely that those impacted by the update can recover as well.

    For example, if you were busted participating in a link scheme, you’re not necessarily out of Google forever. Google says once you’ve made changes to keep your site from violating Google’s guidelines, you can submit a reconsideration request.

    To do so, go to Webmaster Tools, sign into your Google account, make sure you have your site verified, and submit the request.

    Google’s Rachel Searles and Brian White discuss tips for your request in this video:

    “It’s important to admit any mistakes you’ve made, and let us know what you’ve done to try to fix them,” says Searles. “Sometimes we get requests from people who say ‘my site adheres to the guidelines now,’ and that’s not really enough information for us, so please be as detailed as possible. Realize that there are actually people reading these requests.”

    “Ask questions of the people who work on your site, if you don’t work on it yourself,” she suggests, if you don’t know why you’re being penalized. Obviously, read the quality guidelines. She also suggests seeking help on the Google Webmaster forum, if you’d like the advice of a third party.

    “Sometimes we get reconsideration requests, where the requester associates technical website issues with a penalty,” says White. “An example: the server timed out for a while, or bad content was delivered for a time. Google is pretty adaptive to these kinds of transient issues with websites. So if you sometimes misread the situation as ‘I have a penalty,’ and seek reconsideration, it’s probably a good idea to wait a bit, see if things revert to their previous state.”

    “In the case of bad links that were gathered, point us to a URL-exhaustive effort to clean that up,” he says. “Also, we have pretty good tools internally, so don’t try to fool us. There are actual people, as Rachel said, looking at your reports. If you intentionally pass along bad or misleading information, we will disregard that request for reconsideration.”

    “And please don’t spam the reconsideration form,” adds Searles. “It doesn’t help to submit multiple requests all the time. Just one detailed concise report and just get it right the first time.”

    Google says they review the requests promptly.

    Update: Apparently reconsideration requests don’t do you a lot of good if you were simply hit by the algorithm. A reader shares (in the comments below) an email from Google in response to such a request:

    Dear site owner or webmaster of http://www.example-domain.com/,

    We received a request from a site owner to reconsider http://www.example-domain.com/ for compliance with Google’s Webmaster Guidelines.

    We reviewed your site and found no manual actions by the webspam team that might affect your site’s ranking in Google. There’s no need to file a reconsideration request for your site, because any ranking issues you may be experiencing are not related to a manual action taken by the webspam team.

    Of course, there may be other issues with your site that affect your site’s ranking. Google’s computers determine the order of our search results using a series of formulas known as algorithms. We make hundreds of changes to our search algorithms each year, and we employ more than 200 different signals when ranking pages. As our algorithms change and as the web (including your site) changes, some fluctuation in ranking can happen as we make updates to present the best results to our users.

    If you’ve experienced a change in ranking which you suspect may be more than a simple algorithm change, there are other things you may want to investigate as possible causes, such as a major change to your site’s content, content management system, or server architecture. For example, a site may not rank well if your server stops serving pages to Googlebot, or if you’ve changed the URLs for a large portion of your site’s pages. This article has a list of other potential reasons your site may not be doing well in search.

    If you’re still unable to resolve your issue, please see our Webmaster Help Forum for support.

    Sincerely,

    Google Search Quality Team

    Anyhow, should you need to submit a reconsideration request (I assume Google will still take manual action as needed), these tips might still come in handy.

    Image: Batman Returns from Warner Bros.

  • Google Webspam Update: Where’s The Viagra? [Updated]

    Update: Viagra.com is back at number one.

    As you may know, Google launched a new algorithm update, dubbed the Webspam Update. According to Google, it’s designed to keep sites engaging in black hat SEO tactics from ranking. The update is still rolling out, but it’s already been the target of a great deal of criticism. You can just peruse the comments on Google’s Webmaster Central blog post announcing the change, and see what people have to say.

    I can’t confirm that Viagra.com was number one in Google for the query “viagra,” but I can’t imagine why it wouldn’t have been. Either way, viagra.com is not the lead result now. That is, unless you count the paid AdWords version.

    Google Viagra results

    As you can see, the top organic result comes from HowStuffWorks.com. Then comes….Evaluations: Northern Kentucky University? Interesting. Here’s what that page looks like:

    Northern Kentucky University

    You’ll notice that this has absolutely nothing to do with Viagra.

    Browsing through some more of the results, there is some other very suspicious activity going on. Look at this result, which points to larryfagin.com/poet.html. That URL doesn’t sound like it would have anything to do with Viagra, yet Google’s title for the result says: “Buy Viagra Online No Prescription. Purchase Generic Viagra…” and the snippet says: “You can buy Viagra online in our store. This product page includes complete information about Viagra. We supply Viagra in the United Kingdom, USA and …”

    If you actually click on the result, it has nothing to do with Viagra. It’s about a poet named Larry Fagin. Not once is Viagra mentioned on the page.

    Larry Fagin

    Also on the first results page: aiam.edu. That’s the American Institute of Alternative Medicine. At least it’s semi-drug-related. However, once again, no mention of Viagra on this page, though the title and snippet Google is providing, again, indicate otherwise. Google also informs us, “this site may be compromised”. I’m not sure what about this particular result is telling Google’s algorithm that it should be displayed on page one.

    The next result is for loislowery.com:

    Lois Lowery

    You guessed it. Yet again, nothing to do with Viagra. And once again, Google displays a Viagra-related title and snippet for the result, and tells us the site may be compromised.

    Note: Not all of these results indicate that they’ve been compromised.

    A few people have pointed out the oddities of Google’s viagra SERP in the comments on Google’s announcement of the webspam algorithm change:

    Sean Jones says, “There is something wrong with this update. Search ‘viagra’ on Google.com – 3 edu sites are showing in the first page. Is it relevant? Matt you failed.”

    Lisaz says, “These results have to be a complete joke, so much unrelated content is now surfaced to the top it’s sickening. As a funny example check this one out….Search VIAGRA and look at the results on first page for USA queries. Two completely unrelated .edu’s without viagra or ED in their content. Another site about poetry with not even a mention of viagra anywhere to be found. Then two more sites that in google that have this site may be compromised warnings. LOL what a joke this update is. Sell your Google stocks now while you can.”

    ECM says, “Google.com. buy viagra online. Position 2… UNIVERSITY OF MARYLAND lol. I have seen a big mess in results now. Doesn’t this algo change just allow spammers to bring down competitors a lot more easily, just send a heap of junk/spam links to their sites. Nice one google, you’re becoming well liked. Enter BING.”

    How’s Bing looking on Viagra these days?

    Bing Viagra Results

    Yeah, I have to give Bing the edge on this one.

    And Yahoo:

    Yahoo Viagra Results

    And Blekko:

    Blekko Viagra Results

    And DuckDuckGo:

    DuckDuckGo Viagra Results

    We’ve seen people suggesting that the new Google update had a direct effect on exact match domain names. That could explain why viagra.com is MIA. However, it doesn’t exactly explain why some of these other results are appearing.

  • Google Webspam Update: “Make Money Online” Query Yields Less Than Quality Result

    Update: It looks like the top result has been changed now.

    Google announced a big algorithm change called the Webspam update. It’s in the process of rolling out, and is designed to penalize sites engaging in black hat SEO – activities that are direct violations of Google’s quality guidelines. In theory, it sounds like a good idea, but users are already complaining about the negative effects the update seems to have had on results.

    We looked at some weird things going on with Google’s results page for the query “viagra”. For one, viagra.com is not ranking at the top. This would be the obvious, most relevant choice. Most search engines agree, based on their rankings. Now, it’s nowhere to be found on the first results page for the query in Google. There are other weird results showing up as well.

    The lack of viagra.com might be explained as an issue having to do with exact match domains. People have already been talking about this in forums, and in the comments of Google’s blog post. The update, according to various webmasters, appears to have hit a fair amount of exact match domains. For example, viagra.com for the query “viagra”.

    Of course, not every exact match domain for every query is missing. For example, if you search “webpronews,” you’re still going to get WebProNews.com. But perhaps there is a subset of queries that tend to have more spam targeting that were hit in this manner, and even in a case like Viagra, in which the exact match actually is the most relevant result, the algorithm is not picking up on that.

    We’ve seen a few people point out Google’s SERP for “make money online”. I don’t know that makemoneyonline.com was the top result for this before anyway. It certainly should not be:

    Make Money Online

    But the top organic result now is makemoneyforbeginners.blogspot.com. As Google tells us in its own snippet: “No posts. No posts.”

    No Posts

    I don’t personally believe that the fact that it’s on Blogger (Google-owned) is much of a factor here, but it’s probably worth pointing out, given that Google is often accused of favoring its own content in search results.

    Here’s what that page looks like:

    Make?

    Hardly the “quality content” Google is demanding of webmasters these days.

    To be fair, Bing is ranking this result too, for some reason. It’s not number one on Bing, but it is number three. Why is it there at all? It could be related to the Bing-copying-Google-results episode Google called Bing out on last year. It’s the same on Yahoo, which of course uses Bing on the back end.

  • How Much Of Google’s Webspam Efforts Come From These Patents?

    Bill Slawski over at SEO By The Sea, who is always up on search industry patents, has an interesting article talking about a patent that might be related to Google’s new Webspam Update.

    It’s called: Methods and systems for identifying manipulated articles. The abstract for the patent says:

    Systems and methods that identify manipulated articles are described. In one embodiment, a search engine implements a method comprising determining at least one cluster comprising a plurality of articles, analyzing signals to determine an overall signal for the cluster, and determining if the articles are manipulated articles based at least in part on the overall signal.

    The patent was filed all the way back in 2003 and was awarded in 2007. Of course, the new update is really based on principles Google has held for years. The update is designed to target violators of its quality guidelines.

    Patent jargon makes my head hurt, and I’m willing to bet there’s a strong possibility you don’t want to sift through this whole thing. Slawski is a master at explaining these things, so I’ll just quote him from his piece.

    “There are a couple of different elements to this patent,” he writes. “One is that a search engine might identify a cluster of pages that might be related to each other in some way, like being on the same host, or interlinked by doorway pages and articles targeted by those pages. Once such a cluster is identified, documents within the cluster might be examined for individual signals, such as whether or not the text within them appears to have been generated by a computer, or if meta tags are stuffed with repeated keywords, if there is hidden text on pages, or if those pages might contain a lot of unrelated links.”
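
    In other words, page-level signals get rolled up into an overall signal for the cluster. A toy sketch of that idea (the signal names, weights, and threshold below are placeholders, not values from the patent):

    ```python
    # Toy sketch of the cluster-level idea described above: score each page in
    # a cluster for individual spam signals, then decide for the whole cluster
    # based on an overall signal. Signal names and weights are placeholders.

    SIGNAL_WEIGHTS = {
        "generated_text": 0.4,    # text appears computer-generated
        "stuffed_meta": 0.2,      # meta tags stuffed with repeated keywords
        "hidden_text": 0.3,       # hidden text on the page
        "unrelated_links": 0.1,   # many off-topic outbound links
    }

    def page_signal(page_flags):
        """page_flags: dict of signal name -> bool for one page."""
        return sum(w for name, w in SIGNAL_WEIGHTS.items() if page_flags.get(name))

    def cluster_is_manipulated(pages, threshold=0.5):
        """Flag the cluster if the average per-page signal exceeds the threshold."""
        overall = sum(page_signal(p) for p in pages) / len(pages)
        return overall > threshold

    cluster = [
        {"generated_text": True, "stuffed_meta": True},
        {"hidden_text": True, "unrelated_links": True},
        {"generated_text": True, "hidden_text": True},
    ]
    print(cluster_is_manipulated(cluster))  # True for this spammy-looking cluster
    ```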

    He goes on to talk about many of the improvements Google has made to its infrastructure and its spam-detecting technologies. He also notes that two phrase-based patents were granted to Google this week. One is for “Phrase extraction using subphrase scoring” and the other for “Query phrasification”. The abstracts for those are (respectively):

    An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are the indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in an cluster of index servers. The phrase posting lists can be tiered into groups, and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule based on the phrases is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.

    And…

    An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are extracted from the document collection. Documents are the indexed according to their included phrases, using phrase posting lists. The phrase posting lists are stored in an cluster of index servers. The phrase posting lists can be tiered into groups, and sharded into partitions. Phrases in a query are identified based on possible phrasifications. A query schedule based on the phrases is created from the phrases, and then optimized to reduce query processing and communication costs. The execution of the query schedule is managed to further reduce or eliminate query processing operations at various ones of the index servers.
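
    For what it’s worth, the “phrase posting list” structure those abstracts describe is essentially an inverted index keyed on phrases rather than single words, split (“sharded”) across partitions. A toy sketch (the hash-based sharding scheme is an assumption for illustration, not how Google actually partitions its index):

    ```python
    # Toy illustration of a phrase posting list: each indexed phrase maps to
    # the documents containing it, and the index is split ("sharded") across
    # partitions. The hash-based sharding is an assumption for illustration.

    from collections import defaultdict

    NUM_SHARDS = 4

    def shard_for(phrase):
        return hash(phrase) % NUM_SHARDS

    class PhraseIndex:
        def __init__(self):
            # One posting-list dictionary per shard: phrase -> set of doc ids.
            self.shards = [defaultdict(set) for _ in range(NUM_SHARDS)]

        def index(self, doc_id, phrases):
            for phrase in phrases:
                self.shards[shard_for(phrase)][phrase].add(doc_id)

        def lookup(self, phrase):
            return self.shards[shard_for(phrase)].get(phrase, set())

    idx = PhraseIndex()
    idx.index("doc1", ["buy viagra", "online pharmacy"])
    idx.index("doc2", ["online pharmacy", "larry fagin"])
    print(idx.lookup("online pharmacy"))  # {'doc1', 'doc2'}
    ```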

    If you’re really interested in tech patents and the inner workings of search engines, I’d suggest reading Slawski’s post. I’d also suggest watching Matt Cutts explain how Google Search works.

  • Google On What Will Get You Demoted Or Removed From Index

    Google’s Matt Cutts, as you may or may not know, often appears in Webmaster Help videos addressing questions about what Google does (and what it doesn’t do) in certain situations. Usually, the questions are submitted by users, though sometimes, Cutts will deem an issue important enough to ask the question himself.

    In the latest video, which Cutts tweeted out on Monday, a user asks:

    “Just to confirm: does Google take manual action on webspam? Does manual action result in a removal or can it also be a demotion? Are there other situations where Google remove content from its search results?”

    Who better to address this question than Google’s head of webspam himself, Matt Cutts?

    Cutts responds, “I’m really glad to have a chance to clarify this, because some people might not know this, although we’ve written this quite a bit in various places online. Google is willing to take manual action to remove spam. So if you write an algorithm to detect spam, and then someone searches for their own name, and they find off-topic porn, they’re really unhappy about that. And they’ll write into Google and let us know that they’re unhappy.”

    “And if we write back and say, ‘Well, we hope in six to nine months to be able to have an algorithm that catches this off-topic porn,’ that’s not a really satisfactory answer for the guy who has off-topic porn showing up for his name,” he says. “So in some situations, we are willing to take manual action on our results. It’s when there are violations of our web spam quality guidelines.”

    You can find those here, by the way.

    “So, the answer to your question is, yes, we are willing to take manual action when we see violations of our quality guidelines,” he says. “Another follow-up question was whether it has to be removal or whether it can be a demotion. It can be a demotion. It tends to be removal, because the spam we see tends to be very clear-cut. But there are some cases where you might see cookie cutter content that’s maybe not truly, truly awful, but is duplicative, or you can find in tons of other places. And so it’s content that is really not a lot of value add – those sorts of things.”

    “And we say in our guidelines to avoid duplicate content, whether it’s a cross-domain, so having lots of different domains with very, very similar or even identical content,” he says. “So when we see truly malicious, really bad stuff, we’re often taking action to remove it. If we see things that are still a violation of our quality guidelines, but not quite as bad, then you might see a demotion.”

    A bad enough demotion might as well be a removal anyway. I’m sure a lot of Panda victims out there have a thing or two to say about that.

    “And then the last question was, ‘Are there other situations where Google will remove content from its search results?’,” continues Cutts. “So, we do reserve the right to remove content for spam. Content can be removed for legal reasons, like we might get a DMCA complaint or some valid court order that says we have to remove something within this particular country.”

    “We’re also willing to remove stuff for security reasons, so malware, Trojan horses, viruses, worms, those sorts of things,” he says. “Another example of security might be if you have your own credit card number on the web. So those are some of the areas that we are willing to take action, and we are willing to remove stuff from our search results. We don’t claim that that’s a comprehensive list. We think that it’s important to be able to exercise judgment. So if there is some safety issue, or of course, things like child porn, which would fall under legal. But those are the major areas that we’ve seen, would be spam, legal reasons, and security. And certainly, the vast majority of action that we take falls under those three broad areas.”

    “But just to be clear, we do reserve the right to take action, whether it could be demotion or removal,” he reiterates. “And we think we have to apply our best judgment. We want to return the best results that we can for users. And the action that we take is in service of that, trying to make sure that we get the best search results we can out to people when they’re doing searches.”

    Speaking of those security concerns, Cutts also tweeted on Monday that Google has sent messages to 20,000 sites, indicating that they may have been hacked. He attributes this to some “weird redirecting.”

  • Google Refreshes Spam Reporting

    Google’s head of web spam Matt Cutts tweeted that the company has refreshed its spam report form. He calls it the biggest refresh in 10 years.

    Side note: It’s worth pointing out that he used Twitter to announce this. I see no updates about it in his posts on Google+. This is the kind of thing that makes Twitter essential to Google’s realtime search feature, and why Google+ has a long way to go before it can serve as a useful replacement for it. Even Googlers are still relaying important information via Twitter. It looks like he hasn’t posted to Google Buzz since May 28, either, btw. But that’s another story.

    We just released the biggest refresh of our spam report form in, oh, say 10 years: http://t.co/ty2MxmN

    Here’s what the new spam report form looks like:

    Spam report form

    The page says, “‘Webspam’ refers to pages that try to trick Google into ranking them highly. Before you file a webspam report, see if the page might have a different problem.” Users are then presented with options for:

    • Paid links (the page is selling or buying links)
    • Objectionable content (the page is inappropriate)
    • Malware (the page is infected)
    • Other Google products (This page abuses Google products other than Search, e.g., AdSense, Google Maps, etc.)
    • Copyright and other legal issues (This page should be removed under applicable law).
    • Personal/private (This page discloses private information)
    • Phishing (This page is trying to get sensitive information)
    • Something else is wrong (This page has other, non-webspam related issues)
    • And finally an option that says “This page is really webspam. Report webspam”

    Each option will take you to a different form or information source about how to proceed from there.

    Google’s approach seems to have ruffled at least one feather. “Marketing Guy” Scott Boyd talks about the new form, saying:

    Let’s see. Google crushes legitimate business websites in an attempt to remove spam from the index. Google crushes competition by undercutting them left, right and centre (analytics market is pretty much stagnant and frankly AdSense just promotes lazy webmasters who’d rather take some easy bucks than work at their business). Oh and is quite happy to take vast amounts of our information without mentioning how valuable it actually is too loudly.

    And now they want us – that’s the webmaster community (because frankly, no one else cares about paid links – in fact most normal people probably find the idea ridiculous) – to hunt down some evil paid linkers!!

    I already give you my search data, browsing history and patterns via Google toolbar, metrics on the quality of my websites via Google Adsense (for a minute fee), traffic metrics via Google Analytics, an idea of my financials, budgets and target market via Google Adwords. And now you want ME to improve YOUR product for FREE?

    I think not.

    Eric Enge at Stone Temple Consulting recently posted an interview with Tiffany Oberoi, an engineer on Google’s Search Quality team. Cutts said, “Every SEO/search person should read” it. She talks about how reconsideration requests work.

    Now that Google has refreshed its spam reporting, I’m guessing we’re going to see a whole lot more reporting, and of course a whole lot more of such requests. Here are some key quotes from Oberoi from that interview:

    “We do have a few different manual actions that we can take, depending on the type of spam violation. We would tend to handle a good site with one bad element differently from egregious webspam. For example, a site with obvious blackhat techniques might be removed completely from our index, while a site with less severe violations of our quality guidelines might just be demoted. Instead of doing a brand name search, I’d suggest a site: query on the domain as a sure way to tell if the site is in our index. But remember that there can be many other reasons for a site not being indexed, so not showing up isn’t an indication of a webspam issue.”

    “We try to take an algorithmic approach to tackling spam whenever possible because it’s more scalable to let our computers scour the Internet, fighting spam for us! Our rankings can automatically adjust based on what the algorithms find, so we can also react to new spam faster.”

    “And just to be clear, we don’t really think of spam algorithms as “penalties” — Google’s rankings are the result of many algorithms working together to deliver the most relevant results for a particular query and spam algorithms are just a part of that system. In general, when we talk about “penalties” or, more precisely, “manual spam actions”, we are referring to cases where our manual spam team stepped in and took action on a site.”

    “If a site is affected by an algorithmic change, submitting a reconsideration request will not have an impact. However, webmasters don’t generally know if it’s an algorithmic or manual action, so the most important thing is to clean up the spam violation and submit a reconsideration request to be sure. As we crawl and reindex the web, our spam classifiers reevaluate sites that have changed. Typically, some time after a spam site has been cleaned up, an algorithm will reprocess the site (even without a reconsideration request) and it would no longer be flagged as spam.”

    She goes on to point out that reconsideration requests will not help you if you’ve been impacted by the Google Panda update.

  • Blekko Queries on the Rise, More So Since Content Farm Blocking

    Blekko says its search queries climbed to a million a day in January. CEO Rich Skrenta tells WebProNews that Blekko has seen growth since its announcement that it had banned some content farms from its index.

    "We did see a big surge in traffic following our announcement that we banned the top 20 content farms," he says. "We’ve had a lot of positive reaction to that from web users who are tired of seeing poor quality content in their search results."  

    "These new users are glad we’ve taken a stand and are checking the site out," he adds. "I don’t have stats yet on how many new slashtags they may have made yet, though."

    Blekko issued a metrics release today, looking at how many slashtags have been created since launch, as well as total search queries in January – an all-time high. Blekko users have created over 110,000 slashtags since the company’s November launch, and the search engine saw over 30 million search queries on the site in January, with user activity for the month averaging 10 to 15 queries per second. While Blekko saw a small dip in queries after an all-time high at launch, current search levels are now greater than the initial launch pop, the company says.

    "We’re happy at how quickly users have adopted the idea of a new search engine and have created so many quality slashtags just three months since launch," Skrenta said. "Our call to rid the Web of spam has been heard loud and clear by many and we encourage our community to continue to slash the spam." 

    Below are a couple of recent interviews we did with Skrenta on webspam and Blekko:

     

    As Skrenta was on the panel with Google’s Matt Cutts and Bing’s Harry Shum last week, in which those two argued about the whole Bing-copying-search-results debate, we asked Skrenta for his take on the matter, but he wouldn’t comment on that. 

    As DuckDuckGo has started hard wiring in wikiHow content as its top results for how-to queries, we asked if Blekko would ever consider such a move and whether Blekko had been approached by wikiHow. His response was simply, "We haven’t been approached by wikiHow." 

    Blekko does say that it will add to its list of 20 blocked sites as necessary.

  • Google, Bing, and Blekko Talk Content Farms and Search Quality

    Matt Cutts from Google, Harry Shum from Bing, and Rich Skrenta from Blekko spoke on a panel today at the Farsight Summit. Much of the conversation was around the Bing/Google results copying ordeal, but part of the conversation was about search quality in general, and the impact content farms are having on it. 

    Blekko announced this morning that it has banned eHow and other content farms from its results. See the full list here. Watch our recent interview with Skrenta about webspam here.

    Cutts was quick to extend some praise to Blekko, saying they "made a great domain," and that he appreciates that they’ve done some interesting things lately, mentioning the spam clock, for example. He quickly followed that up by saying, "The fact is that we do use algorithms and that’s our first instinct, but when we see manual spam, we are willing to remove it manually."

    He added that within Google, they could say certain domain names are webspam, but they’re trying to do things algorithmically. "We have a lot more projects that we’re working on," he added, appearing to suggest that Google’s not done with its content farm cleanup process – at least that’s how I interpreted it (something I suggested in a recent article). 

    Cutts said that when Google finds spam with its manual team, it also ejects it from Adsense, and that people tend to put the blame on AdSense, but even if that disappeared, we’d still have spam. 

    When asked what incentive Google would have to remove content from AdSense-driven pages that drive billions of dollars for the company, he just said that Google has always taken the philosophy that it cares more about the long-term loyalty of users.

    Then Demand Media was specifically brought up (as it has been by inquiring minds in other instances), but there was still plenty of vagueness. Cutts’ response was to mention a comment on Hacker News about how Demand Media had five articles on how to tie your shoes, then simply turn it around to "we don’t care if a site is running Google ads…we take action…we want to find an algorithmic solution." 

    Meanwhile, plenty of this type of content is still saturating Google SERPs. There are way more than five articles from eHow on fixing scratches in your car’s paint, as illustrated in another article:

    eHow Car Scratch articles - made to search

    Note: He did not say anything to the effect of "we don’t consider eHow a content farm."

    Clearly Blekko is less shy about what it considers a content farm (again, see the list linked to above). Skrenta says "there’s more spam than good sites," and that it’s "easier to make a list of the sites that you actually want to go to." He notes that the top fifty medical sites have actual doctors and medical librarians creating and curating content (as opposed to what you might find from a site like eHow).

    The Bing position appears to be to let Google lead the way in how to deal with search quality, which is kind of a fun position given the whole results-copying ordeal. Shum said Matt and Google need to "take this thing very seriously" because they and the industry are looking to the leader to make the web more fair and cleaner. He did say that they were also looking at Blekko and what others are saying about the topic as well. 

  • Is Google’s Search Quality The Best It’s Ever Been?

    In a post on the official Google Blog, Matt Cutts, head of the company’s webspam team, said that Google’s search quality is the best it has ever been in terms of relevance, freshness, and comprehensiveness.

    Do you agree that Google’s search quality is the best it’s ever been? Share your thoughts.

    "Today, English-language spam in Google’s results is less than half what it was five years ago, and spam in most other languages is even lower than in English," said Cutts. "However, we have seen a slight uptick of spam in recent months, and while we’ve already made progress, we have new efforts underway to continue to improve our search quality."

    Matt Cutts Talks Web Spam

    "As we’ve increased both our size and freshness in recent months, we’ve naturally indexed a lot of good content and some spam as well," explains Cutts. "To respond to that challenge, we recently launched a redesigned document-level classifier that makes it harder for spammy on-page content to rank highly. The new classifier is better at detecting spam on individual web pages, e.g., repeated spammy words—the sort of phrases you tend to see in junky, automated, self-promoting blog comments. We’ve also radically improved our ability to detect hacked sites, which were a major source of spam in 2010. And we’re evaluating multiple changes that should help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content. We’ll continue to explore ways to reduce spam, including new ways for users to give more explicit feedback about spammy and low-quality sites."

    The post was in response to a lot of talk throughout the Blogosphere lately that Google is losing its edge in search – when it comes to relevancy and spam. Some of this was no doubt fueled by the recent launch of the spam clock from Blekko, which may not be on the minds of much of the general public, but that many influential bloggers in the search space are certainly aware of. 

    Watch an interview we did the other day with Blekko CEO Rich Skrenta about web spam here:

    Cutts says it is a misconception that Google doesn’t take as strong an action on spam in its index if the spammy sites are running Google ads.

    "To be crystal clear," he says, "Google absolutely takes action on sites that violate our quality guidelines regardless of whether they have ads powered by Google; Displaying Google ads does not help a site’s rankings in Google; and Buying Google ads does not increase a site’s rankings in Google’s search results. These principles have always applied, but it’s important to affirm they still hold true."

    Something tells me Cutts and Google will never convince everybody, but at least they’re being "crystal clear" in their explanation.

    Is the explanation clear enough for you? Tell us what you think

  • Google Wants Your Help to Reduce Webspam

    If you have any suggestions for Google’s webspam team, now is the time to make them. Google’s Matt Cutts, who leads the team, has called for suggestions for webspam projects for the year in a post on his personal blog.

    "About a year and a half ago, I asked for suggestions for webspam projects for 2009," says Cutts. "The feedback that we got was extremely helpful. It’s almost exactly the middle of 2010, so it seemed like a good time to ask again: what projects do you think webspam should work on in 2010 and beyond?"

    Matt Cutts Calls for Webspam battle suggestions

    Cutts is taking suggestions in the comments section of that blog post. From the sound of things, this could really make an impact on Google’s approach, so if you have any good ideas, don’t be shy about sharing them.

    Last year Matt got over 300 suggestions, and I’d expect a similar number this time, which is all the more reason to make yours stand out by offering some thoughtful insight.

    Cutts does note that he doesn’t want people to call out individual sites or get into much discussion in the comments – just share your idea of what Google should work on to decrease webspam.