WebProNews

Tag: SEO

  • Google Panda Update: $25,000 Reward If You Can Get This Site Recovered

    If you’ve been following the Panda saga for a while, you may remember hearing the story about One Way Furniture, a furniture store that was victimized by the Panda update. It got a fair amount of media coverage, including by NPR and the Wall Street Journal. We mentioned them in our first coverage of the DaniWeb story.

    “I was sort of a pioneer of selling furniture online,” CEO Mitch Lieberman tells WebProNews. “I first began in 1997 on eBay and we were the first ones selling brand new furniture on that platform. I took the seller name on eBay ‘Furniture’ and we still have that user name to date.”

    While DaniWeb has been able to make a full recovery (and then some) from the Panda update, thanks to Google’s latest known tweak, One Way Furniture has not been so lucky. Lieberman tells us that he is offering a $25,000 reward to whoever can get the site to a “sustained, full Panda recovery.”

    Perhaps he should talk to DaniWeb’s Dani Horowitz.

    “Our site literally got crushed on Panda 1, and each Panda has gotten worse,” he says. “I’m at the point where I don’t have the answers and some of the best in the business have failed to guide us towards a recovery. With that in mind I thought a reward would be the way to go.  Since February 24th I’ve spent most of my time in Google Analytics.”

    “I’ve spent millions over the years on AdWords,” he adds. “However, I’m not saying that with a sense of entitlement, because the fact is that Google has been great to us for many years. However, as a paying client I would hope to have some sort of history with the company to where they could guide us to what changes need to be made. We are completely in the dark here. If you shop at a department store and have a bad experience with one department, are you really going to just shop in a different department in the same store? Google acts as if the SERPs department and AdWords have nothing to do with each other.”

    “As a paying customer to Google I just ask for some respect,” says Lieberman. “A simple 10-minute phone call with a SERPs manager to offer us tips is all we would expect. The highlighted total is almost $3 million spent on Google AdWords since we started with them about 8 years ago. Last year we spent $442,159.69 with them.”

    Lieberman made it clear that he isn’t trying to be condescending toward Google. “I understand they update their algorithms from time to time, I don’t have a problem with that,” he says. “I only have a problem with two things. A) Nuking us overnight without warning as they did. B) Failure to communicate with us what we need to change. I’ve been selling furniture online while Google was starting out their business in 1997!”

    “I must state that the idea is for someone hands-on to take on this project,” he says. “The goal is to give our end user the best possible buying experience on our site. At the same time, we want to see those changes get us out of this Panda penalty. We have made a list of over 30 changes that we have done so far, and also have some other significant changes in the works, mainly a complete redesign.”

    This is what the site looks like currently:

    One Way Furniture

    Here are some things One Way Furniture has done so far, as it has tried to climb back into Google’s good graces (as explained by Lieberman):

  • Removed verbiage from manufacturers’ captions on every page. We typically use this area to give a history of the manufacturer we carry. (Lowered the ratio of similar content on every page.)
  • Removed the global “index, follow” directive.
  • Removed all product descriptions from shopping portals and the affiliate program.
  • Took out the mass comment fields coded into the source across the site.
  • Removed the auto meta tag text, put in about 8 years ago, from every product.
  • Deleted most discontinued products (15,000 that built up over 10 years) that were previously left up so customers could find a newer product. Those URLs now return 404 errors.
  • Nofollowed the Twitter/Facebook home page links.
  • Removed some links from the footer.
  • Added DMCA monitoring of content.
  • Reverted to the old cross-sell to get products out of Google discussions, which seem to have been treated as a forum. A Google employee confirmed in the Webmaster forums that there was some issue.
  • Asked PowerReviews to remove dates from reviews.
  • Killed the subdomain art.onewayfurniture.com, which was an art site run by a company that approached us to sell art. The problem was they sold the same art across many other sites. We learned a lesson: our subdomains are our property, and we are on lockdown.
  • Removed the sitewide links from the lowest price guarantee.
  • Content writers are rewriting descriptions. So far, over 7,000 have been rewritten. We expect to rewrite every last one.
  • Rewriting some section descriptions. Many have had their content stolen (including by affiliates), so Google doesn’t know what is unique anymore.
  • Changed the left nav to HTML. Created a unique left nav on some section pages, like the twin beds, full beds, queen beds, king beds, entertainment furniture, and office furniture sections.
  • Set “noindex, follow” on search.onewayfurniture.com.
  • Increased page speed by changing from an RTML to an HTML menu.
  • Added sitewide canonical tags (see the sketch after this list).
  • Fixed a JavaScript error and removed poor coding, particularly on the left nav.
  • Set “noindex, follow” on 15,000 pages (almost half the site).
  • Removed the repeating name in the H1 just below the lowest price guarantee. (May 18)
  • Created a supplemental section in TV Stands for non-performers (April 10). May go sitewide with this in the future; kind of like supplemental results for poor performers.
  • Started correcting bad characters that were converted into question marks because of Word format vs. Excel format. Hundreds of occurrences.
  • Took out the repeated name in the caption on section pages. (June 8)
  • 301 redirected onewayoffice.com to our main site, www.onewayfurniture.com.
  • Going to 301 redirect www.cjhomeandoffice.com to onewayfurniture.com in the next few days. We want to focus on our main brand. We did not get this site to trick any engine; we acquired it from a competitor and it made sense at the time to populate it. 75% of the content on the site was rewritten, but the products were all the same. We now realize it’s useless to split ourselves across two brands. We are regaining focus now.
  • Sorted bedroom furniture sets from low price to high price (July 12), testing to see whether lower prices keep customers more engaged on the site. Previously we sorted by best sellers.
  • Removed the sitewide “online home furniture store” footer link. (July 27)
  • Deleted 5,500 discontinued products still remaining in the database. (July 28)
  • Took out “discount furniture store” from the logo alt tag.
  • Finishing up a major site redesign (about two weeks out). Focusing on bigger text in the left nav, removal of the left nav on item pages, and removal of best sellers (right nav) on item pages. Blowing up pictures to about twice the size. Shortening columns from five wide to three wide on sections. Redesigning a more welcoming home page.
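
    For reference, the canonical and “noindex, follow” items above come down to a few lines in the <head> of each page. The following is only a generic sketch of that kind of markup, with a made-up example URL rather than One Way Furniture’s actual code:

    <head>
      <!-- Canonical tag: points duplicate or parameter-laden URLs at one preferred version -->
      <link rel="canonical" href="http://www.example.com/tv-stands/oak-tv-stand.html" />
      <!-- Robots meta tag: keeps this page out of Google's index but lets crawlers follow its links -->
      <meta name="robots" content="noindex, follow" />
    </head>
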
    So how about that reward? Here are the criteria One Way Furniture has set:

    A) At least 85% of organic traffic recovered compared to the same time last year. We will consider it a recovery after 30 days of data showing the traffic bump.

    B) Google guidelines must be adhered to 100%.

    C) Overall bounce rate has to decrease sitewide by at least 15%.

    D) Reward payments will start 90 days following the beginning of the “recovery”.

    E) The $25K will be broken up into 10 payments, starting on the 91st day.

    F) Payments will continue as long as traffic remains within the threshold. If traffic drops during another Panda update, the payments will stop, but will continue once the traffic is again restored.

    G) If they are successful and receive all $25,000 of the reward, at the end of the period we will throw in $2,500 worth of free furniture from any one of these departments: Bar Stools, Bedroom Sets, Dining Rooms, TV Stands, Office Furniture.

    H) The person who gets us out of Panda will also be in consideration for a possible contract job moving forward. The contract job will not be for an SEO; it will be for a CUO. We think SEO needs to be permanently replaced with this new term, “CUO”: Consumer Usability Optimization.

    If you’re interested in seeking the $25,000 reward, you can contact the company at [email protected].

  • DaniWeb Claims 110% Recovery from Google Panda Update

    This week, it was confirmed that Google had made a minor adjustment to its Panda algorithm update, which has drastically altered the search engine’s results several times since its first iteration in February.

    Have you seen any rankings changes with this latest incarnation of the Panda update? Let us know in the comments.

    Since Google makes hundreds of algorithmic changes each year, the company downplayed this as any major shift. The official statement, as obtained by Barry Schwartz, was:

    “We’re continuing to iterate on our Panda algorithm as part of our commitment to returning high-quality sites to Google users. This most recent update is one of the roughly 500 changes we make to our ranking algorithms each year.”

    It appears that it may be more major than we originally thought. We had seen a few comments from webmasters indicating that their rankings had somewhat improved, but now Dani Horowitz, whose DaniWeb discussion forum was an apparently innocent casualty of the Panda update’s wrath, tells WebProNews that the site has made a full “110% recovery” as a result of this most recent Panda tweak.

    When we interviewed Horowitz back in May, she told us about various tactics she was engaging in, which seemed to be having positive effects on her site’s search referrals.

    While what she was seeing was far from a full recovery, it was enough to give webmasters hope that they may be able to climb their way back up into Google’s good graces, despite having been victimized by the update. In other words, there were enough other ranking factors that sites could use to improve their rankings to avoid being totally deprived of search referrals at the hands of Panda – good news for those sites with quality content that were casualties of Google’s war on poor content.

    At the time, however, DaniWeb still had a long way to go to reach the levels of traffic it was seeing from Google before. Perhaps even more interesting was the fact that Google seemed to be ranking DaniWeb well for queries that didn’t make sense, while queries it had previously ranked well for, which did make sense, were sending traffic elsewhere.

    “Panda 2.3 went live on July 23rd and traffic just instantly jumped back up to normal that very day,” Horowitz now tells us. “We’re now seeing traffic at the same pre-Panda highs in some countries, while other countries are even better than ever. Overall, we’re seeing more pageviews than ever before.”

    Here’s a look at global visitors and US visitors respectively since the beginning of the year (that’s visitors, not pageviews):

    DaniWeb Global visitors

    DaniWeb US visitors

    “Notice that US visitors were affected on February 24th while global traffic wasn’t severely impacted until a month and a half later,” Horowitz points out. “The decline coincided exactly with the first iteration of Panda and the recovery coincided exactly with the latest iteration of Panda.”

    “All of the changes I’ve made were documented in the official Google Support thread or in the video interview I did with you guys,” she tells us. “In fact, I hadn’t made any recent changes immediately before the recovery. I haven’t yet had a chance to investigate any specific long tail keywords yet either. Google Webmaster Tools looks very different from what it looked like back in March as a result of all the work I’ve done, but nothing that stands out between this month and last.”

    She did add in the Google Support thread, “There were no big changes made immediately before the site came back, with the exception of a significant increase in my Google AdWords budget.” She followed this up shortly after with, “I mentioned AdWords because we use it heavily to increase registrations, which directly results in an increase in posts per day. If there was a correlation, then it was a sudden increase in new content followed by the penalty reversal.”

    Here’s our previous interview with Dani, so you can gain more insight into the kinds of things she was doing in the first place:

    We’ll keep our eyes peeled for more reports of full recoveries. I have to wonder how many wrongfully impacted sites have seen their rankings jump back up. Either way, provided that DaniWeb’s recovery was indeed a direct result of this latest Panda tweak, other victims might find hope in the fact that Google does continue to “iterate” on the Panda algorithm.

    Have you noticed a significant change in rankings since the latest iteration of the Panda update? Any more full recoveries? Let us know.

  • Google Made a Minor Tweak to the Panda Update

    Google makes hundreds of changes to its algorithm every year. Some days it makes more than one (obviously). One day last week, the search engine reportedly made a small tweak to the Panda part of the algorithm.

    Barry Schwartz posted the following statement from Google regarding the matter:

    “We’re continuing to iterate on our Panda algorithm as part of our commitment to returning high-quality sites to Google users. This most recent update is one of the roughly 500 changes we make to our ranking algorithms each year.”

    So, take that for what you will.

    Google has not given any new indications of what it is doing differently on the Panda front, but as the company has said in the past, they will continue to “iterate” on it.

    A few webmasters recently took to the forums to express that their sites had suddenly changed rankings, and interestingly quite a few seemed to be for the better. That’s kind of a change of pace from the grumbles we’ve been hearing for the better part of the year.

    We still haven’t heard of any full recoveries, though HubPages is taking an interesting approach, as we looked at recently, by giving authors their own subdomains, so as to separate content author by author; that way, one author who writes poorly doesn’t drag down the content of a higher quality author.

    In early tests, HubPages has seen some success in rankings for certain content employing this strategy. The jury is still out on how this will impact the site as a whole.

    Update: Dani Horowitz tells us that DaniWeb, which we discussed at length in an interview with her, has made a full recovery (see comments below). More on DaniWeb’s recovery here.

  • Redesigning Your Site? Don’t Make it Harder for Google to Extract the Text

    Google posted one of the Matt Cutts Q&A videos today, where he talks about the effects of site redesigns with redirects on search rankings. Here’s the specific question as it was posed to Matt:

    I’m changing the platform of my blog. All old URLs will redirect to new ones. But, since the HTML code and layout of the pages are different, do you lose search engine rankings?

    “Well, search engine rankings can change when the page changes itself,” Cutts responds. “If you’re doing the 301s correctly – a permanent redirect from the old site to the new site, and if you’re doing it at a page level – so from the old page to the new page – you should be in relatively good shape, but it’s not just incoming links.”

    “It’s also the content of the page itself,” he continues. “So if you had a really good layout with a really clean design, where all the text was really easily indexed, and you move to something that was a lot more confusing, and maybe the text wasn’t as easy for us to extract, that could change your search rankings for the downside, or for the negative.”

    “In general, we’re relatively good about changing layouts and still being able to discern what that page is about, but here’s one test that you could do: as long as you haven’t done the transition yourself, if you can try making a few tests, where you can take the layout of the new page or the new site, and see if you can apply it in some very simple ways to the old site, then that’s a way to isolate those, because it’s just like any scientific experiment,” he says. “If you do two things at once, and your rankings go down, you can’t decouple what caused it. Whereas if you can change just the layout – even if it’s only on a few pages, to try out and see whether your rankings change with that, then you’ll know – was it more likely to be because of the redirects or because I was changing my HTML layout.”
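
    To make the “easy to extract” point concrete, here is a rough, hypothetical HTML sketch (not taken from Google or any real site). In the first version the page copy sits in ordinary markup a crawler can read directly; in the second, the same copy only exists inside an image and inside script-injected content, which search engines of this era may struggle to read:

    <!-- Easy to extract: the copy lives in ordinary HTML elements -->
    <h1>Oak TV Stand</h1>
    <p>Solid oak TV stand with two adjustable shelves and cable management.</p>

    <!-- Harder to extract: the copy exists only as an image and as script-injected text -->
    <img src="product-description.png" alt="">
    <div id="desc"></div>
    <script>document.getElementById("desc").innerHTML = loadDescription(); /* hypothetical loader */</script>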

    In terms of layouts, you may also do well to consider the role design plays in how Google determines quality content. Would people be comfortable giving your site their credit card info? Design can play a big role in this. Another question on Google’s list of “questions that one could use to assess the ‘quality’ of a page or an article,” is “Are the pages produced with great care and attention to detail vs. less attention to detail?” Then there’s the whole load time factor. Google does count page speed as a ranking signal.

  • SEO Developments You Need to Know About

    Much of the Google talk lately has been centered around Google+, the company’s new social network, and with good reason. It may have a significant impact on how Internet users use other established social sites like Facebook, Twitter, LinkedIn, and even StumbleUpon. However, it is still Google search that drives the majority of web traffic for most site owners, and there is plenty going on in search as well.

    What do you consider to be the most significant recent development in search? Share your thoughts in the comments.

    What Would This New Google Design Mean for SEO?

    First, I want to talk about a new user interface tweak Google is testing, which could have major implications for site owners and their visibility in Google search results.

    The change, seen in the video below, has the search bar and navigation bar sticky at the top of the page and the left panel of search options sticky to the side. In other words, these things stay put as you scroll through search results, rather than disappearing as you scroll down as they do in the regular interface currently.

    In the video, we see that results are still paginated. You still have to click through various pages of search results. How often do you really click past the first page?

    However, the interface change closely resembles the current interface of Google Image Search. Here, the same things are stickied, but instead of paginated results pages, it has infinite scroll, meaning you can keep scrolling down the page to see more results. Eventually, you have to click “show more results,” but it’s not like clicking through multiple pages.

    For all intents and purposes, all of the images appear on page one. It seems likely that if Google switches to this type of interface for regular web search results, it may implement the infinite scroll functionality as well. This would mean, of course, that users wouldn’t have to click to page 2 of the search results to see your site if that’s where you’re currently ranking.

    Users are far more likely, in my opinion, to look at more results if they’re all presented on the page. I know this has been the case for me personally, using Google Image search. Similar functionality is also available in Twitter’s timeline, and I know I take in more results there as well.

    Google has changed its algorithm and interface so much over the years, with added personalization, local results, universal search, etc., that it has become harder and harder to get your content seen by searchers. But if this actually pans out, it may actually help with visibility. Hopefully content quality will also be reflected.

    We don’t know for sure that Google will implement any of this, but would it not make for a better user experience?

    How would these changes impact SEO? Tell us what you think.

    Google is getting more focused.

    As you know, Google has tons of products and services, and constantly experiments with new potential ones. With Larry Page at the helm now, however, the company is getting much more focused. This was a major theme of what Page had to say in the company’s earnings call last week. Since then, Google even made the bold announcement that it is shutting down Google Labs, which holds most of Google’s experimental offerings.

    “While we’ve learned a huge amount by launching very early prototypes in Labs, we believe that greater focus is crucial if we’re to make the most of the extraordinary opportunities ahead,” said Google SVP for Research and Systems Infrastructure Bill Coughran.

    Search items like Google Code Search, Google Trends, Google Suggest, Google Social Search, and even Google Maps started out in Google Labs.

    That doesn’t mean Google is looking to stop innovating. “We’ll continue to push speed and innovation—the driving forces behind Google Labs—across all our products, as the early launch of the Google+ field trial last month showed,” said Coughran.

    “Greater focus has also been another big feature for me this quarter – more wood behind fewer arrows,” Page said in the earnings call. “Last month, for example, we announced that we will be closing Google Health and Google PowerMeter. We’ve also done substantial internal work simplifying and streamlining our product lines. While much of that work has not yet become visible externally, I am very happy with our progress here. Focus and prioritization are crucial given our amazing opportunities. Indeed I see more opportunities for Google today than ever before. Because believe it or not we are still in the very early stages of what we want to do.”

    “Even in search … which we’ve been working on for 12 years there have never been more important changes to make,” he said. “For example this quarter we launched a pilot that shows an author’s name and picture in the search results, making it easier for users to find things from authors they trust.”

    Who You Are Matters More

    That last point by Page brings me to the next point. Who you are is becoming more important in search. We made note of this when Google announced the authorship markup, which enables the feature Page spoke of. To implement it, by the way, here are Google’s instructions:

    To identify the author of an article or page, include a link to an author page on your domain and add rel="author" to that link, like this:

    Written by <a rel="author" href="../authors/mattcutts">Matt Cutts</a>.

    This tells search engines: “The linked person is an author of this linking page.” The rel="author" link must point to an author page on the same site as the content page. For example, the page http://example.com/content/webmaster_tips could have a link to the author page at http://example.com/authors/mattcutts. Google uses a variety of algorithms to determine whether two URLs are part of the same site. For example, http://example.com/content, http://www.example.com/content, and http://news.example.com can all be considered as part of the same site, even though the hostnames are not identical.

    I find it interesting that this is the sole feature Page alluded to in the earnings call, with regards to search. This makes me wonder if Google places even more emphasis on this than I thought.

    Watching the Subdomain Impact on Panda Recovery

    We may find out how big a role content author can play in search rankings soon (separate from the actual authorship markup element), thanks to some experimenting by Panda update victim HubPages. We recently reported on HubPages’ strategy of subdomaining content by author to keep content separate, so that poor quality postings by some authors don’t have an effect on the search rankings of those authors who are putting out higher quality work. This also, in theory, is designed to keep the entire site from being pulled down by some less than stellar content.

    This week, HubPages announced that it was rolling out these subdomains. One author told WebProNews, “On one of my accounts at HubPages, I’m already seeing a bit of an increase of traffic and I’m quite sure it is from the subdomain/URL forwarding. HubPages, from what I can make of the update, is definitely heading in the right direction.”

    Definitely something to keep an eye on in the coming weeks/months.

    Do you think subdomains are going to make a significant impact? Tell us what you think.

    PageRank Gets Updated Again

    Several weeks ago, Google launched an update to its PageRank (which displays in the Google toolbar). Google has played down the significance of PageRank, as it is only one of many signals, but it is still a signal, and one worth considering.

    Interestingly, that update caused Google’s own PageRank to drop from a 10 to a 9. This week, PageRank got another update, and sent Google back up to a 10.

    Google doesn’t usually update PageRank that frequently, so the new update raised a few eyebrows. Barry Schwartz at Search Engine Roundtable thinks it’s related to Twitter. “It was because, I believe, Twitter’s PR was a PR 0 and Google didn’t want people to think that Google downgraded Twitter’s PageRank manually because of contract deals breaking between the two,” he writes. He got the following statement from Google:

    Recently Twitter has been making various changes to its robots.txt file and HTTP status codes. These changes temporarily resulted in unusual url canonicalization for Twitter by our algorithms. The canonical urls have started to settle down, and we’ve pushed a refresh of the toolbar PageRank data that reflects that. Twitter continues to have high PageRank in Google’s index, and this variation was not a penalty.

    Twitter’s PR is a 9. Twitter’s wasn’t the only one to change, however. Various webmeisters took to the forums to note that their own PageRank had been changing.

    Google is Nixing the Google Toolbar for Firefox

    While we’re on the topic of the Google Toolbar, it’s also worth noting that it’s being discontinued for Firefox.

    “First of all, we’d like to thank all of our loyal users of Google Toolbar for Firefox,” Brittney said on the Google Toolbar Help blog. “We deeply appreciate all of the feedback over the years that helped to make the product so useful. As we all know, over the past few years, there has been a tremendous amount of innovation in the browser space. For Firefox users, many features that were once offered by Google Toolbar for Firefox are now already built right into the browser. Therefore, while Google Toolbar for Firefox works on versions up to and including Firefox 4 only, it will not be supported on Firefox 5 and future versions. Please see our Help Center for additional details.”

    Google’s own Chrome browser has over 160 million users, according to Page.

    Google +1 Button Impressions

    Page also announced that the +1 button is being served 2.3 billion times a day. That means people are consuming a whole lot of content out there that carries this button. The button itself, as you may know, contributes directly to search rankings. The more +1’s a piece of content gets, the more signals Google is receiving that people like this content, which increases its chances of ranking better.

    It’s just one of many signals Google uses, but it’s a pretty direct signal.
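
    For publishers who want the button on their pages, the embed Google documented at the time was roughly the following two lines (treat this as an approximation from memory rather than the definitive snippet):

    <!-- Load the +1 button script once per page -->
    <script type="text/javascript" src="https://apis.google.com/js/plusone.js"></script>
    <!-- Render a +1 button where this tag appears -->
    <g:plusone></g:plusone>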

    While the button is yet to be integrated directly into Google+, the tremendous momentum of Google+ will likely only serve to fuel clicks of the +1 button. When I say it’s not integrated, I mean that when you click the +1 button on a piece of content, it’s not sharing it to your followers’ streams. It’s not like Facebook’s “like” button, where it promotes that content to your friends’ news feed. At least not yet. It goes to a separate tab on your Google profile that few probably see.

    Still, despite any confusion that may arise from that, people are going to associate that “+1” with Google+. They’re not only seeing it on content on the web, but on Google+ posts from within the social network. Presumably, they’ll click it on the web more too.

    Is Google+ going to greatly impact search? Let us know what you think in the comments.

  • The Google Directory is No More

    While I’m not sure of the viability of directory submissions as a way of search engine marketing, or optimization for that matter, when Google shuts down one of its products, it’s news. Thanks to a heads up from Brent Csutoras, we now know the Google Directory has gone the way of Wave and Google Video; that is, it’s gone.

    If you try to navigate to the Google Directory, you will be greeted with the following message:

    Google Directory is no longer available.

    We believe that Web Search is the fastest way to find the information you need on the web.

    If you prefer to browse a directory of the web, visit the Open Directory Project at dmoz.org

    Again, the viability of directories in relation to marketing is dubious, but if you could get your site into one of these things — the prominent ones, anyway — it was worth a shot. Oh well. There’s always DMOZ, although I have something of a history with that particular subject.

    Over at Csutoras’ post, he states his preference for the Best of the Web directory, which offers a level of quality similar to a DMOZ entry. The difference is that the BOTW directory charges for listings. DMOZ, while free, has issues with its approval process.

    Speculation concerning the closing of the Google Directory focuses on lack of use, something Barry Schwartz pointed out in a SearchEngineLand post. He also indicates Google deactivated its directory without much fanfare, and, after a quick glance at the Google Webmasters blog and the official Google blog, you’ll see that there’s no announcement available, which also supports that “lack of use” theory. There’s also nothing on any of the various Twitter accounts for Google employees.

    If no one’s using a product, quietly killing it is indeed an option, although, there isn’t much of anything Google can do that will remain quiet for long.

  • Google May Sticky Search Box, Navigation Bar

    A Google design tweak being tested would see the bar at the top of Google.com, as well as the left panel with the search options remain in place as the user scrolls through search results.

    Alex Chitu at the Google Operating System blog has shared a video from Alon Laudon of what it looks like:

    It’s a subtle but helpful change, in my opinion, and I would not be surprised to see this one become an actual feature. It would also help keep Google+ on the screen at all times (the notifications and share box). This would make it easy to share content at any time while on Google.

    This kind of interface already exists on Google Image Search, as Chitu points out, and that comes with infinite scrolling. One has to wonder if this would open the door to infinite scrolling on regular search results as well. If so, that ought to be good for boosting those first page rankings. In all honesty, I can actually see this benefiting pages that would rank on the second or third page of results, because users wouldn’t be required to click through to another page to see them. It’s much easier to keep scrolling down.

    I really can’t see much negative about the fixed navigation or the possibility of infinite scrolling in search results. How about you?

  • Google Panda Update: The Solution for Recovery?

    Many sites are still wondering how they can come back from being hit by the Google Panda update. Google has certainly stressed quality, and victims of the update have been striving to improve it, but have had little luck in terms of boosting their rankings for the most part.

    Have you been able to recover any search traffic after being hit by the Panda update? Let us know.

    When we talked to Dani Horowitz of DaniWeb, she told us about some other things she was doing that seemed to be helping content rank better, but it was hardly a full recovery in search referrals.

    An article ran at WSJ.com about HubPages, one of the victims that we’ve written about a handful of times. CEO Paul Edmondson is claiming that the use of sub-domains is helping its content work its way back up in Google – something he stumbled upon by accident, but also something Google has talked about in the past.

    The article quotes him as saying that he’s seen “early evidence” that dividing the site into thousands of subdomains may help it “lift the Google Panda death grip.” Amir Efrati reports:

    In June, a top Google search engineer, Matt Cutts, wrote to Edmondson that he might want to try subdomains, among other things.

    The HubPages subdomain testing began in late June and already has shown positive results. Edmondson’s own articles on HubPages, which saw a 50% drop in page views after Google’s Panda updates, have returned to pre-Panda levels in the first three weeks since he activated subdomains for himself and several other authors. The other authors saw significant, if not full, recoveries of Web traffic.

    The piece also points to a blog post Cutts wrote all the way back in 2007 about subdomains. In that post, Cutts wrote, “A subdomain can be useful to separate out content that is completely different. Google uses subdomains for distinct products such as news.google.com or maps.google.com, for example.”

    HubPages is rolling out subdomains for all authors, which in theory, should help the site’s performance remain tied to the quality of the output by specific authors. This is also interesting given that Google recently launched a new authorship markup, putting more emphasis on authors in search results.

    When that was launched, Google said in the Webmaster Central Help Center, “When Google has information about who wrote a piece of content on the web, we may look at it as a signal to help us determine the relevance of that page to a user’s query. This is just one of many signals Google may use to determine a page’s relevance and ranking, though, and we’re constantly tweaking and improving our algorithm to improve overall search quality.”

    It may be a little early to jump to the conclusion that subdomains are the silver bullet leading to a full Panda recovery, but for those sites with a mix of great quality and poor quality content, this could very well help at least the great stuff rise. It will be interesting to see how HubPages performs over time, once the new structure has been live for a while.

    Google’s statement on the matter (as reported by Barry Schwartz) is: “Subdomains can be useful to separate out content that is completely different from the rest of a site — for example, on domains such as wordpress.com. However, site owners should not expect that simply adding a new subdomain on a site will trigger a boost in ranking.”

    To me, it sounds like if your entire site was hit by the Panda update because of some content that wasn’t up to snuff in the eyes of Google, but some content is up to snuff, you may want to consider subdomains, at least for the stuff that Google doesn’t like – to “separate it out”. You’ll have to do some content evaluation.

    Edmondson’s concept of doing it by author actually makes a great deal of sense. It makes the authors accountable for their own content, without dragging down those who have provided quality content (again, in theory). Not everybody hit by Panda is a “content farm” (or whatever name you want to use) though. For many, it won’t be so much about who’s writing content.

    Content creators will still do well to consider Google’s list of questions and focus on creating content that is actually good. In case you need a recap on those questions, they are as follows:

    • Would you trust the information presented in this article?
    • Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
    • Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
    • Would you be comfortable giving your credit card information to this site?
    • Does this article have spelling, stylistic, or factual errors?
    • Are the topics driven by genuine interests of readers of the site, or does the site generate content by attempting to guess what might rank well in search engines?
    • Does the article provide original content or information, original reporting, original research, or original analysis?
    • Does the page provide substantial value when compared to other pages in search results?
    • How much quality control is done on content?
    • Does the article describe both sides of a story?
    • Is the site a recognized authority on its topic?
    • Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?
    • Was the article edited well, or does it appear sloppy or hastily produced?
    • For a health related query, would you trust information from this site?
    • Would you recognize this site as an authoritative source when mentioned by name?
    • Does this article provide a complete or comprehensive description of the topic?
    • Does this article contain insightful analysis or interesting information that is beyond obvious?
    • Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
    • Does this article have an excessive amount of ads that distract from or interfere with the main content?
    • Would you expect to see this article in a printed magazine, encyclopedia or book?
    • Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?
    • Are the pages produced with great care and attention to detail vs. less attention to detail?
    • Would users complain when they see pages from this site?

    Those are, by the way, “questions that one could use to assess the ‘quality’ of a page or an article,” according to the company.

    What do you think of the subdomain theory? Tell us in the comments.

  • A Holistic Look at Panda with Vanessa Fox

    Vanessa Fox, called a cyberspace visionary by Seattle Business Monthly, is an expert in understanding customer acquisition from organic search. She shares her perspective on how this impacts marketing and user experience and how all business silos (including developers and marketers) can work together towards greater search visibility at ninebyblue.com. She’s also an entrepreneur-in-residence with Ignition Partners, Contributing Editor at Search Engine Land, and host of the weekly podcast Office Hours. She previously created Google’s Webmaster Central, which provides both tools and community to help website owners improve their sites to gain more customers from search and was instrumental in the sitemaps.org alliance of Google, Yahoo!, and Microsoft Live Search. She was named one of Seattle’s 2008 top 25 innovators and entrepreneurs. Her book, Marketing in the Age of Google, provides a blueprint for incorporating search into organizations of all levels.

    Key Interview Points

    I really enjoy speaking with Vanessa about search because of her perspective about how to do things. As readers of mine know, I am a fan of the trite old way of doing it – producing a great web site, making it search friendly, and then promoting it well. Vanessa is truly an industry leader in promoting this type of thinking.

    This is a great interview for you to read if you want to get a strong feeling for the philosophy that drove the Panda algorithm, and the implications of that philosophy going forward. Here are some of the major elements that I extracted (and paraphrased except in those situations which are quoted) from the discussion we had:

    1. Like any business, Google seeks to maximize its profitability. However, Google believes that this is best done by providing maximum value to end users, as this helps them maintain and grow market share. They make more money this way than trying to squeeze extra CPM out of their web pages at the cost of user experience.
    2. The AdWords team does not have access to the organic search team, and as a result the engineers working on organic search are free to focus on delivering the best quality results possible.
    3. (Vanessa) “Panda isn’t simply an algorithm update. It’s a platform for new ways to understand the web and understand user experience”.
    4. Panda is updated on a periodic basis, as opposed to in real time. This is similar to updates to the PageRank displayed on the Google Toolbar, except it is a whole lot more important!
    5. It is easier to reliably detect social spam than link spam.
    6. (Eric) “If you’ve got twelve different signals and someone games two of them and the other ten don’t agree, that’s a flag.”
    7. Don’t focus on artificial aspects of SEO. If it seems like a hokey reason for a web page to rank higher, it probably isn’t true. If by some chance it is true, first, it is most likely a coincidence, and second, and more importantly, you can’t count on it staying that way.
    8. (Vanessa) “I suggest you get an objective observer to provide you feedback and determine if there are any blind spots you’re not seeing.”
    9. (Vanessa) “The question then becomes if someone lands on your site and they like that page, but they want to engage with your site further and click around your site, does the experience become degraded or does it continue to be a good experience?”
    10. Added value is key. Search engines are looking more and more for the best possible answer to users’ questions. Even if your article is original, if it covers the exact same points as hundreds of other articles (or even 5 other articles), there is no added value to it.
    11. Reviews can be a great way to improve web page content provided that they are contextually relevant and useful.
    12. Crowd sourced content is also potentially useful, but must also be relevant and valuable.
    13. One of the challenges facing both UGC and crowdsourcing is the editorial work of making sure it is useful and relevant.
    14. Branding can be very helpful too, as it helps people trust the content more. Search engines recognize this as a differentiator as well.
    15. (Vanessa) “I think social media levels that playing field a bit. In the past, you had to hire a publicist, do press releases, have relationships with reporters, and get on Good Morning America, or something on that order, to get your name recognized.”
    16. SEO is still important! Making sites that are easily understood by search engines is still something you need to do. Effective promotion of your web site remains critical too.
    17. Unfortunately, for many sites that have been hit by Panda, there is no quick fix. There are exceptions, of course, but they will be relatively rare.

    Motivations of Google

    Eric Enge: Let’s talk about what Panda was from a Google perspective and what they were trying to accomplish rather than the mechanics of what they did.

    Vanessa Fox: I like that you addressed it that way, because many people simply want to know mechanically what they did.

    This update took many people by surprise and, certainly, there are things to be worked out. However, Google has never been secretive about what it’s trying to accomplish and, specifically, what it’s trying to accomplish with Panda.

    Ever since Google launched, its primary goal has been to figure out what searchers want and give them that. This encompasses a lot of things. It encompasses answering their question as quickly and as comprehensively as possible. It involves all the things you think about in terms of making the searcher happy and providing a good user experience.

    In the early days of the web, the only way Google knew if people found something valuable was if there was a link to it. Today, the web is more sophisticated and Google has much more information available to it. The bottom line is that Google is trying to provide the best results for searchers and, for them, Panda was a major step forward in accomplishing this.

    Eric Enge: Yes, some people believe that Google made these changes because it favors their advertisers and their objective is to make more money in the short term. I don’t believe this. To me, the value of market share far outweighs the impact you could get by jacking up your effective CPM by a few percent on your pages.

    Vanessa Fox: That’s absolutely right. It is short term and shortsighted to think Google is focused on improving CPMs or is trying to drive people, who lost ranking in the organic results, to advertise via AdWords. Google is looking for long term market share which is the best way for them to maximize profitability.

    The root of their market share is the fact that they get so many people searching all the time. The best monetary decision for the company is to ensure that searchers experience excellent search results. That’s the core that’s going to help Google maintain their market share which, in turn, is what will help them grow.

    Eric Enge: I’ll paraphrase it simply and say they are totally selfish and they are being selfish by working on their market share.

    Vanessa Fox: That is exactly right. Many people don’t believe that there is a wall between the organic search people and everything else at Google. If they didn’t have such a wall you would have a situation where someone on the AdWords team would be approached by a large advertiser saying “I am having problems with the organic results, can you help me?”

    Of course, that person would want to help the advertiser. By having that wall, the AdWords person doesn’t have access to the organic search people. There is this protectiveness around organic search, which enables those engineers to focus on the search experience. They don’t have to think about AdWords, they don’t have to think about how Google is making money, or what the CPMs are. They don’t have to think about any of those things and are able to concentrate on making the best search experience.

    The whole environment was built that way which is unlike many other companies. In other companies, no matter what part of the organization you work in, you have to always think about how does this impact our revenue. At Google this is not part of the search engineers’ focus, which is great. Another reason is that many of the search engineers have been at Google since the beginning. They don’t have to work there anymore.

    Eric Enge: At this point they could easily retire and buy an island.

    Vanessa Fox: They continue to work there because they love data and love working with large amounts of data and improving things. I think if someone said to them, “I know you work on organic search, but we’ve decided it’s really important to either give advertisers preference or hold advertisers down. Could you tweak the algorithms?” they would probably say, “I am going to buy my island now, see you later.”

    That’s not why they are at Google. They are there because they get to do cool things with large pieces of data. I think these two big factors make it basically impossible for anything other than a search experience to infiltrate what’s going on there.

    Think of Panda as a Platform

    Eric Enge: What is Panda?

    Vanessa Fox: Panda isn’t simply an algorithm update. It’s a platform for new ways to understand the web and understand user experience. There are about four to five hundred algorithm updates a year based on all the signals they have. Panda updates will occur less frequently.

    Eric Enge: Right. In the long run it will probably be seen as being as significant as the advent of a PageRank update.

    Vanessa Fox: Yes, absolutely.

    Eric Enge: At SMX Munich, Rand Fishkin heard from Stefan Weitz and Maile Ohye that it’s a lot easier to recognize gaming of social signals than it is to recognize link spam.

    Vanessa Fox: The social signals have more patterns and footprints around them. Also, the code that search engines use has gotten more sophisticated, and they have access to more data.

    Eric Enge: Another thing I hear people talking about is that over time Google is looking to supplant links with other signals. My take on this is that links are still going to be a good signal, but they are not going to be the only signal.

    Vanessa Fox: Google has been saying that for years. I don’t think the value of links will ever go away. They’ll continue to be augmented with more data, which will make the value of links less important because there are many other signals now in the mix.

    Google never intended to be built solely on links. We didn’t have social media, Facebook Like buttons, and all these things in the past. We only had links. Google was based on the question of how to build an infrastructure that algorithmically tells us what content people are finding most valuable on the web.

    Google and Bing as black boxes

    Eric Enge: I think another key component of this story is that Google and Bing are increasing the obscurity of the details of the algorithm. That’s not perfect phrasing, but I think you know what I mean.

    Vanessa Fox: I think it becomes harder to reverse engineer for a number of reasons. There are so many moving parts that it’s hard to isolate. People who have systems that attempt to reverse engineer different parts of the algorithm for different signals may come to conclusions that are, or are not, accurate. This is because it’s impossible to isolate things down to a single signal.

    You find cases where people think they have but, in reality, it’s the tip of an iceberg because you can’t see everything that’s under the surface. By having more signals and knowing so much more about the web the artificial stuff becomes more obvious.

    Eric Enge: Absolutely. If you’ve got twelve different signals and someone games two of them and the other ten don’t agree, that’s a flag.

    Vanessa Fox: Right. Which is why it’s so disheartening to me to see that some SEOs continue to react to this by saying, “okay, how can we figure out the algorithmic signals for Panda so we can cause our pages to have a footprint that matches a good quality site.” This is very short term thinking because the current signals are in use only during this snapshot in time.

    At this point it’s going to be as difficult to create a footprint of a site with a good user experience as it would be to just create a site with a good user experience. This, of course, is not only a better long term perspective and more valuable, but it will result in a better rate of conversion for most businesses.

    I’ve heard some people say things like, they’ve done some analysis and found that you have to vary the length of your articles on pages, so make sure that all of your articles are variable in length. And this is craziness. Even if it works this minute, next week it won’t work and then they will say the sky is falling again.

    I read an article where a person said Seth Godin writes really short blog posts, so he is going to be impacted by Panda, and asked how Google would know that a short article isn’t a valuable one. But Google’s algorithms are not as simplistic as that. Seth Godin has not said he’s lost ranking because of Panda.

    I commented on the post, and said this is not true. Google isn’t saying that a short article is not a valuable article. Publishers should make blog posts or articles as short or long as they need to be.

    Eric Enge: There will be plenty of cases where the best article is a short article.

    Vanessa Fox: Absolutely and those will continue to rank.

    How Publishers should think about Panda

    Eric Enge: What would you say to a publisher if they believe they were unfairly affected by Panda? This is a tough question because 98% of the people affected by Panda will say they are in this category. They believe they were a drive-by victim rather than something that fell out of the algorithm.

    Vanessa Fox: That is a complicated question. I will not dispute, and I don’t think Google would dispute, that any algorithmic change from any search engine has the potential of causing some collateral damage. If what you are doing as a search engine is asking, “are the search results better?” then the fact that the search results are better doesn’t mean that a site with good content can’t accidentally end up lower.

    That’s going to be the case with any change a search engine makes. From a content-owner perspective that is not good, which we’ll talk about in a second. However, I talked to many people affected by this and 75% to 80% of the time they said I’ve been hit and I shouldn’t have been hit. There have been only a few occasions where people say, “yeah, I’ve gotten away with it for a long time and they cut me off.”

    Eric Enge: You appreciate their honesty, don’t you?

    Vanessa Fox: Oh, absolutely. But most of the time people say I shouldn’t have been hit. If you’ve been working on a site for a long time, you may not see the areas it can be improved. I suggest you get an objective observer to provide you feedback and determine if there are any blind spots you’re not seeing. I think that would be a good first step.

    Essentially, this has become a holistic thing. It’s not one signal that’s been used. You need to determine does this page answer the question, does this help someone accomplish something?

    As a business you have to make money. You also have to understand that if a site is optimized for making as much money per visitor from ads as possible, as opposed to being optimized at being useful to the searcher, this site is probably not what a search engine wants to show as the best search results.

    You have to balance that. Does it answer a searcher’s question, but also does it answer that question better than any other site, and is the answer easy to find? Look at the quality of what’s being said versus the quality of the other pages that are ranking. Is it better or worse? Then you have to determine if the content is awesome and whether that is obvious to the searcher.

    From a user experience perspective, when they land on that page is the content they need buried? The user experience becomes important because Google wants the searcher to be happy and easily find their answer.

    Let’s say the content and the user experience are good for that page. Then you run into the issue of quality ratio of the whole site. The question then becomes if someone lands on your site and they like that page, but they want to engage with your site further and click around your site, does the experience become degraded or does it continue to be a good experience?

    For example, last year Google had this emphasis on speed, because their studies found that people are happier when pages load faster and abandon sites that load slowly. I’ve worked with companies whose pages take fifteen seconds before they load. No one will wait around anymore for fifteen seconds to load a page.

    I don’t think this is a big part of Panda, it is just for illustration purposes.

    If you isolate that as a signal you can have the best content in the world and the best user experience in the world. However, if someone does a search and lands on your page but it takes fifteen seconds for anything to appear, they’ve had a bad experience and they are going to bounce off.

    You have to look holistically at everything that’s going on in your site. This is what you should be doing, as if search engines didn’t exist.

    Eric Enge: Right. There is another element I want to get your reaction to, which I refer to as the “sameness” factor. You may have a great user experience. You may have a solid set of articles that cover hundreds of different topics, and they may all in fact be original. However, it’s the same hundred topics that are covered by a hundred other sites, and the basic points are the same; even though it’s original, there is nothing new.

    Vanessa Fox: Right. I think that’s where added value comes into play. It’s important to look and see what other sites are ranking for. What are you offering that is better than other sites? If you don’t have anything new or valuable to say then take a look at your current content game plan.

    Eric Enge: So, saying the same thing in different words is not the goal. I like to illustrate this by having people imagine the searcher who goes to the search results, clicks on the first result and reads through it. They don’t get what they want so they go back to the search engine, they click on a second result and it’s a different article, but it makes the same points with different words.

    They still didn’t find what they want so they go back to the search engine, they click on the third result and that doesn’t say anything new either. For the search engine it is as bad as overt duplicate content.

    Vanessa Fox: That’s absolutely right.

    Eric Enge: It may not be a duplicate content filter per se, which is a different conversation than this one, but the impact is the same. It’s almost like an expansion of “query deserves diversity,” right?

    Vanessa Fox: Right. These concepts have all been around for a long time, but we are seeing them perhaps played out with different sets of signals, but they are not anything new. The search engines have always said they want to show unique results, diverse results, valuable results, all these things.

    Adding Diversity to your site with User Generated Content

    Eric Enge: One thing I hear people talk a lot about regarding diversity is doing things with user-generated content. In my mind that can be a useful component provided it is contextually relevant and has something useful to say. Do you have some thoughts on that?

    Vanessa Fox: Yes. I agree with you, it could go either way. Since Google’s goal is to provide useful, valuable results then you can certainly find pages where user-generated content provides that. If you look at TripAdvisor, which may have its faults, one benefit is that there are numerous first person accounts of hotels and other experiences.

    Any hotel or vacation destination you are thinking of going to, you will find authentic, real information from people who’ve actually gone there.

    Forums are another example where user-generated content is great. For instance, on Stack Overflow people are interested in answering questions and having discussions, and that’s valuable content. You might have other forums where people aren’t saying anything or are there to spam and put their links.

    I think it depends on both the topic and how much you are moderating things, how much time you are spending in curation, how much time you are spending organizing things in a useful way so it’s easy to find.

    For instance, let’s say you have a recipe site and people tag their recipes with different variations. If you have a curation process that cultivates that and organizes it into topics, so that people can land on a landing page and see all of the recipes about a particular topic, that will be more useful than content scattered everywhere across random tag pages.

    I think there can still be work involved in UGC, although it can be useful and valuable. When you begin looking at health information, for instance, it might become harder. If it’s a site about sharing your experience about an illness, that’s one thing.

    If it’s a site about diagnosing people and telling them what they should do to fix their illness, that’s another thing. If it is a group of people, as opposed to doctors, giving the advice, you get into questions of authority, and how do you know it is credible?

    Crowd Sourced Content

    Eric Enge: There is a related topic that has a different place in the picture, which is the notion of crowd sourced content. Essentially, using crowd sourced data to draw a conclusion, for example, with surveys and polls.

    Vanessa Fox: This boils down to the same thing. Is it useful, valuable, credible, authoritative, and comprehensive? Is it all the things people are looking for and does it answer their question better than anything else out there on the web? We can look to TripAdvisor as an example of a site that’s been able to create valuable content on a large scale.

    At a larger scale you have to move towards automated processes and, at that point, the curation process becomes harder. Wikipedia has editors that are aggressive towards making sure the content is accurate. However, not all sites have that.

    When you do surveys it can be fine, but if you are not manually reviewing the results, because of the large volume of data, that’s when something can potentially go awry, so you have to be careful with it.

    The same thing can happen with aggregating data from different sources. If you look at something like Walk Score, they’ve been able to aggregate data on how close schools, bars, and other facilities are to your house. Of course, you see other examples where it goes poorly, and you look at the page and it doesn’t make any sense.

    Eric Enge: Right. It’s a matter of the context, the effort, and the level at which you are trying to do it.

    Vanessa Fox: Yes. I think ultimately there will be a fair amount of work involved with running a business that adds value for people. With this age of technology, you see many cases where people say, “look at all the cool things I can do with technology and it’s very little work on my part.” This is sort of the four-hour work week syndrome.

    Often, that does not produce the most valuable results. For instance, if we examine travel and look at a site like Oyster, which was started by Eytan Seidman who used to work on the search team at Microsoft, they pay full-time staff writers with a travel background to travel to hotels, write reviews, and take pictures. They aren’t in every city in the world, and they don’t have every hotel in the world.

    That’s a corporate example, but there are travel bloggers, and food bloggers, and other people who only write ten blog posts. However, those ten posts are very comprehensive on the topic.

    At a large scale, if you attempt to cover every topic in the world, you are not necessarily going to be able to compete with someone who has written something manually, gone there, and spent time editing their article. It wouldn’t make sense that your automated content would outrank them.

    Eric Enge: Absolutely. It reminds me of another thread, which I am not sure fits in the interview, but I am going to say it anyway. When I grew up I watched the news with Walter Cronkite. He was completely trusted and authoritative. Today we have Fox News, which is entertainment.

    That’s the design of Fox News and more power to them; however, you have to imagine that as a culture we are going to have a drive towards getting news from a source that you can trust.

    Vanessa Fox: Right. Google did a blog post recently where they talked about the trust element. They said it is certainly one of the questions you should ask yourself when you are evaluating a site. Can you trust it?

    Eric Enge: Right. Will you give it your credit card or will you trust it for medical advice?

    Vanessa Fox: Would you follow the instructions to save your life? This is where brand comes in. I don’t think it has to be a huge brand, but brand does help the trust factor. Building a brand that people see over and over makes a difference.

    This is a major reason why I do not recommend microsites. I know many people who want to do a bunch of micro sites but lack of a brand is one reason I tell them it’s probably not a good idea.

    It’s hard to build a brand with a bunch of micro sites that aren’t branded in a unified way. If you build one site under one brand you can build brand engagement; however, you can’t do that with a bunch of micro sites that are branded separately.

    Social Media and Branding

    Eric Enge: Do you think an effective tactic for beginning to build the brand would involve social media?

    Vanessa Fox: It depends on the topic and audience. Where is your audience, are they on social media? If you can engage that audience and build up authority with them that is great. I think social media levels that playing field a bit. In the past, you had to hire a publicist, do press releases, have relationships with reporters, and get on Good Morning America, or something on that order, to get your name recognized.

    It still takes work but you can go out on social media, see where people are talking about your topic area, answer their questions, and be that authoritative source. I think it can be great but it doesn’t fit every situation.

    SEO still matters

    Eric Enge: One last question since we’ve been talking about holistic marketing. The search engines still have mechanical limitations because of how they crawl web pages, so being search engine savvy is still important.

    Vanessa Fox: Absolutely. Search engines crawl the web and they index the web. Technical aspects, such as how the server responds, how the page URLs are built, and what the redirects are, make a huge impact. You can have the best content in the world, but if search engines can’t access that content it’s never going to be indexed or rank. So, absolutely, all that stuff is vitally important.

    Eric Enge: The other component is the promotional one: going out and implementing programs, including social media campaigns, to make people aware of your site and draw links to it.

    Vanessa Fox: Yes. That’s absolutely the case. I think it goes with the idea you’ve heard from the search engines for a long time, which is: what would you do if search engines didn’t exist? You need to build your business, and part of that is building awareness about your business.

    I think the web makes it easier but you need to raise awareness so people know that it’s there. Whether it is through social media or other types of PR, there are many things you can do. You can’t think of your audience engagement strategy as simply SEO. All these other components help SEO, but there are things you need to do in business even if you weren’t doing it for SEO.

    The Scope of Panda

    Eric Enge: Any last thoughts on Panda?

    Vanessa Fox: I talk to many people who have sites that have been hit and I certainly sympathize with their plight. However, there is no quick fix in these cases.

    I talked to a site owner two weeks ago who said, “maybe if we change our URLs so that they are closer to the root of the site, instead of having folders in them, that will get us back in.” This is the wrong way of looking at it.

    Eric Enge: Yes. That’s a clear “no.” For sites that have been hit by Panda, I don’t think, for the most part, there is a quick fix.

    Most sites will not be lucky enough to have one section of their site that is a total boat anchor that they can just not index and be done with it. Most sites probably have a real process to go through.

    Vanessa Fox: Yes. It’s hard to hear because this is affecting people’s businesses. I think it is going to be a lot of work to figure out who your audience is, what they are looking for, whether you are engaging them well, and whether you are providing value, beyond all the stuff that we talked about. It is a process.

    Eric Enge: Thanks Vanessa!

    Check out Ramblings About SEO for more articles by Eric Enge.

  • Why SEO Disgusts Me

    Before my SEO friends get their panties in a wad over today’s headline, let me emphasize that I understand the practical value and wisdom of basic Search Engine Optimization practices. There are many principled people in the field doing good and useful work.

    What tactics frustrate you the most? Comment here.

    But the competition to out-fox the search engines is getting ugly. Beyond ugly.

    I recently had a discussion with the CEO of a leading Midwest search firm who described their common practice of creating fake accounts to pump client links into the comment section of blog posts and forums.

    The process goes something like this:

    1. The company hires home-bound individuals or low-wage people in developing countries to freelance as professional blog commenters.
    2. The blog commenters are trained on how to pose as fake people and comment in a way that does not arouse the suspicion of Google or the author of the blog.
    3. The freelance commenters are then given assignments, fake personas and email accounts to provide an appearance of legitimacy.  A 50-year-old man in Indianapolis might be posing as a 30-year-old housewife in Pittsburgh, for example.
    4. The commenters are compensated by the number of client links they can successfully work into a comment or forum — as many as five in one post.

    Reality check.  Isn’t this fraud?

    I really don’t pay attention to SEO shenanigans like this on a day-to-day basis, but now these practices are starting to impact me and my precious time. Here is an example of this practice in a comment that was salted into the {grow} comment section by “John” –

    This is good post. This is some good important facts about the corporate blogs. Do you have any information on how to manage comments on the blog.  I think http://www. (web link to consumer electronics retail outlet) might have an idea.  Chech it out.

    And of course this linked website did not even have a blog.  So now I am spending my time weeding out fake comments that elude the spam filter … and it happens every day.

    I spoke to one of the freelancers hired by this SEO company to provide this faux commenting service. He’s otherwise unemployed and is doing it because he’s desperate for money. He’s good at what he does and rarely gets “outed.”

    However, as he described his work, he told me he feels guilty when people on the blogs actually want to engage with his fake persona. “I feel terrible about this,” he said. “I have to find some other work. I’m deceiving people as part of my job. I’m not in a position to engage with them because I’m a fake, which seems wrong.”

    While Google fights against this kind of practice, it is very difficult to detect, and the “penalties” are so minor that the risk is ignored by SEOs. And the volume of fake comments is likely to get worse. This firm alone has hired 300 fake commenters in the past 12 months and sees rapid expansion as a key competitive advantage.

    The CEO of this SEO company does not consider this a “black hat” SEO practice — “it’s gray,” he said, “and we have many companies willing to pay us a lot of money to do it.” He bragged that one client has a monthly SEO bill of $200,000.

    I recognize that there are many important business insights and strategies that can come from legitimate SEO professionals like:

    • Keyword research + targeting
    • Testing + optimizing content for users
    • Content strategy direction
    • Making sites search-engine friendly
    • Leadership for analytics
    • Opportunities for alternative search listings
    • User experience improvement

    … and more.  But I’m concerned when it gets difficult to compete in the industry without engaging in fraudulent behavior.  This is a slippery slope that will lead to regulation.  All it will take is one high-profile case that blows the lid off these practices.  And we will all lose if we have to endure new rules and the cost of compliance.

    I want to do business with people who view ethics as black and white, not gray.  I want to work in an industry where we can compete fairly without resorting to SEO fraud to cover up ineffective products, services and marketing plans. How about you?

    Let us know in the comments.

    Check out BusinessGrow.com for more articles by Mark Schaefer.

  • Google Panda 2.2 Goes Live

    Update: A Google spokesperson tells WebProNews, “We’re continuing to iterate on our Panda algorithm as part of our commitment to returning high-quality sites to Google users. This most recent update is one of the roughly 500 changes we make to our ranking algorithms each year.”

    A couple weeks ago, Google’s Matt Cutts talked about a new iteration of Google’s Panda update, which he said was already approved and would be hitting soon. The update has been commonly referred to throughout the search industry as “Panda 2.2”.

    This version is expected to more heavily address the issue of scraped content, an issue that continues to plague the web and Google’s search results (the scraped content often ranks higher than the original) even post-Panda. Cutts is quoted as saying in a liveblog of an SMX Advanced session, “A guy on my team [is] working on that issue. A change has been approved that should help with that issue. We’re continuing to iterate on Panda. The algorithm change originated in search quality, not the web spam team.”

    Google has not made any announcements or references indicating that the update has gone live yet, but webmasters are thinking it might have been released. Barry Schwartz at Search Engine Roundtable points to a WebmasterWorld thread, where there is a mix of webmasters claiming they have suffered from the alleged update and some that have recovered. One said, “I’ve recovered as of 36 hours ago. Day1 of recovery, traffic doubled, adsense tripled Day2 of recovered, 4 hours in, traffic has doubled again, back to my best levels of 18 months ago.”

    Cutts said he didn’t know when Panda would be launched internationally (in other languages), and that Google has made no manual exceptions with the update, meaning all sites have been affected solely by algorithmic tweaks – none by hand.

    It’s important to note that Google makes algorithmic adjustments every day, and sometimes even more than once a day. Obviously not all updates get the attention the Panda update has received, but you never know when some adjustment can impact your site’s rankings for better or for worse.

    Cutts recently said, “If we think you’re relatively high quality, Panda will have a smaller impact. If you’re expert enough and no one else has the good content, even if you’ve been hit by Panda that page can still rank.”

    You can still rank. Obviously you want to strive for quality, but this just shows that even if you lose a great deal of Google love at any point in time, you still have hope of getting back into its good graces if the quality is there. As Dani Horowitz of DaniWeb discussed in a recent interview with WebProNews, there are a lot of adjustments you can make to your website (not all of which are directly content related) that can help you regain search referrals. Google has over 200 signals, and even if you have a lot of trouble gaining ground on one of them, there are a lot more areas where you may be able to improve.

  • Google Adds HTTP Header Support to rel= “canonical”

    Google is now supporting link rel=”canonical” relationships specified in HTTP headers. Evidently this was heavily requested, as the feature is in response to webmaster feedback.

    The syntax can be found in this document. Google Webmaster Trends analyst Pierre Far outlines an example on the Webmaster Central Blog:

    To see the rel=”canonical” HTTP header in action, let’s look at the scenario of a website offering a white paper both as an HTML page and as a downloadable PDF alternative, under these two URLs:
    http://www.example.com/white-paper.html
    http://www.example.com/white-paper.pdf
    In this case, the webmaster can signal to Google that the canonical URL for the PDF download is the HTML document by using a rel=”canonical” HTTP header when the PDF file is requested; for example:
    GET /white-paper.pdf HTTP/1.1
    Host: www.example.com
    (…rest of HTTP request headers…)

    HTTP/1.1 200 OK
    Content-Type: application/pdf
    Link: <http://www.example.com/white-paper.html>; rel="canonical"
    Content-Length: 785710
    (… rest of HTTP response headers…)

    Far also notes that the canonical tag in HTTP headers might help when a site serves the same file from multiple URLs, such as a content distribution network, when the webmaster wants to let Google know the preferred URL.
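
    To make the mechanics concrete, here is a minimal sketch, assuming a Python standard-library server and a locally stored white-paper.pdf, of how a site might attach the rel="canonical" Link header described above when serving the PDF (this is our illustration, not code from Google’s post):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    CANONICAL = "http://www.example.com/white-paper.html"  # the HTML version from the example above

    class WhitePaperHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/white-paper.pdf":
                with open("white-paper.pdf", "rb") as f:  # assumes the PDF exists on disk
                    body = f.read()
                self.send_response(200)
                self.send_header("Content-Type", "application/pdf")
                # Tell crawlers the HTML page is the canonical version of this PDF.
                self.send_header("Link", '<%s>; rel="canonical"' % CANONICAL)
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_error(404)

    if __name__ == "__main__":
        HTTPServer(("", 8000), WhitePaperHandler).serve_forever()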

    The support is for web search only.

    Last month, Google’s Matt Cutts discussed some reasons (which are few and far between) why Google might skip your canonical tags:

    If you’re unfamiliar with rel=”canonical” altogether, watch our interview with Cutts from when it was first launched:

  • Panda Victim Xomba “Dances on the Bones of Content Farms”

    Earlier this year, we had a conversation with John Citrone, editor of the online writing community Xomba.com, which fell victim to Google’s Panda update. He gave us a preview of a redesign of the site, which had been in the works for some time prior to the update, as he told us the company saw the update coming and started preparing for it last summer.

    “Around the first of the year, we began creating a new site design with new community networking features for people who want to express themselves in more than 140 characters,” he told us. “Our new design will reduce or eliminate our dependence on Google to bring us traffic through its search results; our focus is to build a community of people who want to network with each other and share their experiences and their passions.”

    Today, Citrone tells WebProNews, “Well, it’s official — we’ve relaunched with a new model and vision.” The company’s announcement comes with the title, “Xomba.com Dances on the Bones of Content Farms with New Site, New Direction, New Philosophy.” Here’s the announcement:

    The days of writing on content farms for fat payouts are over, and sites that continue to operate on that model are collapsing with a resounding “WTF?”

    Not Xomba.

    Long before Google decided to single-handedly wipe out content farms with the innocently named Panda update, Xomba was in the process of building a new site where users can share ideas, without fear of the corporate monolith search engine ruining the party. “Last summer,” says CEO Nick Veneris, “we began stripping low-quality content from Xomba while restructuring the site — from design to philosophy to purpose — to put the user, rather than the content, at the center of attention. This was nearly a year before anyone had even heard of Panda.”

    “Too many sites, including the old Xomba, relied on substandard content to bring in revenue,” says Veneris. “We believe that method is dead, and we are ushering a new age of writing online. Xomba is, in a word, a social networking site for people who like to write — a place where people can share ideas in more than 140 characters, make friends and have a good time.”

    With the new design, Xomba has introduced a follow system, easy-to-use writing templates, author rankings and an educational component (called Xomba University) to help along inexperienced writers. Since relaunch, Xomba has seen an uptick of between two and three times the article submissions, most of which are coming from new users. And Xomba is already prepared to move — literally — to the next level, even considering investment capital to get there. The redesign is just scratching the surface, as Xomba plans to introduce niche sites (sites focusing on a single topic) as part of the new Xomba publishing network.

    It remains to be seen whether the changes will boost Google’s perception of content quality from Xomba, but the search engine has certainly been placing more emphasis on authors too. Last week, the company announced the rel=”author” authorship markup.

    Google says it is experimenting with using data from this to help people find content from authors in search results, and will continue to look at ways it could help the search engine highlight authors and rank search results.

  • Google’s Matt Cutts on Why Amazon Often Ranks Well

    If you search for products a lot, using Google, there’s a fairly good chance you’ve seen Amazon at or near the top of the results pages quite a few times. Someone sent a question about this to Google, and Matt Cutts used a Webmaster Help video to discuss the subject.

    The question was phrased as: “Search for a physical product usually ranks Amazon #1, even though it may not provide the best user experience. What is being done to prevent large corporations from dominating search engine results?”

    Matt responded by saying, “I think in general, not to call anybody out, but I think Amazon does have a relatively good user experience in general. I wouldn’t necessarily agree that Amazon always ranks number one for every physical product.”

    “So typically when I do a search for a book, Amazon is up there, but if there is an official homepage for a book, it often ranks very well, and sometimes number one as well,” he continued. “The interesting thing is not every book has a home page. This is something that still surprises me. You’ll have a very savvy author. They’ll have a webpage, but they may not have a landing page or a page dedicated to that specific book. Sometimes it’s just a lack of savviness.”

    He brought up one book that he had recently looked at, noting that no other content about it was on the web, other than Amazon, GoodReads, and Google eBooks.

    “The best answer is, make sure there is an actual page for your product,” said Cutts. “In general, Google does try to figure out what are the official home pages whether it be for governments, universities, or states or whatever, and we try to make sure we return those when possible.”

    “We are mindful of whenever users do a search, and then they complain to us, if they complain that they’re not finding an official homepage for a product, then that’s something that we do take into consideration,” he said. “In general, we do look at the number of links. We look at the content of the page, and if one particular website is able to get a lot of links, because a lot of people think it’s a great site, then in general, usually it should rank relatively well, and I think that by itself isn’t necessarily a problem.”

    Looking at this a little bit myself, I did find that a search for my wife’s book, “The Fireman’s Daughter” did return an Amazon result within the first few results (a band with the same name is ranking above it), while the landing page for the book from the actual publisher is buried 5 pages in. There are pros and cons to Amazon outranking this page. She makes more in royalties if the book is purchased directly through the publisher, but on the other hand, the Amazon brand also lends a bit of trust from the user’s perspective, as not as many people will be familiar with the publisher itself (this may be a different story with some more well-known publishers).

    The question is not just about books though. Looking at it from the perspective of the average online store, the consumer trust factor likely plays a big role in Amazon’s rankings. Remember Google’s list of questions you could use to assess the quality of your site? It included something like “Would you feel comfortable giving your credit card information to this site?”

  • Could Botnets Inflate Search Engines’ Query Counts?

    Gabriel Weinberg, the founder of the DuckDuckGo search engine, had an interesting blog post over the weekend, talking about some weird traffic the site was getting. He points to a couple of SERPs generated by what he deems to be queries from botnets.

    In the comments, Eric Klein says one of them looks like it’s trying to boost ranking by “making it seem that there are a lot of requests.”

    “Could they be trying to game their placement on search pages by hitting the search sites often enough to pretend to be meaningful?” he asks.

    Weinberg responds, “Ahh, that’s a very plausible theory.”

    “It’s known as ‘referral spam,’” adds @TristanPerryIX in the comments. “It spams servers with HTTP referrers for two purposes:

    1) Sometimes webmasters will (like you done :)) review the logs, and then click on their website. I assume these spammers are (wrongly!) assuming that this is ‘good’ traffic which might result in a sale/advert click.

    2) Some websites have an automatic ‘these websites link to us..’ section or widget. Naturally referral spam could be used to game this section, thus getting a backlink (thus gaming the search engines).”

    DuckDuckGo doesn’t save IPs, which is one of its selling points. It blocks them at the firewall level, Weinberg says, noting that this is dissociated from query data. “If we didn’t block the most egregious botnet machines and abusers, our machines would almost instantly be under water,” he adds.

    “This discussion now makes me wonder if other search engines include this errant traffic in their query counts,” says Weinberg. “We work hard to keep them completely out because they would overwhelm our real direct queries #s and therefore distort our perception of progress. We also separate out API requests for the same reason, which now also makes me wonder whether everyone else is doing that too.”
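
    As a rough illustration of the kind of separation Weinberg describes (this is a hypothetical sketch, not DuckDuckGo’s actual pipeline; the thresholds and blocklist names are assumptions), a search engine might drop suspected bot and referral-spam traffic before reporting query totals:

    from collections import Counter

    SPAM_REFERRERS = {"cheap-links.example", "seo-boost.example"}  # assumed blocklist
    MAX_QUERIES_PER_CLIENT = 500  # assumed per-day threshold

    def count_real_queries(log_entries):
        """log_entries: list of (client_id, query, referrer) tuples."""
        per_client = Counter(client for client, _, _ in log_entries)
        total = 0
        for client, query, referrer in log_entries:
            if per_client[client] > MAX_QUERIES_PER_CLIENT:
                continue  # likely automated traffic
            if referrer in SPAM_REFERRERS:
                continue  # likely referral spam
            total += 1
        return total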

    It’s an interesting factor to consider, when you think about search market share.

    Below, you can view a recent interview we conducted with Weinberg:

  • Ranking Google Ranking Factors By Importance

    Rand Fishkin and SEOmoz polled 132 SEO experts with data from over 10,000 Google search results, and have attempted to rank the importance of ranking signals. It’s not confirmed fact, obviously. Google won’t provide such information, but I suppose the next best thing is the collective opinion of a large group of people who make their livings getting sites to rank in search engines, and Fishkin has put together an impressive presentation.

    Do you think Google is ranking search results effectively? Comment here.

    You can view the entire presentation here, but I’ve pulled out a few key slides that basically sum up the findings.

    The factors are actually broken down into the following subsets, where each is ranked against other related factors: overall algorithmic factors, page-specific link signals, domain-wide link signals, on-page signals, domain name match signals, social signals, and highest positively + negatively correlated metrics overall.

    The results find that page-level link metrics are the top algorithmic factors (22%), followed by domain-level, link authority features (21%). This is similar to the same SEOmoz poll for 2009, but there is a huge difference in the numbers, indicating that experts are less certain that page-level link metrics are as important. In 2009, they accounted for 43%.

    Search Ranking Factors

    Page-specific link signals are cited as metrics based on links that point specifically to the ranking page. This is how the results panned out there:

    Page-specific linking factors

    According to Fishkin, the main takeaways here are that SEOs believe the power of links has declined, that diversity of links is greater than raw quantity, and that the exact match anchor text appears slightly less well-correlated than partial anchor text in external links.

    Domain-wide link signals are cited as metrics based on links that point to anywhere on the ranking domain. Here is what the poll looked like in this department:

    Domain Level linking factors

    The report compares followed vs. nofollowed links to the domain and page, finding that nofollow links may indeed help with rankings:

    Nofollow

    On-page signals are cited as metrics based on keyword usage and features of the ranking document. Here’s what the poll looked like on these:

    on-page factors

    Fishkin determines that while it’s tough to differentiate with on-page optimization, longer documents tend to rank better (possibly as a result of Panda), long titles and URLs are still likely bad for SEO, and using keywords earlier in tags and docs “seems wise”.

    Here is how the domain name extensions in search results shook out:

    Domain extensions

    Here are the poll results on social-media-based ranking factors (which Google has seemingly been putting more emphasis on of late):

    Social Factors

    Fishkin suggests that Facebook may be more influential than Twitter, or that it might simply be that Facebook data is more robust and available for URLs in SERPs. He also determines that Google Buzz is probably not in use directly, as so many users simply have their tweet streams go to Buzz (making the data correlation lower). He also notes that there is a lot more to learn about how Google uses social.

    Andy Beard has been testing whether links posted in Google Buzz pass PageRank or help with indexing of content since February 2010. He is now claiming evidence that Buzz is used for indexing.

    Danny Sullivan asked Google’s Matt Cutts about the SEOmoz ranking factors survey in a Q&A session at SMX Advanced this week – specifically about the correlation between Facebook shares and Google rankings. Cutts is quoted as saying, “This is a good example of why correlation doesn’t equal causality because Google doesn’t get Facebook shares. We’re blocked by that data. We can see fan pages, but we can’t see Facebook shares.”

    The SEOmoz presentation itself has a lot more info about the methodology used and how the correlation worked out.
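
    As a hedged sketch of how a rank-correlation study of this kind can be run (our illustration with made-up numbers, not SEOmoz’s code; the “linking root domains” factor is just an example), you compare a candidate factor against the observed positions for one query’s results:

    from scipy.stats import spearmanr

    # Hypothetical data for the top five results of a single query.
    ranking_position = [1, 2, 3, 4, 5]               # 1 = top result
    linking_root_domains = [420, 510, 130, 95, 60]   # candidate ranking factor

    rho, p_value = spearmanr(ranking_position, linking_root_domains)
    # A negative rho means more linking root domains tends to accompany better (lower) positions.
    print(rho, p_value)

    Averaging such coefficients over many queries produces the kind of correlation figures shown in the slides, which is also why, as Cutts notes above, correlation alone doesn’t establish causation.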

    All of the things covered in the presentation should be taken into consideration, particularly for sites that have experienced significant drops in rankings (because of things like the Panda update or other algorithm tweaks). We recently discussed with Dani Horowitz of Daniweb a number of other things sites can also do that may help rankings in the Post-panda Google search index. DaniWeb had been hit by Panda, but has seen a steady uptick in traffic since making some site adjustments, bringing up the possibility of Panda recovery.

    Barry Schwartz at Search Engine Roundtable polled his readers about Panda recovery, and 4% said they had fully recovered, while more indicated that they had recovered partially. Still, the overwhelming majority had not recovered, indicating that Google probably did its job right for the most part (that’s not to say that some sites that didn’t deserve to get hit didn’t get hit). In that same Q&A session, Cutts said, “The general rule is to push stuff out and then find additional signals to help differentiate on the spectrum. We haven’t done any pushes that would directly pull things back. We have recomputed data that might have impacted some sites. There’s one change that might affect sites and pull things back.”

    A new adjustment to the Panda update has been approved at Google, but has not rolled out yet, he says. This adjustment will be aimed at keeping scraped content from ranking over original content.

    Home Page Content

    There have also been other interesting bits of search-related information coming out of Google this week. Cutts posted a Webmaster Central video talking about the amount of content you should have on your homepage.

    “You can have too much,” said Cutts. “So I wouldn’t have a homepage that has 20MB. You know, that takes a long time to download, and users who are on a dial-up or a modem, a slow connection, they’ll get angry at you.”

    “But in general, if you have more content on a home page, there’s more text for Googlebot to find, so rather than just pictures, for example, if you have pictures plus captions – a little bit of textual information can really go a long way,” he continued.

    “If you look at my blog, I’ve had anywhere from 5 to 10 posts on my main page at any given time, so I tend to veer towards a little more content when possible,” he added.

    Who You Are May Count More

    Who you are appears to be becoming more important in Google. Google announced that it’s supporting authorship markup, which it will use in search results. The company is experimenting with using the data to help people find content from authors in results, and says it will continue to look at ways it could help the search engine highlight authors and rank results. More on this here.

    Search Queries Data from Webmaster Tools Comes to Google Analytics

    Google also launched a limited pilot for search engine optimization reports in Google Analytics, tying Webmaster Central data to Google Analytics, after much demand. It will use search queries data from WMT, which includes:

  • Queries: The total number of search queries that returned pages from your site over the given period. (These numbers can be rounded, and may not be exact.)
  • Query: A list of the top search queries that returned pages from your site.
  • Impressions: The number of times pages from your site were viewed in search results, and the percentage increase/decrease in the daily average impressions compared to the previous period. (The number of days per period defaults to 30, but you can change it at any time.)
  • Clicks: The number of times your site’s listing was clicked in search results for a particular query, and the percentage increase/decrease in the average daily clicks compared to the previous period.
  • CTR (clickthrough rate): The percentage of impressions that resulted in a click to your site, and the increase/decrease in the daily average CTR compared to the previous period.
  • Avg. position: The average position of your site on the search results page for that query, and the change compared to the previous period. Green indicates that your site’s average position is improving. To calculate average position, we take into account the ranking of your site for a particular query (for example, if a query returns your site as the #1 and #2 result, then the average position would be 1.5). A short worked example of these calculations follows this list.
  • This week, we also ran a very interesting interview between Eric Enge and Bill Slawski addressing Google search patents and how they might relate to the Google Panda update.
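
    As a small worked example of the CTR and average-position definitions above (our own illustration with made-up numbers, not Google’s code):

    def ctr(clicks, impressions):
        """Clickthrough rate: the percentage of impressions that resulted in a click."""
        return 100.0 * clicks / impressions if impressions else 0.0

    def average_position(positions):
        """Average ranking position for a query; appearing at #1 and #2 averages to 1.5."""
        return sum(positions) / float(len(positions))

    print(ctr(clicks=40, impressions=1000))  # 4.0 (percent)
    print(average_position([1, 2]))          # 1.5, matching the example in the list above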

    Back to the SEOmoz data. Do you think the results reflect Google’s actual algorithm well? Tell us what you think.

  • New Google Panda Update Approved, On the Way

    Google’s Matt Cutts spoke in a Q&A session with Danny Sullivan at SMX Advanced this week, and discussed the Panda update, among other things.

    A lot of sites have been critical of Google for returning results that are scraped versions of their original content. Cutts is quoted as saying in a liveblog of the session, “A guy on my team [is] working on that issue. A change has been approved that should help with that issue. We’re continuing to iterate on Panda. The algorithm change originated in search quality, not the web spam team.”

    He says there’s another change coming soon, and that he still doesn’t know when Panda will be launched fully internationally (in other languages). He also says they haven’t made any manual exceptions with Panda.

    You may recall that the Mac blog Cult of Mac was hit by the original Panda update, and then after exchanging some dialogue with Google the site ended up getting some new traffic. Matt says, however, “We haven’t made any manual exceptions. Cult of Mac might have been confused because they started getting all this new traffic from blogging about it, but we haven’t made any manual exceptions.”

    Yesterday we looked at some poll results from Search Engine Roundtable that found 4% of sites were saying they had fully recovered from the Panda update. Some other sites have been finding partial recovery.

    Search Engine Roundtable Shares Panda Poll

    Image credit: Search Engine Roundtable

    On the prospect of sites having recovered from the update, Matt is quoted as saying, “The general rule is to push stuff out and then find additional signals to help differentiate on the spectrum. We haven’t done any pushes that would directly pull things back. We have recomputed data that might have impacted some sites. There’s one change that might affect sites and pull things back.”

    You may also recall Google’s list of questions that webmasters could use to assess the quality of their content. Cutts talked briefly about those questions, saying, “It could help as we recompute data.”

    He also said that what is being called “Panda 2.2” has been approved but has not yet been rolled out. “If we think you’re relatively high quality, Panda will have a smaller impact. If you’re expert enough and no one else has the good content, even if you’ve been hit by Panda that page can still rank.”

    That says a lot about original content.

  • Bing Webmaster Tools Refreshed

    Bing has launched some enhancements to Bing Webmaster Tools in an update called “Honey Badger”.

    “Today’s redesign offers webmasters a simplified experience that allows them to quickly analyze and identify trends – while also bringing new and unique features to the industry,” a representative for Bing tells WebProNews. “Our goal is to help webmasters make faster, more informed decisions and drive new insights about their website by presenting them with rich visuals and more organized, relevant content.”

    Enhancements include:

    • Crawl delay management: Lets webmasters configure the bingbot crawl rate for a specific domain.
    • Index Explorer: Gives webmasters the ability to access data in the Bing index regarding a specified domain.
    • User and Role Management: Provides site owners with the ability to grant admin, read/write or read-only access to other users for their site.

    Crawl delay is configurable by hour. Users can ask Bing to crawl slower during peak business hours or have it crawl faster during off-peak hours. There is drag-and-drop functionality that lets users create a crawl graph by clicking and dragging the mouse pointer across the graph. Individual columns can also be clicked for fine-tuning.

    Bing Crawl Settings

    Index Explorer, Bing says, is a “complete rewrite” of the Index Tracker backend, focusing on freshness, performance, extensibility, reduced machine footprint, and stability and failure detection. New sites will have this data as they sign up.

    Bing Index Explorer

    The company also launched the ability for webmasters to manage deep-links and added over 40 new educational documents and videos to the Toolbox site. The content covers things like: using Webmaster Tools, data explanation, link building, removing/blocking pages from Bing’s index, SEO guidance, managing URL parameters, rich snippets (schema.org), canonicalization, nofollow, managing redirects, 404 page management, etc.

    Bing says you can “count on more monthly content being added” to Webmaster Tools in the near future.

  • Google on How Much Content You Should Have On Your Home Page

    The latest Google Webmaster Central video from Matt Cutts talks about home page content. Given issues like content depth and site speed, which Google has brought up a great deal in recent memory, the content on your home page is worth considering with regard to these things as well.

    The question from a user, which Matt addresses is: “More or less content on a homepage?”

    Today’s webmaster video: How much content should be on a homepage? http://goo.gl/SE9ss

    “You can have too much,” says Cutts. “So I wouldn’t have a homepage that has 20MB. You know, that takes a long time to download, and users who are on a dial-up or a modem, a slow connection, they’ll get angry at you.”

    “But in general, if you have more content on a home page, there’s more text for Googlebot to find, so rather than just pictures, for example, if you have pictures plus captions – a little bit of textual information can really go a long way,” he continues.

    “If you look at my blog, I’ve had anywhere from 5 to 10 posts on my main page at any given time, so I tend to veer towards a little more content when possible,” he adds.

    You can see Matt’s blog here, if you want a better idea of how he does it.

  • rel=”author” is Same-Site Only

    I managed to ping Google’s @mattcutts after the announcement of rel=”author” support from Google on Twitter and he clarified the use case a little.

    As Twitter’s search is still so terrible at finding things, I am adding the conversation here.

    New rel=”author” support http://goo.gl/FCK3l ( @mattcutts is this suitable for cross domain attribution too for syndicated content? )

    @AndyBeard for now it’s same-site, just to be safe. My (personal) guess is we’ll see if that can be expanded over time in a trusted way.

    @mattcutts thanks for the clarification & intended current use

    @AndyBeard sure thing. Remember, rel=canonical also started as same-site only, then as we trusted it more, it became cross-site.

    @mattcutts I can’t sneak a rel=”canonical” into an author bio link, or ask content partners such as @WebProNews to include it

    My last point is at least partially related to Google’s Panda update because it is quite frequently seen, possibly more than before, that original content doesn’t rank, yet scraped copies of it do.

    There are reasons why that happens, but a microformat rel=”author” and possibly something new… rel=”original” for a link to the canonical source would be useful.

    Something like this would be easier to implement than the metatag alternative currently in testing with newspapers (original-source & syndication-source).

    This is something really easy to get implemented in a number of CMSs, though in most cases it would be theme-dependent rather than something that is part of core.

    Originally published at Internet Business & Marketing Strategy

  • Search Engine Patents and Panda

    Bill Slawski is the president and founder of SEO by the Sea, and has been engaging in professional SEO and internet marketing consulting since 1996. With a Bachelor of Arts Degree in English from the University of Delaware, and a Juris Doctor Degree from Widener University School of Law, Bill worked for the highest level trial Court in Delaware for 14 years as a court manager and administrator, and as a technologist/management analyst. While working for the Court, Bill also began to build and promote web pages, and became a full time SEO in 2005. Working on a wide range of sites, from Fortune 500 to small business pages, Bill also blogs about search engine patents and white papers on his seobythesea.com blog.

    What are the Most Likely Signals Used by Panda?

    Eric Enge: Let’s chat about some of the patents that might be playing a role in Panda 1, 2, 3, 4, 5, 6, 7 and beyond. I would like to get your thoughts on what signals are used for measuring either content quality or user engagement.

    Bill Slawski: I’ve been looking at sites impacted by Panda. I started from the beginning with remedial SEO. I went through the sites, crawled through them, looked for duplicate content issues within the same domain, looked for things that shouldn’t be indexed that were, and went through the basic list that Google provides in their Webmaster Tools area.

    In the Wired interview with Amit Singhal and Matt Cutts regarding this update, they mentioned an engineer named Panda. I found his name on the list of papers written by Googlers and read through his material. I also found three other tool and systems engineers named Panda, and another engineer who writes about information retrieval and architecture. I concluded that the Panda in question was the person who worked on the PLANET paper (more on this below).

    For signals regarding quality, we can look to the lists of questions from Google. For example, Does your web site read like a magazine? Would people trust you with their credit card? There are many things on a web site that might indicate quality and make the page seem more credible and trustworthy and lead the search engine to believe it was written by someone who has more expertise.

    The way things tend to be presented on pages, for instance where ad blocks are shown, may or may not be signals. If we look at the PLANET whitepaper “Massively Parallel Learning of Tree Ensembles with MapReduce,” its focus isn’t so much on reviewing signals with quality or even user feedback but, rather, how Google is able to take a machine learning process dealing with decision trees and scale it up to use multiple computers at the same time. They could put many things in memory and compare one page against another to see if certain features and signals appear upon those pages.

    Eric Enge: So, the PLANET whitepaper described how to take a process, which before was constrained to a single-computer machine learning process, and put it into a distributed environment to gain substantially more power. Is that a fair assessment?

    Bill Slawski: That would be a fair assessment. It would use the Google file system and Google’s MapReduce. It would enable them to draw many things into memory to compare to each other and change multiple variables at the same time. For example, a regression model type approach.

    Something that may have been extremely hard to use on a very large dataset becomes much easier when it can scale. It’s important to think about what shows up on your web page as a signal of quality.

    It’s possible that their approach is to manually identify pages that have quality (content quality, presentation, and so on) and use those as a seed set for the machine learning process. Using that model to identify other pages, and how well they may rank in terms of these different features, makes it harder for us to determine expressly which signals the search engines are looking for.
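
    To make that concrete, here is a speculative sketch of the seed-set idea (this is not Google’s system; the feature names, values, and model choice are our assumptions): hand-label a few pages as high or low quality, extract page features, and train a tree ensemble to score new pages.

    from sklearn.ensemble import GradientBoostingClassifier

    # Each row: [ad_ratio_above_fold, words_in_main_content, spelling_error_rate, outbound_link_density]
    seed_features = [
        [0.05, 1200, 0.001, 0.02],  # hand-labeled high-quality page
        [0.60,  150, 0.030, 0.40],  # hand-labeled low-quality page
        [0.10,  900, 0.002, 0.05],
        [0.55,  200, 0.020, 0.35],
    ]
    seed_labels = [1, 0, 1, 0]  # 1 = high quality, 0 = low quality

    model = GradientBoostingClassifier().fit(seed_features, seed_labels)

    # The resulting probability could act as a quality score layered on top of
    # the usual relevance and link signals, which matches the "filter" framing below.
    new_page = [[0.30, 400, 0.010, 0.15]]
    print(model.predict_proba(new_page)[0][1])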

    If they are following this PLANET-type approach in Panda with the machine learning, there may be other things mixed in. It is hard to tell. Google may not have solely used this approach. They may have tightened up phrase-based indexing and made that stronger in a way that helps rank and re-rank search results.

    It appears that Panda is a re-ranking approach. It’s not a replacement for relevance and Page Rank and the two hundred plus signals we are used to hearing about from Google. It may be a filter on top of those where some web sites are promoted and other web sites are demoted based upon some type of quality signal score.

    Eric Enge: That’s my sense of it also. Google uses the term classifier, so you could imagine that, either before or after running the basic algorithm, it applies something like a scaling factor up or down.

    Bill Slawski: Right. That’s what it seems like.

    Page Features as an Indicator of Quality

    Eric Enge: You shared another whitepaper with me which dealt with sponsored search. Does that whitepaper add any insight into Panda? The PLANET paper followed up on an earlier paper on sponsored search which covered predicting bounce rates on ads. It looked at the landing pages those ads brought you to based upon features found on the landing pages.

    They used this approach to identify those features and then determined which ones were higher quality based upon their feature collection. Then they could look at user feedback, such as bounce rates, to see how well they succeeded or failed. This may lead to metrics such as the percentage of the page above the fold which has advertising on it.

    Bill Slawski: Now you are talking about landing pages, so many advertisers may direct someone to an actual page where they can conduct a transaction. They may bring them to an informational page, or an information-light page, that may not be as concerned with SEO as it is with calls to action, signals of reassurance using different logos, and symbols that you would get from the security statistical agencies.

    That set of signals is most likely different from what you would find on a page that was built for the general public or for search engines. However, if you go back to the original PLANET paper, they said, “this is sort of our proof of concept, this sponsored search thing. If it works with that, it can work well with other very large datasets in places like organic search.”

    Eric Enge: So, you might use bounce rate directly as a ranking signal, but when you don’t yet have that user data to work with, why not predict it instead?

    Bill Slawski: Right. If you can take a number of features out of a page and use them in a way that gives them a score, and if the score can match up with bounce rate and other user engagement signals, chances are a feature-based approach isn’t a bad one to take. Also, you can use the user behavior data as a feedback mechanism to make sure you are doing well.
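
    A hedged illustration of that feedback loop (our sketch, not the paper’s implementation; feature names and numbers are made up): fit a simple model that predicts bounce rate from page features, then compare the predictions against the observed rates to see how well the feature-based score is working.

    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error

    # Rows: [ads_above_fold_ratio, load_time_seconds, main_content_words]
    page_features = [
        [0.50, 8.0,  200],
        [0.10, 1.5, 1500],
        [0.35, 4.0,  600],
        [0.05, 1.0, 2000],
    ]
    observed_bounce_rate = [0.82, 0.30, 0.55, 0.22]  # hypothetical measurements

    model = LinearRegression().fit(page_features, observed_bounce_rate)
    predicted = model.predict(page_features)

    # The observed rates serve as validation for the feature-based score
    # rather than as a direct ranking signal.
    print(mean_absolute_error(observed_bounce_rate, predicted))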

    Eric Enge: So, you are using the actual user data as a validator rather than a signal. That’s interesting.

    Bill Slawski: Right. You could do the same thing with organic search, which, to a degree, they did with the blocked pages signal. This is where 85% of pages that were blocked were also pages that had lower quality scores. You can also look at other signals, for example, long clicks.

    Eric Enge: Long clicks, what’s that?

    Bill Slawski: I dislike the term bounce rate because, by itself, it doesn’t conclusively mean that someone visits the page and then leaves in under a few seconds. Someone may go to a page, look at it, spend time on it, and then leave without going somewhere else. A long click is when you go to a page and you actually spend time there.

    Eric Enge: Although, you don’t know whether or not they spent time there because they had to deal with a phone call.

    Bill Slawski: Or, they opened something else up in a new tab and didn’t look at it for a while. There are other things that could measure this and ways to confirm agreement with it, such as how far someone scrolls that page.

    Eric Enge: Or, if they print the page.

    Bill Slawski: And clicks at the bottom of the page.

    Eric Enge: Or clicks on some other element. Could you track cursor movements?

    Bill Slawski: There have been a couple patents, even some from Google, on tracking cursor movements that they may possibly use someday. These could give them an indication of how relevant something may, or may not, be to a particular query.

    One patent is described as being used on a search results page, and it shows where someone hovers for a certain amount of time. If it’s a search result, you see if they hover over a one-box result, which may give the search engine an incentive to continue showing particular types of one-box results. That’s a possibility, mouse pointer tracking.

    Bounce Rates and Other User Behavior Signals

    Eric Enge: Getting back to the second whitepaper, what about using the actual ad bounce rate directly as a signal, since that could also potentially validate the signal either way?

    Bill Slawski: It’s not necessarily a bad idea.

    Eric Enge: Or low click through rates, right?

    Bill Slawski: As we said, user signals sometimes tend to be noisy. We don’t know why someone might stay on one page longer than others. We don’t know if they received a phone call, if they opened it up in a new tab, if they are showing someone else and have to wait for the person, or there are plenty of other reasons.

    You could possibly collect different user behavior signals even though they may be noisy and may not be an accurate reflection of someone’s interest. You could also take another approach and use the user behavior signals as feedback. To see how your methods are working, you have the option to have a wider range of different types of data to check against each other.

    Rather than having noisy user data be the main driver for your ranking… you look at the way content is presented on the page.

    Bill Slawski: That’s not a bad approach. Rather than have noisy user data be the main driver for your rankings, you find another method that looks at the way content is presented on a page. One area is segmentation of a page, which identifies different sections of a page by looking at features that appear within those sections or blocks, and which area is the main content part of the page. That’s the part that uses full sentences, or sometimes sentence fragments, with periods and commas and capital letters at the beginning of lines of text. You use a Visual Gap Segmentation (white space) type of process to identify what might be an ad, what might be navigation, and where things such as the main content area or a footer section might be. You look for features in sections.

    For instance, a footer section is going to contain a copyright notice, and being able to segment a page like that helps you look for other signals of quality. For example, if an advertisement appears immediately after the first paragraph of the main content area, you may say, “well, that’s sort of intrusive.” If one or two ads take up much of the main space, that aspect of the page may lead to a lower quality score.
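
    As a rough illustration of the kind of check Bill describes, the sketch below assumes a page has already been segmented into labeled blocks (by whatever method) and then asks how much above-the-fold space the ads occupy and whether an ad interrupts the start of the main content. The block labels, pixel geometry, and penalty values are invented for illustration.

```python
from dataclasses import dataclass

# A page block as it might come out of a visual-gap segmentation step.
# Labels and pixel geometry here are illustrative assumptions.
@dataclass
class Block:
    label: str    # "main_content", "ad", "nav", "footer", ...
    top: int      # vertical position in pixels
    height: int   # rendered height in pixels

def ad_intrusiveness(blocks: list[Block], viewport_height: int = 800) -> float:
    """Return a 0..1 penalty based on how much above-the-fold space ads occupy
    and whether an ad appears before any main content."""
    above_fold = [b for b in blocks if b.top < viewport_height]
    ad_pixels = sum(min(b.height, viewport_height - b.top)
                    for b in above_fold if b.label == "ad")
    penalty = ad_pixels / viewport_height

    # Extra penalty if an ad is encountered before any main content.
    for b in sorted(blocks, key=lambda b: b.top):
        if b.label == "ad":
            penalty += 0.2
            break
        if b.label == "main_content":
            break
    return min(penalty, 1.0)

if __name__ == "__main__":
    page = [
        Block("nav", 0, 80),
        Block("ad", 80, 300),
        Block("main_content", 380, 1200),
        Block("footer", 1580, 100),
    ]
    print(round(ad_intrusiveness(page), 2))
```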

    How the Search Engines Look at a Page

    Eric Enge: I understand how features may impact the search engine’s perception of a page’s quality, but that presumes they can unravel the CSS to figure out where things are really appearing.

    Bill Slawski: Microsoft has been writing white papers and patents on the topic of visual gap segmentation since 2003. Google had a patent called “Determining semantically distinct regions of a document,” involving local search, where they could identify separate blocks of text reviews for restaurants or other places.

    For example, take a Village Voice article about restaurants in Greenwich Village in New York: it has ten paragraphs about ten different restaurants, each paragraph starts with the name of a restaurant and ends with its address, and in between is the review.

    This patent said, “we can take that page, segment those reviews, and identify them with each of the individual restaurants,” and then, two or three paragraphs later, it says, “we can also use the segmentation process in other ways, like identifying different sections of a page: main content, a header, a footer, and so on.” Google was granted a patent on a more detailed page segmentation process about a month ago.

    Bill Slawski: Segmentation is probably part of this quality review, being able to identify and understand different parts of pages. They don’t just look at CSS. In the days when tables were used a lot, you had the old table trick.

    You moved the content up and, depending on how you arranged a table, you could use absolute positioning. With CSS you can do the same type of thing, but the search engine is going to use some type of simulated browser. It doesn’t render a page completely, but it gives them an idea when they look at the DOM (Document Object Model) of a page.

    They look at some simulation of how the page will render: an idea of where the white space is, where HR tags might be drawing lines across the page, and so on. They can get a sense of what appears where and how things are separated, and then try to understand what each of those blocks does based upon linguistic features of those blocks.

    Is it a set of single-word items that have links attached to them, each one capitalized? That might be main navigation. So, you can break up a page like that and look at where things appear and how they are arranged. That could be a signal, a quality signal.
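
    Here is a toy version of classifying a block by the linguistic cues mentioned in the conversation: full sentences with periods and commas suggest main content, short capitalized linked items suggest navigation, and a copyright notice suggests a footer. The thresholds and regular expressions are made-up assumptions, not anything from a patent.

```python
import re

def classify_block(text: str, link_count: int) -> str:
    """Very rough guess at a block's role using the cues from the interview:
    sentence punctuation for main content, short capitalized linked items for
    navigation, a copyright notice for a footer. Thresholds are illustrative."""
    items = [t for t in re.split(r"[\n|]+", text) if t.strip()]
    words = text.split()

    if re.search(r"(©|copyright)\s*\d{4}", text, re.IGNORECASE):
        return "footer"

    # Many short, capitalized, linked items reads like a navigation menu.
    short_caps = sum(1 for t in items
                     if len(t.split()) <= 2 and t.strip()[:1].isupper())
    if items and link_count >= len(items) and short_caps / len(items) > 0.7:
        return "navigation"

    # Sentence punctuation and longer runs of text read like main content.
    sentence_marks = text.count(".") + text.count(",")
    if len(words) > 40 and sentence_marks >= 3:
        return "main_content"

    return "unknown"

if __name__ == "__main__":
    print(classify_block("Home | Products | About Us | Contact", link_count=4))
    print(classify_block("Copyright 2011 Example Corp. All rights reserved.", 1))
    print(classify_block(
        "The new line of bookcases arrived this week, and early reviews have "
        "been positive. Customers mention the sturdy shelves, the easy "
        "assembly, and the finish, which holds up well in bright rooms. "
        "Several reviewers compared it favorably to pricier competitors.", 0))
```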

    The Search Engines Understand That There Are Different Types of Sites

    Eric Enge: Does the type of site matter?

    Bill Slawski: Most likely there is some categorization of types of sites so you are not looking at the same type of quality signals on the front page of a newspaper as you are on the front page of a blog or an ecommerce site.

    You can have different kinds of things appearing in those different places. You are not going to get a TRUSTe badge on a blog, but you might on an ecommerce site. You look at the different features and realize that different genres, different types of sites, may have different features associated with them.

    Eric Enge: Yes.

    Bill Slawski: That may have been derived when these seed quality sites were selected. There may have been some preprocessing to identify different aspects, such as ecommerce labels, blog labels, and other things, so whatever machine learning system they used could make distinctions between types of pages and associate different types of features with them.

    It’s called a Decision Tree Process, and this process would look at a page and ask, “is this a blog, yes or no? Is this a news site, yes or no?” It works through different pathways, asking questions, to arrive at that final score.
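
    Bill’s description of a decision-tree process suggests something like the following toy classifier. The yes/no questions, the genre labels, and the genre-specific feature expectations are all invented to show the shape of the idea, not what Google actually asks.

```python
from dataclasses import dataclass

# Page-level features a crawler might extract; all names are illustrative.
@dataclass
class PageFeatures:
    has_shopping_cart: bool
    has_dated_posts: bool
    has_comment_forms: bool
    has_bylines_and_sections: bool

def classify_site(f: PageFeatures) -> str:
    """Walk a hand-built decision tree, asking yes/no questions the way the
    interview describes, so that different genres can later be judged by
    different quality signals."""
    if f.has_shopping_cart:
        return "ecommerce"
    if f.has_dated_posts:
        if f.has_comment_forms:
            return "blog"
        if f.has_bylines_and_sections:
            return "news"
        return "blog"
    return "other"

# Genre-specific expectations: a TRUSTe-style badge matters for a store,
# not for a blog (per the interview's example).
EXPECTED_FEATURES = {
    "ecommerce": {"trust_badge", "return_policy", "contact_address"},
    "blog": {"author_bio", "archive_links"},
    "news": {"bylines", "publication_dates"},
    "other": set(),
}

if __name__ == "__main__":
    site = PageFeatures(has_shopping_cart=True, has_dated_posts=False,
                        has_comment_forms=False, has_bylines_and_sections=False)
    genre = classify_site(site)
    print(genre, EXPECTED_FEATURES[genre])
```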

    Eric Enge: Other things you can look at are markers of quality, such as spelling errors on the page. I think Zappos, if I remember correctly, is currently editing all their reviews because they’ve learned that spelling and grammar errors affect conversion. So, that’s a clear signal they could potentially use, and the number of broken links is another.

    Another interesting case is when you come to a page and it is one long block of text. There may be a picture on top, but that’s probably a good predictor of a high bounce rate. If it is a research paper, that’s one thing, but if it is a news article, that is something else.

    Bill Slawski: Or, if it’s the Declaration of Independence.

    Eric Enge: Right, but they can handle that segmentation. If someone is looking for a new pair of shoes, and they come to a page with ten paragraphs of text and a couple of buttons to buy shoes, that’s a good predictor of a high bounce rate.

    Bill Slawski: On the other hand, if you have a page with an H1 main heading at the top, a couple of subheadings, a list, and some pictures that all appear to be meaningful to the content of the page, that would be a well-constructed article. It’s written to be readable on the web: it’s easy to scan and it’s easy to locate the different sections of the page that cover different concepts. This may make the page more interesting and more engaging, and keep people on the page longer.

    So, do these features translate to the type of user behavior where someone will be more engaged with the page and spend more time on it? Chances are, in many cases, they will.
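
    The structural side of what Eric and Bill describe (headings, lists, and images versus one long unbroken block of text) is easy to count with nothing but the standard library. The sketch below is illustrative only; the 2,000-character threshold for a “wall of text” is arbitrary, and a real check would add the spelling-error and broken-link counts mentioned above.

```python
from html.parser import HTMLParser

class PageStats(HTMLParser):
    """Count a few structural markers discussed in the interview:
    headings, lists, images, links, and the amount of visible text."""
    def __init__(self):
        super().__init__()
        self.headings = 0
        self.lists = 0
        self.images = 0
        self.links = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in {"h1", "h2", "h3", "h4"}:
            self.headings += 1
        elif tag in {"ul", "ol"}:
            self.lists += 1
        elif tag == "img":
            self.images += 1
        elif tag == "a":
            self.links += 1

    def handle_data(self, data):
        self.text_chars += len(data.strip())

def wall_of_text(stats: PageStats) -> bool:
    """Flag pages with lots of text but almost no structure; the interview
    treats this as a likely predictor of a high bounce rate. The 2000-character
    threshold is an arbitrary illustration."""
    return stats.text_chars > 2000 and (stats.headings + stats.lists + stats.images) == 0

if __name__ == "__main__":
    html = "<h1>Bookcases</h1><ul><li>Oak</li><li>Pine</li></ul><p>" + "text " * 50 + "</p>"
    stats = PageStats()
    stats.feed(html)
    print(stats.headings, stats.lists, stats.images, wall_of_text(stats))
```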

    User Engagement Signals as a Validator

    Eric Enge: Another concept is that user engagement signals standing by themselves may be noisy, but ten of them collectively probably won’t be. You could take ten noisy signals, and if eight of them point in the same direction, then you’ve got a signal.

    Bill Slawski: They reinforce each other in a positive manner.

    Eric Enge: Then you are beginning to get something which is no longer a noisy signal.
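
    Eric’s eight-out-of-ten idea can be shown in a few lines. The signal names below are placeholders, not a real list of Google signals, and the 80% agreement threshold is an arbitrary illustration.

```python
# Each entry is one engagement signal's verdict for a page: +1 if it points
# toward "people like this page", -1 if it points the other way, 0 if unclear.
# The names are illustrative placeholders, not a real signal list.
signals = {
    "long_clicks": +1,
    "scroll_depth": +1,
    "print_rate": 0,
    "return_to_serp_rate": +1,
    "repeat_visits": +1,
    "brand_searches": +1,
    "bookmarking": -1,
    "time_on_site": +1,
    "pages_per_visit": +1,
    "not_blocked_by_users": +1,
}

def consensus(verdicts: dict[str, int], agreement: float = 0.8) -> str:
    """Treat the bundle as meaningful only if a clear majority of the
    individually noisy signals point the same way."""
    votes = list(verdicts.values())
    positive = sum(1 for v in votes if v > 0)
    negative = sum(1 for v in votes if v < 0)
    if positive >= agreement * len(votes):
        return "positive"
    if negative >= agreement * len(votes):
        return "negative"
    return "noisy / inconclusive"

if __name__ == "__main__":
    print(consensus(signals))   # 8 of 10 positive -> "positive"
```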

    Bill Slawski: Right. For example, suppose you have a warehouse full of people in an isolated area printing out multiple copies of the same document over and over, because they think printing a document is a user behavior signal that the search engine might notice. You are wasting a lot of paper and a lot of time.

    In isolation that is going to look odd, it’s going to be an unusual pattern. The search engine is going to say, “someone is trying to do something they shouldn’t be doing.”

    Eric Enge: Yes. That can become a direct negative flag, and you must be careful because your competitor could do it to you. So, the ballgame seems to go on. What about misleading information which was covered by a Microsoft white paper?

    Bill Slawski: That was about concepts involving web credibility that Microsoft attempted to identify. It involved both on-site factors and off-site factors, and a third category, called aggregated information, which was the user behavior data they collected about pages. If you had on-site factors such as security certificates, logos, and certain other features, that would tend to make you look more credible. The emphasis is more on credibility than quality. It seems that the search engines are equating credibility with quality to a degree.

    Bill Slawski: The AIRWeb Conference, which was held five years in a row but not held last year, was held again this year. It covered adversarial information retrieval on the web in conjunction with another workshop on credibility. They called it the 2010 Web Quality Conference, and it brought together people from Google, Microsoft, Yahoo, and a number of academic participants.

    Design actually plays a very important part, maybe bigger than most people would assume, when it comes to people assessing whether or not a site is credible.

    You can go back a number of years to the Stanford Persuasive Technology Lab’s research on credibility. One of the findings, from a study of roughly five thousand web sites, was that design plays a very important part, maybe bigger than most people would assume, when it comes to people assessing whether or not a site is credible.

    They also came out with a series of guidelines on things that will make your web site appear more credible to people: photographs of the people behind the site, an explicitly shown address, a privacy policy, an ‘about us’ page, terms of service. These are on-page signals you could look at.

    There are many off-page signals you could look at, such as winning a Webby Award, being recognized in other places, being cited on authoritative sites, or even PageRank, which they said they would consider as a signal in determining whether or not a page was a quality page. In the Microsoft paper they said they would look at PageRank, which was interesting.
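
    The credibility cues summarized here (an address, a privacy policy, an ‘about us’ page, photographs, plus off-site recognition and PageRank) read naturally as a weighted checklist. In the sketch below, the cue names and weights are invented for illustration; neither the Stanford guidelines nor the Microsoft paper publishes a scoring formula.

```python
# Weighted checklist of credibility cues drawn from the guidelines discussed
# above. The weights are invented for illustration only.
ON_PAGE_WEIGHTS = {
    "shows_street_address": 0.15,
    "has_privacy_policy": 0.10,
    "has_about_us_page": 0.10,
    "has_terms_of_service": 0.05,
    "shows_team_photos": 0.10,
    "has_security_certificate": 0.10,
}
OFF_PAGE_WEIGHTS = {
    "cited_by_authoritative_sites": 0.20,
    "won_industry_award": 0.10,
    "reasonable_pagerank": 0.10,
}

def credibility_score(present: set) -> float:
    """Sum the weights of whichever cues the site actually exhibits."""
    weights = {**ON_PAGE_WEIGHTS, **OFF_PAGE_WEIGHTS}
    return round(sum(w for cue, w in weights.items() if cue in present), 2)

if __name__ == "__main__":
    site_cues = {"shows_street_address", "has_privacy_policy",
                 "has_about_us_page", "cited_by_authoritative_sites"}
    print(credibility_score(site_cues))   # 0.55 with these example cues
```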

    Populating Useful Information Among Related Web Pages

    Eric Enge: Then you have the notion of brand searches. If people are searching for your brand, that’s a clear signal. If you have a no-name web site and there are no searches for the web site name or the owner’s company name, that’s a signal too.

    Bill Slawski: That stirs up a whole different kettle of fish, and it leads to the question of how you determine whether or not a page is an authority page. For instance, Google decides that when somebody types ESPN into the search box on the toolbar, the ESPN web site should be the first one to come up; it doesn’t matter much what follows it. If they type Hilton, the same applies. This gets into the topic of what the search engines identify as named entities, specific people and places: how do they then associate those with particular query terms, and if those query terms are searched for, how do they treat them?

    Do they look at it as a navigational query and ensure the site they associated with it comes up? Do they treat it as an implied site search and show four, five, six, seven different results from that web site in the top ten, which Google had been doing for a good amount of time?

    Eric Enge: Even for a non-brand search, for instance, Google surely associates Zappos with shoes, right? So, in assessing authority, compared to some other new shoe site, you could reference the fact that the brand name Zappos gets searched for a lot, and that could be a direct authority signal for any search on the topic of shoes.

    Bill Slawski: Right. Let us discuss a different patent from Google that explores that and goes into it in more detail. There was one published in 2007 that I wrote about, called “Populating useful information among related web pages.” It talks about how Google determines which web site might be associated with a particular query and might be identified as authoritative for it.

    In some ways, it echoes some of the things in the Microsoft paper on misleading information about how authority is determined. It not only looks at things it may see on the web, such as links to the pages using anchor text with those terms, but it may also look at whether or not the term is a registered trademark that belongs to the company that owns a particular web site. It may also look at the domain name or yellow pages entries.

    One of the authors of this patent also wrote a number of the local search patents which, in some parts, say that citations are just as good as links: a business mentioned at a particular location is more likely to rank higher when somebody searches for businesses of that type in that location. So, this patent from Google expands that idea beyond local search to find authoritative web pages for particular queries.
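
    The patent Bill describes combines several kinds of evidence (anchor text, trademarks, domain names, directory-style citations) to decide which site should be treated as authoritative for a query. The sketch below is a toy aggregation along those lines; the evidence types and weights are assumptions, not the patent’s actual method.

```python
from collections import defaultdict

# Each evidence record: (query term, candidate site, evidence type).
# The evidence types loosely mirror the ones mentioned in the interview;
# the weights below are invented for illustration.
EVIDENCE_WEIGHTS = {
    "anchor_text_link": 1.0,
    "registered_trademark": 3.0,
    "domain_name_match": 2.0,
    "directory_citation": 0.5,
    "brand_query_volume": 1.5,
}

def authoritative_site(query: str, evidence: list) -> str:
    """Score each candidate site by the weighted evidence tying it to the query
    and return the strongest association."""
    scores = defaultdict(float)
    for term, site, kind in evidence:
        if term == query:
            scores[site] += EVIDENCE_WEIGHTS.get(kind, 0.0)
    return max(scores, key=scores.get) if scores else ""

if __name__ == "__main__":
    evidence = [
        ("zappos", "zappos.com", "registered_trademark"),
        ("zappos", "zappos.com", "domain_name_match"),
        ("zappos", "zappos.com", "brand_query_volume"),
        ("zappos", "shoereviewblog.example", "anchor_text_link"),
    ]
    print(authoritative_site("zappos", evidence))   # zappos.com
```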

    Rejecting Annoying Documents

    Eric Enge: Excellent. Since we are getting towards the end I’d like your thoughts on annoying advertisements.

    Bill Slawski: Google came up with a patent a few years ago which, in some ways, seems a bit similar to Panda. It focused upon features of landing pages and aspects of advertisements. It was called “Detecting and rejecting annoying documents”.

    It provided a list of the types of things they may look at in ads and on landing pages: the subject matter, characteristics ratings, what type of language is used, where it comes from geographically, and who owns the content.

    Eric Enge: It may even detect content in images using OCR or other kinds of analysis to understand what is in an image.

    Bill Slawski: Right, and also locate Flash associated with an ad, locate the audio that might be played, look at the quality of images, and whether or not they are animated. It was a big list. I do not know if we will see a patent anytime soon from Google that gives us the same type of list involving organic search and the Panda approach. Something might be published two, three or four years from now.
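
    Since the “annoying documents” patent is essentially a feature list, here is a sketch of how such a list might be turned into a score. The features are paraphrased from the conversation and the weights are arbitrary; nothing here is the patent’s actual scoring method.

```python
from dataclasses import dataclass

# Ad characteristics of the kind the interview says the patent enumerates.
# The weighting scheme is an illustrative assumption, not the patent's method.
@dataclass
class AdFeatures:
    is_animated: bool
    autoplays_audio: bool
    uses_flash: bool
    low_image_quality: bool
    landing_page_matches_ad: bool  # does the landing page deliver what the ad promises?

def annoyance_score(ad: AdFeatures) -> float:
    """Accumulate penalties for intrusive characteristics; reward a landing
    page that actually matches the ad."""
    score = 0.0
    score += 0.3 if ad.is_animated else 0.0
    score += 0.4 if ad.autoplays_audio else 0.0
    score += 0.1 if ad.uses_flash else 0.0
    score += 0.2 if ad.low_image_quality else 0.0
    if ad.landing_page_matches_ad:
        score -= 0.3
    return round(max(0.0, min(score, 1.0)), 2)

if __name__ == "__main__":
    ad = AdFeatures(is_animated=True, autoplays_audio=True, uses_flash=False,
                    low_image_quality=False, landing_page_matches_ad=False)
    print(annoyance_score(ad))   # 0.7 for this example
```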

    Eric Enge: It’s interesting. Obviously, which patents they are using and not using is something you don’t get visibility into unless you are in the right building at the right time at the Googleplex.

    It seems to me the underlying lesson is that you need to be aware of search engines and, obviously, make search-engine-savvy web sites. The point is that you need to focus on what people should have focused on all along: What do my users want? How do I give it to them? How do I engage them? How do I keep them interested? Then create a great user experience, because that’s what the search engines are trying to model.

    My perspective is search engines are another visitor to your web site like anybody else.

    Bill Slawski: Right. My perspective is that search engines are another visitor to your web site like anybody else. They may have different requirements. There may be some additional technical steps you have to take for your site to cater to them, but they are a visitor and they want what other visitors to your site want. They want to fulfill some type of informational or situational need. They want to find information they are looking for. They want to buy what you offer if, in the snippets that show up in search results, that’s what you do offer.

    If you are a web site that’s copying everybody else and not adding anything new or meaningful, not presenting it in a way that makes it easier to read and easier to find, and there is nothing that differentiates you or sets you apart, then you are not treating potential visitors the best way you can.

    When you do SEO, even in the age of Panda, you should be doing all the basics; Panda is a re-ranking approach. You need to get rid of the same content appearing at multiple different URLs, and get rid of pages that are primarily keyword-insertion pages, where a phrase or two or three changes but everything else stays the same.

    When you write about something, if you are paying attention to phrase-based indexing, make sure you include related information that most people would include on that page, related terms and so on. Those basics don’t go away and they may be more important now than they were in the past.
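
    The duplicate-content and keyword-insertion problems Bill mentions can both be caught with simple near-duplicate detection. Below is a minimal shingling sketch; the shingle size and the similarity threshold are arbitrary choices for illustration.

```python
def shingles(text: str, k: int = 3) -> set:
    """Break text into overlapping k-word shingles."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two shingle sets."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def near_duplicates(pages: dict, threshold: float = 0.5) -> list:
    """Flag URL pairs whose body text is almost identical, the pattern you get
    from keyword-insertion templates or the same content at multiple URLs.
    The threshold is an arbitrary illustration."""
    urls = list(pages)
    sets = {u: shingles(pages[u]) for u in urls}
    pairs = []
    for i, u in enumerate(urls):
        for v in urls[i + 1:]:
            sim = jaccard(sets[u], sets[v])
            if sim >= threshold:
                pairs.append((u, v, round(sim, 2)))
    return pairs

if __name__ == "__main__":
    pages = {
        "/bar-stools-in-dallas": "Buy cheap bar stools in Dallas with free shipping on all bar stools today",
        "/bar-stools-in-austin": "Buy cheap bar stools in Austin with free shipping on all bar stools today",
        "/about-us": "One Way Furniture has sold furniture online since 1997 starting on eBay",
    }
    print(near_duplicates(pages))
```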

    Yes. As a searcher, as someone who helps people with web sites, and as someone who may present my own stuff on web sites, I want to know how it works. When I do a search, I want to make sure I am finding the things that are out on the web.

    Get some sweat equity going and make sure your stuff is stuff people want to see, learn about the search space as much as you can.

    Bill Slawski: I want to find the things I need, or want, or hope to see, and anything Google can do to make that better means everybody wins. That may be more work for people putting content on the web, but the cost of sweat is fairly cheap. Get some sweat equity going, make sure your stuff is stuff people want to see, and learn about the search space as much as you can.

    As ranking signals, we have relevance, we have importance and, increasingly, we have content quality.

    Eric Enge: How is life for you otherwise?

    Bill Slawski: I have been trying to keep things local, get more involved in my local community, and do things with the local Chamber of Commerce. I now live in a much more rural area in northwestern Virginia, and some of these local business people need the help.

    I am really close to DC and have been trying to work more with nonprofits. Instead of traveling, I am meeting many people locally, helping people learn more about what they can do with their web sites and that’s pretty fulfilling.

    Bill Slawski: I live in horse country now; there might actually be more horses in my county than there are people.

    Eric Enge: Thanks Bill!

    Originally published at Ramblings About SEO