WebProNews

Google Withdraws Previous Crawling Recommendation

Google announced that it is deprecating its AJAX crawling scheme, and webmasters need to be aware of how things have changed to ensure Google is crawling their site correctly and in the most effective manner possible.

Does this have any bearing on your site? Let us know in the comments.

Specifically, Google is no longer recommending the AJAX crawling proposal it made six years ago. The proposal was intended to benefit webmasters and users by making content from rich, interactive AJAX-based sites universally accessible through search results. At the time, Google said it believed this would “significantly improve the web.”

In those days (2009), Google was unable to render and understand pages using JavaScript to present content to users, and crawlers couldn’t see any content created dynamically.

The technology has improved a great deal in six years, as you would probably expect (for perspective, 2009 was the year of the iPhone 3GS).

Now, as long as you don’t block Googlebot from crawling your JavaScript or CSS files, Google can render and understand your pages as modern browsers do.

Last year, Google wrote a blog post about this and how it was starting to understand pages better. It also offered some information on things that may lead to a negative impact on search results for your site.

“If resources like JavaScript or CSS in separate files are blocked (say, with robots.txt) so that Googlebot can’t retrieve them, our indexing systems won’t be able to see your site like an average user,” the post, co-written by a trio of Googlers, says. “We recommend allowing Googlebot to retrieve JavaScript and CSS so that your content can be indexed better. This is especially important for mobile websites, where external resources like CSS and JavaScript help our algorithms understand that the pages are optimized for mobile.”

“If your web server is unable to handle the volume of crawl requests for resources, it may have a negative impact on our capability to render your pages. If you’d like to ensure that your pages can be rendered by Google, make sure your servers are able to handle crawl requests for resources,” the post continues. “It’s always a good idea to have your site degrade gracefully. This will help users enjoy your content even if their browser doesn’t have compatible JavaScript implementations. It will also help visitors with JavaScript disabled or off, as well as search engines that can’t execute JavaScript yet.”
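As a rough illustration of the robots.txt point in the quote above, the sketch below uses Python's standard urllib.robotparser module to list which resource URLs a given robots.txt would keep Googlebot from fetching. The robots.txt rules, the example.com URLs, and the blocked_resources helper are all made up for illustration, and Google's own robots.txt handling may differ in details from this standard-library parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the script and stylesheet directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/js/
Disallow: /assets/css/
"""


def blocked_resources(robots_txt, resource_urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in resource_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical resources referenced by a page.
resources = [
    "https://example.com/assets/js/app.js",
    "https://example.com/assets/css/site.css",
    "https://example.com/images/logo.png",
]

blocked = blocked_resources(ROBOTS_TXT, resources)
for url in blocked:
    print("Blocked for Googlebot:", url)
```

With rules like these, the script and stylesheet come back as blocked while the image does not. Removing the two Disallow lines, or adding matching Allow rules, would let a crawler retrieve the files it needs to render the page.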

Google also notes that some JavaScript is too complex or arcane for it to execute, which means it won’t be able to render the page fully or accurately. Also, some JavaScript removes content from the page, which prevents Google from indexing it.

At the time, Google also revealed a new tool in Webmaster Tools (now Search Console) in the form of an addition to the Fetch as Google tool, which lets you see how Googlebot renders a page. Submit a URL with “Fetch and render,” and Google tries to find all the external files involved and fetch them as well. These files include images, CSS and JavaScript files as well as other things that might be indirectly embedded through the CSS or JavaScript. Google uses all of this to render a preview image that shows Googlebot’s view of the page.
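To get a feel for which external files a page pulls in directly, here is a minimal sketch using Python's standard html.parser module. It is not how Fetch as Google works internally; the ResourceCollector class and the sample markup are assumptions for illustration, and it only sees files referenced in the HTML itself, not those embedded indirectly through CSS or JavaScript.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect scripts, stylesheets and images referenced directly in the HTML."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "script" and attributes.get("src"):
            self.resources.append(("script", attributes["src"]))
        elif tag == "link" and "stylesheet" in rel_values and attributes.get("href"):
            self.resources.append(("stylesheet", attributes["href"]))
        elif tag == "img" and attributes.get("src"):
            self.resources.append(("image", attributes["src"]))


# Hypothetical page markup.
PAGE = """\
<html><head>
<link rel="stylesheet" href="/assets/css/site.css">
<script src="/assets/js/app.js"></script>
</head><body>
<img src="/images/logo.png" alt="Logo">
</body></html>
"""

collector = ResourceCollector()
collector.feed(PAGE)
for kind, url in collector.resources:
    print(kind, url)
```

A list like this is a starting point for checking that each file is reachable by crawlers; the rendered preview in Search Console remains the authoritative view of what Googlebot actually sees.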

Google updated its technical Webmaster Guidelines about a year ago to recommend against disallowing Googlebot from crawling your site’s CSS or JavaScript files. The company now says that since the assumptions from its 2009 proposal are no longer valid, it recommends following the principles of progressive enhancement, a web design strategy that emphasizes accessibility, semantic HTML markup, and external stylesheets and scripts.

“For example, you can use the History API pushState() to ensure accessibility for a wider range of browsers (and our systems),” says Google Search Quality Analyst Kazushi Nagayama.

Nagayama shares a few Qs and As related to all of this, which should help webmasters better understand the preferred approach:

Q: My site currently follows your recommendation and supports _escaped_fragment_. Would my site stop getting indexed now that you’ve deprecated your recommendation?
A: No, the site would still be indexed. In general, however, we recommend you implement industry best practices when you’re making the next update for your site. Instead of the _escaped_fragment_ URLs, we’ll generally crawl, render, and index the #! URLs.

Q: Is moving away from the AJAX crawling proposal to industry best practices considered a site move? Do I need to implement redirects?
A: If your current setup is working fine, you should not have to immediately change anything. If you’re building a new site or restructuring an already existing site, simply avoid introducing _escaped_fragment_ urls.

Q: I use a JavaScript framework and my webserver serves a pre-rendered page. Is that still ok?
A: In general, websites shouldn’t pre-render pages only for Google — we expect that you might pre-render pages for performance benefits for users and that you would follow progressive enhancement guidelines. If you pre-render pages, make sure that the content served to Googlebot matches the user’s experience, both how it looks and how it interacts. Serving Googlebot different content than a normal user would see is considered cloaking, and would be against our Webmaster Guidelines.
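For readers unfamiliar with the #! and _escaped_fragment_ URLs mentioned in the Q&A above, the sketch below shows the mapping as we understand it from the now-deprecated 2009 scheme: a crawler that saw a #! URL would request an equivalent URL carrying the fragment in an _escaped_fragment_ query parameter. The to_escaped_fragment function and the example.com URLs are hypothetical, and the character escaping is simplified relative to the original specification.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_escaped_fragment(url):
    """Map a #! URL to the URL a crawler would have requested under the old scheme."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        # Not a hashbang URL, so the scheme does not apply.
        return url
    # Simplified escaping (an assumption): characters such as "&", "#", "%"
    # and "+" are percent-encoded, while "=" and "/" stay readable.
    token = quote(parts.fragment[1:], safe="/=:,;@!$'()*")
    extra = "_escaped_fragment_=" + token
    query = parts.query + "&" + extra if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(to_escaped_fragment("https://example.com/#!/products/42"))
print(to_escaped_fragment("https://example.com/page?lang=en#!tab=reviews&sort=new"))
```

Sites that still answer these _escaped_fragment_ requests keep working, per the answers above, but new builds can skip the mapping entirely and rely on Google rendering the #! URL itself.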

If all of this isn’t enough to get you on the right track, Google suggests posting questions on Nagayama’s blog post or in the Google Webmaster Help forum.

Are you already doing things the right way, or do you need to make changes based on what Google had to say this week? Let us know in the comments.