Google has a post up on the Google News blog today explaining how it recrawls news content to keep it up to date and eliminate dead links.
"How do you balance looking for new content against the need to update older content? How can you make sure the content is fresh, doesn’t link to dead pages or display headlines that have been changed by the publisher?" asks Google.
Google’s answer is a recrawl feature that lets it focus on finding the newest content while still displaying the most current version of older content. After Google News discovers an article, it keeps crawling that article to look for changes. During the first day it recrawls more frequently because, as the company says, news stories usually change most soon after they’re published.
"In some cases, we’ll even revisit articles we had trouble crawling the first time around," says Google. "After that, we visit them less often. Either way, we try hard to present users with the freshest news. (We bet whoever wrote 'Dewey Defeats Truman' wishes they had recrawl!)"
Google says the feature is intended to reduce the number of outdated headlines and dead links. For publishers, it offers assurance that Google will index their latest stories and updates as soon as possible.