WebProNews

Tag: Indexing

  • Google Now Accepting Indexing Bug Reports

    Google Now Accepting Indexing Bug Reports

    Google is now accepting bug reports regarding indexing issues, making it easier for webmasters to address problems.

    Last year Google introduced its bug reporting feature as a way for webmasters to report security issues. Thanks to the success of the program, Google is expanding it to include indexing issues.

    Indexing issues can be reported in the URL Inspection Tool and the Index Coverage Report.

  • Google Has Fixed 99.9% of Canonical Indexing Issues

    Google Has Fixed 99.9% of Canonical Indexing Issues

    Google has announced it has fixed 99.9% of the canonical issues impacting how websites were indexed.

    Google confirmed three weeks ago that it was dealing with indexing issues that were affecting mobile pages and canonicalization. The issues could result in pages not being indexed, or the wrong URL being associated with a given page.

    According to Google, the issues are 99.9% fixed, with outlier cases expected to be fixed in the next week or two.

  • How To Use Google’s New Blocked Resources Report

    Google just introduced a new Webmaster Tools feature called the Blocked Resources Report, aimed at helping webmasters find and resolve issues where Google can’t use images, CSS, or JavaScript that have been blocked. Blocked resources can prevent pages from rendering properly, and Google wants to make sure you’re only blocking what you really want or need to block.

    The report provides the names of the hosts from which your site is using blocked resources. If you click on a row, it gives you the list of blocked resources and the pages that embed them. This should help you figure out the issues and take care of them so Google can better crawl and index your content.

    Some resources will be hosted on your own site, while others will be hosted on third-party sites. Clicking on a host will also give you a count of pages on your site affected by each blocked resource. Clicking on any blocked resource will give you a list of pages that load that resource. If you click on any page in the table that loads a blocked resource, you’ll get instructions for unblocking that particular resource.

    In a help center article, Google runs down five steps for evaluating and reducing your list of blocked resources:

    1. Open the Blocked Resources Report to find a list of hosts of blocked resources on your site. Start with the hosts that you own, since you can directly update the robots.txt files, if needed.

    2. Click a host on the report to see a list of blocked resources from that host. Go through the list and start with those that might affect the layout in a meaningful way. Less important resources, such as tracking pixels or counters, aren’t worth bothering with.

    3. For each resource that affects layout, click to see a list of your pages that use it. Click on any page in the list and follow the pop-up instructions for viewing the difference and updating the blocking robots.txt file. (A rough way to pre-check your robots.txt rules from code is sketched after this list.) Fetch and render after each change to verify that the resource is now appearing.

    4. Continue updating resources for a host until you’ve enabled Googlebot access to all the important blocked resources.

    5. Move on to hosts that you don’t own, and if the resources have a strong visual impact, either contact the webmaster of those sites to ask them to consider unblocking the resource to Googlebot, or consider removing your page’s dependency on that resource.
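
    Before digging through the report host by host, it can help to pre-check which resource URLs your own robots.txt actually blocks for Googlebot. The sketch below uses Python’s standard urllib.robotparser; the domain and resource URLs are placeholders, and Google’s own robots.txt handling can differ in edge cases, so treat the report and Fetch and Render as the source of truth.

        # Minimal sketch: check which resource URLs a robots.txt blocks for Googlebot.
        # The domain and resource URLs below are placeholders; Google's own robots.txt
        # evaluation can differ in edge cases, so this is only a rough pre-check.
        from urllib.robotparser import RobotFileParser

        ROBOTS_URL = "https://www.example.com/robots.txt"
        RESOURCE_URLS = [
            "https://www.example.com/assets/site.css",
            "https://www.example.com/assets/app.js",
            "https://www.example.com/images/hero.jpg",
        ]

        parser = RobotFileParser(ROBOTS_URL)
        parser.read()  # fetches and parses the robots.txt file

        for url in RESOURCE_URLS:
            allowed = parser.can_fetch("Googlebot", url)
            print("allowed" if allowed else "BLOCKED", url)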

    There’s also an update to Fetch and Render, which shows how the blocked resources matter. When you request a URL to be fetched and rendered, it shows screenshots rendered both as Googlebot and as a typical user, so you get a better grasp on the problems.

    “Webmaster Tools attempts to show you only the hosts that you might have influence over, so at the moment, we won’t show hosts that are used by many different sites (such as popular analytics services),” says Google webmaster trends analyst John Mueller. “Because it can be time-consuming (usually not for technical reasons!) to update all robots.txt files, we recommend starting with the resources that make the most important visual difference when blocked.”

    In January, Google called on webmasters to offer suggestions for new features for Webmaster Tools. It set up a Google Moderator page where people could leave and vote on suggestions. Among the most popular suggestions were:

    “I would like to see in WMT data from 12 months, not 3 as it is now :)”

    “An automated action viewer, so webmasters can see if they were impacted by an algorithm such as Panda or Penguin.”

    “Bounce back measuring tool. Did the user go back to Google for a similar search or did they find what they needed?”

    Google has since given webmasters a new structured data tool.

    Image via Google

  • Google Adjusts Index Status Data In Webmaster Tools

    Google announced an adjustment to the way sites’ index status data appears in Webmaster Tools. The index status feature now tracks a site’s indexed URLs for both HTTP and HTTPS as well as for verified subdirectories. In the past, it didn’t show data for HTTPS sites independently. Everything was included in the HTTP report.

    The move makes a great deal of sense as more and more sites move over to HTTPS (at least partially), and according to the company, people have been asking for this change.

    Google’s John Mueller said, “If you’re a data-driven SEO (or just love to see how your site’s indexed), you’ll love this change.”

    Now, each of these will show its own data in the Webmaster Tools Index Status report, as long as each one is verified separately:

    http://www.example.com/
    https://www.example.com/
    http://example.com
    https://example.com
    http://www.example.com/folder/
    https://www.example.com/folder/
    http://example.com/folder/
    https://example.com/folder/

    Google notes that if you have a site on HTTPS, or if some of your content is indexed under different subdomains, you’ll see a corresponding change in the reported data.

    “In order to see your data correctly, you will need to verify all existing variants of your site (www., non-www., HTTPS, subdirectories, subdomains) in Google Webmaster Tools. We recommend that your preferred domains and canonical URLs are configured accordingly,” says Google’s Zineb Ait Bahajji. “Note that if you wish to submit a Sitemap, you will need to do so for the preferred variant of your website, using the corresponding URLs. Robots.txt files are also read separately for each protocol and hostname.”
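
    Since robots.txt is read separately for each protocol and hostname, one quick sanity check is to fetch each variant’s robots.txt and see whether the files match. Here’s a minimal sketch in Python; example.com and the variant list are placeholders for your own site.

        # Minimal sketch: fetch robots.txt for each protocol/host variant and compare.
        # example.com is a placeholder; substitute your own site's variants.
        import hashlib
        import urllib.request

        VARIANTS = [
            "http://www.example.com",
            "https://www.example.com",
            "http://example.com",
            "https://example.com",
        ]

        for base in VARIANTS:
            url = base + "/robots.txt"
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    body = resp.read()
                    status = resp.status
                digest = hashlib.sha1(body).hexdigest()[:10]
                print(f"{url}: HTTP {status}, {len(body)} bytes, sha1 {digest}")
            except Exception as exc:  # DNS failures, 404s, TLS errors, etc.
                print(f"{url}: error ({exc})")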

    You can read up more on all of this here.

    Image via Google

  • Google Says It Will Follow Five Redirects At The Same Time When Crawling

    About a year ago, Google put out a Webmaster Help video discussing PageRank as it relates to 301 redirects. Specifically, someone asked, “Roughly what percentage of PageRank is lost through a 301 redirect?”

    Google’s Matt Cutts responded, noting that it can change over time, but that it had been “roughly the same” for quite a while.

    “The amount of PageRank that dissipates through a 301 is currently identical to the amount of PageRank that dissipates through a link,” he explained. “So they are utterly the same in terms of the amount of PageRank that dissipates going through a 301 versus through a link. So that doesn’t mean use a 301. It doesn’t mean use a link. It means use whatever is best for your purposes because you don’t get to hoard or conserve any more PageRank if you use a 301, and likewise it doesn’t hurt you if you use a 301.”

    In a new Webmaster Central office hours video (via Search Engine Roundtable), Google’s John Mueller dropped another helpful tidbit related to redirects: Googlebot will follow up to five of them at the same time when it crawls a page.

    “We generally prefer to have fewer redirects in a chain if possible. I think Googlebot follows up to five redirects at the same time when it’s trying to crawl a page, so up to five would do within the same cycle. If you have more than five in a chain, then we would have to kind of think about that the next time we crawled that page, and follow the rest of the redirects…We generally recommend trying to reduce it to one redirect wherever possible. Sometimes there are technical reasons why that’s not possible, so something with two redirects is fine.”

    As Barry Schwartz at SER notes, this may be the first time Google has given a specific number. In the comments of his post, Michael Martinez says it used to be 2.
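
    If you want to see how long your own redirect chains are, you can trace them hop by hop instead of letting your HTTP client follow them automatically. Below is a minimal sketch using the third-party requests library; the example URL is a placeholder, and the five-hop cap simply mirrors the figure Mueller mentions rather than any official limit.

        # Minimal sketch: trace a redirect chain hop by hop.
        # Requires the third-party "requests" library. The five-hop cap mirrors the
        # figure Mueller mentions; it is not an official Google limit or API.
        from urllib.parse import urljoin

        import requests

        MAX_HOPS = 5

        def trace_redirects(url, max_hops=MAX_HOPS):
            """Return the chain of (url, status_code) pairs without auto-following."""
            chain = []
            current = url
            for _ in range(max_hops):
                resp = requests.get(current, allow_redirects=False, timeout=10)
                chain.append((current, resp.status_code))
                location = resp.headers.get("Location")
                if resp.status_code in (301, 302, 303, 307, 308) and location:
                    current = urljoin(current, location)  # Location may be relative
                else:
                    return chain  # final destination reached
            chain.append((current, "left for a later crawl cycle"))
            return chain

        for hop in trace_redirects("http://example.com/old-page"):  # placeholder URL
            print(hop)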

    Image via YouTube

  • Google Re-Indexes Digg After Spam Removal Screwup

    Wednesday afternoon, Digg disappeared from Google. Vanished. Gone. It was clear that Google had de-indexed Digg, but why? To what end?

    Was it because Digg had just announced plans to build a Google Reader clone to satisfy angry users when Google kills the product on July 1st? Was Google just being a dick?

    No, conspiracy theories were put to rest when Google released a statement, saying that it was all just a big screwup.

    We’re sorry about the inconvenience this morning to people trying to search for Digg. In the process of removing a spammy link on Digg.com, we inadvertently applied the webspam action to the whole site. We’re correcting this, and the fix should be deployed shortly.

    And fixed it they have. Digg is back up in Google.

    It’s not like being de-indexed in Google was really a killer for Digg, considering the majority of its traffic is direct. Still, you’d at least want digg.com to show up in a search. Thankfully, Google has fixed the screwup and everyone can carry on.

  • Bing Might Be Dropping The Number Of URLs It Indexes

    As Google’s Penguin update has its fun, Bing has been relatively calm. There hasn’t been a major update in a few months, and everything seemed perfect. Well, it used to be.

    A member on WebmasterWorld noticed that Bing was indexing fewer pages on his site. This wasn’t a small change either. Bing used to index 500,000 URLs on the site, but was now only indexing 350,000. That’s a worrying change. The question now is whether Bing pushed some kind of update or if it’s just not indexing like it used to.

    Search Engine Roundtable jumped on the case. They began to look at their own indexed pages and found the numbers to be fairly consistent. There were no major drops. OK, so what about impressions and clicks? They found that those numbers had increased significantly. The crawl rate seemed to be normal as well.

    So what’s up with the poster who thinks Bing is not doing its fair share of indexing? We get a better idea by looking at the responses to his original post. Some webmasters shared the original poster’s pain, saying their indexed page counts have seen similar drops.

    A later post sheds more light on the situation, though. The poster says that the people at Bing told them that all of their URLs are being indexed; it’s just that those numbers aren’t showing up correctly. Even so, that’s a pretty serious problem if Bing Webmaster Tools shows sites being delisted or URLs no longer being indexed.

    We’ve reached out to Microsoft for comment and we’ll update this story if we hear back.

    Have you seen any changes in the number of pages being indexed by Bing? I find it highly unlikely that Bing has rolled out an update, but the impending release of the new Bing may be having an effect on indexing for certain sites.

  • Experimental Search Engine Removes Top Million Sites From Your Results

    Do you ever feel the search results that Google yields are too mainstream? Are you looking to explore the cavernous, cobweb-laden outer reaches of the interwebs? If you want to spend some time on some deep discovery, Million Short might be your ticket.

    Million Short’s name says it all. It’s a search engine that brings back results that are a million sites short of what you’d find in Google. You can choose to remove the top million, hundred thousand, ten thousand, and on down to just one hundred sites from your results.

    Million Short is an experimental web search engine (really, more of a discovery engine) that allows you to REMOVE the top million (or top 100k, 10k, 1k, 100) sites from the results set. We thought it might be somewhat interesting to see what we’d find if we just removed an entire slice of the web.

    The thinking was the same popular sites (we’re not saying popular equals irrelevant) show up again and again. Million Short makes it easy to discover sites that just don’t make it to the top of the search engine results for whatever reason (poor SEO, new site, small marketing budget, competitive keyword(s) etc.). Most people don’t look beyond page 1 when doing a search and now they don’t have to.

    For instance, let’s say that I used Million Short to search “Hipster.” Gone are results from Wikipedia, Urban Dictionary, WikiHow, KnowYourMeme, and even latfh.com (Look at that F*cking Hipster, a popular blog). What it has returned are various sites that I didn’t see even on the 5th page of Google search results (and I didn’t dare go past that). The lone exception was HipsterHandbook, which appeared on the 1st page of both engines.

    In theory, Million Short is helping you discover stuff that you would never ever see using Google or even Bing or Yahoo!. It’s stuff that would be buried under hundreds of pages of search results. Let’s look at another example, a search for “The Beatles.”

    Million Short failed to remove the top search result from a Google search of “The Beatles,” which was thebeatles.com. But everything that follows comes from deeper sites. Million Short removed (once again) Wikipedia, last.fm, mtv.com, apple.com, amazon.com and a multitude of lyrics and guitar tab sites from my results.

    One result I stumbled upon was from a site called suckmybeatles.com, and it’s basically a guy who really thinks The Beatles blow who posts blog entries and funny pictures detailing this (unpopular) opinion. That was well worth my time, so I guess score one for Million Short.

    Million Short was brought to my attention via reddit, so let’s take a look at some of the reviews from the community (which are mixed).

    Oddgenetix writes:

    I just had a very rewarding experience with this thing. I searched my own name, and through pure serendipity the first result was an artist, with the same name as I. The art he paints is 50’s-60’s pin-up (the old-style classy kind, not the desperate new variety that melded with rockabilly, retro, and reality-tv-tattoo-culture.) Also really sweet looking vintage car ads for cars he imagined, and propaganda-type posters. Shit is so awesome. I threw money at him and got a few paintings, which I will be hanging in my living room, because consequently the paintings are signed with my name and I’m a pretty good liar.
    TL;DR I searched my own name and found a same-name artist, so I bought his work and now I’m “a painter.”

    Bullshit? Maybe. Entirely plausible for this site? Definitely.

    Gsan writes:

    This is a nice technique. It’s like searching a whole other internet.
    Edit: this is real nice. Look at the sidebar of the sites it blocked, and tell me how many of those you think had what you were looking for? For me the side sites are mostly online stores, and cheap sites like ehow.com and about.com. Good riddance. Google and Bing seem to think I want to buy everything I’m searching for and they really want me to buy it at Amazon.

    DrizztDoUrden writes:

    This is actually pretty sweet. It reminds me of the gopher days when it was nearly impossible to get exactly what you wanted, but you would learn so much more from the journey.

    But FLEABttn writes:

    It’s like a search engine designed to find things I’m not looking for!

    And nkozyra writes:

    These results were shockingly terrible.

    Look, Million Short is obviously no Google killer. It’s not even a Yahoo killer. It’s an alternative search engine for people wanting a unique search experience. If you’re looking for popular, relevant information and you want it fast, it’s probably not the way to go. If you’re looking to find some random corners of the internet, it might tickle your fancy.

    Just be prepared to find stuff like this as your top result (h/t reddit). ಠ_ಠ

  • How Many Spiders Does Google Have?

    How Many Spiders Does Google Have?

    Google has posted a short but interesting video to its Webmaster Central YouTube channel. A user asked the question, "How many bots/spiders does Google currently have crawling the web?" and Google’s Matt Cutts gave his answer.

    "It’s important to realize that it’s not really actual robots or actual spiders out there…instead, it’s banks of machines …at Google’s data centers who open up an HTTP connection and request a page and then get it back," he says. "So any bank of machines (even 50 machines) could easily be requesting a bunch of different content."

    “We try to refresh a large fraction of the web every few days,” he adds. “So it turns out you really don’t need a ton of machines. Even a relatively small amount of machines operating in parallel and fetching pages in parallel can really be able to crawl and find new pages on the web in a very quick way.”

    Matt says that Google doesn’t give out the exact number, but that it’s somewhere between 25 and 1,000. I’m not sure what you can really do with that information, but it’s worth hearing a quick rundown of how it works for those who aren’t really familiar with how Google indexes content.
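
    To picture what “fetching pages in parallel” means in the simplest possible terms, here’s a toy sketch in Python: one process opening several HTTP connections at once. It only illustrates the general idea of parallel fetching; it’s nothing like Google’s actual crawling infrastructure, and the URLs are placeholders.

        # Toy sketch: fetch a handful of pages in parallel with a thread pool.
        # This only illustrates parallel fetching; it is nothing like Google's
        # actual crawling infrastructure. The URLs are placeholders.
        from concurrent.futures import ThreadPoolExecutor
        import urllib.request

        URLS = [
            "https://www.example.com/",
            "https://www.example.org/",
            "https://www.example.net/",
        ]

        def fetch(url):
            req = urllib.request.Request(url, headers={"User-Agent": "toy-crawler"})
            with urllib.request.urlopen(req, timeout=10) as resp:
                return url, resp.status, len(resp.read())

        with ThreadPoolExecutor(max_workers=10) as pool:
            for url, status, size in pool.map(fetch, URLS):
                print(f"{status} {size:>8} bytes  {url}")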
