Google Has Canceled Support for Robots.txt Noindex
Google has announced that Googlebot will no longer obey a robots.txt directive related to indexing. Publishers relying on the robots.txt noindex directive have until September 1, 2019 to remove it and begin using an alternative suggested by Google.
For anyone new to SEO, these might sound like a bunch of foreign terms, but they are important to understand. Googlebot, Google’s crawler, is essentially a bot that crawls through pages and adds them to the index, which is Google’s database of known pages. Based on this index, Google ranks different websites.
A robots.txt directive is essentially a command given by a website that tells Googlebot which pages on your site to crawl and which pages to avoid. It is normally used to optimize a website’s crawlability, so that a crawl bot does not run into problems while going through the site.
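For illustration, a robots.txt file (the paths here are hypothetical) might combine a standard, documented rule with the nonstandard noindex rule that Google is retiring:

```text
# Standard, documented rule: tells crawlers not to fetch these pages
User-agent: *
Disallow: /admin/

# Nonstandard rule that Googlebot stops honoring on September 1, 2019
Noindex: /old-landing-page.html
```

The Disallow line controls crawling; the Noindex line was an unofficial way to control indexing, and it is this second kind of rule that Google is dropping.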
Why Google Is Cancelling It
Google has never considered noindex in robots.txt an official directive. Googlebot used to honor it in practice; however, it did not work in 8% of cases, so it was never fool-proof. Google has now officially withdrawn support for the noindex, crawl-delay, and nofollow directives within robots.txt files, and has told websites that use them to remove these directives by September 1, 2019. Google also notes that it has been asking websites not to rely on them for many years.
In essence, while open-sourcing its robots.txt parser library, the team at Google analyzed robots.txt rules and their usage. In particular, they focused on rules unsupported by the Internet draft, such as nofollow, crawl-delay, and noindex. Since these rules were never documented by Google, their usage in relation to Googlebot is naturally quite low. Digging further, the team noticed that their usage was contradicted by other rules in all but 0.001% of robots.txt files on the internet. These mistakes and errors hurt websites’ presence in SERPs in ways webmasters presumably did not intend.
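The way standards-based parsers treat these undocumented rules can be seen with Python’s built-in robots.txt parser: it honors the documented Disallow rule but silently ignores a nonstandard Noindex line (the domain and paths below are hypothetical):

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
Noindex: /old-page.html
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The documented Disallow rule is honored:
print(rp.can_fetch("*", "https://example.com/private/page.html"))  # False

# The nonstandard Noindex line is ignored entirely; as far as the
# parser is concerned, this URL is freely fetchable:
print(rp.can_fetch("*", "https://example.com/old-page.html"))  # True
```

This is the behavior Google is standardizing on: rules outside the Internet draft simply carry no meaning to a compliant parser.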
Google’s official announcement on the morning of July 2nd said that it was bidding adieu to unsupported and undocumented rules in robots.txt. Those relying on these rules should learn about the available options posted by Google in its blog post.
In the official blog post, right before suggesting alternatives, this is what Google wrote:
“To maintain a working and healthy ecosystem and preparing for potential open-source releases in the future, we’re retiring the codes that handle unpublished and unsupported rules (such as noindex) on September 1, 2019.”
Gary Illyes has stated that sites with these noindex directives actually hurt themselves more than they help themselves. Google has made it clear that it thought this change through, especially since it has been skeptical of these directives for many years, and it does not expect dropping support for the noindex robots.txt directive to profoundly hurt anyone’s site.
Alternatives Suggested by Google
Google did not want sites and companies to be rendered helpless by this change, so it gave a comprehensive list of things one could do instead. If you happen to be affected by this change, this is what Google posted with regard to alternatives:
- Noindex in robots meta tags: supported both in HTTP response headers and in HTML, the noindex directive is the most effective way to remove URLs from the index where crawling is allowed.
- Using 404 and 410 HTTP status codes, which signal that the page does not exist. These help crawl bots drop such URLs from Google’s index once they’re crawled and processed.
- Using password protection to hide a page behind a login will remove it from Google’s index. The exception is when markup is used to indicate subscription or paywalled content.
- Search engines can only index pages they know about, so one can block a page from being crawled so that its content won’t be indexed. A search engine may still index the URL based on links from other pages, without seeing the content itself; however, Google is taking measures to make these pages less visible.
- The Search Console Remove URL tool is a quick and easy method to remove a URL temporarily from Google’s search results.
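As a rough sketch of how a site might adopt two of these alternatives, the hypothetical handler below returns 410 Gone for removed pages and an `X-Robots-Tag: noindex` response header for pages that should stay reachable but unindexed (the same noindex signal can also go in the HTML as `<meta name="robots" content="noindex">`); the paths are made up for illustration:

```python
# Hypothetical sketch: mapping request paths to responses that use
# Google's suggested alternatives to robots.txt noindex.

REMOVED_PATHS = {"/old-promo"}        # gone for good: 410 drops the URL from the index
NOINDEX_PATHS = {"/internal-search"}  # still served, but should not be indexed

def respond(path):
    """Return (status_code, headers) for a request path."""
    if path in REMOVED_PATHS:
        # 410 Gone (404 also works): Google drops the URL once it is
        # recrawled and processed
        return 410, {}
    headers = {}
    if path in NOINDEX_PATHS:
        # noindex via HTTP header, equivalent to the robots meta tag
        headers["X-Robots-Tag"] = "noindex"
    return 200, headers

print(respond("/old-promo"))        # (410, {})
print(respond("/internal-search"))  # (200, {'X-Robots-Tag': 'noindex'})
print(respond("/blog/post"))        # (200, {})
```

Unlike robots.txt noindex, both of these signals are documented and officially supported, so they will keep working after the September 1 cutoff.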
Essentially, Google is trying to protect websites while also refining the algorithm that determines which sites rise to the top. Google is perpetually changing its rules, its algorithms, and its crawl bots, so this move was not entirely a surprise. It was, however, a relatively drastic change, but Google has already established safety nets so that no company is further adversely affected, and it has given websites a good two months to adjust to the new directives.