On September 1, 2019, the search engine juggernaut Google will no longer support the NOINDEX directive listed in the robots.txt file. Using NOINDEX this way is common practice, but it is an "unpublished rule" that professionals in the industry have relied on for a long time.
The company posted: "In the interest of maintaining a healthy ecosystem and preparing for potential future open source releases, we're retiring all code that handles unsupported and unpublished rules (such as noindex) on September 1, 2019. For those of you who relied on the noindex indexing directive in the robots.txt file, which controls crawling, there are a number of alternative options."
The move to 'outlaw' this directive is the result of Google's latest update: open-sourcing its production robots.txt parser. Upon examining the parser library, the company also found some unsupported robots.txt rules. They reported, "Since these rules were never documented by Google, naturally, their usage in relation to Googlebot is very low. Digging further, we saw their usage was contradicted by other rules in all but 0.001% of all robots.txt files on the internet. These mistakes hurt websites' presence in Google's search results in ways we don't think webmasters intended."
Google listed the following alternatives to the robots.txt noindex directive:
(1) Noindex in robots meta tags: Supported both in the HTTP response headers and in HTML, the noindex directive is the most effective way to remove URLs from the index when crawling is allowed (see the first sketch after this list).
(2) 404 and 410 HTTP status codes: Both status codes mean that the page does not exist, which will drop such URLs from Google's index once they're crawled and processed (see the second sketch after this list).
(3) Password protection: Unless markup is used to indicate subscription or paywalled content, hiding a page behind a login will generally remove it from Google’s index.
(4) Disallow in robots.txt: Search engines can only index pages that they know about, so blocking a page from being crawled often means its content won't be indexed. While a search engine may also index a URL based on links from other pages without seeing the content itself, Google aims to make such pages less visible in the future (see the third sketch after this list).
(5) Search Console Remove URL tool: The tool is a quick and easy method to remove a URL temporarily from Google’s search results.
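To make the first alternative concrete, here is a minimal sketch of both forms of the noindex directive. The meta tag and the X-Robots-Tag header are Google's documented robots directives; the Apache mod_headers block and the file name are only illustrative assumptions about how a site might set the header.

    <!-- In the page's HTML <head>: -->
    <meta name="robots" content="noindex">

    # Or as an HTTP response header, e.g. via Apache mod_headers
    # (the file name below is illustrative):
    <Files "old-report.pdf">
        Header set X-Robots-Tag "noindex"
    </Files>

Either form works only if the page remains crawlable; if crawling is blocked, Googlebot never sees the directive.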
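For the second alternative, here is a minimal sketch of returning 410 Gone for a retired URL, assuming a small Flask application; the route path and function name are hypothetical.

    # Minimal sketch, assuming Flask; the route path is hypothetical.
    from flask import Flask, abort

    app = Flask(__name__)

    @app.route("/retired-page")
    def retired_page():
        # 410 Gone tells crawlers the page is permanently removed; once
        # Googlebot crawls and processes the response, the URL is dropped
        # from the index.
        abort(410)

A 404 works as well; 410 is simply a more explicit signal that the removal is intentional and permanent.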
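And for the fourth alternative, a sketch of what a cleaned-up robots.txt might look like; the directory path is illustrative, and any leftover unsupported Noindex: lines should simply be deleted, since Googlebot will ignore them after September 1, 2019.

    # robots.txt (the path below is illustrative)
    User-agent: *
    Disallow: /private/
    # The unsupported "Noindex: /private/" line has been removed. Disallow
    # keeps the content from being crawled, though the URL itself may still
    # be indexed if it is linked from other pages.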
Why should we care about this? This announcement cleans up a piece of hearsay that had become standard practice. While seemingly insignificant at the scale of a single website, this update, coupled with the open-sourcing of the Google robots.txt parser, will likely affect the whole ecosystem of the internet and the search industry in the years to come. If you are using this directive on your sites, take it out and replace it with the alternatives mentioned above. If you want to learn more about this, visit this blog.
Other related resources can be found here: https://support.google.com/webmasters/answer/6332384?ref_topic=1724262