Google Is Ready To Set Robots.txt As The Official Standard

Updated on: 24 July 2019

It’s official – the 25-year-old Robots Exclusion Protocol (REP), otherwise known as robots.txt, is on its way to becoming an internet standard.

Although the REP has been around for many years, it was never formalised as an official standard, meaning its rules are open to interpretation. This poses a clear challenge – how does robots.txt affect SEO rankings, and how should it be used on the modern web?

To meet this challenge, Google has finally decided to document and formalise how the REP should be used, which could make robots.txt an essential measure to implement in the near future. Here are some further insights into the brand-new protocol.

What is Robots.txt?

For SEO beginners, robots.txt is simply a text file placed at the root of your website, containing special instructions that guide search engine robots on how to crawl the pages on your site.

While using robots.txt is not a necessity, it certainly helps in several ways.

The two main duties of a robots.txt file are keeping crawlers away from private content and preventing search engine robots from crawling certain pages. For example, you may still be working on a particular page and wish to stop Google’s crawlers from indexing it until it is completely finished.
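As a sketch, a minimal robots.txt along these lines might look like the following (the file lives at the root of the site, e.g. https://example.com/robots.txt, and the paths here are hypothetical):

```
# Applies to all crawlers
User-agent: *
# Keep an unfinished section out of search results
Disallow: /drafts/
# Everything else may be crawled
Allow: /
```

Each `User-agent` group states which crawlers the rules below it apply to, and each `Disallow` or `Allow` line matches URL paths by prefix.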

Aside from instructing robot crawlers on what to do, robots.txt files are excellent for checking which content or sections of your website are blocked. The advantages of robots.txt files are plenty – but keep in mind that they can also hurt your SEO rankings, especially if you block a particular page unintentionally.

With this new update in mind, as well as some proper SEO training, it’s going to be a lot easier to understand and handle robots.txt moving forward.

So, what are the updated rules?

Now that you have the gist of what robots.txt files can do, the next step is understanding Google’s intention behind this change.

The key point is to help website owners better manage their content and resources. While the robots.txt file already comes with plenty of features, Google will now open-source the parser it uses to decode robots.txt files, creating a standardised syntax for establishing and parsing rules – which should eliminate a lot of confusion for users.

While Google’s proposed draft does not change the rules established back in 1994, it adjusts and adapts them to better fit the modern web.

Below, you will find some of the key updated rules:

  • Robots.txt is no longer limited to HTTP; it can be used with any URI-based transfer protocol, such as FTP or CoAP.
  • Crawlers must parse at least the first 500 kibibytes of a robots.txt file.
  • Owners are given the flexibility to update their robots.txt files whenever they like, thanks to the new maximum caching time of 24 hours.
  • If a robots.txt file becomes inaccessible as a result of server failures, known disallowed pages will not be crawled for a certain period of time.
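To see how such rules are interpreted in practice, here is a short sketch using Python’s standard-library parser – not Google’s open-sourced C++ one – with a hypothetical rule set and URLs:

```python
from urllib import robotparser

# A hypothetical robots.txt, supplied as lines of text rather than
# fetched over HTTP, so the example runs offline.
rules = """User-agent: *
Disallow: /drafts/
Allow: /""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A page under the disallowed /drafts/ prefix is blocked for all crawlers.
print(rp.can_fetch("*", "https://example.com/drafts/new-page"))  # False
# Any other page is allowed.
print(rp.can_fetch("*", "https://example.com/blog/post"))        # True
```

The same prefix-matching logic is what a standardised parser pins down, so every crawler reads the rules the same way.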

Why this update matters

You probably already know the importance of SEO after attending our Digital Marketing Course. It’s always wise to follow in Google’s footsteps when trying to boost your rankings – and this time is no different.

Together with your trusted Digital Marketing Agency, it’s about time you set aside some time to digest this new update once it is fully launched. And if you have yet to add a robots.txt file to your website, take the chance to implement one as soon as possible.