
Correct User-agent placement in robots.txt

What does this MR do?

Fixes #26807.

Are there points in the code the reviewer needs to double check?

I've added a User-Agent field to the top of each record in the file, per the specification:

The Format

The format and semantics of the "/robots.txt" file are as follows:
The file consists of one or more records separated by one or more blank lines (terminated by CR,CR/NL, or NL). Each record contains lines of the form "<field>:<optionalspace><value><optionalspace>". The field name is case insensitive.
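To make the record structure concrete, here is a minimal Python sketch of the quoted format. It is not GitLab's or any crawler's actual parser. It splits the text into records on blank lines, accepts CR, CR/NL, or NL line endings, and lowercases field names because they are case-insensitive. The name `parse_records` is hypothetical. The comment handling (`#` starts a comment, and comment-only lines do not end a record) is an assumption taken from the same robotstxt.org document, not from the excerpt above.

```python
import re
from typing import List, Tuple

# One record = ordered list of (lowercased field name, value) pairs.
Record = List[Tuple[str, str]]


def parse_records(text: str) -> List[Record]:
    """Hypothetical helper: split robots.txt text into records.

    Records are separated by one or more blank lines; each non-blank line
    has the form <field>:<optionalspace><value><optionalspace>.
    """
    records: List[Record] = []
    current: Record = []
    # Lines may be terminated by CR, CR/NL, or NL.
    for raw in re.split(r"\r\n|\r|\n", text):
        if raw.strip() == "":
            # A blank line closes the current record, if one is open.
            if current:
                records.append(current)
                current = []
            continue
        line = raw.split("#", 1)[0]  # assumption: '#' starts a comment
        if line.strip() == "":
            # Comment-only line: discarded, and not a record boundary.
            continue
        if ":" not in line:
            continue  # not a <field>:<value> line; ignore it
        field, value = line.split(":", 1)
        # Field names are case-insensitive; surrounding spaces are optional.
        current.append((field.strip().lower(), value.strip()))
    if current:
        records.append(current)
    return records


sample = "User-agent: *\n\nDisallow: /search\n"
for record in parse_records(sample):
    print(record[0][0] == "user-agent", record)
```

On that sample the sketch yields two records, and the second one has no user-agent field. That is the shape of record this MR fixes.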

Why was this MR needed?

Robots can currently crawl GitLab freely, even when they honor robots.txt.

It is currently parsed as:

User-agent: *

and nothing else takes effect. That record contains no rules for the user agent, and the records that follow have no User-agent line of their own, so they apply to no robot.
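The effect can be illustrated with a small Python sketch. `disallowed_for_all` and `is_allowed` are hypothetical helpers. The paths `/search` and `/api` are placeholders, not GitLab's actual rules. A record's Disallow lines count only if that same record has a `User-agent: *` line. Rules from several `*` records are merged, which is an assumption, because parsers differ in how they treat repeated `*` records.

```python
import re
from typing import List

# Hypothetical layout mirroring the bug: the only User-agent line sits in a
# record of its own, and the Disallow rules live in later records.
BROKEN = """\
User-agent: *

Disallow: /search

Disallow: /api
"""

# Hypothetical corrected layout: every record starts with its own User-agent.
FIXED = """\
User-agent: *
Disallow: /search

User-agent: *
Disallow: /api
"""


def disallowed_for_all(robots_txt: str) -> List[str]:
    """Collect Disallow values from records that name the '*' user agent."""
    rules: List[str] = []
    agents: List[str] = []
    pending: List[str] = []

    def flush() -> None:
        # End of a record: keep its rules only if it names '*'.
        if "*" in agents:
            rules.extend(pending)
        agents.clear()
        pending.clear()

    for raw in re.split(r"\r\n|\r|\n", robots_txt):
        if raw.strip() == "":
            flush()  # a blank line ends the current record
            continue
        line = raw.split("#", 1)[0].strip()  # drop any trailing comment
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == "user-agent":
            agents.append(value)
        elif field.lower() == "disallow" and value:
            pending.append(value)  # an empty Disallow disallows nothing
    flush()
    return rules


def is_allowed(robots_txt: str, path: str) -> bool:
    """A path is allowed unless it starts with a collected Disallow prefix."""
    return not any(path.startswith(p) for p in disallowed_for_all(robots_txt))


print(is_allowed(BROKEN, "/search"))  # True: the rules are orphaned
print(is_allowed(FIXED, "/search"))   # False: the rule now applies
```

In the broken layout the Disallow lines sit in records with no User-agent line, so they are dropped and every path stays crawlable.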

Screenshots (if relevant)

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?
