
The robots.txt file has long been the core mechanism for expressing crawl preferences and constraints on the web. Yet the format is intentionally simple and underspecified, while real-world websites carry complex, context-dependent expectations around crawling, scraping, and automated interaction. In parallel, large language models (LLMs) and agentic AI workflows are transforming how scraping systems reason about and adapt to such expectations.
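
As a concrete illustration of this simplicity, consider the sketch below, which uses Python's standard urllib.robotparser together with a hypothetical robots.txt. The format reduces every crawling decision to a per-agent, per-path allow/disallow answer (plus an optional crawl delay), with no way to express purpose- or context-dependent expectations such as "indexing is fine, but training is not."

```python
from urllib.robotparser import RobotFileParser

# A minimal, hypothetical robots.txt: per-agent allow/disallow rules and a
# crawl delay are essentially all the format can express. "ExampleBot" is an
# illustrative agent name, not a real crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10

User-agent: ExampleBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The only question the format answers is a binary "may this agent fetch this URL?"
print(parser.can_fetch("ExampleBot", "https://example.com/articles/1"))  # False
print(parser.can_fetch("OtherBot", "https://example.com/articles/1"))    # True
print(parser.crawl_delay("OtherBot"))                                    # 10
```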