Stephen Burns runs a motorcycle repair shop out of his garage in Redwood City. He’s meticulous about local SEO and has been for years. But recently, customers started showing up who hadn’t found him through Google. They’d asked ChatGPT where to get their motorcycle fixed, and it sent them to his garage.
That story captures something important happening across the web right now. Discovery is being restructured. The customer journey increasingly runs through AI systems, and those systems have their own requirements for who they can see and recommend.
Burns got lucky: his content made it into the training data, and the model knew he existed. But many businesses aren’t so fortunate. And the unlucky ones often don’t even know they’re invisible.
The Shift (Almost) Nobody Prepared For
For twenty years, the web security playbook has been straightforward: humans good, bots bad. Build walls. Check CAPTCHAs. Rate-limit aggressively. Block anything that doesn’t look like a person clicking around.
That made sense when “bot” meant scrapers, spammers, and credential stuffers. But the category has fractured. Today, automated traffic includes:
Training crawlers harvesting content for AI model development (Common Crawl, GPTBot, ClaudeBot). These are extractive and periodic, with no user behind them; the goal is dataset assembly.
Retrieval bots fetching real-time information to augment AI responses (Perplexity, ChatGPT with browsing). These surface your content in AI-synthesized answers.
Transaction agents acting on direct behalf of users to accomplish specific goals: book a flight, compare insurance quotes, place an order, schedule an appointment.
That third category is the one that should keep business leaders up at night, not because it’s dangerous, but because it’s valuable, and we’re systematically blocking it.
When a user tells their AI assistant “find me a hotel in Lisbon under €200 with good reviews and book it,” that agent is a customer. It has intent, a task, and (via the user) a credit card. If your site can’t accommodate it - or worse, actively blocks it - you’ve lost a sale to a competitor whose infrastructure was ready.
Consider Children’s Hospital of Los Angeles, one of the top pediatric cancer centers in the United States. It’s effectively invisible to AI assistants. When parents ask Gemini or ChatGPT where to take a child with leukemia in LA, CHLA doesn’t appear, not because the hospital opted out, but because its CDN’s default settings block AI crawlers. Families may be unable to find potentially life-saving care because of a configuration choice the hospital may not even know was made.
That’s the current state: valuable, legitimate discovery and transaction pathways being severed by infrastructure designed for a different threat model.
Three Properties Your Web Presence Now Needs
I’ve been working on identity and authorization infrastructure for AI agents with colleagues across the industry, including co-authoring a recent whitepaper on the topic. We keep returning to the same framework. For your web presence to function in an agent-mediated world, it needs three properties:
Accessible: The agent can actually reach your content and services. Not blocked by CDN defaults, overzealous bot detection, or blanket crawler bans.
Legible: The agent can understand what it finds. Structured data, semantic markup, machine-readable formats. Not just pretty HTML that requires a human eye to interpret.
Actionable: The agent can do something. Complete a transaction, submit an inquiry, access a service. Not just read, but also act.
If any layer is missing, whether accessibility, legibility, or actionability, your web presence is invisible or inert to the fastest-growing discovery and transaction channel emerging today. And even if your site is live, you may still be invisible if it isn’t properly indexed for agent retrieval or is omitted from the training corpus.
Most organizations have focused their AI strategy on the first category, namely training data accessibility, being “in the model.” That matters. But it’s table stakes. The real opportunity (and the real risk of missing out) is in the third category: enabling legitimate agents to transact on behalf of real users.
The Verification Problem (And Why It’s Being Solved)
The obvious objection: “How do I tell a legitimate agent from a malicious bot? They look the same at the firewall.”
Fair point. Today, they often do look the same. User-agent strings are trivially spoofable. Traffic patterns can be mimicked. This is a real problem.
But it’s being actively solved. The IETF is developing Web Bot Auth, a protocol that allows agents to cryptographically prove their identity within HTTP requests, essentially a passport for responsible agents. Major players like Cloudflare and Vercel are involved in the effort. AWS Bedrock AgentCore already supports Web Bot Auth to reduce CAPTCHAs when its agents browse protected sites. This isn’t speculative; it’s shipping.
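To make that concrete, here is a rough sketch of what a Web Bot Auth request can look like on the wire. It builds on HTTP Message Signatures (RFC 9421): the agent adds a Signature-Agent header pointing to where its public keys are published, plus signature headers the site can verify. The URL, key ID, timestamps, and signature value below are placeholders, and the exact parameters are still settling as the draft evolves:

GET /products/espresso-beans HTTP/1.1
Host: yourcompany.com
Signature-Agent: "https://agents.example-operator.com"
Signature-Input: sig1=("@authority" "signature-agent");created=1735689600;expires=1735690200;keyid="ba3e64==";tag="web-bot-auth"
Signature: sig1=:SGVyZSBiZSBhIHNpZ25hdHVyZQ==:

A site (or its CDN) that recognizes the key can wave the request through without a CAPTCHA; one that doesn’t can fall back to its normal bot handling.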
On the authorization side, OAuth 2.1 extensions are being developed to support explicit delegated authority, a formal “on-behalf-of” flow where the agent’s access token contains two distinct identifiers: the user who granted permission and the agent performing the action. This is critically different from impersonation. It creates a clear, auditable link: you can see both who authorized the action and what performed it.
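As a rough illustration, the decoded claims of such a delegated access token might look like the following, borrowing the "act" (actor) claim from OAuth 2.0 Token Exchange (RFC 8693); the issuer, identifiers, and scopes are placeholders, not a finalized profile:

{
  "iss": "https://auth.yourcompany.com",
  "sub": "user-8842",
  "act": { "sub": "agent:travel-assistant" },
  "scope": "bookings:read bookings:create",
  "aud": "https://api.yourcompany.com",
  "exp": 1735690200
}

The sub claim identifies the user who granted permission; the nested act claim identifies the agent doing the work. That pairing is what makes the delegation auditable rather than an impersonation.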
The infrastructure is coming. The question is whether you’ll be ready for it, or scrambling to catch up while your competitors capture the agent-mediated market.
What “Agent Optimization” Actually Means
We’ve spent two decades optimizing for search engines. Keywords, backlinks, page speed, mobile responsiveness, the whole SEO apparatus. Now a new optimization target is emerging: AI agents.
Agent Optimization means:
Structured data that agents can parse: Schema.org markup, JSON-LD, clear semantic HTML. If an agent can’t extract your pricing, availability, or booking endpoint programmatically, you don’t exist to it.
APIs and action endpoints: Not just content to read, but services to invoke. Can an agent place an order? Submit an inquiry? Check inventory? If the only path is clicking through a JavaScript-heavy checkout flow, you’re invisible to agent-mediated commerce.
Authentication infrastructure that distinguishes agent types: Allow legitimate agents through while maintaining security. This requires moving beyond binary “human or bot” detection to nuanced policies based on verified identity and delegated scope.
Consent and governance frameworks: When an agent accesses your systems on behalf of a user, what are the terms? What data can it retrieve? What actions can it perform? Clear policies, machine-readable where possible.
The organizations that build this infrastructure now will have a significant advantage as agent-mediated interaction becomes mainstream. Those that don’t will find themselves optimized out of an increasingly important channel.
The Stakes Are Higher Than You Think
Scenario 1: E-commerce. A user asks their AI assistant to “order more of that coffee I liked from last month.” The agent needs to access the user’s order history (with permission), find the product, check availability, and complete a purchase. If your site can’t support this flow, the agent will find a competitor who sells similar coffee and can support it. You didn’t lose a customer to a better product. You lost them to better infrastructure.
Scenario 2: Professional services. A business user tells their agent to “schedule a consultation with a commercial real estate attorney in Denver for next week.” The agent needs to find appropriate providers, check availability, and book an appointment. If your law firm’s website is a brochure with a “Contact Us” form and no structured data, the agent can’t engage. You don’t get the lead.
Scenario 3: B2B procurement. A procurement agent is tasked with “find three suppliers for industrial adhesives that meet our specs and request quotes.” The agent needs to query product databases, compare specifications, and initiate RFQ processes. If your supplier portal requires human navigation through nested menus, you’re not in the consideration set.
In each case, the failure isn’t about the quality of your product or service. It’s about the accessibility, legibility, and actionability of your web presence to AI agents acting as legitimate proxies for potential customers.
What You Should Do Now
1. Audit your current accessibility. Are AI crawlers being blocked by your CDN? Check your Cloudflare settings, your robots.txt, your rate-limiting rules. Tools like CanAISeeIt can analyze which known AI bots can access your site and how you’re showing up in AI-generated citations.
2. Assess your legibility. Can a machine parse your key information? Do you have structured data for products, services, pricing, availability, locations? Run your pages through schema validators. If an agent can’t extract the basics, you have work to do.
3. Evaluate your actionability. What can an agent actually do on your site? If the answer is “read content,” you’re only halfway there. Consider APIs, booking integrations, programmatic inquiry endpoints. What would it take for an agent to complete a transaction?
4. Develop agent access policies. Not all automated access is equal. Define what types of agents you want to support, under what conditions, with what verification. This is a policy decision, not just a technical one.
5. Watch the standards landscape. Web Bot Auth, OAuth for AI agents, MCP (Model Context Protocol), and A2A (the Agent-to-Agent protocol, along with the related Agent Payment Protocol) are all developing rapidly. You don’t need to implement everything today, but you should understand what’s coming. To get started, check out this webinar I hosted last week discussing the emerging AI agent standards race, with senior representatives from Visa, Stripe, Skyfire, and Consumer Reports.
6. Reframe the conversation internally. If your security team’s mandate is “block bots,” you have a framing problem. The mandate should be “enable legitimate access while blocking malicious actors.” Those are different objectives with different implementations.
7. Think in two layers: live retrieval and foundational memory. Your site must be both live-index-ready and training-corpus-visible.
To be open for business to AI agents, your current site needs to be discoverable and indexable now by whatever live web feeds support retrieval-augmented generation (RAG) and AI-agent search. That means ensuring your content is live, indexed, updated, structured, and accessible.
But there’s a second, equally strategic layer: ensuring your content is included in the training data of large language models. Being in the training corpus doesn’t guarantee retrieval, but being absent from it dramatically lowers your odds of ever being surfaced.
Treat properly identified AI crawlers (like Common Crawl’s CCBot) as strategic stakeholders, not threats. Allow appropriate access. Make your content machine-readable. Opt in rather than blocking by default.
The formula: live indexing + training corpus inclusion = dual-path visibility in the era of agent-mediated discovery.
Practical Standards: What’s Working Now
The strategic framework matters, but so does implementation. Here’s what’s emerging as practical infrastructure for agent-readiness.
For Accessibility
robots.txt is getting AI-specific extensions. The Robots Exclusion Protocol (now RFC 9309) remains the baseline, but an IETF draft proposes syntax to distinguish AI training from inference, letting you permit RAG-style answers while blocking training ingestion, or vice versa. AI crawlers like GPTBot, ClaudeBot, and Google-Extended already check robots.txt.
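As a baseline, a robots.txt that welcomes AI crawlers to public content while keeping them out of private areas can be as simple as the following (the paths are placeholders for your own site structure):

# AI crawlers: public content is fair game, account and checkout areas are not
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /account/
Disallow: /checkout/
Allow: /

# Default rules for everyone else
User-agent: *
Disallow: /account/
Disallow: /checkout/

Swap Allow for Disallow in that first group and you’ve expressed the opposite policy. The point is that it becomes an explicit decision rather than a default you inherited.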
Cloudflare now blocks AI crawlers by default for new customers. If you’re on Cloudflare, check your settings. Their AI Crawl Control features let you make nuanced decisions. Be intentional about your access policy rather than accepting defaults that may be making you invisible.
For Legibility
llms.txt is the clearest practical step you can take today. It’s a simple Markdown file at /llms.txt that provides a curated map of your most important content for AI systems: key docs, FAQs, policies, pricing, with links to clean Markdown versions where possible.
Here’s what a basic llms.txt file looks like:
# YourCompany.com
> Brief description of what your company does and what this site offers.
## Key Pages
- [Product Overview](/docs/product-overview.md): What we offer and how it works
- [Pricing](/pricing.md): Current plans and pricing
- [API Documentation](/docs/api.md): Full API reference for developers
## Support & Policies
- [FAQ](/faq.md): Common questions answered
- [Terms of Service](/legal/terms.md)
- [Contact](/contact.md): How to reach us
Adoption is growing. Directories like llmstxt.site and directory.llmstxt.cloud track hundreds of implementations. GitBook has published tutorials. CMS platforms are building auto-generation features.
I’ve implemented llms.txt on several of my own sites, and I plan to expand this significantly, adding Markdown versions of key content and keeping the files current. It’s one of the most concrete things you can do right now.
Structured data (JSON-LD / Schema.org) remains non-negotiable. For products, organizations, FAQs, events, and locations, schema markup gives agents a machine-readable knowledge graph of your key entities.
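A minimal sketch for a product page, using standard Schema.org vocabulary in JSON-LD (the product, price, and URL are placeholders):

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Single-Origin Espresso Beans, 1kg",
  "description": "Medium roast, whole bean, shipped weekly.",
  "offers": {
    "@type": "Offer",
    "price": "24.00",
    "priceCurrency": "EUR",
    "availability": "https://schema.org/InStock",
    "url": "https://yourcompany.com/products/espresso-beans"
  }
}
</script>

An agent parsing this doesn’t have to guess which number on the page is the price or whether the item is in stock; the answers are right there in the markup.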
For Actionability
Expose your services as tools, not just pages. If you have APIs, document them with OpenAPI/Swagger specs. Agents can ingest these and treat your API as a callable tool for placing orders, checking inventory, and submitting inquiries, rather than screen-scraping checkout flows.
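A minimal sketch of what that looks like for a single ordering endpoint, in OpenAPI 3.0 (the path and fields are illustrative, not a prescribed schema):

openapi: 3.0.3
info:
  title: YourCompany Orders API
  version: "1.0"
paths:
  /orders:
    post:
      summary: Place an order
      operationId: createOrder
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [sku, quantity]
              properties:
                sku:
                  type: string
                quantity:
                  type: integer
      responses:
        "201":
          description: Order created

Agent frameworks can read a spec like this and turn createOrder into a tool the agent can call directly, no screen-scraping involved.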
Consider MCP (Model Context Protocol). If you want agents to act on your services, exposing an MCP-compatible endpoint is increasingly the path. Your booking system, inventory lookup, or quote generator can become a tool that agents call directly, with proper authentication and scoping.
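As a sketch of how small the surface area can be, here’s a minimal MCP server using the FastMCP helper from the official Python SDK. The check_availability function and its placeholder response are assumptions standing in for your real booking backend:

from mcp.server.fastmcp import FastMCP

# The name is what connecting agents see when they discover this server
mcp = FastMCP("YourCompany Booking")

@mcp.tool()
def check_availability(date: str, party_size: int) -> str:
    """Return open appointment slots for a given date and party size."""
    # A real deployment would query the booking system here;
    # this placeholder keeps the sketch self-contained.
    return f"Open slots on {date} for {party_size}: 10:00, 14:30, 16:00"

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport; HTTP transports are also supported

An agent connected to this server can discover check_availability, read its signature and docstring, and call it with structured arguments, exactly the "tool, not page" posture described above.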
The /ask endpoint pattern is emerging. A Microsoft-Cloudflare collaboration is pushing a model where sites expose conversational interfaces: /ask for human Q&A, /mcp for agent tool calls, both backed by the same retrieval infrastructure. Forward-looking, but being built now.
For Diagnostics
Check where you stand. CanAISeeIt scores sites on AI visibility, crawler accessibility, and protocol compliance. Your server logs show which AI user-agents are visiting. If you’re not seeing CCBot, GPTBot, or ClaudeBot, find out why.
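A quick way to check from the command line, assuming a standard access log at a path like /var/log/nginx/access.log (adjust the path and bot list to your setup):

grep -icE "GPTBot|ClaudeBot|CCBot" /var/log/nginx/access.log

The -i flag makes the match case-insensitive, -E enables the alternation, and -c returns the number of matching lines; anything above zero means those crawlers are reaching you.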
The Web Is Being Rebuilt. Quietly.
What I’m describing isn’t a distant future. It’s happening now, mostly invisibly. Every major AI lab is building agent capabilities. Every major identity vendor is developing agent-specific IAM. Standards bodies are actively drafting protocols for agent authentication, authorization, and payment.
The shift from search engine optimization to AI optimization is directionally right as a framing, but it undersells the magnitude. SEO was about being found. Agent optimization is about being found and being usable by non-human actors who represent real human intent.
The web was built for human browsers, then retrofitted for search engine crawlers. Now it’s being rebuilt again, this time for AI agents that act as legitimate proxies for human users.
The organizations that recognize this shift and prepare for it will capture a new channel of demand. Those that don’t will watch that demand flow to competitors who were paying attention.
Your next customer might arrive via an AI agent. The question is whether you’ll recognize them as a customer, or lock them out as a bot.