llms.txt Will Not Get You Cited. Here Is What 300,000 Domains Reveal.

Posted by Ilya G.

May 20, 2026

On May 19, 2026

llms.txt will not get you cited in AI search. If you’ve added the file to a client site expecting a visibility lift, the data says you bought a placebo. A study spanning roughly 300,000 domains found no statistically significant link between having an llms.txt file and getting cited more often by AI engines. When researchers pulled it out of their prediction model entirely, the model got more accurate. This post is about what the file does, why it spread anyway, and where that attention should actually go.

What llms.txt was supposed to do

The pitch was reasonable. robots.txt tells crawlers what they can access. llms.txt was proposed as a companion: a Markdown-formatted text file at the root of your domain (/llms.txt) that hands large language models a curated map of your most important content, so they’d understand and cite you better.

It sounds like the kind of thing that should work. It’s tidy. It’s technical. It feels like the SEO move of putting a clean signal in front of a machine. That’s exactly why it spread through agency checklists in a matter of months.

But “should work” and “does work” are different claims, and the second one is the only one a client is paying for.

What the evidence actually shows

Three findings, and none of them are ambiguous.

First, the correlation study. Across about 300,000 domains, the presence of an llms.txt file showed no statistically significant relationship with AI citation frequency. Removing it as a variable improved the model’s accuracy, which means it wasn’t merely neutral; it was noise.

Second, the crawler behavior. Over a 90-day window covering more than 500 million AI bot visits, only 408 of those visits targeted an llms.txt file directly. That works out to roughly 0.00008 percent, fewer than one request in every million visits. The major AI crawlers, including GPTBot, ClaudeBot, and PerplexityBot, almost entirely ignore the file.
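
You can look for the same pattern in your own traffic. The sketch below is a minimal Python example, assuming access logs in the common Apache/Nginx “combined” format; the function name llms_txt_hits, the bot list, and the sample log lines are hypothetical, and the user-agent strings are simplified stand-ins rather than the exact strings each vendor sends.

```python
import re
from collections import Counter

# Assumes the "combined" log format used by many Apache/Nginx setups:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Crawler tokens named in this post, matched as case-insensitive substrings of the user agent.
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot")


def llms_txt_hits(lines):
    """Return (all_ai_visits, llms_txt_visits) as Counters keyed by bot name."""
    visits, llms = Counter(), Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that don't fit the assumed format
        ua = m["ua"].lower()
        bot = next((b for b in AI_BOTS if b.lower() in ua), None)
        if bot is None:
            continue  # not one of the AI crawlers we are counting
        visits[bot] += 1
        # Drop any query string so /llms.txt?x=1 still counts as the file.
        if m["path"].split("?", 1)[0] == "/llms.txt":
            llms[bot] += 1
    return visits, llms


# Hypothetical log lines with simplified user-agent strings.
SAMPLE_LOG = [
    '203.0.113.5 - - [19/May/2026:10:00:00 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [19/May/2026:10:00:05 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [19/May/2026:10:01:00 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '192.0.2.9 - - [19/May/2026:10:02:00 +0000] "GET /llms.txt HTTP/1.1" 404 0 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

if __name__ == "__main__":
    visits, llms = llms_txt_hits(SAMPLE_LOG)
    total = sum(visits.values())
    share = sum(llms.values()) / total if total else 0.0
    print(f"AI bot visits: {total}, llms.txt requests: {sum(llms.values())} ({share:.2%})")
```

Pointed at a real file (for example, llms_txt_hits(open("access.log"))), the two Counters give you your own version of the 408-in-500-million ratio. The counts reflect what each request’s user-agent header claims, not verified crawler identity.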

Third, the vendor position. Google has stated plainly that it does not support llms.txt and has no plans to. There is no evidence that the file influences inclusion in AI Overviews or AI Mode, and no major engine has committed to reading it.

A standard that the standard-setters won’t read is not a standard. It’s a convention waiting for adoption that hasn’t come.

Why smart marketers fell for it anyway

This is worth sitting with, because the mechanism that made llms.txt spread will make the next empty tactic spread too.

New channel, high anxiety, low visibility into how the channel actually works. That combination produces a hunger for concrete, checkable actions. A file you can add is concrete. You can confirm it exists. You can put a green check next to it on the client report.

The trouble is that “checkable” and “effective” have nothing to do with each other. The eight-second instinct that should have fired — has anyone shown this moves the number? — got skipped, because the action felt like progress and progress felt good.

That instinct is the actual edge of a veteran marketer. Twenty years of watching tactics get sold, tested, and quietly dropped should make you the hardest person in the room to sell a placebo to. Use that.

Where that attention should go instead

If llms.txt is a zero, the obvious question is what isn’t. Here’s what the same body of 2026 research says actually correlates with getting cited.

1. A direct answer in the first 60 words

Generative engines extract before they assemble. A clean, self-contained answer of 40 to 60 words near the top of a page gives them something liftable. This is the single highest-leverage on-page change available, and it costs an editing pass, not a development ticket.
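
A rough way to enforce this during editing is a word count on the opening paragraph. This is a minimal Python sketch, assuming you pass in plain body copy with the title and navigation already stripped and paragraphs separated by blank lines; the helper name lead_answer_length is hypothetical, and the 40 and 60 defaults simply mirror the range above.

```python
import re


def lead_answer_length(body_text: str, low: int = 40, high: int = 60) -> tuple[int, bool]:
    """Count words in the first paragraph and report whether it falls within [low, high].

    Assumes body_text is plain page copy (title and navigation already removed)
    and that paragraphs are separated by one or more blank lines.
    """
    # Split on blank lines, tolerating stray spaces on the "empty" line.
    paragraphs = [p for p in re.split(r"\n\s*\n", body_text) if p.strip()]
    if not paragraphs:
        return 0, False
    words = len(paragraphs[0].split())  # whitespace-delimited word count
    return words, low <= words <= high
```

A draft that comes back False is a candidate for an editing pass, not proof that the page won’t be cited.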

2. Schema that names your entities

Article, FAQPage, HowTo, Organization, Product. Structured data is how an AI system confirms what a page is, who stands behind it, and whether to trust it. This is the technical work llms.txt was pretending to be, except that engines actually consume it.
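
Structured data is usually embedded in the page as JSON-LD. As one illustration, here is a minimal Python sketch that renders a schema.org FAQPage block from question and answer pairs; the helper name faq_page_jsonld is hypothetical, and FAQPage is only one of the types listed above. It shows the shape of the markup, not a guarantee that any particular engine will use it.

```python
import json


def faq_page_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build a schema.org FAQPage JSON-LD <script> tag from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    # Escape "</" as "<\/" (still valid JSON) so answer text can't close the script tag early.
    body = json.dumps(data, indent=2, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(faq_page_jsonld([
        ("Is robots.txt still relevant for AI search?",
         "Yes. AI crawlers read robots.txt, so a misconfigured file can block them."),
    ]))
```

The output goes in the page’s HTML; the question and answer text should match what visitors actually see on the page.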

3. Crawler access for the bots that matter

Confirm your robots.txt does not block OAI-SearchBot, GPTBot, or PerplexityBot, either by name or through a blanket Disallow under User-agent: *. Blocking them, often by accident through an aggressive security plugin or a copied robots file, means those crawlers never see your pages. This is a real file at the root of your domain that engines genuinely read. It’s the unglamorous cousin of llms.txt, and it works.
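
A quick way to audit this is Python’s standard-library robots.txt parser. The sketch below, a minimal example rather than a full audit, feeds a robots.txt string to urllib.robotparser.RobotFileParser and asks whether each crawler named above may fetch the homepage. The helper name check_ai_access and the sample file are hypothetical; the sample shows the copied-file trap, where a blanket User-agent: * block shuts out every bot that isn’t named.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in this post; illustrative, not an exhaustive list.
AI_BOTS = ["OAI-SearchBot", "GPTBot", "PerplexityBot"]


def check_ai_access(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Return {bot: allowed} for each AI crawler, given the text of a robots.txt file."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines. splitlines() turns blank lines into "",
    # which the parser uses to separate one user-agent group from the next.
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


# Hypothetical robots.txt copied from another site: it blocks every crawler
# by default and carves out an exception only for GPTBot.
COPIED_ROBOTS = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

if __name__ == "__main__":
    for bot, allowed in check_ai_access(COPIED_ROBOTS).items():
        print(f"{bot:<15} {'allowed' if allowed else 'BLOCKED'}")
```

To check a live site instead of a string, RobotFileParser also accepts a URL through set_url() followed by read(). One caveat: Python’s parser applies the first matching rule in a group, while Google documents most-specific-match precedence, so files that mix overlapping Allow and Disallow paths can be judged differently by different crawlers.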

4. Third-party presence

According to 2026 citation analysis, community platforms captured more than half of all citations across ChatGPT, Perplexity, and Google AI Overviews combined. Being mentioned, reviewed, and discussed off your own domain now does more for citations than publishing another page on your own site. Earned presence beats owned presence.

5. Recency, especially for Perplexity

Perplexity runs a live web search on most queries and weights freshness heavily. Genuinely updated content can earn citations within hours. A page that hasn’t been touched in two years is telling the freshness-sensitive engines something, and it isn’t flattering.
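
One way to find the pages that have gone untouched is to read the lastmod dates in your XML sitemap. The sketch below assumes a standard sitemaps.org sitemap with lastmod values in W3C date format; the helper stale_urls, the sample sitemap, and the 730-day cutoff (matching the “two years” above) are illustrative choices, and lastmod is only as reliable as whatever CMS writes it.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def stale_urls(sitemap_xml: str, today: date, max_age_days: int = 730) -> list[str]:
    """Return <loc> values whose <lastmod> is older than max_age_days, or missing."""
    root = ET.fromstring(sitemap_xml)
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = entry.findtext("sm:lastmod", default="", namespaces=NS).strip()
        # lastmod may be a bare date or a full timestamp; the first 10 chars are YYYY-MM-DD.
        if not lastmod or date.fromisoformat(lastmod[:10]) < cutoff:
            stale.append(loc)  # missing dates are flagged too, for manual review
    return stale


# Hypothetical sitemap with one fresh, one old, and one undated URL.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/fresh</loc><lastmod>2026-04-30</lastmod></url>
  <url><loc>https://example.com/old</loc><lastmod>2023-01-10T08:00:00+00:00</lastmod></url>
  <url><loc>https://example.com/undated</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in stale_urls(SAMPLE_SITEMAP, date.today()):
        print("review:", url)
```

A flagged URL is a prompt to review and genuinely update the page, not to bump its date.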

The pattern worth keeping

llms.txt isn’t the last tactic that will spread faster than its evidence. The AI search channel is new enough that the gap between “plausible” and “proven” will stay wide for a while, and that gap is where wasted retainer hours go to live.

So keep one habit. Before any tactic goes on a client plan, ask the eight-second question: has anyone measured this against citation rate, and what did they find? If the honest answer is “it feels like it should help,” that’s not a yes. That’s an llms.txt.

Frequently asked questions

Should I remove llms.txt from sites that already have it?

There’s no urgency to remove it: it does no measurable harm, and a future engine could in theory adopt it. The point is to stop spending time on it and to stop presenting it to clients as a visibility lever. Treat it as inert until an engine says otherwise.

Is robots.txt still relevant for AI search?

Very much so. AI crawlers read robots.txt, and a misconfigured file that blocks OAI-SearchBot or PerplexityBot can keep you out of ChatGPT search and Perplexity results. It’s one of the few root-level files that genuinely affects AI visibility.

What is the difference between llms.txt and robots.txt?

robots.txt is an established standard that AI crawlers actually read and obey, controlling what they can access. llms.txt is a proposed convention meant to guide AI models to your best content, but it currently has near-zero adoption by the major engines and no demonstrated effect on citations.