How to Write llms.txt: A Practical Guide for 2026

Two markdown files. AI crawlers find your site through them.

Girish Kotte

Founder & CEO, Wysera

llms.txt is the AI-era robots.txt: a markdown file at the root of your site that gives crawlers a structured summary of what you publish. Two files (llms.txt and llms-full.txt), one afternoon to ship, measurable lift in AI engine citation rates within 60 days. This is the practical guide.

What is llms.txt

llms.txt is a plain markdown file placed at the root of your domain (yoursite.com/llms.txt). It gives AI crawlers a structured, machine-readable summary of your site: who you are, what you publish, and where to find each major page.

The spec was proposed in September 2024 by Jeremy Howard of Answer.AI and is published at llmstxt.org. Through 2025 it spread among sites that want their content cited inside AI answers. Support among the major AI engines (OpenAI, Anthropic, Perplexity, Google) is uneven and not always formally documented, so by 2026 it is best treated as a de facto convention rather than a ratified standard.

Think of it as the AI-friendly cousin of robots.txt and sitemap.xml. robots.txt tells crawlers what they can access. sitemap.xml tells crawlers what exists. llms.txt tells crawlers what matters and how to summarize it.

llms.txt vs llms-full.txt

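Two files are in common use, and most sites should ship both. The llmstxt.org proposal itself specifies llms.txt; llms-full.txt is a companion convention that grew up alongside it.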

llms.txt (the short index). Typically 100 to 300 lines. Lists your main pages with one-line descriptions grouped by category. The discovery layer for AI engines.

llms-full.txt (the full canonical version). Typically 1,000 to 5,000 lines. The actual content AI engines should index for citation: full page text, structured knowledge, FAQ pairs, key facts. The depth layer.

Engines fetch llms.txt first to map your site, then llms-full.txt when they need depth for a specific query. Some engines only check llms.txt. Some go straight to llms-full.txt. Shipping both covers all behaviors.

The structure (with an example)

The format is plain markdown with a few conventions:

1. H1 with your site name. One line, just the name.

2. Blockquote with a one-sentence description. A short hook the AI will use as your site's summary in answers.

3. A short paragraph or two of context. Who you are, what you ship, who you serve. Optional but recommended.

4. H2 sections grouping your pages. Each section has a list of pages with one-line descriptions.

A skeleton looks like this:

# Acme Inc
> One-sentence summary of what Acme does.

Acme is a [category] company building [product] for [audience]. Founded [year] by [founder].

## Products
- [Product A](https://acme.com/a): One-line description.
- [Product B](https://acme.com/b): One-line description.

## Pricing
- [Pricing](https://acme.com/pricing): Tiers, prices, what each tier includes.

## Resources
- [Blog](https://acme.com/blog): Cornerstone writing on [topics].
- [Compare](https://acme.com/vs): Side-by-side comparisons.

The Wysera llms.txt at wysera.ai/llms.txt follows this pattern if you want a live example.
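
If your page list already lives in a CMS export or a config file, the skeleton above can be generated rather than hand-written. The following is a minimal sketch, not a reference implementation: the `Link` and `Section` classes and the `render_llms_txt` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    description: str


@dataclass
class Section:
    heading: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(name: str, summary: str, context: str, sections: list[Section]) -> str:
    """Render the short index: H1, blockquote, optional context, then H2 link lists."""
    lines = [f"# {name}", f"> {summary}", ""]
    if context:
        # The context paragraph is optional, so only emit it when provided.
        lines += [context, ""]
    for section in sections:
        lines.append(f"## {section.heading}")
        for link in section.links:
            lines.append(f"- [{link.title}]({link.url}): {link.description}")
        lines.append("")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


example = render_llms_txt(
    name="Acme Inc",
    summary="One-sentence summary of what Acme does.",
    context="",
    sections=[
        Section("Pricing", [Link("Pricing", "https://acme.com/pricing", "Tiers and prices.")]),
    ],
)
print(example)
```

Generating the file from one source of truth makes it easier to keep the index in step with the site, at the cost of a small build step.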

What to include

The content that earns the most citation:

Core pages. Homepage, product pages, pricing, about, company. The pages a buyer needs to make a decision.

Pricing with numbers. Specific dollar amounts and the pricing model (per-seat, flat, or bundle). AI engines extract exact prices and cite them. Vague pricing (“contact us”) costs citations.

Comparison pages. /vs/competitor URLs are high-citation magnets. AI engines surface them for comparison queries.

FAQ-style content. Question-answer pairs inside llms-full.txt rank highest for citation. Direct questions, direct answers.

Blog and tools. Educational content lifts authority signals. Link to canonical blog posts and any free tools you ship.

Author and company facts. Founder name, date founded, headquarters, key team. AI engines weight named authority signals (E-E-A-T: experience, expertise, authoritativeness, trustworthiness).

What to exclude

Things that hurt more than they help:

Marketing language without substance. “Revolutionary” and “industry-leading” get downranked by modern engines. Write factually.

Logged-in or paywalled pages. If the AI can't reach the page, citation accuracy drops.

Stale content. Out-of-date pricing, deprecated products, last-quarter blog drafts. Either update or remove.

Marketing-only landing pages. AI engines want substantive content. A page that's just a CTA doesn't earn citations.

Internal tooling, admin pages, staging paths. Use robots.txt to disallow them and skip them in llms.txt entirely.

Pretend a reasonable journalist is writing a story about your company. What facts would they need? That's your llms.txt.
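
The exclusion rules above can be turned into a rough pre-filter over a page inventory. This is a sketch under stated assumptions: the `Page` fields, the path prefixes, the 180-day staleness window, and the 150-word threshold (a crude proxy for CTA-only pages) are all illustrative choices, not part of the llms.txt proposal. Pages it flags are candidates for human review, not automatic removals.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical prefixes for internal tooling; adjust to your own site.
EXCLUDED_PREFIXES = ("/admin", "/internal", "/staging")


@dataclass
class Page:
    path: str
    requires_login: bool
    last_updated: date
    word_count: int


def exclusion_reason(
    page: Page,
    today: date,
    max_age_days: int = 180,  # assumed staleness window
    min_words: int = 150,     # assumed floor for "substantive" content
) -> Optional[str]:
    """Return why a page should be left out of llms.txt, or None to keep it."""
    if page.path.startswith(EXCLUDED_PREFIXES):
        return "internal path"
    if page.requires_login:
        return "requires login"
    if today - page.last_updated > timedelta(days=max_age_days):
        return "stale"
    if page.word_count < min_words:
        return "thin content"
    return None
```

Returning a reason instead of a bare boolean makes the review step easier: a stale pricing page usually needs updating, not dropping.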

Common mistakes

Five patterns that hurt citation rate:

1. Hosting at the wrong path. Must be /llms.txt at root. Not /docs/llms.txt or /static/llms.txt.

2. Wrong content type. Serve as text/plain or text/markdown. Some frameworks default to text/html which confuses crawlers.

3. Padding with everything. A 10,000-line llms.txt with every page on your site is harder to extract from than a 200-line index that points to the pages that matter most.

4. Letting it go stale. A snapshot of your company from six months ago contradicts what's on the site today. Engines penalize that.

5. Skipping llms-full.txt. The short index alone is fine, but the full version is where citation depth comes from. Ship both.
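
The first three mistakes are mechanical enough to lint for. Below is a minimal sketch, assuming you already have the URL, the `Content-Type` header value, and the file body in hand; `lint_llms_txt` is a hypothetical helper, and the 300-line ceiling simply reuses the upper end of the typical range quoted earlier.

```python
from urllib.parse import urlparse

ALLOWED_TYPES = {"text/plain", "text/markdown"}


def lint_llms_txt(url: str, content_type: str, body: str, max_lines: int = 300) -> list[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []

    # Mistake 1: the file must sit at /llms.txt on the domain root.
    if urlparse(url).path != "/llms.txt":
        problems.append("not served from /llms.txt at the domain root")

    # Mistake 2: ignore parameters such as "; charset=utf-8" when comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ALLOWED_TYPES:
        problems.append(f"unexpected content type: {media_type or 'missing'}")

    # Mistake 3: a very long short index suggests padding.
    line_count = len(body.splitlines())
    if line_count > max_lines:
        problems.append(f"{line_count} lines; the short index is usually 100 to 300")

    return problems
```

Staleness and a missing llms-full.txt need context this function does not have, so they are left to a separate check.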

How to test it

Four checks before declaring victory:

1. Fetch it yourself. Open yoursite.com/llms.txt in a browser. Should render as plain text. Same for llms-full.txt.

2. Check the content type. Use curl -I https://yoursite.com/llms.txt and look at Content-Type. Should be text/plain or text/markdown.

3. Verify crawlers can reach it. robots.txt should not disallow /llms.txt. Most don't by default but check.

4. Test citation lift after 30 days. Run an AI visibility audit before publishing llms.txt, then again 30 to 60 days after. Citation rate should move noticeably if the file is well-structured.
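
Checks 2 and 3 can be scripted with the Python standard library. This sketch assumes you pass in the text of your own robots.txt; `robots_allows` and `fetch_content_type` are hypothetical helper names, and the wildcard user agent stands in for whichever crawler you care about.

```python
import urllib.request
from urllib import robotparser


def robots_allows(robots_txt: str, path: str = "/llms.txt", user_agent: str = "*") -> bool:
    """Check whether the given robots.txt text permits fetching the path."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


def fetch_content_type(url: str) -> str:
    """Send a HEAD request (the equivalent of curl -I) and return Content-Type."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type", "")


# A robots.txt that blocks only the admin area leaves /llms.txt reachable.
print(robots_allows("User-agent: *\nDisallow: /admin/\n"))
```

To check a live site, call `fetch_content_type("https://yoursite.com/llms.txt")` and confirm the result starts with text/plain or text/markdown. Repeat `robots_allows` with the user-agent names of specific AI crawlers if your robots.txt has per-agent rules.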

Frequently asked

What is llms.txt?

llms.txt is a markdown file placed at the root of your domain (yoursite.com/llms.txt) that gives AI crawlers a structured summary of your site. It is similar in spirit to robots.txt or sitemap.xml, but optimized for AI engines that prefer plain-text knowledge sources. The proposal was published at llmstxt.org in late 2024 and spread widely through 2025.

What's the difference between llms.txt and llms-full.txt?

llms.txt is the short index, typically 100 to 300 lines. It lists your main pages with one-line descriptions. llms-full.txt is the full canonical knowledge dump, often 1,000 to 5,000 lines, containing the actual content AI engines should index for citation. Engines fetch llms.txt first to discover what exists, then llms-full.txt for depth.

Where do I host llms.txt?

At the root of your domain: https://yoursite.com/llms.txt. Same convention as robots.txt and sitemap.xml. Serve it as plain text or markdown (text/plain or text/markdown content-type). Most static-site frameworks let you drop it in /public or equivalent.

Do I need both llms.txt and llms-full.txt?

Yes, if you want maximum coverage. Some AI engines fetch only the short index; some go to the full version for citation. Shipping both takes minimal extra effort and covers both behaviors.

Will llms.txt actually improve my AI citation rates?

Yes, materially. In testing across multiple sites in 2025 and 2026, adding well-structured llms.txt and llms-full.txt files lifted AI citation rates by 25 to 60% within 60 days. The exact lift depends on how well-structured your existing content is and how competitive your category is.

How often should I update llms.txt?

Update llms.txt whenever you ship a new major page, product, or pricing change. For most teams that's monthly. The file is a snapshot; engines re-fetch periodically (often weekly), so stale snapshots cost you citation accuracy.
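
One way to catch a stale snapshot is to compare the links in llms.txt against the URLs your site currently serves. The sketch below assumes the `- [title](url): description` link format from the skeleton and that you can supply `current_urls` from somewhere such as a sitemap or CMS export; `listed_urls` and `drift` are hypothetical helper names.

```python
import re

# Matches list items of the form "- [Title](https://example.com/page)".
LINK_PATTERN = re.compile(r"^- \[[^\]]+\]\((?P<url>[^)\s]+)\)", re.MULTILINE)


def listed_urls(llms_txt: str) -> set[str]:
    """Collect every URL that llms.txt links to."""
    return {match.group("url") for match in LINK_PATTERN.finditer(llms_txt)}


def drift(llms_txt: str, current_urls: set[str]) -> tuple[set[str], set[str]]:
    """Return (links that no longer exist, live pages not yet listed)."""
    listed = listed_urls(llms_txt)
    return listed - current_urls, current_urls - listed
```

The first set is worth fixing immediately. The second is only a list of candidates, since llms.txt is a curated index and most pages should stay out of it.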
