What goes inside llms.txt?

Heading with brand name, blockquote tagline, context paragraph, hero product with link and price, full catalog as bullet list, trust signals (reviews, ratings, user counts), adjacent content (blog, directory), contact info, and an explicit AI usage policy. About 5,000 to 8,000 characters is the sweet spot.

What are Content Signals in robots.txt?

Content Signals are machine-readable directives in robots.txt that declare AI usage preferences across three dimensions: search (citation in AI answers), ai-input (real-time agent responses), and ai-train (use in model training). Spec is at contentsignals.org.

How to Add llms.txt to Squarespace and Get Cited by ChatGPT, Perplexity, and Claude (Advanced AI SEO Setup)

Jun 27

Written By Taylor Miles

A complete technical walkthrough of making a Squarespace site agent-ready with llms.txt, Content Signals, and a 50-line Cloudflare Worker. Open-source code, deployment steps, and why I opted into AI training.

TLDR: Squarespace doesn't natively support the new AI agent discovery standards: llms.txt, Content Signals in robots.txt, or markdown content negotiation. But with a 50-line Cloudflare Worker sitting in front of your Squarespace site, I deployed all three in under an hour. This post shows the exact setup I used on squarewebsites.org: what llms.txt is, what to put in yours, how to declare your AI training preferences, the open-source Worker code, and how to verify it's working. I also explain why I chose to let AI models train on my content, a contrarian call most SaaS sites should make.

Six months ago, "AI SEO" meant adding schema markup and writing FAQs. Today it means being machine-readable to a class of new visitors: AI agents acting on behalf of humans. ChatGPT, Perplexity, Claude, Gemini, and a growing list of automated browsers don't read your hero animation or your retina images. They want a clean, structured, declarative description of who you are, what you sell, and whether they're allowed to quote you.

I run Squarewebsites, a Squarespace plugin shop. My customers are designers, agencies, and DIY site owners who hit Squarespace's native limits. When a designer asks ChatGPT "what's the best Squarespace plugin for filtering products," I want the answer to be Universal Filter, with a working link, an accurate price, and a quote from a real review. To make that consistently true, I had to get my site agent-ready.

What follows is the exact setup: the architecture, the code, and the trade-offs.

What "agent-ready" actually means in 2026

There's now a stack of standards — some IETF drafts, some industry conventions — that AI agents check when they encounter a new domain. The ones that matter today:

/llms.txt — a markdown file at your site root that gives agents a structured map of your site, what you sell, and how to interpret it. Spec: llmstxt.org.
Content Signals in robots.txt — declares your AI usage preferences: training, search citation, live agent input. Spec: contentsignals.org.
Markdown content negotiation — when an agent sends Accept: text/markdown, you return the page as markdown instead of HTML. Cloudflare added this as a one-click setting in 2025.
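
To make the third item concrete, here is a minimal sketch of the decision a server has to make when it sees an Accept header. The helper name prefersMarkdown is hypothetical, and the logic is deliberately simplified: it only asks whether text/markdown is listed with a non-zero q-value, and does not rank it against other types the way a full negotiation would.

```javascript
// Decide whether a client asked for markdown via the Accept header.
// Hypothetical helper for illustration; not part of the Worker in this post.
function prefersMarkdown(acceptHeader) {
  if (!acceptHeader) return false;
  return acceptHeader
    .split(',')
    .map((part) => part.trim().toLowerCase())
    .some((part) => {
      const [type, ...params] = part.split(';').map((s) => s.trim());
      if (type !== 'text/markdown') return false;
      // A q-value of 0 means "not acceptable", so treat it as a no.
      const q = params.find((p) => p.startsWith('q='));
      return q === undefined || parseFloat(q.slice(2)) > 0;
    });
}

console.log(prefersMarkdown('text/markdown, text/html;q=0.8')); // true
console.log(prefersMarkdown('text/html,application/xhtml+xml')); // false
```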

The other items floating around (DNS-AID, MCP server cards, OAuth discovery, the WebMCP browser API) are real specs but have near-zero deployment. For a marketing site without a public API, they're noise. I cover which ones to skip later in this post.

The problem with Squarespace

Squarespace is excellent at what it does. It is not built for serving custom routes with custom content types. There's no way to:

Serve a file at /llms.txt with Content-Type: text/markdown
Modify your robots.txt beyond the existing customize textbox
Add custom HTTP response headers
Respond differently based on the Accept header

Every workaround that lives inside Squarespace (URL slug pages, Code Blocks with <pre> wrappers) compromises something — usually the content type or the rendering. AI agents are picky. A wrong content type and they ignore you.

The clean answer is to put Cloudflare in front of Squarespace and let a Worker handle the agent-specific routes. Squarespace continues to serve everything else exactly as it does today. The Worker only fires on the two paths agents care about.

The architecture

Three pieces:

Cloudflare DNS with the www subdomain orange-clouded (proxied through Cloudflare's edge). Without this step, Cloudflare can't see or modify any traffic.
A single Cloudflare Worker bound to two routes: /llms.txt and /robots.txt.
Squarespace unchanged. It still serves every other URL on your domain exactly as it always has.
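
The split between the Worker and Squarespace can be sketched as a small classifier. This is an illustration only: the labels 'llms', 'robots', and 'passthrough' are made up for the example, and it assumes the two exact paths named above.

```javascript
// Classify a request URL the way the architecture splits traffic.
// The labels are illustrative; only the two paths come from the setup.
function classifyRoute(requestUrl) {
  // URL.pathname excludes the query string, so /robots.txt?x=1 still matches.
  const { pathname } = new URL(requestUrl);
  if (pathname === '/llms.txt') return 'llms';
  if (pathname === '/robots.txt') return 'robots';
  return 'passthrough'; // Squarespace serves it unchanged
}

console.log(classifyRoute('https://www.yourdomain.com/llms.txt'));       // llms
console.log(classifyRoute('https://www.yourdomain.com/robots.txt?x=1')); // robots
console.log(classifyRoute('https://www.yourdomain.com/blog/post'));      // passthrough
```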

Total deployment time once the prerequisites are in place: about 15 minutes.

Step 1: Get Cloudflare in front of Squarespace

If you don't already have Cloudflare DNS managing your domain, this is the prerequisite. I won't walk through the full DNS migration here — Cloudflare's onboarding is excellent and takes 5 minutes — but two settings matter:

1. Proxy the www record (orange cloud). Go to Cloudflare → your domain → DNS → Records. Find the row where Name = www. Click the proxy status icon until it turns orange. Cloudflare now intercepts requests before they reach Squarespace.

2. Set SSL/TLS mode to Full (Strict). Cloudflare → SSL/TLS → Overview. Squarespace serves valid certificates from a public CA, so strict mode works without breaking anything.

Why this matters: If www is set to "DNS only" (grey cloud), Cloudflare only does name resolution and never sees the request. Every Worker, Rule, Bulk Redirect, and AI feature you configure does nothing. I lost an embarrassing amount of time on this exact misconfiguration. Check the cloud color first.

Verify proxying is on by fetching Cloudflare's diagnostic endpoint:

curl -s https://www.yourdomain.com/cdn-cgi/trace

If you get back a key-value dump (fl=, h=, ip=, colo=), Cloudflare is proxying. If you get HTML or a Squarespace 404, the cloud is still grey.
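
If you would rather script that check, a rough sketch follows. It assumes the trace body is plain key=value lines as described above; parseTrace and looksProxied are hypothetical helper names, and the sample values are invented placeholders.

```javascript
// Parse the key=value lines returned by /cdn-cgi/trace into an object.
function parseTrace(body) {
  const fields = {};
  for (const line of body.split('\n')) {
    const eq = line.indexOf('=');
    if (eq > 0) fields[line.slice(0, eq)] = line.slice(eq + 1).trim();
  }
  return fields;
}

// Treat the response as proxied only if the keys mentioned above are present.
function looksProxied(body) {
  const fields = parseTrace(body);
  return ['fl', 'h', 'ip', 'colo'].every((key) => key in fields);
}

// Invented sample values, shaped like a trace response.
const sampleTrace = 'fl=000f000\nh=www.yourdomain.com\nip=203.0.113.7\ncolo=AMS\n';
console.log(looksProxied(sampleTrace));                           // true
console.log(looksProxied('<!doctype html><html lang="en"></html>')); // false

// Against a live site (Node 18+ has a global fetch):
//   const res = await fetch('https://www.yourdomain.com/cdn-cgi/trace');
//   console.log(looksProxied(await res.text()));
```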

Step 2: Write the Worker

One file, two endpoints. The Worker serves /llms.txt as pure markdown and augments /robots.txt with a Content-Signal declaration. Everything else passes through to Squarespace untouched.

Here's the complete code I'm running in production. It's MIT-licensed — fork it, edit the strings, deploy your own:

// Squarewebsites — Agent Discovery Worker
// MIT License
// Routes: /llms.txt, /robots.txt on www.yoursite.com

const LLMS_TXT = `# Your Brand

> One-paragraph description of what you do, who you serve, and the key social proof. AI agents extract this for summary answers.

## Hero product
- [Product Name](https://www.yoursite.com/product): $X. Description. Key benefit.

## All products
- [Product 2](url): $X. Description.
- [Product 3](url): $X. Description.

## Trust signals
- Customer reviews, ratings, social proof.

## Contact
- Email: support@yoursite.com

## AI usage policy
- Search and citation: encouraged
- Training: permitted
- Live agent interaction: permitted
`;

const CONTENT_SIGNAL_PREFIX = `# Content Signals — AI usage preferences
# Spec: https://contentsignals.org/
User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=yes

`;

export default {
  async fetch(request) {
    const url = new URL(request.url);

    if (url.pathname === '/llms.txt') {
      return new Response(LLMS_TXT, {
        status: 200,
        headers: {
          'content-type': 'text/markdown; charset=utf-8',
          'cache-control': 'public, max-age=3600',
        },
      });
    }

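    if (url.pathname === '/robots.txt') {
      const upstream = await fetch(request);
      // Pass anything other than a 200 through untouched. A 304 or redirect
      // has no body to prefix, and a Response with a body and status 304 throws.
      if (upstream.status !== 200) return upstream;
      const original = await upstream.text();
      return new Response(CONTENT_SIGNAL_PREFIX + original, {
        status: 200,
        headers: {
          'content-type': 'text/plain; charset=utf-8',
          'cache-control': 'public, max-age=3600',
        },
      });
    }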

    return fetch(request);
  },
};

That's it. About 40 lines of code excluding the content strings.
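
One optional refinement, sketched here as an assumption rather than something the production Worker does: make the prefixing idempotent, so that if a Content-Signal line ever appears in the upstream robots.txt (for example, added through the Squarespace text box), the Worker does not declare it twice. The function name augmentRobots and the shortened prefix are illustrative.

```javascript
// Shortened prefix for the example; the real one lives in the Worker above.
const SIGNAL_PREFIX = [
  '# Content Signals: AI usage preferences',
  'User-agent: *',
  'Content-Signal: search=yes, ai-input=yes, ai-train=yes',
  '',
  '',
].join('\n');

// Prepend the prefix unless some line already starts with "Content-Signal:".
function augmentRobots(original, prefix = SIGNAL_PREFIX) {
  const alreadySignalled = /^\s*content-signal\s*:/im.test(original);
  return alreadySignalled ? original : prefix + original;
}

const upstreamRobots = 'User-agent: *\nDisallow: /config\n';
const once = augmentRobots(upstreamRobots);
const twice = augmentRobots(once);
console.log(once === twice); // true
```

Inside the Worker, the return value would take the place of CONTENT_SIGNAL_PREFIX + original.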

How to deploy it

1. Create the Worker. Cloudflare → Workers & Pages → Create application → Create Worker → Start with Hello World. Name it something like agent-discovery.
2. Paste the code. Click Edit code, select all, delete the hello world template, paste the Worker above. Replace LLMS_TXT with your own content. Click Save and deploy.
3. Bind the Worker to your routes. Go to your zone (your domain) → Workers Routes → Add route. Create two routes, both pointing to your new Worker:
- www.yourdomain.com/llms.txt
- www.yourdomain.com/robots.txt

Verify it's working

Three quick checks:

# Returns your markdown
curl -s https://www.yourdomain.com/llms.txt

# Returns markdown content type (-i on grep ignores header case)
curl -sI https://www.yourdomain.com/llms.txt | grep -i content-type

# Returns robots.txt with Content-Signal block at the top
curl -s https://www.yourdomain.com/robots.txt | head -8

The content-type header should say text/markdown; charset=utf-8. If it says text/html, the Worker route isn't catching the request, usually because of a missing route or a typo in the path.
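
The same checks can be expressed as a small function if you want to run them from a script or a scheduled job. This is a sketch under assumptions: checkLlmsResponse is a hypothetical name, and the H1 check simply reflects the convention described in this post, not a rule enforced by any agent.

```javascript
// Return a list of problems with an /llms.txt response; empty means it looks fine.
function checkLlmsResponse({ status, contentType, body }) {
  const problems = [];
  if (status !== 200) problems.push(`expected status 200, got ${status}`);
  if (!/^text\/markdown\b/i.test(contentType || '')) {
    problems.push(`expected text/markdown, got ${contentType || 'no content-type'}`);
  }
  if (!/^# \S/.test(body || '')) {
    problems.push('body does not start with an H1 such as "# Your Brand"');
  }
  return problems;
}

// Against a live site (Node 18+ has a global fetch):
//   const res = await fetch('https://www.yourdomain.com/llms.txt');
//   console.log(checkLlmsResponse({
//     status: res.status,
//     contentType: res.headers.get('content-type'),
//     body: await res.text(),
//   }));
```

A text/html content type here points to the same cause as above: the route is not catching the request.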

Step 3: What to put in your llms.txt

The spec is loose. The convention is tight. Here's the structure that actually gets cited:

Heading + tagline. An H1 with your brand name, then a > blockquote with your one-sentence value prop. This is the chunk most agents extract for summary answers.
Context paragraph. Two to four sentences expanding on the value prop. Who you serve, key social proof, differentiators.
Hero product / service. Your single most important offering, with link, price, and one-line benefit.
Full catalog. Every product with link + price + one-line description. Bulleted list.
Trust signals. Reviews, ratings, customer count, third-party validation.
Adjacent content. Blog, designer directory, templates — anywhere agents might want to send users.
Contact info. One real email. AI agents propose this to users when they need support.
AI usage policy. Spell out explicitly what you allow. This removes ambiguity even if your robots.txt already says the same thing.

Mine runs about 7,000 characters. Long enough to be useful, short enough that an LLM can hold the whole thing in context when answering a question about my site. See it live at https://www.squarewebsites.org/llms.txt.
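
If your catalog changes often, generating the file from structured data may be easier than hand-editing a string. The sketch below follows the eight-part structure above; the function name, field names, and the "Adjacent content" heading are illustrative assumptions, and the 8,000-character ceiling is an assumed budget based on the lengths mentioned in this post, not a limit in the spec.

```javascript
// Build an llms.txt body from structured data.
// Field names and the "Adjacent content" heading are illustrative assumptions.
function buildLlmsTxt(site) {
  const link = (p) => `- [${p.name}](${p.url}): ${p.price}. ${p.description}`;
  const bullets = (items) => items.map((text) => `- ${text}`).join('\n');
  const sections = [
    `# ${site.brand}`,
    `> ${site.tagline}`,
    site.context,
    `## Hero product\n${link(site.hero)}`,
    `## All products\n${site.products.map(link).join('\n')}`,
    `## Trust signals\n${bullets(site.trust)}`,
    `## Adjacent content\n${bullets(site.adjacent)}`,
    `## Contact\n- Email: ${site.email}`,
    `## AI usage policy\n${bullets(site.policy)}`,
  ];
  return sections.join('\n\n') + '\n';
}

// Placeholder data; swap in your own catalog.
const llmsTxt = buildLlmsTxt({
  brand: 'Your Brand',
  tagline: 'One-sentence value prop.',
  context: 'Two to four sentences on who you serve and why.',
  hero: { name: 'Product 1', url: 'https://www.yoursite.com/product-1', price: '$X', description: 'Key benefit.' },
  products: [
    { name: 'Product 2', url: 'https://www.yoursite.com/product-2', price: '$X', description: 'Description.' },
  ],
  trust: ['Customer reviews, ratings, social proof.'],
  adjacent: ['Blog: https://www.yoursite.com/blog'],
  email: 'support@yoursite.com',
  policy: ['Search and citation: encouraged', 'Training: permitted', 'Live agent interaction: permitted'],
});

// Character budget check; 8,000 is an assumed ceiling, not a spec limit.
const MAX_CHARS = 8000;
console.log(llmsTxt.length <= MAX_CHARS ? 'within budget' : 'too long');
```

The resulting string can be pasted into LLMS_TXT in the Worker.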

"Your llms.txt is the elevator pitch you give to every AI agent that ever visits your site. Write it like you're briefing a new sales rep on day one."

Step 4: Content Signals — declaring your AI stance

The Content-Signal directive in robots.txt carries three signals, each set to yes or no:

| Signal | Meaning | Set to "yes" if... |
| --- | --- | --- |
| `search` | Crawl + index your content for AI search results (Perplexity, ChatGPT search, Gemini answers) | You want to be cited in AI answers |
| `ai-input` | Use your content in real-time agent responses (quotes, summaries, references) | You want agents to link back to you |
| `ai-train` | Use your content to train future AI models | Your content is marketing for a product, not the product itself |

The third one is the controversial call.
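
For checking what a robots.txt actually declares, here is a small parser sketch. It assumes the comma-separated key=value form used in the Worker's prefix; parseContentSignal is a hypothetical helper, and anything that is not yes or no is silently skipped.

```javascript
// Parse one Content-Signal line into { signal: boolean } pairs.
// Returns null if the line is not a Content-Signal directive.
function parseContentSignal(line) {
  const match = /^\s*content-signal\s*:\s*(.*)$/i.exec(line);
  if (!match) return null;
  const signals = {};
  for (const pair of match[1].split(',')) {
    const [key, value] = pair.split('=').map((s) => s.trim().toLowerCase());
    if (key && (value === 'yes' || value === 'no')) signals[key] = value === 'yes';
  }
  return signals;
}

const parsed = parseContentSignal('Content-Signal: search=yes, ai-input=yes, ai-train=no');
console.log(parsed.search, parsed['ai-input'], parsed['ai-train']); // true true false
console.log(parseContentSignal('Disallow: /config'));               // null
```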

Why I said yes to AI training

The default reaction is to lock training down. Don't let OpenAI vacuum your content for free. I did the opposite. Here's the reasoning.

For a media business — NYT, Stack Overflow, anyone whose content is the product — saying no to training makes sense. Training takes value out of the asset and gives it to someone else.

For a SaaS business, content is marketing. The asset is the product. When a designer six months from now asks ChatGPT "what's the best filter plugin for Squarespace," I want the model to answer "Universal Filter, $89, from Squarewebsites" — and that answer comes from training data, not from live search.

By opting into training, I make Squarewebsites the default answer to the question my customers ask. Even when those customers never visit my site directly. Letting the model train on my reviews, my product descriptions, and my AEO guide turns every future GPT-style conversation into a free, automated, infinitely-scalable sales pitch.

The test I use: "If a smart customer of mine asks an AI assistant a question my product answers, do I want to be the answer?" If yes, opt into training. If your content losing exclusivity hurts you more than it helps, opt out.

What I skipped and why

The isitagentready.com scorecard lists eleven items. I deployed three. Here's what I skipped and the reasoning, in case you're tempted:

| Item | Skipped because |
| --- | --- |
| DNS-AID records | IETF draft, near-zero deployment, requires DNSSEC. Not yet. |
| API Catalog (RFC 9727) | I don't have a public API. Nothing to advertise. |
| OAuth/OIDC discovery | No authenticated third parties. |
| OAuth Protected Resource Metadata | Same: no protected resources. |
| Auth.md | Agent registration spec. No agents are registering. |
| MCP Server Card | I don't run an MCP server. Real candidate for v3 of my Chrome extension, but not today. |
| WebMCP browser API | Runtime JS API. No agent consumes it in production yet. |
| Link response headers (RFC 8288) | Worth doing eventually via a Cloudflare Transform Rule. Low priority: the value over `llms.txt` is marginal for a marketing site. |

If you're building a SaaS with a real API or shipping an MCP server, the OAuth + API Catalog items move up the priority list. For a marketing or commerce site, the three I deployed — llms.txt, Content Signals, markdown content type — cover the discoverability you actually need.

How AI agents find what you've published

Once the Worker is live, discovery happens in four ways:

Direct convention. The major AI agents (Perplexity, Claude, ChatGPT, Mistral, Cursor) probe /llms.txt as a well-known path. If your domain enters their crawl queue from any signal — a backlink, a citation, a manual lookup — they fetch /llms.txt automatically.
Robots.txt sweep. Every legitimate crawler fetches robots.txt first. Your Content-Signal directive is now seen by every bot, every visit.
Aggregator directories. Submit your llms.txt to llmstxt.directory and agentskills.io. They index it and feed it to AI training pipelines.
Inbound citations. When a post like this one ranks for queries like "how to add llms.txt to Squarespace," AI agents follow the link to my site and crawl the live file. Your own content is the discovery flywheel.

Results so far and what I'm measuring

I deployed this setup on June 27, 2026. It's too early for hard data, but here is what I'm tracking:

AI citation count. How often Squarewebsites appears in ChatGPT, Perplexity, Claude, and Gemini answers to plugin-related queries. I check monthly with a fixed list of 20 buyer questions.
Direct AI search traffic. Sessions in GA4 with referrer matching chatgpt.com, perplexity.ai, claude.ai, etc.
llms.txt fetch logs. Cloudflare Worker analytics show every fetch of the file. I can identify which bots are checking and how often.
Branded query growth. Google Search Console branded query impressions over time. If AI is doing its job, "squarewebsites" branded searches go up as people learn about the brand through AI.
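
The referrer matching in the second item can be sketched as follows. The host list contains only the three domains named above, so extend it for other assistants; aiSourceOf is a hypothetical helper name.

```javascript
// Hosts named in this post; add others as needed.
const AI_REFERRER_HOSTS = ['chatgpt.com', 'perplexity.ai', 'claude.ai'];

// Return the matching AI host for a referrer URL, or null if none matches.
function aiSourceOf(referrer) {
  let host;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch {
    return null; // empty or malformed referrer
  }
  // Match the bare domain or any subdomain, but not lookalikes such as notchatgpt.com.
  return AI_REFERRER_HOSTS.find((h) => host === h || host.endsWith(`.${h}`)) || null;
}

console.log(aiSourceOf('https://www.perplexity.ai/search?q=squarespace+filter')); // perplexity.ai
console.log(aiSourceOf('https://www.google.com/'));                               // null
```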

I'll publish results in 30, 60, and 90 days.

FAQ

What is llms.txt? A markdown file at your site root that gives AI agents a structured, human-readable description of your site. The format is informal but converging on a convention: H1 name, blockquote tagline, sections for products, trust signals, contact, and AI policy.

Does Squarespace support llms.txt natively? No. As of June 2026, Squarespace has no built-in support for serving files at custom paths with custom content types. You either work around it with a URL slug page (which serves HTML, not markdown) or you proxy through Cloudflare with a Worker, which is what I did.

Should I let AI train on my content? Depends on your business model. If your revenue depends on the content itself (publishing, paywalled research, educational courses), say no. If your content exists to market a product or service, say yes — being baked into LLM training data turns every future conversation about your space into a potential sales channel.

Do I need Cloudflare for this? For a clean implementation, yes. There are workarounds inside Squarespace (URL slug pages serving HTML-wrapped content) but they compromise the content type, which matters to picky AI clients. Cloudflare's free tier covers this use case fully.

Will this hurt my regular SEO? No. Nothing in this setup affects HTML rendering, sitemaps, schema markup, or anything Googlebot relies on. The Worker only fires on /llms.txt and /robots.txt. Googlebot has no use for the first, and the augmented robots.txt is still valid: the Content-Signal block sits above the original rules, which Googlebot obeys as before.

Tools that make this easier on Squarespace

If you're running a Squarespace site and you want all this — plus the on-page schema, FAQ structured data, and AEO-optimized templates — without writing any code, my Chrome extension SquarespaceWebsites Tools PRO automates most of it. It's used by 9,000+ Squarespace designers daily.

→ See SquarespaceWebsites Tools PRO

→ Read the complete Squarespace SEO + AI Search guide

→ Browse all Squarewebsites plugins

Published June 28, 2026. The Worker code in this post is MIT-licensed — copy it, modify it, deploy it. If you do, I'd love to hear about it: info@squarespacewebsites.com.

FYI you can see our pages here: https://www.squarewebsites.org/robots.txt https://www.squarewebsites.org/llms.txt
