Robots (SEO & GEO)

The platform includes configurable Search Engine Optimization (SEO) and Generative Engine Optimization (GEO) controls. SEO manages how traditional search engines like Google and Bing index your site. GEO controls how AI systems like ChatGPT, Claude, and Gemini discover and use your content.

All robot controls are configured in Admin > Site Config > Robots and are restricted to Super admins.

Search Engine Robots

These settings map to standard directives from the Robots Exclusion Protocol, which all major search engines honor. They control the meta robots tag and the /robots.txt file.

Setting	Standard	Default	Effect
Allow Indexing	Robots Exclusion Protocol	On	When off, search engines will not add your pages to their results (`noindex`)
Allow Link Following	Robots Exclusion Protocol	On	When off, search engines will not follow links on your pages (`nofollow`)
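
As a minimal sketch of how the two toggles could translate into a meta robots tag: the function and parameter names below are hypothetical and only mirror the setting names in the table; they are not the platform's actual API.

```python
def robots_meta_tag(allow_indexing: bool = True,
                    allow_link_following: bool = True) -> str:
    """Build a meta robots tag from the two toggles (illustrative only)."""
    directives = [
        "index" if allow_indexing else "noindex",
        "follow" if allow_link_following else "nofollow",
    ]
    content = ", ".join(directives)
    return f'<meta name="robots" content="{content}">'


# With "Allow Indexing" switched off and link following left on:
print(robots_meta_tag(allow_indexing=False))
```

With both settings at their default (On), the sketch emits `index, follow`, which is also how search engines behave when no meta robots tag is present.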

AI & LLM Controls

These settings control whether AI systems can crawl and use your site content for training or generating responses. The platform uses a defense-in-depth approach with multiple enforcement layers.

Settings

Setting	Standard	Default	Effect
Allow AI Bots	Robots Exclusion Protocol	On	Master toggle. When off, all known AI bots are blocked via robots.txt
Allow AI Indexing	Experimental	On	When off, adds the `noai` meta directive to opt out of AI training and retrieval. Not all providers honor this.
Allow AI Image Use	Experimental	On	When off, adds the `noimageai` meta directive to opt out of AI image training. Not all providers honor this.
Blocked AI Bots	Robots Exclusion Protocol	Empty	Block specific bots by user-agent name, even when the master toggle is on
llms.txt Content	Emerging (llmstxt.org)	Empty (auto-generated)	Custom content for the `/llms.txt` endpoint. Leave empty to auto-generate from site config
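
The interaction between the master toggle and the blocked list can be sketched roughly as follows. This assumes the two are combined as a union when the master toggle is off; `KNOWN_AI_BOTS` is a shortened, hypothetical stand-in for the platform's bot list, and `ai_robots_rules` is an illustrative name, not a real platform function.

```python
# Shortened stand-in for the platform's list of known AI bots (assumption).
KNOWN_AI_BOTS = ["GPTBot", "ClaudeBot", "CCBot"]


def ai_robots_rules(allow_ai_bots: bool, blocked_ai_bots: list[str]) -> str:
    """Return robots.txt groups that disallow the selected AI bots."""
    # Master toggle off: every known bot is blocked.
    # Master toggle on: only the explicitly listed bots are blocked.
    targets = [] if allow_ai_bots else list(KNOWN_AI_BOTS)
    for bot in blocked_ai_bots:
        if bot not in targets:
            targets.append(bot)
    # One group per bot; each group disallows the whole site.
    return "\n\n".join(f"User-agent: {bot}\nDisallow: /" for bot in targets)


# Master toggle on, one bot blocked individually:
print(ai_robots_rules(True, ["Bytespider"]))
```

With the master toggle on and an empty blocked list, the sketch produces no AI-specific rules at all.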

Enforcement Layers

AI bot controls are enforced at multiple levels for maximum coverage:

robots.txt — Dynamic, database-driven rules. Primary enforcement for well-behaved crawlers.
Meta robots tags — noai and noimageai directives in the HTML <meta> tag.
X-Robots-Tag header — HTTP response header set by the proxy for AI bot user-agents. Hardcoded for performance (no database reads).
llms.txt — Structured site description endpoint that AI systems can consume.
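
To illustrate the header layer, here is a rough sketch of a user-agent check against a hardcoded list, so no database read is needed per request. The header value `noai, noimageai` and the substring-matching rule are assumptions made for the example; the source does not state what the proxy actually sends or how it matches.

```python
from typing import Optional

# Hardcoded, lowercase tokens taken from the Known AI Bots table.
AI_BOT_TOKENS = (
    "gptbot", "chatgpt-user", "anthropic-ai", "claudebot", "ccbot",
    "perplexitybot", "google-extended", "bytespider", "amazonbot",
    "facebookbot", "meta-externalagent",
)


def x_robots_tag_for(user_agent: str) -> Optional[str]:
    """Return an X-Robots-Tag value for AI bots, or None for other clients."""
    ua = user_agent.lower()
    if any(token in ua for token in AI_BOT_TOKENS):
        return "noai, noimageai"  # assumed header value
    return None


print(x_robots_tag_for("Mozilla/5.0 (compatible; GPTBot/1.1)"))
```

Ordinary browsers and search crawlers match none of the tokens, so the sketch returns `None` and no header would be added for them.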

Known AI Bots

The platform recognizes these AI bot user-agents for blocking and header enforcement:

Bot	Organization	Purpose
GPTBot	OpenAI	Training data & ChatGPT web browsing
ChatGPT-User	OpenAI	User-initiated real-time web search
anthropic-ai	Anthropic	Claude training data
ClaudeBot	Anthropic	Claude web search/retrieval
CCBot	Common Crawl	Open dataset used by many AI companies
PerplexityBot	Perplexity AI	Search engine crawling
Google-Extended	Google	Gemini training (NOT Google Search)
Bytespider	ByteDance	TikTok parent company crawler
Amazonbot	Amazon	Alexa and Amazon AI services
FacebookBot	Meta	Content preview and AI training
Meta-ExternalAgent	Meta	Meta AI assistant web browsing

Google-Extended only controls Gemini AI training. Blocking it does not affect your site's Google Search rankings or indexing.
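
This separation can be checked with the robots.txt parser in Python's standard library. The rules and the example.com URL below are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only Google-Extended.
robots_txt = """\
User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://example.com/products"
google_extended_allowed = parser.can_fetch("Google-Extended", url)
googlebot_allowed = parser.can_fetch("Googlebot", url)

print(google_extended_allowed)  # False: the Gemini training token is blocked
print(googlebot_allowed)        # True: Google Search crawling is unaffected
```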

llms.txt

The /llms.txt endpoint provides a structured text description of your site that AI systems can read. This is an emerging standard (see llmstxt.org) that helps AI chatbots accurately answer questions about your business.

If you leave the llms.txt content field empty, the platform auto-generates content from your site name, description, location, and contact information. For best results, write custom content that includes:

A clear description of what your business does
Your location and service area
Products and services offered
Contact information
FAQ section answering common customer questions

The FAQ section is particularly valuable for GEO. Because AI chatbots answer user questions, structuring your content as Q&A pairs maps directly to how your business appears in AI-generated responses.
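
As a sketch of what such content might look like when assembled from site fields, the example below builds a small llms.txt document with an FAQ section. The layout loosely follows the llmstxt.org proposal (a title heading followed by a blockquote summary); the business details are invented, and the platform's real auto-generated format may differ.

```python
from typing import Optional


def build_llms_txt(name: str, description: str, location: str, contact: str,
                   faq: Optional[list[tuple[str, str]]] = None) -> str:
    """Assemble llms.txt text from basic site fields (illustrative only)."""
    lines = [
        f"# {name}",
        "",
        f"> {description}",
        "",
        f"- Location: {location}",
        f"- Contact: {contact}",
    ]
    if faq:
        lines += ["", "## FAQ"]
        # One question/answer pair per block, separated by blank lines.
        for question, answer in faq:
            lines += ["", f"Q: {question}", f"A: {answer}"]
    return "\n".join(lines) + "\n"


# Hypothetical business data, for illustration only.
text = build_llms_txt(
    name="Acme Bakery",
    description="Family-run bakery selling bread and custom cakes.",
    location="Springfield and surrounding area",
    contact="hello@example.com",
    faq=[("Do you deliver?", "Yes, within 10 miles.")],
)
print(text)
```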
