> CITATION_PROTOCOL_01

Getting Cited by AI Is Not About Better Content.
It's About Formatting What You Already Have.

You have the expertise. The question is whether AI engines can extract it. These 7 steps address the extraction problem — not the expertise problem.

// Why Citations Compound Where Rankings Don't

When an AI engine cites your site, it positions you as the answer — not one of ten options. As you begin converting AI-referred visitors, you'll notice the difference: they arrive with the question already resolved. The citation is the trust transfer. AI search visits are doubling annually — and the brands named now are accumulating citation momentum that compounds.

70% of AI users trust cited sources
3x higher conversion than organic (AEOfix client data, n=110 brands)
Zero competition on the citation

// The Two-Layer System That Decides Who Gets Cited

Retrieval-Augmented Generation (RAG) is the technical process that powers real-time AI citations. When a user asks ChatGPT or Perplexity a question, the engine doesn't rely solely on its training data; it fetches live web content to supplement and verify its answer before responding. Training data and live retrieval are the two layers, and understanding how they differ is the foundation of any effective AEO (Answer Engine Optimization) strategy.

LAYER 01 // TRAINING DATA

Static Knowledge Layer

Built from billions of web pages crawled before a model's knowledge cutoff. Your content must have been crawled and indexed during this collection window to influence the model's base knowledge. This is where long-term GEO (Generative Engine Optimization) strategy plays out — think training crawler access, content longevity, and topical authority.

LAYER 02 // REAL-TIME RETRIEVAL

Live Citation Layer

When generating an answer, the engine searches the live web, selects the most authoritative and relevant sources, and cites them inline in its response. This is where AEO optimization has the most immediate, measurable impact — and what the 7 steps below are designed to optimize. Schema markup, E-E-A-T signals, and citable content structure all feed this layer.

Both layers require attention. Open crawler access via a permissive robots.txt, and build the extraction-ready content structure the framework below describes. As you implement both simultaneously, you'll see results first in the live citation layer (Layer 02), where changes can register within days.

// The 7-Step AI Citation Framework

STEP 01

Add Schema.org JSON-LD — The Single Highest-Impact Change Available

Schema markup gives AI engines machine-readable context about your content, which reduces hallucination risk and makes your site computationally cheaper to parse.

Priority schema types for citations: Organization, FAQPage, HowTo, Article, Product, LocalBusiness, and BreadcrumbList.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Company",
  "description": "What you do in one sentence",
  "url": "https://yoursite.com",
  "knowsAbout": ["Your expertise area"]
}
</script>
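If your pages come from a template or build script, you can generate this block instead of hand-editing it. The sketch below is one possible approach, not a required tool: the function name `organization_jsonld` and the sample values are illustrative assumptions, and only the schema.org property names come from the example above.

```python
import json

def organization_jsonld(name, description, url, knows_about):
    """Return an Organization JSON-LD <script> tag as a string (illustrative helper)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "description": description,
        "url": url,
        "knowsAbout": list(knows_about),
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value containing "</script>" cannot close the tag early.
    # "\/" is a valid JSON escape, so the payload still parses the same way.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"

# Placeholder values, matching the example above.
tag = organization_jsonld(
    name="Your Company",
    description="What you do in one sentence",
    url="https://yoursite.com",
    knows_about=["Your expertise area"],
)
print(tag)
```

Generating the tag from one data source keeps the schema in sync with the name, URL, and description used elsewhere on the site.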

Read our complete Schema Markup for AEO guide →

STEP 02

Open Your robots.txt — AI Engines Can Only Cite What They Can Access

Many sites unknowingly block GPTBot, ClaudeBot, and PerplexityBot. Check your robots.txt and explicitly allow the crawlers that matter.

Recommended robots.txt Configuration:

# AI Answer Engines (Allow for Real-Time Citations)
User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Applebot-Extended
Allow: /

# Training Data Collectors (Block or Allow Based on Strategy)
# Block if you want real-time citations only:
User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

# Allow if you want training data inclusion (GEO strategy):
# User-agent: CCBot
# Allow: /

# Reference your sitemap for all crawlers
Sitemap: https://yoursite.com/sitemap.xml

AI Crawler User Agents Reference:

AI Engine | User Agent | Purpose
ChatGPT Search | ChatGPT-User | Real-time browsing for answers
ChatGPT Training | GPTBot | Training data collection
Claude Search | Claude-Web | Real-time browsing for answers
Claude Training | ClaudeBot | Training data collection
Gemini/Bard | Google-Extended | AI features beyond traditional search
Perplexity | PerplexityBot | Real-time answer generation
Apple Intelligence | Applebot-Extended | AI features in iOS/macOS
Common Crawl | CCBot | Public training data archive
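To verify what your robots.txt actually permits, you can test it against the user agents in the table. The sketch below uses Python's standard `urllib.robotparser`; the helper name `ai_crawler_access`, the sample rules, and the test URL are assumptions for illustration. The standard-library parser may interpret edge cases differently from each crawler's own implementation, so treat the result as a first check.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the reference table above.
AI_USER_AGENTS = [
    "ChatGPT-User", "GPTBot", "Claude-Web", "ClaudeBot",
    "Google-Extended", "PerplexityBot", "Applebot-Extended", "CCBot",
]

def ai_crawler_access(robots_txt, url):
    """Map each AI user agent to True/False: may it fetch `url` under these rules?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_USER_AGENTS}

# Hypothetical rules: GPTBot allowed, CCBot blocked, everything else unspecified.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Allow: /

User-agent: CCBot
Disallow: /
"""

report = ai_crawler_access(SAMPLE_ROBOTS, "https://yoursite.com/pricing")
for agent, allowed in report.items():
    print(f"{agent:20} {'allowed' if allowed else 'BLOCKED'}")
```

With this parser, an agent that matches no group (and no `*` group) is treated as allowed, so the report mainly surfaces accidental blocks. To check a live site, fetch the file yourself and pass its text to the same function.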

Rate Limiting Considerations:
AI crawlers can generate significant server load. Monitor your analytics for crawl frequency spikes, bandwidth usage by AI user agents, and server response times. If you experience issues, you can use the Crawl-delay directive (though not all AI crawlers respect it):

User-agent: GPTBot
Crawl-delay: 10
Allow: /
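Before adding a Crawl-delay, it helps to measure how much traffic each AI crawler actually generates. The following is a minimal sketch, assuming your server writes the common "combined" access log format, where the user agent is the last quoted field. The log lines, IP addresses, and user-agent strings are made up for illustration; real crawler strings are longer.

```python
import re
from collections import Counter

# Tokens from the reference table above. Google-Extended is left out because it is
# a robots.txt control token, not a user agent that appears in access logs.
AI_AGENT_TOKENS = [
    "ChatGPT-User", "GPTBot", "Claude-Web", "ClaudeBot",
    "PerplexityBot", "Applebot-Extended", "CCBot",
]

# Assumption: combined log format, so the user agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')

def count_ai_hits(log_lines):
    """Count requests per AI crawler token found in each line's user-agent field."""
    hits = Counter()
    for line in log_lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match is None:
            continue  # line is not in the expected format
        user_agent = match.group(1).lower()
        for token in AI_AGENT_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break
    return hits

# Made-up sample lines (documentation IP range, simplified user agents).
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Jan/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "example GPTBot/1.0"',
    '203.0.113.5 - - [10/Jan/2026:10:00:02 +0000] "GET /faq HTTP/1.1" 200 900 "-" "example GPTBot/1.0"',
    '203.0.113.9 - - [10/Jan/2026:10:01:00 +0000] "GET / HTTP/1.1" 200 512 "-" "example PerplexityBot/1.0"',
    '198.51.100.7 - - [10/Jan/2026:10:02:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (ordinary browser)"',
]

ai_hits = count_ai_hits(SAMPLE_LOG)
print(ai_hits.most_common())
```

Run it over a day of logs at a time and compare the counts to spot crawl frequency spikes before they affect response times.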

Full robots.txt configuration guide →

STEP 03

Rewrite for Density — High Information Gain Is What Gets Extracted

AI engines prefer content with high information gain—unique facts, specific data points, and clear definitions that can be directly quoted. Fluff-filled SEO content gets skipped.

What makes content citable:

  • ✓ Specific numbers and data points (not vague claims)
  • ✓ Clear definitions in the format "X is Y"
  • ✓ Step-by-step processes with concrete actions
  • ✓ Original research, benchmarks, or case studies
  • ✓ Voice-search-friendly answers (concise, conversational sentences that AI reads aloud)
  • ✓ Content freshness signals—publish dates, update logs, and current-year data
  • ✗ Avoid: filler text, keyword stuffing, rehashed content
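One rough way to audit a draft against the checklist above is to count how many sentences carry a concrete number. The sketch below is a crude editorial proxy invented for illustration, not a metric any AI engine is known to use; the function name `fact_density` and the sample text are assumptions.

```python
import re

def fact_density(text):
    """Return the share of sentences that contain at least one digit (0.0 to 1.0)."""
    # Split after ., ! or ? followed by whitespace; decimals like 3.5 stay intact.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    with_numbers = sum(1 for s in sentences if re.search(r"\d", s))
    return with_numbers / len(sentences)

# Illustrative drafts: one vague, one specific.
vague = "Our service is fast. Customers love it. Results are great."
specific = "Median load time fell from 3.5 to 1.2 seconds. Support tickets dropped 40%. Customers noticed."

print(f"vague:    {fact_density(vague):.2f}")
print(f"specific: {fact_density(specific):.2f}")
```

A low score does not mean a page is bad, only that it may be worth checking whether vague claims can be replaced with specific data points.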

STEP 04

Establish Author Identity — Anonymous Sources Don't Get Cited

AI engines assess Experience, Expertise, Authoritativeness, and Trustworthiness (E-E-A-T) before citing a source. The stronger your authority signals, the more likely AI engines are to reference you.

E-E-A-T signals AI engines look for:

  • ✓ Named author with visible credentials and bio
  • ✓ Consistent NAP (Name, Address, Phone) data across pages
  • ✓ External citations and backlinks from authoritative domains
  • ✓ Published date and last-modified date on every page
  • ✓ Clear "About" page with company history and team bios
  • ✓ Brand mentions and positive sentiment across third-party sites and reviews
  • ✓ Topical authority—deep, interlinked content clusters that cover your subject comprehensively
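Two of the signals above, a named author and visible publish and modified dates, can also be expressed in schema markup. The sketch below builds Article JSON-LD with a Person author; the helper name `article_jsonld` and every sample value are placeholders, while the property names (`author`, `jobTitle`, `datePublished`, `dateModified`) are standard schema.org vocabulary.

```python
import json
from datetime import date

def article_jsonld(headline, author_name, author_title, published, modified):
    """Return Article JSON-LD (as a string) with a named Person author and ISO dates."""
    if modified < published:
        raise ValueError("dateModified cannot be earlier than datePublished")
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {
            "@type": "Person",
            "name": author_name,
            "jobTitle": author_title,
        },
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    return json.dumps(data, indent=2, ensure_ascii=False)

# Placeholder values for illustration only.
article_json = article_jsonld(
    headline="How to Size a Residential Heat Pump",
    author_name="Jane Example",
    author_title="Licensed HVAC Engineer",
    published=date(2026, 1, 10),
    modified=date(2026, 2, 1),
)
print(article_json)
```

Wrap the output in a `<script type="application/ld+json">` tag on the page, and keep the dates identical to the ones shown to readers.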

STEP 05

Publish llms.txt and ai.txt

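These machine-readable files are meant to help AI systems understand your site at a glance. llms.txt provides a structured summary for LLMs, while ai.txt declares your AI crawler preferences. Both are emerging conventions rather than ratified standards, so treat them as a low-cost supplement to robots.txt and schema markup, not a replacement.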

# llms.txt - Place at yoursite.com/llms.txt

# Company Name
Name: Your Company Name
URL: https://yoursite.com
Description: One-line description of what you do.

# Services
Services:
- Service 1: Description
- Service 2: Description

# Key Facts
Founded: Year
Location: City, State
Industry: Your industry
Specialization: Your expertise

# Contact
Email: contact@yoursite.com
Phone: +1-XXX-XXX-XXXX
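If your company details already live in a CMS or config file, you can render this file from that data rather than maintaining it by hand. The sketch below follows the key-value layout of the example above; the function name `render_llms_txt`, the `profile` dictionary shape, and the sample company are assumptions. Note that the community llms.txt proposal describes a Markdown-based layout, so adjust the template if you follow that format instead.

```python
def render_llms_txt(profile):
    """Render an llms.txt body in the key-value layout shown above (illustrative)."""
    lines = [
        "# Company Name",
        f"Name: {profile['name']}",
        f"URL: {profile['url']}",
        f"Description: {profile['description']}",
        "",
        "# Services",
        "Services:",
    ]
    lines += [f"- {service}: {summary}" for service, summary in profile["services"]]
    lines += [
        "",
        "# Contact",
        f"Email: {profile['email']}",
    ]
    return "\n".join(lines) + "\n"

# Hypothetical profile data.
profile = {
    "name": "Example Co",
    "url": "https://yoursite.com",
    "description": "One-line description of what you do.",
    "services": [
        ("Audit", "Site review for AI crawler access"),
        ("Schema", "JSON-LD implementation"),
    ],
    "email": "contact@yoursite.com",
}

llms_txt = render_llms_txt(profile)
print(llms_txt)
```

Write the result to the web root as part of your deploy so the file cannot drift from the facts published elsewhere on the site.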

STEP 06

Structure as Direct Q&A — Format Your Answers the Way AI Engines Extract Them

AI engines generate answers to questions. If your content directly answers common questions in your industry, you become the natural citation target. Use FAQPage schema to mark these up.

How to find the right questions:

  • ✓ Ask ChatGPT, Claude, and Perplexity questions about your industry and note which sources they cite
  • ✓ Use Google's "People Also Ask" for your core keywords
  • ✓ Check Google Search Console for question-format queries that already surface your site
  • ✓ Review competitor FAQ pages for gaps you can fill better

Well-structured Q&A content also feeds featured snippets, AI Overviews, and voice search results—all channels where AI engines select a single authoritative answer to present.
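Once you have a list of questions and direct answers, the FAQPage markup can be generated from it. The sketch below is a minimal illustration: the helper name `faq_jsonld` and the sample question are assumptions, while the `FAQPage`, `Question`, `acceptedAnswer`, and `Answer` names are standard schema.org vocabulary.

```python
import json

def faq_jsonld(pairs):
    """Build FAQPage JSON-LD (as a string) from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)

# Illustrative Q&A pair; note the answer leads with the direct answer.
faq_pairs = [
    (
        "What is AEO?",
        "AEO is the practice of structuring content so AI answer engines can extract and cite it.",
    ),
]

faq_json = faq_jsonld(faq_pairs)
print(faq_json)
```

Keep the marked-up text identical to the visible question and answer on the page, and place the output inside a `<script type="application/ld+json">` tag.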

STEP 07

Track Your Citation Rate — You Cannot Optimize What You Cannot Measure

Track which AI engines cite your content, for which queries, and how your citation rate changes over time.

A Source Map Report tests 150+ queries across ChatGPT, Claude, Gemini, and Perplexity to show exactly where you appear, where you don't, and who gets cited instead.

As you begin the citation framework, establish your baseline first. Knowing where you currently appear — and where your competitors appear instead — defines the specific gap you're closing.
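A baseline can be as simple as a spreadsheet of test queries and the sources each engine cited. The sketch below shows one way to turn such records into a per-engine citation rate. The record layout, the function name `citation_rates`, and the sample data are all assumptions for illustration; this is not how the Source Map Report is computed.

```python
from collections import defaultdict

def citation_rates(results, domain):
    """Return {engine: share of test queries where `domain` was cited}.

    `results` is an iterable of (engine, query, cited_domains) records that you
    collected by hand or with your own tooling.
    """
    totals = defaultdict(int)
    cited = defaultdict(int)
    for engine, _query, cited_domains in results:
        totals[engine] += 1
        if domain in cited_domains:
            cited[engine] += 1
    return {engine: cited[engine] / totals[engine] for engine in totals}

# Made-up sample records: which domains each engine cited for each test query.
sample_results = [
    ("ChatGPT", "best crm for plumbers", {"yoursite.com", "competitor.com"}),
    ("ChatGPT", "crm pricing comparison", {"competitor.com"}),
    ("Perplexity", "best crm for plumbers", {"yoursite.com"}),
    ("Perplexity", "crm pricing comparison", {"yoursite.com", "competitor.com"}),
]

rates = citation_rates(sample_results, "yoursite.com")
for engine, rate in rates.items():
    print(f"{engine}: cited in {rate:.0%} of test queries")
```

Re-run the same query set on a fixed schedule so the rate reflects changes to your site rather than changes to the test.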

Get a Source Map Report — $59

// Common Mistakes That Kill AI Citations

These are the structural errors that prevent AI citation regardless of content quality. As you audit your own site, you'll likely find 2–3 of them already in place:

  1. Blocking AI crawlers in robots.txt. If GPTBot, ClaudeBot, or PerplexityBot are blocked, those engines literally cannot access your content. Always verify your robots.txt explicitly allows AI crawlers.
  2. Using Microdata instead of JSON-LD for schema. JSON-LD lives in a single script block, separate from your page markup, which makes it simpler to parse and maintain than Microdata woven through the HTML. Google recommends JSON-LD, so use it as your default format.
  3. Generic, keyword-stuffed content. AI engines skip low-information-density content. If your page says "best practices for X" 47 times but never actually lists the practices, you won't get cited.
  4. Missing Organization schema on homepage. Without it, AI engines have no entity anchor for your brand. You're anonymous content. This is the single most critical schema type.
  5. No author attribution, or a generic "Staff Writer" byline. AI engines assess E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness). Anonymous or generic authors weaken citation probability. Use real names with credentials.
  6. Hiding content behind JavaScript-heavy frameworks. If your content isn't in the HTML source (viewable with "View Source"), AI crawlers may not see it. Server-side rendering or static generation is preferred.
  7. Outdated or missing publish dates. AI engines use datePublished and dateModified to assess freshness. Stale dates = stale citations.
  8. FAQPage answers that are too long or vague. AI engines extract the first 40-60 words. If your FAQ answer is 300 words of preamble before the actual answer, it won't get cited. Lead with the direct answer.
  9. Gating content behind login or paywall. AI crawlers can't (and won't) log in or pay. If your best content is gated, it's invisible to AI engines. Make key information publicly accessible.
  10. Inconsistent NAP (Name, Address, Phone) data. If your contact info differs across pages, schema, and third-party sites, AI engines flag the inconsistency and skip you. Maintain exact consistency everywhere.
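Mistake 6 is easy to test for: fetch the raw HTML and check whether a key sentence is present as visible text, without running any JavaScript. The sketch below uses Python's standard `html.parser`; the class and function names and both HTML samples are assumptions for illustration.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text from raw HTML, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def phrase_in_source(html, phrase):
    """True if `phrase` appears in the HTML as visible text, with no JS executed."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return phrase.lower() in " ".join(parser.chunks).lower()

# Hypothetical pages: one server-rendered, one filled in by JavaScript at runtime.
server_rendered = "<html><body><h1>Pricing</h1><p>Plans start at $59.</p></body></html>"
client_rendered = '<html><body><div id="root"></div><script>render("Plans start at $59.")</script></body></html>'

print(phrase_in_source(server_rendered, "Plans start at $59"))
print(phrase_in_source(client_rendered, "Plans start at $59"))
```

If the phrase is only found inside a script, as in the second sample, a crawler that does not execute JavaScript may never see it.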

// Engine-Specific Citation Strategies

General AEO principles apply across all four platforms. As you implement per-engine tactics, you'll see disproportionate gains on specific queries — because each engine has a distinct selection mechanism.

ChatGPT Search Citation Optimization

ChatGPT Search uses a dual-layer architecture: Bing's web index as its primary citation pool and GPTBot crawl data for supplemental content. This means Bing indexing is more important for ChatGPT citations than Google indexing.

  • Bing Indexing first: Submit your sitemap to Bing Webmaster Tools — this is the most direct path to ChatGPT's citation pool
  • GPTBot access: Explicitly allow User-agent: GPTBot and User-agent: ChatGPT-User in robots.txt
  • FAQPage schema priority: ChatGPT has the strongest correlation between FAQPage markup and citation selection of all major engines
  • Lead with the answer: ChatGPT prefers content where the first sentence directly answers the question — no preamble
  • Domain authority in Bing: Backlinks from high-authority domains in your niche directly improve ChatGPT citation probability via Bing's domain trust scores
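Submitting a sitemap to Bing Webmaster Tools presupposes you have one that is current. The sketch below builds a minimal sitemap.xml with `<lastmod>` dates using Python's standard `xml.etree.ElementTree`; the helper name `build_sitemap` and the page list are placeholders, and the namespace URL is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Return sitemap XML (as a string) for (url, last_modified_iso_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Placeholder pages; use your real URLs and the dates the content last changed.
pages = [
    ("https://yoursite.com/", "2026-01-10"),
    ("https://yoursite.com/faq", "2026-02-01"),
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Save the output as sitemap.xml at the site root, reference it from robots.txt, and submit the same URL in Bing Webmaster Tools.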

Perplexity Citation Optimization

Perplexity is the most citation-transparent engine — it always shows numbered source links. This makes it both the most measurable and the most content-quality-sensitive platform. It runs a live web search for every query, meaning content published today can be cited today.

  • Freshness is your fastest lever: No crawl delay — update content and it can be cited within hours
  • Information density wins decisively: A 1,200-word page dense with original data will beat a 3,000-word page of generic advice every time
  • PerplexityBot access: Explicitly allow User-agent: PerplexityBot in robots.txt
  • Citation-worthy sentence structure: Write paragraphs that open with a direct, stand-alone factual statement — Perplexity extracts the first complete citable sentence
  • Source authority compound effect: Pages already cited by other credible sources get a compounding advantage — Perplexity uses backlink signals as authority proxies

Claude (Anthropic) Citation Optimization

Claude's citation behavior reflects Anthropic's emphasis on well-structured, authoritative, and factually grounded content. Claude is notably more sensitive to logical content organization and author credibility than other engines.

  • llms.txt and ai.txt: Claude specifically looks for these machine-readable files at your domain root to understand your site's scope and content preferences
  • Author attribution is weighted heavily: Named expert authors with Person schema and verifiable credentials score significantly higher than anonymous or generic bylines
  • ClaudeBot + Claude-Web access: Allow both crawlers in robots.txt — ClaudeBot for training, Claude-Web for real-time browsing
  • Clear heading hierarchy: Claude parses logical H2/H3 document structure as an authority signal — well-organized content is rated more trustworthy
  • Factual consistency: Internally consistent facts with no contradictory claims across a site receive higher confidence — Anthropic's training emphasizes accuracy

Gemini Citation Optimization

Gemini's citation behavior is deeply connected to Google's existing data infrastructure — Knowledge Graph, Search Console signals, and Google's core index. Existing Google Search authority translates directly to Gemini citations more than any other engine pairing.

  • Knowledge Graph entity: A verified Google Business Profile and Wikidata entity create a direct citation pathway unique to Gemini
  • Google-Extended token: Allow User-agent: Google-Extended in robots.txt. It is a control token rather than a separate crawler: Google's existing crawlers fetch the page, and the token decides whether that content may be used for Gemini
  • Schema.org influence: Schema markup feeds the same data model Google uses for Knowledge Graph, giving it more direct impact on Gemini than other engines
  • Google Business Profile (local queries): For location-based queries, Gemini cites GBP data first — keep hours, categories, and reviews current
  • AI Overviews overlap: Gemini and Google AI Overviews share source selection logic — optimizing for one improves the other

Google AI Overviews Optimization

AI Overviews appear in 47% of US Google searches as of 2026, synthesizing answers from 2–5 sources above all traditional results. The selection criteria overlap with traditional SEO but weight topical authority and structured data much more heavily.

  • Topical authority clusters: AI Overviews strongly prefer domains with deep content clusters (pillar + 8+ supporting pages on the same topic)
  • Featured snippet foundation: Content that earns featured snippets has the highest AI Overviews selection probability — same format, direct answer first
  • FAQPage + HowTo schema: These are the two most correlated schema types with AI Overviews selection
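  • Googlebot access: AI Overviews are part of Google Search and draw on Googlebot's crawl, so keep Googlebot unblocked. Google-Extended governs Gemini usage and, per Google's documentation, does not affect inclusion in Search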
  • E-E-A-T inheritance: AI Overviews cite sources Google already trusts from organic search — improving core E-E-A-T signals is the most reliable long-term strategy

Full Google AI Overviews optimization guide →

Microsoft Copilot optimization guide (Bing-specific tactics) →

Implementation Is Available If the Scope Is More Than You Want to Handle Alone

All 7 steps are available as a managed implementation. Schema markup, crawler configuration, E-E-A-T signals, and 30-day verification — one-time pricing, no retainers. As you review the packages, you'll see exactly which scope matches your current gap.

View AI Readiness Plans
Start with a Source Map
William Bouch
AEO Architect & Founder of AEOfix. Former construction worker turned full-stack developer. Engineering-driven AI visibility optimization.