PRODUCT LAUNCH

Does ChatGPT Actually Crawl Your Website?
Now You Can Know for Certain.

You've implemented schema markup. You've optimized your content for AI citation. You've added llms.txt. But here's the question nobody in AEO can currently answer: has GPTBot actually visited your site?

By William Bouch · February 21, 2026 · 8 min read

The Feedback Gap in AEO

There's a problem at the center of every AEO engagement that nobody talks about openly: you are optimizing for machines you cannot observe.

Traditional SEO has Google Search Console. You can see exactly which queries triggered impressions, which pages ranked, and when Googlebot last crawled each URL. The feedback loop is tight — make a change, verify it, iterate.

AEO has none of that. You implement schema markup, restructure your content for AI citation, build entity signals — and then you wait. You query ChatGPT manually and hope your brand shows up. You run an AI visibility scan weeks later. There is no live signal. There is no proof that the crawl even happened.

The core problem: Every analytics tool on the market — Google Analytics, Plausible, Fathom, Cloudflare — is built to filter bots out. Their product goal is accurate human traffic data. AI crawler visits don't appear anywhere in your existing stack.

We built AI Bot Tracker to close this gap entirely.

Three Questions Every AEO Client Needs Answered

When we analyzed what our clients actually needed — not what they asked for, but what decisions they were trying to make — it came down to three questions:

1. Is my site being crawled by AI engines at all?

This is the foundational question. Before you can be cited, you must be crawled. Before you optimize content, you need to know whether the bots can even reach your pages. If GPTBot has never visited, your schema markup is irrelevant — it hasn't been seen yet. This is crawl confirmation, and without it you are working from assumption.

2. Which AI engines are visiting — and what are they doing?

Not all AI crawlers are the same. The distinction matters enormously for strategy, and no existing tool surfaces it:

| Category | Bots | What it means |
|---|---|---|
| AI Training | GPTBot, ClaudeBot, Google-Extended, Mistral AI, Common Crawl | Your content is being ingested into future model training data |
| AI Search | OAI-SearchBot, PerplexityBot, Grok/xAI, Brave Search | Your content is being retrieved for real-time AI search answers right now |
| AI Assistant | Amazonbot/Alexa, Applebot-Extended | Your content is being indexed for voice and assistant responses |

An AI Training crawl and an AI Search crawl require different optimization responses. The tracker tells you which you're getting.
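To make the taxonomy concrete, here is a minimal sketch of how a User-Agent string maps onto the categories above. The crawler tokens are the publicly documented ones; the function name, types, and pattern list are our illustration, not AI Bot Tracker's actual code or full 35-bot library.

```typescript
// Illustrative sketch: map a User-Agent string to the bot categories above.
// `classifyBot` and `BOT_PATTERNS` are hypothetical names, not the product's API.
type BotCategory = "AI Training" | "AI Search" | "AI Assistant" | "Unknown";

const BOT_PATTERNS: Array<[RegExp, string, BotCategory]> = [
  [/GPTBot/i, "GPTBot", "AI Training"],
  [/ClaudeBot/i, "ClaudeBot", "AI Training"],
  [/Google-Extended/i, "Google-Extended", "AI Training"],
  [/CCBot/i, "Common Crawl", "AI Training"],
  [/OAI-SearchBot/i, "OAI-SearchBot", "AI Search"],
  [/PerplexityBot/i, "PerplexityBot", "AI Search"],
  [/Amazonbot/i, "Amazonbot", "AI Assistant"],
  [/Applebot-Extended/i, "Applebot-Extended", "AI Assistant"],
];

function classifyBot(userAgent: string): { name: string; category: BotCategory } {
  for (const [pattern, name, category] of BOT_PATTERNS) {
    if (pattern.test(userAgent)) return { name, category };
  }
  // Anything unmatched is treated as a human or an unnamed crawler.
  return { name: "unknown", category: "Unknown" };
}
```

The key design point is that the same mechanism (pattern match on User-Agent) yields two different outputs: a name for reporting and a category for strategy.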

3. After I made changes, did the bots come back?

This is the question that turns passive monitoring into an active optimization feedback loop. When you implement schema markup, restructure a landing page, or add new content — did a bot return to re-crawl it? Revisit tracking shows you exactly that: which pages got a second look, how many days passed between visits, and which bots showed the most interest in your content after a change.

The signal you've been missing: A GPTBot revisit 11 days after schema implementation is the closest thing to proof that your AEO work was noticed. Without revisit tracking, you can never connect your changes to crawler behavior.
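Computing that gap is simple once visits are timestamped. A minimal sketch — the `Visit` shape and function name are our illustration, not the tracker's schema:

```typescript
// Illustrative: given timestamped visits from one bot to one page, compute
// the number of days between the two most recent visits.
interface Visit { botName: string; page: string; timestamp: Date; }

function revisitGapDays(visits: Visit[]): number | null {
  // Sort oldest → newest, then compare the last two visits.
  const sorted = [...visits].sort((a, b) => a.timestamp.getTime() - b.timestamp.getTime());
  if (sorted.length < 2) return null; // first visit: nothing to compare yet
  const last = sorted[sorted.length - 1].timestamp.getTime();
  const prev = sorted[sorted.length - 2].timestamp.getTime();
  return Math.round((last - prev) / 86_400_000); // 86.4M ms per day
}
```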

How It Works: The Technical Approach

We spent time on this decision. The tracking mechanism needed to work on every platform — WordPress, Shopify, Webflow, Squarespace, raw HTML, React — without requiring a JavaScript runtime, a plugin, a CMS extension, or server access. It needed to be invisible to users but visible to every bot that crawls HTML.

The answer is a 1×1 transparent tracking pixel — a 43-byte GIF served from a Vercel serverless function.

YOUR EMBED CODE

<!-- AI Bot Tracker by AEOfix -->
<img src="https://aeofix.com/api/bot-pixel?site=YOUR_ID&page=/"
     width="1" height="1" style="display:none" alt="">

When any visitor — human or bot — requests a page, it also requests that image from our endpoint. Our function reads the User-Agent header, identifies the bot against our pattern library of 35 named crawlers, logs the visit with geo data, checks for a prior visit in the last 30 days, and returns the pixel in under 100ms. The visitor never knows. The bot never knows. But you now have a timestamped record.
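The request flow above can be sketched as a pure function. This is not AEOfix's actual source: the in-memory `log` array stands in for the real datastore, the User-Agent is passed through unmatched, and the pixel bytes are one common minimal 1×1 transparent GIF.

```typescript
// Sketch of the pixel endpoint's core flow (illustrative, not production code).
// A 1x1 transparent GIF, base64-encoded — 43 bytes once decoded.
const PIXEL = Buffer.from(
  "R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==",
  "base64"
);

interface LoggedVisit { bot: string; page: string; at: number; }

function handleHit(
  userAgent: string,
  page: string,
  log: LoggedVisit[],           // stands in for the real datastore
  now: number = Date.now()
): { body: Buffer; revisit: boolean } {
  const bot = userAgent;        // real code matches against ~35 named patterns
  const THIRTY_DAYS = 30 * 86_400_000;
  // A revisit is a second hit on the same page by the same bot within 30 days.
  const revisit = log.some(
    (v) => v.bot === bot && v.page === page && now - v.at <= THIRTY_DAYS
  );
  log.push({ bot, page, at: now });
  return { body: PIXEL, revisit };
}
```

In production the response would also carry `Content-Type: image/gif` and no-cache headers so every page load triggers a fresh hit.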

A few things worth noting about this approach:

  • No JavaScript required. Bots that crawl for AI training typically do not execute JavaScript. A pixel embedded in the raw HTML is the only reliable detection method for these crawlers.
  • Zero CORS issues. Image requests have no cross-origin restrictions. The embed works identically across every platform and domain.
  • Zero external API cost. Geo data comes from Vercel's built-in headers — country and city are attached to every request automatically. No paid geo API, no latency overhead.
  • Prompt injection protection. The endpoint runs security screening on every request. Adversarial bots that embed instructions in their User-Agent strings to attempt data manipulation are silently dropped. The pixel returns; nothing is logged.
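On the last point, one way such screening can work is a heuristic check before anything touches the database. The rules below are purely illustrative — AEOfix's actual screening logic is not public — but they show the shape of the defense:

```typescript
// Illustrative heuristic only; the production screening rules are not public.
// Flags User-Agent strings that look like injection payloads rather than
// crawler identifiers: instruction phrases, template markers, script tags,
// or implausible length.
const SUSPICIOUS: RegExp[] = [
  /ignore (all |previous |prior )?instructions/i,
  /system prompt/i,
  /\{\{.*\}\}/,   // template-injection markers
  /<script/i,     // stored-XSS attempt aimed at the dashboard
];

function isSuspiciousUserAgent(ua: string): boolean {
  if (ua.length > 512) return true; // legitimate crawler UAs are short
  return SUSPICIOUS.some((p) => p.test(ua));
}
```

A flagged request would still receive the pixel (so the attacker learns nothing) while the log write is skipped — the "silently dropped" behavior described above.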

What Your Dashboard Shows

Every bot visit is stored and surfaced in a real-time dashboard accessible with your API key. You get:

  • Total visits, unique bots, pages crawled, revisit count, countries — summary stats for any 7, 30, or 90-day window
  • Top bots table — named bot, category badge, total visits, revisit count, average days between revisits, last seen timestamp
  • Category breakdown — how visits split across AI Training, AI Search, AI Assistant, Search Index, Social Media, and SEO Tools
  • Top pages crawled — which URLs get the most bot attention and from how many distinct crawlers
  • Country distribution — where in the world the crawl activity originates
  • Live activity feed — a chronological stream of every detected bot visit, with bot name, category, page path, geo, and revisit flag
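All of those summary numbers are simple aggregations over raw visit rows. A sketch of how they could be derived — the `VisitRow` field names are our assumption, since the dashboard's real schema is not public:

```typescript
// Illustrative aggregation from raw visit rows to dashboard summary stats.
interface VisitRow { bot: string; page: string; country: string; revisit: boolean; }

function summarize(rows: VisitRow[]) {
  return {
    totalVisits: rows.length,
    uniqueBots: new Set(rows.map((r) => r.bot)).size,
    pagesCrawled: new Set(rows.map((r) => r.page)).size,
    revisits: rows.filter((r) => r.revisit).length,
    countries: new Set(rows.map((r) => r.country)).size,
  };
}
```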

Works Everywhere. One Line of HTML.

We tested the embed against every major platform before launch. The answer is the same everywhere: paste it before </body> and it works.

WordPress
Shopify
Webflow
Squarespace
Wix
Framer
Ghost
Next.js
Raw HTML

Why No One Has Built This Before

The honest answer: the market for AI-crawler-specific monitoring didn't exist two years ago. General analytics tools were built when "bot" meant something to filter out — a noise problem, not a signal. Their architecture reflects that assumption. Filtering bots in requires rebuilding from scratch around a different question.

The second reason: the classification taxonomy is non-trivial. Identifying GPTBot is easy — the pattern is public. But knowing that GPTBot is a training crawler while OAI-SearchBot is a real-time retrieval crawler, and that the distinction matters for how you should respond to each — that's the institutional knowledge that makes the data useful rather than just interesting.

The third reason: prompt injection protection. An endpoint that reads User-Agent strings and logs them to a database is a potential attack surface for adversarial AI agents. Handling that correctly — detecting injection attempts, sanitizing inputs, maintaining 60-second deduplication windows — adds meaningful engineering overhead that most analytics projects skip.
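The 60-second deduplication window mentioned above can be sketched in a few lines. This is an in-memory illustration of the idea, not the serverless implementation, which would need shared state across function invocations:

```typescript
// Illustrative 60-second deduplication: drop a hit if the same bot + page
// pair was already logged within the last minute.
const lastSeen = new Map<string, number>();

function shouldLog(bot: string, page: string, now: number): boolean {
  const key = `${bot}|${page}`;
  const prev = lastSeen.get(key);
  if (prev !== undefined && now - prev < 60_000) return false; // duplicate hit
  lastSeen.set(key, now);
  return true;
}
```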

We built all of it because we needed it ourselves, and it works.

Pricing & What's Coming

AI Bot Tracker will launch at three tiers:

  • Starter — $29/mo: 1 domain, all 35 bots, 30-day history, full dashboard
  • Pro — $79/mo: Up to 3 domains, 90-day history, CSV export, API access
  • Agency — $199/mo: Unlimited domains, 1-year history, white-label dashboard, webhook alerts

We are launching with a waitlist. When we hit five confirmed signups, the service goes live and founding members get access first.

Early Access

Be Among the First Five

Join the waitlist. When we hit 5 signups, we go live and you get first access. No payment until launch.

Join the Waitlist