GA4 + AI agents

How to track AI agent traffic in GA4 (and what GA4 can't tell you)

The exact channel-group setup that surfaces ChatGPT, Claude, Perplexity, Gemini, and Copilot referrals. Plus the four things GA4 will still miss when an agent is doing the shopping.

The setup is ten minutes. The interesting part is what it doesn't catch — and the diagnostic gap that opens up once your team starts asking why agent-driven sessions don't convert.

Step 1

The five referrers GA4 currently buckets as "Direct"

Each of these services sends a real `Referer` header when a user clicks a citation in the answer panel. By default GA4 treats most of them as Direct because they aren't in the canonical Referral list. Adding them as a custom channel group exposes them.

| Service | Referrer hostnames | Note |
| --- | --- | --- |
| ChatGPT | chatgpt.com, chat.openai.com | Sends referrer on user-driven citation click. GA4 sees it as Direct without a rule. |
| Claude | claude.ai, claude.com | Citation click from Claude's answer surface. |
| Perplexity | perplexity.ai, www.perplexity.ai | Source-link click from a Perplexity answer. |
| Gemini | gemini.google.com | Direct citation click from Gemini's answer panel. Google AI Mode citations appear under google.com. |
| Copilot | copilot.microsoft.com | Microsoft Copilot citation click. |
Step 2

The exact GA4 channel group

This adds an "AI Assistants" row to your acquisition reports. Existing channel groups are not affected.

  1 / Open the channel-group editor

    GA4 Admin → Data display → Channel groups → Create new channel group. Name it "AI Assistants."

  2 / Add the rule

    Add a new channel inside the group. Match condition: Source matches regex.

  3 / Paste the regex

    The regex below covers the five services above plus the new browser-mode surfaces (ChatGPT Atlas, Perplexity Comet) where they pass referrers.

  4 / Save and verify

    Reports → Acquisition → User acquisition → set the channel group dropdown to "AI Assistants" → check that you see traffic. Backfill is up to GA4; expect 24–48h for historical data.

Source matches regex:

```
chatgpt|chat\.openai|claude\.ai|claude\.com|perplexity\.ai|gemini\.google|copilot\.microsoft|comet\.perplexity
```
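Before saving, the pattern can be sanity-checked locally. A quick Node sketch, with the regex copied verbatim from above and hostnames taken from the Step 1 table:

```javascript
// The channel-group regex from above, compiled for a local sanity check.
const AI_ASSISTANT_SOURCE =
  /chatgpt|chat\.openai|claude\.ai|claude\.com|perplexity\.ai|gemini\.google|copilot\.microsoft|comet\.perplexity/;

// Hostnames from the Step 1 table: all should match.
const aiSources = [
  'chatgpt.com',
  'chat.openai.com',
  'claude.ai',
  'www.perplexity.ai',
  'gemini.google.com',
  'copilot.microsoft.com',
];

// Ordinary sources: none should match. Note that gemini\.google
// deliberately does not catch plain google.com.
const ordinarySources = ['google.com', 'bing.com', 'duckduckgo.com'];

const allAiMatch = aiSources.every((host) => AI_ASSISTANT_SOURCE.test(host));
const noOrdinaryMatch = ordinarySources.every((host) => !AI_ASSISTANT_SOURCE.test(host));

console.log(allAiMatch, noOrdinaryMatch); // true true
```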

If you use Google Tag Manager, you can also push the assistant identity into a custom dimension at the dataLayer level — useful for cross-segment funnels. Reach out if you want the GTM snippet.
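For illustration only, the shape of that idea as a GTM Custom HTML tag looks roughly like this. This is a sketch, not the snippet referenced above; the event and key names (`ai_assistant_visit`, `ai_assistant`) are assumptions you would define yourself, not GTM built-ins.

```javascript
// Sketch: map document.referrer to an assistant name and push it to the
// dataLayer, so GTM can forward it to GA4 as a custom dimension.
var ASSISTANT_PATTERNS = [
  ['chatgpt', /chatgpt\.com|chat\.openai\.com/],
  ['claude', /claude\.(ai|com)/],
  ['perplexity', /perplexity\.ai/],
  ['gemini', /gemini\.google\.com/],
  ['copilot', /copilot\.microsoft\.com/],
];

function detectAssistant(referrer) {
  for (var i = 0; i < ASSISTANT_PATTERNS.length; i++) {
    if (ASSISTANT_PATTERNS[i][1].test(referrer)) return ASSISTANT_PATTERNS[i][0];
  }
  return null;
}

// Guarded so the sketch also runs outside a browser.
if (typeof document !== 'undefined') {
  var assistant = detectAssistant(document.referrer || '');
  if (assistant) {
    window.dataLayer = window.dataLayer || [];
    window.dataLayer.push({ event: 'ai_assistant_visit', ai_assistant: assistant });
  }
}
```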

Step 3 (the important one)

What GA4 will still miss

The setup above catches the slice of AI traffic that travels with a referrer. Most of it does not.

70.6% of AI-driven sessions arrive with no referrer

Loamly's 2026 study of 446,000 AI-driven sessions found 70.6% had no referrer header. Headless browser agents (Operator, ChatGPT Atlas, Perplexity Comet, Claude with Playwright MCP) navigate via direct URL fetch. GA4 logs them as Direct, regardless of how many regex rules you add.

ChatGPT's in-product cards never click through

When ChatGPT shows a product card inside the chat — image, price, link — and the user buys via the in-chat path, your site sees zero traffic. OpenAI's Instant Checkout flow runs through Stripe and a Shopify connection that bypasses your storefront entirely.

Arrival is not journey

GA4 can tell you the visit happened. It cannot tell you that the agent reached the product page, tried to click a roleless variant selector, retried twice, and exited. The entire failure mode lives between landing-page and conversion-event, in a black box.

Cursor-based replay tools are blind to agents

Hotjar, Contentsquare, FullStory, and Lucky Orange capture mouse movement, scrolls, and clicks. Browser agents produce none of those. Heatmaps of agent sessions are flat. Recordings are silent.

Reference

AI agent user-agent strings — what they are and what they catch

These are the user-agents you'll see in server logs but mostly not in GA4: the crawlers and indexers never fire client-side scripts, and the live fetchers only sometimes do. Sources linked at the bottom of the page.

| Bot | Owner | Purpose | robots.txt directive | Visible in GA4? |
| --- | --- | --- | --- | --- |
| GPTBot | OpenAI | Training data crawl | `User-agent: GPTBot` | No — server log only |
| OAI-SearchBot | OpenAI | Search index for ChatGPT search | `User-agent: OAI-SearchBot` | No — indexer |
| ChatGPT-User | OpenAI | Live fetch on user instruction ("go to X and tell me Y") | `User-agent: ChatGPT-User` | Sometimes — fires GA if it loads JS |
| ClaudeBot | Anthropic | Training data crawl | `User-agent: ClaudeBot` | No |
| anthropic-ai | Anthropic | Older training crawler (still in use) | `User-agent: anthropic-ai` | No |
| Claude-User | Anthropic | Live user-driven fetch via Claude.ai | `User-agent: Claude-User` | Sometimes |
| PerplexityBot | Perplexity | Index crawl | `User-agent: PerplexityBot` | No |
| Perplexity-User | Perplexity | Live user-driven fetch | `User-agent: Perplexity-User` | Sometimes |
| Google-Extended | Google | Training opt-out token (block to exclude from Gemini training) | `User-agent: Google-Extended` | No |
| GoogleOther | Google | Multi-purpose Google fetcher | `User-agent: GoogleOther` | No |

Sources: OpenAI bot documentation, Anthropic bot documentation, Perplexity crawler docs, Google special-case crawlers.

The diagnostic gap

Three layers of analytics. The middle one is empty.

GA4 catches the arrival — the visit happened. Hotjar and Contentsquare capture human behaviour on the page. Cloudflare and HUMAN catch and block bots. Nothing in your stack tells you what an agent did between landing and giving up. That gap is where Serge sits.

Free scan · 30 seconds · no sign-up

Want a snapshot of where you stand? Scan your store.

Enter your domain. Serge crawls your store in about thirty seconds (fully deterministic, no AI in the scan loop) and returns a quick snapshot: where agents may be getting blocked, what to check first, and suggested fixes for your team to verify. No sign-up. Forward the result to your frontend lead.

The scan is the starting point. The deeper picture comes from session visibility, replay, and briefing.

Common questions

Common questions about GA4 + AI traffic

Will adding the regex break my existing channel groups?

No. Custom channel groups in GA4 are additive; they sit alongside the Default Channel Group. Reports use the Default Channel Group until you switch the channel-group dropdown to "AI Assistants" for the breakdown.

Why doesn't GA4 catch ChatGPT-User or Claude-User if they execute JavaScript?

Sometimes it does. Live user-driven fetchers (ChatGPT-User, Claude-User, Perplexity-User) fetch on the user's behalf and may run client-side scripts. When they do, gtag.js fires and you'll see the visit. The catch: the user-agent string carries the "ChatGPT-User" marker, but GA4's standard reports never surface the user-agent, so the session still lands in Direct. You'd need a custom dimension keyed on the user-agent to break those out.
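A rough sketch of that custom-dimension approach. The parameter name `agent_ua_label` and the event name `agent_detect` are illustrative assumptions; you would register whatever name you choose under Admin → Custom definitions before it shows up in reports.

```javascript
// Sketch: label the user-agent client-side and send the label to GA4
// as an event parameter, which can then back a custom dimension.
function agentLabel(ua) {
  if (ua.indexOf('ChatGPT-User') !== -1) return 'chatgpt-user';
  if (ua.indexOf('Claude-User') !== -1) return 'claude-user';
  if (ua.indexOf('Perplexity-User') !== -1) return 'perplexity-user';
  return 'other';
}

// Guarded so the sketch also runs outside a browser. A dedicated event
// is used so page_view counts stay untouched.
if (typeof navigator !== 'undefined' && typeof gtag === 'function') {
  gtag('event', 'agent_detect', { agent_ua_label: agentLabel(navigator.userAgent) });
}
```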

Should I block GPTBot or anthropic-ai in robots.txt?

Separate question. GPTBot, ClaudeBot, anthropic-ai, and Google-Extended are training-data crawlers — blocking them removes you from the training set but does not affect agent shopping at runtime. ChatGPT-User and Claude-User are runtime fetchers; blocking those breaks live agent shopping. Most commerce sites want to allow runtime, decide separately on training.
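If you do take that stance, the split translates into robots.txt directives along these lines. A sketch of one possible policy using the directives from the reference table, not a recommendation:

```
# Sketch: exclude training crawlers, keep runtime fetchers.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

# Runtime fetchers (ChatGPT-User, Claude-User, Perplexity-User) are not
# listed above, so they fall through to the default rule and stay allowed.
User-agent: *
Disallow:
```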

Is there a server-side approach that catches more?

Yes — server-side analytics (Plausible, Fathom server module, raw access logs through a parser) sees the user-agent on every fetch, including agents that don't run JavaScript. This catches GPTBot, OAI-SearchBot, ClaudeBot, etc. It still doesn't tell you whether the agent succeeded — only that it visited. The server log is the floor; the diagnostic is what's missing on top.
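A minimal sketch of that server-side classification, assuming the common nginx/Apache "combined" log format, where the user-agent is the last quoted field on the line:

```javascript
// Bots from the reference table above.
const KNOWN_AI_BOTS = [
  'GPTBot', 'OAI-SearchBot', 'ChatGPT-User',
  'ClaudeBot', 'anthropic-ai', 'Claude-User',
  'PerplexityBot', 'Perplexity-User',
  'Google-Extended', 'GoogleOther',
];

// Returns the matching bot name for a combined-format log line, or null.
function aiBotFromLogLine(line) {
  const quoted = line.match(/"([^"]*)"/g);
  if (!quoted || quoted.length === 0) return null;
  const ua = quoted[quoted.length - 1].slice(1, -1); // strip the quotes
  return KNOWN_AI_BOTS.find((bot) => ua.indexOf(bot) !== -1) || null;
}

const sample =
  '203.0.113.7 - - [12/Jan/2026:09:14:02 +0000] "GET /products/mug HTTP/1.1" 200 5120 "-" "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot"';
console.log(aiBotFromLogLine(sample)); // prints "GPTBot"
```

Run over a day of access logs, this gives you the fetch counts GA4 never sees — still visits only, not outcomes.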

What about ChatGPT Atlas and Perplexity Comet — those are full browsers?

Both are full browser modes (Comet launched 2025, Atlas launched 2026). They run real Chromium under the hood and execute JS, so GA4 fires. The user-agent contains a marker ("Atlas" or "Comet") and the referrer is sometimes preserved, sometimes stripped — depends on how the user navigated. Add the relevant fragments to the regex; expect the surfaces to keep changing for the next 12 months.
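A hedged sketch of that marker detection, assuming the "Atlas" and "Comet" substrings appear verbatim in the user-agent string; verify against your own server logs before relying on it, since these surfaces change quickly:

```javascript
// Sketch: flag browser-mode agents by user-agent marker. The marker
// substrings are assumptions based on the product names, not a spec.
function browserModeAgent(ua) {
  if (/\bAtlas\b/.test(ua)) return 'chatgpt-atlas';
  if (/\bComet\b/.test(ua)) return 'perplexity-comet';
  return null;
}
```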

Does "AI Mode" in Google count as AI traffic?

Treat it carefully. Google AI Mode and AI Overviews show citations that link to your site; the click-through arrives with referrer google.com, indistinguishable from a normal Google search click. GA4 cannot separate AI-Mode-driven google.com clicks from organic-search google.com clicks without help from Search Console's AI-Mode metrics, which roll out unevenly.