Scoring Methodology

The Agent Readiness Score measures how well AI agents can discover, understand, integrate with, and transact on your platform. It's a single number from 0 to 100, calculated from 35 checks across six layers.

This page explains what we measure, why we measure it, and how the score is calculated. We publish this openly because a score only has value if you trust how it's built.

The six layers

Agent readiness isn't a single attribute. It's a stack. Each layer represents something AI agents need from your platform before they can find you, use you, and commit to you.

Layer 1 · 20% of score

Discovery

Can agents find you?
What we check

llms.txt, A2A agent card (agent.json), structured data (JSON-LD), sitemap with developer references, robots.txt AI crawler permissions, developer hub presence, bot protection accessibility.

Why it matters

The llms.txt standard, proposed by Jeremy Howard of Answer.AI in September 2024, has been adopted by over 844,000 websites including Anthropic, Cloudflare, and Stripe. Google's Agent-to-Agent (A2A) protocol uses agent cards for capability advertisement. These are the emerging standards agents rely on for discovery.
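A minimal llms.txt sketch in the proposed format; the site name, summary, and URLs below are placeholders, not a real site:

```markdown
# Example Platform

> Example Platform provides a REST API for managing widgets and orders.

## Docs

- [API reference](https://example.com/docs/api): endpoint descriptions and examples
- [Quickstart](https://example.com/docs/quickstart): authentication and first request

## Optional

- [Changelog](https://example.com/changelog): release history
```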

Layer 2 · 20% of score

Schema

Can agents understand your API?
What we check

OpenAPI specification presence and version, endpoint description coverage, parameter typing completeness, error response documentation, request/response examples, rate limit documentation, pagination documentation.

Why it matters

The OpenAPI specification is the industry standard for describing REST APIs. MuleSoft's AI Readiness Profile (February 2026) identifies documentation quality as a primary dimension for evaluating whether an API is suitable for agent consumption — noting that “vague or incomplete descriptions lead to wrong tool selection or hallucinated parameters.”
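To illustrate what the Schema checks look for, here is a hypothetical OpenAPI fragment with a described endpoint, typed parameters, and documented error responses; the API, paths, and limits are invented:

```yaml
openapi: 3.1.0
info:
  title: Example Widgets API   # illustrative only
  version: "2.0"
paths:
  /widgets:
    get:
      summary: List widgets
      description: Returns a paginated list of widgets, newest first.
      parameters:
        - name: limit
          in: query
          description: Page size, between 1 and 100.
          schema: { type: integer, minimum: 1, maximum: 100, default: 20 }
      responses:
        "200":
          description: A page of widgets.
        "429":
          description: Rate limited. Retry after the interval in the Retry-After header.
```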

Layer 3 · 25% of score

Protocol

Can agents interact with your platform?
What we check

MCP server registration and reachability, A2A protocol support, SDK availability across languages, webhook documentation, API versioning strategy, capability advertisement across multiple channels.

Why it matters

The Model Context Protocol (MCP), introduced by Anthropic in November 2024, has reached 14.5 million weekly npm downloads and is supported by Claude, Gemini, VS Code, Cursor, and the Google Workspace CLI. Google's launch of the Workspace CLI in March 2026 — with a built-in MCP server — established the pattern of CLI for humans, MCP for agents, in the same tool.

Layer 4 · 15% of score

Auth

Can agents authenticate without a human?
What we check

OAuth 2.0 client_credentials flow, scoped API key issuance, service account support, granular permission scoping, automated token refresh.

Why it matters

MuleSoft's AI Readiness Profile identifies security and authorization as a critical dimension, emphasizing that “some operations require human approval, multi-step workflows, or audit trails” and that agents need appropriate access controls. The x402 protocol (Coinbase/Cloudflare, 2025) extends this to payments — enabling agents to authenticate and pay in a single HTTP round trip.
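From the agent side, the client_credentials flow is a single token request. A sketch of building that request with only the Python standard library; the endpoint, client ID, secret, and scope are placeholders:

```python
from urllib.parse import urlencode

def client_credentials_request(token_url, client_id, client_secret, scopes):
    """Build an OAuth 2.0 client_credentials token request (RFC 6749, section 4.4).

    Returns (url, headers, body) ready to POST with any HTTP client.
    """
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": " ".join(scopes),  # space-delimited scope list per the RFC
    })
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return token_url, headers, body

# Illustrative values only.
url, headers, body = client_credentials_request(
    "https://auth.example.com/oauth/token",
    "agent-client", "s3cret", ["widgets:read"])
```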

Layer 5 · 10% of score

Behavior

Does your API behave the way agents expect?
What we check

Idempotency support on write endpoints, structured error format consistency, rate limit headers (X-RateLimit-*), Retry-After header on 429 responses, consistent pagination, deterministic response ordering.

Why it matters

MuleSoft's framework highlights that “slow APIs degrade the agent experience” and that “agents don't inherently understand cost — they call what seems relevant.” Behavior checks evaluate whether your API provides the signals agents need to operate reliably without human supervision.
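One concrete signal from this layer: agents retrying a 429 need to parse Retry-After, which RFC 9110 allows as either delta-seconds or an HTTP-date. A sketch of a tolerant parser:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(header_value, now=None):
    """Parse a Retry-After header value (RFC 9110): delta-seconds or HTTP-date.

    Returns a non-negative wait in seconds, or None if the value is unparseable.
    """
    value = header_value.strip()
    if value.isdigit():                      # delta-seconds form, e.g. "120"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```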

Layer 6 · 10% of score

Pricing

Can agents evaluate cost and commit?
What we check

Machine-readable pricing (JSON-LD Offer schema or pricing API), usage-based tier visibility, API-specific pricing information, free tier or sandbox availability, cost estimation endpoints or documentation.

Why it matters

The x402 protocol, launched by Coinbase and Cloudflare in September 2025, processed over 100 million transactions in its first three months. Google's Agent Payments Protocol (AP2) includes x402 as a native extension. These protocols enable agents to autonomously evaluate and commit to pricing — but only if pricing information is machine-readable in the first place.
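A machine-readable pricing sketch using the schema.org Offer vocabulary; the tier name, prices, and currency are illustrative:

```json
{
  "@context": "https://schema.org",
  "@type": "Offer",
  "name": "API Pro tier",
  "price": "49.00",
  "priceCurrency": "USD",
  "priceSpecification": {
    "@type": "UnitPriceSpecification",
    "price": "0.002",
    "priceCurrency": "USD",
    "referenceQuantity": {
      "@type": "QuantitativeValue",
      "value": 1,
      "unitText": "API call"
    }
  }
}
```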

How the score is calculated

Check states

Each of the 35 checks returns one of three scored states (plus a blocked state when bot protection prevents evaluation):

State · Description
Pass · Full points. The check found what it was looking for.
Partial · Half points. Something was found but incomplete or non-standard.
Fail · Zero points. The feature is missing or not detectable.
Blocked · Excluded from scoring. Bot protection prevented the check from being evaluated.

Weighted aggregation

Each check contributes equally to its layer score. Each layer contributes to the overall score based on the layer weights shown above. The overall score is a weighted average of all six layer scores.
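Serge's implementation isn't published, but the aggregation rules above can be sketched as follows; the layer keys and function names are ours, and blocked checks leave the denominator as in the check-state table:

```python
LAYER_WEIGHTS = {                     # layer weights from this page
    "discovery": 0.20, "schema": 0.20, "protocol": 0.25,
    "auth": 0.15, "behavior": 0.10, "pricing": 0.10,
}
POINTS = {"pass": 1.0, "partial": 0.5, "fail": 0.0}

def layer_score(check_states):
    """Score one layer; blocked checks are excluded from the denominator."""
    scored = [s for s in check_states if s != "blocked"]
    if not scored:
        return None                   # nothing observable in this layer
    return 100 * sum(POINTS[s] for s in scored) / len(scored)

def overall_score(layer_scores):
    """Weighted average of layer scores, renormalized over observable layers."""
    visible = {k: v for k, v in layer_scores.items() if v is not None}
    total = sum(LAYER_WEIGHTS[k] for k in visible)
    return sum(LAYER_WEIGHTS[k] * v for k, v in visible.items()) / total
```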

Calibration curve

Raw check scores are passed through a piecewise calibration curve that maps the theoretical 0–100 range to a practical distribution. The curve is calibrated against benchmark data from scanned domains and is designed so that:

  • Early wins are rewarding. Adding an llms.txt file to a site scoring 20 produces a visible score increase.
  • The middle range has clear separation. A score of 50 is meaningfully different from a score of 65.
  • The top is hard to reach. Moving from 90 to 95 requires substantially more engineering investment than moving from 40 to 55.

This follows the same principle used by Google Lighthouse, which calibrates scores against real-world distributions so that they reflect practical benchmarks rather than theoretical maximums.
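The breakpoints of the actual curve aren't published; a piecewise-linear sketch with invented breakpoints that exhibit the three properties above (steep at the bottom, flatter toward the top):

```python
# Hypothetical breakpoints, chosen only to illustrate the shape; Serge's
# real calibration curve is not published.
BREAKPOINTS = [(0, 0), (20, 40), (70, 85), (100, 100)]

def calibrate(raw):
    """Map a raw 0-100 score through a piecewise-linear calibration curve."""
    raw = max(0.0, min(100.0, raw))
    for (x0, y0), (x1, y1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if raw <= x1:                 # linear interpolation within the segment
            return y0 + (raw - x0) * (y1 - y0) / (x1 - x0)
    return float(BREAKPOINTS[-1][1])
```

With these breakpoints, the first 20 raw points buy 40 calibrated points, while the last 30 raw points buy only 15.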

Score bands

Score · Label · Meaning
0–24 · Just getting started · Agents may not be able to find or use your platform
25–44 · Room to grow · Agents can discover you but may hit gaps
45–64 · Making progress · Agents can interact for basic tasks
65–84 · Well-positioned · Strong agent experience with minor gaps
85–100 · Agent ready · Strong agent readiness across measured layers

How we handle edge cases

Agent access probing

For every scan, Serge probes the homepage with the official User-Agent strings of six major AI agents: ChatGPT (GPTBot), Claude (ClaudeBot), Perplexity (PerplexityBot), Gemini (Google-Extended), Apple Intelligence (Applebot-Extended), and Meta AI (Meta-ExternalAgent). Each probe is a single HTTPS HEAD/GET request to the homepage — not a full crawl.

Serge also parses the domain's robots.txt for each agent's token to determine whether the agent is explicitly allowed or disallowed. The combined result is a per-agent verdict: accessible (HTTP responds and robots.txt allows), blocked (HTTP connection dropped or refused), or partial (server responds but robots.txt disallows the agent).
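The per-agent verdict reduces to a two-input decision; a minimal sketch (the function name is ours):

```python
def agent_verdict(http_ok: bool, robots_allows: bool) -> str:
    """Combine the homepage probe and robots.txt parse into a per-agent verdict."""
    if not http_ok:
        return "blocked"      # connection dropped or refused
    return "accessible" if robots_allows else "partial"
```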

This data is displayed in the Agent Access section of the scan report and is factored into the Layer 1 (Discovery) score.

Bot protection and browser fallback

If our scanner is blocked by a WAF or challenge page, we mark affected checks as “blocked” rather than “fail.” Blocked checks are excluded from the score denominator — we only score what we can see.

When SergeBot's default User-Agent is silently blocked (90%+ of core resource fetches fail), Serge re-fetches key resources using a standard browser User-Agent. This allows Serge to deliver real results for domains behind aggressive CDN bot protection. The scan report clearly discloses when this fallback was used via a “Partial results” badge.

Bot protection blocking is itself a finding. If our scanner can't reach your site, other AI agents may face the same restriction. This is reported as a discovery issue with specific recommendations for allowing legitimate agent access.

Sites without APIs

Not every website needs an API to be agent-ready. A hotel chain, a restaurant, or a news publisher has different agent readiness needs than a SaaS platform.

For sites where no public API is detected, layers 4–6 (Auth, Behavior, Pricing) are hidden by default behind the advanced layers toggle. The default score reflects only layers 1–3, so the score stays meaningful for any type of site.

Score stability

The scoring methodology is locked at launch. The calibration curve and layer weights do not change. This ensures that a score of 65 measured in March 2026 means the same thing as a score of 65 measured in December 2026.

New checks may be added to the framework over time as the agent ecosystem evolves (new protocols, new standards), but the underlying methodology stays constant. This is critical for the re-scan and score-change features — progress must be measured against a stable ruler.

Prior work and influences

The Serge scoring methodology draws on established practices from several domains:

Scoring curve design

  • Google Lighthouse — Uses log-normal curves calibrated against real-world web performance data (HTTP Archive). The 25th percentile of real websites maps to a score of 50. The 8th percentile maps to 90. Serge follows the same principle: scores are calibrated against the real distribution of scanned domains, not theoretical maximums.
  • FICO (300–850 range) — Log-normal distribution where gaining 20 points from 600 to 620 is dramatically easier than gaining 20 from 830 to 850. Serge's curve produces the same property: early improvements are rewarding, late improvements require significant investment.
  • SSL Labs — Moved from 0–100 numeric scores to A–F letter grades because letters proved more useful for communication. Also pioneered hard caps: a single critical vulnerability caps the grade regardless of how well everything else scores.
  • SecurityScorecard — Logarithmic scale where an organization with an F grade is 13.8x more likely to be breached than one with an A. Proved that external, non-intrusive scanning can produce scores that correlate with real-world outcomes.

Agent readiness frameworks

  • MuleSoft AI Readiness Profile (February 2026) — Ten dimensions for evaluating whether an API should be exposed as an MCP tool: documentation quality, security, idempotency, cost control, latency, scalability, payload size, process compliance, semantic distinctness, and regulatory clearance. Serge's six layers cover the externally measurable subset of these dimensions.
  • llms.txt specification (September 2024, Jeremy Howard / Answer.AI) — The first proposed standard for making website content discoverable by LLMs. Over 844,000 websites have adopted it. Serge checks for llms.txt as part of Layer 1 (Discovery).
  • Model Context Protocol (November 2024, Anthropic) — The emerging standard for agent-to-tool integration. 14.5M weekly npm downloads. Serge checks for MCP server presence and reachability as part of Layer 3 (Protocol).
  • Agent-to-Agent Protocol (2025, Google) — Protocol for agent-to-agent communication and capability advertisement, including agent cards at .well-known/agent.json. Serge checks for A2A compliance in Layers 1 and 3.
  • x402 Protocol (September 2025, Coinbase/Cloudflare) — Internet-native payment protocol enabling agent-to-service transactions. 100M+ transactions processed. Serge checks for x402 support in Layer 6 (Pricing).

Free-tool-as-wedge pattern

The Serge scanner follows a model proven by:

  • HubSpot Website Grader (2007) — Free website scoring tool that graded 4 million websites and became HubSpot's primary lead generation mechanism, generating 40,000+ organic backlinks.
  • SecurityScorecard (2014) — Free security score widget that was used by 880,000+ companies and became the company's primary lead generation tool, scaling to $140M ARR.

Both demonstrated that a free, shareable scoring tool can define a category and build authority through transparency and consistency.

Limitations

We believe in being transparent about what the score can and cannot tell you.

What the score measures: How discoverable, understandable, and integrable your platform is for AI agents, based on externally observable signals.

What the score does not measure:

  • Internal API quality or performance (we don't make authenticated API calls)
  • The actual volume of agent traffic you receive
  • Whether specific agents successfully complete tasks on your platform
  • API reliability under load
  • The quality of your agent-facing documentation beyond what's publicly accessible
  • Response payload size or latency (we can't measure these externally)

The score is not a guarantee. A high score means agents can find and use your platform. It doesn't guarantee they will. Agent adoption depends on market fit, competitive landscape, and the tasks agents are being asked to perform.

The score is a point-in-time measurement. The agent ecosystem is evolving rapidly. Standards that are emerging today (llms.txt, A2A, x402) may become foundational or may be superseded. We commit to keeping the framework current with the ecosystem while maintaining score stability.

Feedback

We welcome feedback on the methodology. If you believe a check is producing inaccurate results, if we're missing an important signal, or if you have research that should inform the framework, please contact us.

The goal is to get this right — for the companies being scored and for the agents trying to use them.

Serge · serge.ai · Superstellar LLC · Zug, Switzerland
Last updated: March 2026