Introducing the OctoPerf MCP Server: Load Testing from Your AI Assistant

We are excited to announce the official release of the OctoPerf MCP Server. Built on the open Model Context Protocol, it lets any AI agent (Claude.ai, Claude Code, Codex, Gemini CLI, GitHub Copilot and more) drive your OctoPerf account directly: import Virtual Users, fix replay errors, run scenarios and read back metrics, all in plain language, without leaving the chat.

In this article we first give an overview of the server: what it exposes, how authentication works, and how to connect your favorite client in minutes. We then follow a complete, realistic workflow through three of the bundled skills (validation triage, auto-correlation and scenario diagnosis), showing the actual conversation between a user and the LLM and the matching result in OctoPerf.

Table of Contents

What Is the OctoPerf MCP Server?
Get Connected in Minutes
Skills: Packaged Expertise for Your Agent
From Raw Recording to First Load Test
What This Changes, Really
Conclusion

What Is the OctoPerf MCP Server?

If you read our blog regularly, you may remember our first experiments with MCP and OctoPerf back in 2025. What started as a proof of concept is now a fully supported product feature, exposed at https://api.octoperf.com/mcp and documented in the MCP Server documentation.

The server is a thin, stateless bridge between an LLM agent and the OctoPerf REST API. There are no service identities and no API keys: every tool call is authenticated as you through OAuth 2.1, and resolves to a regular OctoPerf API call. Anything you can do from the web UI as your user, the agent can do on your behalf once you authorize it, and you can revoke that authorization anytime from Connected Apps.
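To make the "every tool call is an authenticated request" idea concrete, here is a minimal sketch of what an MCP client sends over Streamable HTTP: a JSON-RPC 2.0 `tools/call` message with an OAuth bearer token in the headers. The tool name `list_workspaces` is a hypothetical placeholder (the real tool names are listed in the documentation), and a real client would also perform the MCP `initialize` handshake and the OAuth flow first; your MCP client normally handles all of this for you.

```python
import json

MCP_ENDPOINT = "https://api.octoperf.com/mcp"  # endpoint quoted in this article


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build the JSON-RPC 2.0 envelope that MCP uses to invoke a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def build_headers(access_token: str) -> dict:
    """HTTP headers for a Streamable HTTP POST carrying an OAuth bearer token."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
        # Streamable HTTP clients accept either a JSON body or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }


if __name__ == "__main__":
    # "list_workspaces" is a hypothetical tool name used for illustration only.
    payload = build_tool_call(1, "list_workspaces", {})
    print(f"POST {MCP_ENDPOINT}")
    print(json.dumps(payload, indent=2))
```

Because the token identifies you rather than a service account, revoking the authorization invalidates every subsequent call of this kind.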

Concretely, the server exposes around 100 tools covering the full performance-testing lifecycle:

Discovery: list workspaces, projects, scenarios, Virtual Users, load generator providers,
Import: bring in recordings from HAR, JMX, Postman, Playwright, WebDriver, URL lists or exported archives,
Edit: rename, re-tag, patch the action tree, manage variables, HTTP servers, correlation rules and project files,
Validate: run a functional check, read the per-action validation index, drill into failing requests and responses,
Run: start a scenario, monitor progress, stop a run, inspect launch logs,
Analyze: read bench reports, fetch the data behind any chart or table, expand error rows, run trend reports.

One detail makes a big difference in practice: every tool returns a deep-link to the matching page in the OctoPerf UI. Whenever the agent summarizes a result, it hands you a clickable URL, so you are always one click away from the full interactive report, the Virtual User tree, or the running test.

Get Connected in Minutes

Because the server speaks the open Model Context Protocol over Streamable HTTP, it plugs into any MCP-aware client. The fastest paths:

Claude.ai (Pro/Max): open Customize → Connectors, add a custom connector pointing at https://api.octoperf.com/mcp, and log in with your usual OctoPerf credentials.

Claude Code: install the official plugin from the OctoPerf marketplace. In one step, it registers the MCP server, installs the agent guide and adds eight workflow skills:

/plugin marketplace add OctoPerf/octoperf-claude-plugins
/plugin install octoperf@octoperf

Everything else (Claude Desktop, Codex, Gemini CLI, Copilot in VS Code, Cursor, ...): point your client at the endpoint and authenticate. The first tool call opens a browser tab for OAuth login, and the token is cached afterwards. The documentation walks through each client step by step.

Whatever the client, we recommend feeding your agent the public agent guide: it acts as a knowledge base that teaches the AI how an OctoPerf project is structured, so it follows the platform's standards from the first prompt.

And if you run OctoPerf Enterprise on-premise, the MCP server self-hosts alongside your instance with a bundled OAuth IdP, so you get AI-driven load testing without anything leaving your infrastructure.

Skills: Packaged Expertise for Your Agent

Raw tools are powerful, but performance testing has its own know-how: which signal to read first when a validation is red, when correlation is the right fix and when it isn't, how to classify a degraded bench run. That expertise is captured in skills: markdown playbooks the agent loads on demand, each one encoding a complete diagnostic or repair workflow.

The Claude Code plugin bundles eight of them (they are also publicly downloadable for any other agent):

Validation triage: "the validation is red after import",
Auto-correlation: "fix the replay errors / 401s on this Virtual User",
Scenario diagnosis: "why is my last run failing?",
Bench-report reading: "what does this widget mean?",
Real-browser probe: "add a Playwright UX probe to my load test",
Scheduling: "run this scenario every weekday at 8am",
PDF export: "export the bench report as PDF",
Async polling: runs automatically while a validation or bench completes.
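The last item, async polling, is the easiest one to picture in code. The sketch below shows a generic poll-with-backoff loop of the kind an agent might run while a validation or bench is in progress. It is an illustration only, not the skill's actual logic: the state names (`RUNNING`, `FINISHED`, ...) and the `fetch_status` callback are assumptions.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], str],
    terminal_states: frozenset = frozenset({"FINISHED", "ERROR", "ABORTED"}),
    interval_s: float = 5.0,
    max_interval_s: float = 60.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() until it returns a terminal state.

    The wait between calls doubles after each attempt, capped at
    max_interval_s, so long runs are not hammered with requests.
    State names here are hypothetical placeholders.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in terminal_states:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"still {status!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
        interval_s = min(interval_s * 2, max_interval_s)


if __name__ == "__main__":
    # Simulated run: two "RUNNING" answers, then completion.
    answers = iter(["RUNNING", "RUNNING", "FINISHED"])
    print(poll_until_done(lambda: next(answers), sleep=lambda s: None))
```

Injecting `sleep` as a parameter keeps the loop testable without actually waiting.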

Rather than describing each one abstractly, let's follow the three skills you will use the most, chained together on a single realistic case: a freshly recorded checkout journey that needs to become a passing load test. We will cover the remaining skills in upcoming articles.

From Raw Recording to First Load Test

The scenario: you recorded the checkout journey of your e-commerce application as a HAR file, imported it into OctoPerf, and the validation run came back red. The traditional next step is an hour of clicking through failing requests, comparing recorded and replayed payloads by hand. Here is what it looks like with an agent connected to the MCP server instead.

Step 1 - Triage a Red Validation

The first skill, validation triage, kicks in when a Virtual User has many failing actions and you want a diagnosis, not a guess. Instead of reading every failure one by one, the agent fetches the validation index, groups failures by root cause, and drills into one representative per group:

You: I imported checkout.har into my Pet Adoption Store project. The validation is red, can you triage it?

Claude: (calls get_virtual_user_validation_index, then get_validation_failure_detail on one representative per group)

Your Checkout Virtual User has 24 actions, 9 failing. They fall into three groups:

Auth / state — 6 actions failing with 401/403 after the login step. The X-CSRF-Token header replays the value captured at recording time, and the server rejects it as expired.

Variable / data — 2 actions failing with 400 "email already registered": the recorded account creation payload is replayed verbatim.

Server-side — 1 action returning a 504 on /api/recommendations; that one is your backend, not the script.

I suggest fixing the dominant group first: the 6 auth failures are a classic correlation problem. Want me to auto-correlate the Virtual User?

This is the skill's core heuristic at work: when the bulk of the failures share one root cause (here, 6 of 9), fix that first and re-validate, because much of the long tail tends to clear once the dominant cause is resolved. Here is the same diagnosis as seen from the OctoPerf validation screen:

Figure: Failing actions in the OctoPerf validation view
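The grouping step of the triage can be sketched in a few lines. This is a simplified illustration, not the skill's implementation: the `Failure` type and the status-code rules in `classify` are assumptions, and a real triage also looks at response bodies and headers.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """One failing action (hypothetical shape, for illustration)."""
    action: str
    status: int


def classify(failure: Failure) -> str:
    """Map an HTTP status to a coarse root-cause group (simplified rules)."""
    if failure.status in (401, 403):
        return "auth/state"
    if 400 <= failure.status < 500:
        return "variable/data"
    if failure.status >= 500:
        return "server-side"
    return "other"


def triage(failures: list) -> tuple:
    """Return (dominant group, its share of all failures, counts per group)."""
    if not failures:
        raise ValueError("nothing to triage: no failing actions")
    counts = Counter(classify(f) for f in failures)
    dominant, hits = counts.most_common(1)[0]
    return dominant, hits / len(failures), dict(counts)


if __name__ == "__main__":
    # Same distribution as the conversation above: 6 auth, 2 data, 1 server.
    failures = (
        [Failure(f"auth-{i}", 401) for i in range(6)]
        + [Failure(f"signup-{i}", 400) for i in range(2)]
        + [Failure("recommendations", 504)]
    )
    dominant, share, counts = triage(failures)
    print(f"fix first: {dominant} ({share:.0%} of failures), groups: {counts}")
```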

Step 2 - Auto-Correlate the Dynamic Values

Correlation - extracting a dynamic value from a response and re-injecting it into subsequent requests - is the single most time-consuming chore in script maintenance. The auto-correlation skill drives OctoPerf's correlation rules end to end, with a guardrail worth noticing: it snapshots the Virtual User before rewriting anything.
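For readers new to the concept, correlation boils down to two operations: extract, then inject. The sketch below shows the idea on a CSRF token carried in a hidden form input. It is a hand-rolled illustration under simplifying assumptions (the `name` attribute precedes `value`, and the header is called `X-CSRF-Token`); OctoPerf's correlation rules do this declaratively inside the Virtual User, not with code like this.

```python
import re

# Simplified pattern: assumes name="csrfToken" appears before value="...".
CSRF_INPUT = re.compile(r'<input[^>]*name="csrfToken"[^>]*value="([^"]+)"')


def extract_csrf_token(html: str) -> str:
    """Extract the fresh token from the response of the previous step."""
    match = CSRF_INPUT.search(html)
    if match is None:
        raise ValueError("csrfToken not found in response body")
    return match.group(1)


def inject_csrf_token(recorded_headers: dict, token: str) -> dict:
    """Return a copy of the recorded headers with the stale token replaced."""
    headers = dict(recorded_headers)
    headers["X-CSRF-Token"] = token
    return headers


if __name__ == "__main__":
    login_response = (
        '<form><input type="hidden" name="csrfToken" value="fresh-123"></form>'
    )
    recorded = {"X-CSRF-Token": "stale-from-recording", "Accept": "*/*"}
    print(inject_csrf_token(recorded, extract_csrf_token(login_response)))
```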

You: Yes, go ahead.

Claude: (calls backup_virtual_user, list_correlation_frameworks, add_correlation_framework_to_project, apply_correlations_to_virtual_user)

Done — here is what I did:

Backed up the Virtual User as Checkout (pre-correlation), in case a rule misfires,

Confirmed the diagnosis on the first failing request: the login response sets a fresh csrfToken in a hidden form input, but the next POST replays the recorded one,

Applied the Token correlation framework to the project — it registered the extractor/injection rules for csrfToken and x-csrf-token,

Re-walked the Virtual User to wire the rules into the action tree,

Re-validated: 22/24 actions now pass. The 2 remaining failures are the "email already registered" group — correlation can't fix those, the recorded payload needs a variable.

I replaced the hardcoded email with a ${email} reference backed by a random-string variable, and re-validated once more: 24/24 green (the 504 recovered on its own). Your Virtual User is ready: open it in OctoPerf →

Two things are worth underlining. First, the agent did not blindly throw rules at the script: it confirmed the diagnosis on a real failing request before rewriting anything, exactly as an experienced performance engineer would. Second, it knew where correlation stops being the answer, and switched to a variable for the test-data problem.

Figure: Correlation rules created in the OctoPerf project
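The test-data fix is a different mechanism from correlation: nothing is extracted from a response, the recorded value is simply replaced by a placeholder that is resolved on each iteration. A minimal sketch of that idea, using Python's `string.Template` (which happens to share the `${email}` syntax); the payload shape and the `example.com` address format are assumptions made for illustration:

```python
import json
import secrets
import string
from string import Template


def random_string(length: int = 10) -> str:
    """Random lowercase alphanumeric string, regenerated on every call."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


# Hypothetical account-creation payload with the recorded email
# replaced by a ${email} placeholder.
PAYLOAD_TEMPLATE = Template('{"email": "${email}", "name": "Load Tester"}')


def render_payload() -> str:
    """Resolve ${email} with a fresh value so each replay registers a new user."""
    email = f"user-{random_string()}@example.com"
    return PAYLOAD_TEMPLATE.substitute(email=email)


if __name__ == "__main__":
    for _ in range(2):
        print(json.loads(render_payload())["email"])
```

Each rendered payload carries a different address, which is what prevents the "email already registered" rejection on replay.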

Step 3 - Run the Scenario and Diagnose the Results

With a green Virtual User, it is time to apply some load. The scenario diagnosis skill takes over once a bench run completes (or misbehaves): it reads the global metrics first, runs OctoPerf's Insights heuristics, classifies the run, and only then drills down.

You: Run the Checkout 500 VUs scenario and tell me how it went.

Claude: (calls run_scenario, polls get_bench_result, then get_report_summary_values and get_report_insights)

The run is finished — 500 virtual users, 20 minutes, EU-West. Here is the verdict:

The application sustains the load up to ~350 concurrent users, then degrades server-side.

Evidence:

Global error rate is 2.1%, but errors are concentrated in the last 5 minutes of the ramp-up,

The HIT_RATE_INFLEXION_POINT insight fired: hits/sec plateau at ~350 VUs while the user count keeps climbing — your saturation point,

The errors are SocketTimeoutException: Read timed out on the two payment actions, and the LATENCY_VS_RESPONSE_TIME insight confirms the time is spent server-side, not in the network or TLS handshake.

This is not a script problem — validation passes and the same actions are clean at low load. I'd look at the payment service's thread/DB connection pool. Full interactive report: open it in OctoPerf →

Note what the agent did not do: it did not blame the script, did not re-run the test "to see if it's flaky", and did not bury you in per-action tables. The skill encodes the discipline of reading global metrics before drilling down, and of telling a Virtual User bug apart from a system reaching its capacity.

Figure: The bench report in OctoPerf
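The saturation signal mentioned in the verdict (throughput flattening while the user count keeps climbing) can be approximated with a simple ratio test. The sketch below is one way to express that intuition and is not how OctoPerf's Insights compute `HIT_RATE_INFLEXION_POINT`; the sample data and the 25% efficiency threshold are invented for illustration.

```python
from typing import Optional


def find_saturation_point(samples: list, min_efficiency: float = 0.25) -> Optional[int]:
    """Return the user count after which throughput stops following load.

    samples: (virtual_users, hits_per_sec) pairs in ramp-up order.
    For each step, compare the relative growth of hits/sec with the relative
    growth of users. When hits grow less than min_efficiency times as fast
    as users, the previous step is reported as the saturation point.
    """
    for (prev_users, prev_hits), (users, hits) in zip(samples, samples[1:]):
        if users <= prev_users or prev_hits <= 0:
            continue  # not ramping up, or no usable baseline
        user_growth = (users - prev_users) / prev_users
        hit_growth = (hits - prev_hits) / prev_hits
        if hit_growth / user_growth < min_efficiency:
            return prev_users
    return None


if __name__ == "__main__":
    # Invented ramp-up: throughput scales linearly, then flattens at 350 users.
    ramp = [(50, 100), (150, 300), (250, 500), (350, 700), (450, 710), (500, 705)]
    print(find_saturation_point(ramp))
```

A detector like this only says where throughput stops scaling; attributing it to the server rather than the network still needs the latency comparison the agent performs next.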

What This Changes, Really

A skeptic could object that everything above was already possible - and that is precisely the point: nothing here bypasses OctoPerf. The agent uses the same API, the same validation engine, the same correlation rules and the same reports you use today. What changes is the interface: the expertise needed to chain those features correctly is now encoded in skills, and the navigation cost has dropped to a sentence.

In practice we see three effects:

Beginners get a senior workflow by default: triage before fixing, backup before rewriting, one fix per validation cycle,
Experienced engineers compress the boring 80%: the import-fix-validate loop runs in the background while they focus on test design and analysis,
Performance testing reaches the people who never opened the UI: a developer can ask "did my branch regress the checkout?" from their IDE and get an answer backed by a real load test.

This fits the broader direction we discussed in our previous article: AI is reshaping performance engineering, and platforms that integrate naturally with AI ecosystems will lead that shift.

Conclusion

The OctoPerf MCP Server is available today for all SaaS users - the free plan is enough to try it - and self-hosts alongside OctoPerf Enterprise deployments. Point your favorite agent at https://api.octoperf.com/mcp, log in once, and ask it to list your workspaces.

We have only covered three of the eight bundled skills here. In upcoming posts we will dive into the others: composing hybrid load + real-browser scenarios with the Playwright probe skill, scheduling recurring runs, and letting the agent read bench reports widget by widget.

In the meantime, the MCP Server documentation and the AI load testing page have everything you need to get started - and if you build something interesting with it, we would love to hear about it.