OctoPerf MCP Server in Action: Browser Probes, Scheduled Runs and Smarter Reports

In our previous article we introduced the OctoPerf MCP Server and followed three skills (validation triage, auto-correlation, scenario diagnosis) through a complete workflow: from a raw HAR recording to a diagnosed 500-user load test.

This second part covers the remaining skills, and they take the story further: measuring what real users perceive during the load with a Playwright browser probe, turning the test into a recurring schedule, and letting the agent read the resulting reports widget by widget, trends included. Same format as before: the actual conversation between a user and the LLM, then the matching result in OctoPerf.

Want to become a super load tester?

Request a Demo

Table of Contents

Step 4 - Add a Real-Browser Probe to the Load Test
Step 5 - Schedule It as a Recurring Run
Step 6 - Read the Reports, Widget by Widget
The Complete Picture
Conclusion

Step 4 - Add a Real-Browser Probe to the Load Test


Where we left off: the Checkout Virtual User validates clean and the application sustains around 350 concurrent users. But there is a blind spot. JMeter measures server-side HTTP timings, and our checkout is a React single-page application: the API can answer in 200 ms while the user still waits two seconds for the page to render.

The fix is a pattern known as a real-browser probe: a single Playwright Virtual User that runs the same journey in an actual Chromium browser during the load test, measuring the user-perceived experience while the JMeter pool keeps the server busy. If you come from NeoLoad or LoadRunner, you know it as RealBrowser User or TruClient, respectively. The real-browser probe skill automates the whole setup:

You: The checkout is a React SPA. JMeter says the API is fast, but users complain it feels slow under load. Can you add a real-browser probe to the Checkout load test?

Claude: (calls get_virtual_user, import_playwright_virtual_user, validate_virtual_user, patch_scenario, get_scenario_matching_plans)

Done — here is what I set up:

Translated your JMeter Virtual User into a Playwright spec: one page.goto / page.fill / page.click per recorded step, with absolute URLs,

Dropped the correlation rules from the translation — a real browser submits the live form, sends the real cookies and echoes the CSRF token natively, so the rules we created last time are dead weight here,

Imported and validated the spec: 1 iteration, all assertions green (Chromium warmup makes this slower than a JMeter validation, ~30 s total),

Patched the scenario into a hybrid: UserProfile A ramps 500 JMeter users, UserProfile B pins the Playwright probe at exactly 1 user, running simultaneously,

Pre-flighted your subscription: your plan supports real-browser users, the scenario is launchable as-is.

One probe is enough — real browsers are CPU-heavy, and the point is measurement, not load. Want a second viewpoint? I'd add another 1-user profile, not scale this one.
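To make the translation step concrete, here is a minimal Python sketch that turns a hypothetical, simplified list of recorded steps into Playwright commands. The `RecordedStep` type, the `to_playwright` function and the example URL are illustrative assumptions, not the skill's actual code; what it shows is the one-command-per-step mapping, the absolute URLs, and the correlation rules being left behind.

```python
import json
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass
class RecordedStep:
    """Hypothetical, simplified recorded step (not OctoPerf's Virtual User model)."""
    kind: str                      # "navigate", "fill" or "click"
    target: str                    # a path for "navigate", a CSS selector otherwise
    value: str = ""                # text to type, only used by "fill"
    correlation_rules: tuple = ()  # JMeter-only extractors, ignored below


def to_playwright(steps, base_url):
    """Emit one Playwright command per recorded step, using absolute URLs."""
    lines = []
    for step in steps:
        # correlation_rules are deliberately not translated: a real browser
        # carries cookies, live form fields and CSRF tokens by itself.
        if step.kind == "navigate":
            url = urljoin(base_url, step.target)
            lines.append(f"await page.goto({json.dumps(url)});")
        elif step.kind == "fill":
            lines.append(f"await page.fill({json.dumps(step.target)}, {json.dumps(step.value)});")
        elif step.kind == "click":
            lines.append(f"await page.click({json.dumps(step.target)});")
        else:
            raise ValueError(f"unsupported step kind: {step.kind!r}")
    return lines


steps = [
    RecordedStep("navigate", "/checkout", correlation_rules=("csrf_token",)),
    RecordedStep("fill", "#email", "jane@example.com"),
    RecordedStep("click", "button[type=submit]"),
]
print("\n".join(to_playwright(steps, "https://shop.example.com")))
```

This prints `await page.goto("https://shop.example.com/checkout");` followed by the matching `page.fill` and `page.click` lines; the `csrf_token` rule attached to the first step never reaches the output.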

Notice the inversion of the work we did in part 1: the correlation rules that were essential for JMeter become noise in a real browser, and the skill knows to strip them. It also encodes the cardinal rule of hybrid scenarios: the browser profile stays at 1 user, because scaling Chromium instances would saturate the load generator and skew everyone's timings.

The hybrid scenario in OctoPerf
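The cardinal rule of hybrid scenarios can be expressed as a simple check. This is a sketch over a hypothetical `UserProfile` stand-in (the field names are illustrative, not OctoPerf's scenario schema): every Playwright profile stays at exactly 1 user, and the scenario needs at least one JMeter profile for load and one Playwright profile for measurement.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    """Hypothetical stand-in for a scenario UserProfile; field names are illustrative."""
    name: str
    engine: str  # "jmeter" or "playwright"
    users: int


def check_hybrid(profiles):
    """Return a list of problems; an empty list means the hybrid layout is sound."""
    problems = []
    for p in profiles:
        # Scaling Chromium instances saturates the load generator, so each probe stays at 1.
        if p.engine == "playwright" and p.users != 1:
            problems.append(f"{p.name}: browser probe must stay at 1 user, got {p.users}")
    if not any(p.engine == "jmeter" for p in profiles):
        problems.append("no JMeter profile: nothing generates the load")
    if not any(p.engine == "playwright" for p in profiles):
        problems.append("no Playwright profile: nothing measures the user side")
    return problems


scenario = [UserProfile("A", "jmeter", 500), UserProfile("B", "playwright", 1)]
print(check_hybrid(scenario))  # → []
```

A second viewpoint, in this model, is one more `UserProfile(..., "playwright", 1)` entry, which the check accepts; raising profile B to 5 users is what it flags.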

After the run, the two profiles tell two complementary stories:

You: So, what did the probe see during the run?

Claude: The server-side and user-side stories diverge, and that's the finding:

JMeter (500 users): checkout API p95 stays at 230 ms all the way up to 350 users — the backend holds,

Playwright probe: page.goto('/checkout') climbs from 1.1 s at low load to 2.4 s at peak — the gap is client-side: render-blocking resources slow down as static asset latency grows.

In other words, your SLA on HTTP timings would pass while your users wait twice as long. The delta between the two curves is the client-side cost. Tree report split by Virtual User: open it in OctoPerf →

The statistics tree split by Virtual User
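The "delta between the two curves" is plain arithmetic on the figures from the conversation. A rough sketch, with one stated approximation: it subtracts an API p95 from a single probe's timing, so it estimates the order of magnitude of the client-side cost rather than measuring it exactly.

```python
def client_side_cost_ms(probe_ms, api_p95_ms):
    """Rough client-side share: browser time spent beyond the API response."""
    return probe_ms - api_p95_ms


# Figures quoted above: API p95 flat at 230 ms, probe at 1.1 s (low load) and 2.4 s (peak).
api_p95 = 230
for label, probe in (("low load", 1100), ("peak", 2400)):
    print(f"{label}: {client_side_cost_ms(probe, api_p95)} ms client-side")
print(f"slowdown seen by users: x{2400 / 1100:.2f}")
```

It prints 870 ms at low load, 2170 ms at peak, and a x2.18 slowdown: the backend number barely moves while the user-side gap more than doubles.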

Step 5 - Schedule It as a Recurring Run


A load test that runs once tells you where you stand today. A load test that runs every morning before the team arrives tells you when a regression lands. OctoPerf's scheduler handles one-shot and cron triggers, and the scheduling skill knows its sharp edges:

You: Schedule the hybrid scenario every weekday at 8:30, Paris time.

Claude: (calls get_scenario_matching_plans, then schedule_scenario_cron)

Scheduled — with two precautions worth knowing about:

Pre-flight first. A scheduled scenario that can't start doesn't fail just once: it fails on every fire, consuming credits each time, until someone notices. I verified your plan can host the hybrid run before creating the job.

The cron is evaluated in UTC. 8:30 in Paris is currently UTC+2, so the expression is 30 6 * * 1-5. Heads up: when France switches back to winter time, this will fire at 7:30 local. Tell me if you'd rather I re-arm it at the DST change.

Job weekday-checkout-0830 created. Next fire: tomorrow, 08:30 Paris time ✓ — I converted the scheduler's nextRun back to your timezone to double-check the math. To pause it later, ask me to disable the job (reversible) rather than delete it.
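The timezone math the agent describes can be reproduced with Python's standard `zoneinfo` module. This is a sketch under the conversation's premise that the scheduler evaluates cron in UTC; the helper names `utc_cron_for_local` and `local_fire_time` are hypothetical.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo

PARIS = ZoneInfo("Europe/Paris")
UTC = timezone.utc


def utc_cron_for_local(local_time, tz, on_day, weekdays="1-5"):
    """5-field UTC cron for a local wall-clock time, using the UTC offset in force on on_day."""
    local = datetime.combine(on_day, local_time, tzinfo=tz)
    utc = local.astimezone(UTC)
    if utc.date() != local.date():
        # Crossing midnight would also shift the day-of-week field; not handled here.
        raise ValueError("conversion crosses midnight, adjust the weekday field too")
    return f"{utc.minute} {utc.hour} * * {weekdays}"


def local_fire_time(utc_hour, utc_minute, tz, on_day):
    """Local wall-clock time at which a fixed UTC cron fires on on_day."""
    fire = datetime.combine(on_day, time(utc_hour, utc_minute), tzinfo=UTC)
    return fire.astimezone(tz).time()


summer = date(2025, 6, 2)   # Paris on CEST, UTC+2
winter = date(2025, 12, 1)  # Paris on CET, UTC+1
print(utc_cron_for_local(time(8, 30), PARIS, summer))  # → 30 6 * * 1-5
print(local_fire_time(6, 30, PARIS, winter))           # → 07:30:00
```

The second line is the DST caveat in code form: the same UTC expression lands at 7:30 in winter, so keeping 8:30 local means re-arming the job with `30 7 * * 1-5`.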

Small things, big consequences: a Quartz-style 6-field expression would be rejected, a cron typed in local time would silently fire at the wrong hour, and a failing schedule compounds its cost daily. The skill front-loads all three checks so you don't discover them in next month's invoice.

Scheduled jobs in OctoPerf
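The first of those checks, refusing anything that is not a plain 5-field expression, fits in a few lines. A deliberately strict sketch: `check_cron` is a hypothetical helper, and it only accepts numeric tokens, ranges, lists, steps and `*`, which is narrower than what many cron implementations allow (day names, for instance).

```python
import re

# Numeric tokens only: digits, "*", ranges "-", lists "," and steps "/".
FIELD = re.compile(r"^[\d*,/-]+$")


def check_cron(expr):
    """Reject anything but a 5-field cron, such as Quartz-style 6 or 7 field expressions."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {expr!r}")
    for field in fields:
        if not FIELD.match(field):
            raise ValueError(f"unsupported token {field!r} in {expr!r}")
    return fields


print(check_cron("30 6 * * 1-5"))  # → ['30', '6', '*', '*', '1-5']
```

A Quartz expression such as `0 30 6 ? * MON-FRI` fails on the field count before any token is inspected.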

Step 6 - Read the Reports, Widget by Widget


One week later, the scheduler has produced five runs. An OctoPerf bench report is a rich document: summary widgets, line charts, percentile curves, top-N tables, error drill-downs, each backed by its own data shape. The bench-report reading skill is the field guide that maps every widget to the right tool, with the semantic gotchas that trip up humans and LLMs alike:

You: In this morning's report, the probe's Network row says 24 ms but page.goto('/checkout') says 364 ms. Which one is wrong?

Claude: Neither — they measure different things, and you should not sum them either:

The ACTION row (page.goto(...)) is the duration of the Playwright command: navigation, JS execution and render included. This is the user-perceived number,

The NETWORK row aggregates the individual HTTP requests the browser fired. Playwright is async, so these overlap with the actions instead of adding up,

For the journey's true wall-clock, read the bare iteration row — 1.8 s this morning.

While I was in there: 43% of the JMeter samples returned HTTP 304 (Not Modified: the cached copy was only revalidated, no body was sent). Your global throughput therefore looks better than what the server actually serves; for SUT cost, filter on the 200s.
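Two of the reading rules above come down to arithmetic. The sketch below uses hypothetical timings and a hypothetical 10-second window (only the 43% share of 304s comes from the conversation): summing overlapping, async request durations overcounts elapsed time, and counting 304s inflates throughput compared with the 200s the server fully serves.

```python
def wall_clock_ms(intervals):
    """Length of the union of (start, end) intervals: overlapping time counts once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def served_per_second(statuses, duration_s, keep=(200,)):
    """Throughput restricted to the status codes that reflect full server work."""
    return sum(1 for s in statuses if s in keep) / duration_s


# Hypothetical request timings in ms, fired in parallel as a browser would.
requests = [(0, 120), (10, 130), (15, 90), (200, 260)]
print(sum(end - start for start, end in requests))  # → 375 (naive sum)
print(wall_clock_ms(requests))                       # → 190 (actual elapsed)

# Hypothetical 10 s window with the 43% share of 304s quoted above.
statuses = [304] * 43 + [200] * 57
print(len(statuses) / 10, served_per_second(statuses, 10))  # → 10.0 5.7
```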

And because the runs are now recurring, trend analysis comes naturally:

You: Have the morning runs been drifting since Monday? If so, give me something I can show at the Friday review.

Claude: (calls create_trend_report_by_name, get_report_table_values, then export_bench_report_pdf and polls get_task_result)

Yes, there is a drift. I built a trend report anchored on this morning's run, matching the week's scheduled runs by scenario name:

Error rate: flat, 0.1% all week,

Checkout API p95: +9% since Monday (210 ms → 229 ms), a steady climb rather than a one-day spike,

The probe's user-perceived checkout time: +14% over the same window — the regression is real and growing.

The delta table marks Wednesday as the inflexion — worth checking what was deployed that day. I've also exported the report as a PDF for your review: download it here (link valid ~5 minutes, ask me for a fresh one anytime). Interactive version: open it in OctoPerf →
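The headline figures of a trend report are relative changes across runs. A small sketch: `pct_change` reproduces the +9% quoted above, and `inflexion_index` is a simple largest-jump heuristic of our own, not necessarily how OctoPerf's delta table flags the inflexion. The daily intermediate values are not given in the conversation, so the usage line only uses the Monday and Friday endpoints.

```python
def pct_change(first, last):
    """Relative change from the first to the last run, in percent."""
    return (last - first) / first * 100


def inflexion_index(series):
    """Index of the run that follows the largest run-to-run increase (needs 2+ runs)."""
    steps = [b - a for a, b in zip(series, series[1:])]
    return steps.index(max(steps)) + 1


print(f"{pct_change(210, 229):+.1f}%")  # → +9.0%
```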

This last exchange chains three skills in one answer: bench-report reading for the widgets, the trend report machinery for the run-over-run comparison, and the PDF export skill for the shareable artefact. The agent handles the async export chain (submit, poll, presigned download URL) without you ever seeing a task id.

The trend report in OctoPerf
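The submit-then-poll pattern behind the export looks roughly like this. A sketch only: `get_status` is a hypothetical callable standing in for a `get_task_result` call, and the `"DONE"`/`"FAILED"` states are assumptions, not OctoPerf's documented values. Polling uses capped exponential backoff and an overall deadline.

```python
import time


def wait_for_task(get_status, task_id, timeout_s=120.0, first_delay_s=1.0, max_delay_s=10.0):
    """Poll get_status(task_id) until it finishes, with capped exponential backoff.

    get_status is hypothetical: it returns (state, result); "DONE" and "FAILED"
    are assumed state names.
    """
    deadline = time.monotonic() + timeout_s
    delay = first_delay_s
    while True:
        state, result = get_status(task_id)
        if state == "DONE":
            return result  # e.g. a presigned download URL
        if state == "FAILED":
            raise RuntimeError(f"task {task_id} failed: {result}")
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"task {task_id} still {state} after {timeout_s} s")
        time.sleep(delay)
        delay = min(delay * 2, max_delay_s)


# Fake status source for illustration: two "RUNNING" polls, then a URL.
responses = iter([("RUNNING", None), ("RUNNING", None), ("DONE", "https://example.com/report.pdf")])
print(wait_for_task(lambda _id: next(responses), "task-1", first_delay_s=0.01))
```

Since the presigned link expires quickly, a client would typically call this again for a fresh URL rather than cache the old one.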

The Complete Picture

Across the two articles, the eight bundled skills now cover the entire performance-testing loop, end to end:

Import and repair: validation triage, auto-correlation,
Run and understand: scenario diagnosis, async polling,
Measure what users feel: real-browser probe,
Automate and watch: scheduling, bench-report reading, PDF export.

Each step of this series happened in a chat, yet every artefact it produced (the corrected Virtual User, the hybrid scenario, the cron job, the trend report) is a plain OctoPerf object, visible and editable in the UI by anyone on the team. The agent is a colleague working in your project, not a parallel system.

Conclusion

If you have not connected an agent yet, the first article and the MCP Server documentation will get you there in minutes: point your client at https://api.octoperf.com/mcp, log in once, and start with something simple like "list my OctoPerf workspaces".
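Under the hood, an MCP client talks to the server in JSON-RPC 2.0 messages, and tool discovery is a `tools/list` request. The sketch below only builds the request body; the `jsonrpc_request` helper is hypothetical, and a real client (your agent) also performs the `initialize` handshake and the login before sending it.

```python
import json


def jsonrpc_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request body as framed by the Model Context Protocol."""
    body = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        body["params"] = params
    return json.dumps(body)


# Ask the server which tools it exposes.
print(jsonrpc_request(1, "tools/list"))  # → {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
```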

And these skills are only the packaged workflows. The server exposes around 100 tools, so the most interesting prompts are the ones we have not written yet: yours. The skills are open on GitHub, and if you compose a workflow worth sharing, we would love to see it.
