Designing a Token-Efficient MCP Server: the OctoPerf Approach
In the first two articles of this series we showed what the OctoPerf MCP Server does. This one is for the builders: how we designed it, and specifically how we kept its token cost under control.
Because here is the thing nobody tells you when you start writing a Model Context Protocol server: the hard part is not exposing your API to an LLM. The hard part is not exposing too much of it. Every byte a tool returns lands in the model's context window, where it costs money, adds latency, and dilutes the model's attention. A server that naively mirrors a REST API produces an agent that is expensive, slow, and confused.
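To make the cost argument concrete, here is a minimal sketch comparing a tool that returns a REST payload untouched with one that returns a short summary. It is an illustration under stated assumptions, not the OctoPerf implementation: the payload shape, the field names, and the `summarize` helper are all hypothetical, and the four-characters-per-token ratio is a rough heuristic rather than a real tokenizer.

```python
import json

# Hypothetical REST payload. Every field name is illustrative and is
# NOT taken from the OctoPerf API.
raw_response = {
    "id": "run-001",
    "name": "checkout-peak",
    "status": "FINISHED",
    "createdBy": {"id": "u-17", "email": "dev@example.com", "roles": ["admin", "tester"]},
    "links": [
        {"rel": "self", "href": "/runs/run-001"},
        {"rel": "logs", "href": "/runs/run-001/logs"},
    ],
    "samples": [{"t": i, "latencyMs": 100 + i % 7, "errors": 0} for i in range(200)],
}


def estimate_tokens(payload) -> int:
    """Rough size estimate, assuming about 4 characters per token."""
    text = json.dumps(payload, separators=(",", ":"))
    return max(1, len(text) // 4)


def summarize(run: dict) -> dict:
    """Keep only what an agent plausibly needs to decide its next step."""
    samples = run["samples"]
    latencies = sorted(s["latencyMs"] for s in samples)
    return {
        "id": run["id"],
        "status": run["status"],
        "sampleCount": len(samples),
        "medianLatencyMs": latencies[len(latencies) // 2],
        "errorCount": sum(s["errors"] for s in samples),
    }


naive_cost = estimate_tokens(raw_response)
trimmed_cost = estimate_tokens(summarize(raw_response))
print(f"naive: ~{naive_cost} tokens, trimmed: ~{trimmed_cost} tokens")
```

The exact numbers matter less than the ratio: the raw payload grows with every sample the API returns, while the summary stays roughly constant in size.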
This article walks through the five patterns we applied to avoid that fate. None of them is specific to load testing: if you are building an MCP server for your own product, they should transfer directly.
