Your agent is deterministic.
Your APIs aren’t.
That’s why your agent tests flake. Gostly records the real upstream once, then replays it byte-for-byte. Same input ↦ same response. Every run.
The shape of the problem
“Use AI to build deterministic automations once that can reproduce work. Don’t use AI to do repeatable work itself.”
— r/agi, organic thread
The agent-test failure mode
Your agent worked on Tuesday. It failed Wednesday. The agent didn’t change.
The Stripe sandbox returned a slightly different idempotency-key response. The eval flagged the agent as regressed. The agent was fine. Stripe moved.
This isn’t a bug in your code. It’s a property of the system: a deterministic agent calling a non-deterministic upstream cannot have a deterministic test. The only fix is to pin what the upstream returns.
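A toy example makes the argument concrete. Everything in this sketch is hypothetical (`agent`, `drifting_upstream`, `pinned_upstream`); none of it is Gostly code. The agent is a pure function of whatever the upstream returns, so its output is repeatable only once the upstream reply is pinned.

```python
import itertools

# Hypothetical upstream whose payload drifts between calls,
# standing in for a third-party sandbox that changes under you.
_versions = itertools.count(1)


def drifting_upstream(request: str) -> dict:
    return {"request": request, "api_version": next(_versions)}


# A pinned upstream hands back the same recorded payload every time.
_RECORDED = {"request": "charge", "api_version": 1}


def pinned_upstream(request: str) -> dict:
    return dict(_RECORDED)


def agent(upstream, request: str) -> str:
    """Deterministic agent: the output depends only on the upstream reply."""
    reply = upstream(request)
    return f"{reply['request']}@v{reply['api_version']}"


# Same agent, same input, different upstream behavior.
drift_a = agent(drifting_upstream, "charge")
drift_b = agent(drifting_upstream, "charge")
pinned_a = agent(pinned_upstream, "charge")
pinned_b = agent(pinned_upstream, "charge")
```

The agent code is identical in all four calls. Only the pinned pair can back an equality assertion in a test.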
I built this because I lost three days last quarter debugging an agent that hadn’t actually changed. The upstream had. I didn’t want to repeat that.
How it works for agents
Record one run
Run your agent against the real upstream once. Gostly sits between the agent and the third-party API and captures every request and response.
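The recording step can be pictured as a thin wrapper around the upstream call. This is a minimal sketch, not Gostly's implementation: `Recorder`, `request_key`, `fake_upstream`, and the `api.example.test` URL are all made up for illustration, and keying on a hash of method, URL, and body is an assumption about how requests might be matched.

```python
import hashlib
import json


def request_key(method: str, url: str, body: bytes) -> str:
    """Stable key for a request: hash of method, URL, and raw body."""
    digest = hashlib.sha256()
    for part in (method.upper().encode(), url.encode(), body):
        # Length-prefix each part so field boundaries stay unambiguous.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


class Recorder:
    """Sits between a caller and an upstream callable, capturing each exchange."""

    def __init__(self, upstream):
        self.upstream = upstream  # callable(method, url, body) -> bytes
        self.tape = {}            # request key -> raw response bytes

    def send(self, method: str, url: str, body: bytes = b"") -> bytes:
        response = self.upstream(method, url, body)
        self.tape[request_key(method, url, body)] = response
        return response


# Stand-in for the real third-party API (hypothetical).
def fake_upstream(method: str, url: str, body: bytes) -> bytes:
    return json.dumps({"echo": body.decode(), "url": url}).encode()


recorder = Recorder(fake_upstream)
live = recorder.send("POST", "https://api.example.test/v1/charges", b'{"amount": 100}')
```

The caller still receives the live response unchanged; the capture is a side effect of passing through.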
Replay forever
Switch the proxy to MOCK. Your agent gets byte-for-byte the same upstream behavior every test run. No tokens spent, no rate limits hit, no flake from drift.
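Replay is the mirror image: look the request up, return the stored bytes, and never touch the network. The names below (`Replayer`, `ReplayMiss`, the example URL) are hypothetical, and failing loudly on an unrecorded request is one reasonable design choice rather than documented Gostly behavior.

```python
class ReplayMiss(LookupError):
    """Raised when a request has no recorded response."""


class Replayer:
    """Serves recorded bytes only; it never contacts the upstream."""

    def __init__(self, tape):
        # Mapping of (METHOD, url, body) -> raw response bytes.
        self.tape = dict(tape)

    def send(self, method: str, url: str, body: bytes = b"") -> bytes:
        key = (method.upper(), url, body)
        try:
            return self.tape[key]
        except KeyError:
            raise ReplayMiss(f"no recording for {method} {url}") from None


tape = {
    ("GET", "https://api.example.test/v1/balance", b""): b'{"available": 4200}',
}
replayer = Replayer(tape)
first = replayer.send("GET", "https://api.example.test/v1/balance")
second = replayer.send("get", "https://api.example.test/v1/balance")
```

Because the stored value is raw bytes rather than a re-serialized object, the two calls return identical payloads down to whitespace and key order.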
Redacted at capture
Auth headers are stripped before any bytes touch disk: 16 PII header classes are redacted by structural invariant. Bodies are scrubbed for the obvious patterns before persistence.
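One way to read "by structural invariant" is that the code path that writes to disk only accepts headers that have already been redacted. The sketch below illustrates that idea under stated assumptions: `RedactedHeaders` and `serialize_for_disk` are hypothetical names, and the five header names are an illustrative sample, not Gostly's 16 classes.

```python
import json

# Illustrative sample only; not the actual list Gostly redacts.
SENSITIVE_HEADERS = frozenset(
    {"authorization", "proxy-authorization", "cookie", "set-cookie", "x-api-key"}
)


class RedactedHeaders:
    """Headers with sensitive names already removed (matched case-insensitively)."""

    def __init__(self, raw: dict):
        self.safe = {
            name: value
            for name, value in raw.items()
            if name.lower() not in SENSITIVE_HEADERS
        }


def serialize_for_disk(headers: RedactedHeaders) -> bytes:
    # Only RedactedHeaders is accepted, so raw headers cannot be
    # persisted by accident: the check runs before any bytes are built.
    if not isinstance(headers, RedactedHeaders):
        raise TypeError("headers must be redacted before persistence")
    return json.dumps(headers.safe, sort_keys=True).encode()


raw = {
    "Authorization": "Bearer sk_test_123",
    "Content-Type": "application/json",
    "X-Api-Key": "abc",
}
blob = serialize_for_disk(RedactedHeaders(raw))
```

Passing the raw dict straight to `serialize_for_disk` raises `TypeError`, so skipping the redaction step is an error rather than a silent leak.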
Gap-fill when shape drifts
When the recording doesn’t cover a new path the agent tries, Gostly proposes a patch grounded in the existing recording. You approve before it ships. No LLM in the deterministic hot path, ever.
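The approve-before-ship rule can be sketched as two separate stores: replay reads only from the approved tape, and proposals sit in a pending area until a human promotes them. `GapFillQueue` and its methods are hypothetical; how Gostly actually generates or stores proposals is not described here.

```python
class GapFillQueue:
    """Proposed responses stay pending until explicitly approved."""

    def __init__(self, tape):
        self.tape = dict(tape)  # approved recordings, used by replay
        self.pending = {}       # proposals awaiting review, never replayed

    def propose(self, key: str, response: bytes) -> None:
        self.pending[key] = response

    def approve(self, key: str) -> None:
        # Promotion is the only way a proposal reaches the replay tape.
        self.tape[key] = self.pending.pop(key)

    def replay(self, key: str):
        # The hot path consults the approved tape only.
        return self.tape.get(key)


queue = GapFillQueue({"GET /v1/balance": b'{"available": 4200}'})
queue.propose("GET /v1/payouts", b'{"payouts": []}')
before = queue.replay("GET /v1/payouts")
queue.approve("GET /v1/payouts")
after = queue.replay("GET /v1/payouts")
```

Nothing generated at proposal time is visible to a test run until `approve` is called, which keeps the replay path deterministic.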
Works with the stack you already have
Gostly is an HTTP proxy. If your agent speaks HTTP to a third party (and it does), the proxy is transparent to it. Tested with:
- Anthropic SDK (Claude, Claude Code)
- OpenAI SDK
- LangChain / LangGraph
- Pydantic AI / Instructor
- Cursor agent tool calls
- Anything that speaks HTTP to a third party
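Many HTTP clients pick up a proxy from the conventional `HTTP_PROXY` and `HTTPS_PROXY` environment variables, so one plausible way to route a test run through a proxy is to set them for the child process. This is a sketch under assumptions: the address `http://127.0.0.1:8080` and the helper `with_proxy_env` are hypothetical, and whether Gostly is wired in this way or through a base-URL override is not stated here.

```python
import os


def with_proxy_env(base_env, proxy_url: str) -> dict:
    """Return a copy of base_env with the conventional proxy variables set."""
    env = dict(base_env)
    # Set both spellings; clients differ in which one they read.
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        env[name] = proxy_url
    return env


PROXY_URL = "http://127.0.0.1:8080"  # hypothetical local proxy address
env = with_proxy_env(os.environ, PROXY_URL)

# The resulting mapping can be handed to a child process, for example:
#   subprocess.run(["pytest", "tests/agent"], env=env)
```

Building a fresh mapping instead of mutating `os.environ` keeps the proxy setting scoped to the test run that asked for it.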
Pricing
$0 to start. Self-hosted OSS proxy. No license key needed.
$10/mo when you want the AI gap-fill on unrecorded paths and drift detection.
$79/seat when the team grows — SAML, OIDC, RBAC, audit log, shared adapters.
The $10 is locked through Dec 2026 if you sign up before Jul 1. We won’t move the line on you.
Same input. Same response. Every run.
Stop debugging upstream drift. Pin the recording. Ship the agent.