For agent & harness developers

Give your agent harness the keys to prod. Without giving it prod.

Gostly records your real services once and replays them byte-for-byte. Your harness’s tool calls are answered by the recording itself — the exact captured bytes, served straight back. No second model or agent sits between the call and the response, and nothing goes out to the live service.

Let the loop retry and branch as hard as it wants. The blast radius is a recording on your own disk — not your production database.

Get started free · See it live (interactive demo) · Why determinism matters

payments-api · upstream OFFLINE

UPSTREAM OFFLINE · ✓ 29 / 29 GREEN

$ docker stop upstream-api # kill the real API

upstream-api stopped

$ pytest tests/ -v # replayed from Gostly

tests/payments::test_list_customers PASSED

tests/payments::test_get_customer PASSED

· · · · · · · · · · · · · · · · · · · ·

========= 29 passed in 0.85s =========

↳ 0 live calls — served byte-for-byte

take the real service offline — the harness keeps calling, served byte-for-byte

─ WHAT A TOOL CALL CAN COST

An agent with write access and a retry loop will eventually call prod the wrong way.

Modern harnesses (Claude Code, Cursor, your own LangGraph loop) don’t just generate text. They make tool calls: real HTTP requests to Stripe, GitHub, your internal services. Give a tool the write access it needs to be useful, and the same autonomy that ships the feature can delete the row or double-charge the card at 2am. Every iteration spends tokens whether the run was useful or not.

Grant the access

A tool call has to touch real systems to do real work, so you give it write scope.

Run the loop

The harness retries, branches, and explores. Most paths are harmless. One won’t be.

Pay either way

Useful or not, every iteration spends tokens and counts against the live rate limit.

─ HOW IT WORKS FOR AGENTS

Record once. Replay deterministically.

Gostly is an HTTP proxy. If your agent speaks HTTP to a third party (and it does), the proxy is transparent to it. Four stages: one real recorded run, then no live call at replay time.

01 · RECORD

Record one real run.

Run your harness against the real services once. Gostly sits between the agent and every third-party API and captures each tool call (request and response) over HTTP and TLS.

02 · REPLAY

Replay it byte-for-byte.

Flip the proxy to MOCK. Your agent's calls are answered by the recording itself — a direct lookup that returns the exact captured bytes, with no second model or agent in the outbound path re-deriving the response. Same bytes every run: no tokens spent, no rate limits hit, and a destructive call lands on the recording, not prod.
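One way to picture replay is as a plain lookup table. The sketch below is illustrative only and is not Gostly's implementation: the `Recording` class, its key scheme (method, path, SHA-256 of the request body), and the sample payload are all assumptions made for the example.

```python
import hashlib
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (method, path, sha256 of request body)


def request_key(method: str, path: str, body: bytes = b"") -> Key:
    """Build a stable lookup key for a request (hypothetical key scheme)."""
    return (method.upper(), path, hashlib.sha256(body).hexdigest())


class Recording:
    """Toy in-memory recording: captured once, then served back verbatim."""

    def __init__(self) -> None:
        self._responses: Dict[Key, bytes] = {}

    def record(self, method: str, path: str, body: bytes, response: bytes) -> None:
        self._responses[request_key(method, path, body)] = response

    def replay(self, method: str, path: str, body: bytes = b"") -> Optional[bytes]:
        # A direct lookup: no model, no network, just the stored bytes.
        return self._responses.get(request_key(method, path, body))


recording = Recording()
recording.record("GET", "/customers/42", b"", b'{"id": 42, "name": "Ada"}')

first = recording.replay("get", "/customers/42")
second = recording.replay("GET", "/customers/42")
miss = recording.replay("DELETE", "/customers/42")  # never recorded
```

Here `first` and `second` are the same bytes on every run, and the unrecorded `DELETE` returns `None` instead of reaching anything live.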

03 · REDACT

Redacted before it serves.

Credential and auth headers are stripped before any bytes touch disk. Bodies are scrubbed for common secret patterns when a recording is promoted to the served library. It's a floor you can raise per service, with verbatim capture as an opt-in only where you need it.
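As a rough illustration of the two redaction passes described above, here is a minimal sketch. The header deny-list, the secret patterns, and the `[REDACTED]` placeholder are all hypothetical choices for the example, not Gostly's actual rules.

```python
import re
from typing import Dict

# Hypothetical deny-list; a real deployment would configure its own.
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie", "x-api-key"}

# Illustrative patterns only: an "sk_live_"-style key and a bearer token.
SECRET_PATTERNS = [
    re.compile(r"sk_(?:live|test)_[A-Za-z0-9]{8,}"),
    re.compile(r"Bearer\s+[A-Za-z0-9._\-]{8,}"),
]


def strip_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Drop credential-bearing headers (case-insensitive) before storage."""
    return {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}


def scrub_body(body: str, placeholder: str = "[REDACTED]") -> str:
    """Replace anything matching a known secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        body = pattern.sub(placeholder, body)
    return body


headers = strip_headers(
    {"Authorization": "Bearer abc123def456", "Accept": "application/json"}
)
body = scrub_body('{"key": "sk_live_EXAMPLEKEY12345", "amount": 100}')
```

Pattern-based scrubbing only catches what its patterns describe, which is why it makes sense to treat it as a floor and tighten it per service.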

04 · GAP-FILL

Drift becomes a patch you approve.

When the upstream shape moves, Gostly surfaces it as drift events and proposes a patch grounded in the existing recording. You approve before it ships; nothing auto-applies.

─ SEE IT WORK

The hard parts of testing an agent.

Multi-step flows, upstream drift, and failure injection are exactly where a static stub set falls apart. Each of the three below is part of the real product surface.

gostly.internal / statecharts

resource lifecycle · replayed in order

created

fetched

updated

closed

✓ GET after POST returns the created resource

✓ the second call isn’t a 404 — no hand-wired fixtures

STATEFUL FLOWS

The second call isn’t a 404.

Agents retry the same upstream four ways. Recorded multi-step sequences replay in order — a POST then a GET returns the resource it just created, with no hand-wired fixtures.
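Ordered replay can be sketched as one queue of responses per endpoint, served in capture order. This is a simplified model under assumed names (`SequenceReplay`, the `/orders` paths, and the payloads are hypothetical), not a description of how Gostly matches requests internally.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Step = Tuple[str, str, int, str]  # (method, path, status, body)


class SequenceReplay:
    """Serve recorded responses per endpoint in the order they were captured."""

    def __init__(self, steps: List[Step]) -> None:
        self._queues: Dict[Tuple[str, str], Deque[Tuple[int, str]]] = {}
        for method, path, status, body in steps:
            self._queues.setdefault((method, path), deque()).append((status, body))

    def handle(self, method: str, path: str) -> Tuple[int, str]:
        queue = self._queues.get((method, path))
        if not queue:
            return (404, "")  # nothing recorded for this call
        # Keep serving the last recorded response once the sequence is used up.
        return queue.popleft() if len(queue) > 1 else queue[0]


replay = SequenceReplay([
    ("GET", "/orders/7", 404, '{"error": "not found"}'),
    ("POST", "/orders", 201, '{"id": 7}'),
    ("GET", "/orders/7", 200, '{"id": 7, "status": "created"}'),
])

before = replay.handle("GET", "/orders/7")   # recorded before the order existed
created = replay.handle("POST", "/orders")
after = replay.handle("GET", "/orders/7")    # recorded after the POST
```

The same `GET` returns a different response depending on where it falls in the recorded sequence, which is the behavior a flat stub cannot reproduce.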

gostly.internal / drift

drift vs accepted baseline · 3 changes detected

＋ field added · customer.tax_id · audit-logged ✓

⇄ type changed · amount string → number · audit-logged ✓

⇄ status changed · 200 → 201 · audit-logged ✓

ignores healthy id / timestamp variance

DRIFT DETECTION

When the API moves, you see it.

Gostly schema-diffs each fresh capture against the baseline you accepted and surfaces the changes as audit-logged events — ignoring healthy id / timestamp variance.
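The idea of diffing shape instead of values can be shown in a few lines. This is a minimal sketch, assuming JSON object payloads; the `shape` and `schema_diff` helpers and the sample data are hypothetical and say nothing about Gostly's actual diff engine.

```python
from typing import Any, Dict, List


def shape(value: Any, prefix: str = "") -> Dict[str, str]:
    """Flatten a JSON value into {dotted.path: type name}, discarding values."""
    if isinstance(value, dict):
        out: Dict[str, str] = {}
        for key, child in value.items():
            out.update(shape(child, f"{prefix}.{key}" if prefix else key))
        return out
    return {prefix: type(value).__name__}


def schema_diff(baseline: Any, fresh: Any) -> List[str]:
    """List added, removed, and retyped fields between two payloads."""
    old, new = shape(baseline), shape(fresh)
    changes = [f"field added: {p}" for p in sorted(new.keys() - old.keys())]
    changes += [f"field removed: {p}" for p in sorted(old.keys() - new.keys())]
    changes += [
        f"type changed: {p} {old[p]} -> {new[p]}"
        for p in sorted(old.keys() & new.keys())
        if old[p] != new[p]
    ]
    return changes


baseline = {"customer": {"id": "cus_1"}, "amount": "1000", "created": 1700000000}
fresh = {"customer": {"id": "cus_2", "tax_id": "X1"}, "amount": 1000, "created": 1700000999}
changes = schema_diff(baseline, fresh)
```

Because only paths and types are compared, the changed `id` and `created` values produce no event, while the new `tax_id` field and the retyped `amount` do.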

gostly.internal / chaos workbench

fault profile

latency +800ms

errors 503 · 20%

rate-limit 429

live request

→ GET /orders

⚠ injected: 503 + 812ms

↻ client retried

✓ recovered on retry

deterministic — same config, same chaos

FAULT INJECTION

Prove the agent copes.

Inject latency, 5xx errors, and rate limits, then watch your agent's retry and fallback paths under the failure modes a happy-path fixture never covers. Seed the run, and a bad outcome reproduces exactly.
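Seeded fault injection comes down to drawing faults from a private, seeded random generator. The sketch below assumes a hypothetical `fault_plan` helper with defaults that mirror the profile shown above (503 errors at 20%, 800 ms of added latency); it is not Gostly's configuration format.

```python
import random
from typing import List, Tuple


def fault_plan(seed: int, calls: int, error_rate: float = 0.2,
               extra_latency_ms: int = 800) -> List[Tuple[int, int]]:
    """Return a (status, added latency in ms) pair for each call in a run."""
    rng = random.Random(seed)  # private generator: no global state involved
    plan = []
    for _ in range(calls):
        if rng.random() < error_rate:
            plan.append((503, extra_latency_ms))
        else:
            plan.append((200, 0))
    return plan


run_a = fault_plan(seed=7, calls=50)
run_b = fault_plan(seed=7, calls=50)
```

Two runs with the same seed produce the same fault sequence, so a failure seen once can be replayed on demand while debugging the agent's retry logic.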

─ PUT A PROCESS AROUND AGENT RUNS

Token-max your agents, not your error budget.

Vibe-code the agent as loosely as you want, then pin a known-good recording and gate every change against it. Run your own eval suite in CI against the replay: it stays deterministic and offline, because MOCK makes no live call. Run the loop flat-out across every expensive edge case. Your token bill climbs; your incident count doesn’t.

Same input, same response, every run. So a red pipeline means the agent regressed, not that the upstream moved under it.
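A CI gate against a pinned recording might look like the pytest-style sketch below. Everything in it is a stand-in: `PINNED`, `replayed_tool`, and `agent_summarize` are hypothetical names, and a real suite would call your own agent with its traffic routed through the replay.

```python
import json
from typing import Callable, Dict

# Hypothetical pinned recording: tool-call path -> exact response body.
PINNED: Dict[str, str] = {
    "/customers/42": '{"id": 42, "plan": "pro"}',
    "/customers/43": '{"id": 43, "plan": "free"}',
}


def replayed_tool(path: str) -> str:
    """Stand-in for a tool call answered from the recording, never the network."""
    return PINNED[path]


def agent_summarize(tool: Callable[[str], str]) -> str:
    """Placeholder for the agent step under test."""
    plans = [json.loads(tool(path))["plan"] for path in sorted(PINNED)]
    return ",".join(plans)


def test_agent_output_is_stable() -> None:
    # Same recording in, same answer out: a failure here means the agent changed.
    assert agent_summarize(replayed_tool) == "pro,free"
```

Since the tool responses are fixed, the only variable left in the test is the agent itself.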

gostly.internal / insights

FIDELITY

100%

EXACT MATCH

69%

MISS RATE

per-endpoint coverage

GET /customers/:id · 100%

POST /charges · 100%

every tool call replayed in CI — served from the recording, never the live API

─ WORKS WITH THE STACK YOU HAVE

No SDK changes. No config rewrites. Point your harness’s traffic through it.

Because Gostly intercepts at the HTTP/TLS boundary, it’s transparent to whatever your harness is built on. Tested with:

Anthropic SDK (Claude, Claude Code)
OpenAI SDK
LangChain / LangGraph
Pydantic AI / Instructor
Cursor agent tool calls
MCP tool calls over HTTP
Any harness that speaks HTTP to a third party
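One common way to route a harness through an HTTP proxy without touching SDK code is the standard proxy environment variables, which many HTTP clients (including `requests` and `httpx`) read by default. The sketch below uses only Python's standard library; the proxy address is a hypothetical placeholder, and whether Gostly expects this exact setup is an assumption. A TLS-intercepting proxy generally also needs its CA certificate trusted by the client, which is not shown here.

```python
import os
import urllib.request

# Hypothetical address: substitute wherever your proxy actually listens.
PROXY_URL = "http://127.0.0.1:8080"

# Lowercase names are the most widely honored form of these variables.
os.environ["http_proxy"] = PROXY_URL
os.environ["https_proxy"] = PROXY_URL

# Python's standard library now resolves both schemes to the proxy.
proxies = urllib.request.getproxies()
```

In practice the variables would be set in the shell or CI job that launches the harness, so the agent code itself stays unchanged.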

─ BUILT TO CLEAR A SECURITY REVIEW

Your agent’s traffic never leaves your perimeter.

Gostly runs entirely in your own infrastructure, single-tenant per deployment: recorded traffic is never mixed with another customer’s data and never sent to us. Single sign-on (SAML and OIDC), role-based access control, and a full audit trail come standard on Team. If a vendor outage is your worry, there is no vendor in the path to go down.

REVIEW THE SECURITY MODEL →

gostly.internal · audit trail

every mutation · who, what, when · TENANT-SCOPED

admin@… · created service · payments-api · 14:02

admin@… · switched to MOCK · payments-api · 14:05

admin@… · approved repair · #1281 · 14:09

system · redaction floor applied · 14:09

built to clear a security review

─ PRICING

Self-hosted free. Pro for the gap-fill. Team for the controls.

Sign up before September 1 and the $10 early-access price is locked through Dec 2026.

FREE

$0 forever

Self-hosted OSS proxy. Record, replay, and the deterministic match cascade. No license key needed.

PRO

$10 / month

Adds drift detection and AI gap-fill on unrecorded paths. The model is served from cache and is never in the request path.

TEAM

$79 / seat / month

SAML + OIDC SSO, RBAC, audit log, and shared adapters across the team. 3-seat minimum.

─ READY WHEN YOU ARE

Let the loop run hot. Keep prod boring.

Point your harness at a byte-for-byte recording, gate it in CI, and let your agents hammer every path they want. No tokens burned on rate-limited retries, and no tool call reaching prod that shouldn’t.
