Deterministic LLM mocks — recorded from the real provider, replayed in your perimeter
llmock (and the related aimock project) and VCR-style cassettes give you a fast way to record an LLM call once and replay it deterministically in tests. Gostly takes the same record-and-replay idea and makes it infrastructure: a self-hosted proxy you point your agent’s provider base URL at, that records the real responses and replays them byte-for-byte — language-agnostic, with drift detection and a statechart for multi-step tool flows. Your API keys, prompts, and completions stay on the host.
llmock is a clean answer to a real problem: agent tests that hit a live model are slow, expensive, and flaky, and they drift every time the provider ships a new checkpoint. So llmock (and the related aimock) runs a local mock server you can reach from any process on a port, records real provider responses VCR-style the first time, and replays them deterministically after that. It supports the LLM call shape end to end — OpenAI, Claude, Gemini, Bedrock, and more — including full streaming, preserving time-to-first-token and inter-frame cadence. If your need is “make one test file deterministic without a network call,” a fixture-based mock server or a VCR cassette is the lighter, more direct tool, and often the right one.
Gostly starts from a different scope. It is not a per-language library you drop into a test file; it is a self-hosted proxy that sits in front of an upstream for your whole stack. You point your agent’s provider base URL at it, run LEARN to record an hour of real responses, then flip to MOCK and it replays them byte-for-byte — in your perimeter, with no model in the serving path. Because it lives at the proxy layer, every service and every language in your system gets the same recorded library; there is nothing to wire per test framework.
Both approaches keep your secrets local. llmock replays from on-disk fixtures with no live call; Gostly records into a library on your own host and never sends recorded traffic, mocks, or PII off the box — credential headers are redacted before anything is written to disk. For a team whose blocker is “our prompts and completions can’t touch a third-party service,” that perimeter property is the whole point, and it holds across every upstream your agent calls, not just the model.
The pain that grows over time is drift. A recorded cassette keeps returning the same tokens long after the provider changed its response shape — the test stays green, and the agent breaks in production against the new format. Gostly re-records the upstream and diffs it against what you captured, so it emits a drift event and a 0–100 freshness score before an incident does. And for multi-step tool flows, a recorded statechart means a POST followed by a GET-by-id behaves like the real resource lifecycle instead of a flat per-request fixture.
Because Gostly runs as your own single-tenant deployment, the controls a security review asks for are built in: SSO via SAML or OIDC, role-based access control, and an append-only audit log on Team, all self-hosted in the container. Each deployment is single-tenant, so your data is isolated by running in your own perimeter, never shared with another customer’s. The trade-off is real, too: llmock and aimock are purpose-built for the model-call shape — they reproduce streaming token cadence and the provider-specific protocol details in a way a general HTTP recorder does not, and a single-file VCR cassette is lighter to adopt for one suite.
A VCR cassette makes one test file deterministic. Gostly makes the whole stack deterministic: point any service’s provider base URL at a self-hosted proxy, and the mock is the recording — the same bytes the real provider sent, replayed in your perimeter, with drift detection when the provider moves.
payments-api · upstream OFFLINE
UPSTREAM OFFLINE✓ 29 / 29 GREEN
$ docker stop upstream-api # kill the real API
upstream-api stopped
$ pytest tests/ -v # replayed from Gostly
tests/payments::test_list_customers PASSED
tests/payments::test_get_customer PASSED
· · · · · · · · · · · · · · · · · · · ·
========= 29 passed in 0.85s =========
↳ 0 live calls — served byte-for-byte
cut the provider off — the recorded library still serves, 0 live model calls
Recorded statechart — POST then GET-by-id just works
Chaos / fault injection
Configurable error + latency fixtures
Markov chaos that learns a degradation profile from your traffic
Mock everything an agent talks to (MCP, vector DB, search)
aimock targets the broader AI surface (MCP, vector DBs, and more)
Any HTTP/HTTPS upstream; gRPC + async messaging are roadmap
Cold-start library before you record
Hand-write or generate fixtures
Seed from a HAR / Postman / OpenAPI file
Keys / prompts / completions stay on the host
Yes — local fixtures, no live calls in replay
Yes — recorded traffic + PII never leave your perimeter
SSO / RBAC / audit log
No
SAML + OIDC SSO, 4-role RBAC, append-only audit log on Team
Roadmap items are labelled explicitly. Statecharts (stateful flows), live drift detection, Markov chaos, and SSO / RBAC / audit log ship today; gRPC, async messaging, and database mocking stay roadmap on Gostly’s side. Streaming-token replay with original timing fidelity is something the purpose-built LLM mocks do and Gostly does not — Gostly replays the recorded response body byte-for-byte, buffered, not re-streamed frame-by-frame.
Choose llmock / a VCR cassette when
→You want to make one test file or one suite deterministic with the least possible setup.
→You need faithful streaming-token replay — time-to-first-token and inter-frame cadence matching the provider’s SSE protocol. That is exactly what the LLM-specific mocks are built for.
→You want the mock to understand provider-specific request shapes (OpenAI, Claude, Gemini, Bedrock, …) out of the box rather than treating them as opaque HTTP.
→Your agent talks to MCP servers, vector DBs, or rerankers and you want one tool that mocks all of them — aimock is purpose-built for that surface.
→An MIT-licensed npm package or a Python cassette library fits your toolchain better than running a proxy stack.
Choose Gostly when
→You want determinism across the whole stack, not one suite — every service and every language points its base URL at one self-hosted proxy and gets the same recorded library.
→Your API keys, prompts, and completions can’t leave the host — recorded traffic and PII never leave your perimeter, and credential headers are redacted before disk.
→You need deterministic, byte-for-byte replay in CI — the same bytes every run, with no model in the serving path.
→You need drift detection that flags when the real provider’s response shape moves, and recorded statecharts so multi-step tool flows replay in the right order.
→You need SSO (SAML/OIDC), role-based access, and an append-only audit log on Team — single-tenant, in your own perimeter, with data that never leaves it.
Pricing, side by side
Tier
llmock / aimock
Gostly
Free / OSS
Fully open source (MIT) · self-hosted, no paid tier
OSS proxy (FSL) · unlimited services, self-hosted
Pro / Solo
No paid tier — it’s a library
$10 / mo single user
Team
—
$79 / seat / mo
Self-host / Enterprise
—
$499 / mo Self-host · $25K+ Enterprise
llmock / aimock is free and open source — there is no price to compare against, and for a single-suite need that is unbeatable. Gostly’s OSS proxy is free and self-hosted too; the paid tiers buy the control plane, drift detection, statechart editing, fingerprint-matched outbound TLS, SSO / RBAC / audit, and self-hosted inference — the things you reach for when mocking your agent’s dependencies stops being one test file and becomes shared infrastructure.
No empty library on day one
The objection that stalls most record-and-replay adoptions is “I have no recordings yet, so until I run the proxy against real traffic the library is empty.” You aren’t starting from zero. Drop a HAR capture, a Postman collection, or an OpenAPI spec into the dashboard and Gostly seeds a working mock library before you proxy a single request — the same library your recorded traffic lands in. Then real responses sharpen it to ground truth, and AI fills the gaps the recordings did not cover, grounded by the patterns it already saw.
gostly.internal / cold-start seeding
cold-start seeding 47 MOCKS · SERVING
⤓
drop a HAR · Postman · OpenAPI file
payments.har — 47 entries · 312 KiB
HARPostmanOpenAPI
✓ GET /customers/:id — seeded
✓ POST /charges — seeded
✓ GET /invoices — seeded
· flip to MOCK — a full library on day one
no empty library — seeded before you record a single call
In the dashboard, open Cold-start seeding, drag in the file, pick the service, and commit. The mocks land in the same library your recorded traffic does — flip the proxy to MOCKand they serve immediately. Then point your agent’s provider base URL at the containerized proxy in LEARN mode and run your real suite to sharpen it to ground truth. No per-language wiring, no host CLI — the whole flow lives in the dashboard and the proxy.
Deterministic agent tests, in your own perimeter
Self-hosted, language-agnostic, recording-first. Point your agent’s provider base URL at it for an hour and see whether it produces a mock you trust — with your keys and prompts never leaving the host.