Architecture
The licensed Gostly product ships as a Docker Compose stack of containers that share a single ./data volume. A Rust proxy sits in the request path; a Python control plane owns the library and the dashboard; a Python inference server fills the edge of the match cascade; Postgres is the canonical store. This page is the technical version of the marketing architecture overview — env-var names, ports, and endpoints included.
The containers
A licensed deployment is four long-running containers plus an opt-in model sidecar. Each runs its own process with explicit interfaces between them. Internal services sit on a private Docker network; only the proxy and the web container are reachable from the host.
ghost-proxy
Rust · :8080 / :8443
The only component in the request path and the only one that sees production payloads. Records traffic during LEARN, serves the library during MOCK, forwards transparently during PASSTHROUGH. Plain HTTP on :8080; the TLS-MITM listener binds :8443 only when ENABLE_TLS_INTERCEPTION is set.
ghost-web
Python + Next.js · :8000 / :3000
One container running two services under supervisord: the FastAPI control plane on :8000 (mocks, services, traffic, transitions, drift, repair, webhooks, resources, auth/RBAC) and the Next.js operator dashboard on :3000. Reads the proxy's append-only JSONL captures and reconciles them into Postgres.
ghost-postgres
Postgres 16 · internal
The canonical store: the mock library, drift events, repair proposals, training sessions, statechart overrides, the append-only audit log, and the user workspace. On the internal network only; never published to the host.
ghost-inference
Python + PyTorch · :5000
Runs RAG retrieval and (optionally) generation. Called only when the deterministic match cascade falls through. On the internal network; reads the shared mock library and the FAISS index from /data.
ghost-llamacpp
opt-in sidecar · isolated
A llama.cpp generation backend behind ghost-inference, enabled via a compose profile. Sits alone on an internal:true network with no route to the host gateway, so there is structurally no internet egress. Serves generation by default, so ENABLE_GENERATION can stay off in-process.
Distribution
# Reachable from the host
ghost-proxy      :8080   plain HTTP proxy (the only thing your app talks to)
ghost-proxy      :8443   TLS-MITM listener (binds only when interception is on)
ghost-web        :8000   control-plane API
ghost-web        :3000   operator dashboard

# Internal network only (not published to the host)
ghost-postgres           canonical state
ghost-inference  :5000   RAG + generation, called only at the cascade edge
Operating modes
The proxy is in exactly one mode at a time, settable per-service or globally. Modes are explicit — there is no ambiguity about whether a request reaches the upstream, returns a recording, or falls through to inference. INITIAL_MODE defaults to LEARN; mode state persists in /data/mode.txt.
LEARN
Every request is forwarded to the upstream; every response is recorded after credential redaction. The library grows. This is the default starting mode.
MOCK
No upstream call. Every request runs through the match cascade against the recorded library. Tests are deterministic; CI is offline.
PASSTHROUGH
Every request is forwarded to the upstream and nothing is recorded — the proxy is transparent. Useful for diagnosing whether the proxy itself is part of an issue.
TRANSITIONING
A brief interstitial while a fresh LEARN library is scrubbed into MOCK-ready state. The proxy returns 503 with a Retry-After header so callers back off cleanly instead of matching a half-written library.
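A caller that wants to ride out TRANSITIONING can honor the Retry-After header instead of failing on the first 503. The following is a minimal sketch, not part of Gostly: `call_with_backoff`, its parameters, and the `(status, headers, body)` tuple shape are illustrative assumptions, and it handles only the delta-seconds form of Retry-After.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for this sketch: (status, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry while the proxy answers 503, waiting as long as Retry-After asks."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers, _body = response
        if status != 503:
            break
        # Header lookup is case-sensitive here; real HTTP clients normalize it.
        raw = headers.get("Retry-After", "")
        # Retry-After may also be an HTTP date; this sketch only reads seconds.
        delay = float(raw) if raw.isdigit() else default_delay
        sleep(delay)
        response = send()
    return response
```

The `sleep` parameter is injectable so a test suite can assert on the requested delays without actually waiting.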
Set the global mode via the control plane:
curl -X POST 'http://localhost:8000/v1/mode' \
-H "X-API-Key: $GHOST_API_KEY" \
-H 'Content-Type: application/json' \
-d '{"mode": "MOCK"}'
The match cascade
In MOCK mode every request runs through a deterministic cascade that short-circuits at the first stage to produce a match. The cheap, deterministic stages run first; AI only ever runs at the edge.
Session-verbatim
If the request was seen during the active session, an in-memory byte-exact capture replays it — bodies and headers, tagged X-Ghost-Mock: session-verbatim. RAM-only: it never touches disk and is gone on restart or a new LEARN window.
Exact match
Method + normalized URI (and request-body hash) match a recorded entry. The recorded response is returned verbatim — same body, headers, status. No model, no randomness.
Resource store
Links a POST-created resource to a later GET-by-id, so POST /charges followed by GET /charges/{id} returns the created resource instead of a 404.
Smart swap
URI path parameters are normalized to templates (/users/{id}) and matched structurally — a recording of /users/42 serves a request to /users/99. The Free tier stops here; the AI stages below do not fire.
AI inference (edge)
Last resort, Pro+. When nothing above matches, the inference server proposes a response grounded in the recorded patterns for that route, gated by a confidence threshold.
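The short-circuit behavior of the stages above can be illustrated with a small sketch. This is not the proxy's implementation (which is written in Rust): the `RECORDED` library, the stage functions, and the rule that only purely numeric segments become `{id}` are assumptions chosen for illustration, and only two of the stages are modeled.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Request = Tuple[str, str]  # (method, path); bodies omitted for brevity
Stage = Tuple[str, Callable[[Request], Optional[str]]]

# Hypothetical library: recorded response bodies keyed by (method, path).
RECORDED: Dict[Request, str] = {("GET", "/users/42"): '{"id": 42}'}


def to_template(path: str) -> str:
    """Replace purely numeric path segments with an {id} placeholder."""
    return re.sub(r"/\d+(?=/|$)", "/{id}", path)


def exact_match(req: Request) -> Optional[str]:
    return RECORDED.get(req)


def smart_swap(req: Request) -> Optional[str]:
    """Match structurally: same method and same path template."""
    method, path = req
    wanted = to_template(path)
    for (rec_method, rec_path), body in RECORDED.items():
        if rec_method == method and to_template(rec_path) == wanted:
            return body
    return None


# Cheap deterministic stages first; order defines priority.
CASCADE: List[Stage] = [("exact", exact_match), ("smart_swap", smart_swap)]


def resolve(req: Request) -> Tuple[str, Optional[str]]:
    """Return (match_type, response) from the first stage that matches."""
    for name, stage in CASCADE:
        response = stage(req)
        if response is not None:
            return name, response
    return "miss", None
```

In this sketch the smart-swap stage returns the recorded body unchanged; whether and how identifiers inside the body are rewritten is not covered here.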
No LLM in the request hot path
Statecharts, resources, and drift
Stateful flows stay coherent across calls. The agent-side statechart engine fires at request time on every tier with a set of bundled fixtures — charge, customer, invoice, order, and subscription. A PATCH/POST transition advances a Harel statechart that rewrites the status field and tags the response X-Ghost-Transition. The statechart editor and per-tenant overrides are Pro+.
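As a rough illustration of a transition rewriting the status field, here is a flat state-machine sketch. It is deliberately simpler than a Harel statechart (no hierarchy or parallel regions), and the transition table, the event names, and the X-Ghost-Transition value format are all assumptions rather than the bundled fixtures.

```python
from typing import Dict, Tuple

# Hypothetical transition table: (current_status, event) -> next_status.
CHARGE_TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("pending", "capture"): "succeeded",
    ("succeeded", "refund"): "refunded",
}


def apply_transition(resource: dict, event: str) -> Tuple[dict, Dict[str, str]]:
    """Return the updated resource and any extra response headers."""
    current = resource.get("status", "")
    next_status = CHARGE_TRANSITIONS.get((current, event))
    if next_status is None:
        # No transition defined: leave the mock untouched, add no header.
        return resource, {}
    updated = {**resource, "status": next_status}
    # The header value format below is an assumption for this sketch.
    return updated, {"X-Ghost-Transition": f"{current}->{next_status}"}
```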
The control plane's drift detector compares fresh upstream shapes against the recorded library, emitting drift events plus a 0–100 freshness score and a sparkline trend so operators can see at a glance which services have decayed. Drift surfaces are Pro+.
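Comparing "shapes" can be pictured as flattening each JSON body to a map of field paths and types, then diffing the two maps. The sketch below shows that idea only; `shape_of` and `diff_shapes` are hypothetical helpers, and it makes no attempt to reproduce how Gostly computes drift events or the freshness score.

```python
from typing import Any, Dict, List


def shape_of(value: Any, prefix: str = "") -> Dict[str, str]:
    """Flatten a JSON value into {field_path: type_name}."""
    if isinstance(value, dict):
        shape: Dict[str, str] = {}
        for key, child in value.items():
            shape.update(shape_of(child, f"{prefix}.{key}" if prefix else key))
        return shape
    if isinstance(value, list):
        # Only the first element is sampled; an empty list is typed as "list".
        return shape_of(value[0], f"{prefix}[]") if value else {prefix: "list"}
    return {prefix: type(value).__name__}


def diff_shapes(recorded: Any, fresh: Any) -> Dict[str, List[str]]:
    """Report field paths added, removed, or retyped between two bodies."""
    old, new = shape_of(recorded), shape_of(fresh)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```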
Cold-start seeding
POST /v1/seed/har, POST /v1/seed/postman, or POST /v1/seed/openapi.
AI mock-repair is an operator-assist surface: the control plane can propose fixes for stale mocks that an operator approves or rejects. The auto-proposer loop is off by default behind ENABLE_AI_MOCK_REPAIR; manual approve/reject works either way.
The inference server
The inference server runs two capabilities, both local to your stack and both controlled by environment variables:
ENABLE_RAG · on by default
Builds a per-service semantic index from your mock library and matches incoming requests by similarity. Retrieval grounds the edge of the cascade in your own recorded data.
ENABLE_GENERATION · off by default
Off by default — generation routes through the isolated ghost-llamacpp sidecar via LLAMACPP_ENDPOINT. Setting it true also loads the in-process PyTorch base as a sidecar fallback (heavier RAM footprint).
Where the team has enough recorded traffic, the training pipeline can fit per-service LoRA adapters. Adapters train only on PII-scrubbed rows, are served from cache, and stay self-hosted inside your stack — no request or response payload leaves the box on the default configuration. An optional BYO-key cloud-LLM backend exists in the image but stays off unless you explicitly configure it.
State, redaction, and where data lives
The proxy writes captures to append-only JSONL on the shared /data volume — one file per service for mocks, traffic, and webhooks. The control plane reads JSONL and reconciles it into Postgres with explicit upsert semantics. Postgres is the source of truth; JSONL is the format the proxy can write without taking a database connection on the request path.
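The JSONL-to-database reconcile step with upsert semantics might look roughly like the sketch below. It uses SQLite from the Python standard library as a stand-in for Postgres so the example is self-contained; the table layout and the record fields (`method`, `path`, `status`, `body`) are assumptions, not the real capture schema.

```python
import json
import sqlite3
from typing import Iterable


def reconcile(lines: Iterable[str], conn: sqlite3.Connection) -> int:
    """Upsert each JSONL record by its key; later lines overwrite earlier ones."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mocks ("
        "mock_key TEXT PRIMARY KEY, status INTEGER, body TEXT)"
    )
    count = 0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the capture file
        rec = json.loads(line)
        conn.execute(
            "INSERT INTO mocks (mock_key, status, body) VALUES (?, ?, ?) "
            "ON CONFLICT(mock_key) DO UPDATE SET "
            "status = excluded.status, body = excluded.body",
            (f"{rec['method']} {rec['path']}", rec["status"], rec["body"]),
        )
        count += 1
    conn.commit()
    return count
```

Because the file is append-only and the write is an upsert, re-reading the same file is idempotent: replaying it leaves the table in the same state.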
/data/
  mocks/     mock_{service}.jsonl       served library
  traffic/   traffic_{service}.jsonl    raw recorded interactions
  webhooks/  {service}.jsonl            captured inbound webhooks
  mode.txt                              current mode
  faiss.index                           inference retrieval index
  models/adapters/{service}/{session}/  per-service LoRA adapters
Credential redaction floor
Credential headers (Authorization, Cookie, Set-Cookie, X-Api-Key, and the rest) are stripped before anything is written to disk, on every sink. The floor cannot be removed by configuration; REDACT_HEADERS only adds to it. See the header redaction reference for the full list.
Header credentials and body PII are handled differently. Credential headers are stripped at the redaction floor above. PII in request and response bodies is kept verbatim in the local replay library, because that fidelity is what makes a replayed response production-accurate, but is scrubbed on the way into the Postgres store and into any export. The replay library lives on your host volume and never leaves the box.
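The "floor plus additions" rule for header redaction can be sketched as a set union: configuration can grow the set but never shrink it. In the sketch below, the floor lists only the four headers named in the text (the real list is longer), and treating REDACT_HEADERS as a comma-separated string is an assumption.

```python
from typing import Dict, FrozenSet

# Only the four headers named in the text; the shipped floor is longer.
REDACTION_FLOOR: FrozenSet[str] = frozenset(
    {"authorization", "cookie", "set-cookie", "x-api-key"}
)


def effective_redactions(extra: str) -> FrozenSet[str]:
    """Parse a comma-separated REDACT_HEADERS value and add it to the floor."""
    added = {name.strip().lower() for name in extra.split(",") if name.strip()}
    # Union only: nothing in `extra` can remove a floor entry.
    return REDACTION_FLOOR | added


def redact(headers: Dict[str, str], extra: str = "") -> Dict[str, str]:
    """Drop every header whose name (case-insensitive) is in the redaction set."""
    blocked = effective_redactions(extra)
    return {k: v for k, v in headers.items() if k.lower() not in blocked}
```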
Gostly is single-tenant per deployment: tenant_id defaults to default, and every query is tenant-scoped in application code. Per-tenant Postgres Row-Level Security policies are defined as defense-in-depth on tenant-scoped tables. In the default single-tenant configuration, however, the isolation guarantee comes from the single-tenant deployment boundary plus the application-level scoping; the RLS policies exist as definitions and are not enforced by the database engine.
Auth, RBAC, and licensing
Authentication lives in the web container's auth layer. Beyond password login, the Team tier ships SAML and OIDC SSO, a four-role RBAC model — viewer < member < admin < owner — and an append-only audit log. Auth mode is set via GOSTLY_AUTH_MODE (e.g. password,saml).
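The ordered role model lends itself to a simple comparison check. A minimal sketch, assuming roles are compared by rank; the `Role` enum and `allows` helper are illustrative names, not the control plane's actual code.

```python
from enum import IntEnum


class Role(IntEnum):
    """Ranks mirror the ordering viewer < member < admin < owner."""
    VIEWER = 1
    MEMBER = 2
    ADMIN = 3
    OWNER = 4


def allows(actual: str, required: str) -> bool:
    """True when the caller's role is at least the required role."""
    return Role[actual.upper()] >= Role[required.upper()]
```

An unknown role name raises KeyError in this sketch, which fails closed rather than silently granting access.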
Licensed features are enforced at three independent points — the proxy (parsing JWT claims), the control plane (a feature gate on every tier-gated endpoint), and the dashboard (rendering soft locks). All three must agree. The agent caches its validated entitlements and keeps serving the licensed tier through a grace window if the licensing platform is briefly unreachable, then degrades explicitly rather than silently.
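The grace-window behavior can be sketched as a cache that remembers when entitlements were last validated. Everything specific here is an assumption: the `EntitlementCache` name, the length of the window, the "free" fallback tier, and the simplification that a reachable platform always revalidates successfully.

```python
from dataclasses import dataclass


@dataclass
class EntitlementCache:
    tier: str             # last validated tier, e.g. "pro"
    validated_at: float   # epoch seconds of the last successful validation
    grace_seconds: float  # hypothetical; the real window is not stated here

    def effective_tier(self, now: float, platform_reachable: bool) -> str:
        if platform_reachable:
            # Simplification: reachable implies a successful revalidation.
            self.validated_at = now
            return self.tier
        if now - self.validated_at <= self.grace_seconds:
            return self.tier  # still inside the grace window
        return "free"         # explicit degradation (fallback tier assumed)
```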
Protocols and TLS
Gostly records and replays HTTP and HTTPS — HTTP/1.1 and HTTP/2 over TLS. WebSocket frames are captured for observability only, not replayed. There is no gRPC, async-messaging, or database mocking today (roadmap).
TLS is a tri-state knob. ENABLE_TLS_INTERCEPTION defaults off; true / lax enables the MITM listener with a log-and-continue posture, and strict exits the process on a TLS failure. When interception is on, the proxy mints per-host leaf certs from an embedded CA, and GET /ca.crt on the proxy serves that CA for the trust-install flow (it returns 503 while interception is off).
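The tri-state value can be pictured as a small parser from the environment string to one of three modes. This is a sketch under assumptions: `TlsMode` and `parse_tls_mode` are hypothetical names, and treating every unrecognized or unset value as off is a guess at the default behavior.

```python
from enum import Enum
from typing import Optional


class TlsMode(Enum):
    OFF = "off"        # no MITM listener
    LAX = "lax"        # listener on, log and continue on TLS failure
    STRICT = "strict"  # listener on, exit the process on TLS failure


def parse_tls_mode(raw: Optional[str]) -> TlsMode:
    """Map an ENABLE_TLS_INTERCEPTION value to a mode; unset means off."""
    value = (raw or "").strip().lower()
    if value in ("true", "lax"):
        return TlsMode.LAX
    if value == "strict":
        return TlsMode.STRICT
    return TlsMode.OFF  # assumption: anything else behaves as off
```

Typical use would be `parse_tls_mode(os.environ.get("ENABLE_TLS_INTERCEPTION"))` after importing `os`.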
Outbound TLS-fingerprint impersonation — presenting a genuine Chrome, Firefox, or Safari fingerprint to the upstream — is a Pro+ capability.
Webhooks
Inbound webhook capture is automatic — the proxy writes one append-only JSONL file per service. Replay is operator-triggered through the control plane (POST /v1/webhooks/{service_id}/{webhook_id}/replay); the agent does not auto-replay captured webhooks. Scheduled fan-out and re-signing are roadmap, not shipped.
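Triggering a replay from a script amounts to a POST against the endpoint above. The sketch below only builds the request with the standard library; the base URL and the use of the X-API-Key header (borrowed from the mode example earlier on this page) are assumptions about how this endpoint is reached and authenticated.

```python
import urllib.request
from urllib.parse import quote


def build_replay_request(
    base_url: str, service_id: str, webhook_id: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a POST to the webhook replay endpoint."""
    path = (
        f"/v1/webhooks/{quote(service_id, safe='')}"
        f"/{quote(webhook_id, safe='')}/replay"
    )
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        method="POST",
        headers={"X-API-Key": api_key},  # assumed auth header
    )
```

Passing the result to `urllib.request.urlopen` would send it; the identifiers are percent-encoded so an ID containing a slash cannot change the path.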
Observability
The proxy exposes a Prometheus scrape endpoint at /metrics. The core series:
ghost_requests_total{match_type}
Counter, one increment per match-path outcome (exact, smart_swap, session_verbatim, resource_store, miss, learn, …): the cascade in metric form.
ghost_mock_library_size
Gauge of the current mock-library size.
ghost_io_errors_total{operation}
Disk-sink open/write failures, by operation.
axum_http_requests_total / _duration_seconds
HTTP request rate and latency, emitted by the Axum Prometheus layer.
gostly_tls_*
The TLS-MITM subsystem family: ALPN negotiation, cert-cache hit/miss/eviction, and listener state.
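To show how the cascade counter might be consumed, the sketch below parses Prometheus text-format output and computes the share of requests that missed. The sample values are invented for illustration, and the regular expression assumes match_type is the only label on the series, which may not hold on a real deployment.

```python
import re
from typing import Dict

# Invented sample in Prometheus text exposition format.
SAMPLE = """\
# TYPE ghost_requests_total counter
ghost_requests_total{match_type="exact"} 90
ghost_requests_total{match_type="smart_swap"} 6
ghost_requests_total{match_type="miss"} 4
ghost_mock_library_size 312
"""

# Assumes a single match_type label per sample line.
LINE = re.compile(r'^ghost_requests_total\{match_type="([^"]+)"\}\s+(\S+)')


def match_counts(text: str) -> Dict[str, float]:
    """Sum ghost_requests_total samples per match_type."""
    counts: Dict[str, float] = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            counts[m.group(1)] = counts.get(m.group(1), 0.0) + float(m.group(2))
    return counts


def miss_ratio(counts: Dict[str, float]) -> float:
    """Fraction of all counted outcomes that were misses (0.0 if none)."""
    total = sum(counts.values())
    return counts.get("miss", 0.0) / total if total else 0.0
```

In practice a Prometheus query over the same series would replace this parsing; the sketch is only meant to make the label structure concrete.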
Next steps
How It Works →
The LEARN → MOCK pipeline and the match cascade, walked through end to end.
Proxy Setup →
Multiple services, TLS interception vs. Caddy-fronting, CI integration, chaos injection.
Configuration Reference →
Every environment variable, the redaction floor, scrub rules, and the tier feature matrix.
API Reference →
Mode, transition, seed, drift, statechart, and webhook endpoints.