Architecture

The licensed Gostly product ships as a Docker Compose stack of containers that share a single ./data volume. A Rust proxy sits in the request path; a Python control plane owns the library and the dashboard; a Python inference server fills the edge of the match cascade; Postgres is the canonical store. This page is the technical version of the marketing architecture overview — env-var names, ports, and endpoints included.

The containers

A licensed deployment is four long-running containers plus an opt-in model sidecar. Each runs its own process with explicit interfaces between them. Internal services sit on a private Docker network; only the proxy and the web container are reachable from the host.

ghost-proxy

Rust · :8080 / :8443

The only component in the request path and the only one that sees production payloads. Records traffic during LEARN, serves the library during MOCK, forwards transparently during PASSTHROUGH. Plain HTTP on :8080; the TLS-MITM listener binds :8443 only when ENABLE_TLS_INTERCEPTION is set.

ghost-web

Python + Next.js · :8000 / :3000

One container running two services under supervisord — the FastAPI control plane on :8000 (mocks, services, traffic, transitions, drift, repair, webhooks, resources, auth/RBAC) and the Next.js operator dashboard on :3000. Reads the proxy's append-only JSONL captures and reconciles them into Postgres.

ghost-postgres

Postgres 16 · internal

The canonical store: the mock library, drift events, repair proposals, training sessions, statechart overrides, the append-only audit log, and the user workspace. On the internal network only — never published to the host.

ghost-inference

Python + PyTorch · :5000

Runs RAG retrieval and (optionally) generation. Called only when the deterministic match cascade falls through. On the internal network; reads the shared mock library and the FAISS index from /data.

ghost-llamacpp

opt-in sidecar · isolated

A llama.cpp generation backend behind ghost-inference, enabled via a compose profile. It sits alone on an internal:true network with no route to the host gateway, so it structurally has no internet egress. Generation is served by this sidecar by default, so the in-process ENABLE_GENERATION path can stay off.

Distribution

The licensed product ships as Docker Compose + registry images — there is no host CLI. The separate OSS proxy is a different product distributed via Homebrew and a container registry, and it does ship a host CLI. Everything on this page describes the licensed Compose stack unless stated otherwise.

# Reachable from the host
ghost-proxy      :8080   plain HTTP proxy (the only thing your app talks to)
ghost-proxy      :8443   TLS-MITM listener (binds only when interception is on)
ghost-web        :8000   control-plane API
ghost-web        :3000   operator dashboard

# Internal network only (not published to the host)
ghost-postgres           canonical state
ghost-inference  :5000   RAG + generation, called only at the cascade edge

Operating modes

The proxy is in exactly one mode at a time, settable per-service or globally. Modes are explicit — there is no ambiguity about whether a request reaches the upstream, returns a recording, or falls through to inference. INITIAL_MODE defaults to LEARN; mode state persists in /data/mode.txt.

LEARN

Every request is forwarded to the upstream; every response is recorded after credential redaction. The library grows. This is the default starting mode.

MOCK

No upstream call. Every request runs through the match cascade against the recorded library. Tests are deterministic; CI is offline.

PASSTHROUGH

Every request is forwarded to the upstream and nothing is recorded — the proxy is transparent. Useful for diagnosing whether the proxy itself is part of an issue.

TRANSITIONING

A brief interstitial while a fresh LEARN library is scrubbed into MOCK-ready state. The proxy returns 503 with a Retry-After header so callers back off cleanly instead of matching a half-written library.
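A caller can honor that 503 with a small retry loop. The following is a minimal client-side sketch, not part of Gostly: the function names, the default delay, and the cap are all hypothetical, and it assumes Retry-After arrives in its delay-seconds form (HTTP also permits an HTTP-date, which this sketch treats as "use the default").

```python
import time
import urllib.error
import urllib.request


def retry_delay_seconds(retry_after, default=1.0, cap=30.0):
    """Turn a Retry-After header value into a sleep duration in seconds.

    Only the delay-seconds form is parsed. A missing or non-numeric value
    (for example an HTTP-date) falls back to `default`. Both `default` and
    `cap` are illustrative values, not Gostly settings.
    """
    try:
        delay = float(retry_after)
    except (TypeError, ValueError):
        return default
    return max(0.0, min(delay, cap))


def get_with_backoff(url, attempts=5, opener=urllib.request.urlopen, sleep=time.sleep):
    """GET `url`, sleeping and retrying while the server answers 503."""
    for attempt in range(attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Anything other than 503, or the final attempt, is re-raised.
            if err.code != 503 or attempt == attempts - 1:
                raise
            sleep(retry_delay_seconds(err.headers.get("Retry-After")))
```

Passing `opener` and `sleep` as parameters keeps the loop easy to exercise in tests without a live proxy.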

Set the global mode via the control plane:

curl -X POST 'http://localhost:8000/v1/mode' \
  -H "X-API-Key: $GHOST_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"mode": "MOCK"}'
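
The same call can be made from a test harness. This is a sketch of a Python equivalent of the curl command above, using only the standard library; `build_mode_request` and `SETTABLE_MODES` are hypothetical names, and leaving TRANSITIONING out of the settable set is an assumption based on its description as an interstitial.

```python
import json
import urllib.request

# Modes a caller can request. TRANSITIONING is left out on the assumption
# that it is an internal interstitial rather than a settable mode.
SETTABLE_MODES = ("LEARN", "MOCK", "PASSTHROUGH")


def build_mode_request(mode, api_key, base_url="http://localhost:8000"):
    """Build (but do not send) the POST /v1/mode request for `mode`."""
    if mode not in SETTABLE_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return urllib.request.Request(
        f"{base_url}/v1/mode",
        data=json.dumps({"mode": mode}).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_mode_request("MOCK", os.environ["GHOST_API_KEY"]))`.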

The match cascade

In MOCK mode every request runs through a deterministic cascade that short-circuits at the first stage that produces a match. The cheap, deterministic stages run first; AI only ever runs at the edge.

1. Session-verbatim

If the request was seen during the active session, an in-memory byte-exact capture replays it — bodies and headers, tagged X-Ghost-Mock: session-verbatim. RAM-only: it never touches disk and is gone on restart or a new LEARN window.

2. Exact match

Method + normalized URI (and request-body hash) match a recorded entry. The recorded response is returned verbatim — same body, headers, status. No model, no randomness.

3. Resource store

Links a POST-created resource to a later GET-by-id, so POST /charges followed by GET /charges/{id} returns the created resource instead of a 404.

4. Smart swap

URI path parameters are normalized to templates (/users/{id}) and matched structurally — a recording of /users/42 serves a request to /users/99. The Free tier stops here; the AI stages below do not fire.

5. AI inference (edge)

Last resort, Pro+. When nothing above matches, the inference server proposes a response grounded in the recorded patterns for that route, gated by a confidence threshold.
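
The ordering and short-circuit behavior of the cascade can be sketched in a few lines. This is an illustration only, not the proxy's Rust implementation: the `Library` layout, the lookup keys, and the numeric-segment templating rule are assumptions, and the `ai_inference` label is a hypothetical placeholder for the edge stage.

```python
import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class Library:
    session: dict = field(default_factory=dict)    # stage 1: in-memory captures
    exact: dict = field(default_factory=dict)      # stage 2: recorded entries
    resources: dict = field(default_factory=dict)  # stage 3: POST-created resources
    templates: dict = field(default_factory=dict)  # stage 4: templated routes


def body_hash(body: bytes) -> str:
    return hashlib.sha256(body).hexdigest()


def to_template(path: str) -> str:
    """Replace all-numeric path segments with {id}: /users/42 -> /users/{id}."""
    return re.sub(r"/\d+(?=/|$)", "/{id}", path)


def match(lib, method, path, body=b"", ai_enabled=False):
    """Return (stage, response), stopping at the first stage that matches."""
    key = (method, path, body_hash(body))
    if key in lib.session:
        return "session_verbatim", lib.session[key]
    if key in lib.exact:
        return "exact", lib.exact[key]
    if method == "GET" and path in lib.resources:
        return "resource_store", lib.resources[path]
    template = (method, to_template(path))
    if template in lib.templates:
        return "smart_swap", lib.templates[template]
    if ai_enabled:
        # Placeholder: the real edge stage consults the inference server.
        return "ai_inference", None
    return "miss", None
```

With `ai_enabled=False` the function models the Free tier: a request that clears all four deterministic stages is reported as a miss.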

No LLM in the request hot path

The first four stages are deterministic and require zero inference. Even the AI stage does not run a model synchronously on the request: generation runs on a background worker behind a bounded queue and responses are served from cache. This is an architectural invariant, not a tuning knob — a deployment that disables inference entirely still serves every deterministic stage.
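
One way to picture that invariant is a cache whose request-path lookup never calls the model, paired with a worker that drains a bounded queue. The sketch below is an assumption about shape only: `EdgeCache`, its methods, and the queue size are hypothetical, and dropping work when the queue is full is one possible policy rather than the documented one.

```python
import queue
import threading


class EdgeCache:
    """Serve generated responses from cache; generate off the request path."""

    def __init__(self, generate, max_pending=8):
        self._generate = generate                  # slow model call
        self._cache = {}
        self._pending = queue.Queue(maxsize=max_pending)
        self._lock = threading.Lock()

    def lookup(self, key):
        """Request path: return a cached response or None, never block."""
        with self._lock:
            cached = self._cache.get(key)
        if cached is None:
            try:
                self._pending.put_nowait(key)      # ask the worker to fill it
            except queue.Full:
                pass                               # shed work instead of waiting
        return cached

    def drain_once(self):
        """Worker step: generate one queued key. Returns False when idle."""
        try:
            key = self._pending.get_nowait()
        except queue.Empty:
            return False
        result = self._generate(key)
        with self._lock:
            self._cache[key] = result
        return True
```

In a running service `drain_once` would be called in a loop on a background thread; it is exposed as a single step here so the behavior is easy to follow.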

Statecharts, resources, and drift

Stateful flows stay coherent across calls. The agent-side statechart engine fires at request time on every tier with a set of bundled fixtures — charge, customer, invoice, order, and subscription. A PATCH/POST transition advances a Harel statechart that rewrites the status field and tags the response X-Ghost-Transition. The statechart editor and per-tenant overrides are Pro+.
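
As a rough illustration of a transition rewriting the status field, the sketch below uses a flat transition table in place of a full Harel statechart (no hierarchy or parallel regions). The charge states, the event names, and the format of the X-Ghost-Transition value are all hypothetical; only the header name and the status rewrite come from the description above.

```python
import copy

# Hypothetical transition table for a charge: (state, event) -> next state.
CHARGE_TRANSITIONS = {
    ("pending", "capture"): "succeeded",
    ("pending", "cancel"): "canceled",
    ("succeeded", "refund"): "refunded",
}


def apply_transition(response, event, transitions=CHARGE_TRANSITIONS):
    """Return a copy of `response` advanced by `event`.

    If the event is not valid in the current state, the original response
    object is returned unchanged.
    """
    body = response.get("body") or {}
    current = body.get("status")
    target = transitions.get((current, event))
    if target is None:
        return response
    updated = copy.deepcopy(response)
    updated["body"]["status"] = target
    # The header value format here is an assumption for illustration.
    updated.setdefault("headers", {})["X-Ghost-Transition"] = f"{current}->{target}"
    return updated
```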

The control plane's drift detector compares fresh upstream shapes against the recorded library, emitting drift events plus a 0–100 freshness score and a sparkline trend so operators can see at a glance which services have decayed. Drift surfaces are Pro+.
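
Shape comparison of this kind can be sketched by flattening each JSON body into a map of paths to type names and diffing the two maps. Everything below is a hypothetical illustration: the event names and, in particular, the scoring formula (the percentage of recorded fields still present with the same type) are assumptions, not Gostly's actual freshness calculation.

```python
def shape(value, prefix=""):
    """Flatten a JSON value into {path: type name}, ignoring concrete values."""
    if isinstance(value, dict):
        out = {}
        for key, child in value.items():
            out.update(shape(child, f"{prefix}.{key}" if prefix else key))
        return out
    if isinstance(value, list):
        # Use the first element as representative of the list's item shape.
        return shape(value[0], f"{prefix}[]") if value else {f"{prefix}[]": "empty"}
    return {prefix: type(value).__name__}


def drift(recorded, fresh):
    """Return (events, score) comparing a recorded body to a fresh one."""
    old, new = shape(recorded), shape(fresh)
    events = []
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            events.append(("removed", path))
        elif path not in old:
            events.append(("added", path))
        elif old[path] != new[path]:
            events.append(("type_changed", path))
    unchanged = sum(1 for path in old if new.get(path) == old[path])
    score = round(100 * unchanged / len(old)) if old else 100
    return events, score
```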

Cold-start seeding

You don't have to record from zero. Drag a HAR export, a Postman collection, or an OpenAPI spec onto the dashboard and it seeds the library through POST /v1/seed/har, POST /v1/seed/postman, or POST /v1/seed/openapi.
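
For scripted seeding, the right endpoint can be chosen from the file's contents. The sketch below is a hypothetical helper: the three paths come from the text above, while the detection rules rely on the top-level keys each format defines (`log.entries` for HAR, `info` plus `item` for a Postman collection, `openapi` for an OpenAPI 3 document). It handles JSON input only, and how the file should be uploaded (body encoding, headers) is not covered here.

```python
import json

SEED_ENDPOINTS = {
    "har": "/v1/seed/har",
    "postman": "/v1/seed/postman",
    "openapi": "/v1/seed/openapi",
}


def detect_seed_kind(document):
    """Guess the seed format of a parsed JSON document, or return None."""
    if not isinstance(document, dict):
        return None
    log = document.get("log")
    if isinstance(log, dict) and "entries" in log:
        return "har"
    if "info" in document and "item" in document:
        return "postman"
    if "openapi" in document:
        return "openapi"
    return None


def seed_endpoint(text):
    """Return the seed endpoint path for a JSON export given as text."""
    kind = detect_seed_kind(json.loads(text))
    if kind is None:
        raise ValueError("unrecognized seed format")
    return SEED_ENDPOINTS[kind]
```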

AI mock-repair is an operator-assist surface: the control plane can propose fixes for stale mocks that an operator approves or rejects. The auto-proposer loop is off by default behind ENABLE_AI_MOCK_REPAIR; manual approve/reject works either way.

The inference server

The inference server runs two capabilities, both local to your stack and both controlled by environment variables:

ENABLE_RAG

on by default

Builds a per-service semantic index from your mock library and matches incoming requests by similarity. Retrieval grounds the edge of the cascade in your own recorded data.

ENABLE_GENERATION

off by default

With the flag off, generation routes through the isolated ghost-llamacpp sidecar via LLAMACPP_ENDPOINT. Setting it to true also loads the in-process PyTorch base as a fallback for the sidecar, at the cost of a heavier RAM footprint.

Where the team has enough recorded traffic, the training pipeline can fit per-service LoRA adapters. Adapters train only on PII-scrubbed rows, are served from cache, and stay self-hosted inside your stack — no request or response payload leaves the box on the default configuration. An optional BYO-key cloud-LLM backend exists in the image but stays off unless you explicitly configure it.

State, redaction, and where data lives

The proxy writes captures to append-only JSONL on the shared /data volume — one file per service for mocks, traffic, and webhooks. The control plane reads JSONL and reconciles it into Postgres with explicit upsert semantics. Postgres is the source of truth; JSONL is the format the proxy can write without taking a database connection on the request path.

/data/
  mocks/         mock_{service}.jsonl       served library
  traffic/       traffic_{service}.jsonl    raw recorded interactions
  webhooks/      {service}.jsonl            captured inbound webhooks
  mode.txt                                  current mode
  faiss.index                               inference retrieval index
  models/adapters/{service}/{session}/      per-service LoRA adapters
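
The write-then-reconcile flow can be sketched as follows. This is an illustration under stated assumptions: sqlite3 stands in for Postgres so the example runs without a database server, and the record fields (`key`, `response`), the table layout, and the function names are hypothetical. Only the mocks/mock_{service}.jsonl path pattern and the upsert idea come from the text above.

```python
import json
import sqlite3
from pathlib import Path


def mock_path(data_dir, service):
    return Path(data_dir) / "mocks" / f"mock_{service}.jsonl"


def append_capture(data_dir, service, record):
    """Proxy side: append one JSON line; no database connection needed."""
    path = mock_path(data_dir, service)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def reconcile(data_dir, service, conn):
    """Control-plane side: upsert every JSONL line; returns lines processed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mocks ("
        "service TEXT, key TEXT, response TEXT, PRIMARY KEY (service, key))"
    )
    count = 0
    with mock_path(data_dir, service).open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            rec = json.loads(line)
            # Later lines for the same key overwrite earlier ones.
            conn.execute(
                "INSERT INTO mocks (service, key, response) VALUES (?, ?, ?) "
                "ON CONFLICT (service, key) DO UPDATE SET response = excluded.response",
                (service, rec["key"], json.dumps(rec["response"])),
            )
            count += 1
    conn.commit()
    return count
```

Because the file is append-only, re-running `reconcile` over the same file is idempotent: the upsert converges on the last line written for each key.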

Credential redaction floor

A fixed floor of credential headers (Authorization, Cookie, Set-Cookie, X-Api-Key, and others) is stripped before anything is written to disk, on every sink. The floor cannot be removed by configuration; REDACT_HEADERS only adds to it. See the header redaction reference for the full list.
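
The "floor plus additions" rule amounts to a set union, which makes it impossible for configuration to shrink the floor. A minimal sketch, assuming REDACT_HEADERS is a comma-separated list of header names (the actual format is not stated here) and listing only the four floor headers named above rather than the full set:

```python
# Partial floor: only the headers named on this page. The full list lives
# in the header redaction reference.
REDACTION_FLOOR = frozenset({"authorization", "cookie", "set-cookie", "x-api-key"})


def parse_extra_headers(redact_headers):
    """Parse a comma-separated REDACT_HEADERS value (format assumed)."""
    return {name.strip().lower() for name in (redact_headers or "").split(",") if name.strip()}


def redact(headers, redact_headers=""):
    """Drop floor headers and any configured extras, case-insensitively."""
    blocked = REDACTION_FLOOR | parse_extra_headers(redact_headers)
    return {name: value for name, value in headers.items() if name.lower() not in blocked}
```

Since `blocked` is always a superset of `REDACTION_FLOOR`, no value of `redact_headers` can let a floor header through.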

Header credentials and body PII are handled differently. Credential headers are stripped at the redaction floor above. PII in request and response bodies is kept verbatim in the local replay library — that fidelity is what makes a replayed response production-accurate — but is scrubbed on the way into the Postgres store and into any export. The replay library lives on your host volume and never leaves the box.

Gostly is single-tenant per deployment: tenant_id defaults to default, and every query is tenant-scoped in application code. Per-tenant Postgres Row-Level Security policies are defined as defense-in-depth on tenant-scoped tables. In the default single-tenant configuration, however, the isolation guarantee is the single-tenant deployment boundary plus the application-level scoping: the RLS policies exist as definitions but are not enforced by the database engine.

Auth, RBAC, and licensing

Authentication lives in the web container's auth layer. Beyond password login, the Team tier ships SAML and OIDC SSO, a four-role RBAC model — viewer < member < admin < owner — and an append-only audit log. Auth mode is set via GOSTLY_AUTH_MODE (e.g. password,saml).
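
The role ordering lends itself to an ordered enum, where a permission check is a simple comparison. This is a sketch only: the numeric values and helper names are hypothetical, and treating GOSTLY_AUTH_MODE as a comma-separated list is inferred from the `password,saml` example.

```python
from enum import IntEnum


class Role(IntEnum):
    # Numeric values are arbitrary; only the ordering matters.
    VIEWER = 1
    MEMBER = 2
    ADMIN = 3
    OWNER = 4


def has_at_least(role, minimum):
    """True when `role` is `minimum` or anything above it."""
    return role >= minimum


def parse_auth_modes(raw):
    """Split a GOSTLY_AUTH_MODE value such as 'password,saml' into a list."""
    return [mode.strip().lower() for mode in raw.split(",") if mode.strip()]
```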

Licensed features are enforced at three independent points — the proxy (parsing JWT claims), the control plane (a feature gate on every tier-gated endpoint), and the dashboard (rendering soft locks). All three must agree. The agent caches its validated entitlements and keeps serving the licensed tier through a grace window if the licensing platform is briefly unreachable, then degrades explicitly rather than silently.
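
The grace-window behavior can be pictured as a decision over a cached entitlement and its age. The sketch below is hypothetical throughout: the `Entitlements` fields, the grace length passed by the caller, and the choice to degrade to a "free" tier are assumptions used to show an explicit, observable downgrade rather than a silent one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entitlements:
    tier: str            # tier from the last successful validation
    validated_at: float  # epoch seconds of that validation


def effective_tier(cached, now, grace_seconds, fallback_tier="free"):
    """Return (tier, degraded) when the licensing platform cannot be reached.

    Inside the grace window the cached tier keeps being served. Past it,
    the fallback tier is returned with degraded=True so callers can log
    or surface the downgrade.
    """
    age = now - cached.validated_at
    if age <= grace_seconds:
        return cached.tier, False
    return fallback_tier, True
```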

Protocols and TLS

Gostly records and replays HTTP and HTTPS — HTTP/1.1 and HTTP/2 over TLS. WebSocket frames are captured for observability only, not replayed. There is no gRPC, async-messaging, or database mocking today (roadmap).

TLS is a tri-state knob. ENABLE_TLS_INTERCEPTION defaults to off; true or lax enables the MITM listener with a log-and-continue posture, and strict exits the process on a TLS failure. When interception is on, the proxy mints per-host leaf certs from an embedded CA, and GET /ca.crt on the proxy serves that CA for the trust-install flow (it returns 503 while interception is off).
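
Parsing a tri-state variable of this kind might look like the sketch below. The enum and function names are hypothetical, and treating an unset, empty, or unrecognized value as off is an assumption; the proxy may instead reject unknown values.

```python
from enum import Enum


class TlsMode(Enum):
    OFF = "off"
    LAX = "lax"        # log and continue on TLS failure
    STRICT = "strict"  # exit the process on TLS failure


def parse_tls_mode(raw):
    """Map an ENABLE_TLS_INTERCEPTION value onto the three states."""
    value = (raw or "").strip().lower()
    if value in ("true", "lax"):
        return TlsMode.LAX
    if value == "strict":
        return TlsMode.STRICT
    # Assumption: anything else, including unset, means interception is off.
    return TlsMode.OFF
```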

Outbound TLS-fingerprint impersonation — presenting a genuine Chrome, Firefox, or Safari fingerprint to the upstream — is a Pro+ capability.

Webhooks

Inbound webhook capture is automatic — the proxy writes one append-only JSONL file per service. Replay is operator-triggered through the control plane (POST /v1/webhooks/{service_id}/{webhook_id}/replay); the agent does not auto-replay captured webhooks. Scheduled fan-out and re-signing are roadmap, not shipped.
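
An operator script could build the replay call like this. It is a sketch, not a documented client: the helper name is hypothetical, the X-API-Key header is assumed to apply here as it does for /v1/mode, and sending an empty body is an assumption since the request payload is not described above.

```python
import urllib.request
from urllib.parse import quote


def build_replay_request(service_id, webhook_id, api_key, base_url="http://localhost:8000"):
    """Build (but do not send) POST /v1/webhooks/{service_id}/{webhook_id}/replay."""
    path = "/v1/webhooks/{}/{}/replay".format(
        quote(service_id, safe=""),  # escape "/" so ids cannot alter the route
        quote(webhook_id, safe=""),
    )
    return urllib.request.Request(
        base_url + path,
        data=b"",
        headers={"X-API-Key": api_key},
        method="POST",
    )
```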

Observability

The proxy exposes a Prometheus scrape endpoint at /metrics. The core series:

ghost_requests_total{match_type}

Counter, one increment per match-path outcome (exact, smart_swap, session_verbatim, resource_store, miss, learn, …) — the cascade in metric form.

ghost_mock_library_size

Gauge of the current mock-library size.

ghost_io_errors_total{operation}

Disk-sink open/write failures, by operation.

axum_http_requests_total / _duration_seconds

HTTP request rate and latency, emitted by the Axum Prometheus layer.

gostly_tls_*

The TLS-MITM subsystem family — ALPN negotiation, cert-cache hit/miss/eviction, and listener state.
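
Because ghost_requests_total is labeled by match_type, a scrape can be turned into a per-stage breakdown with a few lines of parsing. The sketch below reads the Prometheus text exposition format with a regular expression; the function names are hypothetical, and in practice a Prometheus server and a query would do this work.

```python
import re

# Matches lines such as: ghost_requests_total{match_type="exact"} 120
_REQUESTS_LINE = re.compile(
    r'^ghost_requests_total\{[^}]*match_type="([^"]*)"[^}]*\}\s+(\S+)'
)


def requests_by_match_type(metrics_text):
    """Sum ghost_requests_total samples per match_type label."""
    totals = {}
    for line in metrics_text.splitlines():
        found = _REQUESTS_LINE.match(line.strip())
        if found:
            label, value = found.group(1), float(found.group(2))
            totals[label] = totals.get(label, 0.0) + value
    return totals


def share(totals, label):
    """Fraction of all counted requests that ended with `label`."""
    total = sum(totals.values())
    return totals.get(label, 0.0) / total if total else 0.0
```

Calling `share(totals, "miss")` on a MOCK-mode scrape gives a quick read on how often the cascade falls through.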
