How it works

A deterministic contract layer around your APIs.

Gostly is a four-component stack. A high-performance HTTP proxy learns from real traffic and serves it back through a deterministic match cascade. A control plane manages the library, drift detection, repair proposals, and the team workspace. An inference server handles the edge of the cascade — the cases where no recording exists. A license service issues signed feature entitlements and survives platform outages with a four-hour grace window.

The stack

Four components, separated by responsibility

Each component runs in its own process, with explicit interfaces between them. You can self-host the whole stack, run the cloud-managed control plane against your own proxy, or operate the proxy alone behind your existing infrastructure.

Proxy (Rust)

HTTP proxy in the request path. Records real traffic during LEARN, serves the recorded library during MOCK, forwards transparently during PASSTHROUGH. The only component that sees production payloads — designed for low latency, structural redaction, and zero runtime allocation on the hot path beyond the configured body ceiling.

Control plane (Python)

The library, the dashboard, the training pipeline, the drift detector, the repair proposer, the team workspace. Postgres-backed; every tenant-scoped table is RLS-protected. Reads the proxy's append-only JSONL captures and reconciles them into queryable state.

Inference (Python + PyTorch)

Runs the gap-fill model locally, fine-tuned by the training pipeline on the customer's own traffic. Called only when the deterministic match cascade falls through. By default, no request body leaves the customer's infrastructure for inference; a cloud LLM backend (OpenAI / Azure / Bedrock) can be configured instead when policy permits.

License service (managed)

Issues signed feature entitlements to the agent. The agent caches them for five minutes on the happy path and for four hours on the offline path. A regional outage on the platform side does not silently downgrade a customer's stack.

Operating modes

Four states, explicitly chosen

The proxy's behavior is governed by a single mode at any time, settable per-service or globally. Modes are explicit: there is no ambiguity about whether a request will reach upstream, return a recording, or fall through to the inference cascade.

LEARN

Every request is forwarded to the real upstream. Every response is recorded after structural redaction. The library grows.

When: Initial library construction. Recording a real workflow. Capturing a bug-reproduction trace.

MOCK

No upstream call. Every request runs through the match cascade against the recorded library. Tests are deterministic; CI is offline.

When: CI runs. Local development. Reproducing a captured production incident.

PASSTHROUGH

Every request is forwarded to the real upstream. Nothing is recorded. The proxy is transparent — useful for diagnosing whether the proxy itself is part of an issue.

When: Production traffic. Diagnosing whether the proxy is contributing to an observed problem.

TRANSITIONING

The control plane is scrubbing a fresh LEARN library into MOCK-ready state and writing to Postgres. The proxy returns 503 + Retry-After so callers back off cleanly.

When: Automatic. Triggered by LEARN→MOCK transitions; surfaces explicit backpressure rather than silent inconsistency.
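
The four modes can be summarized as a small decision table. The sketch below is illustrative only: the names `Mode`, `Plan`, `plan_for`, and `effective_mode` are hypothetical, and the rule that a per-service setting takes precedence over the global one is an assumption, not documented behavior.

```python
from enum import Enum
from typing import Dict, NamedTuple, Optional


class Mode(Enum):
    LEARN = "learn"
    MOCK = "mock"
    PASSTHROUGH = "passthrough"
    TRANSITIONING = "transitioning"


class Plan(NamedTuple):
    forward_upstream: bool         # does the request reach the real upstream?
    record: bool                   # is the (redacted) response recorded?
    use_cascade: bool              # is the match cascade consulted?
    fixed_status: Optional[int]    # status returned without doing any work


def plan_for(mode: Mode) -> Plan:
    """Map a mode to what happens to a single request."""
    if mode is Mode.LEARN:
        return Plan(True, True, False, None)
    if mode is Mode.MOCK:
        return Plan(False, False, True, None)
    if mode is Mode.PASSTHROUGH:
        return Plan(True, False, False, None)
    # TRANSITIONING: explicit backpressure, sent with a Retry-After header.
    return Plan(False, False, False, 503)


def effective_mode(service: str, per_service: Dict[str, Mode], global_mode: Mode) -> Mode:
    """Assumed precedence: a per-service mode overrides the global one."""
    return per_service.get(service, global_mode)
```

Because every mode maps to exactly one plan, a caller can tell from the mode alone whether a request will reach upstream.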

Match cascade

Deterministic first, AI only at the edge

When the proxy is in MOCK mode, every request runs through the cascade in this order. The cascade short-circuits at the first stage that produces a match. The earliest stages are deterministic and require zero inference. The LLM is reached only when nothing else applies.

  1. Exact match

    The recorded library is keyed on (method, normalized URI). An exact hit returns the recorded response verbatim — same body, same headers, same status. No model. No randomness.

  2. Smart swap

    When the URI structurally matches a recorded route but with different parameter values, the proxy substitutes the variable segments deterministically. Free-tier opt-in; on by default for Pro and Team.

  3. Inference fallback

    When the cascade reaches a recorded route with insufficient examples, the inference server proposes a response grounded by the recordings for that route. Confidence threshold applies — low-confidence outputs fall through. Pro and Team tiers.

  4. Generative synthesis

    For routes with no recorded match, the inference server synthesizes a response from the contract — the response schema inferred from neighboring traffic, the convention extracted from the rest of the service. Pro and Team tiers.

  5. Unmatched response

    The cascade is exhausted. The proxy returns a configurable response, by default a 404 with a JSON body that names the missing route. The unmatched call is logged for the dashboard so the operator can decide whether to record it or accept the gap.
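
The short-circuit behavior of the cascade can be sketched in a few lines. This is a minimal illustration, not the proxy's implementation (which is written in Rust): the normalization rule (drop the query string and trailing slash), the digits-only test for variable segments, the string-replace substitution, and the shape of the 404 body are all assumptions. The inference and synthesis stages are left as pluggable callables.

```python
import json
from typing import Callable, Dict, Optional, Sequence, Tuple

Library = Dict[Tuple[str, str], dict]
Stage = Callable[[Library, str, str], Optional[dict]]


def normalize(uri: str) -> str:
    """Assumed normalization: drop the query string and any trailing slash."""
    return uri.split("?", 1)[0].rstrip("/") or "/"


def exact_match(library: Library, method: str, uri: str) -> Optional[dict]:
    return library.get((method, normalize(uri)))


def smart_swap(library: Library, method: str, uri: str) -> Optional[dict]:
    """Match a recorded route that differs only in numeric segments."""
    want = normalize(uri).split("/")
    for (rec_method, rec_uri), resp in library.items():
        have = rec_uri.split("/")
        if rec_method != method or len(have) != len(want):
            continue
        swaps = {}
        for h, w in zip(have, want):
            if h == w:
                continue
            if h.isdigit() and w.isdigit():
                swaps[h] = w
            else:
                break  # a fixed segment differs: not the same route
        else:
            if swaps:
                body = resp["body"]
                for old, new in swaps.items():
                    body = body.replace(old, new)  # crude textual substitution
                return {**resp, "body": body}
    return None


def unmatched(method: str, uri: str) -> dict:
    body = {"error": "no recorded route", "method": method, "uri": normalize(uri)}
    return {"status": 404, "body": json.dumps(body)}


def run_cascade(library: Library, method: str, uri: str,
                extra_stages: Sequence[Stage] = ()) -> dict:
    """Return the response of the first stage that produces a match."""
    for stage in (exact_match, smart_swap, *extra_stages):
        resp = stage(library, method, uri)
        if resp is not None:
            return resp
    return unmatched(method, uri)
```

With a library holding one recording for `GET /users/42`, a request for `/users/42/` is served verbatim by the first stage, `/users/7` is served by the second with the ID substituted, and `/orders` falls through to the 404.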

Feature gating

Three-layer enforcement, not just a UI flag

Every licensed feature is checked at three independent points: the proxy (when JWT claims are parsed at startup and at each refresh cycle), the control plane (on every API request), and the dashboard (when the UI renders). All three must agree for a feature to operate. A bug in any one layer cannot silently grant access through the other two.

Proxy

The agent reads the license JWT at startup and at every refresh cycle. The match cascade's inference and generative stages won't fire unless the feature is in the JWT claims.

Control plane

Every tier-gated endpoint depends on a feature gate. A request that targets a gated endpoint without the licensed feature returns a license-shaped 403, not an auth-shaped 401.

Dashboard

UI surfaces gate on the same feature flag the API uses. Locked features render a soft lock with a path to upgrade, not a broken control.
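
A minimal sketch of the two ideas above: a gate that distinguishes "not authenticated" (401) from "authenticated but not licensed" (403), and a check that a feature operates only when all three layers agree. The `Caller` type, the function names, and the error payloads are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Caller:
    authenticated: bool
    features: FrozenSet[str]  # features granted by the license


def gate(caller: Optional[Caller], feature: str) -> Tuple[int, dict]:
    """Return (status, body) for a request to a tier-gated endpoint."""
    if caller is None or not caller.authenticated:
        return 401, {"error": "unauthenticated"}
    if feature not in caller.features:
        # License-shaped refusal: the caller is known, the feature is not licensed.
        return 403, {"error": "feature_not_licensed", "feature": feature}
    return 200, {}


def feature_operates(feature: str, proxy_claims: FrozenSet[str],
                     api_features: FrozenSet[str], ui_flags: FrozenSet[str]) -> bool:
    """A feature is live only if every layer independently allows it."""
    return all(feature in layer for layer in (proxy_claims, api_features, ui_flags))
```

Keeping the 401 and 403 paths separate lets a client tell a login problem from a licensing problem without parsing the message text.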

State model

Postgres is the source of truth, JSONL is the wire format

The proxy writes captures to append-only JSONL on a shared volume. The control plane reads from JSONL and reconciles into Postgres with explicit upsert semantics. Postgres is the canonical state; JSONL is the format the proxy can write to disk without taking a database connection on the request hot path. Once a mock is scrubbed, the scrubbed timestamp is a one-way seal — the sync layer refuses to overwrite a scrubbed row with an unscrubbed one.

Postgres-backed

The library, drift events, repair proposals, training sessions, audit log, user workspace. Row-level isolated per tenant; 22 tables protected by RLS policies.

JSONL captures

One file per service for mocks, traffic, webhooks, wide events, resources. Append-only. The proxy's write path; the control plane's read path. Survives restarts; resumes cleanly.
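
One way an append-only JSONL file can support clean resumption is for the reader to remember a byte offset and consume only complete lines. The sketch below assumes that approach; the function names and the offset-tracking scheme are illustrative, not a description of how the control plane actually reads captures.

```python
import json
from typing import List, Tuple


def append_capture(path: str, record: dict) -> None:
    """Writer side: one JSON object per line, appended to the file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, separators=(",", ":")) + "\n")


def read_new(path: str, offset: int) -> Tuple[List[dict], int]:
    """Reader side: return records written after `offset` and the new offset.

    A trailing line without a newline is treated as still being written,
    so it is left for the next call and the offset does not advance past it.
    """
    records = []
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a partial line
            offset = f.tell()
            if line.strip():
                records.append(json.loads(line))
    return records, offset
```

Persisting the returned offset between runs is what would let a reader pick up where it stopped after a restart.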

Scrub seal

Every persisted record carries a scrubbed_at timestamp set after the structural scrubber runs. The seal is the safety boundary for any operation that moves data off the customer's machine.
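
The one-way seal can be illustrated with an upsert that refuses to replace a scrubbed row with an unscrubbed one. In this sketch a plain dictionary stands in for the Postgres table; `upsert` and `may_leave_machine` are hypothetical names, and only the `scrubbed_at` field comes from the description above.

```python
from typing import Dict, Optional


def upsert(store: Dict[str, dict], key: str, incoming: dict) -> bool:
    """Insert or update a row, honoring the scrub seal.

    Returns False (and leaves the store untouched) when the existing row is
    scrubbed and the incoming one is not.
    """
    existing: Optional[dict] = store.get(key)
    if existing is not None and existing.get("scrubbed_at") and not incoming.get("scrubbed_at"):
        return False
    store[key] = incoming
    return True


def may_leave_machine(record: dict) -> bool:
    """Only sealed (scrubbed) records are eligible to leave the machine."""
    return record.get("scrubbed_at") is not None
```

For example, once a row with `scrubbed_at` set is stored, a later sync carrying the raw capture for the same key is rejected rather than silently undoing the scrub.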

Resilience

The platform can be unreachable; your stack keeps working

The agent caches its validated license features for five minutes on the happy path. When the platform is unreachable — regional AWS outage, network partition, scheduled maintenance — the cache stays valid for a four-hour grace window. During that window the agent continues to serve the customer's licensed tier with a structured warning emitted on every served request. After the window closes, degradation is explicit: the agent falls back to free-tier features with a logged event, never a silent downgrade.
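
The cache behavior described above can be sketched as a small state machine with three outcomes: fresh, grace, and degraded. The class below is illustrative only. It assumes the grace window is measured from the last successful validation, models an unreachable platform as a `ConnectionError`, and uses an empty set as a stand-in for the free-tier feature list, none of which is specified in the text.

```python
import logging
import time
from typing import Callable, FrozenSet, Iterable, Optional, Tuple

log = logging.getLogger("license")

FRESH_TTL_S = 5 * 60            # happy-path cache lifetime
GRACE_WINDOW_S = 4 * 60 * 60    # offline grace window
FREE_TIER: FrozenSet[str] = frozenset()  # placeholder for free-tier features


class LicenseCache:
    def __init__(self, fetch: Callable[[], Iterable[str]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # contacts the license service; may raise
        self._clock = clock
        self._features: Optional[FrozenSet[str]] = None
        self._validated_at = 0.0

    def features(self) -> Tuple[FrozenSet[str], str]:
        """Return (features, state), where state is fresh, grace, or degraded."""
        now = self._clock()
        age = now - self._validated_at
        if self._features is not None and age < FRESH_TTL_S:
            return self._features, "fresh"
        try:
            self._features = frozenset(self._fetch())
            self._validated_at = now
            return self._features, "fresh"
        except ConnectionError:
            if self._features is not None and age < GRACE_WINDOW_S:
                log.warning("license platform unreachable; serving cached tier")
                return self._features, "grace"
            log.error("grace window elapsed; falling back to free tier")
            return FREE_TIER, "degraded"
```

Injecting the clock makes the four-hour boundary testable without waiting, and returning the state alongside the features lets the caller attach the structured warning to each served request.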

The control plane's drift detector, fidelity recompute loop, and optional repair-proposer loop all absorb transient errors and resume on the next tick. A blip in any single subsystem does not cascade into the request path the proxy serves.
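
A background loop that absorbs errors and resumes on the next tick might look like the following. This is a generic sketch, not the control plane's code: `run_loop` and its parameters are hypothetical, and it catches every `Exception` for brevity where a real loop would likely distinguish transient failures from permanent ones.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("loops")


def run_loop(name: str, tick: Callable[[], None], interval_s: float,
             max_ticks: Optional[int] = None,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `tick` repeatedly; log failures and carry on. Returns the failure count."""
    done = 0
    failures = 0
    while max_ticks is None or done < max_ticks:
        try:
            tick()
        except Exception:
            failures += 1
            log.exception("%s: tick failed; resuming on next tick", name)
        done += 1
        sleep(interval_s)
    return failures
```

Because the exception never leaves the loop, a failing tick costs one interval of staleness rather than stopping the subsystem.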

See the security guarantees that fall out of the architecture.

The structural redaction floor, row-level tenant isolation, constant-time secret comparison, SSRF guard on replay, and bounded request bodies are all consequences of how the stack is wired — not policies imposed from outside.