Core Concepts

How It Works

Gostly is a transparent HTTP proxy with three operating modes: LEARN, MOCK, and TRANSITIONING. Understanding the pipeline between them is the key to getting the most out of the tool.

The pipeline

1. LEARN

The proxy forwards every request to your upstream and records the verbatim response. Your app sees no difference.

2. TRANSITION

Recorded traffic is scrubbed, pattern-extracted, and written to the mock library. While this runs, the proxy briefly enters TRANSITIONING mode and answers requests with 503 plus a Retry-After header.

3. MOCK

All requests are served from the mock library. No upstream required. Unmatched requests fall through to AI generation if enabled.

LEARN mode: recording traffic

In LEARN mode the proxy is a transparent pass-through. Every inbound request is forwarded to the configured upstream URL. The response is returned to the caller and simultaneously written to a local JSONL file on disk:

# Each line in traffic/{service}.jsonl is one recorded interaction (pretty-printed here for readability)
{
  "timestamp": "2026-04-23T09:14:22Z",
  "method": "GET",
  "uri": "/users/42",
  "request_headers": { "accept": "application/json" },
  "request_body": null,
  "status": 200,
  "response_headers": { "content-type": "application/json" },
  "response_body": { "id": 42, "name": "Jane Smith", "role": "admin" }
}
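As a rough illustration of working with these recordings, the sketch below parses JSONL lines and counts interactions per method, URI, and status. The helper names (`load_interactions`, `summarize`) are hypothetical and not part of Gostly; an in-memory string stands in for a real `traffic/{service}.jsonl` file.

```python
import io
import json
from collections import Counter

def load_interactions(lines):
    """Parse an iterable of JSONL lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]

def summarize(interactions):
    """Count recorded interactions per (method, uri, status)."""
    return Counter((r["method"], r["uri"], r["status"]) for r in interactions)

# Stand-in for an opened traffic file; one JSON object per line.
sample = io.StringIO(
    '{"timestamp": "2026-04-23T09:14:22Z", "method": "GET", "uri": "/users/42", '
    '"request_headers": {"accept": "application/json"}, "request_body": null, '
    '"status": 200, "response_headers": {"content-type": "application/json"}, '
    '"response_body": {"id": 42, "name": "Jane Smith", "role": "admin"}}\n'
)
records = load_interactions(sample)
summary = summarize(records)
```

Passing an open file object instead of `sample` works the same way, since both yield one line at a time.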

On sensitive headers

Authorization tokens, cookies, API keys, and a baseline set of enterprise security headers are never written to disk. The proxy redacts them at capture time, before any I/O occurs. See the header redaction reference for the full list.
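To make the idea of capture-time redaction concrete, here is a minimal sketch. The header names in `SENSITIVE_HEADERS` and the `[REDACTED]` placeholder for headers are assumptions made for illustration; the authoritative list is in the header redaction reference, and the proxy's real behaviour may differ (for example, it may drop the header entirely).

```python
# Hypothetical denylist for illustration only; not Gostly's actual list.
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie", "x-api-key"}

def redact_headers(headers, sensitive=SENSITIVE_HEADERS):
    """Return a copy of headers with sensitive values replaced.

    Header names are compared case-insensitively, as HTTP requires.
    The input dict is left untouched.
    """
    return {
        name: ("[REDACTED]" if name.lower() in sensitive else value)
        for name, value in headers.items()
    }

captured = redact_headers(
    {"Authorization": "Bearer abc123", "Accept": "application/json"}
)
```

The point of doing this before the record is serialised is that the secret never reaches the code path that writes to disk.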

JSONL files live on your machine and are never transmitted anywhere. The verbatim format preserves full fidelity: tests that pattern-match on specific field values work correctly because the recorded data is production-accurate.

Transition: building the mock library

When you trigger a transition, the API reads the raw JSONL, runs it through a scrub pipeline, and writes the results to the Postgres-backed mock library:

Scrub

Request/response bodies are scanned for credentials, PII patterns, and any field paths you've configured. Matched values are replaced with [REDACTED]. The scrubbed_at timestamp is then set; this is the permanent safety boundary.
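The scrub step described above might look roughly like the following. This is a sketch under stated assumptions: the two regular expressions and the dotted field-path syntax are illustrative choices, not Gostly's actual rule set or configuration format.

```python
import re

REDACTED = "[REDACTED]"

# Illustrative patterns only.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # email-like string
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # US SSN-like number
]

def scrub(value, field_paths, path=""):
    """Recursively redact configured field paths and PII-looking strings.

    field_paths is a set of dotted paths such as {"token", "profile.role"}.
    List elements share the path of the list that contains them.
    """
    if path in field_paths:
        return REDACTED
    if isinstance(value, dict):
        return {
            key: scrub(child, field_paths, f"{path}.{key}" if path else key)
            for key, child in value.items()
        }
    if isinstance(value, list):
        return [scrub(item, field_paths, path) for item in value]
    if isinstance(value, str) and any(p.search(value) for p in PII_PATTERNS):
        return REDACTED
    return value

body = {
    "id": 42,
    "email": "jane@example.com",
    "profile": {"ssn": "123-45-6789", "role": "admin"},
    "token": "abc",
}
scrubbed = scrub(body, {"token", "profile.role"})
```

Non-string values such as the numeric `id` pass through unchanged unless their path is explicitly configured.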

Pattern extraction

URI paths are normalised to templates (e.g. /users/42 → /users/{id}). The extracted patterns drive AI training and smart-swap matching.
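One simple way to picture this normalisation is to replace path segments that look like identifiers with a placeholder. The sketch below treats purely numeric segments and UUIDs as identifiers; that heuristic is an assumption, and Gostly's real extraction rules may be more sophisticated.

```python
import re

_NUMERIC = re.compile(r"^\d+$")
_UUID = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")

def to_template(path):
    """Replace identifier-like path segments with the {id} placeholder."""
    return "/".join(
        "{id}" if _NUMERIC.match(segment) or _UUID.match(segment) else segment
        for segment in path.split("/")
    )

template = to_template("/users/42")
nested = to_template("/users/42/orders/7")
```

Segments that are not identifier-like, such as `users` or `orders`, are kept verbatim so that distinct endpoints stay distinct.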

Mock library write

Scrubbed entries are inserted into the mock_library table. The proxy is then signalled to reload: it reads the library and serves from it on the next request.

During transition the proxy enters TRANSITIONING mode and returns 503 Service Unavailable with a Retry-After header. This is intentional: it prevents partial-library matches during the write.
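Clients or test harnesses that talk to the proxy during a transition can treat the 503 as a signal to wait. The sketch below is one possible client-side approach, not something Gostly ships: `call_with_retry` and its parameters are hypothetical, and `send` stands for any function that performs the request and returns `(status, headers, body)`.

```python
import time

def call_with_retry(send, max_attempts=5, default_wait=1.0, sleep=time.sleep):
    """Call send() until it stops returning 503, honouring Retry-After.

    Gives up after max_attempts and returns the last response as-is.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 503:
            break
        if attempt < max_attempts - 1:
            retry_after = headers.get("Retry-After", "")
            # Retry-After may also be an HTTP-date; this sketch only
            # handles the delay-in-seconds form and falls back otherwise.
            wait = float(retry_after) if retry_after.isdigit() else default_wait
            sleep(wait)
    return status, headers, body

# Simulated proxy: one 503 while transitioning, then a normal response.
responses = iter([(503, {"Retry-After": "2"}, b""), (200, {}, b"ok")])
waits = []
result = call_with_retry(lambda: next(responses), sleep=waits.append)
```

Injecting `sleep` keeps the helper easy to test without real delays.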

Start a transition via the API (or use the dashboard at localhost:3000):

curl -X POST http://localhost:8000/v1/transition/start
# Returns: { "job_id": "..." }

# Poll until complete
curl http://localhost:8000/v1/transition/{job_id}/status
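The polling loop can also be scripted. The shape of the status response is not specified here, so the sketch below deliberately leaves it to the caller: `fetch_status` and `is_done` are hypothetical callables you would supply, and the `state` field in the demo is a placeholder, not a documented field.

```python
import time

def poll_until(fetch_status, is_done, interval=1.0, max_polls=60, sleep=time.sleep):
    """Call fetch_status() repeatedly until is_done(status) is true.

    Raises TimeoutError if the job is still running after max_polls checks.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if is_done(status):
            return status
        sleep(interval)
    raise TimeoutError("transition did not finish within the polling budget")

# Demo with canned responses; "state" is a made-up field name.
canned = iter([{"state": "running"}, {"state": "done"}])
final = poll_until(
    lambda: next(canned),
    lambda status: status["state"] == "done",
    sleep=lambda seconds: None,
)
```

In practice `fetch_status` would issue the GET request shown above for your job ID and decode the JSON body.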

MOCK mode: serving responses

Switch to MOCK mode via the dashboard or the API:

curl -X POST http://localhost:8000/v1/mode \
  -H 'Content-Type: application/json' \
  -d '{"mode": "MOCK"}'

In MOCK mode the proxy matches each inbound request against the library using a tiered strategy. Earlier tiers are cheaper; later tiers are more capable:

Exact match

All tiers

Method + URI + request body hash matches a recorded entry exactly. Instant: an O(1) hash lookup.

Smart swap

All tiers¹

URI path parameters are normalised to templates (/users/{id}) and matched structurally. A recording of /users/42 will serve a request to /users/99. Enable with SMART_SWAP_ENABLED=true on the proxy.

AI generation

Pro+

No recorded match. A fine-tuned model (or retrieval-augmented generation) generates a realistic response based on recorded patterns for this service.

¹ Smart swap is available on all tiers but requires SMART_SWAP_ENABLED=true on the proxy.
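The first two tiers can be sketched as two dictionary lookups tried in order. Everything in this example is an assumption made for illustration: the `MockLibrary` class, SHA-256 as the body hash, and the numeric-segment heuristic for templating. The real library is Postgres-backed, and a miss would go on to AI generation where that is enabled.

```python
import hashlib

def body_hash(body):
    """Hash the raw request body; an empty or missing body hashes the same."""
    return hashlib.sha256(body or b"").hexdigest()

def to_template(path):
    """Replace purely numeric path segments with {id}."""
    return "/".join("{id}" if s.isdigit() else s for s in path.split("/"))

class MockLibrary:
    def __init__(self):
        self.exact = {}      # (method, uri, body hash) -> response
        self.templates = {}  # (method, uri template) -> first recorded response

    def record(self, method, uri, body, response):
        self.exact[(method, uri, body_hash(body))] = response
        self.templates.setdefault((method, to_template(uri)), response)

    def match(self, method, uri, body=None, smart_swap=False):
        # Tier 1: exact match, a single hash lookup.
        hit = self.exact.get((method, uri, body_hash(body)))
        if hit is not None:
            return "exact", hit
        # Tier 2: structural match on the URI template, if enabled.
        if smart_swap:
            hit = self.templates.get((method, to_template(uri)))
            if hit is not None:
                return "smart_swap", hit
        return "unmatched", None

library = MockLibrary()
library.record("GET", "/users/42", None, {"id": 42, "name": "Jane Smith"})
exact = library.match("GET", "/users/42")
swapped = library.match("GET", "/users/99", smart_swap=True)
missed = library.match("GET", "/users/99")
```

Note that this sketch replays the recorded body unchanged for `/users/99`; whether the real smart swap also rewrites values inside the response is not covered here.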

Chaos injection

Any tier can be wrapped with chaos config, which injects random latency, error rates, or specific status codes to simulate degraded upstream behaviour. Available on all plans.
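Conceptually, chaos injection is a wrapper around whatever would normally produce the response. The sketch below shows the idea with hypothetical parameter names (`error_rate`, `error_status`, `max_latency_s`); it does not reflect Gostly's actual chaos config schema.

```python
import random
import time

def with_chaos(handler, error_rate=0.0, error_status=503, max_latency_s=0.0,
               rng=None, sleep=time.sleep):
    """Wrap a handler so it sometimes delays or fails.

    handler takes a request and returns (status, headers, body).
    """
    rng = rng or random.Random()

    def wrapped(request):
        if max_latency_s > 0:
            sleep(rng.uniform(0, max_latency_s))  # random added latency
        if rng.random() < error_rate:             # fail this fraction of calls
            return error_status, {}, b""
        return handler(request)

    return wrapped

def mock_handler(request):
    return 200, {"content-type": "application/json"}, b'{"ok": true}'

always_failing = with_chaos(mock_handler, error_rate=1.0, error_status=500)
delays = []
slow = with_chaos(mock_handler, max_latency_s=0.5, sleep=delays.append)
slow_result = slow({"uri": "/users/42"})
```

Passing a seeded `random.Random` as `rng` makes a chaos run reproducible, which helps when a flaky test needs to be replayed.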

AI pipeline (Pro+)

When a request has no recorded match, Gostly routes it to the inference server. The inference server runs two optional modes, both disabled by default and enabled via environment variables:

ENABLE_RAG=true

Loads the all-MiniLM-L6-v2 sentence encoder and builds a per-service semantic index from your mock library. Incoming requests are matched by cosine similarity: above 0.92 the recorded response is replayed directly; between 0.75 and 0.92 it becomes a grounded generation template; below 0.75 the request falls through to pure generation. This is the recommended first step.
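The threshold logic can be illustrated without the encoder itself. In the sketch below, tiny hand-written vectors stand in for real sentence embeddings, and the function and label names (`route`, `replay`, `grounded_generation`, `pure_generation`) are hypothetical; only the 0.92 and 0.75 thresholds come from the description above.

```python
import math

REPLAY_THRESHOLD = 0.92
TEMPLATE_THRESHOLD = 0.75

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def route(query_vec, index):
    """Pick a strategy from the best match in index.

    index is a list of (embedding, recorded_response) pairs.
    """
    best_score, best_response = max(
        ((cosine(query_vec, emb), resp) for emb, resp in index),
        key=lambda pair: pair[0],
        default=(0.0, None),
    )
    if best_score > REPLAY_THRESHOLD:
        return "replay", best_response
    if best_score > TEMPLATE_THRESHOLD:
        return "grounded_generation", best_response
    return "pure_generation", None

# Toy two-dimensional "embeddings" for two recorded responses.
index = [([1.0, 0.0], "response A"), ([0.0, 1.0], "response B")]
near_identical = route([1.0, 0.0], index)
similar = route([1.0, 0.6], index)
unrelated = route([1.0, 1.0], index)
```

A linear scan is fine for a sketch; a real per-service index would normally use a vector search structure instead.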

ENABLE_GENERATION=true

Loads Qwen2.5-0.5B-Instruct (configurable via GEN_MODEL) and serves LoRA adapter responses. For teams with 50+ recorded interactions per endpoint, optional fine-tuning produces a per-service adapter that improves consistency. Requires ~2 GB RAM; the first request after startup may briefly return 503 while the model loads.

The AI pipeline is entirely local: the inference server runs inside your Docker stack. No request bodies or response contents are sent to any external model provider.

Next steps