Drift Detection & Freshness Scores

A recorded mock is a snapshot of an upstream at a point in time. The upstream keeps shipping. Drift detection tells you when a service's recorded behaviour has moved away from what your mocks still claim — by comparing two recording sessions, emitting drift events, and rolling everything up into a per-service 0–100 freshness score with a 30-day trend. It is the closed-loop step that keeps a deterministic mock library from quietly going stale.

Why drift matters

Gostly serves responses from a recorded library so your tests never touch the live upstream — see How It Works for the LEARN → MOCK pipeline. That determinism is the point. The risk is that a mock keeps passing CI long after the real upstream renamed a field, dropped an endpoint, or started returning 5xx — so your suite is green against a service that no longer exists in that shape.

Drift detection re-records the upstream, compares the new recording session against the baseline the mocks were built from, and surfaces the difference as structured events plus a single headline number per service. Detection runs in the control-plane API, not in the request hot path — it never adds latency to a served mock.

What gets compared — sessions, not single responses

Drift is computed over recording sessions, not individual requests. Every recorded interaction carries a recording_session_id — the partition the proxy stamps at capture time. To detect drift you point the detector at two of them for the same service: a baseline session (what the mocks were built from) and a current session (a fresh re-recording).

Only real upstream observations count as drift signal. By default the detector reads only rows captured by the proxy in LEARN mode against live traffic — operator-curated mocks (OpenAPI imports, manual edits, MCP-authored entries) are excluded so that editing a mock by hand can never fire a "the upstream changed" alert. Set GOSTLY_DRIFT_LEARN_MODE_ONLY=false to include curated rows in environments where every mock should count.

One event per session pair, per route

Detection is idempotent. Each drift is keyed on (service, route, method, current_session_id): re-running the check over the same two sessions returns the existing events rather than emitting duplicates, and the Prometheus counter only increments on genuinely new drift. A noisy recording trickling in late won't spawn a phantom "new" event for a change you've already seen.
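
The keying rule can be pictured as an upsert on that four-part key. The sketch below is a minimal in-memory illustration, not Gostly's storage layer: `DriftStore`, `record`, and `new_event_counter` are hypothetical names, and the integer counter merely stands in for the Prometheus counter.

```python
from dataclasses import dataclass, field


@dataclass
class DriftStore:
    """Hypothetical in-memory stand-in for the drift event table."""
    events: dict = field(default_factory=dict)
    new_event_counter: int = 0  # stand-in for the Prometheus counter

    def record(self, service, route, method, current_session_id, diff):
        # The key mirrors the tuple named in the text.
        key = (service, route, method, current_session_id)
        existing = self.events.get(key)
        if existing is not None:
            # Re-check over the same sessions: return the stored event, count nothing.
            return existing
        event = {"key": key, "diff": diff}
        self.events[key] = event
        self.new_event_counter += 1
        return event
```

Calling `record` twice with the same key returns the same event object and leaves the counter at 1, which is the behaviour the paragraph above describes.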

What counts as drift vs. healthy variance

The hard part is not noticing that two responses differ — it's noticing the ones that matter. Two recordings of GET /charges/{id} will have different IDs, timestamps, and tokens every single time. That is healthy variance, not drift. The detector reduces each session to a value-free fingerprint per response field and compares fingerprints, so changing values never register as a change:

Schema shape

The field set per route, and a recursive type-shape string per field (object{id:integer,name:string}). Adding, removing, or restructuring a field shows up here; different ID values for the same shape do not.

Field types

The dominant JSON type per field. An int|null union collapses to integer (null-tolerant). A field that flips integer → string is a type change.

Cardinality profile

Each field's distinct-value count is bucketed (singleton / low / medium / high / unbounded) from value hashes — never raw values. A field that shifts from unbounded to singleton means the upstream stopped varying it; that's real change. Same bucket + different values = no drift.

Format profile

For string fields with a stable modal format (uuid, iso_datetime, url, email, ipv4, phone, numeric_string), a format break fires. Mixed-format and free-text fields are treated as opaque — they never fire spurious format drift.

Status distribution

The fraction of responses in each HTTP status family (2xx/4xx/5xx). Catches a doubled 5xx rate even when the single most-common status stayed 200.

Empty-response rate

The fraction of responses that decoded to {}, null, or empty. A rising empty rate often signals upstream degradation that body-shape analysis alone misses.

Route topology

Routes present in baseline but missing from current (endpoint removed) or vice versa (endpoint added), compared at the session level.
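
To make the value-free idea concrete, here is a rough sketch of three of the field-level fingerprints: shape string, dominant type, and modal format. All function names are hypothetical, the array notation and the 0.9 modal threshold are assumptions, and only three of the listed formats are covered; the detector's actual rules are not shown in this guide.

```python
import re
from collections import Counter


def json_type(value):
    # bool is tested before int because bool is a subclass of int in Python.
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    return "array" if isinstance(value, list) else "object"


def shape(value):
    """Recursive shape string such as object{id:integer,name:string}."""
    if isinstance(value, dict):
        inner = ",".join(f"{key}:{shape(value[key])}" for key in sorted(value))
        return f"object{{{inner}}}"
    if isinstance(value, list):
        # Assumption: an array is summarised by the set of its element shapes.
        return "array[" + "|".join(sorted({shape(v) for v in value})) + "]"
    return json_type(value)


def dominant_type(values):
    """Most common JSON type across observations, ignoring nulls."""
    counts = Counter(json_type(v) for v in values if v is not None)
    return counts.most_common(1)[0][0] if counts else "null"


# Illustrative patterns only; a subset of the formats listed above.
FORMATS = {
    "uuid": re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$"),
    "ipv4": re.compile(r"^(\d{1,3}\.){3}\d{1,3}$"),
    "numeric_string": re.compile(r"^\d+$"),
}


def modal_format(strings, threshold=0.9):
    """Format matched by at least `threshold` of the strings, else "opaque"."""
    counts = Counter()
    for s in strings:
        for name, pattern in FORMATS.items():
            if pattern.match(s):
                counts[name] += 1
                break
    if not counts:
        return "opaque"
    name, hits = counts.most_common(1)[0]
    return name if hits / len(strings) >= threshold else "opaque"
```

Two responses with different IDs but the same structure produce the same `shape` string, so only a structural change registers as a difference.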

PII never leaves: the comparison is value-free

Cardinality tracking hashes every value (SHA-256, truncated) before it lands in any long-lived object — raw upstream values never sit in the detector's memory past the line they're read on. Shape and format are derived from type structure, not content. The fingerprint is built so it can tell you a field drifted without ever recording what was in it.
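
A minimal sketch of hash-then-bucket cardinality tracking follows. The bucket names come from this guide; the numeric boundaries, the 16-character truncation, and the function names are assumptions made for illustration.

```python
import hashlib
import json

BUCKETS = ["singleton", "low", "medium", "high", "unbounded"]


def value_hash(value):
    # Hash a canonical JSON encoding so only the digest is retained.
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:16]  # assumed truncation length


def cardinality_bucket(hashes):
    """Bucket a field by its number of distinct value hashes."""
    distinct = len(set(hashes))
    if distinct <= 1:
        return "singleton"
    if distinct <= 10:      # assumed boundary
        return "low"
    if distinct <= 100:     # assumed boundary
        return "medium"
    if distinct <= 1000:    # assumed boundary
        return "high"
    return "unbounded"


def bucket_shift(baseline_bucket, current_bucket):
    """Number of buckets between two profiles; 0 means same bucket."""
    return abs(BUCKETS.index(current_bucket) - BUCKETS.index(baseline_bucket))
```

Because only digests reach `cardinality_bucket`, two sessions with entirely different values but a similar spread land in the same bucket and report a shift of 0.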

Distribution-based signals (status family, empty rate) need a minimum sample per side before they'll fire — GOSTLY_DRIFT_MIN_BEHAVIORAL_SAMPLES (default 10) — so a handful of recordings won't trip a rate-shift alert on noise. The field-level signals (shape, type, cardinality, format) work from the first recording.
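
The sample gate on distribution signals might look like the sketch below. The default of 10 mirrors the documented GOSTLY_DRIFT_MIN_BEHAVIORAL_SAMPLES default; the 0.05 shift threshold and all function names are assumptions, since this guide does not state how large a rate change must be to fire.

```python
from collections import Counter

MIN_BEHAVIORAL_SAMPLES = 10  # documented default


def status_family_fractions(statuses):
    """Fraction of responses per status family, e.g. {"2xx": 0.9, "5xx": 0.1}."""
    counts = Counter(f"{code // 100}xx" for code in statuses)
    return {family: n / len(statuses) for family, n in counts.items()}


def empty_rate(bodies):
    """Fraction of decoded bodies that are {}, null, or an empty string."""
    empties = sum(1 for b in bodies if b is None or b == {} or b == "")
    return empties / len(bodies)


def rate_shift(baseline, current, threshold=0.05,
               min_samples=MIN_BEHAVIORAL_SAMPLES):
    """Return per-family deltas that exceed `threshold`.

    Returns None when either side has too few samples to compare.
    """
    if len(baseline) < min_samples or len(current) < min_samples:
        return None
    base = status_family_fractions(baseline)
    cur = status_family_fractions(current)
    shifted = {}
    for family in sorted(set(base) | set(cur)):
        delta = cur.get(family, 0.0) - base.get(family, 0.0)
        if abs(delta) >= threshold:  # assumed threshold
            shifted[family] = round(delta, 3)
    return shifted
```

Returning None for an under-sampled side keeps "not enough data" distinct from "no shift", which is the same distinction the minimum-sample setting exists to protect.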

Drift events

Every drifted route produces one drift event. Each event records the route + method, the two session ids it compared, a headline change_type, a severity, the structured diff of which signals fired, and a detected_at timestamp.

Severity is one of two bands, rolled up as the max across every signal that fired:

major · breaking

A consumer reading the old shape will break: a field was removed, a required field changed type or shape, a status code crossed families (2xx → 5xx), a stable format broke, a cardinality profile shifted by ≥2 buckets, or a route disappeared.

minor · additive / in-family

Most consumers tolerate it: a field was added, an optional field changed type, a status shifted within the same family (200 → 201), a one-bucket cardinality wiggle, or a new route appeared.
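
As a rough illustration of the two bands and the max roll-up, the sketch below classifies two of the signals described above and takes the worst severity across them. The function names are hypothetical and only the status and cardinality rules are shown.

```python
SEVERITY_RANK = {"minor": 1, "major": 2}


def status_severity(baseline_status, current_status):
    """major when the status crossed families (2xx to 5xx), minor within a family."""
    crossed = baseline_status // 100 != current_status // 100
    return "major" if crossed else "minor"


def cardinality_severity(bucket_shift):
    """major for a shift of two or more buckets, minor for a one-bucket wiggle."""
    return "major" if abs(bucket_shift) >= 2 else "minor"


def roll_up(severities):
    """Event severity is the highest band among the signals that fired."""
    return max(severities, key=SEVERITY_RANK.__getitem__)
```

A route where a status moved 200 to 201 and a cardinality profile jumped three buckets would roll up to major, because one major signal outranks any number of minor ones.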

When more than one signal fires for a route, the headline change_type is mixed and the full per-signal breakdown lives in the event's diff field. List events for the current tenant, newest first, and filter to the open ones:

# Open (unacknowledged) drift events for one service
curl "http://localhost:8000/v1/drift/events?service_id=payments&unacknowledged_only=true" \
  -H "X-API-Key: $GHOST_API_KEY"

# A single event by id
curl http://localhost:8000/v1/drift/events/42 \
  -H "X-API-Key: $GHOST_API_KEY"

When you've triaged an event — fixed it, or filed it for later — acknowledge it. The acknowledgment is append-only: it writes an audit row (optionally tagged with a user id and a reason like "tracked in JIRA-1234, fix planned Q3") and sets the event's acknowledged_at hint. Re-acking preserves the first timestamp but still records who silenced it again and why.

# Acknowledge an event (empty body is fine; richer payload optional)
curl -X POST http://localhost:8000/v1/drift/events/42/acknowledge \
  -H "X-API-Key: $GHOST_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"user_id": "jane", "reason_text": "expected v3 migration"}'

Open events feed the freshness score

Acknowledging is not cosmetic. Unacknowledged events are what drag a service's freshness score down — so silencing an event you've genuinely handled is also how you tell the score "this one is accounted for."

The 0–100 freshness score

The freshness score answers one question per service: are my mocks fresh enough to trust right now? It is a single integer from 0 to 100, computed from three weighted signals:

Recency · 50%

Time since the last recording for the service. A recording within the last 24h scores full credit; the contribution halves every 7 days thereafter (exponential decay). This is the dominant signal — a mock built from a month-old recording is the one most likely to lie to a developer right now.

Coverage · 25%

log10(distinct routes + 1), capped at full credit. Around ten distinct route patterns recorded earns full coverage. A service with one endpoint mocked is structurally less trustworthy than one with broad coverage.

Open drift · 25%

An inverted penalty on unacknowledged drift events: zero open events earns full credit, and the component drops to zero at ten or more open events. Acknowledging an event removes its drag.
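
The three components can be combined as in the sketch below. It is one plausible reading of the weights described above, not the production formula: it assumes the 7-day half-life starts counting after the 24-hour grace period, that the open-drift penalty is linear between 0 and 10 events, and that the result is rounded to the nearest integer. `freshness_score` is a hypothetical name.

```python
import math
from datetime import datetime, timedelta, timezone


def freshness_score(last_recorded_at, distinct_routes, open_events, now):
    """Return an integer 0-100 from recency (50%), coverage (25%), open drift (25%)."""
    if last_recorded_at is None:
        return 0  # "no data yet" state

    # Recency: full credit inside 24h, then halve every 7 days (assumed start point).
    grace = timedelta(hours=24)
    age = now - last_recorded_at
    if age <= grace:
        recency = 1.0
    else:
        days_past_grace = (age - grace) / timedelta(days=1)
        recency = 0.5 ** (days_past_grace / 7.0)

    # Coverage: log10(distinct routes + 1), capped at full credit.
    coverage = min(1.0, math.log10(distinct_routes + 1))

    # Open drift: full credit at zero open events, zero at ten or more (assumed linear).
    open_drift = max(0.0, 1.0 - open_events / 10.0)

    return round(100 * (0.5 * recency + 0.25 * coverage + 0.25 * open_drift))
```

Under these assumptions, a service recorded an hour ago with ten routes and no open events scores 100, and the same service recorded eight days ago scores 75, because the recency component has halved once.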

The score is a triage signal, not a precise SLA: read it as green → likely safe, yellow → look, red → re-record now. A service with no recordings at all reports 0 with a null last_recorded_at. That is a distinct "no data yet" state, so "never recorded" never masquerades as "zero drift, all good."

The endpoint returns the underlying signals alongside the number, so the dashboard can show why a score is what it is rather than just the value:

curl "http://localhost:8000/v1/drift/freshness-score?service_id=payments" \
  -H "X-API-Key: $GHOST_API_KEY"

# {
#   "service_id": "payments",
#   "freshness_score": 72,
#   "last_recorded_at": "2026-06-14T09:14:22+00:00",
#   "distinct_routes": 8,
#   "open_drift_event_count": 1,
#   "computed_inline": false
# }

Read path

On a running deployment the score is read from a materialized view refreshed every five minutes, so the lookup is constant-time. On a fresh deployment before the first refresh — or when a caller wants a just-now value — the API falls back to computing it inline from the base tables. The response shape is identical either way; the computed_inline flag tells you which path ran.

The sparkline trend

A single freshness number tells you where a service stands today; the sparkline tells you where it's heading. The sparkline endpoint returns a 30-day, per-day histogram of drift events split by severity — the data behind the small trend cell next to each service on the dashboard's drift table.

curl "http://localhost:8000/v1/drift/sparkline?service_id=payments" \
  -H "X-API-Key: $GHOST_API_KEY"

# {
#   "service_id": "payments",
#   "days": [
#     { "day": "2026-06-12", "major": 0, "minor": 2, "total": 2 },
#     { "day": "2026-06-15", "major": 1, "minor": 0, "total": 1 }
#   ]
# }

Days with no events are omitted from the payload; the dashboard pads the gaps with zeros so a sparse series still renders as a flat line punctuated by spikes. A flat line of zeros is the healthy steady state — a cluster of major bars is the cue that an upstream is actively moving and your mocks need attention.
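
A client that renders its own trend needs the same zero-padding the dashboard does. The helper below is a minimal sketch of that step; `pad_days` is a hypothetical name, and it assumes the window is the 30 calendar days ending on a caller-supplied date.

```python
from datetime import date, timedelta


def pad_days(days, end, window=30):
    """Expand a sparse "days" list into one entry per day, oldest first."""
    by_day = {entry["day"]: entry for entry in days}
    padded = []
    for offset in range(window - 1, -1, -1):
        day = (end - timedelta(days=offset)).isoformat()
        # Days missing from the payload become explicit zero rows.
        padded.append(by_day.get(day, {"day": day, "major": 0, "minor": 0, "total": 0}))
    return padded
```

Applied to the sample payload above with an end date of 2026-06-15, this yields 30 entries, of which only 2026-06-12 and 2026-06-15 are non-zero.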

Re-record and compare

The full loop is: re-record the upstream into a new session, run the drift check against your baseline, then either re-build your mocks from the fresher recording or acknowledge the events as expected.

1. Re-record

Put the service back in LEARN mode and replay representative traffic against the live upstream. This captures a fresh recording session — a new recording_session_id partition — without disturbing the baseline the mocks were built from.

2. Compare

Trigger a schema-diff between the baseline session and the fresh one. The check rejects a session id that matches no recorded mocks with a 422, so an unknown session is a loud caller error rather than a silent empty all-clear.

3. Resolve

Read the emitted events. For real upstream changes, run a transition to re-build the mock library from the fresher recording. For expected churn, acknowledge the events — they drop out of the open count and stop dragging the freshness score.

# Compare a fresh recording session against the baseline
curl -X POST http://localhost:8000/v1/drift/check \
  -H "X-API-Key: $GHOST_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
        "service_id": "payments",
        "baseline_session_id": "sess-2026-05-01",
        "current_session_id":  "sess-2026-06-15"
      }'
# Returns: { "events": [ ... ], "count": 3 }

The check is safe to re-run: because each event is keyed on (service, route, method, current_session_id), running it twice over the same sessions returns the same events without duplicating them. That makes it equally suitable to wire into a transition pipeline as a post-record hook or to invoke ad hoc from a script.

Re-build from the fresh recording

When drift is real, the fix is the same LEARN → MOCK transition you used to build the library in the first place — re-run it against the fresher session. See How It Works → Transition for the scrub-and-rebuild step.

Metrics & observability

The control plane exposes a counter of drift events broken out by severity, so you can alert on the rate of new major drift across all services rather than polling each one:

curl http://localhost:8000/v1/drift/metrics \
  -H "X-API-Key: $GHOST_API_KEY"

# { "counters": {
#     "gostly_drift_events_total{severity=\"major\"}": 4,
#     "gostly_drift_events_total{severity=\"minor\"}": 11
# } }

The counter increments only on genuinely new events, never on idempotent re-checks — so the rate reflects real upstream movement, not how often your pipeline happens to run the comparison. The proxy's own request-path metrics (match-type counters, mock-library size) live on the agent's /metrics endpoint; see Configuration.
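
If you poll the metrics endpoint yourself rather than scraping it, alerting on new major drift comes down to differencing two payloads. The sketch below assumes the JSON shape shown in the sample response above; `by_severity` and `new_events` are hypothetical helper names.

```python
import re

# Matches counter names such as gostly_drift_events_total{severity="major"}.
COUNTER_NAME = re.compile(r'^gostly_drift_events_total\{severity="(\w+)"\}$')


def by_severity(payload):
    """Extract {severity: count} from a metrics payload."""
    result = {}
    for name, value in payload.get("counters", {}).items():
        match = COUNTER_NAME.match(name)
        if match:
            result[match.group(1)] = value
    return result


def new_events(previous, current):
    """Per-severity increase between two polls of the metrics endpoint."""
    prev, cur = by_severity(previous), by_severity(current)
    return {severity: cur[severity] - prev.get(severity, 0) for severity in cur}
```

Because the counter ignores idempotent re-checks, a non-zero `major` delta between polls indicates newly detected breaking drift rather than a repeated comparison.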

Scope & limits

Drift detection is deliberately scoped to comparisons that can be made honestly from recorded sessions. Worth knowing before you lean on it:

  • Response bodies must be JSON objects. Per-field signals are computed over top-level JSON objects. Top-level arrays and primitive bodies currently produce no per-field drift signal; the detector logs once per route so the gap is visible rather than silent.
  • Comparison is session-vs-session. You pick the two sessions; there is no automatic "previous session" discovery yet. Pre-partition recordings (those captured before sessions were tracked) carry no session id and aren't mixed into a session-scoped comparison.
  • Detection is request-driven. A drift check runs when you invoke it (or when a transition-pipeline hook does). There is no notification fan-out — you read events from the dashboard or the API.

Next steps