
Metrics & Observability

The agent exposes a Prometheus exposition page at GET /metrics on the plain-HTTP port :8080. It is unauthenticated and always on — including in the OSS proxy — so you can point Prometheus at it the moment the container is up. This page documents the metric families the agent emits, what each label means, and how to scrape them.

Scraping the endpoint

The endpoint is plain Prometheus text exposition. Hit it directly to confirm the agent is emitting:

curl http://localhost:8080/metrics

A minimal Prometheus scrape config:

scrape_configs:
  - job_name: gostly-agent
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8080"]
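
For ad-hoc checks outside Prometheus, the same endpoint can be read from a script. The following is a minimal sketch, not part of the agent: `fetch_metrics` and `parse_samples` are hypothetical helper names, and the parser assumes the page carries no timestamps and no commas or escaped quotes inside label values.

```python
import urllib.request


def fetch_metrics(url="http://localhost:8080/metrics", timeout=5.0):
    """Return the raw exposition text served by the agent."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Parse exposition text into (name, labels, value) tuples.

    Simplified on purpose: assumes no timestamps and no commas or
    escaped quotes inside label values.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name, _, label_part = series.strip().partition("{")
        labels = {}
        for pair in label_part.rstrip("}").split(","):
            if pair:
                key, _, raw = pair.partition("=")
                labels[key.strip()] = raw.strip().strip('"')
        samples.append((name, labels, float(value)))
    return samples
```

Calling `parse_samples(fetch_metrics())` against a running agent should yield one tuple per series on the page; for anything beyond a quick check, a maintained Prometheus client parser is the safer choice.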

Exposed without a token — by design

/metrics, /health, /readyz, and /ca.crt are exempt from the GHOST_API_KEY control-plane auth that gates the /ghost/* admin routes. The page carries metric names and counts only — never request bodies, headers, or recorded payloads — so it is safe to scrape from your monitoring network. If your deployment must restrict it, do so at the network layer (the agent does not gate it itself).

Metric descriptions are registered at process boot, so the scrape page carries the full schema for every family — including the gostly_tls_* series — even before a counter has ticked or the TLS listener has spawned. Dashboards that template panels over a metric family will not see missing-series gaps on a fresh agent.
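
One way to verify this on a fresh agent is to list the families the page describes before any samples exist. The sketch below assumes the schema appears as standard `# HELP` or `# TYPE` comment lines; `described_families` and `missing_families` are hypothetical helpers, and the expected-name list is whatever your dashboards depend on.

```python
def described_families(text):
    """Return the set of family names that have a HELP or TYPE line."""
    names = set()
    for line in text.splitlines():
        parts = line.split(None, 3)
        if len(parts) >= 3 and parts[0] == "#" and parts[1] in ("HELP", "TYPE"):
            names.add(parts[2])
    return names


def missing_families(text, expected):
    """Return the expected family names the page does not describe."""
    described = described_families(text)
    return sorted(name for name in expected if name not in described)
```

An empty result from `missing_families` on a freshly started agent would confirm that templated panels have a schema to bind to.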

ghost_requests_total{match_type}

A counter incremented exactly once per request, on the path that ultimately handled it. The match_type label is the single most useful signal the agent emits: it tells you which tier of the match cascade served each MOCK-mode request, and which operating mode handled the rest.

session_verbatim

Served byte-for-byte from the in-memory active-session capture. Earliest and most faithful MOCK tier.

exact

Method + URI + request-body match against a recorded entry. O(1) hash lookup.

resource_store

A GET-by-id resolved to a resource created by an earlier POST in the same service (linked mocks). Returned 200 instead of a 404.

resource_transition

A PATCH /collection/{id} or POST /collection/{id}/{action} advanced a bound statechart and served the updated resource.

smart_swap

Structural match — a recording of /users/42 served a request to /users/99 after path-parameter normalisation.

generated_cached

Served from the inference cache (Pro+). The response was produced earlier by the background generation worker — never synchronously in this request.

learn

LEARN mode: the request was forwarded to the upstream and recorded. Also carries a workload_class label (human | ci | agent | unknown).

passthrough

PASSTHROUGH mode: forwarded to the upstream with no recording.

transitioning

TRANSITIONING mode: returned 503 + Retry-After while a LEARN→MOCK transition job runs.

chaos

A chaos rule injected an error response on this request before any match tier ran.

miss

MOCK mode with no match at any tier. The configured unmatched status/body was returned.

No LLM in the hot path — this counter proves it

A generated_cached increment is always a cache read. When the deterministic tiers miss on a Pro+ deployment, the agent enqueues a generation job on a bounded background worker and falls through immediately; the model never runs on the request's wall-clock path. This is an architectural invariant, observable directly: there is no match_type value for a synchronous generation, because that path does not exist.
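
The enqueue-and-fall-through pattern can be illustrated with a bounded queue and a non-blocking put. This is a sketch of the general technique, not the agent's code: `lookup_generated`, the queue size, and the choice to drop a job when the queue is full are all assumptions made for illustration.

```python
import queue


def lookup_generated(cache, key, jobs):
    """Return a cached response, or None after enqueueing a job without blocking."""
    cached = cache.get(key)
    if cached is not None:
        return cached  # a cache read: the only way this tier serves a response
    try:
        jobs.put_nowait(key)  # hand the work to a background worker
    except queue.Full:
        pass  # assumption: drop the job rather than stall the request
    return None  # fall through immediately; the caller continues to miss
```

With `jobs = queue.Queue(maxsize=2)`, a burst of cache misses never blocks the caller: at most two jobs are queued and every call returns at once.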

The cascade order on a MOCK-mode request is session_verbatim → exact → resource_store → resource_transition (the statechart tier) → smart_swap → generated_cached (cached generation), ending in miss if nothing matched.
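
The first-match-wins shape of the cascade can be sketched as an ordered walk over tier lookups. Everything here is illustrative: `serve`, `normalise`, and the toy lookups are hypothetical, the statechart tier is assumed to report as resource_transition, and chaos injection (which runs before any tier) is not modelled.

```python
import re

# Tier order as described above, using the match_type label values.
CASCADE_ORDER = [
    "session_verbatim", "exact", "resource_store",
    "resource_transition", "smart_swap", "generated_cached",
]


def serve(request_key, tiers):
    """Return (match_type, response) from the first tier that matches."""
    for match_type in CASCADE_ORDER:
        lookup = tiers.get(match_type)
        if lookup is None:
            continue  # tier not configured in this toy setup
        response = lookup(request_key)
        if response is not None:
            return match_type, response
    return "miss", None


def normalise(key):
    """Replace numeric path segments with a placeholder (toy smart_swap)."""
    return re.sub(r"/\d+(?=/|$)", "/{id}", key)


recorded = {"GET /users/42": '{"id": 42}'}
by_shape = {normalise(k): v for k, v in recorded.items()}
tiers = {
    "exact": recorded.get,
    "smart_swap": lambda key: by_shape.get(normalise(key)),
}
```

In this toy setup `serve("GET /users/42", tiers)` resolves at the exact tier, `serve("GET /users/99", tiers)` falls through to smart_swap, and `serve("GET /orders", tiers)` ends in miss.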

Useful queries — the mock hit rate, and the share of traffic that fell all the way through to a miss:

# Fraction of MOCK-mode requests that hit a deterministic tier
sum(rate(ghost_requests_total{match_type=~"session_verbatim|exact|resource_store|resource_transition|smart_swap"}[5m]))
  / sum(rate(ghost_requests_total{match_type=~"session_verbatim|exact|resource_store|resource_transition|smart_swap|generated_cached|miss"}[5m]))

# Miss rate — the signal that your library has coverage gaps
sum(rate(ghost_requests_total{match_type="miss"}[5m]))
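
The arithmetic behind the first query can be reproduced from two raw scrapes. This is a rough sketch under stated assumptions: `deterministic_hit_fraction` is a hypothetical helper, its inputs are plain dicts of counter values keyed by match_type, and counter resets between the two scrapes are not handled.

```python
DETERMINISTIC = {
    "session_verbatim", "exact", "resource_store",
    "resource_transition", "smart_swap",
}
MOCK_OUTCOMES = DETERMINISTIC | {"generated_cached", "miss"}


def deterministic_hit_fraction(before, after):
    """Fraction of MOCK-mode requests served by a deterministic tier.

    before/after map match_type -> counter value from two scrapes.
    Returns None when no MOCK-mode requests arrived in between.
    """
    def increase(match_types):
        return sum(after.get(t, 0) - before.get(t, 0) for t in match_types)

    total = increase(MOCK_OUTCOMES)
    if total <= 0:
        return None
    return increase(DETERMINISTIC) / total
```

Values such as learn or passthrough are ignored, mirroring the label matchers in the query: they describe other operating modes, not MOCK-mode outcomes.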

ghost_mock_library_size

An unlabeled gauge holding the total number of mock entries the agent is currently serving from — summed across all services. It is set when the in-memory serving index is read (for example on a /health probe and after a library reload), so it tracks the live serving index, not the full traffic history on disk.

# Alert if the library empties unexpectedly (e.g. a bad reload)
ghost_mock_library_size == 0

ghost_io_errors_total{operation}

A counter for disk-sink failures — opening, writing, or flushing the JSONL files the agent persists to. Any non-zero rate means recorded data is being lost: a recorded request, a served-mock log line, or a captured webhook did not reach disk. The usual causes are a full volume, wrong mount permissions, or a missing mount. The operation label tells you which sink failed:

mock_open / mock_write

The served-mock log (data/mocks/mock_{svc}.jsonl) could not be opened or written.

traffic_open / traffic_write

The append-only traffic log (data/traffic/traffic_{svc}.jsonl) — the full-fidelity training corpus — could not be opened or written.

mode_write

The mode file could not be written; an agent restart would revert to the previously persisted mode.

webhook_mkdir / webhook_open / webhook_write / webhook_flush / webhook_serialize

A captured webhook could not be persisted to data/webhooks/{svc}.jsonl.

# Any disk-sink failure in the last 5 minutes is actionable
sum(rate(ghost_io_errors_total[5m])) by (operation) > 0

Each increment is paired with an ERROR-level structured log line carrying the path and the underlying OS error, so the metric tells you that a sink is failing and the log tells you why.
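
When triaging from raw scrapes, it can help to fold the operation label back into the sink it belongs to. The sketch below is illustrative only: `failing_sinks` and the sink display names are hypothetical, while the operation values are the ones listed above.

```python
SINK_BY_OPERATION = {
    "mock_open": "served-mock log",
    "mock_write": "served-mock log",
    "traffic_open": "traffic log",
    "traffic_write": "traffic log",
    "mode_write": "mode file",
    "webhook_mkdir": "webhook capture",
    "webhook_open": "webhook capture",
    "webhook_write": "webhook capture",
    "webhook_flush": "webhook capture",
    "webhook_serialize": "webhook capture",
}


def failing_sinks(before, after):
    """Return {sink: new_errors} for sinks whose counters grew between scrapes.

    before/after map the operation label -> ghost_io_errors_total value.
    """
    failures = {}
    for operation, value in after.items():
        delta = value - before.get(operation, 0)
        if delta > 0:
            sink = SINK_BY_OPERATION.get(operation, "unknown")
            failures[sink] = failures.get(sink, 0) + delta
    return failures
```

A non-empty result names the sinks to look up in the ERROR-level logs.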

axum_http_requests_* (control-plane HTTP)

RED-method (Rate, Errors, Duration) instrumentation over the agent's HTTP surface, emitted by the axum-prometheus layer:

axum_http_requests_total

Counter of HTTP requests, labeled by endpoint, method, and status. Use it for request rate and error ratio.

axum_http_requests_duration_seconds

Histogram of request latency, same labels. Use histogram_quantile for p50/p95/p99.

This series measures the admin surface, not proxied traffic

The Prometheus layer is attached to the agent's explicit routes — the /ghost/* control plane, /health, /readyz, /metrics, /ca.crt, and webhook capture — not the proxy data plane that handles your recorded and mocked traffic. For per-request volume and outcomes of proxied traffic, use ghost_requests_total{match_type} above, which is incremented inside the proxy handler itself.

# p95 latency of the control-plane HTTP surface
histogram_quantile(0.95,
  sum(rate(axum_http_requests_duration_seconds_bucket[5m])) by (le, endpoint))
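
To see what histogram_quantile does with the bucket series, here is a simplified re-implementation of its linear interpolation. It is a sketch for intuition rather than Prometheus's own code; the function takes cumulative (upper bound, count) pairs such as the `le` buckets of a single endpoint, and the bucket values in the note below are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan  # need a +Inf bucket plus at least one finite bucket
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank and count > prev_count:
            if math.isinf(bound):
                return buckets[-2][0]  # clamp to the highest finite bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return buckets[-2][0]
```

For buckets `[(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]`, the p95 lands halfway through the 0.5 to 1.0 bucket, about 0.75 seconds. The estimate is only as fine as the bucket boundaries, which is worth remembering when reading p99 panels.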

gostly_tls_* (TLS MITM subsystem)

The TLS family is emitted by the optional :8443 MITM listener (see TLS Interception). The descriptions are registered at boot even when ENABLE_TLS_INTERCEPTION is off, so the series are present on every scrape — with the listener-state gauge reporting state="off".

gostly_alpn_negotiated{protocol}

Counter of the negotiated HTTP version per MITM request; protocol is h2, http/1.1, or other. Counts requests, not handshakes (HTTP/2 multiplexes many requests over one connection). A falling h2 share is your early warning that head-of-line-blocking behaviour is changing.

gostly_tls_cert_cache_hit_total

Hits in the per-SNI leaf-certificate cache. The SLO is a high hit ratio: hits / (hits + misses).

gostly_tls_cert_cache_miss_total

Cold-cert mints: a new per-host leaf certificate was signed and a TLS config built. A sustained high rate at steady state means the cache is undersized or the hostname distribution is long-tailed.

gostly_tls_cert_cache_evictions_total{cause}

Involuntary cache removals. cause=size after warm-up means TLS_CACHE_SIZE is too small; cause=expired means entries are churning on TTL.

gostly_tls_listener_state{state}

A gauge flattened to per-state booleans: exactly one of state=off | lax_running | strict_running | lax_failed is 1 at any moment. Recover the active state with max by (state) and keep the series whose value is 1.
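
Reading the flattened gauge from a script amounts to picking the one state whose value is 1. A minimal sketch, assuming the samples have already been collected into a dict keyed by the state label; `active_listener_state` is a hypothetical helper.

```python
def active_listener_state(samples):
    """Return the single state whose gauge value is 1.

    samples maps the state label -> gauge value (0 or 1).
    """
    active = [state for state, value in samples.items() if value == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active state, got {active}")
    return active[0]
```

Raising on zero or multiple active states makes a violated one-hot assumption visible instead of silently picking a state.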

There is deliberately no handshake-outcome counter

Metrics on the MITM path are recorded after the TLS handshake completes, so handshake failures (bad CA chain, ALPN mismatch, mid-handshake client abort) happen before any counter can observe them. Rather than ship a counter that is flat-zero forever and mislead any dashboard built on it, the metric is omitted. Successful-handshake volume on the MITM path is already observable via the request counters above.

# Which TLS listener state is live right now (only the active state has value 1)
max by (state) (gostly_tls_listener_state) == 1

# Cert-cache hit ratio: alert if it drops below the SLO
sum(rate(gostly_tls_cert_cache_hit_total[5m]))
  / (sum(rate(gostly_tls_cert_cache_hit_total[5m])) + sum(rate(gostly_tls_cert_cache_miss_total[5m])))

A note on metric prefixes

You will see three prefixes on the scrape page:

ghost_*

The agent's own counters and gauges for request matching, library size, and disk I/O.

gostly_*

The TLS subsystem and the agent's product-telemetry counters. gostly_ is the canonical product prefix.

axum_*

RED-method HTTP series contributed by the framework instrumentation layer over the control-plane routes.

When you build dashboards, match on the exact metric names documented above rather than on a prefix glob: ghost_ and gostly_ are both product prefixes, and the two namespaces are not interchangeable.

Next steps

The Match Cascade →

How each match_type value gets chosen, tier by tier.

TLS Interception →

Enable the :8443 listener that emits the gostly_tls_* family.