Metrics & Observability
The agent exposes a Prometheus exposition page at GET /metrics on the plain-HTTP port :8080. It is unauthenticated and always on — including in the OSS proxy — so you can point Prometheus at it the moment the container is up. This page documents the metric families the agent emits, what each label means, and how to scrape them.
Scraping the endpoint
The endpoint is plain Prometheus text exposition. Hit it directly to confirm the agent is emitting:
curl http://localhost:8080/metrics
A minimal Prometheus scrape config:
scrape_configs:
  - job_name: gostly-agent
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8080"]
Exposed without a token, by design
/metrics, /health, /readyz, and /ca.crt are exempt from the GHOST_API_KEY control-plane auth that gates the /ghost/* admin routes. The page carries metric names and counts only (never request bodies, headers, or recorded payloads), so it is safe to scrape from your monitoring network. If your deployment must restrict it, do so at the network layer; the agent does not gate it itself.
Metric descriptions are registered at process boot, so the scrape page carries the full schema for every family, including the gostly_tls_* series, even before a counter has ticked or the TLS listener has spawned. Dashboards that template panels over a metric family will not see missing-series gaps on a fresh agent.
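One way to check that the schema is present on a fresh agent is to look for the family declarations in the scrape output. The sketch below assumes the schema appears as standard `# TYPE` comment lines in the text exposition format; the `SAMPLE` text, its HELP strings, and its values are invented for illustration and are not the agent's real output.

```python
import re

# Hypothetical excerpt of a /metrics response (illustrative values only).
SAMPLE = """\
# HELP ghost_requests_total Requests handled, by match tier or mode.
# TYPE ghost_requests_total counter
ghost_requests_total{match_type="exact"} 12
# HELP ghost_mock_library_size Mock entries in the serving index.
# TYPE ghost_mock_library_size gauge
ghost_mock_library_size 340
# HELP gostly_tls_cert_cache_hit_total Leaf-certificate cache hits.
# TYPE gostly_tls_cert_cache_hit_total counter
"""

# A '# TYPE' line names a metric family and its declared type.
TYPE_LINE = re.compile(r"^# TYPE (\S+) (\S+)$")


def declared_families(exposition: str) -> dict[str, str]:
    """Map each family named on a '# TYPE' line to its declared type."""
    families = {}
    for line in exposition.splitlines():
        match = TYPE_LINE.match(line.strip())
        if match:
            families[match.group(1)] = match.group(2)
    return families


def missing_families(exposition: str, expected: list[str]) -> list[str]:
    """Return the expected family names that are not declared, sorted."""
    declared = declared_families(exposition)
    return sorted(name for name in expected if name not in declared)
```

Feeding the body of `curl http://localhost:8080/metrics` into `missing_families` with the family names documented on this page gives a quick smoke test before wiring up dashboards.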
ghost_requests_total{match_type}
A counter incremented exactly once per request, on the path that ultimately handled it. The match_type label is the single most useful signal the agent emits: it tells you which tier of the match cascade served each MOCK-mode request, and which operating mode handled the rest.
session_verbatim: Served byte-for-byte from the in-memory active-session capture. Earliest and most faithful MOCK tier.
exact: Method + URI + request-body match against a recorded entry. O(1) hash lookup.
resource_store: A GET-by-id resolved to a resource created by an earlier POST in the same service (linked mocks). Returned 200 instead of a 404.
resource_transition: A PATCH /collection/{id} or POST /collection/{id}/{action} advanced a bound statechart and served the updated resource.
smart_swap: Structural match. A recording of /users/42 served a request to /users/99 after path-parameter normalisation.
generated_cached: Served from the inference cache (Pro+). The response was produced earlier by the background generation worker, never synchronously in this request.
learn: LEARN mode. The request was forwarded to the upstream and recorded. Also carries a workload_class label (human | ci | agent | unknown).
passthrough: PASSTHROUGH mode. Forwarded to the upstream with no recording.
transitioning: TRANSITIONING mode. Returned 503 + Retry-After while a LEARN→MOCK transition job runs.
chaos: A chaos rule injected an error response on this request before any match tier ran.
miss: MOCK mode with no match at any tier. The configured unmatched status/body was returned.
No LLM in the hot path — this counter proves it
A generated_cached increment is always a cache read. When the deterministic tiers miss on a Pro+ deployment, the agent enqueues a generation job on a bounded background worker and falls through immediately; the model never runs on the request's wall-clock path. This is an architectural invariant, observable directly: there is no match_type value for a synchronous generation, because that path does not exist.
The cascade order on a MOCK-mode request is session_verbatim → exact → resource_store → statechart (resource_transition) → smart_swap → cached generation (generated_cached), ending in miss if nothing matched.
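The match_type values group naturally into deterministic tiers, cached generation, miss, and the non-MOCK modes. As a minimal sketch, the hit rate and miss share can be computed offline from counter increases over a window (for example, two scrapes subtracted). The set names and function names here are this sketch's own; only the label values come from this page.

```python
# Label values documented on this page; the grouping is a convenience of
# this sketch, not something the agent exposes.
DETERMINISTIC = {
    "session_verbatim",
    "exact",
    "resource_store",
    "resource_transition",
    "smart_swap",
}
# Every outcome a MOCK-mode request can end in after the cascade runs.
MOCK_OUTCOMES = DETERMINISTIC | {"generated_cached", "miss"}


def mock_hit_rate(counts):
    """Share of MOCK-mode requests served by a deterministic tier.

    `counts` maps match_type -> counter increase over some window.
    Returns None when no MOCK-mode request was seen.
    """
    total = sum(v for k, v in counts.items() if k in MOCK_OUTCOMES)
    if total == 0:
        return None
    hits = sum(v for k, v in counts.items() if k in DETERMINISTIC)
    return hits / total


def miss_share(counts):
    """Share of MOCK-mode requests that fell through every tier."""
    total = sum(v for k, v in counts.items() if k in MOCK_OUTCOMES)
    if total == 0:
        return None
    return counts.get("miss", 0) / total
```

Values such as learn, passthrough, transitioning, and chaos are deliberately left out of the denominator, since those requests never entered the match cascade.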
Useful queries — the mock hit rate, and the share of traffic that fell all the way through to a miss:
# Fraction of MOCK-mode requests that hit a deterministic tier
sum(rate(ghost_requests_total{match_type=~"session_verbatim|exact|resource_store|resource_transition|smart_swap"}[5m]))
/ sum(rate(ghost_requests_total{match_type=~"session_verbatim|exact|resource_store|resource_transition|smart_swap|generated_cached|miss"}[5m]))
# Miss rate — the signal that your library has coverage gaps
sum(rate(ghost_requests_total{match_type="miss"}[5m]))
ghost_mock_library_size
An unlabeled gauge holding the total number of mock entries the agent is currently serving from — summed across all services. It is set when the in-memory serving index is read (for example on a /health probe and after a library reload), so it tracks the live serving index, not the full traffic history on disk.
# Alert if the library empties unexpectedly (e.g. a bad reload)
ghost_mock_library_size == 0
ghost_io_errors_total{operation}
A counter for disk-sink failures — opening, writing, or flushing the JSONL files the agent persists to. Any non-zero rate means recorded data is being lost: a recorded request, a served-mock log line, or a captured webhook did not reach disk. The usual causes are a full volume, wrong mount permissions, or a missing mount. The operation label tells you which sink failed:
mock_open / mock_write: The served-mock log (data/mocks/mock_{svc}.jsonl) could not be opened or written.
traffic_open / traffic_write: The append-only traffic log (data/traffic/traffic_{svc}.jsonl), which is the full-fidelity training corpus, could not be opened or written.
mode_write: The mode file could not be written; an agent restart would revert to the previously persisted mode.
webhook_mkdir / webhook_open / webhook_write / webhook_flush / webhook_serialize: A captured webhook could not be persisted to data/webhooks/{svc}.jsonl.
# Any disk-sink failure in the last 5 minutes is actionable
sum(rate(ghost_io_errors_total[5m])) by (operation) > 0
Each increment is paired with an ERROR-level structured log line carrying the path and the underlying OS error, so the metric tells you that a sink is failing and the log tells you why.
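When routing alerts it can help to collapse the operation label into the sink it belongs to. The sketch below does that by prefix, using the operation names listed above; the `SINKS` mapping and both function names are hypothetical helpers, not part of the agent.

```python
# Prefix of the operation label -> human-readable sink name.
SINKS = {
    "mock": "served-mock log",
    "traffic": "traffic log",
    "mode": "mode file",
    "webhook": "webhook capture",
}


def sink_for(operation: str) -> str:
    """Resolve an operation label such as 'traffic_write' to its sink."""
    prefix, _, _ = operation.partition("_")
    return SINKS.get(prefix, "unknown")


def failing_sinks(deltas: dict[str, float]) -> dict[str, float]:
    """Sum per-operation counter increases into per-sink totals.

    Sinks with no new errors in the window are omitted.
    """
    totals: dict[str, float] = {}
    for operation, increase in deltas.items():
        if increase > 0:
            sink = sink_for(operation)
            totals[sink] = totals.get(sink, 0) + increase
    return totals
```

A non-empty result from `failing_sinks` is the cue to pull the paired ERROR-level log lines for the path and OS error.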
axum_http_requests_* (control-plane HTTP)
RED-method (Rate, Errors, Duration) instrumentation over the agent's HTTP surface, emitted by the axum-prometheus layer:
axum_http_requests_total: Counter of HTTP requests, labeled by endpoint, method, and status. Rate + error ratio.
axum_http_requests_duration_seconds: Histogram of request latency, same labels. Use histogram_quantile for p50/p95/p99.
This series measures the admin surface, not proxied traffic
The axum_http_requests_* series cover the /ghost/* control plane, /health, /readyz, /metrics, /ca.crt, and webhook capture, not the proxy data plane that handles your recorded and mocked traffic. For per-request volume and outcomes of proxied traffic, use ghost_requests_total{match_type} above, which is incremented inside the proxy handler itself.
# p95 latency of the control-plane HTTP surface
histogram_quantile(0.95, sum(rate(axum_http_requests_duration_seconds_bucket[5m])) by (le, endpoint))
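For readers unsure what histogram_quantile does with the _bucket series, here is a simplified sketch of the estimate: find the bucket the target rank falls in, then interpolate linearly inside it. It omits several edge cases that Prometheus itself handles (out-of-range quantiles, non-monotonic buckets, native histograms), and the bucket bounds in the usage note are made up rather than the agent's configured ones.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, and the last bound must be math.inf (the le="+Inf" bucket).
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be within [0, 1] in this sketch")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # The rank lies beyond the last finite bound, so the best
                # available answer is the highest finite upper bound.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly within the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan
```

For example, with the illustrative buckets `[(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]`, the 95th-percentile rank lands in the 0.5 to 1.0 bucket and is interpolated within it. The estimate is therefore only as precise as the bucket boundaries allow.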
gostly_tls_* (TLS MITM subsystem)
The TLS family is emitted by the optional :8443 MITM listener (see TLS Interception). The descriptions are registered at boot even when ENABLE_TLS_INTERCEPTION is off, so the series are present on every scrape — with the listener-state gauge reporting state="off".
gostly_alpn_negotiated{protocol}: Counter of negotiated HTTP version per MITM request. protocol is h2, http/1.1, or other. Counts requests, not handshakes (HTTP/2 multiplexes many requests over one connection). A falling h2 share is your early warning that head-of-line-blocking behaviour is changing.
gostly_tls_cert_cache_hit_total: Hits in the per-SNI leaf-certificate cache. A high hit-to-miss ratio is the SLO.
gostly_tls_cert_cache_miss_total: Cold-cert mints: a new per-host leaf certificate was signed and a TLS config built. A sustained high rate at steady state means the cache is undersized or the hostname distribution is long-tailed.
gostly_tls_cert_cache_evictions_total{cause}: Involuntary cache removals. cause=size after warm-up means TLS_CACHE_SIZE is too small; cause=expired means entries are churning on TTL.
gostly_tls_listener_state{state}: A gauge flattened to per-state booleans. Exactly one of state=off | lax_running | strict_running | lax_failed is 1 at any moment. Recover the active state with max by (state).
There is deliberately no handshake-outcome counter
# Which TLS listener state is live right now
max by (state) (gostly_tls_listener_state)
# Cert-cache hit ratio: alert if it drops below the SLO
sum(rate(gostly_tls_cert_cache_hit_total[5m]))
/ sum(rate(gostly_tls_cert_cache_hit_total[5m]) + rate(gostly_tls_cert_cache_miss_total[5m]))
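The same two readings can be derived from scraped sample values outside PromQL. This is a minimal sketch assuming the per-state gauge samples have already been collected into a dict keyed by the state label; the function names are hypothetical.

```python
# State label values documented for gostly_tls_listener_state.
STATES = ("off", "lax_running", "strict_running", "lax_failed")


def active_state(samples: dict[str, float]) -> str:
    """Return the single state whose gauge sample equals 1.

    Raises ValueError if zero or several states are set, since the page
    documents exactly one as live at any moment.
    """
    live = [state for state in STATES if samples.get(state, 0) == 1]
    if len(live) != 1:
        raise ValueError(f"expected exactly one live state, got {live}")
    return live[0]


def cert_cache_hit_ratio(hits: float, misses: float):
    """Hit ratio of the leaf-certificate cache, or None with no lookups."""
    total = hits + misses
    return None if total == 0 else hits / total
```

Passing counter increases over a window (rather than lifetime totals) to `cert_cache_hit_ratio` keeps the ratio comparable to the rate-based query.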
A note on metric prefixes
You will see three prefixes on the scrape page:
ghost_*: The agent's own counters and gauges for request matching, library size, and disk I/O.
gostly_*: The TLS subsystem and the agent's product-telemetry counters. gostly_ is the canonical product prefix.
axum_*: RED-method HTTP series contributed by the framework instrumentation layer over the control-plane routes.
Both ghost_ and gostly_ product prefixes appear on the scrape page. When you build dashboards, match on the exact metric names documented above rather than on a prefix glob, since the two namespaces are not interchangeable.