
Microsoft AGT vs Gostly

AGT decides whether the call may go out. Gostly determines what comes back. Both layers are needed; neither replaces the other.

Microsoft’s Agent Governance Toolkit landed in April 2026 as an MIT-licensed policy enforcement layer for agentic systems. The wedge is the DSL: you write Cedar, OPA, or Rego YAML rules against tool calls, and AGT decides in under a millisecond, before the call leaves the agent, whether to allow, deny, or escalate it. Every decision is written to a hash-chained audit log. It is a well-scoped, well-built piece of infrastructure.
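The hash-chained audit log is what makes the decision record tamper-evident. Below is a minimal sketch of the pattern, assuming a toy prefix-match rule table in place of a real Cedar or Rego policy. Each log entry commits to the hash of the previous entry, so editing any past decision breaks verification. The names `RULES`, `decide`, and `AuditLog` are hypothetical and are not AGT’s API.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical toy rules: (tool-name prefix, decision). Not AGT's DSL.
RULES = [
    ("delete_", "deny"),
    ("transfer_", "escalate"),
]
DEFAULT_DECISION = "allow"
GENESIS_HASH = "0" * 64


def decide(tool: str) -> str:
    """Return the decision of the first rule whose prefix matches the tool name."""
    for prefix, decision in RULES:
        if tool.startswith(prefix):
            return decision
    return DEFAULT_DECISION


def _entry_hash(tool: str, decision: str, prev: str) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    body = json.dumps({"tool": tool, "decision": decision, "prev": prev}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def append(self, tool: str, decision: str) -> dict:
        # Each new entry links to the hash of the one before it.
        prev = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        entry = {"tool": tool, "decision": decision, "prev": prev,
                 "hash": _entry_hash(tool, decision, prev)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every link; any edited or reordered entry breaks the chain."""
        prev = GENESIS_HASH
        for e in self.entries:
            if e["prev"] != prev or e["hash"] != _entry_hash(e["tool"], e["decision"], prev):
                return False
            prev = e["hash"]
        return True


log = AuditLog()
for tool in ("read_invoice", "delete_user", "transfer_funds"):
    log.append(tool, decide(tool))
print([e["decision"] for e in log.entries])  # ['allow', 'deny', 'escalate']
print(log.verify())                          # True
log.entries[1]["decision"] = "allow"         # tamper with a past decision
print(log.verify())                          # False
```

Chaining hashes in this way does not prevent tampering. It makes tampering detectable, which is the property an auditor relies on.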

AGT’s coverage of the OWASP Agentic Top-10 is genuinely comprehensive on the policy axis: excessive agency, tool misuse, and unauthorised actions are exactly the failure modes a policy engine is designed to catch. If your concern is “the agent tried to call a destructive endpoint and the policy did not stop it,” AGT is the right layer.

Gostly is not that layer; it sits underneath. Once the policy has approved a tool call, whatever the upstream actually returns becomes part of the agent’s state. If that response comes from a live API, it is non-reproducible: the same prompt tomorrow may yield a different observation, so the agent’s run cannot be regression-tested. Gostly captures the upstream response, redacts it, and replays it byte-equivalent.
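Below is a minimal sketch of the capture-and-replay idea, assuming an in-memory store keyed by a fingerprint of method, URL, and request body. The recorded status, headers, and raw body bytes come back unchanged, with no live call involved. `RecordedResponse`, `request_key`, and `RecordingStore` are hypothetical names, not Gostly’s implementation, and redaction is left out here (it would run before `record`).

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RecordedResponse:
    status: int
    headers: Tuple[Tuple[str, str], ...]  # (name, value) pairs, order preserved
    body: bytes                           # raw bytes, never re-serialised


def request_key(method: str, url: str, body: bytes) -> str:
    """Hypothetical fingerprint: identical requests map to the same recording."""
    h = hashlib.sha256()
    for part in (method.upper().encode("utf-8"), url.encode("utf-8"), body):
        # Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


class RecordingStore:
    def __init__(self) -> None:
        self._recordings: Dict[str, RecordedResponse] = {}

    def record(self, method: str, url: str, req_body: bytes,
               response: RecordedResponse) -> None:
        # In a real capture path, redaction would run before this point.
        self._recordings[request_key(method, url, req_body)] = response

    def replay(self, method: str, url: str, req_body: bytes) -> RecordedResponse:
        """Return the stored response unchanged; no live call is made."""
        key = request_key(method, url, req_body)
        if key not in self._recordings:
            raise KeyError(f"no recording for {method.upper()} {url}")
        return self._recordings[key]


store = RecordingStore()
original = RecordedResponse(200, (("content-type", "application/json"),), b'{"id": 7}')
store.record("get", "https://api.example.test/items/7", b"", original)
replayed = store.replay("GET", "https://api.example.test/items/7", b"")
print(replayed == original, replayed.body)  # True b'{"id": 7}'
```

The body is stored as `bytes` rather than parsed JSON. Parsing and re-serialising the body could reorder keys or change whitespace, which would break the byte-equivalence guarantee.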

The clean division of labour: AGT enforces policy on the tool call, and Gostly keeps a deterministic recording of what the tool’s upstream actually returned. Agent infrastructure is auditable end-to-end only when both are in place.

Feature comparison

Feature	Microsoft AGT	Gostly
Policy DSL (Cedar / OPA / Rego YAML) against tool calls	first-class — the wedge	No
Hash-chained audit log of policy decisions	Yes	audit log on platform actions (Team)
Sub-millisecond pre-execution decision	Yes	n/a — different layer
OWASP Agentic Top-10 coverage via policy	first-class	partial — covers verifiability axis
OSS license	MIT	FSL → Apache 2.0 in 2 years
Traffic-derived contracts (recorded upstream responses)	No	default workflow
Byte-equivalent replay of recorded responses	No	Yes
No LLM in the deterministic cascade	n/a	Yes
Row-level tenant isolation (RLS) on shared state	n/a — stateless policy engine	22 tables, RLS enforced
16-header immutable redaction floor at capture	n/a	Yes
MCP server for agent introspection	No	Team tier — shipped
Decides whether the call goes out	Yes	No
Determines what comes back when it does	No	Yes

This is a two-layer comparison, not a feature shootout. Many cells read “n/a” on each side because the tools operate at different points in the call path.

Choose Microsoft AGT when

→Your security team wants a policy DSL they can review like code — Cedar or Rego against tool calls is the standard model.
→The failure modes you are most worried about are excessive agency, unauthorised tool use, and policy bypass — the OWASP Agentic Top-10 in the strict sense.
→You want an MIT-licensed OSS surface with no commercial strings attached.
→Sub-millisecond pre-execution latency on every tool call is a hard requirement.

Choose Gostly when

→You need to reproduce what an agent ran against a tool last week (same payload, same status, same headers) to regression-test it or to satisfy an auditor.
→Your concern is the “what came back” axis: a live API mutates faster than your test suite can keep up with, and you want the mocks to be your real upstream’s real responses — not an LLM’s guess at them.
→Tenant isolation at the database level matters — 22-table RLS, a 16-header immutable redaction floor, and a 19-pattern PII scrubber applied at capture.
→Your agents speak MCP and you want them to introspect a deterministic library of recorded upstream behaviour before they act.
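A redaction floor applied at capture means sensitive headers are masked before anything is stored, and callers can add to the list but never shrink it. Below is a minimal sketch of that rule plus a pattern-based body scrubber. The header names and the two regexes are an illustrative subset chosen for this example. They are not Gostly’s actual 16 headers or 19 PII patterns, and `redact_headers` and `scrub_body` are hypothetical names.

```python
import re
from typing import Iterable, Tuple

Headers = Tuple[Tuple[str, str], ...]

# Illustrative subset only; this is not Gostly's actual 16-header floor.
REDACTION_FLOOR = frozenset({"authorization", "cookie", "set-cookie", "x-api-key"})
MARKER = "[REDACTED]"

# Two illustrative PII patterns; a real scrubber needs many more.
PII_PATTERNS = (
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # email addresses
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                          # US SSN-shaped numbers
)


def redact_headers(headers: Headers, extra: Iterable[str] = ()) -> Headers:
    """Mask floor headers plus any caller-supplied extras; the floor is always applied."""
    names = REDACTION_FLOOR | {name.lower() for name in extra}
    # HTTP header names are case-insensitive, so compare lowercased names.
    return tuple((n, MARKER if n.lower() in names else v) for n, v in headers)


def scrub_body(text: str) -> str:
    """Replace every PII match in a response body with the marker."""
    for pattern in PII_PATTERNS:
        text = pattern.sub(MARKER, text)
    return text


captured = (("Authorization", "Bearer abc123"), ("Content-Type", "application/json"))
print(redact_headers(captured))
# (('Authorization', '[REDACTED]'), ('Content-Type', 'application/json'))
print(scrub_body("contact alice@example.com, ssn 123-45-6789"))
# contact [REDACTED], ssn [REDACTED]
```

Using a `frozenset` for the floor is a sketch-level way to express "immutable". `extra` can only widen the set that gets masked.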

Used together

The teams that have looked at both ship them as a stack, not as a choice. AGT sits at the agent’s tool-call boundary and decides: allow, deny, or escalate. When the call is allowed, it flows to Gostly. In MOCK mode, Gostly serves the upstream’s recorded response for tests and CI. In LEARN mode, it proxies to staging and records what comes back. The agent’s execution trajectory is then bounded on both sides: policy on the outbound call, recorded behaviour on the inbound response.
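Below is a minimal sketch of that flow. It assumes a policy callable that returns "allow", "deny", or "escalate", a plain dict standing in for the recording store, and a fake upstream function. `handle_tool_call`, `Mode`, and `PolicyDenied` are hypothetical names, and neither AGT nor Gostly is actually called. Treating "escalate" as a block is a simplification made for this example.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class Mode(Enum):
    MOCK = "mock"    # serve recorded responses only (tests, CI)
    LEARN = "learn"  # proxy to the real upstream and record the response (staging)


class PolicyDenied(Exception):
    """Raised when the policy layer does not allow the call."""


def handle_tool_call(
    tool: str,
    request: str,
    policy: Callable[[str], str],
    mode: Mode,
    recordings: Dict[Tuple[str, str], bytes],
    upstream: Callable[[str, str], bytes],
) -> bytes:
    # Outbound: the policy layer decides first. Treating "escalate" as a block
    # is a simplification; a real system would route it to a reviewer.
    decision = policy(tool)
    if decision != "allow":
        raise PolicyDenied(f"{tool}: {decision}")
    key = (tool, request)
    # Inbound: the recording layer determines what comes back.
    if mode is Mode.MOCK:
        return recordings[key]  # KeyError if this call was never recorded
    response = upstream(tool, request)
    recordings[key] = response
    return response


upstream_calls = []


def fake_upstream(tool: str, request: str) -> bytes:
    upstream_calls.append((tool, request))
    return b'{"ok": true}'


def toy_policy(tool: str) -> str:
    return "deny" if tool.startswith("delete_") else "allow"


recordings: Dict[Tuple[str, str], bytes] = {}
learned = handle_tool_call("get_item", "id=7", toy_policy, Mode.LEARN, recordings, fake_upstream)
mocked = handle_tool_call("get_item", "id=7", toy_policy, Mode.MOCK, recordings, fake_upstream)
print(learned == mocked, len(upstream_calls))  # True 1
```

The policy check runs before either mode is consulted, so a denied call never reaches the recording layer. This matches the outbound-then-inbound ordering described above.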

The honest framing: if you can only adopt one of these layers this quarter, AGT is the right pick if your auditor is asking “why did the agent try that?” and Gostly is the right pick if your auditor is asking “why did the agent get that result?” Most enterprises end up needing both answers within a year.

Pricing, side by side

Tier	Microsoft AGT	Gostly
Free / OSS	MIT — fully featured, self-host	Unlimited services · OSS proxy (FSL)
Pro / Solo	No commercial single-seat tier	$10 / month single user
Team	n/a — OSS only at this layer	$79 / seat / month · MCP server included
Self-host / Enterprise	Microsoft commercial support paths via Azure	$499 / mo Self-host · $25K+ Enterprise

Pair Gostly with your policy layer

Recorded upstream behaviour, not synthesized. Replay it byte-equivalent against your agent’s tool calls, under whatever policy DSL you already trust.


Evaluating for a team of 3+? We’d love to talk before you commit.