Comparison

Microsoft AGT vs Gostly

AGT decides whether the call may go out. Gostly determines what comes back. Both layers are needed; neither replaces the other.

Microsoft’s Agent Governance Toolkit (AGT) landed in April 2026 as an MIT-licensed policy enforcement layer for agentic systems. The wedge is the policy DSL: write YAML, OPA Rego, or Cedar rules against tool calls, and AGT decides whether to allow, deny, or escalate each call in under a millisecond, before it leaves the agent. Every decision is written to a hash-chained audit log. It is a well-scoped, well-built piece of infrastructure.
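
The decide-then-log flow can be sketched in a few lines of Python. This is a minimal illustration and not AGT's API: the `decide` function, the tool names, the refund threshold, and the `AuditLog` class are all hypothetical, and the toy rules are hard-coded instead of being written in a policy DSL. What it shows is the hash-chaining idea: each entry stores the hash of the previous one, so editing any past decision breaks verification.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # hypothetical seed value used as "previous hash" of the first entry


def decide(tool: str, args: dict) -> str:
    """Toy policy (assumption, not AGT rules): 'allow', 'deny', or 'escalate'."""
    if tool == "db.drop_table":
        return "deny"
    if tool == "payments.refund" and args.get("amount", 0) > 1000:
        return "escalate"
    return "allow"


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def append(self, tool: str, args: dict, decision: str) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"tool": tool, "args": args, "decision": decision, "prev": prev}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check each entry links to the one before it."""
        prev = GENESIS
        for e in self.entries:
            body = {k: e[k] for k in ("tool", "args", "decision", "prev")}
            if e["prev"] != prev or e["hash"] != _digest(body):
                return False
            prev = e["hash"]
        return True


log = AuditLog()
for tool, args in [
    ("search.query", {"q": "status"}),
    ("payments.refund", {"amount": 5000}),
    ("db.drop_table", {"name": "users"}),
]:
    log.append(tool, args, decide(tool, args))

print([e["decision"] for e in log.entries])  # ['allow', 'escalate', 'deny']
print(log.verify())  # True
```

Changing any stored field afterwards, for example flipping the second entry's decision to "allow", makes `verify()` return `False`, which is the property an auditor relies on.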

AGT’s coverage of the OWASP Agentic Top-10 is genuinely comprehensive on the policy axis: excessive agency, tool misuse, and unauthorised actions are exactly the failure modes a policy engine is designed to catch. If your concern is “the agent tried to call a destructive endpoint and the policy did not stop it,” AGT is the right layer.

Gostly is not that layer. Gostly is the part underneath: once the policy has approved a tool call, what the upstream actually returns becomes part of the agent’s state. If that response comes from a live API call, it is non-reproducible: the same prompt tomorrow may yield a different observation, and the agent’s run cannot be regression-tested. Gostly captures the upstream response, redacts it, and replays it byte-equivalent.
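
The capture, redact, and replay steps can be sketched as follows. This is a simplified illustration under stated assumptions, not Gostly's implementation: the `Cassette` class, the `request_key` scheme, the example URL, and the four-header redaction set are hypothetical stand-ins. Note that in this sketch redaction happens once at capture, and replay is byte-equivalent to the stored, already-redacted recording.

```python
import hashlib
from dataclasses import dataclass

# Illustrative subset only (assumption); not the product's actual header list.
REDACTED_HEADERS = {"authorization", "cookie", "set-cookie", "x-api-key"}


@dataclass(frozen=True)
class Recorded:
    status: int
    headers: tuple  # ordered (name, value) pairs
    body: bytes


def request_key(method: str, url: str, body: bytes) -> str:
    """Deterministic key for a request; length-prefix each part to avoid ambiguity."""
    h = hashlib.sha256()
    for part in (method.upper().encode(), url.encode(), body):
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


def redact(headers):
    """Replace sensitive header values before anything is stored."""
    return tuple(
        (name, "[REDACTED]" if name.lower() in REDACTED_HEADERS else value)
        for name, value in headers
    )


class Cassette:
    def __init__(self):
        self._store = {}

    def capture(self, method, url, req_body, status, headers, body):
        key = request_key(method, url, req_body)
        self._store[key] = Recorded(status, redact(headers), bytes(body))

    def replay(self, method, url, req_body) -> Recorded:
        # Raises KeyError for a request that was never recorded.
        return self._store[request_key(method, url, req_body)]


cassette = Cassette()
cassette.capture(
    "GET", "https://api.example.test/v1/orders/42", b"",
    200, [("Content-Type", "application/json"), ("Set-Cookie", "sid=abc")],
    b'{"id":42}',
)
first = cassette.replay("GET", "https://api.example.test/v1/orders/42", b"")
second = cassette.replay("GET", "https://api.example.test/v1/orders/42", b"")
print(first == second)  # True
print(first.body)       # b'{"id":42}'
```

Because the replayed value is read from the store and never regenerated, two runs of the same agent see identical status, headers, and body bytes.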

The clean division of labour: AGT enforces the policy on the tool call. Gostly provides the deterministic recording of what the tool’s upstream actually returned. Both questions need an answer for agent infrastructure to be auditable end-to-end.

Feature comparison

| Feature | Microsoft AGT | Gostly |
|---|---|---|
| Policy DSL (YAML / OPA Rego / Cedar) against tool calls | first-class (the wedge) | No |
| Hash-chained audit log of policy decisions | Yes | audit log on platform actions (Team) |
| Sub-millisecond pre-execution decision | Yes | n/a (different layer) |
| OWASP Agentic Top-10 coverage via policy | first-class | partial (covers verifiability axis) |
| OSS license | MIT | FSL → Apache 2.0 in 2 years |
| Traffic-derived contracts (recorded upstream responses) | No | default workflow |
| Byte-equivalent replay of recorded responses | No | Yes |
| No LLM in the deterministic cascade | n/a | Yes |
| Row-level tenant isolation (RLS) on shared state | n/a (stateless policy engine) | 22 tables, RLS enforced |
| 16-header immutable redaction floor at capture | n/a | Yes |
| MCP server for agent introspection | No | Team tier (shipped) |
| Decides whether the call goes out | Yes | No |
| Determines what comes back when it does | No | Yes |

This is a two-layer comparison, not a feature shootout. Many cells read “n/a” on each side because the tools operate at different points in the call path.

Choose Microsoft AGT when

  • Your security team wants a policy DSL they can review like code — Cedar or Rego against tool calls is the standard model.
  • The failure modes you are most worried about are excessive agency, unauthorised tool use, and policy bypass — the OWASP Agentic Top-10 in the strict sense.
  • You want an MIT-licensed OSS surface with no commercial strings attached.
  • Sub-millisecond pre-execution latency on every tool call is a hard requirement.

Choose Gostly when

  • You need to reproduce what an agent ran against a tool last week (same payload, same status, same headers) to regression-test or to satisfy an auditor.
  • Your concern is the “what came back” axis: a live API mutates faster than your test suite can keep up with, and you want the mocks to be your real upstream’s real responses — not an LLM’s guess at them.
  • Tenant isolation at the database level matters — 22-table RLS, a 16-header immutable redaction floor, and a 19-pattern PII scrubber applied at capture.
  • Your agents speak MCP and you want them to introspect a deterministic library of recorded upstream behaviour before they act.

Used together

The teams that have looked at both ship them as a stack, not as a choice. AGT sits at the agent’s tool-call boundary and decides: allow, deny, escalate. When the call is allowed, it flows to Gostly, which serves the upstream’s recorded response in MOCK mode for tests and CI, or proxies and records in LEARN mode against staging. The agent’s execution trajectory is then bounded on both sides: policy on the outbound, recorded behaviour on the inbound.
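
A rough sketch of that two-layer call path, assuming a policy callable that returns "allow", "deny", or "escalate" and a plain dictionary as the recording store. None of this is either product's API: `call_tool`, `Mode`, `PolicyDenied`, and the fake upstream are hypothetical names used only to show the order of operations.

```python
from enum import Enum


class Mode(Enum):
    MOCK = "mock"    # serve recorded responses only (tests, CI)
    LEARN = "learn"  # proxy to the real upstream and record what comes back


class PolicyDenied(Exception):
    """Raised when the policy layer does not allow the call."""


def call_tool(tool, payload, *, policy, recordings, mode, upstream=None):
    # Layer 1: the outbound decision. In this sketch anything other than
    # "allow" (including "escalate") stops the call before it leaves the agent.
    decision = policy(tool, payload)
    if decision != "allow":
        raise PolicyDenied(f"{tool}: {decision}")

    # Layer 2: what comes back.
    key = (tool, payload)
    if mode is Mode.MOCK:
        return recordings[key]  # KeyError if this call was never recorded
    if upstream is None:
        raise ValueError("LEARN mode needs an upstream callable")
    response = upstream(tool, payload)
    recordings[key] = response
    return response


upstream_calls = []


def fake_upstream(tool, payload):
    # Stand-in for a staging API (assumption for the example).
    upstream_calls.append(tool)
    return b'{"ok":true}'


def toy_policy(tool, payload):
    return "deny" if tool == "db.drop_table" else "allow"


recordings = {}
learned = call_tool("orders.get", b"42", policy=toy_policy,
                    recordings=recordings, mode=Mode.LEARN, upstream=fake_upstream)
mocked = call_tool("orders.get", b"42", policy=toy_policy,
                   recordings=recordings, mode=Mode.MOCK)
print(learned == mocked)  # True
print(upstream_calls)     # ['orders.get']
```

The second call never touches the upstream, and a denied tool never reaches the recording layer at all, which is the sense in which the trajectory is bounded on both sides.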

The honest framing: if you can only adopt one of these layers this quarter, AGT is the right pick if your auditor is asking “why did the agent try that?” and Gostly is the right pick if your auditor is asking “why did the agent get that result?” Most enterprises end up needing both answers within a year.

Pricing, side by side

| Tier | Microsoft AGT | Gostly |
|---|---|---|
| Free / OSS | MIT, fully featured, self-host | Unlimited services · OSS proxy (FSL) |
| Pro / Solo | No commercial single-seat tier | $10 / month single user |
| Team | n/a (OSS only at this layer) | $79 / seat / month · MCP server included |
| Self-host / Enterprise | Microsoft commercial support paths via Azure | $499 / month Self-host · $25K+ Enterprise |

Pair Gostly with your policy layer

Recorded upstream behaviour, not synthesized. Replay it byte-equivalent against your agent’s tool calls, under whatever policy DSL you already trust.

Evaluating for a team of 3+? We’d love to talk before you commit.