AG-UI Protocol Explained: How It Compares to MDMA

Introduction

AI agents are only as useful as the interface humans use to steer them. As agentic applications move to production, a new stack layer has emerged: agent interaction protocols and generative UI specs that define how an agent reaches the screen. The AG-UI protocol is one of the most talked-about entries, and it is often mentioned alongside MDMA - but the two solve different problems.

This guide explains what the AG-UI protocol is, how it works, who is behind it, and how it compares to MDMA, plus where it sits next to related agentic protocols like A2UI and MCP UI. The short version: AG-UI is the transport (the pipe), and MDMA is a content format (the payload) - closer to complementary than competing.

A breakdown of the protocols shaping modern AI agent frontends.

What Is the AG-UI Protocol?

AG-UI stands for the Agent-User Interaction Protocol. It is an open, lightweight, event-based protocol that standardizes real-time communication between agentic systems and user-facing applications. Rather than defining what your buttons look like, it defines how an agent backend and a frontend talk over the lifetime of a session.

Concretely, the frontend and the agentic backend exchange a stream of typed JSON events over SSE, WebSockets, or plain HTTP. That stream keeps the user interface in sync with everything the agent is doing: text streaming in, tool calls firing, state changing, runs starting and finishing. AG-UI cares about the connection and streaming of a long-running, nondeterministic agent - not the visual design of any component. It stays deliberately minimal about payload shape, so any agent framework or frontend can adopt it.
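To make "a stream of typed JSON events" concrete, here is a minimal sketch of decoding a Server-Sent Events body into event dictionaries. The SSE framing (`data:` lines, a blank line between events) is standard; the `type` strings reuse the event names from this article, and the `messageId` and `delta` fields are assumptions for illustration rather than a statement of the AG-UI wire format.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per SSE event.

    An SSE event is one or more `data:` lines terminated by a blank line.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format drops a single leading space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_parts.append(value)
        elif line == "" and data_parts:
            # Blank line: the buffered data lines form one complete event.
            yield json.loads("\n".join(data_parts))
            data_parts = []
        # Comment lines (starting with ":") and other fields are ignored here.


# Hypothetical wire sample: field names are assumed, not taken from the spec.
SAMPLE = [
    'data: {"type": "RunStarted"}\n',
    "\n",
    ": keep-alive comment\n",
    'data: {"type": "TextMessageContent", "messageId": "m1", "delta": "Hel"}\n',
    "\n",
    'data: {"type": "TextMessageContent", "messageId": "m1", "delta": "lo"}\n',
    "\n",
    'data: {"type": "RunFinished"}\n',
    "\n",
]

events = list(parse_sse(SAMPLE))
text = "".join(e["delta"] for e in events if e["type"] == "TextMessageContent")
print([e["type"] for e in events])
print(text)
```

The same decoded dictionaries could just as well arrive over a WebSocket; only the framing step changes, which is the point of keeping the event vocabulary separate from the carrier.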

AG-UI Architecture: Events, Streaming, and State

The whole architecture rests on events. Everything that happens during an agent interaction - a token of text, the start of a backend tool call, a state diff - is one typed event on the stream. This gives developers a complete vocabulary for agent execution and keeps the application UI consistent with the running agent.

The 17 AG-UI Event Types

AG-UI defines roughly 16 to 17 core event types (the exact count depends on whether convenience events such as TextMessageChunk are included), organized into five categories:


Lifecycle events - run progress: RunStarted, RunFinished, RunError, StepStarted, StepFinished.
Text message events - conversational content for streaming chat: TextMessageStart, TextMessageContent, TextMessageEnd, TextMessageChunk.
Tool call events - function and API calls: ToolCallStart, ToolCallArgs, ToolCallEnd, ToolCallResult.
State management events - shared data sync: StateSnapshot, StateDelta, MessagesSnapshot.
Special events - custom or external: RawEvent, CustomEvent.
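The categories above map naturally onto a client-side reducer. The sketch below folds a few lifecycle, text, and tool call events into a small view model. It is an illustration under assumptions: the `type` strings reuse the names listed above, while the payload fields (`messageId`, `delta`, `toolCallId`, `toolCallName`, `message`) are hypothetical and should be checked against the SDK you actually use.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class View:
    """What a minimal chat UI would need to render."""
    running: bool = False
    messages: dict = field(default_factory=dict)    # messageId -> text so far
    tool_calls: dict = field(default_factory=dict)  # toolCallId -> call record
    error: Optional[str] = None


def apply_event(view: View, event: dict) -> View:
    """Fold one event into the view; unknown event types are ignored."""
    kind = event["type"]
    if kind == "RunStarted":
        view.running = True
    elif kind == "RunFinished":
        view.running = False
    elif kind == "RunError":
        view.running = False
        view.error = event.get("message", "unknown error")
    elif kind == "TextMessageStart":
        view.messages[event["messageId"]] = ""
    elif kind == "TextMessageContent":
        view.messages[event["messageId"]] += event["delta"]
    elif kind == "ToolCallStart":
        view.tool_calls[event["toolCallId"]] = {
            "name": event["toolCallName"], "args": "", "done": False,
        }
    elif kind == "ToolCallArgs":
        # Arguments arrive as fragments of a JSON string.
        view.tool_calls[event["toolCallId"]]["args"] += event["delta"]
    elif kind == "ToolCallEnd":
        view.tool_calls[event["toolCallId"]]["done"] = True
    return view


stream = [
    {"type": "RunStarted"},
    {"type": "TextMessageStart", "messageId": "m1"},
    {"type": "TextMessageContent", "messageId": "m1", "delta": "Checking "},
    {"type": "TextMessageContent", "messageId": "m1", "delta": "the weather."},
    {"type": "TextMessageEnd", "messageId": "m1"},
    {"type": "ToolCallStart", "toolCallId": "t1", "toolCallName": "get_weather"},
    {"type": "ToolCallArgs", "toolCallId": "t1", "delta": '{"city":'},
    {"type": "ToolCallArgs", "toolCallId": "t1", "delta": ' "Oslo"}'},
    {"type": "ToolCallEnd", "toolCallId": "t1"},
    {"type": "RunFinished"},
]

view = View()
for event in stream:
    apply_event(view, event)

print(view.messages["m1"])
print(json.loads(view.tool_calls["t1"]["args"]))
```

Because every change is an event, the UI never has to guess what the agent is doing: it re-renders from the view after each call to `apply_event`.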

Streaming Chat, Tool Calls, and Shared State

Live output flows to the UI as soon as the agent produces tokens, so agentic chat feels responsive. When the agent invokes an API - or hands off to other agents - those tool calls surface as their own events, giving full tool visibility. Incremental StateDelta events carry only what changed, so the client merges updates efficiently. That shared-state model enables agent steering - a human nudging a long-running agent - and clean handling of timeouts and "done" signals. It is core interactivity infrastructure for real agentic apps.
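As a rough illustration of "carry only what changed", the sketch below merges a delta into a previous snapshot. It assumes deltas are expressed as JSON Patch-style operations (RFC 6902), which is our understanding of StateDelta but worth confirming against the spec. Only `add`, `replace`, and `remove` are handled, and the state shape is invented for the example.

```python
import copy


def _resolve_parent(doc, pointer: str):
    """Walk a JSON Pointer and return (parent container, final key)."""
    if not pointer.startswith("/"):
        raise ValueError("this sketch only supports non-root pointers")
    # Unescape per RFC 6901: "~1" becomes "/", then "~0" becomes "~".
    parts = [p.replace("~1", "/").replace("~0", "~")
             for p in pointer[1:].split("/")]
    parent = doc
    for part in parts[:-1]:
        parent = parent[int(part)] if isinstance(parent, list) else parent[part]
    return parent, parts[-1]


def apply_delta(state: dict, ops: list) -> dict:
    """Return a new state with the operations applied in order."""
    new_state = copy.deepcopy(state)  # leave the previous snapshot untouched
    for op in ops:
        parent, key = _resolve_parent(new_state, op["path"])
        kind = op["op"]
        if isinstance(parent, list):
            if kind == "add":
                # "-" means "append to the end of the array".
                index = len(parent) if key == "-" else int(key)
                parent.insert(index, op["value"])
            elif kind == "replace":
                parent[int(key)] = op["value"]
            elif kind == "remove":
                del parent[int(key)]
            else:
                raise ValueError(f"unsupported op: {kind}")
        else:
            if kind in ("add", "replace"):
                parent[key] = op["value"]
            elif kind == "remove":
                del parent[key]
            else:
                raise ValueError(f"unsupported op: {kind}")
    return new_state


# Hypothetical shared state for a support-ticket agent.
snapshot = {"ticket": {"status": "open", "tags": ["billing"]}, "draft": "hi"}
delta = [
    {"op": "replace", "path": "/ticket/status", "value": "pending_approval"},
    {"op": "add", "path": "/ticket/tags/-", "value": "refund"},
    {"op": "remove", "path": "/draft"},
]

state = apply_delta(snapshot, delta)
print(state)
```

Copying before applying keeps the old snapshot available, which is convenient if the UI wants to diff or roll back; a production client would more likely use an existing JSON Patch library than hand-roll this.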

Who Is Behind AG-UI? CopilotKit and the Ecosystem

AG-UI is an open protocol maintained in the public ag-ui-protocol/ag-ui repository, and it is heavily promoted and tooled by CopilotKit, the team that provides much of the reference tooling and documentation. CopilotKit's involvement is a big reason the protocol gained traction across the ecosystem, though the specification is open for any vendor to implement.

Agent Framework Integrations and SDKs

Breadth of integration is one of AG-UI's biggest strengths. The protocol ships SDKs - a client for the browser and a server adapter for the backend, plus frontend tools the agent can invoke - for agent framework backends including LangGraph, CrewAI, Mastra, LlamaIndex, Pydantic AI, and Agno, plus platform integrations like Amazon Bedrock AgentCore, the OpenAI Agents SDK, and the Microsoft Agent Framework. Related platforms - AWS Bedrock agents, Cloudflare Agents, and the Oracle Agent Spec - point at the same trend: a shared wire between AI agents and interfaces.

The structural components shipped by the AG-UI protocol to connect agents to interfaces.

The AG-UI Dojo and Documentation

To teach the building blocks, CopilotKit ships the AG-UI Dojo: small, focused examples (roughly 50 to 200 lines each) demonstrating streaming chat, tool calls, shared state, generative UI, and human-in-the-loop. Alongside the official documentation, it is the fastest way to see the protocol's key features in practice.

What Is MDMA?

MDMA - Markdown Document with Mounted Applications - is not a transport. It is a document content format. It extends Markdown with fenced `mdma` code blocks that describe interactive applications in YAML: forms, buttons, tables, approval gates, webhooks, and callouts.

The agent literally writes Markdown containing these blocks in its message. You parse that into an AST, drop it into a reactive document store, and render it - React out of the box. MDMA is about what is inside the message, not how it travels. Notably, MDMA documents contain no runtime JS: they are Markdown + YAML, parsed deterministically.
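The first step of that pipeline, locating the fenced blocks inside a message, can be sketched with the standard library alone. This is not MDMA's parser: the YAML keys in the sample (`type`, `id`) are hypothetical, and the block bodies are returned as raw text because parsing YAML would need a third-party library.

```python
import re

# Built at runtime so this example can itself sit inside a fenced block.
FENCE = "`" * 3

# Matches a fenced block whose info string is "mdma" and captures its body.
MDMA_BLOCK = re.compile(
    rf"^{FENCE}mdma[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def split_document(markdown: str):
    """Return (prose with placeholders, list of raw block bodies)."""
    blocks = []

    def _stash(match):
        blocks.append(match.group(1))
        # The placeholder marks where the rendered component should mount.
        return f"[[mdma:{len(blocks) - 1}]]"

    prose = MDMA_BLOCK.sub(_stash, markdown)
    return prose, blocks


# Hypothetical agent message; the keys below are illustrative only.
message = "\n".join([
    "Please confirm the refund.",
    "",
    FENCE + "mdma",
    "type: approval-gate",
    "id: refund-42",
    FENCE,
    "",
    "Thanks!",
])

prose, blocks = split_document(message)
print(prose)
print(blocks)
```

Everything here is plain text processing, which matches the claim above that the document carries no executable code: the frontend decides which component each block mounts.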

Key structural metrics of MDMA-based interactive messages.

MDMA's Opinionated Feature Set

Because it is scoped to the document rather than the pipe, MDMA ships much of what a bare protocol leaves to you: a validator with ~17 lint rules and auto-fix, automatic PII detection and redaction, a tamper-evident audit log, a policy engine for allow/deny rules, and model-specialized authoring prompts ("prompt-packs"). The point is that free-form LLM text is hard to act on, so MDMA constrains the model to emit a predictable, schema-validated set of widgets your frontend already renders.

AG-UI vs MDMA: Transport vs Content

How AG-UI and MDMA divide session management and content rendering

The cleanest framing: MDMA is a thing an agent could emit, and AG-UI is one way to transport what the agent emits. AG-UI lists "generative UI" as a building block - MDMA is one concrete implementation. You could stream MDMA-bearing messages over an AG-UI event stream and both would be doing their job. AG-UI owns the session (streaming, tool calls, shared state, interrupts, lifecycle); MDMA owns the message (rendering validated, governed components inline).
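A small sketch of that division of labor: the producer below slices a Markdown message into text events without looking at its content, and the consumer hands the reassembled text to a renderer only once the message has ended. The event field names (`messageId`, `delta`) and the YAML keys in the sample are assumptions for illustration.

```python
FENCE = "`" * 3  # built at runtime so the example can sit inside a fenced block


def to_events(message_id: str, markdown: str, chunk_size: int = 16):
    """Producer side: the transport treats the payload as opaque text."""
    yield {"type": "TextMessageStart", "messageId": message_id}
    for start in range(0, len(markdown), chunk_size):
        yield {
            "type": "TextMessageContent",
            "messageId": message_id,
            "delta": markdown[start:start + chunk_size],
        }
    yield {"type": "TextMessageEnd", "messageId": message_id}


def collect(events) -> dict:
    """Consumer side: release a message to the renderer only at its end event."""
    buffers, finished = {}, {}
    for event in events:
        mid = event.get("messageId")
        if event["type"] == "TextMessageStart":
            buffers[mid] = []
        elif event["type"] == "TextMessageContent":
            buffers[mid].append(event["delta"])
        elif event["type"] == "TextMessageEnd":
            finished[mid] = "".join(buffers.pop(mid))
    return finished


# Hypothetical MDMA-bearing message; the block keys are illustrative only.
doc = "\n".join([
    "Refund ready for review.",
    FENCE + "mdma",
    "type: approval-gate",
    "id: refund-42",
    FENCE,
])

events = list(to_events("m1", doc))
received = collect(events)
print(len(events), "events")
print(received["m1"] == doc)
```

Chunk boundaries can fall in the middle of a block, so a renderer that parses on every delta would see half-written YAML. Waiting for the end event (or for the closing fence) is one simple way around that.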

Comparison Table

Dimension	AG-UI	MDMA
Layer	Transport / protocol (agent ↔ UI wire)	Content format (interactive Markdown)
Primary artifact	A stream of typed events	Markdown with `mdma` blocks parsed to an AST
Concerned with	Streaming, tool calls, shared state, interrupts	Rendering validated UI components in a message
Delivery	SSE / WebSockets / HTTP	Transport-agnostic - text in the LLM output
State model	Event-sourced diffs over the wire	Local reactive document store


Matt Sadowski

CEO

Human-in-the-Loop Interaction: Decision vs Control

It is tempting to say MDMA "wins" human-in-the-loop because it renders approval gates so well. But this interaction is really two things, and MDMA only owns one.

The decision surface - MDMA owns this. Presenting the human a real choice point (approve/deny, edit, fill a form) and capturing the answer as structured, validated data. MDMA's approval-gate (pending / approved / denied), forms, and buttons are HITL affordances. It goes further than a button: every action is written to a tamper-evident audit log with PII redaction, and the policy engine enforces allow/deny rules.

The control primitive - AG-UI provides this. The agent execution actually suspending mid-run, waiting for the decision, then resuming with state intact. When a user clicks "approve" in MDMA, that dispatches an action into MDMA's local document store - it lands in the frontend. But MDMA is transport-agnostic: it cannot halt your running agent or feed the decision back. That pause / resume-without-losing-state is exactly AG-UI's interrupt building block. MDMA is the better interface for the decision; AG-UI is the mechanism that makes it a loop rather than a dead-ended click.
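The control half can be pictured with a Python generator, which suspends at `yield` and resumes with its local state intact. This is only an analogy for the pause and resume behavior described above, not AG-UI's interrupt API: the `interrupt` payload, its fields, and the function names are all hypothetical.

```python
def refund_agent(amount: float):
    """A toy agent run that pauses for approval before the irreversible step."""
    plan = {"action": "refund", "amount": amount}
    # Yielding suspends the run here; local state such as `plan` is preserved.
    decision = yield {"type": "interrupt", "reason": "approval_required", "plan": plan}
    if decision == "approved":
        return f"refunded {plan['amount']:.2f}"
    return "cancelled"


def run_until_interrupt(agent) -> dict:
    """Start the run and return the payload the UI should render."""
    return next(agent)


def resume(agent, decision: str) -> str:
    """Feed the human decision back in and return the final result."""
    try:
        agent.send(decision)
    except StopIteration as done:
        return done.value
    raise RuntimeError("agent paused again; this sketch handles one interrupt")


# Approved path: the decision surface would be rendered from `pending`.
run = refund_agent(42.5)
pending = run_until_interrupt(run)
result = resume(run, "approved")

# Denied path: same run logic, different human answer.
other = refund_agent(10)
run_until_interrupt(other)
denied = resume(other, "denied")

print(pending["reason"], result, denied)
```

In this picture, the approval gate is whatever renders `pending`, and the transport is whatever carries `"approved"` back to `resume`. A real deployment would persist the suspended run server-side rather than hold a generator in memory.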

How AG-UI Fits Among Agent Interaction Protocols

MDMA is one of several generative UI approaches. Understanding the neighbors clarifies where AG-UI fits in the stack of agent interaction protocols.

AG-UI vs A2UI

A2UI is a declarative generative UI specification originated by Google. When an agent wants to show UI, it outputs an A2UI response - a JSON payload describing components and a data model - rather than HTML or JavaScript. That makes A2UI a payload format, comparable to MDMA, not to AG-UI. The relationship is layered: AG-UI is the runtime channel, and A2UI is one content format that flows through it. AG-UI natively supports A2UI and lets developers define custom specs. So the agent generates UI in A2UI, AG-UI transports it, and the user's interaction returns through AG-UI. MDMA occupies the same slot A2UI does - the payload - just with a Markdown-first, heavily governed flavor.

AG-UI vs MCP UI

MCP UI and the related MCP Apps extension to the Model Context Protocol follow an extension model rather than pure generative UI. MCP Apps treat UI as a resource: servers provide pre-built HTML via ui:// URIs, rendered in sandboxed iframes, configured through tool calling. So the difference between MCP UI and AG-UI mirrors the MDMA comparison: MCP is the agent ↔ tool protocol, MCP UI defines a payload, and AG-UI is the agent ↔ UI transport. A single agent might use MCP for tool calls, A2UI or MDMA for the payload, and AG-UI to push updates to the client. Other specs, like OpenAI's Open-JSON-UI, follow the same pattern.

Practical Takeaway: When to Use Each

Reach for AG-UI to connect an agentic backend to a frontend with live streaming, tool visibility, shared state, and human-in-the-loop - the plumbing of agentic applications.
Reach for MDMA to get an LLM to produce structured, renderable, actionable UI (forms, approvals, tables) inside its responses without hand-parsing free text.
If you have both, use MDMA as the generative-UI payload flowing through an AG-UI event stream.

Conclusion

The agentic protocols landscape is layering fast: MCP for tools, A2UI / MDMA / MCP UI for the payload, and AG-UI for the transport. AG-UI is the wire between agent and app; MDMA is a governed, auditable generative UI format an agent can emit. MDMA renders the moment of human-in-the-loop beautifully, but still needs a protocol like AG-UI to suspend and resume the agent. As complementary layers rather than rivals, they let you build agent UIs and agentic apps that are both well-connected and well-governed.

Frequently Asked Questions

What is the fundamental difference between AG-UI and MDMA?

AG-UI is an open, event-based transport protocol that standardizes real-time communication between an agentic backend and a frontend over SSE, WebSockets, or HTTP. MDMA (Markdown Document with Mounted Applications) is a content format that embeds interactive YAML-defined components like forms and approval gates directly inside an agent's message. The simplest framing is that AG-UI is the pipe, and MDMA is the payload.

How do AG-UI and MDMA work together for human-in-the-loop interactions?

MDMA owns the decision surface, rendering validated approval gates, forms, and buttons that capture structured human input in the UI. AG-UI provides the control primitive that suspends agent execution mid-run and resumes it with state intact after the user acts. When combined, MDMA renders the interactive moment while AG-UI transports the decision back to the agent, creating a true human-in-the-loop flow.

Where does A2UI fit into the ecosystem compared to AG-UI and MDMA?

A2UI is a declarative generative UI specification originated by Google that defines what UI to render, making it a payload format comparable to MDMA. AG-UI is the transport layer beneath it, carrying A2UI payloads to the frontend and returning user interactions to the backend. Both A2UI and MDMA sit at the content layer, while AG-UI handles the runtime wire between agent and interface.

Is AG-UI open source, and who maintains it?

Yes, AG-UI is an open protocol maintained in the public ag-ui-protocol/ag-ui repository and is available under the MIT license. CopilotKit provides the primary reference tooling, SDKs, and documentation, which has helped the protocol gain broad ecosystem traction. Any vendor or developer is free to implement the specification independently.

When should a team choose AG-UI, MDMA, or both?

Choose AG-UI when you need live plumbing for agentic applications, including streaming output, tool visibility, shared state synchronization, and session lifecycle management. Choose MDMA when you want the LLM to emit structured, governed UI components such as tables, forms, and webhooks without parsing free text. In production, they are complementary layers: MDMA serves as the generative UI payload flowing through an AG-UI event stream.
