A member of the Claude team recently sparked a debate worth engaging with. In a post on X, @trq212 argued that HTML might be a more natural format than Markdown for generative UI output. It is a thoughtful argument. HTML is the language of the web. LLMs are trained on it. Browsers render it natively. The intuition makes sense right up until you ask who is actually consuming generative UI output.
This is a response from the trenches. We built MDMA, an open-source declarative generative framework for structured interface output, and we spent considerable time on exactly this question. Here is where we landed.
Overview: The Generative UI Format Question
Generative UI — sometimes called gen UI or genui — is the practice of having AI models produce interface components dynamically rather than having developers hand-code every screen in advance. A user prompt triggers a form. A data query generates a chart. A workflow produces a task list. The generative user interfaces produced this way are not static mockups — they are functional components that need to plug into real applications.
The format that carries those components from agent to platform matters enormously. Get it wrong and every platform that consumes generative UI has to build custom post-processing to make the output usable. Get it right and the output slots directly into whatever renderer the platform uses.
What Generative UI Actually Produces
Before arguing about format, it is worth being clear about what generative UI actually generates. Generative UI examples from across the ecosystem — CopilotKit, Google research experiments, MDMA — share a common pattern: the output is not a finished rendered page. It is a component definition. Something that says "here is a form with these fields" or "here is a bar chart with this data" rather than "here is the HTML that draws those things."
This distinction is what makes the HTML-vs-MD debate so loaded. If generative UI produced finished pages for end users to view in a browser, HTML would be the obvious winner. But generative UI produces components for developers, platforms, and agent runtimes to consume programmatically. That is a fundamentally different use case.
HTML Is Designed for Browsers — Not for Developer Ecosystems
The case for HTML rests on browser-first assumptions. HTML renders in browsers natively. Browsers are everywhere. LLMs generate valid HTML reliably. These are real advantages — for browser-first display.
But consider where generative UI actually gets consumed. A fullstack agent app built on Next.js does not inject raw HTML into its component tree. A CopilotKit application maps structured component references to React components from its own design system. An MCP app runtime processes agent output before any rendering happens at all. None of these consumers want raw HTML. They want structured data.
Generating text that a browser can render directly solves a narrow problem. It creates a bigger one: every non-browser consumer now has to parse, sanitize, and re-render the HTML to extract meaning from it. Using dangerouslySetInnerHTML in React is a security hazard. Building a custom HTML parser to extract semantic meaning from markup is fragile and slow. These are real costs that HTML generation imposes on every platform that integrates generative UI.
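To make that cost concrete, here is a minimal React sketch contrasting the two paths; AgentChart and the ChartSpec shape are hypothetical stand-ins, not MDMA's actual renderer API.

```tsx
import React from "react";
// Hypothetical design-system component; any platform chart would do.
import { AgentChart } from "./design-system";

// Path 1: raw HTML from the agent. Whatever the model emitted -- inline
// styles, scripts, event handlers -- lands in the DOM, so the platform must
// sanitize it before it can trust it.
function RawHtmlMessage({ html }: { html: string }) {
  return <div dangerouslySetInnerHTML={{ __html: html }} />;
}

// Path 2: a structured spec from the agent. The platform stays in control of
// how it renders; the agent only supplies intent and data.
interface ChartSpec {
  type: "bar-chart";
  title: string;
  data: { label: string; value: number }[];
}

function SpecMessage({ spec }: { spec: ChartSpec }) {
  return <AgentChart title={spec.title} data={spec.data} />;
}
```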
Declarative Generative Formats: The A2UI Pattern
A2UI — agent-to-UI — is the design pattern at the heart of modern generative interfaces. A2UI separates what to render from how to render it. Interactive agents produce structured component specifications; renderers turn those specifications into platform-native UI. A2UI is how generative and traditional UI development can coexist cleanly inside the same application. AI agents express intent in natural language; A2UI translates that intent into typed component specifications that any renderer can consume.
This is exactly what MDMA implements. Rather than generating HTML, an MDMA-aware agent generates a declarative generative specification — structured YAML embedded in Markdown code fences. The specification defines component types, fields, bindings, and interaction logic. Renderers produce React components, Vue components, or whatever the target platform requires.
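To make that concrete, here is a rough sketch of such a block, shown as a string so the embedded payload stays visible; the component and field names are illustrative, not necessarily the canonical MDMA schema.

```ts
// Hypothetical MDMA-style component block, as it would sit inside a
// yaml-tagged code fence in the agent's Markdown reply. Field names are
// illustrative, not necessarily the canonical MDMA schema.
const signupFormYaml = `
component: form
title: Create account
fields:
  - name: email
    type: email
    label: Work email
    required: true
  - name: plan
    type: select
    label: Plan
    options: [starter, team, enterprise]
bindings:
  submit: create-account
`;

// Roughly the typed shape a renderer would validate that block against.
interface FormSpec {
  component: "form";
  title: string;
  fields: {
    name: string;
    type: string;
    label: string;
    required?: boolean;
    options?: string[];
  }[];
  bindings: { submit: string };
}
```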
A2UI makes generative UI composable. The agent does not need to know anything about the platform's design system. The renderer does not need to understand the agent's reasoning. They communicate through a shared spec — a contract both sides can validate against.
Interactive Agents and the Generative A2UI Workflow
Interactive agents do more than return text. They return structured outputs that drive UI state across multiple turns. An interactive user experience powered by A2UI might look like this: an agent generates a form based on a user prompt, the user completes it, the agent processes the submission and generates a results visualization, all without the developer writing explicit screen transition logic.
A2UI makes this possible because each step produces a typed component definition — not raw HTML that the application has to interpret. The A2UI pattern gives agents and applications a shared language for describing what happens next.
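Stripped to a skeleton, that loop might look like the sketch below. Both callbacks are placeholders for whatever agent runtime and renderer a platform actually uses; nothing here is MDMA-specific API.

```ts
// A condensed multi-turn A2UI loop. The agent never returns markup -- each
// turn yields a typed component spec that the platform's renderer owns.
type ComponentSpec = { component: string; [key: string]: unknown };

async function a2uiTurn(
  prompt: string,
  runAgent: (input: string) => Promise<ComponentSpec>,
  renderAndAwait: (spec: ComponentSpec) => Promise<Record<string, unknown>>,
): Promise<ComponentSpec> {
  // 1. The agent answers the prompt with a component spec (say, a form).
  const formSpec = await runAgent(prompt);

  // 2. The renderer turns the spec into native UI and resolves with whatever
  //    the user submitted.
  const submission = await renderAndAwait(formSpec);

  // 3. The submission goes back to the agent, which returns the next spec --
  //    a results chart, a confirmation, another form.
  return runAgent(JSON.stringify(submission));
}
```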
CopilotKit pioneered this approach. Its architecture lets developers define custom components that agents can emit by name. Atai Barkai, CopilotKit's founder, has described the vision as making agents first-class participants in UI state. A2UI generalizes this: any agent, any renderer, any platform.
Why HTML Has No Agent Spec
HTML has no concept of an agent spec. A <select> element in HTML describes visual structure. A select component in MDMA describes intent: the options available, how they are labeled, what binding they update, what validation applies. Agents can reason about MDMA's typed components because their properties are explicit and machine-readable. HTML is opaque — it tells a browser how to draw something but communicates nothing useful to downstream processing tools.
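A side-by-side sketch shows the gap. The spec fields below are our own illustration, not MDMA's exact select schema.

```ts
// What HTML communicates: visual structure a browser knows how to draw.
const htmlSelect = `<select name="plan">
  <option value="starter">Starter</option>
  <option value="team">Team</option>
</select>`;

// What a declarative spec communicates: machine-readable intent.
interface SelectSpec {
  component: "select";
  label: string;
  binding: string; // which piece of application state this control updates
  options: { value: string; label: string }[];
  validation?: { required?: boolean };
}

const planSelect: SelectSpec = {
  component: "select",
  label: "Plan",
  binding: "account.plan",
  options: [
    { value: "starter", label: "Starter" },
    { value: "team", label: "Team" },
  ],
  validation: { required: true },
};
```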
This matters for MCP apps in a different but equally important way. MDMA ships its own MCP server — @mobile-reality/mdma-mcp — which gives AI assistants direct access to the full component spec, authoring prompts, validation rules, and live documentation via standard MCP tools. An agent using this server can call get-spec to retrieve all component types and their schemas, get-prompt to load the author or fixer system prompt, or validate-prompt to check its output before returning it. HTML has no equivalent infrastructure. There is no get-html-spec tool because HTML's authoring rules for generative UI are not codified anywhere — they are implicit, inconsistent, and not machine-readable. Effective UI generators need a format whose rules can be expressed, versioned, and consumed programmatically. MDMA is built for that from the start.
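As a sketch of what that infrastructure enables, here is how an agent-side process might call those tools through the standard MCP TypeScript SDK. The tool names come from the description above; the transport setup and argument shapes are assumptions, so check the package's own documentation before relying on them.

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch and connect to the MDMA MCP server over stdio.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@mobile-reality/mdma-mcp"],
});
const client = new Client({ name: "example-agent", version: "0.1.0" });
await client.connect(transport);

// Pull the full component spec the agent should generate against.
const spec = await client.callTool({ name: "get-spec", arguments: {} });

// Validate a draft reply before returning it to the user. The argument name
// here is an assumption, not the server's documented schema.
const verdict = await client.callTool({
  name: "validate-prompt",
  arguments: { content: "draft Markdown with a YAML component block" },
});
```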
Agents, Application Integration, and Platform Design
When applications integrate generative UI, they face a core design challenge: how do generated interfaces fit into an existing application's visual language and interaction model? The answer depends entirely on what format those interfaces arrive in.
Agents generating HTML produce output that is foreign to a modern application. The HTML carries its own styling, its own DOM structure, its own behavior assumptions. Making it fit the surrounding application requires post-processing: stripping styles, replacing elements, adding event handlers. This is fragile maintenance overhead that every platform integration has to absorb.
Agents generating MDMA output produce specs that renderers map to the platform's own components. The generative form uses the platform's input components. The generative chart uses the platform's visualization library. The agent and frontend relationship works cleanly because the agent output is renderer-agnostic. HTML assumes a browser. MDMA assumes nothing.
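In practice the renderer side is little more than a lookup table. Here is a minimal React sketch, assuming hypothetical design-system components:

```tsx
import React from "react";
// Hypothetical design-system components; each platform supplies its own.
import { TextField, SelectField, BarChart } from "./design-system";

type Spec = { component: string } & Record<string, unknown>;

// The whole integration surface: spec type -> native component. Supporting a
// new platform means writing this map, not parsing and scrubbing HTML.
const registry: Record<string, React.ComponentType<any>> = {
  "text-input": TextField,
  select: SelectField,
  "bar-chart": BarChart,
};

export function RenderSpec({ spec }: { spec: Spec }) {
  const Component = registry[spec.component];
  return Component ? <Component {...spec} /> : null;
}
```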
CopilotKit, Design Patterns, and the Generative Ecosystem
CopilotKit is among the most widely adopted generative UI frameworks in the JavaScript ecosystem. Its design patterns are instructive precisely because the project arrived at the same conclusions about format through practical experience.
CopilotKit does not use HTML as its interchange format. It uses structured data: typed component references with explicit parameters. Custom components are defined in code, organized into a component catalog, and agents reference them by name; developers import the renderer and the generative layer handles the rest. That generative layer is a data layer, not a markup layer.
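For illustration, registering such a component in a React app commonly looks roughly like this; exact props vary across CopilotKit versions, and MeetingCard is a hypothetical design-system component.

```tsx
import React from "react";
import { useCopilotAction } from "@copilotkit/react-core";
// Hypothetical design-system component.
import { MeetingCard } from "./design-system";

export function useMeetingCardAction() {
  // The agent emits "showMeeting" with typed parameters by name; the
  // application decides what that actually looks like on screen.
  useCopilotAction({
    name: "showMeeting",
    description: "Display a meeting summary card",
    parameters: [
      { name: "title", type: "string", description: "Meeting title" },
      { name: "startTime", type: "string", description: "ISO start time" },
    ],
    render: ({ args }) => (
      <MeetingCard title={args.title} start={args.startTime} />
    ),
  });
}
```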
This is not a coincidence. CopilotKit, MDMA, and similar generative tools have all converged on the same design insight: markup formats couple the agent output to a specific renderer. Structured data formats decouple them. The generative UI ecosystem is building on structured data — and HTML is not structured data, it is a presentation language.
Chat, Widgets, and the Generative UI Experience
Chat — and specifically the chatbot — is the dominant surface for generative UI today. A generative UI experience in chat means messages that include interactive components — a form in a conversation thread, a chart in a support ticket, a task list in a project chat. CopilotKit popularized this pattern. MDMA's renderer-react package supports it natively.
But the generative UI experience that matters most is not a widget floating in a chat window. The best chatbot interactions produce personalized interfaces that adapt to the user's context and render natively inside the surrounding application — not generic HTML bubbles that clash with everything around them. Chat tolerates HTML because chat renderers already handle mixed content. Enterprise tools, MCP apps, education platforms, and visualization dashboards do not.
Custom UIs that feel native to their platform require a format that platforms can control end to end. Structured generative specifications give platforms that control. HTML takes it away.
Gen UI Across Different Types of Platforms
Gen UI is not a browser problem. It is an integration problem that spans different types of platforms:
- MCP apps running in agent runtimes where no browser is involved
- Education platforms generating adaptive forms, quizzes, and assessments
- Search interfaces rendering results as structured components
- Games generating custom interface elements based on runtime state
- Image generation tools producing configuration interfaces on demand
- Visualization dashboards where charts and tables are agent-generated
In most of these contexts, generating text as HTML would create more problems than it solves. Each platform has its own rendering stack, its own component model, its own interaction patterns. What they all need is structured data that their renderers can consume directly.
JSON, YAML, and Key Tools for Generative Interchange
Key tools for generative UI need an interchange format that is universally parseable, renderer-agnostic, and semantically meaningful. JSON and YAML satisfy all three. HTML satisfies none of them in a cross-platform context.
MDMA uses structured YAML for its component blocks. Any platform with a YAML parser — which is every platform — can consume MDMA output without custom parsing, sanitization pipelines, or style scrubbing. The output is clean, typed, and immediately actionable by any renderer.
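A few lines are enough to go from an agent's YAML block to typed data, here using the widely available yaml package; the component block itself is illustrative.

```ts
import { parse } from "yaml";

// A component block as it would appear inside the agent's Markdown reply.
// Field names are illustrative, not the canonical MDMA schema.
const yamlBlock = `
component: bar-chart
title: Weekly signups
data:
  - { label: Mon, value: 14 }
  - { label: Tue, value: 22 }
  - { label: Wed, value: 31 }
`;

// One parse call yields plain structured data any renderer can consume.
const spec = parse(yamlBlock);
console.log(spec.component); // "bar-chart"
console.log(spec.data.length); // 3
```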
LLMs generating text naturally produce Markdown. Adding structured YAML blocks to that Markdown output is a small, reliable step. Generating valid, sanitized, framework-compatible HTML for consumption by arbitrary platforms is a much larger and more error-prone one.
The same holds for LangChain pipelines written in Python: MDMA output integrates as a structured tool response without any custom parsing. Google search results rendered as generative UI components need typed structured data, not raw HTML strings. The broader LLM tooling ecosystem is already built on JSON and YAML — generative UI format should speak the same language.
Custom Components and the Open Generative UI Vision
The future of generative UI is open and interoperable. Open generative UI means multiple agents, multiple renderers, multiple platforms sharing a common generative specification. It means a generative UI implementation built today can target a renderer that does not exist yet. It means custom components defined once can be driven by any compliant agent.
HTML cannot be that foundation. It is too tightly coupled to browser rendering. A truly open generative UI standard needs to be headless — a description of what, not a prescription of how. This is the design space that A2UI, CopilotKit, and MDMA are all building toward, and none of them are building it with HTML.
The generative capabilities of modern AI models are advancing faster than any single rendering target can absorb. The generative UI ecosystem should meet those capabilities with formats that work across all platforms — not just browsers.
Practical Tasks, Forms, and the Full Generative Application
When users interact with generative AI for practical tasks — submitting a form, reviewing a data summary, completing a multi-step workflow — the quality of the interaction depends on how native the generated interface feels. A custom interface that matches the application's design system produces a smooth interaction. An HTML blob that does not produces friction.
This is not theoretical. It is the difference between a fullstack agent application where generative UI feels like a first-class feature and one where it feels like a foreign element dropped into a page. The format choice drives that difference more than any other single decision.
MDMA exists because we kept running into this gap: generative UI that was technically impressive but practically awkward to integrate. Structured, declarative generative specifications close that gap. HTML widens it.
Conclusion: The Right Format for the Right Consumer
@trq212 is right that HTML is natural for browser-based display. For a narrow definition of generative UI — render agent output in a browser — HTML is a reasonable choice. But generative UI in practice is an integration challenge, not a display challenge.
The real consumers of generative output are developers building applications, platforms integrating agents, and tools processing structured data. For these consumers, declarative generative formats built on structured Markdown are a better fit than HTML. They are headless, renderer-agnostic, and designed to be consumed by code — not just by browsers.
HTML is the right answer to the wrong question. The question that matters is not "what can a browser render?" It is "what can every platform, every renderer, and every agent runtime use reliably?" For that question, structured Markdown with a generative specification like MDMA is the better bet.
HTML is for browsers. Declarative generative UI is for everyone.
