What are the five layers of AI infrastructure?

Silicon (chips), training compute (clusters), inference (serving), orchestration (frameworks, MCP, eval), and application (products). Each has different economics, different incumbents, and different competitive dynamics.

Which layer has the strongest moats?

Layers 1 and 2, silicon and training compute, have the strongest structural moats because capital intensity and supply-chain access limit entrants. Layer 5, application, is the layer most likely to commoditize over time.

Are foundation models commoditizing?

Partly. Inference is commoditizing rapidly (layer 3). Frontier training is not (layer 2), because capital intensity remains a binding constraint on who can compete at the top of the model curve.

Where does MCP fit in the framework?

Layer 4, orchestration. MCP is an open protocol for connecting model-powered applications to tools and data. It sits above inference and below the agent framework.

Why do AI app companies need to track inference unit economics?

Because cost of goods sold for AI apps is dominated by inference, and inference pricing is volatile and partially controlled by the model providers. Companies that do not have token-level unit economics are running uninsured margin risk.

The Five-Layer Cake of AI Infrastructure: A Reference Framework

Most arguments about "the AI stack" are bad arguments because they conflate five distinct layers, each with its own competitive dynamics. A claim about Nvidia margins is not a claim about agent frameworks. A claim about open-weight models is not a claim about inference economics. Mixing them is how serious people end up at silly conclusions.

This is the framework we use editorially. It is not the only framework, but it has held up across two years of writing about the space, and it cleanly separates conversations that usually get tangled.

Layer 1: Silicon

The silicon layer is the chips. Nvidia is the dominant supplier; AMD's MI300 series is the credible second; Google's TPUs are a credible third for first-party workloads; AWS Trainium and Inferentia are a credible internal alternative for AWS-native deployments. A long tail of inference-specialist startups — Groq, Cerebras, SambaNova — competes on power-per-token at the inference tier.

The economics here are commodity-like in the long run but monopolistic in the short run. Capacity is the binding constraint. Nvidia's pricing power is real but bounded by the willingness of hyperscalers to invest in alternatives.

Layer 2: Training compute

Layer 2 is the clusters that turn silicon into trained models. This is largely a hyperscaler market — Microsoft Azure, AWS, Google Cloud, Oracle Cloud — plus a small number of dedicated neocloud providers like CoreWeave, Lambda, and Crusoe.

The economics here look like aerospace contracts: long-dated capacity commitments, capital intensity that is hard to overstate, and customer concentration among the major model labs. The frontier training trade has been very profitable for the labs and their cloud counterparties.

Layer 3: Inference

Layer 3 is the layer that serves the trained model to users. Inference has different economics from training: smaller individual jobs, sensitivity to latency, much higher utilization, and very different chip preferences. The same hyperscalers compete here, but they are joined by inference-specialist clouds and by the model labs themselves — Anthropic and OpenAI both serve much of their own inference and sell it as an API.

Margins here have compressed materially across 2025 and into 2026 as supply caught up with demand. This is where most of the "AI deflation" story lives.

Layer 4: Orchestration

Layer 4 is the layer that turns model APIs into useful systems. It includes:

Agent frameworks (LangChain, LlamaIndex, CrewAI).
The Model Context Protocol and the registry of MCP servers.
Evaluation tooling (Braintrust, Langfuse) and observability for AI systems.
The growing class of "agent infrastructure" tools, such as Vercel's AI SDK and Mastra.

This is the layer with the most competitive churn. It is also the layer where the open standards story is most consequential: MCP at the tools-and-data boundary, OpenTelemetry conventions for tracing, and a slowly emerging set of evaluation standards.
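To make the tools-and-data boundary concrete, here is a minimal sketch of what an MCP tool invocation looks like on the wire. MCP messages are JSON-RPC 2.0, and a client asks a server to run a tool with the "tools/call" method. The tool name "search_docs", its arguments, and the helper function are hypothetical, chosen only for illustration; a real server advertises its own tools through "tools/list".

```python
import json
from itertools import count

# JSON-RPC request ids should be unique within a session.
_ids = count(1)


def tools_call_request(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "search_docs" is a hypothetical tool, not part of any real server.
request = tools_call_request("search_docs", {"query": "inference pricing"})
wire = json.dumps(request)
print(wire)
```

The point for the framework is that this message says nothing about which model, cloud, or chip sits underneath it, which is why the standard belongs at layer 4 and not at layer 3.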

Layer 5: Application

Layer 5 is the products users actually pay for. The dominant categories so far are: assistant apps (ChatGPT, Claude, Gemini, Perplexity), coding tools (Cursor, Claude Code, Replit, v0), enterprise productivity (Notion AI, Glean), creative tools (Runway, Suno), and a long tail of vertical agents.

The economics here are conventional SaaS economics, with one important wrinkle: cost of goods sold for AI apps is dominated by inference spend, which is volatile and partially controlled by the model providers. Application companies that do not understand their unit economics down to the token are running uninsured risk.
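A rough sketch of what "down to the token" can mean in practice: compute inference cost per request from token counts, then see how gross margin moves when the provider reprices. Every number here (prices, token counts, revenue per user, request volume) is an illustrative assumption, not any provider's actual rate, and the model treats inference as the only cost of goods sold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical prices in USD per million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: TokenPricing, input_tokens: int, output_tokens: int) -> float:
    """Inference cost in USD for a single request."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def gross_margin(revenue_per_user: float, requests_per_user: int, cost_per_request: float) -> float:
    """Gross margin as a fraction of revenue, with inference as the only COGS."""
    cogs = requests_per_user * cost_per_request
    return (revenue_per_user - cogs) / revenue_per_user


# Illustrative baseline: $20/month subscriber making 600 requests a month.
pricing = TokenPricing(input_per_million=3.0, output_per_million=15.0)
cost = request_cost(pricing, input_tokens=2_000, output_tokens=500)
margin = gross_margin(revenue_per_user=20.0, requests_per_user=600, cost_per_request=cost)

# Same usage after a hypothetical doubling of provider prices.
repriced = TokenPricing(input_per_million=6.0, output_per_million=30.0)
repriced_cost = request_cost(repriced, input_tokens=2_000, output_tokens=500)
repriced_margin = gross_margin(20.0, 600, repriced_cost)

print(f"cost per request: ${cost:.4f}")
print(f"gross margin: {margin:.1%} -> {repriced_margin:.1%} after repricing")
```

Under these assumed inputs, a price change the application company does not control moves gross margin from roughly 60 percent to under 20 percent, which is the exposure the paragraph above calls uninsured.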

Why the framework matters

Three sloppy arguments that the framework defuses:

"Foundation models are commoditizing." This is true at layer 3 (inference is commoditizing) and partially true at layer 5 (multi-model app shells make the model swappable). It is not yet true at layer 2 (training the frontier remains capital-intensive in a way that limits entrants).

"Nvidia is overvalued because open models will undercut closed ones." This argument confuses layer 1 with layer 3. Open models still require silicon to train and serve.

"There is no moat in AI." There may be no durable model moat at layer 5. There are real moats at layers 1, 2, and 3 — they are just not the moats people argue about loudest.
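One way to make the habit mechanical is to tag each subject with its layer before comparing claims. The sketch below is only an illustration of the taxonomy: the Layer enum mirrors the five layers as named in this article, while the subject-to-layer table and the same_layer helper are hypothetical.

```python
from enum import IntEnum


class Layer(IntEnum):
    SILICON = 1
    TRAINING_COMPUTE = 2
    INFERENCE = 3
    ORCHESTRATION = 4
    APPLICATION = 5


# Hypothetical tagging of subjects mentioned in this article.
SUBJECT_LAYER = {
    "Nvidia margins": Layer.SILICON,
    "frontier training capacity": Layer.TRAINING_COMPUTE,
    "inference pricing": Layer.INFERENCE,
    "agent frameworks": Layer.ORCHESTRATION,
    "MCP": Layer.ORCHESTRATION,
    "coding tools": Layer.APPLICATION,
}


def same_layer(subject_a: str, subject_b: str) -> bool:
    """True when two subjects sit at the same layer of the stack."""
    return SUBJECT_LAYER[subject_a] is SUBJECT_LAYER[subject_b]


# A claim about one subject only transfers cleanly to another at the same layer.
print(same_layer("Nvidia margins", "agent frameworks"))
print(same_layer("agent frameworks", "MCP"))
```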

If you find yourself arguing about "the AI stack" in 2026, ask the other person which layer they mean. Most of the time, the disagreement vanishes.
