Most arguments about "the AI stack" are bad arguments because they conflate five distinct layers with five distinct competitive dynamics. A claim about Nvidia margins is not a claim about agent frameworks. A claim about open-weight models is not a claim about inference economics. Mixing them is how serious people end up at silly conclusions.
This is the framework we use editorially. It is not the only framework, but it has held up across two years of writing about the space, and it cleanly separates conversations that usually get tangled.
Layer 1: Silicon
The silicon layer is the chips. Nvidia is the dominant supplier; AMD's MI300 series is the credible second; Google's TPUs are a credible third for first-party workloads; and AWS Trainium and Inferentia are an in-house alternative for AWS-native deployments. A long tail of inference-specialist startups (Groq, Cerebras, SambaNova) competes on speed and cost per token at the inference tier.
The economics here are commodities-adjacent in the long run but monopolistic in the short run. Capacity is the binding constraint. Nvidia's pricing power is real but bounded by the willingness of hyperscalers to invest in alternatives.
Layer 2: Training compute
Layer 2 is the clusters that turn silicon into trained models. This is largely a hyperscaler market — Microsoft Azure, AWS, Google Cloud, Oracle Cloud — plus a small number of dedicated neocloud providers like CoreWeave, Lambda, and Crusoe.
The economics here look like aerospace contracts: long-dated capacity commitments, capital intensity that is hard to overstate, and customer concentration among the major model labs. The frontier training trade has been very profitable for the labs and their cloud counterparties.
Layer 3: Inference
Layer 3 serves the trained model to users. Inference has different economics from training: smaller individual jobs, sensitivity to latency, much higher utilization, and very different chip preferences. The same hyperscalers compete here, but they are joined by inference-specialist clouds and by the model labs themselves. Anthropic and OpenAI both serve much of their own inference and sell it as an API.
Margins here have compressed materially across 2025 and into 2026 as supply caught up with demand. This is where the "AI deflation" story mostly lives.
Layer 4: Orchestration
Layer 4 turns model APIs into useful systems. It includes:
- Agent frameworks (LangChain, LlamaIndex, CrewAI).
- The Model Context Protocol and the registry of MCP servers.
- Evaluation tooling (Braintrust, Langfuse) and observability for AI systems.
- The growing class of "agent infrastructure" tools, such as Vercel's AI SDK and Mastra.
This is the layer with the most competitive churn. It is also the layer where the open standards story is most consequential: MCP at the tools-and-data boundary, OpenTelemetry conventions for tracing, and a slowly emerging set of evaluation standards.
Layer 5: Application
Layer 5 is the products users actually pay for. The dominant categories so far are assistant apps (ChatGPT, Claude, Gemini, Perplexity), coding tools (Cursor, Claude Code, Replit, v0), enterprise productivity (Notion AI, Glean), creative tools (Runway, Suno), and a long tail of vertical agents.
The economics here are conventional SaaS economics, with one important wrinkle: cost of goods sold for AI apps is dominated by inference spend, which is volatile and partially controlled by the model providers. Application companies that do not understand their unit economics down to the token are running uninsured risk.
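To make "down to the token" concrete, here is a minimal sketch of the arithmetic for a single subscriber. Every number in it is an assumption made up for illustration: the per-million-token prices, the request volume, the token counts, and the $20 subscription are not any provider's actual figures, and the names `ModelPrice`, `monthly_inference_cost`, and `gross_margin` are our own. It also counts inference as the only cost of goods sold, which a real model would not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical API price, in US dollars per million tokens."""
    input_per_mtok: float
    output_per_mtok: float


def monthly_inference_cost(price: ModelPrice, requests: int,
                           input_tokens: int, output_tokens: int) -> float:
    """Inference spend for one user over a month, in dollars.

    input_tokens and output_tokens are averages per request.
    """
    per_request = (input_tokens * price.input_per_mtok
                   + output_tokens * price.output_per_mtok) / 1_000_000
    return requests * per_request


def gross_margin(subscription: float, inference_cost: float) -> float:
    """Gross margin as a fraction of revenue, treating inference as the only COGS."""
    return (subscription - inference_cost) / subscription


# All figures below are invented for illustration.
SUBSCRIPTION = 20.0  # dollars per user per month
price = ModelPrice(input_per_mtok=3.0, output_per_mtok=15.0)
cost = monthly_inference_cost(price, requests=600,
                              input_tokens=4_000, output_tokens=500)
print(f"baseline: ${cost:.2f} per user, "
      f"margin {gross_margin(SUBSCRIPTION, cost):.1%}")

# The same user after a hypothetical 50% price increase by the provider.
raised = ModelPrice(price.input_per_mtok * 1.5, price.output_per_mtok * 1.5)
raised_cost = monthly_inference_cost(raised, requests=600,
                                     input_tokens=4_000, output_tokens=500)
print(f"after increase: ${raised_cost:.2f} per user, "
      f"margin {gross_margin(SUBSCRIPTION, raised_cost):.1%}")
```

Under these assumed numbers, inference costs about $11.70 per user per month, and a 50% price increase pushes it to about $17.55. That moves gross margin from roughly 41.5% to roughly 12% without the application company changing anything on its side, which is the exposure the paragraph above describes.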
Why the framework matters
Three sloppy arguments that the framework defuses:
"Foundation models are commoditizing." This is true at layer 3 (inference is commoditizing) and partially true at layer 5 (multi-model app shells make the model swappable). It is not yet true at layer 2 (training the frontier remains capital-intensive in a way that limits entrants).
"Nvidia is overvalued because open models will undercut closed ones." This argument confuses layer 1 with layer 3. Open models still require silicon to train and serve.
"There is no moat in AI." There may be no durable model moat at layer 5. There are real moats at layers 1, 2, and 3 — they are just not the moats people argue about loudest.
If you find yourself arguing about "the AI stack" in 2026, ask the other person which layer they mean. Most of the time, the disagreement vanishes.
Sources
- Nvidia data center reference architectures — nvidia.com/en-us/data-center
- AMD Instinct product line — amd.com/en/products/accelerators/instinct.html
- Google TPU documentation — cloud.google.com/tpu
- Model Context Protocol — modelcontextprotocol.io