AI Bubble Audit: 2025 Predictions Graded Against 2026 Reality

Did the AGI-by-2027 timeline come true?

No. Frontier models continued improving, but the rate of improvement slowed relative to the straight-line extrapolations made in 2024. The state of the art on public reasoning benchmarks advanced by single-digit percentage points per generation, not the order-of-magnitude leaps the strict reading required.

Did AI agents replace 20% of knowledge work?

No. US Bureau of Labor Statistics data shows knowledge-worker employment essentially flat year over year. Agents are real and useful in narrow workflows, but broad displacement did not materialize.

Did open-weight models catch up to closed?

Mostly yes. Releases from Meta, DeepSeek, and Mistral closed a meaningful share of the benchmark gap by mid-2026, though closed-weight frontier models remain a step ahead.

Did AI infrastructure remain a great investment?

Mixed. Frontier training compute remained highly profitable; the second-tier GPU rental market compressed materially as supply caught up with demand.

What was the common failure mode in 2024 predictions?

Extrapolating compute curves without modeling second-order frictions: alignment, evaluation, distribution, regulation, taste. Specific predictions that ignored these were most often wrong.

In late 2024 the AI conversation was wall-to-wall confident prediction. The "AGI by 2027" line was repeated until it sounded like a baseline. Autonomous agents would replace knowledge work. Capital returns to AI infrastructure would compound forever. Eighteen months later, the conversation is quieter. Some of that is fatigue. Some of it is that the predictions did not all land.

This audit grades a set of the most-cited public predictions made in the second half of 2024 and the first quarter of 2025 against what is actually observable as of mid-2026. We restricted the slate to predictions whose authors put a date and a metric on them. Vague claims are not gradeable.
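The inclusion rule above can be sketched in a few lines of Python. This is only an illustration of the filter, not the tooling used for the audit: the `Prediction` class, the `is_gradeable` helper, the sample entries, and the exact deadline chosen for "by 2026" are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Prediction:
    """Hypothetical record for one public prediction."""
    claim: str
    deadline: Optional[date]  # date the author attached to the claim, if any
    metric: Optional[str]     # observable that would confirm or refute it, if any


def is_gradeable(p: Prediction) -> bool:
    """A prediction is gradeable only if it names both a date and a metric."""
    return p.deadline is not None and bool(p.metric and p.metric.strip())


# Illustrative entries only; the deadline and metric wording are assumed.
slate = [
    Prediction(
        claim="Autonomous agents replace 20% of knowledge work",
        deadline=date(2026, 12, 31),
        metric="share of knowledge-worker employment displaced",
    ),
    Prediction(claim="AI will change everything", deadline=None, metric=None),
]

# Keep only the predictions that can actually be scored.
gradeable = [p for p in slate if is_gradeable(p)]
```

Under this rule the second entry drops out: it has no date and no metric, so there is nothing to score it against.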

Prediction 1: "AGI by 2027" (graded: failed)

The "AGI by 2027" framing was popularized in the Situational Awareness essays by Leopold Aschenbrenner and amplified by other public figures over late 2024. The prediction rested on a straight-line extrapolation of trends in compute, data, and algorithmic efficiency.

What actually happened: the straight line bent. The largest released models from the major labs, including Anthropic's Claude family and OpenAI's GPT line, continued to improve, but the improvements have looked more like capability filling-in than capability take-off. The state of the art on the public reasoning benchmarks (ARC-AGI, GPQA, the MMLU revisions) moved by single-digit percentage points per generation, not the order-of-magnitude leaps the extrapolation required.

Grade: failed, with the caveat that "AGI" is definitional. Many systems can now do tasks that would have counted as AGI in 2023 thought experiments. None of them is obviously general or obviously agentic enough to satisfy the strict reading of the prediction.

Prediction 2: "Autonomous agents replace 20% of knowledge work by 2026" (graded: failed)

This claim circulated in several variants — Goldman Sachs put a banded estimate on it; venture pitches repeated it; a much-shared Sequoia memo leaned on it.

What actually happened: autonomous agents are real and useful for narrow workflows — customer support triage, expense report processing, internal Q&A over wikis — but the broad displacement did not materialize. The aggregate United States knowledge-worker employment numbers from the Bureau of Labor Statistics are essentially flat year-over-year. Where displacement is visible, it is in contractor-heavy roles (entry-level copywriting, basic legal document review) and it is partial. Most knowledge workers spend part of their day collaborating with an AI; very few have been replaced by one.

Grade: failed on the headline number, partially true on the direction.

Prediction 3: "AI-native applications dominate the App Store top charts" (graded: mostly failed)

The thesis was that consumer AI apps would crowd out the incumbents. In reality, ChatGPT, Claude, and Gemini remain the dominant assistants, but the long tail of consumer AI apps mostly failed to take share from incumbents in their categories. Most independent AI apps that did succeed were professional tools (coding, design, sales productivity) rather than consumer entertainment or social apps.

Grade: mostly failed for consumer; succeeded for prosumer/professional.

Prediction 4: "Capital returns on AI infrastructure stay above 30% for years" (graded: mixed)

The compute-spend thesis assumed that hyperscaler GPU returns would compound. The SemiAnalysis tracking and the public earnings of Nvidia, Microsoft, and Alphabet tell a more complicated story. Returns have remained strong on training runs at the frontier, but utilization of training-tier hardware has begun to diverge from that of inference-tier hardware. The mid-market, meaning the second tier of clouds renting H100s and H200s to developers, has compressed materially.

Grade: mixed. The frontier compute trade was right. The second-tier compute trade was not.

Prediction 5: "Open weights catch up to frontier closed models" (graded: mostly true)

This is the prediction the loudest critics made. Eighteen months later, open-weight releases from Meta, DeepSeek, and Mistral, among others, have closed a meaningful share of the gap on the public reasoning benchmarks. They are not at the frontier — the closed-weight models remain a step ahead — but the gap is smaller than the headline-grabbing claims of "open is dead" suggested.

Grade: mostly true.

What the audit suggests

The predictions that failed had a shared characteristic: they extrapolated from compute curves without modeling the second-order frictions of alignment, evaluation, distribution, regulation, and taste. The predictions that held up better were either narrowly scoped or grounded in clear historical analogies (open source eventually catches commercial; cheap compute lifts everyone).

If you are making AI predictions for 2028, the lesson is humbling: the people who were most specific were most often wrong. The people who said "real, useful, slower than the demos suggest" were most often right.

Sources

Aschenbrenner, "Situational Awareness," 2024 — situational-awareness.ai
Sequoia Capital essay on AI revolution — sequoiacap.com/article/the-ai-revolution-is-underhyped
US Bureau of Labor Statistics — bls.gov
Stanford HELM benchmark — crfm.stanford.edu/helm
SemiAnalysis tracking — semianalysis.com