The Revolution of Logical Architectures

How we moved from a universal CPU to specialized processors — and why this fracture is power

CPU: Sixty years of stability, ten years of revolution

From 1945 to 2005, computing was dominated by a single fundamental architecture: the CPU (Central Processing Unit) based on the stored-program model. For sixty years, every improvement came from doing the same thing better and faster: more transistors, higher frequencies, larger caches. This model worked. It colonized the world.

Then, between 2005 and 2015, something irreversible happened: the emergence of radically different architectures. GPUs that process thousands of operations in parallel. TPUs built for matrix multiplication. NPUs optimized for edge inference. LPUs that eliminate non-determinism to reach extreme speed.

This isn’t a story of linear progress. It’s a story of crisis: when old solutions stop working, the system reinvents itself — and in that reinvention, power concentrates.


The era of the universal CPU (1945–2005)

The forgotten inventors of the CPU: Eckert and Mauchly

The story begins in 1945 at the Moore School of Electrical Engineering at the University of Pennsylvania. J. Presper Eckert and John Mauchly had just completed ENIAC: the first large-scale, general-purpose electronic digital computer, the size of a room.

Their revolutionary idea: instead of “programming” the machine by rewiring cables, why not store programs in the same memory as data? The stored-program concept is the foundation of modern computing. John von Neumann documented these ideas in “First Draft of a Report on the EDVAC” (1945): the document circulated widely, and the architecture became “von Neumann.” A classic pattern: collective work credited to whoever holds greater prestige.

The architecture: the power and limit of traditional CPU

  • Unified memory: programs and data share the same space
  • Control unit: coordinates execution
  • ALU: performs arithmetic-logical operations
  • Input/Output: interfaces with the outside world

The separation between processor and memory, connected by a bus, is genius and disaster at the same time: total flexibility, but also dependence on data movement. This is where the bottleneck is born.

The bottleneck: the price of CPU generality

In 1977, John Backus (Turing Award) put it plainly: the problem isn’t computing — it’s moving. Computation is fast; waiting for memory is the structural tax of a universal architecture.
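Backus's point can be made quantitative with arithmetic intensity: operations performed per byte moved. When the ratio is low, the memory bus, not the ALU, sets the speed limit. A minimal sketch (function name and numbers are illustrative):

```python
def arithmetic_intensity(flops, bytes_moved):
    """FLOPs per byte moved: low values mean the memory bus,
    not the ALU, bounds performance (the von Neumann bottleneck)."""
    return flops / bytes_moved

# Dot product of two float32 vectors of length n:
# 2n FLOPs (n multiplies + n adds), 8n bytes read (two 4-byte floats per step).
n = 1_000_000
print(arithmetic_intensity(2 * n, 8 * n))  # 0.25 FLOPs per byte
```

At 0.25 FLOPs per byte, a processor capable of billions of operations per second spends most of its time waiting for data to arrive.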

CPU “magic laws”: Moore and Dennard (1965–2005)

For forty years, the industry avoided collapse through technological scaling: Moore's Law (more transistors per chip) and Dennard scaling (smaller transistors draw proportionally less power, so power density stays constant). It was the "free lunch" of computing: you just had to wait.

Intel 4004 (1971): 2,300 transistors, 740 KHz · Pentium (1993): 3.1 million, 60 MHz · Pentium 4 (2000): 42 million, 1.5 GHz
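The "free lunch" fits in one formula. Dynamic switching power is roughly P = C·V²·f; under idealized Dennard scaling, shrinking a transistor by a factor k lets capacitance and voltage drop by k while frequency rises by k. A hedged back-of-the-envelope sketch (function name and the value of k are illustrative):

```python
def dennard_power(C, V, f):
    """Dynamic switching power: P = C * V^2 * f."""
    return C * V ** 2 * f

k = 1.4  # one process generation (roughly a 0.7x linear shrink)
P0 = dennard_power(C=1.0, V=1.0, f=1.0)
# Dennard scaling: capacitance and voltage shrink by 1/k, frequency rises by k,
# so power per transistor drops by 1/k^2 while density grows by k^2:
P1 = dennard_power(C=1.0 / k, V=1.0 / k, f=k)
print(P1 / P0)  # ~1/k^2: power density stays constant
```

When voltage could no longer shrink (leakage current took over), the 1/k² term vanished, and every extra transistor or megahertz came with a heat bill. That is the 2005 wall.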

2005: the crisis — the CPU “heat wall”

Around 2005, clock speeds hit a ceiling. Not for lack of ideas, but because of physics. Dennard scaling collapsed: pushing clock higher meant catastrophic heat. This is the power wall. The universal CPU model ran into a non-negotiable limit.

Infographic: from the universal CPU to the 2005 heat wall, highlighting the memory bus and the processor–memory bottleneck

The temporary fix: multi-core CPU (2005–2010)

If we can’t make one core faster, we add more cores. But parallelism isn’t free: software must be rewritten, and Amdahl’s Law places a hard theoretical cap on the speedup, set by whatever fraction of the code stays serial. And as transistor counts keep growing while power budgets don’t, dark silicon appears: parts of the chip must remain switched off because of thermal limits.
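Amdahl's Law itself is one line of arithmetic: if a fraction of the work is serial, no number of cores can remove it. A minimal sketch (function name is ours):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's Law: overall speedup is capped by the serial fraction."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Even 95%-parallel code on 64 cores gives far less than 64x:
print(round(amdahl_speedup(0.95, 64), 1))  # 15.4
# And the ceiling as cores grow without bound is 1/serial = 20x:
print(round(amdahl_speedup(0.95, 10 ** 9), 1))  # 20.0
```

This is why "just add cores" was a temporary fix: the serial 5% quietly becomes the whole bill.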


The discovery of parallelism — the GPU era (2006–2012)

2006: NVIDIA changes the game

NVIDIA makes a strategic move: it turns a graphics processor into a parallel computing engine. GPUs are architecturally opposite to CPUs: less sophisticated control, more massive throughput. A few “smart” cores versus thousands of “simple” cores.

The “aha” moment: matrix multiplication

3D graphics and deep learning share the same core: massive matrix multiplication. When the community proves GPUs can accelerate training, the point isn’t “faster” — it’s “suddenly feasible.”
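The shared core is easy to state in code: a dense neural-network layer is just a matrix multiply, and every output cell is an independent dot product, which is exactly the independence thousands of GPU cores exploit. A toy sketch (the weights are made up):

```python
def matmul(A, B):
    """Naive matrix multiply. Each output cell C[i][j] is an independent
    dot product: all of them can be computed in parallel."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

# A tiny "dense layer": 2 inputs, 3 outputs (hypothetical weights).
x = [[1.0, 2.0]]
W = [[0.5, -1.0, 0.0],
     [1.0,  0.5, 2.0]]
print(matmul(x, W))  # [[2.5, 0.0, 4.0]]
```

On a CPU this triple loop runs mostly one step at a time; on a GPU, each output cell can be assigned to its own thread.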

CUDA: the ecosystem that creates a monopoly

The decisive move is CUDA (2006): programming GPUs becomes accessible, but with a condition: the ecosystem is proprietary and runs only on NVIDIA. This is where technical advantage turns into structural rent.

2012: AlexNet — the detonation

AlexNet wins ImageNet and opens the era of practical deep learning. GPU training becomes standard. But what consolidates isn’t just a technology — it’s an industrial dependency on the software interface.


Google and the inference crisis — the TPU era (2013–2024)

2013: the calculation that scares Google

If every planet-scale service embeds neural networks, datacenters explode. GPUs are great for training, but inference (millions of requests, one at a time) is a different regime: latency, efficiency, operating cost.

TPU v1: absolute specialization

Google builds a dedicated ASIC: the TPU. The key architecture is the systolic array. The idea is brutally simple: keep data flowing between compute units, reusing it as much as possible and minimizing trips to memory during compute. It’s a direct response to the real enemy: memory, not the ALU.

Visual: a systolic array and on-chip dataflow, comparing memory access patterns in CPU/GPU versus TPU compute
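The systolic idea can be sketched functionally. This is not cycle-accurate hardware simulation; it only shows the weight-stationary accounting: weights are loaded into the array once, and every input vector streams past them, so each weight is fetched from memory a single time no matter how many inputs arrive (function name is ours):

```python
def systolic_matvec(W, xs):
    """Weight-stationary systolic sketch: cell (i, j) holds weight W[i][j]
    and performs one multiply-accumulate per input element that flows by.
    Weights are never re-fetched from memory between input vectors."""
    n = len(W)
    results = []
    for x in xs:                   # one input vector per wave
        acc = [0.0] * n
        for j in range(n):         # input element x[j] flows across column j
            for i in range(n):     # each resident weight fires once
                acc[i] += W[i][j] * x[j]
        results.append(acc)
    return results

W = [[1.0, 2.0],
     [3.0, 4.0]]
print(systolic_matvec(W, [[1.0, 1.0], [0.0, 2.0]]))  # [[3.0, 7.0], [4.0, 8.0]]
```

The contrast with a CPU/GPU is in the memory traffic: here the weight matrix crosses the memory bus once, not once per input.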

The ecosystem beats performance

Even when hardware is competitive or superior, adoption depends on the ecosystem. CUDA dominates education, frameworks, libraries, and the job market. Control of the software layer matters more than hardware.


AI at the CPU’s edge — the NPU era (2014–present)

Training in the cloud, inference everywhere

AI has to live on smartphones, laptops, IoT, cars. A discrete GPU draws too much power. The answer is the NPU: integrated acceleration, low power, low precision, real-time inference. AI becomes ubiquitous — and invisible.
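A concrete face of the low-precision trade-off is int8 quantization, a technique commonly used for edge inference: store and compute on 8-bit integers instead of 32-bit floats, cutting memory and energy at a small cost in accuracy. A minimal symmetric-quantization sketch (function names are ours, not a specific NPU API):

```python
def quantize_int8(values):
    """Symmetric int8 quantization: map floats into [-127, 127]
    using one shared scale factor."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dequantize(q, scale):
    """Recover approximate floats from int8 codes."""
    return [v * scale for v in q]

weights = [0.1, -0.5, 0.25, 1.27]
q, scale = quantize_int8(weights)
print(q)                      # [10, -50, 25, 127]
print(dequantize(q, scale))   # close to the originals, at 1/4 the storage
```

The rounding error is the "low precision"; the 4x smaller, integer-only arithmetic is the "low power".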


The sequential problem — the LPU era (2024–present)

LLMs: one token at a time

Large Language Models generate text autoregressively: token n+1 depends on every token before it. It’s structurally sequential. On GPUs, inference becomes memory-bound: hardware waits for data more than it computes.
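The sequential structure is visible in the decoding loop itself: each step consumes the output of the previous one, so the steps cannot run in parallel, and on a GPU every step re-reads the model's weights from memory. A toy sketch (the "model" here is a hypothetical stand-in, not a real LLM):

```python
def generate(prompt, next_token, steps):
    """Autoregressive decoding: each new token depends on all previous
    tokens, so the loop is strictly sequential."""
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(next_token(tokens))  # step n+1 waits for step n
    return tokens

# Toy "model": the next token is the sum of the last two (illustrative only).
fib_model = lambda ts: ts[-1] + ts[-2]
print(generate([1, 1], fib_model, 5))  # [1, 1, 2, 3, 5, 8, 13]
```

No amount of parallel hardware shortens this loop; only making each iteration faster does, which is the niche the LPU targets.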

LPU: determinism as a weapon

The LPU idea: remove “intelligence” from hardware and put control into the compiler. Static scheduling, reduced non-determinism, minimized memory access. It’s extreme specialization: efficiency and latency in exchange for flexibility.


Five architectures, one division of labor

  • CPU · Optimized for: control flow, irregular workloads · Strengths: flexibility, low latency · Leaders: Intel, AMD, ARM · Where: everywhere
  • GPU · Optimized for: AI training, massive parallelism · Strengths: throughput, mature ecosystem · Leaders: NVIDIA · Where: datacenters, workstations
  • TPU · Optimized for: large-scale training, batch inference · Strengths: efficiency, stack integration · Leaders: Google · Where: cloud/internal services
  • NPU · Optimized for: edge inference, mobile AI · Strengths: low power, on-device · Leaders: Apple, Qualcomm, Samsung · Where: phones, laptops, IoT
  • LPU · Optimized for: real-time LLM inference · Strengths: determinism, low latency · Leaders: Groq (specialized stacks) · Where: inference services

Logical architectures: specialization = efficiency, but also concentration

Each transition solves a specific bottleneck. But each solution almost always shifts power: toward whoever controls the interface (software), production (foundries), and the ecosystem (education + libraries + toolchains). Specialization isn’t only engineering. It’s industrial policy.

CUDA as technological rent

Hegemony isn’t measured only in TFLOPS. It’s measured in migration costs, dependencies, and lock-in. When a platform becomes “the university,” “the default,” and “the job market,” hardware is just the visible face of power.

The manufacturing chokepoint

Advanced chips require nodes and machines that exist in only a few places on Earth. Computational infrastructure becomes a geopolitical single point of failure. This isn’t a footnote: it’s a structural condition of the digital future.

Conclusions: which future?

In seventy years we moved from a universal CPU to a fractured ecosystem of accelerators. Each fracture increases efficiency, but reduces distributed control. Knowledge is produced collectively; control tends to concentrate privately.

Decode. Resist. Reclaim.
