How we moved from a universal CPU to specialized processors — and why this fracture is power
CPU: Sixty years of stability, twenty years of revolution
From 1945 to 2005, computing was dominated by a single fundamental architecture: the CPU (Central Processing Unit)
based on the stored-program model. For sixty years, every improvement came from doing the same thing better and faster:
more transistors, higher frequencies, larger caches. This model worked. It colonized the world.
Then, between 2005 and 2015, something irreversible happened: the emergence of radically different architectures.
GPUs that process thousands of operations in parallel. TPUs built for matrix multiplication. NPUs optimized for edge inference.
LPUs that eliminate non-determinism to reach extreme speed.
This isn’t a story of linear progress. It’s a story of crisis: when old solutions stop working, the system reinvents itself —
and in that reinvention, power concentrates.
The era of the universal CPU (1945–2005)
The forgotten inventors of the CPU: Eckert and Mauchly
The story begins in 1945 at the Moore School of Electrical Engineering at the University of Pennsylvania.
J. Presper Eckert and John Mauchly had just completed ENIAC:
the first large-scale electronic digital computer, the size of a room.
Their revolutionary idea: instead of “programming” the machine by rewiring cables, why not store programs in the same memory as data?
The stored-program concept is the foundation of modern computing.
John von Neumann documented these ideas in “First Draft of a Report on the EDVAC” (1945):
the document circulated widely, and the architecture became “von Neumann.”
A classic pattern: collective work credited to whoever holds greater prestige.
The architecture: the power and limits of the traditional CPU
Unified memory: programs and data share the same space
Control unit: coordinates execution
ALU: performs arithmetic-logical operations
Input/Output: interfaces with the outside world
The separation between processor and memory, connected by a bus, is genius and disaster at the same time:
total flexibility, but also dependence on data movement. This is where the bottleneck is born.
The bottleneck: the price of CPU generality
In 1977, John Backus (Turing Award) put it plainly: the problem isn’t computing — it’s moving.
Computation is fast; waiting for memory is the structural tax of a universal architecture.
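The bottleneck can be made concrete with a back-of-envelope roofline-style calculation. The bandwidth and FLOP figures below are illustrative, not measured from any real chip:

```python
# Back-of-envelope: arithmetic intensity of a dot product.
# For each pair of float64 elements we move 16 bytes from memory
# and perform 2 floating-point operations (multiply + add).

def arithmetic_intensity(flops_per_elem: float, bytes_per_elem: float) -> float:
    """FLOPs performed per byte moved across the memory bus."""
    return flops_per_elem / bytes_per_elem

# Dot product over float64 vectors: 2 FLOPs per element, 16 bytes loaded.
dot_intensity = arithmetic_intensity(2, 16)   # 0.125 FLOP/byte

# A hypothetical CPU doing 100 GFLOP/s with 25 GB/s of memory bandwidth
# needs at least 100/25 = 4 FLOP/byte of work to stay busy.
machine_balance = 100e9 / 25e9                # 4.0 FLOP/byte

# The dot product falls far below that: the ALU sits idle, waiting on memory.
memory_bound = dot_intensity < machine_balance
```

Whatever the exact numbers, the gap between compute rate and memory bandwidth is the structural tax Backus was pointing at.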
CPU “magic laws”: Moore and Dennard (1965–2005)
For forty years, the industry avoided collapse through technological scaling:
Moore’s Law (more transistors) and Dennard scaling (smaller transistors = proportionally less power).
It was the “free lunch” of computing: you just had to wait.
Around 2005, clock speeds hit a ceiling. Not for lack of ideas, but because of physics.
Dennard scaling collapsed: pushing the clock higher meant catastrophic heat. This is the power wall.
The universal CPU model ran into a non-negotiable limit.
Image idea 1: “The CPU was universal → the heat wall forces fragmentation.”
The temporary fix: multi-core CPU (2005–2010)
If we can’t make one core faster, we add more cores. But parallelism isn’t free:
software must be rewritten, and Amdahl’s Law places a hard theoretical cap on speedups.
As transistor counts keep growing, dark silicon appears: parts of the chip must remain powered off because of thermal limits.
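Amdahl's Law makes that cap easy to compute. A minimal sketch, with an illustrative 95%-parallel workload:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's Law: overall speedup when only part of a program parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Even with 95% of the work parallelized, 1024 cores yield under 20x:
s = amdahl_speedup(0.95, 1024)

# The serial 5% caps speedup at 1/0.05 = 20x, no matter how many cores we add.
limit = 1.0 / 0.05
```

This is why "just add cores" was only a temporary fix: the serial fraction, not the core count, sets the ceiling.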
The discovery of parallelism — the GPU era (2006–2012)
2006: NVIDIA changes the game
NVIDIA makes a strategic move: it turns a graphics processor into a parallel computing engine.
GPUs are architecturally the opposite of CPUs: less sophisticated control, massively higher throughput.
A few “smart” cores versus thousands of “simple” cores.
The “aha” moment: matrix multiplication
3D graphics and deep learning share the same core: massive matrix multiplication.
When the community proves GPUs can accelerate training, the point isn’t “faster” — it’s “suddenly feasible.”
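Why matrix multiplication maps so well onto thousands of simple cores is visible even in a naive sketch: every output element is an independent dot product.

```python
def matmul(A, B):
    """Naive matrix multiply. Every C[i][j] is an independent dot product,
    so all m*p output elements could be computed in parallel at once --
    exactly the shape of work a GPU's thousands of simple cores are built for."""
    m, n, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(A, B)  # [[19, 22], [43, 50]]
```

On a CPU this triple loop runs mostly serially; on a GPU, each `C[i][j]` can be assigned to its own thread, which is the whole trick behind accelerated training.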
CUDA: the ecosystem that creates a monopoly
The decisive move is CUDA (2006): programming GPUs becomes accessible, but with a condition:
the ecosystem is proprietary and runs only on NVIDIA. This is where technical advantage turns into structural rent.
2012: AlexNet — the detonation
AlexNet wins ImageNet and opens the era of practical deep learning. GPU training becomes standard.
But what consolidates isn’t just a technology — it’s an industrial dependency on the software interface.
Google and the inference crisis — the TPU era (2013–2024)
2013: the calculation that scares Google
If every planet-scale service embeds neural networks, datacenters explode. GPUs are great for training,
but inference (millions of requests, one at a time) is a different regime: latency, efficiency, operating cost.
TPU v1: absolute specialization
Google builds a dedicated ASIC: the TPU. The key architecture is the systolic array.
The idea is stark: reuse data aggressively, minimizing memory accesses during compute.
It’s a direct response to the real enemy: memory, not the ALU.
Image idea 2: “Systolic arrays: let data flow, instead of chasing it in memory.”
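As a toy illustration of that flow, here is a minimal weight-stationary sketch. It is a simulation of the reuse pattern, not of real TPU hardware: each processing element holds its weight once, and input rows stream past it.

```python
def systolic_matmul(X, W):
    """Weight-stationary sketch of C = X @ W: conceptually, PE (k, j) holds
    W[k][j], input rows stream through, and partial sums flow down columns.
    Every weight is fetched from memory exactly once, then reused for all
    rows of X -- the data reuse that makes systolic arrays memory-frugal."""
    n, m = len(W), len(W[0])
    weight_fetches = n * m          # each weight loaded a single time
    out = []
    for x in X:                     # rows of X stream through the array
        acc = [0] * m               # partial sums flowing down each column
        for k in range(n):          # PE row k multiplies by its held weight
            for j in range(m):
                acc[j] += x[k] * W[k][j]
        out.append(acc)
    return out, weight_fetches

X = [[1, 0], [2, 3]]
W = [[4, 5], [6, 7]]
C, fetches = systolic_matmul(X, W)  # C == [[4, 5], [26, 31]], fetches == 4
```

The count is the point: weight fetches grow with the matrix size, not with the number of input rows, so the memory bus stops being the enemy.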
The ecosystem beats performance
Even when hardware is competitive or superior, adoption depends on the ecosystem.
CUDA dominates education, frameworks, libraries, and the job market.
Control of the software layer matters more than hardware.
AI at the CPU’s edge — the NPU era (2014–present)
Training in the cloud, inference everywhere
AI has to live on smartphones, laptops, IoT, cars. A discrete GPU draws too much power.
The answer is the NPU: integrated acceleration, low power, low precision, real-time inference.
AI becomes ubiquitous — and invisible.
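"Low precision" concretely means schemes like int8 quantization. A minimal symmetric-quantization sketch (one shared scale per tensor; real NPU toolchains use more elaborate schemes):

```python
def quantize_int8(xs):
    """Symmetric int8 quantization: map floats into [-127, 127] with one
    shared scale. Running inference on 8-bit integers instead of 32-bit
    floats is how NPUs cut power, memory traffic, and silicon area."""
    scale = max(abs(x) for x in xs) / 127.0
    q = [round(x / scale) for x in xs]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)   # q == [50, -127, 0, 100]
restored = dequantize(q, scale)     # close to the originals, tiny rounding error
```

The trade is explicit: a small, usually tolerable accuracy loss in exchange for 4x less memory per weight and far cheaper arithmetic.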
The sequential problem — the LPU era (2024–present)
LLMs: one token at a time
Large Language Models generate text autoregressively: token n+1 depends on token n.
It’s structurally sequential. On GPUs, inference becomes memory-bound:
hardware waits for data more than it computes.
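The sequential structure is visible in the decoding loop itself. A toy sketch (the "model" here is a stand-in function, not a real LLM):

```python
def generate(model, prompt, n_tokens):
    """Autoregressive decoding: token n+1 depends on token n, so the loop
    cannot be parallelized across steps. Each iteration re-reads the model's
    weights to emit a single token -- the memory-bound regime of LLM inference."""
    tokens = list(prompt)
    for _ in range(n_tokens):
        next_token = model(tokens)   # one full pass over the weights...
        tokens.append(next_token)    # ...to produce exactly one token
    return tokens

# Toy "model": next token is the sum of the last two, mod 10.
toy = lambda ts: (ts[-1] + ts[-2]) % 10
out = generate(toy, [1, 1], 5)  # [1, 1, 2, 3, 5, 8, 3]
```

Training can batch thousands of sequences and saturate a GPU; this loop cannot, because step n+1 literally does not exist until step n finishes.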
LPU: determinism as a weapon
The LPU idea: remove “intelligence” from hardware and put control into the compiler.
Static scheduling, reduced non-determinism, minimized memory access.
It’s extreme specialization: efficiency and latency in exchange for flexibility.
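The compiler-controlled idea can be caricatured in a few lines. The ops here are hypothetical placeholders; the point is only that ordering is fixed before any data arrives, so execution is a pure replay:

```python
# Hypothetical instruction stream; names are illustrative only.
PROGRAM = ["load_w", "mul", "add", "store"]

def compile_schedule(program):
    """Assign every op a fixed cycle up front -- the compiler's job in a
    statically scheduled design. No runtime queues, caches, or dispatch."""
    return [(cycle, op) for cycle, op in enumerate(program)]

def run(schedule):
    """Runtime is deterministic replay: same program, same cycles, every time."""
    return [f"cycle {c}: {op}" for c, op in schedule]

trace1 = run(compile_schedule(PROGRAM))
trace2 = run(compile_schedule(PROGRAM))
# Two runs produce byte-identical traces: latency becomes predictable.
```

Determinism is what gets traded for flexibility: the hardware can no longer adapt at runtime, but it never stalls unpredictably either.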
Five architectures, one division of labor
| Architecture | Optimized for | Strengths | Leaders | Where |
|---|---|---|---|---|
| CPU | Control flow, irregular workloads | Flexibility, low latency | Intel, AMD, ARM | Everywhere |
| GPU | AI training, massive parallelism | Throughput, mature ecosystem | NVIDIA | Datacenters, workstations |
| TPU | Large-scale training, batch inference | Efficiency, stack integration | Google | Cloud/internal services |
| NPU | Edge inference, mobile AI | Low power, on-device | Apple, Qualcomm, Samsung | Phones, laptops, IoT |
| LPU | Real-time LLM inference | Determinism, low latency | Groq (specialized stacks) | Inference services |
The logic of architectures: specialization = efficiency, but also concentration
Each transition solves a specific bottleneck. But each solution almost always shifts power:
toward whoever controls the interface (software), production (foundries), and the ecosystem (education + libraries + toolchains).
Specialization isn’t only engineering. It’s industrial policy.
CUDA as technological rent
Hegemony isn’t measured only in TFLOPS. It’s measured in migration costs, dependencies, and lock-in.
When a platform becomes “the university,” “the default,” and “the job market,” hardware is just the visible face of power.
The manufacturing chokepoint
Advanced chips require nodes and machines that exist in only a few places on Earth.
Computational infrastructure becomes a geopolitical single point of failure.
This isn’t a footnote: it’s a structural condition of the digital future.
Conclusions: which future?
In seventy years we moved from a universal CPU to a fractured ecosystem of accelerators.
Each fracture increases efficiency, but reduces distributed control.
Knowledge is produced collectively; control tends to concentrate privately.
I deconstruct and reassemble communication to free it from the cages of mainstream digital narratives. My writing and visual work have appeared in Il Giornale, MowMag, and InsideOver, exploring sustainability, culture, political marketing, and storytelling for SMEs.
I believe in radical hybridization: where words glitch, pixels disrupt, and ideas ignite narratives.
When I’m not dismantling content, I’m chasing the unexpected in the margins of the everyday.