How we moved from a universal CPU to specialized processors, and why this fracture is about power
CPU: Sixty years of stability, ten years of revolution
From 1945 to 2005, computing was dominated by a single fundamental architecture: the CPU (Central Processing Unit),
based on the stored-program model. For sixty years, every improvement came from doing the same thing better and faster:
more transistors, higher frequencies, larger caches. This model worked. It colonized the world.
Then, starting around 2005, something irreversible happened: radically different architectures began to emerge.
GPUs that process thousands of operations in parallel. TPUs built for matrix multiplication. NPUs optimized for edge inference.
LPUs that eliminate non-determinism to reach extreme speed.
This isn’t a story of linear progress. It’s a story of crisis: when old solutions stop working, the system reinvents itself —
and in that reinvention, power concentrates.
The era of the universal CPU (1945–2005)
The forgotten inventors of the CPU: Eckert and Mauchly
The story begins in 1945 at the Moore School of Electrical Engineering at the University of Pennsylvania.
J. Presper Eckert and John Mauchly had just completed ENIAC:
the first large-scale electronic digital computer, the size of a room.
Their revolutionary idea: instead of “programming” the machine by rewiring cables, why not store programs in the same memory as data?
The stored-program concept is the foundation of modern computing.
John von Neumann documented these ideas in “First Draft of a Report on the EDVAC” (1945):
the document, which bore only his name, circulated widely, and the architecture became “von Neumann.”
A classic pattern: collective work credited to whoever holds greater prestige.
The architecture: the power and limits of the traditional CPU
Unified memory: programs and data share the same space
Control unit: coordinates execution
ALU: performs arithmetic-logical operations
Input/Output: interfaces with the outside world
The separation between processor and memory, connected by a bus, is both genius and disaster:
total flexibility, but also dependence on data movement. This is where the bottleneck is born.
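The stored-program idea and the bus between processor and memory can be made concrete with a toy machine. This is a minimal sketch using a hypothetical four-instruction set (LOAD, ADD, STORE, HALT) invented for illustration, not any historical machine: code and data share one memory list, and the loop counts every transfer across the bus.

```python
def run(memory):
    """Fetch-decode-execute loop of a toy stored-program machine (hypothetical ISA).

    `memory` holds instructions (opcode, address) and plain data side by side.
    Returns the final accumulator value and the number of memory transfers.
    """
    acc = 0             # the single register the ALU works on
    pc = 0              # program counter, kept by the control unit
    bus_transfers = 0   # every read or write between processor and memory
    while True:
        op, addr = memory[pc]   # fetch: the instruction itself crosses the bus
        bus_transfers += 1
        pc += 1
        if op == "LOAD":        # decode and execute
            acc = memory[addr]
            bus_transfers += 1
        elif op == "ADD":
            acc += memory[addr]  # the only real arithmetic in this program
            bus_transfers += 1
        elif op == "STORE":
            memory[addr] = acc
            bus_transfers += 1
        elif op == "HALT":
            return acc, bus_transfers
        else:
            raise ValueError(f"unknown opcode: {op!r}")


# Cells 0-3 are code, cells 4-6 are data, all in one address space.
memory = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", None), 20, 22, 0]
result, transfers = run(memory)
print(result, transfers, memory[6])
```

A single addition costs seven memory transfers, because even the instructions are fetched from the same memory as the data. That imbalance is the bottleneck in miniature.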
The bottleneck: the price of CPU generality
In his 1977 Turing Award lecture, John Backus named this the “von Neumann bottleneck” and put it plainly: the problem isn’t computing, it’s moving data.
Computation is fast; waiting for memory is the structural tax of a universal architecture.
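Backus's point can be quantified with the roofline model, a standard performance model that the original text does not mention: a kernel runs at most as fast as memory can feed it. A minimal sketch, assuming hypothetical hardware with 1 TFLOP/s of peak compute and 100 GB/s of memory bandwidth, and ideal data reuse for the matrix multiply:

```python
PEAK_FLOPS = 1e12   # hypothetical peak compute: 1 TFLOP/s
BANDWIDTH = 100e9   # hypothetical memory bandwidth: 100 GB/s


def attainable_flops(intensity: float) -> float:
    """Roofline bound for a kernel doing `intensity` FLOPs per byte moved."""
    return min(PEAK_FLOPS, BANDWIDTH * intensity)


def vector_add_intensity() -> float:
    """float32 c[i] = a[i] + b[i]: 1 FLOP per 12 bytes (two 4-byte loads, one store)."""
    return 1 / 12


def matmul_intensity(n: int, bytes_per_elem: int = 4) -> float:
    """n x n matmul: 2*n^3 FLOPs over 3*n^2 elements, assuming each is moved once."""
    return (2 * n**3) / (3 * n**2 * bytes_per_elem)


for name, intensity in [("vector add", vector_add_intensity()),
                        ("matmul n=1024", matmul_intensity(1024))]:
    perf = attainable_flops(intensity)
    kind = "memory-bound" if perf < PEAK_FLOPS else "compute-bound"
    print(f"{name}: {intensity:.2f} FLOP/byte -> {perf / 1e9:.1f} GFLOP/s ({kind})")
```

Vector addition does one operation for every twelve bytes moved, so it reaches only a small fraction of peak; a large matrix multiply reuses each value many times and can hit the compute ceiling.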
CPU “magic laws”: Moore and Dennard (1965–2005)
For forty years, the industry avoided collapse through technological scaling:
Moore’s Law (transistor counts doubling roughly every two years) and Dennard scaling (as transistors shrink, their power drops in proportion, so power density stays constant).
It was the “free lunch” of computing: you just had to wait.
Around 2005, clock speeds hit a ceiling. Not for lack of ideas, but because of physics.
Dennard scaling collapsed: supply voltage could no longer keep dropping without leakage current soaring, so pushing the clock higher meant catastrophic heat. This is the power wall.
The universal CPU model ran into a non-negotiable limit.
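The power wall follows from the first-order formula for CMOS dynamic power, P = C·V²·f. A worked sketch with normalized values and a hypothetical shrink factor k, ignoring leakage power (which in practice made the problem worse):

```python
def dynamic_power(c: float, v: float, f: float) -> float:
    """CMOS dynamic switching power: P = C * V^2 * f."""
    return c * v**2 * f


def power_density_after_shrink(k: float, voltage_scales: bool) -> float:
    """Power per unit area after shrinking linear dimensions by 1/k (baseline = 1.0).

    Dennard scaling: capacitance C -> C/k, voltage V -> V/k, frequency f -> f*k,
    and k^2 more transistors fit in the same area. Post-Dennard, V stays at 1.0
    while the clock is still pushed up by k.
    """
    c = 1.0 / k
    f = 1.0 * k
    v = 1.0 / k if voltage_scales else 1.0
    per_transistor = dynamic_power(c, v, f)
    transistors_per_area = k**2
    return per_transistor * transistors_per_area


k = 1.4  # hypothetical shrink of about 0.7x per process generation
print(f"Dennard era:  {power_density_after_shrink(k, True):.2f}x power density")
print(f"Post-Dennard: {power_density_after_shrink(k, False):.2f}x power density")
```

With voltage scaling, shrinking transistors and raising the clock kept power density flat; once voltage stopped falling, the same step nearly doubled it every generation.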
[Image: “The CPU was universal → the heat wall forces fragmentation.”]
The temporary fix: multi-core CPU (2005–2010)
If we can’t make one core faster, we add more cores. But parallelism isn’t free:
software must be rewritten, and Amdahl’s Law places a hard theoretical cap on speedup: the serial part of a program does not get faster, however many cores are added.
As transistor counts keep growing, dark silicon appears: parts of the chip must stay powered off because of thermal limits.
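Amdahl's Law makes the cap explicit: if a fraction p of a program can run in parallel on n cores, the speedup is 1 / ((1 - p) + p / n). A minimal sketch with an illustrative p = 0.95:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's Law: speedup = 1 / ((1 - p) + p / n)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Even with 95% of the work parallelized, the 5% serial part caps the speedup
# at 1 / 0.05 = 20x, however many cores are added.
for n in (2, 4, 8, 64, 1024):
    print(f"{n:>5} cores: {amdahl_speedup(0.95, n):.2f}x")
```

Going from 64 to 1024 cores, sixteen times the hardware, raises the speedup only from about 15x to under 20x.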
The discovery of parallelism — the GPU era (2006–2012)
2006: NVIDIA changes the game
NVIDIA makes a strategic move: it turns a graphics processor into a parallel computing engine.
GPUs are architecturally opposite to CPUs: less sophisticated control, more massive throughput.
A few “smart” cores versus thousands of “simple” cores.
The “aha” moment: matrix multiplication
3D graphics and deep learning share the same core: massive matrix multiplication.
When the community proves GPUs can accelerate training, the point isn’t “faster” — it’s “suddenly feasible.”
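Why matrix multiplication suits thousands of simple cores: every output element is an independent dot product of one row and one column. The sketch below is plain Python, not GPU code; it computes the elements in a shuffled order to show that no element waits on another.

```python
import random


def output_element(a, b, i, j):
    """C[i][j] = sum_k A[i][k] * B[k][j]: reads only row i of A and column j of B."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul_any_order(a, b, seed=0):
    """Compute every C[i][j] in a random order to show the elements are independent.

    Because no output element depends on another, a GPU can hand each one
    (or each tile of them) to a different simple core.
    """
    n, m = len(a), len(b[0])
    c = [[0] * m for _ in range(n)]
    jobs = [(i, j) for i in range(n) for j in range(m)]
    random.Random(seed).shuffle(jobs)
    for i, j in jobs:
        c[i][j] = output_element(a, b, i, j)
    return c


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_any_order(A, B))
```

Any order, and therefore any parallel split, produces the same result; that independence is what thousands of simple cores exploit.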
CUDA: the ecosystem that creates a monopoly
The decisive move is CUDA (2006): programming GPUs becomes accessible, but with a condition:
the ecosystem is proprietary and runs only on NVIDIA. This is where technical advantage turns into structural rent.
2012: AlexNet — the detonation
AlexNet, trained on two NVIDIA GTX 580 GPUs, wins the 2012 ImageNet challenge and opens the era of practical deep learning. GPU training becomes standard.
But what consolidates isn’t just a technology — it’s an industrial dependency on the software interface.
Google and the inference crisis — the TPU era (2013–2024)
2013: the calculation that scares Google
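If every planet-scale service embeds neural networks, datacenters explode: Google estimated that if users spoke to voice search for just three minutes a day, it would need to double its datacenter capacity. GPUs are great for training,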
but inference (millions of requests, one at a time) is a different regime: latency, efficiency, operating cost.
TPU v1: absolute specialization
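Google builds a dedicated ASIC: the TPU. Its key architecture is the systolic array; in TPU v1, a 256×256 grid of 8-bit multiply-accumulate units.
The idea is brutal: data reuse. Each operand fetched from memory is passed from unit to unit and used many times, minimizing memory access during compute.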
It’s a direct response to the real enemy: memory, not the ALU.
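A minimal cycle-by-cycle simulation of the systolic idea. This is a simplified output-stationary variant chosen for readability, not a model of the TPU's actual dataflow: each processing element (PE) keeps one running sum, values of A move right and values of B move down one PE per cycle, and each input is read from memory only once, at the edge of the array.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing C = A x B.

    PE (i, j) accumulates C[i][j]. Row i of A enters from the left skewed by
    i cycles; column j of B enters from the top skewed by j cycles, so that at
    cycle t PE (i, j) sees a[i][k] and b[k][j] with k = t - i - j.
    Returns (C, number of values read from memory at the array edges).
    """
    n, k_dim, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]
    a_reg = [[0] * m for _ in range(n)]  # value each PE passes to the right
    b_reg = [[0] * m for _ in range(n)]  # value each PE passes downward
    edge_reads = 0

    for t in range(n + m + k_dim - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:                 # left edge: read A from memory
                    k = t - i
                    if 0 <= k < k_dim:
                        new_a[i][j] = a[i][k]
                        edge_reads += 1
                else:                      # otherwise reuse the neighbour's value
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:                 # top edge: read B from memory
                    k = t - j
                    if 0 <= k < k_dim:
                        new_b[i][j] = b[k][j]
                        edge_reads += 1
                else:
                    new_b[i][j] = b_reg[i - 1][j]
                acc[i][j] += new_a[i][j] * new_b[i][j]  # multiply-accumulate
        a_reg, b_reg = new_a, new_b
    return acc, edge_reads


A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0, 2, 1], [0, 1, 1, 0], [2, 1, 0, 1]]
C, reads = systolic_matmul(A, B)
print(C, reads)
```

For this 2×3 by 3×4 product, the array reads 18 values from memory to perform 24 multiply-accumulates; fetching both operands for every multiply would take 48 reads. The bigger the array, the greater the reuse.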
[Image: “Systolic arrays: let data flow, instead of chasing it in memory.”]
The ecosystem beats performance
Even when hardware is competitive or superior, adoption depends on the ecosystem.
CUDA dominates education, frameworks, libraries, and the job market.
Control of the software layer matters more than hardware.
AI at the CPU’s edge — the NPU era (2014–present)
Training in the cloud, inference everywhere
AI has to live on smartphones, laptops, IoT, cars. A discrete GPU draws too much power.
The answer is the NPU: integrated acceleration, low power, low precision, real-time inference.
AI becomes ubiquitous — and invisible.
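“Low precision” typically means storing weights and activations as 8-bit integers instead of 32-bit floats: a quarter of the memory and much cheaper arithmetic. A minimal sketch of symmetric per-tensor int8 quantization, a common generic scheme rather than any specific vendor's NPU pipeline; the weight values are made up:

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate float values."""
    return [x * scale for x in q]


weights = [0.82, -1.27, 0.05, 0.5, -0.33]  # hypothetical weights
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
print(q)
print([round(w - x, 4) for w, x in zip(weights, approx)])  # per-weight rounding error
```

Each weight comes back to within half a quantization step, a loss many trained networks tolerate in exchange for the savings in power and memory.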
The sequential problem: the LPU era (2024–present)
LLMs: one token at a time
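Large Language Models generate text autoregressively: token n+1 depends on tokens 1 through n.
It’s structurally sequential. On GPUs, inference becomes memory-bound: at small batch sizes, each new token requires reading the model’s weights from memory while doing little arithmetic per byte read,
so the hardware waits for data more than it computes.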
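Two sketches of the same point. The loop shows the sequential dependency, with a hypothetical toy function standing in for a real model. The estimate assumes that, at batch size 1, every generated token requires reading all model weights from memory once; it ignores the KV cache and compute time, and the 7-billion-parameter, 16-bit model and 1 TB/s bandwidth are hypothetical round numbers.

```python
def generate(next_token, prompt, steps):
    """Autoregressive loop: each new token is computed from all tokens so far."""
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(next_token(tokens))  # step n+1 cannot start before step n ends
    return tokens


def toy_model(tokens):
    """Hypothetical stand-in for an LLM: next token = sum of all tokens mod 10."""
    return sum(tokens) % 10


def max_tokens_per_second(params: float, bytes_per_param: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound if every token must stream all weights from memory once."""
    bytes_per_token = params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


print(generate(toy_model, [1, 2], 4))
print(f"{max_tokens_per_second(7e9, 2, 1e12):.0f} tokens/s upper bound")
```

Under these assumptions, memory bandwidth alone caps generation at about 71 tokens per second, regardless of how much arithmetic the chip could perform.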
LPU: determinism as a weapon
The LPU idea: remove “intelligence” from hardware and put control into the compiler.
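Static scheduling, reduced non-determinism, minimized off-chip memory access: in Groq’s chips, model weights are kept in on-chip SRAM rather than external high-bandwidth memory.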
It’s extreme specialization: efficiency and latency in exchange for flexibility.
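Static scheduling means the compiler, not the hardware, decides when each operation runs. A minimal sketch, assuming a hypothetical dataflow graph with exactly known latencies and unlimited functional units (real compilers must also respect resource limits); it is a generic ASAP list scheduler, not Groq's compiler:

```python
from graphlib import TopologicalSorter

# Hypothetical dataflow graph: op -> (latency in cycles, ops it depends on).
GRAPH = {
    "load_w": (2, []),
    "load_x": (2, []),
    "matmul": (4, ["load_w", "load_x"]),
    "add_b":  (1, ["matmul"]),
    "relu":   (1, ["add_b"]),
    "store":  (2, ["relu"]),
}


def static_schedule(graph):
    """Assign every op a fixed start cycle at compile time (ASAP scheduling).

    Each op starts as soon as its slowest dependency has finished. Because
    latencies are known exactly, no runtime arbitration is needed.
    """
    deps_only = {op: deps for op, (_, deps) in graph.items()}
    start = {}
    for op in TopologicalSorter(deps_only).static_order():
        _, deps = graph[op]
        start[op] = max((start[d] + graph[d][0] for d in deps), default=0)
    return start


schedule = static_schedule(GRAPH)
for op, cycle in sorted(schedule.items(), key=lambda kv: kv[1]):
    print(f"cycle {cycle:>2}: {op}")
```

Because every start cycle is fixed before execution, the timetable is identical on every run, which is what makes latency predictable.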
Five architectures, one division of labor
| Architecture | Optimized for | Strengths | Leaders | Where |
|---|---|---|---|---|
| CPU | Control flow, irregular workloads | Flexibility, low-latency | Intel, AMD, ARM | Everywhere |
| GPU | AI training, massive parallelism | Throughput, mature ecosystem | NVIDIA | Datacenters, workstations |
| TPU | Large-scale training, batch inference | Efficiency, stack integration | Google | Cloud/internal services |
| NPU | Edge inference, mobile AI | Low power, on-device | Apple, Qualcomm, Samsung | Phones, laptops, IoT |
| LPU | Real-time LLM inference | Determinism, low latency | Groq (specialized stacks) | Inference services |
The logic of architectures: specialization means efficiency, but also concentration
Each transition solves a specific bottleneck. But each solution almost always shifts power:
toward whoever controls the interface (software), production (foundries), and the ecosystem (education + libraries + toolchains).
Specialization isn’t only engineering. It’s industrial policy.
CUDA as technological rent
Hegemony isn’t measured only in TFLOPS. It’s measured in migration costs, dependencies, and lock-in.
When a platform becomes “the university,” “the default,” and “the job market,” hardware is just the visible face of power.
The manufacturing chokepoint
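Advanced chips require process nodes and machines that exist in only a few places on Earth: leading-edge fabrication is concentrated in a handful of foundries, above all TSMC in Taiwan, and the EUV lithography machines they need come from a single supplier, ASML.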
Computational infrastructure becomes a geopolitical single point of failure.
This isn’t a footnote: it’s a structural condition of the digital future.
Conclusions: which future?
In seventy years we moved from a universal CPU to a fractured ecosystem of accelerators.
Each fracture increases efficiency, but reduces distributed control.
Knowledge is produced collectively; control tends to concentrate privately.
I deconstruct and reassemble communication to free it from the cages of mainstream digital narratives. My writing and visual work have appeared in Il Giornale, MowMag, and InsideOver, exploring sustainability, culture, political marketing, and storytelling for SMEs.
I believe in radical hybridization: where words glitch, pixels disrupt, and ideas ignite narratives.
When I’m not dismantling content, I’m chasing the unexpected in the margins of the everyday.