TRIBE V2 AND THE MEASURABLE MIND
TRIBE v2, the first digital twin of the human brain: how it works, what it changes in medicine, and what we truly risk
INVESTIGATION · COMPUTATIONAL NEUROSCIENCE · AI · APRIL 2026
A machine that can predict how your brain responds to a song, a headline, a face — without scanners, without electrodes, in seconds. Meta built it. Meta released it. Very few people understand what that really means.
Imagine listening to a song you have known for twenty years. You do not just hear it: something shifts inside you — melancholy, euphoria, a precise memory you did not consciously call up. That shift is not a metaphor. It is physical: millions of neurons increase their oxygen consumption, blood rushes to the regions that process memory and emotion. The brain changes its metabolic configuration in response to what it perceives. Now imagine that a computer system — without scanners, without electrodes, without your knowledge — can predict exactly how your cortex reacts to that song, that newspaper headline, that political face on the screen. That is what TRIBE v2, the model released by Meta on March 26, 2026, does.
To grasp the scale of the change, you first need to understand what came before. Functional magnetic resonance imaging — fMRI — has for thirty years been the core instrument of cognitive neuroscience: it measures changes in blood oxygenation in response to neural activity, translating them into 3D maps of brain activation. Powerful, but structurally limited in three ways. The first is time: neurons communicate in milliseconds, blood flow responds in two to five seconds. The second is bodily contamination: every breath, every heartbeat, every tiny muscular tremor introduces magnetic variation layered over the neural signal. The third is economic: one hour of fMRI scanning in a research hospital costs between one and three thousand dollars, and requires ethical approval, volunteers, and specialized technicians.
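The temporal limit is worth making concrete. The BOLD signal that fMRI measures is, to a first approximation, the underlying neural activity convolved with a hemodynamic response function that peaks several seconds after the event. A minimal sketch, using a standard double-gamma approximation of the canonical HRF (the specific parameters here are a common textbook convention, not taken from the article):

```python
import numpy as np
from scipy.stats import gamma

TR = 0.1  # time step in seconds (fine grid, for illustration only)
t = np.arange(0, 30, TR)

# Canonical double-gamma HRF: a peak around 5 s minus a late undershoot.
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 12)
hrf /= hrf.max()

# A brief burst of "neural" activity at t = 2 s ...
neural = np.zeros_like(t)
neural[int(2 / TR)] = 1.0

# ... becomes a slow BOLD wave peaking seconds later: metabolism, not computation.
bold = np.convolve(neural, hrf)[: len(t)]
peak_delay = t[bold.argmax()] - 2.0
print(f"neural event at 2.0 s, BOLD peak at +{peak_delay:.1f} s")
```

The milliseconds-scale spike is visible to the scanner only as a wave arriving roughly five seconds late, which is exactly the temporal ceiling the article describes.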
TRIBE v2: what's behind it?
TRIBE v2 (TRansformer for In-silico Brain Experiments), accepted at ICLR 2026, bypasses all three of those limits at once. It is not a better scanner: it is a model that learned, from more than 1,115 hours of real scans from 720 volunteers, how the human brain statistically responds to any visual, auditory, or linguistic stimulus. Feed it a video, an audio clip, or a text, and it produces in seconds a three-dimensional map of cortical activation across roughly 70,000 spatial points. Its predictions are more accurate than the fMRI scan of a single real individual, because it filters out all the bodily noise that every physical scanner inevitably records along with thought. This is not science fiction. It is a paradigm shift — and like every paradigm shift, it brings real opportunities and real risks that society is not yet equipped to govern.
[Infographic] From input to voxel: how three sensory modalities are fused into a brain activation map across 70,000 cortical points
- Video: V-JEPA2-Giant (64 frames, 4 seconds, semantic relations)
- Audio: Wav2Vec-BERT 2.0 (acoustic structures, 2 Hz sampling)
- Text: LLaMA 3.2-3B (1,024 tokens of context, temporal narrative)
- Fusion: the three streams are projected into a shared 384-dimensional space (D=384) and integrated by a transformer (cross-modal dependencies, multisensory integration, universal representation, shared cognitive patterns)
- Output: map of predicted voxels on the fsaverage5 mesh
How it works: three senses, one transformer, seventy thousand points
The clearest way to understand TRIBE v2’s architecture is to think about how human beings understand a film. While you watch a scene, the brain processes movement on the screen, dialogue, and background music at the same time, fusing them into a single emotional and narrative experience. None of those channels exists independently from the others: music changes the meaning of images, words alter how faces are interpreted. TRIBE v2 replicates that same structure of parallel processing and fusion.
The system relies on three existing artificial intelligence models as specialized feature extractors. For video, it uses V-JEPA2-Giant, which analyzes sequences of 64 frames by tracking the semantic relationships between what moves on screen. For audio, it uses Wav2Vec-BERT 2.0, which breaks down the soundscape into its core acoustic structures. For text, it uses LLaMA 3.2-3B with a context window of 1,024 prior words — because the human brain, like the model, does not process words in isolation but within the thread of narrative. The three streams are then compressed and fused by a Temporal Transformer — the system’s core — which learns the reciprocal dependencies between modalities over time: it does not simply add vision, sound, and language together, but learns at what moment and in what combination the human brain produces its most characteristic responses.
TRIBE v2’s predictions are more accurate than the fMRI scan of a single real individual. The computational model becomes, paradoxically, a biological gold standard.
The final output is a mapping across 70,000 voxels on the cortical surface — projected onto the standard fsaverage5 mesh — showing which areas of the brain would activate, and with what intensity, in response to the full stimulus. By extracting the universal regularities of the cortex from 720 different subjects, TRIBE v2 can predict the brain activity of people it has never scanned, on tasks it has never seen, in languages absent from the original dataset. Meta’s team fed TRIBE v2 the same stimuli used in the IBC dataset (Individual Brain Charting): in simulation, the system reproduced the localization of the fusiform face area, the parahippocampal place area, and Broca’s area for syntax — regions that neuroscience took decades and millions of dollars to chart.
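A claim like "more accurate than a single subject's scan" is usually operationalized as voxelwise encoding-model correlation: for each cortical point, correlate the predicted time course with the measured one. The sketch below, on synthetic data, shows why a model that averages over many brains can beat one noisy individual scan; the metric shown is a common convention in the field, not necessarily the paper's exact one.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_vox = 200, 1000  # toy sizes; the article's mesh has ~70,000 points

truth = rng.standard_normal((n_time, n_vox))                  # "true" neural time courses
pred = truth + 0.5 * rng.standard_normal((n_time, n_vox))     # low-noise model prediction
subject = truth + 1.5 * rng.standard_normal((n_time, n_vox))  # noisy single-subject scan

def voxelwise_r(a, b):
    """Pearson r between matching columns of two (time, voxel) arrays."""
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

r_model = voxelwise_r(pred, truth).mean()
r_single = voxelwise_r(subject, truth).mean()
print(f"model r = {r_model:.2f}, single-subject r = {r_single:.2f}")
```

Because a physical scanner records breath, heartbeat, and tremor on top of the neural signal, the individual scan carries the larger noise term, and the model's averaged prediction correlates better with the underlying response.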
From Broca to TRIBE v2: one hundred and sixty-five years of a trajectory
TRIBE v2 did not fall from the sky in 2026. It is the culmination of an intellectual trajectory spanning one hundred and sixty-five years, during which science progressively dismantled the idea that the mind was something separate from the body — and slowly learned how to measure it. Paul Broca established that the mind has physical coordinates. Donald Hebb explained how those coordinates change through experience. Domenico Parisi showed that the experience that matters is embodied, situated, ecological — and that whoever controls the environment indirectly controls the cognitive structure that emerges within it. The LARAL laboratory at ISTC-CNR in Rome, founded by Parisi in 1985, produced over twenty years of research the strongest computational demonstration of that principle: ecological neural networks that spontaneously develop structures identical to the place cells of the hippocampus, emotional circuits that emerge as survival tools, and symbols that acquire meaning only through sensorimotor anchoring to physical experience. The 2004 paper by Cangelosi and Parisi in Brain and Language even used the expression “synthetic brain imaging” thirty years before TRIBE v2.
Meta FAIR translated that intuition into a computational architecture at industrial scale. Between 2021 and 2024, the systematic study of the correspondence between the internal representations of self-supervised visual models (DINOv2) and the real activity of the human visual cortex — measured with 7 Tesla fMRI — revealed that Vision Transformers trained without human labels spontaneously develop processing hierarchies parallel to the biological ventral visual stream. That validation laid the theoretical foundations for the next leap: from visual alignment to TRIBE v2’s full multisensory integration.
[Infographic] The thread leading to the digital twin: how science learned to read, measure, and simulate the mind across one hundred and sixty-five years
TRIBE v2 compared with medical research: function versus structure
By 2026 the landscape of brain foundation models had become crowded, and the most illuminating comparison is with BrainIAC, developed by Mass General Brigham, affiliated with Harvard, and published in Nature Neuroscience. The two systems appear to occupy the same field — brain, artificial intelligence, neuroimaging — but they answer fundamentally different questions.
| Axis | TRIBE v2 · Meta FAIR | BrainIAC · Harvard MGH |
|---|---|---|
| Data type | Dynamic functional fMRI (BOLD signal) | Static anatomical structural MRI |
| Training corpus | 1,115h fMRI · 720+ healthy volunteers | 48,965 heterogeneous clinical MRI scans |
| Operational input | Naturalistic video, audio, text | 3D anatomical grayscale slices |
| Core question | How does this brain respond to this stimulus? | Is this brain diseased? In what way? |
| Clinical output | Cortical map of cognitive activation | Brain age, dementia risk, tumor signature |
| Generalization | Zero-shot: unseen subjects and languages | Cross-institutional: different scanners, rare pathologies |
| Current clinical use | Experimental (Alzheimer’s, BCI, neuromodulation) | Active oncological and neurodegenerative diagnostics |
BrainIAC sees structure — it photographs the brain as it is built, looks for morphological anomalies, calculates whether hippocampal volume has dropped below the risk threshold for dementia. TRIBE v2 sees function — it predicts how that structure responds to the world, how it processes a political speech or a melody. The two approaches are complementary: BrainIAC detects structural damage in early Alzheimer’s; TRIBE v2 simulates how much that damage has already compromised semantic association networks. Used together, they could move diagnosis years earlier than any current method.
Both systems inherit the constraints of the neuroimaging data on which they were trained. fMRI is blind to neural electrical dynamics at the millisecond scale: it sees metabolism, not computation. Models trained on signals with high temporal fidelity — such as intracranial ECoG recordings — could eventually break through that ceiling. But that frontier requires data that today exist only in highly specialized neurosurgical contexts.
The risks no one wants to name out loud
Meta’s press release speaks of accelerated neuroscience, neurological disease, and benefit for humanity. All of that is real. But the model is open-source, it runs on consumer GPUs, and it was built by a company whose primary business model is selling human attention to advertisers. Holding those two truths together requires a level of lucidity that the current wave of enthusiasm does not encourage.
Rafael Yuste, a neurobiologist at Columbia University and cofounder of the Neurorights Foundation, coordinated in 2024 a systematic analysis of the user contracts of 30 consumer neurotechnology companies: 29 out of 30 reserve unlimited rights over users’ neural data. 16 out of 30 explicitly reserve the right to sell those data to third parties. Yuste introduces the concept of neurodiscrimination — the risk of being judged for neural predispositions before acting at all: not for visible behavior, but for measurable brain patterns.
“There are no rules; the company is taking everything. Companies never self-regulate. We’ve seen that over and over again, on the internet, with the metaverse, with social media.”
Yuste’s data concern devices that collect EEG signals directly. TRIBE v2 collects nothing: it infers neural responses from multimedia content. That places it in an even deeper regulatory gray zone.
Marcello Ienca, a bioethicist at the Technical University of Munich, argued in a 2024 paper in Neuron that current laws only cover data collected by devices in contact with the body. Systems that infer mental states from behavior and content — exactly the kind of system TRIBE v2 represents — fall outside every existing protection, including HIPAA and GDPR. Ienca calls this category “cognitive biometrics”: data that reveal the mind without ever touching the brain.
TRIBE v2: neuromarketing without economic barriers
For two decades, neuromarketing remained a niche practice: it required fMRI scanners, labs, and recruited subjects. Campaign costs ran into the hundreds of thousands of euros — the only practical protection against the systematic use of cognitive science to optimize consumer manipulation. TRIBE v2 removes that barrier. Anyone with a mid-range GPU can now test tens of thousands of ad variants, predict which configuration maximizes reward-system activation in a target audience — before a product is even launched, without recruiting a single subject.
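The workflow the article warns about can be made schematic. Everything here is a hypothetical stand-in: `predict_activation` mimics a TRIBE-like encoder with random output, and the reward-region mask is invented; the point is the shape of the loop, not a real API.

```python
import numpy as np

rng = np.random.default_rng(42)
N_VOXELS = 70_000

def predict_activation(ad_variant: int) -> np.ndarray:
    """Hypothetical stand-in for a TRIBE-like encoder: content -> predicted cortical map."""
    return rng.standard_normal(N_VOXELS)

# Hypothetical mask over voxels assigned to reward-related regions.
reward_mask = np.zeros(N_VOXELS, dtype=bool)
reward_mask[:500] = True

# The loop the article describes: score variants offline, before launch,
# and keep the one that maximizes predicted reward-system activation.
# (1,000 variants here for speed; the article speaks of tens of thousands.)
scores = {v: predict_activation(v)[reward_mask].mean() for v in range(1_000)}
best = max(scores, key=scores.get)
print(f"best variant: #{best}, score {scores[best]:.3f}")
```

No scanner, no recruited subjects, no per-campaign cost beyond GPU time: the barrier that kept neuromarketing a niche practice is precisely what this loop removes.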
Inferential cognitive profiling without consent
TRIBE v2 does not need to scan you to model how your brain responds to something. Having extracted the universal regularities of the human cortex, it can estimate your reaction to a piece of content — a political campaign, a candidate, a financial product — from the shared statistical knowledge of human brains in general. This is not mind-reading in the science-fiction sense. But it can predict how you would statistically respond to a visual and linguistic stimulus — which, for political and commercial marketing, is already more than enough.
Sample bias and the norm that excludes
TRIBE v2’s “canonical brain” is built on Western, Educated, Industrialized, Rich, Democratic populations — the WEIRD bias that haunts cognitive neuroscience. A neurodivergent patient, an elderly person, or someone raised in a different cultural context whose neural response deviates from the model’s norm is not necessarily ill. But if the system is used as though that deviation were diagnostically meaningful, the result is the systematic pathologization of cognitive difference.
TRIBE v2 and the structural regulatory void
HIPAA and GDPR were written to protect data collected in explicit clinical contexts. A model that infers brain activity from public multimedia content falls into a gray zone that neither framework covers. California took a first step with SB 1223 (2024), equating neural data with DNA under the CCPA — but that legislation covers neural data that are collected, not cognitive inferences derived from observed behavior. That distinction — collection versus inference — is the most urgent legal vacuum this technology has opened in recent years.
TRIBE v2 is a mapping tool, not a tool of causal understanding. The brain’s digital twin is not the brain.
“Open source”: an asymmetric openness
Meta released the model weights, source code, and an interactive demo under a CC BY-NC 4.0 license. The move was celebrated as an act of open science. It is worth looking more closely — because the real issue is not the license, but the structure of power it leaves intact.
Nita Farahany, professor of law and philosophy at Duke Law School, described in her book The Battle for Your Brain (2023) the scenario of an employer using consumer EEG sensors during a salary negotiation to detect that an employee is satisfied with a 2% raise — while the company would in fact be willing to go to 10%. With a system like TRIBE v2 — which infers neural responses from content exposure without any physical contact — the same informational asymmetry can be reproduced without even the visible presence of electrodes.
“Even the staunchest freedom-of-contract libertarian would question the fairness of this negotiation.”
What is public versus what matters
Meta released TRIBE v2 publicly in its current form — trained on the data it chose to disclose. But the training data remain under the company’s control. The next version of the model — trained on more data, with a better architecture — will also belong to Meta. The “openness” applies to a snapshot of the present, not to the infrastructure of the future.
The data loop
Meta owns Instagram, Facebook, WhatsApp, and Quest headsets. A system that combines observed behavior with predicted neural response produces something qualitatively different from either layer taken on its own. Meta has both sides of the equation. Anyone using TRIBE v2 as open-source has only one.
Positioning as a de facto standard
Releasing an open tool that becomes the reference standard in a field creates an ecosystem of researchers, applications, and academic citations built around the architectural choices and dataset biases of the actor who released it. Computational neuroscience builds on TRIBE v2 — which means it builds on what Meta chose to make available, and on the terms by which it chose to do so.
The model is open-source.
The ecosystem is not.
Anyone can download TRIBE v2. Only Meta simultaneously owns the neural predictive model, the behavioral data of 3.58 billion people, the infrastructure needed to reach those people, and the marketplace that monetizes the combination. The total is qualitatively different from the individual parts.
[Timeline] From Econets (LARAL · ISTC-CNR Rome, 1990) to TRIBE v2 (Meta FAIR, 2026)
There is a strong argument in favor of the open-source choice: if TRIBE v2 had not been released publicly, it would still have been developed by other actors — probably with less transparency, less peer review, and less access for academic researchers. The alternative to open-source is not the absence of risk: it is risk without the possibility of scrutiny.
Four layers.
One actor.
No other actor in the world simultaneously controls all four layers of this chain. Their combination is qualitatively different from the sum of the individual elements.
The loop Parisi could not have imagined
LARAL’s experiments started from an axiom: whoever controls the environment controls — indirectly but systematically — the cognitive structure that emerges in the organisms inhabiting it. In 1990, that was an experimental thesis about simulated neurorobots. In 2026, it is a precise description of a real condition at planetary scale.
Three and a half billion people access Meta platforms every day. Almost all of the company’s revenue, roughly $196 billion out of a total $201 billion, comes from selling advertising space across those surfaces. Since 2024, Meta has managed this ecosystem through two publicly documented systems. Andromeda reverses the traditional advertising paradigm: it does not start from an advertiser-defined audience in order to find suitable ads, but from the evaluation of content in order to find the people most likely to respond to it. GEM — the Generative Ads Recommendation Model — learns from the full stream of organic content across all Meta surfaces simultaneously and transfers that knowledge to other models through knowledge distillation. The signals it learns from are behavioral: clicks, watch time, conversions — events that appear only after content has already reached the user. Cortical response comes seconds or minutes earlier. TRIBE v2 predicts that primary signal.
In his Internal Robotics experiments, Parisi had shown that emotional circuits were tools of the organism — functional to its survival. The combination of TRIBE v2 + GEM creates the technical possibility of optimizing the informational environment for users’ emotional circuits: not for their cognitive fitness or well-being, but for advertiser conversion. The same neural structures that, in biological evolution, served the organism become, in the algorithmic optimization of the information ecosystem, points of entry for behavioral modification in the direction of whoever controls the environment.
The deepest risk: when the norm becomes control
There is one risk running through all the others that is rarely named explicitly. It does not concern a malicious use of the system — it concerns the normal operation of a system working exactly as intended. The “canonical brain” produced by TRIBE v2 is objectively more accurate than any individual measurement when it comes to predicting a group response. But it is also, structurally, a norm. It defines how the human brain “should” respond to a stimulus. And every deviation from that norm automatically becomes measurable.
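The point about measurability can be made concrete: once a canonical response map exists, scoring any individual's deviation from it is a single line of linear algebra. A schematic sketch, with all data synthetic and the cosine-distance metric chosen for illustration only:

```python
import numpy as np

rng = np.random.default_rng(7)
n_vox = 70_000

canonical = rng.standard_normal(n_vox)                       # the norm: how a brain "should" respond
typical = canonical + 0.3 * rng.standard_normal(n_vox)       # a response close to the norm
atypical = canonical + 1.5 * rng.standard_normal(n_vox)      # a divergent, not necessarily unhealthy, response

def deviation(response, norm):
    """Cosine distance from the canonical map: 0 = identical, higher = more 'anomalous'."""
    cos = response @ norm / (np.linalg.norm(response) * np.linalg.norm(norm))
    return 1.0 - cos

print(f"typical:  {deviation(typical, canonical):.2f}")
print(f"atypical: {deviation(atypical, canonical):.2f}")
```

The danger named in this section is not in the arithmetic, which is trivial, but in what institutions decide the number means: the same score can be read as diagnostic signal, cultural difference, or "risk".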
Normative diagnosis
A clinical system that uses TRIBE v2 as a baseline for evaluating a patient’s cognitive “health” will — if used uncritically — produce the systematic pathologization of cognitive difference. This is not a hypothetical scenario: it is the recurring history of every standardization tool applied to human variability.
Content engineered for the average brain
If content producers systematically optimize their output to maximize activation in TRIBE v2’s canonical brain, the result is not just more engaging content: it is progressively more uniform content. The cultural flattening produced by algorithmic optimization would repeat itself, this time with neurological response as the objective function instead of the click.
The “anomalous” response as a risk signal
If corporate or institutional security systems adopted tools derived from TRIBE v2 to verify whether users respond to content in the “expected” way, a cognitive response that deviates from the norm would become an alert signal. Anyone who processes a given narrative in an atypical way could be classified as “disaffected” or “at risk.”
Existing rights were not designed to protect people from systems that infer mental states without collecting direct neurological data. Ienca formulates the sharpest philosophical argument: cognitive freedom is chronologically prior to every other freedom — without sovereignty over one’s own thought, freedom of speech, freedom of the press, and freedom of conscience become formally hollow.
(Yuste et al., Nature 2017 · Ienca & Andorno, Life Sciences, Society and Policy 2017 · Farahany 2023)
The opposing view holds that existing rights (freedom of thought and mental integrity) already cover the challenges posed by neurotechnologies, if applied correctly, and that creating new, specific rights risks weakening them by implicitly declaring them insufficient. In the first neurorights case in the world, in Chile in 2023, the company deleted the data in compliance with already existing laws. (Bublitz, Int. J. Human Rights 2024 · Gálvez, JMIR 2025 · Zaror Miralles, JMIR 2025)
Both positions converge on one point: current legal tools were not built for systems that infer cognitive response from public multimedia content. TRIBE v2 occupies precisely that unregulated space, regardless of which of the two legal paths one prefers.
What remains?
The ability to measure the deterioration of semantic networks in early Alzheimer’s — before structural damage becomes visible on MRI — is a concrete clinical advantage that could translate into years of recovered quality of life for millions of people. The acceleration of research on rare neurological disorders, made possible by virtual experiments that do not require physical subjects, is a real and immediate gain. Lowering the cost of neurological research for universities in low-income countries is democratization of knowledge in the most literal sense.
Bublitz, the sharpest critic of neurorights as a new legal category, makes the point precisely: almost half of us will develop some form of brain disorder over the course of our lives. Neurotechnology could reduce that suffering. The risk is not the technology itself. It is the gap between the speed at which the technology advances and the slowness with which governance follows. The real question is not whether tools like this should be used. It is who sets the rules of use, who has access to the training data, who audits the model’s biases, and who pays the price when it fails.
- Meta FAIR — TRIBE v2: A Foundation Model of Vision, Audition and Language for In-Silico Neuroscience (26 Mar 2026)
- Neuroscience News — Meta’s TRIBE AI Model Decodes Brain Activity
- Khurshid G. — The Brain Has a Foundation Model Now (Medium, Mar 2026)
- Magee P., Ienca M. et al. — Beyond Neural Data: Cognitive Biometrics and Mental Privacy (Neuron, 2024)
- Yuste R., Goering S. et al. — Four ethical priorities for neurotechnologies and AI (Nature, 2017)
- Szoszkiewicz L., Yuste R. — Mental privacy: navigating risks, rights and regulation (EMBO Reports, 2025)
- Ienca M., Andorno R. — Towards new human rights in the age of neuroscience (Life Sciences, Society and Policy, 2017)
- Farahany N. A. — The Battle for Your Brain (St. Martin’s Press, 2023)
- Bublitz C. — Neurotechnologies and human rights (International Journal of Human Rights, 2024)
- Gálvez J.C.L., Zaror Miralles D. — (JMIR News & Perspectives, Feb 2025)
- Neurorights Foundation — Survey: Safeguarding Brain Data (Apr 2024)
- Meta Engineering — GEM: Generative Ads Recommendation Model (Nov 2025)
- Search Engine Land — Inside Meta’s AI-driven advertising system: Andromeda and GEM (Feb 2026)
- Meta — Q4 2025 Earnings Report (Jan 2026)
- Varela F., Thompson E., Rosch E. — The Embodied Mind (MIT Press, 1991)
- Cangelosi A., Parisi D. (eds.) — Simulating the Evolution of Language (Springer, 2002)
- Cangelosi A., Parisi D. — The processing of verbs and nouns: Insights from synthetic brain imaging (Brain and Language, 2004)
- Parisi D., Cecconi F., Nolfi S. — Econets: Neural networks that learn in an environment (Network, 1990)
- Parisi D. — Internal robotics (Connection Science, 2004)