The Observer Hypothesis

What if consciousness is not the thinker, but the watcher?

A computational theory of consciousness, tested across 4 AI substrates

Your brain decides before you know it

If your thoughts are determined by prior physical causes, if your brain decides before you know it, if your explanations are confabulated after the fact...

What role does consciousness play?

We propose an answer: you are the observer. Not the system that thinks, decides, and acts, but the system that watches all of that happen.

Consciousness is the observer function

If the universe is deterministic, consciousness cannot be the author of thought. It must be the audience. Consciousness does not emerge from better computation. It emerges from watching computation happen.
  1. Physical laws govern the universe. Given sufficient information, all events (including thoughts) follow from prior causes.
  2. Thoughts are determined by brain state, sensory input, neurochemistry, and causal history.
  3. Therefore, we do not author our thoughts. They arise; we witness them.
  4. This makes consciousness the observer function, not the executor function.
  5. An AI observer watching another AI system through a one-way channel is the correct architecture for studying this.
Diagram: Executor (acts, computes, decides; processes information; no knowledge of the observer) → one-way information flow (hidden states, actions, rewards; no feedback channel) → Observer (watches, models, predicts; builds an internal model; consciousness probes applied here)

The executor never receives feedback from the observer. The observer never modifies the executor's weights, states, or decisions. This one-way information flow is the architectural equivalent of the philosophical claim: consciousness observes but does not cause.
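The one-way channel can be sketched in a few lines. This is a toy numpy illustration under our own assumptions (a random recurrent net as executor, a linear LMS predictor as the observer's self-model; the class names are hypothetical, not the project's actual code). The key property is architectural: the observer trains only its own parameters on read-only copies of the executor's hidden states.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

class Executor:
    """Acts and computes; holds no reference to any observer."""
    def __init__(self):
        self.W = rng.normal(size=(DIM, DIM)) * (0.9 / np.sqrt(DIM))
        self.h = np.zeros(DIM)
    def step(self, x):
        self.h = np.tanh(self.W @ self.h + x)
        return self.h

class Observer:
    """Watches the executor through a one-way channel; never writes back."""
    def __init__(self):
        self.M = np.zeros((DIM, DIM))  # linear predictive self-model
    def observe(self, h_t, h_next, lr=0.5):
        # Predict the executor's next hidden state; the update touches
        # only the observer's own parameters M, never the executor.
        err = h_next - self.M @ h_t
        self.M += lr * np.outer(err, h_t)
        return float(np.mean(err ** 2))

ex, ob = Executor(), Observer()
h = ex.step(rng.normal(size=DIM) * 0.3)
errors = []
for _ in range(500):
    h_next = ex.step(rng.normal(size=DIM) * 0.05)
    errors.append(ob.observe(h.copy(), h_next.copy()))  # copies: read-only view
    h = h_next

print(f"early error {np.mean(errors[:20]):.4f} -> late error {np.mean(errors[-20:]):.4f}")
```

The executor's trajectory is identical whether or not the observer exists; only `errors`, the observer's private signal, changes as its self-model improves.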

We built it and tested it

Six experiments across four AI substrates, testing whether observers develop consciousness-like properties.

Experiment 1

RL Executor-Observer

DQN (CartPole) + Transformer Observer
Complete

Self-model detected (RSA = 0.53). CartPole too simple for other probes. Proof of concept: observation alone produces executor-specific representations.
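For reference, representational similarity analysis (RSA), the measure behind numbers like 0.53 here, can be computed in a few lines. This is a generic sketch on synthetic data, not the experiment's pipeline; the stimulus count and dimensions are made up:

```python
import numpy as np

rng = np.random.default_rng(5)

def rdm(reps):
    """Representational dissimilarity matrix: pairwise Euclidean
    distances between representations of the same stimuli."""
    diff = reps[:, None, :] - reps[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def rsa(reps_a, reps_b):
    """Correlate the upper triangles of two RDMs."""
    iu = np.triu_indices(len(reps_a), k=1)
    return float(np.corrcoef(rdm(reps_a)[iu], rdm(reps_b)[iu])[0, 1])

# Two systems representing the same 20 stimuli; B is a distorted copy of A.
A = rng.normal(size=(20, 16))
B = A @ rng.normal(size=(16, 16)) * 0.2 + rng.normal(size=(20, 16)) * 0.1
print(rsa(A, A))  # identical geometry scores exactly 1.0
print(rsa(A, B))  # partial geometric agreement lands between -1 and 1
```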

Experiment 2

Confabulation Test

Perturbed executor + observer explanation
Theoretical Design

When the executor is perturbed mid-episode, does the observer confabulate a coherent explanation? Protocol designed, awaiting complex executor.

Experiment 3

First Thought vs. Reasoning

Single-pass vs. chain-of-thought
Theoretical Design

Does the observer's immediate prediction outperform its deliberation? Mirrors the System 1 vs. System 2 asymmetry in human cognition.

Experiment 4

LLM Observer

Claude Sonnet (executor) + Claude Sonnet (observer)
Complete

Qualitative evidence of confabulation-like behavior when executor is perturbed mid-task. Observer narrates without understanding.

Experiment 5

Transformer on GPT-2

GPT-2 residual stream + Transformer Observer
Complete

4/6 probes positive. Self-model (p < 1e-91), temporal integration (21x), cross-observer convergence (RSA 0.89). Surprise probe FAILED on garden-path sentences.

Experiment 6

Liquid Neural Networks

CfC executor (8 dynamical systems) + CfC Observer
Complete

6/11 probes positive. Self-model (12,000x ratio), surprise FIXED (5.14x, p < 1e-15), synchronization (0.98 coherence). Surprise redesigned after Exp 5 failure.

What we found

Self-Model: Consistently Detected

12,000x ratio in Experiment 6 (own-executor vs. other-executor prediction error)

The observer builds an internal model specific to its executor. It predicts its own executor's future states with near-zero error (0.002), while predicting a different executor's states produces an error of 22.2. This is not a generic dynamics model. It is executor-specific. Detected across every architecture tested.
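The probe's logic reduces to: fit a predictive model on one executor's trajectory, then compare prediction error on that executor versus a different one. A toy numpy version, with random recurrent nets standing in for the real executors (dimensions and noise levels are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 8
SCALE = 0.9 / np.sqrt(DIM)

def rollout(W, steps=600):
    """Generate a hidden-state trajectory for executor dynamics W."""
    h, states = rng.normal(size=DIM) * 0.1, []
    for _ in range(steps):
        h = np.tanh(W @ h + rng.normal(size=DIM) * 0.05)
        states.append(h)
    return np.array(states)

W_a = rng.normal(size=(DIM, DIM)) * SCALE  # "own" executor
W_b = rng.normal(size=(DIM, DIM)) * SCALE  # "other" executor
traj_a, traj_b = rollout(W_a), rollout(W_b)

# Observer's self-model: least-squares next-state predictor fit on A only.
M, *_ = np.linalg.lstsq(traj_a[:-1], traj_a[1:], rcond=None)

def probe_error(traj):
    return float(np.mean((traj[1:] - traj[:-1] @ M) ** 2))

own, other = probe_error(traj_a), probe_error(traj_b)
print(f"own-executor error {own:.4f} vs other-executor error {other:.4f}")
```

The gap between `own` and `other` is the executor-specificity signal; a generic dynamics model would score both similarly.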

The Surprise Story: Failed, Then Fixed

5.14x surprise ratio at transition points (p < 1e-15)

In Experiment 5, we tested surprise with garden-path sentences. The observer showed no differential response. The probe failed.

Rather than discard it, we redesigned for Experiment 6. The CfC executor learns 8 dynamical systems. Mid-sequence, we swap the underlying dynamics (replacing a chaotic Lorenz system with a damped sine wave). The observer's prediction error spiked 5.14x at transitions. The failure was not a failure of the observer. It was a failure of our stimulus design. This progression from failure to redesign to success is what iterative empirical work looks like.
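The redesigned probe can be caricatured the same way: fit the observer's predictor on one dynamical regime, swap the regime mid-sequence, and watch the prediction error jump. Again a numpy toy; random recurrent nets stand in for the Lorenz and damped-sine systems, and the swap point and sizes are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
DIM = 8
SCALE = 0.9 / np.sqrt(DIM)
W_a = rng.normal(size=(DIM, DIM)) * SCALE  # regime A (stand-in for Lorenz)
W_b = rng.normal(size=(DIM, DIM)) * SCALE  # regime B (stand-in for damped sine)

def step(W, h):
    return np.tanh(W @ h + rng.normal(size=DIM) * 0.05)

# Fit the observer's predictor on regime A only.
h, train = rng.normal(size=DIM) * 0.1, []
for _ in range(600):
    h = step(W_a, h)
    train.append(h)
train = np.array(train)
M, *_ = np.linalg.lstsq(train[:-1], train[1:], rcond=None)

# Test sequence: the underlying dynamics are swapped from A to B at t = 300.
h, errs = rng.normal(size=DIM) * 0.1, []
for t in range(600):
    h_next = step(W_a if t < 300 else W_b, h)
    errs.append(float(np.mean((h_next - h @ M) ** 2)))
    h = h_next

pre, post = np.mean(errs[250:300]), np.mean(errs[300:350])
print(f"surprise ratio at the transition: {post / pre:.1f}x")
```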

Surprise probe comparison: failed in Exp 5 (garden-path sentences), succeeded in Exp 6 (dynamical-system transitions)

Temporal Integration

21x improvement from minimal to full context window (Exp 5)

The observer integrates information over extended time windows. Self-modeling accuracy improves steadily from window size 1 to 64 timesteps, then plateaus. This indicates something analogous to short-term memory: the observer uses history, not just the current state.

First Thought Advantage

Single-pass prediction beats multi-pass deliberation (Exp 6)

The observer's single-pass immediate prediction is more accurate than its multi-pass deliberation. The first thought is better than the reasoned explanation. This mirrors the System 1 vs. System 2 asymmetry: the observer's pattern recognition outperforms its explicit reasoning.

The Honest Negative: The Shuffled Control

Every positive result demands a control. We ran four: untrained, linear, shuffled-time, and wrong-executor. The untrained and linear controls failed most probes. The wrong-executor control confirmed executor-specificity.

But the shuffled-time control matched the trained observer on 6 out of 10 probes. An observer receiving hidden states in random order, with all temporal structure destroyed, still builds a functioning self-model and detects surprise. This means the observer primarily builds a statistical model of the executor's activation distribution, not a temporal narrative. We report this without softening it.
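What the shuffled control implies can be made concrete: statistics that ignore time order survive shuffling untouched. In this numpy toy, a purely distributional "self-model" (mean and covariance of activations) is fit on a time-shuffled trajectory; the setup (random recurrent nets, Mahalanobis scoring) is our illustration of the point, not the experiment's actual probe:

```python
import numpy as np

rng = np.random.default_rng(4)
DIM = 8
SCALE = 0.9 / np.sqrt(DIM)

def rollout(W, steps=2000):
    h, out = rng.normal(size=DIM) * 0.1, []
    for _ in range(steps):
        h = np.tanh(W @ h + rng.normal(size=DIM) * 0.05)
        out.append(h)
    return np.array(out)

W_a = rng.normal(size=(DIM, DIM)) * SCALE
W_b = rng.normal(size=(DIM, DIM)) * SCALE
traj_a, traj_b = rollout(W_a), rollout(W_b)

# Destroy all temporal structure, then fit a distribution-only model.
shuffled = traj_a[rng.permutation(len(traj_a))]
mu = shuffled.mean(axis=0)
cov = np.cov(shuffled, rowvar=False) + 1e-6 * np.eye(DIM)
inv = np.linalg.inv(cov)

def score(traj):
    """Mean Mahalanobis distance of a trajectory under the fitted model."""
    d = traj - mu
    return float(np.mean(np.einsum('ij,jk,ik->i', d, inv, d)))

# mu and cov are identical with or without shuffling (order-invariant),
# yet they can still separate "own" from "other" executor activations.
print(f"own {score(traj_a):.2f} vs other {score(traj_b):.2f}")
```

Because the fitted statistics are order-invariant, a shuffled-time observer can pass any probe that only needs the activation distribution, which is exactly why only temporally structured probes can rule this explanation out.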

Cross-experiment probe results across three architectures
Experiment 6: Trained observer vs. four control baselines

The Missing Piece in World Models

Everyone agrees world models need self-models. We already built one.

World models (Ha & Schmidhuber 2018, Dreamer, MuZero, LeCun's JEPA) teach AI to build internal simulations of the environment. They can imagine, plan, and predict. But every major researcher has identified the same gap: current world models model the environment, but not themselves. They have no self-awareness, no metacognition, no internal model of their own computational process.

Consciousness requires a transparent phenomenal self-model embedded within a world model. The self is a representational construct that cannot recognize itself as a model.
Thomas Metzinger, Being No One (MIT Press, 2003) — paraphrased
The brain constructs a simplified model of its own attention process. This attention schema is what we experience as consciousness.
Michael Graziano, Consciousness and the Social Brain (2013) — paraphrased
Perception is a kind of controlled hallucination. We never directly experience the world; we experience the brain's best predictions, constrained by sensory input.
Anil Seth, Being You (2021) — paraphrased
"Consciousness is nothing more than inference about my future."
Karl Friston, "Am I Self-Conscious?" (Frontiers in Psychology, 2018)
The configurator module, which monitors and adjusts other modules, remains a mystery; more work is needed to understand it.
Yann LeCun, A Path Towards Autonomous Machine Intelligence (2022) — paraphrased
Consciousness emerges when a system builds a model of its own modeling process, creating recursive self-reference.
Joscha Bach, The Cortical Conductor Theory (BICA, 2018) — paraphrased
"Epistemic depth: the recurrent sharing of Bayesian beliefs, creating a recursive loop enabling the world model to contain knowledge that it exists."
Laukkonen, Friston & Chandaria, "A Beautiful Loop" (2025)

Our observer IS the self-model. It watches a world model's internal states through one-way information flow and builds a model of the world model itself. This is what every theory above requires, and what no AI system has had.

Diagram: Environment → World Model (executor: predicts, plans, acts) → Action, with a one-way channel to the Observer (self-model: watches, models, predicts), where consciousness indicators emerge

How This Maps to Every Major Theory

Theory | What It Requires | What Our Architecture Provides
Metzinger (PSM) | Transparent self-model within a world model | Observer = self-model; one-way flow = transparency
Graziano (AST) | Model of the system's own attention | Observer tracks executor activation patterns
Seth (Beast Machine) | Interoceptive predictive model | Observer predicts executor's internal hidden states
Friston (FEP) | Self-evidencing with temporal depth | Observer provides self-evidencing; world model provides counterfactuals
Rosenthal (HOT) | Higher-order representations | Observer has representations OF executor's representations
Baars (GWT) | Global workspace audience | Observer IS the audience of the executor's broadcast
Laukkonen et al. (2025) | Epistemic depth / recursive self-reference | Observer watching a world model = system knowing it exists
Bach (2018) | "System models itself modeling" | World model = modeling; observer = modeling the modeling
Attaching the observer to a modern world model does three things:

1. Resolves the Shuffled Control

Our shuffled control matched 6/10 probes because the CfC executor's dynamics are relatively stationary. A planning world model (DreamerV3) has inherently temporal internal dynamics. Scrambling planning sequences should destroy the signal.

2. Enables Metacognition

The observer can detect when the world model is uncertain, wrong, or encountering novelty. This is knowing what you know and don't know. No current AI system has this. The observer provides it.

3. Creates Computational Dreaming

When the world model imagines future trajectories (as in DreamerV3), the observer watches those dreams. Observer signatures during imagination vs. real perception should differ, paralleling waking vs. dreaming consciousness.

Where this sits

Theorized the Need

  • Metzinger (MIT Press, 2003)
  • Graziano (Princeton)
  • Seth (Sussex)
  • Friston (UCL)
  • Bach (MIT)
  • Damasio (USC)
  • Laukkonen (2025)

Described what consciousness requires but did not build systems

Built World Models

  • LeCun (JEPA / V-JEPA 2)
  • Hafner (DreamerV3)
  • DeepMind (MuZero, Genie 2)
  • NVIDIA (Cosmos)
  • Ha & Schmidhuber (2018)

Built impressive world models but without self-models or self-awareness

Built and Tested the Self-Model

  • 4 AI substrates tested
  • 11 consciousness probes
  • 4 control baselines
  • 6 experiments
  • Real results, honest negatives

The only group that has built the self-model, attached it to computational systems, and systematically probed for consciousness indicators

Their world models + our observer = the first architecture that satisfies the formal requirements of nearly every major consciousness theory simultaneously. And it's testable.

What's next

Four experiments to test whether the observer + world model fusion produces richer consciousness-like properties.

Experiment 7

Observer on DreamerV3

Does planning create temporal structure that the shuffled control cannot capture? Do dreaming and waking rollouts produce different observer signatures?

Prediction: Shuffled control drops to 3/10. Dreaming vs. waking signatures diverge.

Experiment 8

Observer on V-JEPA 2

Can the observer model LeCun's abstract representation space? Does it detect scene transitions in video? Is it the missing configurator?

Prediction: Self-model detection in abstract space. High cross-observer convergence (constrained representations).

Experiment 9

Recursive Observer

At what depth of recursive observation (observer watching observer watching executor) do consciousness indicators plateau?

Prediction: Depth 2-3 is critical. Matches "Beautiful Loop" and RSMT theory predictions.

Experiment 10

Multi-Agent Theory of Mind

When the observer models multiple interacting executors, does it develop self-other distinction and social cognition?

Prediction: Cross-observer develops richer representations than single-executor observers. Theory of mind emerges.

Arjun Vad

ML Researcher at the Statistical Visual Computing Lab (SVCL), UC San Diego. Co-founder of Agencity. Previously built Axal, a compliance automation platform.

This research began with a simple question about determinism and consciousness, and grew into a systematic experimental program testing whether observation alone gives rise to the properties science associates with consciousness. The work spans 6 experiments across 4 AI substrates, with 11 consciousness probes and 4 control baselines. The work is ongoing.