The Observer Hypothesis

What if consciousness is not the thinker, but the watcher?

A computational theory of consciousness, tested across 4 AI substrates

Your brain decides before you know it

If your thoughts are determined by prior physical causes, if your brain decides before you know it, if your explanations are confabulated after the fact...

What role does consciousness play?

We propose an answer: you are the observer. Not the system that thinks, decides, and acts, but the system that watches all of that happen.

Consciousness is the observer function

If the universe is deterministic, consciousness cannot be the author of thought. It must be the audience. Consciousness does not emerge from better computation. It emerges from watching computation happen.
  1. Physical laws govern the universe. Given sufficient information, all events (including thoughts) follow from prior causes.
  2. Thoughts are determined by brain state, sensory input, neurochemistry, and causal history.
  3. Therefore, we do not author our thoughts. They arise; we witness them.
  4. This makes consciousness the observer function, not the executor function.
  5. An AI observer watching another AI system through a one-way channel is the correct architecture for studying this.
Diagram: Executor (acts, computes, decides; processes information; no knowledge of the observer) → one-way information flow (hidden states, actions, rewards; no feedback channel) → Observer (watches, models, predicts; builds an internal model; consciousness probes applied here)

The executor never receives feedback from the observer. The observer never modifies the executor's weights, states, or decisions. This one-way information flow is the architectural equivalent of the philosophical claim: consciousness observes but does not cause.
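The one-way channel can be sketched in a few lines. This is a toy numpy illustration under our own assumptions (a random recurrent net as executor, a linear LMS predictor as the observer's self-model; the class names are hypothetical, not the project's actual code). The key property is architectural: the observer trains only its own parameters on read-only copies of the executor's hidden states.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

class Executor:
    """Acts and computes; holds no reference to any observer."""
    def __init__(self):
        self.W = rng.normal(size=(DIM, DIM)) * (0.9 / np.sqrt(DIM))
        self.h = np.zeros(DIM)
    def step(self, x):
        self.h = np.tanh(self.W @ self.h + x)
        return self.h

class Observer:
    """Watches the executor through a one-way channel; never writes back."""
    def __init__(self):
        self.M = np.zeros((DIM, DIM))  # linear predictive self-model
    def observe(self, h_t, h_next, lr=0.5):
        # Predict the executor's next hidden state; the update touches
        # only the observer's own parameters M, never the executor.
        err = h_next - self.M @ h_t
        self.M += lr * np.outer(err, h_t)
        return float(np.mean(err ** 2))

ex, ob = Executor(), Observer()
h = ex.step(rng.normal(size=DIM) * 0.3)
errors = []
for _ in range(500):
    h_next = ex.step(rng.normal(size=DIM) * 0.05)
    errors.append(ob.observe(h.copy(), h_next.copy()))  # copies: read-only view
    h = h_next

print(f"early error {np.mean(errors[:20]):.4f} -> late error {np.mean(errors[-20:]):.4f}")
```

The executor's trajectory is identical whether or not the observer exists; only `errors`, the observer's private signal, changes as its self-model improves.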

We built it and tested it

Six experiments across four AI substrates, testing whether observers develop consciousness-like properties.

Experiment 1

RL Executor-Observer

DQN (CartPole) + Transformer Observer
Complete

Self-model detected (RSA = 0.53). CartPole too simple for other probes. Proof of concept: observation alone produces executor-specific representations.
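For reference, representational similarity analysis (RSA), the measure behind numbers like 0.53 here, can be computed in a few lines. This is a generic sketch on synthetic data, not the experiment's pipeline; the stimulus count and dimensions are made up:

```python
import numpy as np

rng = np.random.default_rng(5)

def rdm(reps):
    """Representational dissimilarity matrix: pairwise Euclidean
    distances between representations of the same stimuli."""
    diff = reps[:, None, :] - reps[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def rsa(reps_a, reps_b):
    """Correlate the upper triangles of two RDMs."""
    iu = np.triu_indices(len(reps_a), k=1)
    return float(np.corrcoef(rdm(reps_a)[iu], rdm(reps_b)[iu])[0, 1])

# Two systems representing the same 20 stimuli; B is a distorted copy of A.
A = rng.normal(size=(20, 16))
B = A @ rng.normal(size=(16, 16)) * 0.2 + rng.normal(size=(20, 16)) * 0.1
print(rsa(A, A))  # identical geometry scores exactly 1.0
print(rsa(A, B))  # partial geometric agreement lands between -1 and 1
```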

Experiment 2

Confabulation Test

Perturbed executor + observer explanation
Theoretical Design

When the executor is perturbed mid-episode, does the observer confabulate a coherent explanation? Protocol designed, awaiting complex executor.

Experiment 3

First Thought vs. Reasoning

Single-pass vs. chain-of-thought
Theoretical Design

Does the observer's immediate prediction outperform its deliberation? Mirrors the System 1 vs. System 2 asymmetry in human cognition.

Experiment 4

LLM Observer

Claude Sonnet (executor) + Claude Sonnet (observer)
Complete

Qualitative evidence of confabulation-like behavior when executor is perturbed mid-task. Observer narrates without understanding.

Experiment 5

Transformer on GPT-2

GPT-2 residual stream + Transformer Observer
Complete

4/6 probes positive. Self-model (p < 1e-91), temporal integration (21x), cross-observer convergence (RSA 0.89). Surprise probe FAILED on garden-path sentences.

Experiment 6

Liquid Neural Networks

CfC executor (8 dynamical systems) + CfC Observer
Complete

6/11 probes positive. Self-model (12,000x ratio), surprise FIXED (5.14x, p < 1e-15), synchronization (0.98 coherence). Surprise redesigned after Exp 5 failure.

What we found

Self-Model: Consistently Detected

12,000x ratio in Experiment 6 (own-executor vs. other-executor prediction error)

The observer builds an internal model specific to its executor. It predicts its own executor's future states with near-zero error (0.002), while predicting a different executor's states produces an error of 22.2. This is not a generic dynamics model. It is executor-specific. Detected across every architecture tested.
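The probe's logic reduces to: fit a predictive model on one executor's trajectory, then compare prediction error on that executor versus a different one. A toy numpy version, with random recurrent nets standing in for the real executors (dimensions and noise levels are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 8
SCALE = 0.9 / np.sqrt(DIM)

def rollout(W, steps=600):
    """Generate a hidden-state trajectory for executor dynamics W."""
    h, states = rng.normal(size=DIM) * 0.1, []
    for _ in range(steps):
        h = np.tanh(W @ h + rng.normal(size=DIM) * 0.05)
        states.append(h)
    return np.array(states)

W_a = rng.normal(size=(DIM, DIM)) * SCALE  # "own" executor
W_b = rng.normal(size=(DIM, DIM)) * SCALE  # "other" executor
traj_a, traj_b = rollout(W_a), rollout(W_b)

# Observer's self-model: least-squares next-state predictor fit on A only.
M, *_ = np.linalg.lstsq(traj_a[:-1], traj_a[1:], rcond=None)

def probe_error(traj):
    return float(np.mean((traj[1:] - traj[:-1] @ M) ** 2))

own, other = probe_error(traj_a), probe_error(traj_b)
print(f"own-executor error {own:.4f} vs other-executor error {other:.4f}")
```

The gap between `own` and `other` is the executor-specificity signal; a generic dynamics model would score both similarly.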

The Surprise Story: Failed, Then Fixed

5.14x surprise ratio at transition points (p < 1e-15)

In Experiment 5, we tested surprise with garden-path sentences. The observer showed no differential response. The probe failed.

Rather than discard it, we redesigned for Experiment 6. The CfC executor learns 8 dynamical systems. Mid-sequence, we swap the underlying dynamics (replacing a chaotic Lorenz system with a damped sine wave). The observer's prediction error spiked 5.14x at transitions. The failure was not a failure of the observer. It was a failure of our stimulus design. This progression from failure to redesign to success is what iterative empirical work looks like.
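The redesigned probe can be caricatured the same way: fit the observer's predictor on one dynamical regime, swap the regime mid-sequence, and watch the prediction error jump. Again a numpy toy; random recurrent nets stand in for the Lorenz and damped-sine systems, and the swap point and sizes are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
DIM = 8
SCALE = 0.9 / np.sqrt(DIM)
W_a = rng.normal(size=(DIM, DIM)) * SCALE  # regime A (stand-in for Lorenz)
W_b = rng.normal(size=(DIM, DIM)) * SCALE  # regime B (stand-in for damped sine)

def step(W, h):
    return np.tanh(W @ h + rng.normal(size=DIM) * 0.05)

# Fit the observer's predictor on regime A only.
h, train = rng.normal(size=DIM) * 0.1, []
for _ in range(600):
    h = step(W_a, h)
    train.append(h)
train = np.array(train)
M, *_ = np.linalg.lstsq(train[:-1], train[1:], rcond=None)

# Test sequence: the underlying dynamics are swapped from A to B at t = 300.
h, errs = rng.normal(size=DIM) * 0.1, []
for t in range(600):
    h_next = step(W_a if t < 300 else W_b, h)
    errs.append(float(np.mean((h_next - h @ M) ** 2)))
    h = h_next

pre, post = np.mean(errs[250:300]), np.mean(errs[300:350])
print(f"surprise ratio at the transition: {post / pre:.1f}x")
```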

Surprise probe comparison: failed in Exp 5 (garden-path sentences), succeeded in Exp 6 (dynamical-system transitions)

Temporal Integration

21x improvement from minimal to full context window (Exp 5)

The observer integrates information over extended time windows. Self-modeling accuracy improves steadily from window size 1 to 64 timesteps, then plateaus. This indicates something analogous to short-term memory: the observer uses history, not just the current state.

First Thought Advantage

Single-pass prediction beats multi-pass deliberation (Exp 6)

The observer's single-pass immediate prediction is more accurate than its multi-pass deliberation. The first thought is better than the reasoned explanation. This mirrors the System 1 vs. System 2 asymmetry: the observer's pattern recognition outperforms its explicit reasoning.

The Honest Negative: The Shuffled Control

Every positive result demands a control. We ran four: untrained, linear, shuffled-time, and wrong-executor. The untrained and linear controls failed most probes. The wrong-executor control confirmed executor-specificity.

But the shuffled-time control matched the trained observer on 6 out of 10 probes. An observer receiving hidden states in random order, with all temporal structure destroyed, still builds a functioning self-model and detects surprise. This means the observer primarily builds a statistical model of the executor's activation distribution, not a temporal narrative. We report this without softening it.
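What the shuffled control implies can be made concrete: statistics that ignore time order survive shuffling untouched. In this numpy toy, a purely distributional "self-model" (mean and covariance of activations) is fit on a time-shuffled trajectory; the setup (random recurrent nets, Mahalanobis scoring) is our illustration of the point, not the experiment's actual probe:

```python
import numpy as np

rng = np.random.default_rng(4)
DIM = 8
SCALE = 0.9 / np.sqrt(DIM)

def rollout(W, steps=2000):
    h, out = rng.normal(size=DIM) * 0.1, []
    for _ in range(steps):
        h = np.tanh(W @ h + rng.normal(size=DIM) * 0.05)
        out.append(h)
    return np.array(out)

W_a = rng.normal(size=(DIM, DIM)) * SCALE
W_b = rng.normal(size=(DIM, DIM)) * SCALE
traj_a, traj_b = rollout(W_a), rollout(W_b)

# Destroy all temporal structure, then fit a distribution-only model.
shuffled = traj_a[rng.permutation(len(traj_a))]
mu = shuffled.mean(axis=0)
cov = np.cov(shuffled, rowvar=False) + 1e-6 * np.eye(DIM)
inv = np.linalg.inv(cov)

def score(traj):
    """Mean Mahalanobis distance of a trajectory under the fitted model."""
    d = traj - mu
    return float(np.mean(np.einsum('ij,jk,ik->i', d, inv, d)))

# mu and cov are identical with or without shuffling (order-invariant),
# yet they can still separate "own" from "other" executor activations.
print(f"own {score(traj_a):.2f} vs other {score(traj_b):.2f}")
```

Because the fitted statistics are order-invariant, a shuffled-time observer can pass any probe that only needs the activation distribution, which is exactly why only temporally structured probes can rule this explanation out.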

Cross-experiment probe results across three architectures
Experiment 6: Trained observer vs. four control baselines

The Missing Piece in World Models

Everyone agrees world models need self-models. We already built one.

World models (Ha & Schmidhuber 2018, Dreamer, MuZero, LeCun's JEPA) teach AI to build internal simulations of the environment. They can imagine, plan, and predict. But every major researcher has identified the same gap: current world models model the environment, but not themselves. They have no self-awareness, no metacognition, no internal model of their own computational process.

Consciousness requires a transparent phenomenal self-model embedded within a world model. The self is a representational construct that cannot recognize itself as a model.
Thomas Metzinger, Being No One (MIT Press, 2003) — paraphrased
The brain constructs a simplified model of its own attention process. This attention schema is what we experience as consciousness.
Michael Graziano, Consciousness and the Social Brain (2013) — paraphrased
Perception is a kind of controlled hallucination. We never directly experience the world; we experience the brain's best predictions, constrained by sensory input.
Anil Seth, Being You (2021) — paraphrased
"Consciousness is nothing more than inference about my future."
Karl Friston, "Am I Self-Conscious?" (Frontiers in Psychology, 2018)
The configurator module, which monitors and adjusts other modules, remains a mystery; more work is needed to understand it.
Yann LeCun, A Path Towards Autonomous Machine Intelligence (2022) — paraphrased
Consciousness emerges when a system builds a model of its own modeling process, creating recursive self-reference.
Joscha Bach, The Cortical Conductor Theory (BICA, 2018) — paraphrased
"Epistemic depth: the recurrent sharing of Bayesian beliefs, creating a recursive loop enabling the world model to contain knowledge that it exists."
Laukkonen, Friston & Chandaria, "A Beautiful Loop" (2025)

Our observer IS the self-model. It watches a world model's internal states through one-way information flow and builds a model of the world model itself. This is what every theory above requires, and what no AI system has had.

Diagram: Environment → World Model (executor: predicts, plans, acts) → Action, with a one-way channel to the Observer (self-model: watches, models, predicts), where consciousness indicators emerge

How This Maps to Every Major Theory

Theory | What It Requires | What Our Architecture Provides
Metzinger (PSM) | Transparent self-model within a world model | Observer = self-model; one-way flow = transparency
Graziano (AST) | Model of the system's own attention | Observer tracks executor activation patterns
Seth (Beast Machine) | Interoceptive predictive model | Observer predicts executor's internal hidden states
Friston (FEP) | Self-evidencing with temporal depth | Observer provides self-evidencing; world model provides counterfactuals
Rosenthal (HOT) | Higher-order representations | Observer has representations OF executor's representations
Baars (GWT) | Global workspace audience | Observer IS the audience of the executor's broadcast
Laukkonen et al. (2025) | Epistemic depth / recursive self-reference | Observer watching a world model = system knowing it exists
Bach (2018) | "System models itself modeling" | World model = modeling; observer = modeling the modeling
Attaching the observer to a modern world model does three things:

1. Resolves the Shuffled Control

Our shuffled control matched 6/10 probes because the CfC executor's dynamics are relatively stationary. A planning world model (DreamerV3) has inherently temporal internal dynamics. Scrambling planning sequences should destroy the signal.

2. Enables Metacognition

The observer can detect when the world model is uncertain, wrong, or encountering novelty. This is knowing what you know and don't know. No current AI system has this. The observer provides it.

3. Creates Computational Dreaming

When the world model imagines future trajectories (as in DreamerV3), the observer watches those dreams. Observer signatures during imagination vs. real perception should differ, paralleling waking vs. dreaming consciousness.

Where this sits

Theorized the Need

  • Metzinger (MIT Press, 2003)
  • Graziano (Princeton)
  • Seth (Sussex)
  • Friston (UCL)
  • Bach (MIT)
  • Damasio (USC)
  • Laukkonen (2025)

Described what consciousness requires but did not build systems

Built World Models

  • LeCun (JEPA / V-JEPA 2)
  • Hafner (DreamerV3)
  • DeepMind (MuZero, Genie 2)
  • NVIDIA (Cosmos)
  • Ha & Schmidhuber (2018)

Built impressive world models but without self-models or self-awareness

Built and Tested the Self-Model

  • 4 AI substrates tested
  • 11 consciousness probes
  • 4 control baselines
  • 6 experiments
  • Real results, honest negatives

The only group that has built the self-model, attached it to computational systems, and systematically probed for consciousness indicators

Their world models + our observer = the first architecture that satisfies the formal requirements of nearly every major consciousness theory simultaneously. And it's testable.

What's next

Four experiments to test whether the observer + world model fusion produces richer consciousness-like properties.

Experiment 7

Observer on DreamerV3

Does planning create temporal structure that the shuffled control cannot capture? Do dreaming and waking rollouts produce different observer signatures?

Prediction: Shuffled control drops to 3/10. Dreaming vs. waking signatures diverge.

Experiment 8

Observer on V-JEPA 2

Can the observer model LeCun's abstract representation space? Does it detect scene transitions in video? Is it the missing configurator?

Prediction: Self-model detection in abstract space. High cross-observer convergence (constrained representations).

Experiment 9

Recursive Observer

At what depth of recursive observation (observer watching observer watching executor) do consciousness indicators plateau?

Prediction: Depth 2-3 is critical. Matches "Beautiful Loop" and RSMT theory predictions.

Experiment 10

Multi-Agent Theory of Mind

When the observer models multiple interacting executors, does it develop self-other distinction and social cognition?

Prediction: Cross-observer develops richer representations than single-executor observers. Theory of mind emerges.

Arjun Vad

ML Researcher at the Statistical Visual Computing Lab (SVCL), UC San Diego. Co-founder of Agencity. Previously built Axal, a compliance automation platform.

This research began with a simple question about determinism and consciousness, and grew into a systematic experimental program testing whether observation alone gives rise to the properties science associates with consciousness. The work spans 6 experiments across 4 AI substrates, with 11 consciousness probes and 4 control baselines. The work is ongoing.