NeurIPS 2025 · Brain-Inspired AI · LLM · Cognitive Architecture

Unified Cognitive Architecture: A Brain‑Inspired Framework for Integrative Large Language Models

Submitted: 2026 · Dataset: WikiText-2

Architecture

Figure: How UCA processes information. Input flows through L1 Sensory → L2 Limbic → L3 Hippocampal → L4 Association → L5 Executive → L6 Metacognitive, cycling with 3× recurrence through the Global Workspace before producing output.
Abstract

Large Language Models (LLMs) have achieved remarkable performance on a wide range of natural language tasks, yet they remain fundamentally different from human cognition. They lack integrated emotional reasoning, long-term episodic memory, temporal awareness, genuine self-monitoring, and the ability to maintain multiple interpretations in parallel. Inspired by neuroscientific theories of consciousness and cognitive architecture, we introduce the Unified Cognitive Architecture (UCA) — a novel LLM framework that unifies six core cognitive functions within a single recurrent neural system: sensory processing, emotional valuation, episodic memory, semantic knowledge with temporal sharding, executive control with quantum-like state superposition, and metacognitive self-modeling. A Global Workspace enables information from all specialized layers to become globally available, mimicking conscious access. Preliminary experiments show that UCA achieves competitive perplexity while exhibiting emergent cognitive behaviors such as uncertainty estimation, hallucination self-detection, and context-dependent memory retrieval.


1 Introduction

The rapid advancement of Large Language Models has revolutionized natural language processing. Models such as GPT-4, Claude, and Llama 3 demonstrate fluency across a vast array of tasks, from creative writing to code generation. Yet beneath their impressive surface lies a fundamental gap: they operate as sophisticated pattern matchers without any internal model of themselves, their own knowledge, or the emotional weight of the information they process.

Neuroscience offers a rich alternative perspective. The human brain is a massively interconnected system where emotion influences memory, memory shapes perception, and a metacognitive "self" monitors and modulates all processes. Theories such as Global Workspace Theory and Predictive Processing suggest that consciousness arises from the global availability of information across specialized brain regions.

In this paper, we bridge the gap between neuroscience and LLMs by proposing UCA — a neural architecture that explicitly models six key cognitive layers within a single, recurrently connected framework, all interacting through a shared Global Workspace.

  • L1 Sensory Cortex: token & embedding processing
  • L2 Limbic System: emotional valence tagging
  • L3 Hippocampal Complex: episodic memory
  • L4 Association Cortex: temporal sharding
  • L5 Prefrontal Executive: quantum-like superposition
  • L6 Metacognitive Self: self-modeling & error detection



3 The Unified Cognitive Architecture

UCA is designed as a single neural network with six vertically integrated layers, each corresponding to a cognitive function. Layers communicate through two mechanisms: Recurrent Processing (iterating through all layers multiple times per forward pass) and a Global Workspace (a dynamic buffer that holds the current "conscious" representation broadcast to every layer at each step).
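The interaction of these two mechanisms can be sketched in plain Python. This is a minimal illustration, not the trained model: each "layer" is a stand-in function, representations are lists of floats, the workspace is blended in additively, and only the forward sweep of the recurrent loop is shown.

```python
# Minimal sketch of the recurrent Global Workspace loop.
# Assumptions (illustrative, not from the paper's implementation):
# layers are simple functions, and the workspace is mixed in with a
# fixed 0.5 coefficient.

def make_layer(shift):
    # Stand-in for one cognitive layer: adds a layer-specific offset
    # and reads the broadcast workspace alongside its input.
    def layer(x, workspace):
        return [xi + shift + 0.5 * wi for xi, wi in zip(x, workspace)]
    return layer

def uca_forward(tokens, n_layers=6, n_recurrence=3):
    layers = [make_layer(0.1 * i) for i in range(n_layers)]
    workspace = [0.0] * len(tokens)      # empty "conscious buffer"
    h = list(tokens)
    for _ in range(n_recurrence):        # R recurrence steps
        h = list(tokens)                 # re-process input each pass
        for layer in layers:             # forward sweep L1..L6
            h = layer(h, workspace)
        # Broadcast: the top-layer representation becomes globally
        # available to every layer on the next pass.
        workspace = h
    return h

out = uca_forward([1.0, 2.0])
```

With more recurrence steps, the workspace accumulates context and shifts every layer's output, which is the intended effect of global broadcast.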

Figure 1 — UCA Layered Architecture. Six layers are stacked from L6 Metacognitive Self (self-model · confidence · errors) at the top, through L5 Prefrontal Executive (quantum superposition · planning), L4 Association Cortex (temporal shards · semantics), L3 Hippocampal Complex (episodic memory · salience), and L2 Limbic System (emotional valence · arousal), down to L1 Sensory Cortex (token embeddings · attention), all coupled to the Global Workspace (conscious buffer · broadcast hub).

Mechanisms

  • Recurrent Loop — R steps of forward + backward flow
  • Global Workspace — broadcasts to all layers every step
  • Superposition — n parallel interpretation states

Data Flow

Input tokens → L1 Sensory → L2 Emotional gate → L3 Memory recall → L4 Temporal shard → L5 Superposition → L6 Metacognition → Global Workspace ⟲

Cognitive Layer Descriptions

L1 · Sensory Cortex

Token embeddings and positional encodings processed through multi-head self-attention and feed-forward networks with residual connections. Extracts basic linguistic features.

L2 · Limbic System

Emotional valence (positive–negative), arousal (calm–excited), and dominance (controlled–controlling) computed per token. Emotional states gate token embeddings to modulate higher-layer influence.
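The gating step can be illustrated with a small sketch. Everything here is an assumed simplification: embeddings are plain float lists, the VAD estimate is given rather than computed, and only arousal drives the gate (the paper does not specify the gating function).

```python
import math

# Illustrative limbic gating: a sigmoid gate derived from a token's
# arousal scales how strongly its embedding reaches higher layers.
# The VAD tuple and gate form are stand-ins, not the trained model's.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def limbic_gate(embedding, vad):
    valence, arousal, dominance = vad
    # High-arousal tokens pass through more strongly (gate near 1);
    # at arousal = 0 the gate is exactly 0.5.
    gate = sigmoid(arousal)
    return [gate * e for e in embedding]

calm = limbic_gate([1.0, -2.0], vad=(0.2, 0.0, 0.1))
excited = limbic_gate([1.0, -2.0], vad=(0.2, 4.0, 0.1))
```

In this toy form, an emotionally charged token retains nearly its full magnitude while a neutral one is attenuated, which is the modulation the layer description implies.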

L3 · Hippocampal Complex

External memory matrix storing past experiences as key–value pairs. Emotional salience gates memory writes. Top-k retrieval via cosine similarity enables episodic recall with a pruning mechanism for capacity management.
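A minimal sketch of this memory, under assumed simplifications (plain float vectors as keys, a fixed salience threshold, and pruning of the lowest-salience entry when capacity is exceeded):

```python
import math

# Illustrative episodic memory: salience-gated writes, top-k recall by
# cosine similarity, and capacity pruning. Thresholds are stand-ins.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class EpisodicMemory:
    def __init__(self, capacity=5000, salience_threshold=0.5):
        self.capacity = capacity
        self.salience_threshold = salience_threshold
        self.entries = []  # list of (key, value, salience)

    def write(self, key, value, salience):
        if salience < self.salience_threshold:
            return False               # not salient enough to store
        self.entries.append((key, value, salience))
        if len(self.entries) > self.capacity:
            # Prune the least emotionally salient memory.
            self.entries.remove(min(self.entries, key=lambda e: e[2]))
        return True

    def recall(self, query, k=3):
        ranked = sorted(self.entries,
                        key=lambda e: cosine(query, e[0]), reverse=True)
        return [value for _, value, _ in ranked[:k]]

mem = EpisodicMemory(capacity=2)
mem.write([1.0, 0.0], "river bank", salience=0.9)
mem.write([0.0, 1.0], "money bank", salience=0.8)
mem.write([0.5, 0.5], "boring fact", salience=0.1)  # rejected: low salience
top = mem.recall([1.0, 0.1], k=1)
```

A query vector close to the "river bank" key retrieves that episode first, the kind of context-dependent recall the qualitative examples in Section 5 rely on.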

L4 · Association Cortex

Temporal sharding: multiple parallel representations scaled by different time constants. Weighted combination by learned context weights allows the model to hold time-stamped knowledge and adjust to historical or contemporary context.
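A sketch of the weighted combination, with assumed details: each shard scales the input by a different time constant before a tanh nonlinearity, and the "learned" context weights are stand-in softmax logits.

```python
import math

# Illustrative temporal sharding: S parallel representations of the
# same input, each with its own time constant, mixed by softmax
# context weights. The shard form tanh(x / tau) is an assumption.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def temporal_combine(x, time_constants, context_logits):
    shards = [[math.tanh(xi / tau) for xi in x] for tau in time_constants]
    w = softmax(context_logits)  # context-dependent mixture weights
    return [sum(w[s] * shards[s][i] for s in range(len(shards)))
            for i in range(len(x))]

# Context logits favoring the slowest shard ("historical" context).
out = temporal_combine([1.0, -1.0],
                       time_constants=[1.0, 4.0, 16.0],
                       context_logits=[0.0, 0.0, 5.0])
```

Shifting the context logits toward a fast or slow shard is what lets the model answer the same query under a contemporary or historical framing, as in the Pluto example of Section 5.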

L5 · Prefrontal Executive

Quantum-inspired superposition: n parallel interpretation states per token interact via a learnable interference matrix. High uncertainty preserves superposition; low uncertainty collapses states to a single interpretation, avoiding early commitment.
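The collapse rule can be sketched as follows; the Born-style squared-amplitude probabilities and the entropy threshold are illustrative assumptions, not the paper's exact formulation.

```python
import math

# Illustrative collapse rule for the quantum-inspired executive:
# keep all n interpretation states while uncertainty (entropy of the
# state distribution) is high; collapse to the dominant state when
# uncertainty drops below a threshold.

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def maybe_collapse(states, amplitudes, threshold=0.5):
    sq = [a * a for a in amplitudes]          # squared amplitudes
    total = sum(sq)
    probs = [s / total for s in sq]
    if entropy(probs) < threshold:
        # Low uncertainty: commit to the dominant interpretation.
        return [states[probs.index(max(probs))]]
    return states          # high uncertainty: preserve superposition

ambiguous = maybe_collapse(["river", "money"], [0.7, 0.71])
confident = maybe_collapse(["river", "money"], [0.99, 0.05])
```

With near-equal amplitudes both readings survive; once one amplitude dominates, the state collapses, which is the "avoiding early commitment" behavior described above.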

L6 · Metacognitive Self

Receives inputs from all prior layers to produce per-token confidence, error-type classification (none / hallucination / contradiction), hallucination risk score, and an intervention signal that can modify lower-layer outputs when a problem is detected.
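The decision logic can be sketched as a simple rule; the thresholds and the mapping from low confidence to a "contradiction" label are stand-ins I introduce for illustration, since the paper does not specify how the classifier's heads combine.

```python
# Illustrative metacognitive head: from per-token confidence and
# hallucination risk, pick an error type and decide whether to emit
# an intervention signal. Thresholds and rules are assumptions.

def metacognitive_check(confidence, hallucination_risk,
                        risk_threshold=0.8, conf_threshold=0.3):
    if hallucination_risk > risk_threshold:
        error_type = "hallucination"
    elif confidence < conf_threshold:
        error_type = "contradiction"
    else:
        error_type = "none"
    intervene = error_type != "none"   # modify lower layers if flagged
    return error_type, intervene

ok = metacognitive_check(confidence=0.9, hallucination_risk=0.1)
risky = metacognitive_check(confidence=0.9, hallucination_risk=0.95)
```

In the trained model these signals are produced by learned heads rather than fixed thresholds; the sketch only shows how the three outputs relate.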

3.9 Training Objectives

  • Language Modeling Loss: standard cross-entropy on next-token prediction
  • Emotional Consistency Loss: encourages stable emotional embeddings across similar contexts
  • Memory Salience Loss: encourages storing memories with appropriate emotional weight
  • Metacognitive Losses: confidence calibration, error classification, and intervention regularization
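A natural way to combine these terms is a weighted sum; the lambda values below are illustrative placeholders, as the paper does not state its weighting.

```python
# Sketch of the combined training objective as a weighted sum of the
# four terms above. The lambda weights are illustrative stand-ins.

def total_loss(lm, emotional, salience, metacognitive,
               lambdas=(1.0, 0.1, 0.1, 0.2)):
    terms = (lm, emotional, salience, metacognitive)
    return sum(l * t for l, t in zip(lambdas, terms))

loss = total_loss(lm=3.74, emotional=0.2, salience=0.5, metacognitive=0.3)
```

Keeping the auxiliary weights small lets the language-modeling term dominate early training while the cognitive heads are still calibrating.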


4 Experimental Setup

We implemented UCA in PyTorch and trained a small-scale version on the WikiText-2 dataset to validate the architecture and observe cognitive behaviors. Training ran for 100,000 steps with batch size 16 on a single NVIDIA A100 40GB GPU, using the AdamW optimizer (lr=3e-4) and a cosine learning rate schedule. We tracked language modeling perplexity as well as cognitive metrics: average confidence, hallucination risk, memory usage, and intervention rate. For comparison, a standard transformer of similar size (6 layers, 256 hidden) was trained on the same data.
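As a reminder of how the headline metric is computed: perplexity is the exponential of the mean per-token cross-entropy (in nats), so the reported validation perplexity of 42.3 corresponds to a mean loss of ln(42.3) ≈ 3.74 nats per token. The token losses below are made-up illustrative values.

```python
import math

# Perplexity from per-token cross-entropy losses (in nats).

def perplexity(token_losses):
    mean_loss = sum(token_losses) / len(token_losses)
    return math.exp(mean_loss)

# Illustrative per-token losses near the reported operating point.
ppl = perplexity([3.7, 3.8, 3.74])
```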


5 Results

  • Validation Perplexity: 42.3 (+0.5 vs baseline), a small cost for richer cognition
  • Memory Utilization: ~60% of 5k capacity, biased toward emotionally salient tokens
  • Metacognitive Interventions: 2% of tokens, high hallucination-risk tokens only
  • Training: 100K steps on a single NVIDIA A100 40GB GPU, AdamW + cosine schedule, lr=3e-4

Qualitative Examples

Prompt: "The bank was steep, so I had to…"

  • Baseline: "…deposit my money." (hallucination: mixed river/money sense)
  • UCA: "…climb carefully." (correctly disambiguated to river bank via episodic memory retrieval)

Prompt: "Pluto is a…"

  • Baseline: "…dwarf planet." (correct but without temporal context)
  • UCA: "…was classified as a planet until 2006, when it was reclassified as a dwarf planet." (temporal sharding accessed both time-stamped facts)

Model Configuration

Parameter            Value
d_model              256
n_heads              8
n_layers             6
d_ff                 1,024
max_seq_len          128
vocab_size           50,000
n_recurrence_steps   3
n_temporal_shards    5
n_quantum_states     3
memory_capacity      5,000

Ablation Study

Variant                        PPL
UCA (full)                     42.3
Without Global Workspace       41.9
Without Quantum States         42.7
Without Metacognitive Losses   41.6
Baseline Transformer           41.8

6 Discussion

UCA demonstrates that it is possible to design a single neural architecture that integrates multiple cognitive functions inspired by the human brain. The preliminary experiments show that such integration does not come at a prohibitive cost in language modeling performance, and it yields emergent behaviors — uncertainty estimation, memory-guided generation, and temporal awareness — that are highly desirable for trustworthy AI.

6.1 Limitations

  • The current implementation is small-scale; scaling to billions of parameters would be needed to compete with state-of-the-art LLMs.
  • Cognitive metrics are based on internal signals; validation against human judgments is required.
  • Episodic memory is still far from the richness of human autobiographical memory.

6.2 Future Work

  • Scale up using mixture-of-experts to keep computation manageable.
  • Incorporate RLHF to align the metacognitive layer with human preferences.
  • Extend to multimodal inputs (vision, audio) for a truly unified cognitive agent.
  • Implement long-term memory consolidation via offline replay, analogous to sleep.
  • Address ethical implications of self-aware AI models.

7 Conclusion

We have presented the Unified Cognitive Architecture (UCA), a brain-inspired LLM that unifies sensory, emotional, episodic, semantic, executive, and metacognitive processing within a single recurrent neural system with a global workspace. Our implementation and small-scale experiments show that UCA can match the language modeling performance of a standard transformer while exhibiting valuable cognitive behaviors such as uncertainty estimation, temporal reasoning, and self-monitoring. UCA represents a step toward AI systems that are not only more capable but also more aligned with human-like cognition and trustworthiness. We release our code to encourage further research in this direction.

Acknowledgments. We thank the open-source community for providing the tools that made this work possible, and the many researchers whose foundational insights inspired this architecture. This research was supported by independent funding.

