The Mind's Eye · A Field Study · 2026 · Edition One

Some people think in pictures. Others think in words. The cortex spends roughly thirty percent of its real estate on vision and less than ten on language — and yet we narrate thought as if it were a sentence. This is a field study of the gap.

FILED: 2026 · Edition One
SUBJECTS: Einstein · Tesla · da Vinci · Grandin · Faraday · Kekulé
SCOPE: Cognition · Neuroanatomy · AI
RUN TIME: ~12 minutes of slow reading
I

The Spectrum: Between Two Poles

§ 01
Visual ↔ Verbal
Visualizer: thinks in pictures, scenes, diagrams
Verbalizer: thinks in propositions, sentences, names

No one is purely one or the other. Most minds blend — drafting a sentence while a faint image flickers, or rotating a mental object while sub-vocalising its name. But the dominant gear differs from person to person, and the cost of being miscast is real: the visualizer who is taught to "show your work" in equations alone, the verbalizer who is told to "see it in your head" before they can. Schools, workplaces, and cultures often privilege the verbal pole because it is easier to grade. The visualizer's strongest asset — direct, parallel, manipulable mental scenes — is largely invisible to a marker.

A useful asymmetry to remember: language is serial, vision is parallel. A sentence arrives one word at a time; a scene arrives all at once. This asymmetry shows up in everything from how children solve geometry, to how engineers debug systems, to how AI models do — or don't — imagine.

II

The Visualizers: Those Who Think in Pictures

§ 02
Six Case Studies
PHYSICS · 20C

Albert Einstein

1879 – 1955 · Germany / USA

"Words and language, written or spoken, do not seem to play any role in my mechanism of thought."

Special relativity began with a thought experiment: a sixteen-year-old Einstein imagined himself riding alongside a beam of light. The mathematics came later — many years later, by his own account. He repeatedly described his cognition as combinatorial play with "more or less clear images"; words were a translation step, not the substrate.

ELECTRICAL · 19–20C

Nikola Tesla

1856 – 1943 · Croatia / USA

"I do not need any models, drawings or experiments. I can picture them all in my mind."

Tesla famously claimed to build and run machines in his head: running them mentally for weeks, then disassembling them to inspect for wear before any metal was cut. Allowing for embellishment, his own account in "My Inventions" is that the devices, once built, regularly worked as he had pictured them.

RENAISSANCE

Leonardo da Vinci

1452 – 1519 · Italy

"Painting is poetry that is seen rather than felt, and poetry is painting that is felt rather than seen."

Leonardo's notebooks are not illustrated text — they are visual reasoning with text annotations. Anatomy, hydraulics, flight, optics: each investigated by drawing, each conclusion drawn from the drawing rather than from prose. He called the eye "the window of the soul" — for him it was also the workbench.

ANIMAL SCIENCE · 20–21C

Temple Grandin

b. 1947 · USA

"My mind is similar to an Internet search engine, set to locate photos."

Grandin's autism is bound up with one of the most extreme visual cognitions on record. She designed humane livestock-handling facilities by walking, mentally, through the animal's eye-line, anticipating shadows, reflections, and movement vectors that a verbal-dominant designer would simply not notice. By commonly cited estimates, about half of US cattle now pass through systems she designed.

ELECTROMAGNETISM · 19C

Michael Faraday

1791 – 1867 · England

"I have far more confidence in the one man who works mentally and bodily at a matter than in the six who only talk about it."

Faraday had no formal mathematical training. His field-line visualization — invisible curves of force filling space — was a working tool he could literally see in his mind's eye. Maxwell, mathematically gifted but visually appreciative, later wrote the field equations precisely because Faraday had already seen the field.

CHEMISTRY · 19C

August Kekulé

1829 – 1896 · Germany

"I saw the atoms gambolling before my eyes… one of the snakes had seized hold of its own tail."

Kekulé deduced the ring structure of benzene from a daydream — a snake biting its own tail. Whether this is a true reverie or an after-the-fact narrative, the underlying cognitive move is canonical: structural insight arriving as image first, formula second.

"Most of the fundamental ideas of science are essentially simple, and may, as a rule, be expressed in a language comprehensible to everyone."

Albert Einstein · 1938
III

The Cortical Atlas: The Cortex's Division of Labor

§ 03
Where the cortex spends its acres
  • Visual Cortex (seeing & imagining): 25–30%
  • Prefrontal (planning · self · meta): 25–30%
  • Motor / Somatosensory (body & action): 15–20%
  • Language Regions (Broca · Wernicke et al.): 5–10%
  • Auditory (tones · phonemes): 3–8%
  • Olfactory (smell): < 1%

What the budget tells us

Three things jump out. First: vision is the dominant tenant. A quarter to a third of the cortical sheet is dedicated to processing what the eye delivers — far more than any other modality. Second: language is small by comparison. The classical Broca and Wernicke regions, even with their extended networks, are a thin slice of the pie. Third: language is not even a unitary subsystem — it borrows generously from motor regions (for articulation), auditory regions (for phonology), and the prefrontal cortex (for syntax and planning).

The naive folk theory, "thought = inner speech", puts one of the smaller dedicated footprints on the throne: at 5–10%, language holds only a fraction of the territory given to vision. The richer truth is that most of cognition runs below the speech layer, in regions evolution invested in far more heavily.
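The arithmetic behind the budget can be checked directly. The sketch below is a minimal tally, assuming the rough ranges from the § 03 table and treating "< 1%" as the range 0 to 1. The names `CORTICAL_BUDGET`, `total_range`, and `ratio_range` are illustrative only and do not come from any neuroanatomy toolkit.

```python
# Approximate shares of the cortical sheet as (low %, high %) ranges,
# copied from the § 03 table. "< 1%" is entered as (0, 1): an assumption.
CORTICAL_BUDGET = {
    "Visual": (25, 30),
    "Prefrontal": (25, 30),
    "Motor / Somatosensory": (15, 20),
    "Language": (5, 10),
    "Auditory": (3, 8),
    "Olfactory": (0, 1),
}


def total_range(budget=CORTICAL_BUDGET):
    """Sum the low ends and the high ends of every range."""
    lows = sum(low for low, _ in budget.values())
    highs = sum(high for _, high in budget.values())
    return lows, highs


def ratio_range(numerator, denominator, budget=CORTICAL_BUDGET):
    """Smallest and largest ratio consistent with two percentage ranges."""
    n_low, n_high = budget[numerator]
    d_low, d_high = budget[denominator]
    return n_low / d_high, n_high / d_low


if __name__ == "__main__":
    print("Listed regions cover %d-%d%% of the sheet" % total_range())
    low, high = ratio_range("Visual", "Language")
    print(f"Vision holds {low:.1f}x to {high:.1f}x the area of language")
```

On these figures vision holds between 2.5 and 6 times the area of language, and the listed rows sum to 73–99%, so the table reads as a rough budget rather than a complete partition of the cortex.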

IV

Language as Operating System

§ 04
Not a region — a layer

If language doesn't fit into one cortical district, what is it? Better metaphor: an operating system. Vision, motor, memory, emotion all run as native processes. Language is the shared interface through which they coordinate, schedule, and report — both internally (inner speech) and externally (conversation, instruction, writing).

L4 · APPS
Conscious Thought
Inner monologue, deliberation, narrative self, instruction-following.
L3 · OS
Language Layer
Coordinates and labels processes from below. Not a single region but a distributed protocol that lets vision, memory, motor, and emotion exchange tokens. This is where humans differ most from other primates.
L2 · KERNEL
Cortical Subsystems
Visual cortex, motor cortex, somatosensory, auditory, prefrontal. The major capabilities. Most cognition happens here, sub-linguistically.
L1 · HARDWARE
Brainstem & Thalamus
Heart rate, breath, sleep, arousal, sensory routing. The layer beneath the kernel: most of it never crosses into awareness.

The implication: a "verbal" thinker isn't running thought in the language layer alone — they are simply logging and routing more of it through the language API. A "visual" thinker keeps more cognition native to the visual subsystem, never serializing it into words. Both are using the whole stack; they differ in which calls cross the OS boundary.
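The stack metaphor can be made concrete with a toy model. This is a minimal sketch under stated assumptions: `Thought`, `Thinker`, `verbalized`, and `crossing_rate` are hypothetical names invented for illustration, and the "subsystems" are just string labels. It shows two thinkers receiving the same stream of thoughts and differing only in which calls cross into the language layer.

```python
from dataclasses import dataclass, field


@dataclass
class Thought:
    subsystem: str  # L2 subsystem that produced it, e.g. "visual"
    content: str    # stand-in for a native, non-verbal representation


@dataclass
class Thinker:
    name: str
    verbalized: frozenset                               # subsystems routed through language
    inner_speech: list = field(default_factory=list)    # L3: serialized into words
    native: list = field(default_factory=list)          # L2: never put into words

    def think(self, thought: Thought) -> None:
        if thought.subsystem in self.verbalized:
            # The call crosses the OS boundary: it is labelled and logged.
            self.inner_speech.append(f"[{thought.subsystem}] {thought.content}")
        else:
            self.native.append(thought)

    def crossing_rate(self) -> float:
        """Fraction of thoughts that were serialized into language."""
        total = len(self.inner_speech) + len(self.native)
        return len(self.inner_speech) / total if total else 0.0


STREAM = [
    Thought("visual", "the bracket, seen from below"),
    Thought("visual", "the bracket rotated a quarter turn"),
    Thought("motor", "how the wrench would have to move"),
    Thought("memory", "the last bracket that failed"),
]

verbalizer = Thinker("verbalizer", frozenset({"visual", "memory"}))
visualizer = Thinker("visualizer", frozenset({"memory"}))
for thinker in (verbalizer, visualizer):
    for thought in STREAM:
        thinker.think(thought)
    print(thinker.name, thinker.crossing_rate())
```

Both thinkers process all four thoughts; the verbalizer serializes three of them and the visualizer one. The difference sits in the routing table, not in which layers exist.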

V

The Machine Mind

§ 05
The asymmetry inside AI
VERBAL CHANNEL

Large Language Models

GPT · Claude · Gemini · Llama

Trained almost entirely on text. The medium of cognition is the token sequence. Even when given image inputs (multimodal LLMs), the picture is collapsed to embeddings and processed inside the language scaffolding.

  • Strong at: reasoning sequentially, citing, summarising, instructing.
  • Weak at: rotating an object, simulating physics, sketching a layout, "seeing" a scene before describing it.
  • Has tools that can generate images. Lacks the inner workspace where images can be manipulated mid-thought.

Effectively: a brilliant verbalizer with no mind's eye. Excellent at tasks that compress to language. Brittle at tasks that don't.

VISUAL CHANNEL

Diffusion & Vision Models

Stable Diffusion · DALL·E · Sora · Flux

Operate in pixel and latent visual space. Genuinely fluent in images, motion, lighting. Yet they don't understand what they generate — there is no propositional model, no symbolic reasoning, no goal-directed planning.

  • Strong at: rendering, style transfer, photorealism, motion synthesis.
  • Weak at: counting fingers, holding a multi-step goal, refusing an impossible request, explaining their own work.
  • Pure visual cortex without prefrontal cortex — vivid, occasionally hallucinatory, never reflective.

Effectively: a visualizer with no inner narrator. Conjures scenes but can't critique them.

The two halves of the AI ecosystem map suspiciously well onto the two thinking modes, and neither half, alone, can do what ordinary human cognition does in a coffee-shop conversation.

VI

What a True Mind's Eye Would Need: The Road to a Machine Mind's Eye

§ 06
Six requirements
01

A Manipulable Visual Workspace

An internal canvas the model can both generate on and read from. Not "produce an image and forget" — but place an object, rotate it, occlude it, query its new geometry, all without leaving the workspace.
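What "place, rotate, occlude, query" might mean operationally can be sketched on a toy grid. This is an illustration under loud assumptions, not a description of any existing model: the `Workspace` class, its method names, and the choice of grid cells with insertion order as depth are all hypothetical.

```python
class Workspace:
    """A toy canvas: each object is a set of (x, y) grid cells, y pointing up."""

    def __init__(self):
        # Insertion order doubles as depth: later objects sit in front.
        self.objects = {}

    def place(self, name, cells):
        self.objects[name] = set(cells)

    def rotate_cw(self, name):
        """Rotate 90 degrees clockwise about the origin: (x, y) -> (y, -x)."""
        self.objects[name] = {(y, -x) for x, y in self.objects[name]}

    def bounding_box(self, name):
        """Query geometry: (min_x, min_y, max_x, max_y) of the object."""
        xs = [x for x, _ in self.objects[name]]
        ys = [y for _, y in self.objects[name]]
        return min(xs), min(ys), max(xs), max(ys)

    def visible(self, name):
        """Cells of `name` not occluded by objects placed after it."""
        names = list(self.objects)
        in_front = names[names.index(name) + 1:]
        hidden = set().union(*(self.objects[n] for n in in_front))
        return self.objects[name] - hidden


ws = Workspace()
ws.place("bar", {(0, 0), (0, 1), (0, 2), (1, 0)})   # an upright L shape
ws.rotate_cw("bar")                                  # now lying on its side
ws.place("screen", {(1, 0), (2, 0)})                 # something in front of it
print(ws.bounding_box("bar"))
print(sorted(ws.visible("bar")))
```

The point of the sketch is that every step reads back from the same state it wrote to: the rotated shape is still there to be occluded and measured, rather than being regenerated from a description each time.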

02

Bidirectional Language ↔ Vision Coupling

Not just text-to-image. A model that, when reasoning verbally about a system, can spawn a sketch, inspect it, edit it, and let the inspection update its proposition. And vice versa — let an image's contents shape the next sentence.

03

An Object Permanence Memory

Mental objects must persist between thoughts. Tesla claimed to run machines in his head for weeks; current image generators keep no memory of a scene from one generation to the next unless it is fed back in as input. Without persistence there is no simulation, no design, no mental engineering.

04

Embodied Spatial Priors

Real visualizers learn space by moving through it. AI agents that train in simulated and physical environments — robotics, drones, embodied research labs — accumulate the spatial intuitions a chat-only model never can. You learn the world by bumping into it.

05

Symbolic ↔ Iconic Translation

The capacity to swap between a propositional representation ("the lever rotates 30° around point P") and a vivid image of the same — and to debug discrepancies between the two. This is what Faraday and Maxwell did between them; future AI may need to do it inside one model.
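The lever example can be worked through numerically. The sketch below assumes the rotation is counter-clockwise (the sentence does not say) and picks arbitrary coordinates for P and the lever tip. The function names are illustrative. It turns the proposition into a position, reads an angle back out of a drawing, and reports the gap between the two.

```python
import math


def rotate_about(point, pivot, degrees):
    """Rotate `point` counter-clockwise around `pivot` by `degrees`."""
    theta = math.radians(degrees)
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]
    return (
        pivot[0] + dx * math.cos(theta) - dy * math.sin(theta),
        pivot[1] + dx * math.sin(theta) + dy * math.cos(theta),
    )


def drawn_angle(before, after, pivot):
    """Counter-clockwise angle, in degrees, that the picture actually shows."""
    a0 = math.atan2(before[1] - pivot[1], before[0] - pivot[0])
    a1 = math.atan2(after[1] - pivot[1], after[0] - pivot[0])
    return (math.degrees(a1 - a0) + 180) % 360 - 180  # wrap into [-180, 180)


def discrepancy(claimed_degrees, before, after, pivot):
    """How far the drawing departs from the proposition, in degrees."""
    return abs(drawn_angle(before, after, pivot) - claimed_degrees)


P = (1.0, 1.0)                         # pivot (assumed coordinates)
tip = (3.0, 1.0)                       # lever tip before the move
expected = rotate_about(tip, P, 30)    # what the sentence implies
sketched = (1.0, 3.0)                  # what a (wrong) drawing shows
print(expected)
print(discrepancy(30, tip, expected, P))
print(discrepancy(30, tip, sketched, P))
```

With these coordinates the sentence puts the tip at roughly (2.73, 2.0), while the faulty sketch shows a quarter turn, 60° away from what was claimed. Debugging the discrepancy means deciding which representation to trust.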

06

A Self-Critique Loop That Operates in Both Channels

Reasoning models already self-critique in text. The next step is critique in images — a model that looks at its own generated diagram and flags "this gear can't actually mesh" or "the perspective is impossible". Currently the visual side hallucinates with confidence because nothing inside is calling it out.
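A critique such as "this gear can't actually mesh" reduces to a geometric check once the diagram is available as structured data. The sketch below makes that large assumption: it starts from gear parameters, not pixels. It uses the standard spur-gear relations (pitch diameter = module × tooth count; two gears mesh only if their modules match and their pitch circles touch). The `Gear` class, the `critique` function, and the tolerance value are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Gear:
    name: str
    teeth: int
    module: float    # mm of pitch diameter per tooth
    center: tuple    # (x, y) position in mm

    @property
    def pitch_radius(self) -> float:
        return self.module * self.teeth / 2


def critique(a: Gear, b: Gear, tolerance: float = 0.05) -> list:
    """Return objections to a drawn pair of spur gears (empty list = plausible)."""
    problems = []
    if not math.isclose(a.module, b.module):
        problems.append(f"{a.name} and {b.name} have different modules; the teeth cannot mesh")
    gap = math.dist(a.center, b.center) - (a.pitch_radius + b.pitch_radius)
    if gap > tolerance:
        problems.append(f"pitch circles are {gap:.1f} mm apart; the gears never touch")
    elif gap < -tolerance:
        problems.append(f"pitch circles overlap by {-gap:.1f} mm; the gears collide")
    return problems


good = critique(Gear("A", 20, 2.0, (0, 0)), Gear("B", 40, 2.0, (60, 0)))
bad = critique(Gear("A", 20, 2.0, (0, 0)), Gear("C", 40, 2.5, (50, 0)))
print(good)
print(bad)
```

The first pair passes with no objections; the second draws two, a module mismatch and a 20 mm overlap. The hard part for a real system is the step this sketch skips: recovering the parameters from its own generated image.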

VII

Test Yourself

§ 07
Eight informal questions
Q1.
When you remember the front door of your childhood home, do you see it (paint colour, handle, smell of the wood) or recall facts about it (it was wooden, slightly faded blue, north-facing)? Visualizers see. Verbalizers list.
Q2.
Picture the letter R, then mentally rotate it 90° clockwise. How long did that take? Could you read off the new shape? Strong visualizers do this in under a second; strong verbalizers describe the operation rather than perform it.
Q3.
When you read fiction, do you see the scene like a film, or do you experience it as language flowing past? Some lifelong readers report no imagery at all (aphantasia). Others see continuously.
Q4.
When you plan your morning, do you watch yourself moving through the kitchen, or do you list the steps? Either is fine. Most people do some of both — but rarely in equal proportion.
Q5.
When solving an unfamiliar geometry problem, do you draw a figure first, or write the equations first? The famous "stop and draw" of mathematicians is a visualizer's reflex.
Q6.
Do you talk to yourself in your head most of the time, occasionally, or almost never? Inner speech is far less universal than introspection suggests.
Q7.
When given new directions to a place, do you find them easier as a map or as a list of turns? Maps for visualizers, turns for verbalizers — and the difference is often dramatic.
Q8.
When trying to recall a name that just escaped you, do you see the person's face, hear their voice, or get a feeling of the shape of the name ("starts with M, two syllables")? Different retrieval pathways reveal which sub-system tagged the memory in the first place.
Counted more "see" answers — you lean visualizer. More "list / words" — verbalizer. Mixed — almost everyone.
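The tally above is simple enough to write down. In this minimal sketch the answer labels ("see", "words", "both") and the rule that a tie counts as mixed are assumptions made for illustration; the questionnaire itself is informal and unvalidated.

```python
from collections import Counter

VALID = {"see", "words", "both"}


def lean(answers):
    """Classify a run of answers to Q1-Q8 by simple majority."""
    unknown = set(answers) - VALID
    if unknown:
        raise ValueError(f"unrecognised answers: {sorted(unknown)}")
    counts = Counter(answers)
    if counts["see"] > counts["words"]:
        return "leans visualizer"
    if counts["words"] > counts["see"]:
        return "leans verbalizer"
    return "mixed"


print(lean(["see", "see", "words", "see", "both", "see", "words", "see"]))
```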
VIII

Further Reading

§ 08
Books · papers · primary sources

Books

  • Thinking in Pictures (Temple Grandin, 1995): The first-person account that made visual cognition mainstream
  • Visual Thinking (Temple Grandin, 2022): The updated argument: society undervalues object visualizers
  • The Mind's Eye (Oliver Sacks, 2010): Case studies of vision lost, vision recovered, vision absent
  • The Master and His Emissary (Iain McGilchrist, 2009): The two-hemisphere thesis: extended, controversial, generative
  • Hare Brain, Tortoise Mind (Guy Claxton, 1997): The case for slower, image-based, sub-verbal cognition
  • Drawing on the Right Side of the Brain (Betty Edwards, 1979): How learning to draw rewires the verbal-dominant adult

Primary Sources

  • Einstein letter to Hadamard (1945): "Words and language do not seem to play any role in my mechanism of thought."
  • Tesla, "My Inventions" (1919): The autobiographical account of building machines in the mind's eye
  • Faraday's Diary (1820–62): Field-line drawings as primary research instruments
  • Galton, "Statistics of Mental Imagery" (1880): The first survey to discover that some scientists imagined nothing at all
  • Kekulé's Berlin Address (1890): The famous benzene-snake reverie, recounted in his own voice

Modern Research

  • Aphantasia: a clinical update (Adam Zeman, 2021): The newly named condition of mental-image absence
  • Mental rotation tasks (Shepard & Metzler, 1971): The first quantification of visual reasoning speed
  • Spatial thinking & STEM outcomes (Wai, Lubinski, Benbow, 2009): Longitudinal follow-up of Project TALENT participants over 11+ years: spatial ability predicts science attainment
  • Cortical magnification & visual percepts (Daniel & Whitteridge, 1961): Quantifying just how much cortex the eye claims
  • Inner speech variability (Hurlburt & Heavey, 2018): Random-sample studies showing inner speech is far rarer than reported
  • Diffusion vs reasoning models (research literature, 2022–): The asymmetry described in § 05 is an active research frontier