Some people think in pictures. Others think in words. The cortex spends roughly thirty percent of its real estate on vision and less than ten on language — and yet we narrate thought as if it were a sentence. This is a field study of the gap.
No one is purely one or the other. Most minds blend — drafting a sentence while a faint image flickers, or rotating a mental object while sub-vocalising its name. But the dominant gear differs from person to person, and the cost of being miscast is real: the visualizer who is taught to "show your work" in equations alone, the verbalizer who is told to "see it in your head" before they can. Schools, workplaces, and cultures often privilege the verbal pole because it is easier to grade. The visualizer's strongest asset — direct, parallel, manipulable mental scenes — is largely invisible to a marker.
A useful asymmetry to remember: language is serial, vision is parallel. A sentence arrives one word at a time; a scene arrives all at once. This asymmetry shows up in everything from how children solve geometry, to how engineers debug systems, to how AI models do — or don't — imagine.
"Words and language, written or spoken, do not seem to play any role in my mechanism of thought."
Special relativity began with a thought experiment: a sixteen-year-old Einstein imagined himself riding alongside a beam of light. The mathematics came later — many years later, by his own account. He repeatedly described his cognition as combinatorial play with "more or less clear images"; words were a translation step, not the substrate.
"I do not need any models, drawings or experiments. I can picture them all in my mind."
Tesla famously claimed to build machines entirely in his head — running them mentally for weeks, then disassembling them to inspect for wear, all before any metal was cut. Whatever the embellishment, his contemporaries attested that working prototypes regularly emerged the first time he committed a design to materials.
"Painting is poetry that is seen rather than felt, and poetry is painting that is felt rather than seen."
Leonardo's notebooks are not illustrated text — they are visual reasoning with text annotations. Anatomy, hydraulics, flight, optics: each investigated by drawing, each conclusion drawn from the drawing rather than from prose. He called the eye "the window of the soul" — for him it was also the workbench.
"My mind is similar to an Internet search engine, set to locate photos."
Grandin's autism is bound up with one of the most extreme cases of visual cognition on record. She designed humane livestock-handling facilities by walking, mentally, through the animal's eye-line — anticipating shadows, reflections, and movement vectors that a verbal-dominant designer would simply not notice. Half of US cattle now pass through systems she designed.
"I have far more confidence in the one man who works mentally and bodily at a matter than in the six who only talk about it."
Faraday had no formal mathematical training. His field-line visualization — invisible curves of force filling space — was a working tool he could literally see in his mind's eye. Maxwell, mathematically gifted and receptive to Faraday's pictures, later wrote the field equations precisely because Faraday had already seen the field.
"I saw the atoms gambolling before my eyes… one of the snakes had seized hold of its own tail."
Kekulé said the ring structure of benzene came to him in a daydream — a snake biting its own tail. Whether that was a genuine reverie or an after-the-fact narrative, the underlying cognitive move is canonical: structural insight arriving as image first, formula second.
"Most of the fundamental ideas of science are essentially simple, and may, as a rule, be expressed in a language comprehensible to everyone."
Albert Einstein · 1938

Three things jump out. First: vision is the dominant tenant. A quarter to a third of the cortical sheet is dedicated to processing what the eye delivers — far more than any other modality. Second: language is small by comparison. The classical Broca and Wernicke regions, even with their extended networks, are a thin slice of the pie. Third: language is not even a unitary subsystem — it borrows generously from motor regions (for articulation), auditory regions (for phonology), and the prefrontal cortex (for syntax and planning).
The naive folk theory — "thought = inner speech" — amounts to the modality with the smallest dedicated footprint claiming the throne. The richer truth is that most of cognition runs below the speech layer, in regions evolution invested in vastly more aggressively.
If language doesn't fit into one cortical district, what is it? Better metaphor: an operating system. Vision, motor, memory, emotion all run as native processes. Language is the shared interface through which they coordinate, schedule, and report — both internally (inner speech) and externally (conversation, instruction, writing).
The implication: a "verbal" thinker isn't running thought in the language layer alone — they are simply logging and routing more of it through the language API. A "visual" thinker keeps more cognition native to the visual subsystem, never serializing it into words. Both are using the whole stack; they differ in which calls cross the OS boundary.
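To make the metaphor concrete, here is a toy sketch in Python; every name in it is invented, a cartoon of the claim rather than a cognitive model.

```python
# Toy illustration of the OS metaphor. Every name here is invented;
# this is a cartoon of the claim, not a cognitive model.

class VisualSubsystem:
    def simulate(self, scene):
        # Native, parallel work: the result is a structured scene,
        # not a sentence.
        return {"object": scene, "rotated": True, "fits": True}

class LanguageAPI:
    def serialize(self, result):
        # Crossing the OS boundary: a parallel result is flattened
        # into a serial report.
        return f"The {result['object']} fits after rotation."

vision, language = VisualSubsystem(), LanguageAPI()
native = vision.simulate("sofa")

# A "verbal" thinker routes the step through the language layer;
# a "visual" thinker keeps operating on `native` directly.
print(language.serialize(native))   # "The sofa fits after rotation."
```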
Large language models are trained almost entirely on text; the medium of cognition is the token sequence. Even when given image inputs, as multimodal LLMs are, the picture is collapsed into embeddings and processed inside the language scaffolding.
Effectively: a brilliant verbalizer with no mind's eye. Excellent at tasks that compress to language. Brittle at tasks that don't.
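For the curious, this is roughly what "collapsed into embeddings" means in practice. The sketch below follows the common LLaVA-style projection pattern; all shapes, names, and random stand-ins are assumptions for illustration, not any specific model's internals.

```python
# Minimal sketch of "collapsed into embeddings": a LLaVA-style linear
# projection from vision-encoder patches into the text-token embedding
# space. All shapes and the random stand-ins are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def vision_encoder(image):
    # Stand-in for a ViT: any image becomes 196 patch vectors of width 768.
    return rng.normal(size=(196, 768))

def project(patches, d_model=4096):
    # The learned piece: a linear map into the LLM's embedding space.
    W = 0.01 * rng.normal(size=(768, d_model))
    return patches @ W

patch_tokens = project(vision_encoder("any pixels"))
text_tokens = rng.normal(size=(12, 4096))   # an embedded 12-token prompt

# From here on the transformer sees one serial sequence: the picture
# has become 196 pseudo-words inside the language scaffolding.
sequence = np.concatenate([patch_tokens, text_tokens], axis=0)
print(sequence.shape)   # (208, 4096)
```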
Image and video generators operate in pixel and latent visual space. Genuinely fluent in images, motion, lighting. Yet they don't understand what they generate — there is no propositional model, no symbolic reasoning, no goal-directed planning.
Effectively: a visualizer with no inner narrator. Conjures scenes but can't critique them.
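The other half, caricatured under the same caveat: the denoiser stub, shapes, and schedule below are invented, but the moral is faithful. The whole computation is tensor traffic, with no symbolic layer anywhere in the loop.

```python
# Minimal caricature of a latent diffusion sampling loop: the entire
# "thought" is a latent tensor being denoised. The denoiser stub,
# shapes, and schedule are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

def denoiser(z, t, prompt_embedding):
    # Stand-in for a trained U-Net / DiT: predicts the noise in z.
    return 0.1 * z + 0.01 * t * prompt_embedding

z = rng.normal(size=(4, 32, 32))       # pure noise latent
prompt = rng.normal(size=(4, 32, 32))  # embedded caption

for t in range(50, 0, -1):             # iterative denoising
    z = z - denoiser(z, t / 50.0, prompt)

# z now "is" the image (a VAE decode, not shown, turns it into pixels).
# At no point did a proposition, symbol, or plan exist in the loop.
print(z.shape)
```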
The two halves of the AI ecosystem map suspiciously well onto the two thinking modes — and neither half, alone, can do what ordinary human cognition does in a coffee-shop conversation.
An internal canvas the model can both generate on and read from. Not "produce an image and forget" — but place an object, rotate it, occlude it, query its new geometry, all without leaving the workspace.
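A toy version of such a workspace (hypothetical design, minimal sketch) makes the defining property visible: queries are answered from the rendered canvas itself, not from the commands that drew on it, and objects persist between operations.

```python
# A toy "mental canvas": a workspace the reasoner writes to and reads
# back from. The design is hypothetical; the point is that queries are
# answered from the rendered state, and objects persist between steps.
import numpy as np

class Canvas:
    def __init__(self, size=64):
        self.grid = np.zeros((size, size), dtype=bool)
        self.objects = {}                       # name -> (N, 2) points

    def place(self, name, points):
        self.objects[name] = np.asarray(points, dtype=float)
        self._render()

    def rotate(self, name, degrees, about):
        t = np.radians(degrees)
        R = np.array([[np.cos(t), -np.sin(t)],
                      [np.sin(t),  np.cos(t)]])
        self.objects[name] = (self.objects[name] - about) @ R.T + about
        self._render()

    def _render(self):
        # Re-rasterise everything: the canvas, not the command history,
        # is the source of truth.
        self.grid[:] = False
        for pts in self.objects.values():
            for x, y in np.rint(pts).astype(int):
                if 0 <= x < self.grid.shape[1] and 0 <= y < self.grid.shape[0]:
                    self.grid[y, x] = True

    def query(self, x, y):
        # "Look at" the workspace to answer a question about geometry.
        return bool(self.grid[y, x])

c = Canvas()
c.place("bar", [(10, 10), (20, 10), (30, 10)])   # a horizontal bar
c.rotate("bar", 90, about=np.array([10.0, 10.0]))
print(c.query(10, 30))   # True: after rotation the bar extends upward
```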
Not just text-to-image. A model that, when reasoning verbally about a system, can spawn a sketch, inspect it, edit it, and let the inspection update its proposition. And vice versa — let an image's contents shape the next sentence.
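A miniature of that loop, with the "image" reduced to a one-line geometric rendering. The scenario and numbers are invented; the point is the direction of information flow, where the inspection rather than the original sentence gets the last word.

```python
# A concrete miniature of the loop: a verbal claim is "sketched" into
# geometry, the sketch is inspected, and the inspection revises the
# claim. The scenario and numbers are invented for illustration.
import math

claim = "a 10x10 plate rotated 45 degrees still fits a 12-unit slot"

def rendered_width(side, degrees):
    # The one-line sketch: bounding width of the rotated square.
    t = math.radians(degrees)
    return side * (abs(math.cos(t)) + abs(math.sin(t)))

width = rendered_width(10, 45)        # inspect: ~14.14 units
if width > 12:
    # The image gets the last word; the proposition is updated.
    claim = f"the rotated plate is {width:.1f} units wide and does not fit"
print(claim)
```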
Mental objects must persist between thoughts. Tesla mentally ran a turbine for weeks; current visual generators forget the previous frame in milliseconds. Without persistence, no simulation, no design, no mental engineering.
Real visualizers learn space by moving through it. AI agents that train in simulated and physical environments — robotics, drones, embodied research labs — accumulate the spatial intuitions a chat-only model never can. You learn the world by bumping into it.
The capacity to swap between a propositional representation ("the lever rotates 30° around point P") and a vivid image of the same — and to debug discrepancies between the two. This is what Faraday and Maxwell did between them; future AI may need to do it inside one model.
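Worked out, the lever example looks like this: the propositional route is an exact rotation matrix, the imagistic route is simulated here as a noisy rendering, and the debugging step is the comparison between them. All values are illustrative.

```python
# The lever example, both ways. Propositional route: an exact rotation
# matrix. Imagistic route: a noisy stand-in for a rendered scene.
# Debugging is the comparison between them. Values are illustrative.
import numpy as np

P = np.array([2.0, 1.0])                        # pivot point P
t = np.radians(30)                              # "rotates 30 degrees"
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

lever = np.array([[2.0, 1.0], [6.0, 1.0]])      # lever endpoints

# What the sentence says should happen:
predicted = (lever - P) @ R.T + P

# What the "mind's eye" (here: a jittered rendering) shows:
rendered = predicted + np.random.default_rng(1).normal(0, 0.05, (2, 2))

# The dual-coding move: quantify and chase the discrepancy.
print(f"max mismatch: {np.abs(predicted - rendered).max():.3f} units")
```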
Reasoning models already self-critique in text. The next step is critique in images — a model that looks at its own generated diagram and flags "this gear can't actually mesh" or "the perspective is impossible". Currently the visual side hallucinates with confidence because nothing inside is calling it out.
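A toy of what such a critic might check. The rule below is real gear geometry (meshing spur gears must share a module, the pitch diameter divided by tooth count), but the drawn gears and the checker itself are invented for illustration.

```python
# Toy visual critic. The rule is real gear geometry (meshing spur
# gears must share a module = pitch diameter / tooth count); the drawn
# gears and the checker itself are invented for illustration.
def module(pitch_diameter, teeth):
    return pitch_diameter / teeth

gear_a = {"pitch_diameter": 40.0, "teeth": 20}   # module 2.0
gear_b = {"pitch_diameter": 45.0, "teeth": 30}   # module 1.5

if abs(module(**gear_a) - module(**gear_b)) > 1e-9:
    print("critique: these gears can't actually mesh")
```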