An Interactive Reading

The Fall of the
Theorem Economy

How AI could destroy mathematics and barely touch it — an explorable commentary on David Bessis's essay.

The product of mathematics is clarity and understanding. Not theorems, by themselves. Bill Thurston, 1982 Fields medallist

David Bessis's central claim is audacious and, if taken seriously, disorienting: for two millennia, mathematics has measured itself by the wrong unit. The honor code of mathematicians — prove theorems and shut up — made sense when theorem-proving was an excellent proxy for the deeper thing mathematics is actually about: the neuroplastic expansion of human understanding. AI is about to break that proxy.

This piece walks through the essay's key moves — the personal anecdote of a theorem never written, the diagnosis of an honor code, the comparison with closed games like chess, the three caveats that every outside observer misses, and the Overhang that LLMs are uniquely positioned to harvest — making each conceptual structure tangible, manipulable, explorable.

The prose is mine, the ideas are Bessis's, and the original essay repays reading in full. What follows is not a summary but an instrument — a device for thinking alongside the argument.

· · ·

01 · The Iceberg

The Theorem Never Written

On what Bessis calls his best theorem — the one he couldn't be bothered to prove.

Bessis opens with a confession. His best theorem crystallized one morning in Lausanne, clear enough that he jotted it as an informal remark at the bottom of a slide. He never wrote it up. His second-best — Theorem 0.5 in his preprint on Garside categories — was, in his own words, shockingly easy to prove. A few pages of pretty terrestrial group theory.

The hard part, he insists, was not proving it. It was not even conjecturing it. The hard part was intuiting that there should be something like Theorem 0.5, and building the conceptual framework in which it became trivial to state. Once the definitions were right, everything fell into place. The visible theorem is the tip of an iceberg — beneath it, a vast submerged structure of definitions, intuitions, refactors, dead ends, and the slow cognitive process of finding the right frame.

Drag the waterline to see what the honor code rewards, and what it cannot see:

Figure 01 — The Iceberg of Mathematical WorkInteractive

Waterline 18%

Drag to expose the submerged structure. The honor code only registers what floats above water. The theorem Bessis cared about most — the one he never wrote — is a tiny nub on a mountain of submerged conceptual work; even Theorem 0.5, which he did publish, is a pencil-tip atop 700 pages of necessary preliminaries.

Bessis's provocation: the David who solved the K(π,1) conjecture is a social parasite of the much better mathematician, the David who crafted Definitions 2.4 and 9.3. The system rewarded him for the theorem. But the theorem was just the visible reward ticket for a piece of conceptual work that was, in fact, the real contribution.

02 · Two Mathematics

Official Math, Secret Math

A dissociation so complete that most mathematicians barely notice it.

In his book Mathematica: a Secret World of Intuition and Curiosity, Bessis frames the public misunderstanding as a tension between two versions of mathematics.

Official math is the formal deduction system — a nerd's paradise where truth takes binary values, reasoning is valid or invalid, and there is technically no room for bullshit. It lives in textbooks, peer-reviewed journals, and Lean source files.

Secret math is the human part. Why official math was invented. How mathematicians successfully interact with it. Its effects on their brains. The bizarre mental techniques through which they continuously expand its territory. Secret math never made it to the curriculum, because it lacks the defining qualities of official math — it is soft, fuzzy, subjective — and by contrast looks like cheap pedagogical backstory.

Click on an activity to see how each version of mathematics weights it. The gap between the two columns is where the honor code does its damage.

Figure 02 — Two Valuations of Mathematical WorkInteractive

Official Math

Cold · Logical · Objective · Rewarded

Secret Math

Fuzzy · Subjective · Cognitive · Invisible

Hover or click an activity to see how the two realms value it differently.

The punchline: every activity a mathematician actually spends her day on — conjecturing, renaming, reframing, chasing intuitions, failed drafts, seminar arguments, half-lit hunches — scores high in Secret Math and near zero in Official Math. The honor code has institutionalized the denial of the former.

03 · The Honor Code

Hardy's Curse

Definitions are worth zero points. Exposition counts negative. The best jobs go to those who prove the hardest theorems.

G. H. Hardy, in his celebrated and insufferable Mathematician's Apology, wrote:

There is no scorn more profound, or on the whole more justifiable, than that of the men who make for the men who explain. Exposition, criticism, appreciation, is work for second-rate minds. G. H. Hardy, A Mathematician's Apology

Behind closed doors, mathematicians complain. They insist on the importance of teaching — even for their own comprehension. They lament the pathological obsession with theorem-proving priority. Yet in public, they are bound by the honor code. Prove theorems and shut up.

The honor code persisted for millennia because it worked: theorem-proving was a cryptographic proof of conceptual innovation. The two facets were in symbiosis. You could measure one and trust the other would follow. Bessis argues that AI has just broken that cryptographic link.

Adjust each activity's social reward. Try Hardy's distribution. Try Thurston's. Watch what kind of mathematician each ledger selects for:

Figure 03 — The Ledger of Social RewardInteractive

Preset

Hardy's default. Theorem-proving is the only activity that pays. The system selects for hard-problem solvers who can be trusted to have done the conceptual work silently, off the ledger. For centuries this worked — because theorem-proving was cryptographic proof of genuine conceptual innovation.

Solving a big conjecture was a cryptographic proof that you had come up with a genuine conceptual innovation. — But what if the cryptography is broken?

04 · Hinton's Claim

A Closed System?

Geoff Hinton said mathematics is like Go and Chess — a closed system with rules. He is wrong, and the wrongness matters.

The trigger for Bessis's essay was a remark by Geoff Hinton — Turing awardee, Nobel laureate — that mathematics is "particularly easy" terrain for AI because it is "a closed system with rules," "much like Go or Chess."

The comparison is seductive and dangerously wrong. Chess and Go aren't ordinary closed systems; they are finite two-person games with perfect information, and by Zermelo's 1913 theorem they admit optimal strategies. Mathematics has none of these properties.

Click through to compare:

Figure 04 — Three "Closed Systems" ComparedInteractive

The comparison Hinton made does not survive a single column. But the comparison is doing enormous political work. If mathematics is just another Go — a finite closed system awaiting AlphaZero — then the pitch writes itself: "DeepMind solved Go and chess, we're going to solve mathematics!" At a moment when frontier labs are betting trillions that humans are about to become obsolete, solving mathematics — the crown jewel, the pride of the human race — is simply irresistible.

But if mathematics isn't a game, there is no solution to converge on. There is only an ongoing, collective, cognitive project. The question becomes: what does AI actually do to that project?

05 · Caveat One

Oceans of Significance

When First Proof released "ten research-level problems," the press read one thing. The insiders saw another.

On February 5th, an eleven-mathematician team (including 2014 Fields medallist Martin Hairer) released the First Proof project — ten research-level math questions to benchmark AI. Google, OpenAI and others scrambled. By Daniel Litt's count, "somewhere between 6 and 8 of the 10 problems were solved correctly if one combines all attempts."

Headlines wrote themselves. But Litt added a caveat the headlines missed: the problems were closer to technical lemmas — well-calibrated intermediate sub-problems, the kind that occur in a few paragraphs of a real paper. There is an ocean between a technical lemma and a serious paper. Another ocean between a serious paper and a breakthrough. Another between a breakthrough and a Fields-medal-level contribution. And several oceans above that.

Drag the slider to see where different mathematical achievements sit on this logarithmic scale. First Proof is pinned where it actually lived:

Figure 05 — The Oceans of Mathematical SignificanceInteractive

The scale is roughly logarithmic in required "human-years of preparatory canonization." First Proof's problems live at the research lemma tier — already impressive for AI, still several orders of magnitude below a serious paper, and many more below a Fields-medal-level contribution. The headline "AI solved 8/10 research-level problems" collapses all of this into a single, misleading sentence.

The First Proof team picked the right difficulty for a first benchmark — above zero, below ten — and did it scientifically. Bessis's critique is not of the science but of the optics. The general public doesn't read the fine print, and may not even know what a lemma is. If Google had scored a perfect 10, the headline would have read: GAME OVER.

06 · Caveat Two

Mathlib vs Mathslop

Even correct proofs can fail to be accretive. A 200,000-line Lean blob is not the same thing as canonized mathematics.

A few weeks before Bessis wrote, Math Inc produced a Lean formalization of Maryna Viazovska's sphere-packing theorems — Fields-medal-level results from 2022. A major technical achievement. Yet the formal-mathematics community pushed back hard. Why?

Alex Kontorovich has a name for what's missing — canonization:

By canonization, I mean the process of taking a local, one-off formalization and turning it into library mathematics: general, reusable, coherent, efficient, and compatible with the rest… Canonization often changes the picture itself: the definitions, the abstractions, the API, and sometimes even the statement… This is extremely difficult and time-consuming. Alex Kontorovich

Mathlib is a human-curated library with clean APIs, shared abstractions, and coherent definitions. Adding to Mathlib means making your proof play well with everyone else's. Math Inc's Viazovska formalization is a 200,000-line orphan — a vibe-coded blob that sits in a repository exposing no intelligible interface. It is correct, but it is not accretive. It does not grow the commons; it clogs it.

Toggle between the two paradigms. The counter that matters — human-understandable theorems — only increments on one side:

Figure 06 — Two Modes of Adding Viazovska's TheoremInteractive

Mathlib — Canonized

Each new theorem integrates with existing abstractions. Proofs reuse lemmas; lemmas become reusable. The graph grows coherently.

Nodes: 0 Edges: 0 Understood by humans: 0

Mathslop — Autoformalized Blob

A massive correct proof dumped into a repository, exposing no interface. Machine-verified, human-opaque. Cannot be reused, cannot be audited.

Lines: 0 Interface: none Understood by humans: 0

Add Viazovska theorem via:

Bessis's metaphysical point cuts deep: the problem with unintelligible mathematics isn't that it might be false. It is that it is literally meaningless — it doesn't compile on the only hardware currently able to make sense of it and appreciate its value. The human brain.

And because of the honor code, Patrick Massot's warning bites: there is no social reward left for cleaning up behind Math Inc. The prize was captured. Mathlib contributors are advised to work on less shiny projects less likely to be destroyed.

07 · Caveat Three

The Overhang

Most research sits atop unrealized capital gains from past creativity. LLMs are uniquely positioned to harvest them.

Bessis reports a mystifying experience from his own career. Working on the K(π,1) conjecture, he came up with what felt like a stroke of genius — divided Garside categories. As he was canonizing the idea, he made a shocking discovery:

My proud invention… was essentially equivalent to a seemingly esoteric construction in an entirely different branch of mathematics, namely the Bökstedt–Hsiang–Madsen subdivision of Connes's cyclic category in algebraic K-theory. David Bessis

His "one genius idea" was, in a sense, déjà vu. A profound correspondence between two seemingly unrelated fields. Connecting those dots is one of the most celebrated aspects of mathematics — Descartes connecting algebra and geometry, John McKay noticing that 196,883 + 1 = 196,884 (seeding Monstrous Moonshine).

But modern mathematics is so baroque that most correspondences go unnoticed. The Overhang is Bessis's name for the latent value — unrealized capital gains of past creativity — that sits waiting in the existing corpus for someone to connect the dots. A good LLM, with its phenomenal memorization and pattern-matching, can sweep through it at a rate no human can match. Professional mathematicians read at most a few hundred papers in their careers; LLMs are trained on the entire corpus.

The interactive below shows a conceptual space of mathematical ideas. Solid lines are known connections. Dashed lines are latent — correspondences that exist in the mathematical reality but have never been noticed. Reveal them. Then sweep with an LLM:

Figure 07 — The Overhang of Latent ConnectionsInteractive

Concepts in corpus

Known connections

Latent (the overhang)

A pair of nodes in the far regions of the graph is explicitly labeled — Divided Garside categories (combinatorial group theory) and BHM subdivision (algebraic K-theory) — the exact correspondence Bessis lived through. In the full mathematical corpus this graph is enormous, and the ratio of latent to known connections is almost certainly staggering.

The Overhang is Bessis's explanation for a puzzle Litt raised: a human with the LLMs' capabilities would, almost certainly, be proving amazing theorems constantly. Why haven't we seen this from the models yet? What are they missing? They are missing — still — meaning-making. But they can already harvest meaning made by humans, and that alone is enough to flood the system. Research sits atop unrealized capital gains; AI can strip-mine the commons.

08 · The Decoupling

The Broken Symbiosis

For millennia, problem-solving and concept-building walked hand in hand. AI can split them.

Peter Scholze, 2018 Fields medallist, put it cleanly:

What I care most about are definitions… Unfortunately, it is impossible to find the right definitions by pure thought; one needs to detect the correct problems where progress will require the isolation of a new key concept. Peter Scholze

Concept-building and problem-solving are symbiotic. New concepts enable new problems; new problems force new concepts. Theorem-proving was the externally-visible proxy. But AI does not need to pass through the concept-building station to reach problem-solving. It can decouple them — winning the proxy while leaving the underlying activity stranded.

Toggle between the two eras:

Figure 08 — The Symbiosis and Its BreakInteractive

Era

In the human era, the two activities are yoked by cognition. A proof that cannot be understood cannot be taught, and a theorem whose statement cannot be internalized cannot be extended. The proxy held because the proxy was enforced by the biology of the prover.

The point is not that AI is "cheating." The point is that the measurement — theorem-proving — was always a proxy for something else. As long as the only agents in the game were humans, the proxy was reliable. Now that another kind of agent can crack it independently, the proxy reveals what it always was: a proxy, not the thing itself.

09 · A Proposal

A Mathematical Intelligence Scale

Bessis's suggestion: borrow from the self-driving car industry. Don't let anyone declare victory from a single benchmark.

Bessis proposes a Mathematical Intelligence Scale, modeled on the SAE levels of self-driving autonomy. When AI crushes the next iteration of First Proof, the press release should read not "AI solves mathematics" but "AI achieves Level 4 (out of 10) on the Mathematical Intelligence Scale." The scale should be open-ended, perhaps with seemingly unreachable upper rungs, because mathematics is inexhaustible. The point is to prevent premature victory-declarations that will "demonetize" working mathematicians before anyone has thought carefully.

Below is one plausible draft. Drag the marker to explore each level. Current frontier AI, per Bessis's assessment, sits somewhere around 3–4 — able to solve technical lemmas autonomously, but unable to propose, define, canonize, or restructure:

Figure 09 — A Draft Mathematical Intelligence ScaleInteractive

The key design choice: levels 7 through 10 involve activities — proposing conjectures, crafting definitions, building theory, establishing entirely new branches — that are not benchmarkable in the way problem-solving is. There is no objectively verifiable outcome. Which was precisely the problem the honor code was trying to solve. Now, Bessis argues, we may have to embrace the subjectivity we once tried to abolish.

Mathematicians shouldn't be afraid of subjectivity. It is not about cheating. It is about being fair and not committing collective seppuku. David Bessis

10 · Coda

Coming Clean

The largest product recall in memory — a bogus idea of math, sold to eight billion people, now coming due.

Bessis's recommendation to the mathematical community is not technical. It is rhetorical, almost political. Finally come clean about the nature of mathematics. All the active mathematicians he cites — Tao, Avigad, Litt, Kontorovich — are careful to note that there is something in mathematics beyond theorem-proving. But they frame it as a passing remark. It needs to become the headline.

We are not trying to meet some abstract production quota of definitions, theorems and proofs. The measure of our success is whether what we do enables people to understand and think more clearly and effectively about mathematics. Bill Thurston

Long term, this forces a rewriting of the ethos and evaluation criteria of the profession. The second rule of the Intuition Club — if you really want to talk about intuition, make it sound casual and accessory, because we ain't the psychology department — must be revoked. If Thurston is right, Bessis notes, then mathematicians are the psychology department.

The essay's three speculative forecasts deserve repetition:

1 · A clearer cut between pure and applied

Not abstract vs concrete, but different sweet spots on the intelligibility-vs-applicability trade-off. Rational to spend ten billion dollars of compute on plasma containment equations even if intelligibility waits a century. Less rational for the Riemann Hypothesis, which is a meaning-making project, not a utility one.

2 · The rise of intuition-maxxers

With AI handling the technical barriers, a new generation of mathematicians will deploy unorthodox tactics to survey entire continents of mathematics at a pace that will mystify their elders. They will see further than anyone before, standing on the shoulders of giant machines.

3 · A renewed interest in the philosophy and neuroscience of mathematics

Because if mathematics is just problem-solving, and machines solve problems, then there is no reason to teach mathematics at all — and everyone vaguely senses that would lead to idiocracy but can't articulate why. The articulation is due.

The object of mathematics is the honor of the human spirit. — Carl Jacobi, 1830. Still the best single-sentence mission statement we have.

A note on the present reading

This interactive essay is a companion to, not a substitute for, Bessis's original piece. The visualizations above are conceptual schematics — metaphorical renderings of arguments, not formal simulations of mathematical activity. The Overhang graph, in particular, is an illustration of the structural claim; the real graph is incomprehensibly larger, and the real ratio of latent to known connections is almost certainly more extreme than anything a 2D canvas can convey.

Bessis ends by citing Jacobi. We ended by citing Jacobi. It is, in truth, a sentence worth citing twice.

The Fall of theTheorem Economy