Why AI Math Needs Trivia to Find Valuable Theorems

A new theoretical result shows that an AI generating verified mathematics must emit an endless stream of correct-but-worthless statements to cover the valuable ones — and proves the stream is a necessity, not a bug.

Here is the seductive story everyone is telling about AI and mathematics: pair a language model with a proof assistant, let the model propose statements and proofs, and let the checker ruthlessly reject anything false. What survives is certified true. Crank the loop and you get an ever-growing library of verified theorems. The verifier, in this telling, is the adult in the room — the thing that turns a hallucinating model into a reliable mathematician. A new paper by Xiaoyu Li, Andi Han, Dai Shi, Zheng Gao, Jiaojiao Jiang, and Junbin Gao, posted to arXiv on June 12, 2026, takes that story and complicates it with a theorem of its own. I'd love to believe the simple version, but the math says otherwise.

The gap the verifier cannot close

The authors put their finger on the real bottleneck. The binding constraint in machine-generated mathematics is no longer what a checker can verify — checkers are good at that — but the gap between what is verifiable and what a mathematician would actually value. A proof assistant will happily certify that 7 plus 7 equals 14, or any of an infinite number of true-but-pointless statements. Truth is cheap. Worth is the scarce resource, and a checker has no opinion about it. The paper's framing is blunt: the verifier is not taste.

“AI systems coupled to proof assistants now generate formal mathematics at scale, and the gap between what a checker can verify and what a mathematician would value has become the binding constraint.”— arXiv:2606.14688 source

To make that more than a slogan, the authors build a formal model. They cast the generation of valuable mathematics as a problem of language generation in the limit — a theoretical setting where a generator tries to enumerate the members of an unknown target language while seeing only examples. The verifiable formal language, which they call F, is the set of all checkably-valid statements, accessed through a membership oracle that is just the proof checker. Hidden inside F is an unknown valuable language H — the theorems a mathematician would care about — revealed only piecemeal through an adversarial enumeration of a core of the existing literature, of some fixed density. Every statement the generator emits then falls into one of three bins: valuable (it lands in H), trivial (it is in F but not in H — correct but worthless), or a hallucination (not even in F — false).

Four results, one uncomfortable conclusion

Within that model the authors settle four questions, and they build toward a single deflationary punchline. First, they confirm that the verifier is not taste in a precise sense: the families of problems for which generation is even possible are exactly the same as in a model with no oracle at all, characterized by a classical condition from learning theory due to Angluin. Having a perfect truth-checker does not expand the set of valuable targets you can hope to cover.

Second — and this is the verifier earning its keep — the checker does buy something real: sound coverage. With the oracle you can cover all the unseen valuable statements while asserting only valid ones; without it, you cannot. What the verifier does, in the authors' phrase, is relocate the unavoidable errors from false to trivial. It cannot eliminate error; it can only change its character. You stop hallucinating falsehoods and start emitting correct irrelevancies instead.

Third, and central, is a sharp dichotomy. A generator that is allowed to emit only finitely many trivial statements is capped at covering half the valuable density. But the moment you permit an infinite allowance of trivia — even at a vanishingly small rate — the achievable coverage jumps dramatically, up toward nearly the full density. The transition, the authors stress, is in the trivia count, not the rate: it is not about how often you say something trivial, but whether you are willing to keep saying trivial things forever. The unrecorded mass of valuable mathematics — the theorems not yet in the literature — can only be reached if you accept an unending dribble of certified junk along the way.

Fourth, they show both regimes actually instantiate in a concrete compression model of mathematics, so the dichotomy is not a pathological abstraction but something that bites in a realistic setting.

What this means in plain terms

Strip the formalism and the conclusion is this: a perfect verifier cannot substitute for taste, and the endless stream of correct-but-worthless statements that AND systems produce is not an engineering accident to be optimized away. It is a provable necessity. Covering the valuable mathematics that humans have not yet written down requires generating an infinite — though asymptotically negligible — stream of certified trivia. The flood is the price of the harvest. You cannot have a generator that reaches the deep, unrecorded valuable theorems without also tolerating one that keeps producing trivially-true filler indefinitely.

Why the hype-checkers should care

This lands squarely against the comfortable narrative that scaling verified generation is just a matter of compute and a good checker. The result says the checker, no matter how perfect, leaves the value problem entirely untouched, and that the very act of being thorough about value guarantees an unbounded byproduct of worthlessness. It reframes a phenomenon practitioners already grumble about — automated systems spitting out heaps of dull, technically-correct lemmas — from an annoyance into a structural feature of the problem.

The honest caveats are real. This is a result in generation-in-the-limit, an idealized model with adversarial enumeration and asymptotic guarantees; it characterizes what is possible in principle, not the throughput of any particular system tomorrow. "Valuable" is operationalized as membership in a fixed hidden language rather than as the messy, evolving judgment of working mathematicians. And "asymptotically negligible" does a lot of quiet work — the trivia stream is infinite but thin, so a clever pipeline could still filter aggressively in practice. None of that softens the core message, which is the one the field most needs to hear: if you are counting on the proof checker to make your AI mathematician good rather than merely correct, the theorem says check that assumption against the math. The verifier will keep you honest. It will not make you interesting.

A Proof Checker Can Tell You What's Correct — Not What's Worth Proving