AI-Powered Mathematics: Solving Erdős Problems with Frontier Models
We used GPT-5.4 Pro, Opus 4.6, and GPT-5.2 to attack open problems in mathematics—and one of them is now fully settled, with a machine-checked proof in Lean 4.
Since launching UnsolvedMath in February, we've been doing what the benchmark was designed for: pointing frontier language models at open mathematical problems and measuring what happens. The results have exceeded our expectations. In three months of systematic work, we've produced a full solution to one Erdős problem, substantive partial results on two more, and the first serious AI-driven attack on a FrontierMath open problem.
This post summarizes what we found, what it tells us about the current state of AI mathematical reasoning, and where the field is heading.
The results
Erdős Problem 1148 — Bounded Representations by x² + y² − z²
Full solution · GPT-5.4 Pro
Every sufficiently large integer n can be written as n = x² + y² − z² with max(x², y², z²) ≤ n. The proof converts the ternary representation problem into a point-picking problem for primitive binary quadratic forms, then applies Duke's equidistribution theorem in the precise form due to Einsiedler–Lindenstrauss–Michel–Venkatesh. A finite parity correction recovers the integer triple. Formalized and verified in Lean 4 using the UlamAI Prover.
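As a sketch, the statement admits a direct Lean 4 rendering along the following lines (the theorem name and exact phrasing here are illustrative; the actual UlamAI Prover formalization may state it differently):

```lean
import Mathlib

/-- Illustrative statement of Erdős Problem 1148: every sufficiently large
natural number `n` equals `x² + y² − z²` with each square bounded by `n`.
The proof body is omitted in this sketch. -/
theorem erdos_1148 :
    ∀ᶠ n : ℕ in Filter.atTop, ∃ x y z : ℤ,
      x ^ 2 + y ^ 2 - z ^ 2 = (n : ℤ) ∧
      max (x ^ 2) (max (y ^ 2) (z ^ 2)) ≤ (n : ℤ) := by
  sorry
```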
FrontierMath Open Problem — Large Solutions on Cubic Diophantine Surfaces
Partial results · Opus 4.6
Finding three distinct integer solutions with |x| > 10⁵⁰ to cubic equations of the form z² + y²z + x³ + Ax + B = 0. The investigation deployed elliptic fibration analysis, K3 surface theory, Mordell curve methods, secant approaches on cubic surfaces, and norm-form descent over cubic number fields. All six surfaces were shown to be rational elliptic surfaces with Picard rank ρ = 3, confirming parametric families exist in principle—but specific polynomial coefficients remain beyond current computational reach.
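For orientation, small solutions of equations of this shape can be enumerated naively by treating the equation as a quadratic in z and testing whether its discriminant is a perfect square. A minimal sketch (the coefficients A = B = 0 below are hypothetical placeholders, not the actual FrontierMath instances, whose solutions lie far beyond any brute-force range):

```python
from math import isqrt

def small_solutions(A, B, bound=50):
    """Enumerate integer solutions of z^2 + y^2*z + x^3 + A*x + B = 0
    with |x|, |y| <= bound, by solving the quadratic in z."""
    sols = []
    for x in range(-bound, bound + 1):
        c = x**3 + A*x + B
        for y in range(-bound, bound + 1):
            disc = y**4 - 4*c          # discriminant of z^2 + y^2 z + c
            if disc < 0:
                continue
            r = isqrt(disc)
            if r * r != disc:          # z is integral only for square disc
                continue
            for z in {(-y*y + r) // 2, (-y*y - r) // 2}:
                if z*z + y*y*z + c == 0:   # filters any parity mismatch
                    sols.append((x, y, z))
    return sorted(set(sols))

# Hypothetical coefficients A = B = 0, small search box:
print(small_solutions(0, 0, bound=5))
```

This is useful only as a sanity check on a candidate parametric family; the |x| > 10⁵⁰ requirement is exactly what makes the problem resistant to search.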
Erdős Problem 1183 — Monochromatic Union-Closed Subfamilies
Partial results · GPT-5.4 Pro
For the largest monochromatic union-closed family F(n) forced by any two-coloring of the Boolean lattice 2^[n], we establish ⌊(n+1)/2⌋ ≤ F(n) ≤ n^(log₂ n + O(log log n)). The upper bound links the free rank of a union-closed family to VC-dimension. An analogous lattice parameter f(n) requiring closure under both union and intersection is pinned within a logarithmic factor. This answers affirmatively the subexponential half of the Erdős–Ulam question.
Erdős Problem 783 — Extremal Coprime Coverings Under a Reciprocal Budget
Partial results · GPT-5.2
For budget C ≤ log 2, the optimal asymptotic uncovered density is exactly 1 − C, achieved by taking the largest primes above a threshold. Beyond log 2, a construction via smooth (friable) integers yields an upper bound in terms of the Dickman–de Bruijn function ρ(e^C). A conjecture is formulated predicting this is optimal for all fixed C > 0, with a variational strategy via Buchstab recurrences proposed.
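The Dickman–de Bruijn function ρ has no closed form, but it is easy to tabulate numerically from its delay differential equation u·ρ′(u) = −ρ(u − 1) with ρ ≡ 1 on [0, 1]. A minimal Euler-step sketch (step size and grid are arbitrary choices, not taken from the paper):

```python
import numpy as np

def dickman_rho(u_max=3.0, h=1e-4):
    """Tabulate the Dickman rho function on [0, u_max] from the delay
    equation u * rho'(u) = -rho(u - 1), with rho = 1 on [0, 1].
    Simple forward-Euler sketch, not production-grade numerics."""
    n = int(u_max / h) + 1
    u = np.linspace(0.0, u_max, n)   # uniform grid with spacing h
    rho = np.ones(n)                 # rho = 1 on [0, 1]
    lag = int(round(1.0 / h))        # index offset corresponding to u - 1
    for i in range(lag, n - 1):
        rho[i + 1] = rho[i] - h * rho[i - lag] / u[i]
    return u, rho

u, rho = dickman_rho()
# On [1, 2] the exact value is rho(u) = 1 - log u, so rho(2) = 1 - log 2.
print(rho[int(2.0 / 1e-4)])
```

Evaluating such a table at u = e^C is how one would read off the smooth-integer upper bound numerically for a given budget C.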
What we learned about frontier models
Working with these problems over the past three months has sharpened our picture of what current models can and cannot do at the research frontier.
Extended thinking is the primary driver
The pattern is consistent: meaningful mathematical progress requires models to think for 20 minutes or more. Short inference-time outputs produce plausible-looking but shallow proof sketches. The breakthroughs came from sessions where the model had room to explore, backtrack, and refine—behaving more like a graduate student working through a problem over an afternoon than a calculator producing instant answers.
Models vary in their strengths
| Model | Strength | Observation |
|---|---|---|
| GPT-5.4 Pro | Sustained proof construction | Produced the full Erdős 1148 solution and the tight bounds for Problem 1183. Strong at maintaining logical coherence across multi-page arguments. |
| Opus 4.6 | Technique diversity | Deployed the widest battery of approaches on the FrontierMath problem—from K3 surface theory to norm-form descent. Best at identifying when an approach has failed and pivoting. |
| GPT-5.2 | Analytic number theory | The Erdős 783 work relied on delicate estimates with the Dickman function and Buchstab's identity. GPT-5.2 handled these naturally. |
Formal verification closes the loop
The Erdős 1148 result is not just an LLM-generated proof sketch—it has been formalized and machine-checked in Lean 4 using our UlamAI Prover. The formalization pipeline takes the natural-language proof, segments it into theorem-like units, drafts Lean code via LLM, typechecks against Lean's kernel, and iteratively repairs errors. The Lean source is publicly available.
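The draft-typecheck-repair loop described above can be sketched as follows. All names here are illustrative stubs standing in for UlamAI Prover internals, which are not public in this form; `draft` and `repair` abstract the LLM calls, and the checker assumes a `lean` binary on PATH:

```python
import pathlib
import subprocess
import tempfile

def typechecks(lean_source: str) -> tuple[bool, str]:
    """Run the Lean 4 compiler on a snippet; return (ok, diagnostics).
    Illustrative stub assuming a `lean` binary on PATH."""
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(lean_source)
        path = f.name
    proc = subprocess.run(["lean", path], capture_output=True, text=True)
    pathlib.Path(path).unlink()
    return proc.returncode == 0, proc.stderr + proc.stdout

def formalize(units, draft, repair, check=typechecks, max_rounds=5):
    """For each theorem-like unit: draft Lean code, then iteratively
    repair against kernel diagnostics until it typechecks."""
    verified = []
    for unit in units:
        code = draft(unit)
        for _ in range(max_rounds):
            ok, diagnostics = check(code)
            if ok:
                verified.append(code)
                break
            code = repair(unit, code, diagnostics)
        else:
            raise RuntimeError(f"could not verify unit: {unit!r}")
    return verified
```

The key design point is that the loop's stopping condition is the Lean kernel, not the model's own confidence: a unit only counts as done when it typechecks.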
This is the workflow we believe will define the next era of AI-assisted mathematics: LLMs propose, formal systems verify, and the result is a proof you can trust completely—not because you trust the model, but because you trust the verification kernel.
Three capabilities are still missing
Our evaluation across four dimensions—literature search, theorem proving, proof checking, and formalization—reveals consistent gaps. All current models struggle with building a global mental picture of a problem's structure across multiple pages of reasoning. They lack systematic self-verification, meaning they can generate errors and not catch them without external checking. And analogical reasoning—recognizing that a technique from one area of mathematics applies to a structurally similar but superficially different problem—remains weak. These are the capabilities that separate a good graduate student from an autonomous mathematical researcher.
The bigger picture
Three months ago, we argued that open mathematical problems are the right benchmark for AI reasoning. The results since then support the thesis. Traditional benchmarks are saturated—GPT-5.2 scores 99.2% on GSM8K and 100% on AIME 2025. These numbers tell us almost nothing about what models can do at the frontier. Open problems, by contrast, provide a natural curriculum of difficulty that will not saturate as models improve.
What's new is that models are now producing results that matter to mathematicians, not just impressive demos. A settled Erdős problem is a settled Erdős problem, regardless of whether a human or a machine found the proof. The partial results on Problems 783, 1183, and the FrontierMath equation contain genuinely new mathematical ideas that advance the state of knowledge on these questions.
We're not claiming that AI has replaced mathematicians. The human role remains essential: choosing which problems to attack, evaluating whether proof strategies are promising, and designing the verification pipeline. But the dynamic is shifting from "AI as calculator" toward "AI as collaborator"—and the pace of that shift is accelerating.
What's next
We're continuing to push on both sides of the pipeline. On the proving side, we're scaling our attacks to a broader set of UnsolvedMath problems and systematically benchmarking each new model release. On the verification side, we're extending UlamAI Prover's formalization capabilities and working toward using Lean for broader scientific formalization.
All papers, proofs, and Lean source code are available on our research page.
If you're working on AI for mathematics—or if your model just solved an Erdős problem—we'd love to hear from you.
Papers:
Erdős Problem 1148 — Full Solution · Lean formalization
FrontierMath Open Problem — Partial Results
Erdős Problem 1183 — Partial Results
Erdős Problem 783 — Partial Results
UlamAI Prover — Technical Report
Open Mathematical Problems as an AI Reasoning Benchmark
