About
QED Bench evaluates frontier models across competitive math and rigorous proof-based benchmarks. While we feature a traditional leaderboard, we contextualize each score by analyzing the underlying reasoning required to achieve it.
Outcome-based contests like AIME, HMMT, PUMaC, and Putnam provide the foundation. They offer well-designed problems for measuring whether models can move beyond merely arriving at a correct answer and toward reasoning we can actually trust.
That foundation allows the methodology to generalize to proof-based contests like the USAMO and IMO. Only by building it carefully can we set the stage for evaluating graduate-level, PhD-level, and eventually research mathematics.