Proof complexity asks a deceptively simple question: how long must a proof be? The question matters because the length of a proof is tied to the efficiency of algorithms that search for proofs, and ultimately to the P versus NP problem. In 1979, Cook and Reckhow formalized a proof system as a polynomial-time verifiable relation between a statement and a witness (the proof). They showed that NP equals co-NP if and only if there exists a proof system in which every tautology has a polynomial-size proof. This connection turned proof complexity into a branch of computational complexity, and it set the stage for a series of frameworks that each capture a different style of reasoning.
The first framework to be studied systematically was Resolution. Resolution operates on clauses (disjunctions of literals) and derives new clauses by the cut rule: from (A ∨ x) and (B ∨ ¬x) derive (A ∨ B). It is a refutation system: to prove a formula unsatisfiable, one derives the empty clause. Resolution is weak but tractable. In 1985, Haken proved that any Resolution refutation of the pigeonhole principle requires exponential size, giving the first nontrivial lower bound for a natural proof system. That result established Resolution as a benchmark: it is the weakest system for which a super-polynomial lower bound is known.
Frege and Extended Frege Systems are much stronger. A Frege system is a standard propositional calculus with a finite set of axiom schemas and inference rules (e.g., modus ponens). Frege systems are polynomially equivalent to each other, and they can simulate Resolution efficiently. Extended Frege systems add the ability to introduce new variables that abbreviate subformulas, which can shorten proofs dramatically. No super-polynomial lower bound is known for Frege or Extended Frege; proving one would separate NP from co-NP. These systems remain the central open challenge in proof complexity. Today, Resolution is the workhorse of SAT solvers, while Frege and Extended Frege serve as the theoretical gold standard for feasible reasoning.
In the 1980s, researchers began to connect propositional proof systems to weak fragments of arithmetic. Bounded Arithmetic consists of first-order theories in which quantifiers are bounded by terms, so that the theories correspond to computational complexity classes. The key insight, due to Paris, Wilkie, and later Buss, is that a bounded arithmetic theory can be translated into a family of propositional proofs. For example, the theory S₂¹ (which captures polynomial-time reasoning) corresponds to Extended Frege systems. This translation works in both directions: lower bounds on propositional proof size imply lower bounds on the run time of algorithms that the arithmetic theory can formalize.
Bounded Arithmetic thus provides a uniform framework that subsumes many propositional systems. It differs from the earlier propositional approach by shifting the focus from individual proof systems to the logical strength of theories. The framework remains active because it links proof complexity to computational complexity in a precise way: proving that a bounded arithmetic theory cannot prove a certain combinatorial principle is equivalent to proving a lower bound for the corresponding propositional system. This connection has been used to show, for instance, that the pigeonhole principle is hard for Resolution but easy for Frege systems.
In the 1990s, a new family of frameworks emerged that use algebraic equations rather than logical formulas. Algebraic Proof Systems represent a Boolean formula as a set of polynomial equations over a field (e.g., x² − x = 0 to enforce Boolean values). A refutation is a derivation of the polynomial 1 from the equations, using algebraic rules such as addition and multiplication. The best-known systems are the Nullstellensatz system, the Polynomial Calculus (PC), and the Cutting Planes system (which uses linear inequalities).
Algebraic systems differ from Frege and Resolution in their language and in the lower bound techniques they support. For example, the Polynomial Calculus can simulate Resolution, but it is not known whether it can simulate Frege. Conversely, algebraic systems have yielded strong lower bounds for specific functions, such as the Tseitin tautologies and the pigeonhole principle, using degree arguments and the so-called "rank method." These lower bounds are often simpler than those for Resolution. Algebraic proof systems remain an active area, especially for understanding the power of semialgebraic reasoning (Cutting Planes) and for connections to circuit complexity.
A different direction opened in the 1990s with the study of Interactive Proofs and PCP (Probabilistically Checkable Proofs). In an interactive proof system, a verifier exchanges messages with a prover and can use randomness; the verifier must accept true statements with high probability and reject false ones. The class IP was shown to equal PSPACE, and the PCP theorem (1992) proved that every NP statement has a proof that can be checked by reading only a constant number of bits. This framework redefines what counts as a proof: it is no longer a static string but a probabilistic, interactive protocol.
How does this relate to the earlier frameworks? Classical proof systems (Resolution, Frege, Algebraic) are deterministic and non-interactive. Interactive proofs and PCPs are more powerful in terms of verification efficiency, but they do not directly address the length of proofs in the Cook–Reckhow sense. Instead, they have transformed the study of hardness of approximation and have given new lower bound techniques. For example, the PCP theorem implies that certain proof systems (like those for graph non-isomorphism) are hard to simulate. The framework remains a major pillar of complexity theory, and its techniques have been used to prove lower bounds for Resolution and other systems.
Today, the five frameworks coexist, each with a distinct role. Resolution is the standard model for SAT solving and is well understood. Frege and Extended Frege are the primary targets for lower bounds; proving a super-polynomial lower bound for Frege would be a breakthrough. Bounded Arithmetic provides the deepest connection to computational complexity, and its theories are used to calibrate the strength of combinatorial principles. Algebraic Proof Systems offer a complementary approach, often yielding lower bounds where logical systems fail. Interactive Proofs and PCP have shifted the focus to verification efficiency and have become a central tool in complexity theory.
Researchers broadly agree that the ultimate goal is to understand the power of feasible reasoning, and that each framework illuminates a different aspect. The main disagreement is about which framework is most likely to yield a separation of NP from co-NP. Some believe that algebraic methods (e.g., the Polynomial Calculus) will lead to lower bounds for Frege, while others think that bounded arithmetic or PCP techniques will be necessary. There is also a live debate about whether Extended Frege is strictly stronger than Frege, and whether algebraic systems can simulate Frege. These open questions keep proof complexity a vibrant and central subfield of the theory of computation.