How to Benchmark Logical Qubit Claims
Insider Brief
- Alice & Bob has proposed a five-part framework for evaluating logical qubit claims, arguing that the industry needs a common standard to assess progress toward fault-tolerant quantum computing.
- The framework evaluates whether a logical qubit outperforms physical qubits, improves as error-correction resources scale, operates across enough correction cycles, avoids post-selection, and remains stable over timescales relevant to practical computation.
- The report concludes that while recent demonstrations such as Google’s beyond-breakeven surface code results represent meaningful progress, significant technical challenges remain before logical qubits can support large-scale fault-tolerant quantum computing.
As the quantum computing sector has evolved, so has the way performance is measured. Early on, qubit count — the number of quantum bits in a processor — was viewed as a key benchmark. Today, researchers increasingly argue that raw qubit count matters less than the number of logical qubits, which can reliably perform computations despite errors.
But ask what those benchmarking claims about logical qubits actually mean — how to compare them, what standard they’re being held to — and the answer gets complicated fast.
That’s the problem quantum computing startup Alice & Bob set out to address in a new report, “Defining the Logical Qubit: Five Criteria to Benchmark Logical Qubit Claims.” The company, which builds cat qubit-based hardware, has published what amounts to a scorecard: five criteria that any logical qubit claim should satisfy before it can be taken seriously as a step toward fault-tolerant quantum computing (FTQC). The framework is deliberately hardware-agnostic — it’s meant to apply whether one is building with superconducting qubits, trapped ions, photonics, or anything else.
The report is aimed at investors, analysts, and enterprise decision-makers who need to parse vendor claims but don’t have a quantum physics background. But it’s useful for anyone trying to understand what the field is actually measuring, and what it still isn’t.
“Logical qubits are rapidly becoming the industry’s primary benchmark for progress toward fault-tolerant quantum computing, yet the term is used to describe achievements with vastly different levels of performance and capability,” Jérémie Guillaud, VP Quantum Software, Alice & Bob, said in a statement. “Without a common benchmark, it’s difficult for the industry to compare approaches and evaluate genuine progress. At Alice & Bob, we believe a logical qubit should be more than an experimental demonstration – it should represent a fundamental building block of a fault-tolerant quantum computer. By proposing a clear definition and common set of criteria, we hope to make logical qubit claims more transparent, comparable, and easier to evaluate.”
The Gap Between ‘Logical Qubit’ and ‘Useful Logical Qubit’
To understand why a definitional framework is needed, it helps to understand what a logical qubit is and why it’s so difficult to build.
Physical qubits, the actual hardware units inside a quantum processor, are fragile. They lose their quantum state through a process called decoherence, and they make errors at rates that would be catastrophic for any serious computation. The solution the field has converged on is quantum error correction (QEC). Rather than fixing the hardware, QEC encodes a single logical qubit across multiple physical qubits and uses that redundancy to detect and correct errors before they cascade.
As an analogy, classical computers protect data by storing redundant copies of each bit. If one copy flips, a majority vote among the copies catches the error and corrects it. Quantum mechanics rules out simple copying — the no-cloning theorem forbids it — but researchers have found mathematically equivalent schemes that achieve the same protective effect without ever directly reading the encoded information. The logical qubit is the result: a unit of quantum information that is more reliable than any of the physical qubits it’s built from.
That’s the theory. In practice, error correction only helps if the underlying hardware is already good enough. Add redundancy to hardware that’s too noisy, and it creates more opportunities for failure than it prevents. And even when QEC does help, demonstrating that it helps in a way that will actually matter for real computation is harder than it might appear.
The core of Alice & Bob’s concern, according to the report, is that the term “logical qubit” is used in a variety of ways across the industry and research community. That variety and interchangeability make claims difficult to benchmark.
Five criteria are proposed as the minimum bar.

Criterion 1: Breakeven — Can You Outperform Your Physical Qubits?
The first and most fundamental test is whether error correction is actually helping. The logical qubit’s lifetime — the average time before an error occurs — must exceed that of the best physical qubit it’s built from. Alice & Bob calls the point where logical and physical error rates are equal “breakeven.” Anything worse is “below breakeven,” meaning error correction is actively making things worse.
This sounds like a low bar, but error correction itself introduces errors. Syndrome measurements (the indirect measurements used to detect errors without disturbing the stored information) are imperfect, and decoding those measurements can produce wrong corrections. Getting the logical qubit to outperform its physical components requires the underlying hardware to already be below a threshold error rate. Above that threshold, adding redundancy makes the logical error rate worse rather than better.
Google Quantum AI has demonstrated beyond-breakeven results using a surface code, which the report cites as satisfying this first criterion. It is a meaningful milestone, but it’s also just the beginning.
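The breakeven test described above reduces to a single comparison. The sketch below is only an illustration of that comparison: the function name and the lifetime values are hypothetical, not taken from the report or from any published experiment.

```python
from typing import Sequence


def breakeven_status(logical_lifetime_us: float,
                     physical_lifetimes_us: Sequence[float]) -> str:
    """Classify a logical qubit against the best physical qubit it is built from."""
    best_physical = max(physical_lifetimes_us)
    if logical_lifetime_us > best_physical:
        return "beyond breakeven"
    if logical_lifetime_us == best_physical:
        return "breakeven"
    return "below breakeven"


# Hypothetical lifetimes in microseconds, for illustration only.
print(breakeven_status(300.0, [80.0, 95.0, 110.0]))
```

The comparison is against the best physical qubit, not the average, which is the stricter reading the criterion calls for.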
Criterion 2: Scalable Parameters — Can You Make It Better?
Reaching breakeven with one configuration is not enough. To be useful, error correction must be improvable: as more physical qubits are added, error rates should go down. The report calls this having “scalable parameters.”
The key concept here is code distance, which is a parameter that governs how many physical errors the logical qubit can tolerate before a logical error slips through. For a code with distance d, the system can handle up to (d−1)/2 simultaneous physical errors, according to the report. Increasing the code distance requires more physical qubits but, if the hardware is below the error threshold, produces a more reliable logical qubit.
The practical implication is that a vendor claiming a logical qubit should be able to show that increasing the code distance actually reduces the logical error rate and, ideally, that this improvement holds at code distances large enough to run the target applications. Demonstrating a distance-3 logical qubit is a start. Showing that distance-5 performs better, and that the extrapolation reaches the error rates needed for, say, Shor’s algorithm, is what this criterion is actually asking for.
The report notes that scalable parameters can take different forms depending on the hardware approach. For physical-level error correction, such as the cat qubit approach Alice & Bob itself uses, parameters like the number of photons in the resonator can be tuned to improve performance, and those physical-level improvements typically combine with code-level parameters.
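To make the distance argument concrete, here is a minimal sketch of the arithmetic. It assumes a simple model that is not stated in the report: each increase of the code distance by 2 divides the logical error rate by the same constant factor. The function names and all numbers are hypothetical.

```python
def correctable_errors(distance: int) -> int:
    """Simultaneous physical errors a distance-d code can tolerate: (d - 1) / 2, rounded down."""
    return (distance - 1) // 2


def suppression_factor(error_at_d: float, error_at_d_plus_2: float) -> float:
    """Ratio by which the logical error rate drops when the distance grows by 2."""
    return error_at_d / error_at_d_plus_2


def distance_needed(error_at_d0: float, d0: int,
                    factor: float, target_error: float) -> int:
    """Smallest distance, stepping by 2 from d0, whose extrapolated error meets the target.

    Assumption: every +2 in distance divides the logical error rate by `factor`.
    """
    if factor <= 1.0:
        raise ValueError("no suppression: a larger code does not reduce the error rate")
    distance, error = d0, error_at_d0
    while error > target_error:
        distance += 2
        error /= factor
    return distance


# Hypothetical measurements at distance 3 and distance 5, for illustration only.
factor = suppression_factor(3e-3, 1.5e-3)
print(correctable_errors(5))
print(distance_needed(3e-3, 3, factor, 1e-6))
```

A factor at or below 1 means the hardware is not benefiting from scaling, which is why the sketch refuses to extrapolate in that case.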
Criterion 3: Sufficient QEC Cycles — Have All the Errors Had Time to Happen?
This criterion addresses a subtle but important measurement problem. Logical errors don’t occur instantaneously. They result from multiple physical errors accumulating across multiple qubits over time — often spreading across several error-correction rounds before they compound into a logical failure. Run too few of those rounds, and the most damaging error patterns never get a chance to fully develop. If an experiment runs for fewer cycles than the code distance, it’s possible to measure an artificially low logical error rate simply because the relevant error patterns haven’t had enough time to develop.
The report adds that to measure the true logical error rate, an experiment needs to run for at least several multiples of d error correction rounds. Shorter runs underestimate the error rate and produce results that don’t predict real-world performance.
This is a criterion that can be verified from published data, which makes it a useful check for anyone evaluating claims. Divide the total runtime by the QEC cycle time to get the number of cycles, and compare that to the code distance used.
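That check is simple enough to write down. In the sketch below, the function names, the example numbers, and the default of three multiples of d are assumptions chosen for illustration; the report only says "several multiples."

```python
def qec_cycles(total_runtime_us: float, cycle_time_us: float) -> int:
    """Number of complete QEC cycles that fit in the experiment's runtime."""
    return int(total_runtime_us // cycle_time_us)


def has_sufficient_cycles(total_runtime_us: float, cycle_time_us: float,
                          distance: int, min_multiple: int = 3) -> bool:
    """True if the run covers at least `min_multiple` times d cycles.

    `min_multiple` is a hypothetical stand-in for "several multiples of d".
    """
    return qec_cycles(total_runtime_us, cycle_time_us) >= min_multiple * distance


# Hypothetical experiment: 100 microsecond runtime, 1 microsecond cycle, distance 5.
print(qec_cycles(100.0, 1.0))
print(has_sufficient_cycles(100.0, 1.0, distance=5))
```

A reader applying this to a published result would substitute the runtime, cycle time, and distance reported by the authors.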
Criterion 4: Performance Across All Runs — Does It Work Without Cherry-Picking?
This criterion targets a controversial practice called post-selection: discarding experimental runs in which errors were detected before reporting results. Post-selection is a legitimate tool for some experimental purposes, but it produces logical error rates that have no relationship to what a machine would actually deliver during useful computation.
Fault-tolerant quantum computing cannot discard bad runs. Real applications require the machine to succeed consistently, not selectively. A logical qubit that looks excellent only when failed attempts are thrown away is, as Alice & Bob puts it, “quantum error discarding,” not quantum error correction.
Evaluating this criterion requires looking at how error rates were measured. In the report, the company advises checking for mentions of “post-selection” or “discarded rounds” in experimental descriptions, and looking for whether error rates are reported continuously across all runs or only for a curated subset.
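A toy calculation shows how much post-selection can flatter a result. The data structure and the ten runs below are invented for illustration and do not come from any experiment; the point is only the gap between the two rates.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Run:
    discarded: bool      # True if post-selection threw this run away
    logical_error: bool  # True if the run ended in a logical error


def error_rates(runs: List[Run]) -> Tuple[float, float, float]:
    """Return (rate over all runs, rate over kept runs, fraction of runs kept)."""
    kept = [r for r in runs if not r.discarded]
    if not runs or not kept:
        raise ValueError("need at least one run and at least one kept run")
    all_rate = sum(r.logical_error for r in runs) / len(runs)
    kept_rate = sum(r.logical_error for r in kept) / len(kept)
    return all_rate, kept_rate, len(kept) / len(runs)


# Hypothetical data: 4 discarded runs (3 of them failed), 6 kept runs (1 failed).
runs = (
    [Run(discarded=True, logical_error=True)] * 3
    + [Run(discarded=True, logical_error=False)]
    + [Run(discarded=False, logical_error=True)]
    + [Run(discarded=False, logical_error=False)] * 5
)
all_rate, kept_rate, kept_fraction = error_rates(runs)
print(f"all runs: {all_rate:.2f}, kept runs only: {kept_rate:.2f}, kept: {kept_fraction:.0%}")
```

The fraction of runs kept is worth reporting alongside any post-selected figure, since a low retained fraction signals that the headline error rate describes only a small subset of attempts.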
Criterion 5: Utility Timescales — Does Error Correction Last Long Enough?
The fifth criterion, described as a bonus, is necessary in principle but harder to verify, and it is more of a longer-term target. It asks whether error correction can be sustained for the full duration of a meaningful computation, which might last anywhere from an hour to a week.
The complication is that quantum hardware is subject to rare, correlated error events (high-energy cosmic rays passing through the chip, for instance) that can cause multi-qubit failures that error correction cannot handle. If these events are rare relative to the duration of a short benchmarking run, the experiment is unlikely to encounter them. But scale up to a real computation, and they become a limiting factor.
A system that looks stable over a few microseconds of benchmarking might behave very differently over hours. Demonstrating stability at utility timescales is, the report acknowledges, something of an aspirational criterion at this stage of the field’s development. But it belongs in any honest framework for evaluating FTQC readiness.
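The timescale gap can be illustrated with a rough probability estimate. The sketch assumes, purely for illustration, that rare events arrive independently at a constant average rate (a Poisson model); the rate of one event per ten seconds is a hypothetical value, not a figure from the report.

```python
import math


def prob_at_least_one_event(event_rate_per_s: float, duration_s: float) -> float:
    """Probability of at least one rare event during a run, under a Poisson model.

    Computes 1 - exp(-rate * duration); expm1 keeps precision when the product is tiny.
    """
    return -math.expm1(-event_rate_per_s * duration_s)


RATE_PER_S = 0.1  # hypothetical: one event every ten seconds on average

# A 100-microsecond benchmark versus a one-hour computation.
print(prob_at_least_one_event(RATE_PER_S, 100e-6))
print(prob_at_least_one_event(RATE_PER_S, 3600.0))
```

Under these assumed numbers the short benchmark almost never sees an event while the hour-long run almost certainly does, which is the mismatch the criterion is meant to expose.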
How to Use This Framework
The five criteria form a progression in which each one builds on the previous. A claim needs breakeven before scalability matters, scalability before sufficient cycles are meaningful, and so on. A logical qubit that satisfies all five is a credible building block for fault-tolerant computation. One that satisfies only the first two, or that achieves low error rates through post-selection, is a research result. It’s interesting, but not what enterprise decision-makers should be planning around.
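One way to apply the progression is to walk the criteria in order and stop at the first one a claim does not meet. The scorecard below is a hypothetical sketch: the class, field names, and example claim are inventions for illustration, and only the criterion titles come from the report.

```python
from dataclasses import dataclass
from typing import Optional

# Criteria in the order the framework presents them.
CRITERIA = [
    ("beyond_breakeven", "Criterion 1: Breakeven"),
    ("scalable_parameters", "Criterion 2: Scalable Parameters"),
    ("sufficient_cycles", "Criterion 3: Sufficient QEC Cycles"),
    ("no_post_selection", "Criterion 4: Performance Across All Runs"),
    ("utility_timescales", "Criterion 5: Utility Timescales"),
]


@dataclass
class LogicalQubitClaim:
    beyond_breakeven: bool
    scalable_parameters: bool
    sufficient_cycles: bool
    no_post_selection: bool
    utility_timescales: bool


def first_unmet(claim: LogicalQubitClaim) -> Optional[str]:
    """Return the first criterion the claim fails, or None if it meets all five."""
    for field_name, label in CRITERIA:
        if not getattr(claim, field_name):
            return label
    return None


# Hypothetical claim that passes the first two criteria only.
claim = LogicalQubitClaim(True, True, False, True, False)
print(first_unmet(claim))
```

Stopping at the first unmet criterion reflects the idea that later criteria carry little weight until the earlier ones are satisfied.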
“This is a strong, timely, and useful framework for cleaning up logical-qubit claims,” said Russ Fein, Managing Director, Corporate Fuel Partners, in a statement. “It is especially valuable for investors and non-expert decision-makers because it provides a simple checklist for separating FTQC-relevant progress from weaker demonstrations.”
The report’s conclusion is that the field isn’t there yet. Current demonstrations are progressing meaningfully — Google’s beyond-breakeven surface code results are real progress — but the distance between today’s experimental achievements and a machine that can run Shor’s algorithm on RSA-relevant key sizes remains large. Resource estimates for breaking RSA-2048 have fallen significantly as hardware and algorithms have improved, but “fewer than 100,000 physical qubits” still means fault-tolerant qubits, which in turn requires the kind of reliable logical qubit operations these five criteria are designed to assess.
