Quantum Computer Benchmark Shows Quantinuum Retains Coherence at Record Scale

Insider Brief
- A new benchmark study shows Quantinuum’s H2-1 quantum processor maintained coherent computation on a 56-qubit MaxCut problem, surpassing classical simulation capabilities.
- The test used a simplified algorithm called linear-ramp QAOA and was applied to 19 quantum systems across five vendors.
- Quantinuum demonstrated strength in maintaining coherence at larger qubit counts while IBM’s Fez chip handled deeper circuits with up to 10,000 layers.
A new benchmark study has found that a quantum processor from Quantinuum can maintain meaningful computation across 56 qubits, offering a rare sign of coherence at a size beyond classical simulation.
The results, published on arXiv recently by researchers at Forschungszentrum Jülich and Purdue University, show that the Quantinuum H2-1 system passed a quantum benchmark using a combinatorial optimization problem called MaxCut on a fully connected 56-qubit circuit. The test used over 4,600 two-qubit gates—a measure of circuit depth—without degrading into random noise. According to the researchers, this is the largest instance of such a problem to yield useful results on real hardware.
The authors write: “In the case of Quantinuum H2-1, the experiments of 50 and 56 qubits are already above the capabilities of exact simulation in HPC systems and the results are still meaningful.”

Addressing Bottlenecks
The study aimed to assess how today’s quantum computers perform as circuits scale in both width and depth — two key bottlenecks for running useful quantum algorithms. The researchers tested 19 quantum processing units (QPUs) from five vendors: IBM, IonQ, Quantinuum, Rigetti and IQM.
At the center of the test was a protocol called linear-ramp QAOA (LR-QAOA), a simplified version of the Quantum Approximate Optimization Algorithm. Unlike variational approaches that tune parameters, LR-QAOA uses fixed, linearly ramped gate schedules. It was chosen for its simplicity, scalability, and the fact that it could be implemented across many hardware platforms.
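Because the schedule is fixed rather than variationally optimized, it can be written down in a few lines. Here is a minimal sketch of one common linear-ramp parameterization, where the mixer angles ramp down and the cost angles ramp up over the layers; the slope values `delta_beta` and `delta_gamma` are illustrative placeholders, not the values used in the study:

```python
def lr_qaoa_schedule(p, delta_beta=0.5, delta_gamma=0.5):
    """Fixed linear-ramp QAOA schedule for p layers.

    Mixer angles (betas) ramp linearly down to zero; cost angles
    (gammas) ramp linearly up. No optimization loop is required --
    every angle is determined in advance by p and the two slopes.
    """
    betas = [delta_beta * (1 - k / p) for k in range(1, p + 1)]
    gammas = [delta_gamma * (k / p) for k in range(1, p + 1)]
    return betas, gammas

# Three layers, as in the 56-qubit Quantinuum run reported in the paper.
betas, gammas = lr_qaoa_schedule(p=3)
```

The absence of a classical optimization loop is what makes the protocol cheap to deploy across many platforms: each device runs the same pre-computed circuit rather than iterating toward tuned parameters.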
The benchmark asked each system to solve versions of the MaxCut problem, which involves dividing a graph into two groups to maximize the weight of edges between them. MaxCut is considered difficult for classical machines at scale, making it a suitable candidate to probe quantum performance.
Researchers ran LR-QAOA on problems defined over three types of graph layouts: a linear chain of qubits, each chip’s native layout, and a fully connected graph, which requires more gate operations. In each case, they measured how far the quantum computer’s answers were from the best-known solution, using a metric called the approximation ratio. If the result stayed better than a random sampler, the system was considered to have passed the test.
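The scoring just described — cut weight, approximation ratio, and the random-sampler baseline — can be sketched in a few lines. This is an illustrative toy example on a three-node triangle graph, not code from the study; the sample bitstrings are stand-ins for hardware output:

```python
import random

def cut_value(edges, bitstring):
    """Total weight of edges whose endpoints fall in different groups."""
    return sum(w for (u, v, w) in edges if bitstring[u] != bitstring[v])

def approximation_ratio(edges, samples, best_known):
    """Mean sampled cut weight relative to the best-known cut."""
    return sum(cut_value(edges, s) for s in samples) / (len(samples) * best_known)

# Toy weighted graph: a triangle. The best possible cut separates one
# node from the other two, cutting two unit-weight edges (value 2.0).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
best_known = 2.0

qpu_like = ["010", "010", "100"]  # stand-in for samples from a QPU
qpu_ratio = approximation_ratio(edges, qpu_like, best_known)

# The paper's pass criterion: beat a uniform random sampler.
random.seed(0)
uniform = ["".join(random.choice("01") for _ in range(3)) for _ in range(1000)]
baseline = approximation_ratio(edges, uniform, best_known)
```

A device "passes" at a given size when its approximation ratio stays measurably above the uniform-random baseline, which is what the 56-qubit Quantinuum run achieved.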
The Quantinuum H2-1 passed the test at 56 qubits, running three layers of LR-QAOA involving 4,620 two-qubit gates.
“To the best of our knowledge, this is the largest implementation of QAOA to solve an FC combinatorial optimization problem on real quantum hardware that is certified to give a better result over random guessing,” the authors wrote.
Speaking on the findings, Chris Langer, a Quantinuum Fellow, a key inventor and architect of the company’s hardware, and an advisor to the CEO, writes in a company blog post that the firm is focused on benchmarks and on improving performance to set new standards.
“We take benchmarking seriously at Quantinuum. We lead in nearly every industry benchmark, from best-in-class gate fidelities to a 4000x lead in quantum volume, delivering top performance to our customers,” Langer writes. “Our Quantum Charge-Coupled Device (QCCD) architecture has been the foundation of our success, delivering consistent performance gains year-over-year. Unlike other architectures, QCCD offers all-to-all connectivity, world-record fidelities, and advanced features like real-time decoding. Altogether, it’s clear we have superior performance metrics across the board.”
IBM Shows Strength in Depth
While Quantinuum demonstrated coherence at larger scales, IBM showed strength in pushing depth. The team ran a 100-qubit problem on IBM’s ibm fez chip using up to 10,000 layers of LR-QAOA—nearly a million two-qubit gates. Though the system eventually thermalized — a process where qubits lose their quantum coherence and settle into lower-energy, non-informative states — some coherent information persisted until around 300 layers.
According to the researchers, obtaining 1,000 samples from the p = 10,000 LR-QAOA circuit took 21 seconds of QPU time. (This figure refers to aggregate QPU execution time and does not include queuing or classical processing overhead.) The researchers added that this was the largest experiment by gate count to date.
IBM’s newer Heron-generation devices consistently outperformed older Eagle chips in both depth and width tests. Fractional gates—custom gate implementations introduced on Heron devices—reduced the number of required two-qubit operations by half, allowing longer circuits with less error buildup.
For example, ibm fez achieved an approximation ratio of 0.808 on a 100-qubit chain problem using fractional gates. In comparison, older IBM chips often failed to maintain coherence beyond 20 qubits or 100 layers.
Still, the study’s authors caution that benchmarks like these should not be viewed as absolute measures of superiority. Performance varied depending on circuit structure, device layout, and problem instance. The authors observed that IBM’s ibm marrakesh had stronger results than ibm fez for small problem sizes, but fez held up better as circuit size increased.
Modality Tradeoffs
The benchmarking protocol also offers insight into architectural tradeoffs. Trapped ion systems like Quantinuum and IonQ showed high fidelity and full connectivity between qubits, which made them suitable for fully connected problems. But they are currently limited by slow gate times and lower total gate counts.
“For instance, in a hypothetical 25-qubit FC problem with p=100 and 1,000 shots, ionq aria 2 would require 18,000 seconds based solely on the 2-qubit gate time, while ibm fez would need 0.51 seconds. This disparity emerges from the 2-qubit gate time and the inability to execute these gates in parallel in trapped ion QPUs,” the researchers write.
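The arithmetic behind that comparison is easy to reproduce. The sketch below counts only serial two-qubit gate time, as the quote specifies; the roughly 600-microsecond two-qubit gate time for ionq aria 2 is an assumed, publicly cited figure rather than a number from the study, but it recovers the 18,000-second total:

```python
from math import comb

def serial_2q_time(n_qubits, layers, shots, gate_time_s):
    """Run time if every two-qubit gate executes one after another,
    as in current trapped-ion QPUs (only 2-qubit gate time counted)."""
    gates_per_layer = comb(n_qubits, 2)  # fully connected graph
    return gates_per_layer * layers * shots * gate_time_s

# Assumed ~600 microsecond 2-qubit gate time (illustrative).
t = serial_2q_time(n_qubits=25, layers=100, shots=1000, gate_time_s=600e-6)
print(t)  # about 18,000 seconds, matching the figure quoted in the paper
```

Superconducting devices like ibm fez both run individual gates orders of magnitude faster and can apply non-overlapping gates in parallel, which is why their total for the same workload drops to a fraction of a second.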
Superconducting qubits, used by IBM, IQM, and Rigetti, run much faster but face challenges scaling depth due to noise accumulation.

Reliable, Scalable Benchmarking Tool
Beyond comparing systems, the authors suggest that LR-QAOA could serve as a reliable and scalable benchmarking tool for the industry. Existing metrics such as quantum volume (QV) or error per layered gate (EPLG) can miss platform-specific issues like crosstalk or routing limitations.
“… Different candidate benchmarks to capture essential aspects of performance should be proposed,” they write, adding that there is a need for practical, reproducible tests as hardware continues to evolve.
The protocol’s simplicity — using fixed parameters and requiring no optimization loops — also made it easy to deploy across platforms. Even with as few as seven samples on Quantinuum’s H2-1, researchers could distinguish meaningful results from random outputs, offering a way to test high-cost platforms efficiently.
Still, the benchmark is not without limits. Performance depends on fixed schedule parameters, which were set using heuristics rather than optimal tuning. And while the study avoided artificially enhancing results through methods like warm starts or postprocessing, it acknowledged that such techniques could mask hardware deficiencies if not fairly accounted for.
Looking ahead, the team says next steps include refining schedule parameters, improving circuit routing strategies, and extending tests to other optimization problems. But the broader message remains: as quantum systems grow, reliable, scalable, and transparent benchmarks will be essential for tracking progress—and understanding what useful quantum computing really looks like.
For a deep dive into the technical aspects of the study, the full paper is available on arXiv. Researchers use pre-print servers such as arXiv to receive immediate feedback on their work; however, the paper has not yet undergone official peer review, a key step in the scientific method.
The research team included J. A. Montañez-Barrera and Kristel Michielsen of the Jülich Supercomputing Centre and David E. Bernal Neira of Purdue University.