Main

Quantum computing promises computational speed-ups in quantum chemistry5, quantum simulation6, cryptography7 and optimization8. However, quantum information is fragile and quantum operations are error-prone. State-of-the-art many-qubit platforms have only recently demonstrated entangling gates with 99.9% fidelity9,10, far short of the sub-10^−10 error rates needed for many applications11,12. Quantum error correction is postulated to realize high-fidelity logical qubits by distributing quantum information across many entangled physical qubits to protect against errors. If the physical operations are below a critical noise threshold, the logical error rate should be suppressed exponentially as we increase the number of physical qubits per logical qubit. This behaviour is expressed in the approximate relation

$${\varepsilon }_{d}\propto {\left(\frac{p}{{p}_{{\rm{thr}}}}\right)}^{(d+1)/2}$$
(1)

for error-corrected surface code logical qubits3,4,13. Here d is the code distance, indicating 2d^2 − 1 physical qubits used per logical qubit; p and εd are the physical and logical error rates, respectively; and pthr is the threshold error rate of the code. Thus, when p < pthr, the error rate of the logical qubit is suppressed exponentially in the distance of the code, with the error suppression factor Λ = εd/εd+2 ≈ pthr/p representing the reduction in logical error rate when increasing the code distance by two. Although many platforms have demonstrated different features of quantum error correction14,15,16,17,18,19,20, no quantum processor has definitively shown below-threshold performance.
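As a concrete illustration, the scaling of equation (1) can be sketched in a few lines of Python. The prefactor and the physical error rates used here are illustrative assumptions, not values from the text; only the functional form comes from equation (1).

```python
# Sketch of equation (1): eps_d ~ A * (p / p_thr)^((d+1)/2).
# The prefactor A and the example numbers are illustrative assumptions.

def logical_error_per_cycle(p, p_thr, d, prefactor=0.1):
    """Approximate logical error per cycle for a distance-d surface code."""
    return prefactor * (p / p_thr) ** ((d + 1) / 2)

def suppression_factor(p, p_thr, d, prefactor=0.1):
    """Lambda = eps_d / eps_{d+2} ~ p_thr / p, independent of d."""
    return (logical_error_per_cycle(p, p_thr, d, prefactor)
            / logical_error_per_cycle(p, p_thr, d + 2, prefactor))

# Below threshold (p < p_thr), Lambda > 1: each distance step of two
# divides the logical error rate by Lambda.
lam = suppression_factor(p=0.005, p_thr=0.01, d=5)  # p_thr/p = 2
```

Note that in this idealized form Λ depends only on the ratio p/pthr, not on d, which is why a single Λ summarizes the error suppression across distances.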

Although achieving below-threshold physical error rates is itself a formidable challenge, fault-tolerant quantum computing also imposes requirements beyond raw performance. These include features like stability for the hours-long timescales of quantum algorithms21 and the active removal of correlated error sources like leakage22. Fault-tolerant quantum computing also imposes requirements on classical co-processors—namely, the syndrome information produced by the quantum device must be decoded as fast as it is generated23. The fast operation times of superconducting qubits, ranging from tens to hundreds of nanoseconds, provide an advantage in speed but also a challenge for decoding errors both quickly and accurately.

In this work, we realize surface codes operating below the threshold on two Willow processors. Using a 72-qubit processor, we implement a distance-5 surface code operating with an integrated real-time decoder. Subsequently, using a 105-qubit processor with similar performance, we realize a distance-7 surface code. These processors demonstrate Λ > 2 up to distance 5 and 7, respectively. Our distance-5 and distance-7 quantum memories are beyond breakeven, with distance-7 preserving quantum information for more than twice as long as its best constituent physical qubit. To identify possible logical error floors, we also implement high-distance repetition codes on the 72-qubit processor, with error rates that are dominated by correlated error events occurring once an hour. These errors, the origins of which are not yet understood, set a current error floor of 10^−10 in the repetition code. Finally, we show that we can maintain below-threshold operation on the 72-qubit processor even when decoding in real time, meeting the strict timing requirements imposed by the fast 1.1 μs cycle duration of the processor.

A surface code memory below threshold

We begin with results from our 105-qubit Willow processor depicted in Fig. 1a. It features a square grid of superconducting transmon qubits24 with improved operational fidelities compared with our previously reported Sycamore processors17,25. The qubits have a mean coherence time T1 of 68 μs and T2,CPMG of 89 μs, which we attribute to improved fabrication techniques, participation ratio engineering and circuit parameter optimization (Supplementary Information). Increasing coherence contributes to the fidelity of all of our operations, which are displayed in Fig. 1b.

Fig. 1: Surface code performance.
figure 1

a, Schematic of a distance-7 (d = 7) surface code on a 105-qubit processor. Each measure qubit (blue) is associated with a stabilizer (blue-coloured tile). Data qubits (gold) form a d × d array. We remove leakage from each data qubit using a neighbouring qubit below it, with additional leakage removal qubits at the boundary (green). b, Cumulative distributions of error probabilities measured on the 105-qubit processor. Red, Pauli errors for single-qubit gates; black, Pauli errors for CZ gates; gold, Pauli errors for data qubit idle during measurement and reset; blue, identification error for measurement; teal, weight-4 detection probabilities (distance 7, averaged for 250 cycles). c, Logical error probability pL for a range of memory experiment durations. Each data point represents 10^5 repetitions decoded with the neural network and is averaged over the logical basis (XL and ZL). Black and grey, data from ref. 17 for comparison. Curves, exponential fits after averaging pL over code and basis. To compute εd values, we fit each individual code and basis separately and report their average (Supplementary Information). d, Logical error per cycle, εd, reducing with surface code distance d. Uncertainty on each point is less than 7 × 10^−5. The symbols match those in c. Means for d = 3 and d = 5 are computed from the separate εd fits for each code and basis. Line, fit to equation (1), determining Λ. The inset shows simulations up to d = 11 alongside experimental points, both decoded with ensembled matching synthesis for comparison. Line, fit to simulation; Λsim = 2.25 ± 0.02.

We also make several improvements to decoding, using two high-accuracy offline decoders. One is a neural network decoder26, and the other is a harmonized ensemble27 of correlated minimum-weight perfect matching decoders28 augmented with matching synthesis29. These run on different classical hardware, offering two potential paths towards real-time decoding with higher accuracy. To adapt to device noise, we fine-tune the neural network with processor data26 and apply reinforcement learning optimization to the matching graph weights30.

We operate a distance-7 surface code memory comprising 49 data qubits, 48 measure qubits and 4 additional leakage removal qubits17. Summarizing, we initiate surface code operation by preparing the data qubits in a product state corresponding to a logical eigenstate of either the XL or ZL basis of the ZXXZ surface code31. We then repeat a variable number of cycles of error correction, during which the measure qubits extract parity information from the data qubits to be sent to the decoder. After each syndrome extraction, we run data qubit leakage removal (DQLR)32 to ensure that leakage to higher states is short-lived. We measure the state of the logical qubit by measuring the individual data qubits and then check whether the corrected logical measurement outcome of the decoder agrees with the initial logical state. It is worth noting that fault-tolerant computation does not require active correction of the code state; the decoder can simply reinterpret the logical measurement outcomes13.

From the surface code data, we can characterize the physical error rate of the processor using the bulk error detection probability33. This is the proportion of weight-4 stabilizer measurement comparisons that disagree with their ideal noiseless comparisons, thereby detecting an error. The surface code detection probabilities are pdet = (7.7%, 8.5%, 8.7%) for d = (3, 5, 7), respectively. We attribute the increase in detection probability with code size to finite size effects (Supplementary Information) and parasitic couplings between qubits. We expect both effects to saturate at larger processor sizes34.
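The bulk detection probability can be sketched concretely: a detection event fires whenever a stabilizer measurement differs from the previous cycle's outcome (the XOR-of-consecutive-cycles convention), and the detection probability is the fraction of such comparisons that fire. The toy syndrome data below is an assumption for illustration only.

```python
# Sketch: turning a stream of stabilizer outcomes into detection events.
# syndromes[c][q] is the measured parity of stabilizer q in cycle c.

def detection_events(syndromes):
    """XOR consecutive cycles of syndrome bits into detection events."""
    events = []
    prev = [0] * len(syndromes[0])  # cycle 0 is compared against ideal zeros
    for cycle in syndromes:
        events.append([a ^ b for a, b in zip(cycle, prev)])
        prev = cycle
    return events

def detection_probability(events):
    """Fraction of stabilizer comparisons that detected an error."""
    flat = [bit for cycle in events for bit in cycle]
    return sum(flat) / len(flat)

# Three stabilizers over three cycles: 3 detection events out of 9 comparisons.
syndromes = [[0, 0, 1], [0, 1, 1], [0, 1, 0]]
p_det = detection_probability(detection_events(syndromes))  # 1/3
```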

We characterize the logical performance of our surface code by fitting the logical error per cycle εd up to 250 cycles, averaged over the XL and ZL bases. We average the performance of nine different distance-3 subgrids and four different distance-5 subgrids to compare with the distance-7 code. Finally, we compute Λ using linear regression of ln[εd] versus d. With our neural network decoder, we observe Λ = 2.14 ± 0.02 and ε7 = (1.43 ± 0.03) × 10−3 (Fig. 1c,d). With ensembled matching synthesis, we observe Λ = 2.04 ± 0.02 and ε7 = (1.71 ± 0.03) × 10−3.
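The fitting pipeline can be sketched as follows, using synthetic noise-free decay data in place of measured logical error probabilities. The exponential-decay model 1 − 2pL = A(1 − 2εd)^n is the standard convention; the numbers below are made up for illustration.

```python
import math

# Sketch: fit eps_d from logical error probability vs cycle count, then
# Lambda from linear regression of ln(eps_d) vs d. Synthetic data only.

def _slope(xs, ys):
    """Least-squares slope of ys vs xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def fit_eps_per_cycle(cycles, p_logical):
    """ln(1 - 2*p_L) is linear in n with slope ln(1 - 2*eps_d)."""
    s = _slope(cycles, [math.log(1 - 2 * p) for p in p_logical])
    return (1 - math.exp(s)) / 2

def fit_lambda(distances, eps):
    """eps_d ~ C * Lambda^(-d/2), so the slope of ln(eps_d) vs d is -ln(Lambda)/2."""
    s = _slope(distances, [math.log(e) for e in eps])
    return math.exp(-2 * s)

# Noise-free synthetic decay with eps_d = 0.003 recovers the input rate.
cycles = [10, 50, 100, 250]
p_L = [(1 - (1 - 2 * 0.003) ** n) / 2 for n in cycles]
eps = fit_eps_per_cycle(cycles, p_L)

# An exactly geometric sequence of eps_d values gives Lambda = 2.
lam = fit_lambda([3, 5, 7], [1e-2, 5e-3, 2.5e-3])
```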

Furthermore, we simulate logical qubits of higher distances using a noise model based on the measured component error rates in Fig. 1b, additionally including leakage and stray interactions between qubits17 (Supplementary Information). These simulations are shown alongside the experiment (Fig. 1d, inset), both decoded with ensembled matching synthesis. We observe reasonable agreement with experiment and decisive error suppression, affirming that the surface codes are operating below threshold.

Thus far, we have focused on Λ, because below-threshold performance guarantees that physical qubit lifetimes and operational fidelities can be surpassed with a sufficiently large logical qubit. In fact, our distance-7 logical qubit already has more than double the lifetime of its constituent physical qubits. Although comparing physical and logical qubits is subtle owing to their different noise processes, we plot a direct comparison between logical error rate and physical qubit error rate averaged over the X and Z basis initializations (Fig. 1c). To quantify the qubit lifetime itself, we uniformly average over pure states using the metric proposed elsewhere16 (Supplementary Information). The distance-7 logical qubit lifetime is 291 ± 6 μs, exceeding the lifetimes of all the constituent physical qubits (median, 85 ± 7 μs; best, 119 ± 13 μs) by a factor of 2.4 ± 0.3. Our logical memory beyond breakeven extends previous results using bosonic codes16,35,36 to multiqubit codes, and it is a critical step towards logical operation breakeven.

Logical error sensitivity

Equipped with below-threshold logical qubits, we can now probe the sensitivity of logical error to various error mechanisms in this new regime. We start by testing how logical error scales with physical error and code distance. As shown in Fig. 2a, we inject coherent errors with variable strengths on both data and measure qubits, and extract two quantities from each injection experiment. First, we use detection probability as a proxy for the total physical error rate. Second, we infer the logical error per cycle by measuring the logical error probability at ten cycles, decoding with correlated matching28.

Fig. 2: Error sensitivity in the surface code.
figure 2

a, One cycle of the surface code circuit, focusing on one data qubit and one measure qubit. Black bar, CZ; H, Hadamard; M, measure; R, reset; DD, dynamical decoupling; orange, injected coherent errors; purple, DQLR32. b, Error injection in the surface code on a 105-qubit processor. Distance 3 averages over 9 subset codes, and distance 5 averages over 4 subset codes, as shown in Fig. 1. Logical performance is plotted against the mean weight-4 detection probability averaging over all codes, for which increasing the error injection angle α increases the detection probability. Each experiment is ten cycles with 2 × 10^4 total repetitions. Lines, power-law fits for data points at or below the detection probability at which the codes cross. The inset shows the inverse error suppression factor, 1/Λ, versus the detection probability. Line, fit to points at which 1/Λ < 1, 3.4pdet + 0.29. c, Estimated error budget for the surface code based on component errors and simulations. CZ, CZ error, excluding leakage and stray interactions; CZ stray int., CZ error from unwanted interactions; data idle, data qubit idle error during measurement and reset; meas., measurement and reset error; leakage, leakage during CZs and due to heating; 1Q, single-qubit gate error; excess, unmodelled error, which is the difference between experimental and simulated 1/Λ (correlated matching). d, Comparison of logical performance with and without DQLR in each cycle. Distance-3 points (red triangles) are averaged over 4 quadrants. Each experiment is 10^5 repetitions. Curves, exponential fits. QEC, quantum error correction. e, Repeating experiments to assess performance stability, comparing distance 3 and distance 5. Each point represents a sweep of logical performance versus experiment duration, up to 250 cycles. To obtain the data in d and e, a 72-qubit processor is used.

In Fig. 2b, we plot the logical error per cycle versus detection probability for the distance-3, distance-5 and distance-7 codes. We find that the three curves cross near a detection probability of 20%, roughly consistent with the crossover regime explored elsewhere17. The inset further shows that detection probability acts as a good proxy for 1/Λ (ref. 33 and Supplementary Information). When fitting power laws below the crossing, we observe exponents that are approximately 80% of the ideal value (d + 1)/2 predicted by equation (1). We hypothesize that this deviation is caused by excess correlations in the device. Nevertheless, higher-distance codes show a faster reduction in logical error, realizing the characteristic threshold behaviour in situ on a quantum processor.
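As a consistency check, the linear proxy fit quoted in Fig. 2b (1/Λ = 3.4pdet + 0.29) places the threshold crossing, where 1/Λ = 1, at roughly the same detection probability as the observed ~20% crossing of the three curves:

```python
# Where the fitted proxy 1/Lambda = 3.4 * p_det + 0.29 crosses threshold
# (1/Lambda = 1), using the fit coefficients quoted in Fig. 2b.

slope, intercept = 3.4, 0.29
p_det_threshold = (1 - intercept) / slope  # ~0.21, i.e. ~20% detection probability
```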

To quantify the impact of correlated errors along with more typical gate errors, we form an error budget. Using the method outlined elsewhere17,37, we estimate the relative contribution of different component errors to 1/Λ. We run simulations based on a detailed model of our 72-qubit processor. The model includes local noise sources due to gates and measurements, as well as two sources of correlated error: leakage, and stray interactions between neighbouring qubits during our CZ gates that can induce correlated ZZ and swap-like errors (Supplementary Information). Figure 2c shows our estimated error budget for 1/Λ in the 72-qubit processor when decoding with correlated matching. Applying the same decoder to experimental data yields Λ = 1.97 ± 0.02. The error budget overpredicts Λ by 14% (Fig. 2c, ‘excess’), indicating that most but not all error effects in our processor have been captured. Leakage and stray interactions make up an estimated 17% of the budget; although not a dominant contributor, we expect their importance to increase as the error rates decrease. Moreover, out-of-model long-range interactions or high-energy leakage might contribute to the error budget discrepancy. Overall, both local and correlated errors from CZ gates are the largest contributors to the error budget. Consequently, continuing to improve both coherence and calibration will be crucial to further reduce logical error.

One potential source of excess correlations that we actively mitigate is leakage to higher excited states of our transmon qubits. During the logical qubit operation, we remove leakage from measure qubits using multilevel reset. For data qubits, DQLR swaps leakage excitations to measure qubits (or additional leakage removal qubits)32. To examine sensitivity to leakage, we measure logical error probability of distance-3 and distance-5 codes in our 72-qubit processor with and without DQLR, and the results are shown in Fig. 2d. Although activating DQLR does not strongly affect the distance-3 performance, it substantially boosts the distance-5 performance, resulting in a 35% increase in Λ. Comparatively, the detection probability decreases by only 12% (Supplementary Information), indicating that the detection probability is only a good proxy for logical error suppression if the errors are uncorrelated. Overall, we find that addressing leakage is crucial for operating surface codes with transmon qubits15,32,38.

Finally, we test the sensitivity to drift. Using our 72-qubit processor, we measure the logical performance of one distance-5 and four distance-3 codes 16 times over 15 h, and the results are shown in Fig. 2e. Before the repeated runs, we use a frequency optimization strategy that forecasts defect frequencies of two-level systems (TLSs). This helps to avoid qubits coupling to TLSs during the initial calibration as well as for the duration of the experiments. Between every four experimental runs, we recalibrate the processor to account for potential qubit frequency and readout signal drift. We observe an average Λ = 2.18 ± 0.07 (standard deviation) and best Λ = 2.31 ± 0.02 (Supplementary Information) when decoding with the neural network. Although the performance of the worst distance-3 quadrant appears to fluctuate due to a transient TLS moving faster than our forecasts, this fluctuation is suppressed in the distance-5 code, suggesting that larger codes are less sensitive to component-level fluctuations. Moreover, the logical error rates of experiments right after drift recalibration are not appreciably lower than those just prior, indicating that our logical qubit is robust to the levels of qubit frequency and readout drift present. These results indicate that superconducting processors can remain stable for the hours-long timescales required for large-scale fault-tolerant algorithms21.

A repetition code memory in the ultralow-error regime

Despite realizing below-threshold surface codes, orders of magnitude remain between present logical error rates and the requirements for practical quantum computation. In previous work running repetition codes, we found that high-energy impact events occurred approximately once every 10 s, causing large correlated error bursts that manifested a logical error floor of around 10^−6 (ref. 17). Such errors would block our ability to run error-corrected algorithms in the future, motivating us to reassess repetition codes on our newer devices.

Using our 72-qubit processor, we run 2 × 10^7 shots of a distance-29 repetition code with 1,000 cycles of error correction, with the shots split evenly between bit- and phase-flip codes. In total, we execute 2 × 10^10 cycles of error correction comprising 5.5 h of processor execution time. Given the logical error probability pL at 1,000 cycles, we infer the logical error per cycle as \({\varepsilon }_{d}=\frac{1}{2}(1-{(1-2{p}_{{\rm{L}}})}^{1/1,000})\). To assess how the logical error per cycle scales with distance d, we follow ref. 37 and subsample lower-distance repetition codes from the distance-29 data.
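The inversion from total logical error probability to error per cycle follows directly from the fidelity relation 1 − 2pL = (1 − 2εd)^n; the example probability below is illustrative, not a measured value.

```python
# Sketch: inferring logical error per cycle from one long experiment,
# following the expression in the text: eps_d = (1 - (1 - 2*p_L)^(1/n)) / 2.

def error_per_cycle(p_logical, n_cycles):
    """Invert 1 - 2*p_L = (1 - 2*eps_d)^n for eps_d."""
    return (1 - (1 - 2 * p_logical) ** (1 / n_cycles)) / 2

# Example: a 10% logical error probability after 1,000 cycles corresponds
# to roughly 1.1e-4 logical error per cycle.
eps = error_per_cycle(0.10, 1000)
```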

Averaging over bit- and phase-flip repetition codes, we obtain Λ = 8.4 ± 0.1 when fitting logical error per cycle versus code distance between d = 5 and d = 11 (Fig. 3a). Notably, the error per cycle on the 72-qubit processor is suppressed far below 10^−6, breaking past the error floor observed previously. We attribute the mitigation of high-energy impact failures to gap-engineered Josephson junctions39. However, at code distances d ≥ 15, we observe a deviation from exponential error suppression, culminating in an apparent logical error floor of 10^−10. Although we do not observe any errors at distance 29, this is probably due to randomly decoding correctly on the few most-damaging error bursts. Although this logical error per cycle might permit certain fault-tolerant applications11, it is still many orders of magnitude higher than expected and precludes larger fault-tolerant circuits12,21.

Fig. 3: High-distance error scaling in repetition codes.
figure 3

a, εd versus d when decoding with minimum-weight perfect matching. The repetition code points are from d = 29, 10^3-cycle experiments, 10^7 repetitions for each basis X and Z. We subsample smaller codes from the same d = 29 dataset, averaging over subsamples. Line, fit of Λ. We include data from ref. 17 for comparison. b, Example event causing elevated detection probabilities, which decay exponentially with time constant 369 ± 6 μs (grey dashed line). Three consecutive experimental shots are plotted, delimited by the vertical grey lines. The 28 measure qubits are divided into four quartiles based on the average detection probability in the grey-shaded window. Each trace represents the detection probability averaged over one quartile and a time window of ten cycles. Roughly half the measure qubits experience an appreciable rise in detection probability. The inset shows the average detection probability for each measure qubit (coloured circle) in the grey-shaded window. c, Logical error scaling with the injected error. We inject a range of coherent errors on all the qubits and plot against the observed mean detection probability pdet. Each experiment is ten cycles, and we average over 10^6 repetitions. Smaller code distances are again subsampled from d = 29. Lines, power-law fits \({\varepsilon }_{d}={A}_{d}{p}_{\det }^{(d+1)/2}\) (one fit parameter, Ad), restricted to εd > 10^−7 and pdet < 0.3. d, 1/Λ scaling with the injected error. Typical relative fit uncertainty is 2%. Line, fit; 2.2pdet. To obtain the data in this figure, a 72-qubit processor is used.

When we examine the detection patterns for these high-distance logical failures, we observe two different failure modes (Supplementary Information). The first failure mode manifests as one or two detectors suddenly increasing in the detection probability by more than a factor of 3, settling to their initial detection probability tens or hundreds of cycles later (Supplementary Information). These less-damaging failures could be caused by transient TLSs appearing near the operation frequencies of a qubit, or by coupler excitations, but might be mitigated using methods similar to refs. 38,40. The second and more catastrophic failure mode manifests as many detectors simultaneously experiencing a larger spike in the detection probability; an example is shown in Fig. 3b. Notably, these anisotropic error bursts are spatially localized to neighbourhoods of roughly 30 qubits (Fig. 3b, inset). Over the course of our 2 × 1010 cycles of error correction, our processor experienced six of these large error bursts, which are responsible for the highest-distance failures. These bursts, such as the event shown in Fig. 3b, are different from previously observed high-energy impact events17. They occur approximately once an hour, rather than once every few seconds, and they decay with an exponential time constant of around 400 μs, rather than tens of milliseconds. We do not yet understand the cause of these events, but mitigating them remains vital to building a fault-tolerant quantum computer. These results reaffirm that long repetition codes are a crucial tool for discovering new error mechanisms in quantum processors at the logical noise floor. However, surface codes are larger and sensitive to more errors than repetition codes; therefore, these events may affect the surface code performance differently.

Furthermore, although we have tested the scaling law in equation (1) at low distances, repetition codes enable us to scan to higher distances and lower logical errors. Using a similar coherent error injection method as that in the surface code, we show the scaling of logical error versus physical error and code distance in Fig. 3c,d, observing good agreement with O(p(d+1)/2) error suppression. For example, reducing the detection probability by a factor of 2 manifests in a reduction by a factor of 250 in logical error at distance 15, consistent with the expected O(p^8) scaling. This shows the considerable error suppression that should eventually enable large-scale fault-tolerant quantum computers, provided we can reach similar error suppression factors in surface codes.
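The quoted factor of 250 follows directly from the exponent in equation (1): halving the physical error at distance d should reduce the logical error by 2^((d+1)/2).

```python
# Expected improvement from halving the physical error at distance d,
# under the O(p^((d+1)/2)) scaling of equation (1).

def improvement_from_halving(d):
    return 2 ** ((d + 1) // 2)

factor_d15 = improvement_from_halving(15)  # 2**8 = 256, vs the observed ~250
```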

Real-time decoding

Along with a high-fidelity processor, fault-tolerant quantum computing also requires a classical co-processor that can decode errors in real time. This is because some logical operations are non-deterministic; they depend on logical measurement outcomes that must be correctly interpreted on the fly. If the decoder cannot process measurements fast enough, an increasing backlog of syndrome information can cause an exponential increase in computation time23. Real-time decoding is particularly challenging for superconducting processors due to their speed. The throughput of transmitting, processing and decoding the syndrome information in each cycle must keep pace with the fast error-correcting cycle time of 1.1 μs. Using our 72-qubit processor as a platform, we demonstrate below-threshold performance alongside this vital module in the fault-tolerant quantum computing stack.

Our decoding system begins with our classical control electronics, where the measurement signals are classified into bits and then transmitted to a specialized workstation using low-latency Ethernet. Inside the workstation, measurements are converted into detections and then streamed to the real-time decoding software using a shared memory buffer. We use the sparse blossom algorithm41, which is optimized to quickly resolve the local configurations of errors common in surface code decoding, using a parallelization strategy similar to that in ref. 42. The decoder operates on a constant-sized graph buffer that emulates the section of the error graph being decoded at any instant, but does not grow with the total number of cycles used in the experiment. Different threads are responsible for different spacetime regions of the graph, processing their requisite syndrome information as it is streamed in42,43,44,45. These results are fused until a global minimum-weight perfect matching is found. The streaming decoding algorithm is illustrated in Fig. 4a,b. We also use a greedy edge reweighting strategy to increase the accuracy by accounting for correlations induced by Y-type errors28,46.
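The block-and-fuse structure can be illustrated with a deliberately simplified toy. In place of sparse blossom on the full spacetime graph, we match detection events on a 1-D line, where the optimal matching simply pairs sorted events consecutively; blocks are processed as they "stream" in, and any unpaired event is resolved at a fuse step with the next block. This is a structural sketch only, not the parallel matching algorithm of refs. 41,42.

```python
# Toy sketch of streamed decoding with fuse steps (1-D stand-in for
# minimum-weight perfect matching; all data here is illustrative).

def decode_block(events, carry):
    """Pair events inside one block; return pairs and the unpaired carry."""
    pending = ([carry] if carry is not None else []) + sorted(events)
    pairs = [(pending[i], pending[i + 1]) for i in range(0, len(pending) - 1, 2)]
    leftover = pending[-1] if len(pending) % 2 else None
    return pairs, leftover

def streaming_decode(blocks):
    """Process blocks in arrival order, fusing leftovers at boundaries."""
    matching, carry = [], None
    for block in blocks:
        pairs, carry = decode_block(block, carry)
        matching.extend(pairs)
    # A leftover that no future fuse step can resolve would be a heralded failure.
    assert carry is None, "unresolvable detection: decoder heralds failure"
    return matching

# Two blocks of detection-event positions; the event at 19 is deferred to the
# fuse step, where it pairs with the event at 21 in the next block.
result = streaming_decode([[3, 5, 19], [21, 24, 27]])
```

The key property the toy shares with the real system is that each block is decoded with only local information plus a small amount of boundary state, so the work per block is constant and the total latency does not grow with experiment length.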

Fig. 4: Real-time decoding.
figure 4

a, Schematic of the streaming decoding algorithm. The decoding problems are subdivided into blocks, with different threads responsible for different blocks. b, Task graph for processing blocks. Detections are enabled to match to the block boundaries, which will then be processed downstream during a fuse step. If a configuration of detection events cannot be resolved by a future fuse step, the decoder heralds failure. We use ten-cycle blocks to ensure that the heralded failure rate is negligible compared with the logical failure rate. c, Decoder latency versus experiment duration. Each blue point corresponds to a latency measurement for a full shot (ten shots per duration; horizontal bar, median; blue shading, violin plot). The yellow histograms represent fine-grained latency measurements of the time between receiving data and completing decoding for each ten-cycle block in a shot. The values from these fine-grained measurements, which we refer to as subshot latencies, tend to be slightly larger than those from full-shot latency measurements as the decoder may need to wait to fuse with detection events in future cycles. Infrequently, we see brief subshot latency spikes above 1 ms (Supplementary Information). d, Accuracy comparison for the surface code with three decoders. We include the real-time decoder (RT), ensembled matching synthesis (Ens.) and the neural network decoder (NN). Uncertainty on each point is less than 4 × 10^−4 (Supplementary Information). To obtain the data in this figure, a 72-qubit processor is used.

In Fig. 4c, we report the decoder latency, which we define as the time between the decoding software receiving the final cycle of syndrome measurements and the time at which the decoder returns its correction. For our distance-5 surface code, we test different problem sizes by increasing the number of error correction cycles up to 10^6. We observe that the decoder latency remains roughly constant at an average of 63 ± 17 μs, independent of the length of the experiment (up to 1.1 s), indicating that the decoding problem is being processed in real time. This latency will eventually lower bound the reaction time of the logical processor when enacting non-Clifford gates. Other contributions to the reaction time include the data transmission time (which we estimate to be less than 10 μs) and feedback (which we have not yet implemented). Moreover, our decoder latency scales with the code size, underscoring the need for further optimization.

Importantly, we are able to maintain below-threshold performance even under the strict timing requirements imposed by real-time decoding. We run a dedicated experiment on our 72-qubit processor to compare real-time decoding with high-accuracy offline neural network decoding of the same data, with the results shown in Fig. 4d. Our real-time decoder achieves ε5 = 0.35% ± 0.01% and Λ = 2.0 ± 0.1 using a device-data-independent prior. Meanwhile, the neural network decoder achieves ε5 = 0.269% ± 0.008% and Λ = 2.18 ± 0.09 when later decoding offline. The modest reduction in accuracy when comparing the real-time decoder with an offline decoder is expected, as the real-time decoder must operate substantially faster: it must process each cycle in less than 1.1 μs, compared with the 24 μs per cycle of the neural network26. However, we expect that many of our high-accuracy decoding methods can eventually be made real time by introducing techniques like layered or windowed decoding27,43,44.

Outlook

In this work, we have demonstrated surface code memory below the threshold in our new Willow architecture. Each time the code distance increases by two, the logical error per cycle is reduced by more than half, culminating in a distance-7 logical qubit lifetime more than double that of its best constituent physical qubit. This signature of exponential logical error suppression with code distance forms the foundation of running large-scale quantum algorithms with error correction.

Our error-corrected processors also demonstrate other key advances towards fault-tolerant quantum computing. We achieve repeatable performance over timescales of several hours and run experiments of up to 10^6 cycles without deteriorating performance, both of which are necessary for future large-scale fault-tolerant algorithms. Furthermore, we have engineered a real-time decoding system with only a modest reduction in accuracy compared with our offline decoders.

Even so, many challenges remain ahead of us. Although we might, in principle, achieve low logical error rates by scaling up our current processors, it would be resource intensive in practice. Extrapolating the projections shown in Fig. 1d, achieving a 10^−6 error rate would require a distance-27 logical qubit using 1,457 physical qubits. Scaling up will bring additional challenges in real-time decoding as the syndrome measurements per cycle increase quadratically with the code distance. Our repetition code experiments also identify a noise floor at an error rate of 10^−10 caused by correlated bursts of errors. Identifying and mitigating this error mechanism will be integral to running larger quantum algorithms.
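The extrapolation can be reproduced from the reported distance-7 numbers, treating equation (1) as exact. This idealization ignores any error floor, which is precisely why the repetition code results above matter.

```python
# Sketch: extrapolating equation (1) to higher distances from the measured
# distance-7 point, using the reported eps_7 ~ 1.43e-3 and Lambda ~ 2.14.
# eps_d = eps_7 / Lambda^((d - 7) / 2) is an idealization with no error floor.

def extrapolate(eps_7, lam, d):
    return eps_7 / lam ** ((d - 7) / 2)

def physical_qubits(d):
    """Qubits per surface code logical qubit: 2*d^2 - 1."""
    return 2 * d * d - 1

eps_27 = extrapolate(1.43e-3, 2.14, 27)  # ~7e-7, i.e. near the 1e-6 target
n_qubits = physical_qubits(27)           # 1,457 physical qubits
```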

However, quantum error correction also provides us exponential leverage in reducing logical errors with processor improvements. For example, reducing physical error rates by a factor of two would improve the distance-27 logical performance by four orders of magnitude, well into algorithmically relevant error rates11,12. We further expect these overheads to reduce with advances in error correction protocols47,48,49,50,51,52,53 and decoding54,55,56.

The purpose of quantum error correction is to enable large-scale quantum algorithms. Although this work focuses on building a robust memory, additional challenges will arise in logical computation57,58. On the classical side, we must ensure that software elements including our calibration protocols, real-time decoders and logical compilers can scale to the sizes and complexities needed to run multiple surface code operations59. With below-threshold surface codes, we have demonstrated processor performance that can scale in principle, but which we must now scale in practice.