Abstract
This paper improves upon and demonstrates the usefulness of first quantized plane-wave algorithms for the quantum simulation of electronic structure. We describe our quantum algorithm for first quantized simulation that accurately includes pseudopotentials. We focus on the Goedecker-Teter-Hutter pseudopotential and, despite its complicated form, we block encode the associated operator without significantly increasing the overall cost of quantum simulation. This is surprising since the nuclear potential is much simpler to simulate without pseudopotentials, yet it is still the bottleneck. We also generalize prior methods to enable the simulation of materials with non-cubic unit cells, which requires nontrivial modifications. Finally, we combine these techniques to estimate block-encoding costs for commercially relevant instances of heterogeneous catalysis (e.g., carbon monoxide adsorption) and compare to the quantum resources needed to simulate materials in second quantization. We conclude that for computational cells with many particles, first quantization often requires meaningfully less spacetime volume.
Introduction
Simulation of quantum systems is an area where quantum computers could provide major speedups over the best classical methods, and quantum simulation of chemistry or materials is an area likely to yield considerable practical benefit. To realize the potential of quantum computing, it is crucial to carefully optimize the representation of the simulation problem in order to obtain the lowest-complexity implementation on the quantum computer and to make the most efficient use of quantum resources such as ancilla qubits and Toffoli/T gates. Crucial here is the choice of basis in which the electronic degrees of freedom of a system are discretized, a selection of the most relevant such degrees of freedom and an effective treatment of the remaining ones (e.g., core electrons), and methods through which the resulting Hamiltonian can be represented as compactly as possible on a quantum computer.
Recent work has focused on efficient block encodings of chemical and materials Hamiltonians within second quantization by optimizing the construction of the linear combination of unitaries (LCU) representation of the Hamiltonian and estimating phase estimation costs1,2,3,4,5,6,7,8,9. While second quantized representations of the Hamiltonian in a localized basis provide a number of advantages, such as efficient representation of low-density systems and compact basis sets, plane-wave basis sets provide complementary advantages which have been explored within the context of efficient quantum algorithm representations using a first quantized approach10,11,12.
Plane wave based methods are routinely applied in the classical simulation of the electronic structure of periodic systems. According to Bloch’s theorem, the wave function for an electron in a periodic potential is a product of a plane wave and a periodic function. These Bloch functions form an ideal single-particle basis for calculations on periodic many-electron systems. This basis naturally reflects the translational symmetry of the crystal lattice, captures the extended nature of the wave functions for these systems, and is systematically improvable via the inclusion of higher-energy plane waves. Importantly, plane-wave-based representations of the operators encountered in non-relativistic quantum chemistry are trivial to evaluate; matrix elements of the kinetic energy and Coulomb operators in the momentum basis are simple functions of the momenta associated with the plane-wave functions. These nice properties have contributed to the success of plane wave density functional theory (DFT) in materials science applications13.
The primary disadvantage of plane waves is that a large number of functions is required to accurately resolve tightly localized or rapidly oscillating features, such as the electronic wave function in close proximity to nuclei. This challenge has motivated the development of a variety of approaches (e.g., norm-conserving pseudopotentials14, ultra-soft pseudopotentials15, the projector augmented wave (PAW) method16,17, etc.) that avoid explicitly considering the core electrons and orbitals. Pseudopotential approaches achieve this aim by replacing the true ionic potential for the valence electrons with a smoothly-varying effective potential (with a different functional form) that reproduces the behaviour of the valence electrons in the presence of the core orbitals. Pseudopotentials thus reduce the computational cost of plane wave calculations in two ways: (i) one no longer needs to include high-energy plane waves to resolve core orbitals, and (ii) the total number of electrons in the wave function is limited to those that reside within the valence space. Pseudopotentials can also be parameterized to capture physics beyond that contained in the Coulomb Hamiltonian, such as scalar relativistic effects18. In this work, we focus on the Goedecker-Teter-Hutter (GTH)19 and Hartwigsen-Goedecker-Hutter (HGH)20 parameterizations of the norm-conserving pseudopotential, the latter of which accounts for scalar relativistic effects. This parameterization of the norm-conserving pseudopotentials has an analytical form for the necessary matrix elements that is simple enough to implement coherently with a quantum circuit.
Reference 21 proposed the idea of using pseudopotentials to improve the efficiency of first-quantized plane-wave basis simulations and proposed a method for simulating non-cubic unit cells. Their proposal leverages the fact that the QROM3 primitive can be used for state preparation, and the pseudopotential becomes substantially simpler if angular momentum projectors are ignored (see Eq. (29) of that work). There are two significant drawbacks to their approach that render it of questionable utility in practice: (1) the QROM-based state preparation for the pseudopotential (based on Grover-Rudolph22) requires QROM to output as much data as the entire plane-wave basis, leading to very high costs, and, even more importantly, (2) ignoring off-diagonal angular momentum projectors of the pseudopotential results in energy errors that are hundreds to thousands of times chemical accuracy per atom. Errors of this magnitude simply mean one is simulating a completely different Hamiltonian than the intended system. In Section “Errors from omitting angular momentum projectors”, we confirm this error numerically for large-core pseudopotentials by demonstrating that local density approximation (LDA) DFT energies differ by as much as Hartrees when ignoring off-diagonal angular momentum projectors. While it is possible to extend the QROM-heavy approach of ref. 21 to include higher angular momentum terms, this would incur a substantial increase in the amount of data output by QROM and could give a further order of magnitude increase in Toffoli costs. A further difficulty with QROM in the form used in that work (with many ancilla qubits) is that there is a large overhead of Clifford gates, and thus a Toffoli-only cost model is less appropriate for estimating quantum resources. Finally, their non-cubic unit cell treatment uses an even distribution of grid points regardless of the simulation cell dimensions, resulting in an uneven discretization of momentum space. Thus, while the use of pseudopotentials in first-quantized plane-wave calculations has been discussed in prior literature, a scalable and practical quantum simulation strategy incorporating these features has not been articulated.
Our approach provides a scalable block encoding cost while appropriately generalizing to non-cubic unit cells. In order to avoid the large ancilla and Toffoli complexity associated with QROM, we only use the QROM primitive for data that scales linearly with the number of atoms, or logarithmically in the basis size. We primarily use arithmetic to implement the SELECT oracle in order to yield (poly)logarithmic scaling in the total basis set size. That is an exponential improvement over the approach of ref. 21.
In our approach, the highest-complexity step in SELECT is (like the approach without pseudopotentials10,11) still \(\widetilde{{\mathcal{O}}}(\eta )\) for η electrons. Specifically, we use a QROM to output position data for the nuclei, as well as for function approximation by interpolation in order to efficiently compute the primitive function composing the GTH pseudopotential, the negative exponential. The local and nonlocal components of the pseudopotential share the same functional form, and thus we minimize costs by using the same set of arithmetic primitives to compute the common core functions. Further leveraging coherent arithmetic, we are able to use a plane-wave basis in a non-cubic unit cell and variable grid resolutions in each Miller direction. Accounting for different numbers of bits in each Miller direction requires modification to the block encoding primitives. While the omission of large components of the nonlocal pseudopotential and the different handling of non-cubic unit cells make it difficult to directly compare our block encoding costs with those in ref. 21, our arithmetic approach for the entire Hamiltonian is approximately two orders of magnitude more efficient than the previous strategy that implemented the incomplete operator.
To demonstrate the cost savings in block encoding, and to frame the total quantum resources needed to resolve an open materials question, we perform resource estimation on heterogeneous catalysis systems for carbon monoxide adsorption on transition metal catalysts. Accurate calculations of the energetics of CO adsorption are important in a number of applications such as steam reforming of fossil fuels23, methanol synthesis24,25, and the water-gas shift reaction and its reverse26. In order to perform resource estimates at grid resolutions and system sizes that are realistic for resolving the CO adsorption question, we perform DFT calculations on a variety of metal-CO systems to determine the number of plane waves and grid resolution needed to accurately resolve binding energies. Using the grid resolution estimates, we estimate the total Toffoli complexity to perform phase estimation to chemical precision (\(1.6\times {10}^{-3}\) Hartree). We estimate that while the block encoding costs are on the order of \({10}^{4}\) Toffoli gates, the total costs for phase estimation are substantial (on the order of \({10}^{14}\) Toffoli gates) due to the large λ-value associated with the weight in the block encoding of the Hamiltonian.
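As a rough guide to how such totals arise, the sketch below tallies the Toffoli count assuming the standard qubitization-based phase estimation query count of roughly ⌈πλ/(2ϵ)⌉ block encodings; the λ and per-block-encoding values are illustrative placeholders (only their orders of magnitude are taken from the discussion above), not numbers from our resource tables.

```python
# Rough sketch (not from the paper's resource tables): total Toffoli count for
# qubitization-based phase estimation, assuming roughly ceil(pi*lambda/(2*eps))
# block-encoding queries. lam and cost_be below are illustrative placeholders.
import math

def phase_estimation_toffolis(lam, eps, toffolis_per_block_encoding):
    """Total Toffolis = (# block-encoding queries) * (Toffolis per query)."""
    queries = math.ceil(math.pi * lam / (2 * eps))
    return queries * toffolis_per_block_encoding

eps = 1.6e-3        # chemical precision in Hartree
lam = 1.0e7         # hypothetical lambda in Hartree (placeholder)
cost_be = 2.0e4     # hypothetical Toffolis per block encoding (order 10^4)
print(f"{phase_estimation_toffolis(lam, eps, cost_be):.1e}")   # ~2e14, order 10^14
```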
A full block encoding of the pseudopotential now allows us, for the first time, to accurately compare first quantized simulation methods to second quantized simulations in order to identify which encoding strategy offers the current best approach for simulating materials. We compared the second quantized simulation costs for the Lithium-Nickel-Oxide (LNO) cathode simulation, recently presented in ref. 7, to first quantized simulation costs. We select a grid resolution from DFT calculations of the LNO system with increasing cutoff energies. We find that while first quantized simulations, costing about \({10}^{14}\) Toffoli gates, are more expensive than second quantized simulations, costing about \(5.0\times {10}^{12}\) Toffoli gates, the space resources for first quantization are substantially lower. For the LNO system the dominant space complexity in first quantization is the system register, using about 1400 qubits, while for second quantized calculations the dominant space complexity is the QROAM, costing about 75,000 qubits.
In Section “The plane-wave basis from non-orthogonal Bravais vectors” we describe the method of including non-cubic unit cells in the analysis. Then in Section “The Goedecker-Teter-Hutter separable dual-space Gaussian pseudopotential” we describe the GTH pseudopotential and the principle of the method used to block encode it. In Section “Errors from omitting angular momentum projectors” we analyze the error incurred by omitting terms in the pseudopotential. We give an overview of the method of block encoding the GTH pseudopotential in Section “Block encoding the GTH pseudopotential”, then go into details of the cost of the block encoding in Section “Details of the costing”. The cost of block encoding other parts of the Hamiltonian is explained in Section “Other costs in the block encoding”. The overall cost also requires the λ-value of the block encoding, which is addressed in Section “Computing λ”. These results are used to determine cost estimates for examples of interesting materials in Section “Cost estimates for quantum simulating interesting materials”.
Results
The plane-wave basis from non-orthogonal Bravais vectors
Most work simulating first-quantized plane-wave basis electronic structure on a quantum computer has been confined either to non-periodic systems or to systems with the simplest possible periodicity, e.g., cubic periodicity. Cubic periodicity results in a simple expression for the reciprocal vectors (as scaled unit vectors), and thus the computational basis states of a quantum register are used as Miller indices, leading to an exponential space saving. Furthermore, the arithmetic needed for block encoding is simplified by the fact that the Gramian of reciprocal lattice vectors for a cubic unit cell is proportional to the identity. For non-cubic unit cells the reciprocal lattice vectors no longer have a simple form (most generally they are computed by scaled cross products). We still use the computational basis of a quantum register to represent Miller indices, but we require more complicated arithmetic to block encode using a plane-wave basis generated from reciprocal lattice vectors whose Gramian does not form a scaled identity matrix. Computing basis state norms in the generalized basis can involve up to six fixed-point multiplications of the squared Miller indices with elements of the reciprocal lattice Gramian.
In what follows, we introduce the details of the basis and the representation we use in the quantum calculation. The three-dimensional simulation cell is described by three vectors for the sides of the cell, arranged as rows in a matrix denoted a. Given the geometry of the simulation cell, the reciprocal lattice vectors are described as
\({{\boldsymbol{g}}}^{(1)}=\frac{2\pi }{\Omega }\,{{\boldsymbol{a}}}_{2}\times {{\boldsymbol{a}}}_{3},\qquad {{\boldsymbol{g}}}^{(2)}=\frac{2\pi }{\Omega }\,{{\boldsymbol{a}}}_{3}\times {{\boldsymbol{a}}}_{1},\qquad {{\boldsymbol{g}}}^{(3)}=\frac{2\pi }{\Omega }\,{{\boldsymbol{a}}}_{1}\times {{\boldsymbol{a}}}_{2},\)
where Ω is the volume of the simulation cell, computed by taking the determinant of a. We will not be using k-point sampling, and thus commonly the simulation cell will be multiples of the primitive cell (the unit cell containing one lattice point). Examples of vectors for different classes of materials are given in the Supplementary Information, Table III. For simulation cells that are multiples of the primitive cell, the vectors can easily be scaled. For example, using two primitive cells in the first direction doubles \({{\boldsymbol{a}}}_{1}\) and halves \({{\boldsymbol{g}}}^{(1)}\).
The reciprocal lattice vectors can be efficiently computed by \([{{\boldsymbol{g}}}^{(1)},{{\boldsymbol{g}}}^{(2)},{{\boldsymbol{g}}}^{(3)}]=2\pi {({{\boldsymbol{a}}}^{-1})}^{T}\). For cubic simulation cells the reciprocal lattice vector, shown here only for \({{\boldsymbol{g}}}^{(1)}\), is simply
\({{\boldsymbol{g}}}^{(1)}=\frac{2\pi }{{\Omega }^{1/3}}{{\boldsymbol{e}}}_{1},\)
where \({{\boldsymbol{e}}}_{1}\) is the unit vector along the primitive cell direction \({{\boldsymbol{a}}}_{1}\).
The basis is indexed by the Miller indices (px, py, pz), a 3-vector of integers describing reciprocal cell shifts. For a cubic simulation cell the momentum vector is proportional to the vector of Miller indices, so the inner product would be computed as \(\langle {k}_{p},{k}_{q}\rangle =\frac{4{\pi }^{2}}{{\Omega }^{2/3}}({p}_{x}{q}_{x}+{p}_{y}{q}_{y}+{p}_{z}{q}_{z})\). In contrast, for the general reciprocal lattice vectors we would compute the momentum vector as
\({k}_{p}={p}_{x}{{\boldsymbol{g}}}^{(1)}+{p}_{y}{{\boldsymbol{g}}}^{(2)}+{p}_{z}{{\boldsymbol{g}}}^{(3)}.\)
Then the inner product of basis vectors is obtained by treating kp, kq as Cartesian vectors to give
\(\langle {k}_{p},{k}_{q}\rangle =\mathop{\sum }\limits_{i,j=1}^{3}{p}_{i}{q}_{j}\,{g}_{ij},\)
where \({g}_{ij}={({{\boldsymbol{g}}}^{(i)})}^{T}{{\boldsymbol{g}}}^{(j)}\). Here the gij are elements of the Gramian matrix, which is real, symmetric, and positive definite. The squared norm of a basis function, \({\Vert {k}_{p}\Vert }^{2}\), is now computed as
\({\Vert {k}_{p}\Vert }^{2}={p}_{x}^{2}{g}_{11}+{p}_{y}^{2}{g}_{22}+{p}_{z}^{2}{g}_{33}+2\left({p}_{x}{p}_{y}{g}_{12}+{p}_{x}{p}_{z}{g}_{13}+{p}_{y}{p}_{z}{g}_{23}\right).\)
When considering sets of Miller indices with different numbers of points in each direction we would use three registers to store the signed integer set
\(G=\left[-\frac{{N}_{x}-1}{2},\frac{{N}_{x}-1}{2}\right]\times \left[-\frac{{N}_{y}-1}{2},\frac{{N}_{y}-1}{2}\right]\times \left[-\frac{{N}_{z}-1}{2},\frac{{N}_{z}-1}{2}\right]\subset {{\mathbb{Z}}}^{3},\)
which will use nx, ny, nz bits in each direction, with
\({N}_{x}={2}^{{n}_{x}}-1\)
and similarly for y and z. We will also consider a set for differences in momenta,
\({G}_{d}=\left[-({N}_{x}-1),{N}_{x}-1\right]\times \left[-({N}_{y}-1),{N}_{y}-1\right]\times \left[-({N}_{z}-1),{N}_{z}-1\right]\subset {{\mathbb{Z}}}^{3},\)
where the width in each direction is doubled. We also consider G0, which is Gd excluding the origin. That set is used where \(1/{\Vert {k}_{\nu }\Vert }^{2}\) weights are required, but we can use Gd for the pseudopotentials because they do not diverge at ν = 0.
For indexing Gd or G0, we have maximum absolute values of \({N}_{x}-1={2}^{{n}_{x}}-2\), and similarly for y and z. That means that we can use nx, ny, nz qubits for magnitudes of components of vectors ν in Gd or G0, as well as sign qubits. Note also that the number of points in G is N = NxNyNz, whereas the number of points in Gd is (2Nx − 1)(2Ny −1)(2Nz − 1) < 8 N.
To compute \({\Vert k\Vert }^{2}\) using coherent arithmetic, there are up to three squares of the integers px, py, pz, three products of integers, and six multiplications of integers by the real numbers gij. In practice, we will always be multiplying \({\Vert k\Vert }^{2}\) by a real number in another part of the algorithm, so we can scale gij and one of those multiplications is not required. Moreover, for real examples there are often simplifications where values of gij are zero or equal to each other. That enables us to eliminate or combine multiplications by real numbers, as well as to eliminate the need for some of the products of integers. The simplifications for specific examples are detailed in Section “Complexity of norm with Bravais vectors”. We note that while the aforementioned strategy involves some additional arithmetic, it does easily allow for differing resolution in each Miller index direction. This allows one to maintain a uniform sampling in reciprocal space no matter how non-cubic the simulation cell. In ref. 21 a uniform distribution of the total number of plane waves is taken in each Miller index direction, which can lead to non-uniform simulation grids and ultimately skew results. Modifying the grid size in each direction requires modification of the block encoding oracles discussed later.
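To make the arithmetic above concrete, the following is a minimal classical sketch (with our own variable names, not taken from the paper) of building the reciprocal lattice vectors from the cell matrix, forming the Gramian, and evaluating \({\Vert {k}_{p}\Vert }^{2}\) from integer Miller indices.

```python
# Classical sketch of the arithmetic described above: build the reciprocal
# lattice vectors g = 2*pi*(a^{-1})^T from the cell matrix a, form the Gramian
# g_ij, and evaluate ||k_p||^2 from integer Miller indices.
import numpy as np

def reciprocal_vectors(a):
    """Rows of the returned matrix are the reciprocal lattice vectors g^(1..3)."""
    return 2 * np.pi * np.linalg.inv(a).T

def norm_squared(p, gram):
    """||k_p||^2 = sum_ij p_i p_j g_ij, i.e. squares, cross products, and
    multiplications by the Gramian entries, as counted in the text."""
    p = np.asarray(p, dtype=float)
    return float(p @ gram @ p)

# Example: an orthorhombic (non-cubic) simulation cell of 8 x 10 x 12 Bohr.
a = np.diag([8.0, 10.0, 12.0])
g = reciprocal_vectors(a)
gram = g @ g.T                      # g_ij = g^(i) . g^(j)
print(norm_squared((1, -2, 3), gram))
```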
The Goedecker-Teter-Hutter separable dual-space Gaussian pseudopotential
Local and nonlocal pseudopotentials
Nonlocal pseudopotentials are perhaps the most popular type of pseudopotential. This family of pseudopotentials includes norm-conserving pseudopotentials. As an example, Kleinman-Bylander pseudopotentials can be written as
where ℓ indexes the L nuclei in the simulation. In this equation \({v}_{{\rm{loc}}}^{{\alpha }_{\ell },{R}_{\ell }}\) is the local Coulomb potential (e.g. this might be expressed as \({Z}_{{\alpha }_{\ell }}/(r-{R}_{\ell })\) in position space), where we use αℓ to indicate the species (type of nuclei) associated with the ℓ index, Rℓ is used to indicate the coordinates of the nuclei, and Zα is the charge of a nucleus of species α (including the charge associated with the core). We can write the form of a Kleinman-Bylander nonlocal pseudopotential component as
where l and m are the quantum numbers corresponding to angular momentum and the projection of angular momentum. The nonlocal pseudopotentials are written this way because the purpose of the nonlocal part is to capture the angular momentum dependent interactions of the core electrons that we remove using the pseudopotential.
In this paper we will focus on implementing a specific type of nonlocal pseudopotential known as the Goedecker-Teter-Hutter (GTH) pseudopotential19, which has been popularized by CP2K developers over many years. Specifically, we focus on the relativistic parameterization of this pseudopotential introduced by Hartwigsen, Goedecker and Hutter (HGH)20. These two papers have been cited over ten thousand times in total, and are among the most commonly used pseudopotentials. A comprehensive review of the pseudopotential literature27 found that the HGH parameterization of the GTH pseudopotential is comparable in accuracy to other accurate norm-conserving, ultrasoft, and projector-augmented wave pseudopotentials. That study also demonstrated that, when benchmarking the consistency of equation-of-state (EOS) calculations against all-electron calculations, the HGH parameterization yields a root-mean-square error that is a factor of 2-3 larger than PAW. The consistency of EOS does not benchmark absolute accuracy, though, and norm-conserving pseudopotentials like those of HGH are considered accurate27. Here we focus on HGH parameterizations as a step towards high-accuracy quantum computations of real materials.
The local component of the GTH pseudopotential can be expressed as follows in real space:
\({}^{{\rm{GTH}}}{v}_{{\rm{loc}}}^{\alpha ,R}(r)=-\frac{{Z}_{\alpha }}{| r-R| }\,{\rm{erf}}\left(\frac{| r-R| }{\sqrt{2}\,{r}_{{\rm{loc}}}^{\alpha }}\right)+\exp \left(-\frac{| r-R{| }^{2}}{2{({r}_{{\rm{loc}}}^{\alpha })}^{2}}\right)\mathop{\sum }\limits_{n=1}^{4}{C}_{n}^{\alpha }{\left(\frac{| r-R| }{{r}_{{\rm{loc}}}^{\alpha }}\right)}^{2(n-1)}.\)
We note that the \({C}_{n}^{\alpha }\) constants as well as \({r}_{{\rm{loc}}}^{\alpha }\) are all α (nuclear species type) dependent parameters. Parameters optimized for density functional calculations using the local density approximation (LDA) exchange-correlation functional, for α corresponding to several atoms, are given in ref. 20, and the ones we use are provided in the Supplementary Information, Table I. It is a special property of the GTH pseudopotential (beneficial for our purposes) that the local part also has an analytical form when expressed in reciprocal space. This form of the pseudopotential would be useful if our intention was to implement the pseudopotential in a real space grid parameterization (for example, in conjunction with the algorithms of Kassal et al.28 or with the algorithm described in Appendix K of ref. 11). However, here we will focus on implementing the operator in momentum space. The Fourier transform of the local component of the GTH pseudopotential is
The matrix elements of the local part of the pseudopotential are determined in the same way as for any other periodic one-body operator, i.e., \(\langle G^{\prime} {| }^{{\rm{GTH}}}{v}_{{\rm{loc}}}^{\alpha ,R}\left(\hat{r}\right)| G^{\prime\prime} \rangle ={}^{{\rm{GTH}}}{v}_{{\rm{loc}}}^{\alpha ,R}\left(G^{\prime} -G^{\prime\prime} \right)\). Note that Eq. (12) diverges for \(k=G^{\prime} -G^{\prime\prime} =0\). Since the case with k = 0 would correspond to a term proportional to the identity, we do not include it in the sum, and there is no problem with divergent values.
The matrix elements for the nonlocal contribution to the GTH pseudopotential are
where we have taken the sum over the projection m of the angular momentum l so that
where Pl(x) is the lth Legendre polynomial of x. In these expressions \({l}_{\max }^{\alpha }\) depends on α, and \({E}_{l\alpha }^{ij}\) is a constant tabulated for each atom type α and angular momentum. The function \({F}_{l\alpha }^{j}\left(\left\Vert k\right\Vert \right)\) takes the form
where \({C}_{li}^{\alpha }\), {cx,li}, and \({r}_{l}^{\alpha }\) are all tabulated constants available for each atom, angular momentum, and a projector index i. The \({C}_{li}^{\alpha }\) and cx,li parameters that enter Eq. (15) are given in Table 1. Note that \({C}_{li}^{\alpha }\) depends on α only through \({r}_{l}^{\alpha }\), and the ratio \({C}_{li}^{\alpha }/{({r}_{l}^{\alpha })}^{l+3/2}\) is independent of α. Examples of \({r}_{l}^{\alpha }\) and \({E}_{l\alpha }^{ij}\) parameters that were optimized for LDA with α corresponding to several atoms are given in ref. 20, and the ones we consider are listed in the Supplementary Information, Table II.
The complete Hamiltonian for the GTH pseudopotential
For the complete Hamiltonian we need to sum \({}^{{\rm{GTH}}}{v}_{{\rm{loc}}}^{\alpha ,R}\left(k\right)\) and \({}^{{\rm{GTH}}}{v}_{{\rm{nonloc}}}^{\alpha ,R}({k}_{p},{k}_{q})\) over all L nuclei, and include the kinetic and two-electron parts of the Hamiltonian, which are unchanged from what we used in ref. 11. We also need to sum over the electrons, so we have k for each of the electron registers. For each of the nuclei there is a distinct position of the nucleus Rℓ (which is a vector). The quantity α indexes the different types of nuclei. For each type of nucleus, there are values tabulated for Zα, \({r}_{{\rm{loc}}}^{\alpha }\), \({C}_{n}^{\alpha }\), \({l}_{\max }^{\alpha }\), \({r}_{l}^{\alpha }\), \({E}_{l\alpha }^{ij}\), and \({C}_{li}^{\alpha }\), with cx,li independent of the nucleus. The simplest case is that where there is only one type of nucleus; the amount of data needed grows with the number of types of nuclei. To be more specific, we have
where
are real coefficients where we have removed the phase factor. Note that these now no longer depend on the nuclear coordinate Rℓ. We have modified the description of the nuclear part of the Hamiltonian Uloc and Unonloc so that it is a sum over q and the difference ν = q − p. The complete Hamiltonian is similar to that in prior work11, except there the potential U was
Therefore, for the simulation we can apply many of the techniques from before, except that now we need to account for the much more complicated form of U.
Method for block encoding of U
The way that this work differs from ref. 11 is that we have a different form of U, composed of functions involving sums where each term involves an exponential with a different argument. If we were to attempt to perform the block encoding using coherent arithmetic for the entire sum with repeated calculation of the exponential, then the complexity would be large due to the repeated coherent arithmetic. Instead, we introduce a method to block encode these functions using coherent arithmetic with only a single evaluation of the exponential. We will explain this in more detail for just the case of Unonloc, and it is straightforward to combine this method with the block encoding of Uloc, which also requires an exponential.
To illustrate, a simplified form of the circuit is shown in Fig. 1, which omits details such as the sums over the nucleus and electron number for simplicity. The register \(\left\vert \nu \right\rangle\) is an ancilla prepared for the block encoding, and \(\left\vert q\right\rangle\) is the target system. We also include another register with indices l, i, j corresponding to indices in the sum for \({u}_{{\rm{non}}}^{{\alpha }_{\ell }}({k}_{q},{k}_{q-\nu })\). This register is also prepared in a superposition state before using this circuit. The values of q, ν, l, i, j are all used to calculate the argument of the exponential function in an ancilla register with b bits (indicated by \({\left\vert 0\right\rangle }^{\otimes b}\)), shown as ‘arg’. That is then used to calculate the exponential function once. The result is then used in an inequality test with a register in an equal superposition state to provide a result in a flag register that is used to flag ‘success’, thereby achieving the correct amplitude in the block encoding.
The calculations then need to be inverted to reset the ancilla qubits, shown as \({\exp }^{-1}\) and arg−1. This inversion normally doubles the cost of the calculation, but it is also possible to invert the calculations using Clifford gates. That can be achieved if ancilla qubits are preserved from the initial calculation, so that the Toffolis for the inverse calculation can be performed with measurements and Clifford gates. In many cases the overall number of qubits needed just to store the system state is large and the number of Toffolis for the calculation is small, so it is preferable to perform the inversion in this way.
This is the key method to modify the block encoding to enable efficient encoding of the pseudopotentials. Below in Section “Block encoding the GTH pseudopotential” we give a detailed explanation of how the block encodings of the different parts are combined, the methods for the state preparation for this block encoding, and the methods for performing the coherent arithmetic including efficient calculation of the exponential.
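The principle behind the inequality test in Fig. 1 can be illustrated classically. The sketch below (our own illustration, not the quantum circuit) counts, for a b-bit register, the fraction of register values passing the test; once the ancilla is returned to its initial state, that fraction is the amplitude applied to the branch, and it reproduces the target function, here a single negative exponential of the computed argument.

```python
# Classical illustration of the amplitude applied by the inequality test in
# Fig. 1: the fraction of equal-superposition register values m with m < f(arg)
# equals the amplitude applied on the 'success' branch.
import math

def applied_amplitude(arg, num_bits=16):
    """Target amplitude exp(-arg), applied via an inequality test against a
    uniformly weighted num_bits-bit register."""
    M = 2 ** num_bits
    f = round(math.exp(-arg) * M)   # single evaluation of the exponential
    passing = min(f, M)             # number of register values m with m < f
    return passing / M              # matches exp(-arg) up to ~2^-num_bits

for arg in (0.5, 1.0, 3.0):
    print(arg, applied_amplitude(arg), math.exp(-arg))
```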
Errors from omitting angular momentum projectors
Before describing the block encoding of the pseudopotential Hamiltonian and the generalized plane-wave basis, we first motivate the need to simulate the entire nonlocal pseudopotential by quantifying the error incurred by ignoring all but one projector for each angular momentum function, as is done in ref. 21. Energetically, this approximation appears to be a crude one. We have computed LDA energies for three systems containing metals for which the HGH nonlocal pseudopotential has up to three projectors for at least one angular momentum function: Al, Mn, and Ni. Table 2 provides LDA energies computed using full HGH-type pseudopotentials for varying numbers of plane waves (Npw). Also provided are errors incurred from the use of approximate HGH-type pseudopotentials (in parentheses). The error is evaluated as the difference in the LDA energy computed using the approximate pseudopotential and the full one, both using the same orbitals obtained from self-consistent calculations with the full pseudopotential (in the DFT calculation, a determinant constructed from these orbitals approximates the eigenstate). Approximating the nonlocal part of the pseudopotential can lead to errors on the order of hundreds of milliHartree (mEh).
Consider first AlN; the HGH pseudopotential for Al has up to two projectors per angular momentum function. For the primitive cell of AlN, calculations with large numbers of plane waves demonstrate that large errors (≈0.1658 Hartree (Eh) with 15,525 plane waves) persist with reasonably converged calculations. These errors are extensive—i.e. scale with simulation cell size. For the 1 × 1 × 2 supercell, the error grows to ≈ 0.4058 Eh for a calculation performed with 16,965 plane waves. Comparable errors are observed in LDA calculations using Gaussian-type basis functions performed in PySCF. For example, for the primitive cell of AlN, described by the GTH-cc-pVQZ basis set29, energies evaluated using full and approximate pseudopotentials differ by 0.1653 Eh. Consider now NiOF; the HGH pseudopotential for nickel has up to three projectors per angular momentum function. For the primitive cell with 14,981 plane-wave basis functions, LDA energies evaluated with full versus approximate pseudopotentials differ by 0.0686 Eh. The approximate pseudopotential also leads to large energy errors for LiMnO. For the primitive cell, described by 12,673 plane-wave basis functions, this error is nearly one-quarter of a Hartree.
In Table 2, errors incurred from the use of the approximate pseudopotentials were evaluated with the state (the determinant of Kohn-Sham orbitals) optimized with the full pseudopotentials. These errors can be substantially larger when the state is optimized using the approximate pseudopotential. For example, for the primitive cell of AlN and 2929 plane waves, the LDA energy with the approximate pseudopotential is −23.3658 Eh, which differs from the LDA energy using the full pseudopotential and the same number of plane-wave basis functions by 0.3753 Eh. These large energy differences suggest that the orbitals/state obtained from fully self-consistent LDA simulations using the approximate pseudopotential differ significantly from those obtained from calculations using the full pseudopotential. In other words, one could, in principle, obtain qualitatively different solutions depending on the treatment of the pseudopotential. These data suggest that similarly large errors would be observed in the quantum algorithm, where the energies correspond to eigenstates of different Hamiltonians when using either the full or approximate pseudopotential. The eigenstates simulated in these cases could themselves differ significantly. Note also that LDA calculations take far longer to converge with the approximate pseudopotential than with the full pseudopotential. With our implementation in qcpanop30, the LDA calculations with 5749 plane waves required 297 and 18 iterations, using the approximate and full pseudopotentials, respectively, to converge the energy to \({10}^{-8}\) Eh and the orbital gradient to \({10}^{-6}\) Eh.
Beyond the impact on absolute energies, approximating the nonlocal pseudopotential terms as is done in ref. 21 leads to substantial errors in energy differences as well. Table 2 provides band gaps from self-consistent LDA calculations carried out using full HGH-type pseudopotentials, as well as band gaps derived from Kohn-Sham matrices constructed using approximate pseudopotentials and orbitals obtained from the self-consistent calculations carried out with the full pseudopotentials. Clearly, errors in band gaps can be substantial. For the primitive cell of AlN, the LDA band gap computed using the full HGH pseudopotential and 15,525 plane-wave basis functions is 3.60 eV, and this gap reduces to only 0.32 eV when using the approximate pseudopotential. As in the case of absolute energies, we observe similar errors from calculations using Gaussian-type basis functions. Calculations on NiOF and LiMnO confirm that band gaps change dramatically when approximating the nonlocal pseudopotential. In particular, LDA with the full pseudopotential predicts NiOF to be a semiconductor, with a band gap of ≈ 0.6 eV to 0.7 eV depending on the size of the simulation cell. On the other hand, the approximate pseudopotential would lead us to conclude that NiOF is gapless.
These data confirm the substantial error incurred by limiting the number of projectors per angular momentum function to one in the nonlocal pseudopotential. We have observed not only large absolute energy errors due to this approximation, but also larger errors in energy differences, which could lead to qualitatively incorrect predictions. Thus, when using pseudopotentials, it is important to simulate the entire Hamiltonian or numerically confirm that, for the particular system of interest, the truncation incurs minimal energy error.
Block encoding the GTH pseudopotential
First we will give the general overview of how the block encoding of the Hamiltonian is performed in prior work, then we explain how we amend the block encoding for the pseudopotential. The general principles for the block encoding from ref. 11 are as follows.
- Two qubits are used to select between the T, U, and V parts of the Hamiltonian.
- A special superposition state with weighting 1/∥ν∥ is produced that is shared between the block encodings of U and V.
- Equal superposition states are used for ι and ξ (selecting the electron registers), and a weighted superposition is used for ℓ.
- The registers encoding ι and ξ, which encode the numbers of the electron registers, are used to control the swap of the values in these registers into ancilla registers, the select operation is performed on the ancilla, and it is swapped back. This avoids an overhead of η (the number of electrons) from performing the select operation on all η electron registers.
- The value of ν is subtracted from one ancilla for both U and V.
- The value of ν is added to the other ancilla only for V.
- For U alone the phase factor \({e}^{-{\rm{i}}{k}_{\nu }\cdot {R}_{\ell }}\) is applied, where Rℓ was output via a QROM on ℓ.
- The complexity of performing arithmetic for the kinetic part is completely avoided by expressing the sum of squares as a linear combination of bitwise products.
In this work we are changing the potential U, so the block encoding of this component needs to be changed. Previously, the preparation over ν with amplitudes 1/∥ν∥ was in common with that needed for V, so it did not require additional complexity. Here we allow non-orthogonal Bravais vectors, so we need to replace ∥ν∥ with ∥kν∥. Moreover, we have a more complicated function of ν required for the local part of the pseudopotential. The nonlocal part of the pseudopotential is even more complicated, because the factor that is applied depends on the momentum state \(\left\vert q\right\rangle\). This means that an amplitude factor needs to be applied according to \(\left\vert q\right\rangle\) as well as having a preparation over ν. This is no longer unitary, but it can be block encoded by the standard technique of using an inequality test with an equal superposition state. For consistency with the terminology used for linear combinations of unitaries, we will separate the block encoding into a ‘prepare’ step and a ‘select’ step. Unlike the case of a linear combination of unitaries, the ‘select’ is not unitary but involves a state-dependent amplitude; nevertheless, the principle is similar.
Other than these amendments to the block encoding, the other parts can still be performed relatively unchanged. We still perform controlled swaps of momentum registers into working registers, we still perform subtraction and addition of ν from the ancilla registers, and we still apply the phase factor \({e}^{-{\rm{i}}{k}_{\nu }\cdot {R}_{\ell }}\).
There are a number of choices in how we can block encode the local and nonlocal parts of the pseudopotential. There is a tradeoff between the complexity of the block encoding and the value of λ. In order to minimize λ, we would perform a tight initial preparation over ν using amplitude amplification similar to the preparation of 1/∥kν∥ for V. Then we would apply an amplitude factor for the nonlocal part of the pseudopotential. The functions of ∥kν∥ required could be applied using QROM interpolation, but the amplitude amplification requires computation of the function three times. That is because there is a computation, reflection on an ancilla, inversion of the computation, reflection about zero, then computation again. Some improvement in non-Clifford cost can be obtained by retaining working qubits to invert computations, but this is still a high-complexity step. Moreover we need separate functions for each species of nucleus, further increasing the complexity.
Then the application of the amplitude factor for the nonlocal part of the pseudopotential has a high complexity because it depends on both the momentum q and ν. The QROM approach to function interpolation is for functions of a single variable, and interpolation of a function of two variables would have unreasonably high cost. For example, if 200 points were needed to interpolate a function of one variable, then 40,000 would be needed to interpolate a function of two. For this reason, we would aim to interpolate the individual functions \({F}_{l\alpha }^{i}\), and use those to determine the overall pseudopotential. However, that would also be high cost because we would need to interpolate 9 functions of both q and q − ν, compute three Legendre polynomials, and perform many multiplications and additions to determine the complete pseudopotential.
To avoid the overhead from separately interpolating and computing all these functions, we can instead break up the sum over l, i, j for the nonlocal pseudopotential. We can prepare a superposition over these three indices, and perform controlled operations based on them. This substantially reduces the complexity, but at the cost of increasing λ, because we need to account for the sum of the absolute values over these three indices, rather than summing first then taking the absolute value.
The drawback to separating out the sum over l, i, j in the preparation stage is that the function of ν now needs to depend on l, i, j. Moreover, these functions are complicated and would all need to be individually interpolated in the preparation, further increasing the complexity. However, it is not necessary for the preparation to be a tight function of ν. That is, in the preparation we are preparing a function over ν, then we are applying an amplitude that is a function of q and ν that is no greater than 1. The overall amplitude (corresponding to the square root of the function for the nonlocal pseudopotential) is a product of that in the two stages. This means that the initial function of ν needs to be an upper bound on the function required, but it does not need to be a tight upper bound.
Recall that in the standard scheme for preparation of 1/∥kν∥ as described for example in ref. 11, one initially prepares a superposition over a set of nested boxes. The number of nested boxes corresponds to the number of bits in a single component of ν, so is quite small, and there is low cost for the preparation. The amplitudes on the individual boxes correspond to the maximum of the desired function within that box. We can use a similar principle here, but instead of initially performing the full preparation over ν we just prepare the appropriate amplitudes over the boxes. That substantially reduces the complexity of the preparation, but also increases the value of λ, because we are using upper bounds on each box instead of a tight bound on ∥kν∥. The details of how the nested boxes are constructed are explained in Section “Preparing via nested boxes”.
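As a concrete (one-dimensional, classical) illustration of this tradeoff, the sketch below assigns each nested box the maximum of a decaying stand-in function over that box and compares the resulting contribution to λ against the tight sum; the function and box layout here are our own simplified choices, not those of the actual preparation.

```python
# 1D illustration of the nested-box preparation: each box is weighted by the
# maximum of the target function over that box, which upper-bounds the function
# everywhere but inflates lambda relative to the tight sum.
import math

f = lambda nu: math.exp(-0.1 * nu ** 2)   # stand-in for a decaying function of ||k_nu||

n_bits = 6
lam_boxes, lam_tight = 0.0, 0.0
for mu in range(n_bits):
    lo = 0 if mu == 0 else 2 ** (mu - 1) + 1   # box mu covers |nu| in (2^(mu-1), 2^mu]
    hi = 2 ** mu
    values = [f(nu) for nu in range(lo, hi + 1)]
    lam_boxes += max(values) * len(values)     # per-box maximum times the box size
    lam_tight += sum(values)                   # weighting with the tight function
print(lam_boxes, lam_tight)                    # the box-based value is larger
```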
Next, we aim to reduce the complexity of the block encoding by combining operations between the local part of the pseudopotential and the nonlocal part as much as possible. With the simplifications in the implementation of the nonlocal part above, all the arithmetic and QROM function interpolation is relegated to the select part of the block encoding, with a very simple preparation. In order to reduce the overall complexity we apply a similar principle to the local part of the pseudopotential. That is, we perform a simple preparation over ν, then apply the more complicated function interpolation as part of the select operation.
One advantage of this approach is that the interpolation needed for the nonlocal part of the Hamiltonian is for the negative exponential function, and this is also the interpolation needed for the local part of the pseudopotential. This means that we can perform this function interpolation in common between the two parts of the Hamiltonian, and just need to perform the further arithmetic in order to compute the functions. This arithmetic can also be combined between the local and nonlocal parts of the pseudopotential. Note that the functions given to be multiplied by \({C}_{1}^{\alpha }\), \({C}_{2}^{\alpha }\), and \({C}_{3}^{\alpha }\) correspond to the polynomials used for computing \({F}_{0\alpha }^{j}\) for j = 0, 1, 2; that is, with l = 0. For l = 0 the Legendre polynomial is just equal to 1 as well, as is the polynomial for \({F}_{0\alpha }^{i}\) for i = 1. That means the implementation of the terms in the local pseudopotential with \({C}_{1}^{\alpha },{C}_{2}^{\alpha },{C}_{3}^{\alpha }\) can be completely combined with those for the nonlocal pseudopotential with l = 0 and i = 1.
One could also use j = 1 and select between the terms with i, but it is convenient to use i = 1 to simplify the management of the input to the function. For the nonlocal pseudopotential we use q and q − ν, but for the local pseudopotential we just use ν. Therefore, if we use zero in place of q for the local pseudopotential, then we will obtain the desired result (note that ∥k−ν∥ = ∥kν∥). This will also be appropriate for the input to the negative exponential for the local pseudopotential, where we want \({({r}_{{\rm{loc}}}^{\alpha }\Vert{k}_{\nu}\Vert )}^{2}/2\) rather than \({({r}_{0}^{\alpha }\Vert{k}_{q}-{k}_{\nu}\Vert )}^{2}/2+{({r}_{0}^{\alpha }\Vert{k}_{q}\Vert)}^{2}/2\) as for the nonlocal pseudopotential (so we obtain the correct result with q = 0).
The remaining part of the local pseudopotential that needs to be accounted for separately is that with 1/∥kν∥. In that case one simply needs to account for the 1/∥kν∥ factor by rearranging the inequality test in the usual way. That is, the standard way to apply an amplitude factor in the block encoding is to calculate a function, prepare an ancilla register in an equal superposition state over integers, then perform an inequality test and flag on success for the block encoding11,31. When dividing by ∥kν∥2 in the amplitude, instead of performing arithmetic for the division, simply perform a multiplication of ∥kν∥2 by the integer in the equal superposition. This multiplication will need to be controlled by a qubit selecting this component of the pseudopotential.
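This rearrangement can again be illustrated classically. In the sketch below (ours, with illustrative numbers), the inequality test compares the product of the uniformly distributed integer with \({\Vert {k}_{\nu }\Vert }^{2}\) against the computed numerator, and the fraction of passing values reproduces the divided amplitude without any coherent division.

```python
# Classical illustration of the rearranged inequality test: testing m * k2 < f
# over a uniform integer m applies an amplitude of roughly f / (M * k2) without
# performing a coherent division.
def applied_amplitude(f, k2, num_bits=16):
    M = 2 ** num_bits
    passing = sum(1 for m in range(M) if m * k2 < f)   # number of successful m
    return passing / M

num_bits = 16
M = 2 ** num_bits
f = 0.75 * M          # computed numerator, already scaled to the register size
for k2 in (1, 4, 9):  # stand-ins for ||k_nu||^2
    print(k2, applied_amplitude(f, k2), (f / k2) / M)
```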
Note that, in this procedure, we have separated the local pseudopotential into parts as well, which will further increase the λ. Nevertheless, we find that the local pseudopotential has a relatively small contribution to λ, so this increase is unimportant. In order to block encode the complete pseudopotential, the explicit steps needed are as follows. These steps are in addition to the standard steps listed above for the block encoding without pseudopotentials, except we are no longer using the state prepared for ν with weights 1/∥ν∥ for implementing U.
1. First we need to prepare an appropriately weighted superposition state over the nuclear species type, a register μ selecting the boxes for ν, a register selecting between the local and nonlocal pseudopotential, registers for l, i, j, and a further register to select the first term for the local pseudopotential. We initially perform state preparation of a single contiguous register.

2. Then QROM on this register may be used to output all of the desired registers (μ, l, i, j, etc.), as well as further data needed for later steps.

3. Based on the output value of μ we prepare a superposition over ν for the box Bμ (defined in Section “Preparing via nested boxes”).

4. Based on data output from the QROM in Step 2, we prepare a superposition over nucleus number ℓ.

5. We apply QROM on ℓ in order to determine Rℓ to use in the phase factor.

6. We copy the value of q in the working ancilla into another working ancilla, with the copy being controlled by the register selecting between the local and nonlocal components of the pseudopotential. This new working register will have zero for the local pseudopotential.

7. We determine q − ν in a new register based on the value in the previous step. For the local part of the pseudopotential this will be −ν.

8. Compute the following with p = q − ν, and \({r}_{l}^{\alpha }\) replaced with \({r}_{{\rm{loc}}}^{\alpha }\) for the local component of the pseudopotential.

   (a) \({\Vert {k}_{q}\Vert }^{2}\) and \({\Vert {k}_{p}\Vert }^{2}\)

   (b) \({({r}_{l}^{\alpha }\Vert{k}_{q}\Vert)}^{2}\) and \({({r}_{l}^{\alpha }\Vert{k}_{p}\Vert)}^{2}\)

   (c) \({({r}_{l}^{\alpha }\Vert{k}_{q}\Vert)}^{4}\) and \({({r}_{l}^{\alpha }\Vert{k}_{p}\Vert)}^{4}\)

   (d) \({({r}_{l}^{\alpha }\Vert{k}_{q}\Vert)}^{2}+{({r}_{l}^{\alpha }\Vert{k}_{p}\Vert)}^{2}\)

   (e) kp ⋅ kq

   (f) \([3{({k}_{p}\cdot {k}_{q})}^{2}-{\Vert{k}_{p}\Vert}^{2}{\Vert{k}_{q}\Vert}^{2}]/2\)

9. Perform QROM interpolation of the negative exponential using the value of \({({r}_{l}^{\alpha }\Vert{k}_{q}\Vert)}^{2}+{({r}_{l}^{\alpha}\Vert{k}_{p}\Vert)}^{2}\).

10. Compute \({c}_{0,li}+{c}_{1,li}{({r}_{l}^{\alpha}\Vert{k}_{q}\Vert)}^{2}+{c}_{2,li}{({r}_{l}^{\alpha}\Vert{k}_{q}\Vert)}^{4}\) and \({c}_{0,lj}+{c}_{1,lj}{({r}_{l}^{\alpha}\Vert{k}_{p}\Vert)}^{2}+{c}_{2,lj}{({r}_{l}^{\alpha}\Vert{k}_{p}\Vert)}^{4}\) using the values determined in (b) and (c) above, and using the values of cx,li, cx,lj output by the QROM.

11. Use the value of l to copy either 1, or the computed item (e) or (f), into a working register for the Legendre polynomial. This is because l is the degree of the Legendre polynomial needed.

12. Multiply the exponential from the QROM interpolation in Step 9, the q and p polynomials from Step 10, and the Legendre polynomial from Step 11.

13. Multiply Ψ, a weighting factor used in the preparation described in Section “The form of the calculation” below, by the value in an equal superposition register. The value of Ψ has previously been output via QROM.

14. Controlled by the register selecting the first term for the local pseudopotential, multiply the value in the equal superposition register by \({\Vert {k}_{\nu }\Vert }^{2}\).

15. Determine the absolute value of the result of the product and perform an inequality test against the result of the multiplications with the equal superposition register. The success flag for the inequality test applies the appropriate amplitude for the block encoding.

16. Use the sign from the product in Step 12 to control a Z operation.
The key parts of these steps are that Steps 1 to 4 are the state preparation (omitted from Fig. 1), Step 8 is computing the argument needed for the QROM interpolation (‘arg’ in Fig. 1), and Step 9 is computing the exponential (‘exp’ in Fig. 1). Steps 10 to 13 are further arithmetic (omitted for simplicity in Fig. 1), then Step 15 is the inequality test in Fig. 1. The value of \({r}_{l}^{\alpha }\) or \({r}_{{\rm{loc}}}^{\alpha }\) is output in Step 2 so that the arithmetic may be performed in common between the local and nonlocal parts of the pseudopotential. Steps 6 and 14 account for how the block encoding of the local pseudopotential is combined with the nonlocal pseudopotential, and Step 16 ensures that the terms are added together with the correct signs.
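For reference, the sketch below mirrors the classical content of the arithmetic in Steps 8, 10, 11 and 12 (our own illustration; the polynomial coefficients are placeholders rather than tabulated pseudopotential parameters, and overall constants and factors of 1/2 absorbed into the tabulated data are ignored).

```python
# Classical sketch of the quantities computed coherently in Steps 8, 10, 11 and
# 12; the coefficients 1.0, -0.4, 0.02 below are illustrative placeholders, not
# tabulated pseudopotential parameters.
import numpy as np

def step8_quantities(kq, kp, r_l):
    q2, p2 = float(kq @ kq), float(kp @ kp)        # (a) ||k_q||^2 and ||k_p||^2
    rq2, rp2 = (r_l ** 2) * q2, (r_l ** 2) * p2    # (b) (r_l ||k_q||)^2 and (r_l ||k_p||)^2
    rq4, rp4 = rq2 ** 2, rp2 ** 2                  # (c) fourth powers
    arg = rq2 + rp2                                # (d) argument for the exponential
    dot = float(kp @ kq)                           # (e) k_p . k_q
    leg2 = (3 * dot ** 2 - p2 * q2) / 2            # (f) degree-2 Legendre combination
    return rq2, rp2, rq4, rp4, arg, dot, leg2

def projector_polynomial(rk2, rk4, c0, c1, c2):
    """Step 10: c0 + c1*(r_l||k||)^2 + c2*(r_l||k||)^4 (placeholder coefficients)."""
    return c0 + c1 * rk2 + c2 * rk4

kq = np.array([0.3, -0.1, 0.2])
kp = np.array([0.1, 0.4, -0.2])
rq2, rp2, rq4, rp4, arg, dot, leg2 = step8_quantities(kq, kp, r_l=0.5)

# Step 12: product of the exponential, the q and p polynomials, and the
# Legendre factor (overall constants are ignored in this illustration).
value = (np.exp(-arg)
         * projector_polynomial(rq2, rq4, 1.0, -0.4, 0.02)
         * projector_polynomial(rp2, rp4, 1.0, -0.4, 0.02)
         * leg2)
print(value)
```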
We will go into further detail on these steps and their costs below. A further subtlety that needs to be accounted for in this block encoding is how ν is prepared between U and V. To avoid making all operations controlled, it is convenient to prepare separate registers for ν with the weightings for U and V. Then we may perform a controlled swap into a working register. Another observation is that the value of \({\Vert {k}_{\nu }\Vert }^{2}\) needs to be computed multiple times in the preparation for V. However, we are also computing this quantity for the block encoding of the local part of the pseudopotential. We can combine the two to avoid computing this quantity twice.
This is possible even though we have included it as part of the preparation for V, but as part of the selection for U. The reason is that, for V, we complete what we have described as the ‘prepare’ step by multiplying \({\Vert {k}_{\nu }\Vert }^{2}\) by a register in an equal superposition and performing an inequality test. That is exactly what is done for the first term of the local part of the pseudopotential, so the implementation of these two parts can be combined, with the only distinction being the small extra cost of making the multiplication by the exponential controlled.
Details of the costing
Preparation of registers for selecting between terms
Throughout this work, we give the costing of the complexity in terms of Toffoli gates, as that is the non-Clifford gate used in these algorithms, and much more costly than Clifford gates to implement in an error-corrected architecture. Here we describe in more detail the preparation of the indices for selecting between the components of Uloc and Unonloc. These are as follows.
- The nuclear species α.
- The box number for ν, which we will denote μ.
- Selection between the term in Uloc with 1/∥kν∥, the rest of Uloc, and Unonloc. We will denote the variable for this selection ς, and it can be stored on two qubits, with one selecting between Uloc and Unonloc and the other between the components of Uloc.
- The value of l, which selects between terms for Unonloc but is just taken to be equal to 0 for Uloc.
- The values of i, j. We will prepare i ≤ j for Unonloc, but for Uloc we will take i = 1 and just use j to select between terms.
In order to produce the full range of i, j a swap is performed, controlled by a \(\left\vert +\right\rangle\) state. This controlled swap will also need to be controlled by the qubit selecting between Uloc and Unonloc. These controlled swaps and their inversion can be performed with only 5 Toffolis, which can be ignored in comparison to other parts of the procedure.
This preparation is achieved by initially preparing a contiguous register, Step 1 above, then using QROM on that register to output the indices, Step 2 above. To determine the complexity of preparing the contiguous register, we need to determine the number of values this contiguous register must take. To explain how these values are counted, we will initially consider just a fixed α and μ (box number). Nominally there would appear to be 5 values for Uloc, with the first term with 1/∥kν∥ and the following terms with \({C}_{1}^{\alpha }\), \({C}_{2}^{\alpha }\), etc. However, for particular nuclear species many of these will be zero, and in fact there are no more than \({C}_{1}^{\alpha }\) and \({C}_{2}^{\alpha }\) for the cases we consider (see Supplementary Information, Table I). For Unonloc, the number of values corresponds to the number of independent values of \({E}_{l\alpha }^{ij}\) (accounting for symmetry between i, j). There are four different numbers for the different species of nuclei, as given in the Supplementary Information. The total values for the listed nuclei are as follows.
- 3 + 1 = 4: For nuclei like B, C, N, and O, there are three parts to Uloc (with C1 and C2 nonzero), but only one part to Unonloc with l = 0. That gives a total of four parts.
- 3 + 2 = 5: For Li there are still three parts for Uloc, but there are now two for \({E}_{l\alpha }^{ij}\) for a total of 5.
- 2 + 4 = 6: For Al, Si and Cl there are now only two parts for Uloc since C2 = 0, but there are four \({E}_{l\alpha }^{ij}\) (with l = 0, 1) for a total of 6.
- 1 + 10 = 11: For examples like Mn and Ni there is only one part for Uloc, but now for Unonloc there is a maximum l of 2, for 10 independent values of \({E}_{l\alpha }^{ij}\), and a total of 11.
We need to sum these values over all the nuclear species required for the simulation.
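As an example of this tally, the sketch below sums the per-species counts listed above for an illustrative composition and multiplies by the number of ν boxes described in the following paragraphs (the composition and box count chosen here are placeholders).

```python
# Tally of the number of contiguous-register values using the per-species
# counts listed above; the composition chosen here (an Al/N cell) and the box
# count are just examples.
per_species_terms = {
    "Li": 5,                          # 3 local + 2 nonlocal
    "B": 4, "C": 4, "N": 4, "O": 4,   # 3 local + 1 nonlocal
    "Al": 6, "Si": 6, "Cl": 6,        # 2 local + 4 nonlocal
    "Mn": 11, "Ni": 11,               # 1 local + 10 nonlocal
}

def contiguous_register_values(species_present, num_boxes):
    """Return (M, M * num_boxes): per-species counts summed over the species
    present, then multiplied by the number of nu boxes."""
    M = sum(per_species_terms[s] for s in species_present)
    return M, M * num_boxes

M, total = contiguous_register_values(["Al", "N"], num_boxes=7)  # e.g. n = 7
print(M, total)  # 10 values per box, 70 in total
```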
We would also multiply by the number of boxes for ν. A subtlety here is that the sum for Uloc excludes ν = 0 (since it is just a term in the Hamiltonian proportional to the identity), whereas the sum for Unonloc includes ν = 0. In that case the momentum is unchanged, but the amplitude is modified according to q, so this is not just proportional to the identity. In practice we find that the difference in values needed for ν = 0 and the next box is trivial, so we can just use the same box number for ν = 0 as for the next smallest box. This means that the number of values needed is multiplied by \({\mu }_{\max }-1\), where \({\mu }_{\max }\) is given as in Eq. (93) as
Here δx, δy, δz are shifts to account for different lengths of Bravais vectors, which are zero or small integers. See Section “Preparing via nested boxes” for further explanation. In the simple case where δx, δy, δz are all zero and the numbers of bits for the components of the vectors are all equal to n, the number of boxes is just n. We will use this in the following explanation for simplicity.
If we give the total number of values as per the list above summed over all nuclei as \({\mathcal{M}}\), and the number of boxes is n, then the total number is \({\mathcal{M}}n\). Then costings for preparing the initial superposition over the contiguous register (Step 1) are as follows.
- \(4\lceil \log ({\mathcal{M}}n)\rceil +1\) for preparing the equal superposition over \({\mathcal{M}}n\) basis states32.
- \({\mathcal{M}}n\) for the QROM iterating over the index. (This can also be made smaller by using the advanced QROM.)
- A cost equal to the number of bits of precision for the inequality test with the keep register. This step does not need to be very precise, and the complexity can be taken to be the same as the number of bits, b, used in later arithmetic. That is more than enough, and has a complexity much smaller than other steps.
- A cost of \(\lceil \log ({\mathcal{M}}n)\rceil\) for a controlled swap on the index being prepared and the alt register.
After this preparation, Step 2 is QROM on this contiguous register with cost \({\mathcal{M}}n\). This QROM outputs ς, α, μ, l, i, j, the values of \({c}_{x,li},{c}_{x,lj},{r}_{l}^{\alpha },{r}_{{\rm{loc}}}^{\alpha }\), and a quantity Ψς,α,μ,l,i,j to use in inequality testing. We will also use it to output rotation angles to be used in the preparation of ν and ℓ, and the number of nuclei of each species Lα.
Next, Step 3 is to prepare ν based on the box number μ. For the pseudopotentials it is preferable to prepare a superposition eliminating the inner boxes and the negative zeros. To test if the value of ν is in an inner box, we can perform a triply-controlled NOT for a given value of μ to test if there are no leading ones in any of the three components. Because μ is in a superposition, we need a quadruply-controlled NOT (controlling on a qubit for μ as well), and need to do it for each possible value of μ. That gives a cost of 3n. To test for a negative zero, we have cost n to tell if each component is zero, and a unit cost to determine if the sign qubit is negative. That gives a 3n cost to test a negative zero on any of the components. There is also a 3n cost for controlled Hadamards on each of the components, as always when preparing superpositions with nested boxes. That gives a cost of 9n, which is performed three times in the amplitude amplification. There is also a 3n cost for reflection on the ν registers in the amplitude amplification for a total of 30n. We also need a μ-dependent rotation (to be performed twice) in the amplitude amplification to give near-unit amplitude for success. That angle is given by the QROM in the preceding step, and can be taken to be b bits for cost 2b for the two rotations.
Step 4 is then to prepare a superposition over ℓ. For given species, the nucleus number ℓ does not affect the amplitude, only the phase through the factor of \({e}^{-i{k}_{\nu }\cdot {R}_{\ell }}\). Therefore, for each α an equal superposition state over ℓ needs to be prepared, over a number of basis states Lα. That number is given in a quantum register output by the QROM in Step 2. There is a standard procedure to prepare that equal superposition with logarithmic complexity5.
The total complexity is then \((2{\mathcal{M}}+30)n\) plus 3b and a logarithmic complexity in \({\mathcal{M}}\) and n. This complexity is doubled when accounting for inverse preparation in the block encoding.
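As a rough illustration of this accounting, the following Python sketch (our own helper, not part of the algorithm) tallies the Toffoli costs listed above for Steps 1–3; the small logarithmic cost of Step 4 is omitted.

```python
from math import ceil, log2

def term_selection_prep_cost(M, n, b):
    """Approximate Toffoli count for preparing the term-selection registers.

    M: number of independent values summed over the nuclei, n: number of boxes,
    b: number of bits used in the later arithmetic."""
    cost = 4 * ceil(log2(M * n)) + 1   # equal superposition over M*n basis states
    cost += M * n                      # QROM over the contiguous index
    cost += b                          # inequality test with the keep register
    cost += ceil(log2(M * n))          # controlled swap with the alt register
    cost += M * n                      # Step 2: QROM outputting all required parameters
    cost += 30 * n + 2 * b             # Step 3: preparing nu over the nested boxes
    return cost                        # ~ (2*M + 30)*n + 3*b plus logarithmic terms
```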
The form of the calculation
Now we give more detail on the form of the expressions that need to be evaluated. We first explain how this is done for the case of the nonlocal pseudopotential, as this is more complicated, and the calculation of the local pseudopotential is combined with it. The nonlocal part of the pseudopotential is given by
where
and
In this expression \({E}_{l\alpha }^{ij}\) and \({r}_{l}^{\alpha }\) are dependent on the nuclear species α, but cx,li is independent of the nuclear species.
We now define a scaled \({\widetilde{F}}_{l}^{i}\)
This quantity is effectively \({F}_{l\alpha}^{i}\) without \({C}_{li}^{\alpha }\), but including the factor of \({({r}_{l}^{\alpha })}^{l}\) from \({C}_{li}^{\alpha }\), so
We also define a quantity
Note that this expression is dependent on the nuclear species through \({r}_{l}^{\alpha }\) as input to \({\widetilde{F}}_{l}^{i}\). That only gives a scaling in its variation, and it can be calculated independently of the nuclear species if ν is scaled by \({r}_{l}^{\alpha }\). We have also included a factor of \(1/{({r}_{l}^{\alpha })}^{2l}\), because we will not be including it explicitly in the calculation of \({\widetilde{F}}_{l}^{i}\).
Then we can rewrite \({u}_{{\rm{non}}}^{\alpha }({k}_{p},{k}_{q})\) as
The expression in the square brackets is constructed such that its absolute value is no larger than 1 (recall that the absolute value of the Legendre polynomial is no larger than 1).
The expression for \({\aleph }_{\alpha ,\nu ,l,i,j}\,\) can, in general, be a complicated function of ν, meaning that it would have high complexity to approximate. As discussed above, in the state preparation it is convenient to just approximate the functions within the individual boxes. Then the upper bound we will use is
Here the maximum is over values of ν within the box Bμ but excluding the smaller inner box Bμ−1. We are now also including the subscript ς = 2 (as in Ψ2,α,μ,l,i,j), which selects the nonlocal part of the pseudopotential.
For the implementation we therefore can prepare a state over α, ν, l, i, j with (squared) amplitudes proportional to
where Lα is the number of nuclei of type α. Note that this is independent of q, so does not depend on the system state. Recall that we prepare a superposition over nuclear species α initially, then prepare the superposition over Lα values of ℓ in Step 4.
Then, dependent on the state momentum, we apply an amplitude corresponding to the expression in square brackets in Eq. (31). This amplitude can be applied in a block encoding by first computing
where the factor of \(1/{({r}_{l}^{\alpha })}^{2l}\) is to simplify the computation. The computation of this value is Step 12. In the usual way for applying an amplitude factor in block encoding, we would prepare another register in an equal superposition state, and perform an inequality test; this is Step 15. However, the register in an equal superposition would first be multiplied by Ψ2,α,μ,l,i,j before the inequality test to account for the maximum value; that multiplication is Step 13. We would also need to compute the absolute value of the computed quantity before the inequality test, and apply a phase factor according to its sign. We would also need to apply a sign according to (−1)l in the square brackets in Eq. (31). Application of the sign is in Step 16.
For the computation of the expression in Eq. (34), we group the factor of \({(\Vert{k}_{p}\Vert\, \Vert{k}_{q}\Vert )}^{l}\) from \({\widetilde{F}}_{l}^{i}\) and \({\widetilde{F}}_{l}^{j}\) together with the Legendre polynomial in order to avoid needing to compute the ratio. Since \({r}_{l}^{\alpha }\) appears in \({\widetilde{F}}_{l}^{i}\,({r}_{l}^{\alpha }\Vert{k}_{p}\Vert)\) but not the Legendre polynomial, this computation gives the factor of \(1/{({r}_{l}^{\alpha })}^{2l}\) above. For the nuclei we are considering, l ≤ 2, and the expressions obtained for the product of the Legendre polynomial and \({(\Vert {k}_{p}\Vert \Vert {k}_{q}\Vert )}^{l}\) are as follows.
-
1.
For the case l = 0, the Legendre polynomial is just equal to 1 and there is no cancellation.
-
2.
For l = 1, the Legendre polynomial is x, so we have kp ⋅ kq and the denominator cancels.
-
3.
For l = 2, the Legendre polynomial is (3x2 − 1)/2, so multiplying by \({(\Vert {k}_{p}\Vert \Vert {k}_{q}\Vert )}^{l}\) gives
$$\frac{1}{2}\left[3{({k}_{p}\cdot {k}_{q})}^{2}-{\Vert {k}_{p}\Vert }^{2}{\Vert {k}_{q}\Vert }^{2}\right].$$(35)
The parts for degree l > 0 of the Legendre polynomial are nontrivial, so are calculated in Step 8 parts (e) and (f). The correct order of the Legendre polynomial is then selected in Step 11.
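The cancellation of the normalizing denominators can be checked numerically; the short sketch below (our own illustration, not part of the algorithm) verifies the l = 1 and l = 2 products stated above.

```python
import numpy as np

rng = np.random.default_rng(0)
kp, kq = rng.normal(size=3), rng.normal(size=3)
norm_p, norm_q = np.linalg.norm(kp), np.linalg.norm(kq)
x = kp @ kq / (norm_p * norm_q)  # argument of the Legendre polynomial

# l = 0: P_0(x) = 1, nothing to cancel.
# l = 1: P_1(x) = x, so P_1(x) * ||k_p|| * ||k_q|| = k_p . k_q
assert np.isclose(x * norm_p * norm_q, kp @ kq)
# l = 2: P_2(x) = (3x^2 - 1)/2, giving the expression in Eq. (35)
assert np.isclose((3 * x**2 - 1) / 2 * (norm_p * norm_q) ** 2,
                  0.5 * (3 * (kp @ kq) ** 2 - norm_p**2 * norm_q**2))
```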
The remaining expression we need to compute is then
This expression can be computed using just additions and multiplications, except for the exponential, which is the most demanding part of the expression to accurately approximate. However, this exponential is in common between the parts of the sum, so need only be evaluated once. This evaluation is Step 9 in the list above, and from Eq. (36) it can be seen that the argument is proportional to \({({r}_{l}^{\alpha }\Vert{k}_{q}\Vert)}^{2}+{({r}_{l}^{\alpha }\Vert{k}_{p}\Vert)}^{2}\) as described in Step 9. The approximation of the negative exponential function is analysed in ref. 32, and we provide further details below in Section “QROM interpolation”.
Step 8 involves calculation of many of the quantities needed for the calculation, including the argument of the exponential in part (d), and powers of \({r}_{l}^{\alpha}\Vert{k}_{q}\Vert\) and \({r}_{l}^{\alpha}\Vert{k}_{p}\Vert\) in parts (b) and (c). The squares are used for the argument of the exponential, as well as the polynomials in Eq. (36), which are calculated in Step 10. Note that the squares \({\Vert {k}_{q}\Vert }^{2}\), \({\Vert {k}_{p}\Vert }^{2}\) are computed separately, because they are needed for the Legendre polynomials.
For the local part of the pseudopotential, recall that we have
Here, we have written the sum in terms of cx,0j which is in common with the nonlocal pseudopotential. We can also write it as
For the second term here we would prepare a state with squared amplitudes proportional to
Again the maximum is such that we only need to prepare a state with a superposition over boxes, rather than an exact ν-dependent superposition. The quantity Ψ1,α,μ,0,1,j here has ς = 1 for selecting the second term. Then, for the select operation, we can compute the expression as in Eq. (36), except with q = 0, which gives us \({\widetilde{F}}_{0}^{j}({r}_{{\rm{loc}}}^{\alpha}\Vert{k}_{\nu}\Vert)\). That is why this computation can be performed in common with the nonlocal part of the pseudopotential. It is accounted for by copying zero into the working register rather than q in Step 6. To apply the required amplitude factor we would then prepare an equal superposition state and multiply it by Ψ1,α,μ,0,1,j. The sign of the expression is again applied by taking the absolute value before the inequality test, and applying a Z gate according to the sign in Step 16.
For the first term in Eq. (39) we prepare squared amplitudes proportional to
Here, Ψ0,α,μ,0,1,1 has ς = 0 to select this part of the local pseudopotential. In this case we take l = 0, i = j = 1, so computation of the expression in Eq. (36) with q = 0 gives \({e}^{-{({r}_{{\rm{loc}}}^{\alpha }\parallel {k}_{\nu }\parallel )}^{2}/2}\). That enables this part of the Hamiltonian to be block encoded in a common way with the nonlocal pseudopotential as well. To apply the desired amplitude factor in the select operation, we again prepare an equal superposition state and multiply by Ψ0,α,μ,0,1,1. In this case, however, we multiply the equal superposition state by ∥kν∥2 to account for the division by ∥kν∥2. That is Step 14 above. The value of ∥kν∥2 is computed in the preparation of the superposition state for V and can be used here.
QROM interpolation
For the QROM interpolation, since we are approximating a negative exponential function, it is convenient to apply the following procedure.
-
First multiply the argument by \(1/\ln 2\). This is so that we can interpolate 2−z rather than e−z.
-
Apply QROM interpolation on the fractional part of z (i.e., after multiplying by \(1/\ln 2\)).
-
Use the integer part of z to control a bit shift of the output.
Since about half the bits in the binary expansion of \(1/\ln 2\) are zero, the multiplication by \(1/\ln 2\) has a complexity of approximately b2/4 when performing calculations with b bits. In Section II E of ref. 32 the error for the case of the linear interpolation is estimated in terms of the second derivative (see Eqs. (92) to (94)) as
$$\delta f\le \frac{\delta {z}^{2}}{8}\mathop{\max }\limits_{z}\left|{f}^{{\prime\prime} }(z)\right|.$$
When the function is 2−z, so that \(| {f}^{{\prime\prime} }(z)| ={(\ln 2)}^{2}{2}^{-z}\), the relative error is
$$\delta f\le \frac{{(\delta z\ln 2)}^{2}}{8}.$$
For example, if δz = 1/256, then the relative error is less than 10−6. For interpolation on the fractional part of z, this would correspond to a Toffoli complexity of about 256. The interpolation requires a multiplication with complexity b2 (we are ignoring the complexity of additions). Then the complexity of the controlled bit shift is about b2/2.
In cases where higher precision is required, a higher-order polynomial can be used. In general, for an order n − 1 polynomial, one can use interpolation on n Chebyshev nodes, which gives a maximum error
$$\frac{\delta {z}^{n}}{n!\,{2}^{2n-1}}\mathop{\max }\limits_{z}\left|{f}^{(n)}(z)\right|$$
for the interval [z0 − δz/2, z0 + δz/2]. For example, for 2−z, quadratic interpolation with n = 3 gives a relative error
$$\delta f\le \frac{{(\delta z\ln 2)}^{3}}{192}.$$
Then 128 interpolation points would give an accuracy better than one part in 109. The quadratic can be arranged so only one more multiplication is needed, and the square does not need to be computed explicitly. This extra multiplication has cost approximately b2 for b digits.
This means that, for linear interpolation, the complexity is approximately (7/4)b2 + 256 when aiming for a relative error of one part in 106. More generally it would be \((7/4){b}^{2}+\ln 2/\sqrt{8\,\delta f}\) when aiming for relative error δf. With quadratic interpolation the complexity can be given as (11/4)b2 + 128 (for error of one part in 109), or more generally \((11/4){b}^{2}+\ln 2/{(192\,\delta f)}^{1/3}\). The QROM cost scales only as the inverse cube root of the allowed error, meaning that it increases only slowly as the required precision is increased. For example, precision of one part in 1012 could be achieved with QROM cost about 1200. That is a small contribution to the total block encoding costs, which are on the order of 20,000 to 50,000 in Table 3. As the required precision is increased, the arithmetic is a much more significant contribution to the increase in complexity.
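To make these trade-offs concrete, the following sketch (our own helper, with names not taken from the paper) evaluates the point counts and total costs implied by the formulas above.

```python
from math import log, sqrt

def interp_toffoli_cost(b, delta_f, quadratic=True):
    """Arithmetic cost plus number of QROM interpolation points for the
    exponential evaluation at relative error delta_f, per the formulas above."""
    if quadratic:
        return 2.75 * b**2 + log(2) / (192 * delta_f) ** (1 / 3)
    return 1.75 * b**2 + log(2) / sqrt(8 * delta_f)

print(round(log(2) / sqrt(8 * 1e-6)))           # ~245 points: rounded up to 256 in the text
print(round(log(2) / (192 * 1e-9) ** (1 / 3)))  # ~120 points: rounded up to 128 in the text
print(interp_toffoli_cost(20, 1e-6, quadratic=False))  # linear interpolation with b = 20
```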
To explain the method in a little more detail, the circuit diagram is shown in Fig. 2. After multiplying by \(1/\ln 2\), the output is divided into the integer part, the leading fractional bits and the remaining fractional bits. For 128 = 27 interpolation points, the QROM for the interpolation is applied to the leading 7 bits of the fractional part. For each of those values the QROM outputs the interpolation parameters aj. The interpolated value can be calculated on the remaining fractional bits as a0 + a1δ(a2 + δ). That is, if the value given by the leading fractional bits is x0, then the interpolated value is calculated relative to that as a0 + a1(x − x0)(a2 + x − x0), so it is only the shift δ from x0 that is used, which is given by the remaining fractional bits. Lastly the integer part is used to control a shift of the output.
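A floating-point emulation of this procedure is sketched below (our own illustration; the actual circuit uses fixed-point registers, and for simplicity we interpolate at three equally spaced nodes rather than Chebyshev nodes).

```python
import numpy as np

def build_table(n_frac_bits=7):
    """Precompute (a0, a1, a2) for each leading-fraction value x0, fitting
    2^{-(x0 + delta)} for delta in [0, h) by a quadratic a0 + a1*delta*(a2 + delta)."""
    h = 2.0 ** (-n_frac_bits)
    table = []
    for k in range(2 ** n_frac_bits):
        x0 = k * h
        d = np.array([0.0, h / 2, h])               # three interpolation nodes
        c2, c1, c0 = np.polyfit(d, 2.0 ** (-(x0 + d)), 2)
        table.append((c0, c2, c1 / c2))             # rewrite c0 + c1*d + c2*d^2
    return table

def exp_neg(x, table, n_frac_bits=7):
    """Emulate e^{-x}: multiply by 1/ln2, interpolate 2^{-z} on the fractional
    part of z, then bit shift by the integer part."""
    z = x / np.log(2.0)
    z_int = int(np.floor(z))
    z_frac = z - z_int
    k = min(int(z_frac * 2 ** n_frac_bits), 2 ** n_frac_bits - 1)  # QROM address
    delta = z_frac - k * 2.0 ** (-n_frac_bits)                     # remaining fractional bits
    a0, a1, a2 = table[k]
    return (a0 + a1 * delta * (a2 + delta)) * 2.0 ** (-z_int)      # controlled bit shift

table = build_table()
for x in [0.3, 1.7, 5.2]:
    print(x, abs(exp_neg(x, table) / np.exp(-x) - 1.0))  # relative errors around 1e-9
```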
Completing the pseudopotential costing
Using the explanation in the preceding subsections, we now go on to provide the costing of the steps as presented in Section “Block encoding the GTH pseudopotential”, using the same step numbers. We ignore \({\mathcal{O}}(1)\) complexities, and linear complexities in the number of bits for the arithmetic involving multiplications.
-
1.
As detailed above in Section “Preparation of registers for selecting between terms”, the cost of the state preparation can be taken to be \({\mathcal{M}}n+5\log ({\mathcal{M}}n)+b\), where \({\mathcal{M}}\) is the number of independent values needed for each nucleus summed over the nuclei.
-
2.
There is a further cost of \({\mathcal{M}}n\) to output all the required parameters using QROM.
-
3.
Again as detailed above in Section “Preparation of registers for selecting between terms”, the cost of preparing ν and updated box numbers is 30n + 2b.
-
4.
The nuclear species α is used to prepare an equal superposition over a register ℓ indexing between the different nuclei of that type. Given that there are \(\lceil \log L\rceil\) qubits used for the number of nuclei, the Toffoli cost is \(7\lceil \log L\rceil +2{b}_{r}-6\) according to Appendix C of ref. 5 (step 3(a) on page 42). Here br is the number of bits of accuracy for the rotation, which can be taken to be 7, giving complexity \(7\lceil \log L\rceil +8\). We do not need ℓ to index over all nuclei; it can index just over the nuclei of each type, so the same ℓ values are reused for different α.
-
5.
The QROM on ℓ to output Rℓ has cost L. This QROM can use both the register with α and ℓ. This is the only stage where ℓ is used, which is why ℓ can just index over nuclei of each type individually, rather than all nuclei.
-
6.
The controlled copy of q into a new register has complexity nx + ny + nz, or 3n if all three are equal.
-
7.
Computing p = q − ν also has complexity nx + ny + nz.
-
8.
The complexities of computing the main quantities needed are as follows. Here we give the expressions in terms of \({r}_{l}^{\alpha }\), but this is also \({r}_{{\rm{loc}}}^{\alpha }\) for the local pseudopotential.
-
(a)
∥kq∥2 & ∥kp∥2 – As described in Section “Complexity of norm with Bravais vectors”, the leading order complexity of the computation of this expression using a Gramian is as given in Eq. (113) for b bits for the arithmetic.
-
(b)
\({({r}_{l}^{\alpha }\Vert{k}_{q}\Vert)}^{2}\) & \({({r}_{l}^{\alpha}\Vert{k}_{p}\Vert)}^{2}\) – Each of these multiplications has a leading-order complexity of b2, using the result in Eq. (D18) of ref. 32.
-
(c)
\({({r}_{l}^{\alpha}\Vert{k}_{q}\Vert)}^{4}\) & \({({r}_{l}^{\alpha}\Vert{k}_{p}\Vert)}^{4}\) – Each squaring of a real number has a leading-order complexity of b2/2, as per Eq. (D37) of ref. 32. We do not need to perform these calculations in the case that \({E}_{l\alpha }^{ij}\) does not go up to i, j = 2, though (so it is only needed for Mn, Ni, Ca, and Ti out of the listed nuclei).
-
(d)
\({({r}_{l}^{\alpha}\Vert{k}_{q}\Vert)}^{2}+{({r}_{l}^{\alpha}\Vert{k}_{p}\Vert )}^{2}\) – The cost of the addition can be ignored in this costing.
-
(e)
kp ⋅ kq – The complexity of the dot product using a Gramian is given in Eq. (123).
-
(f)
\([3{({k}_{p}\cdot {k}_{q})}^{2}-{\Vert{k}_{p}\Vert}^{2}{\Vert{k}_{q}\Vert}^{2}]/2\) – For this expression a square (complexity b2/2) and product (complexity b2) are required for a total of 3b2/2. The multiplication by 3/2 can be applied with a single addition with complexity \({\mathcal{O}}(b)\), and similarly the subtraction is \({\mathcal{O}}(b)\); we ignore these complexities linear in b. This computation is only needed in the case that l goes up to 2.
-
9.
The complexities given above for the QROM are (7/4)b2 + 256 if the goal is relative error below 10−6, or (11/4)b2 + 128 to give accuracy of one part in 109. These are using linear or quadratic interpolation, respectively.
-
10.
The complexity of computing \({c}_{0,lj}+{c}_{1,lj}{({r}_{l}^{\alpha}\Vert{k}_{q}\Vert)}^{2}+{c}_{2,lj}{({r}_{l}^{\alpha}\Vert{k}_{q}\Vert)}^{4}\) (and similarly for i) depends on the maximum values of i, j.
-
(a)
First is the case i = j = 1, which corresponds to \({l}_{\max }^{\alpha }=0\) in most cases, though Li has \({l}_{\max }^{\alpha }=1\) with the same maximum values of i, j. In that case we just want c0,l1, since the sums over x are now just giving x = 0. There is no extra Toffoli cost then, since c0,l1 is output by QROM at an earlier stage.
-
(b)
In the case where the maximum of i, j is 2, we need \({c}_{0,lj}+{c}_{1,lj}{({r}_{l}^{\alpha}\Vert{k}_{q}\Vert)}^{2}\) and similarly for i. Then c1,lj can only be 0 or − 1, so the multiplication by c1,lj is just a controlled copy with cost b. The addition has a cost of b for a total of 2b, then there is a factor of 2 to perform this calculation for both i and j for a total of 4b.
-
(c)
In the case where the maximum of i, j is 3, we need to compute \({c}_{0,lj}+{c}_{1,lj}{({r}_{l}^{\alpha}\Vert{k}_{q}\Vert)}^{2}+{c}_{2,lj}{({r}_{l}^{\alpha}\Vert{k}_{q}\Vert)}^{4}\), and also allow values of c1,lj of −10 and −14. We do not have c1,lj = −18 for the materials listed because we do not have a combination of l = 2 and j = 3. The bits selecting 1 and 2 (in the value of c1,lj) cannot both be 1, so we may simply perform a controlled copy into an ancilla register for both with cost 2b. There is a controlled addition of 8 times the argument with cost approximately 2b. Next, c2,lj has just 1 bit, so the multiplication by c2,lj is just a controlled copy with cost b. Together with the two additions this is a cost of approximately 7b, then the factor of 2 for the calculation with both i and j gives a total of 14b.
-
11.
Use the value of l to copy either 1, or the computed items (e) or (f) from part 8 into a working register for the Legendre polynomial. The complexity is 2b for two controlled copies of b bits. The controlled copy of 1 has complexity \({\mathcal{O}}(1)\).
-
12.
Multiply the exponential from the QROM interpolation in Step 9, p and q polynomials, and the Legendre polynomial. The three multiplications have complexity 3b2.
-
13.
Multiply the value of Ψς,α,μ,l,i,j that has previously been output via QROM by the value in an equal superposition register. This yields another complexity of b2.
-
14.
The controlled multiplication of the value in the superposition register by ∥kν∥2 has complexity b2. The control just introduces complexity linear in b, which we ignore here.
-
15.
Determine the absolute value of the result of the product from Step 12 and perform an inequality with the result from Step 14. The success flag for the inequality test applies the appropriate amplitude for the block encoding. This has complexity linear in b.
-
16.
Use the sign from the product from Step 12 to control a Z operation. This has \({\mathcal{O}}(1)\) complexity.
The primary costs here are those of the multiplications and squarings. Assuming these are b bits, there is a maximum cost of \(11\frac{1}{4}{b}^{2}+14b\) for all these operations (including the cost linear in b from computing the polynomials with cx,lj, and assuming quadratic interpolation, but omitting the Gramian-dependent costs). There are various savings in other cases.
-
There is a saving of b2 if linear interpolation is used instead of quadratic.
-
There is a saving of 3b2/2 if \({l}_{\max }^{\alpha } \,<\, 2\).
-
There is a saving of b2 + 10b if the maximum of i, j is 2 instead of 3.
-
There is a further saving of 4b if the maximum of i, j is 1.
The cost of the QROM for the interpolation is considerably smaller. For accuracy of about one part in 106 we can use linear interpolation with 256 interpolation points, as well as b = 20. Then the cost is about 5000 Toffolis (the cost of the pseudopotential excluding other parts of the block encoding). If one were aiming for a higher precision, closer to one part in 108, one could choose b = 27 and use quadratic interpolation with 64 interpolation points. That would increase the cost to around 10,000 Toffolis. This means that the precision can be improved by two orders of magnitude with about a factor of 2 increase in the complexity.
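A rough tally of these arithmetic costs, reflecting the savings listed above, is sketched below (our own helper; it covers only the leading multiplication, squaring and interpolation terms, and excludes the Gramian-dependent norms and dot products).

```python
def pseudopotential_arithmetic_cost(b, quadratic=True, l_max=2, max_ij=3, interp_points=None):
    """Approximate Toffoli count for the arithmetic in the pseudopotential block encoding."""
    cost = 11.25 * b**2 + 14 * b   # maximum case: quadratic interpolation, l_max = 2, max i,j = 3
    if not quadratic:
        cost -= b**2
    if l_max < 2:
        cost -= 1.5 * b**2
    if max_ij == 2:
        cost -= b**2 + 10 * b
    elif max_ij == 1:
        cost -= b**2 + 10 * b + 4 * b
    if interp_points is None:
        interp_points = 128 if quadratic else 256
    return cost + interp_points

print(pseudopotential_arithmetic_cost(20, quadratic=False))                   # ~4600, i.e. about 5000
print(pseudopotential_arithmetic_cost(27, quadratic=True, interp_points=64))  # ~8600, i.e. about 10,000
```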
A choice we have made in the implementation is to interpolate just the exponential, instead of separately interpolating \({F}_{l\alpha }^{i}\) and \({F}_{l\alpha }^{j}\). Separately interpolating would save the multiplications by cx,li, cx,lj, but we would have much higher QROM cost. We would need to interpolate 9 combinations of each of l, i and l, j. Even if the interpolations only use 100 points each, the extra cost over just interpolating the exponential is nearly 2000. It would also be a considerably more complicated error analysis because we would not be interpolating just an exponential. In comparison, the multiplication cost we have here is about 14b; this is not large because cx,li has a small number of bits.
There is also a cost of determining two squares of vectors and a dot product using the Gramian, but the cost will be highly dependent on the form of the Gramian. Because p = q − ν, and ∥kν∥2 is computed in a different part of the algorithm, we can compute kq ⋅ kν, which then gives ∥kp∥2 = ∥kq∥2 − 2kq ⋅ kν + ∥kν∥2 and kp ⋅ kq = ∥kq∥2 − kq ⋅ kν using only additions. This means that there is an additional complexity at this step of one norm and one dot product, rather than two norms.
In this section we are only considering the overhead in the block encoding for the pseudopotentials. The costs for other parts of the procedure are discussed further below in Section “Other costs in the block encoding”.
Other costs in the block encoding
Preparing the superposition over ν
The primary cost for the other part of the superposition is in preparing the state with amplitudes 1/∥kν∥ for V. This is the same state as in prior work, except here we consider the amendment that we use a Gramian to compute the norm, so ∥kν∥ is not just proportional to ∥ν∥. Also, this superposition is used only for V, not U.
The general principle used to prepare the state is to first prepare a set of nested boxes indexed by μ, then use that to prepare a superposition over ν, then use inequality testing in order to apply the correct amplitude factors. The details of how the nested boxes are applied is described in detail in Section “Preparing via nested boxes”. As explained in Section “Preparation of registers for selecting between terms”, there is a complexity linear in n (the number of bits for components of the momentum) in order to construct the superposition over ν given the superposition over μ.
The superposition over μ is prepared using controlled Hadamard gates in ref. 11. Here we are considering the case of non-orthogonal Bravais vectors, so the weightings are not the same as in ref. 11 and more general preparation is needed. We can use a similar preparation over μ here as is used in Section “Preparation of registers for selecting between terms”, by using coherent alias sampling. The cost is just linear in n, plus a logarithmic (in n) cost for preparing an equal superposition and another logarithmic (in the precision) cost for performing the inequality test.
In the simple case discussed in ref. 11, the inequality is written in the form (see Eq. (84) of that work)
where m is the value prepared in an equal superposition state, M is the number of values in the superposition for m, and \({({2}^{\mu -2})}^{2}\) is a value resulting from the simple form of nested boxes in that work. As discussed there, \({({2}^{\mu -2})}^{2}M\) can be given with no extra Toffoli gates. For the more general set of nested boxes considered here, the value will not be so simple, but the appropriate μ-dependent value can be obtained from a QROM on μ with complexity linear in n. This cost, and the other costs for preparing the superposition over μ, are very small compared with the other costs in the procedure, and will be ignored in our approximate costing.
The main cost in the procedure is from determining the value of ∥kν∥2. That is no longer proportional to ∥ν∥ here, and the details of the complexity are given in Section “Complexity of norm with Bravais vectors”. In preparing the state via inequality testing and amplitude amplification, we use the following steps.
-
Prepare the superposition over μ for the nested boxes.
-
Prepare the simple superposition over ν using the value of μ as discussed above.
-
Compute ∥kν∥2 with complexity described in Section “Complexity of norm with Bravais vectors”.
-
Multiply the result by a register in an equal superposition state, with complexity b2 when performing arithmetic with b bits.
-
Perform an inequality test between the result and a μ-dependent value given by QROM.
-
Reflect on the result of the inequality test.
-
Invert the inequality test, multiplication and computation of ∥kν∥2. This may be performed with Clifford gates, provided working ancillas are retained.
-
Invert the preparation of μ, ν from the first two steps, reflect about zero on the ancillas, and reprepare.
-
Compute ∥kν∥2 and the multiplication again.
-
Perform the inequality test to prepare the state.
For the total cost we also need to account for inverting this state preparation. If we retain qubits from the calculation we can invert the last multiplication and computation of ∥kν∥2 with Clifford gates, then need to pay the cost of the computation a third time. That is, we pay the Toffoli cost of the calculation twice in preparation, and once in inverse preparation. Thus the amplitude amplification results in triple the cost of computing ∥kν∥2 and the multiplication. This is similar to the tripling of the cost obtained in ref. 11 (see below Eq. (90) of that work). The other costs are relatively trivial compared to this main cost.
Another subtlety is that we are preparing kν via amplitude amplification for V, but by just preparation over the nested boxes for the pseudopotential components of the Hamiltonian. We can perform these preparations in separate registers, then perform a controlled swap to move ν into the working register. This may be performed before the computation of ∥kν∥2 in step 8 above, so the value of ∥kν∥2 can be used in the implementation of the pseudopotential without recomputing. The controlled swap has complexity linear in n, which can be ignored in this approximate costing. There are also a number of other small selection costs that we are ignoring here. For example, the flag on the result of the inequality test is also controlled on the qubits selecting the V term in the Hamiltonian.
Implementation of the phase factor
The last step is to apply the phase factor of \({e}^{-i{k}_{\nu }\cdot {R}_{\ell }}\). For this calculation the vector kν is given in terms of the reciprocal lattice vectors as
Therefore, we can compute
The QROM in Step 5 can be used to give these quantities, so we can calculate the dot product in the form
The cost of the QROM in Step 5 is unchanged by this modification, so the extra complexity to consider here is that from the three multiplications. The result may be added into a phase gradient state to implement the phase rotation, which is a smaller complexity in comparison.
Using the result in Eq. (D9) of ref. 32, the complexity of multiplying a dB-qubit integer and a real number is approximately
when giving the result to b bits of precision. The complexity for the three multiplications is then
The precision needed for the nuclear positions was previously described for the case without pseudopotentials in ref. 11. For the case with pseudopotentials we can bound the error due to the approximation of the nuclear position in the following way. Suppose that the nuclear positions Rℓ are approximated by \({\widetilde{R}}_{\ell }\) such that \(\left\Vert {\widetilde{R}}_{\ell }-{R}_{\ell }\right\Vert \le {\delta }_{R}\) for all ℓ. Then, we have the following bounds on the distance between the corresponding Hamiltonian terms:
and
Because the exponential decay in the local and nonlocal pseudopotentials has factors of \({r}_{{\rm{loc}}}^{\alpha }\) and \({r}_{l}^{\alpha }\) respectively, the expected values of \(\left\Vert {k}_{\nu }\right\Vert\) should be on the order of \(1/{r}_{{\rm{loc}}}^{\alpha }\) and \(1/{r}_{l}^{\alpha }\), which are order 1. Since the sums omitting \(\left\Vert {k}_{\nu }\right\Vert\) give values of λ, we can infer that
That is, δR approximately gives the relative error. For the purpose of the costing we will assume that the positions are specified with b bits, similarly to other quantities in the coherent arithmetic.
The kinetic energy
In the prior work costing plane-wave simulation11, the arithmetic for the kinetic energy was completely avoided by expressing the sum of squares as a linear combination of bitwise products. That was used to avoid the need to compute ∥q∥2. Here we are using ∥kq∥2, which is a norm computed using the Gramian. That makes it more difficult to use the approach from ref. 11, but it is not needed. Here, we are already computing ∥kq∥2 for use in the nonlocal pseudopotential. That means we can use that value for the block encoding of T as well.
We can simply perform an inequality test between the value calculated for ∥kq∥2 and a number in an equal superposition in an ancilla register. This would suggest that we only have the cost of an inequality test, except that the maximum value of ∥kq∥2 will typically not be a power of 2, and we would want the maximum value to match the maximum value in the ancilla register in order to minimize λT. When the equal superposition state is prepared via Hadamards the maximum value is (one less than) a power of 2.
One approach would be to multiply ∥kq∥2 by a constant in order to provide a matching maximum value. The drawback is that multiplications have high complexity. Instead, we can prepare the equal superposition state over a number of basis states that is not a power of 2. That can be performed in the usual way using inequality testing and amplitude amplification, with complexity \(3\lceil \log d\rceil +2{b}_{r}-9\), where d is the number of basis states to take an equal superposition over5.
When the number of basis states in the superposition is a multiple of a power of 2, the complexity can have 3 times the exponent of that power subtracted from it. In ref. 5 that exponent was denoted η (distinct from the notation η for the number of electrons here), so that the complexity had 3η subtracted from it. For our application here we have some freedom in the choice of d, because we can take it to be larger than necessary at the cost of a slightly larger value of λT. For example, if we took d to be a 10-bit number multiplied by a power of 2, then the cost of the preparation would be about 30 Toffolis, but the increase in λT due to any imprecision would be less than 1 part in 1000. That is negligible for our application here.
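The following sketch (our own helper, with br = 7 as suggested earlier) evaluates this preparation cost for a given d.

```python
from math import ceil, log2

def equal_superposition_cost(d, br=7):
    """Toffoli cost 3*ceil(log2(d)) + 2*br - 9 - 3*eta2 for preparing an equal
    superposition over d basis states, where 2^eta2 is the largest power of 2 dividing d."""
    eta2 = (d & -d).bit_length() - 1
    return 3 * ceil(log2(d)) + 2 * br - 9 - 3 * eta2

# A 10-bit odd factor times a power of 2 costs roughly 30 Toffolis, as noted above.
print(equal_superposition_cost(1001 * 2**6))  # 35
```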
Total complexity
Here we summarise the costs, and give the complete additional complexity for a block encoding of the Hamiltonian, beyond that given above for the pseudopotentials in Section “Completing the pseudopotential costing”. That will then need to be multiplied by the value of λ discussed in the next section. In the case without pseudopotentials, the costs are summarised in Table II of ref. 11. Here we summarise the costs using the same lines as in that Table, because many of the costs are equivalent. In the following we give the number of bits for each component of momentum as n for simplicity in many cases where the cost is small.
-
(i)
Registers need to be prepared to select between T, V, Uloc and Unonloc. We will first prepare registers for selecting between T, V, U, similarly to ref. 11, then prepare registers selecting between Uloc and Unonloc as in Section “Preparation of registers for selecting between terms”. The costing given in Eq. (61) of ref. 11 assumes a particular relation between the values of λ for V and U that does not hold here. Nevertheless, this is a small cost compared to the arithmetic in this block encoding, so will be ignored. The complexity for the preparation selecting between Uloc and Unonloc is part of the costing in Section “Completing the pseudopotential costing”.
-
(ii)
The preparation of equal superposition states over i, j for selecting electrons will have the same cost as in ref. 11, which is 14nη + 8br − 36. Note that it is typically reasonable to take br = 7, which would give cost 14nη + 20. Here, nη is the number of qubits needed to store the electron number.
-
(iii)
The preparation of registers needed for T is now changed from that in ref. 11 because we are preparing a superposition over a number of basis states that is not a power of 2. The cost is doubled when accounting for inverse preparation. The complexity can be taken to be \(3\lceil \log d\rceil +2{b}_{r}-9\) when preparing a superposition over d basis states, which is negligible compared to the other costs here.
-
(iv)
The complexity of swapping the momentum registers for electrons into temporary registers is unchanged at 12ηn + 4η − 8. In the case with different numbers of bits for the components, this complexity is modified by replacing 3n → nx + ny + nz so it becomes 4η(nx + ny + nz) + 4η − 8.
-
(v)
The select cost for T is now just b for an inequality test.
-
(vi)
The cost of preparing the state with 1/∥kν∥ amplitudes is covered in Section “Preparing the superposition over ν” above, and is approximately triple the cost of computing ∥kν∥2 plus 3b2 for three multiplications. In comparison, the leading-order complexity in ref. 11 was 3n2 for computing ∥kν∥2 plus 4nMn for the multiplications (with nM being the number of qubits for the equal superposition state).
-
(vii)
The cost of the QROM for outputting Rℓ was already included in Section “Completing the pseudopotential costing”.
-
(viii)
The cost of additions and subtractions of ν into momentum registers is unchanged at 24n, or 8(nx + ny + nz).
-
(ix)
The complexity of the phase factor is addressed in Section “Implementation of the phase factor” above, and is \({n}_{x}^{2}+{n}_{y}^{2}+{n}_{z}^{2}+2({n}_{x}+{n}_{y}+{n}_{z})b\) in the more general case.
Of these costs, the leading three are the 1/∥kν∥ preparation cost in part (vi), the phase factor in part (ix), and the cost of swapping the momentum registers from part (iv).
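As a rough consolidated tally, the sketch below (our own helper, with hypothetical parameter names) adds up items (ii), (iv), (v), (viii) and (ix); item (vi) is omitted because its cost depends on the Gramian-dependent norm computation.

```python
from math import ceil, log2

def additional_block_encoding_cost(eta, nx, ny, nz, b, br=7):
    """Approximate additional Toffoli cost beyond the pseudopotential costing."""
    n_eta = ceil(log2(eta))                                 # qubits storing the electron number
    cost = 14 * n_eta + 8 * br - 36                         # (ii) superpositions over i, j
    cost += 4 * eta * (nx + ny + nz) + 4 * eta - 8          # (iv) swaps of momentum registers
    cost += b                                               # (v) inequality test for T
    cost += 8 * (nx + ny + nz)                              # (viii) additions/subtractions of nu
    cost += nx**2 + ny**2 + nz**2 + 2 * (nx + ny + nz) * b  # (ix) phase factor
    return cost
```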
Computing λ
Here we describe the computation of λ for the simulation. We break up λ into four parts, λT, λV, λloc, λnonloc corresponding to the four terms in the Hamiltonian.
Kinetic energy λ
For comparison, the value of λT for cubic unit cells was given in Eq. (71) of ref. 11 as
In contrast, here we are accounting for general non-orthogonal Bravais vectors. In our case here, we have
This can be computed as
where the maximum is over the two ± signs, which are taken to be independent. This expression is used to calculate values in Tables 4 and 5. We do not need to consider an arbitrary sign on the first term, because an overall sign flip leaves the norm unaffected.
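Since the explicit expression involves the maximum of ∥kq∥2 over the two independent ± signs, a minimal sketch of that maximization is given below (our own illustration; it assumes λT takes the form η max∥kq∥2/2 for the kinetic term ∑p∥kp∥2/2, with νmax denoting the largest integer components of ν).

```python
import numpy as np

def lambda_T(eta, g1, g2, g3, nu_max):
    """Assumed form lambda_T = eta/2 * max over the two independent +/- signs of
    ||nu_max_x g1 +/- nu_max_y g2 +/- nu_max_z g3||^2 (an overall sign flip leaves the norm unchanged)."""
    g1, g2, g3 = map(np.asarray, (g1, g2, g3))
    best = 0.0
    for s2 in (+1, -1):
        for s3 in (+1, -1):
            k = nu_max[0] * g1 + s2 * nu_max[1] * g2 + s3 * nu_max[2] * g3
            best = max(best, float(k @ k))
    return eta * best / 2
```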
This value of λT results from our approach of simply performing an inequality test using the computed value of ∥kq∥2. If we were to use the approach based on the individual qubits of q similar to ref. 11, or the proposal for Bravais vectors in ref. 21, then it can result in a larger λT. In that approach one is effectively considering the expansion
and implementing each term according to a linear combination. That means the value of λT comes from taking the sum of the maximum of each of the terms, rather than the maximum of ∥kq∥2. In cases such as LiNiO2 (C2/m) that gives a larger value, though in other cases it can be the same.
Electron-electron potential λ
To explain the method to determine λV for non-orthogonal Bravais vectors, we first summarise the method for cubic unit cells. That can be determined from the expression in Eq. (25) of ref. 11,
When using Nx = Ny = Nz, \({\lambda }_{\nu }={\mathcal{O}}({N}^{1/3})\). In particular, we obtain
where Ti2 is the Lewin inverse-tangent integral (which can be evaluated in terms of polylogarithms), and C is Catalan’s constant (see ref. 33, or Eq. (29) of ref. 10).
The constant obtained in the case of a cube upper bounds that for a rectangular prism, as can be seen from the behaviour of the integral as the prism is reshaped into a cube of the same volume. If, for example, Nx > Ny, then we can reduce Nx by an infinitesimal amount δx and increase Ny by δy; to maintain the same volume we need
The change in the value of the integral over x, y for each z is
where \(\tilde{x}=x/{N}_{x}\) and \(\tilde{y}=y/{N}_{y}\), and the last inequality is because Nx > Ny. This shows that for any non-cubic (rectangular prism) region, changing the shape towards a cube while preserving the volume can only increase the integral. The reason for this is that regions of the shape where 1/∥ν∥ is smaller are being replaced with regions where it is larger in reshaping the region to a cube.
That is still for orthogonal Bravais vectors. In the general case with non-orthogonal Bravais vectors, we would use
The expression in the first line is used to calculate λV in Tables 4 and 5. Now let us use the change of variables where x, y, z are the components of νxg(1) + νyg(2) + νzg(3). That is, we have
According to the usual rule for changing variables in integrals, we replace the integral over ν with one over x, y, z
where
Using \([{{\boldsymbol{g}}}^{(1)},{{\boldsymbol{g}}}^{(2)},{{\boldsymbol{g}}}^{(3)}]=2\pi {({{\boldsymbol{a}}}^{-1})}^{T}\), we have
However, this transformation of variables also changes the volume of the region. The region corresponds to those values of x, y, z obtained from Eq. (68) for νx ∈ [ − Nx, Nx], νy ∈ [ − Ny, Ny], νz ∈ [ − Nz, Nz]. That region is a parallelepiped, where the sides are 2Nxg(1), 2Nyg(2), 2Nzg(3). The volume of a parallelepiped is given by the triple scalar product of the vectors for its sides, and so here is \(8{N}_{x}{N}_{y}{N}_{z}\det ([{{\boldsymbol{g}}}^{(1)},{{\boldsymbol{g}}}^{(2)},{{\boldsymbol{g}}}^{(3)}])\). That is, the volume of the region is increased by a factor of \(\det ([{{\boldsymbol{g}}}^{(1)},{{\boldsymbol{g}}}^{(2)},{{\boldsymbol{g}}}^{(3)}])\). In a similar way as we bounded the integral for the rectangular prism with that of the cube of the same volume, modifying the parallelepiped to a rectangular prism of the same volume can only increase the integral. This is because we are again replacing regions where 1/(x2 + y2 + z2) is smaller with those where it is larger in reshaping the region. Therefore we can use the upper bound for the cube of the same volume. Since the integral gives a factor of the volume to the power of 1/3, the change of variables gives a factor of \(\det {([{{\boldsymbol{g}}}^{(1)},{{\boldsymbol{g}}}^{(2)},{{\boldsymbol{g}}}^{(3)}])}^{1/3}=2\pi /{\Omega }^{1/3}\). Multiplying that by Ω/(2π)3 from the Jacobian, it gives an overall factor of Ω2/3/(2π)2. This means that exactly the same factor of Ω2/3/(2π)2 is obtained as when we used a cubic region. For specific examples it is more accurate to compute λV by explicitly performing the sum, though.
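The determinant identity used in this change of variables is easy to verify numerically; the short check below (our own illustration, for an arbitrary non-orthogonal cell) confirms that \(\det ([{{\boldsymbol{g}}}^{(1)},{{\boldsymbol{g}}}^{(2)},{{\boldsymbol{g}}}^{(3)}])={(2\pi )}^{3}/\Omega\) and hence that the combined factor is Ω2/3/(2π)2.

```python
import numpy as np

a = np.array([[5.0, 0.0, 0.0],        # rows are the Bravais vectors a^(1), a^(2), a^(3)
              [2.5, 4.33, 0.0],
              [0.0, 1.0, 8.0]])
Omega = abs(np.linalg.det(a))          # cell volume
g = 2 * np.pi * np.linalg.inv(a).T     # [g^(1), g^(2), g^(3)] = 2*pi*(a^{-1})^T
det_g = abs(np.linalg.det(g))

print(np.isclose(det_g, (2 * np.pi) ** 3 / Omega))                  # True
print(np.isclose(det_g ** (1 / 3) * Omega / (2 * np.pi) ** 3,
                 Omega ** (2 / 3) / (2 * np.pi) ** 2))               # True
```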
Local pseudopotential λ
For the local part of the pseudopotential, the value of λloc can be as small as
This is the value of λloc that would be obtained if we implemented the block encoding via a tight preparation over ν. We will give the amended expression for the actual implementation below. This value of λloc is derived by noting that the expression in large round brackets for Uloc is approximately unitary. (The only nonunitarity is from the restriction that (q − ν) ∈ G, which is implemented by eliminating any transfers outside the state space in the block encoding). Here we have switched from the sum over ℓ to that over α, and used the number of atoms of each species Lα. This is an expression which can be computed via an explicit sum over ν. An approximate value can be found by approximating the sum by an integral, and integrating over an infinite region. In the case where \({r}_{{\rm{loc}}}^{\alpha }\) is significantly smaller than the size of the region, this should be accurate.
First, note that for the cubic case kν = 2 π ν/Ω1/3, so the integral over ν gives
where \(r=\,\Vert{k}_{\nu}\Vert {r}_{{\rm{loc}}}^{\alpha }\), so
Here, \(\vec{r}={k}_{\nu }{r}_{{\rm{loc}}}^{\alpha }\) and ‘ ↦ ’ indicates integration over the angular coordinates. The form in Eq. (73) gives an approximation of λloc depending only on the parameters of the pseudopotential for the nucleus, and not on N or Ω.
When using non-orthogonal Bravais vectors, we can use the change of variables
so \(r=\Vert(x,y,z)\Vert =\Vert{k}_{\nu}\Vert {r}_{{\rm{loc}}}^{\alpha }\). This just differs from the change of variables used to bound λV above by the factor of \({r}_{{\rm{loc}}}^{\alpha }\), so instead of Eq. (71) we obtain
This means that we have the same rule for the change of variables as in Eq. (74), and exactly the same derivation as in Eq. (73) holds.
Hence the same expression is obtained regardless of whether a cubic region or non-orthogonal Bravais vectors are used. Note also that there is a factor of \(1/{r}_{{\rm{loc}}}^{\alpha }\) on the first term in Eq. (73), and no factors of 1/Δ. In contrast, for λV and λT there are factors of 1/Δ and 1/Δ2, respectively, where Δ = (Ω/N)1/3 is the spacing of the plane-wave reciprocal lattice. It can therefore be expected that the value of λloc is small compared to λV when the lattice spacing is smaller than \({r}_{{\rm{loc}}}^{\alpha }\) or 1, which is the typical situation that should be expected.
In practice, our block encoding will work by separately implementing the terms in the sum for Uloc. That means that λloc should be given by
The expression on the first line is used to calculate λloc in Tables 6, 4, and 5. In the second line we have used the integral approximation again. This integral approximation and that without separating components are shown in Table 7. It can be seen that separately implementing the terms in the sum gives about a factor of 2 increase in many cases, but it is still relatively small.
When we are performing the preparation, we are only choosing the amplitudes according to the box that ν is in, which increases the effective value of λ. For the local pseudopotential, we are able to choose to implement a different component of the Hamiltonian (for example T) in the case of the failure of the inequality test. This is similar to the approach used in ref. 11 to account for the failures in state preparation for V. Since the value of λ for the local pseudopotential is small, it can be expected that we can always apply V in the failure cases, so we can still use this expression for λloc.
The expression in Eq. (77) can slightly overestimate the values for real materials, since it is using an upper bound. For the example of C (diamond), if the simulation cell consists of one primitive cell then we obtain 17.1594, which is slightly less than the value from Eq. (77). For diamond it would be appropriate to use more primitive cells within a simulation cell. Using 2 in each direction increases the value to 18.5678, 4 in each direction gives 19.2545, and 8 gives 19.5956, which is closer to the value found from Eq. (77). There are many examples of crystal structures listed in the Supplementary Information, Table III. In Table 6 we give the corresponding contributions to λloc for each of the nuclei in these structures (omitting the factor of η). In each case we find that the value is smaller than that given by Eq. (77), but only by a small amount.
It is also possible to compute the error due to interpolation of the exponential via a very similar calculation. That is, in the calculation where we would otherwise be estimating λ, replace each value of the exponential by the interpolation error in the exponential. The results of performing this calculation for a quadratic interpolation with 64 points are given in Table 8. This choice of interpolation can be expected to give a maximum relative error in the interpolation below 10−8, but the error will be smaller on average. The average error can be expected to be more relevant here, as the sum involves evaluating the exponential at a large number of possible arguments. In each case we find that the error is about 4 parts per billion relative to λ.
Nonlocal pseudopotential λ
Next, based on the definition of Unonloc, the value of λnonloc can be as small as
This is the formula used to calculate the λnonloc values in Table 9. When block encoding the nonlocal part by separating the l, i, j components, the value of λ becomes
where
This is the formula used to calculate the λnonloc values in Tables 10, 4, and 5. Recall that \({C}_{lj}^{\alpha }\ge 0\) so the absolute value is not required. Then we can approximate λnonloc via the integral,
Now we consider the change of variables using the general form for non-orthogonal Bravais vectors, and the case for a cubic region can be considered to be a special case. For each l we make the change of variables
so the change of variables gives the factor
If we also upper bound the Legendre polynomial by 1, that gives
where
If we further define \({\widetilde{C}}_{lj}={C}_{lj}^{\alpha }/{({r}_{l}^{\alpha })}^{l+3/2}\) (so they are independent of α), then we obtain
This again gives us an expression that only depends on the parameters of the pseudopotential, and not independently on Ω and N. In fact, the way it is written is now independent of \({r}_{l}^{\alpha }\), and depends on the nucleus only through \({E}_{l\alpha }^{ij}\). It is also independent of the geometry of the non-orthogonal Bravais vectors. The values computed for a range of atoms are given in Table 11. For the atoms with \({l}_{\max }=0\), for example B, the value of λnonloc is just \(8{E}_{0}^{00}\).
For both the local and nonlocal pseudopotentials, we are only choosing the amplitudes according to the box that ν is in, which increases the effective value of λ. For the local pseudopotential we were able to choose to implement a different component of the Hamiltonian upon failure of the inequality test. That is not possible for the nonlocal pseudopotential, because the amplitude for success of the inequality test depends on the state as well as ν. Typically, determining a ν-dependent upper bound (as in \({\aleph }\) above) is complicated, and would require further function-interpolation complexity to implement. For this reason we do not apply any other components of the Hamiltonian in the case of failure of the inequality test, and there will be an increase to the effective value of λnonloc.
To give an explicit formula for this effective value of λnonloc, we adopt the notation used in Section “Preparing via nested boxes”. We use Bμ for box μ, and μ(ν) for the minimum μ such that ν ∈ Bμ. We can then use
That is, the effective λ is obtained by using the maximum value in the current box excluding the inner box; see Section “Preparing via nested boxes" for further discussion. We can then calculate the effective λnonloc by using Eq. (80) with \({\aleph }_{\alpha ,\nu ,l,i,j}^{{\rm{box}}}\).
Cost estimates for quantum simulating interesting materials
In this section we describe the proposed use case of CO adsorption on transition metals: its motivation, the details of the DFT calculations, and the results used to estimate inputs for the proposed quantum algorithm. CO adsorption on transition metals finds application in many heterogeneous catalytic systems including methanation34, steam reforming of fossil fuels23, methanol synthesis24,25, water-gas-shift and its reverse26, as well as the electrochemical reduction of CO2. CO is also responsible for deactivation of catalysts through coking (e.g. steam reforming) or poisoning (e.g. if present in feeds of low temperature fuel cells35). Thus, it is important to provide reliable and accurate estimates of the binding (i.e. adsorption) energies, as well as the adsorption sites, of CO on transition metal surfaces to accurately describe reaction energetics, deactivation mechanisms, and realistic active sites for the reactions involved.
However, the generalized gradient approximation (GGA) family of exchange-correlation functionals in density functional theory (DFT-GGA) - commonly used within the communities of heterogeneous catalysis and surface science - routinely overestimates the binding energies and wrongly assigns the binding sites of CO on transition metals, relative to experiments36,37. This is commonly known in the literature as the ‘CO/Pt(111) puzzle’38,39, though the issue still manifests for other metal surfaces that form a strong chemisorption bond with CO. The fundamental reason behind this problem is attributed to the self-interaction error associated with GGA density functionals (among other functionals), which leads to the overestimation of the back donation of electrons from the surface to adsorbed CO. Essentially, these functionals incorrectly place the highest occupied and lowest unoccupied molecular orbitals of CO relative to the Fermi level of the metal37. These quantitative (i.e. overestimation of binding energies) and qualitative (i.e. wrong adsorption sites) errors can be mitigated by reducing density-driven errors via higher-level functionals on Jacob’s ladder40. However, these approaches are not universal, as they often rely on (semi)empirical parameters to arrive at the correct HOMO-LUMO gap, and often require impractical computational expense. For these reasons, we believe this scientific topic, with wide fundamental interest and industrial applicability, is an impactful use case for quantum computing applications/algorithms which could circumvent the shortcomings of DFT.
Classical Computational Methodology
All periodic DFT calculations are performed with the Perdew-Burke-Ernzerhof exchange correlation functional41 as implemented in the Quantum Espresso code42. We calculate CO adsorption on the (111) facet of three strong-adsorbing transition metals: Pt, Pd, and Rh. All surfaces are modelled with a three-layered slab, with the bottom layer fixed at the bulk positions and the upper two, together with the adsorbed CO, fully relaxed. A vacuum of 10 Å is applied in the direction normal to the surface to avoid spurious interactions among vertical images. We used two sets of pseudopotentials to implement the frozen core approximation: the projector augmented wave (PAW) pseudopotentials and the GTH pseudopotentials. The latter are implemented in our quantum computing algorithm, while we present the former for comparison as they are widely used in the computational catalysis literature. The numbers of valence electrons for Pt, Pd, and Rh are 16, 18, and 17 with the Vanderbilt pseudopotentials and 10, 10, and 9 with the GTH pseudopotentials, respectively. We provide comprehensive convergence studies for three different parameters: the kinetic energy cutoff of the plane-wave basis set through which the Kohn-Sham one-electron valence states are expanded, the Monkhorst-Pack k-point mesh density, and the unit cell size for the Pt(111) surface. The converged parameters were then adopted for the respective Pd and Rh surfaces. The (111) facet of these face-centered-cubic (fcc) metals exhibits four unique adsorption sites: top, bridge, hollow fcc, and hollow hcp. We calculated adsorption on all four sites but, for brevity, report results and convergence tests only for the most favourable site (e.g. CO adsorption on the fcc hollow site of the Pt(111) surface). The binding energy (BE) of CO on a given surface is defined as:
$${\rm{BE}}={E| }_{{{\rm{CO}}}^{* }}-{E| }_{{\rm{Clean}}}-{E| }_{{\rm{CO(g)}}},$$(89)
where E∣CO* is the total electronic energy of CO adsorbed on the surface, E∣Clean is the total electronic energy of the clean metal slab, and E∣CO(g) is the total electronic energy of CO in the gas phase (i.e. vacuum). The more negative the binding energy, the more favourable adsorption is.
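For readers checking the numbers, a trivial helper along these lines (our own illustration) converts the DFT total energies in Ry into a binding energy in eV, using the conversion factor quoted later in the text.

```python
RY_TO_EV = 13.605693  # 1 Ry in eV (rounded to 13.61 eV in the text below)

def binding_energy_ev(e_co_star_ry, e_clean_ry, e_co_gas_ry):
    """BE = E|CO* - E|Clean - E|CO(g); more negative means more favourable adsorption."""
    return (e_co_star_ry - e_clean_ry - e_co_gas_ry) * RY_TO_EV
```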
Computational assessment
To get reliable resource estimates for conducting quantum chemical simulations on a quantum computer, the corresponding DFT simulations of CO adsorption on (111) fcc metal surfaces have to be analyzed. Since the focus of this paper lies on the use of GTH pseudopotentials for periodic systems, determining optimal simulation parameters for the GTH pseudopotentials is crucial. Therefore, in this section we focus on systematic convergence studies of the DFT calculation parameters for the GTH pseudopotentials and, for direct comparison, the PAW pseudopotentials. One of the first parameters to analyze is the energy cutoff used to truncate the plane waves, for both pseudopotentials studied: GTH and PAW. We therefore studied the convergence of each of the three binding-energy terms on Pt(111) with a convergence threshold of 0.01 Rydberg (Ry), as shown in Fig. 3.
Fig. 3: The convergence of electronic energies (in Ry) versus the plane-wave energy cutoff (in Ry) using the PAW (black lines) and GTH (red lines) pseudopotentials of the three components making up the binding energies: (a) isolated CO molecule in vacuum, (b) clean Pt slab with no adsorbates, and (c) the Pt slab with a CO molecule adsorbed on its surface. Arrows point out the energy cutoff at which convergence of electronic energies has been achieved to within 0.01 Ry. In panels (b) and (c) the data point for GTH at 30 Ry energy cutoff is omitted because of too large a difference in the corresponding electronic energies. Notice the different scales for the vertical axis in all three panels. The data is generated from a (4 × 4) Pt(111) slab, with the Brillouin zone sampled at a 4 × 4 × 1 Monkhorst-Pack k-point mesh. The insets in each panel depict the atomistic models on which convergence was studied.
From the calculation results it emerges that, with the exception of CO(g) calculated with PAW and the clean Pt slab calculated with GTH, the electronic energies generally require high energy cutoffs of 150 Ry to converge. By contrast, the binding energies converge at much lower energy cutoffs, as shown in Fig. 3. This is expected due to the cancellation of errors when calculating relative energies such as binding energies. Figure 4 reports adsorption on the preferred hollow fcc site. We deem the binding energies converged when differences across consecutive energy cutoff values are <0.10 eV (1 Ry = 13.61 eV). The binding energies calculated with either PAW or GTH reach convergence at a minimum of just 40 Ry of energy cutoff.
Fig. 4: The convergence of the binding energy (in eV) of CO on the fcc hollow site on Pt(111) with respect to the energy cutoff (in Ry) used to calculate each of the three terms of Eq. (89), for both pseudopotentials: PAW (blue columns) and GTH (orange columns). We present the binding energies in electronvolts (eV) because it is a more common unit than Ry in applications of computational catalysis. For either pseudopotential, the binding energies converge to within 0.10 eV at 40 Ry energy cutoff.
Having established the convergence of binding energies with respect to energy cutoff, we then studied the convergence of the binding energies with respect to the density of the Monkhorst-Pack k-point mesh and the size of the unit cell of the Pt(111) slab. Denser k-point settings barely have any effect on the calculated binding energies. On the other hand, as we make the unit cell smaller the binding energies weaken a little: the binding energies on the (4 × 4), (3 × 3), and (2 × 2) unit cells are -1.79, -1.77, and -1.72 eV calculated with PAW/Vanderbilt pseudopotentials, and -1.69, -1.67, and -1.62 eV calculated with GTH, respectively. This is unsurprising, given that smaller unit cells offer higher coverages of adsorbates: the coverage of adsorbed CO on (4 × 4), (3 × 3), and (2 × 2) unit cells are 1/16, 1/9, and 1/4 ML, respectively. The differences are more pronounced going from the 1/9 ML coverage on the (3 × 3) unit cell to the 1/4 ML coverage on the (2 × 2) unit cell than at the higher end of unit cell size, even though all differences in binding energies are smaller than the 0.10 eV convergence threshold. The hollow site was the most stable adsorption site for both (4 × 4) and (3 × 3) unit cells as commonly predicted by DFT, but this preference switches to the bridge site on the (2 × 2) unit cell37,43. Therefore, for our further analysis, we choose the (3 × 3) unit cell; the smallest size that captures the DFT trend of favourable adsorption on fcc hollow site. Using those parameters (4 × 4 × 1 Monkhorst-Pack k-point mesh and (3 × 3) unit cell), together with 80 Ry for energy cutoff, we report the adsorption energy of CO on Pd(111) and Rh(111) in Table 12. The energy cutoff of 80 Ry is higher than suggested by the convergence tests; nevertheless we err on the side of accuracy to capture small differences in energetics of the different adsorption sites.
Table 12 details the CO adsorption puzzle on transition metals: the CO adsorption energy is overestimated by at least 0.4 eV by DFT compared to experiments, and DFT often fails to predict the correct adsorption site (the exception here is Pd).
Resource estimates
We now compile a series of resource estimates using the plane-wave cutoff estimates from the aforementioned DFT calculations. For all metal slabs we determine the maximum number of bits for each Miller index direction from an 80 Rydberg (Ry) spherical energy cutoff, double what is needed for convergence of the energy differences. The number of grid points in each direction is determined from the maximum Miller index of the spherical cutoff. The total number of electrons is computed using the large-core LDA pseudopotential parameters listed in the Supplementary Information, Table I. The nonlocal pseudopotential parameters used for the resource estimates are described in the Supplementary Information, Table II. We also investigated diamond in a 3 × 3 × 3 simulation cell and wurtzite AlN, both with an 80 Ry spherical energy cutoff. The numbers of bits, total numbers of electrons, numbers of atoms of each species, and the block encoding costs are listed in Table 3. We find that for systems with η = 100−500 the block encoding of the entire Hamiltonian costs on the order of 10⁴ Toffoli gates. For some systems we have also considered a doubling of the grid size in each direction (corresponding to a doubling in the spherical energy cutoff) and observe only modest increases in the Toffoli complexity.
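The exact prescription for converting a cutoff into per-direction register sizes (Eq. (7)) is not reproduced in this section; the sketch below shows one plausible version of the bookkeeping, under the assumptions that the plane-wave kinetic energy in Ry is ∥k∥² with k in bohr⁻¹, that the per-direction maximum Miller index is bounded by k_max/∥b_i∥ (only approximate for non-orthogonal cells), and that each signed index uses its magnitude bits plus a sign bit. The lattice in the example is hypothetical.

```python
# Illustrative sketch only: per-direction Miller-index register sizes from a
# spherical energy cutoff. This is not the paper's Eq. (7); see assumptions above.
import numpy as np

def miller_bits(lattice_vectors_bohr, e_cut_ry):
    a = np.asarray(lattice_vectors_bohr, dtype=float)  # rows are a1, a2, a3 (bohr)
    b = 2 * np.pi * np.linalg.inv(a).T                 # reciprocal vectors (rows, bohr^-1)
    k_max = np.sqrt(e_cut_ry)                          # |k|_max in bohr^-1 for a cutoff in Ry
    m_max = np.floor(k_max / np.linalg.norm(b, axis=1)).astype(int)
    n_bits = np.ceil(np.log2(m_max + 1)).astype(int)   # magnitude bits; add 1 sign bit each
    return m_max, n_bits

# Hypothetical tetragonal cell at an 80 Ry cutoff.
m_max, n_bits = miller_bits([[7.0, 0, 0], [0, 7.0, 0], [0, 0, 12.0]], 80.0)
print(m_max, n_bits)  # roughly m_max = [9, 9, 17], n_bits = [4, 4, 5]
```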
We also compare the block encoding costs using our protocol against the resources reported in ref. 21 for three cathode structures. We find that our protocol provides a factor of 30–40 improvement in the Toffoli cost of block encoding while faithfully simulating the entire nonlocal pseudopotential instead of just the diagonal angular momentum projectors proposed in ref. 21. While a factor of 30–40 may seem modest in terms of orders of magnitude, we note that it is an improvement over a method that incurred an approximation error on the order of hundreds of milliHartree per atom. The estimates from ref. 21 are for a number of plane waves calculated from a spherical cutoff distributed uniformly in each Miller index direction; due to the non-cubic nature of the simulation cells this produces a non-uniform grid spacing. In our Toffoli costs the majority of the cost originates from the controlled swaps accessing the electronic registers in order to apply the SELECT operator. The cost of the coherent swaps (of which there are two sets, to move electrons ι and ξ into a work register) is large in part because the number of electrons is large. Our Toffoli costs for the pseudopotential are about a factor of 100 lower than those in ref. 21. The approach of ref. 21 could presumably also be extended to compute non-diagonal angular momentum projectors in the nonlocal pseudopotential, but it would require terms with kp ⋅ kq and \({({k}_{p}\cdot {k}_{q})}^{2}\). That would require a large number of additional states to be prepared using QROM in that approach, and could increase the cost by a further order of magnitude.
An important feature of the block encoding is the value of λ that scales the block-encoded Hamiltonian. In many algorithms λ plays an important role in quantifying the total cost; in phase estimation it scales the number of queries to the block encoding required for precision ϵ. Table 5 tabulates the λ of each component of the electronic structure LCU, determined using Eq. (80) with \({\aleph }_{\alpha ,\nu ,l,i,j}^{{\rm{box}}}\) as given in Eq. (88). The λ component corresponding to the normalization of the nonlocal pseudopotential term is now the dominant cost over the two-electron integral \({\lambda }_{V}\). For all systems we observe that \({\lambda }_{{\rm{nonloc}}}\) and \({\lambda }_{V}\) are on the order of 10⁷ and 10⁶. Combined with ϵ = 1.0 × 10⁻³ Hartree and the 10⁴ block-encoding cost, the overall 10¹⁵ Toffoli complexity is not surprising. All pseudopotentials used in this work are the LDA large-core GTH potentials. The small-core forms include more electrons per atom (in some cases doubling the number of electrons) and, if used, would be more accurate at substantial additional cost.
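This order of magnitude is consistent with the generic qubitized phase estimation query count of about ⌈πλ/(2ϵ)⌉ block-encoding calls (an assumption here; the exact accounting in the costings may differ by constant factors). A quick check with the round numbers above:

```python
# Back-of-envelope check with round numbers quoted in the text; this assumes the
# generic qubitized phase-estimation query count ~ ceil(pi * lambda / (2 * eps)).
import math

lam = 1.0e7      # dominant lambda component (order of magnitude from the text)
eps = 1.0e-3     # target precision in Hartree
be_cost = 1.0e4  # Toffolis per block encoding (order of magnitude from the text)

queries = math.ceil(math.pi * lam / (2 * eps))
total = queries * be_cost
print(f"queries ~ {queries:.1e}, total Toffolis ~ {total:.1e}")  # ~1.6e10 and ~1.6e14
```

With λ a few times 10⁷ for the larger systems, this lands at the ~10¹⁵ scale quoted above.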
Comparing to symmetry-adapted second quantization simulation
A benefit of developing a simulation protocol for pseudopotentials using a first quantized plane-wave representation is that we can now directly compare to the most recent quantum algorithms for simulating materials in second quantization. In order to make this comparison, we first estimate the plane wave grid resolution needed to reproduce a localized orbital simulation accurately. We take as an assumption that convergence to the thermodynamic limit is independent of reaching the complete basis set limit in terms of bands. Thus we can use a single primitive cell to verify the number of plane waves (and thus momentum space grid resolution) needed to reproduce the DFT energy of a localized orbital calculation. Shown in Fig. 5 is the total energy convergence of plane-wave DFT using an LDA functional to the molecular optimized double-zeta valence polarized (MOLOPT-DZVP) basis. The plane-wave LDA-DFT calculations are run using qcpanop30 which relies on pseudopotentials implemented in PySCF44. Dashed vertical lines in Fig. 5 are the plane-wave cutoffs where the grid resolution requires another bit. The grid resolutions (Δ values) are determined from the number of bits needed to represent the Miller index grid using Eq. (7).
A comparison of the convergence of the plane-wave (PW) basis towards the MOLOPT-DZVP basis for the primitive cell of the LNO C2/m structure from ref. 7 using LDA-DFT. Δ is the grid resolution in Bohr. The primitive cell of C2/m contains one formula unit; in the simulation cell we therefore use a 2 × 2 × 1 supercell corresponding to 4 formula units, consistent with the P21/c structure. For the supercell we need a grid of n = [5, 5, 5] qubits to achieve a similar resolution of less than 0.36 Bohr in each direction, and thus we use n = [5, 5, 5] for each of the LNO structures. We note that ref. 7 used the GTH-HF-REV pseudopotential, which is a small-core pseudopotential, so that simulation contained more electrons (132 electrons for small-core versus 92 electrons for large-core).
Table 4 tabulates the first-quantized plane-wave λ for each component and the total Toffoli complexity needed to perform phase estimation to precision ϵ = 1.6 × 10⁻³ Hartree. The LNO systems are slightly smaller than the metal surfaces and cathode structures considered previously, with the smallest simulation cell involving 92 electrons. Using the plane-wave DFT calculations we determine that for these 92-electron simulations a register of [5, 5, 5] qubits for the Miller indices of each electron would be required to reproduce the MOLOPT-DZVP basis quality using a large-core GTH-LDA pseudopotential. In Fig. 6 we compare the first quantized plane-wave resource estimates to the second quantized symmetry-adapted localized orbital resource requirements described in ref. 7. The second quantized simulations are about an order of magnitude lower in Toffoli complexity for the same system. It is important to note, though, that due to the extensive use of the QROM primitive to output the Hamiltonian, the second quantized simulations of LNO require approximately seventy-five thousand logical qubits.
Toffoli resources (listed above each bar) needed to perform phase estimation to ϵ = 1.6 × 10⁻³ Hartree precision using the large-core GTH-LDA pseudopotential in first and second quantization. Each system (C2/m, P21/c, P2/c) contains four formula units. The first quantized plane-wave simulations are compared against symmetry-adapted second quantized simulation using the MOLOPT-DZVP basis (shown as Gaussian type orbital (GTO) DZVP). The second quantized costs differ from ref. 7 due to using a larger-core pseudopotential (large-core GTH-LDA versus GTH-HF-REV) and not normalizing by the number of formula units (normalization would divide λ by four).
For the pseudopotential Toffoli costs given in Table 4 we have used a large number of ancilla qubits (around 8000) to minimize the Toffoli cost of uncomputing the pseudopotential. Without those ancilla qubits, the Toffoli cost is increased by only about 60%, to 9.6 × 10¹³ Toffolis for the first line in Table 4. The dominant space complexity for the first quantized simulations is then the system register, which for LNO requires 1380 logical qubits. Thus the total spacetime volume of the second quantized calculation is higher than that of the first quantized simulation.
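A rough comparison of spacetime volumes (logical qubits × Toffolis) using the figures quoted above makes this explicit; the second quantized Toffoli count below is only an illustrative "order of magnitude lower" value, and the precise numbers belong to Table 4 and ref. 7, which are not reproduced here.

```python
# Rough spacetime-volume comparison using figures quoted in the surrounding text;
# the second-quantized Toffoli count is an illustrative "order of magnitude lower" value.
first_q = {
    "qubits": 1380,        # system register only, forgoing the ~8000 uncomputation ancillas
    "toffolis": 9.6e13,    # ~60% above the ancilla-assisted cost for the first line of Table 4
}
second_q = {
    "qubits": 75_000,      # dominated by the QROM that outputs the Hamiltonian
    "toffolis": 6.0e12,    # illustrative: about an order of magnitude below first quantization
}

for label, r in (("first quantized", first_q), ("second quantized", second_q)):
    print(f"{label}: ~{r['qubits'] * r['toffolis']:.1e} qubit-Toffolis")
# Under these assumptions the first quantized volume is a few times smaller.
```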
Discussion
Quantum simulations of electronic structure using plane waves can have high cost due to the large number of plane waves needed to resolve the core orbitals, as well as the total number of electrons. Pseudopotentials address both these issues by using a smoothly-varying effective potential to model the behaviour of the valence electrons. The difficulty is now that the description of these effective potentials requires more classical data to be input to the quantum algorithm, as compared to non-pseudized potential interactions which can be simply calculated on the fly as in prior plane wave approaches. As seen in second quantized simulations, feeding classical data into quantum algorithms can be a major contributor to the cost and should be minimized.
Prior work in ref. 21 used a very data-intensive approach to implement the pseudopotentials, where the entire functional variation needs to be input as data. That negates the advantage of the functional form of the pseudopotentials by tying the cost of block encoding to a quantity that scales with the total basis set size. Moreover, that work used an incomplete form of the pseudopotentials, corresponding to errors on the order of thousands of times the desired accuracy of the calculation. We demonstrate the magnitude of this error in Section “Errors from omitting angular momentum projectors” and further emphasize that it will grow as a function of system size or as more sophisticated GTH-type pseudopotentials are used. While it is possible to extend the QROM data-intensive approach to include inner products of basis states and higher powers of those inner products, that classical data would incur a substantial quantum resource overhead.
In contrast, we have fully taken advantage of the functional form of the pseudopotentials to provide a simulation method that minimizes the use of classical data. As well as providing the complete pseudopotential for accurate simulations, the Toffoli count for a single block encoding of the Hamiltonian is far lower than that in ref. 21 and recovers the expected \(\widetilde{{\mathcal{O}}}(\eta )\) scaling of block encoding in first quantization. We also provide a set of routines for handling non-cubic simulation cells that maintains uniform grid resolutions. This involves modifying basis state inner products to include the reciprocal cell geometry via the coefficients of the reciprocal cell Gramian. Allowing different numbers of bits in each Miller index direction necessitates revisiting some of the block encoding primitives, such as the ν state preparation.
The improvement in performance from using pseudopotentials in the quantum algorithm mirrors the improvement obtained by using pseudopotentials in classical algorithms: the accuracy is improved for a given basis size, enabling a smaller basis with lower-energy plane waves to be used. Unlike in classical algorithms, the improvement in the quantum algorithm comes about because the λ-value is reduced. In the classical literature the improvements provided by pseudopotentials are not described in a complexity-theoretic way, so our improvement over all-electron simulation is not given in big O notation. However, our improvement over the pseudopotential algorithm of ref. 21 can be so described: that algorithm is polynomial in the basis size N, whereas ours is logarithmic. More specifically it scales as \({\mathcal{O}}({\log }^{2}N)\), because the Miller index registers hold \({\mathcal{O}}(\log N)\) bits and multiplications on them are needed to calculate norms.
A drawback of our approach is that we have simplified the implementation of the nonlocal pseudopotential in a way that increases the λ-value, and the overall simulation complexity scales as the product of λ and the block-encoding complexity. The λ-value is increased modestly by breaking the pseudopotential into a sum of terms in order to reduce the arithmetic needed. A more serious increase in λ comes from the state preparation over ν for the nonlocal pseudopotential. In contrast, ref. 21 used a representation that reduces λ. That being said, the large λ values are inherent to the plane-wave basis we have selected.
Using the new algorithmic primitives for handling non-cubic unit cells and pseudopotentials, we assess the fault-tolerant resources needed to simulate a much broader class of materials simulation problems than prior work, which was restricted to cubic unit cells. To provide canonical costs, we estimate the quantum resources needed to resolve the CO-puzzle in heterogeneous catalysis. While the costs are substantial, it is important to note that these systems would have resulted in unrealistically high Toffoli complexity in a comparable all-electron calculation. Another large benefit of developing the algorithmic primitives needed to accurately simulate in a non-cubic unit cell and with pseudopotentials is that we can now, for the first time, directly compare to the second quantized simulation of materials recently introduced in ref. 7. In this work we have compared against the flagship lithium-nickel-oxide battery cathode simulation problem outlined in ref. 7 and estimated that our new first quantized simulation approach requires a substantially smaller space-time volume.
Another benefit of the full compilation of non-local pseudopotentials is that it provides a direction for further improvements. For example, when block encoding the nonlocal pseudopotential, a strategy to reduce λ would be to use a state preparation step with a more detailed dependence on ν rather than just the nested boxes we have used here. Alternatively, it may be possible to provide a different form of block encoding to reduce λ in a similar way as in ref. 21, though that seems to make it difficult to take advantage of the functional form to reduce the complexity.
Methods
Preparing via nested boxes
The procedure for preparing states using nested boxes was previously explained in ref. 11. Here, we give a more detailed explanation to also take into account the preparation of states with more general amplitudes (as compared to 1/∥ν∥ in ref. 11). The procedure for the nested boxes may be described in detail as follows. In the most general case, say we are aiming to produce a state of the form
where we are allowing for different dimensions in each direction. For our application here, u(ν) = 1/∥kν∥ for V, and there we use amplitude amplification in order to prepare this state with high probability of success. We also treat the pseudopotential using the nested boxes, but do not use amplitude amplification.
The quantities nx, ny, nz are the numbers of bits excluding the sign bits for νx, νy, νz, respectively. We are allowing sums up to \({2}^{{n}_{x}}-1\) (and similarly for y and z) rather than \({2}^{{n}_{x}}-2\), which is the strict bound of the region Gd for ν. That is for simplicity of the explanation, and it is also possible to prepare over just the region Gd. The implementation with this form is correct, because it just contributes to parts of the Hamiltonian where q − ν ∉ G, which are eliminated. The more strict preparation over Gd would just give a (very small) improvement in the λ-value for the block-encoding.
First, we prepare a state of the form
where the weighting ψμ is \(\sqrt{{2}^{\mu }}\) when we are preparing a state with amplitudes 1/∥ν∥, but has a more complicated dependence for 1/∥kν∥ with non-cubic unit cells. The principle is that μ indexes the boxes so that the size (width) of the boxes grows as \({2}^{\mu }\), corresponding to larger values of ∥ν∥. The initial amplitude increases with μ, rather than decreasing with μ as might be expected (since larger μ corresponds to smaller 1/∥ν∥). The reason is that the initial amplitude in the superposition needs to account for all amplitudes in the box, rather than just one. As the size of the box is increased, the number of values of ν in the box scales as \({2}^{3\mu }\), which gives an extra factor of \({2}^{3\mu /2}\) for each value of μ, meaning that the amplitude needed in the state preparation goes like \({\psi }_{\mu }\propto \sqrt{{2}^{\mu }}\).
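To make this counting explicit (a heuristic sketch for the cubic case with u(ν) = 1/∥ν∥; the non-cubic case modifies the constants but not the reasoning): box μ contains \({\mathcal{O}}({2}^{3\mu })\) points, so a uniform superposition over it carries a factor \({2}^{3\mu /2}\), while the target amplitude on the points added at level μ is of order \({2}^{-\mu }\). Hence
$${\psi }_{\mu }\propto {2}^{3\mu /2}\times {2}^{-\mu }={2}^{\mu /2}=\sqrt{{2}^{\mu }}\,.$$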
In this work we have ∥kν∥ not proportional to ∥ν∥, as well as different numbers of bits in each direction, requiring a more general treatment. The form of the boxes we will take here is
where we have shifted μ according to μx = μ − δx, μy = μ − δy, μz = μ − δz. This general form of the boxes is convenient to take account of non-orthogonal Bravais vectors and unequal nx, ny, nz. For this set of boxes we do not eliminate inner boxes, so they are overlapping. This approach boosts the probability of success, as well as simplifying the preparation.
The key part of this definition is the first three inequalities. These restrict the dimension of the box in the three directions. The other inequalities (four to six) are just the conditions that are required for all values of ν. We include these because the generality of the definition allows for cases where (for example) μx − 1 > nx. Without loss of generality, we can take μ to be the maximum of μx, μy, μz, which corresponds to taking δx, δy, δz ≥ 0, and the minimum of them zero.
Now consider the example where μx is larger than μy and μz, so μ = μx. (Other cases with one largest value of μx,y,z are equivalent from symmetry.) This inequality relation (between μx and μy, μz) would then hold for all boxes Bμ, because the relative values of μx,y,z are given by the offsets δx,y,z. Then, for μ = μx = 2 we need ∣νx∣ < 2, but νy = νz = 0 from the first three inequalities. Another situation is where there are two equal largest values of μx,y,z, for example μx = μy > μz. Then the box for μ = 2 will require ∣νx∣ < 2 and ∣νy∣ < 2, but νz = 0. In the case with equal μx = μy = μz the cube with μ = 2 is that where ∣νx,y,z∣ < 2.
In each case, the box with μ = 1 includes only the point ν = 0. That point is excluded for V, as well as for the local pseudopotential; for those cases ν = 0 would give a term proportional to the identity, so the minimum value of μ is 2. In the case of the nonlocal pseudopotential we need ν = 0, so we can take μ = 1 to be the smallest allowed value.
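Since the displayed definition of the boxes is not reproduced in this section, the sketch below encodes the prose description (an assumption: the first three inequalities read ∣νi∣ < 2^(μi−1) with μi = μ − δi, and the remaining ones are the register bounds ∣νi∣ ≤ 2^ni − 1), and reproduces the worked examples above.

```python
# Sketch of the nested-box membership test, written from the prose description above
# (assumption: the first three inequalities read |nu_i| < 2^(mu_i - 1) with
# mu_i = mu - delta_i, and the remaining ones are the register bounds |nu_i| <= 2^n_i - 1).
def in_box(nu, mu, deltas, n_bits):
    for nu_i, d_i, n_i in zip(nu, deltas, n_bits):
        mu_i = mu - d_i
        if not (abs(nu_i) < 2 ** (mu_i - 1)):  # first three inequalities
            return False
        if not (abs(nu_i) <= 2 ** n_i - 1):    # inequalities four to six
            return False
    return True

# Worked examples from the text, with delta = (0, 1, 1) so that mu = mu_x:
n = (3, 3, 3)
assert in_box((1, 0, 0), mu=2, deltas=(0, 1, 1), n_bits=n)      # |nu_x| < 2, nu_y = nu_z = 0
assert not in_box((1, 1, 0), mu=2, deltas=(0, 1, 1), n_bits=n)  # nu_y must vanish at mu = 2
assert in_box((0, 0, 0), mu=1, deltas=(0, 1, 1), n_bits=n)      # mu = 1 contains only nu = 0
```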
For the maximum of μ, note that if μx − 1 > nx, μy − 1 > ny, and μz − 1 > nz, the size of the box is being constrained by inequalities four to six. Moreover, we could decrease μ by 1, and the box would be unchanged. Because we are avoiding identical boxes, we therefore take the maximum μ to be that where at least one of μx > nx + 1, μy > ny + 1, μz > nz + 1 is violated. This corresponds to the maximum value of μ being
After preparing a state with a superposition over μ, we would prepare a superposition of ν values within Bμ, giving
After this, we may use the QROM interpolation to output the value of u(ν) in a register and perform an inequality test with a register in superposition in order to obtain the desired amplitude.
For our definition, the sizes of the boxes ∣Bμ∣ are given as
The ceiling is to take account of cases where, for example, \({2}^{{\mu }_{x}-1} \,< \,1\), which would mean the only acceptable value of νx is 0. In the state preparation, we would actually generate positive and negative zeros in the signed integers. The negative zeros need to be eliminated, but the net effect is that we have a number
This effective set \(B^{{\prime} }_{\mu }\) with negative zeros is used to account for the negative zeros generated in the state preparation.
We need to account for the fact that each ν is not an element of just one box: if ν is an element of a box Bμ for a minimum value of μ, then it is also an element of the boxes \({B}_{\mu ^{\prime} }\) for \(\mu ^{\prime} \ge \mu\). To account for this, let us write μ(ν) for the minimum μ such that ν ∈ Bμ. Then the state from Eq. (94) can be rewritten as
Note that we are using \(B^{{\prime} }_{\mu }\) to account for the production of negative zeros. So that the correct amplitude can be obtained by inequality testing, the obvious requirement would be (we require u(ν) ≥ 0)
This definition could cause problems if ∣u(ν)∣ can be larger for larger boxes, so we modify it slightly to
With this definition, the total amplitude for ν prior to inequality testing is proportional to \(\mathop{\max }\nolimits_{\nu ^{\prime} \in {B}_{{\mu }_{\max }}\backslash {B}_{\mu (\nu )-1}}| u(\nu ^{\prime} )|\).
To be more specific about how we compute ψμ, let \({\tilde{\psi }}_{\mu }\) be an unnormalised form satisfying
so that
Here we have taken the sum over μ from 1, but for cases where the minimum μ is 2 the value of \({\tilde{\psi }}_{1}\) can be taken to be zero. To obtain \({\tilde{\psi }}_{\mu }\), we can use the formula
and for \(\mu \,< \,{\mu }_{\max }\),
Because we are taking the maximum over \({B}_{{\mu }_{\max }}\backslash {B}_{\mu -1}\), this expression cannot give negative values. (That is the problem that can be encountered if we were to use Bμ\Bμ−1.) To check this formula, it gives
as required.
We can use this approach to compute the probability for success of the state preparation via inequality testing, as
To show this expression, the prepared state is
The first line is just from the definition of ψμ as the normalised form of \({\tilde{\psi }}_{\mu }\). The second line is the state resulting from the preparation of superpositions for each μ. The third line is the equivalent state where we ignore the \(\left\vert \mu ^{\prime} \right\rangle\) register (which is allowed for in the state preparation for linear combinations of unitaries). The fourth line is from the definition of \({\tilde{\psi }}_{\mu }\). The final line is that obtained from an inequality test. This subnormalised state gives the probability of success as its norm.
For the preparation of 1/∥kν∥, we perform amplitude amplification to boost the probability of success. We can then adjust δx, δy, δz to obtain probabilities of success after the amplitude amplification that are close to 1. Appropriate choices of δx, δy, δz and the resulting success probabilities are given in Table 13, where we use the lattice vectors given in the Supplementary Information, Table III. In all cases the initial probability is larger than or close to 1/4, so the success probability after amplitude amplification is close to 1.
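This last claim can be checked generically: assuming a single round of standard amplitude amplification, an initial success probability p is mapped to sin²(3 arcsin√p), which equals 1 exactly at p = 1/4 and stays close to 1 nearby (a textbook relation, not specific to this construction).

```python
# Generic check: one round of standard amplitude amplification maps an initial
# success probability p to sin^2(3 * arcsin(sqrt(p))), which is exactly 1 at p = 1/4.
import math

def amplified(p):
    return math.sin(3 * math.asin(math.sqrt(p))) ** 2

for p in (0.22, 0.25, 0.28, 0.30):
    print(f"p = {p:.2f} -> {amplified(p):.4f}")  # all close to 1
```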
For the pseudopotentials it is convenient to use a state preparation that is as simple as possible, and then apply the appropriate amplitudes as part of the controlled operations in the block encoding. If we use the state preparation with amplitudes ψμ as above, then there is an increase in the effective λ due to the negative zeros and the possibly larger values of u(ν) for the larger boxes. It is therefore convenient to use an amplitude amplification similar to the above, but only for eliminating the inner boxes and negative zeros. This will be used to prepare a state of the form
where
and \(\psi ^{{\prime} }_{\mu }\) is obtained from \(\widetilde{\psi }^{{\prime} }_{\mu }\) via normalisation. This state has a large amplitude for preparation because we are mainly eliminating the negative zeros. Moreover, it gives an effective λ where, in each box excluding the inner box, Bμ\Bμ−1, the value of u(ν) is replaced with its maximum over this region.
Complexity of norm with Bravais vectors
In the simple case of orthogonal Bravais vectors of equal length, computing ∥ν∥² requires the sum of three squares. In the case with nx + ny + nz = n, the sum of squares may be computed using 3n² − n − 1 Toffolis using the approach of Lemma 8 of ref. 11. More generally, we need to compute ∥kν∥² for general Bravais vectors, and nx, ny, nz may differ from each other. Here we discuss the complexity for the case of general vectors, but in specific examples there may be simplifications. The general form we need to calculate is the quadratic form in the coefficients gij of the reciprocal cell Gramian,
$${g}_{11}{\nu }_{x}^{2}+{g}_{22}{\nu }_{y}^{2}+{g}_{33}{\nu }_{z}^{2}+2{g}_{12}{\nu }_{x}{\nu }_{y}+2{g}_{13}{\nu }_{x}{\nu }_{z}+2{g}_{23}{\nu }_{y}{\nu }_{z}\,.$$
The arithmetic needed is then as follows, where we give only leading-order complexities.
1. Squaring of νx, νy, νz, with complexity \({n}_{x}^{2},{n}_{y}^{2},{n}_{z}^{2}\), respectively.
2. Three multiplications of pairs of different νx, νy, νz, with complexities \(2{n}_{x}{n}_{y},2{n}_{y}{n}_{z},2{n}_{z}{n}_{x}\). The leading-order complexity of this and the previous step would then be \({({n}_{x}+{n}_{y}+{n}_{z})}^{2}\).
3. Six multiplications by real numbers.
4. Five additions, with complexity that may be ignored if we consider only leading-order complexities.
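For reference, the quantity this arithmetic evaluates is the classical quadratic form \({\nu }^{T}G\nu\) in the reciprocal-cell Gramian; a minimal classical sketch (with a hypothetical lattice, not one of the cells studied here) is as follows.

```python
# Purely classical reference for the quantity the above arithmetic evaluates:
# ||k_nu||^2 = nu^T G nu, with G the reciprocal-cell Gramian G_ij = b_i . b_j.
# The lattice below is a hypothetical example, not one of the cells in the paper.
import numpy as np

a = np.array([[7.0, 0.0, 0.0],       # direct lattice vectors as rows (bohr)
              [0.0, 7.0, 0.0],
              [3.5, 0.0, 6.1]])
b = 2 * np.pi * np.linalg.inv(a).T   # reciprocal lattice vectors as rows (bohr^-1)
G = b @ b.T                          # Gramian: G[i, j] = b_i . b_j

nu = np.array([3, -2, 5])            # a Miller-index triple
norm_sq = nu @ G @ nu                # g11*nu_x^2 + ... + 2*g13*nu_x*nu_z + ...
assert np.isclose(norm_sq, np.linalg.norm(nu @ b) ** 2)
print(norm_sq)
```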
For simplicity of the estimation of the complexity, we will give the number of bits to be used for real numbers as b. This will be used for the number of bits for the Bravais vectors as well as the precision of the result. Using the result in Eq. (D9) of ref. 32, the complexity of multiplying a dB-qubit integer and a real number is given by
where ϵ is the required precision in the resulting product, so \(\log (1/\epsilon )\) would be equivalent to b here. That complexity is for the real number in a quantum register, and the complexity is approximately halved when the real number is given as a classical constant, as it is in the case of computing the Bravais vectors. To leading order the complexity would then be
The values of dB would be 2nx, 2ny, 2nz, nx + ny, ny + nz, nx + nz, so the complexity in the worst case of six multiplications is
An assumption for this complexity is that b is larger than the numbers of bits in the integers. Adding the complexity of the squarings and multiplications of the integers, which is in the worst case \({({n}_{x}+{n}_{y}+{n}_{z})}^{2}\), gives a worst-case complexity
There are specific examples of Bravais vectors given in the Supplementary Information, Table III. For these examples the complexity can be reduced as follows. In the following we are ignoring costs linear in the numbers of bits. That means that when we add (for example) νx and νy we approximate the number of bits as \(\max ({n}_{x},{n}_{y})\) for estimation of the complexity.
1. For LiNiO2 (C2/m) there are no zeros in the vectors, suggesting that the full complexity would be needed. However, we first note that we may remove any overall multiplicative factor in this calculation. This is because the norm ∥kν∥² is always multiplied by a further constant in the algorithm, and any overall multiplying factor may be absorbed into that constant. It is found that g11 = g22 and g23 = −g13, so we may use a choice of multiplying constant so that (for example) −g23 = g13 = 1. Then we can rewrite the expression to calculate as
$${g}_{11}{({\nu }_{x}+{\nu }_{y})}^{2}+{g}_{33}{\nu }_{z}^{2}+2({g}_{12}-{g}_{11}){\nu }_{x}{\nu }_{y}+2({\nu }_{x}-{\nu }_{y}){\nu }_{z}\,.$$ (114)
First, only two squares and two products are needed, reducing the arithmetic complexity of that part to
$$\max {({n}_{x},{n}_{y})}^{2}+{n}_{z}^{2}+2{n}_{x}{n}_{y}+2\max ({n}_{y},{n}_{x}){n}_{z}\,.$$ (115)
Then only two multiplications by the squares and one multiplication by a product are needed. That means the complexity of that part is
$$2\max ({n}_{x},{n}_{y})[\max ({n}_{x},{n}_{y})+b]+2{n}_{z}({n}_{z}+b)+{({n}_{x}+{n}_{y})}^{2}/2+({n}_{x}+{n}_{y})b\,.$$ (116)
2. For LiNiO2 (P21/c), LiNiO2 (P2/c), or Li0.5MnO3 there are no equalities between terms, but g23 = g12 = 0. Similarly, for Li0.75[Li0.17Ni0.25Mn0.58]O2 there are no equalities but g23 = g13 = 0. Then we need to compute all three squares, but the only product is νxνz, or νxνy for Li0.75[Li0.17Ni0.25Mn0.58]O2. That gives complexities
$${({n}_{x}+{n}_{z})}^{2}+{n}_{y}^{2},\qquad {({n}_{x}+{n}_{y})}^{2}+{n}_{z}^{2}\,,$$ (117)
for the two cases. If we choose a scaling such that g13 = 1, or g12 = 1 for Li0.75[Li0.17Ni0.25Mn0.58]O2, then we only need multiplications by the squares of the components, giving complexity
$$2({n}_{x}^{2}+{n}_{y}^{2}+{n}_{z}^{2})+2b({n}_{x}+{n}_{y}+{n}_{z})\,.$$ (118)
3. For Pd (3 × 3), Pt (2 × 2), Pt (3 × 3), or Rh (3 × 3) we have g11 = g22 = −2g12, so we may scale such that all these are equal to 1. Similarly, for Pt (4 × 4) or AlN (wurtzite) we have g11 = g22 = 2g12, and the same scaling may be used. In both cases g23 = g13 = 0. The expression to calculate can then be given as
$${({\nu }_{x}-{\nu }_{y})}^{2}+{g}_{33}{\nu }_{z}^{2}+{\nu }_{x}{\nu }_{y}\,,\qquad {({\nu }_{x}+{\nu }_{y})}^{2}+{g}_{33}{\nu }_{z}^{2}-{\nu }_{x}{\nu }_{y}\,,$$ (119)
in the two cases. Then we only have two squarings and one product, with complexity
$$\max {({n}_{x},{n}_{y})}^{2}+{n}_{z}^{2}+2{n}_{x}{n}_{y}\,.$$ (120)
Then there is the additional complexity for the multiplication by g33, which is 2nz(nz + b).
4. For Li0.75MnO2F we have g22 = g33, and g12 = g23 = g13 are all zero. The complexity of the three squarings is \({n}_{x}^{2}+{n}_{y}^{2}+{n}_{z}^{2}\). We can choose a constant factor such that g22 = g33 = 1, and the only multiplication needed is for the square of the x-component, giving complexity 2nx(nx + b).
5. For CaTiO3 we have no equalities, but g12 = g23 = g13 are all zero. Again the complexity of the three squarings is \({n}_{x}^{2}+{n}_{y}^{2}+{n}_{z}^{2}\). We can scale such that, for example, g33 = 1, in which case we only have complexity for multiplication for the x- and y-components, giving
$$2{n}_{x}({n}_{x}+b)+2{n}_{y}({n}_{y}+b)\,.$$ (121)
6. For diamond all the products are needed, but ∥kν∥² can be rewritten as proportional to
$${({\nu }_{x}+{\nu }_{y}-{\nu }_{z})}^{2}+{({\nu }_{x}-{\nu }_{y}+{\nu }_{z})}^{2}+{(-{\nu }_{x}+{\nu }_{y}+{\nu }_{z})}^{2}.$$ (122)
This means that the complexity is just that for three squarings. That is, the complexity would be \(3\max {({n}_{x},{n}_{y},{n}_{z})}^{2}\). Then there is no further complexity for multiplications by real numbers.
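This rewriting can be verified by direct expansion; assuming the primitive (face-centred cubic) cell of diamond, for which the reciprocal-cell Gramian satisfies g11 = g22 = g33 and g12 = g13 = g23 = −g11/3 (an assumption consistent with all the products being needed, as stated above),
$${({\nu }_{x}+{\nu }_{y}-{\nu }_{z})}^{2}+{({\nu }_{x}-{\nu }_{y}+{\nu }_{z})}^{2}+{(-{\nu }_{x}+{\nu }_{y}+{\nu }_{z})}^{2}=3({\nu }_{x}^{2}+{\nu }_{y}^{2}+{\nu }_{z}^{2})-2({\nu }_{x}{\nu }_{y}+{\nu }_{y}{\nu }_{z}+{\nu }_{z}{\nu }_{x})\,,$$
which is proportional to \(\parallel {k}_{\nu }{\parallel }^{2}={g}_{11}[{\nu }_{x}^{2}+{\nu }_{y}^{2}+{\nu }_{z}^{2}-\tfrac{2}{3}({\nu }_{x}{\nu }_{y}+{\nu }_{y}{\nu }_{z}+{\nu }_{z}{\nu }_{x})]\) under that assumption.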
In the case of an inner product between different vectors using the Gramian, the costs of multiplication by the gij constants are unchanged, but there are different costs for the products of integers. In the worst case there are 9 products between integers needed, rather than three products and three squarings. The (leading) overall complexity is multiplied by 2, because all the squarings are replaced with products of different numbers, and all products of different components are replaced with two products of different components. That gives a complexity \(2{({n}_{x}+{n}_{y}+{n}_{z})}^{2}\), so the combination with the worst-case complexity of the multiplications above gives complexity
For the individual examples listed above, the costs are again multiplied by 2 for the same reason as for the worst-case complexity. To be more specific, the analysis for the first case is as follows. The dot product can be written, instead of as in Eq. (114), as
Thus the squares are replaced with products, and the products are replaced with two products, giving twice the complexity in Eq. (115). The analysis for all other examples is identical.
Data availability
Data is available from the corresponding authors upon reasonable request.
Code availability
All code for resource estimation can be found in the GitHub repository45. All DFT calculations used Quantum ESPRESSO, PySCF, and qcpanop. Other code used to generate data for this paper is available from the corresponding authors upon reasonable request.
References
Babbush, R. et al. Exponentially more precise quantum simulation of fermions in second quantization. N. J. Phys. 18, 33032 (2016).
Babbush, R. et al. Exponentially more precise quantum simulation of fermions in the configuration interaction representation. Quant. Sci. Technol. 3, 015006 (2018).
Babbush, R. et al. Encoding electronic spectra in quantum circuits with linear T complexity. Phys. Rev. X. 8, 041015 (2018).
von Burg, V. et al. Quantum computing enhanced computational catalysis. Phys. Rev. Res. 3, 033055 (2021).
Lee, J. et al. Even more efficient quantum computations of chemistry through tensor hypercontraction. PRX Quant. 2, 030305 (2021).
Goings, J. J. et al. Reliably assessing the electronic structure of cytochrome P450 on today’s classical computers and tomorrow’s quantum computers. Proc. Natl Acad. Sci. 119, e2203533119 (2022).
Rubin, N. C. et al. Fault-tolerant quantum simulation of materials using Bloch orbitals. PRX Quant. 4, 040303 (2023).
Ivanov, A. V. et al. Quantum computation for periodic solids in second quantization. Phys. Rev. Res. 5, 013200 (2023).
Oumarou, O., Scheurer, M., Parrish, R. M., Hohenstein, E. G. & Gogolin, C. Accelerating quantum computations of chemistry through regularized compressed double factorization. Quantum 8, 1371 (2024).
Babbush, R., Berry, D. W., McClean, J. R. & Neven, H. Quantum simulation of chemistry with sublinear scaling in basis size. npj Quant. Informat. 5, 92 (2019).
Su, Y., Berry, D. W., Wiebe, N., Rubin, N. & Babbush, R. Fault-tolerant quantum simulations of chemistry in first quantization. PRX Quant. 2, 040332 (2021).
Delgado, A. et al. Simulating key properties of lithium-ion batteries with a fault-tolerant quantum computer. Phys. Rev. A. 106, 032428 (2022).
Neugebauer, J. & Hickel, T. Density functional theory in materials science. WIREs Comput. Mol. Sci. 3, 438 (2013).
Hamann, D. R., Schlüter, M. & Chiang, C. Norm-conserving pseudopotentials. Phys. Rev. Lett. 43, 1494 (1979).
Vanderbilt, D. Soft self-consistent pseudopotentials in a generalized eigenvalue formalism. Phys. Rev. B. 41, 7892 (1990).
Blöchl, P. E. Projector augmented-wave method. Phys. Rev. B. 50, 17953 (1994).
Kresse, G. & Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B. 59, 1758 (1999).
Dolg, M. & Cao, X. Relativistic pseudopotentials: their development and scope of applications. Chemical Reviews 112, 403 (2012).
Goedecker, S., Teter, M. & Hutter, J. Separable dual-space Gaussian pseudopotentials. Phys. Rev. B. 54, 1703 (1996).
Hartwigsen, C., Goedecker, S. & Hutter, J. Relativistic separable dual-space Gaussian pseudopotentials from H to Rn. Phys. Rev. B. 58, 3641 (1998).
Shokrian Zini, M. et al. Quantum simulation of battery materials using ionic pseudopotentials. Quantum 7, 1049 (2023).
Grover, L. & Rudolph, T. Creating superpositions that correspond to efficiently integrable probability distributions. arXiv https://arxiv.org/abs/quant-ph/0208112 (2002).
Jones, G. et al. First principles calculations and experimental insight into methane steam reforming over transition metal catalysts. J. Catalysis 259, 147 (2008).
Grabow, L. & Mavrikakis, M. Mechanism of methanol synthesis on Cu through CO2 and CO hydrogenation. ACS Catalysis 1, 365 (2011).
Behrens, M. The active site of methanol synthesis over Cu/ZnO/Al2O3 industrial catalysts. Science 336, 893 (2012).
Grabow, L. C., Gokhale, A. A., Evans, S. T., Dumesic, J. A. & Mavrikakis, M. Mechanism of the water gas shift reaction on Pt: first principles, experiments, and microkinetic modeling. J. Phys. Chem. C. 112, 4608 (2008).
Lejaeghere, K. et al. Reproducibility in density functional theory calculations of solids. Science 351, aad3000 (2016).
Kassal, I., Jordan, S. P., Love, P. J., Mohseni, M. & Aspuru-Guzik, A. Polynomial-time quantum algorithm for the simulation of chemical dynamics. Proc. Natl Acad. Sci. USA 105, 18681 (2008).
Ye, H.-Z. & Berkelbach, T. C. Correlation-consistent Gaussian basis sets for solids made simple. J. Chem. Theory Comput. 18, 1595 (2022).
QCPANOP. ncrubin/qcpanop. https://github.com/ncrubin/qcpanop (2024).
Sanders, Y. R., Low, G. H., Scherer, A. & Berry, D. W. Black-box quantum state preparation without arithmetic. Phys. Rev. Lett. 122, 020502 (2019).
Sanders, Y. R. et al. Compilation of fault-tolerant Quantum heuristics for combinatorial optimization. PRX Quant. 1, 020312 (2020).
Bailey, D. H., Borwein, J. M. & Crandall, R. E. Advances in the theory of box integrals. Math. Comput. 79, 1839 (2010).
Sehested, J., Dahl, S., Jacobsen, J. & Rostrup-Nielsen, J. R. Methanation of CO over nickel: mechanism and kinetics at high H2/CO ratios. J. Phys. Chem. B 109, 2432 (2005).
Takenaka, S., Shimizu, T. & Otsuka, K. Complete removal of carbon monoxide in hydrogen-rich gas stream through methanation over supported metal catalysts. Int. J. Hydro. Energy 29, 1065 (2004).
Kresse, G., Gil, A. & Sautet, P. Significance of single-electron energies for the description of CO on Pt(111). Phys. Rev. B. 68, 073401 (2003).
Patra, A., Peng, H., Sun, J. & Perdew, J. P. Rethinking CO adsorption on transition-metal surfaces: Effect of density-driven self-interaction errors. Phys. Rev. B. 100, 035442 (2019).
Feibelman, P. J. et al. The CO/Pt(111) puzzle. J. Phys. Chem. B. 105, 4018 (2001).
Olsen, R., Philipsen, P. & Baerends, E. CO on Pt(111): a puzzle revisited. J. Chem. Phys. 119, 4522 (2003).
Perdew, J. P. Climbing the ladder of density functional approximations. MRS Bull. 38, 743 (2013).
Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865 (1996).
Giannozzi, P. et al. QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. J. Phys. Condens. Matter 21, 395502 (2009).
Ford, D. C., Xu, Y. & Mavrikakis, M. Atomic and molecular adsorption on Pt(111). Surface Sci. 587, 159 (2005).
Sun, Q. et al. Recent developments in the PySCF program package. J. Chem. Phys. 153, 024109 (2020).
ncrubin/pseudopotential_ftqc. https://github.com/ncrubin/pseudopotential_ftqc (2024).
Steininger, H., Lehwald, S. & Ibach, H. On the adsorption of CO on Pt(111). Surface Sci. 123, 264 (1982).
Szanyi, J., Kuhn, W. K. & Goodman, D. W. CO adsorption on Pd(111) and Pd(100): low and high pressure correlations. J. Vacuum Sci. Technol. A 11, 1969 (1993).
Wei, D., Skelton, D. & Kevan, S. Desorption and molecular interactions on surfaces: CO+Rh(110), CO+Rh(100) and CO+Rh(111). Surface Sci. 381, 49 (1997).
Acknowledgements
The authors thank Yuan Su, Nathan Wiebe, Sam Pallister, and Burak Şahinoğlu for helpful discussions and analysis. DWB worked on this project under a sponsored research agreement with Google Quantum AI. DWB is also supported by Australian Research Council Discovery Projects DP190102633, DP210101367 and DP220101602.
Author information
Authors and Affiliations
Contributions
D.W.B. developed the algorithms and costings in Sections “Method for block encoding of U”, “Block encoding the GTH pseudopotential”, “Details of the costing”, “Other costs in the block encoding”, and “Computing λ” as solutions to issues raised by R.B. N.C.R. performed the analysis of errors in Section “Errors from omitting angular momentum projectors” and the cost estimates for real materials in Section “Cost estimates for quantum simulating interesting materials”. A.O.E. and G.A. provided the CO adsorption systems, performed the DFT computations to determine the energy convergences and wrote the relevant portion of Section “Cost estimates for quantum simulating interesting materials”. A.E.D. performed DFT computations comparing full and approximate HGH-type pseudopotentials in Table I and wrote the associated portions of Section “Errors from omitting angular momentum projectors”. J.L. contributed to the theory of the appropriate pseudopotentials. R.B. and C.G. helped to manage and coordinate the collaboration. All authors participated in the discussion of results and writing and revision of the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Berry, D.W., Rubin, N.C., Elnabawy, A.O. et al. Quantum simulation of realistic materials in first quantization using non-local pseudopotentials. npj Quantum Inf 10, 130 (2024). https://doi.org/10.1038/s41534-024-00896-9