Implementation of the Projective Quantum Eigensolver on a Quantum Computer

We study the performance of our previously proposed projective quantum eigensolver (PQE) on IBM’s quantum hardware in conjunction with error mitigation techniques. For a single qubit model of H2, we find that we are able to obtain energies within 4 millihartree (2.5 kcal/mol) of the exact energy along the entire potential energy curve, with the accuracy limited by both the stochastic error and the inconsistent performance of the IBM devices. We find that an optimization algorithm using direct inversion of the iterative subspace can converge swiftly, even to excited states, but stochastic noise can prompt large parameter updates. For the 4-site transverse-field Ising model at its critical point, PQE with an appropriate application of qubit tapering can recover 99% of the correlation energy, even after discarding several parameters. The large number of CNOT gates needed for the additional parameters introduces a concomitant error that, on the IBM devices, results in a loss of accuracy despite the increased expressivity of the trial state. Error extrapolation techniques and tapering or postselection are recommended to mitigate errors in PQE hardware experiments.


Introduction
The rapid evolution of quantum hardware has opened a venue for addressing challenging problems in quantum chemistry with quantum computation [1][2][3][4][5][6][7][8][9][10].However, quantum computing is currently not a practical tool for electronic structure theory.The heart of the problem is that existing implementations of digital quantum computers are noisy, intermediate-scale quantum (NISQ) devices [9,11,12] with a limited number of qubits, short decoherence times, and subject to several sources of error.Even highly accurate state-of-the-art quantum algorithms require sophisticated error mitigation techniques to tame the noise inherent to NISQ devices [13][14][15][16].The ultimate test for quantum algorithms is, therefore, based on their utility in producing highly accurate results on hardware.Within the NISQ context, one of the most promising classes of quantum algorithms for near-term applications is based on the variational quantum eigensolver (VQE) [17][18][19], a hybrid quantum-classical approach that variationally optimizes a trial state parameterized by a quantum circuit.In recent years, there has been a growing interest in finding better alternatives to VQE for NISQ devices.However, most new quantum algorithms are usually assessed via emulators, and hardware benchmarks are scarce.
In this work, we implement and examine the performance of the projective quantum eigensolver (PQE) [20] on quantum hardware.PQE is a NISQ-focused hybrid quantumclassical algorithm that seeks to optimize a parameterized trial state by satisfying projections of the Schrödinger equation onto a many-body basis.PQE has been studied exclusively in the context of method development.These include selected variants of PQE [20], which is part of the family of adaptive methods [20,[43][44][45][46]. Two follow-up studies have focused on reducing the number of CNOT gates in unitary coupled cluster trial states combined with PQE [47,48].More recently, post-iterative corrections to PQE have been explored.[49].Additionally, a formula used in PQE for asymmetric expectation values was employed by Filip and co-workers in the context of projective quantum Monte Carlo [50] and by Angelakis et al. for an MP2 variant [24].Maitra and coworkers proposed a "two-phase" PQE [51].While the first phase is standard PQE, the second phase computes the values of the "small parameters" by machine learning from the first phase and thereby requires fewer quantum measurements.Of greater relevance to this article, a recent publication from Maitra and co-workers (which appeared while this manuscript was being finalized) applied pre-existing error mitigation techniques to PQE on noisy emulators [52].
The present article ties these two threads of PQE and hardware together, reporting the first realization of PQE on NISQ devices, with the goal of assessing the effectiveness of techniques to mitigate noise in the measurements and whether the algorithmic differences between PQE and VQE matter on hardware.The paper is organized as follows: Section 2 provides a self-contained introduction to the theory of the PQE and a brief overview of the error mitigation techniques that we employ.The results of our study are found in Section 3. We begin with a careful analysis of H 2 in a minimal basis set, described in Section 3.1.After summarizing the encoding of the problem onto a quantum computer, we evaluate the noise in PQE at a fixed geometry due to finite measurement, compute an entire dissociation curve including a model of the quantum noise, and then compute a dissociation curve computed using IBM quantum devices.We then proceed to study the transverse-field Ising model (TFIM) in Section 3.2.We begin by validating our procedure, including a custom tapering procedure, on a noiseless simulator.We proceed to study the four-site problem, first with noise models and then with results from PQE simulations on quantum hardware.Afterwards, we compute TFIM with up to seven sites using noise models on quantum emulators, to compare PQE and VQE.We conclude with remarks on the prospects of PQE in Section 4.

Projective Quantum Eigensolver
Here, we summarize the PQE algorithm as introduced previously [20].The ground-state energy of a quantum mechanical system is the lowest eigenvalue of its Hamiltonian, H. Let |Φ 0 ⟩ be a reference state in the domain of H, usually chosen as simultaneously an unentangled qubit register and close to an exact eigenstate of H, and let U be a unitary operator on the domain of H. Then U |Φ 0 ⟩ is an eigenvector of H if and only if |Φ 0 ⟩ is an eigenvector of U −1 HU = H.This occurs when the projection of H |Φ 0 ⟩ onto any vector |Φ ⊥ 0 ⟩ orthogonal to |Φ 0 ⟩ is zero, i.e., ⟨Φ ⊥ 0 | H|Φ 0 ⟩ = 0.If we construct a basis for the orthogonal complement of |Φ 0 ⟩ it suffices to satisfy a residual equation for each basis element.
When the residual equations are satisfied, the energy eigenvalue E can then be read as In practice, we can neither enforce all residual equations nor vary across all possible unitaries U .Instead, we parameterize our unitary U as a product of N operators for real parameters θ µ and anti-Hermitian operators κ µ .We then define the orthogonal complement basis as the set of states As long as the |Φ µ ⟩ are orthogonal to each other and to |Φ 0 ⟩, we then enforce the residual equation for each |Φ µ ⟩.For fermionic systems, these κ µ are generators of rotations between occupied (hole) and unoccupied (particle) orbital clusters in second quantization [154].If we denote occupied (unoccupied) orbitals with the indices i, j, . . .(a, b, . ..), then the κ µ operators are of the form a , and are in one-to-one correspondence with excited Slater determinants.Permuting two orbitals within a Slater determinant produces the same determinant, but with a phase of −1 [154].We resolve the phase ambiguity by putting orbitals in a lexicographical order, which means that equation 4 requires a negative sign for some µ.This carries through all equations involving |Φ µ ⟩.We neglect the sign for clarity of exposition.For spin systems, these κ µ are imaginary multiples of Pauli strings and are associated with a specific spin state on each site.
To ensure that the residuals are zero, we first need to measure them.In our previous paper [20], we proposed a simple way to measure residual elements.Observe that for our choice of κ µ we have that We then showed that the real component of the residual element may be evaluated as using the expectation value It is thus possible to measure a single residual element with two quantum measurements beyond what is needed for the ground state energy E = E µ (0), by analogy to the parameter-shift rule [155][156][157].Unlike the parameter shift rule, this formula requires three energies.We recently derived a residual formula more similar to the parametershift rule, which expresses the real component of the residual using only two energies: We refer to Eq. 7 as the reference-shift rule.While these formulae are formally equivalent, it does not follow that they are equally performant on NISQ devices.Our initial numerical tests were not able to determine a statistically significant difference between Eqs. 5 and 7 that was robust to changes in the number of shots and the noise model.Therefore, for aesthetic reasons, our computations will use equation (7).
In PQE, the parameters are then varied so that the measured residuals are zero, within some numerical tolerance.In general, the zeros will not be unique: not only may multiple ⃗ θ produce the same U ( ⃗ θ) |Φ 0 ⟩, but different U ( ⃗ θ) |Φ 0 ⟩ can approximate different eigenstates of H.Both of these hold for VQE as well.For the optimization algorithm, we employ a quasi-Newton step accelerated with the direct inversion of the iterative subspace (DIIS) method [158], as proposed in our previous publication [20].The use of machine learning to predict the values of small parameters from the large parameters has been studied elsewhere [51].

Error Mitigation Techniques
We now sketch the error mitigation techniques we employ.For details, see Reference 9 and references therein.The technical details of how these techniques were applied to individual systems are found in each system's respective subsection within our results.

Readout Error Mitigation
A major source of error in NISQ devices arises in the measurement process itself, i.e., readout error.For example, a qubit in the |1⟩ state is mistakenly identified as being in the |0⟩ state and vice versa.We correct for it by using X gates to prepare the quantum register in a given bitstring and then measuring the occupation of all qubits.By repeating this operation for all bitstrings, we construct a linear transformation from actual bitstring probabilities to experimentally observed bitstring probabilities.When we obtain any experimental measurement probabilities, we use this matrix to estimate the bitstring probabilities in the absence of readout error via Qiskit's least squares procedure, which ensures that measurement probabilities stay between 0 and 1.

Qubit Tapering
Qubit tapering is typically considered a technique to reduce the number of qubits, not the error.However, tapering may reduce the resources required, preventing the error from those resources.For example, in Section 3.2, a generator that entangles four qubits is replaced with one that entangles three qubits, and three parameters no longer require CNOT gates.It is hence legitimate to consider it as an error-mitigation technique.
We now discuss how to perform qubit tapering [159,160].First, identify a Pauli string that commutes with the molecular Hamiltonian.When these exist, it is normally a consequence of some physically meaningful symmetry, e.g., total spin or a point group symmetry.It is possible to perform a similarity transformation so that the string becomes a single Pauli gate on qubit n.Upon performing this transformation, all operators in the Hamiltonian will take an I or Z in that qubit, and the eigenstate can be chosen to consist only of bitstrings with a 0 for that qubit, or only of bitstrings with a 1 for that qubit.The experimenter can then decide which choice is correct for the eigenstate of interest to them.The qubit is then eliminated from the computation, and the I and Z are replaced with their corresponding +1 or −1, depending on what that gate would return on qubit n.Although specific forms of this similarity transformation have been prescribed previously, those prescriptions are not essential to the method.

Postselection
When symmetry is present, some bitstrings should have no counts.Yet errors can introduce counts, and it is precisely this that postselection eliminates [154,161].In particular, suppose a state is an eigenfunction of Pauli string P .If it is possible to simultaneously measure P and other Pauli strings (such as those in the Hamiltonian), then when we measure the Pauli strings simultaneously, we can discard all measurements that yield incorrect expectation values for P .
We mention one technical limitation.Although quantum mechanics permits us to simultaneously measure any operators that mutually commute, current quantum hardware only permits us to simultaneously measure any operators that commute for every qubit in the Pauli string, i.e., that mutually qubitwise commute.To apply postselection, we must therefore perform a unitary transformation so all operators of interest qubitwise commute, as has been studied previously [162][163][164][165] and must be resolved for any hardware application.

Error Extrapolation
Error extrapolation [166][167][168] is flexible in what error it corrects for, only asking that it can be scaled.The error strength is represented by ϵ, and we seek the exact energy, which is obtained at ϵ = 0. We do this by assuming the dependence of the energy on ϵ follows some user-provided functional form and fitting its parameters according to measurements at increased values of ϵ.The "error-free" expectation value can then be read as the value at ϵ = 0.Much like the complete basis set extrapolations of quantum chemistry [169], this method is inherently heuristic but can be quite reliable.However, error extrapolation ceases to be reliable for timedependent noise [170].
A standard source of error targeted with error extrapolation is that coming from CNOT gates.This is justified when CNOT gates are the only qubit-entangling gates, and gates that act only on a single qubit have much smaller errors.In this case, we scale the noise by a factor of 2n + 1 by adding n CNOT pairs, which are just the identity in an errorless circuit, to each CNOT gate.
To perform error extrapolation, it remains to choose the functional form of the extrapolation function.Although the original proposals of error extrapolation used a linear function [166,167], exponential extrapolation was also found to achieve good accuracy [168].We shall study both, but advise the reader that we modified the exponential extrapolation procedure for numerical stability.Details are provided in Appendix A. We also point out that Maitra and coworkers [52] studied error extrapolation, but exclusively on simulators.

Results
Unless otherwise indicated, all simulations on quantum hardware used 8,192 shots for each measurement.

Simulation Details
We simulated H 2 with the minimal STO-6G atomic basis, which includes one basis function per atom.We first parameterize our state.From the atomic basis two spatial orbitals can be formed, a bonding orbital ϕ g of σ + g symmetry and an antibonding orbital ϕ u of σ − u symmetry.The ground state possesses 1 Σ + g symmetry, so by standard symmetry arguments, we can write it as: We encode the two-dimensional space of all possible symmetry-allowed states onto a single qubit by mapping |ϕ g αϕ g β⟩ to |0⟩ and |ϕ u αϕ u β⟩ to |1⟩, so The same ansatz may be derived by beginning with fermionic second quantized operators [171] and applying Z2 symmetry reduction to the Jordan-Wigner encoding [159,160].We immediately see that U (θ) = R y (θ).This may be efficiently implemented with no multi-qubit gates.With the encoding complete, we can then construct the Hamiltonian in the basis of our two states, using integrals obtained from the Psi4 package [172].
We compute a dissociation curve from 0.4 to 6.0 Å, and for each bond length, we perform ten hardware experiments.A hardware experiment consists of both the optimization of the amplitude to within a residual of 0.03 and the measurement of the energy at the optimized amplitude.(We find that tighter convergence metrics significantly increases the number of iterations required with no appreciable increase in accuracy.)As the bond stretched, in some experiments we observed qualitatively incorrect energies.When these occurred, the energy points were excluded and recomputed.We discuss the origin of this effect below.We used a single readout error calibration matrix for each set of ten energy computations.All computations were performed within an hour of computing the readout error calibration matrix.Our initial guess was always θ = 0.

Results
We show our H 2 dissociation curve in Figure 1, computed on IBM's Lima device.We observe that for all points, the averaged measured energy (shown as a red horizontal line) agrees with the observed energy to within four mE h , a good agreement considering the convergence threshold used.
Both the standard deviation of the data points and the error in the curve are smallest for points closest to dissociation and largest when the atoms are closest together, especially beyond the equilibrium point.Our group has observed a similar phenomenon previously [110].We attribute this to the simple fact that the error in a term in the energy is directly proportional to a prefactor derived from matrix elements.The energy may be expressed in terms of Hamiltonian matrix elements (H ij ) and the nuclear repulsion energy (E nuc ) as While the prefactor for the X string varies from 0.17 at compressed geometries to 0.34 at stretched geometries, the prefactor for the ⟨Z⟩ string varies from -0.99 to 0.00 along the same curve, being largest when the atoms are near each other.We therefore attribute both the larger variance and larger error at compressed geometries to the geometry dependence of the prefactor of the Z term.(The magnitude of this effect, however, is larger than would we expected solely by comparing coefficients of ⟨Z⟩.)This stochastic error is the dominant effect for stretched geometries.However, statistical error alone cannot account for the persistent fact that for compressed geometries, the energy frequently falls below full configuration interaction.As we will discuss below, for this system, the PQE residual is equivalent to the VQE gradient, so we can regard this as a VQE optimization.We observe that noise effectively distorts the angle used for measuring X relative to the angle used to measure Z, and the VQE uses this to drive the energy below full configuration interaction.
We caution the reader that due to the larger error in the three most compressed geometry points, we needed to re-compute these data points several times.We observed variations in energy of over four mE h even when no parameters were changed.In the final data we present, we magnified the number of shots used for readout calibration by a factor of 10, which should improve the accuracy of measuring the readout calibration matrix by a factor of √ 10 [173].After making this change, our energies improved by several mE h .
Because PQE is an iterative algorithm, a complete cost estimate must consider the number of iterations required to converge.We show this for two representative geometries in Figure 2. When the bond distance is 0.75 Å, close to equilibrium and where the initial guess is quite accurate, energy convergence is rapid, while amplitude convergence takes until after three iterations.At 2.95 Å, where the initial guess has only 54% overlap with the exact ground state, convergence is much more erratic but relatively swift, being achieved within eight iterations.The reader may also note a large spread in the energies after two rounds of parameter optimization, with some energies worse than the initial guess.This is an artifact of our optimization algorithm.
A pure Jacobi step will only cause a large amplitude update when the residual or approximate Hessian is large.Neither is the case here.When we optimize with DIIS, the optimization is sensitive to stochastic noise in the measurements and can cause very large extrapolations.Despite the energy having a period of 2π in the amplitude, we have observed cases where the DIIS optimizer sends parameter values far away from the initial guess of 0, into the neighborhood of ±1000.We have observed similar behavior on a noiseless simulator, but still found the performance of DIIS to be superior to the performance of a pure Jacobi step.We have found that taking the DIIS amplitude modulo the period eliminates this behavior and leads to swifter convergence on noisy simulators, but have not tested this on quantum hardware.Other modifications to the DIIS update, such as a trust radius, can also be imagined.We noted in Section 3.1.1that for stretched geometries, the PQE algorithm returned qualitatively incorrect energies, in error by over 0.5 E h .We were able to reproduce this behavior on a quantum simulator with no noise model and shall henceforth discuss results on the simulator, where we could run PQE repeatedly.We found that the erroneous energies were within 5 millihartree of the energy of the excited-state at that geometry.To understand this, recall that we initialize the wavefunction in the state |0⟩.As the bond stretches, the overlap of |0⟩ with the ground state diminishes from over 99% to 50%.As our optimizer tries to zero projections, which can be done by targeting any eigenstate, it is unsurprising in this case stochastic errors direct the optimization toward the excited state.
We view this result positively: in PQE, with an appropriate choice of amplitudes, it is possible to target an excited state.(We refer the interested reader to other quantum algorithms [40,95,148,[174][175][176][177][178] to compute excited states.)Indeed, by changing our initial guess, we have been able to target the excited state dissociation curve on the quantum simulator.The utility of this feature of PQE depends on how easy it is to obtain guess amplitudes and how accurately higher solutions of an approximate unitary ansatz can represent an excited state.This is a promising avenue of future work.
Lastly, we compared the performance of PQE to that of VQE after each optimization iteration, at the minimum energy geometry.The results are shown in Figure 3.The numerical performance is nearly identical.In fact, it may be shown that for the special case of one-parameter systems, the PQE residual is the VQE gradient.(This is most easily shown by comparing the two-measurement formulas for the residual to the VQE shift rule.)At least in this case, all difference between the two algorithms is due to numerical noise and does not indicate a meaningful performance difference in the algorithms.

Simulation Details
The Hamiltonian for the transverse-field Ising model (TFIM) on N sites is given by which describes a 1-D chain of spin sites subject to two kinds of interaction.First, adjacent spin sites have a co-alignment-penalizing interaction with magnitude controlled by J. Second, an external electric or magnetic field interacts with all sites, with magnitude controlled by h, rewarding sites that align with the field.All Pauli strings in H commute with the Pauli string N i=1 Z i , which may be physically interpreted as reversing the direction of the field and mathematically interpreted as requiring that for any eigenstate, the bitstrings all have the same number of |1⟩ qubits, modulo two, i.e., parity symmetry.In the context of the present work, this symmetry enables postselection and the tapering of one qubit.
The present work chooses h = J = 1, and, aside from the last subsection, we choose N = 4.With this choice of parameters, we have modeled a strongly correlated system.The choice of J/h = 1 puts this system at the critical point where it transitions between ferromagnetic and antiferromagnetic phases in the thermodynamic limit [179,180].In the uncorrelated basis of {|↑⟩ , |↓⟩} ⊗N , the overlap of any basis state with the correlated wavefunction is at most 81.2%, and the correlation energy (defined as the difference between the TFIM solution with J = 0 and the exact energy) is 15.9% of the total energy.
We have chosen our wavefunction ansatz as the so-called "quantum combinatorial ansatz" of Reference 181.The wave operator is a product of exponentiated imaginary multiples of Pauli strings.Each Pauli string is a product of a single Y and X gates, which always acts on the largest index of the non-trivial indices for that string.This ansatz is known to represent any state with real wavefunction coefficients (for complex coefficients, strings without a Y gate must also be allowed).We loop over the largest qubit index, from smallest to largest, but operator order is otherwise arbitrary.In some particular cases, we have chosen to neglect operators which are unlikely to have large amplitudes.We shall explicitly say when we do this.An advantage of this choice of ansatz is that it is an exact ansatz for any state, tapered or not.By symmetry, we may set to zero the coefficients of all operators in the untapered ansatz with an odd number of non-trivial operators.We choose our starting state to be |0⟩ ⊗N , and this choice of operators preserves the parity symmetry.
Let us discuss how symmetry applies to the use of postselection.Here, there are two sets of mutually commuting operators to measure.The first set consists of Z operators on various qubits, all of which already commute qubitwise with N i=1 Z i .The second set consists of X i X i+1 , which does not commute qubitwise with N i=1 Z i .To transform these operators into ones that can be simultaneously measured, we prepend a CNOT staircase, a sequence of CNOT gates where qubit i controls qubit i + 1 for all qubits.This transforms X i X i+1 into X i+1 , allowing for the simultaneous measurment of all such operators [36].Our use of tapering is more involved and is described in the next section.

Exact Results
Before analyzing the robustness of PQE to noise, we first study our algorithm on a noiseless simulator.With the procedure deployed above, we are able to reproduce the exact energy, as determined by Hamiltonian diagonalization, to within machine precision.This confirms the basic correctness of our ansatz and of PQE, when noise and noise mitigation are neglected.Looking at the corresponding parameters, we observe that they fall into two classes.One consists of all coefficients of X i Y i+1 Pauli strings, which all have magnitude of at least 0.45.The second class of parameters have a smaller value, with magnitude less than 0.12.Such a partitioning of PQE parameters was central to Reference 51 and could also have been anticipated from the perturbative arguments of Reference 181.If these three large parameters are optimized while all other parameters are set to zero, we are able to retain 96% of the correlation energy.The remainder of this section is concerned with how this division can be exploited in conjunction with qubit tapering.
As discussed in Section 2.2.2, standard qubit tapering requires a choice of qubit to be tapered out.When we attempt all four qubit taperings from the standard procedure, we are able to reproduce the exact energy, but find that only tapering qubits one or two provides an ansatz that retains at least 96% of the correlation energy when only three parameters are optimized.The other choices of qubits to taper lead to only 86% of the correlation energy when only the three largest parameters are retained.The exact solutions for the other two taperings blur the separation into large and small parameters.Tapering qubit three yields one parameter that increases to magnitude 0.35, and tapering qubit four yields two parameters that increase to magnitude 0.23 and 0.27, respectively.But even the two more accurate taperings are less than satisfactory.For one, the resulting Hamiltonians include a term that acts on every qubit, which costs the locality of the Hamiltonian.Additionally, the parameters that are important still retain at least one generator that acts on multiple qubits and is, therefore, more likely to introduce additional noise in the experiments.It would be best if the large parameters were associated with action on only one qubit.
We identified a non-standard tapering procedure that achieves this goal.Any parity-allowed basis state can be reached by applying a sequence of X i X i+1 operations to the starting state, |0⟩ ⊗4 .Starting from the first qubit, apply X i X i+1 if needed to achieve the target value on qubit i, and use parity to enforce that the final qubit is correct.We map each basis state to a state where index i is 1 if X i X i+1 was needed, and 0 otherwise, producing an overall state of 3 qubits.For example, |0101⟩ = (X 0 X 1 )(X 1 X 2 ) |000⟩ maps to |011⟩ = X 0 X 1 |000⟩.We can then construct an operator in this reduced space that produces the same expectation values in the old space, after accounting for the transformation.Notice that the operator X i X i+1 will always transform to X i in this alternative tapering scheme.
We have confirmed that this tapering scheme reproduces the exact energy.As expected, there are three large parameters of value at least 0.45, corresponding to the X i , while all other parameters are of magnitude below 0.12.Restricting the ansatz to the "large" parameters leads to recovering 97% of the correlation energy, as desired.Further, all terms in the Hamiltonian involve either one qubit or two adjacent qubits, meaning locality is restored.
Before proceeding, let us assess the quality of the wavefunctions produced by eliminating small parameters.We can do this by measuring the spin-spin correlation functions, ⟨σ i σ j ⟩, which measure the alignment of spin in the direction σ ∈ {X, Y, Z} on qubits i and j.In Figure 4, we see that many of the correlation functions are quite close, but those correlation functions between non-adjacent qubits, whose generators have been set to zero, can be far off.The same trend continues for the YY and ZZ correlation functions, albeit less strongly.We see similar behavior for our custom tapering scheme, which tends more accurate for XX and ZZ correlation, but slightly less accurate for YY correlation.We caution that we need to transform the σ i σ j operators into the new basis, which we do not discuss here.Nonetheless, we see results of similar quality.This confirms that our representation of the wavefunction has not been impaired by tapering.

Simulator Results
We begin our exploration of the transverse-field Ising model by studying combinations of the error mitigation techniques described in Section 2.2 using a simulator of the noisy Lima device of IBM.One instance of our simulations are displayed in Table 1.We caution the reader that due to both the stochastic nature   of quantum sampling and, more importantly, drift in the error model describing the device, repeating this experiment leads to different percentages, with greater difference from 100% leading to greater variability.For example, the correlation energy recovered can vary between 67% and 78% when postselection is used without extrapolation.Drift in the noise model leads us to recover between -68% and -28% of the correlation energy when only readout error is mitigated.
What is qualitatively clear is that for the 4-site TFIM, the extrapolation techniques are ordered, as are the symmetry exploiting techniques.Postselection is more accurate than qubit tapering, which is more accurate than merely removing symmetry-breaking generators.Exponential extrapolation is more accurate than linear extrapolation, which is more accurate than no extrapolation.Although it is useful to learn that eliminating erroneous measurements via postselection is a better way to eliminate error than eliminating extra qubits via tapering, all other comparisons of methods may have been easily anticipated.

Hardware Data
We now turn to experiments on quantum hardware.We found that the noise on IBM devices fluctuated in intensity to the point that even with error mitigation, we would observe qualitatively different results from experiments submitted just hours apart.Significant hardware drift over time has also been reported previously [82].In one instance, results that ran and completed within an hour of each other differed by −0.2 E h due to IBM recalibrating their hardware between the experiments.This severely limited the rigor with which we could perform experiments.We also observed that when we attempted error extrapolation, for some circuits, the error upon adding additional CNOT gates led the count distribution to be further away from pure noise, violating the assumptions of the exponential error extrapolation model discussed in Appendix A. Accordingly, exponential error extrapolation could not be performed reliably.Degraded performance of error extrapolation on hardware compared to a noise model on an emulator has been reported previously [182], and appropriate forms of error mitigation have been known to be hardware-dependent [83].For other uses of extrapolation schemes on hardware, see references 60, 183, 184.
One particularly successful experiment used our custom tapering for the 4-site TFIM and included only the three non-entangling generators, i.e., using an approximate ansatz that can be evaluated more accurately on the quantum hardware.We repeated this ten times and measured a final energy of −4.77 ± 0.03 E h .The exact energy is −4.76 E h .Within the precision possible on NISQ devices, the accuracy cannot reasonably be improved.An extension of this tapering scheme to molecular systems is unclear.Regardless, these findings make clear that for qubit tapering, some similarity transformations vastly outperform others.
We report additional experiments that are encouraging for PQE: we combined our tapering scheme, with all generators except the three-qubit entangler, with exponential extrapolation.On the Lima device, this reproducibly gave 120% of the correlation energy.By combining linear extrapolation and qubit tapering on the IBM Perth device, as well as removing the three-site generator, we were able to measure an energy of −4.70 E h i.e., we recovered 92% of the correlation energy.Given this data point and the similarity of the PQE and VQE equations, we suspect that the difficulties we observed measuring data from PQE are not specific to the PQE algorithm but reflect the challenges of obtaining reliable measurements from NISQ devices with all but the simplest quantum circuits.We add that although we attempted simulations with postselection, the results were inferior to those of exponential extrapolation and tapering.

PQE and VQE Comparison
While the hardware results of the preceding section indicate strong noise effects, they do not conclusively establish whether the poor performance is unique to PQE.To investigate that question, we return to the simulators and perform PQE and VQE computations on the transverse-field Ising model with increasing sites.We employ IBM's noise model of their Perth device, since the Lima device used earlier in the paper has only five qubits, but our experiments require seven.Numerical experiments on the Lima device, for those systems where it was possible, show the same PQE and VQE comparison as do the Perth systems, but with a shift in the amount of correlation energy recovered that increases with the number of qubits.We employ linear and exponential error mitigation, with both tapering and postselection, as Section 3.2.3showed these strategies to be advantageous.To minimize stochastic finite measurement error, we magnified our standard 8,192 shots by a factor of 40.We performed VQE using analytic gradients augmented by the shift rule, accelerated by DIIS.For additional discussion on combining VQE and DIIS, see Appendix B.
The results are shown in Figure 5. PQE and VQE results are extremely close in all cases.They disagree the most, by percentage, when exponential extrapolation is combined with tapering for the 7-site problem.With this combination of error mitigation techniques, although individual computations converged, different simulations would lead to different energies, and we had to repeat the experiment many times to obtain converged statistics.We suspect that performing more computations would yield closer agreement.Nonetheless, even in this case, it is clear that the main reason for the poor quality of our hardware results is due to hardware noise and not PQE or the optimization algorithm employed.. Rather, the noise is too intense for either PQE OR VQE using analytic residuals or gradients, respectively.
From Figure 5, we can also observe that error steadily increases as the number of qubits increases, from 4 to 7. For the 7-qubit simulations, combining exponential extrapolation and postselection does not yield converging energies.This matter is discussed in Appendix A. Combining exponential extrapolation with tapering is the most robust approach for the larger systems, but even in this best-case scenario, the error is already prohibitive at 7 qubits.Although the conclusion from Section 3.2.3 that exponential extrapolation is better than linear extrapolation still holds, the preference for postselection reverses once exponential extrapolation is consistently employed.

Conclusions
In this article, we have studied the performance of the Projective Quantum Eigensolver (PQE) on quantum hardware, to assess whether it is a viable and competitive algorithm to solve for molecular electronic structure on NISQ devices.While we encountered difficulties in applying PQE, these can be attributed to the inherent challenges of using NISQ devices, even with error mitigation.We applied PQE to H 2 in a minimal basis set with qubit tapering and mapped the state to a one qubit problem.In this case, the PQE residual condition is formally identical to zeroing gradients of VQE computed via the parameter-shift rule.For this problem, PQE converges swiftly near the equilibrium geometry.However, when the initial guess is not as accurate, PQE may either converge to an excited state or become more sensitive to errors, leading to a large spread of energies.For the four-site transverse-field Ising model (TFIM), PQE performs excellently on a noisy simulator, recovering over 99% of the correlation energy using a combination of exponential error extrapolation and postselection.On physical hardware, errors complicate the picture.While we have managed to recover over 90% of the correlation energy for the TFIM, device errors prevent us from doing this consistently.We found more success by using a custom qubit tapering that allows for 99% accuracy to be achieved with generators that require no CNOT gates.More reliable hardware is required before PQE and VQE can be definitively compared.
We now identify directions for future development of the PQE, until devices become more robust to error.We have identified new formulae for the residuals that need to be measured in the theory.Although not discussed in great detail here, the similarities to the shift rule allow for more fruitful adaptation of ideas from selected VQE into selected PQE.Second, we identified potential for improvement in the optimization of PQE.During the course of our studies, providing a suitable initial guess and exploiting periodicity in the parameters were both found to improve the convergence of DIIS.These areas merit further investigation, as do alternative convergence algorithms.Work in this direction has already begun [51].Lastly, and perhaps most intriguingly, our results on H 2 suggest that PQE can be used to target excited states, by identifying other roots of the PQE residual conditions and leveraging the fact that PQE does not use direct minimization.We plan to further explore these excited state PQE solutions in our subsequent research.
fourth assumption predicts when the true expectation value is near-zero.Hence, in cases where we have small expectation values, exponential error extrapolation is unreliable due to sampling noise, but the fourth assumption predicts that it should have a small impact anyways.PQE likely has more of these small expectation values than VQE without analytic gradients, because PQE must measure residuals involving multiple wavefunctions that are not localized in Hilbert space.
Accordingly, for each Pauli string, we measured the expectation values for both noise strengths as if we would perform exponential extrapolation and then checked that the expectation values had the same sign, decreased as error increased, that the high-error expectation was no smaller than 0.0001, and that the ratio of the errors was no greater than 2500.If any of those conditions were false, we did not perform exponential extrapolation and assumed that the low-noise value was sufficiently accurate: failure of these assumptions is likely when the true expectation value is already small.Otherwise, we performed exponential error extrapolation.Naively performing exponential error extrapolation in all cases led to numerical instabilities.
The two numerical thresholds mentioned in the preceding paragraph were chosen arbitrarily based on numerical experience with the 4-site transverse-field Ising model.We found these thresholds not to be robust for different systems.For the 7-site transverse-field Ising model, we frequently observed some cases where the error ratio was on the order of 1000, due to a large measurement on the first extrapolation point.This led to the already large measurement being amplified by a factor of √ 1000.This is fundamentally the reason we observe the non-convergence reported in Figure 5.While tighter thresholds would prevent error magnification, they fail to correct the error in the first data point.

Appendix B. DIIS Acceleration of VQE
Reference 20 reported that VQE required 2 to 3 times as many gradient evaluations as PQE to obtain a given energy accuracy, at least for the system under consideration.We were not able to reproduce this in our experiments with PQE and VQE on transversefield Ising models.While we found that PQE did converge faster than VQE in the absence of noise, this happened by one to two iterations.
To understand this difference, note that Reference 20 compared PQE using DIIS to VQE using the BFGS algorithm [185][186][187][188].Our experiments used DIIS for both systems.Upon repeating the experiments of Reference 20 with DIIS for both algorithms, we found that PQE required an iteration or two fewer than VQE, as in our results.We thus conclude that the reports of swifter convergence of PQE than VQE from Reference 20 were incomplete.The convergence speed depends on the optimization algorithm.The two algorithms perform comparably with DIIS, with PQE having a slight edge.At least for minimal amounts of noise, VQE optimization with DIIS requires fewer gradients than with BFGS.We make no statement regarding other optimization algorithms or performance under more realistic noise models.Shortcomings of DIIS optimization were noted in Section 3.1.2,and the problem of PQE optimization was studied in Reference 51.

Figure 1 .
Figure 1.Hardware experiments on the singlet ground state of the H 2 molecule in a minimal basis set.Top: Energies measured in experiments on a quantum computer (red) compared to exact full configuration interaction (blue curve) energies.Dots represent individual measurements, and horizontal lines indicate averages.Bottom: Errors from full configuration interaction.All hardware experiments were performed on the IBM Lima device with 8,192 shots, readout error mitigation, and a 1-qubit encoding.

Figure 2 .
Figure 2. Hardware experiments on the singlet ground state of the H 2 molecule in a minimal basis set.Convergence of the energy error with respect to full configuration interaction after n iterations of the PQE algorithm for ten runs.All hardware experiments were performed on the IBM Lima device with 8,192 shots, readout error mitigation, and a 1-qubit encoding.

Figure 3 .
Figure 3. Simulation of hardware experiments on the singlet ground state of the H 2 molecule in a minimal basis set.Comparison of the energy convergence averaged over ten runs of the PQE and VQE algorithms.All simulations utilized the error model for the IBM Lima device, employing 8,192 shots and a 1-qubit encoding.

Figure 4 .
Figure 4. Spin-spin correlation functions of the 4-site transverse-field Ising model.The left, center, and right columns measure correlation in the X, Y, and Z directions, respectively.The top row uses full configuration interaction, the middle row uses the three largest parameters in the standard ansatz, and the bottom row uses the three largest parameters in our tapered ansatz.

Figure 5 .
Figure 5. Simulation of hardware experiments on the transverse-field Ising model with 4 to 7 sites.Correlation energy measured employing all combinations of postselection or tapering, with linear or exponential error extrapolation.All simulations utilized the error model for the IBM Perth device, employing 327,680 shots and a 1-qubit encoding.

Table 1 .
The percentage correlation energy recovered for the transverse-field Ising model on a simulator modeling noise from IBMQ's Lima device.Readout error mitigation was used for all computations.Results are averaged over 100 runs.