Encoding Chemistry: Cleveland Clinic Uses Quantum Circuits to Aid in Predicting Proton Affinities

Insider Brief:
- Researchers at the Cleveland Clinic developed a hybrid machine learning model that combines classical molecular descriptors with quantum circuits to predict proton affinities efficiently and accurately.
- Their classical ensemble model achieved a mean absolute error of 2.47 kcal/mol, aligning with experimental uncertainty, while the hybrid model approached similar accuracy using fewer features and parameters.
- The quantum circuits served as feature encoders, transforming classical inputs into higher-dimensional representations that revealed stronger correlations with proton affinity (PA) values than the original features.
- By integrating low-depth quantum circuits into the encoding process, the study demonstrates a practical role for quantum tools in enhancing classical workflows, especially considering current NISQ hardware limitations.
In the seemingly limitless library of molecules that make up our world, protons play a quiet but decisive role. Where a proton binds on a molecule can alter its geometry, reactivity, and even how it appears in mass spectrometry. And yet, pinpointing the most favorable protonation site—the one with the highest proton affinity—is anything but trivial and typically reserved for high-cost quantum calculations or careful experimental work.
In a new study published in the Journal of Chemical Theory and Computation, Cleveland Clinic researchers Hongni Jin and Kenneth Merz Jr. propose a third option. They’ve developed a hybrid machine learning model that combines classical descriptors with quantum circuits to predict proton affinities efficiently and with competitive accuracy. Their model reaches near-experimental performance on a diverse set of molecules and, notably, shows consistent results even on noisy quantum hardware.
Classical Limits and a Quantum Alternative
Proton affinity (PA) is especially relevant in gas-phase ion chemistry, specifically in workflows such as ion mobility–mass spectrometry. When multiple protonation sites are available, only one usually leads to a geometry that matches observed data. But identifying that site is complicated. High-accuracy ab initio methods like G4 or W1 can calculate PAs directly, but they’re computationally expensive and impractical for larger molecules. Experimental methods face limitations as well, especially for compounds that are nonvolatile or thermally unstable.
To avoid those bottlenecks, Jin and Merz built a machine learning model trained on 1,185 organic compounds curated from NIST and PubChem. Each molecule was described using a broad set of 186 features, including 2D and 3D physicochemical descriptors, quantum-chemical variables like HOMO-LUMO energies, and MACCS fingerprints. Models such as support vector regression, random forests, and gradient-boosted trees were tested individually and in ensemble. The best performer—an ensemble voting regressor—achieved a mean absolute error of 2.47 kcal/mol, which aligns with the uncertainty range of experimental measurements.
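An ensemble of this shape can be sketched with scikit-learn's VotingRegressor, which averages the predictions of its member models. This is a hypothetical minimal sketch, not the authors' pipeline: the descriptor matrix, targets, split, and hyperparameters below are random stand-ins for the paper's 1,185 compounds and 186 features.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 186))  # placeholder molecular descriptors
y = 220.0 + 5.0 * X[:, 0] + rng.normal(scale=2.0, size=200)  # mock PA, kcal/mol

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Voting ensemble over the three model families named in the study.
ensemble = VotingRegressor([
    ("svr", SVR(C=10.0)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("gbt", GradientBoostingRegressor(random_state=0)),
])
ensemble.fit(X_train, y_train)
mae = np.mean(np.abs(ensemble.predict(X_test) - y_test))
print(f"MAE on held-out mock data: {mae:.2f} kcal/mol")
```

On the real dataset, each member model would be tuned individually before voting; the point here is only the ensemble structure, not the reported 2.47 kcal/mol figure.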
From Features to Hilbert Space
The hybrid component of the study takes a different tack. Instead of feeding the descriptors directly into a classical neural network, the authors encoded a reduced subset of features into quantum states using parameterized quantum circuits. Each circuit acts as a feature encoder and transforms classical data into a higher-dimensional representation using quantum operations like angle rotations and entangling gates. These outputs were then fed into a conventional neural network.
To keep the design scalable and compatible with today’s noisy intermediate-scale quantum hardware, the authors used a “patch” approach. Features were divided into subsets, each processed by a small, structurally identical quantum circuit, or subencoder. These were generated using Élivágar, a circuit search method optimized for low-noise execution. Each subencoder consisted of just a few qubits and a fixed number of trainable gates, minimizing depth and avoiding the long-range entanglement that can introduce hardware errors.
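The patch idea can be sketched in plain NumPy under simplifying assumptions: here an RY angle encoding followed by a CNOT ring stands in for the Élivágar-generated subencoders, the patch size of 4 qubits is illustrative, and per-qubit Pauli-Z expectation values serve as the encoded features handed to the downstream network.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_1q(state, gate, qubit, n):
    """Apply a single-qubit gate to `qubit` of an n-qubit statevector."""
    state = state.reshape([2] * n)
    state = np.moveaxis(state, qubit, 0)
    state = np.tensordot(gate, state, axes=1)
    return np.moveaxis(state, 0, qubit).reshape(-1)

def apply_cnot(state, control, target, n):
    """Apply a CNOT: flip `target` on the control = 1 half of the state."""
    state = state.reshape([2] * n)
    state = np.moveaxis(state, (control, target), (0, 1))
    state[1] = state[1][::-1].copy()
    return np.moveaxis(state, (0, 1), (control, target)).reshape(-1)

def subencoder(features):
    """Angle-encode one patch of features on len(features) qubits,
    entangle with a CNOT ring, and return per-qubit <Z> expectations."""
    n = len(features)
    state = np.zeros(2 ** n)
    state[0] = 1.0
    for q, x in enumerate(features):
        state = apply_1q(state, ry(x), q, n)
    for q in range(n):
        state = apply_cnot(state, q, (q + 1) % n, n)
    probs = np.abs(state) ** 2
    z_signs = np.array([[1 - 2 * ((i >> (n - 1 - q)) & 1) for q in range(n)]
                        for i in range(2 ** n)])
    return probs @ z_signs  # <Z_q> for each qubit in the patch

# "Patch" approach: split 16 classical features across 4 identical subencoders,
# then concatenate the outputs as input to a conventional neural network.
rng = np.random.default_rng(1)
features = rng.uniform(0, np.pi, size=16)
encoded = np.concatenate([subencoder(p) for p in features.reshape(4, 4)])
print(encoded.shape)  # 16 quantum-encoded features
```

Because each subencoder touches only its own few qubits, circuit depth stays constant as the feature count grows; adding features means adding patches, not widening or deepening any single circuit.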
As noted in the paper, performance improved consistently with more features, more qubits per subencoder, and greater circuit expressivity. With 64 input features split across four subencoders, each using 10 qubits, the hybrid model achieved an MAE of 3.29 kcal/mol—slightly higher than the classical ensemble, but with significantly fewer trainable parameters. When implemented on IBM-Cleveland, the real-device version of the model maintained an MAE of 3.63 kcal/mol, matching the classical neural net baseline.
Are Quantum Circuits Good Feature Encoders?
The performance alone was promising, but the authors also investigated why the quantum-enhanced model works. They compared the correlation between input features and PA values for both the original and quantum-encoded features. One encoded feature in particular showed a correlation two orders of magnitude stronger than any of the original descriptors. This suggests that the quantum circuit was capturing structure in the data that classical preprocessing methods had missed.
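As a toy illustration of that diagnostic (synthetic data, not the study's features): a nonlinear encoding can expose a strong linear correlation with the target that none of the raw inputs show on their own.

```python
import numpy as np

rng = np.random.default_rng(2)
x1, x2 = rng.uniform(-np.pi, np.pi, size=(2, 1000))
# Mock target that depends nonlinearly on both raw features.
target = np.cos(x1 + x2) + rng.normal(scale=0.05, size=1000)

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

raw_corrs = [abs(pearson(x1, target)), abs(pearson(x2, target))]
encoded = np.cos(x1 + x2)  # stand-in for one quantum-encoded feature
enc_corr = abs(pearson(encoded, target))
print(f"best raw |r| = {max(raw_corrs):.3f}, encoded |r| = {enc_corr:.3f}")
```

Here the raw features are nearly uncorrelated with the target while the encoded feature correlates almost perfectly; the paper's comparison between original and quantum-encoded descriptors follows the same logic, though of course with the circuit, not a hand-picked cosine, doing the transformation.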
The idea here isn’t that quantum models outperform classical ones across the board, but that quantum circuits may serve as powerful tools for the data-transformation stage. In high-dimensional problems like molecular property prediction, where feature engineering and selection can make or break a model, quantum feature encoding could become an essential part of the methodology.
Navigating NISQ
To be clear, the most accurate results in the study still came from the classical ensemble. But the hybrid model’s ability to achieve strong, stable results with fewer inputs and fewer parameters is notable—especially in light of current hardware constraints. As the authors note, noise, decoherence, and barren plateaus remain real barriers for deeper or more complex quantum circuits.
Yet, by using lightweight circuits in strategic roles, as encoders rather than solvers, the researchers show how quantum components can be meaningfully integrated into existing pipelines. Framed this way, the question is not whether quantum hardware offers an outright advantage, but where quantum tools can be strategically included to upgrade existing techniques.
In a problem domain where structural nuance matters, and where conventional approaches can quickly become computationally prohibitive, this method is compelling. Not a replacement for chemistry or physics, but a complement.
Contributing authors on the study include Hongni Jin and Kenneth M. Merz Jr.