Random versus Systematic Errors in Reaction Enthalpies Computed Using Semiempirical and Minimal Basis Set Methods

The connectivity-based hierarchy (CBH) protocol for computing accurate reaction enthalpies developed by Sengupta and Raghavachari is tested for fast ab initio methods (PBEh-3c, HF-3c, and HF/STO-3G), tight-binding density functional theory (DFT) methods (GFN-xTB, DFTB, and DFTB-D3), and neglect-of-diatomic-differential-overlap (NDDO)-based semiempirical methods (AM1, PM3, PM6, PM6-DH+, PM6-D2, PM6-D3H+, PM6-D3H4X, PM7, and OM2) using the same set of 25 reactions as in the original study. For the CBH-2 scheme, which reflects the change in the immediate chemical environment of all of the heavy atoms, the respective mean unsigned error relative to G4 for PBEh-3c, HF-3c, HF/STO-3G, GFN-xTB, DFTB-D3, DFTB, PM3, AM1, PM6, PM6-DH+, PM6-D3, PM6-D3H+, PM6-D3H4X, PM7, and OM2 are 1.9, 2.4, 3.0, 3.9, 3.7, 4.5, 4.8, 5.5, 5.4, 5.3, 5,4, 6.5, 5.3, 5.2, and 5.9 kcal/mol, with a single outlier removed for HF-3c, PM6, PM6-DH+, PM6-D3, PM6-D3H4X, and PM7. The increase in accuracy for the NDDO-based methods is relatively modest due to the random errors in predicted heats for formation.


■ INTRODUCTION
Computing accurate reaction enthalpies for large molecules represents a significant challenge to computational chemistry. Sengupta and Raghavachari 1 recently demonstrated that their connectivity-based hierarchy (CBH) protocol 2 can be used to compute reaction enthalpies that, on average, are within 1−2 kcal/mol of G4 theory, 3 using density functional theory and triple-ζ basis sets (DFT/TZV). Although encouraging, the computational cost of the DFT/TZV calculations is still too high to be routinely applied to large biomolecular systems. Here, we test the accuracy of the CBH approach using computationally more efficient methods, such as DFT/DZV, minimal basis set Hartree−Fock, tight-binding DFT, and neglect-of-diatomic-differential-overlap (NDDO)-based semiempirical methods, using the Sengupta and Raghavachari 1 data set. As part of the work, we have completely automated the CBH approach so that it only requires a SMILES string representation of the molecule, which is easily generated using chemical drawing programs, such as ChemDraw.
The paper is organized as follows. We first present the computational methodology including a brief description of the CBH approach. Next, we discuss the accuracy of the predicted reaction enthalpies as compared to G4 reference values for the 25 reactions used by Sengupta and Raghavachari 1 and perform an error analysis of outliers by comparing heats of formation (HOF) of the CBH fragments to G4 values. Finally, we summarize our conclusions and discuss their potential implications for parameterization of semiempirical methods. Table 1 shows the mean unsigned error (MUE) relative to G4reaction enthalpies for the parent reaction (Dev-0) and the CBH-1 (Dev-1) and the CBH-2 (Dev-2) correction schemes. In addition, the maximum absolute error for the CBH-2 scheme is also listed. As observed in the Sengupta and Raghavachari 1 study, CBH-1 correction scheme provides only a modest improvement in accuracy and for many of the semiempirical methods, the error actually increases. We will return the source of the error increase in the next subsection, but for now we will focus on the accuracy of the CBH-2 results.

■ RESULTS AND DISCUSSION
The MUE of PBEh-3c, which is a dispersion-corrected hybrid DFT/double-ζ valence method, is 1.9 kcal/mol, which is very similar to the 1.3−2.1 kcal/mol MUE values obtained by Sengupta and Raghavachari 1 for dispersion-corrected DFT functionals using the much larger 6-311++G(3df,2p) basis set and B3LYP/6-311+G(d,p) geometries and frequencies. This is consistent with Sengupta and Raghavachari 1 's observation that the Dev-2 error is fairly insensitive to basis set size. Similarly, the MUE of HF-3c, which is a dispersion-corrected HF/ minimal basis set method, is 2.4 kcal/mol, which is only 0.3 kcal/mol higher than the value obtained by Sengupta and Raghavachari 1 using HF/6-311++G(3df,2p)//B3LYP/6-311+G(d,p). The MUE is computed without reaction 19, which is a clear outlier and will be discussed in detail in the next subsection. For comparison, the STO-3G MUE is 3.0 kcal/mol, so empirical corrections (most likely dispersion) contribute to accuracy of the HF-3c method.
GFN-xTB and DFTB-D3 are the most accurate semiempirical methods with MUEs of 3.9 and 3.7 kcal/mol, respectively, followed by DFTB (4.5 kcal/mol) and PM3 (4.8 kcal/mol), with both pairs of methods being statistically identical (Table S1). The good performance of GFN-xTB and DFTB-D3 is especially remarkable because they are not specifically parameterized against any reaction energies. Comparison of the MUEs for DFTB-D3 and DFTB shows that dispersion corrections lead to noticeably more accurate reaction enthalpies for DFTB, consistent with previous findings. 5 The remaining NDDO-based methods have statistically identical accuracy, with MUEs in the range of 5.2−5.9 kcal/mol, with the exception of PM6-D3H+, which has an MUE of 6.5 kcal/mol. The MUEs for PM6, PM6-DH+, PM6-D3, PM6-D3H4X, and PM7 are computed without reaction 23, which is a clear outlier and will be discussed in detail in the next subsection. Furthermore, reactions 16 and 22 are omitted for OM2 because it is not parameterized for fluorine. We note that PM6-D3H+ has an Dev-2 error of 15.0 kcal/mol and is therefore not considered an outlier. Although the MUE for PM6-D3H+ is higher than that for the other PM6-based methods, it does not lead to any outliers using the Dev-2 method. For the semiempirical methods, there is little correlation between the MUE for parent reaction and the CBH-2-corrected values. Similarly, with an MUE of 6.8 kcal/ mol, OM2 performs reasonably well for the parent reactions but worst with CBH-2.
Error Analysis. We start by considering the PM6 result for the first reaction (R1), where the PM6 error is −0.3 kcal/mol for the parent reaction and the CBH-1 correction is −12.7 kcal/mol. PM6 error is quite low for the parent reaction so, ideally, the CBH-1 correction should be close to zero but is in fact quite large in magnitude. To understand the source of this error, we reformulate the (ΔCBH-n(G4)-ΔCBH-n(LL)) correction in terms of errors in heats of formation (HOF, ΔH f°) .
R1 is a Diels−Alder reaction, where two double bonds are converted into four single bonds ( Figure 1). The CBH-1 reaction is thus and Therefore the CHB-1 correction is where, for example The errors in the HOFs for PM6 (and all other methods) relative to G4 are listed in Table S4 and show that the error in the HOFs relative to G4 are CH 4 : −5.6, CH 2 CH 2 : −3.3, and CH 3 CH 3 : 3) − 4(−5.6) = 12.6 kcal/mol and the CBH-1 correction is large in magnitude because the HOF error for CH 4 is somewhat larger than that for CH 3 CH 3 and, especially, CH 2 CH 2 . These errors also show that the low error for the parent reaction is clearly fortuitous. If the errors in the HOFs of the reactants and products are similar to those observed for CH 4 , CH 2 CH 2 , and CH 3 CH 3 then one would expect an error in the reaction enthalpy of about 3−6 kcal/mol because you go from 2 to 1 molecule. In contrast, the PBEh-3c error for the parent reaction is 9.0 kcal/mol, which decreases to −4.5 kcal/mol with the CBH-1 correction. The error in the HOFs relative to G4 are CH 4 : 8.5, CH 2 CH 2 : 6.0, and CH 3 CH 3 : 14.9 kcal/mol. So the CBH-1 correction is 4(14.9) − 2(6.0) − 4(8.5) = 13.6 kcal/mol, which when subtracted from 9.9 lowers the error in reaction energy to −4.5 kcal/mol. Clearly, the errors in HOFs are much larger than those for PM6, yet most of the error cancels. The reason is that the magnitude of the HOF error tends to scale with the number and types of bonds in the molecule for ab initio methods. For example, the HOF error of CH 2 CHCH 3 is 13.7 kcal/mol, which is approximately the sum of the HOF errors of CH 3 CH 3 and CH 2 CH 2 , minus the error for CH 4 : 14.9 + 6.0 − 8.5 = 12.4 kcal/mol. This type of error scaling makes ab initio results very amenable to improvement using the CBH approach.
We now perform similar analysis of the outliers we removed when computing the Dev-2 MUEs in Table 1, namely, R19 for HF-3c and R23 for PM6, PM6-DH+, and PM7.
For R23, the PM6, PM6-DH+, and PM7 errors for the parent reaction are 0.6, 0.6, and −1.6 kcal/mol, respectively, whereas the Dev-2 errors are 44.8, 45.0, and 45.7 kcal/mol, much larger in magnitude than other Dev-2 results for these methods. The reaction corresponding to the CBH-2 correction and the corresponding errors in HOFs (Table S4) show that the magnitude of the CBH-2 correction derives from the HOF error of CH 2 CCH 2 (7.7 kcal/mol), which is considerably larger than, and opposite in sign from, the errors in HOF of the remaining fragments (−1.1 to −4.1 kcal/mol). For R19, the HF-3c Dev-2 error is −33.5 kcal/mol, much larger in magnitude than other Dev-2 error for HF-3c. For comparison, the corresponding value for HF/STO-3G is −12.0

ACS Omega
Article kcal/mol. In R19, an oxirane ring is formed, which may pose a special challenge to minimal basis sets that may not offer sufficient flexibility to describe the ring strain. Indeed, the Dev-2 errors for the first reaction shown in Figure 2a are −14.6 and −12.6 kcal/mol for HF-3c and HF/STO-3G, respectively. Although these errors are similar to the HF/STO-3G Dev-2 error for R19, they are still significantly smaller than the HF-3c Dev-2 error for R19, so we performed similar calculations for the second reaction shown in Figure 2a, which is more similar to R19. The HF-3c and HF/STO-3G errors are −16.4 and −11.7 kcal/mol, respectively. The source of this increased difference in error between HF-3c and HF/STO-3G can be understood by decomposing the Dev-2 error into contributions from the two "half-reactions" (Figure 2c). The HF-3c and HF/ STO-3G Dev-2 errors for the "product reaction" are very similar (−12.5 and −11.5 kcal/mol), whereas the HF-3c error is 4.0 kcal/mol higher than that for HF/STO-3G for the reactant reaction. Thus, the higher Dev-2 error observed HF-3c is primarily due to a higher error for the reactant. So, one possible explanation for −33.5 kcal/mol error for HF-3c is that roughly 13 kcal/mol comes from the oxirane ring in the product, whereas the remaining error comes from the four rings in the reactant (4(4.0) = 16 kcal/mol).
HF-3c differs from HF/STO-3G in several ways: HF-3c uses the MINIX basis set instead of STO-3G and has three empirical corrections accounting for dispersion (E disp ), basis set superposition error (E BSSE ), and bond length errors (E SRB ). These three corrections contribute 1.1 and 4.8 kcal/mol to the errors in the product and reactant half-reactions, respectively. Thus, the larger error compared to HF/STO-3G for the product halfreaction is primarily due to the difference in basis set. This observation is also consistent with the fact that the HF/MINIX Dev-2 error for the R19 is −24.8 kcal/mol.

■ CONCLUSIONS AND OUTLOOK
The connectivity-based hierarchy (CBH) protocol for computing accurate reaction enthalpies developed by Sengupta and Raghavachari 1 is tested for fast ab initio methods (PBEh-3c, HF-3c, and HF/STO-3G), tight-binding DFT methods (GFN-xTB and DFTB), and NDDO-based semiempirical methods (AM1, PM6, PM6-DH+, PM7, and OM2) using the same set of 25 reactions as in the original study. As observed by Sengupta and Raghavachari, 1 the CBH-1 correction scheme, which reflects the change in bonding, provides only a modest improvement in accuracy and for many of the semiempirical methods the error actually increases. For the CBH-2 scheme, which reflects the change in the immediate chemical environment of all of the heavy atoms, the MUE relative to G4 of PBEh-3c is 1.9 kcal/mol, which is very similar to the 1.3−2.1 kcal/mol MUE values obtained by Sengupta and Raghavachari 1 using various DFT functionals and triple-ζ basis sets. The MUE for HF-3c, which is a dispersion-corrected HF/minimal basis set method, is 2.4 and 3.0 kcal/mol for HF/STO-3G. The MUE for HF-3c is computed without reaction 19, a clear outlier primarily due to a relatively large error for rings for the HF/ MINIX basis set. GFN-xTB is the most accurate semiempirical method with an MUE of 3.9 kcal/mol, followed by DFTB (4.5 kcal/mol) and PM3 (4.8 kcal/mol). The remaining NDDObased methods, AM1, PM6, PM6-DH+, PM7, and OM2, have statistically equal accuracy with MUEs in the range of 5.2−5.9 kcal/mol. The MUEs for PM6, PM6-DH+, and PM7 are computed without reaction 23, which is a clear outlier due to an unusually large error in the heat of formation (HOF) compared to G4 for one of the components in the correcting reaction.
Although the accuracy is lower for the minimal basis set and semiempirical methods are lower than for PBEh-3c, they are significantly more computationally efficient with HF-3c and HF/STO-3G being roughly 10−50 times faster and the semiempirical methods being roughly 1000 times faster, depending on system size. It is worth noting that GFN-xTB is roughly 10 times faster than the other semiempirical methods, which can lead to significant time savings when dealing with thousands of molecules. However, significant changes in bonding and/or unusual bonding still present challenge to these faster methods, as evidenced by the presence of outliers.
More generally, our analyses show that although the errors in HOF computed by ab initio and tight-binding DFT methods tend to be larger in magnitude than those for the NDDO-based semiempirical methods, the magnitude tends to scale systematically with the number and types of bonds in the molecule, which make them very amenable to improvement using the CBH approach. The NDDO-based semiempirical methods are optimized by reducing the absolute HOF error independent of system size and, as a result, the HOF error of an individual molecule tends to be relatively random both in sign and magnitude, which makes it less amenable to improvement using the CBH approach. A better approach may be to parameterize the semiempirical methods by minimizing the CBH-corrected error and presenting the CBH-corrected energies to the user. For example, although NDDO-based methods would parameterize by minimizing the HOF error of propane (ΔHOF-(propane)), a CBH-1-based scheme would minimize [ΔHOF-(propane) − ΔHOF(ethane) − ΔHOF(methane)] and present the CBH-1-corrected HOF to the user. This approach might make it easier to find more generally applicable parameters that better and systematically minimize the error because the underlying HF approach "naturally" gives larger errors for larger systems, and we are no longer asking the parameters to "undo that" by searching for small errors independent of system size.
We implemented the CBH-1 and CBH-2 correction schemes in a program called fragreact, which identifies the appropriate fragments for a given parent reaction, generates input files for the fragments (if needed), determines the balanced equation for the correction reaction, and, in general, automates the entire process.

■ COMPUTATIONAL DETAILS
The connectivity-based hierarchy (CBH) protocol has been described in detail elsewhere 1,2 and is only summarized here. The aim of the method is to approximate the reaction enthalpy of a reaction (the "parent reaction") at a high level of theory (here taken to be G4) by computing the reaction enthalpy using a faster, lower level method ("LL" in the equations below) and subtracting a correction.
Here, ΔCBH-n(G4) is the G4-reaction enthalpy computed for a reaction involving smaller fragments that reflect the change in bonding of the parent reaction where the size of the fragment is defined by n and similarly for ΔCBH-n(LL). For ΔCBH-1, the reaction involves fragments made of two heavy (nonhydrogen) atoms and reflects the change in bonding, whereas for ΔCBH-

ACS Omega
Article 2, reaction involves fragments made from 3 to 5 heavy atoms and reflects the change in the immediate chemical environment of all of the heavy atoms. Following Sengupta and Raghavachari, 1 we evaluate the error relative to the parent reaction using G4-reaction energies as reference.
We use the G4 reference energies for 25 reactions from the study of Sengupta and Raghavachari 1 (referred to as R1−R25). Note that for R19, R20, and R22, DLPNO-CCSD(T)/cc-pVTZ is used instead of G4. We implemented the CBH-1 and CBH-2 correction schemes in a program called fragreact, 6 heavily depending on RDKit, 7 which identifies the appropriate fragments for a given parent reaction, generates input files for the fragments (if needed), determines the balanced equation for the correction reaction, and, in general, automates the entire process.
Although our fragreact results generally agree with those of Sengupta and Raghavachari, 1 we did obtain slightly different ΔCBH-2 correction reactions for R18, R19, and R23, but the effect on the mean unsigned error computed at the B3LYP/6-311+G(d,p) level of theory is 0.1 kcal/mol, as described in SI. All fragment G4 energies were recomputed as part of this study.