Semiempirical Quantum-Chemical Methods with Orthogonalization and Dispersion Corrections

We present two new semiempirical quantum-chemical methods with orthogonalization and dispersion corrections: ODM2 and ODM3 (ODMx). They employ the same electronic structure model as the OM2 and OM3 (OMx) methods, respectively. In addition, they include Grimme’s dispersion correction D3 with Becke–Johnson damping and three-body corrections E_ABC for Axilrod–Teller–Muto dispersion interactions as integral parts. Heats of formation are determined by adding explicitly computed zero-point vibrational energy and thermal corrections, in contrast to standard MNDO-type and OMx methods. We report ODMx parameters for hydrogen, carbon, nitrogen, oxygen, and fluorine that are optimized with regard to a wide range of carefully chosen state-of-the-art reference data. Extensive benchmarks show that the ODMx methods generally perform better than the available MNDO-type and OMx methods for ground-state and excited-state properties, while they describe noncovalent interactions with accuracy similar to that of OMx methods with a posteriori dispersion corrections.


INTRODUCTION
Semiempirical quantum chemistry (SQC) methods based on the neglect of diatomic differential overlap (NDDO) integral approximation 1,2 enable computationally efficient calculations of ground-state and excited-state electronic structure properties. 3,4 They are widely used when computational time becomes a major issue, i.e. in calculations of very large systems, e.g. of fullerenes, 5−9 nanotubes, 5,8 long polyynes, 10 proteins, 3,11−17 and others, 5,18,19 in real-time quantum chemistry studies, 20−24 and in simulations requiring a very large number of electronic structure calculations. The latter applications include high-throughput screening in drug 5,25−33 and materials 34,35 design, high-throughput pKa calculations, 36,37 ground-state molecular dynamics (MD) simulations, 38,39 excited-state nonadiabatic MD simulations, 3 quantum mechanics/molecular mechanics (QM/MM) MD and Monte Carlo studies, 3,12−16,40 and mass spectra simulations. 41−44 There are two classes of modern NDDO-based SQC methods: 1) orthogonalization-corrected methods (OMx), 45−50 which account for repulsive orthogonalization effects, attractive penetration effects, and repulsive core−valence interactions via explicit corrections; 2) MNDO-type methods without such corrections, which ignore the overlap matrix while solving the Roothaan−Hall equations and also ignore penetration integrals and core−valence interactions. The first class comprises the OM1, 45,46,50 OM2, 47,48,50 and OM3 49,50 methods; somewhat related is the NO-MNDO method, which solves the Roothaan−Hall equations taking overlap into account explicitly. 51 Generally, the OMx methods are more accurate than the MNDO-type methods both for ground-state and excited-state properties, because they are based on a better physical model. 51 … 69 and PM7. 70 They are popular and useful for many applications, especially because parameters are available for many elements and because they are often reasonably accurate thanks to an elaborate parametrization and fine-tuning via empirical core−core repulsion functions.
A common problem of SQC methods is that they do not properly describe noncovalent complexes with significant dispersion interactions. 71 This problem is often ameliorated by adding explicit empirical dispersion corrections. 18,72−80 OMx methods augmented with such explicit dispersion corrections describe various large noncovalent complexes with an accuracy comparable to that of density functional theory (DFT) methods with dispersion corrections, 18,19 which are computationally much more expensive. Noncovalent interactions involving hydrogen bonds are also often described poorly by SQC methods. This issue has been addressed by including special hydrogen-bond corrections in MNDO-type methods. 70,72−75,77 In contrast, the OMx methods treat hydrogen-bonding interactions reasonably well even without such corrections, 50,54,81,82 while inclusion of dispersion corrections generally further improves the accuracy. 50,54 One should note, however, that the addition of empirical attractive dispersion corrections to any semiempirical Hamiltonian parametrized without such corrections will inevitably deteriorate the accuracy of the computed heats of formation (which will become too small), while the computed relative energies may become either more or less accurate. 52,54 Hence, it is more consistent to reparametrize the Hamiltonian with inclusion of dispersion corrections. This has so far been done only in PM7, 70 which however suffers from error accumulation in very large noncovalent complexes, 19,54 and in the proof-of-principle MNDO-F method, 83 which still has large errors in heats of formation.
Another problem of modern NDDO-based SQC methods is that all of them conventionally treat atomization energies calculated at the SCF level as atomization enthalpies at 298 K, i.e. heats of formation are obtained without explicitly computing zero-point vibrational energies (ZPVEs) and thermal enthalpic corrections from 0 to 298 K. 50,54,57,84 This convention was useful for parametrizing SQC methods against experimental heats of formation in early times, when accurate theoretical reference data were not yet available and when it was computationally infeasible to calculate ZPVE and thermal corrections during parametrization. It is debatable whether this convention contributes much to the errors in SQC methods. 84,85 Benchmark studies show that it often has only a small effect on reaction energies, 54 but it may be problematic when comparing ZPVE-exclusive energies at 0 K with differences in semiempirical heats of formation for reactions with large changes in bonding. 54 Nowadays this convention is no longer justified, and it should be avoided in new methods. 84
As already mentioned, general-purpose SQC methods are often used for excited-state calculations, yet they are typically parametrized on ground-state properties only. On the other hand, there are special-purpose semiempirical methods such as INDO/S 86,87 and INDO/X 88 that were parametrized to reproduce electronic spectra. They can be applied for predicting such spectra but are less suitable for other purposes. It would clearly be desirable to develop general-purpose SQC methods that describe ground-state and excited-state properties in a balanced manner; this will require including both during parametrization.
In this work, we report two new orthogonalization- and dispersion-corrected SQC methods, ODM2 and ODM3 (ODMx), which are based on OM2 and OM3, respectively. They differ from the underlying OMx methods in the following aspects: (a) They include explicit dispersion corrections as an integral part. (b) They are parametrized against much larger sets of diverse, state-of-the-art reference properties, with special emphasis on a balanced treatment of both ground-state and excited-state properties as well as noncovalent interactions. (c) Atomization energies calculated from total energies are treated consistently as ZPVE-exclusive atomization energies at 0 K, while heats of formation are determined by adding ZPVE and thermal corrections obtained within the harmonic-oscillator and rigid-rotor approximations.
This Article is structured as follows. First, we discuss the theoretical formalism of the ODMx methods (Section 2). We then describe the parametrization procedure and present the optimized values of the ODM2 and ODM3 parameters (Section 3). Thereafter, we validate the new methods on a large collection of benchmark sets and compare their performance to that of the underlying OMx and dispersion-corrected OMx methods (Section 4). Finally, we offer conclusions.

METHODOLOGY
The ODM2 and ODM3 methods employ the same electronic structure model as OM2 47,48,50 and OM3, 49,50 respectively. The OM2 and OM3 electronic structure models have been described in detail elsewhere 50 and will therefore not be explained again here. Instead we focus on the formal differences between the ODMx and OMx methods.
The ODMx methods incorporate Grimme's dispersion correction D3 89,90 with the Becke−Johnson (BJ) damping function 91−93 as an integral part (unlike the OMx methods). They also include explicit three-body corrections E_ABC for the Axilrod−Teller−Muto dispersion interaction, 89 which are necessary for a better description of large dense systems. 54,94,95 We denote these D3(BJ)+E_ABC corrections as D3T in the following. The ODMx total energy (E_tot) is defined as the SCF total energy plus the post-SCF D3T dispersion energy. The same definition holds for OMx methods with a posteriori D3T corrections (the OMx-D3T methods), which have been shown to describe noncovalent interactions well. 50,54 Consistent with the definitions in ab initio methods, the ZPVE-exclusive atomization energy at 0 K (ΔE_at) can be written as the difference between the sum of the ODMx total energies of the N_at constituent atoms and the ODMx total energy of the molecule (E_tot):

$$\Delta E_{\mathrm{at}} = \sum_{A=1}^{N_{\mathrm{at}}} E_{\mathrm{tot}}(A) - E_{\mathrm{tot}}$$

This definition is different from that used in earlier NDDO-based SQC methods (including the OMx methods), where ΔE_at is assumed to be the atomization enthalpy at 298 K (ΔH_at,298 K) and is directly used in evaluating the heat of formation at 298 K without ever calculating ZPVE and thermal corrections explicitly. 50,54,57,84 By contrast, in the ODMx methods, heats of formation (ΔH_f,T) include ZPVE and thermal corrections from 0 K to a given temperature T, computed explicitly within the harmonic-oscillator and rigid-rotor approximations (as in ab initio methods).
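For readers unfamiliar with the D3(BJ) form, a minimal sketch of its two-body part may help. It follows the published Becke−Johnson-damped expression, E = −Σ_{A<B} Σ_{n=6,8} s_n C_n^AB / (r_AB^n + f^n) with f = a1·sqrt(C8^AB/C6^AB) + a2. The sketch assumes fixed, precomputed C6/C8 coefficients, whereas in real D3 they depend on coordination numbers. It also omits the three-body E_ABC term and uses made-up parameter values rather than any ODMx or OMx-D3T parameters. The function name `d3bj_two_body` is hypothetical.

```python
import math
from itertools import combinations


def d3bj_two_body(coords, c6, c8, s6, s8, a1, a2):
    """Two-body D3(BJ)-type dispersion energy (hartree) for fixed C6/C8 inputs.

    coords         -- list of (x, y, z) positions in bohr
    c6, c8         -- dicts keyed by atom-index pairs (i, j), i < j, in atomic units
    s6, s8, a1, a2 -- global scaling/damping parameters (a2 in bohr); hypothetical here
    """
    energy = 0.0
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        c6ij, c8ij = c6[(i, j)], c8[(i, j)]
        r0 = math.sqrt(c8ij / c6ij)   # pair cutoff radius R0 = sqrt(C8/C6)
        f = a1 * r0 + a2              # BJ damping radius keeps each term finite as r -> 0
        energy -= s6 * c6ij / (r**6 + f**6)
        energy -= s8 * c8ij / (r**8 + f**8)
    return energy


# Usage with made-up coefficients (not ODMx or OMx-D3T parameters):
e = d3bj_two_body([(0.0, 0.0, 0.0), (0.0, 0.0, 7.0)],
                  c6={(0, 1): 20.0}, c8={(0, 1): 500.0},
                  s6=1.0, s8=1.0, a1=0.4, a2=5.0)
print(e)  # negative: the two-body term is purely attractive
```

Because of the BJ damping, the pair energy tends to a finite constant at short range instead of diverging. This is why it can be added to a Hamiltonian without a separate short-range cutoff.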
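More specifically, ΔH_f,T is defined in the ODMx methods as

$$\Delta H_{\mathrm{f},T} = \sum_{A=1}^{N_{\mathrm{at}}} \Delta H_{\mathrm{f},T}(A) - \Delta H_{\mathrm{at},T}$$

where ΔH_f,T(A) denotes the heats of formation of the constituent atoms at temperature T. At 298 K we use the same experimental heats of formation of atoms as in the OMx methods. The atomization enthalpy at temperature T (ΔH_at,T) is determined from the absolute enthalpies H_T of a molecule and its constituent atoms:

$$\Delta H_{\mathrm{at},T} = \sum_{A=1}^{N_{\mathrm{at}}} H_T(A) - H_T$$

However, eq 5 does not take into account that the electron-accepting properties of the bare proton, which are quantified by the ionization potential of the hydrogen atom IP(H), are often severely underestimated by SQC methods. This is also true for the ODMx methods. The ionization potential of hydrogen at the ODMx level (IP_ODMx(H)) is equal to the negative of the U_ss parameter of hydrogen (−U_ss(H)). This parameter is optimized for molecular reference systems (Section 3.3) and turns out to be 25−29 kcal/mol lower than the experimental ionization potential of the hydrogen atom, IP_exp(H) = 313.5873 kcal/mol. 96 The impact of this underestimation becomes evident when considering the thermochemical cycle in Figure 1, which offers an alternative way to calculate PA_ODMx for a base M and its protonated form MH+:

$$\mathrm{PA}_{\mathrm{ODMx}} = E_{\mathrm{tot}}(\mathrm{M}) + E_{\mathrm{tot}}(\mathrm{H}) + \mathrm{IP}_{\mathrm{ODMx}}(\mathrm{H}) - E_{\mathrm{tot}}(\mathrm{MH}^{+}) \qquad (6)$$

It is obvious from eq 6 that PA_ODMx will be strongly underestimated. In a semiempirical context, it is more reasonable to substitute IP_ODMx(H) with IP_exp(H) in this equation. This gives corrected PAs (PA_corr), which do not suffer from the inadequate use of the same hydrogen U_ss parameter in the hydrogen atom and in molecules:

$$\mathrm{PA}_{\mathrm{corr}} = E_{\mathrm{tot}}(\mathrm{M}) + E_{\mathrm{tot}}(\mathrm{H}) + \mathrm{IP}_{\mathrm{exp}}(\mathrm{H}) - E_{\mathrm{tot}}(\mathrm{MH}^{+}) \qquad (7)$$

The correction ΔPA_corr required to calculate PA_corr from PA_ODMx is thus given by

$$\Delta \mathrm{PA}_{\mathrm{corr}} = \mathrm{PA}_{\mathrm{corr}} - \mathrm{PA}_{\mathrm{ODMx}} = \mathrm{IP}_{\mathrm{exp}}(\mathrm{H}) - \mathrm{IP}_{\mathrm{ODMx}}(\mathrm{H}) = \mathrm{IP}_{\mathrm{exp}}(\mathrm{H}) + U_{ss}(\mathrm{H}) \qquad (8)$$

Hence, we use the following expression to calculate corrected ZPVE-exclusive proton affinities at 0 K at the ODMx level:

$$\mathrm{PA} = \mathrm{PA}_{\mathrm{ODMx}} + \Delta \mathrm{PA}_{\mathrm{corr}} = \mathrm{PA}_{\mathrm{ODMx}} + \mathrm{IP}_{\mathrm{exp}}(\mathrm{H}) + U_{ss}(\mathrm{H}) \qquad (9)$$

In the following we refer to these quantities simply as proton affinities.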
We note that this convention is consistent with that adopted in previous MNDO-type and OMx methods, which employ the experimental heat of formation of the proton when converting the computed heats of formation of the molecule and the protonated molecule to the corresponding proton affinity.
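The thermochemical bookkeeping of this section can be summarized in a short sketch. It makes several assumptions:

- Total energies are already expressed in kcal/mol.
- Vibrations are given as real harmonic frequencies in cm^-1.
- The 0 K to T enthalpy uses the textbook ideal-gas rigid-rotor/harmonic-oscillator expressions. Atoms contribute only 5/2 RT, and electronic terms are ignored.

For the proton affinity, the sketch uses two facts: a bare proton has zero SQC total energy, and IP(H) at this level equals −U_ss(H). All function names and argument conventions are hypothetical and do not reflect the MNDO program.

```python
import math

R_KCAL = 1.987204259e-3          # gas constant, kcal/(mol K)
C2_CM_K = 1.438776877            # second radiation constant h c / k_B, cm K
CM1_TO_KCAL = R_KCAL * C2_CM_K   # 1 cm^-1 expressed in kcal/mol (h c N_A)
IP_EXP_H = 313.5873              # experimental IP of the H atom, kcal/mol (value quoted in the text)


def enthalpy_correction(freqs_cm1, temperature, linear=False, atom=False):
    """ZPVE plus thermal enthalpy correction from 0 K to T (kcal/mol), ideal-gas RRHO."""
    zpve = 0.5 * CM1_TO_KCAL * sum(freqs_cm1)
    rt = R_KCAL * temperature
    h_trans = 2.5 * rt                                  # 3/2 RT kinetic energy + pV = RT
    h_rot = 0.0 if atom else (rt if linear else 1.5 * rt)
    h_vib = 0.0
    for nu in freqs_cm1:                                # thermal population above the ZPVE
        theta = C2_CM_K * nu                            # vibrational temperature, K
        h_vib += R_KCAL * theta / math.expm1(theta / temperature)
    return zpve + h_trans + h_rot + h_vib


def heat_of_formation(e_tot_mol, corr_mol, atoms):
    """Delta H_f,T (kcal/mol) from total energies and enthalpy corrections.

    atoms -- list of (E_tot(A), correction(A), experimental Delta H_f,T(A)) tuples
    """
    h_mol = e_tot_mol + corr_mol                        # absolute enthalpy H_T of the molecule
    dh_at = sum(e + c for e, c, _ in atoms) - h_mol     # atomization enthalpy Delta H_at,T
    return sum(dhf for _, _, dhf in atoms) - dh_at


def corrected_proton_affinity(e_tot_base, e_tot_protonated, u_ss_h):
    """Corrected ZPVE-exclusive proton affinity at 0 K (kcal/mol).

    e_tot_base       -- total energy of the base M
    e_tot_protonated -- total energy of the protonated species MH+
    u_ss_h           -- U_ss(H) of the method (negative, kcal/mol)
    """
    pa_sqc = e_tot_base - e_tot_protonated   # E_tot(H+) = 0: a bare proton has no electrons
    ip_sqc_h = -u_ss_h                       # IP(H) at the SQC level equals -U_ss(H)
    delta_pa_corr = IP_EXP_H - ip_sqc_h      # equals IP_exp(H) + U_ss(H)
    return pa_sqc + delta_pa_corr
```

For instance, a hypothetical U_ss(H) of −286 kcal/mol gives a correction of about +27.6 kcal/mol. This lies within the 25−29 kcal/mol underestimation range stated above.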

PARAMETRIZATION
In this Section we specify the chosen training sets, describe the parametrization procedure, and provide the list of final ODM2 and ODM3 parameters for the elements carbon, hydrogen, nitrogen, oxygen, and fluorine (CHNOF).
3.1. Training Sets. The quality of SQC methods strongly depends on the training sets, which should satisfy two requirements:
1. Molecules and properties of interest should be covered in a balanced manner.
2. Reference data should be very accurate.
In the present general-purpose parametrization, we aim at covering the entire space of CHNOF-containing molecules with regard to ground-state and excited-state properties as well as noncovalent interactions. Concerning ground-state energies, we want to describe both ZPVE-exclusive atomization energies at 0 K and heats of formation at 298 K as accurately as possible. To satisfy these requirements, we have chosen the following training sets with state-of-the-art reference data:
• Our own CHNO set of energies (heats of formation at 298 K, ionization potentials, vibrational energies, relative energies, and barriers), geometries (bond lengths, bond angles, and dihedral angles), and dipole moments. 50
• Our own FLUOR set of energies (heats of formation at 298 K, ionization potentials, and vibrational energies), geometries (bond lengths and bond angles), and dipole moments. 50
• The MGAE109 set with ZPVE-exclusive atomization energies at 0 K, 97,98 which is part of the CE345 database. 99,100
• The TAE140 set with ZPVE-exclusive atomization energies at 0 K, which is part of the W4-11 benchmark set. 101
• Our own set of vertical excitation energies (called VEE set in the following) 102 with updated theoretical best estimates from ref 55.
• The S66 set of interaction energies and geometries of 66 noncovalent complexes. 103,104
We found it beneficial to also include the ZPVE-exclusive atomization energy at 0 K of cubane calculated at the W2-F12 level. 105

3.2. Parametrization Procedure. Parametrization of SQC methods is a very complicated task in itself and is as much an art as a science.
One of the challenges is the large number of parameters (usually more than a dozen per element), which makes it difficult to find the optimum set of parameters. The large parameter space provides enormous flexibility: in the words of John von Neumann, 'with four parameters I can fit an elephant, and with five I can make him wiggle his trunk'. However, this flexibility should not be mistaken for a sign that the parametrization can achieve perfect accuracy for general-purpose SQC methods. The underlying physical model strongly limits what can be achieved for different sets of molecules and properties. While special-purpose SQC parametrizations can yield highly accurate results for certain classes of molecules and/or specific properties, 106 they will often fail outside their range of validity. Extending the metaphor, one may specifically 'fit an elephant', but such 'an elephant model' would be useless for describing the locomotion of both an elephant and a car.

Journal of Chemical Theory and Computation
Article

In our experience, there is plenty of subjective judgment involved in all stages of a general-purpose parametrization, up to evaluating and choosing among a number of reasonable candidate parameter sets. In the present work, we discarded well over 99.9% of the parameter sets considered. This underlines that a general-purpose parametrization is very demanding both in terms of human effort and computational costs.
As already mentioned above, we target a balanced treatment of a large number of diverse molecules and properties. There are two issues:
1. Errors in different properties with different units cannot be directly compared, e.g. the numerical errors of heats of formation at 298 K (in kcal/mol) cannot be directly compared to the errors in bond lengths (in Å).
2. Errors in different molecules and properties may be chemically of different importance, e.g. parameter sets giving a planar hydrogen peroxide geometry may be deemed unacceptable even if the total error for heats of formation is very low.
Usually, both these problems are addressed by weighting the errors (Err) for the properties and for specific molecules or types of molecules that enter the overall sum of squares (SSQ) of errors:

$$\mathrm{SSQ} = \sum_{s=1}^{N_{\mathrm{set}}} \sum_{p=1}^{N_{\mathrm{prop}}} \sum_{i=1}^{N_{\mathrm{entry}}} \left( w_{sp}^{\mathrm{prop}}\, w_{si}^{\mathrm{entry}}\, \mathrm{Err}_{si} \right)^{2} \qquad (10)$$

where N_set is the number of training sets, N_prop is the number of properties in a training set, N_entry is the number of entries for a property in a training set, and w_sp^prop and w_si^entry are weighting factors that are specific for a property and an entry in a training set, respectively. The error (Err) is defined as

$$\mathrm{Err}_{si} = P_{si}^{\mathrm{ODMx}} - P_{si}^{\mathrm{ref}} \qquad (11)$$

where P_si^ODMx is the value calculated for a given property at the ODMx level with the current parameters, and P_si^ref is the corresponding reference value.
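The nested weighted sum of squares can be written as three loops over sets, properties, and entries. This minimal sketch assumes the weights multiply the error before squaring and that training data are stored as plain nested lists. Both the layout and the function name `weighted_ssq` are hypothetical and are not the format of the in-house parametrization program.

```python
def weighted_ssq(training_sets):
    """Sum of squares of weighted errors over sets, properties, and entries.

    training_sets -- list of sets; each set is a list of (w_prop, entries) pairs,
                     where entries is a list of (w_entry, p_calc, p_ref) tuples
    """
    ssq = 0.0
    for properties in training_sets:
        for w_prop, entries in properties:
            for w_entry, p_calc, p_ref in entries:
                err = p_calc - p_ref                  # absolute error Err
                ssq += (w_prop * w_entry * err) ** 2  # weight first, then square
    return ssq


# Hypothetical example: heats of formation (kcal/mol) and a bond length (Å) in one set;
# the large property weight puts the bond-length error on a scale comparable to kcal/mol.
sets = [[(1.0, [(1.0, -20.5, -20.0), (1.0, 11.2, 12.0)]),
         (100.0, [(1.0, 1.094, 1.089)])]]
print(weighted_ssq(sets))
```

The example shows why choosing the weights is delicate: the result depends directly on how the bond-length weight trades Å against kcal/mol.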
In previous general-purpose parametrizations of MNDO-type and OMx methods, the SSQ value of weighted absolute errors was minimized during the optimization. In practice, this was found to be a slow, iterative, trial-and-error procedure, as described in detail elsewhere. 46,48,49,69 In our present work, we initially also applied this conventional approach, which however turned out to be too tedious for our broad diversity of training sets (much broader than in the OMx case). Thus, we designed an alternative parametrization procedure specifically tuned for ODMx methods to reach the following objectives:
1. Aim for an accuracy that is better than or close to the accuracy of the corresponding OMx methods after redefining the SQC total energy.
2. Aim for an accuracy that is better than or close to the accuracy of the corresponding OMx methods for ground-state properties.
3. Aim for an accuracy that is better than or close to the accuracy of the corresponding OMx-D3T methods for noncovalent interactions.
4. Improve the accuracy of the ODMx methods in comparison with the corresponding OMx methods for excited-state properties.
In short, our goal is to obtain unified methods that preserve the good qualities and eliminate the bad qualities of the OMx and OMx-D3T methods for ground-state properties and noncovalent interactions, while improving the description of excited-state properties and using the proper definition of SQC total energies. This clear breakdown of objectives allows for a systematic step-by-step parametrization of the ODMx methods. The key to simplifying the complicated optimization problem is to choose the proper error measure to be minimized. Since we deal with many diverse properties, we have chosen to focus on the ODMx errors relative to the corresponding errors of a reference SQC method (usually the corresponding OMx or OMx-D3T method):

$$\mathrm{Err}_{si} = \frac{P_{si}^{\mathrm{ODMx}} - P_{si}^{\mathrm{ref}}}{P_{si}^{\mathrm{refSQC}} - P_{si}^{\mathrm{ref}}} \qquad (12)$$

where P_si^refSQC is the value of the property calculated with the reference SQC method.
This approach obviously meets objectives 1−3 in a straightforward manner. Moreover, it also resolves the two issues discussed above: relative errors are unitless and normalized for each individual property by definition, and the parametrization with regard to relative errors will tend to retain the performance of the underlying OMx or OMx-D3T methods for chemically important molecules and properties. To allow for larger flexibility and specific tuning, we still keep the option to adjust the weights for individual properties and molecules, if necessary. These conventions make parameter optimization much easier, because parameter changes that lead to very large errors of ODMx relative to OMx or OMx-D3T are easily identified and avoided by the optimizer. We note that in the final stage of the parametrization, we also ran conventional parameter optimizations minimizing the SSQ value of the absolute errors (starting from the best candidate parameter sets), which however did not lead to further improvement. Equation 12 requires that the denominator is not close to zero, which was therefore set to a small value (typically 0.1) whenever the reference SQC errors were very small. During the parametrization runs, the SQC calculations sometimes failed due to convergence problems, which is usually an indication of entering an unphysical region of parameter space. In such cases, the SSQ value was set to an arbitrarily huge number, and the parametrization was continued with a modified set of parameter estimates. This procedure was repeated until there were no remaining convergence problems (or otherwise the parametrization was terminated). Such a failsafe approach is necessary for a numerically stable parametrization algorithm.
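The two safeguards just described are a floor on the denominator and a large penalty when an SCF calculation fails. A minimal sketch combining them might look like the code below. It takes the magnitude of the reference SQC error in the denominator, which is harmless because each relative error is squared. It also assumes a caller-supplied `compute` function that returns `None` on convergence failure. The penalty value and all names are hypothetical.

```python
HUGE_SSQ = 1.0e30  # arbitrary large penalty for failed calculations (hypothetical value)


def relative_error(p_odm, p_ref, p_ref_sqc, floor=0.1):
    """Error of the new method relative to the error of a reference SQC method."""
    denom = max(abs(p_ref_sqc - p_ref), floor)  # avoid dividing by a near-zero error
    return (p_odm - p_ref) / denom


def relative_ssq(entries, compute):
    """Weighted sum of squared relative errors.

    entries -- list of (weight, key, p_ref, p_ref_sqc) tuples
    compute -- callable returning the new-method value for key, or None if the SCF failed
    """
    ssq = 0.0
    for weight, key, p_ref, p_ref_sqc in entries:
        p_odm = compute(key)
        if p_odm is None:       # convergence failure: likely an unphysical parameter region
            return HUGE_SSQ
        ssq += (weight * relative_error(p_odm, p_ref, p_ref_sqc)) ** 2
    return ssq
```

A relative error of 1 means "as good as the reference SQC method", so an optimizer minimizing this measure is steered away from parameter changes that make any entry much worse than OMx or OMx-D3T.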
A good initial guess for the parameters is very important. In our case, the corresponding standard OMx-D3T parameters are expected to provide an excellent guess. For ODM2 we optimized three element-independent parameters and 17 parameters per element (8 for hydrogen). ODM3 has two fewer parameters per element. We decided to retain the one-center two-electron integrals derived from experimental atomic spectra, which are used in all OMx methods 50 and in MNDO. 57 We also decided to keep the standard OMx parameters for the effective core potentials. 50 During parametrization we paid special attention to large changes in the parameters to make sure that they do not assume unphysical values. For this purpose we normally imposed strict limits on the allowed range of parameter values.
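One simple way to impose such limits is to clamp each free parameter to a window around its initial (OMx-D3T) value and keep the retained parameters frozen. This sketch assumes a symmetric relative window of ±30%. That value, the parameter names, and the helper functions are all hypothetical and are not the limits actually used for ODMx.

```python
def apply_limits(candidate, initial, frozen=(), max_rel_change=0.3):
    """Keep frozen parameters at their initial values and clamp the others to
    initial value +/- max_rel_change * |initial value|.

    candidate, initial -- dicts mapping parameter names to values
    """
    limited = {}
    for name, x0 in initial.items():
        if name in frozen:                 # e.g. retained one-center integrals
            limited[name] = x0
            continue
        span = abs(x0) * max_rel_change    # note: a zero initial value stays fixed at zero
        limited[name] = min(max(candidate[name], x0 - span), x0 + span)
    return limited


def percent_change(new, initial):
    """Relative change of each nonzero initial parameter, in percent of its magnitude."""
    return {k: 100.0 * (new[k] - x0) / abs(x0) for k, x0 in initial.items() if x0 != 0}


# Hypothetical parameter names and values:
start = {"zeta": 1.0, "beta": -1.2, "gss": 5.0}
trial = {"zeta": 2.0, "beta": -1.0, "gss": 9.0}
print(apply_limits(trial, start, frozen={"gss"}))
```

`percent_change` is the kind of diagnostic needed to make statements such as "all changes below 10%" when comparing final and starting parameters.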
For parameter optimization we employed our own in-house parametrization program, which calls the development version of the MNDO program 107 for the ODMx calculations.
The detailed step-by-step protocol that was adopted for optimizing the ODMx parameters is documented in the Supporting Information. The final parameter values are presented in the next subsection.
3.3. Parameter Values. The values of the final ODM2 and ODM3 parameters are listed in Tables 1 and 2, respectively. The largest changes relative to the corresponding standard OMx-D3T parameters are found in the orthogonalization correction parameters F1, F2, G1, and G2. The smallest changes (all below 10%) occur for the U_ss, U_pp, and ζ parameters. The dispersion correction parameters are also changed only slightly in ODM2 relative to OM2-D3T, but there are larger shifts in ODM3 relative to OM3-D3T. However, detailed numerical tests show that the actual values of the dispersion corrections are similar for noncovalent complexes at the ODM3 and OM3-D3T levels. Moderate changes are observed in the resonance integral parameters. In the Supporting Information (SI) we present plots of resonance integrals of various types and for all combinations of diatomics as a function of the internuclear distance (Figures S1−S55 for ODM2 vs OM2 and Figures S56−S110 for ODM3 vs OM3). Inspection of these plots reveals that most of the ODMx resonance integrals are very similar to their OMx counterparts.

VALIDATION
In this Section, we compare the ODMx results with the corresponding OMx and OMx-D3T results for our large collection of benchmark sets covering ground-state properties (Subsection 4.1) and noncovalent interactions (Subsection 4.2). 50,54 We also evaluate the results for the most important excited-state benchmark sets from our previous work 55 (Subsection 4.3). We refer the reader to the cited literature for the description of these sets. Compared with our previous work, 50,54 we made only a few minor modifications to the CHNO, OVS7-CHNOF, and PDDG sets to correct some erroneous or outdated reference data (see the SI) or to use more appropriate symmetry definitions of molecules. In the following, we report only a statistical analysis of the performance of the methods considered; the underlying individual numerical results for energies are documented in the SI. All calculations were done using our development version of the MNDO program. 107 We applied the same computational settings as in our previous studies. 50,54,55 Generally we used very tight convergence criteria. In the ground-state calculations, we applied the half-electron (HE) approach for open-shell molecules. 108 We did so because the OMx 46,48,49 and ODMx methods were parametrized using this approach, and because the HE-SQC treatment is known to give results that are generally superior to those from unrestricted Hartree−Fock SQC calculations. 109 We had to loosen the convergence criteria only in very few difficult cases of ground-state calculations.
Excited-state properties were computed using multireference configuration interaction (MRCI) calculations with SQC Hamiltonians including single (S), double (D), and optionally also triple (T) and quadruple (Q) substitutions: specifically, CISDTQ for vertical excitation energies and MRCISD for excited-state geometry optimizations. In some cases, we had to use MRCISDT or MRCISDTQ instead of MRCISD (or different starting geometries) to achieve convergence of the geometry optimizations. The quoted OMx and OMx-D3T results for ground-state properties were taken from our previous benchmarks, 50,54 except those for the updated sets (see above), which were recalculated. We used the same conventions as previously 50,54,55 for relative energies calculated at the OMx and OMx-D3T levels, i.e. they are based on heats of formation at room temperature (rather than ZPVE-exclusive energies at 0 K) unless mentioned otherwise. The quoted OMx/MRCI results for excited-state properties were taken from our previous benchmarks of electronically excited states. 55
4.1. Ground-State Properties. Ground-state properties in the CHNO and FLUOR sets 45−50,110,111 were used for training both the ODMx and OMx methods. It is evident from Tables 3 and 4 that the ODMx methods are somewhat better than the OMx methods for heats of formation at 298 K. The inclusion of a posteriori D3T corrections in the OMx methods significantly increases the mean absolute errors (MAEs) in the heats of formation at 298 K. These heats of formation become systematically too small because the dispersion corrections are intrinsically negative. This highlights the importance of a consistent parametrization of SQC methods with dispersion corrections as an integral part. Other properties, including geometries, ionization potentials, dipole moments, relative enthalpies, and activation enthalpies, are described similarly well by all SQC methods considered here. We note that ODM2 and ODM3 reproduce the bond lengths and bond angles in the CHNO set statistically somewhat better than their OMx and OMx-D3T counterparts.
Turning to the independent OVS7-CHNOF validation set 54 ( Table 5) that was not used for parametrization, the ODMx methods outperform their OMx counterparts for heats of formation of large molecules in the BIGMOL20 subset, 46,112 of anions in ANIONS24, 46 of various conformers in CONFORMERS30, 48 and of F-containing molecules in FLUORINE91. 113 They are however inferior for heats of formation of radicals in RADICALS71 109 and of cations in CATIONS41. 46 The ODMx and OMx methods perform similarly well for heats of formation of isomeric molecules in ISOMERS44. 48 The OMx-D3T methods again systematically underestimate the heats of formation (because of the uniformly attractive dispersion interactions included a posteriori) and thus suffer from larger errors in the heats of formation. For most other properties considered in the OVS7-CHNOF validation set, the ODMx methods and their OMx counterparts show similar errors; the ODM3 method performs better than OM3 and OM3-D3T for the ionization potentials in RADICALS71. 109 The MAEs in the heats of formation for the independent G2G3-CHNOF set 49,54,114−116 and in the enthalpy changes for its ALKANES28 subset 49,116 (Table 6) are in the same range for all methods considered; in the ALKANES28 subset, ODM2 outperforms OM2 in heats of formation, while ODM3 has the lowest MAE. It is also encouraging that the MAEs in the heats of formation at 298 K for the independent PDDG, PM7-CHNOF, and C7H10O2 sets are generally lower at the ODMx levels than at the corresponding OMx levels (Tables S1−S3). Other properties in the PDDG and PM7-CHNOF sets (geometries, ionization potentials, and dipole moments) are described similarly well by all methods.
The benefits of redefining the SQC total energy in the ODMx methods are clearly seen in the evaluation of the W4-11-CHNOF set 101 (Table 7). The reference ZPVE-exclusive atomization energies at 0 K and the relative energies derived therefrom are well reproduced by the ODMx methods (without any corrections), while the TAE140, TAE_nonMR124, and BDE99 subsets can be properly described by the OMx and OMx-D3T methods only after removing the ZPVE and thermal contributions from their heats of formation at 298 K.
The evaluation of the diverse reference data in the large GMTKN30-CHNOF set 117 leads to the impression that overall the ODMx methods perform somewhat better than the OMx and OMx-D3T methods (Table 8). Again, in the MB08-165, 118 W4-08, 119 W4-08woMR, 119 and BSR36 120 subsets, the MAEs can be reduced substantially for OMx
Concerning barrier heights, the performance of the ODMx methods is generally similar to that of the OMx and OMx-D3T methods. For example, the MAEs in 60 activation enthalpies (298 K) in the CHNO set are slightly higher for ODM2 and ODM3 (1.77 and 2.01 kcal/mol) than for the OMx methods (1.53−1.55 kcal/mol). On the other hand, the MAEs in 22 barrier heights of pericyclic reactions (BHPERI subset) are lower for ODM2 and ODM3 (6.49 and 6.80 kcal/mol) than for their OMx counterparts (8.21−8.25 kcal/mol) and of similar magnitude as those for the OMx-D3T methods (6.69−6.78 kcal/mol). Compared to their OMx and OMx-D3T counterparts, ODM2 and ODM3 perform somewhat worse for the BH76 subset (54 barriers of hydrogen and heavy-atom transfers, nucleophilic substitutions, unimolecular and association reactions) and comparably poorly for the O3ADD6 subset (only 2 barriers of ozone addition to unsaturated hydrocarbons). They perform similarly for the HTBH38/08 subset (26 hydrogen-transfer barriers) and somewhat worse for the NHTBH38/08 subset (23 non-hydrogen-transfer barriers). Judging from the single-point results for the GMTKN30-CHNOF and CE345-CHNOF subsets (Tables 8 and 9), the MAEs of the ODMx methods for barriers and reaction energies seem to be overall of similar magnitude to those of the OMx and OMx-D3T methods, while those for energy differences between conformers and isomers are lower.

4.2. Noncovalent Interactions. The evaluation of the results for the A24-CHNOF, 164 … 19 and AF6 167 benchmark sets for noncovalent interactions shows that statistically ODM2 and ODM3 are rather similar to OM2-D3T and OM3-D3T, respectively, for energies at the reference geometries (Table 10) and for the optimized geometries (Table 11). As expected, the OMx methods without dispersion corrections perform much worse. The ODMx methods are generally somewhat better than their OMx-D3T counterparts for predicting interaction energies in the hydrogen-bonded complexes of the A24-CHNOF, S22, S66, and JSCH-2005-CHNOF sets (Table 10).
The ODM3 method suffers from one particular problem that also plagues OM3 and OM3-D3T: 50,54 geometry optimization of carboxylic acid dimers leads to symmetric cyclic structures with equal O−H bond distances, i.e. the methods fail to differentiate between covalent and noncovalent O−H bonds in these dimers. We did find ODM3 parameter sets that fix this problem, but their overall performance for other properties was less satisfactory than that of OM3 or OM3-D3T, and hence they were discarded. This underlines again how difficult it is to achieve an overall balanced treatment of a large variety of target properties during parametrization.
Another problem common to many SQC methods 54 is the poor description of the HF dimer. The OMx and OMx-D3T methods give a cyclic structure with two equal H···F hydrogen bonds and strongly underestimate the interaction energy. ODM3 suffers from the same problem, whereas ODM2 yields a qualitatively correct geometry (Figure 2). ODM2 also gives an interaction energy of −2.2 kcal/mol, which is still too small but much closer to the reference value of −4.6 kcal/mol than the values obtained at the other levels (−1.2, −1.4, 0.5, 0.2, and 0.3 kcal/mol at OM2, OM2-D3T, OM3, OM3-D3T, and ODM3). Mainly because of the improved description of the HF dimer geometry, the MAE for selected angles in the A24-CHNOF set is much lower at the ODM2 level (4.9°) than at any other level (more than 8.5°).
Since the above sets contain only a few fluorine-containing noncovalent complexes, we also performed benchmarking on the X40×10-CHNOF set. This set is the subset of the X40×10 set constructed by excluding complexes containing elements beyond CHNOF. Reference geometries and interaction energies were taken from the original publication. 168

Table notes: (a) Charged amino acids excluded. (b) Two complexes are attributed to both the H-bonded and the charged-complex subsets. (c) Folding enthalpies were not calculated at the ODMx levels, because this would require geometry optimizations at these levels. (d) Errors in folding energies were calculated using uncorrected changes in heats of formation at 298 K calculated at the OMx and OMx-D3T levels.

In Table 12 we provide a statistical evaluation of the results for the vertical excitation energies of the VEE set that was included in the ODMx parametrization. The OMx/MRCI and OMx-D3T/MRCI results are trivially identical, because the D3T correction depends only on the nuclear coordinates and therefore shifts the ground-state and excited-state energies equally at a given geometry, leaving the excitation energy unchanged. The ODMx/MRCI results are generally superior to their OMx/MRCI counterparts: the overall MAE for the excitation energies is reduced by ca. 25% in the ODM2 case (0.35 vs 0.47 eV) and by ca. 20% in the ODM3 case (0.33 vs 0.42 eV). Singlet and triplet excitations are described with similar accuracy by all methods considered.
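The relative MAE reductions quoted above follow from the ratio (old − new)/old. A minimal sketch using only the overall VEE-set MAEs given in the text:

```python
def relative_reduction(mae_new: float, mae_old: float) -> float:
    """Fractional reduction of a mean absolute error: (old - new) / old."""
    return (mae_old - mae_new) / mae_old


# Overall VEE-set MAEs in eV quoted in the text: (ODMx/MRCI, OMx/MRCI).
pairs = {"ODM2 vs OM2": (0.35, 0.47), "ODM3 vs OM3": (0.33, 0.42)}

for label, (new, old) in pairs.items():
    print(f"{label}: {100 * relative_reduction(new, old):.1f}% lower MAE")
```

This gives about 25.5% for ODM2 and 21.4% for ODM3, consistent with the rounded values of ca. 25% and ca. 20% stated above.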
We also performed benchmarking on a previously introduced set of excited-state equilibrium geometries (called ExGeom) 55 and compared the results from the ODMx/MRCI, OMx/MRCI, and OMx-D3T/MRCI methods with reference results from time-dependent density functional theory (TDDFT) and the approximate second-order coupled cluster method (CC2). ODM2/MRCI performs very similarly to OM2/MRCI, as seen from the MAEs in bond lengths and bond angles given in Table 13 (comparison to TDDFT; see Table S5 for comparison to CC2). ODM3/MRCI is slightly superior to OM3/MRCI for bond lengths, consistent with the observations for ground-state covalent bonds computed at the ODM3/SCF and OM3/SCF levels. The accuracy for bond angles is similar for all methods. As expected, dispersion corrections have practically no effect on the excited-state geometries of these small molecules (compare the OMx/MRCI and OMx-D3T/MRCI results in Table 13).
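MAEs in bond lengths and bond angles such as those in Table 13 are obtained by extracting internal coordinates from Cartesian geometries and comparing them with the reference values. The sketch below assumes plain Cartesian coordinates in ångström; the helper names and the sample values are hypothetical, and the actual choice of bonds and angles in the ExGeom set is not reproduced here.

```python
import math


def bond_length(a, b):
    """Distance between two atoms given as (x, y, z) tuples."""
    return math.dist(a, b)


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b as the vertex atom."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(u, v))
    cos_t = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp against round-off before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def mae(predicted, reference):
    """Mean absolute error between paired lists of values."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Hypothetical bond lengths (angstrom): method vs. reference.
print(mae([1.21, 1.35], [1.20, 1.37]))
print(bond_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
```

Keeping length and angle errors as separate MAEs, as in Table 13, avoids mixing quantities with different units.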
Some brief remarks on specific molecules that had been addressed in our previous benchmarking 55 are as follows. The singlet and triplet excited-state geometries of formaldehyde are better described by the ODMx/MRCI methods than by their OMx/MRCI counterparts (Table S86). More generally, the excited-state CO bond lengths in formaldehyde, acetaldehyde, and acetone from ODMx/MRCI are closer to the experimental and TDDFT values than those from OMx/MRCI (Table S87), but they are still underestimated. The pyramidalization of these carbonyl compounds upon excitation is reproduced well by the ODMx/MRCI methods, similarly to TDDFT, CC2, and OMx/MRCI (Table S88). The nonlinear equilibrium geometries of acetylene in several excited states are qualitatively well described at both the ODMx/MRCI and OMx/MRCI levels (see the ∠CCH angles in Table S89); for two states (2¹A₂ and 2³A₂), however, acetylene is still predicted to be linear, whereas the reference TDDFT and CC2 calculations give slightly bent structures. Both the ODMx/MRCI and OMx/MRCI methods give excited-state structures of 9H-adenine, aniline, cytosine, and 9H-guanine with out-of-plane bending angles of the amino groups that are much too small compared to the reference TDDFT and CC2 results (Table S90).

Finally, we assess the performance of the ODMx/MRCI methods on the SKF (Send−Kühn−Furche) set of experimental 0−0 transition energies. 170 In view of the technical problems encountered in MRCI calculations of ZPVEs, 55 we compare theoretical values without ZPVEs to experimental values back-corrected using ΔZPVEs from (TD)DFT calculations. 55,170 As seen from Table 14, the ODM2/MRCI method is marginally better than OM2/MRCI for the 0−0 transition energies of the SKF set, while ODM3/MRCI performs statistically the same as OM3/MRCI. Again, as expected, the dispersion corrections have no significant effect for this set.
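The back-correction works because the 0−0 energy is the adiabatic energy difference plus the change in zero-point energy, E_00 = ΔE_adia + ΔZPVE with ΔZPVE = ZPVE(excited) − ZPVE(ground). Subtracting a (TD)DFT ΔZPVE from the experimental E_00 therefore yields a ZPVE-free target that can be compared directly with a computed adiabatic energy. A minimal sketch; the numerical values in the usage lines are hypothetical.

```python
def back_corrected_adiabatic(e00_exp: float, delta_zpve: float) -> float:
    """ZPVE-free adiabatic target from an experimental 0-0 energy (eV).

    E_00 = dE_adia + dZPVE, where dZPVE = ZPVE(excited) - ZPVE(ground),
    so dE_adia = E_00 - dZPVE.
    """
    return e00_exp - delta_zpve


def signed_error(e_adia_calc: float, e00_exp: float, delta_zpve: float) -> float:
    """Computed adiabatic energy (without ZPVE) minus back-corrected experiment."""
    return e_adia_calc - back_corrected_adiabatic(e00_exp, delta_zpve)


# Hypothetical values in eV; dZPVE is typically negative because excited
# states tend to have softer vibrational modes.
print(back_corrected_adiabatic(4.00, -0.15))
print(signed_error(4.30, 4.00, -0.15))
```

In this way, the ΔZPVE is taken from a different level of theory than the adiabatic energy itself, which sidesteps the MRCI frequency problems mentioned above.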
To conclude, we note that our current excited-state validations employ an MRCI treatment, which may no longer be feasible for the larger active spaces that are often required for larger molecules. To deal with such cases, our code includes efficient implementations of the CIS and SF-XCIS (spin-flip extended CIS) methods 171,172 that allow for practical SQC explorations of large systems (with little loss of accuracy).
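The feasibility argument can be illustrated by counting configurations. The number of Ms = 0 determinants in a full CI over an active space grows combinatorially with its size, whereas CIS has only n_occ × n_virt spatial single excitations. The sketch below is only an illustration of this scaling: an actual MRCI expansion is not a full CI and depends on the chosen reference configurations and excitation level.

```python
from math import comb


def full_ci_determinants(n_orbitals: int, n_electrons: int) -> int:
    """Number of Ms = 0 Slater determinants for a full CI in an active space
    (electrons split as evenly as possible into alpha and beta)."""
    n_alpha = n_electrons // 2
    n_beta = n_electrons - n_alpha
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)


def cis_singles(n_occ: int, n_virt: int) -> int:
    """Number of spatial singly excited configurations i -> a used in CIS."""
    return n_occ * n_virt


for n in (6, 10, 14):
    print(f"({n},{n}) active space: {full_ci_determinants(n, n)} determinants")
print(f"CIS, 100 occupied x 100 virtual: {cis_singles(100, 100)} singles")
```

Going from a (6,6) to a (14,14) active space increases the full-CI count from 400 to more than 10^7. In contrast, the CIS space grows only linearly with the number of occupied and virtual orbitals.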

CONCLUSIONS
In this work we present two new semiempirical quantum-chemical methods with integrated orthogonalization and dispersion corrections, ODM2 and ODM3 (ODMx). The electronic structure formalism is the same as in the established NDDO-based orthogonalization-corrected methods (OMx). In addition, the ODMx methods include D3 dispersion corrections with Becke−Johnson damping and three-body corrections E^ABC as an integral part, for a proper description of noncovalent interactions. Moreover, the total energy in the ODMx methods is defined in complete analogy to ab initio methods, and the traditional convention in NDDO-based methods of using SQC total energies directly in calculating heats of formation at room temperature is abandoned. Instead, ODMx heats of formation at 298 K are determined by explicitly computing ZPVE and thermal corrections within the harmonic-oscillator and rigid-rotor approximations.
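For reference, the standard ideal-gas harmonic-oscillator/rigid-rotor expressions for the ZPVE and for the thermal enthalpy H(T) − H(0 K) are sketched below. This illustrates only the textbook correction terms. How ODMx combines them with the electronic energy and atomic reference terms to obtain heats of formation is defined in the methods part of this work and is not reproduced here. The function names are hypothetical, and the input must contain only real vibrational wavenumbers (no zero or imaginary modes).

```python
import math

H = 6.62607015e-34       # Planck constant, J s
C_CM = 2.99792458e10     # speed of light, cm/s
KB = 1.380649e-23        # Boltzmann constant, J/K
NA = 6.02214076e23       # Avogadro constant, 1/mol
J_PER_KCAL = 4184.0
R_KCAL = KB * NA / J_PER_KCAL  # gas constant, kcal/(mol K)


def zpve_kcal(wavenumbers_cm):
    """Harmonic zero-point vibrational energy, kcal/mol: sum of h c nu / 2."""
    return sum(0.5 * H * C_CM * w * NA for w in wavenumbers_cm) / J_PER_KCAL


def thermal_enthalpy_kcal(wavenumbers_cm, temperature=298.15, linear=False):
    """H(T) - H(0 K) for an ideal gas, excluding ZPVE, in kcal/mol:
    translation + rigid rotation + harmonic vibration + pV (= RT)."""
    rt = R_KCAL * temperature
    trans = 1.5 * rt
    rot = rt if linear else 1.5 * rt
    vib = 0.0
    for w in wavenumbers_cm:
        x = H * C_CM * w / (KB * temperature)  # dimensionless h c nu / kT
        vib += rt * x / math.expm1(x)          # N_A h c nu / (exp(x) - 1)
    return trans + rot + vib + rt


# Hypothetical nonlinear molecule with three modes (cm^-1).
modes = [1000.0, 1500.0, 3000.0]
print(zpve_kcal(modes), thermal_enthalpy_kcal(modes))
```

High-frequency modes contribute essentially nothing to the thermal term at 298 K, so for small molecules the correction is dominated by the 4RT (nonlinear) or 3.5RT (linear) translational, rotational, and pV contributions, while the ZPVE can be much larger.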
Compared with the previous OMx development, the parametrization of the ODMx methods targeted a much broader range of reference properties, including vertical excitation energies. To ensure a balanced description of a large variety of ground-state and excited-state properties as well as noncovalent interactions, we employed a novel robust parametrization procedure and carefully chosen, representative training sets.
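The novel parametrization procedure itself is not spelled out in this section. As a generic illustration only, NDDO-type parametrizations conventionally minimize a weighted sum of squared deviations from reference data, and the relative weights of the property classes are what balances, for example, heats of formation against geometries and excitation energies. The sketch below shows such a penalty function. It is not the ODMx procedure, and the property names and weights are hypothetical.

```python
def weighted_sse(calculated, reference, weights):
    """Weighted sum of squared deviations: sum_i w_i * (calc_i - ref_i)**2."""
    return sum(w * (c - r) ** 2 for c, r, w in zip(calculated, reference, weights))


def objective(property_sets):
    """Total penalty over several property classes, each given as a
    (calculated, reference, weights) triple of equal-length lists."""
    return sum(weighted_sse(*data) for data in property_sets.values())


# Hypothetical training data; weights also absorb the different units.
training = {
    "heats_of_formation": ([10.2, -5.1], [10.0, -4.0], [1.0, 1.0]),  # kcal/mol
    "excitation_energies": ([4.1], [3.9], [25.0]),                   # eV
}
print(objective(training))
```

In an actual parametrization, an optimizer would vary the semiempirical parameters to minimize this objective, with the computed values recalculated at each step.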
The performance of the ODMx methods was evaluated for a large and diverse collection of accurate reference data. The ODMx methods are found to perform overall somewhat better than the OMx methods for ground-state and excited-state properties, while their accuracy for noncovalent interactions is similar to that of the dispersion-corrected OMx-D3T methods. They are also formally more consistent: since they were parametrized with integrated dispersion corrections, there are no problems arising from the a posteriori addition of attractive dispersion terms to SQC methods parametrized without them. Consequently, heats of formation at 298 K are well described by the ODMx methods but are systematically too small for the dispersion-corrected OMx-D3T methods. Moreover, the redefinition of the total energy (in analogy to ab initio methods) removes the ambiguities caused by associating SQC total energies directly with heats of formation at 298 K, as traditionally done in NDDO-based SQC methods. We therefore recommend the ODM2 and ODM3 methods as standard tools for fast electronic structure calculations. To widen their scope, we plan to extend them to heavier main-group elements.
The ODM2 method is the most complete model, shows good performance in our benchmarks, and would thus normally be the method of choice. Of course, SQC application studies should generally begin with a careful validation, and it may turn out that another SQC method is more appropriate for a particular problem, which should then be chosen for the actual production work. The benchmark results reported here and in our previous studies 50

We thank the European Research Council for financial support through an ERC Advanced Grant (OMSQC).

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS
We thank Deniz Tuna for his help with the excited-state benchmarks.