Theoretical Infrared Spectra: Quantitative Similarity Measures and Force Fields

Infrared spectroscopy can provide significant insight into the structures and dynamics of molecules of all sizes. The information that is contained in the spectrum is, however, often not easily extracted without the aid of theoretical calculations or simulations. We present here the calculation of the infrared spectra of a database of 703 gas phase compounds with four different force fields (CGenFF, GAFF-BCC, GAFF-ESP, and OPLS) using normal-mode analysis. Modern force fields increasingly use virtual sites to describe, e.g., lone-pair electrons or the σ-holes on halogen atoms. This requires some adaptation of code to perform normal-mode analysis of such compounds, the implementation of which into the GROMACS software is briefly described as well. For the quantitative comparison of the obtained spectra with experimental reference data, we discuss the application of two different statistical correlation coefficients, Pearson and Spearman. The advantages and drawbacks of the different methods of comparison are discussed, and we find that both methods of comparison give the same overall picture, showing that present force field methods cannot match the performance of quantum chemical methods for the calculation of infrared spectra.


■ INTRODUCTION
Infrared (IR) spectroscopy is a fast, cheap, and nondestructive method that has for a long time been a common tool of analytical chemists. Especially in recent times, its use has been extended to a wide variety of fields, ranging from pharmaceutical applications, 1 to food sciences 2,3 to medicinal applications, 4,5 and even to entomology. 6 The obtained spectra hold detailed information on molecular structure and dynamics, to the level that it can be used to analyze the hydrogen bonding patterns and structural details of liquid water. 7 However, the interpretation of spectra requires experience and often detailed previous understanding of the system in question. 8 At this point, computational tools can assist by predicting the spectra for compounds that have not, or possibly (as, e.g. short-lived reaction intermediates) cannot, be isolated to obtain their pure spectrum. In practice quantum chemistry (QC) and, especially, density functional theory (DFT) have been widely used for the purpose of predicting IR spectra, particularly since a vibrational analysis is also necessary for the characterization of stationary structures, as well as for thermochemistry and reaction kinetics. The range of systems that can be treated by DFT is, however, limited by the computational cost. It was realized quite some time ago that the IR spectrum of a protein is related to its structure. 9 Since then there has been a significant increase in the use of IR spectroscopy for the characterization of structure and interactions of proteins, 10 and even their structural dynamics. 11 For computational studies of these types of systems, and ground state compounds in general, force field based methods can in principal be applied. There is, however, currently limited understanding as to the reliability of vibrational spectra as predicted by force field methods.
A lot of work has been done on the evaluation of vibrational frequencies from molecular models, and indeed the MMx family of force fields were optimized in part to reproduce frequencies. 12−15 We have recently studied force field thermochemistry and evaluated entropies, heat capacity at constant volume, and internal energy 16 for the CHARMM General Force Field (CGenFF) 17 and the General Amber Force Field (GAFF). 18 These models are used routinely for a large number of applications ranging from liquids simulations, 19,20 over free energy of solvation calculations 21,22 to drug design, 23,24 rendering it essential to understanding their capabilities and limitations as well as possible. In the development of both GAFF 18 and CGenFF, 17 vibrational frequencies from quantum chemistry calculations were used to derive force constants; one would therefore expect the vibrations to be reproduced reasonably well. We note that a GAFF topology server, provided by Procacci, 25 also provides infrared spectra of the compound submitted to the server.
In continuation of our previous work, 16 we study here the performance of force fields for the calculation of infrared spectra from normal modes and the associated changes in dipole moments. In addition to the force fields used in our previous studies, we include here the widely used Optimized Potential for Liquid Simulations (OPLS). 26,27 These spectra obtained from force field calculations are, together with quantum chemistry data from the Alexandria library, 28,29 compared to experimental reference spectra.
Representing spectra as a linear vector of intensities provides the possibility to use a number of simple mathematical procedures for their quantitative comparison. For a long time, statistical correlation coefficients have been used for the quantitative comparison and identification of IR spectra. 30 In the meantime, a multitude of other methods has been developed for this purpose. 1 However, these newer methods mostly focus on the comparison of experimentally obtained spectra with database references; i.e., they are mostly developed to cope with experimental issues such as admixtures of other compounds, experimental noise, and often specific sets of compounds. Such features are rather different from the type of difference we can expect to observe in this study, where the question is whether the computed spectra at all exhibit the correct features. Therefore, we selected the well-known Pearson correlation coefficient and the Spearman correlation coefficient for this study, as described in Methods.

■ METHODS
Calculation of Infrared Spectra. Calculation of the infrared spectra was based on the Gaussian-4 calculations and B3LYP/aug-cc-pVTZ optimized structures of compounds contained in the Alexandria library. 28,29,31 Normal-mode analysis using the CGenFF, GAFF-BCC, and GAFF-ESP force fields has been previously described in ref 16. For the OPLS force field calculations, the topologies were generated using the python version of LigParGen. 32 Integer molecular charges were assigned using RDkit 33 with the mmff94 force field. Partial charges were calculated using the 1.14*CM1A method. 34,35 Using these topologies, a normal-mode analysis was conducted with GROMACS 36 as described previously. 16 The normal modes are based on the calculation of the Hessian matrix, , of second derivatives of the energy with respect to the atomic coordinates q where i and j run from 0 to N − 1 if N is the number of atoms in the compound. If virtual sites v are used, for instance, to model the σ-hole for halogen atoms in CGenFF, 17 the energy, E, depends on the positions of both atoms and virtual sites; that is, E = E(q 0 ,...,q N−1 ,v 0 ,v M−1 ), where the positions of the M virtual sites v in the compound are a function of the atomic coordinates q. The influence of the virtual site positions on the Hessian needs to be taken into account when computing . In GROMACS, 37 this calculation is done numericallythe N atoms are moved independently in all three spatial dimensions and the forces are computed. From these forces, the second derivative of the energy is then evaluated numerically. From the 2019 version of GROMACS, virtual sites positions are updated before each force calculation, which means that their influence on the Hessian is taken into account explicitly. The force field files used in this work are available at the http:// virtualchemistry.org repository. 38 Vibrational frequencies obtained using common quantum chemical methods are known to suffer from systematic errors, mainly due to basis set incompleteness, neglect of anharmonicity, and incomplete treatment of electron correlation. 39 In order to balance this systematic deviation, a multitude of scaling schemes have been developed, and at least single scaling factors have been determined for a large number of methods and basis sets. 40−42 For the frequencies obtained from the quantum chemical methods we applied single scaling factors of 0.965 for B3LYP/6-31G(2df,p) and 0.968 for B3LYP/aug-cc-pVTZ. 43 The unscaled frequencies at the B3LYP/6-31G-(2df,p) level of theory were extracted from the G4 calculations of the Alexandria library. Other values have been suggested, 44 but as will be discussed below, they give slightly worse results in this study. Therefore, we use the scaling factors given here for all spectral comparisons in this study unless noted otherwise. We also tested the applicability of single scaling factors for frequencies derived from force field calculations, as described in the following section. These, however, did not lead to any improved agreement with the experimental spectra.
For the calculation of a full IR spectrum, in addition to the eigenfrequencies of the normal modes, the intensities and the line shapes are required. In the case of the quantum chemical calculations, both the eigenfrequencies and the corresponding IR intensities are produced by default when a frequency calculation is requested in the Gaussian software, 45 and were thus readily available from the Alexandria library. Details of the quantum chemical calculations from which the frequencies were obtained have been presented previously (see refs28 and 29). For the force field calculations, the intensities I n were derived from the transition dipole derivatives: where k iterates over Cartesian dimensions, p is the dipole moment of the molecule, and Q n is the normal coordinate n.
In order to take into account virtual sites v, we note that and rewrite eq 2 as where s iterates over the N atomic coordinates. To make the calculation of intensities practical, the numerical derivative of the dipole moment with respect to the atomic coordinates Each pair of eigenfrequency and intensity is represented by a Lorentz distribution, centered at the eigenfrequency and multiplied by the intensity. The full width at half-maximum (fwhm) of each distribution was set to 24 cm −1 , as suggested by Mott and Rez 46 on the basis of the estimated bandwidth observed in the NIST database, 47 which is also the source of our experimental reference spectra in this work. However, as will be discussed below, the exact value used for the bandwidths was found to be of only minor importance for any of the analyses presented in this study. Both frequency range and resolution of the calculated spectra were matched to those of the corresponding experimental spectrum (most commonly found values are in a range from 450 to 3966 cm −1 with a resolution of 4 cm −1 ). To obtain the entire spectrum of a molecule, all Lorentz distribution vectors for that molecule were added. Finally, the resulting spectra were normalized with respect to the area under the curve.
Quantitative Evaluation of Spectra. For the comparison of the calculated spectra with the reference spectra we used two statistical measures, the first of these being Pearson's product moment correlation coefficient: where x i and y i are the elements of the intensity vectors representing the spectra under comparison and x ̅ and y ̅ the mean values of the x i and y i values, respectively. The Pearson correlation coefficient provides a measure for the linear correlation between two vectors and has been used in several studies comparing IR and Raman spectra. 48−51 The second statistical measure applied in this study is Spearman's rank correlation coefficient: where d i is the difference between the ranks of x i and y i in their respective data set and n the number of elements in each vector. This coefficient provides a measure for the monotonic correlation between the two vectors. Compared to Pearson's correlation coefficient, it is generally less sensitive toward outliers and can also represent a nonlinear relationship. To the best of our knowledge, this measure has hitherto only been used in one study comparing calculated with experimental vibrational spectra. 48 In addition to the statistical treatment, a selection of spectra were compared visually. This was primarily done for compounds giving the lowest and highest correlation coefficients of both types, as well as those where we observed a significant difference between the two measures. Finally, we have utilized both statistical measures to determine whether scaling factors can be determined for the force field methods in a fashion similar to quantum chemical methods. Because extraction of the underlying eigenfrequencies from the large number of experimental spectra in parts containing many overlapping bands did not seem feasible, we have chosen a more direct approach of maximizing the respective statistical measure as a function of the scaling factor, using Brent's method as implemented in Scientific Python. 52

■ RESULTS AND DISCUSSION
We computed the theoretical spectra using all six computational approaches for 703 of the compounds for which experimental reference spectra from gas phase measurements were available. The central limiting factor for this number isbesides the availability of experimental spectrathe ability to generate valid topologies for the compounds for all force field methods used in this study. The generated spectra are available at the compound pages of the http://virtualchemistry.org web site.
The two statistical measures used for the quantitative comparison between the calculated spectra and the experimental spectra for the same substance are listed in Table 1 averaged over chemical classes and over all compounds. The corresponding compound-wise table can be found in the Supporting Information as Table S1. A number of things become immediately clear from the data shown in Table 1. In terms of reproduction of the experimental spectra, the quantum chemical methods perform clearly better, independent of which measure is used to quantify the agreement. The larger basis set gives marginally better results overall (Pearson's correlation is essentially the same; Spearman's correlation, slightly higher), which is expected as frequencies obtained from B3LYP calculations tend to improve with basis set size. 53 Among the force field methods, CGenFF performs best, and GAFF, weakest, with ESP derived charges giving somewhat worse results than BCC charges. Here it should be noted that the ESP charges used with GAFF in this study have not been derived with the same quantum chemical method that was used in the development of the force field (B3LYP/aug-cc-pVTZ instead of HF/ 6-31G*). 29 However, the difference between the results of the two GAFF models is relatively small and might be biased due to the selection of compounds being compared, as GAFF with BCC charges performs better especially for aliphatic compounds (cf. Table 1), which make up more than a third of all compounds. GAFF-ESP, on the other hand, performs better especially for thiols, amides, and amines, which only stand for 73 compounds in total. Because the spectra derived with the two different charge models mainly differ in the intensity of the peaks, we can speculate that the weaker performance of GAFF-ESP for aliphatic compounds is due to some of the assigned charges being poor representations of the atoms' contribution to the dipole moment (carbon atoms tend to get large positive charges; hydrogen atoms, sometimes negative charges). This would affect aliphatic compounds more strongly than others, as the charges in them are generally small, leading to small deviations having a large relative impact. At the same time, these differences in correlation coefficients that can be observed between the two charge models for GAFF also give a first estimate as to what degree the deficient agreement with the experimental spectra is due to poorly matched intensities. An improved assignment of charges would likely improve the correlation coefficients approximately to this extent. More advanced charge models, e.g., use of polarizable charges, could likely expand this range and allow some further improvement. However, in order to match the performance of the QC methods, likely more fundamental revision of the force fields would be necessary.
In terms of performance for different classes of compounds, all methods gave comparatively good results for alkenes, and force fields also for alkanes, both independent of the measure used for quantification. On the other hand, arylfluorides were found to be problematic for all methods, and amines for force fields, although less so for quantum chemical methods. In the case of the correlation coefficients found for amines for the quantum chemically derived spectra, it becomes clear that the two statistical measures at least to some degree are sensitive to different properties of the compared spectra. For both quantum chemical methods, the Pearson correlation gives a comparatively low score, while the Spearman correlation coefficient is close to the average for all compounds. These discrepancies become even more clear in the case of cycloalkanes, for which the Pearson correlation is one of the two best and the Spearman correlation one of the two worst for both QC  Only classes containing at least five compounds are shown. The two highest correlation coefficients of each type for each method are marked in bold; the two lowest, underlined. In the last line, the standard deviation of the correlation coefficient over all molecules is given. Please note that the sum of the compounds in all classes is larger than the total number of molecules, as each molecule can belong to several classes.

Journal of Chemical Theory and Computation
pubs.acs.org/JCTC Article methods, and for amides, where the Pearson correlation is one of the lowest and the Spearman correlation one of the highest when the spectra are generated on the B3LYP/aug-cc-pVTZ level. In most cases, however, the two measures agree well in terms of relative ordering of agreement for both classes as well as individual compounds. However, the value of the Spearman correlation is nearly always higher (in the class averaged data, the only exceptions are cycloalkanes, and cycloalkenes for both QC methods, heterocyclic compounds for B3LYP/6-31G-(2df,p), and thiols for B3LYP/aug-cc-pVTZ).
In order to understand the differences between the two correlation coefficients, we have selected four compounds for which the scores are distinctively different. These are shown in Figure 1, the corresponding correlation coefficients are listed in Table 2. cis-Decalin gives exceptional results in the way that for all but the OPLS-generated spectra the Pearson correlation coefficient is larger than the Spearman correlation coefficient, and above the mean value for all compounds. At the same time, the Spearman correlation is below average for all methods. From Figure 1a it can be seen that the experimental spectrum of cis-decalin is dominated by two overlapping bands just below 3000 cm −1 . This feature is in general reproduced by all methods, with the QC methods placing it at somewhat high wavenumbers, and all force field methods assigning it a far too small intensity as they all produce excessive bands with significant intensity, especially around 1500 cm −1 , where only a small low-intensity band is observed in the experimental and QC spectra. All force field methods except for OPLS place the high-wavenumber feature close to the correct wavenumber, whereas OPLS produces an intensity minimum where the experimental spectrum has its maximum. This is also the only significant and systematic difference between the OPLS generated spectrum and the group of the three other force field generated spectra, which therefore is likely the reason for the large difference in terms of Pearson correlation coefficients (Spearman correlation coefficients are much more similar). It seems that reproduction of a dominant feature in the reference spectrum has a decisive impact on the Pearson correlation coefficient. This is not unexpected, as the coefficient is determined by multiplication of the (mean-adjusted) intensities. At least for  Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article this example case, the Spearman correlation coefficient could thus be seen to provide a better estimate of the overall similarity of the spectra. The second example set of spectra, shown in Figure 1b, is for the compound 1,3-oxazole. In this case the two correlation coefficients give a similar picture for each of the force fields (to the degree that Spearman correlation generally gives higher values than Pearson correlation), i.e., a rather poor agreement with the experimental spectrum. For the QC methods, however, we also observe here that the Pearson correlation coefficients are significantly higher than the Spearman correlation coefficients. For this compound, the experimental spectrum does not have a single dominant feature but shows a multitude of lower intensity bands below approximately 1800 cm −1 . The majority of these bands is reproduced by the QC derived spectra, albeit with some comparatively minor deviations in both intensities and wavenumbers. Still the comparison between the QC spectra with experimental spectrum produces rather low Spearman correlation coefficients. The reason for this could be that the small range of intensity values covered in these spectra can lead to rather large differences in the rank of a data point, as used for calculation of the Spearman correlation, already from small differences in intensity. In this case, it seems, it is rather the Pearson correlation coefficient that provides an appropriate estimate of the similarity of the spectra. Figure 1c shows the spectra of ethyl sulfate. For this compound the Spearman correlation coefficients indicate good agreement for all methods, while the Pearson correlation coefficients suggest poor agreement, except for CGenFF where it indicates low to intermediate agreement. Indeed, the agreement of the calculated spectra with the experiment seems rather poor for this compound, except for CGenFF, which reproduces the positions of the bands rather well, even if the agreement of the intensities is not very good. Thus, here it again seems that the Pearson correlation represents the actual similarity between the spectra better, although one might argue that it underestimates the agreement of the CFenFF spectrum somewhat. The considerable overestimation of the agreement of the spectra by the Spearman correlation for this compound might be due to the small number of features, which are mostly concentrated to the low-wavenumber region of the spectrum (below 1500 cm −1 ), leading to separate regions with data points of high and low rank, respectively. Only reproducing this separation into regions is then sufficient to give a high Spearman correlation coefficient, even if the actual bands are poorly reproduced. This effect might actually be the reason behind the observation that the Spearman correlation generally gives higher scores than the Pearson correlation, as vibrational spectra in general share a common structure with bands below 2000 cm −1 , around 3000 cm −1 , and 3600 cm −1 , and with the remaining regions of the spectrum generally devoid of bands. A similar reason can also be suspected behind the very different scores obtained for the final example shown in Figure 1diethyl oxalate. Here, the scores obtained for the QC derived spectra match reasonably well, indicating good agreement of the spectra, with the Pearson score being somewhat higher for the spectra obtained with the 6-31G(2df,p) basis set, as the band around 1800 cm −1 is reproduced better. For the force field methods, on the other hand, the Pearson correlation coefficient indicates close to no correlation at all, with slightly better scores for the GAFF derived spectra, in which the band around 1200 cm −1 is at least reproduced approximately. The Spearman correlation, however, indicates still an intermediate to good agreement for the force field methods (with the GAFF derived spectra actually having the lowest scores). Once more, one tends to agree more with the Pearson correlation.
As discussed in the previous section, it is known that QC methods give systematic deviations for vibrational frequencies.
Several of the factors behind these systematic deviations, also apply directly, e.g., the harmonic approximation used in calculating the Hessian, or indirectly, through parameters derived from QC calculations, to spectra calculated using force field methods. We have therefore tested whether scaling factors, as are used for QC methods, could be obtained for the force field methods used in this study as well. The scaling factors resulting from a maximization of the two statistical measures we are using here to quantify the spectral similarity are shown in Table 3. Since our methodology to derive the scaling factors differs significantly from the method commonly employed for the determination of QC scaling factors, i.e., minimizing the RMSD of peak wavenumbers, we also applied our methodology to the two QC methods used in this study. For the QC methods, both statistical measures gave rise to a scaling factor that only differed insignificantly (cf. ref 54) from the literature values and lie consistently between the values obtained from the two different databases. Use of Spearman's correlation coefficient gives a slightly higher value, i.e., somewhat closer to the scaling factors found in the CCCDBD. At the same time, application of the scaling factor also improves the correlation coefficients considerably (the unscaled QC methods actually perform worse than at least CGenFF). For the force field methods, on the other hand, none of the optimized scaling factors differs significantly from unity, and also the improvement of correlation coefficients from applying them is marginal at best.  44 ) values and correlation coefficients resulting from these for the quantum chemical methods are given in the second line. It should be noted that the exact combination of functional and basis set used here is not available in the UMN database. The value used is given for the marginally larger 6-31G(2df,2p) basis set. b CCCBDB. c UMN.
Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article We also attempted the same type of optimization for the width of the Lorentzian used for the calculation of the spectra, the results of which are shown in Table 4. Although, for the Pearson correlation, in all cases a significant improvement of the correlation coefficient could be achieved, the obtained optimized bandwidths are clearly unphysical. Also, in general, the lower the initial correlation score was, the larger the optimized bandwidth was obtained. For the Spearman correlation, the increase in coefficients upon optimization was negligible, and also the dependence on the starting score is nonmonotonic. Still, even using this score for optimization of the bandwidths led, in all cases, to increased bandwidth and, in most cases, to clearly unphysical values. This is very likely caused by the fact that unless the peak positions are very well matched between the calculated and experimental spectrum, a widening of the bands will always lead to an increased overlap between corresponding bands and therefore an improved correlation coefficient. This behavior is more strongly and immediately observed for the Pearson correlation that is derived directly from the intensities than for the Spearman correlation that first converts the values to ranks (in a spectrum containing only a single peak, the ranks of the data points would actually not be affected at all by the bandwidth, as long as the Lorentzian is not truncated). In any case, close to the value of 24 cm −1 used in this study, the dependence of the correlation coefficients on the width is small for Spearman's correlation. For Pearson's correlation it is larger, but similar for all methods; i.e., within a physically reasonable range the exact value used for the calculation of the spectra does not have a significant impact on the comparison of the different methods. This is demonstrated in Table S2, where the variation of the correlation coefficients for fwhm within the range from 6 to 60 cm −1 for all methods is given.

■ CONCLUSION
We have here demonstrated the calculation of infrared spectra using normal-mode analysis with a number of available general force fields in comparison with quantum chemistry. Also, we have studied the applicability of two different statistical measuresPearson's product moment correlation and Spearman's rank correlation coefficientsfor the specific purpose of quantifying the quality of a calculated spectrum using the gas phase experimental spectrum of the same compound as a reference. Independent of the statistical measure used, it has become clear that the performance of force fields for IR spectrum calculations currently still leaves immense room for improvement. We suggest that future development of force fields takes this into account and makes use of the methodology we have demonstrated in the article.
In terms of applicability of the two statistical measures, we found that both have benefits and drawbacks depending on the type of spectrum. For spectra with a single (or few) dominant, high-intensity band(s), Pearson's product moment correlation tends to overrate the overlap with this feature, giving a high coefficient even if the remaining smaller bands are only poorly, or not at all, reproduced. Spearman's rank correlation, on the other hand, gives overly high coefficients especially for spectra with only a few features that can be very poorly reproduced by the calculated spectrum. Conversely, Spearman's correlation can be said to better represent approximate matches of bands, whereas Pearson's correlation better indicates agreement of the most dominant feature(s). With this in mind the two measures can be used complementarily. In any case, It should be noted that, in the majority of cases, both coefficients used here give a similar and appropriate picture of the degree to which the spectra agree (with Spearman correlation coefficients generally being somewhat higher than their Pearson counterpart).
Finally, we demonstrated that, through optimization of the statistical measure, one can reproduce the scaling factors for vibrational frequencies used for QC methods and that the vibrational frequencies obtained from force fields do not exhibit similarly systematic deviations from experimental reference. Thus, it is not possible to improve the spectra obtained from force field calculations by application of a scaling factor. Also, optimization of the width of the Lorentzian function used in constructing the spectra is not possible using the correlation coefficients, as with the present large deviations in peak positions the overlap of spectra is increased by widening the bands even into an unphysical regime.