Least-Squares Fitting of Multidimensional Spectra to Kubo Line-Shape Models

We report a comprehensive study of the efficacy of least-squares fitting of multidimensional spectra to generalized Kubo line-shape models and introduce a novel least-squares fitting metric, termed the scale invariant gradient norm (SIGN), that enables a highly reliable and versatile algorithm. The precision of dephasing parameters is between 8× and 50× better for nonlinear model fitting compared to that for the centerline-slope (CLS) method, which effectively increases data acquisition efficiency by 1–2 orders of magnitude. Whereas the CLS method requires sequential fitting of both the nonlinear and linear spectra, our model fitting algorithm only requires nonlinear spectra but accurately predicts the linear spectrum. We show an experimental example in which the CLS time constants differ by 60% for independent measurements of the same system, while the Kubo time constants differ by only 10% for model fitting. This suggests that model fitting is a far more robust method of measuring spectral diffusion than the CLS method, which is more susceptible to structured residual signals that are not removable by pure solvent subtraction. Statistical analysis of the CLS method reveals a fundamental oversight in accounting for the propagation of uncertainty by Kubo time constants in the process of fitting to the linear absorption spectrum. A standalone desktop app and source code for the least-squares fitting algorithm are freely available, with example line-shape models and data. We have written the MATLAB source code in a generic framework where users may supply custom line-shape models. Using this application, a standard desktop fits a 12-parameter generalized Kubo model to a 106 data-point spectrum in a few minutes.

on left panel indicate the center-line of the 2020 experimental data. We overlay the same center-lines onto the residual spectra on the right panel as an aide in understanding how the experimental lineshape differs from the best fit model and how this residual difference might explain why the 2020 CLS decay is so different from the 2021 CLS decay.
Video S9: https://youtu.be/WxwiD7DwLTw Description: Comparative plots of experimental (left), model fit (middle), and residual (right) 2D IR waiting-time spectra of MeSCN in DMSO collected in 2021. Yellow dots on left panel indicate the center-line of the 2020 experimental data. We overlay the same center-lines onto the residual spectra on the right panel as an aide in understanding how the experimental lineshape differs from the best fit model and how this residual difference might explain why the 2020 CLS decay is so different from the 2021 CLS decay.
Video S10: https://youtu.be/kIbpuhalFzY Description: Recording of model fitting update window, which includes fitting trajectories of 10/12 parameters, for 100 trials of 2D IR waiting-time spectra representative of a cyanylated cysteine reporter in Calmodulin protein (SNR 100:1) based on parameters reported by Schmidt-Engler and coworkers, 2 including phasing errors (RMSD 6°) but without a phasing error fitting parameter.
Video S11: https://youtu.be/MdaV4_uKzZU Description: Model fitting results plotted as fit/true ratio for 100 trials of simulated 2D IR waiting-time spectra representative of a cyanylated cysteine reporter in Calmodulin protein (SNR 100:1) based on parameters reported by Schmidt-Engler and coworkers, 2 including phasing errors (RMSD 6°) but without a phasing error fitting parameter.
Video S12: https://youtu.be/pvm2vhSMm7k Description: Recording of model fitting update window, which includes fitting trajectories of 10/13 parameters, for 100 trials of 2D IR waiting-time spectra representative of a cyanylated cysteine reporter in Calmodulin protein (SNR 100:1) based on parameters reported by Schmidt-Engler and coworkers, 2 including phasing errors (RMSD 6°) and fitting parameter to account for phasing error.
Video S13: https://youtu.be/3XQO1OYDWHE Description: Model fitting results plotted as fit/true ratio for 100 trials of simulated 2D IR waiting-time spectra representative of a cyanylated cysteine reporter in Calmodulin protein (SNR 100:1) based on parameters reported by Schmidt-Engler and coworkers, 2 including phasing errors (RMSD 6°) and fitting parameter to account for phasing error.

B. Asymmetry of Kubo Lineshape
As shown in Figure S1., the generic 2D Kubo lineshape has a frequency dependent asymmetry to it. As a result, fitting a symmetric function (e.g. Lorentzian, Gaussian, Voigt) will yield inaccurate center-lines for CLS method. We therefore prefer asymmetric fitting functions for measuring the center-line (e.g. a Lorentzian + linear term).

C. Lineshape Model
The usual treatment for computing the 2D IR spectrum in the frequency-frequency domain is to first compute the rephasing spectrum in the (t 1 , t 3 ) = (−, +) quadrant and the nonrephasing spectrum in the (t 1 , t 3 ) = (+, +) quadrant, then FFT along the pump axis (t 1 → ω 1 ), and finally FFT along the probe axis (t 3 → ω 3 ). For model fitting in the (t 1 , ω 3 ) domain, this would require a total of three FFT's. We prefer the more direct route shown in Eq. S1 and Eq. S2 where we compute the rephasing spectrum in the (t 1 , t 3 ) = (+, −) quadrant. This ensures t 1 ≥ 0 and requires only one FFT along the probe axis (t 3 → ω 3 ) to reach the (t 1 , ω 3 ) domain. Here R + denotes nonrephasing, R − denotes rephasing and ϕ accounts for a symmetric phasing error when applicable. Note that R + and R − are analytic response functions, and therefore DC components require multiplication by ½ prior to FFT. 4 Eq. S1 Figure S1. Asymmetry of generic 2D Kubo lineshape for a two-level system (omitting the anharmonic peak). (A) Slices of 0-1 peak along the probe axis for a 2D Kubo lineshape at early waiting time taken at three different pump frequencies (red is left of center, black is center, blue is right of center). The first moment of each slice is illustrated by a vertical band. Results clearly show frequency dependent asymmetry of the Kubo lineshape. (B) Same as for (A) but now along the pump axis.
For simplicity, we assume stimulated emission and ground state bleach are equivalent, and hence, R ± is given by Eq. S3 where ESA denotes excited state absorption and GSB denotes ground state bleach.
Eq. S3 Because computers use floating-point arithmetic to do math, we expect small numerical errors. In general, the following guidelines are recommended to minimize floating-point errors: 5 1. We prefer Multiplication and Division, which are the most reliable operations. 2. We avoid Addition and Subtraction whenever possible, which are particularly susceptible to roundoff errors when: a. The two values differ by many orders of magnitude. b. The two values are nearly equal and subtracted (actually, this one is so egregious it has its own name: cancellation error). 3. We limit the number of operations as much as possible because floating-point errors propagate and magnify with every operation.
In fact, given enough calculations, floating-point error can grow so large that a fitting problem may become ill-conditioned just from the accumulated error alone. Though we have not confirmed that this is critical in computing nonlinear lineshape functions, we note that the common written expression for the multidimensional Kubo lineshape is ideal for floating-point errors (lots of addition and subtraction!). Therefore, in programming lineshape functions, we exercise caution and consider the three guidelines above. Their expression below accurately reflects our MATLAB code, though we acknowledge that our expressions may not be the "best" or most "stable" means of computing the lineshapes.
The full lineshape for GSB is shown in Eq. S4 where ω 01 is the 0-1 center frequency, ω s3 is the DC-shift frequency along the probe axis, ω s1 is the DC-shift frequency along the pump axis, and δω 1 is the calibration mismatch error between the pump and probe axis (typically < 1 cm −1 ). The lineshape for ESA is provided by Eq. S5 where Δ Anh is the anharmonicity.
As noted above, these expressions differ from the usual presentation in literature in order to accurately reflect our programming, which considers the guidelines above for minimizing floating-point error. Note that expm1 is a standard function dedicated to computing a stable form of "exp( ) − 1", which is otherwise susceptible to cancellation error near = 0.

D. Derivation of the Variance-Covariance Matrix
We denote true parameters as * �N p × 1� and a perturbation in data (e.g. noise) as (N D × 1) such that = ( * ) + . For an unperturbed data set ( = ) located at the global minimum ( = * ), we have that = by definition of a minimum (in the main text, is equivalent to ). Now suppose the data is perturbed ( ≠ ) which shifts the global minimum to * + . Then for infinitesimally small , the total differential of in Eq. S9 relates and via d � ( , )� = 0, which is true because ( , ) = 0 at both minima.
We rearrange Eq. S9 to show that = −� � − � � assuming is invertible. To compute , we must apply to = −2 T − . The residual T = ( − ) is the only factor in that depends on . Given that ( ) is equal to the identity matrix, it follows that = (−2 T − ) = −2 − where an extra transpose is necessary when evaluating to ensure the dimensions of (N p × N D ) are compatible with multiplication by (N D × 1). The differential occurs about the unperturbed coordinates where = , so we evaluate = 2� T − − � = 2 T − where j,k = T − 2 / p j p k = 0. Upon substitution we find is linearly related to the perturbation by Eq. S10 (i.e. error propagates linearly from the data to fit parameters).
Eq. S10 is easy to test. One may conjure up a model , add to it any arbitrary noise or perturbation (with − ∝ ⟨ ⟩ − ), and confirm by model fitting that the perturbation in the fit is correctly predicted by Eq. S10 for sufficiently small noise. See section H below for instruction for verifying Eq. S10 using the model fitting GUI.

Computing
is straightforward now: ≝ ⟨ ⟩. That is, the parameter variancecovariance matrix is the expectation of the outer product where we denote the expectation of noise by brackets. The outer product ⟨ ⟩ (of size N D × N D ) is known as the data variance covariance matrix (∝ ). Note that for a data set of size 256 × 64 × 64, the triangle form of ⟨ ⟩ occupies 1 terabyte of memory. It's well known that noise is in fact covariant for multidimensional spectroscopies utilizing pulsed lasers (particularly along the probe axis), but accounting for this is clearly impractical. However, it is worth noting that ⟨ ⟩ could be well approximated by a sparse matrix which only accounts for the covariance along the probe axis per individual waiting time spectrum, which could reduce the memory footprint of ⟨ ⟩ well enough to make this a tractable problem.
Now let − = α⟨ ⟩ − where α is an arbitrary proportionality constant. Then , we find the formula for in Eq. S11.

E. The Modified Variance-Covariance Matrix for the CLS Method
The derivation of the modified covariance matrix is a simple extension of the covariance derivation above. As before, let true parameters be denoted as * . As an example, consider the problem of fitting the linear absorption spectrum via the CLS method with a two-Kubo component model plus a homogeneous dephasing component. Furthermore, in keeping with the MATLAB example located at Manuscript Examples\CLS Propagation of Error\TestSIEq13.m, we define the order of the fitting variables as * = [A 01 ; c; Δ 1 2 ; Δ 2 2 ; T Hom −1 ] implying N = 5. Then the shift of the global minimum away from * due to the addition of noise in the linear absorbance spectrum and error in measuring the Kubo time constants by fitting the CLS decay, are related by the following total differential.
a Here corresponds to the true noise while corresponds to the residual which approximately resembles the true noise at the global minimum. However, the residual variance at the global minimum is smaller than the true variance by roughly N p degrees of freedom, and hence the minus N p term.

F. Comparison Between Referenced and Unreferenced Data
Table S1 shows results of model fitting to the isotropic response of 200 mM MeSCN in H2O for both referenced and unreferenced experimental data. We only collected data up to 10 ps in waiting time, and therefore, we held the vibrational lifetime (known to be ~30 ps) constant during the fit. Parameters between the two data sets agree surprisingly well which shows that model fitting is still reliable for unreferenced data dominated by spectrally correlated shot-toshot intensity fluctuations. However, the 10× smaller error bars on edge-pixel referenced data clearly show the benefits of removing correlated noise from the data prior to fitting. Figure S2 shows a comparison of FIDs (T W = 0.5 ps) for both data sets. As seen from the data, the unreferenced noise is comparable in magnitude to the signal. Table S1: Results of model fitting to edge-pixel referenced and unreferenced data.

S9
We note that these results differ from those published by Yuan and Fayer 1 (τ 1 = 0.4 ps, τ 2 = 1.7 ps, Δ 1 2 = 33.64 cm -2 , Δ 2 2 = 6.76 cm -2 , T Hom −1 = 0.3297 ps -1 ). The differences in results are reminiscent of the those between our 2020 and 2021 data of MeSCN in DMSO shown in the main text. That is, we did not find convergence with a two-Kubo model even though the CLS in 2020 data suggests two components, and the Kubo time constant obtained by model fitting with a one-Kubo model is ~30% faster than reported by Yuan and Fayer. We propose that measuring the CLS decay as a function of pump power might resolve these inconsistencies.

G. Model Fitting with Phasing Errors
We test the effect of phasing errors on model fitting by repeating the simulated Calmodulin experiment, shown in Figure 5 of the main text, with a random phasing error applied to each separate trial (out of 100 trials) sampled from a Gaussian distribution with a standard deviation of 6°, which is roughly the accuracy reported by Backus et al. 8 To better distinguish the effect of phasing errors from regular noise, we also increase the SNR to 100:1. Fitting trajectories in Video S10 show that, without a phase correction fitting parameter, model fitting yields reasonable accuracy with the exception of ω 01 (the 0-1 center frequency), which varies ±1 cm -1 to adjust for the phasing errors. It is important to enable a ±2 cm -1 range for the pump axis calibration error δω 1 here to avoid boundary stalling since δω 1 works to counteract the ~1 cm -1 deviations ω 01 along the pump axis Unfortunately, we couldn't find room to fit δω 1 on the fitting trajectories plots, so this insight came from manually looking at the numerical results. The MATLAB code for reproducing this data is freely available with instructions below. Updates of dephasing parameters are shown for every trial in Video S11. Unfortunately, these results show skewed distributions of fitting parameters in panel (C) and notable underestimates of 95% confidence intervals in panel (D), which implies nonlinear propagation of error. In other words, confidence intervals are unreliable when model fitting to data with phasing errors while foregoing a fitting parameter to account for phasing errors. On the other hand, fitting parameter values are, on average, surprisingly accurate as shown in panel (E).
We now introduce a fitting parameter ϕ to account for phasing errors in a similar manner to Garrett-Roe and coworkers. [9][10][11] Fitting trajectories in Video S12 show that model fitting yields accurate fits of all fitting parameters, including ω 01 now. Video S13 shows updates of dephasing parameters for every trial. The results show symmetric distributions of fitting parameters in panel Figure S2. FIDs of unreferenced (A) and edge-pixel referenced (B) data in black. Corresponding model fits are overlayed in red. S10 (C) and reliable estimates of 95% confidence intervals in panel (D), which together imply that error propagates linearly when including a phase correction fitting parameter. Further results in panel (E) show accurate fits of dephasing parameters over the course of 100 fitting trials.
It should be noted that this model assumes phase stability over the course of the entire experiment, where the phase error is the same for every waiting time spectrum, and equally symmetric for the rephasing and nonrephasing spectrum (i.e. e ±iϕ in Eq. S2). In practice, this level of phase stability requires considerable effort to maintain with four-wave-mixing apparatus. Where these assumptions do not apply, we cannot guarantee the accuracy or reliability of model fitting. While there are many factors that can cause phase instability, the presence of spectrally correlated shot-to-shot noise (a.k.a. local-oscillator noise) further complicates the postprocessing procedure for correcting phase errors, particularly with the projection slice theorem. Therefore, we do not recommend model fitting for phase distorted apparatuses without prior removal of shot-to-shot noise (e.g. using a calibrated referencing scheme 3,12,13 ).

H. Instructions for Reproducing Results and Figures
The standalone desktop app (available for Windows and Mac users), MATLAB source code, and experimental data are freely available at https://github.com/kevin-robben/model-fitting. The following instructions are provided for reproducing results and figures. Note that MATLAB R2020 or newer is required. Scripts concerning simulated data have the random number generators initialized with seed = 1 (i.e. rng(1)), which enables exact reproducibility of results. Likewise, all experimental data is included.