Simulation of Linear and Cyclic Alkanes with Second-Order Møller–Plesset Perturbation Theory through Adaptive Force Matching

Predicting ensemble properties, such as density and heat of vaporization, of small hydrocarbons is challenging due to the dispersion-dominated weak interactions between these molecules. With the adaptive force matching (AFM) method, the bonded and short-range nonbonded interactions are fitted to second-order Møller–Plesset perturbation theory (MP2) references computed with the def2-TZVP basis set. The dispersion is modeled using symmetry adapted perturbation theory (SAPT) at MP4 accuracy using the def2-TZVPD basis set. A new charge matrix decomposition technique is described to obtain partial charges in AFM. Although the models developed do not have any empirical parameters, several properties of the resulting models are compared with experiments as validations. The density, heat of vaporization, pressure dependence of density, diffusion constants, and surface tensions all show quantitative agreement with experiments. Although the density shows a very small systematic error, which could be due to missing three-body dispersion, the heat of vaporization agrees with experiments of within 0.5%. The paper shows that AFM can be used as a reliable tool to enable simulations at post-Hartree–Fock quality at the cost of molecular mechanics force fields.

1. Procedure for creating the hydrocarbon models using AFM.

Initial guess force field and conformation
A set of conformations sampled with an initial guess force field is needed to start the AFM iterations.For this purpose, the Abalone program developed by Alexey Nikitin is used.The initial guess force fields were generated using the X75 program that has been tested to work with a range of hydrocarbons.Note that this program and the autogenerated model is only used to generate the conformations for the first generation of AFM.After the first generation, sampling will be done with Gromacs using the models developed by AFM.

Sampling step of AFM
The sampling step of AFM is simulated with a cubic box containing 125 molecules using either the initial guess model or generated AFM force fields.For most alkanes, two simulations were performed at 298 K and 328 K with a 0.5 fs and for 5 ns at each temperature.For butane, the sampling was performed at the normal boiling temperature of 270 K and 328 K.For each molecule, the lower temperature sampling is performed under NPT condition using the stochastic rescaling thermostat and Berendsen barostat.The 328 K simulations were performed under NVT condition to avoid accidental vaporization for some lower boiling temperature molecules.The higher temperature simulation helps sampling the short-range repulsive region of the model.200 snapshots are saved for each temperature from the last 4 ns of trajectory with one conformation every 20 ps.Thus for each generation 400 conformations are used for the QM calculations.

QM step of AFM
As mentioned in the paper, no MM region is used when computing reference forces for AFM as all the alkanes being studied are non-polar molecules.Neglecting MM particles will eliminate the QM/MM boundary and allow more QM particles to be fit.The QM cluster will be created by the following two step procedure.1.One random molecule is selected to be the first molecule of the QM region.2. Randomly select up to 5 molecules that are within 2.6 Å from the first molecule measured using nearest atom distances.This procedure will produce a QM cluster with up to 6 molecules.For each QM cluster, the forces were calculated using RI-MP2 with RIJCOSX and the def2-TZVP basis set using the ORCA program.

FM step of AFM
The dispersion parameters are not optimized during AFM.As described in the paper, the dispersion parameters were fit before AFM to SAPT dimer calculations.
The atom typing and energy expressions of the final force field is described in the paper.The fits were performed using a two-step procedure with the CReate Your Own Force Field (CRYOFF) program.The first step fits molecular forces and toques only and determines intermolecular parameters.This includes partial charges and parameters for short-range repulsion interactions.The reason for fitting intermolecular forces in a separate step is to avoid strong coupling with intramolecular forces, which tends to be much larger than intermolecular forces for weakly interacting compounds.The second step is intramolecular fit, where the non-bonded interactions are fixed to the values obtained in the first step and bonded parameters are determined by minimizing the errors in atomic forces.We also emphasize that with the CMD methods, the intermolecular fit solves the linear least square problem twice as described in the paper.

Simulation Details for Property Calculations
All property computations were performed using Gromacs package 2019.6.The liquid trajectories for the density and ΔHvap were simulated for 20 ns using a 0.5 fs time step size in cubic boxes containing 125 alkane molecules.The electrostatics interactions are treated with 4 th order Ewald summation with a real space accuracy of 10 -5 .The van der Waals cutoff is chosen to be 13 Å with long range correction to energy and stress due to the C6/r 6 term applied automatically by Gromacs.As discussed in the paper, Gromacs does not support the long range correction to pressure from the C8/r 8 term.Thus, the long range correction to energy and pressure due to the C8/r 8 term is done manually using the formula in the next section of the supporting information.For example, for n-heptane, the long range correction to C8/r 8 is -11 bar with a 13 Å cutoff at the average simulation density.Thus, the barostat is set to 12 bar, which is in reality 1 bar after the -11 bar correction is accounted for.
The liquid state simulations for density and ΔHvap are performed at 298 K for all the molecules except for n-butane, for which the experimental boiling temperature of 272 K is used.The temperature is maintained with the Nose-Hoover thermostat with a relaxation time of 1 ps and the pressure is maintained with Parrinello-Rahman barostat with a relaxation time of 5 ps.
For computing ΔHvap the gas phase energy is computed with a single molecule in a box the same size as the liquid simulation and with cutoff coulombic.In other words, the molecule won't be able to see its images in other boxes.The box size is held as constant.The equation of motion for the single molecule is integrated with the stochastic dynamics integrator with an inverse friction constant of 2.0 ps. 100 ns trajectory is used to compute the gas phase energy for each alkane molecule.The long trajectory is important for the gas phase as there is only one molecule, obtaining good statistics is not possible unless the gas phase simulation is substantially longer than the liquid phase.
The densities at elevated pressures are also computed using 20 ns MD using the same barostat and thermostat settings as those used for 1 bar except with the higher pressure.
For each molecule, the diffusion constant D were computed also with a 500 molecule cubic box at the average density at 1 bar.The simulation temperature is the same as the temperature of the available experimental data shown in Table II.The average box size was determined with 2 ns MD NPT simulations using Nose-Hoover thermostat with a relaxation time of 1 ps and the Parrinello-Rahman barostat with a relaxation time of 5 ps.After the average box size was determined, the diffusion constant was measured by fitting the mean square displacement obtained from a 2 ns NVT simulation also with Nose-Hoover thermostat but with a longer 2 ps relaxation time.
Accurate determination of surface tension γ requires long van der Waals cutoff.Table S7 summarizes the number of alkane molecules used to create the liquid slab that is at least 42 × 42 × Gromacs actually treat all different types of atoms to be equivalent but uses an average C6.It can be shown that it is equivalent to our derivation, where the number of pairs between identical and nonidentical particles are counted explicitly.
With a and the number of pairs are computed as described above.
can be shown that the long-range correction to the energy and pressure based on the above formula correction to the energy (Eq.4) is for the whole box and is not per particle.It can be shown that for one type of particles of a large number N.

Table S1 .
The parameters for the short-range damped dispersion used for all alkanes in this work.