Systematic Data-Driven Modeling of Bimetallic Catalyst Performance for the Hydrogenation of 5-Ethoxymethylfurfural with Variable Selection and Regularization

Catalyst development for biorefining applications involves many challenges, and mathematical modeling is an essential tool for explaining catalyst performance. This paper studies several machine learning (ML) methods that can model the performance of heterogeneous catalysts from relevant descriptors. A systematic approach is taken to selecting the most appropriate ML method, with a focus on variable selection, for which regularization algorithms were applied. Several candidate model structures were compared, and the results were interpreted. The systematic modeling approach presented aims to highlight the necessary tools and aspects for inexperienced users of ML. Literature datasets for the hydrogenation of 5-ethoxymethylfurfural with simple bimetallic catalysts, consisting of main metals and promoters, were studied together with catalyst descriptors found in the literature. The best models estimated conversion, selectivity, and yield with correlations between 0.90 and 0.98. The best identified model structures were support vector regression, Gaussian process regression, and decision tree methods. In general, variable selection procedures were found to improve model performance. The modeling methods applied thus appear to have strong potential for aiding catalyst development based mainly on the information content of descriptor datasets.
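As an illustration of how regularization can act as a variable selector, the following minimal sketch fits an L1-regularized (LASSO) linear model by coordinate descent and keeps only the descriptors with non-zero weights. The choice of LASSO, the penalty value `alpha`, the descriptor names `d1` to `d5`, and the synthetic orthogonal toy data are all assumptions made for this example; they are not the algorithms, settings, or data of the study.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def standardize(columns):
    """Center each descriptor column and scale it to unit (population) variance."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0
        scaled.append([(v - mean) / std for v in col])
    return scaled


def lasso_coordinate_descent(columns, y, alpha, n_iter=100):
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 on standardized columns."""
    n = len(y)
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # the intercept is handled by centering y
    weights = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            z = sum(x * x for x in col) / n
            if z == 0.0:
                continue
            # Correlation of column j with the residual that excludes its own term.
            rho = sum(x * (r + weights[j] * x) for x, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - delta * x for r, x in zip(residual, col)]
                weights[j] = new_w
    return weights


def toy_column(k, n=8):
    """Synthetic +/-1 column; distinct k >= 1 give mutually orthogonal columns."""
    return [1.0 if bin(i & k).count("1") % 2 == 0 else -1.0 for i in range(n)]


# Hypothetical descriptors d1..d5; the response depends strongly on d1 and d3 only.
names = ["d1", "d2", "d3", "d4", "d5"]
columns = standardize([toy_column(k) for k in range(1, 6)])
y = [50.0 + 3.0 * a - 2.0 * c + 0.1 * e for a, _, c, _, e in zip(*columns)]

weights = lasso_coordinate_descent(columns, y, alpha=0.5)
selected = [name for name, w in zip(names, weights) if w != 0.0]
print(selected)
```

In this toy case the penalty removes `d2`, `d4`, and the weakly contributing `d5`, leaving `d1` and `d3` with shrunken weights. The retained subset could then be passed to any of the candidate model structures.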

Root mean square error values, correlations and prediction uncertainties for reference models of conversion with diethyl carbonate solvent (C1). RMSET = Root mean squared error for training set. RMSECV = Root mean squared error of cross-validation. RMSEP = Root mean squared error for test set. RT = Correlation between observed and predicted values with training set. RCV = Correlation between observed and predicted values with cross-validation set. RP = Correlation between observed and predicted values with test set. PUT = Prediction Uncertainty (mean value of error ± 2 * standard deviation of error) with training set. PUCV = Prediction Uncertainty with cross-validation set. PUP = Prediction Uncertainty with test set.

Table S2. Root mean square error values, correlations and prediction uncertainties for reference models of selectivity with diethyl carbonate solvent (S1).

Table S8. Variable subsets for the best modelling results according to RMSEP values. M = main metal, P = promoter, CS = crystal structure, SGN = space group number, IE = ionisation energy, BCM = base-centered monoclinic, FCC = face-centered cubic, SH = simple hexagonal. Definitions for all the variables can be found in Tables S15-S19.
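The three performance measures defined above (root mean squared error, correlation between observed and predicted values, and prediction uncertainty as mean error ± 2 standard deviations of the error) can be computed as in the following minimal sketch. The function names and the example numbers are invented for illustration, and the use of the sample standard deviation is an assumption, since the captions do not state which estimator was used.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    mean_o = statistics.fmean(observed)
    mean_p = statistics.fmean(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(ss_o * ss_p)


def prediction_uncertainty(observed, predicted):
    """Interval: mean error +/- 2 * standard deviation of the error.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    errors = [o - p for o, p in zip(observed, predicted)]
    mean_error = statistics.fmean(errors)
    spread = 2.0 * statistics.stdev(errors)
    return (mean_error - spread, mean_error + spread)


# Made-up conversion values (%), not taken from the study.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print("RMSE:", rmse(observed, predicted))
print("R:", pearson_r(observed, predicted))
print("PU:", prediction_uncertainty(observed, predicted))
```

Applying the same three functions to the training, cross-validation, and test predictions would give the T, CV, and P variants of each measure.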

4.
Dummy variable for Ni (M) N/A Explains the presence of Ni as main metal.

5.
Dummy variable for Pd (M) N/A Explains the presence of Pd as main metal.

6.
Dummy variable for Pt (M) N/A Explains the presence of Pt as main metal.

7.
Dummy variable for Rh (M) N/A Explains the presence of Rh as main metal.

8.
Dummy variable for Ru (M) N/A Explains the presence of Ru as main metal.

9.
Dummy variable for Bi (P) N/A Explains the presence of Bi as promoter.

10.
Dummy variable for Cr (P) N/A Explains the presence of Cr as promoter.

11.
Dummy variable for Fe (P) N/A Explains the presence of Fe as promoter.

12.
Dummy variable for Na (P) N/A Explains the presence of Na as promoter.

13.
Dummy variable for Sn (P) N/A Explains the presence of Sn as promoter.

14.
Dummy variable for W (P) N/A Explains the presence of W as promoter.

15.
Dummy variable for d-block in the periodic

39.
Dummy variable for base-centered monoclinic crystalline structure (P) N/A Indicates whether the promoter has a base-centered monoclinic crystalline structure.

40.
Dummy variable for body-centered cubic crystalline structure (P) N/A Indicates whether the promoter has a body-centered cubic crystalline structure.

41.
Dummy variable for centered tetragonal crystalline structure (P) N/A Indicates whether the promoter has a centered tetragonal crystalline structure.

42.
Dummy variable for space group number 12 (P) N/A Indicates whether the promoter has space group number 12.

43.
Dummy variable for space group number 141 (P) N/A Indicates whether the promoter has space group number 141.

44.
Dummy variable for space group number 229 (P) N/A Indicates whether the promoter has space group number 229.

45.
Temperature °C Temperature of experiment.

46.
Atomic number (M) N/A Atomic number of the element in periodic table.

47.
Atomic weight (M) g/mol Defines the weight of an atom.

49.
Melting point (M) K Defines the temperature value at which the element changes its phase from solid to liquid.

50.
Boiling point (M) K Defines the temperature value at which the element changes its phase from liquid to gas.

51.
Heat of fusion (M) kJ/mol The quantity of heat necessary to change a solid to a liquid without temperature change.

52.
Heat of vaporization (M) kJ/mol The quantity of heat necessary to change a liquid to a gas without temperature change.

53.
Specific heat capacity (M) J/(kg*K) The quantity of heat necessary for a given mass to produce a unit change in its temperature.

54.
Thermal conductivity (M) W/(m*K) A measure of a material's ability to conduct heat.

55.
Thermal expansion (M) K⁻¹ Defines a material's tendency to change its shape, area, volume, and density in response to a temperature change.

56.
Molar volume (M) m³/mol Volume occupied by one mole of the substance at the given temperature and pressure.

57.
Brinell hardness (M) MPa A material's hardness, tested by pressing an indenter into the material.

59.
Bulk modulus (M) GPa Defines a material's resistance to compression.

60.
Shear modulus (M) GPa Describes a material's response to shear stress.

Table S17. Variables used in the variable selection, part 3. M refers to main metal and P to promoter.

61.
Young's modulus (M) GPa Defines a material's resistance to elastic deformation.

62.
Poisson ratio (M) N/A A measure of the Poisson effect.

63.
Speed of sound (M) m/s Defines how fast sound travels in the material.

64.
Valence of ion (M) N/A Defines the number of electrons in the material's valence orbital.

65.
Electronegativity (M) N/A Defines an atom's ability to attract a shared pair of electrons.

66.
Electron affinity (M) kJ/mol Defines the change in energy of a neutral atom, when an electron is added to the atom to form a negative ion.

67.
First ionization energy (M) kJ/mol The amount of energy needed to remove one electron from an atom.

68.
Second ionization energy (M) kJ/mol The amount of energy needed to remove a second electron from an atom, i.e. from its singly charged ion.

69.
Electrical conductivity (M) S/m Defines a material's ability to conduct electric current.

71.
Volume magnetic susceptibility (M) N/A Indicates the degree of magnetization of a material in response to an applied magnetic field.

72.
Atomic radius (M) pm Measure of the size of atoms in element.

73.
Covalent radius (M) pm Measure of the size of atom that forms part of one covalent bond.

74.
First lattice angle (M) N/A Defines first dimension's angle in unit cell that describes the crystal structure.

75.
Second lattice angle (M) N/A Defines second dimension's angle in unit cell that describes the crystal structure.

76.
Third lattice angle (M) N/A Defines third dimension's angle in unit cell that describes the crystal structure.

77.
First lattice constant (M) pm Defines first dimension's length in unit cell that describes the crystal structure.

78.
Second lattice constant (M) pm Defines second dimension's length in unit cell that describes the crystal structure.

79.
Third lattice constant (M) pm Defines third dimension's length in unit cell that describes the crystal structure.

80.
Neutron cross section (M) b Defines the likelihood of interaction between an incident neutron and a target nucleus.

82.
STO variable rAPEX (M) N/A Distance of maximum probability of encountering a valence electron.

84.
STO variable FWHH (M) N/A Width of the probability distribution (i.e. STOs) at half height (half of the maximum).

85.
STO variable SKEW (M) N/A Measure for the asymmetry of the probability distribution (i.e. STOs).

89.
Quadratic term for SKEW (M) N/A Quadratic term for SKEW.

91.
Interaction term for rAPEX and FWHH (M) N/A Interaction term for rAPEX and FWHH.

92.
Interaction term for rAPEX and SKEW (M) N/A Interaction term for rAPEX and SKEW.

93.
Interaction term for RAPEX and FWHH (M) N/A Interaction term for RAPEX and FWHH.

94.
Interaction term for RAPEX and SKEW (M) N/A Interaction term for RAPEX and SKEW.

95.
Interaction term for FWHH and SKEW (M) N/A Interaction term for FWHH and SKEW.

96.
Surface energy (M) J/m² Defines the surface excess free energy per unit area of a particular crystal facet.

97.
Atomic number (P) N/A Atomic number of the element in periodic table.

98.
Atomic weight (P) g/mol Defines the weight of an atom.

100.
Melting point (P) K Defines the temperature value at which the element changes its phase from solid to liquid.

101.
Boiling point (P) K Defines the temperature value at which the element changes its phase from liquid to gas.

102.
Heat of fusion (P) kJ/mol The quantity of heat necessary to change a solid to a liquid without temperature change.

103.
Heat of vaporization (P) kJ/mol The quantity of heat necessary to change a liquid to a gas without temperature change.

104.
Specific heat capacity (P) J/(kg*K) The quantity of heat necessary for a given mass to produce a unit change in its temperature.

105.
Thermal conductivity (P) W/(m*K) A measure of a material's ability to conduct heat.

106.
Thermal expansion (P) K⁻¹ Defines a material's tendency to change its shape, area, volume, and density in response to a temperature change.

107.
Molar volume (P) m³/mol Volume occupied by one mole of the substance at the given temperature and pressure.

108.
Brinell hardness (P) MPa A material's hardness, tested by pressing an indenter into the material.

110.
Bulk modulus (P) GPa Defines a material's resistance to compression.

111.
Shear modulus (P) GPa Describes a material's response to shear stress.

112.
Young's modulus (P) GPa Defines a material's resistance to elastic deformation.

113.
Poisson ratio (P) N/A A measure of the Poisson effect.

114.
Speed of sound (P) m/s Defines how fast sound travels in the material.

115.
Valence of ion (P) N/A Defines the number of electrons in the material's valence orbital.

116.
Electronegativity (P) N/A Defines an atom's ability to attract a shared pair of electrons.

117.
Electron affinity (P) kJ/mol Defines the change in energy of a neutral atom, when an electron is added to the atom to form a negative ion.

118.
First ionization energy (P) kJ/mol The amount of energy needed to remove one electron from an atom.

119.
Second ionization energy (P) kJ/mol The amount of energy needed to remove a second electron from an atom, i.e. from its singly charged ion.

120.
Electrical conductivity (P) S/m Defines a material's ability to conduct electric current.

122.
Volume magnetic susceptibility (P) N/A Indicates the degree of magnetization of a material in response to an applied magnetic field.

123.
Atomic radius (P) pm Measure of the size of atoms in element.

124.
Covalent radius (P) pm Measure of the size of atom that forms part of one covalent bond.

125.
First lattice angle (P) N/A Defines first dimension's angle in unit cell that describes the crystal structure.

126.
Second lattice angle (P) N/A Defines second dimension's angle in unit cell that describes the crystal structure.

127.
Third lattice angle (P) N/A Defines third dimension's angle in unit cell that describes the crystal structure.

128.
First lattice constant (P) pm Defines first dimension's length in unit cell that describes the crystal structure.

129.
Second lattice constant (P) pm Defines second dimension's length in unit cell that describes the crystal structure.

130.
Third lattice constant (P) pm Defines third dimension's length in unit cell that describes the crystal structure.

131.
Neutron cross section (P) b Defines the likelihood of interaction between an incident neutron and a target nucleus.

133.
Slater variable rAPEX (P) N/A Distance of maximum probability of encountering a valence electron.

135.
Slater variable FWHH (P) N/A Width of the probability distribution (i.e. STOs) at half height (half of the maximum).

136.
Slater variable SKEW (P) N/A Measure for the asymmetry of the probability distribution (i.e. STOs).

140.
Quadratic term for SKEW (P) N/A Quadratic term for SKEW.

141.
Interaction term for rAPEX and RAPEX (P) N/A Interaction term for rAPEX and RAPEX.

142.
Interaction term for rAPEX and FWHH (P) N/A Interaction term for rAPEX and FWHH.

143.
Interaction term for rAPEX and SKEW (P) N/A Interaction term for rAPEX and SKEW.

144.
Interaction term for RAPEX and FWHH (P) N/A Interaction term for RAPEX and FWHH.

145.
Interaction term for RAPEX and SKEW (P) N/A Interaction term for RAPEX and SKEW.

146.
Interaction term for FWHH and SKEW (P) N/A Interaction term for FWHH and SKEW.

147.
Surface energy (P) J/m² Defines the surface excess free energy per unit area of a particular crystal facet.
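The constructed variables listed in the tables above (dummy variables for metals and promoters, quadratic terms, and pairwise interaction terms of the STO variables) can be generated along the lines of the following minimal sketch. The helper names, the column naming scheme, and the numeric STO values are hypothetical placeholders chosen for illustration; they are not values or code from the study, and only a subset of the listed metals and STO variables is shown.

```python
from itertools import combinations


def dummy_variables(value, categories, prefix):
    """One 0/1 indicator per category, e.g. 'M_Pd' = 1 if the main metal is Pd."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}


def quadratic_and_interaction_terms(features):
    """Squared terms plus all pairwise products of the given numeric features."""
    terms = {f"{name}^2": value * value for name, value in features.items()}
    for (a, va), (b, vb) in combinations(features.items(), 2):
        terms[f"{a}*{b}"] = va * vb
    return terms


# Category lists follow the dummy variables in the tables above (subset only).
main_metals = ["Ni", "Pd", "Pt", "Rh"]
promoters = ["Bi", "Cr", "Fe", "Na", "Sn", "W"]

# Placeholder STO descriptor values for one catalyst; NOT real data.
sto = {"rAPEX": 1.5, "FWHH": 0.5, "SKEW": 2.0}

row = {}
row.update(dummy_variables("Pd", main_metals, prefix="M"))
row.update(dummy_variables("Sn", promoters, prefix="P"))
row.update(sto)
row.update(quadratic_and_interaction_terms(sto))

for name, value in row.items():
    print(f"{name}: {value}")
```

Each catalyst then corresponds to one such row, and stacking the rows gives the descriptor matrix on which variable selection operates.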