Structural Features and Zeolite Stability: A Linearized Equation Approach

Zeolite stability, in terms of lattice energy, is revisited from a crystal-chemistry point of view. A linearized equation relates the zeolite lattice energy to simple structural data readily available from experiments or modeling. The equation holds for a large range of zeolite energies, up to 3 eV per tetrahedron with respect to quartz, and has been validated internally via two simple automatic machine learning procedures for data fitting/reference partitions and externally using data from recently synthesized zeolites. The approach reliably locates those recently synthesized zeolites within the energy range of the experimentally known zeolites used in the parametrization of the linearized equation. Hidden intrinsic structural data–energy correlations were found for data sets built from energy-relaxed structures along with energy values computed using the same energy functions employed in the structural relaxation. The asymmetry of the structural features is relevant for an accurate description of the energy.


Section S1. Mathematical expression and fitted coefficients
using the max. and min. values of Table S1.

Section S2: Automatic and non-automatic attribute selection
To minimize overfitting caused by high correlation among certain attributes in the dataset, an experiment was conducted to reduce the dimensionality and collinearity of the problem's attributes. Four procedures were considered for this purpose. Table S3 summarizes the main results obtained using the dataset of all zeolites (S1-S10). A 10-fold cross-validation approach was employed.
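The 10-fold cross-validation and the two scores reported throughout (correlation r and MAE) can be sketched as follows. This is a minimal illustration, not the Weka procedure itself: the single-descriptor fit, the synthetic data, the function names, and the choice to pool out-of-fold predictions before computing r and MAE are all assumptions made for the example.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one descriptor only)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cross_validate(xs, ys, k=10, seed=0):
    """Return (r, MAE) computed over the pooled out-of-fold predictions."""
    predictions = [0.0] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            predictions[i] = a + b * xs[i]
    mae = sum(abs(p - y) for p, y in zip(predictions, ys)) / len(ys)
    return pearson_r(predictions, ys), mae


# Synthetic, purely illustrative data: energy roughly linear in one descriptor.
rng = random.Random(1)
descriptor = [i / 10 for i in range(100)]
energy = [0.05 * x + rng.gauss(0, 0.01) for x in descriptor]
r, mae = cross_validate(descriptor, energy)
```

Every sample is predicted exactly once by a model that did not see it during fitting, so r and MAE measure out-of-sample performance rather than goodness of fit.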
The conducted procedures were as follows:
2) Perform a Principal Component Analysis (PCA) to transform the initial dataset. The dataset is preprocessed using the filter weka.filters.unsupervised.attribute.PrincipalComponents available in Weka, with its default parameters unchanged. Dimensionality reduction is achieved by selecting enough eigenvectors to represent 95% of the variance in the original data. However, the results obtained show a lower correlation and a higher error. In some cases, PCA does not reduce the dimensionality but merely transforms the attributes, which complicates the equation for energy calculation because the components would have to be computed beforehand.
3) Apply attribute selection algorithms using correlation-based evaluation methods. Two attribute selection models are applied beforehand. The CfsSubsetEval³ evaluation method is employed with the PSOSearch⁴ and BestFirst search algorithms.
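The 95% variance criterion used in procedure 2 can be illustrated with a short sketch. It only shows how the number of retained components follows from a list of eigenvalues; the function name and the eigenvalues are hypothetical, and the eigen-decomposition itself (done by the Weka filter) is not reproduced.

```python
def components_for_variance(eigenvalues, variance_covered=0.95):
    """Return how many leading components reach the requested variance share."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= variance_covered:
            return count
    return len(ordered)


# Hypothetical eigenvalues for a dataset with five attributes.
example_eigenvalues = [2.5, 1.5, 0.6, 0.3, 0.1]
n_kept = components_for_variance(example_eigenvalues)  # 4 of 5 components
```

In this made-up case the cutoff removes only one of five dimensions, while each retained component is still a combination of all the original attributes, which is the drawback noted for this procedure.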

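The idea behind correlation-based subset evaluation in procedure 3 can be sketched with the merit heuristic commonly associated with CFS: a subset scores well when its attributes correlate with the target but not with each other. The code below is an assumption-laden illustration using Pearson correlation on toy data; it is not Weka's CfsSubsetEval implementation, and the search step (PSOSearch or BestFirst) is omitted.

```python
import math
from itertools import combinations


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cfs_merit(columns, target):
    """Merit = k*r_cf / sqrt(k + k*(k-1)*r_ff) for a subset of k attributes.

    r_cf is the mean |correlation| between attributes and target,
    r_ff the mean |correlation| between pairs of attributes.
    """
    k = len(columns)
    r_cf = sum(abs(pearson_r(c, target)) for c in columns) / k
    pairs = list(combinations(columns, 2))
    r_ff = sum(abs(pearson_r(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Toy data: the target is the sum of two uncorrelated attributes.
x1 = [1.0, -1.0, 1.0, -1.0]
x3 = [1.0, 1.0, -1.0, -1.0]
target = [a + b for a, b in zip(x1, x3)]

merit_single = cfs_merit([x1], target)
merit_redundant = cfs_merit([x1, list(x1)], target)   # exact duplicate added
merit_complementary = cfs_merit([x1, x3], target)     # independent attribute added
```

Adding an exact duplicate leaves the merit unchanged, whereas adding an independent, informative attribute raises it; this is how the evaluator penalizes collinear attributes.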
Section S4: About "feasibility"
To obtain a linear function between energy and geometrical descriptors within a narrower energy range, thereby increasing the accuracy of the fit, we can limit its application range to synthesizable zeolites. However, this concept is not straightforward (as discussed in the main text), so certain assumptions are required. Once we had defined the three feasibility functions (an energy term and the TT and TOT terms) for each zeolite, we applied the following filter to the entire database (excluding S11, the newly synthesized zeolites) to obtain the ensemble of likely feasible zeolites (~1800 zeolites): energy term < 0.095 eV, TT term < 0.003 Å, and TOT term < 3 deg. The filter thresholds were chosen so that the synthesized zeolites (including sets S1, S6 and S11) are "feasible". For the filtered subset of zeolites, we subsequently applied the fitting procedure of the linearized equation (r = 0.97 and MAE of 8 meV). For S11, we obtained a performance score of r = 0.96 and MAE of 8 meV (see Figure 3).
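The feasibility filter amounts to three simultaneous threshold tests per structure. A minimal sketch is given below; the thresholds are those quoted above, while the record layout, field names, structure names, and numerical values are hypothetical and serve only to show the logic.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
MAX_ENERGY_EV = 0.095
MAX_TT_ANGSTROM = 0.003
MAX_TOT_DEG = 3.0


@dataclass
class Feasibility:
    """Feasibility function values of one structure (hypothetical layout)."""
    name: str
    energy_ev: float
    tt_angstrom: float
    tot_deg: float


def is_feasible(z):
    """A structure passes only if all three values are below their thresholds."""
    return (
        z.energy_ev < MAX_ENERGY_EV
        and z.tt_angstrom < MAX_TT_ANGSTROM
        and z.tot_deg < MAX_TOT_DEG
    )


# Made-up records, not taken from the database.
records = [
    Feasibility("hypo-A", 0.050, 0.001, 1.2),  # passes all three tests
    Feasibility("hypo-B", 0.120, 0.001, 1.0),  # fails the energy test
    Feasibility("hypo-C", 0.060, 0.004, 2.0),  # fails the TT test
    Feasibility("hypo-D", 0.090, 0.002, 3.5),  # fails the TOT test
]
feasible = [z.name for z in records if is_feasible(z)]
```

The linearized equation is then refitted only on the structures that pass, which is what narrows the energy range of the fit.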
weka.classifiers.functions.LinearRegression function, available in the Weka automated learning environment.¹ Specifically, adjustments were made to the AttributeSelectionMethod and EliminateColinearAttributes parameters. The weka.classifiers.functions.LinearRegression function employs linear regression for prediction and utilizes the Akaike Information Criterion (AIC)² for model selection. The following input parameters were varied within this function:
a) AttributeSelectionMethod: the setting used to select attributes for linear regression, comprising: i) no attribute selection, ii) attribute selection using the M5 method (iterative removal of the attribute with the smallest standardized coefficient until no improvement is observed in the error estimate given by the AIC), and iii) a greedy selection using the Akaike information metric.
b) EliminateColinearAttributes: aimed at removing collinear attributes.
The NumDecimalPlaces parameter was set to 10 in all experiments. The remaining parameters (DoNotCheckCapabilities, OutputAdditionalStats, Minimal, BatchSize, Debug, and Ridge) were used with their default values. By setting AttributeSelectionMethod to the Greedy method and EliminateColinearAttributes to 'True' (without performing any manual reduction), the dataset was reduced to 36 attributes, resulting in a correlation of r = 0.9455 and a MAE of 23.8 meV. This was the best case observed; no substantial further improvement was obtained.
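A greedy, AIC-driven attribute selection of the kind described in a) iii) can be sketched as follows. Several assumptions are made here and none of them is confirmed by the text: attributes are removed backward one at a time, the textbook Gaussian form AIC = n ln(RSS/n) + 2p is used instead of Weka's internal expression, and the toy data are constructed so that one attribute is irrelevant.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of a least-squares fit with intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def aic(columns, y):
    """Gaussian AIC up to a constant: n*ln(RSS/n) + 2*(number of coefficients)."""
    n = len(y)
    return n * math.log(rss(columns, y) / n) + 2 * (len(columns) + 1)


def greedy_backward(attributes, y):
    """Repeatedly drop the attribute whose removal lowers the AIC the most."""
    kept = dict(attributes)
    best = aic(list(kept.values()), y)
    while kept:
        trials = {name: aic([c for other, c in kept.items() if other != name], y)
                  for name in kept}
        name = min(trials, key=trials.get)
        if trials[name] >= best:
            break
        best = trials[name]
        del kept[name]
    return sorted(kept)


# Toy data: y depends on x1 and x2 only; x3 is irrelevant by construction.
x1 = [1, -1, 1, -1, 1, -1, 1, -1]
x2 = [1, 1, -1, -1, 1, 1, -1, -1]
x3 = [1, 1, 1, 1, -1, -1, -1, -1]
noise = [0.1 * a * b * c for a, b, c in zip(x1, x2, x3)]
y = [2 * a - b + e for a, b, e in zip(x1, x2, noise)]
selected = greedy_backward({"x1": x1, "x2": x2, "x3": x3}, y)
```

The AIC penalty of 2 per coefficient is what makes the irrelevant attribute removable: dropping it leaves the residual essentially unchanged while saving one parameter. The sketch does not handle exactly collinear attributes, which would make the normal equations singular; in the text that case is covered separately by EliminateColinearAttributes.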

Figure S1. Energy fit of IZA zeolites on increasing complexity of the linearized equation.

Table S2. Coefficients of the linearized equation (F1) for the lattice energies of zeolites. The equation has the following expression: