Machine Learning-Driven Discovery of Key Descriptors for CO2 Activation over Two-Dimensional Transition Metal Carbides and Nitrides

Fusing high-throughput quantum mechanical screening techniques with modern artificial intelligence strategies is among the most fundamental —yet revolutionary— science activities, capable of opening new horizons in catalyst discovery. Here, we apply this strategy to the process of finding appropriate key descriptors for CO2 activation over two-dimensional transition metal (TM) carbides/nitrides (MXenes). Various machine learning (ML) models are developed to screen over 114 pure and defective MXenes, where the random forest regressor (RFR) ML scheme exhibits the best predictive performance for the CO2 adsorption energy, with a mean absolute error ± standard deviation of 0.16 ± 0.01 and 0.42 ± 0.06 eV for training and test data sets, respectively. Feature importance analysis revealed d-band center (εd), surface metal electronegativity (χM), and valence electron number of metal atoms (MV) as key descriptors for CO2 activation. These findings furnish a fundamental basis for designing novel MXene-based catalysts through the prediction of potential indicators for CO2 activation and their posterior usage.


INTRODUCTION
The excessive carbon dioxide (CO2) concentration in Earth's atmosphere has become a major threat to the environment given its main role in global warming; therefore, considerable efforts have been made worldwide to remove it. The rise of the atmospheric CO2 concentration is mainly due to the massive destruction of forests and the extensive exploitation of fossil fuels, which have led to a continuous increase that is expected to reach ∼590 ppm by the year 2100, causing an expected global temperature rise of 1.9 °C, with the concomitant acidification of oceans and devastating consequences for marine ecosystems. 1 At present, the increasing CO2 emissions are partly controlled either by converting CO2 into useful carbon-based fuels/chemicals or by storing it in a stabilized medium. To attain a valid impact on both environment and economy, it is necessary to utilize CO2 instead of just storing it, thus unlocking its potential and triggering profitable industrial applications. Hitherto, several types of catalysts have been investigated for CO2 activation and reduction, including different metal oxides, pure metals and alloys, organometallics, single-atom catalysts, non-metals, and nano-metals. 2−4 Typically, metals such as Cd, Sn, In, Pd, and Bi mediate the formation of formic acid from CO2, 5−8 while Ti, Zn, and Au can efficiently convert CO2 into CO. 9−11 These studies demonstrated that the CO2 molecule can interact with metal surfaces through either strong or weak binding modes. In the case of strong interactions, the metal−carbon (M−C) overbonding may poison the catalyst surface, making the active sites inaccessible for further reduction of CO2, with a concomitant reduction of product formation. In contrast, a weak bonding between CO2 and a given metal surface does not allow for the CO2 C−O bond dissociation, as desorption prevails over the bond scission step that would favor the formation of the desired products.
It should be borne in mind that the C−O bond enthalpy in the CO 2 molecule is very large, of 803 kJ·mol −1 , 12 and thus the activation of CO 2 can be regarded as a suitable approach to lower the CO 2 reaction conditions and energy demands. Therefore, a thorough activation analysis is highly required when designing novel catalysts based on rational approaches, to uncover which factors govern both activity and selectivity during the reactive processes.
In general, the CO2 binding energy over a potential catalytic surface is considered an effective source for predicting the likelihood of CO2 reduction reactions. 13 However, the accurate experimental measurement of the CO2 binding energy is far from being a simple issue. 14 On the other hand, the theoretical modeling of the catalytic activity on a given material surface requires extensive yet accurate calculations, preferably from first-principles-based methods, leading to a good understanding of the interaction of CO2 with the surface of interest and accounting also for coverage effects, albeit at a high computational cost. In this regard, properties that provide information about the catalytic activity at a lower computational cost are highly preferable, particularly to screen over a family of chemically related materials. In a simple case scenario, the adsorption energies can be linearly correlated with electronic descriptors that only require investigating the substrates, e.g., by density functional theory (DFT), significantly reducing the computational cost of predicting the catalytic activity. In particular, Hammer and Nørskov 15 proposed one of the most successful descriptors to date, the d-band center, capable of predicting the adsorption energy of a given adsorbate at different TM surfaces using information on the TM surface electronic structure only. The essence of this model is that describing the binding energy of CO2 to the TM surface does not require the entire details of the density of states; instead, the d-band center is sufficient to correlate the interaction strength with the surface chemical activity and, eventually, the catalytic performance. In addition, other structural parameters such as bond lengths and angles, and even surface coordination numbers, could be correlated with adsorption energies.
By mapping the adsorption energy with materials intrinsic properties, one can obtain descriptors that do not only provide a fast screening over them with a rather high accuracy but also offer fundamental insights into the coupling between CO 2 and the surfaces of interest.
In recent years, machine learning (ML) models trained on a limited number of quantum-mechanical calculations have become an appealing alternative for high-throughput prediction of chemical reactivity with either algorithm-derived 16−20 or handcrafted features. 21−23 The input variables required for ML modeling are typically accessible from the relaxed pristine materials surface structures without the presence of adsorbates. Using such properties with a low computational cost, one can predict complex parameters such as catalytic activities or adsorption energy distributions in a much faster way. Since the ML analysis within the catalysis field mainly deals with particular chemical or physical properties, e.g., adsorption energies, d-band centers, selectivities, limiting potentials, and so on, it is essential to consider supervised ML algorithms that map the target data set. Linear regression is a simple, powerful, and widespread approach used to analyze descriptors and to establish scaling relations for predicting valuable information in the computational heterogeneous catalysis field. More advanced techniques are currently available to handle multiple features and non-linear relationships, 24−27 including kernel ridge regression, 28 neural networks, 29 random forest regression, 30 and Gaussian process regression, 31 to name a few. Ultimately, choosing suitable descriptors is essential in any ML model to regulate the prediction power and the learning efficiency. 32 In this work, ML models are developed to mine and map the CO2 activation over pure and defective MXenes based solely on their pristine properties and the features of the gas-phase atoms that enter the MXene chemical composition.
Note in passing that by CO2 activation we refer here to a strong interaction between the CO2 molecule and the MXene surfaces, leading to significant changes in the adsorbed CO2 geometry, including a bent geometry with elongated C−O bonds and a molecular negative charge, resulting from a charge transfer from the MXene surface to CO2. This CO2 activation should not be confused with another widely used meaning, implying the CO2 conversion into other chemicals, e.g., CO, formic acid, methanol, and so on, although both definitions are connected, since the bent CO2 geometry is quite often the key, decisive state in CO2 conversion, as found in organometallics, 33 TM carbides, 34 MXenes, 35 metals, 36 alloys, 37 and oxide-based catalysts. 38,39 Thus, the fundamental goal of the present study is to develop and understand ML models for activated CO2 adsorption on MXenes, which can be quantitatively implemented and leveraged for predictive analysis, drawing useful information for the posterior CO2 conversion process. Figure 1 displays the schematic diagram of the ML workflow, trained on a data set generated from our previous work and DFT calculations to identify potential descriptors for CO2 activation over MXenes. To this end, three regression models, namely, multivariate linear regression (MLR), decision tree regression (DTR), and random forest regression (RFR), are set up and evaluated with the aim of predicting potential descriptors for CO2 activation over these materials. 40 Accordingly, we performed a feature importance analysis and investigated the effect of each primary feature on the target adsorption properties. As demonstrated below, the RFR model is the best performing, using the d-band center, ε d , the MXene surface metal electronegativity, χ M , and the valence electron number of metal atoms, M V , as meaningful features to predict the activation of CO2 for the chosen MXene class of materials.
This high-throughput screening research based on first-principles calculations and ML predictions can discover prominent indicators of CO 2 activation over MXene materials, and it is likely to be transferred to other bulk TM carbides/ nitrides materials as well.

Data Collection and Pre-processing.
The data required to nurture the developed ML tools were collected from our previous work on MXenes for CO2 capture. 35,41−44 A total of 114 data points were extracted, among which 60 points are from pure MXenes with varying thickness, while the remaining 54 points correspond to MXenes with different sorts of vacancies; see Figure 2. Note that MXenes are usually surface-terminated as a result of the synthesis procedure, yet bare MXenes are nowadays attainable either through molten salts synthesis 45 or after applying cleaning protocols. 46 Furthermore, such non-terminated sites have been pointed out to be key catalytic active centers in CO2 conversion, as shown in the dry reforming of methane. 47 In addition, some previous cases where *CO2 was found to dissociate into *CO and *O adsorbates upon relaxation on the MXene surfaces (due to a molecular placement too close to the MXene surface, and so a higher energy level, which led to the dissociation) were reoptimized in order to gain a stable *CO2 adsorption state. We also observed that a substantial amount of data was missing, particularly on surface descriptors, which were here calculated and completed; see below. Notice that the data sources had many aspects in common, e.g., all being DFT calculations on p(3×3) slab models, with a minimum vacuum of 10 Å, and using the Perdew−Burke−Ernzerhof (PBE) exchange−correlation functional, 48 with Grimme's D3 correction to account for dispersive forces. 49 However, the data slightly differed concerning the plane-wave basis set kinetic energy cutoff or the Brillouin zone k-points density. To assess the possible effect of such input differences on binding energies, we carried out test evaluations on 11% of the data set using the same materials, with representatives from pure MXenes of varying thicknesses and cases including different sorts of vacancies.
The evaluated impact on target properties such as adsorption energies, bond lengths, and O−C−O angles was found to be at most 0.07 eV, 0.03 Å, and 5.41°, respectively. Such discrepancies are well below or at least comparable to the inherent DFT accuracy.
The entire set of data points was then split into randomly selected training and test subsets. Accordingly, a random 20% of the total data points were labeled as test data and the remaining 80% was labeled as training data for the evaluation of the designed models. To better understand the importance of the studied models with the set of primary features, we considered the Pearson correlation coefficient, R, and the mean absolute error (MAE) as main evaluation indices.
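The splitting and evaluation procedure described above can be sketched with the Python standard library alone. This is an illustrative sketch, not the code used in the study (which relied on scikit-learn); the function names and the synthetic descriptor/target pairs are hypothetical.

```python
import math
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and return (train, test) with the given test fraction."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute deviation between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_r(x, y):
    """Pearson correlation coefficient R between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: 114 (descriptor, E_ads) pairs following an exact linear trend.
data = [(0.1 * i, -0.5 - 0.2 * (0.1 * i)) for i in range(114)]
train, test = train_test_split(data)          # 80% training, 20% test
r_value = pearson_r([x for x, _ in data], [y for _, y in data])
```

With 114 points, the 20% test fraction is rounded to 23 samples, leaving 91 for training.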

ML Models and Hyperparameter Tuning.
Three ML models, namely, MLR, DTR, and RFR, were devised and evaluated to predict the CO2 activation over MXenes from the set of described input features or descriptors. Each model was developed on the training data set, while the test data set was employed to evaluate its prediction accuracy. For more detailed information about the three ML models considered, see Section S1 of the Supporting Information. To improve the model prediction quality, a cross-validation was carried out during the training process to tune the hyperparameters. Generally, hyperparameter tuning (HT) is used to obtain optimal model performance by finding a set of hyperparameters, which are tuned during the model training process, 50 e.g., the DTR and RFR branches and leaf nodes; see Figures S1 and S2 in the Supporting Information. In the present study, HT was carried out using a grid search method, which is a reliable methodology, while tuning on a reduced set of primary features. All the data processing and ML technique implementation were performed using the open-source scikit-learn library. 51

DFT Calculations.
Complementary periodic DFT calculations were carried out using the Vienna ab initio simulation package (VASP) code, 52 using a plane-wave basis set for the valence electron density with an optimal kinetic energy cutoff of 415 eV. For the scalar-relativistic treatment of the effect of core electrons on the valence density, projector augmented wave 53 pseudopotentials were used. A generalized gradient approximation exchange−correlation functional was employed, in particular, that proposed by Perdew−Burke−Ernzerhof (PBE). 48 The geometry optimization was considered converged when the forces acting on atoms were all below 0.01 eV·Å −1 , while an electronic convergence criterion of 10 −5 eV was imposed. An optimal Monkhorst−Pack grid of k-points of 5×5×1 dimensions was used, overall guaranteeing adsorption energies to be converged below the chemical accuracy of 1 kcal·mol −1 , ca. 0.04 eV. Dispersive forces were accounted for using Grimme's D3 method, 49 PBE-D3 being a suitable level of calculation, as employed in previous studies. 41−44 The adsorption energy, E ads , of CO2 on various MXene surfaces was obtained from the following equation

$$E_{\mathrm{ads}} = E_{\mathrm{CO_2/MXene}} - \left(E_{\mathrm{MXene}} + E_{\mathrm{CO_2}}\right) + \Delta E_{\mathrm{ZPE}}$$

where E CO2/MXene , E MXene , and E CO2 are the total energies of CO2 adsorbed on the corresponding MXene surface, that of the relaxed pristine MXene, and that of the isolated CO2 molecule, respectively. The CO2 molecule was placed within a symmetric box of 10×10×10 Å dimensions and optimized at the Γ-point. ΔE ZPE is the zero point energy (ZPE) difference between the adsorbed CO2 and that of the gas phase within the harmonic approximation. For further details, we refer to the literature. 35 As far as descriptors are concerned, the work function, ϕ, is defined as the amount of energy required to move an electron from the material Fermi level, E F , and place it in the vacuum energy level, E vac . Thus

$$\phi = E_{\mathrm{vac}} - E_{\mathrm{F}}$$

In the d-band center model, ε d is defined as the center of gravity of the d-projected density of states of a surface TM atom, from the initial energy level up to the energy level corresponding to a hypothetical d 10 electronic configuration of the TM; see further details in the literature. 54 Aside, a Bader atoms-in-molecules analysis of the electronic density is carried out, integrating it within regions whose charge is assigned to certain atoms. 55 Thus, a negative Q value implies a negative charge, and vice versa.

Figure 2. A total of 114 data points are extracted, among which 60 points are from pure MXenes with varying thickness, while the remaining points correspond to MXenes with different vacancies: metal vacancy (V M ), carbon/nitrogen vacancy (V X ), and metal and nearby carbon/nitrogen vacancy (V MX ).
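The descriptor definitions in this section can be illustrated with a short, library-free Python sketch. It assumes that the adsorption energy is the total energy of the adsorbed system minus those of the pristine MXene and gas-phase CO2, plus the ZPE difference, and that the d-band center is the first moment of the d-projected density of states up to a user-supplied upper energy limit. The function names, the trapezoidal integration, and the numerical inputs are hypothetical choices made for illustration only.

```python
def adsorption_energy(e_co2_mxene, e_mxene, e_co2, delta_zpe=0.0):
    """E_ads = E(CO2/MXene) - [E(MXene) + E(CO2)] + dE_ZPE (assumed sign convention)."""
    return e_co2_mxene - (e_mxene + e_co2) + delta_zpe

def work_function(e_vac, e_fermi):
    """Energy needed to take an electron from the Fermi level to the vacuum level."""
    return e_vac - e_fermi

def d_band_center(energies, d_pdos, e_max):
    """First moment of the d-projected DOS, integrated with the trapezoidal rule.

    energies : ascending energy grid (assumed relative to the Fermi level, in eV)
    d_pdos   : d-projected DOS of a surface metal atom on that grid
    e_max    : upper integration limit (assumed to be supplied by the user)
    """
    numerator = denominator = 0.0
    for i in range(len(energies) - 1):
        e0, e1 = energies[i], energies[i + 1]
        if e1 > e_max:
            break  # stop at the upper integration limit
        width = e1 - e0
        numerator += 0.5 * (e0 * d_pdos[i] + e1 * d_pdos[i + 1]) * width
        denominator += 0.5 * (d_pdos[i] + d_pdos[i + 1]) * width
    return numerator / denominator

# Hypothetical numbers, for illustration only (eV).
e_ads = adsorption_energy(-105.0, -80.0, -23.0, delta_zpe=0.05)
phi = work_function(4.5, -0.2)
eps_d = d_band_center([-4.0, -3.0, -2.0, -1.0, 0.0], [1.0] * 5, e_max=0.0)
```

For the flat model density of states used here, the center lies at the midpoint of the integrated window, which gives a quick sanity check of the integration.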
Finally, the exfoliation energies, E exf , were computed as the energy necessary to remove the A element from the MXene MAX phase precursors, 44 using E MXene and E MAX , the isolated MXene and the MAX unit cell total energies, respectively, as depicted in Figure S3 of the Supporting Information, together with E A and S A , the bulk phase atomic energy of the A species and the cross-section area of each created MXene unit, respectively. Within this definition, the larger the E exf , the stronger the bonding between the MXene layers and the A phase and the costlier it is to separate them.

RESULTS AND DISCUSSION
Having consistently gained and gathered all the necessary data, we first considered four target variables as indicators of the CO2 activation. These included the CO2 adsorption energy, E ads , in the sense that, a priori, the stronger the bonding, the higher the activation. Aside from this energetic feature, we regarded two geometric parameters, the average C−O bond distance, d(CO), and the CO2 molecular angle, α(OCO), since, ideally, the activated CO2 features a reduced angle compared to the linear gas molecule angle of 180°, plus elongated C−O bonds, resulting from the activated bent geometry and as a consequence of a charge transfer from the substrate material. 56,57 Thus, the smaller the angle and the larger the bond lengths, the more activated the CO2. Finally, the mentioned charge transfer can be quantified through the Bader charge of the adsorbed CO2, Q, in the sense that the larger the charge, the more activated the CO2 is. At first, we evaluated these features in a descriptive fashion, showing fringe limits in the data set and distribution; see Figure 3. A quick inspection reveals that the distribution of features is not uniform for any of the target properties. For instance, E ads shows three peaks, one close to ca. −3.5 eV, another around −2.1 eV, and a small peak close to −0.4 eV. According to the Sabatier principle, moderate adsorption energies (neither too weak nor too strong) would provide the best catalytic performances, but, in our case, an activated CO2 molecule getting bent and negatively charged often implies strong adsorption energies, suggesting that a surplus of energy is required for a reaction to occur when using adsorbed CO2. In any case, among all the studied MXenes, only 6.84% (6 out of 114) of the E ads are weaker than −1.0 eV, which indicates overall a strong CO2 chemisorption over the studied MXenes.
The previous property is accompanied by reduced angles and elongated C−O bonds, indicators of the CO 2 activation. 58 In the latter case, they are concentrated at 1.37 Å, which is 0.20 Å larger than the CO 2 distance in vacuum of 1.17 Å, with a smaller peak at 1.27 Å, and few cases with bond lengths larger than 1.5 Å, like Cr 2 C with a d(CO) of 1.54 Å. When it comes to molecular angles, there are two main peaks around 116 and 132°. Notice thus that all the studied cases imply a bent CO 2 , with angles ranging from 112 to 140°. The increase in both C− O bond elongation and CO 2 bending is consistent with a charge transfer from the surface to the adsorbed molecule. 59 Thus, the Bader charge of the adsorbed CO 2 is also a potential indicator of activation, where the average Q is found to be −1.59 e, with a minimum and maximum value of −2.98 and −0.83 e, respectively, and a significant peak around −1.1 e.
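The two geometric indicators discussed above, d(CO) and α(OCO), follow directly from the Cartesian coordinates of the adsorbed molecule. The sketch below is a minimal illustration with hypothetical function names and coordinates; it is not taken from the original workflow.

```python
import math

def oco_angle(c, o1, o2):
    """O-C-O angle in degrees from the Cartesian positions of C and both O atoms."""
    v1 = [p - q for p, q in zip(o1, c)]
    v2 = [p - q for p, q in zip(o2, c)]
    dot = sum(a * b for a, b in zip(v1, v2))
    cos_angle = dot / (math.hypot(*v1) * math.hypot(*v2))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding errors
    return math.degrees(math.acos(cos_angle))

def mean_co_distance(c, o1, o2):
    """Average of the two C-O bond lengths, d(CO)."""
    return 0.5 * (math.dist(c, o1) + math.dist(c, o2))

# Linear gas-phase CO2 with the 1.17 Å C-O distance quoted in the text.
carbon = (0.0, 0.0, 0.0)
linear_angle = oco_angle(carbon, (1.17, 0.0, 0.0), (-1.17, 0.0, 0.0))
linear_dco = mean_co_distance(carbon, (1.17, 0.0, 0.0), (-1.17, 0.0, 0.0))

# A hypothetical bent geometry with perpendicular C-O bonds.
bent_angle = oco_angle(carbon, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

A smaller angle and a larger mean distance than the gas-phase reference values then signal a more activated molecule.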
To understand the efficiency of a catalyst, one requires descriptors that correlate with the catalyst performance. Hence, for a practical use, the selected primary features or descriptors should be much facile to evaluate when compared with that of the target properties and, whenever possible, connect with chemical intuition-derived concepts. Thus, for a fruitful comparison of unique fingerprints, we have considered 18 primary features aimed to characterize the local environment of the adsorption sites, chosen among the properties of pristine MXenes, but also including features from the atoms comprising the MXene. These primary features are rapid to obtain, unique, and easily accessible. Typically, since the binding energies scale linearly with the d-band filling, the adsorption strength could be linked to the TM d-band energy distribution. Figure S4 displays the linear correlations between the target properties; E ads , d(CO), α(OCO), and Q, and primary features of MXenes, including some of the best performing or alleged descriptors in the literature, such as the d-band center, ε d , the exfoliation energy, E exf , the work function, ϕ, the metal electronegativity, χ M , the valence electron number of a metal atom, M V , and Bader charge of surface metal atom, q M , along with the regression coefficients R. For a better understanding, the R values of the aforementioned descriptors are provided separately for pure and defective MXenes and summarized in Figure 4, also regarding C-and N-based MXenes separately.
As seen in Figure 4, for both pure and defective MXenes, E ads shows better linear trends with the primary features, while d(CO) and α(OCO) show poor correlations when compared with the other target properties. In the case of pure MXenes with varying thicknesses, the detailed analysis demonstrates that the R value of E ads as a function of ε d improves with increasing thickness. For defective systems, the R value is smaller for single vacancies, V M and/or V X , while the R score increases in the case of double vacancies, i.e., V MX . Interestingly, the R value of several primary features exceeds that of the d-band center. For instance, in all cases, M V shows better scaling relations than the other descriptors. It should be noted that the d-band center is quite a universal descriptor for E ads of different adsorbates at transition metal surfaces representing catalyst models. However, there are several indications that the d-band center itself is not an adequate descriptor for more complex compounds. 60−63 On the other hand, q M shows very small regression coefficients, indicating that the target properties correlate poorly with this primary feature. Notably, M V and ε d appear as the top two descriptors, independently establishing the relationship with the target properties. However, we were unable to establish a better relationship with the target properties using single descriptors alone, so that the integration of multiple descriptors is required to reach a more accurate description. Therefore, these insufficient correlations prompted us to build a predictive ML model through a combination of primary features that could resemble the contribution of each feature individually to the model.
Thus, using primary features as input variables, we evaluated various ML models, including MLR, DTR, and RFR methods, using our database. Their MAE together with the standard deviation, σ, are shown in Table 1. In the case of E ads , the MAE values are found to be 0.49 ± 0.06, 0.53 ± 0.10, and 0.45 ± 0.06 eV for MLR, DTR, and RFR, respectively. Notice that such errors are around double the typical DFT accuracy of ca. 0.2 eV and are still too large, especially when predicting cases with an E ads weaker than −1 eV. However, for the majority of MXene cases, the accuracy is already enough for a rapid screening, with most of the cases lying between −1 and −4 eV; see Figure 1. For all the combinations of descriptors, the RFR model showed better performance than the MLR and DTR models. Typically, feature importance estimates the weight of a particular descriptor, thereby revealing the most relevant features for predicting the target properties and providing direct chemical insights. Especially for catalytic materials, 30 To further understand the importance of the precise features that correlate with the target properties, it is necessary to remove the descriptors that are less relevant in minimizing the MAE. It should also be noted that an excessive number of features may lead to high prediction bias and low training efficiency. 68 To alleviate this issue, the feature dimension is reduced by employing the leave-one-out approach. Using this method, we eliminate unwanted features by evaluating their impact on the test set MAE. After shortlisting the descriptors according to the leave-one-out approach, HT was performed over RFR by employing cross-validation on various combinations of parameters. Although removing unnecessary features (RUF) and HT exhibited comparable performance, the latter marginally outperformed the former in terms of the lowest MAE.
As per the size of the data set, Figure 5 shows the E ads MAE decay with respect to training set size; in other words, the learning curve, regarding that training set considers randomly selected 80% of the samples, while the test set comprises the remaining 20%. For a better analysis, a cross-validation procedure with 100 shuffle splits was carried out, as done in previous analysis, where average MAE is shown in Figure 5, with areas denoting the standard deviations. 69 Notice on the training set that RFR MAE decay is rather good, 0.16 ± 0.01 eV, rapidly below the 0.2 eV DFT accuracy limit, and especially with an almost negligible standard deviation when having more than ca. 60 samples. Still, the decay of the test set is more pronounced, with larger standard deviations; see Table  1, and with the open question whether the evolution would remain stuck or would still descend when increasing the number of points of the data set. Alternatively, the reached plateau may be indicative of the existence of other descriptors, here not accounted for, which could be critical in improving the accuracy. Similar MAE evaluation is found for d(CO), α(OCO), and Q in Figures S5−S7 of the Supporting Information.
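A learning curve of this kind can be sketched without external libraries by repeating random 80/20 splits and averaging the test MAE for each training set size. In the sketch below, a one-dimensional nearest-neighbor predictor stands in for the RFR model, and the function names and the synthetic data are hypothetical.

```python
import random
import statistics

def one_nn_predict(train, x):
    """Return the target of the training point whose descriptor is closest to x."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def learning_curve(data, train_sizes, n_splits=100, test_fraction=0.2, seed=0):
    """Mean and standard deviation of the test MAE for each training set size."""
    rng = random.Random(seed)
    n_test = round(len(data) * test_fraction)
    curve = []
    for size in train_sizes:
        maes = []
        for _ in range(n_splits):
            shuffled = data[:]
            rng.shuffle(shuffled)                 # one random shuffle split
            test, pool = shuffled[:n_test], shuffled[n_test:]
            train = pool[:size]                   # use only `size` training points
            mae = sum(abs(one_nn_predict(train, x) - y) for x, y in test) / n_test
            maes.append(mae)
        curve.append((size, statistics.mean(maes), statistics.stdev(maes)))
    return curve

# Hypothetical data: 114 noisy (descriptor, E_ads) pairs.
rng = random.Random(42)
data = [(0.1 * i, -0.3 * (0.1 * i) + rng.gauss(0.0, 0.2)) for i in range(114)]
curve = learning_curve(data, train_sizes=[10, 30, 60, 91])
for size, mean_mae, std_mae in curve:
    print(f"{size:3d} training points: MAE = {mean_mae:.2f} ± {std_mae:.2f}")
```

Plotting the mean with a band of one standard deviation against the training set size gives the type of curve described in the text.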
The HT of RFR further improved the accuracy of the model for E ads by reducing the test set MAE to 0.42 ± 0.06 eV. Indeed, estimations on the MAE on the HT of RFR ML using a limit training set of 113 points, and evaluated on the remaining test point, provides slightly better accuracies of 0.15 and 0.40 eV for training and test sets, respectively, over 114 developed ML models, signaling the convergence of the accuracy over the data set. The top four descriptors listed by the RFR model for E ads are the combination of two features of the TM chemical elements, χ M and M V , plus two other computed for the MXenes, ε d and E exf ; see Figure 6, highlighting how important surface metal atoms are and how important is their placement within the MXene arrangement. It should also be noted that the choice of features introduces biasing, but at the same time, favors to counterbalance the overfitting, since we narrow their choice to sensible parameters that have been correlated to the sought, target properties, according to the literature. In the case of d(CO) and α(OCO), the MAE of the testing set is rather good as well, which are found to be 0.04 ± 0.01 Å and 4.84 ± 0.78°, respectively, essentially four times larger than chemical accuracy limits of 0.01 Å and 1°, respectively. For Q, there is a slight decrease in the prediction performance of RFR using HT; from 0.21 ± 0.03 to 0.20 ± 0.03 e, when compared to RUF. To reinforce the employed methodology, we have also compared our results using the recursive feature elimination (RFE) 70 method to filter the descriptors with extreme asymmetry (skewness) and with low/zero variance for recognizing more suitable smaller subset of features. As shown in Table S1 of the Supporting Information, the leave-one-out approach outperforms the RFE method by providing better predictive mean absolute errors. 
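Grid-search hyperparameter tuning with cross-validation, and the limiting case in which each model is trained on all points but one, can be sketched as follows. The sketch is library-free and uses a distance-weighted nearest-neighbor regressor as a stand-in for the RFR model; the hyperparameter names (`k`, `weight_power`), the grid values, and the synthetic data are hypothetical.

```python
import itertools
import random

def knn_predict(train, x, k, weight_power):
    """Distance-weighted k-NN on 1D inputs; weight_power = 0 gives a plain average."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    weights = [1.0 / (abs(xi - x) + 1e-9) ** weight_power for xi, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def cv_mae(data, n_folds, **params):
    """Cross-validated MAE; n_folds == len(data) is leave-one-out validation."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    errors = []
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        errors += [abs(knn_predict(train, x, **params) - y) for x, y in held_out]
    return sum(errors) / len(errors)

def grid_search(data, grid, n_folds=5):
    """Evaluate every hyperparameter combination and return (best MAE, best params)."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((cv_mae(data, n_folds, **params), params))
    return min(results, key=lambda item: item[0])

# Hypothetical data: 114 noisy (descriptor, E_ads) pairs in random order.
rng = random.Random(7)
data = [(0.1 * i, -0.3 * (0.1 * i) + rng.gauss(0.0, 0.2)) for i in range(114)]
rng.shuffle(data)

grid = {"k": [1, 3, 5, 9], "weight_power": [0, 1, 2]}
best_mae, best_params = grid_search(data, grid, n_folds=5)

# 114 models, each trained on 113 points and evaluated on the remaining one.
loo_mae = cv_mae(data, len(data), **best_params)
```

Setting the number of folds equal to the number of data points reproduces the 113/1 scheme mentioned above, at the cost of training one model per data point.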
Finally, notice in Figure 6 that ε d and χ M are common descriptors of all the explored properties, while others such as ϕ, q M , and E exf are also common to a couple of properties, while M V and Nd M are only important to E ads and Q, respectively.
Notice that the abovementioned ML models work irrespective of MXenes with or without vacancies, and for either C- or N-based MXenes, at variance with linear relationships; see Figure S4 of the Supporting Information, highlighting the versatility of the ML approach. Inspecting the descriptor weights in Figure 6, the ranking already states how ε d and χ M are determinant in CO2 activation, where the larger the ε d , the stronger the bonding is, as expected from the d-band model. 19 Aside, the smaller the metal electronegativity, χ M , the stronger the E ads , fully physically understandable given the coulombic contribution of the bond of negatively charged Figure S4 in the Supporting Information. In any case, the weights of these two primary features are different for the different properties, e.g., ε d weights are 30 and 35% for Q and d(CO), respectively, while for E ads , χ M and ε d actually have similar importance values of 25%. Other secondary features can be rationalized as well; for instance, the CO2 charge Q also strongly affects the molecular angle, α(OCO), and is influenced by a smaller work function, ϕ, which succinctly implies an easier MXene→CO2 charge transfer. E exf affects the bond strengths, and so, the larger the E exf , the smaller the CO2 adsorption energy and the less elongated d(CO) becomes. As far as geometries are concerned, d(CO) and α(OCO) seem to be slightly influenced as well by q M , so that the larger the charge, the smaller the α(OCO) angle and the longer the d(CO), stabilizing the negatively charged CO2. Finally, the number of valence electrons and the number of d electrons, which are somewhat related, affect the E ads and the amount of transferred Q, in the sense that the smaller the number of valence electrons, and so, of d electrons, the stronger the E ads and the more charge transferred, also in line with a higher ε d .
By identifying these descriptors, we have gained a deeper insight of the fundamental properties governing CO 2 activation on the studied MXene surfaces, which can ultimately be used to design and optimize MXene-based compounds for CO 2 storage or conversion applications. Thus, the ML tools allowed us to name which factors govern the CO 2 activation, and which importance they have, which are properties to have in mind when inspecting other MXenes for CO 2 storage or usage selected processes. For example, from the descriptor weights in Figure 6, when one would seek for CO 2 E ads of −1 eV or weaker, one should pay attention to the M V , ε d , χ M , and E exf descriptors; which is in line with the trends evaluated in Figure  S4 of the Supporting Information; one would seek the MXene materials with ε d below −1 eV, while having an E exf above 3.25 J·m −2 , a metal electronegativity of the metal, χ M , above 1.5, and a minimum number of 6 e valence electrons of the metal, M V . Moreover, the coefficient of determination analysis; see heat map in Figure S8 of the Supporting Information, demonstrates that the reduced set of features is sufficient for capturing the complex interactions influencing the E ads , d(CO), α(OCO), and Q, with no significant linear correlation among the found descriptors.

CONCLUSIONS
In summary, we have developed a ML prediction scheme to unearth the potential indicators for CO 2 activation on MXenes with the accessible properties of the pristine materials and of the atoms they are composed of. Three different ML algorithms were trained, where the hyperparameters tuning of RFR improved the accuracy of the model for E ads , reducing the test set MAE to 0.42 ± 0.06 eV when compared with that of the conventional RFR model, while the training set MAE was 0.16 ± 0.01 eV. The high ranking of the d-band center, ε d , and surface metal electronegativity, χ M , is highlighted for E ads , but also for other activation properties, including CO 2 charge,