Informatics-Driven Design of Superhard B–C–O Compounds

Materials containing B, C, and O, due to the advantages of forming strong covalent bonds, may lead to materials that are superhard, i.e., those with a Vicker’s hardness larger than 40 GPa. However, the exploration of this vast chemical, compositional, and configurational space is nontrivial. Here, we leverage a combination of machine learning (ML) and first-principles calculations to enable and accelerate such a targeted search. The ML models first screen for potentially superhard B–C–O compositions from a large hypothetical B–C–O candidate space. Atomic-level structure search using density functional theory (DFT) within those identified compositions, followed by further detailed analyses, unravels on four potentially superhard B–C–O phases exhibiting thermodynamic, mechanical, and dynamic stability.


Optimized parameters of developed machine learning models for bulk (K) and shear (G) modulus
The hyper-parameters for the best-selected models are shown in Table S1.

Down-selection of B-C-O compositions for DFT validation
The structure selection criteria employed in this study are summarized as follows: • Initial Downselection: From a pool of 335 compositions predicted to have a hardness greater than 35 GPa, an initial downselection was made based on several criteria.These included the total number of valence electrons and the total number of atoms in each composition.
Further refinement in the selection process involved restricting the number of atoms in the structures.This step was crucial in narrowing down the potential candidates.
• Final Selection Based on Symmetry and Optimization: Among the 19 structures that passed the initial filters, a detailed analysis based on symmetry and optimization was conducted.
This analysis led to the identification of only four structures that met all the required criteria.
As explained in Figure S1, initially 19 B-C-O compositions were identified and downselected to 15 compositions for further DFT calculations.These are listed in Table S2. S-2 This study employed a multi-step selection process involving both quantitative criteria (like valence electrons and atom count) and qualitative analysis (such as symmetry and optimization studies) to narrow down the vast pool of candidates to the most promising few.

Stability assessment for the compositions
For all the potential compositions the thermodynamic, mechanical, and dynamical stability have been verified.The thermodynamic stability has been determined by calculating the formation energy ∆E as shown below: which is the difference between the total energy E of B x C y O z and the total energies of atomic C, B, and O.
The mechanical stability has been determined by using Born's criteria for the stability of monoclinic systems, 1,2 as explained: where The formation energy and the Born stability criteria have been calculated for all of the identified BCO compositions in a similar manner.The dynamical stability has been assessed by calculating the phonon spectra for the identified compositions.S3. S-5

Summary of the structures corresponding to the identified superhard compositions
This study identifies four superhard B-C-O compounds, which include B 1 C 10 O 1 .These systems have been tested for thermodynamic, mechanical, and dynamical stability by calculating the formation energy, using Born criteria, and phonon dispersions through DFT.The summary is shown in Table S3.
Table S3: ML predicted and DFT calculated properties

Features analysis using SHapley Additive exPlanations (SHAP)
Interpreting the descriptors employed in the ML models facilitates the derivation of design guidelines, offering useful insights and rational pathways to accelerate the discovery of new promising materials that meet the desired target requirements.Hence, to analyze the local and global effects of the features on the predicted target values, SHAP analysis was performed.The findings can be summarized as follows: • The feature importance for bulk and shear modulus as shown in Figures S3 (a A smaller crystal unit cell favors a higher bulk and shear modulus due to a tightly packed system, thereby increasing hardness. • Further analysis shows that increasing the number of valence electrons in the p orbital could affect the elastic moduli, in turn, the hardness negatively, as shown in Figures S6 (a) and (b).
This confirms the direct and inverse influence of X min and p val f wm , respectively.In addition, this agrees well with the fact that increasing oxygen contents in BCO compositions leads to smaller hardness.
• Local interpretability was also analyzed through individual SHAP plots, validating the impact of specific features such as X min on the model output, as shown in Figure S7    For a particular observation, if the value of a feature that has a positive (negative) impact on model output is more than its mean value, then it will push the base value towards the right (left).

Figure S1 :
Figure S1: 15 superhard B-C-O compositions identified for further DFT validation by considering ML-informed decision, prior knowledge of compositions, symmetry search, leading to final four identified superhard B-C-O compounds after performing stability analysis.

Figure S3 :
Figure S3: Global interpretability of the features SHAP, where (a) SHAP feature importance plot for bulk modulus (K) and (b) SHAP feature importance plot for shear modulus (G), respectively.

Figure S6 : 2
Figure S6: SHAP dependence plot for fraction weighted mean of the number of valence electrons in p orbital (p_val_fwm) for (a) bulk (K) and (b) shear (G) modulus, respectively.

Figure S7 :
Figure S7: Local interpretability into bulk modulus using SHAP: individual SHAP plots for (a) BeSiIr 2 and (b) Li 2 Ge 4 N 6 .The numbers below the features denote their values for a particular observation, predicted values are shown under f(x), and the base value (100 GPa ) is the mean of the model output over the train data.The features that push the predicted value higher (to the right) are shown in red, and those pushing the prediction lower (to the right) are shown in blue.For a particular observation, if the value of a feature that has a positive (negative) impact on model output is more than its mean value, then it will push the base value towards the right (left).

Table S2
Among the seven symmetrized BCO structures, only five (B 4 C 8 O 4 , B 2 C 9 O 1 , B 2 C 8 O 2 , B 4 C 5 O 3 , and B 4 C 7 O 1 ) were optimized successfully.All the structures are dynamically stable, except B 4 C 5 O 3 , leading to a finalset of four BCO structures with predicted hardness (H pred ) more than 35 GPa.Among these four BCO compounds, B 4 C 7 O 1 has a DFT calculated hardness of 30.9 GPa, which is relatively lower than the other three compounds having calculated hardness of more than 40 GPa, thus identified as superhard.