Discovering High Entropy Alloy Electrocatalysts in Vast Composition Spaces with Multiobjective Optimization

High entropy alloys (HEAs) are a highly promising class of materials for electrocatalysis, as their unique active-site distributions break the scaling relations that limit the activity of conventional transition metal catalysts. Existing Bayesian optimization (BO)-based virtual screening approaches focus on catalytic activity as the sole objective and correspondingly tend to identify promising materials that are unlikely to be entropically stabilized. Here, we overcome this limitation with a multiobjective BO framework for HEAs that simultaneously targets activity, cost-effectiveness, and entropic stabilization. With diversity-guided batch selection further boosting its data efficiency, the framework readily identifies numerous promising candidates for the oxygen reduction reaction that strike a balance among all three objectives in hitherto uncharted HEA design spaces comprising up to 10 elements.


S1 DFT dataset
Two distinct DFT datasets were established in this work: an in-domain dataset and an out-of-domain dataset. The in-domain dataset serves as the training set for the ML regression model used to predict adsorption enthalpies. It contains 19,955 data points and comprises three 5-element HEA subsets: AgIrPdPtRu, AuOsPdPtRu, and CuPtReRhRu, covering a total of ten elements. The AgIrPdPtRu subset is taken from Ref. 1, while the latter two subsets were constructed using DFT calculations with the same settings and post-processing steps to ensure data consistency. The data preprocessing is intended to filter out relaxed structures in which the slab has converged to a rearranged state or the adsorbate has moved to a different site. Notably, the existence and viability of the AgIrPdPtRu and AuOsPdPtRu HEA catalysts have been confirmed in recent experimental studies, 2,3 and Cu-based catalysts have shown promising potential. 4,5 More specifically, we considered fcc(111) surfaces with randomly distributed surface atoms and OH* and O* adsorbed in on-top and fcc hollow sites, respectively. To generate diverse slab compositions, compositions were drawn from a Dirichlet distribution with uniform density over the 5-dimensional simplex. In this spanned hyperspace, samples located at the edges and corners exhibit less diverse active-site motifs, which are generally easier to predict than those at the center. The large diversity afforded by this sampling method helps the ML model make predictions throughout the spanned hyperspace. The atom identities of each slab were then drawn according to the sampled composition. Next, on-top OH* and fcc hollow O* adsorption motifs were randomly sampled nine times for each unique slab. After carrying out the necessary post-processing steps, we obtained an in-domain dataset with subset sizes of 5,039, 7,461, and 7,455 for AgIrPdPtRu, AuOsPdPtRu, and CuPtReRhRu, respectively. This in-domain dataset provides a comprehensive and diverse representation of the surface configurations and concomitant active sites.
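The composition and slab sampling described above can be sketched as follows. This is a minimal illustration using the AgIrPdPtRu element set; the 45-atom count corresponds to the 3 × 3 × 5 slabs described in Section S2, and the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
elements = ["Ag", "Ir", "Pd", "Pt", "Ru"]

def sample_compositions(n_slabs, n_elements=5):
    """Uniform-density sampling on the (n_elements - 1)-simplex
    via a flat Dirichlet distribution."""
    return rng.dirichlet(np.ones(n_elements), size=n_slabs)

def assign_atoms(composition, n_atoms=45):
    """Pick atom identities for a 3x3x5 slab according to the
    sampled composition (fractions act as probabilities)."""
    return rng.choice(elements, size=n_atoms, p=composition)

comps = sample_compositions(3)
assert np.allclose(comps.sum(axis=1), 1.0)  # fractions sum to one
slab = assign_atoms(comps[0])
```

Samples from a flat Dirichlet are uniform over the simplex, so slab compositions from the dilute corners to the equimolar center are equally represented.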
The out-of-domain dataset contains 4,020 data points and comprises two subsets, a composition-diversity subset and a component-diversity subset, for the purpose of testing extrapolative performance. The composition-diversity subset focuses on an unseen 5-element combination, RuRhPdIrPt.
To generate this subset, we sampled 120 slabs with different compositions using the Dirichlet distribution method described above. For each unique slab composition, the on-top OH* and fcc hollow O* adsorption motifs were again randomly sampled nine times each. The component-diversity subset contains 100 different 5-element combinations randomly drawn from the 10-element library. For each 5-element combination, we randomly sampled four compositions, and for each composition, three different on-top OH* and fcc hollow O* adsorption motifs were randomly selected. By conducting the same post-processing steps as for the in-domain dataset and previous studies, 1 we end up with an out-of-domain dataset with subset sizes of 2,138 and 1,882 for the composition-diversity and component-diversity subsets, respectively. The adsorption enthalpy distributions for the in-domain and out-of-domain datasets are shown in Figs. S1 and S2, respectively. Additional information on the DFT calculations is provided in Section S2.

S2 DFT computational details
The DFT calculations for the in-domain and out-of-domain datasets were performed using the following settings, which are entirely consistent with previous studies. 1 The GPAW code 6,7 was used with a plane-wave basis set and the revised Perdew-Burke-Ernzerhof (RPBE) exchange-correlation functional. 8 The HEA surfaces were modeled as fcc(111) slabs with randomly distributed atomic arrangements to represent random solid solutions. We used a 3 × 3 atom surface cell with five atomic layers. The larger surface size was meant to capture the intricate local electronic environment, 1,9 and the deeper layers account for long-range and directional ligand effects in HEAs. 10 The bottom two layers were kept fixed in their bulk-truncated positions, while the top layers and adsorbates were relaxed until the maximum force on each atom was below 0.1 eV/Å. All DFT calculations were conducted as periodic slab calculations with a vacuum region of 10.0 Å above and below the slab. A 4 × 4 k-point grid and a plane-wave energy cutoff of 400 eV were employed. The initial surface structure setup and DFT geometry optimization were carried out using ASE 11

S3 Additional details on the Graph Neural Network (GNN) model
Graph representation is a versatile approach for representing various molecular systems, including isolated molecules, crystal structures, and surface-adsorbate systems. 12-15 In this study, we utilize a variant of gated graph convolutional networks 16 that was recently developed for predicting 5-element HEA catalysts, 1 in which a long-range node feature, related to direction-dependent effects in the third layer of the surface slab, is included (see Fig. S3 for the GNN architecture). The GNN model relies on the local environment (connectivity only) of the active site: we consider graph representations that encompass up to second-rank neighbors, equivalent to three node distances from the binding atom of the adsorbate. In this way, the GNN model is size-extensive, enabling predictions for sufficiently large surface cells.
The graph edges between two atoms were determined based on their covalent radii with a skin of 0.3 Å. The node attribute consists of a 14-dimensional feature vector. This vector includes a 12-dimensional element one-hot encoding, representing the ten different metal species as well as the O and H in the adsorbates. Additionally, it includes a one-dimensional zone feature and a one-dimensional long-range feature. The long-range feature is determined based on whether the third layer exhibits a direction-dependent effect through the second-zone atom to the binding atom. More details on the long-range feature can be found in Fig. S4. We also tested the use of a learnable atom embedding initialized based on the atom's identity. 17 However, the predictive performance of this approach was found to be inferior to including the long-range feature within the current GNN architecture. This can be attributed to the fact that, while the graph representation already encodes how atoms bond to other atoms, the long-range feature provides additional directional information that singles out the subsurface atoms relevant to adsorption. We therefore retain the generated 14-dimensional node features.
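The 14-dimensional node attribute can be sketched as follows; the species ordering is an assumption of this illustration, not necessarily that of the original implementation.

```python
import numpy as np

# 12-dim element one-hot (10 metals + O + H), 1 zone index, 1 long-range flag
SPECIES = ["Ag", "Au", "Cu", "Ir", "Os", "Pd", "Pt", "Re", "Rh", "Ru", "O", "H"]

def node_features(symbol, zone, long_range):
    """Build the 14-dimensional node feature vector described above."""
    onehot = np.zeros(len(SPECIES))
    onehot[SPECIES.index(symbol)] = 1.0
    # zone: 0 = adsorbate atom, 1 = active-site atom, 2 = first neighbor, ...
    # long_range: 1 if the third-layer direction-dependent effect applies
    return np.concatenate([onehot, [float(zone)], [float(long_range)]])

feat = node_features("Pt", zone=1, long_range=0)
assert feat.shape == (14,)
```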
The GNN model was implemented using the PyTorch framework with the Adam optimizer and an MAE loss function. We used a random 80%/10%/10% training/validation/test split of the in-domain dataset. During training, an early stopping strategy was used to prevent overfitting: we monitored the validation error by computing the rolling mean over a window of 10 epochs and comparing it to that of all prior epochs. If the rolling-mean validation error did not decrease by at least 1% relative to the previous lowest error within 100 epochs, early stopping was triggered. The learning rate was set to 0.001. To optimize the hyperparameters of the GNN model, we conducted a systematic grid search over the dimensionality of the convolutional layers, the number of convolutional layers, the number of hidden layers, and the batch size. We evaluated each hyperparameter combination using the validation MAE. The results of the grid search, along with the associated validation MAE values, are presented in a parallel coordinates plot (see Fig. S5). Based on the grid search, we identified the optimal hyperparameters as 22 (dimensionality of convolutional layers), 4 (number of convolutional layers), 0 (number of hidden layers), and 64 (batch size), yielding robust predictions. We then report the MAE on the test split as the predictive performance for the in-domain predictions (see Fig. S6).
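The rolling-mean early-stopping rule can be sketched as follows; the window, patience, and 1% threshold are taken from the text, while the exact bookkeeping in the original implementation may differ.

```python
from collections import deque

def early_stopper(window=10, patience=100, min_rel_improve=0.01):
    """Return a step(val_error) callable that yields True when the
    rolling-mean validation error has stagnated for `patience` epochs."""
    history = deque(maxlen=window)
    best = float("inf")
    stale = 0

    def step(val_error):
        nonlocal best, stale
        history.append(val_error)
        if len(history) < window:
            return False  # not enough epochs for a rolling mean yet
        rolling = sum(history) / window
        if rolling < best * (1.0 - min_rel_improve):
            best = rolling  # improved by at least 1% -> reset counter
            stale = 0
        else:
            stale += 1
        return stale >= patience  # True -> stop training

    return step
```

A small window and patience make the behavior easy to check: errors that stop improving eventually trigger the stop flag.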
For the extrapolation tasks, using the identified optimal hyperparameters, we performed a random 90%/10% split of the in-domain dataset for training and validation and used the same early stopping strategy described above. We then assessed the model's performance on the composition-diversity and component-diversity datasets, which serve as the out-of-domain predictions (see Figs. 2b-e in the main text). The resulting pretrained GNN model is used for inference in the multi-objective optimization.
Figure S3: Schematic illustration of the GNN architecture. The elemental species are encoded as one-hot vectors of size 12 (i.e., 10 metal species plus hydrogen and oxygen). "Zone feature" refers to the sequence number of the zone in which an atom resides (e.g., the zone feature of an adsorbate atom is 0, that of the active-site atom is 1, that of the first nearest neighbors is 2, etc.). The long-range feature is described in Fig. S4. The graph edges between two atoms were determined based on their covalent radii with a skin of 0.3 Å.

S4 Additional details on the diversity-guided multiobjective Bayesian optimization (BO)
Our customized multi-objective BO framework is built upon a recently developed diversity-guided approach. 18 In general, multi-objective optimization methods consist of four key elements: the surrogate model, the acquisition function, the solver, and the selection strategy. Specifically, in this work we utilize GPR with a radial basis function kernel, the identity acquisition function, the Pareto-discovery solver, 19 and a diversity-guided selection strategy, respectively.
For more details on each of these key elements, a thorough comparison to single-objective optimization methods used in previous studies, 1,20 and the hyperparameters used, please refer to Sections S4.1 and S4.2. Furthermore, we carefully customized the diversity-guided approach 18 for our HEA application in the following ways: 1) For a given number of elements N, the optimization problem is subject to an equality constraint requiring the fractions of the individual elements to sum to 1. To handle this equality constraint efficiently, we reformulated it as two inequality constraints, ∑_{i=1}^{N−1} C_i − 1 < 0 and C_N < 0.9, with each individual fraction limited to the range 0 to 0.9. In this way, the number of free variables is N − 1, and the Nth variable is set deterministically as C_N = 1 − ∑_{i=1}^{N−1} C_i. Note that C_N is likewise confined to the range 0 to 0.9: the first constraint ensures that C_N is slightly greater than 0, while its upper limit is directly imposed by the second constraint, C_N < 0.9. 2) The original method employed Latin hypercube sampling in both the initial sampling step and the stochastic sampling step of the Pareto-discovery solver. However, it is hard to obtain valid samples that satisfy the equality constraint ∑_{i=1}^{N} C_i = 1 in this way. Instead, we draw variables from a uniform hypercubic sampling and transform them using quantiles of gamma distributions, followed by normalization, resulting in a Dirichlet-like distribution. This sampling method gives rise to a better discrepancy score than standard Dirichlet sampling, indicating better coverage of the design space.
3) The original implementation for computing the gradient vector in the local optimization step of the Pareto-discovery solver can lead to invalid solutions that fall out of bounds. We improved it by using numerical gradient differentiation that respects the bounds.
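The constrained sampling of point 2 can be sketched as follows. This is a minimal illustration under two assumptions: a scrambled Sobol sequence supplies the uniform hypercube samples, and a unit shape parameter is used in the gamma quantile transform, which yields a flat Dirichlet-like distribution after normalization.

```python
import numpy as np
from scipy.stats import qmc, gamma

def dirichlet_like(n_samples, n_elements, seed=0):
    """Low-discrepancy hypercube samples -> gamma quantile transform
    -> normalization onto the composition simplex."""
    sampler = qmc.Sobol(d=n_elements, seed=seed)  # scrambled by default
    u = sampler.random(n_samples)                 # uniform in (0, 1)^d
    g = gamma.ppf(u, a=1.0)                       # inverse-CDF transform
    return g / g.sum(axis=1, keepdims=True)       # fractions sum to 1

x = dirichlet_like(8, 5)
assert np.allclose(x.sum(axis=1), 1.0)
```

Normalized i.i.d. gamma variates are Dirichlet-distributed, so feeding a low-discrepancy sequence through the gamma quantile function inherits the sequence's improved coverage while still landing on the simplex.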

S4.1 Multi-objective and single-objective BO Methods
In general, the key components of multi-objective optimization can be classified into the surrogate model, acquisition function, solver, and selection strategy. In Fig. S7, we outline our customized multi-objective BO method 18,19 in terms of these four components and compare it with a single-objective BO method that was recently developed for discovering 5-element HEA electrocatalysts. The surrogate model serves to approximate an unknown function that maps the design space to the performance space for each objective. In our multi-objective BO method, we utilize Gaussian Process Regression (GPR) 22 with a radial basis function kernel (eq. 1) as the surrogate model. The choice of GPR is motivated by its data efficiency compared to deep neural networks while also providing uncertainty estimates. It is noteworthy that the number of surrogate models used in multi-objective BO is equal to the number of objectives to be optimized. The radial basis function kernel has been successfully applied in many case studies due to its similarity to the Gaussian distribution, leading to smooth and infinitely differentiable functions.
The acquisition function provides guidance on how the design space should be explored during optimization. For single-objective BO, the most commonly used acquisition functions include expected improvement, 23 probability of improvement, 24 and the upper confidence bound. 25 In the context of multi-objective optimization, the acquisition function is adapted or jointly used to accommodate multiple surrogate models, for instance as expected hypervolume improvement 26 or predictive entropy search. 27 In this study, we employ an identity function, i.e., simply the mean of the GPR posterior. This acquisition function is combined with the Pareto-discovery solver to guide the exploration of the design space.
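The surrogate plus identity acquisition can be illustrated in a few lines; the training data here are synthetic stand-ins, not the authors' objectives, and one such model would be fit per objective.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Synthetic training data standing in for (composition -> objective) pairs
rng = np.random.default_rng(1)
X = rng.random((20, 4))
y = np.sin(X.sum(axis=1))

# GPR surrogate with a radial basis function kernel
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gpr.fit(X, y)

def acquisition(x):
    """Identity acquisition: simply the GPR posterior mean."""
    mean = gpr.predict(np.atleast_2d(x))
    return float(mean[0])
```

Because the acquisition is just the posterior mean, exploration is driven entirely by the Pareto-discovery solver and the diversity-guided batch selection rather than by an explicit exploration bonus.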
Regarding the solver, this work utilizes the Pareto-discovery solver, 19 which aims to discover a continuous Pareto front instead of discrete points, enabling efficient navigation of the trade-off landscape. The Pareto-discovery solver operates iteratively and consists of three main steps: stochastic sampling, local optimization, and first-order approximation.
Firstly, the stochastic sampling step combines newly generated samples from our adapted hypercubic sampling with previously found samples, perturbing them to avoid local minima.
Secondly, local optimization is performed on these perturbed samples, aiming to push each sample towards Pareto-optimal solutions; for this we employ the Sequential Least Squares Programming (SLSQP) method 28 implemented in SciPy, 29 along with numerical gradient differentiation that respects the bounds. Next, once a local Pareto-optimal solution is identified, a first-order approximation locally expands the Pareto-optimal solutions around this point.
This first-order approximation relies on the Karush-Kuhn-Tucker (KKT) conditions, 30 efficiently yielding a dense set of solutions that approximates a piecewise continuous region of the Pareto front. A complete loop is depicted by a green arrow in Fig. S8, and this loop is applied multiple times to continuously push the samples towards Pareto-optimal solutions, ultimately resulting in the dense approximated Pareto front shown as the blue curve in Fig. S8b. It should be noted that the Pareto-discovery solver relies on a data structure, namely the performance buffer shown in Fig. S8c. This buffer is represented as an array discretized using hyperspherical coordinates. Each cell in the buffer stores a list of solutions, including Pareto-relevant points and their associated manifold approximations. Throughout the iterative discovery procedure, the buffer is continuously tracked and updated.
The batch selection procedure begins with the Pareto-front approximation obtained from the Pareto-discovery solver. The initial step involves sparsifying the approximated Pareto-optimal solutions: the idea is to select a single solution from each buffer cell while ensuring that solutions in adjacent buffer cells are close in the design space. This is achieved using a graph-cut method that extracts a sparse subset of optimal points grouped into k linear subspaces. The points are grouped based on their values in the performance space and their proximity in the design space. These linear subspaces thus define diversity regions (as shown by the blue curve in Fig. S8b). Subsequently, the downstream diversity-guided batch selection chooses a batch of samples from these regions that are diverse in both the design and performance spaces, while also maximizing hypervolume improvement (see the selected points in Fig. S8b). This is accomplished through a greedy approach. Remarkably, the selected batch of samples can prevent the optimization process from getting trapped in local minima and can improve the predictive performance of the surrogate model, i.e., the GPR, especially when there are significant uncertainties in the initial iterations.
In comparison to the single-objective BO shown in Fig. S7, the main distinction lies in the solver and the selection strategy. Specifically, the single-objective BO method draws 1000 random samples and evaluates their acquisition values. Once the sample with the maximum acquisition value is found, a grid search is conducted to assess samples around that maximum. However, a major issue can arise if the surrogate model is not trained on informative samples, resulting in poor predictive performance. This can lead to incorrect acquisition values being assigned to suggested samples and, ultimately, to getting stuck in local minima.
In contrast, the multi-objective BO method addresses this problem by incorporating diversity information in both the design and performance spaces, and various measures in the Pareto-discovery solver are used to escape from local minima, which significantly alleviates this issue. 18

S4.2 Hyperparameters in multi-objective BO
For most of the hyperparameters, we maintained consistency with the previous implementation, 18 as shown in Tab. S1. Hyperparameters related to the performance buffer include the number of buffer cells, the maximum number of samples in each buffer cell, the buffer origin, and the tolerance for buffer construction (δ_b). Regarding the Pareto-discovery solver, for the initial samples and batch size per iteration we used 10 and 4 data points for the 5-, 6-, and 7-element HEA spaces, and 30 and 10 data points for the 10-element HEA space. The larger number of initial samples and larger batch size for the 10-element space reflect its much larger design space.

S4.3 Data Efficiency
To quantify the data efficiency of single- and multi-objective BO, we conducted single-objective BO using the method described in Ref. 20 and compared it with the multi-objective BO results in terms of the number of experiments required to achieve 95% of the optimum performance or 95% of the maximal hypervolume indicator, respectively. The single-objective BO runs target catalytic activity, and these results are presented in Tab. S2, where the number of experiments is averaged over five independent runs for each 5D compositional space.
Depending on the compositional space, 15-67 runs are required on average, in reasonable agreement with the ca. 50 evaluations reported in Refs. 20,21. For bi-objective optimization in 5D compositional spaces, the number of experiments required is similar to that needed for single-objective BO. In particular, the results for optimizing catalytic activity and relative cost are comparable (see Tab. S3). Higher-dimensional spaces, such as 6D, 7D, and 10D, require more evaluations to converge, as indicated in Tab. S4, with approximately 100, 110, and 140 evaluations needed to reach 95% of the maximum hypervolume, respectively. Lastly, tri-objective optimization in the same HEA composition spaces requires more evaluations than bi-objective optimization, as shown in Tab. S5, with up to 161 evaluations for tri-objective optimization in the 10D compositional space.
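For two objectives, the hypervolume indicator used in these convergence criteria can be computed with a simple staircase sweep. This sketch assumes a minimization convention and a fixed reference point; both are assumptions of the illustration.

```python
def hypervolume_2d(points, ref):
    """Hypervolume dominated by `points` w.r.t. reference point `ref`
    (both objectives minimized)."""
    # keep only points that dominate the reference, sorted by first objective
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:                        # non-dominated staircase step
            hv += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return hv

# two mutually non-dominated points against ref (1, 1):
# (0, 0.5) contributes 1.0 * 0.5 and (0.5, 0) adds 0.5 * 0.5
assert hypervolume_2d([(0.0, 0.5), (0.5, 0.0)], (1.0, 1.0)) == 0.75
```

Tracking this scalar across iterations gives the "95% of the maximal hypervolume" stopping criterion used in Tabs. S3-S5.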

S5 Additional details on the current density modeling
We use a heuristic current density model to estimate the average current density per active site (eq. 2) relative to Pt(111) as an ORR activity indicator (as reported in Refs. 1,2,20,31,32). The net adsorption sites i are obtained from an explicit simulation that iteratively places adsorbates on the surface (see Section S5.1). The current density j_i of net adsorption site i is then modeled using the Koutecký-Levich equation (eq. 3), where j_D is the diffusion-limited current (set to −1) and j_{k,i} is the kinetically limiting current obtained from an Arrhenius-like expression (eq. 4). Here, the calculation of the kinetically limiting current requires consideration of the ORR associative mechanism described in eqs. 5-8. In eq. 4, ∆G_i denotes the OH* or O* adsorption free energy, ∆G_i^opt represents the optimal OH* or O* adsorption free energy (0.1 eV and 0.2 eV larger than that of Pt(111), respectively), c_i is a scale factor that is 1 for OH* and 0.5 for O*, U is the applied potential (set to 0.82 V vs. RHE, the reversible hydrogen electrode), k_B is the Boltzmann constant, and T denotes the absolute temperature (set to 298.15 K). The value of 0.86 eV is the OH* adsorption free energy that simultaneously minimizes the first two steps of the ORR: the adsorption of molecular oxygen (eq. 5) and the desorption of water (eq. 6).
Consequently, for the on-top OH* site, assuming the first step (eq. 5) and the last step (eq. 8) to be the limiting steps, j_{k,i} is calculated as j_{k,i} = −exp(−∆G‡_{RLS,i}/(k_B T)). For the fcc hollow O* site, considering the two simultaneous proton and electron transfers that form water (eqs. 7 and 8), we estimate j_{k,i} in the same manner as for the on-top OH* site. For a more detailed explanation and derivation, please refer to Sections S5.1 and S5.2 and previous publications. 1,2,20 At the starting point of the explicit simulation, the bare HEA surface is represented by a sufficiently large fcc(111) supercell, e.g., 100 × 100, for the sake of adequate surface statistics.
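The per-site current model of eqs. 3-4 can be sketched as follows. The Koutecký-Levich combination follows eq. 3, but the exact analytical form of eq. 4 is not reproduced in this excerpt; the effective barrier written below (c·|∆G − ∆G^opt| + 0.86 eV − eU) is a paraphrase built from the quantities listed above, not the authors' exact formula.

```python
import math

KB_T = 8.617e-5 * 298.15   # k_B * T in eV at 298.15 K
J_D = -1.0                 # diffusion-limited current (set to -1, eq. 3)
U = 0.82                   # applied potential vs. RHE (V)

def j_kinetic(dG, dG_opt, c):
    """Arrhenius-like kinetic current for one site (paraphrase of eq. 4).
    c = 1 for on-top OH*, 0.5 for fcc hollow O*."""
    barrier = c * abs(dG - dG_opt) + 0.86 - U
    return -math.exp(-barrier / KB_T)

def j_site(dG, dG_opt, c):
    """Koutecky-Levich combination (eq. 3): 1/j = 1/j_D + 1/j_k."""
    jk = j_kinetic(dG, dG_opt, c)
    return 1.0 / (1.0 / J_D + 1.0 / jk)
```

At the optimal adsorption energy the kinetic term is largest and the site current approaches the diffusion limit; far from the optimum the kinetic term collapses exponentially and dominates the series combination.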
Based on the adsorption model described above, we evaluate all active sites available on the bare surface with the help of the ML regression model and place the first adsorbate on the strongest-binding site. At the next iteration, restriction rules are imposed to mimic competitive co-adsorption between O* and OH*, ensuring that no surface atom is bonded to more than one adsorbate. More precisely, these rules are: 1) An adsorbed on-top OH* obstructs the three adjacent fcc hollow sites. 2) If two adsorbed on-top OH* molecules are immediate neighbors, the five adjacent fcc hollow sites and the two shared on-top sites are obstructed. 3) An adsorbed fcc O* obstructs the three adjacent on-top sites and the six neighboring fcc hollow sites (see Fig. S12). Under these rules, the next-strongest binding sites are occupied accordingly. This procedure is executed iteratively until no sites remain available.
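The iterative filling procedure can be sketched as follows. Here `neighbors_blocked_by` is a hypothetical stand-in for the three geometric blocking rules above; the toy example uses a 1-D neighborhood purely for illustration.

```python
def net_adsorption(site_energies, neighbors_blocked_by):
    """Greedy strongest-first site filling.
    site_energies: {site_id: predicted adsorption energy (eV)}.
    neighbors_blocked_by(site): sites obstructed by occupying `site`.
    Returns the list of occupied sites in order of placement."""
    available = set(site_energies)
    occupied = []
    # strongest binding first, i.e. most negative adsorption energy
    for site in sorted(site_energies, key=site_energies.get):
        if site not in available:
            continue  # already obstructed by an earlier placement
        occupied.append(site)
        available.discard(site)
        for blocked in neighbors_blocked_by(site):
            available.discard(blocked)
    return occupied

# toy example: a 1-D chain where occupying a site blocks its two neighbors
energies = {0: -0.5, 1: -0.9, 2: -0.4, 3: -0.8, 4: -0.1}
occ = net_adsorption(energies, lambda s: {s - 1, s + 1})
assert occ == [1, 3]
```

The surviving (occupied) sites constitute the net adsorption enthalpy distribution of Section S5.1.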

S5.2 Modeling reactivity of oxygen reduction reaction (ORR)
The ORR is a very sluggish reaction, represented by eq. 9 with a reduction potential of +1.23 V vs. the reversible hydrogen electrode (RHE): 34

O2(g) + 4 (H+(aq) + e−) ⇌ 2 H2O(l) (9)

We consider and express the associative mechanism as the sequence of four elementary PCET steps described in eqs. 5-8, where ∆G_i is the Gibbs free energy change (reaction energy) of each step, e is the elementary charge, and U is the electrochemical potential (vs. RHE). To obtain a reaction rate, we use an Arrhenius expression, r = k · exp(−∆G‡_RLS/(k_B T)), in which k is a pre-exponential factor, ∆G‡_RLS is the Gibbs free energy change of the transition state of the rate-limiting step, k_B is the Boltzmann constant, and T is the temperature (set to 298.15 K).
Since the electron-proton transfers in these reactions are generally facile, 35 the reaction barriers are often neglected, so that the relevant ∆G‡_RLS is equal to the reaction energy of the most uphill elementary step: ∆G‡_RLS ≈ max(∆G_1, ∆G_2, ∆G_3, ∆G_4). We therefore need to express the Gibbs free energy change of the rate-limiting step in terms of the adsorption enthalpies of the key intermediates. It has been shown that in HEAs there are same-site linear scaling relations between the on-top OOH* and on-top OH* binding energies, 36,37 such that ∆G_OOH* can be approximated as ∆G_OOH* = ∆G_OH* + 3.2 eV. We then rewrite the first step of the associative mechanism as ∆G_1 = ∆G_OH* − 1.72 eV + eU. Considering only the first and the final step to be rate-limiting, 33,38 we obtain ∆G‡_RLS ≈ max(∆G_OH* − 1.72 eV + eU, −∆G_OH* + eU).
Note that the optimal adsorption free energy of 0.86 eV, at which the catalytic activity peaks, has been observed to be 0.1 and 0.2 eV larger than those of OH* and O* on Pt(111), respectively. 39,40
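The volcano argument above can be checked numerically: with ∆G_1 = ∆G_OH* − 1.72 eV + eU and ∆G_4 = −∆G_OH* + eU, the limiting barrier max(∆G_1, ∆G_4) is minimized exactly at ∆G_OH* = 0.86 eV.

```python
def limiting_barrier(dG_OH, U=0.82):
    """max(dG_1, dG_4) in eV for a given OH* adsorption free energy."""
    dG1 = dG_OH - 1.72 + U   # O2 adsorption step via the OOH* scaling relation
    dG4 = -dG_OH + U         # final water-formation step
    return max(dG1, dG4)

# scan dG_OH from 0.00 to 1.72 eV in 0.01 eV steps
grid = [x / 100 for x in range(0, 173)]
best = min(grid, key=limiting_barrier)
assert abs(best - 0.86) < 1e-9  # volcano apex at dG_OH* = 0.86 eV
```

The apex value 0.86 eV is simply half of 1.72 eV, the point where the two legs of the volcano cross; it is independent of U, which only shifts both legs vertically.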

S6 Correlated mixing entropy
While the ideal mixing entropy is in widespread use in HEA studies, it has been proposed that the formation of a random solid solution is influenced not solely by S_id but also by other factors, such as the atomic size difference and the mixing enthalpy of the constituent atomic pairs. 41,42 Additionally, the occurrence of metastability in random solid solutions has been noted at lower temperatures. In our pursuit to move beyond the ideal mixing entropy, we employed an empirical technique known as correlated mixing entropy, introduced by Yang et al., 43,44 which takes into consideration both atomic size and chemical bond misfit.
Its general structure can be formulated as S_corr = S_id + S_E. In this context, S_id signifies the ideal mixing entropy, while S_E represents the excess entropy of mixing due to correlation. Drawing from statistical thermodynamics and considering the general effect of potential energy fluctuations, Yang et al. derived a formula for S_E, eq. 10, in which x is the normalized energy fluctuation, encompassing contributions from two sources, namely atomic size misfit (x_e) and chemical bond misfit (x_c), as outlined in eq. 11. Furthermore, according to Refs. 43,44, x_e and x_c can be formulated as eqs. 12 and 13, respectively, where K is the average bulk modulus, V is the average atomic volume, H_ij denotes the mixing enthalpy between the i-th and j-th elements, and H indicates the average of H_ij. These tabulated features were obtained from Refs. 45,46. For a detailed explanation of the derivation of the correlated mixing entropy, readers are directed to the original publications. 41,43,44 Leveraging this correlated mixing entropy, we conducted bi-objective optimizations of catalytic activity and correlated mixing entropy across the five 5- to 10-element HEA spaces at temperatures of 1000 K and 2000 K. The resulting learned Pareto fronts are displayed in Fig. S13. A similar trend is evident when compared with the learned Pareto fronts of catalytic activity and ideal mixing entropy (see Fig. 4 in the main text). Notably, the gaps among these HEA spaces are more pronounced at 2000 K, suggesting that the entropic effect becomes more significant with an increasing number of elements in the HEA.
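For reference, the ideal-mixing baseline S_id = −k_B ∑_i x_i ln x_i entering S_corr can be computed directly. The excess term S_E of eq. 10 is omitted here, since its inputs (bulk moduli, atomic volumes, and pairwise mixing enthalpies) are the tabulated quantities of Refs. 45,46.

```python
import math

KB = 8.617e-5  # Boltzmann constant in eV/K

def ideal_mixing_entropy(fractions):
    """S_id = -k_B * sum(x_i * ln x_i) per atom, in eV/K."""
    assert abs(sum(fractions) - 1.0) < 1e-9  # fractions must sum to one
    return -KB * sum(x * math.log(x) for x in fractions if x > 0)

# equimolar N-element alloy reduces to k_B * ln(N); here N = 5
s5 = ideal_mixing_entropy([0.2] * 5)
assert abs(s5 - KB * math.log(5)) < 1e-12
```

The equimolar check makes the trend in Fig. S13 intuitive: adding elements raises the attainable ln(N) ceiling, which is why the gaps between HEA spaces widen as the entropy term is weighted more strongly (higher T).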

Figure S2: Distribution of adsorption enthalpies of the out-of-domain dataset for the (a) composition-diversity and (b) component-diversity subsets.
and CHEAT packages. 1 Specifically, on-top OH* and fcc hollow O* were positioned above the surface at distances of 2.0 Å and 1.3 Å, respectively. After geometry optimization, the adsorption enthalpies were calculated as ∆E_ads = ∆E_HEA+ads − ∆E_HEA − ∆E_Pt(111)+ads + ∆E_Pt(111), where ∆E_HEA+ads and ∆E_HEA are the calculated total energies of the HEA slab with and without the adsorbate (i.e., O* or OH*), and ∆E_Pt(111)+ads and ∆E_Pt(111) denote the calculated total energies of the Pt(111) reference slab with and without the adsorbate.

Figure S4: Illustration of the long-range feature used in the node attributes of the GNN model. The long-range feature is determined based on whether the third layer exhibits a direction-dependent effect through the second-zone atom to the binding atom, i.e., whether the angle ∠1A-2A-3A > 150°. Redrawn based on Fig. 4 in Ref. 10.

Figure S5: Parallel coordinates plot of the hyperparameter search for the GNN model. A grid search was performed over the dimensionality of the convolutional layers, the number of convolutional layers, the number of hidden layers, the batch size, and the learning rate. The validation loss is given in eV.

Figure S6: Parity hexbin plot of DFT-calculated vs. ML-predicted adsorption enthalpies for in-domain predictions of (a) on-top OH* and (b) fcc O* on the test split. The colorbar denotes the number of points in each hexbin.

Figure S7: A general outline of multi-objective optimization, including the surrogate model, acquisition function, solver, and selection strategy. The diversity-guided multi-objective BO and single-objective BO used in this work are presented in line with this outline.

Figure S8: (a-b) Schematic illustration of the Pareto-discovery solver and batch selection strategy. The blue line and blue curve represent Pareto-optimal solutions in the design space and the corresponding Pareto front in the performance space, respectively. The green arrow refers to one iteration of the Pareto-discovery solver. Batch samples are selected from different subclusters of the Pareto front using a greedy approach. (c) Performance buffer data structure used to store candidates during multi-objective BO. Each buffer cell includes the point with the minimum distance to the origin that intersects its corresponding ray, as well as the top K candidates within an allowed tolerance of that minimal distance (these hyperparameters are presented in Tab. S1). The buffer is updated with each iteration of the discovery algorithm. Redrawn from the insets in Ref. 18.

Figure S9: (a) Hypervolume indicator as a function of the number of evaluations, where the curve is averaged over five different seeds and the variance is represented as a shaded region. For each 5-element space, we conduct multi-objective optimization five times using different random seeds. (b) The learned Pareto fronts for five different runs using bi-objective optimization of catalytic activity and mixing entropy. Importantly, we consistently discover the same shape of the Pareto front across these runs, and the uncertainty mainly arises from extreme regions where the activity is either very high or very low. These extreme regions are, however, less interesting for the purpose of discovery.

Figure S10: The evolution of the learned Pareto front with an increasing number of evaluations, up to the (a) 3rd, (b) 6th, (c) 9th, and (d) final iteration. For the initial sampling, we used 10 data points, and in each iteration a batch of 4 data points was suggested. The final iteration is the 23rd, corresponding to a total of 102 evaluations.

Figure S11: The learned Pareto fronts for a selected 5-element HEA space (AgIrPdPtRu) and the 10-element HEA space (AgAuCuIrOsPdPtReRhRu) using tri-objective optimization of catalytic activity, cost-effectiveness, and mixing entropy. The 3D Pareto fronts were projected onto three pairwise 2D subplots, with the estimated probability density for each objective on the off-diagonal. The probability density was normalized independently for each HEA space.
adsorbates onto a sufficiently large surface based on adsorption strength was performed while taking into account the co-adsorption behavior between O* and OH*. The simulation is expedited by a GNN regression model, which enables rapid prediction of adsorption energies within seconds. To ensure adequate surface statistics, we use surface supercells of sizes 100 × 100, 150 × 150, 200 × 200, and 250 × 250 with three atomic layers for the 5-, 6-, 7-, and 10-element HEA spaces, respectively. Three atomic layers are included because the graph representation of the GNN model considers second-nearest neighbors. (Further details on the net adsorption simulation are provided in section S5.1.)
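Under these settings, the surface statistics scale quadratically with the supercell size. A small helper illustrating the assumed site bookkeeping for an fcc(111) supercell (one on-top and one fcc hollow site per surface atom is a convention adopted here for illustration):

```python
def surface_counts(n):
    """Atom and adsorption-site counts for an n x n fcc(111) supercell
    with three atomic layers, assuming one on-top and one fcc hollow
    site per surface atom."""
    return {"atoms": 3 * n * n, "on_top": n * n, "fcc_hollow": n * n}
```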

S5.1 Net adsorption enthalpy distribution

O* + H+(aq) + e− ⇌ OH*,  ∆G_3 = ∆G_OH* − ∆G_O* + eU  (7)

OH* + H+(aq) + e− ⇌ H2O(l),  ∆G_4 = −∆G_OH* + eU  (8)

In order to obtain the current density, the first step is to obtain a net adsorption enthalpy distribution that reflects the actual active sites participating in the catalytic reaction. Unlike the gross adsorption enthalpy distribution, which simply includes all sites available on the surface, the net adsorption enthalpy distribution accounts for active sites affected by the co-adsorption interplay between O* and OH* at a given applied potential. It is obtained by an explicit simulation that iteratively places adsorbates onto the surface based on their adsorption strength. To model the effect of an applied electrode potential U vs. the reversible hydrogen electrode (RHE), we express the Gibbs free energies of OH* and O* as ∆G_OH*(U) = ∆G_OH*(0) − eU and ∆G_O*(U) = ∆G_O*(0) − 2eU,^33 because O* undergoes two proton-coupled electron transfer (PCET) steps whereas OH* undergoes only one. Setting ∆G_OH*(U) and ∆G_O*(U) to zero marks the onset of adsorption, which gives ∆G_OH*(0) = ½∆G_O*(0) = eU. Thus, as U is increased, we compare ∆G_OH*(0) and ½∆G_O*(0) against eU to determine whether adsorption occurs, i.e., whether an active site becomes occupied.
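The potential dependence and the adsorption-onset criterion described above can be sketched as follows (energies in eV, potentials in V vs. RHE, so e = 1 in these units; the scaling ∆G_O* = 2∆G_OH* is the assumption stated in the text):

```python
def free_energies(dG_OH0, U):
    """Potential-dependent Gibbs free energies of OH* and O*:
    dG_OH*(U) = dG_OH*(0) - eU and dG_O*(U) = dG_O*(0) - 2eU,
    using the stated scaling assumption dG_O*(0) = 2*dG_OH*(0)."""
    return dG_OH0 - U, 2.0 * dG_OH0 - 2.0 * U

def oh_adsorbs(dG_OH0, U):
    """OH* adsorption sets in once dG_OH*(U) <= 0, i.e. dG_OH*(0) <= eU."""
    return free_energies(dG_OH0, U)[0] <= 0.0
```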
yields an expression depending only on the adsorption enthalpy of OH*: max(∆G_1, ∆G_4) = max(∆G_OH* − 1.72 eV + eU, −∆G_OH* + eU) = |∆G_OH* − 0.86 eV| − 0.86 eV + eU. Therefore, using the Arrhenius expression and substituting ∆G‡_RLS, the reaction rate becomes r_OH* = k · exp(−(|∆G_OH* − 0.86 eV| − 0.86 eV + eU)/(k_B T)). In practice, we rewrite the reaction rate as r_OH* = k · exp(−(|∆G_OH* − ∆G^Pt_OH* − 0.1 eV| − 0.86 eV + eU)/(k_B T)) relative to a pure Pt(111) surface. By doing so, no additional correction for the electrochemical environment (such as solvation stabilization) needs to be added to the Gibbs free energy changes, as it is assumed to affect the HEA surface and Pt(111) equally and therefore cancels out. To determine the reaction rate of fcc hollow O* sites, we consider the last two PCET steps toward the formation of water as an upper bound for the activity of fcc hollow O* sites. We assume that the O* adsorption enthalpy scales with the OH* adsorption enthalpy for active sites with a slope of 2, i.e., ∆G_O* = 2∆G_OH* (see also section S5.1). Computing the activity of fcc hollow O* sites with an analogous expression gives r_O* = k · exp(−(0.5|∆G_O* − ∆G^Pt_O* − 0.2 eV| − 0.86 eV + eU)/(k_B T)).
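The two rate expressions can be transcribed directly; the prefactor k is left as a hypothetical constant, since its value is not specified here:

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def rate_OH(dG_OH, dG_OH_Pt, U, T=300.0, k=1.0):
    """r_OH* = k * exp(-(|dG_OH* - dG_OH*^Pt - 0.1 eV| - 0.86 eV + eU)/(k_B*T)),
    relative to a pure Pt(111) surface; k is an illustrative prefactor."""
    return k * math.exp(-(abs(dG_OH - dG_OH_Pt - 0.1) - 0.86 + U) / (K_B * T))

def rate_O(dG_O, dG_O_Pt, U, T=300.0, k=1.0):
    """r_O* = k * exp(-(0.5*|dG_O* - dG_O*^Pt - 0.2 eV| - 0.86 eV + eU)/(k_B*T))."""
    return k * math.exp(-(0.5 * abs(dG_O - dG_O_Pt - 0.2) - 0.86 + U) / (K_B * T))
```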

Figure S12: Illustration of the three blocking rules applied during the explicit simulation of the heuristic current density model. (a) An adsorbed on-top OH* obstructs the three adjacent fcc hollow sites. (b) If two adsorbed on-top OH* molecules are immediate neighbors, the adjacent five fcc hollow sites and the two shared on-top sites are obstructed. (c) An adsorbed fcc O* obstructs the three adjacent on-top sites and the six neighboring fcc hollow sites. Redrawn from Fig. 6 in Ref. 1.
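Rule (a) can be sketched as a neighbor lookup on the surface site grid; the specific hollow-site offsets below are an assumed indexing convention for illustration, not necessarily the one used in Ref. 1:

```python
def blocked_hollows(site, n):
    """Rule (a): an on-top OH* at grid position (i, j) blocks the three
    adjacent fcc hollow sites. Hollow sites share the (i, j) indexing, and
    the offsets (0,0), (-1,0), (0,-1) are an assumed lattice convention,
    with periodic boundaries on the n x n supercell."""
    i, j = site
    return {((i + di) % n, (j + dj) % n) for di, dj in ((0, 0), (-1, 0), (0, -1))}
```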

Figure S13: The learned Pareto fronts for the five 5- to 10-element HEA spaces using bi-objective optimization for catalytic activity and correlated mixing entropy at temperatures of (a) 1000 K and (b) 2000 K.
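For reference, the ideal (uncorrelated) mixing entropy of a composition follows S_mix/k_B = −Σ_i x_i ln x_i; the correlated mixing entropy shown in the figure additionally accounts for temperature-dependent short-range order, which this sketch does not model:

```python
import math

def mixing_entropy(x):
    """Ideal mixing entropy per atom in units of k_B for mole fractions x,
    S_mix/k_B = -sum_i x_i ln x_i (zero-fraction components are skipped).
    This is the uncorrelated limit, not the correlated entropy of Fig. S13."""
    return -sum(xi * math.log(xi) for xi in x if xi > 0.0)
```

For an equimolar 5-element HEA this gives S_mix/k_B = ln 5 ≈ 1.609, the maximum over the 5-dimensional simplex.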

Table S1: Hyperparameters of the multi-objective BO.

Table S2: The number of experiments required to reach 95% of the optimum for single-objective BO on catalytic activity.

Table S3: The number of experiments required to reach 95% of the maximal hypervolume indicator for bi-objective BO on catalytic activity and relative cost.

Table S4: The number of experiments required to reach 95% of the maximal hypervolume indicator for bi-objective BO on catalytic activity and mixing entropy.

Table S5: The number of experiments required to reach 95% of the maximal hypervolume indicator for tri-objective BO on catalytic activity, relative cost, and mixing entropy.