Electrochemical Mechanistic Analysis from Cyclic Voltammograms Based on Deep Learning

For decades, employing cyclic voltammetry for mechanistic investigation has demanded manual inspection of voltammograms. Here, we report a deep-learning-based algorithm that automatically analyzes cyclic voltammograms and designates a probable electrochemical mechanism among five of the most common ones in homogeneous molecular electrochemistry. The reported algorithm will aid researchers’ mechanistic analyses, utilize otherwise elusive features in voltammograms, and experimentally observe the gradual mechanism transitions encountered in electrochemistry. An automated voltammogram analysis will aid the analysis of complex electrochemical systems and promise autonomous high-throughput research in electrochemistry with minimal human interference.

General considerations for the model of cyclic voltammetry
E mechanism
EC mechanism
CE mechanism
ECE mechanism
DISP1 mechanism
Additional considerations when sampling scan rate v
Additional discussion about the number of voltammograms n needed for mechanism determination
Table S1. The variables and the corresponding value ranges in the numerical simulation
Figure S1. Exemplary simulated cyclic voltammograms with different levels of Gaussian noise
Figure S2. Training of machine-learning algorithms for cyclic voltammetry
Figure S3. The "importance" plots of simulated cyclic voltammograms
Additional References

General considerations for the model of cyclic voltammetry
The variables used in the numerical simulation and their value ranges are listed in Table S1. As shown below, the ranges of the variables' values can be interdependent. Such interdependence and the random sampling are implemented in Python 3 scripts.
Partial differential equations. In the governing equations, the reaction term is the mechanism-specific function that describes any possible C step in the solution; a reaction term of zero denotes the absence of any homogeneous C step.
in which $k^0$ is the standard rate constant of interfacial charge transfer and $\alpha$ = 0.5 is the transfer coefficient.
The above expression suggests that the characteristic time constant of diffusional behavior is $RT/(Fv)$. Therefore, in our simulation, the thickness of the diffusion layer $L$ is adaptively chosen so that $L$ is at least six times the characteristic diffusion length within this time constant at $T$ = 298.15 K (same below).

$$L = 6\sqrt{\frac{D R T}{F v}} \qquad (2)$$
Here the scan rate v is sampled uniformly, on both logarithmic and linear scales, between 0.01 and 2 V/s.
Additional algorithms for sampling n different v values within the same simulated electrochemical system are discussed in detail below.
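As a minimal sketch of the scan-rate sampling described above, the snippet below draws v either log-uniformly or linearly within 0.01 to 2 V/s. The function name `sample_scan_rate` and the `log_fraction` parameter (the share of draws taken on a logarithmic scale) are illustrative assumptions, not names or settings taken from the original scripts.

```python
import math
import random

V_MIN, V_MAX = 0.01, 2.0  # scan-rate bounds in V/s, as stated in the text


def sample_scan_rate(rng, log_fraction=0.5):
    """Draw one scan rate v (V/s), on a log or a linear scale at random.

    log_fraction is a hypothetical knob: the share of draws on a log scale.
    """
    if rng.random() < log_fraction:
        # Log-uniform draw: uniform in log10(v), so every decade is equally likely.
        return 10 ** rng.uniform(math.log10(V_MIN), math.log10(V_MAX))
    # Linear draw: uniform in v itself, which favors the upper end of the range.
    return rng.uniform(V_MIN, V_MAX)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_scan_rate(rng) for _ in range(1000)]
```

Mixing both scales covers slow scan rates (well represented by the logarithmic draws) as well as fast ones (well represented by the linear draws).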
In addition to the Faradaic processes simulated below, capacitive double-layer charging is also simulated in all mechanistic scenarios, with the double-layer capacitance Cdl sampled linearly at random between 5 and 35 μF/cm² based on literature values (ref. 6). The capacitive current idl is simulated from Cdl and the scan rate v.
The current model does not include uncompensated resistance and hence the iR drop. We contend that any serious mechanistic electrochemical analysis should be based on experimental data whose iR drop has been minimized, if not completely eliminated, through judicious instrument settings during the experimental characterization. Moreover, as the reported deep-learning (DL) algorithm does not intend to evaluate the reversibility of charge transfer within the E mechanism, the possible convolution between iR drop and quasi-reversible charge transfer will not negatively impact the practical utility of the DL algorithm. Nonetheless, future versions of the DL algorithm will consider including the impact of iR drop in the training data.

E mechanism
The concentration-dependent Butler-Volmer equation (ref. 1) is employed to define the E step at the electrode interface.
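The two interfacial current contributions mentioned above can be illustrated with a short sketch. It assumes the textbook concentration-dependent Butler-Volmer form for a one-electron step with anodic current taken as positive, and the ideal charging current of a capacitor under a linear potential sweep without uncompensated resistance. The function names, sign convention, and units (mol/cm³, cm/s, A/cm²) are assumptions made for illustration and are not taken from the original scripts.

```python
import math

F = 96485.33212   # Faraday constant, C/mol
R = 8.314462618   # molar gas constant, J/(mol K)
T = 298.15        # temperature in K, as used in the text


def butler_volmer_current_density(e, e0, k0, c_red, c_ox, alpha=0.5):
    """Faradaic current density (A/cm^2) for R <-> O + e-, anodic positive.

    e, e0  : electrode potential and formal potential (V)
    k0     : standard rate constant (cm/s)
    c_red, c_ox : surface concentrations (mol/cm^3)
    alpha  : transfer coefficient (0.5 in the text)
    """
    f = F / (R * T)
    eta = e - e0
    anodic = c_red * math.exp((1.0 - alpha) * f * eta)
    cathodic = c_ox * math.exp(-alpha * f * eta)
    return F * k0 * (anodic - cathodic)


def capacitive_current_density(c_dl, v, direction=1):
    """Ideal charging current density (A/cm^2): direction * C_dl * v.

    c_dl in F/cm^2, v in V/s; direction is +1 (anodic sweep) or -1 (cathodic).
    Assumed form: valid only without uncompensated resistance.
    """
    return direction * c_dl * v
```

For example, Cdl = 20 μF/cm² at v = 0.1 V/s gives a charging current density of 2 μA/cm², which grows linearly with v and can therefore obscure the redox peaks at high scan rates.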
Here, $\alpha$ = 1/2. Following Nicholson's formalism in cyclic voltammetry (ref. 5), the dimensionless kinetic parameter $\psi$ depends on the standard rate constant of surface charge transfer $k^0$. We chose $\psi \in [0.3, 10]$ following Nicholson's formalism (ref. 5), which corresponds to a peak separation of 62 to 120 mV.

EC mechanism
Most of the constraints in the EC mechanism are the same as in the Er mechanism, with the following additional constraints.
The equilibrium constant of the C step is logarithmically sampled between $10^{0.5}$ and $10^{3}$.
The kinetic rate constant of the C step in the forward direction is logarithmically sampled within constrained upper and lower bounds.

CE mechanism
Most of the constraints in the CE mechanism are the same as in the Er mechanism, with the following additional constraints.
The equilibrium constant of the C step is logarithmically sampled between $10^{-3}$ and $10^{-0.5}$.
The kinetic rate constant of the C step in the forward direction is logarithmically sampled within constrained upper and lower bounds.
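The logarithmic sampling of the equilibrium constants for the EC and CE mechanisms can be sketched as follows. The ranges come from the text; the helper names `sample_log_uniform` and `sample_equilibrium_constant` and the dictionary `K_RANGES` are hypothetical, and the scan-rate-dependent bounds on the rate constants are not reproduced here.

```python
import math
import random

# Equilibrium-constant ranges for the C step, as stated in the text.
K_RANGES = {
    "EC": (10 ** 0.5, 10 ** 3),    # C step after E: equilibrium constant > 1
    "CE": (10 ** -3, 10 ** -0.5),  # C step before E: equilibrium constant < 1
}


def sample_log_uniform(rng, low, high):
    """Draw a value whose log10 is uniformly distributed between low and high."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def sample_equilibrium_constant(rng, mechanism):
    """Draw the C-step equilibrium constant for 'EC' or 'CE' (hypothetical helper)."""
    low, high = K_RANGES[mechanism]
    return sample_log_uniform(rng, low, high)


rng = random.Random(0)
k_ec = [sample_equilibrium_constant(rng, "EC") for _ in range(500)]
k_ce = [sample_equilibrium_constant(rng, "CE") for _ in range(500)]
```

Sampling in log space spreads the draws evenly across orders of magnitude, which a linear draw over the same interval would not do.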

ECE mechanism
The E step between R1 and O1 follows the same definition as that between R and O in the E mechanism.
The kinetic rate constant k of the C step is logarithmically sampled within constrained upper and lower bounds. To accommodate the additional redox features, Estart is now linearly sampled between Ewindow,c and −0.6 V vs. NHE.

DISP1 mechanism
The E step between R1 and O1 follows the same definition as that between R and O in the E mechanism.
Estart and the kinetic rate constant k of the C step follow the same definitions as in the ECE mechanism.
The kinetic rate constant kDISP of the DISP step is logarithmically sampled within constrained bounds. This definition may also include scenarios that are similar, but not identical, to the DISP2 mechanism (ref. 9) when the DISP step is slow and rate-limiting (the limiting case of the DISP2 mechanism, however, requires a reversible pre-equilibrium for the C step between O1 and R2). This slight ambiguity in the simulated voltammograms of the training data will be addressed in future versions of the algorithm.
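For the E step that all five mechanisms share, the link between Nicholson's dimensionless parameter ψ and the standard rate constant k⁰ can be made concrete with a short sketch. It uses the standard Nicholson definition for a one-electron transfer with equal diffusion coefficients for O and R, ψ = k⁰ / sqrt(π D F v / (R T)). The function names and the example diffusion coefficient (1e-5 cm²/s) are illustrative assumptions rather than values taken from Table S1.

```python
import math

F = 96485.33212   # Faraday constant, C/mol
R = 8.314462618   # molar gas constant, J/(mol K)
T = 298.15        # temperature in K, as used in the text


def k0_from_psi(psi, v, diff_coeff):
    """Standard rate constant k0 (cm/s) for a given Nicholson parameter psi.

    Assumes one electron and equal diffusion coefficients for O and R.
    v in V/s, diff_coeff in cm^2/s.
    """
    return psi * math.sqrt(math.pi * diff_coeff * F * v / (R * T))


def psi_from_k0(k0, v, diff_coeff):
    """Inverse relation: Nicholson parameter psi for a given k0 (cm/s)."""
    return k0 / math.sqrt(math.pi * diff_coeff * F * v / (R * T))


# Hypothetical example: D = 1e-5 cm^2/s at v = 0.1 V/s.
k0_slow = k0_from_psi(0.3, 0.1, 1e-5)   # lower end of the psi range in the text
k0_fast = k0_from_psi(10.0, 0.1, 1e-5)  # upper end of the psi range in the text
```

Because ψ scales as v to the power −1/2 at fixed k⁰, raising the scan rate lowers ψ and widens the peak separation, which is why the scan-rate range has to be chosen together with k⁰.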

Additional considerations when sampling scan rate v
In the sampling of simulated cyclic voltammograms, variables intrinsic to the chemistry of the electrochemical systems are first sampled either linearly or logarithmically. Variables related to the electrochemical testing conditions, including Estart, Ewindow,a, Ewindow,c, and v, are sampled subsequently. Particular attention was paid to the sampling of v, since multiple chemistry-intrinsic variables also depend on the v values, as shown in Table S1. Because we aim to obtain up to 6 simulated cyclic voltammograms with different v values (n = 6), an iterative process of variable sampling was implemented in the Python 3 scripts, as shown below.
Step 1. After the initial generation of random combinations of the chemistry-related variables listed in Table S1, a medium scan rate vmedium is linearly sampled between 0.1 and 0.5 V/s, the range of v most commonly used in cyclic voltammetry.
Step 2. The maximal and minimal scan rates vmax and vmin are randomly selected subject to constraints. These constraints ensure that (i) vmax and vmin are within the range of 0.01 to 2 V/s; (ii) the peak separations in Nicholson's formalism (ref. 5) do not deviate too much from the targeted peak separation (62 to 120 mV); (iii) the redox peaks at the maximal scan rate are not rendered indistinguishable by capacitive double-layer charging/discharging; and (iv) there are significant differences, a $10^{0.6}$ ≈ 4-fold difference in current densities, among the n simulated cyclic voltammograms.
Step 4. Four additional v values are linearly or logarithmically sampled between vmin and vmax, leading to 6 values of v in total (n = 6).
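The steps above can be sketched as a small rejection-sampling routine. This is a simplified illustration only: the actual constraints on vmax and vmin are not reproduced, and the single stand-in constraint used here (`min_ratio`, a minimum vmax/vmin ratio of 10^1.2, which would give a roughly 4-fold current difference if peak currents scaled with the square root of v) is an assumption, as are all function and parameter names.

```python
import math
import random

V_LO, V_HI = 0.01, 2.0          # overall scan-rate window in V/s (from the text)
V_MED_LO, V_MED_HI = 0.1, 0.5   # window for v_medium in V/s (from the text)


def log_uniform(rng, low, high):
    """Draw a value whose log10 is uniformly distributed between low and high."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def sample_scan_rates(rng, n=6, min_ratio=10 ** 1.2, max_tries=1000):
    """Return n sorted scan rates (V/s) for one simulated system.

    min_ratio is a hypothetical stand-in for the constraints of Step 2.
    """
    # Step 1: medium scan rate, sampled linearly.
    v_medium = rng.uniform(V_MED_LO, V_MED_HI)

    # Step 2 (simplified): draw v_min below and v_max above v_medium,
    # rejecting pairs that are too close together.
    for _ in range(max_tries):
        v_min = log_uniform(rng, V_LO, v_medium)
        v_max = log_uniform(rng, v_medium, V_HI)
        if v_max / v_min >= min_ratio:
            break
    else:
        raise RuntimeError("no (v_min, v_max) pair satisfied the constraint")

    # Step 4: fill in the remaining n - 2 values, linearly or logarithmically.
    if rng.random() < 0.5:
        inner = [rng.uniform(v_min, v_max) for _ in range(n - 2)]
    else:
        inner = [log_uniform(rng, v_min, v_max) for _ in range(n - 2)]
    return sorted([v_min, *inner, v_max])


rates = sample_scan_rates(random.Random(1))
```

Rejection sampling keeps the code short; with several interdependent constraints, sampling directly within the feasible bounds would waste fewer draws.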

Additional discussion about the number of voltammograms n needed for mechanism determination
As discussed in the main text and presented in Fig. 2e, when n ≥ 2 the prediction accuracies of DL models trained by {v, i(E, σ)}n (n = 1 to 6, σ = 0.3) remain similarly satisfactory (> 95%). These results suggest that, within the tested set of simulated voltammograms, there are on average diminishing returns in prediction accuracy when n > 2. Here we provide additional discussion and illustrate the parameter range in which this statement is applicable, given the defined parameter space of simulated voltammograms provided in the Supplementary Information and listed in Table S1.
The sampling of v is described in the section "Additional considerations when sampling scan rate v" of the Supplementary Information. Here the "≈" sign indicates that the above relationship is a statistical approximation.

Figure S2. Training of machine-learning algorithms for cyclic voltammetry. Here the "linear classification" model is the simplest, as it requires that the high-dimensional data be linearly separable. In the fully connected MLP, every node of every layer is connected to every node of the layers before and after it. In contrast, the ResNet architecture is not fully connected and additionally has residual layers, which help prevent training problems such as exploding and vanishing gradients that can occur during the optimization of network architectures.

Figure S3. The "importance" plots of simulated cyclic voltammograms.
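The difference between a fully connected layer and a residual layer, as described in the caption of Figure S2, can be illustrated with a few lines of plain Python. This is a toy sketch under stated assumptions: it uses dense layers for brevity, the function names are hypothetical, and it is not the architecture trained in this work.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in values]


def dense(x, weights, bias):
    """Fully connected map: every output node sees every input node."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def mlp_layer(x, weights, bias):
    """Plain layer of a multilayer perceptron: output = relu(W x + b)."""
    return relu(dense(x, weights, bias))


def residual_block(x, weights, bias):
    """Residual layer: output = x + relu(W x + b), with a skip connection.

    Requires W to be square so that input and output sizes match.
    """
    fx = relu(dense(x, weights, bias))
    return [xi + fi for xi, fi in zip(x, fx)]


x = [1.0, -2.0, 3.0]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3

plain_out = mlp_layer(x, zero_w, zero_b)          # the signal is lost
residual_out = residual_block(x, zero_w, zero_b)  # the input passes through
```

With all weights at zero, the plain layer discards its input whereas the residual block returns it unchanged. The skip connection gives signals and gradients a direct path through the network, which is the property that helps mitigate vanishing gradients in deep models.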