Fluctuation Relations to Calculate Protein Redox Potentials from Molecular Dynamics Simulations

The tunable design of protein redox potentials promises to open a range of applications in biotechnology and catalysis. Here, we introduce a method to calculate redox potential changes by combining fluctuation relations with molecular dynamics simulations. It involves the simulation of reduced and oxidized states, followed by the instantaneous conversion between them. Energy differences introduced by the perturbations are obtained using the Kubo-Onsager approach. Using a detailed fluctuation relation coupled with Bayesian inference, these are postprocessed into estimates for the redox potentials in an efficient manner. This new method, denoted MD + CB, is tested on a de novo four-helix bundle heme protein (the m4D2 “maquette”) and five designed mutants, including some mutants characterized experimentally in this work. The MD + CB approach is found to perform reliably, giving redox potential shifts with reasonably good correlation (0.85) to the experimental values for the mutants. The MD + CB approach also compares well with redox potential shift predictions using a continuum electrostatic method. The estimation method employed within the MD + CB approach is straightforwardly transferable to standard equilibrium MD simulations and holds promise for redox protein engineering and design applications.


Introduction
Electron transfer is fundamental in biological processes such as photosynthesis and respiration.2][3][4] Redox active metallocofactors, such as heme, enable many natural oxidoreductases to catalyse a wide range of reactions, including hydroxylation and oxygenation. 1,5Heme-containing proteins are ubiquitously found in nature and are involved in many biological electron transfer processes. 1,6heir redox potentials play a role in determining their activities, such as oxygen binding, electron transfer and catalysis. 1,6he redox potential E of a heme center can be described as its tendency to acquire electrons and thus to become reduced.Thus, the higher the value of the redox potential, the more favorable the reduction of the group.The intrinsic properties of the heme macrocycle are critically modulated by the protein environment. 1,6Many properties are important for this, including the axial residues directly coordinating the iron, the second coordination sphere interactions, such as hydrogen-bonds, and the electrostatic environment surrounding the center. 1,6Given the multitude of factors involved in this 'tuning', the accurate prediction of the redox properties of a heme-containing protein remains a challenge.There is a need for computational methods capable of predicting redox potentials for natural proteins (e.g., for analysing effects of mutations) and potentially in the engineering of proteins, both natural and de novo with altered redox properties.
Engineering of existing redox proteins and construction of novel designs offers possibilities such as tuning enzymes towards alternative substrates and creating novel electron transfer systems 3,7 for applications in biocatalysis, biosensing, biofuel generation and bioelectronics. 6,8Reliable prediction methods will assist functional protein design and complement directed evolution by identifying target sites for mutation.Calculations of redox properties of mutant proteins could usefully be incorporated into design protocols, e.g. to identify promising locations for mutations for synthesis, and narrowing the experimental search spaces for desired properties.
Despite difficulties, examples of useful applications of 'tuning' redox potentials exist (see e.g. 9,10).However, such successes have been based generally on qualitative insight, and trial and error approaches.Alternative predictive methods for redox properties of proteins, whether designed de novo or engineered natural proteins, could significantly accelerate applications in engineering biological systems at the molecular level.
Biomolecular simulations, such as equilibrium molecular dynamics (MD) simulations, can contribute to such developments.Simulations are increasingly assisting the design of proteins, e.g. as catalysts. 11Such approaches, which can be used qualitatively, to predict the stability of designs, are becoming increasingly capable of predicting thermodynamic properties.3][14][15][16][17][18][19][20] Challenges include for example a proper dynamic representation of the different redox states and their electrostatic interactions, and sampling the relevant conformational states (e.g. 13,15).3][14][15] For example, in MD-based free energy simulations, the protein's conformation changes are explicitly treated for a fixed reduction/protonation state.However, such calculations are computationally expensive.
2][23] ).These methods are much faster than MDbased approaches as they sacrifice configuration aspects of the protein and/or solvent.Nonetheless, for most proteins, the lack of explicit dynamics can affect the accuracy of the predictions (e.g. 15 ).Hybrid approaches, combining MD and continuum electrostaticsbased methods have also been developed to estimate protonation and reduction changes. 24,25Such methods require adequate sampling of conformational space, which is still a challenge for most proteins. 21][28] They link the distributions, p(•), and averages, ⟨•⟩, of stochastically fluctuating quantities, such as entropy S, work W or heat Q, for a particular process Λ with those of the time-reversed process Λ. Fluctuation relations establish limits on the microscopic fluctuations of small systems that are much more detailed than the macroscopic laws of thermodynamics. 29,30Practically, they are used to infer free energy differences ∆G based Figure 1: Predicted structure of the de novo monoheme binding maquette protein m4D2.The structure shown was built using Rosetta 37 (for more details, see 38 ).m4D2 is a designed, de novo 4-helical bundle protein that binds heme B. Heme B and the histidine residues axially coordinating the Fe atom are shown with green sticks, whereas the sites of the mutations studied here are shown with magenta sticks.on data from highly non-equilibrium experiments.2][33] Combined with fluctuation relations, these experiments have been been used, for example, to determine ligand binding energies, as well as characterise selectivity and allosteric effects of nucleic acids and peptides. 33ere we use a detailed fluctuation relation, the Crooks relation, 28 with data from MD simulations to calculate protein redox potential changes.To our knowledge, this is the first application of these relations in this context.We use this method to predict the redox potential E of a de novo designed protein, the m4D2 'maquette' (Fig. 1), and several mutants (single and double).m4D2 is a well-characterized soluble four-helix bundle monoheme binding protein.Redox potentials have been experimentally determined for m4D2 and several mutants by optically-transparent thin-layer electrochemistry. 34m4D2 is quite small (about 110 residues long), making it an amenable target for MD simulations. 34[36] 2 Materials and methods

m4D2 structure
The computational design of the monoheme m4D2 structure was performed using Rosetta 37 as described in detail in Ref. 38 In our design (Fig. 1), the heme group is coordinated by two histidines situated on diag-onally opposed helices, and is positioned with its propionate groups pointing towards the end of the bundle.In m4D2, there are two threonine residues (threonines 19 and 77), which are directly involved in key hydrogen bonding interactions with the heme-coordinating histidines.Experimental mutation of threonine 19 to an aspartate 38 changes the redox potential by −28 mV relative to m4D2 (T19D in Tab. 1).Replacing both threonine 19 and 77 with aspartate 38 has a substantial effect on the heme redox potential, decreasing it by −56 mV relative to m4D2 (see double mutant, DM, in Tab. 1).These residues were originally selected for mutation as there is an aspartate in an equivalent histidine-interacting position in horseradish peroxidase, which is believed to play a critical role in the enzyme's catalytic triad by increasing the imidazolate character of the proximal histidine. 39,40This local increase of negative charge was expected to lower the redox potential of the heme.
Arginines 34 and 92 are located on the second and fourth helices of m4D2, respectively, and likely interact directly with the heme propionates through ion pairing and hydrogen bonding, as observed in the 4D2 crystal structure. 38,41The distance between these arginines and the propionate groups was monitored in our trajectories (Fig. S9 in the Supporting Information (SI)), and as predicted, these residues can form direct interactions with the negatively-charged propionates.Mutation of either of these arginines is likely to alter the network of interactions involving the propionate groups.Experimentally, replacement of either arginines by glutamine (so removing the positive charge) is here found to change the redox potential by approximately −30 mV (see R34Q and R92Q in Tab. 1).
Finally, methionine 23, which is located on helix 1, is also close to the heme.Nonetheless and despite its proximity to the heme, experimentally mutating this methionine to asparagine (see M23N in Tab. 1) has little effect on the redox potential of the heme (with a +1 mV change relative to m4D2). 38

MD simulations
MD simulations were used to generate ensembles of conformations of m4D2 41 , four single mutants (T19D, M23N, R34Q, R92Q) and a double mutant, namely T19D-T77D (hereafter labelled DM).The m4D2 model 38 produced by Rosetta described above in section 2.1 was used as the starting point for the m4D2 simulations.Starting structures for simulations of the mutants were created using the mutagenesis tool in PyMOL. 42.For each protein, MD simulations were performed for the reduced and oxidized form.
MD simulations were performed using GROMACS [43][44][45][46] on the University of Bristol's compute cluster, BluePebble.The GROMOS 54A7 force field 47 was used for protein, and parameters for the oxidized and reduced redox center were taken from our previous work. 38Protein models were solvated in dodecahedral boxes, with a minimum distance of 2 nm between the protein and the box limits.The simple point charge (SPC) water model 48 was used.The total net charge of m4D2 and M23N in the oxidized and reduced states is -2 and -3, respectively.For the T19D, R34Q and R91Q mutants, the overall charge of the proteins in the oxidized and reduced states is -3 and -4.Finally, the total charge for the double mutant (in which aspartate residues replaced both T19 and T77) is -4 and -5 for the oxidized and reduced state, respectively.The overall net charge in all systems was neutralized by adding the exact number of positively charged ions to offset the net charge on the proteins.Overall, 2 and 3 sodium (Na+) ions were added in the oxidized/reduced m4D2 and M23N systems; 3 and 4 Na+ ions were included in the T19D, R33Q and R91Q systems; and 4 and 5 Na+ ions were added to the double mutant system.In addition to the ions needed to neutralize the systems (i.e. to make the total net charge of the proteins equal to 0), an ionic concentration of 0.05 M sodium chloride was also included in the simulation boxes to mimic the experimental conditions.
Simulations were performed at constant temperature and pressure, using the velocity rescaling thermostat 49 at T = 298 K and the Parrinello-Rahman barostat 50,51 to maintain the pressure at 1 bar.A time step of 2 fs was used for integrating the equations of motion.The LINCS algorithm 52 was used to constrain bonds in the protein and the SETTLE algorithm 53 was used to keep water molecules rigid.Long-range electrostatic interactions were calculated using a particle mesh Ewald method, 54 with a Fourier grid spacing of 0.12 nm and a 1.4 nm cut-off for direct contributions.1000 steps of energy minimization with the steepest descent method with harmonic restraints applied to heavy atoms, followed by a further 1000 steps restraining the Cα atoms only, then 1000 steps with no restraints, were performed prior to MD simulation.Then, a 3 ns restrained MD relaxation was performed to relax the system, prior to unrestrained MD simulations.
Multiple MD simulations were performed for each system and redox state: ten 500 ns unrestrained MD simulations were performed for the reduced and oxidized states of m4D2 and for the single mutants (T19D, M23N, R34Q and R92Q).Twenty 500 ns unrestrained MD simulations were performed for the T19D-T77D double mutant (DM) in the reduced and oxidized states.The replicas were initiated with different sets of random velocities.In total, this amounts to 70 µs of simulation.All analyses were performed using the GROMACS package [43][44][45][46] and in-house tools.PyMOL 42 was utilised for molecular representation.
All proteins were stable over the simulation time.The simulations showed that the structures of all of the mutants are overall similar to m4D2, as expected.The proteins all appeared to be equilibrated after 100 ns (see Figs. S2-S7 in SI).The first 100 ns were taken as equilibration, and only the last 400 ns was analysed (Figs.S1-S8 in SI), unless stated otherwise.Principal compon-ent analysis was used to evaluate the sampling of the conformational space by the replicates (see Figs. S5-S6).To analyse dynamical changes caused by the mutations, root mean square fluctuation (RMSF) profiles of the Cα atoms were calculated (Fig. S8).The RMSF plots show that the effects of the mutations are localised to the regions surrounding the mutation site.Some decrease local fluctuations (e.g., T19D) while others increase them (e.g., R92Q), relative to m4D2.The RMSF profiles also show that the unstructured loop regions of the proteins are very mobile, representing some of the largest peaks observed Fig. S8.Such dynamic behaviour is also contributing to the high RMSD values observed in Fig. S2 in the SI.As can be seen in Figs.S3-S4, the RMSD profiles for the structured parts of the protein (i.e.excluding the loop regions) show lower Cα deviations from the structures used as starting points for the simulations.These analyses altogether indicate that the MD simulations provide a reasonable conformational sample for calculations of redox potentials.

Nonequilibrium Perturbations
To determine the energy cost of reducing and oxidizing the heme group, conformations were extracted every nanosecond from the equilibrated trajectories (400 conformations per replicate) and used as starting points for the reduction/oxidation events (in a total of 4000 conformations per system for m4D2, T19D, M23N, R34Q and R92Q, and 8000 conformations for the DM).In each of these extracted conformations, the redox state of the heme was (instantaneously) changed.
For each conformation, the energy change associated with the oxidation/reduction of the protein was calculated as the difference in energy between the states in that conformation.Note that these energy differences, which were obtained using a molecular mechanics force field, do not take into account the electronic effects associated with adding/removing an electron from the heme group, i.e. the intrinsic energy for reduction/oxidation of the heme cofactor (the intramolecular contribution to the energy).
The energy difference (∆ϵ) between oxidation states in the protein was determined using the Kubo-Onsager approach: [55][56][57][58] specifically, by calculating the difference in the potential energy of the protein between every pair of reduced/oxidized conformations extracted from the simulations.The large number (thousands) of replicates allows for convergence of the calculated energetics associated with heme reduction and oxidation.
Note that in this work, the oxidation/reduction reorganization energies are obtained by determining the instantaneous energy difference between redox states.This is a simple way to determine the work values W (for more details see section 2.4).An interesting future extension would be to use dynamical nonequilibrium molecular dynamics (D-NEMD) simulations 55,56,58 to estimate such energies.Additional factors (e.g.barostat effects) need to be considered for this.Note also that for the energy difference between the two redox states extracted from the MD simulation we will include only the contribution from the protein.It is clear that the solvent can make a non-trivial contribution.However, when taking the energy differences between states of the protein plus the full solvent, we found that the fluctuations are then too large in comparison to the redox potential we want to extract.A solution to this problem would be to only include a portion of the solvent.But this just shifts the problem of where to make the cut.Future theoretical development is needed that addresses how to adequately include the solvent effects.
It should be noted that the experimental redox potential shifts, while they might appear large, correspond, in fact, to very small free energies relative to the systematic and statistical errors associated with a typical computational calculation.Indeed, this is why predicting redox potential shifts is such a challenging task.Besides the contribution of the solvent, there are other factors that are not considered and that can affect the energy differences.These include quantum effects which are not captured by molecular mechanics approaches, such as changes in the heme's polarization, and density and ionization energy due to different environments.Sampling problems, imprecision in the model produced by Rosetta, biases introduced when building the models for the mutants, force field limitations (e.g. the lack of polarization), uncertainties in the protonations states of the titrable residues are all examples of factors that can affect the energy differences, and thus, computational predictions.

Fluctuation relations
The goal of this investigation is to predict redox potentials E using fluctuation relations applied to the data from these perturbations.We employ a Bayesian generalisation of the procedure to estimate free energy differences via the Crooks fluctuation relation. 28,59irst, we introduce the Crooks relation.This detailed fluctuation relation is formulated for a generic system, such as a harmonic oscillator or a molecule, with at least one externally controlled parameter λ with two settings A and B. For example, the harmonic potential's frequency can be either λ A or λ B , and in a molecule a charge can be absent (λ A ) or present (λ B ). Starting at setting λ A , with an equilibrium state at inverse temperature β = 1/k B T where k B is the Boltzmann constant, the system is pushed (arbitrarily far) out of equilibrium by varying the parameter λ with some protocol Λ.This can either be a smooth variation in time, λ(t), or an instantaneous change, e.g., Work W is received by the system during the action of the non-equilibrium protocol.Crooks' relation 28 quantifies the likelihood p of a specific work value W being required given the forward protocol Λ in comparison to the likelihood of the corresponding negative value −W being required given the backward protocol Λ, i.e., While the left-hand-side contains non-equilibrium work distributions, the right hand side contains equilibrium properties of the system at settings λ A and λ B .Specifically, ∆G = G B − G A is the free energy difference between the two equilibrium states at inverse temperature β.Obviously, in a quasistatic protocol Λ qs where the system is always in equilibrium, one would have p(+W |Λ qs ) = p(−W | Λqs ) and W = ∆G in every run, as expected from macroscopic thermodynamics.The power of relation ( 1) is that it is valid for protocols that drive the system far from equilibrium.
In the context of our heme-containing proteins, the two settings are that an electron is absent (setting λ A , oxidized) or present (setting λ B , reduced).The free energy difference for reduction is then ∆G 60 where E is the redox potential of the heme, F is Faraday's constant (96485.3C mol −1 ), and n is the number of electrons being transferred.In our case, n = 1, so that ∆G = −F E.
The reduction of the heme group is thus identified with the forward protocol Λ, while the oxidation process is the backward protocol Λ.To get the work values for the reduction (oxidation) protocols, we first note that there is no heat contribution since the simulations implement an instantaneous appearance (disappearance) of the electron in the heme, leaving no time for heat to be exchanged. 27,30We define the statistical work value W i , received by the heme in the i-th reduction process Λ as the statistical energy difference W i = ϵ i red − ϵ i ox =: ∆ϵ i for i = 1, ..., µ.I.e., in each run, an equilibrium simulation gives the initial statistical energy value of the oxidised protein, ϵ i ox , and a subsequent non-equilibrium perturbation, where the electron has been removed, gives the final statistical energy value, ϵ i red , for the reduced protein.Similarly, for the backward protocol Λ, the statistical work is For an ensemble of non-equilibrium processes, for a given m4D2-mutant, we obtain a set of work values W = (W 1 , . . ., W 2µ ), where entries 1, ..., µ correspond to reduction and entries µ + 1, ..., 2µ to oxidation.
When information from both directions of a process, Λ and Λ, is available, the commonly used procedure to estimate ∆G is via the Crooks relation (1), as follows: 28,30 Constructing the forward and backward work histograms, p(W |Λ) and p(−W | Λ), one identifies the point W * where they cross, i.e. p(W * |Λ) = p(−W * | Λ).By virtue of the Crooks relation (1) this gives an estimate 33 for the free energy as ∆G = W * .For the m4D2 protein this is illustrated in Fig. 2c), and in Fig. 2g) and Fig. S13 for the mutants.It should be noted that the work distributions in Figs.2c), 2g) and Fig. S13, corresponding to the nonequilibrium work values associated with instantaneous oxidation/reduction processes, can also be viewed as equilibrium distributions obtained from sampling the oxidized and reduced states.
However, this histogram-based method has the caveat that it requires a sufficiently large number of iterations to produce good estimates. 61Not only is finite statistics known to significantly impact the quality of standard free energy estimates, [62][63][64] but using estimators whose validity depends on the size of the sample poses the risk of amplifying potential errors due to limited sampling in MD simulations.To address these caveats, Maragakis et al. 59 put forward a Bayesian framework for the estimation of free energy differences that combines the Crooks relation with Bayes theorem. 61,65The key advantage of using Bayes theorem is that it can extract all the information available in a given sample regardless of its size. 59,61,66Therefore, it can help to prevent amplification of limited sampling.
To use this approach, one first calculates the conditional probability density p(∆g|W ), where ∆g denotes a hypothesis about the true free energy difference ∆G given the work values W provided by the MD simulations.Following Maragakis et al. 59 , we use where ) is the inverse temperature used in the simulations (i.e., T = 298 K).Eq. ( 2) is derived using minimal prior information, see details in the SI subsection S.3.The probability distribution p(∆g|W ), shown in Fig. S14 for m4D2 and its mutants, contains all the information available to estimate ∆G.To map this information into a concrete value for the redox potential E, one uses the estimator (indicated by a tilde) where F is the Faraday constant.This estimator is optimal under the square error criterion, 59,61 with error The use of energy differences ∆ϵ obtained from MD simulations of proteins (see section 2.3), together with the Crooks-Bayes estimator given in Eq. (3) for probability distribution (2), constitutes a new computational method for the prediction of redox potentials.We refer to this method as the MD+CB method, see illustration in Fig. 2. The redox potential shifts of the mutants relative to m4D2 obtained with the MD+CB method are reported in Tab. 1.The estimate δ Ẽ for the shift δE was calculated by subtracting the E-estimate ( Ẽ) for m4D2 from that for T19D, and the errors were propagated.The same procedure was applied to all of the mutants, with results given in Tab. 1.

Note on other methods
Given the MD simulated work values, a variety of estimation methods exist to infer the free energy difference ∆G.One of them is using Bennett's acceptance ratio (BAR). 67,68The BAR estimation method has been shown to be akin to combining Crooks' relation with maximum likelihood estimation. 59,69This makes its reliability generally justified only in the limit of an asymptotically large number of work measurements. 70In contrast, the Crooks-Bayes approach we use here leads to reliable estimates even for small data sets.This is a general property of Bayesian estimation techniques, en-abled by the inclusion of prior information (or the absence of it), such as symmetries 71 , in the calculations of estimators and errors. 61,65nother common method is using Jarzynski's equality, 27 ⟨exp(−βW )⟩ = exp(−β∆G), an integral fluctuation relation that can be deduced from the Crooks relation.Jarzynski's equality provides a useful estimator for ∆G for experiments conducted out of equilibrium that can only implement one direction of the protocol. 31,72t is necessarily less informative than using the Crooks relation, because the average over a work probability density is less informative than the probability dens-ity itself.Note that analogous considerations apply to any other method based on work averages for one direction of the protocol.This includes, e.g., Zwanzig's free energy perturbation (FEP) formula, 73 which coincides with Jarzynski's estimator for instantaneous changes, and the linear response (LR) approximation, which is based on the assumption that work distributions are Gaussian.
Relationships between various free energy estimation methods based on work averages or distributions have been discussed in the literature. 27,59,69,73Hence, in addition to comparing the predictions of the new MD+CB method to experimental results, we here chose to compare them to the predictions of a method that is completely different.This approach, which we abbreviate PB+MC, is based on the widely used continuum electrostatic method 23,74 and is known to perform reasonably well for these systems, providing a baseline for practical protein engineering applications.
The change in redox potential of the heme group between m4D2 and mutants has previously been calculated with this approach. 23,74This method involves simulating the joint binding equilibrium of proton and electrons.It uses a combination of Poisson-Boltzmann (PB) calculations, e.g. with MEAD (version2.2.9), [75][76][77] and Metropolis Monte Carlo (MC) calculations, using the software PETIT (version 1.6). 78The PB calculations compute the individual and pairwise terms needed to obtain the free energies of protonation/reduction changes.These energies are then used in the MC calculations.The changes in the redox potential of the heme group relative to m4D2 is determined from the corresponding reduction curve by extracting the E-shift values corresponding to a reduced fraction of 0.5 in Fig. S11.
The structural model predicted by Rosetta 38 was used for the calculations of m4D2, while models for the mutants were constructed using PyMOL. 42One structure for each system, namely m4D2, T19D, M23N, R34Q, R92Q and the DM, was used for the calculation.We note that a slightly different structural preparation protocol prior to the calculations gives slightly different results. 38The charges for all the atoms in the protein (except the heme group) and radii were taken from the GROMOS 54A7 force field 47 using a previously described procedure. 79The partial charges for the heme group were taken from our previous work. 38These calculations use the temperature of 298 K and a molecular surface defined with a solvent probe radius of 1.4 Å. 79 The dielectric constants used for the solvent (ε sol ) and for the protein (ε protein ) were 80 and 20, respectively. 79An ionic strength of 0.05 M was used.The finite-difference linear PB calculations used a three-step focusing procedure 80 employing consecutive grid spacing of 1.0, 0.5 and 0.25 Å.Each MC calculation comprised 10 5 MC steps and the acceptance/rejection of each step followed a Metropolis criterion 81 using the PB free energies.
We refer to this alternative method as the PB+MC method, and report the predicted redox shifts of the m4D2 mutants in column 5 in Tab. 1.
The mediators are used to facilitate electron transfer between the working electrode and the heme cofactor and therefore promote rapid equilibration in the electrochemical cell. 83o obtain the reduction potentials of the proteins, a Biologic SP-150 was used to apply stepwise potentials between a thin platinum gauze working electrode and a platinum counter electrode, typically over a range of −525 mV to −225 mV vs a Ag/AgCl reference electrode also in the electrochemical cell.The protein sample and thin Pt gauze working electrode were housed in a modified quartz EPR cuvette (Wilmad), with a pathlength of 0.3 mm, and the counter and reference electrodes were held above in a glycerol-free buffer layer within a fused glass side arm tube.UV-Visible absorbance spectra were recorded between 200-800 nm after 30 mins of equilibration at each potential to measure the evolution of the heme absorbance spectrum as cycled between ferric and ferrous states during reductive and oxidative sweeps of potential.Redox potential measurements of horse heart cyctochrome c and m4D2 were used to calibrate the Ag/AgCl reference electrode for each round of redox measurements, enabling reduction potentials to be quoted versus the Nernst Hydrogen Electrode (NHE).The experiments were conducted at room temperature (ca.T = 298 K).
For these b-type heme proteins with bis-histidine coordination, A 416nm represents the position of the oxidized, ferric Soret band, and A 429nm represents the position of the reduced, ferrous Soret band.The ∆A 429nm was plotted against applied potential (mV), and the Nernst equation to calculate the redox midpoint potential (E): Here, E app and E are the applied potential and the redox potential, respectively; R, the universal gas constant; T , the temperature; n, the number of electrons being transferred; [red], the concentration of the reduced ferrous heme; [ox], the concentration of the oxidised ferric heme; and F , Faraday's constant.For a normal reduction reaction such as the general Nernst equation ( 5) can be used to describe the redox potential under non-standard conditions such as those in which the data were collected.
For these experiments, data was collected in triplicate and then processed using a Jupyter Notebook.Normalised, mean data was fit to a 1-electron Nernst equation using the SciPy optimize curve fit function. 84

Results and discussion
Multiple long MD simulations were performed for the oxidized and reduced states of m4D2, four single (T19D, M23N, R34Q, R92Q) and the T19D-T77D double mutant (DM).The trajectories provided ensembles of conformations (4000 for m4D2 and single mutants and 8000 for the double mutant) to calculate the instantaneous oxidation and reduction process of the proteins as described in section 2.3 using the approach illustrated in Fig. 2.
As outlined in section 2.4, fluctuation relations were then used to calculate redox potential shifts for the mutants relative to m4D2 (see results in Tab.1), using the energy differences described above.The energy changes between the reduced and oxidized states were used to determine the statistical work values associated with reduction and oxidation, for each of the six proteins.These work values (Fig. S12) were used to calculate the probability density (2).This encodes the information that the MD simulations provide about the redox potential according to the Crooks relation.All six densities are reported in full in Fig. S14.These probabilities were used to calculate the Crooks-Bayes estimates Eq. ( 3).The redox potential changes calculated in this way are given (relative to m4D2) in Tab. 1 (column 4).The uncertainties for these values were propagated from the mean square errors (4) for each redox potential, and are indicated in brackets.The results for all six proteins showed statistical convergence from µ ≃ 2000 data points (Fig. S15).
While Eq. ( 3) is in principle capable of predicting absolute redox potentials (see Tab. S1 in the SI), it should be noted that the energy differences from the simulations used here do not account for quantum effects (such as, polarization and ionization energy) as those are not captured by molecular mechanics approaches.Therefore, we report the redox potentials as changes relative to m4D2, denoted as δE; see Tab. 1.
The calculated redox potential shifts for the m4D2 mutants determined using the Crooks relation here generally correlate well with the experimental values Tab. 1 (columns 3, 4).For T19D, the change in redox potential predicted by MD+CB result is −4 mV (experimental shift is −28 mV) whereas for M23N is 14 mV (experimental shift is +1 mV).The MD+CB predictions for the two arginine-to-glutamine mutants, namely R34Q Table 1: Experimental redox potentials E (column 2) and corresponding changes δE relative to m4D2 (column 3), where the double mutant (DM) is T19D-T77D.Previously measured redox potentials 38 are indicated in the table.Calculated redox potential changes relative to m4D2 using the proposed MD+CB method, which post-processes the data generated by the MD simulations via the Crooks-Bayes estimator (3), are shown in column 4. Calculated redox potential changes relative to m4D2 using a well established CE approach (column 5), combining PB calculations and MC simulations (PB+MC). 23,74The errors associated with δE were propagated from those for E. Possible reasons for the differences observed between the MD+CB predicted and experimental redox shifts are discussed below.
The MD+CB calculations correctly predict the sign of the redox potential change for all of the mutants.They also give the correct order of the redox potentials shifts for all the single mutants simulated: M23N > T19D > R34Q ≈ R92Q.Indeed, for single mutants, the Pearson correlation coefficient is ρ corr = 0.97.Interestingly, for the single mutants, the MD+CB method predicts redox potential shifts that are consistently offset in comparison to experiment, by around -18 mV.
These findings indicate that this approach may be usefully predictive of redox potential changes for single mutants.The performance of the MD+CB method is overall comparable to that of the PB+MC approach for these proteins.
The good agreement for the single mutants indicate that the MD+CB method can give good results.The energy differences for the double mutant probably arise from issues of modelling the structure, protonation states and sampling the conformational landscapes of this mutant, in one or both redox states.For the MD+CB method to give good results, it is essential that the MD simulations sample the conformations of each state adequately (giving a representative ensemble of structures for each), and that they overlap sufficiently.
Although several microseconds of simulation were performed for each system, the sampling gathered may not be enough to explore the conformations of these mutants in one or both redox states.This is suggested by simulations of, for example, the oxidized state of R92Q, in which we find an unusually persistent direct hydrogen bond between glutamine in position 92 and the heme propionates, present in more than 35% of the total simulation time (Fig. S9 B).The uncommonly high frequency of this interaction in the oxidized R92Q system suggests that these simulations may be trapped in an energy minimum.This persistent hydrogen bond is indeed observed in five of the ten R92Q trajectories for the oxidized state.Predictions for double mutants are generally more difficult because the structural changes induced by two mutations are generally significantly larger than for single mutations.The dynamics of the specific double mutant here (T19D-T77D) are significantly altered from m4D2, due to the two extra negative charges.The introduced aspartate residues form strong electrostatic interactions with nearby positively charged residues (K36 and K94) (Fig. S10).These new salt bridges significantly affect the overall dynamics of the protein, rigidifying it (Fig. S8).This may mean that longer simulations or more repeats are required to sample the conformational space of this mutant properly.Overall (and despite the limitations discussed in section 2.3), our results clearly show that using fluctuation relations to post-process MD simulations data is a reasonably reliable approach to predict redox shifts in proteins, given sufficient sampling of both redox states in MD.
To benchmark the MD+CB method, we also calculated the changes in redox potential relative to m4D2 using a well established CE approach, which combines Poisson-Boltzmann (PB) electrostatic calculations with Metropolis Monte Carlo (MC) simulations. 23,74This method, which uses simplified models for both the solvent (dielectric continuum) and the protein (atomic point charges immersed in a low dielectric medium), allows fast calculation of the free-energy terms associated with redox and protonation changes. 23,74These energy terms are then used to sample protonation and redox states using MC. 23,74.This PB+MC method has already proven to be a valuable tool for the redox engineering of m4D2: it correctly predicted the order of reduction of the designed m4D2 mutants using predicted structures from Rosetta. 38he PB+MC results are presented in (Tab. 1, column 5) and (Fig. S11).Overall, we observe that for some mutants, e.g.T19D and M23N, the PB+MC predictions are closer to the experimentally measured redox shifts while for others, e.g.R34Q and R92Q, the new method combining MD simulations and fluctuations relations (MD+CB) performs as well as the PB+MC approach.
It should be noted that in the PB+MC calculations, the dynamic behaviour of the proteins is implicitly modelled using a dielectric constant.In con-trast, a large set of conformations (4000 for m4D2, T19D, M23N, R34Q and R92Q, and 8000 conformations for the DM) was used for the MD+CB predictions, thus meaning that in this approach, protein dynamics is being explicitly factored into the calculations.This may be an advantage for systems that undergo conformational changes during the reduction/oxidation process.A potentially significant effect included in the PB+MC and not in the MD+CB is the inclusion of protonation state changes in the protein associated with redox changes.This may account for the better performance of PB+MC in some cases.A potentially useful extension to the MD+CB method would be the inclusion of protonation state changes, e.g. through MC calculations or constant pH MD.
For all the mutants studied here, both methods give results with a similar correlation strength with the experimental values, namely 0.85 and 0.84 for the MD+CB and PB+MC methods, respectively.However, for the single mutants only, the MD+CB method shows a correlation of 0.97 with the experimental data compared with 0.61 obtained for the PB+MC method-thus reinforcing the earlier claim that fluctuation relations are, when combined with MD simulations, a valid predictive tool for calculating redox potential changes.

Conclusions
Natural and designed redox proteins are increasingly widely used in technological applications (e.g., in biocatalysis and biomolecular electronics).For such applications, there is a need to be able to predict protein redox potentials, e.g. to aid in designing mutations to change the redox potential to optimize it for a particular application.Molecular simulation tools are potentially useful in this context, to suggest candidates for experimental characterization, and to understand the causes of observed redox potential changes.Here, we have proposed and tested a method for calculation of redox potentials from MD simulations with application of fluctuation relations and Bayesian inference.Comparison of the predictions of the MD+CB approach against experimentally measured redox potentials for point variants of a de novo heme protein indicate that the method is usefully predictive for relative potential shifts.Comparison with the completely different PB+MC method indicates that the approach proposed here performs similarly well for single point mutations.The MD+CB method can be readily applied as a complement to standard equilibrium MD simulations of different redox states, and so may be useful in protein design and engineering applications.

Data availability statement
The files containing the energy differences extracted from the MD simulations for m4D2 and mutants, as well as the scripts needed for the prediction of the redox potential shifts using the MD+CB approach, are available on GitHub (github.com/jesus-rubiojimenez/FlucRels-CrooksBayes).
Acknowledgement JR thanks F Cerisola, M Rider and WP Wardley for helpful discussions.This work is part of a project that has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Grant agreement No. 101021207): AJM and ASFO thank ERC for funding for the PREDAC-TED Advanced grant.AJM and ASFO also thank EPSRC (grant number EP/M022609/1) and BBSRC (grant number (BB/R016445/1 and BB/X009831/1) for support, as well as BrisSynBio, a BBSRC/EPSRC Synthetic Biology Research Centre (Grant Number:BB/L01386X/1). JLRA, ASFO and AJM acknowledge the UKRI sLoLA grant BB/W003449/1.ASFO thanks Oracle for Research for funding.JA and JR acknowledge support from EPSRC (Grants Nos.EP/T002875/1 and EP/R045577/1), and JA thanks the Royal Society for support.JR also acknowledges support from the Surrey Future Fellowship Programme.MD simulations were carried out using the computational facilities of the Advanced Computing Research Centre, University of Bristol (http://www.bris.ac.uk/acrc).

ASSOCIATED CONTENT
Supporting Information available.The supporting information file contains six sections, namely S.1-S.6.In section S.1, several plots evidencing the conformational stability of the MD simulations are shown (Fig. S1-S10).Section S.2 displays the reduction curves for the mutants relative to m4D2 determined using the PB+MC method (Fig. S11).Section S.3 discusses the mathematical procedure to estimate free energy differences by combining the Crooks fluctuation relation with Bayes theorem.Section S.4 contains several figures displaying the non-equilibrium work values computed from the simulation data (Fig S12 ), the work histograms for the forward and backward protocols (Fig. S13), the final posterior probabilities (Fig. S14), and the convergence of our redox potential estimator (Fig. S15).Section S.5 shows a figure with the experimentally determined redox potentials for the M23N, R34Q and R92Q mutants (Fig. 16).Section S.6 provides a table with the experimental and predicted redox potentials for m4D2, T19D, M23N, R34Q, R92Q, and T19D-T77D (Tab.S1).
-Supporting information -In section S.1 below, we display a range of figures to evidence the conformational stability of the MD simulations:   In section S.6 below, we provide the following table : Tab.S1: Experimental and predicted redox potentials E for m4D2, T19D, M23N, R34Q, R92Q, and T19D-T77D.Figure S8: Average Cα RMSF for m4D2, T19D, M23N, R34Q, R92Q and T19D-T77D.The Cα RMSF was calculated for the last 400 ns of simulation and averaged over all replicates (10 replicates for m4D2 and single mutants and 20 replicates for the double mutant).The vertical black lines highlight the mutation site(s).The grey boxes identify the position of the loop regions, namely loop 1 (connecting α-helix 1 to α-helix 2), loop 2 (connecting α-helix 2 to α-helix 3) and loop 3 (connecting α-helix 3 to α-helix 4).Please zoom into the image for a detailed visualisation.S12.This amounts to multiplying a large number of logistic functions with their mirror image, thus leading to the bell-like shape that can be observed. 59More importantly, each profile encodes all the available information needed to calculate an estimate for the true free energy ∆G.To normalise these profiles, we chose a conservative range of ∆g ∈ [−396, 531] kJ/mol, which covered the most likely work values produced by our MD simulations.As shown in Fig. S15, these estimates enable the means to predict redox potentials E, given the relation E = −∆G/F .Here, E and ∆G are measured in units of voltage and of energy, respectively, while F stands for Faraday's constant.Figure S15: Crooks-Bayes estimates Ẽ(W ) ± σ(W ) from µ iterations of the forward/reduction and backward/oxidation protocols.These are calculated using the estimator E(W ) = − d∆g p(∆g|W )∆g/F , which is optimal under the square error criterion. 61,65Here, F is Faraday's constant.The error in such an estimate is given by σ 2 (W ) = d∆g p(∆g|W )[−∆g/F − E(W )] 2 .The posterior probabilities p(∆g|W ) are those in Fig. S14.The final values for the redox potentials in the main text correspond to µ = 4000 iterations in panel A-E, and to µ = 8000 iterations in panel F. As can be observed, all six estimates start to converge to a single value when µ ≃ 2000.

Figure 2 :
Figure 2: Schematic of the MD+CB method used to calculate the redox potential shift (δE) of T19D relative to m4D2.MD simulations and a nonequilibrium perturbation were used to determine the energy cost of reduction (panels (a) and (e)) and oxidation (panels (b) and (f )) for m4D2 and T19D.Reduction and oxidation are identified as the forward (Λ) and backward ( Λ) protocols, respectively, which are needed as input for the detailed Crooks fluctuation relation.The statistical work, W , was determined as W = ϵ fin − ϵ ini, where for the reduction process, the final energy is the energy for the reduced protein, ϵ fin = ϵ red , and the initial energy is the energy for the oxidized protein, ϵ ini = ϵ ox .For the oxidation process, it is the other way around, i.e. ϵ fin = ϵ ox and ϵ ini = ϵ red .The resulting work histograms, p(W |Λ) for the forward/reduction (purple) process and p(−W | Λ) for the backward/oxidation (grey) process are shown in panels (c) for m4D2 and (g) for T19D.These histograms are shown for illustrative purposes, but they are not employed to perform the estimation of redox potentials due to the finite-sample caveats discussed in the main text.Instead, we use the more accurate Crooks-Bayes estimate59 (3), with error (4).For m4D2 and T19D, respectively, panels (d) and (h) show the convergence of these estimates as the number of iterations µ increases.The estimate δ Ẽ for the shift δE was calculated by subtracting the E-estimate ( Ẽ) for m4D2 from that for T19D, and the errors were propagated.The same procedure was applied to all of the mutants, with results given in Tab. 1.

Fig. S1 :
Fig. S1: Average structures of oxidized and reduced m4D2 and its mutants.Fig. S2: Time evolution of the average Cα RMSD for m4D2 and mutants showing that all systems are equilibrated after 100 ns.Fig. S3: Time evolution of the average Cα RMSD for all the Cα atoms and for helices 1-4 in the oxidized systems.Fig. S4: Time evolution of the average Cα RMSD for all the Cα atoms and for helices 1-4 in the reduced systems.Fig. S5: PCA for m4D2 and mutants showing that the replicates sample different regions of the conformational landscape.Fig. S6: PCA for the oxidized and reduced m4D2 and mutants showing that the two states sample different regions of conformational space.Fig. S7: Time evolution of the number of residues without secondary structure showing that the protein structures were stable during the simulation time.Fig. S8: Average fluctuations for the Cα atoms in m4D2 and mutants showing that in general the mutations affect the dynamics of their surrounding regions.Fig. S9: Histogram of the distance between residues in position 34 and 92 and the heme propionates for m4D2 and its mutants.Fig. S10: Histogram of the distance between residue in position 19 and 36, and 77 and 94 for m4D2, T19D and T19D-T77D.

Figure S1 :
FigureS1: Average structures for the oxidized and reduced m4D2, T19D, M23N, R34Q, R92Q and T19D-T77D.The average structures were calculated using the last 400 ns of all replicates for each system (10 replicates for m4D2 and single mutants and 20 replicates for the double mutant).

Figure S2 :
FigureS2: Time evolution of the average Cα RMSD for m4D2 (A), T19D (B), M23N (C), R34Q (D), R92Q (E) and T19D-T77D (F).The Cα RMSD was calculated relative to the starting structures and averaged over all replicates (10 replicates for m4D2 and single mutants and 20 replicates for the double mutant).All the systems were considered equilibrated after 100 ns.

Figure S3 :
FigureS3: Time evolution of the average Cα RMSD for oxidized m4D2 (A), T19D (B), M23N (C), R34Q (D), R92Q (E) and T19D-T77D (F) relative to the starting structures for all Cα atoms (dark green line) and for the structured regions (light green line) of the proteins.The Cα RMSD was calculated relative to the starting structures and averaged over all replicates (10 replicates for m4D2 and single mutants and 20 replicates for the double mutant).The structured region includes α-helices 1-4.Please note that helices 2 and 4 contain the two histidine residues axially coordinating the heme Fe atom.

Figure S4 :
FigureS4: Time evolution of the average Cα RMSD for reduced m4D2 (A), T19D (B), M23N (C), R34Q (D), R92Q (E) and T19D-T77D (F) relative to the starting structures for all Cα atoms (dark green line) and for the structured regions (light green line) of the proteins.The Cα RMSD was calculated relative to the starting structures and averaged over all replicates (10 replicates for m4D2 and single mutants and 20 replicates for the double mutant).The structured region includes α-helices 1-4.Please note that helices 2 and 4 contain the two histidine residues axially coordinating the heme Fe atom.

Figure S5 :
Figure S5: PCA of all replicates for the oxidized and reduced m4D2 (A-B), T19D (C-D), M23N (E-F), R34Q (G-H), R92Q (I-J) and T19D-T77D (K-L).All oxidized and reduced replicates for each system were combined before the analysis, and each trajectory contained one conformation per nanosecond per replicate (totaling 10001 frames for the m4D2 and single mutants and 20001 frames for the double mutant) with all the Cα atoms of the protein.Please zoom into the image for detailed visualization.Note that different replicates sample different regions of conformational space, thus improving the overall sampling for each system.

Figure S6 :
Figure S6: PCA for the m4D2 (A), T19D (B), M23N (C), R34Q (D), R92Q (E) and T19D-T77D (F).All oxidized and reduced replicates for each system were combined before the analysis, and each trajectory contained one conformation per nanosecond per replicate (totaling 10001 frames for the m4D2 and single mutants and 20001 frames for the double mutant) with all the Cα atoms of the protein.Note that the oxidized and reduced systems sample different regions of conformational space.

Figure S9 :
Figure S9: Minimum distance between residue 34 and the heme propionates (A) and residue 92 and the heme propionates (B) in the MD simulations of m4D2, T19D, M23N, R34Q, R92Q and T19D-T77D.Overall distribution of the minimum distance between the sidechain of residue 34 and 92 and the propionates.The histograms reflect the distances over the last 400 ns of each simulation.

Figure S10 :
Figure S10: Minimum distance between residue in position 19 and 36, and 77 and 94 for (A) m4D2, (B) T19D and (C) T19D-T77D.Overall distribution of the minimum distance between the sidechain of residue 19 and 36, and 77 and 94.The histograms reflect the distances over the last 400 ns of each simulation.

Figure S14 :
FigureS14: Numerical posterior probability p(d∆g|W ) = p(∆g|W )d∆g calculated via Eq.(S4) using the work data W in Fig.S12.This amounts to multiplying a large number of logistic functions with their mirror image, thus leading to the bell-like shape that can be observed.59More importantly, each profile encodes all the available information needed to calculate an estimate for the true free energy ∆G.To normalise these profiles, we chose a conservative range of ∆g ∈ [−396, 531] kJ/mol, which covered the most likely work values produced by our MD simulations.As shown in Fig.S15, these estimates enable the means to predict redox potentials E, given the relation E = −∆G/F .Here, E and ∆G are measured in units of voltage and of energy, respectively, while F stands for Faraday's constant. 27,30.