Bayesian Data Integration Questions Classic Study on Protease Self-Digest Kinetics

We combine Bayesian data integration with kinetic modeling to rigorously identify reaction mechanisms. This approach forces models to be consistent not only with kinetic measurements but with all available information. We revisit a classic study on trypsin self-digest acceleration by colloidal silica. Bayesian data integration reveals that the mechanism suggested in that study is inconsistent with its presented data. We propose an improved hypothesis. However, the detailed mechanism of the surface reaction cannot be inferred from the available data.


S1 Pre-processing of data from JW1981
Due to the considerable time that has passed since JW1981 was published, we had to access their data by digitizing their Figures 4 and 8. 1 JW1981 did not measure the concentration of trypsin directly but spectroscopically monitored the tryptic digest of substrate benzoyl-L-arginine ethyl ester (BAEE). If the substrate is present in great excess, the concentration of trypsin limits the reaction rate. The concentrations were then determined by comparison of the observed rates with corresponding data from experiments with known trypsin concentrations.
For the kinetic experiments, data in JW1981 were given as the inverse of relative activity. The amount of adsorbed trypsin in Figure 8 in JW1981 is given in mg trypsin mg SiO2 . We converted these data by dividing by trypsins molecular weight (23.3 kg mol −1 ) and multiplying with the amount of silica present in the kinetic experiments (10 mg l −1 ) to arrive at the concentration of [TS] in mol l −1 .

S2 Mechanism of surface accelerated self-digest
Colloidal silica in water has a disperse phase. There are two major mechanisms commonly used to describe surface reactions (SRs) in heterogeneous catalysis, Eley-Rideal (ER) and Langmuir-Hinshelwood (LH). Both mechanisms appear in textbooks for reactions with two different reactants and here have been slightly adapted to cater to trypsin-trypsin digest with two identical reactants.

S2.1 surface coverage
Trypsin molecules can bind to and unbind from silica particle surfaces.
T + S
The reaction rate on the surface r surface ER (in mol m −2 s −1 ) equals where: C S = binding site density on the surface, mol m −2 The binding site density is dependent on the size of the adsorbent and the properties of the surface. Since we are only interested in one pair of adsorbent and surface, we do not model C S explicitly. The reaction rate for the system r solution where: We do not need to model A explicitly because JW1981 always worked with the same concentration of silica. The complete system of ordinary differential equations is If unbound and adsorbed trypsin are in equilibrium, the reaction order (ω) w.r.t. to [T] can be determined from If surface coverage (φ) can be considered constant, the rate of product formation is dependent only on the concentration of unbound trypsin in solution. If φ is close to 1, i.e. maximum coverage, the reaction is first-order.
If φ is approximately 0, product formation resembles a second-order reaction. Adding Eq. S2 assumes that the substrate is adsorbed, whereas the digester attacks from the bulk. In principle, these roles could also be reversed with an immobilized digester on the surface proteolysing substrates from the solution: No trypsin adsorbed on the silica surface (TS) is consumed during autolysis in this model. We did not consider this variant of the ER mechanism. Surface reaction (SR) must be faster than bulk reaction (BR) in D SAS . There is no obvious reason why adsorbed enzyme should proteolyse substrate from solution faster than unbound enzyme. JW1981 found no difference in enzymatic activity towards BAEE upon adsorption. Moreover, this mechanism would most likely have the same shortcomings as JW1981's hypothesis discussed in the main text.

S2.3 Langmuir-Hinshelwood (LH)
Langmuir and Hinshelwood developed a mechanism where both molecules are adsorbed on the surface before the reaction (Scheme S2). → TS + P + S (S11) where: where: Contrary to ER, the rate of product formation does not depend directly on the concentration of unbound trypsin. The complete set of differential equations is If unbound and adsorbed trypsin are in equilibrium, the reaction order (ω) equals If φ approaches maximum coverage, product formation converges to a maximum since the maximum amount of available [TS] is capped by the number of binding sites [S] 0 . The reaction appears zeroth order with respect to [T]. For φ ≈ 0, the reaction appears to be second-order. If φ ≈ 0.5, the reaction appears to be first-order.
S2.4 Final model: ER-like surface reaction with two trypsin con-

S3 Requirement for constant surface coverage
We continue to investigate how large the binding constant K a would need to be to yield constant surface coverage for the trypsin starting concentrations of interest, i.e. satisfy Setting [T] a 0 = 11.0 µM and [T] b 0 = 2.2 µM , K a needs to be approximately one order of magnitude larger than inferred from D ADS for φ c < 1.05 ( Figure S1). φ c = 1.05 implies a 5% S8 difference in surface coverage in the beginning for the two starting concentrations of trypsin.
There are some differences between the D SAS and D ADS experiments which might lead to a distortion in K a (see main text). 5 6 7 log(K a / M 1 ) Probability density inferred c < 1.05 Figure S1: K a would need to be substantially larger than implied by D ADS for constant φ.

S4.1 Single trypsin conformation
We describe the bulk reaction (BR) of trypsin autolysis with a single bimolecular reaction, neglecting the reversible formation of the digester-substrate complex included in the Michaelis-Menten model.
where: T = unbound trypsin P = degradation product of trypsin

S4.2 Two trypsin conformations
Assuming two states A, B for trypsin, a simplified version of the autolysis reaction as published by e.g. Nord 4 but neglecting the digester-substrate complex is given by

S4.3 Reaction order of regular autolysis is below 2
One can approximate the reaction order (ω) w.r.t. to [T] by plotting the logarithm of the reaction rate r against the logarithm of [T] and calculating the slope: from data on regular self-digest (D REG ) for the smallest timestep t 2 − t 1 available. We then perform Bayesian inference of ω and C (which is a constant of no further interest to us) with this probabilistic model:

S5 Implicit assumptions
In addition to the explicit assumptions that are included in the chemical equations (e.g.

S5.2 P does not bind to or inhibit trypsin (T)
Northrop mentioned trypsin inhibition by hydrolysis products in 1921. 5 Fraser et al. studied trypsin self-digest and suggested that P can form a complex with T, thus inhibiting trypsin, in line with Northrop's proposal. 6 We neglect this inhibition process for three reasons: 1. We lack information such as the binding constant of P to T. Moreover, the degree of fragmentation is unclear, i.e. how many molecules of (potentially inhibiting) P are generated by digestion of one T.
2. Over 24 hours of regular autolysis, trypsin activity drops to almost zero. 3. Inhibition by P should affect regular and surface accelerated self-digest in the same way, therefore it is of little interest to the project at hand. S13

S5.3 Degradation of T to P is irreversible
Kukor et al. found trypsin to resynthesize the R117-V118 peptide bond. 8 Cleavage of R117-V118 appears to be an important step in the degradation of trypsin; 7 resynthesis of this particular bond is likely to be necessary for reversing inactivation. Neglecting the back reaction is acceptable as experiments show that bulk reaction (BR) trypsin self-digest continues until barely any activity is left. 7

towards BAEE
We derive experimental trypsin concentrations from relative measurements of trypsin activity towards BAEE (Section S1). The question if A and B (and their adsorbed forms) are equally active towards BAEE is by itself unrelated to the mechanism of (surface accelerated) S14 autolysis. Nevertheless, it is relevant to ensure comparability of modeled concentrations to experimental data.
Gabel and Kasche described that partially unfolded trpysin states exist and retain their enzymatic activity. 9 The same authors found that Ca 2+ leads to a more compact trypsin conformation and reduced autolysis but similar activity towards BAEE. 10 Therefore, assuming A and B to be equally active is plausible.
Kunitz and Northrop found that trypsin deactivates reversibly and assumed that this

S6 Bayes inference
Bayes theorem is where: θ i is a subset of the overall parameter space θ. One can also interpret this approach as Bayesian inference for one data set, using the posterior that we have determined from all other data sets as prior: All of our models return concentrations. The likelihood function assumes a normal distribution (Eq. S35). For the current study, the assumption of normality seemed safe, which is supported by the close match of measured data and posterior predictions. If concentrations were close to zero, relative to σ, using a normal distribution might lead to uncertainty intervals and posterior predictions of negative concentrations which are unphysical. where: Both σ f ix and f are inferred from the data. σ f ix represents an absolute error (m) whereas f accommodates relative errors (unitless) dependent of the observed concentration.
If the experimental uncertainty is the same for two experiments i and j, σ i should equal σ j . For JW1981, we assume that the experimental uncertainty of trypsin concentrations with or without silica surfaces is identical, hence σ D SAS = σ D REG . The experimental uncertainty σ D ADS of the adsorption isotherm is unrelated and inferred independently.

S6.2 Priors
We use weakly informative uniform priors for all parameters of the kinetic models (shown in Figure 2b in the main text). Rate constants of chemical reactions span many orders of magnitude. Parameters are supposed to be restrained by experimental data; priors often 2 https://emcee.readthedocs.io/en/latest/tutorials/line/; accessed at 13:48, January 9th, 2020 help numerical inference to converge, or at least they avoid wasting computational time in flat areas of the posterior landscape.

S6.2.1 Likelihoods are indifferent if step is not rate-determining
Consider the following simple mechanism: Let's assume concentrations over time are only available for X and Z, and that the first step is rate-determining. The likelihood is unresponsive to k 2 as long as k 2 k 1 . Therefore, the posterior landscape would be flat. Sampling in this area would be wasteful and uninformative. A reasonable prior that truncates k 2 would simplify sampling. Note that the choice of the prior will impact the observed density ( Figure S5). Unlike in many other applications of Bayesian statistics, even an infinite amount of data is not guaranteed to dominate over the prior.

S6.2.2 Biochemical motivation for priors
In biological systems, the association rate (k a ) is typically between 3 to 7 whereas the dissociation rate (k d ) is between -6 to -1. 3 We added two orders of magnitude to the lower and upper boundary as safety margin. Moreoever, our system describes the binding of enzyme molecules to binding sites, not to colloidal particles (CPs). Every such particle has n S/CP binding sites on its surface. In order to yield accurate results, parameters should Kinetically perfect enzymes have specificity constants kcat Km (merged in our model to rate constant of the bulk reaction (k BR )) of up to 10 8 − 10 9 m −1 s −1 , so we chose 10 9 m −1 s −1 as upper limit for k BR . 12 The lower limit can be derived from the integrated rate law of a second-order reaction and the data. Solving Eq. S42 for k BR yields S19 the lower end of the uniform prior to k BR > 10 −0.5 m −1 s −1 .
The lower limit for k ER is derived in a similar fashion from the integrated rate law of the ER mechanism (derivation not shown).
For [T] 0 = 2.2 µm, [T] < 0.9 µm after t = 15000 s. Larger K a lead to smaller k ER , hence we apply the upper limit of K a , 10 12 m −1 , yielding a lower limit of 10 −5.15 s −1 for k ER . We define is independent of [S] 0 . The upper limit for k ER cannot be determined without knowing φ beforehand. We set the upper limit to 10 −0.15 s −1 , i.e. the prior spans five orders of magnitude.
For the models including two conformations of trypsin we add two uniform priors: Priors for nuisance parameters are shown in Table S1.  BDI does not only correctly infer kinetic parameters but can also distinguish between different mechanisms ( Figure S8). In this test case, we compare two models M : ER and LH. Both mechanisms can reproduce D KIN equally well which is not surprising. Due to the added noise, p(θ|D) can be larger than p(θ true |D). This is a sign of (weak) overfitting: since the noise is random, no set of parameters can be consistently better than θ true . If we include D EQ , LH still performs similar to ER. Finally, with the inclusion of the last data set D TD , the difference in p(θ|D, M ) between ER and LH becomes large. The LH mechanism cannot reproduce all available data as well as the ER mechanism and is therefore rejected for our test case.
Our example for synthetic data demonstrates that integrative Bayesian inference reduces parameter uncertainty. Leveraging multiple data sets allows to reject a wrong hypothetical S23 mechanism. In our example, the LH mechanism could only be rejected after uncertainty was reduced for all parameters involved in the mechanism. Of course, this may pose a challenge in practice, depending on the accessibility and cost of experiments for parameter inference.