Bayesian Regression Facilitates Quantitative Modeling of Cell Metabolism

This paper presents Maud, a command-line application that implements Bayesian statistical inference for kinetic models of biochemical metabolic reaction networks. Maud takes into account quantitative information from omics experiments and background knowledge as well as structural information about kinetic mechanisms, regulatory interactions, and enzyme knockouts. Our paper reviews the existing options in this area, presents a case study illustrating how Maud can be used to analyze a metabolic network, and explains the biological, statistical, and computational design decisions underpinning Maud.


Maud's kinetic model 2.1 Parameters
Table 1 shows all of Maud's unknown parameters along with their dimensions Note that Maud's metabolic model includes some quantities that are not treated as parameters in its statistical model, including temperatures, compartment volumes and the formation energy of water.Maud treats these quantities as if they were known precisely: they can be configured by the user or default values can be used.Although in practice there can be considerable uncertainty regarding these quantities, we chose to disregard this uncertainty in the interest of simplicity.

Membrane potentials Experiments
Solving the steady state problem for a given set of parameters in an experiment yields a vector   of balanced metabolite concentrations.These are combined with the balanced metabolite concentrations   to produce a vector   with a concentration for each metabolite/compartment combination.
Δ   parameters can optionally be fixed; this can be useful for computational purposes, as for example to avoid estimating the formation energy of a metabolite about which there is no available information due to it only participating in irreversible reactions.

Rate equations
As discussed in the main text, Maud's kinetic model decomposes into factors contributing to the flux in a metabolic network in an experiment as shown in equation (1).For succinct-ness, and since Maud's model assumes that there are no interactions between experiments, we omit any notation referring to experiments below.We also omit any reference to the network's drain reactions: these are modelled as being exactly determined by the values of the parameter vector   .

𝐹 (𝐶
The term  in equation ( 1) is a vector of non-negative real numbers representing the concentration of the enzyme catalysing each reaction.
The term   in equation ( 1) is a vector of non-negative real numbers representing the amount of flux carried per unit of saturated enzyme.
The term  in equation ( 1) is a vector of real numbers capturing the impact of thermodynamic effects on the reaction's flux, as shown in equation (2).
Δ   =   Δ   + The terms in (2) have the following meanings: •  is the temperature in Kelvin (a number), •  is the gas constant (a number), • Δ   is a vector representing the Gibbs free energy change of each reaction in standard conditions, • Δ   is a vector representing the standard condition Gibbs free energy change of each metabolite's formation reaction, or in other words each metabolite's 'formation energy'.
•  is a vector representing the number of charges transported by each reaction.
•  is the Faraday constant (a number) •  is a vector representing each reaction's membrane potential (these numbers only matter for reactions that transport non-zero charge) Note that, for reactions with zero transported charge, the thermodynamic effect on each reaction is derived from metabolite formation energies.This formulation is helpful because, provided that all reactions' rates are calculated from the same formation energies, they are guaranteed to be thermodynamically consistent.
The term  accounts for both the charge and the directionality.For instance, a reaction that exports 2 protons to the extracellular space in the forward direction would have -2 charge.
If a negatively charged molecule like acetate is exported in the forward direction,  would be 1.
Note that this way of modelling the effect of transported charge does not take into account that the concentration gradient used by the transport is that of the dissociated molecules.
Thus, this expression is only correct for ions whose concentration can be expressed in the model only in the charged form; e.g., protons,  + ,   + ,  − , etc.
The term  in equation ( 1) is a vector of non-negative real numbers representing, for each reaction, the fraction of enzyme that is saturated, i.e. bound to one of the reaction's substrates.To describe saturation we use equation ( 3), which is taken from Liebermeister et al. ( 5) and Noor et al. (6).Additionally, this term captures competitive inhibition: as competitive inhibitor concentration increases, the saturation denominator increases, effectively decreasing the saturation of the substrate on the total enzyme pool.Conversely, as the substrate concentration increases this term approaches 1.
=  ⋅ free enzyme ratio (3) The term  in equation ( 1) is a vector of non-negative numbers describing the effect of allosteric regulation on each reaction.Allosteric regulation happens when binding to a certain molecule changes an enzyme's shape in a way that changes its catalytic behaviour.

𝑅
The parameter  0 in equation ( 1) is called the transfer constant, and the parameter vectors   and   are called tense and relaxed dissociation constants respectively.
Finally, the term  ℎℎ in equation ( 1) captures the important effect whereby enzyme activity is altered due to a coupled process of phosphorylation and dephosphorylation.This description achieves a similar behaviour to the MWC formalism for describing allosteric regulation, but using the rates of phosphorylation and dephosphorylation rather than concentrations of metabolites.Starting with the model in Saa and Nielsen (11), we extracted values for enzyme concentrations, boundary conditions and fluxes.We used these values to generate MCMC samples using Maud using the priors specified in section Section 3.2.When this was finished, we selected one sample with relatively high log probability to use as a ground truth in our case study.These parameter values are shown below in table Table 2.We manually inspected the parameter values to screen for any obviously implausible values; we did not find any of these.

Prior distributions compared with true parameter values
Table 2 shows the prior distributions we used for independent parameters.The first two columns show the 1% and 99% quantiles of each marginal prior distribution.True parameter S8 value are shown in column three, and the last column shows the z-score on log scale of the true parameter value according the marginal prior distribution.As can be seen from the table, there are 7 parameters for which the true value is outside the 1%-99% range.Δ   parameters for most metabolites were fixed; those that were modelled as unknown had a multivariate normal prior distribution derived from eQuilibrator (12).
The values for Δ   parameters, as well as all other model parameters, can be found by inspecting the file priors.tomlwhich is online at https://github.com/biosustain/Methionine_model/blob/main/data/methionine/priors.toml.

Computation
We conducted adaptive Hamiltonian Monte Carlo sampling for the full and missing-data datasets.For the full dataset we obtained 1000 post-warmup samples each from 4 independent Markov chains after 1000 warm-up samples and "hot-starting" with a mass metric output by a previous model run.
For the missing-data dataset 250 post-warmup samples were taken from 4 indpendent Markov chains after 100 warmup samples.The sampling was initialised using the mass matric from the complete measurement dataset and the warmup consisted of step size adaption for 100 samples.The resulting posterior distribution had an R = 1.01 for the log-probability and did not exhibit post-warmup divergences that were not a result of differential equation errors.

Laplace approximation case study
To compare MCMC sampling with Laplace approximation we used a different model with fewer parameters and state variables.This model was chosen because we were not able to generate results for our methionine model using Laplace approximation.The simpler case still serves to illustrate the general issues with approximating the posterior distributions of Bayesian kinetic models using the Laplace method, and that the associated numerical instability is another reason to prefer other methods where possible.
To generate Laplace samples we used Maud's Laplace mode.finding that, for certain cases where the target distribution exhibits highly disconnected and similarly shaped modes, the two algorithms have similar performance.

Multimodal posterior distributions
It is therefore important to pay careful attention to MCMC diagnostics and monitor developments in computational statistics that might expand the range of posterior distributions that can practically be sampled.
Zamora-Sillero et al. ( 16) introduces the method HYPERSPACE, which is designed to identify disconnected regions of a biochemical parameter space and characterise them by estimating their dimensions.HYPERSPACE does not perform parameter inference and is therefore not directly comparable with algorithms like adaptive Hamiltonian Monte Carlo as used in Maud, but could plausibly be adapted into such an algorithm, for example by supplementation with importance sampling (see Vehtari et al. (17 )).However, any such algorithm would be limited by HYPERSPACE's dependency on a random walk Metropolis Hastings algorithm.This kind of algorithm's performance is known to scale poorly compared with Hamiltonian Monte Carlo algorithms as the number of parameters increases: see Mangoubi et al. (15).Since Maud aims to fit models with hundreds of parameters, this means that it is unlikely that HYPERSPACE can directly be used to improve Maud's efficiency.S14

Table 1 :
Table S1 -Parameters of Maud's statistical model

Table 2 :
Table S2 -Parameter specification, marginal prior distributions and true parameter values used in our case study.
(15)e wehave not yet observed this in practice, we expect that Maud is capable of accurately sampling mildly multi-modal posterior distributions, i.e. those for which regions of parameter space with high probability density are not very sharply separated.This is because this ability depends primarily on the underlying inference algorithm, and adaptive Hamiltonian Monte Carlo is known to be able to sample many such posterior distributions.See, for example, Mangiola et al.(13), which reports the successful use of adaptive Hamiltonian Monte Carlo to sample posterior distributions involving mixtures of Gaussian distributions, which are typically multi-modal.Moreover, in cases where it fails to sample a multi-modal posterior distribution adaptiveHamiltonian Monte Carlo typically exhibits poor mixing and divergent transitions, which Maud is set up to detect automatically, making this case easy for users to diagnose.Nonetheless, it is also well known that strongly multi-modal posterior distributions pose a problem for adaptive Hamiltonian Monte Carlo, as well as other MCMC algorithms.See (14, §3.2) for a general discussion of this issue.Mangoubi et al.(15)explores the related question of the relative performance of Hamiltonian Monte Carlo vs random walk Metropolis Hastings,