RBS and Promoter Strengths Determine the Cell-Growth-Dependent Protein Mass Fractions and Their Optimal Synthesis Rates

Models of gene expression considering host–circuit interactions are relevant for understanding both the strategies and associated trade-offs that cell endogenous genes have evolved and for the efficient design of heterologous protein expression systems and synthetic genetic circuits. Here, we consider a small-size model of gene expression dynamics in bacterial cells accounting for host–circuit interactions due to limited cellular resources. We define the cellular resources recruitment strength as a key functional coefficient that explains the distribution of resources among the host and the genes of interest and the relationship between the usage of resources and cell growth. This functional coefficient explicitly takes into account lab-accessible gene expression characteristics, such as promoter and ribosome binding site (RBS) strengths, capturing their interplay with the growth-dependent flux of available free cell resources. Despite its simplicity, the model captures the differential role of promoter and RBS strengths in the distribution of protein mass fractions as a function of growth rate and the optimal protein synthesis rate with remarkable fit to the experimental data from the literature for Escherichia coli. This allows us to explain why endogenous genes have evolved different strategies in the expression space and also makes the model suitable for model-based design of exogenous synthetic gene expression systems with desired characteristics.


SI.1. Modeling gene expression
In this section we obtain the dynamics for the polysomic expression of a protein-coding gene in a bacterial cell. We introduce the resources recruitment strength (RRS), a dimensionless function that expresses the capacity of a protein-coding gene to engage cellular resources to get expressed. The RRS serves as the basis for all subsequent analysis of the partitioning of the cell mass between ribosomal and non-ribosomal endogenous proteins, the effects caused by the expression of exogenous protein-coding genes, and the roles played by the RBS and promoter strengths. Our model rests on the following basic assumptions:
1. Transcription dynamics is fast compared to translation, so it can be considered at quasi-steady state (QSS).
2. The main resources-dependent process in protein expression is translation. Therefore:
(a) Only ribosomes are considered as a limiting shared resource required for protein expression. RNA polymerase is not considered explicitly.
(b) The effective translation rate is assumed to depend on the availability of intracellular substrate. This implicitly considers that building the polypeptide chain is the limiting energy-consuming process in the cell. We do not explicitly consider the catabolic conversion of substrate into amino acid building blocks.
Throughout, µ denotes the cell specific growth rate.
For a protein of length $l_{pk}$ amino acids, we denote the mRNA length as $l_{mk} = l_e + l_{pk}$, where $1/l_e$ is the ribosome density, with $l_e$ expressed as an equivalent number of codons. We consider the effective RBS length to be the same as $l_e$. Thus, up to $n_r$ ribosomes can simultaneously be translating a single copy of mRNA, with $n_r = l_{pk}/l_e$, and an additional ribosome is bound to the RBS.
We consider the effective rate constants $K_e(s_i)$ and $K_k(s_i)$ at which the ribosome glides through the RBS and through the remaining mRNA nucleotides, respectively. Thus, we consider: with
$$\nu_t(s_i) = \frac{\nu\, s_i}{K_{sc} + s_i} \qquad (4)$$
where ν is the maximum attainable translation rate (peptide synthesis rate) and $K_{sc}$ is a Michaelis-Menten parameter related to the cell substrate uptake and catabolic capacity. As a first approximation, we consider that ν is organism dependent but does not depend on the nucleotide sequence, and that $K_{sc}$ is organism and substrate dependent but does not depend on the nucleotide sequence either.
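As a minimal sketch of equation (4), the snippet below evaluates the substrate-dependent translation rate. The function names are hypothetical and the numerical values are placeholders, not fitted parameters. The two rate constants are written under the assumption that each equals the translation speed divided by the length to traverse ($l_e$ for the RBS, $l_{pk}$ for the coding region); this choice is consistent with the ratio $a = l_{pk}/(l_e + l_{pk})$ that appears in the derivation.

```python
def nu_t(s_i, nu, K_sc):
    """Effective translation rate, equation (4): nu * s_i / (K_sc + s_i)."""
    return nu * s_i / (K_sc + s_i)


def rate_constants(s_i, nu, K_sc, l_e, l_pk):
    """ASSUMPTION: K_e = nu_t / l_e and K_k = nu_t / l_pk (speed over length)."""
    v = nu_t(s_i, nu, K_sc)
    return v / l_e, v / l_pk


# Placeholder values for illustration only (not taken from the source).
K_e, K_k = rate_constants(s_i=1000.0, nu=1000.0, K_sc=100.0, l_e=25.0, l_pk=300.0)

# Ratio between consecutive pseudo-complexes; it depends only on the lengths.
a = K_e / (K_e + K_k)
```

Under this assumption the substrate level scales both rate constants equally, so it cancels out of the ratio `a`.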
The translating complexes C k j−1 are pseudo-species modeling the process of parallel translation. They can loosely be identified with each of the chains of amino acids under formation plus its associated ribosome. The first one, C k 0 , represents the ribosome bound to the RBS. Notice with rate K e (s i ) the ribosome bound to the RBS, forming C k 0 , advances to the next ribosome occupancy slot, generating the translating complex C k 1 and freeing the RBS so a new ribosome can enter in the queue. In turn, the displacement of the ribosomes in the queue by one occupancy generates the translating complexes C k j from the previous complex C k j−1 . Thus, each C k j species gives rise to a peptide chain which is in different stages of production. Finally, each parallel translating complex C k j generates a protein with rate K k (s i ) and frees its bound ribosome. Recall the translating complexes can be identified with the ensemble formed by each of the chains of amino acids under formation and their respective associated ribosomes. This way, the queue dynamics of the ribosomes advancing along the mRNA is decoupled from the protein building ones, thus getting a continuous approximation of the polysomic process of translation. This continuous approximation results in an exponential distribution of the amounts of pseudo-complexes C k j . Figure SI.1 depicts the process. The amounts of the pseudo-complexes fulfill C k j < C k j−1 (see equation (9)). Thus, starting from C k 0 , at each time instant a fraction of it remains and a smaller fraction becomes C k 1 because of the continuous shuffling up. At the next differential time instant, on the one hand a fraction of C k 1 remains and a smaller fraction becomes C k 2 and, on the other hand, a fraction of the C k 0 which remained in the previous time step remains and a smaller fraction becomes C k 1 . 
This process of generation of the amounts of pseudo-complexes takes place at each differential time instant, eventually resulting in an exponential distribution of the amounts of pseudo-complexes $C^k_j$. In parallel (i.e. simultaneously) to the process above, each of the pseudo-complexes $C^k_j$ generates a peptide chain through the pseudo-reactions $C^k_j \rightarrow p_k + r$. As a result, the protein synthesis rate $K_k(s_i)\sum_{j=1}^{n_r} C^k_j$ (see equation (5)) effectively weights the amount of proteins being synthesized at each of their synthesis stages. Recall again that the pseudo-complexes $C^k_j$ can loosely be identified with each of the chains of amino acids under formation plus its associated ribosome. The less abundant ones at the later stages of the peptide chain synthesis (close to or at $C^k_{n_r}$) are weighted much less than the more abundant shorter ones at earlier stages. This way, the evaluation of the protein synthesis rate takes into account an approximation of the distribution profile of the lengths of the peptide chains being synthesized by the ribosomes along the transcript.

Thus, ribosomes enter the translation process through the continuous "cycling" between the second and third pseudo-reactions in SI equation (1) and their shuffling up through the chain of pseudo-reactions $C^k_{j-1} \rightarrow C^k_j$.

Figure SI.1. (1) For a k-th protein-coding gene, a free ribosome binds to $m_k$, an mRNA copy with its RBS free, producing the initial pseudo-complex $C^k_0$: an avatar for a transcript with a ribosome bound to its RBS.
(2) The ribosome shuffles up one space with rate constant equal to the translation initiation one, $K_e(s_i)$, leaving the same mRNA copy with a free RBS, $m_k$. In addition, there will be a ribosome placed at the first slot of the transcript protein-coding region. This ensemble is represented by the pseudo-complex $C^k_1$. Thus, though physically there is only one transcript copy with a free RBS and a ribosome placed at the first slot of the transcript coding region, we model this situation as having two pseudo-species: $m_k$ and $C^k_1$. The mRNA is again free to be bound by a new free ribosome, allowing new ribosomes to continually load into the system.
(3) $C^k_1$ becomes an independent pseudo-complex consisting of a ribosome placed at the first slot of the mRNA coding region and gliding along the transcript with translation elongation rate constant $K_k(s_i)$. This virtual representation allows us to consider the pseudo-reaction $C^k_1 \rightarrow p_k + r$. Eventually, one copy of the protein is synthesized and the ribosome is freed.
(4) When the ribosome has initiated translation in $C^k_1$, all other ribosomes will shuffle up one space in the physical mRNA. In our model we shuffle up the pseudo-complexes $C^k_j$. Thus, $C^k_1$ becomes $C^k_2$ at rate $K_e(s_i)$, at which the free ribosome entering the system "pushes" the queue to shuffle up.
(5) A second parallel translation $C^k_2 \rightarrow p_k + r$ takes place. Both $C^k_1$ and $C^k_2$ simultaneously synthesize a protein and eventually free their bound ribosomes (one per complex).
Next, we apply mass action kinetics to obtain the dynamic balances for the copy number of each species in the model. This way, we have:

Recall we assume that transcription dynamics is fast compared to translation, so it can be considered at quasi-steady state (QSS). We also assume the binding-unbinding dynamics to form the translation complexes $C^k_j$ are fast enough that the number of each of the complexes quickly reaches steady state. Therefore, from $\dot{m}_k = 0$ and $\dot{C}^k_j = 0$, $j = 1 \ldots n_r$, we get: and

In practice, $d_{mk} \gg \mu$ and $K^k_u \gg \mu$ (see Table SI.2). Therefore the magnitude of the specific growth rate µ can be neglected with respect to both the mRNA degradation rate constant $d_{mk}$ and the sums $K^k_u + K_e(s_i)$ and $K_e(s_i) + K_k(s_i)$, respectively, so that we can approximate: and where we have defined:

Notice $K^k_{C_0}$ is directly related to the RBS strength. Now, the dynamics for the abundance of the protein $p_k$ can be obtained from equations (5) and (9) as:

Notice the geometric sum $1 + a + \ldots + a^{n_r-1}$, with $a < 1$, gives:
$$\sum_{j=0}^{n_r-1} a^j = \frac{1 - a^{n_r}}{1 - a}$$
Using the definitions (3), notice $a = \frac{l_{pk}}{l_e + l_{pk}} = \frac{l_{pk}}{l_{mk}}$, so we get the expression:

where we have taken into account that the maximum number of ribosomes bound to active translating complexes $C^k_j$, $j = 1 \ldots n_r$, simultaneously translating an mRNA molecule is $n_r = l_{pk}/l_e$, and we assume this maximum is always reached.
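The geometric structure of the steady state can be checked numerically. The sketch below assumes that consecutive pseudo-complexes satisfy $C^k_j = a\,C^k_{j-1}$ with $a = l_{pk}/(l_e + l_{pk})$, and compares the explicit sum over the elongating complexes with its closed form. Function names and the example lengths are hypothetical.

```python
def complex_profile(C0, l_pk, l_e):
    """Amounts C_1..C_nr, ASSUMING the steady-state relation C_j = a * C_(j-1)."""
    a = l_pk / (l_e + l_pk)
    n_r = int(round(l_pk / l_e))  # assumes l_pk is a multiple of l_e
    return [C0 * a**j for j in range(1, n_r + 1)]


def elongating_total(C0, l_pk, l_e):
    """Closed form of sum_{j=1..n_r} C_j obtained from the geometric sum."""
    a = l_pk / (l_e + l_pk)
    n_r = int(round(l_pk / l_e))
    return C0 * a * (1.0 - a**n_r) / (1.0 - a)


# Illustrative lengths only: a 300-codon protein with l_e = 25 gives n_r = 12.
profile = complex_profile(C0=1.0, l_pk=300.0, l_e=25.0)
total = elongating_total(C0=1.0, l_pk=300.0, l_e=25.0)
```

The profile decreases monotonically along the transcript, which is the discrete counterpart of the exponential distribution of pseudo-complexes described above.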
Recall we have assumed that transcription is not a limiting process, so we can express the effective transcription rate: where η (codons/min) is the maximum transcription speed and F k (T f ) is the transcription characteristic function that may depend on one or several transcription factors. By default, we assume the gene copy number c nk is one. If this is not the case, the effective transcription rate ω k (T f ) must be multiplied by c nk . Notice in this case the transcription characteristic function F k (T f ) may depend on the gene copy number as described in [23]. As commented in our preliminary assumptions, we do not model competition for RNA polymerases, which would also affect the effective transcription rate. Yet, notice that, even if there are no cognate transcription factors associated, the term F k (T f ) can be used to account for competition for RNA polymerases preventing the effective transcription from proceeding at its maximum rate η l mk , sequence-dependent affinity of the promoter for the RNA polymerases (promoter strength) and the effect of nucleotides usage on the transcription speed. In summary, the term F k (T f ) can be used to accommodate aspects affecting transcription so that ω k (T f ) is the effective transcription rate.
Thus, the dynamics for protein expression become:

We now define the ribosomes density related term:

Figure SI.2 shows the values of $E_{mk}(l_{pk}, l_e)$ as a function of the protein length $l_{pk}$ for different values of $l_e$. As seen, $E_{mk}(l_{pk}, l_e)$ can be accurately approximated as a linear function of $l_{pk}/l_e$:
$$E_{mk}(l_{pk}, l_e) \approx 0.62\,\frac{l_{pk}}{l_e}$$
We further define the resources recruitment strength (RRS) functional parameter $J_k(\mu, r)$:

Notice the resources recruitment strength is a dimensionless function that expresses the capacity of the k-th protein-coding gene to engage cellular resources to get expressed. It explicitly depends on:
• the gene expression characteristics:
  - mRNA transcription rate $\omega_k(T_f)$ and degradation rate constant $d_{mk}$
  - RBS strength-related parameter $K^k_{C_0}(s_i)$
• and the availability of cell resources:
  - flux of free ribosomes $\mu r$
  - ribosomes density $l_{pk}/l_e$ (via $E_{mk}(l_{pk}, l_e)$)
Using these definitions in equation (15) we get the expression for the abundance dynamics of the k-th protein:

Figure SI.2. Function $E_{mk}(l_{pk}, l_e)$ as a function of the protein length $l_{pk}$ for different values of $l_e$ and their corresponding linear approximations $E_{mk}(l_{pk}, l_e) \approx 0.62\, l_{pk}/l_e$.
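The following sketch illustrates how the ribosomes density term and the RRS could be evaluated. Both closed forms are assumptions, not quotations: $E_{mk}$ is taken as the sum of the geometric profile of elongating complexes relative to $C^k_0$, and $J_k$ as $E_{mk}\,\omega_k / (d_{mk}/K^k_{C_0} + \mu r)$, which depends on exactly the quantities listed above. Function names and the values passed to `rrs` are hypothetical.

```python
def E_m(l_pk, l_e):
    """ASSUMED closed form: (l_pk/l_e) * (1 - a**(l_pk/l_e)), a = l_pk/(l_e+l_pk)."""
    n = l_pk / l_e
    a = l_pk / (l_e + l_pk)
    return n * (1.0 - a**n)


def E_m_linear(l_pk, l_e):
    """Linear approximation quoted in the text: 0.62 * l_pk / l_e."""
    return 0.62 * l_pk / l_e


def rrs(omega, d_m, K_C0, mu, r, l_pk, l_e):
    """ASSUMED form of the RRS: J = E_m * omega / (d_m / K_C0 + mu * r)."""
    return E_m(l_pk, l_e) * omega / (d_m / K_C0 + mu * r)


# l_r = 195 aa and l_e = 25 are the values given in the text; the result is
# close to the 1/E_mr = 0.21 quoted there.
inv_E_mr = 1.0 / E_m(195.0, 25.0)

# Placeholder parameters, only to show the trend with promoter strength.
J_weak = rrs(omega=1.0, d_m=0.2, K_C0=0.01, mu=0.02, r=350.0, l_pk=300.0, l_e=25.0)
J_strong = rrs(omega=2.0, d_m=0.2, K_C0=0.01, mu=0.02, r=350.0, l_pk=300.0, l_e=25.0)
```

With these assumed forms, doubling the transcription rate doubles the RRS, while a larger flux of free ribosomes $\mu r$ lowers it.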

SI.2. Dynamics of the number of mature available ribosomes
Next, using the results of the previous section, we obtain the dynamics of the number of mature available ribosomes. Ribosomes are large complexes formed by both ribosomal RNA molecules and a variety of ribosomal proteins, adding up to 55 different protein species in E. coli. Recall we consider that translation is the main energy and resources limiting process we model. Notice the total number of ribosomes in the cell at any one time instant, $r_T$, is the sum of the mature ($r_a$) and immature ($r_i$) ribosomes. The mature ribosomes $r_a$ available for protein translation comprise the free ribosomes $r$ and the ones bound to translating complexes. The bound ribosomes comprise the ones bound to translating complexes building the ribosomes themselves ($r^r_b$), the ones bound to the translating complexes of endogenous non-ribosomal proteins ($r^{nr}_b$) and the ones associated with the expression of exogenous genes ($r^{exo}_b$). In turn, the bound ribosomes may be either bound to the RBSs or actively elongating along the transcripts to synthesize proteins. We denote the latter subsets as $r^r_t$, $r^{nr}_t$ and $r^{exo}_t$, respectively. We also denote $r^h_b = r^r_b + r^{nr}_b$ as the set of host ribosomes bound both to ribosomal and non-ribosomal endogenous complexes. In case we are interested in a particular strain hosting exogenous genes, we refer to the set of all bound ribosomes as $r^s_b = r^h_b + r^{exo}_b$. In summary, we have:

An analogous notation is used for the actively elongating bound ribosomes, with $r^h_t = r^r_t + r^{nr}_t$ and $r^s_t = r^h_t + r^{exo}_t$. The number of available mature ribosomes is a fraction of the total number of ribosomes, so that $r_a = \Phi_m r_T$. The fraction of active elongating ribosomes varies little in time, with an average value of 0.8 [2,3]. Therefore, we can expect that the fraction $\Phi_m$ also varies little, so that the dynamics of the total number of ribosomes and that of the number of available ribosomes are the same but for a scale factor.
Next we consider the dynamics of the total copy number of ribosomes in the cell, r T .
To get the dynamics of $r_T$ we first consider an expression analogous to (19) for each of the proteins forming a ribosome. If we consider the average ribosomal protein $p_r$, and an average ribosome composed of $N_r$ proteins (e.g. $N_r = 55$ for E. coli), we define the total number of ribosomal proteins as $p_{\Sigma r} = N_r p_r$. For the average ribosomal protein $p_r$ we use (19) to obtain the dynamics:

where we have assumed that ribosomal proteins are only subject to dilution caused by cell growth. Then, the dynamics for the total number of ribosomal proteins can be approximated as:

Since all $N_r$ protein species are needed to form an individual ribosome, and considering the average ribosomal protein $p_r$, the total number of ribosomes is $r_T = p_{\Sigma r}/N_r$. Therefore, the dynamics of the total abundance of ribosomes $r_T$ will be the same as those of $p_r$. That is:

Therefore, the dynamics of the number of mature available ribosomes is:

SI.3. Relating free and available ribosomes
The number of free ribosomes is a measure of the cell burden, and plays a central role in the synthesis rate of proteins. Next we relate the number of free ribosomes to that of the available mature ones in the cell. This relationship will later allow us to express the dynamics of the total number (alternatively, mass) of cell ribosomes and non-ribosomal proteins as a function of their interaction. Recall we had $r_a = r + r^s_b$. For each protein $p_k$, and using the previous results, the number of ribosomes bound to complexes involved in its translation at each time instant is given by:

where notice that $C^k_0$ ribosomes are bound to the RBSs of the k-th protein-coding gene transcripts and $r^{p_k}_t = E_{mk}(l_{pk}, l_e)\,C^k_0 = J_k(\mu, r)\,r$ are actively involved in translation. An analogous expression can be obtained for the number $r^r_b$ of ribosomes bound to complexes involved in the translation of ribosomes themselves:

where we have taken into account that $N_r$ protein species are required to build up a ribosome and we consider an average ribosomal protein.
Therefore, the number of mature available ribosomes $r_a$ can be obtained from:

where $N_{nr}$ is the number of endogenous non-ribosomal host protein-coding genes and $N_{exo}$ that of exogenous ones. Notice that for a ribosomal density $l_e = 25$ and average ribosomal and non-ribosomal protein lengths (see Table SI.2), the average values $1/E_{mr} = 0.21$ and $1/E_{mp} = 0.12$ are small, accounting for the percentage of ribosomes bound to the RBS.
From equation (27) we get the number of free ribosomes $r$ as a function of the mature available ones $r_a$:

SI.4. Fractions of bound and actively translating ribosomes with respect to the available mature ones
Notice from (25) and (26) that the number of bound and bound-actively-translating ribosomes is:

Using (28) and:

we can define the fractions of ribosomes bound to complexes and those actively involved in translation relative to the mature available ones for ribosomal, non-ribosomal and exogenous protein-coding genes:

where $\sum_{k=N_{nr}}$ and $\sum_{k=N_{exo}}$ stand for the sums over the ensemble of all genes coding for endogenous and exogenous non-ribosomal proteins, respectively. Thus, for the native host cell we can consider:

for the strain hosting exogenous protein-coding genes. Using (30) and (31), the fraction of ribosomes bound to translating complexes (including both translating complexes for endogenous and exogenous proteins and those bound to RBSs) relative to the available mature ribosomes is:

Notice each term with denominator $1 + \sum_{k=\{r,nr,exo\}} \left(1 + \frac{1}{E_{mk}}\right) J_k(\mu, r)$ can be understood as the share of cell resources required to express the j-th protein-coding gene. Thus, the magnitude of the dimensionless coefficient $J_j(\mu, r)$ is a measure of the resources recruited to express the j-th protein-coding gene.
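The bookkeeping of free, bound and actively translating ribosomes can be sketched as follows. The code assumes that each gene class k holds $J_k r$ elongating ribosomes plus $J_k r / E_{mk}$ ribosomes sitting on RBSs, so that $r_a = r\,[1 + \sum_k (1 + 1/E_{mk}) J_k]$. The function name and the $J_k$ values are hypothetical; in the full model $J_k$ itself depends on µ and r, which is ignored here by treating the $J_k$ as given numbers.

```python
def ribosome_fractions(J, E_m):
    """Fractions relative to the mature available ribosomes r_a.

    J and E_m are dicts keyed by gene class (e.g. "r", "nr", "exo").
    ASSUMPTION: r_a = r * (1 + sum_k (1 + 1/E_m[k]) * J[k]).
    """
    load = sum((1.0 + 1.0 / E_m[k]) * J[k] for k in J)
    denom = 1.0 + load
    return {
        "free": 1.0 / denom,                      # r / r_a
        "bound": load / denom,                    # RBS-bound plus elongating
        "translating": sum(J.values()) / denom,   # elongating only
    }


# 1/E_m values are the averages quoted in the text; J values are placeholders.
fractions = ribosome_fractions(
    J={"r": 20.0, "nr": 60.0},
    E_m={"r": 1.0 / 0.21, "nr": 1.0 / 0.12},
)
```

Because the free and bound fractions sum to one, a large total recruitment load pushes the free fraction towards zero, which is the regime discussed in the text.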

SI.5. Obtaining the cell specific growth rate
Cell growth can essentially be explained as the time variation of the protein fraction of the total cell mass. Yet, not all protein mass contributes to cell growth. On the one hand, there are proteins which may be undergoing active degradation. On the other, exogenous proteins will in general not have any active role in the cell that contributes to its growth. Therefore, next we consider only the endogenous ribosomal and non-ribosomal proteins to obtain the cell specific growth rate from the time variation of the endogenous protein fraction of the total cell mass.
To deal with protein degradation, we take into account that the protein fraction of cell mass is the sum of the mass of functional and non-functional proteins (ie. proteins undergoing degradation). Though non-functional proteins do not contribute to cell growth, they do to cell mass. Thus, for a k-th protein species we can consider the fraction quantity of functional molecules of the protein, p k , and the one of non-functional ones p nf k so that the total number of protein copies of the k-th species is p T k = p k + p nf k . Then, considering the dynamics (19) of a generic protein, we have: where we have taken into account that the non-functional fraction p nf k only undergoes dilution due to cell growth and there is a conversion from functional to non-functional fraction caused by protein degradation.
If we consider the average mass of an amino acid $m_{aa}$, the mass of a protein of length $l_{pk}$ can be approximated as $m_{aa} l_{pk}$. Thus, for $p^T_k$ molecules of the k-th protein, their total mass is $m_k = m_{aa} l_{pk} p^T_k$. Then, the mass of the $N_{nr}$ non-ribosomal endogenous proteins in the cell can be approximated as:

Therefore, the time variation of the protein mass explained by this set of $N_{nr}$ proteins is:

where recall $J_k(\mu, r)\,r$ is the number of ribosomes bound to complexes involved in the translation of each k-th protein and we have used the definitions (31) in the last step. Notice, in addition, that the degradation of proteins does not play a role when we consider the dynamics of the protein mass. It does play a role when we consider the number of active proteins.
As for the protein cell mass variation explained by the time variation in the total number of ribosomes r T we will have the analogous expression:ṁ where we have used the fact that N r ribosomal proteins are required to form up one ribosome. Notice that we only consider the weight of the protein fraction of the ribosomes. This accounts only for approximately one third of the ribosomes mass [15]. Denoting the host cell protein weight m h = m nr + m r T , we reach the expression: which, using (32), can be expressed as:ṁ Recall that the specific growth rate µ is a continuous approximation of the discrete event process of cell duplication. Here we consider that the total biomass dry weight variation (ie. that of the whole population of cells) is mainly caused by cell duplication (i.e. population growth), and the dynamics of cell mass accumulation are much faster than those of cell duplication. Under this assumption, we may consider the protein mass for each cell quickly reaches steady state (ṁ h ≈ 0). Thus, from equation (39) we get the expression for the cell specific growth rate: Notice Φ h t r a is the number of ribosomes actively translating endogenous proteins (both ribosomal and nonribosomal) at a given time instant. Equation (40) allows to predict this number given a specific growth rate, assuming saturation of intracellular substrate (eg. considering a batch experiment and the exponential growth phase) and considering the average values for the amino acids mass and that of the protein fraction of the cell.
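A minimal sketch of how the growth-rate expression can be used in both directions is given below. It assumes the steady-state mass balance takes the form $\mu\, m_h = m_{aa}\, \nu_t(s_i)\, \Phi^h_t\, r_a$, that is, mass dilution by growth balances the amino acids incorporated per unit time by the actively translating ribosomes. The function names and all numerical values are hypothetical placeholders in arbitrary but consistent units.

```python
def growth_rate(m_aa, nu_t, phi_t, r_a, m_h):
    """ASSUMED form of equation (40): mu = m_aa * nu_t * phi_t * r_a / m_h."""
    return m_aa * nu_t * phi_t * r_a / m_h


def translating_ribosomes(mu, m_aa, nu_t, m_h):
    """Inverse use: number of actively translating ribosomes phi_t * r_a."""
    return mu * m_h / (m_aa * nu_t)


# Placeholder values for illustration only.
mu = growth_rate(m_aa=1.0e-7, nu_t=1200.0, phi_t=0.8, r_a=20000.0, m_h=100.0)
n_translating = translating_ribosomes(mu, m_aa=1.0e-7, nu_t=1200.0, m_h=100.0)
```

The second function mirrors the use described in the text: given a measured growth rate at substrate saturation and average values for the amino acid mass and the protein mass of the cell, it returns the number of ribosomes actively translating endogenous proteins.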
Notice also that (40) can be expressed as a function of the total number of ribosomes r T as:

SI.6. Cell specific growth rate and population dynamics
Our model considers the intracellular substrate s i as source of building blocks to synthesize proteins. Using the model in a multi-scale framework considering the macroscopic dynamics of the population of cells and the uptake of extracellular substrate requires relating the cell population growth rate with the individual cell one.
To this end, we next relate the expression for the cell specific growth rate µ obtained in (40) with the classical Monod-like expressions for the growth of a population of cells.
Recall if we have a population of N cells and we consider the average cell dry mass m cDW , the total biomass dry-weight will be M b = N m cDW . By taking derivative with respect to time, we get: Now, consider the continuous approximation of cell duplication: where µ = log 2/t d , with t d the cell population doubling time, is the specific growth rate. Then: from which we get:ṁ whereṀ b N is the mean value per cell of the population mass growth. As done in section SI.5, we assume the cell mass quickly reaches steady state as compared to the population dynamics. Thus, from equation (45) and assumingṁ cDW ≈ 0 we get: Experimental evidence suggests that the cell densityρ varies little throughout the adult cell life [10], so: The specific growth rate under a limiting substrate obtained from the experimental macroscopic analysis of a culture of cells can be expressed using the classical empirical Monod relationship: where µ m is the maximum specific growth rate, K s is the Monod affinity constant, and s is the concentration of the limiting substrate in the culture medium. The identities (47) allow to relate the specific growth rate obtained in (40) with the one obtained from population-scale macroscopic experiments under the condition of steady-state growth. Under this condition, the rate of total cell-mass growth is identical to the rate of cell number growth [27]. Thus: To relate the intracellular and extracellular substrate, we resorted to the theoretical approaches to derive the Monod equation. Several alternatives exist [13,26,24]. We followed a reasoning derived from the model developed in [25], where the quantity of intracellular substrate s i is related to the one of extracellular substrate s through the dynamics of nutrient import and catabolism:

where V m is the harvest volume [24], e t and e m are transport and catabolism enzymes, and Michaelis-Menten kinetics are assumed (see [25]). If we assume that nutrient import quickly balances nutrient catabolism and we neglect the dilution term: where c = emνm etνt . Notice if the maximum import and catabolism fluxes are balanced, ie. c ≈ 1, then there is a linear relationship between the intracellular amount of substrate and its extracellular concentration. Otherwise, if catabolism is more efficient than transport (c > 1) the intracellular amount of substrate s i saturates with increasing values of s. Recall in our model the specific growth rate (49) is a function of s i . Using (51) we obtain the Monod-like expression: If we assume that the Michaelis-Menten constant for substrate catabolism is the same as the constant we defined in (3), that is, K sc = k m , we have: Notice that the hypothesis K sc = k m implicitly implies that the Michaelis-Menten constants for substrate catabolism and transport have similar values (k t = k m ), in agreement with the assumptions in [25]. Also notice the term 1/c = etνt emνm can be interpreted as the maximum flux yield between nutrient import and its catabolism. If c ≈ 1, that is, under the hypothesis that the efficiency of nutrient import and catabolism are balanced so the maximum import and catabolism fluxes are similar: In case catabolism is more efficient than transport (c > 1) we will need an increase in the concentration of the substrate in the extracellular medium (s) to achieve the same value of si Ksc+si in (53) as compared to the balanced case c = 1. Finally, in case transport is more efficient than catabolism (c < 1) we will need lower concentrations of the extracellular substrate.
Then, from (49), (48) and (53) we get: so we can identify: Note: The relationship (53) is valid except in the extreme, non-relevant cases c = 0 (there is transport into the cell but nutrients are not metabolised) and c = ∞. In these cases dilution cannot be neglected in equation (50) and the equilibria are different.
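The two population-scale relations used in this section, the continuous approximation of cell duplication with $\mu = \log 2 / t_d$ and the classical Monod relationship, can be sketched as below. Function names and numerical values are hypothetical and serve only as an illustration.

```python
import math


def mu_from_doubling_time(t_d):
    """Specific growth rate from the population doubling time: mu = ln(2) / t_d."""
    return math.log(2.0) / t_d


def monod(s, mu_m, K_s):
    """Classical Monod relationship: mu = mu_m * s / (K_s + s)."""
    return mu_m * s / (K_s + s)


def population(N0, mu, t):
    """Continuous approximation of cell duplication: N(t) = N0 * exp(mu * t)."""
    return N0 * math.exp(mu * t)


# Placeholder example: a 30 min doubling time, in units of 1/min.
mu = mu_from_doubling_time(30.0)
N_after_one_doubling = population(N0=1000.0, mu=mu, t=30.0)
half_max = monod(s=0.5, mu_m=mu, K_s=0.5)
```

At a substrate concentration equal to $K_s$ the Monod expression returns half of the maximum specific growth rate, which is the usual interpretation of the affinity constant.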

SI.7. Relationship between growth rate and cell mass
Our model accounts for the protein mass distribution but does not obtain the total protein cell mass. A simple approach to solve this would be considering a constant average cell protein mass. Yet, the cell mass varies with growth rate. In this section we consider the relationship between growth rate and the total cell protein mass, and obtain an empirical model relating the host protein mass m h with the specific growth rate µ.
Several phenomenological models have been proposed in the literature accounting for the relationship between growth rate and cell dry weight, like the recent ones [20,27]. In [27] a Monod-like relation between the chromosome replication-segregation period $C + D$ and the cell specific growth rate µ is considered:

We estimated the parameters α and β in (57) using the data in [3]. Figure SI.3 (left) shows the good fit obtained. In addition, the authors in [27] propose a linear relation between the cell dry weight and the product of $C + D$ and µ:
$$m_c = m_0\,\mu\,(C + D)$$
Therefore, according to this model, there is an affine relationship between the cell dry weight and the specific growth rate:
$$m_c = m_0\,(\alpha + \beta\mu)$$
As shown in Figure SI.3 (right), for the data we used, the relationship (58) gives a very rough approximation. Indeed, as shown in the same figure, an affine relationship gives a much better fit.
As an alternative, in [20] an exponential relationship is proposed between the cell volume S c and the product of µ(C + D): Assuming constant cell density, equation (60) can be expressed as a function of the cell dry weight.   [3].
which expresses that the variation of cell mass (dry weight) relative to the variation of the specific growth rate is proportional to the cell mass. Figure SI.4 (left) shows the results obtained for (62) and Table SI.1 lists the best fitted parameters. Better fits can be obtained with alternative phenomenological expressions to (62) at the cost of losing the simple interpretation provided by expression (63). For the relationship between the cell content of endogenous proteins and the specific growth rate we postulate a relationship analogous to (63). Thus, we consider:

SI.8. Average host dynamics and steady state balanced growth.
We were interested in having an average model for the host dynamics and its steady state that can be used as base for analyzing host-circuit interactions. To this end we considered, on the one hand, the dynamics (37) of the ribosomal protein mass content of host, m r T and, on the other, the dynamics (36) of the mass of the ensemble of non-ribosomal endogenous proteins m nr .
We consider the non-ribosomal endogenous proteins as a lumped species with a single average resources recruitment strength $J_{nr}(\mu, r)$ and average $E_{mnr}$ such that:

We also considered average values for the endogenous protein lengths $l^r_p$ and $l^{nr}_p$ and a common average amino acid mass $m_{aa}$ so that the masses are related to the number of proteins as:

where $m_{rib}$ is the average mass of the protein content of ribosomes and $p_{nr}$ includes both functional and inactive endogenous proteins (see SI.5). Then, using (37), (36), the expression for the growth rate (40) and the definitions (31), the dynamics for the host endogenous ribosomal and non-ribosomal protein mass can be expressed as:

with the average effective RBS strengths

for k = {r, nr}. The number of free ribosomes is obtained using the averaged (28):

with $r_T$ obtained from (66), and the specific growth rate is obtained from (40) using the fraction

Notice from (68)-(69) that the steady state will be reached either when the growth rate stalls (µ = 0) or when there exists exponential balanced growth with:

and the growth rate (41):

Thus, at steady-state balanced growth the relative resources recruitment strength provides the relative mass fractions. It is easy to see that this also holds for individual proteins: the relative resources recruitment strength of a given protein equals its relative mass in the cell. Notice from (68), (71) and (41) that at steady state:

that is, the growth rate at steady state depends linearly on the fraction $\Phi_m \Phi^r_t$ of bound ribosomes actively being used to build up ribosomes (i.e. actively translating the transcripts) relative to the total number of ribosomes.
At steady state, the flux of free resources for a given intracellular substrate $s_i$, defined as the number of free ribosomes times the cell growth rate, can be obtained as:

showing a linear relationship with the cell protein weight $m_p$. The host cell protein weight $m_h$ depends on the growth rate. To calculate it, we used both the data available in [3] and the phenomenological relationships obtained in section SI.7.

SI.9. Synthesis rate of exogenous proteins and interaction with the host cell
Product titer and productivity rate are important measures of performance in biotechnological applications.
In this section we consider the expression of an exogenous protein-coding gene and we analyze how the number of produced proteins at steady state and its specific mass productivity rate (synthesis rate) vary as a function of its RBS and promoter strengths and the interaction with the host cell. The expressions obtained can easily be adapted for the analysis of the synthesis rate of an endogenous protein as a function of its RBS and promoter strengths and its interaction with the remaining host endogenous genes. We define the synthesis rate of a protein A, $\Pi_{nA}$, as the number of molecules of A at steady-state balanced growth produced per cell and generation, as in [12]. Notice this is equivalent to the productivity rate of the protein A. Thus, if we consider a population of N cells and the continuous approximation of cell duplication (43), the total quantity of molecules of the protein A, $P_{NA}$, will increase as the population of cells does as:

where $p_{A,ss}$ is the number of molecules of the protein A at steady state in a single cell. Therefore:

Analogously, we can define the mass synthesis rate as:

For the host endogenous dynamics we considered the average model described in section SI.8. We extend the model by adding the dynamics of the exogenous protein of interest A as:

where notice the denominator in the fraction of RRSs only includes the host protein-coding genes and the mass $m_h(\mu)$ is still that of the native host cell and not the mass of the strain $m_s = m_h(\mu) + m_A$. The expressions (68)-(69) for the host endogenous dynamics remain the same. Yet, now the number of free ribosomes $r$ is obtained using (28):

and the specific growth rate is obtained from (40) using the fraction

The exogenous proteins do not contribute to cell growth, but contribute to cell mass. Thus, the fraction $\phi_A$ of synthesized exogenous protein(s) must take into account their mass contribution.
To this end, we take into account that the specific growth rate is obtained from (40): and the mass dynamics for the exogenous protein: From (82) evaluated at steady state, we get: Therefore, we obtain the relationship: Now, the protein mass content of the strain is: Therefore, the mass fraction of protein A is: Using (80) evaluated at steady state and the results above, the synthesis rate (79) becomes:

SI.11. Estimation of the fractions $\Phi^h_t$ and $\Phi^h_b$
Next, for the native E. coli host cell, we evaluate the fraction $\Phi^h_b$ of bound ribosomes and the fraction $\Phi^h_t$ of bound ribosomes actively used in translating complexes, both relative to the mature available ribosomes, as a function of the number of free ribosomes $r$ (see the definitions (32) in Section SI.4). We expect a very low number of free ribosomes: if this were not the case, there would be no real competition to recruit them. To set an initial upper limit, we use the estimation $r = \mathcal{N}(350, 35)$ in [11]. In addition, a large excess of free ribosomes would imply a superfluous use of energy for the cell. Under this hypothesis, we evaluated equation (33) as a function of the mature available ribosomes $r_a$ and the free ones $r$. Figure SI.5(left) shows the estimated values. Notice that the values of $\Phi^h_b$ are close to 1, indicating that the cell is always at the edge of its maximum capacity for using the available resources.
From the estimation of the fraction $\Phi^h_b$ and using the definitions in Section SI.4, we can obtain that of the dimensionless sum $\sum_{k=\{r,nr\}} \left(1 + \frac{1}{E_{mk}}\right) J_k(\mu, r)$, which reflects the resources recruitment load generated by the whole set of ribosomal and non-ribosomal endogenous proteins being expressed at a given moment in the cell (see Figure SI.5(right)).
Model parameters for E. coli (table fragment). $l_r$: average ribosomal protein length, 195 amino acids (aa·molec$^{-1}$), calculated from data in [9]. $l_e$: ribosome occupancy length, [20, 30].
Figure SI.5 (caption fragment). The black line in both subplots depicts the value of mature available ribosomes $r_a$ as a function of the free ones $r$, using the number of bound ribosomes obtained from the data in [9].
To validate the estimations above, we evaluated both the sum $\sum_{k=\{r,nr\}} J_k(\mu, r)$ and the weighted sum $\sum_{k=\{r,nr\}} \left(1 + \frac{1}{E_{mk}}\right) J_k(\mu, r)$ for all protein-coding genes reported in [9] for E. coli. The dynamic model for the expression of protein $p$ in [9] considers: where $\beta_m$ (mRNA/t) is the transcription rate, $\beta_p$ (protein/(mRNA·t)) the translation rate and $d_m$ the mRNA degradation rate constant. For those transcripts without information for $d_m$ in [9] we used the value shown in Table SI.2. Then, we used equation (19) to derive the relationship: We took into account that the data were obtained for fast-growing cells, with doubling time $t_d = 21$ minutes. Under this condition, it is sensible to consider that the intracellular substrate is saturated and the cells are growing at their maximum growth rate, so that $\nu_t(s_i) = \nu$. We evaluated (88) considering the set of all non-ribosomal proteins, the ribosomal ones, and the full set of proteins, and obtained the corresponding fractions $\Phi^h_t$ and $\Phi^h_b$. Figure SI.6(left) shows the experimental values of $\Phi^h_t$ and $\Phi^h_b$ obtained as a function of the number of free ribosomes $r$. As expected, and in agreement with the estimations shown in Figure SI.5(left), the values of $\Phi^h_b$ remained very close to 1 for a wide range of values of the number of free ribosomes. Recall that the data in [9] were obtained for fast-growing cells, for which it is sensible to consider saturated intracellular substrate, so that the cells are growing at their maximum growth rate and the number of free ribosomes is very small. This is consistent with the result shown in Figure SI.5(left, black line), where the estimated values of $\Phi_b$ are plotted along with the value of mature available ribosomes $r_a$ as a function of the free ones $r$, using the number of bound ribosomes obtained from the data in [9]. To keep the experimental values of $\Phi^h_b$ above 0.98, the number of free ribosomes must remain below a few hundred.
Notice also the constant experimental ratio $\Phi^h_t/\Phi^h_b \approx 0.83$, in agreement with [2,3]. That is, around 17% of the total number of mature available ribosomes are, on average, located at the RBSs.
Figure SI.6. Experimental values of $\Phi^h_b$ (black) and $\Phi^h_t$ (blue) as a function of the number of free ribosomes $r$ obtained using the data in [9].
SI.12. Evaluation of the maximum resources recruitment strength.
In this section we evaluate the order of magnitude of the resources recruitment strength for the protein-coding genes in E. coli. This is useful to evaluate the maximum burden (measured as the sum of the RRSs) in the native E. coli host cell and to estimate how many genes are active. We used the data in [9] and expression (88) to calculate the individual values of the maximum resources recruitment strength $J_k$, evaluated at $r = 1$, for the sets of non-ribosomal and ribosomal protein-coding genes. We sorted these values by magnitude and also computed the ratio between the sorted values of $J_k$ and the corresponding protein lengths. The results are shown in Figure SI.7.
As a proxy to estimate how many genes are active at a given time, we calculated the cumulative sum of the maximum resources recruitment strengths and obtained how many expressed genes are required to explain 95% and 99% of the total cumulative sum. We did this independently for ribosomal and non-ribosomal proteins. Figure SI.8 shows the results obtained. Notice that the same results are obtained with the weighted resources recruitment strengths, since the ratio $\Phi^h_t/\Phi^h_b$ is constant. Our results show that, out of the 68 ribosomal genes, 49 (72%) explain 95% of the cumulative sum of the maximum resources recruitment strength of the ribosomal genes. To explain 99% we need 57 ribosomal genes (84%). On the other hand, for non-ribosomal genes we need 875 out of 3551 genes (25%) to explain 95% of the cumulative sum and 1735 (49%) to explain 99%.
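The cumulative-sum criterion described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `genes_needed` and the strength values are hypothetical and do not come from the data in [9].

```python
import numpy as np

def genes_needed(strengths, fraction):
    """Smallest number of genes, taken from strongest to weakest, whose
    summed strength reaches `fraction` of the total sum."""
    s = np.sort(np.asarray(strengths, dtype=float))[::-1]  # descending order
    cum = np.cumsum(s) / s.sum()                            # normalized cumulative sum
    idx = int(np.searchsorted(cum, fraction, side="left"))  # first index with cum >= fraction
    return min(idx + 1, s.size)                             # guard against rounding at fraction = 1

# Hypothetical maximum recruitment strengths (illustrative values only).
j_max = [50.0, 30.0, 15.0, 4.0, 1.0]
n_90 = genes_needed(j_max, 0.90)
n_97 = genes_needed(j_max, 0.97)
print(n_90, n_97)
```

Applied separately to the ribosomal and non-ribosomal sets with thresholds 0.95 and 0.99, the same function would give the gene counts of the kind reported in this section.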

SI.13. Estimation of the number of free ribosomes in the cell
Estimating the number of free ribosomes in the cell, $r$, is key for assessing the competition among the cell circuits for cellular resources. The results in the previous sections suggest that an extremely low number of free ribosomes, of the order of $10^1$, gives a high sensitivity of the total cumulative sum of the maximum resources recruitment strength with respect to variations in the amount of free ribosomes. That is, by not expressing superfluous resources, the cell forces a competition for them that induces a high sensitivity of the total amount of recruited resources with respect to variations in the number of free ribosomes, while a small surplus of superfluous resources induces robustness. To evaluate the range of expected values of $r$ we used experimental data of the translation efficiency per mRNA. Notice that from the dynamics (19) for a protein $p_k$ we can define: where $s_i$ is the copy number of molecules of intracellular substrate and $d_{mk}$ is the mRNA degradation rate constant. Recall that $K^k_{C_0}(s_i)$ is a substrate-dependent parameter essentially related to the RBS strength, and that we consider that substrate availability only affects translation and not transcription (see Section SI.1). Notice that $Y_{p/mRNA}$ is the number of protein copies produced per transcript.
We used the data from [9] to estimate an upper bound for the number of free ribosomes $r$ using (89) and the values for $Y_{p/mRNA}$ obtained using (87): Then, the relationship between the RBS strength-related term $K^k_{C_0}$ and the free ribosomes $r$ becomes: where $f(s_i) = s_i/(K_{sc} + s_i)$ and we have used equations (54), (55) and (56) relating the specific growth rate $\mu$ with the maximum one $\mu_m$ and the availability of intracellular substrate. Notice that, for any given protein and intracellular substrate availability, the number of free ribosomes determines the value of the RBS strength-related term $K^k_{C_0}(s_i)$ required to attain the experimental value of the translation efficiency per mRNA $Y_{p/mRNA}$. The translation efficiency given by expression (91) depends on the ribosomes density $1/l_e$. An average ribosomes density of around 4.2 ribosomes per 100 codons in optimal growth conditions has been reported in the literature for the prokaryote L. lactis [16]. This is the same value we obtained from the data in (88) by taking the total number of available active ribosomes $\Phi_b r_a$ obtained in Section SI.12 (see Figure SI.6) and dividing it by the sum of the lengths of all proteins, weighted by a factor 0.5 to account for the estimation that 50% of the genes are active (that is, explain 99% of the cumulative sum of the maximum resources recruitment strength). Similar values are found for other organisms [7]. For E. coli a value of 3.5 is given in [15]. The ribosomes density is inversely log-linearly related to the length of the coding sequence, with a slope quite consistent across a variety of organisms [7]. To account for this, we approximated a power law consistent with the findings in [7] and resulting in 4 ribosomes per 100 codons for an average protein length of 330 codons. We obtained the relationship:
$$\frac{1}{l_e} = 0.0703\, l_{pk}^{-0.097} \qquad (92)$$
This gives a range $l_e \in [18, 31]$, with a value $l_e = 25$ for the average protein length.
The minimum value is consistent with the shortest protein length (18 codons) in the database we used. Figure SI.9 shows the results obtained for the set of all ribosomal and non-ribosomal proteins and their average values.
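As a quick numerical check of the power law (92), the sketch below evaluates the ribosome density and the occupancy length for a given coding-sequence length. It assumes the exponent is negative ($-0.097$), which is the sign that reproduces the stated 4 ribosomes per 100 codons at 330 codons and the decrease of density with length; the function names are hypothetical.

```python
def ribosome_density(l_pk):
    """Ribosomes per codon, 1/l_e, for a coding sequence of l_pk codons.

    Assumes the power law 1/l_e = 0.0703 * l_pk**(-0.097)."""
    return 0.0703 * l_pk ** -0.097

def occupancy_length(l_pk):
    """Ribosome occupancy length l_e (codons per ribosome)."""
    return 1.0 / ribosome_density(l_pk)

# Average protein length (330 codons) and shortest protein (18 codons)
# quoted in the text.
for codons in (330, 18):
    print(codons,
          round(100 * ribosome_density(codons), 2),  # ribosomes per 100 codons
          round(occupancy_length(codons), 1))        # l_e in codons
```

With these assumptions, the average length gives an occupancy length close to 25 codons, and the shortest protein gives a value near the lower end of the quoted range.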
From the results shown in Figure SI.9, notice that the number of free ribosomes $r$ required to explain the experimental value of the translation efficiency per mRNA $Y_{p/mRNA}$ for each protein increases as the value of the RBS strength-related term $K^k_{C_0}$ decreases. Indeed, the more free ribosomes are available, the less competition there is for shared resources. The number of free ribosomes $r$ is an indicator of the level of competition for resources. Thus, expression (91) implies that a gene producing short-living transcripts will require, for the same level of competition, a stronger RBS to achieve the same translation efficiency per mRNA $Y_{p/mRNA}$ as one with long-living transcripts.
To estimate an upper limit for the copy number of free ribosomes required to achieve the experimental translation rates per mRNA, we considered an upper bound for the RBS strength-related term $K^k_{C_0}$. Recall from equation (10) that the term $K^k_{C_0}$ is a function of the intracellular substrate availability. Since the data we used from [9] were obtained for fast-growing cells, we can consider intracellular substrate saturation. Under this condition we get: Using the value of $\nu$ in Table SI.2 and the range of $l_e$ given above, we estimate $\nu/l_e \in [40, 70]$ (molec$^{-1}$ · min$^{-1}$).
Figure SI.9. Relationship between the RBS strength-related term $K^k_{C_0}$ and the number of free ribosomes $r$ obtained using the experimental data in [9] for non-ribosomal (left) and ribosomal (right) proteins in E. coli. Thin lines correspond to the experimental value of the translation efficiency per mRNA $Y_{p/mRNA}$ for each protein. The red thick line corresponds to the mean for all proteins in the corresponding non-ribosomal and ribosomal sets. The green thick line corresponds to the approximated mean when the term associated with the maximum specific cell growth rate is neglected in expression (91).
On the other hand, the values of the association and dissociation rates of the ribosome to the RBS, $K^k_b$ and $K^k_u$, may vary over a large range. Values $K^k_b \in [3, 15]$ (molec$^{-1}$ · min$^{-1}$) are found in the literature (see Table SI.2). We use a conservative upper bound $K^{max}_b = 10$ (molec$^{-1}$ · min$^{-1}$), considering that binding is diffusion controlled. From the literature, we consider a range for the dissociation rate $K^k_u \in [3, 135]$ (min$^{-1}$). Overall, these estimates give us a range (under the assumption of intracellular substrate saturation) $K^k_{C_0} \in [0.02, 0.2]$ (molec$^{-1}$). From the results shown in Figure SI.9, notice that a maximum number of free ribosomes $r \approx 350$ can confidently explain the translation efficiencies per mRNA $Y_{p/mRNA}$ for almost all proteins while keeping the RBS strength-related term $K^k_{C_0} < 0.2$. This estimation of the amount of free ribosomes is in complete agreement with the estimation $r = \mathcal{N}(350, 35)$ in [11].
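The order of magnitude of this range can be checked with a short sketch. It assumes, for illustration only, that under substrate saturation the RBS strength-related term has the form $K_{C_0} = K_b/(K_u + \nu/l_e)$; this form is an assumption made here because it is dimensionally consistent with the units quoted above, and the expression used in the source may differ. The function name is hypothetical.

```python
def k_c0_saturated(k_b, k_u, nu_over_le):
    """RBS strength-related term under substrate saturation.

    ASSUMED form (not taken from the source): K_C0 = K_b / (K_u + nu/l_e),
    with K_b in molec^-1 min^-1 and K_u, nu/l_e in min^-1."""
    return k_b / (k_u + nu_over_le)

# Bounds quoted in the text: K_b in [3, 10] (using the conservative upper
# bound), K_u in [3, 135] and nu/l_e in [40, 70].
k_low = k_c0_saturated(k_b=3.0, k_u=135.0, nu_over_le=70.0)   # weakest RBS
k_high = k_c0_saturated(k_b=10.0, k_u=3.0, nu_over_le=40.0)   # strongest RBS
print(round(k_low, 3), round(k_high, 3))
```

Under this assumed form the extremes fall in the same order of magnitude as the range $[0.02, 0.2]$ molec$^{-1}$ given above.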
With the upper limit $r \approx 350$ we could explain the translation efficiencies per mRNA $Y_{p/mRNA}$, calculated as $\beta_{pk}/d_{mk}$ (see equation (91)) using the data in [9], except for a small set of 80 non-ribosomal proteins out of 3551 (2.25%). For 52 of them, this could be attributed to their extremely long-living transcripts; for the remaining 28, to the very high translation efficiency per mRNA $Y_{p/mRNA}$ expected from their values of $\beta_{pk}$ and $d_{mk}$ given in [9]. This can be explained by rewriting (91) as: Figure SI.10 shows a plot of the function (94) as a function of its argument $f(s_i) K_{C_0}(s_i) r/d_m$ for two mRNA degradation rates, corresponding to short- and long-living mRNAs, and ribosomes densities corresponding to $l_e \in [18, 31]$. Notice that $Y_{p/mRNA} = \beta_{pk}/d_{mk}$ saturates at the maximum attainable value: On the other hand, it is interesting to notice that for very low values of $f(s_i) K_{C_0}(s_i) r/d_m$, such that $f(s_i) K_{C_0}(s_i) r \ll d_m/\mu_m$, we can approximate: from which we get:
$$\beta_{pk} \approx 0.62\, \frac{\nu}{l_e}\, f(s_i)\, K^k_{C_0}(s_i)\, r$$
That is, for a highly competitive scenario where the number of free ribosomes is sufficiently small (e.g. of the order of a few tens to a few hundreds for typical values of $d_{mk}$, $\mu_m = 0.032$ min$^{-1}$ and $f(s_i) K_{C_0}(s_i)$ at its maximum estimated value $f(s_i) K_{C_0}(s_i) = 0.2$), the translation rate (proteins per mRNA per time unit) is proportional to the ribosomes density $0.62/l_e$, the effective maximum translation rate per codon attainable for a given substrate availability $\nu f(s_i)$, the RBS strength $K^k_{C_0}$, and the available free ribosomes $r$. Notice that under this scenario the translation rate will suffer large stochastic fluctuations caused by stochastic fluctuations in the number of free ribosomes. In this case, the translation rate for a given RBS strength mainly depends on the competition for cellular resources and, therefore, on the number of free ribosomes $r$, and it is largely independent of the specific growth rate.
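The low-occupancy approximation can be illustrated numerically. The sketch below evaluates $\beta_{pk} \approx 0.62\,(\nu/l_e)\, f(s_i)\, K^k_{C_0}(s_i)\, r$ and the corresponding $Y_{p/mRNA} = \beta_{pk}/d_{mk}$, and reports how well the validity condition is met, reading that condition as $f(s_i) K_{C_0}(s_i) r \ll d_m/\mu_m$. The function names and the example inputs ($\nu/l_e = 50$, $K_{C_0} = 0.02$, $r = 10$, $d_m = 0.2$ min$^{-1}$) are hypothetical; only $\mu_m = 0.032$ min$^{-1}$ and the admissible ranges come from the text.

```python
def translation_rate_low_occupancy(nu_over_le, f_si, k_c0, r):
    """Approximate translation rate beta_pk (proteins per mRNA per minute),
    intended for the regime f*K*r << d_m/mu_m."""
    return 0.62 * nu_over_le * f_si * k_c0 * r

def occupancy_ratio(f_si, k_c0, r, d_m, mu_m=0.032):
    """Ratio (f*K*r) / (d_m/mu_m); the approximation is meant for values << 1."""
    return f_si * k_c0 * r * mu_m / d_m

# Hypothetical example: saturated substrate (f = 1), weak RBS, few free ribosomes.
d_m = 0.2  # mRNA degradation rate constant (min^-1), illustrative value
beta = translation_rate_low_occupancy(nu_over_le=50.0, f_si=1.0, k_c0=0.02, r=10.0)
ratio = occupancy_ratio(f_si=1.0, k_c0=0.02, r=10.0, d_m=d_m)
y_p_mrna = beta / d_m  # protein copies produced per transcript
print(beta, ratio, y_p_mrna)
```

In this regime the rate scales linearly with $r$, so doubling the number of free ribosomes doubles the estimated translation rate, which is why fluctuations in $r$ carry over directly to the translation rate.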

SI.14. Estimated fractions of ribosomes
Additional results for Section 2.2 are given here. The total number of ribosomes (both experimental and estimated) increases considerably for very fast-growing cells. Thus, the fraction of free ribosomes with respect to the total number only increases from 0.08% up to 1.37% for cell duplication times between 100 and 24 minutes, respectively, even though the number of free ribosomes increases almost 200-fold (see Figure SI.11). The estimated value of the fraction of mature ribosomes with respect to the total number of ribosomes was $\Phi_m \approx 0.90$, and the estimated fraction of active bound ribosomes remained roughly constant with growth rate at $\Phi^h_t \Phi_m \approx 0.78$. Notice also the logarithmic affine relationship between the number of free ribosomes $r$ and its flux $\mu r$, $\log_{10}(r) \approx 4.07 + 0.78 \log_{10}(\mu r)$, reflecting a power-law relationship between growth rate and number of free ribosomes.
Additional Figures SI.12 and SI.13 for Results Section 2.4 are given here. In both cases, the effect of substrate variation is considered. The function $f(s_i)$ corresponds to the normalized attainable peptide synthesis rate defined in equation (4), so that: Therefore $f(s_i)$ is a saturating, monotonically increasing function of the intracellular substrate $s_i$, taking values in the range [0, 1].

SI.16. Software code and data
The software code and data are available at https://github.com/sb2cl/Resources-allocation-NoPi21
Figure SI.12. A: Effect of increasing mRNA synthesis rate on growth rate and mass fractions (left) and specific protein synthesis rate (right). B: Effect of RBS strength variation on growth rate and mass fractions (left) and specific protein synthesis rate (right) for the three values $N_A \omega_A = \{150, 400, 800\}$. C: Specific protein synthesis rate across the expression space $(N_A \omega_A, K^A_{C_0})$. The pink and blue squares correspond to the average lumped values of $(N_x \omega_x, K^x_{C_0}(s_i))$ for the non-ribosomal (pink) and ribosomal (blue) endogenous protein-coding genes in an E. coli host, respectively. The dashed white lines correspond to the four scenarios analyzed in panels A and B.
Figure SI.13. Effect of the variation of substrate on the specific optimal synthesis rate of exogenous protein-coding genes. The contour lines show the specific synthesis rate as a function of the effective RBS strength and the mRNA synthesis rate. They were obtained by simulating the expression of a generic exogenous protein-coding gene with varying strengths across the expression space $(N_A \omega_A, K^A_{C_0})$. We considered genes with four differential characteristics. Top: Saturated substrate. Bottom: Low substrate condition. The values of RBS strength and mRNA synthesis rate are now depicted as diamonds for the new scenario. The values of the previous substrate-saturated scenario are kept as squares for the sake of comparison. Recall the dependence of the effective RBS strength $K^x_{C_0}(s_i)$ on the availability (equivalently, the nutrient quality) of the intracellular substrate. In our model, strong RBSs are more affected by variations of the substrate than weak ones. As the substrate decreases, the effective RBS strength of genes with weak RBSs does not change appreciably. Notice that these genes require fewer resources (per gene) than the ones with strong RBSs. For the genes with strong RBSs, the apparent RBS strength increases as the substrate decreases: they increase their avidity for scarce resources. This way, both kinds of genes keep their relative positions within the specific synthesis rate space.