What Is the Role of the Environment in the Emergence of Novel Antibiotic Resistance Genes? A Modeling Approach

It is generally accepted that intervention strategies to curb antibiotic resistance cannot solely focus on human and veterinary medicine but must also consider environmental settings. While the environment clearly has a role in transmission of resistant bacteria, its role in the emergence of novel antibiotic resistance genes (ARGs) is less clear. It has been suggested that the environment constitutes an enormous recruitment ground for ARGs to pathogens, but its extent is practically unknown. We have constructed a model framework for resistance emergence and used available quantitative data on relevant processes to identify limiting steps in the appearance of ARGs in human pathogens. We found that in a majority of possible scenarios, the environment would only play a minor role in the emergence of novel ARGs. However, the uncertainty is enormous, highlighting an urgent need for more quantitative data. Specifically, more data is most needed on the fitness costs of ARG carriage, the degree of dispersal of resistant bacteria from the environment to humans, and the rates of mobilization and horizontal transfer of ARGs. This type of data is instrumental to determine which processes should be targeted for interventions to curb development and transmission of ARGs in the environment.

: Dependency of the modelled processes on the MH and H parameters S7 Figure S1: Relations between E and other parameters for the pre-existing model S8 Figure S2: Dependency of the different processes on the H parameter for the pre-existing model S9 Figure S3: Dependency of the different processes on the MH parameter for the pre-existing model S10 Figure S4: Dependency of the different processes on the D parameter for the pre-existing model S11 Figure S5: Dependency of the different processes on the MH parameter divided by the H parameter (resulting in an estimated M parameter) for the pre-existing model after 70 years of simulated time S12 Figure S6: Valid parameter ranges and process rates for the pre-existing model with S being fixed to 1 S13 Figure S7: Differences between how processes depend on E for the pre-existing model and the model where S is fixed to 1 S14 Figure S8: Valid parameter ranges and process rates for the emergence model S15 Links to the code of the two models and full results S16 References (supplement) S17 Detailed methods description Implementation of the model We consider two different scenarios in our model. In the first scenario, ARGs that later end up in human pathogens already pre-existed in bacteria in some setting at the start of the antibiotic era (around 70 years ago), and only needed to be, in some way, transferred to human pathogens (Main text, Table 1). In the second model scenario, we assume that ARGs did not have a resistance function at the start of the modelled time period but needed to first emerge as resistance factors. We will refer to the latter scenario as the "emergence model" and the first scenario as the "pre-existing model".
The endpoint of the proposed models is a mobilized ARG observed in a human pathogen. Several different chains of events may lead from the first appearance of an ARG to its occurrence in a pathogenic strain (Main text, Fig. 1). These combinations of events are summarized in the model as six different pathways. First, an ARG may have existed directly on a mobile genetic element in a human pathogen living in a human being (E1). Secondly, an ARG could have already existed on the chromosome of a human commensal bacteria. After mobilization, the ARG could then be horizontally transferred to a human pathogen (E2). Thirdly, an ARG could originate from a mobile genetic element in a commensal species belonging to the human microbiome. In this scenario, it would then be directly transferrable into a human pathogen via HGT (E3). These first three processes all target pre-existence of ARGs in bacteria directly associated with humans. However, it is also plausible that ARGs may have originated in bacteria living in the environment and subsequently been disseminated to humans. The simplest case would be an ARG that originated from a human pathogen present in the environment. It could then be disseminated between the compartments before ending up in humans (E4). Alternatively, a chromosomally encoded ARG could have existed among environmental bacteria that are not pathogenic. Subsequently, that ARG needs to be mobilized onto a plasmid or as a transposon, spread via horizontal gene transfer (HGT, referred to with the letter 'H' in the model) and be transferred to human pathogens. Humans might pick up the bacterium harbouring the ARG, for instance, with food or while swimming (E5). An ARG could also have pre-existed on a mobile genetic element in an environmental bacterium. Via transformation, transduction or conjugation (H), it could then get transferred to a human pathogen, which can subsequently be taken up by humans (E6).
Time plays an important role in all processes relating to the spread and maintenance of ARGs. For mobilization (M), horizontal gene transfer (H) and the dissemination processes (D), it is important that the microbe carrying the gene survives. Many human pathogenic species are known to have a rather short lifetime outside of their host 1,2 . Furthermore, the fitness effect of ARG carriage on population size over time is cumulative, affecting the population expansion rate (S) exponentially, in contrast to the other model parameters. The dissemination efficiency is additionally influenced by the distance to humans in terms of space and time. Human exposure is a critical factor for the dissemination as well as the appearance of the ARG in pathogens in the human microbiome. Humans are exposed to bacteria in many different ways through which uptake via food seems to be the most important one in terms of bacterial exposure per day 3 .
Each of the six pathways shown in Fig. 1 were expressed as equations (Main text, Table 1) with parameters defined in Table 2 (in the main text). The interpretation of each equation is the number of mobilized ARGs in pathogens at time t contributed by the corresponding pathway. Both models (the main "pre-existing" model and the "emergence" model) were implemented in R as follows, only differing in the meaning and range of the parameter E (Main text, Table 2). Each parameter was randomly selected from its log10 transformed range (see Main text, Table 2) under a uniform distribution. For each set of randomly selected parameters, the result of the equations for the different considered processes were calculated (Main text, Table 1). Next, the total appearance was calculated based on these sub-equations according to this formula: where E1 to E6 each represent the contribution of ARGs by a particular process, expressed as total contributed ARGs over the entire model timeframe, and t is the time in days, resulting in the total appearance (App) representing the total observed ARGs at time t. The true total appearance was approximated from the ResFinder database 4 , which depending on how gene variants are counted held on the order of 700 to 2200 unique ARGs in September 2019. By assuming that these genes would all have appeared in the past 70 years since antibiotics were widely introduced, that gives us an expected appearance rate of around 9 to 29 ARGs per year, or 0.025 to 0.079 per day (Main text, Table 2). If the modelled total appearance fell within the expected interval, the model parameters were saved. This procedure was iterated until 10,000 valid sets of model parameters had been obtained for each modelled time point and set of parameter ranges.

Defining the probability boundaries for antibiotic resistance development
In order to populate the model, measured values for the individual parameters were taken from literature (Supplementary Tables S1 and S2), and the upper and lower boundaries were fed into the model (Supplementary Table S1). For the rates of horizontal gene transfer (H) most studies were found for the conjugative transfer of genes. Data for both intra-and interspecies transfer of various naturally occurring and artificially constructed plasmids are available. These have been included as a first approximation in the model, bearing in mind that this is laboratory data and that their transferability to the respective ecosystems has been little studied so far. At this time, we have limited knowledge concerning the role of transduction and transformation for the distribution of resistance genes in the respective ecosystems, although recent studies have suggested that they may be important in the transfer of ARGs between bacteria 5 . Due to the lack of quantitative data, these two processes were not considered for the estimation of the parameter H.
Unfortunately, there are no direct measurements for the mobilization parameter (M) that have been obtained independent of HGT. Most studies have the following design in common 6-10 ; the resistance encoded by the donor is not mobile and can only be detected in the recipient once it has been mobilized and transferred. Thus, we do not have a measure of the M parameter, and instead we have used the combined parameter MH (mobilization and horizontal gene transfer) where experimental data can be obtained from these studies.
To get an estimate for the dissemination of ARGs from the environment to humans (D), the number of eaten bacteria per day were used as a proxy 3 . In this study, the authors counted the number of colony forming units of meals for three different diet types. In this context, other transfer routes are also conceivable, e.g., the absorption of bacteria by swallowing water while bathing 11,12 . Such dissemination pathways can be of decisive importance when considering individual systems. However, they are not explicitly further considered in this model, which is intentionally kept general, as we aim to accommodate all possible dispersal scenarios in one single parameter.
Finally, some parameters were calculated based on the estimates of the number of humans on earth (7.8 billion; http://www.worldometers.info), the number of bacterial cells in/on the human body (3.8 x 10 13 ) 13 and the total number of bacterial cells on earth (on the order of 10 30 ) 14 . These values were used to define ranges for Pph, Ph, Pp and E (Main text, Table 2). We also estimate a typical bacterial genome to contain 1500 to 7500 genes 15 , that around 7% of bacterial cells carry a conjugable plasmid, and that a typical plasmid carries 50-100 genes 16 , which was used to constrain the Pm parameter. The S parameter represents the overall fitness impact of carrying an ARG and implicitly also involves the survival rate for a bacterium carrying an ARG, relative to a non-carrier. S is defined as the population expansion rate per day in the model, so when S is equal to one the average ARG would have no impact on the expansion of the population, i.e. the average ARG would overall be fitness neutral.

Limitations of the model
It is important to note that we have intentionally kept the model and its parameters general, in order to provide an overview of the respective importance of the individual processes that influence the appearance of new resistance genes in human pathogens. This is likely also one of the reasons why the model predictions are in many cases associated with very wide ranges. However, the model still emphasizes important and interrelated processes that need more research attention in order to build more quantitatively accurate models. Furthermore, by using specific rates for particular ecosystems as input for the model, we believe that our model could be used to assess the probabilities for the contribution of specific pathways for resistance development and dissemination.
A further limitation to the model is that the variable associated with the fitness cost of ARGs (S) represents a variety of different processes and parameters related to bacterial survival and proliferation. As we did not find an unambiguous way to tease these components apart, they are all captured by this single parameter, which of course results in an oversimplification of the real world. For these reasons, the role of S has to be interpreted with some caution. First of all, S represents the average fitness impact of the average ARG over the entire run time of the model (which in most of the data we have presented in this paper is 70 years). This means that S encapsulates scenarios where an ARG may some of the time have a positive fitness impact (such as during antibiotic selection) and at other times a slightly negative effect on fitness (e.g., in the absence of antibiotic selection). The final value of S therefore represents the expected overall fitness impact of ARG carriage, not the specific fitness of any single ARG currently encountered in pathogens. Furthermore, S also implicitly includes a constraint on the survival rate for a bacterium carrying an ARG. The way S is defined in the model, it is related to the population expansion rate per generation. In other words, if S is equal to one, the average ARG is expected to have no impact on bacterial growth and, thus, the population expansion (i.e. it is fitness neutral). However, given the relatively quick doubling rate of bacteria, a value only slightly deviating from one will rapidly result in the gene propagating through bacterial populations or quickly disappearing, which explains the tendency of S to approach one the longer the simulated time of the model.

S5
Supplementary  Supplementary Fig. S5. Dependency of the different processes on the MH parameter divided by the H parameter (resulting in an estimated M parameter) for the pre-existing model after 70 years of simulated time.