Linking Engineered Cells to Their Digital Twins: A Version Control System for Strain Engineering
  • Open Access
Research Article



ACS Synthetic Biology

Cite this: ACS Synth. Biol. 2020, 9, 3, 536–545
https://doi.org/10.1021/acssynbio.9b00400
Published February 20, 2020

Copyright © 2020 American Chemical Society. This publication is licensed under CC-BY-NC-ND.

Abstract


As DNA sequencing and synthesis become cheaper and more easily accessible, the scale and complexity of biological engineering projects are set to grow. Yet, although there is an accelerating convergence between biotechnology and digital technology, a deficit in software and laboratory techniques diminishes the ability to make biotechnology more agile, reproducible, and transparent while, at the same time, limiting the security and safety of synthetic biology constructs. To partially address some of these problems, this paper presents an approach for physically linking engineered cells to their digital footprint, which we call digital twinning. This enables the tracking of the entire engineering history of a cell line in a specialized version control system for collaborative strain engineering via simple barcoding protocols.

There is a rapidly accelerating convergence between biotechnology and information and communication technologies (ICT).
As writing and reading DNA becomes routine, cheaper, and pervasive, genetic engineering—that is, programming biological organisms—becomes more similar to programming computers. This has important disruptive implications for biotechnology: (1) larger teams of bioprogrammers—both in biotechnology companies of all sizes and in research organizations such as universities—work together and concurrently in the genetic programming of biological organisms; (2) the pace of innovation for biological “apps” is set to explode; and (3) the ecosystem and supply chain of biotechnological products, in particular cell lines and plasmids, is to become more complex and more diversified.
In software development, the time to market for new products has shortened dramatically in recent years, thanks to (among other factors) advances in continuous integration and deployment that allow teams of programmers (often geographically distributed) to rapidly develop, test, debug, and promptly push new code to production. This means that new software products can reach their users and customers faster than ever before.
The core element of the technology underpinning continuous integration and deployment is the version control system (VCS). Its roots date back to Bell Laboratories’ early 1970s Source Code Control System. (1) Git, the prevalent VCS powering modern software engineering, was introduced by Linus Torvalds in 2005 and popularized the use of distributed version control systems.
Nowadays distributed VCSs are used daily by software developers across the world. VCSs help distributed teams keep track of the history of changes to large software systems, as they enable them to quickly answer questions such as “What was done to a piece of software?”, “What versions of the code are available?”, “Who introduced the last feature (or bug) into the code?”, “When did the modification take place?”, “What was modified?”, “How does this new version differ from the previous one?”, and importantly “Why was this piece of software modified?”.
A version control system provides answers to the above questions, greatly simplifying the work of distributed teams of programmers who must modify the same piece of software concurrently. VCSs achieve this by storing the files in the repository together with the entire history of changes to them; changes are automatically tracked and semiautomatically merged by the system. Because the entire lineage of the source code is maintained, VCSs enable traceability, transparency, and backtracking (if necessary) as well as branching out new versions of computer code. Branched code does not interfere with the original code and yet retains all the metadata that leads back to that branch’s origins, thus providing a safe sandpit for experimentation (i.e., if something breaks, one can always restore previous states). Without VCSs it would be virtually impossible to develop complex computer programs as we know them today. For a recent biology and biotechnology friendly introduction to version control and Git, refer to ref (2).
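As an illustration of how a lineage of changes answers the who, when, what, and why questions, the following minimal Python sketch models commits as records with a parent link. It is not how Git stores data internally; the `Commit` class, the `history` helper, and the example authors, dates, and messages are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    """One recorded change: who made it, when, and why."""
    commit_id: str
    author: str                        # who
    date: str                          # when (ISO date kept as text)
    message: str                       # what was done and why
    parent: Optional["Commit"] = None  # link to the previous state


def history(commit: Commit) -> List[Commit]:
    """Walk the parent links from a commit back to the root, newest first."""
    chain: List[Commit] = []
    current: Optional[Commit] = commit
    while current is not None:
        chain.append(current)
        current = current.parent
    return chain


# Hypothetical example data: one root and two branches that do not interfere.
root = Commit("c0", "alice", "2020-01-06", "Initial version")
fix = Commit("c1", "bob", "2020-01-07", "Fix parser bug", parent=root)
feature = Commit("c2", "carol", "2020-01-08", "Try new feature", parent=root)

for c in history(fix):
    print(c.commit_id, c.author, c.date, c.message)
```

Both branches share the root commit, so either one can be traced back to the same origin while remaining independent of the other.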
How would a biotechnology-specific version control system contribute to making biotechnology more agile, reproducible, and transparent—and the corresponding live constructs more traceable for the sake of their security and safety? (3)
On one hand, mislabeling and misidentification of biological samples occur frequently in the laboratory. (4) While being able to properly identify and track a strain’s origins is essential, and notwithstanding persistent calls to address this challenge, (5−7) it is a problem that lacks appropriate tooling. Moreover, a related major problem affecting many scientific disciplines, including biotechnology, is experimental irreproducibility. (8) As recently as 2018, Nature had a special issue themed “Challenges in Irreproducible Research” to highlight some of the most urgent issues and suggest potential ways forward, with Brazil the first country (to the best of our knowledge) to launch a reproducibility initiative within its borders. (9)
Laboratory data generated by researchers also face problems such as data loss and the lack of accepted engineering and reporting standards. Electronic Lab Notebooks and other online applications that use the cloud to store and publish the data and experiments generated are a partial step toward solving these problems, although they are generic for lab operations rather than specific to the genetic programming of organisms. Thus, new tools and software have been called for. (10)
On the other hand, although several methods aimed at identifying cell lines coming from clinical or environmental samples have been developed, little effort has been put into establishing such tools for laboratory-created strains. From an engineering viewpoint, one needs to know not only a strain’s genome sequence but, importantly, also the history of changes that were made to a strain (e.g., added plasmids, scars left by knock-ins/knockouts, etc.). Ideally, one would want to also maintain a detailed history of experimental conditions used, the intention behind the modifications done to a strain or to its growth media, as well as other metadata such as, e.g., the lab of provenance, the genetic engineer(s) who worked on a strain, etc.
The lack of robust, yet simple, solutions to address the issues mentioned above, on the one hand, leads to wasteful use of both public and private funds and, on the other hand, contributes to the growth in public mistrust due to a lack of traceability and transparency in biotechnology product innovation.
Moreover, current practice for linking a strain digital footprint to the actual biological sample is based around hand-written notes, word processor or spreadsheet files, and—in the best case—the labeling of test tubes with barcodes generated from generic laboratory information management systems. Crucially, in all those cases, the biological sample itself does not carry a record of its digital footprint. That is, with current laboratory practice, a strain’s digital footprint (e.g., strain designs, genome sequence, recombineering sites, engineering history, etc.) is disconnected from the physical strain sample: no actual connection exists between the strains one creates in the laboratory (or that is used in production, e.g., to manufacture enzymes or biologics at scale) and the data available for that strain. Recent advances in DNA synthesis techniques and information storage in DNA could help bridge this important gap.
The usage of DNA as a long-term way of storing data has been recently demonstrated. (11) The use of small, synthetic DNA sequences as identifiers has been reported several times in very diverse areas. These short DNA sequences allow the identification of mutants in mixed populations in both microbial studies (12,13) and tumor cell lines studies under different treatments. (14,15) Also, DNA barcodes have been used to track cell lineage by progressively editing the barcode sequence using CRISPR. (16) DNA barcodes can be used in gene synthesis, in which the barcode sequence is used to isolate and assemble a synthetic gene (17) or large pathways in various circuit designs. (18) Additionally, DNA barcodes have been shown to be useful in drug discovery by tagging and identifying several chemicals that bind to specific target molecules. (19) Very recently new barcode generation algorithms have been developed to consider any type of synthesis or sequencing mistake which increases the robustness of all the previous uses. (20)
Finally, a version control system that physically links, via an embedded DNA barcode, engineered cells to their digital twins would also contribute toward more responsible research and innovation. For example, genetic firewalls, sophisticated as they might be, may not be able to prevent engineered organisms from escaping a given niche. (22) Existing proposals for containing genetically programmed organisms and SynBio agents do not seem to go beyond events occurring at frequencies of 10⁻¹¹, which is not enough for what has been called certainty of containment (CoC). (21) In contrast, most of the concerns about undesirable propagation of human-made constructs could be managed if chassis and agents were barcoded with specific and unique DNA sequences that, once decoded, could trace the sample back to its origins. Barcoded clones would thus be equivalent to pets implanted subcutaneously with identification chips: in case they get lost or do some harm, their owner and their lineage can be immediately identified, and measures taken to counteract detrimental effects. By the same token, barcoded strains would give access to all relevant information on their pedigree, their safety, and the modifications implemented in them. Moreover, barcoding would allow for a detailed comparison of the current genome sequence at the time of sampling and the intended (initial) genomic sequence at the moment of engineering. This would certainly facilitate comparative genomics and evolutionary studies starting from a well characterized point of origin for the later collected (i.e., escaped) samples. Barcoding would thus be complementary to, if not even more practical than, containment measures, which as mentioned above are unlikely to deliver CoC where a very large cell population is at stake.
Instead, barcodes will not only make traceability simple, but they will also assign an unambiguous cipher to the successively improved versions of the same chassis (as is the case with computer and mobile phone operating systems).
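The point about very large populations can be made concrete with a rough calculation. The sketch below reads the quoted containment frequency as 10⁻¹¹ per cell and assumes, purely for illustration, that escape events are independent; the population sizes are hypothetical and the function name is our own.

```python
import math


def escape_probability(per_cell_frequency: float, population: float) -> float:
    """Probability of at least one escape, 1 - (1 - f)^N.

    Computed as -expm1(N * log1p(-f)) to stay accurate for very small f.
    Assumes independent escape events with the same frequency for every cell.
    """
    return -math.expm1(population * math.log1p(-per_cell_frequency))


frequency = 1e-11  # escape frequency quoted in the text
for population in (1e9, 1e11, 1e13):  # hypothetical population sizes
    p = escape_probability(frequency, population)
    print(f"N = {population:.0e}: P(at least one escape) = {p:.3f}")
```

Under these assumptions the probability of at least one escape is about 1% for 10⁹ cells, about 63% for 10¹¹ cells, and effectively 1 for 10¹³ cells, which is why containment alone is unlikely to suffice at scale.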
In this paper we present a biotechnology-specific version control system, CellRepo, that provides both the genetic toolkits and web-based software to physically link living samples to their digital footprint history. CellRepo is based on small, unique, and bio-orthogonal DNA sequences inserted in specific genomic locations of a strain. With a single sequencing reaction, the DNA sequence can be retrieved, and hence the user can track down the entire digital footprint history of a strain via our web server: strain creators, parental and derivative strains, strain design documentation, related papers, experimental protocols, computer models, etc. can all be retrieved from the proposed system (Figure 1). Put together, the biotechnology kits and the software repository move the digitalization of biotechnology a step closer, making it more collaborative, scalable, transparent, traceable, and reproducible.

Figure 1

Figure 1. An example of version control in strain engineering. A repository with the master culture (red) is branched by two different biologists to create new Strain 1 (purple) and Strain 3 (green). Each dot in the commit tree represents a strain engineering milestone and is created by modifying the parent strain (or an earlier derivative). For critical (user defined) steps in the engineering process, a unique identifier can be generated and stored as a DNA barcode in the strain chromosome. The barcode is associated with a “commit” within our version control system, with each commit containing the digital footprint (files and other metadata information) that led to this milestone.

Materials and Methods


We report next the materials and methods used for physically relating a given strain to its digital twin within the version control system. This is accomplished via writing and reading of a unique DNA barcode (Figure 2), which is demonstrated with different barcoding techniques in the model organisms Escherichia coli and Bacillus subtilis. When a barcoded strain is modified and a new milestone is reached, the strain can be rebarcoded: the new barcode replaces the previous one (Figure 2 Bottom). No information is encoded in the barcode itself, only the reference to the web profile. Updating a barcode only requires replacing one by another without the need for concatenation. Barcode design details are available in the Supporting Information.

Figure 2

Figure 2. Writing a barcode (Top): During strain engineering, a key milestone is reached, and the strain is barcoded with a unique DNA sequence generated by the CellRepo website. A commit to the repository is made connecting the unique barcode (now in the chromosome of the cell) with all the cell’s related documentation. Reading a barcode (Center): A DNA barcode is read from a sample strain enabling the lookup of all the strains’ documentation from the CellRepo repository. Updating a barcode (Bottom): Strain 1 was barcoded with Barcode-1. When the strain is modified to give Strain 2 a new barcode must be used. The strain needs to be updated with Barcode-2 replacing Barcode-1. The repository keeps track of the barcode’s “lineage”; thus, via sequencing Barcode-2 one can trace back the parental strain (and its own barcode).
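The write, read, and update operations described in Figure 2 can be sketched as a lookup table in which each barcode is an opaque key and the parent barcode is recorded on the repository side. This is a minimal in-memory illustration, not the CellRepo implementation or its API; the class names, method names, barcode strings, and file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    """A commit that has been written into a strain as a DNA barcode."""
    barcode: str                        # opaque identifier, carries no data
    strain: str
    documents: List[str] = field(default_factory=list)
    parent_barcode: Optional[str] = None


class BarcodeRegistry:
    """Hypothetical stand-in for a barcode-to-commit lookup service."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, Snapshot] = {}

    def write(self, barcode: str, strain: str, documents: List[str],
              parent_barcode: Optional[str] = None) -> Snapshot:
        """Register a new barcode; rebarcoding passes the replaced barcode as parent."""
        if barcode in self._snapshots:
            raise ValueError("barcode already assigned")
        if parent_barcode is not None and parent_barcode not in self._snapshots:
            raise KeyError("unknown parent barcode")
        snapshot = Snapshot(barcode, strain, list(documents), parent_barcode)
        self._snapshots[barcode] = snapshot
        return snapshot

    def read(self, barcode: str) -> Snapshot:
        """Look up the documentation linked to a sequenced barcode."""
        return self._snapshots[barcode]

    def lineage(self, barcode: str) -> List[str]:
        """Trace strain names from the given barcode back to the first ancestor."""
        names: List[str] = []
        current: Optional[str] = barcode
        while current is not None:
            snapshot = self._snapshots[current]
            names.append(snapshot.strain)
            current = snapshot.parent_barcode
        return names


# Made-up placeholder sequences, far shorter than real barcodes.
registry = BarcodeRegistry()
registry.write("ACGTTGCA", "Strain 1", ["design.gb"])
registry.write("TTGACCGA", "Strain 2", ["design_v2.gb"], parent_barcode="ACGTTGCA")
print(registry.lineage("TTGACCGA"))
```

Because the lineage lives in the registry, the cell only ever needs to carry the most recent barcode, in line with the replace-without-concatenation design described above.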

Materials

DNA was amplified using Q5 High-Fidelity DNA Polymerase (New England Biolabs, NEB). For cloning purposes DNA was purified using Monarch PCR & DNA Cleanup Kit (NEB) and assembled using NEBuilder HiFi DNA Assembly Cloning Kit (NEB). Plasmid preparations were carried out using QIAprep Spin Miniprep Kit (QIAGEN). Primers and Synthetic DNA sequences (barcode gBlocks) were synthesized by Integrated DNA Technologies. All barcodes were sequenced by Sanger sequencing via a universal sequencing primer.

Strains and Plasmids

E. coli DH5α cells were used for most plasmid preparations. Plasmids carrying an R6K-γ origin of replication were prepared and stored using DH5α/λ-pir cells. We used BW25113 as the barcode receiver E. coli strain and used its genomic DNA as template for homologous recombination to insert barcodes on the chromosome. Bacillus subtilis experiments were conducted in the 168CA strain for barcode insertion, and the toxin/antitoxin unit for barcode selection/counter-selection was amplified from B. subtilis ZPM6. (23) All the “vector” tagged plasmids were built containing a restriction site that allows easy barcode sequence cloning. See Table 1 for a full list and description of the plasmids used in this work. Plasmid maps can be found in Figure S3 in the Supporting Information.
Table 1. Plasmids Used in Our Barcoding Kits (a)

| species | task | plasmid name | antibiotic resistance | features | ref. |
|---|---|---|---|---|---|
| E. coli | λ-red | pKD46 | Ampicillin | λ-red genes under pBad promoter, temperature sensitive | (24) |
| E. coli | λ-red | pCP20 | Ampicillin | FLP recombinase gene, temperature sensitive | (24) |
| E. coli | λ-red | pEC-Vector | Ampicillin, Chloramphenicol | CmR gene between FRT sites, homologous arms, R6k-γ origin. *BamHI* | This study |
| E. coli | λ-red | pEC-BC | Ampicillin, Chloramphenicol | Barcoded pEC-vector | This study |
| E. coli | CRISPR | pREDCas9 | Spectinomycin | Constitutive Cas9 expression, λ-red genes under IPTG inducible promoter, sgRNA targeting pUC origin under pBAD promoter, temperature sensitive replicon | (25) |
| E. coli | CRISPR | pEC-CRISPR-vector | Ampicillin | pUC origin, gRNA and homologous arms targeting barcoding region. *SphI* | This study |
| E. coli | CRISPR | pEC-CRISPR-BC | Ampicillin | Barcoded pEC-CRISPR | This study |
| B. subtilis | Cre-Lox | pBS-CreLox-Vector | Spectinomycin, Zeocin | ZeoR gene between loxP sites and homologous arms. *SpeI* | This study |
| B. subtilis | Cre-Lox | pBS-CreLox-BC | Spectinomycin, Zeocin | Barcoded version of pBS-CreLox-Vector | This study |
| B. subtilis | Cre-Lox | pDR244 | Ampicillin | Cre recombinase expression, temperature sensitive | (26) |
| B. subtilis | CRISPR | pJOE8999.1 | Kanamycin | Cas9 under mannose inducible promoter, sgRNA under constitutive promoter, temperature sensitive | (27) |
| B. subtilis | CRISPR | pBS-CRISPR-Vector | Kanamycin | From pJOE8999.1. Homologous arms, sgRNA targeting barcoding location. *SpeI* | This study |
| B. subtilis | CRISPR | pBS-CRISPR-BC | Kanamycin | Barcoded pBS-CRISPR-Vector | This study |
| B. subtilis | GFP insertion | pGFP-rrnB | Ampicillin, Chloramphenicol | GFP gene under constitutive promoter and CmR gene in between homologous arms targeting amyE locus. Nonreplicative in B. subtilis. | (28) |

(a) Restriction sites used to clone the barcode are shown in italics. pREDCas9 was a gift from Tao Chen (Addgene plasmid # 71541).

Barcoding Site Selection

Barcodes irrevocably link a strain to its digital twin and should be inserted on the chromosome in a favorable genetic context. To maximize sequence stability and to minimize genetic interference with a strain’s phenotype, we chose to insert the barcodes far from important gene loci. On the chromosome, interacting regulatory units are often found in neighboring locations. Not all genes in the genome are essential, and therefore, for each species, we gathered and curated data about essential genes from the literature and ruled out locations in the genome that were within the operon of essential genes. (29,30) Genes involved in metabolism regulation, cell wall components and migrating elements were also avoided. Following these considerations, barcodes were inserted at specific loci that did not affect replication upon cell division and, at the same time, were placed in a favorable biological context. All in all, the barcodes inserted at these loci (Figure 3) are unlikely to interact with proximal elements or to interfere with strain physiology.

Figure 3

Figure 3. Graphic representation of the barcoding sites for both species. Orange lines show homologous regions used for barcoding purposes. Coordinates indicate start and end of nearby genes. Green blocks correspond to universal primer binding sites (25 bp). Yellow and red blocks represent synchronization (9 bp) and checksum sequences (18 bp) respectively (Supporting Information). For E. coli and B. subtilis, NCBI genome Accession number CP009273.1 and AL009126.3 were used, respectively.
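The exclusion step in the site selection can be pictured as an interval filter: candidate positions are kept only if they fall outside every region to be avoided, optionally with a safety margin. The sketch below is illustrative only; the coordinates, the margin, and the function names are hypothetical and do not correspond to the actual loci or curation procedure used here.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) chromosome coordinates, inclusive


def is_allowed(position: int, excluded: List[Interval], margin: int = 0) -> bool:
    """True if the position lies outside every excluded interval widened by margin."""
    return all(not (start - margin <= position <= end + margin)
               for start, end in excluded)


def candidate_sites(positions: List[int], excluded: List[Interval],
                    margin: int = 0) -> List[int]:
    """Keep only the positions that avoid all excluded regions."""
    return [p for p in positions if is_allowed(p, excluded, margin)]


# Hypothetical excluded regions (e.g., operons of essential genes).
excluded_regions = [(1000, 2500), (4000, 5200)]
positions = [500, 2600, 3200, 5300]
print(candidate_sites(positions, excluded_regions, margin=200))
```

With a 200 bp margin, positions 2600 and 5300 are rejected for being too close to an excluded region, leaving 500 and 3200 as candidates.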

E. coli Barcoding

For detailed barcoding protocols, please see the Supporting Information.

λ-Red Mediated Recombination

We assembled pEC-vector containing the R6k-γ origin of replication and a selection cassette (cat gene, conferring chloramphenicol resistance) flanked by homology arms for genomic insertion (Figure 3) and FRT (Flippase Recognition Target) sequences for antibiotic cassette removal. We subsequently inserted barcode sequences into this parental plasmid, yielding pEC-BC, and adapted the protocol described in ref (24) for chromosomal insertion. Cells were first transformed with the helper plasmid pKD46 for expression of the λ-red enzymes that allow barcode insertion via double crossover. Transformant colonies were selected in the presence of carbenicillin (100 μg/mL) at 30 °C. A carbenicillin-resistant colony was inoculated and induced with 30 mM l-arabinose. When the culture reached OD = 0.5, cells were made electrocompetent.
We amplified the barcode recombinant assembly from pEC-BC by PCR. Electrocompetent cells were mixed with the barcoding cassette, electroporated (25 μF, 200 Ω, 1.8 kV, Gene Pulser, BioRad), and plated on LB plates supplemented with chloramphenicol (25 μg/mL) at 37 °C. Barcoding was checked by colony PCR. If selection cassette removal was required, barcoded colonies were transformed with the pCP20 helper plasmid and incubated at 30 °C on LB/carbenicillin plates. Carbenicillin-resistant colonies were grown at 37 °C on LB plates to promote the loss of pCP20. Carbenicillin- and chloramphenicol-sensitive clones were verified by colony PCR, and positive clones were stored as barcoded strains.

CRISPR

E. coli cells were also barcoded by CRISPR using a two-plasmid system. (25,31) The plasmid pEC-CRISPR-vector was constructed by cloning the homologous arms and the 20N-gRNA scaffold under the control of a constitutive promoter. The barcode sequence was added afterward in between both homologous arms by HiFi DNA Assembly leading to the pEC-CRISPR-BC plasmid. pREDCas9 was transformed into E. coli BW25113 cells and selected in LB/Spectinomycin (50 μg/mL) at 30 °C. λ-Red recombinase was induced with IPTG 2 mM until OD = 0.5. pEC-CRISPR-BC was transformed and selected in LB/Spectinomycin/Carbenicillin plates. Transformant cells were checked by colony PCR. Positive clones were cured from pEC-CRISPR-BC by l-arabinose induction (30 mM). pREDCas9 was cured afterward by growing cells at 37 °C. Spectinomycin and carbenicillin sensitive clones were stored.

B. subtilis Barcoding

For detailed barcoding protocols, please see the Supporting Information.

Toxin/Antitoxin

We assembled a recombinant cassette containing homology arms, the mazF-ZeoR unit for selection/counter-selection and barcode sequence by SOE-PCR and adapted the protocol described in ref (23) to promote chromosomal insertion. The PCR product was transformed into B. subtilis cells. Cells were plated in NA (Nutrient agar)/zeocin(20 μg/mL) plates. Colonies were restreaked on NA/zeocin plates and tested for the integration of the recombinant DNA by PCR. For mazF-ZeoR cassette removal, positive clones were grown with 1% xylose to induce the toxin gene. Cells were plated in xylose supplemented media. Individual colonies were restreaked on NA and NA/zeocin plates; colonies which tested positive by PCR and sequencing were stored as barcoded strains.

Cre-Lox

We adapted the Cre-Lox protocol from ref (26) and constructed a plasmid with a zeocin resistance gene flanked by loxP sites and regions homologous to the barcode insertion site. Barcodes were cloned into this parental plasmid, and the recombinant cassette was PCR amplified and used to transform B. subtilis 168CA. Zeocin-resistant colonies were transformed with pDR244 at 30 °C, and spectinomycin (100 μg/mL) resistant colonies were verified for zeocin cassette removal by colony PCR. Positive colonies were cured of pDR244 by incubation at 37 °C. Spectinomycin- and zeocin-sensitive colonies were stored as barcoded strains.

CRISPR

To barcode B. subtilis cells using CRISPR we used a single-plasmid approach. (27) Homologous arms and sgRNA target sequence were cloned in pJOE backbone. The barcode DNA sequence was cloned in afterward. 168CA cells were transformed with this plasmid and selected in LB/Kanamycin (5 μg/mL) supplemented with 0.2% mannose for Cas9 induction at 30 °C. Transformants were checked by colony PCR. pBS-CRISPR-BC was cured by growing the cells at 37 °C. Kanamycin sensitive cells were stored as barcoded strains.
Refer to Supplementary Figure S3 for detailed plasmid maps for each method.

Barcode Sequence Retrieval

To easily check the integrity of the barcode sequences, all barcode identifiers were tagged with a Universal Primer sequence (5′-TGGACATACATAGTATACTCTGGTG-3′). This primer is used during Sanger sequencing to retrieve any barcode sequence. Also, it can be used to check the success of the barcoding experiment (through colony-PCR) together with appropriate species-specific reverse primers.
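For in silico checks, the universal primer site can be located in a construct or assembled sequence by an exact search on both strands. The sketch below uses the primer sequence given above; the function names and the short flanking sequence in the example are hypothetical, and exact matching is a simplification (real primer binding tolerates some mismatches).

```python
from typing import Optional, Tuple

UNIVERSAL_PRIMER = "TGGACATACATAGTATACTCTGGTG"  # 25 nt, from the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_primer_site(template: str,
                     primer: str = UNIVERSAL_PRIMER) -> Optional[Tuple[str, int]]:
    """Return (strand, index) of the first exact primer match, or None.

    strand is "+" if the primer occurs in the template as given and "-" if it
    occurs in the reverse complement; index is 0-based on that strand.
    """
    template = template.upper()
    index = template.find(primer)
    if index != -1:
        return "+", index
    index = reverse_complement(template).find(primer)
    if index != -1:
        return "-", index
    return None


# Made-up flanks around the primer site, for illustration only.
template = "AAAC" + UNIVERSAL_PRIMER + "GATTACAGATTACA"
print(find_primer_site(template))
print(find_primer_site(reverse_complement(template)))
```

Searching both strands means the check works regardless of the orientation in which a sequence file was saved.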

Barcode Stability Assay

Chemostat

Using a chemostat we followed bacteria over 200 generations (about 4 days of culture) and retrieved barcode information at regular time points (every 15–25 bacterial generations, i.e., passage through the entire cell cycle). For E. coli the same cultures were followed over 200 consecutive generations, while in B. subtilis, cells were followed for 100 generations, induced to stress and sporulation by ethanol treatment and regrown from spores for a further 100 generations. Barcode sequences were obtained by PCR product Sanger sequencing after barcode amplification from genomic DNA.
For this study, we performed CFU growth curves and serially diluted cultures over time to obtain ideal dilution rates at which a minimal number of cells could be used to inoculate a chemostat. Given this minimum number of cells and a specific growth rate, it was possible to estimate the time needed for cultures to reach exponential phase (i.e., when the chemostat continuous flow should be turned on). Throughout all chemostat experiments, we recorded optical density (OD600) measurements while sampling cells for barcode sequencing to ensure that the estimated dilution rates were accurate and that cultures could reach steady-state growth (Figure S4 in Supporting Information).
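The relation between dilution rate, doubling time, and number of generations can be sketched with the standard steady-state chemostat identity, in which the specific growth rate equals the dilution rate D. The calculation below is an idealization that treats the whole run as steady state; only the figures of 200 generations and about 4 days come from the text, and the function names are our own.

```python
import math


def doubling_time_h(dilution_rate_per_h: float) -> float:
    """Doubling time at steady state, where growth rate equals dilution rate."""
    return math.log(2) / dilution_rate_per_h


def generations(dilution_rate_per_h: float, hours: float) -> float:
    """Number of generations elapsed at a constant dilution rate."""
    return dilution_rate_per_h * hours / math.log(2)


def required_dilution_rate(target_generations: float, hours: float) -> float:
    """Dilution rate needed to reach a target number of generations in a given time."""
    return target_generations * math.log(2) / hours


# 200 generations in about 4 days (96 h), as reported in the text.
d = required_dilution_rate(200, 96)
print(f"D = {d:.2f} per hour, doubling time = {doubling_time_h(d) * 60:.1f} min")
```

Under this idealization, 200 generations in 96 h corresponds to a dilution rate of roughly 1.44 per hour, that is, a doubling time of just under half an hour.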

Large-Scale Growth Assay

With automated plate handling systems, we compared the evolution of 384 subcultures of barcoded and control cells over several stationary phase redilutions, in order to observe any changes represented in growth defects. Besides growth characterization, all barcoded samples were sequenced to uncover any potential mutations in DNA barcode sequences.
We used a liquid handling robot (Beckman Coulter Biomek FX) and an automated plate reader to handle 8 individual 96-well plates simultaneously. Before starting a growth experiment, plates were sealed by a gas-permeable membrane. In this assay, we followed bacteria over 10 subculture experiments. Initially, a single colony from each barcoded/control strain was picked from a fresh plate and grown in 25 mL LB supplemented with 0.4% (w/v) glucose overnight at 37 °C with regular shaking parameters (about 150 rpm). In the morning, saturated cultures were spun down, resuspended in fresh LB medium, diluted 100 times and 200 μL were loaded onto ThermoFisher clear 96-well microplates.
For all subculture experiments, two conditions were tested: an early stop of bacterial cultures after 6 h (in late exponential/start of stationary phase) and a prolonged culture in stationary phase (12 h) before snap freezing. For the first subculture, 100 μL were harvested after 6 h for the early sampling point, and the remaining bacterial culture was further incubated up to 12 h. All subsequent cultures were diluted 100 times from frozen stocks and cultivated in a 100 μL total volume for both early and late sampling points (Figure S5 and Figure S6 of the Supporting Information). By the end of the 10 subcultures, genomic DNA (gDNA) was extracted and screened for potential variations in barcode sequences.

Results


We illustrate all the concepts introduced by performing a simple genetic engineering experiment that includes barcoding a cell line and uploading it to the version control system. The B. subtilis 168CA wild-type strain was barcoded with Barcode 659 using the Cre-Lox method described above. The resulting strain was then transformed with pGFP-rrnB. (28) Chloramphenicol (5 μg/mL) resistant clones were checked by colony PCR and by measuring green fluorescence emission in a plate reader. This new strain was rebarcoded using Barcode 207. Two repositories describing the B. subtilis (https://cellrepo.ico2s.org/B.subtilis-GFP) and E. coli (https://cellrepo.ico2s.org/E.coli-Barcoding) experiments can be found on our web page.

Barcoding Process

All the methods used to barcode both species proved capable of barcoding the cells with high efficiency (Supplementary Table S1). After extracting the genomic DNA and amplifying the barcode by PCR, it was always possible to retrieve the barcode DNA sequence.

Barcode Stability Assay

After the chemostat experiment, we analyzed 128 sequencing reactions after 200 generations, including 24 controls to compare the evolution of barcoded vs wild-type strains. We confirmed that control wild-type sequences remained unchanged and found no variation in barcode sequences over 200 generations for either species. This is consistent with published mutation rate data for both species (32,33) (Supplementary Table S2).
In the large-scale growth assay, over 10 subcultures, we estimated from 100-fold dilutions of previous subcultures that final samples reached about 100 generations. In our assay, sequencing of 384 barcoded strains tested under normal vs stress conditions always revealed intact DNA barcode sequences. No major difference in growth rates was observed across the different samples (Supplementary Figures S5 and S6). Most sequencing reads left a 26–27 nucleotide gap downstream of the universal primer-binding site (due to Sanger sequencing limitations) and then showed a perfect match with the expected alignment. In less than 5% of cases, sequencing data were noisy, but a second complementary read always recovered the full barcode sequence. The screening of a large number of biological replicates demonstrated the robustness of the barcode insertion process as well as the barcode sequence stability over time.

Web Server

We implemented the software aspects of CellRepo by extending existing technology: the source code management system Kallithea (https://kallithea-scm.org). The web server hosting our implementation is available at https://cellrepo.ico2s.org and is free for academic use. We documented on the CellRepo server the key milestones of the process of cell engineering and barcoding described earlier (Figure 4), so that the reader can access additional information directly there. The CellRepo server allows new users (upon registration) to create their own cell engineering repositories. After that, the users can add/modify files and generate barcodes (Figure 5 (c)). There is no limit on the type of data that can be associated with a repository. Data might include plasmid designs, FASTA or GenBank files with genomic or plasmid data, SBOL files, outputs of characterization experiments (e.g., optical density readouts, growth curves, experimental protocols, etc.), references to papers or the papers themselves, etc. (Figure 5 (b)). All changes/commits are organized in chronological order. Every repository has a wiki-like front page that describes the essential details (e.g., genotype, phenotype, owner, etc.) (Figure 5 (a)). Furthermore, the system allows a repository to be forked, so that further work can be carried out on a cell line without interfering with the original repository (Figure 5).

Figure 4

Figure 4. Screenshot of the repository summary page. The list of changes shows all steps made in the process of engineering a Bacillus subtilis mutant strain with an inserted GFP gene and represents a digital footprint of the cell line. Two key engineering milestones (revisions 4 and 10) have been linked via a genetic barcode to the cell line.

Figure 5. Examples of information stored in the repository: (a) essential strain details, (b) list of added files that are part of the digital documentation for a cell line at a given point in time, and (c) list of revisions barcoded into the cell line, called “snapshots”, as they uniquely link to the digital documentation (i.e., the digital twin) at a specific point in the cell engineering cycle. In this example, the revisions with IDs r10:acc4afc32349 and r4:070c713ad448 have been barcoded into the cell line.

Discussion

In this paper we argue that as the speed, size, and complexity of synthetic biology, biotechnology, and genetic engineering approaches increase, new tools are required to handle the substantial scale up that is taking place. We propose a new purpose-built version control system, CellRepo, for strain engineering. CellRepo links biological cells to their “digital twins” thus allowing the tracking of all extant data and metadata related to a cell engineering process. This link is created by placing a barcode into the cell that can—at a later stage—be retrieved via a simple sequencing reaction. The retrieved barcode can then be used to identify the cell line and all the related details stored in CellRepo (who built the cell line, in which lab, when the modifications took place, what protocols were used, what was done to the cell and why, etc.).
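The lookup step described above, going from a sequencing read back to a repository revision, can be sketched as follows. This is a simplified illustration under stated assumptions, not the CellRepo lookup: the barcode, repository name, and function names are hypothetical, and matching is by exact substring on either strand, without any error-tolerant matching.

```python
from typing import Dict, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (other symbols such as N are kept)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_barcode(read: str,
                 index: Dict[str, Tuple[str, int]]) -> Optional[Tuple[str, int]]:
    """Return the (repository, revision) whose barcode occurs in the read.

    Both orientations of the read are checked, because a sequencing read
    may come from either strand. Returns None if no known barcode is found.
    """
    read = read.upper()
    for strand in (read, reverse_complement(read)):
        for barcode, record in index.items():
            if barcode in strand:
                return record
    return None


if __name__ == "__main__":
    # Hypothetical barcode mapped to a hypothetical repository and revision.
    index = {"ACGTTGCAAGGCTTAC": ("example-repo", 4)}
    read = "NNNNTTGA" + "ACGTTGCAAGGCTTAC" + "GGATCC"
    print(find_barcode(read, index))
    print(find_barcode(reverse_complement(read), index))
    print(find_barcode("TTTTTTTTTTTTTTTT", index))
```

A dictionary keyed by barcode keeps the example short; a production system would more plausibly tolerate sequencing errors rather than require an exact match.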
One possible limitation of this technology is inconsistency between the barcode (and its digitally stored information) and the strain itself: mutations may arise in the strain’s genome after barcoding, leading to a discrepancy between the genome sequence associated with the barcode and the genome of the mutated strain. Without the barcode and its associated digital footprint in the repository, however, this situation would not be detectable at all, because it would not be possible to know what the original strain genome was. With a barcode, even if the strain has changed, it is possible to link back to the intended strain, assess the speed of the “evolutionary clock”, and ascertain whether the changes are tolerable (which is, of course, application-specific).
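As a rough illustration of how such a discrepancy might be quantified, the sketch below counts substitutions between the sequence recorded in the repository and a resequenced sample, and normalizes by sequence length and elapsed generations. It is an assumption-laden toy: the sequences are taken to be already aligned and of equal length, indels are ignored, and the function names and example sequences are hypothetical.

```python
def substitutions(reference: str, observed: str) -> int:
    """Number of positions at which two pre-aligned sequences differ."""
    if len(reference) != len(observed):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(reference.upper(), observed.upper()))


def substitution_rate(reference: str, observed: str, generations: float) -> float:
    """Substitutions per site per generation since the recorded snapshot."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    return substitutions(reference, observed) / (len(reference) * generations)


if __name__ == "__main__":
    recorded = "ACGTACGT"     # toy sequence stored with the barcoded commit
    resequenced = "ACGTACGA"  # toy sequence of the current strain
    print(substitutions(recorded, resequenced))
    print(substitution_rate(recorded, resequenced, generations=100))
```

Whether a given rate is tolerable remains an application-specific judgment; the code only makes the comparison against the recorded reference explicit.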
In the examples in this paper, we considered a simple case: the barcodes were inserted at a neutral site in the genomes of two species for which a wide range of genetic tools and well-annotated genomes are available. Strategies similar to the ones used here may be followed for other well-annotated organisms. For less well-known organisms, the choice of barcoding loci is less clear and remains an open avenue for future research. As a practical guide, however, when clear “neutral” barcoding loci are lacking, the barcode can be inserted together with the genetic modifications being constructed. Provided that the position of the identifier in the genome is well documented within the digital information on the strain, our approach should still equip genetic engineers with the means to properly barcode their nonmodel organisms and link them to our version control system.
The barcoding protocols shown in this paper are not directly applicable to large-scale genomic rearrangements. (34) However, experiments of this kind use their own barcoding strategies, which can still be linked to our version control system if the positions of their barcodes within the massively rearranged genomes are provided as part of “commits” to CellRepo.
As the need to barcode new species and the number of synthetic biology chassis increase, fresh genetic technologies need to be developed for stably inserting such sequences into the genomes of target organisms across the tree of life. Methods such as CRISPR prime (35) look like promising enablers that could expand the reach of the version control system into nonmodel organisms and also make the barcoding of more standard chassis easier.
We envision that adopting barcoding as a routine for the standardized identification of genetically engineered organisms will ease the approval, security, and safety of the corresponding modified agents for industrial and environmental uses to a degree far superior to the current propositions for genetic firewalls, which do not provide a certainty of containment. Besides the computational effort, such regulation-oriented and safety-oriented barcoding will also demand genome editing of strains deficient in recombination, a feature typically desirable for the environmental safety of GMOs. To this end, a number of molecular tools, e.g., counterselectable TargeTrons, are currently being developed in our laboratories. (36) In the meantime, CellRepo is freely available for noncommercial use at https://cellrepo.ico2s.org.

Supporting Information

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acssynbio.9b00400.

  • Supplementary text explaining the barcode design; supplementary figures showing the plasmid maps used for barcoding and the results of the barcode stability assays; supplementary tables reporting the efficiency of the barcoding methods and a comparison of mutation rates with the literature (PDF)

Author Information

  • Corresponding Author
  • Authors
    • Jonathan Tellechea-Luzardo - Interdisciplinary Computing and Complex Biosystems (ICOS) Research Group, Newcastle University, Newcastle Upon Tyne NE4 5TG, U.K.; ORCID: http://orcid.org/0000-0003-2198-4558
    • Charles Winterhalter - Interdisciplinary Computing and Complex Biosystems (ICOS) Research Group, Newcastle University, Newcastle Upon Tyne NE4 5TG, U.K.
    • Paweł Widera - Interdisciplinary Computing and Complex Biosystems (ICOS) Research Group, Newcastle University, Newcastle Upon Tyne NE4 5TG, U.K.; ORCID: http://orcid.org/0000-0003-4955-3653
    • Jerzy Kozyra - Interdisciplinary Computing and Complex Biosystems (ICOS) Research Group, Newcastle University, Newcastle Upon Tyne NE4 5TG, U.K.
    • Víctor de Lorenzo - Systems and Synthetic Biology Program, Centro Nacional de Biotecnología (CNB-CSIC), 28049 Madrid, Spain; ORCID: http://orcid.org/0000-0002-6041-2731
  • Funding

    J.T.L., C.W., J.K., and N.K. were supported by the UK Engineering and Physical Sciences Research Council under the project “Synthetic Portabolomics: Leading the way at the crossroads of the Digital and the Bio Economies (EP/N031962/1)”. N.K. is funded by a Royal Academy of Engineering Chair in Emerging Technology award. V.d.L. was supported by the project “BioRoboost (H2020-NMBP-BIO-CSA-2018, grant agreement N820699)”.

  • Notes
    The authors declare no competing financial interest.

    Vector files for both species can be found in the B. subtilis (https://cellrepo.ico2s.org/B.subtilis-GFP) and E. coli (https://cellrepo.ico2s.org/E.coli-Barcoding) repositories.

References

This article references 36 other publications.

  1. Rochkind, M. J. (1975) The Source Code Control System. IEEE Trans. Softw. Eng. SE-1 (4), 364–370, DOI: 10.1109/TSE.1975.6312866
  2. Blischak, J. D., Davenport, E. R., and Wilson, G. (2016) A Quick Introduction to Version Control with Git and GitHub. PLoS Comput. Biol. 12 (1), 1–18, DOI: 10.1371/journal.pcbi.1004668
  3. Schmidt, M. and De Lorenzo, V. (2016) Synthetic Bugs on the Loose: Containment Options for Deeply Engineered (Micro)Organisms. Curr. Opin. Biotechnol. 38, 90–96, DOI: 10.1016/j.copbio.2016.01.006
  4. Broman, K. W., Keller, M. P., Teo Broman, A., Kendziorski, C., Yandell, B. S., Sen, S., and Attie, A. D. (2015) Identification and Correction of Sample Mix-Ups in Expression Genetic Data: A Case Study. G3: Genes, Genomes, Genet. 5 (10), 2177–2186, DOI: 10.1534/g3.115.019778
  5. (2009) Identity Crisis. Nature 457, 935–936, DOI: 10.1038/457935b
  6. American Type Culture Collection Standards Development Organization (2010) Workgroup ASN-0002. Cell Line Misidentification: The Beginning of the End. Nat. Rev. Cancer 10 (6), 441–448, DOI: 10.1038/nrc2852
  7. Masters, J. R. (2012) End the Scandal of False Cell Lines. Nature (London, U. K.) 492, 186, DOI: 10.1038/492186a
  8. Freedman, L. P., Cockburn, I. M., and Simcoe, T. S. (2015) The Economics of Reproducibility in Preclinical Research. PLoS Biol. 13, e1002165, DOI: 10.1371/journal.pbio.1002165
  9. De Oliveira Andrade, R. (2019) Brazil’s Science Faces Reproducibility Test. Nature (London, U. K.) 569, 318–319, DOI: 10.1038/d41586-019-01485-z
  10. Sadowski, M. I., Grant, C., and Fell, T. S. (2016) Harnessing QbD, Programming Languages, and Automation for Reproducible Biology. Trends Biotechnol. 34 (3), 214–227, DOI: 10.1016/j.tibtech.2015.11.006
  11. Shipman, S. L., Nivala, J., Macklis, J. D., and Church, G. M. (2017) CRISPR-Cas Encoding of a Digital Movie into the Genomes of a Population of Living Bacteria. Nature (London, U. K.) 547, 345–349, DOI: 10.1038/nature23017
  12. Mazurkiewicz, P., Tang, C. M., Boone, C., and Holden, D. W. (2006) Signature-Tagged Mutagenesis: Barcoding Mutants for Genome-Wide Screens. Nat. Rev. Genet. 7, 929–939, DOI: 10.1038/nrg1984
  13. Liu, H., Price, M. N., Waters, R. J., Ray, J., Carlson, H. K., Lamson, J. S., Chakraborty, R., Arkin, A. P., and Deutschbauer, A. M. (2018) Magic Pools: Parallel Assessment of Transposon Delivery Vectors in Bacteria. mSystems 3, e00143-17, DOI: 10.1128/mSystems.00143-17
  14. Yu, C., Mannan, A. M., Metta Yvone, G., Ross, K. N., Zhang, Y.-L., Marton, M. A., Taylor, B. R., Crenshaw, A., Gould, J. Z., and Tamayo, P. (2016) High-Throughput Identification of Genotype-Specific Cancer Vulnerabilities in Mixtures of Barcoded Tumor Cell Lines. Nat. Biotechnol. 34, 419, DOI: 10.1038/nbt.3460
  15. Bhang, H.-E. C., Ruddy, D. A., Krishnamurthy Radhakrishna, V., Caushi, J. X., Zhao, R., Hims, M. M., Singh, A. P., Kao, I., Rakiec, D., Shaw, P. (2015) Studying Clonal Dynamics in Response to Cancer Therapy Using High-Complexity Barcoding. Nat. Med. 21 (5), 440–448, DOI: 10.1038/nm.3841
  16. McKenna, A., Findlay, G. M., Gagnon, J. A., Horwitz, M. S., Schier, A. F., and Shendure, J. (2016) Whole-Organism Lineage Tracing by Combinatorial and Cumulative Genome Editing. Science (Washington, DC, U. S.) 353 (6298), aaf7907, DOI: 10.1126/science.aaf7907
  17. Plesa, C., Sidore, A. M., Lubock, N. B., Zhang, D., and Kosuri, S. (2018) Multiplexed Gene Synthesis in Emulsions for Exploring Protein Functional Landscapes. Science (Washington, DC, U. S.) 359, 343–347, DOI: 10.1126/science.aao5167
  18. Woodruff, L. B. A., Gorochowski, T. E., Roehner, N., Mikkelsen, T. S., Densmore, D., Gordon, D. B., Nicol, R., and Voigt, C. A. (2016) Registry in a Tube: Multiplexed Pools of Retrievable Parts for Genetic Design Space Exploration. Nucleic Acids Res. 45 (3), 1553–1565, DOI: 10.1093/nar/gkw1226
  19. Zimmermann, G. and Neri, D. (2016) DNA-Encoded Chemical Libraries: Foundations and Applications in Lead Discovery. Drug Discovery Today 21 (11), 1828–1834, DOI: 10.1016/j.drudis.2016.07.013
  20. Hawkins, J. A., Jones, S. K., Finkelstein, I. J., and Press, W. H. (2018) Indel-Correcting DNA Barcodes for High-Throughput Sequencing. Proc. Natl. Acad. Sci. U. S. A. 115 (27), E6217–E6226, DOI: 10.1073/pnas.1802640115
  21. de Lorenzo, V. and Schmidt, M. (2018) Biological Standards for the Knowledge-Based BioEconomy: What Is at Stake. New Biotechnol. 40, 170–180, DOI: 10.1016/j.nbt.2017.05.001
  22. Schmidt, M. and de Lorenzo, V. (2012) Synthetic Constructs in/for the Environment: Managing the Interplay between Natural and Engineered Biology. FEBS Lett. 586 (15), 2199–2206, DOI: 10.1016/j.febslet.2012.02.022
  23. Lin, Z., Deng, B., Jiao, Z., Wu, B., Xu, X., Yu, D., and Li, W. (2013) A Versatile Mini-MazF-Cassette for Marker-Free Targeted Genetic Modification in Bacillus Subtilis. J. Microbiol. Methods 95, 207–214, DOI: 10.1016/j.mimet.2013.07.020
  24. Datsenko, K. A., Wanner, B. L., and Beckwith, J. (2000) One-Step Inactivation of Chromosomal Genes in Escherichia Coli K-12 Using PCR Products. Proc. Natl. Acad. Sci. U. S. A. 97 (12), 6640–6645, DOI: 10.1073/pnas.120163297
  25. Li, Y., Lin, Z., Huang, C., Zhang, Y., Wang, Z., Tang, Y., Chen, T., and Zhao, X. (2015) Metabolic Engineering of Escherichia Coli Using CRISPR–Cas9 Meditated Genome Editing. Metab. Eng. 31, 13–21, DOI: 10.1016/j.ymben.2015.06.006
  26. Koo, B. M., Kritikos, G., Farelli, J. D., Todor, H., Tong, K., Kimsey, H., Wapinski, I., Galardini, M., Cabal, A., Peters, J. M. (2017) Construction and Analysis of Two Genome-Scale Deletion Libraries for Bacillus Subtilis. Cell Syst. 22, 291–305, DOI: 10.1016/j.cels.2016.12.013
  27. Altenbuchner, J. (2016) Editing of the Bacillus Subtilis Genome by the CRISPR-Cas9 System. Appl. Environ. Microbiol. 82 (17), 5421–5427, DOI: 10.1128/AEM.01453-16
  28. Veening, J.-W., Murray, H., and Errington, J. (2009) A Mechanism for Cell Cycle Regulation of Sporulation Initiation in Bacillus Subtilis. Genes Dev. 23, 1959–1970, DOI: 10.1101/gad.528209
  29. Kobayashi, K., Ehrlich, S. D., Albertini, A., Amati, G., Andersen, K. K., Arnaud, M., Asai, K., Ashikaga, S., Aymerich, S., Bessieres, P. (2003) Essential Bacillus Subtilis Genes. Proc. Natl. Acad. Sci. U. S. A. 100 (8), 4678–4683, DOI: 10.1073/pnas.0730515100
  30. Juhas, M., Reuß, D. R., Zhu, B., and Commichau, F. M. (2014) Bacillus Subtilis and Escherichia Coli Essential Genes and Minimal Cell Factories after One Decade of Genome Engineering. Microbiology (London, U. K.) 160, 2341–2351, DOI: 10.1099/mic.0.079376-0
  31. Jiang, Y., Chen, B., Duan, C., Sun, B., Yang, J., and Yang, S. (2015) Multigene Editing in the Escherichia Coli Genome via the CRISPR-Cas9 System. Appl. Environ. Microbiol. 81 (7), 2506–2514, DOI: 10.1128/AEM.04023-14
  32. Sung, W., Ackerman, M. S., Gout, J.-F., Miller, S. F., Williams, E., Foster, P. L., and Lynch, M. (2015) Asymmetric Context-Dependent Mutation Patterns Revealed through Mutation-Accumulation Experiments. Mol. Biol. Evol. 32 (7), 1672–1683, DOI: 10.1093/molbev/msv055
  33. Lee, H., Popodi, E., Tang, H., and Foster, P. L. (2012) Rate and Molecular Spectrum of Spontaneous Mutations in the Bacterium Escherichia Coli as Determined by Whole-Genome Sequencing. Proc. Natl. Acad. Sci. U. S. A. 109 (41), E2774–E2783, DOI: 10.1073/pnas.1210309109
  34. Wang, K., de la Torre, D., Robertson, W. E., and Chin, J. W. (2019) Programmed Chromosome Fission and Fusion Enable Precise Large-Scale Genome Rearrangement and Assembly. Science (Washington, DC, U. S.) 365, 922–926, DOI: 10.1126/science.aay0737
  35. Anzalone, A. V., Randolph, P. B., Davis, J. R., Sousa, A. A., Koblan, L. W., Levy, J. M., Chen, P. J., Wilson, C., Newby, G. A., Raguram, A. (2019) Search-and-Replace Genome Editing without Double-Strand Breaks or Donor DNA. Nature (London, U. K.) 576, 149–157, DOI: 10.1038/s41586-019-1711-4
  36. Velázquez, E., Lorenzo, V. de, and Al-Ramahi, Y. (2019) Recombination-Independent Genome Editing through CRISPR/Cas9-Enhanced TargeTron Delivery. ACS Synth. Biol. 8 (9), 2186–2193, DOI: 10.1021/acssynbio.9b00293

    Figure 4

    Figure 4. Screenshot of the repository summary page. The list of changes shows all steps made in the process of engineering a Bacillus subtilis mutant strain with inserted GFP gene and represents a digital footprint of the cell line. Two key engineering milestones (revision 4 and 10) have been linked via a genetic barcode to the cell line.

    Figure 5

    Figure 5. Examples of information stored in the repository: (a) essential strain details, (b) list of added files that are part of the digital documentation for a cell line at a given point in time, and (c) list of revisions barcoded into the cell line, called “snapshots”, as they uniquely link to the digital documentation (i.e., the digital twin) at a specific point in the cell engineering cycle. In this example revision with id r10:acc4afc32349 and r4:070c713ad448 have been barcoded back into the cell line.

  • References


    This article references 36 other publications.

    1. 1
      Rochkind, M. J. (1975) The Source Code Control System. IEEE Trans. Softw. Eng. SE 1 (4), 364370,  DOI: 10.1109/TSE.1975.6312866
    2. 2
      Blischak, J. D., Davenport, E. R., and Wilson, G. (2016) A Quick Introduction to Version Control with Git and GitHub. PLoS Comput. Biol. 12 (1), 118,  DOI: 10.1371/journal.pcbi.1004668
    3. 3
      Schmidt, M. and De Lorenzo, V. (2016) Synthetic Bugs on the Loose: Containment Options for Deeply Engineered (Micro)Organisms. Curr. Opin. Biotechnol. 38, 9096,  DOI: 10.1016/j.copbio.2016.01.006
    4. 4
      Broman, K. W., Keller, M. P., Teo Broman, A., Kendziorski, C., Yandell, B. S., Sen, S., and Attie, A. D. (2015) 53706, W. Identification and Correction of Sample Mix-Ups in Expression Genetic Data: A Case Study. G3: Genes, Genomes, Genet. 5 (10), 21772186,  DOI: 10.1534/g3.115.019778
    5. 5
      Identity Crisis. Nature 2009, 457, 935936,  DOI: 10.1038/457935b .
    6. 6
      American Type Culture Collection Standards Development Organization (2010) Workgroup ASN-0002. Cell Line Misidentification: The Beginning of the End. Nat. Rev. Cancer 10 (6), 441448,  DOI: 10.1038/nrc2852
    7. 7
      Masters, J. R. (2012) End the Scandal of False Cell Lines. Nature (London, U. K.) 492, 186,  DOI: 10.1038/492186a
    8. 8
      Freedman, L. P., Cockburn, I. M., and Simcoe, T. S. (2015) The Economics of Reproducibility in Preclinical Research. PLoS Biol. 13, e1002165,  DOI: 10.1371/journal.pbio.1002165
    9. 9
      De Oliveira Andrade, R. (2019) Brazil’s Science Faces Reproducibility Test. Nature (London, U. K.) 569, 318319,  DOI: 10.1038/d41586-019-01485-z
    10. 10
      Sadowski, M. I., Grant, C., and Fell, T. S. (2016) Harnessing QbD, Programming Languages, and Automation for Reproducible Biology. Trends Biotechnol. 34 (3), 214227,  DOI: 10.1016/j.tibtech.2015.11.006
    11. 11
      Shipman, S. L., Nivala, J., Macklis, J. D., and Church, G. M. (2017) CRISPR-Cas Encoding of a Digital Movie into the Genomes of a Population of Living Bacteria. Nature (London, U. K.) 547, 345349,  DOI: 10.1038/nature23017
    12. 12
      Mazurkiewicz, P., Tang, C. M., Boone, C., and Holden, D. W. (2006) Signature-Tagged Mutagenesis: Barcoding Mutants for Genome-Wide Screens. Nat. Rev. Genet. 7, 929939,  DOI: 10.1038/nrg1984
    13. 13
      Liu, H., Price, M. N., Waters, R. J., Ray, J., Carlson, H. K., Lamson, J. S., Chakraborty, R., Arkin, A. P., and Deutschbauer, A. M. (2018) Magic Pools: Parallel Assessment of Transposon Delivery Vectors in Bacteria. mSystems 3, e0014317,  DOI: 10.1128/mSystems.00143-17
    14. 14
      Yu, C., Mannan, A. M., Metta Yvone, G., Ross, K. N., Zhang, Y.-L., Marton, M. A., Taylor, B. R., Crenshaw, A., Gould, J. Z., and Tamayo, P. (2016) High-Throughput Identification of Genotype-Specific Cancer Vulnerabilities in Mixtures of Barcoded Tumor Cell Lines. Nat. Biotechnol. 34, 419,  DOI: 10.1038/nbt.3460
    15. 15
      Bhang, H.-E. C., Ruddy, D. A., Krishnamurthy Radhakrishna, V., Caushi, J. X., Zhao, R., Hims, M. M., Singh, A. P., Kao, I., Rakiec, D., Shaw, P. (2015) Studying Clonal Dynamics in Response to Cancer Therapy Using High-Complexity Barcoding. Nat. Med. 21 (5), 440448,  DOI: 10.1038/nm.3841
    16. 16
      McKenna, A., Findlay, G. M., Gagnon, J. A., Horwitz, M. S., Schier, A. F., and Shendure, J. (2016) Whole-Organism Lineage Tracing by Combinatorial and Cumulative Genome Editing. Science (Washington, DC, U. S.) 353 (6298), aaf7907,  DOI: 10.1126/science.aaf7907
    17. 17
      Plesa, C., Sidore, A. M., Lubock, N. B., Zhang, D., and Kosuri, S. (2018) Multiplexed Gene Synthesis in Emulsions for Exploring Protein Functional Landscapes. Science (Washington, DC, U. S.) 359, 343347,  DOI: 10.1126/science.aao5167
    18. 18
      Woodruff, L. B. A., Gorochowski, T. E., Roehner, N., Mikkelsen, T. S., Densmore, D., Gordon, D. B., Nicol, R., and Voigt, C. A. (2016) Registry in a Tube: Multiplexed Pools of Retrievable Parts for Genetic Design Space Exploration. Nucleic Acids Res. 45 (3), 15531565,  DOI: 10.1093/nar/gkw1226
    19. 19
      Zimmermann, G. and Neri, D. (2016) DNA-Encoded Chemical Libraries: Foundations and Applications in Lead Discovery. Drug Discovery Today 21 (11), 18281834,  DOI: 10.1016/j.drudis.2016.07.013
    20. 20
      Hawkins, J. A., Jones, S. K., Finkelstein, I. J., and Press, W. H. (2018) Indel-Correcting DNA Barcodes for High-Throughput Sequencing. Proc. Natl. Acad. Sci. U. S. A. 115 (27), E6217E6226,  DOI: 10.1073/pnas.1802640115
    21. 21
      de Lorenzo, V. and Schmidt, M. (2018) Biological Standards for the Knowledge-Based BioEconomy: What Is at Stake. New Biotechnol. 40, 170180,  DOI: 10.1016/j.nbt.2017.05.001
    22. 22
    Supporting Information


    The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acssynbio.9b00400.

    • Supplementary text explaining the barcode design; supplementary figures showing the plasmid maps used for barcoding and the results of the barcode stability assays; supplementary tables showing the efficiency of the barcoding methods and a comparison of mutation rates with published values (PDF)

