A General Sequence Processing and Analysis Program for Protein Engineering
Abstract

Protein engineering projects often amass numerous raw DNA sequences, but no readily available software combines the sequence processing and activity correlation required for efficient lead identification. XLibraryDisplay is an open source program integrated into Microsoft Excel for Windows that automates batch sequence processing via a simple step-by-step, menu-driven graphical user interface. XLibraryDisplay accepts any DNA template, which is used as the basis for trimming, filtering, translating, and aligning hundreds to thousands of sequences (raw, FASTA, or Phred PHD file formats). Key steps from library characterization through lead discovery are available, including library composition analysis, filtering by experimental data, graphing and correlating to experimental data, alignment to structural data extracted from PDB files, and generation of PyMOL visualization scripts. Though larger data sets can be handled, the program is best suited for analyzing approximately 10 000 or fewer leads or naïve clones that have been characterized using Sanger sequencing and other experimental approaches. XLibraryDisplay can be downloaded for free from sourceforge.net/projects/xlibrarydisplay/.
Introduction
Figure 1. Overview of the XLibraryDisplay user interface. All basic analysis routines are executed by clicking through the buttons on the vertical main menu from top to bottom. The processed data from each step is organized in a series of worksheets: Template, RawData, TrimmedDNA, BadDNA, GoodDNA, Translated, Aligned, Summary, and Activity. The aligned protein sequences are shown for a sample data set in which CDR H3 of trastuzumab has been randomized with 8 NNK codons, which contain equal mixes of all nucleotides at the first two positions (N) and G or T at the third position (K). NNKs allow coding of all 20 amino acids while lowering the odds of finding stop codons in individual library members compared to NNN. The template is always visible as a reference (top row) as are the sequence names (left column) in frozen panes. Sequence names are automatically highlighted in different colors if they have stop codons (red), frameshifts (blue), deletions (gray), insertions (dark gray), or undetermined amino acids (yellow). The library positions, which were automatically detected by the program, are highlighted in magenta in the template sequence. Unique library residues within the alignment are highlighted in alternating shades of magenta and purple by default after sorting. Other amino acids in the alignment are automatically highlighted if they are mutations (orange), silent mutations (peach), stop codons (red), gaps (gray), or unknown amino acids (Xs, yellow). Right-clicking on the alignment opens an interactive menu that allows the user to perform different actions on selected sequences or columns. As an example, two local DNA/AA alignment windows are shown. The Developer tab in Excel has been enabled which allows the user to modify code using Visual Basic.
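The NNK versus NNN comparison in the caption can be checked by enumerating the codons directly. The following is a minimal Python sketch, not part of XLibraryDisplay (which is written in VBA); the function names `expand` and `summarize` are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Degenerate base codes used in the caption: N = any base, K = G or T.
DEGENERATE = {"N": "ACGT", "K": "GT"}


def expand(scheme):
    """Return every concrete codon encoded by a degenerate scheme such as 'NNK'."""
    return ["".join(c) for c in product(*(DEGENERATE[s] for s in scheme))]


def summarize(scheme):
    """Return (number of codons, number of stop codons, number of distinct amino acids)."""
    amino_acids = [CODON_TABLE[codon] for codon in expand(scheme)]
    stops = amino_acids.count("*")
    distinct = len(set(amino_acids) - {"*"})
    return len(amino_acids), stops, distinct


for scheme in ("NNK", "NNN"):
    codons, stops, distinct = summarize(scheme)
    print(f"{scheme}: {codons} codons, {stops} stop(s), {distinct} amino acids")
```

Under these assumptions NNK yields 32 codons with a single stop codon (TAG), while NNN yields 64 codons with three, and both cover all 20 amino acids.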
Implementation
Overall Program Architecture
DNA Sequence Trimming
Simple Alignment Algorithm
Figure 2. Simple alignment algorithm. (A) A library is shown after each step to illustrate the alignment algorithm. For simplicity, the example library only shows eight sequences from a simulated library of the trastuzumab HC in which residues KDTY of CDR H1 have been randomized with 4 NNK codons. First, all the sequences which have been previously trimmed to the template are translated and aligned from N to C terminus without adding any gaps. Second, DNA sequences 3 and 4 which are shorter than the template by a multiple of 3 are assumed to have deletions. Gaps (red circles) are inserted into these sequences to align them to the template. Third, DNA sequences 5 and 6 which are longer than the template by a multiple of 3 are assumed to have insertions. Gaps are inserted into all other sequences and the template to align these insertions (red circles). Fourth, gaps (red circle) must be corrected for sequences which also contain insertions. Lastly, DNA sequences 7 and 8 which differ in size from the DNA template by a nonmultiple of 3 are assumed to have frameshifts. No gaps are inserted into these sequences. All sequences are colored as in Figure 1. The library residues have not yet been identified, nor have the sequences been sorted so they are not colored in magenta and purple. The randomized library positions are clearly identifiable under the KDTY template residues since they are mostly mutated. (B) The simple alignment algorithm inserts gaps into sequences by systematically testing gaps and scoring up to 10 residues surrounding each gap for the best match to the template. Gaps of 1 amino acid (i.e., 1 codon or 3 nucleotides) are tested initially scanning from N- to C-terminus during the first pass. The test gap size is increased until the template and sequence are the same length. If a gap score of 1 is not found, then the gap with the highest score will be used. The gap insertion process is iterated until the sequence and template are the same length. The top example represents an intermediate gap test that did not score as well as the gap chosen in the bottom alignment. The same method is employed for inserting gaps into the template for sequences that have insertions.
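The two ideas in this caption, classifying a clone by its length difference from the template and placing gaps by scoring the residues around each candidate gap, can be sketched in Python as follows. This is an illustrative reading of the caption, not the program's VBA code. The function names are hypothetical, the score is assumed to be the fraction of identical residues, and "up to 10 residues surrounding each gap" is interpreted as up to 5 residues on each side.

```python
def classify_by_length(seq_dna, template_dna):
    """Label a trimmed DNA sequence by its length difference from the template."""
    diff = len(seq_dna) - len(template_dna)
    if diff == 0:
        return "same length"
    if diff % 3 != 0:
        return "frameshift"  # not a whole number of codons
    return "insertion" if diff > 0 else "deletion"


def window_score(candidate, template, start, gap_len, flank=5):
    """Fraction of residues matching the template within `flank` positions of the gap."""
    left = range(max(0, start - flank), start)
    right_end = min(len(candidate), len(template), start + gap_len + flank)
    right = range(start + gap_len, right_end)
    positions = list(left) + list(right)
    if not positions:
        return 0.0
    matches = sum(candidate[i] == template[i] for i in positions)
    return matches / len(positions)


def insert_gaps(seq, template, flank=5):
    """Insert '-' into a protein sequence with deletions until it matches the template length."""
    while len(seq) < len(template):
        missing = len(template) - len(seq)
        best = None  # (score, start, gap_len) of the best candidate seen so far
        for gap_len in range(1, missing + 1):  # test small gaps first
            for start in range(len(seq) + 1):  # scan from N- to C-terminus
                candidate = seq[:start] + "-" * gap_len + seq[start:]
                score = window_score(candidate, template, start, gap_len, flank)
                if best is None or score > best[0]:
                    best = (score, start, gap_len)
                if score == 1.0:
                    break  # perfect local match, stop scanning
            if best[0] == 1.0:
                break
        _, start, gap_len = best
        seq = seq[:start] + "-" * gap_len + seq[start:]
    return seq


print(classify_by_length("ATGAAA", "ATGAAAGGG"))            # deletion
print(insert_gaps("ABCDEFIJKLMNOP", "ABCDEFGHIJKLMNOP"))    # ABCDEF--IJKLMNOP
```

In the toy example no single-residue gap scores 1, so the gap size is increased to 2, where a perfect local match is found. Per the caption, sequences classified as frameshifts would not be passed to the gap insertion step.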
Needleman–Wunsch Alignment
Results
Enzyme Library Example
| position | 32 | 65 | 108 | 109 | 158 | 159 |
|---|---|---|---|---|---|---|
| wt | Y | L | F | Q | D | I |
| mutations | A | A | F | Q | A | |
| | V | I | W | I | A | G |
| | L | L | Y | L | G | S |
| | T | V | M | V | | |
Batch Sequence Trimming, Filtering, and Alignment
Interactive Sequence Analysis and Curation
Overall Library Composition Analysis
Figure 3. Automated library amino acid composition analysis. (A) A stacked-column graph generated by XLibraryDisplay shows the percent amino acid composition at each randomized position of the example MjTyrRS library. (B) A colored chart generated by the program shows the total numbers of amino acids found at each library position. (C) A WebLogo plot can be generated by loading an exported FASTA file.
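The per-position composition shown in panels A and B amounts to counting residues column by column in the alignment. Here is a minimal Python sketch of that calculation, assuming equal-length aligned sequences and 0-based column indices. The function name `column_composition` and the toy data are hypothetical and are not taken from the MjTyrRS library.

```python
from collections import Counter


def column_composition(aligned, positions):
    """Return {position: {residue: percent}} for the chosen alignment columns."""
    result = {}
    for pos in positions:
        counts = Counter(seq[pos] for seq in aligned)
        total = sum(counts.values())
        result[pos] = {res: 100.0 * n / total for res, n in counts.items()}
    return result


# Toy alignment with two varied columns (indices 1 and 2).
aligned = ["ACD", "AGD", "ACE", "ATD"]
for pos, percents in column_composition(aligned, [1, 2]).items():
    print(pos, percents)
```

The same function applies unchanged to aligned DNA sequences, which gives the nucleotide composition per position.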
Figure 4. Automated library nucleotide composition analysis. (A) A stacked-column graph shows the percent nucleotide composition at each randomized position of the example MjTyrRS library. (B) A colored chart generated by the program shows the total numbers of each base at each library position.
Automated Library Summary Report
Figure 5. Library summary analysis. A portion of a standard summary report for the MjTyrRS library is shown. For brevity, only 15 sequences are shown, but the actual report shows the library sequences for all unique clones.
Phred Sequence QC Analysis
Figure 6. Phred QC analysis. (A) A typical QC report is shown for the MjTyrRS library when sequences are loaded from Phred PHD files which contain a QC score for every base. Scores for each base are used to shade separate nucleotide boxes from light blue (high score, more accurate) to dark blue (low score, less accurate). (B) A plot of the QC score at each position of a good, single sequence is shown (A01). (C) A plot of the QC score at each position of a mixture of sequences is shown (D03). (D) A comparison of the sequence chromatograms for the good sequence shown in panel B, to the mixed sequence in panel C. A red box is drawn in each panel to indicate the site of the mixed bases near position 370. The coloring in panel A enables detection of potentially mixed clones by visual inspection.
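Phred scores map to base-call error probabilities through the standard relation P = 10^(-Q/10), which is why low-scoring positions mark likely mixed or unreliable bases. The sketch below shows that conversion and a simple low-quality filter in Python. The threshold of 20 and the function names are illustrative assumptions, not settings taken from XLibraryDisplay.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the estimated base-call error probability."""
    return 10 ** (-q / 10)


def low_quality_positions(scores, threshold=20):
    """Return 1-based positions whose Phred score falls below the threshold."""
    return [i for i, q in enumerate(scores, start=1) if q < threshold]


scores = [40, 38, 12, 35, 9]  # hypothetical per-base scores
print([round(phred_to_error_prob(q), 4) for q in scores])
print(low_quality_positions(scores))  # positions 3 and 5 fall below 20
```

A cluster of flagged positions in an otherwise good read, as at the site near position 370 in the caption, would suggest a mixed clone worth inspecting in the chromatogram.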
Activity and Structure Analysis
Figure 7. Activity and structure analysis. (A) An example is shown for the sequence–activity correlation for an antibody light chain library selection against VEGF. In this example, only the library sequences are shown, but the entire sequence can also be automatically correlated to experimental data. (B) An example is shown for sequence–structure correlation in which the light chain antibody sequence and structure from PDB code 1N8Z was aligned to the selected sequences shown in panel A. Residues are colored according to secondary structure (red = sheets, blue = helices, purple = loops). The inset shows an image from PyMOL created from an automatically generated script which highlights mutations like T72M.
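A PyMOL script that highlights mutations such as T72M can be generated as plain text from a list of mutation labels. The Python sketch below illustrates the idea. It is not the script format produced by XLibraryDisplay. The function name, the chain identifier "A", the selection names, and the color choice are all assumptions, and only the basic PyMOL commands `fetch`, `hide`, `show`, `select`, and `color` are used.

```python
import re

# Mutation labels of the form <wild type><residue number><new residue>, e.g. T72M.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def pymol_script(pdb_id, chain, mutations):
    """Build PyMOL commands that show a chain as cartoon and mutated residues as sticks."""
    lines = [
        f"fetch {pdb_id}",
        "hide everything",
        f"show cartoon, chain {chain}",
    ]
    for label in mutations:
        match = MUTATION.match(label)
        if match is None:
            raise ValueError(f"unrecognized mutation label: {label}")
        _wild_type, resi, _new = match.groups()
        name = f"mut_{label}"
        lines.append(f"select {name}, chain {chain} and resi {resi}")
        lines.append(f"show sticks, {name}")
        lines.append(f"color orange, {name}")
    return "\n".join(lines)


# Chain "A" is assumed here for illustration; check the chain IDs in the PDB file.
print(pymol_script("1N8Z", "A", ["T72M"]))
```

The resulting text can be saved with a `.pml` extension and run from within PyMOL.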
Discussion
Figure 8. Comparison of alignment algorithms. For libraries with constant loop lengths, the simple alignment algorithm works better than the NW algorithm. As implemented in the program, the NW algorithm has a tendency to insert gaps into randomized positions since it uses a constant gap penalty. For libraries with variable loop lengths, the NW algorithm performs better since it correctly inserts gaps into the randomized positions.
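For reference, a global Needleman–Wunsch alignment with a constant per-residue gap penalty can be sketched in Python as below. The scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions and are not the program's parameters. With a constant penalty the cost of a gap does not depend on where it is placed.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


template, clone = "ACDEFG", "ACEFG"
print(needleman_wunsch(template, clone))  # ('ACDEFG', 'AC-EFG', 4)
```

In a randomized region, where most residues mismatch the template anyway, a gap costs no more than a mismatch under this scheme, which is consistent with the caption's observation that gaps tend to land in randomized positions.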
Acknowledgment
The authors thank Marissa Matsumoto, Heather Stephenson, Chris Thanos, Deepti Rokkam, David Chemla-Vogel, Amandeep Gakhal, Alice Yam, Justin Diachun, and Hara Dilley for testing the software and providing suggestions for the program.
Abbreviations
NW | Needleman–Wunsch |
VEGF | vascular endothelial growth factor |
GUI | graphical user interface |
PDB | Protein Data Bank |
MjTyrRS | Methanococcus jannaschii tyrosyl tRNA synthetase |
pAMF | para-azidomethyl-L-phenylalanine |
CDRs | complementarity determining regions |
FW | framework |
ELISA | enzyme-linked immunosorbent assay |
PCR | polymerase chain reaction |
VBA | Visual Basic for Applications |
AA | amino acid |
wt | wild type |
QC | quality control |
DARPin | designed ankyrin repeat protein |
BLOSUM | blocks substitution matrix |
- 23Stafford, R. L.; Matsumoto, M. L.; Yin, G.; Cai, Q.; Fung, J. J.; Stephenson, H.; Gill, A.; You, M.; Lin, S.-H.; Wang, W. D.; Masikat, M. R.; Li, X.; Penta, K.; Steiner, A. R.; Baliga, R.; Murray, C. J.; Thanos, C. D.; Hallam, T. J.; Sato, A. K. In Vitro Fab Display: A Cell-Free System for IgG Discovery Protein Eng. Des. Sel. 2014, 27, 97– 109Google Scholar23https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2cXltFKhsrw%253D&md5=3555fdfa8a1c297ee6eef5bbb7cd8736In vitro Fab display: a cell-free system for IgG discoveryStafford, Ryan L.; Matsumoto, Marissa L.; Yin, Gang; Cai, Qi; Fung, Juan Jose; Stephenson, Heather; Gill, Avinash; You, Monica; Lin, Shwu-Hwa; Wang, Willie D.; Masikat, Mary Rose; Li, Xiaofan; Penta, Kalyani; Steiner, Alex R.; Baliga, Ramesh; Murray, Christopher J.; Thanos, Christopher D.; Hallam, Trevor J.; Sato, Aaron K.Protein Engineering, Design & Selection (2014), 27 (4), 97-109CODEN: PEDSBR; ISSN:1741-0126. (Oxford University Press)Selection technologies such as ribosome display enable the rapid discovery of novel antibody fragments entirely in vitro. It has been assumed that the open nature of the cell-free reactions used in these technologies limits selections to single-chain protein fragments. We present a simple approach for the selection of multi-chain proteins, such as antibody Fab fragments, using ribosome display. Specifically, we show that a two-chain trastuzumab (Herceptin) Fab domain can be displayed in a format which tethers either the heavy or light chain to the ribosome while retaining functional antigen binding. Then, we constructed synthetic Fab HC and LC libraries and performed test selections against carcinoembryonic antigen (CEA) and vascular endothelial growth factor (VEGF). The Fab selection output was reformatted into full-length Ig Gs (IgGs) and directly expressed at high levels in an optimized cell-free system for immediate screening, purifn. and characterization. 
Several novel IgGs were identified using this cell-free platform that bind to purified CEA, CEA pos. cells and VEGF.
- 24Needleman, S. B.; Wunsch, C. D. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins J. Mol. Biol. 1970, 48, 443– 453Google Scholar24https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaE3cXktVShu74%253D&md5=703fa9a6d50ac3b7c9b45c37aee094d0General method applicable to the search for similarities in the amino acid sequence of two proteinsNeedleman, Saul B.; Wunsch, Christian D.Journal of Molecular Biology (1970), 48 (3), 443-53CODEN: JMOBAK; ISSN:0022-2836.A computer adaptable method for finding similarities in the amino acid sequences of two proteins has been developed, making it possible to det. whether significant homology exists between the proteins. This information is used to trace their possible evolutionary development. The max. match is a no. dependent upon the similarity of the sequences. One of its definitions is the largest no. of amino acids of one protein that can be matched with those of a second protein allowing for all possible interruptions in either of the sequences. While the interruptions give rise to a very large no. of comparisons, the method efficiently excludes from consideration those comparisons that cannot contribute to the max. match. Comparisons are made from the smallest unit of significance, a pair of amino acids, one from each protein.
- 25Henikoff, S.; Henikoff, J. G. Amino Acid Substitution Matrices from Protein Blocks Proc. Natl. Acad. Sci. U. S. A. 1992, 89, 10915– 10919Google Scholar25https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaK3sXjsFCgsQ%253D%253D&md5=3c4bee915654ac0b98b8d4aaafd44ebcAmino acid substitution matrixes from protein blocksHenikoff, Steven; Henikoff, Jorja G.Proceedings of the National Academy of Sciences of the United States of America (1992), 89 (22), 10915-19CODEN: PNASA6; ISSN:0027-8424.Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The most widely used matrixes are based on the Dayhoff model of evolutionary rates. Using a different approach, the authors derived substitution matrixes from about 2000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins. This led to marked improvements in alignments and in searches using queries from each of the groups.
- 26Heckman, K. L.; Pease, L. R. Gene Splicing and Mutagenesis by PCR-Driven Overlap Extension Nat. Protoc. 2007, 2, 924– 932Google ScholarThere is no corresponding record for this reference.
- 27Sievers, F.; Higgins, D. G. Clustal Omega, Accurate Alignment of Very Large Numbers of Sequences Methods Mol. Biol. 2014, 1079, 105– 116Google Scholar27https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2MXntFOhsLw%253D&md5=4287e7d9b9ab241655fdee497980031fClustal Omega, Accurate Alignment of Very Large Numbers of SequencesSievers, Fabian; Higgins, Desmond G.Methods in Molecular Biology (New York, NY, United States) (2014), 1079 (Multiple Sequence Alignment Methods), 105-116CODEN: MMBIED; ISSN:1940-6029. (Springer)Clustal Omega is a completely rewritten and revised version of the widely used Clustal series of programs for multiple sequence alignment. It can deal with very large nos. (many tens of thousands) of DNA/RNA or protein sequences due to its use of the mBED algorithm for calcg. guide trees. This algorithm allows very large alignment problems to be tackled very quickly, even on personal computers. The accuracy of the program has been considerably improved over earlier Clustal programs, through the use of the HHalign method for aligning profile hidden Markov models. The program currently is used from the command line or can be run on line.
- 28Ewing, B.; Hillier, L.; Wendl, M. C.; Green, P. Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment Genome Res. 1998, 8, 175– 185Google Scholar28https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaK1cXitlWlu78%253D&md5=2a5c86722c1883b778c8c5222473ba22Base-calling automated sequencer traces using phred. I. Accuracy assessmentEwing, Brent; Hillier, LaDeana; Wendl, Michael C.; Green, PhilGenome Research (1998), 8 (3), 175-185CODEN: GEREFS; ISSN:1088-9051. (Cold Spring Harbor Laboratory Press)The availability of massive amts. of DNA sequence information has begun to revolutionize the practice of biol. As a result, current large-scale sequencing output, while impressive, is not adequate to keep pace with growing demand and, in particular, is far short of what will be required to obtain the 3-billion-base human genome sequence by the target date of 2005. To reach this goal, improved automation will be essential, and it is particularly important that human involvement in sequence data processing be significantly reduced or eliminated. Progress in this respect will require both improved accuracy of the data processing software and reliable accuracy measures to reduce the need for human involvement in error correction and make human review more efficient. Here, we describe one step toward that goal: a base-calling program for automated sequencer traces, phred, with improved accuracy. Phred appears to be the first base-calling program to achieve a lower error rate than the ABI software, averaging 40%-50% fewer errors in the data sets examd. independent of position in read, machine running conditions, or sequencing chem.
- 29Ewing, B.; Green, P. Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities Genome Res. 1998, 8, 186– 194Google Scholar29https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaK1cXitlWlu7g%253D&md5=f870cd1861b597a01874832afa393634Base-calling of automated sequencer traces using Phred. II. Error probabilitiesEwing, Brent; Green, PhilGenome Research (1998), 8 (3), 186-194CODEN: GEREFS; ISSN:1088-9051. (Cold Spring Harbor Laboratory Press)Elimination of the data processing bottleneck in high-throughput sequencing will require both improved accuracy of data processing software and reliable measures of that accuracy. The ability to est. a probability of error for each base-call, as a function of certain parameters computed from the trace data, was developed and implemented in the base-calling program phred. These error probabilities are shown to be valid (correspond to actual error rates) and to have high power to discriminate correct base-calls from incorrect ones, for read data collected under several different chemistries and electrophoretic conditions. They play a crit. role in the assembly program phrap and the finishing program consed.
- 30Gotoh, O. An Improved Algorithm for Matching Biological Sequences J. Mol. Biol. 1982, 162, 705– 708Google ScholarThere is no corresponding record for this reference.
- 31Thompson, J. D.; Plewniak, F.; Poch, O. A Comprehensive Comparison of Multiple Sequence Alignment Programs Nucleic Acids Res. 1999, 27, 2682– 2690Google ScholarThere is no corresponding record for this reference.
- 32Tiller, T.; Schuster, I.; Deppe, D.; Siegers, K.; Strohner, R.; Herrmann, T.; Berenguer, M.; Poujol, D.; Stehle, J.; Stark, Y.; Heßling, M.; Daubert, D.; Felderer, K.; Kaden, S.; Kölln, J.; Enzelberger, M.; Urlinger, S. A Fully Synthetic Human Fab Antibody Library Based on Fixed VH/VL Framework Pairings with Favorable Biophysical Properties MAbs 2013, 5, 445– 470Google ScholarThere is no corresponding record for this reference.
- 33Velappan, N.; Sblattero, D.; Chasteen, L.; Pavlik, P.; Bradbury, A. R. M. Plasmid Incompatibility: More Compatible than Previously Thought? Protein Eng. Des. Sel. 2007, 20, 309– 313Google Scholar33https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2sXhtV2ns7bJ&md5=37dbb4a4fd52ed81a3ee5d727de2297ePlasmid incompatibility: more compatible than previously thought?Velappan, Nileena; Sblattero, Daniele; Chasteen, Leslie; Pavlik, Peter; Bradbury, Andrew R. M.Protein Engineering, Design & Selection (2007), 20 (7), 309-313CODEN: PEDSBR; ISSN:1741-0126. (Oxford University Press)It is generally accepted that plasmids contg. the same origin of replication are incompatible. We have re-examd. this concept in terms of the plasmid copy no., by introducing plasmids contg. the same orgin of replication and different antibiotic resistance genes into bacteria. By selecting for resistance to only one antibiotic, we were able to examine the persistence of plasmids carrying resistances to other antibiotics. We find that plasmids are not rapidly lost, but are able to persist in bacteria for multiple overnight growth cycles, with some dependence upon the nature of the antibiotic selected for. By carrying out the expts. with different origins of replication, we have been able to show that higher copy no. leads to longer persistence, but even with low copy plasmids, persistence occurs to a significant degree. This observation holds significance for the field of protein engineering, as the presence of two or more plasmids within bacteria weakens, and confuses, the connection between screened phenotype and genotype, with the potential to wrongly assign specific phenotypes to incorrect genotypes.
- 34Goldsmith, M.; Kiss, C.; Bradbury, A. R. M.; Tawfik, D. S. Avoiding and Controlling Double Transformation Artifacts Protein Eng. Des. Sel. 2007, 20, 315– 318Google Scholar34https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2sXhtV2ns7bK&md5=9a2f1fa2a0ba4ac99de9fff6be471ba6Avoiding and controlling double transformation artifactsGoldsmith, Moshe; Kiss, Csaba; Bradbury, Andrew R. M.; Tawfik, Dan S.Protein Engineering, Design & Selection (2007), 20 (7), 315-318CODEN: PEDSBR; ISSN:1741-0126. (Oxford University Press)This article describes a set of std. control expts. for the authentication of new protein variants isolated through library selection and site-directed mutagenesis. These controls are specifically designed to rule out artifacts derived from 'double transformants'-i.e. cells transformed with, or infected by, two different plasmids simultaneously. These seem to have been the source of past artifacts and, as demonstrated here, are far more common than generally recognized. By following std. protocols for cloning, plasmid isolation, subcloning, in combination with functional assays, the presence of such artifacts can be ruled out. This protocol needs to be applied for any new variant isolated from heterogeneous gene repertoires, and in particular for variants isolated by selection for either enzymic activity, or binding.
- 35Emsley, P.; Lohkamp, B.; Scott, W. G.; Cowtan, K. Features and Development of Coot Acta Crystallogr. D Biol. Crystallogr. 2010, 66, 486– 501Google Scholar35https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3cXksFKisb8%253D&md5=67262cbfc60004de5ef962d5c043c910Features and development of CootEmsley, P.; Lohkamp, B.; Scott, W. G.; Cowtan, K.Acta Crystallographica, Section D: Biological Crystallography (2010), 66 (4), 486-501CODEN: ABCRE6; ISSN:0907-4449. (International Union of Crystallography)Coot is a mol.-graphics application for model building and validation of biol. macromols. The program displays electron-d. maps and at. models and allows model manipulations such as idealization, real-space refinement, manual rotation/translation, rigid-body fitting, ligand search, solvation, mutations, rotamers and Ramachandran idealization. Furthermore, tools are provided for model validation as well as interfaces to external programs for refinement, validation and graphics. The software is designed to be easy to learn for novice users, which is achieved by ensuring that tools for common tasks are 'discoverable' through familiar user-interface elements (menus and toolbars) or by intuitive behavior (mouse controls). Recent developments have focused on providing tools for expert users, with customisable key bindings, extensions and an extensive scripting interface. The software is under rapid development, but has already achieved very widespread use within the crystallog. community. The current state of the software is presented, with a description of the facilities available and of some of the underlying methods employed.
Figure 2
Figure 2. Simple alignment algorithm. (A) A library is shown after each step to illustrate the alignment algorithm. For simplicity, the example shows only eight sequences from a simulated library of the trastuzumab HC in which residues KDTY of CDR H1 have been randomized with 4 NNK codons. First, all the sequences, which have previously been trimmed to the template, are translated and aligned from N- to C-terminus without adding any gaps. Second, DNA sequences 3 and 4, which are shorter than the template by a multiple of 3, are assumed to have deletions. Gaps (red circles) are inserted into these sequences to align them to the template. Third, DNA sequences 5 and 6, which are longer than the template by a multiple of 3, are assumed to have insertions. Gaps (red circles) are inserted into all other sequences and the template to align these insertions. Fourth, gaps (red circle) must be corrected for sequences which also contain insertions. Lastly, DNA sequences 7 and 8, which differ in length from the DNA template by a nonmultiple of 3, are assumed to have frameshifts. No gaps are inserted into these sequences. All sequences are colored as in Figure 1. The library residues have not yet been identified and the sequences have not yet been sorted, so they are not colored in magenta and purple. The randomized library positions are clearly identifiable under the KDTY template residues since they are mostly mutated. (B) The simple alignment algorithm inserts gaps into sequences by systematically testing gaps and scoring up to 10 residues surrounding each gap for the best match to the template. Gaps of 1 amino acid (i.e., 1 codon or 3 nucleotides) are tested first, scanning from N- to C-terminus during the first pass. The test gap size is increased until the template and sequence are the same length. If a gap score of 1 is not found, then the gap with the highest score is used. The gap insertion process is iterated until the sequence and template are the same length. The top example represents an intermediate gap test that did not score as well as the gap chosen in the bottom alignment. The same method is employed for inserting gaps into the template for sequences that have insertions.
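The gap-testing procedure in the Figure 2 caption can be illustrated with a short Python sketch. This is not the program's Visual Basic code: the function names, the scoring as a fraction of matching residues, and the split of the 10-residue window into 5 residues on each side of the gap are assumptions made for illustration, and only the deletion case (sequence shorter than the template) is handled.

```python
def classify_length(template_dna: str, seq_dna: str) -> str:
    """Classify a trimmed read by its length difference from the DNA template."""
    diff = len(seq_dna) - len(template_dna)
    if diff == 0:
        return "same length"
    if diff % 3 != 0:
        return "frameshift"  # no gaps are inserted for these reads
    return "insertion" if diff > 0 else "deletion"


def window_score(template: str, seq: str, pos: int, gap: int, window: int = 10) -> float:
    """Fraction of residues matching the template around a test gap (assumed scoring)."""
    gapped = seq[:pos] + "-" * gap + seq[pos:]
    lo = max(0, pos - window // 2)
    hi = min(len(template), len(gapped), pos + gap + window // 2)
    pairs = [(t, s) for t, s in zip(template[lo:hi], gapped[lo:hi]) if s != "-"]
    if not pairs:
        return 0.0
    return sum(t == s for t, s in pairs) / len(pairs)


def insert_deletion_gaps(template: str, seq: str, window: int = 10) -> str:
    """Insert gaps into a protein sequence that is shorter than the template."""
    while len(seq) < len(template):
        missing = len(template) - len(seq)
        best = (-1.0, 0, 1)  # (score, position, gap size)
        # Test gaps of 1 residue first, then larger gaps, scanning N- to C-terminus.
        for gap in range(1, missing + 1):
            for pos in range(len(seq) + 1):
                score = window_score(template, seq, pos, gap, window)
                if score > best[0]:
                    best = (score, pos, gap)
            if best[0] == 1.0:
                break  # a perfect local match was found
        # Otherwise fall back to the highest-scoring gap and iterate.
        _, pos, gap = best
        seq = seq[:pos] + "-" * gap + seq[pos:]
    return seq


template = "EVQLVESGGGLVQPGGSLRLSCAAS"
clone = "EVQLVESLVQPGGSLRLSCAAS"  # hypothetical clone missing three residues
print(insert_deletion_gaps(template, clone))
```

Because the score is local, mutated library positions near a gap lower the best score below 1, which is why the fallback to the highest-scoring gap matters for randomized regions.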
Figure 3
Figure 3. Automated library amino acid composition analysis. (A) A stacked-column graph generated by XLibraryDisplay shows the percent amino acid composition at each randomized position of the example MjTyrRS library. (B) A colored chart generated by the program shows the total numbers of amino acids found at each library position. (C) A WebLogo plot can be generated by loading an exported FASTA file.
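The kind of tally behind panels A and B can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the input layout (equal-length aligned protein strings plus zero-based column indices), and the choice to skip gaps and unknown residues are hypothetical and not taken from the program.

```python
from collections import Counter


def library_composition(aligned, positions):
    """Return {position: (counts, percents)} for the given alignment columns."""
    table = {}
    for pos in positions:
        # Assumption: gaps (-) and unknown residues (X) are left out of the totals.
        counts = Counter(seq[pos] for seq in aligned if seq[pos] not in "-X")
        total = sum(counts.values())
        percents = {aa: 100.0 * n / total for aa, n in counts.items()} if total else {}
        table[pos] = (counts, percents)
    return table


aligned = ["ACD", "AGD", "ACE", "A-D"]  # toy alignment, not real library data
for pos, (counts, percents) in library_composition(aligned, [1, 2]).items():
    print(pos, dict(counts), percents)
```

The counts correspond to the chart in panel B and the percentages to the stacked columns in panel A. The same function applies to nucleotide alignments if the skipped characters are adjusted.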
Figure 4
Figure 4. Automated library nucleotide composition analysis. (A) A stacked-column graph shows the percent nucleotide composition at each randomized position of the example MjTyrRS library. (B) A colored chart generated by the program shows the total numbers of each base at each library position.
Figure 5
Figure 5. Library summary analysis. A portion of a standard summary report for the MjTyrRS library is shown. For brevity, only 15 sequences are shown, but the actual report shows the library sequences for all unique clones.
Figure 6
Figure 6. Phred QC analysis. (A) A typical QC report is shown for the MjTyrRS library when sequences are loaded from Phred PHD files, which contain a QC score for every base. Scores for each base are used to shade separate nucleotide boxes from light blue (high score, more accurate) to dark blue (low score, less accurate). (B) A plot of the QC score at each position of a good, single sequence is shown (A01). (C) A plot of the QC score at each position of a mixture of sequences is shown (D03). (D) The sequence chromatogram for the good sequence in panel B is compared to that of the mixed sequence in panel C. A red box is drawn in each panel to indicate the site of the mixed bases near position 370. The coloring in panel A enables detection of potentially mixed clones by visual inspection.
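For readers who want to work with the per-base scores directly, the following Python sketch reads the scores from PHD text and flags low-scoring positions. It assumes the usual PHD layout in which each line between BEGIN_DNA and END_DNA holds a base, its quality score, and a trace position; the function names and the cutoff of 20 are illustrative choices, not settings from the program. The conversion uses the standard Phred definition, Q = -10 log10(P).

```python
def parse_phd_qualities(phd_text):
    """Return (bases, qualities) from the BEGIN_DNA/END_DNA block of PHD text."""
    bases, quals = [], []
    in_dna = False
    for line in phd_text.splitlines():
        line = line.strip()
        if line == "BEGIN_DNA":
            in_dna = True
        elif line == "END_DNA":
            in_dna = False
        elif in_dna and line:
            base, qual, *_ = line.split()  # assumed order: base, quality, position
            bases.append(base)
            quals.append(int(qual))
    return "".join(bases), quals


def error_probability(q):
    """Convert a Phred quality score to the estimated base-call error probability."""
    return 10 ** (-q / 10)


def flag_low_quality(quals, threshold=20):
    """Return 1-based positions whose score is below the (assumed) threshold."""
    return [i for i, q in enumerate(quals, start=1) if q < threshold]


sample = """BEGIN_SEQUENCE A01
BEGIN_DNA
a 35 10
c 8 22
g 40 35
END_DNA
END_SEQUENCE
"""
bases, quals = parse_phd_qualities(sample)
print(bases, quals, flag_low_quality(quals))
```

A run of flagged positions in the middle of an otherwise clean read is the numerical counterpart of the dark blue patch used in panel A to spot mixed clones.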
Figure 7
Figure 7. Activity and structure analysis. (A) An example is shown for the sequence–activity correlation for an antibody light chain library selection against VEGF. In this example, only the library sequences are shown, but the entire sequence can also be automatically correlated to experimental data. (B) An example is shown for sequence–structure correlation in which the light chain antibody sequence and structure from PDB code 1N8Z were aligned to the selected sequences shown in panel A. Residues are colored according to secondary structure (red = sheets, blue = helices, purple = loops). The inset shows an image from PyMOL created from an automatically generated script which highlights mutations like T72M.
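A script of the kind described for the inset could look like the output of the Python sketch below. The generated lines use standard PyMOL commands (fetch, hide, show, color, select, deselect), but the function name, the selection names, the colors, and the chain identifier are assumptions for illustration. The sketch also assumes that the residue number in a mutation label such as T72M matches the numbering in the PDB file, which should be checked for any real structure.

```python
import re


def build_pymol_script(pdb_code, chain, mutations):
    """Build PyMOL command text that highlights point mutations such as 'T72M'."""
    lines = [
        f"fetch {pdb_code}",
        "hide everything",
        "show cartoon",
        "color white",
    ]
    for mutation in mutations:
        # Expected label: wild-type residue, residue number, mutant residue.
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation label: {mutation}")
        resi = match.group(2)
        name = f"mut_{mutation}"
        lines.append(f"select {name}, chain {chain} and resi {resi}")
        lines.append(f"show sticks, {name}")
        lines.append(f"color orange, {name}")
    lines.append("deselect")
    return "\n".join(lines) + "\n"


# Chain "A" is an assumed identifier for the light chain.
print(build_pymol_script("1N8Z", "A", ["T72M"]))
```

Saving the returned text to a .pml file and opening it in PyMOL would run the commands in order.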
Figure 8
Figure 8. Comparison of alignment algorithms. For libraries with constant loop lengths, the simple alignment algorithm works better than the NW algorithm. As implemented in the program, the NW algorithm has a tendency to insert gaps into randomized positions since it uses a constant gap penalty. For libraries with variable loop lengths, the NW algorithm performs better since it correctly inserts gaps into the randomized positions.
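The behavior described above follows from how a constant gap penalty enters the NW recurrence, which the Python sketch below makes explicit. It is a minimal global alignment with simple match and mismatch scores; the scoring values are assumptions, and the sketch does not use a substitution matrix or reproduce the program's own implementation.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b with a constant (linear) gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix: every gap costs the same wherever it is placed.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(needleman_wunsch("ACGT", "AGT"))
```

Because mismatches at randomized positions already lower the score, a gap placed there costs little extra under a constant penalty, which is consistent with the tendency noted in the caption.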
References
ARTICLE SECTIONSThis article references 35 other publications.
- 1Artimo, P.; Jonnalagedda, M.; Arnold, K.; Baratin, D.; Csardi, G.; de Castro, E.; Duvaud, S.; Flegel, V.; Fortier, A.; Gasteiger, E.; Grosdidier, A.; Hernandez, C.; Ioannidis, V.; Kuznetsov, D.; Liechti, R.; Moretti, S.; Mostaguir, K.; Redaschi, N.; Rossier, G.; Xenarios, I.; Stockinger, H. ExPASy: SIB Bioinformatics Resource Portal Nucleic Acids Res. 2012, 40, W597– W603Google ScholarThere is no corresponding record for this reference.
- 2Stothard, P. The Sequence Manipulation Suite: JavaScript Programs for Analyzing and Formatting Protein and DNA Sequences BioTechniques 2000, 28 (1102) 1104Google ScholarThere is no corresponding record for this reference.
- 3Hall, T. BioEdit: A User-Friendly Biological Sequence Alignment Editor and Analysis Program for Windows 95/98/NT Nucleic Acids Symp. Ser. 1999, 41, 95– 98Google Scholar3https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD3cXhtVyjs7Y%253D&md5=eb372a22047bdd2a98be7e092c29b47aBioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NTHall, Thomas A.Nucleic Acids Symposium Series (1999), 41 (Symposium on RNA Biology III: RNA, Tool & Target), 95-98CODEN: NACSD8; ISSN:0261-3166. (Oxford University Press)BioEdit is a user-friendly sequence alignment editor and anal. package that is offered free of charge for Windows 95/98/NT systems. BioEdit is a full-featured nucleic acid/protein alignment editor that offers several modes of easy hand-alignment, split-window views, user-defined colors, information-based shading, auto-integration with ClustalW, local/internet BLAST, restriction-mapping, annotatable plasmid-drawing, box-shading with full color-capability, several built-in anal. options, and a graphical interface for configuring further interfaces to automatically run external anal. programs. BioEdit is also customizable to user preferences with user-defined menu shortcuts and correct handling of all fonts. Among the built-in analyses offered are a set of RNA comparative anal. tools including covariation, potential-pairings and mutual information anal. BioEdit offers the tools required to create and manipulate an alignment, run comparative analyses from the edit window, and view and analyze the data through interactive 2-D graphical matrix plots, area plots, and a rich-text editor. The following note describes a rough sample anal. of the secondary structure of bacterial RNase P RNA (excluding the high G+C Gram-Pos. group) by mutual information probing. An initial scanning of mutual information data via the graphical anal. tools reveals all major helixes that exist in the E. 
coli structure.
- 4Anzaldi, L. J.; Muñoz-Fernández, D.; Erill, I. BioWord: A Sequence Manipulation Suite for Microsoft Word BMC Bioinf. 2012, 13, 124Google Scholar4https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BC38npvVWgsA%253D%253D&md5=fd6b98466f925e013f24a02c7dccde00BioWord: a sequence manipulation suite for Microsoft WordAnzaldi Laura J; Munoz-Fernandez Daniel; Erill IvanBMC bioinformatics (2012), 13 (), 124 ISSN:.BACKGROUND: The ability to manipulate, edit and process DNA and protein sequences has rapidly become a necessary skill for practicing biologists across a wide swath of disciplines. In spite of this, most everyday sequence manipulation tools are distributed across several programs and web servers, sometimes requiring installation and typically involving frequent switching between applications. To address this problem, here we have developed BioWord, a macro-enabled self-installing template for Microsoft Word documents that integrates an extensive suite of DNA and protein sequence manipulation tools. RESULTS: BioWord is distributed as a single macro-enabled template that self-installs with a single click. After installation, BioWord will open as a tab in the Office ribbon. Biologists can then easily manipulate DNA and protein sequences using a familiar interface and minimize the need to switch between applications. Beyond simple sequence manipulation, BioWord integrates functionality ranging from dyad search and consensus logos to motif discovery and pair-wise alignment. Written in Visual Basic for Applications (VBA) as an open source, object-oriented project, BioWord allows users with varying programming experience to expand and customize the program to better meet their own needs. CONCLUSIONS: BioWord integrates a powerful set of tools for biological sequence manipulation within a handy, user-friendly tab in a widely used word processing software package. 
The use of a simple scripting language and an object-oriented scheme facilitates customization by users and provides a very accessible educational platform for introducing students to basic bioinformatics algorithms.
- 5Waterhouse, A. M.; Procter, J. B.; Martin, D. M. A.; Clamp, M.; Barton, G. J. Jalview Version 2--a Multiple Sequence Alignment Editor and Analysis Workbench Bioinformatics 2009, 25, 1189– 1191Google ScholarThere is no corresponding record for this reference.
- 6Weiss, G. A.; Watanabe, C. K.; Zhong, A.; Goddard, A.; Sidhu, S. S. Rapid Mapping of Protein Functional Epitopes by Combinatorial Alanine Scanning Proc. Natl. Acad. Sci. U. S. A. 2000, 97, 8950– 8954Google Scholar6https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD3cXls12lur0%253D&md5=549d46048c742ea594810adec8500e7fRapid mapping of protein functional epitopes by combinatorial alanine scanningWeiss, Gregory A.; Watanabe, Colin K.; Zhong, Alan; Goddard, Audrey; Sidhu, Sachdev S.Proceedings of the National Academy of Sciences of the United States of America (2000), 97 (16), 8950-8954CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)A combinatorial alanine-scanning strategy was used to det. simultaneously the functional contributions of 19 side chains buried at the interface between human growth hormone and the extracellular domain of its receptor. A phage-displayed protein library was constructed in which the 19 side chains were preferentially allowed to vary only as the wild type or alanine. The library pool was subjected to binding selections to isolate functional clones, and DNA sequencing was used to det. the alanine/wild-type ratio at each varied position. This ratio was used to calc. the effect of each alanine substitution as a change in free energy relative to that of wild type. Only seven side chains contribute significantly to the binding interaction, and these conserved residues form a compact cluster in the human growth hormone tertiary structure. The results were in excellent agreement with free energy data previously detd. by conventional alanine-scanning mutagenesis and suggest that this technol. should be useful for analyzing functional epitopes in proteins.
- 7. Schofield, D. J.; Pope, A. R.; Clementel, V.; Buckell, J.; Chapple, S. D.; Clarke, K. F.; Conquer, J. S.; Crofts, A. M.; Crowther, S. R.; Dyson, M. R.; Flack, G.; Griffin, G. J.; Hooks, Y.; Howat, W. J.; Kolb-Kokocinski, A.; Kunze, S.; Martin, C. D.; Maslen, G. L.; Mitchell, J. N.; O’Sullivan, M.; Perera, R. L.; Roake, W.; Shadbolt, S. P.; Vincent, K. J.; Warford, A.; Wilson, W. E.; Xie, J.; Young, J. L.; McCafferty, J. Application of Phage Display to High Throughput Antibody Generation and Characterization. Genome Biol. 2007, 8, R254.
- 8. Schwimmer, L. J.; Huang, B.; Giang, H.; Cotter, R. L.; Chemla-Vogel, D. S.; Dy, F. V.; Tam, E. M.; Zhang, F.; Toy, P.; Bohmann, D. J.; Watson, S. R.; Beaber, J. W.; Reddy, N.; Kuan, H.-F.; Bedinger, D. H.; Rondon, I. J. Discovery of Diverse and Functional Antibodies from Large Human Repertoire Antibody Libraries. J. Immunol. Methods 2013, 391, 60–71.
- 9. Vielmetter, J.; Tishler, J.; Ary, M. L.; Cheung, P.; Bishop, R. Data Management Solutions for Protein Therapeutic Research and Development. Drug Discovery Today 2005, 10, 1065–1071.
- 10. Hansen, M. R.; Villar, H. O.; Feyfant, E. Development of an Informatics Platform for Therapeutic Protein and Peptide Analytics. J. Chem. Inf. Model. 2013, 53, 2774–2779.
- 11. Caffrey, D. R.; Dana, P. H.; Mathur, V.; Ocano, M.; Hong, E.-J.; Wang, Y. E.; Somaroo, S.; Caffrey, B. E.; Potluri, S.; Huang, E. S. PFAAT Version 2.0: A Tool for Editing, Annotating, and Analyzing Multiple Sequence Alignments. BMC Bioinf. 2007, 8, 381.
- 12. Cock, P. J. A.; Antao, T.; Chang, J. T.; Chapman, B. A.; Cox, C. J.; Dalke, A.; Friedberg, I.; Hamelryck, T.; Kauff, F.; Wilczynski, B.; de Hoon, M. J. L. Biopython: Freely Available Python Tools for Computational Molecular Biology and Bioinformatics. Bioinformatics 2009, 25, 1422–1423.
- 13. Gentleman, R. C.; Carey, V. J.; Bates, D. M.; Bolstad, B.; Dettling, M.; Dudoit, S.; Ellis, B.; Gautier, L.; Ge, Y.; Gentry, J.; Hornik, K.; Hothorn, T.; Huber, W.; Iacus, S.; Irizarry, R.; Leisch, F.; Li, C.; Maechler, M.; Rossini, A. J.; Sawitzki, G.; Smith, C.; Smyth, G.; Tierney, L.; Yang, J. Y. H.; Zhang, J. Bioconductor: Open Software Development for Computational Biology and Bioinformatics. Genome Biol. 2004, 5, R80.
- 14. Prlić, A.; Yates, A.; Bliven, S. E.; Rose, P. W.; Jacobsen, J.; Troshin, P. V.; Chapman, M.; Gao, J.; Koh, C. H.; Foisy, S.; Holland, R.; Rimsa, G.; Heuer, M. L.; Brandstätter-Müller, H.; Bourne, P. E.; Willis, S. BioJava: An Open-Source Framework for Bioinformatics in 2012. Bioinformatics 2012, 28, 2693–2695.
- 15. Lee, C. V.; Liang, W.-C.; Dennis, M. S.; Eigenbrot, C.; Sidhu, S. S.; Fuh, G. High-Affinity Human Antibodies from Phage-Displayed Synthetic Fab Libraries with a Single Framework Scaffold. J. Mol. Biol. 2004, 340, 1073–1093.
- 16. Lamb, B. M.; Mercer, A. C.; Barbas, C. F., 3rd. Directed Evolution of the TALE N-Terminal Domain for Recognition of All 5′ Bases. Nucleic Acids Res. 2013, 41, 9779–9785.
- 17. Binz, H. K.; Stumpp, M. T.; Forrer, P.; Amstutz, P.; Plückthun, A. Designing Repeat Proteins: Well-Expressed, Soluble and Stable Proteins from Combinatorial Libraries of Consensus Ankyrin Repeat Proteins. J. Mol. Biol. 2003, 332, 489–503.
- 18. Koide, S.; Koide, A.; Lipovšek, D. Target-Binding Proteins Based on the 10th Human Fibronectin Type III Domain (10Fn3). Methods Enzymol. 2012, 503, 135–156.
- 19. Ho, M.; Pastan, I. In Vitro Antibody Affinity Maturation Targeting Germline Hotspots. Methods Mol. Biol. 2009, 525, 293–308.
- 20. Labrou, N. E. Random Mutagenesis Methods for in Vitro Directed Enzyme Evolution. Curr. Protein Pept. Sci. 2010, 11, 91–100.
- 21. Crooks, G. E.; Hon, G.; Chandonia, J.-M.; Brenner, S. E. WebLogo: A Sequence Logo Generator. Genome Res. 2004, 14, 1188–1190.
- 22. Zimmerman, E. S.; Heibeck, T. H.; Gill, A.; Li, X.; Murray, C. J.; Madlansacay, M. R.; Tran, C.; Uter, N. T.; Yin, G.; Rivers, P. J.; Yam, A. Y.; Wang, W. D.; Steiner, A. R.; Bajad, S. U.; Penta, K.; Yang, W.; Hallam, T. J.; Thanos, C. D.; Sato, A. K. Production of Site-Specific Antibody-Drug Conjugates Using Optimized Non-Natural Amino Acids in a Cell-Free Expression System. Bioconjugate Chem. 2014, 25, 351–361.
- 23. Stafford, R. L.; Matsumoto, M. L.; Yin, G.; Cai, Q.; Fung, J. J.; Stephenson, H.; Gill, A.; You, M.; Lin, S.-H.; Wang, W. D.; Masikat, M. R.; Li, X.; Penta, K.; Steiner, A. R.; Baliga, R.; Murray, C. J.; Thanos, C. D.; Hallam, T. J.; Sato, A. K. In Vitro Fab Display: A Cell-Free System for IgG Discovery. Protein Eng. Des. Sel. 2014, 27, 97–109.
- 24. Needleman, S. B.; Wunsch, C. D. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol. 1970, 48, 443–453.
- 25. Henikoff, S.; Henikoff, J. G. Amino Acid Substitution Matrices from Protein Blocks. Proc. Natl. Acad. Sci. U. S. A. 1992, 89, 10915–10919.
- 26. Heckman, K. L.; Pease, L. R. Gene Splicing and Mutagenesis by PCR-Driven Overlap Extension. Nat. Protoc. 2007, 2, 924–932.
- 27. Sievers, F.; Higgins, D. G. Clustal Omega, Accurate Alignment of Very Large Numbers of Sequences. Methods Mol. Biol. 2014, 1079, 105–116.
- 28. Ewing, B.; Hillier, L.; Wendl, M. C.; Green, P. Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment. Genome Res. 1998, 8, 175–185.
- 29. Ewing, B.; Green, P. Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities. Genome Res. 1998, 8, 186–194.
- 30. Gotoh, O. An Improved Algorithm for Matching Biological Sequences. J. Mol. Biol. 1982, 162, 705–708.
- 31. Thompson, J. D.; Plewniak, F.; Poch, O. A Comprehensive Comparison of Multiple Sequence Alignment Programs. Nucleic Acids Res. 1999, 27, 2682–2690.
- 32. Tiller, T.; Schuster, I.; Deppe, D.; Siegers, K.; Strohner, R.; Herrmann, T.; Berenguer, M.; Poujol, D.; Stehle, J.; Stark, Y.; Heßling, M.; Daubert, D.; Felderer, K.; Kaden, S.; Kölln, J.; Enzelberger, M.; Urlinger, S. A Fully Synthetic Human Fab Antibody Library Based on Fixed VH/VL Framework Pairings with Favorable Biophysical Properties. MAbs 2013, 5, 445–470.
- 33. Velappan, N.; Sblattero, D.; Chasteen, L.; Pavlik, P.; Bradbury, A. R. M. Plasmid Incompatibility: More Compatible than Previously Thought? Protein Eng. Des. Sel. 2007, 20, 309–313.
- 34. Goldsmith, M.; Kiss, C.; Bradbury, A. R. M.; Tawfik, D. S. Avoiding and Controlling Double Transformation Artifacts. Protein Eng. Des. Sel. 2007, 20, 315–318.
- 35. Emsley, P.; Lohkamp, B.; Scott, W. G.; Cowtan, K. Features and Development of Coot. Acta Crystallogr. D Biol. Crystallogr. 2010, 66, 486–501.