September 2000
Modern Drug Discovery, 2000, 3(7) 35, 38, 41–42, 44.
© 2000 American Chemical Society.


Proteomics: New tools for a new era

Bridging the gap between genomics and drug discovery

BY ALED M. EDWARDS, CHERYL H. ARROWSMITH, and BERTRAND des PALLIERES

More than five years ago, the pharmaceutical sector began to invest heavily in genomics to increase the supply of validated drug targets. The genomic gold rush that followed, which has now led to the completion of the human genome sequence, has flooded the drug discovery pipeline with more than 75,000 sequences of potential targets. Unfortunately, although genomics delivered the mass of raw information as promised, genomics technologies have largely been unable to extract the anticipated 10,000 therapeutic targets from the crude sequence ore.

The inability to identify valid drug targets by examining gene sequence information has created a gap between genomics and drug discovery. This gap reflects the fact that in most cases, gene sequence reveals little about protein function or disease relevance. Accordingly, the true value of the genome sequence information will only be realized after a function has been assigned to all of the encoded proteins.

Proteomics seeks to provide functional information for all proteins. Much like genomics, proteomics is more of a concept than a defined technology, and it refers to any protein-based approach that has the capacity to provide new information about proteins on a genomewide scale. The challenge facing proteomics is enormous, because more than 75% of the predicted proteins in multicellular organisms have no known cellular function. However, proteomics is poised to yield remarkable discoveries because this set of proteins is likely to include new enzymes, signaling molecules, and pathways that may be excellent and unanticipated therapeutic targets.

Applying proteomics technologies will not only provide validated targets for drug discovery but will also increase the efficiency of the drug discovery process downstream. For example, genomewide protein purification efforts will provide reagents for high-throughput screens, and structural proteomics efforts will provide three-dimensional structures for drug development. Clearly, proteomics is destined to bridge the gap between genomics and drug discovery.

The primary goal of proteomics is to provide functional annotations for the entire proteome. Of course, the function of a protein has many definitions, ranging from its biochemical activity to its physiological role, and so the optimal proteomics strategy must integrate many different technologies. True proteomics applications must also be unbiased in design, to be poised to discover the unknown.

This article is an overview of the technologies most relevant to the drug discovery process, and it gives some ideas about developing proteomics technologies.

Pure proteins: The biological infrastructure
One key facet of proteomics is that it is firmly an experimental science. Computational methods are not yet able to predict protein function, structure, or suitability for drug development from the amino acid sequence. For many proteomics applications, such as structural proteomics and proteome-wide high-throughput screening and protein interaction studies, one requirement will be to have large quantities of thousands of purified proteins. With current technology, this goal is unachievable, but efforts are under way in several institutions to develop the requisite high-throughput purification processes. These include:

It will be important to couple these massive purification efforts with a quality control strategy that ensures the purity and structural integrity of the purified proteins. Proteins or protein fragments produced in heterologous expression hosts, in in vitro transcription–translation reactions, or in in vivo two-hybrid screens are commonly misfolded. The magnitude of this problem is often underappreciated. In our large-scale purification effort, we have cloned, expressed, and analyzed thousands of yeast, bacterial, and archaeal proteins that were predicted to be cytoplasmic and to have no known structural homologue. We observed that more than 50% of the proteins were insoluble or had an unstable structure (1). Using such improperly folded proteins for biochemical and pharmacological screens or assays would likely lead to false positive or false negative results. Proteomics scientists who use them also risk contaminating their databases with incorrect results and wasting time and money on fruitless targets. The importance of ensuring the structural integrity of protein reagents is often underestimated by those more versed in molecular biology and genomics methods. In the pursuit of speed and throughput (and marketing!), proteomics researchers should not forget the fundamentals of protein structure and biochemistry.

The proteome: Target discovery
Although there may be more than 100,000 proteins in humans, only a fraction of these are expressed in any given cell type. To discover and monitor the relevance of a protein to a disease-related process, it is important to catalog where, when, and to what extent a protein is expressed. DNA microarray technology, which monitors the relative abundance of mRNA in a cell, is a powerful way to accomplish this because mRNA and protein concentrations are often correlated. DNA microarray technology can measure even poorly expressed genes, ensuring a comprehensive assessment of which genes are expressed in which tissue (2). However, since mRNA and protein levels do not always correlate in the cell (3) and many regulatory processes occur after transcription, a direct measure of relative protein abundance is more desirable.
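How well mRNA abundance tracks protein abundance can be quantified with a correlation coefficient computed over genes measured both ways. The following Python sketch is illustrative only and is not taken from the cited studies; the function name `pearson` and the idea of passing paired abundance lists are assumptions made for this example.

```python
import math


def pearson(mrna_levels, protein_levels):
    """Pearson correlation between paired mRNA and protein abundances.

    Both arguments are equal-length sequences of numbers, one pair per gene.
    Returns a value between -1 and 1; values near 1 mean that mRNA level
    is a good proxy for protein level across the genes supplied.
    """
    n = len(mrna_levels)
    if n != len(protein_levels) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 genes")

    mean_m = sum(mrna_levels) / n
    mean_p = sum(protein_levels) / n

    # Covariance and spread terms (the common 1/n factors cancel).
    cov = sum((m - mean_m) * (p - mean_p)
              for m, p in zip(mrna_levels, protein_levels))
    spread_m = math.sqrt(sum((m - mean_m) ** 2 for m in mrna_levels))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in protein_levels))

    if spread_m == 0 or spread_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_m * spread_p)
```

In practice, abundances spanning several orders of magnitude are often log-transformed before such a comparison, and a rank-based coefficient may be preferred; either choice is a design decision outside the scope of this sketch.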

Figure 1. Early proteomics methods. The traditional approach to protein detection and identification involved the use of two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), which separates proteins based on their relative mass and isoelectric point, followed by single-spot analysis via mass spectrometry.
A variety of proteomics technologies are now being used to measure differences in cellular protein abundance. Currently, the primary method is electrophoresis or chromatography coupled with mass spectrometry (MS) (Figure 1). In this method, mixtures of proteins in cellular extracts are resolved and then individual proteins are identified using MS peptide fingerprinting (4). Although in theory MS approaches have the potential to characterize the entire protein complement of a cell, in practice it has proved difficult to identify proteins of low abundance, because cell extracts, and the resulting mass spectra, are dominated by a few hundred very abundant proteins. Future research in this area might focus on developing better methods to fractionate cell extracts before MS.
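The idea behind MS peptide fingerprinting can be sketched in a few lines of Python: digest each candidate sequence in silico, compute the expected peptide masses, and count how many observed masses each candidate explains. This is a simplified illustration under stated assumptions, not the procedure used in the cited work. It assumes trypsin cleaving after K or R except before P, no missed cleavages, unmodified peptides, and neutral monoisotopic masses; the function names, the 0.2 Da tolerance, and the dictionary-style "database" are all hypothetical.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def fingerprint_score(observed_masses, sequence, tolerance=0.2):
    """Count observed masses that match any theoretical tryptic peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )


def best_match(observed_masses, database, tolerance=0.2):
    """Return the name of the database entry explaining the most masses."""
    return max(
        database,
        key=lambda name: fingerprint_score(
            observed_masses, database[name], tolerance
        ),
    )
```

A simple match count like this also shows why low-abundance proteins are hard to identify: if their peptides never appear among the observed masses, no candidate sequence can accumulate a score for them.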

Defining the protein composition of a cell must also take into account the fact that mRNA splicing and covalent modifications generate protein isoforms that might contribute to important regulatory processes in the cell. Documenting the extent to which a protein is modified and the temporal changes in the modifications during disease can provide strategies for therapeutic intervention. Several approaches are being used to study post-translational modifications on a proteome-wide scale. Again, the most popular approach couples MS, which can detect even subtle covalent modifications, with methods to specifically enrich for modified proteins (5). Other strategies include the use of modification-specific antibodies (6).

The techniques that catalog changes in gene expression, protein levels, or modification due to disease or other cellular perturbations are powerful methods of identifying potential targets for drug discovery. However, they do not reveal the biochemical mechanism by which a gene product is related to disease or whether the protein is likely to be amenable to drug development. To resolve these questions, proteomics approaches that probe protein function are required.

Chemical proteomics: Screens for activity and binding
Most of the proteins identified through genome sequence projects have no known function, although many are expected to have catalytic activity. To link new proteins with known catalytic activities, proteome-scale screens for generic enzyme activities (e.g., protease and phosphatase) should be implemented. These screens could use purified proteins or extracts that contain the protein of interest. In one application of this concept, Phizicky and colleagues fused thousands of yeast genes to the coding sequence of glutathione S-transferase and expressed the set of fusion proteins in yeast (7). The fusion proteins were then tested for several catalytic activities, and many previously unannotated yeast open reading frames (potential coding sequences) were assigned a function (e.g., a cyclic phosphodiesterase and a cytochrome c methyltransferase).

Many of the predicted proteins may also have catalytic functions that have not previously been characterized. Although it is impossible to screen for chemical reactions that are unknown, identifying small molecules that bind to the new proteins may, in theory, provide clues to new activities. These ligands might be found by screening the new proteins against diverse chemical libraries using existing methods such as NMR spectroscopy (8), microcalorimetry, or microarrays (9). The general concept of ascribing function to new proteins by discovering small-molecule ligands might be referred to as chemical proteomics. Of course, chemical proteomics screens would also provide new chemical entities for drug development.

Structural proteomics: Target validation and development
The primary sequence of a protein determines its three-dimensional structure, which in turn determines its function. Often, proteins of similar function share structural homology in the complete absence of obvious sequence homology. As a result, many of the newly sequenced proteins share unrecognized structural and functional homology with known proteins. Indeed, on the basis of current estimates, structural information is predicted to provide functional clues for a large proportion of unannotated proteins (10).

The principle that structure underlies function, often in the absence of sequence homology, has launched a new branch of functional genomics known as structural genomics or structural proteomics (11). The aim of structural proteomics is to provide three-dimensional information for all proteins.
Figure 2. Yeast two-hybrid system. To detect domains of interaction between two proteins (X and Y or Z), one protein (X) is genetically fused to a DNA-binding domain while the others (Y and Z) are fused to a gene expression activator. If the two proteins do not interact (X and Y), there is no expression of the reporter gene. If they do interact (X and Z), then the reporter gene is expressed.



...
Figure 3. Affinity chromatography. By immobilizing a ligand (L), whether protein, nucleic acid, or small molecule, to a matrix, it is possible to isolate specific proteins of interest (P) from a mixture. By initially binding the proteins at low stringency levels and slowly increasing the stringency, you can incrementally release bound proteins and thereby determine their relative affinities for the ligand.

For the pharmaceutical industry, access to structural information on a proteome-wide scale is important at several levels. Structural information can be used to ascribe function, thereby revealing new potential drug targets; to validate targets based on homology to other proteins known to bind specific small molecules; to invalidate targets whose structural properties do not lend themselves to binding a drug; to aid the development of hits into leads and leads into drugs using structure-based methods; and to perfect structure-prediction algorithms, which will eventually allow scientists to predict structure and function from sequence.

There are nascent structural proteomics projects in both the public and private sectors. The current public projects are located in Germany, Canada, Japan, and the United States with an aggregate public funding of more than U.S.$100 million.

Interaction proteomics: Target validation
Protein–protein interactions lie at the heart of most cellular processes, including carbohydrate and lipid metabolism, cell-cycle regulation, protein and nucleic acid metabolism, signal transduction, and cellular architecture. A complete understanding of cellular function depends on a full characterization of the complex network of cellular protein–protein associations. More importantly, many human diseases—cancer, autoimmune disorders, and viral infections—occur because of failure or aberrations in protein–protein associations. Therefore, elucidating the complete set of interactions that involve proteins having known and potential associations with human disease will be an important step toward revealing new units of biological function and new targets for therapeutic intervention.

The technology most commonly used to discover protein–protein interactions on a genomewide scale is the yeast two-hybrid system (Figure 2) (12). Many new protein interactions have been discovered using this system, but despite its power, it also has significant shortcomings. First, it is not uncommon to have several false positive interactions for every valid interaction, and distinguishing the wheat from the chaff is time-consuming. Second, comprehensive genomewide two-hybrid screens have identified only a fraction of known interactions. Finally, the method can characterize only bimolecular interactions—proteins that exist in large assemblies are less amenable to two-hybrid analysis.

Alternative proteomics technologies are being developed to complement the two-hybrid system. These methods reveal direct protein–protein interactions by using protein affinity chromatography (Figure 3). Protein affinity chromatography, as developed by Greenblatt, Alberts, and colleagues (13), has the disadvantage of requiring purified proteins as reagents, but it is superior to the two-hybrid approach because it generates fewer false positives and is more amenable to high-throughput screening. With this technique, the purified protein of interest is immobilized on a solid support, and proteins or small molecules that associate with it are identified by gel electrophoresis and mass spectrometry. This method, which has been used to discover protein interactions in prokaryotic and eukaryotic systems, can characterize protein interactions having affinities of 3 µM or stronger and can purify proteins or protein complexes whose levels in cell extracts are as low as one part in 100,000.
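Why affinity matters for retention on such a column can be illustrated with the standard equilibrium expression for simple 1:1 binding. The Python sketch below is a textbook approximation rather than a description of the authors' protocol: it assumes the immobilized ligand is in large excess over the protein being captured and that binding has reached equilibrium. The function name and the example ligand concentrations are hypothetical.

```python
def fraction_bound(ligand_conc_um, kd_um):
    """Equilibrium fraction of a protein bound to an immobilized ligand.

    Uses the 1:1 binding relation  bound = [L] / ([L] + Kd),
    valid when the ligand is in large excess over the protein.
    Both concentrations are in micromolar.
    """
    if ligand_conc_um < 0 or kd_um <= 0:
        raise ValueError("ligand concentration must be >= 0 and Kd > 0")
    return ligand_conc_um / (ligand_conc_um + kd_um)


# Illustrative (made-up) ligand densities for a 3 µM interaction:
# at [L] = Kd half of the protein is retained, and the retained
# fraction rises as the effective ligand concentration increases.
for ligand in (3.0, 30.0, 300.0):
    print(ligand, round(fraction_bound(ligand, kd_um=3.0), 3))
```

Under these assumptions, weaker interactions require a higher effective ligand concentration on the matrix to be retained, which is consistent with a practical lower limit on the affinities the method can detect.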

Bioinformatics: The next decade
One of the aims of genomics and proteomics is to move from an experimental to an in silico science, in which changes in cellular physiology and pharmacology can be predicted using computational methods. However, with more than 75% of proteins having no known function, we are far from this goal. To improve the predictive power of bioinformatics, the first step is clearly to complete the annotation of the proteome. We imagine that to accomplish this, a portion of the research community will temporarily adopt an approach that is hypothesis-generating rather than hypothesis-driven. In this new strategy, researchers will create databases of “unbiased,” genomewide or proteome-wide experimental results. Knowledge-discovery and pattern-recognition algorithms will be applied to these data to generate new insights into and hypotheses about protein and cellular function. Hypotheses generated from this unbiased approach are likely to be of higher value than current ones based on relatively little data, and they can be tested with more traditional approaches.

Although such databases are not yet comprehensive, they are already being developed and mined. In the functional genomics sector, databases linking the yeast transcriptional profile to disruptions in specific pathways have been used to predict the function of proteins (14). In the proteomics sector, databases linking protein sequences with biophysical properties have been created, and rules that govern protein solubility and protein crystallization are being extracted (1).

As application of this knowledge-discovery concept moves from individual proteins to protein pathways and then to cellular pathways, we will see a dramatic increase in the efficiency of the drug discovery process. Within a decade, the pharmaceutical industry will certainly start to harvest the fruits of these new strategies.

References

  1. Christendat, D., et al. Nature Struct. Biol., in press.
  2. Schena, M., et al. Trends Biotechnol. 1998, 16, 301–306.
  3. Gygi, S. P., et al. Nature Biotechnol. 1999, 17, 994–999.
  4. Pandey, A.; Mann, M. Nature 2000, 405, 837–846.
  5. Zhou, W., et al. J. Am. Soc. Mass Spectrom. 2000, 11, 273–282.
  6. Srikrishna, G.; Wang, L.; Freeze, H. H. Glycobiol. 1998, 8, 799–811.
  7. Martzen, M. R., et al. Science 1999, 286, 1153–1155.
  8. Shuker, S. B., et al. Science 1996, 274, 1531–1534.
  9. MacBeath, G.; Koehler, A. N.; Schreiber, S. L. J. Am. Chem. Soc. 1999, 121, 7967–7968.
  10. Eisenstein, E., et al. Curr. Opin. Biotechnol. 2000, 11, 25–30.
  11. Kim, S. H. Nature Struct. Biol. 1998, 5 Suppl., 643–645.
  12. McCraith, S., et al. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 4879–4884.
  13. Formosa, T., et al. Methods Enzymol. 1991, 208, 24–45.
  14. Hughes, T. R., et al. Cell 2000, 102, 109–126.


    Aled M. Edwards is CEO of Integrative Proteomics as well as a professor with the Ontario Cancer Institute and the Banting and Best Department of Medical Research in the University of Toronto (all in Toronto, ON). Cheryl H. Arrowsmith is CSO of Integrative Proteomics and a professor with the Ontario Cancer Institute at the University of Toronto. Bertrand des Pallieres is the director of technology assessment for Integrative Proteomics. Comments and questions for the authors may be addressed to the Editorial Office by e-mail at mdd@acs.org, by fax at 202-776-8166 or by post at 1155 16th Street, NW; Washington, DC 20036.
