
Proteomics: New tools for a new era
BY ALED M. EDWARDS, CHERYL H. ARROWSMITH, and BERTRAND des PALLIERES
More than five years ago, the pharmaceutical sector began to invest heavily in genomics to increase the supply of validated drug targets. The genomic gold rush that followed, which has now led to the completion of the human genome sequence, has flooded the drug discovery pipeline with more than 75,000 sequences of potential targets. Unfortunately, although genomics delivered the mass of raw information as promised, genomics technologies have largely been unable to extract the anticipated 10,000 therapeutic targets from the crude sequence ore.
The inability to identify valid drug targets by examining gene sequence information has created a gap between genomics and drug discovery. This gap reflects the fact that in most cases, gene sequence reveals little about protein function or disease relevance. Accordingly, the true value of the genome sequence information will be realized only after a function has been assigned to all of the encoded proteins.
Proteomics seeks to provide functional information for all proteins. Much like genomics, proteomics is more of a concept than a defined technology, and it refers to any protein-based approach that has the capacity to provide new information about proteins on a genomewide scale. The challenge facing proteomics is enormous, because more than 75% of the predicted proteins in multicellular organisms have no known cellular function. However, proteomics is poised to yield remarkable discoveries because this set of proteins is likely to include new enzymes, signaling molecules, and pathways that may be excellent and unanticipated therapeutic targets.
Applying proteomics technologies will not only provide validated targets for drug discovery but will also increase the efficiency of the drug discovery process downstream.
For example, genomewide protein purification efforts will provide reagents for high-throughput screens, and structural proteomics efforts will provide three-dimensional structures for drug development. Clearly, proteomics is destined to bridge the gap between genomics and drug discovery.
The primary goal of proteomics is to provide functional annotations for the entire proteome. Of course, the function of a protein has many definitions, ranging from its biochemical activity to its physiological role, and so the optimal proteomics strategy must integrate many different technologies. True proteomics applications must also be unbiased in design so that they are poised to discover the unknown. This article gives an overview of the technologies most relevant to the drug discovery process and offers some ideas about developing proteomics technologies.
Pure proteins: The biological infrastructure
It will be important to couple these massive purification efforts with a quality control strategy that ensures the purity and structural integrity of the purified proteins. Proteins or protein fragments produced in heterologous expression hosts, in vitro transcription-translation reactions, or in vivo two-hybrid screens are commonly misfolded. The magnitude of this problem is often underappreciated. In our large-scale purification effort, we have cloned, expressed, and analyzed thousands of yeast, bacterial, and archaeal proteins that were predicted to be cytoplasmic and to have no known structural homologue. We observed that more than 50% of the proteins were insoluble or had an unstable structure (1).
Using such improperly folded proteins for biochemical and pharmacological screens or assays would likely lead to false positive or false negative results. Proteomics scientists who use them also risk contaminating their databases with incorrect results and wasting time and money on fruitless targets. The importance of ensuring the structural integrity of protein reagents is often underestimated by those more versed in molecular biology and genomics methods. In the interest of speed and throughput (and marketing!), proteomics researchers should not forget the fundamentals of protein structure and biochemistry.
The proteome: Target discovery
Defining the protein composition of a cell must also take into account the fact that mRNA splicing and covalent modifications generate protein isoforms that might contribute to important regulatory processes in the cell. Documenting the extent to which a protein is modified and the temporal changes in the modifications during disease can provide strategies for therapeutic intervention. Several approaches are being used to study post-translational modifications on a proteome-wide scale. Again, the most popular approach couples mass spectrometry (MS), which can detect even subtle covalent modifications, with methods to specifically enrich for modified proteins (5). Other strategies include the use of modification-specific antibodies (6).
The techniques that catalog changes in gene expression, protein levels, or modification due to disease or other cellular perturbations are powerful methods of identifying potential targets for drug discovery. However, they do not reveal the biochemical mechanism by which a gene product is related to disease or whether the protein is likely to be amenable to drug development. To resolve these issues, proteomics approaches that examine protein function are required.
Chemical proteomics: Screens for activity and binding
Many of the predicted proteins may also have catalytic functions not previously characterized. Although it is impossible to screen for chemical reactions that are unknown, in theory, identifying small molecules that bind to the new proteins may provide clues to new activities. These ligands might be found by screening the new proteins against diverse chemical libraries using existing methods such as NMR spectroscopy (8), microcalorimetry, or microarrays (9). The general concept of ascribing function to new proteins by discovering small-molecule ligands might be referred to as chemical proteomics. Of course, chemical proteomics screens would also provide new chemical entities for drug development.
Structural proteomics: Target validation and development
The principle that structure underlies function, often in the absence of sequence homology, has launched a new branch of functional genomics known as structural genomics or structural proteomics (11). The aim of structural proteomics is to provide three-dimensional information for all proteins.
For the pharmaceutical industry, access to structural information on a proteome-wide scale is important at several levels. Structural information can be used to ascribe function, thereby revealing new potential drug targets; to validate targets based on homology to other proteins known to bind specific small molecules; to invalidate targets with structural properties that do not lend themselves to binding a drug; to aid the development of hits into leads and leads into drugs using structure-based methods; and to perfect structure-prediction algorithms, which will eventually allow scientists to predict structure and function from sequence.
There are nascent structural proteomics projects in both the public and private sectors. The current public projects are located in Germany, Canada, Japan, and the United States, with aggregate public funding of more than US$100 million.
Interaction proteomics: Target validation
The technology most commonly used to discover protein-protein interactions on a genomewide scale is the yeast two-hybrid system (Figure 2) (12). Many new protein interactions have been discovered using this system, but despite its power, it also has significant shortcomings. First, it is not uncommon to have several false positive interactions for every valid interaction, and distinguishing the wheat from the chaff is time-consuming. Second, comprehensive genomewide two-hybrid screens have identified only a fraction of known interactions. Finally, the method can characterize only bimolecular interactions; proteins that exist in large assemblies are less amenable to two-hybrid analysis.
Alternative proteomics technologies are being developed to complement the two-hybrid system. These methods reveal direct protein-protein interactions by using protein affinity chromatography (Figure 3).
Protein affinity chromatography, as developed by Greenblatt, Alberts, and colleagues (13), has the disadvantage of requiring purified proteins as reagents, but it is superior to the two-hybrid approach because it generates fewer false positives and is more amenable to high-throughput screening. With this technique, the purified protein of interest is immobilized on a solid support, and proteins or small molecules that associate with it are identified by gel electrophoresis and mass spectrometry. This method, which has been used to discover protein interactions in prokaryotic and eukaryotic systems, can characterize protein interactions having affinities in the range of 3 µM or stronger and can purify proteins or protein complexes whose levels in cell extracts are as low as 1/100,000.
Bioinformatics: The next decade
Although such databases are not yet comprehensive, they are already being developed and mined. In the functional genomics sector, databases linking the yeast transcriptional profile to disruptions in specific pathways have been used to predict the function of proteins (14). In the proteomics sector, databases linking protein sequences with biophysical properties have been created, and rules that govern protein solubility and protein crystallization are being extracted (1). As application of this knowledge-discovery concept moves from individual proteins to protein pathways and then to cellular pathways, we will see a dramatic increase in the efficiency of the drug discovery process. Within a decade, the pharmaceutical industry will certainly start to harvest the fruits of these new strategies.
References