Chemical & Engineering News,
February 12, 1996

Copyright © 1995 by the American Chemical Society.

Combinatorial chemistry spawns new software systems to manage flood of information

James H. Krieger, C&EN Washington

When combinatorial chemistry burst upon the drug discovery scene, it had a profound effect on chemical information management. Early on, professionals in the field saw that the software systems of the day were not up to the demands of what amounted to a paradigm shift in the practice of chemistry.

Combinatorial chemistry, after all, turned traditional chemistry upside down. It required chemists to think not in terms of synthesizing single, well-characterized compounds, but in terms of simultaneously synthesizing large populations of compounds. It also required that those people involved with information management and computational chemistry systems address the same issues as the chemists. The information systems, too, had until then dealt essentially with one molecule at a time.

"Technology such as combinatorial chemistry and high-throughput screening generate masses of relatively unrefined data - data that are certainly less refined than what chemists produced in the past," says Steven Goldby, president and chief executive officer of MDL Information Systems, San Leandro, Calif., an information management software firm. It's become more important than ever for companies to be able to handle this flood of data, he points out. "Information management is absolutely critical in today's research environment."

Alan Engelberg, product manager for MDL's combinatorial chemistry line of products, puts it similarly. Instead of making a specific compound and then doing a limited amount of screening, he says, chemists are creating a huge amount of information. "So you have an individual chemist actually being much more productive, but in a less refined way."

The same kinds of questions continue to be asked, says Christopher Herd, vice president of life sciences at Molecular Simulations, San Diego, a molecular modeling and computational chemistry software developer. "You're still dealing with targets. You're still trying to deal with recognition of what's responsible for binding or not responsible for binding," Herd adds. "But now you're trying to answer the questions in a different way and more qualitatively."

Yosef Taitz, chief executive officer of Daylight Chemical Information Systems, Irvine, Calif., another information management software firm, characterizes the effect of all this on information management software as "an explosion of demands."

When it first began, combinatorial chemistry focused on synthesis of large molecules, essentially peptides and oligonucleotides. More recently, the emphasis has swung to nonpeptide small molecules - those with molecular weights under 500 daltons.

At the same time, some observers have noted a shift in emphasis from sheer quantity to greater selectivity. One of them is Mark W. Schwartz, vice president of marketing at Tripos Inc., St. Louis, a modeling and computational chemistry software development firm. He notes, for example, that when combinatorial chemistry first became popular, a number of people talked about the value of the technology in terms of making millions, or even billions, of compounds. What's happened in the past year, he says, is that people are coming around to the view that what's important is not the sheer number of compounds but their efficiency.

Information management systems evolved under the older paradigm, when computer technology was first being directed toward chemical applications in the precombinatorial world. One of the challenges then was to give companies a way to keep track of their chemical creations.

As chemical companies - pharmaceutical firms in particular - synthesized molecule after molecule in a search for drug screening candidates, they needed to have records of what their chemists had synthesized, how they had done it, how the compounds had fared in screening, and so forth. Hence, early versions of information management systems were aimed at creating corporate databases of individual compounds in formats that enabled researchers to search the databases in a variety of ways, including, for example, substructures.

But the advent of combinatorial synthesis and the robotics employed brought an overwhelming increase in the volume of structural, biological, and other data that needed to be stored and available for searching. Not only volume changed. So, too, did the uses to which researchers doing combinatorial chemistry put the information and often the form in which they needed it.

Information management is a critical element of essentially all steps involved in a combinatorial synthesis project. Typically, a project begins with planning for a chemical library. The library is a population of molecules to be produced as discrete compounds or as mixtures of compounds that can be biologically screened for activity against a desired target. From the planning stage, the project continues through building of the library by combinatorial synthesis to screening of compounds, archiving of data, and interpretation of results.

The library is thus a central concept in combinatorial chemistry. Its design depends on its intended use, whether in broad screening for lead generation or in so-called chemical analoging for lead optimization. The two are different in nature and present different information management needs. A lead generation library, for example, would tend to be very large with broad structural diversity. A lead optimization library, on the other hand, would be of a more modest size, with narrower structural diversity.

One design approach is to create a virtual library. A chemist might, for example, generate in a virtual space - in a computer - all the compounds that a given synthesis would create. These might be compared against compounds already made - in a corporate database, say - or against other virtual databases, looking for compounds that are the most diverse or the least diverse. These can then be made in a lab. Or the chemist might pick certain properties, so-called metrics - logP (the logarithm of the octanol-water partition coefficient) or molecular weight, for example - to act as a filter for the virtual library to reduce it to the specific library to make in the lab.
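The virtual-library idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's method: the building-block names and their molecular-weight and logP contributions are invented placeholders, summing fragment contributions is a crude stand-in for real property calculation, and the logP cutoff of 5 is an arbitrary choice for the example.

```python
from itertools import product

# Hypothetical building blocks: name -> (molecular-weight contribution, logP contribution).
# All numbers are illustrative placeholders, not measured or computed values.
SCAFFOLD = ("core", 150.0, 1.0)
R1_BLOCKS = {"a1": (15.0, 0.5), "a2": (77.0, 2.0), "a3": (300.0, 3.5)}
R2_BLOCKS = {"b1": (17.0, -0.7), "b2": (91.0, 2.4)}


def enumerate_virtual_library():
    """Yield (name, mol_weight, logp) for every scaffold/R1/R2 combination."""
    core_name, core_mw, core_logp = SCAFFOLD
    for (n1, (mw1, lp1)), (n2, (mw2, lp2)) in product(R1_BLOCKS.items(), R2_BLOCKS.items()):
        yield (f"{core_name}-{n1}-{n2}", core_mw + mw1 + mw2, core_logp + lp1 + lp2)


def filter_library(compounds, max_mw=500.0, max_logp=5.0):
    """Keep only compounds whose metrics fall within the chosen limits."""
    return [c for c in compounds if c[1] <= max_mw and c[2] <= max_logp]


virtual = list(enumerate_virtual_library())
to_make = filter_library(virtual)
print(len(virtual), "virtual compounds;", len(to_make), "pass the filter")
```

The enumeration is cheap because nothing is synthesized; only the compounds that survive the filter would go on to the lab.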

Part of the planning process is to determine the cost and availability of reagents for use as building blocks. Databases of chemicals available for this use are thus an important element of information management for the project. One example of such a database is MDL's Available Chemicals Directory, a collection of chemical products offered by more than 200 chemical suppliers and totaling well over 400,000 compounds.

Indeed, getting the needed reagents is one of the most time-critical steps in combinatorial chemistry, notes Richard D. Cramer III, vice president for science at Tripos. It's very frustrating, he says, but "the rate-limiting step in moving forward - and it's all about a race with time - is getting the reagents in." For a project that might ideally take a month to complete the first round of lead generation, he says, a couple of weeks of that could be spent waiting for reagents.

The synthesis itself, once a library is designed, takes relatively little time. Information management plays a role at this stage by archiving reaction histories for use in often necessary resynthesis. And since many of the production steps are automated, with robotic systems carrying out the operations, information management has a further role in controlling the equipment and procuring the data.

Once the compounds are synthesized, they undergo high-, medium-, or low-throughput screening - involving, for example, in vitro, cellular, and tissue-based bioassays. So acquiring bioassay data is another component of the information management picture. And archiving and relating all the information produced by a combinatorial project is an obvious information management step to aid analysis and interpretation.

One of the things that people are just now coming to appreciate, according to Tripos' Schwartz, is that screening large libraries generates a tremendous amount of data, mostly structure-activity relationship (SAR) and quantitative structure-activity relationship (QSAR) data. One of the real challenges, he says, is to maximize the benefit of the libraries by capturing the data and building SAR and QSAR models. He believes that as time goes on, such models will prove to be quite beneficial in optimizing lead compounds and getting more quickly to a compound with the desired biological activity and minimal side effects.

As combinatorial chemistry has evolved, computational chemistry and information management software companies have developed variations on the themes. Approaches differ, depending on an individual company's starting point, its expertise, and its philosophy toward information management and toward combinatorial chemistry. Among the products are design tools, information management packages, databases, and even custom-synthesized libraries ready for screening.

At one point on the combinatorial chemistry software development spectrum, for example, is Tripos, a developer of molecular modeling and computational chemistry software that focuses essentially on pharmaceuticals and biotechnology. Its combinatorial chemistry software applications add on to its traditional Sybyl software for molecular modeling and analysis and Unison Windows software for desktop data access and management. Legion is one of its combinatorial chemistry applications. Legion enables a chemist using the company's Sybyl line notation to enter, store, and search combinatorial structures. And another product, Selector, provides a set of software tools for use in managing and analyzing chemical diversity.

But Tripos' involvement in combinatorial chemistry goes well beyond just software. About a year ago, the company entered into a strategic research collaboration with Panlabs, a contract research organization based in Bothell, Wash., that performs high-throughput screening, custom combinatorial synthesis, toxicological profiling, and other services. Since then, Tripos also has created a new division called Accelerated Discovery Services.

One activity of that division is to provide a host of research services, ranging from library design or customer database analysis to comprehensive discovery collaborations. Last fall, Tripos announced a combinatorial chemistry collaboration with the Italian pharmaceutical firm Menarini aimed at finding good leads for inhibition of inflammation. Another agreement, as yet unannounced, has been finalized and several more are close, according to Schwartz.

The division's other main area of emphasis is to sell what Tripos calls its Optiverse compound library, a standard screening library consisting of some 100,000 compounds that the company is adding to at a rate of about 8,000 compounds a month. As a library with designed diversity, Optiverse provides a wide range of different compounds. Some customers may opt to buy the entire library as it is produced. Others might want a subset - a focused library - based on reaction type or other criteria. With diversity designed into the library, says Cramer, the 100,000 compounds are thought to be representative of around 100 million compounds chosen randomly.

What customers receive are reaction products synthesized by Panlabs that consist of dried films, shipped in 96-well microtiter plates, with 72 compounds per plate, each reaction product in a specific well. The plate is accompanied by an associated electronic database that includes such information as compound identification, well location, and compound structure.
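A plate database of the kind described might look something like the following Python sketch. The record fields follow the three items named above (compound identification, well location, structure); everything else is assumed for illustration, including the identifier format, the use of SMILES strings for structures, and the choice to fill wells A1, A2, and so on in order. The article does not say which wells of a 96-well plate are left empty.

```python
from dataclasses import dataclass
from string import ascii_uppercase

ROWS, COLS = 8, 12  # standard 96-well layout: rows A-H, columns 1-12


@dataclass(frozen=True)
class WellRecord:
    plate_id: str
    well: str         # e.g. "A1"
    compound_id: str
    smiles: str       # structure stored as a SMILES string (an assumption)


def build_plate_records(plate_id, compounds, per_plate=72):
    """Assign (compound_id, smiles) pairs to wells in row-major order.

    Filling A1, A2, ... in sequence is an assumption made for this sketch.
    """
    if per_plate > ROWS * COLS or len(compounds) > per_plate:
        raise ValueError("too many compounds for one plate")
    records = []
    for index, (compound_id, smiles) in enumerate(compounds):
        row, col = divmod(index, COLS)
        well = f"{ascii_uppercase[row]}{col + 1}"
        records.append(WellRecord(plate_id, well, compound_id, smiles))
    return records


# Hypothetical identifiers; acetic acid is used as a dummy structure.
demo = build_plate_records("P0001", [(f"CMPD-{i:04d}", "CC(=O)O") for i in range(14)])
print(demo[0].well, demo[12].well, demo[13].well)
```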

Referring to the design of diversity, Cramer offers the notion of activity islands in chemical space, something like a 16th-century explorer mapping out the Pacific Ocean. "When you're trying to find the island, you're doing lead discovery," he says. "Once you find an island, you want to get in there and explore the island in detail." One wants to know how big it is and where the high points are in terms of biological activity.

"In this scenario," Cramer says, "we think we've made some important inventions." There are probably thousands of chemical descriptors, he points out. "Some of them are useful and some of them not. We've figured out a method for validating these things." Indeed, Tripos has just filed for patent approval on the technique and technologies for designing diverse combinatorial chemical libraries.

Positioned at a far different point on the combinatorial chemistry spectrum is Daylight Chemical Information Systems. Daylight's focus is not on providing complete solutions to information handling problems but on supplying software toolkits for users to employ in devising their own applications. The toolkits include THOR, a client/server database system designed for storage and retrieval of chemical information, and Merlin, a spreadsheet-like interface to THOR databases for searching and displaying data and structures.

Daylight has chosen SMILES (simplified molecular input line entry specification) for its basic operations. SMILES is one of a number of line-notation formats for representing chemical structures in compact typographical form. Daylight also has chosen to play on the acronym as it has developed some standardized linguistic concepts useful in combinatorial chemistry - hence, Chuckles and Chortles.

SMILES represents a valence model of a molecule. Among other conventions, it uses hyphens, equal signs, and number signs for single, double, and triple bonds. For example, carbon dioxide is represented as O=C=O and acetic acid as CC(=O)O. Computer programs reading SMILES use it as a character string, a molecular graph, a database index, source code for a substructure search, and so on.
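To show what treating SMILES as a character string can mean in practice, here is a minimal Python sketch that splits a SMILES string into tokens and counts the bond symbols written in it. It covers only a small subset of the notation (no charges outside brackets, no two-digit ring closures, no stereochemistry) and is not Daylight's parser. Note that single bonds are normally left implicit in SMILES, so the hyphen rarely appears.

```python
import re

# Token pattern for a small subset of SMILES: bracket atoms, two-letter
# halogens, common organic-subset atoms (aromatic lowercase included),
# bond symbols, branch parentheses, and single-digit ring closures.
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[-=#]|[()]|\d")


def tokenize(smiles):
    """Split a SMILES string into tokens; raise ValueError on unsupported input."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES syntax in {smiles!r}")
    return tokens


def count_bond_symbols(smiles):
    """Count explicitly written bond symbols (implicit single bonds are not counted)."""
    names = {"-": "single", "=": "double", "#": "triple"}
    counts = {"single": 0, "double": 0, "triple": 0}
    for tok in tokenize(smiles):
        if tok in names:
            counts[names[tok]] += 1
    return counts


print(tokenize("CC(=O)O"))            # acetic acid
print(count_bond_symbols("O=C=O"))    # carbon dioxide
```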

Chuckles expresses chemical structure at the "monomer" or molecular chunk level, rather than at the atomic level. Monomers are defined for each application and stored as a monomer table in a file. For example, glycine - NCC(=O) in SMILES - would be Gly in Chuckles; likewise, hydroxy - [OH] in SMILES - would be Oh in Chuckles. Chortles extends Chuckles to represent regular mixtures, where multiple monomer choices in a given position are indicated by semicolon-separated monomers in brackets.
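The monomer-table idea, and the expansion of a mixture into its member compounds, can be sketched as follows. This is a loose illustration rather than Daylight's implementation: the input grammar is a simplification made up for the example, the Ala entry is an added assumption, and joining fragments by plain string concatenation works only because these particular fragments are written with their attachment points at the two ends.

```python
import re
from itertools import product

# Monomer table: monomer name -> SMILES fragment. Gly and Oh follow the
# article's examples; Ala is an added assumption for illustration.
MONOMERS = {"Gly": "NCC(=O)", "Ala": "NC(C)C(=O)", "Oh": "[OH]"}

# Simplified, made-up grammar: a position is either a bracketed list of
# choices such as "[Gly;Ala]" or a single bare monomer name such as "Oh".
POSITION = re.compile(r"\[([^\]]+)\]|([A-Z][a-z]*)")


def parse_positions(text):
    """Return a list of positions, each a list of monomer-name choices."""
    positions = []
    consumed = 0
    for match in POSITION.finditer(text):
        if match.start() != consumed:
            raise ValueError(f"cannot parse {text!r} at offset {consumed}")
        if match.group(1):
            positions.append(match.group(1).split(";"))
        else:
            positions.append([match.group(2)])
        consumed = match.end()
    if consumed != len(text):
        raise ValueError(f"cannot parse {text!r} at offset {consumed}")
    return positions


def expand_to_smiles(text):
    """Expand a sequence or mixture string into the SMILES of every member."""
    positions = parse_positions(text)
    return ["".join(MONOMERS[name] for name in combo) for combo in product(*positions)]


print(expand_to_smiles("GlyOh"))                  # a single compound
print(expand_to_smiles("[Gly;Ala][Gly;Ala]Oh"))   # a four-member mixture
```

Two positions with two choices each yield four dipeptides, which is the sense in which one short mixture string stands for a whole population of compounds.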

Daylight says that with its linguistic approach, the monomer concept, and Chuckles and Chortles as languages, it has pretty much got all the requirements covered for what might be coming up in combinatorial chemistry. "There might be some challenges," says Taitz, "but definitely not major challenges that we cannot overcome or modify."

A new major release of the Daylight product slate, release 4.5, is imminent. A feature of the release applicable to combinatorial chemistry is that SMILES is being further extended to handle reactions. Among the release's capabilities are a reaction toolkit, with support for reactions, reaction patterns, reaction searching, and transformations. And THOR and Merlin servers will be upgraded with reaction-handling capabilities for database building, retrieval, and searching. Daylight will have a beta version of 4.5 by the end of this month and expects to formally release it sometime between April and July.

MDL's approach to combinatorial chemistry is embodied in its Project Library software, introduced about a year ago. Project Library is a desktop software application, designed, as its name suggests, for use at the project level to manage the chemical and biological data coming from combinatorial syntheses. It teams with MDL's ISIS (integrated scientific information system) to manage information flow throughout the combinatorial chemistry process.

For example, Project Library enables a researcher to build, store, search, and archive combinatorial chemistry and associated biological data. The combinatorial libraries can include specific structures represented as discrete compounds. Or they can incorporate generic structures that represent hundreds to millions of specific compounds, along with building blocks or fragments of molecules that represent R-groups attached to the basic generic structures.
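Why a generic structure is such a compact way to store a library can be seen in a short Python sketch. It is not a description of how Project Library works; the class name, field names, and substituent labels are all hypothetical. The point is only that the number of specific compounds is the product of the number of choices at each R-group position, and that any one compound can be recovered from its index without listing the rest.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class GenericLibrary:
    """A generic structure plus the substituent choices at each R-group position."""
    core: str         # label for the generic core structure
    r_groups: tuple   # one tuple of substituent labels per R-group position

    def size(self):
        """Number of specific compounds the generic structure represents."""
        return prod(len(choices) for choices in self.r_groups)

    def member(self, index):
        """Return the index-th specific compound without enumerating the library."""
        if not 0 <= index < self.size():
            raise IndexError(index)
        picked = []
        # Decode the index as a mixed-radix number, last position varying fastest.
        for choices in reversed(self.r_groups):
            index, digit = divmod(index, len(choices))
            picked.append(choices[digit])
        return (self.core, *reversed(picked))


small = GenericLibrary("core", (("Me", "Et", "Ph"), ("H", "Cl"), ("OH", "NH2", "OMe", "F")))
print(small.size(), small.member(5))

# Three positions with 100 hypothetical building blocks each.
big = GenericLibrary("core", tuple(tuple(f"R{p}_{i}" for i in range(100)) for p in range(3)))
print(big.size())
```

In the second case, 300 stored building blocks stand for a million specific compounds.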

An extension of the MDL approach is now on the drawing boards. A new software product called Central Library is a critical element in MDL's long-term combinatorial strategy, according to Goldby. "It will be the principal focus of our combinatorial efforts for the next year or two," he says.

Whereas Project Library deals with restricted sets of data that scientists are generating in their own project, Central Library will enable scientists to integrate combinatorial data with existing corporate compound data and bioassay data at any step in the workflow process. "Eventually," Goldby says, "Project Library will act as the client to Central Library. And so we will have a client/server solution in this marketplace."

In recent years, MDL has established a particularly close relationship with molecular modeling and computational chemistry software developer Biosym Technologies and with computer manufacturer Silicon Graphics. The idea was that each of the companies would pay particular attention to making sure its products meshed smoothly with those of the other partners. That relationship has continued following last summer's merger of Biosym and Molecular Simulations.

The merged company has just announced that in late March it will be known as Molecular Simulations Inc. (MSI), with headquarters in San Diego. It also has announced a combinatorial chemistry software product of its own, called C2·Diversity. Marvin Waldman, director of rational drug design at the company, explains that C2·Diversity was basically developed to provide a guideline for design and analysis of combinatorial libraries. It is designed to maximize the coverage of property space, enabling the selection of the most diverse R-group fragments or whole molecules that give the broadest span of various 2-D and 3-D descriptors that have been found useful in QSAR.
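One common way to pick a spread-out subset in descriptor space is a greedy "MaxMin" selection, sketched below in Python. This is a generic technique chosen for illustration; the article does not say what algorithm MSI's product uses. The descriptor values are invented, and in practice descriptors would need to be scaled to comparable ranges before distances mean anything.

```python
from math import dist


def maxmin_select(descriptors, k, seed_index=0):
    """Pick k mutually distant items from a list of descriptor vectors.

    Greedy MaxMin: start from one item, then repeatedly add the candidate
    whose distance to its nearest already-selected item is largest.
    Returns the selected indices in pick order.
    """
    n = len(descriptors)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of candidates")
    selected = [seed_index]
    chosen = {seed_index}
    # nearest[i] = distance from candidate i to the closest selected item so far
    nearest = [dist(d, descriptors[seed_index]) for d in descriptors]
    while len(selected) < k:
        remaining = [i for i in range(n) if i not in chosen]
        best = max(remaining, key=lambda i: nearest[i])
        selected.append(best)
        chosen.add(best)
        for i, d in enumerate(descriptors):
            nearest[i] = min(nearest[i], dist(d, descriptors[best]))
    return selected


# Hypothetical two-descriptor vectors for five candidate fragments.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 0.8), (0.9, 0.1)]
print(maxmin_select(points, 3))
```

The near-duplicates of already chosen points (the second and fifth candidates here) are passed over in favor of points that open up new regions of the space.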

C2·Diversity is a module that plugs into the company's Cerius2 molecular modeling environment and product line, designed so that users can tailor the system to their particular research needs. Hence, the Cerius2 software developer's kit (C2·SDK) can be used by customers to incorporate their own customized functionality into C2·Diversity.

Because of the high demand for combinatorial chemistry software, Waldman says, MSI decided to make C2·Diversity available now as what it calls an early access product. The official commercial release is planned for June. Waldman explains that the current product is reasonably robust and, although the level of documentation isn't what it will be in June, the software basically contains all the functionality.

As a long-term strategy to help guide its development efforts, Biosym had formed a number of consortia that brought together representatives from the consortium members with the software company's scientists and programmers to focus on specific areas of technology. That approach has continued in the merged company, and MSI now has a newly formed Combinatorial Chemistry Consortium.

According to Judith Hempel, director of collaborative R&D for life sciences at MSI, the consortium will address the issues of protocols and procedures for assessing diversity and sampling and designing libraries. One specific goal, she says, is to develop software with next-generation methods for 3-D selection and comparison of molecules.

Chemical Design, a molecular modeling and computational chemistry software development firm headquartered in the U.K., with U.S. operations in Mahwah, N.J., last year formed its own Combinatorial Chemistry Special Interest Group. The group was established, according to the company, to make sure that Chemical Design's ChemDiverse software meets today's requirements. ChemDiverse is a module for the company's Chem-X molecular modeling and computational chemistry software.

A feature of the system is its use of pharmacophore plots to give a picture of the pharmacophores found for a combinatorial library and therefore the diversity of the library or mixture. Pharmacophores are those structural features of a molecule required for particular biological activity. For the 3-D plots, axes represent distances between interaction centers, with each symbol depicting a particular pharmacophore type and geometry. The plots can thus be used to visually compare the diversity of different libraries.
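The idea of describing a pharmacophore by the distances between its interaction centers can be illustrated with a rough Python sketch. It is not Chemical Design's method: the center types, the 1-angstrom bin width, the coordinates, and the deliberately coarse key (sorted types plus sorted binned distances, which discards which distance separates which pair of centers) are all assumptions made for the example.

```python
from itertools import combinations
from math import dist


def pharmacophore_keys(centers, bin_width=1.0):
    """Return the set of three-point pharmacophore keys for one conformation.

    `centers` is a list of (center_type, (x, y, z)) pairs, coordinates in
    angstroms. Each key pairs the sorted center types with the sorted,
    binned pairwise distances, so it is unchanged by rotation or translation.
    """
    keys = set()
    for (t1, p1), (t2, p2), (t3, p3) in combinations(centers, 3):
        types = tuple(sorted((t1, t2, t3)))
        pairs = ((p1, p2), (p1, p3), (p2, p3))
        bins = tuple(sorted(int(dist(a, b) // bin_width) for a, b in pairs))
        keys.add((types, bins))
    return keys


def library_coverage(molecules, bin_width=1.0):
    """Union of pharmacophore keys over all molecules in a library."""
    covered = set()
    for centers in molecules:
        covered |= pharmacophore_keys(centers, bin_width)
    return covered


# Hypothetical molecules: mol_b is mol_a translated, so it adds no new key.
mol_a = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.5, 0.0, 0.0)), ("hydrophobe", (0.0, 4.5, 0.0))]
mol_b = [("donor", (10.0, 10.0, 10.0)), ("acceptor", (13.5, 10.0, 10.0)), ("hydrophobe", (10.0, 14.5, 10.0))]
print(len(library_coverage([mol_a, mol_b])))
```

Under this scheme, two libraries could be compared by the number of distinct keys each one covers, which plays the same role as comparing the plots by eye.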

The challenges faced by the software development firms aren't trivial, as evidenced, for example, in the observations of Columbia University chemistry professor W. Clark Still. Some of what is being done with computer software for combinatorial chemistry - database software to register compounds in libraries, for example - is needed and going to be useful, Still points out. On the other hand, he says it is not clear that other types of software - those that claim to measure diversity or to allow intelligent compound picking, for example - give valid and reliable answers.

"The problem," Still explains, "is that it is easy to develop a reasonable algorithm and to program it, but it is much more difficult to establish the validity or utility of the algorithm in a scientifically convincing way."

Whatever their individual approaches, though, the software development firms are pouring a great deal of effort into establishing that validity and utility as the field of combinatorial chemistry develops.

