CHEMTECH
September 1998
CHEMTECH 1998, 28(9), 35-40.
Copyright © 1998 by the American Chemical Society.
ENABLING SCIENCE
Building the shortest synthesis route
The goal is to make the target compound in the fewest steps possible, thus avoiding wasteful yield losses and minimizing synthesis time.
James B. Hendrickson
R&D laboratories synthesize many new compounds every year, yet there seems to be no clear protocol for designing acceptable and efficient routes to target molecules. Indeed, there must be millions of ways to do it. Some years ago, in an effort to use the power of the computer to generate all the best and shortest routes to any compound, my group at Brandeis began to develop the SYNGEN program (1-3). The task is huge, even for the computer.
Imagine a graph that traces the process of building up a target molecule; we call it a synthesis tree (Figure 1). The starting materials for the possible synthesis routes are molecules we can easily obtain. As the routes progress, new starting materials are added from time to time until the target is obtained. Each line represents a reaction step, or level, from one intermediate to another, and each step decreases the yield. Two of many possible routes are traced in Figure 1.
To find these routes, we presume to start with the target structure and a catalog of all possible starting materials. Then, the computer generates all the points (intermediates) and lines (reactions) of the graph. If the computer has been programmed with an extensive knowledge of chemical reactions, it could do this by generating all possible reactions backward one step from the target structure to the intermediate structures, then repeating this on each intermediate as many times as necessary to return to the available starting materials. At this stage, the problem gets too big. Suppose there are 20 possible last reactions to the target (level 1) and that each of these reactions also has 20 possible reactions back to level 2. Going back only five levels will generate 20^5 (3.2 million) routes. How do we select only one to try in the laboratory? This generation of reactions and intermediates is a brute-force approach; clearly, it must be focused and simplified with some stringent logic. The central criterion should be economy - that is, to make the target in the fewest steps possible, thus avoiding wasteful yield losses and minimizing synthesis time.
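The arithmetic behind this estimate, and the compounding yield loss that motivates the economy criterion, can be checked in a few lines. This is only a sketch: the function names are hypothetical, and the 80% per-step yield is an assumed illustrative figure that does not come from the article.

```python
def route_count(branching: int, levels: int) -> int:
    """Number of distinct routes when every intermediate has `branching`
    possible precursor reactions and we go back `levels` levels."""
    return branching ** levels


def overall_yield(step_yield: float, steps: int) -> float:
    """Overall yield of a sequence of `steps` reactions, assuming every
    step has the same fractional yield (an assumption of this sketch)."""
    return step_yield ** steps


# The article's example: 20 choices per level, five levels back.
routes = route_count(20, 5)          # 3,200,000 routes

# Assumed 80% yield per step: each extra step costs material.
five_steps = overall_yield(0.80, 5)
ten_steps = overall_yield(0.80, 10)
```

Under these assumptions, doubling the number of steps squares the fractional yield, which is why the shortest route is also the least wasteful one.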
A protocol for synthesis generation
The central feature of any synthesis is the assembly of the target skeleton from the skeletons of the starting materials. Looking for all the possible ways of cutting the target skeleton into the skeletons of available starting materials represents a major focus for examining the synthesis tree. We illustrate this task by looking at the steroid skeleton of estrone and cutting it in two at different points in the structure (Figure 2). Each cut creates two intermediate skeletons, and each skeleton is then cut in two again to obtain four skeletons. This procedure creates a convergent synthesis, and convergent routes are the most efficient (4). With four starting skeletons, we will need to construct only 6 (or fewer) of the 21 target skeleton bonds. We could keep dividing each skeleton until we ultimately arrive at a set of one-carbon skeletons, but it is not necessary to go that far, that is, to a "total synthesis".
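One way to see why a convergent plan is more efficient is to compare the longest chain of joining steps that any starting piece must pass through. The sketch below assumes one joining step per union of two pieces and represents a plan as nested tuples. The piece labels A to D and the function name are hypothetical, and this is not SYNGEN's own representation.

```python
def longest_path(plan) -> int:
    """Longest chain of joining steps from any starting piece to the product.

    A plan is either a string (a starting skeleton) or a pair of sub-plans
    that are joined in one step.
    """
    if isinstance(plan, str):
        return 0
    left, right = plan
    return 1 + max(longest_path(left), longest_path(right))


# Linear plan: add one piece at a time.
linear = ((("A", "B"), "C"), "D")

# Convergent plan: cut the target in two, then cut each half in two.
convergent = (("A", "B"), ("C", "D"))

linear_depth = longest_path(linear)          # 3
convergent_depth = longest_path(convergent)  # 2
```

Both plans use the same four pieces and three joining operations, but in the convergent plan no piece passes through more than two of them, so less material is lost along the way.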
Each of our four starting skeletons represents a family of many compounds with different functional groups placed on the same skeleton. Suppose that we find a set in which all four skeletons are represented by real compounds in an available library of starting materials; this set could form the basis of a synthesis route with no more than six construction steps to the steroid if the functional groups are right. The skeletal bonds we cut, which must be constructed in the synthesis route, are called a bondset, and these bondsets are a basis for generating the shortest syntheses. Each skeletal bondset represents a whole family of potential syntheses.
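A bondset can be pictured as the set of skeletal bonds whose two carbons end up in different pieces. The following minimal sketch encodes the 18-carbon estrone skeleton with conventional steroid numbering and computes the bondset for one dissection into four pieces. The particular dissection, the piece names, and the function name are assumptions chosen for illustration; they are not taken from Figure 2 or from SYNGEN.

```python
# Carbon-carbon bonds of the estrone skeleton (conventional steroid numbering).
BONDS = [
    (1, 2), (2, 3), (3, 4), (4, 5), (5, 10), (1, 10),   # ring A
    (5, 6), (6, 7), (7, 8), (8, 9), (9, 10),            # ring B
    (9, 11), (11, 12), (12, 13), (13, 14), (8, 14),     # ring C
    (14, 15), (15, 16), (16, 17), (13, 17),             # ring D
    (13, 18),                                           # methyl on C13
]

# One hypothetical dissection into four connected pieces.
PIECES = {
    "A": {1, 2, 3, 4, 5, 10},
    "B": {6, 7, 8, 9},
    "C": {11, 12},
    "D": {13, 14, 15, 16, 17, 18},
}


def bondset(bonds, pieces):
    """Return the bonds whose two atoms lie in different pieces,
    i.e. the bonds that the synthesis would have to construct."""
    owner = {atom: name for name, atoms in pieces.items() for atom in atoms}
    return [(a, b) for a, b in bonds if owner[a] != owner[b]]


cut = bondset(BONDS, PIECES)
# cut == [(5, 6), (9, 10), (9, 11), (12, 13), (8, 14)]
```

For this dissection, five of the 21 skeletal bonds must be constructed, within the limit of six mentioned above. Enumerating other partitions in the same way would give other bondsets.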
The ideal synthesis
Imagine a synthesis route with its set of starting materials chosen so
that their functional groups are correct to initiate the first
construction, leave a product correctly functionalized for the second
construction, and so on, continuing to construct skeletal bonds until
the target skeleton is built. This is the ideal synthesis in that it
must have the fewest steps possible. It requires no
In a survey of many syntheses, we found that the average nonaromatic starting material has a
We think we can do better. Building the shortest, most economical
syntheses requires first finding those skeletal dissection bondsets
with the fewest bonds, to minimize construction reactions. It also
requires no more than four correctly functionalized starting materials,
to minimize
Generating the chemistry
Any carbon in a structure can have four general kinds of bonds, as
summarized in Figure 3: skeletal bonds to other carbons (R);
A reaction change at each carbon is just a simple exchange of one bond type for another. This change may be designated by the two letters for the bond made and for the bond lost. Thus, reaction HZ indicates making a bond to hydrogen by loss of a bond to heteroatom--that is, a reduction. The 16 possible combinations are shown and described with general reaction families in Figure 3. Using this system, we can generate all possible generalized reactions,
forward or backward, from any structure. No routes are missed, and we
can find all the best routes back from the target to real starting
materials. Relatively few generalized reactions are created, and we
refine the abstract into real chemistry only at the end. When starting
materials are generated through successive applications of these
reaction families, we can look them up in the catalog, where they are
indexed by skeleton and by generalized
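The two-letter reaction labels can be enumerated mechanically as ordered pairs of "bond made" and "bond lost". In the sketch below, H, R, and Z are the letters used in the text. The fourth bond kind is not named in the surviving text, so the label "P" (standing for a pi bond to carbon) is an assumption of this example, as are the variable names.

```python
from itertools import product

# Bond kinds at a carbon. "P" is an assumed label for the fourth kind,
# which is not spelled out in the text above.
BOND_KINDS = {
    "H": "bond to hydrogen",
    "R": "skeletal bond to another carbon",
    "P": "pi bond to carbon (assumed label)",
    "Z": "bond to heteroatom",
}

# Each unit reaction at a carbon is (bond made, bond lost).
UNIT_REACTIONS = [made + lost for made, lost in product(BOND_KINDS, repeat=2)]

# "HZ": make a bond to hydrogen, lose a bond to heteroatom (a reduction).
assert "HZ" in UNIT_REACTIONS
```

Four bond kinds give 4 x 4 = 16 ordered pairs, matching the 16 combinations mentioned for Figure 3.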
The SYNGEN program
In the second phase (Figure 4, right side), this ordered bondset is followed, one bond at a time, generating the construction reactions for an ideal synthesis until all of the functional groups have been generated. These actual starting materials are found in the catalog, so a full synthesis route can be written from them that goes up the right side in a quick, constructions-only ideal synthesis of the target. The target structure of this three-step synthesis can be converted to estrone in two more steps. The prediction for an average synthesis would have been much longer.
The catalog for the current version of SYNGEN has about 6000 starting materials, but it is being expanded from available chemicals directories. After the target is drawn on the screen, the program generates the best routes in less than 1 min. It displays the bondsets, the starting materials used, and the actual routes, which are ordered by their calculated overall cost. The output screen from SYNGEN for the example analyzed in Figure 4 is shown in Figure 5. Two other sample outputs, from a different bondset of the same target, are shown in Figures 6 and 7. The notations on the arrows use abbreviations to describe the nature of the reaction; explanations are available on a help screen. The routes shown are still in a generalized form and require further elaboration of chemical detail by the user. Literature precedents, however, are being added to the program, as described later.
The future of SYNGEN
The second development deals with a major problem in previous versions of SYNGEN: The program generated too many reactions that chemists saw as clearly nonviable. Such results tended to destroy their confidence in the program as a whole. We now have a way to validate the generated reactions from the literature, eliminating many of these nonviable reactions. The generalizing procedure for describing structures and reactions in SYNGEN also was applied to create an index-and-retrieval system to find matches for any input query reaction from a large database of published reactions. This program, RECOGNOS, has been applied to an archive of 400,000 reactions originally published between 1975 and 1992 and packaged as a single CD-ROM that allows instant access to matching precedents in that archive (6, 7). The RECOGNOS program is available on CD-ROM from InfoChem GmbH, Munich, Germany, combined with their ChemReact database of 370,000 reactions and renamed "ChemReact for Macintosh".
This archive of literature reactions, now almost double its original size, has been distilled to more than 100,000 construction reactions. These reactions, in turn, have been converted into a look-up table for use by the SYNGEN program. With this tool, SYNGEN can validate any reaction it generates by searching for matches in the archive and determining the average yield. Unprecedented reactions are therefore set aside, and a realistic yield can be estimated for each reaction to be used in the overall cost accounting.
We believe that SYNGEN has considerable potential for discovering new alternatives for creating organic chemicals in the most economical way possible. Even when the program does not yield a directly usable synthesis, it often starts the chemist thinking about different approaches previously not considered. No chemist can think of all the possible routes to the target, but SYNGEN does this fast. It also provides a powerful and focused output of the possibilities.
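The validation step described above, in which a generated reaction is looked up among literature precedents and its average yield feeds the cost accounting, might be sketched as follows. Everything here is hypothetical: the reaction keys, the placeholder yields, and the function names are invented for illustration and do not reflect the actual RECOGNOS or SYNGEN data structures.

```python
from math import prod
from statistics import mean

# Hypothetical look-up table: generalized reaction key -> literature yields.
# The keys and numbers are made-up placeholders, not archive data.
PRECEDENTS = {
    "rxn_a": [0.80, 0.90],
    "rxn_b": [0.60],
}


def estimated_yield(reaction, table):
    """Average yield of matching precedents, or None if unprecedented."""
    yields = table.get(reaction)
    return mean(yields) if yields else None


def route_yield(route, table):
    """Estimated overall yield of a route, or None if any step lacks a
    precedent (such routes are set aside rather than scored)."""
    estimates = [estimated_yield(step, table) for step in route]
    if any(estimate is None for estimate in estimates):
        return None
    return prod(estimates)


validated = route_yield(["rxn_a", "rxn_b"], PRECEDENTS)   # about 0.51
set_aside = route_yield(["rxn_a", "rxn_c"], PRECEDENTS)   # None
```

Routes that survive validation could then be sorted by estimated yield or by a cost derived from it. The article does not specify the cost formula, so none is assumed here.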
Acknowledgments
References