October 2001
Vol. 4, No. 10, pp 47, 50.
money matters: corporate
(Data) mining in the mountains
Companies that provide mathematical modeling for drug development and corporate planning find homes in Santa Fe.

It has often been said that one of the many things that make New Mexico, and particularly northern New Mexico, so beautiful is the quality of the light. It is somehow richer than in many other places, and it dances across mountains and valleys, over rivers and streams, between pine trees, through multilayered cloud banks, and off the warm adobe buildings that characterize the region’s architecture. For generations, artists have been drawn to the state, lured by its light and by its people, customs, and history.

Lately, a different group of artisans has been moving to Santa Fe, this time to participate in something entirely new. A burgeoning industry of “informatics” companies has chosen to make the nation’s second-oldest city its home.

Informatics, of course, refers to the computer-driven analysis of data, which distills billions upon billions of individual data points into meaningful information that can be used to develop new agricultural, petrochemical, and pharmaceutical products, as well as many other applications based on chaos, string, and other complex system theories. Of particular interest to the pharmaceutical development industry is the power of computer modeling systems to identify potential new drug therapies from the enormous libraries of compounds derived through combinatorial chemistry techniques.

But why locate a new-age tech center in the New Age capital of the country? It can certainly be argued that climate, lifestyle, and numerous state-sponsored business incentives have coaxed computer gurus out of Silicon Valley and into what is fast becoming known as “InfoMesa”. But there is another reason for the emergence of this industry in Santa Fe: It is located a mere 40 miles southeast of Los Alamos and its technological center, the Los Alamos National Laboratory.

Beginning in the mid-1940s, the laboratory served as the development center for atom bombs and, later, hydrogen bombs. As part of that research, the laboratory relied on increasingly evolved “supercomputers” to make mathematical sense of, and subsequently model, complex events such as the path of an explosive shock wave. By the early 1980s, however, new weapon development was ebbing and the laboratory suffered an oversupply of both computers and scientists trained in their use.

In 1984, George Cowan, a research director at Los Alamos, moved to Santa Fe and, through funding from the U.S. Department of Energy, National Science Foundation, and MacArthur Foundation, created the nonprofit Santa Fe Institute. The purpose of the institute is to harness the skills and computing power remaindered at Los Alamos and use them in new ways as varied as weather and stock market prediction. When Nobel laureate Murray Gell-Mann, who lived just north of Santa Fe at the time, agreed to serve as chair of the institute, things began to take off: Today, roughly 25 informatics companies are based in Santa Fe, including the National Center for Genome Resources.

A handful of the 25 InfoMesa companies perform calculations and computations applicable solely to chemical and pharmaceutical development. The others focus on market trend predictions, commodity pricing, and business design simulations that may be of use to many industries. The common thread, however, is that all of the organizations are staffed by individuals with scientific, mathematical, engineering, and other technical backgrounds.

Drug development
Pharmaceutical companies maintain libraries of chemical entities that may or may not have the potential to be developed into products with desirable therapeutic properties. Through advances in combinatorial chemistry, an ever-increasing number of potential drug compounds are created; with the help of high-throughput screening, these compounds are systematically tested against an array of targets.

Given that drug companies often have as many as 3 million chemical compounds in their libraries, the scale of data resulting from synthesis and screening technologies is immense, and the effort necessary to analyze and interpret the data gargantuan.

In the past, pharmaceutical companies conducted their own data screening and analyses, but because of the costs inherent in doing so, many are beginning to outsource this vital task. Here, then, is where the companies in InfoMesa become involved. Using advanced mathematical algorithms and proprietary data-mining systems, the various companies offer pharmaceutical developers a thorough and speedy analysis of their libraries. For example, in a recent test of its predictive abilities, BioReason, Inc., reviewed the raw data from a subset of a pharmaceutical company’s library. From that data, the company successfully identified all the same potentially useful compounds (chemical substances with druglike properties) that the pharmaceutical company’s own researchers had characterized, as well as two additional ones they had overlooked. The more striking result, however, was that the researchers, working in a laboratory, spent several years reaching their results, while BioReason, using software alone, achieved its findings in a matter of hours.

On average, 20% of a pharmaceutical company’s R&D budget is devoted to the development of new chemical entities. Of the compounds chosen for scale-up and testing, 80% fail during preclinical and clinical research, and only 30% of approved drugs ever recoup their development costs. Given this, any technique that speeds the identification of potential therapeutic agents and reduces the costs of doing so has enormous promise.
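The arithmetic behind these figures can be made concrete with a rough sketch. It assumes, purely for illustration, that every compound surviving preclinical and clinical research goes on to be approved, which the figures above do not state; the function name and the cohort of 1,000 compounds are likewise hypothetical.

```python
# Percentages quoted in the article.
FAILURE_PCT = 80   # share of scaled-up compounds that fail in research
RECOUP_PCT = 30    # share of approved drugs that recoup development costs


def expected_recoupers(candidates: int) -> int:
    """Estimate how many scaled-up compounds eventually pay for themselves.

    Assumption (not from the article): every compound that does not fail
    is approved, so the two percentages can simply be chained.
    """
    survivors = candidates * (100 - FAILURE_PCT) // 100
    return survivors * RECOUP_PCT // 100


if __name__ == "__main__":
    # Of 1,000 compounds chosen for scale-up, 200 survive and 60 recoup costs.
    print(expected_recoupers(1000))
```

Under that simplifying assumption, roughly 6 of every 100 compounds chosen for scale-up would ever recover their costs, which is why even modest gains in early selection matter.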

Chemical databases
One of the most important innovations to come from InfoMesa is the development of high-performance techniques for chemical information processing. Some of the most significant of these are predicated on the past creation of a computer-friendly “language” intended to make universal the often confusing and sometimes contradictory nomenclature systems used for identifying chemical compounds.

No practicing chemist has ever avoided the frustration inherent in chemical names. For example, almost any compound can be identified in one of six ways: molecular formula, chemical structure diagram, common name, official IUPAC name, commercial trade name, or some combination of formal and informal nomenclature. These multiple names become even more confusing when non-English-speaking scientists enter the mix, and common names are reduced to chemical slang (e.g., when a chemist refers to “ether” or “ethyl ether”, what is almost certainly meant is the solvent properly known as diethyl ether).

In the early 1980s, while at the U.S. Environmental Protection Agency (EPA) and then Pomona College, Dave Weininger invented SMILES (simplified molecular input line entry system), a new language that makes the rapid analysis of chemical databases possible by allowing scientists to enter chemical information in text form to accurately, completely, and unambiguously name a specific compound. Currently, most chemical and pharmaceutical companies use the SMILES nomenclature in their databases.
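A minimal sketch of the idea, assuming a small hand-built lookup table (the table, the function names, and the choice of compounds are illustrative only and do not come from any Daylight product): several everyday names for the same solvent resolve to a single SMILES string, so a database can compare compounds as plain text.

```python
# Hypothetical lookup table: several names for one compound map to one
# SMILES string. "CCOCC" reads as two ethyl groups joined through oxygen.
SMILES_BY_NAME = {
    "ether": "CCOCC",
    "ethyl ether": "CCOCC",
    "diethyl ether": "CCOCC",
    "ethoxyethane": "CCOCC",   # systematic name
    "ethanol": "CCO",
    "ethyl alcohol": "CCO",
}


def to_smiles(name: str) -> str:
    """Return the SMILES string stored for a compound name.

    Names are matched case-insensitively with extra spaces collapsed.
    Raises KeyError for names that are not in the table.
    """
    key = " ".join(name.lower().split())
    return SMILES_BY_NAME[key]


def same_compound(name_a: str, name_b: str) -> bool:
    """True when both names resolve to the same SMILES string."""
    return to_smiles(name_a) == to_smiles(name_b)


if __name__ == "__main__":
    print(to_smiles("Ethyl Ether"))                  # CCOCC
    print(same_compound("ether", "diethyl ether"))   # True
    print(same_compound("ether", "ethanol"))         # False
```

One caveat: a single molecule can usually be written as more than one valid SMILES string, so comparing raw strings works here only because the table stores one spelling per compound. Production systems first convert each string to a canonical form.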

For his part, Weininger later included the successful system in his InfoMesa-based company, Daylight Chemical Information Systems. The company now supplies several chemistry-related data-mining software packages to a wide variety of industrial users. Among the company’s products are tools that manage combinatorial chemistry databases, predict chemical properties, and generate three-dimensional chemical structures.

Corporate applications
There is more to informatics than drug discovery and chemistry, however. The “science” of business management and corporate strategy can also be modeled mathematically. For example, several of the InfoMesa organizations use “complexity science” to predict, model, and otherwise simulate the business decision-making process, an approach that is useful for forecasting commodity prices and analyzing portfolios for financial service companies. By predicting consumer behavior, risk levels, macroeconomic trends, and competitive pressures, companies can improve their long-term management decisions, and managers can test the simulated outcomes of their choices to see what effects, if any, they might have.

Besides its use in modeling corporate business decisions, the mathematics of informatics is being applied to the world of financial trading. For example, the Prediction Co. combines the real-world expertise of scientists and engineers with simulations and models created using its computer-based technologies. The company’s methods thus attempt to build accurate and consistent models to help investors make better, faster, and smarter choices.

Other technologies are also in use to provide integrated information management systems for science-based companies. Most of the approaches focus on abstract-sounding methods, such as “adaptive neural computation”. The goal of these various techniques, however, is the same: to accurately predict future trends from vast amounts of historical data, yielding forecasts that organizations can use to develop long-term, strategic business plans.

At present, little venture capital funding has found its way into the companies at InfoMesa. But unlike many start-up computer-based organizations in Silicon Valley, the InfoMesa companies are, by and large, realizing healthy profits. As a result, with the diversity and growing industrial importance of the informatics companies, it appears that the light bathing northern New Mexico is becoming increasingly golden.

money matters web
BioReason, Inc.
www.bioreason.com

Daylight Chemical Information Systems, Inc.
www.daylight.com

Prediction Company, LLC.
www.predict.com

For a complete listing of the Santa Fe informatics companies, visit
www.daylight.com/infomesa/index.html

For additional information about InfoMesa, visit
www.wired.com/wired/archive/8.06/infomesa_pr.html

Cullen T. Vogelson is an assistant editor of Modern Drug Discovery. Send your comments or questions regarding this article to mdd@acs.org or the Editorial Office by fax at 202-776-8166 or by post at 1155 16th Street, NW; Washington, DC 20036.
