CHEMISTS TURNED VISIONARIES
As computers become faster and have more power, chemists will use them to solve problems they are just beginning to imagine
Elizabeth K. Wilson
C&EN West Coast News Bureau
You know technology has evolved at a breakneck pace when the 1970s are referred to as "the dark ages." By today's standards, computers back then were barely more than overgrown abacuses, with perhaps a few dozen kilobytes of memory and chips whose clocks ticked at the glacial rate of a fraction of a megahertz.
The computational speed of the first supercomputers of the 1970s was a less-than-galloping 100 megaflops--flops being floating point operations per second. But a mere three decades later, tech-savvy scientists enjoy more power than that from their desktop personal computers (PCs). And the biggest supercomputers of today have peak speeds of more than a teraflops--10^12 flops.
On the horizon loom PCs with a gigabyte of memory and 800-MHz chips, and IBM recently announced its five-year plan to build the world's first petaflops--10^15 flops--supercomputer.
Thanks in large part to the astounding progress of computer science, computational science--notably in chemistry--has also exploded. Chemists routinely use computers to perform electronic structure calculations of large molecules, thermodynamic calculations, and dynamics simulations. That's a far cry from the capabilities of the field in the dark ages.
Back in the 1970s, "we were lucky if we could predict energies to within ±80 kcal per mole," according to Thom H. Dunning Jr., assistant director of advanced computational modeling and simulation in the Department of Energy's Office of Science in Germantown, Md. "Clearly, that was totally useless for the kinds of applications real-life chemists have to face."
But now, programs calculate those same energies with accuracies on the order of 1 kcal per mole. Dunning and numerous others who spoke at a symposium at the national meeting of the American Chemical Society in San Francisco last month said chemists should expect even more from the future.
The symposium, "Computing in the 21st Century: How Far Can Chemistry Go with the Hardest Problems?" was cosponsored by the Division of Computers in Chemistry and the Division of Biochemical Technology. Speakers cast their visionary net over the next few decades, contemplating the promise computers hold for hardware, software, and for the huge problems waiting in the wings that chemists would love to solve.
Certainly, the development of computers themselves shows no sign of slacking off. Chemists will need all the firepower they can get and more to unravel, for example, the astoundingly huge pile of information embedded in the decoded human genome. They'll need speed and memory to model complex systems, such as explosions and the flow of environmental waste.
But, said a number of the speakers, performing those tasks is not just a matter of relying on increasing computational brute force. Eventually chip designers and materials engineers butt up against the issue of cost, to say nothing of the laws of physics. They observed that it's vital to continue improving theory, as well as to make sure the computational programs they write work well with new computer architectures--most notably, those of massively parallel computers.
And no one doubts that, only a few decades out of the dark ages, technology is swiftly approaching its own age of enlightenment.

A rendering of ASCI White, a 10-teraflops computer being installed at Lawrence Livermore National Laboratory
A better mousetrap
Remember Pong, the now ridiculously simplistic table-tennis video game that debuted in 1972? Compare that with the just-released Sony Playstation II, an über-tainment center with a 300-MHz, 6.2-gigaflops chip and allegedly breathtaking graphics.
The same thing has happened in scientific computing. Even just 10 years ago, the biggest computer in DOE's Office of Science's civilian computing facilities ran only 2 gigaflops, Dunning said. Now scientists are contemplating what they'll do with a petaflops computer.
The symposium's organizer, theoretical chemist David A. Dixon of the William R. Wiley Environmental Molecular Sciences Laboratory at Pacific Northwest National Laboratory (PNNL) in Richland, Wash., is one of those scientists. In addition to studying environmental waste, he's chairing a computational chemistry task force for the Council of Chemical Research's Technology Vision 2020, a broad exploration of chemistry's future.
Head-Gordon's lab predicted the structure of 1POU, a 72-amino-acid DNA binding protein (left), compared with the crystal structure (right).
The Vision 2020 task force would like to see computing advance to the point where chemists can routinely design catalysts and other materials and predict biological activity or environmental fate from chemical structure, Dixon said.
To that end, it's a given that, at least for the near future, computers will continue to increase in memory capacity and performance.
"I believe individual processors will continue to increase in speed for the foreseeable future--probably for 10 years--until we start hitting the brick wall of physics," said Andrew Komornicki, a senior business development manager at Sun Microsystems in Palo Alto, Calif., who was also a theoretical chemist.
Computers are also becoming increasingly Internet-centered. Users of the future will eventually have universal access to information through Internet-based systems, Komornicki said.
Some would say the future is already here. Right now, Lawrence Livermore National Laboratory in Livermore, Calif., is readying itself for a new addition to its already formidable supercomputing facility: a 10-teraflops supercomputer from IBM, dubbed ASCI White. The computer overshadows its cousin, the 3-teraflops ASCI Blue Pacific. Both are part of the Accelerated Strategic Computing Initiative (ASCI), the Department of Energy's program for using computers to model the behavior of the nation's aging nuclear stockpile to avoid actual nuclear tests.
IBM also recently unveiled its five-year plan to build the world's biggest supercomputer, which it calls Blue Gene (C&EN, Dec. 20, 1999, page 35). This behemoth will have a million processors, each running at 1 gigaflops, and the entire machine will run at 1 petaflops. Blue Gene's job will be to simulate the folding of a protein.
And once you have this giant computer, what do you do with it? In other words, "How do you actually harness that increase in computational power to solve problems we as chemists are interested in?" Dunning asked.
In some cases, it's hard to even consider what the problems will be, said Robert J. Harrison, theoretical chemist at PNNL. "Many applications already have enough computing power," he said. "The things on the leading edge right now--for example, thermochemical predictions--are not going to be on the leading edge 10 years from now."
The specific chemical conundrums that can be solved with petaflops computers may not be immediately evident, Harrison said, but one can certainly generalize: They'll be problems of a much larger scale, such as handling more realistic models of chemistry in the environment or dealing with multiple excited states coupled together. "What's exciting is the opportunities it presents," Harrison said. "It allows people to think about taking the next big step."
And, he commented, although petaflops sound like "an absurd level of computing power right now, if you look at the commodity computing market at its cheapest, a petaflops is actually pretty insignificant--it corresponds to a quarter of a day of the expected production rate of a Sony Playstation II."
To prepare for a future world of what will almost certainly be exclusively parallel computing, scientists must also contend with what's known as scalability. Traditional supercomputers have a vector-based architecture, in which processing tasks are completed sequentially. Although that's not a cost-efficient way to solve a problem, these computers could still regularly achieve 40 to 50% of their peak performance.
Model of a lipopolysaccharide membrane of the bacterium Pseudomonas aeruginosa, consisting of 16 lipopolysaccharide molecules (top) and 48 ethylamine phospholipid molecules (bottom). The core region of the LPS also contains 104 Ca2+ counterions. NWCHEM molecular dynamics calculations will test the membrane's affinity for metal-ion uptake. [Photo by T. P. Straatsma and Roberto Lins]
On the face of it, parallel computing appears to be an ideal strategy. For example, for a task of sequentially tallying the words in a dictionary, a parallel system would split the dictionary into 26 separate tasks, one processor for each letter. If all the letters have the same number of entries, then the job will be done 26 times faster. That is what's known as perfect scaling.
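The dictionary example can be put in numbers with a minimal Python sketch. It assumes one processor per letter and ignores all communication costs; the entry counts are hypothetical and chosen only for illustration. The point is that the speedup is set by the largest single task, so the 26-fold gain holds only when every letter has the same number of entries.

```python
def parallel_speedup(task_sizes):
    """Speedup when each task gets its own processor.

    Serial time is the sum of all task sizes. Parallel time is the
    largest single task, because the job is not finished until the
    slowest processor is done. Communication cost is ignored.
    """
    serial_time = sum(task_sizes)
    parallel_time = max(task_sizes)
    return serial_time / parallel_time


# Idealized case from the text: 26 letters with equal numbers of entries.
equal_letters = [1000] * 26
print(parallel_speedup(equal_letters))   # 26.0 (perfect scaling)

# Hypothetical uneven dictionary: one letter holds five times as many entries.
uneven_letters = [1000] * 25 + [5000]
print(parallel_speedup(uneven_letters))  # 6.0
```

In the uneven case, 25 of the 26 processors sit idle for most of the run, which is why balancing the work matters as much as adding processors.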
And the need for speed becomes evident when you consider problems that routinely arise in chemistry--for example, calculating the energy of octane, Dunning said. Such a calculation requires solving 275 million nonlinear equations, which translates into 30 quadrillion arithmetic operations.
A few problems in chemistry lend themselves very well to parallel architecture--reading genetic sequence information, for example. One speaker termed such problems "embarrassingly parallel."
But most problems aren't that tidy. In electronic-structure calculations, for example, properties have to be calculated for each atom in a molecule. You could assign an atom to each processor, but all of the atoms are interacting with each other. That means the processors need to communicate, and they need to do it fast; otherwise there's a tremendous amount of wasted processor time.
Because of this bottleneck, parallel computers usually achieve only about 4 to 5% of their peak efficiency, "and people feel good when they get 10%," Dunning said. "That's the reason parallel computers have been so slow to catch on."
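To see what those efficiency figures mean in practice, here is a rough back-of-the-envelope sketch in Python. It takes Dunning's figure of 30 quadrillion operations for the octane calculation and assumes, purely for illustration, that a machine sustains a fixed fraction of its peak rate for the whole run. The function name and the machine list are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def days_to_solution(operations, peak_flops, efficiency):
    """Run time in days if the machine sustains `efficiency` of its peak rate."""
    sustained_flops = peak_flops * efficiency
    return operations / sustained_flops / SECONDS_PER_DAY


# 30 quadrillion arithmetic operations, the octane figure quoted in the text.
octane_operations = 30e15

# Illustrative scenarios; the efficiencies are assumptions, not measurements.
scenarios = [
    ("1 teraflops at 5% of peak", 1e12, 0.05),
    ("1 teraflops at 50% of peak", 1e12, 0.50),
    ("1 petaflops at 5% of peak", 1e15, 0.05),
]

for label, peak, efficiency in scenarios:
    days = days_to_solution(octane_operations, peak, efficiency)
    print(f"{label}: {days:.3f} days")
```

Under these assumptions a teraflops machine running at 5% of peak needs about a week for the calculation, and raising the sustained fraction tenfold helps exactly as much as a tenfold faster machine.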
Harrison said that in the future of technology, "there's a whole long list of things out there." Poised to make a possible impact on computing are exotic features like "processors in memory" or "superconducting rapid single-flux quantum logic."
There's also a renewed emphasis on special-purpose architecture, Harrison said--computers designed specifically with one type of scientific use in mind--IBM's Blue Gene, for example. But that's not going to carry our field very far, he said. "Computational chemistry as a whole is rich enough that only general-purpose computing is going to satisfy it," he said.
Dixon also said that, if scientists of the future are to model and predict the behavior of complex phenomena that bridge the microscopic and macroscopic, such as combustion, the task "will require a close connection between theory, simulation on the largest available computers, and experiment."
Chemistry's brave new world
Like the machines they run on, computational chemistry programs are evolving, too. Some were written decades ago, and many were originally designed to run on vector-based computers.
"Scientific codes last longer than the theories on which they're based," Dunning said. Dunning and colleagues have been working on a computing project since 1992: NWCHEM, which from the ground up is designed to run on massively parallel computers.
Using NWCHEM, Dixon and his colleagues have recently performed the biggest ab initio calculation ever of a large biomolecule: the electrostatic potential of a lipopolysaccharide containing nearly 1,000 atoms. They've also modeled a lipopolysaccharide membrane of the bacterium Pseudomonas aeruginosa, and they're now performing molecular dynamics calculations with NWCHEM to test its affinity for metal-ion uptake.
Jan M. L. Martin, theoretical chemist at the Weizmann Institute of Science in Rehovot, Israel, is pushing the envelope of accuracy for computational thermochemistry. His group has developed ab initio methods, dubbed W1 and W2, of calculating molecular heats of formation with average errors as low as 0.17 kcal per mole without resorting to empirical parameters.
To illustrate the method's power, Martin discussed SO3, which he whimsically referred to as "the molecule from hell." A property of this difficult molecule is so-called inner polarization, a phenomenon typical of second-row compounds, where the good behavior of calculated binding energies is dependent on adding high-exponent d and f functions to the second-row atoms. Ignoring this effect can cause the molecular binding energy to be underestimated by as much as 40 kcal per mole, Martin said.
A few big problems
Even with Blue Gene's megapower, it will take that computer a nonstop year to simulate the folding of a protein. Nature, on the other hand, folds proteins by the millions in fractions of a second. Clearly, even Blue Gene's 1 million powerful processors ultimately aren't the best solution.
Scientists need to do a better job of simulating in the first place, the symposium speakers said. To do that means they need to make more progress in understanding how to predict behavior.
For materials science tasks such as catalyst design, computers could save untold time and effort, Dixon said, allowing scientists to "design things better so you don't have to build prototypes."
The recent announcement that the human genome had been successfully sequenced brings to the forefront the huge task that awaits biochemists. The thousands of proteins coded by DNA will need to be determined, their structures elucidated, and their functions identified.
Obviously, biochemists cannot hope to crystallize and do experimental structural analyses on all the proteins from the genome. Computational chemists will have to step in and take over a good portion of the job, and they will need to do it efficiently and cost-effectively.
From the DNA sequences, it's relatively straightforward to figure out corresponding proteins. But as Teresa Head-Gordon, theoretical chemist at Lawrence Berkeley National Laboratory (LBNL), and Peter A. Kollman, pharmaceutical chemistry professor at the University of California, San Francisco, explained, predicting the complex, chemically active conformations that the huge molecules fold themselves into is anything but straightforward.
"Protein folding is one of the great challenges of computational biology and chemistry," Kollman said.
Head-Gordon's strategy is to directly predict a protein structure that corresponds to the lowest energy conformation the molecule can adopt. She doesn't simulate the actual folding behavior; rather, her algorithms are geared toward finding that energy minimum, without regard to its real-life folding path.
In an ideal world, no extra information, such as structures of known proteins, should be used in a prediction. "A prediction should be able to stand on its own from equations describing behavior," Head-Gordon said. However, many structure-prediction algorithms rely heavily on databases of structural information.
And though Head-Gordon's ultimate goal is a parameter-less algorithm, she acknowledged that for now, the first part of her strategy--predicting a protein's secondary structure--is accomplished with a neural network technique that trains on a database of amino acid sequences with known secondary structure. But once it's trained, the network should be able to reliably predict most of the secondary structure for proteins it has never seen before, Head-Gordon said.
Then, once she's obtained a secondary structure, she turns to the problem of tertiary structure, the three-dimensional shape of the folded protein. An optimal shape is a low-energy shape, and that's what the computer looks for. The computer does this by exploring subsets of dihedral angles in the amino acid chain backbone. Head-Gordon's algorithm searches through these subspaces of different angles until it finds a global energy minimum for that subset of amino acids, she said. Then it picks a new set. In this way, the protein successively adopts structures that are lower and lower in energy.
Head-Gordon's group members run their simulations on the massively parallel Cray T3E supercomputer at LBNL's National Energy Research Scientific Computing Center. And in doing so, the researchers run into the issue of scaling. "Lots of problems lend themselves very well to parallel architecture, and so are 'embarrassingly scalable,'" she said. But others don't, and one of them is the protein-folding problem.
However, she said, "the only way you're going to solve hard computational chemistry problems is with a computer with many processors--that's going to be the supercomputer of the future." Head-Gordon and postdoctoral researcher Silvia Crivelli have developed a way to make good use of the parallel architecture.
Head-Gordon would like to cover as many different types of starting structures as possible, because she doesn't know ahead of time where the energy minimum is. The problem is that she also doesn't know ahead of time how much work it will take to explore an individual starting structure. But she sets up a processor hierarchy, with "supervisors" monitoring "workers," so that if one processor finishes its task early, it's put back to work on another task, rather than sitting idle. And now, Head-Gordon said, "it scales very well."
Kollman, who is a coauthor of the molecular dynamics program AMBER (Assisted Model Building with Energy Refinement), takes a couple of computational approaches to protein structure prediction and folding mechanisms. One strategy is to take low-resolution structures predicted by chemists who use low-resolution, qualitative models and add in atomic details and water "to nudge it closer to the correct structure."
The Friesner group's new model of the core of a catalytic intermediate (top) finds that Glu243 oxygen's replacement of H2O, as in previous models (bottom), is energetically unfavorable.
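The benefit of handing out work on demand can be illustrated with a toy simulation in Python. This is not Head-Gordon and Crivelli's code, and the task costs are hypothetical. It simply compares dealing tasks out in advance with a supervisor-style scheme in which the next task goes to whichever worker becomes free first.

```python
import heapq


def static_finish_time(task_costs, n_workers):
    """Tasks are dealt out round-robin in advance.

    The job finishes when the most heavily loaded worker is done.
    """
    loads = [0] * n_workers
    for index, cost in enumerate(task_costs):
        loads[index % n_workers] += cost
    return max(loads)


def dynamic_finish_time(task_costs, n_workers):
    """A supervisor gives the next task to the first worker to become idle."""
    # Min-heap of the times at which each worker next becomes free.
    free_at = [0] * n_workers
    heapq.heapify(free_at)
    for cost in task_costs:
        earliest = heapq.heappop(free_at)         # first worker to go idle
        heapq.heappush(free_at, earliest + cost)  # busy until this task ends
    return max(free_at)


# Hypothetical costs: a few starting structures are far more expensive to
# explore than the rest, and that is not known in advance.
costs = [9, 1, 1, 1, 8, 1, 1, 1, 7, 1, 1, 1]

print(static_finish_time(costs, 4))   # 24
print(dynamic_finish_time(costs, 4))  # 9
```

With four workers, the fixed assignment happens to give all three expensive tasks to the same worker, while the on-demand scheme keeps everyone busy and finishes close to the ideal of total work divided by the number of workers.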
The other approach, molecular dynamics simulations starting with an unfolded structure, is more intensive and less practical for structure prediction, but it can offer insight into the mechanism of how proteins actually fold, he said.
Unlike ab initio quantum mechanics, molecular mechanics doesn't involve the laborious task of solving the Schrödinger equation. It solves potential energy equations, which work well when no covalent bonds are breaking or being formed. "It's incredibly less expensive, and you can do a lot more with it," Kollman said.
For example, his simulation of the villin protein in an aqueous environment contained 17,000 atoms--a quantity that quantum mechanical calculations couldn't even blink at with today's computer technology.
But the calculations are still computationally "meaty"; they'll use "as much computer time as you can throw at them," Kollman said. To simulate the real-world behavior of molecular dynamics, one needs to solve the program's equations of motion every femtosecond. So for just a microsecond movie, the equations must be solved a billion times. A protein takes on the order of 10 to 100 microseconds to fold, so Kollman's simulation tells only part of the story. But performing the entire simulation would be prohibitive today, he said. The villin experiment took two months on a Cray T3E, on which Kollman's group eked out time during weekends and nights.
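The step counts follow directly from the units, as this short Python sketch shows. The only assumption is the 1-femtosecond time step quoted above; the function name is hypothetical.

```python
FEMTOSECONDS_PER_MICROSECOND = 10**9


def timesteps(simulated_microseconds, timestep_fs=1):
    """Number of times the equations of motion must be solved."""
    return simulated_microseconds * FEMTOSECONDS_PER_MICROSECOND // timestep_fs


# One microsecond of simulated time, then the 10-to-100-microsecond
# range the text gives for a protein to fold.
for microseconds in (1, 10, 100):
    print(f"{microseconds} microseconds -> {timesteps(microseconds):,} steps")
```

A full folding event therefore needs 10 to 100 times as many steps as a microsecond movie, which, if the cost per step stayed the same, would mean 10 to 100 times the machine time.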
Though time on machines like the T3E is very expensive, these machines are "very impressive in how fast they can communicate between processors," Kollman said. He lamented that computers similar to the T3E are no longer being developed, because, he said, it's much cheaper to build many smaller machines that work relatively well for jobs that don't require ultrafast interprocessor communication.
The problem of how to deal with large molecules also concerns Richard A. Friesner, chemistry professor at Columbia University. Ab initio methods are of course the most accurate, but "the basic tradeoff is that ab initio is expensive," Friesner said.
One strategy for circumventing that problem, he said, especially when dealing with large molecules, is to treat different portions of the molecule differently. The active site of an enzyme, for example, is where most of the interesting stuff happens, and so it needs to be treated with the most accurate method available, quantum mechanics. The rest of the molecule, however, doesn't have to be so rigorously modeled, and so it can be described by a force field. This mixture of molecular modeling and quantum mechanics works well, but Friesner's group would like to improve force fields to the point where they include quantities like molecular polarization.
Friesner, one of the founders of the company Schrödinger in Portland, Ore., and a coauthor of its ab initio software, Jaguar, is also exploring the capabilities of large, pure ab initio simulations. Recently, he and Massachusetts Institute of Technology chemistry professor Stephen J. Lippard reported a 100-atom calculation of the catalytic intermediates of methane monooxygenase, the metalloenzyme that converts methane and oxygen into methanol [J. Am. Chem. Soc., 122, 2828 (2000)].
The thing that's unique about this work, he said, is that the chemists included a large portion of the protein along with the reactive core in order to observe how the protein controlled the structure of the catalytic intermediate. Their results indicate that previous models of one of the structural intermediates, in which a key oxygen atom displaced a water molecule, are in fact energetically unfavorable.

IBM's Blue Gene will be the largest computer in the world, running at a petaflops, or 1 million gigaflops.
"That paper is a demonstration of what you can do with ab initio quantum mechanics with the best methods we know of," Friesner said.
But despite that success and the promise of others, computational chemistry is still not yet an "enabling technology," he said. For example, it's not used to design drugs. But eventually, big advances in models and processing power will change that, for both materials and biological science, he said. "We'll use computers to design new drugs the way people now use them to design airplane wings."
Chemical & Engineering News
Copyright © 2000 American Chemical Society