Makeover Puts CHARMM Back in Business
Biofuels scientists are asking more complex questions about how molecules spin, bond, and break when enzymes attack plants — all in the name of quickening the process of turning biomass into fuels for the sake of cleaner air and better energy security.
They're the kinds of questions that require trillions of mathematical operations each second on supercomputers. But software engineers hadn't been able to keep up with the ever-increasing demands of the scientists and the growing capabilities of modern supercomputers. That is, until unique work at the U.S. Department of Energy's National Renewable Energy Laboratory (NREL) supercharged an essential decades-old software program to run on a single high-performance computer such as the new petascale computer at NREL's Energy Systems Integration Facility.
Software engineers at NREL have reworked codes and algorithms in the CHARMM (Chemistry at Harvard Macromolecular Mechanics) program so that it can simulate nanoseconds to microseconds of molecular motion. Simulations of that length involve millions to billions of computational steps and take days of computing time.
How long is a nanosecond? Well, a nanosecond (a billionth of a second) is to a second as a second is to 31.7 years.
And a nanosecond is a very long time when measuring all the movements of thousands of atoms in a molecule.
It takes a million molecular dynamics (MD) steps to simulate a nanosecond of molecular motion.
"For an average system of 100,000 atoms on a single modern processor core, it would take us half a day of computing to simulate less than half a nanosecond," NREL Senior Scientist Michael Crowley said.
But they need to simulate molecular motion for much longer than that — as long as 100 nanoseconds.
"Using the original version of parallel CHARMM, it would take half a year, no matter how many processors we used, to simulate molecular motion for that long," Crowley said.
Thanks to the improvements the NREL engineers made to the CHARMM algorithms and code, they can now do that simulation in a day with hundreds of processors running in parallel.
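The figures quoted above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope illustration: it uses the numbers given in the article (a million MD steps per nanosecond, half a year versus one day for a 100-nanosecond run), and the function names and the choice of a 365.25-day year are assumptions made for this example.

```python
# Rough arithmetic using only figures quoted in the article.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: 365.25-day year
STEPS_PER_NANOSECOND = 1_000_000       # "a million MD steps" per nanosecond


def years_in_a_billion_seconds():
    """How many years a billion seconds lasts (the nanosecond analogy)."""
    return 1e9 / SECONDS_PER_YEAR


def steps_needed(nanoseconds):
    """MD steps required to cover the given stretch of simulated time."""
    return round(nanoseconds * STEPS_PER_NANOSECOND)


def speedup(old_days, new_days):
    """Ratio of wall-clock times for the same simulation."""
    return old_days / new_days


print(f"1e9 seconds is about {years_in_a_billion_seconds():.1f} years")
print(f"100 ns needs {steps_needed(100):,} MD steps")
print(f"half a year vs. one day: roughly {speedup(365.25 / 2, 1.0):.0f}x faster")
```

Taken at face value, the quoted times imply a speedup of a bit under 200 times for the 100-nanosecond run, although the two runs do not necessarily use the same number of processors.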
"To get a microsecond [1,000 nanoseconds] on a thousand processors will now take a few days," Crowley said.
The only limit on the questions scientists can ask, and expect answers to, is the computing power available. For more than a decade, each time scientists asked new questions that required more computing power to answer, engineers could count on a computer's speed doubling every year or so to keep up.
"But this is not enough to keep up anymore. Computer chips are not getting any faster — they are getting more parallel," said NREL's Antti-Pekka Hynninen, a physicist and software engineer. "We now have to parallelize the code" to multiply the speed at which the simulations can be run.
CHARMM Models Biological Reactions
CHARMM was developed at Harvard University in the 1980s to allow scientists to generate and analyze a wide range of molecular simulations, including production runs of a molecular dynamics trajectory for proteins, nucleic acids, lipids, and carbohydrates.
It is a favorite program of molecular researchers around the world for simulating biological reactions such as the action of cellulase on cellulose for converting biomass into ethanol. CHARMM is also a crucial code for the pharmaceutical industry.
CHARMM is unique in its ability to build, simulate, and analyze results of molecular motion in a single program. "It provides more methods of simulation than any other program, and the newest and most cutting-edge methods for thermodynamics, reaction sampling, quantum mechanics, molecular mechanics, and advanced imaging," Crowley said.
For all its advantages, though, CHARMM's number-crunching speed hadn't kept up with the new demands and the new questions. The size of the new biomolecular simulations is so large (more than 1 million atoms) and the simulation time so long (10 million time steps for a 10-nanosecond simulation, at a million steps per nanosecond) that they exceeded the capabilities of CHARMM.
So, three years ago, Crowley hired Hynninen to update the code and increase its performance.
If Hynninen had tried rewriting all 600,000 lines of code, he estimates it would have taken him about 10 years.
Instead, he focused on rewriting the heart of CHARMM, the molecular dynamics engine, and he was able to pare the chore down to two years. The molecular dynamics engine is where all the heavy computation is done. It may represent only 5% to 10% of the total lines of code, but it accounts for approximately 99% of the central processing unit (CPU) time in a typical simulation.
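One way to see why targeting the engine pays off is Amdahl's law, a standard rule of thumb that the article itself does not mention. The sketch below assumes the 99% figure quoted above. The function name and the example speedup factors are hypothetical.

```python
def overall_speedup(fraction, factor):
    """Amdahl's law: speedup of the whole run when `fraction` of the
    original runtime is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Assumption: the MD engine accounts for 99% of CPU time, as quoted above.
for factor in (10, 100, 1000):
    print(factor, round(overall_speedup(0.99, factor), 1))

# For contrast: speeding up code that uses only 1% of the runtime
# can never make the whole run more than about 1% faster.
print(round(overall_speedup(0.01, 1000), 3))
```

Under this model the untouched 1% of the runtime caps the overall gain at 100 times, however fast the engine becomes, which is one reason the remaining 90% or more of the code could reasonably be left alone.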
He's the first to admit it wasn't exactly a day (or two years) at the beach.
Hard, Laborious Work — with Shortcuts
"It's one of those very hard problems, mechanics of atoms and enzymes," Hynninen said. "There is really no limit to how a molecule can behave." Its motions are determined by the interplay of a multitude of interactions between each atom and every other atom nearby — through both chemical bonds and non-bonded interactions, he noted. That results in thousands of different kinds of interactions per atom. And there can be hundreds of thousands of atoms in a simulation. "And this makes writing the algorithms and code quite challenging."
The day-long task using hundreds or thousands of processors simulates a very brief moment cataloguing every move by thousands of atoms. "It's not just that they all move but that each atom is feeling forces from thousands of other atoms," Crowley said. "And each one of those forces has to be calculated for every atom at every step."
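To make "each one of those forces has to be calculated for every atom at every step" concrete, here is a deliberately naive sketch of a single force evaluation. It uses only a Lennard-Jones pair interaction with a distance cutoff, in reduced units. A real CHARMM force field also includes bonded and electrostatic terms, and nothing here is taken from CHARMM's source; the function name and default parameters are illustrative assumptions.

```python
def lennard_jones_forces(positions, epsilon=1.0, sigma=1.0, cutoff=2.5):
    """Return the force vector on every atom from all atoms within `cutoff`.

    Naive all-pairs loop: the work grows with the square of the atom count.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Vector from atom j to atom i and its squared length.
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            if r2 > cutoff * cutoff:
                continue  # too far apart to interact in this toy model
            s6 = (sigma * sigma / r2) ** 3
            scale = 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2
            for k in range(3):
                forces[i][k] += scale * d[k]  # force on i due to j
                forces[j][k] -= scale * d[k]  # equal and opposite force on j
    return forces


atoms = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.2, 0.0)]
for force in lennard_jones_forces(atoms):
    print([round(component, 3) for component in force])
```

A loop like this has to run at every one of the millions of steps, which is why fast codes avoid the all-pairs search and split the work across processors.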
On the time scale most of us are used to, observing action in microseconds or nanoseconds seems ridiculously short. "But they are long enough to answer lots of questions, because they show us what the molecule is probably doing most of the time," Crowley said.
And simulating the motion of atoms answers important questions about how an enzyme can access the sugars in a plant.
The sugars the biofuels industry wants are locked up in a polymer called cellulose, which forms bundles or fibers of a few dozen polymer chains. CHARMM's molecular dynamics can simulate those bundles and find how strongly they are held together, as well as what interactions are holding them together. Using CHARMM, scientists can also model the interaction of an enzyme with those bundles and determine how the enzyme peels the polymers out of the bundle. "We learn what forces it uses or how it reduces forces holding the bundle together," Crowley said.
Entropy is a large factor in the process, so it's not enough to merely calculate the energy. Scientists have to find out all the possible configurations of the molecular system. "It's essential to find out how much each configuration is contributing to the average behavior — and that takes a lot of simulation time," Crowley said.
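The idea of weighing how much each configuration contributes to the average can be illustrated with textbook Boltzmann weighting. This is a generic statistical-mechanics sketch, not a description of how CHARMM or the NREL team computes averages. The energies, the temperature, and the function names are hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def boltzmann_weights(energies, temperature=300.0):
    """Relative populations of configurations with the given energies
    (kcal/mol) at thermal equilibrium."""
    kt = K_B * temperature
    e_min = min(energies)  # shift energies for numerical stability
    factors = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def ensemble_average(values, energies, temperature=300.0):
    """Average of a property over configurations, weighted by population."""
    weights = boltzmann_weights(energies, temperature)
    return sum(w * v for w, v in zip(weights, values))


# Three made-up configurations: energies in kcal/mol, one property value each.
energies = [0.0, 0.5, 2.0]
values = [10.0, 12.0, 20.0]
print([round(w, 3) for w in boltzmann_weights(energies)])
print(round(ensemble_average(values, energies), 2))
```

In a molecular dynamics run the configurations are not enumerated like this. Instead, a long enough trajectory visits them roughly in proportion to these weights, which is why so much simulation time is needed.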
They carefully examine the data to see which amino acids are interacting with the sugars. They observe how the overall structure of the carefully chosen enzyme changes and how it binds, twists, and bends the sugars to allow the chemical reaction that releases them.
Molecular dynamics code is quite easy to write if you do not care about performance, Hynninen said. But to code the algorithm to run very fast … that's difficult.
"I just started digging in," Hynninen said. "It's a lot of sort of lonely work. Just to figure out the algorithms, I went through 20 legal tablets, drawing diagrams — and then writing the algorithm into code."
The traditional approach is to divide the atoms evenly among the CPUs: the first CPU gets, say, the first 1,000 atoms, the second another 1,000, and so on.
The trouble with this approach is that each CPU has to talk to every other CPU at each molecular dynamics step so that all of them know what the others' atoms are doing. And that communication slows everything down.
To speed things up requires very clever shortcuts: communicating the smallest amount of information between the fewest computers. This means reorganizing the work, and deciding which computer does which part of it, with those criteria in mind, and then making sure the work is equally distributed so there are no idle workers.
"We set up the problem so talking is minimized — the number of words and messages is stripped to the bone," Crowley said.
Hynninen retained the strengths of the CHARMM code and combined them with ideas from other programs to enhance CHARMM's speed.
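The article does not say which scheme the new code uses to cut communication. One widely used approach in molecular dynamics programs is spatial domain decomposition, in which each processor owns a box of space and exchanges data only with the boxes next to it. The sketch below simply counts communication partners under that assumption and compares the result with the talk-to-everyone pattern described above. The function names, the grid shapes, and the assumption that atoms interact only within one box length are all illustrative.

```python
from itertools import product


def all_to_all_partners(num_ranks):
    """Partners per processor when everyone must talk to everyone."""
    return num_ranks - 1


def neighbor_partners(grid):
    """Distinct neighbors of one processor in a periodic 3-D grid of boxes.

    Assumes atoms interact only with atoms in the same or adjacent boxes.
    """
    nx, ny, nz = grid
    me = (0, 0, 0)
    partners = set()
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        other = (dx % nx, dy % ny, dz % nz)  # wrap around periodic edges
        if other != me:
            partners.add(other)
    return len(partners)


for grid in [(2, 2, 2), (4, 4, 4), (8, 8, 8), (10, 10, 10)]:
    ranks = grid[0] * grid[1] * grid[2]
    print(ranks, all_to_all_partners(ranks), neighbor_partners(grid))
```

In this toy count the number of partners under spatial decomposition levels off at 26 no matter how many processors are added, while the all-to-all count keeps growing with the processor count.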
"Now we're back in the ballgame again," Crowley said. "This is a huge, huge improvement. People are using CHARMM again."
Funding for NREL's work on CHARMM came from two areas in DOE's Office of Science — Advanced Scientific Computing Research (ASCR) and Biological and Environmental Research (BER) — as well as funds from the National Institutes of Health (NIH) for code modernization at the University of Michigan. Partners include the University of Michigan, Oak Ridge National Laboratory, and the University of California at San Diego.
Crowley and Hynninen vow to do all they can to prevent CHARMM from again slipping behind other codes in processing speed.
While national labs hire some of the world's best scientists, they typically don't hire software engineers and especially not scientists like Hynninen to do software engineering. It may not be a bad idea for them to start doing so, what with science frontiers becoming increasingly reliant on powerful computers and complex algorithms, Crowley said.
Computer Simulations Answer Questions About Enzyme Processing Bottlenecks
By fully understanding how enzymes find, reach, and act on the cellulose in plants, scientists may be able to engineer super-efficient enzymes that create abundant energy from algae or agricultural waste products.
The work is crucial because one of the most promising paths toward energy independence and clean energy requires biofuels to achieve price parity with gasoline.
Scientists know that cellulose-active enzymes act like protein machines, breaking bonds in procession and pulling out strands à la an assembly line.
But there is still plenty to learn, because bottlenecks in the process slow things down. NREL Senior Scientist Mark Nimlos uses CHARMM's molecular dynamics software to simulate some 50,000 atoms to try to uncover the bottleneck. The aim is to selectively replace a few amino acids to speed up the bond breaking.
Now that the new version of CHARMM is orders of magnitude faster — supersonic transport versus the Kitty Hawk plane, Crowley attests — the chances of solving problems such as those bottlenecks have increased exponentially. And the new and improved CHARMM should prove a boon to molecular scientists working with pharmaceuticals, as well.
Like a three-legged-race team moving in tandem, scientists and computer engineers have to count on each other to keep up.
Crowley compared Hynninen's borrowing of algorithms from other molecular dynamics software packages to "looking at how a VW is built to help you make your Chevy better."
"It's pretty darn hard to do if you don't understand the science," Crowley added. "That's why [Hynninen] is a real gem. He understands the science as well as the algorithms."
Learn more about NREL's biomass and computational sciences research.