In early January of 2003, a team of three Waterloo investigators won the prestigious
CAFASP competition, held every two years on the Internet. CAFASP, short for Critical
Assessment of Fully Automated Structure Prediction, holds great interest not only
for geneticists but for the rest of us. After all, we're all made out of protein, in part, and the ability to
predict how a given sequence of DNA directs the production of protein is crucial
to our understanding of how genes generate living bodies.
The team, consisting of faculty advisor Ming Li and graduate student Jinbo Xu, developed the program now known as RAPTOR, short for Rapid Protein Threading Predictor. They were ably assisted by Ying Xu, a scientist at Oak Ridge National Laboratory in Tennessee. Ying Xu had developed a program called PROSPECT, winner of the 2001 Research and Development 100 award. Sharing in the glory is the well-known supercomputer firm Silicon Graphics of California. The 40 CPUs (central processing units) of Waterloo's SGI supercomputer (called FLEXOR) churned through over 600 CPU hours of computation in predicting the three-dimensional shapes that some 60 strands of DNA (used as competition test cases) would produce.
Why is protein folding such a difficult problem? In our cells, proteins are produced by minute bodies called ribosomes. Each ribosome "reads" a strand of DNA sent to it and "writes" a protein. That is, the ribosome translates the DNA message into the protein that the DNA specifies. The problem lies in the fact that while the information contained in DNA is essentially one-dimensional, protein is invariably three-dimensional. As it emerges from the ribosome, the newly formed protein literally twists and writhes, seeking an equilibrium position in which the component molecules occupy positions of minimum energy. What makes the job of predicting the shape difficult is that every atom in the assembly has an influence on the final shape of the protein. The charge, mass, and position of every atom in the emerging strand of protein interact with all the other charges, masses, and positions.
As the protein folds, however, the relative positions all change and everything must be recomputed. Did the computation take the right path? Perhaps another path would lead to an even more stable configuration, a new shape. Which shape is "right"? The final shape of a protein determines its properties and function.
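To make the "everything must be recomputed" point concrete, here is a minimal sketch in Python. It is a toy illustration only, not RAPTOR's method: it assumes a made-up four-atom chain, and it scores a shape with a single Coulomb-like term per pair of atoms, ignoring mass and every other real physical effect. The names pairwise_energy and pair_count are hypothetical.

```python
import itertools
import math


def pairwise_energy(positions, charges):
    """Sum a toy Coulomb-like term q_i * q_j / r_ij over every pair of atoms."""
    total = 0.0
    for i, j in itertools.combinations(range(len(positions)), 2):
        r = math.dist(positions[i], positions[j])  # distance between atoms i and j
        total += charges[i] * charges[j] / r
    return total


def pair_count(n):
    """Number of distinct atom pairs that must be examined for n atoms."""
    return n * (n - 1) // 2


# Hypothetical four-atom chain laid out in a straight line (arbitrary units).
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
charges = [1.0, -1.0, 1.0, -1.0]

straight = pairwise_energy(positions, charges)

# "Fold" the chain by moving only the last atom. Every pair involving that
# atom changes, so the total has to be computed again for the new shape.
folded_positions = positions[:3] + [(2.0, 1.0, 0.0)]
folded = pairwise_energy(folded_positions, charges)

print(f"pairs to examine for 4 atoms:    {pair_count(4)}")
print(f"pairs to examine for 1000 atoms: {pair_count(1000)}")
print(f"straight chain energy: {straight:.4f}")
print(f"folded chain energy:   {folded:.4f}")
```

Even in this toy, the number of pairs grows roughly with the square of the number of atoms, and each candidate shape needs its own evaluation. Deciding which of many candidate shapes has the lowest energy is the hard part that the sketch does not attempt.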
Because RAPTOR was written more or less from scratch rather than built from parts of preexisting programs, it was judged in the "non-meta" category; "meta-programs" are those assembled from previous programs.
Asked to comment on the new research, Alan George, Dean of the Mathematics Faculty, identified it as a "major thrust" in the area of bioinformatics. "Protein structure prediction is one of the key problems that need to be solved to move research ahead in these areas." Ming Li, who holds the Canada Research Chair in the School of Computer Science Bioinformatics Program, described RAPTOR as "a very significant achievement," adding that "Protein structure prediction is a difficult and very important field. Many well-known researchers have been working on this problem throughout their entire careers."
Li credits the design and implementation talents of graduate student Jinbo Xu as a major reason for the program's success. Using the team's "new approach to protein structure prediction by linear programming to optimize the energy functions," Xu "worked extremely hard day and night on this for the last year."
As for the competition itself, Li said, "No Canadian team has ever achieved anything within the top 10." More than 100 prediction programs have participated in the biennial CAFASP competition since its inception six years ago.
Although it is the new king of the computational castle when it comes to the prediction of protein folding, RAPTOR still does not solve the protein-folding problem completely. Time is of the essence, not only because so many diseases await a gene-based cure but because research is painfully slow in what is now being called the "post-genome era." Using manual methods, involving X-ray crystallography and NMR technology, scientists typically need six months to determine the structure of a single protein molecule. Such high-throughput methods as RAPTOR are expected to revolutionize this research.
Jinbo Xu, who expects to finish his PhD later this year, came to Waterloo from
the University of Science and Technology of China. Well trained in mathematics,
algorithms, and programming, Xu also placed first in a math contest in Jiangxi
Province in China.
The protein folding problem:
www.msykes.com/writing/GA_project.pdf
Silicon Graphics:
www.sgi.com
Image Credit:
DNA image from Purves et al., Life: The Science of Biology, 4th Edition, by
WH Freeman (www.whfreeman.com), reproduced
with permission.

David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1