What Is The Difference Between The Different Protein Projects
From Unofficial BOINC Wiki
The differences between the various projects can be summarized by looking at the project descriptions and goals.
- To solve the protein folding problem, we need to break the microsecond barrier. Our group has developed multiple new ways to simulate protein folding which can break the microsecond barrier by dividing the work between multiple processors in a new way -- with a near-linear speedup in the number of processors. Thus, with the power of Folding@Home (over 100,000 processors), we have successfully smashed the microsecond barrier, simulating milliseconds of folding time and helping to unlock the mystery of how proteins fold.
- Our short-term plan for Predictor@Home is to test and evaluate new algorithms and methods of protein structure prediction. In the near term we will be calibrating the P@H methods against a set of known structures. In the longer term we hope to open Predictor@Home up to the community as a resource to assist in protein structure prediction.
- Also, from Chahm:
- Many of the general goals of Predictor@Home and Rosetta@Home are similar: both aim to use results computed by the BOINC Community to refine the algorithms and parameters used in the prediction of protein structures, and to use these state-of-the-art methods to approach biochemical problems of interest. The major differences lie in the specific targets that are studied. In addition, while Rosetta@Home has some focus on protein design, Predictor@Home aims to gather a large amount of equilibrium molecular dynamics data on protein systems.
- The different approaches used toward structure prediction by the two projects are the main area of contrast. There are two major areas in which the various structure prediction methods differ: 1) the approach to sampling conformational space and 2) the potential function used to evaluate energy. Predictor@Home currently uses a two-step, multiscale approach to structure prediction.
- The first step, Mfold, uses a low-resolution representation of the protein chain, with each amino acid represented as a point on a three-dimensional lattice. Various possible protein geometries are sampled using iterative cycles of Monte Carlo moves. The chain is gradually cooled from high-temperature unfolding conditions to a physiological temperature, and energy is evaluated by a knowledge-based potential function. A large number of low-energy, protein-like structures are produced by Mfold and converted to an all-atom protein representation with a list of tertiary contacts.
- In the second step, CHARMM refinement, low-energy structures generated by the first step are submitted to simulated annealing molecular dynamics with major tertiary contacts held in place. CHARMM uses a physics-based potential function that includes an implicit solvation model to determine the interaction energies between particles and evaluate energy. In a molecular dynamics simulation, Newton's laws of motion are applied to propagate protein motions in a time-dependent fashion. This second step refines the low-resolution chains into more protein-like high-resolution structures and also provides a more accurate energy evaluation function for choosing the best structures generated by the first step.
- Perhaps the greatest difference between the approaches used by Predictor@Home and Rosetta@Home lies in their method of sampling conformational space. In Mfold, sampling of local conformations is based on all possible geometries, making it more of a "true" ab initio approach than Rosetta@Home, which uses local conformation fragments taken from a database of previously solved structures.
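The lattice sampling described above can be illustrated with a very small sketch. This is not Mfold: it assumes a toy two-letter (H/P) contact potential in place of Predictor@Home's knowledge-based potential, a minimal move set of end and corner moves on a cubic lattice, and made-up temperatures and step counts. All function names are hypothetical. It only shows the general pattern of Metropolis Monte Carlo moves combined with gradual cooling.

```python
import math
import random

# Unit steps to the six nearest neighbours on a cubic lattice.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def adjacent(a, b):
    """True if two lattice points are exactly one lattice step apart."""
    return sum(abs(x - y) for x, y in zip(a, b)) == 1


def is_valid(chain):
    """A chain is valid if it is self-avoiding and consecutive beads are adjacent."""
    if len(set(chain)) != len(chain):
        return False
    return all(adjacent(chain[i], chain[i + 1]) for i in range(len(chain) - 1))


def energy(chain, sequence):
    """Toy potential (an assumption): -1 per non-bonded H-H pair on adjacent sites."""
    total = 0
    for i in range(len(chain)):
        for j in range(i + 2, len(chain)):
            if sequence[i] == "H" and sequence[j] == "H" and adjacent(chain[i], chain[j]):
                total -= 1
    return total


def propose(chain, rng):
    """Try to move one bead (end move or corner flip); return None if not allowed."""
    i = rng.randrange(len(chain))
    bonded = [chain[j] for j in (i - 1, i + 1) if 0 <= j < len(chain)]
    x, y, z = bonded[0]
    dx, dy, dz = rng.choice(NEIGHBOURS)
    site = (x + dx, y + dy, z + dz)
    # The new site must be empty and one step away from every bonded neighbour.
    if site in set(chain) or not all(adjacent(site, b) for b in bonded):
        return None
    trial = list(chain)
    trial[i] = site
    return trial


def anneal(sequence, steps=20000, t_start=2.0, t_end=0.2, seed=0):
    """Metropolis Monte Carlo with geometric cooling from t_start to t_end."""
    rng = random.Random(seed)
    chain = [(i, 0, 0) for i in range(len(sequence))]  # start fully extended
    current = energy(chain, sequence)
    best, best_energy = list(chain), current
    for step in range(steps):
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        trial = propose(chain, rng)
        if trial is None:
            continue
        trial_energy = energy(trial, sequence)
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        delta = trial_energy - current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            chain, current = trial, trial_energy
            if current < best_energy:
                best, best_energy = list(chain), current
    return best, best_energy
```

Calling `anneal("HPHPPHHPHPPHPHHPPHPH")` returns the lowest-energy chain seen and its energy. A production method would use a richer move set and potential, and would keep many low-energy structures rather than one.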
- Details on Mfold and CHARMM can be found in:
Skolnick, J., Kolinski, A. & Ortiz, A. R. (1997). MONSSTER: A Method for Folding Globular Proteins with a Small Number of Distance Restraints. J. Mol. Biol. 265, 217-241.
Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S. & Karplus, M. (1983). CHARMM: A program for macromolecular energy, minimization and dynamics calculations. J. Comput. Chem. 4, 187-217.
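As a rough illustration of what "applying Newton's laws of motion to propagate motions in time" means in the refinement step described above, the sketch below integrates a single particle on a harmonic spring with the velocity Verlet scheme. The integrator choice, the spring force, and all parameter values are assumptions made for the example. It does not reproduce CHARMM's force field, its implicit solvation model, or the annealing schedule.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Propagate position x and velocity v under F = m*a using velocity Verlet."""
    a = force(x) / mass
    trajectory = [x]
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt      # advance position
        a_new = force(x) / mass                  # force at the new position
        v = v + 0.5 * (a + a_new) * dt           # advance velocity with averaged acceleration
        a = a_new
        trajectory.append(x)
    return x, v, trajectory


def spring_force(x, k=1.0):
    """Toy force (an assumption): harmonic spring pulling the particle toward x = 0."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x
```

Starting from `x = 1.0`, `v = 0.0` and calling `velocity_verlet(1.0, 0.0, spring_force, 1.0, 0.01, 1000)`, the total energy stays close to its initial value, which is the usual sanity check for an integrator of this kind.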
- The goal of our current research is to develop an improved model of intra- and intermolecular interactions and to use this model to predict and design macromolecular structures and interactions. Prediction and design applications, which can be of great biological interest in their own right, also provide stringent and objective tests that improve the model and increase fundamental understanding.
- We use a computer program called Rosetta to carry out protein structure prediction and design calculations. At the core of Rosetta are potential functions for computing the energies of interactions within and between macromolecules, and methods for finding the lowest energy structure for an amino acid sequence (protein-structure prediction) or a protein-protein complex and for finding the lowest energy amino acid sequence for a protein or protein-protein complex (protein design). Feedback from the prediction and design tests is used continually to improve the potential functions and the search algorithms. Development of one computer program to treat these diverse problems has considerable advantages: first, the different applications provide complementary tests of the underlying physical model (the fundamental physics/physical chemistry is, of course, the same in all cases); second, many problems of current interest, such as flexible backbone protein design and protein-protein docking with backbone flexibility, involve a combination of the different optimization methods.
- The research group is involved both in fundamental methods development research and in trying to fight disease more directly. Most of the information on this site focuses on basic research, but I thought you might be interested in hearing about some of the disease related work we are doing.
- Malaria: We are part of a collaborative project headed by Austin Burt at Imperial College in London that is one of the Gates Foundation "Grand Challenge Projects in Global Health". Malaria is caused by a parasite that spends part of its life cycle inside the mosquito, and is passed along to humans by mosquito bites. The idea behind the project is to make mosquitoes resistant to the parasite by eliminating genes required in the mosquito for the parasite to live. Our part of the project is to use our computer-based design methods (Rosetta) to engineer new enzymes that will specifically target and inactivate these genes.
- Anthrax: We are helping a research group at Harvard build models of anthrax toxin that should contribute to the development of treatments. You can read the abstract of a paper describing some of this work.
- HIV: One of the reasons that HIV is such a deadly virus is that it has evolved to trick the immune system. We are collaborating with researchers in Seattle and at the NIH to try to develop a vaccine for HIV. Our role in this project is central: we are using Rosetta to design small proteins that display the small number of critical regions of the HIV coat protein in a way that the immune system can easily recognize and generate antibodies to. Our goal is to create small stable protein vaccines that can be made very cheaply and shipped all over the world.
- You might wonder what the relationship is between protein structure prediction and designing new proteins. It turns out they are very closely related, and the improvements in methods you are helping us make can be directly translated into making new enzymes, vaccines, etc.
- For more information on protein design you might be interested in looking at the review we recently wrote in Science, which is available at our lab's "Publications" page:
- Schueler-Furman, O., Wang, C., Bradley, P., Misura, K., Baker, D. (2005). Progress in modeling of protein structures and interactions. Science 310, 638-642.
- What is SIMAP?
- SIMAP is a database of protein similarities. It contains nearly all currently published protein sequences and is continuously updated. Protein similarities are computed using the FASTA algorithm, which provides optimal speed and sensitivity. SIMAP is, to our knowledge, the only project that combines comprehensive coverage of all known proteins with incremental update capabilities.
- What is SIMAP used for?
- Because of the huge number of known protein sequences in public databases, it has become clear that most of them will not be experimentally characterized in the near future. Nevertheless, proteins that have evolved from a common ancestor often share the same functions (so-called orthologs), so it is possible to infer the function of a non-characterized protein from an ortholog with known function. A well-known example is the study of mouse genes and proteins, whose results in many cases also hold true for the orthologous human genes and proteins. Protein similarities provide information about relations between proteins and are necessary for the prediction of orthologs. Many other bioinformatics methods also rely on protein similarity. Our protein similarity database provides pre-computed similarity data and represents the known protein space. This opens completely new perspectives compared to the commonly used method of repeatedly recalculating this kind of data. SIMAP is regularly updated: the similarity matrix is simply extended incrementally as new sequences appear. The use of SIMAP is completely free for education and public research.
- Why do we need distributed computing for SIMAP?
- The computational cost of calculating the similarity data depends on the square of the number of sequences contained in the database, so the computational effort for keeping the matrix up to date is constantly increasing. Our internal resources, which have performed calculations for SIMAP for years, are no longer sufficient to keep track of all new sequences. That's why we implemented a SIMAP client for the BOINC platform (Berkeley Open Infrastructure for Network Computing), which is based on the FASTA algorithm to detect sequence similarities. We're running the last tests now and are about to start a BOINC Powered Project that will soon contribute to SIMAP similarity calculations.
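The quadratic growth mentioned above can be made concrete with a little arithmetic. This sketch assumes that every pair of sequences is compared exactly once and that an incremental update compares each new sequence against all existing sequences and against the other new ones. The function names are hypothetical, and SIMAP's real bookkeeping may differ.

```python
def pairwise_comparisons(n):
    """Number of distinct sequence pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def incremental_comparisons(existing, new):
    """Extra pairs needed when `new` sequences are added to `existing` ones."""
    # Each new sequence against every existing one, plus new against new.
    return existing * new + pairwise_comparisons(new)
```

For example, `pairwise_comparisons(1000)` is 499500, while adding 10 sequences to those 1000 costs `incremental_comparisons(1000, 10)`, which is 10045 further comparisons. The incremental cost per new sequence therefore keeps rising as the database grows, even though nothing already computed has to be redone.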
- To get to the point: Folding@Home, Predictor@Home, Rosetta@Home, and World Community Grid study protein folding (finding the best chemical structure), whereas SIMAP will analyze the existing databases with the FASTA algorithm to detect sequence similarities (homologs). Proteins with similar amino acid sequences probably have the same or related functions. A FASTA query consists of four parts: hashing, scoring 1 and 2, and alignment (detailed information follows). The goal of the sequence alignment is to find similarities in phylogeny and protein function. (Citation)
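The hashing part of a FASTA-style query can be sketched as follows. This is a simplified illustration, not the FASTA program or the SIMAP client: it assumes a word length of 2, indexes the short words (k-tuples) of the query, and counts how many of them line up on each diagonal of the query-versus-subject comparison. The scoring and alignment stages are not shown, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def ktuple_index(sequence, k=2):
    """Map every k-letter word in the sequence to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def diagonal_hits(query, subject, k=2):
    """Count shared k-tuples per diagonal (query position minus subject position)."""
    index = ktuple_index(query, k)
    diagonals = Counter()
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            diagonals[i - j] += 1
    return diagonals
```

A diagonal with many hits marks a region where the two sequences share a run of words at a constant offset, which is where the later scoring and alignment stages would concentrate their effort.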
- Description of the FASTA format, which serves as input for the FASTA algorithm and for many other bioinformatics applications as well:
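In brief, a FASTA file is plain text in which each record starts with a header line beginning with ">" and is followed by one or more lines of sequence letters. The minimal reader below is a sketch under that assumption only. The function name and the example records are made up for illustration, and real files may carry conventions (comments, duplicate headers) that it does not handle.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its joined sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:]             # start a new record
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical example input with two records.
EXAMPLE = """>example_protein_1 made-up description
MKTAYIAKQR
QISFVKSHFS
>example_protein_2
GSHMLEDPVD
"""
```

With this input, `parse_fasta(EXAMPLE)` yields two entries, and the first sequence is the concatenation of its two lines.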
- Information about the Human Proteome Folding Project is given at: About the Project:
- The Human Proteome Folding Project will combine the power of millions of computers in a grid to help scientists understand how human proteins fold. The work to be done in this monumental task is shared across this grid, so that results can be achieved far sooner than would be possible with conventional supercomputers. With a greater understanding of protein structure, scientists can learn how diseases work and ultimately find cures for them. When your grid agent is running, it is folding an amino acid chain in various ways and evaluating how well each folding follows the specific rules of how specific amino acids stick together or not. As computers try millions of ways to fold the chains, they attempt to fold the protein in the same way that it actually folds in the human body. The best shapes identified for each protein are returned to the scientists for further study. The name of the computer program is Rosetta. It computes a "Rosetta score" that tells how properly folded a protein is as the program tries different foldings. To compute this score, the program considers the packing of amino acids within the protein according to many scoring rules. The lower (more toward the negative) the scores are, the better the folding.
- Information about the FightAIDS@Home Project is given at: About the Project:
- Proteins are the basic building blocks in all of life's functions. (You can read more about them in the description of the Human Proteome Folding project). Proteins are long chains of smaller molecules called amino acids. Enzymes are particular kinds of proteins that accelerate biochemical reactions. A protease ("pro-tee-ace") is an enzyme that is able to cut proteins apart at some point along the amino acid chain. For example, when you eat food containing protein, the protein molecules are cut apart into smaller amino acid molecules by proteases in your stomach. Your body can then use the amino acids to build the proteins it needs. While only a small percent of all of the proteins in an organism are proteases, they are very important in the proper functioning of its life processes.
- Your computer will help by simulating the attachment process (docking) of many ligands to the HIV-1 protease, using a computer program called AutoDock. The most promising ligands will be studied in more detail by scientists and should lead to better protease inhibitor drugs for controlling HIV and ultimately preventing the onset of AIDS.