PROTEIN FOLDING

Free Energy Considerations

The number of possible configurations that a protein could assume if all conformations were equally probable and random is astronomical. The fact that proteins assume a "single" conformation shortly after synthesis, and maintain it as long as the environment does not change drastically, suggests that the native conformation is energetically favored over all other conformations.

In an attempt to explain the folding of protein molecules, Anfinsen proposed that these molecules assume the conformation of greatest thermodynamic stability. That is, the protein spontaneously assumes the conformation of lowest free energy for a given environment (ionic strength, temperature, pH, etc.). On this view, whatever the amino acid sequence, the final conformation is determined entirely by free energy considerations.

Others have suggested that the conformation assumed by the protein need not be the one of absolute lowest free energy, but rather the lowest free energy state that is kinetically accessible to the protein. The native conformation might actually have a higher free energy than some alternative states that the protein cannot reach because they are separated from it by large energy barriers. On this view, given enough time the protein would reach the state of lowest free energy, but the rate of this transformation is so slow as to be negligible under normal circumstances.

Whichever of these hypotheses is accepted, the practical conclusion is the same: a protein will spontaneously assume the conformation of lowest available free energy in any given environment, and under normal circumstances this conformation will be the native one.

The Time Required for Protein Folding

A medium-sized protein composed of 200 amino acids would in theory have an enormous number of possible conformations open to it. If it is assumed that each amino acid can adopt only three conformations (there are probably many more), the protein would have:

$$3^{200} \approx 2.66 \times 10^{95}$$

possible conformations. Because protein synthesis and folding are completed within a couple of minutes, only a tiny fraction of these conformations can be tried as the protein searches for the state of lowest free energy.

This suggests that local, short-range interactions may severely limit the number of conformations open to the protein molecule and that these forces cause portions of the molecule to rapidly assume the correct conformation.
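The arithmetic behind this argument can be checked directly. The sketch below counts the conformations of a 200-residue chain with three states per residue and estimates how long an exhaustive search would take. The sampling rate of 10^13 conformations per second is an assumption chosen only for illustration; it is not a figure from the text.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def conformation_count(n_residues: int, states_per_residue: int) -> int:
    """Conformations available if each residue independently takes one of a few states."""
    return states_per_residue ** n_residues

def exhaustive_search_years(n_conformations: int, rate_per_second: float) -> float:
    """Years needed to visit every conformation once at a fixed sampling rate."""
    return n_conformations / rate_per_second / SECONDS_PER_YEAR

n = conformation_count(200, 3)
exponent = math.floor(math.log10(n))   # math.log10 accepts arbitrarily large ints
mantissa = n / 10 ** exponent          # int / int true division, rounded to a float
print(f"3^200 = {mantissa:.2f} x 10^{exponent}")  # 3^200 = 2.66 x 10^95

# ASSUMPTION: 1e13 conformations tried per second, chosen only for illustration.
years = exhaustive_search_years(n, 1e13)
print(f"exhaustive search would take about {years:.1e} years")
```

Even with a sampling rate a thousand times faster, the search would still take more than 10^71 years, so the conclusion does not hinge on the assumed rate.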

Short Range Forces

It was noted in chapter 4 that not all angles are allowed between adjacent peptide planes. When the angles actually found in proteins are plotted on energy maps, average values can be assigned for the φ and ψ angles of any amino acid. In addition, it has been observed that some amino acids prefer to occur in α-helices while others are more commonly found in the extended conformation. Some amino acids show a strong preference for the turn regions of protein molecules.

A sequence of five amino acids can be examined to determine its structure of lowest free energy. First, values are assigned to all the φ and ψ angles in the sequence, using averages obtained for each amino acid from the structures of many proteins. A computer program then alters these angles slightly and examines the effect on the total energy of the system. Quite often the final predicted conformation closely resembles the conformation that these amino acids actually assume in intact proteins.
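The procedure just described (start from averaged φ and ψ values, nudge the angles, keep changes that lower the energy) can be sketched as a simple greedy minimizer. Everything here is an illustrative assumption: the starting angles, the "preferred" angles, and especially the energy function, which is a made-up cosine well plus a weak nearest-neighbor coupling rather than a real force field.

```python
import math
import random

# HYPOTHETICAL "average" (phi, psi) for each of the five residues, in degrees.
# A real calculation would take these from many solved protein structures.
PREFERRED = [(-60.0, -45.0)] * 5

def wrap(angle: float) -> float:
    """Map an angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def energy(angles: list[tuple[float, float]]) -> float:
    """HYPOTHETICAL toy energy (not a real force field): a cosine well around each
    preferred angle plus a weak term favoring similar psi on adjacent residues."""
    e = 0.0
    for (phi, psi), (phi0, psi0) in zip(angles, PREFERRED):
        e += (1 - math.cos(math.radians(phi - phi0))) + (1 - math.cos(math.radians(psi - psi0)))
    for (_, psi_a), (_, psi_b) in zip(angles, angles[1:]):
        e += 0.1 * (1 - math.cos(math.radians(psi_a - psi_b)))
    return e

def minimize(angles, steps: int = 5000, step_size: float = 5.0, seed: int = 0):
    """Greedy search: nudge one angle at random and keep the change only if
    the total energy decreases."""
    rng = random.Random(seed)
    current = list(angles)
    e_current = energy(current)
    for _ in range(steps):
        i = rng.randrange(len(current))
        phi, psi = current[i]
        if rng.random() < 0.5:
            phi = wrap(phi + rng.uniform(-step_size, step_size))
        else:
            psi = wrap(psi + rng.uniform(-step_size, step_size))
        trial = current[:i] + [(phi, psi)] + current[i + 1:]
        e_trial = energy(trial)
        if e_trial < e_current:
            current, e_current = trial, e_trial
    return current, e_current

start = [(-80.0, -20.0), (-50.0, -60.0), (-70.0, -30.0), (-55.0, -50.0), (-90.0, 0.0)]
final, e_final = minimize(start)
print(f"energy {energy(start):.3f} -> {e_final:.3f}")
```

Accepting only downhill moves finds the nearest local minimum. That is enough for a five-residue toy, but searches over longer chains typically need some way to cross energy barriers, such as occasionally accepting uphill moves as in Monte Carlo methods.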

If enough helix-favoring residues are present, a helix will start to form. Residues normally found in β-sheets will assume an extended conformation, and even β-bends can be shown to be the structures of lowest free energy for some combinations of four amino acid residues.

This strongly argues for the importance of short-range interactions in determining the three-dimensional structure of protein molecules. The energetics of interactions between adjacent amino acids alone appear to dictate a great deal of the final three-dimensional structure. These observations have led a number of workers to attempt to predict the secondary structure of proteins solely from their primary sequences.

Predictions of Protein Structure

Many empirical methods have been developed to predict the secondary structure of proteins from their amino acid sequences. The earliest of these attempted only to predict whether or not a given residue would be found in a helical region. Each amino acid was assigned a probability of being found in or out of a helix, based on how often that amino acid occurs in helices in proteins whose three-dimensional structures were known. The best of these schemes made correct predictions about 70% of the time, whereas a random assignment of residues would be expected to be correct only 50% of the time in a two-state model.

This significant improvement over random assignment lent credence to the hypothesis that short-range interactions are of prime importance. The remaining wrong predictions may have been due to longer-range interactions that the original models did not consider.

More recently, schemes have been developed that assign amino acids to several categories, including helical, extended, and turn. These more complex models have often assigned residues correctly nearly 80% of the time. Since a random assignment in a three-state model should succeed only about 33% of the time, these data argue strongly for the importance of short-range forces in determining protein structure.
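The chance baselines quoted here (50% for two states, 33% for three) follow from a simple expectation. If a residue's true state has probabilities p_i and a guess is drawn independently with probabilities q_i, the expected fraction correct is the sum of p_i q_i, which equals 1/k whenever the guess is uniform over k states, whatever the true composition. The compositions below are hypothetical.

```python
from fractions import Fraction

def expected_accuracy(true_freq: dict[str, Fraction], guess_freq: dict[str, Fraction]) -> Fraction:
    """Expected fraction of residues guessed correctly when guesses are made
    independently of the true state: sum over states of p_true * p_guess."""
    return sum((true_freq[s] * guess_freq.get(s, Fraction(0)) for s in true_freq), Fraction(0))

def uniform(states: list[str]) -> dict[str, Fraction]:
    """Equal guessing probability for every state."""
    return {s: Fraction(1, len(states)) for s in states}

# HYPOTHETICAL compositions: 30% helix, 20% extended, 50% turn/other.
three_state = {"helix": Fraction(3, 10), "extended": Fraction(2, 10), "turn": Fraction(5, 10)}
two_state = {"helix": Fraction(3, 10), "non-helix": Fraction(7, 10)}

print(expected_accuracy(two_state, uniform(list(two_state))))      # 1/2
print(expected_accuracy(three_state, uniform(list(three_state))))  # 1/3
```

Guessing in proportion to the class frequencies instead gives the sum of p_i squared (19/50, or 38%, for the three-state composition above), so the exact baseline depends slightly on how "random" is defined.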

Other approaches have used preliminary assignments of amino acids into classes according to their locations in proteins of known structure, as described above. Each amino acid was given φ and ψ values equal to the averages for its class. Computer programs were then devised that varied the angles about a given peptide slightly to find new angles calculated to have lower energies. The angles about the next peptide bond were then varied to minimize the energy, taking into account the new values assigned to the adjacent peptide. By continuing this procedure along the chain, the effects of neighboring amino acids could be included, and medium-range interactions could thus be simulated. These calculations require vast amounts of computer time and are very expensive to perform; to date, none has been carried to completion for an entire protein. Where the programs have been allowed to optimize through several cycles, however, many features of the final structure have begun to appear.
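The residue-by-residue procedure described above is essentially a coordinate-wise sweep: optimize one residue's angles with its neighbors held fixed, move to the next residue, and repeat the whole pass for several cycles. The sketch assumes a hypothetical energy made of a per-residue term plus a coupling between adjacent residues, and searches a small grid of offsets around the current angles. Neither the energy nor the grid reflects any specific published program.

```python
import itertools
import math

def local_energy(phi: float, psi: float, target: tuple[float, float]) -> float:
    """HYPOTHETICAL per-residue term: a cosine well around the class-average angles."""
    phi0, psi0 = target
    return (1 - math.cos(math.radians(phi - phi0))) + (1 - math.cos(math.radians(psi - psi0)))

def coupling(a: tuple[float, float], b: tuple[float, float], weight: float = 0.2) -> float:
    """HYPOTHETICAL neighbor term: adjacent residues prefer similar psi angles."""
    return weight * (1 - math.cos(math.radians(a[1] - b[1])))

def total_energy(angles, targets) -> float:
    e = sum(local_energy(phi, psi, t) for (phi, psi), t in zip(angles, targets))
    return e + sum(coupling(a, b) for a, b in zip(angles, angles[1:]))

def sweep(angles, targets, cycles: int = 20, offsets=(-4.0, -1.0, 0.0, 1.0, 4.0)):
    """Optimize each residue in turn, holding its neighbors at their latest values."""
    angles = list(angles)
    for _ in range(cycles):
        for i in range(len(angles)):
            def energy_at(candidate):
                # Only the terms involving residue i change, so score just those.
                e = local_energy(*candidate, targets[i])
                if i > 0:
                    e += coupling(angles[i - 1], candidate)
                if i < len(angles) - 1:
                    e += coupling(candidate, angles[i + 1])
                return e
            phi, psi = angles[i]
            candidates = [(phi + dp, psi + ds) for dp, ds in itertools.product(offsets, offsets)]
            angles[i] = min(candidates, key=energy_at)  # offset (0, 0) keeps the current angles
    return angles

# HYPOTHETICAL class averages: three helix-class residues, then three extended-class residues.
HELIX, EXTENDED = (-60.0, -45.0), (-120.0, 130.0)
targets = [HELIX] * 3 + [EXTENDED] * 3
start = list(targets)  # begin from the class-average angles
final = sweep(start, targets)
print(f"energy {total_energy(start, targets):.4f} -> {total_energy(final, targets):.4f}")
```

Because the unchanged offset is always among the candidates, each update can only lower (or keep) the total energy. In this toy, the residues at the helix/extended boundary drift slightly away from their class averages to ease the neighbor term, a small-scale version of the medium-range effects described above.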

The above information indicates that short-range interactions with some intermediate-range forces are adequate to explain the existence of much of the secondary structure found in proteins. It does not, however, address the type of forces required to achieve the final tertiary structure of the finished protein.

Tertiary Structure and Short-Range Interactions

Anfinsen and co-workers, in an elegant series of experiments, produced antibodies against portions of enzyme molecules that can distinguish whether these regions are in the native or a non-native conformation. These antibodies were used to examine the conformation of ribonuclease segments produced by proteolytic cleavage. The results indicate that fragments of about 30 amino acids are in the native conformation about 0.02% of the time, and therefore in non-native conformations 99.98% of the time. At first sight, then, short-range forces would appear to have little influence on the tertiary structure of ribonuclease. However, a calculation of the amount of native structure expected if short-range interactions had no effect on the tertiary structure of these fragments suggests that they should randomly assume the native conformation only $10^{-5}$ to $10^{-18}$% of the time. Thus, while short-range forces are not responsible for the attainment of the complete protein structure, they still bring the molecule much closer to the native structure than would be expected if these forces had little or no effect.

It has been calculated that converting the fragments from 0.02% native to 100% native would require an additional free energy of 0.37 kcal/mole. This amount of energy should be readily available from longer-range interactions, which demonstrates how relatively small changes in energy can have a large effect on the conformation of protein molecules.
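The link between a population shift and a free energy change is ΔG = -RT ln K, where K is the ratio of native to non-native molecules. The sketch applies this relation to the fragment figures above. Two choices are assumptions, not from the text: a temperature of 25 °C, and "100% native" taken to mean 99.98% native, since a literal 100% corresponds to an infinite K. The text does not say whether the 0.37 kcal/mole figure refers to the whole fragment or to each residue, so the sketch reports both the total and the total spread over about 30 residues.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def folding_free_energy(native_fraction: float, temperature_k: float) -> float:
    """dG = -RT ln K for non-native <-> native, with K = native / non-native."""
    k = native_fraction / (1.0 - native_fraction)
    return -R_KCAL * temperature_k * math.log(k)

T = 298.15        # ASSUMPTION: 25 degrees C
initial = 0.0002  # 0.02% native, from the text
final = 0.9998    # ASSUMPTION: stands in for "100% native"

# Extra stabilization needed to move the equilibrium from `initial` to `final`.
stabilization = folding_free_energy(initial, T) - folding_free_energy(final, T)
print(f"whole fragment: {stabilization:.1f} kcal/mol")                    # whole fragment: 10.1 kcal/mol
print(f"per residue (about 30): {stabilization / 30:.2f} kcal/mol")       # per residue (about 30): 0.34 kcal/mol
```

The per-residue value comes out close to the quoted 0.37 kcal/mole, which suggests, but does not confirm, that the figure is a per-residue quantity. Either way, each factor of 10 in K costs only about 1.4 kcal/mol at this temperature.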

Achievement of Native Structure

The information presented can be used to construct the following scheme of protein folding, which applies only to single-chain monomeric proteins. As the protein is released from the ribosome, short-range interactions cause various regions of the molecule to assume helical or extended structures, while turns form in other regions. These turns direct segments of extended structure towards each other and allow hydrogen bonds to form between them, yielding β-structures. Regions containing non-polar amino acids tend to associate with each other and are removed, as far as possible, from the aqueous phase. The free energy gained by removing hydrophobic groups from the aqueous phase is large and is one of the primary driving forces for protein folding.
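One common way to locate the non-polar stretches that this scheme expects to cluster away from water is a sliding-window hydropathy average. The sketch uses the Kyte-Doolittle hydropathy scale, a technique not discussed in the text. The window of 7 residues, the cutoff of 1.6, and the example sequence are illustrative assumptions.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq: str, window: int = 7) -> list[float]:
    """Mean hydropathy of each full window; entry i covers seq[i:i + window]."""
    values = [KD[aa] for aa in seq]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

def hydrophobic_segments(seq: str, window: int = 7, cutoff: float = 1.6) -> list[tuple[int, int]]:
    """Runs of residues whose centred window mean exceeds the cutoff,
    returned as (start, end) index pairs with end exclusive."""
    half = window // 2
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        centre = i + half
        if score > cutoff:
            if segments and segments[-1][1] == centre:
                segments[-1] = (segments[-1][0], centre + 1)  # extend the current run
            else:
                segments.append((centre, centre + 1))         # start a new run
    return segments

seq = "KDESTKLVIAVLFGIDRKNESQE"  # HYPOTHETICAL sequence with one non-polar stretch
for start, end in hydrophobic_segments(seq):
    print(start, end, seq[start:end])  # 7 13 VIAVLF
```

A profile like this only flags candidate regions. Whether a stretch is actually buried depends on the rest of the fold, which is the point of the longer-range steps in the scheme.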

In proteins that contain cysteine residues, folding of the chain may bring these side chains into proximity. Their oxidation to form disulfide crosslinks adds greatly to the stability of the protein molecule.

Folded proteins are usually compact, but rotation is still possible around many of their bonds. This flexibility yields a gain in conformational entropy, which also adds to stability. Some proteins give up part of this flexibility by binding metal ions, cofactors, substrates, etc. Such binding generally stabilizes the structure and makes it more rigid; the resulting loss in entropy must be offset by a favorable binding energy, usually a decrease in enthalpy.
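The trade-off just described is ΔG = ΔH - TΔS. Binding that freezes out flexibility has a negative ΔS, which raises ΔG by -TΔS, so the binding is favorable only if ΔH is negative enough to compensate. The numbers below are hypothetical, chosen only to show the bookkeeping.

```python
def gibbs(delta_h: float, delta_s: float, temperature_k: float) -> float:
    """dG = dH - T*dS, with dH in kcal/mol and dS in kcal/(mol*K)."""
    return delta_h - temperature_k * delta_s

T = 298.15  # 25 degrees C
# HYPOTHETICAL binding event: rigidifying the protein costs entropy (dS < 0).
dS = -0.020  # kcal/(mol*K)
for dH in (-4.0, -10.0):  # HYPOTHETICAL enthalpies: weak contacts vs. strong contacts
    dG = gibbs(dH, dS, T)
    verdict = "favorable" if dG < 0 else "unfavorable"
    print(f"dH = {dH:6.1f}, -T*dS = {-T * dS:5.2f}, dG = {dG:6.2f} kcal/mol ({verdict})")
```

Here the entropy penalty is about 6 kcal/mol, so weak contacts (-4 kcal/mol) leave binding unfavorable, while stronger ones (-10 kcal/mol) make it favorable.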

Most protein molecules can be almost completely unfolded in the presence of urea and reducing agents. When the denaturants are removed, the proteins refold to their original conformations, often recovering more than 90% of their biological activity. This again indicates the importance of primary sequence in determining the final three-dimensional structure of proteins.

Many biologically active molecules are synthesized in an inactive form and are then activated when needed by proteolysis, e.g., pepsin, chymotrypsin, and insulin. In these cases, the proteolyzed, active molecule is not in the state of lowest free energy. When such molecules are denatured, they do not regain activity upon removal of the denaturants. Apparently, the removal of part of the molecule not only activates the enzyme or hormone but also removes information necessary for proper folding. When denaturation and renaturation are performed before proteolytic cleavage, almost full biological activity is recovered.

For those proteins that associate to give quaternary structures, further interactions must occur. Such associations are almost exclusively hydrophobic in nature, and molecules that tend to associate are usually more hydrophobic than molecules that do not. Proteins that have too many hydrophobic groups to bury completely in their interiors must carry some of these groups on their surfaces. Exposed hydrophobic groups are energetically unfavorable because they order the surrounding water and so lower the entropy of the system. To increase the entropy and lower the free energy of the system, these molecules tend to associate by placing their surface hydrophobic groups in contact with each other, which effectively removes these groups from the aqueous phase. Hydrophobic association is by nature non-specific; however, the distribution of charged groups on the protein surface adjacent to the hydrophobic areas can make the associations quite specific.

If sulfhydryl groups are located on the surface near the points of contact between the associating species, the formation of disulfide linkages can stabilize the structures. Weaker, but still important, stabilization can also come from salt bridges and other ionic interactions. Because these associations are largely hydrophobic, and hydrophobic interactions weaken as the temperature falls, quaternary structure can often be disrupted by exposing the protein to low temperature.

In general, it has been observed that free energy considerations and short-range interactions cause protein molecules to assume much of their secondary structure very rapidly. Hydrophobic interactions and turns help direct their folding into specific three-dimensional structures, which are then stabilized by longer-range interactions. Some proteins further associate into specific aggregates with defined quaternary structures.