AMINO ACIDS AND PRIMARY STRUCTURE

Proteins are defined as molecules that, upon hydrolysis, yield mixtures of α-amino acids. Proteins are synthesized at the ribosomes using information carried in RNA, which was originally derived from DNA. This information determines the sequence of amino acids that will make up the protein molecule. The linear sequence of amino acids in a protein is known as its primary structure. The information contained within this primary structure must also determine what changes, if any, will be made to the amino acids in the chain, as well as any other chemical reactions that may occur. The sequence must also provide all the information required to produce the final three-dimensional structure that the protein molecule will assume.

The amino acids that make up proteins are often compared to the letters of the alphabet that make up words. There are 26 letters and 20 amino acids. The order of letters or amino acids is important to the finished result. Random association of letters or amino acids almost invariably results in nonsense. The major difference is the length of the units formed. Words are mostly made up of only a few letters while proteins usually contain hundreds of amino acids.

The number of possible combinations of amino acids to yield proteins becomes staggering for even moderately sized molecules. For a peptide that contained only 10 amino acids there could be:

 

$20^{10} = 1.02 \times 10^{13}$

possible combinations. If the protein contained only one of each of the 20 common amino acids, there would be:

$20! = 2.43 \times 10^{18}$

combinations. Most proteins contain many more than 100 amino acids, and the number of possible combinations for these becomes extremely large.
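The counts above can be checked with a few lines of Python. This is a minimal sketch that assumes each position can be filled independently by any of the 20 amino acids; the function names are illustrative only. The last line extends the same arithmetic to a chain of 100 residues.

```python
import math

def sequence_count(n: int, alphabet: int = 20) -> int:
    """Number of distinct linear sequences of length n when every position
    can be any of `alphabet` amino acids."""
    return alphabet ** n

def permutation_count(alphabet: int = 20) -> int:
    """Number of orderings of a peptide containing each amino acid exactly once."""
    return math.factorial(alphabet)

for label, value in [
    ("20^10 (10-residue peptide)", sequence_count(10)),
    ("20! (one of each amino acid)", permutation_count()),
    ("20^100 (100-residue protein)", sequence_count(100)),
]:
    # Integers are converted to float for the 'e' presentation type.
    print(f"{label}: {value:.2e}")
```

The distinction between the two functions matters: $20^{n}$ allows repeats at every position, while $20!$ counts only rearrangements of a fixed set of 20 different residues.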

Consider a peptide of 20 amino acids, for which there are $20^{20} = 1.05 \times 10^{26}$ possible combinations, and ask how many of these could reasonably have been sampled during the course of evolution. There are $3.15 \times 10^{7}$ seconds per year. If one million combinations could be tried every second, $3.15 \times 10^{13}$ combinations could be sampled per year. To sample all the possible combinations would then require:

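$\dfrac{1.05 \times 10^{26}\ \text{combinations}}{3.15 \times 10^{13}\ \text{combinations/year}} = 3.33 \times 10^{12}\ \text{years}$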

One million samplings per second is far more than could be expected to occur, and $3.3 \times 10^{12}$ years is roughly $10^{3}$ times the length of time that life has existed on Earth. Only an extremely small portion of all the possible combinations of amino acids in any given protein molecule can have been sampled during the course of the evolution of protein structure. The next section will discuss how the linear array of amino acids leads to the three-dimensional structure of proteins. The rest of this chapter will discuss the properties of amino acids.
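The sampling-time estimate for a 20-residue peptide can be reproduced as a short calculation. This sketch uses the figures given in the text (seconds per year, one million trials per second); the age of life on Earth, about $3.5 \times 10^{9}$ years, is an assumption added here for the comparison and is not stated in the text.

```python
SECONDS_PER_YEAR = 3.15e7     # figure used in the text
TRIALS_PER_SECOND = 1e6       # the deliberately generous rate assumed in the text
LIFE_ON_EARTH_YEARS = 3.5e9   # assumption: rough age of life on Earth, not from the text

def years_to_sample(n_residues: int, alphabet: int = 20) -> float:
    """Years needed to try every sequence of the given length exactly once."""
    combinations = alphabet ** n_residues
    per_year = SECONDS_PER_YEAR * TRIALS_PER_SECOND
    return combinations / per_year

t = years_to_sample(20)
print(f"{t:.2e} years")   # 3.33e+12 years
print(f"about {t / LIFE_ON_EARTH_YEARS:.0f} times the assumed age of life on Earth")
```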

The Chemical Nature of Amino Acids

The amino acids found in proteins have the following generalized structure:

H2N-CH(R)-COOH

where R, the side chain, can stand for a variety of substituents. The amino group and the carboxyl group of the amino acid have pK values of about 9.6 and 2.3, respectively. Thus, the amino acid can exist in three general forms. At low pH both the amino and the carboxyl groups will be protonated and the molecule will be in the acid form:

+H3N-CH(R)-COOH

As the pH is increased towards neutrality, the carboxyl group loses its proton and the amino acid becomes a zwitterion, carrying both a negative and a positive charge:

+H3N-CH(R)-COO-

As the pH increases further, the amino group also loses its proton and the molecule is in the basic form:

H2N-CH(R)-COO-

In going from low to high pH:

+H3N-CH(R)-COOH ⇌ +H3N-CH(R)-COO- ⇌ H2N-CH(R)-COO-
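The proportions of the three forms at any pH follow from the Henderson-Hasselbalch relationship. The Python sketch below treats a free amino acid with no ionizable side chain as a diprotic acid with the pK values given above (2.3 and 9.6); the function names are illustrative, not from the text. It also computes the isoelectric point, the pH at which the average net charge is zero, as the mean of the two pK values.

```python
PK_CARBOXYL = 2.3  # alpha-carboxyl pK quoted in the text
PK_AMINO = 9.6     # alpha-amino pK quoted in the text

def species_fractions(pH, pk1=PK_CARBOXYL, pk2=PK_AMINO):
    """Return (acid, zwitterion, basic) mole fractions at the given pH.

    Treats the free amino acid as a diprotic acid and ignores any
    ionizable side chain.
    """
    h = 10.0 ** -pH
    k1 = 10.0 ** -pk1
    k2 = 10.0 ** -pk2
    d = h * h + h * k1 + k1 * k2          # sum of the three relative populations
    return h * h / d, h * k1 / d, k1 * k2 / d

def isoelectric_point(pk1=PK_CARBOXYL, pk2=PK_AMINO):
    # Net charge is zero midway between the two pK values.
    return (pk1 + pk2) / 2

for pH in (1.0, 7.0, 12.0):
    acid, zwit, base = species_fractions(pH)
    print(f"pH {pH:4.1f}: acid {acid:.3f}  zwitterion {zwit:.3f}  basic {base:.3f}")
print(f"pI = {isoelectric_point():.2f}")
```

At pH 7 nearly all of the molecules are zwitterions, consistent with the description above.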

When amino acids are joined together by peptide bonds, water is removed in the following reaction:

H2N-CH(R1)-COOH + H2N-CH(R2)-COOH → H2N-CH(R1)-CO-NH-CH(R2)-COOH + H2O

It should be noted that joining amino acids by peptide bonds eliminates the acidic and basic character of the α-carboxyl and α-amino groups involved. The only ionizable groups in the finished protein will be the terminal amino and carboxyl groups, together with any R groups that happen to contain acidic or basic groups.
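Because one molecule of water is lost for each peptide bond, the mass of a linear peptide of n residues is the sum of its free amino acids minus (n - 1) waters. This sketch assumes rounded average molar masses for a handful of amino acids; those numbers are standard tabulated values supplied for illustration and do not appear in this chapter, and `peptide_mass` is a hypothetical helper.

```python
WATER = 18.02  # g/mol

# Rounded average molar masses (g/mol) of a few free amino acids,
# supplied for illustration; not taken from this chapter.
FREE_AA_MASS = {
    "Gly": 75.07,
    "Ala": 89.09,
    "Ser": 105.09,
    "Val": 117.15,
    "Leu": 131.17,
    "Phe": 165.19,
}

def peptide_mass(residues):
    """Mass of a linear peptide: sum of the free amino acids minus one
    water for every peptide bond formed (n residues -> n - 1 bonds)."""
    if not residues:
        raise ValueError("peptide must contain at least one residue")
    total = sum(FREE_AA_MASS[r] for r in residues)
    return total - WATER * (len(residues) - 1)

print(f"{peptide_mass(['Gly', 'Gly']):.2f}")   # 132.12 (glycylglycine)
```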

The Structure of Amino Acids

While there are hundreds of amino acids in nature, only 20 are commonly found in proteins. With one exception (proline), these are α-amino acids that differ only in the nature of their R groups. In this section the structure of each amino acid will be given, and some of the properties that are unique to that compound will be discussed. The standard three-letter abbreviation for each amino acid will be given with its structure, and these abbreviations will be used frequently in the remainder of the text.

Glycine

Glycine is the only one of the common amino acids that does not have optical isomers. The others do, but only the L isomer occurs in proteins. Glycine has no ionizable groups when it is located in peptide bonds. Glycine will spontaneously transfer from a nonaqueous to an aqueous environment with a ΔF of -4.6 kcal/mol. Because glycine contains only a hydrogen as its side chain, while the other amino acids contain either polar or nonpolar groups, Tanford (1962) suggested that glycine be used as a reference compound and that other amino acids be compared to glycine to determine the nonpolarity, or hydrophobicity, of their side chains. These relative free energies of transfer have been designated Hf. The value for glycine is therefore zero by definition, and glycine can be expected to be found with nearly equal frequency in the interior and at the exterior of a protein. Because of its small size, glycine may be placed in portions of proteins where other amino acids cannot fit for steric reasons. The presence of glycine residues tends to interrupt helical structure, and glycine is often found in the portions of proteins involved in turns.
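The Hf values quoted for individual residues throughout this section can be gathered into a small lookup table. The sketch below ranks those residues and averages Hf over a sequence using a hypothetical helper, `mean_hf`. Residues given only as "Hf < 0" (Ser, Asp, Glu, His) or with no quoted value (Asn, Gln) are simply skipped; that handling is an assumption of the sketch, not part of Tanford's scale.

```python
# Hf values (kcal/mol, relative to glycine = 0) as quoted in this section.
# Ser, Asp, Glu and His are given only as "Hf < 0", and no values are
# quoted for Asn or Gln, so they are left out of the table.
HF = {
    "Gly": 0.0, "Ala": 0.75, "Val": 1.70, "Leu": 2.40, "Ile": 2.95,
    "Thr": 0.45, "Phe": 2.65, "Tyr": 2.85, "Trp": 3.0, "Cys": 1.0,
    "Met": 1.3, "Pro": 2.6, "Arg": 0.75, "Lys": 1.5,
}

def mean_hf(residues):
    """Average Hf over the residues that have a numeric value in HF.

    Returns (mean, number_of_residues_skipped); mean is None when no
    residue in the input has a numeric value.
    """
    values = [HF[r] for r in residues if r in HF]
    skipped = len(residues) - len(values)
    if not values:
        return None, skipped
    return sum(values) / len(values), skipped

# Rank the quantified residues from most to least hydrophobic.
ranking = sorted(HF, key=HF.get, reverse=True)
print(ranking[:3])   # ['Trp', 'Ile', 'Tyr']
print(mean_hf(["Leu", "Ser", "Val", "Gly"]))
```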

Alanine

The next larger amino acid, alanine, contains a methyl group rather than a hydrogen attached to its α-carbon. Like glycine, it has no ionizable groups when it is found in proteins. Its Hf value of 0.75 kcal/mol suggests that it is mildly hydrophobic. Also like glycine, alanine tends to be found with almost equal frequency on the surface and in the interior of protein molecules. Alanine tends to promote the formation of helical structure.

Valine

Valine has no ionizable groups when found in peptide bonds, but it has considerably more bulk than either Gly or Ala. Its Hf value of 1.70 kcal/mol suggests that it is a fairly hydrophobic residue. It is found almost exclusively in the interior of proteins. The presence of valine tends to favor the formation of helical structures.

 

Leucine

Leucine resembles Val with an additional CH2 group before the branch point. It has no ionizable groups when found in proteins and is strongly hydrophobic. It has an Hf of 2.40 kcal/mol and is found in the interior of protein molecules. Its presence tends to stabilize helical structures.

Isoleucine

Isoleucine is a positional isomer of Leu and thus shares many of its characteristics. The branching at the β-carbon tends to increase its bulk and thus its hydrophobicity. Isoleucine has the second-highest Hf, 2.95 kcal/mol, has no ionizable groups when found in proteins, and tends to stabilize helical structures.

Serine

Serine resembles Ala with a hydroxyl group added. It is a neutral polar molecule that tends to remain on the surface of proteins (Hf < 0). The presence of serine tends to interrupt helical structures. While generally present on the surface, serine can be found in the interior of proteins when its -OH group is involved in hydrogen bond formation. Serine can also form noncovalent cross-links between protein chains through hydrogen bonding.

Threonine

Threonine is closely related to Ser; it differs only by the addition of a methyl group. The hydroxyl group tends to make the molecule polar, while the methyl group tends to be nonpolar. Threonine has an Hf of 0.45 kcal/mol but tends to be found on the surface of proteins unless its hydroxyl group is hydrogen bonded. The presence of threonine neither favors nor inhibits the formation of helical structure, and threonine has no ionizable groups when found in protein molecules. Like Ser, threonine can form hydrogen-bonded cross-links between protein chains.

Phenylalanine

Phenylalanine is a bulky amino acid with a strong hydrophobic character. Its Hf is 2.65 kcal/mol, and this residue is found almost exclusively in the interior of protein molecules. It has no ionizable groups, but its conjugated ring system contains π electrons that are able to interact with other molecules containing π electrons. Phenylalanine neither favors nor inhibits the formation of helical structure.

Tyrosine

Tyrosine strongly resembles Phe but contains a hydroxyl group on the ring. The bulky ring gives the molecule a hydrophobic nature, with an Hf of 2.85 kcal/mol. The hydroxyl group is polar and will readily interact with water. The conjugated ring's π electrons are also able to interact with other molecules containing π electrons. Tyrosine neither favors nor inhibits the formation of helical structure. This residue can be found both at the surface and in the interior of protein molecules; when found in the interior, its hydroxyl group is always involved in hydrogen bonding. At very high pH values the hydroxyl group of Tyr can be ionized, and thus the molecule can be considered weakly acidic. The pK of this group is around 9.6 when tyrosine is located in a protein, so at neutral pH the group is essentially un-ionized. Tyrosine absorbs strongly in the ultraviolet region and thus contributes to the UV absorbance of proteins.

Tryptophan

Tryptophan is the bulkiest amino acid and has an Hf of 3.0 kcal/mol. In spite of its very hydrophobic nature, it is found both on the surface and in the interior of protein molecules. Its extensive array of π electrons allows it to interact strongly with other molecules containing π electrons. Tryptophan has no ionizable groups and tends to favor the formation of helical structures. Tryptophan absorbs strongly in the region between 275 and 280 nm and makes a large contribution to the ultraviolet absorption of protein molecules.
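Because Trp and Tyr dominate the ultraviolet absorbance of proteins, a common use of this property is estimating protein concentration at 280 nm with the Beer-Lambert law. The sketch below assumes the commonly tabulated molar extinction coefficients of about 5500 (Trp), 1490 (Tyr) and 125 (per disulfide-bonded Cys pair) M^-1 cm^-1; these values are not given in this chapter, and the helper names are hypothetical.

```python
# Assumed molar extinction coefficients at 280 nm (M^-1 cm^-1); commonly
# tabulated values, not figures taken from this chapter.
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125  # per disulfide-bonded pair of Cys residues

def extinction_280(n_trp, n_tyr, n_cystine=0):
    """Estimate a protein's molar extinction coefficient at 280 nm
    by summing the contributions of its absorbing groups."""
    return n_trp * EPS_TRP + n_tyr * EPS_TYR + n_cystine * EPS_CYSTINE

def molar_concentration(a280, epsilon, path_cm=1.0):
    # Beer-Lambert law: A = epsilon * c * l, so c = A / (epsilon * l)
    return a280 / (epsilon * path_cm)

eps = extinction_280(n_trp=2, n_tyr=4, n_cystine=1)
print(eps)                              # 17085
print(molar_concentration(0.5, eps))    # concentration in mol/L
```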

Cysteine

Cysteine is a weakly acidic amino acid that is slightly hydrophobic, with an Hf of 1.0 kcal/mol. The presence of cysteine neither favors nor inhibits the formation of helical structures. Probably the most important characteristic of cysteine is its ability to stabilize protein structure by forming disulfide linkages with other cysteine residues. These covalent cross-links add stability to the three-dimensional structures of proteins; their formation and importance will be discussed in some detail in the next chapter. The sulfhydryl group of cysteine is a very weak acid with a pK of about 8.4, so at a pH near neutrality a few percent of the sulfhydryl groups of a protein will be ionized.

Methionine

Methionine is a neutral amino acid with an Hf of 1.3 kcal/mol. It is related to cysteine, but its sulfur is bonded to a methyl group, so it cannot form disulfide linkages. Methionine tends to favor the formation of helical structures. Although it cannot cross-link proteins through disulfide linkages, methionine can form important interactions with other constituents that bind to proteins: the sulfur atom of methionine carries nonbonded electron pairs that are capable of binding metals, making methionine a metal ligand.

Proline

Proline is not a primary amine; its α-nitrogen is part of a ring and forms a secondary amine, so proline is often described as an imino acid. When proline is in a peptide bond, its nitrogen carries no hydrogen that can take part in hydrogen bonding, and thus proline strongly inhibits helical structure. Chains containing proline tend to fold back upon themselves, and proline is found quite often in regions of proteins that form turns. Proline has no ionizable groups and has an Hf of 2.6 kcal/mol.

 

Aspartic Acid

Aspartic acid is one of the two dicarboxylic amino acids. The second carboxyl group makes the molecule very hydrophilic; its Hf is < 0. The pK of the second carboxyl group is about 3.85, and thus aspartic acid carries a negative charge at neutral pH. Removing this charged group from the aqueous phase requires a large expenditure of energy, and thus charged amino acids are found almost exclusively at the surface of proteins. This group is able to form ionic bonds with positively charged amino acids or metals, and it can also form ion-dipole interactions with water. These interactions are very important to the solubility properties of proteins. The presence of aspartic acid neither favors nor inhibits the formation of helical structures.

Glutamic Acid

Glutamic acid is very similar to aspartic acid in its structure and properties. It contains one more CH2 group but still has an Hf < 0. Its side-chain carboxyl group is slightly less acidic than that of Asp, with a pK of 4.25. The difference is not great, however, and at neutral pH dissociation is virtually complete. The presence of glutamic acid tends to favor the formation of helical structures, and, like Asp, it takes part in ionic and ion-dipole interactions.

Asparagine

Asparagine resembles Asp, but the side-chain carboxyl group has been neutralized by formation of an amide. Some amino acids are modified after the protein is assembled, e.g., hydroxylation of some proline residues and methylation of some histidines. These changes are not determined directly by the genetic code but are instead performed by specific enzymes that recognize certain amino acid sequences. This is not the case for asparagine: it is inserted as such during protein synthesis. Even without the free carboxyl group, asparagine is a polar molecule that is almost always found at the protein surface. It can function as a chain cross-linker via hydrogen bond formation, or it can hydrogen bond to water at the protein surface. Asparagine tends to inhibit the formation of helical structure and is quite often found in protein bends.

Glutamine

 

Just as asparagine resembles Asp, glutamine resembles Glu; again, the side-chain carboxyl group has been neutralized by formation of an amide. Like asparagine, glutamine is almost always found at the protein surface. It can function as a chain cross-linker via hydrogen bond formation, or it can hydrogen bond to water at the protein surface. Glutamine tends to favor the formation of helical structures in proteins.

 

Histidine

Histidine can bind a proton to the nonbonded electron pair of a ring nitrogen, becoming a weak acid at low pH. The pK of this acid is 6.0, so at neutral pH histidine is about 90% in the basic form, with about 10% still in the acid form. Histidine is the only amino acid with a side-chain group that titrates in the physiological pH range. It is a polar residue (Hf < 0) that tends to favor the formation of helical structure. Depending on its form, which in turn depends on the local pH of its environment, histidine can serve as either a proton donor or a proton acceptor. The nonbonded electron pair of the basic form is always available for metal chelation. Because of this versatility, histidine is quite often found at the active sites of enzymes and as a point of attachment for metal-containing groups.
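The histidine percentages above, and the "few percent" of ionized cysteine sulfhydryl groups mentioned earlier, both follow from the Henderson-Hasselbalch equation for a single ionizable group. A minimal sketch, using the pK values quoted in this section; `fraction_basic` is a hypothetical helper name.

```python
def fraction_basic(pH: float, pK: float) -> float:
    """Fraction of a single ionizable group in its deprotonated (basic) form,
    from the Henderson-Hasselbalch equation: pH = pK + log10([base]/[acid])."""
    return 1.0 / (1.0 + 10.0 ** (pK - pH))

# pK values quoted in this section for side-chain groups in proteins.
print(f"His (pK 6.0) at pH 7.0: {fraction_basic(7.0, 6.0):.0%} basic")    # 91%
print(f"Cys (pK 8.4) at pH 7.0: {fraction_basic(7.0, 8.4):.1%} ionized")  # 3.8%
```

For cysteine the basic form is the ionized thiolate, so the same function gives the ionized fraction directly.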

Arginine

Arginine is a large polar molecule with a positive charge at neutral pH. The pK of the guanidino group is about 12.5. Even though the molecule carries a positive charge at almost all pH values, arginine is a very large molecule and has an Hf of 0.75 kcal/mol. Arginine tends to interact with negatively charged groups, with negative ions and with water.

Lysine

Lysine is a charged polar amino acid with an additional amino group (the ε-amino group) on its side chain. The pK of this group is about 10.5, and thus lysine carries a positive charge at the pH values that most proteins are likely to encounter. Lysine is a bulky molecule with an Hf of 1.5 kcal/mol. Lysine neither favors nor inhibits the formation of helical structure and is capable of interacting with negatively charged groups and with water.

Amino Acid Analysis

The amount of each amino acid contained within a given protein does not vary from molecule to molecule and can provide useful information about the nature of the protein molecule.

The peptide bonds of proteins are fairly strong, although they are weaker than many other covalent bonds. In a typical analysis of the amino acid content of a protein, these peptide bonds must first be broken so that the released amino acids can be assayed. Treatment with 6 N HCl at 110 °C for 22 hours releases most amino acids. In practice, for careful analysis, duplicate hydrolyses would also be performed for 48 and 72 hours. Equations exist to correct for the partial destruction or the slow release of amino acids, using the data from the three hydrolysis times. The amino acids Trp, Met and Cys are not stable to acid digestion in the presence of even trace amounts of oxygen and will be destroyed to varying degrees.
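One common form of such a correction can be sketched as follows. For a residue that is slowly destroyed, loss is assumed to be first order, so the logarithm of the yield is fitted against hydrolysis time and extrapolated back to zero time. For a residue that is released only slowly, the highest observed yield is taken. This is an illustrative sketch under those assumptions, not necessarily the specific equations the text refers to; the function names and the example amounts are hypothetical.

```python
import math

def extrapolate_to_zero_time(times_h, yields):
    """Estimate the zero-time yield of a residue that is slowly destroyed
    during hydrolysis, assuming first-order loss: y(t) = y0 * exp(-k t).
    Fits ln(y) = ln(y0) - k t by ordinary least squares; returns (y0, k)."""
    n = len(times_h)
    if n < 2 or n != len(yields):
        raise ValueError("need at least two matched (time, yield) points")
    logs = [math.log(y) for y in yields]
    t_mean = sum(times_h) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(times_h, logs))
    slope = sxy / sxx                     # equals -k
    intercept = l_mean - slope * t_mean   # equals ln(y0)
    return math.exp(intercept), -slope

def slow_release_estimate(yields):
    """For a residue released only slowly, take the highest observed yield."""
    return max(yields)

# Hypothetical nmol values at 22, 48 and 72 h, for illustration only.
times = [22, 48, 72]
y0, k = extrapolate_to_zero_time(times, [9.0, 8.1, 7.3])
print(f"zero-time estimate: {y0:.2f} nmol, loss rate {k:.4f} per h")
print(slow_release_estimate([4.1, 4.6, 4.8]))
```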

The sulfur-containing amino acids are usually determined after the protein is first oxidized with performic acid, followed by the usual HCl digestion. Oxidation with performic acid converts cysteine to cysteic acid and methionine to methionine sulfone. These residues are stable to acid and will survive the usual hydrolysis conditions.

Tryptophan is usually determined following digestion under alkaline conditions or after proteolytic digestion of the protein. Following the appropriate digestion, the amino acids are usually separated by ion-exchange chromatography and quantitated by the color they form on reaction with ninhydrin. In some cases separation is effected by chromatography on reversed-phase columns, and detection is sometimes made more sensitive by using reagents that cause the separated amino acids to fluoresce.
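Turning detector readings into a composition is simple arithmetic once standards have been run. In this sketch each amino acid gets its own single-point response factor from a standard of known amount, because the color yield with ninhydrin is not the same for every amino acid. The amounts are then divided by the amount of protein hydrolyzed and rounded to whole residues. The helper names and all numbers are hypothetical.

```python
def nmol_from_area(sample_area, standard_area, standard_nmol):
    """Convert a sample peak area to nmol using a single-point standard
    for the same amino acid."""
    response = standard_area / standard_nmol   # peak area per nmol
    return sample_area / response

def residues_per_molecule(nmol_by_aa, protein_nmol):
    """Divide each amino acid amount by the amount of protein hydrolyzed
    and round to the nearest whole number of residues."""
    return {aa: round(n / protein_nmol) for aa, n in nmol_by_aa.items()}

# Hypothetical data: (area, nmol) for the standard run and areas for the sample.
standards = {"Ala": (1200.0, 10.0), "Leu": (1100.0, 10.0)}
sample_areas = {"Ala": 2400.0, "Leu": 1650.0}
nmol = {aa: nmol_from_area(area, *standards[aa]) for aa, area in sample_areas.items()}
print(residues_per_molecule(nmol, protein_nmol=5.0))   # {'Ala': 4, 'Leu': 3}
```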

Amino Acid Sequence Determination

While the amino acid composition of a protein yields much useful information, the sequence of amino acids in the peptide chain is required for more sophisticated structural analysis. The determination of primary sequence is far more difficult and time consuming than is simple amino acid analysis but the advent of automated methods that require only limited amounts of sample has resulted in a rapid increase in the number of proteins for which the primary sequence is known.

These methods will not be discussed in detail, but a few points regarding them will be made. One of the most important requirements for sequence determination is that adequate amounts of the protein to be studied be available in highly purified form. The actual determination requires great skill in techniques of partial protein digestion (both chemical and enzymatic) and in the subsequent purification of the resulting peptides. Only once this has been achieved can the newer automated equipment be employed with any degree of success.

Once the primary sequence has been determined, even more information can be obtained. The sequence can be compared with others to see whether there is any resemblance. Proteins that perform the same or similar functions in a variety of organisms often have quite similar sequences. In these similar proteins, certain portions of the sequence frequently vary only slightly, if at all, while other portions may show much greater variability. Such comparisons often allow speculation about the relatedness of proteins and about their mechanism of action. Knowledge of the primary sequence also makes it easier to interpret x-ray diffraction data on the three-dimensional conformation of the protein. One of the ultimate goals of many researchers in this field is to be able to predict the secondary and tertiary structures of proteins from the primary sequence alone. While much progress has been made in this area, much more information is required before this goal can be realized.
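The simplest form of such a comparison is percent identity between two sequences that have already been aligned, together with a list of the positions that are identical in every sequence. In this sketch the alignment is assumed to have been done beforehand, and gaps are written as "-". One-letter amino acid codes are used for compactness. The sequences and helper names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.
    Positions where either sequence has a gap ('-') are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)

def conserved_columns(aligned):
    """Indices of alignment columns where every sequence has the same residue."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same aligned length")
    return [i for i in range(length)
            if aligned[0][i] != "-" and all(s[i] == aligned[0][i] for s in aligned)]

# Hypothetical aligned fragments, for illustration only.
seqs = ["MKVLAGHE", "MKVLSGHE", "MRVIAGHD"]
print(f"{percent_identity(seqs[0], seqs[1]):.1f}% identical")   # 87.5% identical
print(conserved_columns(seqs))                                   # [0, 2, 5, 6]
```

Conserved columns correspond to the portions of the sequence that "vary only slightly, if at all"; the remaining columns mark the more variable regions.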