AMINO ACIDS AND PRIMARY STRUCTURE
Proteins are defined as molecules that, upon hydrolysis, yield mixtures of α-amino acids. Proteins are synthesized on the ribosomes using information carried by RNA, which was originally derived from DNA. This information determines the sequence of amino acids that will make up the protein molecule. The linear sequence of amino acids in a protein is known as its primary structure. The information contained within this primary structure must also determine what changes, if any, will be made to the amino acids in the chain, as well as any other chemical reactions that may occur. The sequence must also provide all the information required to yield the final three-dimensional structure that the protein molecule will assume.
The amino acids that make up proteins are often compared to the letters of the alphabet that make up words. There are 26 letters and 20 common amino acids. In both cases the order of the units is essential to the finished result, and random association of letters or amino acids almost invariably results in nonsense. The major difference is the length of the units formed: most words contain only a few letters, while proteins usually contain hundreds of amino acids.
The number of possible combinations of amino acids becomes staggering for even moderately sized molecules. For a peptide containing only 10 amino acids, each position can be occupied by any of the 20 amino acids, so there could be

$$20^{10} \approx 1.02 \times 10^{13}$$

possible combinations. If the protein contained exactly one of each of the 20 common amino acids, the number of possible orderings would be

$$20! \approx 2.43 \times 10^{18}$$

Most proteins contain many more than 100 amino acids, and the number of possible combinations for these becomes extremely large.
In the case of a peptide containing 20 amino acids, with $20^{20} \approx 1.05 \times 10^{26}$ possible combinations, it is relevant to consider how many of these could reasonably have been sampled during the course of evolution. There are about $3.15 \times 10^{7}$ seconds in a year. If one million combinations could be tried every second, $3.15 \times 10^{13}$ combinations could be sampled each year. Sampling all the possible combinations would therefore require

$$\frac{1.05 \times 10^{26}}{3.15 \times 10^{13}\ \text{yr}^{-1}} \approx 3.33 \times 10^{12}\ \text{years}$$
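The arithmetic in this argument is easy to check. The short Python sketch below recomputes each figure, using the text's own assumptions of $3.15 \times 10^{7}$ seconds per year and one million trials per second; the variable names are illustrative only.

```python
import math

# Distinct sequences for a 10-residue peptide: 20 choices at each of 10 positions
n10 = 20 ** 10
# Orderings of 20 residues when each of the 20 amino acids is used exactly once
perm20 = math.factorial(20)
# Distinct sequences for a 20-residue peptide
n20 = 20 ** 20

SECONDS_PER_YEAR = 3.15e7   # value used in the text
TRIALS_PER_SECOND = 1e6     # the (generous) sampling rate assumed in the text

per_year = TRIALS_PER_SECOND * SECONDS_PER_YEAR
years_needed = n20 / per_year

print(f"20^10 = {n10:.3e}")
print(f"20!   = {perm20:.3e}")
print(f"20^20 = {n20:.3e}")
print(f"combinations sampled per year = {per_year:.3e}")
print(f"years to sample every 20-residue sequence = {years_needed:.3e}")
```

The printed values should agree with the rounded figures given above.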
One million samplings per second is far beyond what could be expected to occur, and $3.3 \times 10^{12}$ years is roughly $10^{3}$ times the length of time that life has existed on earth. Only an extremely small portion of all the possible combinations of amino acids in any given protein molecule can have been sampled during the course of the evolution of protein structure. The next section will discuss how the linear array of amino acids leads to the three-dimensional structure of proteins. The rest of this chapter will discuss the properties of amino acids.
The Chemical Nature of Amino Acids
The amino acids found in proteins have the following generalized structure (shown in its un-ionized form):

H2N–CH(R)–COOH

where R can stand for a variety of substituents (the side chain). The amino group and the carboxyl group of the amino acid have pK values of about 9.6 and 2.3, respectively, so the amino acid can exist in three general forms. At low pH both the amino and the carboxyl groups are protonated, and the molecule is in the acid form (net charge +1).

As the pH is increased towards neutrality, the carboxyl group loses its proton and the amino acid becomes a zwitterion, carrying both a negative and a positive charge.

As the pH increases further, the amino group also loses its proton and the molecule is in the basic form (net charge −1).

In going from low to high pH:

+H3N–CH(R)–COOH ⇌ +H3N–CH(R)–COO− ⇌ H2N–CH(R)–COO−
(acid form ⇌ zwitterion ⇌ basic form)
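The three forms behave like the successive states of a simple diprotic acid. As a sketch, assuming ideal Henderson–Hasselbalch behavior and the pK values of 2.3 and 9.6 quoted above (real values differ somewhat from one amino acid to another), the fraction of each form at a given pH can be computed as follows. The function names are hypothetical.

```python
def form_fractions(pH, pK_carboxyl=2.3, pK_amino=9.6):
    """Fractions of a free amino acid in the acid, zwitterion and basic forms.

    Treats the amino acid as a diprotic acid: the carboxyl group loses its
    proton first, then the amino group. R is assumed to have no ionizable group.
    """
    h = 10.0 ** -pH
    k1 = 10.0 ** -pK_carboxyl
    k2 = 10.0 ** -pK_amino
    acid = h * h            # +H3N-CH(R)-COOH
    zwitterion = h * k1     # +H3N-CH(R)-COO-
    base = k1 * k2          # H2N-CH(R)-COO-
    total = acid + zwitterion + base
    return acid / total, zwitterion / total, base / total


def isoelectric_point(pK_carboxyl=2.3, pK_amino=9.6):
    # With only these two groups, the net charge is zero midway between the pKs
    return (pK_carboxyl + pK_amino) / 2


for pH in (1.0, 2.3, 6.0, 9.6, 12.0):
    acid, zw, base = form_fractions(pH)
    print(f"pH {pH:4.1f}: acid {acid:.3f}  zwitterion {zw:.3f}  basic {base:.3f}")
print(f"pI = {isoelectric_point():.2f}")
```

With these pK values the zwitterion predominates over a wide pH range around neutrality, and the net charge is zero at the midpoint of the two pKs, about pH 5.95.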
When amino acids are joined together by peptide bonds, water is removed in the following condensation reaction:

H2N–CH(R1)–COOH + H2N–CH(R2)–COOH → H2N–CH(R1)–CO–NH–CH(R2)–COOH + H2O

It should be noted that joining amino acids by peptide bonds eliminates the acidic and basic character of the α-carboxyl and α-amino groups involved in each bond. The only ionizable groups in the finished protein are therefore the terminal amino and carboxyl groups and any R groups that contain acidic or basic groups.
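Because only the terminal groups and certain R groups remain ionizable, the net charge of a peptide at a given pH can be estimated by summing the average charge of each such group. The sketch below does this with the side-chain pK values quoted later in this chapter. It assumes ideal, independent titration of every group and reuses the free-amino-acid pKs of 9.6 and 2.3 for the chain termini, although the true terminal pKs in a peptide are somewhat different; the peptide itself is hypothetical.

```python
# pK values quoted in this chapter. The terminal values are those given for free
# amino acids and are only approximate for a peptide chain (assumption).
PK_N_TERMINUS = 9.6
PK_C_TERMINUS = 2.3
POSITIVE_SIDE_CHAINS = {"His": 6.0, "Lys": 10.5, "Arg": 12.5}
NEGATIVE_SIDE_CHAINS = {"Asp": 3.85, "Glu": 4.25, "Cys": 8.4, "Tyr": 9.6}


def positive_charge(pH, pK):
    """Average charge (0 to +1) of a group that is positive when protonated."""
    return 1.0 / (1.0 + 10.0 ** (pH - pK))


def negative_charge(pH, pK):
    """Average charge (0 to -1) of a group that is negative when deprotonated."""
    return -1.0 / (1.0 + 10.0 ** (pK - pH))


def net_charge(residues, pH):
    """Estimated net charge of a linear peptide given as three-letter codes."""
    charge = positive_charge(pH, PK_N_TERMINUS) + negative_charge(pH, PK_C_TERMINUS)
    for residue in residues:
        if residue in POSITIVE_SIDE_CHAINS:
            charge += positive_charge(pH, POSITIVE_SIDE_CHAINS[residue])
        elif residue in NEGATIVE_SIDE_CHAINS:
            charge += negative_charge(pH, NEGATIVE_SIDE_CHAINS[residue])
        # every other residue has no ionizable side chain and contributes nothing
    return charge


# A hypothetical pentapeptide, used only for illustration
peptide = ["Gly", "Asp", "Lys", "Ala", "Glu"]
for pH in (2.0, 7.0, 12.0):
    print(f"pH {pH:4.1f}: net charge {net_charge(peptide, pH):+.2f}")
```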
The Structure of Amino Acids
While there are hundreds of amino acids in nature, only 20 are commonly found in proteins. With one exception (proline), these are α-amino acids that differ only in the nature of their R groups. In this section the structure of each amino acid will be given and some of the properties that are unique to that compound will be discussed. The standard three-letter abbreviation for each amino acid will be given with its structure, and these abbreviations will be used frequently in the remainder of the text.

Glycine is the only amino acid that does not have optical isomers. The other amino acids do, but in proteins only the L isomer occurs. Glycine has no ionizable groups when it is located in peptide bonds. Glycine will spontaneously transfer from a nonaqueous to an aqueous environment with a ΔF of −4.6 kcal/mol. Because glycine has only a hydrogen as its side chain while the other amino acids contain either polar or nonpolar groups, Tanford (1962) suggested that glycine be used as a reference compound and that the other amino acids be compared to glycine to determine the nonpolarity, or hydrophobicity, of their side chains. These free energy of transfer values have been designated Hφ. The value for glycine is then zero by definition, and glycine can be expected to be found with nearly equal frequency in the interior and on the exterior of a protein. Because of its small size, glycine may be placed in portions of proteins where other amino acids cannot fit for steric reasons. The presence of glycine residues tends to interrupt helical structure, and glycine is often found in portions of proteins involved in turns.

The next-largest amino acid, alanine, contains a methyl group rather than a hydrogen attached to its α carbon. Like glycine, it has no ionizable groups when it is found in proteins. Its Hφ value of 0.75 kcal/mol suggests that it is mildly hydrophobic. Also like glycine, alanine tends to be found with almost equal frequency on the surface and in the interior of protein molecules. Alanine tends to promote the formation of helical structure.

Valine has no ionizable groups when found in peptide bonds, but has considerably more bulk than either Gly or Ala. Its Hφ value of 1.70 kcal/mol suggests that it is a fairly hydrophobic residue, and it is found almost exclusively in the interior of proteins. The presence of valine tends to favor the formation of helical structures.

Leucine resembles Val with an additional CH2 group before the branch. It has no ionizable groups when found in proteins and is strongly hydrophobic, with an Hφ of 2.40 kcal/mol, and it is found in the interior of protein molecules. Its presence tends to stabilize helical structures.

Isoleucine is a positional isomer of Leu and thus shares many of its characteristics. Branching at the β carbon tends to increase its bulk and thus its hydrophobicity. Isoleucine has the second-highest Hφ, 2.95 kcal/mol, has no ionizable groups when found in proteins, and tends to stabilize helical structures.

Serine resembles Ala with a hydroxyl group added. It is a neutral polar molecule that tends to remain on the surface of proteins (Hφ < 0). The presence of serine tends to interrupt helical structures. While generally present on the surface, serine can be found in the interior of a protein when its –OH group is involved in hydrogen bond formation. Serine can also form noncovalent cross-links between protein chains through hydrogen bonding.

Threonine is closely related to Ser; only a methyl group has been added to the molecule. The hydroxyl group makes the molecule polar, while the methyl group is nonpolar. Threonine has an Hφ of 0.45 kcal/mol but tends to be found on the surface of proteins unless its hydroxyl group is hydrogen bonded. The presence of threonine neither favors nor inhibits the formation of helical structure, and threonine has no ionizable groups when found in protein molecules. Like Ser, threonine can form hydrogen-bonded cross-links between protein chains.

Phenylalanine is a bulky amino acid with a strongly hydrophobic character. Its Hφ is 2.65 kcal/mol, and this residue is found almost exclusively in the interior of protein molecules. It has no ionizable groups, but its conjugated ring system contains π electrons that are able to interact with other molecules containing π electrons. Phenylalanine neither favors nor inhibits the formation of helical structure.

Tyrosine strongly resembles Phe but carries a hydroxyl group on the ring. The bulky ring gives the molecule a hydrophobic nature, with an Hφ of 2.85 kcal/mol, while the hydroxyl group is polar and will readily interact with water. The π electrons of the conjugated ring are also able to interact with other molecules containing π electrons. Tyrosine neither favors nor inhibits the formation of helical structure. This residue can be found both at the surface and in the interior of protein molecules; when found in the interior, its hydroxyl group is always involved in hydrogen bonding. At high pH values the hydroxyl group of Tyr can be ionized, so the side chain can be considered weakly acidic. The pK of this group is around 9.6 when tyrosine is located in a protein, and thus at neutral pH values the side chain is essentially un-ionized. Tyrosine absorbs strongly in the ultraviolet region and thus contributes to the UV absorbance of proteins.

Tryptophan is the bulkiest amino acid and has an Hφ of 3.0 kcal/mol. In spite of its very hydrophobic nature, it is found both on the surface and in the interior of protein molecules. Its extensive array of π electrons allows it to interact strongly with other molecules containing π electrons. Tryptophan has no ionizable groups and tends to favor the formation of helical structures. Tryptophan absorbs strongly in the region between 275 and 280 nm and makes a large contribution to the ultraviolet absorption of protein molecules.

Cysteine is a slightly acidic amino acid that is slightly hydrophobic, with an Hφ of 1.0 kcal/mol. The presence of cysteine neither favors nor inhibits the formation of helical structures. Probably the most important characteristic of cysteine is its ability to stabilize protein structure by forming disulfide linkages with other cysteine residues. These covalent cross-links add stability to the three-dimensional structures of proteins, and their formation and importance will be discussed in some detail in the next chapter. The sulfhydryl group of cysteine is a very weak acid with a pK of about 8.4, so at a pH near neutrality a few percent of the sulfhydryl groups of a protein will be ionized.
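The "few percent" figure follows from the Henderson–Hasselbalch equation. A minimal sketch, assuming ideal behavior and the pK of 8.4 quoted above (the actual pK of a given cysteine depends on its environment in the protein); the function name is hypothetical:

```python
def fraction_deprotonated(pH, pK):
    """Henderson-Hasselbalch: fraction of an acidic group that has lost its proton."""
    return 1.0 / (1.0 + 10.0 ** (pK - pH))


PK_CYS_SULFHYDRYL = 8.4  # value quoted in the text

for pH in (6.0, 7.0, 7.4, 8.4):
    f = fraction_deprotonated(pH, PK_CYS_SULFHYDRYL)
    print(f"pH {pH}: {100 * f:.1f}% of Cys -SH ionized to -S-")
```

At pH 7.0 this gives a little under 4% of the sulfhydryl groups in the ionized form, rising to half at pH 8.4.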

Methionine is a neutral amino acid with an Hφ of 1.3 kcal/mol. Like cysteine it contains sulfur, but it cannot form disulfide linkages. Methionine tends to favor the formation of helical structures. While unable to cross-link proteins through disulfide linkages, it can form important interactions with other constituents that may bind to proteins: the sulfur atom of methionine carries nonbonded electron pairs that can bind to metals, making methionine a metal ligand.

Proline is not a primary amine but rather a secondary amine (it is often called an imino acid). In peptide bonds formed by proline, the nitrogen carries no hydrogen that can take part in hydrogen bonding, and thus proline strongly inhibits α-helical structure. Chains containing proline tend to fold back upon themselves, and proline is found quite often in regions of proteins that form turns. Proline has no ionizable groups in proteins and has an Hφ of 2.6 kcal/mol.

Aspartic acid is one of the two dicarboxylic amino acids. The second carboxyl group makes the molecule very hydrophilic, with an Hφ < 0. The pK of the second carboxyl group is about 3.85, and thus aspartic acid carries a negative charge at neutral pH. Removing this charged group from the aqueous phase requires a large expenditure of energy, so charged amino acids are found almost exclusively at the surface of proteins. The carboxylate group can form ionic bonds with positively charged amino acids or metals, and it can also form ion–dipole interactions with water. These interactions are very important to the solubility properties of proteins. The presence of aspartic acid neither favors nor inhibits the formation of helical structures.

Glutamic acid is very similar to aspartic acid in structure and properties. It contains one more CH2 group but still has an Hφ < 0. Its side-chain carboxyl group is less acidic than that of Asp, with a pK of 4.25. The difference is not great, however, and at neutral pH dissociation is virtually complete. The presence of glutamic acid tends to favor the formation of helical structures, and, like Asp, it is involved in many interactions.

Asparagine resembles Asp, but the side-chain carboxyl group has been neutralized by formation of an amide. Some amino acids are modified after the protein is assembled, e.g., hydroxylation of some proline residues or methylation of some histidines. These changes are not determined directly by the genetic code but are instead carried out by specific enzymes that recognize certain amino acid sequences. This is not the case for asparagine: it is inserted as such during protein synthesis. Even without a free carboxyl group, asparagine is a polar molecule that is almost always found at the protein surface. It can act as a cross-linker between chains through hydrogen bond formation, or it can hydrogen bond to water at the protein surface. Asparagine tends to inhibit the formation of helical structure and is quite often found in protein bends.

Much as asparagine resembles Asp, glutamine resembles Glu; again the side-chain carboxyl group has been neutralized by formation of an amide. Like asparagine, glutamine is almost always found at the protein surface. It can act as a cross-linker between chains through hydrogen bond formation, or it can hydrogen bond to water at the protein surface. Glutamine tends to favor the formation of helical structures in proteins.

Histidine can bind a proton to the nonbonded electron pair of its ring nitrogen, becoming a weak acid at low pH. The pK of this acid is 6.0, so at neutral pH histidine is about 90% in the basic form, with about 10% still in the acid form. Histidine is the only amino acid with a side-chain group that titrates in the physiological pH range. It is a polar molecule (Hφ < 0) that tends to favor the formation of helical structure. Depending on its form, which in turn depends on the local pH of its environment, histidine can serve as either a proton donor or a proton acceptor. The nonbonded electron pair of the basic form is available for metal chelation. This versatility is widely exploited, and histidine is quite often found at the active sites of enzymes and as a point of attachment for metal-containing groups.

Arginine is a large polar molecule with a positive charge at neutral pH. The pK of its guanidino group is about 12.5. Even though the molecule carries a positive charge at almost all pH values, arginine is a very large molecule and has an Hφ of 0.75 kcal/mol. Arginine tends to interact with negatively charged groups, negative ions and water.

Lysine is a charged polar amino acid with an extra amino group. The pK of this group is about 10.5, so lysine carries a positive charge at the pH values that most proteins are likely to encounter. Lysine is a bulky molecule with an Hφ of 1.5 kcal/mol. Lysine neither favors nor inhibits the formation of helical structure and is capable of interacting with negatively charged groups and with water.
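The Hφ values quoted in the residue descriptions above are easier to compare side by side. The sketch below simply collects them into a table and ranks them; it adds no new data. Residues described only as having Hφ < 0, and those for which no value was given, are kept out of the numeric ranking rather than assigned invented numbers. The names are illustrative.

```python
# Hφ values (kcal/mol) as quoted in the residue descriptions of this chapter.
# Glycine is the reference (0 by definition). Ser, Asp, Glu and His are described
# only as Hφ < 0, and no value is quoted for Asn or Gln, so they are not ranked.
H_PHI = {
    "Gly": 0.0, "Ala": 0.75, "Val": 1.70, "Leu": 2.40, "Ile": 2.95,
    "Thr": 0.45, "Phe": 2.65, "Tyr": 2.85, "Trp": 3.0, "Cys": 1.0,
    "Met": 1.3, "Pro": 2.6, "Arg": 0.75, "Lys": 1.5,
}
BELOW_GLYCINE = ["Ser", "Asp", "Glu", "His"]

# Most hydrophobic first; residues with equal values keep their table order
ranked = sorted(H_PHI.items(), key=lambda item: item[1], reverse=True)
for rank, (residue, value) in enumerate(ranked, start=1):
    print(f"{rank:2d}. {residue}  {value:.2f}")
print("Reported only as below glycine (< 0):", ", ".join(BELOW_GLYCINE))
```

The ranking places tryptophan first and isoleucine second, as stated in the text; lysine and arginine receive positive values despite their charges, reflecting the bulk of their side chains noted above.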
Amino Acid Analysis
The amount of each amino acid contained within a given protein does not vary from molecule to molecule and can provide useful information about the nature of the protein molecule.
The peptide bonds of proteins are fairly strong, although they are weaker than many other covalent bonds. In a typical analysis of the amino acid content of a protein, these peptide bonds must first be broken so that the released amino acids can be assayed. Treatment with 6 N HCl at 110 °C for 22 hours releases most amino acids. In practice, for careful analysis, duplicate hydrolyses would also be performed for 48 and 72 hours. Equations exist to correct for partial destruction or slow release of amino acids from proteins using the data from the three hydrolysis times. The amino acids Trp, Met and Cys are not stable to acid digestion in the presence of even trace amounts of oxygen and will be destroyed to varying degrees.
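The text does not give the correction equations. As an illustration only, the sketch below assumes two simple treatments in common use (not necessarily the equations the text has in mind): for a residue that is destroyed during hydrolysis, the yields from the 22, 48 and 72 hour hydrolyses are fitted to first-order decay and extrapolated back to zero time; for a residue that is released slowly, the largest yield is taken. The function names and the data are hypothetical.

```python
import math


def extrapolate_to_zero_time(times_h, yields):
    """Fit ln(yield) = ln(y0) - k*t by least squares and return y0.

    Assumes the residue is destroyed by first-order kinetics during hydrolysis.
    """
    logs = [math.log(y) for y in yields]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_log = sum(logs) / n
    slope = sum((t - mean_t) * (lg - mean_log) for t, lg in zip(times_h, logs)) / sum(
        (t - mean_t) ** 2 for t in times_h
    )
    intercept = mean_log - slope * mean_t
    return math.exp(intercept)


def slow_release_estimate(yields):
    """For a residue released slowly, take the largest (usually longest-time) yield."""
    return max(yields)


times = [22, 48, 72]  # hydrolysis times in hours, as in the text
# Hypothetical nmol recovered for a residue that is slowly destroyed
labile = [9.0 * math.exp(-0.005 * t) for t in times]
# Hypothetical nmol recovered for a residue whose peptide bonds cleave slowly
slow = [7.1, 7.6, 7.8]

print(f"labile residue, zero-time estimate: {extrapolate_to_zero_time(times, labile):.2f} nmol")
print(f"slowly released residue estimate: {slow_release_estimate(slow):.2f} nmol")
```

Because the hypothetical labile data were generated from an exact exponential, the fit recovers the starting value of 9.0 nmol; real data would scatter about the fitted line.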
The sulfur-containing amino acids are usually determined by first exposing the protein to oxidizing conditions with performic acid, followed by the usual HCl digestion. Oxidation with performic acid converts cysteine to cysteic acid and methionine to methionine sulfone. These products are stable to acid and survive the usual hydrolysis conditions.
Tryptophan is usually determined following digestion under alkaline conditions or after proteolytic digestion of the protein. Following the appropriate digestion, the amino acids are usually separated by ion exchange chromatography and quantitated by the color they form on reaction with ninhydrin. In some cases separation is effected by chromatography on reversed-phase columns, and detection is sometimes made more sensitive by using reagents that cause the separated amino acids to fluoresce.
Amino Acid Sequence Determination
While the amino acid composition of a protein yields much useful information, the sequence of amino acids in the peptide chain is required for more sophisticated structural analysis. Determining the primary sequence is far more difficult and time-consuming than simple amino acid analysis, but the advent of automated methods that require only limited amounts of sample has led to a rapid increase in the number of proteins for which the primary sequence is known.
These methods will not be discussed in detail, but a few points regarding them will be made. One of the most important requirements for sequence determination is that adequate amounts of the protein to be studied be available in highly purified form. The actual determination requires great skill in techniques of partial protein digestion (both chemical and enzymatic) and in the subsequent purification of the resulting peptides. Only once this has been achieved can the newer, automated equipment be employed with any degree of success.
Once the primary sequence has been determined, even more information can be obtained. The sequence can be compared to others to see whether there is any resemblance. Proteins that perform the same or similar functions in a variety of organisms often have quite similar sequences. Quite often in these similar proteins, certain portions of the sequence vary only slightly, if at all, while other portions show much greater variability. Such comparisons often allow speculation about the relatedness of proteins and about their mechanisms of action. A knowledge of the primary sequence also makes it easier to interpret X-ray diffraction data on the three-dimensional conformation of the protein. One of the ultimate goals of many researchers in this field is to be able to predict the secondary and tertiary structures of proteins from the primary sequence alone. While much progress has been made in this area, much more information is required before such a goal can be realized.
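A very simple form of such a comparison is the percent identity of two sequences that have already been aligned. The sketch below assumes the alignment has been done beforehand (real comparisons require an alignment algorithm and usually a substitution scoring scheme, neither of which is shown), writes residues in the three-letter code used in this chapter with "-" for a gap, and uses two short hypothetical fragments. The function names are illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions at which two sequences carry the same residue.

    Positions with a gap ('-') in either sequence are not counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to the same length")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def conserved_positions(seq_a, seq_b):
    """1-based positions at which the aligned sequences agree."""
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a == b and a != "-"]


# Two hypothetical, already aligned fragments from "related" proteins
species_1 = ["Gly", "Asp", "Lys", "Ala", "Glu", "His", "Leu", "Val", "Cys"]
species_2 = ["Gly", "Asp", "Lys", "Ser", "Glu", "His", "Ile", "Val", "Cys"]

print(f"identity: {percent_identity(species_1, species_2):.1f}%")
print("conserved positions:", conserved_positions(species_1, species_2))
```

Positions that agree across many species correspond to the portions of a sequence described above as varying only slightly, if at all.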