Possibly the most widely used scheme for scoring amino acid pairs is
that developed by Dayhoff and co-workers [1]. The
system arose out of a general model for the evolution of proteins.
Dayhoff and co-workers examined alignments of closely similar
sequences where the likelihood of a particular mutation (e.g. A-D)
being the result of a set of successive mutations (e.g. A-x-y-D) was
low. Since relatively few families were considered, the resulting
matrix of accepted point mutations included a large number of entries
equal to 0 or 1. A complete picture of the mutation process including
those amino acids which did not change was determined by calculating
the average ratio of the number of changes a particular amino acid
type underwent to the total number of amino acids of that type present
in the database. This was combined with the point mutation data to
give the mutation probability matrix ($M$) where each element $M_{ij}$
gives the probability of the amino acid in column $j$
mutating to the amino acid in row $i$
after a particular evolutionary
time, for example after 2 PAM (Percentage of Acceptable point
Mutations per
years).
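The construction described above can be sketched in a few lines. This is a minimal illustration, assuming the usual formulation in which each off-diagonal column is the accepted point mutation counts scaled by the mutability of the column amino acid, and the diagonal holds the probability of no change. The three-letter alphabet, the counts, the occurrence totals and the function name `mutation_probability_matrix` are all hypothetical and are not Dayhoff's data.

```python
import numpy as np

def mutation_probability_matrix(counts, occurrences, pam=1.0):
    """Return (M, freq) where M[i, j] is the probability that the amino
    acid in column j is replaced by the amino acid in row i.

    counts[i, j]   : accepted point mutations between types i and j
                     (symmetric, zero diagonal)
    occurrences[j] : number of residues of type j in the data
    pam            : percentage of residues expected to change
    """
    counts = np.asarray(counts, dtype=float)
    occurrences = np.asarray(occurrences, dtype=float)
    changes = counts.sum(axis=0)             # changes observed for each type
    mutability = changes / occurrences       # ratio of changes to occurrences
    freq = occurrences / occurrences.sum()   # relative frequency of each type
    # Scale factor so that pam percent of residues are expected to change.
    lam = (pam / 100.0) / np.sum(freq * mutability)
    # Off-diagonal entries: column j is scaled by lam * mutability[j] / changes[j].
    M = lam * mutability * counts / changes
    # Diagonal entries: probability that the amino acid does not change.
    np.fill_diagonal(M, 1.0 - lam * mutability)
    return M, freq

# Toy data for a hypothetical three-letter alphabet (illustrative only).
toy_counts = [[0, 30, 10],
              [30, 0, 20],
              [10, 20, 0]]
toy_occurrences = [1000, 2000, 500]

M, freq = mutation_probability_matrix(toy_counts, toy_occurrences, pam=1.0)
print(M)
```

Each column of the result sums to one, since every amino acid either stays the same or is replaced by one of the others.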
The mutation probability matrix is specific for a particular evolutionary distance, but may be used to generate matrices for greater evolutionary distances by multiplying it repeatedly by itself. At the level of 2,000 PAM, Schwartz and Dayhoff suggest that all the information present in the matrix has degenerated except that the matrix element for Cys-Cys is 10% higher than would be expected by chance. At the evolutionary distance of 256 PAMs, one amino acid in five remains unchanged, but the amino acids vary in their mutability; 48% of the tryptophans, 41% of the cysteines and 20% of the histidines would be unchanged, but only 7% of serines would remain.
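The extrapolation by repeated multiplication can be illustrated with a toy example. The sketch below assumes a hypothetical three-letter alphabet with made-up frequencies and exchange rates (not Dayhoff's data) and uses NumPy's `matrix_power` to raise a 1 PAM matrix to higher distances. The helper names `extrapolate` and `fraction_unchanged` are illustrative.

```python
import numpy as np

# Hypothetical frequencies and symmetric exchange rates (illustrative only).
freq = np.array([0.5, 0.3, 0.2])
exchange = np.array([[0.0,    0.003,  0.0005],
                     [0.003,  0.0,    0.0015],
                     [0.0005, 0.0015, 0.0]])

# Column j holds the replacement probabilities for amino acid j.
M1 = exchange / freq                         # M1[i, j] = exchange[i, j] / freq[j]
np.fill_diagonal(M1, 1.0 - M1.sum(axis=0))   # probability of no change

def extrapolate(M, n):
    """Matrix for n times the evolutionary distance of M."""
    return np.linalg.matrix_power(M, n)

def fraction_unchanged(M, freq):
    """Expected fraction of positions still holding their original residue."""
    return float(np.sum(freq * np.diag(M)))

for n in (1, 64, 256, 2000):
    Mn = extrapolate(M1, n)
    print(n, round(fraction_unchanged(Mn, freq), 3))
```

At very large distances every column of the extrapolated matrix approaches the background frequencies, which corresponds to the loss of information noted above.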
When used for the comparison of protein sequences, the mutation
probability matrix is usually normalised by dividing each element $M_{ij}$
by the relative frequency of exposure to mutation of the
amino acid $i$. This operation results in the symmetrical
``relatedness odds matrix'' with each element giving the probability of
amino acid replacement per occurrence of $i$ per occurrence of $j$.
The logarithm of each element is taken to allow probabilities to be
summed over a series of amino acids rather than requiring
multiplication. The resulting matrix is the ``log-odds matrix'' which
is frequently referred to as ``Dayhoff's matrix'' and often used at a
distance of close to 256 PAM since this lies near to the limit of
detection of distant relationships where approximately 80% of the
amino acid positions are observed to have changed [2].
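The normalisation and logarithm steps can be sketched as follows. This is a minimal example under stated assumptions: the three-letter alphabet `ABC`, its frequencies and its exchange rates are hypothetical, the distance of 250 steps is arbitrary, and the scale of 10 times the base-10 logarithm is one common convention rather than something fixed by the text. The function names are illustrative.

```python
import numpy as np

alphabet = "ABC"                     # hypothetical three-letter alphabet
freq = np.array([0.5, 0.3, 0.2])     # illustrative background frequencies
exchange = np.array([[0.0,    0.003,  0.0005],
                     [0.003,  0.0,    0.0015],
                     [0.0005, 0.0015, 0.0]])

# Toy mutation probability matrix: column j mutates to row i.
M1 = exchange / freq
np.fill_diagonal(M1, 1.0 - M1.sum(axis=0))
M = np.linalg.matrix_power(M1, 250)  # extrapolate to a larger distance

def relatedness_odds(M, freq):
    """Divide each element M[i, j] by the frequency of amino acid i."""
    return M / freq[:, np.newaxis]

def log_odds(M, freq, scale=10.0):
    """Log-odds scores; the scale and base are assumptions of this sketch."""
    return scale * np.log10(relatedness_odds(M, freq))

def alignment_score(seq_a, seq_b, scores):
    """Sum the log-odds scores over aligned, ungapped positions."""
    return sum(scores[alphabet.index(a), alphabet.index(b)]
               for a, b in zip(seq_a, seq_b))

R = relatedness_odds(M, freq)
S = log_odds(M, freq)
print(np.round(S, 2))
print(alignment_score("ABCA", "ABBA", S))
```

Summing the log-odds scores over the aligned positions is equivalent to multiplying the corresponding odds, which is the reason for taking logarithms.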