All contacts, defined as any atom-atom distance of less than Å, were
calculated and tabulated for each of the
unique 3D structures given by Jones et al. (1992).
Residues were considered to be in contact if they had at least one shared contact
between the atoms of their side-chains.
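
The contact definition above can be illustrated with a minimal Python sketch. It assumes side-chain atoms are already available as coordinate tuples keyed by a residue identifier; the function name `residue_contacts`, the data layout, and the example cutoff of 4.0 Å are illustrative assumptions and are not values taken from this study.

```python
from itertools import combinations
from math import dist


def residue_contacts(side_chain_atoms, cutoff):
    """Count side-chain atom-atom contacts between every pair of residues.

    side_chain_atoms: dict mapping a residue identifier to a list of
        (x, y, z) side-chain atom coordinates in Angstroms (assumed layout).
    cutoff: distance threshold in Angstroms; a placeholder parameter.
    Returns a dict mapping (residue_a, residue_b) to the number of atom-atom
    contacts, keeping only residue pairs with at least one contact.
    """
    contacts = {}
    for (res_a, atoms_a), (res_b, atoms_b) in combinations(side_chain_atoms.items(), 2):
        n = sum(1 for a in atoms_a for b in atoms_b if dist(a, b) < cutoff)
        if n > 0:  # residues are in contact if they share at least one atom contact
            contacts[(res_a, res_b)] = n
    return contacts


# Toy coordinates, invented for illustration only.
example = {
    ("A", 1, "LEU"): [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    ("A", 2, "VAL"): [(3.0, 0.0, 0.0)],
    ("A", 9, "ASP"): [(30.0, 0.0, 0.0)],
}
contacts = residue_contacts(example, cutoff=4.0)  # 4.0 is an arbitrary example value
```

The all-pairs loop is quadratic in the number of residues, which is acceptable for a sketch; a spatial index would be the usual choice for a full structure database.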
Several authors have described potentials for the interaction of two residues within
protein structures. Some make use of
a reduced representation of protein structure [Bryant & Lawrence, 1993][Jones et al., 1992][Sippl, 1990], whereas
others consider all atoms [Godzik et al., 1993]. In this study every
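atom-atom contact made between protein side-chains was used to derive a simple
pseudo-energy term for the interaction of two residues $i$ and $j$:

$$\Delta e_{ij} = e_{ij} - e_{\mathrm{ref}}$$

where $e_{ij}$ is defined by:

$$e_{ij} = -RT \ln\left(\frac{O_{ij}}{E_{ij}}\right)$$

$O_{ij}$ is the observed number of contacts between residues of type $i$ and $j$,
$E_{ij}$ is the expected number (assuming a random model),
$RT$ is the product of the gas constant and the absolute temperature, and
$e_{\mathrm{ref}}$ is a reference state energy (discussed below).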
Given a database of known 3D structures, a set of pair potentials can thus be derived by counting the number of times a particular amino acid contact occurs and dividing this number by the number of times it is expected to occur given the total number of contacts made by each amino acid. For any given amino acid pair, the expected number of side-chain to side-chain contacts under a random model is [Narayana & Argos, 1984][Warme & Morgan, 1978]:

$$E_{ij} = \frac{\sum_{x} O_{ix} \sum_{x} O_{jx}}{N}$$

where $x$ denotes all amino acids,
$N$ is the total number of side-chain to side-chain
contacts in the dataset, and where
$\sum_{x} O_{ix}$ and $\sum_{x} O_{jx}$ are the total numbers of
side-chain to side-chain contacts made by residues of type $i$ and $j$
respectively.
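
As a hedged illustration of the random-model expectation, the sketch below computes the expected count for each pair of residue types as the product of the two per-type contact totals divided by the total number of contacts. The nested-dict layout, the function name `expected_contacts`, and the convention that each unordered pair of types is counted once in the grand total are assumptions made for this example.

```python
def expected_contacts(observed, types):
    """Expected contact counts under a random model.

    observed: symmetric nested dict, observed[i][j] = number of side-chain
        contacts between residue types i and j (assumed layout).
    types: list of residue type labels.
    """
    # Total contacts made by each residue type (sum over all partner types).
    row_total = {i: sum(observed[i][x] for x in types) for i in types}
    # Total contacts in the dataset, counting each unordered type pair once
    # (an assumed convention).
    total = sum(observed[i][j] for a, i in enumerate(types) for j in types[a:])
    return {i: {j: row_total[i] * row_total[j] / total for j in types} for i in types}


# Toy counts for two residue types, invented for illustration only.
types = ["L", "D"]
observed = {"L": {"L": 6, "D": 2}, "D": {"L": 2, "D": 2}}
expected = expected_contacts(observed, types)
```

With these toy counts the per-type totals are 8 for L and 4 for D, and the grand total is 10, so the expected L-D count is 8 × 4 / 10 = 3.2.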
Contacts within the database of
unique folds
were counted, and the observed number of contacts for each pair of amino acids was
used to calculate the pair energies $e_{ij}$. The reference state energy
$e_{\mathrm{ref}}$ was calculated by taking the
average of all values of $e_{ij}$, which gives
kJ/mol.
Values of the pseudo-energy $\Delta e_{ij}$
were calculated by the equations described above, and are given
in Table 3.
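
The derivation can be sketched end to end as follows. This is a sketch under stated assumptions rather than the exact procedure used here: it assumes a Boltzmann-like form (minus RT times the log of observed over expected), a temperature of 298 K, a reference energy taken as the mean over unordered type pairs and subtracted from each value, and it skips pairs with no observed contacts. The function name `pair_potentials` and the toy counts are hypothetical.

```python
from math import log

R_KJ = 8.314e-3  # gas constant in kJ/(mol K)


def pair_potentials(observed, expected, temperature=298.0):
    """Return (reference-corrected pseudo-energies, reference energy) in kJ/mol.

    observed, expected: dicts keyed by unordered residue-type pairs, e.g. ("L", "D").
    temperature: assumed value in kelvin; not stated in the text.
    """
    # Raw pair energies; pairs never observed are skipped (an assumption).
    raw = {
        pair: -R_KJ * temperature * log(observed[pair] / expected[pair])
        for pair in observed
        if observed[pair] > 0
    }
    # Reference state energy: average of all raw pair energies.
    e_ref = sum(raw.values()) / len(raw)
    # Pseudo-energies relative to the reference state.
    return {pair: e - e_ref for pair, e in raw.items()}, e_ref


# Toy observed and expected counts, invented for illustration only.
observed = {("L", "L"): 6, ("L", "D"): 2, ("D", "D"): 2}
expected = {("L", "L"): 6.4, ("L", "D"): 3.2, ("D", "D"): 1.6}
deltas, e_ref = pair_potentials(observed, expected)
```

Because the reference energy is the mean of the raw values, the corrected pseudo-energies sum to zero by construction, which places the zero point at the average pair.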
The columns/rows of Table 3 can be used to classify amino acids according to their pair preferences. A measure of the difference in pair preference can be obtained by summing the absolute differences between the values in each column for every possible pair of amino acids. Figure 2 shows a complete linkage dendrogram for these data. The clustering of the hydrophobic residues (M, A, V, L, I, W, F) is similar to clustering by side-chain properties [Taylor, 1986a], and reflects their similar pair preferences. However, unlike other classifications of the amino acids, the charged residues cluster separately by sign (i.e., R and K do not cluster with E and D), suggesting (as expected) that positive and negative residues, when in contact with other residues, are unlikely to undergo mutations involving a change in sign.
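
The pair-preference distance and the complete linkage clustering can be sketched in plain Python as below. The four-residue table holds invented values for illustration and is not taken from Table 3; the function names are hypothetical, and a library routine such as SciPy's hierarchical clustering would be the more usual choice in practice.

```python
def preference_distance(table, a, b):
    """Sum of absolute differences between the columns for residues a and b."""
    return sum(abs(table[a][x] - table[b][x]) for x in table)


def complete_linkage(table):
    """Agglomerative clustering; returns the merges as (cluster, cluster, distance)."""
    clusters = [(name,) for name in table]
    merges = []

    def linkage(c1, c2):
        # Complete linkage: distance between the two most dissimilar members.
        return max(preference_distance(table, a, b) for a in c1 for b in c2)

    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Invented pseudo-energies for four residue types (illustration only).
table = {
    "L": {"L": -1.0, "V": -0.9, "K": 0.5, "D": 0.6},
    "V": {"L": -0.9, "V": -0.8, "K": 0.4, "D": 0.5},
    "K": {"L": 0.5, "V": 0.4, "K": 0.8, "D": -0.7},
    "D": {"L": 0.6, "V": 0.5, "K": -0.7, "D": 0.9},
}
merges = complete_linkage(table)
```

In this toy table L and V have nearly identical columns and merge first, while K and D merge later at a larger distance because their self-pair and cross-pair values differ strongly.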
When considering a single pair of interacting residues, the pair potentials provide an
approximate test of whether the interaction is favourable (i.e., whether it will
act to stabilise or destabilise the overall fold) simply by inspecting the sign of
the pseudo-energy $\Delta e_{ij}$. Negative values (i.e.,
$\Delta e_{ij} < 0$) are expected to stabilise the fold, whereas
positive values are expected to be disruptive. Although the pair potentials discussed
here differ from many of those used previously [Bryant & Lawrence, 1993][Godzik et al., 1993][Jones et al., 1992][Sippl, 1990],
the signs of
$\Delta e_{ij}$ are similar, suggesting that this simple test, and the results
that follow, would differ little if another pair potential were used.
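
The sign test amounts to a one-line classification, sketched here for concreteness. The function name `sign_test`, the "neutral" label for a value of exactly zero, and the example values are assumptions added for illustration.

```python
def sign_test(delta_e):
    """Classify a pair interaction by the sign of its pseudo-energy (kJ/mol)."""
    if delta_e < 0:
        return "stabilising"
    if delta_e > 0:
        return "destabilising"
    return "neutral"  # exactly zero; this case is an assumption, not from the text


# Hypothetical pseudo-energies, not values from Table 3.
labels = [sign_test(v) for v in (-0.8, 0.3, 0.0)]
```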