The results of this study suggest that distantly related protein structures can have little in common.
The proportion of two distantly related protein structures that is actually similar to each other
can be as little as of the maximum, which reinforces the observation that secondary structure
lengths and loops in distantly related
structures vary substantially. The degree to which accessibility and secondary structure are conserved
on a residue-by-residue basis within structurally similar proteins can be as low as that for dissimilar
proteins (i.e., by chance). The fraction of shared interactions (pairs of residues in contact in both
of two distantly related protein structures) can be as little as , even when a lenient definition
of - distance is used. Structurally similar proteins can
have almost no favourable interactions (i.e., those contributing a negative pseudo-energy term) in
common. Finally, regardless of any functional similarity, similar protein 3D
structures often have a proportion of complementary changes approaching that expected by chance.
All the results suggest that proteins can adopt very similar folds by using almost completely
different interactions, and that proteins having similar 3D structures can have little in common apart
from a scaffold of common core secondary structures.
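The "fraction of shared interactions" can be made concrete with a short calculation. The Python sketch below is illustrative only and is not the procedure used in this study: it assumes one representative atom per residue, a hypothetical 8 Å contact cutoff (the study's own distance definition is not reproduced here), a minimum sequence separation of 3 to skip trivial chain neighbours, and normalisation by the smaller of the two contact sets. The function names are hypothetical.

```python
import math

# Assumed, illustrative parameters (not the values used in this study).
CONTACT_CUTOFF = 8.0   # Angstroms between representative atoms
MIN_SEPARATION = 3     # skip residues this close along the chain


def contacts(coords, cutoff=CONTACT_CUTOFF, min_separation=MIN_SEPARATION):
    """Return the set of residue pairs (i, j), i < j, whose atoms lie within `cutoff`."""
    pairs = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.add((i, j))
    return pairs


def shared_contact_fraction(coords_a, coords_b, alignment):
    """Fraction of contacts common to two structures under a residue alignment.

    `alignment` maps residue indices in A to equivalent residue indices in B.
    The number of shared contacts is divided by the size of the smaller
    contact set (an assumed normalisation).
    """
    contacts_a = contacts(coords_a)
    contacts_b = contacts(coords_b)
    shared = 0
    for i, j in contacts_a:
        # A contact can only be shared if both of its residues are aligned.
        if i in alignment and j in alignment:
            bi, bj = alignment[i], alignment[j]
            if (min(bi, bj), max(bi, bj)) in contacts_b:
                shared += 1
    smaller = min(len(contacts_a), len(contacts_b))
    return shared / smaller if smaller else 0.0
```

Contacts involving unaligned residues (for example, in loops that differ in length between the two structures) can never count as shared, which is one reason such a fraction can be small even for genuinely similar folds.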
The results presented here have many implications for methods of protein fold detection. When considered on a residue-by-residue basis, the degree of conservation of secondary structure and accessibility between structurally similar proteins is similar to that for structurally dissimilar proteins, and the proportion of residues in common cores is low; together, these findings help explain why many methods of fold detection are often unable to detect genuine 3D structural similarities. In particular, methods that do not consider long-range interactions (i.e., side-chain to side-chain contacts) are unlikely to detect weak 3D structural similarities, since residue-by-residue (i.e., one-dimensional) measures of structural similarity are not well conserved for many genuinely similar proteins.
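Comparing residue-by-residue agreement with that expected by chance can be sketched as follows. This is a minimal illustration, assuming two already-aligned, equal-length strings of three-state secondary structure (H helix, E strand, C coil); accessibility binned into states such as buried/exposed could be compared in the same way. The Cohen's-kappa-style correction is our choice of chance-corrected measure, not necessarily the statistic used in this study, and the function names are hypothetical.

```python
from collections import Counter


def observed_agreement(states_a, states_b):
    """Fraction of aligned positions at which both proteins share the same state."""
    if len(states_a) != len(states_b) or not states_a:
        raise ValueError("expected two non-empty, equal-length aligned state strings")
    return sum(x == y for x, y in zip(states_a, states_b)) / len(states_a)


def chance_agreement(states_a, states_b):
    """Agreement expected if two equal-length state strings were paired at random,
    given the state composition of each protein."""
    n = len(states_a)
    freq_a, freq_b = Counter(states_a), Counter(states_b)
    return sum((freq_a[s] / n) * (freq_b[s] / n) for s in freq_a)


def agreement_above_chance(states_a, states_b):
    """Cohen's-kappa-style score: 0 means no better than chance, 1 means identical."""
    observed = observed_agreement(states_a, states_b)
    expected = chance_agreement(states_a, states_b)
    if expected == 1.0:
        # Both proteins use a single, identical state, so agreement is trivially total.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical aligned secondary-structure strings.
print(observed_agreement("HHHHCCEEEC", "HHCCCCEECC"))
print(chance_agreement("HHHHCCEEEC", "HHCCCCEECC"))
```

For these hypothetical strings the observed agreement is 0.7 against a chance expectation of 0.32; a score close to zero would correspond to the "as low as for dissimilar proteins" case described above.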
Methods which thread protein sequences onto 3D structural templates using pair potentials
[Bryant & Lawrence, 1993; Godzik et al., 1993; Jones et al., 1992; Sippl et al., 1992] are likely to
fare better, though all of these methods require that similar structures have a reasonable proportion
of interacting residues in common. The small fraction of residues common to the core of distantly
related proteins (as few as ), and the even smaller fraction of common interacting residues (as few
as ), suggest that many protein 3D structural similarities will be undetectable even by threading
methods, since key interactions are likely to be modelled incorrectly.
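The dependence of pair-potential threading on shared contacts can be seen in a toy scoring function. The sketch below is hypothetical throughout: it uses a two-class (hydrophobic/polar) alphabet, an invented pair potential that does not correspond to any published potential, and gapless threading, whereas the cited methods use full residue-specific potentials and allow gaps. It only shows the general shape of the calculation: the sequence is placed onto the template and pseudo-energies are summed over the template's contact pairs.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed classification, for illustration only

# Hypothetical two-class pair potential (negative = favourable); not a published potential.
PAIR_POTENTIAL = {
    ("H", "H"): -1.0,
    ("H", "P"): 0.2,
    ("P", "H"): 0.2,
    ("P", "P"): 0.1,
}


def residue_class(aa):
    """Map a one-letter amino-acid code to 'H' (hydrophobic) or 'P' (polar)."""
    return "H" if aa in HYDROPHOBIC else "P"


def thread_score(sequence, template_contacts, offset=0):
    """Pseudo-energy of placing sequence[offset + k] at template position k (no gaps)."""
    energy = 0.0
    for i, j in template_contacts:
        a, b = sequence[offset + i], sequence[offset + j]
        energy += PAIR_POTENTIAL[(residue_class(a), residue_class(b))]
    return energy


def best_gapless_threading(sequence, template_contacts, template_length):
    """Try every ungapped placement of the template along the sequence.

    Returns (lowest energy, offset of that placement).
    """
    if len(sequence) < template_length:
        raise ValueError("sequence shorter than template")
    return min(
        (thread_score(sequence, template_contacts, off), off)
        for off in range(len(sequence) - template_length + 1)
    )


# Hypothetical five-residue template with two contacts.
energy, offset = best_gapless_threading("LKKLLK", [(0, 3), (1, 4)], 5)
```

Because the score is summed only over the template's contact pairs, a sequence whose native structure makes largely different contacts is scored on the wrong pairs, which is the failure mode suggested above.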
Our findings suggest that it is more general features of protein structure,
such as having hydrophobic residues buried in the core of proteins, and polar residues on the surface,
rather than particular residue-residue interactions that determine how well a particular sequence adopts
a particular fold. If detection of similar folds having little in common outside of their core secondary structures
is to become a reality, efforts should concentrate on such general principles, and on methods for modelling
large loop regions that are likely to differ between similar 3D structures.
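A general feature of this kind can be scored without reference to any particular residue pair. The sketch below is a minimal illustration under stated assumptions, not a method from this study: it counts the fraction of positions at which a residue's type suits its environment (hydrophobic at buried positions, polar at exposed ones). The hydrophobic set, the 0.25 relative-accessibility threshold, and the function name are all hypothetical choices.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed classification, for illustration only


def environment_fit(sequence, relative_accessibility, buried_threshold=0.25):
    """Fraction of positions where the residue type suits its structural environment.

    A position counts as a fit if a hydrophobic residue sits at a buried position
    (relative accessibility below `buried_threshold`) or a polar residue sits at an
    exposed one. The threshold is an assumed value.
    """
    if len(sequence) != len(relative_accessibility) or not sequence:
        raise ValueError("expected a non-empty sequence with one accessibility value per residue")
    fits = 0
    for aa, acc in zip(sequence, relative_accessibility):
        buried = acc < buried_threshold
        # Fit when "is hydrophobic" and "is buried" are both true or both false.
        if (aa in HYDROPHOBIC) == buried:
            fits += 1
    return fits / len(sequence)


# Hypothetical four-residue example: L buried, K exposed, A exposed, V buried.
score = environment_fit("LKAV", [0.1, 0.8, 0.6, 0.05])
```

Here three of four positions fit (only the exposed alanine does not), giving 0.75. Unlike a pair-potential score, this measure does not depend on which particular residues are in contact.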
The results provide little insight as to whether structurally similar proteins have evolved
by divergence or convergence. However, the fact that there is no detectable difference between pairs of
structures that are functionally similar and those that are not (at a similar ) suggests that it
may be impossible to discern divergence from convergence. Those proteins which were defined as
type similarities are often thought to have a common ancestor. For example, it seems very likely
that the aspartic proteinase lobes (i.e., N- and C-terminal domains in the eukaryotic structures) are
related both to each other (i.e., by gene duplication or exon shuffling; see Blundell et al., 1979)
and to the single viral proteinase lobes, which dimerise to form a similar structure (e.g., Lapatto et al., 1989).
However, their degree of structural and sequence conservation is low. If one argues that the
proteinase lobes are related by divergence, then, since the degree of structural and sequence similarity
is comparable, one could argue the same for the plastocyanin and Ig light chain variable domain shown in
Figure 1, which are quite obviously functionally dissimilar. It would seem that both the sequence and structure of similar proteins can evolve
beyond recognition even when function is conserved.