In this paper we have presented a new method for protein fold
recognition which exploits recent improvements in protein secondary
structure prediction, and can use other information such as
predictions of accessibility, loop lengths and experimental data to
restrict possible folds. When applied to predicted secondary
structures and accessibilities, the method has been shown to be
slightly better than one widely used fold recognition method
[Jones et al., 1992] at detecting the correct fold in eleven test examples.
The alignments it generates are of comparable accuracy to those of that
method at both the residue-residue and secondary structure element level. When the
query is defined by experimental secondary structures and
accessibilities, the method is highly successful
at recognising the correct fold. This suggests that the mapping
method will improve alongside any future improvement in secondary
structure and accessibility prediction. The method is also
computationally inexpensive, so multiple searches can be performed quickly.
The simplicity of the technique suggests several enhancements that
could improve accuracy even further. The method of aligning
sequences onto 3D structures might be extended by the use of
empirically derived pair-potentials or accessibility preferences
(e.g. [Jones et al., 1992][Sippl, 1990]), or by the identification of favourable
interaction sites between secondary structures
[Cohen et al., 1982][Cohen et al., 1980][Cohen & Sternberg, 1980]. A more sophisticated alignment and ranking
procedure is under development.
The initial alignment and filtering procedures are perhaps the most
distinctive feature of this technique. Other fold recognition techniques tend
to provide only a single alignment between the query and each database
structure. Because secondary structure elements rather than individual
residues are aligned, exhaustive comparisons of two proteins can be
performed; most identified folds are associated with an ensemble of alternative
alignments that can be explored further.
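To illustrate why exhaustive comparison is feasible at the secondary structure element level, the sketch below enumerates every order-preserving mapping of query elements onto template elements of the same type. It is a minimal illustration under stated assumptions, not the procedure used in this work: SSEs are represented hypothetically as strings of types ('H' for helix, 'E' for strand), every query element must be matched, template elements may be skipped, and no loop-length, accessibility or scoring criteria are applied. The function name `enumerate_sse_alignments` is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def enumerate_sse_alignments(query: str, template: str) -> List[Tuple[int, ...]]:
    """Enumerate every order-preserving, type-matched mapping of query SSEs onto template SSEs.

    query, template: strings of SSE types in sequence order, e.g. "HEEH"
    ('H' = helix, 'E' = strand). Each result gives, for query element i,
    the index of the template element it is paired with.
    """
    alignments: List[Tuple[int, ...]] = []
    # combinations() yields strictly increasing index tuples, so the
    # sequential order of the query elements is preserved in the template.
    for chosen in combinations(range(len(template)), len(query)):
        # Keep the mapping only if every paired element has the same type.
        if all(query[i] == template[j] for i, j in enumerate(chosen)):
            alignments.append(chosen)
    return alignments
```

For example, `enumerate_sse_alignments("HE", "HEHE")` returns `[(0, 1), (0, 3), (2, 3)]`, an ensemble of three alternative alignments rather than a single one. The number of candidates grows as a binomial coefficient in the number of elements, which stays manageable only because single domains contain relatively few SSEs; a practical implementation would presumably prune candidates with further constraints such as loop lengths.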
Since most protein structure
similarities occur at the domain level, it is advantageous, whenever
possible, to split both query and database structures into domains.
The problem of assigning domains for protein 3D structures has been
the subject of renewed interest
[Islam et al., 1995][Sowdhamini & Blundell, 1995][Siddiqui & Barton, 1995][Holm & Sander, 1994b] and is likely to lead
to accessible databases of protein structural domains. Assigning
domains within proteins of unknown 3D structure is more problematic,
though approaches based on sequence homology [Sonnhammer & Kahn, 1994][Pongor et al., 1994]
are undoubtedly the most promising; the vWF and PID proteins above are
both examples of domains that occur in a variety of multi-domain
contexts.
The method described here has applications in protein structure
determination by NMR. During NMR structure determination, a
preliminary secondary structure assignment (equivalent to a very
accurate prediction) and a small number of distance restraints may be
available early in the study. However, these data are usually
insufficient to determine a unique structure by distance geometry or
molecular dynamics [Smith-Brown et al., 1993]. Our results for the
vWF and proteasome domains suggest that such data may be sufficient to locate a
similar fold in the database if one is present. Folds predicted from
distance restraints and secondary structure assignments may be used to
guide the assignment of cross-peaks and thus speed up the structure
determination process. The alternative consistent topologies may also
give clues to possible structural, functional or evolutionary
relationships that are generally not known until after 3D structure
determination (such as that described in [Matthews et al., 1994]).
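As an illustration of how a handful of early NMR restraints might narrow an ensemble of candidate alignments, the sketch below keeps only those alignments in which every restrained pair of query elements lands on a pair of template elements that are in contact. This is a hypothetical simplification, not the filtering used in this work: restraints are reduced to pairs of query SSE indices, template contacts are supplied as a set of SSE index pairs, and the names `consistent_with_restraints` and `filter_alignments` are invented for this example.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation: alignment[i] is the template SSE paired with query SSE i.
Alignment = Tuple[int, ...]


def consistent_with_restraints(
    alignment: Alignment,
    restraints: List[Tuple[int, int]],
    template_contacts: Set[FrozenSet[int]],
) -> bool:
    """True if every restrained pair of query SSEs maps onto template SSEs in contact."""
    return all(
        frozenset((alignment[a], alignment[b])) in template_contacts
        for a, b in restraints
    )


def filter_alignments(
    alignments: Iterable[Alignment],
    restraints: Iterable[Tuple[int, int]],
    template_contacts: Set[FrozenSet[int]],
) -> List[Alignment]:
    """Keep only the candidate alignments that satisfy all SSE-level restraints."""
    restraint_list = list(restraints)  # materialise so it can be reused for each alignment
    return [
        al for al in alignments
        if consistent_with_restraints(al, restraint_list, template_contacts)
    ]
```

With template contacts `{frozenset({0, 3}), frozenset({2, 3})}` and a single restraint `(0, 1)` stating that query elements 0 and 1 are close in space, `filter_alignments([(0, 1), (0, 3), (2, 3)], [(0, 1)], contacts)` returns `[(0, 3), (2, 3)]`. The alignments that survive correspond to the alternative consistent topologies that could then guide cross-peak assignment.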
We have shown that secondary structure predictions of typical
accuracy, together with simple principles of protein 3D structure
and/or experimental data, can be used to recognise the correct protein
fold in a library of domains. These results and others
[Gerloff et al., 1995][Russell & Sternberg, 1995][Edwards & Perkins, 1995] suggest that secondary structure
prediction, experimental data, and protein structural principles
should be used to augment protein fold recognition whenever possible.