
Discussion & Conclusions

In this paper we have presented a new method for protein fold recognition that exploits recent improvements in protein secondary structure prediction and can use other information, such as predictions of accessibility, loop lengths and experimental data, to restrict the possible folds. When applied to predicted secondary structures and accessibilities, the method was slightly better than one widely used fold recognition method [Jones et al., 1992] at detecting the correct fold for eleven test examples. The alignments it generates are of comparable accuracy to those of that method, at both the residue-residue and the secondary structure element level. When the query is defined by experimental secondary structures and accessibilities, the method is highly successful at recognising the correct fold. This suggests that the mapping method will improve alongside any future improvement in secondary structure and accessibility prediction. The method is also computationally inexpensive, so multiple searches can be performed quickly.

The simplicity of the technique suggests several enhancements that could improve accuracy further. The method of aligning sequences onto 3D structures might be improved by using empirically derived pair potentials or accessibility preferences (e.g. [Jones et al., 1992][Sippl, 1990]), or by identifying favourable interaction sites between secondary structures [Cohen et al., 1982][Cohen et al., 1980][Cohen & Sternberg, 1980]. A more sophisticated alignment and ranking procedure is under development.

The initial alignment and filtering procedures are perhaps the most distinctive feature of this technique. Other fold recognition techniques tend to provide only a single alignment between the query sequence and each database structure. Aligning at the level of secondary structure elements has the advantage that two proteins can be compared exhaustively, because a protein contains far fewer secondary structure elements than residues; most folds identified have an ensemble of alternative alignments that can be explored further.
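As a rough illustration of why exhaustive secondary structure element (SSE) comparison yields an ensemble of alignments, the sketch below enumerates every order-preserving, one-to-one pairing of query SSEs with database SSEs of the same type. It is not the authors' procedure: it assumes each protein is reduced to a string of SSE types ('H' for helix, 'E' for strand) and ignores element lengths, loops and accessibility. The function name `sse_alignments` and the `min_matched` cut-off are hypothetical.

```python
from itertools import combinations


def sse_alignments(query, target, min_matched):
    """Enumerate every order-preserving, one-to-one pairing of query SSEs with
    target SSEs of the same type.

    query, target: strings of SSE types, e.g. "HEE" ('H' helix, 'E' strand).
    min_matched: hypothetical cut-off on the number of aligned elements.
    Returns a list of alignments, each a list of (query_index, target_index).
    """
    results = []
    n, m = len(query), len(target)
    for k in range(min_matched, min(n, m) + 1):
        # Choose which k query elements and which k target elements to align;
        # combinations() yields indices in increasing order, so pairing them
        # with zip() preserves sequence order in both proteins.
        for q_idx in combinations(range(n), k):
            for t_idx in combinations(range(m), k):
                pairs = list(zip(q_idx, t_idx))
                # Keep the pairing only if every matched pair has the same type.
                if all(query[q] == target[t] for q, t in pairs):
                    results.append(pairs)
    return results


# Usage: all alignments of a three-element query that match at least two SSEs.
for alignment in sse_alignments("HEE", "EHEE", min_matched=2):
    print(alignment)
```

For the query `"HEE"` against `"EHEE"`, requiring all three elements to match leaves a single alignment, while relaxing the cut-off to two matched elements yields eight alternatives. The number of candidate pairings grows combinatorially, so brute-force enumeration like this is only practical because SSE counts are small; a production implementation would presumably prune the search.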

Since most protein structural similarities occur at the domain level, it is advantageous, whenever possible, to split both query and database structures into domains. The problem of assigning domains in protein 3D structures has been the subject of renewed interest [Islam et al., 1995][Sowdhamini & Blundell, 1995][Siddiqui & Barton, 1995][Holm & Sander, 1994b] and is likely to lead to accessible databases of protein structural domains. Assigning domains within proteins of unknown 3D structure is more problematic, though approaches based on sequence homology [Sonnhammer & Kahn, 1994][Pongor et al., 1994] are undoubtedly the most promising. The vWF and PID proteins above are both examples of domains that occur in a variety of multi-domain contexts.

The method described here also has applications in protein structure determination by NMR. A preliminary secondary structure assignment (equivalent to a very accurate prediction) and a small number of distance restraints may be available early in an NMR study. However, these data are usually insufficient to determine a unique structure by distance geometry or molecular dynamics [Smith-Brown et al., 1993]. Our results for the vWF and proteasome domains suggest that they may nevertheless be sufficient to locate a similar fold in the database, if one is present. Folds predicted from distance restraints and the secondary structure assignment may then guide the assignment of cross-peaks and thus speed up structure determination. The alternative consistent topologies may also give clues to possible structural, functional or evolutionary relationships that are generally not known until after the 3D structure has been determined (such as that described by [Matthews et al., 1994]).
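The use of early NMR distance restraints to discard inconsistent folds can be illustrated with a deliberately simplified sketch, which is not the authors' procedure. It assumes that each restraint has been reduced to a pair of query SSE indices that must be close, that each database fold supplies the set of SSE pairs in contact, and that candidate alignments are lists of (query SSE, database SSE) index pairs. The function names `satisfies_restraints` and `filter_by_restraints` are hypothetical.

```python
def satisfies_restraints(alignment, restraints, target_contacts):
    """Return True if no distance restraint is violated by the alignment.

    alignment: list of (query_sse, target_sse) index pairs.
    restraints: query SSE pairs known to be close (e.g. from NOEs).
    target_contacts: pairs of database SSEs that are in contact.
    Restraints involving an unaligned query SSE are ignored (a lenient choice).
    """
    mapping = dict(alignment)
    # Treat contacts as unordered, so (a, b) and (b, a) are equivalent.
    contacts = {frozenset(pair) for pair in target_contacts}
    for qi, qj in restraints:
        if qi in mapping and qj in mapping:
            if frozenset((mapping[qi], mapping[qj])) not in contacts:
                return False
    return True


def filter_by_restraints(alignments, restraints, target_contacts):
    """Keep only the alignments consistent with every applicable restraint."""
    return [a for a in alignments
            if satisfies_restraints(a, restraints, target_contacts)]


# Usage: query SSEs 0 and 1 are restrained to be close; in the database
# structure only SSEs 1 and 3 are in contact.
candidates = [[(0, 1), (1, 2), (2, 3)], [(0, 1), (1, 3)]]
print(filter_by_restraints(candidates, restraints=[(0, 1)], target_contacts=[(1, 3)]))
# → [[(0, 1), (1, 3)]]
```

Ignoring restraints whose ends are not both aligned keeps partial alignments in play; a stricter filter could instead reject them, at the risk of discarding folds that are only partly similar.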

We have shown that secondary structure predictions of typical accuracy, together with simple principles of protein 3D structure and/or experimental data, can be used to recognise correct protein folds in a library of domains. These results and others [Gerloff et al., 1995][Russell & Sternberg, 1995][Edwards & Perkins, 1995] suggest that secondary structure prediction, experimental data and protein structural principles should be used to augment protein fold recognition whenever possible.


gjb@bioch.ox.ac.uk