
Protein Structure Patterns for Evaluation

Representatives (queries) from each of eleven structural families that contain structural similarities despite showing no sequence similarity [Russell & Barton, 1994] were chosen to assess the method. The queries are shown in Table 4 and represent a diversity of folds from all four protein folding classes. For every query, the database contains at least one clear example of a similar fold that shows no detectable sequence similarity to the query. For reference, similar folds in the database were identified with the STAMP (structural alignment of multiple proteins) structure comparison program [Russell & Barton, 1992] and by reference to the structural classification of proteins (SCOP) database [Murzin et al., 1995].

Two patterns were defined for each of the eleven structures: a) one taken directly from the DSSP secondary structure assignment and accessibility (i.e. perfect prediction) and b) one from cross-validated secondary structure and accessibility prediction by the methods of Rost & Sander [Rost & Sander, 1994][Rost & Sander, 1993]. The PHD program and jack-knifed neural network architectures were kindly provided by Dr Burkhard Rost (EMBL). Experimental secondary structure summaries and accessibilities (a) were taken from DSSP [Kabsch & Sander, 1983]. Predicted secondary structure summaries (b) were taken from the 'PHD sec' entries and accessibilities from the 'SUB acc' entries, since these most closely resembled the assignments obtained from the calculated accessibilities. PHD assignments of buried and exposed states were kept as buried and exposed, and all other positions (those marked 'i' or with no assignment) were classified as 'u'. Strands shorter than two residues and helices shorter than four residues were ignored. The length of each secondary structure element was fixed at the number of residues it contains (maximum = minimum), and the number of residues between consecutive elements was taken as the minimum loop length.
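The pattern construction rules described above can be illustrated with a short sketch. This is not the code used in this work; it is a minimal example under stated assumptions. The three-state string encoding ('H' for helix, 'E' for strand, anything else for loop), the single-letter accessibility codes 'b' and 'e', and the function names `build_pattern` and `classify_accessibility` are all hypothetical. The sketch also assumes that residues in ignored short elements count towards the surrounding loop.

```python
from itertools import groupby

# Minimum lengths from the text: strands shorter than two residues and
# helices shorter than four residues are ignored.
MIN_LENGTH = {"E": 2, "H": 4}


def build_pattern(sec_struct):
    """Return (elements, min_loops) for a per-residue secondary structure string.

    Assumed encoding: 'H' = helix, 'E' = strand, any other character = loop.
    elements  : list of (state, length); the length is fixed (maximum = minimum).
    min_loops : residues between consecutive kept elements (minimum loop length).
    """
    kept = []  # (state, start index, length) of every retained element
    position = 0
    for state, run in groupby(sec_struct):
        length = len(list(run))
        if state in MIN_LENGTH and length >= MIN_LENGTH[state]:
            kept.append((state, position, length))
        # Assumption: short elements are dropped and simply become loop.
        position += length

    elements = [(state, length) for state, _, length in kept]
    min_loops = [
        nxt_start - (start + length)
        for (_, start, length), (_, nxt_start, _) in zip(kept, kept[1:])
    ]
    return elements, min_loops


def classify_accessibility(acc):
    """Keep buried ('b') and exposed ('e') states; mark everything else 'u'.

    The letters 'b' and 'e' are assumed codes for the buried and exposed states.
    """
    return "".join(c if c in ("b", "e") else "u" for c in acc)


if __name__ == "__main__":
    elements, min_loops = build_pattern("--EEEE---HHHHHH-E-HHH--EE")
    print(elements)
    print(min_loops)
    print(classify_accessibility("bei b"))
```

In this example the one-residue strand and the three-residue helix are dropped, so the loop between the six-residue helix and the final strand absorbs their residues.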

Patterns may also contain distance restraints, such as those available from NMR experiments, disulphide linkages, or SDM studies. Distance restraints were added only to the von Willebrand factor and proteasome patterns (see Results).
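As an illustration of how such a restraint might be represented and checked, the following is a minimal sketch. The text does not specify how restraints are encoded or evaluated, so the `DistanceRestraint` class, its fields, the `is_satisfied` function and the use of a simple upper distance bound are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DistanceRestraint:
    """Hypothetical restraint: two pattern positions must lie within a cutoff."""
    position_i: int
    position_j: int
    max_distance: float  # upper bound in Angstroms (assumed form of restraint)


def is_satisfied(restraint, coords):
    """Check a restraint against a mapping of position -> (x, y, z) coordinates."""
    separation = math.dist(coords[restraint.position_i],
                           coords[restraint.position_j])
    return separation <= restraint.max_distance


if __name__ == "__main__":
    # Toy coordinates for two positions, e.g. a putative disulphide pair.
    coords = {3: (0.0, 0.0, 0.0), 40: (3.0, 4.0, 0.0)}
    print(is_satisfied(DistanceRestraint(3, 40, 6.0), coords))
    print(is_satisfied(DistanceRestraint(3, 40, 4.5), coords))
```

A restraint of this kind would reject candidate alignments that place the two positions too far apart in the query structure.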





gjb@bioch.ox.ac.uk