Structural similarity is a continuum and for some fold types opinions
differ as to what constitutes ``similar''. For example, thioredoxin
has a β-sheet with helices packing on each side, which
superficially resembles a Rossmann fold domain. However, the topology
of the sheet is different from a Rossmann fold: the connectivity is
different, and it contains a mixture of parallel and antiparallel
hairpins rather than all parallel. To build a detailed model of
thioredoxin based on a Rossmann fold would be incorrect, but
recognising that thioredoxin has a ``single sheet with helix on each
side'' is still useful. For some folds, e.g. the β-trefoils,
there is no such ambiguity. We discuss the accuracy of our method
using two grades of success, `strict' and `loose', which are outlined
in Table 5. Strict similarities are those where the topology of the
structure in the database is a nearly exact match to that found in
the query (e.g. plastocyanin and azurin). Loose similarities are
those where the topologies are broadly similar, with additional
secondary structures in one fold relative to another, and with some
differences in topological ordering or orientation of equivalent
secondary structure elements (e.g. plastocyanin and an Ig fold).
Strict similarities tend to correspond with those specified by scop
[Murzin et al., 1995], whereas the loose similarities tend to correspond
roughly with those identified by CATH [Orengo et al., 1993] and by
the assessors of the protein structure prediction challenge [Lemer et al., 1996].
For comparison, we also scanned the same eleven queries against the
database of domains using the fold recognition
program THREADER [Jones et al., 1992] with default parameters.
In addition to the recognition of the correct fold, it is important to
consider how well the query is aligned onto the database structure.
Two measures of alignment accuracy are given: a) the
fraction of correct residue equivalences found by each method
(% Res-Res), and b) the fraction of correctly overlapping secondary
structure elements found (% Sec-Sec). Secondary structures were
considered correctly matched if at least two residues from
structurally equivalent secondary structures overlapped in the
alignment generated by each method. % Res-Res is a
strict definition, and broadly measures how accurate a 3D model would
be if based on the alignment found. % Sec-Sec is a looser
definition, and allows for slippages of secondary structures and thus
indicates the accuracy of the predicted topology. The second
measure is arguably a more reliable guide, since for many pairs of similar
protein structures, alignments of sequence based on 3D structure
are ambiguous. Problems arise when assessing the symmetrical
barrel structures. Shifting the alignment of secondary structure
elements by one unit can lead to zero accuracy by
these measures, though the resulting structure is largely correct. We
thus report average accuracies with and without the
barrels.
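As a rough illustration of the two measures, the following Python sketch computes % Res-Res and % Sec-Sec for a single query-database pair. It is not the code used in this work: the function names, the representation of an alignment as a set of (query residue, database residue) index pairs, and the representation of secondary structure elements as inclusive residue ranges are all assumptions made for the example, and the toy alignment is invented.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]      # (query residue index, database residue index)
Segment = Tuple[int, int]   # inclusive (start, end) residue range of one element


def pct_res_res(reference: Set[Pair], predicted: Set[Pair]) -> float:
    """Percentage of reference (structure-based) residue equivalences
    that also appear in the predicted alignment."""
    if not reference:
        return 0.0
    return 100.0 * len(reference & predicted) / len(reference)


def pct_sec_sec(equivalent_sses: List[Tuple[Segment, Segment]],
                predicted: Set[Pair],
                min_overlap: int = 2) -> float:
    """Percentage of structurally equivalent secondary structure pairs
    that share at least `min_overlap` aligned residue pairs in the
    predicted alignment (two residues, following the text)."""
    if not equivalent_sses:
        return 0.0
    matched = 0
    for (q_start, q_end), (d_start, d_end) in equivalent_sses:
        overlap = sum(
            1 for i, j in predicted
            if q_start <= i <= q_end and d_start <= j <= d_end
        )
        if overlap >= min_overlap:
            matched += 1
    return 100.0 * matched / len(equivalent_sses)


# Invented toy data: ten equivalent residues forming two elements.
reference = {(i, i) for i in range(1, 11)}
sses = [((1, 5), (1, 5)), ((6, 10), (6, 10))]

# Predicted alignment that has slipped by one residue throughout.
shifted = {(i, i + 1) for i in range(1, 10)}

print(pct_res_res(reference, shifted))  # 0.0
print(pct_sec_sec(sses, shifted))       # 100.0
```

In this invented example a one-residue slippage gives 0% Res-Res but 100% Sec-Sec, which is the kind of difference between the strict and the looser measure described above.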
To assess the overall alignment accuracies of each method,
only those strict similarities that were not detectable by a sensitive
sequence comparison algorithm [Barton, 1993] were considered. The
excluded similarities were those with the globins 1ECA, 1HBG and 1MYGA when
scanning with sea hare myoglobin, and that with 1PAZ when scanning with plastocyanin.
For all other examples, accuracies were included in the calculation of an
average, regardless of whether the similarity was found at or near the
top of the ranked lists. A total of 36 strict similarities were used
in the calculation.
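The selection and averaging described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually implemented: the `Hit` record, its field names, the identifiers and all numerical values are hypothetical placeholders rather than results from this study.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional, Tuple


@dataclass
class Hit:
    """One strict similarity between a query and a database structure.
    All fields are hypothetical names chosen for this sketch."""
    query: str
    target: str
    res_res: float             # % Res-Res for this pair
    sec_sec: float             # % Sec-Sec for this pair
    sequence_detectable: bool  # found by sequence comparison alone
    is_barrel: bool            # symmetrical barrel structure


def average_accuracy(hits: Iterable[Hit],
                     include_barrels: bool = True
                     ) -> Optional[Tuple[float, float]]:
    """Mean (% Res-Res, % Sec-Sec) over hits that sequence comparison
    could not detect, optionally leaving out the barrels. Rank in the
    scan output is deliberately not used as a filter."""
    kept = [
        h for h in hits
        if not h.sequence_detectable and (include_barrels or not h.is_barrel)
    ]
    if not kept:
        return None
    return mean(h.res_res for h in kept), mean(h.sec_sec for h in kept)


# Placeholder values only; these are not results from the paper.
hits = [
    Hit("query_a", "target_1", 80.0, 100.0, False, False),
    Hit("query_b", "target_2", 0.0, 0.0, False, True),     # shifted barrel
    Hit("query_c", "target_3", 95.0, 100.0, True, False),  # excluded
]

print(average_accuracy(hits))                         # (40.0, 50.0)
print(average_accuracy(hits, include_barrels=False))  # (80.0, 100.0)
```

Reporting both averages shows how far a single shifted barrel alignment can pull the mean down, even when the resulting model is largely correct.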