The results of comparing the eleven protein structures to the database
of domains using DSSP patterns, PHD patterns, and the THREADER program
are shown in Table 6. The table lists the top 10 ranked domains for
each query by each method. For each domain, the code, score,
structural class and fold description are shown together with the
alignment score and the percentage accuracies of the alignments at the
residue (%Res-Res) and secondary structure (%Sec-Sec) level (see
below). Within Table 6, domains classified as strict similarities
(ignoring those detectable by sequence comparison) are shown in
inverse text; loose similarities are shaded. Table 7
summarises the rankings shown in Table 6 (see legend).

Judging by the strict criteria shown in Table 5, 8/11 of the scans
made with experimentally determined secondary structure (MAP(DSSP))
put the correct fold in the first rank. By the loose definition, the
method located 10/11 folds in the first rank. Predictably, the scans
based on patterns from secondary structure prediction (MAP(PHD)) fare worse:
4/11 folds were correctly ranked at position 1 by the strict criteria.
However, this compares favourably with THREADER, which placed only 1 fold
correctly in the first rank. When the loose definitions of fold
similarity are used, our method placed 5/11 correct folds at the top
of the list compared to 2/11 for THREADER. Expanding the definition
of success to include any search that places a correct fold in the top 10, as
described by Lemer et al. (1996), shows a similar trend
(Table 7). The greater success of the
DSSP derived patterns suggests that fold recognition by this method
will improve alongside any improvements in secondary structure and
accessibility prediction. The structural class of proteins (as
identified using SCOP) in the top 10 domains was more consistent with
our method: MAP(PHD) scans led to 10/11 correct protein class
predictions for the first-ranked protein, compared to 5/11 for THREADER.
Although this improvement may be due mostly to the accuracy of the PHD
predictions, the result suggests that other fold recognition methods could
profit from the consideration of predicted secondary structures.
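
The rank-based success counts reported above (correct fold at rank 1, or anywhere in the top 10) can be tallied as in the following minimal Python sketch. It is an illustration only and not the code used to produce Table 7: the function names, the data layout and the toy identifiers are all hypothetical, and "correct" is assumed to mean membership in a pre-compiled set of strict (or loose) similarities for each query.

```python
from typing import Dict, List, Optional, Set


def first_correct_rank(ranked_domains: List[str], correct: Set[str]) -> Optional[int]:
    """Return the 1-based rank of the first correct domain, or None if absent."""
    for rank, domain in enumerate(ranked_domains, start=1):
        if domain in correct:
            return rank
    return None


def count_successes(
    scans: Dict[str, List[str]],
    truth: Dict[str, Set[str]],
    top_n: int,
) -> int:
    """Count queries with at least one correct domain within the top_n ranks."""
    hits = 0
    for query, ranked in scans.items():
        # Only the first top_n entries of the ranked list are inspected.
        rank = first_correct_rank(ranked[:top_n], truth.get(query, set()))
        if rank is not None:
            hits += 1
    return hits


# Invented toy data (hypothetical identifiers, NOT taken from Table 6).
scans = {
    "queryA": ["dom1", "dom2", "dom3"],
    "queryB": ["dom4", "dom5", "dom6"],
    "queryC": ["dom7", "dom8", "dom9"],
}
# Hypothetical sets of domains accepted as correct for each query.
strict = {"queryA": {"dom1"}, "queryB": {"dom6"}, "queryC": set()}

rank1 = count_successes(scans, strict, top_n=1)
top3 = count_successes(scans, strict, top_n=3)
print(f"rank 1: {rank1}/{len(scans)}, top 3: {top3}/{len(scans)}")
```

Passing a set of loose similarities instead of the strict one would give the corresponding loose counts under the same assumptions.
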

Our method (MAP) shows an improvement over THREADER with respect to
detecting the correct fold. What of alignments of sequence to structure?
Values for individual accuracies are given in Table 6. Reference alignments
of 3D structures were found by the STAMP algorithm [Russell & Barton, 1992] for
all strict similarities with the eleven protein
families. The averaged values for %Res-Res and %Sec-Sec are shown in
Table 8. MAP(DSSP), MAP(PHD) and THREADER give %Res-Res of 35, 15
and 11%, respectively, and %Sec-Sec of 75, 43 and 37%. If one
ignores the repetitive barrel alignments, accuracies improve slightly,
with %Res-Res of 39, 15 and 13% and %Sec-Sec of 86, 49 and 50% for
MAP(DSSP), MAP(PHD) and THREADER. None of the methods performs well by the
%Res-Res criterion, though %Sec-Sec suggests that the correct topology
is achieved about 50% of the time. The high %Sec-Sec for
MAP(DSSP) scans suggests that alignment accuracy, like fold recognition,
will improve with developments in secondary structure and
accessibility prediction.
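
To illustrate why %Sec-Sec can be high while %Res-Res remains low, the following Python sketch scores a test alignment against a reference alignment. The definitions used here are assumptions made for illustration and may differ from the exact definitions given elsewhere in the paper: %Res-Res is taken as the percentage of reference residue pairs reproduced exactly, and %Sec-Sec as the percentage of reference pairs (within secondary structure elements) for which the test alignment places the query residue on the same template element. All names and the toy data are hypothetical.

```python
from typing import Dict, Set, Tuple

# An alignment is a set of (query_position, template_position) pairs.
Alignment = Set[Tuple[int, int]]


def res_res_accuracy(test: Alignment, ref: Alignment) -> float:
    """Percentage of reference residue pairs reproduced exactly (assumed definition)."""
    if not ref:
        return 0.0
    return 100.0 * len(test & ref) / len(ref)


def sec_sec_accuracy(test: Alignment, ref: Alignment, element_of: Dict[int, str]) -> float:
    """Percentage of reference pairs placed on the correct template element.

    element_of maps template positions to a secondary structure element label;
    positions missing from the map are treated as loops and skipped (assumption).
    """
    test_map = dict(test)  # query position -> template position in the test alignment
    considered = 0
    correct = 0
    for query_pos, ref_template_pos in ref:
        element = element_of.get(ref_template_pos)
        if element is None:
            continue  # reference pair lies in a loop; not scored
        considered += 1
        test_template_pos = test_map.get(query_pos)
        if test_template_pos is not None and element_of.get(test_template_pos) == element:
            correct += 1
    return 100.0 * correct / considered if considered else 0.0


# Invented toy template: a helix (positions 1-4), a loop (5), a strand (6-8).
element_of = {1: "H1", 2: "H1", 3: "H1", 4: "H1", 6: "E1", 7: "E1", 8: "E1"}
reference = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 6), (6, 7), (7, 8)}
# Test alignment with the helix shifted by one residue; the strand is exact.
shifted = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)}

res = res_res_accuracy(shifted, reference)
sec = sec_sec_accuracy(shifted, reference, element_of)
print(f"%Res-Res = {res:.0f}, %Sec-Sec = {sec:.0f}")
```

In this toy case a one-residue shift within the helix removes most exact residue matches while leaving nearly all residues on the correct element, which is the kind of behaviour the two measures are meant to separate.
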

How useful are the detected loose similarities? For some examples,
loose similarities imply only a broadly similar architecture, and may
not be immediately usable for homology modelling studies. However, for
others the loose similarity genuinely represents a feasible modelling
template. For example, the PHD prediction for hepatocyte nuclear factor 3
(HNF-3) missed two short strands found in the native structure, and thus the MAP
search did not detect BirA domain I (PDB code 1BIA) or GAP domain I
(2GAP) as possible templates. However, the search with the
predominantly helical prediction did rank another helix-turn-helix
motif first, as shown in Figure 1. The three core
helices have been aligned correctly at the secondary structure level
and a prediction of this type could be useful in the absence of
experimental 3D structure information.