Searches with eleven test proteins

The results of comparing the eleven protein structures with the database of domains using DSSP patterns, PHD patterns and the THREADER program are shown in Table 6. The table lists the 10 top-ranked domains for each query by each method. For each domain, the code, score, structural class and fold description are shown, together with the alignment score and the percentage accuracy of the alignment at the residue (%Res-Res) and secondary structure (%Sec-Sec) levels (see below). In Table 6, domains classified as strict similarities (ignoring those detectable by sequence comparison) are shown in inverse text, and loose similarities are shaded. Table 7 summarises the rankings shown in Table 6 (see legend).

Judging by the strict criteria shown in Table 5, 8/11 of the scans made with experimentally determined secondary structure (MAP(DSSP)) put the correct fold in the first rank. By the loose definition, the method located 10/11 folds in the first rank. Predictably, the scans based on patterns from secondary structure prediction (MAP(PHD)) fare worse: only 4/11 folds were correctly ranked first by the strict criteria. However, this compares favourably with THREADER, which placed only 1/11 folds correctly in the first rank. When the loose definitions of fold similarity are used, our method placed a correct fold at the top of the list for 5/11 queries, compared with 2/11 for THREADER. Expanding the definition of success to include any search that places a correct fold in the top 10, as described by Lemer et al. (1996), shows a similar trend (Table 7). The greater success of the DSSP-derived patterns suggests that fold recognition by this method will improve alongside any improvements in secondary structure and accessibility prediction. The structural class of proteins (as identified using SCOP) in the top 10 domains was also more consistent with our method: MAP(PHD) scans led to 10/11 correct protein class predictions for the first-ranked protein, compared with 5/11 for THREADER. Although this improvement may be due mostly to the accuracy of the PHD predictions, the result suggests that other fold recognition methods could benefit from considering predicted secondary structures.
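
The rank-based success counts above (a correct fold at rank 1, or anywhere in the top 10) can be tallied mechanically from each query's ranked hit list. The sketch below is a minimal illustration under assumed conventions: hits are represented as fold labels, and the strict and loose criteria are expressed simply as different sets of acceptable folds per query. The function `top_n_successes` and the toy queries are hypothetical and are not data from Table 6.

```python
from typing import Dict, List, Set


def top_n_successes(
    ranked_hits: Dict[str, List[str]],
    correct_folds: Dict[str, Set[str]],
    n: int,
) -> int:
    """Count queries whose top-n hits include at least one acceptable fold.

    ranked_hits maps a query name to its hit fold labels, best first.
    correct_folds maps the same query name to the fold labels counted as
    correct under a chosen criterion (for example strict or loose similarity).
    """
    successes = 0
    for query, hits in ranked_hits.items():
        acceptable = correct_folds.get(query, set())
        # Only the first n hits are inspected; n=1 gives first-rank success.
        if any(fold in acceptable for fold in hits[:n]):
            successes += 1
    return successes


# Hypothetical toy data: fold labels for two queries, not results from the paper.
hits = {
    "query_a": ["TIM barrel", "Rossmann fold", "ferredoxin-like"],
    "query_b": ["ferredoxin-like", "globin-like", "TIM barrel"],
}
strict = {"query_a": {"TIM barrel"}, "query_b": {"globin-like"}}

print(f"{top_n_successes(hits, strict, n=1)}/{len(hits)}")   # 1/2
print(f"{top_n_successes(hits, strict, n=10)}/{len(hits)}")  # 2/2
```

Passing the loose rather than the strict sets of acceptable folds, or SCOP class labels instead of fold labels with `n=1`, gives the other tallies in the same way.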

Our method (MAP) shows an improvement over THREADER in detecting the correct fold. How accurate, though, are its alignments of sequence to structure? Values for individual accuracies are given in Table 6. Reference alignments of 3D structures were obtained with the STAMP algorithm (Russell & Barton, 1992) for all strict similarities with the eleven protein families. The averaged values for %Res-Res and %Sec-Sec are shown in Table 8. MAP(DSSP), MAP(PHD) and THREADER give %Res-Res of 35, 15 and 11% respectively, and %Sec-Sec of 75, 43 and 37%. If one ignores the repetitive barrel alignments, accuracies improve slightly, to %Res-Res of 39, 15 and 13% and %Sec-Sec of 86, 49 and 50% for MAP(DSSP), MAP(PHD) and THREADER. None of the methods performs well by the %Res-Res criterion, though the %Sec-Sec values for MAP(PHD) and THREADER (37 to 50%) suggest that the correct topology is achieved about half the time. The high %Sec-Sec for MAP(DSSP) scans suggests that alignment accuracy, like fold recognition, will improve with developments in secondary structure and accessibility prediction.
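
The two accuracy measures compare each method's alignment with the STAMP reference alignment. A minimal sketch follows, assuming (for illustration only, and possibly differing in detail from the definitions given in the section on assessing accuracy) that %Res-Res is the percentage of reference-aligned query residues placed on the same template residue, and that %Sec-Sec is the same quantity computed over secondary structure elements. The functions `percent_agreement` and `mean_accuracy`, the unweighted averaging over pairs, and all example data are hypothetical.

```python
from collections.abc import Hashable
from statistics import mean
from typing import AbstractSet, Mapping, Optional


def percent_agreement(
    test: Mapping[Hashable, Optional[Hashable]],
    reference: Mapping[Hashable, Optional[Hashable]],
) -> float:
    """Percentage of reference-aligned positions reproduced by a test alignment.

    Keys are query positions (residue numbers or secondary structure element
    labels); values are the template positions they are aligned to, with None
    marking a gap. Only positions that are aligned in the reference are scored.
    """
    scored = [q for q, t in reference.items() if t is not None]
    if not scored:
        return 0.0
    # A position counts as correct only if it maps to the same template position.
    correct = sum(1 for q in scored if test.get(q) == reference[q])
    return 100.0 * correct / len(scored)


def mean_accuracy(
    per_pair: Mapping[str, float],
    exclude: AbstractSet[str] = frozenset(),
) -> float:
    """Unweighted mean of per-pair accuracies, leaving out excluded pairs.

    Raises statistics.StatisticsError if every pair is excluded.
    """
    return mean(v for k, v in per_pair.items() if k not in exclude)


# Hypothetical residue-level alignments (query index -> template index).
reference_res = {0: 0, 1: 1, 2: 2, 3: None, 4: 3}
test_res = {0: 0, 1: 2, 2: 2, 3: 3, 4: 3}
print(percent_agreement(test_res, reference_res))  # 75.0

# Hypothetical secondary-structure-level alignments (query SSE -> template SSE).
reference_sse = {"H1": "A", "H2": "B", "H3": "C"}
test_sse = {"H1": "A", "H2": "C", "H3": None}
print(round(percent_agreement(test_sse, reference_sse), 1))  # 33.3

# Averaging with a (hypothetical) repetitive barrel pair left out.
scores = {"pair_1": 75.0, "pair_2": 25.0, "barrel_pair": 10.0}
print(mean_accuracy(scores, exclude={"barrel_pair"}))  # 50.0
```

Treating residues and secondary structure elements as interchangeable keys lets one function serve both measures; only the granularity of the input alignments changes.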

How useful are the detected loose similarities? In some cases, a loose similarity implies only a broadly similar architecture and may not be directly usable for homology modelling. For others, though, the loose similarity represents a genuinely feasible modelling template. For example, the PHD prediction of hepatocyte nuclear factor 3 (HNF-3) missed two short strands found in the native structure, so the MAP search did not detect BirA domain I (PDB code 1BIA) or GAP domain I (2GAP) as possible templates. However, the search with the predominantly helical prediction did rank another helix-turn-helix motif first, as shown in Figure 1. The core three helices were aligned correctly at the secondary structure level, and a prediction of this type could be useful in the absence of experimental 3D structure information.





gjb@bioch.ox.ac.uk