Effect on Q₃ of changing the number of related sequences

Prediction methods that use multiple sequence alignments gain accuracy over single-sequence methods by exploiting the patterns of residue conservation seen in protein families. Including more distantly related sequences in an alignment should make these patterns clearer, but an automated alignment-building procedure runs the risk that unrelated protein sequences will pollute the alignment. Here, we investigated the effect of using a more permissive BLAST p-value cutoff[59] in the first phase of our alignment-building procedure. The cutoff was relaxed from 1 × 10⁻¹⁰ to 1 × 10⁻², while the SCANPS thresholds were left unchanged.
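The effect of relaxing the cutoff can be illustrated with a minimal sketch. The `Hit` record, the example hits and their p-values are hypothetical and do not reflect real BLAST output or the data used here; treating the cutoff as inclusive is also an assumption.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """A database hit. Illustrative fields only, not a real BLAST output format."""
    seq_id: str
    p_value: float


def select_hits(hits: List[Hit], cutoff: float) -> List[Hit]:
    """Keep hits whose p-value is at or below the cutoff (inclusive by assumption)."""
    return [hit for hit in hits if hit.p_value <= cutoff]


# Made-up hits spanning close, distant and probably unrelated sequences.
hits = [
    Hit("close_homologue", 1e-40),
    Hit("distant_homologue", 1e-5),
    Hit("likely_unrelated", 0.3),
]

strict = select_hits(hits, 1e-10)      # original cutoff
permissive = select_hits(hits, 1e-2)   # relaxed cutoff

print([hit.seq_id for hit in strict])
print([hit.seq_id for hit in permissive])
```

With the relaxed cutoff the distant homologue passes the first phase; in the procedure described here, such sequences would still have to pass the unchanged SCANPS filter.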

Table 7 shows that the change in p-value cutoff increased the total number of residues remaining after SCANPS filtering by 297,276, and the total number of sequences by 1,961. This corresponds to an average increase of 15 sequences per alignment. Table 8 shows that increasing the number of sequences improves Q₃ by approximately 1% for all methods.

Table 8 also shows a marked difference between the prediction methods. The older methods, ZPRED and MULPRED, were between 3 and 8 percent less accurate than the newer methods.
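For reference, Q₃ is the percentage of residues whose predicted state (helix, strand or coil) matches the observed state. The sketch below computes it for a single sequence, assuming both assignments are given as equal-length strings over the letters H, E and C; the function name and the example strings are illustrative and are not taken from the evaluation code or data behind Table 8.

```python
def q3(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up 16-residue example: the prediction shortens one helix and one strand by a residue each.
observed = "CHHHHHHCCEEEEECC"
predicted = "CHHHHHCCCEEEECCC"

print(q3(predicted, observed))  # 14 of 16 residues agree: 87.5
```

On this scale, a 1% change in Q₃ corresponds to one additional correctly predicted residue per hundred.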

james@ebi.ac.uk