Prediction methods that use multiple sequence alignments gain accuracy over single-sequence methods by exploiting the patterns of residue conservation that are seen in protein families. Including more distantly related sequences in the alignment should make such patterns clearer, but in an automated alignment-building procedure there is a risk that unrelated protein sequences will pollute the alignment. Here, we investigated the effect of using a more permissive BLAST p-value cutoff in the first phase of our alignment-building procedure. The cutoff was relaxed from 1×10⁻¹⁰ to 1×10⁻², while the SCANPS thresholds were left unchanged.
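The effect of relaxing the cutoff can be illustrated with a minimal sketch. The hit records, the `Hit` type, and the `select_hits` helper are hypothetical and are not taken from the alignment-building procedure described here. The sketch also assumes that a hit is kept when its p-value is at or below the cutoff.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    seq_id: str     # hypothetical identifier of a database sequence
    p_value: float  # p-value reported by the search for this hit


def select_hits(hits: List[Hit], cutoff: float) -> List[Hit]:
    """Keep hits whose p-value is at or below the cutoff (assumed rule)."""
    return [h for h in hits if h.p_value <= cutoff]


# Made-up hits, for illustration only.
hits = [Hit("a", 1e-40), Hit("b", 1e-12), Hit("c", 1e-5), Hit("d", 0.5)]

strict = select_hits(hits, 1e-10)     # original cutoff
permissive = select_hits(hits, 1e-2)  # relaxed cutoff

print([h.seq_id for h in strict])
print([h.seq_id for h in permissive])
```

In this toy case the relaxed cutoff admits one additional, more distant hit. In the real procedure, any such hit would still have to pass the unchanged SCANPS thresholds.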
Table 7 shows that the change in p-value cutoff increased the total number of residues after filtering with SCANPS by 297,276, and the total number of sequences by 1,961. This corresponds to an increase of 15 in the average number of sequences per alignment. Table 8 shows that increasing the number of sequences improves Q3 by approximately 1% for all methods.
Table 8 also shows the marked difference between the prediction methods. The older methods, ZPRED and MULPRED, were between 3 and 8 percent worse than the newer methods.
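For reference, Q3 is the percentage of residues whose three-state secondary structure (helix, strand, or coil) is predicted correctly. The sketch below shows that calculation for a single chain. The function name `q3`, the H/E/C letter coding, and the example strings are illustrative assumptions, not data or code from this study.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of positions where the predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot compute Q3 for an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up 15-residue example: H = helix, E = strand, C = coil.
observed = "CCHHHHHCCEEEECC"
predicted = "CCHHHHCCCEEEECC"  # one helix residue mispredicted as coil

print(f"Q3 = {q3(predicted, observed):.1f}%")
```

A score reported for a whole test set can either pool residues across all chains or average the per-chain scores. The two conventions can give slightly different values, so comparisons between methods should use the same one.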