Predicting overall alignment accuracy

It is important to know in advance what the likely accuracy of an alignment will be. A common method for assessing the significance of a global alignment score is to compare the score to the distribution of scores for alignment of random sequences of the same length and composition. The result (the S.D. score) is normally expressed in Standard Deviation units above the mean of the distribution.
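The procedure described above can be sketched in Python. This is a minimal illustration, not the method used to produce the published results: the scoring scheme (identity match of 1, mismatch of 0, linear gap penalty of -1), the number of randomizations, and the function names `global_score` and `sd_score` are all assumptions made for the example. In practice a substitution matrix and a tuned gap penalty would normally be used.

```python
import random
import statistics


def global_score(a, b, match=1.0, mismatch=0.0, gap=-1.0):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The match/mismatch/gap values are illustrative assumptions.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


def sd_score(a, b, n_random=100, seed=0):
    """Score of the real alignment in S.D. units above the random mean.

    Shuffling each sequence keeps its length and composition unchanged.
    """
    rng = random.Random(seed)
    real = global_score(a, b)
    background = []
    for _ in range(n_random):
        ra, rb = list(a), list(b)
        rng.shuffle(ra)
        rng.shuffle(rb)
        background.append(global_score(ra, rb))
    mean = statistics.mean(background)
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("random scores have zero spread; S.D. score undefined")
    return (real - mean) / sd
```

For example, `sd_score(seq_a, seq_b, n_random=100)` returns the number of standard deviations by which the real score exceeds the mean of the shuffled-sequence scores. More randomizations give a more stable estimate at a proportional cost in run time.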

Comparison of the S.D. score for an alignment with the alignment accuracy, obtained by comparing the core secondary structures, suggests that for proteins of 100-200 amino acids in length, a score above 15.0 S.D. indicates a near ideal alignment, scores above 5.0 S.D. a "good" alignment where % of the residues in core secondary structures will be correctly equivalenced, while alignments with scores below 5.0 S.D. should be treated with caution [38][37].
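The rule of thumb above can be written as a small lookup. This is only a sketch: the function name `interpret_sd_score` is hypothetical, the text does not say how a score of exactly 5.0 S.D. should be classed (here it is assumed to fall in the cautious category), and the thresholds are stated only for proteins of 100-200 amino acids.

```python
def interpret_sd_score(sd):
    """Map an S.D. score to the qualitative categories quoted in the text.

    Thresholds apply to proteins of 100-200 residues; treating a score of
    exactly 5.0 as "treat with caution" is an assumption of this sketch.
    """
    if sd > 15.0:
        return "near ideal"
    if sd > 5.0:
        return "good"
    return "treat with caution"
```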

Figure 2 shows the distribution of S.D. scores for 100,000 optimal alignments of length between proteins of unrelated three-dimensional structure. From Figure 2, the mean S.D. score expected for the comparison of unrelated protein sequences is 3.2 S.D., with a standard deviation of 0.9. However, the distribution is skewed, with a tail of high S.D. scores. In any large collection of alignments it is possible to have a rare, high-scoring alignment between proteins that actually share no structural similarity. For example, Figure 3 illustrates an optimal local alignment between regions of citrate synthase (2cts) and transthyretin (2paba) which gives 7.55 S.D., though the secondary structures of these two protein segments are completely different.
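To put the 7.55 S.D. example in context, the sketch below computes how far a score lies above the mean for unrelated proteins, using the mean (3.2) and standard deviation (0.9) quoted above. Because the distribution is skewed, that offset should not be converted to a probability with a normal approximation; an empirical tail fraction taken from a set of background scores is safer. The function names and the idea of passing in a list of background scores are assumptions for illustration; no such data are given here.

```python
def offset_from_background(score, mean=3.2, sd=0.9):
    """Number of background standard deviations by which `score` exceeds the mean.

    The defaults are the values quoted for unrelated protein comparisons.
    """
    return (score - mean) / sd


def empirical_tail_fraction(background_scores, threshold):
    """Fraction of background scores at or above `threshold`.

    Makes no assumption about the shape of the distribution, so it remains
    valid when the distribution is skewed.
    """
    if not background_scores:
        raise ValueError("background_scores must not be empty")
    hits = sum(1 for s in background_scores if s >= threshold)
    return hits / len(background_scores)
```

With the quoted values, `offset_from_background(7.55)` is about 4.83, so the citrate synthase and transthyretin example lies well out in the tail of the unrelated distribution, yet it still occurs.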

geoff.barton@ox.ac.uk