Percentage identity is a frequently quoted statistic for an alignment of two sequences. However, the expected value of percentage identity depends strongly on the length of the alignment [39], and this is frequently overlooked. Figure 4 shows the percentage identities found for a large number of locally optimal alignments of differing lengths between proteins known to have unrelated three-dimensional structures. Clearly, an alignment of length 200 showing 30% identity is more significant than an alignment of length 50 with the same identity. Applied to the alignment in Figure 3, this shows that although the alignment scores over 7.0 S.D., its percentage identity is one that would often be seen by chance between unrelated proteins.
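
The length dependence can be illustrated with a small sketch. It is not the empirical analysis behind Figure 4: it assumes that aligned positions match independently with a fixed probability, and the value `p_match = 0.1` is a hypothetical choice made only for illustration. Real locally optimal alignments are selected to score well, so chance identities run higher than this naive model suggests. The function names, and the convention of ignoring gapped columns when computing percentage identity, are also assumptions.

```python
from math import comb


def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of identical residues over the ungapped aligned columns.

    Assumed convention: columns with a gap in either sequence are ignored.
    Other denominators are also in use.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def chance_tail(length, n_identical, p_match=0.1):
    """Binomial probability of at least n_identical matches in length columns.

    Naive model: every column matches independently with probability p_match
    (a hypothetical value, not taken from the text).
    """
    return sum(
        comb(length, k) * p_match**k * (1.0 - p_match) ** (length - k)
        for k in range(n_identical, length + 1)
    )


# 30% identity at two alignment lengths: 15 of 50 versus 60 of 200.
short_tail = chance_tail(50, 15)
long_tail = chance_tail(200, 60)
print(f"P(>=15 of 50 identical by chance)  = {short_tail:.3g}")
print(f"P(>=60 of 200 identical by chance) = {long_tail:.3g}")
```

Even under this simplified model, the same 30% identity is far less likely to arise by chance over 200 positions than over 50, which is the qualitative point made above.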

geoff.barton@ox.ac.uk