
Sequence similarity

In this study, sequence identity ($I$) is defined as:

$$ I = \frac{N_{\mathrm{id}}}{N_{\mathrm{short}}} $$

where $N_{\mathrm{short}}$ is the number of amino acids in the shorter of the two sequences or structures being compared and $N_{\mathrm{id}}$ is the number of positions in the alignment that have the same amino acid.
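The definition can be illustrated with a short Python sketch. It assumes the two sequences are already aligned and supplied as equal-length strings, with '-' marking gaps. That input convention and the function name `sequence_identity` are hypothetical and do not describe the software used in this study. The function returns a fraction; multiplying by 100 gives a percentage.

```python
def sequence_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical aligned positions divided by the residue count of the
    shorter sequence (gap characters are not counted as residues)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Count alignment columns where both residues are the same amino acid.
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Length of each sequence without gaps; the shorter one is the denominator.
    shorter = min(
        sum(1 for c in aligned_a if c != gap),
        sum(1 for c in aligned_b if c != gap),
    )
    if shorter == 0:
        raise ValueError("cannot compute identity for an empty sequence")
    return identical / shorter


# Five identical columns; the shorter (gapped) sequence has 6 residues.
print(sequence_identity("ACDE-FG", "ACQEHFG"))
```

Dividing by the shorter sequence, rather than by the alignment length, means that a short sequence aligned perfectly inside a longer one still scores 1.0.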

For the pairs considered in this study, sequence identity ranges from … to …. For the dissimilar pairs, optimal sequence alignment gives a sequence identity between … and …. The much higher minimum for the dissimilar pairs is explained by the different methods used to align them: for pairs of similar protein 3D structures, the alignment was derived by comparing the 3D structures, whereas for dissimilar pairs the optimal alignment was obtained by comparing the sequences.

Randomly aligned unrelated sequences give a sequence identity between … and …. However, if a sequence comparison algorithm [Barton, 1990][Needleman & Wunsch, 1970] is used to optimise the alignment of two unrelated sequences, the expected sequence identity is between … and … (on average; GJB, unpublished data). Many of the pairs of similar 3D structures used in this study have little or no sequence similarity, and the method used here to align 3D structures does not consider sequence information, so alignments derived from 3D structure comparison can give very low sequence identity. The sequence identities for similar and dissimilar protein 3D structures are therefore not directly comparable. Accordingly, dissimilar pairs were assigned a sequence identity of … for clarity (see the points labelled `d' in the plots that follow).
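The effect described above, where optimising the alignment of unrelated sequences raises their identity above the level seen for a random alignment, can be demonstrated with a minimal Python sketch. Several assumptions apply. It uses a basic Needleman-Wunsch global alignment with hypothetical scores (match +1, mismatch 0, linear gap penalty -1), not the program of [Barton, 1990] or a residue substitution matrix. The two "unrelated" sequences are drawn with uniform amino acid composition, which real proteins do not have. The sketch only compares the two identities and makes no claim about the values reported in this study.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = 0,
                     gap: int = -1) -> tuple[str, str]:
    """Global alignment by dynamic programming (hypothetical simple scoring)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def identity(x: str, y: str) -> float:
    """Identical columns divided by the residue count of the shorter sequence."""
    identical = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return identical / min(len(x.replace("-", "")), len(y.replace("-", "")))


rng = random.Random(1)
s1 = "".join(rng.choice(AMINO_ACIDS) for _ in range(100))
s2 = "".join(rng.choice(AMINO_ACIDS) for _ in range(100))

random_id = identity(s1, s2)        # ungapped, position-by-position alignment
aln1, aln2 = needleman_wunsch(s1, s2)
optimal_id = identity(aln1, aln2)   # alignment optimised for this scoring
print(f"random: {random_id:.2f}  optimised: {optimal_id:.2f}")
```

With these scores the optimised alignment can never contain fewer identities than the ungapped one: the ungapped alignment scores exactly its identity count, while any alignment's score is at most its identity count, so the optimum must have at least as many identical columns. This is the sense in which sequence optimisation inflates identity between unrelated sequences, whereas a structure-based alignment that ignores sequence has no such bias.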


gjb@
Thu Feb 9 18:06:48 GMT 1995