In this study, sequence identity is defined as the number of positions in the
alignment that have the same amino acid, divided by the number of amino acids
in the shorter of the two sequences or structures being compared.
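The definition above can be sketched in a few lines of Python. This is an illustration only, not the code used in the study: the function name `sequence_identity`, the `-` gap character and the example sequences are assumptions, and the result is returned as a fraction (multiply by 100 if a percentage is wanted).

```python
def sequence_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical aligned positions divided by the length of the shorter sequence.

    Both arguments are rows of one pairwise alignment (equal length,
    with `gap` marking unaligned positions). Names are illustrative.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both rows carry the same residue; gaps never count.
    n_ident = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    # Ungapped length of each sequence; the shorter one is the denominator.
    len_a = len(aligned_a) - aligned_a.count(gap)
    len_b = len(aligned_b) - aligned_b.count(gap)
    n_short = min(len_a, len_b)
    if n_short == 0:
        raise ValueError("sequences must contain at least one residue")
    return n_ident / n_short


# Hypothetical alignment: 6 identical columns, both sequences 8 residues long.
print(sequence_identity("ACDEFGH-K", "AC-EYGHIK"))  # 0.75
```

Dividing by the shorter length rather than the alignment length means that a short sequence fully matched within a longer one scores 1.0.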
For the pairs considered in this study, the range of
is
.
For the
dissimilar pairs, optimal sequence alignment gives
between
and
.
The much higher minimum sequence identity
for dissimilar pairs can be explained by
the different methods used to align the sequences. For the pairs of similar protein 3D structures,
the alignment was derived by a comparison of 3D structures; for the
dissimilar pairs the optimal alignment was obtained by comparison of
sequences. Randomly aligned unrelated sequences give values of
between
However, if a sequence comparison algorithm [Barton, 1990][Needleman & Wunsch, 1970]
is used to optimise the alignment of two unrelated sequences, the expected
sequence identity is between
(on average; GJB, unpublished data).
Since many of the pairs of similar 3D structures used in this study have little or no sequence similarity,
and since the method used to align 3D structures in this study does not consider sequence information,
alignments derived from 3D structure comparison can give very low values of
sequence identity.
The sequence identity values
for similar and dissimilar protein 3D structures are thus not directly comparable.
Accordingly, dissimilar proteins were given
for clarity (see points labelled `d' in the
plots that follow).
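The effect of alignment method on sequence identity can be illustrated with a small sketch. It uses a textbook dynamic-programming global alignment in the spirit of Needleman & Wunsch; the scoring values (match 1, mismatch 0, gap -1), the function names and the example sequences are assumptions made for illustration, not the parameters or data of this study. For two sequences of equal length, an alignment optimised under this scoring never has fewer identities than the ungapped alignment.

```python
def sequence_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Identical aligned positions divided by the shorter ungapped length."""
    n_ident = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    n_short = min(len(row_a) - row_a.count(gap), len(row_b) - row_b.count(gap))
    return n_ident / n_short


def global_align(a: str, b: str, match: int = 1, mismatch: int = 0, gap: int = -1):
    """Global alignment by dynamic programming (illustrative scoring values)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b))


# Hypothetical, equal-length example sequences.
seq_a, seq_b = "MKVLAAGIWE", "GSTLKHDPRA"
ungapped = sequence_identity(seq_a, seq_b)
optimised = sequence_identity(*global_align(seq_a, seq_b))
print(f"ungapped: {ungapped:.2f}  optimised: {optimised:.2f}")
```

Because the optimiser is free to insert gaps wherever they increase the number of matches, identities obtained this way are not directly comparable with those read from an alignment built without reference to sequence.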