In order to simplify the explanation of the ``all local alignment'' algorithm I shall first recapitulate a well known method for efficient computer implementation of the Smith-Waterman [Smith & Waterman, 1981] algorithm. For a full introduction to dynamic programming algorithms in sequence comparison see [Kruskal, 1983].
The best score for a local alignment between two sequences
A and B of length m and n is determined by calculating the comparison
matrix starting with
and working forwards through the
matrix column by column. The value of each cell
is given by the equation:
where is the score for equivalencing
and
, and
are the scores for
aligning with a gap in B or A respectively. To calculate the
value of
, it is only necessary to know the contents of the
three predecessor cells (
). If just
the best score is required, and no alignment, then only the previous
column, C of the matrix, need be stored together with the score for
the current cell, r and previous cell, p in the column. This is
illustrated in Figure 1 for the comparison of A =
C-C-A-A-T-C-T-A-C-T-A-C-T-G-C-T-T-G-C-A- G-T-A-C and B =
A-G-T-C-C-G-A-G-G-G-C-T-A-C-T-C-T-A-C-T-G-A- A-C with
if
,
if
, and
. This example
is used here to allow a direct comparison of the ``all local
alignment'' algorithm with the work of Waterman and Eggert [Waterman & Eggert, 1987].
Figure 1 illustrates the processing of and
. The
cells of H that are stored are shown in large numerals. The cells
shown in small numerals are discarded. The best overall score is
updated if the value of the current cell is greater than the maximum so far. This
is the simplest efficient implementation of the ``best score only''
local similarity algorithm, and may be coded to run very fast. It is
also straightforward, by the addition of a further 1-dimensional array
to adapt the algorithm to cope with a gap-penalty function having the
form
where k is the gap-length [Gotoh, 1982].