Dynamic programming algorithms may be divided into those that find a global alignment of the sequences and those that find local alignments. The difference between global and local alignment is illustrated in Figure 1. Global alignment is appropriate for sequences that are known to share similarity over their whole length. Local alignment is appropriate when the sequences may show isolated regions of similarity, for example multiple domains or repeats. Local alignment is best applied when scanning a database to find similarities or when there is no a priori knowledge that the protein sequences are similar.
There are many variations on the theme of dynamic programming applied to protein comparisons. Here I give a brief account of a basic method for finding the global best score for aligning two sequences. For a clear and detailed explanation of dynamic programming see Sankoff and Kruskal [26].
Let the two sequences of length and
be
and the symbol for a single
gap be
. At each aligned position there are three possible
events.
substitution of
by
.
deletion of
.
deletion of
.
The substitution weight is derived from the chosen
scoring scheme - perhaps Dayhoff's matrix. Gaps
are normally
given a negative weight often referred to as the ``gap penalty'' since
insertions and deletions are usually less common than substitutions.
The maximum score for the alignment of
with
may be
represented as
. This may be found by
working forward along each sequence sucessively finding the best score
for aligning
with
for all
where
and
. The values of
are stored in a matrix
where each element of
is calculated as follows:
The element contains the best score for the alignment of the
complete sequences.
If the alignment is required as well as the best score, then the
alignment path may be determined by tracing back through the
matrix. Alternatively a matrix of pointers is recorded to indicate
which of the three possibilities was the maximum at each value
.