Dynamic programming algorithms may be divided into those that find a global alignment of the sequences and those that find local alignments. The difference between global and local alignment is illustrated in Figure 1. Global alignment is appropriate for sequences that are known to share similarity over their whole length. Local alignment is appropriate when the sequences may show isolated regions of similarity, for example multiple domains or repeats. Local alignment is best applied when scanning a database to find similarities or when there is no a priori knowledge that the protein sequences are similar.
There are many variations on the theme of dynamic programming applied to protein comparisons. Here I give a brief account of a basic method for finding the global best score for aligning two sequences. For a clear and detailed explanation of dynamic programming see Sankoff and Kruskal .
Let the two sequences of length and be and the symbol for a single gap be . At each aligned position there are three possible events.
substitution of by .
deletion of .
deletion of .
The substitution weight is derived from the chosen scoring scheme - perhaps Dayhoff's matrix. Gaps are normally given a negative weight often referred to as the ``gap penalty'' since insertions and deletions are usually less common than substitutions.
The maximum score for the alignment of with may be represented as . This may be found by working forward along each sequence sucessively finding the best score for aligning with for all where and . The values of are stored in a matrix where each element of is calculated as follows:
The element contains the best score for the alignment of the complete sequences.
If the alignment is required as well as the best score, then the alignment path may be determined by tracing back through the matrix. Alternatively a matrix of pointers is recorded to indicate which of the three possibilities was the maximum at each value .