Next: Efficient Determination of Up: No Title Previous: Contents

Introduction

Dynamic programming algorithms that locate optimal alignments of two sequences are central techniques for the comparison of biological sequences [Needleman & Wunsch, 1970,Sellers, 1974,Smith & Waterman, 1981] or three-dimensional structures [Barton & Sternberg, 1988,Taylor & Orengo, 1989,Sali & Blundell, 1990,Russell & Barton, 1992]. The algorithms can be divided broadly into those that seek to find a global alignment between the sequences (e.g. [Needleman & Wunsch, 1970] and those that find local alignments (e.g [Erickson & Sellers, 1983,Smith & Waterman, 1981]). Global alignment methods optimize the score for alignment over the full length of both sequences, and are most appropriate when the sequences are known to be similar over their entire length. Local alignment methods allow the common sub-regions of the two sequences to be identified and are appropriate when it is not known in advance if the sequences being compared are similar. Local alignment methods are effective in locating common sub-domains between long sequences that otherwise share little similarity. This feature makes such algorithms suitable for scanning large sequence databanks for similarities to a newly determined sequence.

The Smith and Waterman [Smith & Waterman, 1981] algorithm is perhaps the most widely used local similarity algorithm for biological sequence comparison. The algorithm identifies the single highest scoring sub-sequence alignment and allows for gaps (insertions/deletions). However, it is often true that there may be more than one biologically important alignment between two sequences. For example, a protein domain may be repeated, or domains may be shuffled within multi-domain proteins. Waterman and Eggert [Waterman & Eggert, 1987] have shown how the Smith-Waterman algorithm may be extended to locate the second-best and subsequent local alignments with minimal recalculation, subject to the primary restriction that the different alignments should not intersect. Here, I describe an algorithm that allows all such locally optimal alignments to be determined without the need for re-calculation. The algorithm is similar in principle to that developed by Coulson et al. for the parallel processing Distributed Array Processor (DAP) [Coulson et al., 1987]. However, the algorithm described here is general and has been implemented in C for widely available computers. The algorithm may be applied to any problem that is amenable to the Smith-Waterman dynamic programming algorithm.

Next: Efficient Determination of Up: No Title Previous: Contents

Geoff Barton
Fri Sep 22 17:15:02 BST 1995