Next: Finding the best Up: Comparison of two Previous: Variable length segments

Sequence comparison with gaps

The segment based techniques described in section 3.1 do not consider explicitly insertions and deletions. Deletions are often referred to as ``gaps'', while insertions and deletions are collectively referred to as ``indels''. Insertions and deletions are usually needed to align accurately even quite closely related sequences such as the - and - globins. The naive approach to finding the best alignment of two sequences including gaps is to generate all possible alignments, add up the scores for equivalencing each amino acid pair in each alignment then select the highest scoring alignment. However, for two sequences of 100 residues there are alternative alignments so such an approach would be time consuming and infeasible for longer sequences. Fortunately, there is a group of algorithms that can calculate the best score and alignment in the order of steps. These dynamic programming algorithms were first developed for protein sequence comparison by Needleman and Wunsch [13], though similar methods were independently devised during the late 1960's and early 1970's for use in the fields of speech processing and computer science [25].



geoff.barton@ox.ac.uk