Dealing with Gaps



Insertions and deletions are observed within protein families, and it is normally necessary to introduce such indels when producing an alignment. The simplest scheme for gaps introduces a new character that scores u when aligned with any amino acid. Since gaps are comparatively rare, u is usually made negative, and as a consequence u is often referred to as the gap penalty. In this simple scheme, a gap of 10 residues is penalised 10 times as heavily as a gap of 1 residue. Within protein families this makes little sense, since gaps of more than one residue are often needed to obtain structurally reasonable alignments.

The most commonly used scoring scheme for gaps is therefore a function of the form ul + v, where l is the length of the gap in residues. This form of penalty function is referred to as affine and has efficiency advantages over more elaborate penalty functions [Gotoh, 1982]. The constants v and u are often referred to as the penalties for creation and extension of the gap, or as the length-independent and length-dependent penalties, respectively. Gap penalties need not be uniform across the sequence; such position-specific gap penalties are discussed in Section 4.4.
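As a rough illustration of the two schemes described above, the following Python sketch compares the simple per-residue penalty with the affine form ul + v. The function names and the numerical values chosen for u and v are assumptions made purely for illustration; they are not taken from the text or from any particular alignment program.

```python
def linear_gap_score(length, u):
    """Simple scheme: every gap character scores u, so the total is u * l."""
    if length < 0:
        raise ValueError("gap length must be non-negative")
    return u * length


def affine_gap_score(length, u, v):
    """Affine scheme: a gap of l >= 1 residues scores u * l + v.

    v is the length-independent (creation) term and u the
    length-dependent (extension) term. No gap means no penalty.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0
    return u * length + v


# Illustrative values only (assumed, not from the text).
U_SIMPLE = -8             # per-residue score in the simple scheme
U_EXT, V_CREATE = -1, -10  # extension and creation terms in the affine scheme

for l in (1, 10):
    print(l, linear_gap_score(l, U_SIMPLE), affine_gap_score(l, U_EXT, V_CREATE))
```

With these assumed values, the simple scheme scores a 10-residue gap at exactly 10 times the score of a 1-residue gap (-80 against -8), whereas the affine scheme scores it at less than twice as much (-20 against -11), because the creation term v is paid only once per gap.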


geoff@ebi.ac.uk