Position specific gap-penalties: Domains and secondary structure

In the simple example in the previous section, the penalty for a gap is the same at all locations in the alignment. However, it often makes sense to penalise gaps differently at the ends, or at different positions within each sequence. For example, if a protein domain is being aligned to a longer sequence that is known to contain the domain, the gap-penalties at the ends of the domain should be reduced to allow the domain to slide along the longer sequence. If the secondary structure of one protein in the pair to be aligned is known, then increasing the gap-penalty within core secondary structure elements will reduce the likelihood of placing a gap inside such an element [Barton & Sternberg, 1987a; Lesk et al., 1986].
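As a small illustration of the secondary structure case, the sketch below builds a per-residue gap-penalty vector from a secondary structure string. The function name `gap_vector_from_structure`, the state letters (H for helix, E for strand, anything else treated as loop) and the penalty values are all assumptions chosen for this example; penalties are written as negative numbers that would be added to the alignment score.

```python
def gap_vector_from_structure(ss, loop_penalty=-8.0, core_penalty=-16.0,
                              core_states="HE"):
    """Return one gap penalty per residue of the protein of known structure.

    ss           -- secondary structure string, one letter per residue
                    (assumed coding: H = helix, E = strand, other = loop)
    loop_penalty -- penalty used outside core elements (assumed value)
    core_penalty -- harsher penalty used inside core elements (assumed value)
    """
    return [core_penalty if state in core_states else loop_penalty
            for state in ss]


# Hypothetical 12-residue protein: loop, helix, loop, strand, loop.
P = gap_vector_from_structure("CCHHHHCCEEEC")
print(P)
```

Making the core penalty more negative than the loop penalty is what discourages gaps inside helices and strands; the actual ratio between the two values would need to be tuned for the scoring scheme in use.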

Both changes require only simple modifications to the algorithm. End gaps are adjusted by changing the gap-penalty constants for the 0th and last row and column of the H matrix. Position-specific gaps are set by using a vector of penalties P of length m in place of a single constant $\Delta$. This modifies the calculation of $H_{i,j}$ to:

$$ H_{i,j} = \max \left\{ \begin{array}{l} H_{i-1,j-1} + w_{A_{i},B_{j}}\\ H_{i,j-1} + w_{\Delta,B_{j}}\\ H_{i-1,j} + P_{i} \end{array} \right\} \qquad (2) $$

In this example, the gap-vector P refers to sequence A. Thus, the weight for aligning any residue in A with a gap will depend on where the residue is in A. In contrast, aligning a residue in B with a gap is penalised equally irrespective of position.
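As a rough illustration, the recurrence in equation (2) can be sketched in Python as follows. The function name `fill_matrix`, the toy scoring function `simple_score`, the use of a single `end_gap` constant for all end gaps, and the convention that penalties are negative numbers added to the score are assumptions made for this sketch rather than details given in the text.

```python
def simple_score(x, y):
    """Toy substitution weight (assumed): +2 for identity, -1 otherwise."""
    return 2.0 if x == y else -1.0


def fill_matrix(a, b, P, delta, end_gap, score=simple_score):
    """Fill the H matrix using a position-specific gap vector P for sequence A.

    a, b    -- sequences A (length m) and B (length n)
    P       -- P[i-1] is the penalty for aligning residue A_i with a gap
    delta   -- fixed penalty for aligning any residue of B with a gap
    end_gap -- penalty used in the 0th and last row and column
    """
    m, n = len(a), len(b)
    if len(P) != m:
        raise ValueError("P needs one entry per residue of sequence A")

    H = [[0.0] * (n + 1) for _ in range(m + 1)]

    # 0th column and 0th row: leading end gaps.
    for i in range(1, m + 1):
        H[i][0] = H[i - 1][0] + end_gap
    for j in range(1, n + 1):
        H[0][j] = H[0][j - 1] + end_gap

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Gap opposite B_j: fixed delta, or end_gap in the last row.
            gap_vs_b = end_gap if i == m else delta
            # Gap opposite A_i: position-specific, or end_gap in the last column.
            gap_vs_a = end_gap if j == n else P[i - 1]
            H[i][j] = max(
                H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align A_i with B_j
                H[i][j - 1] + gap_vs_b,                        # B_j against a gap
                H[i - 1][j] + gap_vs_a,                        # A_i against a gap
            )
    return H


# The middle residue of A is cheap to skip (P[1] = -1), so the best path
# aligns A with A and D with D and places C opposite a gap.
H = fill_matrix("ACD", "AD", P=[-4.0, -1.0, -4.0], delta=-4.0, end_gap=-4.0)
best = H[-1][-1]
```

With these toy values the bottom-right cell is 3.0; raising the penalty at the middle position to -3.0 lowers it to 1.0, which shows how the score depends on where in A the gap falls. Setting `end_gap` closer to zero than the internal penalties is one way to let a short domain slide along a longer sequence.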

There are many ways of modifying position-specific gap-penalties. For example, P can be applied to gaps in both sequences, dependent only on the position in A, thereby eliminating the fixed constant $\Delta$; alternatively, a second gap-penalty vector can be introduced for B.
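In the notation of equation (2), these two variants might be written as follows. This is only a sketch of one possible reading: the symbol $Q$, a second penalty vector of length n indexed by position in B, is a name introduced here for illustration and does not appear in the original formulation.

$$ H_{i,j} = \max \left\{ \begin{array}{l} H_{i-1,j-1} + w_{A_{i},B_{j}}\\ H_{i,j-1} + P_{i}\\ H_{i-1,j} + P_{i} \end{array} \right\} \qquad \mbox{(P applied to gaps in both sequences)} $$

$$ H_{i,j} = \max \left\{ \begin{array}{l} H_{i-1,j-1} + w_{A_{i},B_{j}}\\ H_{i,j-1} + Q_{j}\\ H_{i-1,j} + P_{i} \end{array} \right\} \qquad \mbox{(separate vectors for A and B)} $$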

geoff@ebi.ac.uk