Representation and analysis of multiple alignments

How do we extract the maximum information from a multiple protein sequence alignment?

When making a multiple sequence alignment, a crude tree is normally generated. The tree shows the gross relationships between the sequences. It may show that sequences A, D and C are more similar to each other than they are to B and E. However, it does not show which individual residues have changed to make A, D and C different from B and E. These residues may be the most important ones to investigate by site-directed mutagenesis. Livingstone and Barton [53] have described a set-based strategy to identify such differences by comparing pairs of groups of aligned residues. Their method automatically provides a text summary of the similarities and a boxed and shaded or coloured alignment. An example of the graphical output of this analysis is illustrated in Figure 7 for the SH2 domain family.
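The idea of comparing two groups of aligned sequences column by column can be illustrated with a minimal sketch. This is not the Livingstone and Barton method, which compares sets of amino acid properties; it only reports columns where each group is internally identical but the two groups differ. The function name, the toy alignment and the grouping are all assumptions made for illustration.

```python
def group_difference_columns(alignment, group_a, group_b):
    """Return (column, residue_a, residue_b) for columns where each group
    is internally identical but the two groups differ.

    Simplified illustration only: identity is used in place of the
    property sets of the published method. Columns containing a gap
    character ('-') are skipped.
    """
    length = len(next(iter(alignment.values())))
    hits = []
    for col in range(length):
        set_a = {alignment[name][col] for name in group_a}
        set_b = {alignment[name][col] for name in group_b}
        if "-" in set_a or "-" in set_b:
            continue
        if len(set_a) == 1 and len(set_b) == 1 and set_a != set_b:
            hits.append((col, next(iter(set_a)), next(iter(set_b))))
    return hits


# Hypothetical toy alignment; sequence names follow the example in the text.
alignment = {
    "A": "MKVLA",
    "C": "MKVLG",
    "D": "MKVLA",
    "B": "MRVLS",
    "E": "MRVIS",
}

differences = group_difference_columns(alignment, ["A", "C", "D"], ["B", "E"])
for col, res_a, res_b in differences:
    print(f"column {col}: group 1 has {res_a}, group 2 has {res_b}")
```

Columns are numbered from zero. Positions reported in this way would be candidates for closer inspection, for example before planning a mutagenesis experiment.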

Provided the alignment is accurate, the following may be inferred about the secondary structure of the protein family:

The position of insertions and deletions suggests regions where surface loops exist in the protein.
Conserved glycine or proline suggests a β-turn.
Residues with hydrophobic properties conserved at i, i+2, i+4, separated by unconserved or hydrophilic residues, suggest a surface β-strand.
A short run of hydrophobic amino acids (4 residues) suggests a buried β-strand.
Pairs of conserved hydrophobic amino acids separated by pairs of unconserved or hydrophilic residues suggest an α-helix with one face packing in the protein core. Likewise, an i, i+3, i+4, i+7 pattern of conserved hydrophobic residues suggests such a helix.
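The hydrophobic conservation patterns listed above can be searched for mechanically. The following is a minimal sketch, assuming a column counts as conserved hydrophobic when every aligned residue belongs to a fixed hydrophobic set. The residue set, the function names and the toy sequences are assumptions for illustration, not a published scheme.

```python
# Assumed hydrophobic residue set; other classifications are equally valid.
HYDROPHOBIC = set("AVLIMFWYC")


def conserved_hydrophobic(sequences):
    """Flag each alignment column in which every residue is hydrophobic."""
    return [all(res in HYDROPHOBIC for res in column)
            for column in zip(*sequences)]


def match_offsets(flags, on, off=()):
    """Return start positions i where all offsets in `on` are flagged
    and no offset in `off` is flagged."""
    span = max(tuple(on) + tuple(off))
    hits = []
    for i in range(len(flags) - span):
        if all(flags[i + k] for k in on) and not any(flags[i + k] for k in off):
            hits.append(i)
    return hits


# Hypothetical aligned sequences of equal length, without gaps.
sequences = ["TVKVEVRD", "SIRLDIKN", "TVEIKVSD"]
flags = conserved_hydrophobic(sequences)

# i, i+2, i+4 hydrophobic with i+1, i+3 not: candidate surface beta-strand.
strand_starts = match_offsets(flags, on=(0, 2, 4), off=(1, 3))
# i, i+3, i+4, i+7 hydrophobic: candidate helix with one buried face.
helix_starts = match_offsets(flags, on=(0, 3, 4, 7))

print("strand candidates start at:", strand_starts)
print("helix candidates start at:", helix_starts)
```

Positions are numbered from zero. A match is only a hint that the pattern is present; it does not replace inspection of the alignment.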

These patterns are not always easy to see in a single sequence, but given a multiple alignment they often stand out and allow secondary structure to be assigned with some degree of confidence. For example, such patterns were used to aid the accurate prediction of the secondary structure and position of buried residues for the annexins and SH2 domains prior to knowledge of their tertiary structures [56][55][54].

geoff.barton@ox.ac.uk