
Abstract

An algorithm is described for the systematic characterisation of the physico-chemical properties seen at each position in a multiple protein sequence alignment. The algorithm rapidly identifies positions in the alignment that show unusual or interesting residue substitution patterns, and so allows questions important in the design of mutagenesis experiments to be answered quickly. The strategy is based on a flexible set-based description of amino acid properties, which is used to define the conservation within any group of amino acids. Sequences in the alignment are gathered into sub-groups on the basis of sequence similarity, functional similarity, evolutionary relationship, or other criteria. All pairs of sub-groups are then compared to highlight positions that confer the unique features of each sub-group. The algorithm is implemented in the computer program AMAS (Analysis of Multiply Aligned Sequences), which provides a textual summary of the analysis and an annotated (boxed, shaded and/or coloured) multiple sequence alignment. The algorithm is illustrated by application to an alignment of 67 SH2 domains, where patterns of conserved hydrophobic residues that constitute the protein core are highlighted. An analysis of charge conservation across annexin domains identifies the locations at which conserved charges change sign. The algorithm simplifies the analysis of multiple sequence data by condensing the mass of information present, and thus allows the rapid identification of substitutions of structural and functional importance.
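The set-based idea can be illustrated with a minimal Python sketch. It is not the AMAS implementation: the property table below is a small, simplified assumption made for illustration, the function names (shared_properties, differing_positions) are hypothetical, and conservation is reduced to plain set membership rather than the scoring used by the program. Each property is a set of one-letter residue codes, a group of residues conserves a property when every residue belongs to that set, and two sub-groups are compared position by position.

```python
# Illustrative property sets only; a simplified assumption,
# not the property table used by AMAS.
PROPERTIES = {
    "hydrophobic": set("AVLIMFWC"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "small": set("GASCTPDNV"),
    "aromatic": set("FWYH"),
}


def shared_properties(residues):
    """Return the names of the properties held by every residue in the group."""
    return {
        name
        for name, members in PROPERTIES.items()
        if all(residue in members for residue in residues)
    }


def column(sequences, position):
    """Return the residues found at one alignment position."""
    return [sequence[position] for sequence in sequences]


def differing_positions(group_a, group_b):
    """List positions where both sub-groups conserve some property,
    but the conserved property sets are not the same."""
    differences = []
    for position in range(len(group_a[0])):
        props_a = shared_properties(column(group_a, position))
        props_b = shared_properties(column(group_b, position))
        if props_a and props_b and props_a != props_b:
            differences.append((position, props_a, props_b))
    return differences


# Two hypothetical sub-groups of aligned sequences (toy data).
group_a = ["KLD", "RID"]
group_b = ["ELD", "DVD"]
for position, props_a, props_b in differing_positions(group_a, group_b):
    print(position, sorted(props_a), sorted(props_b))
```

On this toy data only the first position is reported: one sub-group conserves a positive charge there and the other a negative charge, which is the kind of sign change described for the annexin domains. Gap characters belong to no property set in this sketch, so a column containing a gap conserves nothing.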

cdl@bioch.ox.ac.uk