The strategy described in this paper is extremely flexible: it allows different physico-chemical properties to be examined independently or in concert. In addition, an alignment may be dissected into any combination of sub-groups and their relative conservation analysed. As with any analytical procedure, the strategy is most effective when one has a clear idea of what one is looking for, for example: "What makes sub-group A different from B and C?" or "Which residues in sub-group D should I change to make D more like A?". If no clear questions have been defined, then the general property index (Figure 1b) is a useful starting point to highlight patterns of residue conservation. This is illustrated in Figure 6 for an alignment of 67 SH2 domains [Russell et al., 1992]. Since SH2 domains are cytoplasmic, Cys was assigned the properties of the free amino acid () in this analysis (Figure 1b). The alignment is divided into eight sub-groups on the basis of overall sequence similarity. Sub-groups 1-7 (numbering from the top) share more than 20% sequence identity, whilst sequences not fitting into one of these sub-groups are collected in sub-group 8. The overall conservation of physico-chemical properties is highlighted by the histogram at the base of the alignment. The upper histogram indicates the normalised frequency of similarities between pairs of sub-groups, whilst the lower plot shows the frequency of pair differences.
Dark shading of the histogram indicates the frequency of pairs of sub-groups that show sequence identity. A hand analysis of an alignment similar to that shown in Figure 6 correctly identified the location of the core secondary structures and the phosphotyrosine-binding residues [Barton & Russell, 1993][Russell et al., 1992]. Since completion of that study, the three-dimensional structures of three SH2 domains have been determined by X-ray crystallography and NMR. The secondary structures of these are illustrated at the base of Figure 6 ([Booker et al., 1992][Overduin et al., 1992][Waksman et al., 1992]). The conservation histograms clearly correspond to the regions of secondary structure, and are helpful in identifying patterns characteristic of α-helix and β-strand. For example, at positions 15 and 97, CXXCCXXC patterns (where C = conserved) characteristic of α-helix are clearly visible.
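The search for a CXXCCXXC pattern can be illustrated with a minimal sketch. It is not part of AMAS: the function name `find_pattern`, the one-letter flag string and the `strict` option are assumptions made here for illustration, and each alignment column is assumed to have already been reduced to 'C' (conserved) or 'X' (not conserved).

```python
# Illustrative sketch only; not the AMAS implementation.
# Each alignment column is assumed to be pre-classified as
# 'C' (conserved) or 'X' (not conserved).
HELIX_PATTERN = "CXXCCXXC"  # spacing i, i+3, i+4, i+7


def find_pattern(flags, pattern=HELIX_PATTERN, strict=False):
    """Return 1-based start positions where `pattern` matches `flags`.

    A 'C' in the pattern always requires a conserved column.
    With strict=False an 'X' in the pattern matches any column;
    with strict=True an 'X' requires a non-conserved column.
    """
    hits = []
    width = len(pattern)
    for start in range(len(flags) - width + 1):
        window = flags[start:start + width]
        if all(p == w or (p == "X" and not strict)
               for p, w in zip(pattern, window)):
            hits.append(start + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical conservation flags for a 12-column alignment.
    print(find_pattern("XXCXXCCXXCXX"))
```

The non-strict default is a design choice of this sketch: it reports a helix-like spacing even when the intervening columns happen to be conserved as well.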
The annexins are a family of proteins that bind phospholipid in a calcium-dependent manner. Annexins consist of a variable N-terminal sequence followed by four or eight repeats, each of approximately 80 amino acids. Inspection of a multiple sequence alignment of 40 repeats identified the unique features of each repeat family, and located patterns of residue substitution characteristic of the secondary structures [Barton et al., 1991]. Figure 5 illustrates the application of hierarchical conservation analysis to a subset of these annexin repeats. Only conserved charges are shown (Figure 5a), and the differences summary clearly locates the position of a change in charge sign (position 31). This charge swap corresponds to the site of an inter-repeat salt bridge [Barton et al., 1991].
Additional charge changes are seen at positions 13, 31, 40 and 68, as listed in the textual summary shown in Figure 5b. While all these features can be identified by hand inspection of the alignment, the process is laborious and error-prone. The strategy described in this paper reduces the scope for error, allows alternative sub-groupings to be investigated rapidly, and provides shading and boxing that are structurally relevant.
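The idea of locating a change in charge sign between sub-groups can be sketched as follows. This is a simplified illustration under stated assumptions, not the AMAS algorithm: the residue sets (Lys and Arg as positive, Asp and Glu as negative, His ignored), the function names and the example sequences are all hypothetical.

```python
# Illustrative sketch only; not the AMAS implementation.
# Assumed charge classes: K, R positive; D, E negative; His ignored.
POSITIVE = set("KR")
NEGATIVE = set("DE")


def conserved_charge(residues):
    """Return '+', '-' or None for one sub-group at one column."""
    if residues and all(r in POSITIVE for r in residues):
        return "+"
    if residues and all(r in NEGATIVE for r in residues):
        return "-"
    return None  # mixed, uncharged or gapped column


def charge_swaps(subgroups):
    """Report columns where conserved charge sign differs between sub-groups.

    `subgroups` maps a sub-group name to a list of aligned sequences
    of equal length. Returns a list of (position, {name: sign}) with
    1-based positions, for columns where at least one sub-group is
    conserved positive and at least one is conserved negative.
    """
    length = len(next(iter(subgroups.values()))[0])
    swaps = []
    for col in range(length):
        signs = {}
        for name, seqs in subgroups.items():
            sign = conserved_charge([s[col] for s in seqs])
            if sign is not None:
                signs[name] = sign
        if "+" in signs.values() and "-" in signs.values():
            swaps.append((col + 1, signs))
    return swaps


if __name__ == "__main__":
    # Hypothetical three-column alignment split into two sub-groups.
    example = {"A": ["AKD", "GRD"], "B": ["AEK", "SDK"]}
    for position, signs in charge_swaps(example):
        print(position, signs)
```

Because the sub-groups are passed in as a dictionary, alternative sub-groupings of the same alignment can be compared simply by calling `charge_swaps` again with a different partition.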
AMAS and Alscript are available from the authors.