The strategy described in this paper is extremely flexible: it allows
different physico-chemical properties to be examined independently, or
in concert. In addition, an alignment may be dissected into any
combination of sub-groups and their relative conservation analysed. As
with any analytical procedure, the strategy is most effective when one
has a clear idea of what one is looking for. For example: "What
makes sub-group A different from B and C?", or "Which residues in
sub-group D should I change to make D more like A?". If no clear
questions have been defined, then the general property index
(Figure 1b) is a useful starting point
to highlight patterns of residue conservation. This is illustrated in
Figure 6 for an alignment of 67 SH2
domains [Russell et al., 1992]. Since SH2 domains are cytoplasmic, Cys was
assigned the properties of the free amino acid in
this analysis (Figure 1b). The
alignment is divided into eight sub-groups on the basis of overall
sequence similarity. Sub-groups 1-7 (numbering from the top) share
more than 20% sequence identity, whilst sequences not fitting into
one of these sub-groups are collected in sub-group 8. The overall
conservation of physico-chemical properties is highlighted by the
histogram at the base of the alignment. The upper histogram indicates
the normalised frequency of similarities between pairs of sub-groups,
whilst the lower plot shows the frequency of pair differences.
Dark shading of the histogram indicates the frequency of pairs of
sub-groups that show sequence identity. A hand analysis of an
alignment similar to that shown in Figure
6 correctly identified the location of the core secondary
structures, and phosphotyrosine-binding residues [Barton & Russell, 1993][Russell et al., 1992].
Since completion of that study, the three-dimensional structures of
three SH2 domains have been determined by the techniques of X-ray
crystallography and NMR. The secondary structures of these are
illustrated at the base of Figure 6
([Booker et al., 1992][Overduin et al., 1992][Waksman et al., 1992]). The conservation histograms clearly
correspond to the regions of secondary structure, and are helpful in
identifying patterns characteristic of α-helix and β-strand. For
example, at positions 15 and 97 CXXCCXXC patterns (where C=Conserved)
characteristic of α-helix are clearly visible.
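
Locating CXXCCXXC patterns by eye can be mimicked with a simple scan over a per-column conservation profile. The sketch below is illustrative only and is not part of AMAS: the function name `find_helix_patterns`, the boolean profile representation, and the toy data are assumptions, and X is treated here as an unconstrained position.

```python
def find_helix_patterns(conserved, pattern="CXXCCXXC"):
    """Return 1-based start columns where the profile matches the pattern.

    conserved: sequence of booleans, one per alignment column.
    'C' in the pattern requires a conserved column; 'X' matches any column
    (an assumption made for this sketch).
    """
    width = len(pattern)
    hits = []
    for start in range(len(conserved) - width + 1):
        window = conserved[start:start + width]
        # Only the positions marked 'C' in the pattern are checked.
        if all(flag for flag, symbol in zip(window, pattern) if symbol == "C"):
            hits.append(start + 1)
    return hits


# Hypothetical profile, not taken from Figure 6.
profile = [symbol == "C" for symbol in "XXCXXCCXXCXXXX"]
print(find_helix_patterns(profile))  # [3]
```

Such a scan only flags candidate positions; whether a hit reflects a helix still has to be judged against the alignment and any known structure.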
The annexins are a family of proteins that bind phospholipid in a calcium-dependent manner. Annexins consist of a variable N-terminal sequence followed by four or eight repeats, each of approximately 80 amino acids. Inspection of a multiple sequence alignment of 40 repeats identified the unique features of each repeat family, and located patterns of residue substitution characteristic of the secondary structures [Barton et al., 1991]. Figure 5 illustrates the application of hierarchical conservation analysis to a subset of these annexin repeats. Only conserved charges are shown (Figure 5a), and the differences summary clearly locates the position of a change in charge sign (position 31). This charge swap corresponds to the site of an inter-repeat salt bridge [Barton et al., 1991].
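
The idea of a conserved charge that changes sign between sub-groups can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the AMAS implementation: the `CHARGE` table (with His and gaps treated as uncharged), the function names, and the toy sequences are all hypothetical.

```python
# Assumption: only D/E (negative) and K/R (positive) are treated as charged;
# His, gaps and all other symbols count as uncharged.
CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def conserved_charge(column):
    """Return +1 or -1 if every residue in the column has that charge, else None."""
    signs = {CHARGE.get(residue, 0) for residue in column}
    if len(signs) == 1:
        sign = signs.pop()
        return sign if sign != 0 else None
    return None


def charge_swaps(subgroups):
    """Return 1-based columns where one sub-group conserves + and another conserves -.

    subgroups: dict mapping a sub-group name to a list of aligned,
    equal-length sequences.
    """
    length = len(next(iter(subgroups.values()))[0])
    swaps = []
    for i in range(length):
        signs = {
            conserved_charge([seq[i] for seq in seqs])
            for seqs in subgroups.values()
        }
        if +1 in signs and -1 in signs:
            swaps.append(i + 1)
    return swaps


# Hypothetical toy alignment, not the annexin data of Figure 5.
groups = {
    "repeat_1": ["AKDL", "GRDL"],
    "repeat_2": ["AEDL", "SDDV"],
}
print(charge_swaps(groups))  # [2]
```

In the toy data, column 2 is positive throughout the first sub-group and negative throughout the second, so it is reported; column 3 is negative in both and is not.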
Charge changes at positions 13, 31, 40 and 68 are listed in the textual summary shown in Figure 5b. While all these features can be identified by hand inspection of the alignment, the process is laborious and error-prone. The strategy described in this paper reduces the scope for error, allows alternative sub-groupings to be investigated rapidly, and provides shading and boxing that is structurally relevant.
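
To give a feel for how alternative sub-groupings might be compared, the sketch below computes, for each column, the fraction of sub-group pairs that conserve the same residue class and the fraction that conserve different classes. It is similar in spirit to the pair similarity and difference histograms, but it is a simplification: the three-class `RESIDUE_CLASS` table, the function names, and the toy sequences are assumptions, and the property-based conservation measure used by the actual method is not reproduced here.

```python
from itertools import combinations

# Assumption: a coarse three-class grouping stands in for the full set of
# physico-chemical properties.
RESIDUE_CLASS = {}
for residues, label in (
    ("AVLIMFWC", "hydrophobic"),
    ("DEKRH", "charged"),
    ("STNQYGP", "polar_or_small"),
):
    for residue in residues:
        RESIDUE_CLASS[residue] = label


def group_class(column):
    """Return the class shared by all residues in the column, or None."""
    classes = {RESIDUE_CLASS.get(residue) for residue in column}
    return classes.pop() if len(classes) == 1 else None


def pair_frequencies(subgroups):
    """Per column, return (similar, different) fractions over sub-group pairs.

    Requires at least two sub-groups of aligned, equal-length sequences.
    A pair is counted only when both sub-groups conserve a class.
    """
    names = list(subgroups)
    pairs = list(combinations(names, 2))
    length = len(subgroups[names[0]][0])
    similar, different = [], []
    for i in range(length):
        cls = {n: group_class([s[i] for s in subgroups[n]]) for n in names}
        both = [
            (cls[a], cls[b])
            for a, b in pairs
            if cls[a] is not None and cls[b] is not None
        ]
        similar.append(sum(x == y for x, y in both) / len(pairs))
        different.append(sum(x != y for x, y in both) / len(pairs))
    return similar, different


# Hypothetical sequences, regrouped in two alternative ways.
s1, s2, s3, s4 = "LKD", "IRD", "VES", "ADT"
grouping_a = {"A": [s1, s2], "B": [s3, s4]}
grouping_b = {"A": [s1, s3], "B": [s2, s4]}
print(pair_frequencies(grouping_a))
print(pair_frequencies(grouping_b))
```

With the first grouping the last column shows up as a difference between the two sub-groups; with the second grouping neither sub-group is conserved there, so the difference disappears. Re-running the same function on a different dictionary is all that a change of sub-grouping requires in this sketch.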
AMAS and Alscript are available from the authors.