We have extended the work of Zvelebil et al. [Zvelebil et al., 1987] to give a general method for quantifying residue conservation. Our approach differs in detail from that described by Zvelebil et al., so for the sake of completeness, and to avoid possible confusion, we describe here the protocol used to quantify and compare residue conservation.
Figure 1a shows a Venn diagram (for details see [Taylor, 1986]) contained within a boundary that symbolises the universal set of the 20 common amino acids. The amino acids that possess the dominant properties, hydrophobic, polar and small, are defined by their set boundaries. Subsets contain amino acids with the properties aliphatic (branched side chain, non-polar), aromatic, charged, positive, negative and tiny. Shaded areas denote combinations of properties possessed by none of the common amino acids. The Venn diagram can be encoded simply as the property table, or index, shown in Figure 1b, in which the rows define properties and the columns correspond to individual amino acids.
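As an illustration of how such a property table might be held in a program, the sketch below encodes a property index as a mapping from each property name to the set of one-letter amino acid codes assumed to belong to it, then expands it into rows of membership flags (rows are properties, columns are amino acids). The names `PROPERTY_SETS`, `property_table` and `format_table` are hypothetical, and the three-property membership used here (with His treated as positive) is an assumption for illustration only, not the full table of Figure 1b.

```python
# Sketch: encode a property index (Venn diagram) as a table of flags.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 common amino acids

# Hypothetical, partial index; memberships are assumed for illustration.
PROPERTY_SETS = {
    "charged": set("DEHKR"),
    "positive": set("HKR"),
    "negative": set("DE"),
}


def property_table(property_sets, residues=AMINO_ACIDS):
    """Return {property: row}, where row[j] is True if residues[j]
    belongs to that property's set."""
    return {
        prop: [aa in members for aa in residues]
        for prop, members in property_sets.items()
    }


def format_table(table, residues=AMINO_ACIDS):
    """Render the table with '1' for membership and '.' otherwise."""
    width = max(len(prop) for prop in table)
    lines = [" " * width + " " + " ".join(residues)]
    for prop, row in table.items():
        cells = " ".join("1" if member else "." for member in row)
        lines.append(f"{prop:<{width}} {cells}")
    return "\n".join(lines)


table = property_table(PROPERTY_SETS)
print(format_table(table))
```

Storing each property as a set keeps the Venn-diagram view (membership tests) and the table view (rows of flags) interchangeable.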
Cysteine occurs at two different positions in the Venn diagram. When participating in a disulphide bridge (the oxidised form), cysteine exhibits the properties ``hydrophobic'' and ``small''. The reduced form additionally shows polar character and meets the criteria for membership of the ``tiny'' set.
When analysing proteins that do not have disulphides, an index which represents the properties of reduced cysteine is used (see SH2 domain analysis). In proteins where disulphide bonding is known to occur, or where the oxidation state of the cysteines is uncertain, an index representing cysteine in the oxidised form is generally more useful (as in Figure 1b).
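The choice between the two cysteine forms can be expressed as a small rule. The sketch below takes the property sets of the two forms directly from the description above and selects the reduced form only when the protein is known to lack disulphides, falling back to the oxidised form when disulphides occur or the oxidation state is unknown. The function name `cysteine_properties` and the use of `None` to mean "uncertain" are assumptions of this sketch.

```python
from typing import Optional

# Properties of the two cysteine forms, as described in the text.
OXIDISED_CYS = frozenset({"hydrophobic", "small"})
REDUCED_CYS = OXIDISED_CYS | {"polar", "tiny"}


def cysteine_properties(has_disulphides: Optional[bool]) -> frozenset:
    """Choose the cysteine entry for a property index.

    False -> protein known to lack disulphides: use the reduced form.
    True or None (oxidation state uncertain): use the oxidised form.
    """
    if has_disulphides is False:
        return REDUCED_CYS
    return OXIDISED_CYS
```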
The Venn diagram in Figure 1a assigns multiple properties to each amino acid; for example, lysine has the property hydrophobic by virtue of its long side chain, as well as the properties polar, positive and charged. Alternative property tables may also be defined; for instance, the amino acids might simply be grouped into non-intersecting sets labelled hydrophobic, charged and neutral.
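The difference between an overlapping index such as Figure 1b and a non-intersecting grouping can be tested mechanically: in a non-intersecting index every amino acid belongs to exactly one set. The sketch below checks this; both toy indexes and the function name `is_partition` are hypothetical and cover only a few residues, since no full non-intersecting table is given here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def is_partition(property_sets, universe=AMINO_ACIDS):
    """True if every residue in `universe` belongs to exactly one set."""
    for aa in universe:
        count = sum(aa in members for members in property_sets.values())
        if count != 1:
            return False
    return True


# Hypothetical non-intersecting grouping of a few residues.
TOY_PARTITION = {
    "hydrophobic": set("ILV"),
    "charged": set("DEKR"),
    "neutral": set("ST"),
}
# Hypothetical overlapping index: charged residues also carry a sign.
TOY_OVERLAPPING = {
    "charged": set("DEKR"),
    "positive": set("KR"),
    "negative": set("DE"),
}

print(is_partition(TOY_PARTITION, "ILVDEKRST"))  # True
print(is_partition(TOY_OVERLAPPING, "DEKR"))     # False
```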
Figure 2 illustrates the stages involved in calculating conservation numbers for a simplified property index (Figures 2a and 2b). All of the amino acids are assigned to the universal set, which in this simple example contains only the charged subset; this in turn is divided into subsets containing the positively and negatively charged amino acids. This property index allows positions of conserved charge to be identified, together with positions where a conserved charge changes polarity between different groups of sequences within an alignment.
The amino acids occurring at each position in the multiple alignment are recorded (Figure 2d) and then tested for the presence of each of the three properties (Figure 2b). The results are represented by the columns of entries for each amino acid in Figure 2e. For example, at aligned position 11, the first column in Figure 2e represents the properties of arginine, the second column the properties of tryptophan, and so on. Filled circles indicate that the amino acid is a member of a property set; empty circles indicate non-membership.
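The recording and testing steps can be sketched as two small functions: one extracts the residues at an aligned position, the other builds, for each property, one membership entry per residue (the filled or empty circles of a Figure 2e-style display). The toy alignment, the gap symbol `-`, the decision to skip gaps, and the assumed three-property memberships are all illustrative assumptions, not details taken from the method.

```python
GAP = "-"  # assumed gap symbol

# Assumed memberships for a three-property index (charged/positive/negative).
PROPERTY_SETS = {
    "charged": set("DEHKR"),
    "positive": set("HKR"),
    "negative": set("DE"),
}


def residues_at(alignment, position):
    """Residues at a 1-based aligned position, one per sequence.
    Gaps are skipped, which is an assumption of this sketch."""
    column = (seq[position - 1] for seq in alignment)
    return [aa for aa in column if aa != GAP]


def membership_rows(residues, property_sets):
    """One row per property: True (filled circle) or False (empty circle)
    for each residue at the position."""
    return {
        prop: [aa in members for aa in residues]
        for prop, members in property_sets.items()
    }


# Hypothetical toy alignment of three sequences.
alignment = ["ACDR", "ACEW", "GC-K"]
col = residues_at(alignment, 4)  # ['R', 'W', 'K']
rows = membership_rows(col, PROPERTY_SETS)
```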
Each property is considered in turn by examining the rows of entries in Figure 2e. If all of the amino acids at a position possess the property, the position shows positive conservation for that property: all entries on that property's row in Figure 2e are filled circles, and a filled circle appears in Figure 2f. If all of the amino acids at a position lack the property, the position shows negative conservation: all entries on the row in Figure 2e are empty circles, and an empty circle appears in Figure 2f. If possession of the property varies among the amino acids at the position, both filled and empty circles appear in the corresponding row of Figure 2e; the property is then labelled unconserved, and a shaded circle is shown in Figure 2f.
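The three-way classification of each property row can be written directly with `all` and `any`. In this sketch the labels "positive", "negative" and "unconserved" stand for the filled, empty and shaded circles of Figure 2f; the function name and the assumed memberships are hypothetical.

```python
# Assumed memberships for a three-property index (charged/positive/negative).
PROPERTY_SETS = {
    "charged": set("DEHKR"),
    "positive": set("HKR"),
    "negative": set("DE"),
}


def property_states(residues, property_sets):
    """Classify each property at one alignment position."""
    if not residues:
        raise ValueError("no residues at this position")
    states = {}
    for prop, members in property_sets.items():
        hits = [aa in members for aa in residues]
        if all(hits):
            states[prop] = "positive"      # filled circle
        elif not any(hits):
            states[prop] = "negative"      # empty circle
        else:
            states[prop] = "unconserved"   # shaded circle
    return states


print(property_states(["D", "E"], PROPERTY_SETS))
```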
Two methods are used to quantify conservation at an alignment position from the information summarised in Figure 2f. Method 1 is similar to that of Zvelebil et al. [Zvelebil et al., 1987] and regards as conserved any property that is either positively or negatively conserved. The number of properties obeying this rule (the number of filled or empty circles for a position in Figure 2f) gives the conservation number (Figure 2g). In contrast, Method 2 counts only properties that are positively conserved (filled circles in Figure 2f), giving the conservation numbers shown in Figure 2h.
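Both counts can be computed in a single pass over the properties. The sketch below returns the Method 1 and Method 2 numbers for one position; the function name and the three-property memberships are assumptions for illustration.

```python
# Assumed memberships for a three-property index (charged/positive/negative).
PROPERTY_SETS = {
    "charged": set("DEHKR"),
    "positive": set("HKR"),
    "negative": set("DE"),
}


def conservation_numbers(residues, property_sets):
    """Return (method1, method2) for the residues at one position.

    Method 1 counts properties conserved positively or negatively;
    Method 2 counts only positively conserved properties.
    """
    if not residues:
        raise ValueError("no residues at this position")
    method1 = method2 = 0
    for members in property_sets.values():
        hits = [aa in members for aa in residues]
        if all(hits):
            method1 += 1
            method2 += 1
        elif not any(hits):
            method1 += 1
    return method1, method2


print(conservation_numbers(["D", "E"], PROPERTY_SETS))  # (3, 2)
```

A position holding only Asp and Glu scores 3 by Method 1, because the absence of positive charge also counts as conserved, but 2 by Method 2.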
The Method 1 conservation number is a function of the number of set boundaries that must be crossed to visit all the amino acids at a position. If a property index contains $n$ properties and $b$ set boundaries must be crossed, the conservation number $C$ is $C = n - b$. For example, the dotted line in Figure 1a joins Leu and Arg and crosses 5 set boundaries; thus, for this property index, $C = 10 - 5 = 5$. The maximum possible value of the conservation number calculated by Method 1 is the number of properties in the property index (3 for Figure 2b; 10 for Figure 1b).
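The relation $C = n - b$ can be checked numerically by treating a crossed boundary as any property held by some, but not all, of the residues at a position. This sketch assumes the Figure 1b memberships Leu = {hydrophobic, aliphatic} and Arg = {polar, charged, positive}; these are assumptions chosen to agree with the statements that the two residues share no properties and are 5 boundaries apart. The function names are hypothetical.

```python
N_PROPERTIES = 10  # size of the Figure 1b index, as stated in the text

# Assumed per-residue property sets (the column view of the index).
LEU = frozenset({"hydrophobic", "aliphatic"})
ARG = frozenset({"polar", "charged", "positive"})


def boundaries_crossed(*residue_properties):
    """Count properties held by some, but not all, of the residues:
    each is a set boundary crossed when visiting every residue."""
    held_by_any = frozenset().union(*residue_properties)
    held_by_all = frozenset.intersection(*residue_properties)
    return len(held_by_any - held_by_all)


def method1_number(n_properties, *residue_properties):
    """Method 1 conservation number, C = n - b."""
    return n_properties - boundaries_crossed(*residue_properties)


b = boundaries_crossed(LEU, ARG)
print(b, method1_number(N_PROPERTIES, LEU, ARG))  # 5 5
```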
Conservation by Method 2 is calculated by counting the number of sets common to all the amino acids at a position. Leu and Arg in Figure 1a share no properties, so by Method 2 their conservation number is 0. Asp and Glu in Figure 2a are both members of the sets charged and negative; their conservation number by Method 2 is therefore 2. The maximum conservation number calculated by Method 2 is the largest number of properties possessed by any single amino acid in the property index.
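Method 2 can equivalently be phrased as the size of the intersection of the residues' property sets, and its maximum as the largest property set of any single amino acid. The sketch below shows both under the same assumed three-property memberships used earlier; the helper names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed memberships for a three-property index (charged/positive/negative).
PROPERTY_SETS = {
    "charged": set("DEHKR"),
    "positive": set("HKR"),
    "negative": set("DE"),
}


def properties_of(aa, property_sets):
    """The set of properties possessed by one amino acid."""
    return {prop for prop, members in property_sets.items() if aa in members}


def method2_number(residues, property_sets):
    """Number of properties common to every residue at the position."""
    common = set.intersection(
        *(properties_of(aa, property_sets) for aa in residues)
    )
    return len(common)


def method2_maximum(property_sets, universe=AMINO_ACIDS):
    """Largest number of properties held by any single amino acid."""
    return max(len(properties_of(aa, property_sets)) for aa in universe)


print(method2_number(["D", "E"], PROPERTY_SETS))  # 2
print(method2_maximum(PROPERTY_SETS))             # 2
```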