Insertions and deletions (gaps, "-") are usually tolerated only
in surface loop regions. Accordingly, gaps are normally given all
properties in the property matrix so that aligned positions that
contain a gap are assigned a low conservation value.
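
The effect of this gap convention can be illustrated with a minimal sketch. It assumes a toy three-property table (the property names, the reduced residue set, and the function name `conservation` are illustrative assumptions, not the full property matrix), and it assumes that a property counts as conserved when every symbol at the position agrees on it, whether by presence or by absence:

```python
# Toy property table: an illustrative assumption, far smaller than a real
# property matrix. Each symbol maps to the set of properties it possesses.
PROPERTIES = ("charged", "aromatic", "aliphatic")
PROPERTY_TABLE = {
    "A": set(),               # Ala: none of the three toy properties
    "D": {"charged"},         # Asp
    "K": {"charged"},         # Lys
    "F": {"aromatic"},        # Phe
    "V": {"aliphatic"},       # Val
    "-": set(PROPERTIES),     # gap: given all properties
}

def conservation(column):
    """Count the properties on which all symbols in the column agree.

    Assumption: agreement means all symbols have the property or all lack it.
    """
    score = 0
    for prop in PROPERTIES:
        states = {prop in PROPERTY_TABLE[symbol] for symbol in column}
        if len(states) == 1:
            score += 1
    return score

print(conservation("AAAA"))  # 3: all residues agree on every property
print(conservation("AAA-"))  # 0: the gap disagrees with Ala on every property
print(conservation("DDD-"))  # 1: only "charged" is still shared
```

Under these assumptions, a single gap removes every property that the other residues lack, which is why a gapped position scores low.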
The set-based conservation analysis described here is independent of
the number of sequences analysed. For example, a position in an
alignment of 100 sequences that contains 99 Alanines and one Lysine
will give the same conservation value as a position in an alignment of
two sequences that has one Alanine and one Lysine. The advantage of
this approach is that the tolerance of particular physico-chemical
properties at a position indicates the likely environment of the amino
acids in the common fold of the protein family. This reasoning
suggests that a position that conserves Valine in 99 sequences but
also shows Aspartate is unlikely to be performing a common structural
or functional role. However, it may sometimes be suspected that one
or more of the sequences contain errors, or that there are errors in
the alignment. It is then desirable to relax the strict conservation
rules. Accordingly, a predetermined number of gaps, or residues that
represent less than a set percentage of the total at a position, may be
ignored when calculating conservation values. For example, alignment
position 3 in Figure 2 is predominantly Asp. This position would not
be recorded as conserved using the charge index, owing to the presence
of a single Asn (1 out of 12, or 8.3% of the sequences in the
alignment). If a 10% threshold for unusual residues is set, then this
Asn would be ignored when calculating the conservation value (as
would the Val at position 10). Positions where unusual residues have
been ignored are reported only as conserved, never as identical, even
if the remaining residues are identical (Figure 2, position 3). It is
the ability to quantify the conservation of amino acids that gives the
set-based approach its major advantage over averaging a single
property scale; caution must therefore be exercised when deciding to
ignore gaps and unusual residues.
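
The thresholding rule can be sketched as follows. This is a minimal illustration, not the published implementation: the function names, the simplified charged set, and the example column of eleven Asp and one Asn (chosen only to match the 1-in-12 proportion quoted above, not copied from Figure 2) are all assumptions. A residue type is ignored when it accounts for less than the threshold percentage of the column, and a position where anything was ignored is never reported as identical.

```python
from collections import Counter

# Illustrative assumption: a single charge property with simplified membership.
CHARGED = set("DEKRH")

def drop_unusual(column, threshold_percent):
    """Return the residues kept after ignoring any residue type whose
    share of the column is below threshold_percent."""
    counts = Counter(column)
    total = len(column)
    return [res for res in column
            if 100.0 * counts[res] / total >= threshold_percent]

def charge_status(column, threshold_percent=0.0):
    """Classify a column as 'identical', 'conserved' or 'not conserved'
    with respect to charge, after ignoring unusual residues."""
    kept = drop_unusual(column, threshold_percent)
    ignored_any = len(kept) < len(column)
    # Assumption: conserved means all kept residues agree on the property.
    states = {res in CHARGED for res in kept}
    if len(states) != 1:
        return "not conserved"
    if len(set(kept)) == 1 and not ignored_any:
        return "identical"
    return "conserved"

column = "D" * 11 + "N"            # hypothetical column: 11 Asp, 1 Asn (8.3%)
print(charge_status(column))       # not conserved (strict rules)
print(charge_status(column, 10))   # conserved (Asn ignored, so not "identical")
print(charge_status("D" * 12, 10)) # identical (nothing was ignored)
```

With the threshold left at zero the result depends only on which residue types are present, not on how many sequences carry each, so `charge_status("A" * 99 + "K")` and `charge_status("AK")` give the same answer.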