Insertions and deletions (gaps, shown as "-") are usually tolerated only in surface loop regions. Accordingly, gaps are normally given all properties in the property matrix so that aligned positions that contain a gap are assigned a low conservation value.
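The effect of giving gaps every property can be illustrated with a minimal sketch. The four-property table below is a toy assumption made for illustration only and is not the property matrix used in this work. The function name `conservation` and the scoring rule (count the properties on which all symbols in a column agree, either all having or all lacking the property) are likewise illustrative assumptions.

```python
# Toy property table: an illustrative assumption, NOT the paper's matrix.
PROPERTIES = ("hydrophobic", "polar", "small", "charged")
PROPERTY_TABLE = {
    "A": {"hydrophobic", "small"},
    "D": {"polar", "small", "charged"},
    "K": {"polar", "charged"},
    "-": set(PROPERTIES),  # a gap is given every property
}


def conservation(column):
    """Count properties on which every symbol in the column agrees.

    A property counts as conserved when all symbols have it or all lack it.
    """
    property_sets = [PROPERTY_TABLE[symbol] for symbol in column]
    score = 0
    for prop in PROPERTIES:
        has_prop = [prop in props for props in property_sets]
        if all(has_prop) or not any(has_prop):
            score += 1
    return score


full_column = conservation("AAAA")    # every property agrees
gapped_column = conservation("AAA-")  # the gap breaks every "all lack" property
```

Because the gap has every property, any property that the residues lack can no longer count as conserved, so the gapped column scores lower than the ungapped one.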
The set-based conservation analysis described here is independent of the number of sequences analysed. For example, a position in an alignment of 100 sequences that contains 99 Alanines and one Lysine will give the same conservation value as a position in an alignment of two sequences that has one Alanine and one Lysine. The advantage of this approach is that the tolerance of particular physico-chemical properties at a position indicates the likely environment of the amino acids in the common fold of the protein family. This reasoning suggests that a position that conserves Valine in 99 sequences, but also shows Aspartate, is unlikely to be performing a common structural or functional role. However, it may sometimes be suspected that one or more of the sequences contain errors, or that there are errors in the alignment. It is then desirable to relax the strict conservation rules. Accordingly, a predetermined number of gaps, or residues that represent less than a set percentage of the total at a position, may be ignored when calculating conservation values. For example, alignment position 3 in Figure 2 is predominantly Asp.
This position would not be recorded as conserved using the charge index due to the presence of a single Asn (1 out of 12, or 8.3% of the sequences in the alignment). If a 10% threshold for unusual residues is set, then this Asn would be ignored when calculating the conservation value (similarly, Val at position 10). Positions where unusual residues have been ignored are reported only as conserved, never as identical, even if the other residues present are identical (Figure 2, position 3). It is the ability to quantify the conservation of amino acids that gives the set-based approach its major advantage over averaging a single property scale. Caution must therefore be exercised when deciding to ignore gaps and unusual residues.
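The threshold rule and the "conserved, never identical" reporting convention can be sketched as follows. This is a minimal illustration and not the implementation used in this work: the function names `filter_rare` and `label` are hypothetical, the example column (11 Asp and 1 Asn) is an assumed reading of position 3 in Figure 2, and the "mixed" label stands in for the full property-based calculation, which is not reproduced here.

```python
from collections import Counter


def filter_rare(column, threshold=0.10):
    """Split the residue types in a column into kept and ignored sets.

    A residue type is ignored when it makes up less than `threshold`
    (a fraction of the column) of the symbols at that position.
    """
    counts = Counter(column)
    total = len(column)
    kept = {res for res, n in counts.items() if n / total >= threshold}
    ignored = set(counts) - kept
    return kept, ignored


def label(column, threshold=0.0):
    """Label a column, never reporting 'identical' if anything was ignored."""
    kept, ignored = filter_rare(column, threshold)
    if len(kept) == 1:
        return "conserved" if ignored else "identical"
    # Placeholder: a real analysis would consult the property matrix here.
    return "mixed"


# Set-based analysis depends on which residues occur, not how many times.
same_set = set("A" * 99 + "K") == set("AK")

# Assumed column for position 3: 11 Asp and 1 Asn (1/12 is about 8.3%).
position_3 = "D" * 11 + "N"
strict = label(position_3)           # no residues ignored
relaxed = label(position_3, 0.10)    # the single Asn falls below 10%
```

With no threshold the single Asn keeps the column from being treated as uniformly Asp; with a 10% threshold the Asn is ignored, and the column is labelled conserved rather than identical because a residue was discarded.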