A naive extension of the segment comparison methods described in
section 3.1 to multiple sequences would require a number of
comparisons of the order of the product of the sequence lengths.
Clearly, as with dynamic programming methods, such an approach is not
practical. Bacon and Anderson [50] reduced the magnitude of
this problem by considering the alignment in one specific order.
First, sequence one is compared to sequence two and the top
scoring pairs of segments are stored. The next sequence is then
compared to these top scoring segments, and the top scoring segments
from the three sequences are kept. This process is continued until
every sequence has been added, and leads to a list of
alignments of top scoring segments drawn from all of the
sequences. Bacon and Anderson also extended the statistical models of
McLachlan [6] to multiple sequences, and used this model as
well as one based on random sequences to assess the significance of
the highest scoring segment alignment found. They suggested that
these techniques allow sequences to be objectively grouped, even when
most of the pairwise interrelationships are weak, and cite examples of
applications to five ribonucleases, three FAD-binding enzymes and five
λ-cro-like DNA-binding proteins. The Bacon and Anderson (1986)
algorithm shows considerable promise for the location of significant
short sequence similarities. However, the method does not provide an
overall alignment of the sequences and does not explicitly consider
gaps. Johnson and Doolittle [51] reduce the number of
segment comparisons that must be performed by progressively evaluating
selected segments from each sequence within a specified 'window'.
Their method generates a complete alignment of the sequences that
takes gaps into account. Unfortunately, its running time limits it in
practice to 4-way alignments, and 5-way alignments become
unreasonably expensive for sequences longer than fifty residues.
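
The progressive strategy attributed to Bacon and Anderson above can be illustrated with a minimal Python sketch. It is not their implementation: the fixed window length `w`, the number of retained candidates `keep`, the identity scoring in `pair_score` and the sum-of-pairs accumulation are all assumptions made here for illustration, and every function name is hypothetical.

```python
from heapq import nlargest


def pair_score(x, y):
    """Toy identity score (assumption); a substitution matrix could be used instead."""
    return 1 if x == y else 0


def windows(seq, w):
    """Return every (start, segment) window of length w in seq."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]


def progressive_segments(seqs, w, keep):
    """Add the sequences in the given order, keeping only the `keep`
    best ungapped segment alignments after each sequence is added."""
    # A candidate is (score, start positions so far, segments so far).
    candidates = [(0, (i,), (seg,)) for i, seg in windows(seqs[0], w)]
    for seq in seqs[1:]:
        extended = []
        for score, starts, segs in candidates:
            for j, seg in windows(seq, w):
                # Sum-of-pairs gain of the new segment against all kept segments.
                gain = sum(pair_score(x, y)
                           for prev in segs
                           for x, y in zip(prev, seg))
                extended.append((score + gain, starts + (j,), segs + (seg,)))
        # Discard everything except the top-scoring candidates.
        candidates = nlargest(keep, extended, key=lambda c: c[0])
    return candidates


seqs = ["ACDEFGHIK", "WWCDEFGWW", "CDEFGYYYY"]
best = progressive_segments(seqs, w=5, keep=3)[0]
```

Because only `keep` candidates survive each step, the work grows with the sum of the sequence lengths after the first pairwise comparison rather than with their product. The price is that the result depends on the order in which the sequences are added.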
A variation on segment methods is employed by the alignment tool MACAW [52]. MACAW applies the BLAST algorithm (see section 6.7) to locate the most significant ungapped similarities irrespective of length. This facility is coupled with a flexible alignment display tool that runs under Microsoft Windows. The program works well for small numbers of sequences, but lacks the convenience of the hierarchical dynamic programming methods (see section 5.2).
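
To make the idea of an ungapped similarity "irrespective of length" concrete, the following Python sketch finds the highest scoring ungapped segment pair between two sequences by scanning every diagonal exhaustively. This is not the BLAST algorithm, which relies on heuristics to avoid such a full scan, and it is not taken from MACAW. The function name and the `match` and `mismatch` values are assumptions chosen for illustration.

```python
def best_ungapped_segment(a, b, match=2, mismatch=-1):
    """Return (score, start_a, start_b, length) of the highest scoring
    ungapped segment pair between sequences a and b."""
    best = (0, 0, 0, 0)
    # Each diagonal is identified by offset = i - j.
    for offset in range(-(len(b) - 1), len(a)):
        i, j = (offset, 0) if offset >= 0 else (0, -offset)
        run, run_i, run_j = 0, i, j
        while i < len(a) and j < len(b):
            run += match if a[i] == b[j] else mismatch
            if run <= 0:
                # A non-positive running score cannot help; restart after here.
                run, run_i, run_j = 0, i + 1, j + 1
            elif run > best[0]:
                best = (run, run_i, run_j, i - run_i + 1)
            i += 1
            j += 1
    return best


result = best_ungapped_segment("XXACDEFGYY", "ZACDEFGQ")
```

The segment length is not fixed in advance: it falls out of where the running score along a diagonal peaks, which is the sense in which the similarity is located irrespective of length.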