Extension of segment methods to multiple alignment

A naive extension of the segment comparison methods described in section 3.1 to N sequences would require a number of comparisons in the order of the product of the sequence lengths. Clearly, as with dynamic programming methods, such an approach is not practical. Bacon and Anderson [50] reduced the magnitude of this problem by considering the alignment in one specific order. First, sequence one is compared to sequence two and the top scoring pairs of segments are stored. The next sequence is then compared to these top scoring segments, and the top scoring segments from the three sequences are kept. This process is continued until every sequence has been included, and leads to a list of alignments of top scoring segments from the N sequences. Bacon and Anderson also extended the statistical models of McLachlan [6] to N sequences, and used this model, as well as one based on random sequences, to assess the significance of the highest scoring segment alignment found. They suggested that these techniques allow sequences to be objectively grouped, even when most of the pairwise interrelationships are weak, and cited examples of applications to five ribonucleases, three FAD-binding enzymes and five λ-cro like DNA binding proteins. The Bacon and Anderson (1986) algorithm shows considerable promise for the location of significant short sequence similarities. However, the method does not provide an overall alignment of the sequences and does not explicitly consider gaps.

Johnson and Doolittle [51] reduce the number of segment comparisons that must be performed by progressively evaluating selected segments from each sequence within a specified 'window'. Their method generates a complete alignment of the sequences with a consideration of gaps. Unfortunately, time constraints limit its practical application to 4-way alignments; 5-way alignments become unreasonably expensive for sequence lengths above fifty residues.
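The ordered strategy attributed above to Bacon and Anderson can be illustrated with a minimal Python sketch. It is not their published implementation: the fixed window length `w`, the number of retained candidates `keep`, the identity-based sum-of-pairs score and all function names are assumptions made for illustration only. The helper `naive_comparisons` counts the segment combinations that an exhaustive N-way comparison would have to score, which grows as the product of the sequence lengths.

```python
from math import prod


def segment_score(a, b):
    """Identity score between two equal-length segments (illustrative choice)."""
    return sum(1 for x, y in zip(a, b) if x == y)


def window_starts(seq, w):
    """All start positions of length-w segments in seq."""
    return range(len(seq) - w + 1)


def naive_comparisons(seqs, w):
    """Number of segment combinations an exhaustive N-way search would score."""
    return prod(len(s) - w + 1 for s in seqs)


def progressive_segments(seqs, w=5, keep=10):
    """Add sequences one at a time, keeping only the `keep` best alignments.

    Returns a list of (score, starts) tuples, best first, where `starts`
    holds one segment start position per sequence.
    """
    # Begin with every segment of the first sequence as a candidate.
    candidates = [(0, (i,)) for i in window_starts(seqs[0], w)]
    for k in range(1, len(seqs)):
        extended = []
        for score, starts in candidates:
            for j in window_starts(seqs[k], w):
                new_seg = seqs[k][j:j + w]
                # Score the new segment against every segment already aligned.
                gain = sum(
                    segment_score(seqs[m][s:s + w], new_seg)
                    for m, s in enumerate(starts)
                )
                extended.append((score + gain, starts + (j,)))
        # Keep only the top scoring partial alignments for the next round.
        extended.sort(key=lambda c: -c[0])
        candidates = extended[:keep]
    return candidates


# Hypothetical toy sequences sharing the motif CDEFG.
seqs = ["AAAACDEFGAAAA", "TTCDEFGTTTTTT", "GGGGGGCDEFGGG"]
result = progressive_segments(seqs, w=5, keep=10)
best_score, best_starts = result[0]
```

With `keep` fixed, each round scores at most `keep` times the number of windows in the new sequence, so the work grows roughly linearly with the number of sequences rather than as the product counted by `naive_comparisons`. The price is that an alignment whose first pairwise score is weak can be discarded before later sequences have a chance to support it.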

A variation on segment methods is employed by the alignment tool Macaw [52]. Macaw applies the BLAST algorithm (see section 6.7) to locate the most significant ungapped similarities irrespective of length. This facility is coupled with a flexible alignment display tool that runs under Microsoft Windows. The program works well for small numbers of sequences, but lacks the convenience of the hierarchical dynamic programming methods (see section 5.2).
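To make "ungapped similarities irrespective of length" concrete, the following Python sketch finds the highest scoring ungapped segment pair between two sequences by scanning every diagonal exhaustively. This is neither the BLAST heuristic nor Macaw's code: BLAST uses word seeding, substitution matrices and significance statistics, whereas the simple `match` and `mismatch` scores and the function name here are assumptions chosen to keep the example short.

```python
def best_ungapped_segment(a, b, match=1, mismatch=-1):
    """Return (score, start_a, start_b, length) of the best ungapped segment pair.

    Each diagonal d = j - i of the comparison matrix is scanned once, and a
    running maximum-sum scan picks the best segment of any length on it.
    """
    best = (0, 0, 0, 0)
    for d in range(-(len(a) - 1), len(b)):
        i = max(0, -d)      # first position in a on this diagonal
        j = i + d           # matching position in b
        run_score = 0
        run_start = i
        while i < len(a) and j < len(b):
            if run_score <= 0:
                # A non-positive prefix can never help: restart the segment here.
                run_score = 0
                run_start = i
            run_score += match if a[i] == b[j] else mismatch
            if run_score > best[0]:
                best = (run_score, run_start, run_start + d, i - run_start + 1)
            i += 1
            j += 1
    return best


# Hypothetical toy sequences sharing the ungapped segment WCDEFGHW.
a = "KKKKWCDEFGHWPPP"
b = "RRWCDEFGHWRRRRR"
score, start_a, start_b, length = best_ungapped_segment(a, b)
```

The segment length is not fixed in advance; it falls out of where the running score peaks on each diagonal. The exhaustive scan costs time proportional to the product of the two sequence lengths, which is the cost that heuristics such as BLAST are designed to avoid.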

geoff.barton@ox.ac.uk