Next: Availability Up: No Title Previous: All local alignment

Implementation and Efficiency

The ``All local alignment'' algorithm has been implemented in C. This language allows the C, M, S and E arrays to be grouped into a single data structure array of length m. The resulting code is faster than when C, M, S and E are coded separately, since sequentially accessed values are adjacent in memory. The results list is also of length m where each element points to a dynamically allocated ``ragged'' list of structures containing values for M, S and E.

As a check of the relative efficiency of the ``all local alignment'' algorithm, the protein sequence of human -- haemoglobin (141-residues, PIR code HAHU) was compared to a small sequence databank (PIR 14.0: 6,858 sequences, 2,080,148 amino acids) using the Dayhoff MDM250 matrix [Dayhoff et al., 1978] and gap-penalty of 8 (). When run on a Sun SPARCstation 2, the efficient ``best score only'' algorithm determined one optimal local alignment for each databank sequence, and required 270 seconds to complete the scan. Another implementation of the Smith-Waterman algorithm SSEARCH [] which also only returns the top scoring alignment for each sequence pair, required 1,100 seconds for the same scan. In contrast, the ``all local alignment'' algorithm described here with examined 30,690 local alignments in only 1,000 seconds.

The value of has little effect on the execution time, except when set very small such that large numbers of local alignments must be stored and sorted for each sequence comparison. For example, comparison of HAHU (141 residues) to one of the longest protein sequences known, Twitchin from Caenorhabditis elegans (PIR code S07571 - 6048 residues), finds 32,299 alternative local alignments when and requires 16 sec CPU time. Setting reduces the number of alignments to 58 and takes only 2 sec. The algorithms described here have also been implemented to work with pscan [Barton, 1991] to allow distributed processing on a network of workstations. pscan permits an approximately linear decrease in elapsed scan time with increasing numbers of processors.

Next: Availability Up: No Title Previous: All local alignment

Geoff Barton
Fri Sep 22 17:15:02 BST 1995