Next: Approximations: The FASTP Up: Index methods Previous: Simple index for

Indexing with gaps - the FLASH algorithm

Recently, Rigoutsos and Califano of the IBM T. J. Watson Research Center have extended the idea of indexing to allow for gaps and mismatches [62]. The indexes, or lookup tables are highly redundant and based on a probabilistic model. As a consequence the index files are very large and the problem is less one of absolute CPU speed, and more a question of fast disk access. For example, the index for SWISS-PROT release 25 requires 2.8 GBytes of disk space (I. Rigoutsos personal communication). However, the Rigoutsos and Califano FLASH algorithm permits very rapid scans to be performed on protein databases with claimed sensitivity and accuracy close to dynamic programming. The algorithm has been implemented on a network of 7 non dedicated RISC workstations which provides sufficient speed to service a searching facility via email (send the text SEND HELP to dflash@watson.ibm.com for information). As databases grow in size with a large amount unchanging, and the cost of disk storage falls, it seems likely that indexing techniques will become increasingly important methods of searching.


geoff.barton@ox.ac.uk