Indexing has long been used to identify identical ungapped regions in sequences. For example, the SCAN facility in the PSQ and ATLAS programs distributed with the NBRF-PIR databank allows the rapid identification of short identical strings. This is achieved by pre-processing the entire databank once to identify the locations of all unique tripeptides. These data are stored in a direct access file together with pointers to the sequence identifier codes. The query peptide is likewise divided into a series of tripeptides, and identifying the sequence in the databank then becomes a simple matter of looking up the starting positions of each tripeptide in the list held on file. Indexing methods involve a tradeoff: the time and space taken to build and store the index must be weighed against the number of queries expected. Search times are usually very fast and involve only a few disk accesses; the drawback of simple indexes is that they are restricted to exact matching without gaps.
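
The tripeptide lookup described above can be illustrated with a minimal Python sketch. It is not the SCAN implementation: an in-memory dictionary stands in for the direct access file, and the names `build_index` and `find_exact` and the example sequences are hypothetical, chosen only for illustration. The index is built once; each query is then answered by looking up its overlapping tripeptides and keeping only the start positions on which all of them agree.

```python
from collections import defaultdict

K = 3  # word length: tripeptides


def build_index(databank):
    """Pre-process the databank once: map each tripeptide to the
    (sequence identifier, start position) pairs where it occurs.
    The dict is an assumed stand-in for the direct access file."""
    index = defaultdict(list)
    for seq_id, seq in databank.items():
        for pos in range(len(seq) - K + 1):
            index[seq[pos:pos + K]].append((seq_id, pos))
    return index


def find_exact(query, index):
    """Return (sequence identifier, start) pairs where the query occurs
    exactly, using only index lookups (no scan of the sequences)."""
    if len(query) < K:
        raise ValueError("query must contain at least one tripeptide")
    candidates = None
    for offset in range(len(query) - K + 1):
        word = query[offset:offset + K]
        # Shift each hit back by the word's offset in the query, so that
        # hits belonging to the same ungapped match share one start position.
        hits = {(seq_id, pos - offset) for seq_id, pos in index.get(word, [])}
        candidates = hits if candidates is None else candidates & hits
        if not candidates:
            return []  # some tripeptide has no consistent hit: no exact match
    return sorted(candidates)


if __name__ == "__main__":
    # Hypothetical two-entry databank, for illustration only.
    databank = {"P1": "MKTAYIAKQR", "P2": "GGTAYIAGG"}
    index = build_index(databank)
    print(find_exact("TAYIA", index))   # exact match in both entries
    print(find_exact("TAYIAK", index))  # exact match in P1 only
    print(find_exact("TAYLA", index))   # one substitution: not found
```

Because every overlapping tripeptide must hit at the same shifted start position, a single substitution or gap in the query removes the match entirely, which is the restriction to exact ungapped matching noted above.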