
Time considerations

In the early days of database scanning, the computer time required to execute the scan was a major consideration. In the early 1980s, computers with sufficient memory and processor speed to compare a query to a database using dynamic programming were expensive shared resources. Today, the ready availability of cheap, high-performance computers means that computer resources are rarely a limiting factor. Over the last 10 years, the speed of typical institutional computers has increased by a factor of 70, while the sequence database has grown only by a factor of 9. This disparity, coupled with the dramatic fall in the cost of computing, means that it is currently feasible to perform protein database scans in a few hours on a personal computer using dynamic programming algorithms [57].
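
The effect of these two growth factors on scan time can be illustrated with a small calculation. This is only a rough sketch: it assumes that the time for a full scan is proportional to the size of the database and inversely proportional to machine speed, which the text does not state explicitly. The function name is hypothetical.

```python
def relative_scan_time(speed_factor: float, db_growth_factor: float) -> float:
    """Time for a full database scan now, relative to ten years ago.

    Assumption (not from the text): scan time is proportional to
    database size and inversely proportional to computer speed.
    """
    return db_growth_factor / speed_factor


# Figures quoted in the text: computers 70 times faster, database 9 times larger.
ratio = relative_scan_time(speed_factor=70, db_growth_factor=9)

print(f"relative scan time: {ratio:.3f}")
print(f"effective speed-up: {1 / ratio:.1f}x")
```

Under this assumption, a full scan of the current database takes roughly one eighth of the time that a full scan of the much smaller database took ten years earlier.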

For occasional use, high scanning speed is not essential. After all, if it has taken months to obtain the sequence data, what is a few hours to check for similarities? However, much greater speed is helpful when providing a national or regional database scanning service, and when carrying out analyses that require very large numbers of sequences to be compared. For example, the comparison of a 25,000-sequence database to itself would require 4.5 months using dynamic programming on a typical workstation [57]. The algorithms discussed in Sections 6.6 and 6.7 that make approximations, or implementations on specialist hardware, may reduce this time by a factor of 10 to 100.
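
To give a feel for these numbers, the short sketch below works through the arithmetic. It rests on two assumptions that are not in the text: that a month is taken as 30 days, and that the self-comparison counts each unordered pair of distinct sequences once. The function names are hypothetical.

```python
DAYS_PER_MONTH = 30  # assumption: a month is approximated as 30 days


def pair_count(n_sequences: int) -> int:
    """Number of unordered pairs of distinct sequences in an all-against-all comparison."""
    return n_sequences * (n_sequences - 1) // 2


def reduced_days(baseline_months: float, speedup: float) -> float:
    """Run time in days after applying a given speed-up factor to the baseline."""
    return baseline_months * DAYS_PER_MONTH / speedup


pairs = pair_count(25_000)
print(f"pairwise comparisons: {pairs:,}")

# The text quotes a speed-up of 10 to 100 over the 4.5-month baseline.
for speedup in (10, 100):
    days = reduced_days(4.5, speedup)
    print(f"speed-up {speedup:>3}x: about {days:.2f} days")
```

On these assumptions, the 4.5-month comparison involves over 300 million pairwise alignments, and a speed-up of 10 to 100 brings the run time down to somewhere between about two weeks and a day or two.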


geoff.barton@ox.ac.uk