
Basic principles of database searching

When scanning a database, we take a query sequence and use an algorithm to compare it with each sequence in the database. Each pairwise comparison yields a score, where a larger score usually indicates a higher degree of similarity. Thus, a scan of a database containing 60,000 sequences will typically provide 60,000 scores for analysis. If a local alignment method is used, the total number of scores may be much larger, since more than one "hit" may occur with each sequence.

Figure 8 illustrates three score distributions from such a scan. The dark shaded bars show scores with sequences known to be structurally related to the query sequence, whereas the light shaded bars show scores with proteins that are thought not to be related to the query. A perfect database scanning method would completely separate these two distributions, as shown in Figure 8c. Normally, there is some overlap between the distributions for genuinely related and unrelated sequences, as shown in Figures 8a and 8b.

There are a number of methods for ranking and re-scaling the scores to improve separation and to remove artefacts due to differences in sequence length and composition. In their most highly developed form, these methods provide an estimate of the probability of seeing a given score by chance, given the size of the database and the length of the query. However, regardless of the ranking method, there are nearly always some proteins that are in fact structurally related to the query yet give scores in the overlap region. In practice, since no method succeeds for all protein queries, the aim is to minimise the overlap and to ensure that potentially interesting similarities score highly enough to be noticed by the user. Of course, what constitutes an "interesting" match is subjective and depends upon the biological context of the query.
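The scan-and-rank procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_score` (a count of shared three-residue words) is a hypothetical stand-in for a real alignment algorithm, the four sequences are invented for the example, and the Z-score shown is just one simple way of re-scaling scores, not necessarily the method used by any particular scanning program.

```python
from statistics import mean, stdev


def toy_score(query: str, subject: str, k: int = 3) -> int:
    """Hypothetical stand-in for an alignment score:
    the number of distinct k-residue words shared by the two sequences."""
    q_words = {query[i:i + k] for i in range(len(query) - k + 1)}
    s_words = {subject[i:i + k] for i in range(len(subject) - k + 1)}
    return len(q_words & s_words)


def scan(query: str, database: dict) -> list:
    """Compare the query with every database sequence.
    Returns one (name, score) pair per sequence, best score first."""
    scores = [(name, toy_score(query, seq)) for name, seq in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def z_scores(hits: list) -> list:
    """Re-scale raw scores as standard deviations from the mean of all scores.
    (A simple illustrative re-scaling; assumed, not taken from the text.)"""
    values = [score for _, score in hits]
    mu, sigma = mean(values), stdev(values)
    return [(name, (score - mu) / sigma if sigma > 0 else 0.0)
            for name, score in hits]


# Invented toy database: seqA is identical to the query, seqB is a close
# variant, and seqC and seqD are unrelated.
database = {
    "seqA": "MKTAYIAKQRQISFVKSHFSRQ",
    "seqB": "MKTAYIAKQRGGSFVKAHFSRQ",
    "seqC": "GGSGGSGGSGGSGGSGGSGGSG",
    "seqD": "PLVNWERTCHDMPLVNWERTCH",
}
query = "MKTAYIAKQRQISFVKSHFSRQ"

hits = scan(query, database)   # one score per database sequence
ranked = z_scores(hits)        # same order, re-scaled scores

for name, z in ranked:
    print(f"{name}\t{z:6.2f}")
```

In a real scan the database would hold tens of thousands of sequences, and the re-scaling would normally be calibrated against the scores of unrelated sequences rather than against all scores, as is done here for brevity.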





geoff.barton@ox.ac.uk