
Simple Scanning - just returns top score for each sequence

There are four steps to simple database scanning using SCANPS:

  1. Perform the scan, either using a single sequence or a multiple sequence alignment. Save the score and identifier of each sequence in the database.

  2. Sort the results of the scan into descending order of score.

  3. Inspect the sorted file to decide how many of the high scoring proteins we are interested in. Extract the sequences for these high scoring proteins.

  4. Run scanps again, this time reading sequences from the high-scoring sequence file, and generate all alignments down to some threshold score.
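Steps 2 and 3 amount to sorting (score, identifier) pairs and keeping the top few. The Python sketch below illustrates the idea only. It assumes a hypothetical results layout of one "score identifier" pair per line, which is not necessarily what scanps writes, and the function name top_hits is likewise invented for illustration.

```python
def top_hits(lines, n):
    """Return the n highest-scoring (score, identifier) pairs.

    Assumes each line holds a score followed by an identifier,
    separated by whitespace (a hypothetical layout, not the
    documented scanps output format).
    """
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        hits.append((float(fields[0]), fields[1]))
    # Step 2: sort into descending order of score.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    # Step 3: keep only the hits we are interested in.
    return hits[:n]


if __name__ == "__main__":
    # Made-up scores and identifiers, purely for illustration.
    example = ["12.5 SEQ_A", "99 SEQ_B", "", "40 SEQ_C"]
    for score, identifier in top_hits(example, 2):
        print(score, identifier)
```

In practice a standard Unix sort on the score column does the same job; the sketch only makes the logic explicit.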

For example, we can scan with the SH2 domain from src. The sequence data file should look something like this:



>TVHUSC_SH2
src SH2 domain
WYFGKITRRESERLLLNAENPRGTFLVRES
ETTKGAYCLSVSDFDNAKGLNVKHYKIRKL
DSGGFYITSRTQFNSLQQLVAYYSKHADGL
CHRLTTV*

This is standard NBRF-PIR format. SCANPS expects to find a ">" symbol followed by an identifier code, then on the NEXT line a title (in this case "src SH2 domain"), then the one-letter amino acid code terminated by a star "*". Note that the amino acid sequence MUST be in uppercase, but any number of characters per line is allowed. SCANPS ONLY reads alphabetic characters and IGNORES spaces, dots, and numbers. Illegal amino acid codes (e.g. the letter O) are read as "X".
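The reading rules above can be mimicked in a few lines of Python, which may help when checking a sequence file before a scan. This is a rough sketch, not the SCANPS reader itself. It assumes that the legal codes are the 20 standard amino acid letters and that lowercase letters should be rejected; the names LEGAL_CODES and read_sequence_entry are invented for illustration.

```python
# Assumption: the 20 standard one-letter codes are the "legal" ones.
LEGAL_CODES = set("ACDEFGHIKLMNPQRSTVWY")


def read_sequence_entry(text):
    """Parse one entry: '>' + identifier, a title line, residues ending in '*'."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 3 or not lines[0].startswith(">"):
        raise ValueError("expected '>' identifier line, title line, then sequence")
    identifier = lines[0][1:].strip()
    title = lines[1].strip()
    residues = []
    for ch in "".join(lines[2:]):
        if ch == "*":
            break  # the star terminates the sequence
        if "a" <= ch <= "z":
            # Assumption: reject lowercase, since the sequence MUST be uppercase.
            raise ValueError("lowercase residue %r found" % ch)
        if "A" <= ch <= "Z":
            residues.append(ch if ch in LEGAL_CODES else "X")
        # Anything else (spaces, dots, numbers) is ignored.
    else:
        raise ValueError("sequence is not terminated by '*'")
    return identifier, title, "".join(residues)


EXAMPLE = """>TVHUSC_SH2
src SH2 domain
WYFGKITRRESERLLLNAENPRGTFLVRES
ETTKGAYCLSVSDFDNAKGLNVKHYKIRKL
DSGGFYITSRTQFNSLQQLVAYYSKHADGL
CHRLTTV*
"""

if __name__ == "__main__":
    ident, title, seq = read_sequence_entry(EXAMPLE)
    print(ident, title, len(seq))
```

Run on the src SH2 entry shown earlier, the sketch returns the identifier, the title, and the residues joined into a single string.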



gjb@bioch.ox.ac.uk