
Defining the scoring scheme

When the pattern is read by the program, a lookup table is defined that specifies the score for aligning each possible amino acid with each element of the pattern. There are several possible ways of deriving this table, each of which is defined by commands to the program.
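As a rough illustration of how such a table might be used, the sketch below stores one score per amino acid for each pattern element and sums the entries for every ungapped placement of the pattern along a sequence. The function name `best_window`, the toy scores and the three-letter alphabet are assumptions made for this example; the program's own pattern matching is more flexible than this ungapped version.

```python
def best_window(lookup, sequence):
    """Return (score, start) of the best ungapped placement of the pattern.

    lookup   -- list of {amino_acid: score} dicts, one per pattern element
    sequence -- string of one-letter amino acid codes
    Returns None if the sequence is shorter than the pattern.
    """
    width = len(lookup)
    best = None
    for start in range(len(sequence) - width + 1):
        # Sum the table entries for the residues under each pattern element.
        score = sum(lookup[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Hypothetical two-element pattern over a three-letter alphabet.
table = [{"A": 1.5, "G": 3.0, "L": -3.0},
         {"A": -2.0, "G": -4.0, "L": 6.0}]
print(best_window(table, "AALGL"))  # (9.0, 3)
```

Keeping the table as one dictionary per pattern element makes the scoring step a simple lookup, whichever method was used to derive the scores.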

The most flexible approach is to supply a LOOKUP table directly with the READ_LOOKUP command (see above). A simpler approach is to calculate the lookup table from the amino acids observed in the pattern.

The following example command file (bash_ge_4_scan1.com) scans with the pattern defined in the file bash_ge_4.bloc:


output_file=bash_ge_4_scan1.out        1. output file
mode = scan                            2. set the mode to scan
block_file/pattern=bash_ge_4.bloc,1    3. define the pattern file
matrix_file=md.mat                     4. define Dayhoff matrix scoring
database=protein.seq                   5. define the database to scan

This scan takes 977 seconds (about 16 minutes) on a Sun SPARCstation 1 (6721 sequences scanned out of a total of 6858). The result of this scan is shown in the file bash_ge_4_scan1.out.

The new commands are:

MODE=SCAN
This tells the program to compare a pattern or alignment to the DATABASE rather than perform multiple or pairwise sequence alignment.

/PATTERN
This qualifier is given in the BLOCK_FILE command. It tells the program to use the flexible pattern matching algorithm rather than a conventional Needleman and Wunsch alignment.

DATABASE
This defines the sequence database to be scanned (must be in PIR format as defined in APPENDIX II).
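For readers who want to inspect a database before scanning it, the sketch below reads entries in the widely used NBRF/PIR layout: a header line such as `>P1;CODE`, a title line, then sequence lines ending with `*`. That layout is an assumption here, as are the function name `read_pir` and the sample entries; APPENDIX II remains the authoritative description of the format the program expects.

```python
def read_pir(lines):
    """Yield (identifier, title, sequence) from lines in the assumed PIR layout."""
    lines = iter(lines)
    for line in lines:
        if not line.startswith(">"):
            continue  # skip anything before the next entry header
        # Header looks like ">P1;CODE": keep the part after the semicolon.
        ident = line.strip().split(";", 1)[-1]
        title = next(lines, "").strip()
        residues = []
        for seq_line in lines:
            text = seq_line.strip().replace(" ", "")
            if text.endswith("*"):
                residues.append(text[:-1])  # '*' terminates the entry
                break
            residues.append(text)
        yield ident, title, "".join(residues)


# Made-up entries for illustration only.
example = """>P1;TEST1
Hypothetical example protein
MKVLA AGIT
GLW*
>P1;TEST2
Second hypothetical entry
ACDE*
""".splitlines()

entries = list(read_pir(example))
print(entries[0])  # ('TEST1', 'Hypothetical example protein', 'MKVLAAGITGLW')
```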

Specifying the md.mat MATRIX file selects DAYHOFF scoring for the scan, with simple averages used to define the lookup table.
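One plausible reading of "simple averages" is that the score for aligning an amino acid with a pattern position is the mean of the matrix scores between that amino acid and each residue observed at the position. The sketch below illustrates that reading only; the function name `build_lookup`, the toy scores and the three-letter alphabet are assumptions for this example, not the contents of md.mat or the program's actual code.

```python
def build_lookup(columns, matrix, alphabet):
    """Return one {amino_acid: score} dict per pattern position.

    columns  -- list of strings, the residues observed at each position
    matrix   -- dict mapping (a, b) residue pairs to a substitution score
    alphabet -- iterable of residues for which scores are needed
    """
    lookup = []
    for observed in columns:
        row = {}
        for aa in alphabet:
            # Simple average of the pairwise scores against every observed residue.
            row[aa] = sum(matrix[(aa, obs)] for obs in observed) / len(observed)
        lookup.append(row)
    return lookup


# Toy symmetric matrix (hypothetical values, NOT the Dayhoff scores in md.mat).
toy = {("A", "A"): 2, ("G", "G"): 5, ("L", "L"): 6,
       ("A", "G"): 1, ("A", "L"): -2, ("G", "L"): -4}
toy.update({(b, a): s for (a, b), s in list(toy.items())})

# Two pattern positions: the first saw A and G, the second saw only L.
table = build_lookup(["AG", "L"], toy, "AGL")
print(table[0]["A"])  # (2 + 1) / 2 = 1.5
```

Under this reading, a position where several different residues were observed gets intermediate scores, while a position with a single observed residue simply inherits that residue's row of the matrix.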


gjb@bioch.ox.ac.uk