When the pattern is read by the program, a lookup table is defined that specifies the score for aligning each possible amino acid with each element of the pattern. There are several possible ways of deriving this table, each of which is defined by commands to the program.
The most flexible scoring system is to define a LOOKUP table with the READ_LOOKUP command (see above). A simpler approach is to calculate a lookup table based upon the observed amino acids in the pattern.
Example command file (bash_ge_4_scan1.com) to scan with the pattern defined in the file bash_ge_4.bloc:
    output_file=bash_ge_4_scan1.out        1. output file
    mode = scan                            2. set the mode to scan
    block_file/pattern=bash_ge_4.bloc,1    3. define the pattern file
    matrix_file=md.mat                     4. define Dayhoff matrix scoring
    database=protein.seq                   5. define the database to scan
This scan takes 977 seconds (about 16 minutes) on a Sun SPARCstation 1; 6721 sequences are scanned out of a total of 6858 in the database. The result of the scan is shown in the file bash_ge_4_scan1.out.
The new commands are:
By specifying the md.mat MATRIX file, we define DAYHOFF scoring for the scan, with simple averages over the observed amino acids used to build the lookup table.
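
To make the idea of an averaged lookup table concrete, the following Python sketch shows one plausible reading of "simple averages": for each pattern position, the score of a candidate amino acid is the mean of its matrix scores against every residue observed at that position. This is an illustration only, not the program's actual code. The names build_lookup and score_window, the three-letter alphabet, and the numbers in TOY_MATRIX are all hypothetical (they are not the contents of md.mat), and the ungapped window score is an assumption made to keep the example short.

```python
from typing import Dict, List

# Toy symmetric substitution scores over a 3-letter alphabet.
# Hypothetical values for illustration; NOT the contents of md.mat.
TOY_MATRIX: Dict[str, Dict[str, float]] = {
    "A": {"A": 2.0, "G": 1.0, "W": -6.0},
    "G": {"A": 1.0, "G": 5.0, "W": -7.0},
    "W": {"A": -6.0, "G": -7.0, "W": 17.0},
}


def build_lookup(columns: List[str],
                 matrix: Dict[str, Dict[str, float]]) -> List[Dict[str, float]]:
    """Return one score row per pattern position.

    Each column is the string of residues observed at that position.
    A candidate residue scores the simple average of its matrix scores
    against those observed residues (assumed averaging scheme).
    """
    table = []
    for observed in columns:
        row = {}
        for candidate in matrix:
            total = sum(matrix[candidate][obs] for obs in observed)
            row[candidate] = total / len(observed)
        table.append(row)
    return table


def score_window(window: str, table: List[Dict[str, float]]) -> float:
    """Sum the lookup scores for an ungapped window aligned to the pattern."""
    return sum(table[i][residue] for i, residue in enumerate(window))


# Three pattern positions: A or G observed, then W, then A twice and G once.
pattern_columns = ["AG", "W", "AAG"]
lookup = build_lookup(pattern_columns, TOY_MATRIX)
```

With such a table in hand, scoring a database sequence reduces to table lookups, for example score_window("GWA", lookup) for a three-residue window. Repeated residues in a column are counted once per occurrence here, so commonly observed amino acids carry more weight; whether the program does the same is not stated in this section.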