Defaults file format

SCANPS has no hard-wired limits on quantities such as the number or length of sequences. These limits are all defined in the defaults file, which consists of a series of keyword-value pairs. The location of the defaults file must be given by the SCANPSDEFAULTS environment variable.

For example:



MAX_NSEQ 500
MAX_SEQ_LEN 7000
MAX_ID_LEN 30
MAX_TITLE_LEN 500
MAX_BLOC_SEQ 500
PEN 8
MIN_SCORE 0
OUTPUT_LENGTH 50
SCAN 0
PRECISION 100
PCUT 0.0001
MATRIX_FILE /home/geoff/gjb/md/md.mat
FIT_FILE /home/geoff/gjb/c/scanps/metro/new/fits.md.8.dat
RUN_SW_MIN 35
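
A file in this layout is straightforward to read. The following Python sketch is an illustration only, not SCANPS code: it assumes one keyword and one value per line separated by whitespace, and that blank or malformed lines can be skipped. The function names parse_defaults and load_defaults are hypothetical.

```python
import os

def parse_defaults(text):
    """Parse keyword/value lines into a dict of strings (illustrative only)."""
    defaults = {}
    for line in text.splitlines():
        fields = line.split(None, 1)   # keyword, then the rest of the line
        if len(fields) != 2:
            continue                   # assumption: skip blank or malformed lines
        keyword, value = fields
        defaults[keyword] = value.strip()
    return defaults

def load_defaults():
    """Read the file named by the SCANPSDEFAULTS environment variable."""
    path = os.environ["SCANPSDEFAULTS"]   # raises KeyError if the variable is unset
    with open(path) as handle:
        return parse_defaults(handle.read())

example = "MAX_NSEQ 500\t\t\nPEN 8\nPCUT 0.0001\nMATRIX_FILE /home/geoff/gjb/md/md.mat\n"
settings = parse_defaults(example)
print(settings["MAX_NSEQ"], settings["PCUT"])
```

Values are kept as strings here; converting them to integers or floating-point numbers is left to the caller, since the type depends on the keyword.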

MAX_NSEQ The maximum number of sequences that may be read into the program. If you are only doing database scanning, it is most efficient to set this to a small value, say 2 or 3.

MAX_SEQ_LEN The maximum allowed length for a sequence. Set this to a large value; the program reallocates memory down to the actual length of the sequence.

MAX_ID_LEN The maximum length of an identifier for a sequence.

MAX_TITLE_LEN The maximum length for a sequence title.

MAX_BLOC_SEQ The maximum number of sequences allowed in a block file.

PEN The length-dependent gap penalty. This can also be set as a command line argument (-p).

MIN_SCORE The minimum score an alignment must have to be output. This can also be set from the command line (-c).

OUTPUT_LENGTH The number of characters per line for alignment output.

SCAN Set to 0 for the fast method, 1 for the NALL method. This can also be set from the command line (-a0, -a1).

PRECISION Sets the numeric precision of the program. SCANPS does all calculations as integers, so all numbers are multiplied by PRECISION before any operation. A value of 100 is enough for most pairscore matrices. Making this value too large may cause integer overflow with long sequences.
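
To make the scaling concrete, the Python sketch below converts a real-valued score to integer form and estimates when a running total might overflow. It is illustrative only: the 32-bit signed integer limit, the rounding rule, and the names to_scaled and may_overflow are assumptions, not details taken from SCANPS.

```python
INT32_MAX = 2**31 - 1   # assumption: scores are held in 32-bit signed integers

def to_scaled(value, precision):
    """Scale a real-valued score to an integer (the rounding is an assumption)."""
    return int(round(value * precision))

def may_overflow(max_pair_score, seq_len, precision):
    """Crude upper bound: every position scores the maximum pair score."""
    return to_scaled(max_pair_score, precision) * seq_len > INT32_MAX

print(to_scaled(1.25, 100))                # 125
print(may_overflow(10.0, 7000, 100))       # False: 1000 * 7000 = 7,000,000
print(may_overflow(10.0, 7000, 1000000))   # True: 10,000,000 * 7000 = 70,000,000,000
```

The bound is deliberately pessimistic; it only shows why a very large PRECISION combined with a long sequence can exceed a fixed-width integer.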

PCUT Probability cutoff. Only alignments with a probability lower than this value will be output. This can also be set from the command line (-g). See the section on advanced scanning.

MATRIX_FILE The name of the file containing the pairscore matrix. This can be defined on the command line (-m).

FIT_FILE The file of length-dependent probability parameters. Currently there is only one such file; files for alternative matrix/gap-penalty combinations are planned.

RUN_SW_MIN In NALL scanning mode, SCANPS first does a fast Smith-Waterman comparison. If the score for the comparison is above this value, the NALL method is then applied to the sequence pair. If probability scoring is enabled, this value is instead calculated from the probability and length cutoffs.
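
The two-stage logic can be sketched as follows. This is a schematic Python illustration rather than SCANPS code: fast_sw_score and nall_align are hypothetical placeholders passed in as functions, and treating "above" as a strict comparison is an assumption.

```python
def scan_pair(query, target, run_sw_min, fast_sw_score, nall_align):
    """Run the expensive NALL step only if the fast score passes the threshold."""
    score = fast_sw_score(query, target)
    if score > run_sw_min:               # assumption: strictly above RUN_SW_MIN
        return nall_align(query, target)
    return None                          # pair skipped

# Toy stand-ins so the sketch runs; they are not real alignment algorithms.
def toy_score(a, b):
    return 10 * sum(1 for x, y in zip(a, b) if x == y)

def toy_nall(a, b):
    return ("aligned", a, b)

print(scan_pair("ACDEF", "ACDEF", 35, toy_score, toy_nall))
print(scan_pair("ACDEF", "WWWWW", 35, toy_score, toy_nall))
```

The point of the threshold is that most database sequences fail the cheap first comparison, so the slower method is applied to only a small fraction of pairs.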





gjb@bioch.ox.ac.uk