
APPENDIX I - Revision notes

2.1

First full parallel version, with protein vs protein database searching.
Includes code to build and use a binary database for fast loading.
Includes code for sorting the results.
ln() scores.
Alignment output options.
All within the same program.

2.2

Fix get_fasta.c so that it will read a FASTA file that is missing a title.
Fix DB_DIR to permit missing / at end of directory name.
Add MATRIX_DIR command.
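
The DB_DIR fix above amounts to accepting a directory name with or without a trailing slash. A minimal C sketch of that idea follows. The function name join_dir_file and its buffer handling are illustrative assumptions and are not taken from the SCANPS source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the SCANPS source): write "dir/file" into
   out, adding the '/' separator only when dir does not already end with one.
   Returns 0 on success, -1 if the result would not fit in out. */
int join_dir_file(char *out, size_t outsize, const char *dir, const char *file)
{
    size_t len;
    int n;

    assert(out != NULL && dir != NULL && file != NULL);

    len = strlen(dir);
    if (len > 0 && dir[len - 1] != '/')
        n = snprintf(out, outsize, "%s/%s", dir, file);  /* supply the slash */
    else
        n = snprintf(out, outsize, "%s%s", dir, file);   /* slash present, or dir empty */

    /* snprintf reports the length it wanted to write; treat truncation as an error */
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

With this approach, a directory given as "/data/db" and one given as "/data/db/" both resolve to the same file path.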

2.2.1

Add option to read NCBI format matrix files.  MATRIX_TYPE command.

2.2.2

Various odds and ends.  Added EXTRACT options to enable the sequence
of high scoring database hits to be fished out of the database.

2.2.3

Add routines to compare DNA to protein, with frameshifts.  
Fix mysterious-looking bug when generating alignments with affine gaps.
Tidy up timing routines.
Add option to print full length titles.

First released version?  Nope.

2.2.4

Small changes not worth mentioning.

2.2.5

Add MODE 20 for fast frameshifting DNA vs Protein comparisons.  Add
HQUERY option at compile time to allow VERY big query sequences
(e.g. 2 Megabases).  Add COMPLEMENT_QUERY option to allow scanning with
the complement of the query.  Modify MODE 22 code to eliminate FSE_PEN and replace
it with a simple length-dependent penalty for frameshift gaps.
Add MODE 100 to allow sequences to be extracted from the database following
a scan that produced no alignments.
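
As an illustration of what scanning with the complement of a DNA query involves, the sketch below computes a reverse complement in place. It assumes that "complement" here means the reverse-complement strand, which the notes do not state explicitly. The function names are hypothetical and are not taken from the SCANPS source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: complement of one nucleotide, preserving case.
   Characters other than A, C, G, T (e.g. N) are returned unchanged. */
char complement_base(char b)
{
    switch (b) {
    case 'A': return 'T';  case 'a': return 't';
    case 'T': return 'A';  case 't': return 'a';
    case 'C': return 'G';  case 'c': return 'g';
    case 'G': return 'C';  case 'g': return 'c';
    default:  return b;
    }
}

/* Reverse-complement a NUL-terminated DNA string in place. */
void reverse_complement(char *seq)
{
    size_t i, j;

    assert(seq != NULL);
    if (seq[0] == '\0')
        return;

    i = 0;
    j = strlen(seq) - 1;
    while (i < j) {
        /* swap the two ends, complementing each base as it moves */
        char left = complement_base(seq[i]);
        seq[i] = complement_base(seq[j]);
        seq[j] = left;
        i++;
        j--;
    }
    if (i == j)  /* odd length: the middle base stays put but is complemented */
        seq[i] = complement_base(seq[i]);
}
```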

2.2.6

Add statistical estimates based on the extreme value distribution.  This is based on the
statistics used in the programs FASTA and SSEARCH3, though the implementation is
different.  No statistics in the alignment output as yet, just the score list.

2.3

Small bug fixes. Add the licensing routines.  Tidy up the distribution.


2.3.1

Small changes to the way in which the probcut and max_nout options interact.
The program now allows max_nout to control the number of sequences output
when probcut > 1.  Bug removed for the probcut == 1 case.

2.3.2

Add new statistical routines with on-the-fly EVD fitting.  Add iterative searching methods.
Replace MODES 0 and 2 with code from MODE 200 and MODE 202.

2.3.9

Rewrite the manual to include a description of iterative searching and the standard
protein and protein profile searching methods.



Geoff Barton (GJB) 2002-07-23