You can estimate the statistical significance of the local alignments by adding the -F1 option to the scanps command. The numbers produced are only "true" probabilities when the PAM250 matrix and a gap penalty of 8 are used (this will change in later releases). A paper describing the method used to estimate the probabilities is in preparation.
For example:
scanps -ssh2.seq -a1 -c90 -d -F1 < sh2.top15.seq > sh2.top15.alig.prob
Inspection of the sh2.top15.alig.prob file shows that the alignments now include a "probability" value. For these alignments the values are all very small. The S01966 alignments are shown here:
Comparison with: S01966 GTPase-activating protein - bovine 1046 Residues

Raw Score: 171.0 S01966 Allen: 90 Score/Allen: 1.900000 Probability: 8.6301e-18
** **. * .* * .* .. *..*.***. *.. . .* * . .
1 WYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDFDNAKGL 50
178 WYHGKLDRTIAEERLRQAGKS GSYLIRESDRRPGSF V LS FLSQTNV 223
*.*..* . *..** .* .*.** .*..*** * *
51 NVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGL 90
224 VNHFRIIAM CGDYYIGGR RFSSLSDLIGYYS HVSCL 259

Raw Score: 130.0 S01966 Allen: 86 Score/Allen: 1.511628 Probability: 3.133e-11
*. ***...*. **. .. .**** *. * * * * * .
1 WYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDFDNAKGL 50
348 WFHGKISKQEAYNLLMTVGQA CSFLVRPSDNTPGDYSL Y F RTSE 391
*....** . * . .* .**. ... * *.
51 NVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKH 86
392 NIQRFKICPTPNNQFMMGGRY YNSIGDIIDHYRKE 426
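The Score/Allen column in the S01966 output is consistent with the Raw Score divided by Allen. We assume here that Allen is the alignment length (it matches the 90 and 86 query positions spanned above), although this section does not define it. For the two alignments:

```latex
\[
\frac{171.0}{90} = 1.900000, \qquad \frac{130.0}{86} \approx 1.511628
\]
```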
The probability values are useful when comparing alignments of very different lengths. Short alignments normally have lower raw scores than long alignments, so simply ranking on the Raw Score takes no account of alignment length.
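If you want to re-rank the hits in an -F1 output file yourself, a minimal sketch is to pull the fields out of each "Raw Score:" header line and sort on the probability instead of the raw score. The regular expression below assumes the header layout shown in the S01966 example; it, and the names Hit, parse_hits, rank_by_probability and rank_by_raw_score, are our own illustration, not part of scanps. It also assumes, as is usual, that a smaller probability means the alignment is less likely to have arisen by chance.

```python
import re
from dataclasses import dataclass

# Assumed layout of a scanps -F1 header line, based on the example output.
HEADER = re.compile(
    r"Raw Score:\s*(?P<raw>[\d.]+)\s+(?P<id>\S+)\s+"
    r"Allen:\s*(?P<allen>\d+)\s+Score/Allen:\s*(?P<ratio>[\d.]+)\s+"
    r"Probability:\s*(?P<prob>[\d.eE+-]+)"
)


@dataclass
class Hit:
    """One local alignment summary (hypothetical helper type)."""
    ident: str
    raw_score: float
    allen: int
    probability: float


def parse_hits(text: str) -> list[Hit]:
    """Return one Hit per 'Raw Score:' header line found in text."""
    return [
        Hit(m["id"], float(m["raw"]), int(m["allen"]), float(m["prob"]))
        for m in HEADER.finditer(text)
    ]


def rank_by_probability(hits: list[Hit]) -> list[Hit]:
    # Smallest probability first: least likely to be a chance alignment.
    return sorted(hits, key=lambda h: h.probability)


def rank_by_raw_score(hits: list[Hit]) -> list[Hit]:
    # Highest raw score first; this ignores alignment length entirely.
    return sorted(hits, key=lambda h: h.raw_score, reverse=True)


# Usage on the two S01966 header lines from the example output.
example = """
Comparison with: S01966 GTPase-activating protein - bovine 1046 Residues
Raw Score: 171.0 S01966 Allen: 90 Score/Allen: 1.900000 Probability: 8.6301e-18
Raw Score: 130.0 S01966 Allen: 86 Score/Allen: 1.511628 Probability: 3.133e-11
"""

for hit in rank_by_probability(parse_hits(example)):
    print(hit.ident, hit.raw_score, hit.allen, hit.probability)
```

With only these two alignments both orderings agree; they diverge when a short alignment with a modest raw score has a smaller probability than a longer, higher-scoring one.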