
Generating the local alignments - estimating significance

You can estimate the statistical significance of the local alignments by adding the -F1 option to the scanps command. The values produced are only true probabilities when the PAM250 matrix and a gap penalty of 8 are used (this will change in later releases). A paper describing how the probabilities are estimated is in preparation.

For example:

scanps -ssh2.seq -a1 -c90 -d -F1 < sh2.top15.seq > sh2.top15.alig.prob

Inspection of the sh2.top15.alig.prob file shows that each alignment now includes a probability value. For these alignments the values are all very small. The two alignments with S01966 are shown here:

Comparison with: S01966 GTPase-activating protein - bovine   1046 Residues
Raw Score: 171.0 S01966 Allen: 90 Score/Allen: 1.900000
Probability: 8.6301e-18
      ** **. *  .*  * .* .. *..*.***.   *.. . .* * .   .
    1 WYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDFDNAKGL    50
  178 WYHGKLDRTIAEERLRQAGKS GSYLIRESDRRPGSF V LS FLSQTNV   223

       *.*..*  .  *..** .* .*.** .*..*** *   *
   51 NVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGL    90
  224  VNHFRIIAM CGDYYIGGR RFSSLSDLIGYYS HVSCL   259

Raw Score: 130.0 S01966 Allen: 86 Score/Allen: 1.511628
Probability: 3.133e-11
      *. ***...*.  **.   ..  .**** *. * * * *    *  .   
    1 WYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDFDNAKGL    50
  348 WFHGKISKQEAYNLLMTVGQA CSFLVRPSDNTPGDYSL Y  F RTSE    391

      *....**    .  * . .*  .**. ...  * *.
   51 NVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKH    86
  392 NIQRFKICPTPNNQFMMGGRY YNSIGDIIDHYRKE   426

The probability values are useful when comparing alignments of very different lengths. Short alignments are normally expected to have lower raw scores than long ones, so ranking on the Raw Score alone favours long alignments and takes no account of this.
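One way to use the probabilities is to re-rank the hits in an output file such as sh2.top15.alig.prob by probability instead of by Raw Score. The Python sketch below is a minimal illustration, not part of SCANPS. It assumes the record layout shown in the S01966 excerpt (a "Raw Score:" line carrying the raw score, sequence name and "Allen:" value, followed by a "Probability:" line). It also assumes that a smaller value means a less likely chance match. The names Hit, parse_hits and rank_by_probability are hypothetical. Remember that the values are only true probabilities with PAM250 and a gap penalty of 8, so rankings produced under other settings should be treated as relative.

```python
import re
import sys
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One local alignment record, fields as printed by scanps -F1 (assumed layout)."""
    name: str           # database sequence identifier, e.g. S01966
    raw_score: float    # value after "Raw Score:"
    allen: int          # value after "Allen:" (presumably the alignment length)
    probability: float  # value after "Probability:"


# "Raw Score: 171.0 S01966 Allen: 90 Score/Allen: 1.900000"
RAW_RE = re.compile(r"Raw Score:\s*(\S+)\s+(\S+)\s+Allen:\s*(\d+)")
# "Probability: 8.6301e-18"
PROB_RE = re.compile(r"Probability:\s*(\S+)")


def parse_hits(text: str) -> List[Hit]:
    """Pair each Raw Score line with the Probability line that follows it."""
    hits: List[Hit] = []
    pending = None  # (name, raw_score, allen) waiting for its probability
    for line in text.splitlines():
        m = RAW_RE.search(line)
        if m:
            pending = (m.group(2), float(m.group(1)), int(m.group(3)))
            continue
        m = PROB_RE.search(line)
        if m and pending is not None:
            hits.append(Hit(*pending, float(m.group(1))))
            pending = None
    return hits


def rank_by_probability(hits: List[Hit]) -> List[Hit]:
    """Order hits from smallest to largest probability, ignoring raw score."""
    return sorted(hits, key=lambda h: h.probability)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python rank_hits.py sh2.top15.alig.prob
    with open(sys.argv[1]) as fh:
        for h in rank_by_probability(parse_hits(fh.read())):
            print(f"{h.name}\t{h.raw_score:g}\t{h.allen}\t{h.probability:.3g}")
```

Sorting on the probability field alone lets a short alignment with a modest raw score rank above a long alignment with a higher raw score, which is the point of using -F1 rather than the Raw Score.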


gjb@bioch.ox.ac.uk